Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disarm mem::uninitialized by having it initialize to an arbitrary valid value for each type #87675

Closed
bstrie opened this issue Jul 31, 2021 · 7 comments · Fixed by #99182
Closed
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@bstrie
Copy link
Contributor

bstrie commented Jul 31, 2021

For a while it has been understood that the mem::uninitialized API is broken. Originally the intuitive understanding of this API was that it produced a fixed, arbitrary value. However (as extensively discussed elsewhere) uninitialized memory is not a “fixed, arbitrary value”, and that for nearly all types in Rust it is instantaneous undefined behavior for them to be uninitialized.

What’s worse, even initialized values can be insta-UB. Rust uses its understanding of valid bit patterns to perform layout optimizations whereby invalid values can be repurposed as enum tags, which is how Option<&T> can be only a single word. Thus even mem::uninitialized’s sibling mem::zeroed is insta-UB when used with types like &T.

As a result, mem::uninitialized was deprecated and replaced with mem::MaybeUninit, which avoids the problems of the former. In addition, both mem::zeroed and mem::uninitialized were altered such that they will attempt to detect (and panic) when used on certain types: the former on types that must not be zero, and the latter on any types with invalid (defined) values.

However, implementing these panic checks caused a great deal of breakage (which arguably is desirable for safety, although still extremely disruptive), and to reduce disruption the check is conservative instead of exhaustive (#66151). Unfortunately, while improving the coverage of these checks will still leave mem::zeroed as perfectly usable, mem::uninitialized will be rendered all but unusable, as essentially all types cannot ever be in an uninitialized state.

This is a problem for legacy crates that were never migrated away from mem::uninitialized. However, there is a solution that both allows these legacy crates to compile while also avoiding the problem of invalid uninitialized values: mem::uninitialized can initialize with a valid value. This may seem contrary to the original intent of the API, but consider that the only reason to avoid initialization is performance, and that the choice is now between “my code doesn’t compile”, “my code contains undefined behavior”, and “my code is slower”; the latter is the most desirable outcome of the three.

 This raises the question: what value to initialize with? PR #87032 proposed the simplest option, which was to replace the innards of mem::uninitialized with mem::zeroed, however zero is the value that is most often used for niche optimizations, so this would still reject a lot of code.

But there is a more desirable alternative. Because Rust understands what values are invalid for a type—it must, in order to perform niche optimizations—it therefore should also understand which values are valid for a type. An intrinsic could be added to the compiler which, given a type, produces an arbitrary valid value of that type. This intrinsic could be used within mem::uninitialized, and the existing panic check could be removed. This would allow all code in the wild still using mem::uninitialized to compile, and would also avoid all insta-UB related to validity invariants.

@memoryruins
Copy link
Contributor

mem::uninitialized can initialize with a valid value

the existing panic check could be removed

What if there is no valid value?

enum Never {}
let x: Never = unsafe { core::mem::uninitialized() };

currently helpfully panics with

thread 'main' panicked at 'attempted to instantiate uninhabited type

@bstrie
Copy link
Contributor Author

bstrie commented Jul 31, 2021

I suspect it would not be a problem for it to continue to panic when used with uninhabited types.

@scottmcm scottmcm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Aug 10, 2021
@scottmcm
Copy link
Member

An unsafe method for "get me an arbitrary value that meets the validity invariants but maybe not the safety ones" does sound potentially interesting, but if that's promised I think it should be a new API. I wouldn't want people to start calling uninitialized to get that behaviour, and thus making their code do horrible things on older rusts without a compile-time break.

@bstrie
Copy link
Contributor Author

bstrie commented Aug 16, 2021

Sure, the intrinsic could be exposed via something like mem::arbitrary, although I'm not sure if it's more than academically useful, since after all you're still paying the cost of initialization and you'd still have to use unsafe.

@beepster4096
Copy link
Contributor

@rustbot claim

@beepster4096
Copy link
Contributor

@rustbot release-assignment

@RalfJung
Copy link
Member

I am taking another shot at something like this in #99182, with a slightly different approach than #87032:

  • We keep all the existing panics. Things only change for uses of mem::uninitialized that do not (yet) panic. The general thrust currently is towards having panics in more cases, though not everyone is happy with that. If people want to reduce the amount of panics, I think we should discuss that separately from which bit pattern is actually being returned -- either in Tracking issue for panics in mem::uninitialized/zeroed #66151 or in a new issue.
  • We fill the resulting buffer with 0x01, not 0x00. This is not perfect, but it is noundef and valid for bool and all nonnull types. Only char and some enum are out of luck.

It's not quite a "valid" value, but that is not entirely possible anyway: we promise to LLVM that references are dereferenceable (whether they are used or not), so many existing uses of mem::uninitialized will never lead to well-defined LLVM IR.

@bors bors closed this as completed in 48316df Jul 28, 2022
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
mem::uninitialized: mitigate many incorrect uses of this function

Alternative to rust-lang/rust#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations.

This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such.

This is somewhat similar to rust-lang/rust#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However
- That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time.
- The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant.

`@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here.
Cc rust-lang/rust#66151
Fixes rust-lang/rust#87675
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants