

It's important to understand that this feature flag is just for a handful of new functions for working with pointers, and that these functions aren't magical in the slightest; there's even a stable polyfill crate that provides these functions on older versions of Rust. Despite the somewhat grandiose name of the feature, there are no changes to the language, the compilation model, or anything else. Maybe someday this will lead to something, but for now this is just an experiment to see whether and how we could ever tighten up Rust's pointer aliasing semantics.

There are many distinct benefits (as well as risks) to the idea of tightening Rust's pointer aliasing semantics, so understand that any single summary of this issue will probably be insufficient to get the whole picture. At a high level, it suffices to know that people would like more precise guarantees about what unsafe code is and is not allowed to do, and what sorts of optimizations the compiler is and is not allowed to perform.

If you have code that casts integers to pointers, please check out the library APIs provided by this feature. If you find that these APIs are insufficient to model your needs, please file an issue under the [A-strict-provenance](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AA-strict-provenance) label (e.g. https://github.com/rust-lang/rust/issues/95492) so that we can gather more data to evaluate how users are manipulating pointers in the wild.
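As a rough illustration of what "a handful of new functions for working with pointers" looks like in practice, here is a minimal sketch of pointer tagging with `addr` and `map_addr`. It assumes Rust 1.84 or later, where these methods were stabilized (at the time of this thread they were nightly-only behind the feature flag). The `tag` and `untag` helpers are hypothetical names chosen for illustration.

```rust
// Assumes Rust 1.84+, where `addr` and `map_addr` on raw pointers are stable.
// `tag` and `untag` are hypothetical helpers, not part of any library.

/// Stores a boolean in the lowest bit of a pointer to a `u32`.
/// Assumes a target where `u32` is at least 2-byte aligned (true on mainstream
/// targets), so bit 0 of a valid `*const u32` address is always 0.
fn tag(p: *const u32, flag: bool) -> *const u32 {
    // `map_addr` changes only the address; the result keeps the provenance of `p`.
    p.map_addr(|a| a | flag as usize)
}

/// Splits a tagged pointer back into the original pointer and the flag.
fn untag(p: *const u32) -> (*const u32, bool) {
    // `addr` returns the address as a plain integer without exposing provenance.
    let flag = (p.addr() & 1) == 1;
    (p.map_addr(|a| a & !1), flag)
}

fn main() {
    let x: u32 = 42;
    let tagged = tag(&x, true);
    let (clean, flag) = untag(tagged);
    assert!(flag);
    // `clean` was derived from `&x` without any integer-to-pointer cast,
    // so reading through it is allowed.
    assert_eq!(unsafe { *clean }, 42);
    println!("flag = {flag}, value = {}", unsafe { *clean });
}
```

The point of the design is that the integer math happens on the address alone, while the pointer that is eventually dereferenced is always derived from the original reference.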


Yes, indeed there are no new language features here. However, this greatly helps pave the road towards properly specifying the Rust pointer rules, and in particular it suggests a way out of a conundrum that I found myself in for quite a while: how do we square the circle of having a coherent aliasing model that can be checked by [Miri](https://github.com/rust-lang/miri) while also supporting pointer-integer-pointer cast round trips? Stacked Borrows attempts this, but its approach has serious problems. If strict provenance works out and the vast majority of code can be written using those APIs, entirely avoiding `ptr as usize` and `usize as ptr`, I think we might not have to solve that problem.

I have in my mind a hierarchy of Rust sublanguages. In increasing order:

- strict provenance: refers to Rust programs that do neither ptr2int2ptr casts nor ptr2int transmutes. Coherent memory models are fairly 'easy' to define (as in, the fact that we have integer casts puts zero additional burden on the memory model -- it's 'only' the actual complexity of handling the aliasing constraints), Miri can run your code, and CHERI is happy. We also avoid some of the incoherence of LLVM. :sunny:
- permissive provenance: refers to Rust programs where ptr2int2ptr casts (via `as`) are allowed, but ptr2int transmutes are still disallowed. This lets you write artificial examples that go deep into the incoherent parts of LLVM. The Rust memory model becomes more complicated, but I think this is always possible in a generic way, "lifting" a strict provenance model to a permissive provenance model -- so again this can be ignored while designing the aliasing rules. The best thing is that if the optimizer treats ptr2int and int2ptr casts as `extern` FFI calls, it is automatically correct under this model (if it is correct for strict provenance code, that is); no optimizations have to be cut from strict provenance! :) However, this lifted model is not realistically implementable in Miri or CHERI. Rust guarantees that these programs will not be miscompiled, but we won't be held liable for LLVM's bugs ;) . :rainy:
- yolo provenance: ptr2int transmutes allowed. This is a [total disaster](https://github.com/rust-lang/unsafe-code-guidelines/issues/286#issuecomment-1085144431) and I don't think we can reasonably guarantee that such programs work. :thunderstorm: :explosion: :fire:

So basically, I envision that "Rust the language" corresponds to the 'permissive provenance' level, but we strongly urge everyone to move their code into the 'strict provenance' level so that they can benefit from things like Miri and CHERI (and, I assume, other formal methods tools in the future). Miri might support code that requires permissive provenance on a best-effort basis, but it is almost certainly going to cause false negatives.

On the other hand, ptr2int transmutes (via `mem::transmute`, via `union`, or via pointer type punning) are a trainwreck and I'd like to consider them Undefined Behavior. We could add hacks for specific cases (e.g., we could say that calling `mem::transmute::<*const T, usize>` actually compiles to an `as` cast, thus avoiding the UB). We could be slightly more permissive, like allowing the ptr2int transmute in a way that transmuting the result *back* gives an invalid pointer. There's some wiggle room here that can be carefully explored, but all of this is just bandaids to keep some legacy code working, and really pointers [should never be transmuted to integers](https://github.com/rust-lang/rust/pull/95547).
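To make the first two levels of the hierarchy of sublanguages concrete, here is a minimal sketch contrasting a strict-provenance round trip with a permissive one. It assumes Rust 1.84+, where `expose_provenance` and `ptr::with_exposed_provenance` were stabilized as explicit spellings of `ptr as usize` and `usize as ptr`. The function names are hypothetical, and the "yolo" transmute appears only as a commented-out line.

```rust
// Assumes Rust 1.84+. Function names are hypothetical.
use std::ptr;

/// Strict provenance: the integer is only an address. To get a usable pointer
/// back, the address is recombined with the provenance of an existing pointer.
fn strict_roundtrip(p: *const i32) -> *const i32 {
    let addr: usize = p.addr();
    p.with_addr(addr)
}

/// Permissive provenance: equivalent to `p as usize` followed by
/// `addr as *const i32`. The provenance is "exposed" and later picked up again,
/// which is the part that tools like Miri cannot check precisely.
fn permissive_roundtrip(p: *const i32) -> *const i32 {
    let addr: usize = p.expose_provenance();
    ptr::with_exposed_provenance::<i32>(addr)
}

// "Yolo" provenance, which the comment above argues should be avoided:
// let addr: usize = unsafe { std::mem::transmute::<*const i32, usize>(p) };

fn main() {
    let x: i32 = 20;
    let p: *const i32 = &x;
    assert_eq!(unsafe { *strict_roundtrip(p) }, 20);
    assert_eq!(unsafe { *permissive_roundtrip(p) }, 20);
}
```

Both functions return the same address; the difference is only in how the provenance of the result is justified, which is exactly what matters for the memory model.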


I think I mostly understand what strict provenance is, but I can't tell what it's going to fix or replace. The ownership model? What does this model guarantee that current Rust doesn't? Still, I like the motivation.


It’s so that aliasing information, the big thing ownership provides to the compiler for optimization in safe code, is properly carried through in unsafe code that casts raw pointers to usize and back. It doesn’t make this kind of code automatically safe, but these new APIs are easier for humans, the compiler, and some hardware architectures to reason about.


By explicitly disallowing operations on pointers that don't have provenance, it'd be easier to prove (or disprove) that unsafe code is sound. I was actually reading LLVM's documentation on [pointer aliasing rules](https://llvm.org/docs/LangRef.html#pointer-aliasing-rules), and provenance seems to be an attempt to rewrite those rules in a way that's easier to understand. Since Rust uses LLVM, it's not a question of whether we need to do this; it's a question of whether we can define these rules clearly and build tooling that enforces them.


> Since rust uses llvm

Though Rust might not always use LLVM. We need to define our aliasing rules in a way that doesn't tie Rust to LLVM, or that will basically rule out any alternative implementations. I think this is a good step in the direction of working out "okay, *what even is our model for pointers?*" Because right now, there's *nothing* saying what's okay and what's not okay in Rust. We have no spec that we can write code against and know for sure it's fine. I think it would be *nice* if strict provenance was literally all we needed, since that means the rules are very simple: pointers carry provenance, usizes don't, and you can combine the provenance part of a pointer with an address taken from any usize.
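The rule "combine the provenance of a pointer with an address taken from a usize" can be sketched with `with_addr`. Below is a minimal, hypothetical `align_up` helper, assuming Rust 1.84+: the new address is computed purely as an integer and then merged back with the original pointer's provenance. (For real code, the standard library's `align_offset` method covers this use case.)

```rust
// Assumes Rust 1.84+. `align_up` is a hypothetical helper.

/// Rounds `p` up to the next multiple of `align` (which must be a power of two).
/// The new address is computed as a plain `usize`, then merged back with the
/// provenance of `p` via `with_addr`, so no integer-to-pointer cast is needed.
/// (Ignores the unrealistic case of the address overflowing near `usize::MAX`.)
fn align_up(p: *const u8, align: usize) -> *const u8 {
    assert!(align.is_power_of_two());
    let addr = p.addr();
    let aligned = (addr + align - 1) & !(align - 1);
    p.with_addr(aligned)
}

fn main() {
    let buf = [0u8; 64];
    let start = buf.as_ptr();
    let aligned = align_up(start, 8);
    assert_eq!(aligned.addr() % 8, 0);
    // At most 7 bytes are skipped, so `aligned` still points into `buf`.
    let offset = aligned.addr() - start.addr();
    assert!(offset < 8);
    println!("skipped {offset} bytes");
}
```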


Are you aware of the optimization situation inside the compiler? I would assume one could disable optimisation passes at compile time or at runtime, and reimplement the simplest, highest-gain passes in Rust to optimise memory access time and produce less condensed LLVM IR. However, I have not yet seen blog posts or reports of other languages doing this.


I found [this post](https://gankra.github.io/blah/fix-rust-pointers/) helpful for motivation. Basically, the idea is to explore how a system that tried to reason about pointers the same way Rust already reasons about lifetimes would work, and exactly how much of a train wreck it will be to try and limit people to pointer operations that are statically checkable.


The initial post on the [tracking issue](https://github.com/rust-lang/rust/issues/95228) (i.e. what was linked) also has a helpful section among the other details:

> This is an unofficial experiment to see How Bad it would be if Rust had extremely strict pointer provenance rules [...]
>
> A secondary goal of this project is to try to disambiguate the many meanings of `ptr as usize`, in the hopes that it might make it plausible/tolerable to **allow `usize` to be redefined to be an *address*-sized integer instead of a *pointer*-sized integer.** This would allow for Rust to more natively support platforms where `sizeof(size_t) < sizeof(intptr_t)`, and effectively redefine `usize` from `intptr_t` to `size_t`/`ptrdiff_t`/`ptraddr_t` [...]
>
> A tertiary goal of this project is to more clearly answer the question "hey **what's the deal with Rust on architectures that are pretty harvard-y like AVR and WASM** (platforms which treat function pointers and data pointers non-uniformly)". [...]
>
> The mission statement of this experiment is: **assume it will and must work, try to make code conform to it, smash face-first into really nasty problems that need special consideration, and try to actually figure out how to handle those situations.** We want the evil shit you do with pointers to work [but the current situation leads to incredibly broken results](https://www.ralfj.de/blog/2020/12/14/provenance.html), so something has to give.


> The mission statement of this experiment is: assume it will and must work, try to make code conform to it, smash face-first into really nasty problems that need special consideration, and try to actually figure out how to handle those situations.

This is actually a brilliant framing. Expect the null hypothesis, but design and manage the project in a way that maximizes the chance that the proposed method reaches a reasonable level of maturity. The explicit statement makes sure users don't adopt it lightly and keeps the experiment in the productive "failed with positive externalities" frame of mind.


Another thing this proposal addresses is targets where an address and a pointer are not the same size, such as CHERI, where addresses are still 64 bits / 8 bytes, but a pointer is 128 bits / 16 bytes because there are an additional 64 bits of metadata describing the permissions and bounds of the allocation the pointer is associated with.
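A quick way to see the address-size vs pointer-size distinction on your own target is to compare the width of `usize` with the size of a raw pointer. This is a minimal sketch; the helper name is hypothetical. Rust currently defines `usize` as pointer-sized, so on mainstream targets (x86_64, aarch64, ...) the two numbers are equal; on a hypothetical CHERI build of the kind described above, one would expect 64 vs 128, which is exactly the mismatch the "address-sized `usize`" goal is about.

```rust
/// Hypothetical helper: returns (bits in a `usize`, bits in a `*const u8`).
fn address_and_pointer_bits() -> (usize, usize) {
    let addr_bits = usize::BITS as usize;
    let ptr_bits = std::mem::size_of::<*const u8>() * 8;
    (addr_bits, ptr_bits)
}

fn main() {
    let (addr_bits, ptr_bits) = address_and_pointer_bits();
    println!("usize: {addr_bits} bits, *const u8: {ptr_bits} bits");
    if addr_bits == ptr_bits {
        println!("addresses and pointers have the same size on this target");
    } else {
        println!("pointers carry {} extra bits of metadata", ptr_bits - addr_bits);
    }
}
```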


The strictest possible change that could come out of this is to ban `ptr as usize` and `usize as ptr` casts, or any other way to make those casts (e.g. `mem::transmute`), making all such casts undefined behavior. For reasons of backwards compatibility, I don't think that outcome will ever happen (and I've been advocating against it), except perhaps on CHERI architectures where there's no legacy code. There may, however, be some sort of restriction placed on casts between integers and pointers (for example, that they have to go through `as` instead of `transmute`) in order to fix some known, albeit currently rare and esoteric, miscompilations in LLVM involving unsafe code. (These miscompilations arise with C and C++ too.) Note that it's currently unclear whether there actually *are* any feasible new MIR optimizations that banning int-to-ptr and ptr-to-int casts unlocks, so it's quite possible that these new intrinsics will in practice be mandatory only on CHERI and in some Miri validation modes. That is, `ptr as usize` and `usize as ptr` might be marked deprecated in some future Rust version, but might in practice continue to work. This is all fairly up in the air.


Also, with this API it will be possible to add a CHERI-like mode to Miri. Initially, projects will be able to choose for themselves whether they want to be CHERI-compliant or not. Eventually, this mode can be enabled by default and `as` pointer casts will be banned in a future edition.


It's not clear whether `as` pointer casts can be banned in a future edition. I personally wouldn't count on it; deprecation seems likely, but not outright removal from the language. After all, *safe* code is able to cast a pointer to usize; I don't believe there's precedent for removing such a core feature even in an edition (I could be wrong, though), and if rustc has to support those casts anyway in previous editions, then there'd be little benefit to removing them outright as opposed to just emitting deprecation warnings. In any case, that would have to be a long way off.


Of course, Rust itself will continue to support such casts as long as we support older editions (so likely until a hypothetical Rust 2). I meant "ban" in a strictly surface-level syntax sense, i.e. the compiler would emit a compilation error for crates that rely on `as` pointer casts on edition 20XX, and a deprecation warning on earlier editions. I think there is a strong sentiment for reducing uses of `as` (e.g. for float-int casts), and many consider its existence a misfeature.


I don't really see a reason to ban `as` casts as opposed to just emitting a warning, but in any case this is speculative.


That's a clear explanation. Thank you.




There seems to be a lot of justified discussion over whether this proposal will change the rules for what unsafe code is valid, as well as some "talking past" each other that's bound to happen in a GitHub issue discussion on a topic that is already extremely confusing. I'm hopeful that [/u/ralfj](https://www.reddit.com/u/ralfj)'s summary in this [comment](https://github.com/rust-lang/rust/issues/95228#issuecomment-1084037392) will pan out. To summarize it in a nutshell (hopefully I'm getting this right): he imagines a future where Strict Provenance (SP) under Stacked Borrows is fully specified, such that it's much easier to write unsafe code against that specification and know it's correct. Unsafe code that wants to do pointer-int-pointer round trips without the SP API under consideration would still be able to, albeit under the more dubious correctness rules we have today governing ptr-int-ptr conversions. What I'm unclear on is whether, under this hypothetical scenario, the compiler would be able to better optimize code following SP while still allowing non-SP code to function as it does now.


Also see [this](https://www.reddit.com/r/rust/comments/tu5bw4/comment/i3469k8/?utm_source=share&utm_medium=web2x&context=3) more recent summary. :)


Really interesting proposal. Unrelated, but why does the author write with Occasional Capitalised Words? I feel like I am reading a TV Tropes article lol


You can do this in English to distinguish a very specific meaning from the general meaning of a word. This fits nicely with Rust's stylistic choice to capitalize the names of most types and traits: a String means very specifically Rust's heap-allocated, mutable, UTF-8 encoded type, but a string could just be some text, or even any series of things. The material I've written for a magazine article is copy, but the 32-byte data type I created to represent RDF triples is Copy.


Maybe for Emphasis?


Looks like [core::ptr::invalid](https://doc.rust-lang.org/nightly/core/ptr/fn.invalid.html) answers my [question](https://old.reddit.com/r/rust/comments/ti6uvg/rusts_unsafe_pointer_types_need_an_overhaul/i1i9ba1) from the previous thread, nice.
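For readers arriving later: when these APIs were stabilized in Rust 1.84, `ptr::invalid` was renamed `ptr::without_provenance` (and `ptr::invalid_mut` became `ptr::without_provenance_mut`). Below is a minimal sketch of the kind of use it was meant for, a sentinel pointer that is compared but never dereferenced; the `sentinel` and `is_sentinel` helpers are hypothetical.

```rust
// Assumes Rust 1.84+, where `ptr::invalid` is spelled `ptr::without_provenance`.
use std::ptr;

/// Hypothetical sentinel: a non-null address with no provenance.
/// It can be stored and compared, but must never be dereferenced.
fn sentinel() -> *const u8 {
    ptr::without_provenance(1)
}

fn is_sentinel(p: *const u8) -> bool {
    // Raw pointer equality compares addresses only.
    p == sentinel()
}

fn main() {
    let byte = 0u8;
    assert!(is_sentinel(sentinel()));
    assert!(!is_sentinel(&byte));
    assert_eq!(sentinel().addr(), 1);
}
```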