Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can references to uninhabited types ever be valid? #413

Open
RalfJung opened this issue Jun 6, 2023 · 47 comments
Open

Can references to uninhabited types ever be valid? #413

RalfJung opened this issue Jun 6, 2023 · 47 comments

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2023

This discussion presupposes that "Do we have full recursive validity for references" (#412) is answered "no".

Is the following UB?

let x: &! = transmute(&());

The reason we could consider making this UB is that all of the arguments in #412 do not apply: to determine whether an &! is valid, we do not have to check memory, do any potentially racy reads, or anything like that. We can just answer the question with "no".

Basically, we can think of ! having an impossible-to-satisfy alignment requirement, and therefore &! is invalid since it is never properly aligned. (I don't really want to fold this into the alignment check, but the point is that we already have the validity of &T depend on T, namely via alignment -- I see no fundamental reason why we couldn't do the same with inhabitedness.)

@digama0
Copy link

digama0 commented Jun 6, 2023

Does this issue also cover &Void and &(!, T)? The latter is problematic since I think there are situations where you can have partially uninit references (and this also partially overlaps with #346). But maybe it suffices to say that an initialized &T (i.e. a value returned from a function, like your transmute example) requires that T be inhabited, even though uninitialized or partially initialized places of type &! or &(!, T) can exist.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2023

Does this issue also cover &Void and &(!, T)?

Yes.

While you can have partially initialized locals of type (!, T), I don't think you can have references to them. Do you have an example for that?

@chorman0773
Copy link
Contributor

Is there a reason why they should be invalid unless &T is invalid if the T is?

@RalfJung
Copy link
Member Author

RalfJung commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

@digama0
Copy link

digama0 commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

This has been mentioned elsewhere and I think you agreed to it, but that is not a very compelling motivation since you can just put match *self {} in the body to get the same optimization. (If I had my druthers match self {} would also work; this does not strictly speaking require a "no" answer to the OP question.)

@digama0
Copy link

digama0 commented Jun 6, 2023

While you can have partially initialized locals of type (!, T), I don't think you can have references to them. Do you have an example for that?

I don't, at least not without writing an unsafe block that would be UB under the OP question (more or less by definition). A variation that may or may not count is &Box<!>; I'm not sure whether this counts as inhabited or not, and it might actually thread the needle between competing notions of inhabitedness since you can empty-match on it and move out a ! (although maybe you want to call this a safety violation), even though by structural analysis of Box<!> you get a Unique which could just be a dangling pointer.

My main motivation for considering that case specially is that the borrow checker does allow for types like &(T, U) to have more complex initialization states, and future lang proposals might reify these as types (e.g. view types), in which case you might have a function that takes an argument of type &{only T is init} (!, T) and you would not be able to assume that this type is uninhabited.

@Diggsey
Copy link

Diggsey commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

Wouldn't it be almost always trivial to prove that such a function is unreachable anyway? Assuming reachability checks occur post-monomorphisation.

@saethlin
Copy link
Member

saethlin commented Jun 6, 2023

One motivation is being able to replace the entire body of an &self function where self is uninhabited with just unreachable_unchecked(). Back at the last Rust All hands (2019 I think?), someone expressed strong interest in that (code size) optimization.

I feel like we could do an experiment here to figure out if this is actually worth anything?

@digama0
Copy link

digama0 commented Jun 6, 2023

If it's unreachable, then you wouldn't even need an unreachable_unchecked, the whole function could be deleted. But it could be a field in a trait object, for example, in which case you do need the function (but it could be a dangling pointer? 🤔 )

@RalfJung
Copy link
Member Author

RalfJung commented Jun 7, 2023

This has been mentioned elsewhere and I think you agreed to it, but that is not a very compelling motivation since you can just put match *self {} in the body to get the same optimization. (If I had my druthers match self {} would also work; this does not strictly speaking require a "no" answer to the OP question.)

I'm trying to remember the old discussion. The concern might have been that these are trait impls generated automatically / via proc macros for tons of types. So we can't just add custom code into the body.

I think it was @cramertj who raised this wish back then, not sure if they still remember any details though.

Wouldn't it be almost always trivial to prove that such a function is unreachable anyway? Assuming reachability checks occur post-monomorphisation.

If it's a public function in a lib crate, how would that be trivial?

@saethlin
Copy link
Member

saethlin commented Jun 7, 2023

The perf run for this optimization implemented as a MIR pass indicates no benefit in terms of code size: https://perf.rust-lang.org/compare.html?start=afab3662eb066f05fcdb43c421b72dd19472e752&end=0f4641ea16e650c77a03aa649330702722a5f66f&stat=size%3Alinked_artifact

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2023

I guess the benchmarks don't contain auto-derived code for uninhabited types. Also as a MIR opt this is less effective than it could be, if generic types and their impls are monomorphized for an uninhabited type.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2023

My main motivation for considering that case specially is that the borrow checker does allow for types like &(T, U) to have more complex initialization states

I have never seen this happen, so this is news to me. What is an example of that?

@digama0
Copy link

digama0 commented Jun 8, 2023

I take it back, the borrow checker will reason about partially initialized Box<(T, U)> but not &(T, U). However I can come up with examples that would plausibly benefit from it:

#[derive(Default)]
struct Foo {
  a: String,
  ...
  z: String,
}
let mut x = Foo::default();
drop(x.z);
|| println!("{}, ..., {}", x.a, ..., x.y);

Currently, the returned closure captures &x.a, ..., &x.y as references and hence would be 25 pointers large, but a future optimization could capture &x instead with a borrow state indicating that x.z is uninit.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 8, 2023

the borrow checker will reason about partially initialized Box<(T, U)> but not &(T, U).

Ah, yes that makes sense -- Box<T> is treated like T in many regards. I don't see us doing that for references though since it is valid in much fewer cases. And if we ever want to do that, we can just take back the UB on &Uninhabited.

Currently, the returned closure captures &x.a, ..., &x.y as references and hence would be 25 pointers large, but a future optimization could capture &x instead with a borrow state indicating that x.z is uninit.

With per-field closure capture, don't we actually do capture just a single reference and use it for all fields? That was considered as an implementation strategy at some point.

But anyway if we do that, we just have to pick a suitable type for the closure capture environment, one that is not uninhabited. Arguably actually using &(!, i32) for that capture would be a blatant lie and could confuse various parts of the compiler.

@chorman0773
Copy link
Contributor

The concern might have been that these are trait impls generated automatically / via proc macros for tons of types.

I'd expect that a lot of those trait impls will ultimately contain match *self{} as the first item.

use serde::Serialize;
#[derive(Serialize)]
pub enum Void{}

expands to

pub enum Void {}
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () =
    {
        #[allow(unused_extern_crates, clippy :: useless_attribute)]
        extern crate serde as _serde;
        #[allow(unused_macros)]
        macro_rules! try {
            ($__expr : expr) =>
            {
                match $__expr
                {
                    _serde :: __private :: Ok(__val) => __val, _serde ::
                    __private :: Err(__err) =>
                    { return _serde :: __private :: Err(__err) ; }
                }
            }
        }
        #[automatically_derived]
        impl _serde::Serialize for Void {
            fn serialize<__S>(&self, __serializer: __S)
                -> _serde::__private::Result<__S::Ok, __S::Error> where
                __S: _serde::Serializer {
                match *self {}
            }
        }
    };

(I tried with builtin derives, and it just generates core::intrinsics::unreachable(), so apparently we can insert custom code into them in particular.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 9, 2023 via email

@saethlin
Copy link
Member

saethlin commented Jun 9, 2023

I guess the benchmarks don't contain auto-derived code for uninhabited types.

No, they do.

Also as a MIR opt this is less effective than it could be, if generic types and their impls are monomorphized for an uninhabited type.

I moved the transform to codegen where we have monomoprhized MIR. It still has no effect, and in a stage2 build of rustc it only fires one additional time.
https://perf.rust-lang.org/compare.html?start=8b35c0bb0f833c0077dc57006eb317edde2a2d1e&end=b7380c57d311609c0fd4f128fbe8adbbe0f96a24&stat=size%3Alinked_artifact

Many of the times this optimization fires, it transforms this:

bb0 {
    _0 = const _;
    return;
}

into

bb0 {
    unreachable;
}

Hardly anything to celebrate.

What we don't have is a highly artificial crate that just derives a bunch of traits on a bunch of types which are all uninhabited. Maybe then this would show up. But that form of argument can justify any optimization so I don't think it is worth considering.

I do not think an argument about the value of this optimization holds up. But the try toolchain(s) are up there, so if someone has a codebase where they can demonstrate it helping, please do so.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2023

Given a large tuple type with an uninhabited field, we currently do generate a whole bunch of MIR / LLVM IR that is all completely pointless, right? I was told that in the Fuchsia codebase, being able to optimize this is relevant. That's all I know, and it seems plausible. You are saying in our benchmarks we have no such types. Fair, but that doesn't really say much about the code in the wild.

I'm not trusting our ability to come up with real-world benchmarks and predict future coding patterns enough to conclusively say "this optimization is useless and will also never be useful in the future". (I will also note that this kind of argument hasn't been used in other UB discussions before, despite other forms of UB being a lot more subtle, having non-local effects, and having no real-world evidence of being helpful either... are we going to demand real-world optimization evidence for every single bit of UB, and make everything else defined? I'm fine with "here is an optimization that demonstrates conclusively that his UB is useful" as an argument, but I find the negative version of this a lot less convincing. It's hard to come up with good evidence for the absence of a usecase.)

It seems to me to be cheap and easy to do this (have this UB), since there really isn't any reason to allow such references to exist and the extra complexity incurred on the spec is minimal. It's also easy to take back the UB and make it a mere safety invariant, if this ever causes a problem somewhere. It's impossible to go in the other direction, if we discover in the future that we really want this optimization.

@saethlin
Copy link
Member

I don't mean to argue that "this optimization is useless and will also never be useful in the future". But I do think that if we are going to claim that some optimization is valuable currently, we should have data to back up that claim. I thought that claim was being made up-thread (by you someone at an all-hands you are echoing).

I'll see if I can take a look at Fuchsia later, if I can figure out how to interact with it.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2023 via email

@RalfJung
Copy link
Member Author

One other reason it might make sense to consider &! and &Void validity-uninhabited is coherence with match rules. I assume with x: &Void we want to accept match *x {}, and in fact we do accept it today. However, if validity for references allows &Void to exist, then there exists code where match *x { _ => () } is actually well-defined (i.e., the arm is reachable), and yet removing the match arm gets accepted! That seems somewhat incoherent.

Arguably, fixing this by making more code UB is a bad fix, but the alternative of rejecting the empty match does not seem appealing either.

@digama0
Copy link

digama0 commented Oct 25, 2023

However, if validity for references allows &Void to exist, then there exists code where match *x { _ => () } is actually well-defined (i.e., the arm is reachable), and yet removing the match arm gets accepted! That seems somewhat incoherent.

Just as a single data point, this behavior doesn't seem that weird to me. I am used to pattern matching that goes looking for exhaustive matches, and using _ means that it doesn't do that. I would rather optimize for the case of safe code here, because there is nothing about this example that involves unsafe code and x: &Void is already outside its safety invariant. This example is no more concerning to me than the fact that writing let foo = *x; is UB if x: &u32 is a dangling pointer (substitute &&u32, &Vec<u32> or similar depending on where we land on shallow validity). You shouldn't allow these time bombs in safe code in the first place.

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 27, 2023

So I'd argue that the same case for &! or &Empty. In both cases, we've replaced every required match arm with a single arm that has a discard pattern, and removed undefined behaviour. The fact that in the &Empty case the set of replaced match arms is empty is immaterial, because it's a universal quantifier over the empty set.

@chorman0773
Copy link
Contributor

Also, the same code (as the &Empty) case is valid: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1d4327fcced1e3b7f9e0eb9e1de6420b

Unless the place construction for the match itself would also be UB, you get the exact same incoherency.

@RalfJung
Copy link
Member Author

In both cases, we've replaced every required match arm with a single arm that has a discard pattern, and removed undefined behaviour.

True.

The fact that in the &Empty case the set of replaced match arms is empty is immaterial, because it's a universal quantifier over the empty set.

I guess that's the question. Having UB from absence of code is very weird and specifically only affects uninhabited types, not bool.

But yes, the proposal in this issue does special-case uninhabited types, so they do become different than bool. match construction also special-cases them when handling the situation where there are no match arms for some values, so this is not coming entirely out of nowhere.

@Nadrieril
Copy link
Member

Nadrieril commented Oct 28, 2023

In both cases, we've replaced every required match arm with a single arm that has a discard pattern, and removed undefined behaviour. The fact that in the &Empty case the set of replaced match arms is empty is immaterial, because it's a universal quantifier over the empty set.

Well, if we required a wildcard arm for the Empty case, the rule would be "we've replaced all the arms with a single wildcard arm and this removed UB". "all the arms" includes the case of no arms at all.

I don't have much of an opinion on this, just pointing out that both solutions have a symmetry like that.

EDIT: ignore me I had understood your point backwards. I shouldn't reply to github issues right out of bed x)

@Nadrieril
Copy link
Member

I plan to address the whole "UB from absence of code" by RFCing never patterns and deprecating match *x {} in favor of match *x { ! }, so hopefully that will be cleared up regardless of the decision of this issue.

The reason I'd want &! to be UB is a specific case I expect to see in user code: given res: &Result<T, !>, I'd want let Ok(x) = res.as_ref(); to be valid. Problem is, I'd also want let Ok(x) = res; to be valid, which would require that a discriminant read at a Result<T, !> place never returns Err, which is a different question. Having the first without the second feels weirder than having neither.

@Nadrieril
Copy link
Member

Thinking more about this, how would this work with generic code? E.g. NonNull::<!>::dangling().as_ref() would become UB even though ! is a ZST. More generally, that would mean generic code that plays with &uninit things would need to special-case !, and I don't know how it would be able to since size_of and align_of make it look like any other ZST.

@RalfJung
Copy link
Member Author

which would require that a discriminant read at a Result<T, !> place never returns Err, which is a different question

I was already assuming that GetDiscriminant can never return an uninhabited variant, and that's how Miri currently works.

NonNull::<!>::dangling().as_ref() would become UB even though ! is a ZST.

Yes. That's a feature, not a bug. The safety requirements for as_ref state
"The pointer must point to an initialized instance of T."
which you are clearly violating in your example.

@Nadrieril
Copy link
Member

I was already assuming that GetDiscriminant can never return an uninhabited variant, and that's how Miri currently works.

Oh, is this properly decided? I remember reading that it wasn't fully decided so I erred on the side of forwards-compatibility.

Yes. That's a feature, not a bug.

Well yeah. The problem is that NonNull::dangling().as_ref() is not UB for other ZSTs, so generic code that operates on ZSTs could be doing that, no?

@RalfJung
Copy link
Member Author

Oh, is this properly decided? I remember reading that it wasn't fully decided so I erred on the side of forwards-compatibility.

No I don't think it is. But from a Miri perspective, having more UB is the forwards-compatible option. ;)

@RalfJung
Copy link
Member Author

RalfJung commented Nov 16, 2023

Well yeah. The problem is that NonNull::dangling().as_ref() is not UB for other ZSTs, so generic code that operates on ZSTs could be doing that, no?

It is unsound to call this function if you don't actually have a valid initialized T there. ZST can have safety invariants, not just validity invariants. (If you are unsure what that means, please read rust-lang/rust#95385 and this blog post.) So it's definitely unsound for other ZST as well.

Why do you want to write code that breaks the precondition of as_ref? You really shouldn't do that, even if it might technically not be UB. (It is always library UB but it might not be language UB.)

I'm not aware of any situation where NonNull::<T>::dangling().as_ref() being UB for uninhabited T is a problem. Note that the libcore slice iterator code uses NonNull::dangling for ZST but it is sound for uninhabited T. The iterator code paths are only entered when we know there is an array with at least one T that we are iterating over, which implies that the type must be inhabited -- more than that, it implies that the safety invariant of the type is satisfied, and hence we can satisfy the precondition of as_ref.

@Nadrieril
Copy link
Member

Hm ok. I was only trying to conjure an example where ! being a weird ZST could trigger unexpected UB in generic code. You don't think that's an issue then?

@RalfJung
Copy link
Member Author

RalfJung commented Nov 16, 2023 via email

compiler-errors added a commit to compiler-errors/rust that referenced this issue Jan 26, 2024
… r=compiler-errors

Add the `min_exhaustive_patterns` feature gate

## Motivation

Pattern-matching on empty types is tricky around unsafe code. For that reason, current stable rust conservatively requires arms for empty types in all but the simplest case. It has long been the intention to allow omitting empty arms when it's safe to do so. The [`exhaustive_patterns`](rust-lang#51085) feature allows the omission of all empty arms, but hasn't been stabilized because that was deemed dangerous around unsafe code.

## Proposal

This feature aims to stabilize an uncontroversial subset of exhaustive_patterns. Namely: when `min_exhaustive_patterns` is enabled and the data we're matching on is guaranteed to be valid by rust's operational semantics, then we allow empty arms to be omitted. E.g.:

```rust
let x: Result<T, !> = foo();
match x { // ok
    Ok(y) => ...,
}
let Ok(y) = x; // ok
```

If the place is not guaranteed to hold valid data (namely ptr dereferences, ref dereferences (conservatively) and union field accesses), then we keep stable behavior i.e. we (usually) require arms for the empty cases.

```rust
unsafe {
    let ptr: *const Result<u32, !> = ...;
    match *ptr {
        Ok(x) => { ... }
        Err(_) => { ... } // still required
    }
}
let foo: Result<u32, &!> = ...;
match foo {
    Ok(x) => { ... }
    Err(&_) => { ... } // still required because of the dereference
}
unsafe {
    let ptr: *const ! = ...;
    match *ptr {} // already allowed on stable
}
```

Note that we conservatively consider that a valid reference can point to invalid data, hence we don't allow arms of type `&!` and similar cases to be omitted. This could eventually change depending on [opsem decisions](rust-lang/unsafe-code-guidelines#413). Whenever opsem is undecided on a case, we conservatively keep today's stable behavior.

I proposed this behavior in the [`never_patterns`](rust-lang#118155) feature gate but it makes sense on its own and could be stabilized more quickly. The two proposals nicely complement each other.

## Unresolved Questions

Part of the question is whether this requires an RFC. I'd argue this doesn't need one since there is no design question beyond the intent to omit unreachable patterns, but I'm aware the problem can be framed in ways that require design (I'm thinking of the [original never patterns proposal](https://smallcultfollowing.com/babysteps/blog/2018/08/13/never-patterns-exhaustive-matching-and-uninhabited-types-oh-my/), which would frame this behavior as "auto-nevering" happening).

EDIT: I initially proposed a future-compatibility lint as part of this feature, I don't anymore.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 26, 2024
… r=compiler-errors

Add the `min_exhaustive_patterns` feature gate

## Motivation

Pattern-matching on empty types is tricky around unsafe code. For that reason, current stable rust conservatively requires arms for empty types in all but the simplest case. It has long been the intention to allow omitting empty arms when it's safe to do so. The [`exhaustive_patterns`](rust-lang#51085) feature allows the omission of all empty arms, but hasn't been stabilized because that was deemed dangerous around unsafe code.

## Proposal

This feature aims to stabilize an uncontroversial subset of exhaustive_patterns. Namely: when `min_exhaustive_patterns` is enabled and the data we're matching on is guaranteed to be valid by rust's operational semantics, then we allow empty arms to be omitted. E.g.:

```rust
let x: Result<T, !> = foo();
match x { // ok
    Ok(y) => ...,
}
let Ok(y) = x; // ok
```

If the place is not guaranteed to hold valid data (namely ptr dereferences, ref dereferences (conservatively) and union field accesses), then we keep stable behavior i.e. we (usually) require arms for the empty cases.

```rust
unsafe {
    let ptr: *const Result<u32, !> = ...;
    match *ptr {
        Ok(x) => { ... }
        Err(_) => { ... } // still required
    }
}
let foo: Result<u32, &!> = ...;
match foo {
    Ok(x) => { ... }
    Err(&_) => { ... } // still required because of the dereference
}
unsafe {
    let ptr: *const ! = ...;
    match *ptr {} // already allowed on stable
}
```

Note that we conservatively consider that a valid reference can point to invalid data, hence we don't allow arms of type `&!` and similar cases to be omitted. This could eventually change depending on [opsem decisions](rust-lang/unsafe-code-guidelines#413). Whenever opsem is undecided on a case, we conservatively keep today's stable behavior.

I proposed this behavior in the [`never_patterns`](rust-lang#118155) feature gate but it makes sense on its own and could be stabilized more quickly. The two proposals nicely complement each other.

## Unresolved Questions

Part of the question is whether this requires an RFC. I'd argue this doesn't need one since there is no design question beyond the intent to omit unreachable patterns, but I'm aware the problem can be framed in ways that require design (I'm thinking of the [original never patterns proposal](https://smallcultfollowing.com/babysteps/blog/2018/08/13/never-patterns-exhaustive-matching-and-uninhabited-types-oh-my/), which would frame this behavior as "auto-nevering" happening).

EDIT: I initially proposed a future-compatibility lint as part of this feature, I don't anymore.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 26, 2024
… r=compiler-errors

Add the `min_exhaustive_patterns` feature gate

## Motivation

Pattern-matching on empty types is tricky around unsafe code. For that reason, current stable rust conservatively requires arms for empty types in all but the simplest case. It has long been the intention to allow omitting empty arms when it's safe to do so. The [`exhaustive_patterns`](rust-lang#51085) feature allows the omission of all empty arms, but hasn't been stabilized because that was deemed dangerous around unsafe code.

## Proposal

This feature aims to stabilize an uncontroversial subset of exhaustive_patterns. Namely: when `min_exhaustive_patterns` is enabled and the data we're matching on is guaranteed to be valid by rust's operational semantics, then we allow empty arms to be omitted. E.g.:

```rust
let x: Result<T, !> = foo();
match x { // ok
    Ok(y) => ...,
}
let Ok(y) = x; // ok
```

If the place is not guaranteed to hold valid data (namely ptr dereferences, ref dereferences (conservatively) and union field accesses), then we keep stable behavior i.e. we (usually) require arms for the empty cases.

```rust
unsafe {
    let ptr: *const Result<u32, !> = ...;
    match *ptr {
        Ok(x) => { ... }
        Err(_) => { ... } // still required
    }
}
let foo: Result<u32, &!> = ...;
match foo {
    Ok(x) => { ... }
    Err(&_) => { ... } // still required because of the dereference
}
unsafe {
    let ptr: *const ! = ...;
    match *ptr {} // already allowed on stable
}
```

Note that we conservatively consider that a valid reference can point to invalid data, hence we don't allow arms of type `&!` and similar cases to be omitted. This could eventually change depending on [opsem decisions](rust-lang/unsafe-code-guidelines#413). Whenever opsem is undecided on a case, we conservatively keep today's stable behavior.

I proposed this behavior in the [`never_patterns`](rust-lang#118155) feature gate but it makes sense on its own and could be stabilized more quickly. The two proposals nicely complement each other.

## Unresolved Questions

Part of the question is whether this requires an RFC. I'd argue this doesn't need one since there is no design question beyond the intent to omit unreachable patterns, but I'm aware the problem can be framed in ways that require design (I'm thinking of the [original never patterns proposal](https://smallcultfollowing.com/babysteps/blog/2018/08/13/never-patterns-exhaustive-matching-and-uninhabited-types-oh-my/), which would frame this behavior as "auto-nevering" happening).

EDIT: I initially proposed a future-compatibility lint as part of this feature, I don't anymore.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jan 26, 2024
Rollup merge of rust-lang#118803 - Nadrieril:min-exhaustive-patterns, r=compiler-errors

Add the `min_exhaustive_patterns` feature gate

## Motivation

Pattern-matching on empty types is tricky around unsafe code. For that reason, current stable rust conservatively requires arms for empty types in all but the simplest case. It has long been the intention to allow omitting empty arms when it's safe to do so. The [`exhaustive_patterns`](rust-lang#51085) feature allows the omission of all empty arms, but hasn't been stabilized because that was deemed dangerous around unsafe code.

## Proposal

This feature aims to stabilize an uncontroversial subset of exhaustive_patterns. Namely: when `min_exhaustive_patterns` is enabled and the data we're matching on is guaranteed to be valid by rust's operational semantics, then we allow empty arms to be omitted. E.g.:

```rust
let x: Result<T, !> = foo();
match x { // ok
    Ok(y) => ...,
}
let Ok(y) = x; // ok
```

If the place is not guaranteed to hold valid data (namely ptr dereferences, ref dereferences (conservatively) and union field accesses), then we keep stable behavior i.e. we (usually) require arms for the empty cases.

```rust
unsafe {
    let ptr: *const Result<u32, !> = ...;
    match *ptr {
        Ok(x) => { ... }
        Err(_) => { ... } // still required
    }
}
let foo: Result<u32, &!> = ...;
match foo {
    Ok(x) => { ... }
    Err(&_) => { ... } // still required because of the dereference
}
unsafe {
    let ptr: *const ! = ...;
    match *ptr {} // already allowed on stable
}
```

Note that we conservatively consider that a valid reference can point to invalid data, hence we don't allow arms of type `&!` and similar cases to be omitted. This could eventually change depending on [opsem decisions](rust-lang/unsafe-code-guidelines#413). Whenever opsem is undecided on a case, we conservatively keep today's stable behavior.

I proposed this behavior in the [`never_patterns`](rust-lang#118155) feature gate but it makes sense on its own and could be stabilized more quickly. The two proposals nicely complement each other.

## Unresolved Questions

Part of the question is whether this requires an RFC. I'd argue this doesn't need one since there is no design question beyond the intent to omit unreachable patterns, but I'm aware the problem can be framed in ways that require design (I'm thinking of the [original never patterns proposal](https://smallcultfollowing.com/babysteps/blog/2018/08/13/never-patterns-exhaustive-matching-and-uninhabited-types-oh-my/), which would frame this behavior as "auto-nevering" happening).

EDIT: I initially proposed a future-compatibility lint as part of this feature, I don't anymore.
@ijackson
Copy link

Currently, with nightly, this program:

#![feature(never_type)]
use std::mem::transmute;

fn main() {
    let x = &();
    let x: &! = unsafe { transmute(x) };
    match *x { _ => {} }
}

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=8cfc2b548c44d8f0b00b1dca258f7e43

runs, but complains about an unreachable match arm.

If you remove the supposedy unreachable arm,

#![feature(never_type)]
use std::mem::transmute;

fn main() {
    let x = &();
    let x: &! = unsafe { transmute(x) };
    match *x { }
}

it dies with SIGILL.

I think this is rather strange, unless both programs have UB.

@ijackson
Copy link

As a possible argument in favour of &! being UB to construct:

In derive-adhoc I have a large enum whose variants contain "markers" that are sometimes units and sometimes uninhabited: See https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/framework.rs?ref_type=heads#L115 and https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/syntax.rs?ref_type=heads#L67

This allows the reuse of the same enum (source code) for similar value sets that occur in different contexts. Call sites that receive one of these values can call trait methods like this: https://gitlab.torproject.org/Diziet/rust-derive-adhoc/-/blob/main/macros/paste.rs?ref_type=heads#L816

This allows me as the author to statically demonstrate that the situation is impossible, and therefore I don't have to write code to handle it.

It would be nice if the compiler could remove as much of this as possible. Ideally, the uninhabited enum variants, and all the code that handles uninhabited values, would completely vanish as soon as possible after monomorphisation.

@Nadrieril
Copy link
Member

Your example is an instance of rust-lang/rust#117288, which the lang team has agreed was something they wanted to fix (the arm is deemed unreachable because the *x expression is incorrectly believed to diverge).

@ijackson
Copy link

Here's another example where disallowing &! improves things:

enum MockableRef<'m> {
    Production,
    Mocked(&'m Mocks),
}

#[cfg(not(test))]
enum Mocks {}

We hope MockableRef is a unit in production code, since only one of its variants ought ever to exist. But it's not: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cb3a9289904acb3112d76b7a9284ea9f

@ijackson
Copy link

(Maybe bare &! could be ok, if we could say that the constructor MockableRef::Mocked counts as a function and therefore passing it &! is UB and therefore ::Mocked cannot exist.)

@RalfJung
Copy link
Member Author

runs, but complains about an unreachable match arm.

Yeah that should be fixed... I thought @Nadrieril fixed this recently but I guess that was another match lint issue that got fixed?

(Maybe bare &! could be ok, if we could say that the constructor MockableRef::Mocked counts as a function and therefore passing it &! is UB and therefore ::Mocked cannot exist.)

I wouldn't want to make function calls special this way. I'd prefer making &! validity-uninhabited. (It obviously is safety-uninhabited so the validity invariant is the only open question.)

@Nadrieril
Copy link
Member

Nadrieril commented Feb 13, 2024

Yeah what I fixed is the exhaustiveness code; that example triggers a warning because of divergence tracking instead (i.e. it triggers the unreachable_code lint, and I only fixed the unreachable_pattern lint)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants