Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfaults/corruption when reading an enum in release mode #77359

Closed
cormacrelf opened this issue Sep 30, 2020 · 23 comments · Fixed by #107256
Closed

Segfaults/corruption when reading an enum in release mode #77359

cormacrelf opened this issue Sep 30, 2020 · 23 comments · Fixed by #107256
Labels
A-mir-opt Area: MIR optimizations C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@cormacrelf
Copy link
Contributor

cormacrelf commented Sep 30, 2020

Reproduction here: https://github.com/cormacrelf/minimal-sigsegv-rust

There's more info in the readme on that repo. Essentially, I have some code that segfaults while debug-printing an enum. The enum has one Output(String) variant, followed by 11 empty variants. The repro only creates one of the empty ones, but it appears the debug routine attempts to read the string variant. It occurs in the context of some mutually recursive functions building a fairly simple datastructure by walking another. I have tried hard to minimise the repro but it could be better, but it's a bit fragile so I don't know what else I can remove. With a quick glance at the commit that bisect-rustc found, it looks like it could be related to optimising match arms, so that could be a start.

Edit: for some linkage, the commit below merges #76308, which 're-enables SimplifyArmIdentity'.

cargo bisect-rustc

searched nightlies: from nightly-2020-05-08 to nightly-2020-09-23
regressed nightly: nightly-2020-09-09
searched commits: from 0e2c128 to 5099914
regressed commit: 5a6b426

bisected with cargo-bisect-rustc v0.5.2

Host triple: x86_64-apple-darwin
Reproduce with:

cargo bisect-rustc --start 2020-05-08 --end 2020-09-23 --with-cargo --access github -- run --release

Note that the ref_sequence function is the one in which the debug segfaults, and it is debugging a larger structure. I have also seen it segfault when cloning just the EdgeData enum, which is the String+11x one mentioned above.

Backtrace from rust-lldb

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x0000000100002588 minimal`_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$::fmt::h797ace08a5294f9f + 24
    frame #1: 0x000000010002e163 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::fmt::builders::DebugTuple::field::_$u7b$$u7b$closure$u7d$$u7d$::hb6c824bec58214a9 at builders.rs:347:17 [opt]
    frame #2: 0x000000010002e115 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::result::Result$LT$T$C$E$GT$::and_then::h2b3c0ffee0d30aef at result.rs:708 [opt]
    frame #3: 0x000000010002e109 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 at builders.rs:334 [opt]
    frame #4: 0x000000010000459f minimal`_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$::fmt::h2b06d94161eb79e0 + 95
    frame #5: 0x000000010002e163 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::fmt::builders::DebugTuple::field::_$u7b$$u7b$closure$u7d$$u7d$::hb6c824bec58214a9 at builders.rs:347:17 [opt]
    frame #6: 0x000000010002e115 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::result::Result$LT$T$C$E$GT$::and_then::h2b3c0ffee0d30aef at result.rs:708 [opt]
    frame #7: 0x000000010002e109 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 at builders.rs:334 [opt]
    frame #8: 0x00000001000037e8 minimal`_$LT$minimal..ref_ir..RefIR$u20$as$u20$core..fmt..Debug$GT$::fmt::hfde74232d0e0d5d0 + 104
    frame #9: 0x000000010002e910 minimal`core::fmt::write::h0ce880d33cd2a300 at mod.rs:1080:17 [opt]
    frame #10: 0x0000000100015034 minimal`_$LT$$RF$std..io..stdio..Stderr$u20$as$u20$std..io..Write$GT$::write_fmt::h2508a17060d785a3 [inlined] std::io::Write::write_fmt::hdeaf6be33655ecfd at mod.rs:1517:15 [opt]
    frame #11: 0x0000000100014fe9 minimal`_$LT$$RF$std..io..stdio..Stderr$u20$as$u20$std..io..Write$GT$::write_fmt::h2508a17060d785a3 at stdio.rs:838 [opt]
    frame #12: 0x0000000100015596 minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] _$LT$std..io..stdio..Stderr$u20$as$u20$std..io..Write$GT$::write_fmt::h4bdcbad34c0de397 at stdio.rs:812:9 [opt]
    frame #13: 0x0000000100015539 minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] std::io::stdio::print_to::_$u7b$$u7b$closure$u7d$$u7d$::h805d34d32e3d72cc at stdio.rs:947 [opt]
    frame #14: 0x0000000100015404 minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] std::thread::local::LocalKey$LT$T$GT$::try_with::h81d8f4dd4c7ef03e at local.rs:271 [opt]
    frame #15: 0x00000001000153dc minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] std::io::stdio::print_to::h3e10e6f958040f4c at stdio.rs:936 [opt]
    frame #16: 0x00000001000153dc minimal`std::io::stdio::_eprint::h8f478498c62451a2 at stdio.rs:975 [opt]
    frame #17: 0x0000000100002c3b minimal`minimal::disamb::element_ref_ir_impl::h81621c2e3d855e9c + 1035
    frame #18: 0x000000010000423f minimal`minimal::main::h79d5d72421065e13 + 1375
    frame #19: 0x00000001000022fa minimal`std::sys_common::backtrace::__rust_begin_short_backtrace::h1ea652726f9417f0 + 10
    frame #20: 0x000000010000235c minimal`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h254a4aa9b725da66 (.llvm.9306634996709519214) + 12
    frame #21: 0x000000010001a290 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h31ef09f9bee131c1 at function.rs:259:13 [opt]
    frame #22: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] std::panicking::try::do_call::h693e4d9f7366f6d8 at panicking.rs:381 [opt]
    frame #23: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] std::panicking::try::h58e51096fb939dc8 at panicking.rs:345 [opt]
    frame #24: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] std::panic::catch_unwind::h4406f92fae32aea8 at panic.rs:382 [opt]
    frame #25: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 at rt.rs:51 [opt]
    frame #26: 0x0000000100004539 minimal`main + 41
    frame #27: 0x00007fff7f3603d5 libdyld.dylib`start + 1
    frame #28: 0x00007fff7f3603d5 libdyld.dylib`start + 1

@bjorn3
Copy link
Member

bjorn3 commented Sep 30, 2020

Can't reproduce with rustc 1.48.0-nightly (381b445ff 2020-09-29).

@steffahn
Copy link
Member

steffahn commented Sep 30, 2020

I don’t have MacOS, so I can’t test this myself easily. Here’s some tags though:
@rustbot modify labels: O-macos, T-compiler, E-needs-mcve. And someone might wanna add I-unsound.

Edit: To clarify, I’m not seeing any segfault on Windows or Linux (both 7f7a1cb 2020-09-27). Oddly enough, I’m getting Edge(Some(NotUsed)) on Windows, and Edge(Some(Locator)) on Linux. And I get Edge(Some(LocatorLabel)) without --release on both (also with --release on stable or beta or in miri, for that matter). Is this program supposed to be deterministic?

@cormacrelf Perhaps you could test your less minimized version on Linux or Windows if you have access to those (since you said “the segfault started to change a bit”, so maybe the original version also segfaults on e.g. Linux).

@rustbot rustbot added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example O-macos Operating system: macOS T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 30, 2020
@cormacrelf
Copy link
Contributor Author

Yes, the program is deterministic. I have seen similar wrong variants as well in the original code on macOS, so I'll go out in a limb and say that's the same bug on Linux. I can run a docker image if you like but I think you've already found it. The fragility I was referring to was in terms of "add a debug statement or delete some line, the issue goes away or pops up on a different input instead". But once you have it it produces the same segfault again and again. Given the complexity of the repro I'm not surprised that platform differences including calling conventions etc would affect reproducibility.

I did run in Miri and it didn't find anything amiss, and produced correct output, ie the full Edge(Some(LocatorLabel)).

@cormacrelf
Copy link
Contributor Author

@bjorn3 Yeah I should have made it more obvious about macOS. I can still reproduce with:

rustc 1.48.0-nightly (7f7a1cbfd 2020-09-27)
binary: rustc
commit-hash: 7f7a1cbfd3b55daee191247770627afab09eece2
commit-date: 2020-09-27
host: x86_64-apple-darwin
release: 1.48.0-nightly
LLVM version: 11.0

@steffahn
Copy link
Member

steffahn commented Sep 30, 2020

Okay, I guess if it is supposed to be deterministic, this means I can then confirm the problem for linux and windows. I’m currently trying to reduce the example further, keeping the different behavior between --release and debug on linux. @rustbot modify labels: -O-macos

@rustbot rustbot removed the O-macos Operating system: macOS label Sep 30, 2020
@steffahn
Copy link
Member

Just got a segfault on Linux with a reduced example.

@tmiasko
Copy link
Contributor

tmiasko commented Sep 30, 2020

SimplifyArmIdentity moves an assignment to a local before StorageLive:

fn main() {
    f(None);
}

pub enum A {
    X,
    Y,
}

pub enum B {
    B(Option<A>),
}

pub fn f(a: Option<A>) -> B {
    if let Some(x) = a {
        let y = x;
        return B::B(Some(y));
    }
    B::B(None)
}
--- main.f.005-007.SimplifyArmIdentity.before.mir
+++ main.f.005-007.SimplifyArmIdentity.after.mir
@@ -1,2 +1,2 @@
-// MIR for `f` before SimplifyArmIdentity
+// MIR for `f` after SimplifyArmIdentity
 
@@ -13,6 +13,6 @@
     scope 1 {
-        debug x => _4;                   // in scope 1 at src/main.rs:15:17: 15:18
+        debug x => ((_7 as Some).0: A);  // in scope 1 at src/main.rs:15:17: 15:18
         let _6: A;                       // in scope 1 at src/main.rs:16:13: 16:14
         scope 2 {
-            debug y => _6;               // in scope 2 at src/main.rs:16:13: 16:14
+            debug y => ((_7 as Some).0: A); // in scope 2 at src/main.rs:16:13: 16:14
         }
@@ -47,9 +47,4 @@
         StorageLive(_6);                 // scope 1 at src/main.rs:16:13: 16:14
-        _6 = move _4;                    // scope 1 at src/main.rs:16:17: 16:18
+        _7 = move _1;                    // scope 2 at src/main.rs:17:21: 17:28
         StorageLive(_7);                 // scope 2 at src/main.rs:17:21: 17:28
-        StorageLive(_8);                 // scope 2 at src/main.rs:17:26: 17:27
-        _8 = move _6;                    // scope 2 at src/main.rs:17:26: 17:27
-        ((_7 as Some).0: A) = move _8;   // scope 2 at src/main.rs:17:21: 17:28
-        discriminant(_7) = 1;            // scope 2 at src/main.rs:17:21: 17:28
-        StorageDead(_8);                 // scope 2 at src/main.rs:17:27: 17:28
         ((_0 as B).0: std::option::Option<A>) = move _7; // scope 2 at src/main.rs:17:16: 17:29

cc @wesleywiser

@wesleywiser wesleywiser added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels Sep 30, 2020
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Sep 30, 2020
@steffahn
Copy link
Member

The label regression-from-stable-to-nightly might also be appropriate.

@wesleywiser wesleywiser added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Sep 30, 2020
@wesleywiser
Copy link
Member

A slight modification to @tmiasko's repro and it will segfault in release mode:

fn main() {
    println!("{:?}", f(Some(A::Y)));
}

#[derive(Eq, Debug, PartialEq)]
pub enum A {
    X(String),
    Y,
}

#[derive(Eq, Debug, PartialEq)]
pub enum B {
    B(Option<A>),
}

pub fn f(a: Option<A>) -> B {
    if let Some(x) = a {
        let y = x;
        return B::B(Some(y));
    }
    B::B(None)
}

I can open a PR disabling this optimization so this doesn't hit beta next week.

@LeSeulArtichaut LeSeulArtichaut added the A-mir-opt Area: MIR optimizations label Sep 30, 2020
@wesleywiser wesleywiser self-assigned this Sep 30, 2020
@wesleywiser wesleywiser added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Sep 30, 2020
@Dylan-DPC-zz
Copy link

Dylan-DPC-zz commented Sep 30, 2020

Marking this as P-critical based on the wg-prioritization discussion

@jonas-schievink
Copy link
Contributor

We can probably turn bugs like this into ICEs by extending the MIR validator (unless another optimization interferes before this hits codegen)

@steffahn
Copy link
Member

steffahn commented Sep 30, 2020

Because I didn’t feel like aborting my own minimization attempt, here’s the (heavily) reduced version of OPs example that I got:

#[derive(Debug)]
enum MyEnum {
    Variant1(Vec<u8>),
    Variant2,
    Variant3,
    Variant4,
}

fn f(arg1: &bool, arg2: &bool, arg3: bool) -> MyStruct {
    if *arg1 {
        println!("{:?}", f(&arg2, arg2, arg3));
        MyStruct(None)
    } else {
        match if arg3 { Some(MyEnum::Variant3) } else { None } {
            Some(t) => {
                let ah = t;
                return MyStruct(Some(ah));
            }
            _ => MyStruct(None)
        }
    }
}

#[derive(Debug)]
struct MyStruct(Option<MyEnum>);

fn main() {
    let arg1 = true;
    let arg2 = false;
    f(&arg1, &arg2, true);
}

Causing a segfault on linux with nightly.

@steffahn
Copy link
Member

@wesleywiser Your modified example does not consistently segfault. It only does so for nightlies newer than 2020-09-27 or so. The difference between Some and None being printed does however start from the same commit that OPs bisection identified. (Actually it does not “segfault” at all but gives me an “invalid pointer” error.)

@wesleywiser
Copy link
Member

wesleywiser commented Sep 30, 2020

@steffahn

Your modified example does not consistently segfault. It only does so for nightlies newer than 2020-09-27 or so

That sounds about right to me since #76844 should have been in nightly 2020-09-26. Prior to that PR, the mis-optimization was disabled in #76837 which should have been in nightly-2020-09-19 or so. Between ~2020-09-19 and ~2020-09-26 the issue should not appear.

Does your repro work on the playground? When I try it with the nightly available, it works correctly.

@cormacrelf cormacrelf changed the title Segfault when debugging an enum Segfaults/corruption when reading an enum in release mode Sep 30, 2020
wesleywiser added a commit to wesleywiser/rust that referenced this issue Oct 2, 2020
The optimization still has some bugs that need to be worked out
such as rust-lang#77359.

We can try re-enabling this again after the known issues are resolved.
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 2, 2020
…ty, r=oli-obk

Disable the SimplifyArmIdentity mir-opt

The optimization still has some bugs that need to be worked out
such as rust-lang#77359.

We can try re-enabling this again after the known issues are resolved.

r? `@oli-obk`
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 2, 2020
…ness, r=wesleywiser

-Zvalidate-mir: Assert that storage is allocated on local use

This extends the MIR validator to check that locals are only used when their backing storage is currently allocated via `StorageLive`.

The result of this is that miscompilations such as rust-lang#77359 are caught and turned into ICEs.

The PR currently fails tests because miscompilations such as rust-lang#77359 are caught and turned into ICEs.

I have confirmed that tests pass (even with `-Zvalidate-mir`) once `SimplifyArmIdentity` is turned into a no-op (except mir-opt tests, of course).
@camelid

This comment has been minimized.

@wesleywiser

This comment has been minimized.

@wesleywiser
Copy link
Member

wesleywiser commented Oct 3, 2020

Both repros work correctly on the latest nightly (2020-10-02), so I'm adjusting the tags accordingly. Thanks @cormacrelf for reporting this issue!

@wesleywiser wesleywiser added requires-nightly This issue requires a nightly compiler in some way. and removed P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Oct 3, 2020
@camelid

This comment has been minimized.

@wesleywiser

This comment has been minimized.

@wesleywiser wesleywiser removed their assignment Nov 29, 2020
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 25, 2023
Delete `SimplifyArmIdentity` and `SimplifyBranchSame` mir opts

I had attempted to fix the first of these opts in rust-lang#94177 . However, despite that PR already being a full re-write, it still did not fix some of the core soundness issues. The optimizations that are attempted here are likely to be desirable, but I do not expect any of the currently written code to survive into a sound implementation. Deleting the code keeps us from having to maintain the passes in the meantime.

Closes rust-lang#77359 , closes rust-lang#72800 , closes rust-lang#78628

r? `@cjgillot`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 25, 2023
Delete `SimplifyArmIdentity` and `SimplifyBranchSame` mir opts

I had attempted to fix the first of these opts in rust-lang#94177 . However, despite that PR already being a full re-write, it still did not fix some of the core soundness issues. The optimizations that are attempted here are likely to be desirable, but I do not expect any of the currently written code to survive into a sound implementation. Deleting the code keeps us from having to maintain the passes in the meantime.

Closes rust-lang#77359 , closes rust-lang#72800 , closes rust-lang#78628

r? ``@cjgillot``
@bors bors closed this as completed in c2f46df Jan 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt Area: MIR optimizations C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
10 participants