Miri hangs when a test failed #2273

RalfJung · 2022-06-26T20:13:21Z

When a test failed (not with UB, but with a regular assertion failure), and libtest goes to print the failed tests, something goes wrong and Miri just hangs. I am not sure if this always happens, but I just got it in crossbeam and I seem to recall @saethlin also mentioning this.

To reproduce:

checkout crossbeam-rs/crossbeam@f12133c
run MIRIFLAGS="-Zmiri-disable-isolation" cargo +miri miri test -p crossbeam-utils -- wait_and_drop

Using -Zmiri-progress-report, I can tell that interpretation itself grinds to a halt -- usually it should print progress every few seconds, but it doesn't, since operations seem to start taking a lot longer. By reducing the progress interval I got this backtrace:

   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/slice/mod.rs:405:20
    |
405 |         unsafe { &*index.get_unchecked(self) }
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^ progress report: current operation being executed is here
    |
    = note: inside `core::slice::<impl [u8]>::get_unchecked::<usize>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/slice/mod.rs:405:20
    = note: inside `<core::str::lossy::Utf8LossyChunksIter as std::iter::Iterator>::next` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/str/lossy.rs:65:34
    = note: inside `std::string::String::from_utf8_lossy` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/string.rs:633:48
    = note: inside `test::formatters::pretty::PrettyFormatter::<std::io::Stdout>::write_results` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/formatters/pretty.rs:141:30
    = note: inside `test::formatters::pretty::PrettyFormatter::<std::io::Stdout>::write_failures` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/formatters/pretty.rs:164:9
    = note: inside `<test::formatters::pretty::PrettyFormatter<std::io::Stdout> as test::formatters::OutputFormatter>::write_run_finish` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/formatters/pretty.rs:248:17
    = note: inside `test::run_tests_console` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/console.rs:298:5
    = note: inside `test::test_main` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/lib.rs:113:15
    = note: inside `test::test_main_static` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/test/src/lib.rs:132:5
    = note: inside `main`

Possibly #1935 will help.

The text was updated successfully, but these errors were encountered:

saethlin · 2022-06-26T23:55:04Z

I'm trying to use this as a test case and now the test in question is passing.

Here's a useful program:

fn main() {
    leaf(100);
}

fn leaf(x: usize) {
    if x == 0 {
        panic!("ouch");
    }
    leaf(x - 1);
}

#[test]
fn oof() {
    panic!("ouch");
}

So based on this, the stack depth is irrelevant (cargo miri run prints slowly, cargo miri test hangs), which is comforting. The problem is definitely related to libtests and that code path through PrettyFormatter specifically.

I actually added the backtraces bench-cargo-miri project to target this general situation, and at the time I didn't realize there are two different problems here. More motivation to go deal with that rebase I guess :p

RalfJung · 2022-06-27T00:00:12Z

there are two different problems here

There are?

saethlin · 2022-06-27T00:09:55Z

Ah sorry, speaking from my particular perspective on SB performance 😅

From what I recall, just the Debug impl that gets invoked to print a backtrace is pathological with respect to the caching I'm trying to add in #1935.

However, that PR does finish running these tests in ~46 seconds which makes it a >10x speedup over the current situation. Which I think is due to a big speedup along the code path you pointed out. Though we'll know for sure when I get around to rebasing the PR.

Kixiron · 2022-06-27T20:55:09Z

I made a PR that changes Utf8LossyChunksIter::next() to use pointers instead of references and it seems to completely alleviate the test case from #2273 (comment), dropping the runtime down to ~5 seconds on my machine

saethlin · 2022-06-30T00:11:37Z

Here is another fun program:

fn main() {
    let x: &[u8] = &[0u8; 256];
    let start = std::time::Instant::now();
    for i in 0..x.len() {
        //let _item = unsafe { *x.get_unchecked(i) };
        let _item = x[i];
    }
    println!("{:?}", start.elapsed());

    let x: &[u8] = &[0u8; 512];
    let start = std::time::Instant::now();
    for i in 0..x.len() {
        //let _item = unsafe { *x.get_unchecked(i) };
        let _item = x[i];
    }
    println!("{:?}", start.elapsed());

    let x: &[u8] = &[0u8; 1024];
    let start = std::time::Instant::now();
    for i in 0..x.len() {
        //let _item = unsafe { *x.get_unchecked(i) };
        let _item = x[i];
    }
    println!("{:?}", start.elapsed());
}

On my machine, this prints:

22.68581ms
43.578931ms
85.368123ms

Very reasonable, ~2x increase in runtime per doubling of the array size.
But if I swap in all the get_unchecked for the square bracket indexing, I end up with:

58.717391ms
319.230435ms
2.783580145s

That's more like a ~8x increase per doubling of the array size. Ouch. If I stick in a x.len() call to model the behavior of the iterator that powers String::from_utf8_lossy, I see a further ~4x regression. So some quick estimation based on the amount of output we get from the a RUST_BACKTRACE=full test failure suggests the backtrace should finish printing in about 3 hours.

Also, String::from_utf8_lossy has the same runtime and runtime scaling as just looping over a slice and doing len() + get_unchecked(i).

So if I had to go for a minimal diff to address the backtrace printing issue, hoist the length and a pointer to the slice into a local outside of the loop, and use pointer arithmetic to read each element in the ASCII path instead of calling get_unchecked.

I would really like to do something about this on the Miri end, but I'm not sure what to do. The debugging support I slapped together to figure out what tags are being added might be a start?

RalfJung · 2022-06-30T01:12:03Z

What happens with something like this?

let elem = &x[i];
let _item = *elem;

Then we do still get a reborrow for each iteration. But we avoid the large reborrow of x, which I think is the problem.

saethlin · 2022-06-30T03:56:06Z

Same runtimes within noise as let _item = x[i]. Yes, I agree that these large reborrows are the problem.

RalfJung · 2022-06-30T14:57:51Z

Note that with &x[i], we do already end up with a unique borrow stack per location. So that is not, in itself, the problem.

I think there is, in theory, an optimization we can do here, but I am not sure how practical it is.
When we reborrow a tag that is SharedReadOnly everywhere from another one, and both tags are in the borrow stack of the exact same locations, then we can just keep using the original tag and don't need to generate+push a new one. Basically, two SharedReadOnly tags that are in the same stacks don't differentiate anything, they will always be invalidated at the exact same moment.

Add a benchmark of the hang-on-test-failure code path This is the code pattern that produces the performance problem in #2273 I figured out what I was stuck on in #2315 (comment). For a while I was just doing `let x: &[u8] = &[0u8; 4096];` but that doesn't produce the runtime inside `Stack::item_popped` that I was looking for, I think because this allocation is never deallocated. But with `Vec`, I get the profile I'm looking for.

…mulacrum Update backtrace Most notably, this pulls in rust-lang/backtrace-rs@ebc9a85, which is both a bugfix and a significant performance improvement for Miri, since fixing the bug makes `RUST_BACKTRACE=1` backtraces much smaller: rust-lang/miri#2273, rust-lang/miri-test-libstd#8 This also pulls in other commits which turn the backtrace-rs CI green. That's nice.

Update backtrace Most notably, this pulls in rust-lang/backtrace-rs@ebc9a85, which is both a bugfix and a significant performance improvement for Miri, since fixing the bug makes `RUST_BACKTRACE=1` backtraces much smaller: rust-lang/miri#2273, rust-lang/miri-test-libstd#8 This also pulls in other commits which turn the backtrace-rs CI green. That's nice.

saethlin · 2022-09-20T04:53:10Z

Between the stack cache and the tag GC, on my machine Miri will execute a simple Cargo project with a test that just panics in 2.3 seconds with RUST_BACKTRACE=1 and 6.6 seconds with RUST_BACKTRACE=full (the tag GC makes full backtraces much faster, it has basically no effect on the pruned ones). It's still not speedy, but I think we're comfortably in the range where this won't be confused with a hang. Do you agree?

RalfJung · 2022-09-20T06:55:49Z

Yeah agreed.

What does this program with square bracket indexing now print (without isolation)?

saethlin · 2022-09-20T12:45:31Z

25.074065ms
44.571429ms
87.756986ms

RalfJung · 2022-09-20T13:39:39Z

Sorry, I meant with get_unchecked. For me that shows

# primitive indexing
31.261897ms
61.30079ms
122.615057ms

# get_unchecked
84.539718ms
221.671185ms
441.900342ms

So, it's still super-linear, but it's a lot better than before. Awesome!

saethlin · 2022-09-20T13:58:23Z

Is it super-linear? I see this

35.167154ms
67.234414ms
131.999157ms

Which is almost exactly doubling of runtime when we double the size of the array.

(I would not expect the tag GC to fix all such cases of super-linear behavior, it's a bit conservative and there's also the RangeMap splitting)

RalfJung · 2022-09-20T14:05:09Z

The "time divided by array size" in my measurement is

0,33
0,43
0,43

🤷

RalfJung added C-bug Category: This is a bug. A-aliasing Area: This affects the aliasing model (Stacked/Tree Borrows) I-slow Impact: Makes Miri even slower than it already is labels Jun 26, 2022

Kixiron mentioned this issue Jun 27, 2022

Fixed worst-case miri performance with lossy string decoding rust-lang/rust#98592

Closed

saethlin mentioned this issue Jul 10, 2022

Add a benchmark of the hang-on-test-failure code path #2352

Merged

RalfJung mentioned this issue Aug 6, 2022

looks like it hangs right before printing failures rust-lang/miri-test-libstd#8

Closed

This was referenced Aug 27, 2022

Obey the return value from the trace visitor under Miri rust-lang/backtrace-rs#476

Merged

Update backtrace rust-lang/rust#101342

Merged

RalfJung closed this as completed Sep 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Miri hangs when a test failed #2273

Miri hangs when a test failed #2273

RalfJung commented Jun 26, 2022 •

edited

Loading

saethlin commented Jun 26, 2022 •

edited

Loading

RalfJung commented Jun 27, 2022

saethlin commented Jun 27, 2022

Kixiron commented Jun 27, 2022

saethlin commented Jun 30, 2022

RalfJung commented Jun 30, 2022

saethlin commented Jun 30, 2022

RalfJung commented Jun 30, 2022

saethlin commented Sep 20, 2022

RalfJung commented Sep 20, 2022

saethlin commented Sep 20, 2022

RalfJung commented Sep 20, 2022

saethlin commented Sep 20, 2022

RalfJung commented Sep 20, 2022

Miri hangs when a test failed #2273

Miri hangs when a test failed #2273

Comments

RalfJung commented Jun 26, 2022 • edited Loading

saethlin commented Jun 26, 2022 • edited Loading

RalfJung commented Jun 27, 2022

saethlin commented Jun 27, 2022

Kixiron commented Jun 27, 2022

saethlin commented Jun 30, 2022

RalfJung commented Jun 30, 2022

saethlin commented Jun 30, 2022

RalfJung commented Jun 30, 2022

saethlin commented Sep 20, 2022

RalfJung commented Sep 20, 2022

saethlin commented Sep 20, 2022

RalfJung commented Sep 20, 2022

saethlin commented Sep 20, 2022

RalfJung commented Sep 20, 2022

RalfJung commented Jun 26, 2022 •

edited

Loading

saethlin commented Jun 26, 2022 •

edited

Loading