
Prevent query cycles in the MIR inliner #68828

Merged · 6 commits · Jan 25, 2021

Conversation

@oli-obk (Contributor) commented Feb 4, 2020

r? @eddyb @wesleywiser

cc @rust-lang/wg-mir-opt

The general design is that we have a new query that is run on the validated_mir instead of on the optimized_mir. That query is forced before going into the optimization pipeline, so that we don't try to read from a stolen MIR.

The query should not be cached cross-crate, as you should never call it for items from other crates. By its very design, calls into other crates can never cause query cycles.

This is a pessimistic approach to inlining, since we strictly have more calls in the validated_mir than we have in optimized_mir, but that's not a problem imo.
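The cycle check described here can be sketched on a toy call graph (a plain `HashMap` with invented string names; the real implementation runs on rustc's query system and MIR, not on strings): before inlining, walk the callee's unoptimized callees and refuse if the caller is reachable.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Toy stand-in for the new query: each function's calls as found in its
// *unoptimized* (validated) MIR. All names are invented for illustration.
type CallGraph = HashMap<&'static str, Vec<&'static str>>;

// Inlining `callee` into `caller` could create a query cycle iff `caller`
// is reachable from `callee` in this call graph.
fn inlining_would_cycle(graph: &CallGraph, caller: &str, callee: &str) -> bool {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([callee]);
    while let Some(f) = queue.pop_front() {
        if f == caller {
            return true;
        }
        if seen.insert(f) {
            if let Some(callees) = graph.get(f) {
                queue.extend(callees.iter().copied());
            }
        }
    }
    false
}

fn main() {
    let graph: CallGraph = HashMap::from([
        ("a", vec!["b"]),
        ("b", vec!["c"]),
        ("c", vec!["a"]), // a -> b -> c -> a
        ("d", vec!["b"]),
    ]);
    // `a` must not inline `b`: doing so would close the a -> b -> c -> a loop.
    assert!(inlining_would_cycle(&graph, "a", "b"));
    // `d` is outside the cycle, so it may freely inline `b`.
    assert!(!inlining_would_cycle(&graph, "d", "b"));
    println!("ok");
}
```

Because the walk uses the validated MIR's calls, it can only over-approximate the optimized call graph, which is exactly the pessimism the PR description accepts.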

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 4, 2020
@matthewjasper (Contributor) left a comment


This is a pessimistic approach to inlining, since we strictly have more calls in the validated_mir than we have in optimized_mir

Except for box_free calls, which drop elaboration adds.

(Outdated review threads, resolved: src/librustc_mir/transform/mod.rs, src/librustc_mir/transform/inline.rs)
@eddyb (Member) commented Feb 5, 2020

I think I get it now, this is similar to computing SCCs (of the callgraph) ahead of time, but both better (incremental) and worse (not cached nor using an SCC algorithm) at the same time.

We discussed something like this ages ago, maybe @nikomatsakis can remember where.


I think your query helps both with caching and incremental red-green, but it's not necessary to get the correctness, you could "lookahead" by using optimized_mir(callee) in the inliner directly (unlike your query, there would be no stealing issue there AFAICT).


One of the problems with precomputing SCCs in an on-demand/incremental fashion is deciding on which (callgraph) node "defines" the SCCs, as it needs to be the same no matter which node in the SCC the information is queried for.

And... I think we've had an answer for a while now! We sort several things by stable hashes.

So we could roughly have this query (where mir_inliner_callees is your query):

struct Cycle {
    root: LocalDefId,
    members: FxHashSet<LocalDefId>,
}
fn mir_inliner_cycle(tcx: TyCtxt<'tcx>, root: LocalDefId) -> Option<&'tcx Cycle> {
    let root_hash = tcx.def_path_hash(root);

    let mut cycle = None;

    let mut queue = VecDeque::new();
    queue.push_back(root);

    while let Some(caller) = queue.pop_front() {
        for callee in tcx.mir_inliner_callees(caller) {
        if callee == root {
                cycle = Some(Cycle {
                    root,
                    // TODO: accumulate cycle members.
                    // Can probably use a stack instead of a queue, emulating
                    // the runtime callstack, but this example is large as is.
                    members: FxHashSet::new(),
                });
                continue;
            }

            if tcx.def_path_hash(callee) < root_hash {
                // We may share a cycle with `callee`, in which case
                // it's "closer" to the cycle's real root (lower hash).
                if let Some(callee_cycle) = tcx.mir_inliner_cycle(callee) {
                    if callee_cycle.members.contains(&root) {
                        return Some(callee_cycle);
                    }
                }
            } else {
                // Keep exploring areas of the callgraph that could
                // contain a cycle that we're the root of
                // (i.e. all other nodes have higher hash).
                queue.push_back(callee);
            }
        }
    }

    cycle
}

Alright, I got carried away sketching that, but I think we've made progress here!

EDIT: to be clear, I don't think we can use a real whole-graph SCC algorithm, everything has to be one SCC at a time (i.e. "find SCC this node is in") for it to work with incremental queries.
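The sketch above can be exercised end to end on a toy graph. This is not rustc code: integer node ids stand in for DefIds, the identity function stands in for def_path_hash, and the member accumulation the TODO mentions is done with explicit DFS paths (which may miss members reachable only through already-visited nodes; good enough for a sketch).

```rust
use std::collections::{HashMap, HashSet};

// Toy callgraph: node -> callees. All names are invented for illustration.
type CallGraph = HashMap<u32, Vec<u32>>;

// Stand-in for `def_path_hash`: any stable, traversal-order-independent key.
fn stable_hash(node: u32) -> u32 {
    node
}

// Find the cycle `root` belongs to. `root` only "owns" the cycle if it has
// the lowest hash among the members; otherwise it defers to a lower-hash
// callee, so every member agrees on which node defines the cycle.
fn inliner_cycle(graph: &CallGraph, root: u32) -> Option<HashSet<u32>> {
    let root_hash = stable_hash(root);
    let mut members: Option<HashSet<u32>> = None;
    let mut visited = HashSet::new();
    // DFS with explicit paths so cycle members can be accumulated.
    let mut stack = vec![(root, vec![root])];
    while let Some((caller, path)) = stack.pop() {
        for &callee in graph.get(&caller).into_iter().flatten() {
            if callee == root {
                // Closed a loop back to `root`: the whole path is in the cycle.
                members.get_or_insert_with(HashSet::new).extend(path.iter().copied());
            } else if stable_hash(callee) < root_hash {
                // A lower-hash node may be the cycle's real root: ask it.
                if let Some(callee_cycle) = inliner_cycle(graph, callee) {
                    if callee_cycle.contains(&root) {
                        return Some(callee_cycle);
                    }
                }
            } else if visited.insert(callee) {
                let mut longer = path.clone();
                longer.push(callee);
                stack.push((callee, longer));
            }
        }
    }
    members
}

fn main() {
    let graph: CallGraph =
        HashMap::from([(1, vec![2]), (2, vec![3]), (3, vec![1]), (4, vec![2])]);
    let cycle = HashSet::from([1, 2, 3]);
    // Every member reports the same cycle, rooted at the lowest hash (1).
    assert_eq!(inliner_cycle(&graph, 1), Some(cycle.clone()));
    assert_eq!(inliner_cycle(&graph, 2), Some(cycle));
    // 4 calls into the cycle but is not part of it.
    assert_eq!(inliner_cycle(&graph, 4), None);
    println!("ok");
}
```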

@eddyb (Member) commented Feb 5, 2020

This is a pessimistic approach to inlining, since we strictly have more calls in the validated_mir than we have in optimized_mir, but that's not a problem imo.

Is this true? Couldn't optimizations expose calls which were previously e.g. indirect?
Also, what about static dispatch being resolved by (partial) monomorphization during inlining?

(Outdated review thread, resolved: src/librustc/query/mod.rs)
@oli-obk (Contributor, author) commented Feb 5, 2020

I think your query helps both with caching and incremental red-green, but it's not necessary to get the correctness, you could "lookahead" by using optimized_mir(callee) in the inliner directly (unlike your query, there would be no stealing issue there AFAICT).

I'm not sure how that could work, since optimized_mir will trigger the inliner on the callee, which will then call optimized_mir again, causing cycles.

@oli-obk (Contributor, author) commented Feb 5, 2020

Couldn't optimizations expose calls which were previously e.g. indirect?

Yea, I guess const prop could turn (foo as fn())() into foo(). This could be solved by splitting the optimization pipeline into two queries: one "pre-inlining" and one with the inliner and everything after it.

Also, what about static dispatch being resolved by (partial) monomorphization during inlining?

I had a more complex approach before the one shown here. That approach also returned the substs of the function calls from the query so we can adjust them to the call site's substs. That's basically what inlining does right now. I scrapped it because I couldn't wrap my head around all the edge cases and wanted to publish something. I plan to deduplicate that logic with the inliner's logic so we have a single source of truth instead of two schemes that cause cycles if they diverge.

@nikomatsakis (Contributor) commented:

I believe my plan was to abandon this constraint from @eddyb

EDIT: to be clear, I don't think we can use a real whole-graph SCC algorithm, everything has to be one SCC at a time (i.e. "find SCC this node is in") for it to work with incremental queries.

Basically, we would have a single "call graph" query that walks the crate and constructs the call graph, presumably based on reading the "validated MIR". This query would be wrapped with other queries that extract the individual SCCs and return their contents -- with a bit of careful work on the keys, we could make it work so that if the contents of an SCC don't change, the query yields up green, and thus avoid the need to re-optimize things that contain inlining. (I am imagining something like: the key that identifies an SCC is a hash of its contents, perhaps? Or it is named by some def-id within which has the lowest hash. Something like that, so that it is not dependent on the order in which we traversed the call graph.)

Assumptions:

  • In an IDE, you would never do this; it's only done as part of batch compilation (i.e., when optimizing).
  • In an incremental context, most of the "validated MIR" that you are asking for is going to be cached. (Is this true? Not sure, maybe we don't cache the intermediate MIR; I guess that could be a problem -- but we could address it by introducing an intermediate query that reads the validated MIR and then produces the list of callees, and caching just that.)
  • Similarly, the content of most SCCs will not change. Therefore, although all of the "optimized MIR" will be dependent on this "compute scc graph" query, there should mostly be green queries that lie in between, allowing us to avoid the work of recomputing it.

This strikes me as a fairly simple design that would likely work reasonably well? But I'm open to other alternatives for sure. I'm not sure how far off that is from the pseudo-code that @eddyb sketched.
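A toy version of this design (invented names; node ids stand in for stable hashes): compute the whole callgraph's SCCs once, then key each SCC by its lowest-hash member so the key does not depend on the order in which the callgraph was walked.

```rust
use std::collections::{HashMap, HashSet};

// Toy crate-wide callgraph: function -> callees. Names are invented.
type Graph = HashMap<u32, Vec<u32>>;

// Tarjan's SCC algorithm, recursive for brevity (fine for a toy graph).
struct Tarjan<'g> {
    graph: &'g Graph,
    index: HashMap<u32, usize>,
    lowlink: HashMap<u32, usize>,
    stack: Vec<u32>,
    on_stack: HashSet<u32>,
    next: usize,
    sccs: Vec<Vec<u32>>,
}

impl<'g> Tarjan<'g> {
    fn visit(&mut self, v: u32) {
        self.index.insert(v, self.next);
        self.lowlink.insert(v, self.next);
        self.next += 1;
        self.stack.push(v);
        self.on_stack.insert(v);
        let graph = self.graph;
        for &w in graph.get(&v).into_iter().flatten() {
            if !self.index.contains_key(&w) {
                self.visit(w);
                let low_w = self.lowlink[&w];
                let low_v = self.lowlink.get_mut(&v).unwrap();
                *low_v = (*low_v).min(low_w);
            } else if self.on_stack.contains(&w) {
                let idx_w = self.index[&w];
                let low_v = self.lowlink.get_mut(&v).unwrap();
                *low_v = (*low_v).min(idx_w);
            }
        }
        if self.lowlink[&v] == self.index[&v] {
            // `v` is the traversal root of an SCC: pop its members.
            let mut scc = Vec::new();
            loop {
                let w = self.stack.pop().unwrap();
                self.on_stack.remove(&w);
                scc.push(w);
                if w == v {
                    break;
                }
            }
            self.sccs.push(scc);
        }
    }
}

// The "crate-wide" query: all SCCs, each named by the member with the
// lowest stable hash (node id stands in for the hash here), so the key
// is independent of traversal order.
fn callgraph_sccs(graph: &Graph) -> HashMap<u32, Vec<u32>> {
    let mut t = Tarjan {
        graph,
        index: HashMap::new(),
        lowlink: HashMap::new(),
        stack: Vec::new(),
        on_stack: HashSet::new(),
        next: 0,
        sccs: Vec::new(),
    };
    let mut nodes: Vec<u32> = graph.keys().copied().collect();
    nodes.sort();
    for n in nodes {
        if !t.index.contains_key(&n) {
            t.visit(n);
        }
    }
    t.sccs
        .into_iter()
        .map(|mut scc| {
            scc.sort();
            (scc[0], scc)
        })
        .collect()
}

fn main() {
    let graph: Graph = HashMap::from([
        (1, vec![2]),
        (2, vec![3]),
        (3, vec![1]), // 1, 2, 3 form one SCC
        (4, vec![2]),
        (5, vec![]),
    ]);
    let sccs = callgraph_sccs(&graph);
    assert_eq!(sccs[&1], vec![1, 2, 3]);
    assert_eq!(sccs.len(), 3); // {1,2,3}, {4}, {5}
    println!("ok");
}
```

The per-SCC wrapper queries sketched in the comment would then be keyed by those hash-derived names, so an unchanged SCC yields green even though the monolithic callgraph query re-ran.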

@eddyb (Member) commented Feb 5, 2020

@nikomatsakis I thought we decided long ago that it's a non-starter, otherwise... we would've done it already, wouldn't we? We even had the SCC computation code and everything, I think we just removed it.

The problem with a monolith query is that it depends on every function in the crate.
So any time the MIR changes (or more realistically, a call is added/removed), anywhere in the crate, the whole query would have to re-run.

Sure, individual optimized_mir executions would be firewalled by the intermediary per-SCC query, but a single monolithic query still has to run... do we expect it to be cheap?

My alternative would be more expensive the first time, probably, but it should explore smaller parts of the callgraph later.

@oli-obk Can you make a crate-wide query that just queries your local_callees query for all DefIds with MIR, and make the inliner call it, even when it's not enabled by the mir-opt-level? Just to see if the global SCC computation is feasible in terms of dependencies, even with no actual data being produced.

@nikomatsakis (Contributor) commented:

I thought we decided long ago that it's a non-starter, otherwise... we would've done it already, wouldn't we? We even had the SCC computation code and everything, I think we just removed it.

I don't recall any such decision, but I could believe that there was a consensus reached, either without me, or that I've forgotten about.

So any time the MIR changes (or more realistically, a call is added/removed), anywhere in the crate, the whole query would have to re-run.

Yes. I am not convinced this would be a problem in practice, but it is definitely a downside. My hope was that this specific query would be "small enough" in scope not to be a big deal.

My alternative would be more expensive the first time, probably, but it should explore smaller parts of the callgraph later.

I guess I need to look more closely. It seems to me that one challenge here is that, if the query is "figure out which SCC this def-id is a part of", then we'd probably want to extend the query system, because otherwise you'll have to recompute that computation for each member of the SCC, right? (i.e., you could start from many different entry points)

@bjorn3 (Member) commented Feb 5, 2020

Would cycle error recovery be sound if we discarded everything up to the oldest try_query call on a cycle, even when there is a newer try_query call? That way the query will be recomputed for every possible cycle entry point.

E.g. optimized_mir(a) -> try optimized_mir(b) -> try optimized_mir(a) would return the cycle error from try optimized_mir(b), not try optimized_mir(a), and the result of optimized_mir(b) would not be stored, thus not causing any difference depending on the cycle root.

The returning back to the oldest try_query could be implemented using:

  • unwinding
  • returning a cycle error from the newest try_query and not caching results for any queries that return until the oldest try_query returns
  • same as the previous, but returning a special kind of cycle error to request short-circuiting of the query evaluation by e.g. returning a dummy value
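One way to read this proposal as code (a toy memoizing evaluator, not the real rustc query engine; names invented): a cycle returns an error that propagates back to the oldest in-progress query, and nothing computed inside the cycle is cached, so every entry point sees the same error regardless of evaluation order.

```rust
use std::collections::HashMap;

// Toy query engine: each query's value is 1 plus the sum of its deps.
// `defs` stands in for the crate's functions and their callees.
struct Engine<'d> {
    defs: &'d HashMap<&'static str, Vec<&'static str>>,
    cache: HashMap<&'static str, usize>,
    active: Vec<&'static str>, // the in-progress query stack
}

impl<'d> Engine<'d> {
    fn try_query(&mut self, key: &'static str) -> Result<usize, ()> {
        if let Some(&v) = self.cache.get(key) {
            return Ok(v);
        }
        if self.active.contains(&key) {
            return Err(()); // cycle detected
        }
        self.active.push(key);
        let deps = self.defs.get(key).cloned().unwrap_or_default();
        let mut result = Ok(1);
        for dep in deps {
            match self.try_query(dep) {
                Ok(v) => {
                    if let Ok(r) = &mut result {
                        *r += v;
                    }
                }
                // Propagate the cycle error up; everything on the way back
                // to the oldest active query is discarded, not cached.
                Err(()) => {
                    result = Err(());
                    break;
                }
            }
        }
        self.active.pop();
        if let Ok(v) = result {
            self.cache.insert(key, v);
        }
        result
    }
}

fn main() {
    let defs = HashMap::from([("a", vec!["b"]), ("b", vec!["a"]), ("c", vec![])]);
    let mut engine = Engine { defs: &defs, cache: HashMap::new(), active: Vec::new() };
    // Both entry points into the a <-> b cycle see the same error,
    // because neither run cached a partial result.
    assert_eq!(engine.try_query("a"), Err(()));
    assert_eq!(engine.try_query("b"), Err(()));
    assert_eq!(engine.try_query("c"), Ok(1));
    println!("ok");
}
```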

@nikomatsakis (Contributor) commented:

I do think we could extend the query system to permit cycles, potentially, and avoid caching for intermediate nodes, but it raises some interesting questions. There are various models for this. It also interacts with parallel rustc. I guess I just need to read this PR and @eddyb's comment in more detail so I actually understand what's being proposed here and stop talking about hypotheticals.

Review comment on this code:

    // and is already optimized.
    self.tcx.optimized_mir(callsite.callee)
};
let mut seen: FxHashSet<_> = Default::default();

A reviewer (Contributor):

I think this cycle check should be extracted to a function.

@eddyb (Member) commented Feb 6, 2020

because otherwise you'll have to recompute that computation for each member of the SCC, right? (i.e., you could start from many different entry points)

It bails out as soon as it finds another cycle member with a lower hash (i.e. "closer to the root"), but you're right that it might be too expensive (I think we should benchmark both your and my version).

Would cycle error recovery be sound by discarding everything up to the oldest try_query call on a cycle, even when there is a newer try_query call. That way the query will be recomputed for every possible cycle entry point.

Heh, that's what cyclotron's bruteforce implementation does. It's good enough for parsing with arbitrary context-free grammars (it handles left recursion, you just add LL and backtracking on top).

I don't think we can do that for optimized_mir (we don't really want to optimize more than once, although that changes if it's more fixpoint-like, but that requires properties which we can't guarantee of optimizations), but it might be interesting to rely on something like that for computing SCCs.

@nikomatsakis (Contributor) commented Feb 7, 2020

If we opt to have cycles discard caching for the contents of the cycle, it is reasonable, but it does mean that foo.query(bar) may have "inconsistent" results -- i.e., it can return distinct things depending on whether invoked from within a cycle or not. It's certainly not the right behavior for every scenario, but I think that for many scenarios it is what you want (in particular, for cases where you're attempting to do a search of a graph of possibilities).

EDIT: Just to clarify -- it's exactly the possibility of 'inconsistent' results that causes us problems using it for mir-optimize, since it means that the final results depend on query order, which we do not want.

EDIT: The prolog "tabling" solution would be to remember the result you got at the end and then re-run the query, but this time you use that result for any recursive results, and you require that it reaches a fixed point.

I had hoped we could get away with the approach that I was advocating for earlier in these sorts of scenarios, but I am open to the idea that it won't work. I feel though that we should at least try it before we reach that conclusion. As @eddyb said, building a few options and benchmarking would be ideal.
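The tabling idea can be sketched with a minimal invented example (not Prolog or rustc): a recursive reachability query over a cyclic graph, where a recursive hit reads the table's current answer and the whole computation is re-run until it reaches a fixed point. This relies on the query being monotonic, exactly the property the comments above call out.

```rust
use std::collections::{BTreeSet, HashMap};

type Graph = HashMap<u32, Vec<u32>>;

// Tabled evaluation of the recursive query
//     reach(n) = {n} plus reach(m) for each edge n -> m
// which would recurse forever on a cyclic graph if evaluated naively.
fn reach(graph: &Graph, root: u32) -> BTreeSet<u32> {
    let mut table: HashMap<u32, BTreeSet<u32>> = HashMap::new();
    loop {
        let mut changed = false;
        let mut active = Vec::new();
        eval(graph, root, &mut table, &mut active, &mut changed);
        if !changed {
            // Fixed point: re-running produced no new facts.
            return table.remove(&root).unwrap();
        }
    }
}

fn eval(
    graph: &Graph,
    n: u32,
    table: &mut HashMap<u32, BTreeSet<u32>>,
    active: &mut Vec<u32>,
    changed: &mut bool,
) -> BTreeSet<u32> {
    if active.contains(&n) {
        // Recursive hit: use the tabled answer (empty on the first pass).
        return table.get(&n).cloned().unwrap_or_default();
    }
    active.push(n);
    let mut result = BTreeSet::from([n]);
    for &m in graph.get(&n).into_iter().flatten() {
        result.extend(eval(graph, m, table, active, changed));
    }
    active.pop();
    if table.get(&n) != Some(&result) {
        *changed = true;
        table.insert(n, result.clone());
    }
    result
}

fn main() {
    let graph: Graph = HashMap::from([(1, vec![2]), (2, vec![3]), (3, vec![1])]);
    // Despite the 1 -> 2 -> 3 -> 1 cycle, tabling converges.
    assert_eq!(reach(&graph, 1), BTreeSet::from([1, 2, 3]));
    println!("ok");
}
```

As noted above, this works for reachability because each pass only adds facts; MIR optimization has no such monotonicity guarantee, which is why tabling doesn't directly apply to optimized_mir.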

@bjorn3 (Member) commented Feb 7, 2020

EDIT: Just to clarify -- it's exactly the possibility of 'inconsistent' results that causes us problems using it for mir-optimize, since it means that the final results depend on query order, which we do not want.

What I meant was that at a cycle error everything after the first try_query is discarded, so for every possible entry point of the cycle a cycle error will be reported on the next query of the cycle.

optimized_mir(a) <-> optimized_mir(b) will return a cycle error for the call of try optimized_mir(b) when calling optimized_mir(a) and vice versa. This way the query order doesn't matter.

@eddyb (Member) commented Feb 7, 2020

EDIT: The prolog "tabling" solution would be to remember the result you got at the end and then re-run the query, but this time you use that result for any recursive results, and you require that it reaches a fixed point.

Haha, I've been meaning to look into tabling for a while now.
That's exactly what I implemented in cyclotron and I got the idea specifically thinking about left-recursion during parsing (because CFG parsing is monotonic, so you make progress up to a fixpoint).

@bors (Contributor) commented Mar 1, 2020

☔ The latest upstream changes (presumably #69592) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 11, 2020
@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 23, 2020
@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 2, 2020
@camelid (Member) commented Jan 23, 2021

failures:

---- [mir-opt] mir-opt/inline/cycle.rs stdout ----
13           StorageLive(_3);                 // scope 0 at $DIR/cycle.rs:4:5: 4:6
14           _3 = &_1;                        // scope 0 at $DIR/cycle.rs:4:5: 4:6
15           StorageLive(_4);                 // scope 0 at $DIR/cycle.rs:4:5: 4:8
-           _2 = <impl Fn() as Fn<()>>::call(move _3, move _4) -> [return: bb1, unwind: bb3]; // scope 0 at $DIR/cycle.rs:4:5: 4:8
+           _2 = <impl Fn() as Fn<()>>::call(move _3, move _4) -> bb1; // scope 0 at $DIR/cycle.rs:4:5: 4:8
17                                            // mir::Constant
18                                            // + span: $DIR/cycle.rs:4:5: 4:6
19                                            // + literal: Const { ty: for<'r> extern "rust-call" fn(&'r impl Fn(), ()) -> <impl Fn() as std::ops::FnOnce<()>>::Output {<impl Fn() as std::ops::Fn<()>>::call}, val: Value(Scalar(<ZST>)) }

24           StorageDead(_3);                 // scope 0 at $DIR/cycle.rs:4:7: 4:8
25           StorageDead(_2);                 // scope 0 at $DIR/cycle.rs:4:8: 4:9
26           _0 = const ();                   // scope 0 at $DIR/cycle.rs:3:20: 5:2
-           drop(_1) -> [return: bb2, unwind: bb4]; // scope 0 at $DIR/cycle.rs:5:1: 5:2
+           drop(_1) -> bb2;                 // scope 0 at $DIR/cycle.rs:5:1: 5:2
29   
30       bb2: {


31           return;                          // scope 0 at $DIR/cycle.rs:5:2: 5:2
-   
-   
-       bb3 (cleanup): {
-           drop(_1) -> bb4;                 // scope 0 at $DIR/cycle.rs:5:1: 5:2
-   
-   
-       bb4 (cleanup): {
-           resume;                          // scope 0 at $DIR/cycle.rs:3:1: 5:2
41   }
42   
thread '[mir-opt] mir-opt/inline/cycle.rs' panicked at 'Actual MIR output differs from expected MIR output /checkout/src/test/mir-opt/inline/cycle.f.Inline.diff', src/tools/compiletest/src/runtest.rs:3452:25

@oli-obk (Contributor, author) commented Jan 25, 2021

@bors r=wesleywiser

wasm generates different MIR

@bors (Contributor) commented Jan 25, 2021

📌 Commit d38553c has been approved by wesleywiser

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2021
@bors (Contributor) commented Jan 25, 2021

⌛ Testing commit d38553c with merge 12c5b6d144fd893cecdab3539aa2723d0c860909...

@rust-log-analyzer (Collaborator) commented:

The job dist-x86_64-linux failed! Check out the build log: (web) (plain)

Possible cause of the failure (guessed by this bot):
    Checking crates-io v0.31.1 (/tmp/cargo/crates/crates-io)
error[E0283]: type annotations needed
   --> src/cargo/util/config/de.rs:530:63
    |
530 |                 seed.deserialize(Tuple2Deserializer(1i32, env.as_ref()))
    |                                                           |   |
    |                                                           |   |
    |                                                           |   cannot infer type for type parameter `T` declared on the trait `AsRef`
    |                                                           this method call resolves to `&T`
    |
    = note: cannot satisfy `std::string::String: AsRef<_>`
help: use the fully qualified path for the potential candidates
    |
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<OsStr>>::as_ref(env)))
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<std::path::Path>>::as_ref(env)))
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<str>>::as_ref(env)))
    |                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
530 |                 seed.deserialize(Tuple2Deserializer(1i32, <std::string::String as AsRef<[u8]>>::as_ref(env)))

error: aborting due to previous error

For more information about this error, try `rustc --explain E0283`.

@bors (Contributor) commented Jan 25, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 25, 2021
@oli-obk (Contributor, author) commented Jan 25, 2021

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 25, 2021
@bors (Contributor) commented Jan 25, 2021

⌛ Testing commit d38553c with merge f4eb5d9...

@bors (Contributor) commented Jan 25, 2021

☀️ Test successful - checks-actions
Approved by: wesleywiser
Pushing f4eb5d9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 25, 2021
@bors bors merged commit f4eb5d9 into rust-lang:master Jan 25, 2021
@rustbot rustbot added this to the 1.51.0 milestone Jan 25, 2021
@Mark-Simulacrum (Member) commented:

It looks like this was a moderate regression in instruction counts on several benchmarks (https://perf.rust-lang.org/compare.html?start=7fba12bb1d3877870758a7a53e2fe766bb19bd60&end=f4eb5d9f719cd3c849befc8914ad8ce0ddcf34ed&stat=instructions:u), though there was also some improvement. @oli-obk - was this expected? I guess there were some earlier perf runs that were much worse, so we've improved since then at least...

@oli-obk oli-obk deleted the inline_cycle branch January 27, 2021 11:02
@oli-obk (Contributor, author) commented Jan 27, 2021

The previous runs here were with inlining activated. I should have re-run without inlining activated, but I forgot.

The results are really odd. When looking at the detailed perf (e.g. for keccak), it seems like typeck has gotten slower, which makes absolutely no sense considering this PR doesn't actually change anything without mir-opt-level=2... I'll start looking into this.

Labels
A-mir-opt-inlining Area: MIR inlining merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.