Use reverse postorder in non_ssa_locals
#96601
The reverse postorder, unlike preorder, is now cached inside the MIR body. Code generation uses reverse postorder anyway, so it might be a small perf improvement to use it here as well.
r? @davidtwco (rust-highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit fa41852 with merge 204cd52d1e796574f64ace5276cb3a794e73585f...
☀️ Try build successful - checks-actions
Queued 204cd52d1e796574f64ace5276cb3a794e73585f with parent f75d884, future comparison URL.
Finished benchmarking commit (204cd52d1e796574f64ace5276cb3a794e73585f): comparison url. Summary:
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never
Does iterating in this order retain the benefits of pre-order as described in #85741?
Yes. If x dominates y, then in any depth-first walk of the control flow graph, x must come before y in pre-order, and x must come before y in reverse post-order.
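A minimal sketch of that claim on a toy CFG, not rustc's traversal code. The adjacency-list representation (block 0 as entry) and the helper names `preorder`, `reverse_postorder`, `dominates` and `respects_dominance` are assumptions made up for illustration; dominance is checked naively by removing `x` and testing whether `y` is still reachable.

```rust
/// DFS preorder over an adjacency-list CFG; block 0 is the entry.
fn preorder(succs: &[Vec<usize>]) -> Vec<usize> {
    let mut seen = vec![false; succs.len()];
    let mut order = Vec::new();
    let mut stack = vec![0];
    while let Some(bb) = stack.pop() {
        if std::mem::replace(&mut seen[bb], true) {
            continue; // already visited
        }
        order.push(bb);
        // Push in reverse so the first successor is popped first.
        stack.extend(succs[bb].iter().rev().copied());
    }
    order
}

fn postorder_from(bb: usize, succs: &[Vec<usize>], seen: &mut [bool], out: &mut Vec<usize>) {
    if std::mem::replace(&mut seen[bb], true) {
        return;
    }
    for &s in &succs[bb] {
        postorder_from(s, succs, seen, out);
    }
    out.push(bb); // emitted after all successors
}

fn reverse_postorder(succs: &[Vec<usize>]) -> Vec<usize> {
    let mut seen = vec![false; succs.len()];
    let mut out = Vec::new();
    postorder_from(0, succs, &mut seen, &mut out);
    out.reverse();
    out
}

/// Naive dominance test: `x` dominates `y` if `y` becomes unreachable
/// from the entry once `x` is removed. Assumes every block is reachable.
fn dominates(x: usize, y: usize, succs: &[Vec<usize>]) -> bool {
    if x == y || x == 0 {
        return true;
    }
    let mut seen = vec![false; succs.len()];
    seen[x] = true; // pretend `x` is removed from the graph
    let mut stack = vec![0];
    while let Some(bb) = stack.pop() {
        if std::mem::replace(&mut seen[bb], true) {
            continue;
        }
        stack.extend(succs[bb].iter().copied());
    }
    !seen[y]
}

/// True if every dominator appears no later than the blocks it dominates.
fn respects_dominance(order: &[usize], succs: &[Vec<usize>]) -> bool {
    let n = succs.len();
    let mut pos = vec![0; n];
    for (i, &bb) in order.iter().enumerate() {
        pos[bb] = i;
    }
    (0..n).all(|x| (0..n).all(|y| !dominates(x, y, succs) || pos[x] <= pos[y]))
}
```

On a graph with a diamond and a back edge, such as `[[1], [2, 3], [4], [4], [1, 5], []]`, both orders pass `respects_dominance`, while plain (non-reversed) postorder does not.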
@bors r+
📌 Commit fa41852 has been approved by
⌛ Testing commit fa41852 with merge 9aeba99dc26b95ab9a9d737df6352a197e436845...
💔 Test failed - checks-actions
@bors retry spurious network error
⌛ Testing commit fa41852 with merge fe21c30fb9b85db46cd807a0a345bb06bee90882...
💔 Test failed - checks-actions
☀️ Test successful - checks-actions
Finished benchmarking commit (e1df625): comparison url. Summary: This benchmark run did not return any relevant results. If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression
  // If there exists a local definition that dominates all uses of that local,
- // the definition should be visited first. Traverse blocks in preorder which
+ // the definition should be visited first. Traverse blocks in an order that
  // is a topological sort of dominance partial order.
- for (bb, data) in traversal::preorder(&mir) {
+ for (bb, data) in traversal::reverse_postorder(&mir) {
      analyzer.visit_basic_block_data(bb, data);
Huh, a bit surprised this was using preorder. I thought it was RPO since that's the correct def-before-use order (but I guess we have to check for dominance anyway so this can't go wrong just be unnecessarily conservative?).
EDIT: ah I see, it went through these steps:
- visit_body -> inlined loop over blocks: Remove dead code from LocalAnalyzer #85965
- loop over blocks -> preorder: Use preorder traversal when checking for SSA locals #85741
- preorder -> reverse_postorder (this PR)
Would it make sense to force visit_body to use RPO? Or have two forms, visit_body_unordered and visit_body_rpo?
Note that preorder and reverse postorder give identical end results, since either visits a definition before a use, when the definition dominates the use (and the order is irrelevant otherwise).
Note: you can mostly ignore the rant below, it's me reasoning about RPO to myself, really
Hmm, I think it depends on what kind of "definition" we're talking about - I think I overapproximated what RPO actually did, and how it's stronger than just giving you "dominator before dominated".
I guess that implies a fun iteration algorithm like this:
fn visit_block(&mut self, bb: Block) {
    if self.visited[bb] { return; }
    // Mark the block first so it is processed only once.
    self.visited[bb] = true;
    if let Some(dom_bb) = self.doms[bb] {
        self.visit_block(dom_bb);
    }
    // ... guaranteed to get here only once and *after* all dominators ...
}
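For anyone who wants to play with that idea, here is a self-contained sketch under stated assumptions: `DomFirstVisitor`, `idom` and `visit_in` are hypothetical names, blocks are plain `usize` indices, and the immediate-dominator table is supplied by hand rather than computed. It only records the order in which blocks would be processed.

```rust
/// Hypothetical visitor: `idom[bb]` is the immediate dominator of `bb`
/// (`None` for the entry block). Not part of rustc.
struct DomFirstVisitor {
    idom: Vec<Option<usize>>,
    visited: Vec<bool>,
    order: Vec<usize>,
}

impl DomFirstVisitor {
    fn new(idom: Vec<Option<usize>>) -> Self {
        let n = idom.len();
        Self { idom, visited: vec![false; n], order: Vec::with_capacity(n) }
    }

    fn visit_block(&mut self, bb: usize) {
        if self.visited[bb] {
            return;
        }
        // Mark before recursing; the dominator tree is acyclic, so the
        // recursion always terminates at the entry block.
        self.visited[bb] = true;
        if let Some(dom_bb) = self.idom[bb] {
            self.visit_block(dom_bb);
        }
        // Reached exactly once per block, after its whole dominator chain.
        self.order.push(bb);
    }

    /// Visit blocks in an arbitrary caller-chosen order and return the
    /// order in which they were actually processed.
    fn visit_in(mut self, blocks: impl IntoIterator<Item = usize>) -> Vec<usize> {
        for bb in blocks {
            self.visit_block(bb);
        }
        self.order
    }
}
```

Even when the caller iterates blocks back to front, each block is still processed only after its dominators, which is the only property this scheme provides; it says nothing about seeing all predecessors of a merge first.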
More seriously though, I guess RPO is important in SSA for merges (let's ignore cycles for now):
  S(tart)
   /  \
  a    b
   \  /
  M(erge)
| | | flatten | dedup (keep first) | dedup (keep last) |
|---|---|---|---|---|
| preorder | S->{a->M, b->M} | SaMbM | SaMb | SabM |
| reverse (per-column) | | MbMaS | bMaS | MbaS |
| postorder | {M<-a, M<-b}<-S | MaMbS | MabS | aMbS |
| reverse (per-column) | | SbMaM | SbaM | SbMa |
And RPO is usually the reversed "dedup (keep first)" postorder, which ends up as SbaM (though siblings can be reversed in the initial postorder visit to get SabM if that's aesthetically preferred for e.g. an IR dump - we should do this for --emit=mir IMO).
What we're looking for in this example is S->{a,b}->M which is a bit like structured control-flow, and for SSA in particular it allows seeing all the definitions of φ nodes (or "BB args" values etc.) before a merge (not sure if this logic works for backedges, but it partially might?).
In MIR we don't have φ nodes but we still want to see all "sources" of a merge before the merge for dataflow algorithms, for pretty much the same reason SSA IRs use φ/BB args, the difference being that a fixpoint dataflow algorithm is only slowed down by the suboptimal order (whereas SSA IR passes may have bigger issues with lacking definitions used by φ nodes).
Also, you may have noticed in the above table that the preorder "dedup (keep last)" is SabM without any reversals (but "keep last" is more expensive computationally, since we tend not to have the "flatten" form at all but instead skip visiting eagerly, which naturally results in "keep first").
So really what "RPO" does is a more efficient way to get "preorder but keep only the last visit instead of the usual first" (isomorphic up to sibling order, but I think you can get them perfectly equal if you make the "aesthetic fix" to RPO).
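The table entries for the diamond can be reproduced with a small sketch. Everything here is illustrative and assumed rather than taken from rustc: blocks are indices 0..=3 standing for S, a, b, M, and the "flatten" walks deliberately skip the visited check, so they only terminate on acyclic graphs.

```rust
use std::collections::HashSet;

/// Preorder walk with no visited check ("flatten"); acyclic graphs only.
fn flatten_pre(bb: usize, succs: &[Vec<usize>], out: &mut Vec<usize>) {
    out.push(bb);
    for &s in &succs[bb] {
        flatten_pre(s, succs, out);
    }
}

/// Postorder walk with no visited check ("flatten"); acyclic graphs only.
fn flatten_post(bb: usize, succs: &[Vec<usize>], out: &mut Vec<usize>) {
    for &s in &succs[bb] {
        flatten_post(s, succs, out);
    }
    out.push(bb);
}

/// Keep only the first occurrence of each block.
fn keep_first(seq: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::new();
    seq.iter().copied().filter(|bb| seen.insert(*bb)).collect()
}

/// Keep only the last occurrence of each block.
fn keep_last(seq: &[usize]) -> Vec<usize> {
    let reversed: Vec<usize> = seq.iter().rev().copied().collect();
    let mut out = keep_first(&reversed);
    out.reverse();
    out
}

/// Render block indices with the letters used in the table.
fn names(seq: &[usize]) -> String {
    seq.iter().map(|&bb| ['S', 'a', 'b', 'M'][bb]).collect()
}
```

With `succs = [[1, 2], [3], [3], []]`, `keep_last` of the flattened preorder gives SabM, and reversing `keep_first` of the flattened postorder gives SbaM. Swapping the sibling order of S (`[[2, 1], [3], [3], []]`) before the postorder walk makes the two sequences identical, which matches the "aesthetic fix" described above.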