Dataflow: Simplify revFlowThrough #18355
Observations:
* `revFlowThrough` can be much larger than the other reverse-flow predicates, presumably when there are many different `innerReturnAp`s.
* It is only ever used in conjunction with `flowThroughIntoCall`, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`.
* `revFlowThroughArg` can then be trivially inlined.

Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial RAM available, and presumably falls much more under GHA-style memory pressure.
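To illustrate why pushing the join in can shrink the predicate, here is a toy model in Python (not QL). The relation names mirror the predicates mentioned above, but the column layouts and all data are hypothetical, chosen only so that many `innerReturnAp` values exist per call. It is a sketch of the general effect, not the actual implementation in `DataFlowImpl.qll`.

```python
# Toy relational model. All tuples and column orders are illustrative assumptions.
INNER_APS = [f"ap{i}" for i in range(50)]  # many distinct innerReturnAp values
CALLS = ("c1", "c2")

# (p, state, pos, inner_return_ap, ap)
rev_flow_param_to_return = {("p", "s", "ret", inner, "x") for inner in INNER_APS}
# (call, return_ctx, return_ap, pos, inner_return_ap)
rev_flow_is_returned = {
    (call, "ctx", "r", "ret", inner) for call in CALLS for inner in INNER_APS
}
# (call, arg, p, ap, inner_return_ap)
flow_through_into_call = {
    (call, f"arg_{call}", "p", "x", inner) for call in CALLS for inner in INNER_APS
}


def rev_flow_through_wide():
    """Before: inner_return_ap (and pos, call) are exposed as columns."""
    return {
        (call, ctx, state, pos, ret_ap, ap, inner)
        for (p, state, pos, inner, ap) in rev_flow_param_to_return
        for (call, ctx, ret_ap, pos2, inner2) in rev_flow_is_returned
        if (pos, inner) == (pos2, inner2)
    }


def consumer_via_wide(wide):
    """The only use of the wide relation: joined with flow_through_into_call."""
    return {
        (arg, state, ctx, ret_ap, ap)
        for (call, ctx, state, pos, ret_ap, ap, inner) in wide
        for (call2, arg, p, ap2, inner2) in flow_through_into_call
        if (call, ap, inner) == (call2, ap2, inner2)
    }


def rev_flow_through_narrow():
    """After: the join is pushed in, so only arg and the outer columns remain."""
    return {
        (arg, state, ctx, ret_ap, ap)
        for (call, arg, p, ap, inner) in flow_through_into_call
        for (p2, state, pos, inner2, ap2) in rev_flow_param_to_return
        if (p, inner, ap) == (p2, inner2, ap2)
        for (call3, ctx, ret_ap, pos3, inner3) in rev_flow_is_returned
        if (call, pos, inner) == (call3, pos3, inner3)
    }


wide = rev_flow_through_wide()
narrow = rev_flow_through_narrow()
print(len(wide), len(narrow))  # prints "100 2"
```

On this toy data the consumer sees the same tuples either way (`consumer_via_wide(wide) == narrow`), but the materialized relation no longer carries one row per `innerReturnAp`.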
Nice! I have started DCA for all languages, let's wait for the result of that before merging.
flowThroughIntoCall(call, arg, p, ap, innerReturnAp) and
revFlowParamToReturn(p, state, pos, innerReturnAp, ap) and
revFlowIsReturned(call, returnCtx, returnAp, pos, innerReturnAp)
This is disrupting a non-linear join, so it's very likely beneficial to push `flowThroughIntoCall` further into one of the recursive conjuncts in some way. As-is, we're likely to have some inefficiency with at least one of the two delta+prev combinations. Also, it would be nice to understand a bit more deeply which columns are contributing to the blowup, and in which way.

We can safely push the projection `flowThroughIntoCall(_, _, p, ap, innerReturnAp)` into `revFlowParamToReturn`, which may be beneficial, as that's a pure filter on a pre-non-linear-join conjunct. However, we cannot push in the other columns, as that would amount to a join with the call graph a bit too soon (`revFlowIsReturned` is exactly meant to constrain that part as much as possible).

OTOH, it may very well be good to push `flowThroughIntoCall` in its entirety into `revFlowIsReturned`, as that already contains the call graph edge. If a projection of `flowThroughIntoCall` to a pure filter in that case turns out to yield a beneficial tuple reduction, then the join of `revFlowOut` and `returnFlowsThrough` (which occurs in a few places) ought to be revised, as `flowThroughIntoCall` already contains a projected version of `returnFlowsThrough`.
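As a rough illustration of what "pushing a projection in as a pure filter" means, the following Python sketch (not QL) projects away the call-graph columns of a relation and uses the remainder only to discard rows from another relation. The relation names follow the comment above; the column orders and every tuple are made-up assumptions.

```python
# Toy data; all tuples are hypothetical.
# (call, arg, p, ap, inner_return_ap)
flow_through_into_call = {
    ("c1", "a1", "p1", "x", "r1"),
    ("c2", "a2", "p1", "x", "r1"),
    ("c3", "a3", "p2", "y", "r2"),
}
# (p, state, pos, inner_return_ap, ap)
rev_flow_param_to_return = {
    ("p1", "s", "ret", "r1", "x"),
    ("p1", "s", "ret", "r9", "x"),  # no matching flow-through: filtered out
    ("p2", "s", "ret", "r2", "y"),
    ("p3", "s", "ret", "r2", "y"),  # no matching flow-through: filtered out
}

# Projection onto (p, ap, inner_return_ap): the call and arg columns are
# dropped, so applying it introduces no call-graph join.
projection = {(p, ap, inner) for (_call, _arg, p, ap, inner) in flow_through_into_call}

# Pure filter: rows are removed, but no columns are added and no row is duplicated.
filtered = {
    row for row in rev_flow_param_to_return
    if (row[0], row[4], row[3]) in projection
}

print(len(rev_flow_param_to_return), len(filtered))  # prints "4 2"
```

Because the projection collapses the three call tuples into two `(p, ap, innerReturnAp)` keys, the filter can only shrink its input, which is why it is safe to apply before the non-linear join.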