Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Remove control flow nodes for module entry definitions from the dataflow graph. #15030

Merged
merged 3 commits into from
Dec 11, 2023

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Dec 6, 2023

This should have been part of #14777.
These nodes are extra copared to which SSA nodes existed before.

We do lose one result that we gained by having these nodes:

testFailures
+| package/subpackage/__init__.py:12:90:12:113 | Comment #$ prints=submodule_attr | Missing result:prints=submodule_attr |

which is the one we might expect to lose. We do get to keep the other one (about lambdas in flow summaries),

for module entry definitions from the dataflow graph.
@github-actions github-actions bot added the Python label Dec 6, 2023
mostly removing of nodes from the graph.
One result lost:
```
check("submodule.submodule_attr", submodule.submodule_attr, "submodule_attr", globals()) #$ MISSING:prints=submodule_attr
```
@yoff yoff marked this pull request as ready for review December 7, 2023 07:33
@yoff yoff requested a review from a team as a code owner December 7, 2023 07:33
@yoff yoff added the no-change-note-required This PR does not need a change note label Dec 7, 2023
@RasmusWL
Copy link
Member

RasmusWL commented Dec 7, 2023

When I initially read the PR description, that sounded like a non-trivial tradeoff. What is your conclusion on whether this is a good tradeoff or not? (I don't see you taking a stance on this explicitly).

From reading the code over more closely myself, it seems like a huge edge case scenario, where we import a single attribute from a relative module and then access a different attribute afterwards.

from .submodule import irrelevant_attr
use(submodule.submodule_attr)

We could try to gauge the impact of this from using MRVA or DCA with some meta-query, if we actually wanted to learn more. But I expect we can reach a conclusion without it 🤞


I also realized that we should have updated the comment for TNode in the original PR (but didn't):

/**
* IPA type for data flow nodes.
*
* Flow between SSA variables are computed in `Essa.qll`
*
* Flow from SSA variables to control flow nodes are generally via uses.
*
* Flow from control flow nodes to SSA variables are generally via assignments.
*
* The current implementation of these cross flows can be seen in `EssaTaintTracking`.
*/

Since we're fixing up minor things in this PR anyway, do you care to fix that comment as well? 🙏

@yoff
Copy link
Contributor Author

yoff commented Dec 8, 2023

What is your conclusion on whether this is a good tradeoff or not? (I don't see you taking a stance on this explicitly).

Sorry, I had recently written my opinion on slack and forgot to repeat it here. I believe that the SSA removal should be strictly cleanup, essentially "no semantic changes" just rerouting the graph past unneeded nodes. So I am for no performance degradation and no gained precision (except what we got from sorting out previous disconnects).

That we now know of a way to gain precision is a nice bonus, but we might be able to get that cheaper. I think we should investigate the impact and trade-off later.

Since we're fixing up minor things in this PR anyway, do you care to fix that comment as well?

Yes, now that I am no longer in hot-fix mode, I think that is a great idea :-)

RasmusWL
RasmusWL previously approved these changes Dec 8, 2023
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the SSA removal should be strictly cleanup, essentially "no semantic changes" just rerouting the graph past unneeded nodes. So I am for no performance degradation and no gained precision (except what we got from sorting out previous disconnects).

Thanks for that clear sentiment 👍 with that in mind, here is an approval (even though I would still like to see the doc improvement)

RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 8, 2023
@RasmusWL RasmusWL merged commit 419130b into github:main Dec 11, 2023
8 checks passed
RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants