Python: Remove control flow nodes for module entry definitions from the dataflow graph. #15030

yoff · 2023-12-06T20:48:10Z

This should have been part of #14777.
These nodes are extra compared to the SSA nodes that existed before.

We do lose one result that we gained by having these nodes:

testFailures
+| package/subpackage/__init__.py:12:90:12:113 | Comment #$ prints=submodule_attr | Missing result:prints=submodule_attr |

which is the one we might expect to lose. We do get to keep the other one (about lambdas in flow summaries).

Mostly removing nodes from the graph. One result lost:

```
check("submodule.submodule_attr", submodule.submodule_attr, "submodule_attr", globals()) #$ MISSING:prints=submodule_attr
```

RasmusWL · 2023-12-07T09:54:42Z

When I initially read the PR description, that sounded like a non-trivial tradeoff. What is your conclusion on whether this is a good tradeoff or not? (I don't see you taking a stance on this explicitly).

From reading the code over more closely myself, it seems like a huge edge case scenario, where we import a single attribute from a relative module and then access a different attribute afterwards.

from .submodule import irrelevant_attr
use(submodule.submodule_attr)
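For readers unfamiliar with why `submodule` is even in scope here, the following is a minimal runtime sketch of the edge case, not part of the PR. The package name `demo_pkg`, the variable `seen`, and the helper `demo_submodule_binding` are made up for illustration; the point is only that importing one attribute from a relative module also binds the submodule itself as a name in the package's `__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_submodule_binding() -> str:
    """Build a throwaway package and return what its __init__.py observed."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "submodule.py").write_text(
            "irrelevant_attr = 'irrelevant'\n"
            "submodule_attr = 'submodule_attr'\n"
        )
        # Only `irrelevant_attr` is imported, yet `submodule` becomes a
        # global of the package because the import system sets it as an
        # attribute on the parent package module.
        (pkg / "__init__.py").write_text(
            "from .submodule import irrelevant_attr\n"
            "seen = submodule.submodule_attr\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return importlib.import_module("demo_pkg").seen
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            for name in ("demo_pkg", "demo_pkg.submodule"):
                sys.modules.pop(name, None)


result = demo_submodule_binding()
print(result)
```

This is the runtime behaviour that the lost `prints=submodule_attr` result was modelling: the analysis would need to know that `submodule` is defined at that point even though no statement assigns it by name.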

We could try to gauge the impact of this from using MRVA or DCA with some meta-query, if we actually wanted to learn more. But I expect we can reach a conclusion without it 🤞

I also realized that we should have updated the comment for TNode in the original PR (but didn't):

codeql/python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll

Lines 15 to 25 in 263c0aa

    
           /** 
        
            * IPA type for data flow nodes. 
        
            * 
        
            * Flow between SSA variables are computed in `Essa.qll` 
        
            * 
        
            * Flow from SSA variables to control flow nodes are generally via uses. 
        
            * 
        
            * Flow from control flow nodes to SSA variables are generally via assignments. 
        
            * 
        
            * The current implementation of these cross flows can be seen in `EssaTaintTracking`. 
        
            */

Since we're fixing up minor things in this PR anyway, do you care to fix that comment as well? 🙏

yoff · 2023-12-08T10:05:57Z

> What is your conclusion on whether this is a good tradeoff or not? (I don't see you taking a stance on this explicitly).

Sorry, I had recently written my opinion on Slack and forgot to repeat it here. I believe that the SSA removal should be strictly cleanup, essentially "no semantic changes" just rerouting the graph past unneeded nodes. So I am for no performance degradation and no gained precision (except what we got from sorting out previous disconnects).
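As a toy illustration of what "rerouting the graph past unneeded nodes" can mean, here is a small sketch on a plain edge set. It is not the CodeQL implementation; the function `bypass_nodes` and the node labels are hypothetical and chosen only to show the idea that removing a node should preserve the flow that passed through it.

```python
def bypass_nodes(edges, unneeded):
    """Drop each node in `unneeded`, wiring its predecessors to its successors."""
    edges = set(edges)
    for node in unneeded:
        preds = {a for a, b in edges if b == node and a != node}
        succs = {b for a, b in edges if a == node and b != node}
        # Remove every edge touching the node, then reconnect around it.
        edges = {(a, b) for a, b in edges if node not in (a, b)}
        edges |= {(p, s) for p in preds for s in succs}
    return edges


# Hypothetical flow: a definition reaches a use via an extra intermediate node.
flow = {("def x", "cfg entry"), ("cfg entry", "use x")}
rerouted = bypass_nodes(flow, ["cfg entry"])
print(rerouted)
```

In this sketch the path from `def x` to `use x` survives the removal, which is the "no semantic changes" property described above.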

That we now know of a way to gain precision is a nice bonus, but we might be able to get that cheaper. I think we should investigate the impact and trade-off later.

> Since we're fixing up minor things in this PR anyway, do you care to fix that comment as well?

Yes, now that I am no longer in hot-fix mode, I think that is a great idea :-)

RasmusWL

> I believe that the SSA removal should be strictly cleanup, essentially "no semantic changes" just rerouting the graph past unneeded nodes. So I am for no performance degradation and no gained precision (except what we got from sorting out previous disconnects).

Thanks for that clear sentiment 👍 with that in mind, here is an approval (even though I would still like to see the doc improvement)

Python: remove control flow nodes

8c5ca3f

for module entry definitions from the dataflow graph.

github-actions bot added the Python label Dec 6, 2023

Python: adjust test expectations

263c0aa

mostly removing of nodes from the graph. One result lost:

```
check("submodule.submodule_attr", submodule.submodule_attr, "submodule_attr", globals()) #$ MISSING:prints=submodule_attr
```

yoff marked this pull request as ready for review December 7, 2023 07:33

yoff requested a review from a team as a code owner December 7, 2023 07:33

yoff added the no-change-note-required This PR does not need a change note label Dec 7, 2023

RasmusWL previously approved these changes Dec 8, 2023

RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 8, 2023

Python: Recover subclass finder .expected after cherry picking commit…

de55ca3

…s from github#15030

Python: Update comment.

d9c0c8c

yoff dismissed RasmusWL’s stale review via d9c0c8c December 8, 2023 16:32

yoff requested a review from RasmusWL December 8, 2023 16:33

RasmusWL approved these changes Dec 11, 2023

RasmusWL merged commit 419130b into github:main Dec 11, 2023
8 checks passed

RasmusWL added a commit to RasmusWL/codeql that referenced this pull request Dec 19, 2023

Python: Recover subclass finder .expected after cherry picking commit…

0fe29b6

…s from github#15030
