Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide consistent concept for DFG/EOG for Overlay nodes #1994

Open
oxisto opened this issue Jan 28, 2025 · 1 comment
Open

Provide consistent concept for DFG/EOG for Overlay nodes #1994

oxisto opened this issue Jan 28, 2025 · 1 comment
Assignees
Labels

Comments

@oxisto
Copy link
Member

oxisto commented Jan 28, 2025

We need a consistent concept for

  • DFG and EOG edges for the operations and concepts
  • Is it allowed to put DFG edges between concepts/operations?
  • Put the DFG edges on the "traditional world" based on the concepts
  • Guidance where to place overlay nodes -> Is it allowed to have multiple "underlay" nodes (currently NOT)
  • Does that effect the queries?
    • For example, it can be "dangerous", to make DFG shortcuts, because then we do not "see" the traditional DFG path anymore and cannot check if something happens on this path
  • Does it make sense to introduce a new ConceptGranularity to have a "concept-world" DFG edge
@maximiliankaul
Copy link
Contributor

Summary of today's discussion:

DFG and EOG edges

  • We will mark new DFG/EOG edges.
    • This allows us to ignore these new edges in existing passes, thus not breaking existing behavior.
  • We will connect Operation nodes to the CPG nodes to allow for nicer queries (all DFG paths from FileRead must... instead of all nodes, where an OverlayNode of type FileRead exists, must ...
graph TD
  FileRead["Operation[FileRead]"]
  CallExpr["CallExpression[read]"]

  FileRead -->|DFG| CallExpr
Loading
  • New edges will only be introduced wherever they provide a benefit.
    • This allows us to connect nodes where the original CPG would not find a connection (e.g. HTTP endpoint and client call).
    • This does not introduce shorter paths as no new edge is introduced if a DFG/... path is already present in the "CPG world"
  • We'll have to be careful as this can result in new loops.

Deciding if all paths contain a required Concept node

This is not easy, as there will always be the path using the concept nodes and the one ignoring it:

graph TD
    prev["before foo()"]
    foo["foo()"]
    after["after foo()"]
    concept["concept"]
    prev -->|DFG| foo -->|DFG| after
    concept -->|DFG| foo
    foo -->|DFG| concept
Loading

Currently, the best solution is to rephrase such a query to check for a node with an overlayNode of Type X on the path. However, @konradweiss has another suggestion to add to this discussion.

Where to place nodes?

This is purely a design decision. Suggestion:

  • One node per "concept instance" (instead of one global node per concept)
  • The concept node is connected to the CPG node creating the object (e.g. API call / malloc / ...).
  • Operation nodes are connected to the respective CallExpressions.

Concept Reference

We have concepts with fields representing other concepts. E.g. a DiskEncryption concept requires a Secret as a key.

  • We'll introduce a ConceptReference or something similar.
  • Having a reference keeps us from dealing with code where we cannot decide which concrete instance of a concept is used (e.g. a Secret set in a branch being used outside the branch). However, this can still be resolved manually to the best of the CPG's ability).

Multiple "underlay" nodes

Think of the following code snippet:

key = SomeLib.getKey()
foo(key)
bar(key)

Here, we will not create edges from the call arguments to the Secret concept. We will provide an (extension?) function allowing the end user to easily check if a given Reference (i.e. key) node can be a key by checking prevDFG. This is purely a convenience function and should result in easier to read queries.
There is also the option to connect all occurrences of key with the concept, however we do not see a benefit of this approach. This is purely a design decision and should yield equivalent results.

Do we have to require a Operation class

The question on the necessity of having an Operation for creating a Concept was raised and has to be evaluated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants