Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matching comments is not supported #5931

Open
Vasco-jofra opened this issue Aug 17, 2022 · 6 comments
Open

Matching comments is not supported #5931

Vasco-jofra opened this issue Aug 17, 2022 · 6 comments
Labels
exotic funky stuff lang:python priority:low user:external requested by someone outside of r2c

Comments

@Vasco-jofra
Copy link

Vasco-jofra commented Aug 17, 2022

Problem description

It would be nice if Semgrep could be used to match python comments in a simpler way. For example, I wanted to match python functions with the @route decorator, did not have the @protected_endpoint decorator, and did not have a specific comment (# Hello in the examples below).

This was the best solution I came up with (https://semgrep.dev/s/pwr0).

patterns:
  - pattern-regex: |
      # Hello
      [\s\S]*?
      def.*
      (^[ \t]+[^\n]*\n?)+
  - pattern: |
      @route($ROUTE, ...)
      def $F(...):
        ...
  - pattern-not: |
      @protected_endpoint(...)
      def $F(...):
        ...

The pattern-regex works as follows:

  1. Matches the comment (# Hello)
  2. Matches any decorators until it finds a def, i.e., until the beginning of a function ([\s\S]*?\ndef.*)
  3. Matches any string that starts with a space or tab, i.e., until the end of the function ((^[ \t]+[^\n]*\n?)+)

Problems:

  • If a match is at the bottom of a file without a new line at the end, this solution does not work.
  • It is error prone to write these complicated regexes

Solution I'd like

I would like to be able to match comments with Semgrep. For example, the rule could look something like this (https://semgrep.dev/s/2Dyq):

patterns:
  - pattern: |
      @route($ROUTE, ...)
      def $F(...):
        ...
  - pattern-not: |
      @protected_endpoint(...)
      def $F(...):
        ...
  - pattern-not: |
      # Hello
      def $F(...):
        ...
@r2c-demo
Copy link
Collaborator

This issue is synced in Linear at https://linear.app/r2c/issue/PA-1775/matching-python-comments-is-not-well-supported. Note: this link is for r2c use only and is not accessible publicly.

@LewisArdern LewisArdern added lang:python user:external requested by someone outside of r2c labels Aug 17, 2022
@Sjord
Copy link
Contributor

Sjord commented Aug 25, 2022

I think comments are generally ignored by semgrep in any language. Today I also ran into a use case where I want to match comments, but in PHP: the comments contain some documentation on parameters, and I want to cross-check that against the code with semgrep. So matching comments would be desirable.

@stale
Copy link

stale bot commented Sep 8, 2022

This issue is being marked stale because there hasn't been any activity in 14 days and either it wasn't prioritized or its priority is high. Please apply the appropriate priority:* label before removing the stale label.

@stale stale bot added the stale needs prioritization; label applied by stalebot label Sep 8, 2022
@aryx aryx added priority:low exotic funky stuff and removed stale needs prioritization; label applied by stalebot labels Sep 17, 2022
@aryx aryx changed the title Matching python comments is not well supported Matching comments is not supported Sep 17, 2022
@mjambon
Copy link
Member

mjambon commented Sep 22, 2022

Hi there. It's not impossible that we could add some support for matching comments but I haven't found a simple approach that's also generic yet. Thanks for the clear use case, it's helpful.

Preliminary work to have comments in the AST is almost done: semgrep/ocaml-tree-sitter-core#48

@Pchao93
Copy link

Pchao93 commented Sep 22, 2022

Chiming in with another use case - catching and preventing linter exemption comments. In ruby, Rubocop rules can be disabled with comments like # rubocop:disable Some/Rule, which cannot be detected. Very interested in seeing this usage supported!

@xmo-odoo
Copy link

xmo-odoo commented Jun 19, 2024

It's pretty much the same for Python, tooling generally supports a way to opt out (e.g. noqa, pylint: disable, ...) and while semgrep itself can be invoked with --disable-nosem that's often not the option elsewhere. Plus it might still be of value to have a pass highlighting such opt-outs in order to review them more carefully.

OTOH the command of these while parseable is generally not regular, so I think pattern-regex is sufficient, it seems to work fine enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exotic funky stuff lang:python priority:low user:external requested by someone outside of r2c
Projects
None yet
Development

No branches or pull requests

9 participants