From de8253d9530842c0b0227a67a4996904a6c67abe Mon Sep 17 00:00:00 2001
From: Daniel D'Avella
Date: Mon, 22 Jul 2024 13:57:12 -0400
Subject: [PATCH 1/3] Clean up documentation a bit

---
 codetf.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/codetf.md b/codetf.md
index 505b031..02a81f9 100644
--- a/codetf.md
+++ b/codetf.md
@@ -6,7 +6,7 @@ This open format describes code changes or suggestions made by an automated tool
 
 # The specification JSON
 
-The [specification](codetf.json) is immature right now, only existing as a marked-up JSON file instead of a proper JSON schema. It's also not independently versioned outside of langauge-specific bindings (e.g., [Java binding](https://github.com/pixee/codetf-java-bindings)). We are avoiding more investment in ceremony, versioning, governance, etc., until we feel it has reached a more stable footing. Following [SARIF](https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html) stylistically as a long term goal makes sense, not only because it's a successful standard, but also because our results will be closely linked with SARIF, so we could have many users, consumers, and implementors in common.
+The [specification](codetf.schema.json) is expressed in terms of [JSON Schema](https://json-schema.org/). The schema is currently not versioned. We are avoiding investment in ceremony, versioning, governance, etc., until we feel it has reached a more stable footing. Following [SARIF](https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html) stylistically as a long term goal makes sense, not only because it's a successful standard, but also because our results will be closely linked with SARIF, so we could have many users, consumers, and implementors in common.
 
 Note that like SARIF, this format is not intended to be a replacement for a diagnostic log. It's not intended to have anything more than minimum diagnostics to help with reproducibility.
 
From 4d12fd552a70a434f0389c5dadfed2875a535f5d Mon Sep 17 00:00:00 2001
From: Daniel D'Avella
Date: Mon, 22 Jul 2024 14:14:28 -0400
Subject: [PATCH 2/3] Describe codemod IDs

---
 codetf.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/codetf.md b/codetf.md
index 02a81f9..47d7806 100644
--- a/codetf.md
+++ b/codetf.md
@@ -14,6 +14,18 @@ Note that like SARIF, this format is not intended to be a replacement for a diag
 
 It may help to understand the major components of CodeTF from a high levels first before exploring or attemptin to implement the specification. The `results` and `changeset` fields can be seen as a series of patches against a project's directory. Each patch builds on any previous patches seen. Therefore, applying a patch from the middle of a `changeset` without the others may be invalid. Multiple locations can be changed in a single file within the scope of a single codemod and be represented by a single `changeset` array entry.
 
+# Codemod IDs
+
+Codemods are uniquely identified by an ID, which is represented in CodeTF as the `codemod` property of the `result` object.
+
+IDs are descriptive and must conform to the following schema: `<origin>:<language>/<name>`
+
+Each component of the ID has a particular meaning:
+
+* `<origin>`: Origin describes the source of the analysis or transformation. For example, "find and fix" codemods provided by Pixee are labelled with the origin "pixee". Codemods that remediate issues found by a static analysis tool might be labelled with the origin corresponding to that tool name (e.g. "semgrep" or "codeql"). Implementers of custom codemods may use a unique identifier that is specific to their organization or tool.
+* `<language>`: The language that is transformed by the codemod. This should be a short, unique identifier for the language. Valid languages include `java`, `python`, and `javascript`.
+* `<name>`: The name of the codemod. This should be a short, unique identifier for the transformation that is performed. Individual words in the name should be separated by hyphens. For example: `remove-unused-imports`.
+
 # Notes
 
 Note that the `changeset` array can have multiple entries for the same given file.

From 42236382e819253aa72ac50ba177de571289e4ab Mon Sep 17 00:00:00 2001
From: Daniel D'Avella
Date: Mon, 22 Jul 2024 15:44:26 -0400
Subject: [PATCH 3/3] Address code review feedback

---
 codetf.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/codetf.md b/codetf.md
index 47d7806..818220a 100644
--- a/codetf.md
+++ b/codetf.md
@@ -14,15 +14,15 @@ Note that like SARIF, this format is not intended to be a replacement for a diag
 
 It may help to understand the major components of CodeTF from a high levels first before exploring or attemptin to implement the specification. The `results` and `changeset` fields can be seen as a series of patches against a project's directory. Each patch builds on any previous patches seen. Therefore, applying a patch from the middle of a `changeset` without the others may be invalid. Multiple locations can be changed in a single file within the scope of a single codemod and be represented by a single `changeset` array entry.
 
-# Codemod IDs
+# Codemod URIs
 
-Codemods are uniquely identified by an ID, which is represented in CodeTF as the `codemod` property of the `result` object.
+Codemods are uniquely identified by a URI, which is represented in CodeTF as the `codemod` property of the `result` object.
 
-IDs are descriptive and must conform to the following schema: `<origin>:<language>/<name>`
+URIs are descriptive and must conform to the following schema: `<detector>:<language>/<name>`
 
-Each component of the ID has a particular meaning:
+Each component of the URI has a particular meaning:
 
-* `<origin>`: Origin describes the source of the analysis or transformation. For example, "find and fix" codemods provided by Pixee are labelled with the origin "pixee". Codemods that remediate issues found by a static analysis tool might be labelled with the origin corresponding to that tool name (e.g. "semgrep" or "codeql"). Implementers of custom codemods may use a unique identifier that is specific to their organization or tool.
+* `<detector>`: Describes the source of the analysis that drives the transformation. Codemods that remediate issues found by a specific analysis tool should be labeled with the detector corresponding to that tool name (e.g. "semgrep", "codeql", etc.). Implementers of custom codemods that perform their own internal detection should use a unique identifier for their detector. For example, Pixee's "find and fix" codemods use "pixee".
 * `<language>`: The language that is transformed by the codemod. This should be a short, unique identifier for the language. Valid languages include `java`, `python`, and `javascript`.
 * `<name>`: The name of the codemod. This should be a short, unique identifier for the transformation that is performed. Individual words in the name should be separated by hyphens. For example: `remove-unused-imports`.
 
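
Reviewer note: the codemod URI layout documented in the final patch (a detector, then `:`, a language, then `/`, and a hyphen-separated name) could be checked by consumers with something like the sketch below. This is only an illustration and not part of the patch series or the CodeTF schema. The names `CodemodURI` and `parse_codemod_uri` are hypothetical, and the permitted character set (lowercase letters, digits, hyphens) is an assumption, since the documentation only says that words in the name are separated by hyphens.

```python
import re
from typing import NamedTuple

# Assumption: each component uses lowercase letters, digits, and hyphens.
# The documentation does not define an exact alphabet.
_CODEMOD_URI = re.compile(
    r"(?P<detector>[a-z0-9-]+)"
    r":(?P<language>[a-z0-9-]+)"
    r"/(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


class CodemodURI(NamedTuple):
    """Hypothetical container for the three documented URI components."""

    detector: str
    language: str
    name: str


def parse_codemod_uri(uri: str) -> CodemodURI:
    """Split a codemod URI into detector, language, and name.

    Raises ValueError when the string does not follow the documented layout.
    """
    match = _CODEMOD_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a valid codemod URI: {uri!r}")
    return CodemodURI(**match.groupdict())
```

For example, `parse_codemod_uri("pixee:python/remove-unused-imports")` yields the components `pixee`, `python`, and `remove-unused-imports`, while a string without the `:` separator raises `ValueError`.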