Document which fields of Syft SBOM are used in processing input #2249

Open
chovanecadam opened this issue Nov 8, 2024 · 6 comments
@chovanecadam

What would you like to be added:

Documentation about what fields from the Syft JSON are used and for what purpose.

Why is this needed:

Some organizations have access to information about installed packages from other sources, e.g. from monitoring tools or configuration management tools. In this case Syft is not used; instead, a Syft SBOM is constructed manually from that data. A Syft SBOM contains many fields which are not necessary for vulnerability analysis with Grype. It would be great to document which fields are necessary for Grype to function, and to note in the changelog whenever that set changes.

Additional context:

@chovanecadam chovanecadam added the enhancement New feature or request label Nov 8, 2024
@popey
Contributor

popey commented Nov 8, 2024

Hi @chovanecadam - thanks for the interesting issue. This sentence alone raises, for me, a point and a question:

"Syft SBOM contains many fields which are not necessary for vulnerability analysis with Grype."

  • Question: Which fields? I'm not trying to be awkward, but when someone asserts that "Thing does something that is unnecessary", I'm inclined to ask for more data to back up the assertion.

  • Point: Syft doesn't only exist to feed Grype. Many people use Syft to generate an SBOM which may be analyzed by some tool other than Grype. As such we probably shouldn't limit the data in the Syft output only to that which is consumed by Grype.

That said, I don't doubt that it would be useful to document somewhere the metadata Grype uses to correctly identify packages, versions, files and origins.

But, it's all open source, so this would be a great "fresh eyeballs" task for someone. Thanks again for the question.

@chovanecadam
Author

To answer the question, through trial and error I have arrived at this snippet of an entry in the artifacts array. I am not sure if this even counts as a valid SBOM, but Grype does not complain.

    {
      "id": "118",
      "name": "zlib1g",
      "version": "1:1.2.13.dfsg-1",
      "type": "deb",
      "foundBy": "",
      "locations": [],
      "licenses": [],
      "language": "",
      "cpes": [],
      "purl": "pkg:pkg/deb/debian/zlib1g@1:1.2.13.dfsg-1?arch=amd64&distro=debian-12",
      "metadataType": "dpkg-db-entry",
      "metadata": {
        "package": "zlib1g",
        "source": "zlib",
        "version": "1:1.2.13.dfsg-1",
        "sourceVersion": "1:1.2.13.dfsg-1",
        "architecture": "amd64",
        "maintainer": "",
        "installedSize": 0,
        "provides": [],
        "depends": [],
        "files": []
      }
    }

Many other fields in the SBOM are not, AFAIK, necessary, such as the files array or the artifactRelationships array. I haven't yet looked at the Grype source code to figure out which fields it takes into consideration.

Would there be interest in documenting this behavior and updating the docs? I might write the docs, but if the project isn't interested, I don't see the point. No docs is better than incorrect docs imo.
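As a rough illustration of how such an entry could be generated from inventory data rather than written by hand, here is a minimal Python sketch. The function name `minimal_deb_artifact` and its parameters are hypothetical. The field set simply mirrors the trial-and-error snippet above and is not a documented Grype contract. The purl is passed in by the caller and copied verbatim from the snippet.

```python
import json


def minimal_deb_artifact(artifact_id, name, version, purl, arch, source, source_version):
    """Build one entry for the "artifacts" array of a Syft JSON document.

    Hypothetical helper: the keys mirror the snippet found by trial and
    error in this thread; they are an assumption, not a guaranteed minimum.
    """
    return {
        "id": str(artifact_id),
        "name": name,
        "version": version,
        "type": "deb",
        "foundBy": "",
        "locations": [],
        "licenses": [],
        "language": "",
        "cpes": [],
        "purl": purl,  # supplied by the caller, not derived here
        "metadataType": "dpkg-db-entry",
        "metadata": {
            "package": name,
            "source": source,
            "version": version,
            "sourceVersion": source_version,
            "architecture": arch,
            "maintainer": "",
            "installedSize": 0,
            "provides": [],
            "depends": [],
            "files": [],
        },
    }


# Example record, using the values from the snippet in this comment.
entry = minimal_deb_artifact(
    artifact_id=118,
    name="zlib1g",
    version="1:1.2.13.dfsg-1",
    purl="pkg:pkg/deb/debian/zlib1g@1:1.2.13.dfsg-1?arch=amd64&distro=debian-12",
    arch="amd64",
    source="zlib",
    source_version="1:1.2.13.dfsg-1",
)
print(json.dumps(entry, indent=2))
```

Keeping the empty fields explicit, instead of omitting them, matches what was tested above. Whether Grype tolerates their absence is not established in this thread.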

@popey
Contributor

popey commented Nov 11, 2024

Thanks @chovanecadam - if you would like to document it, I'm sure we can find somewhere in the docs to put it!
That would be a wonderful contribution. Apologies if my previous response sounded like I wasn't interested. 🙏

@chovanecadam
Author

Great. I will look into it hopefully sometime this week and notify you if I make some progress.

@kzantow
Contributor

kzantow commented Nov 27, 2024

Hey all -- Grype has its own set of package structures and metadata structures, which contain a much smaller set of data -- this is probably the best thing to look at to understand what Grype needs to accurately match. Grype only ingests data used for matching and drops other data. The package and metadata structures are in: https://github.com/anchore/grype/blob/main/grype/pkg -- for example, look at the difference between Grype's Java Metadata and Syft's Java Metadata. While documenting these fields might be okay, the documentation might also get stale quickly as we move Syft and Grype forward, so at least for now, just looking at this package would be a great starting point to understand what's needed.

@kzantow kzantow added this to the Grype 1.0 milestone Nov 27, 2024
@kzantow kzantow moved this to Backlog in OSS Nov 27, 2024
@kzantow
Contributor

kzantow commented Nov 27, 2024

When planning for Grype 1.0, we should consider this request: ideally we would auto-generate the documentation for this from the Grype data structures.
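To make the idea of generating documentation from data structures concrete, here is a small Python sketch of the general technique. It is only an analogy: Grype is written in Go, and the `ExamplePackage` class, its fields, and the `doc` metadata key are all hypothetical stand-ins, not Grype's actual structures.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExamplePackage:
    """Hypothetical stand-in structure; NOT Grype's real package type."""

    name: str = field(metadata={"doc": "package name (illustrative description)"})
    version: str = field(metadata={"doc": "installed version (illustrative description)"})
    purl: str = field(metadata={"doc": "package URL (illustrative description)"})


def render_field_docs(cls):
    """Render a plain-text field reference from a dataclass definition."""
    lines = [f"{cls.__name__} fields:"]
    for f in fields(cls):
        # f.type is the annotated class here, e.g. str
        type_name = getattr(f.type, "__name__", str(f.type))
        description = f.metadata.get("doc", "undocumented")
        lines.append(f"- {f.name} ({type_name}): {description}")
    return "\n".join(lines)


doc_text = render_field_docs(ExamplePackage)
print(doc_text)
```

Because the reference is derived from the structure definition itself, it cannot drift from the code in the way hand-written docs can, which addresses the staleness concern raised earlier in the thread.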
