Move from regex/path-based package resolution to hash-based package identification #43

@absol27

Background

sbomit currently resolves files in witness attestations to packages using regex patterns on file paths:

  • Python: extracts name/version from site-packages/foo-1.2.3.dist-info
  • Go: parses /pkg/mod/<module>@<version>/ from the module cache path
  • Rust: splits crate-name-1.0.0 on -, which is ambiguous since crate names can contain hyphens
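
To make the Rust ambiguity concrete, here is a minimal sketch of the two obvious ways to split a `name-version` directory on a hyphen. The function names and directory names are hypothetical examples and are not taken from sbomit's code. Splitting on the first hyphen breaks on hyphenated crate names, and splitting on the last hyphen breaks on SemVer pre-release versions, which also contain a hyphen.

```python
def split_on_first_hyphen(dirname: str) -> tuple[str, str]:
    """Treat everything before the first '-' as the crate name."""
    name, _, version = dirname.partition("-")
    return name, version


def split_on_last_hyphen(dirname: str) -> tuple[str, str]:
    """Treat everything after the last '-' as the version."""
    name, _, version = dirname.rpartition("-")
    return name, version


# Hypothetical directory names, used only to illustrate the problem.
hyphenated_name = "my-crate-1.0.0"          # crate "my-crate", version "1.0.0"
prerelease = "my-crate-1.0.0-alpha.1"       # crate "my-crate", version "1.0.0-alpha.1"

# First-hyphen split misreads a hyphenated crate name.
print(split_on_first_hyphen(hyphenated_name))
# Last-hyphen split misreads a pre-release version.
print(split_on_last_hyphen(prerelease))
```

Neither rule recovers `my-crate` and `1.0.0-alpha.1` from the second directory name, which is why the path alone is not a reliable key.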

Problems

  • Fragile by design - path structure is only a convention, and package managers change install layouts across versions
  • Ambiguous splits - Rust's hyphen-delimited name-version format has no reliable delimiter
  • System packages - files from apt/rpm/apk have no version in their paths at all
  • Hash is ignored - witness records a SHA256 for every file, but we never use it for lookup, even though it is our most reliable signal

Goal

Use a file's content hash (already in the attestation) as the primary key for package lookup, falling back to path as a hint. Given sha256:<hash> → return pkg:<ecosystem>/<name>@<version>. This does not touch the network-trace resolver, which already handles HTTP exchange data well. Focus is purely on the file-trace side.
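
A minimal sketch of the intended lookup order, assuming the hash index is a simple mapping from `sha256:<hash>` to a purl. The names `resolve_package`, `resolve_from_path`, and `hash_index` are hypothetical and do not exist in sbomit; the path fallback only covers the Python dist-info convention mentioned above.

```python
import re
from typing import Mapping, Optional

# Path-based fallback for the Python layout described in Background:
# .../site-packages/<name>-<version>.dist-info/...
_DIST_INFO = re.compile(r"site-packages/([^/]+)-([^/-]+)\.dist-info")


def resolve_from_path(path: str) -> Optional[str]:
    """Path hint only: return a purl if the path follows the dist-info layout."""
    match = _DIST_INFO.search(path)
    if match is None:
        return None
    name, version = match.groups()
    return f"pkg:pypi/{name}@{version}"


def resolve_package(
    sha256_hex: str, path: str, hash_index: Mapping[str, str]
) -> Optional[str]:
    """Look up the content hash first; fall back to the path only on a miss."""
    purl = hash_index.get(f"sha256:{sha256_hex.lower()}")
    if purl is not None:
        return purl
    return resolve_from_path(path)
```

With this ordering, a file whose hash is in the index resolves regardless of where it was installed, and the existing path logic is consulted only when the index has no entry.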

Potential Directions

  • Software Heritage - accepts SHA256 lookups, but maps to a blob rather than a package directly. Getting from "file exists in archive" to "file belongs to package@version" requires additional traversal
  • deps.dev - I don't think it works with individual file hashes, but it does work for whole artifacts such as JARs and wheels
  • Registry metadata - do some package managers offer a lookup option we can use?
  • We build our own index - I built an apt hash lookup database, which we could potentially extend to other ecosystems
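
As a rough illustration of the "own index" direction, the sketch below stores `(sha256, purl)` pairs in SQLite. The schema, function names, and package versions are assumptions made for this example and do not describe the existing apt database. One design point it tries to show: identical files can ship in more than one package, so a lookup returns a list of candidate purls rather than a single answer.

```python
import hashlib
import sqlite3
from typing import Iterable


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a hash index; schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes ("
        " sha256 TEXT NOT NULL,"
        " purl TEXT NOT NULL,"
        " PRIMARY KEY (sha256, purl))"
    )
    return conn


def index_package(
    conn: sqlite3.Connection, purl: str, file_contents: Iterable[bytes]
) -> None:
    """Record the SHA256 of every file that belongs to one package version."""
    rows = [(hashlib.sha256(data).hexdigest(), purl) for data in file_contents]
    conn.executemany(
        "INSERT OR IGNORE INTO file_hashes (sha256, purl) VALUES (?, ?)", rows
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, sha256_hex: str) -> list[str]:
    """Return every purl known to contain a file with this hash."""
    cur = conn.execute(
        "SELECT purl FROM file_hashes WHERE sha256 = ? ORDER BY purl",
        (sha256_hex.lower(),),
    )
    return [row[0] for row in cur]
```

When a lookup returns several candidates, the file path could serve as the tie-breaking hint described in the Goal.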

@yongjae354

Labels

discussion: Long-term tasks requiring community discussion