Background
sbomit currently resolves files in witness attestations to packages using regex patterns on file paths:
- Python: extracts name/version from site-packages/foo-1.2.3.dist-info
- Go: parses /pkg/mod/<module>@<version>/ from the module cache path
- Rust: splits crate-name-1.0.0 on -, which is ambiguous since crate names can contain hyphens
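To make the three cases above concrete, here is a minimal sketch of path-based resolution. The regexes and function names are hypothetical approximations of the conventions listed, not sbomit's actual patterns.

```python
import re

# Hypothetical patterns approximating the path conventions described above;
# these are NOT sbomit's actual regexes.
PY_DIST_INFO = re.compile(
    r"site-packages/(?P<name>[^/]+?)-(?P<version>[^/-]+)\.dist-info"
)
GO_MOD = re.compile(r"/pkg/mod/(?P<name>.+?)@(?P<version>[^/]+)/")


def resolve_python(path):
    """Return (name, version) from a dist-info directory in the path, or None."""
    m = PY_DIST_INFO.search(path)
    return (m["name"], m["version"]) if m else None


def resolve_go(path):
    """Return (module, version) from a Go module cache path, or None."""
    m = GO_MOD.search(path)
    return (m["name"], m["version"]) if m else None


def rust_candidates(dirname):
    """Return every possible (name, version) split of a hyphen-joined directory name."""
    parts = dirname.split("-")
    return [
        ("-".join(parts[:i]), "-".join(parts[i:]))
        for i in range(1, len(parts))
    ]
```

For a directory named serde-json-1.0.0, rust_candidates yields both ("serde", "json-1.0.0") and ("serde-json", "1.0.0"). The path alone does not say which split is right.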
Problems
- Fragile by design - path structure is a convention, not a contract, and package managers change install layouts across versions
- Ambiguous splits - Rust's hyphen-delimited name-version format has no reliable delimiter
- System packages - files from apt/rpm/apk have no version in their paths at all
- Hash is ignored - witness records a SHA256 for every file, but we never use it for lookup, even though it is our most reliable signal
Goal
Use a file's content hash (already in the attestation) as the primary key for package lookup, falling back to path as a hint. Given sha256:<hash> → return pkg:<ecosystem>/<name>@<version>. This does not touch the network-trace resolver, which already handles HTTP exchange data well. Focus is purely on the file-trace side.
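A minimal sketch of that lookup order, under stated assumptions: the hash index is an in-memory mapping from "sha256:<hash>" to a set of purls, and path_hint is a callable wrapping whatever path-based resolver already exists. Both names are hypothetical. Using the path hint to break ties when several packages ship identical content is one interpretation of "path as a hint", not a settled design.

```python
from typing import AbstractSet, Callable, Mapping, Optional


def resolve_file(
    sha256_hex: str,
    path: str,
    hash_index: Mapping[str, AbstractSet[str]],
    path_hint: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Resolve a file to a purl: content hash first, path only as a hint."""
    key = "sha256:" + sha256_hex.lower()
    candidates = hash_index.get(key, set())

    # Exactly one package ships this content: the hash alone decides.
    if len(candidates) == 1:
        return next(iter(candidates))

    hinted = path_hint(path)

    # Several packages ship identical content: accept the path hint
    # only if it agrees with one of the hash candidates.
    if candidates:
        return hinted if hinted in candidates else None

    # Unknown hash: fall back to the path-based guess (may be None).
    return hinted
```

Returning None in the ambiguous case keeps a wrong guess out of the SBOM. A caller could instead surface all candidates if that turns out to be more useful.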
Potential Directions
- Software Heritage - accepts SHA256 lookups, but maps to a blob rather than a package directly. Getting from "file exists in archive" to "file belongs to package@version" requires additional traversal
- deps.dev - I don't think it supports lookups by individual file hash, but it does work for whole artifacts such as jars and wheels
- Registry metadata - do any package managers offer a lookup option we can use?
- We build our own index - I built an apt hash lookup database, which we could potentially extend to other ecosystems
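For the last direction, one possible shape for such an index is sketched below. It is an illustration only and does not describe how the existing apt database is built. The function name index_package and the in-memory dict are assumptions. The index maps each hash to a set of purls because identical files (license texts, empty files) commonly appear in more than one package.

```python
import hashlib
from pathlib import Path


def index_package(root, purl, index):
    """Hash every regular file under an unpacked package and record its purl.

    `index` maps "sha256:<hex>" to the set of purls that ship that content.
    """
    for f in sorted(Path(root).rglob("*")):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            index.setdefault("sha256:" + digest, set()).add(purl)
    return index
```

Calling index_package once per unpacked package version, with the same index dict, accumulates a lookup table keyed by the same sha256 values that witness already records.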
@yongjae354