Improve SBOM file filtering logic (prepare for SOURCE classification support)
Description
Today, when loading file hashes from an SPDX SBOM, Micromize filters files using:
- fileTypes = BINARY or APPLICATION
- Plus a path-based mitigation: loading files under common executable paths (/bin, /sbin, /lib, /lib64, etc.)
The path-based rule is a defensive workaround: Syft currently classifies many executable scripts as TEXT, making them indistinguishable from non-executable text files (config, docs, etc.).
If Syft improves SPDX classification to emit SOURCE for shebang scripts (see anchore/syft#4640), we should:
- Update filtering logic to include:
  - BINARY
  - APPLICATION
  - SOURCE
- Reduce reliance on path-based heuristics.
- Avoid loading unnecessary non-executable text artifacts into enforcement maps.
Proposed direction
Short term (Done)
- Refactor filtering logic into a dedicated classification function.
- Make path-based rules explicit and isolated.
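A minimal sketch of what such a dedicated classification function could look like. It is not Micromize's actual code: the names shouldLoadFile, hasAllowedType, underExecutablePath, allowedFileTypes, and executablePathPrefixes are hypothetical, and the prefix list contains only the four paths named above (the real list is longer, per the "etc."). It also assumes SBOM file paths are absolute.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedFileTypes mirrors the SPDX fileTypes described above as loaded today.
var allowedFileTypes = map[string]bool{
	"BINARY":      true,
	"APPLICATION": true,
}

// executablePathPrefixes holds the path-based mitigation in one place.
// Assumption: only the four paths named in this issue are listed here.
var executablePathPrefixes = []string{"/bin/", "/sbin/", "/lib/", "/lib64/"}

// hasAllowedType reports whether any of the file's SPDX types is allowed.
func hasAllowedType(fileTypes []string) bool {
	for _, t := range fileTypes {
		if allowedFileTypes[t] {
			return true
		}
	}
	return false
}

// underExecutablePath reports whether path starts with a listed directory.
func underExecutablePath(path string) bool {
	for _, prefix := range executablePathPrefixes {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

// shouldLoadFile is the single classification entry point (hypothetical name).
// A file is loaded if its type is allowed OR the path heuristic matches.
func shouldLoadFile(path string, fileTypes []string) bool {
	return hasAllowedType(fileTypes) || underExecutablePath(path)
}

func main() {
	fmt.Println(shouldLoadFile("/opt/app/server", []string{"BINARY"})) // true: allowed type
	fmt.Println(shouldLoadFile("/bin/entrypoint.sh", []string{"TEXT"})) // true: path heuristic
	fmt.Println(shouldLoadFile("/etc/app.conf", []string{"TEXT"}))      // false: neither rule matches
}
```

Keeping the type check and the path check in separate helpers means the path heuristic can later be narrowed or removed without touching the type logic.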
Medium term (if and when the Syft change is available)
- Include SOURCE in the allowed file types.
- Optionally gate behavior behind a feature flag to maintain backward compatibility.
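One possible shape for the flag-gated variant, as a sketch only. FilterOptions, AllowSource, allowedTypes, and typeAllowed are hypothetical names and not existing Micromize options; the default (flag off) keeps today's BINARY/APPLICATION behavior.

```go
package main

import "fmt"

// FilterOptions carries a hypothetical feature flag for SOURCE support.
type FilterOptions struct {
	// AllowSource, when true, also accepts files typed SOURCE.
	// The zero value (false) preserves the current behavior.
	AllowSource bool
}

// allowedTypes builds the set of accepted SPDX file types for the options.
func allowedTypes(opts FilterOptions) map[string]bool {
	types := map[string]bool{
		"BINARY":      true,
		"APPLICATION": true,
	}
	if opts.AllowSource {
		types["SOURCE"] = true
	}
	return types
}

// typeAllowed reports whether any of the file's SPDX types is accepted.
func typeAllowed(fileTypes []string, opts FilterOptions) bool {
	allowed := allowedTypes(opts)
	for _, t := range fileTypes {
		if allowed[t] {
			return true
		}
	}
	return false
}

func main() {
	script := []string{"SOURCE"}
	fmt.Println(typeAllowed(script, FilterOptions{}))                  // false: flag off
	fmt.Println(typeAllowed(script, FilterOptions{AllowSource: true})) // true: flag on
}
```

With the flag off, SBOMs produced by older Syft versions are handled exactly as before, so the flag only matters once SOURCE actually appears in the input.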
Long term
- Consider augmenting classification with executable-bit metadata if available from SBOM generators in the future.
- Re-evaluate whether path-based heuristics can be fully removed once classification quality improves.