Improve SBOM file filtering logic #89

@dorser

Improve SBOM file filtering logic (prepare for SOURCE classification support)

Description

Today, when loading file hashes from an SPDX SBOM, Micromize filters files using:

  • fileTypes = BINARY or APPLICATION
  • Plus a path-based mitigation: loading files under common executable paths (/bin, /sbin, /lib, /lib64, etc.)
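
The current rule described above can be sketched roughly as follows. This is an illustration in Python, not Micromize's actual code: the function name `should_load` and the constants are hypothetical, the prefix list contains only the four paths named above (the real list is longer), and file entries are assumed to follow the SPDX JSON layout with `fileName` and `fileTypes` fields and absolute paths.

```python
# Hypothetical sketch of the current filter; names are illustrative only.
EXECUTABLE_TYPES = {"BINARY", "APPLICATION"}

# Only the paths named in this issue; the real list includes more ("etc.").
EXECUTABLE_PATH_PREFIXES = ("/bin/", "/sbin/", "/lib/", "/lib64/")


def should_load(file_entry: dict) -> bool:
    """Return True if the SPDX file entry should be loaded for enforcement."""
    # Rule 1: the SBOM generator classified the file as executable content.
    file_types = set(file_entry.get("fileTypes", []))
    if file_types & EXECUTABLE_TYPES:
        return True

    # Rule 2 (mitigation): the file lives under a common executable path,
    # which catches scripts that were classified as TEXT.
    path = file_entry.get("fileName", "")
    return path.startswith(EXECUTABLE_PATH_PREFIXES)
```

With this rule, a TEXT file at `/bin/run.sh` is loaded through the path mitigation, while a TEXT file at `/etc/passwd` is skipped.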

This is a defensive workaround: Syft currently classifies many executable scripts as TEXT, which makes them indistinguishable from non-executable text files (config, docs, etc.).

If Syft improves SPDX classification to emit SOURCE for shebang scripts (See: anchore/syft#4640), we should:

  1. Update filtering logic to include:
    • BINARY
    • APPLICATION
    • SOURCE
  2. Reduce reliance on path-based heuristics.
  3. Avoid loading unnecessary non-executable text artifacts into enforcement maps.

Proposed direction

Short term (Done)

  • Refactor filtering logic into a dedicated classification function.
  • Make path-based rules explicit and isolated.
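
One possible shape for such a classification function is sketched below, again as a hypothetical Python illustration rather than the refactor that was actually done. The `Decision` enum, `classify`, and `matches_path_rule` are assumed names. Returning the reason for a decision keeps the path heuristic in one isolated helper and makes it possible to measure how often it is still needed.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    """Why a file was (or was not) selected; hypothetical names."""
    LOAD_BY_TYPE = "type"
    LOAD_BY_PATH = "path"
    SKIP = "skip"


ALLOWED_FILE_TYPES = frozenset({"BINARY", "APPLICATION"})

# Only the directories named in this issue; the real list includes more.
EXECUTABLE_DIRS = ("/bin", "/sbin", "/lib", "/lib64")


def matches_path_rule(path: str) -> bool:
    """The path heuristic, isolated so it can be changed or removed alone."""
    return any(path.startswith(d + "/") for d in EXECUTABLE_DIRS)


def classify(file_types: Iterable[str], path: str) -> Decision:
    """Classify one SBOM file entry; type-based rules take precedence."""
    if ALLOWED_FILE_TYPES.intersection(file_types):
        return Decision.LOAD_BY_TYPE
    if matches_path_rule(path):
        return Decision.LOAD_BY_PATH
    return Decision.SKIP
```

Counting `LOAD_BY_PATH` results over real SBOMs would give a direct signal of how much the filter still depends on the heuristic.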

Medium term (if and when the Syft change is available)

  • Include SOURCE in the allowed file types.
  • Optionally gate behavior behind a feature flag to maintain backward compatibility.
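
A minimal sketch of the flag-gated variant, in Python for illustration only. The flag name `include_source` and both function names are hypothetical, and the sketch assumes Syft emits SOURCE for shebang scripts, which depends on the upstream change. With the flag off, behavior matches the current type-based rule.

```python
from typing import Iterable

BASE_FILE_TYPES = frozenset({"BINARY", "APPLICATION"})


def allowed_file_types(include_source: bool = False) -> frozenset:
    """Return the set of SPDX file types accepted by the filter.

    include_source is a hypothetical feature flag; it defaults to False so
    existing deployments keep the current behavior.
    """
    if include_source:
        return BASE_FILE_TYPES | {"SOURCE"}
    return BASE_FILE_TYPES


def is_allowed_by_type(file_types: Iterable[str],
                       include_source: bool = False) -> bool:
    """True if any of the entry's file types is in the allowed set."""
    return bool(allowed_file_types(include_source).intersection(file_types))
```

Path-based rules are deliberately left out of this sketch; they would still apply alongside it until classification quality allows them to be reduced.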

Long term

  • Consider augmenting classification with executable-bit metadata if available from SBOM generators in the future.
  • Re-evaluate whether path-based heuristics can be fully removed once classification quality improves.

Labels: enhancement (New feature or request)