feat: add opt-in local-first LLM enrichment scaffolding by Copilot · Pull Request #1 · DmytroKashchuk/Syftient

Copilot · 2026-04-27T02:02:43Z

Establishes the architectural foundation for AI-powered SBOM enrichment using a local LLM backend (Ollama). Scaffolding only — interfaces and structure, no production-grade prompts or full classification logic yet. Feature is disabled by default; default Syft behaviour is completely unchanged.

New packages

`internal/llm/`

client.go — provider-agnostic Client interface (Generate, HealthCheck, ModelInfo); future providers (OpenAI, Anthropic) implement this without touching the pipeline
types.go — Request, Response, ModelInfo, Evidence structs; Request carries prompt, system prompt, JSON schema, temperature, seed, max tokens, timeout
ollama.go — OllamaClient via plain net/http (no SDK); httpDoer interface stubs HTTP for unit tests; 1 retry, configurable endpoint/model/timeout
cache.go — CachedClient decorator over internal/cache; key = sha256(prompt + systemPrompt + model + modelVersion)
mock.go — MockClient with call recording and canned response/error queues for use across test suites
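
The key derivation described for cache.go could look roughly like the sketch below. The function name `cacheKey` and the NUL separator between fields are assumptions made for illustration; the PR text only states that the four fields are hashed together with SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey returns a hex-encoded SHA-256 digest over the four key components.
// The NUL byte written after each field is an assumption added here so that
// ("ab", "c") and ("a", "bc") cannot produce the same key.
func cacheKey(prompt, systemPrompt, model, modelVersion string) string {
	h := sha256.New()
	for _, part := range []string{prompt, systemPrompt, model, modelVersion} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	k := cacheKey("classify the license of foo", "answer with an SPDX ID", "llama3.2:3b", "v1")
	fmt.Println(len(k)) // 64 hex characters for a 32-byte digest
}
```

Including the model and its version in the key means a model upgrade naturally invalidates earlier cached answers.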

`syft/pkg/cataloger/llmenrich/`

Post-processor (not a cataloger). Separate from syft/pkg/cataloger/ai/ (GGUF model file cataloger).

enricher.go — EnrichmentTask interface + Orchestrator (token budget, task name filtering, graceful per-package failure logging)
license_classifier.go — skeleton task targeting packages with NOASSERTION/empty SPDX licenses; placeholder SPDX enum with TODO for full internal/spdxlicense integration; prompt template with TODO for few-shot tuning
evidence.go — AttachEvidence/GetEvidence storing llm.Evidence{source, model, confidence, prompt_hash} in pkg.Package.Metadata under key "llm-evidence" without altering the SBOM schema
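
As a rough illustration of the evidence helpers, the sketch below stores and retrieves an `Evidence` value under the "llm-evidence" key. The `Package` type here is a simplified stand-in with a map-typed `Metadata` field, which is an assumption; the real `pkg.Package` and the actual helper signatures in the PR may differ.

```go
package main

import "fmt"

// Evidence carries the audit fields named in the PR description.
type Evidence struct {
	Source     string  `json:"source"`
	Model      string  `json:"model"`
	Confidence float64 `json:"confidence"`
	PromptHash string  `json:"prompt_hash"`
}

// Package is a hypothetical stand-in for pkg.Package, reduced to what the
// sketch needs.
type Package struct {
	Name     string
	Metadata map[string]any
}

const evidenceKey = "llm-evidence"

// AttachEvidence returns a copy of p whose metadata also holds ev.
// The metadata map is copied so the caller's package is left untouched.
func AttachEvidence(p Package, ev Evidence) Package {
	md := make(map[string]any, len(p.Metadata)+1)
	for k, v := range p.Metadata {
		md[k] = v
	}
	md[evidenceKey] = ev
	p.Metadata = md
	return p
}

// GetEvidence reports the evidence attached to p, if any.
func GetEvidence(p Package) (Evidence, bool) {
	ev, ok := p.Metadata[evidenceKey].(Evidence)
	return ev, ok
}

func main() {
	p := Package{Name: "left-pad"}
	q := AttachEvidence(p, Evidence{Source: "llm", Model: "llama3.2:3b", Confidence: 0.9, PromptHash: "abc123"})
	ev, ok := GetEvidence(q)
	fmt.Println(ok, ev.Source, ev.Confidence) // true llm 0.9
}
```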

`cmd/syft/internal/options/llm.go`

type LLM struct {
    Enabled       bool          // default: false — opt-in only
    Provider      string        // "ollama" only in this release; validated in PostLoad
    Endpoint      string        // default: http://localhost:11434
    Model         string        // default: llama3.2:3b
    Timeout       time.Duration // default: 30s
    Temperature   float64       // default: 0.0 (deterministic)
    MinConfidence float64       // default: 0.75
    Tasks         []string      // empty = all; "licenses" only in this release
    MaxTokens     int           // default: 100000 (soft per-scan budget)
}
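
A minimal sketch of how the documented defaults and the `PostLoad` provider check might fit together is shown below. `DefaultLLM` is a hypothetical constructor, the "ollama" provider default is an assumption, and the range check on `MinConfidence` and the task-name check go beyond what the PR states.

```go
package main

import (
	"fmt"
	"time"
)

// LLM mirrors the option fields listed above (struct tags omitted).
type LLM struct {
	Enabled       bool
	Provider      string
	Endpoint      string
	Model         string
	Timeout       time.Duration
	Temperature   float64
	MinConfidence float64
	Tasks         []string
	MaxTokens     int
}

// DefaultLLM is a hypothetical constructor returning the documented defaults.
func DefaultLLM() LLM {
	return LLM{
		Enabled:       false,    // opt-in only
		Provider:      "ollama", // assumed default; the only supported value
		Endpoint:      "http://localhost:11434",
		Model:         "llama3.2:3b",
		Timeout:       30 * time.Second,
		Temperature:   0.0,
		MinConfidence: 0.75,
		MaxTokens:     100000,
	}
}

// PostLoad rejects values this release cannot honor.
func (o *LLM) PostLoad() error {
	if o.Provider != "ollama" {
		return fmt.Errorf("unsupported llm provider %q: only \"ollama\" is available", o.Provider)
	}
	if o.MinConfidence < 0 || o.MinConfidence > 1 {
		return fmt.Errorf("min-confidence must be within [0, 1], got %v", o.MinConfidence)
	}
	for _, t := range o.Tasks {
		if t != "licenses" {
			return fmt.Errorf("unknown llm task %q", t)
		}
	}
	return nil
}

func main() {
	cfg := DefaultLLM()
	fmt.Println(cfg.PostLoad()) // <nil>
	cfg.Provider = "openai"
	fmt.Println(cfg.PostLoad() != nil) // true
}
```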

Wiring

Single additive hook in scan.go after SBOM generation:

applyLLMEnrichment(ctx, s, &opts.LLM)

No-op when Enabled=false. On health-check failure: logs one warning, returns unmodified SBOM (graceful degradation, never fails the scan).

Documentation

docs/llm-enrichment.md — Overview, Why local-first, Quickstart with docker-compose, config reference, how to add a new EnrichmentTask, privacy & data handling, limitations, roadmap
README.md — "🤖 AI Enrichment (Experimental, opt-in)" section linking to docs

Roadmap (follow-up PRs)

License Classifier prompt tuning + few-shot examples
Full SPDX list integration from internal/spdxlicense
Benchmark / golden-file dataset
Additional tasks: Unknown Binary Identifier, CPE normalisation
Embedding pre-filter
Optional cloud providers (OpenAI, Anthropic)

Type of change

New feature (non-breaking change which adds functionality)
Documentation (updates the documentation)

Checklist

I have added unit tests that cover changed behavior
I have tested my code in common scenarios and confirmed there are no regressions
I have added comments to my code, particularly in hard-to-understand sections

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

get.anchore.io
- Triggering command: `/usr/bin/curl curl -sSfL REDACTED` (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled

Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

Goal

Add the scaffolding (skeleton, no full business logic yet) for an opt-in, local-first LLM enrichment layer in Syftient. The goal of this PR is to establish the architectural foundation for AI-powered SBOM enrichment using a local LLM backend (Ollama) — without sending any data to cloud providers.

This PR must NOT include the final tuned prompts, benchmarks, or full classification logic. Those will follow in subsequent PRs. The goal is to land a clean, reviewable, production-quality skeleton that compiles, has passing tests, and clearly defines extension points (// TODO markers) for future feature work.

Design principles (must respect ALL of them)

Opt-in only: the LLM enrichment MUST be disabled by default. The default behavior of syft must be unchanged.
Graceful degradation: if --llm-enabled is set but Ollama is unreachable, log a clear warning and continue producing the standard SBOM. NEVER fail the scan because the LLM is unavailable.
Provider-agnostic interface: define a Client interface in internal/llm/ so future providers (OpenAI, Anthropic, etc.) can be added without refactoring. For this PR only an Ollama implementation is needed.
Determinism: default temperature=0, support for explicit seed, structured JSON output via Ollama's format parameter.
Cache reuse: reuse the existing internal/cache package. Cache key MUST be sha256(promptTemplate + input + model + modelVersion).
Privacy: pass any input through internal/redact before sending to the LLM provider. Add a TODO in the right place if redaction integration is non-trivial.
Auditability: every LLM-derived field in the SBOM must be marked with structured evidence metadata (source: "llm", model, confidence, prompt_hash).
Zero telemetry: no calls home, ever.
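
To make the determinism principle concrete, the sketch below builds a request body for Ollama's /api/generate endpoint with temperature 0, an explicit seed, and `format: "json"`. The Go type and function names are hypothetical; the JSON field names (`model`, `prompt`, `system`, `format`, `stream`, `options.temperature`, `options.seed`, `options.num_predict`) follow Ollama's documented API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateOptions maps to the "options" object of /api/generate.
type generateOptions struct {
	Temperature float64 `json:"temperature"`
	Seed        *int    `json:"seed,omitempty"`        // nil means no explicit seed
	NumPredict  int     `json:"num_predict,omitempty"` // cap on generated tokens
}

// generateRequest is the subset of the request body this sketch uses.
type generateRequest struct {
	Model   string          `json:"model"`
	Prompt  string          `json:"prompt"`
	System  string          `json:"system,omitempty"`
	Format  string          `json:"format,omitempty"`
	Stream  bool            `json:"stream"`
	Options generateOptions `json:"options"`
}

// buildGenerateBody encodes a non-streaming, JSON-formatted request with
// temperature fixed at 0 so repeated calls are as reproducible as the
// backend allows.
func buildGenerateBody(model, prompt, system string, seed *int) ([]byte, error) {
	return json.Marshal(generateRequest{
		Model:   model,
		Prompt:  prompt,
		System:  system,
		Format:  "json",
		Stream:  false,
		Options: generateOptions{Temperature: 0, Seed: seed},
	})
}

func main() {
	seed := 42
	body, err := buildGenerateBody("llama3.2:3b", "Which SPDX license applies?", "", &seed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```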

Scope of this PR (what to create)

1. `internal/llm/` — new package

Create the following files:

client.go — defines the provider-agnostic Client interface:

type Client interface {
    Generate(ctx context.Context, req Request) (*Response, error)
    HealthCheck(ctx context.Context) error
    ModelInfo() ModelInfo
}

types.go — defines Request, Response, ModelInfo, Evidence structs. The Request must support: prompt, system prompt, JSON schema for structured output, temperature, seed, max tokens, timeout. The Response must include: parsed content, raw content, latency, token counts, model used, prompt hash, confidence (if extracted from JSON output).
ollama.go — OllamaClient implementation that talks to http://localhost:11434 (configurable). Use POST /api/generate with format: "json" for structured outputs and GET /api/tags for the health check (Ollama serves /api/tags as a GET endpoint). Implement a timeout, retries (at most 1, no backoff), and clean error wrapping. Stub the actual HTTP calls behind a small internal interface so they can be unit-tested without a real Ollama instance.
cache.go — thin wrapper around internal/cache GetResolverCachingErrors[Response] that handles key derivation (the sha256 mentioned above). Expose a CachedClient decorator that wraps any Client.
mock.go — a MockClient implementation for use in tests across the codebase. Should allow recording calls and returning canned responses.
client_test.go, ollama_test.go, cache_test.go — unit tests with the mock and with httptest.NewServer for the Ollama HTTP layer. Tests must NOT require a real Ollama running.

2. `cmd/syft/internal/options/llm.go` — new CLI options file

Follow the patterns of existing option files like cache.go, golang.go, python.go in the same directory.

Add a LLM struct with these fields (use proper mapstructure tags):

type LLM struct {
    Enabled       bool          `yaml:"enabled" json:"enabled" mapstructure:"enabled"`
    Provider      string        `yaml:"provider" json:"provider" mapstructure:"provider"`         // only "ollama" supported in this PR; validate
    Endpoint      string        `yaml:"endpoint" json:"endpoint" mapstructure:"endpoint"`         // default http://localhost:11434
    Model         string        `yaml:"model" json:"model" mapstructure:"model"`                  // default llama3.2:3b
    Timeout       time.Duration `yaml:"timeout" json:"timeout" mapstructure:"timeout"`            // default 30s
    Temperature   float64       `yaml:"temperature" json:"temperature" mapstructure:"temperature"` // default 0.0
    MinConfidence float64       `yaml:"min-confidence" json:"min_confidence" mapstructure:"min-confidence"` // default 0.75
    Tasks         []string      `yaml:"tasks" json:"tasks" mapstructure:"tasks"`                  // empty = all enabled tasks; in this PR only "licenses" is registered
    MaxTokens     int           `yaml:"max-tokens" json:"max_tokens" mapstructure:"max-tokens"`   // soft budget per scan, default 100000
}

Implement AddFlags, `Describe...

This pull request was created from Copilot chat.

- Add internal/llm package with Client interface, OllamaClient, CachedClient, MockClient
- Add cmd/syft/internal/options/llm.go with LLM config struct and flags
- Add syft/pkg/cataloger/llmenrich package with Orchestrator, LicenseClassifier, Evidence helpers
- Wire LLM enrichment into scan command (opt-in, graceful degradation)
- Add docs/llm-enrichment.md and README.md section
- All new unit tests pass, go build ./... and targeted tests pass
Agent-Logs-Url: https://github.com/DmytroKashchuk/Syftient/sessions/796bdcdd-ddd8-4fda-88e1-d6ee2802c93a
Co-authored-by: DmytroKashchuk <31933655+DmytroKashchuk@users.noreply.github.com>

- Fix Applies() to check only SPDXExpression (not raw Value) for license classification
- Fix token budget tracking: increment per enriched package with TODO for proper token counting
- Fix spelling: licence -> license in comment
Agent-Logs-Url: https://github.com/DmytroKashchuk/Syftient/sessions/796bdcdd-ddd8-4fda-88e1-d6ee2802c93a
Co-authored-by: DmytroKashchuk <31933655+DmytroKashchuk@users.noreply.github.com>

- Document shallow copy safety in LicenseClassifier.Enrich
- Add comments explaining defaultTimeout and maxRetries constants
- Improve TestOrchestrator_EnrichesPackages to verify ID replacement
Agent-Logs-Url: https://github.com/DmytroKashchuk/Syftient/sessions/796bdcdd-ddd8-4fda-88e1-d6ee2802c93a
Co-authored-by: DmytroKashchuk <31933655+DmytroKashchuk@users.noreply.github.com>

Initial plan

eb62df6

Copilot AI assigned Copilot and DmytroKashchuk Apr 27, 2026

Copilot started work on behalf of DmytroKashchuk April 27, 2026 02:02 View session

Copilot AI and others added 3 commits April 27, 2026 02:16

Copilot AI changed the title ~~[WIP] Add scaffolding for local-first LLM enrichment layer~~ feat: add opt-in local-first LLM enrichment scaffolding Apr 27, 2026

Copilot AI requested a review from DmytroKashchuk April 27, 2026 02:26

Copilot finished work on behalf of DmytroKashchuk April 27, 2026 02:26


Copilot wants to merge 4 commits into main from copilot/add-llm-enrichment-skeleton
