feat: add opt-in local-first LLM enrichment scaffolding #1
Draft
Copilot wants to merge 4 commits into
- Add internal/llm package with Client interface, OllamaClient, CachedClient, MockClient
- Add cmd/syft/internal/options/llm.go with LLM config struct and flags
- Add syft/pkg/cataloger/llmenrich package with Orchestrator, LicenseClassifier, Evidence helpers
- Wire LLM enrichment into scan command (opt-in, graceful degradation)
- Add docs/llm-enrichment.md and README.md section
- All new unit tests pass, go build ./... and targeted tests pass

Agent-Logs-Url: https://github.com/DmytroKashchuk/Syftient/sessions/796bdcdd-ddd8-4fda-88e1-d6ee2802c93a
Co-authored-by: DmytroKashchuk <31933655+DmytroKashchuk@users.noreply.github.com>

- Fix Applies() to check only SPDXExpression (not raw Value) for license classification
- Fix token budget tracking: increment per enriched package with TODO for proper token counting
- Fix spelling: licence -> license in comment

Agent-Logs-Url: https://github.com/DmytroKashchuk/Syftient/sessions/796bdcdd-ddd8-4fda-88e1-d6ee2802c93a
Co-authored-by: DmytroKashchuk <31933655+DmytroKashchuk@users.noreply.github.com>

- Document shallow copy safety in LicenseClassifier.Enrich
- Add comments explaining defaultTimeout and maxRetries constants
- Improve TestOrchestrator_EnrichesPackages to verify ID replacement

Agent-Logs-Url: https://github.com/DmytroKashchuk/Syftient/sessions/796bdcdd-ddd8-4fda-88e1-d6ee2802c93a
Co-authored-by: DmytroKashchuk <31933655+DmytroKashchuk@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add scaffolding for local-first LLM enrichment layer" to "feat: add opt-in local-first LLM enrichment scaffolding" on Apr 27, 2026.
Establishes the architectural foundation for AI-powered SBOM enrichment using a local LLM backend (Ollama). Scaffolding only — interfaces and structure, no production-grade prompts or full classification logic yet. Feature is disabled by default; default Syft behaviour is completely unchanged.
New packages
`internal/llm/`
- `client.go`: provider-agnostic `Client` interface (`Generate`, `HealthCheck`, `ModelInfo`); future providers (OpenAI, Anthropic) implement this without touching the pipeline
- `types.go`: `Request`, `Response`, `ModelInfo`, `Evidence` structs; `Request` carries prompt, system prompt, JSON schema, temperature, seed, max tokens, timeout
- `ollama.go`: `OllamaClient` via plain `net/http` (no SDK); an `httpDoer` interface stubs HTTP for unit tests; 1 retry, configurable endpoint/model/timeout
- `cache.go`: `CachedClient` decorator over `internal/cache`; key = `sha256(prompt + systemPrompt + model + modelVersion)`
- `mock.go`: `MockClient` with call recording and canned response/error queues for use across test suites

`syft/pkg/cataloger/llmenrich/`
Post-processor (not a cataloger), separate from `syft/pkg/cataloger/ai/` (the GGUF model file cataloger).
- `enricher.go`: `EnrichmentTask` interface + `Orchestrator` (token budget, task name filtering, graceful per-package failure logging)
- `license_classifier.go`: skeleton task targeting packages with `NOASSERTION`/empty SPDX licenses; placeholder SPDX enum with `TODO` for full `internal/spdxlicense` integration; prompt template with `TODO` for few-shot tuning
- `evidence.go`: `AttachEvidence`/`GetEvidence` storing `llm.Evidence{source, model, confidence, prompt_hash}` in `pkg.Package.Metadata` under key `"llm-evidence"` without altering the SBOM schema

`cmd/syft/internal/options/llm.go`

Wiring
Single additive hook in `scan.go` after SBOM generation:
No-op when `Enabled=false`. On health-check failure: logs one warning, returns unmodified SBOM (graceful degradation, never fails the scan).

Documentation
- `docs/llm-enrichment.md`: Overview, Why local-first, Quickstart with docker-compose, config reference, how to add a new `EnrichmentTask`, privacy & data handling, limitations, roadmap
- `README.md`: "🤖 AI Enrichment (Experimental, opt-in)" section linking to docs

Roadmap (follow-up PRs)
`internal/spdxlicense`

Type of change
Checklist
Warning
Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
- `get.anchore.io`: triggered by `/usr/bin/curl curl -sSfL REDACTED` (dns block)

Original prompt
Goal
Add the scaffolding (skeleton, no full business logic yet) for an opt-in, local-first LLM enrichment layer in Syftient. The goal of this PR is to establish the architectural foundation for AI-powered SBOM enrichment using a local LLM backend (Ollama) — without sending any data to cloud providers.
This PR must NOT include the final tuned prompts, benchmarks, or full classification logic. Those will follow in subsequent PRs. The goal is to land a clean, reviewable, production-quality skeleton that compiles, has passing tests, and clearly defines extension points (`// TODO` markers) for future feature work.

Design principles (must respect ALL of them)
- … `syft` must be unchanged.
- … `--llm-enabled` is set but Ollama is unreachable, log a clear warning and continue producing the standard SBOM. NEVER fail the scan because the LLM is unavailable.
- … `Client` interface in `internal/llm/` so future providers (OpenAI, Anthropic, etc.) can be added without refactoring. For this PR only an Ollama implementation is needed.
- … `format` parameter.
- … `internal/cache` package. Cache key MUST be `sha256(promptTemplate + input + model + modelVersion)`.
- … `internal/redact` before sending to the LLM provider. Add a TODO in the right place if redaction integration is non-trivial.
- … `source: "llm"`, `model`, `confidence`, `prompt_hash`).

Scope of this PR (what to create)
1.
`internal/llm/`: new package
Create the following files:
- `client.go`: defines the provider-agnostic `Client` interface:
- `types.go`: defines `Request`, `Response`, `ModelInfo`, `Evidence` structs. The `Request` must support: prompt, system prompt, JSON schema for structured output, temperature, seed, max tokens, timeout. The `Response` must include: parsed content, raw content, latency, token counts, model used, prompt hash, confidence (if extracted from JSON output).
- `ollama.go`: `OllamaClient` implementation that talks to `http://localhost:11434` (configurable). Use `POST /api/generate` with `format: "json"` for structured outputs and `GET /api/tags` for the health check. Implement timeout, retries (1 retry max, simple), and clean error wrapping. Stub the actual HTTP calls behind a small internal interface so they can be unit-tested without a real Ollama instance.
- `cache.go`: thin wrapper around `internal/cache` `GetResolverCachingErrors[Response]` that handles key derivation (the sha256 mentioned above). Expose a `CachedClient` decorator that wraps any `Client`.
- `mock.go`: a `MockClient` implementation for use in tests across the codebase. Should allow recording calls and returning canned responses.
- `client_test.go`, `ollama_test.go`, `cache_test.go`: unit tests with the mock and with `httptest.NewServer` for the Ollama HTTP layer. Tests must NOT require a real Ollama instance to be running.

2.
`cmd/syft/internal/options/llm.go`: new CLI options file
Follow the patterns of existing option files like `cache.go`, `golang.go`, `python.go` in the same directory.
Add an `LLM` struct with these fields (use proper `mapstructure` tags):
Implement `AddFlags`, `Describe...

This pull request was created from Copilot chat.