Status: active
What you'll learn: Drive the full CVE response pipeline on a synthetic CVE — exposure calculation, AI-generated runbook, operator approval, orchestrated remediation via rolling upgrades, verification, learning capture.
Time: ~75 min (mostly waiting for batched remediation)
Builds on: Tutorial 06. CVE remediation orchestrates the same rolling_module_upgrade skill across all affected modules. You should have the rolling upgrade machinery working end-to-end first.
Sets you up for: Tutorial 09 (Honeypot canary), a different defensive posture (detection of unauthorized access vs. patching of known vulnerabilities); both use the same autonomy + approval rails.
flowchart TD
Op[Security operator] --> Inject[Inject synthetic CVE<br/>DRILL: CVE-2026-99001]
Inject --> EC[ExposureCalculator]
EC --> Calc[Walk SBOMs<br/>match affected_packages]
Calc --> Exp[CveExposure rows:<br/>system-base × 42 instances<br/>nginx × 18 instances]
Exp --> Triage[CVE Responder tick<br/>cve_response skill]
Triage --> Plan[Plan:<br/>system-base 1.0.3 → 1.0.4<br/>nginx 1.24 → 1.26]
Plan --> RG[Concierge execute_agent<br/>cve_runbook_generate skill]
RG --> Page[Page persisted<br/>shareable runbook]
Plan --> Apr[ApprovalRequest<br/>risk_score=95]
Apr --> Op2{Operator approves?}
Op2 -->|yes| CRE[CveResponseExecutor]
CRE --> RU1[rolling_module_upgrade<br/>system-base 42 instances]
CRE --> RU2[rolling_module_upgrade<br/>nginx 18 instances]
RU1 --> Verify
RU2 --> Verify[ExposureCalculator re-run]
Verify --> Final{exposed_count == 0?}
Final -->|yes| Done[create_learning<br/>compound learning]
Final -->|no| AF[attribute_failure<br/>silent instances]
By the end you'll have rehearsed the CVE response flow against a synthetic CVE and produced a learning that compounds for the team.
The CVE response pipeline has 4 stages:
- Ingest: CVE records come from NVD (hourly SystemCveFeedJob) or are injected manually for drills. Match keys are affected_packages (name + version range). The feed ingester (CveOps::FeedIngestService) paginates the NVD 2.0 API (resultsPerPage/startIndex), pulling up to POWERNODE_NVD_MAX_PAGES × POWERNODE_NVD_PAGE_SIZE results per run (default 25 × 2000 = up to 50k CVEs), so a large window is fully pulled across the loop rather than capped at one page.
- Expose: ExposureCalculator walks each NodeModule's SBOM (ingested per Tutorial 02 from cosign attestations) and computes per-instance exposure. Output is one CveExposure row per (CVE, NodeInstance) pair.
- Triage: the cve_response skill builds a remediation plan: for each affected module, what version bump removes the exposure, batched by instance count.
- Remediate: CveResponseExecutor orchestrates one rolling_module_upgrade per affected module. Upgrades run in parallel when modules are independent and in sequence when dependency_spec requires it.
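The Ingest stage's paging loop can be sketched as follows. This is a minimal illustration, not the CveOps::FeedIngestService implementation: fetchPage and makeStubFeed are hypothetical stand-ins so the loop runs offline, while resultsPerPage, startIndex, totalResults, and vulnerabilities follow the NVD 2.0 response shape. The defaults mirror the 25 × 2000 cap described above.

```javascript
// Sketch only: fetchPage stands in for one NVD 2.0 request and is hypothetical.
async function pullAllPages(fetchPage, { pageSize = 2000, maxPages = 25 } = {}) {
  const records = [];
  for (let page = 0; page < maxPages; page++) {
    const startIndex = page * pageSize;
    const { totalResults, vulnerabilities } = await fetchPage({
      resultsPerPage: pageSize,
      startIndex,
    });
    records.push(...vulnerabilities);
    // Stop early once the window is exhausted; otherwise the maxPages cap ends the loop.
    const exhausted = startIndex + vulnerabilities.length >= totalResults;
    if (exhausted || vulnerabilities.length === 0) break;
  }
  return records;
}

// Offline stub that serves `total` synthetic records and logs each startIndex requested.
function makeStubFeed(total) {
  const calls = [];
  const fetchPage = async ({ resultsPerPage, startIndex }) => {
    calls.push(startIndex);
    const count = Math.max(0, Math.min(resultsPerPage, total - startIndex));
    const vulnerabilities = Array.from({ length: count }, (_, i) => ({
      id: `STUB-${startIndex + i}`,
    }));
    return { totalResults: total, vulnerabilities };
  };
  return { fetchPage, calls };
}

const demoFeed = makeStubFeed(4500);
pullAllPages(demoFeed.fetchPage).then((records) => {
  // 4500 records, fetched with startIndex 0, 2000, and 4000
  console.log(records.length, demoFeed.calls);
});
```

With the default cap, a window larger than 50,000 records would be truncated at the 25th page, so a real ingester would presumably carry the remainder into the next run.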
requires_approval always fires for risk_score ≥ 50. Lower-risk CVEs
can have auto-remediation policies, but production deployments typically
gate everything.
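The gating rule can be read as a small predicate. The sketch below is illustrative only: the threshold of 50 comes from the rule above, while the policy object and its autoRemediateBelowThreshold flag are hypothetical names, not documented platform settings.

```javascript
// From the rule above: risk_score >= 50 always requires approval.
const APPROVAL_THRESHOLD = 50;

// policy.autoRemediateBelowThreshold is a hypothetical flag used for illustration.
function requiresApproval(riskScore, policy = { autoRemediateBelowThreshold: false }) {
  if (riskScore >= APPROVAL_THRESHOLD) return true; // never bypassable
  // Below the threshold, approval is skipped only when a policy explicitly allows it.
  return !policy.autoRemediateBelowThreshold;
}

console.log(requiresApproval(95)); // true
console.log(requiresApproval(30, { autoRemediateBelowThreshold: true })); // false
```

Defaulting the flag to false reflects the "gate everything" posture: auto-remediation has to be opted into.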
Why drills matter: the muscle memory of CVE response is what saves time during a real incident. Quarterly drills produce learnings that compound.
| Requirement | How |
|---|---|
| Tutorial 06 completed | You understand rolling upgrade + circuit breaker behavior |
| Fleet using system-base and/or nginx modules with SBOMs ingested | Tutorial 02 covers SBOM ingestion via Stage 2 CI |
| Operator with system.cve_remediate approval rights | Default for admin users |
| (Optional) Ai::Concierge configured for runbook generation | See AI agents in CLAUDE.md |
For drills, inject directly (real CVEs come from NVD feed automatically):
platform.system_create_cve({
cve_id: "DRILL-CVE-2026-99001",
severity: "critical",
cvss_score: 9.8,
affected_packages: [{ name: "openssl", version: "<3.1.4" }],
summary: "DRILL: Synthetic OpenSSL TLS handshake RCE",
published_at: "2026-05-17T00:00:00Z"
})
// → { cve: { id, ... } }
Expected outcome: CVE row created with severity: critical.
Drill naming convention: always prefix synthetic CVE IDs with DRILL- so they're never confused with real NVD records in audit logs and learnings.
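One way to make the naming convention hard to get wrong is to route drill IDs through small helpers. The functions below are hypothetical client-side utilities, not part of the platform API; they only encode the DRILL- prefix rule.

```javascript
const DRILL_PREFIX = "DRILL-";

// True only for IDs that follow the drill naming convention.
function isDrillCveId(cveId) {
  return typeof cveId === "string" && cveId.startsWith(DRILL_PREFIX);
}

// Adds the prefix once; IDs that are already prefixed are returned unchanged.
function toDrillCveId(cveId) {
  return isDrillCveId(cveId) ? cveId : DRILL_PREFIX + cveId;
}

// Guard for cleanup scripts: refuses to hand back anything that is not a drill ID.
function assertSafeToDelete(cveId) {
  if (!isDrillCveId(cveId)) {
    throw new Error(`refusing to delete non-drill CVE: ${cveId}`);
  }
  return cveId;
}

console.log(toDrillCveId("CVE-2026-99001")); // DRILL-CVE-2026-99001
```

A guard like assertSafeToDelete is a cheap safety net for drill cleanup, since real CVE records are meant to be kept for audit.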
The platform runs ExposureCalculator automatically on new CVE rows
(via the system_cve_feed worker), but for a fresh drill you can
trigger it directly:
platform.system_get_cve_exposure({ cve_id: "DRILL-CVE-2026-99001" })
// → {
// exposed_modules: [
// { id: "mod-system-base", name: "system-base", version: "1.0.3", assignment_count: 42 },
// { id: "mod-nginx", name: "nginx", version: "1.24.0", assignment_count: 18 }
// ],
// exposed_instance_count: 60,
// total_fleet_size: 150,
// exposure_pct: 40.0
// }
Expected outcome: 40% fleet exposure, which needs a coordinated response.
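To show where those numbers come from, here is a rough sketch of matching affected_packages against per-module SBOM packages and rolling the result up into the summary shape above. It is not the ExposureCalculator implementation. The SBOM package versions and the postgres module are invented for illustration, only the "<X.Y.Z" range form is handled, versions are treated as plain dotted numbers, and instances are assumed not to overlap between modules.

```javascript
// Compare dotted numeric versions: negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Supports only the "<X.Y.Z" range form used in the drill payload.
function isAffected(installedVersion, range) {
  if (!/^<\d/.test(range)) throw new Error(`unsupported range: ${range}`);
  return compareVersions(installedVersion, range.slice(1)) < 0;
}

function summarizeExposure(modules, affectedPackages, totalFleetSize) {
  const exposed = modules.filter((mod) =>
    mod.sbom.some((pkg) =>
      affectedPackages.some(
        (ap) => ap.name === pkg.name && isAffected(pkg.version, ap.version)
      )
    )
  );
  const exposedInstanceCount = exposed.reduce((sum, m) => sum + m.assignment_count, 0);
  return {
    exposed_modules: exposed.map(({ name, version, assignment_count }) => ({
      name, version, assignment_count,
    })),
    exposed_instance_count: exposedInstanceCount,
    total_fleet_size: totalFleetSize,
    exposure_pct: Number(((exposedInstanceCount / totalFleetSize) * 100).toFixed(1)),
  };
}

// Hypothetical fleet data: the sbom entries are made up for this sketch.
const modules = [
  { name: "system-base", version: "1.0.3", assignment_count: 42, sbom: [{ name: "openssl", version: "3.1.2" }] },
  { name: "nginx", version: "1.24.0", assignment_count: 18, sbom: [{ name: "openssl", version: "3.0.9" }] },
  { name: "postgres", version: "16.1.0", assignment_count: 30, sbom: [{ name: "openssl", version: "3.1.4" }] },
];

const summary = summarizeExposure(modules, [{ name: "openssl", version: "<3.1.4" }], 150);
console.log(summary.exposed_instance_count, summary.exposure_pct); // 60 40
```

The exact name comparison is deliberate: a package that the SBOM lists under a different name is simply not matched.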
There is no execute_skill MCP action — the system-cve-response triage
skill is an executor bound to the CVE Responder agent, which runs it
automatically on its 60-second reconcile tick when a CvePublishedSensor
signal fires for the new CVE. To inspect the triage result, read the
exposure (Step 2) and the agent's recent events; the reconciler supplies the
skill inputs from the CveExposure rows. The triage produces:
Expected outcome: 1-hour total remediation estimate; 2 parallel module upgrades.
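The relationship between batch sizes, parallelism, and the total estimate can be sketched with simple arithmetic. Everything numeric here is an assumption for illustration: the batch sizes (11 and 9) and the 15 minutes per batch are invented values, and the real cve_response skill may weigh other factors.

```javascript
// Split an instance count into batch sizes, with the last batch taking the remainder.
function planBatches(instanceCount, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let remaining = instanceCount; remaining > 0; remaining -= batchSize) {
    batches.push(Math.min(batchSize, remaining));
  }
  return batches;
}

// Parallel upgrades finish with the slowest module; sequenced upgrades add up.
function estimateMinutes(modulePlans, minutesPerBatch, { parallel }) {
  const durations = modulePlans.map(
    (plan) => planBatches(plan.instances, plan.batchSize).length * minutesPerBatch
  );
  return parallel
    ? Math.max(0, ...durations)
    : durations.reduce((total, d) => total + d, 0);
}

// Hypothetical plan: instance counts from the drill, batch sizes chosen for illustration.
const plans = [
  { module: "system-base", instances: 42, batchSize: 11 },
  { module: "nginx", instances: 18, batchSize: 9 },
];

console.log(planBatches(42, 11)); // [ 11, 11, 11, 9 ]
console.log(estimateMinutes(plans, 15, { parallel: true })); // 60
console.log(estimateMinutes(plans, 15, { parallel: false })); // 90
```

Under these assumed numbers, running the two module upgrades in parallel keeps the estimate at one hour, while sequencing them would stretch it to 90 minutes.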
The system-cve-runbook-generate skill is a read-shape executor bound to the
System Concierge, so you trigger it by asking the Concierge in chat — or
programmatically via execute_agent (resolve the Concierge's UUID with
list_agents first; there is no execute_skill action):
platform.list_agents()
// → { agents: [{ id: "<concierge-uuid>", name: "System Concierge", ... }, ...] }
platform.execute_agent({
agent_id: "<concierge-uuid>", // execute_agent also accepts a slug or exact name
input: { input: "Generate a CVE remediation runbook for DRILL-CVE-2026-99001 and persist it as a page." }
})
// The Concierge calls the system-cve-runbook-generate executor internally and returns:
// → { runbook_markdown: "# DRILL-CVE-2026-99001 — Remediation Runbook\n\n...",
// persisted_page_id: "page-..." }
Expected outcome: the runbook surfaces:
- Exposed module list + version-bump targets
- Step-by-step remediation actions (with command snippets)
- Verification procedures (post-fix exposure recompute)
- Communication template for stakeholders
Page is shareable with the security team via /app/wiki or Slack.
Operator opens /app/approvals → sees the proposed plan + risk_score (95) + generated runbook from Step 4 → optionally adjusts batch_size per module (smaller for control-plane modules; larger for stateless) → clicks Approve.
Once approved, the autonomy reconciler executes the plan.
platform.recent_events({ kind_prefix: "cve", limit: 50 })
// → cve.remediation_started, cve.remediation_module_started, cve.module_remediation_completed, ...
platform.recent_events({ kind_prefix: "module.upgrade", limit: 100 })
// → standard rolling upgrade events from Tutorial 06
Or via UI: /app/system/operations → CVE response panel shows progress per module.
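If you would rather script the progress check than watch the panel, the event list can be folded into a per-module status map. This is a sketch under assumptions: the event kinds are the ones listed above, but the payload.module field is a hypothetical shape, so adjust it to whatever recent_events actually returns.

```javascript
// Fold CVE remediation events into { moduleName: "in_progress" | "completed" }.
// Assumes each module-level event carries payload.module (hypothetical field).
function summarizeProgress(events) {
  const status = new Map();
  for (const event of events) {
    const moduleName = event.payload?.module;
    if (!moduleName) continue; // e.g. cve.remediation_started has no module
    if (event.kind === "cve.module_remediation_completed") {
      status.set(moduleName, "completed");
    } else if (event.kind === "cve.remediation_module_started" && !status.has(moduleName)) {
      // Do not downgrade a module already marked completed (events may arrive newest-first).
      status.set(moduleName, "in_progress");
    }
  }
  return Object.fromEntries(status);
}

// Synthetic events for illustration only.
const sampleEvents = [
  { kind: "cve.remediation_started", payload: { cve_id: "DRILL-CVE-2026-99001" } },
  { kind: "cve.remediation_module_started", payload: { module: "system-base" } },
  { kind: "cve.remediation_module_started", payload: { module: "nginx" } },
  { kind: "cve.module_remediation_completed", payload: { module: "system-base" } },
];

console.log(summarizeProgress(sampleEvents));
// { 'system-base': 'completed', nginx: 'in_progress' }
```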
After all batches complete (~1 hour):
platform.system_get_cve_exposure({ cve_id: "DRILL-CVE-2026-99001" })
// → { exposed_instance_count: 0, exposed_modules: [], ... }
Expected outcome: zero exposure. If non-zero, some instances were silent during the upgrade window:
// system-attribute-failure is a read-shape skill bound to the Concierge —
// invoke it by asking the Concierge (no execute_skill action exists):
platform.execute_agent({
agent_id: "<concierge-uuid>", // from list_agents (see Step 4); slug/name also accepted
input: { input: "Attribute the reconcile failure on instance <silent-instance> looking back 24 hours." }
})
// → the Concierge runs the system-attribute-failure executor and returns the
// diagnosis of why this instance didn't reconcile

platform.create_learning({
title: "DRILL: DRILL-CVE-2026-99001 OpenSSL response — 60-instance scope",
category: "discovery",
content: "Synthetic critical CVE drill. Exposure correctly identified 60 instances across system-base + nginx. Total fix duration: 58 min. system-base bump (1.0.3→1.0.4) ran first per remediation plan (42 instances, 4-batch, 0 failures). nginx (1.24→1.26) ran in parallel after first 2 batches of system-base completed (per dependency_spec). Zero circuit breaker trips. Lessons: parallel module upgrades work when modules are independent; sequencing was correct via DependencyResolutionService.",
tags: ["cve-drill", "openssl", "incident-response", "rolling-upgrade"],
related_entities: [
{ type: "cve", id: "DRILL-CVE-2026-99001" },
{ type: "module", name: "system-base" },
{ type: "module", name: "nginx" }
]
})

platform.system_delete_cve({ cve_id: "DRILL-CVE-2026-99001" })
For a real CVE, do not delete; keep the record for audit. The CveExposure rows transition to remediated and stay queryable.
CVE MCP actions unreachable — system_create_cve, system_get_cve,
system_get_cve_exposure, and system_delete_cve are all registered actions.
If a call fails it's an auth/permission issue, not a missing action; confirm
your token carries the CVE permissions. Direct-model fallbacks if you ever
need them from a console (cd server && bundle exec rails console):
- Insert: System::Cve.create!(cve_id: "DRILL-...", severity: "critical", ...)
- Query exposure: System::CveExposure.where(cve_id: ...)
- Delete: System::Cve.find_by(cve_id: ...).destroy
Triage skill returns risk_score: 0 — SBOM ingestion isn't seeing the
affected packages. Verify:
- Module CI has the syft step (Tutorial 02 step 7)
- The webhook landed (platform.recent_events({ kind_prefix: "system.sbom.ingested" }))
- Package names in your affected_packages match exactly what syft emitted
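For the last check, a quick diff between the names in affected_packages and the names the SBOM actually contains often reveals the mismatch. The helper below is a hypothetical diagnostic, not a platform action, and the SBOM names in the example are invented; it lists affected packages with no exact match and suggests near misses by case-insensitive substring.

```javascript
// Returns affected packages that have no exact-name match in the SBOM,
// each with SBOM names that look similar (case-insensitive substring match).
function findUnmatchedPackages(affectedPackages, sbomPackageNames) {
  const exactNames = new Set(sbomPackageNames);
  return affectedPackages
    .filter((pkg) => !exactNames.has(pkg.name))
    .map((pkg) => {
      const needle = pkg.name.toLowerCase();
      const nearMisses = sbomPackageNames.filter((name) => {
        const candidate = name.toLowerCase();
        return candidate.includes(needle) || needle.includes(candidate);
      });
      return { name: pkg.name, nearMisses };
    });
}

// Invented SBOM names for illustration.
const sbomNames = ["libssl3", "openssl-libs", "zlib"];
console.log(findUnmatchedPackages([{ name: "openssl", version: "<3.1.4" }], sbomNames));
// [ { name: 'openssl', nearMisses: [ 'openssl-libs' ] } ]
```

An empty result means every affected package name has an exact counterpart in the SBOM, so the cause of risk_score: 0 lies elsewhere.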
Operator approval queue empty after triage — cve_response skill ran
but didn't create the ApprovalRequest. Check the agent's intervention
policy for system.cve_remediate:
// agent_introspect resolves by UUID only — look up "CVE Responder" first:
platform.list_agents()
// → { agents: [{ id: "<cve-responder-uuid>", name: "CVE Responder", ... }, ...] }
platform.agent_introspect({ agent_id: "<cve-responder-uuid>" })
If the policy says auto_approve for low-severity CVEs and your synthetic CVE is critical, it should always require approval; file an issue if not.
Remediation completes but exposure recompute still shows non-zero — race condition or silent instances. Cross-check:
platform.system_list_instances({
template_id: "<affected-template>",
exclude_running_module_digest: "<new-digest>"
})
// → instances that still report the old digest
For each one, run attribute_failure to find out why it didn't reconcile.
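Before running attribute_failure on every instance, it can help to separate the two causes named above. The sketch below assumes each returned instance exposes running_module_digest and last_seen_at; both field names are hypothetical, as is the idea of using the upgrade start time as the cut-off.

```javascript
// Split instances still on the old digest into:
//   silent              - no report since the upgrade started (or no usable timestamp)
//   stale_but_reporting - reported after the upgrade started yet still on the old digest
// Field names running_module_digest and last_seen_at are assumptions.
function classifyStaleInstances(instances, newDigest, upgradeStartedAt) {
  const startedMs = Date.parse(upgradeStartedAt);
  const result = { silent: [], stale_but_reporting: [] };
  for (const instance of instances) {
    if (instance.running_module_digest === newDigest) continue; // already upgraded
    const lastSeenMs = Date.parse(instance.last_seen_at);
    // A missing or invalid timestamp parses to NaN, fails the comparison, and counts as silent.
    if (lastSeenMs >= startedMs) result.stale_but_reporting.push(instance.id);
    else result.silent.push(instance.id);
  }
  return result;
}

// Synthetic instances for illustration.
const fleet = [
  { id: "inst-1", running_module_digest: "sha256:new", last_seen_at: "2026-05-17T02:00:00Z" },
  { id: "inst-2", running_module_digest: "sha256:old", last_seen_at: "2026-05-16T20:00:00Z" },
  { id: "inst-3", running_module_digest: "sha256:old", last_seen_at: "2026-05-17T01:30:00Z" },
  { id: "inst-4", running_module_digest: "sha256:old" },
];

console.log(classifyStaleInstances(fleet, "sha256:new", "2026-05-17T01:00:00Z"));
// { silent: [ 'inst-2', 'inst-4' ], stale_but_reporting: [ 'inst-3' ] }
```

Instances in the stale_but_reporting bucket point toward a race or a failed upgrade rather than connectivity, which narrows what to ask the Concierge.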
Container image CVEs aren't detected — cve_response matches against
NodeModule.package_spec (i.e., OS packages), not container image
contents. For images, use external scanners (Trivy, Grype) per Use Case
10 in USE_CASE_MATRIX.md.
- Tutorial 08 (Instance pools): for stateless workloads, the pool-replacement remediation strategy (terminate old, claim new-version from pool) is faster and safer than in-place upgrade.
- runbooks/cve-response.md: full operator CVE runbook with SBOM-aware matching details.
- SKILL_EXECUTORS.md: cve_response, cve_runbook_generate, cve_remediation_orchestration reference.
- CVE Responder agent: autonomous remediation agent that runs this pipeline on every CVE feed update; see CLAUDE.md AI agents section.
- Run drills quarterly: every drill produces a learning that compounds. Real incidents reuse the muscle memory.
Last verified: 2026-06-03