
Add curated MCP / external-API / library registry with security hardening #252

@kai-linux

Description

Goal

Add a curated registry through which agents can be granted access to (a) MCP servers (filesystem, GitHub, Linear, Slack, Playwright, domain-specific vendor MCPs), (b) external HTTP APIs (OCR eval, cost APIs, observability APIs, cloud providers), and (c) new libraries / frameworks (e.g. pydantic-ai, instructor, dspy, a new vector store) that a repo could benefit from but currently has no pathway to discover or adopt.

Today agents get only whatever tools their underlying CLI ships with. Every integration is bespoke Python glue. There is no per-repo, operator-curated way to say "this repo may use the Linear MCP and the pydantic-ai framework" — nor any way for the system to notice "a new framework exists that would simplify what this repo is doing by hand."

Success Criteria

Phase 1 — Tool registry (MCPs + HTTP APIs)

  • New config.yaml: tool_registry section with two subsections:
    • mcp_servers: [{name, command, args, env_refs, description, default_permissions}]
    • http_apis: [{name, base_url, auth_kind (bearer|header|none), env_refs, description}]
  • Per-repo opt-in under repos.<key>.enabled_tools: [linear_mcp, playwright_mcp, receipt_eval_api]
  • Per-task-type permission scoping: tool_registry.permissions: { groomer: [linear_mcp:read], quality_harness: [receipt_eval_api:call] }
  • orchestrator/tool_registry.py exposes resolve_tools_for(repo_key, task_type) -> ToolBundle returning normalized flags for the agent adapter (--mcp-config, tool allowlist, etc.)
  • Adapters (claude, codex, gemini, deepseek) wire the bundle into their invocation; unsupported adapters log a clear "tool X not available on adapter Y" warning rather than silently dropping it
  • Credentials only via env-var references — registry never stores secrets inline; startup validation fails closed if a referenced env var is missing
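A hypothetical `config.yaml` fragment illustrating the shapes above. Key names follow this issue's spec; the concrete entries (`github_mcp`, `GITHUB_TOKEN`, the example base URL) are placeholders, not decided values:

```yaml
tool_registry:
  mcp_servers:
    - name: github_mcp
      command: npx
      args: ["-y", "@modelcontextprotocol/server-github@1.2.3"]  # pinned version, never @latest
      env_refs: [GITHUB_TOKEN]          # reference only; the secret lives in the environment
      description: GitHub issues/PRs via MCP
      default_permissions: [read]
  http_apis:
    - name: receipt_eval_api
      base_url: https://eval.example.com  # also the outbound-HTTP allowlist origin
      auth_kind: bearer
      env_refs: [RECEIPT_EVAL_TOKEN]
      description: Vendor OCR ground-truth scoring
  permissions:
    groomer: [linear_mcp:read]
    quality_harness: [receipt_eval_api:call]

repos:
  my_repo:
    enabled_tools: [github_mcp, receipt_eval_api]
```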

Phase 2 — Library / framework awareness

  • New orchestrator/library_scout.py runs on a slow cadence (default: monthly)
  • For each repo, reads pyproject.toml / package.json and the task history to understand what the repo does (modality labels from #250, "Add quality-harness architect with modality detection and field-failure fixture loop", plus issue titles)
  • Consults a curated library_catalog.yaml (operator-owned, at repo root) listing candidate libraries with {name, category, fits_when, url, last_verified_at}. Example categories: llm_framework, validation, evals, vector_store, browser_automation, observability
  • Emits library_suggestion findings via the same scorer → groomer pipeline with {repo, library_name, reason, fit_signals, proposed_experiment}
  • Groomer renders findings; operator approves via Telegram before a spike/experiment issue is filed (same anti-slop gate as #249, "Add system-architect agent for capability + sensor gap detection")
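A minimal sketch of the catalog-restricted matching the scout could do. The finding fields come from this issue; the `fits_when` signal matching and helper names are assumptions for illustration:

```python
# Sketch of library_scout matching logic. Assumes catalog entries are the
# parsed rows of library_catalog.yaml and history_signals are labels derived
# from task history (e.g. "manual_json_parsing"). Illustrative only.
from dataclasses import dataclass, field


@dataclass
class LibrarySuggestion:
    repo: str
    library_name: str
    reason: str
    fit_signals: list = field(default_factory=list)
    proposed_experiment: str = ""


def suggest_libraries(repo_key, deps, history_signals, catalog):
    """Emit suggestions ONLY for libraries already in the operator-owned catalog."""
    suggestions = []
    for entry in catalog:
        if entry["name"] in deps:
            continue  # already adopted; nothing to suggest
        hits = [s for s in history_signals if s in entry.get("fits_when", [])]
        if hits:
            suggestions.append(LibrarySuggestion(
                repo=repo_key,
                library_name=entry["name"],
                reason=f"repo shows {', '.join(hits)} but lacks {entry['name']}",
                fit_signals=hits,
                proposed_experiment=f"spike: trial {entry['name']} on one task",
            ))
    return suggestions
```

Because iteration is over the catalog rather than over an open package index, a library outside `library_catalog.yaml` can never surface, no matter what the deps or history show.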

Phase 3 — Security hardening (non-negotiable)

  • Registry entries require explicit operator approval — no autonomous edits to tool_registry or library_catalog.yaml by any agent, ever
  • MCP server commands must match a pinned executable + pinned version (e.g. npx -y @modelcontextprotocol/server-github@1.2.3, never @latest)
  • Pre-flight verification on startup: each MCP server's package is checked against a verified_packages.yaml manifest with {name, version, sha256, source_url}; a mismatch fails closed and fires a Telegram alert
  • Library suggestions restricted to the curated library_catalog.yaml — the scout cannot propose a package that is not already in the catalog; catalog entries are added by the operator only
  • All tool invocations and library-suggestion approvals logged to the audit trail (#247, "Replace telegram_actions log with hash-chained immutable audit trail")
  • Outbound HTTP from agents routed through a per-repo allowlist (tool_registry.http_apis[].base_url is the only permitted origin set); unknown hosts blocked at the adapter layer where feasible
  • No dynamic pip install / npm install by agents during task execution — dependency changes still flow through normal PRs reviewed by the existing gates
  • Daily digest gains a "tool registry status" line: registered servers, verification results, any failed pre-flight checks
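The pre-flight check is a plain digest comparison. A minimal sketch, assuming the caller has fetched the package artifact bytes and parsed the matching verified_packages.yaml entry (the exception name and dict keys here are illustrative):

```python
# Sketch of the fail-closed pre-flight check against verified_packages.yaml.
import hashlib


class PreflightError(Exception):
    """Raised to fail closed; the caller is responsible for the Telegram alert."""


def verify_package(artifact_bytes, manifest_entry):
    """Compare a downloaded MCP server package against its pinned manifest entry.

    manifest_entry is one parsed row of verified_packages.yaml:
    {name, version, sha256, source_url}. Returns True on match, raises on mismatch.
    """
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    if digest != manifest_entry["sha256"]:
        raise PreflightError(
            f"{manifest_entry['name']}@{manifest_entry['version']}: sha256 mismatch "
            f"(expected {manifest_entry['sha256']}, got {digest})"
        )
    return True
```

Raising (rather than logging and continuing) is what makes the check fail closed: the server never starts unless the checksum matches.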

Acceptance test

  • Synthetic MCP with correct pinned version + sha256 passes pre-flight; mutated sha256 fails closed
  • Per-repo enabled_tools correctly narrows the bundle passed to the adapter
  • A repo without pydantic-ai in deps, where task history shows manual JSON-parsing work, produces a library_suggestion finding pointing at pydantic-ai (assuming it is in the catalog); after operator approval, groomer files a spike issue
  • A library not in the catalog never appears as a suggestion, even if deps/history would motivate it
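The narrowing behaviour in the second bullet can be sketched as an intersection of two filters: the repo's opt-in list and the task-type scopes. A hypothetical shape for `resolve_tools_for`, assuming the registry and repos dicts mirror the config sections described in Phase 1:

```python
# Sketch of per-repo + per-task-type narrowing for resolve_tools_for.
# Scope strings are "tool_name:verb" as in this issue's examples.
def resolve_tools_for(repo_key, task_type, registry, repos):
    """Return only tools that are both repo-enabled and scoped to this task type.

    An empty result means the adapter falls back to its default toolset,
    preserving backwards compatibility for repos with no enabled_tools.
    """
    enabled = set(repos.get(repo_key, {}).get("enabled_tools", []))
    scopes = registry.get("permissions", {}).get(task_type, [])
    scoped = {s.split(":", 1)[0] for s in scopes}
    return [t for t in registry["mcp_servers"] + registry["http_apis"]
            if t["name"] in enabled and t["name"] in scoped]
```

Requiring membership in both sets is what prevents a groomer task from accidentally receiving quality-harness scopes, per the Constraints section.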

Constraints

  • Operator-curated registries only — no autonomous additions to tool_registry or library_catalog.yaml. This is the anti-malware / anti-supply-chain bet, mirroring the target_operating_model.yaml pattern from #249, "Add system-architect agent for capability + sensor gap detection"
  • Pinned versions + checksums required for MCP servers. @latest is banned. The system refuses to start a server whose checksum does not match the manifest
  • No secret storage in registry — env-var references only; startup validation surfaces missing creds loudly
  • Per-task-type permission scoping is required, not optional — an agent running a groomer task must not get quality-harness scopes by accident
  • Library scout is suggestion-only — never opens a PR that adds a dependency; only files a spike/experiment issue for operator approval
  • Backwards-compatible — repos with no enabled_tools continue to work exactly as today, with the adapter's default toolset

Task Type

architecture

Why

Agent-os today can only do what its LLM CLIs can do out of the box. The moment a repo needs Linear ticketing, Slack messaging, a browser, a vendor OCR eval, or a newer framework like pydantic-ai / instructor / dspy that would let it stop hand-rolling structured extraction — agent-os has no path forward. Every such integration becomes a one-off Python patch.

This is a ceiling on how far agent-os can grow. Two specific ceilings:

  1. Tool ceiling — the quality harness (#250, "Add quality-harness architect with modality detection and field-failure fixture loop") hits this immediately: multimodal eval often needs vendor APIs (receipt-OCR ground truth, audio transcription, image similarity). The architect (#249, "Add system-architect agent for capability + sensor gap detection") can detect sensor gaps but not tool gaps.
  2. Library ceiling — a repo might be hand-parsing LLM JSON outputs when pydantic-ai would do it in three lines. Today nothing in agent-os notices. As the library ecosystem shifts, agent-os ossifies on whatever was trendy the month a repo was onboarded.

The mitigation is the same anti-slop pattern as #249 and #250: the registry of known-good things is operator-owned; the detection of gaps against that registry is autonomous. Security hardening (pinned versions + checksums + no autonomous registry edits + no dynamic installs) means the blast radius of a compromised catalog entry is bounded and auditable. Without this, "extend agent capabilities" either stays a manual coding chore or opens a supply-chain door we cannot defend.
