
Add agent-driven template + docs/agents.md #129

Merged
marklubin merged 3 commits into main from mark/agent-template-docs
Apr 7, 2026

Conversation

@marklubin
Owner

Summary

  • Template 09-agent-driven — demo pipeline using agent-backed transforms. Same DAG as 03-team-report but with named agents:

    • analyst agent reused across MapSynthesis + ReduceSynthesis (same persona, different tasks)
    • reporter agent for FoldSynthesis
    • synix.toml with [agents.*] config + instruction files in prompts/
    • Full 6-step demo case with golden verification
  • docs/agents.md — comprehensive agent documentation covering:

    • Agent protocol (map/reduce/group/fold + task_prompt composition)
    • agent_id vs fingerprint_value (separate lifecycles)
    • SynixLLMAgent + PromptStore integration
    • Workspace config and load_agents()
    • Custom agent implementations
    • Artifact provenance
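
To make the protocol bullet above concrete, here is a minimal sketch of what an agent protocol with these four method names could look like. Only `agent_id`, `fingerprint_value()`, and the names map/reduce/group/fold come from this PR's description; the method signatures, argument names, and the `EchoAgent` class are assumptions made for illustration.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class Agent(Protocol):
    """Hypothetical protocol shape; signatures are assumed, not taken from Synix."""

    agent_id: str

    def fingerprint_value(self) -> str: ...
    def map(self, artifact: str, task_prompt: str) -> str: ...
    def reduce(self, artifacts: list[str], task_prompt: str) -> str: ...
    def group(self, artifacts: list[str], task_prompt: str) -> dict[str, list[str]]: ...
    def fold(self, accumulated: str, artifact: str, task_prompt: str) -> str: ...


class EchoAgent:
    """Toy custom agent with no LLM behind it: it only labels its inputs."""

    def __init__(self, agent_id: str, instructions: str) -> None:
        self.agent_id = agent_id          # stable name, survives instruction edits
        self.instructions = instructions  # persona text (the "HOW")

    def fingerprint_value(self) -> str:
        # Changes whenever the instructions change.
        return hashlib.sha256(self.instructions.encode("utf-8")).hexdigest()

    def map(self, artifact, task_prompt):
        return f"[{self.agent_id}] {task_prompt}: {artifact}"

    def reduce(self, artifacts, task_prompt):
        return f"[{self.agent_id}] {task_prompt}: " + " | ".join(artifacts)

    def group(self, artifacts, task_prompt):
        # Trivial routing: everything lands in one group.
        return {"all": list(artifacts)}

    def fold(self, accumulated, artifact, task_prompt):
        return f"{accumulated}\n{artifact}".strip()
```

In this sketch the same `EchoAgent("analyst", ...)` instance could be handed to both a map-shaped and a reduce-shaped transform, which mirrors the "same persona, different tasks" reuse described for the template.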

Test plan

  • uv run synix demo run templates/09-agent-driven — all 12 steps pass
  • uv run release — full gate including all demos

New template demonstrating agent-backed transforms:
- analyst agent reused across MapSynthesis + ReduceSynthesis
- reporter agent for FoldSynthesis
- Same DAG as 03-team-report but with named agents
- Transform prompts define task structure, agent instructions define persona
- synix.toml with [agents.*] config + instructions files in prompts/
- Full demo case with plan/build/release/search/rebuild/explain

New docs/agents.md covering:
- Agent protocol (map/reduce/group/fold + task_prompt composition)
- agent_id vs fingerprint_value (separate lifecycles)
- SynixLLMAgent + PromptStore integration
- Workspace config via [agents.*] in synix.toml
- Custom agent implementations
- Artifact provenance (agent_id + agent_fingerprint)
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR introduces a new user-facing abstraction (“agents”) but the shipped template/demo is broken and the docs overstate capabilities the implementation does not provide.

One-way doors

  1. Public “Agent” abstraction and protocol shape: docs/agents.md defines agent_id, fingerprint_value(), and method names map/reduce/group/fold. Users will build custom agents against this surface. Hard to reverse because it becomes an extension API. Safe to merge only if the protocol is actually stable, implemented consistently across transforms, and covered by compatibility tests.
  2. Workspace config contract for agents: docs/agents.md + templates/09-agent-driven/synix.toml introduce [agents.<name>], prompt_key, instructions_file, provider/model/base_url. That’s a config schema users will depend on. Safe only if load_agents() and prompt-store binding are real, tested, and documented end-to-end.
  3. Provenance schema additions: docs/agents.md claims artifacts record agent_id, agent_fingerprint, prompt_id. Once emitted into on-disk artifacts/search DB, downstream tooling may rely on them. Safe only if release/search/viewer paths preserve and expose these fields consistently.

Findings

  1. src/synix/templates/09-agent-driven/pipeline.py and templates/09-agent-driven/pipeline.py: broken template by construction
    The template instantiates SynixLLMAgent without binding a prompt store, while comments claim “no PromptStore needed.” Your own goldens confirm build and plan fail immediately on fingerprinting. Shipping a template that cannot load is unacceptable, and worse, it teaches the wrong usage. Severity: [critical]

  2. templates/09-agent-driven/golden/*.txt: demo/golden suite codifies failure as success
    The goldens assert error outputs for plan, build, and rebuild. That means CI can pass while the feature is nonfunctional. This is not testing behavior; it is freezing a known broken UX. Severity: [critical]

  3. docs/agents.md: design/API claims exceed implementation
    Docs state “Both are required when using agents,” “load_agents() reads synix.toml,” and define agent-driven grouping as part of the protocol. But SynixLLMAgent.group() explicitly raises NotImplementedError, and there is no diff showing load_agents() implementation or any end-to-end workspace path. This is a direct documentation/implementation mismatch. Severity: [warning]

  4. docs/agents.md vs project principle: config-first drift
    DESIGN.md is explicit: Python-first, not config-first. This PR adds synix.toml agent config as a first-class story without reconciling that architectural stance. Maybe acceptable as workspace metadata, but the docs present it as the main configuration path for agents. That’s a design inconsistency and a likely source of future coupling between pipeline code and workspace state. Severity: [warning]

  5. src/synix/agents.py + transform constructors: hidden eager fingerprint dependency
    Agents require prompt-store binding just to compute fingerprints, and constructors/tests already reject empty fingerprints. That means pipeline loading/planning now depends on external workspace state before execution. This leaks runtime configuration into graph definition and makes templates/non-workspace pipelines fragile. Severity: [warning]

  6. docs/agents.md: “Instructions are loaded from the PromptStore at call time”
    If true, then cache identity depends on mutable external prompt store state, not just pipeline code. That is a one-way coupling between workspace files/editor state and build graph semantics. The diff shows no validation for missing keys, stale prompt files, or reproducibility story beyond a hash. This undermines the “Python-first declaration” and makes builds less self-contained. Severity: [warning]

  7. src/synix/templates/09-agent-driven/pipeline.py: uses pipeline.build_dir = "./build"
    DESIGN.md says build/release separation is absolute and “there is no build/ directory”; .synix/ is the source of truth. Even if this field is legacy-compatible, introducing it in a new template violates the documented storage model and risks users depending on obsolete layout assumptions. Severity: [warning]

Missing

  • A passing end-to-end test that uvx synix init ...09-agent-driven, build, release, and search all work.
  • Tests for load_agents() / synix.toml parsing, prompt-store binding, missing prompt files, and missing prompt keys.
  • Documentation updates in README/website if “agents” is now a supported user-facing concept.
  • Validation at pipeline load time with a clear error if agent-backed transforms are used without a bound prompt store.
  • Any migration note explaining how agent fingerprints interact with existing content-addressed cache semantics.
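
The fourth bullet asks for a clear load-time error. A sketch of that check follows, using a hypothetical agent whose fingerprint is derived from instructions held in an in-memory prompt store. The names `bind_prompt_store`, `agent_id`, and `fingerprint_value` appear in this PR; the `InMemoryPromptStore` and `StoreBackedAgent` classes, their signatures, and the error wording are invented for illustration.

```python
import hashlib
from typing import Optional


class InMemoryPromptStore:
    """Hypothetical stand-in for a prompt store: a dict of key -> text."""

    def __init__(self) -> None:
        self._prompts = {}

    def put(self, key: str, text: str) -> None:
        self._prompts[key] = text

    def get(self, key: str) -> str:
        if key not in self._prompts:
            raise KeyError(f"prompt key {key!r} not found in store")
        return self._prompts[key]


class StoreBackedAgent:
    """Hypothetical agent that reads its instructions from a bound store."""

    def __init__(self, agent_id: str, prompt_key: str) -> None:
        self.agent_id = agent_id
        self.prompt_key = prompt_key
        self._store: Optional[InMemoryPromptStore] = None

    def bind_prompt_store(self, store: InMemoryPromptStore) -> "StoreBackedAgent":
        self._store = store
        return self

    def fingerprint_value(self) -> str:
        # Fail early with an actionable message instead of deep in a build.
        if self._store is None:
            raise RuntimeError(
                f"Agent {self.agent_id!r} has no prompt store; "
                "call bind_prompt_store() before planning or building."
            )
        instructions = self._store.get(self.prompt_key)
        return hashlib.sha256(instructions.encode("utf-8")).hexdigest()
```

In this sketch, editing the stored instructions changes the fingerprint while `agent_id` stays the same, which is the "separate lifecycles" idea from the PR summary.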

Verdict: Block. This introduces a public abstraction and template while the flagship example is nonfunctional and the docs promise behavior not demonstrated in code.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,038 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T21:32:29Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary — This PR introduces the Agent abstraction (docs, template, demo case) and applies formatting cleanup across several files. Agents are named execution units with stable identity and fingerprinting that compose with transform prompts — persona (HOW) separated from task (WHAT). The bulk of the diff is a new template (09-agent-driven), new documentation (docs/agents.md), and cosmetic reformatting.

Alignment — Agents fit the vision well. DESIGN.md's core thesis is Python-first programmability and architecture experimentation. Agents extend the transform interface without replacing it — you can swap agents without changing task prompts, and agent fingerprints feed into cache invalidation (materialization keys capture all inputs, per §3.3). The agent_id / agent_fingerprint provenance fields preserve the audit determinism principle (§3.9). The Protocol-based extension point is clean and consistent with "Python-first, not config-first" (§4.1).

Observations

  1. [concern] Golden files record errors, not success. build.stdout.txt, plan.stdout.txt, explain.stdout.txt, and rebuild.stdout.txt all contain "Error loading pipeline: ... Agent 'analyst' has no prompt store". The template's pipeline.py creates SynixLLMAgent inline without calling bind_prompt_store(), and the comment says "no PromptStore needed" — but the golden output proves otherwise. This template doesn't work. The demo case will fail on every step that loads the pipeline.

  2. [concern] Template duplicated in two locations. templates/09-agent-driven/ and src/synix/templates/09-agent-driven/ contain identical pipeline.py and source files. No explanation for why both exist. This creates a maintenance hazard — edits to one won't automatically propagate to the other.

  3. [concern] No automated tests for the new agent template. The demo case.py exercises the flow, but the golden files capture error output. There are no unit or integration tests verifying that the agent-driven pipeline actually builds artifacts. The existing test_agent_transforms.py changes are purely formatting.

  4. [question] Agent Protocol includes group() but SynixLLMAgent raises NotImplementedError for it. The docs show group as a first-class protocol method, but the implementation explicitly doesn't support it (See issue #127). Should the docs note this limitation? A user following the documentation would expect it to work.

  5. [question] pipeline.build_dir = "./build" in template. The DESIGN.md §1.5 explicitly says "There is no build/ directory" and that .synix/ is the single source of truth. Is build_dir a legacy property, or does this template contradict the stated architecture?

  6. [nit] Formatting-only changes dominate the diff. The reformatting of agents.py, viewer/server.py, test_viewer_server.py, and all four ext/*.py files is pure whitespace/line-break changes. These are fine but inflate the diff and obscure the substantive additions. Separating formatter runs from feature work would make review easier.

  7. [positive] The docs are well-structured. docs/agents.md clearly separates identity vs. fingerprint, shows the Protocol, demonstrates the synix.toml config path, and explains provenance fields. The composability model (task prompt as user message, agent instructions as system message) is intuitive and well-explained.

  8. [positive] The extension model is clean. The Protocol-based Agent interface with four methods matching the four transform shapes is easy to implement. The custom agent example in the docs is minimal and self-contained — someone could build on it without reading source.

  9. [nit] Prompt files lack trailing newline. analyst.txt and reporter.txt both end without \n. Minor, but can cause issues with some tooling and is inconsistent with typical text file conventions.

Verdict — The Agent concept is a sound addition that fits the architecture, but the PR ships a broken template (golden files prove it fails) and lacks tests for the happy path — it needs the bind_prompt_store lifecycle resolved before merging.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,038 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T21:32:30Z

…oldens

- Pipeline creates PromptStore and seeds from prompts/ directory,
  then binds to agents (was missing — agents failed without store)
- LLMClient._complete_anthropic() now extracts system messages from
  messages list and passes as system= parameter (Anthropic API
  requires this, was sending system role in messages)
- Remove volatile search golden (LLM output varies without cassette)
- Add output_masks for LLM-dependent steps in case.py
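
A minimal sketch of the system-message extraction described in the second bullet. It is not the actual `_complete_anthropic` code: the helper names `split_system_messages` and `build_request_kwargs` are hypothetical, and joining multiple system messages and raising on an empty remainder are choices made for this sketch. The Anthropic Messages API takes the system prompt as a top-level `system` parameter rather than as a `system` role entry in `messages`.

```python
def split_system_messages(messages):
    """Separate system text from the user/assistant turns.

    Returns (system_text_or_None, chat_messages).
    """
    system_parts = []
    chat_messages = []
    for message in messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            chat_messages.append(message)
    if not chat_messages:
        # Assumption of this sketch: fail locally rather than send messages=[].
        raise ValueError("at least one non-system message is required")
    # Assumption of this sketch: keep every system message, in order.
    system_text = "\n\n".join(system_parts) if system_parts else None
    return system_text, chat_messages


def build_request_kwargs(model, messages, max_tokens=1024):
    """Build keyword arguments shaped for a Messages-style request."""
    system_text, chat_messages = split_system_messages(messages)
    kwargs = {"model": model, "max_tokens": max_tokens, "messages": chat_messages}
    if system_text is not None:
        kwargs["system"] = system_text
    return kwargs
```

The resulting dict would be passed to the provider client; no network call is made here, so the split logic can be unit-tested on its own.
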
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this looks small in code, but it introduces a new user-facing abstraction (“agents”) with unclear semantics, docs/runtime drift, and weak test coverage around the actual failure paths.

One-way doors

  1. Public “Agent” abstraction and API shape

    • Hard to reverse because users will start writing custom Agent implementations against map/reduce/group/fold, and template/docs now teach this mental model.
    • Safe to merge only if the protocol is actually supported end-to-end, documented as stable/experimental consistently, and tested across all transform types.
  2. synix.toml agent workspace config

    • Hard to reverse because config keys like [agents.<name>], prompt_key, instructions_file, provider/model fields become part of project layout and tooling expectations.
    • Safe only if loader behavior, validation, and migration story are implemented and documented. Right now docs mention load_agents() but no diff shows that implementation.
  3. Anthropic system-message handling

    • This changes wire semantics for all Anthropic calls. If shipped incorrectly, cached artifacts and behavioral expectations will diverge by provider.
    • Safe only with regression tests proving multiple-message cases, no-system cases, and parity with previous non-Anthropic behavior.

Findings

  1. docs/agents.md + templates/09-agent-driven: introduces user-facing feature without README/website integration

    • User-facing capability added, including a new template and config model, but primary docs (README, website content) are unchanged. Pre-1.0 is not an excuse for fragmented product surface.
    • Severity: [warning]
  2. docs/agents.md: claims group() is part of protocol while src/synix/agents.py built-in agent explicitly does not implement it

    • That is docs/runtime contradiction. The doc presents agent-driven grouping as first-class; the shipped default agent throws NotImplementedError.
    • Severity: [warning]
  3. docs/agents.md: references synix.workspace.load_agents() with no supporting diff

    • This is either undocumented existing functionality or dead documentation. Either way, this PR is advertising an integration path not shown or verified here.
    • Severity: [warning]
  4. src/synix/templates/09-agent-driven/pipeline.py and templates/09-agent-driven/pipeline.py: hardcoded pipeline.build_dir = "./build" and direct PromptStore(_here / ".synix" / "prompts.db")

    • This violates the design doc’s “.synix/ is the single source of truth — there is no build/ directory.” The template bakes in legacy storage assumptions and reaches into server internals for prompt storage.
    • Severity: [critical]
  5. templates/09-agent-driven/pipeline.py: imports synix.server.prompt_store.PromptStore directly

    • Leaks server implementation details into a pipeline template. That couples offline pipeline authoring to server internals, exactly the wrong boundary for a Python-first pipeline API.
    • Severity: [critical]
  6. src/synix/build/llm_client.py::_complete_anthropic

    • It silently keeps only the last system message encountered and drops any earlier one. If message construction ever emits multiple system messages, behavior changes without validation or warning.
    • Severity: [warning]
  7. src/synix/build/llm_client.py::_complete_anthropic

    • No test diff for the actual Anthropic fix. This is a behavioral provider-specific change with obvious regression surface and zero visible coverage.
    • Severity: [warning]
  8. templates/09-agent-driven/case.py: demo explicitly depends on real LLMs and masks failures

    • “no cassette support yet” plus output masking means the demo is not a reliable test. It normalizes away instability instead of asserting behavior. That is showcase code, not verification.
    • Severity: [warning]
  9. docs/agents.md: provenance field naming (prompt_id = transform prompt template hash) is underspecified

    • The design doc emphasizes full audit/provenance. Here prompt_id is overloaded and likely to confuse template hash vs prompt-store key vs rendered prompt hash. Once users depend on lineage fields, renaming is painful.
    • Severity: [minor]

Missing

  • Tests for Anthropic system-message extraction, especially multiple system messages and no-system cases.
  • Tests proving agent fingerprints actually invalidate cache behavior, not just golden text claiming they do.
  • Tests for template/bootstrap path: prompt DB seeding, missing prompt files, concurrent access to .synix/prompts.db, and rebuild behavior after prompt edits.
  • README/template index updates if 09-agent-driven is intended as a supported template.
  • Validation/error handling around PromptStore initialization in templates; current template can fail on environment/layout assumptions.
  • Clear designation whether agents are experimental. Current status docs don’t mention them.
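
The second bullet above asks for proof that agent fingerprints invalidate the cache. To illustrate the property such a test would check, here is a sketch of a content-addressed key that folds in the agent fingerprint. The `materialization_key` function and its inputs are assumptions, not Synix's actual key derivation.

```python
import hashlib
import json


def materialization_key(input_hashes, prompt_id, agent_fingerprint):
    """Derive a cache key from everything that can change the output.

    Hypothetical: sorts input hashes so ordering does not affect the key.
    """
    payload = json.dumps(
        {
            "inputs": sorted(input_hashes),
            "prompt_id": prompt_id,
            "agent_fingerprint": agent_fingerprint,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A regression test in this style would assert that the key is stable for identical inputs and changes when only the agent fingerprint changes.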

Verdict: Block. The core bugfix in Anthropic handling may be fine, but this PR also ships a new abstraction with doc/code drift, boundary violations, and a template that contradicts the project’s own storage model.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,244 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T21:53:24Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary — This PR introduces the Agent abstraction — named, reusable execution personas that compose with transform task prompts. It adds the Agent protocol, SynixLLMAgent implementation, a new template (09-agent-driven), documentation, a demo case with golden files, a bug fix for Anthropic system messages, and a formatter pass across several files.

Alignment — Strong fit. DESIGN.md's Python-first principle (§4.1) and the extension model (custom transforms, prompt functions) both anticipate this. Agents add a composable axis of variation — persona is now independently versionable from task prompt — without violating core invariants. The fingerprint_value() method feeds into materialization keys (§3.3), so changing agent instructions invalidates cache correctly. agent_id + agent_fingerprint recorded on artifacts maintains provenance completeness (§3.9 audit determinism). The protocol-based extension model ("any object satisfying the Agent protocol works") is consistent with the CDK-over-CloudFormation bet: code, not config.

Observations

  1. [positive] Anthropic system message fix (llm_client.py): The Anthropic SDK rejects {"role": "system"} in the messages list — it requires system= as a kwarg. This is a real bug fix that would have blocked any agent usage with Anthropic. Clean extraction logic.

  2. [positive] Cache invalidation design: Agent fingerprint participates in cache keys. The explain-cache golden shows agent config as a cache component. Changing agent instructions → fingerprint changes → rebuild. This is architecturally correct per §3.3.

  3. [concern] group() raises NotImplementedError in SynixLLMAgent: the protocol declares it, the docs show it as a method, but the built-in implementation punts to issue #127 (Agent-driven group routing: split artifacts into named groups for parallel downstream processing). GroupSynthesis.execute has an agent path that calls self.agent.group(). If someone passes a SynixLLMAgent to a GroupSynthesis, they'll get a runtime error with no compile-time signal. The docs should call this out explicitly, or GroupSynthesis.__init__ should reject agents that don't implement group.

  4. [question] Duplicate file trees: Template sources appear under both src/synix/templates/09-agent-driven/ and templates/09-agent-driven/. The pipeline.py is identical in both locations. Is there a build step that syncs these, or is this a maintenance burden that will drift?

  5. [concern] PromptStore import from synix.server: The template's pipeline.py imports from synix.server.prompt_store import PromptStore. This means running the template requires the server extras. If synix[server] isn't installed, this import fails at pipeline load time with an opaque ModuleNotFoundError. Either the prompt store should be in core, or the template needs a guard/documentation.

  6. [question] load_agents() in docs but not in diff: docs/agents.md describes from synix.workspace import load_agents which reads synix.toml and creates agents automatically. This function doesn't appear in the diff. Is it pre-existing, or is the doc forward-declaring an unimplemented API?

  7. [positive] Golden files with masks: The demo case uses output_masks and placeholder tokens (<BUILD_COUNTS>, <TIME>, <N>) to handle LLM non-determinism in golden files. This is a pragmatic approach for testing agent-backed transforms where content varies.

  8. [nit] Formatting-only changes mixed with behavioral changes: The diff is ~60% formatter changes (line wrapping in agents.py, group_synthesis.py, viewer/server.py, test files). This inflates the diff and makes the real changes harder to review. Consider separating format-only commits.

  9. [concern] No unit tests for the Anthropic system message fix: The _complete_anthropic change is the most impactful behavioral fix in this PR — it determines whether system messages work at all with Anthropic. There's no test covering the message splitting logic (e.g., what if there are multiple system messages? What if there are zero?).

  10. [nit] Golden files missing trailing newlines: Every golden file ends with "\ No newline at end of file". Minor, but some tools complain.
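
The masking approach from observation 7 can be sketched as follows. The `<TIME>` and `<N>` tokens come from the review; the regular expressions, the `DEFAULT_MASKS` list, and the `mask_output` helper are assumptions, not the contents of case.py.

```python
import re

# Order matters: mask durations first so their digits are not
# rewritten by the generic number mask.
DEFAULT_MASKS = [
    (re.compile(r"\b\d+(?:\.\d+)?s\b"), "<TIME>"),  # e.g. "3.4s", "10s"
    (re.compile(r"\b\d+\b"), "<N>"),                # any remaining integer
]


def mask_output(text, masks=DEFAULT_MASKS):
    """Replace volatile substrings with placeholder tokens before comparing to a golden."""
    for pattern, token in masks:
        text = pattern.sub(token, text)
    return text
```

Two runs that differ only in counts and timings then compare equal after masking, at the cost of no longer asserting those values.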

Verdict — This is a well-designed extension that adds a meaningful composition axis (persona × task) while preserving cache and provenance invariants; the Anthropic fix and PromptStore coupling are the items to address before merge.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,244 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T21:53:37Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium risk: there’s one real behavioral fix here, but the PR mostly papers over an unfinished “agents” abstraction and introduces user-facing template/docs commitments that outpace the implementation.

One-way doors

  1. Public “Agent” abstraction and terminology

    • Hard to reverse because Agent, agent_id, agent-backed transforms, and synix.toml [agents.*] become part of user code, docs, and mental model.
    • Safe to merge only if the protocol is actually stable across transform types and workspace loading is implemented/documented end-to-end, not just sketched.
  2. Workspace-level agent config in synix.toml

    • Hard to reverse because once users define agents there, changing keys/shape breaks projects and templates.
    • Safe only if load_agents() and the TOML schema are tested and documented as supported, or clearly marked experimental.
  3. Template baking in .synix/prompts.db and build/

    • Hard to reverse because templates become canonical examples. Worse, build_dir="./build" conflicts with the design doc’s explicit “no build/ directory” / .synix source of truth direction.
    • Safe only if the project has actually chosen to abandon build/release separation as stated in DESIGN.md, or the template is corrected.

Findings

  1. src/synix/templates/09-agent-driven/pipeline.py and templates/09-agent-driven/pipeline.py: hardcoded pipeline.build_dir = "./build"

    • This directly contradicts DESIGN.md section 1.5 (“There is no build/ directory”). Shipping a first-party template with the old model teaches users the wrong storage contract and creates migration debt.
    • Severity: [warning]
  2. docs/agents.md: documents group() on the Agent protocol while SynixLLMAgent.group() is explicitly unsupported

    • This is a spec/implementation mismatch. The doc presents grouping as a normal agent capability and says “each method matches a pipeline transform shape,” but the only bundled agent throws NotImplementedError. Users will build against a capability that does not exist.
    • Severity: [warning]
  3. docs/agents.md: introduces synix.toml + load_agents() without any code in diff proving the path works

    • The PR adds user-facing configuration promises with no corresponding implementation/test evidence here. That’s exactly how pre-1.0 projects accumulate phantom APIs.
    • Severity: [warning]
  4. src/synix/build/llm_client.py::_complete_anthropic: multiple system messages silently collapse to the last one

    • The loop overwrites system_text on each role=="system" message. If upstream composition ever emits more than one system message, behavior becomes order-dependent and silently lossy. That is hidden complexity, not a clean adapter.
    • Severity: [warning]
  5. src/synix/build/llm_client.py::_complete_anthropic: no validation for empty non-system message list after extraction

    • If callers pass only a system message, this now sends messages=[] to Anthropic. Likely API error path, and there’s no explicit guard or clearer error. System-boundary validation is missing.
    • Severity: [minor]
  6. templates/09-agent-driven/pipeline.py: template reaches into synix.server.prompt_store.PromptStore

    • This leaks server/viewer internals into core pipeline authoring. README/DESIGN say Python-first declarative pipelines; this template makes a pipeline depend on server storage mechanics and a local prompt DB path. That’s coupling the pipeline model to the viewer/server implementation.
    • Severity: [warning]
  7. templates/09-agent-driven/case_live.py: demo explicitly says “no cassette support yet” and masks failures

    • This is not a reliable regression test; it’s a flaky live demo. Masking Pipeline failed in golden outputs is a red flag, not coverage. You’re adding a public template without deterministic verification.
    • Severity: [warning]

Missing

  • A focused unit test for the Anthropic fix: system message extracted into system=..., non-system messages preserved, and behavior with multiple/zero user messages.
  • Tests for the documented workspace agent loading path (synix.toml → load_agents() → bound prompt store).
  • Documentation updates in README/website if “agents” is now a promoted feature; right now only a side doc/template was added.
  • Clarification whether agents are experimental. The docs read as productized.
  • Validation around template bootstrapping when .synix/prompts.db or prompt files are missing/unwritable.
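
For the workspace loading path in the second bullet, here is a sketch of the validation such tests might target. It assumes the `[agents.*]` tables from synix.toml have already been parsed into a plain dict by a TOML parser. The key names `prompt_key`, `instructions_file`, `provider`, `model`, and `base_url` come from the reviews in this thread; the `AgentConfig` dataclass, the `parse_agent_configs` helper, and the choice of which keys are required are assumptions, and this is not the real `load_agents()`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these two keys are mandatory; the real schema may differ.
REQUIRED_KEYS = ("prompt_key", "instructions_file")


@dataclass(frozen=True)
class AgentConfig:
    name: str
    prompt_key: str
    instructions_file: str
    provider: Optional[str] = None
    model: Optional[str] = None
    base_url: Optional[str] = None


def parse_agent_configs(workspace):
    """Validate the parsed `agents` table and return {name: AgentConfig}."""
    configs = {}
    for name, table in workspace.get("agents", {}).items():
        missing = [key for key in REQUIRED_KEYS if key not in table]
        if missing:
            raise ValueError(
                f"[agents.{name}] is missing required key(s): {', '.join(missing)}"
            )
        configs[name] = AgentConfig(
            name=name,
            prompt_key=table["prompt_key"],
            instructions_file=table["instructions_file"],
            provider=table.get("provider"),
            model=table.get("model"),
            base_url=table.get("base_url"),
        )
    return configs
```

Naming the offending table in the error gives the "missing prompt key" case a message a user can act on.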

Verdict: Ship with fixes. The Anthropic adapter change is probably correct, but the agents/template/docs story is prematurely hardened and currently conflicts with the project’s own design direction.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,244 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T21:57:06Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR introduces the Agent abstraction — named, reusable execution personas that compose with transform task prompts. It adds the Agent protocol and SynixLLMAgent implementation, a new template (09-agent-driven) demonstrating agent-backed pipelines, documentation (docs/agents.md), a bug fix for Anthropic system message handling, and a large amount of formatting cleanup (line length, dict/call indentation).

Alignment

Strong fit. Agents are a clean extension of the "Python-first" and "architecture is a runtime concern" principles from DESIGN.md. The agent_id / fingerprint_value() split correctly preserves cache invalidation semantics — changing agent instructions changes the fingerprint, which changes the materialization key, triggering incremental rebuilds of only affected layers. Provenance is extended: agent_id + agent_fingerprint + prompt_id on artifacts maintains the "audit determinism" contract. The Protocol-based extension model ("any object satisfying Agent works") is consistent with the design's emphasis on composability and experimentation.

Observations

  1. [positive] The task/persona separation (transform prompt = WHAT, agent instructions = HOW) is well-motivated and clearly documented. The template pipeline.py is readable and demonstrates all four transform shapes with two agents.

  2. [positive] The Anthropic system message fix in llm_client.py is a real bug fix — Anthropic's API rejects {"role": "system"} in the messages list. This was likely broken before agents started injecting system messages.

  3. [concern] The Agent protocol in docs/agents.md shows map, reduce, group, fold as methods, but SynixLLMAgent.group() raises NotImplementedError. The protocol documentation doesn't mention this limitation. A user implementing the protocol would expect that all methods are required, but in practice group() is deferred to issue #127 (Agent-driven group routing: split artifacts into named groups for parallel downstream processing). The doc should note which methods are optional or raise-by-default.

  4. [concern] The pipeline template at src/synix/templates/09-agent-driven/pipeline.py and templates/09-agent-driven/pipeline.py are identical files. This duplication will drift. Is there a build step that copies one to the other, or is this a mistake?

  5. [question] _store.seed_from_files(_here / "prompts") in the template pipeline — what happens if the prompts/ directory is missing or empty? The template ships with prompt files, but a user copying the pattern might hit an unhelpful error.

  6. [question] The case_live.py notes "no cassette support yet" for agent-backed transforms, meaning this demo case always hits a real LLM. Are there unit tests that cover the agent-transform integration without LLM calls? The existing test_agent_transforms.py uses FakeAgent, which is good, but the golden files in templates/09-agent-driven/golden/ use heavy masking (<BUILD_COUNTS>, <N>, <STATS>) — this suggests the e2e path may not be runnable in CI without API keys.

  7. [concern] No new unit tests were added in this PR. The changes to test_agent_transforms.py, test_agents.py, and test_viewer_server.py are purely formatting. The Anthropic system message extraction logic in llm_client.py — which is the most consequential behavioral change — has zero test coverage in this diff. Edge cases: what if there are multiple system messages? The current loop takes the last one silently.

  8. [nit] Golden files lack trailing newlines (every file ends with \ + No newline at end of file). Minor, but some tools and diff viewers handle this poorly.

  9. [nit] The formatting changes (line length, dict indentation) across agents.py, group_synthesis.py, viewer/server.py, etc. are consistent with a formatter run (likely ruff or black). They're fine but inflate the diff — would be cleaner as a separate commit.

  10. [positive] The synix.toml workspace configuration for agents is a sensible ergonomic addition — declarative agent definitions that load_agents() hydrates. This keeps simple cases simple while the Python API handles complex ones.
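
The limitation in observation 3 could also be surfaced when the transform is constructed rather than at run time. A sketch follows, using entirely hypothetical names (`supports_group`, `GroupStep`, `MapOnlyAgent`, `RoutingAgent`); it is not how GroupSynthesis is implemented.

```python
class MapOnlyAgent:
    """Hypothetical agent that declares it cannot route artifacts into groups."""

    agent_id = "analyst"
    supports_group = False

    def group(self, artifacts, task_prompt):
        raise NotImplementedError("agent-driven grouping is not supported")


class RoutingAgent:
    """Hypothetical agent that groups artifacts by their first letter."""

    agent_id = "router"
    supports_group = True

    def group(self, artifacts, task_prompt):
        groups = {}
        for artifact in artifacts:
            groups.setdefault(artifact[:1].lower(), []).append(artifact)
        return groups


class GroupStep:
    """Hypothetical group-shaped transform that validates its agent up front."""

    def __init__(self, agent):
        # Reject unsupported agents when the pipeline is defined,
        # not halfway through a build.
        if not getattr(agent, "supports_group", False):
            raise TypeError(
                f"Agent {agent.agent_id!r} does not support group(); "
                "use an agent that implements grouping."
            )
        self.agent = agent

    def execute(self, artifacts, task_prompt="Group these artifacts"):
        return self.agent.group(artifacts, task_prompt)
```

An explicit capability flag is one option; documenting `group()` as optional in the protocol would be another, and the two are not mutually exclusive.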

Verdict

Good incremental step that adds a well-designed extension point; the main gap is missing tests for the Anthropic system-message fix and the duplicated template files.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,244 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T21:57:16Z

marklubin merged commit 8410676 into main on Apr 7, 2026
13 checks passed
marklubin deleted the mark/agent-template-docs branch on April 7, 2026 21:59