
Typed Agent protocol: map/reduce/group/fold + SynixLLMAgent #128

Merged
marklubin merged 6 commits into main from mark/typed-agents
Apr 7, 2026
Conversation

@marklubin
Owner

Summary

Replaces the generic write(AgentRequest) → AgentResult gateway (PR #126) with a typed Agent protocol matching pipeline transform shapes. Agents now own rendering + execution and have separate identity (agent_id) and config snapshot (fingerprint_value()).

  • Agent Protocol — typed methods: map(Artifact)→str, reduce(list[Artifact])→str, group(list[Artifact])→list[Group], fold(str, Artifact, int, int)→str
  • agent_id (stable identity) separate from fingerprint_value() (config snapshot for cache invalidation)
  • SynixLLMAgent — built-in implementation backed by PromptStore for versioned instructions, LLMClient for execution, system+user messages
  • Group dataclass — key, artifacts, content for group operation results (#127, "Agent-driven group routing: split artifacts into named groups for parallel downstream processing", tracks the feed-forward variant)
  • Workspace config[agents.*] TOML parsing, load_agents() convenience for pipeline.py
  • All 4 transforms updated to call typed methods when agent is set
  • Backward compatible: agent=None path unchanged

Key design decisions

  • Agent owns rendering + message framing + LLM execution (transform just passes typed inputs)
  • Instructions loaded from PromptStore at call time (picks up viewer edits automatically)
  • fingerprint_value() uses PromptStore content_hash (auto-invalidates cache on prompt edit)
  • agent_id = stable name, doesn't change with config — separate lifecycle from fingerprint

Test plan

  • uv run pytest tests/unit/test_agents.py -v — protocol + SynixLLMAgent
  • uv run pytest tests/unit/test_agent_transforms.py -v — all 4 shapes
  • uv run pytest tests/e2e/test_agent_pipeline.py -v — full pipeline
  • uv run pytest tests/unit/test_ext_transforms.py -v — backward compat
  • uv run release — full gate

Replace generic write(AgentRequest) gateway with typed operations
matching pipeline transform shapes:

- Agent Protocol: map(Artifact)→str, reduce(list[Artifact])→str,
  group(list[Artifact])→list[Group], fold(accumulated, Artifact, step, total)→str
- agent_id (stable identity) separate from fingerprint_value() (config snapshot)
- SynixLLMAgent: built-in implementation backed by PromptStore for
  versioned instructions, LLMClient for execution, system+user messages
- Group dataclass: key + artifacts + content for group results
- Workspace [agents.*] config parsing, load_agents() convenience
- All 4 generic transforms updated for typed calls
- 84 agent tests + 2214 total tests passing
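
For reference, a `[agents.*]` block in synix.toml might look like the fragment below. Only the `[agents.*]` table itself is confirmed by the diff summary; the individual key names are inferred from fields the reviews discuss (api_key_env, instructions_file, temperature, max_tokens) and should be treated as illustrative.

```toml
# Hypothetical synix.toml fragment — key names inferred, not confirmed.
[agents.summarizer]
model = "gpt-5.4"
temperature = 0.2
max_tokens = 1024
api_key_env = "OPENAI_API_KEY"
instructions_file = "prompts/summarizer.md"
```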
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes a core public abstraction (Agent) in a way that leaks implementation details, breaks compatibility, and creates cache/provenance ambiguity.

One-way doors

  1. Public Agent protocol redesign (src/synix/agents.py) — Replacing write(request) with shape-specific methods (map/reduce/group/fold) is a user-facing API break that downstream custom agents will code against. Hard to reverse once docs/examples adopt it. Safe only if there is an explicit migration plan, compatibility shim, and docs updated everywhere AgentRequest/AgentResult were part of the supported surface.
  2. Persisting agent_id on Artifact (src/synix/core/models.py) — This extends artifact schema/serialized records and becomes part of provenance semantics. Hard to undo if releases/search/materialization depend on it. Safe only if on-disk compatibility, migration behavior, and all readers/serializers are covered by tests.
  3. Config-level [agents.*] in synix.toml (src/synix/workspace.py) — New config schema becomes user contract. Safe only if documented, validated, and stable naming/override rules are nailed down.

Findings

  1. src/synix/__init__.py re-export breakage pattern
    You removed AgentRequest/AgentResult from package exports with no compatibility alias. Any user importing from synix import AgentRequest now breaks immediately. That is a straight public API break. Severity: [critical]

  2. src/synix/agents.py protocol now forces every agent to implement all transform shapes
    Design doc emphasizes composable primitives, but this API couples one agent implementation to map/reduce/group/fold even if it only supports one. That is a bad abstraction boundary and a regression from generic gateway behavior. group() is especially absurd here because the built-in implementation doesn’t support it. Severity: [warning]

  3. src/synix/ext/group_synthesis.py agent path violates project’s own separation of concerns
    Old model: Synix owned grouping semantics. New model: agent.group(inputs) owns grouping, rendering, and execution. That leaks DAG/grouping logic into the agent layer and directly contradicts the previous contract in the module docstring and design direction (“Synix owns prompt rendering and transform semantics”). Severity: [critical]

  4. src/synix/agents.py SynixLLMAgent.map/reduce/fold double-injects instructions
    It renders self.instructions as a template into user_content, then also sends the same self.instructions as the system prompt in _call(). That means instructions are both template and policy, conflating two different concerns and making prompt behavior non-obvious. It also risks duplicated instruction text and unstable outputs. Severity: [warning]

  5. src/synix/agents.py fingerprint behavior is cache-unsafe before prompt store binding
    fingerprint_value() silently hashes empty prompt content when _prompt_store is absent. Same agent gets one fingerprint before binding and another after binding. That can poison cache/materialization correctness depending on call order/environment. Severity: [critical]

  6. src/synix/workspace.py load_agents() / get_agent() have side effects during config load
    These functions mutate prompt storage by seeding instruction files on read. Config parsing should not write state. This creates hidden execution-order dependency on whether .synix/prompts.db exists and whether prompt key is already present. Severity: [warning]

  7. src/synix/workspace.py inconsistent config handling for api_key_env
    get_agent() passes api_key_env; load_agents() does not. Same config produces different runtime behavior depending on code path. That is exactly the kind of hidden divergence that becomes impossible to debug. Severity: [critical]

  8. src/synix/ext/group_synthesis.py non-agent and agent semantics now diverge materially
    Non-agent path still uses Synix group_by splitting; agent path ignores it and can emit arbitrary groups. Same transform type, radically different semantics. That will break provenance expectations, output count estimates, and likely cache/planning assumptions. Severity: [warning]

  9. src/synix/agents.py from_file() uses default text decoding and no error handling
    Reads instruction files with platform-default encoding and no failure-path tests for missing file, unreadable file, or invalid content. That is sloppy at the file boundary. Severity: [minor]
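
The cache hazard in finding 5 can be sketched concretely. This is illustrative Python, not the real SynixLLMAgent; the fix shown (refusing to fingerprint an unbound agent) is one plausible remedy, not necessarily the one the project takes.

```python
import hashlib


class SynixLLMAgentSketch:
    """Illustrative stand-in for SynixLLMAgent's fingerprint path."""

    def __init__(self, name: str, prompt_key: str) -> None:
        self.agent_id = name
        self.prompt_key = prompt_key
        self._prompt_store = None

    def bind_prompt_store(self, store) -> None:
        self._prompt_store = store

    def fingerprint_value(self) -> str:
        # Fail loudly instead of hashing empty content: an unbound agent
        # must not emit a fingerprint that differs from its bound one,
        # or cache/materialization correctness depends on call order.
        if self._prompt_store is None:
            raise ValueError(f"agent {self.agent_id!r}: prompt store not bound")
        content_hash = self._prompt_store.content_hash(self.prompt_key)
        return hashlib.sha256(content_hash.encode("utf-8")).hexdigest()
```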

Missing

  • Migration notes/docs for removed AgentRequest/AgentResult and new custom-agent contract.
  • Tests for serialization/persistence of new Artifact.agent_id through build, release, and search.
  • Tests proving fingerprint stability across bound/unbound prompt store states.
  • Validation/tests for malformed [agents.*] config in synix.toml.
  • Any documentation update in README/website/DESIGN for the new agent model and agent-owned grouping semantics.
  • Concurrency/locking story for prompt store seeding from both workspace and pipeline load paths.

Verdict

Block: this PR introduces a user-visible API break and a deeper design regression by pushing transform semantics into agents without a compatibility plan or cache-safety proof.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,804 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T02:43:48Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR replaces the generic Agent.write(AgentRequest) → AgentResult gateway with a shape-typed protocol where agents implement map(), reduce(), group(), and fold() directly. It introduces SynixLLMAgent as a built-in implementation backed by PromptStore, adds agent_id to Artifact for provenance, and wires agent configuration through synix.toml and workspace loading.

Alignment

Strong fit. DESIGN.md specifies that provenance chains must be complete and cache keys must capture all inputs. Adding agent_id to artifacts enriches the provenance chain — you can now trace which agent produced an artifact, not just its fingerprint. The fingerprint still drives cache invalidation (config snapshot), while agent_id provides stable identity — these are correctly separated, matching the materialization key philosophy. The SynixLLMAgent.fingerprint_value() hashes prompt content + llm_config, which aligns with the "step_version_hash captures everything affecting output" rule. The TOML-based agent config is a pragmatic complement to the Python-first principle — agents are defined declaratively but composed in Python pipeline code.

Observations

  1. [positive] The shape-typed protocol (map, reduce, group, fold) is a better contract than the generic write(). Each transform no longer renders prompts then hands opaque text to the agent — the agent receives typed domain objects. This makes the extension model significantly clearer.

  2. [concern] group_synthesis.py — agent path bypasses split/cache logic entirely. When self.agent is not None, the code calls self.agent.group(inputs) at the top of execute() and returns immediately, skipping the split() mechanism and the deterministic sorting. This means the agent path has no fingerprint-based caching — if the agent produces different groupings on re-run, there's no way to detect it. The non-agent path still goes through split. This is a meaningful behavioral divergence that could cause cache correctness issues.

  3. [concern] Duplicated agent-loading logic. Workspace.get_agent() and load_agents() both construct SynixLLMAgent from AgentConfig with nearly identical llm_config assembly code (~15 lines each). This will drift. Extract a shared _build_agent_from_config().

  4. [question] SynixLLMAgent.group() raises NotImplementedError — but the Agent protocol requires it. If someone passes a SynixLLMAgent to a GroupSynthesis, it will fail at runtime, not at construction. Should GroupSynthesis.__init__ validate that the agent's group() is callable (not just present)?

  5. [concern] SynixLLMAgent._call() instantiates a new LLMClient on every call. For pipelines processing thousands of artifacts, this creates thousands of client instances. If LLMClient holds connection state or does setup work, this could be a performance issue at scale.

  6. [positive] Test coverage is thorough — unit tests cover all four shapes, protocol conformance (missing any single method fails isinstance), fingerprint determinism and sensitivity, PromptStore binding, the from_file factory, and agent_id propagation. The e2e test updates are consistent.

  7. [nit] agent_id_val as a variable name in the transforms is awkward. agent_id would be fine — it doesn't shadow anything in scope.

  8. [question] load_agents() checks for prompts.db existence but creates no store if it doesn't exist. On a fresh project before first build, agents loaded via load_agents() would have no store bound and instructions access would raise. Is this the intended UX for pipeline authors?

  9. [positive] The __init__.py export change (Agent, Group, SynixLLMAgent) is a clean public API. Removing AgentRequest/AgentResult is a breaking change but appropriate for pre-1.0.

  10. [nit] Group is a very generic name for a public export. Could collide with user code. GroupResult or AgentGroup would be more specific.
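
Observation 4's construction-time check could look something like the sketch below. The `unsupported` marker and `GroupSynthesisGuard` are hypothetical names illustrating one approach (tagging unimplemented protocol methods so a transform can reject the agent up front), not code from the diff.

```python
def unsupported(method):
    """Mark a protocol method an agent declares but does not actually support."""
    method._synix_unsupported = True
    return method


class GroupSynthesisGuard:
    """Sketch of a construction-time capability check for GroupSynthesis."""

    def __init__(self, agent) -> None:
        group = getattr(agent, "group", None)
        # Reject at pipeline construction, not at execute() time.
        if group is None or getattr(group, "_synix_unsupported", False):
            raise TypeError(
                f"{type(agent).__name__} does not support group(); "
                "use a group-capable agent or the non-agent path"
            )
        self.agent = agent
```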

Verdict

This is a well-executed interface upgrade that strengthens both the extension model and provenance tracking, with one real concern around the group-synthesis agent path bypassing caching semantics that should be addressed before merge.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,804 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T02:43:59Z

…, compat aliases

- fingerprint_value() raises ValueError when prompt store not bound
  (was silently hashing empty string — cache safety issue)
- load_agents() now passes api_key_env from config (was missing)
- AgentRequest/AgentResult kept as deprecated aliases for backward compat
@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR replaces the generic Agent.write(AgentRequest) → AgentResult gateway with typed methods (map, reduce, group, fold) matching transform shapes. It introduces SynixLLMAgent — a built-in implementation backed by PromptStore for versioned instructions — adds agent_id to artifacts for provenance, and wires agent configuration through synix.toml and Workspace.

Alignment

Strong fit. DESIGN.md states agents don't own prompts, grouping, or fold semantics — transforms do. The old write() gateway was already that boundary, but the new typed methods make the contract more explicit: map receives a single Artifact, reduce receives a list, fold receives accumulated state. Each transform shape now has a named method instead of overloading a single write() with metadata hints. The agent_id on artifacts directly strengthens provenance chains (DESIGN.md §3.9, "explain why a record exists"). The SynixLLMAgent.fingerprint_value() incorporating prompt content hash and llm_config aligns with materialization key semantics (§3.3) — config changes invalidate cache.

Observations

  1. [concern] group_synthesis.py agent path — when self.agent is not None, the method returns early before the group_key / split logic. This means the agent now owns grouping entirely, bypassing the transform's split() and group_by config. The non-agent path still uses group_by. This is a semantic break: a GroupSynthesis with group_by="team" and an agent silently ignores group_by. No warning, no error. Users who set both will get surprising behavior.

  2. [concern] SynixLLMAgent.group() raises NotImplementedError, but GroupSynthesis.execute() calls self.agent.group() unconditionally when an agent is present. This means attaching a SynixLLMAgent to a GroupSynthesis will always crash at runtime. There's a test confirming the NotImplementedError, but no guard in GroupSynthesis itself.

  3. [concern] load_agents() in workspace.py duplicates the llm_config dict-building logic from get_agent() — ~15 identical lines. This will drift.

  4. [question] The Agent protocol now requires all four methods (map, reduce, group, fold). A user building a map-only agent must still implement reduce, group, and fold as stubs. The old protocol was two methods; the new one is six (including agent_id property and fingerprint_value). Is this intentional? It raises the bar for the extension model. DESIGN.md's CDK-over-CloudFormation bet argues for ease of customization.

  5. [question] SynixLLMAgent._call() instantiates a new LLMClient on every invocation. For fold over hundreds of inputs, this creates hundreds of client instances. Is LLMClient lightweight enough for this?

  6. [positive] agent_id vs fingerprint_value separation is clean — stable identity vs. config snapshot, with explicitly different lifecycles. The test test_same_config_different_names_same_fingerprint confirms fingerprint is content-based, not identity-based. This is correct for cache semantics.

  7. [positive] Test coverage is thorough: unit tests for every transform shape × agent, protocol conformance tests for missing methods, SynixLLMAgent prompt store binding/unbinding, fingerprint determinism and invalidation, from_file factory, and e2e agent pipeline. Edge cases (empty name, missing key, unbound store) are all covered.

  8. [nit] AgentRequest and AgentResult backward-compat aliases are created via type() with no usable fields — any code referencing AgentRequest.prompt will get AttributeError with no helpful message. These are exported from __init__.py. Consider raising DeprecationWarning on instantiation instead.

  9. [nit] SynixLLMAgent is a @dataclass but uses __post_init__ for validation and mutable _prompt_store. Not frozen=True, which is fine, but the old AgentRequest/AgentResult were frozen. The new Group dataclass is also mutable — intentional given it's a return value from agent code.
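
A compat alias along the lines nit 8 suggests might look like this sketch: a real frozen dataclass (matching the old shapes loosely; the exact fields are assumptions) that warns on instantiation instead of an empty `type()` shell.

```python
import warnings
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRequest:
    """Deprecated shim for the old write() gateway; fields are illustrative."""
    prompt: str
    metadata: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Keep old code running while pointing users at the typed protocol.
        warnings.warn(
            "AgentRequest is deprecated; implement the typed Agent protocol "
            "(map/reduce/group/fold) instead",
            DeprecationWarning,
            stacklevel=2,
        )
```

Old call sites keep working (`AgentRequest(prompt=...)` still has a `.prompt`), but every construction surfaces a DeprecationWarning with a migration hint.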

Verdict

Good incremental step — the typed agent interface is a cleaner contract than the generic gateway, and SynixLLMAgent with PromptStore integration is a meaningful capability addition. The GroupSynthesis + agent interaction (observation #1–2) needs resolution before merge: either guard against SynixLLMAgent on GroupSynthesis, or fall back to the split-based path when group() is not implemented.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,817 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T02:50:11Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes a core extension point and cache semantics while introducing config-driven agents with weak compatibility and incomplete behavioral coverage.

One-way doors

  1. Public Agent protocol shape (write(request) → map/reduce/group/fold)
    Hard to reverse because downstream custom agents will implement this interface and pipeline code/tests will ossify around it. Safe to merge only if you’re committed to shape-specific agent APIs as the long-term extension model, or you provide a real compatibility adapter/deprecation path.
  2. Persisting agent_id on Artifact (src/synix/core/models.py)
    This is now part of on-disk artifact schema/provenance. Reversing means migrations or silent mixed-state behavior. Safe only if artifact serialization/deserialization and release/search consumers are confirmed compatible across existing stores.
  3. Config surface [agents.*] in synix.toml (src/synix/workspace.py)
    User-facing config keys become support burden immediately. Safe only with docs, validation, and clear precedence rules vs pipeline.py-declared behavior.

Findings

  1. src/synix/agents.py backward compatibility is fake
    AgentRequest/AgentResult are replaced with empty dynamic types, not dataclasses. Any user code doing AgentRequest(prompt=...) or result.content will break instantly. The diff claims “deprecated” but actually removes functionality. Severity: [critical]

  2. src/synix/ext/group_synthesis.py changes group semantics in a major way
    Old path grouped by group_by and called the agent per group. New agent path calls agent.group(inputs) once on the entire input set and lets the agent own grouping. That is not a refactor; it’s a semantic rewrite. It violates the design’s “Synix owns grouping semantics” separation and makes grouping behavior agent-dependent and opaque. Severity: [critical]

  3. src/synix/agents.py / SynixLLMAgent._call duplicates instructions into both system and rendered user content
    map/reduce/fold render self.instructions as a template and then _call also sends self.instructions as system prompt. If instructions contain actual prompt text, the model sees it twice. If instructions are intended as system-only, templating them as user content is wrong. This is leaky and likely output-affecting. Severity: [warning]

  4. src/synix/workspace.py::load_agents has hidden environment coupling
    Whether agents can fingerprint depends on .synix/prompts.db existing next to synix.toml. Same pipeline file can work or fail based on local filesystem state. That’s brittle and violates predictable Python-first declaration. Severity: [warning]

  5. src/synix/workspace.py::get_agent/load_agents silently seeds prompt store only if key absent
    Changing instructions_file after first seed does nothing if the prompt key already exists. That creates stale config with no warning and surprising cache behavior. Severity: [warning]

  6. src/synix/agents.py::from_file reads text with default encoding and no error handling
    Non-UTF-8 files, missing files, and partial writes are plausible. This is user-boundary I/O with no validation. Severity: [minor]

  7. src/synix/ext/*_synthesis.py only partially propagates agent provenance
    agent_id is added on produced artifacts, but non-agent path in group_synthesis.py now hardcodes agent_fingerprint=None / model_config=ctx.llm_config without preserving prior local variables. That’s okay functionally, but there’s no evidence search/release/provenance tooling knows about agent_id. Schema addition without end-to-end validation is risky. Severity: [warning]

  8. src/synix/__init__.py exports new names publicly without docs
    Group and SynixLLMAgent are now top-level API. That’s a commitment, not an internal change. No README/docs update included. Severity: [warning]

Missing

  • Tests for real backward compatibility with old write(AgentRequest) -> AgentResult agents. Right now that path appears broken.
  • Tests for artifact serialization/deserialization or migrations involving new agent_id.
  • Tests for load_agents() / get_agent() failure modes: missing prompt DB, missing prompt key, missing instructions file, bad TOML types.
  • Documentation updates for the new Agent API, [agents.*] config schema, and the fact that group semantics moved into agents.
  • Validation of config fields (temperature, max_tokens, unknown keys, path normalization).
  • Any evidence that cache/provenance consumers incorporate agent_id correctly.

Verdict

Block: this PR breaks compatibility at a core extension boundary and quietly changes grouping ownership/semantics without docs, migration strategy, or sufficient tests.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,817 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T02:50:15Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes the agent abstraction in a way that leaks into cache semantics, artifact metadata, and pipeline behavior, with at least one direct violation of the project’s Python-first design.

One-way doors

  1. Public Agent protocol shape (map/reduce/group/fold, agent_id)
    Hard to reverse because third-party agents will implement this exact interface and transforms now depend on shape-specific methods instead of a generic gateway. Safe only if the team is committed to agent-owned transform semantics long-term and has audited all extension points for compatibility/migration.

  2. Persisting agent_id on Artifact
    This is now part of the artifact model and likely on-disk/object schema. Hard to remove once builds/releases/provenance views depend on it. Safe only if there is a documented schema compatibility story and storage/read-paths tolerate missing/extra fields across versions.

  3. [agents.*] in synix.toml plus workspace.load_agents()
    This is a design commitment to config-defined agents, which conflicts with “Python-first, not config-first.” Once documented/used, backing it out is painful. Safe only if this config layer is explicitly scoped as optional runtime wiring, not a primary pipeline API, and docs are updated to say so.

Findings

  1. src/synix/agents.py — protocol forces group() on every Agent, but built-in SynixLLMAgent.group() is NotImplementedError
    The core built-in implementation does not satisfy the practical contract of the protocol. Any code path that treats Agent as generally usable across transform shapes will fail at runtime. This is a bad abstraction: the type says “all shapes,” the implementation says “except one.” Severity: [critical]

  2. src/synix/ext/group_synthesis.py — agent path bypasses transform-owned grouping/splitting semantics entirely
    Design doc says grouping belongs to transforms/DAG semantics; previous code also kept grouping in transform logic. Now agent-owned grouping can arbitrarily reshape outputs, labels, and provenance independent of group_by. That’s a direct semantic shift, not just refactoring. It undermines stable incremental behavior and makes cache reasoning much harder. Severity: [critical]

  3. src/synix/agents.py + src/synix/ext/*.py — agent now owns prompt rendering despite prior contract saying Synix owns prompts
    The module doc literally says the old design kept rendering/semantics in transforms. Now SynixLLMAgent.map/reduce/fold renders internally, while transforms do it in non-agent mode. Same transform can have different prompt construction depending on execution path. That is a leaky abstraction and a doc/code contradiction. Severity: [warning]

  4. src/synix/workspace.py — config-defined agents in TOML violate stated Python-first principle
    DESIGN.md explicitly rejects config-first as contradicting the thesis. This PR adds a substantial [agents] config surface and helper loader for use in pipeline.py. That’s not a tiny implementation detail; it’s product direction drift. Severity: [warning]

  5. src/synix/workspace.py::load_agents() — silently returns unbound agents if .synix/prompts.db does not exist
    Those agents will later explode on fingerprint_value()/instructions with runtime ValueError. This is missing boundary validation and creates order-dependent failures based on filesystem state. Severity: [warning]

  6. src/synix/workspace.py::get_agent() / load_agents() — seeding prompt store only if key missing
    Changes to instructions_file after first seed are ignored forever unless the DB is manually edited/deleted. That is hidden state overriding source-controlled files, with no warning. For a build system claiming provenance and predictable rebuilds, this is dangerous. Severity: [warning]

  7. src/synix/agents.py::_call() — no timeout/retry/degradation controls visible
    Each agent call instantiates a client and makes a network call with no obvious timeout handling here. Maybe the client does it, maybe not; this PR gives no tests for failure behavior. Reliability risk increases because agent methods are now first-class execution paths. Severity: [minor]

  8. Tests focus on happy-path protocol mechanics, not cache/provenance/storage compatibility
    You added agent_id to Artifact and changed transform semantics, but there are no tests proving fingerprints/materialization behavior, serialization/deserialization compatibility, or release/provenance output behavior. Severity: [warning]

Missing

  • Migration/storage tests proving Artifact.agent_id is persisted/read safely across old artifacts.
  • Tests for load_agents() when prompt DB is absent, prompt key is missing, or instructions_file changes after seeding.
  • Documentation updates: README/DESIGN/site do not mention config-defined agents, agent-owned grouping, or the new agent protocol.
  • Clear statement whether AgentRequest/AgentResult are deprecated publicly, and for how long.
  • Any cache-correctness tests showing agent_id vs agent_fingerprint interaction does not break incremental rebuild guarantees.

Verdict

Block: the abstraction shift is not just additive; it changes ownership of grouping/rendering, conflicts with the design doc, and introduces config/state behavior that is under-tested and hard to unwind.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,837 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T02:53:56Z

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

Replaces the generic Agent.write(AgentRequest) → AgentResult gateway with typed methods (map, reduce, group, fold) matching transform shapes. Introduces SynixLLMAgent as a built-in PromptStore-backed implementation, adds agent_id to artifacts for provenance, and adds agent configuration/loading from synix.toml.

Alignment

Strong fit with the vision. DESIGN.md says transforms own prompt rendering and semantics while agents are execution gateways. This PR pushes that further — agents now own rendering and execution for their shape, which is a cleaner boundary. The agent_id on artifacts extends provenance (DESIGN §3.9: "explain why a record exists"). Fingerprinting via PromptStore content hash aligns with materialization key semantics (§3.3): when instructions change, the fingerprint changes, triggering rebuilds. The PromptStore binding supports the "memory architecture is a runtime concern" hypothesis — edit instructions in the viewer, rebuilds happen automatically.

Observations

  1. [positive] The typed map/reduce/group/fold protocol is a clear improvement. It makes the agent contract self-documenting — implementors know exactly what shapes they need to support. The old AgentRequest.metadata bag was an implicit contract disguised as data.

  2. [concern] group_synthesis.py — the agent path bypasses the existing split() logic entirely and delegates grouping to the agent. The non-agent path still uses split() + per-group LLM calls. This means agent-driven grouping has fundamentally different semantics (agent decides groups) vs. config-driven grouping (group_by metadata key). The group_by parameter on GroupSynthesis is silently ignored when an agent is provided. This should be documented or validated.

  3. [concern] SynixLLMAgent._call() instantiates a new LLMClient on every invocation. At 10,000+ artifacts in a map transform, this creates 10,000 client objects. If LLMClient.__init__ does anything non-trivial (connection setup, key validation), this is a performance issue.

  4. [concern] load_agents() in workspace.py duplicates the llm_config assembly logic from get_agent() nearly verbatim (~15 lines). Extract to a shared helper.

  5. [question] The Agent protocol now requires all four methods (map, reduce, group, fold). If someone only needs a map agent for a MapSynthesis transform, they must still implement reduce, group, and fold to satisfy isinstance checks. The tests confirm this (TestAgentProtocol). Was a smaller protocol per-shape considered? This makes the extension model heavier than necessary.

  6. [question] SynixLLMAgent.group() raises NotImplementedError. But GroupSynthesis.execute() calls self.agent.group() unconditionally when an agent is set. So wiring a SynixLLMAgent to a GroupSynthesis will always fail at runtime. Should this be caught at pipeline construction time?

  7. [positive] Backward compatibility is handled cleanly — AgentRequest/AgentResult are kept with deprecation comments. No breaking change for existing code.

  8. [positive] Test coverage is thorough: protocol conformance, all four shapes with/without agents, agent_id propagation, fingerprint determinism, PromptStore binding/edits, from_file, empty fingerprint rejection. Edge cases like missing store and missing prompt key are tested.

  9. [nit] agent_id_val as a variable name in the transforms is awkward. agent_id would be fine — it doesn't shadow anything in scope.

  10. [nit] SynixLLMAgent is a @dataclass but not frozen, with a mutable _prompt_store field. The Agent protocol doesn't constrain this, but the mix of mutable binding (bind_prompt_store) and cache-correctness contracts feels like a footgun for concurrent use.

  11. [concern] No tests for WorkspaceConfig.agents parsing or load_agents(). The TOML parsing path and the get_agent() workspace method are untested in this diff.

Verdict

Good incremental step — the typed agent protocol is a meaningful improvement to the extension model, provenance gets richer with agent_id, and test coverage is solid on the core path. The workspace/TOML integration needs tests, and the group() gap (SynixLLMAgent always raises but GroupSynthesis always calls) should be addressed before this ships to users.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,837 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T02:54:10Z

github-actions bot commented Apr 7, 2026

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR replaces the thin Agent gateway (a single write(AgentRequest) → AgentResult method) with a shape-aware Agent protocol that has typed methods (map, reduce, group, fold) matching transform shapes, adds stable identity (agent_id), and introduces SynixLLMAgent — a built-in implementation backed by PromptStore for versioned instructions. It also adds TOML-based agent configuration in workspace and propagates agent_id onto artifacts.

Alignment

Strong fit. DESIGN.md says "architecture is a runtime concern" and "Python-first." This PR moves agent behavior closer to being a declarative, configurable runtime concern — agents defined in synix.toml, instructions in a prompt store that can be edited live, fingerprint tied to prompt content for cache invalidation. The agent_id on artifacts strengthens the provenance chain (DESIGN.md §3.9: "explain why a record exists"). Materialization keys remain content-addressed since fingerprint_value() captures prompt content + model config, not agent identity — correctly separating identity from cache semantics.

Observations

  1. [positive] agent_id vs fingerprint_value() separation is clean. Identity is stable; fingerprint changes with behavior. This directly serves the materialization key design (DESIGN.md §3.3) — two agents with identical configs produce the same fingerprint, which is correct for cache semantics (verified by test_same_config_different_names_same_fingerprint).

  2. [concern] SynixLLMAgent.group() raises NotImplementedError, but the Agent protocol requires it. Any custom agent must implement all four methods even if only used for map. This makes the protocol expensive to implement for simple use cases. Consider splitting into narrower protocols (e.g., MapAgent, FoldAgent) or providing a base class with NotImplementedError defaults.
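One possible shape for the narrower protocols, as a sketch — the names `MapAgent`/`ReduceAgent` and the method signatures are suggestions, not taken from the diff:

```python
from typing import Protocol, runtime_checkable


class Artifact:
    # Minimal stand-in for synix's Artifact.
    def __init__(self, content: str):
        self.content = content


@runtime_checkable
class MapAgent(Protocol):
    @property
    def agent_id(self) -> str: ...
    def map(self, artifact: Artifact, task_prompt: str) -> str: ...


@runtime_checkable
class ReduceAgent(Protocol):
    @property
    def agent_id(self) -> str: ...
    def reduce(self, artifacts: list[Artifact], task_prompt: str) -> str: ...


class UppercaseAgent:
    # Implements only the map shape, yet still satisfies
    # isinstance(x, MapAgent) -- no stub reduce/group/fold needed.
    agent_id = "upper"

    def map(self, artifact: Artifact, task_prompt: str) -> str:
        return artifact.content.upper()
```

Each transform would then check only the protocol it actually calls, so a map-only agent never has to stub the other three shapes.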

  3. [concern] SynixLLMAgent._call() instantiates a new LLMClient on every invocation. For a fold over 100 inputs, that's 100 client instantiations. If LLMClient has any setup cost (connection pool, auth), this could be wasteful at scale.

  4. [concern] In group_synthesis.py, the agent path bypasses the split() logic entirely — the agent receives all inputs and owns grouping. But the non-agent path still uses split() + per-group execute() recursion. This means the agent path doesn't respect any group_by metadata key or deterministic sorting. The semantic contract changes silently depending on whether an agent is attached. The label/metadata construction also differs (agent path uses g.key directly; non-agent path constructs a slug). This asymmetry could confuse pipeline authors.

  5. [positive] fingerprint_value() raises if no prompt store is bound — fail-fast for cache correctness rather than silently producing wrong cache keys. Good discipline.

  6. [question] load_agents() in workspace.py duplicates the llm_config dict-building logic from get_agent() almost line-for-line (~20 lines). Should this be extracted?

  7. [nit] SynixLLMAgent is a @dataclass but doesn't satisfy the Agent protocol's isinstance check because agent_id is a @property on the class, not a protocol-matching attribute. The tests don't verify isinstance(SynixLLMAgent(...), Agent) — worth adding.

  8. [positive] Test coverage is thorough — unit tests for each shape method, fingerprint determinism, prompt store binding lifecycle, from_file factory, protocol compliance checks for missing methods, agent_id propagation on all four transform types, and the e2e test updated. Edge cases like empty fingerprint rejection and unbound store errors are covered.

  9. [nit] Group is a mutable dataclass (no frozen=True), unlike the old AgentRequest/AgentResult which were frozen. Since groups are returned from agents and consumed by transforms, immutability would be safer.

  10. [question] The instructions property sends both self.instructions as system prompt and the rendered template as user content in _call(). This means instructions appear twice in the LLM context (once as system, once embedded in the rendered template via render_template). Is this intentional?

Verdict

A well-motivated refactor that strengthens provenance and moves toward agents as first-class pipeline citizens; the group-path semantic divergence (observation 4) and protocol width (observation 2) deserve attention before this solidifies as the public extension API.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,813 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T20:51:16Z

github-actions bot commented Apr 7, 2026

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes the public agent API, shifts transform semantics, and introduces config/prompt-store coupling without enough migration, persistence, or failure-mode coverage.

One-way doors

  1. Public Agent protocol rewrite (src/synix/agents.py, src/synix/__init__.py)
    Hard to reverse because users implementing Agent now must provide map/reduce/group/fold and agent_id instead of write(). That becomes ecosystem surface area immediately. Safe only if there is an explicit migration path, deprecation window, docs updates, and proof no shipped examples/plugins still depend on write().

  2. Artifact schema expansion with agent_id (src/synix/core/models.py)
    Hard to reverse once artifacts persist this field and downstream tooling starts reading it. Safe only if serialization/deserialization, release/search/provenance consumers, and compatibility with old artifact objects are verified.

  3. synix.toml [agents.*] config shape (src/synix/workspace.py)
    This is user-facing config contract. Renaming fields later (instructions_file, prompt_key, api_key_env) will be painful. Safe only with docs, validation, and examples showing intended long-term shape.
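For reference, a hedged sketch of what the `[agents.*]` table implied by the diff might look like — the field names beyond `instructions_file`, `prompt_key`, and `api_key_env` (and all values) are guesses for illustration only:

```toml
[agents.summarizer]
provider = "anthropic"                       # assumed field name
model = "claude-sonnet"                      # placeholder value
temperature = 0.2                            # assumed field name
instructions_file = "prompts/summarizer.md"
prompt_key = "summarizer"
api_key_env = "ANTHROPIC_API_KEY"
```

Whatever the real shape is, documenting it with an example like this is exactly the commitment the one-way door warns about.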

Findings

  1. src/synix/ext/group_synthesis.py — semantic break: agent path bypasses group_by/split() entirely
    Previous behavior grouped by transform config; now self.agent.group(inputs) receives all inputs and owns grouping. That violates the design principle that pipeline topology/semantics live in transforms, not execution gateways. It also makes configured group_by effectively ignored when an agent is attached. Severity: [critical]

  2. src/synix/agents.pySynixLLMAgent.map/reduce/fold sends instructions twice
    self.instructions is rendered into user content and also passed as system prompt in _call(). If instructions contain placeholders, they appear in both channels; if they are plain instructions, they’re duplicated. This is a silent prompt-shape change with output/caching consequences. Severity: [warning]

  3. src/synix/workspace.pyload_agents() conditionally binds prompt store based on .synix/prompts.db existence
    Same pipeline file can produce working agents in one environment and fingerprint failures in another. That is hidden environment coupling and non-obvious execution-order dependence. Severity: [warning]

  4. src/synix/workspace.py — config loading lacks boundary validation
    No validation for invalid temperature, missing provider/model, nonexistent instructions_file, unreadable files, or malformed agent sections. You’ll get late runtime failures from LLM config or prompt lookup instead of config-time errors. Severity: [warning]
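A minimal fail-fast validation sketch at config-load time — the field names, bounds, and error style here are assumptions, not the project's actual conventions:

```python
from pathlib import Path


def validate_agent_section(name: str, section: dict, root: Path) -> None:
    # Reject malformed [agents.<name>] tables when the TOML is parsed,
    # instead of failing deep inside LLM execution or prompt lookup.
    for required in ("provider", "model"):
        if not section.get(required):
            raise ValueError(f"[agents.{name}] missing required key {required!r}")
    temp = float(section.get("temperature", 0.0))
    if not 0.0 <= temp <= 2.0:
        raise ValueError(f"[agents.{name}] temperature {temp} out of range [0, 2]")
    instructions = section.get("instructions_file")
    if instructions and not (root / instructions).is_file():
        raise ValueError(
            f"[agents.{name}] instructions_file not found: {instructions}"
        )
```

Running this once per agent section turns every class of late runtime failure listed above into a config-time error with the section name attached.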

  5. src/synix/core/models.py + transforms — schema change not traced through consumers
    agent_id is written on artifacts, but diff shows no updates to object-store serialization, search indexing, lineage display, CLI show, or release projection code. If those paths assume a fixed artifact shape, this is regression bait. Severity: [warning]

  6. src/synix/agents.pyGroup stores full Artifact objects
    Agent API now depends on core artifact internals, not a narrow execution contract. That’s tighter coupling than the old rendered-request boundary and leaks implementation details into agent implementations. Severity: [warning]

  7. src/synix/agents.pyfrom_file() and the seeding logic are overwrite-free but race-prone
    store.get(...) is None then store.put(...) is a classic check-then-act pattern. Multiple processes using the same workspace can interleave and produce inconsistent prompt-store state. Severity: [minor]
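If the prompt store is SQLite-backed, the check-then-act pair can collapse into a single idempotent write. A sketch under that assumption — the schema and `PromptStore` internals are not in the diff:

```python
import sqlite3


def seed_prompt(conn: sqlite3.Connection, key: str, content: str) -> None:
    # INSERT OR IGNORE is atomic within SQLite, so concurrent seeders
    # cannot interleave a get()/put() pair. Existing rows are untouched,
    # which also preserves user edits made through the viewer.
    conn.execute(
        "INSERT OR IGNORE INTO prompts (key, content) VALUES (?, ?)",
        (key, content),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (key TEXT PRIMARY KEY, content TEXT)")
seed_prompt(conn, "summarizer", "v1")
seed_prompt(conn, "summarizer", "v2")  # no-op: key already seeded
```

The second call is a no-op rather than a lost-update hazard, with no application-level locking required.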

  8. tests/unit/test_agent_transforms.py — tests codify the semantic regression
    New tests assert GroupSynthesis with agent calls agent.group() once on all inputs and returns arbitrary group keys. That confirms the transform abstraction was weakened rather than accidentally broken. Severity: [critical]

Missing

  • No migration tests proving old write()-style agents fail cleanly or are adapted.
  • No docs updates in README/site/design for the new agent API or [agents.*] config.
  • No tests for artifact persistence/round-trip with agent_id.
  • No tests for Workspace.get_agent() when prompt store is absent, prompt key missing, or instructions_file is bad.
  • No concurrency/race handling around prompt-store seeding.
  • No evidence that cache/provenance/search/release consumers incorporate agent_id safely.

Verdict

Block: the group-agent semantic change is a design violation and the new agent/config API is being made public without migration, validation, or end-to-end compatibility proof.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,813 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T20:51:18Z

Transform renders its prompt (task structure) and passes it as
task_prompt to the agent. Agent uses task_prompt as user message,
its own instructions as system message. Both compose:
- Transform defines WHAT (e.g. "Summarize: {artifact}")
- Agent defines HOW (e.g. "You are a concise writer, 200 words")

Same agent works across different transforms. Same transform works
with different agents.
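That composition can be sketched roughly as follows — the message-dict shape is the common chat-API convention, assumed rather than taken from the diff:

```python
def build_messages(instructions: str, task_prompt: str) -> list[dict[str, str]]:
    # Agent instructions frame HOW (system message); the transform's
    # rendered task_prompt frames WHAT (user message). Neither is
    # embedded in the other, so the same agent composes with any
    # transform and the same transform with any agent.
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": task_prompt},
    ]


msgs = build_messages(
    "You are a concise writer, 200 words max.",
    "Summarize: quarterly retro notes...",
)
```

Keeping the two channels disjoint is also what avoids the instructions-sent-twice concern raised in the reviews above.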

github-actions bot commented Apr 7, 2026

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes the public agent abstraction, mutates transform semantics, and introduces config/runtime coupling without proving cache, provenance, or compatibility are still correct.

One-way doors

  1. Public Agent protocol redesign (src/synix/agents.py, src/synix/__init__.py) — Replacing write(request) with four shape-specific methods plus agent_id is a breaking API users will implement against. Hard to reverse once examples/docs/plugins depend on it. Safe only if there is a compatibility shim or the project explicitly declares the old API removed and updates all docs/examples.
  2. Persisting agent_id on Artifact (src/synix/core/models.py) — This changes the artifact data model and likely on-disk object shape. Hard to undo if releases/build objects start carrying it. Safe only if serialization/backward-read compatibility is verified and migration behavior is documented.
  3. Agent-driven grouping semantics (src/synix/ext/group_synthesis.py) — This is not just plumbing; it changes who owns grouping from transform to agent for the agent path. That cuts across the design doc’s “transform defines grouping semantics” stance. Safe only if materialization/fingerprinting/lineage invariants for N:M are explicitly defined and tested.

Findings

  1. src/synix/ext/group_synthesis.py — agent path bypasses transform grouping entirely
    Failure mode: with self.agent is not None, group_by, deterministic split logic, and per-group execution are skipped; the agent gets all inputs and returns arbitrary groups. That violates the documented principle that Synix owns grouping semantics and risks nondeterministic cache keys/lineage drift. It also makes group_by effectively ignored for agent-backed transforms. Severity: [critical]

  2. src/synix/workspace.py + src/synix/agents.py — cache fingerprint depends on live PromptStore state
    Failure mode: fingerprint_value() raises if no prompt store is bound, and when bound it reads mutable DB content at runtime. Same pipeline code can fingerprint differently depending on whether .synix/prompts.db exists or what a viewer edited out-of-band. That undermines Python-first/versionable declaration and reproducibility/audit expectations. Severity: [critical]

  3. src/synix/workspace.py::load_agents — hidden side effect: importing pipeline code may mutate prompt store
    Failure mode: loading agents seeds instructions into .synix/prompts.db if missing. Merely evaluating config can perform writes, dependent on filesystem state and execution order. That is hidden complexity and a race hazard if multiple processes initialize concurrently. Severity: [warning]

  4. src/synix/__init__.py — removes AgentRequest/AgentResult exports
    Failure mode: obvious downstream break for any user code importing these public names. No deprecation path, no alias, no docs update in provided materials. Severity: [warning]

  5. src/synix/ext/map_synthesis.py, reduce_synthesis.py, fold_synthesis.py — no error handling around agent method failures
    Failure mode: agent implementations can now fail in shape-specific ways, including fingerprint_value() and group() NotImplementedError. There’s no graceful wrapping, context enrichment, or fallback. This will be painful to debug in builds. Severity: [warning]

  6. src/synix/ext/group_synthesis.py — provenance/metadata consistency diverges between agent and non-agent paths
    Failure mode: non-agent path stores slugged labels and ctx.llm_config; agent path stores raw g.key, no slug normalization, model_config=None, and arbitrary input_ids chosen by the agent. Search/release/UI code likely assumes consistent label and metadata conventions. Severity: [warning]

  7. tests/unit/test_agents.py — tests normalize bad design instead of protecting contracts
    Failure mode: tests assert that two agents with different names have the same fingerprint. That may be intended, but combined with separate agent_id it means cache identity and provenance identity diverge. If that’s deliberate, it needs stronger rationale and integration tests proving no stale reuse across semantically distinct agents sharing instructions/config. Severity: [minor]

Missing

  • Backward-compatibility tests for old write() agents or an explicit migration/removal note.
  • Serialization/object-store tests proving old artifacts without agent_id still load and new artifacts don’t break release/search.
  • Integration tests for Workspace.get_agent() / load_agents() with and without .synix/prompts.db, including missing prompt keys and concurrent seeding.
  • Tests proving GroupSynthesis cache/provenance behavior remains deterministic when agent grouping is used.
  • Documentation updates: README/DESIGN still describe agents as execution gateways where transforms own grouping/sorting/fold semantics. This PR contradicts that.

Verdict

Block: the agent API break is manageable, but the agent-driven grouping and PromptStore-dependent fingerprinting introduce architectural inconsistency and cache/provenance risk that is not justified or documented.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,803 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T21:02:57Z

github-actions bot commented Apr 7, 2026

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

Replaces the generic Agent gateway (single write(AgentRequest) → AgentResult method) with a shape-typed protocol where agents implement map, reduce, group, and fold directly. Introduces SynixLLMAgent as a built-in implementation backed by PromptStore for versioned instructions, adds agent_id to artifact provenance, and wires agent configuration through synix.toml / workspace.

Alignment

Strong fit. DESIGN.md defines transforms as the core build rules and the agent as an "execution gateway." This PR keeps that separation — transforms still own prompt rendering (task_prompt), agents own execution — but gives agents typed awareness of transform shape, which is necessary groundwork for agent-driven memory (the roadmap's end-state). Adding agent_id to Artifact extends the provenance chain (DESIGN.md §3.9), and fingerprint_value() from PromptStore content hashes correctly drives cache invalidation per the materialization key contract (§3.3). The SynixLLMAgent.group() raising NotImplementedError is honest — grouping semantics are structurally different and the PR doesn't pretend otherwise.

Observations

  1. [positive] fingerprint_value() raises if no PromptStore is bound — this enforces cache correctness at the call site rather than silently producing wrong keys. Exactly right per §3.3.

  2. [positive] Comprehensive test coverage: unit tests for each protocol method, SynixLLMAgent construction/validation/fingerprinting, all four transform shapes with agent, agent_id propagation, empty-fingerprint rejection, no-agent fallback, and the e2e test updated. Edge cases (missing store, key not found, prompt edits changing fingerprint, same config different names) are covered.

  3. [concern] load_agents() in workspace.py duplicates the llm_config construction logic from get_agent() almost line-for-line (~15 lines). This will drift. Extract a shared _build_llm_config(ac: AgentConfig) -> dict helper.

  4. [concern] group_synthesis.py agent path bypasses the existing split() / group_key logic entirely — the agent receives all inputs and does its own grouping. This means group_by metadata key is ignored when an agent is present. The semantic shift is significant but undocumented. A comment or docstring clarifying "agent owns grouping when present" would prevent confusion.

  5. [concern] SynixLLMAgent._call() instantiates a new LLMClient on every invocation. For fold with many steps this creates N client instances. Not a correctness bug, but a scale concern if client setup has overhead (connection pooling, auth).

  6. [question] The Agent protocol now requires all four methods (map, reduce, group, fold). External implementers who only need one shape must stub the rest. Was a narrower per-shape protocol considered? The protocol tests confirm this is intentional — every missing method fails isinstance — but it's a heavier contract than the old one.

  7. [nit] reduce_synthesis.py line 115: count=str(len(sorted_inputs)) but output_metadata on line 139 uses len(inputs). These are the same list with different ordering, so the value is identical, but mixing references is confusing — pick one consistently.

  8. [nit] SynixLLMAgent is a @dataclass but not frozen, meaning name and prompt_key can be mutated after construction. Given that agent_id derives from name and feeds into artifact provenance, consider frozen=True or documenting the mutability contract.

  9. [positive] from_file() classmethod with PromptStore seeding is a clean factory pattern for the TOML-configured path. The "seed if not already present" logic in get_agent() avoids clobbering user edits in the viewer — good UX detail.

  10. [question] No test for the _parse_toml agent-section parsing or load_agents(). These are new external-facing entry points — a missing [agents.x] table, malformed config, or absent prompts.db should have coverage.

Verdict

This is a well-structured evolution of the agent interface that strengthens provenance tracking and correctly maintains cache semantics — a good incremental step, with the config-building duplication and missing TOML parsing tests as the main items to address before merge.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,803 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T21:03:12Z

github-actions bot commented Apr 7, 2026

Red Team Review — OpenAI GPT-5.4 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes a public extension point and partially redefines transform semantics, with at least one clear design contradiction and weak migration/docs coverage.

One-way doors

  1. Agent protocol redesign (write() → shape-specific methods + agent_id)
    Hard to reverse because third-party/custom agents will code to this protocol and tests now enforce it structurally. Reverting later means breaking user pipelines. Safe only if there is a documented compatibility/migration story and evidence no shipped code still depends on AgentRequest/AgentResult.
  2. Persisting agent_id on Artifact
    This is now part of the artifact model and likely on-disk object schema. Once artifacts/releases include it, downstream tools may rely on it. Safe only if serialization/back-compat is verified and old artifacts without agent_id are handled everywhere.
  3. TOML-level [agents.*] config in synix.toml
    The design doc is explicitly Python-first and anti-config-first. Exposing agents as first-class TOML config creates a second declaration surface users will depend on. Safe only if this is an intentional product-direction change and docs are updated to explain precedence/lifecycle with pipeline.py.

Findings

  1. src/synix/ext/group_synthesis.py — agent path bypasses group_by/split semantics
    In the non-agent path, GroupSynthesis groups by group_by and processes each group. In the agent path, it calls self.agent.group(inputs, rendered) once on the full input set and lets the agent invent groups. That is not the same abstraction. It violates the documented model where grouping is a transform concern, not an agent concern. It also means metadata/labels/group cardinality differ depending on execution backend. Severity: [critical]
  2. src/synix/agents.py + src/synix/__init__.py — breaking public API with no compatibility layer
    AgentRequest and AgentResult are removed from exports and implementation, replaced with a wider protocol. Any external code importing them or implementing write() now breaks. I see no shim, deprecation path, or release note. Severity: [critical]
  3. src/synix/workspace.py — config duplication and hidden runtime dependency on prompt store
    load_agents() / get_agent() silently produce agents that may or may not be fingerprintable depending on whether .synix/prompts.db exists or runtime is active. That makes pipeline loading environment-dependent. A build graph object should not change validity based on incidental local files. Severity: [warning]
  4. src/synix/agents.pySynixLLMAgent.from_file() writes into PromptStore as a side effect
    Constructing an agent mutates persistent state. That is surprising, non-idempotent at the API boundary, and can couple import-time behavior to local DB state. This kind of side effect is exactly how reproducibility gets muddy. Severity: [warning]
  5. src/synix/ext/group_synthesis.py — inconsistent artifact provenance/model fields between paths
    Agent-produced grouped artifacts set model_config=None always and don’t preserve any grouping logic provenance beyond agent_fingerprint; non-agent path stores ctx.llm_config. If an agent internally uses an LLM config, you’ve dropped audit detail contrary to the design’s “full provenance / audit determinism” claims. Severity: [warning]
  6. src/synix/workspace.py — no validation of agent config values
    temperature, max_tokens, file paths, and provider/model combinations are accepted almost raw. Bad TOML now propagates into runtime failures deep in LLM execution instead of failing at config load. Severity: [warning]
  7. src/synix/agents.py — no timeout/retry/error policy at new abstraction boundary
    _call() creates a new LLMClient per invocation and directly calls complete(). No visible timeout/retry/circuit-breaker behavior here, and this path is now central. At scale this is a reliability footgun. Severity: [minor]

Missing

  • Migration docs for custom agents using write(AgentRequest) -> AgentResult.
  • Tests proving old artifacts/object loading still works with new agent_id field.
  • Any docs/README updates describing agents, [agents.*] TOML, SynixLLMAgent, or the changed ownership of grouping semantics.
  • Tests for load_agents() / get_agent() failure modes: missing prompt DB, missing instructions file, invalid config, concurrent seeding.
  • Serialization/persistence tests showing agent_id survives build/release/query paths.

Verdict

Block: the group-agent semantics are a real design break, and the public agent API change ships without migration or documentation.

Review parameters
  • Model: gpt-5.4
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,823 lines
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-04-07T21:12:06Z

github-actions bot commented Apr 7, 2026

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

Replaces the generic Agent.write(AgentRequest) -> AgentResult gateway with typed, shape-aware methods (map, reduce, group, fold) on the Agent protocol. Adds agent_id as stable identity distinct from fingerprint_value(). Introduces SynixLLMAgent, a built-in implementation backed by PromptStore + LLMClient, and workspace-level agent configuration via synix.toml.

Alignment

Strong fit. DESIGN.md §3.9 establishes audit determinism: every artifact must explain what produced it. Adding agent_id to Artifact directly advances this — you now know which agent produced an artifact, not just its config hash. The fingerprint_value() contract (content-based, not name-based) correctly aligns with materialization key semantics from §3.3: same instructions + same model = same fingerprint, regardless of agent name. The separation of agent_id (stable identity) from fingerprint_value() (cache key) mirrors the design's separation of step name from step_version_hash.

The typed methods (map, reduce, group, fold) match the four build rule types from §3.5 exactly, which is cleaner than routing everything through a generic write() that lost type information.

Observations

  1. [positive] GroupSynthesis agent path now delegates grouping to the agent — the agent decides group membership and synthesizes content. This is a meaningful capability expansion that aligns with Hypothesis 3 (architecture is a runtime concern). The non-agent path is preserved unchanged.

  2. [concern] SynixLLMAgent._call() instantiates a new LLMClient on every invocation. For a fold over 28 monthly inputs, that's 28 client constructions. This won't break correctness but is wasteful and will matter at scale. Should be lazy-initialized and cached.

  3. [concern] SynixLLMAgent.group() raises NotImplementedError. But GroupSynthesis.execute() unconditionally calls self.agent.group() when an agent is present. Any user passing a SynixLLMAgent to GroupSynthesis will get a runtime crash. This should either be documented loudly at GroupSynthesis.__init__ validation time, or GroupSynthesis should check hasattr / try-except. The test covers the error but no test verifies the integration path catches it gracefully.
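One way to surface the mismatch at pipeline construction time rather than mid-build. A sketch with heavily simplified class internals — the `supports_group` marker is an invented mechanism; the real check would depend on how SynixLLMAgent flags its stub:

```python
class GroupNotSupported(TypeError):
    pass


class GroupSynthesis:
    def __init__(self, agent=None):
        # Fail when the pipeline is constructed, not on the first batch
        # of inputs: reject agents that declare group() unimplemented.
        if agent is not None and getattr(agent, "supports_group", True) is False:
            raise GroupNotSupported(
                f"agent {agent.agent_id!r} does not implement group()"
            )
        self.agent = agent


class StubAgent:
    # Mimics SynixLLMAgent's group() gap via the hypothetical marker.
    agent_id = "stub"
    supports_group = False
```

The integration test the review asks for would then assert that wiring a group-less agent raises at construction, while `agent=None` still works.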

  4. [concern] load_agents() in workspace.py duplicates the llm_config assembly logic from get_agent() — ~15 identical lines. This will drift. Extract a helper.

  5. [question] The Agent protocol now requires all four methods (map, reduce, group, fold). Protocol tests confirm missing any one method fails isinstance. But most custom agents will only implement 1-2 shapes. Is forcing all four the right ergonomic choice for the extension model? A partial-implementation pattern (raise NotImplementedError for unused shapes, like SynixLLMAgent.group()) works but makes the protocol check misleading — isinstance(agent, Agent) passes but calling certain methods blows up.

  6. [question] agent_id is not included in fingerprint_value() for SynixLLMAgent. The test test_same_config_different_names_same_fingerprint explicitly asserts this. Is that correct for cache semantics? Two agents with different names but identical config will share cache hits. If the intent is that agent identity affects output (persona), this could cause stale cache returns.

  7. [nit] reduce_synthesis.py changes count=str(len(inputs)) to count=str(len(sorted_inputs)) — these are always the same length, but using sorted_inputs is more consistent with the rendered artifacts_text. Fine.

  8. [positive] Test coverage is thorough — unit tests for every protocol method, every transform shape, agent ID on artifacts, fingerprint determinism, prompt store binding/edits, from_file factory, empty-fingerprint rejection. The e2e test is updated. Edge cases (missing store, missing key, empty name) are all covered.

  9. [nit] Group dataclass is not frozen, unlike the old AgentRequest/AgentResult. Intentional? Mutable return values from agent.group() could be modified downstream.

  10. [positive] PromptStore-backed instructions with live reload (test_instructions_picks_up_edits) is a smart design — edit prompts in the viewer, agent picks them up without restart. This directly supports the "iterate on architectures" thesis.

Verdict

This is a well-structured evolution of the agent interface that improves type safety, provenance tracking, and extensibility — but the group() NotImplementedError integration gap and the protocol-requires-all-methods ergonomic question should be resolved before merge.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 1,823 lines
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-04-07T21:12:24Z

@marklubin marklubin merged commit cfd84ba into main Apr 7, 2026
13 checks passed
@marklubin marklubin deleted the mark/typed-agents branch April 7, 2026 21:17