This document describes the shape of Smith as a system and product direction. It is intentionally broader than the current implementation target.
This document is not the v1 source of truth.
For the conservative v1 runtime contract, see docs/rfc/0001-smith-v1-core.md.
Smith is a seed framework for LLM-powered computation. Instead of building agent behavior through SDK code, Smith treats the filesystem as the programming model and the LLM as the runtime.
The core idea is:
- prompts are executable artifacts
- folders express structure and control flow
- the runner is infrastructure, not the application
- task trees are the primary unit of reuse, inspection, and version control
| .NET | Smith |
|---|---|
| CLR | LLM runtime |
| IL bytecode | Markdown prompts |
| BCL / System namespace | Core modules and task conventions |
| .exe / application | Task tree |
| MSBuild | Go runner |
The analogy is useful as a mental model, not as a promise of VM-style determinism. Smith embraces inspectability and composition, but it runs on top of probabilistic models and tool systems.
- The task format is the product.
- The runner should stay small, boring, and replaceable.
- Human-readable artifacts matter more than framework cleverness.
- Capabilities should be explicit and least-privilege by default.
- Reuse should happen at the task-tree level, not through opaque code generation.
- The architecture should acknowledge that LLMs are adaptive runtimes, not deterministic CPUs.
Layer 0: Runner
A Go process that discovers tasks, assembles prompts, calls providers,
mediates tools, writes outputs, and reports status.
Layer 1: Core conventions and reusable modules
Shared task patterns such as planning, evaluation, summarization,
distillation, reflection, testing, and memory maintenance.
Layer 2: User task trees
Project-specific workflows written as folders plus markdown.
Layer 3: Runtime adaptation
Future-facing capabilities such as dynamic task creation, self-improving
pipelines, and richer orchestration policies.
A concrete task is a folder whose executable definition is task.md.
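As a rough illustration of that rule, the sketch below walks a directory and reports every folder that directly contains a task.md file. The function names, the demo folder names, and the choice to return sorted relative paths are assumptions made for this example; the actual discovery rules are defined by the RFCs, not here.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// discoverTasks walks root and returns every directory that directly contains
// a task.md file, as sorted slash-separated paths relative to root.
func discoverTasks(root string) ([]string, error) {
	var tasks []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || d.Name() != "task.md" {
			return nil
		}
		rel, err := filepath.Rel(root, filepath.Dir(path))
		if err != nil {
			return err
		}
		tasks = append(tasks, filepath.ToSlash(rel))
		return nil
	})
	if err != nil {
		return nil, err
	}
	sort.Strings(tasks)
	return tasks, nil
}

// demoTasks builds a throwaway tree (hypothetical task names) in a temp
// directory, runs discovery on it, and removes the tree again.
func demoTasks() ([]string, error) {
	root, err := os.MkdirTemp("", "smith-demo-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	// "notes" has no task.md, so it should not be reported as a task.
	dirs := map[string]bool{
		"report":                   true,
		"report/subtasks/research": true,
		"report/subtasks/draft":    true,
		"notes":                    false,
	}
	for dir, isTask := range dirs {
		full := filepath.Join(root, filepath.FromSlash(dir))
		if err := os.MkdirAll(full, 0o755); err != nil {
			return nil, err
		}
		if !isTask {
			continue
		}
		if err := os.WriteFile(filepath.Join(full, "task.md"), []byte("# "+dir+"\n"), 0o644); err != nil {
			return nil, err
		}
	}
	return discoverTasks(root)
}

func main() {
	tasks, err := demoTasks()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tasks {
		fmt.Println(t)
	}
}
```

In this demo the notes folder is skipped because it has no task.md, while the parent task and both nested tasks are listed.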
Task trees are composed through:
- nested tasks under subtasks/
- explicit dependencies between sibling tasks
- shared conventions for prompts, schemas, tools, outputs, and context
In the broad architecture, child tasks inherit structure from the tree around them. The exact execution semantics, supported files, and validation rules are defined by RFCs, not by this document.
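One way to picture explicit dependencies between sibling tasks is as a small graph that has to be ordered before anything runs. The sketch below uses a plain topological sort (Kahn's algorithm) with alphabetical tie-breaking. The `executionOrder` function, the map-based input, and the task names are hypothetical; how dependencies are actually declared and validated is left to the RFCs.

```go
package main

import (
	"fmt"
	"sort"
)

// executionOrder returns sibling task names so that every task appears after
// the tasks it depends on. deps maps each task to its dependencies.
func executionOrder(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for task := range deps {
		indegree[task] = 0
	}
	for task, needs := range deps {
		for _, need := range needs {
			if _, ok := deps[need]; !ok {
				return nil, fmt.Errorf("task %q depends on unknown sibling %q", task, need)
			}
			indegree[task]++
			dependents[need] = append(dependents[need], task)
		}
	}

	// Start with tasks that have no dependencies; sort for a stable order.
	var ready []string
	for task, n := range indegree {
		if n == 0 {
			ready = append(ready, task)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		task := ready[0]
		ready = ready[1:]
		order = append(order, task)
		for _, next := range dependents[task] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
		sort.Strings(ready)
	}

	// Any task left unscheduled is part of a cycle.
	if len(order) != len(deps) {
		return nil, fmt.Errorf("dependency cycle among sibling tasks")
	}
	return order, nil
}

func main() {
	order, err := executionOrder(map[string][]string{
		"research": nil,
		"outline":  {"research"},
		"draft":    {"outline", "research"},
		"review":   {"draft"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

Rejecting unknown siblings and cycles up front keeps the failure visible at planning time rather than midway through a run.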
Smith wants reusable task definitions, but the architecture should not depend on filesystem tricks. The preferred direction is explicit module references rather than symlinks.
A module reference is an authoring construct that resolves to a concrete task definition before execution. It does not replace the primitive rule that a runtime task is a folder with task.md. That keeps the conceptual model clean:
- concrete runtime task: resolved folder with task.md
- reusable source artifact: module definition referenced by other tasks
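The split between authoring-time references and concrete runtime tasks can be sketched as a resolution pass that runs before execution. Everything named here (`TaskSource`, `ResolvedTask`, `resolve`, the in-memory module map) is a hypothetical illustration, and the sketch assumes a referencing task keeps its own name while taking the module's body and subtasks.

```go
package main

import "fmt"

// TaskSource is an authoring-time node. It either carries its own task.md
// body or references a reusable module by name (hypothetical shape).
type TaskSource struct {
	Name     string
	Body     string // contents of task.md; empty when Module is set
	Module   string // name of a reusable module; empty for concrete tasks
	Subtasks []TaskSource
}

// ResolvedTask is what a runner would execute: always concrete.
type ResolvedTask struct {
	Name     string
	Body     string
	Subtasks []ResolvedTask
}

// resolve replaces module references with the referenced definitions.
// active tracks modules currently being expanded to detect self-reference.
func resolve(src TaskSource, modules map[string]TaskSource, active map[string]bool) (ResolvedTask, error) {
	if src.Module != "" {
		def, ok := modules[src.Module]
		if !ok {
			return ResolvedTask{}, fmt.Errorf("task %q: unknown module %q", src.Name, src.Module)
		}
		if active[src.Module] {
			return ResolvedTask{}, fmt.Errorf("task %q: module %q references itself", src.Name, src.Module)
		}
		active[src.Module] = true
		defer delete(active, src.Module)

		resolved, err := resolve(def, modules, active)
		if err != nil {
			return ResolvedTask{}, err
		}
		resolved.Name = src.Name // the referencing task keeps its own name
		return resolved, nil
	}
	if src.Body == "" {
		return ResolvedTask{}, fmt.Errorf("task %q: no task.md body and no module reference", src.Name)
	}
	out := ResolvedTask{Name: src.Name, Body: src.Body}
	for _, child := range src.Subtasks {
		r, err := resolve(child, modules, active)
		if err != nil {
			return ResolvedTask{}, err
		}
		out.Subtasks = append(out.Subtasks, r)
	}
	return out, nil
}

func main() {
	modules := map[string]TaskSource{
		"summarize": {Name: "summarize", Body: "Summarize the input in five bullets."},
	}
	tree := TaskSource{
		Name:     "report",
		Body:     "Produce the weekly report.",
		Subtasks: []TaskSource{{Name: "digest", Module: "summarize"}},
	}
	resolved, err := resolve(tree, modules, map[string]bool{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s/%s: %s\n", resolved.Name, resolved.Subtasks[0].Name, resolved.Subtasks[0].Body)
}
```

After resolution nothing in the tree is a reference any more, which is the property the runtime side of the model relies on.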
Tasks consume a mix of:
- local static context
- inherited outputs from nearby tasks
- explicit dependency outputs
- optional external data via tools
The architecture assumes context should be shaped deliberately rather than dumped wholesale. Over time, Smith should support both deterministic context assembly and more adaptive retrieval patterns.
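A minimal sketch of deterministic context assembly might look like the following. It assumes a fixed priority per context kind and a character budget standing in for a token budget; the `ContextItem` type, the priority order, the section header format, and the budget rule are all illustrative choices, not something the architecture prescribes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ContextItem is one candidate piece of context for a task (hypothetical).
type ContextItem struct {
	Kind   string // "local", "dependency", "inherited", or "tool"
	Source string // where the text came from, for auditability
	Text   string
}

// kindPriority is an assumed ordering: lower values are included first.
// Kinds missing from the map get priority 0 and sort alongside "local".
var kindPriority = map[string]int{"local": 0, "dependency": 1, "inherited": 2, "tool": 3}

// assembleContext orders items by kind priority, then by source name, and
// appends each one that still fits within budget (measured in bytes).
// It returns the assembled text and the sources that were left out.
func assembleContext(items []ContextItem, budget int) (string, []string) {
	sorted := append([]ContextItem(nil), items...)
	sort.SliceStable(sorted, func(i, j int) bool {
		pi, pj := kindPriority[sorted[i].Kind], kindPriority[sorted[j].Kind]
		if pi != pj {
			return pi < pj
		}
		return sorted[i].Source < sorted[j].Source
	})

	var b strings.Builder
	var skipped []string
	for _, it := range sorted {
		section := fmt.Sprintf("## %s: %s\n%s\n", it.Kind, it.Source, it.Text)
		if b.Len()+len(section) > budget {
			skipped = append(skipped, it.Source)
			continue
		}
		b.WriteString(section)
	}
	return b.String(), skipped
}

func main() {
	items := []ContextItem{
		{Kind: "tool", Source: "web-search", Text: strings.Repeat("x", 500)},
		{Kind: "dependency", Source: "research", Text: "Findings: A, B."},
		{Kind: "local", Source: "style.md", Text: "Write tersely."},
	}
	prompt, skipped := assembleContext(items, 200)
	fmt.Print(prompt)
	fmt.Println("skipped:", skipped)
}
```

Because the same inputs always produce the same text and the same skipped list, a result like this is easy to inspect and replay, which is the point of keeping assembly deterministic.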
The runner talks to one or more model providers through an abstract runtime layer. Tasks may also use external tools, typically through MCP-compatible interfaces.
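The abstract runtime layer could be as small as one interface plus a registry. The sketch below is only an assumption about its shape: the `Provider` interface, the `Runtime` type, and the echo provider used for the demo are hypothetical names, and no real provider SDK or MCP client is involved.

```go
package main

import (
	"context"
	"fmt"
)

// Provider is a hypothetical abstraction over a model backend.
type Provider interface {
	Name() string
	Complete(ctx context.Context, prompt string) (string, error)
}

// echoProvider is a stand-in backend that returns the prompt unchanged,
// which is enough to exercise the runtime without network access.
type echoProvider struct{}

func (echoProvider) Name() string { return "echo" }

func (echoProvider) Complete(_ context.Context, prompt string) (string, error) {
	return "echo: " + prompt, nil
}

// Runtime routes a prompt to a registered provider by name.
type Runtime struct {
	providers map[string]Provider
}

// NewRuntime registers the given providers under their own names.
func NewRuntime(ps ...Provider) *Runtime {
	r := &Runtime{providers: make(map[string]Provider, len(ps))}
	for _, p := range ps {
		r.providers[p.Name()] = p
	}
	return r
}

// Run sends prompt to the named provider, or fails if it is not registered.
func (r *Runtime) Run(ctx context.Context, provider, prompt string) (string, error) {
	p, ok := r.providers[provider]
	if !ok {
		return "", fmt.Errorf("unknown provider %q", provider)
	}
	return p.Complete(ctx, prompt)
}

// runOnce is a small convenience wrapper used by the demo below.
func runOnce(provider, prompt string) (string, error) {
	return NewRuntime(echoProvider{}).Run(context.Background(), provider, prompt)
}

func main() {
	out, err := runOnce("echo", "summarize the notes")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Keeping the interface this narrow is one way to keep the runner replaceable: tasks name a provider, and nothing in the task tree depends on a specific SDK.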
The capability model should stay explicit:
- tasks only get declared tools
- high-risk capabilities require approval
- broader trust policies belong to project-level policy, not hidden defaults
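Those three rules can be read as a single authorization check that the runner performs before mediating a tool call. The following is a sketch under stated assumptions: the `Task`, `Policy`, and `Decision` types and the tool names are hypothetical, and the real policy format is not defined in this document.

```go
package main

import "fmt"

// Decision is the outcome of a capability check.
type Decision int

const (
	Deny Decision = iota
	NeedsApproval
	Allow
)

func (d Decision) String() string {
	switch d {
	case Allow:
		return "allow"
	case NeedsApproval:
		return "needs-approval"
	default:
		return "deny"
	}
}

// Task lists the tools a task has explicitly declared (hypothetical shape).
type Task struct {
	Name  string
	Tools []string
}

// Policy is an assumed project-level policy naming high-risk tools.
type Policy struct {
	HighRisk map[string]bool
}

// authorize applies least privilege: undeclared tools are denied, declared
// high-risk tools need approval, and everything else declared is allowed.
func authorize(task Task, tool string, policy Policy) Decision {
	declared := false
	for _, t := range task.Tools {
		if t == tool {
			declared = true
			break
		}
	}
	if !declared {
		return Deny
	}
	if policy.HighRisk[tool] {
		return NeedsApproval
	}
	return Allow
}

func main() {
	task := Task{Name: "deploy-notes", Tools: []string{"read_file", "shell"}}
	policy := Policy{HighRisk: map[string]bool{"shell": true}}

	for _, tool := range []string{"read_file", "shell", "http_fetch"} {
		fmt.Printf("%s: %s\n", tool, authorize(task, tool, policy))
	}
}
```

Making `Deny` the zero value means a forgotten or uninitialized decision fails closed rather than open.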
Smith has multiple interfaces, but they should stay aligned around the same task artifacts:
- direct CLI execution for automation
- a conversational shell for authoring and iteration
- prompt inspection and replay tooling for debugging
- testing and provenance views for confidence
Different interfaces should not create different task semantics. They are different ways of working with the same underlying task tree.
The broader Smith architecture still aims at several capabilities that intentionally go beyond the scope of the conservative v1 RFC:
Planning tasks may eventually emit new task folders during execution. This is a defining long-term idea, but it raises real questions around graph stability, approvals, caching, and cost boundaries.
Conditional execution, loops, retries, human review, evaluation, and self-revision are all important parts of the larger vision. They should exist as explicit task-tree mechanisms, not hidden framework magic.
Smith should eventually support persistent project memory, task-local memory, reference corpora, and more intelligent retrieval. That work should be designed with token budgets, auditability, and pruning in mind.
Strong provenance, regression testing, and policy enforcement are essential if Smith is going to be trusted for real work. The architecture assumes these are first-class concerns, not bolt-ons.
RFCs define what is actually supported. They are intentionally narrower than this architecture document.
The intended split is:
- docs/ARCHITECTURE.md: broad system shape, vocabulary, and direction
- docs/rfc/0001-smith-v1-core.md: normative conservative v1 runtime contract
- docs/ideation/: exploratory notes, alternatives, and future ideas
When this document and an RFC disagree, the RFC wins.