AgentForge

CI · npm: @mandarnilange/agentforge-core · npm: @mandarnilange/agentforge · License: MIT · Node ≥20

An open framework for agentic workflows. Bring your process, LLMs, scripts, agents, and infra — we handle the orchestration.

  • Compose agent harnesses in YAML — LLM calls, scripts, validators, transforms, loops, and conditionals.
  • Gate the LLM with your tools — deterministic steps (linters, tests, schemas) wrap non-deterministic LLM calls, so output is checked on every run.
  • Plug in your stack — Anthropic, OpenAI, Gemini, Ollama, coding-agent runtimes. Every layer has an extension point where the built-in doesn't fit.
  • Run anywhere — local, Docker, or remote workers; SQLite or Postgres; OTel-native.
  • Scale like infra — multi-worker scheduling, approval gates, cost ceilings, live dashboard.

Ships with a reference SDLC template — runnable end-to-end in minutes. Domain-agnostic: code review, content generation, ops runbooks, data pipelines — anywhere multiple LLM calls need to be coordinated with humans in the loop.


Quick Start

Requires Node.js 22.19 or later (the bundled pi execution backends set this floor).

# 1. Install
npm install @mandarnilange/agentforge

# 2. Set your Anthropic API key
export ANTHROPIC_API_KEY=sk-ant-...

# 3. Scaffold the reference template into .agentforge/
npx @mandarnilange/agentforge init --template simple-sdlc

# 4. Run the full pipeline (approval gates between phases)
npx @mandarnilange/agentforge run --project my-app --input "brief=Build a freelance invoicing SaaS"

# 5. Watch it live
npx @mandarnilange/agentforge dashboard           # → http://localhost:3001

npm 11+ note — approve native install scripts. npm now blocks dependency install scripts by default. AgentForge depends on the native module better-sqlite3 (and koffi), which need their build scripts to compile. If your install warns that packages are "not yet covered by allowScripts", approve and rebuild them so the native binaries are built:

npm approve-scripts better-sqlite3 koffi
npm rebuild

Skipping this leaves better-sqlite3 uncompiled and AgentForge fails at runtime with a native-binding load error. Avoid --all / --dangerously-allow-all-scripts: approve only the packages you trust.


The harness model — what makes AgentForge different

Other frameworks treat an "agent" as one LLM call wrapped in tools. AgentForge treats an agent as a harness — a named flow of steps where your existing tools are first-class:

  • llm — call the model with the system prompt + inputs.
  • script — run any shell command (linter, test runner, security scanner, your custom CLI).
  • validate — Zod / JSON Schema check against an artifact. Fails the run by default.
  • transform — pure data reshape between steps.

Wrap any of these in loop (with an until predicate + maxIterations) or condition blocks. The LLM proposes; your tools decide whether the output is acceptable. Bad output never leaks into the next phase.

A real example — the bundled developer agent's generate → lint → test → fix-until-passing flow:

spec:
  flow:
    - step: generate-code
    - step: lint-and-format
    - loop:
        until: "{{steps.test-gate.output}}"     # exits when test-gate emits "PASS"
        maxIterations: 3
        do:
          - step: run-tests
          - step: test-gate
          - step: fix-code
            condition: "{{steps.test-gate.output}}"   # skip fix if tests passed
    - step: validate-output
    - step: git-commit

Each step's output, exit code, duration, and OTel span land in the state store. The dashboard shows the whole harness, not just the LLM turn.
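The bounded fix-until-passing loop above can be approximated in a few lines of TypeScript. This is a sketch for intuition only, not AgentForge's implementation: runLoop, StepFn, and LoopResult are hypothetical names, and the convention that the gate returns the string "PASS" is taken from the comment in the YAML.

```typescript
// Hypothetical model of a bounded "loop until" block; not AgentForge's API.
type StepFn = (iteration: number) => string;

interface LoopResult {
  passed: boolean;
  iterations: number;
}

// Run the body up to maxIterations times, stopping once the gate reports "PASS".
function runLoop(
  body: StepFn,
  gate: StepFn,
  onFail: StepFn,
  maxIterations: number,
): LoopResult {
  for (let i = 1; i <= maxIterations; i++) {
    body(i); // e.g. run-tests
    if (gate(i) === "PASS") {
      // e.g. test-gate succeeded, so the fix step is skipped
      return { passed: true, iterations: i };
    }
    onFail(i); // e.g. fix-code
  }
  return { passed: false, iterations: maxIterations };
}

// Example: the tests start passing on the second attempt.
const fixes: number[] = [];
const result = runLoop(
  () => "ran tests",
  (i) => (i >= 2 ? "PASS" : "FAIL"),
  (i) => {
    fixes.push(i);
    return "patched";
  },
  3,
);
console.log(result, fixes);
```

The iteration cap is what keeps a model that never converges from looping forever: when maxIterations is exhausted the sketch reports passed: false and leaves the decision to whatever step comes next.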


Architecture: control plane + execution plane

                ┌─────────────────────────────────────────────┐
                │              CONTROL PLANE                  │
                │   Dashboard · Scheduler · Gates             │
                │   Definition store · State store · Events   │
                └──────────────────┬──────────────────────────┘
                                   │
                  dispatch jobs    │    report results
                                   ▼
                ┌─────────────────────────────────────────────┐
                │             EXECUTION PLANE                 │
                │   node: local · docker · ssh / http worker  │
                │                                             │
                │  Agents run here — file system, LLM calls,  │
                │  shell, tools all live on the node.         │
                └─────────────────────────────────────────────┘

  • Control plane — pipeline / gate controllers, scheduler, definition store, state store, event bus, dashboard server.
  • Execution plane (nodes) — the local process, a Docker container, or a remote worker over SSH / HTTP. Nodes advertise capabilities (llm-access, docker, local-fs, git, gpu, …) and the scheduler matches each agent's nodeAffinity to the pool.
  • Same binary. On a laptop, one process hosts both. In production, run control-plane and worker containers from the same image with different CLI invocations.

Core concepts

| Concept | What it is | Defined in |
| --- | --- | --- |
| Agent | A system prompt + typed I/O + optional step pipeline. Runs on a node. | .agentforge/agents/*.agent.yaml |
| Pipeline | A sequence of phases; each phase runs one or more agents and may end with an approval gate. | .agentforge/pipelines/*.pipeline.yaml |
| Node | An execution target (local, Docker, or remote SSH) with declared capabilities. | .agentforge/nodes/*.node.yaml |
| Artifact | A typed, validated JSON document passed between agents. | .agentforge/schemas/*.schema.yaml |
| Gate | A pause point between phases for human review (approve / reject / revise). | Inline in the pipeline |

Agents declare which capabilities they need:

# .agentforge/agents/developer.agent.yaml (excerpt)
spec:
  nodeAffinity:
    required:
      - capability: llm-access
      - capability: docker        # writes files + runs shell — needs isolation
    preferred:
      - capability: high-memory   # prefer a beefy node if available

The scheduler picks the highest-scoring node whose capabilities satisfy the required set; soft preferences break ties.
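As a rough illustration of that matching rule, the sketch below filters nodes on the required capabilities and then ranks the survivors by how many preferred capabilities they offer. The types, the pickNode function, and the one-point-per-preference scoring are assumptions made for this example; the real scheduler's scoring may differ.

```typescript
// Hypothetical node-selection sketch; the scoring rule is an assumption.
interface NodeInfo {
  name: string;
  capabilities: string[];
}

interface Affinity {
  required: string[];
  preferred: string[];
}

// Return the eligible node with the most preferred capabilities, or undefined.
function pickNode(nodes: NodeInfo[], affinity: Affinity): NodeInfo | undefined {
  let best: NodeInfo | undefined;
  let bestScore = -1;
  for (const node of nodes) {
    const caps = new Set(node.capabilities);
    // Hard constraint: every required capability must be present.
    if (!affinity.required.every((c) => caps.has(c))) continue;
    // Soft constraint: one point per preferred capability the node offers.
    const score = affinity.preferred.filter((c) => caps.has(c)).length;
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}

const pool: NodeInfo[] = [
  { name: "laptop", capabilities: ["llm-access", "local-fs"] },
  { name: "worker-a", capabilities: ["llm-access", "docker"] },
  { name: "worker-b", capabilities: ["llm-access", "docker", "high-memory"] },
];
const chosen = pickNode(pool, {
  required: ["llm-access", "docker"],
  preferred: ["high-memory"],
});
console.log(chosen?.name); // worker-b
```

Here laptop is excluded because it lacks docker, and worker-b wins over worker-a because it also offers the preferred high-memory capability.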


Reference template — simple-sdlc

Three agents wired into a classic requirements → architecture → implementation flow.

Brief ──► analyst ─► [gate] ─► architect ─► [gate] ─► developer ─► done

# .agentforge/pipelines/simple-sdlc.pipeline.yaml
apiVersion: agentforge/v1
kind: PipelineDefinition
metadata:
  name: simple-sdlc
spec:
  input:
    - { name: brief, type: raw-brief, required: true }
  phases:
    - { name: requirements, phase: 1, agents: [analyst],   gate: { required: true } }
    - { name: architecture, phase: 2, agents: [architect], gate: { required: true } }
    - { name: implementation, phase: 3, agents: [developer] }
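
To make the phase-and-gate ordering concrete, here is a small in-memory model of the pipeline above. It is a sketch under stated assumptions, not AgentForge's pipeline controller: Phase, GateDecision, and runPipeline are hypothetical names, and the revise decision is left out for brevity.

```typescript
// Hypothetical in-memory model of the pipeline above; not AgentForge's API.
interface Phase {
  name: string;
  agents: string[];
  gateRequired: boolean;
}

type GateDecision = "approve" | "reject";

// Run phases in order; stop at the first gate that is not approved.
function runPipeline(
  phases: Phase[],
  decide: (phase: Phase) => GateDecision,
): string[] {
  const log: string[] = [];
  for (const phase of phases) {
    for (const agent of phase.agents) log.push(`run ${agent}`);
    if (phase.gateRequired) {
      const decision = decide(phase);
      log.push(`gate ${phase.name}: ${decision}`);
      if (decision === "reject") break;
    }
  }
  return log;
}

const simpleSdlc: Phase[] = [
  { name: "requirements", agents: ["analyst"], gateRequired: true },
  { name: "architecture", agents: ["architect"], gateRequired: true },
  { name: "implementation", agents: ["developer"], gateRequired: false },
];

// Approve requirements, reject architecture: the developer never runs.
const log = runPipeline(simpleSdlc, (p) =>
  p.name === "architecture" ? "reject" : "approve",
);
console.log(log);
```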

More templates — api-builder, code-review, content-generation, data-pipeline, seo-review — ship with the platform binary.


Deployment topologies

Same YAML, same binary — three shapes from smallest to largest.

1. Laptop — single process

┌─────────────────────────────────────────────────┐
│  agentforge  (one Node.js process)              │
│   control plane ──dispatch──► local node        │
│   SQLite state · Anthropic LLM                  │
└─────────────────────────────────────────────────┘

Running npx @mandarnilange/agentforge dashboard starts everything. A Dockerized variant without Postgres or OTel is also available:

docker compose up -d                                  # Dashboard at :3001
PROJECT=my-app BRIEF="Build a todo app" \
  docker compose run --rm runner                      # One-shot pipeline

2. Single host — production on one box

Postgres durability, OTel tracing, Docker-isolated agent runs.

docker compose -f packages/platform/docker-compose.prod.yml up -d

3. Distributed — control plane + worker pool

# Control-plane host
docker compose -f packages/platform/docker-compose.control-plane.yml up -d

# Each worker host
CONTROL_PLANE_URL=http://cp-host:3001 \
  docker compose -f packages/platform/docker-compose.worker.yml up -d

Workers register, heartbeat, and receive dispatched jobs. Heterogeneous pools (GPU vs lightweight) are routed by nodeAffinity.

Current limitation: the control plane is single-replica. The execution plane scales horizontally to many worker hosts, but the control plane itself should run as a single instance today. The pending-job queue, scheduler state, and event bus are process-local, so running two replicas causes a split-brain. This is tracked with a concrete fix path in ROADMAP.md.


Two packages — which one to install

Install @mandarnilange/agentforge unless you have a specific reason not to. Defaults are identical for local dev (SQLite, local executor, Anthropic), and every production feature is available the day you need it — no migration.

Install @mandarnilange/agentforge-core if you want the framework primitives without multi-provider middleware, Postgres, or the Docker / SSH executors — typically when embedding AgentForge in your own CLI.


Dashboard

A React SPA served by the same binary. Real-time pipeline view via Server-Sent Events: run list with status / cost / progress, phase-by-phase timeline with live agent conversations, gate management (approve / reject / revise in-browser), artifact viewer with type-aware renderers, PDF export.

npx @mandarnilange/agentforge dashboard --port 3001

When ANTHROPIC_API_KEY isn't set, the dashboard renders a read-only banner — useful for browsing completed runs.


Agent Skills

The repo ships agent skills that drop into any Claude Code / Cursor / Codex session — installable in one command:

npx skills add mandarnilange/agentforge

| Skill | Audience | What it does |
| --- | --- | --- |
| agentforge-workflow | End user | Walks through designing an AgentForge workflow and emits a complete .agentforge/ directory. |
| agentforge-cli | Operator | Conversational interface to the AgentForge CLI: run, monitor, approve gates, apply YAML. |
| agentforge-debug | Operator | Triages stuck or failing pipeline runs from symptom to fix path. |
| agentforge-template-author | Contributor | Guides adding a new shipped template under packages/{core,platform}/src/templates/. |

Trigger phrases live in each skill's frontmatter — e.g. "help me design a PR-triage pipeline" fires agentforge-workflow, "approve the architect gate" fires agentforge-cli, "why is pipeline X stuck?" fires agentforge-debug.

New here? The Skill Quickstart is a 5-minute, fill-in-the-blanks markdown you edit, paste into your AI agent, and run. Catalog, changelog, and authoring docs: skills/.


CLI reference

agentforge init --template <name>       # Scaffold .agentforge/ from a template
agentforge templates list               # Show bundled templates
agentforge exec <agent> [options]       # Run a single agent
agentforge run --project <name>         # Start a pipeline
agentforge run --continue <run-id>      # Resume a paused pipeline
agentforge dashboard                    # Start the web dashboard
agentforge get pipelines                # List pipeline runs
agentforge gate {approve,reject,revise} # Gate actions
agentforge logs <run-id>                # View agent run logs
agentforge apply -f <path>              # Apply persistent YAML definitions (platform)
agentforge get nodes                    # List registered worker nodes (platform)
agentforge node start --control-plane-url <url>   # Run as a worker (platform)

Environment variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| ANTHROPIC_API_KEY | Yes | | Anthropic API key. |
| OPENAI_API_KEY / GOOGLE_API_KEY | If using other providers | | API keys for the other providers. |
| OLLAMA_BASE_URL | No | http://localhost:11434 | Ollama server URL. |
| AGENTFORGE_DEFAULT_MODEL | No | claude-sonnet-4-6 | Default model. |
| AGENTFORGE_LLM_TIMEOUT_SECONDS | No | 600 | Wall-clock timeout per LLM call (0 disables). |
| AGENTFORGE_OUTPUT_DIR / AGENTFORGE_DIR | No | ./output / ./.agentforge | Output and definitions paths. |
| AGENTFORGE_STATE_STORE | No | sqlite | sqlite or postgres. |
| AGENTFORGE_POSTGRES_URL | If postgres | | Connection URL (masked in logs). |
| OTEL_EXPORTER_OTLP_ENDPOINT | No | | Enables OTel tracing export. |
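
For readers wiring these variables into their own tooling, the sketch below shows one way the documented defaults could be resolved. The variable names and default values come from the table above; the Settings shape, the resolveSettings function, and the validation rules are assumptions, not AgentForge's actual config loader.

```typescript
// Hypothetical config resolver; defaults are copied from the table above.
interface Settings {
  defaultModel: string;
  llmTimeoutSeconds: number; // 0 disables the timeout
  stateStore: "sqlite" | "postgres";
  postgresUrl?: string;
}

function resolveSettings(env: Record<string, string | undefined>): Settings {
  const rawStore = env["AGENTFORGE_STATE_STORE"] ?? "sqlite";
  let stateStore: "sqlite" | "postgres";
  if (rawStore === "sqlite" || rawStore === "postgres") {
    stateStore = rawStore;
  } else {
    throw new Error(`Unsupported AGENTFORGE_STATE_STORE: ${rawStore}`);
  }

  // The Postgres URL is only required when the postgres store is selected.
  const postgresUrl = env["AGENTFORGE_POSTGRES_URL"];
  if (stateStore === "postgres" && !postgresUrl) {
    throw new Error("AGENTFORGE_POSTGRES_URL is required for the postgres store");
  }

  const timeout = Number(env["AGENTFORGE_LLM_TIMEOUT_SECONDS"] ?? "600");
  if (!Number.isFinite(timeout) || timeout < 0) {
    throw new Error("AGENTFORGE_LLM_TIMEOUT_SECONDS must be a non-negative number");
  }

  return {
    defaultModel: env["AGENTFORGE_DEFAULT_MODEL"] ?? "claude-sonnet-4-6",
    llmTimeoutSeconds: timeout,
    stateStore,
    postgresUrl,
  };
}

const settings = resolveSettings({ AGENTFORGE_LLM_TIMEOUT_SECONDS: "120" });
console.log(settings.stateStore, settings.llmTimeoutSeconds); // sqlite 120
```

Passing the environment in as a plain object, rather than reading a global, keeps the function easy to test.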

Reliability: every LLM call is bounded by timeoutSeconds (per-agent override available). Anthropic HTTP 529 (overloaded_error) is retried 3× with exponential backoff. API keys and the Postgres URL are masked in logs, errors, and conversation transcripts.
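
The retry behaviour described above follows a common pattern, sketched here in TypeScript. This is not AgentForge's code: withRetry, isOverloaded, the 500 ms base delay, and the assumption that a failed call surfaces an error object with a numeric status field are all illustrative choices.

```typescript
// Hypothetical retry helper; the delays and error shape are assumptions.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Treat any thrown object carrying status 529 as an "overloaded" response.
function isOverloaded(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { status?: unknown }).status === 529
  );
}

// Retry on HTTP 529 up to maxRetries times, doubling the delay each time.
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await call();
    } catch (err) {
      // Anything other than a 529, or a 529 after the last retry, is rethrown.
      if (!isOverloaded(err) || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 500 ms, 1 s, 2 s by default
      attempt++;
    }
  }
}

// Example: a fake call that is overloaded twice, then succeeds.
let attempts = 0;
const flaky = async (): Promise<string> => {
  attempts++;
  if (attempts < 3) {
    throw Object.assign(new Error("overloaded_error"), { status: 529 });
  }
  return "ok";
};
withRetry(flaky, 3, 10).then((value) => console.log(value, attempts));
```

Only the overloaded status is retried in this sketch; other failures propagate immediately so that genuine errors are not hidden behind extra latency.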


Learn more

Every deep-dive lives in docs/. Pick a track:

Get started

  • Skill Quickstart — 5-minute, fill-in-the-blanks path from zero to your first run via the agentforge-workflow skill.
  • Getting Started — install to first pipeline run, full CLI walkthrough, resume flow.
  • Who Uses It — what platform engineers, software engineers, and domain owners each get out of AgentForge.
  • Templates — catalog of bundled pipeline templates.

Concepts

  • Harness Model — full step-grammar walkthrough and the bundled developer agent's test-fix loop.
  • Architecture — control plane, domain model, ports & adapters, step grammar.
  • Pipeline Execution Flows — how a run moves through the system.
  • Artifact Typing — typed inputs/outputs, schema validation, why malformed LLM output fails fast.

Operate

Extend & test


Stability

v0.3.x — early but stabilising. The API surface is settling down but may still shift before 1.0. npm install @mandarnilange/agentforge pulls the latest release. Open an issue for anything that looks rough, or use Discussions for usage questions.


Contributing

Bug reports, feature ideas, doc fixes, and code are all welcome.

  • Issues / requests: GitHub issues.
  • Dev setup and conventions: CONTRIBUTING.md.
  • Larger architectural work: ROADMAP.md — every entry is issue-ready.
  • Pull requests: small, focused, with tests. Conventional-commit messages preferred.

MIT-licensed — see LICENSE.