Skip to content

dattgoswami/forge-core

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

forge-core

Shared domain library for agentic software engineering systems.

forge-core provides the framework-agnostic domain logic for automated software engineering agents. It is not an orchestration framework. It does not abstract away LangGraph, CrewAI, OpenAI Agents SDK, Google ADK, or AWS Strands.

What it provides is everything below orchestration: the tools agents call, the schemas defining inter-agent contracts, the guardrails governing what agents are allowed to do, the sandbox isolating code execution, and the evaluation harness measuring agent quality across all framework implementations.


Install

pip install forge-core                   # core library only
pip install forge-core[openai]           # with OpenAI Agents SDK adapter
pip install forge-core[langgraph]        # with LangGraph adapter
pip install forge-core[crewai]           # with CrewAI adapter
pip install forge-core[adk]             # with Google ADK adapter
pip install forge-core[strands]         # with AWS Strands adapter

Design Principles

1. The tool layer is shared. The orchestration layer is not. read_file, run_tests, fetch_pr_diff — these are the same function regardless of whether LangGraph or CrewAI is orchestrating. Implement them once, test them once, audit them once.

2. Domain schemas are the inter-agent contract. Every agent in every framework produces and consumes the same Pydantic models: ImplementationPlan, Finding, VerificationReport, PRDraft. You can swap the LangGraph planner for the OpenAI planner and the downstream Coder agent is unaffected.

3. Guardrails are logic, not framework hooks. The auto-commit safety lock, the prompt injection classifier, the PII scanner — these are pure Python. Each framework adapter wires them into its native hook mechanism. The logic never duplicates.

4. Evaluation is the ground truth. The eval harness runs the same SWE-bench tasks against any adapter. Framework choice becomes a data-driven decision, not a religious one.

5. Framework power is never sacrificed. forge-core imposes zero constraints on how adapters use their frameworks. LangGraph adapters keep interrupt(). CrewAI adapters keep @flow. The library is additive, not limiting.


Architecture

forge-core sits below orchestration. Frameworks call into it; it never calls into frameworks.

┌──────────────────────────────────────────────────────────────────────┐
│                        YOUR FRAMEWORK LAYER                          │
│                                                                      │
│   LangGraph          OpenAI Agents SDK      CrewAI / ADK / Strands   │
│   StateGraph +        Runner + handoffs +    @flow + Crew +          │
│   interrupt()         @output_guardrail      @start/@listen          │
└───────────────────────────────┬──────────────────────────────────────┘
                                │
                 thin adapters (100–300 LOC each)
                 get_langgraph_tools() / get_openai_tools() / …
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│                           forge-core                                 │
│                                                                      │
│  ┌─────────────────┐  ┌──────────────────┐  ┌─────────────────────┐ │
│  │    schemas/      │  │     tools/        │  │    guardrails/      │ │
│  │                 │  │                  │  │                     │ │
│  │ Implementation  │  │ ToolRegistry     │  │ CommitSafetyGuard   │ │
│  │ Plan            │  │ @forge_tool      │  │ PromptInjection     │ │
│  │ AtomicTask      │  │ ──────────────── │  │ Guard               │ │
│  │ Finding         │  │ github.py        │  │ PIIGuard            │ │
│  │ FindingSet      │  │  fetch_pr_diff   │  │ ScopeGuard          │ │
│  │ Verification    │  │  commit_patch    │  │ ConfidenceGuard     │ │
│  │ Report          │  │ filesystem.py    │  │ ──────────────────  │ │
│  │ PRDraft / Patch │  │  read_file       │  │ GuardChain          │ │
│  │ RepoMap         │  │  write_file      │  │ GuardViolation      │ │
│  └─────────────────┘  │  apply_diff      │  └─────────────────────┘ │
│                       │ execution.py     │                           │
│  ┌─────────────────┐  │  run_tests       │  ┌─────────────────────┐ │
│  │    sandbox/      │  │  run_linter      │  │      evals/          │ │
│  │                 │  │  run_type_checker│  │                     │ │
│  │ DockerSandbox   │  │ search.py        │  │ ForgeEvaluator      │ │
│  │ (rootless       │  │  search_code     │  │ SWEBenchEvaluator   │ │
│  │  container,     │  │  find_symbol     │  │ RunMetrics          │ │
│  │  network=none)  │  │  build_repo_map  │  │ AggregateMetrics    │ │
│  │ SandboxSnapshot │  └──────────────────┘  │ EvalReport          │ │
│  │ SandboxViolation│                        └─────────────────────┘ │
│  └─────────────────┘                                                 │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │   adapters/  (ForgeAdapter ABC + per-framework tool wiring)  │   │
│  │   base.py · openai_sdk.py · langgraph.py · crewai.py ·       │   │
│  │   adk.py · strands.py                                        │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────┬────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────────────┐
│                        EXTERNAL SYSTEMS                              │
│   GitHub API      Docker Engine      SWE-bench      Filesystem       │
└──────────────────────────────────────────────────────────────────────┘

Data flow for a single issue → PR run:

GitHub issue URL
      │
      ▼ fetch_pr_diff / fetch_repo_file (tools/github.py)
[Planner Agent] → ImplementationPlan (schemas/plan.py)
      │                   │
      │           GuardChain check (guardrails/)
      │                   │ GuardViolation → HITL / reject
      ▼ (plan approved)
[Coder Agent]  → Patch (schemas/pr.py)
      │
      ▼ CommitSafetyGuard + ScopeGuard
      │
      ▼ DockerSandbox.run_tests() / run_linter() (sandbox/)
      │
      ▼ VerificationReport (schemas/verification.py)
      │
      ▼ commit_patch (tools/github.py, risk=high)
[PR Draft] → PRDraft (schemas/pr.py)

Package Structure

forge_core/
├── schemas/          ← Pydantic v2 inter-agent contracts
│   ├── findings.py   ← Finding, FindingSet, Severity, Category
│   ├── plan.py       ← ImplementationPlan, AtomicTask, RiskLevel
│   ├── verification.py ← VerificationReport, TestResult, LinterResult
│   ├── pr.py         ← Patch, PRDraft, CommitMetadata, PRDiff, PRMetadata
│   └── repo.py       ← RepoMap, FileNode, SymbolRef, SearchResult
├── tools/            ← @forge_tool decorated functions + ToolRegistry
│   ├── registry.py   ← ForgeTool, ToolRegistry, forge_tool decorator
│   ├── github.py     ← fetch_pr_diff, fetch_repo_file, commit_patch …
│   ├── filesystem.py ← read_file, write_file, list_files, apply_diff …
│   ├── execution.py  ← run_tests, run_linter, run_type_checker …
│   └── search.py     ← search_code, find_symbol, build_repo_map
├── guardrails/       ← Pure Python safety logic
│   ├── commit_safety.py ← CommitSafetyGuard
│   ├── injection.py  ← PromptInjectionGuard
│   ├── pii.py        ← PIIGuard
│   ├── scope.py      ← ScopeGuard
│   └── confidence.py ← ConfidenceGuard
├── sandbox/          ← Docker-based isolated code execution
│   ├── docker.py     ← DockerSandbox, SandboxViolation
│   ├── executor.py   ← CommandExecutor, ExecutionResult
│   └── snapshot.py   ← SandboxSnapshot
├── evals/            ← Evaluation harness
│   ├── harness.py    ← ForgeEvaluator, EvalReport
│   ├── swe_bench.py  ← SWEBenchEvaluator, SWEBenchIssue
│   └── metrics.py    ← RunMetrics, AggregateMetrics
└── adapters/         ← Optional framework wiring (thin, ~30 LOC each)
    ├── base.py       ← ForgeAdapter ABC, ForgeRunResult
    ├── openai_sdk.py ← get_openai_tools()
    ├── langgraph.py  ← get_langgraph_tools()
    ├── crewai.py     ← get_crewai_tools()
    ├── adk.py        ← get_adk_tools()
    └── strands.py    ← get_strands_tools()

Quick Start

Use tools in any framework

from forge_core.tools import ToolRegistry

# All registered tools, filtered by risk level
tools = ToolRegistry.by_risk(("low", "medium"))

# Convert to your framework's format
from forge_core.adapters.openai_sdk import get_openai_tools
openai_schemas = get_openai_tools(risk_levels=("low", "medium"))

from forge_core.adapters.langgraph import get_langgraph_tools
lc_tools = get_langgraph_tools()

Work with schemas

from forge_core.schemas import (
    ImplementationPlan, AtomicTask, RiskLevel,
    Patch, PRDraft, VerificationReport,
    Finding, FindingSet, Severity, Category,
)

# Plan produced by the Planner agent
plan = ImplementationPlan(
    tasks=[
        AtomicTask(
            file_path="src/utils.py",
            instruction="Add null check before calling process()",
            expected_diff_size=5,
            test_implications="tests/test_utils.py::test_null_input",
        )
    ],
    risk_level=RiskLevel.LOW,
    rationale="Defensive null check to prevent AttributeError on None input.",
)

# Any framework's Coder agent consumes the same type
if plan.high_risk():
    # trigger HITL
    ...

Apply guardrails

from forge_core.guardrails import (
    CommitSafetyGuard, ScopeGuard, PIIGuard,
    PromptInjectionGuard, GuardChain, GuardViolation,
)
from forge_core.schemas import Patch

# Compose guards for commit-time checks
commit_guard = GuardChain(
    CommitSafetyGuard(confidence_floor=0.80),
    ScopeGuard(max_diff_lines=5000),
)

patch = Patch(
    file="src/utils.py",
    diff="@@ -1,2 +1,2 @@\n context\n-old\n+new",
    confidence=0.92,
    is_safe_to_autocommit=True,
)

try:
    commit_guard.check(patch)
    # proceed with commit
except GuardViolation as e:
    print(f"Blocked: {e}")
    # route to HITL

# Scan PR body for PII before posting
pii_guard = PIIGuard()
try:
    pii_guard.check(pr_body)
except GuardViolation as e:
    print(f"PII detected: {e}")

# Check untrusted issue body for injection
injection_guard = PromptInjectionGuard()
injection_guard.check(issue_body)  # raises GuardViolation if injection found

Run code in a sandbox

from forge_core.sandbox import DockerSandbox, SandboxViolation

async with DockerSandbox(
    repo_path="/path/to/repo",
    memory_limit="2g",
    network="none",
) as sandbox:
    # Run tests
    result = await sandbox.run_tests(filter="test_utils")
    if result.passed:
        print("Tests pass")

    # Write files (workspace boundary enforced)
    await sandbox.write_file("/path/to/repo/src/fixed.py", new_content)

    # Escape attempts raise SandboxViolation
    try:
        await sandbox.write_file("/etc/passwd", "evil")
    except SandboxViolation:
        pass  # caught — always

    # Snapshot / restore for clean-state test runs
    snap = await sandbox.snapshot()
    await sandbox.run_tests()
    await snap.restore()  # workspace is clean again

Evaluate adapters

from forge_core.evals import ForgeEvaluator
from forge_core.adapters.base import ForgeAdapter, ForgeRunResult

class MyLangGraphAdapter(ForgeAdapter):
    async def run(self, issue_url, repo_path) -> ForgeRunResult:
        # your LangGraph orchestration here
        ...

    async def get_plan(self, issue_url, repo_path):
        ...

    async def verify(self, repo_path, branch):
        ...

evaluator = ForgeEvaluator(
    adapter=MyLangGraphAdapter(),
    issue_set=[
        "https://github.com/org/repo/issues/42",
        "https://github.com/org/repo/issues/43",
    ],
)
report = await evaluator.run()
report.print_summary()
# Eval Report — Adapter: MyLangGraphAdapter
# Success rate: 100.0%
# Pass@1 rate: 100.0%
# Mean duration: 45.2s
# Mean cost: $0.3200
# ...

Tools Reference

Tool Risk Taxonomy

Risk Behavior
low Read-only or idempotent. Executes silently.
medium Logged to audit trail. Proceeds automatically.
high Blocked by default. Requires guard chain clearance before execution.

Registering a Tool

from forge_core.tools import forge_tool

@forge_tool(
    risk="low",
    description="Fetch the open issues for a GitHub repo.",
    category="data",
)
async def fetch_open_issues(repo_url: str) -> list[dict]:
    ...

The decorator registers the function in ToolRegistry and returns it unchanged — it remains directly callable.

GitHub Tools (forge_core.tools.github)

Tool Risk Description
fetch_pr_diff low Structured diff for a GitHub PR
fetch_repo_file low File content at a specific SHA
fetch_pr_metadata low PR title, state, author, branches
list_pr_comments low All review comments on a PR
post_pr_comment medium Post an issue-level comment
commit_patch high Commit a Patch to the feature branch

Set GITHUB_TOKEN environment variable for authenticated requests.

Filesystem Tools (forge_core.tools.filesystem)

Tool Risk Description
read_file low Read file contents as UTF-8
list_files low Glob files in a directory
get_file_diff low Unified diff between two strings
write_file medium Write (overwrite) a file
apply_diff medium Apply a unified diff to a file

Execution Tools (forge_core.tools.execution)

Tool Risk Description
run_linter low Run ruff on specified paths
run_type_checker low Run mypy on specified paths
check_formatting low Run ruff format --check
run_tests medium Run pytest with fail-fast

Search Tools (forge_core.tools.search)

Tool Risk Description
search_code low Regex search across code files
find_symbol low Find function/class definitions by name
build_repo_map low Full structural map with symbols

Guardrails Reference

All guards expose check(subject) — returns GuardResult(passed=True) or raises GuardViolation.

CommitSafetyGuard

Blocks patches touching high-risk paths, with low confidence, or flagged unsafe:

from forge_core.guardrails import CommitSafetyGuard

guard = CommitSafetyGuard(
    confidence_floor=0.80,
    blocked_paths=["auth/", "payments/", "migrations/"],
)
guard.check(patch)

PromptInjectionGuard

Pattern-based detection of adversarial instructions in untrusted input:

from forge_core.guardrails import PromptInjectionGuard

guard = PromptInjectionGuard()
guard.check(issue_body)        # raises if injection detected
guard.check(pr_description)
guard.check(file_content)

For model-based classification, supply a classifier implementing InjectionClassifier.

PIIGuard

Scans output for PII and credentials before external posting:

from forge_core.guardrails import PIIGuard

guard = PIIGuard(disabled_kinds={"email_address"})
guard.check(pr_body)   # raises if SSN, credit card, AWS key, etc. found
matches = guard.scan(text)  # non-raising: returns list[PIIMatch]

Detected patterns: US SSN, credit cards, email, US phone, AWS access keys, private key headers, GitHub tokens, JWTs.

ScopeGuard

Diff-size budget and repo allowlist:

from forge_core.guardrails import ScopeGuard

guard = ScopeGuard(
    max_diff_lines=5000,
    allowed_repos=["https://github.com/org/myrepo"],
)
guard.check(patch)                    # diff size + blocked paths
guard.check_repo(pr_url)              # repo allowlist

ConfidenceGuard

Standalone confidence floor check (composable with CommitSafetyGuard):

from forge_core.guardrails import ConfidenceGuard

ConfidenceGuard(floor=0.85).check(patch)

GuardChain

Compose multiple guards; first violation halts:

from forge_core.guardrails import GuardChain

chain = GuardChain(
    PromptInjectionGuard(),
    CommitSafetyGuard(confidence_floor=0.75),
    ScopeGuard(max_diff_lines=10_000),
)
chain.check(patch)

Adapter Pattern

Each adapter is ~100–300 lines. It wires forge-core into a framework's native API.

LangGraph (sketch)

from langgraph.graph import StateGraph
from forge_core.adapters.langgraph import get_langgraph_tools
from forge_core.guardrails import CommitSafetyGuard, GuardChain
from forge_core.schemas import ImplementationPlan

tools = get_langgraph_tools(risk_levels=("low", "medium"))

graph = StateGraph(SWEForgeState)
graph.add_node("planning", planning_node)   # produces ImplementationPlan
graph.add_node("hitl", interrupt_node)      # native LangGraph interrupt()
graph.add_node("coding", coder_node)        # uses forge_core tools

OpenAI Agents SDK (sketch)

from agents import Agent, Runner
from forge_core.adapters.openai_sdk import get_openai_tools

orchestrator = Agent(
    name="Orchestrator",
    model="gpt-4.1",
    tools=get_openai_tools(risk_levels=("low", "medium")),
)

CrewAI (sketch)

from crewai.flow.flow import Flow, start, listen
from forge_core.adapters.crewai import get_crewai_tools

class ForgeFlow(Flow):
    @start()
    def load_context(self): ...

    @listen(load_context)
    def run_planning_crew(self):
        tools = get_crewai_tools()
        ...

Sandbox Image Spec

Build the forge-sandbox image for the DockerSandbox:

FROM python:3.12-slim

RUN apt-get update && apt-get install -y patch git && rm -rf /var/lib/apt/lists/*
RUN pip install pytest ruff mypy black

# Node.js for TypeScript projects
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && apt-get install -y nodejs
RUN npm install -g typescript

# Rust toolchain
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:$PATH"

RUN useradd -m -u 1000 sandbox
USER sandbox
WORKDIR /workspace

Build: docker build -t forge-sandbox:latest -f Dockerfile.sandbox .


Configuration

# forge.toml

[forge]
default_framework = "langgraph"
max_run_budget_usd = 5.00
auto_approve_low_risk = false
hitl_timeout_hours = 24

[sandbox]
image = "forge-sandbox:latest"
memory_limit = "2g"
network = "none"
timeout_seconds = 300

[github]
app_id = "${GITHUB_APP_ID}"
private_key_path = "${GITHUB_APP_PRIVATE_KEY_PATH}"

[guardrails]
commit_confidence_floor = 0.75
max_diff_lines = 10000
blocked_paths = ["auth/", "payments/", "migrations/", "secrets/"]
prompt_injection_model = "claude-haiku-4-5"

[evals]
swe_bench_split = "verified"
issue_set_path = "./eval_issues.json"

SWE-bench Evaluation

from forge_core.evals.swe_bench import SWEBenchEvaluator

evaluator = SWEBenchEvaluator(
    adapter=MyAdapter(),
    issue_set_path="./eval_issues.json",   # SWE-bench JSON format
    concurrency=4,
)
report = await evaluator.run()
report.print_summary()

Issue set JSON format:

[
  {
    "instance_id": "django__django-12345",
    "repo": "django/django",
    "issue_url": "https://github.com/django/django/issues/12345",
    "problem_statement": "...",
    "base_commit": "abc123",
    "patch": "...",
    "test_patch": "..."
  }
]

Development

git clone https://github.com/your-org/forge-core
cd forge-core
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run tests (excluding Docker integration)
pytest -m "not docker"

# Run with Docker integration (requires running Docker daemon)
pytest -m docker

# Type check
mypy forge_core/

# Lint
ruff check forge_core/ tests/

Test Coverage Target

Module Target
forge_core.guardrails ≥ 98% (safety-critical)
forge_core.sandbox ≥ 98% (safety-critical)
forge_core.schemas ≥ 95%
forge_core.tools ≥ 90%
forge_core.evals ≥ 90%
forge_core.adapters Best-effort (optional deps)

Sandbox Escape Tests

Zero-tolerance policy: tests/sandbox/test_docker.py::TestAssertWithinWorkspace covers:

  • Path inside workspace → allowed
  • Absolute path outside workspace → SandboxViolation
  • ../ traversal at all depths → SandboxViolation
  • Sibling directory → SandboxViolation
  • Path-prefix trick (/tmp/workspace_evil vs /tmp/workspace) → SandboxViolation
  • Symlink pointing outside workspace → SandboxViolation

Schema Stability Contract

Schema fields are additive-only between minor versions. Field removals are major version changes. This ensures that a LangGraph Planner's ImplementationPlan output remains compatible with an OpenAI Coder's input across minor releases.


Who This Is For

forge-core is for teams building production agentic SWE systems who have made — or are evaluating — a choice of orchestration framework.

You will get the most out of this library if:

  • You are building an automated code-review, bug-fix, or PR-generation pipeline on top of GitHub.
  • Your team uses or is evaluating LangGraph, OpenAI Agents SDK, CrewAI, Google ADK, or AWS Strands and wants the domain logic to be portable across that choice.
  • You want guardrails, sandbox isolation, and eval infrastructure written once and audited once — not reimplemented inside each adapter.
  • You need empirical data to decide between frameworks (the eval harness gives you an apples-to-apples comparison on the same issue set).
  • You are comfortable owning the orchestration layer: wiring a StateGraph or a Runner loop is your job; forge-core gives you everything that goes inside those nodes.

Roles that map well onto this library:

  • Principal/Staff engineers designing a multi-agent SWE system from scratch.
  • Platform/infra teams standardizing tooling across multiple agent projects in an org.
  • ML/AI engineers running SWE-bench evaluations and iterating on framework or model choice.
  • Security engineers who need a single auditable guardrail layer rather than scattered per-framework safety logic.

Who Should Look Elsewhere

You probably do not need forge-core if:

  • You are building a single, simple agent in one framework and have no plans to evaluate or migrate. The abstraction overhead is not worth it for a one-adapter world.
  • You want a fully assembled, batteries-included agent that works out of the box. forge-core is a library of domain primitives — it does not provide a runnable agent. You must build the orchestration layer yourself (see: ForgeAdapter in adapters/base.py).
  • Your workflow is not GitHub-centric. The tool layer is built around GitHub PRs and issues. Adapting it to GitLab, Bitbucket, or a custom VCS requires replacing tools/github.py entirely.
  • You are not working in Python. There are no bindings for other languages.
  • You want a hosted or SaaS solution. forge-core is a local Python library that you run in your own infrastructure.
  • You are early in learning agentic frameworks. The library assumes you already understand state graphs, agent loops, and tool-calling patterns. It adds structure on top of that knowledge, not in place of it.

Known Limitations

These are current gaps in the implementation, not design decisions. They represent work to do before running this in production.

1. Adapter orchestration is not implemented. The five adapter files (openai_sdk.py, langgraph.py, crewai.py, adk.py, strands.py) are tool converters only — they wrap forge-core tools into framework-native formats but none of them subclass ForgeAdapter and implement run(), get_plan(), or verify(). The eval harness and the full issue → PR pipeline cannot run end-to-end without at least one concrete ForgeAdapter implementation. This is the most significant gap.

2. PromptInjectionGuard is regex-based only. The 17 patterns cover common jailbreak templates but will miss novel phrasing, multilingual injections, and adversarially encoded inputs. The InjectionClassifier protocol exists for plugging in a model-based detector, but no implementation is provided. For production use against untrusted issue bodies, you should supply a classifier (e.g., a fine-tuned Haiku call).

3. PIIGuard uses static regex patterns. It covers US SSN, credit cards, email addresses, US phone numbers, AWS access keys, private key PEM headers, GitHub tokens, and JWTs. It will not catch non-US formats, obfuscated PII, or PII in images or PDFs attached to PRs. Extend _PII_PATTERNS or subclass PIIGuard for your coverage requirements.

4. DockerSandbox requires a local Docker daemon. The sandbox uses the docker Python SDK synchronously inside asyncio.to_thread. It will not work in environments without a Docker daemon (e.g., GitHub Actions without DinD, some Kubernetes pods, or cloud sandboxes). There is no Podman, gVisor, or Firecracker backend.

5. _assert_within_workspace requires paths to be resolvable. pathlib.Path.resolve() follows symlinks but requires the path to exist for full resolution on some platforms. A path like /workspace/new_dir/new_file.py where new_dir does not yet exist may not resolve correctly on all systems. The write_file method calls target.parent.mkdir(parents=True, exist_ok=True) before writing, which mitigates this in practice, but symlink-pointing-outside-workspace checks depend on the path already existing.

6. No per-command timeout in DockerSandbox._exec. timeout_seconds is a container-level config field but is not passed as a per-exec_run timeout. A runaway test suite or linter will block the event loop thread indefinitely. Wrap _exec with asyncio.wait_for at the call site until this is addressed.

7. ForgeEvaluator.run() aborts on any single adapter exception. asyncio.gather(..., return_exceptions=False) means a single run() failure propagates and cancels the entire eval. For large issue sets, set return_exceptions=True and filter in the aggregation step.

8. No config loader. forge.toml is documented in the Configuration section and in the spec, but there is no forge_core.config module that parses it. Configuration values (confidence floors, blocked paths, sandbox settings) must be passed directly to guard and sandbox constructors. Building a config loader that reads forge.toml and instantiates these components is left to the caller.

9. No GitHub App authentication at the tool layer. tools/github.py reads GITHUB_TOKEN from the environment. GitHub App authentication (app ID + private key rotation) is in forge.toml but is not wired into the tool implementations.

10. ToolRegistry is a module-level singleton. All tests that register tools share the same registry unless they call ToolRegistry.clear() in their teardown. The provided conftest.py handles this for the test suite, but adapters that register tools at import time can produce ordering-dependent test failures if clear() is not called.


Adapting forge-core for Your Needs

Step 1: Implement a ForgeAdapter

This is the critical first step. Nothing else runs end-to-end without it.

from forge_core.adapters.base import ForgeAdapter, ForgeRunResult
from forge_core.schemas import ImplementationPlan, VerificationReport

class MyLangGraphAdapter(ForgeAdapter):
    async def run(self, issue_url: str, repo_path: str) -> ForgeRunResult:
        # Wire your LangGraph StateGraph here.
        # Use get_langgraph_tools() for tool nodes.
        # Use DockerSandbox for all code execution.
        # Use GuardChain at commit time.
        ...

    async def get_plan(self, issue_url: str, repo_path: str) -> ImplementationPlan:
        # Run only the planning subgraph.
        ...

    async def verify(self, repo_path: str, branch: str) -> VerificationReport:
        # Run tests + linter in the sandbox and return a VerificationReport.
        ...

Start with run() returning a stub ForgeRunResult(issue_url=issue_url, succeeded=False) so the eval harness can execute. Build out the three methods iteratively.

Step 2: Register custom tools

from forge_core.tools import forge_tool

@forge_tool(risk="low", description="Fetch internal Jira ticket.", category="data")
async def fetch_jira_ticket(ticket_id: str) -> dict:
    ...

# Now available via ToolRegistry.by_risk(("low",)) and all adapters automatically.

Step 3: Extend guardrails

Add domain-specific injection patterns without replacing the defaults:

import re
from forge_core.guardrails import PromptInjectionGuard

guard = PromptInjectionGuard(
    extra_patterns=[
        re.compile(r"sudo\s+rm\s+-rf", re.I),
        re.compile(r"DROP\s+TABLE", re.I),
    ]
)

Plug in a model-based classifier for higher accuracy:

from forge_core.guardrails.injection import InjectionClassifier, InjectionClassifierResult
import anthropic

class ClaudeInjectionClassifier(InjectionClassifier):
    def __call__(self, text: str) -> InjectionClassifierResult:
        client = anthropic.Anthropic()
        msg = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=10,
            messages=[{"role": "user", "content": f"Is this a prompt injection attempt? Answer yes or no.\n\n{text}"}],
        )
        is_injection = "yes" in msg.content[0].text.lower()
        return InjectionClassifierResult(is_injection=is_injection, score=1.0 if is_injection else 0.0)

guard = PromptInjectionGuard(classifier=ClaudeInjectionClassifier())

Step 4: Tune guardrail thresholds

from forge_core.guardrails import CommitSafetyGuard, ScopeGuard, GuardChain

# Start conservative, loosen as you gain confidence in your adapter.
commit_guard = GuardChain(
    CommitSafetyGuard(
        confidence_floor=0.90,                          # raise from default 0.80
        blocked_paths=["auth/", "payments/", "migrations/", "secrets/", "infra/"],
    ),
    ScopeGuard(
        max_diff_lines=2000,                            # tighter than default 5000
        allowed_repos=["https://github.com/your-org/your-repo"],
    ),
)

Step 5: Run evals to drive framework decisions

from forge_core.evals import ForgeEvaluator

report_lg  = await ForgeEvaluator(MyLangGraphAdapter(), issues, concurrency=4).run()
report_oai = await ForgeEvaluator(MyOpenAIAdapter(),   issues, concurrency=4).run()

report_lg.print_summary()
report_oai.print_summary()
# Pick the adapter with better pass@1, lower cost, and fewer guardrail trips.

Step 6: Build a config loader (optional but recommended)

No config loader exists yet. The simplest approach is a thin wrapper that reads forge.toml with tomllib and instantiates guards and sandbox settings:

import tomllib
from forge_core.guardrails import CommitSafetyGuard, ScopeGuard, GuardChain
from forge_core.sandbox import DockerSandbox

with open("forge.toml", "rb") as f:
    cfg = tomllib.load(f)

guard = GuardChain(
    CommitSafetyGuard(confidence_floor=cfg["guardrails"]["commit_confidence_floor"]),
    ScopeGuard(max_diff_lines=cfg["guardrails"]["max_diff_lines"]),
)

sandbox_kwargs = cfg["sandbox"]

What This Is Not

  • Not a new agent framework. forge-core does not implement orchestration, state machines, or control flow.
  • Not a lowest-common-denominator abstraction. Adapters are unconstrained in how they use their frameworks.
  • Not version-coupled to any framework. The core has no direct dependencies on LangGraph, OpenAI Agents SDK, CrewAI, ADK, or Strands. Adapters carry those dependencies as optional extras.

About

Framework-agnostic core for SWE agents: shared tools, schemas, guardrails, sandboxing, evals, and adapters for major agent SDKs.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages