Background coding-agent pipelines that plan, implement, review, and land work
end-to-end. Given a goal or GitHub issue, a gremlin runs the full
plan → implement → review-code → address-code cycle unattended, writing
artifacts to the per-user state directory resolved by
platformdirs.user_state_dir("gremlins") and optionally opening a pull
request. A fleet manager tracks running, stalled, and finished gremlins and
provides stop / land / close operations.
Status: brand-new and a bit janky. This is a fresh project, actively shaped by daily use. Expect rough edges — stream timeouts, the occasional merge conflict from parallel gremlins, a few stages still finding their final shape. Bug reports, ideas, and PRs are all welcome.
Paste the output of gremlins prompt-for-assistant into a fresh Claude Code session (or any compatible assistant) to configure it as a competent gremlins collaborator.
The workflow: you discuss the work with the assistant, it captures discrete units as GitHub issues or plan files, launches gremlins in the background to implement them, and lands each finished gremlin before starting dependent work. You stay at the strategic level — deciding what to build and in what order — while gremlins handle the implementation cycle unattended. The assistant maintains a queue of running, pending, and blocked work and surfaces it on request.
When you run gremlins launch, the launcher captures the current working
directory's repo root via git rev-parse --show-toplevel and stores it as
project_root in the gremlin's state.json. That value pins the worktree
base, child process cwd, and pipeline discovery for that gremlin's lifetime.
To work on a different repo: cd there, then gremlins launch. There is
no --project-root flag; the cwd at launch time is the contract.
Fleet view (gremlins) shows gremlins from all repos by default.
Pass --here to filter to the current repo's project_root.
Pipeline discovery walks from the launching cwd, so .gremlins/pipelines/
overrides in each repo apply to gremlins launched from that repo.
Queue caveat: there is one global queue and the runner's cwd is frozen at
gremlins queue run --detach time. To queue work against a different repo,
prefix the command with cd:
```bash
gremlins queue add "cd /path/to/other-repo && gremlins launch gh --plan '#42' --wait"
gremlins queue add "cd /path/to/other-repo && gremlins land <id>"
```

State isolation: each gremlin's state lives under its own directory
(resolved via `platformdirs.user_state_dir("gremlins")/<id>/`), so two repos
can have running gremlins simultaneously without interference.
- `gh`: GitHub CLI
- `git`: Git (pre-installed on most systems)
- `claude`: Claude Code CLI
```bash
uv venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
uv pip install -e ".[dev]"
```

| Target | What it runs |
|---|---|
| `make test` | `pytest` |
| `make lint` | `ruff check .` |
| `make format` | `ruff format --check .` (check only; does not rewrite files) |
| `make typecheck` | `pyright` |
| `make check` | lint + format + typecheck |
Invoked as `python -m gremlins.cli <subcommand>` or, after install, `gremlins <subcommand>`.
The authoritative list of subcommands and their descriptions lives in the
dispatch table in `gremlins/cli/__init__.py`.
| Subcommand | Purpose |
|---|---|
| `launch <name>` | Launch a background gremlin by pipeline name (`gremlins launch --list` to see available) |
| `resume` | Re-spawn an existing gremlin from its recorded stage |
| `stop` | Send SIGTERM to a running gremlin and wait for it to exit |
| `land` | Land a finished gremlin onto the current branch |
| `rm` | Delete a dead gremlin's state dir, worktree, and branch |
| `close` | Mark a dead gremlin as closed |
| `log` | Tail the gremlin's log file |
| `ack` | Acknowledge a gremlin waiting for human input |
| `skip` | Skip a gremlin waiting for human input |
| `queue` | Manage the gremlin launch queue |
| `prompt-for-assistant` | Print the assistant setup prompt to stdout |
`_run-pipeline` is an internal spawn boundary; not for direct use.

`gremlins queue` sub-subcommands:

| Sub-subcommand | Description |
|---|---|
| `add [--run] <command>` | Add a command to the queue; `--run` also starts the runner if idle |
| `list [--watch] [--json]` | List queued items |
| `run [--once] [--poll-interval SEC] [--detach]` | Start the queue runner |
| `requeue [--done]` | Move failed (and optionally done) items back to pending |
| `clear [--failed\|--done\|--pending\|--purge\|--item STEM]` | Remove items from the queue |
| `set-state <state> --item STEM` | Manually transition a queue item to a different state |
| `stop` | Stop the detached runner |
Flags vary by pipeline. The first stage's `__init__` signature defines the accepted flags; `gremlins launch <name> --help` prints the full list.
Common infrastructure flags (accepted by all pipelines):
| Flag | Default | Description |
|---|---|---|
| `--plan <path-or-ref>` | (none) | Path to a plan/spec file, or a GitHub issue ref (`42`, `#42`, `owner/repo#42`, or issue URL) |
| `--description <text>` | (none) | Human-readable description stored in state |
| `--parent <id>` | (none) | Parent gremlin ID (used by boss to track child ownership) |
| `--print-id` | `false` | Print the gremlin ID to stdout after launch |
| `-c`/`--instructions <text>` | (none) | Instructions string (mutually exclusive with `--plan`) |
| `--base-ref <ref>` | `HEAD` | Git ref to branch the worktree from; ignored for `gh` pipelines, which always anchor to the origin default branch. In parallel pipelines, automatically propagated to all child processes. |
| `--spec <path>` | (none) | Path to a coding-style spec file passed into stages |
| `--bypass` | `false` | Skip permission checks; run in bypass mode |
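The `--plan` row accepts several spellings of a GitHub issue reference. As a rough illustration of how such values could be told apart from plan file paths, here is a minimal sketch; `parse_issue_ref`, `IssueRef`, and the rule that anything unmatched is a file path are hypothetical and not gremlins' actual parser.

```python
import re
from typing import NamedTuple, Optional


class IssueRef(NamedTuple):
    owner: Optional[str]  # None means "the repo the gremlin was launched from"
    repo: Optional[str]
    number: int


# Hypothetical patterns, one per spelling listed in the --plan row above.
_BARE = re.compile(r"#?(\d+)")                                   # 42 or #42
_QUALIFIED = re.compile(r"([\w.-]+)/([\w.-]+)#(\d+)")            # owner/repo#42
_URL = re.compile(r"https://github\.com/([\w.-]+)/([\w.-]+)/issues/(\d+)/?")


def parse_issue_ref(value: str) -> Optional[IssueRef]:
    """Return an IssueRef for issue-like values, or None for anything else."""
    if m := _BARE.fullmatch(value):
        return IssueRef(None, None, int(m.group(1)))
    for pattern in (_QUALIFIED, _URL):
        if m := pattern.fullmatch(value):
            return IssueRef(m.group(1), m.group(2), int(m.group(3)))
    return None  # assumption: non-matching values are treated as plan file paths


if __name__ == "__main__":
    for value in ("42", "#42", "owner/repo#42", "plans/feature.md"):
        print(value, "->", parse_issue_ref(value))
```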
Gremlins runs a sequence of stages defined in a YAML file. The bundled pipelines work out of the box; a project-local YAML can override any of them.
`--pipeline <name|path>` resolves as follows:
- A value with a `.yaml` suffix or more than one path component is loaded directly as a filesystem path.
- Otherwise, `./.gremlins/pipelines/<name>.yaml` is checked first (project-local override).
- Then `gremlins/pipelines/<name>.yaml` (bundled) is checked.
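The three discovery rules can be sketched in Python as below. This is a hypothetical helper (`resolve_pipeline_sketch`), not the loader's real `resolve_pipeline_path`; it simplifies discovery by checking only the given directory rather than walking from the launching cwd, and it does not check that a direct path exists.

```python
from pathlib import Path
from typing import Optional


def resolve_pipeline_sketch(name_or_path: str, cwd: Path, bundled_dir: Path) -> Optional[Path]:
    """Apply the three discovery rules above; return None if nothing matches."""
    candidate = Path(name_or_path)
    # Rule 1: a .yaml suffix or a multi-component value is a direct filesystem path.
    if candidate.suffix == ".yaml" or len(candidate.parts) > 1:
        return candidate if candidate.is_absolute() else cwd / candidate
    # Rule 2: a project-local override wins over the bundled pipeline.
    # Simplification: only cwd is checked; parent directories are not walked.
    local = cwd / ".gremlins" / "pipelines" / f"{name_or_path}.yaml"
    if local.is_file():
        return local
    # Rule 3: fall back to the pipelines shipped with the package.
    bundled = bundled_dir / f"{name_or_path}.yaml"
    return bundled if bundled.is_file() else None
```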
The pipeline name is the first non-flag argument to gremlins launch. Run gremlins launch --list to see all available pipeline names.
```bash
gremlins launch local   # bundled local.yaml
gremlins launch gh      # bundled gh.yaml
```

Top-level keys:
```yaml
name: my-pipeline                # optional; defaults to the file stem
default_client: claude:sonnet    # optional; provider:model string
prompt_dir: ../prompts           # optional; relative to YAML, defaults to the YAML's directory
stages:
  - name: plan
    type: plan
    client: copilot:gpt-5.4      # optional; overrides default_client for this stage
    prompt: gremlins:plan.md     # `gremlins:NAME` -> bundled prompts; bare NAME -> prompt_dir
    options: {}
```

| Key | Description |
|---|---|
| `name` | Pipeline display name; defaults to the file stem |
| `default_client` | `provider:model` string used for stages without an explicit `client:` |
| `prompt_dir` | Directory that bare-name `prompt:` paths resolve against, relative to the YAML file. Defaults to the YAML's directory. |
| `stages` | Ordered list of stage entries or parallel groups |
Per-stage keys:
| Key | Description |
|---|---|
| `name` | Unique stage identifier; used for resume targeting |
| `type` | Registered stage type (see Available stage types) |
| `client` | `provider:model` string; overrides `default_client` for this stage |
| `prompt` | Path or list of paths. `gremlins:NAME` resolves from the bundled package prompts; a bare `NAME` resolves from the pipeline's `prompt_dir`. |
| `options` | Free-form dict passed to the stage |
provider:model format:
Providers: claude (default), copilot, openai, xai, anthropic. The model part is optional — claude: and claude:sonnet are both valid. Examples: claude:sonnet, copilot:gpt-5.4, openai:gpt-4o. Per-stage client: in YAML takes precedence over the CLI --client flag; default_client: at the pipeline level does not.
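A `provider:model` string splits on its first colon, which keeps model IDs such as `claude-sonnet-4-6` intact. A minimal parsing sketch; the `parse_client` name, rejecting unknown providers, and reading an empty model as "use the provider's default" are assumptions.

```python
from typing import NamedTuple, Optional

KNOWN_PROVIDERS = {"claude", "copilot", "openai", "xai", "anthropic"}


class ClientSpec(NamedTuple):
    provider: str
    model: Optional[str]  # None: let the provider pick its default model


def parse_client(spec: str) -> ClientSpec:
    """Split 'provider:model' on the first colon; the model part is optional."""
    provider, _, model = spec.partition(":")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r} in {spec!r}")
    return ClientSpec(provider, model or None)
```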
Parallel-group form:
```yaml
- name: reviews
  parallel:
    - name: review-detail
      type: review-code
      client: claude:sonnet
    - name: review-security
      type: review-code
      client: claude:sonnet
  max_concurrent: 2    # optional; defaults to all children at once
```

| Key | Description |
|---|---|
| `name` | Group identifier |
| `parallel` | List of child stage entries (no nesting allowed) |
| `max_concurrent` | Max simultaneously running children (optional) |
Clients are specified as provider:model inline strings, either at the pipeline level (default_client:) or per stage (client:). The model part is optional.
```yaml
default_client: claude:sonnet    # all stages default to this
stages:
  - name: plan
    type: plan
  - name: implement
    type: implement
    client: copilot:gpt-5.4      # this stage uses copilot instead
```

Providers: `claude`, `copilot`, `openai`, `xai`, `anthropic`. The CLI `--client provider:model` flag overrides the pipeline-level `default_client:` but yields to per-stage `client:` settings.
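The precedence rule can be expressed as a short fallback chain. This sketch works on the already-parsed YAML from the example; `stage_clients` is a hypothetical name, and falling back to `claude:` when nothing is set is an assumption based on `claude` being the default provider.

```python
from typing import Dict, Optional


def stage_clients(pipeline: dict, cli_client: Optional[str] = None) -> Dict[str, str]:
    """Map each stage name to its effective provider:model string."""
    default = pipeline.get("default_client")
    resolved = {}
    for stage in pipeline["stages"]:
        # Precedence: stage client: > CLI --client > default_client:.
        # Assumption: with none of the three set, fall back to bare "claude:".
        resolved[stage["name"]] = stage.get("client") or cli_client or default or "claude:"
    return resolved


# The YAML example above, as the dict a YAML loader would produce for it.
example = {
    "default_client": "claude:sonnet",
    "stages": [
        {"name": "plan", "type": "plan"},
        {"name": "implement", "type": "implement", "client": "copilot:gpt-5.4"},
    ],
}
```

With `--client openai:gpt-4o` on the command line, `plan` would switch to `openai:gpt-4o` while `implement` keeps its pinned `copilot:gpt-5.4`.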
```yaml
prompt: gremlins:plan.md                   # single bundled file
prompt: [gremlins:code_style.md, plan.md]  # mix bundled and local; concatenated with \n\n
```

Each entry is one of:
- `gremlins:NAME`: resolved from the bundled prompts shipped with the package. Use this for prompts owned by gremlins (`code_style.md`, `plan_gh.md`, etc.).
- bare `NAME`: resolved from the pipeline's top-level `prompt_dir:` (relative to the YAML file; defaults to the YAML's own directory). Use this for prompts you author and check in alongside your pipeline.
Lists are joined with \n\n before being passed to the stage. There is
no search fallback between the two — the prefix is the contract, so a
custom YAML reads as self-describing about which prompts come from the
package vs which must be provided locally.
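The prefix rule can be sketched as follows; `load_prompt` and its directory arguments are hypothetical, not the package's API. A bare name that exists only in the bundled directory is not found, which is the "no search fallback" behavior described above.

```python
from pathlib import Path
from typing import List, Union

BUNDLED_PREFIX = "gremlins:"


def load_prompt(spec: Union[str, List[str]], bundled_dir: Path, prompt_dir: Path) -> str:
    """Resolve each entry by its prefix and join the texts with a blank line."""
    entries = [spec] if isinstance(spec, str) else spec
    parts = []
    for entry in entries:
        if entry.startswith(BUNDLED_PREFIX):
            path = bundled_dir / entry[len(BUNDLED_PREFIX):]
        else:
            path = prompt_dir / entry  # never falls back to bundled_dir
        parts.append(path.read_text())
    return "\n\n".join(parts)
```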
By convention, project-local prompts live in ./.gremlins/prompts/ (a peer
of ./.gremlins/pipelines/, not nested under it) and pipelines set
prompt_dir: ../prompts.
A free-form dict passed verbatim to the stage. Selected options by stage
(see gremlins/stages/AGENTS.md for the full list):
verify — runs a list of shell commands with an agent fix-loop:
```yaml
options:
  cmds: ["make check", "make test"]   # commands to run (joined with &&)
  max_attempts: 3                     # fix-loop retries (default: 3)
```

For local stages, model options (`plan_model`, `impl_model`, `address_model`,
`test_fix_model`, `detail`) can also be set here to override the CLI defaults.
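The `verify` fix-loop can be pictured as: join `cmds` with `&&`, run the result, and on failure hand the output to a fixing agent before retrying. A minimal sketch, where `run_verify` and the `fix` callback are hypothetical stand-ins for the stage and its agent, and where `max_attempts` is assumed to bound the total number of check runs:

```python
import subprocess
from typing import Callable, List


def run_verify(cmds: List[str], fix: Callable[[str], None],
               max_attempts: int = 3, cwd: str = ".") -> bool:
    """Run the joined commands, calling fix() between failed attempts."""
    command = " && ".join(cmds)  # e.g. "make check && make test"
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command, shell=True, cwd=cwd,
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt < max_attempts:
            # Hand the failure output to the fixing agent, then re-run the checks.
            fix(result.stdout + result.stderr)
    return False
```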
| Type | Description |
|---|---|
| `plan` | Produces an implementation plan |
| `implement` | Applies the plan to the working tree |
| `review-code` | Runs a code review and writes findings to disk |
| `verify` | Runs check and test commands with an agent fix-loop |
| `exec` | Runs shell commands with `in:`/`out:` artifact bindings |
| `agent` | Resolves `in:` artifacts, renders the prompt, invokes the agent, and verifies `out:` artifacts |
| `handoff` | Runs the handoff agent once per boss loop iteration |
| `loop` | Iterates body stages until a termination predicate holds or max iterations is reached |
| `sequence` | Runs body stages sequentially using child state |
| `github-open-pull-request` | Opens a pull request on GitHub |
| `github-request-copilot-review` | Requests a Copilot review on the open PR |
| `github-wait-copilot` | Polls until Copilot posts its review |
| `github-wait-ci` | Polls PR CI checks until they pass or attempts are exhausted |
Wrap sibling stages in a parallel: list to run them concurrently:
```yaml
default_client: claude:sonnet
stages:
  - name: plan
    type: plan
  - name: reviews
    parallel:
      - name: review-detail
        type: review-code
      - name: review-security
        type: review-code
    max_concurrent: 2
  - name: address-code
    type: agent
```

Execution and failure: the parallel group executes in three phases:
- Fan-out: each child stage starts independently as a subprocess.
- Concurrent execution: all children run simultaneously (up to `max_concurrent`).
- Fan-in: the group completes once every child has finished; if one bails, its siblings keep running until then.
If any child fails (raises Bail), the pipeline halts after the group finishes —
siblings are not cancelled mid-run by default. This can be changed with cancel_on_bail: true
to cancel outstanding tasks immediately. The bail is evaluated via bail_policy (default: any,
meaning one failed child halts the group; set bail_policy: all to halt only when all children bail).
Subsequent stages are skipped; the operator can resume or ack the group via CLI.
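The bail semantics above (`max_concurrent`, `bail_policy`, `cancel_on_bail`) can be modeled with `asyncio`. In this sketch, coroutines stand in for the child subprocesses and a local `Bail` class stands in for the real exception; `run_group` and its return shape are hypothetical, not gremlins APIs.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class Bail(Exception):
    """Stand-in for the stage-level Bail exception described above."""


async def run_group(
    children: Dict[str, Callable[[], Awaitable[None]]],
    max_concurrent: int = 0,
    bail_policy: str = "any",
    cancel_on_bail: bool = False,
) -> Tuple[bool, List[str]]:
    """Run children concurrently; return (halt_pipeline, names_that_bailed)."""
    limit = asyncio.Semaphore(max_concurrent or max(len(children), 1))
    bailed: List[str] = []

    async def run_one(name: str, child: Callable[[], Awaitable[None]]) -> None:
        async with limit:  # fan-out, capped at max_concurrent
            try:
                await child()
            except Bail:
                bailed.append(name)
                if cancel_on_bail:
                    for other, task in tasks.items():
                        if other != name:
                            task.cancel()

    tasks = {name: asyncio.create_task(run_one(name, fn)) for name, fn in children.items()}
    # Fan-in: wait for every child; cancelled siblings surface as CancelledError results.
    await asyncio.gather(*tasks.values(), return_exceptions=True)

    if bail_policy == "all":
        halt = bool(children) and len(bailed) == len(children)
    else:  # "any": one bailed child halts the pipeline
        halt = bool(bailed)
    return halt, bailed
```

Without `cancel_on_bail`, a slow sibling still runs to completion before `run_group` returns, matching the default "halt after the group finishes" behavior.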
State isolation: Each child gets its own state directory and subprocess.
Client overrides, worktree paths, and artifact bindings are isolated per-child.
Children run in parallel without blocking each other. Parent state.json is updated
during the concurrent phase (e.g., active_children snapshot); copying child artifact
bindings into the parent registry is deferred until fan-in completes.
Resume targeting: use the full child gremlin ID (form: `<parent-id>--<group-name>--<child-key>`,
visible in the fleet view) to resume a specific child. Resuming the parent group ID re-spawns all
children that haven't landed.
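Because the child ID is a plain `--`-joined string, it can be built and split mechanically. A small sketch with hypothetical helper names; splitting from the right assumes group names and child keys never contain `--` themselves, while the parent ID may.

```python
from typing import NamedTuple


class ChildId(NamedTuple):
    parent_id: str
    group: str
    child_key: str


def child_id(parent_id: str, group: str, child_key: str) -> str:
    """Build <parent-id>--<group-name>--<child-key>."""
    return f"{parent_id}--{group}--{child_key}"


def split_child_id(full_id: str) -> ChildId:
    # Split from the right so a parent ID containing "--" stays intact.
    parent_id, group, child_key = full_id.rsplit("--", 2)
    return ChildId(parent_id, group, child_key)
```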
Base ref propagation: The --base-ref flag is automatically propagated from
the parent to all child processes, ensuring consistent branching across the group.
Child worktrees are derived from the parent's base_ref as recorded in state.
Create .gremlins/pipelines/local.yaml to override the bundled local
pipeline. This example uses Opus for plan/implement/address stages and adds
a verify stage before review-code:
```yaml
name: local
stages:
  - { type: plan, options: { plan_model: opus } }
  - { type: implement, options: { impl_model: opus } }
  - { type: verify, options: { cmds: ["pytest"] } }
  - { type: review-code }
  - { name: address-code, type: agent, options: { address_model: opus } }
```

Add a `prompt:` key to any stage to supply a custom prompt; bare names
resolve against `prompt_dir`, which defaults to the YAML file's directory.
Run two review-code passes in parallel, then address both:
```yaml
name: local
default_client: claude:sonnet
stages:
  - { type: plan }
  - { type: implement }
  - name: reviews
    parallel:
      - name: review-detail
        type: review-code
      - name: review-security
        type: review-code
    max_concurrent: 2
  - { name: address-code, type: agent }
```

Note: `review-code` does not currently support per-stage prompt overrides
via YAML; both passes use the built-in detail lens.
The `stage-definitions:` key lets you name and reuse stage patterns within a pipeline:

```yaml
stage-definitions:
  review-base: &review-base
    type: review-code
    client: claude:sonnet
    prompt: gremlins:code_style.md
stages:
  - { type: plan }
  - { type: implement }
  - name: review-detail
    <<: *review-base
    prompt: [gremlins:code_style.md, detail_review.md]
  - name: review-security
    <<: *review-base
    prompt: security_review.md
```

Definitions provide the base `type`, `options`, and `prompt`. Call-sites can override
`prompt` and `options` via YAML anchors (as shown above) or via template placeholders
in multi-stage recipes. Call-sites own the `name:`, `in:`, and `out:` keys:
`out:` is forbidden inside a definition, but `in:` can be declared there and is
merged with the call-site's `in:` values. For single-stage definitions, only `name`, `in`,
and `out` can be safely overridden; to vary `prompt` or `options`, use anchors.
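The ownership rules for single-stage definitions can be sketched as a dictionary merge. `expand_stage` is hypothetical; that call-site `in:` bindings win on key conflicts, and that overriding any other key is rejected rather than ignored, are both assumptions.

```python
from typing import Any, Dict

# Keys a call-site may set on top of a single-stage definition.
# "type" is how the call-site names the definition, so it is not copied over.
CALL_SITE_KEYS = {"type", "name", "in", "out"}


def expand_stage(definition: Dict[str, Any], call_site: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge of a stage definition with one call-site entry."""
    if "out" in definition:
        raise ValueError("out: may not appear inside a stage definition")
    extra = set(call_site) - CALL_SITE_KEYS
    if extra:
        # Assumption: other overrides are rejected; the text says to use anchors.
        raise ValueError(f"call-site cannot override {sorted(extra)}; use YAML anchors")
    stage = dict(definition)  # shallow copy; the definition itself is not mutated
    # in: maps merge; assumption: call-site bindings win on key conflicts.
    merged_in = {**definition.get("in", {}), **call_site.get("in", {})}
    if merged_in:
        stage["in"] = merged_in
    for key in ("name", "out"):
        if key in call_site:
            stage[key] = call_site[key]
    return stage
```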
Stages can bind artifacts via in: and out: maps. These define what data
flows between stages in the pipeline:
```yaml
stages:
  - name: scan
    type: exec
    options:
      cmds: ["python scan.py > $ARTIFACTS/report.json"]
    out:
      report: file://session/report
  - name: analyze
    type: agent
    in:
      report: report
    prompt: |
      The scanning report is in {report}.
      Propose fixes.
```

Artifact URI schemes:
- `file://session/<name>`: session artifact, a file created under the gremlin's `$ARTIFACTS` directory
- `git://ref/<ref>`: Git ref name (e.g., `git://ref/main` returns the string `main`)
- `git://commit/<sha>`: commit SHA (e.g., `git://commit/abc123def` returns the full SHA)
- `git://range/<base>..<head>`: commit range/log between two refs
- `gh://pulls/<number>/head`: GitHub PR head ref (and other `gh://` schemes for GitHub data)
- `file://`, `git://`, `gh://`: the base schemes supported by the artifact resolvers

Artifact binding semantics:
- `in:` values are registry key paths (e.g., `report` or `report.critical?default`) with optional dotted attribute access and a `?default` fallback.
- `out:` values are URI strings that name what the stage produces; downstream stages reference the key name (not the URI) in their `in:` maps.
- Prompt/option substitution uses `{var}` tokens (not `{{var}}`); artifacts bound via `in:` become available for substitution.
- `in:` can be declared in a stage definition and will be merged with call-site `in:` values; `out:` cannot appear inside a definition.
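The `in:` resolution and `{var}` substitution rules can be sketched with a plain dict as the registry. `resolve_binding` and `render` are hypothetical; treating the text after `?` as a literal string fallback is an assumption. Python's `str.format_map` happens to use the same `{var}` token syntax.

```python
from typing import Any, Dict


def resolve_binding(registry: Dict[str, Any], key_path: str) -> Any:
    """Resolve 'key', 'key.attr', or 'key.attr?fallback' against a registry."""
    path, has_default, default = key_path.partition("?")
    value: Any = registry
    for part in path.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif not isinstance(value, dict) and hasattr(value, part):
            value = getattr(value, part)  # dotted attribute access on objects
        elif has_default:
            return default  # assumption: the text after "?" is a literal fallback
        else:
            raise KeyError(f"unresolved binding {key_path!r}")
    return value


def render(template: str, bindings: Dict[str, Any]) -> str:
    """Substitute {var} tokens; a doubled {{var}} comes out as literal {var}."""
    return template.format_map(bindings)
```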
Some stage types are not built-in — they are provided as bundled YAML recipes and must be wired in via stage-definitions: before use:
```yaml
stage-definitions:
  github-push-to-pr-branch: gremlins:github_push_to_pr_branch
stages:
  - { name: push, type: github-push-to-pr-branch }
```

`gremlins:NAME` resolves the recipe from the bundled package (`gremlins/recipes/stages/NAME.yaml`). A bare path resolves relative to the pipeline file.
The canonical reference pipelines:
- `gremlins/pipelines/local.yaml`: `gremlins launch local`
- `gremlins/pipelines/gh.yaml`: `gremlins launch gh`
- `gremlins/pipelines/gh-terse.yaml`: `gremlins launch gh-terse`
- `gremlins/pipelines/pr-extend.yaml`: `gremlins launch pr-extend`
- `gremlins/pipelines/boss.yaml`: `gremlins launch boss`
Gremlins can fail or get stuck during execution. Understanding how to recover is essential for running long-running pipelines.
When a stage detects an unrecoverable condition (e.g., a code review requests changes, secrets are detected, or a merge conflict blocks progress), it raises a Bail exception with a detail string.
By convention, agent-based stages emit a `BAIL: <class>: <detail>` marker at the end of their output. The `<class>` token is conventionally one of:
- `reviewer_requested_changes`: code review found issues that must be addressed
- `security`: security review detected problems
- `secrets`: credentials or sensitive data detected in the code
- `other`: stage-specific or unknown failure condition

The bail detail is written to a per-attempt `bail_<attempt>.json` file in the gremlin's state directory and is visible in the fleet view. When a stage bails, the entire pipeline halts: subsequent stages do not run, but the gremlin's state is preserved for recovery.
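Detecting the convention-based marker amounts to checking the last non-empty line of a stage's output. A minimal sketch; `parse_bail` is hypothetical, and it does not restrict `<class>` to the four conventional values.

```python
import re
from typing import Optional, Tuple

# BAIL: <class>: <detail>, where <class> is a lowercase token such as "secrets".
_BAIL_RE = re.compile(r"BAIL:\s*([a-z_]+):\s*(.+)")


def parse_bail(output: str) -> Optional[Tuple[str, str]]:
    """Return (class, detail) if the last non-empty line is a BAIL marker."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return None
    match = _BAIL_RE.fullmatch(lines[-1])
    return (match.group(1), match.group(2)) if match else None
```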
When a gremlin bails and halts, you have three recovery options:
`gremlins resume <id>`: Re-spawn the bailed gremlin from the stage where it
bailed. Use this when the cause has been fixed externally (e.g., a code review
fix has been merged, or a merge conflict has been resolved). The gremlin will
restart from the bailed stage with the current worktree state.
`gremlins ack <id>`: Acknowledge the gremlin without re-running. Use this
when the bailed condition is acceptable (e.g., the review found minor style
issues that don't block landing, or external work was already completed). The
gremlin marks the bailed stage as complete and proceeds to subsequent stages.
`gremlins skip <id>`: Create a new sibling attempt with the same parameters
and a fresh ID, leaving the failed gremlin in place. Use this for transient
failures (timeouts, CI hangs) that won't self-resolve. Both attempts are visible
in the fleet; the new attempt begins from the start.
When a child in a parallel group bails:
- The group halts after all currently running children finish (not mid-run), unless `cancel_on_bail: true` is set.
- The bail reason is attributed to the child stage name.
- `gremlins resume <parent-id>` re-spawns all children that haven't landed.
- `gremlins resume <parent-id>--<group-name>--<child-key>` resumes only that child (use the full child ID from the fleet view).
If the cause was a transient failure affecting multiple children, skip the entire
group and re-launch the pipeline to restart all children.
When a boss gremlin spawns child gremlins (`gremlins launch ... --parent <boss-id>`),
the boss halts if a child bails. At this point:
- The child's gremlin ID is visible in the fleet view as a child of the boss.
- Recover the child (`resume`, `ack`, or `skip`) independently.
- Once the child lands or is abandoned, resume the boss (`gremlins resume <boss-id>`).
The boss resumes from its child-spawn stage and proceeds with the next iteration (re-planning, re-implementing, or wrapping up, depending on the pipeline).
Gremlins operate in one of two permission modes:
Default mode (no flags): The agent is restricted to an allowlist of tools (Read, Edit, Write, Bash, Grep, Glob) and its Bash commands are path-scoped to the gremlin's git worktree. It can read and modify files inside that worktree and blocks direct path references outside it. This is a best-effort token check, not a full sandbox — indirect references (heredocs, computed paths) may not be caught.
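To make the "best-effort token check" concrete, here is a hypothetical sketch of such a check (not gremlins' actual code): split a Bash command into tokens and flag literal absolute or home-relative tokens that resolve outside the worktree.

```python
import os
import shlex
from typing import List


def out_of_scope_paths(command: str, worktree: str) -> List[str]:
    """Return tokens of a Bash command that point outside the worktree."""
    root = os.path.realpath(worktree)
    offending = []
    for token in shlex.split(command):
        # Best effort: only literal absolute or ~-relative tokens are inspected;
        # anything computed by the shell ($(...), $VAR, heredocs) is invisible here.
        if not token.startswith(("/", "~")):
            continue
        resolved = os.path.realpath(os.path.expanduser(token))
        if os.path.commonpath([root, resolved]) != root:
            offending.append(token)
    return offending
```

For example, `cat /etc/passwd` is flagged, but `cat "$(echo /etc)/passwd"` is not, because that path only exists after the shell expands it.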
Bypass mode (--bypass, GREMLINS_BYPASS_PERMISSIONS=1, project
.gremlins/permissions.yaml bypass_permissions: true, or user config
~/.config/gremlins/config.toml bypass_permissions = true): All permission
checks are disabled. The agent can use any tool and reference any path. Use
this when the task genuinely requires broader access (e.g. a pipeline that
modifies system config).
The three opt-in paths for bypass are:
- `gremlins launch <pipeline> --bypass`: single-launch override
- `GREMLINS_BYPASS_PERMISSIONS=1` in the environment
- `bypass_permissions: true` in `.gremlins/permissions.yaml` (project) or `bypass_permissions = true` in `~/.config/gremlins/config.toml` (user)
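The three opt-in paths can be read as a simple OR. This sketch takes already-parsed config mappings rather than reading the files; the function name, treating only `"1"` as true for the environment variable, and the rule that no source can force bypass back off are all assumptions.

```python
from typing import Any, Mapping, Optional


def bypass_enabled(cli_flag: bool, env: Mapping[str, str],
                   project_cfg: Optional[Mapping[str, Any]],
                   user_cfg: Optional[Mapping[str, Any]]) -> bool:
    """Return True if any of the opt-in paths enables bypass mode."""
    if cli_flag:  # gremlins launch <pipeline> --bypass
        return True
    if env.get("GREMLINS_BYPASS_PERMISSIONS") == "1":
        return True
    # Parsed .gremlins/permissions.yaml, then ~/.config/gremlins/config.toml.
    for cfg in (project_cfg, user_cfg):
        if cfg and cfg.get("bypass_permissions") is True:
            return True
    return False
```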
Honest disclaimer: The allowlist limits reach — what paths and tools the agent can invoke. It does not limit impact within reach. A gremlin with write access to your worktree can make any change inside it. Review landed commits before merging.
Backend differences: On openai: and xai: backends, gremlins owns the
tool layer and enforces the allowlist directly. On the anthropic: backend,
enforcement is coarser — the SDK loop uses vendor-defined tools and the path
scoping is advisory. On claude: and copilot: subprocess backends, the
gremlins-layer permission block is not translated into CLI flags or
settings — the underlying CLI reads the operator's ambient config and
enforces whatever the operator has configured there. See "Backend config
inheritance" below.
The claude: backend is a thin wrapper around claude -p. It does not
materialize a per-gremlin config dir, and it does not set
CLAUDE_CONFIG_DIR for the subprocess. Whatever the operator has configured
for their interactive Claude session is exactly what the subprocess sees:
- Settings: `~/.claude/settings.json` (plus any project-level `.claude/settings.json` the CLI discovers) is read by the CLI directly. The gremlins-layer `allowed_tools`/`disallowed_tools` block has no effect on `claude:` runs; configure tool permissions via your own Claude settings or use the `anthropic:` backend.
- Gremlin worktrees, where the `claude:` subprocess does its file edits, live under a stable, gremlins-scoped prefix in the system temp directory. Discover it at runtime:

  ```bash
  python -c "from gremlins import paths; print(paths.work_root())"
  ```

  On Linux/macOS this is `/tmp/gremlins`; the OS reclaims orphaned worktrees on reboot. A single `permissions.allow` rule in `~/.claude/settings.json` covers every worktree path:

  ```json
  {
    "permissions": {
      "allow": [
        "Edit(<work_root>/**)",
        "Write(<work_root>/**)",
        "Read(<work_root>/**)"
      ]
    }
  }
  ```

  Replace `<work_root>` with the actual output of the command above.
- MCP servers and hooks: inherited from the user's Claude config.
- Auth: subscription auth follows `~/.claude/.credentials.json` (or the macOS keychain) exactly as it would for an interactive session.
- Permission mode: the only thing the wrapper still controls per call; it passes `--permission-mode bypassPermissions` when bypass is enabled, otherwise `default`.
If you need per-gremlin tool allow-lists, hermetic config, or a clean separation between gremlins and your interactive Claude session, use one of the SDK-backed providers instead:
- `anthropic:<model-id>`: `claude-agent-sdk` with `setting_sources=[]` (no ambient settings, no MCP, no hooks). Requires `ANTHROPIC_API_KEY`. `allowed_tools` from the native block is enforced by the SDK.
- `openai:<model-id>` / `xai:<model-id>`: `openai-agents` SDK with the in-tree `GREMLINS_TOOLS` list. Per-gremlin `allowed_tools` filters that list. Requires `OPENAI_API_KEY` / `XAI_API_KEY`.
Set via pipeline YAML:
```yaml
default_client: anthropic:claude-sonnet-4-6
# or per-stage:
stages:
  - name: implement
    type: implement
    client: anthropic:claude-sonnet-4-6
```

Subscription auth is not available on the SDK backends; that is Anthropic policy, not a gremlins limitation.
If .gremlins/env exists in the project root, gremlins sources it through
bash at startup and merges any new or changed variables into the process
environment before any stage runs. All subprocesses (plan, implement, verify,
review) inherit the result automatically.
Security warning: because `.gremlins/env` is executed as a bash script, it can run arbitrary code. Do not run gremlins in a repository unless you have reviewed the contents of `.gremlins/env` and trust them.
The file is sourced via bash, so it can use command substitution,
conditionals, and anything bash supports:
```bash
export VIRTUAL_ENV=$(poetry env info --path)
export PATH="$VIRTUAL_ENV/bin:$PATH"
export TEST_DATABASE_URL=postgresql://localhost/mydb_test
```

Add `.gremlins/env` to your `~/.gitignore_global` or project `.gitignore`.
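The sourcing step can be approximated by running the file in a `bash` subprocess and dumping the resulting environment. This is a hypothetical sketch, not the gremlins code; skipping bash's own bookkeeping variables is an assumption. It executes the file, so the security warning above applies to it too.

```python
import json
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict

# Variables bash maintains itself; assumption: these are not worth merging.
_BASH_BOOKKEEPING = {"SHLVL", "_", "PWD", "OLDPWD"}
# Run by the child interpreter to print its environment as JSON.
_DUMP_ENV = "import json, os; print(json.dumps(dict(os.environ)))"


def sourced_env_changes(env_file: Path) -> Dict[str, str]:
    """Source env_file in bash and return the variables it adds or changes."""
    # $1 = env file, $2 = this Python interpreter, $3 = the dump snippet.
    # The sourced file's own stdout is discarded so it cannot corrupt the JSON.
    script = 'source "$1" >/dev/null && exec "$2" -c "$3"'
    result = subprocess.run(
        ["bash", "-c", script, "bash", str(env_file), sys.executable, _DUMP_ENV],
        capture_output=True, text=True, check=True,
    )
    after = json.loads(result.stdout)
    return {
        key: value
        for key, value in after.items()
        if key not in _BASH_BOOKKEEPING and os.environ.get(key) != value
    }
```

Merging is then a single `os.environ.update(sourced_env_changes(Path(".gremlins/env")))` call, after which every subprocess inherits the result.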
gremlins/pipeline/loader.py exposes:
- `load_pipeline(path)` → `Pipeline`: parses a YAML file, resolves `clients` via `CLIENT_FACTORIES`, and validates every stage `type` against `STAGE_REGISTRY` (populated by importing `gremlins.stages.all`).
- `resolve_pipeline_path(name_or_path, base_dir)`: resolves a name or path using the discovery order above.
Dataclasses: Pipeline, StageEntry (parallel groups have type="parallel"
internally and carry a children list and optional max_concurrent).
- `gremlins/AGENTS.md`: module layout, entry points, testability seam, byte-stable strings
- `gremlins/fleet/AGENTS.md`: fleet manager internals
- `gremlins/orchestrators/AGENTS.md`: orchestrator internals
- `gremlins/stages/AGENTS.md`: stage internals