Skip to content

Add input sources mechanism for pipeline artifact seeding#1082

Merged
xbrianh merged 23 commits into
mainfrom
issue-1053-input-sources-iter3
Jun 3, 2026
Merged

Add input sources mechanism for pipeline artifact seeding#1082
xbrianh merged 23 commits into
mainfrom
issue-1053-input-sources-iter3

Conversation

@xbrianh
Copy link
Copy Markdown
Owner

@xbrianh xbrianh commented Jun 2, 2026

Summary

Introduces an input sources mechanism that allows pipelines to declare external artifact sources in their YAML configuration. This enables systematic handling of external inputs (plan files, issue references, instructions) rather than hardcoding registry pre-seeding logic.

Changes

  • InputSource and InputSources classes (gremlins/pipeline/inputs.py): Define and parse input source declarations from pipeline YAML, supporting flexible type systems (filepath, string) and optional/required markers.
  • Pipeline integration: Pipeline now stores and parses input_sources from the inputs: block in YAML.
  • Registry pre-seeding (launcher.py): New _seed_registry_from_sources function drives artifact registration based on declared sources, with legacy fallback for pipelines without input_sources.
  • gh-terse.yaml updated: Declares issue, plan, and instructions sources to replace hardcoded registry initialization.
  • Comprehensive tests (test_input_sources.py, test_launcher_input_sources.py): Cover parsing, validation, optional/required handling, union types, legacy behavior, and error cases.

Design

The mechanism separates artifact declaration (pipeline YAML) from artifact loading (launcher), allowing:

  • Type-safe source definitions (string vs file paths)
  • Optional vs required source handling
  • Validation at parse time
  • Clear pipeline intent for external dependencies

Closes #1053

xbrianh and others added 7 commits June 2, 2026 11:53
- Define InputSource and InputSources classes to declare external artifact sources
- Parse sources: block from inputs: section in pipeline YAML
- Update Pipeline to store input_sources
- Register instructions artifact in artifact directory
- Add _seed_registry_from_sources function to drive pre-seeding from sources
- Update launcher to use sources for registry pre-seeding with fallback to legacy hardcoding
- Update gh-terse.yaml to declare issue, plan, and instructions sources
- Test legacy fallback when no input_sources present
- Test string sources for issue, plan, and instructions
- Test optional vs required source handling
- Test union types (filepath + string)
- Test error cases for missing required sources
- Resolve values based on source key (plan, issue, instructions)
- Avoid mixing plan argument with issue/plan source semantics
- Issue only uses plan arg if it's not a file (i.e., issue reference)
- Improves testability and semantic correctness
Removed unused pathlib import and backward-compat legacy path. Trimmed oversized docstrings to one-line max. Moved empty-types validation to top of __post_init__. Deleted dead methods all_sources() and required_sources(). Eliminated what-narration comments throughout _seed_registry_from_sources. Changed exception handling to only catch file-not-found errors, letting validation errors propagate. Added validation in InputSources.from_yaml to restrict allowed source keys at parse time. Threaded loaded_pipeline through _Inputs to eliminate duplicate pipeline loads.

Addresses code review and complexity review findings.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@xbrianh xbrianh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the design is clean — decoupling source declarations from launcher logic is the right move. Four issues worth addressing: a weak type annotation on _Inputs, a re-export in pipeline/__init__.py that violates the project style, and two redundant file writes introduced alongside the new seeding logic.

Comment thread gremlins/launcher.py Outdated
Comment thread gremlins/pipeline/__init__.py Outdated
Comment thread gremlins/launcher.py Outdated
Comment thread gremlins/launcher.py Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a first-class “input sources” declaration under inputs: in pipeline YAML so the launcher can pre-seed the artifact registry from externally-provided inputs (e.g., plan/issue/instructions) in a structured, validated way, instead of relying on ad-hoc hardcoded seeding.

Changes:

  • Introduces InputSource / InputSources parsing and attaches parsed inputs.sources onto Pipeline.
  • Updates the launcher to seed the artifact registry based on declared input_sources, with legacy fallback behavior.
  • Updates gh-terse.yaml to declare sources and adds focused test coverage for parsing + launcher seeding.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
gremlins/pipeline/inputs.py New schema/validation and YAML parsing for inputs.sources.
gremlins/pipeline/__init__.py Parses inputs.sources during pipeline load and stores it on Pipeline.
gremlins/launcher.py Adds registry seeding logic driven by declared input sources and adjusts state dir preparation.
gremlins/pipelines/gh-terse.yaml Declares issue/plan/instructions sources in pipeline YAML.
tests/test_input_sources.py Unit tests for parsing/validation and pipeline integration.
tests/test_launcher_input_sources.py Tests launcher registry seeding behavior (including legacy fallback and union types).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread gremlins/launcher.py Outdated
Comment thread gremlins/pipelines/gh-terse.yaml
Comment thread gremlins/pipeline/inputs.py Outdated
Comment thread gremlins/pipeline/inputs.py Outdated
xbrianh added 3 commits June 2, 2026 12:17
- Fix type annotation on loaded_pipeline to _PipelineData | None (not Any)
- Move InputSources import under TYPE_CHECKING to avoid re-export
- Remove redundant instructions.txt write in _prepare_state_dir
- Remove redundant plan-arg.txt write in legacy fallback path
- Always seed plan_arg when --plan is provided, even with input_sources
- Make issue source optional in gh-terse.yaml for file-backed plans
- Validate optional field is actually boolean in InputSources.from_yaml
- Allow arbitrary source keys in InputSources.from_yaml for extensibility
Comment thread gremlins/pipelines/gh-terse.yaml Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

gremlins/run_child.py:124

  • _build_state() no longer passes an instructions value into build_state(), but the file’s top-level docstring still lists an "instructions" field in the spec JSON schema. That’s now inaccurate and can confuse callers producing spec files.

Update the documented spec schema to match the actual fields consumed.

    return build_state(
        data=data,
        client=client,
        artifact_dir=artifact_dir,
        pipeline_data=pipeline_data,
        repo=str(spec.get("repo") or ""),
        child_key=spec.get("child_key") or None,
        parent_stage=str(spec.get("parent_stage") or ""),
        worktree=worktree,
        worktree_parent=worktree_parent,
    )

Comment on lines 4 to 8
- name: resolve-plan-input
type: exec
in:
plan_arg: "plan_arg?"
plan: "plan?"
options:
Comment on lines 4 to 8
- name: resolve-plan-input
type: exec
in:
plan_arg: "plan_arg?"
plan: "plan?"
options:
Comment thread gremlins/launcher.py
Comment on lines +373 to +395
def _seed_registry_from_sources(
registry: ArtifactRegistry,
input_values: dict[str, str],
sources: dict[str, Any],
artifacts_dir: pathlib.Path,
) -> None:
for key, source in sources.items():
value = input_values.get(key) or None
if not value:
if not source.optional:
raise ValueError(
f"required input source {key!r} (type: {source.types}) is not available"
)
continue
for t in source.types:
if t == "filepath" and os.path.isfile(value):
registry.bind(key, Uri.parse(f"file://{value}"))
break
elif t == "string":
dest = artifacts_dir / f"{key}.txt"
dest.write_text(value, encoding="utf-8")
registry.bind(key, Uri.parse(f"file://session/{key}.txt"))
break
Comment on lines 8 to +17
in:
INSTRUCTIONS: instructions?
PLAN: plan?
PR: pr?
sources:
plan:
type: [filepath, string]
optional: true
instructions:
type: string
optional: true
Comment thread .gremlins/gh-terse.yaml
Comment on lines 7 to +16
in:
INSTRUCTIONS: instructions?
PLAN: plan?
PR: pr?
sources:
plan:
type: [filepath, string]
optional: true
instructions:
type: string
optional: true
Comment thread gremlins/launcher.py
Comment on lines +387 to +390
for t in source.types:
if t == "filepath" and os.path.isfile(value):
registry.bind(key, Uri.parse(f"file://{value}"))
break
Comment thread gremlins/spawn/child.py
Comment on lines 125 to 136
return build_state(
data=data,
client=client,
artifact_dir=artifact_dir,
pipeline_data=pipeline_data,
child_key=spec.get("child_key") or None,
parent_stage=str(spec.get("parent_stage") or ""),
worktree=worktree,
worktree_parent=worktree_parent,
repo=str(spec.get("repo") or ""),
instructions=str(spec.get("instructions") or ""),
base_ref=str(spec.get("base_ref") or ""),
)
xbrianh and others added 3 commits June 3, 2026 09:46
…seeding

- plan-document is the resolved markdown artifact; plan remains the raw
  input source binding (--plan CLI arg), eliminating the DuplicateArtifact
  collision that prevented recipes from rebinding the resolved content
- filepath input sources now copy to session artifacts instead of binding
  an external file:// URI, so resolve-plan-input's early-exit check fires
  correctly and the plan-document binding succeeds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ant cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@xbrianh xbrianh merged commit 31e506e into main Jun 3, 2026
1 check passed
@xbrianh xbrianh deleted the issue-1053-input-sources-iter3 branch June 3, 2026 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

inputs: sources — make inputs: the sole owner of external artifact seeding

2 participants