feat: 16 - manifest#18
Conversation
joefrost01
left a comment
There was a problem hiding this comment.
Review: Feature 16 — Manifest Sidecar
Reviewed against specs/16-manifest.md, CLAUDE.md, and DESIGN.md.
Spec Compliance
| Requirement | Status |
|---|---|
| YAML structure matches spec | ✓ All fields present, correct nesting |
batch_id / batch_hash from lineage manager |
✓ |
dtoo_version from env!("CARGO_PKG_VERSION") |
✓ |
command.* from CLI args |
✓ |
files.* from pipeline result (total/processed/skipped/details) |
✓ |
output.rows / output.fingerprint |
✓ (None when --fingerprint not used) |
timing.* from pipeline start/end |
✓ |
| Written as final pipeline step (step 17) | ✓ |
Write-on-failure (--expect-at-least etc.) |
✓ — closure pattern ensures manifest written after errors |
| Parent directory creation | ✓ — create_dir_all in ManifestWriter::write |
| Warning-only on manifest write errors | ✓ — eprintln! warning, does not affect exit code |
batch_id/batch_hash without --lineage |
✓ — LineageManager::new(None, ...) still generates both |
Architecture
Clean separation: manifest.rs owns struct definitions and serialization; query_pipeline.rs collects metadata and calls write_manifest_if_requested after the closure returns. The error: Option<String> field (with skip_serializing_if) is a sensible extension for the failure-recording use case.
Minor suggestions (non-blocking)
-
batch_hashearly-failure fallback (query_pipeline.rs:126): The pre-closure fallback usesUuid::new_v4()forbatch_hash. This only matters if the pipeline fails beforeLineageManager::new(e.g., DuckDB init failure). A deterministic hash would be more semantically correct — could compute it fromargsupfront — but this is a narrow edge case. -
Doc comments on public API (
manifest.rs): Per CLAUDE.md, public structs/functions should have doc comments. Thelineage.rsadditions have them;manifest.rsitems do not. -
Duplicate format helpers:
manifest::output_format_labelduplicatesquery_pipeline::output_format_to_str. Could reuse the existing one.
Tests
Two tests cover the key paths:
pipeline_writes_manifest_on_success— verifies manifest written with expected fieldspipeline_writes_manifest_on_expect_at_least_failure— verifies manifest written with error info on failure
LGTM. Ready to merge once CI passes.
Manifest Review — 3 Focus Areas1. Schema Completeness ✅The implementation covers all fields from the DESIGN.md example schema and adds reasonable extras ( 2. Write-on-Failure Behavior — Bug 🐛The However, the files:
total: 5 # correct (set at line 140)
processed: 0 # wrong — defaults, never updated
skipped: 0 # wrong — defaults, never updated
details: # has real entries pushed at lines 238-243 before the early return
- path: "file1.csv"
rows_matched: 100
status: okThe This path also lacks test coverage — only 3. Warning-Only Manifest Write Failures ✅Lines 729-734 catch Minor
TL;DR: One real bug — |
What problem are you trying to solve?
There was no manifest sidecar output for query runs, which made orchestration/audit workflows harder because run metadata (inputs, file outcomes, timing, output rows/fingerprint, and command settings) was not persisted in a machine-readable artifact.
What does this PR change?
This adds
ManifestWriter(src/manifest.rs) and integrates manifest emission as the final query step when--manifestis provided. The pipeline now captures run/file/output metadata, writes YAML manifests on success and on runtime failures (including--expect-at-leastfailures), creates parent directories as needed, and warns (without failing the run) if manifest writing fails.Does this change align with DESIGN.md?
Yes. Query execution order remains intact; manifest writing is supplementary and occurs after execution completes/fails. Core stdout/stderr behavior and exit-code semantics are unchanged.
What alternatives did you consider?
I considered writing manifest data inline directly from
query_pipelinewithout a dedicated module, but that scattered serialization concerns and made testing harder. A dedicated manifest module keeps schema/serialization concerns isolated while pipeline code focuses on data collection.Does this PR contain multiple unrelated changes?
No. All edits are directly related to feature 16 manifest support and required metadata plumbing.
Existing PRs
Testing
cargo testpassescargo clippypasses with no warningscargo fmthas been runpipeline_writes_manifest_on_successpipeline_writes_manifest_on_expect_at_least_failureEvaluation
--manifestwrites manifest YAML with batch/command/files/output/timing fields.--expect-at-leaststill writes manifest containing failure metadata.--manifestpath (creating parent directories), with warnings-only behavior on manifest write errors.Human review