feat: 16 - manifest#18

Merged
joefrost01 merged 1 commit into main from feat/16-manifest
Apr 11, 2026

@joefrost01

What problem are you trying to solve?

There was no manifest sidecar output for query runs, which made orchestration/audit workflows harder because run metadata (inputs, file outcomes, timing, output rows/fingerprint, and command settings) was not persisted in a machine-readable artifact.

What does this PR change?

This adds ManifestWriter (src/manifest.rs) and integrates manifest emission as the final query step when --manifest is provided. The pipeline now captures run/file/output metadata, writes YAML manifests on success and on runtime failures (including --expect-at-least failures), creates parent directories as needed, and warns (without failing the run) if manifest writing fails.
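The parent-directory behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the code in src/manifest.rs: the function name `write_manifest` and its signature are hypothetical, and it takes already-serialized YAML text so that no serialization crate is needed.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the manifest writer: accepts pre-serialized
/// YAML so the sketch has no dependency on a serialization crate.
fn write_manifest(path: &Path, yaml: &str) -> io::Result<()> {
    // Create any missing parent directories first. A bare filename has an
    // empty parent, in which case there is nothing to create.
    if let Some(parent) = path.parent() {
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?;
        }
    }
    fs::write(path, yaml)
}

fn main() -> io::Result<()> {
    // The nested directories need not exist before the call.
    let target = std::env::temp_dir().join("manifest_sketch/nested/run.yaml");
    write_manifest(&target, "files:\n  total: 0\n")?;
    println!("wrote {}", target.display());
    Ok(())
}
```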

Does this change align with DESIGN.md?

Yes. Query execution order remains intact; manifest writing is supplementary and occurs after execution completes/fails. Core stdout/stderr behavior and exit-code semantics are unchanged.

What alternatives did you consider?

I considered writing manifest data inline directly from query_pipeline without a dedicated module, but that scattered serialization concerns and made testing harder. A dedicated manifest module keeps schema/serialization concerns isolated while pipeline code focuses on data collection.

Does this PR contain multiple unrelated changes?

No. All edits are directly related to feature 16 manifest support and required metadata plumbing.

Existing PRs

  • I have reviewed all open AND closed PRs for duplicates or prior art
  • Related PRs: none found

Testing

  • cargo test passes
  • cargo clippy passes with no warnings
  • cargo fmt has been run
  • New tests added:
    • pipeline_writes_manifest_on_success
    • pipeline_writes_manifest_on_expect_at_least_failure

Evaluation

  • What was the specific scenario you tested?
    • Successful query with --manifest writes manifest YAML with batch/command/files/output/timing fields.
    • Query failing --expect-at-least still writes manifest containing failure metadata.
  • What was the output before and after the change?
    • Before: no manifest sidecar was emitted.
    • After: manifest YAML is written to --manifest path (creating parent directories), with warnings-only behavior on manifest write errors.
  • Did you test error cases (bad input, missing files, invalid SQL)?
    • Yes. Existing suite still covers those paths, and new tests add explicit failure-case manifest coverage.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

@joefrost01 left a comment

Review: Feature 16 — Manifest Sidecar

Reviewed against specs/16-manifest.md, CLAUDE.md, and DESIGN.md.

Spec Compliance

Requirement | Status
YAML structure matches spec | ✓ All fields present, correct nesting
batch_id / batch_hash from lineage manager |
dtoo_version from env!("CARGO_PKG_VERSION") |
command.* from CLI args |
files.* from pipeline result (total/processed/skipped/details) |
output.rows / output.fingerprint | ✓ (None when --fingerprint not used)
timing.* from pipeline start/end |
Written as final pipeline step (step 17) |
Write-on-failure (--expect-at-least etc.) | ✓ (closure pattern ensures manifest written after errors)
Parent directory creation | ✓ (create_dir_all in ManifestWriter::write)
Warning-only on manifest write errors | ✓ (eprintln! warning, does not affect exit code)
batch_id/batch_hash without --lineage | ✓ (LineageManager::new(None, ...) still generates both)

Architecture

Clean separation: manifest.rs owns struct definitions and serialization; query_pipeline.rs collects metadata and calls write_manifest_if_requested after the closure returns. The error: Option<String> field (with skip_serializing_if) is a sensible extension for the failure-recording use case.
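For readers unfamiliar with the closure pattern mentioned here, the following is a rough sketch of the idea, not the actual query_pipeline.rs code. The names `RunSummary` and `run_pipeline` are hypothetical, and pushing onto a `Vec` stands in for the real manifest write. The point is that an early `return Err(...)` leaves only the closure, so the code after it always runs.

```rust
#[derive(Debug, Default)]
struct RunSummary {
    rows: u64,
    // Mirrors the idea of an optional top-level error field.
    error: Option<String>,
}

fn run_pipeline(
    expect_at_least: u64,
    rows_found: u64,
    manifests: &mut Vec<RunSummary>,
) -> Result<u64, String> {
    let mut summary = RunSummary::default();

    // All fallible steps live inside the closure, so an early return
    // exits the closure rather than the enclosing function.
    let result = (|| -> Result<u64, String> {
        summary.rows = rows_found;
        if rows_found < expect_at_least {
            return Err(format!(
                "expected at least {expect_at_least} rows, got {rows_found}"
            ));
        }
        Ok(rows_found)
    })();

    // Reached on success and on failure alike.
    if let Err(e) = &result {
        summary.error = Some(e.clone());
    }
    manifests.push(summary); // stand-in for writing the manifest
    result
}

fn main() {
    let mut manifests = Vec::new();
    let _ = run_pipeline(10, 3, &mut manifests);
    println!("{:?}", manifests);
}
```

The original result is returned unchanged, so recording the manifest cannot alter what the caller sees.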

Minor suggestions (non-blocking)

  1. batch_hash early-failure fallback (query_pipeline.rs:126): The pre-closure fallback uses Uuid::new_v4() for batch_hash. This only matters if the pipeline fails before LineageManager::new (e.g., DuckDB init failure). A deterministic hash would be more semantically correct — could compute it from args upfront — but this is a narrow edge case.

  2. Doc comments on public API (manifest.rs): Per CLAUDE.md, public structs/functions should have doc comments. The lineage.rs additions have them; manifest.rs items do not.

  3. Duplicate format helpers: manifest::output_format_label duplicates query_pipeline::output_format_to_str. Could reuse the existing one.
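To illustrate suggestion 1, here is one way a deterministic fallback could be computed from the arguments upfront. This is only a sketch under assumptions: `fallback_batch_hash` is a hypothetical name, and FNV-1a is swapped in purely because it is simple and gives the same value on every run; it is not the hash the lineage manager uses.

```rust
/// Hypothetical deterministic fallback: a 64-bit FNV-1a hash over the
/// arguments, rendered as 16 lowercase hex digits.
fn fallback_batch_hash(args: &[&str]) -> String {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    let mut hash = OFFSET_BASIS;
    for arg in args {
        // A zero byte after each argument keeps ["ab", "c"] and
        // ["a", "bc"] from hashing to the same value.
        for byte in arg.bytes().chain(std::iter::once(0u8)) {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(PRIME);
        }
    }
    format!("{hash:016x}")
}

fn main() {
    let args = ["query", "--manifest", "out/run.yaml"];
    // The same arguments yield the same string on every call.
    println!("{}", fallback_batch_hash(&args));
}
```

Unlike Uuid::new_v4(), two early failures with identical arguments would then report the same batch_hash.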

Tests

Two tests cover the key paths:

  • pipeline_writes_manifest_on_success — verifies manifest written with expected fields
  • pipeline_writes_manifest_on_expect_at_least_failure — verifies manifest written with error info on failure

LGTM. Ready to merge once CI passes.

@joefrost01

Manifest Review — 3 Focus Areas

1. Schema Completeness ✅

The implementation covers all fields from the DESIGN.md example schema and adds reasonable extras (dtoo_version, top-level error, output.format, expanded command fields). These are sensible additions for auditability. No fields from the spec are missing.

2. Write-on-Failure Behavior — Bug 🐛

The write_manifest_if_requested call at line 409 fires after the inner closure regardless of success/failure — good design. The --expect-at-least failure path and PartialFailure paths work correctly because they occur after lines 274-275 set summary.files_processed and summary.files_skipped.

However, the --on-error fail file-error path (lines 253-254) returns Err(err) from the closure before reaching lines 274-275. This means the manifest is written with:

files:
  total: 5          # correct (set at line 140)
  processed: 0      # wrong — defaults, never updated
  skipped: 0        # wrong — defaults, never updated
  details:          # has real entries pushed at lines 238-243 before the early return
    - path: "file1.csv"
      rows_matched: 100
      status: ok

The processed/skipped summary counts are inconsistent with the details array. Fix: either update summary.files_processed/summary.files_skipped incrementally inside the loop (not after it), or set them from file_details.len() in write_manifest_if_requested.

This path also lacks test coverage — only --expect-at-least failure has a manifest test. A test with --on-error fail + a corrupt file + --manifest would catch this.
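The incremental-count fix suggested above could look roughly like the sketch below. It is not the real pipeline loop: `FileSummary`, `OnError`, and `process_files` are hypothetical names, each file is modeled as a `(path, readable)` pair, and the choice not to count the failing file as skipped is an assumption.

```rust
#[derive(Debug, Default, PartialEq)]
struct FileSummary {
    total: usize,
    processed: usize,
    skipped: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum OnError {
    Skip,
    Fail,
}

/// Each file is modeled as (path, readable) to keep the sketch small.
fn process_files(
    files: &[(&str, bool)],
    on_error: OnError,
    summary: &mut FileSummary,
) -> Result<(), String> {
    summary.total = files.len();
    for (path, readable) in files {
        if *readable {
            // Updated inside the loop, so an early return further down
            // still leaves an accurate count behind.
            summary.processed += 1;
        } else {
            match on_error {
                OnError::Skip => summary.skipped += 1,
                // Assumption: the failing file itself is not counted.
                OnError::Fail => return Err(format!("failed to read {path}")),
            }
        }
    }
    Ok(())
}

fn main() {
    let files = [("a.csv", true), ("b.csv", true), ("bad.csv", false), ("d.csv", true)];
    let mut summary = FileSummary::default();
    let result = process_files(&files, OnError::Fail, &mut summary);
    println!("{result:?} {summary:?}");
}
```

A regression test for the missing coverage would assert exactly this: after the early failure, processed reflects the files handled before the error rather than the default of 0.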

3. Warning-Only Manifest Write Failures ✅

Lines 729-734 catch ManifestWriter::write errors, print to stderr, and do not propagate — correctly preserving the pipeline's original exit code. Clean implementation.
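As a rough illustration of the warning-only behavior, the sketch below reduces a failed manifest write to a stderr warning and derives the exit code from the pipeline result alone. The signatures are hypothetical (the real write_manifest_if_requested may take different arguments), and the 0/1 exit codes are assumed for the example.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical signature: returns whether a manifest was written and
/// never returns an error to the caller.
fn write_manifest_if_requested(path: Option<&Path>, yaml: &str) -> bool {
    let Some(path) = path else { return false };
    match fs::write(path, yaml) {
        Ok(()) => true,
        Err(err) => {
            // Warn on stderr only; the failure is deliberately not propagated.
            eprintln!("warning: could not write manifest {}: {err}", path.display());
            false
        }
    }
}

/// The exit code depends on the pipeline result alone (0/1 are assumed values).
fn exit_code(pipeline_result: &Result<(), String>, manifest_path: Option<&Path>) -> i32 {
    let _written = write_manifest_if_requested(manifest_path, "files:\n  total: 0\n");
    if pipeline_result.is_ok() { 0 } else { 1 }
}

fn main() {
    // A directory is not a writable file path, so the manifest write fails.
    let bad_path = std::env::temp_dir();
    let code = exit_code(&Ok(()), Some(bad_path.as_path()));
    println!("exit code: {code}");
}
```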

Minor

  • ManifestWriter is a unit struct with a single static method — could be a free function (like build_command already is). Not blocking, just a consistency note.

TL;DR: One real bug — files.processed/files.skipped are wrong in the manifest when --on-error fail triggers a file-level error. The fix is straightforward (update summary counts incrementally). Everything else looks solid.

@joefrost01 merged commit b78c9bf into main on Apr 11, 2026
6 checks passed
@joefrost01 deleted the feat/16-manifest branch on April 11, 2026 20:30