Skip to content

feat: 15 - verbose mode#17

Merged
joefrost01 merged 2 commits into
mainfrom
feat/15-verbose-mode
Apr 11, 2026
Merged

feat: 15 - verbose mode#17
joefrost01 merged 2 commits into
mainfrom
feat/15-verbose-mode

Conversation

@joefrost01
Copy link
Copy Markdown
Contributor

What problem are you trying to solve?

--verbose existed as a flag but did not emit structured progress logs across pipeline stages. Users could not observe stage timing, per-file outcomes, or row-count progression without adding ad-hoc prints.

What does this PR change?

This adds a timestamped verbose logger in the query pipeline and logs key execution milestones to stderr (start, file resolution, schema mode, ref loads, per-file success/skip, accumulation, post-sql, masking, lineage, limit, validation, count, output/profile/fingerprint, done). It also adds thousands-separator row formatting and tests for number formatting.

Does this change align with DESIGN.md?

Yes. Output data paths remain unchanged, and verbose logs stay on stderr so stdout output remains pipe-safe. Core execution order and semantics are preserved.

What alternatives did you consider?

I considered introducing a separate logging module and dependency-heavy progress UI first, but that added complexity before behavior was validated. A focused in-pipeline logger with clear timestamp formatting delivered the spec behavior with minimal risk.

Does this PR contain multiple unrelated changes?

No. All changes are scoped to verbose mode behavior for feature 15.

Existing PRs

  • I have reviewed all open AND closed PRs for duplicates or prior art
  • Related PRs: none found

Testing

  • cargo test passes
  • cargo clippy passes with no warnings
  • cargo fmt has been run
  • New tests added:
    • format_count_inserts_thousands_separators

Evaluation

  • What was the specific scenario you tested?
    • End-to-end query pipeline tests (including dry-run paths) with verbose logging enabled in code path
    • Row-count formatting behavior for 0, small, and thousands-sized values
  • What was the output before and after the change?
    • Before: minimal/no structured verbose progress output.
    • After: timestamped stderr progress logs across major pipeline stages with formatted counts.
  • Did you test error cases (bad input, missing files, invalid SQL)?
    • Yes, existing suite still covers those paths and now runs with the verbose instrumentation present.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

Copy link
Copy Markdown
Contributor Author

@joefrost01 joefrost01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of feat: 15 - verbose mode

Overall this is solid work — the VerboseLogger struct is clean, stderr-only output is correct, timestamp format matches spec, format_count is well-tested, and stdout data output is untouched. However, there's one issue that needs fixing before merge:


1. "Accumulation complete" is logged in the wrong position (must fix)

File: src/query_pipeline.rs:274-278

The "Accumulation complete" log appears after post-sql, masking, lineage, and limit have all been applied. Per the spec, it should appear immediately after the per-file processing loop and before post-sql.

The spec example makes this clear — the counts are different:

[00:12.4] Accumulation complete: 48,721 rows    ← pre-transformation total
[00:12.5] Applying post-sql...
[00:12.8] Post-sql complete: 1,204 rows          ← post-transformation total

Currently the code queries SELECT COUNT(*) FROM temp_results after limit/post-sql/etc. have already mutated the table, so "Accumulation complete" would report e.g. 1,204 instead of 48,721. This is semantically misleading.

Fix: Move the row_count query + "Accumulation complete" log to immediately after the file processing loop (after the processed == 0 check, before the post_sql block). Keep the existing later row_count query for validation/output where it is.


2. Hyphens instead of em-dashes (minor)

The spec consistently uses em-dashes () in log messages:

Loading ref table: regions (ref/regions.parquet) — 250 rows
[1/47] data/2024-01/sales.csv — 1,423 rows matched

The implementation uses regular hyphens (-). See the format! calls in the ref-table loop (~line 163) and per-file success/skip logging (~lines 209, 224).


3. Missing "Exclude" log point (minor, can defer)

The spec lists Excluded {n} files (only if excludes matched) but no such log exists. If the exclude count isn't readily available from resolve_files, this could be a follow-up.


Everything else checks out: pipe safety, dry-run early exit before logging, per-file [i/total] indexing, conditional logging for masking/lineage/limit/validation, and the "Done" summary. Nice format_count implementation with tests.

Not LGTM yet — issue #1 needs to be addressed. After that fix (and the optional em-dash cleanup), this is ready to merge pending CI.

Copy link
Copy Markdown
Contributor Author

@joefrost01 joefrost01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat/15-verbose-mode vs specs/15-verbose-mode.md

Overall this is solid work — the VerboseLogger struct, timestamp format, format_count, and pipeline integration are all clean. A few spec deviations need fixing before merge:

1. Em dash vs hyphen in log messages

The spec uses (em dash) as the separator in several log messages. The implementation uses - (hyphen) instead.

Spec:

[00:00.1] Loading ref table: regions (ref/regions.parquet) — 250 rows
[00:00.2] [1/47] data/2024-01/sales.csv — 1,423 rows matched
[00:00.4] [3/47] data/2024-02/sales.csv — SKIPPED (...)

Actual (in diff):

"Loading ref table: {} ({}) - {} rows"
"[{}/{}] {} - {} rows matched"
"[{}/{}] {} - SKIPPED ({})"

These should use (em dash) to match the spec format.

2. Missing "Exclude" log point

The spec requires:

Excluded {n} files (only if excludes matched)

This log point is absent from the implementation. FileResolver::resolve() currently consumes the exclude count internally, so surfacing it may require returning metadata from the resolver. If this is deferred, it should be tracked as a follow-up.

3. indicatif dependency

The spec lists indicatif = "0.17" under Dependencies. It isn't added in this PR. The spec notes it's for future progress bar use, so this may be intentional — but worth confirming whether it should be added now per the spec or deferred.


Everything else looks correct: timestamp format, all other log points present, thousands-separator formatting, pipe safety (stderr only), closure refactor to capture row counts, and the format_count test. Nice work on keeping the behavioral changes minimal.

Verdict: Not ready to merge. The em dash formatting is a clear spec deviation that should be fixed. The exclude log point and indicatif dependency should be addressed or explicitly deferred.

Copy link
Copy Markdown
Contributor Author

@joefrost01 joefrost01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: All prior findings resolved ✓

Verified the latest commit (74bf7c2) addresses all three issues from the prior review:

  1. Em-dash format — Log messages now use (em dash) matching the spec (query_pipeline.rs:167,218,231)
  2. Excluded files log"Excluded {n} files" is now emitted when excludes match (query_pipeline.rs:131)
  3. indicatif dependencyindicatif = "0.17" added to Cargo.toml

LGTM — ready to merge once CI passes.

@joefrost01 joefrost01 merged commit 5ed8782 into main Apr 11, 2026
6 checks passed
@joefrost01 joefrost01 deleted the feat/15-verbose-mode branch April 11, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant