CI: add smoke suite and gate unit/integration workflows by shaypal5 · Pull Request #5 · DataHackIL/tfht_enforce_idx

shaypal5 · 2026-03-03T14:29:59Z

Summary

- add a dedicated tests/smoke suite with fast checks for CLI/config/seen-store
- update smoke-tests workflow to run the smoke suite directly
- enforce smoke as a strict prerequisite for unit-tests and integration-tests
- keep pre-commit, lint, coverage, unit, and integration as separate workflows

Validation

ran locally: PYTHONPATH=src pytest -q tests/smoke (3 passed)

Notes

unit and integration workflows now trigger only from successful smoke-tests runs on the same head SHA/branch
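
The gating rule in the note above can be modelled as a small predicate. This is only a sketch of the condition, not the workflow definition from this PR: the keys `workflow_run.name`, `workflow_run.conclusion`, and `workflow_run.head_sha` mirror the GitHub `workflow_run` event payload, while the function name and the workflow name `smoke-tests` as a constant are assumptions made for illustration.

```python
from typing import Any, Mapping

# Assumed name of the gating workflow; adjust to the real workflow's `name:`.
SMOKE_WORKFLOW_NAME = "smoke-tests"


def should_run_gated_tests(event: Mapping[str, Any], expected_sha: str) -> bool:
    """Return True only if the triggering smoke run succeeded for expected_sha."""
    run = event.get("workflow_run") or {}
    return (
        run.get("name") == SMOKE_WORKFLOW_NAME
        and run.get("conclusion") == "success"
        and run.get("head_sha") == expected_sha
    )


# Illustrative payloads (hypothetical values, trimmed to the fields used above).
ok_event = {
    "workflow_run": {"name": "smoke-tests", "conclusion": "success", "head_sha": "abc123"}
}
failed_event = {
    "workflow_run": {"name": "smoke-tests", "conclusion": "failure", "head_sha": "abc123"}
}

print(should_run_gated_tests(ok_event, "abc123"))
print(should_run_gated_tests(failed_event, "abc123"))
```

A failed or mismatched smoke run makes the predicate false, which corresponds to the unit and integration jobs being skipped.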

Copilot

Pull request overview

Adds a lightweight smoke test suite and restructures CI so smoke tests run first and (attempt to) gate heavier test workflows behind a successful smoke run.

Changes:

- Introduces tests/smoke with fast checks covering CLI load/version, config defaults, and seen-store persistence.
- Adds/updates GitHub Actions workflows for smoke, unit, integration, lint, pre-commit, and coverage.
- Updates dev dependencies to include pytest-cov and pre-commit.
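
For readers unfamiliar with the pattern, a seen-store persistence smoke check might look roughly like the sketch below. The project's real `SeenStore` is not shown on this page, so the class here is a hypothetical JSON-file stand-in defined inline, and its method names (`mark_seen`, `is_seen`, `save`) are assumptions; only the idea of a fast write-then-reload check comes from the PR description.

```python
import json
from pathlib import Path


class SeenStore:
    """Hypothetical stand-in: persists a set of seen IDs to a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._seen: set[str] = set()
        # Reload previously saved IDs if the file already exists.
        if path.exists():
            self._seen = set(json.loads(path.read_text(encoding="utf-8")))

    def mark_seen(self, item_id: str) -> None:
        self._seen.add(item_id)

    def is_seen(self, item_id: str) -> bool:
        return item_id in self._seen

    def save(self) -> None:
        self.path.write_text(json.dumps(sorted(self._seen)), encoding="utf-8")


def test_seen_store_persists_across_instances(tmp_path: Path) -> None:
    """Smoke check: an ID saved by one instance is visible to the next."""
    store_path = tmp_path / "seen.json"
    first = SeenStore(store_path)
    first.mark_seen("article-1")
    first.save()

    second = SeenStore(store_path)
    assert second.is_seen("article-1")
    assert not second.is_seen("article-2")
```

Under pytest, `tmp_path` is the built-in temporary-directory fixture, so a check like this touches no shared state and stays fast enough for a smoke suite.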

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tests/smoke/test_smoke_suite.py` | New fast smoke tests for CLI/config/SeenStore. |
| `pyproject.toml` | Adds dev tools needed by CI (pytest-cov, pre-commit). |
| `.pre-commit-config.yaml` | Defines pre-commit hooks (basic hygiene + ruff/format). |
| `.pre-commit-ci.yaml` | Configures pre-commit.ci service behavior (autoupdate schedule, no autofix PRs). |
| `.github/workflows/smoke-tests.yml` | Runs the smoke suite on PRs and main pushes. |
| `.github/workflows/unit-tests.yml` | Runs unit tests on `workflow_run` after smoke completion. |
| `.github/workflows/integration-tests.yml` | Runs integration tests on `workflow_run` after smoke completion. |
| `.github/workflows/lint.yml` | Adds ruff format/lint + mypy workflow. |
| `.github/workflows/pre-commit-ci.yml` | Runs pre-commit hooks in GitHub Actions. |
| `.github/workflows/codecoverage.yml` | Runs unit+integration with coverage and uploads XML artifact. |

* feat(prefilter): LPF-PR-01: prefilter package foundation, models, config, telemetry, no-op cascade

  Introduces the `denbust.prefilter` package (10 modules, 54 unit tests, 0 ruff/mypy errors):

  - `models.py`: `CandidateView` runtime-checkable Protocol, `StageScore` and `PrefilterDecision` frozen dataclasses with Literal-typed `StageName`, `PassKind`, `Verdict`, and `StoppedAt` fields.
  - `config.py`: `PrefilterMode(StrEnum)` (off/shadow/enforce), per-stage configs (`StageAConfig`–`StageDConfig`), `PrefilterStagesConfig`, `PrefilterRefreshConfig`, and `PrefilterConfig` root with `~`-expansion model_validator.
  - `state_paths.py`: `PrefilterStatePaths` pydantic model + `resolve_prefilter_state_paths()` anchoring artefacts under `<state_root>/<dataset>/<job>/prefilter/`.
  - `telemetry.py`: `PrefilterDecisionWriter` appending decisions to date-sharded `<decisions_dir>/YYYY-MM-DD.jsonl` files.
  - `cascade.py`: `CascadeOrchestrator` with `evaluate_thin()` / `evaluate_thick()`: always returns `verdict="pass"` stub; records every decision via the writer.
  - `stage_a.py`–`stage_d.py`: stub `evaluate()` methods returning `None` so the cascade always passes through; full implementations land in LPF-PR-03 through LPF-PR-07.
  - `cli.py`: `denbust prefilter summary` Typer command stub.
  - `__init__.py`: re-exports `CascadeOrchestrator`, `PrefilterConfig`, `PrefilterMode`, `PrefilterDecision`, `StageScore`.
  - `src/denbust/config.py`: adds `prefilter: PrefilterConfig` field to the root `Config`.
  - `src/denbust/cli.py`: registers `prefilter_app` under `denbust prefilter`.
  - `README.md`: retitles the cascade section to reflect active implementation.

  Cascade ships disabled (`mode: off`); no pipeline insertion in this PR. 54 unit tests covering protocol conformance, config validation, YAML round-trips, state-path resolution, JSONL telemetry, and cascade no-op behaviour.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(plan): mark LPF-PR-01 done, update last-merged-PR reference (#158)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(prefilter): address code-review issues from PR #158 self-review

  Fixes all 16 issues raised in the post-merge review:

  Critical:
  - [#1] Orchestrator now checks config.enabled / config.mode at the top of evaluate_thin/evaluate_thick: mode=OFF or enabled=False returns a _noop_pass immediately without running any stage or writing telemetry; SHADOW mode downgrades drop→pass in _conclude while preserving stopped_at_stage for recall analysis; ENFORCE respects drops.
  - [#2] Stage objects are only instantiated when config.stages.X.enabled is True; disabled stages are stored as None, preventing model-load cost for stages like C (embedding) and D (SLM) that aren't in use.
  - [#3] Added @runtime_checkable StageEvaluator Protocol in models.py with uniform evaluate(candidate, pass_kind, body=None) signature; all four stage stubs (A–D) updated to that signature so the orchestrator calls them uniformly.
  - [#4] Removed duplicate ThinOrThick alias from cascade.py; PassKind from models.py is the single source of truth.

  Major:
  - [#5] StoppedAt = StageName | Literal["passed_all"]; no longer a copy-paste of the four stage letters.
  - [#6] PrefilterDecision.decided_at changed from str to datetime; telemetry writer converts to .isoformat() at the serialisation boundary; _path_for uses .strftime() directly on the datetime.
  - [#7] StageScore.__post_init__ validates p_negative and threshold are both in [0.0, 1.0], raising ValueError for out-of-range values.
  - [#8] Stage A re-run in evaluate_thick documented with an explicit Note in the docstring; thin-result passthrough deferred to a later PR.
  - [#9] Test fixtures now use typed aliases (StageName, Verdict, StoppedAt, PassKind); all type: ignore[arg-type] comments removed from helpers.

  Minor:
  - [#10] flush() removed from PrefilterDecisionWriter.
  - [#11] _path_for no longer has a try/except; datetime param makes it unnecessary.
  - [#12] "short" removed from _hash_config docstring.
  - [#13] test_frozen uses pytest.raises(FrozenInstanceError) instead of try/except/else antipattern.
  - [#14] PrefilterStatePaths converted from pydantic BaseModel to @dataclasses.dataclass(frozen=True), consistent with StageScore / PrefilterDecision.
  - [#15] __init__.py now exports CandidateView, StageEvaluator, PrefilterStatePaths, resolve_prefilter_state_paths, PassKind, Verdict.
  - [#16] cli.py summary command no longer hardcodes agents/news/local.yaml; prints an actionable error and exits 1 when --config is not supplied.

  Tests: 54 → 78 (+24), all passing. New coverage: StageScore bounds validation (5 tests), StageEvaluator protocol conformance for all four stages (5 tests), type-alias smoke checks (4 tests), OFF-mode no-telemetry (2 tests), disabled-flag suppression (1), shadow/enforce telemetry (2), shadow downgrade with monkeypatched stage (1), enforce drop with monkeypatched stage (1), disabled-stages-not-instantiated (1), decided_at-is-datetime (2).

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
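
Item [#7] above describes bounds validation in `StageScore.__post_init__`. A minimal sketch of that idea follows, assuming a frozen dataclass whose only fields are `stage`, `p_negative`, and `threshold`; the `stage` field and the error wording are assumptions, and the real class in `models.py` may carry more fields.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class StageScore:
    stage: str          # assumed field, e.g. "A" through "D"
    p_negative: float   # probability-like score, expected in [0.0, 1.0]
    threshold: float    # decision threshold, expected in [0.0, 1.0]

    def __post_init__(self) -> None:
        # Reject out-of-range values at construction time.
        for name in ("p_negative", "threshold"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value!r}")


score = StageScore(stage="A", p_negative=0.2, threshold=0.5)

try:
    StageScore(stage="A", p_negative=1.5, threshold=0.5)
except ValueError as exc:
    print(f"rejected: {exc}")

try:
    score.threshold = 0.9  # frozen dataclasses refuse attribute assignment
except FrozenInstanceError:
    print("StageScore is immutable")
```

Validating in `__post_init__` keeps the check next to the data, so every construction path, including test fixtures, goes through it.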

shaypal5 added 2 commits March 3, 2026 16:11

Add separate CI workflows with smoke-gated tests

30c0c37

ci: add smoke suite and gate unit/integration workflows

df9e081

shaypal5 requested a review from Copilot March 3, 2026 14:54

Copilot started reviewing on behalf of shaypal5 March 3, 2026 14:55

ci: fix lint import order and ignore fixture html whitespace

74efd0b

Copilot AI reviewed Mar 3, 2026

Comment thread .github/workflows/unit-tests.yml Outdated

Comment thread .github/workflows/integration-tests.yml Outdated

Comment thread .github/workflows/unit-tests.yml Outdated

Comment thread .github/workflows/integration-tests.yml Outdated

shaypal5 added 3 commits March 3, 2026 17:01

ci: skip eof fixer for fixture html files

eefbd5d

chore: remove trailing whitespace for pre-commit

9285503

test: make integration fixture date window stable

a74ad8b

shaypal5 added the configuration, enhancement, ci, and tests labels Mar 4, 2026

shaypal5 added 2 commits March 4, 2026 12:25

ci: harden workflow_run test jobs for concurrency and fork safety

457d128

Consolidate CI into single ci-test workflow

eacb5ce

shaypal5 merged commit aaf08b2 into main Mar 4, 2026
6 checks passed

shaypal5 deleted the codex/ci-workflows-smoke-lint-tests branch March 4, 2026 12:28

shaypal5 merged 8 commits into main from codex/ci-workflows-smoke-lint-tests