CI: add smoke suite and gate unit/integration workflows #5

Merged
shaypal5 merged 8 commits into main from codex/ci-workflows-smoke-lint-tests on Mar 4, 2026
shaypal5 (Member) commented Mar 3, 2026

Summary

  • add a dedicated tests/smoke suite with fast checks for CLI/config/seen-store
  • update smoke-tests workflow to run the smoke suite directly
  • enforce smoke as a strict prerequisite for unit-tests and integration-tests
  • keep pre-commit, lint, coverage, unit, and integration as separate workflows

Validation

  • ran locally: PYTHONPATH=src pytest -q tests/smoke (3 passed)
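
For readers unfamiliar with the pattern, a smoke check of the seen-store kind might look roughly like the sketch below. The `SeenStore` class here is a hypothetical stand-in defined inline so the example runs on its own; it is not the project's class, and its methods (`add`, `save`, `__contains__`) and the JSON file layout are assumptions.

```python
import json
import tempfile
from pathlib import Path


class SeenStore:
    """Hypothetical stand-in for a persisted 'already seen' set (not the project's API)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load previously saved entries if the file exists, else start empty.
        self._seen = set(json.loads(path.read_text())) if path.exists() else set()

    def add(self, url: str) -> None:
        self._seen.add(url)

    def __contains__(self, url: str) -> bool:
        return url in self._seen

    def save(self) -> None:
        self.path.write_text(json.dumps(sorted(self._seen)))


def test_seen_store_round_trip() -> None:
    """Fast check: an entry written by one instance is visible to a fresh one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "seen.json"
        store = SeenStore(path)
        store.add("https://example.com/a")
        store.save()

        reloaded = SeenStore(path)
        assert "https://example.com/a" in reloaded
        assert "https://example.com/b" not in reloaded
```

A test like this touches only the filesystem and finishes in milliseconds, which is what makes it suitable as a gate in front of slower suites.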

Notes

  • unit and integration workflows now trigger only from successful smoke-tests runs on the same head SHA/branch

Copilot AI (Contributor) left a comment

Pull request overview

Adds a lightweight smoke test suite and restructures CI so smoke tests run first and (attempt to) gate heavier test workflows behind a successful smoke run.

Changes:

  • Introduces tests/smoke with fast checks covering CLI load/version, config defaults, and seen-store persistence.
  • Adds/updates GitHub Actions workflows for smoke, unit, integration, lint, pre-commit, and coverage.
  • Updates dev dependencies to include pytest-cov and pre-commit.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/smoke/test_smoke_suite.py | New fast smoke tests for CLI/config/SeenStore. |
| pyproject.toml | Adds dev tools needed by CI (pytest-cov, pre-commit). |
| .pre-commit-config.yaml | Defines pre-commit hooks (basic hygiene + ruff/format). |
| .pre-commit-ci.yaml | Configures pre-commit.ci service behavior (autoupdate schedule, no autofix PRs). |
| .github/workflows/smoke-tests.yml | Runs the smoke suite on PRs and main pushes. |
| .github/workflows/unit-tests.yml | Runs unit tests on workflow_run after smoke completion. |
| .github/workflows/integration-tests.yml | Runs integration tests on workflow_run after smoke completion. |
| .github/workflows/lint.yml | Adds ruff format/lint + mypy workflow. |
| .github/workflows/pre-commit-ci.yml | Runs pre-commit hooks in GitHub Actions. |
| .github/workflows/codecoverage.yml | Runs unit+integration with coverage and uploads XML artifact. |

Comment thread .github/workflows/unit-tests.yml Outdated
Comment thread .github/workflows/integration-tests.yml Outdated
Comment thread .github/workflows/unit-tests.yml Outdated
Comment thread .github/workflows/integration-tests.yml Outdated
@shaypal5 shaypal5 merged commit aaf08b2 into main Mar 4, 2026
6 checks passed
@shaypal5 shaypal5 deleted the codex/ci-workflows-smoke-lint-tests branch March 4, 2026 12:28
shaypal5 added a commit that referenced this pull request May 21, 2026
Fixes all 16 issues raised in the post-merge review:

Critical:
- [#1] Orchestrator now checks config.enabled / config.mode at the top of
  evaluate_thin/evaluate_thick: mode=OFF or enabled=False returns a
  _noop_pass immediately without running any stage or writing telemetry;
  SHADOW mode downgrades drop→pass in _conclude while preserving
  stopped_at_stage for recall analysis; ENFORCE respects drops.
- [#2] Stage objects are only instantiated when config.stages.X.enabled is
  True; disabled stages are stored as None, preventing model-load cost for
  stages like C (embedding) and D (SLM) that aren't in use.
- [#3] Added @runtime_checkable StageEvaluator Protocol in models.py with
  uniform evaluate(candidate, pass_kind, body=None) signature; all four
  stage stubs (A–D) updated to that signature so the orchestrator calls
  them uniformly.
- [#4] Removed duplicate ThinOrThick alias from cascade.py; PassKind from
  models.py is the single source of truth.
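
The critical fixes above (mode gating, skipping disabled stages, and a uniform stage protocol) can be illustrated with a small sketch. This is not the project's code: `Mode`, `Decision`, `AlwaysDrop`, and `run_cascade` are hypothetical names, the `"thin"`/`"thick"` values for `PassKind` are assumed, and so is the convention that a stage returns `"drop"`, `"pass"`, or `None`.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Literal, Optional, Protocol, runtime_checkable

PassKind = Literal["thin", "thick"]  # assumed values
Verdict = Literal["pass", "drop"]


class Mode(str, Enum):  # stand-in for the PR's PrefilterMode
    OFF = "off"
    SHADOW = "shadow"
    ENFORCE = "enforce"


@runtime_checkable
class StageEvaluator(Protocol):
    """Uniform call shape so the orchestrator can treat all stages alike."""

    def evaluate(
        self, candidate: object, pass_kind: PassKind, body: Optional[str] = None
    ) -> Optional[Verdict]: ...


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    stopped_at_stage: str


class AlwaysDrop:
    """Hypothetical stage used only to exercise the drop path."""

    def evaluate(self, candidate, pass_kind, body=None):
        return "drop"


def run_cascade(
    mode: Mode,
    stages: dict[str, Optional[StageEvaluator]],
    candidate: object,
    pass_kind: PassKind,
    enabled: bool = True,
) -> Decision:
    # OFF or disabled: return a no-op pass before touching any stage.
    if not enabled or mode is Mode.OFF:
        return Decision("pass", "passed_all")
    for name, stage in stages.items():
        if stage is None:  # disabled stages are stored as None and skipped
            continue
        if stage.evaluate(candidate, pass_kind) == "drop":
            # SHADOW records where the cascade stopped but still passes.
            verdict: Verdict = "pass" if mode is Mode.SHADOW else "drop"
            return Decision(verdict, name)
    return Decision("pass", "passed_all")


stages = {"A": None, "B": AlwaysDrop()}
print(run_cascade(Mode.SHADOW, stages, object(), "thin"))
print(run_cascade(Mode.ENFORCE, stages, object(), "thin"))
```

Keeping the stopped-at stage in shadow mode is what allows later analysis of what enforcement would have dropped, without affecting the pipeline's output.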

Major:
- [#5] StoppedAt = StageName | Literal["passed_all"] — no longer a
  copy-paste of the four stage letters.
- [#6] PrefilterDecision.decided_at changed from str to datetime; telemetry
  writer converts to .isoformat() at the serialisation boundary; _path_for
  uses .strftime() directly on the datetime.
- [#7] StageScore.__post_init__ validates p_negative and threshold are both
  in [0.0, 1.0], raising ValueError for out-of-range values.
- [#8] Stage A re-run in evaluate_thick documented with an explicit Note in
  the docstring; thin-result passthrough deferred to a later PR.
- [#9] Test fixtures now use typed aliases (StageName, Verdict, StoppedAt,
  PassKind) — all type: ignore[arg-type] comments removed from helpers.
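
As a rough illustration of items #5 and #7, the sketch below derives `StoppedAt` from `StageName` and validates score bounds in `__post_init__`. The upper-case stage letters and the `stage` field are assumptions; only `p_negative` and `threshold` are named in the message. `Union` is used instead of `|` so the alias also evaluates on Python versions older than 3.10.

```python
from dataclasses import dataclass
from typing import Literal, Union

StageName = Literal["A", "B", "C", "D"]  # assumed spelling of the stage letters
# Built from StageName rather than repeating the four letters.
StoppedAt = Union[StageName, Literal["passed_all"]]


@dataclass(frozen=True)
class StageScore:
    stage: StageName  # hypothetical field, for illustration only
    p_negative: float
    threshold: float

    def __post_init__(self) -> None:
        # Reject probabilities and thresholds outside the closed unit interval.
        for field_name in ("p_negative", "threshold"):
            value = getattr(self, field_name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(
                    f"{field_name} must be in [0.0, 1.0], got {value!r}"
                )


ok = StageScore(stage="A", p_negative=0.2, threshold=0.5)

try:
    StageScore(stage="A", p_negative=1.5, threshold=0.5)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Validation in `__post_init__` works on a frozen dataclass because it only reads attributes; it never assigns to them.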

Minor:
- [#10] flush() removed from PrefilterDecisionWriter.
- [#11] _path_for no longer has a try/except — datetime param makes it
  unnecessary.
- [#12] "short" removed from _hash_config docstring.
- [#13] test_frozen uses pytest.raises(FrozenInstanceError) instead of
  try/except/else antipattern.
- [#14] PrefilterStatePaths converted from pydantic BaseModel to
  @dataclasses.dataclass(frozen=True) — consistent with StageScore /
  PrefilterDecision.
- [#15] __init__.py now exports CandidateView, StageEvaluator,
  PrefilterStatePaths, resolve_prefilter_state_paths, PassKind, Verdict.
- [#16] cli.py summary command no longer hardcodes agents/news/local.yaml;
  prints an actionable error and exits 1 when --config is not supplied.
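
Item #14's move to a frozen dataclass can be sketched as follows. `StatePaths`, `resolve_state_paths`, the `decisions` subdirectory, and the example dataset and job names are hypothetical; the `<state_root>/<dataset>/<job>/prefilter/` layout follows the description of `resolve_prefilter_state_paths()` elsewhere on this page.

```python
import dataclasses
from pathlib import Path


@dataclasses.dataclass(frozen=True)
class StatePaths:
    """Immutable bundle of resolved directories (illustrative field names)."""

    prefilter_dir: Path
    decisions_dir: Path


def resolve_state_paths(state_root: Path, dataset: str, job: str) -> StatePaths:
    # Anchor everything under <state_root>/<dataset>/<job>/prefilter/.
    base = state_root / dataset / job / "prefilter"
    return StatePaths(prefilter_dir=base, decisions_dir=base / "decisions")


paths = resolve_state_paths(Path("/tmp/state"), "news", "local")

try:
    paths.prefilter_dir = Path("/elsewhere")  # frozen: assignment is rejected
except dataclasses.FrozenInstanceError:
    print("assignment rejected")
```

In a pytest test, the same check is usually written as `with pytest.raises(dataclasses.FrozenInstanceError):` around the assignment, which is the form item #13 adopts.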

Tests: 54 → 78 (+24), all passing.
New coverage: StageScore bounds validation (5 tests), StageEvaluator
protocol conformance for all four stages (5 tests), type-alias smoke checks
(4 tests), OFF-mode no-telemetry (2 tests), disabled-flag suppression (1),
shadow/enforce telemetry (2), shadow downgrade with monkeypatched stage (1),
enforce drop with monkeypatched stage (1), disabled-stages-not-instantiated
(1), decided_at-is-datetime (2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request May 21, 2026
* feat(prefilter): LPF-PR-01 — prefilter package foundation, models, config, telemetry, no-op cascade

Introduces the `denbust.prefilter` package (10 modules, 54 unit tests, 0 ruff/mypy errors):

- `models.py`: `CandidateView` runtime-checkable Protocol, `StageScore` and
  `PrefilterDecision` frozen dataclasses with Literal-typed `StageName`, `PassKind`,
  `Verdict`, and `StoppedAt` fields.
- `config.py`: `PrefilterMode(StrEnum)` (off/shadow/enforce), per-stage configs
  (`StageAConfig`–`StageDConfig`), `PrefilterStagesConfig`, `PrefilterRefreshConfig`,
  and `PrefilterConfig` root with `~`-expansion model_validator.
- `state_paths.py`: `PrefilterStatePaths` pydantic model + `resolve_prefilter_state_paths()`
  anchoring artefacts under `<state_root>/<dataset>/<job>/prefilter/`.
- `telemetry.py`: `PrefilterDecisionWriter` appending decisions to date-sharded
  `<decisions_dir>/YYYY-MM-DD.jsonl` files.
- `cascade.py`: `CascadeOrchestrator` with `evaluate_thin()` / `evaluate_thick()` — always
  returns `verdict="pass"` stub; records every decision via the writer.
- `stage_a.py`–`stage_d.py`: stub `evaluate()` methods returning `None` so the cascade
  always passes through; full implementations land in LPF-PR-03 through LPF-PR-07.
- `cli.py`: `denbust prefilter summary` Typer command stub.
- `__init__.py`: re-exports `CascadeOrchestrator`, `PrefilterConfig`, `PrefilterMode`,
  `PrefilterDecision`, `StageScore`.
- `src/denbust/config.py`: adds `prefilter: PrefilterConfig` field to the root `Config`.
- `src/denbust/cli.py`: registers `prefilter_app` under `denbust prefilter`.
- `README.md`: retitles the cascade section to reflect active implementation.

Cascade ships disabled (`mode: off`); no pipeline insertion in this PR.
54 unit tests covering protocol conformance, config validation, YAML round-trips,
state-path resolution, JSONL telemetry, and cascade no-op behaviour.
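
A date-sharded JSONL writer of the kind described for `telemetry.py` might look like the minimal sketch below. `DecisionWriter`, its `write` method, and the record keys are hypothetical; it takes `decided_at` as a `datetime`, uses `.strftime()` for the shard path, and converts to `.isoformat()` only when serialising, in line with the follow-up fix.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class DecisionWriter:
    """Appends one JSON object per line to <decisions_dir>/YYYY-MM-DD.jsonl."""

    def __init__(self, decisions_dir: Path) -> None:
        self.decisions_dir = decisions_dir

    def _path_for(self, decided_at: datetime) -> Path:
        # The shard name comes straight from the datetime; no string parsing.
        return self.decisions_dir / f"{decided_at.strftime('%Y-%m-%d')}.jsonl"

    def write(self, record: dict) -> Path:
        decided_at: datetime = record["decided_at"]
        path = self._path_for(decided_at)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Convert to an ISO 8601 string only at the serialisation boundary.
        line = json.dumps({**record, "decided_at": decided_at.isoformat()})
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")
        return path


with tempfile.TemporaryDirectory() as tmp:
    writer = DecisionWriter(Path(tmp))
    when = datetime(2026, 5, 21, 9, 30, tzinfo=timezone.utc)
    out = writer.write({"verdict": "pass", "decided_at": when})
    print(out.name)  # 2026-05-21.jsonl
```

Opening the file in append mode for every write keeps the writer stateless, so there is no buffer to flush.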

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(plan): mark LPF-PR-01 done, update last-merged-PR reference (#158)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(prefilter): address code-review issues from PR #158 self-review

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>