Skip to content

pandas: testing lift seat + showcase + the 2×4 guard-shape matrix#1937

Open
TSavo wants to merge 8 commits into
mainfrom
codex/pandas-testing-seat
Open

pandas: testing lift seat + showcase + the 2×4 guard-shape matrix#1937
TSavo wants to merge 8 commits into
mainfrom
codex/pandas-testing-seat

Conversation

@TSavo
Copy link
Copy Markdown
Owner

@TSavo TSavo commented Jun 5, 2026

The next package up the ladder from numpy is pandas — and per the federation design, it is just another lifter package. This PR adds the seat, the showcase that proves real pandas with zero changes, and the 2×4 guard-shape matrix across numpy + pandas.

1. provekit-lift-py-pandas-testing (the seat)

Exact sibling of provekit-lift-py-numpy-testing, reusing the pytest seat's term/formula machinery. The EXACT/APPROX discipline is sharper than numpy because pandas is approximate-by-default: assert_frame_equal/assert_series_equal compare floats with a tolerance unless check_exact is pinned (pandas 3.0.3: the default is the <no_default> sentinel). So it lifts as = only when check_exact=True is explicitly pinned and no relation-altering kwarg is present; everything else is a loud refuse. Claims the whole pandas._testing (tm) 22-name vocabulary so nothing real is silently skipped. 17 unit tests; wired into make test-python.

Honest scope: frame equality is opaque-EUF on both sides, so z3 can't contradict two opaque DataFrames (same as numpy array equality). The consistency teeth come from scalar assertions; this seat's job is sound refusal-of-approximate + witness keying.

2. examples/pandas-showcase (the proof)

Real pandas project with one buggy self-contradictory test → proves the good code and refuses the bug both ways (consistency: ==6 consistent / ==6 AND ==7 UNSAT; witness: reproduces / run failed). run.sh self-checks. Green: 4 discharged, 2 refused.

3. examples/python-guard-shapes (the 2×4 matrix)

Guard shapes #2 index-bounds, #3 empty-container, #4 divide-by-zero, #5 key-access × {numpy, pandas} = 8 cells, each a real library idiom with a discrimination mutant.

These are runtime faults (IndexError/ValueError/KeyError/silent inf), not logical contradictions — so the consistency axis is blind and only the witness axis catches them. Divide-by-zero is the sharp cell: numpy/pandas return silent inf, caught via an isfinite assertion. run.sh checks the verdict per file (every _ok discharged, every _bad refused). Green: 8 discharged, 8 refused.

Signed-receipt verify

The stricter verify (witness-body re-resolution) now passes too. A one-line fix: the resolve_witness recompute guard tested code_files for truthiness, but an all-tests project pins an empty code_files (the code under test is the installed library), and [] is falsy. Gated on presence instead; verify now re-resolves the good witnesses by recompute and refuses the bad ones as failed runs. Regression test added (kit: 20 passed).

Summary by CodeRabbit

  • New Features

    • Added pandas.testing assertion lifting with exact-equality pinning
    • Whole-suite, content-addressed witness packages for pytest runs
    • New pandas-showcase and python-guard-shapes examples with runnable self-check scripts
  • Documentation

    • Example READMEs explaining multi-axis verification and witness environment setup
  • Bug Fixes

    • Improved RPC UTF-8 serialization to avoid encoding failures

TSavo and others added 2 commits June 5, 2026 16:15
The next package up the ladder from numpy is pandas, and per the federation
design it is just another lifter package: provekit-lift-py-pandas-testing, the
exact sibling of provekit-lift-py-numpy-testing, reusing the pytest seat's
term/formula machinery so equivalent claims federate.

The pandas EXACT/APPROXIMATE discipline is SHARPER than numpy's because pandas
makes approximate the DEFAULT: assert_frame_equal / assert_series_equal compare
floats with a tolerance unless check_exact is pinned. (Confirmed against pandas
3.0.3: check_exact's default is the <no_default> sentinel, resolved at runtime
by dtype.) So lifting an un-pinned frame assertion as `=` would FALSE-PASS,
claiming exact equality pandas never checked. Version-independently, the seat
lifts a frame/series/index/extension-array assertion as `=` ONLY when
check_exact=True is explicitly pinned AND no relation-altering keyword
(check_like, rtol/atol, check_dtype, ...) is present; everything else is a LOUD
REFUSE.

The whole pandas._testing (tm) 22-name vocabulary is recognised so nothing real
is silently skipped -- the trichotomy: express the exact, loudly refuse the
rest. assert_equal is deliberately excluded (cross-library generic name owned by
the numpy/generic seat).

Honest scope: a frame/series equality is opaque-EUF on both sides, so z3 cannot
manufacture a contradiction between two opaque DataFrame constructors -- same as
numpy array equality. The CONSISTENCY teeth for pandas come from SCALAR
assertions, which the pytest bare-assert seat already lifts (and this seat lifts
when they ride a claimed pandas.testing test). This seat's load-bearing job is
sound loud-refusal of approximate frame comparison plus callsite keying of the
pandas op under test for the witness axis.

17 unit tests: positive lifts (check_exact pinned), the approximate-by-default
refusal, relation-altering-kwarg refusals, mutation/control-flow/side-effect
refusals, SSA independence, broad-tm claim+refuse, and assert_equal non-claim.
Wired into the Makefile test-python loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stands up pandas next to numpy, proving correctness with zero code changes.
Differs from numpy-showcase only by swapping in the pandas.testing seat and
pointing the witness venv at pandas -- the next package is just another lifter
package, as designed.

Three lift seats over the project: the plain pytest CONSISTENCY seat (scalar
assertions, where z3's teeth are), the pandas.testing seat (frame assertions;
approximate-by-default refused unless check_exact pinned), and pytest-witness
(runs the tests under real pandas). The project deliberately contains a buggy
(self-contradictory) test, so the showcase proves the CORRECT pandas code AND
catches the contradiction -- refused BOTH ways:
  consistency: Series.sum() == 6 consistent (discharged); == 6 AND == 7 UNSAT
               (refused).
  witness    : the good tests reproduce (discharged); the contradictory run is
               'failed' (refused).

run.sh is self-checking: it asserts provekit produces exactly that verdict and
exits non-zero otherwise. Verified green (4 discharged, 2 refused).

Scope: demonstrates lift + prove (consistency + witness by recompute). The
stricter signed-receipt verify path re-resolves each witness body, which for the
pytest-witness kit needs a packaged witness body or full re-run metadata
forwarded by the verifier -- a follow-up increment, noted in the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

Looking for one thing? Review this PR in Change Stack to search files, summaries, diffs, and code without losing your place.

Review Change Stack

Walkthrough

This PR adds a pandas.testing assertion lifter with an LSP server and tests, changes pytest-witness to emit and resolve suite-level witness packages, integrates the pandas suite into Makefile tests, and adds two runnable example projects (pandas-showcase and python-guard-shapes) with run scripts that self-validate witness outcomes.

Changes

Pandas.testing Lift Implementation

Layer / File(s) Summary
Package setup
implementations/python/provekit-lift-py-pandas-testing/pyproject.toml, implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/__init__.py
Package metadata, setuptools build config, and module initializer re-exporting lift_file_pandas_testing.
LSP server and file handling
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/lsp.py
JSON-RPC over stdio with handlers for initialize, kit_declaration, and lift; filesystem traversal to discover .py files; per-file lifting into canonical JSON IR and aggregated warnings.
Pandas.testing assertion lifting
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py
Core AST-based lifter recognizing pandas.testing equality assertions, enforcing check_exact=True for exact lifting, refusing approximate/tolerance assertions, gating permitted statements, performing SSA/versioning, and grouping atoms by callsite-keyed contract.
Lift test suite
implementations/python/provekit-lift-py-pandas-testing/tests/test_pandas_testing_layer.py
Unit tests covering positive lifting, refusal rules, SSA independence, statement-gating refusals, and claiming discipline for pandas.testing vs broader tm vocabulary.
Build integration
Makefile
Adds the pandas-testing suite run in a dedicated venv under test-python and aggregates failures into the existing $$failed check.

Pandas Showcase (Dual-Axis Example)

Layer / File(s) Summary
Configuration and infrastructure
examples/pandas-showcase/.provekit/config.toml, .provekit/lift/*/manifest.toml, examples/pandas-showcase/README.md, examples/pandas-showcase/run.sh
Project config defining three lift surfaces (python, python-pandas-testing, python-pytest-witness), Z3 solver config, lift kit manifests, README, and orchestration script that runs provekit mint/prove and self-validates consistency + witness verdicts.
Test cases
examples/pandas-showcase/test_pandas_frame.py, examples/pandas-showcase/test_pandas_sum.py, examples/pandas-showcase/test_pandas_sum_bad.py
Frame round-trip pinned to exact equality, column-sum verification, and a deliberately contradictory test producing refusal on the consistency and witness axes.

Python Guard Shapes (Runtime Guard Example)

Layer / File(s) Summary
Configuration and infrastructure
examples/python-guard-shapes/.provekit/config.toml, .provekit/lift/python-pytest-witness/manifest.toml, examples/python-guard-shapes/README.md, examples/python-guard-shapes/run.sh
Project config and manifest for pytest-witness lift, README documenting the 4 bug classes × 2 libraries matrix, and run script that provisions a witness venv, mints witness bundles, and validates witness-package outcomes via JSON parsing.
Test cases
examples/python-guard-shapes/test_*_numpy_{ok,bad}.py, examples/python-guard-shapes/test_*_pandas_{ok,bad}.py
Sixteen tests covering index bounds, empty container, divide-by-zero, and key access across NumPy and pandas with ok/bad variants used by the witness self-checks.

Pytest Witness: suite packaging and runtime

Layer / File(s) Summary
Witness collection plugin
implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py
Pytest plugin that records per-node outcomes and can write a JSON map to PROVEKIT_WITNESS_OUT for downstream use.
Suite-level witness packaging
implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py
handle_lift now builds a content-addressed witness bundle for the suite and emits a single ContractDecl named witness-package:{cid} with suite metadata; handle_resolve_witness supports package recompute and verification.
Witness runtime & bundle primitives
implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py
Per-file pytest runs producing per-node witnesses, bundle read/write/lookup helpers, suite bundle building, package mementos, and discharge logic that refuses when any per-test witness is not passed.
Witness tests
implementations/python/provekit-lift-py-pytest-witness/tests/test_witness.py
Tests for suite packaging, refusal on failing tests or tampered CID, per-test witness minting/recompute, content-addressed JSONL semantics, and an RPC recompute regression when code_files is empty.

RPC & Source-oracle changes

Layer / File(s) Summary
Bind lifter: SourceMemento
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_lifter.py
Switches library sugar binding body_source to a lean SourceMemento (CIDs + locus), adds source_memento_of helper, and ensures _body_source_locator populates template/body_text fields.
RPC materialize and template handling
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
Materialize now resolves missing body_text via the source oracle for lean entries, _binding_template_from_sugar_entry keys on template_cid and carries body_source/source_function_name, and _send() writes UTF-8 bytes to stdout.buffer with replace semantics.
Source oracle recompute behavior
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/source_oracle.py
Resolve no longer overrides PROVEKIT_LEAN_SOURCE when recomputing, so lean-mode environment can affect recomputation results.
Tests updated for lean body_source
implementations/python/provekit-lift-python-source/tests/*
Multiple tests updated to expect lean body_source (template_cid/source_cid) and to recompute expected templates from on-disk source where needed.

Repository-level / CI changes

Layer / File(s) Summary
Makefile integration
Makefile
test-python now runs the pandas-testing suite (venv + pytest) and aggregates its failures into the $$failed result.
CI/workflow and config removals
.github/workflows/provekit.yml, tsconfig.json
Removes the previous GitHub Actions workflow and the tsconfig.json shown in the diff.
Rust test deletion
implementations/rust/provekit-cli/tests/cli_surface.rs
Removes an obsolete CLI test block asserting Bug Zoo harness self-containment.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

A rabbit hops through testing ground,
Where pandas assertions sound,
Exact pins kept with gentle art,
Suite bundles hold each test's heart,
Witness and proof now dance in sync — hooray! 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.28% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: addition of a pandas testing lift seat, a pandas showcase example, and a 2×4 guard-shape matrix of test examples.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/pandas-testing-seat

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07533f0056

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +327 to +329
except ValueError as e:
skipped.append(f"`{_unparse(stmt)[:60]}`: {e}")
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Refuse the whole test after an unliftable pandas assertion

When a test mixes an unpinned/approximate assert_frame_equal(...) with any liftable bare assert, this continue only records the pandas refusal in skipped; if another assertion is keyed later, the function still emits a successful partial contract. That bypasses the intended loud refusal for approximate pandas comparisons and can let prove pass while silently ignoring the frame assertion that made the test unsafe to lift as exact equality.

Useful? React with 👍 / 👎.

Guard shapes #2-#5 (index-bounds, empty-container, divide-by-zero, key-access)
proved on BOTH numpy and pandas -- 8 cells, each a real library idiom with a
discrimination mutant. Zero code changes.

These are RUNTIME faults (IndexError / ValueError / KeyError / silent inf), not
logical contradictions, so the consistency (z3) axis is blind to them: `a[5] ==
0` is a consistent proposition. Only running the code catches the bug, so this
example registers ONLY the pytest-witness seat -- the axis that proves
correctness by execution. It re-runs each case under real numpy/pandas: the
guarded `_ok` case is witnessed (discharged); the `_bad` case breaches the guard
so the run raises (or, for divide-by-zero, the finiteness assertion fails) and
the witness is refused.

Divide-by-zero is the sharp cell: numpy/pandas return silent `inf` rather than
raising; provekit catches it because `_ok` asserts isfinite and `_bad` fails it.

The witness is per-file (runs `pytest <file>`), so `_ok`/`_bad` are separate
files -- 8 cells x 2 = 16. run.sh checks the verdict PER FILE (every `_ok`
discharged, every `_bad` refused), so a swapped or missing verdict fails the
gate. Verified green: 8 discharged, 8 refused.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TSavo TSavo changed the title pandas: the pandas.testing lift seat + the two-axis showcase pandas: testing lift seat + showcase + the 2×4 guard-shape matrix Jun 5, 2026
…iles)

The signed-receipt verify path refused good witnesses with "no package file and
not re-runnable" on projects that are all tests (numpy/pandas showcases). Root
cause: the resolve_witness RECOMPUTE guard tested `memento.get("code_files")`
for truthiness, but an all-tests project pins an EMPTY code_files (the code under
test is the installed library, not a local file), and `[]` is falsy in Python.
So a trivially re-runnable witness was declared not re-runnable.

Fix is one line: gate on `code_files is not None` (present), not truthiness. The
empty list is part of the pinned witness body and reconstructs correctly, so the
tamper check at line 150 still holds -- this stays sound.

After the fix, verify re-resolves the good witnesses by recompute (rust re-runs
the test, recomputes the CID, matches) and refuses the contradictory ones as
failed runs -- the same verdict prove gives. Confirmed on pandas-showcase and
python-guard-shapes.

Adds a regression test driving the resolve_witness RPC with an empty-code_files
witness (would fail pre-fix), and drops the now-stale verify caveat from the
pandas-showcase README. Kit tests: 20 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py (1)

168-196: 💤 Low value

Rename variable l to left or lhs for clarity.

The variable l at line 193 can be confused with the digit 1 in many fonts. Consider renaming it to left or lhs for better readability. Similarly, r at line 194 could be right or rhs.

♻️ Proposed refactor
-        l = _translate_term_scoped(call.args[0], scope, call_vars)
-        r = _translate_term_scoped(call.args[1], scope, call_vars)
-        return comparison_with_none_guard("=", l, r, emit_none_guard=False)
+        left = _translate_term_scoped(call.args[0], scope, call_vars)
+        right = _translate_term_scoped(call.args[1], scope, call_vars)
+        return comparison_with_none_guard("=", left, right, emit_none_guard=False)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py`
around lines 168 - 196, In _lift_pdt_assertion_scoped rename the local variables
l and r to clearer names (e.g., left/lhs and right/rhs): replace the assignments
to l and r from _translate_term_scoped with left (or lhs) and right (or rhs) and
update the subsequent return call to use those new names in
comparison_with_none_guard; ensure any other references to l and r in that
function are updated to the new identifiers for consistency and clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/pandas-showcase/README.md`:
- Line 48: Replace the likely typo in the README sentence that currently reads
"pandas, not fame, is the right next rung" by changing "fame" to "frame" so the
sentence reads "pandas, not frame, is the right next rung" (locate the sentence
text in README.md to edit).

In `@examples/pandas-showcase/run.sh`:
- Line 22: The script currently uses "set -uo pipefail" which omits the -e flag
so failures in commands like cd, mint, or prove may be ignored; update the shell
options to "set -euo pipefail" (or add -e alongside the existing flags) at the
top of the script to fail fast on any command error, and audit the command
sequences around the cd/mint/prove invocations to ensure no subshells or traps
suppress exit codes (the same change should be applied to the other occurrences
of the same set invocation near the mint/prove blocks).
- Around line 30-33: The script exposes PANDAS_WITNESS_VENV via VENV but other
artifacts (lift manifests) call /tmp/pandas-witness-venv/bin/python directly, so
either make all invocations respect the VENV variable or remove the misleading
override; update the script to consistently reference the VENV variable
(PANDAS_WITNESS_VENV -> VENV) everywhere and change any hardcoded references to
/tmp/pandas-witness-venv/bin/python to use "$VENV/bin/python", or if you prefer
the minimal fix, remove PANDAS_WITNESS_VENV and hardcode VENV to
/tmp/pandas-witness-venv so behavior matches the manifests.

In `@examples/python-guard-shapes/run.sh`:
- Around line 14-33: The script currently uses "set -uo pipefail" but lacks -e
so failing commands like cd, "$BIN" mint, or the PATH-prefixed "$BIN" prove can
be ignored; update the shebang/options to "set -euo pipefail" (or add explicit
"|| exit 1" after critical commands) so the script exits immediately on errors,
ensuring failures in cd, the mint call, or the prove call
(PATH="$VENV/bin:$PATH" "$BIN" prove ...) abort the run and surface the actual
fault.

In `@examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py`:
- Around line 1-2: The top-of-file comment is misleading: in
test_divide_by_zero_pandas_ok.py the divisor used is non-zero (2.0) and the test
is an OK/discharged case, so update the header lines that currently mention
"divide by zero" and "zero divisor yields inf" to state this is the non-zero
divisor (2.0) case, that pandas performs the real division and yields finite
results, and that the witness RUNS the real pandas code and is discharged.

---

Nitpick comments:
In
`@implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py`:
- Around line 168-196: In _lift_pdt_assertion_scoped rename the local variables
l and r to clearer names (e.g., left/lhs and right/rhs): replace the assignments
to l and r from _translate_term_scoped with left (or lhs) and right (or rhs) and
update the subsequent return call to use those new names in
comparison_with_none_guard; ensure any other references to l and r in that
function are updated to the new identifiers for consistency and clarity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35645acd-9785-4340-bd9f-36a4bd501f2a

📥 Commits

Reviewing files that changed from the base of the PR and between ac55f14 and 308ce0e.

📒 Files selected for processing (35)
  • Makefile
  • examples/pandas-showcase/.provekit/config.toml
  • examples/pandas-showcase/.provekit/lift/python-pandas-testing/manifest.toml
  • examples/pandas-showcase/.provekit/lift/python-pytest-witness/manifest.toml
  • examples/pandas-showcase/.provekit/lift/python/manifest.toml
  • examples/pandas-showcase/README.md
  • examples/pandas-showcase/run.sh
  • examples/pandas-showcase/test_pandas_frame.py
  • examples/pandas-showcase/test_pandas_sum.py
  • examples/pandas-showcase/test_pandas_sum_bad.py
  • examples/python-guard-shapes/.provekit/config.toml
  • examples/python-guard-shapes/.provekit/lift/python-pytest-witness/manifest.toml
  • examples/python-guard-shapes/README.md
  • examples/python-guard-shapes/run.sh
  • examples/python-guard-shapes/test_divide_by_zero_numpy_bad.py
  • examples/python-guard-shapes/test_divide_by_zero_numpy_ok.py
  • examples/python-guard-shapes/test_divide_by_zero_pandas_bad.py
  • examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py
  • examples/python-guard-shapes/test_empty_container_numpy_bad.py
  • examples/python-guard-shapes/test_empty_container_numpy_ok.py
  • examples/python-guard-shapes/test_empty_container_pandas_bad.py
  • examples/python-guard-shapes/test_empty_container_pandas_ok.py
  • examples/python-guard-shapes/test_index_bounds_numpy_bad.py
  • examples/python-guard-shapes/test_index_bounds_numpy_ok.py
  • examples/python-guard-shapes/test_index_bounds_pandas_bad.py
  • examples/python-guard-shapes/test_index_bounds_pandas_ok.py
  • examples/python-guard-shapes/test_key_access_numpy_bad.py
  • examples/python-guard-shapes/test_key_access_numpy_ok.py
  • examples/python-guard-shapes/test_key_access_pandas_bad.py
  • examples/python-guard-shapes/test_key_access_pandas_ok.py
  • implementations/python/provekit-lift-py-pandas-testing/pyproject.toml
  • implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/__init__.py
  • implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/lsp.py
  • implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py
  • implementations/python/provekit-lift-py-pandas-testing/tests/test_pandas_testing_layer.py

its tests assert scalars (`.sum() == 6`) alongside frame equality. A package
whose tests only assert `assert_allclose` over giant tensors (TensorFlow,
Transformers) would get the witness axis but a near-vacuous consistency axis —
which is exactly why pandas, not fame, is the right next rung.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the likely typo in the rationale sentence.

Line 48 says “pandas, not fame,” which reads like a typo and weakens the explanation.

Suggested edit
-which is exactly why pandas, not fame, is the right next rung.
+which is exactly why pandas, not frame, is the right next rung.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
which is exactly why pandas, not fame, is the right next rung.
which is exactly why pandas, not frame, is the right next rung.
🧰 Tools
🪛 LanguageTool

[style] ~48-~48: Consider an alternative for the overused word “exactly”.
Context: ...ear-vacuous consistency axis — which is exactly why pandas, not fame, is the right next...

(EXACTLY_PRECISELY)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/pandas-showcase/README.md` at line 48, Replace the likely typo in
the README sentence that currently reads "pandas, not fame, is the right next
rung" by changing "fame" to "frame" so the sentence reads "pandas, not frame, is
the right next rung" (locate the sentence text in README.md to edit).

Source: Linters/SAST tools

# test_pandas_sum_bad.py, so the showcase proves the CORRECT pandas code AND
# catches the contradiction -- refused BOTH ways. This script PASSES iff
# provekit produces exactly that verdict.
set -uo pipefail
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on command errors in the orchestration script.

Line 22 omits -e, so failed cd, mint, or prove steps can be silently ignored and the script may continue with invalid state.

Suggested hardening
-set -uo pipefail
+set -euo pipefail
@@
-cd "$HERE"
+cd "$HERE" || exit 1

Also applies to: 36-36, 41-44

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/pandas-showcase/run.sh` at line 22, The script currently uses "set
-uo pipefail" which omits the -e flag so failures in commands like cd, mint, or
prove may be ignored; update the shell options to "set -euo pipefail" (or add -e
alongside the existing flags) at the top of the script to fail fast on any
command error, and audit the command sequences around the cd/mint/prove
invocations to ensure no subshells or traps suppress exit codes (the same change
should be applied to the other occurrences of the same set invocation near the
mint/prove blocks).

Source: Linters/SAST tools

Comment thread examples/pandas-showcase/run.sh
Comment thread examples/python-guard-shapes/run.sh Outdated
Comment on lines +14 to +33
set -uo pipefail
HERE="$(cd "$(dirname "$0")" && pwd)"
REPO="$(cd "$HERE/../.." && pwd)"
BIN="$REPO/implementations/rust/target/debug/provekit"

VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}"
if [ ! -x "$VENV/bin/python" ]; then
python3 -m venv "$VENV"
"$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2
fi

cd "$HERE"
rm -f blake3-512:*.proof 2>/dev/null || true
rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true

echo "== mint (pytest-witness over the 16 cases) =="
"$BIN" mint --out . --quiet

echo "== prove (witness axis: re-run each case under real numpy/pandas) =="
PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enable fail-fast for gate reliability.

The script does not stop on command failure (set -uo pipefail), so failed cd/mint/prove can fall through and produce misleading gate output. Add -e (or explicit || exit) so orchestration fails at the real fault point.

Suggested patch
-set -uo pipefail
+set -euo pipefail
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
set -uo pipefail
HERE="$(cd "$(dirname "$0")" && pwd)"
REPO="$(cd "$HERE/../.." && pwd)"
BIN="$REPO/implementations/rust/target/debug/provekit"
VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}"
if [ ! -x "$VENV/bin/python" ]; then
python3 -m venv "$VENV"
"$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2
fi
cd "$HERE"
rm -f blake3-512:*.proof 2>/dev/null || true
rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true
echo "== mint (pytest-witness over the 16 cases) =="
"$BIN" mint --out . --quiet
echo "== prove (witness axis: re-run each case under real numpy/pandas) =="
PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json
set -euo pipefail
HERE="$(cd "$(dirname "$0")" && pwd)"
REPO="$(cd "$HERE/../.." && pwd)"
BIN="$REPO/implementations/rust/target/debug/provekit"
VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}"
if [ ! -x "$VENV/bin/python" ]; then
python3 -m venv "$VENV"
"$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2
fi
cd "$HERE"
rm -f blake3-512:*.proof 2>/dev/null || true
rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true
echo "== mint (pytest-witness over the 16 cases) =="
"$BIN" mint --out . --quiet
echo "== prove (witness axis: re-run each case under real numpy/pandas) =="
PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 25-25: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/python-guard-shapes/run.sh` around lines 14 - 33, The script
currently uses "set -uo pipefail" but lacks -e so failing commands like cd,
"$BIN" mint, or the PATH-prefixed "$BIN" prove can be ignored; update the
shebang/options to "set -euo pipefail" (or add explicit "|| exit 1" after
critical commands) so the script exits immediately on errors, ensuring failures
in cd, the mint call, or the prove call (PATH="$VENV/bin:$PATH" "$BIN" prove
...) abort the run and surface the actual fault.

Source: Linters/SAST tools

Comment on lines +1 to +2
# Guard shape #4 -- divide by zero (pandas), ok case. pandas inherits numpy: zero divisor yields inf, finiteness assertion fails
# The witness RUNS the real pandas code: this case is witnessed (discharged).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix misleading header comment for the _ok case.

The comment describes the zero-divisor failure path, but this test uses a non-zero divisor (2.0) and is expected to discharge. Please update wording to match the actual scenario.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py` around lines 1
- 2, The top-of-file comment is misleading: in test_divide_by_zero_pandas_ok.py
the divisor used is non-zero (2.0) and the test is an OK/discharged case, so
update the header lines that currently mention "divide by zero" and "zero
divisor yields inf" to state this is the non-zero divisor (2.0) case, that
pandas performs the real division and yields finite results, and that the
witness RUNS the real pandas code and is discharged.

TSavo and others added 2 commits June 5, 2026 17:18
At full-package scale the per-FILE witness fell over: the Oracle ran
`pytest <file>` and keyed ONE witness off the file's exit code, so a single
failing test out of ~1,600 refused the whole file (numpy: all 179 files refused,
~every passing test discarded). That is not a numpy finding, it is a granularity
defect. Running witnessing is the Oracle's job, so the fix lives in the Oracle's
run primitive, shared by mint (lift) and verify (recompute).

PER-TEST (witness.py + _witness_collect.py):
  run_file_witnesses runs the file ONCE -- in the package's own execution context
  (conftest, fixtures, relative imports) -- and a tiny pytest plugin
  (_witness_collect, json/os only, loaded standalone so it never drags in the kit
  dep chain) records every test's outcome. One file run -> one witness PER TEST,
  keyed by the pytest node id. run_and_witness routes a node id back through its
  file, so LIFT and RECOMPUTE agree on outcome under shared file state.
  Proven on real numpy test_scalarmath.py: 1,582 per-test witnesses, 1,581
  discharged, 1 isolated refusal (was: whole file refused).

.witness BUNDLE (witness.py):
  A per-test run of a real package yields thousands of witnesses; one file per
  CID or thousands of inline .proof mementos do not scale. write/read_witness_bundle
  pack MANY bodies into ONE content-addressed .witness file -- JSONL, one
  canonical body per line, blake3(line) == that witness's cid. The content IS the
  key: no index, nothing to trust; the Oracle resolves by scan + blake3, exactly
  its content-address check. read_witness_body now resolves from bundles too.
  mint writes the bundle to .provekit/witnesses/; the .proof carries only the
  signed pointers.

Also fixes a real full-scale robustness bug: bind_rpc._send crashed with
UnicodeEncodeError when a pandas source emoji (🐍) split a surrogate pair across
the 9,802-binding response; now encodes with errors="replace" so one pathological
character never aborts the run.

Lesson recorded: a real package's suite must be witnessed against the INSTALLED
package, not a loose copy -- a compiled package (numpy) cannot be duplicated and
imported alongside itself (ndarray-type identity clash).

Kit: 23 tests (3 new: per-test, recompute-agreement, content-addressed bundle).
Examples green: guard-shapes 8/8, pandas-showcase both axes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The full-package numpy proof was 900 MB / 99,189 members because the proof
carried 48,935 witnesses BY VALUE (a ContractDecl + memento per test). That is
an addressing bug, not a size problem: a proof should carry witnesses BY
REFERENCE -- one content-addressed package, not N inline bodies.

The oracle mints, the verifier asks the oracle, the proof carries one cid:
- MINT (the oracle): `build_suite_bundle` runs the whole suite per-test and
  assembles ONE content-addressed `.witness` package (cid = blake3(bundle)).
  `handle_lift` emits ONE `WitnessPackageMemento` (a pointer + signature over the
  package cid, carrying the test/code files the oracle needs to reproduce it) and
  ONE contract whose evidence pins the package cid. NOT N mementos.
- VERIFY (the verifier asks the oracle): UNCHANGED. `try_witness_discharge`
  already spawns the kit's discharge command; `witness_verify` already calls
  `resolve_witness`. So discharge re-runs the suite, rebuilds the package, and
  confirms it reproduces the pinned cid (`discharge_bundle`); the signed
  dimension resolves the package by cid (package file or recompute) and
  content-addresses it. No rust change.
- THE PROOF CARRIES ONE WITNESS CID: the package. Per-test granularity lives IN
  the package (each line a witness body, self-addressing), committed by the one
  cid.

Result on guard-shapes: proof 900MB-shape -> 4,394 bytes, 99k members -> 2 (one
package contract + one WitnessPackageMemento), the 16 per-test facts in the
content-addressed package. Discharge re-runs the suite and names the exact 8
failing tests. verify resolves the package via the package file, blake3 matches.

A package DISCHARGES iff it reproduces AND every test passed; a failing test
refuses it (naming the failures) -- which is what a failing test in a package
means. Examples read per-test discrimination straight from the package.

Kit: 25 tests (2 new: package one-cid roundtrip + refuses-on-failure/drift).
guard-shapes + pandas-showcase green. No compressor, no RNG -- content-addressing
is the dedup, determinism is the point.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
examples/python-guard-shapes/run.sh (1)

14-14: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Enable fail-fast for gate reliability.

The script does not stop on command failure (set -uo pipefail), so failed cd/mint/prove can fall through and produce misleading gate output. Add -e (or explicit || exit) so orchestration fails at the real fault point.

🛡️ Suggested patch
-set -uo pipefail
+set -euo pipefail
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/python-guard-shapes/run.sh` at line 14, The script currently enables
-u and pipefail but not -e, so failures in commands like cd, mint, or prove can
be ignored; update the shell options line that reads "set -uo pipefail" to
include -e (making it "set -euo pipefail") so the script fails fast on any
command error and prevents misleading gate outputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py`:
- Around line 21-29: pytest_runtest_logreport currently ignores report.when ==
"teardown" so a test that fails during teardown can remain recorded as "passed";
update pytest_runtest_logreport (and the _RESULTS map) to treat teardown
failures like setup failures by overwriting the node outcome when report.when ==
"teardown" and report.outcome != "passed" (e.g., set _RESULTS[nodeid] =
report.outcome or setdefault to "error" as appropriate) so teardown failures
replace any prior "passed" entry for that nodeid.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py`:
- Around line 154-166: The package-recompute branch (the if block checking
memento.get("witness_kind") == "pytest-witness-package" and calling
build_suite_bundle) currently trusts memento["test_files"] and
memento["code_files"] without authentication; change it so you either (A) verify
those lists are present in the signed payload associated with the memento before
using them, or (B) refuse to recompute and raise an error unless an
authenticated package file (the original bundle) is supplied. Locate the branch
that calls build_suite_bundle(ws, list(memento.get("test_files", [])),
list(memento.get("code_files", []))) and replace the unconditional recompute
with logic that checks the signed/asserted inputs (or requires the package file)
and only calls build_suite_bundle when those inputs are cryptographically bound
to the memento; otherwise raise a clear RuntimeError rejecting recompute.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py`:
- Around line 111-134: subprocess.run(...) currently ignores pytest's return
code so if pytest aborts and the plugin never writes out_path we silently return
an empty results set; change the call to capture the CompletedProcess (e.g.
result = subprocess.run(...)) and after reading out_path check: if
result.returncode != 0 and not results: raise an exception (RuntimeError or
subprocess.CalledProcessError) indicating pytest exited non-zero with no
recorded outcomes so the suite cannot be treated as a valid empty suite; keep
the existing finally block that removes out_path and reference symbols:
subprocess.run, result.returncode, out_path, results, witnesses.

---

Duplicate comments:
In `@examples/python-guard-shapes/run.sh`:
- Line 14: The script currently enables -u and pipefail but not -e, so failures
in commands like cd, mint, or prove can be ignored; update the shell options
line that reads "set -uo pipefail" to include -e (making it "set -euo pipefail")
so the script fails fast on any command error and prevents misleading gate
outputs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e8a7c37-2423-4961-a13f-aa3ba5262efd

📥 Commits

Reviewing files that changed from the base of the PR and between 308ce0e and 1d790a9.

📒 Files selected for processing (8)
  • examples/pandas-showcase/README.md
  • examples/pandas-showcase/run.sh
  • examples/python-guard-shapes/run.sh
  • implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py
  • implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py
  • implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py
  • implementations/python/provekit-lift-py-pytest-witness/tests/test_witness.py
  • implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
✅ Files skipped from review due to trivial changes (1)
  • examples/pandas-showcase/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/pandas-showcase/run.sh

Comment on lines +21 to +29
def pytest_runtest_logreport(report):
nodeid = report.nodeid
if report.when == "call":
# passed | failed | skipped for the test body itself
_RESULTS[nodeid] = report.outcome
elif report.when == "setup" and report.outcome == "failed":
# a setup/collection error never reaches the call phase; record it so the
# test is a witness (a 'failed' one), not silently dropped.
_RESULTS.setdefault(nodeid, "error")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat teardown failures as non-passing outcomes.

pytest_runtest_logreport also fires for when == "teardown". Right now a test that passes in call but fails during fixture cleanup stays recorded as "passed", so the witness layer can mint a passing witness for a pytest error. Overwrite the node outcome on teardown failures instead of only handling setup failures.

Suggested fix
 def pytest_runtest_logreport(report):
     nodeid = report.nodeid
     if report.when == "call":
         # passed | failed | skipped for the test body itself
         _RESULTS[nodeid] = report.outcome
-    elif report.when == "setup" and report.outcome == "failed":
-        # a setup/collection error never reaches the call phase; record it so the
-        # test is a witness (a 'failed' one), not silently dropped.
-        _RESULTS.setdefault(nodeid, "error")
+    elif report.when in {"setup", "teardown"} and report.outcome == "failed":
+        # setup/teardown errors should not leave an earlier "passed" result in place
+        _RESULTS[nodeid] = "error"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py`
around lines 21 - 29, pytest_runtest_logreport currently ignores report.when ==
"teardown" so a test that fails during teardown can remain recorded as "passed";
update pytest_runtest_logreport (and the _RESULTS map) to treat teardown
failures like setup failures by overwriting the node outcome when report.when ==
"teardown" and report.outcome != "passed" (e.g., set _RESULTS[nodeid] =
report.outcome or setdefault to "error" as appropriate) so teardown failures
replace any prior "passed" entry for that nodeid.

Comment on lines +154 to +166
# 2a. PACKAGE RECOMPUTE -- a whole-suite WitnessPackageMemento reproduces
# by re-running the suite and rebuilding the content-addressed bundle.
if body is None and ws and memento.get("witness_kind") == "pytest-witness-package":
from .witness import build_suite_bundle
buf, rcid, _ = build_suite_bundle(
ws, list(memento.get("test_files", [])), list(memento.get("code_files", []))
)
if rcid != cid:
raise RuntimeError(
f"witness package did not reproduce: recomputed {rcid}, pinned {cid}"
)
body = buf
resolved_by = "recompute"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't recompute suite packages from unsigned rerun inputs.

This branch trusts memento["test_files"] and memento["code_files"] to decide what to execute, but those fields are not authenticated here before the rerun starts. That means a tampered package memento can steer verification into running a different suite and only fail after recomputing the wrong CID. The rerun inputs need to be bound into the signed payload and verified first, or package recompute should refuse unless the package file is supplied.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py`
around lines 154 - 166, The package-recompute branch (the if block checking
memento.get("witness_kind") == "pytest-witness-package" and calling
build_suite_bundle) currently trusts memento["test_files"] and
memento["code_files"] without authentication; change it so you either (A) verify
those lists are present in the signed payload associated with the memento before
using them, or (B) refuse to recompute and raise an error unless an
authenticated package file (the original bundle) is supplied. Locate the branch
that calls build_suite_bundle(ws, list(memento.get("test_files", [])),
list(memento.get("code_files", []))) and replace the unconditional recompute
with logic that checks the signed/asserted inputs (or requires the package file)
and only calls build_suite_bundle when those inputs are cryptographically bound
to the memento; otherwise raise a clear RuntimeError rejecting recompute.

Comment on lines +111 to +134
subprocess.run(
[sys.executable, "-m", "pytest", test_file, "-q", "-p", "no:cacheprovider",
"-p", "_witness_collect"],
cwd=project_dir, capture_output=True, text=True, env=env,
)
results: dict = {}
if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
with open(out_path, encoding="utf-8") as f:
try:
results = _json.load(f)
except ValueError:
results = {}
finally:
if os.path.exists(out_path):
os.remove(out_path)

witnesses: List[Witness] = []
for nodeid, raw in sorted(results.items()):
if raw == "skipped":
continue # a skip is neither a discharge nor a refusal
outcome = "passed" if raw == "passed" else "failed"
cid = jcs_hash(_witness_value(cc, rc, nodeid, outcome, code_files))
witnesses.append(Witness(cc, rc, nodeid, outcome, tuple(sorted(code_files)), cid))
return witnesses
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refuse empty witness sets when pytest exited abnormally.

subprocess.run()'s return code is ignored here. If pytest aborts before the plugin writes any node results — e.g. collection/import failure, plugin load failure, internal error — this falls through as results = {} and returns no witnesses. build_suite_bundle() then treats that as a valid empty suite. At minimum, fail when pytest exits non-zero and no outcomes were recorded.

Suggested guard
-        subprocess.run(
+        proc = subprocess.run(
             [sys.executable, "-m", "pytest", test_file, "-q", "-p", "no:cacheprovider",
              "-p", "_witness_collect"],
             cwd=project_dir, capture_output=True, text=True, env=env,
         )
         results: dict = {}
         if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
             with open(out_path, encoding="utf-8") as f:
                 try:
                     results = _json.load(f)
                 except ValueError:
                     results = {}
+        if proc.returncode != 0 and not results:
+            raise RuntimeError(
+                f"pytest exited {proc.returncode} before emitting witness results for {test_file}"
+            )
🧰 Tools
🪛 Ruff (0.15.15)

[error] 111-111: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py`
around lines 111 - 134, subprocess.run(...) currently ignores pytest's return
code so if pytest aborts and the plugin never writes out_path we silently return
an empty results set; change the call to capture the CompletedProcess (e.g.
result = subprocess.run(...)) and after reading out_path check: if
result.returncode != 0 and not results: raise an exception (RuntimeError or
subprocess.CalledProcessError) indicating pytest exited non-zero with no
recorded outcomes so the suite cannot be treated as a valid empty suite; keep
the existing finally block that removes out_path and reference symbols:
subprocess.run, result.returncode, out_path, results, witnesses.

TSavo and others added 2 commits June 5, 2026 19:04
…proof carries

A flag (PROVEKIT_LEAN_SOURCE=1) gated the right thing -- so the DEFAULT proof
inlined the function body, a doubled copy of code the package already holds, and
you had to remember a flag to get the lean SourceMemento. A flag gating
correctness means the default is wrong. Deleted, not inverted.

- bind_lifter: `_body_source_locator` is the FULL reconstruction (locus + cids +
  body + ast_template) -- what the Source Oracle returns. The MINT path strips it
  to the SourceMemento (`source_memento_of`: file, source_cid, span, template_cid,
  param_names) before anything enters the `.proof`. The body NEVER touches the
  proof. No flag, no fat alternative.
- source_oracle: the flag-pop dance is gone; the reconstructor is unconditionally
  complete, so resolve just calls it.

Deleting the bad path flushed out every consumer that was secretly leaning on the
inline body (the whole point of deleting it):
- `_binding_template_from_sugar_entry` REQUIRED inline ast_template and silently
  DROPPED lean bindings. Recognize matches by template_cid (always present, even
  lean), so the gate was wrong; build the template for lean entries and carry
  `body_source` + `source_function_name` so the oracle can resolve.
- `materialize_impl` indexed only bindings that already had `body_text`, so a lean
  binding produced "no sugar binding in scope" and 0 materialized. Now it resolves
  the body through the Source Oracle -- the same resolution recognize uses.

Result: numpy-showcase mints lean with NO flag -- 0 inline body, SourceMementos
only. recognize + materialize + the snake-eats-tail fixpoint all resolve from disk
via the oracle, CID-verified, refusing on drift. The proof signs what you run; it
does not carry a copy of it.

Tests migrated off the deleted inline form to assert the SourceMemento + oracle
reconstruction. python-source 235, pytest-witness 25, py-tests 349 -- all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two CI failures on this branch were inherited from #1936 (TypeScript + bug-zoo
bankruptcy), red on main too -- references to things #1936 deleted but did not
clean up:

- `bug_zoo_machinery_is_self_contained` (cli_surface.rs) asserted
  `menagerie/bug-zoo/Cargo.toml` EXISTS -- but #1936 deleted the bug zoo. The
  test is obsolete; `provekit_cli_does_not_expose_zoo_subcommand` still holds and
  stays.
- `.github/workflows/provekit.yml` was a TS/node "prove" job whose only steps
  typecheck (`tsc --noEmit`) and test (`pnpm test`) the deleted
  `implementations/typescript` tree -- TS18003 "no inputs found" every run. A job
  that only checks deleted code is not a gate. The real signal is
  `acid test (make ci)` (rust + python).
- `tsconfig.json` pinned `include: implementations/typescript/src` (gone).

Deleted all three. The acid test now has no dangling rust test; the dead TS gate
is removed rather than left rotting red.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py`:
- Around line 706-711: The lean binding path currently skips creating a
recognition pattern when `ast_template` is absent, so self-resolved entries
(which forward `ast_template` from `body_source`) never get passed to
`_recognize_calls_anywhere(...)` and lose body-anywhere coverage; update the
logic around `template_cid` handling to derive and pass a pattern when
`ast_template` is missing by using `body_source`/`body_source.ast_template` (or
the forwarded `ast_template` if present) before any oracle reconstruction,
ensuring `_recognize_calls_anywhere(template_cid, pattern, ...)` is called;
change the code paths in the block referencing `template_cid`, `ast_template`,
`body_source`, and where `_recognize_calls_anywhere` is invoked so lean entries
receive a constructed pattern and retain body-anywhere recognition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 71af82ad-0384-40cf-97ae-618558eda41b

📥 Commits

Reviewing files that changed from the base of the PR and between 1d790a9 and b997848.

📒 Files selected for processing (8)
  • .github/workflows/provekit.yml
  • implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_lifter.py
  • implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
  • implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/source_oracle.py
  • implementations/python/provekit-lift-python-source/tests/test_bind_lifter.py
  • implementations/python/provekit-lift-python-source/tests/test_recognize_source_oracle.py
  • implementations/rust/provekit-cli/tests/cli_surface.rs
  • tsconfig.json
💤 Files with no reviewable changes (3)
  • .github/workflows/provekit.yml
  • tsconfig.json
  • implementations/rust/provekit-cli/tests/cli_surface.rs

Comment on lines +706 to +711
# `template_cid` is the recognize key (recognize matches by cid, not by the
# inline template) and is ALWAYS present -- even in a lean SourceMemento. The
# inline `ast_template`/`body_text` are absent in a lean binding; the consumer
# (recognize match / materialize fill) resolves them through the Source Oracle.
# Carry `body_source` + `source_function_name` so the oracle can resolve.
if not isinstance(template_cid, str) or not template_cid:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Self-resolved lean bindings lose body-anywhere recognition coverage.

The template adapter now forwards ast_template from body_source, but lean entries intentionally omit it. Since self-resolved templates are appended before any oracle reconstruction, _recognize_calls_anywhere(...) skips them (no pattern), so only exact-CID whole-function matches remain for those bindings.

Suggested fix
diff --git a/implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py b/implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
@@ def _self_resolved_binding_templates(
-        template = _binding_template_from_sugar_entry(entry)
+        template = _binding_template_from_sugar_entry(entry)
         if template is not None:
+            if template.get("ast_template") is None:
+                resolved = _resolve_via_source_oracle(str(root), template)
+                if resolved is not None:
+                    template = {
+                        **template,
+                        "ast_template": resolved.get("ast_template"),
+                        "body_text": template.get("body_text") or resolved.get("body_text"),
+                    }
             templates.append(template)

Also applies to: 718-718, 723-724

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py`
around lines 706 - 711, The lean binding path currently skips creating a
recognition pattern when `ast_template` is absent, so self-resolved entries
(which forward `ast_template` from `body_source`) never get passed to
`_recognize_calls_anywhere(...)` and lose body-anywhere coverage; update the
logic around `template_cid` handling to derive and pass a pattern when
`ast_template` is missing by using `body_source`/`body_source.ast_template` (or
the forwarded `ast_template` if present) before any oracle reconstruction,
ensuring `_recognize_calls_anywhere(template_cid, pattern, ...)` is called;
change the code paths in the block referencing `template_cid`, `ast_template`,
`body_source`, and where `_recognize_calls_anywhere` is invoked so lean entries
receive a constructed pattern and retain body-anywhere recognition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant