pandas: testing lift seat + showcase + the 2×4 guard-shape matrix by TSavo · Pull Request #1937 · TSavo/provekit

TSavo · 2026-06-05T23:23:44Z

The next package up the ladder from numpy is pandas — and per the federation design, it is just another lifter package. This PR adds the seat, the showcase that proves real pandas with zero changes, and the 2×4 guard-shape matrix across numpy + pandas.

1. `provekit-lift-py-pandas-testing` (the seat)

Exact sibling of provekit-lift-py-numpy-testing, reusing the pytest seat's term/formula machinery. The EXACT/APPROX discipline is sharper than numpy because pandas is approximate-by-default: assert_frame_equal/assert_series_equal compare floats with a tolerance unless check_exact is pinned (pandas 3.0.3: the default is the <no_default> sentinel). So it lifts as = only when check_exact=True is explicitly pinned and no relation-altering kwarg is present; everything else is a loud refuse. Claims the whole pandas._testing (tm) 22-name vocabulary so nothing real is silently skipped. 17 unit tests; wired into make test-python.

Honest scope: frame equality is opaque-EUF on both sides, so z3 can't contradict two opaque DataFrames (same as numpy array equality). The consistency teeth come from scalar assertions; this seat's job is sound refusal-of-approximate + witness keying.

2. `examples/pandas-showcase` (the proof)

Real pandas project with one buggy self-contradictory test → proves the good code and refuses the bug both ways (consistency: ==6 consistent / ==6 AND ==7 UNSAT; witness: reproduces / run failed). run.sh self-checks. Green: 4 discharged, 2 refused.

3. `examples/python-guard-shapes` (the 2×4 matrix)

Guard shapes #2 index-bounds, #3 empty-container, #4 divide-by-zero, #5 key-access × {numpy, pandas} = 8 cells, each a real library idiom with a discrimination mutant.

These are runtime faults (IndexError/ValueError/KeyError/silent inf), not logical contradictions — so the consistency axis is blind and only the witness axis catches them. Divide-by-zero is the sharp cell: numpy/pandas return silent inf, caught via an isfinite assertion. run.sh checks the verdict per file (every _ok discharged, every _bad refused). Green: 8 discharged, 8 refused.

Signed-receipt `verify`

The stricter verify (witness-body re-resolution) now passes too. A one-line fix: the resolve_witness recompute guard tested code_files for truthiness, but an all-tests project pins an empty code_files (the code under test is the installed library), and [] is falsy. Gated on presence instead; verify now re-resolves the good witnesses by recompute and refuses the bad ones as failed runs. Regression test added (kit: 20 passed).

Summary by CodeRabbit

New Features
- Added pandas.testing assertion lifting with exact-equality pinning
- Whole-suite, content-addressed witness packages for pytest runs
- New pandas-showcase and python-guard-shapes examples with runnable self-check scripts
Documentation
- Example READMEs explaining multi-axis verification and witness environment setup
Bug Fixes
- Improved RPC UTF-8 serialization to avoid encoding failures

The next package up the ladder from numpy is pandas, and per the federation design it is just another lifter package: provekit-lift-py-pandas-testing, the exact sibling of provekit-lift-py-numpy-testing, reusing the pytest seat's term/formula machinery so equivalent claims federate. The pandas EXACT/APPROXIMATE discipline is SHARPER than numpy's because pandas makes approximate the DEFAULT: assert_frame_equal / assert_series_equal compare floats with a tolerance unless check_exact is pinned. (Confirmed against pandas 3.0.3: check_exact's default is the <no_default> sentinel, resolved at runtime by dtype.) So lifting an un-pinned frame assertion as `=` would FALSE-PASS, claiming exact equality pandas never checked. Version-independently, the seat lifts a frame/series/index/extension-array assertion as `=` ONLY when check_exact=True is explicitly pinned AND no relation-altering keyword (check_like, rtol/atol, check_dtype, ...) is present; everything else is a LOUD REFUSE. The whole pandas._testing (tm) 22-name vocabulary is recognised so nothing real is silently skipped -- the trichotomy: express the exact, loudly refuse the rest. assert_equal is deliberately excluded (cross-library generic name owned by the numpy/generic seat). Honest scope: a frame/series equality is opaque-EUF on both sides, so z3 cannot manufacture a contradiction between two opaque DataFrame constructors -- same as numpy array equality. The CONSISTENCY teeth for pandas come from SCALAR assertions, which the pytest bare-assert seat already lifts (and this seat lifts when they ride a claimed pandas.testing test). This seat's load-bearing job is sound loud-refusal of approximate frame comparison plus callsite keying of the pandas op under test for the witness axis. 17 unit tests: positive lifts (check_exact pinned), the approximate-by-default refusal, relation-altering-kwarg refusals, mutation/control-flow/side-effect refusals, SSA independence, broad-tm claim+refuse, and assert_equal non-claim. Wired into the Makefile test-python loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Stands up pandas next to numpy, proving correctness with zero code changes. Differs from numpy-showcase only by swapping in the pandas.testing seat and pointing the witness venv at pandas -- the next package is just another lifter package, as designed. Three lift seats over the project: the plain pytest CONSISTENCY seat (scalar assertions, where z3's teeth are), the pandas.testing seat (frame assertions; approximate-by-default refused unless check_exact pinned), and pytest-witness (runs the tests under real pandas). The project deliberately contains a buggy (self-contradictory) test, so the showcase proves the CORRECT pandas code AND catches the contradiction -- refused BOTH ways: consistency: Series.sum() == 6 consistent (discharged); == 6 AND == 7 UNSAT (refused). witness : the good tests reproduce (discharged); the contradictory run is 'failed' (refused). run.sh is self-checking: it asserts provekit produces exactly that verdict and exits non-zero otherwise. Verified green (4 discharged, 2 refused). Scope: demonstrates lift + prove (consistency + witness by recompute). The stricter signed-receipt verify path re-resolves each witness body, which for the pytest-witness kit needs a packaged witness body or full re-run metadata forwarded by the verifier -- a follow-up increment, noted in the README. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-05T23:23:51Z

Looking for one thing? Review this PR in Change Stack to search files, summaries, diffs, and code without losing your place.

Walkthrough

This PR adds a pandas.testing assertion lifter with an LSP server and tests, changes pytest-witness to emit and resolve suite-level witness packages, integrates the pandas suite into Makefile tests, and adds two runnable example projects (pandas-showcase and python-guard-shapes) with run scripts that self-validate witness outcomes.

Changes

Pandas.testing Lift Implementation

Layer / File(s)	Summary
Package setup `implementations/python/provekit-lift-py-pandas-testing/pyproject.toml`, `implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/__init__.py`	Package metadata, setuptools build config, and module initializer re-exporting `lift_file_pandas_testing`.
LSP server and file handling `implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/lsp.py`	JSON-RPC over stdio with handlers for `initialize`, `kit_declaration`, and `lift`; filesystem traversal to discover `.py` files; per-file lifting into canonical JSON IR and aggregated warnings.
Pandas.testing assertion lifting `implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py`	Core AST-based lifter recognizing pandas.testing equality assertions, enforcing `check_exact=True` for exact lifting, refusing approximate/tolerance assertions, gating permitted statements, performing SSA/versioning, and grouping atoms by callsite-keyed contract.
Lift test suite `implementations/python/provekit-lift-py-pandas-testing/tests/test_pandas_testing_layer.py`	Unit tests covering positive lifting, refusal rules, SSA independence, statement-gating refusals, and claiming discipline for pandas.testing vs broader `tm` vocabulary.
Build integration `Makefile`	Adds the pandas-testing suite run in a dedicated venv under `test-python` and aggregates failures into the existing `$$failed` check.

Pandas Showcase (Dual-Axis Example)

Layer / File(s)	Summary
Configuration and infrastructure `examples/pandas-showcase/.provekit/config.toml`, `.provekit/lift/*/manifest.toml`, `examples/pandas-showcase/README.md`, `examples/pandas-showcase/run.sh`	Project config defining three lift surfaces (`python`, `python-pandas-testing`, `python-pytest-witness`), Z3 solver config, lift kit manifests, README, and orchestration script that runs `provekit mint`/`prove` and self-validates consistency + witness verdicts.
Test cases `examples/pandas-showcase/test_pandas_frame.py`, `examples/pandas-showcase/test_pandas_sum.py`, `examples/pandas-showcase/test_pandas_sum_bad.py`	Frame round-trip pinned to exact equality, column-sum verification, and a deliberately contradictory test producing refusal on the consistency and witness axes.

Python Guard Shapes (Runtime Guard Example)

Layer / File(s)	Summary
Configuration and infrastructure `examples/python-guard-shapes/.provekit/config.toml`, `.provekit/lift/python-pytest-witness/manifest.toml`, `examples/python-guard-shapes/README.md`, `examples/python-guard-shapes/run.sh`	Project config and manifest for pytest-witness lift, README documenting the 4 bug classes × 2 libraries matrix, and run script that provisions a witness venv, mints witness bundles, and validates witness-package outcomes via JSON parsing.
Test cases `examples/python-guard-shapes/test__numpy_{ok,bad}.py`, `examples/python-guard-shapes/test__pandas_{ok,bad}.py`	Sixteen tests covering index bounds, empty container, divide-by-zero, and key access across NumPy and pandas with ok/bad variants used by the witness self-checks.

Pytest Witness: suite packaging and runtime

Layer / File(s)	Summary
Witness collection plugin `implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py`	Pytest plugin that records per-node outcomes and can write a JSON map to `PROVEKIT_WITNESS_OUT` for downstream use.
Suite-level witness packaging `implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py`	`handle_lift` now builds a content-addressed witness bundle for the suite and emits a single `ContractDecl` named `witness-package:{cid}` with suite metadata; `handle_resolve_witness` supports package recompute and verification.
Witness runtime & bundle primitives `implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py`	Per-file pytest runs producing per-node witnesses, bundle read/write/lookup helpers, suite bundle building, package mementos, and discharge logic that refuses when any per-test witness is not `passed`.
Witness tests `implementations/python/provekit-lift-py-pytest-witness/tests/test_witness.py`	Tests for suite packaging, refusal on failing tests or tampered CID, per-test witness minting/recompute, content-addressed JSONL semantics, and an RPC recompute regression when `code_files` is empty.

RPC & Source-oracle changes

Layer / File(s)	Summary
Bind lifter: SourceMemento `implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_lifter.py`	Switches library sugar binding body_source to a lean SourceMemento (CIDs + locus), adds source_memento_of helper, and ensures `_body_source_locator` populates template/body_text fields.
RPC materialize and template handling `implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py`	Materialize now resolves missing body_text via the source oracle for lean entries, `_binding_template_from_sugar_entry` keys on `template_cid` and carries body_source/source_function_name, and `_send()` writes UTF-8 bytes to stdout.buffer with replace semantics.
Source oracle recompute behavior `implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/source_oracle.py`	Resolve no longer overrides PROVEKIT_LEAN_SOURCE when recomputing, so lean-mode environment can affect recomputation results.
Tests updated for lean body_source `implementations/python/provekit-lift-python-source/tests/*`	Multiple tests updated to expect lean body_source (template_cid/source_cid) and to recompute expected templates from on-disk source where needed.

Repository-level / CI changes

Layer / File(s)	Summary
Makefile integration `Makefile`	`test-python` now runs the pandas-testing suite (venv + pytest) and aggregates its failures into the `$$failed` result.
CI/workflow and config removals `.github/workflows/provekit.yml`, `tsconfig.json`	Removes the previous GitHub Actions workflow and the tsconfig.json shown in the diff.
Rust test deletion `implementations/rust/provekit-cli/tests/cli_surface.rs`	Removes an obsolete CLI test block asserting Bug Zoo harness self-containment.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

TSavo/provekit#1922: Overlapping work on pytest-witness recompute and witness semantics.

A rabbit hops through testing ground,
Where pandas assertions sound,
Exact pins kept with gentle art,
Suite bundles hold each test's heart,
Witness and proof now dance in sync — hooray! 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.28% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: addition of a pandas testing lift seat, a pandas showcase example, and a 2×4 guard-shape matrix of test examples.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/pandas-testing-seat

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 07533f0056

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-05T23:26:21Z

+        except ValueError as e:
+            skipped.append(f"`{_unparse(stmt)[:60]}`: {e}")
+            continue


Refuse the whole test after an unliftable pandas assertion

When a test mixes an unpinned/approximate assert_frame_equal(...) with any liftable bare assert, this continue only records the pandas refusal in skipped; if another assertion is keyed later, the function still emits a successful partial contract. That bypasses the intended loud refusal for approximate pandas comparisons and can let prove pass while silently ignoring the frame assertion that made the test unsafe to lift as exact equality.

Useful? React with 👍 / 👎.

Guard shapes #2-#5 (index-bounds, empty-container, divide-by-zero, key-access) proved on BOTH numpy and pandas -- 8 cells, each a real library idiom with a discrimination mutant. Zero code changes. These are RUNTIME faults (IndexError / ValueError / KeyError / silent inf), not logical contradictions, so the consistency (z3) axis is blind to them: `a[5] == 0` is a consistent proposition. Only running the code catches the bug, so this example registers ONLY the pytest-witness seat -- the axis that proves correctness by execution. It re-runs each case under real numpy/pandas: the guarded `_ok` case is witnessed (discharged); the `_bad` case breaches the guard so the run raises (or, for divide-by-zero, the finiteness assertion fails) and the witness is refused. Divide-by-zero is the sharp cell: numpy/pandas return silent `inf` rather than raising; provekit catches it because `_ok` asserts isfinite and `_bad` fails it. The witness is per-file (runs `pytest <file>`), so `_ok`/`_bad` are separate files -- 8 cells x 2 = 16. run.sh checks the verdict PER FILE (every `_ok` discharged, every `_bad` refused), so a swapped or missing verdict fails the gate. Verified green: 8 discharged, 8 refused. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…iles) The signed-receipt verify path refused good witnesses with "no package file and not re-runnable" on projects that are all tests (numpy/pandas showcases). Root cause: the resolve_witness RECOMPUTE guard tested `memento.get("code_files")` for truthiness, but an all-tests project pins an EMPTY code_files (the code under test is the installed library, not a local file), and `[]` is falsy in Python. So a trivially re-runnable witness was declared not re-runnable. Fix is one line: gate on `code_files is not None` (present), not truthiness. The empty list is part of the pinned witness body and reconstructs correctly, so the tamper check at line 150 still holds -- this stays sound. After the fix, verify re-resolves the good witnesses by recompute (rust re-runs the test, recomputes the CID, matches) and refuses the contradictory ones as failed runs -- the same verdict prove gives. Confirmed on pandas-showcase and python-guard-shapes. Adds a regression test driving the resolve_witness RPC with an empty-code_files witness (would fail pre-fix), and drops the now-stale verify caveat from the pandas-showcase README. Kit tests: 20 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py (1)

168-196: 💤 Low value

Rename variable l to left or lhs for clarity.

The variable l at line 193 can be confused with the digit 1 in many fonts. Consider renaming it to left or lhs for better readability. Similarly, r at line 194 could be right or rhs.

♻️ Proposed refactor

-        l = _translate_term_scoped(call.args[0], scope, call_vars)
-        r = _translate_term_scoped(call.args[1], scope, call_vars)
-        return comparison_with_none_guard("=", l, r, emit_none_guard=False)
+        left = _translate_term_scoped(call.args[0], scope, call_vars)
+        right = _translate_term_scoped(call.args[1], scope, call_vars)
+        return comparison_with_none_guard("=", left, right, emit_none_guard=False)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py`
around lines 168 - 196, In _lift_pdt_assertion_scoped rename the local variables
l and r to clearer names (e.g., left/lhs and right/rhs): replace the assignments
to l and r from _translate_term_scoped with left (or lhs) and right (or rhs) and
update the subsequent return call to use those new names in
comparison_with_none_guard; ensure any other references to l and r in that
function are updated to the new identifiers for consistency and clarity.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/pandas-showcase/README.md`:
- Line 48: Replace the likely typo in the README sentence that currently reads
"pandas, not fame, is the right next rung" by changing "fame" to "frame" so the
sentence reads "pandas, not frame, is the right next rung" (locate the sentence
text in README.md to edit).

In `@examples/pandas-showcase/run.sh`:
- Line 22: The script currently uses "set -uo pipefail" which omits the -e flag
so failures in commands like cd, mint, or prove may be ignored; update the shell
options to "set -euo pipefail" (or add -e alongside the existing flags) at the
top of the script to fail fast on any command error, and audit the command
sequences around the cd/mint/prove invocations to ensure no subshells or traps
suppress exit codes (the same change should be applied to the other occurrences
of the same set invocation near the mint/prove blocks).
- Around line 30-33: The script exposes PANDAS_WITNESS_VENV via VENV but other
artifacts (lift manifests) call /tmp/pandas-witness-venv/bin/python directly, so
either make all invocations respect the VENV variable or remove the misleading
override; update the script to consistently reference the VENV variable
(PANDAS_WITNESS_VENV -> VENV) everywhere and change any hardcoded references to
/tmp/pandas-witness-venv/bin/python to use "$VENV/bin/python", or if you prefer
the minimal fix, remove PANDAS_WITNESS_VENV and hardcode VENV to
/tmp/pandas-witness-venv so behavior matches the manifests.

In `@examples/python-guard-shapes/run.sh`:
- Around line 14-33: The script currently uses "set -uo pipefail" but lacks -e
so failing commands like cd, "$BIN" mint, or the PATH-prefixed "$BIN" prove can
be ignored; update the shebang/options to "set -euo pipefail" (or add explicit
"|| exit 1" after critical commands) so the script exits immediately on errors,
ensuring failures in cd, the mint call, or the prove call
(PATH="$VENV/bin:$PATH" "$BIN" prove ...) abort the run and surface the actual
fault.

In `@examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py`:
- Around line 1-2: The top-of-file comment is misleading: in
test_divide_by_zero_pandas_ok.py the divisor used is non-zero (2.0) and the test
is an OK/discharged case, so update the header lines that currently mention
"divide by zero" and "zero divisor yields inf" to state this is the non-zero
divisor (2.0) case, that pandas performs the real division and yields finite
results, and that the witness RUNS the real pandas code and is discharged.

---

Nitpick comments:
In
`@implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py`:
- Around line 168-196: In _lift_pdt_assertion_scoped rename the local variables
l and r to clearer names (e.g., left/lhs and right/rhs): replace the assignments
to l and r from _translate_term_scoped with left (or lhs) and right (or rhs) and
update the subsequent return call to use those new names in
comparison_with_none_guard; ensure any other references to l and r in that
function are updated to the new identifiers for consistency and clarity.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35645acd-9785-4340-bd9f-36a4bd501f2a

📥 Commits

Reviewing files that changed from the base of the PR and between ac55f14 and 308ce0e.

📒 Files selected for processing (35)

Makefile
examples/pandas-showcase/.provekit/config.toml
examples/pandas-showcase/.provekit/lift/python-pandas-testing/manifest.toml
examples/pandas-showcase/.provekit/lift/python-pytest-witness/manifest.toml
examples/pandas-showcase/.provekit/lift/python/manifest.toml
examples/pandas-showcase/README.md
examples/pandas-showcase/run.sh
examples/pandas-showcase/test_pandas_frame.py
examples/pandas-showcase/test_pandas_sum.py
examples/pandas-showcase/test_pandas_sum_bad.py
examples/python-guard-shapes/.provekit/config.toml
examples/python-guard-shapes/.provekit/lift/python-pytest-witness/manifest.toml
examples/python-guard-shapes/README.md
examples/python-guard-shapes/run.sh
examples/python-guard-shapes/test_divide_by_zero_numpy_bad.py
examples/python-guard-shapes/test_divide_by_zero_numpy_ok.py
examples/python-guard-shapes/test_divide_by_zero_pandas_bad.py
examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py
examples/python-guard-shapes/test_empty_container_numpy_bad.py
examples/python-guard-shapes/test_empty_container_numpy_ok.py
examples/python-guard-shapes/test_empty_container_pandas_bad.py
examples/python-guard-shapes/test_empty_container_pandas_ok.py
examples/python-guard-shapes/test_index_bounds_numpy_bad.py
examples/python-guard-shapes/test_index_bounds_numpy_ok.py
examples/python-guard-shapes/test_index_bounds_pandas_bad.py
examples/python-guard-shapes/test_index_bounds_pandas_ok.py
examples/python-guard-shapes/test_key_access_numpy_bad.py
examples/python-guard-shapes/test_key_access_numpy_ok.py
examples/python-guard-shapes/test_key_access_pandas_bad.py
examples/python-guard-shapes/test_key_access_pandas_ok.py
implementations/python/provekit-lift-py-pandas-testing/pyproject.toml
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/__init__.py
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/lsp.py
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py
implementations/python/provekit-lift-py-pandas-testing/tests/test_pandas_testing_layer.py

coderabbitai · 2026-06-05T23:43:33Z

+its tests assert scalars (`.sum() == 6`) alongside frame equality. A package
+whose tests only assert `assert_allclose` over giant tensors (TensorFlow,
+Transformers) would get the witness axis but a near-vacuous consistency axis —
+which is exactly why pandas, not fame, is the right next rung.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the likely typo in the rationale sentence.

Line 48 says “pandas, not fame,” which reads like a typo and weakens the explanation.

Suggested edit

-which is exactly why pandas, not fame, is the right next rung. +which is exactly why pandas, not frame, is the right next rung.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

which is exactly why pandas, not fame, is the right next rung.

which is exactly why pandas, not frame, is the right next rung.

🧰 Tools

🪛 LanguageTool

[style] ~48-~48: Consider an alternative for the overused word “exactly”.
Context: ...ear-vacuous consistency axis — which is exactly why pandas, not fame, is the right next...

(EXACTLY_PRECISELY)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/pandas-showcase/README.md` at line 48, Replace the likely typo in the README sentence that currently reads "pandas, not fame, is the right next rung" by changing "fame" to "frame" so the sentence reads "pandas, not frame, is the right next rung" (locate the sentence text in README.md to edit).

Source: Linters/SAST tools

coderabbitai · 2026-06-05T23:43:33Z

+# test_pandas_sum_bad.py, so the showcase proves the CORRECT pandas code AND
+# catches the contradiction -- refused BOTH ways. This script PASSES iff
+# provekit produces exactly that verdict.
+set -uo pipefail


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on command errors in the orchestration script.

Line 22 omits -e, so failed cd, mint, or prove steps can be silently ignored and the script may continue with invalid state.

Suggested hardening

-set -uo pipefail +set -euo pipefail @@ -cd "$HERE" +cd "$HERE" || exit 1

Also applies to: 36-36, 41-44

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/pandas-showcase/run.sh` at line 22, The script currently uses "set -uo pipefail" which omits the -e flag so failures in commands like cd, mint, or prove may be ignored; update the shell options to "set -euo pipefail" (or add -e alongside the existing flags) at the top of the script to fail fast on any command error, and audit the command sequences around the cd/mint/prove invocations to ensure no subshells or traps suppress exit codes (the same change should be applied to the other occurrences of the same set invocation near the mint/prove blocks).

Source: Linters/SAST tools

coderabbitai · 2026-06-05T23:43:33Z

+set -uo pipefail
+HERE="$(cd "$(dirname "$0")" && pwd)"
+REPO="$(cd "$HERE/../.." && pwd)"
+BIN="$REPO/implementations/rust/target/debug/provekit"
+
+VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}"
+if [ ! -x "$VENV/bin/python" ]; then
+  python3 -m venv "$VENV"
+  "$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2
+fi
+
+cd "$HERE"
+rm -f blake3-512:*.proof 2>/dev/null || true
+rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true
+
+echo "== mint (pytest-witness over the 16 cases) =="
+"$BIN" mint --out . --quiet
+
+echo "== prove (witness axis: re-run each case under real numpy/pandas) =="
+PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enable fail-fast for gate reliability.

The script does not stop on command failure (set -uo pipefail), so failed cd/mint/prove can fall through and produce misleading gate output. Add -e (or explicit || exit) so orchestration fails at the real fault point.

Suggested patch

-set -uo pipefail +set -euo pipefail

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

set -uo pipefail

HERE="$(cd "$(dirname "$0")" && pwd)"

REPO="$(cd "$HERE/../.." && pwd)"

BIN="$REPO/implementations/rust/target/debug/provekit"

VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}"

if [ ! -x "$VENV/bin/python" ]; then

python3 -m venv "$VENV"

"$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2

fi

cd "$HERE"

rm -f blake3-512:*.proof 2>/dev/null || true

rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true

echo "== mint (pytest-witness over the 16 cases) =="

"$BIN" mint --out . --quiet

echo "== prove (witness axis: re-run each case under real numpy/pandas) =="

PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json

set -euo pipefail

HERE="$(cd "$(dirname "$0")" && pwd)"

REPO="$(cd "$HERE/../.." && pwd)"

BIN="$REPO/implementations/rust/target/debug/provekit"

VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}"

if [ ! -x "$VENV/bin/python" ]; then

python3 -m venv "$VENV"

"$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2

fi

cd "$HERE"

rm -f blake3-512:*.proof 2>/dev/null || true

rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true

echo "== mint (pytest-witness over the 16 cases) =="

"$BIN" mint --out . --quiet

echo "== prove (witness axis: re-run each case under real numpy/pandas) =="

PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 25-25: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/python-guard-shapes/run.sh` around lines 14 - 33, The script currently uses "set -uo pipefail" but lacks -e so failing commands like cd, "$BIN" mint, or the PATH-prefixed "$BIN" prove can be ignored; update the shebang/options to "set -euo pipefail" (or add explicit "|| exit 1" after critical commands) so the script exits immediately on errors, ensuring failures in cd, the mint call, or the prove call (PATH="$VENV/bin:$PATH" "$BIN" prove ...) abort the run and surface the actual fault.

Source: Linters/SAST tools

coderabbitai · 2026-06-05T23:43:33Z

+# Guard shape #4 -- divide by zero (pandas), ok case. pandas inherits numpy: zero divisor yields inf, finiteness assertion fails
+# The witness RUNS the real pandas code: this case is witnessed (discharged).


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix misleading header comment for the _ok case.

The comment describes the zero-divisor failure path, but this test uses a non-zero divisor (2.0) and is expected to discharge. Please update wording to match the actual scenario.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py` around lines 1 - 2, The top-of-file comment is misleading: in test_divide_by_zero_pandas_ok.py the divisor used is non-zero (2.0) and the test is an OK/discharged case, so update the header lines that currently mention "divide by zero" and "zero divisor yields inf" to state this is the non-zero divisor (2.0) case, that pandas performs the real division and yields finite results, and that the witness RUNS the real pandas code and is discharged.

At full-package scale the per-FILE witness fell over: the Oracle ran `pytest <file>` and keyed ONE witness off the file's exit code, so a single failing test out of ~1,600 refused the whole file (numpy: all 179 files refused, ~every passing test discarded). That is not a numpy finding, it is a granularity defect. Running witnessing is the Oracle's job, so the fix lives in the Oracle's run primitive, shared by mint (lift) and verify (recompute). PER-TEST (witness.py + _witness_collect.py): run_file_witnesses runs the file ONCE -- in the package's own execution context (conftest, fixtures, relative imports) -- and a tiny pytest plugin (_witness_collect, json/os only, loaded standalone so it never drags in the kit dep chain) records every test's outcome. One file run -> one witness PER TEST, keyed by the pytest node id. run_and_witness routes a node id back through its file, so LIFT and RECOMPUTE agree on outcome under shared file state. Proven on real numpy test_scalarmath.py: 1,582 per-test witnesses, 1,581 discharged, 1 isolated refusal (was: whole file refused). .witness BUNDLE (witness.py): A per-test run of a real package yields thousands of witnesses; one file per CID or thousands of inline .proof mementos do not scale. write/read_witness_bundle pack MANY bodies into ONE content-addressed .witness file -- JSONL, one canonical body per line, blake3(line) == that witness's cid. The content IS the key: no index, nothing to trust; the Oracle resolves by scan + blake3, exactly its content-address check. read_witness_body now resolves from bundles too. mint writes the bundle to .provekit/witnesses/; the .proof carries only the signed pointers. Also fixes a real full-scale robustness bug: bind_rpc._send crashed with UnicodeEncodeError when a pandas source emoji (🐍) split a surrogate pair across the 9,802-binding response; now encodes with errors="replace" so one pathological character never aborts the run. Lesson recorded: a real package's suite must be witnessed against the INSTALLED package, not a loose copy -- a compiled package (numpy) cannot be duplicated and imported alongside itself (ndarray-type identity clash). Kit: 23 tests (3 new: per-test, recompute-agreement, content-addressed bundle). Examples green: guard-shapes 8/8, pandas-showcase both axes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The full-package numpy proof was 900 MB / 99,189 members because the proof carried 48,935 witnesses BY VALUE (a ContractDecl + memento per test). That is an addressing bug, not a size problem: a proof should carry witnesses BY REFERENCE -- one content-addressed package, not N inline bodies. The oracle mints, the verifier asks the oracle, the proof carries one cid: - MINT (the oracle): `build_suite_bundle` runs the whole suite per-test and assembles ONE content-addressed `.witness` package (cid = blake3(bundle)). `handle_lift` emits ONE `WitnessPackageMemento` (a pointer + signature over the package cid, carrying the test/code files the oracle needs to reproduce it) and ONE contract whose evidence pins the package cid. NOT N mementos. - VERIFY (the verifier asks the oracle): UNCHANGED. `try_witness_discharge` already spawns the kit's discharge command; `witness_verify` already calls `resolve_witness`. So discharge re-runs the suite, rebuilds the package, and confirms it reproduces the pinned cid (`discharge_bundle`); the signed dimension resolves the package by cid (package file or recompute) and content-addresses it. No rust change. - THE PROOF CARRIES ONE WITNESS CID: the package. Per-test granularity lives IN the package (each line a witness body, self-addressing), committed by the one cid. Result on guard-shapes: proof 900MB-shape -> 4,394 bytes, 99k members -> 2 (one package contract + one WitnessPackageMemento), the 16 per-test facts in the content-addressed package. Discharge re-runs the suite and names the exact 8 failing tests. verify resolves the package via the package file, blake3 matches. A package DISCHARGES iff it reproduces AND every test passed; a failing test refuses it (naming the failures) -- which is what a failing test in a package means. Examples read per-test discrimination straight from the package. Kit: 25 tests (2 new: package one-cid roundtrip + refuses-on-failure/drift). guard-shapes + pandas-showcase green. No compressor, no RNG -- content-addressing is the dedup, determinism is the point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

examples/python-guard-shapes/run.sh (1)
14-14: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Enable fail-fast for gate reliability.

The script does not stop on command failure (set -uo pipefail), so failed cd/mint/prove can fall through and produce misleading gate output. Add -e (or explicit || exit) so orchestration fails at the real fault point.
🛡️ Suggested patch
-set -uo pipefail
+set -euo pipefail
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/python-guard-shapes/run.sh` at line 14, The script currently enables
-u and pipefail but not -e, so failures in commands like cd, mint, or prove can
be ignored; update the shell options line that reads "set -uo pipefail" to
include -e (making it "set -euo pipefail") so the script fails fast on any
command error and prevents misleading gate outputs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py`:
- Around line 21-29: pytest_runtest_logreport currently ignores report.when ==
"teardown" so a test that fails during teardown can remain recorded as "passed";
update pytest_runtest_logreport (and the _RESULTS map) to treat teardown
failures like setup failures by overwriting the node outcome when report.when ==
"teardown" and report.outcome != "passed" (e.g., set _RESULTS[nodeid] =
report.outcome or setdefault to "error" as appropriate) so teardown failures
replace any prior "passed" entry for that nodeid.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py`:
- Around line 154-166: The package-recompute branch (the if block checking
memento.get("witness_kind") == "pytest-witness-package" and calling
build_suite_bundle) currently trusts memento["test_files"] and
memento["code_files"] without authentication; change it so you either (A) verify
those lists are present in the signed payload associated with the memento before
using them, or (B) refuse to recompute and raise an error unless an
authenticated package file (the original bundle) is supplied. Locate the branch
that calls build_suite_bundle(ws, list(memento.get("test_files", [])),
list(memento.get("code_files", []))) and replace the unconditional recompute
with logic that checks the signed/asserted inputs (or requires the package file)
and only calls build_suite_bundle when those inputs are cryptographically bound
to the memento; otherwise raise a clear RuntimeError rejecting recompute.

In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py`:
- Around line 111-134: subprocess.run(...) currently ignores pytest's return
code so if pytest aborts and the plugin never writes out_path we silently return
an empty results set; change the call to capture the CompletedProcess (e.g.
result = subprocess.run(...)) and after reading out_path check: if
result.returncode != 0 and not results: raise an exception (RuntimeError or
subprocess.CalledProcessError) indicating pytest exited non-zero with no
recorded outcomes so the suite cannot be treated as a valid empty suite; keep
the existing finally block that removes out_path and reference symbols:
subprocess.run, result.returncode, out_path, results, witnesses.

---

Duplicate comments:
In `@examples/python-guard-shapes/run.sh`:
- Line 14: The script currently enables -u and pipefail but not -e, so failures
in commands like cd, mint, or prove can be ignored; update the shell options
line that reads "set -uo pipefail" to include -e (making it "set -euo pipefail")
so the script fails fast on any command error and prevents misleading gate
outputs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e8a7c37-2423-4961-a13f-aa3ba5262efd

📥 Commits

Reviewing files that changed from the base of the PR and between 308ce0e and 1d790a9.

📒 Files selected for processing (8)

examples/pandas-showcase/README.md
examples/pandas-showcase/run.sh
examples/python-guard-shapes/run.sh
implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py
implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py
implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py
implementations/python/provekit-lift-py-pytest-witness/tests/test_witness.py
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py

✅ Files skipped from review due to trivial changes (1)

examples/pandas-showcase/README.md

🚧 Files skipped from review as they are similar to previous changes (1)

examples/pandas-showcase/run.sh

coderabbitai · 2026-06-06T01:32:17Z

+def pytest_runtest_logreport(report):
+    nodeid = report.nodeid
+    if report.when == "call":
+        # passed | failed | skipped for the test body itself
+        _RESULTS[nodeid] = report.outcome
+    elif report.when == "setup" and report.outcome == "failed":
+        # a setup/collection error never reaches the call phase; record it so the
+        # test is a witness (a 'failed' one), not silently dropped.
+        _RESULTS.setdefault(nodeid, "error")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat teardown failures as non-passing outcomes.

pytest_runtest_logreport also fires for when == "teardown". Right now a test that passes in call but fails during fixture cleanup stays recorded as "passed", so the witness layer can mint a passing witness for a pytest error. Overwrite the node outcome on teardown failures instead of only handling setup failures.

Suggested fix

def pytest_runtest_logreport(report): nodeid = report.nodeid if report.when == "call": # passed | failed | skipped for the test body itself _RESULTS[nodeid] = report.outcome - elif report.when == "setup" and report.outcome == "failed": - # a setup/collection error never reaches the call phase; record it so the - # test is a witness (a 'failed' one), not silently dropped. - _RESULTS.setdefault(nodeid, "error") + elif report.when in {"setup", "teardown"} and report.outcome == "failed": + # setup/teardown errors should not leave an earlier "passed" result in place + _RESULTS[nodeid] = "error"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py` around lines 21 - 29, pytest_runtest_logreport currently ignores report.when == "teardown" so a test that fails during teardown can remain recorded as "passed"; update pytest_runtest_logreport (and the _RESULTS map) to treat teardown failures like setup failures by overwriting the node outcome when report.when == "teardown" and report.outcome != "passed" (e.g., set _RESULTS[nodeid] = report.outcome or setdefault to "error" as appropriate) so teardown failures replace any prior "passed" entry for that nodeid.

coderabbitai · 2026-06-06T01:32:17Z

+        # 2a. PACKAGE RECOMPUTE -- a whole-suite WitnessPackageMemento reproduces
+        # by re-running the suite and rebuilding the content-addressed bundle.
+        if body is None and ws and memento.get("witness_kind") == "pytest-witness-package":
+            from .witness import build_suite_bundle
+            buf, rcid, _ = build_suite_bundle(
+                ws, list(memento.get("test_files", [])), list(memento.get("code_files", []))
+            )
+            if rcid != cid:
+                raise RuntimeError(
+                    f"witness package did not reproduce: recomputed {rcid}, pinned {cid}"
+                )
+            body = buf
+            resolved_by = "recompute"


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't recompute suite packages from unsigned rerun inputs.

This branch trusts memento["test_files"] and memento["code_files"] to decide what to execute, but those fields are not authenticated here before the rerun starts. That means a tampered package memento can steer verification into running a different suite and only fail after recomputing the wrong CID. The rerun inputs need to be bound into the signed payload and verified first, or package recompute should refuse unless the package file is supplied.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py` around lines 154 - 166, The package-recompute branch (the if block checking memento.get("witness_kind") == "pytest-witness-package" and calling build_suite_bundle) currently trusts memento["test_files"] and memento["code_files"] without authentication; change it so you either (A) verify those lists are present in the signed payload associated with the memento before using them, or (B) refuse to recompute and raise an error unless an authenticated package file (the original bundle) is supplied. Locate the branch that calls build_suite_bundle(ws, list(memento.get("test_files", [])), list(memento.get("code_files", []))) and replace the unconditional recompute with logic that checks the signed/asserted inputs (or requires the package file) and only calls build_suite_bundle when those inputs are cryptographically bound to the memento; otherwise raise a clear RuntimeError rejecting recompute.

coderabbitai · 2026-06-06T01:32:17Z

+        subprocess.run(
+            [sys.executable, "-m", "pytest", test_file, "-q", "-p", "no:cacheprovider",
+             "-p", "_witness_collect"],
+            cwd=project_dir, capture_output=True, text=True, env=env,
+        )
+        results: dict = {}
+        if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
+            with open(out_path, encoding="utf-8") as f:
+                try:
+                    results = _json.load(f)
+                except ValueError:
+                    results = {}
+    finally:
+        if os.path.exists(out_path):
+            os.remove(out_path)
+
+    witnesses: List[Witness] = []
+    for nodeid, raw in sorted(results.items()):
+        if raw == "skipped":
+            continue  # a skip is neither a discharge nor a refusal
+        outcome = "passed" if raw == "passed" else "failed"
+        cid = jcs_hash(_witness_value(cc, rc, nodeid, outcome, code_files))
+        witnesses.append(Witness(cc, rc, nodeid, outcome, tuple(sorted(code_files)), cid))
+    return witnesses


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refuse empty witness sets when pytest exited abnormally.

subprocess.run()'s return code is ignored here. If pytest aborts before the plugin writes any node results — e.g. collection/import failure, plugin load failure, internal error — this falls through as results = {} and returns no witnesses. build_suite_bundle() then treats that as a valid empty suite. At minimum, fail when pytest exits non-zero and no outcomes were recorded.

Suggested guard

- subprocess.run( + proc = subprocess.run( [sys.executable, "-m", "pytest", test_file, "-q", "-p", "no:cacheprovider", "-p", "_witness_collect"], cwd=project_dir, capture_output=True, text=True, env=env, ) results: dict = {} if os.path.exists(out_path) and os.path.getsize(out_path) > 0: with open(out_path, encoding="utf-8") as f: try: results = _json.load(f) except ValueError: results = {} + if proc.returncode != 0 and not results: + raise RuntimeError( + f"pytest exited {proc.returncode} before emitting witness results for {test_file}" + )

🧰 Tools

🪛 Ruff (0.15.15)

[error] 111-111: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py` around lines 111 - 134, subprocess.run(...) currently ignores pytest's return code so if pytest aborts and the plugin never writes out_path we silently return an empty results set; change the call to capture the CompletedProcess (e.g. result = subprocess.run(...)) and after reading out_path check: if result.returncode != 0 and not results: raise an exception (RuntimeError or subprocess.CalledProcessError) indicating pytest exited non-zero with no recorded outcomes so the suite cannot be treated as a valid empty suite; keep the existing finally block that removes out_path and reference symbols: subprocess.run, result.returncode, out_path, results, witnesses.

…proof carries A flag (PROVEKIT_LEAN_SOURCE=1) gated the right thing -- so the DEFAULT proof inlined the function body, a doubled copy of code the package already holds, and you had to remember a flag to get the lean SourceMemento. A flag gating correctness means the default is wrong. Deleted, not inverted. - bind_lifter: `_body_source_locator` is the FULL reconstruction (locus + cids + body + ast_template) -- what the Source Oracle returns. The MINT path strips it to the SourceMemento (`source_memento_of`: file, source_cid, span, template_cid, param_names) before anything enters the `.proof`. The body NEVER touches the proof. No flag, no fat alternative. - source_oracle: the flag-pop dance is gone; the reconstructor is unconditionally complete, so resolve just calls it. Deleting the bad path flushed out every consumer that was secretly leaning on the inline body (the whole point of deleting it): - `_binding_template_from_sugar_entry` REQUIRED inline ast_template and silently DROPPED lean bindings. Recognize matches by template_cid (always present, even lean), so the gate was wrong; build the template for lean entries and carry `body_source` + `source_function_name` so the oracle can resolve. - `materialize_impl` indexed only bindings that already had `body_text`, so a lean binding produced "no sugar binding in scope" and 0 materialized. Now it resolves the body through the Source Oracle -- the same resolution recognize uses. Result: numpy-showcase mints lean with NO flag -- 0 inline body, SourceMementos only. recognize + materialize + the snake-eats-tail fixpoint all resolve from disk via the oracle, CID-verified, refusing on drift. The proof signs what you run; it does not carry a copy of it. Tests migrated off the deleted inline form to assert the SourceMemento + oracle reconstruction. python-source 235, pytest-witness 25, py-tests 349 -- all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two CI failures on this branch were inherited from #1936 (TypeScript + bug-zoo bankruptcy), red on main too -- references to things #1936 deleted but did not clean up: - `bug_zoo_machinery_is_self_contained` (cli_surface.rs) asserted `menagerie/bug-zoo/Cargo.toml` EXISTS -- but #1936 deleted the bug zoo. The test is obsolete; `provekit_cli_does_not_expose_zoo_subcommand` still holds and stays. - `.github/workflows/provekit.yml` was a TS/node "prove" job whose only steps typecheck (`tsc --noEmit`) and test (`pnpm test`) the deleted `implementations/typescript` tree -- TS18003 "no inputs found" every run. A job that only checks deleted code is not a gate. The real signal is `acid test (make ci)` (rust + python). - `tsconfig.json` pinned `include: implementations/typescript/src` (gone). Deleted all three. The acid test now has no dangling rust test; the dead TS gate is removed rather than left rotting red. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py`:
- Around line 706-711: The lean binding path currently skips creating a
recognition pattern when `ast_template` is absent, so self-resolved entries
(which forward `ast_template` from `body_source`) never get passed to
`_recognize_calls_anywhere(...)` and lose body-anywhere coverage; update the
logic around `template_cid` handling to derive and pass a pattern when
`ast_template` is missing by using `body_source`/`body_source.ast_template` (or
the forwarded `ast_template` if present) before any oracle reconstruction,
ensuring `_recognize_calls_anywhere(template_cid, pattern, ...)` is called;
change the code paths in the block referencing `template_cid`, `ast_template`,
`body_source`, and where `_recognize_calls_anywhere` is invoked so lean entries
receive a constructed pattern and retain body-anywhere recognition.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 71af82ad-0384-40cf-97ae-618558eda41b

📥 Commits

Reviewing files that changed from the base of the PR and between 1d790a9 and b997848.

📒 Files selected for processing (8)

.github/workflows/provekit.yml
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_lifter.py
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/source_oracle.py
implementations/python/provekit-lift-python-source/tests/test_bind_lifter.py
implementations/python/provekit-lift-python-source/tests/test_recognize_source_oracle.py
implementations/rust/provekit-cli/tests/cli_surface.rs
tsconfig.json

💤 Files with no reviewable changes (3)

.github/workflows/provekit.yml
tsconfig.json
implementations/rust/provekit-cli/tests/cli_surface.rs

coderabbitai · 2026-06-06T04:36:18Z

+    # `template_cid` is the recognize key (recognize matches by cid, not by the
+    # inline template) and is ALWAYS present -- even in a lean SourceMemento. The
+    # inline `ast_template`/`body_text` are absent in a lean binding; the consumer
+    # (recognize match / materialize fill) resolves them through the Source Oracle.
+    # Carry `body_source` + `source_function_name` so the oracle can resolve.
+    if not isinstance(template_cid, str) or not template_cid:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Self-resolved lean bindings lose body-anywhere recognition coverage.

The template adapter now forwards ast_template from body_source, but lean entries intentionally omit it. Since self-resolved templates are appended before any oracle reconstruction, _recognize_calls_anywhere(...) skips them (no pattern), so only exact-CID whole-function matches remain for those bindings.

Suggested fix

diff --git a/implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py b/implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py @@ def _self_resolved_binding_templates( - template = _binding_template_from_sugar_entry(entry) + template = _binding_template_from_sugar_entry(entry) if template is not None: + if template.get("ast_template") is None: + resolved = _resolve_via_source_oracle(str(root), template) + if resolved is not None: + template = { + **template, + "ast_template": resolved.get("ast_template"), + "body_text": template.get("body_text") or resolved.get("body_text"), + } templates.append(template)

Also applies to: 718-718, 723-724

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py` around lines 706 - 711, The lean binding path currently skips creating a recognition pattern when `ast_template` is absent, so self-resolved entries (which forward `ast_template` from `body_source`) never get passed to `_recognize_calls_anywhere(...)` and lose body-anywhere coverage; update the logic around `template_cid` handling to derive and pass a pattern when `ast_template` is missing by using `body_source`/`body_source.ast_template` (or the forwarded `ast_template` if present) before any oracle reconstruction, ensuring `_recognize_calls_anywhere(template_cid, pattern, ...)` is called; change the code paths in the block referencing `template_cid`, `ast_template`, `body_source`, and where `_recognize_calls_anywhere` is invoked so lean entries receive a constructed pattern and retain body-anywhere recognition.

TSavo and others added 2 commits June 5, 2026 16:15

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

TSavo changed the title ~~pandas: the pandas.testing lift seat + the two-axis showcase~~ pandas: testing lift seat + showcase + the 2×4 guard-shape matrix Jun 5, 2026

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

TSavo and others added 2 commits June 5, 2026 17:18

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

TSavo and others added 2 commits June 5, 2026 19:04

coderabbitai Bot reviewed Jun 6, 2026

View reviewed changes

	which is exactly why pandas, not fame, is the right next rung.
	which is exactly why pandas, not frame, is the right next rung.

		# Guard shape #4 -- divide by zero (pandas), ok case. pandas inherits numpy: zero divisor yields inf, finiteness assertion fails
		# The witness RUNS the real pandas code: this case is witnessed (discharged).

Conversation

TSavo commented Jun 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. provekit-lift-py-pandas-testing (the seat)

2. examples/pandas-showcase (the proof)

3. examples/python-guard-shapes (the 2×4 matrix)

Signed-receipt verify

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

TSavo commented Jun 5, 2026 •

edited by coderabbitai Bot

Loading

1. `provekit-lift-py-pandas-testing` (the seat)

2. `examples/pandas-showcase` (the proof)

3. `examples/python-guard-shapes` (the 2×4 matrix)

Signed-receipt `verify`

coderabbitai Bot commented Jun 5, 2026 •

edited

Loading