pandas: testing lift seat + showcase + the 2×4 guard-shape matrix#1937
pandas: testing lift seat + showcase + the 2×4 guard-shape matrix#1937TSavo wants to merge 8 commits into
Conversation
The next package up the ladder from numpy is pandas, and per the federation design it is just another lifter package: provekit-lift-py-pandas-testing, the exact sibling of provekit-lift-py-numpy-testing, reusing the pytest seat's term/formula machinery so equivalent claims federate. The pandas EXACT/APPROXIMATE discipline is SHARPER than numpy's because pandas makes approximate the DEFAULT: assert_frame_equal / assert_series_equal compare floats with a tolerance unless check_exact is pinned. (Confirmed against pandas 3.0.3: check_exact's default is the <no_default> sentinel, resolved at runtime by dtype.) So lifting an un-pinned frame assertion as `=` would FALSE-PASS, claiming exact equality pandas never checked. Version-independently, the seat lifts a frame/series/index/extension-array assertion as `=` ONLY when check_exact=True is explicitly pinned AND no relation-altering keyword (check_like, rtol/atol, check_dtype, ...) is present; everything else is a LOUD REFUSE. The whole pandas._testing (tm) 22-name vocabulary is recognised so nothing real is silently skipped -- the trichotomy: express the exact, loudly refuse the rest. assert_equal is deliberately excluded (cross-library generic name owned by the numpy/generic seat). Honest scope: a frame/series equality is opaque-EUF on both sides, so z3 cannot manufacture a contradiction between two opaque DataFrame constructors -- same as numpy array equality. The CONSISTENCY teeth for pandas come from SCALAR assertions, which the pytest bare-assert seat already lifts (and this seat lifts when they ride a claimed pandas.testing test). This seat's load-bearing job is sound loud-refusal of approximate frame comparison plus callsite keying of the pandas op under test for the witness axis. 17 unit tests: positive lifts (check_exact pinned), the approximate-by-default refusal, relation-altering-kwarg refusals, mutation/control-flow/side-effect refusals, SSA independence, broad-tm claim+refuse, and assert_equal non-claim. Wired into the Makefile test-python loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stands up pandas next to numpy, proving correctness with zero code changes.
Differs from numpy-showcase only by swapping in the pandas.testing seat and
pointing the witness venv at pandas -- the next package is just another lifter
package, as designed.
Three lift seats over the project: the plain pytest CONSISTENCY seat (scalar
assertions, where z3's teeth are), the pandas.testing seat (frame assertions;
approximate-by-default refused unless check_exact pinned), and pytest-witness
(runs the tests under real pandas). The project deliberately contains a buggy
(self-contradictory) test, so the showcase proves the CORRECT pandas code AND
catches the contradiction -- refused BOTH ways:
consistency: Series.sum() == 6 consistent (discharged); == 6 AND == 7 UNSAT
(refused).
witness : the good tests reproduce (discharged); the contradictory run is
'failed' (refused).
run.sh is self-checking: it asserts provekit produces exactly that verdict and
exits non-zero otherwise. Verified green (4 discharged, 2 refused).
Scope: demonstrates lift + prove (consistency + witness by recompute). The
stricter signed-receipt verify path re-resolves each witness body, which for the
pytest-witness kit needs a packaged witness body or full re-run metadata
forwarded by the verifier -- a follow-up increment, noted in the README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Looking for one thing? Review this PR in Change Stack to search files, summaries, diffs, and code without losing your place. WalkthroughThis PR adds a pandas.testing assertion lifter with an LSP server and tests, changes pytest-witness to emit and resolve suite-level witness packages, integrates the pandas suite into Makefile tests, and adds two runnable example projects (pandas-showcase and python-guard-shapes) with run scripts that self-validate witness outcomes. ChangesPandas.testing Lift Implementation
Pandas Showcase (Dual-Axis Example)
Python Guard Shapes (Runtime Guard Example)
Pytest Witness: suite packaging and runtime
RPC & Source-oracle changes
Repository-level / CI changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07533f0056
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| except ValueError as e: | ||
| skipped.append(f"`{_unparse(stmt)[:60]}`: {e}") | ||
| continue |
There was a problem hiding this comment.
Refuse the whole test after an unliftable pandas assertion
When a test mixes an unpinned/approximate assert_frame_equal(...) with any liftable bare assert, this continue only records the pandas refusal in skipped; if another assertion is keyed later, the function still emits a successful partial contract. That bypasses the intended loud refusal for approximate pandas comparisons and can let prove pass while silently ignoring the frame assertion that made the test unsafe to lift as exact equality.
Useful? React with 👍 / 👎.
Guard shapes #2-#5 (index-bounds, empty-container, divide-by-zero, key-access) proved on BOTH numpy and pandas -- 8 cells, each a real library idiom with a discrimination mutant. Zero code changes. These are RUNTIME faults (IndexError / ValueError / KeyError / silent inf), not logical contradictions, so the consistency (z3) axis is blind to them: `a[5] == 0` is a consistent proposition. Only running the code catches the bug, so this example registers ONLY the pytest-witness seat -- the axis that proves correctness by execution. It re-runs each case under real numpy/pandas: the guarded `_ok` case is witnessed (discharged); the `_bad` case breaches the guard so the run raises (or, for divide-by-zero, the finiteness assertion fails) and the witness is refused. Divide-by-zero is the sharp cell: numpy/pandas return silent `inf` rather than raising; provekit catches it because `_ok` asserts isfinite and `_bad` fails it. The witness is per-file (runs `pytest <file>`), so `_ok`/`_bad` are separate files -- 8 cells x 2 = 16. run.sh checks the verdict PER FILE (every `_ok` discharged, every `_bad` refused), so a swapped or missing verdict fails the gate. Verified green: 8 discharged, 8 refused. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iles)
The signed-receipt verify path refused good witnesses with "no package file and
not re-runnable" on projects that are all tests (numpy/pandas showcases). Root
cause: the resolve_witness RECOMPUTE guard tested `memento.get("code_files")`
for truthiness, but an all-tests project pins an EMPTY code_files (the code under
test is the installed library, not a local file), and `[]` is falsy in Python.
So a trivially re-runnable witness was declared not re-runnable.
Fix is one line: gate on `code_files is not None` (present), not truthiness. The
empty list is part of the pinned witness body and reconstructs correctly, so the
tamper check at line 150 still holds -- this stays sound.
After the fix, verify re-resolves the good witnesses by recompute (rust re-runs
the test, recomputes the CID, matches) and refuses the contradictory ones as
failed runs -- the same verdict prove gives. Confirmed on pandas-showcase and
python-guard-shapes.
Adds a regression test driving the resolve_witness RPC with an empty-code_files
witness (would fail pre-fix), and drops the now-stale verify caveat from the
pandas-showcase README. Kit tests: 20 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py (1)
168-196: 💤 Low valueRename variable
ltoleftorlhsfor clarity.The variable
lat line 193 can be confused with the digit1in many fonts. Consider renaming it toleftorlhsfor better readability. Similarly,rat line 194 could berightorrhs.♻️ Proposed refactor
- l = _translate_term_scoped(call.args[0], scope, call_vars) - r = _translate_term_scoped(call.args[1], scope, call_vars) - return comparison_with_none_guard("=", l, r, emit_none_guard=False) + left = _translate_term_scoped(call.args[0], scope, call_vars) + right = _translate_term_scoped(call.args[1], scope, call_vars) + return comparison_with_none_guard("=", left, right, emit_none_guard=False)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py` around lines 168 - 196, In _lift_pdt_assertion_scoped rename the local variables l and r to clearer names (e.g., left/lhs and right/rhs): replace the assignments to l and r from _translate_term_scoped with left (or lhs) and right (or rhs) and update the subsequent return call to use those new names in comparison_with_none_guard; ensure any other references to l and r in that function are updated to the new identifiers for consistency and clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/pandas-showcase/README.md`:
- Line 48: Replace the likely typo in the README sentence that currently reads
"pandas, not fame, is the right next rung" by changing "fame" to "frame" so the
sentence reads "pandas, not frame, is the right next rung" (locate the sentence
text in README.md to edit).
In `@examples/pandas-showcase/run.sh`:
- Line 22: The script currently uses "set -uo pipefail" which omits the -e flag
so failures in commands like cd, mint, or prove may be ignored; update the shell
options to "set -euo pipefail" (or add -e alongside the existing flags) at the
top of the script to fail fast on any command error, and audit the command
sequences around the cd/mint/prove invocations to ensure no subshells or traps
suppress exit codes (the same change should be applied to the other occurrences
of the same set invocation near the mint/prove blocks).
- Around line 30-33: The script exposes PANDAS_WITNESS_VENV via VENV but other
artifacts (lift manifests) call /tmp/pandas-witness-venv/bin/python directly, so
either make all invocations respect the VENV variable or remove the misleading
override; update the script to consistently reference the VENV variable
(PANDAS_WITNESS_VENV -> VENV) everywhere and change any hardcoded references to
/tmp/pandas-witness-venv/bin/python to use "$VENV/bin/python", or if you prefer
the minimal fix, remove PANDAS_WITNESS_VENV and hardcode VENV to
/tmp/pandas-witness-venv so behavior matches the manifests.
In `@examples/python-guard-shapes/run.sh`:
- Around line 14-33: The script currently uses "set -uo pipefail" but lacks -e
so failing commands like cd, "$BIN" mint, or the PATH-prefixed "$BIN" prove can
be ignored; update the shebang/options to "set -euo pipefail" (or add explicit
"|| exit 1" after critical commands) so the script exits immediately on errors,
ensuring failures in cd, the mint call, or the prove call
(PATH="$VENV/bin:$PATH" "$BIN" prove ...) abort the run and surface the actual
fault.
In `@examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py`:
- Around line 1-2: The top-of-file comment is misleading: in
test_divide_by_zero_pandas_ok.py the divisor used is non-zero (2.0) and the test
is an OK/discharged case, so update the header lines that currently mention
"divide by zero" and "zero divisor yields inf" to state this is the non-zero
divisor (2.0) case, that pandas performs the real division and yields finite
results, and that the witness RUNS the real pandas code and is discharged.
---
Nitpick comments:
In
`@implementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.py`:
- Around line 168-196: In _lift_pdt_assertion_scoped rename the local variables
l and r to clearer names (e.g., left/lhs and right/rhs): replace the assignments
to l and r from _translate_term_scoped with left (or lhs) and right (or rhs) and
update the subsequent return call to use those new names in
comparison_with_none_guard; ensure any other references to l and r in that
function are updated to the new identifiers for consistency and clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 35645acd-9785-4340-bd9f-36a4bd501f2a
📒 Files selected for processing (35)
Makefileexamples/pandas-showcase/.provekit/config.tomlexamples/pandas-showcase/.provekit/lift/python-pandas-testing/manifest.tomlexamples/pandas-showcase/.provekit/lift/python-pytest-witness/manifest.tomlexamples/pandas-showcase/.provekit/lift/python/manifest.tomlexamples/pandas-showcase/README.mdexamples/pandas-showcase/run.shexamples/pandas-showcase/test_pandas_frame.pyexamples/pandas-showcase/test_pandas_sum.pyexamples/pandas-showcase/test_pandas_sum_bad.pyexamples/python-guard-shapes/.provekit/config.tomlexamples/python-guard-shapes/.provekit/lift/python-pytest-witness/manifest.tomlexamples/python-guard-shapes/README.mdexamples/python-guard-shapes/run.shexamples/python-guard-shapes/test_divide_by_zero_numpy_bad.pyexamples/python-guard-shapes/test_divide_by_zero_numpy_ok.pyexamples/python-guard-shapes/test_divide_by_zero_pandas_bad.pyexamples/python-guard-shapes/test_divide_by_zero_pandas_ok.pyexamples/python-guard-shapes/test_empty_container_numpy_bad.pyexamples/python-guard-shapes/test_empty_container_numpy_ok.pyexamples/python-guard-shapes/test_empty_container_pandas_bad.pyexamples/python-guard-shapes/test_empty_container_pandas_ok.pyexamples/python-guard-shapes/test_index_bounds_numpy_bad.pyexamples/python-guard-shapes/test_index_bounds_numpy_ok.pyexamples/python-guard-shapes/test_index_bounds_pandas_bad.pyexamples/python-guard-shapes/test_index_bounds_pandas_ok.pyexamples/python-guard-shapes/test_key_access_numpy_bad.pyexamples/python-guard-shapes/test_key_access_numpy_ok.pyexamples/python-guard-shapes/test_key_access_pandas_bad.pyexamples/python-guard-shapes/test_key_access_pandas_ok.pyimplementations/python/provekit-lift-py-pandas-testing/pyproject.tomlimplementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/__init__.pyimplementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/lsp.pyimplementations/python/provekit-lift-py-pandas-testing/src/provekit_lift_py_pandas_testing/pandas_testing_layer.pyimplementations/python/provekit-lift-py-pandas-testing/tests/test_pandas_testing_layer.py
| its tests assert scalars (`.sum() == 6`) alongside frame equality. A package | ||
| whose tests only assert `assert_allclose` over giant tensors (TensorFlow, | ||
| Transformers) would get the witness axis but a near-vacuous consistency axis — | ||
| which is exactly why pandas, not fame, is the right next rung. |
There was a problem hiding this comment.
Fix the likely typo in the rationale sentence.
Line 48 says “pandas, not fame,” which reads like a typo and weakens the explanation.
Suggested edit
-which is exactly why pandas, not fame, is the right next rung.
+which is exactly why pandas, not frame, is the right next rung.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| which is exactly why pandas, not fame, is the right next rung. | |
| which is exactly why pandas, not frame, is the right next rung. |
🧰 Tools
🪛 LanguageTool
[style] ~48-~48: Consider an alternative for the overused word “exactly”.
Context: ...ear-vacuous consistency axis — which is exactly why pandas, not fame, is the right next...
(EXACTLY_PRECISELY)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/pandas-showcase/README.md` at line 48, Replace the likely typo in
the README sentence that currently reads "pandas, not fame, is the right next
rung" by changing "fame" to "frame" so the sentence reads "pandas, not frame, is
the right next rung" (locate the sentence text in README.md to edit).
Source: Linters/SAST tools
| # test_pandas_sum_bad.py, so the showcase proves the CORRECT pandas code AND | ||
| # catches the contradiction -- refused BOTH ways. This script PASSES iff | ||
| # provekit produces exactly that verdict. | ||
| set -uo pipefail |
There was a problem hiding this comment.
Fail fast on command errors in the orchestration script.
Line 22 omits -e, so failed cd, mint, or prove steps can be silently ignored and the script may continue with invalid state.
Suggested hardening
-set -uo pipefail
+set -euo pipefail
@@
-cd "$HERE"
+cd "$HERE" || exit 1Also applies to: 36-36, 41-44
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/pandas-showcase/run.sh` at line 22, The script currently uses "set
-uo pipefail" which omits the -e flag so failures in commands like cd, mint, or
prove may be ignored; update the shell options to "set -euo pipefail" (or add -e
alongside the existing flags) at the top of the script to fail fast on any
command error, and audit the command sequences around the cd/mint/prove
invocations to ensure no subshells or traps suppress exit codes (the same change
should be applied to the other occurrences of the same set invocation near the
mint/prove blocks).
Source: Linters/SAST tools
| set -uo pipefail | ||
| HERE="$(cd "$(dirname "$0")" && pwd)" | ||
| REPO="$(cd "$HERE/../.." && pwd)" | ||
| BIN="$REPO/implementations/rust/target/debug/provekit" | ||
|
|
||
| VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}" | ||
| if [ ! -x "$VENV/bin/python" ]; then | ||
| python3 -m venv "$VENV" | ||
| "$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2 | ||
| fi | ||
|
|
||
| cd "$HERE" | ||
| rm -f blake3-512:*.proof 2>/dev/null || true | ||
| rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true | ||
|
|
||
| echo "== mint (pytest-witness over the 16 cases) ==" | ||
| "$BIN" mint --out . --quiet | ||
|
|
||
| echo "== prove (witness axis: re-run each case under real numpy/pandas) ==" | ||
| PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json |
There was a problem hiding this comment.
Enable fail-fast for gate reliability.
The script does not stop on command failure (set -uo pipefail), so failed cd/mint/prove can fall through and produce misleading gate output. Add -e (or explicit || exit) so orchestration fails at the real fault point.
Suggested patch
-set -uo pipefail
+set -euo pipefail📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| set -uo pipefail | |
| HERE="$(cd "$(dirname "$0")" && pwd)" | |
| REPO="$(cd "$HERE/../.." && pwd)" | |
| BIN="$REPO/implementations/rust/target/debug/provekit" | |
| VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}" | |
| if [ ! -x "$VENV/bin/python" ]; then | |
| python3 -m venv "$VENV" | |
| "$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2 | |
| fi | |
| cd "$HERE" | |
| rm -f blake3-512:*.proof 2>/dev/null || true | |
| rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true | |
| echo "== mint (pytest-witness over the 16 cases) ==" | |
| "$BIN" mint --out . --quiet | |
| echo "== prove (witness axis: re-run each case under real numpy/pandas) ==" | |
| PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json | |
| set -euo pipefail | |
| HERE="$(cd "$(dirname "$0")" && pwd)" | |
| REPO="$(cd "$HERE/../.." && pwd)" | |
| BIN="$REPO/implementations/rust/target/debug/provekit" | |
| VENV="${PANDAS_WITNESS_VENV:-/tmp/pandas-witness-venv}" | |
| if [ ! -x "$VENV/bin/python" ]; then | |
| python3 -m venv "$VENV" | |
| "$VENV/bin/pip" install -q numpy pandas pytest pynacl blake3 cbor2 | |
| fi | |
| cd "$HERE" | |
| rm -f blake3-512:*.proof 2>/dev/null || true | |
| rm -rf .provekit/runs .provekit/witnesses __pycache__ 2>/dev/null || true | |
| echo "== mint (pytest-witness over the 16 cases) ==" | |
| "$BIN" mint --out . --quiet | |
| echo "== prove (witness axis: re-run each case under real numpy/pandas) ==" | |
| PATH="$VENV/bin:$PATH" "$BIN" prove . --json 2>/dev/null > /tmp/guard_shapes_prove.json |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 25-25: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/python-guard-shapes/run.sh` around lines 14 - 33, The script
currently uses "set -uo pipefail" but lacks -e so failing commands like cd,
"$BIN" mint, or the PATH-prefixed "$BIN" prove can be ignored; update the
shebang/options to "set -euo pipefail" (or add explicit "|| exit 1" after
critical commands) so the script exits immediately on errors, ensuring failures
in cd, the mint call, or the prove call (PATH="$VENV/bin:$PATH" "$BIN" prove
...) abort the run and surface the actual fault.
Source: Linters/SAST tools
| # Guard shape #4 -- divide by zero (pandas), ok case. pandas inherits numpy: zero divisor yields inf, finiteness assertion fails | ||
| # The witness RUNS the real pandas code: this case is witnessed (discharged). |
There was a problem hiding this comment.
Fix misleading header comment for the _ok case.
The comment describes the zero-divisor failure path, but this test uses a non-zero divisor (2.0) and is expected to discharge. Please update wording to match the actual scenario.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/python-guard-shapes/test_divide_by_zero_pandas_ok.py` around lines 1
- 2, The top-of-file comment is misleading: in test_divide_by_zero_pandas_ok.py
the divisor used is non-zero (2.0) and the test is an OK/discharged case, so
update the header lines that currently mention "divide by zero" and "zero
divisor yields inf" to state this is the non-zero divisor (2.0) case, that
pandas performs the real division and yields finite results, and that the
witness RUNS the real pandas code and is discharged.
At full-package scale the per-FILE witness fell over: the Oracle ran `pytest <file>` and keyed ONE witness off the file's exit code, so a single failing test out of ~1,600 refused the whole file (numpy: all 179 files refused, ~every passing test discarded). That is not a numpy finding, it is a granularity defect. Running witnessing is the Oracle's job, so the fix lives in the Oracle's run primitive, shared by mint (lift) and verify (recompute). PER-TEST (witness.py + _witness_collect.py): run_file_witnesses runs the file ONCE -- in the package's own execution context (conftest, fixtures, relative imports) -- and a tiny pytest plugin (_witness_collect, json/os only, loaded standalone so it never drags in the kit dep chain) records every test's outcome. One file run -> one witness PER TEST, keyed by the pytest node id. run_and_witness routes a node id back through its file, so LIFT and RECOMPUTE agree on outcome under shared file state. Proven on real numpy test_scalarmath.py: 1,582 per-test witnesses, 1,581 discharged, 1 isolated refusal (was: whole file refused). .witness BUNDLE (witness.py): A per-test run of a real package yields thousands of witnesses; one file per CID or thousands of inline .proof mementos do not scale. write/read_witness_bundle pack MANY bodies into ONE content-addressed .witness file -- JSONL, one canonical body per line, blake3(line) == that witness's cid. The content IS the key: no index, nothing to trust; the Oracle resolves by scan + blake3, exactly its content-address check. read_witness_body now resolves from bundles too. mint writes the bundle to .provekit/witnesses/; the .proof carries only the signed pointers. Also fixes a real full-scale robustness bug: bind_rpc._send crashed with UnicodeEncodeError when a pandas source emoji (🐍) split a surrogate pair across the 9,802-binding response; now encodes with errors="replace" so one pathological character never aborts the run. Lesson recorded: a real package's suite must be witnessed against the INSTALLED package, not a loose copy -- a compiled package (numpy) cannot be duplicated and imported alongside itself (ndarray-type identity clash). Kit: 23 tests (3 new: per-test, recompute-agreement, content-addressed bundle). Examples green: guard-shapes 8/8, pandas-showcase both axes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The full-package numpy proof was 900 MB / 99,189 members because the proof carried 48,935 witnesses BY VALUE (a ContractDecl + memento per test). That is an addressing bug, not a size problem: a proof should carry witnesses BY REFERENCE -- one content-addressed package, not N inline bodies. The oracle mints, the verifier asks the oracle, the proof carries one cid: - MINT (the oracle): `build_suite_bundle` runs the whole suite per-test and assembles ONE content-addressed `.witness` package (cid = blake3(bundle)). `handle_lift` emits ONE `WitnessPackageMemento` (a pointer + signature over the package cid, carrying the test/code files the oracle needs to reproduce it) and ONE contract whose evidence pins the package cid. NOT N mementos. - VERIFY (the verifier asks the oracle): UNCHANGED. `try_witness_discharge` already spawns the kit's discharge command; `witness_verify` already calls `resolve_witness`. So discharge re-runs the suite, rebuilds the package, and confirms it reproduces the pinned cid (`discharge_bundle`); the signed dimension resolves the package by cid (package file or recompute) and content-addresses it. No rust change. - THE PROOF CARRIES ONE WITNESS CID: the package. Per-test granularity lives IN the package (each line a witness body, self-addressing), committed by the one cid. Result on guard-shapes: proof 900MB-shape -> 4,394 bytes, 99k members -> 2 (one package contract + one WitnessPackageMemento), the 16 per-test facts in the content-addressed package. Discharge re-runs the suite and names the exact 8 failing tests. verify resolves the package via the package file, blake3 matches. A package DISCHARGES iff it reproduces AND every test passed; a failing test refuses it (naming the failures) -- which is what a failing test in a package means. Examples read per-test discrimination straight from the package. Kit: 25 tests (2 new: package one-cid roundtrip + refuses-on-failure/drift). guard-shapes + pandas-showcase green. No compressor, no RNG -- content-addressing is the dedup, determinism is the point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
examples/python-guard-shapes/run.sh (1)
14-14:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winEnable fail-fast for gate reliability.
The script does not stop on command failure (
set -uo pipefail), so failedcd/mint/provecan fall through and produce misleading gate output. Add-e(or explicit|| exit) so orchestration fails at the real fault point.🛡️ Suggested patch
-set -uo pipefail +set -euo pipefail🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/python-guard-shapes/run.sh` at line 14, The script currently enables -u and pipefail but not -e, so failures in commands like cd, mint, or prove can be ignored; update the shell options line that reads "set -uo pipefail" to include -e (making it "set -euo pipefail") so the script fails fast on any command error and prevents misleading gate outputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py`:
- Around line 21-29: pytest_runtest_logreport currently ignores report.when ==
"teardown" so a test that fails during teardown can remain recorded as "passed";
update pytest_runtest_logreport (and the _RESULTS map) to treat teardown
failures like setup failures by overwriting the node outcome when report.when ==
"teardown" and report.outcome != "passed" (e.g., set _RESULTS[nodeid] =
report.outcome or setdefault to "error" as appropriate) so teardown failures
replace any prior "passed" entry for that nodeid.
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py`:
- Around line 154-166: The package-recompute branch (the if block checking
memento.get("witness_kind") == "pytest-witness-package" and calling
build_suite_bundle) currently trusts memento["test_files"] and
memento["code_files"] without authentication; change it so you either (A) verify
those lists are present in the signed payload associated with the memento before
using them, or (B) refuse to recompute and raise an error unless an
authenticated package file (the original bundle) is supplied. Locate the branch
that calls build_suite_bundle(ws, list(memento.get("test_files", [])),
list(memento.get("code_files", []))) and replace the unconditional recompute
with logic that checks the signed/asserted inputs (or requires the package file)
and only calls build_suite_bundle when those inputs are cryptographically bound
to the memento; otherwise raise a clear RuntimeError rejecting recompute.
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py`:
- Around line 111-134: subprocess.run(...) currently ignores pytest's return
code so if pytest aborts and the plugin never writes out_path we silently return
an empty results set; change the call to capture the CompletedProcess (e.g.
result = subprocess.run(...)) and after reading out_path check: if
result.returncode != 0 and not results: raise an exception (RuntimeError or
subprocess.CalledProcessError) indicating pytest exited non-zero with no
recorded outcomes so the suite cannot be treated as a valid empty suite; keep
the existing finally block that removes out_path and reference symbols:
subprocess.run, result.returncode, out_path, results, witnesses.
---
Duplicate comments:
In `@examples/python-guard-shapes/run.sh`:
- Line 14: The script currently enables -u and pipefail but not -e, so failures
in commands like cd, mint, or prove can be ignored; update the shell options
line that reads "set -uo pipefail" to include -e (making it "set -euo pipefail")
so the script fails fast on any command error and prevents misleading gate
outputs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7e8a7c37-2423-4961-a13f-aa3ba5262efd
📒 Files selected for processing (8)
examples/pandas-showcase/README.mdexamples/pandas-showcase/run.shexamples/python-guard-shapes/run.shimplementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.pyimplementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.pyimplementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.pyimplementations/python/provekit-lift-py-pytest-witness/tests/test_witness.pyimplementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
✅ Files skipped from review due to trivial changes (1)
- examples/pandas-showcase/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/pandas-showcase/run.sh
| def pytest_runtest_logreport(report): | ||
| nodeid = report.nodeid | ||
| if report.when == "call": | ||
| # passed | failed | skipped for the test body itself | ||
| _RESULTS[nodeid] = report.outcome | ||
| elif report.when == "setup" and report.outcome == "failed": | ||
| # a setup/collection error never reaches the call phase; record it so the | ||
| # test is a witness (a 'failed' one), not silently dropped. | ||
| _RESULTS.setdefault(nodeid, "error") |
There was a problem hiding this comment.
Treat teardown failures as non-passing outcomes.
pytest_runtest_logreport also fires for when == "teardown". Right now a test that passes in call but fails during fixture cleanup stays recorded as "passed", so the witness layer can mint a passing witness for a pytest error. Overwrite the node outcome on teardown failures instead of only handling setup failures.
Suggested fix
def pytest_runtest_logreport(report):
nodeid = report.nodeid
if report.when == "call":
# passed | failed | skipped for the test body itself
_RESULTS[nodeid] = report.outcome
- elif report.when == "setup" and report.outcome == "failed":
- # a setup/collection error never reaches the call phase; record it so the
- # test is a witness (a 'failed' one), not silently dropped.
- _RESULTS.setdefault(nodeid, "error")
+ elif report.when in {"setup", "teardown"} and report.outcome == "failed":
+ # setup/teardown errors should not leave an earlier "passed" result in place
+ _RESULTS[nodeid] = "error"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/_witness_collect.py`
around lines 21 - 29, pytest_runtest_logreport currently ignores report.when ==
"teardown" so a test that fails during teardown can remain recorded as "passed";
update pytest_runtest_logreport (and the _RESULTS map) to treat teardown
failures like setup failures by overwriting the node outcome when report.when ==
"teardown" and report.outcome != "passed" (e.g., set _RESULTS[nodeid] =
report.outcome or setdefault to "error" as appropriate) so teardown failures
replace any prior "passed" entry for that nodeid.
| # 2a. PACKAGE RECOMPUTE -- a whole-suite WitnessPackageMemento reproduces | ||
| # by re-running the suite and rebuilding the content-addressed bundle. | ||
| if body is None and ws and memento.get("witness_kind") == "pytest-witness-package": | ||
| from .witness import build_suite_bundle | ||
| buf, rcid, _ = build_suite_bundle( | ||
| ws, list(memento.get("test_files", [])), list(memento.get("code_files", [])) | ||
| ) | ||
| if rcid != cid: | ||
| raise RuntimeError( | ||
| f"witness package did not reproduce: recomputed {rcid}, pinned {cid}" | ||
| ) | ||
| body = buf | ||
| resolved_by = "recompute" |
There was a problem hiding this comment.
Don't recompute suite packages from unsigned rerun inputs.
This branch trusts memento["test_files"] and memento["code_files"] to decide what to execute, but those fields are not authenticated here before the rerun starts. That means a tampered package memento can steer verification into running a different suite and only fail after recomputing the wrong CID. The rerun inputs need to be bound into the signed payload and verified first, or package recompute should refuse unless the package file is supplied.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/lift_lsp.py`
around lines 154 - 166, The package-recompute branch (the if block checking
memento.get("witness_kind") == "pytest-witness-package" and calling
build_suite_bundle) currently trusts memento["test_files"] and
memento["code_files"] without authentication; change it so you either (A) verify
those lists are present in the signed payload associated with the memento before
using them, or (B) refuse to recompute and raise an error unless an
authenticated package file (the original bundle) is supplied. Locate the branch
that calls build_suite_bundle(ws, list(memento.get("test_files", [])),
list(memento.get("code_files", []))) and replace the unconditional recompute
with logic that checks the signed/asserted inputs (or requires the package file)
and only calls build_suite_bundle when those inputs are cryptographically bound
to the memento; otherwise raise a clear RuntimeError rejecting recompute.
| subprocess.run( | ||
| [sys.executable, "-m", "pytest", test_file, "-q", "-p", "no:cacheprovider", | ||
| "-p", "_witness_collect"], | ||
| cwd=project_dir, capture_output=True, text=True, env=env, | ||
| ) | ||
| results: dict = {} | ||
| if os.path.exists(out_path) and os.path.getsize(out_path) > 0: | ||
| with open(out_path, encoding="utf-8") as f: | ||
| try: | ||
| results = _json.load(f) | ||
| except ValueError: | ||
| results = {} | ||
| finally: | ||
| if os.path.exists(out_path): | ||
| os.remove(out_path) | ||
|
|
||
| witnesses: List[Witness] = [] | ||
| for nodeid, raw in sorted(results.items()): | ||
| if raw == "skipped": | ||
| continue # a skip is neither a discharge nor a refusal | ||
| outcome = "passed" if raw == "passed" else "failed" | ||
| cid = jcs_hash(_witness_value(cc, rc, nodeid, outcome, code_files)) | ||
| witnesses.append(Witness(cc, rc, nodeid, outcome, tuple(sorted(code_files)), cid)) | ||
| return witnesses |
There was a problem hiding this comment.
Refuse empty witness sets when pytest exited abnormally.
subprocess.run()'s return code is ignored here. If pytest aborts before the plugin writes any node results — e.g. collection/import failure, plugin load failure, internal error — this falls through as results = {} and returns no witnesses. build_suite_bundle() then treats that as a valid empty suite. At minimum, fail when pytest exits non-zero and no outcomes were recorded.
Suggested guard
- subprocess.run(
+ proc = subprocess.run(
[sys.executable, "-m", "pytest", test_file, "-q", "-p", "no:cacheprovider",
"-p", "_witness_collect"],
cwd=project_dir, capture_output=True, text=True, env=env,
)
results: dict = {}
if os.path.exists(out_path) and os.path.getsize(out_path) > 0:
with open(out_path, encoding="utf-8") as f:
try:
results = _json.load(f)
except ValueError:
results = {}
+ if proc.returncode != 0 and not results:
+ raise RuntimeError(
+ f"pytest exited {proc.returncode} before emitting witness results for {test_file}"
+ )🧰 Tools
🪛 Ruff (0.15.15)
[error] 111-111: subprocess call: check for execution of untrusted input
(S603)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@implementations/python/provekit-lift-py-pytest-witness/src/provekit_pytest_witness/witness.py`
around lines 111 - 134, subprocess.run(...) currently ignores pytest's return
code so if pytest aborts and the plugin never writes out_path we silently return
an empty results set; change the call to capture the CompletedProcess (e.g.
result = subprocess.run(...)) and after reading out_path check: if
result.returncode != 0 and not results: raise an exception (RuntimeError or
subprocess.CalledProcessError) indicating pytest exited non-zero with no
recorded outcomes so the suite cannot be treated as a valid empty suite; keep
the existing finally block that removes out_path and reference symbols:
subprocess.run, result.returncode, out_path, results, witnesses.
…proof carries A flag (PROVEKIT_LEAN_SOURCE=1) gated the right thing -- so the DEFAULT proof inlined the function body, a doubled copy of code the package already holds, and you had to remember a flag to get the lean SourceMemento. A flag gating correctness means the default is wrong. Deleted, not inverted. - bind_lifter: `_body_source_locator` is the FULL reconstruction (locus + cids + body + ast_template) -- what the Source Oracle returns. The MINT path strips it to the SourceMemento (`source_memento_of`: file, source_cid, span, template_cid, param_names) before anything enters the `.proof`. The body NEVER touches the proof. No flag, no fat alternative. - source_oracle: the flag-pop dance is gone; the reconstructor is unconditionally complete, so resolve just calls it. Deleting the bad path flushed out every consumer that was secretly leaning on the inline body (the whole point of deleting it): - `_binding_template_from_sugar_entry` REQUIRED inline ast_template and silently DROPPED lean bindings. Recognize matches by template_cid (always present, even lean), so the gate was wrong; build the template for lean entries and carry `body_source` + `source_function_name` so the oracle can resolve. - `materialize_impl` indexed only bindings that already had `body_text`, so a lean binding produced "no sugar binding in scope" and 0 materialized. Now it resolves the body through the Source Oracle -- the same resolution recognize uses. Result: numpy-showcase mints lean with NO flag -- 0 inline body, SourceMementos only. recognize + materialize + the snake-eats-tail fixpoint all resolve from disk via the oracle, CID-verified, refusing on drift. The proof signs what you run; it does not carry a copy of it. Tests migrated off the deleted inline form to assert the SourceMemento + oracle reconstruction. python-source 235, pytest-witness 25, py-tests 349 -- all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two CI failures on this branch were inherited from #1936 (TypeScript + bug-zoo bankruptcy), red on main too -- references to things #1936 deleted but did not clean up: - `bug_zoo_machinery_is_self_contained` (cli_surface.rs) asserted `menagerie/bug-zoo/Cargo.toml` EXISTS -- but #1936 deleted the bug zoo. The test is obsolete; `provekit_cli_does_not_expose_zoo_subcommand` still holds and stays. - `.github/workflows/provekit.yml` was a TS/node "prove" job whose only steps typecheck (`tsc --noEmit`) and test (`pnpm test`) the deleted `implementations/typescript` tree -- TS18003 "no inputs found" every run. A job that only checks deleted code is not a gate. The real signal is `acid test (make ci)` (rust + python). - `tsconfig.json` pinned `include: implementations/typescript/src` (gone). Deleted all three. The acid test now has no dangling rust test; the dead TS gate is removed rather than left rotting red. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py`:
- Around line 706-711: The lean binding path currently skips creating a
recognition pattern when `ast_template` is absent, so self-resolved entries
(which forward `ast_template` from `body_source`) never get passed to
`_recognize_calls_anywhere(...)` and lose body-anywhere coverage; update the
logic around `template_cid` handling to derive and pass a pattern when
`ast_template` is missing by using `body_source`/`body_source.ast_template` (or
the forwarded `ast_template` if present) before any oracle reconstruction,
ensuring `_recognize_calls_anywhere(template_cid, pattern, ...)` is called;
change the code paths in the block referencing `template_cid`, `ast_template`,
`body_source`, and where `_recognize_calls_anywhere` is invoked so lean entries
receive a constructed pattern and retain body-anywhere recognition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 71af82ad-0384-40cf-97ae-618558eda41b
📒 Files selected for processing (8)
.github/workflows/provekit.ymlimplementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_lifter.pyimplementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.pyimplementations/python/provekit-lift-python-source/src/provekit_lift_python_source/source_oracle.pyimplementations/python/provekit-lift-python-source/tests/test_bind_lifter.pyimplementations/python/provekit-lift-python-source/tests/test_recognize_source_oracle.pyimplementations/rust/provekit-cli/tests/cli_surface.rstsconfig.json
💤 Files with no reviewable changes (3)
- .github/workflows/provekit.yml
- tsconfig.json
- implementations/rust/provekit-cli/tests/cli_surface.rs
| # `template_cid` is the recognize key (recognize matches by cid, not by the | ||
| # inline template) and is ALWAYS present -- even in a lean SourceMemento. The | ||
| # inline `ast_template`/`body_text` are absent in a lean binding; the consumer | ||
| # (recognize match / materialize fill) resolves them through the Source Oracle. | ||
| # Carry `body_source` + `source_function_name` so the oracle can resolve. | ||
| if not isinstance(template_cid, str) or not template_cid: |
There was a problem hiding this comment.
Self-resolved lean bindings lose body-anywhere recognition coverage.
The template adapter now forwards ast_template from body_source, but lean entries intentionally omit it. Since self-resolved templates are appended before any oracle reconstruction, _recognize_calls_anywhere(...) skips them (no pattern), so only exact-CID whole-function matches remain for those bindings.
Suggested fix
diff --git a/implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py b/implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py
@@ def _self_resolved_binding_templates(
- template = _binding_template_from_sugar_entry(entry)
+ template = _binding_template_from_sugar_entry(entry)
if template is not None:
+ if template.get("ast_template") is None:
+ resolved = _resolve_via_source_oracle(str(root), template)
+ if resolved is not None:
+ template = {
+ **template,
+ "ast_template": resolved.get("ast_template"),
+ "body_text": template.get("body_text") or resolved.get("body_text"),
+ }
templates.append(template)Also applies to: 718-718, 723-724
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/bind_rpc.py`
around lines 706 - 711, The lean binding path currently skips creating a
recognition pattern when `ast_template` is absent, so self-resolved entries
(which forward `ast_template` from `body_source`) never get passed to
`_recognize_calls_anywhere(...)` and lose body-anywhere coverage; update the
logic around `template_cid` handling to derive and pass a pattern when
`ast_template` is missing by using `body_source`/`body_source.ast_template` (or
the forwarded `ast_template` if present) before any oracle reconstruction,
ensuring `_recognize_calls_anywhere(template_cid, pattern, ...)` is called;
change the code paths in the block referencing `template_cid`, `ast_template`,
`body_source`, and where `_recognize_calls_anywhere` is invoked so lean entries
receive a constructed pattern and retain body-anywhere recognition.
The next package up the ladder from numpy is pandas — and per the federation design, it is just another lifter package. This PR adds the seat, the showcase that proves real pandas with zero changes, and the 2×4 guard-shape matrix across numpy + pandas.
1.
provekit-lift-py-pandas-testing(the seat)Exact sibling of
provekit-lift-py-numpy-testing, reusing the pytest seat's term/formula machinery. The EXACT/APPROX discipline is sharper than numpy because pandas is approximate-by-default:assert_frame_equal/assert_series_equalcompare floats with a tolerance unlesscheck_exactis pinned (pandas 3.0.3: the default is the<no_default>sentinel). So it lifts as=only whencheck_exact=Trueis explicitly pinned and no relation-altering kwarg is present; everything else is a loud refuse. Claims the wholepandas._testing(tm) 22-name vocabulary so nothing real is silently skipped. 17 unit tests; wired intomake test-python.Honest scope: frame equality is opaque-EUF on both sides, so z3 can't contradict two opaque DataFrames (same as numpy array equality). The consistency teeth come from scalar assertions; this seat's job is sound refusal-of-approximate + witness keying.
2.
examples/pandas-showcase(the proof)Real pandas project with one buggy self-contradictory test → proves the good code and refuses the bug both ways (consistency:
==6consistent /==6 AND ==7UNSAT; witness: reproduces / runfailed).run.shself-checks. Green: 4 discharged, 2 refused.3.
examples/python-guard-shapes(the 2×4 matrix)Guard shapes #2 index-bounds, #3 empty-container, #4 divide-by-zero, #5 key-access × {numpy, pandas} = 8 cells, each a real library idiom with a discrimination mutant.
These are runtime faults (IndexError/ValueError/KeyError/silent inf), not logical contradictions — so the consistency axis is blind and only the witness axis catches them. Divide-by-zero is the sharp cell: numpy/pandas return silent
inf, caught via anisfiniteassertion.run.shchecks the verdict per file (every_okdischarged, every_badrefused). Green: 8 discharged, 8 refused.Signed-receipt
verifyThe stricter
verify(witness-body re-resolution) now passes too. A one-line fix: the resolve_witness recompute guard testedcode_filesfor truthiness, but an all-tests project pins an emptycode_files(the code under test is the installed library), and[]is falsy. Gated on presence instead; verify now re-resolves the good witnesses by recompute and refuses the bad ones as failed runs. Regression test added (kit: 20 passed).Summary by CodeRabbit
New Features
Documentation
Bug Fixes