[codex] lift Python class shapes#1913

Open
TSavo wants to merge 1 commit into main from codex/python-classshape-forensics-20260603

@TSavo TSavo commented Jun 3, 2026

Summary

This PR adds Slice 1 of the Python class-shape work: a data-only classShapes catalog on Python source-unit contracts.

The slice is intentionally soundness-inert:

  • no verifier changes
  • no mint/discharge changes
  • no attribute_present predicate emission yet
  • no new attribute cf_guarded carriers
  • existing attribute panicLoci remain in place

The catalog records:

  • guaranteed-present attributes only for class-body assignments and unconditional __init__ instance assignments
  • slot declarations as permittedAttributes with guaranteesPresence=false, not as presence
  • openAttributes for known but not guaranteed members
  • method receiver metadata for instance/classmethod/staticmethod/property methods
  • conservative openReasons for the seven approved soundness cases
  • explicit bounded assumptions for standard __init__ construction and cross-module mutation limits
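The distinction between guaranteed presence, slot permission, and known-but-open attributes can be seen at runtime. The following is a minimal sketch, not code from this PR: the classes `Slotted` and `Sample` and their attribute names are hypothetical, chosen only to exercise each catalog category listed above.

```python
class Slotted:
    # A slot declaration permits the attribute but does not create it.
    __slots__ = ("cached",)


class Sample:
    KIND = "sample"  # class-body assignment: visible on every instance

    def __init__(self, verbose: bool) -> None:
        self.name = "x"  # unconditional __init__ assignment: always set
        if verbose:
            self.log = []  # conditional assignment: known, not guaranteed

    def start(self) -> None:
        self.started = True  # assigned outside __init__: not guaranteed


slotted = Slotted()
sample = Sample(verbose=False)

# hasattr reports what is actually present on a freshly constructed object.
presence = {
    "Slotted.cached": hasattr(slotted, "cached"),
    "Sample.KIND": hasattr(sample, "KIND"),
    "Sample.name": hasattr(sample, "name"),
    "Sample.log": hasattr(sample, "log"),
    "Sample.started": hasattr(sample, "started"),
}
print(presence)
```

Only `KIND` and `name` are present after construction, which matches the categories a presence-guaranteeing catalog could safely record; the slot, the conditional attribute, and the late attribute are all absent.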

Additive / CID Gate

Confirmed classShapes is purely additive and gated:

  • Class-free unit check: current output matches origin/main exactly and carries no classShapes field.
  • Class-bearing unit check: current output differs from origin/main only by additive classShapes; stripping that field restores byte-identical origin/main IR.

Command evidence:

class-free: old==new, classShapes absent
class-bearing: old!=new due to additive classShapes; stripping classShapes restores origin/main IR
class-bearing classShapes count: 1
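The additive check described above can be sketched as a strip-and-compare over JSON. This is an illustration under stated assumptions: the IR documents here are invented, the helpers `strip_class_shapes` and `canonical_bytes` are hypothetical, and sorted-key compact JSON stands in for whatever byte-level canonicalization the project actually uses.

```python
import json


def strip_class_shapes(value):
    """Return a copy of a JSON-like value with every 'classShapes' key removed."""
    if isinstance(value, dict):
        return {
            key: strip_class_shapes(item)
            for key, item in value.items()
            if key != "classShapes"
        }
    if isinstance(value, list):
        return [strip_class_shapes(item) for item in value]
    return value


def canonical_bytes(value) -> bytes:
    # Assumption: sorted keys and compact separators approximate canonical form.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Hypothetical stand-ins for the origin/main IR and this branch's IR.
old_ir = [{"kind": "source-unit", "contract": {"name": "m"}}]
new_ir = [
    {
        "kind": "source-unit",
        "contract": {"name": "m", "classShapes": [{"name": "C", "status": "closed"}]},
    }
]

differs_before_strip = canonical_bytes(old_ir) != canonical_bytes(new_ir)
identical_after_strip = canonical_bytes(old_ir) == canonical_bytes(strip_class_shapes(new_ir))
print(differs_before_strip, identical_after_strip)
```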

Signature/CID classification:

  • Class-free minted signatures: no expected shift from this change, because classShapes is absent for class-free source units.
  • Class-bearing IR signature delta: expected-additive only; removing the new classShapes metadata restores the old IR.
  • Python self-contract attestation: no committed signature delta. make USE_BCARGO=0 mint-python accepted the checked-in attestation:
cid:            blake3-512:64c8ddf3a7ef02c6d12665866e1c2483b59009830a5f27cee869b123d5ece63cccb7dc8d62002383f348b15cd866857de0c6a053dd2fc02e3d9356bd3295b0a4
contractSetCid: blake3-512:b1de941756d0a3b352ca79ebed8b75644b7c782c3afe4163273220384125ec100457d5e969a921b7ceb277e329a24c5a4ea21ffd54963b51c1756befdb1793dc
OK  .provekit/self-contracts-attestations/python.json (contractSetCid blake3-512:b1de941756d0a3b352ca79ebed8b75644b7c782c3afe4163273220384125ec100457d5e969a921b7ceb277e329a24c5a4ea21ffd54963b51c1756befdb1793dc)

make python-language-signature currently fails because checked-in language-signature assets are already stale on origin/main:

python-language-signature specs are stale: specs/op_source-unit.spec.json, specs/op_compare.spec.json, specs/language_signature_python.spec.json

This branch does not modify menagerie/python-language-signature, and the same check fails from an origin/main archive, so this is reported as a pre-existing signature check failure, not a regression from classShapes.

RED -> GREEN Evidence

Tests were written before implementation.

Initial RED run:

python3 -m pytest implementations/python/provekit-lift-python-source/tests/test_lifter.py -q -k class_shape
FF.
2 failed, 1 passed, 139 deselected

Both failures occurred because classShapes was absent from source-unit contracts.

GREEN runs:

python3 -m pytest implementations/python/provekit-lift-python-source/tests/test_lifter.py -q
142 passed in 0.42s

python3 -m pytest implementations/python/provekit-lift-python-source/tests -q
231 passed in 1.14s

git diff --check
# clean

Python Kit Conformance / Self-Application

Python C1-C8 kit conformance passed with local Cargo:

make USE_BCARGO=0 prove-python
pass C1: initialize protocol_version_match
pass C2: initialize capabilities_populated
pass C3: lift request well_formed
pass C4: lift surface_in_capabilities
pass C5: lift response kind_matches_layer
pass C6: lift response ir_document_array
pass C7: diagnostics field_is_array
pass C8: call_edge_stream_present
pass: kit=`python`: all 8 contracts hold

Python self-contract mint/self-application passed:

make USE_BCARGO=0 mint-python
OK  .provekit/self-contracts-attestations/python.json (contractSetCid blake3-512:b1de941756d0a3b352ca79ebed8b75644b7c782c3afe4163273220384125ec100457d5e969a921b7ceb277e329a24c5a4ea21ffd54963b51c1756befdb1793dc)

Note: the first unqualified make prove-python attempt used the default bin/bcargo route on this macOS host. It built/synced a Linux implementations/rust/target/release/provekit binary and then failed locally with cannot execute binary file / exit 126. The local-Cargo rerun above is the valid conformance evidence.

Corpus Counts

Corpus sampled from /tmp/provekit-classshape-src-20260603/src.

tomli: closed=4 open=1
  top open reasons: non-local-base=1

packaging: closed=6 open=72
  top open reasons: non-local-base=48, class-decorator=16, dynamic-setattr=11, property-descriptor=10, late-instance-attribute=9, conditional-init-attribute=7, method-decorator=6, metaclass=3

numpy: closed=719 open=806
  top open reasons: non-local-base=450, method-decorator=328, class-decorator=107, late-instance-attribute=61, property-descriptor=36, nested-instance-attribute=20, conditional-init-attribute=16, read-modify-instance-attribute=10
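For orientation, the closed share per corpus follows directly from the counts above. A small sketch of the arithmetic, using only the closed/open numbers reported in this section (the variable names are illustrative):

```python
# (closed, open) counts as reported for each sampled corpus.
corpus_counts = {
    "tomli": (4, 1),
    "packaging": (6, 72),
    "numpy": (719, 806),
}

closed_share = {
    name: closed / (closed + open_count)
    for name, (closed, open_count) in corpus_counts.items()
}

for name, share in closed_share.items():
    total = sum(corpus_counts[name])
    print(f"{name}: {share:.1%} closed of {total} classes")
```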

Representative records:

tomli/_parser.py Flags: status=closed, attrs=[EXPLICIT_NEST,FROZEN,_flags,_pending_flags], methods=[__init__,add_pending,finalize_pending,is_,set,unset_all], openReasons=[]
tomli/_parser.py TOMLDecodeError: status=open, attrs=[colno,doc,lineno,msg,pos], openReasons=[non-local-base]
packaging/_structures.py InfinityType: status=closed, attrs=[], methods=[__repr__:instance], openReasons=[]
packaging/_elffile.py EIClass: status=open, attrs=[C32,C64], openReasons=[non-local-base]
numpy/_core/_internal.py dummy_ctype: status=closed, attrs=[_cls], methods=[__call__,__eq__,__init__,__mul__,__ne__], openReasons=[]
numpy/_array_api_info.py __array_namespace_info__: status=open, attrs=[], methods=[capabilities,default_device,default_dtypes,devices,dtypes], openReasons=[class-decorator]

Slice 2 Notes Acknowledged, Not Implemented

These are logged for the next slice and intentionally not acted on in this PR:

  • Slice 2 first action: hand-audit about 10 closed NumPy classes before any discharge trusts closed.
  • Coverage recovery refinements: method-decorator whole-class-open (328 NumPy) and non-local/cross-file base resolution (450 NumPy) are the top deferred refinements.

Review Gate

Do not self-merge. CI plus CodeRabbit review are mandatory before merge.

Summary by CodeRabbit

  • New Features

    • Python source lifter now generates and includes class shape metadata documenting class attributes, methods, inheritance relationships, and structural details.
  • Tests

    • Added comprehensive test coverage for class shape metadata generation and soundness analysis across various class patterns.

coderabbitai Bot commented Jun 3, 2026

Walkthrough

This PR extends the Python source lifter to generate and attach comprehensive class-shape metadata to lifted source units. The contract is updated to optionally carry classShapes; the lifter analyzes class definitions, methods, attributes, and dynamic mutations through new AST visitors; and three tests validate the class-shape outputs across slots, methods, inheritance patterns, and soundness boundaries.

Changes

Class Shape Metadata Generation

| Layer | File(s) | Summary |
|---|---|---|
| Contract Extension for Class Shapes | src/provekit_lift_python_source/ir.py | source_unit_contract adds an optional class_shapes parameter and conditionally injects classShapes into the returned contract when provided. |
| Class Shape Analysis Infrastructure | src/provekit_lift_python_source/lifter.py | Introduces module constants documenting class-shape assumptions, a _ClassInfo dataclass for class metadata, and three AST visitors: _ClassCollector gathers class definitions; _MethodAttributeScanner tracks instance attribute presence and detects dynamic mutations (setattr/delattr patterns); _ClassBodyPoisonScanner flags class-level dynamic effects. |
| Class Shape Generation Pipeline | src/provekit_lift_python_source/lifter.py | Implements _lift_class_shapes and _build_class_shape to construct per-class JSON shape records with open/closed status, attributes, methods, bases, open reasons, and visible_setattr_override computation; includes helpers for method-kind detection, slot parsing, and attribute-source merging. |
| Integration into lift_source | src/provekit_lift_python_source/lifter.py | Wires class-shape computation into lift_source by computing shapes via AST analysis and passing the result to source_unit_contract. |
| Test Infrastructure and Coverage | tests/test_lifter.py | Adds IR-inspection helpers to extract class-shape metadata, and three new tests: one validates guaranteed slots and method receivers; another exercises soundness-boundary patterns (inheritance, __setattr__, deletion, properties); a third confirms class-shape lift is soundness-inert for attribute panic loci. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through class-shape lands so fine,
Where attributes and methods align,
Tracking slots, inheritance's grace,
Soundness checks in every place—
The lifter now sees all with care!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[codex] lift Python class shapes' directly and concisely describes the main change: adding Python class-shape lifting functionality to the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d87377de32

    setattr_override_by_name=setattr_override_by_name,
)
shapes.append(shape)
shapes_by_name[info.node.name] = shape

P1: Avoid resolving bases through hidden local classes

When a module-level base name is reused by a class inside a function, this simple-name index is overwritten by the function-local class even though that class is not visible to later module-level class definitions. For example, after class Base with __setattr__, def f(): class Base: pass, then class Derived(Base), the derived class is resolved against f.<locals>.Base, so setattr-override-in-mro can be missed and the shape may be marked closed incorrectly. Keying only by node.name makes class-shape soundness depend on unrelated hidden local definitions.
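The shadowing scenario in this finding can be reproduced with the standard `ast` module. The sketch below is illustrative only: `SOURCE`, `defines_setattr`, and the two index dictionaries are hypothetical and much simpler than the PR's `_ClassCollector`; they just contrast a simple-name index over every `ClassDef` with one restricted to module-level definitions.

```python
import ast

SOURCE = '''
class Base:
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)

def f():
    class Base:
        pass

class Derived(Base):
    pass
'''

tree = ast.parse(SOURCE)


def defines_setattr(class_node: ast.ClassDef) -> bool:
    """True if the class body directly defines __setattr__."""
    return any(
        isinstance(item, ast.FunctionDef) and item.name == "__setattr__"
        for item in class_node.body
    )


# Index every ClassDef by simple name: the function-local Base overwrites
# the module-level Base because it is encountered later.
index_all = {}
for node in ast.walk(tree):
    if isinstance(node, ast.ClassDef):
        index_all[node.name] = node

# Index only classes that are visible at module scope.
index_module_level = {
    node.name: node for node in tree.body if isinstance(node, ast.ClassDef)
}

# At runtime, Derived really inherits from the module-level Base.
namespace = {"__name__": "demo"}
exec(SOURCE, namespace)
runtime_base_has_setattr = "__setattr__" in vars(namespace["Derived"].__mro__[1])

print(defines_setattr(index_all["Base"]))
print(defines_setattr(index_module_level["Base"]))
print(runtime_base_has_setattr)
```

A scope-aware key (for example, a qualified name rather than the bare `node.name`) is one possible way to keep the two `Base` definitions apart; whether that fits the PR's design is for the author to decide.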

Comment on lines +435 to +440
if (
    self.method_name == "__init__"
    and self.method_kind == "instance"
    and self._conditional_depth == 0
    and self._nested_depth == 0
):

P1: Treat early exits before init writes as non-guaranteed

This marks every top-level self.x = ... in __init__ as guaranteed, but a preceding top-level return or raise can skip that assignment while still leaving an instance constructed, e.g. def __init__(self, ok): if not ok: return; self.value = 1. In that case value is recorded as presence-guaranteed even though normal construction can produce an object without it, so any future use of this catalog would discharge an attribute that may be absent.
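The early-exit case can be checked both at runtime and statically. The following sketch is an assumption-laden illustration, not the PR's scanner: `guaranteed_init_attrs` is a hypothetical helper that stops collecting at the first top-level statement containing a `return` or `raise` anywhere inside it, which is deliberately conservative (it also stops for a `raise`, even though a raising `__init__` yields no instance).

```python
import ast

SOURCE = '''
class Widget:
    def __init__(self, ok):
        self.name = "w"
        if not ok:
            return
        self.value = 1
'''


def guaranteed_init_attrs(init):
    """Top-level `self.x = ...` targets that precede any return or raise."""
    receiver = init.args.args[0].arg
    found = set()
    for stmt in init.body:
        # Conservative cut-off: stop at the first statement that contains a
        # return or raise anywhere inside it, including nested blocks.
        if any(isinstance(n, (ast.Return, ast.Raise)) for n in ast.walk(stmt)):
            break
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if (
                    isinstance(target, ast.Attribute)
                    and isinstance(target.value, ast.Name)
                    and target.value.id == receiver
                ):
                    found.add(target.attr)
    return found


# Runtime view: construction succeeds on the early-return path.
namespace = {"__name__": "demo"}
exec(SOURCE, namespace)
partial = namespace["Widget"](ok=False)
runtime_view = (hasattr(partial, "name"), hasattr(partial, "value"))

# Static view: only `name` is assigned before the possible early exit.
class_node = ast.parse(SOURCE).body[0]
init_node = class_node.body[0]
static_view = guaranteed_init_attrs(init_node)
print(runtime_view, static_view)
```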

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/lifter.py`:
- Around line 1320-1328: The shape currently sets instanceReceiver only when
method_kind == "instance", which omits receivers for property accessors because
_method_kind(node) returns "property" for `@property/.setter/.deleter`; update the
condition that computes "instanceReceiver" (using _first_parameter_name(node))
to include property methods as well (e.g., if method_kind == "instance" or
method_kind == "property") so _MethodAttributeScanner will see self assignments
inside property setters/deleters and preserve receiver/openAttributes metadata.
- Around line 515-522: In _ClassBodyPoisonScanner, the early returns in
visit_FunctionDef, visit_AsyncFunctionDef, and visit_ClassDef skip traversing
decorator expressions so class-body mutations via decorators are missed; change
each of these methods to iterate over node.decorator_list and call
self.visit(...) on each decorator expression (and then return) so decorator-side
effects (e.g., setattr/delattr calls) are seen during class-body scanning; keep
the methods otherwise noop for body traversal to avoid entering nested bodies.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb6e9f71-62cb-4483-9b95-5cf54c84ead8

📥 Commits

Reviewing files that changed from the base of the PR and between 99fc497 and d87377d.

📒 Files selected for processing (3)
  • implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/ir.py
  • implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/lifter.py
  • implementations/python/provekit-lift-python-source/tests/test_lifter.py

Comment on lines +515 to +522
def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
    return

def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
    return

def visit_ClassDef(self, node: ast.ClassDef) -> None:
    return

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Traverse decorator expressions in _ClassBodyPoisonScanner.

Lines 515-522 return before visiting decorator_list, so class-body effects like @setattr(...) / @delattr(...) on methods or nested classes are never seen. Those decorators execute while the class body is being evaluated, so a shape can stay "closed" even though class construction performed dynamic mutation.

Suggested fix
 def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
-    return
+    for decorator in node.decorator_list:
+        self.visit(decorator)

 def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
-    return
+    for decorator in node.decorator_list:
+        self.visit(decorator)

 def visit_ClassDef(self, node: ast.ClassDef) -> None:
-    return
+    for decorator in node.decorator_list:
+        self.visit(decorator)
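The effect of the suggested change can be observed with two small `ast.NodeVisitor` subclasses. This is a simplified sketch under assumptions: `SkippingScanner` and `DecoratorAwareScanner` are hypothetical stand-ins for `_ClassBodyPoisonScanner`, the decorator `tag` and the name `registry` are invented, and only bare `setattr`/`delattr` calls are flagged.

```python
import ast

SOURCE = '''
class C:
    @tag(setattr(registry, "seen", True))
    def method(self):
        setattr(self, "late", 1)
'''


class SkippingScanner(ast.NodeVisitor):
    """Flags setattr/delattr calls; skips function definitions entirely."""

    def __init__(self):
        self.dynamic_calls = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id in {"setattr", "delattr"}:
            self.dynamic_calls.append(node.func.id)
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        return  # neither the body nor the decorators are visited


class DecoratorAwareScanner(SkippingScanner):
    """Same as above, but decorator expressions are still scanned."""

    def visit_FunctionDef(self, node):
        for decorator in node.decorator_list:
            self.visit(decorator)  # the function body stays unvisited


def scan_class_body(scanner, source):
    class_node = ast.parse(source).body[0]
    for stmt in class_node.body:
        scanner.visit(stmt)
    return scanner.dynamic_calls


skipped = scan_class_body(SkippingScanner(), SOURCE)
aware = scan_class_body(DecoratorAwareScanner(), SOURCE)
print(skipped, aware)
```

The decorator-aware variant reports the `setattr` call that runs during class-body evaluation while still ignoring the call inside the method body, which only runs when the method is invoked.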

Comment on lines +1320 to +1328
method_kind = _method_kind(node)
first_arg = _first_parameter_name(node)
shape: Json = {
    "name": node.name,
    "qualname": f"{owner_qualname}.{node.name}",
    "methodKind": method_kind,
    "instanceReceiver": first_arg if method_kind == "instance" else None,
    "line": int(getattr(node, "lineno", 0) or 0),
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep instanceReceiver for property accessors.

Lines 1322-1328 only populate instanceReceiver for "instance" methods. _method_kind() returns "property" for @property, .setter, and .deleter, so those shapes lose self, and _MethodAttributeScanner stops seeing self.attr = ... inside property setters/deleters. That drops promised receiver metadata and misses openAttributes evidence.

Suggested fix
 def _method_shape(
     node: ast.FunctionDef | ast.AsyncFunctionDef,
     owner_qualname: str,
 ) -> Json:
     method_kind = _method_kind(node)
     first_arg = _first_parameter_name(node)
     shape: Json = {
         "name": node.name,
         "qualname": f"{owner_qualname}.{node.name}",
         "methodKind": method_kind,
-        "instanceReceiver": first_arg if method_kind == "instance" else None,
+        "instanceReceiver": first_arg if method_kind in {"instance", "property"} else None,
         "line": int(getattr(node, "lineno", 0) or 0),
     }
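To see why property accessors need a receiver, consider a setter that creates an attribute on `self`. The sketch below is hypothetical: `method_kind` and `instance_receiver` are simplified stand-ins written for this example (they are not the PR's `_method_kind` or `_method_shape`), and the `Temperature` class is invented.

```python
import ast

SOURCE = '''
class Temperature:
    def __init__(self):
        self._celsius = 0.0

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        self._celsius = value
        self.last_written = value
'''


def method_kind(node):
    """Simplified classifier: property accessors, class/static methods, else instance."""
    for dec in node.decorator_list:
        if isinstance(dec, ast.Name) and dec.id == "property":
            return "property"
        if isinstance(dec, ast.Attribute) and dec.attr in {"setter", "deleter"}:
            return "property"
        if isinstance(dec, ast.Name) and dec.id in {"classmethod", "staticmethod"}:
            return dec.id
    return "instance"


def instance_receiver(node):
    # Property accessors receive the instance as their first parameter too.
    if method_kind(node) in {"instance", "property"} and node.args.args:
        return node.args.args[0].arg
    return None


class_node = ast.parse(SOURCE).body[0]
receivers = [
    (item.name, method_kind(item), instance_receiver(item))
    for item in class_node.body
    if isinstance(item, ast.FunctionDef)
]

# Runtime view: the setter creates an attribute that __init__ never set.
namespace = {"__name__": "demo"}
exec(SOURCE, namespace)
temp = namespace["Temperature"]()
before = hasattr(temp, "last_written")
temp.celsius = 21.5
after = hasattr(temp, "last_written")
print(receivers, before, after)
```

With the receiver known for the setter, a scanner can treat `self.last_written = value` as evidence for an open (known but not guaranteed) attribute.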
