[codex] lift Python class shapes#1913
Conversation
WalkthroughThis PR extends the Python source lifter to generate and attach comprehensive class-shape metadata to lifted source units. The contract is updated to optionally carry classShapes; the lifter analyzes class definitions, methods, attributes, and dynamic mutations through new AST visitors; and three tests validate the class-shape outputs across slots, methods, inheritance patterns, and soundness boundaries. ChangesClass Shape Metadata Generation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d87377de32
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| setattr_override_by_name=setattr_override_by_name, | ||
| ) | ||
| shapes.append(shape) | ||
| shapes_by_name[info.node.name] = shape |
There was a problem hiding this comment.
Avoid resolving bases through hidden local classes
When a module-level base name is reused by a class inside a function, this simple-name index is overwritten by the function-local class even though that class is not visible to later module-level class definitions. For example, after class Base with __setattr__, def f(): class Base: pass, then class Derived(Base), the derived class is resolved against f.<locals>.Base, so setattr-override-in-mro can be missed and the shape may be marked closed incorrectly. Keying only by node.name makes class-shape soundness depend on unrelated hidden local definitions.
Useful? React with 👍 / 👎.
| if ( | ||
| self.method_name == "__init__" | ||
| and self.method_kind == "instance" | ||
| and self._conditional_depth == 0 | ||
| and self._nested_depth == 0 | ||
| ): |
There was a problem hiding this comment.
Treat early exits before init writes as non-guaranteed
This marks every top-level self.x = ... in __init__ as guaranteed, but a preceding top-level return or raise can skip that assignment while still leaving an instance constructed, e.g. def __init__(self, ok): if not ok: return; self.value = 1. In that case value is recorded as presence-guaranteed even though normal construction can produce an object without it, so any future use of this catalog would discharge an attribute that may be absent.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/lifter.py`:
- Around line 1320-1328: The shape currently sets instanceReceiver only when
method_kind == "instance", which omits receivers for property accessors because
_method_kind(node) returns "property" for `@property/.setter/.deleter`; update the
condition that computes "instanceReceiver" (using _first_parameter_name(node))
to include property methods as well (e.g., if method_kind == "instance" or
method_kind == "property") so _MethodAttributeScanner will see self assignments
inside property setters/deleters and preserve receiver/openAttributes metadata.
- Around line 515-522: In _ClassBodyPoisonScanner, the early returns in
visit_FunctionDef, visit_AsyncFunctionDef, and visit_ClassDef skip traversing
decorator expressions so class-body mutations via decorators are missed; change
each of these methods to iterate over node.decorator_list and call
self.visit(...) on each decorator expression (and then return) so decorator-side
effects (e.g., setattr/delattr calls) are seen during class-body scanning; keep
the methods otherwise noop for body traversal to avoid entering nested bodies.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eb6e9f71-62cb-4483-9b95-5cf54c84ead8
📒 Files selected for processing (3)
implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/ir.pyimplementations/python/provekit-lift-python-source/src/provekit_lift_python_source/lifter.pyimplementations/python/provekit-lift-python-source/tests/test_lifter.py
| def visit_FunctionDef(self, node: ast.FunctionDef) -> None: | ||
| return | ||
|
|
||
| def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None: | ||
| return | ||
|
|
||
| def visit_ClassDef(self, node: ast.ClassDef) -> None: | ||
| return |
There was a problem hiding this comment.
Traverse decorator expressions in _ClassBodyPoisonScanner.
Lines 515-522 return before visiting decorator_list, so class-body effects like @setattr(...) / @delattr(...) on methods or nested classes are never seen. Those decorators execute while the class body is being evaluated, so a shape can stay "closed" even though class construction performed dynamic mutation.
Suggested fix
def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
- return
+ for decorator in node.decorator_list:
+ self.visit(decorator)
def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
- return
+ for decorator in node.decorator_list:
+ self.visit(decorator)
def visit_ClassDef(self, node: ast.ClassDef) -> None:
- return
+ for decorator in node.decorator_list:
+ self.visit(decorator)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def visit_FunctionDef(self, node: ast.FunctionDef) -> None: | |
| return | |
| def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None: | |
| return | |
| def visit_ClassDef(self, node: ast.ClassDef) -> None: | |
| return | |
| def visit_FunctionDef(self, node: ast.FunctionDef) -> None: | |
| for decorator in node.decorator_list: | |
| self.visit(decorator) | |
| def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None: | |
| for decorator in node.decorator_list: | |
| self.visit(decorator) | |
| def visit_ClassDef(self, node: ast.ClassDef) -> None: | |
| for decorator in node.decorator_list: | |
| self.visit(decorator) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/lifter.py`
around lines 515 - 522, In _ClassBodyPoisonScanner, the early returns in
visit_FunctionDef, visit_AsyncFunctionDef, and visit_ClassDef skip traversing
decorator expressions so class-body mutations via decorators are missed; change
each of these methods to iterate over node.decorator_list and call
self.visit(...) on each decorator expression (and then return) so decorator-side
effects (e.g., setattr/delattr calls) are seen during class-body scanning; keep
the methods otherwise noop for body traversal to avoid entering nested bodies.
| method_kind = _method_kind(node) | ||
| first_arg = _first_parameter_name(node) | ||
| shape: Json = { | ||
| "name": node.name, | ||
| "qualname": f"{owner_qualname}.{node.name}", | ||
| "methodKind": method_kind, | ||
| "instanceReceiver": first_arg if method_kind == "instance" else None, | ||
| "line": int(getattr(node, "lineno", 0) or 0), | ||
| } |
There was a problem hiding this comment.
Keep instanceReceiver for property accessors.
Lines 1322-1328 only populate instanceReceiver for "instance" methods. _method_kind() returns "property" for @property, .setter, and .deleter, so those shapes lose self, and _MethodAttributeScanner stops seeing self.attr = ... inside property setters/deleters. That drops promised receiver metadata and misses openAttributes evidence.
Suggested fix
def _method_shape(
node: ast.FunctionDef | ast.AsyncFunctionDef,
owner_qualname: str,
) -> Json:
method_kind = _method_kind(node)
first_arg = _first_parameter_name(node)
shape: Json = {
"name": node.name,
"qualname": f"{owner_qualname}.{node.name}",
"methodKind": method_kind,
- "instanceReceiver": first_arg if method_kind == "instance" else None,
+ "instanceReceiver": first_arg if method_kind in {"instance", "property"} else None,
"line": int(getattr(node, "lineno", 0) or 0),
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@implementations/python/provekit-lift-python-source/src/provekit_lift_python_source/lifter.py`
around lines 1320 - 1328, The shape currently sets instanceReceiver only when
method_kind == "instance", which omits receivers for property accessors because
_method_kind(node) returns "property" for `@property/.setter/.deleter`; update the
condition that computes "instanceReceiver" (using _first_parameter_name(node))
to include property methods as well (e.g., if method_kind == "instance" or
method_kind == "property") so _MethodAttributeScanner will see self assignments
inside property setters/deleters and preserve receiver/openAttributes metadata.
Summary
This PR adds Slice 1 of the Python class-shape work: a data-only
classShapescatalog on Python source-unit contracts.The slice is intentionally soundness-inert:
attribute_presentpredicate emission yetcf_guardedcarrierspanicLociremain in placeThe catalog records:
attributesonly for class-body assignments and unconditional__init__instance assignmentspermittedAttributeswithguaranteesPresence=false, not as presenceopenAttributesfor known but not guaranteed membersopenReasonsfor the seven approved soundness cases__init__construction and cross-module mutation limitsAdditive / CID Gate
Confirmed
classShapesis purely additive and gated:origin/mainexactly and carries noclassShapesfield.origin/mainonly by additiveclassShapes; stripping that field restores byte-identicalorigin/mainIR.Command evidence:
Signature/CID classification:
classShapesis absent for class-free source units.classShapesmetadata restores the old IR.make USE_BCARGO=0 mint-pythonaccepted the checked-in attestation:make python-language-signaturecurrently fails because checked-in language-signature assets are already stale onorigin/main:This branch does not modify
menagerie/python-language-signature, and the same check fails from anorigin/mainarchive, so this is reported as a pre-existing signature check failure, not a regression from classShapes.RED -> GREEN Evidence
Tests were written before implementation.
Initial RED run:
The failures were because
classShapeswas absent from source-unit contracts.GREEN runs:
Python Kit Conformance / Self-Application
Python C1-C8 kit conformance passed with local Cargo:
Python self-contract mint/self-application passed:
Note: the first unqualified
make prove-pythonattempt used the defaultbin/bcargoroute on this macOS host. It built/synced a Linuximplementations/rust/target/release/provekitbinary and then failed locally withcannot execute binary file/ exit 126. The local-Cargo rerun above is the valid conformance evidence.Corpus Counts
Corpus sampled from
/tmp/provekit-classshape-src-20260603/src.Representative records:
Slice 2 Notes Acknowledged, Not Implemented
These are logged for the next slice and intentionally not acted on in this PR:
closedNumPy classes before any discharge trustsclosed.method-decoratorwhole-class-open (328 NumPy) and non-local/cross-file base resolution (450 NumPy) are the top deferred refinements.Review Gate
Do not self-merge. CI plus CodeRabbit review are mandatory before merge.
Summary by CodeRabbit
New Features
Tests