idx=15: freeze (46 records) — Mast Therapeutics Separation Agreement (multi-line title + synthetic L0 swap)#88
arthrod wants to merge 1 commit into
… swap doc2dict synthetic single-word L0 ("AGREEMENT") with the earlier multi-line title carrier ("SEPARATION AGREEMENT / AND / GENERAL RELEASE OF CLAIMS"), enable forward continuation walk on swapped L0, mark synthetic separator so /s/ sig-block UP-walk skips it
Ruff (0.15.12), scripts/parse_doc2dict_with_config.py:
[warning] 2417-2417: Too many branches (13 > 12) (PLR0912)
[warning] 2704-2704: Consider iterable unpacking instead of concatenation. Replace with iterable unpacking (RUF005)

Remote MCP: Summary of Gathered Context

Document Context: Mast Therapeutics Separation Agreement
The idx=15 document is a Separation Agreement and General Release of Claims between Mast Therapeutics, Inc. and employee Brandi L. Roberts, dated April 10–13, 2017 (as mentioned in PR context). The termination was expected to occur on or about April 21, 2017, in connection with the closing of the acquisition of Savara Inc. by Mast. The acquisition was completed on April 27, 2017.

PR Code Changes Review
Three surgical parser fixes were introduced:
Walkthrough
This PR introduces a new parsing pass that detects and normalizes a layout pathology where doc2dict incorrectly promotes a bare single-word AGREEMENT/PLAN separator to L0 depth, while the real multi-line agreement title appears earlier. The fix swaps their depth assignments, extends the merge logic to collect forward continuation lines, updates downstream signature-block logic to skip synthetic separators, and adds a frozen output dataset for idx_15.

Changes
- Synthetic Agreement Title Normalization
- Frozen Dataset and State Management

Estimated code review effort: 3 (Moderate), ~25 minutes
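The signature-block guard mentioned in the walkthrough can be illustrated with a minimal sketch. It assumes records are dicts linked by a hypothetical `parent_id` key and that the UP-walk simply steps past any ancestor flagged `_synthetic_l0_separator`; the real `_explode_signature_block_lines` may handle the flagged parent differently.

```python
def nearest_label_ancestor(node, by_id):
    """Walk parent links upward, skipping the synthetic separator.

    `by_id` maps node_id -> record; `parent_id` is a hypothetical key
    used only for this illustration.
    """
    parent = by_id.get(node.get("parent_id"))
    while parent is not None:
        # Exact-match check on the marker, as described in the PR.
        if parent.get("_synthetic_l0_separator") is True:
            parent = by_id.get(parent.get("parent_id"))
            continue
        return parent
    return None


by_id = {
    1: {"node_id": 1, "parent_id": None, "title": "SEPARATION AGREEMENT"},
    5: {"node_id": 5, "parent_id": 1, "title": "AGREEMENT",
        "_synthetic_l0_separator": True},
    9: {"node_id": 9, "parent_id": 5, "title": "/s/ Signatory"},
}
label = nearest_label_ancestor(by_id[9], by_id)
```

Here the `/s/` line resolves to the real title record rather than to the bare AGREEMENT separator.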
Code Review
This pull request introduces a structural fix for document parsing to correctly identify agreement titles by swapping generic separators with descriptive titles found earlier in the text. It also updates multi-line title merging to support forward-walking collection and includes data updates for document index 15. Feedback identifies a discrepancy between the docstring and implementation in the new swap function and suggests optimizing regex compilation by moving it to the module level.
    1. The current L0 title is BARE: matches exactly "AGREEMENT" or
       "PLAN" with nothing else.
    2. The current L0 has NO body_direct.
    3. There is an EARLIER record (smaller node_id) whose title matches
       the AGREEMENT|PLAN end pattern AND has descriptive prefix words
       (i.e. title is longer than just the bare word).
    4. That earlier record is in agreement scope (not envelope, not
       trailer).
The docstring states that one of the conditions for the swap is that 'The current L0 has NO body_direct'. However, the implementation does not check for this, and a comment at line 2487 explicitly says 'Body presence is NOT a disqualifier'. This suggestion updates the docstring to be consistent with the implementation by removing this point and renumbering the list.
Suggested change:
-   1. The current L0 title is BARE: matches exactly "AGREEMENT" or
-      "PLAN" with nothing else.
-   2. The current L0 has NO body_direct.
-   3. There is an EARLIER record (smaller node_id) whose title matches
-      the AGREEMENT|PLAN end pattern AND has descriptive prefix words
-      (i.e. title is longer than just the bare word).
-   4. That earlier record is in agreement scope (not envelope, not
-      trailer).
+   1. The current L0 title is BARE: matches exactly "AGREEMENT" or
+      "PLAN" with nothing else.
+   2. There is an EARLIER record (smaller node_id) whose title matches
+      the AGREEMENT|PLAN end pattern AND has descriptive prefix words
+      (i.e. title is longer than just the bare word).
+   3. That earlier record is in agreement scope (not envelope, not
+      trailer).
    _AGREEMENT_END_RE = re.compile(
        r"^.*\b(AGREEMENT|PLAN)\s*$",
        re.IGNORECASE,
    )
For better performance, this regular expression should be compiled only once at the module level, for example, right after _BARE_AGREEMENT_PLAN_TITLE_RE. Compiling it on every function call is inefficient, especially since this function is called for each document parsed. You can then use the module-level constant _AGREEMENT_END_RE inside this function.
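A minimal sketch of the suggested layout, with both patterns compiled once at import time. The pattern strings are the ones quoted in this PR (the flags on the bare pattern are an assumption), and `is_descriptive_agreement_title` is a hypothetical helper used only to show the constants being reused inside a function.

```python
import re

# Compiled once at module import, then reused on every call.
_BARE_AGREEMENT_PLAN_TITLE_RE = re.compile(r"^\s*(?:AGREEMENT|PLAN)\s*$")
_AGREEMENT_END_RE = re.compile(
    r"^.*\b(AGREEMENT|PLAN)\s*$",
    re.IGNORECASE,
)


def is_descriptive_agreement_title(title: str) -> bool:
    """Hypothetical helper: title ends in AGREEMENT/PLAN and is not bare."""
    ends_ok = _AGREEMENT_END_RE.match(title) is not None
    is_bare = _BARE_AGREEMENT_PLAN_TITLE_RE.match(title) is not None
    return ends_ok and not is_bare
```

Python's `re` module keeps an internal cache of recently compiled patterns, so the runtime saving is likely modest; the module-level constant mainly keeps the pattern defined in one place next to its sibling.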
User description
Summary
Sixteenth stacked PR. Adds idx=15 (SEPARATION AGREEMENT AND GENERAL RELEASE OF CLAIMS between Brian M. Culley and Mast Therapeutics, Inc., April 10-13, 2017) as the sixteenth verified frozen baseline on top of idx=14 (PR #87).
This agreement has a quirky HTML structure that exposed 3 parser pathologies, all addressed surgically:
- … (SEPARATION AGREEMENT / AND / GENERAL RELEASE OF CLAIMS)
- AGREEMENT body separator between RECITALS and operative sections that doc2dict promoted to L0 instead of the real title

Parser changes (3 surgical, shape-driven)
- _swap_synthetic_l0_with_real_title (new, ~lines 2411-2559): when L0 has a BARE single-word ^\s*(?:AGREEMENT|PLAN)\s*$ title, finds an earlier sibling in the same scope whose title matches ^.*\b(AGREEMENT|PLAN)\s*$ with a descriptive prefix; swaps depths (earlier→L0, synthetic→L1); sets _swapped_l0 and _synthetic_l0_separator markers. Predicate is shape-tight: silently no-ops if no descriptive earlier title exists or if subdoc_penalty/scope don't match. Inspector verified all prior 15 idxs have descriptive L0; none fire the swap.
- Extended _merge_multiline_l0_title (~lines 2562-2727) with FORWARD continuation walk gated by _swapped_l0. Collects trailing title-line siblings (AND, GENERAL RELEASE OF CLAIMS), absorbs preamble body so _split_l0_title_from_preamble lifts it to L1. Normal multi-line titles (idx=7 backward-walk) still use the original backward walk.
- Guarded _explode_signature_block_lines UP-walk (~lines 4676-4682) against claiming the synthetic separator as a sig-block ancestor label. Uses parent.get("_synthetic_l0_separator") exact-match check.

All detection is SHAPE-based. No phrase blocklists.
Verified output for idx=15
{L0:1, L1:32, L2:13} (max depth 2)

L0 (verbatim, multi-line)

Top structure

Risk assessment (inspector)
All 3 fixes well-scoped:
- _swapped_l0 marker → only fires after the swap → cannot affect normal multi-line titles

Known minor quirks (polish-deferred)
- "- 5 -" page footer artifact (5 chars). Per rubric §"Common parser failure modes" this is technically out-of-scope chrome. Single-record defect, not blocking.

Test plan
- uv run scripts/parse_doc2dict_with_config.py --limit 16 --no-truncate --output-dir data/auto_parse exits 0 with ok 16
- uv run scripts/level_loop/freeze.py 15 --force reports word_coverage ≥ 90% (96.3%)
- uv run scripts/level_loop/regress.py reports all 16 frozen idxs OK

🤖 Generated with Claude Code
CodeAnt-AI Description
Correctly parse a separation agreement with a split title and internal section separator
What Changed
Impact
✅ Accurate contract titles
✅ Fewer missing or misplaced opening sections
✅ Cleaner signature block extraction