idx=8: freeze (26 records) — Allegiance Bancshares Restricted Stock Agreement template (IWW without /s/) #81

Open
arthrod wants to merge 1 commit into redo/idx-7 from redo/idx-8

Conversation

@arthrod
Owner

@arthrod arthrod commented May 17, 2026

User description

Summary

Ninth stacked PR. Adds idx=8 (RESTRICTED STOCK AGREEMENT, Allegiance Bancshares Non-Employee Director form template) as the ninth verified frozen baseline on top of idx=7 (PR #80).

idx=8 is a blank form/template — has the IWW operating clause but NO /s/ signature marks, only placeholder By:/Name:/Title: fields. This stress-tests the sig-page detector against the template-form pattern.

Parser change (1 surgical, shape-driven)

Extended _explode_signature_block_lines (~lines 3935-4057): handles the IWW-present-but-no-/s/ case. When _SIGN_OFF_RE finds no /s/ carrier but an IWW operating clause is present:

  1. Treats each IWW record as the sig-page anchor
  2. Walks IWW's sibling records (same parent) and their descendants
  3. Applies existing SHAPE detectors (_SIG_FIELD_RE for By:/Name:/Title:/Address:, _SIG_BLOCK_LABEL_RE for ALL-CAPS party labels, _CORP_SUFFIX_LABEL_RE for corp-suffix labels) to identify sig-line records
  4. Pins identified sig-line records at depth 2 + subdoc_penalty
  5. Excludes numbered/lettered sections, the IWW clause itself, and envelope/trailer rows

Gating: only fires when not has_slash_s and iww_present. In all 8 prior signed agreements, this branch is skipped — confirmed by byte-identical regress.
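
The gated branch described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/parse_doc2dict_with_config.py: the regexes, the `text_of` helper, and the `pin_template_sig_lines` name are all hypothetical stand-ins, and the sketch only walks direct siblings of the IWW record (the PR also walks their descendants).

```python
import re

# Hypothetical stand-ins for the parser's detectors; the real patterns
# are not shown in this PR.
SIG_FIELD_RE = re.compile(r"^\s*(By|Name|Title|Address)\s*:", re.IGNORECASE)
SIG_BLOCK_LABEL_RE = re.compile(r"^[A-Z][A-Z0-9 .,&'-]{2,60}$")
SECTION_MARKER_RE = re.compile(r"^\s*(\d+\.|\([a-z]\))\s")
SLASH_S_RE = re.compile(r"/s/")
IWW_RE = re.compile(r"^\s*IN WITNESS WHEREOF", re.IGNORECASE)


def text_of(row):
    """Join title and direct body so detectors can look at the whole record."""
    return f"{row.get('title') or ''} {row.get('body_direct') or ''}".strip()


def pin_template_sig_lines(rows, subdoc_penalty=0):
    """Pin sig-shaped siblings of each IWW record at depth 2 + subdoc_penalty."""
    has_slash_s = any(SLASH_S_RE.search(text_of(r)) for r in rows)
    iww_rows = [r for r in rows if IWW_RE.match(text_of(r))]
    if has_slash_s or not iww_rows:
        return rows  # gate: signed agreements are left untouched
    iww_ids = {r["node_id"] for r in iww_rows}
    parents = {r["parent_node_id"] for r in iww_rows}
    for r in rows:
        if r["parent_node_id"] not in parents or r["node_id"] in iww_ids:
            continue  # only siblings of an IWW record, never IWW itself
        title = (r.get("title") or "").strip()
        if SECTION_MARKER_RE.match(title):
            continue  # numbered/lettered sections are excluded
        if SIG_FIELD_RE.match(title) or SIG_BLOCK_LABEL_RE.match(title):
            r["depth"] = 2 + subdoc_penalty
            r["_sig_line"] = True
    return rows
```

Because the gate returns early whenever any record carries /s/, a signed agreement passes through this function unchanged, which is the property the byte-identical regress is meant to confirm.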

Verified output for idx=8

  • 26 records, distribution {L0:1, L1:16, L2:7, L3:2}
  • Reconstruction: word_coverage 100.0%, char_ratio 98.0%
  • Max depth: 3
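
A quick way to re-derive a level distribution like the one above from a frozen JSONL file is sketched below. The field name `level` is an assumption; the actual schema of idx_8.jsonl is not shown in this PR.

```python
import json
from collections import Counter


def level_distribution(jsonl_text):
    """Count records per level; `level` is an assumed field name."""
    levels = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():  # tolerate blank lines between records
            levels[json.loads(line)["level"]] += 1
    return dict(sorted(levels.items()))


sample = '{"level": 0}\n{"level": 1}\n{"level": 1}\n{"level": 2}\n'
print(level_distribution(sample))
```

For the reported distribution, the per-level counts 1 + 16 + 7 + 2 add up to the 26 records stated above.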

Top-level structure (verbatim L0 + sig area)

o=0  L0: RESTRICTED STOCK AGREEMENT
o=1  L1: (Non-Employee Director) / This Restricted Stock Agreement ("Agreement") effective as of...
o=2  L1: 1. Award of Stock
o=3  L1: 2. Forfeiture Restrictions
...
o=23 L1: IN WITNESS WHEREOF, the Company has caused this Agreement to be duly executed...
o=24 L2: ALLEGIANCE BANCSHARES, INC. / By: / Name: / Title:
o=25 L2: Holder

The template has NO /s/ marks (it's a blank form). Sig area structure preserved per doc2dict natural grouping at L2.

Test plan

  • uv run scripts/parse_doc2dict_with_config.py --limit 9 --no-truncate --output-dir data/auto_parse exits 0 with ok 9
  • uv run scripts/level_loop/freeze.py 8 --force reports word_coverage ≥ 90% (100.0%)
  • uv run scripts/level_loop/regress.py reports all 9 frozen idxs OK
  • Inspector verified form-template extension well-scoped, gated correctly, byte-identical regress on signed idxs 0-7

Source

http://www.sec.gov/Archives/edgar/data/1642081/000164208117000074/a2017q1exhibit101-restrict.htm

Why this matters for the corpus

Form templates with blank signature fields are common in SEC EX-10 filings (stock award forms, employment agreement templates, etc.). The IWW-without-/s/ branch generalizes to all such templates while leaving signed agreements untouched.

🤖 Generated with Claude Code


CodeAnt-AI Description

Detect signature areas in blank form agreements that have an IWW clause but no /s/ mark

What Changed

  • Agreements that contain an IWW clause but no /s/ signature mark now keep the signature page grouped correctly instead of leaving signature fields at the wrong level
  • Party labels and placeholder fields like By:, Name:, and Title: are now treated as signature-area content when they appear alongside the IWW clause
  • Numbered sections, the IWW clause itself, and trailer/envelope rows are excluded from the signature area
  • Added a frozen baseline for the Allegiance Bancshares Restricted Stock Agreement template with verified reconstruction output

Impact

✅ Correct parsing of blank (unsigned) form templates
✅ Fewer misgrouped signature fields
✅ Stable frozen baseline coverage

…greement: IWW-anchored sig-page detection for form templates without /s/ marks

idx=8 is a form/template Restricted Stock Agreement (Non-Employee Director) for Allegiance Bancshares, Inc. As a blank form it has an IWW operating clause but no /s/ signature marks — only placeholder By:/Name:/Title: fields. The existing _explode_signature_block_lines early-exited when no /s/ carrier existed, leaving party-label and By:/Name:/Title: records at L1 instead of demoting them to L2.

Fix in scripts/parse_doc2dict_with_config.py: when IWW is present but no /s/ exists, treat each IWW record as the sig-page anchor. Walk siblings of the IWW carrier under the same parent (and their descendants) and apply shape detection (_SIG_FIELD_RE / _SIG_BLOCK_LABEL_RE / _CORP_SUFFIX_LABEL_RE / bare-name) to identify sig-line records, then pin them at depth 2 + subdoc_penalty. The walk does not filter by node_id position because doc2dict's promoted-leaf ordering doesn't guarantee document order; shape detection itself filters out non-sig content. Numbered/lettered sections, the IWW clause itself, and envelope/trailer rows are excluded.

Verified:
  - idx=8: 26 records, levels {0: 1, 1: 16, 2: 7, 3: 2}, reconstruction word_coverage=100% char_ratio=98%
  - All 9 frozen idxs (0..8) regress OK

@codeant-ai

codeant-ai Bot commented May 17, 2026

CodeAnt AI is reviewing your PR.

@coderabbitai

coderabbitai Bot commented May 17, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The PR enhances the parser to detect signatures via IWW clauses when /s/ marks are absent. It adds logic to identify signature-shaped sibling records, demote them to L2 depth, and mark them for later processing. A frozen dataset validates the updated logic on a Restricted Stock Agreement example.

Changes

Signature block explosion with IWW fallback

| Layer / File(s) | Summary |
|---|---|
| IWW-based signature detection logic (scripts/parse_doc2dict_with_config.py) | Early-return guard checks for both /s/ marks and IWW clauses; new branch identifies signature-shaped siblings under IWW parent nodes via shape detectors (By/Name/Title/Address fields, block-label forms, corporate-suffix forms, and bare-name shapes), demotes matched records to L2 depth, and sets _sig_line=True. |
| Frozen index dataset and state tracking (data/auto_parse/level_freeze/frozen/idx_8.jsonl, data/auto_parse/level_freeze/state.json) | New frozen dataset idx: 8 with 26 JSONL records encoding a Restricted Stock Agreement as ordered, hierarchically-leveled spans; state.json frozen index list and history updated with a freeze event timestamped 2026-05-17T07:30:15. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • arthrod/clause-extract#50: Both PRs extend the data/auto_parse/level_freeze workflow by adding frozen idx_*.jsonl files and updating state.json frozen list and history records; this PR also modifies the parser logic in scripts/parse_doc2dict_with_config.py to support signature detection fallback via IWW.

Suggested labels

Feat2

🐰 A rabbit hops through signatures with grace,
When /s/ marks fade from the document's face,
IWW steps in with shape-detecting cheer,
L2 demotions make the structure clear,
A frozen dataset proves the logic sound—
Happy parsing all around! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: adding idx=8 (a 26-record frozen baseline) for a Restricted Stock Agreement template with IWW but no /s/ signatures, which directly matches the changeset's primary objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description comprehensively details the changes, including the new frozen baseline idx=8, parser modifications to handle IWW-present-but-no-/s/ cases, verified output metrics, and test results. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces logic to detect signature areas in document templates that lack explicit signature marks but contain an 'In Witness Whereof' (IWW) clause. It implements a new sibling and descendant walk to identify signature-related fields and demote them to level 2. Additionally, a new frozen test case for a Restricted Stock Agreement is added. Feedback was provided regarding an inconsistency in the signature line detection logic where the sibling walk was missing a check for bare titles, which could lead to missing certain party labels.

Comment on lines +4006 to +4012
                looks_sig = (
                    _SIG_FIELD_RE.match(sib_title)
                    or _SIG_FIELD_RE.match(sib_body)
                    or (sib_title and _SIG_BLOCK_LABEL_RE.match(sib_title))
                    or (sib_title and _CORP_SUFFIX_LABEL_RE.match(sib_title))
                    or (not sib_title and sib_body and _SIG_FIELD_RE.match(sib_body))
                )

medium

The logic for identifying signature lines in the template case (no /s/) is inconsistent between the sibling walk and the descendant walk. The sibling walk here is missing the (sib_title and not sib_body) check which is present in the descendant walk at line 4032. This means mixed-case party labels like "Holder" (as seen in the verified output for idx=8) might be missed if they are siblings of the IWW clause rather than descendants of another signature-shaped node. Adding this check ensures that bare titles in the signature area are correctly identified as signature lines regardless of their case.

                looks_sig = (
                    _SIG_FIELD_RE.match(sib_title)
                    or _SIG_FIELD_RE.match(sib_body)
                    or (sib_title and _SIG_BLOCK_LABEL_RE.match(sib_title))
                    or (sib_title and _CORP_SUFFIX_LABEL_RE.match(sib_title))
                    or (sib_title and not sib_body)  # bare name as title
                    or (not sib_title and sib_body and _SIG_FIELD_RE.match(sib_body))
                )
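
One way to keep the sibling and descendant walks from drifting apart is to route both through a single predicate. The sketch below assumes hypothetical patterns in place of the parser's real `_SIG_FIELD_RE`, `_SIG_BLOCK_LABEL_RE`, and `_CORP_SUFFIX_LABEL_RE`, and the helper name `looks_sig` is illustrative. The final `not title and body and ...` clause from the snippet above is dropped here because the earlier body match already covers it.

```python
import re

# Hypothetical patterns standing in for the parser's real detectors.
SIG_FIELD_RE = re.compile(r"^\s*(By|Name|Title|Address)\s*:", re.IGNORECASE)
SIG_BLOCK_LABEL_RE = re.compile(r"^[A-Z][A-Z0-9 .,&'-]{2,60}$")
CORP_SUFFIX_LABEL_RE = re.compile(r".*\b(Inc|LLC|Corp|Ltd)\.?$", re.IGNORECASE)


def looks_sig(title, body):
    """Single sig-shape predicate shared by the sibling and descendant walks."""
    title, body = (title or "").strip(), (body or "").strip()
    return bool(
        SIG_FIELD_RE.match(title)
        or SIG_FIELD_RE.match(body)
        or (title and SIG_BLOCK_LABEL_RE.match(title))
        or (title and CORP_SUFFIX_LABEL_RE.match(title))
        or (title and not body)  # bare name as title, e.g. "Holder"
    )


print(looks_sig("Holder", ""))           # bare mixed-case party label
print(looks_sig("Holder", "some text"))  # same title, but with body text
```

With a shared helper, adding or tightening a clause changes both walks at once, so a mixed-case label such as "Holder" is treated the same whether it is a sibling of the IWW clause or a descendant of another sig-shaped node.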

@coderabbitai coderabbitai Bot added the Feat2 label May 17, 2026
@codeant-ai codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label May 17, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/parse_doc2dict_with_config.py`:
- Around line 4013-4037: The descendant-scan is only executed when the sibling
`sib` matches `looks_sig`, so nested signature-like nodes inside a neutral
wrapper are never examined; change the logic around `looks_sig` so you still add
`sib["node_id"]` to `sig_line_node_ids` when `looks_sig` is true, but always
call `_walk_descendants_local(sib["node_id"])` for every `sib` (removing the `if
looks_sig:` gating) and keep the existing filtering checks
(`_has_section_marker_title`, `_is_iww_clause`, `_SIG_FIELD_RE`,
`_SIG_BLOCK_LABEL_RE`, `_CORP_SUFFIX_LABEL_RE`, bare title/body heuristics,
`_span_text`) when deciding whether to add descendant `d["node_id"]` to
`sig_line_node_ids`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6c388712-1d29-4490-84c6-a9d230a64bb6

📥 Commits

Reviewing files that changed from the base of the PR and between ec36b24 and 0f31083.

📒 Files selected for processing (3)
  • data/auto_parse/level_freeze/frozen/idx_8.jsonl
  • data/auto_parse/level_freeze/state.json
  • scripts/parse_doc2dict_with_config.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (Custom checks)

**/*.py: Run smoke test for Python web servers: start server, wait for ready signal, hit GET /health or GET /, assert HTTP 200. Timeout: 30s. Fail if smoke test fails.
Run smoke test for Python CLI: run <cli> --help, assert exit code 0. Fail if smoke test fails.
Run uv run pytest --cov=<src_package> --cov-report=term-missing --cov-branch --cov-fail-under=80 -q for Python coverage. Fail if exit code is non-zero, branch coverage < 70%, or line coverage < 80%.
Run uv run ruff check . --diff for Python linting. Fail if exit code is non-zero and list each violation.
Run uv run ruff format --check --diff . for Python formatting. Fail if exit code is non-zero and list each unformatted file.
Scan diffs for newly added Python suppression comments (# noqa, # type: ignore). Each suppression must have an inline justification comment. Fail if any new suppression lacks justification. Warn if total new suppressions > 3 in a single PR.
Run uv run ruff check --select I,F401 . to verify Python import ordering and detect unused imports. Fail if violations found.
Run the full Python test suite: uv run pytest --tb=line -q on origin/main to capture baseline pass/fail counts, and uv run pytest --tb=short -q on PR branch. Fail immediately if exit code is non-zero.
Run uv run typy check for Python type checking if .py files exist in diff or project has py.typed marker. Fail if exit code is non-zero. If typy is not available, use configured mypy or pyright instead. Fail with 'No Python type checker configured' if none is found.
Scan diffs for new bare type: ignore comments (without error codes) in Python files and cast() calls without explanatory comments. Warn for each. Fail if bare type: ignore count > 3.

Files:

  • scripts/parse_doc2dict_with_config.py
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (Custom checks)

For each changed production file, verify at least one corresponding test file exists or already exists in the repo with assertions covering changed symbols. Fail if a changed production file has zero associated test file and introduces new exported functions, classes, or public methods. Exempt: config files, type declaration files (.d.ts), migrations, and generated code.

Files:

  • scripts/parse_doc2dict_with_config.py
🔇 Additional comments (2)
data/auto_parse/level_freeze/frozen/idx_8.jsonl (1)

1-26: LGTM!

data/auto_parse/level_freeze/state.json (1)

11-12: LGTM!

Also applies to: 175-180

Comment on lines +4013 to +4037
                if looks_sig:
                    sig_line_node_ids.add(sib["node_id"])
                    # Walk descendants of this sig-shape sibling so
                    # nested labels/fields (e.g. a "Holder" child under
                    # a party-label parent) are also demoted to L2.
                    for d in _walk_descendants_local(sib["node_id"]):
                        if d.get("is_envelope") or d.get("scope") == "trailer":
                            continue
                        if _has_section_marker_title(d):
                            continue
                        if _is_iww_clause(_span_text(d)):
                            continue
                        d_title = (d.get("title") or "").strip()
                        d_body = (d.get("body_direct") or "").strip()
                        d_looks_sig = (
                            _SIG_FIELD_RE.match(d_title)
                            or _SIG_FIELD_RE.match(d_body)
                            or (d_title and _SIG_BLOCK_LABEL_RE.match(d_title))
                            or (d_title and _CORP_SUFFIX_LABEL_RE.match(d_title))
                            or (d_title and not d_body)  # bare name as title
                            or (not d_title and d_body and _SIG_FIELD_RE.match(d_body))
                        )
                        if d_looks_sig:
                            sig_line_node_ids.add(d["node_id"])

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Descendant sig-line detection is incorrectly gated by sibling shape.

At Line 4015, descendant traversal only runs when sib already matches looks_sig. If doc2dict emits a neutral wrapper sibling containing By:/Name:/Title: descendants, those descendants are never scanned and won’t be pinned/flagged.

Suggested patch
-                if looks_sig:
-                    sig_line_node_ids.add(sib["node_id"])
-                    # Walk descendants of this sig-shape sibling so
-                    # nested labels/fields (e.g. a "Holder" child under
-                    # a party-label parent) are also demoted to L2.
-                    for d in _walk_descendants_local(sib["node_id"]):
+                if looks_sig:
+                    sig_line_node_ids.add(sib["node_id"])
+                # Always inspect descendants of IWW-scope siblings; a wrapper
+                # node may not be sig-shaped while its children are.
+                for d in _walk_descendants_local(sib["node_id"]):
                         if d.get("is_envelope") or d.get("scope") == "trailer":
                             continue
                         if _has_section_marker_title(d):
                             continue
                         if _is_iww_clause(_span_text(d)):
                             continue
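
The effect of the gating can be seen on a toy tree. The sketch below is an illustration under assumptions, not the parser's code: `walk_descendants`, `collect_sig_ids`, and the single hypothetical `SIG_FIELD_RE` pattern stand in for `_walk_descendants_local` and the full set of shape detectors.

```python
import re

# Hypothetical stand-in for the parser's _SIG_FIELD_RE.
SIG_FIELD_RE = re.compile(r"^\s*(By|Name|Title|Address)\s*:", re.IGNORECASE)


def walk_descendants(rows, node_id):
    """Yield every descendant of node_id, depth-first, via parent_node_id links."""
    for r in rows:
        if r["parent_node_id"] == node_id:
            yield r
            yield from walk_descendants(rows, r["node_id"])


def collect_sig_ids(rows, siblings, gate_on_sibling):
    """Collect sig-line node ids, with or without the sibling-level gate."""
    sig_ids = set()
    for sib in siblings:
        sib_sig = bool(SIG_FIELD_RE.match(sib["title"]))
        if sib_sig:
            sig_ids.add(sib["node_id"])
        if gate_on_sibling and not sib_sig:
            continue  # gated variant: a neutral wrapper hides its children
        for d in walk_descendants(rows, sib["node_id"]):
            if SIG_FIELD_RE.match(d["title"]):
                sig_ids.add(d["node_id"])
    return sig_ids


rows = [
    {"node_id": 2, "parent_node_id": 10, "title": "Signature page"},  # neutral wrapper
    {"node_id": 3, "parent_node_id": 2, "title": "By:"},              # nested sig field
]
print(collect_sig_ids(rows, [rows[0]], gate_on_sibling=True))
print(collect_sig_ids(rows, [rows[0]], gate_on_sibling=False))
```

In the gated call the nested "By:" record is never examined; in the ungated call it is collected while the wrapper itself still is not.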
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                if looks_sig:
                    sig_line_node_ids.add(sib["node_id"])
                # Always inspect descendants of IWW-scope siblings; a wrapper
                # node may not be sig-shaped while its children are.
                for d in _walk_descendants_local(sib["node_id"]):
                    if d.get("is_envelope") or d.get("scope") == "trailer":
                        continue
                    if _has_section_marker_title(d):
                        continue
                    if _is_iww_clause(_span_text(d)):
                        continue
                    d_title = (d.get("title") or "").strip()
                    d_body = (d.get("body_direct") or "").strip()
                    d_looks_sig = (
                        _SIG_FIELD_RE.match(d_title)
                        or _SIG_FIELD_RE.match(d_body)
                        or (d_title and _SIG_BLOCK_LABEL_RE.match(d_title))
                        or (d_title and _CORP_SUFFIX_LABEL_RE.match(d_title))
                        or (d_title and not d_body)  # bare name as title
                        or (not d_title and d_body and _SIG_FIELD_RE.match(d_body))
                    )
                    if d_looks_sig:
                        sig_line_node_ids.add(d["node_id"])

Comment on lines +4006 to +4012
                looks_sig = (
                    _SIG_FIELD_RE.match(sib_title)
                    or _SIG_FIELD_RE.match(sib_body)
                    or (sib_title and _SIG_BLOCK_LABEL_RE.match(sib_title))
                    or (sib_title and _CORP_SUFFIX_LABEL_RE.match(sib_title))
                    or (not sib_title and sib_body and _SIG_FIELD_RE.match(sib_body))
                )

Suggestion: The IWW-anchored path marks uppercase/corporate-title records as signature lines even when they carry normal clause body text, because _SIG_BLOCK_LABEL_RE and _CORP_SUFFIX_LABEL_RE are used without the body guard applied in the /s/ path. This can incorrectly demote non-signature agreement content to L2 in unsigned templates. Require empty/sig-shaped body before treating label-shaped titles as sig lines. [incorrect condition logic]

Severity Level: Major ⚠️
- ❌ Non-signature uppercase headings demoted to L2 sig-lines.
- ⚠️ Level-loop gating may treat real clauses as signature block.
Steps of Reproduction ✅
1. Inspect `_explode_signature_block_lines()` in `scripts/parse_doc2dict_with_config.py`:
the IWW-anchored no-`/s/` branch begins at line 3952, and for each sibling of the IWW
parent it computes `looks_sig` at lines 4004–4012 using `_SIG_BLOCK_LABEL_RE` and
`_CORP_SUFFIX_LABEL_RE` on `sib_title` without checking that `sib_body` is empty or
sig-shaped, unlike the `/s/` path's ancestor logic at lines 14–24 and descendant logic at
lines 80–88 in the same file, which explicitly require `not p_body`/`not d_body` when
using these label detectors.

2. In a Python shell, build a `rows` list where `iww` is a record with `node_id=1`,
`parent_node_id=20`, span starting "IN WITNESS WHEREOF" (so `_is_iww_clause` is true), and
`heading` is a sibling record with `node_id=2`, `parent_node_id=20`, `title="BINDING
EFFECT"` (short all-caps, matching `_SIG_BLOCK_LABEL_RE`) and `body_direct="This section
sets binding effect terms."` (non-sig-shaped text), with `is_envelope=False` and `scope`
not `"trailer"`.

3. Call `_explode_signature_block_lines(rows)` so that `actives` contains both records,
`iww_present` is set true at lines 36–44, `has_slash_s` is false at line 46, and the `if
not has_slash_s and iww_present:` branch at 3952 is taken; `sig_area_parent_ids` is
`{20}`, and the sibling loop at lines 3993–3999 iterates over `heading`, which is not
skipped by `_has_section_marker_title` because it has no numbered/lettered marker.

4. For `heading`, `sib_title` is `"BINDING EFFECT"` and `sib_body` is the non-empty clause
text; `looks_sig` at lines 4006–4012 becomes true because
`_SIG_BLOCK_LABEL_RE.match(sib_title)` passes even though `sib_body` is normal clause
content, so `heading["node_id"]` is added to `sig_line_node_ids` at lines 4013–4015 and
later, in PASS 3 at lines 4039–4051, its `depth` is forcibly set to `2 + subdoc_penalty`
and `_sig_line=True`, incorrectly demoting a real agreement clause to a signature-line
record under the IWW-anchored template path.
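
A body guard of the kind requested here could look like the sketch below. The patterns and the `label_is_sig_line` helper are hypothetical; the real guard in the /s/ path is not shown in this PR, so this only illustrates the idea of accepting a label-shaped title when its body is empty or itself sig-shaped.

```python
import re

# Hypothetical stand-ins for _SIG_FIELD_RE and _SIG_BLOCK_LABEL_RE.
SIG_FIELD_RE = re.compile(r"^\s*(By|Name|Title|Address)\s*:", re.IGNORECASE)
SIG_BLOCK_LABEL_RE = re.compile(r"^[A-Z][A-Z0-9 .,&'-]{2,60}$")


def label_is_sig_line(title, body):
    """Accept a label-shaped title only when the body is empty or sig-shaped."""
    title, body = title.strip(), body.strip()
    if not SIG_BLOCK_LABEL_RE.match(title):
        return False
    return not body or bool(SIG_FIELD_RE.match(body))


# The heading from the reproduction steps carries normal clause text.
print(label_is_sig_line("BINDING EFFECT", "This section sets binding effect terms."))
# A party label followed by placeholder fields, as in the idx=8 sig area.
print(label_is_sig_line("ALLEGIANCE BANCSHARES, INC.", "By: / Name: / Title:"))
```

Under this guard the "BINDING EFFECT" heading is rejected because its body is ordinary clause text, while a party label with an empty body or placeholder fields is still accepted.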

Comment on lines +4013 to +4036
                if looks_sig:
                    sig_line_node_ids.add(sib["node_id"])
                    # Walk descendants of this sig-shape sibling so
                    # nested labels/fields (e.g. a "Holder" child under
                    # a party-label parent) are also demoted to L2.
                    for d in _walk_descendants_local(sib["node_id"]):
                        if d.get("is_envelope") or d.get("scope") == "trailer":
                            continue
                        if _has_section_marker_title(d):
                            continue
                        if _is_iww_clause(_span_text(d)):
                            continue
                        d_title = (d.get("title") or "").strip()
                        d_body = (d.get("body_direct") or "").strip()
                        d_looks_sig = (
                            _SIG_FIELD_RE.match(d_title)
                            or _SIG_FIELD_RE.match(d_body)
                            or (d_title and _SIG_BLOCK_LABEL_RE.match(d_title))
                            or (d_title and _CORP_SUFFIX_LABEL_RE.match(d_title))
                            or (d_title and not d_body)  # bare name as title
                            or (not d_title and d_body and _SIG_FIELD_RE.match(d_body))
                        )
                        if d_looks_sig:
                            sig_line_node_ids.add(d["node_id"])

Suggestion: Descendant scanning is currently gated behind the sibling-level looks_sig check, so any IWW sibling that is just a structural container (doesn't itself match sig-shape) will never have its children inspected. That drops real signature fields nested under that container in no-/s/ templates. Walk descendants for each sibling first, then apply sig-shape filtering per descendant record. [incomplete implementation]

Severity Level: Major ⚠️
- ❌ IWW-only templates can drop nested signature fields.
- ⚠️ Level-freeze baselines mis-group signature blocks for some forms.
Steps of Reproduction ✅
1. Open `scripts/parse_doc2dict_with_config.py` and inspect
`_explode_signature_block_lines()` at line 3825 and the IWW-anchored no-`/s/` branch at
lines 3952–4053; note the descendant walk `_walk_descendants_local()` is only invoked
inside the `if looks_sig:` block at lines 4013–4036.

2. In a Python shell from the repo root, construct a minimal `rows` list matching this
function's expected shape: one record `iww` with `node_id=1`, `parent_node_id=10`, span
starting with "IN WITNESS WHEREOF" (so `_is_iww_clause(_span_text(iww))` is true), and
another record `container` with `node_id=2`, `parent_node_id=10`, whose `title` and
`body_direct` are both non-sig-shaped (so `_SIG_FIELD_RE`, `_SIG_BLOCK_LABEL_RE`, and
`_CORP_SUFFIX_LABEL_RE` all return false); then add a child record `sig_child` with
`node_id=3`, `parent_node_id=2`, whose `title` is `"By:"` (so `_SIG_FIELD_RE.match("By:")`
is true).

3. Call `_explode_signature_block_lines(rows)` directly (imported from
`scripts.parse_doc2dict_with_config`) so that `actives` includes all three records,
`iww_present` is set true at lines 36–44, `has_slash_s` is false at line 46, and the
form/template branch `if not has_slash_s and iww_present:` at lines 3952–3956 is taken;
inside, `sig_area_parent_ids` becomes `{10}`, `siblings` contains `iww` and `container`,
`container` is not skipped by `_has_section_marker_title`, and its `looks_sig` computed at
lines 4004–4012 is false because its title/body are not sig-shaped.

4. Because `looks_sig` is false for `container`, the `if looks_sig:` block at line 4013 is
not entered, `_walk_descendants_local(container["node_id"])` is never called, and
`sig_child` (with `title="By:"`) is never examined or added to `sig_line_node_ids`; in
PASS 3 (lines 4039–4051) only records whose `node_id` is in `sig_line_node_ids` are
assigned depth `2 + subdoc_penalty` and `_sig_line=True`, so `sig_child` remains at its
original depth and is not marked as a signature line even though it structurally is part
of the signature page.

@codeant-ai

codeant-ai Bot commented May 17, 2026

CodeAnt AI finished reviewing your PR.

@arthrod
Owner Author

arthrod commented May 17, 2026

Triage agent — PR #81 comment review (read-only pass, no code changes)

4 inline comments reviewed:

  1. gemini-code-assist @ parse_doc2dict...py:4012 — sibling walk missing (sib_title and not sib_body) check present in descendant walk (WILL-DEFER)
    Real inconsistency: the no-/s/ sibling walk and descendant walk use different guards for the template case. The descendant path has the bare-title check (d_title and not d_body) but the sibling path does not, so the sibling walk may miss bare-title party labels such as "Holder" when they are direct siblings of the IWW clause. Deferred: needs a fixture to confirm which path is wrong (the two walks should share one guard).

  2. coderabbitai @ parse_doc2dict...py:4037 — descendant traversal gated behind sibling looks_sig check (WILL-DEFER)
    Real issue (MAJOR): if a sibling is a structural container that doesn't itself match looks_sig, its By:/Name:/Title: descendants are never inspected. The suggested patch addresses this "neutral wrapper sibling" problem. Deferred: same fixture requirement as item 1.

  3. codeant-ai @ parse_doc2dict...py:4012 — IWW-anchored path marks uppercase/corporate-title rows as sig without body guard (WILL-DEFER)
    The /s/ path has a body guard; the no-/s/ IWW path does not, so clause body text on corporate-title rows can be incorrectly demoted. Should apply the same body guard. Deferred.

  4. codeant-ai @ parse_doc2dict...py:4036 — descendant scanning gated behind sibling-level looks_sig (WILL-DEFER)
    Same as coderabbitai item 2. Consolidated.

WILL-DEFER items (3 distinct): Unify sibling/descendant sig-guard logic; ungate descendant traversal from sibling looks_sig; add body guard to no-/s/ IWW path.

Triage only — no code changes made this round.

Labels

Feat2 size:L This PR changes 100-499 lines, ignoring generated files
