18 changes: 18 additions & 0 deletions data/auto_parse/level_freeze/frozen/idx_5.jsonl

Large diffs are not rendered by default.

9 changes: 8 additions & 1 deletion data/auto_parse/level_freeze/state.json
@@ -5,7 +5,8 @@
1,
2,
3,
4
4,
5

medium

The current_idx field (line 2) still reads 0 even though every index through idx=5 is now frozen. In this workflow, consider advancing current_idx to 6 so the state reflects actual progress and subsequent runs of the loop scripts (such as freeze.py or advance.py) default to the next unfrozen document. This prevents accidental overwrites of earlier indices.
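A minimal sketch of the suggested fix, assuming state.json is loaded as a plain dict with the current_idx and history keys visible in this diff. The helper name record_freeze is hypothetical and is not taken from freeze.py or advance.py; it logs a freeze entry in the same shape as the history records above and moves current_idx past the frozen index without ever moving it backwards.

```python
def record_freeze(state: dict, idx: int, n_records: int, ts: str) -> dict:
    """Hypothetical helper: log a freeze and advance current_idx past it.

    Assumes the state.json layout shown in this diff: a top-level
    "current_idx" integer and a "history" list of action records.
    """
    # Append a history entry shaped like the existing "freeze" records.
    state.setdefault("history", []).append(
        {"ts": ts, "action": "freeze", "idx": idx, "n_records": n_records}
    )
    # Point current_idx at the next unfrozen document, but never move it
    # backwards if it is already ahead (e.g. after re-freezing an old idx).
    state["current_idx"] = max(state.get("current_idx", 0), idx + 1)
    return state
```

Applied to this PR's freeze (idx 5, 18 records), a state that starts at current_idx 0 ends at current_idx 6, which is what the comment asks for. Taking the max rather than assigning idx + 1 directly keeps a re-freeze of an earlier index from rewinding progress.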

],
"history": [
{
@@ -132,6 +133,12 @@
"action": "freeze",
"idx": 4,
"n_records": 77
},
{
"ts": "2026-05-17T06:18:39",
"action": "freeze",
"idx": 5,
"n_records": 18
}
Comment on lines +137 to 142

medium

The frozen output for idx=5 contains a structural inconsistency: 'TRACT FIFTEEN' (order 5) is assigned to Level 1 even though it is content within Section 1.1(a), which is Level 2. In addition, order 17 appears to be a trailing footer artifact ('Ex. B-98') that should ideally be dropped per the rubric. The PR description notes both as known quirks, but freezing them into the golden baseline lowers the structural accuracy of the regression set. Consider whether the parser can handle these cases so the baseline stays clean.
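One way to keep footer artifacts like 'Ex. B-98' out of future baselines is a pre-freeze check. The sketch below assumes each JSONL record carries "order" and "text" fields; those field names, the FOOTER_RE pattern, and both helper functions are hypothetical, since the contents of idx_5.jsonl are not rendered in this diff. It only flags trailing exhibit-style footers; it does not attempt to detect level mismatches such as the 'TRACT FIFTEEN' case.

```python
import json
import re

# Hypothetical pattern for exhibit-page footers such as "Ex. B-98".
FOOTER_RE = re.compile(r"^Ex\.\s*[A-Z]+-\d+$")


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def trailing_footer_orders(records: list[dict]) -> list[int]:
    """Return the "order" values of trailing records that look like footers.

    Walks backwards from the highest order and stops at the first record
    whose text does not match FOOTER_RE, so footer-like text in the middle
    of the document is left alone.
    """
    flagged = []
    for rec in sorted(records, key=lambda r: r["order"], reverse=True):
        if FOOTER_RE.match(rec["text"].strip()):
            flagged.append(rec["order"])
        else:
            break
    return flagged
```

Run against a frozen file before committing it, a non-empty result would signal that the parser still emits a footer the rubric expects to be dropped.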

]
}