Cross-rule cell sharing via graph coloring #175
Just FYI, there are a few good papers out there about doing submatch and named submatch correctly that may be worth looking at, including one by the re2c maintainer (I think it's on his website).
Thanks, I had a look at them before working on this PR (you mentioned them in #112 (comment)). The implementation here was inspired by these papers and by the implementation of ocamllex.
When a Set_position tag fires on every transition entering a cycle
state, remove it from those transitions and emit Set_position with
offset 1 (pos - 1) on exit transitions. This turns O(n) tag writes
per loop into O(1) on exit.
Handles multi-state cycles via SCC analysis (Tarjan's algorithm) and
multi-exit cycles: exits from other cycle states are safe when they
only reach rules that don't read the tag.
Set_position { cell; offset } uses offset 0 for the current position and
offset 1 for the previous code point (the offset is subtracted from pos).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
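
To make the delay described above concrete, here is a toy model in OCaml. It is only a sketch under stated assumptions: the `action` type mirrors the `Set_position { cell; offset }` shape from the commit message, but `apply`, `eager_loop` and `delayed_loop` are hypothetical names, positions are assumed to count code points, and the loop is assumed to run at least once.

```ocaml
(* Hypothetical model of the tag action; the real type in the PR may
   have more constructors or fields. *)
type action = Set_position of { cell : int; offset : int }

(* Write (pos - offset) into the cell: offset 0 records the current
   position, offset 1 the previous one. *)
let apply mem pos (Set_position { cell; offset }) = mem.(cell) <- pos - offset

(* Eager version: the tag is written on every transition that re-enters
   the cycle state, i.e. once per iteration. Returns (final pos, writes). *)
let eager_loop mem ~cell ~start ~iterations =
  let writes = ref 0 in
  let pos = ref start in
  for _i = 1 to iterations do
    apply mem !pos (Set_position { cell; offset = 0 });
    incr writes;
    incr pos
  done;
  (!pos, !writes)

(* Delayed version: nothing is written inside the loop; a single write
   with offset 1 happens on the exit transition. Assumes iterations >= 1. *)
let delayed_loop mem ~cell ~start ~iterations =
  let pos = start + iterations in
  apply mem pos (Set_position { cell; offset = 1 });
  (pos, 1)
```

In this model both versions leave the same value in the cell; only the number of writes differs.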
Tags are allocated during regexp_of_pattern in the PPX, not during compile_re. The tag_starts approach incorrectly mapped all tags to rule 0. Use an NFA walk (tag_owners) to correctly discover which rule each tag belongs to.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
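
As an illustration of discovering tag ownership by walking the NFA, the following OCaml sketch visits each rule's sub-automaton from its entry node and records the rule for every tag it meets. The `edge` and `nfa` types and the `entries` array are assumptions made for this example, not the PR's actual data structures, and the function below only borrows the name `tag_owners`. It also assumes each tag is reachable from exactly one rule's entry.

```ocaml
(* Hypothetical NFA representation, for illustration only. *)
type edge =
  | Eps of int          (* epsilon transition to a node *)
  | Char of char * int  (* consume a character, go to a node *)
  | Tag of int * int    (* record tag t, then go to a node *)

type nfa = { edges : edge list array }

(* entries.(r) is the entry node of rule r. Returns a table mapping each
   tag to the rule whose sub-automaton contains it. *)
let tag_owners nfa entries =
  let owners = Hashtbl.create 16 in
  Array.iteri
    (fun rule entry ->
      (* Fresh visited set per rule: a depth-first walk from the entry. *)
      let seen = Array.make (Array.length nfa.edges) false in
      let rec visit n =
        if not seen.(n) then begin
          seen.(n) <- true;
          List.iter
            (function
              | Eps m | Char (_, m) -> visit m
              | Tag (t, m) ->
                  Hashtbl.replace owners t rule;
                  visit m)
            nfa.edges.(n)
        end
      in
      visit entry)
    entries;
  owners
```

Ownership here comes from where the tag actually sits in the automaton, so the order in which tags were allocated does not matter.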
With the tag binding on rule 1 (not rule 0), incorrect tag ownership would allow an unsafe delay: the check would see rule 0 as unreachable from the non-s exit (correct) but miss that rule 1 IS reachable there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Non-interfering tag cells from different rules can share the same physical memory slot. Uses forward reachability to build an interference graph, then greedy graph coloring to assign slots. Runs after tag delay so that coloring benefits from the reduced forward reachability of delayed tags.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
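
The slot assignment can be sketched as a standard greedy coloring. This is a minimal OCaml illustration, not the PR's `color_cells`: `interferes` and `greedy_color` are hypothetical names, `owner.(c)` is assumed to give the rule owning cell `c`, and `live.(c)` is assumed to be the set of DFA states forward-reachable from where cell `c` is written. Two cells are treated as interfering when they belong to the same rule or when their live sets overlap.

```ocaml
module IntSet = Set.Make (Int)

(* Cells of the same rule never share; cells of different rules
   interfere only if some state is live for both. *)
let interferes ~owner ~live a b =
  owner.(a) = owner.(b)
  || not (IntSet.is_empty (IntSet.inter live.(a) live.(b)))

(* Greedy coloring: visit cells in index order and give each one the
   smallest slot not used by an already-colored interfering cell. *)
let greedy_color ~owner ~live =
  let n = Array.length owner in
  let slot = Array.make n (-1) in
  for c = 0 to n - 1 do
    let taken = ref IntSet.empty in
    for d = 0 to c - 1 do
      if interferes ~owner ~live c d then taken := IntSet.add slot.(d) !taken
    done;
    let rec first_free s =
      if IntSet.mem s !taken then first_free (s + 1) else s
    in
    slot.(c) <- first_free 0
  done;
  slot

(* Number of physical slots needed after coloring. *)
let slot_count slot = Array.fold_left (fun m s -> max m (s + 1)) 0 slot
```

With two rules that each own one cell and have disjoint live sets, both cells land in slot 0, so one physical slot suffices. Greedy coloring is not guaranteed to be optimal, but it is cheap and deterministic for a fixed cell order.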
@toots, this is the last PR optimizing named-group capture that I want to integrate before we do a new release.
Ok! What is your opinion on the risk, and on the testing we should do before the release?
I've created #194, which adds qcheck-based testing. It revealed a bug in the implementation currently on master.
Summary
Non-interfering tag cells from different rules share the same physical memory slot via graph coloring:
cell_owners, forward_reachable, color_cells, remap_cells, share_cells
Stacked on #179.
Test plan
test_realistic.ml multi-rule lexer: init_mem 2 → init_mem 1
🤖 Generated with Claude Code