Cross-rule cell sharing via graph coloring #175

Draft
hhugo wants to merge 4 commits into ocaml-community:master from hhugo:share-cell

@hhugo
Collaborator

@hhugo hhugo commented Feb 8, 2026

Summary

Non-interfering tag cells from different rules share the same physical memory slot via graph coloring:

  • Forward reachability per cell to build an interference graph
  • Greedy graph coloring to assign cells to slots
  • Runs after tag delay (Tag delay optimization #179) so coloring benefits from reduced forward reachability of delayed tags
  • Extracted helpers: cell_owners, forward_reachable, color_cells, remap_cells, share_cells
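
The coloring step listed above can be sketched as follows. This is a minimal illustration, not the PR's `color_cells`: it assumes cells are numbered `0..n-1` and that the interference graph is already available as an adjacency array. The names `greedy_color` and `slot_count` are hypothetical.

```ocaml
(* Hypothetical sketch: [interferes.(c)] lists the cells that may hold a
   live value at the same time as cell [c]. Not the PR's actual code. *)
let greedy_color (interferes : int list array) : int array =
  let n = Array.length interferes in
  let slot = Array.make n (-1) in
  for cell = 0 to n - 1 do
    (* Slots already assigned to neighbours that were colored earlier. *)
    let taken =
      List.filter_map
        (fun other -> if slot.(other) >= 0 then Some slot.(other) else None)
        interferes.(cell)
    in
    (* Pick the smallest slot that no neighbour uses. *)
    let rec first_free s = if List.mem s taken then first_free (s + 1) else s in
    slot.(cell) <- first_free 0
  done;
  slot

(* Number of physical slots needed after coloring. *)
let slot_count (slots : int array) : int =
  Array.fold_left (fun acc s -> max acc (s + 1)) 0 slots

(* Two cells from different rules that never interfere end up in one slot. *)
let () = assert (slot_count (greedy_color [| []; [] |]) = 1)
```

Greedy coloring does not guarantee the minimum number of slots, but it is simple and never assigns the same slot to two interfering cells.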

Stacked on #179.

Test plan

  • Codegen expect tests (DOT graphs + generated code)
  • test_realistic.ml multi-rule lexer: init_mem 2 → init_mem 1
  • Runtime correctness tests

🤖 Generated with Claude Code

@pmetzger
Member

Just FYI, there are a few good papers out there about doing submatch and named submatch correctly that may be worth looking at, one by the re2c maintainer (and I think it's on his web site.)

@hhugo
Collaborator Author

hhugo commented Mar 13, 2026

> Just FYI, there are a few good papers out there about doing submatch and named submatch correctly that may be worth looking at, one by the re2c maintainer (and I think it's on his web site.)

Thanks, I had a look at them before working on this PR (because you mentioned them in #112 (comment)).

The implementation here was inspired by these papers and the implementation of ocamllex.
I've tried to only implement the most effective optimizations to keep the complexity of the codebase manageable.

When a Set_position tag fires on every transition entering a cycle
state, remove it from those transitions and emit Set_position with
offset 1 (pos - 1) on exit transitions. This turns O(n) tag writes
per loop into O(1) on exit.

Handles multi-state cycles via SCC analysis (Tarjan's algorithm) and
multi-exit cycles: exits from other cycle states are safe when they
only reach rules that don't read the tag.
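
As an illustration of the SCC step, here is a minimal sketch of Tarjan's algorithm over a state graph. The graph representation (`succ.(s)` lists the successors of state `s`) and the names `tarjan_scc` and `is_cycle` are assumptions made for this example; the PR's own data structures may differ.

```ocaml
(* Tarjan's algorithm. Returns the strongly connected components of the
   graph as lists of states. Hypothetical representation, see lead-in. *)
let tarjan_scc (succ : int list array) : int list list =
  let n = Array.length succ in
  let index = Array.make n (-1) in
  let lowlink = Array.make n 0 in
  let on_stack = Array.make n false in
  let stack = ref [] and counter = ref 0 and sccs = ref [] in
  let rec visit v =
    index.(v) <- !counter;
    lowlink.(v) <- !counter;
    incr counter;
    stack := v :: !stack;
    on_stack.(v) <- true;
    List.iter
      (fun w ->
        if index.(w) < 0 then begin
          visit w;
          lowlink.(v) <- min lowlink.(v) lowlink.(w)
        end
        else if on_stack.(w) then lowlink.(v) <- min lowlink.(v) index.(w))
      succ.(v);
    if lowlink.(v) = index.(v) then begin
      (* [v] is the root of a component: pop the component off the stack. *)
      let rec pop acc =
        match !stack with
        | [] -> acc
        | w :: rest ->
            stack := rest;
            on_stack.(w) <- false;
            if w = v then w :: acc else pop (w :: acc)
      in
      sccs := pop [] :: !sccs
    end
  in
  for v = 0 to n - 1 do
    if index.(v) < 0 then visit v
  done;
  !sccs

(* A component is a cycle if it has several states or a self-loop. *)
let is_cycle (succ : int list array) (scc : int list) : bool =
  match scc with
  | [ s ] -> List.mem s succ.(s)
  | _ -> scc <> []
```

For the graph `[| [1]; [2]; [1; 3]; [] |]`, states 1 and 2 form the only cycle, with the edge from state 2 to state 3 as its exit.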

Set_position { cell; offset } with offset 0 for current position and
offset 1 for previous code point (subtracted from pos).
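
The effect of the delay can be sketched with a toy model. Only the `Set_position { cell; offset }` shape comes from the commit message; the `apply` and `run` helpers, the single-cell memory, and the convention that transition number `k` fires with `pos = k` are assumptions made for illustration.

```ocaml
type action = Set_position of { cell : int; offset : int }

(* Store the current position minus [offset] into the given cell. *)
let apply mem ~pos (Set_position { cell; offset }) = mem.(cell) <- pos - offset

(* Hypothetical driver: take [n] loop transitions (n >= 1), then one exit
   transition. Returns the final value of cell 0 and the number of writes. *)
let run ~on_loop ~on_exit n =
  let mem = [| -1 |] and writes = ref 0 in
  let fire pos actions =
    List.iter (fun a -> apply mem ~pos a; incr writes) actions
  in
  for pos = 1 to n do
    fire pos on_loop
  done;
  fire (n + 1) on_exit;
  (mem.(0), !writes)

(* Eager: write the tag on every transition entering the cycle state. *)
let eager = run ~on_loop:[ Set_position { cell = 0; offset = 0 } ] ~on_exit:[] 5

(* Delayed: write once on exit, one code point back. *)
let delayed = run ~on_loop:[] ~on_exit:[ Set_position { cell = 0; offset = 1 } ] 5

(* Same recorded position, 5 writes versus 1. *)
let () = assert (eager = (5, 5) && delayed = (5, 1))
```

In this model both variants record the same position; the delayed one performs a single write regardless of how many times the loop iterates.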

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hhugo and others added 2 commits March 27, 2026 17:24
Tags are allocated during regexp_of_pattern in the PPX, not during
compile_re. The tag_starts approach incorrectly mapped all tags to
rule 0. Use an NFA walk (tag_owners) to correctly discover which
rule each tag belongs to.
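
A walk of this kind might look like the sketch below. The NFA representation (an array of edge lists, each edge optionally carrying a tag) and the `rule_starts` parameter are hypothetical; only the name `tag_owners` and the idea of discovering ownership by walking the NFA come from the commit message.

```ocaml
(* Hypothetical NFA: [nfa.(node)] lists outgoing edges; an edge may carry
   a tag. [rule_starts] gives the entry node of rule 0, rule 1, ... *)
type edge = { tag : int option; target : int }

(* Returns (tag, rule) pairs sorted by tag. *)
let tag_owners (nfa : edge list array) (rule_starts : int list) :
    (int * int) list =
  let owners = Hashtbl.create 16 in
  List.iteri
    (fun rule start ->
      let seen = Array.make (Array.length nfa) false in
      let rec walk node =
        if not seen.(node) then begin
          seen.(node) <- true;
          List.iter
            (fun { tag; target } ->
              (* Every tag met while walking this rule's NFA belongs to it. *)
              Option.iter (fun t -> Hashtbl.replace owners t rule) tag;
              walk target)
            nfa.(node)
        end
      in
      walk start)
    rule_starts;
  List.sort compare (Hashtbl.fold (fun t r acc -> (t, r) :: acc) owners [])
```

The sketch assumes the sub-automata of different rules do not share nodes; if they did, the last rule walked would win.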

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With the tag binding on rule 1 (not rule 0), incorrect tag ownership
would allow an unsafe delay: the check would see rule 0 as unreachable
from the non-s exit (correct) but miss that rule 1 IS reachable there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhugo hhugo force-pushed the share-cell branch 2 times, most recently from e2fd0b3 to 317b10b March 27, 2026 17:42
Non-interfering tag cells from different rules can share the same
physical memory slot. Uses forward reachability to build an
interference graph, then greedy graph coloring to assign slots.

Run after tag delay so coloring benefits from the reduced forward
reachability of delayed tags.
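
One way to picture how forward reachability feeds the interference graph is sketched below. It rests on a simplifying assumption that is not taken from the PR: a cell is treated as live in every state reachable from the states where it is written, and two cells interfere when those state sets overlap. The `writes` parameter and the function bodies are hypothetical; only the name `forward_reachable` appears in the PR.

```ocaml
module IntSet = Set.Make (Int)

(* States reachable from [starts]; [succ.(s)] lists the successors of [s]. *)
let forward_reachable (succ : int list array) (starts : int list) : IntSet.t =
  let rec visit seen s =
    if IntSet.mem s seen then seen
    else List.fold_left visit (IntSet.add s seen) succ.(s)
  in
  List.fold_left visit IntSet.empty starts

(* Assumption: cells interfere when their reachable state sets overlap.
   [writes.(c)] lists the states in which cell [c] is written. *)
let interference (succ : int list array) (writes : int list array) :
    int list array =
  let reach = Array.map (forward_reachable succ) writes in
  Array.mapi
    (fun i ri ->
      let acc = ref [] in
      Array.iteri
        (fun j rj ->
          if i <> j && not (IntSet.is_empty (IntSet.inter ri rj)) then
            acc := j :: !acc)
        reach;
      List.rev !acc)
    reach
```

Under this model, a tag written later (as after tag delay) has a smaller reachable set, so it overlaps with fewer other cells and is easier to share.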

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhugo hhugo changed the title from "Implement named capture group." to "Cross-rule cell sharing via graph coloring" Mar 27, 2026
@hhugo
Collaborator Author

hhugo commented Mar 27, 2026

@toots, this is the last PR optimizing the named-group capture that I want to integrate before we do a new release

@toots
Member

toots commented Mar 27, 2026

> @toots, this is the last PR optimizing the named-group capture that I want to integrate before we do a new release

Ok! What is your opinion on risk and testing we should do before the release?

@hhugo
Collaborator Author

hhugo commented Apr 2, 2026

> What is your opinion on risk and testing we should do before the release?

I've created #194, which adds qcheck-based testing. It revealed a bug in the implementation currently on master.
