Defer tag writes past fixed-length neighbors #6
Closed
hhugo wants to merge 24 commits into
Add two example calculators showing how to bridge Sedlexing.lexbuf with ocamlyacc/menhir parsers, and document the pattern in the README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upper bound of the surrogate rejection range was 0xdf00 instead of 0xdfff, which would have allowed U+DF01..U+DFFF through. In practice the bug was masked by the local Uchar.of_int wrapper, but fix it for correctness. Add comments explaining why only check_three needs the surrogate check, and add an expect test for surrogate rejection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
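The off-by-range described above can be illustrated with a minimal sketch. The helper names (`in_surrogate_range`, `in_surrogate_range_buggy`, `decode_three`) are hypothetical and are not taken from the sedlex sources; only `Uchar.is_valid` is a real standard-library call. Surrogates (U+D800..U+DFFF) fall inside the range covered by three-byte UTF-8 sequences (U+0800..U+FFFF), which is presumably why only the three-byte path needs the check.

```ocaml
(* Hypothetical helpers for illustration; not the sedlex implementation. *)

(* Correct range: every code point from U+D800 to U+DFFF is a surrogate. *)
let in_surrogate_range cp = 0xd800 <= cp && cp <= 0xdfff

(* The upper bound the commit message describes as wrong. *)
let in_surrogate_range_buggy cp = 0xd800 <= cp && cp <= 0xdf00

(* Combine the payload bits of a three-byte UTF-8 sequence. *)
let decode_three b0 b1 b2 =
  ((b0 land 0x0f) lsl 12) lor ((b1 land 0x3f) lsl 6) lor (b2 land 0x3f)

let () =
  (* ED BF BF is the three-byte encoding of U+DFFF. *)
  let cp = decode_three 0xed 0xbf 0xbf in
  assert (in_surrogate_range cp);
  assert (not (in_surrogate_range_buggy cp));
  (* The standard library rejects it as well, which is how a wrapper
     around Uchar.of_int could mask the wrong bound. *)
  assert (not (Uchar.is_valid cp))
```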
…nity#176)
* Support nested let..in for [%sedlex.regexp?] definitions (ocaml-community#41)
Allow users to define named regexps using nested let statements, e.g.:
  let int_lit =
    let digit = [%sedlex.regexp? '0'..'9'] in
    [%sedlex.regexp? Plus digit]
Add eval_regexp_expr method that recursively evaluates let..in chains of regexp definitions, used by both the expression handler and structure_with_regexps.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add comment to ast match
* Update documentation
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The default branch in a match%sedlex is not a regexp — it fires when no rule matches, so zero characters are consumed and the lexeme is "". To catch unexpected characters, use `any` instead. Closes ocaml-community#51 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-community#181)
* Document regexp operator precedence (fixes ocaml-community#35)
Since sedlex regexps are OCaml patterns, they follow OCaml's pattern precedence: | (lowest) < , < constructor application (highest).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Doc: add new sub sections
* cleanup
* cleanup
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
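The precedence rule can be checked in plain OCaml, without sedlex. The sketch below uses an ordinary pattern match (the function name `matches` is made up for illustration). In a sedlex regexp the same grouping would apply, with `|` read as alternation, `,` as sequence, and constructor application covering forms such as `Plus` or `Star`.

```ocaml
(* Plain OCaml, no sedlex involved. Written without parentheses, the
   first pattern groups as ((Some 1), 2) | ((Some 3), 4):
   constructor application binds tightest, then ',', then '|'. *)
let matches = function
  | Some 1, 2 | Some 3, 4 -> true
  | _ -> false

let () =
  assert (matches (Some 1, 2));
  assert (matches (Some 3, 4));
  (* A mixed pair is rejected, as expected if '|' is the outermost operator. *)
  assert (not (matches (Some 1, 4)))
```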
emacs/sedlex-dot.el provides two interactive commands to render/remove DOT graph overlays directly in test buffers. render_dots.sh offers a CLI alternative for batch-rendering DOT graphs as SVG. Both are documented in HACKING.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce memory consumption for named patterns under an or-pattern. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delay tag writes as late as possible: when a fixed-length element needs only one tag, use the end tag instead of the start tag. This reduces redundant tag operations in loops (e.g., self-loop on 'a' no longer writes the tag on every iteration). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
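To illustrate why the end tag is cheaper, here is a hand-written sketch. It is not the automaton sedlex generates: both scanners are assumptions that mimic a pattern like `Star 'a', ("bc" as x)` and count how often a tag variable is written. Because "bc" has a fixed length of 2, the start of `x` can be recovered from a single end tag.

```ocaml
(* Hand-written illustration; all names here are hypothetical.
   Each scanner returns Some (start of x, end of x, tag writes). *)

let has_bc s i = i + 2 <= String.length s && String.sub s i 2 = "bc"

(* Start-tag strategy: the tag is rewritten on every trip round the loop,
   because any position might turn out to be where "bc" begins. *)
let scan_with_start_tag s =
  let n = String.length s in
  let tag = ref 0 and writes = ref 1 in (* one write on entry *)
  let i = ref 0 in
  while !i < n && s.[!i] = 'a' do
    incr i;
    tag := !i;
    incr writes
  done;
  if has_bc s !i then Some (!tag, !tag + 2, !writes) else None

(* End-tag strategy: one write after the fixed-length element;
   the start is computed as (end tag - 2). *)
let scan_with_end_tag s =
  let n = String.length s in
  let i = ref 0 in
  while !i < n && s.[!i] = 'a' do incr i done;
  if has_bc s !i then
    let tag = !i + 2 in
    Some (tag - 2, tag, 1)
  else None
```

Both scanners report the same bounds for `x`; only the number of tag writes differs, and the gap grows with the number of loop iterations.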
Allocate boundary tags for fixed-length tuple elements that lack a
right-position anchor, so that as-binding tags fire as late as
possible in the automaton.
During the rights computation (right-to-left pass), when retreat
breaks at a variable-length element but the current element has a
fixed length, a boundary tag is allocated at the element's end.
This tag becomes a concrete anchor: elements further left can
compute their positions via Tag{tag; offset}. Dead boundary tags
(unreferenced by any as-binding) are eliminated by a new
Sedlex.optimize pass that strips unused tags and remaps live ones
to a dense range.
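The dead-tag elimination and dense remapping described above can be sketched as follows. This is a simplified model under stated assumptions: the constructor names `Start_plus`, `End_minus` and `Tag {tag; offset}` come from the commit message, but the type layout, the representation of bindings as (start, end) pairs, and the functions `resolve`, `live_tags` and `strip_and_remap` are hypothetical and do not reproduce Sedlex.optimize.

```ocaml
(* Hypothetical model of position expressions; layout is an assumption. *)
type pos =
  | Start_plus of int                  (* offset from the lexeme start *)
  | End_minus of int                   (* offset back from the lexeme end *)
  | Tag of { tag : int; offset : int } (* recorded tag value plus offset *)

(* Turn a position expression into a concrete index. *)
let resolve ~start ~stop ~tags = function
  | Start_plus n -> start + n
  | End_minus n -> stop - n
  | Tag { tag; offset } -> tags.(tag) + offset

(* Tags referenced by at least one binding, sorted without duplicates. *)
let live_tags bindings =
  List.concat_map (fun (a, b) -> [ a; b ]) bindings
  |> List.filter_map (function Tag { tag; _ } -> Some tag | _ -> None)
  |> List.sort_uniq compare

(* Drop unreferenced tags and renumber the live ones as 0, 1, 2, ...
   Returns the rewritten bindings and the number of tags still needed. *)
let strip_and_remap bindings =
  let live = live_tags bindings in
  let remap = List.mapi (fun fresh old -> (old, fresh)) live in
  let rename = function
    | Tag { tag; offset } -> Tag { tag = List.assoc tag remap; offset }
    | p -> p
  in
  (List.map (fun (a, b) -> (rename a, rename b)) bindings, List.length live)
```

In this model, a boundary tag that no binding mentions never appears in `live_tags`, so it costs nothing after the pass.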
Inner tuples communicate boundary anchors to an enclosing alias
via the third element of aux's return triple: (start_anchor,
end_anchor). The alias picks whichever expression fires latest:
- Start_plus/End_minus always win (no tag needed).
- For the start boundary: inner anchor > outer left context.
- For the end boundary: outer right context > inner anchor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
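The three priority rules for the alias can be read as a small selection function. The sketch below is one interpretation of the commit message, not the code in the PR: the `pos` type mirrors the constructor names mentioned there, while `is_static`, `pick`, `pick_start` and `pick_end` are hypothetical names, and anchors are modelled as options.

```ocaml
(* Hypothetical model; only the constructor names come from the commit. *)
type pos =
  | Start_plus of int
  | End_minus of int
  | Tag of { tag : int; offset : int }

(* Start_plus/End_minus need no tag at all, so they always win. *)
let is_static = function Start_plus _ | End_minus _ -> true | Tag _ -> false

(* Take the first static candidate if any, else the first candidate
   in the given priority order. *)
let pick candidates =
  match List.find_opt is_static candidates with
  | Some p -> Some p
  | None -> (match candidates with p :: _ -> Some p | [] -> None)

(* Start boundary: inner anchor is preferred over outer left context. *)
let pick_start ~inner ~outer_left =
  pick (List.filter_map Fun.id [ inner; outer_left ])

(* End boundary: outer right context is preferred over inner anchor. *)
let pick_end ~inner ~outer_right =
  pick (List.filter_map Fun.id [ outer_right; inner ])
```

For example, with an inner anchor `Tag {tag = 1; offset = 0}` and an outer left context `Start_plus 2`, `pick_start` returns the `Start_plus` expression because it needs no tag.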
Summary
- … Sedlex.optimize) strips unused boundary tags and remaps live ones to a dense range
- … aux's return triple
Test plan
- … ("0x", Plus hexa) as x)
🤖 Generated with Claude Code