Skip to content

Add shortest match support via match%sedlex.shortest#180

Draft
hhugo wants to merge 1 commit into
ocaml-community:masterfrom
hhugo:longest
Draft

Add shortest match support via match%sedlex.shortest#180
hhugo wants to merge 1 commit into
ocaml-community:masterfrom
hhugo:longest

Conversation

@hhugo
Copy link
Copy Markdown
Collaborator

@hhugo hhugo commented Mar 11, 2026

In shortest mode, the lexer returns as soon as any rule matches rather than continuing to find the longest match. This is purely a code generation change — the DFA is identical, but final states return immediately instead of calling mark/backtrack.

@hhugo hhugo force-pushed the longest branch 2 times, most recently from 156b2e6 to 074be93 Compare March 11, 2026 22:08
@pmetzger
Copy link
Copy Markdown
Member

@Drup @toots Can you review?

@hhugo hhugo force-pushed the longest branch 2 times, most recently from bdfa1ab to 356a9f3 Compare March 12, 2026 21:14
@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 13, 2026

I've reworked the implementation to be much more self contained.
The whole logic happen at the end of the compile function.

  let dfa = ... in
  if shortest then (
    (* Collect reachable states, stripping transitions from accepting ones *)
    let n = Array.length dfa in
    let remap = Array.make n (-1) in
    let order = Array.make n 0 in
    let next = ref 0 in
    let rec mark i =
      if remap.(i) = -1 then (
        let j = !next in
        remap.(i) <- j;
        order.(j) <- i;
        incr next;
        let st = dfa.(i) in
        if not (Array.exists Fun.id st.finals) then
          Array.iter (fun (_, t) -> mark t) st.trans)
    in
    mark 0;
    Array.init !next (fun j ->
        let st = dfa.(order.(j)) in
        if Array.exists Fun.id st.finals then { st with trans = [||] }
        else { st with trans = Array.map (fun (c, t) -> (c, remap.(t))) st.trans }))
  else dfa

In shortest mode, the lexer returns as soon as any rule matches rather
than continuing to find the longest match. This is purely a code
generation change — the DFA is identical, but final states return
immediately instead of calling mark/backtrack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 13, 2026

@toots, the last implementation should be relatively easy to review, if you want to give it a try.

Copy link
Copy Markdown
Member

@toots toots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One factor out request, the rest looks good based on the plumbing and tests. The function to factor out is really the critical part so let's make it clear. Thanks!

Comment thread src/syntax/sedlex.ml
Comment on lines +145 to +166
if shortest then (
(* Collect reachable states, stripping transitions from accepting ones *)
let n = Array.length dfa in
let remap = Array.make n (-1) in
let order = Array.make n 0 in
let next = ref 0 in
let rec mark i =
if remap.(i) = -1 then (
let j = !next in
remap.(i) <- j;
order.(j) <- i;
incr next;
let st = dfa.(i) in
if not (Array.exists Fun.id st.finals) then
Array.iter (fun (_, t) -> mark t) st.trans)
in
mark 0;
Array.init !next (fun j ->
let st = dfa.(order.(j)) in
if Array.exists Fun.id st.finals then { st with trans = [||] }
else
{ st with trans = Array.map (fun (c, t) -> (c, remap.(t))) st.trans }))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you extract this as separate function so we can reason/change the implementation if it is ever needed?

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 19, 2026

Ill hold onto this PR until we decide what to do with https://github.com/ocaml-community/sedlex/pull/188/changes. Because we would require a different implementation

@pmetzger
Copy link
Copy Markdown
Member

@toots You do have commit access to this repo. You should probably be making more decisions about merging @hhugo's updates as you are more involved in sedlex than I've been in a long time.

@toots
Copy link
Copy Markdown
Member

toots commented Mar 19, 2026

@pmetzger I need admin rights to setup automerge:
Screenshot 2026-03-19 at 8 07 16 AM

@hhugo hhugo marked this pull request as draft March 27, 2026 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants