Skip to content

Add Sedlexing.accept for custom buffer control#188

Draft
hhugo wants to merge 2 commits into
ocaml-community:masterfrom
hhugo:accept
Draft

Add Sedlexing.accept for custom buffer control#188
hhugo wants to merge 2 commits into
ocaml-community:masterfrom
hhugo:accept

Conversation

@hhugo
Copy link
Copy Markdown
Collaborator

@hhugo hhugo commented Mar 13, 2026

Summary

  • Addresses Supporting buffers that can ignore mark requests #81
  • Adds Sedlexing.accept : lexbuf -> int -> int called when the lexer reaches a final state with no further transitions
  • Default implementation simply returns the final state index (no behavior change)
  • Custom Sedlexing modules can override accept to call backtrack instead, e.g. to reject matches that do not end on a grapheme cluster boundary

Test plan

  • dune build passes
  • dune runtest passes
  • Codegen tests updated to reflect Sedlexing.accept calls in generated code

🤖 Generated with Claude Code

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 13, 2026

Here is a usecase https://github.com/ifazk/sedlying

Copy link
Copy Markdown
Member

@toots toots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be possible to add an example with a custom accept to better illustrate a situation where this is useful?

There should be a major version bump for this.

Worth noting for future reference: I do remember having a lot of projects vendoring the sedlexing.mli interface which can cause a lot of obscure compilation failures when pushing the package to the opam CI.

Since you're only adding a net-new API perhaps this wont happen?

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 13, 2026

We should probably wait for feedback from @ifazk

@ifazk
Copy link
Copy Markdown

ifazk commented Mar 13, 2026

It has been a long time since I've hacked around with sedlex, and I haven't even been using OCaml for the last little bit. I personally don't need this anymore, but I would still like to see this merged. Will need a day or two to figure out how to use the new accept and verify if it fits my original use case.

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 13, 2026

It has been a long time since I've hacked around with sedlex, and I haven't even been using OCaml for the last little bit. I personally don't need this anymore, but I would still like to see this merged. Will need a day or two to figure out how to use the new accept and verify if it fits my original use case.

You should be able to define accept as

let accept lexbuf final =
  mark lexbuf final;
  backtrack lexbuf

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 14, 2026

If we decide to add this feature. #180 would need to be implemented differently.

@pmetzger
Copy link
Copy Markdown
Member

Someone tell me what to merge (if anything.)

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 25, 2026

gentle ping @ifazk

@ifazk
Copy link
Copy Markdown

ifazk commented Mar 26, 2026

Sorry, was busy the last two weekends. I'm trying to work through it now, but ran into some issues and need some help since I haven't been keeping up with sedlex changes.

I locally added accept to Sedlying, and vendored sedlex with hhugo's changes in the test directory, but ran into the following error.

File "_none_", line 1:               
Error: Unbound value Sedlexing.__private__next_int

It seems to come from match%sedlex buf, and seems to happen on both hhugo:accept and master. Is sedlex.ppx completely unsupported with custom modules?

My accept implementation:

let accept lexbuf i =
  mark lexbuf i;
  let _ = backtrack lexbuf in
  ()

Test code that's failing:

let leg = "leg"
let legdots = "leg\u{0308}"

(* this needs `opam pin uugen https://github.com/ifazk/uugen.git` *)
let mk_gen str = (* make a grapheme cluster gen *)
  let decoder = Uutf.decoder ~encoding:`UTF_8 (`String str) in
  let boundary = `Grapheme_cluster in
  Uugen.Uuseg_gen.of_decoder_replacing ~boundary ~decoder

let rleg = [%sedlex.regexp? "leg" ]
let rlegplus = [%sedlex.regexp? Plus "leg" ]
let rlegstar = [%sedlex.regexp? Star "leg" ]

module Sedlexing = Sedlying.Sedlexing

let mk_buf str = (* make a sedlying buffer with grapheme cluster boundaries *)
  Sedlexing.create (mk_gen str)

let test1 () =
  let buf = Sedlexing.create (mk_gen leg) in
  match%sedlex buf with
  | rleg -> ()
  | _ ->  failwith "test1 failed"

I should be more responsive now that I have OCaml and dune setup, again sorry I was a little MIA.

@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 26, 2026

Just provide __private__next_int by exporting the one found in the official Sedlexing module.

For accept, you need to return the accepting state.

let accept lexbuf i =
  mark lexbuf i;
  let x = backtrack lexbuf in
  x

@ifazk
Copy link
Copy Markdown

ifazk commented Mar 27, 2026

Thanks for all your help. The accept change fit my original use case and passed all the test cases I had!

Sorry it took so long that you have to rebase now.

@hhugo hhugo marked this pull request as draft March 27, 2026 11:12
@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 27, 2026

I'd like to explore a bit more on this front before we commit to anything. I've concerted the into a Draft

hhugo and others added 2 commits March 27, 2026 12:15
…tance (ocaml-community#81)

When a DFA reaches a final state with no further transitions, the generated
code now calls Sedlexing.accept instead of directly returning the state index.
The default implementation simply returns the index, so there is no behavior
change. Custom Sedlexing modules can override accept to call backtrack instead,
e.g. to reject matches that do not end on a grapheme cluster boundary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The int parameter was unnecessary — the PPX already knows the rule index,
so accept only needs to signal yes/no. This simplifies both the API and
the generated code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhugo
Copy link
Copy Markdown
Collaborator Author

hhugo commented Mar 27, 2026

I've explored a different design space for what you wanted to achieve in #192 that is support for when guards on rules. That made we wonder if we could instead expose enough of the internal sedlex-ppx so that you can have you own ppx and support match%sedlying .. with. One cheap way to do this would be to automatically add a guard to all branches PAT when Sedlying.at_boundary lexbuf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants