add NFA/DFA oracle tests #194

Open: hhugo wants to merge 4 commits into ocaml-community:master from hhugo:oracle-testing

@hhugo (Collaborator) commented Apr 2, 2026

Summary

  • Extract sedlex.compiler library: Move Sedlex, Cset (formerly Sedlex_cset) out of the PPX into a standalone sedlex.compiler library. Absorb sedlex.utils into it. Non-PPX code (tests, tools) can now depend on the regexp/DFA engine without pulling in ppxlib.
  • Add Cset.mem: Code point membership test for character sets.
  • Add NFA/DFA oracle testing: A sedlex_oracle library with a per-thread NFA simulator that serves as ground truth for the DFA. Compares acceptance and tag/binding values between NFA and DFA, printing FIXME on mismatches. Includes hand-written expect tests and QCheck property-based tests. Documents the known FIXME in add_node where the shared tag-op list causes wrong captures for overlapping Star/Plus + as-binding patterns.
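
To illustrate what a code point membership test such as Cset.mem does, here is a minimal sketch. It assumes a character set is stored as a sorted list of non-overlapping, inclusive code point ranges; the `cset` type, the body of `mem`, and `ascii_letters` are illustrative names and are not taken from this PR.

```ocaml
(* Assumption: a character set is a sorted list of non-overlapping,
   inclusive code point ranges. This mirrors a common representation
   but is not copied from the PR. *)
type cset = (int * int) list

(* [mem c set] is true when code point [c] lies inside one of the ranges. *)
let rec mem (c : int) (set : cset) : bool =
  match set with
  | [] -> false
  | (lo, hi) :: rest ->
      if c < lo then false (* ranges are sorted, so later ones start higher *)
      else if c <= hi then true
      else mem c rest

(* Hypothetical example set: ASCII letters A-Z and a-z. *)
let ascii_letters : cset = [ (0x41, 0x5A); (0x61, 0x7A) ]

let () =
  assert (mem (Char.code 'q') ascii_letters);
  assert (not (mem (Char.code '0') ascii_letters))
```

Because the ranges are sorted, the search can stop as soon as the code point falls below the start of the current range.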

Test plan

  • dune build succeeds
  • dune runtest passes (all existing tests + new oracle tests)
  • QCheck acceptance tests (2000 random regexps) pass
  • QCheck tag-aware tests find and report known DFA limitations with FIXME prefix

🤖 Generated with Claude Code

@pmetzger (Member) commented

@hhugo You should have the ability to commit such things on your own at this point...

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hhugo changed the title from "Extract sedlex.compiler library and add NFA/DFA oracle tests" to "add NFA/DFA oracle tests" on Apr 13, 2026
hhugo and others added 3 commits April 15, 2026 12:15
Copy lex/{cset,lexgen,syntax,table}.{ml,mli} verbatim from the OCaml
compiler sources. These provide an independent TDFA implementation
used as a reference oracle for testing sedlex's tagged DFA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ocamllex_oracle: converts Ir.t to ocamllex's regex AST, compiles
  with Lexgen.make_dfa, and simulates the resulting automata
- sedlex_oracle: runs both ocamllex and sedlex DFA simulators,
  compares results, reports errors on any divergence
- Add FIXME comment on add_node documenting the known epsilon-closure
  tag bug with overlapping Star/Plus and captures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
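
The oracle idea in the commit above, running two independent implementations on the same input and flagging any divergence, can be sketched on a toy regex type. Everything below is hypothetical and much simpler than the real Ir.t, Lexgen, or sedlex code: the reference side uses Brzozowski derivatives, the candidate side tracks sets of reachable input positions, and only acceptance is compared, not tags or bindings.

```ocaml
(* Toy regex AST; a stand-in for the real intermediate representation. *)
type re =
  | Empty                 (* matches nothing *)
  | Eps                   (* matches the empty string *)
  | Chr of char
  | Seq of re * re
  | Alt of re * re
  | Star of re

(* Reference matcher: Brzozowski derivatives. *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Seq (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

let rec deriv c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Seq (a, b) ->
      let left = Seq (deriv c a, b) in
      if nullable a then Alt (left, deriv c b) else left
  | Alt (a, b) -> Alt (deriv c a, deriv c b)
  | (Star a as r) -> Seq (deriv c a, r)

let accepts_reference r s =
  let state = ref r in
  String.iter (fun c -> state := deriv c !state) s;
  nullable !state

(* Candidate matcher: the set of positions reachable after matching [r]. *)
module Positions = Set.Make (Int)

let rec ends r s starts =
  match r with
  | Empty -> Positions.empty
  | Eps -> starts
  | Chr c ->
      Positions.fold
        (fun i acc ->
          if i < String.length s && s.[i] = c then Positions.add (i + 1) acc
          else acc)
        starts Positions.empty
  | Seq (a, b) -> ends b s (ends a s starts)
  | Alt (a, b) -> Positions.union (ends a s starts) (ends b s starts)
  | Star a ->
      (* Iterate until no new position appears; positions are bounded. *)
      let rec fix seen =
        let next = Positions.union seen (ends a s seen) in
        if Positions.equal next seen then seen else fix next
      in
      fix starts

let accepts_candidate r s =
  Positions.mem (String.length s) (ends r s (Positions.singleton 0))

(* Compare both sides and report any divergence with a FIXME prefix. *)
let agree r s =
  let expected = accepts_reference r s and actual = accepts_candidate r s in
  if expected <> actual then
    Printf.printf "FIXME: %S reference=%b candidate=%b\n" s expected actual;
  expected = actual

let () =
  let ab_star = Star (Seq (Chr 'a', Chr 'b')) in
  List.iter (fun s -> assert (agree ab_star s)) [ ""; "ab"; "abab"; "aba"; "b" ]
```

The value of such a setup comes from the two sides sharing as little code as possible, so that a bug in one is unlikely to be mirrored in the other.
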
Hand-written tests cover captures, multi-rule, backtracking, bounded
repetition, complement, subtraction, intersection, or-patterns, and
the known Star/Plus overlap limitation. QCheck tests (50k iterations)
fuzz single-rule and two-rule patterns with random captures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
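
The fuzzing described in the commit above can be approximated without QCheck by a hand-rolled loop over random patterns and inputs. This is only a sketch under stated assumptions: the toy `re` type, the generators, the backtracking matcher `matches`, and the `fuzz` driver are all hypothetical, the iteration count is arbitrary, and captures are not modelled. The two matchers are passed in as functions, and here the same matcher stands in for both sides.

```ocaml
(* Toy regex AST over the alphabet {a, b}; not the real sedlex types. *)
type re = Eps | Chr of char | Seq of re * re | Alt of re * re | Star of re

let gen_char () = if Random.bool () then 'a' else 'b'

(* Random regex whose nesting depth is at most [depth]. *)
let rec gen_re depth =
  if depth = 0 then (if Random.bool () then Eps else Chr (gen_char ()))
  else
    match Random.int 4 with
    | 0 -> Seq (gen_re (depth - 1), gen_re (depth - 1))
    | 1 -> Alt (gen_re (depth - 1), gen_re (depth - 1))
    | 2 -> Star (gen_re (depth - 1))
    | _ -> gen_re 0

(* Random input of length 0..max_len. *)
let gen_input max_len =
  String.init (Random.int (max_len + 1)) (fun _ -> gen_char ())

(* Backtracking matcher in continuation-passing style. The [j > i] guard
   forces each Star iteration to consume input, which guarantees termination. *)
let rec run r s i k =
  match r with
  | Eps -> k i
  | Chr c -> i < String.length s && s.[i] = c && k (i + 1)
  | Seq (a, b) -> run a s i (fun j -> run b s j k)
  | Alt (a, b) -> run a s i k || run b s i k
  | Star a -> k i || run a s i (fun j -> j > i && run r s j k)

let matches r s = run r s 0 (fun j -> j = String.length s)

(* Return the first (pattern, input) pair on which the two sides disagree. *)
let fuzz ~iterations ~reference ~candidate =
  let rec go n =
    if n = 0 then None
    else
      let r = gen_re 3 and s = gen_input 6 in
      if reference r s = candidate r s then go (n - 1) else Some (r, s)
  in
  go iterations

let () =
  match fuzz ~iterations:1000 ~reference:matches ~candidate:matches with
  | None -> print_endline "no divergence found"
  | Some (_, s) -> Printf.printf "FIXME: matchers disagree on %S\n" s
```

Keeping patterns shallow and inputs short keeps each case cheap and makes any reported counterexample easy to read. A property-testing library such as QCheck would additionally shrink failing cases.
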

2 participants