1 change: 1 addition & 0 deletions CHANGES.md
@@ -1,6 +1,7 @@
# unreleased
- Support nested `let..in` for `[%sedlex.regexp?]` definitions
- Add support for named captured group (#177)
- Add support for `when` guard expressions

# 3.7 (2025-10-06)
- Update to unicode 17.0.0
34 changes: 32 additions & 2 deletions README.md
@@ -67,8 +67,7 @@ or:
]
```

(The first vertical bar is optional as in any OCaml pattern matching.
Guard expressions are not allowed.)
(The first vertical bar is optional as in any OCaml pattern matching.)

where:
- lexbuf is an arbitrary lowercase identifier, which must refer to
@@ -185,6 +184,37 @@ match%sedlex buf with
**Restriction:** `as` bindings are not allowed inside repetition operators
(`Star`, `Plus`, `Opt`, `Rep`) or set operators (`Compl`, `Sub`, `Intersect`).

### Guard expressions (`when` clauses)

Rules can include `when` guards, just like normal OCaml pattern matching:

```ocaml
match%sedlex buf with
| Plus ('0'..'9') when int_of_string (Sedlexing.Utf8.lexeme buf) < 256 ->
  Printf.printf "byte: %s\n" (Sedlexing.Utf8.lexeme buf)
| Plus ('0'..'9') ->
  Printf.printf "large number: %s\n" (Sedlexing.Utf8.lexeme buf)
| _ -> ()
```

When a guard returns `false`, the next rule that matches the same input is
tried. If no further rule matches, the lexer backtracks to the last
accepted position as usual.

Guards can reference `as`-bound variables:

```ocaml
match%sedlex buf with
| (Plus ('0'..'9') as n) when int_of_string (Sedlexing.Utf8.of_submatch n) < 256 ->
  Printf.printf "byte: %s\n" (Sedlexing.Utf8.of_submatch n)
| _ -> ()
```

**Note:** Guards are evaluated each time the DFA reaches an accepting state
during lexing, not once after the final match. A guard on a rule that contains
repetition (e.g. `Plus`) may therefore be evaluated several times for a single
token, so guard expressions should be side-effect-free.
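
Because a guard may run more than once, it can help to move the test into a
small pure function. The sketch below is only an illustration: `is_byte` is a
hypothetical helper, not part of sedlex. It uses the standard library's
`int_of_string_opt` instead of `int_of_string`, which raises `Failure` when
the digit string does not fit in an `int`.

```ocaml
(* Hypothetical helper, not part of sedlex: a pure predicate for guards.
   It has no side effects, so evaluating it several times is harmless. *)
let is_byte (s : string) : bool =
  match int_of_string_opt s with
  | Some n -> n < 256   (* the lexer rule only passes digits, so n >= 0 *)
  | None -> false       (* e.g. a digit string too large for [int] *)
```

With such a helper, the first guard above could be written as
`when is_byte (Sedlexing.Utf8.lexeme buf)`.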

### Encoding

- The OCaml source is assumed to be encoded in UTF-8.