Configuration
David Sisco edited this page Jan 4, 2026 · 1 revision
Configure tokenization behavior using TokenizerOptions.
The TokenizerOptions class provides fluent configuration for the tokenizer:
var options = TokenizerOptions.Default
.WithOperators(CommonOperators.JavaScript)
.WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine)
.WithTagPrefixes('#', '@', '$')
.WithSymbols(',', ';', ':');

`TokenizerOptions.Default` includes:
- `CommonOperators.Universal` operators
- C-style single-line (`//`) and multi-line (`/* */`) comments
- Standard symbols (`,`, `;`, `:`, etc.)
Operators are multi-character sequences matched greedily (longest match wins).
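As a rough illustration of greedy matching, the sketch below builds a small trie and returns the longest operator found at a given position. `OperatorTrie`, `Add`, and `LongestMatch` are hypothetical names invented for this example; they are not part of the library's API, and the library's internal implementation may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of longest-match lookup; not the library's implementation.
var trie = new OperatorTrie();
foreach (var op in new[] { "=", "==", "===", "=>" })
    trie.Add(op);

Console.WriteLine(trie.LongestMatch("=== x", 0));              // ===
Console.WriteLine(trie.LongestMatch("=> x", 0));               // =>
Console.WriteLine(trie.LongestMatch("=+", 0));                 // =
Console.WriteLine(trie.LongestMatch("x", 0) ?? "(no match)");  // (no match)

sealed class OperatorTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new();
        public bool IsOperator; // true if the path from the root spells a registered operator
    }

    private readonly Node _root = new();

    public void Add(string op)
    {
        var node = _root;
        foreach (var ch in op)
        {
            if (!node.Children.TryGetValue(ch, out var next))
            {
                next = new Node();
                node.Children[ch] = next;
            }
            node = next;
        }
        node.IsOperator = true;
    }

    // Walks the trie one character at a time and remembers the last position
    // at which a complete operator ended. Returns null if nothing matches.
    public string? LongestMatch(string input, int start)
    {
        var node = _root;
        var bestLength = 0;
        for (var i = start; i < input.Length; i++)
        {
            if (!node.Children.TryGetValue(input[i], out var next))
                break;
            node = next;
            if (node.IsOperator)
                bestLength = i - start + 1;
        }
        return bestLength == 0 ? null : input.Substring(start, bestLength);
    }
}
```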
| Set | Operators | Use Case |
|---|---|---|
| `CommonOperators.Universal` | `==`, `!=`, `&&`, `\|\|`, `<=`, `>=`, `<<`, `>>` | Basic operators |
| `CommonOperators.CFamily` | Universal + `++`, `--`, `->`, `::`, `...` | C/C++/C# |
| `CommonOperators.JavaScript` | CFamily + `===`, `!==`, `=>`, `?.`, `??`, `**` | JavaScript/TypeScript |
| `CommonOperators.Rust` | CFamily + `=>`, `..`, `..=`, `::` | Rust |
| `CommonOperators.Python` | Universal + `**`, `//`, `:=`, `@` | Python |
// Add to existing set
var options = TokenizerOptions.Default
.WithOperators(CommonOperators.CFamily)
.WithOperators("<=>", "<|>", "|>");
// Start from scratch
var options = TokenizerOptions.Default
.WithOperators("->", "=>", "::", "...");
// Combine sets
var ops = CommonOperators.CFamily
.Union(CommonOperators.JavaScript)
.Add("custom");
var options = TokenizerOptions.Default.WithOperators(ops);

Operators are matched using a trie for O(k) lookup, where k is the length of the candidate operator:
// Given operators: =, ==, ===
// Input: "==="
// Matching proceeds character by character:
// 1. '=' → matches "="
// 2. '=' → matches "==" (longer)
// 3. '=' → matches "===" (longest match wins)

Configure how comments are recognized.
| Style | Start | End | Example |
|---|---|---|---|
| `CommentStyle.CStyleSingleLine` | `//` | newline | `// comment` |
| `CommentStyle.CStyleMultiLine` | `/*` | `*/` | `/* comment */` |
| `CommentStyle.HashSingleLine` | `#` | newline | `# comment` |
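To make the start/end columns concrete, here is a minimal sketch of how a scanner might consume a comment given a start delimiter and an optional end delimiter. `CommentRule` and `CommentScanner` are hypothetical stand-ins written for this example, not the library's `CommentStyle` type. The handling of unterminated multi-line comments (taking the rest of the input) is an assumption.

```csharp
using System;

// Hypothetical stand-ins for illustration; not the library's CommentStyle type.
var lineRule = new CommentRule("//", null);   // single-line: ends at newline
var blockRule = new CommentRule("/*", "*/");  // multi-line: ends at explicit delimiter

Console.WriteLine(CommentScanner.Read("// note\nx = 1", 0, lineRule)); // // note
Console.WriteLine(CommentScanner.Read("/* a */ x", 0, blockRule));     // /* a */
Console.WriteLine(CommentScanner.Read("x = 1", 0, lineRule) is null);  // True

record CommentRule(string Start, string? End);

static class CommentScanner
{
    // Returns the comment text beginning at `start`, or null if the rule does not match there.
    public static string? Read(string input, int start, CommentRule rule)
    {
        if (string.CompareOrdinal(input, start, rule.Start, 0, rule.Start.Length) != 0)
            return null;

        var bodyStart = start + rule.Start.Length;
        int end;
        if (rule.End is null)
        {
            // Single-line: run to the next newline (exclusive) or to the end of input.
            var newline = input.IndexOf('\n', bodyStart);
            end = newline < 0 ? input.Length : newline;
        }
        else
        {
            // Multi-line: run through the closing delimiter.
            // Assumption: an unterminated comment consumes the rest of the input.
            var close = input.IndexOf(rule.End, bodyStart, StringComparison.Ordinal);
            end = close < 0 ? input.Length : close + rule.End.Length;
        }
        return input.Substring(start, end - start);
    }
}
```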
// Use built-in styles
var options = TokenizerOptions.Default
.WithCommentStyles(
CommentStyle.CStyleSingleLine,
CommentStyle.CStyleMultiLine);
// Create custom single-line style
var luaComment = new CommentStyle("--");
// Create custom multi-line style
var htmlComment = new CommentStyle("<!--", "-->");
var options = TokenizerOptions.Default
.WithCommentStyles(luaComment, htmlComment);

Symbols are single-character tokens that don't combine into operators.
// Add symbols
var options = TokenizerOptions.Default
.WithSymbols(',', ';', ':', '?');
// Default symbols include: , ; : and others

Note: Characters used in operators (like `=`, `+`, `-`) are handled automatically. They form operators when adjacent to matching characters and become symbols when standalone.
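The fallback from operator to symbol can be pictured with the short sketch below. It is an illustration under stated assumptions, not the library's code: `OperatorOrSymbol.Scan` is a hypothetical helper, the "Operator"/"Symbol" labels are plain strings rather than real token types, and the sketch assumes any character that occurs in some operator may stand alone as a symbol. Identifiers and whitespace are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the operator-or-symbol fallback; not the library's code.
var operators = new[] { "==", "=>", "->" };

foreach (var (kind, text) in OperatorOrSymbol.Scan("a = b == c -> d", operators))
    Console.WriteLine($"{kind}: {text}");
// Symbol: =
// Operator: ==
// Operator: ->

static class OperatorOrSymbol
{
    public static IEnumerable<(string Kind, string Text)> Scan(
        string input, IReadOnlyCollection<string> operators)
    {
        // Try longer operators first so the longest match wins.
        var ordered = operators.OrderByDescending(op => op.Length).ToArray();
        // Assumption: every character that occurs in some operator may stand alone as a symbol.
        var operatorChars = new HashSet<char>(operators.SelectMany(op => op));

        var i = 0;
        while (i < input.Length)
        {
            var start = i;
            var match = ordered.FirstOrDefault(
                op => string.CompareOrdinal(input, start, op, 0, op.Length) == 0);
            if (match is not null)
            {
                yield return ("Operator", match);
                i += match.Length;
            }
            else
            {
                if (operatorChars.Contains(input[i]))
                    yield return ("Symbol", input[i].ToString());
                i++; // identifiers and whitespace are skipped in this sketch
            }
        }
    }
}
```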
A tag prefix followed by an identifier produces a TaggedIdentToken.
var options = TokenizerOptions.Default
.WithTagPrefixes('#', '@', '$');
var tokens = "#define @attribute $variable".TokenizeToTokens(options);
// TaggedIdentToken("#define", Tag='#', Name="define")
// TaggedIdentToken("@attribute", Tag='@', Name="attribute")
// TaggedIdentToken("$variable", Tag='$', Name="variable")

| Prefix | Language | Example |
|---|---|---|
| `#` | C preprocessor | `#define`, `#include` |
| `@` | Java/C# attributes | `@Override`, `@Attribute` |
| `$` | PHP/Shell variables | `$var`, `$HOME` |
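The "prefix followed by an identifier" rule can be sketched as follows. `TagScanner.TryRead` is a hypothetical helper written for this example, and the identifier rule it uses (a letter or underscore, then letters, digits, or underscores) is an assumption; the library's actual rule and its `TaggedIdentToken` type may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of tagged-identifier recognition; not the library's code.
var prefixes = new HashSet<char> { '#', '@', '$' };

var tagged = TagScanner.TryRead("#define X", 0, prefixes);
Console.WriteLine(tagged?.Tag);   // #
Console.WriteLine(tagged?.Name);  // define

// A prefix that is not directly followed by an identifier does not form a tagged identifier.
Console.WriteLine(TagScanner.TryRead("# define", 0, prefixes) is null); // True

static class TagScanner
{
    public static (char Tag, string Name)? TryRead(string input, int start, ISet<char> prefixes)
    {
        if (start >= input.Length || !prefixes.Contains(input[start]))
            return null;

        var nameStart = start + 1;
        if (nameStart >= input.Length || !IsIdentStart(input[nameStart]))
            return null;

        var end = nameStart + 1;
        while (end < input.Length && IsIdentPart(input[end]))
            end++;

        return (input[start], input.Substring(nameStart, end - nameStart));
    }

    // Assumed identifier rule: letter or underscore, then letters, digits, or underscores.
    private static bool IsIdentStart(char c) => char.IsLetter(c) || c == '_';
    private static bool IsIdentPart(char c) => char.IsLetterOrDigit(c) || c == '_';
}
```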
// Configure for a custom DSL
var options = TokenizerOptions.Default
.WithOperators(
"->", // Arrow
"=>", // Fat arrow
"::", // Scope resolution
"...", // Spread
"?.", // Optional chaining
"??" // Null coalescing
)
.WithCommentStyles(
CommentStyle.CStyleSingleLine,
CommentStyle.CStyleMultiLine,
new CommentStyle("--") // Lua-style
)
.WithTagPrefixes('#', '@')
.WithSymbols(',', ';', ':', '.');
var tokens = "foo?.bar() // call".TokenizeToTokens(options);

For TinyAst, wrap options in a Schema:
var schema = Schema.Create()
.WithOperators(CommonOperators.JavaScript)
.WithCommentStyles(CommentStyle.CStyleSingleLine)
.WithTagPrefixes('#', '@')
.Build();
var tree = SyntaxTree.Parse(source, schema);

See Schema for details on unified configuration.
- Schema — Unified tokenization + syntax definitions
- Token Types — Understanding operator and symbol tokens
- API Reference — Full operator and comment style lists