
Configuration

David Sisco edited this page Jan 4, 2026 · 1 revision

Configure tokenization behavior using TokenizerOptions.

TokenizerOptions

The TokenizerOptions class provides fluent configuration for the tokenizer:

var options = TokenizerOptions.Default
    .WithOperators(CommonOperators.JavaScript)
    .WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine)
    .WithTagPrefixes('#', '@', '$')
    .WithSymbols(',', ';', ':');

Default Configuration

TokenizerOptions.Default includes:

  • CommonOperators.Universal operators
  • C-style single-line (//) and multi-line (/* */) comments
  • Standard symbols (',', ';', ':', and others)

Operators

Operators are multi-character sequences matched greedily (longest match wins).

Built-in Operator Sets

Set                          Operators                            Use Case
CommonOperators.Universal    ==, !=, &&, ||, <=, >=, <<, >>       Basic operators
CommonOperators.CFamily      Universal + ++, --, ->, ::, ...      C/C++/C#
CommonOperators.JavaScript   CFamily + ===, !==, =>, ?., ??, **   JavaScript/TypeScript
CommonOperators.Rust         CFamily + =>, .., ..=, ::            Rust
CommonOperators.Python       Universal + **, //, :=, @            Python

Custom Operators

// Add to an existing set
var extended = TokenizerOptions.Default
    .WithOperators(CommonOperators.CFamily)
    .WithOperators("<=>", "<|>", "|>");

// Start from scratch
var minimal = TokenizerOptions.Default
    .WithOperators("->", "=>", "::", "...");

// Combine sets
var ops = CommonOperators.CFamily
    .Union(CommonOperators.JavaScript)
    .Add("custom");
var combined = TokenizerOptions.Default.WithOperators(ops);

How Operator Matching Works

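Operators are matched using a trie, so finding the longest operator at a position takes O(k) time, where k is the number of characters examined (at most the length of the longest operator):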

// Given operators: =, ==, ===
// Input: "==="

// Matching proceeds character by character:
// 1. '=' → matches "="
// 2. '=' → matches "==" (longer)
// 3. '=' → matches "===" (longest match wins)
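
The walk above can be sketched in a few lines of C#. This is a minimal illustration, not the library's implementation: AddOperator and MatchLongest are hypothetical names, and the trie is stored as a list of per-node dictionaries purely to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the library's code): an array-backed trie.
// Node 0 is the root; children[n] maps a character to a child node index,
// and isEnd[n] is true when the path to node n spells a complete operator.
var children = new List<Dictionary<char, int>> { new Dictionary<char, int>() };
var isEnd = new List<bool> { false };

void AddOperator(string op)
{
    int node = 0;
    foreach (char c in op)
    {
        if (!children[node].TryGetValue(c, out int next))
        {
            // Create the missing child node and link it from the current node.
            next = children.Count;
            children.Add(new Dictionary<char, int>());
            isEnd.Add(false);
            children[node][c] = next;
        }
        node = next;
    }
    isEnd[node] = true;
}

// Returns the length of the longest operator starting at `start`, or 0 if none.
// The loop stops as soon as no child matches, so it examines at most
// k characters, where k is the length of the longest operator.
int MatchLongest(string input, int start)
{
    int node = 0, best = 0;
    for (int i = start; i < input.Length; i++)
    {
        if (!children[node].TryGetValue(input[i], out node)) break;
        if (isEnd[node]) best = i - start + 1; // remember the longest match so far
    }
    return best;
}

foreach (var operatorText in new[] { "=", "==", "===", ".", "..." })
    AddOperator(operatorText);

Console.WriteLine(MatchLongest("===", 0));  // 3: "===" wins over "=" and "=="
Console.WriteLine(MatchLongest("== x", 0)); // 2: the walk stops at the space
Console.WriteLine(MatchLongest("..", 0));   // 1: ".." is not an operator, so "." is used
```

Recording best at every terminal node, rather than only where the walk ends, is what lets the matcher fall back to a shorter operator when a longer prefix such as ".." leads nowhere.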

Comment Styles

Configure how comments are recognized.

Built-in Comment Styles

Style                           Start   End       Example
CommentStyle.CStyleSingleLine   //      newline   // comment
CommentStyle.CStyleMultiLine    /*      */        /* comment */
CommentStyle.HashSingleLine     #       newline   # comment

Custom Comment Styles

// Use built-in styles
var builtIn = TokenizerOptions.Default
    .WithCommentStyles(
        CommentStyle.CStyleSingleLine,
        CommentStyle.CStyleMultiLine);

// Create a custom single-line style (start delimiter only; ends at newline)
var luaComment = new CommentStyle("--");

// Create a custom multi-line style (start and end delimiters)
var htmlComment = new CommentStyle("<!--", "-->");

var custom = TokenizerOptions.Default
    .WithCommentStyles(luaComment, htmlComment);
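
The start and end rules from the table can be sketched as a small skip function. This is a hedged C# illustration, not the library's CommentStyle logic: styles are plain (Start, End) tuples, a null End means the comment runs to the newline, the longest matching start delimiter wins, and an unterminated multi-line comment is assumed to run to the end of input. SkipComment is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the library's CommentStyle): each style is a
// (Start, End) pair, and End == null means "single-line, ends at newline".
var styles = new List<(string Start, string? End)>
{
    ("//", null),      // C-style single-line
    ("/*", "*/"),      // C-style multi-line
    ("--", null),      // Lua-style single-line (custom)
    ("<!--", "-->"),   // HTML-style multi-line (custom)
};

// If a comment starts at `pos`, returns the index just past the comment;
// otherwise returns -1. When several start delimiters match, the longest wins.
int SkipComment(string input, int pos)
{
    (string Start, string? End)? best = null;
    foreach (var style in styles)
    {
        bool matches = input.AsSpan(pos).StartsWith(style.Start.AsSpan(), StringComparison.Ordinal);
        if (matches && (best is null || style.Start.Length > best.Value.Start.Length))
            best = style;
    }
    if (best is null) return -1;

    var (start, end) = best.Value;
    int bodyStart = pos + start.Length;

    if (end is null)
    {
        // Single-line: the comment stops before the newline (or at end of input).
        int newline = input.IndexOf('\n', bodyStart);
        return newline < 0 ? input.Length : newline;
    }

    // Multi-line: the comment stops after the end delimiter. An unterminated
    // comment is assumed (for this sketch) to run to the end of input.
    int close = input.IndexOf(end, bodyStart, StringComparison.Ordinal);
    return close < 0 ? input.Length : close + end.Length;
}

Console.WriteLine(SkipComment("x // note\ny", 2));  // 9: stops before '\n'
Console.WriteLine(SkipComment("a /* b */ c", 2));   // 9: just past "*/"
Console.WriteLine(SkipComment("<!-- hi -->", 0));   // 11: end of input
Console.WriteLine(SkipComment("a - b", 2));         // -1: no comment here
```

Preferring the longest start delimiter matters when one delimiter is a prefix of another, as with Lua's "--" line comments and "--[[" block comments.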

Symbols

Symbols are single-character tokens that don't combine into operators.

// Add symbols
var options = TokenizerOptions.Default
    .WithSymbols(',', ';', ':', '?');

// Default symbols include: , ; : and others

Note: Characters that appear in operators (such as =, +, and -) need no extra configuration. When the characters that follow complete an operator, they are matched as that operator; when they stand alone, they become single-character symbols.
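
The operator-first, symbol-fallback rule can be sketched as follows. This is a hypothetical C# Scan function, not the library's code: it checks candidate substrings longest-first against a HashSet instead of a trie, treats every other non-whitespace character as a symbol, and ignores identifiers and literals entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch (not the library's code): operators are tried first,
// longest candidate first; any other non-whitespace character becomes a
// single-character symbol. Identifiers and literals are out of scope here.
var operators = new HashSet<string> { "==", "===", "++", "+=", "->" };
int maxLength = operators.Max(o => o.Length);

List<(string Kind, string Text)> Scan(string input)
{
    var tokens = new List<(string Kind, string Text)>();
    int pos = 0;
    while (pos < input.Length)
    {
        if (char.IsWhiteSpace(input[pos])) { pos++; continue; }

        // Check candidate lengths from longest to shortest (naive longest match).
        int matched = 0;
        for (int len = Math.Min(maxLength, input.Length - pos); len >= 1; len--)
        {
            if (operators.Contains(input.Substring(pos, len))) { matched = len; break; }
        }

        if (matched > 0)
        {
            tokens.Add(("Operator", input.Substring(pos, matched)));
            pos += matched;
        }
        else
        {
            // Standalone character such as '=' or '+' with no operator to complete.
            tokens.Add(("Symbol", input[pos].ToString()));
            pos++;
        }
    }
    return tokens;
}

foreach (var (kind, text) in Scan("== = ++ + ->"))
    Console.WriteLine($"{kind} {text}");
// Operator ==, Symbol =, Operator ++, Symbol +, Operator ->
```

Checking substrings longest-first is simpler than a trie but allocates a string per candidate; the trie described under How Operator Matching Works avoids that cost.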

Tag Prefixes

Tag prefixes create TaggedIdentToken when followed by an identifier.

var options = TokenizerOptions.Default
    .WithTagPrefixes('#', '@', '$');

var tokens = "#define @attribute $variable".TokenizeToTokens(options);
// TaggedIdentToken("#define", Tag='#', Name="define")
// TaggedIdentToken("@attribute", Tag='@', Name="attribute")
// TaggedIdentToken("$variable", Tag='$', Name="variable")
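
A rough sketch of how a prefix and the identifier after it could be split into the Tag and Name fields shown above. TryReadTagged is a hypothetical helper, and its identifier rule (a letter or underscore, then letters, digits, or underscores) is an assumption rather than the tokenizer's documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the library's code): reads "<prefix><identifier>"
// and returns the Tag and Name, loosely mirroring TaggedIdentToken.
// Assumed identifier rule: a letter or '_' first, then letters, digits, or '_'.
var tagPrefixes = new HashSet<char> { '#', '@', '$' };

(char Tag, string Name, int Length)? TryReadTagged(string input, int pos)
{
    if (pos + 1 >= input.Length || !tagPrefixes.Contains(input[pos]))
        return null;

    char first = input[pos + 1];
    if (!char.IsLetter(first) && first != '_')
        return null; // a prefix with no identifier after it is not a tagged identifier

    int end = pos + 2;
    while (end < input.Length && (char.IsLetterOrDigit(input[end]) || input[end] == '_'))
        end++;

    return (input[pos], input.Substring(pos + 1, end - pos - 1), end - pos);
}

var source = "#define @attribute $variable $ 5";
for (int i = 0; i < source.Length; )
{
    if (TryReadTagged(source, i) is { } tagged)
    {
        Console.WriteLine($"Tag='{tagged.Tag}', Name=\"{tagged.Name}\"");
        i += tagged.Length;
    }
    else
    {
        i++;
    }
}
// Tag='#', Name="define"
// Tag='@', Name="attribute"
// Tag='$', Name="variable"
```

The trailing "$ 5" produces nothing here because, under the assumed rule, a prefix must be immediately followed by an identifier.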

Common Tag Prefix Uses

Prefix   Language              Example
#        C preprocessor        #define, #include
@        Java annotations      @Override, @Attribute
$        PHP/Shell variables   $var, $HOME

Complete Example

// Configure for a custom DSL
var options = TokenizerOptions.Default
    .WithOperators(
        "->",    // Arrow
        "=>",    // Fat arrow
        "::",    // Scope resolution
        "...",   // Spread
        "?.",    // Optional chaining
        "??"     // Null coalescing
    )
    .WithCommentStyles(
        CommentStyle.CStyleSingleLine,
        CommentStyle.CStyleMultiLine,
        new CommentStyle("--")  // Lua-style
    )
    .WithTagPrefixes('#', '@')
    .WithSymbols(',', ';', ':', '.');

var tokens = "foo?.bar() // call".TokenizeToTokens(options);

Using with Schema

For TinyAst, specify the same settings on a Schema:

var schema = Schema.Create()
    .WithOperators(CommonOperators.JavaScript)
    .WithCommentStyles(CommentStyle.CStyleSingleLine)
    .WithTagPrefixes('#', '@')
    .Build();

var source = "foo?.bar() // call";  // text to parse
var tree = SyntaxTree.Parse(source, schema);

See Schema for details on unified configuration.

See Also

  • Schema — Unified tokenization + syntax definitions
  • Token Types — Understanding operator and symbol tokens
  • API Reference — Full operator and comment style lists