Checked 49 scenarios across 41 test files.
- Tokenise a Python hello-world file
- Tokenise a JavaScript hello-world file
- Tokenise a TypeScript hello-world file
- Tokenise a Rust hello-world file
- Tokenise a C++ hello-world file
- Tokenise a Fortran hello-world file
- Tokenise a Vyper hello-world file
- Output is a valid JSON array
- Token types are non-empty strings
- Concatenating token values reconstructs the original input
- Unrecognised characters are reported on stderr, not stdout
- Keywords are identified as keyword tokens
- Identifiers are identified as identifier tokens
- Whitespace is preserved as whitespace tokens
- String literals are identified as string literal tokens
- Operators are identified as operator tokens
- Keywords are not misidentified as identifiers
- Tokenise a Python file with function definition and control flow
- Tokenise a JavaScript file with variable declarations and arrow functions
- Tokenise a Rust file with struct and impl definitions
- Missing command-line arguments prints usage and exits non-zero
- Input file not found exits with a clear error message
- Invalid YAML lexer config exits with a clear error message
- Valid lexer YAML passes validation
- Lexer YAML missing required field fails validation
- Lexer YAML with a token missing both value and pattern fails validation
- All bundled lexer files pass validation
- A custom lexer tokenises a simple DSL
- Single-line comments are tokenised as comment tokens
- Multi-line comments are tokenised as comment tokens
- Python import statement is tokenised correctly
- JavaScript import statement is tokenised correctly
- Rust use statement is tokenised correctly
- Equality operator is tokenised as a single token
- Arrow operator is tokenised as a single token
- Compound assignment operators are tokenised as single tokens
- Integer literals are tokenised as number tokens
- Float literals are tokenised as number tokens
- Hexadecimal literals are tokenised as number tokens
- Empty file produces an empty token array
- Whitespace-only file produces only whitespace tokens
- Duplicate value-based tokens are rejected during validation
- Bundled lexers contain no duplicate token values
- Patterns are compiled once before tokenisation begins
- Large file tokenisation completes within a reasonable time
- String slice extraction uses direct slicing instead of character-by-character concatenation
- File not found produces a clean error message without a traceback
- Invalid YAML produces a clean error message without a traceback
- Unreadable file produces a clean error message without a traceback
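Taken together, the tokenisation invariants above (keywords win over identifiers, compound operators such as `==` and `->` are single tokens, whitespace is preserved, and concatenating token values reconstructs the original input) can be sketched with a small regex-based lexer. This is a minimal illustration only: the token classes, `TOKEN_SPEC`, and the `tokenise` helper are assumptions, not the tool's actual implementation, whose real definitions live in the YAML lexer files.

```python
import re

# Assumed token classes; the real lexer definitions live in YAML configs.
# Keywords are listed before identifiers so "def" is never misread as an
# identifier, and multi-character operators come before single characters.
TOKEN_SPEC = [
    ("keyword",    r"\b(?:def|return|if|else)\b"),
    ("number",     r"\d+(?:\.\d+)?"),
    ("string",     r'"[^"\n]*"'),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator",   r"==|->|\+=|[-+*/=():,]"),
    ("whitespace", r"\s+"),
]

# Compile the combined pattern once, before tokenisation begins.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenise(source: str) -> list[dict]:
    """Return a list of {"type", "value"} dicts covering every character."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # Unrecognised character: record it anyway so that joining all
            # token values still reconstructs the input exactly.
            tokens.append({"type": "error", "value": source[pos]})
            pos += 1
            continue
        # Direct slice via m.group(); no character-by-character concatenation.
        tokens.append({"type": m.lastgroup, "value": m.group()})
        pos = m.end()
    return tokens
```

Because every character lands in exactly one token, `"".join(t["value"] for t in tokens)` equals the original source by construction, which is the round-trip property the scenarios check.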
49/49 scenarios covered.
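The lexer-config validation scenarios (a required field, a token needing either a value or a pattern, duplicate values rejected, patterns compilable) suggest checks along the following lines. The dict schema and the `validate_lexer_config` helper are assumptions made for illustration; the bundled YAML format itself is not shown in this report.

```python
import re

def validate_lexer_config(config: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid.

    Assumed schema: {"name": str, "tokens": [{"type": str,
    "value": str | "pattern": str}, ...]} -- the real format may differ.
    """
    errors = []
    for field in ("name", "tokens"):
        if field not in config:
            errors.append(f"missing required field: {field!r}")
    seen_values = set()
    for i, token in enumerate(config.get("tokens", [])):
        # Each token rule needs a literal value or a regex pattern.
        if "value" not in token and "pattern" not in token:
            errors.append(f"token {i}: needs 'value' or 'pattern'")
        value = token.get("value")
        if value is not None:
            if value in seen_values:
                errors.append(f"token {i}: duplicate value {value!r}")
            seen_values.add(value)
        pattern = token.get("pattern")
        if pattern is not None:
            try:
                re.compile(pattern)  # fail fast on bad regexes
            except re.error as exc:
                errors.append(f"token {i}: bad pattern: {exc}")
    return errors
```

Collecting all errors rather than raising on the first one lets the CLI print a complete report and exit non-zero with a clean message, matching the "no traceback" scenarios.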