A lightweight, flexible TypeScript library for building parsers using parser combinators. Create powerful parsers by combining simple, reusable parsing functions.
- 🚀 Parser Combinators: Build complex parsers from simple building blocks
- 🔍 Built-in Tokenizer: Flexible tokenization with regex and string matching
- 📝 TypeScript First: Full type safety and IntelliSense support
- 🎯 Backtracking Support: Automatic position restoration on parse failures
- 🔒 Infinite Loop Protection: The `many` combinator detects non-progressing parsers
- 📦 Zero Dependencies: Lightweight with no external runtime dependencies
- ✨ Widely Compatible: Packaged with tsdown for ESM and CJS support
## Installation

```bash
npm install @mattwca/little-parser-lib
```

## Quick Start

```typescript
import { Tokenizer, TokenStream, anyOf, and, many, runParser } from '@mattwca/little-parser-lib';

// 1. Define your tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('letter', /[a-zA-Z]/)
  .withTokenType('digit', /[0-9]/)
  .withTokenType('whitespace', /\s/);

// 2. Tokenize your input
const tokens = tokenizer.tokenize('hello123');
const stream = new TokenStream(tokens);

// 3. Create a parser using combinators
const parser = and(
  many(anyOf('letter')),
  many(anyOf('digit'))
);

// 4. Run the parser
const result = runParser(parser, stream);
console.log(result); // { result: [[...letters], [...digits]] }
```

## Tokenizer

The `Tokenizer` class converts raw input strings into tokens. Each token has a type, value, and position.
```typescript
const tokenizer = new Tokenizer()
  .withTokenType('number', /[0-9]/)
  .withTokenType('operator', /[+\-*/]/)
  .withTokenType('whitespace', /\s/);

const tokens = tokenizer.tokenize('1 + 2');
// [
//   { type: 'number', value: '1', position: { line: 1, column: 1 } },
//   { type: 'whitespace', value: ' ', position: { line: 1, column: 2 } },
//   { type: 'operator', value: '+', position: { line: 1, column: 3 } },
//   ...
// ]
```

## Parsers

A parser function (`ParseFn<T>`) takes a `TokenStream` and returns a `ParserResult<T>`, which can be either:

- `SuccessfulParserResult<T>`: Contains the parsed result
- `FailedParserResult`: Contains the error message and position
## TokenStream

The `TokenStream` class manages token consumption and position tracking during parsing.

Core Methods:

- `peek()`: Look at the next token without consuming it
- `consume()`: Consume and return the next token
- `consumeIf(...types)`: Conditionally consume a token if it matches the specified types

Position Management:

- `storePosition()`: Save the current position to the stack (for backtracking)
- `clearPosition()`: Remove the most recent saved position
- `restorePosition()`: Restore the most recent saved position

Utility Methods:

- `peekRemainder()`: Get all remaining tokens as a string
- `getPositionForError()`: Get current position info for error reporting
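To illustrate how a position stack enables backtracking, here is a minimal hand-rolled stream with the same three position methods. This is a sketch of the pattern, not the library's implementation:

```typescript
// Minimal illustration of a position stack; not the library's actual code.
class MiniStream {
  private pos = 0;
  private saved: number[] = [];

  constructor(private tokens: string[]) {}

  consume(): string | undefined { return this.tokens[this.pos++]; }
  storePosition(): void { this.saved.push(this.pos); }
  clearPosition(): void { this.saved.pop(); }
  restorePosition(): void {
    const p = this.saved.pop();
    if (p !== undefined) this.pos = p;
  }
}

const s = new MiniStream(['a', 'b', 'c']);
s.storePosition();   // remember position 0
s.consume();         // 'a'
s.consume();         // 'b'
s.restorePosition(); // back to position 0
// s.consume() now yields 'a' again
```

This store/restore pattern is what combinators with automatic backtracking build on: save before a speculative parse, restore on failure, clear on success.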
## Combinators

### `and(...parsers)`

Combines multiple parsers in sequence. All parsers must succeed. Returns a tuple that preserves the result type of each parser.

```typescript
const parser = and(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('semicolon')
);
// Result type is [Token, Token, Token]
```

### `or(...parsers)`

Tries parsers in order and returns the first successful result. If all parsers fail, returns the deepest error.
```typescript
const parser = or(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('operator')
);
```

### `many(parser)`

Applies a parser repeatedly until it fails or stops making progress. Requires at least one successful match. Includes infinite-loop protection: repetition stops when the parser fails to advance the token position.
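To sketch why this guard matters: if an iteration "succeeds" without consuming a token, naive repetition would loop forever. A hand-rolled illustration of the detection follows (simplified, and not the library's code; unlike `many`, it does not require at least one match):

```typescript
// Hand-rolled sketch of non-progress detection in a `many`-style loop.
// `step` returns the new position on success, or null on failure.
type StepFn = (input: string, pos: number) => number | null;

const manyGuarded = (step: StepFn) => (input: string, pos: number): number => {
  let current = pos;
  for (;;) {
    const next = step(input, current);
    if (next === null) break;    // parser failed: stop repeating
    if (next === current) break; // no progress: bail out instead of looping forever
    current = next;
  }
  return current;
};

// A step that matches a single digit.
const digit: StepFn = (input, pos) =>
  input[pos] >= '0' && input[pos] <= '9' ? pos + 1 : null;

// A pathological step that "succeeds" without consuming any input.
const empty: StepFn = (_input, pos) => pos;
```

With `digit`, the loop advances through consecutive digits and stops at the first non-digit; with `empty`, the guard terminates immediately rather than spinning.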
```typescript
const parser = many(anyOf('digit')); // Parses one or more digits
```

### `optional(parser)`

Makes a parser optional. Returns `null` if the wrapped parser fails.
```typescript
const parser = optional(anyOf('sign')); // Sign is optional
```

### `attempt(parser)`

Wraps a parser with automatic backtracking on failure.
```typescript
const parser = attempt(
  and(anyOf('keyword'), anyOf('identifier'))
);
```

### `map(parser, fn)`

Transforms the result of a parser using a mapping function.
```typescript
const digitParser = anyOf('digit');
const numberParser = map(
  many(digitParser),
  (tokens) => parseInt(tokens.map(t => t.value).join(''))
);
```

### `label(label, parser)`

Adds a custom label to parser errors for easier debugging.
```typescript
const parser = label(
  'function declaration',
  and(anyOf('function'), anyOf('identifier'))
);
```

### `anyOf(...types)`

Parses any token matching the specified type(s).
```typescript
const parser = anyOf('letter', 'digit', 'underscore');
```

### `anyExcept(...types)`

Parses any token NOT matching the specified type(s).
```typescript
const parser = anyExcept('whitespace', 'newline');
```

### `endOfInput()`

Ensures the end of input has been reached.
```typescript
const parser = and(
  myMainParser,
  endOfInput() // Ensure nothing is left to parse
);
```

### `runParser(parser, stream)`

Runs a parser on a token stream. Throws a `ParserError` on failure.
```typescript
try {
  const result = runParser(myParser, tokenStream);
  console.log(result.result);
} catch (error) {
  if (error instanceof ParserError) {
    console.error(`Parse error at ${error.location.line}:${error.location.column}`);
  }
}
```

### `runParserOnString(parser, input, tokenizer)`

A convenience function that tokenizes and parses in one step.

```typescript
const result = runParserOnString(myParser, 'input string', tokenizer);
```

## Example: Parsing Simple Expressions

```typescript
import {
  Tokenizer,
  isSuccessfulResult,
  anyOf,
  and,
  or,
  many,
  map,
  optional,
  runParserOnString
} from '@mattwca/little-parser-lib';

// Define tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('digit', /[0-9]/)
  .withTokenType('plus', '+')
  .withTokenType('minus', '-')
  .withTokenType('multiply', '*')
  .withTokenType('divide', '/')
  .withTokenType('whitespace', /\s/);

// Define parsers
const digit = anyOf('digit');
const ws = optional(anyOf('whitespace'));

// Parse a number (one or more digits)
const number = map(
  many(digit),
  (tokens) => parseInt(tokens.map(t => t.value).join(''))
);

// Parse an operator
const operator = or(
  anyOf('plus'),
  anyOf('minus'),
  anyOf('multiply'),
  anyOf('divide')
);

// Parse a complete expression: number operator number
const expression = and(
  number,
  ws,
  operator,
  ws,
  number
);

// Parse and extract values
const result = runParserOnString(expression, '42 + 8', tokenizer);
if (isSuccessfulResult(result)) {
  const [leftNum, , op, , rightNum] = result.result;
  console.log(`${leftNum} ${op.value} ${rightNum}`); // "42 + 8"
}
```

## Error Handling

The library provides detailed error messages with position information:
```typescript
try {
  const result = runParser(myParser, stream);
} catch (error) {
  if (error instanceof ParserError) {
    console.error(`
      Error: ${error.message}
      Line: ${error.location.line}
      Column: ${error.location.column}
      Position: ${error.location.position}
    `);
  }
}
```

## Recursive Parsers

Parsing complex expressions occasionally requires a parser that calls itself, or calls another parser which in turn calls it.
Taking the expression parser example above, we can modify it to support parsing of nested algebraic expressions:
```typescript
type Expression = {
  left: Expression | number;
  operator: string;
  right: Expression | number;
};

type AlgebraicExpression = {
  symbol: string;
  expression: Expression;
};

const tokenizer = new Tokenizer()
  ...
  .withTokenType('left_parenthesis', '(')
  .withTokenType('right_parenthesis', ')')
  .withTokenType('letter', /[a-zA-Z]/);

let expression: ParseFn<number | AlgebraicExpression | Expression> | null = null;

const letter = anyOf('letter');
const leftParen = anyOf('left_parenthesis');
const rightParen = anyOf('right_parenthesis');

// Wrapping the combinator in a function defers the reference to
// `expression` until parse time, after it has been assigned.
const algebraicExpression: ParseFn<AlgebraicExpression> = (ts) => {
  return map(
    and(letter, leftParen, expression!, rightParen),
    ([{ value: symbol }, , expr]) => ({
      symbol,
      expression: expr,
    })
  )(ts);
};

expression = map(
  and(
    or<number | AlgebraicExpression>(number, algebraicExpression),
    ws,
    operator,
    ws,
    or<number | AlgebraicExpression>(number, algebraicExpression),
  ),
  ([leftExpr, , { value: operator }, , rightExpr]) => ({
    left: leftExpr,
    operator,
    right: rightExpr,
  })
);

const result = runParserOnString(expression, 'a(3 + b(7 + 8)) + c(1 + 2)', tokenizer);
if (isSuccessfulResult(result)) {
  console.log(JSON.stringify(result.result, null, 2));
}
```

## API Reference

### Classes

- `Tokenizer`: Converts input strings into tokens
- `TokenStream`: Manages token consumption and backtracking
  - `peek()`: Look at the next token without consuming it
  - `consume()`: Get and advance to the next token
  - `consumeIf(...types)`: Conditionally consume a token if it matches the given types
  - `peekRemainder()`: Get remaining unparsed tokens as a string
  - `storePosition()`, `clearPosition()`, `restorePosition()`: Manual backtracking control
- `ParserError`: Error thrown when parsing fails

### Types

- `Token`: Represents a single token with type, value, and position
- `TokenType`: String identifier for token types (or `'end_of_input'`)
- `TokenPosition`: Position info with line and column numbers
- `ParseFn<T>`: Function that takes a `TokenStream` and returns a `ParserResult`
- `ParserResult<T>`: Union of `SuccessfulParserResult` and `FailedParserResult`
- `ParserErrorPosition`: Extended position info, including the token stream position

### Combinators

- `and(...parsers)`: Sequential combination
- `or(...parsers)`: Alternative combination
- `many(parser)`: One or more repetitions
- `optional(parser)`: Optional parser
- `attempt(parser)`: Parser with backtracking
- `map(parser, fn)`: Transform a parser result
- `label(label, parser)`: Add an error label

### Token Matchers

- `anyOf(...types)`: Match any of the specified token types
- `anyExcept(...types)`: Match any token except the specified types
- `endOfInput()`: Match the end of input

### Runners

- `runParser(parser, stream)`: Execute a parser on a token stream
- `runParserOnString(parser, input, tokenizer)`: Execute a parser on a string
- `isSuccessfulResult(result)`: Type guard for successful results
- `isFailedResult(result)`: Type guard for failed results
MIT
@mattwca