A small (experimental) parsing library written in TypeScript. Implements a Tokenizer and a set of Parser Combinators.

@mattwca/little-parser-lib

A lightweight, flexible TypeScript library for building parsers using parser combinators. Create powerful parsers by combining simple, reusable parsing functions.

Features

  • 🚀 Parser Combinators: Build complex parsers from simple building blocks
  • 🔍 Built-in Tokenizer: Flexible tokenization with regex and string matching
  • 📝 TypeScript First: Full type safety and IntelliSense support
  • 🎯 Backtracking Support: Automatic position restoration on parse failures
  • 🔒 Infinite Loop Protection: many combinator detects non-progressing parsers
  • 📦 Zero Dependencies: Lightweight with no external runtime dependencies
  • ✨ Widely Compatible: Packaged with tsdown for ESM and CJS support

Installation

npm install @mattwca/little-parser-lib

Quick Start

import { Tokenizer, TokenStream, anyOf, and, many, runParser } from '@mattwca/little-parser-lib';

// 1. Define your tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('letter', /[a-zA-Z]/)
  .withTokenType('digit', /[0-9]/)
  .withTokenType('whitespace', /\s/);

// 2. Tokenize your input
const tokens = tokenizer.tokenize('hello123');
const stream = new TokenStream(tokens);

// 3. Create a parser using combinators
const parser = and(
  many(anyOf('letter')),
  many(anyOf('digit'))
);

// 4. Run the parser
const result = runParser(parser, stream);
console.log(result); // { result: [[...letters], [...digits]] }

Core Concepts

Tokenizer

The Tokenizer class converts raw input strings into tokens. Each token has a type, value, and position.

const tokenizer = new Tokenizer()
  .withTokenType('number', /[0-9]/)
  .withTokenType('operator', /[+\-*/]/)
  .withTokenType('whitespace', /\s/);

const tokens = tokenizer.tokenize('1 + 2');
// [
//   { type: 'number', value: '1', position: { line: 1, column: 1 } },
//   { type: 'whitespace', value: ' ', position: { line: 1, column: 2 } },
//   { type: 'operator', value: '+', position: { line: 1, column: 3 } },
//   ...
// ]

Parser Functions

A parser function (ParseFn<T>) takes a TokenStream and returns a ParserResult<T>, which can be either:

  • SuccessfulParserResult<T>: Contains the parsed result
  • FailedParserResult: Contains error message and position
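
As an illustration of this contract, the types can be sketched in plain TypeScript. This is a hand-rolled stand-in, not the library's actual definitions — the field names (`success`, `result`, `message`, `position`) and the `MiniStream` shape are assumptions based on the descriptions above:

```typescript
// Illustrative sketch only — names and shapes are assumptions based on
// the descriptions above, not the library's exact source.
type Token = { type: string; value: string };

type SuccessfulParserResult<T> = { success: true; result: T };
type FailedParserResult = { success: false; message: string; position: number };
type ParserResult<T> = SuccessfulParserResult<T> | FailedParserResult;

// A minimal stream stand-in: an array of tokens plus a cursor.
type MiniStream = { tokens: Token[]; pos: number };

type ParseFn<T> = (stream: MiniStream) => ParserResult<T>;

// A hand-written parser that consumes one token of a given type.
const tokenOfType = (type: string): ParseFn<Token> => (stream) => {
  const next = stream.tokens[stream.pos];
  if (next && next.type === type) {
    stream.pos += 1;
    return { success: true, result: next };
  }
  return { success: false, message: `expected ${type}`, position: stream.pos };
};
```

Because `ParserResult<T>` is a discriminated union, checking the `success` flag narrows the type, so the caller can only touch `result` on the success branch.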

TokenStream

The TokenStream class manages token consumption and position tracking during parsing:

Core Methods:

  • peek(): Look at the next token without consuming it
  • consume(): Consume and return the next token
  • consumeIf(...types): Conditionally consume a token if it matches the specified types

Position Management:

  • storePosition(): Save current position to the stack (for backtracking)
  • clearPosition(): Remove the most recent saved position
  • restorePosition(): Restore to the most recent saved position

Utility Methods:

  • peekRemainder(): Get all remaining tokens as a string
  • getPositionForError(): Get current position info for error reporting
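
The position-stack pattern behind storePosition/clearPosition/restorePosition can be sketched with a minimal stand-in class. This is an illustration of the backtracking idea only, not the library's actual TokenStream implementation:

```typescript
// Minimal sketch of the position-stack pattern described above —
// an assumed model, not the library's actual TokenStream.
class MiniTokenStream {
  private pos = 0;
  private saved: number[] = [];

  constructor(private tokens: { type: string; value: string }[]) {}

  peek() { return this.tokens[this.pos]; }
  consume() { return this.tokens[this.pos++]; }

  // Push the current cursor so a failed parse can rewind to it.
  storePosition() { this.saved.push(this.pos); }
  // Drop the most recent saved cursor after a successful parse.
  clearPosition() { this.saved.pop(); }
  // Rewind to the most recent saved cursor after a failed parse.
  restorePosition() {
    const p = this.saved.pop();
    if (p !== undefined) this.pos = p;
  }
}
```

The typical usage is: store before a speculative parse, then either clear on success (keeping the consumed tokens) or restore on failure (rewinding as if nothing was consumed) — which is exactly what the attempt combinator automates.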

Parser Combinators

and(...parsers)

Combines multiple parsers in sequence; all must succeed. Returns a tuple preserving each parser's result type.

const parser = and(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('semicolon')
);
// Result type is [Token, Token, Token]

or(...parsers)

Tries parsers in order, returns the first successful result. If all fail, returns the deepest error.

const parser = or(
  anyOf('keyword'),
  anyOf('identifier'),
  anyOf('operator')
);

many(parser)

Applies a parser repeatedly until it fails or stops making progress. Requires at least one successful match. Includes infinite loop protection by detecting when the parser doesn't advance the token position.

const parser = many(anyOf('digit')); // Parses one or more digits
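
The loop-protection idea can be illustrated with a generic, self-contained sketch (assumed behaviour inferred from the description above, not the library's source): record the position before each iteration, and stop if the inner parser succeeds without consuming any input.

```typescript
// Illustrative sketch of progress-checked repetition over a string —
// a simplified model, not the library's many implementation.
type PResult<T> = { ok: true; value: T; pos: number } | { ok: false; pos: number };
type P<T> = (input: string, pos: number) => PResult<T>;

const manyChecked = <T>(p: P<T>): P<T[]> => (input, pos) => {
  const results: T[] = [];
  let cur = pos;
  for (;;) {
    const r = p(input, cur);
    if (!r.ok) break;
    if (r.pos === cur) {
      // Inner parser succeeded without consuming input — stop
      // instead of looping forever on a zero-width match.
      break;
    }
    results.push(r.value);
    cur = r.pos;
  }
  // Like the library's many, require at least one match.
  return results.length > 0
    ? { ok: true, value: results, pos: cur }
    : { ok: false, pos };
};
```

Without the `r.pos === cur` check, wrapping a parser that can succeed on empty input (for example, an optional parser) would spin forever.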

optional(parser, shouldBacktrack?)

Makes a parser optional. Returns null if it fails; the shouldBacktrack flag controls whether the stream position is restored when the wrapped parser fails.

const parser = optional(anyOf('sign')); // Sign is optional

attempt(parser)

Wraps a parser with automatic backtracking on failure.

const parser = attempt(
  and(anyOf('keyword'), anyOf('identifier'))
);

map(parser, mapFn)

Transforms the result of a parser using a mapping function.

const digitParser = anyOf('digit');
const numberParser = map(
  many(digitParser),
  (tokens) => parseInt(tokens.map(t => t.value).join(''))
);

label(label, parser)

Adds a custom label to parser errors for better debugging.

const parser = label(
  'function declaration',
  and(anyOf('function'), anyOf('identifier'))
);

Built-in Parsers

anyOf(...types)

Parses any token matching the specified type(s).

const parser = anyOf('letter', 'digit', 'underscore');

anyExcept(...types)

Parses any token NOT matching the specified type(s).

const parser = anyExcept('whitespace', 'newline');

endOfInput()

Ensures the end of input has been reached.

const parser = and(
  myMainParser,
  endOfInput() // Ensure nothing left to parse
);

Running Parsers

runParser(parser, tokenStream)

Runs a parser on a token stream. Throws ParserError on failure.

try {
  const result = runParser(myParser, tokenStream);
  console.log(result.result);
} catch (error) {
  if (error instanceof ParserError) {
    console.error(`Parse error at ${error.location.line}:${error.location.column}`);
  }
}

runParserOnString(parser, input, tokenizer)

Convenience method to tokenize and parse in one step.

const result = runParserOnString(myParser, 'input string', tokenizer);

Example: Simple Expression Parser

import { 
  Tokenizer, 
  isSuccessfulResult,
  anyOf, 
  and, 
  or, 
  many, 
  map, 
  optional,
  runParserOnString 
} from '@mattwca/little-parser-lib';

// Define tokenizer
const tokenizer = new Tokenizer()
  .withTokenType('digit', /[0-9]/)
  .withTokenType('plus', '+')
  .withTokenType('minus', '-')
  .withTokenType('multiply', '*')
  .withTokenType('divide', '/')
  .withTokenType('whitespace', /\s/);

// Define parsers
const digit = anyOf('digit');
const ws = optional(anyOf('whitespace'));

// Parse a number (one or more digits)
const number = map(
  many(digit),
  (tokens) => parseInt(tokens.map(t => t.value).join(''))
);

// Parse an operator
const operator = or(
  anyOf('plus'),
  anyOf('minus'),
  anyOf('multiply'),
  anyOf('divide')
);

// Parse a complete expression: number operator number
const expression = and(
  number,
  ws,
  operator,
  ws,
  number
);

// Parse and extract values
const result = runParserOnString(expression, '42 + 8', tokenizer);
if (isSuccessfulResult(result)) {
  const [leftNum, , op, , rightNum] = result.result;
  console.log(`${leftNum} ${op.value} ${rightNum}`); // "42 + 8"
}

Error Handling

The library provides detailed error messages with position information:

try {
  const result = runParser(myParser, stream);
} catch (error) {
  if (error instanceof ParserError) {
    console.error(`
      Error: ${error.message}
      Line: ${error.location.line}
      Column: ${error.location.column}
      Position: ${error.location.position}
    `);
  }
}

Self-referencing ("recursive") parsing

Complex grammars often require a parser to call itself, either directly or through another parser that refers back to it.

Building on the expression parser example above, we can extend it to parse nested algebraic expressions:

type Expression = {
  left: Expression | number;
  operator: string;
  right: Expression | number;
};

type AlgebraicExpression = {
  symbol: string;
  expression: Expression;
};

const tokenizer = new Tokenizer()
  ...
  .withTokenType('left_parenthesis', '(')
  .withTokenType('right_parenthesis', ')')
  .withTokenType('letter', /[a-zA-Z]/);

let expression: ParseFn<number | AlgebraicExpression | Expression> | null = null;

const letter = anyOf('letter');
const leftParen = anyOf('left_parenthesis');
const rightParen = anyOf('right_parenthesis');

const algebraicExpression: ParseFn<AlgebraicExpression> = (ts) => {
  return map(
    and(letter, leftParen, expression!, rightParen),
    ([{ value: symbol },, expr]) => {
      return {
        symbol,
        expression: expr,
      };
    }
  )(ts);
};

expression = map(
  and(
    or<number | AlgebraicExpression>(number, algebraicExpression!),
    ws,
    operator,
    ws,
    or<number | AlgebraicExpression>(number, algebraicExpression!),
  ),
  ([leftExpr,, { value: operator },, rightExpr]) => ({
    left: leftExpr,
    operator,
    right: rightExpr,
  })
);

const result = runParserOnString(expression, 'a(3 + b(7 + 8)) + c(1 + 2)', tokenizer);
if (isSuccessfulResult(result)) {
  console.log(JSON.stringify(result.result, null, 2));
}

API Reference

Classes

  • Tokenizer: Converts input strings into tokens
  • TokenStream: Manages token consumption and backtracking
    • peek(): Look at the next token without consuming it
    • consume(): Get and advance to the next token
    • consumeIf(...types): Conditionally consume token if it matches given types
    • peekRemainder(): Get remaining unparsed tokens as a string
    • storePosition(), clearPosition(), restorePosition(): Manual backtracking control
  • ParserError: Error thrown when parsing fails

Types

  • Token: Represents a single token with type, value, and position
  • TokenType: String identifier for token types (or 'end_of_input')
  • TokenPosition: Position info with line and column numbers
  • ParseFn<T>: Function that takes a TokenStream and returns ParserResult
  • ParserResult<T>: Union of SuccessfulParserResult and FailedParserResult
  • ParserErrorPosition: Extended position info including token stream position

Combinators

  • and(...parsers): Sequential combination
  • or(...parsers): Alternative combination
  • many(parser): One or more repetitions
  • optional(parser): Optional parser
  • attempt(parser): Parser with backtracking
  • map(parser, fn): Transform parser result
  • label(label, parser): Add error label

Parsers

  • anyOf(...types): Match any of specified token types
  • anyExcept(...types): Match any token except specified types
  • endOfInput(): Match end of input

Utilities

  • runParser(parser, stream): Execute parser on token stream
  • runParserOnString(parser, input, tokenizer): Execute parser on string
  • isSuccessfulResult(result): Type guard for successful results
  • isFailedResult(result): Type guard for failed results

License

MIT

Author

@mattwca
