Skip to content

Refactor parser: Remove macro-based token dispatch and manual stateful block parsing #74

@sourcery-ai

Description

@sourcery-ai

The current parser implementation contains two areas of unnecessary complexity:

  1. Macro-based token dispatch (token_dispatch!)
  2. Manual stateful block parsing using balanced_block_with_min and a Cell state machine

Both can be simplified for improved readability and maintainability without sacrificing functionality.


1. Replace token_dispatch! Macro with a Method

Instead of using the token_dispatch! macro, introduce a dispatch method on TokenStream that takes a closure. This method would loop over tokens and allow the closure to handle each token, making the code more idiomatic and easier to follow.

Example Implementation:

impl<'a> TokenStream<'a> {
    /// Drive a user-supplied closure over every peeked token;
    /// the closure must advance the stream itself.
    pub fn dispatch<F>(&mut self, mut f: F)
    where
        F: FnMut(&mut Self, SyntaxKind, Span),
    {
        while let Some(&(kind, ref span_ref)) = self.peek() {
            let span = span_ref.clone();
            f(self, kind, span);
        }
    }
}

Call-site change:

// before:
// token_dispatch!(st, {
//     SyntaxKind::T_LPAREN => lp,
//     SyntaxKind::T_RPAREN => rp,
// });

// after:
st.stream.dispatch(|stream, kind, span| {
    match kind {
        SyntaxKind::T_LPAREN => lp(&mut st, span),
        SyntaxKind::T_RPAREN => rp(&mut st, span),
        _                  => stream.advance(),
    }
});

2. Replace Manual Stateful Block Parsing with Recursive Combinators

The current approach uses a Cell-based depth counter and manual state management. This can be replaced with a recursive parser combinator (e.g., using chumsky::recursive), which is flatter and easier to read.

Example Implementation:

use chumsky::recursive;

pub(super) fn balanced_block(
    open: SyntaxKind,
    close: SyntaxKind,
) -> impl Parser<SyntaxKind, (), Error = Simple<SyntaxKind>> + Clone {
    recursive(|block| {
        just(open)
            .padded_by(inline_ws().repeated())
            .ignore_then(
                block
                    .or(filter(|k: &SyntaxKind| *k != open && *k != close).ignored())
                    .padded_by(inline_ws().repeated())
                    .repeated(),
            )
            .then_ignore(just(close))
            .ignored()
    })
}

// For nonempty blocks:
pub(super) fn balanced_block_nonempty(
    open: SyntaxKind,
    close: SyntaxKind,
) -> impl Parser<SyntaxKind, (), Error = Simple<SyntaxKind>> + Clone {
    recursive(|block| {
        just(open)
            .padded_by(inline_ws().repeated())
            .ignore_then(
                block
                    .or(filter(|k: &SyntaxKind| *k != open && *k != close).ignored())
                    .padded_by(inline_ws().repeated())
                    .repeated()
                    .at_least(1),
            )
            .then_ignore(just(close))
            .ignored()
    })
}

This approach eliminates the need for internal mutable state and results in a much cleaner parser.


Action Items

  • Refactor the token_dispatch! macro into a dispatch method on TokenStream.
  • Replace the manual stateful block parsing (balanced_block_with_min + Cell) with a recursive combinator approach.
  • Update all call sites and tests accordingly.

These changes will make the parser codebase simpler, more idiomatic, and easier to maintain.


I created this issue for @leynos from #73 (comment).

Tips and commands

Getting Help

Metadata

Metadata

Assignees

No one assigned

    Labels

    lowLow criticality issue

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions