Skip to content

Refactor Rule Parsing to Use New Expression Parsers#184

Draft
leynos wants to merge 1 commit into
mainfrom
terragon/enhance-rule-body-parsing-as3l4x
Draft

Refactor Rule Parsing to Use New Expression Parsers#184
leynos wants to merge 1 commit into
mainfrom
terragon/enhance-rule-body-parsing-as3l4x

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Nov 15, 2025

Summary

  • Refactors rule body parsing to leverage new expression parsers and a span-based extraction approach.
  • Introduces Rule::body_expressions to parse rule bodies into structured Expr nodes, with meaningful parse errors preserved.
  • Replaces previous token-based body parsing with a focused span collection strategy for body literals.
  • Adds collect_rule_literal_spans in span_scanner and adapts parsing flow accordingly.
  • Updates tests to reflect new behavior and adds integration tests for expression parsing.

Changes

Parser API

  • Rule::body_literals now delegates to a new body_expression_texts helper, which collects expression texts from N_EXPR_NODE literals.
  • Added Rule::body_expression_texts and Rule::body_expressions:
    • body_expressions returns Vec<Result<Expr, Vec<Simple>>> by parsing each text with parse_expression.
    • body_expression_texts gathers the literal texts from the rule body, trimmed and non-empty.
  • Updated imports to include chumsky::error::Simple, and to use crate::parser::expression::parse_expression and Expr.

Span Scanning

  • Removed the old rule_body_span and replaced with collect_rule_literal_spans to extract body literal spans from the token stream.
  • Introduced helpers is_top_level and is_trivia_kind to properly detect top-level literals and ignore trivia.
  • Updated rule_decl to work with the new rule_body_literal path and to collect spans for validation via the new collector.

Expression Parsing

  • Expressions in rule bodies are now parsed via the dedicated expression parser (parse_expression).
  • The system now yields structured Expr nodes (e.g., binary, if-else, etc.) or parsing diagnostics.

Tests

  • Adjusted parser tests to reflect the new approach and expectations:
    • simple and multi-literal rules are updated to align with the current support level.
    • Added tests that verify spans are extracted and that literals are parsed into expression structures.
  • Updated expression integration tests to ensure rule bodies yield expression nodes without unexpected errors.

Why

  • This refactor decouples rule-body parsing from a brittle, token-oriented approach and delegates complex expression syntax to the dedicated expression parser. It improves maintainability and enables richer rule-body expressions in the future.

Risk & Mitigations

  • Some tests expecting multi-literal rule bodies to parse fully may fail until full multi-literal support is implemented. The suite has been updated to reflect the current behavior and to guide future enhancements.
  • Internal API changes may affect downstream code; Rule::body_expressions now returns structured Exprs and may require updating callers.

Testing Plan

  • Run cargo test to ensure all tests pass.
  • Verify single-literal rule bodies produce correct body_literals and parsed expressions.
  • Verify span extraction yields expected literal spans and that expression parsing produces correct AST structures for complex bodies.
  • Ensure integration tests for expression parsing cover binary operations, conditionals, and nested expressions.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://www.terragonlabs.com/task/9a6801dd-1edc-476a-9513-f641b8f9d12f

Summary by Sourcery

Refactor rule-body parsing to delegate literal extraction and syntax analysis to the expression parser, replacing brittle token-driven logic with a span-based approach and exposing structured Expr nodes via the AST.

New Features:

  • Introduce collect_rule_literal_spans to extract rule-body literal spans from token streams.
  • Add Rule::body_expressions to parse and return structured Expr AST nodes for each body literal, preserving parse diagnostics.

Enhancements:

  • Overhaul span_scanner to remove legacy rule_statement and for-binding parsers in favor of a focused literal collector.
  • Simplify Rule.body_literals to use body_expression_texts that filters N_EXPR_NODE children.
  • Streamline expression-integration tests to assert zero parse errors and correct node texts.

Tests:

  • Add unit tests verifying collect_rule_spans correctly extracts trimmed literal texts.
  • Add integration tests for expression parsing in rule bodies, including multiple literals and control-flow expressions.
  • Update existing parser tests to align with updated multi-literal support and error expectations.

- Replaced rule body literal string extraction with parsing into structured expressions.
- Added `body_expressions()` on Rule AST node returning parsed Expr with error details.
- Updated span scanner to properly collect literal spans by tracking nested tokens and commas/dots.
- Simplified expression parsing in rule body using chumsky parser combinators.
- Removed obsolete expression span logic related to rule bodies.
- Improved parser tests to assert correct parsing of multiple expressions in rule body.
- Enhanced integration tests verifying accurate expression nodes for multiple literals.

This refactor improves AST representation of rule bodies by parsing each literal into a full expression node rather than plain string literals, enabling richer semantic analysis and validation.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Nov 15, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch terragon/enhance-rule-body-parsing-as3l4x

Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Nov 15, 2025

Reviewer's Guide

Refactors rule-body parsing by replacing the brittle token-oriented approach with a span-based literal collector and leveraging the dedicated expression parser to produce structured Expr nodes with preserved diagnostics.

Class diagram for updated Rule body parsing API

classDiagram
    class Rule {
        +body_literals() Vec<String>
        +body_expressions() Vec<Result<Expr, Vec<Simple<SyntaxKind>>>>
        +body_expression_texts() Vec<String>
    }
    class Expr
    class Simple
    class SyntaxKind
    Rule --> Expr : parses to
    Rule --> Simple : error diagnostics
    Rule --> SyntaxKind : uses for error reporting
Loading

Flow diagram for new rule body literal span collection

flowchart TD
    A["Token stream"] --> B["collect_rule_literal_spans"]
    B --> C["Extract literal spans"]
    C --> D["body_expression_texts"]
    D --> E["parse_expression"]
    E --> F["Expr nodes or diagnostics"]
Loading

File-Level Changes

Change Details Files
Refactor AST API for body expression extraction
  • Added body_expression_texts to collect trimmed N_EXPR_NODE texts
  • Introduced body_expressions to parse texts into Expr with error reporting
  • Replaced body_literals to delegate to new helper
  • Imported parse_expression, Expr, and Simple
src/parser/ast/rule.rs
Replace token-based scanning with literal span collector
  • Removed rule_body_span and old rule_statement combinator
  • Added collect_rule_literal_spans to extract top-level literal spans
  • Introduced is_trivia_kind and is_top_level to track nesting levels
  • Refactored rule_decl and rule_body_literal to use new collector
src/parser/span_scanner.rs
Integrate expression parser for rule bodies
  • Hooked parse_expression into body_expressions to produce structured AST nodes
  • Preserved and propagated parse diagnostics in rule_decl via validate_expression
  • Adjusted parsing flow to validate each literal span against the expression parser
src/parser/ast/rule.rs
src/parser/span_scanner.rs
Update and add parser tests for new behavior
  • Updated rule parser tests to match span-based literal extraction
  • Added integration tests verifying Expr node creation and multi-literal support
  • Modified expectation flags in rules.rs for multi-literal rules based on current support
src/parser/tests/expression_integration.rs
src/parser/tests/rules.rs

Possibly linked issues

  • #Refactor collect_rule_spans to collapse helper functions into a single focused loop: The PR refactors rule body parsing by removing rule_statement and similar helpers, introducing collect_rule_literal_spans and Rule::body_expressions to consolidate and simplify the logic as requested in the issue.
  • Simplify complex expression span extraction logic in collect_rule_spans #102: The PR refactors and simplifies the complex expression span extraction logic in collect_rule_spans by introducing new, modular parsers and span collection methods.
  • #Simplify body_literals method in src/parser/ast/rule.rs: The PR directly addresses the issue by removing the complex helper methods from body_literals and replacing them with a simplified, delegated parsing approach using new expression parsers and span-based extraction.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant