Skip to content

Refactor SpanCursors to use SmallVec; add CST integration tests#203

Merged
leynos merged 11 commits into
mainfrom
terragon/refactor-spancursors-to-vec-q01vp6
Jan 5, 2026
Merged

Refactor SpanCursors to use SmallVec; add CST integration tests#203
leynos merged 11 commits into
mainfrom
terragon/refactor-spancursors-to-vec-q01vp6

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Jan 2, 2026

Summary

  • Replaces fixed-size SpanCursors with a dynamic SmallVec of SpanCursor<'a> built from SPAN_CURSOR_KINDS, enabling easier extension of span categories during CST construction.
  • Introduces SPAN_CURSOR_KINDS: a static array of (getter, SyntaxKind) linking each ParsedSpans getter to the target SyntaxKind.
  • Adds a type alias SpanSliceGetter and a small SPAN_CURSOR_INLINE capacity to drive building of SpanCursor instances, simplifying per-token processing.

Core CST Construction

  • Removed the SPAN_CURSOR_COUNT constant and the fixed-size SpanCursors wrapper.
  • SpanCursors now uses a SmallVec<[SpanCursor<'a>; SPAN_CURSOR_INLINE]> constructed from SPAN_CURSOR_KINDS in SpanCursors::new(spans).
  • CST green-tree construction updated to iterate over the dynamic cursor collection for advance_to, start_if, and finish_if.

Implementation Details

  • Per-cursor boilerplate removed; cursors are built at runtime from SPAN_CURSOR_KINDS.
  • No separate build_span_cursors function; build via SpanCursors::new(spans).
  • Uses smallvec::SmallVec to store cursors.

Tests

  • Build and run existing tests to ensure no regressions in CST construction.
  • Validate CST output remains consistent for representative input samples.
  • Add integration tests (new module) to verify all top-level categories produce expected CST nodes.
  • Test harness enhancements:
    • Introduced SourceText wrapper to unify test input types.
    • Added count_nodes_by_kind helper for concise node counts in tests.
    • parse_ok/parse_err helpers now accept Into and convert accordingly; round_trip helper updated to use SourceText.
    • Tests now rely on a new cst_integration module.

Integration Tests

  • Added src/parser/tests/cst_integration.rs to exercise CST construction across all top-level categories.
  • Updated src/parser/tests/mod.rs to include cst_integration module.

Notes

  • The changes introduce the smallvec dependency (Cargo.toml).
  • The test harness updates enable more ergonomic tests and stronger CST validation.
  • Added text_range_to_span helper in src/parser/ast/rule.rs to bridge Rowan TextRange to byte-offset Spans used in tests.

📎 Task: https://www.terragonlabs.com/task/d0f7613f-b563-46d4-af14-6cb6a796f068

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 2, 2026

Summary by CodeRabbit

  • Refactor

    • Parser now builds span cursors from a data-driven, small stack-backed buffer with per-cursor kind tracking to reduce allocations and improve CST construction.
  • Tests

    • Added an integration test for a mixed top‑level program; enhanced span‑position verification and unified parsing helper inputs for more flexible test data.
  • Documentation

    • Documented the CST construction optimisation and rationale for the small inline buffer.
  • Chores

    • Added a small-vector dependency to support stack‑backed buffering.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Refactor CST span-cursor construction from a fixed-size array to a data-driven SmallVec built from SPAN_CURSOR_KINDS; store per-cursor SyntaxKind on each SpanCursor; update cursor iteration and span-position tests; add smallvec dependency and minor test helper/visibility changes.

Changes

Cohort / File(s) Summary
Core cursor refactor
src/parser/cst_builder/tree.rs
Replace fixed-size cursor array with SmallVec<SpanCursor> using SPAN_CURSOR_KINDS. Add SpanSliceGetter alias, SPAN_CURSOR_KINDS, and SPAN_CURSOR_INLINE. Store SyntaxKind on SpanCursor. Update SpanCursors::new, advance_and_start, finish, and iteration logic to operate over the dynamic collection.
Tests: CST verification
src/parser/tests/cst_integration.rs, src/parser/tests/mod.rs, src/parser/tests/helpers.rs, src/parser/tests/rules/body_terms.rs
Add cst_integration test parsing a mixed top-level programme and asserting node-kind counts and pretty-print round-trip. Wire mod cst_integration;. Introduce SourceText test wrapper, add count_nodes_by_kind, change helper APIs to accept impl Into<SourceText>, and adjust a couple of tests to call parse_err(src.as_str()).
Dependency & docs
Cargo.toml, docs/ddlint-design-and-road-map.md
Add smallvec = "1.15.1" dependency. Document CST construction optimisation and cite smallvec in roadmap.
Visibility tweak
src/parser/ast/rule.rs
Change text_range_to_span visibility from private to pub(crate) and add doc comment explaining its bridging role.

Sequence Diagram(s)

(Skip; changes are internal refactor and test adjustments without a multi-actor sequential runtime flow that benefits from a diagram.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Poem

Cursors wake from rigid rows,
SmallVec cradles lines that flow,
Kinds kept close, each span aligned,
Tests confirm the tree defined,
Parsers hum — new order grows. 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarises the main changes: refactoring SpanCursors to use SmallVec and adding CST integration tests.
Description check ✅ Passed The description comprehensively covers the refactoring objectives, implementation details, test enhancements, and dependency additions directly related to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch terragon/refactor-spancursors-to-vec-q01vp6

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59e65df and 2ce8ba8.

📒 Files selected for processing (1)
  • src/parser/tests/helpers.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/tests/helpers.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/tests/helpers.rs
🧬 Code graph analysis (1)
src/parser/tests/helpers.rs (7)
src/parser/expression/token_stream.rs (1)
  • src (106-108)
src/parser/token_stream.rs (1)
  • src (118-120)
src/parser/ast/root.rs (1)
  • kind (46-48)
src/parser/mod.rs (1)
  • parse (47-55)
src/test_util/assertions.rs (1)
  • assert_no_parse_errors (21-23)
src/parser/cst_builder/mod.rs (1)
  • root (45-47)
src/parser/ast/mod.rs (1)
  • syntax (24-24)
🔍 Remote MCP

Based on the comprehensive repository documentation gathered, here is a concise summary of relevant context for this pull request:

Summary of Relevant Context

Purpose of SpanCursors and CST Construction

SpanCursors manages iteration over collected statement spans and determines when to start and finish syntax nodes in the rowan::GreenNodeBuilder during the Concrete Syntax Tree (CST) construction phase. SpanCursors holds an array of individual SpanCursor instances, one for each type of DDlog statement (e.g., imports, typedefs, relations, functions).

How the Refactoring Fits Into the Architecture

The PR refactors SpanCursors from a fixed-size array to a dynamic SmallVec-backed collection. This aligns with the existing architecture where the build_green_tree function takes a stream of tokens and ParsedSpans to create a lossless syntax tree, initializes SpanCursors with spans, iterates through each token in the token stream calling cursors.advance_and_start() and cursors.finish() to manage starting and finishing nodes based on token positions.

ParsedSpans and Span Collection Context

The ParsedSpans structure stores categorized byte ranges for different DDlog statements, collected during the "Span Analysis" phase by the parse_tokens function. The ParsedSpans builder ensures that span lists are sorted and non-overlapping in debug builds to prevent errors during CST construction.

Seven Cursor Categories

The refactoring manages cursors for seven different statement types: imports, typedefs, relations, indexes, functions, transformers, and rules, which directly corresponds to the span categories tracked in ParsedSpans.

SmallVec Performance Motivation

The use of SmallVec aligns with the documented architecture goal: reducing heap allocations during the hot path of CST construction by using stack-backed storage when the cursor count fits inline.

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (3)
src/parser/tests/helpers.rs (3)

13-37: LGTM!

The SourceText wrapper is well-documented and provides idiomatic trait implementations. The removal of the redundant From<&String> impl addresses the prior feedback correctly.


93-123: LGTM!

The refactor to impl Into<SourceText> is applied consistently across parse_ok, parse_err, and round_trip. The conversion pattern (src.into() followed by src.as_ref()) is clean and idiomatic.


125-165: LGTM!

The parse_single_item helper and its domain-specific wrappers follow the established pattern. The #[expect(clippy::expect_used, reason = "...")] attribute correctly documents why the lint is suppressed and is placed after the doc comment per coding guidelines.


Comment @coderabbitai help to get the list of available commands and usage tips.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Jan 2, 2026

Reviewer's Guide

Refactors CST span handling to replace the fixed-size SpanCursors wrapper and SPAN_CURSOR_COUNT constant with a dynamically built Vec, and updates CST construction to iterate generically over that vector when advancing, starting, and finishing spans per token.

Class diagram for dynamic SpanCursor vector-based CST construction

classDiagram
    class ParsedSpans {
        +imports() [Span]
        +typedefs() [Span]
        +relations() [Span]
        +indexes() [Span]
        +functions() [Span]
        +transformers() [Span]
        +rules() [Span]
        +expressions() [Span]
    }

    class SpanCursor~'a~ {
        -iter PeekableIterSpan
        -kind SyntaxKind
        +new(spans_ref SpanSliceRef, kind SyntaxKind) SpanCursor~'a~
        +advance_to(pos usize) void
        +start_if(builder_ref GreenNodeBuilderRef, pos usize) void
        +finish_if(builder_ref GreenNodeBuilderRef, pos usize) void
    }

    class GreenNodeBuilder {
        +new() GreenNodeBuilder
        +start_node(raw_kind RawSyntaxKind) void
        +finish_node() void
    }

    class Functions {
        +build_span_cursors(spans_ref ParsedSpansRef) VecSpanCursor
        +build_green_tree(spans_ref ParsedSpansRef, tokens_ref TokenSliceRef, src_ref StrRef) void
    }

    ParsedSpans "1" --> "*" Span : provides_spans_for
    ParsedSpans --> SpanCursor : used_to_construct
    SpanCursor --> GreenNodeBuilder : controls_node_start
    SpanCursor --> GreenNodeBuilder : controls_node_finish

    Functions ..> ParsedSpans : parameter
    Functions ..> SpanCursor : constructs_collection
    Functions ..> GreenNodeBuilder : drives_cst_build

    %% Relationships representing the new dynamic vector usage
    class VecSpanCursor {
        +len() usize
        +iter_mut() SpanCursorIterator
    }

    Functions --> VecSpanCursor : returns
    VecSpanCursor "*" --> "1" SpanCursor : contains

    %% Removed structure (for contrast)
    class SpanCursors~'a~ {
        -cursors SpanCursorArray
    }

    SpanCursors~'a~ .. SpanCursor : was_fixed_array_of
    SpanCursors~'a~ .. Functions : removed_in_refactor
Loading

File-Level Changes

Change Details Files
Replace fixed-size SpanCursors container with a dynamic Vec of SpanCursor built from ParsedSpans.
  • Remove SPAN_CURSOR_COUNT constant and SpanCursors struct along with its constructor and helper methods for advancing/finishing cursors.
  • Introduce build_span_cursors(spans) helper that assembles SpanCursor instances by pairing each ParsedSpans category with its SyntaxKind and collecting them into a Vec.
  • Use an iterator-based map/collect pattern to construct SpanCursor values instead of a fixed array literal.
src/parser/cst_builder/tree.rs
Update CST construction loop to operate directly over the Vec for span advancement and node boundary management.
  • Initialize cursors in build_green_tree via build_span_cursors(spans) instead of SpanCursors::new(spans).
  • Inline advance_and_start behavior by looping over cursors to call advance_to and start_if at each token’s start position.
  • Inline finish behavior by looping over cursors to call finish_if at each token’s end position, preserving existing CST semantics while removing per-cursor boilerplate.
src/parser/cst_builder/tree.rs

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@leynos leynos marked this pull request as ready for review January 3, 2026 02:00
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The per-token loops over cursors in build_green_tree now duplicate the advance_to/start_if and finish_if logic inline; consider extracting small helper functions (or a thin wrapper type around Vec<SpanCursor<'_>>) to keep this logic encapsulated as it was with SpanCursors and avoid repeating the loops.
  • build_span_cursors creates a new Vec on every build_green_tree call; if CST construction is on a hot path, you might want to benchmark and consider alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations while keeping the dynamic span category list.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The per-token loops over `cursors` in `build_green_tree` now duplicate the `advance_to`/`start_if` and `finish_if` logic inline; consider extracting small helper functions (or a thin wrapper type around `Vec<SpanCursor<'_>>`) to keep this logic encapsulated as it was with `SpanCursors` and avoid repeating the loops.
- `build_span_cursors` creates a new `Vec` on every `build_green_tree` call; if CST construction is on a hot path, you might want to benchmark and consider alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations while keeping the dynamic span category list.

## Individual Comments

### Comment 1
<location> `src/parser/cst_builder/tree.rs:43` </location>
<code_context>
-            cur.finish_if(builder, pos);
-        }
-    }
+fn build_span_cursors(spans: &ParsedSpans) -> Vec<SpanCursor<'_>> {
+    vec![
+        (spans.imports(), SyntaxKind::N_IMPORT_STMT),
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests to cover this change in how span cursors are constructed and iterated.

Add unit tests that exercise `build_green_tree` with representative `ParsedSpans` to verify that imports, typedefs, relations, indexes, functions, transformers, rules, and expressions still start and finish at the correct positions after introducing `build_span_cursors` and the new cursor iteration pattern. In addition, add behavioural tests at the integration level (e.g. parsing a Datalog program and validating the resulting CST) to ensure that the observable behaviour of CST construction has not regressed.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
- For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jan 3, 2026

@coderabbitai Have the following now been resolved?

Please address the comments from this code review:

## Overall Comments
- The per-token loops over `cursors` in `build_green_tree` now duplicate the `advance_to`/`start_if` and `finish_if` logic inline; consider extracting small helper functions (or a thin wrapper type around `Vec<SpanCursor<'_>>`) to keep this logic encapsulated as it was with `SpanCursors` and avoid repeating the loops.
- `build_span_cursors` creates a new `Vec` on every `build_green_tree` call; if CST construction is on a hot path, you might want to benchmark and consider alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations while keeping the dynamic span category list.

## Individual Comments

### Comment 1
<location> `src/parser/cst_builder/tree.rs:43` </location>
<code_context>
-            cur.finish_if(builder, pos);
-        }
-    }
+fn build_span_cursors(spans: &ParsedSpans) -> Vec<SpanCursor<'_>> {
+    vec![
+        (spans.imports(), SyntaxKind::N_IMPORT_STMT),
</code_context>

<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests to cover this change in how span cursors are constructed and iterated.

Add unit tests that exercise `build_green_tree` with representative `ParsedSpans` to verify that imports, typedefs, relations, indexes, functions, transformers, rules, and expressions still start and finish at the correct positions after introducing `build_span_cursors` and the new cursor iteration pattern. In addition, add behavioural tests at the integration level (e.g. parsing a Datalog program and validating the resulting CST) to ensure that the observable behaviour of CST construction has not regressed.

<details>
<summary>Review instructions:</summary>

**Path patterns:** `**/*`

**Instructions:**
- For any new feature or change to an existing feature, both behavioural *and* unit tests are required.

</details>
</issue_to_address>

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 3, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the integration test file exists and examine its contents
if [ -f "src/parser/tests/cst_integration.rs" ]; then
    echo "=== Integration test file found ==="
    cat src/parser/tests/cst_integration.rs
else
    echo "=== Integration test file NOT found ==="
fi

echo ""
echo "=== Checking test module declaration ==="
cat src/parser/tests/mod.rs

echo ""
echo "=== Checking current tree.rs implementation ==="
cat src/parser/cst_builder/tree.rs

Length of output: 12147


@leynos: Yes, the issues from the code review have been resolved:

✅ Tests Requirement (Fully Resolved)

Behavioural/Integration Tests:

  • src/parser/tests/cst_integration.rs contains parse_builds_cst_for_all_top_level_categories which parses a comprehensive Datalog program and verifies all top-level categories produce expected CST nodes (imports, typedefs, relations, indexes, functions, transformers, rules, and expressions).

Unit Tests:

  • src/parser/cst_builder/tree.rs contains build_green_tree_matches_span_positions which exercises build_green_tree with representative ParsedSpans and verifies that all categories start and finish at the correct positions by comparing CST node spans against the original parsed spans.

✅ Code Duplication (Fully Resolved)

The inline loop duplication has been eliminated. SpanCursors is now a proper wrapper type with encapsulated methods:

  • advance_and_start(): combines advance_to + start_if logic
  • finish(): encapsulates finish_if logic

The build_green_tree function now simply calls these methods, keeping the logic clean and maintainable.

⚠️ Vec Allocation (Partially Addressed)

The code now uses Vec::with_capacity(SPAN_CURSOR_KINDS.len()) to pre-allocate the vector, reducing heap allocations. The original suggestion to benchmark and potentially reuse buffers was a performance optimisation "consider" rather than a requirement. The current implementation with pre-allocation is a reasonable improvement unless profiling reveals CST construction as a bottleneck.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jan 3, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06f9376 and 8f97d81.

📒 Files selected for processing (3)
  • src/parser/cst_builder/tree.rs
  • src/parser/tests/cst_integration.rs
  • src/parser/tests/mod.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/tests/cst_integration.rs
  • src/parser/tests/mod.rs
  • src/parser/cst_builder/tree.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/tests/cst_integration.rs
  • src/parser/tests/mod.rs
  • src/parser/cst_builder/tree.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/tests/cst_integration.rs
  • src/parser/tests/mod.rs
  • src/parser/cst_builder/tree.rs
🧬 Code graph analysis (2)
src/parser/tests/cst_integration.rs (4)
src/test_util/assertions.rs (1)
  • assert_no_parse_errors (21-23)
src/parser/tests/helpers.rs (1)
  • pretty_print (19-36)
src/parser/cst_builder/mod.rs (1)
  • root (45-47)
src/parser/ast/root.rs (1)
  • kind (46-48)
src/parser/cst_builder/tree.rs (3)
src/parser/cst_builder/spans.rs (17)
  • imports (52-55)
  • imports (156-158)
  • typedefs (59-62)
  • typedefs (162-164)
  • relations (66-69)
  • relations (168-170)
  • indexes (73-76)
  • indexes (174-176)
  • functions (80-83)
  • functions (180-182)
  • transformers (87-90)
  • transformers (186-188)
  • rules (94-97)
  • rules (192-194)
  • expressions (101-104)
  • expressions (198-200)
  • builder (148-150)
src/parser/span_scanner.rs (1)
  • parse_tokens (17-67)
src/parser/ast/rule.rs (1)
  • text_range_to_span (626-630)
🔍 Remote MCP Deepwiki

Summary of Additional Context Found

Purpose and Function of SpanCursor in CST Construction:

SpanCursor is a utility struct used during the Concrete Syntax Tree (CST) construction phase to manage and track the start and end positions of different statement types within the token stream. Each SpanCursor instance is associated with a specific SyntaxKind (e.g., N_IMPORT_STMT, N_RELATION_DECL) and an iterator over a sorted list of Spans for that kind.

How SpanCursor Operations Work:

For each token during CST construction, the build_green_tree function calls cursors.advance_and_start(&mut builder, span.start), which iterates through all SpanCursors—advancing each past any spans that have already ended and starting new nodes when the token's start position matches a span's start position. After pushing the current token, cursors.finish(&mut builder, span.end) finishes nodes when the token's end position matches or exceeds the end of managed spans.

Supported Cursor Kinds:

The SPAN_CURSOR_KINDS mapping supports seven top-level statement types:

  • N_IMPORT_STMT (imports), N_TYPE_DEF (typedefs), N_RELATION_DECL (relations), N_INDEX (indexes), N_FUNCTION (functions), N_TRANSFORMER (transformers), and N_RULE (Datalog rules)

Key Distinction:

SpanCollector is responsible for generating spans during the initial scan of the token stream, whereas SpanCursor uses these pre-collected spans to guide the rowan::GreenNodeBuilder in constructing the CST.

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (5)
src/parser/tests/mod.rs (1)

5-5: LGTM!

The new cst_integration module is correctly wired into the test suite, maintaining alphabetical ordering.

src/parser/tests/cst_integration.rs (1)

1-42: LGTM!

Solid integration test that exercises CST construction across all top-level statement categories. Good use of concat!() for the multi-line source string, and the pretty-print round-trip assertion provides additional confidence that the CST structure is correct.

src/parser/cst_builder/tree.rs (3)

10-21: LGTM!

Clean declarative mapping that centralises cursor kind registration. This makes extending the CST builder with new span categories straightforward—add a getter to ParsedSpans and a corresponding entry here.


61-67: LGTM!

Pre-allocation with Vec::with_capacity addresses the allocation concern raised during review. The iterator-driven construction is clean and maintainable.


185-212: LGTM!

Comprehensive unit test that validates span-to-node mapping for all eight cursor kinds. Good complement to the integration test, ensuring the cursor machinery produces CST nodes at the expected positions.

Comment thread src/parser/cst_builder/tree.rs Outdated
@leynos leynos changed the title Refactor SpanCursors to Vec for flexible CST construction Refactor SpanCursors to use SmallVec for flexible CST construction Jan 3, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/parser/cst_builder/tree.rs (1)

169-173: Eliminate the duplicate text_range_to_span helper.

This helper duplicates the implementation at line 626 in src/parser/ast/rule.rs. Make the original pub(crate), then import and reuse it here.

Based on past review comments flagging this duplication.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f97d81 and f6bd83c.

📒 Files selected for processing (3)
  • Cargo.toml
  • docs/ddlint-design-and-road-map.md
  • src/parser/cst_builder/tree.rs
🧰 Additional context used
📓 Path-based instructions (6)
Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

Cargo.toml: Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
All crate versions specified in Cargo.toml must use SemVer-compatible caret requirements (e.g., some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) in Cargo.toml should only be used where a dependency must be locked to patch-level updates for a specific, documented reason.

Files:

  • Cargo.toml
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Markdown paragraphs must be wrapped at 80 columns, and bullet points need the same limit.
Code blocks in Markdown must be wrapped at 120 columns.
Tables in Markdown must not be wrapped, and headings must remain unwrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

  • docs/ddlint-design-and-road-map.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/ddlint-design-and-road-map.md
docs/**/*.{md,rs,hs,py}

📄 CodeRabbit inference engine (docs/differential-datalog-parser-syntax-spec-migration-plan.md)

Update all repository references from docs/haskell-parser-analysis.md to point to the normative DDlog syntax specification and new implementation notes

Files:

  • docs/ddlint-design-and-road-map.md
docs/**/*.md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

docs/**/*.md: Use British English based on Oxford English Dictionary locale en-GB, including: -ize suffixes in words like 'realize' and 'organization', -lyse suffixes in words like 'analyse' and 'paralyse', -our suffixes in words like 'colour' and 'behaviour', -re suffixes in words like 'centre' and 'calibre', double 'l' in words like 'cancelled' and 'counsellor', maintain 'e' in words like 'likeable' and 'liveable', -ogue suffixes in words like 'catalogue'
Keep United States (US) spelling when used in an API (e.g., 'color')
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation (e.g., 'Concordat Industries are expanding')
Write headings in sentence case
Use Markdown headings (#, ##, ###, etc.) in order without skipping levels
Follow markdownlint recommendations for markdown formatting
Always provide a language identifier for fenced code blocks; use 'plaintext' for non-code text
Use '-' as the first level bullet and renumber lists when items change
Prefer inline links using text or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row
Expand any uncommon acronym on first use (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation
Footnotes must be numbered in order of appearance in the document
Caption every table and every diagram in documentation
Include Mermaid diagrams where they add clarity to documentation
Use 'alt text' syntax for embedding figures and provide brief alt text describing the content in documentation
Add a short description before each Mermaid diagram in documentation so screen readers can understand it

Files:

  • docs/ddlint-design-and-road-map.md
docs/**/!(README).md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

Avoid first and second person personal pronouns outside the README.md file

Files:

  • docs/ddlint-design-and-road-map.md
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/cst_builder/tree.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/cst_builder/tree.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/cst_builder/tree.rs
🧬 Code graph analysis (1)
src/parser/cst_builder/tree.rs (2)
src/parser/cst_builder/spans.rs (17)
  • imports (52-55)
  • imports (156-158)
  • typedefs (59-62)
  • typedefs (162-164)
  • relations (66-69)
  • relations (168-170)
  • indexes (73-76)
  • indexes (174-176)
  • functions (80-83)
  • functions (180-182)
  • transformers (87-90)
  • transformers (186-188)
  • rules (94-97)
  • rules (192-194)
  • expressions (101-104)
  • expressions (198-200)
  • builder (148-150)
src/parser/span_scanner.rs (1)
  • parse_tokens (17-67)
🔍 Remote MCP Deepwiki

Relevant findings for reviewing PR #203 (SpanCursors → dynamic Vec / SmallVec):

  • Code locations and flow impacted

    • CST construction uses build_green_tree in src/parser/cst_builder/tree.rs; SpanCursors/SpanCursor coordinate start/finish events driven by ParsedSpans from span scanner → affects hot path in CST builder.,
  • Tests added / coverage

    • New integration test module: src/parser/tests/cst_integration.rs validating one top-level node per category (import, typedef, relation, index, function, transformer, rule) and expression node counts; pretty-print round-trip asserted. Unit test build_green_tree_matches_span_positions added to tree.rs to assert span→node mapping across all kinds. These exercise behavior changes end-to-end.
  • SmallVec change and rationale

    • PR (final) replaces Vec allocation with SmallVec (stack-backed, capacity 8) for the span-cursors collection to avoid heap allocations in the common case; Cargo.toml updated to include smallvec and docs/ddlint-design-and-road-map.md annotated accordingly. This targets the reviewer concern about per-call allocations on a hot path.
  • Performance / API considerations to verify in review

    • build_green_tree remains a hot loop: ensure SmallVec capacity (8) covers typical number of cursor kinds and that Vec/SmallVec construction does not regress allocation behavior under pathological inputs.
    • Confirm SpanCursors wrapper now exposes advance_and_start() and finish() (encapsulating earlier duplicated loops) so build_green_tree no longer duplicates per-token loops. Verify no behavioral regressions in start/finish ordering across cursor kinds.
  • Testing strategy / CI expectations

    • Repository uses integration-first testing and round-trip pretty-print assertions; added tests align with existing strategy (rstest fixtures, round-trip checks). Confirm CI runs clippy/format/tests with warnings denied; newly added dependency and tests should pass those gates.

Files / areas to inspect closely in review

  • src/parser/cst_builder/tree.rs — correctness of SpanCursor kind tracking, start/finish ordering, and absence of off-by-one span handling.
  • src/parser/cst_builder/spans.rs and span-scanner collectors — ensure SPAN_CURSOR_KINDS mapping matches ParsedSpans fields and ordering assumptions.
  • Tests: src/parser/tests/cst_integration.rs and unit test in tree.rs — verify they cover edge cases (overlapping spans, empty lists).
  • Cargo.toml & docs/ddlint-design-and-road-map.md — dependency addition and documentation accuracy for SmallVec rationale.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (2)
src/parser/cst_builder/tree.rs (1)

187-214: Excellent comprehensive test coverage.

This test exercises build_green_tree across all eight top-level span kinds and validates that CST node positions match the parsed spans. The use of concat! for the multi-line source string follows the coding guidelines.

docs/ddlint-design-and-road-map.md (1)

216-218: Well-documented optimisation with proper citation.

The addition clearly explains the SmallVec optimisation in the CST construction pipeline, uses correct en-GB-oxendict spelling ("optimization" with -ize), wraps at 80 columns, and includes a properly ordered footnote citation. This aligns with the code changes in tree.rs.

Also applies to: 1009-1010

Comment thread Cargo.toml Outdated
Comment thread src/parser/cst_builder/tree.rs Outdated
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jan 3, 2026

@coderabbitai Have the following now been resolved?

⚠️ Vec Allocation (Partially Addressed)

The code now uses Vec::with_capacity(SPAN_CURSOR_KINDS.len()) to pre-allocate the vector, reducing heap allocations. The original suggestion to benchmark and potentially reuse buffers was a performance optimisation "consider" rather than a requirement. The current implementation with pre-allocation is a reasonable improvement unless profiling reveals CST construction as a bottleneck.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 3, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if SmallVec is being used in tree.rs
echo "=== Checking for SmallVec usage in tree.rs ==="
rg -n "SmallVec" src/parser/cst_builder/tree.rs

echo ""
echo "=== Checking imports in tree.rs ==="
head -20 src/parser/cst_builder/tree.rs | rg -n "use|mod"

echo ""
echo "=== Checking SpanCursors struct definition ==="
rg -n -A5 "struct SpanCursors" src/parser/cst_builder/tree.rs

echo ""
echo "=== Checking SpanCursors::new implementation ==="
rg -n -A10 "impl.*SpanCursors" src/parser/cst_builder/tree.rs | head -20

echo ""
echo "=== Checking Cargo.toml for smallvec dependency ==="
rg -n "smallvec" Cargo.toml

Length of output: 1246


@leynos: Yes, this has now been fully resolved

The code has been updated from Vec to SmallVec since my previous assessment:

  1. SmallVec imported (line 5): use smallvec::SmallVec;
  2. SpanCursors uses SmallVec (line 59): SmallVec<[SpanCursor<'a>; SPAN_CURSOR_INLINE]>
  3. Construction optimised (line 64): SmallVec::with_capacity(SPAN_CURSOR_KINDS.len())
  4. Dependency added: smallvec = "1.13" in Cargo.toml

SmallVec is a stack-backed vector that avoids heap allocations when the collection fits within the inline capacity (SPAN_CURSOR_INLINE). Since the number of span cursor kinds is small and known at compile-time, this provides exactly the optimisation the reviewer suggested: reducing heap allocations on the hot path whilst maintaining the dynamic, data-driven approach.

This fully addresses the original concern about "alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations."

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jan 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/parser/ast/rule.rs (1)

626-630: Add documentation for the internal API.

The visibility change appropriately eliminates duplication in tests. However, expose the function's purpose and contract with a doc comment.

🔎 Proposed documentation
+/// Convert a Rowan text range to a byte-offset span.
+///
+/// This helper bridges the CST layer (which uses `rowan::TextRange`) and
+/// the parser/diagnostic layer (which uses byte-offset `Span`s).
 pub(crate) fn text_range_to_span(range: rowan::TextRange) -> Span {
     let start: usize = range.start().into();
     let end: usize = range.end().into();
     start..end
 }

As per coding guidelines, internal APIs visible across modules should carry doc comments for maintainability.

♻️ Duplicate comments (1)
src/parser/cst_builder/tree.rs (1)

12-12: Add compile-time synchronization between SPAN_CURSOR_INLINE and SPAN_CURSOR_KINDS.

This was flagged in a previous review and remains unaddressed. Set SPAN_CURSOR_INLINE to SPAN_CURSOR_KINDS.len() directly:

-const SPAN_CURSOR_INLINE: usize = 8;
+const SPAN_CURSOR_INLINE: usize = SPAN_CURSOR_KINDS.len();

Alternatively, add a compile-time assertion to prevent silent heap spills if new span kinds are added.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6bd83c and cab8d7b.

📒 Files selected for processing (2)
  • src/parser/ast/rule.rs
  • src/parser/cst_builder/tree.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/ast/rule.rs
  • src/parser/cst_builder/tree.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/ast/rule.rs
  • src/parser/cst_builder/tree.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/ast/rule.rs
  • src/parser/cst_builder/tree.rs
🧬 Code graph analysis (1)
src/parser/cst_builder/tree.rs (5)
src/parser/ast/root.rs (7)
  • imports (76-78)
  • relations (88-90)
  • indexes (94-96)
  • functions (100-102)
  • transformers (106-108)
  • rules (112-114)
  • syntax (32-34)
src/parser/cst_builder/spans.rs (17)
  • imports (52-55)
  • imports (156-158)
  • typedefs (59-62)
  • typedefs (162-164)
  • relations (66-69)
  • relations (168-170)
  • indexes (73-76)
  • indexes (174-176)
  • functions (80-83)
  • functions (180-182)
  • transformers (87-90)
  • transformers (186-188)
  • rules (94-97)
  • rules (192-194)
  • expressions (101-104)
  • expressions (198-200)
  • builder (148-150)
src/parser/ast/rule.rs (1)
  • text_range_to_span (626-630)
src/parser/span_scanner.rs (1)
  • parse_tokens (17-67)
src/parser/ast/mod.rs (1)
  • syntax (24-24)
🔍 Remote MCP Deepwiki

Relevant facts discovered (concise)

  • Core CST builder location and span-cursor flow: build_green_tree uses SpanCursors to drive start/finish events from ParsedSpans; SpanCursors/SpanCursor logic lives in src/parser/cst_builder/tree.rs and parsed spans are built in span_scanner.rs.

  • PR aligns with documented pipeline: parse() → parse_tokens() (span collection) → build_green_tree() (CST construction) → ast::Root::from_green(). Changes to SpanCursors affect a hot path in build_green_tree.

  • SmallVec rationale documented: docs/ddlint-design-and-road-map.md notes using a small, stack-backed buffer (smallvec) to avoid heap allocations in the common case; Cargo.toml updated to add smallvec = "1.13". Reviewer allocation concern was addressed by switching to SmallVec with inline capacity.

  • Tests added and testing strategy: new integration test exercises all top-level categories and round-trip pretty-printing; unit test verifies span→node mappings. Repo CI enforces strict quality gates (cargo test, clippy with -D warnings, rustfmt); tests must pass under those constraints.

  • Areas to focus in review (from docs): verify SpanCursor kind tracking, start/finish ordering, absence of off-by-one span handling, that SmallVec inline capacity covers typical cursor count, and that encapsulation/duplication concerns were resolved (SpanCursors wrapper exposing advance_and_start/finish).

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (3)
src/parser/cst_builder/tree.rs (3)

14-23: Well-structured data-driven mapping.

The SPAN_CURSOR_KINDS array cleanly maps span getters to their corresponding SyntaxKind values, enabling generic cursor construction. This is a clear improvement over the previous hard-coded approach.


58-83: Clean encapsulation of cursor operations.

The SpanCursors wrapper with advance_and_start() and finish() methods properly encapsulates the iteration logic, addressing the earlier duplication concerns. The SmallVec construction with pre-allocated capacity is appropriate.


182-209: Thorough span position verification test.

The test exercises all eight span kinds with a representative input covering imports, typedefs, relations, indexes, functions, transformers, rules, and expressions. The use of concat!() for the multi-line string follows coding guidelines.

leynos and others added 6 commits January 4, 2026 04:26
Refactored the CST builder code by removing the SpanCursors struct and replacing it with a function that builds a Vec<SpanCursor> dynamically. This reduces complexity and improves code readability.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Changed the build_span_cursors function to take a borrowed ParsedSpans without lifetime annotation and return a Vec with elided lifetime, improving code clarity and conciseness.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…clarity

- Introduced SpanCursors struct to encapsulate management of multiple SpanCursor instances.
- Replaced build_span_cursors function with SpanCursors::new constructor.
- Moved advance and finish logic into SpanCursors methods for cleaner iteration.
- Added comprehensive tests to validate CST node spans against parsed spans.
- Added integration test verifying CST construction for all top-level statement types.

This refactor improves code clarity and maintainability by better encapsulating cursor state and behavior during CST construction.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Added the `smallvec` dependency to Cargo.toml and replaced the `Vec` of `SpanCursor`s with a `SmallVec` backed by a stack-allocated buffer of size 8 in the CST builder. This reduces heap allocations in the common case during concrete syntax tree construction, improving performance and memory usage.

Also updated documentation to reference the `smallvec` crate and explain the optimization.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Made the `text_range_to_span` function public in `rule.rs` and removed its duplicate
local definition from the tests in `tree.rs` to improve code reuse and clarity.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos force-pushed the terragon/refactor-spancursors-to-vec-q01vp6 branch from 958b202 to 7c61476 Compare January 4, 2026 04:31
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Jan 5, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @src/parser/tests/cst_integration.rs:
- Around line 1-42: The test parse_builds_cst_for_all_top_level_categories
correctly validates CST node counts and pretty-printing but can be simplified by
extracting the count_nodes closure into a reusable helper; move the closure
logic into a new helper function (e.g., count_nodes_by_kind) in the tests
helpers module and update this test to call that helper instead of the inline
closure, keeping references to SyntaxKind variants (N_IMPORT_STMT, N_TYPE_DEF,
N_RELATION_DECL, N_INDEX, N_FUNCTION, N_TRANSFORMER, N_RULE, N_EXPR_NODE) and
existing helpers::pretty_print and parse so behavior and assertions remain
unchanged.
♻️ Duplicate comments (1)
Cargo.toml (1)

17-17: Upgrade smallvec to the latest stable release.

This concern has already been raised in a previous review comment. The dependency version 1.13 is outdated; upgrade to 1.15.1 as recommended.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cab8d7b and 7c61476.

📒 Files selected for processing (6)
  • Cargo.toml
  • docs/ddlint-design-and-road-map.md
  • src/parser/ast/rule.rs
  • src/parser/cst_builder/tree.rs
  • src/parser/tests/cst_integration.rs
  • src/parser/tests/mod.rs
🧰 Additional context used
📓 Path-based instructions (6)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/tests/cst_integration.rs
  • src/parser/ast/rule.rs
  • src/parser/tests/mod.rs
  • src/parser/cst_builder/tree.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/tests/cst_integration.rs
  • src/parser/ast/rule.rs
  • src/parser/tests/mod.rs
  • src/parser/cst_builder/tree.rs
**/*.md

📄 CodeRabbit inference engine (AGENTS.md)

**/*.md: Markdown paragraphs must be wrapped at 80 columns, and bullet points need the same limit.
Code blocks in Markdown must be wrapped at 120 columns.
Tables in Markdown must not be wrapped, and headings must remain unwrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

  • docs/ddlint-design-and-road-map.md

⚙️ CodeRabbit configuration file

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/ddlint-design-and-road-map.md
docs/**/*.{md,rs,hs,py}

📄 CodeRabbit inference engine (docs/differential-datalog-parser-syntax-spec-migration-plan.md)

Update all repository references from docs/haskell-parser-analysis.md to point to the normative DDlog syntax specification and new implementation notes

Files:

  • docs/ddlint-design-and-road-map.md
docs/**/*.md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

docs/**/*.md: Use British English based on Oxford English Dictionary locale en-GB, including: -ize suffixes in words like 'realize' and 'organization', -lyse suffixes in words like 'analyse' and 'paralyse', -our suffixes in words like 'colour' and 'behaviour', -re suffixes in words like 'centre' and 'calibre', double 'l' in words like 'cancelled' and 'counsellor', maintain 'e' in words like 'likeable' and 'liveable', -ogue suffixes in words like 'catalogue'
Keep United States (US) spelling when used in an API (e.g., 'color')
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation (e.g., 'Concordat Industries are expanding')
Write headings in sentence case
Use Markdown headings (#, ##, ###, etc.) in order without skipping levels
Follow markdownlint recommendations for markdown formatting
Always provide a language identifier for fenced code blocks; use 'plaintext' for non-code text
Use '-' as the first level bullet and renumber lists when items change
Prefer inline links using text or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row
Expand any uncommon acronym on first use (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation
Footnotes must be numbered in order of appearance in the document
Caption every table and every diagram in documentation
Include Mermaid diagrams where they add clarity to documentation
Use 'alt text' syntax for embedding figures and provide brief alt text describing the content in documentation
Add a short description before each Mermaid diagram in documentation so screen readers can understand it

Files:

  • docs/ddlint-design-and-road-map.md
docs/**/!(README).md

📄 CodeRabbit inference engine (docs/documentation-style-guide.md)

Avoid first and second person personal pronouns outside the README.md file

Files:

  • docs/ddlint-design-and-road-map.md
Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

Cargo.toml: Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
All crate versions specified in Cargo.toml must use SemVer-compatible caret requirements (e.g., some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) in Cargo.toml should only be used where a dependency must be locked to patch-level updates for a specific, documented reason.

Files:

  • Cargo.toml
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/tests/cst_integration.rs
  • src/parser/ast/rule.rs
  • src/parser/tests/mod.rs
  • src/parser/cst_builder/tree.rs
🧬 Code graph analysis (1)
src/parser/cst_builder/tree.rs (3)
src/parser/ast/root.rs (9)
  • imports (76-78)
  • relations (88-90)
  • indexes (94-96)
  • functions (100-102)
  • transformers (106-108)
  • rules (112-114)
  • kind (46-48)
  • from_green (38-42)
  • syntax (32-34)
src/parser/cst_builder/spans.rs (17)
  • imports (52-55)
  • imports (156-158)
  • typedefs (59-62)
  • typedefs (162-164)
  • relations (66-69)
  • relations (168-170)
  • indexes (73-76)
  • indexes (174-176)
  • functions (80-83)
  • functions (180-182)
  • transformers (87-90)
  • transformers (186-188)
  • rules (94-97)
  • rules (192-194)
  • expressions (101-104)
  • expressions (198-200)
  • builder (148-150)
src/parser/ast/rule.rs (1)
  • text_range_to_span (630-634)
🔍 Remote MCP Deepwiki

Relevant repository facts to check during review:

  • CST builder location & flow: build_green_tree in src/parser/cst_builder/tree.rs consumes tokens + ParsedSpans and uses SpanCursors to start/finish nodes while feeding rowan::GreenNodeBuilder — verify SpanCursors::new() is used there and that advance_and_start/finish integrate correctly.,

  • ParsedSpans organization: ParsedSpans exposes per-kind span slices (imports, typedefs, relations, indexes, functions, transformers, rules) used to build per-kind cursors; ensure SPAN_CURSOR_KINDS maps to those getters and SyntaxKind values and preserves ordering/overlap invariants.

  • SpanCursor API and usage points to inspect: SpanCursor methods used by build_green_tree are advance_to / start_if / finish_if (now wrapped by SpanCursors.advance_and_start and .finish) — verify loop replaced by SpanCursors methods and per-cursor kind field is used where expected.

  • SmallVec change & allocation rationale: smallvec = "1.13" added; SpanCursors now uses smallvec::SmallVec with inline capacity (SPAN_CURSOR_INLINE) and SmallVec::with_capacity(SPAN_CURSOR_KINDS.len()) to avoid heap allocation in common case — confirm inline capacity >= SPAN_CURSOR_KINDS.len() or that capacity choice is conservative.,

  • Tests added and their coverage: new integration test src/parser/tests/cst_integration.rs exercises all top-level categories (N_IMPORT_STMT, N_TYPE_DEF, N_RELATION_DECL, N_INDEX, N_FUNCTION, N_TRANSFORMER, N_RULE) and asserts exact counts plus round-trip pretty-print equality; unit test verifies build_green_tree span→node alignment — confirm tests reference the updated SpanCursors behavior and pass under CI expectations (cargo test, clippy -D warnings).,

  • Hot-path/perf risk to consider: build_green_tree is a hot path in parse() (used every parse); confirm SmallVec usage actually avoids heap allocations in typical cursor counts and review whether SpanCursors allocation per call remains acceptable or should reuse a buffer as an optimization.,

Files/places to inspect in code review:

  • src/parser/cst_builder/tree.rs (SpanCursors/SpanCursor, SPAN_CURSOR_KINDS, SPAN_CURSOR_INLINE, build_green_tree usage)
  • src/parser/cst_builder/spans.rs (ParsedSpans layout/validation)
  • Cargo.toml (smallvec dependency) and docs/ddlint-design-and-road-map.md (notes on smallvec rationale)
  • src/parser/tests/cst_integration.rs and updated test module (verify assertions mirror PR objectives)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (9)
docs/ddlint-design-and-road-map.md (2)

216-218: Documentation accurately describes the CST construction optimisation.

The semantic note clearly explains the span cursor list's use of a small, stack-backed buffer to avoid heap allocations, with appropriate citation to the smallvec crate.


1008-1010: Footnote citation correctly formatted and sequenced.

The new footnote [^16] for smallvec is properly numbered in order of appearance and includes the appropriate documentation URL.

src/parser/ast/rule.rs (1)

626-630: Visibility change and documentation are appropriate.

The pub(crate) visibility enables shared use of the CST-to-span conversion utility across parser and diagnostic modules, and the doc comment clearly explains the bridging function between rowan::TextRange and byte-offset Span.

src/parser/tests/mod.rs (1)

5-5: Test module correctly wired into the test suite.

The cst_integration module declaration is properly alphabetised and follows standard Rust test organisation conventions.

src/parser/cst_builder/tree.rs (5)

5-12: LGTM — clean type abstraction.

The SpanSliceGetter function pointer alias provides a clear contract for the cursor construction mapping.


13-23: LGTM — self-synchronizing capacity constant.

Deriving SPAN_CURSOR_INLINE from SPAN_CURSOR_KINDS.len() guarantees the inline capacity matches the cursor count, eliminating any possibility of silent heap spill due to drift.


58-83: LGTM — encapsulated cursor management with optimal allocation.

The SmallVec with inline capacity equal to the cursor count ensures zero heap allocations on the hot path. The advance_and_start and finish methods cleanly encapsulate the per-cursor iteration, eliminating the loop duplication flagged in earlier reviews.


173-177: LGTM — reuses shared helper as requested.

The import of text_range_to_span from crate::parser::ast::rule eliminates the duplication flagged in the previous review.


215-254: LGTM — comprehensive span-position validation.

The test exercises all eight span categories and verifies that the CST nodes produced by build_green_tree align precisely with the spans recorded by the parser. The use of concat!() for the multi-line source string follows the coding guidelines.

Comment thread src/parser/tests/cst_integration.rs
…ration tests

Replace inline node counting closures with centralized count_nodes_by_kind helper function for improved code reuse and clarity in tests.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
codescene-delta-analysis[bot]

This comment was marked as outdated.

…r signatures

Introduced a SourceText type to reduce reliance on raw string types in parser test helpers. Updated parse functions and related helpers to accept impl Into<SourceText>, improving code clarity and flexibility.

Also bumped smallvec dependency to 1.15.1 in Cargo.toml.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Fix all issues with AI Agents 🤖
In @src/parser/tests/helpers.rs:
- Around line 13-15: Add a doc comment above the SourceText struct explaining
that this test-only wrapper standardizes test input types and enables flexible
From conversions (so callers can pass &str, String, etc. without needing
explicit .to_string()), briefly describing its purpose to reduce string-heavy
helper signatures in tests and point to the From impls that make conversions
ergonomic.
- Around line 66-69: The helper function count_nodes_by_kind lacks
documentation; add a detailed doc comment above pub(super) fn
count_nodes_by_kind that explains it counts descendant nodes of a given
SyntaxKind in a SyntaxNode, notes that it includes the node's descendants (not
the node itself) and is intended for tests, and show a brief usage example
(small snippet using a SyntaxNode and SyntaxKind to call count_nodes_by_kind).
Mention the function signature (count_nodes_by_kind(node: &SyntaxNode, kind:
SyntaxKind) -> usize) and that it returns the count so other test modules can
quickly understand and reuse it.
- Around line 29-33: Remove the redundant impl From<&String> for SourceText that
clones the String; &String derefs to &str so the existing From<&str> covers it.
Delete the impl block (the impl From<&String> for SourceText and its fn from) to
avoid the unnecessary clone and rely on the existing From<&str> implementation.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c70b787 and d589c60.

📒 Files selected for processing (2)
  • Cargo.toml
  • src/parser/tests/helpers.rs
🧰 Additional context used
📓 Path-based instructions (2)
Cargo.toml

📄 CodeRabbit inference engine (AGENTS.md)

Cargo.toml: Use explicit version ranges in Cargo.toml and keep dependencies up-to-date.
All crate versions specified in Cargo.toml must use SemVer-compatible caret requirements (e.g., some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) in Cargo.toml should only be used where a dependency must be locked to patch-level updates for a specific, documented reason.

Files:

  • Cargo.toml
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/tests/helpers.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/tests/helpers.rs
🧬 Code graph analysis (1)
src/parser/tests/helpers.rs (2)
src/parser/token_stream.rs (1)
  • src (118-120)
src/parser/mod.rs (1)
  • parse (47-55)
🔍 Remote MCP Deepwiki

Relevant repo facts for reviewing PR #203 (SpanCursors → SmallVec, span-kind mapping, tests):

  • build_green_tree / CST flow: build_green_tree in src/parser/cst_builder/tree.rs constructs the GreenNodeBuilder and uses SpanCursors to start/finish nodes while iterating tokens; ParsedSpans are produced by parse_tokens and fed into CST construction.
  • ParsedSpans shape & validation: ParsedSpans contains per-kind span slices (imports, typedefs, relations, indexes, functions, transformers, rules); ParsedSpansBuilder validates lists sorted/non‑overlapping in debug builds.
  • Span cursor role & counts: CST builder uses per-kind cursors (one per top-level category) to trigger start/finish events for N_IMPORT_STMT, N_TYPE_DEF, N_RELATION_DECL, N_INDEX, N_FUNCTION, N_TRANSFORMER, N_RULE, and N_EXPR_NODE; SpanCursors previously used a fixed array and are the locus of this PR's change.
  • Hot-path / allocation concern: build_green_tree is invoked on every parse (hot path); design doc explicitly recommends small stack-backed buffers to avoid per-parse heap allocations — rationale for smallvec/inline capacity is documented.
  • Test strategy and expectations: repository emphasizes integration-first tests, round-trip (parse → pretty_print → == source), and span-accuracy checks; new integration test should be verifying per-kind node counts and pretty-print equality consistent with existing strategy.
  • Files to inspect in review (verify against PR changes): src/parser/cst_builder/tree.rs (SPAN_CURSOR_KINDS, SPAN_CURSOR_INLINE, SmallVec usage, SpanCursors methods), src/parser/cst_builder/spans.rs (existence and shape of getters used in mapping), Cargo.toml (smallvec dependency), src/parser/tests/cst_integration.rs and src/parser/tests/helpers.rs (integration assertions and helpers), and the new unit test in src/parser/cst_builder/tree.rs (build_green_tree_matches_span_positions).
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (4)
Cargo.toml (1)

17-17: smallvec 1.15.1 is the current stable version—no action required.

Verification confirms that 1.15.1 (released 2025-06-06) is the latest stable release in the 1.x series. The version format complies with the SemVer caret requirement specified in the coding guidelines, and no security advisories affect this release.

src/parser/tests/helpers.rs (3)

82-102: LGTM: parse_ok and parse_err conversions.

The conversion pattern (convert to SourceText at entry, then use .as_ref()) is clean and consistent. The Into<SourceText> bound provides the flexibility to accept &str, String, or SourceText directly.


106-110: LGTM: round_trip conversion.

The updated signature and conversion logic align with the other helpers. The use of .as_ref() for both the parse_ok call and the assertion is correct.


117-152: LGTM: parse_single_item and domain helper conversions.

All helpers now accept impl Into<SourceText> and delegate correctly. The pattern is uniform throughout, and the #[expect] attribute on parse_single_item is properly scoped with a clear reason.

Comment thread src/parser/tests/helpers.rs Outdated
Comment thread src/parser/tests/helpers.rs Outdated
Comment thread src/parser/tests/helpers.rs Outdated
…dd count_nodes_by_kind

- SourceText now supports From<&str> and From<String> for ergonomic conversions
- Removed From<&String> impl to reduce redundancy
- Added count_nodes_by_kind to count descendant nodes by SyntaxKind in tests
- Expanded documentation on SourceText and count_nodes_by_kind for clarity

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @src/parser/tests/helpers.rs:
- Around line 64-82: Update the doc comment for count_nodes_by_kind to remove
the incorrect parenthetical "(not `node` itself)" and accurately state that
node.descendants() includes the node itself; keep the rest of the description
and example intact and reference the behavior of SyntaxNode::descendants() in
the comment.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d589c60 and 428e3ed.

📒 Files selected for processing (1)
  • src/parser/tests/helpers.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/tests/helpers.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/tests/helpers.rs
🧬 Code graph analysis (1)
src/parser/tests/helpers.rs (5)
src/parser/expression/token_stream.rs (1)
  • src (106-108)
src/parser/token_stream.rs (1)
  • src (118-120)
src/parser/ast/root.rs (1)
  • kind (46-48)
src/parser/mod.rs (1)
  • parse (47-55)
src/test_util/assertions.rs (1)
  • assert_no_parse_errors (21-23)
🔍 Remote MCP Deepwiki

Summary of additional repository context relevant to reviewing PR #203

  • Parsing pipeline: parse() → tokenize() → parse_tokens() → build_green_tree(); build_green_tree uses SpanCursors to start/finish CST nodes based on ParsedSpans produced by span collection. (Core Parsing Pipeline, Parser Implementation Details)
  • ParsedSpans / span collection: ParsedSpans contains per-kind span slices (imports, typedefs, relations, indexes, functions, transformers, rules); collectors validate sorted/non‑overlap in debug builds. Span collectors use TokenStream and chumsky parsers to record spans. (Core Parsing Pipeline, Parser Implementation Details)
  • SpanCursor/SpanCursors role: SpanCursors manages one cursor per top-level category (imports, typedefs, relations, indexes, functions, transformers, rules, exprs) and drives advance_and_start / finish calls during token iteration in build_green_tree. The tree builder is a hot path invoked on every parse. (Core Parsing Pipeline, Parser Implementation Details)
  • Performance/design guidance: design docs recommend small stack-backed buffers to avoid per-parse heap allocations for CST construction; replacing a Vec with a SmallVec inline buffer is explicitly documented as an optimization. Benchmarks/reuse suggested if profiling shows hotspot. (Overview, Design Documentation)
  • Testing strategy and expectations: repo emphasizes integration-first tests, round-trip (parse → pretty_print == source), and span-accuracy checks. New integration test and unit test added in the PR align with these expectations. (Testing Strategy)
  • Files/components to inspect when reviewing the change:
    • src/parser/cst_builder/tree.rs — build_green_tree, SpanCursors/SpanCursor logic and SPAN_CURSOR_KINDS/constants
    • src/parser/cst_builder/spans.rs — ParsedSpans shape and builder validation
    • src/parser/tests/cst_integration.rs and src/parser/tests/helpers.rs — integration assertions and helper changes (SourceText, count_nodes_by_kind)
    • Cargo.toml — addition of smallvec dependency
      These are the precise hotspots for correctness, allocation, and test coverage verification. (Core Parsing Pipeline, Tokenization, Testing Strategy)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (2)
src/parser/tests/helpers.rs (2)

13-37: LGTM: SourceText wrapper is well-designed.

The wrapper standardises test input types effectively. The From<&str>, From<String>, and AsRef<str> implementations provide ergonomic conversions without redundant code, and the documentation clearly explains the purpose and usage.


95-165: LGTM: Parsing helpers refactored consistently.

The refactoring to accept impl Into<SourceText> is applied uniformly across all parsing helpers. The pattern (convert to SourceText, then as_ref()) is consistent and ergonomic, enabling callers to pass &str, String, or other string-like types without explicit conversions.

Comment thread src/parser/tests/helpers.rs
- Update `count_nodes_by_kind` doc to state count includes the node itself,
  reflecting actual behavior using `SyntaxNode::descendants()`.
- Simplify test calls in body_terms.rs by passing string slices directly to
  `parse_err` instead of references to strings.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos changed the title Refactor SpanCursors to use SmallVec for flexible CST construction Refactor SpanCursors to use SmallVec; add CST integration tests Jan 5, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @src/parser/tests/helpers.rs:
- Around line 73-79: The doc-example imports the test helper via an inaccessible
path; update the example to reference the helper's actual visibility by
replacing the broken import with a correct local import (e.g., use
super::count_nodes_by_kind;) or remove the import line entirely and call
count_nodes_by_kind from the current module context; adjust the code block
markers to the crate's doc style (no_run/~~~) if needed and keep the example
using the existing function name count_nodes_by_kind and
SyntaxKind::N_IMPORT_STMT.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 428e3ed and 59e65df.

📒 Files selected for processing (2)
  • src/parser/tests/helpers.rs
  • src/parser/tests/rules/body_terms.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider using Arc to reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings in Rust.
Handle errors with the Result type instead of panicking in Rust where feasible.
Avoid unsafe code in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not use return in single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow in Rust.
Use rstest fixtures for shared test setup in Rust.
Replace duplicated tests with #[rstest(…)] parameterized cases in Rust.
Prefer mockall for mocks/stubs in Rust.
Prefer .expect() over .unwrap() in Rust.
Use concat!() to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g., pub fn new(id: u64) -> Self { Self(id) } instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...

Files:

  • src/parser/tests/rules/body_terms.rs
  • src/parser/tests/helpers.rs

⚙️ CodeRabbit configuration file

**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.

  • Adhere to single responsibility and CQRS
  • Place function attributes after doc comments.
  • Do not use return in single-line functions.
  • Move conditionals with >2 branches into a predicate function.
  • Avoid unsafe unless absolutely necessary.
  • Every module must begin with a //! doc comment that explains the module's purpose and utility.
  • Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
  • Lints must not be silenced except as a last resort.
    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.
  • Replace duplicated tests with #[rstest(...)] parameterised cases.
  • Prefer mockall for mocks/stubs.
  • Prefer .expect() over .unwrap() in tests.
  • .expect() and .unwrap() are forbidden outside of tests. Errors must be propagated.
  • Ensure that any API or behavioural changes are reflected in the documentation in docs/
  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/
  • Files must not exceed 400 lines in length
    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
  • Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
    • For testing of functionality depending upon environment variables, dependency injection and...

Files:

  • src/parser/tests/rules/body_terms.rs
  • src/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.

Applied to files:

  • src/parser/tests/rules/body_terms.rs
  • src/parser/tests/helpers.rs
🧬 Code graph analysis (2)
src/parser/tests/rules/body_terms.rs (2)
src/parser/tests/helpers.rs (1)
  • parse_err (105-115)
src/parser/expression/token_stream.rs (1)
  • src (106-108)
src/parser/tests/helpers.rs (7)
src/parser/expression/token_stream.rs (1)
  • src (106-108)
src/parser/token_stream.rs (1)
  • src (118-120)
src/parser/ast/root.rs (1)
  • kind (46-48)
src/parser/mod.rs (1)
  • parse (47-55)
src/test_util/assertions.rs (1)
  • assert_no_parse_errors (21-23)
src/parser/cst_builder/mod.rs (1)
  • root (45-47)
src/parser/ast/mod.rs (1)
  • syntax (24-24)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Sourcery review
🔇 Additional comments (4)
src/parser/tests/rules/body_terms.rs (1)

84-84: LGTM!

The explicit conversion via .as_str() aligns with the new Into<SourceText> API whilst preserving src for subsequent assertions on span positions. The changes are consistent and correct.

Also applies to: 120-120

src/parser/tests/helpers.rs (3)

13-37: LGTM!

The SourceText wrapper follows the NewType pattern recommended in the guidelines, providing ergonomic From conversions and AsRef access. The documentation clearly explains its purpose in standardizing test input types.


95-123: LGTM!

The updated parse helpers consistently apply the Into<SourceText> pattern whilst preserving existing validation logic. The conversions are clear and correct.


130-165: LGTM!

The parse_single_item helper and its specialised variants correctly adopt the Into<SourceText> API. The guarded expect usage follows the guidelines for test-only code.

Comment thread src/parser/tests/helpers.rs
Corrected the example in the doc comment to use `super::count_nodes_by_kind` instead of an incorrect import path to improve clarity and accuracy.

Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
@leynos leynos linked an issue Jan 5, 2026 that may be closed by this pull request
@leynos leynos merged commit 1cef773 into main Jan 5, 2026
4 checks passed
@leynos leynos deleted the terragon/refactor-spancursors-to-vec-q01vp6 branch January 5, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace fixed-size SpanCursors with flexible Vec approach

1 participant