Refactor SpanCursors to use SmallVec; add CST integration tests#203
Conversation
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughRefactor CST span-cursor construction from a fixed-size array to a data-driven Changes
Sequence Diagram(s)(Skip; changes are internal refactor and test adjustments without a multi-actor sequential runtime flow that benefits from a diagram.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.rs📄 CodeRabbit inference engine (AGENTS.md)
Files:
⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (1)📚 Learning: 2025-12-26T13:23:35.435ZApplied to files:
🧬 Code graph analysis (1)src/parser/tests/helpers.rs (7)
🔍 Remote MCPBased on the comprehensive repository documentation gathered, here is a concise summary of relevant context for this pull request: Summary of Relevant ContextPurpose of SpanCursors and CST ConstructionSpanCursors manages iteration over collected statement spans and determines when to start and finish syntax nodes in the rowan::GreenNodeBuilder during the Concrete Syntax Tree (CST) construction phase. SpanCursors holds an array of individual SpanCursor instances, one for each type of DDlog statement (e.g., imports, typedefs, relations, functions). How the Refactoring Fits Into the ArchitectureThe PR refactors SpanCursors from a fixed-size array to a dynamic SmallVec-backed collection. This aligns with the existing architecture where the build_green_tree function takes a stream of tokens and ParsedSpans to create a lossless syntax tree, initializes SpanCursors with spans, iterates through each token in the token stream calling cursors.advance_and_start() and cursors.finish() to manage starting and finishing nodes based on token positions. ParsedSpans and Span Collection ContextThe ParsedSpans structure stores categorized byte ranges for different DDlog statements, collected during the "Span Analysis" phase by the parse_tokens function. The ParsedSpans builder ensures that span lists are sorted and non-overlapping in debug builds to prevent errors during CST construction. Seven Cursor CategoriesThe refactoring manages cursors for seven different statement types: imports, typedefs, relations, indexes, functions, transformers, and rules, which directly corresponds to the span categories tracked in ParsedSpans. SmallVec Performance MotivationThe use of SmallVec aligns with the documented architecture goal: reducing heap allocations during the hot path of CST construction by using stack-backed storage when the cursor count fits inline. ⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (3)
Comment |
Reviewer's GuideRefactors CST span handling to replace the fixed-size SpanCursors wrapper and SPAN_CURSOR_COUNT constant with a dynamically built Vec, and updates CST construction to iterate generically over that vector when advancing, starting, and finishing spans per token. Class diagram for dynamic SpanCursor vector-based CST constructionclassDiagram
class ParsedSpans {
+imports() [Span]
+typedefs() [Span]
+relations() [Span]
+indexes() [Span]
+functions() [Span]
+transformers() [Span]
+rules() [Span]
+expressions() [Span]
}
class SpanCursor~'a~ {
-iter PeekableIterSpan
-kind SyntaxKind
+new(spans_ref SpanSliceRef, kind SyntaxKind) SpanCursor~'a~
+advance_to(pos usize) void
+start_if(builder_ref GreenNodeBuilderRef, pos usize) void
+finish_if(builder_ref GreenNodeBuilderRef, pos usize) void
}
class GreenNodeBuilder {
+new() GreenNodeBuilder
+start_node(raw_kind RawSyntaxKind) void
+finish_node() void
}
class Functions {
+build_span_cursors(spans_ref ParsedSpansRef) VecSpanCursor
+build_green_tree(spans_ref ParsedSpansRef, tokens_ref TokenSliceRef, src_ref StrRef) void
}
ParsedSpans "1" --> "*" Span : provides_spans_for
ParsedSpans --> SpanCursor : used_to_construct
SpanCursor --> GreenNodeBuilder : controls_node_start
SpanCursor --> GreenNodeBuilder : controls_node_finish
Functions ..> ParsedSpans : parameter
Functions ..> SpanCursor : constructs_collection
Functions ..> GreenNodeBuilder : drives_cst_build
%% Relationships representing the new dynamic vector usage
class VecSpanCursor {
+len() usize
+iter_mut() SpanCursorIterator
}
Functions --> VecSpanCursor : returns
VecSpanCursor "*" --> "1" SpanCursor : contains
%% Removed structure (for contrast)
class SpanCursors~'a~ {
-cursors SpanCursorArray
}
SpanCursors~'a~ .. SpanCursor : was_fixed_array_of
SpanCursors~'a~ .. Functions : removed_in_refactor
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The per-token loops over
cursorsinbuild_green_treenow duplicate theadvance_to/start_ifandfinish_iflogic inline; consider extracting small helper functions (or a thin wrapper type aroundVec<SpanCursor<'_>>) to keep this logic encapsulated as it was withSpanCursorsand avoid repeating the loops. build_span_cursorscreates a newVecon everybuild_green_treecall; if CST construction is on a hot path, you might want to benchmark and consider alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations while keeping the dynamic span category list.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The per-token loops over `cursors` in `build_green_tree` now duplicate the `advance_to`/`start_if` and `finish_if` logic inline; consider extracting small helper functions (or a thin wrapper type around `Vec<SpanCursor<'_>>`) to keep this logic encapsulated as it was with `SpanCursors` and avoid repeating the loops.
- `build_span_cursors` creates a new `Vec` on every `build_green_tree` call; if CST construction is on a hot path, you might want to benchmark and consider alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations while keeping the dynamic span category list.
## Individual Comments
### Comment 1
<location> `src/parser/cst_builder/tree.rs:43` </location>
<code_context>
- cur.finish_if(builder, pos);
- }
- }
+fn build_span_cursors(spans: &ParsedSpans) -> Vec<SpanCursor<'_>> {
+ vec![
+ (spans.imports(), SyntaxKind::N_IMPORT_STMT),
</code_context>
<issue_to_address>
**issue (review_instructions):** Add both behavioural and unit tests to cover this change in how span cursors are constructed and iterated.
Add unit tests that exercise `build_green_tree` with representative `ParsedSpans` to verify that imports, typedefs, relations, indexes, functions, transformers, rules, and expressions still start and finish at the correct positions after introducing `build_span_cursors` and the new cursor iteration pattern. In addition, add behavioural tests at the integration level (e.g. parsing a Datalog program and validating the resulting CST) to ensure that the observable behaviour of CST construction has not regressed.
<details>
<summary>Review instructions:</summary>
**Path patterns:** `**/*`
**Instructions:**
- For any new feature or change to an existing feature, both behavioural *and* unit tests are required.
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@coderabbitai Have the following now been resolved? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 12147 @leynos: Yes, the issues from the code review have been resolved: ✅ Tests Requirement (Fully Resolved)Behavioural/Integration Tests:
Unit Tests:
✅ Code Duplication (Fully Resolved)The inline loop duplication has been eliminated.
The
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
src/parser/cst_builder/tree.rssrc/parser/tests/cst_integration.rssrc/parser/tests/mod.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/tests/cst_integration.rssrc/parser/tests/mod.rssrc/parser/cst_builder/tree.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/tests/cst_integration.rssrc/parser/tests/mod.rssrc/parser/cst_builder/tree.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/tests/cst_integration.rssrc/parser/tests/mod.rssrc/parser/cst_builder/tree.rs
🧬 Code graph analysis (2)
src/parser/tests/cst_integration.rs (4)
src/test_util/assertions.rs (1)
assert_no_parse_errors(21-23)src/parser/tests/helpers.rs (1)
pretty_print(19-36)src/parser/cst_builder/mod.rs (1)
root(45-47)src/parser/ast/root.rs (1)
kind(46-48)
src/parser/cst_builder/tree.rs (3)
src/parser/cst_builder/spans.rs (17)
imports(52-55)imports(156-158)typedefs(59-62)typedefs(162-164)relations(66-69)relations(168-170)indexes(73-76)indexes(174-176)functions(80-83)functions(180-182)transformers(87-90)transformers(186-188)rules(94-97)rules(192-194)expressions(101-104)expressions(198-200)builder(148-150)src/parser/span_scanner.rs (1)
parse_tokens(17-67)src/parser/ast/rule.rs (1)
text_range_to_span(626-630)
🔍 Remote MCP Deepwiki
Summary of Additional Context Found
Purpose and Function of SpanCursor in CST Construction:
SpanCursor is a utility struct used during the Concrete Syntax Tree (CST) construction phase to manage and track the start and end positions of different statement types within the token stream. Each SpanCursor instance is associated with a specific SyntaxKind (e.g., N_IMPORT_STMT, N_RELATION_DECL) and an iterator over a sorted list of Spans for that kind.
How SpanCursor Operations Work:
For each token during CST construction, the build_green_tree function calls cursors.advance_and_start(&mut builder, span.start), which iterates through all SpanCursors—advancing each past any spans that have already ended and starting new nodes when the token's start position matches a span's start position. After pushing the current token, cursors.finish(&mut builder, span.end) finishes nodes when the token's end position matches or exceeds the end of managed spans.
Supported Cursor Kinds:
The SPAN_CURSOR_KINDS mapping supports seven top-level statement types:
- N_IMPORT_STMT (imports), N_TYPE_DEF (typedefs), N_RELATION_DECL (relations), N_INDEX (indexes), N_FUNCTION (functions), N_TRANSFORMER (transformers), and N_RULE (Datalog rules)
Key Distinction:
SpanCollector is responsible for generating spans during the initial scan of the token stream, whereas SpanCursor uses these pre-collected spans to guide the rowan::GreenNodeBuilder in constructing the CST.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (5)
src/parser/tests/mod.rs (1)
5-5: LGTM!The new
cst_integrationmodule is correctly wired into the test suite, maintaining alphabetical ordering.src/parser/tests/cst_integration.rs (1)
1-42: LGTM!Solid integration test that exercises CST construction across all top-level statement categories. Good use of
concat!()for the multi-line source string, and the pretty-print round-trip assertion provides additional confidence that the CST structure is correct.src/parser/cst_builder/tree.rs (3)
10-21: LGTM!Clean declarative mapping that centralises cursor kind registration. This makes extending the CST builder with new span categories straightforward—add a getter to
ParsedSpansand a corresponding entry here.
61-67: LGTM!Pre-allocation with
Vec::with_capacityaddresses the allocation concern raised during review. The iterator-driven construction is clean and maintainable.
185-212: LGTM!Comprehensive unit test that validates span-to-node mapping for all eight cursor kinds. Good complement to the integration test, ensuring the cursor machinery produces CST nodes at the expected positions.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/parser/cst_builder/tree.rs (1)
169-173: Eliminate the duplicatetext_range_to_spanhelper.This helper duplicates the implementation at line 626 in
src/parser/ast/rule.rs. Make the originalpub(crate), then import and reuse it here.Based on past review comments flagging this duplication.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
Cargo.tomldocs/ddlint-design-and-road-map.mdsrc/parser/cst_builder/tree.rs
🧰 Additional context used
📓 Path-based instructions (6)
Cargo.toml
📄 CodeRabbit inference engine (AGENTS.md)
Cargo.toml: Use explicit version ranges inCargo.tomland keep dependencies up-to-date.
All crate versions specified inCargo.tomlmust use SemVer-compatible caret requirements (e.g.,some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) inCargo.tomlshould only be used where a dependency must be locked to patch-level updates for a specific, documented reason.
Files:
Cargo.toml
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
**/*.md: Markdown paragraphs must be wrapped at 80 columns, and bullet points need the same limit.
Code blocks in Markdown must be wrapped at 120 columns.
Tables in Markdown must not be wrapped, and headings must remain unwrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.
Files:
docs/ddlint-design-and-road-map.md
⚙️ CodeRabbit configuration file
**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")
- Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Headings must not be wrapped.
- Documents must start with a level 1 heading
- Headings must correctly increase or decrease by no more than one level at a time
- Use GitHub-flavoured Markdown style for footnotes and endnotes.
- Numbered footnotes must be numbered by order of appearance in the document.
Files:
docs/ddlint-design-and-road-map.md
docs/**/*.{md,rs,hs,py}
📄 CodeRabbit inference engine (docs/differential-datalog-parser-syntax-spec-migration-plan.md)
Update all repository references from
docs/haskell-parser-analysis.mdto point to the normative DDlog syntax specification and new implementation notes
Files:
docs/ddlint-design-and-road-map.md
docs/**/*.md
📄 CodeRabbit inference engine (docs/documentation-style-guide.md)
docs/**/*.md: Use British English based on Oxford English Dictionary locale en-GB, including: -ize suffixes in words like 'realize' and 'organization', -lyse suffixes in words like 'analyse' and 'paralyse', -our suffixes in words like 'colour' and 'behaviour', -re suffixes in words like 'centre' and 'calibre', double 'l' in words like 'cancelled' and 'counsellor', maintain 'e' in words like 'likeable' and 'liveable', -ogue suffixes in words like 'catalogue'
Keep United States (US) spelling when used in an API (e.g., 'color')
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation (e.g., 'Concordat Industries are expanding')
Write headings in sentence case
Use Markdown headings (#, ##, ###, etc.) in order without skipping levels
Follow markdownlint recommendations for markdown formatting
Always provide a language identifier for fenced code blocks; use 'plaintext' for non-code text
Use '-' as the first level bullet and renumber lists when items change
Prefer inline links using text or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row
Expand any uncommon acronym on first use (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation
Footnotes must be numbered in order of appearance in the document
Caption every table and every diagram in documentation
Include Mermaid diagrams where they add clarity to documentation
Use '' syntax for embedding figures and provide brief alt text describing the content in documentation
Add a short description before each Mermaid diagram in documentation so screen readers can understand it
Files:
docs/ddlint-design-and-road-map.md
docs/**/!(README).md
📄 CodeRabbit inference engine (docs/documentation-style-guide.md)
Avoid first and second person personal pronouns outside the README.md file
Files:
docs/ddlint-design-and-road-map.md
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/cst_builder/tree.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/cst_builder/tree.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/cst_builder/tree.rs
🧬 Code graph analysis (1)
src/parser/cst_builder/tree.rs (2)
src/parser/cst_builder/spans.rs (17)
imports(52-55)imports(156-158)typedefs(59-62)typedefs(162-164)relations(66-69)relations(168-170)indexes(73-76)indexes(174-176)functions(80-83)functions(180-182)transformers(87-90)transformers(186-188)rules(94-97)rules(192-194)expressions(101-104)expressions(198-200)builder(148-150)src/parser/span_scanner.rs (1)
parse_tokens(17-67)
🔍 Remote MCP Deepwiki
Relevant findings for reviewing PR #203 (SpanCursors → dynamic Vec / SmallVec):
-
Code locations and flow impacted
- CST construction uses build_green_tree in src/parser/cst_builder/tree.rs; SpanCursors/SpanCursor coordinate start/finish events driven by ParsedSpans from span scanner → affects hot path in CST builder.,
-
Tests added / coverage
- New integration test module: src/parser/tests/cst_integration.rs validating one top-level node per category (import, typedef, relation, index, function, transformer, rule) and expression node counts; pretty-print round-trip asserted. Unit test build_green_tree_matches_span_positions added to tree.rs to assert span→node mapping across all kinds. These exercise behavior changes end-to-end.
-
SmallVec change and rationale
- PR (final) replaces Vec allocation with SmallVec (stack-backed, capacity 8) for the span-cursors collection to avoid heap allocations in the common case; Cargo.toml updated to include smallvec and docs/ddlint-design-and-road-map.md annotated accordingly. This targets the reviewer concern about per-call allocations on a hot path.
-
Performance / API considerations to verify in review
- build_green_tree remains a hot loop: ensure SmallVec capacity (8) covers typical number of cursor kinds and that Vec/SmallVec construction does not regress allocation behavior under pathological inputs.
- Confirm SpanCursors wrapper now exposes advance_and_start() and finish() (encapsulating earlier duplicated loops) so build_green_tree no longer duplicates per-token loops. Verify no behavioral regressions in start/finish ordering across cursor kinds.
-
Testing strategy / CI expectations
- Repository uses integration-first testing and round-trip pretty-print assertions; added tests align with existing strategy (rstest fixtures, round-trip checks). Confirm CI runs clippy/format/tests with warnings denied; newly added dependency and tests should pass those gates.
Files / areas to inspect closely in review
- src/parser/cst_builder/tree.rs — correctness of SpanCursor kind tracking, start/finish ordering, and absence of off-by-one span handling.
- src/parser/cst_builder/spans.rs and span-scanner collectors — ensure SPAN_CURSOR_KINDS mapping matches ParsedSpans fields and ordering assumptions.
- Tests: src/parser/tests/cst_integration.rs and unit test in tree.rs — verify they cover edge cases (overlapping spans, empty lists).
- Cargo.toml & docs/ddlint-design-and-road-map.md — dependency addition and documentation accuracy for SmallVec rationale.
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (2)
src/parser/cst_builder/tree.rs (1)
187-214: Excellent comprehensive test coverage.This test exercises
build_green_treeacross all eight top-level span kinds and validates that CST node positions match the parsed spans. The use ofconcat!for the multi-line source string follows the coding guidelines.docs/ddlint-design-and-road-map.md (1)
216-218: Well-documented optimisation with proper citation.The addition clearly explains the SmallVec optimisation in the CST construction pipeline, uses correct en-GB-oxendict spelling ("optimization" with -ize), wraps at 80 columns, and includes a properly ordered footnote citation. This aligns with the code changes in
tree.rs.Also applies to: 1009-1010
|
@coderabbitai Have the following now been resolved? The code now uses Vec::with_capacity(SPAN_CURSOR_KINDS.len()) to pre-allocate the vector, reducing heap allocations. The original suggestion to benchmark and potentially reuse buffers was a performance optimisation "consider" rather than a requirement. The current implementation with pre-allocation is a reasonable improvement unless profiling reveals CST construction as a bottleneck. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 1246 @leynos: Yes, this has now been fully resolved ✅ The code has been updated from
SmallVec is a stack-backed vector that avoids heap allocations when the collection fits within the inline capacity ( This fully addresses the original concern about "alternatives like reusing a preallocated buffer or a small-vector type to reduce heap allocations." |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parser/ast/rule.rs (1)
626-630: Add documentation for the internal API.The visibility change appropriately eliminates duplication in tests. However, expose the function's purpose and contract with a doc comment.
🔎 Proposed documentation
+/// Convert a Rowan text range to a byte-offset span. +/// +/// This helper bridges the CST layer (which uses `rowan::TextRange`) and +/// the parser/diagnostic layer (which uses byte-offset `Span`s). pub(crate) fn text_range_to_span(range: rowan::TextRange) -> Span { let start: usize = range.start().into(); let end: usize = range.end().into(); start..end }As per coding guidelines, internal APIs visible across modules should carry doc comments for maintainability.
♻️ Duplicate comments (1)
src/parser/cst_builder/tree.rs (1)
12-12: Add compile-time synchronization betweenSPAN_CURSOR_INLINEandSPAN_CURSOR_KINDS.This was flagged in a previous review and remains unaddressed. Set
SPAN_CURSOR_INLINEtoSPAN_CURSOR_KINDS.len()directly:-const SPAN_CURSOR_INLINE: usize = 8; +const SPAN_CURSOR_INLINE: usize = SPAN_CURSOR_KINDS.len();Alternatively, add a compile-time assertion to prevent silent heap spills if new span kinds are added.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
src/parser/ast/rule.rssrc/parser/cst_builder/tree.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/ast/rule.rssrc/parser/cst_builder/tree.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/ast/rule.rssrc/parser/cst_builder/tree.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/ast/rule.rssrc/parser/cst_builder/tree.rs
🧬 Code graph analysis (1)
src/parser/cst_builder/tree.rs (5)
src/parser/ast/root.rs (7)
imports(76-78)relations(88-90)indexes(94-96)functions(100-102)transformers(106-108)rules(112-114)syntax(32-34)src/parser/cst_builder/spans.rs (17)
imports(52-55)imports(156-158)typedefs(59-62)typedefs(162-164)relations(66-69)relations(168-170)indexes(73-76)indexes(174-176)functions(80-83)functions(180-182)transformers(87-90)transformers(186-188)rules(94-97)rules(192-194)expressions(101-104)expressions(198-200)builder(148-150)src/parser/ast/rule.rs (1)
text_range_to_span(626-630)src/parser/span_scanner.rs (1)
parse_tokens(17-67)src/parser/ast/mod.rs (1)
syntax(24-24)
🔍 Remote MCP Deepwiki
Relevant facts discovered (concise)
-
Core CST builder location and span-cursor flow: build_green_tree uses SpanCursors to drive start/finish events from ParsedSpans; SpanCursors/SpanCursor logic lives in src/parser/cst_builder/tree.rs and parsed spans are built in span_scanner.rs.
-
PR aligns with documented pipeline: parse() → parse_tokens() (span collection) → build_green_tree() (CST construction) → ast::Root::from_green(). Changes to SpanCursors affect a hot path in build_green_tree.
-
SmallVec rationale documented: docs/ddlint-design-and-road-map.md notes using a small, stack-backed buffer (smallvec) to avoid heap allocations in the common case; Cargo.toml updated to add smallvec = "1.13". Reviewer allocation concern was addressed by switching to SmallVec with inline capacity.
-
Tests added and testing strategy: new integration test exercises all top-level categories and round-trip pretty-printing; unit test verifies span→node mappings. Repo CI enforces strict quality gates (cargo test, clippy with -D warnings, rustfmt); tests must pass under those constraints.
-
Areas to focus in review (from docs): verify SpanCursor kind tracking, start/finish ordering, absence of off-by-one span handling, that SmallVec inline capacity covers typical cursor count, and that encapsulation/duplication concerns were resolved (SpanCursors wrapper exposing advance_and_start/finish).
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (3)
src/parser/cst_builder/tree.rs (3)
14-23: Well-structured data-driven mapping.The
SPAN_CURSOR_KINDSarray cleanly maps span getters to their correspondingSyntaxKindvalues, enabling generic cursor construction. This is a clear improvement over the previous hard-coded approach.
58-83: Clean encapsulation of cursor operations.The
SpanCursorswrapper withadvance_and_start()andfinish()methods properly encapsulates the iteration logic, addressing the earlier duplication concerns. TheSmallVecconstruction with pre-allocated capacity is appropriate.
182-209: Thorough span position verification test.The test exercises all eight span kinds with a representative input covering imports, typedefs, relations, indexes, functions, transformers, rules, and expressions. The use of
concat!()for the multi-line string follows coding guidelines.
Refactored the CST builder code by removing the SpanCursors struct and replacing it with a function that builds a Vec<SpanCursor> dynamically. This reduces complexity and improves code readability. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Changed the build_span_cursors function to take a borrowed ParsedSpans without lifetime annotation and return a Vec with elided lifetime, improving code clarity and conciseness. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…clarity - Introduced SpanCursors struct to encapsulate management of multiple SpanCursor instances. - Replaced build_span_cursors function with SpanCursors::new constructor. - Moved advance and finish logic into SpanCursors methods for cleaner iteration. - Added comprehensive tests to validate CST node spans against parsed spans. - Added integration test verifying CST construction for all top-level statement types. This refactor improves code clarity and maintainability by better encapsulating cursor state and behavior during CST construction. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Added the `smallvec` dependency to Cargo.toml and replaced the `Vec` of `SpanCursor`s with a `SmallVec` backed by a stack-allocated buffer of size 8 in the CST builder. This reduces heap allocations in the common case during concrete syntax tree construction, improving performance and memory usage. Also updated documentation to reference the `smallvec` crate and explain the optimization. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Made the `text_range_to_span` function public in `rule.rs` and removed its duplicate local definition from the tests in `tree.rs` to improve code reuse and clarity. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
958b202 to
7c61476
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @src/parser/tests/cst_integration.rs:
- Around line 1-42: The test parse_builds_cst_for_all_top_level_categories
correctly validates CST node counts and pretty-printing but can be simplified by
extracting the count_nodes closure into a reusable helper; move the closure
logic into a new helper function (e.g., count_nodes_by_kind) in the tests
helpers module and update this test to call that helper instead of the inline
closure, keeping references to SyntaxKind variants (N_IMPORT_STMT, N_TYPE_DEF,
N_RELATION_DECL, N_INDEX, N_FUNCTION, N_TRANSFORMER, N_RULE, N_EXPR_NODE) and
existing helpers::pretty_print and parse so behavior and assertions remain
unchanged.
♻️ Duplicate comments (1)
Cargo.toml (1)
17-17: Upgrade smallvec to the latest stable release.This concern has already been raised in a previous review comment. The dependency version 1.13 is outdated; upgrade to 1.15.1 as recommended.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
Cargo.tomldocs/ddlint-design-and-road-map.mdsrc/parser/ast/rule.rssrc/parser/cst_builder/tree.rssrc/parser/tests/cst_integration.rssrc/parser/tests/mod.rs
🧰 Additional context used
📓 Path-based instructions (6)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/tests/cst_integration.rssrc/parser/ast/rule.rssrc/parser/tests/mod.rssrc/parser/cst_builder/tree.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/tests/cst_integration.rssrc/parser/ast/rule.rssrc/parser/tests/mod.rssrc/parser/cst_builder/tree.rs
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
**/*.md: Markdown paragraphs must be wrapped at 80 columns, and bullet points need the same limit.
Code blocks in Markdown must be wrapped at 120 columns.
Tables in Markdown must not be wrapped, and headings must remain unwrapped.
Use dashes (-) for list bullets in Markdown.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.
Files:
docs/ddlint-design-and-road-map.md
⚙️ CodeRabbit configuration file
**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")
- Use en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Headings must not be wrapped.
- Documents must start with a level 1 heading
- Headings must correctly increase or decrease by no more than one level at a time
- Use GitHub-flavoured Markdown style for footnotes and endnotes.
- Numbered footnotes must be numbered by order of appearance in the document.
Files:
docs/ddlint-design-and-road-map.md
docs/**/*.{md,rs,hs,py}
📄 CodeRabbit inference engine (docs/differential-datalog-parser-syntax-spec-migration-plan.md)
Update all repository references from
docs/haskell-parser-analysis.mdto point to the normative DDlog syntax specification and new implementation notes
Files:
docs/ddlint-design-and-road-map.md
docs/**/*.md
📄 CodeRabbit inference engine (docs/documentation-style-guide.md)
docs/**/*.md: Use British English based on Oxford English Dictionary locale en-GB, including: -ize suffixes in words like 'realize' and 'organization', -lyse suffixes in words like 'analyse' and 'paralyse', -our suffixes in words like 'colour' and 'behaviour', -re suffixes in words like 'centre' and 'calibre', double 'l' in words like 'cancelled' and 'counsellor', maintain 'e' in words like 'likeable' and 'liveable', -ogue suffixes in words like 'catalogue'
Keep United States (US) spelling when used in an API (e.g., 'color')
Use the Oxford comma in documentation: 'ships, planes, and hovercraft' where it aids comprehension
Treat company names as collective nouns in documentation (e.g., 'Concordat Industries are expanding')
Write headings in sentence case
Use Markdown headings (#, ##, ###, etc.) in order without skipping levels
Follow markdownlint recommendations for markdown formatting
Always provide a language identifier for fenced code blocks; use 'plaintext' for non-code text
Use '-' as the first level bullet and renumber lists when items change
Prefer inline links using text or angle brackets around the URL in documentation
Ensure blank lines before and after bulleted lists and fenced blocks in documentation
Ensure tables have a delimiter line below the header row
Expand any uncommon acronym on first use (e.g., Continuous Integration (CI))
Wrap paragraphs at 80 columns in documentation
Wrap code at 120 columns in documentation
Do not wrap tables in documentation
Use GitHub-flavoured numeric footnotes referenced as [^1] in documentation
Footnotes must be numbered in order of appearance in the document
Caption every table and every diagram in documentation
Include Mermaid diagrams where they add clarity to documentation
Use '' syntax for embedding figures and provide brief alt text describing the content in documentation
Add a short description before each Mermaid diagram in documentation so screen readers can understand it
Files:
docs/ddlint-design-and-road-map.md
docs/**/!(README).md
📄 CodeRabbit inference engine (docs/documentation-style-guide.md)
Avoid first and second person personal pronouns outside the README.md file
Files:
docs/ddlint-design-and-road-map.md
Cargo.toml
📄 CodeRabbit inference engine (AGENTS.md)
Cargo.toml: Use explicit version ranges inCargo.tomland keep dependencies up-to-date.
All crate versions specified inCargo.tomlmust use SemVer-compatible caret requirements (e.g.,some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) inCargo.tomlshould only be used where a dependency must be locked to patch-level updates for a specific, documented reason.
Files:
Cargo.toml
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/tests/cst_integration.rssrc/parser/ast/rule.rssrc/parser/tests/mod.rssrc/parser/cst_builder/tree.rs
🧬 Code graph analysis (1)
src/parser/cst_builder/tree.rs (3)
src/parser/ast/root.rs (9)
imports(76-78)relations(88-90)indexes(94-96)functions(100-102)transformers(106-108)rules(112-114)kind(46-48)from_green(38-42)syntax(32-34)src/parser/cst_builder/spans.rs (17)
imports(52-55)imports(156-158)typedefs(59-62)typedefs(162-164)relations(66-69)relations(168-170)indexes(73-76)indexes(174-176)functions(80-83)functions(180-182)transformers(87-90)transformers(186-188)rules(94-97)rules(192-194)expressions(101-104)expressions(198-200)builder(148-150)src/parser/ast/rule.rs (1)
text_range_to_span(630-634)
🔍 Remote MCP Deepwiki
Relevant repository facts to check during review:
-
CST builder location & flow: build_green_tree in src/parser/cst_builder/tree.rs consumes tokens + ParsedSpans and uses SpanCursors to start/finish nodes while feeding rowan::GreenNodeBuilder — verify SpanCursors::new() is used there and that advance_and_start/finish integrate correctly.,
-
ParsedSpans organization: ParsedSpans exposes per-kind span slices (imports, typedefs, relations, indexes, functions, transformers, rules) used to build per-kind cursors; ensure SPAN_CURSOR_KINDS maps to those getters and SyntaxKind values and preserves ordering/overlap invariants.
-
SpanCursor API and usage points to inspect: SpanCursor methods used by build_green_tree are advance_to / start_if / finish_if (now wrapped by SpanCursors.advance_and_start and .finish) — verify loop replaced by SpanCursors methods and per-cursor kind field is used where expected.
-
SmallVec change & allocation rationale: smallvec = "1.13" added; SpanCursors now uses smallvec::SmallVec with inline capacity (SPAN_CURSOR_INLINE) and SmallVec::with_capacity(SPAN_CURSOR_KINDS.len()) to avoid heap allocation in common case — confirm inline capacity >= SPAN_CURSOR_KINDS.len() or that capacity choice is conservative.,
-
Tests added and their coverage: new integration test src/parser/tests/cst_integration.rs exercises all top-level categories (N_IMPORT_STMT, N_TYPE_DEF, N_RELATION_DECL, N_INDEX, N_FUNCTION, N_TRANSFORMER, N_RULE) and asserts exact counts plus round-trip pretty-print equality; unit test verifies build_green_tree span→node alignment — confirm tests reference the updated SpanCursors behavior and pass under CI expectations (cargo test, clippy -D warnings).,
-
Hot-path/perf risk to consider: build_green_tree is a hot path in parse() (used every parse); confirm SmallVec usage actually avoids heap allocations in typical cursor counts and review whether SpanCursors allocation per call remains acceptable or should reuse a buffer as an optimization.,
Files/places to inspect in code review:
- src/parser/cst_builder/tree.rs (SpanCursors/SpanCursor, SPAN_CURSOR_KINDS, SPAN_CURSOR_INLINE, build_green_tree usage)
- src/parser/cst_builder/spans.rs (ParsedSpans layout/validation)
- Cargo.toml (smallvec dependency) and docs/ddlint-design-and-road-map.md (notes on smallvec rationale)
- src/parser/tests/cst_integration.rs and updated test module (verify assertions mirror PR objectives)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (9)
docs/ddlint-design-and-road-map.md (2)
216-218: Documentation accurately describes the CST construction optimisation.The semantic note clearly explains the span cursor list's use of a small, stack-backed buffer to avoid heap allocations, with appropriate citation to the smallvec crate.
1008-1010: Footnote citation correctly formatted and sequenced.The new footnote [^16] for smallvec is properly numbered in order of appearance and includes the appropriate documentation URL.
src/parser/ast/rule.rs (1)
626-630: Visibility change and documentation are appropriate.The pub(crate) visibility enables shared use of the CST-to-span conversion utility across parser and diagnostic modules, and the doc comment clearly explains the bridging function between rowan::TextRange and byte-offset Span.
src/parser/tests/mod.rs (1)
5-5: Test module correctly wired into the test suite.The cst_integration module declaration is properly alphabetised and follows standard Rust test organisation conventions.
src/parser/cst_builder/tree.rs (5)
5-12: LGTM — clean type abstraction.The
SpanSliceGetterfunction pointer alias provides a clear contract for the cursor construction mapping.
13-23: LGTM — self-synchronizing capacity constant.Deriving
SPAN_CURSOR_INLINEfromSPAN_CURSOR_KINDS.len()guarantees the inline capacity matches the cursor count, eliminating any possibility of silent heap spill due to drift.
58-83: LGTM — encapsulated cursor management with optimal allocation.The
SmallVecwith inline capacity equal to the cursor count ensures zero heap allocations on the hot path. Theadvance_and_startandfinishmethods cleanly encapsulate the per-cursor iteration, eliminating the loop duplication flagged in earlier reviews.
173-177: LGTM — reuses shared helper as requested.The import of
text_range_to_spanfromcrate::parser::ast::ruleeliminates the duplication flagged in the previous review.
215-254: LGTM — comprehensive span-position validation.The test exercises all eight span categories and verifies that the CST nodes produced by
build_green_treealign precisely with the spans recorded by the parser. The use ofconcat!()for the multi-line source string follows the coding guidelines.
…ration tests Replace inline node counting closures with centralized count_nodes_by_kind helper function for improved code reuse and clarity in tests. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
…r signatures Introduced a SourceText type to reduce reliance on raw string types in parser test helpers. Updated parse functions and related helpers to accept impl Into<SourceText>, improving code clarity and flexibility. Also bumped smallvec dependency to 1.15.1 in Cargo.toml. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @src/parser/tests/helpers.rs:
- Around line 13-15: Add a doc comment above the SourceText struct explaining
that this test-only wrapper standardizes test input types and enables flexible
From conversions (so callers can pass &str, String, etc. without needing
explicit .to_string()), briefly describing its purpose to reduce string-heavy
helper signatures in tests and point to the From impls that make conversions
ergonomic.
- Around line 66-69: The helper function count_nodes_by_kind lacks
documentation; add a detailed doc comment above pub(super) fn
count_nodes_by_kind that explains it counts descendant nodes of a given
SyntaxKind in a SyntaxNode, notes that it includes the node's descendants (not
the node itself) and is intended for tests, and show a brief usage example
(small snippet using a SyntaxNode and SyntaxKind to call count_nodes_by_kind).
Mention the function signature (count_nodes_by_kind(node: &SyntaxNode, kind:
SyntaxKind) -> usize) and that it returns the count so other test modules can
quickly understand and reuse it.
- Around line 29-33: Remove the redundant impl From<&String> for SourceText that
clones the String; &String derefs to &str so the existing From<&str> covers it.
Delete the impl block (the impl From<&String> for SourceText and its fn from) to
avoid the unnecessary clone and rely on the existing From<&str> implementation.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
Cargo.tomlsrc/parser/tests/helpers.rs
🧰 Additional context used
📓 Path-based instructions (2)
Cargo.toml
📄 CodeRabbit inference engine (AGENTS.md)
Cargo.toml: Use explicit version ranges inCargo.tomland keep dependencies up-to-date.
All crate versions specified inCargo.tomlmust use SemVer-compatible caret requirements (e.g.,some-crate = "1.2.3"); wildcard (*) or open-ended inequality (>=) version requirements are strictly forbidden.
Tilde requirements (~) inCargo.tomlshould only be used where a dependency must be locked to patch-level updates for a specific, documented reason.
Files:
Cargo.toml
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/tests/helpers.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/tests/helpers.rs
🧬 Code graph analysis (1)
src/parser/tests/helpers.rs (2)
src/parser/token_stream.rs (1)
src(118-120)src/parser/mod.rs (1)
parse(47-55)
🔍 Remote MCP Deepwiki
Relevant repo facts for reviewing PR #203 (SpanCursors → SmallVec, span-kind mapping, tests):
- build_green_tree / CST flow: build_green_tree in src/parser/cst_builder/tree.rs constructs the GreenNodeBuilder and uses SpanCursors to start/finish nodes while iterating tokens; ParsedSpans are produced by parse_tokens and fed into CST construction.
- ParsedSpans shape & validation: ParsedSpans contains per-kind span slices (imports, typedefs, relations, indexes, functions, transformers, rules); ParsedSpansBuilder validates lists sorted/non‑overlapping in debug builds.
- Span cursor role & counts: CST builder uses per-kind cursors (one per top-level category) to trigger start/finish events for N_IMPORT_STMT, N_TYPE_DEF, N_RELATION_DECL, N_INDEX, N_FUNCTION, N_TRANSFORMER, N_RULE, and N_EXPR_NODE; SpanCursors previously used a fixed array and are the locus of this PR's change.
- Hot-path / allocation concern: build_green_tree is invoked on every parse (hot path); design doc explicitly recommends small stack-backed buffers to avoid per-parse heap allocations — rationale for smallvec/inline capacity is documented.
- Test strategy and expectations: repository emphasizes integration-first tests, round-trip (parse → pretty_print → == source), and span-accuracy checks; new integration test should be verifying per-kind node counts and pretty-print equality consistent with existing strategy.
- Files to inspect in review (verify against PR changes): src/parser/cst_builder/tree.rs (SPAN_CURSOR_KINDS, SPAN_CURSOR_INLINE, SmallVec usage, SpanCursors methods), src/parser/cst_builder/spans.rs (existence and shape of getters used in mapping), Cargo.toml (smallvec dependency), src/parser/tests/cst_integration.rs and src/parser/tests/helpers.rs (integration assertions and helpers), and the new unit test in src/parser/cst_builder/tree.rs (build_green_tree_matches_span_positions).
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (4)
Cargo.toml (1)
17-17: smallvec 1.15.1 is the current stable version—no action required.Verification confirms that 1.15.1 (released 2025-06-06) is the latest stable release in the 1.x series. The version format complies with the SemVer caret requirement specified in the coding guidelines, and no security advisories affect this release.
src/parser/tests/helpers.rs (3)
82-102: LGTM: parse_ok and parse_err conversions.The conversion pattern (convert to
SourceTextat entry, then use.as_ref()) is clean and consistent. TheInto<SourceText>bound provides the flexibility to accept&str,String, orSourceTextdirectly.
106-110: LGTM: round_trip conversion.The updated signature and conversion logic align with the other helpers. The use of
.as_ref()for both theparse_okcall and the assertion is correct.
117-152: LGTM: parse_single_item and domain helper conversions.All helpers now accept
impl Into<SourceText>and delegate correctly. The pattern is uniform throughout, and the#[expect]attribute onparse_single_itemis properly scoped with a clear reason.
…dd count_nodes_by_kind - SourceText now supports From<&str> and From<String> for ergonomic conversions - Removed From<&String> impl to reduce redundancy - Added count_nodes_by_kind to count descendant nodes by SyntaxKind in tests - Expanded documentation on SourceText and count_nodes_by_kind for clarity Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @src/parser/tests/helpers.rs:
- Around line 64-82: Update the doc comment for count_nodes_by_kind to remove
the incorrect parenthetical "(not `node` itself)" and accurately state that
node.descendants() includes the node itself; keep the rest of the description
and example intact and reference the behavior of SyntaxNode::descendants() in
the comment.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/parser/tests/helpers.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/tests/helpers.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/tests/helpers.rs
🧬 Code graph analysis (1)
src/parser/tests/helpers.rs (5)
src/parser/expression/token_stream.rs (1)
src(106-108)src/parser/token_stream.rs (1)
src(118-120)src/parser/ast/root.rs (1)
kind(46-48)src/parser/mod.rs (1)
parse(47-55)src/test_util/assertions.rs (1)
assert_no_parse_errors(21-23)
🔍 Remote MCP Deepwiki
Summary of additional repository context relevant to reviewing PR #203
- Parsing pipeline: parse() → tokenize() → parse_tokens() → build_green_tree(); build_green_tree uses SpanCursors to start/finish CST nodes based on ParsedSpans produced by span collection. (Core Parsing Pipeline, Parser Implementation Details)
- ParsedSpans / span collection: ParsedSpans contains per-kind span slices (imports, typedefs, relations, indexes, functions, transformers, rules); collectors validate sorted/non‑overlap in debug builds. Span collectors use TokenStream and chumsky parsers to record spans. (Core Parsing Pipeline, Parser Implementation Details)
- SpanCursor/SpanCursors role: SpanCursors manages one cursor per top-level category (imports, typedefs, relations, indexes, functions, transformers, rules, exprs) and drives advance_and_start / finish calls during token iteration in build_green_tree. The tree builder is a hot path invoked on every parse. (Core Parsing Pipeline, Parser Implementation Details)
- Performance/design guidance: design docs recommend small stack-backed buffers to avoid per-parse heap allocations for CST construction; replacing a Vec with a SmallVec inline buffer is explicitly documented as an optimization. Benchmarks/reuse suggested if profiling shows hotspot. (Overview, Design Documentation)
- Testing strategy and expectations: repo emphasizes integration-first tests, round-trip (parse → pretty_print == source), and span-accuracy checks. New integration test and unit test added in the PR align with these expectations. (Testing Strategy)
- Files/components to inspect when reviewing the change:
- src/parser/cst_builder/tree.rs — build_green_tree, SpanCursors/SpanCursor logic and SPAN_CURSOR_KINDS/constants
- src/parser/cst_builder/spans.rs — ParsedSpans shape and builder validation
- src/parser/tests/cst_integration.rs and src/parser/tests/helpers.rs — integration assertions and helper changes (SourceText, count_nodes_by_kind)
- Cargo.toml — addition of smallvec dependency
These are the precise hotspots for correctness, allocation, and test coverage verification. (Core Parsing Pipeline, Tokenization, Testing Strategy)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (2)
src/parser/tests/helpers.rs (2)
13-37: LGTM: SourceText wrapper is well-designed.The wrapper standardises test input types effectively. The
From<&str>,From<String>, andAsRef<str>implementations provide ergonomic conversions without redundant code, and the documentation clearly explains the purpose and usage.
95-165: LGTM: Parsing helpers refactored consistently.The refactoring to accept
impl Into<SourceText>is applied uniformly across all parsing helpers. The pattern (convert toSourceText, thenas_ref()) is consistent and ergonomic, enabling callers to pass&str,String, or other string-like types without explicit conversions.
- Update `count_nodes_by_kind` doc to state count includes the node itself, reflecting actual behavior using `SyntaxNode::descendants()`. - Simplify test calls in body_terms.rs by passing string slices directly to `parse_err` instead of references to strings. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @src/parser/tests/helpers.rs:
- Around line 73-79: The doc-example imports the test helper via an inaccessible
path; update the example to reference the helper's actual visibility by
replacing the broken import with a correct local import (e.g., use
super::count_nodes_by_kind;) or remove the import line entirely and call
count_nodes_by_kind from the current module context; adjust the code block
markers to the crate's doc style (no_run/~~~) if needed and keep the example
using the existing function name count_nodes_by_kind and
SyntaxKind::N_IMPORT_STMT.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
src/parser/tests/helpers.rssrc/parser/tests/rules/body_terms.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (AGENTS.md)
**/*.rs: Clippy warnings must be disallowed in Rust code.
Fix any warnings emitted during tests in the code itself rather than silencing them in Rust.
Where a Rust function is too long, extract meaningfully named helper functions adhering to separation of concerns and the Command Query Responsibility Segregation pattern.
Where a Rust function has too many parameters, group related parameters in meaningfully named structs.
Where a Rust function is returning a large error, consider usingArcto reduce the amount of data returned.
Write unit and scenario tests for new Rust functionality; run both before and after making any change.
Every Rust module must begin with a module level (//!) comment explaining the module's purpose and utility.
Document public APIs in Rust using Rustdoc comments (///), so documentation can be generated withcargo doc.
Prefer immutable data and avoid unnecessarymutbindings in Rust.
Handle errors with theResulttype instead of panicking in Rust where feasible.
Avoidunsafecode in Rust unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments in Rust.
Do not usereturnin single-line functions in Rust.
Use predicate functions for conditional criteria with more than two branches in Rust.
Lints must not be silenced in Rust except as a last resort; lint rule suppressions must be tightly scoped and include a clear reason.
Preferexpectoverallowin Rust.
Userstestfixtures for shared test setup in Rust.
Replace duplicated tests with#[rstest(…)]parameterized cases in Rust.
Prefermockallfor mocks/stubs in Rust.
Prefer.expect()over.unwrap()in Rust.
Useconcat!()to combine long string literals in Rust rather than escaping newlines with a backslash.
Prefer single-line versions of functions in Rust where appropriate (e.g.,pub fn new(id: u64) -> Self { Self(id) }instead of multi-line variants).
Use NewTypes in Rust to model domain values and eliminate "i...
Files:
src/parser/tests/rules/body_terms.rssrc/parser/tests/helpers.rs
⚙️ CodeRabbit configuration file
**/*.rs: * Seek to keep the cognitive complexity of functions no more than 9.
- Adhere to single responsibility and CQRS
- Place function attributes after doc comments.
- Do not use
returnin single-line functions.- Move conditionals with >2 branches into a predicate function.
- Avoid
unsafeunless absolutely necessary.- Every module must begin with a
//!doc comment that explains the module's purpose and utility.- Comments and docs must follow en-GB-oxendict (-ize / -yse / -our) spelling and grammar
- Lints must not be silenced except as a last resort.
#[allow]is forbidden.- Only narrowly scoped
#[expect(lint, reason = "...")]is allowed.- No lint groups, no blanket or file-wide suppression.
- Include
FIXME:with link if a fix is expected.- Where code is only used by specific features, it must be conditionally compiled or a conditional expectation for unused_code applied.
- Use
rstestfixtures for shared setup and to avoid repetition between tests.- Replace duplicated tests with
#[rstest(...)]parameterised cases.- Prefer
mockallfor mocks/stubs.- Prefer
.expect()over.unwrap()in tests..expect()and.unwrap()are forbidden outside of tests. Errors must be propagated.- Ensure that any API or behavioural changes are reflected in the documentation in
docs/- Ensure that any completed roadmap steps are recorded in the appropriate roadmap in
docs/- Files must not exceed 400 lines in length
- Large modules must be decomposed
- Long match statements or dispatch tables should be decomposed by domain and collocated with targets
- Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
- Environment access (env::set_var and env::remove_var) are always unsafe in Rust 2024 and MUST be marked as such
- For testing of functionality depending upon environment variables, dependency injection and...
Files:
src/parser/tests/rules/body_terms.rssrc/parser/tests/helpers.rs
🧠 Learnings (1)
📚 Learning: 2025-12-26T13:23:35.435Z
Learnt from: leynos
Repo: leynos/ddlint PR: 197
File: src/test_util/literals.rs:135-139
Timestamp: 2025-12-26T13:23:35.435Z
Learning: In Rust projects, rustfmt may reflow long simple constructor functions (e.g., pub fn lit_bool(b: bool) -> Expr { Expr::Literal(Literal::Bool(b)) }) to a multi-line form. Ensure you run and respect rustfmt formatting so CI formatting checks pass. Do not manually enforce non-default line breaks; format with rustfmt. Consider adding a rustfmt --check (or cargo fmt) step in CI and running cargo fmt before commits to avoid formatting-related failures.
Applied to files:
src/parser/tests/rules/body_terms.rssrc/parser/tests/helpers.rs
🧬 Code graph analysis (2)
src/parser/tests/rules/body_terms.rs (2)
src/parser/tests/helpers.rs (1)
parse_err(105-115)src/parser/expression/token_stream.rs (1)
src(106-108)
src/parser/tests/helpers.rs (7)
src/parser/expression/token_stream.rs (1)
src(106-108)src/parser/token_stream.rs (1)
src(118-120)src/parser/ast/root.rs (1)
kind(46-48)src/parser/mod.rs (1)
parse(47-55)src/test_util/assertions.rs (1)
assert_no_parse_errors(21-23)src/parser/cst_builder/mod.rs (1)
root(45-47)src/parser/ast/mod.rs (1)
syntax(24-24)
⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (4)
src/parser/tests/rules/body_terms.rs (1)
84-84: LGTM!The explicit conversion via
.as_str()aligns with the newInto<SourceText>API whilst preservingsrcfor subsequent assertions on span positions. The changes are consistent and correct.Also applies to: 120-120
src/parser/tests/helpers.rs (3)
13-37: LGTM!The
SourceTextwrapper follows the NewType pattern recommended in the guidelines, providing ergonomicFromconversions andAsRefaccess. The documentation clearly explains its purpose in standardizing test input types.
95-123: LGTM!The updated parse helpers consistently apply the
Into<SourceText>pattern whilst preserving existing validation logic. The conversions are clear and correct.
130-165: LGTM!The
parse_single_itemhelper and its specialised variants correctly adopt theInto<SourceText>API. The guardedexpectusage follows the guidelines for test-only code.
Corrected the example in the doc comment to use `super::count_nodes_by_kind` instead of an incorrect import path to improve clarity and accuracy. Co-authored-by: terragon-labs[bot] <terragon-labs[bot]@users.noreply.github.com>
Summary
Core CST Construction
Implementation Details
Tests
Integration Tests
Notes
📎 Task: https://www.terragonlabs.com/task/d0f7613f-b563-46d4-af14-6cb6a796f068