diff --git a/docs/ddlint-design.md b/docs/ddlint-design.md index 86a29518..80075318 100644 --- a/docs/ddlint-design.md +++ b/docs/ddlint-design.md @@ -447,7 +447,9 @@ The core contracts are: semantic failures without leaking internal implementation types. - Provenance: semantic nodes must retain source mappings back to the lossless syntax layer for rule heads, body terms, aggregations, attributes, and - adornments, so diagnostics and later rewrites remain explainable. + adornments, so diagnostics and later rewrites remain explainable. For symbol + records specifically, coarse declaration or literal provenance must stay + distinct from any precise identifier-token span used for diagnostics. - Semantic model completeness: `ddlog-sema` must expose typed structures for constructs that downstream planners care about, including joins, filters, projections, aggregations, recursion boundaries, and semantic facts such as @@ -734,14 +736,24 @@ genuine reads. `for`-loop pattern, and it treats a binding as used only when semantic resolution maps a `UseKind::Variable` site back to that exact symbol. The wildcard `_` remains an explicit ignore and is never recorded as a -warning-eligible binding. As of roadmap item `4.1.2`, rule-local binding spans -still point at the enclosing rule or literal site rather than the exact -identifier token, so diagnostics for these rules use that existing coarse -provenance. The semantic model deliberately stays on the `Span` side of that -boundary; conversion into diagnostic `rowan::TextRange` values happens in the -linter layer through one shared boundary helper in `src/linter/span_utils.rs`. -This keeps a single conversion point, rather than per-rule glue, for -correctness diagnostics. +warning-eligible binding. + +Each `Symbol` now carries two related provenance fields: + +- `span` remains the existing coarse provenance used for debugging, maps, and + ownership; it points at the enclosing declaration, rule, or body term that + introduced the symbol. +- `name_span` is an additive optional field that points at the precise + identifier token when semantic collection had CST access to capture it. + +Rules such as `unused-variable` should prefer `name_span` for diagnostic +highlighting and fall back to `span` only when precise token provenance is not +available, such as parse-time `SemanticRule` values that retain only lowered +expressions plus a coarse source span. The semantic model deliberately stays on +the `Span` side of that boundary; conversion into diagnostic `rowan::TextRange` +values happens in the linter layer through one shared boundary helper in +`src/linter/span_utils.rs`. This keeps a single conversion point, rather than +per-rule glue, for correctness diagnostics. Resolution is deliberately tri-state: @@ -763,8 +775,9 @@ body scope and do not escape to later top-level literals. This pass consumes both AST-backed rules and parse-time `SemanticRule` values from top-level `for` desugaring, so the semantic view matches the parser's -existing rule surface. Provenance is retained through source spans on -declarations and on the enclosing literal or rule site for rule-local facts. +existing rule surface. Provenance is retained through coarse source spans on +declarations and on the enclosing literal or rule site for rule-local facts, +with precise identifier-token spans captured additively when CST access exists. ### 3.3. Initial lint rule catalog @@ -775,16 +788,16 @@ suggestions, establishing a solid foundation of essential lints. Table: DDLint rule catalogue and metadata. -| Rule Name | Group | Default Level | Autofixable | Description | -| ---------------------- | ----------- | ------------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| unused-relation | correctness | warn | No | Detects declared relations with no resolved read-like uses in rule bodies, `for` iterables, or `for` guards; rule heads count only as writes. | -| unused-variable | correctness | warn | No | Detects rule-local bindings from rule heads, assignment patterns, or `for` patterns that never receive a resolved variable use in the same rule; `_` is an explicit ignore. | -| shadowed-variable | correctness | warn | No | Detects when a variable binding in a literal shadows one from a preceding literal in the same rule body. | -| recursive-negation | correctness | error | No | Detects rules with recursion through negation, which leads to unsafe, non-monotonic programs. | -| inefficient-join-order | performance | hint | No | Evaluates rule bodies and suggests reordering atoms for a more efficient join plan, e.g., placing more restrictive literals first. | -| superfluous-group-by | performance | warn | Yes | Detects group_by clauses where the aggregation is trivial (e.g., grouping by all variables) and can be removed. | -| consistent-casing | style | allow | Yes | Enforces a consistent casing style for relation and type identifiers (e.g., PascalCase) and variables (e.g., snake_case). | -| no-magic-numbers | style | allow | No | Flags the use of unnamed numeric literals in rule bodies where a named constant might be clearer. | +| Rule Name | Group | Default Level | Autofixable | Description | +| ---------------------- | ----------- | ------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| unused-relation | correctness | warn | No | Detects declared relations with no resolved read-like uses in rule bodies, `for` iterables, or `for` guards; rule heads count only as writes. | +| unused-variable | correctness | warn | No | Detects rule-local bindings from rule heads, assignment patterns, or `for` patterns that never receive a resolved variable use in the same rule; `_` is an explicit ignore, and diagnostics prefer precise identifier-token spans when available. | +| shadowed-variable | correctness | warn | No | Detects when a variable binding in a literal shadows one from a preceding literal in the same rule body. | +| recursive-negation | correctness | error | No | Detects rules with recursion through negation, which leads to unsafe, non-monotonic programs. | +| inefficient-join-order | performance | hint | No | Evaluates rule bodies and suggests reordering atoms for a more efficient join plan, e.g., placing more restrictive literals first. | +| superfluous-group-by | performance | warn | Yes | Detects group_by clauses where the aggregation is trivial (e.g., grouping by all variables) and can be removed. | +| consistent-casing | style | allow | Yes | Enforces a consistent casing style for relation and type identifiers (e.g., PascalCase) and variables (e.g., snake_case). | +| no-magic-numbers | style | allow | No | Flags the use of unnamed numeric literals in rule bodies where a named constant might be clearer. | This table serves as a concrete work breakdown for the engineering team and clearly communicates the linter's initial capabilities and priorities to early diff --git a/docs/parser-implementation-notes.md b/docs/parser-implementation-notes.md index b0448229..96e39d1d 100644 --- a/docs/parser-implementation-notes.md +++ b/docs/parser-implementation-notes.md @@ -213,6 +213,41 @@ Current guarantees are intentionally narrow: - Name resolution records `Resolved`, `Unresolved`, and `Ignored` outcomes rather than emitting diagnostics directly. +### Precise identifier span sourcing + +Precise diagnostic spans for declarations and rule-local bindings are captured +once, either during syntax collection or during semantic collection, rather +than by re-walking the CST in lint rules. + +Implementation points: + +- `src/parser/ast/mod.rs` exposes `find_identifier_span(...)` and + `find_identifier_span_in_range(...)`, which locate the first matching + `T_IDENT` token within a syntax subtree or span-bounded search window. +- `src/parser/ast/function.rs`, `src/parser/ast/relation.rs`, and + `src/parser/ast/type_def.rs` use that helper to expose `name_span()` for + top-level declaration identifiers. +- `src/parser/ast/rule_head.rs` computes `RuleHead::binding_spans` while + parsing head text. `collect_head_binding_spans(...)` tokenizes the head atom + slice, filters out structural identifiers such as relation callees and field + labels, and records only declaration-like binding identifiers with absolute + source spans. + +Semantic collection then threads those precise spans into the owned +`SemanticModel`: + +- `src/sema/builder.rs` records top-level declaration `name_span` values while + AST wrappers are already in hand. +- `src/sema/traverse.rs` consumes `RuleHead::binding_spans` for AST-backed rule + heads and threads optional CST `syntax` handles through + `collect_rule_term(...)`, `collect_assignment_term(...)`, and + `collect_for_loop_term(...)` so assignment and `for` pattern bindings can + resolve `SymbolSpec.name_span` during collection. +- Where CST handles are not preserved faithfully, semantic collection leaves + `SymbolSpec.name_span` unset by design. This includes synthetic + `SemanticRuleHead` bindings produced by top-level `for` lowering and nested + `for` body traversals that recurse with `syntax: None`. + Similarly, collection literal lowering is not part of the base `parse()` pipeline: diff --git a/src/linter/rules/correctness/unused_variable.rs b/src/linter/rules/correctness/unused_variable.rs index b7b9633c..904d7ac0 100644 --- a/src/linter/rules/correctness/unused_variable.rs +++ b/src/linter/rules/correctness/unused_variable.rs @@ -3,8 +3,8 @@ //! The rule operates over semantic `RuleBinding` symbols rather than raw CST //! name matching so shadowing, unresolved names, and wildcard ignores follow //! the semantic model's existing contracts. Diagnostics currently use the -//! binding symbol's stored span, which points at the enclosing rule or literal -//! rather than the exact identifier token. +//! binding symbol's precise identifier span when semantic collection captured +//! one, falling back to the existing coarse provenance otherwise. use crate::linter::{LintDiagnostic, Rule}; use crate::{SyntaxKind, declare_lint}; @@ -30,7 +30,8 @@ declare_lint! { continue; } - let Some(range) = crate::linter::span_utils::span_to_text_range(symbol.span()) else { + let span = symbol.name_span().unwrap_or_else(|| symbol.span()); + let Some(range) = crate::linter::span_utils::span_to_text_range(span) else { continue; }; @@ -69,6 +70,10 @@ mod tests { .map(crate::linter::LintDiagnostic::message), Some("variable `head_x` is defined but never used in this rule") ); + assert_eq!( + diagnostics.first().map(crate::linter::LintDiagnostic::span), + Some(rowan::TextRange::new(7.into(), 13.into())) + ); } #[rstest] diff --git a/src/parser/ast/function.rs b/src/parser/ast/function.rs index 6b0bec97..6c51dc2a 100644 --- a/src/parser/ast/function.rs +++ b/src/parser/ast/function.rs @@ -29,6 +29,13 @@ impl Function { }) } + /// Precise source span for the function name token, if present. + #[must_use] + pub fn name_span(&self) -> Option { + self.name() + .and_then(|name| super::find_identifier_span(&self.syntax, &name)) + } + /// Returns `true` if declared with the `extern` keyword. #[must_use] pub fn is_extern(&self) -> bool { @@ -75,7 +82,8 @@ impl_ast_node!(Function); #[cfg(test)] mod tests { - use crate::parse; + use crate::{parse, test_util::span_text}; + #[expect(clippy::expect_used, reason = "Using expect for clearer test failures")] #[test] fn function_name() { @@ -89,4 +97,25 @@ mod tests { .expect("function missing"); assert_eq!(func.name().as_deref(), Some("foo")); } + + #[test] + fn function_name_span_points_to_declaration_identifier() { + let source = "function project(project: u32): u32 { project }"; + let parsed = parse(source); + crate::test_util::assert_no_parse_errors(parsed.errors()); + #[expect(clippy::expect_used, reason = "Using expect for clearer test failures")] + let func = parsed + .root() + .functions() + .first() + .cloned() + .expect("function missing"); + + let span = func + .name_span() + .unwrap_or_else(|| panic!("missing function name_span in `{source}`")); + + assert_eq!(span_text(source, &span), "project"); + assert_eq!(span.start, "function ".len()); + } } diff --git a/src/parser/ast/mod.rs b/src/parser/ast/mod.rs index 85737e8f..f2001caa 100644 --- a/src/parser/ast/mod.rs +++ b/src/parser/ast/mod.rs @@ -9,10 +9,11 @@ //! tests and fixtures to assemble match expressions without dipping into //! private modules. -use rowan::SyntaxElement; +use rowan::{NodeOrToken, SyntaxElement, SyntaxNode}; use self::parse_utils::is_trivia; -use crate::{DdlogLanguage, SyntaxKind}; +use crate::parser::ast::rule::text_range_to_span; +use crate::{DdlogLanguage, Span, SyntaxKind, SyntaxToken}; /// Common interface for AST wrappers. pub(crate) trait AstNode { @@ -20,6 +21,78 @@ pub(crate) trait AstNode { fn syntax(&self) -> &rowan::SyntaxNode; } +/// Find the first identifier token with matching text within a syntax subtree. +#[must_use] +pub(crate) fn find_identifier_span(syntax: &SyntaxNode, name: &str) -> Option { + find_identifier_span_in_range(syntax, &text_range_to_span(syntax.text_range()), name) +} + +/// Find the first identifier token with matching text within a span-bounded subtree search. +#[must_use] +pub(crate) fn find_identifier_span_in_range( + syntax: &SyntaxNode, + search_span: &Span, + name: &str, +) -> Option { + let mut stack = reversed_children(syntax); + + while let Some(element) = stack.pop() { + match element { + NodeOrToken::Token(token) => { + if !token_overlaps_range(&token, search_span) || !is_matching_ident(&token, name) { + continue; + } + + let token_span = text_range_to_span(token.text_range()); + if span_contains(search_span, &token_span) { + return Some(token_span); + } + } + NodeOrToken::Node(node) => { + if !node_overlaps_range(&node, search_span) { + continue; + } + + stack.extend(reversed_children(&node)); + } + } + } + + None +} + +/// Collect child elements in reverse order so a stack preserves source order. +fn reversed_children(syntax: &SyntaxNode) -> Vec> { + let mut children: Vec<_> = syntax.children_with_tokens().collect(); + children.reverse(); + children +} + +/// Whether a token's byte span overlaps the search range at all. +fn token_overlaps_range(token: &SyntaxToken, range: &Span) -> bool { + ranges_overlap(range, &text_range_to_span(token.text_range())) +} + +/// Whether a node's byte span overlaps the search range at all. +fn node_overlaps_range(node: &SyntaxNode, range: &Span) -> bool { + ranges_overlap(range, &text_range_to_span(node.text_range())) +} + +/// Whether the token is the identifier text the caller is searching for. +fn is_matching_ident(token: &SyntaxToken, name: &str) -> bool { + token.kind() == SyntaxKind::T_IDENT && token.text() == name +} + +/// Whether two byte ranges intersect. +fn ranges_overlap(left: &Span, right: &Span) -> bool { + left.start < right.end && right.start < left.end +} + +/// Whether the candidate span sits entirely within the container span. +fn span_contains(container: &Span, candidate: &Span) -> bool { + container.start <= candidate.start && candidate.end <= container.end +} + macro_rules! impl_ast_node { ($ty:ty) => { impl AstNode for $ty { @@ -181,6 +254,7 @@ pub use type_def::TypeDef; #[cfg(test)] mod tests { + mod identifier_span; mod precedence; use crate::parse; diff --git a/src/parser/ast/relation.rs b/src/parser/ast/relation.rs index ccd19f04..c18a75e9 100644 --- a/src/parser/ast/relation.rs +++ b/src/parser/ast/relation.rs @@ -58,6 +58,13 @@ impl Relation { }) } + /// Precise source span for the relation name token, if present. + #[must_use] + pub fn name_span(&self) -> Option { + self.name() + .and_then(|name| super::find_identifier_span(&self.syntax, &name)) + } + /// Returns `true` if declared with the `input` keyword. #[must_use] pub fn is_input(&self) -> bool { @@ -136,7 +143,7 @@ impl_ast_node!(Relation); #[cfg(test)] mod tests { - use crate::parse; + use crate::{parse, test_util::span_text}; #[test] fn relation_name() { @@ -151,4 +158,25 @@ mod tests { assert_eq!(rel.name().as_deref(), Some("R")); assert!(rel.is_input()); } + + #[test] + fn relation_name_span_points_to_declaration_identifier() { + let source = "input relation Source(Source: u32)"; + let parsed = parse(source); + crate::test_util::assert_no_parse_errors(parsed.errors()); + #[expect(clippy::expect_used, reason = "Using expect for clearer test failures")] + let rel = parsed + .root() + .relations() + .first() + .cloned() + .expect("relation missing"); + + let span = rel + .name_span() + .unwrap_or_else(|| panic!("missing relation name_span in `{source}`")); + + assert_eq!(span_text(source, &span), "Source"); + assert_eq!(span.start, "input relation ".len()); + } } diff --git a/src/parser/ast/rule_head.rs b/src/parser/ast/rule_head.rs index 019b93e0..f74142e0 100644 --- a/src/parser/ast/rule_head.rs +++ b/src/parser/ast/rule_head.rs @@ -55,6 +55,7 @@ fn process_token_for_head_text( } } +/// Extract the first head segment text from a rule syntax node. pub(crate) fn first_head_text(syntax: &rowan::SyntaxNode) -> Option { use rowan::NodeOrToken; @@ -88,8 +89,13 @@ pub struct RuleHead { pub atom: Expr, /// Optional `@` location expression. pub location: Option, + /// Source span covering only the head atom. + pub span: Span, + /// Precise identifier-token spans for bindings introduced by the head atom. + pub binding_spans: Vec<(String, Span)>, } +/// Parse the comma-separated heads at the start of a rule. pub(crate) fn parse_rule_heads( rule_src: &str, base_offset: usize, @@ -162,16 +168,77 @@ fn parse_rule_head_span( segment_offset.saturating_add(location_start), )?); - return Ok(Some(RuleHead { atom, location })); + let binding_spans = collect_head_binding_spans(atom_src, segment_offset); + + return Ok(Some(RuleHead { + atom, + location, + span: segment_offset..segment_offset.saturating_add(at_span.start), + binding_spans, + })); } + let binding_spans = collect_head_binding_spans(trimmed, segment_offset); let atom = lower_by_ref_head(parse_expr_with_offset(trimmed, segment_offset)?); Ok(Some(RuleHead { atom, location: None, + span: segment_offset..segment_offset.saturating_add(trimmed.len()), + binding_spans, })) } +/// Collect unique binding identifier spans from a parsed head atom slice. +/// +/// The scanner keeps only declaration-like identifiers. Relation callees, +/// dotted access segments, constructor-like identifiers, and wildcards are +/// skipped so the returned spans can feed `RuleHead::binding_spans`. +fn collect_head_binding_spans(src: &str, base_offset: usize) -> Vec<(String, Span)> { + let tokens = tokenize_without_trivia(src); + let mut bindings = Vec::new(); + + for (index, (kind, span)) in tokens.iter().enumerate() { + if *kind != SyntaxKind::T_IDENT { + continue; + } + + let text = src.get(span.clone()).unwrap_or(""); + if should_skip_binding_ident(&tokens, index) || text == "_" { + continue; + } + + if bindings.iter().any(|(existing, _)| existing == text) { + continue; + } + + bindings.push(( + text.to_string(), + base_offset.saturating_add(span.start)..base_offset.saturating_add(span.end), + )); + } + + bindings +} + +/// Whether the identifier token at `index` is structural rather than a binding. +/// +/// Identifiers used as dotted members or immediately followed by `(`, `{`, or +/// `:` are treated as relation names, constructors, or field labels and are +/// therefore excluded from head-binding collection. +fn should_skip_binding_ident(tokens: &[(SyntaxKind, Span)], index: usize) -> bool { + let prev_kind = index + .checked_sub(1) + .and_then(|prev| tokens.get(prev)) + .map(|(kind, _)| *kind); + let next_kind = tokens.get(index + 1).map(|(kind, _)| *kind); + + prev_kind == Some(SyntaxKind::T_DOT) + || matches!( + next_kind, + Some(SyntaxKind::T_LPAREN | SyntaxKind::T_LBRACE | SyntaxKind::T_COLON) + ) +} + fn parse_expr_with_offset(src: &str, base_offset: usize) -> Result>> { let (start, end) = trim_byte_range(src); let trimmed = src.get(start..end).unwrap_or(""); @@ -303,3 +370,7 @@ fn parse_delay_suffix( let full = minus_span.start..tokens.last().map_or(minus_span.end, |(_, sp)| sp.end); Ok(Some(DelaySuffix { full, value })) } + +#[cfg(test)] +#[path = "rule_head/tests.rs"] +mod tests; diff --git a/src/parser/ast/rule_head/tests.rs b/src/parser/ast/rule_head/tests.rs new file mode 100644 index 00000000..a7853c92 --- /dev/null +++ b/src/parser/ast/rule_head/tests.rs @@ -0,0 +1,59 @@ +//! Tests for rule-head binding span helpers. + +use super::{collect_head_binding_spans, should_skip_binding_ident}; +use crate::{Span, SyntaxKind, tokenize_without_trivia}; + +fn ident_indices(tokens: &[(SyntaxKind, Span)]) -> Vec { + tokens + .iter() + .enumerate() + .filter_map(|(index, (kind, _))| (*kind == SyntaxKind::T_IDENT).then_some(index)) + .collect() +} + +#[test] +fn collect_head_binding_spans_skips_relation_callee_collision() { + let source = "Output(Output)"; + + let spans = collect_head_binding_spans(source, 0); + + assert_eq!(spans.as_slice(), &[("Output".to_string(), 7..13)]); +} + +#[test] +fn should_skip_binding_ident_rejects_callee_but_keeps_argument_binding() { + let tokens = tokenize_without_trivia("Output(Output)"); + let indices = ident_indices(&tokens); + let [callee_index, binding_index] = indices.as_slice() else { + panic!("expected two identifier tokens in `Output(Output)`"); + }; + + assert_eq!(indices.len(), 2); + assert!(should_skip_binding_ident(&tokens, *callee_index)); + assert!(!should_skip_binding_ident(&tokens, *binding_index)); +} + +#[test] +fn should_skip_binding_ident_rejects_dotted_and_constructor_positions() { + let tokens = tokenize_without_trivia("Output(record.field, Item { value: value })"); + let indices = ident_indices(&tokens); + let [ + callee_index, + record_index, + field_index, + constructor_index, + field_label_index, + binding_index, + ] = indices.as_slice() + else { + panic!("expected six identifier tokens in dotted/constructor sample"); + }; + + assert_eq!(indices.len(), 6); + assert!(should_skip_binding_ident(&tokens, *callee_index)); + assert!(!should_skip_binding_ident(&tokens, *record_index)); + assert!(should_skip_binding_ident(&tokens, *field_index)); + assert!(should_skip_binding_ident(&tokens, *constructor_index)); + assert!(should_skip_binding_ident(&tokens, *field_label_index)); + assert!(!should_skip_binding_ident(&tokens, *binding_index)); +} diff --git a/src/parser/ast/tests/identifier_span.rs b/src/parser/ast/tests/identifier_span.rs new file mode 100644 index 00000000..a23aa111 --- /dev/null +++ b/src/parser/ast/tests/identifier_span.rs @@ -0,0 +1,80 @@ +//! Tests for identifier-span discovery helpers. + +use super::super::{AstNode, find_identifier_span, find_identifier_span_in_range}; +use crate::{parse, test_util::span_text}; + +fn first_rule(source: &str) -> super::super::Rule { + parse(source) + .root() + .rules() + .into_iter() + .next() + .unwrap_or_else(|| panic!("expected a parsed rule in `{source}`")) +} + +fn required_offset(source: &str, needle: &str) -> usize { + source + .find(needle) + .unwrap_or_else(|| panic!("expected `{needle}` in `{source}`")) +} + +#[test] +fn identifier_search_picks_match_within_repeated_name_subtree() { + let source = "Output(foo) :- Source(foo), Other(foo)."; + let rule = first_rule(source); + let other_start = required_offset(source, "Other(foo)"); + let other_end = other_start + "Other(foo)".len(); + + let span = find_identifier_span_in_range(rule.syntax(), &(other_start..other_end), "foo") + .unwrap_or_else(|| panic!("expected foo within Other(foo) in `{source}`")); + + assert_eq!(span_text(source, &span), "foo"); + assert_eq!(span.start, other_start + "Other(".len()); +} + +#[test] +fn identifier_search_returns_none_for_non_overlapping_range() { + let source = "Output(foo) :- Source(foo), Other(foo)."; + let rule = first_rule(source); + let output_start = required_offset(source, "Output"); + let output_end = output_start + "Output".len(); + + assert_eq!( + find_identifier_span_in_range(rule.syntax(), &(output_start..output_end), "foo"), + None + ); +} + +#[test] +fn identifier_search_matches_exact_nested_token_range() { + let source = "Output(result) :- var assigned_x = Seed(result)."; + let rule = first_rule(source); + let assign_start = required_offset(source, "assigned_x"); + let assign_end = assign_start + "assigned_x".len(); + let token_span = assign_start..assign_end; + + let span = find_identifier_span_in_range(rule.syntax(), &token_span, "assigned_x") + .unwrap_or_else(|| panic!("expected exact nested token match in `{source}`")); + + assert_eq!(span, token_span); +} + +#[test] +fn identifier_search_returns_none_when_name_is_absent() { + let source = "Output(result) :- Source(result)."; + let rule = first_rule(source); + + assert_eq!(find_identifier_span(rule.syntax(), "missing_name"), None); +} + +#[test] +fn identifier_search_prefers_first_match_in_source_order() { + let source = "Output(foo) :- Nested(foo, foo)."; + let rule = first_rule(source); + + let span = find_identifier_span(rule.syntax(), "foo") + .unwrap_or_else(|| panic!("expected to find foo in `{source}`")); + + assert_eq!(span_text(source, &span), "foo"); + assert_eq!(span.start, required_offset(source, "foo")); +} diff --git a/src/parser/ast/type_def.rs b/src/parser/ast/type_def.rs index 2541850f..609b46fe 100644 --- a/src/parser/ast/type_def.rs +++ b/src/parser/ast/type_def.rs @@ -25,6 +25,13 @@ impl TypeDef { super::take_first_ident(iter) } + /// Precise source span for the type name token, if present. + #[must_use] + pub fn name_span(&self) -> Option { + self.name() + .and_then(|name| super::find_identifier_span(&self.syntax, &name)) + } + /// Whether this declaration is `extern`. #[must_use] pub fn is_extern(&self) -> bool { @@ -39,7 +46,7 @@ impl_ast_node!(TypeDef); #[cfg(test)] mod tests { - use crate::parse; + use crate::{parse, test_util::span_text}; #[test] fn extern_type_parsed() { @@ -70,4 +77,25 @@ mod tests { assert_eq!(td.name().as_deref(), Some("UserId")); assert!(!td.is_extern()); } + + #[test] + fn typedef_name_span_points_to_declaration_identifier() { + let source = "typedef UserId = UserId"; + let parsed = parse(source); + crate::test_util::assert_no_parse_errors(parsed.errors()); + #[expect(clippy::expect_used, reason = "Using expect for clearer test failures")] + let td = parsed + .root() + .type_defs() + .first() + .cloned() + .expect("typedef missing"); + + let span = td + .name_span() + .unwrap_or_else(|| panic!("missing typedef name_span in `{source}`")); + + assert_eq!(span_text(source, &span), "UserId"); + assert_eq!(span.start, "typedef ".len()); + } } diff --git a/src/sema/builder.rs b/src/sema/builder.rs index 15aaa6fa..9246eb07 100644 --- a/src/sema/builder.rs +++ b/src/sema/builder.rs @@ -14,7 +14,7 @@ use crate::sema::model::{ SymbolId, SymbolOrigin, UseKind, }; -use super::resolve::collect_pattern_binding_names; +use super::resolve::{collect_head_binding_names, collect_pattern_binding_names}; pub(crate) struct SymbolSpec { pub(crate) name: String, @@ -22,6 +22,7 @@ pub(crate) struct SymbolSpec { pub(crate) origin: SymbolOrigin, pub(crate) scope: ScopeId, pub(crate) span: Span, + pub(crate) name_span: Option, pub(crate) source_order: usize, pub(crate) visible_from_rule_order: usize, } @@ -124,55 +125,10 @@ impl SemanticModelBuilder { pub(crate) fn collect_program_declarations(&mut self, root: &ast::Root) { let mut order = 0usize; - let mut declare_from_child = - |name: Option, - kind: DeclarationKind, - origin: SymbolOrigin, - child: &SyntaxNode| { - if let Some(name) = name { - self.declare_symbol(SymbolSpec { - name, - kind, - origin, - scope: self.program_scope, - span: text_range_to_span(child.text_range()), - source_order: order, - visible_from_rule_order: 0, - }); - order += 1; - } - }; - for child in root.syntax().children() { - match child.kind() { - SyntaxKind::N_RELATION_DECL => declare_from_child( - ast::Relation { - syntax: child.clone(), - } - .name(), - DeclarationKind::Relation, - SymbolOrigin::RelationDeclaration, - &child, - ), - SyntaxKind::N_FUNCTION => declare_from_child( - ast::Function { - syntax: child.clone(), - } - .name(), - DeclarationKind::Function, - SymbolOrigin::FunctionDeclaration, - &child, - ), - SyntaxKind::N_TYPE_DEF => declare_from_child( - ast::TypeDef { - syntax: child.clone(), - } - .name(), - DeclarationKind::Type, - SymbolOrigin::TypeDeclaration, - &child, - ), - _ => {} + if let Some(spec) = self.program_declaration_spec(&child, order) { + self.declare_symbol(spec); + order += 1; } } } @@ -197,17 +153,23 @@ impl SemanticModelBuilder { ); if let Ok(terms) = rule.body_terms() { - let spans: Vec = rule - .body_expression_nodes() - .into_iter() - .map(|node| node.span()) - .collect(); + let body_nodes = rule.body_expression_nodes(); + let body_nodes_len = body_nodes.len(); + let counts_match = body_nodes_len == terms.len(); + debug_assert_eq!( + body_nodes_len, + terms.len(), + "rule body node/term count mismatch should stay visible during semantic collection" + ); for (literal_index, term) in terms.into_iter().enumerate() { - let span = spans - .get(literal_index) - .cloned() - .unwrap_or_else(|| rule_span.clone()); + let maybe_node = if counts_match && literal_index < body_nodes_len { + body_nodes.get(literal_index) + } else { + None + }; + let span = + maybe_node.map_or_else(|| rule_span.clone(), ast::RuleBodyExpression::span); self.collect_rule_term( super::variables::VariableUseContext::new( rule_scope, @@ -215,6 +177,7 @@ impl SemanticModelBuilder { &span, literal_index, ), + maybe_node.map(ast::AstNode::syntax), &term, ); } @@ -243,8 +206,26 @@ impl SemanticModelBuilder { origin: SymbolOrigin::SemanticRuleHead, }, ); + // This is intentionally asymmetric with `traverse.rs`: AST rule-head + // bindings capture precise `SymbolSpec.name_span` values from + // `binding_spans`, but parse-time `SemanticRule` values do not + // retain the CST handles needed to recover those token spans here. + for binding_name in collect_head_binding_names(rule.head()) { + self.declare_symbol(SymbolSpec { + name: binding_name, + kind: DeclarationKind::RuleBinding, + origin: SymbolOrigin::SemanticRuleHead, + scope: rule_scope, + span: rule_span.clone(), + name_span: None, + source_order: self.symbols.len(), + visible_from_rule_order: 0, + }); + } for pattern in rule.patterns() { + // Parse-time semantic rules do not currently preserve CST token + // handles, so precise identifier spans are unavailable here. for binding_name in collect_pattern_binding_names(pattern) { self.declare_symbol(SymbolSpec { name: binding_name, @@ -252,6 +233,7 @@ impl SemanticModelBuilder { origin: SymbolOrigin::ForPattern, scope: rule_scope, span: rule_span.clone(), + name_span: None, source_order: self.symbols.len(), visible_from_rule_order: 0, }); @@ -283,6 +265,60 @@ impl SemanticModelBuilder { scope_id } + fn program_declaration_spec( + &self, + child: &SyntaxNode, + source_order: usize, + ) -> Option { + let (name, name_span, kind, origin) = match child.kind() { + SyntaxKind::N_RELATION_DECL => { + let relation = ast::Relation { + syntax: child.clone(), + }; + ( + relation.name(), + relation.name_span(), + DeclarationKind::Relation, + SymbolOrigin::RelationDeclaration, + ) + } + SyntaxKind::N_FUNCTION => { + let function = ast::Function { + syntax: child.clone(), + }; + ( + function.name(), + function.name_span(), + DeclarationKind::Function, + SymbolOrigin::FunctionDeclaration, + ) + } + SyntaxKind::N_TYPE_DEF => { + let type_def = ast::TypeDef { + syntax: child.clone(), + }; + ( + type_def.name(), + type_def.name_span(), + DeclarationKind::Type, + SymbolOrigin::TypeDeclaration, + ) + } + _ => return None, + }; + + name.map(|name| SymbolSpec { + name, + kind, + origin, + scope: self.program_scope, + span: text_range_to_span(child.text_range()), + name_span, + source_order, + visible_from_rule_order: 0, + }) + } + pub(crate) fn declare_symbol(&mut self, spec: SymbolSpec) -> SymbolId { let symbol_id = SymbolId(self.symbols.len()); self.symbols_by_scope_and_name @@ -297,6 +333,7 @@ impl SemanticModelBuilder { origin: spec.origin, scope: spec.scope, span: spec.span, + name_span: spec.name_span, source_order: spec.source_order, visible_from_rule_order: spec.visible_from_rule_order, }); diff --git a/src/sema/model.rs b/src/sema/model.rs index 015a56a6..735cea96 100644 --- a/src/sema/model.rs +++ b/src/sema/model.rs @@ -175,6 +175,7 @@ pub struct Symbol { pub(crate) origin: SymbolOrigin, pub(crate) scope: ScopeId, pub(crate) span: Span, + pub(crate) name_span: Option, pub(crate) source_order: usize, pub(crate) visible_from_rule_order: usize, } @@ -210,6 +211,12 @@ impl Symbol { &self.span } + /// Precise source span for the identifier token, when captured. + #[must_use] + pub fn name_span(&self) -> Option<&Span> { + self.name_span.as_ref() + } + /// Stable source-order index within the owning scope. #[must_use] pub fn source_order(&self) -> usize { diff --git a/src/sema/tests.rs b/src/sema/tests.rs index 42409629..f8c307eb 100644 --- a/src/sema/tests.rs +++ b/src/sema/tests.rs @@ -388,4 +388,5 @@ fn relation_read_helpers_ignore_head_only_and_unresolved_uses( ); } +mod name_span; mod unused_variable; diff --git a/src/sema/tests/name_span.rs b/src/sema/tests/name_span.rs new file mode 100644 index 00000000..5df8d01b --- /dev/null +++ b/src/sema/tests/name_span.rs @@ -0,0 +1,163 @@ +//! Semantic-model tests for precise identifier spans. + +use rstest::rstest; + +use super::{DeclarationKind, parse_ok, symbols_named}; +use crate::sema::SymbolOrigin; + +fn span_text<'a>(source: &'a str, span: &crate::Span) -> &'a str { + source.get(span.start..span.end).unwrap_or_else(|| { + panic!( + "invalid UTF-8 boundary for span {}..{} in `{source}`", + span.start, span.end + ) + }) +} + +fn find_symbol<'a>( + model: &'a super::super::SemanticModel, + name: &str, + kind: DeclarationKind, + origin: SymbolOrigin, +) -> &'a super::super::Symbol { + symbols_named(model, name, kind) + .into_iter() + .find(|symbol| symbol.origin() == origin) + .unwrap_or_else(|| panic!("missing symbol `{name}` with origin {origin:?}")) +} + +#[rstest] +#[case( + "input relation Source(x: u32)", + "Source", + DeclarationKind::Relation, + SymbolOrigin::RelationDeclaration, + "Source" +)] +#[case( + "function project(id: u32): u32 {}", + "project", + DeclarationKind::Function, + SymbolOrigin::FunctionDeclaration, + "project" +)] +#[case( + "function project(project: u32): u32 { project }", + "project", + DeclarationKind::Function, + SymbolOrigin::FunctionDeclaration, + "project" +)] +#[case( + "typedef UserId = u64", + "UserId", + DeclarationKind::Type, + SymbolOrigin::TypeDeclaration, + "UserId" +)] +fn top_level_declarations_capture_identifier_name_spans( + #[case] source: &str, + #[case] symbol_name: &str, + #[case] kind: DeclarationKind, + #[case] origin: SymbolOrigin, + #[case] expected_text: &str, +) { + let parsed = parse_ok(source); + let semantic_model = super::super::build(&parsed); + let symbol = find_symbol(&semantic_model, symbol_name, kind, origin); + + let name_span = symbol + .name_span() + .unwrap_or_else(|| panic!("missing name_span for `{symbol_name}`")); + + assert_eq!(span_text(source, name_span), expected_text); + assert_ne!(name_span, symbol.span()); +} + +#[rstest] +fn semantic_rule_head_bindings_leave_name_span_unavailable() { + let source = "for (item_y in Source(item_y)) Output(item_y)."; + let parsed = parse_ok(source); + let semantic_model = super::super::build(&parsed); + let symbol = find_symbol( + &semantic_model, + "item_y", + DeclarationKind::RuleBinding, + SymbolOrigin::SemanticRuleHead, + ); + + assert!(symbol.name_span().is_none()); + assert_eq!(span_text(source, symbol.span()), source); +} + +#[rstest] +#[case( + "Output(Output) :- Output(_).", + "Output", + SymbolOrigin::RuleHead, + "Output", + Some(7_usize), + "Output(Output) :- Output(_)." +)] +#[case( + "Output(result) :- Source(result), var assigned_x = Seed(result).", + "assigned_x", + SymbolOrigin::AssignmentPattern, + "assigned_x", + None, + "var assigned_x = Seed(result)" +)] +#[case( + "Output(result) :- Source(result), for (item_y in Items(result)) Inner(result).", + "item_y", + SymbolOrigin::ForPattern, + "item_y", + None, + "for (item_y in Items(result)) Inner(result)" +)] +fn binding_declarations_capture_identifier_name_spans( + #[case] source: &str, + #[case] symbol_name: &str, + #[case] origin: SymbolOrigin, + #[case] expected_name_text: &str, + #[case] expected_name_span_start: Option, + #[case] expected_span_text: &str, +) { + let parsed = parse_ok(source); + let semantic_model = super::super::build(&parsed); + let symbol = find_symbol( + &semantic_model, + symbol_name, + DeclarationKind::RuleBinding, + origin, + ); + + let name_span = symbol + .name_span() + .unwrap_or_else(|| panic!("missing binding name_span for `{symbol_name}`")); + + assert_eq!(span_text(source, name_span), expected_name_text); + if let Some(start) = expected_name_span_start { + assert_eq!(name_span.start, start); + } + assert_eq!(span_text(source, symbol.span()), expected_span_text); +} + +#[rstest] +fn nested_for_loop_bindings_lack_name_span() { + let source = "Output(x) :- Source(x), for (outer in Items(x)) for (inner in Nested(outer)) Inner(inner)."; + let parsed = parse_ok(source); + let semantic_model = super::super::build(&parsed); + let symbol = find_symbol( + &semantic_model, + "inner", + DeclarationKind::RuleBinding, + SymbolOrigin::ForPattern, + ); + + assert!(symbol.name_span().is_none()); + assert_eq!( + span_text(source, symbol.span()), + "for (outer in Items(x)) for (inner in Nested(outer)) Inner(inner)" + ); +} diff --git a/src/sema/traverse.rs b/src/sema/traverse.rs index dc93dbce..514d088e 100644 --- a/src/sema/traverse.rs +++ b/src/sema/traverse.rs @@ -1,14 +1,16 @@ //! Expression and rule-body traversal for semantic-model collection. +use rowan::SyntaxNode; + use crate::Span; use crate::parser::ast; -use crate::parser::ast::{Expr, RuleBodyTerm}; +use crate::parser::ast::{Expr, RuleBodyTerm, find_identifier_span}; use crate::sema::model::{ DeclarationKind, ScopeId, ScopeKind, ScopeOrigin, SymbolOrigin, UseKind, UseOrigin, UseSite, }; use super::builder::{ScopeSpec, SemanticModelBuilder, SymbolSpec}; -use super::resolve::{collect_head_binding_names, collect_pattern_binding_names, relation_name}; +use super::resolve::{collect_pattern_binding_names, relation_name}; use super::variables::VariableUseContext; #[derive(Clone, Copy)] @@ -18,11 +20,34 @@ pub(super) struct RuleHeadContext<'a> { pub(super) origin: SymbolOrigin, } +/// Collected inputs needed to declare a pattern binding symbol. +#[derive(Clone, Copy)] +struct PatternBindingSpec<'a> { + binding_name: &'a str, + origin: SymbolOrigin, + scope: ScopeId, + span: &'a Span, + syntax: Option<&'a SyntaxNode>, + visible_from_rule_order: usize, +} + impl SemanticModelBuilder { pub(crate) fn collect_rule_heads(&mut self, ctx: RuleHeadContext<'_>, rule: &ast::Rule) { if let Ok(heads) = rule.heads() { for head in heads { self.collect_head_expr(&head.atom, ctx); + for (binding_name, name_span) in &head.binding_spans { + self.declare_symbol(SymbolSpec { + name: binding_name.clone(), + kind: DeclarationKind::RuleBinding, + origin: ctx.origin, + scope: ctx.scope, + span: ctx.span.clone(), + name_span: Some(name_span.clone()), + source_order: self.symbols.len(), + visible_from_rule_order: 0, + }); + } if let Some(location) = head.location.as_ref() { self.walk_variable_uses( location, @@ -39,31 +64,25 @@ impl SemanticModelBuilder { UseOrigin::RelationHead, expr, ); - for binding_name in collect_head_binding_names(expr) { - self.declare_symbol(SymbolSpec { - name: binding_name, - kind: DeclarationKind::RuleBinding, - origin: ctx.origin, - scope: ctx.scope, - span: ctx.span.clone(), - source_order: self.symbols.len(), - visible_from_rule_order: 0, - }); - } } pub(crate) fn collect_rule_term( &mut self, context: VariableUseContext<'_>, + syntax: Option<&SyntaxNode>, term: &RuleBodyTerm, ) { match term { RuleBodyTerm::Expression(expr) => self.collect_expression_term(expr, context), - RuleBodyTerm::Assignment(assign) => self.collect_assignment_term(assign, context), + RuleBodyTerm::Assignment(assign) => { + self.collect_assignment_term(assign, syntax, context); + } RuleBodyTerm::Aggregation(aggregation) => { self.collect_aggregation_term(aggregation, context); } - RuleBodyTerm::ForLoop(for_loop) => self.collect_for_loop_term(for_loop, context), + RuleBodyTerm::ForLoop(for_loop) => { + self.collect_for_loop_term(for_loop, syntax, context); + } } } @@ -75,17 +94,17 @@ impl SemanticModelBuilder { fn collect_assignment_term( &mut self, assign: &ast::RuleAssignment, + syntax: Option<&SyntaxNode>, context: VariableUseContext<'_>, ) { self.walk_variable_uses(&assign.value, context); for binding_name in collect_pattern_binding_names(&assign.pattern) { - self.declare_symbol(SymbolSpec { - name: binding_name, - kind: DeclarationKind::RuleBinding, + self.declare_pattern_binding(PatternBindingSpec { + binding_name: &binding_name, origin: SymbolOrigin::AssignmentPattern, scope: context.current_scope(), - span: context.span().clone(), - source_order: self.symbols.len(), + span: context.span(), + syntax, visible_from_rule_order: context.literal_index() + 1, }); } @@ -103,6 +122,7 @@ impl SemanticModelBuilder { fn collect_for_loop_term( &mut self, for_loop: &ast::RuleForLoop, + syntax: Option<&SyntaxNode>, context: VariableUseContext<'_>, ) { self.record_top_level_relation_use(context, UseOrigin::ForIterable, &for_loop.iterable); @@ -121,18 +141,22 @@ impl SemanticModelBuilder { span: context.span().clone(), }); for binding_name in collect_pattern_binding_names(&for_loop.pattern) { - self.declare_symbol(SymbolSpec { - name: binding_name, - kind: DeclarationKind::RuleBinding, + self.declare_pattern_binding(PatternBindingSpec { + binding_name: &binding_name, origin: SymbolOrigin::ForPattern, scope: child_scope, - span: context.span().clone(), - source_order: self.symbols.len(), + span: context.span(), + syntax, visible_from_rule_order: 0, }); } for (term_offset, nested_term) in for_loop.body_terms.iter().enumerate() { + // Nested bodies recurse through `collect_rule_term`, but the + // `VariableUseContext::new(...)` path no longer retains CST handles + // for those inner terms. We therefore pass `None` here + // deliberately so nested bindings do not claim precise + // `name_span`s we cannot source faithfully. self.collect_rule_term( VariableUseContext::new( child_scope, @@ -140,6 +164,7 @@ impl SemanticModelBuilder { context.span(), context.rule_order_limit(), ), + None, nested_term, ); } @@ -166,4 +191,20 @@ impl SemanticModelBuilder { resolution, }); } + + /// Declare a rule-body pattern binding with optional precise CST-backed span data. + fn declare_pattern_binding(&mut self, spec: PatternBindingSpec<'_>) { + self.declare_symbol(SymbolSpec { + name: spec.binding_name.to_string(), + kind: DeclarationKind::RuleBinding, + origin: spec.origin, + scope: spec.scope, + span: spec.span.clone(), + name_span: spec + .syntax + .and_then(|s| find_identifier_span(s, spec.binding_name)), + source_order: self.symbols.len(), + visible_from_rule_order: spec.visible_from_rule_order, + }); + } } diff --git a/src/test_util/mod.rs b/src/test_util/mod.rs index bada16ba..d86a0b8d 100644 --- a/src/test_util/mod.rs +++ b/src/test_util/mod.rs @@ -44,6 +44,28 @@ pub fn tokenize(src: &str) -> Vec<(SyntaxKind, Span)> { tokenize_with_trivia(src) } +/// Slice `source` by a validated byte span for test assertions. +/// +/// # Examples +/// +/// ```rust,no_run +/// # #[cfg(feature = "test-support")] +/// # { +/// use ddlint::test_util::span_text; +/// +/// assert_eq!(span_text("abcdef", &(1..4)), "bcd"); +/// # } +/// ``` +#[must_use] +pub fn span_text<'a>(source: &'a str, span: &Span) -> &'a str { + source.get(span.start..span.end).unwrap_or_else(|| { + panic!( + "invalid UTF-8 boundary for span {}..{} in `{source}`", + span.start, span.end + ) + }) +} + /// Typed wrapper for variable and function names. #[derive(Debug, Clone)] pub struct Name(pub(crate) String);