Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 30 additions & 17 deletions docs/ddlint-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -447,7 +447,9 @@ The core contracts are:
semantic failures without leaking internal implementation types.
- Provenance: semantic nodes must retain source mappings back to the lossless
syntax layer for rule heads, body terms, aggregations, attributes, and
adornments, so diagnostics and later rewrites remain explainable.
adornments, so diagnostics and later rewrites remain explainable. For symbol
records specifically, coarse declaration or literal provenance must stay
distinct from any precise identifier-token span used for diagnostics.
- Semantic model completeness: `ddlog-sema` must expose typed structures for
constructs that downstream planners care about, including joins, filters,
projections, aggregations, recursion boundaries, and semantic facts such as
Expand Down Expand Up @@ -734,10 +736,20 @@ genuine reads.
`for`-loop pattern, and it treats a binding as used only when semantic
resolution maps a `UseKind::Variable` site back to that exact symbol. The
wildcard `_` remains an explicit ignore and is never recorded as a
warning-eligible binding. As of roadmap item `4.1.2`, rule-local binding spans
still point at the enclosing rule or literal site rather than the exact
identifier token, so diagnostics for these rules use that existing coarse
provenance.
warning-eligible binding.

Each `Symbol` now carries two related provenance fields:

- `span` remains the existing coarse provenance used for debugging, maps, and
ownership; it points at the enclosing declaration, rule, or body term that
introduced the symbol.
- `name_span` is an additive optional field that points at the precise
identifier token when semantic collection had CST access to capture it.

Rules such as `unused-variable` should prefer `name_span` for diagnostic
highlighting and fall back to `span` only when precise token provenance is not
available, such as parse-time `SemanticRule` values that retain only lowered
expressions plus a coarse source span.

Resolution is deliberately tri-state:

Expand All @@ -759,8 +771,9 @@ body scope and do not escape to later top-level literals.

This pass consumes both AST-backed rules and parse-time `SemanticRule` values
from top-level `for` desugaring, so the semantic view matches the parser's
existing rule surface. Provenance is retained through source spans on
declarations and on the enclosing literal or rule site for rule-local facts.
existing rule surface. Provenance is retained through coarse source spans on
declarations and on the enclosing literal or rule site for rule-local facts,
with precise identifier-token spans captured additively when CST access exists.

### 3.3. Initial lint rule catalog

Expand All @@ -771,16 +784,16 @@ suggestions, establishing a solid foundation of essential lints.

Table: DDLint rule catalogue and metadata.

| Rule Name | Group | Default Level | Autofixable | Description |
| ---------------------- | ----------- | ------------- | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| unused-relation | correctness | warn | No | Detects declared relations with no resolved read-like uses in rule bodies, `for` iterables, or `for` guards; rule heads count only as writes. |
| unused-variable | correctness | warn | No | Detects rule-local bindings from rule heads, assignment patterns, or `for` patterns that never receive a resolved variable use in the same rule; `_` is an explicit ignore. |
| shadowed-variable | correctness | warn | No | Detects when a variable binding in a literal shadows one from a preceding literal in the same rule body. |
| recursive-negation | correctness | error | No | Detects rules with recursion through negation, which leads to unsafe, non-monotonic programs. |
| inefficient-join-order | performance | hint | No | Evaluates rule bodies and suggests reordering atoms for a more efficient join plan, e.g., placing more restrictive literals first. |
| superfluous-group-by | performance | warn | Yes | Detects group_by clauses where the aggregation is trivial (e.g., grouping by all variables) and can be removed. |
| consistent-casing | style | allow | Yes | Enforces a consistent casing style for relation and type identifiers (e.g., PascalCase) and variables (e.g., snake_case). |
| no-magic-numbers | style | allow | No | Flags the use of unnamed numeric literals in rule bodies where a named constant might be clearer. |
| Rule Name | Group | Default Level | Autofixable | Description |
| ---------------------- | ----------- | ------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| unused-relation | correctness | warn | No | Detects declared relations with no resolved read-like uses in rule bodies, `for` iterables, or `for` guards; rule heads count only as writes. |
| unused-variable | correctness | warn | No | Detects rule-local bindings from rule heads, assignment patterns, or `for` patterns that never receive a resolved variable use in the same rule; `_` is an explicit ignore, and diagnostics prefer precise identifier-token spans when available. |
| shadowed-variable | correctness | warn | No | Detects when a variable binding in a literal shadows one from a preceding literal in the same rule body. |
| recursive-negation | correctness | error | No | Detects rules with recursion through negation, which leads to unsafe, non-monotonic programs. |
| inefficient-join-order | performance | hint | No | Evaluates rule bodies and suggests reordering atoms for a more efficient join plan, e.g., placing more restrictive literals first. |
| superfluous-group-by | performance | warn | Yes | Detects group_by clauses where the aggregation is trivial (e.g., grouping by all variables) and can be removed. |
| consistent-casing | style | allow | Yes | Enforces a consistent casing style for relation and type identifiers (e.g., PascalCase) and variables (e.g., snake_case). |
| no-magic-numbers | style | allow | No | Flags the use of unnamed numeric literals in rule bodies where a named constant might be clearer. |

This table serves as a concrete work breakdown for the engineering team and
clearly communicates the linter's initial capabilities and priorities to early
Expand Down
11 changes: 8 additions & 3 deletions src/linter/rules/correctness/unused_variable.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@
//! The rule operates over semantic `RuleBinding` symbols rather than raw CST
//! name matching so shadowing, unresolved names, and wildcard ignores follow
//! the semantic model's existing contracts. Diagnostics currently use the
//! binding symbol's stored span, which points at the enclosing rule or literal
//! rather than the exact identifier token.
//! binding symbol's precise identifier span when semantic collection captured
//! one, falling back to the existing coarse provenance otherwise.

use crate::linter::{LintDiagnostic, Rule};
use crate::{SyntaxKind, declare_lint};
Expand All @@ -30,7 +30,8 @@ declare_lint! {
continue;
}

let Some(range) = crate::linter::span_utils::span_to_text_range(symbol.span()) else {
let span = symbol.name_span().unwrap_or_else(|| symbol.span());
let Some(range) = crate::linter::span_utils::span_to_text_range(span) else {
continue;
};

Expand Down Expand Up @@ -80,6 +81,10 @@ mod tests {
.map(crate::linter::LintDiagnostic::message),
Some("variable `head_x` is defined but never used in this rule")
);
assert_eq!(
diagnostics.first().map(crate::linter::LintDiagnostic::span),
Some(rowan::TextRange::new(7.into(), 13.into()))
);
}

#[rstest]
Expand Down
7 changes: 7 additions & 0 deletions src/parser/ast/function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,13 @@ impl Function {
})
}

/// Precise source span for the function name token, if present.
#[must_use]
pub fn name_span(&self) -> Option<crate::Span> {
self.name()
.and_then(|name| super::find_identifier_span(&self.syntax, &name))
}

/// Returns `true` if declared with the `extern` keyword.
#[must_use]
pub fn is_extern(&self) -> bool {
Expand Down
44 changes: 41 additions & 3 deletions src/parser/ast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,17 +9,55 @@
//! tests and fixtures to assemble match expressions without dipping into
//! private modules.

use rowan::SyntaxElement;
use rowan::{NodeOrToken, SyntaxElement, SyntaxNode};

use self::parse_utils::is_trivia;
use crate::{DdlogLanguage, SyntaxKind};
use crate::parser::ast::rule::text_range_to_span;
use crate::{DdlogLanguage, Span, SyntaxKind};

/// Common interface for AST wrappers.
pub(crate) trait AstNode {
/// Access the underlying syntax node.
fn syntax(&self) -> &rowan::SyntaxNode<DdlogLanguage>;
}

/// Find the first identifier token with matching text within a syntax subtree.
#[must_use]
pub(crate) fn find_identifier_span(syntax: &SyntaxNode<DdlogLanguage>, name: &str) -> Option<Span> {
find_identifier_span_in_range(syntax, &text_range_to_span(syntax.text_range()), name)
}

/// Find the first identifier token with matching text within a span-bounded subtree search.
#[must_use]
pub(crate) fn find_identifier_span_in_range(
syntax: &SyntaxNode<DdlogLanguage>,
search_span: &Span,
name: &str,
) -> Option<Span> {
syntax.children_with_tokens().find_map(|child| match child {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (review_instructions): This new match expression has three branches and should be extracted into a separate predicate/helper function to comply with the conditional branching guideline.

The closure passed to find_map contains a match with three arms. Per the instruction to move conditionals with more than two branches into a predicate function, consider extracting this logic into a small helper (e.g. fn identifier_span_for_child(...) -> Option<Span>), and then call that from find_map instead of inlining the full match here. That keeps the branching logic isolated and the main function simpler.

Review instructions:

Path patterns: **/*.rs

Instructions:
Move conditionals with >2 branches into a predicate function.

NodeOrToken::Token(token)
if token.kind() == SyntaxKind::T_IDENT && token.text() == name =>
{
let token_span = text_range_to_span(token.text_range());
span_contains(search_span, &token_span).then_some(token_span)
}
NodeOrToken::Node(node) => {
ranges_overlap(search_span, &text_range_to_span(node.text_range()))
.then(|| find_identifier_span_in_range(&node, search_span, name))
.flatten()
}
NodeOrToken::Token(_) => None,
})
}

fn ranges_overlap(left: &Span, right: &Span) -> bool {
left.start < right.end && right.start < left.end
}

fn span_contains(container: &Span, candidate: &Span) -> bool {
container.start <= candidate.start && candidate.end <= container.end
}

macro_rules! impl_ast_node {
($ty:ty) => {
impl AstNode for $ty {
Expand Down Expand Up @@ -191,4 +229,4 @@ mod tests {
let parsed = parse(src);
assert_eq!(parsed.root().imports().len(), 1);
}
}
}
7 changes: 7 additions & 0 deletions src/parser/ast/relation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,13 @@ impl Relation {
})
}

/// Precise source span for the relation name token, if present.
#[must_use]
pub fn name_span(&self) -> Option<crate::Span> {
self.name()
.and_then(|name| super::find_identifier_span(&self.syntax, &name))
}

/// Returns `true` if declared with the `input` keyword.
#[must_use]
pub fn is_input(&self) -> bool {
Expand Down
57 changes: 56 additions & 1 deletion src/parser/ast/rule_head.rs
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,10 @@ pub struct RuleHead {
pub atom: Expr,
/// Optional `@` location expression.
pub location: Option<Expr>,
/// Source span covering only the head atom.
pub span: Span,
/// Precise identifier-token spans for bindings introduced by the head atom.
pub binding_spans: Vec<(String, Span)>,
}

pub(crate) fn parse_rule_heads(
Expand Down Expand Up @@ -162,16 +166,67 @@ fn parse_rule_head_span(
segment_offset.saturating_add(location_start),
)?);

return Ok(Some(RuleHead { atom, location }));
let binding_spans = collect_head_binding_spans(atom_src, segment_offset);

return Ok(Some(RuleHead {
atom,
location,
span: segment_offset..segment_offset.saturating_add(at_span.start),
binding_spans,
}));
}

let binding_spans = collect_head_binding_spans(trimmed, segment_offset);
let atom = lower_by_ref_head(parse_expr_with_offset(trimmed, segment_offset)?);
Ok(Some(RuleHead {
atom,
location: None,
span: segment_offset..segment_offset.saturating_add(trimmed.len()),
binding_spans,
}))
}

fn collect_head_binding_spans(src: &str, base_offset: usize) -> Vec<(String, Span)> {
let tokens = tokenize_without_trivia(src);
let mut bindings = Vec::new();

for (index, (kind, span)) in tokens.iter().enumerate() {
if *kind != SyntaxKind::T_IDENT {
continue;
}

let text = src.get(span.clone()).unwrap_or("");
if should_skip_binding_ident(&tokens, index) || text == "_" {
continue;
}

if bindings.iter().any(|(existing, _)| existing == text) {
continue;
}

bindings.push((
text.to_string(),
base_offset.saturating_add(span.start)..base_offset.saturating_add(span.end),
));
}

bindings
}

fn should_skip_binding_ident(tokens: &[(SyntaxKind, Span)], index: usize) -> bool {
let prev_kind = index
.checked_sub(1)
.and_then(|prev| tokens.get(prev))
.map(|(kind, _)| *kind);
let next_kind = tokens.get(index + 1).map(|(kind, _)| *kind);

prev_kind == Some(SyntaxKind::T_DOT)
|| matches!(
next_kind,
Some(SyntaxKind::T_LPAREN | SyntaxKind::T_LBRACE | SyntaxKind::T_COLON)
)
}

fn parse_expr_with_offset(src: &str, base_offset: usize) -> Result<Expr, Vec<Simple<SyntaxKind>>> {
let (start, end) = trim_byte_range(src);
let trimmed = src.get(start..end).unwrap_or("");
Expand Down
7 changes: 7 additions & 0 deletions src/parser/ast/type_def.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,13 @@ impl TypeDef {
super::take_first_ident(iter)
}

/// Precise source span for the type name token, if present.
#[must_use]
pub fn name_span(&self) -> Option<crate::Span> {
self.name()
.and_then(|name| super::find_identifier_span(&self.syntax, &name))
}

/// Whether this declaration is `extern`.
#[must_use]
pub fn is_extern(&self) -> bool {
Expand Down
Loading