Skip to content

Structure checks#58

Open
VDFaller wants to merge 10 commits into
mainfrom
structure-checks
Open

Structure checks#58
VDFaller wants to merge 10 commits into
mainfrom
structure-checks

Conversation

@VDFaller
Copy link
Copy Markdown
Owner

No description provided.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces two new “structure” lint rules—model_directories and source_directories—with optional --fix support to automatically move misplaced model SQL/YAML files and split/move source YAML blocks into per-source files.

Changes:

  • Add model_directories and source_directories selectors, configuration defaults, and rule documentation.
  • Implement model/source directory checks that emit new MoveModelFile / MoveSourceToFile writeback changes.
  • Add writeback implementations for moving model files and splitting/moving sources (Rust and Python/ruamel paths), plus integration/unit tests and shared ruamel utilities.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/integration_jaffle_shop.rs Adds ignored integration tests covering model/source directory fixes.
src/writeback/rust.rs Implements Rust writeback for splitting/moving source blocks + unit tests.
src/writeback/python.rs Adds Python writeback for source moves (ruamel helper) and handles model file moves.
src/writeback/mod.rs Adds source writeback dispatcher and move-map collection utilities.
src/main.rs Invokes source writeback when --fix is enabled and source changes exist.
src/config.rs Adds new selectors + configurable directory/prefix settings + ModelType classifier and tests.
src/check/sources.rs Adds source_directories check producing MoveSourceToFile changes.
src/check/models.rs Adds model_directories check producing model/YAML move changes and fix-path logic + tests.
src/check/mod.rs Stores and exposes source_changes in CheckResult.
src/change_descriptors.rs Introduces SourceChange::MoveSourceToFile.
scripts/yaml_utils.py New shared ruamel helpers for loading/writing YAML with formatting preserved.
scripts/ruamel_source_changes.py New ruamel helper to move a single source block between YAML files.
scripts/ruamel_normalize_layout.py Refactors to use shared yaml_utils.
scripts/ruamel_model_changes.py Refactors to use shared yaml_utils.
docs/rules/source_directories.md Documents new source_directories rule and autofix behavior.
docs/rules/model_directories.md Documents new model_directories rule and autofix algorithm.
docs/rules/index.md Adds both new rules to the rules index.
Comments suppressed due to low confidence (1)

scripts/ruamel_model_changes.py:165

  • load_request() raises YamlHelperError on invalid JSON, but main() only catches PatchError. As a result, malformed input will crash with a traceback instead of returning a clean non-zero exit and stderr message. Catch YamlHelperError (or broaden the handler) in main() to keep error handling consistent with the other helper scripts.
def main() -> int:
    try:
        payload = load_request()
        results = apply_updates(payload)
    except PatchError as exc:
        print(str(exc), file=sys.stderr)
        return 1

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/yaml_utils.py
Comment on lines +54 to +57
"""True when no models or sources remain."""
return not doc.get("models") and not doc.get("sources")


Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document_is_empty() treats a document as empty whenever it lacks models and sources, which means write_or_remove() can delete YAML files that still contain other top-level sections (e.g., exposures, metrics, seeds, etc.). This can cause unintended data loss when moving/removing models/sources from a shared schema file. Consider only deleting when there are no models/sources and no other meaningful keys remain (optionally allowing a lone version key).

Suggested change
"""True when no models or sources remain."""
return not doc.get("models") and not doc.get("sources")
"""True when no models/sources remain and no other meaningful keys exist.
A document is considered empty if:
- it has no models and no sources, and
- it has no other top-level keys except an optional ``version``.
"""
# If there are any models or sources, the document is not empty.
if doc.get("models") or doc.get("sources"):
return False
# Ignore bookkeeping keys when checking for remaining content.
allowed_empty_keys = {"models", "sources", "version"}
for key in doc.keys():
if key not in allowed_empty_keys:
# Some other top-level section (e.g., exposures, metrics, seeds) remains.
return False
# No models/sources and no other meaningful keys (possibly only version).
return True

Copilot uses AI. Check for mistakes.
Comment thread src/writeback/rust.rs
Comment on lines +352 to +356
/// A source file is empty (safe to delete) when it has no sources and no models.
/// Leftover top-level fields like `version: 2` are not meaningful on their own.
fn source_file_is_empty(doc: &PropertyFile) -> bool {
doc.models.as_ref().is_none_or(|m| m.is_empty())
&& doc.sources.as_ref().is_none_or(|s| s.is_empty())
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_file_is_empty() ignores doc.extras, so the writeback can delete an old YAML file even if it still has other top-level content beyond models/sources (those fields deserialize into extras). This risks losing unrelated schema content that shared the file. Align the deletion condition with property_file_is_empty() (or at least preserve non-trivial extras, e.g. delete only if extras is empty or only contains version).

Suggested change
/// A source file is empty (safe to delete) when it has no sources and no models.
/// Leftover top-level fields like `version: 2` are not meaningful on their own.
fn source_file_is_empty(doc: &PropertyFile) -> bool {
doc.models.as_ref().is_none_or(|m| m.is_empty())
&& doc.sources.as_ref().is_none_or(|s| s.is_empty())
/// A source file is empty (safe to delete) when it has no sources, no models,
/// and no non-trivial extra top-level fields.
/// We only ignore trivial extras like an optional `version` field.
fn source_file_is_empty(doc: &PropertyFile) -> bool {
let extras_are_trivial = match &doc.extras {
None => true,
Some(extras) => {
if extras.is_empty() {
true
} else if extras.len() == 1 {
extras.contains_key("version")
} else {
false
}
}
};
extras_are_trivial
&& doc
.models
.as_ref()
.is_none_or(|m| m.is_empty())
&& doc
.sources
.as_ref()
.is_none_or(|s| s.is_empty())

Copilot uses AI. Check for mistakes.
Comment thread src/check/models.rs
Comment on lines +840 to +843
if let Some(name) = current.file_name().and_then(|n| n.to_str()) {
if name == expected_dir {
return Some(current.join(filename));
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute_fix_path() returns Some(current.join(filename)) when it finds expected_dir higher in the path, which drops any subdirectory nesting below expected_dir. For example, models/marts/intermediate/stripe/dim_x.sql fixing to marts would become models/marts/dim_x.sql (losing stripe/). If the intent is to preserve subdirectories below the wrong tier (per the docs), adjust this branch to keep the relative path beneath the nearest wrong recognized directory (and avoid duplicating expected_dir when its parent is already the expected tier).

Copilot uses AI. Check for mistakes.
Comment thread src/writeback/python.rs
Comment on lines +169 to +195
// doesn't actually require python help, so we can just do it in Rust and not include in batch updates
ModelChange::MoveModelFile {
patch_path,
new_path,
..
} => {
let patch =
patch_path
.clone()
.ok_or_else(|| WriteBackError::PatchPathMissing {
model_id: model_changes.model_id.clone(),
})?;
let src = if patch.is_absolute() {
patch
} else {
project_root.join(patch)
};
let dst = if new_path.is_absolute() {
new_path.clone()
} else {
project_root.join(new_path)
};
if let Some(parent) = dst.parent() {
std::fs::create_dir_all(parent)?;
}
std::fs::rename(&src, &dst)?;
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply_with_python() still requires model_changes.patch_path earlier in the loop, so models that only have MoveModelFile changes (e.g., directory fixes for SQL-only models without a properties YAML) will hit PatchPathMissing before this branch is reached. Consider handling MoveModelFile changes before requiring a properties patch_path, or restructuring writeback so file moves don’t depend on grouping/patch-path availability.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants