Structure checks#58
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces two new “structure” lint rules—model_directories and source_directories—with optional --fix support to automatically move misplaced model SQL/YAML files and split/move source YAML blocks into per-source files.
Changes:
- Add
model_directoriesandsource_directoriesselectors, configuration defaults, and rule documentation. - Implement model/source directory checks that emit new
MoveModelFile/MoveSourceToFilewriteback changes. - Add writeback implementations for moving model files and splitting/moving sources (Rust and Python/ruamel paths), plus integration/unit tests and shared ruamel utilities.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration_jaffle_shop.rs | Adds ignored integration tests covering model/source directory fixes. |
| src/writeback/rust.rs | Implements Rust writeback for splitting/moving source blocks + unit tests. |
| src/writeback/python.rs | Adds Python writeback for source moves (ruamel helper) and handles model file moves. |
| src/writeback/mod.rs | Adds source writeback dispatcher and move-map collection utilities. |
| src/main.rs | Invokes source writeback when --fix is enabled and source changes exist. |
| src/config.rs | Adds new selectors + configurable directory/prefix settings + ModelType classifier and tests. |
| src/check/sources.rs | Adds source_directories check producing MoveSourceToFile changes. |
| src/check/models.rs | Adds model_directories check producing model/YAML move changes and fix-path logic + tests. |
| src/check/mod.rs | Stores and exposes source_changes in CheckResult. |
| src/change_descriptors.rs | Introduces SourceChange::MoveSourceToFile. |
| scripts/yaml_utils.py | New shared ruamel helpers for loading/writing YAML with formatting preserved. |
| scripts/ruamel_source_changes.py | New ruamel helper to move a single source block between YAML files. |
| scripts/ruamel_normalize_layout.py | Refactors to use shared yaml_utils. |
| scripts/ruamel_model_changes.py | Refactors to use shared yaml_utils. |
| docs/rules/source_directories.md | Documents new source_directories rule and autofix behavior. |
| docs/rules/model_directories.md | Documents new model_directories rule and autofix algorithm. |
| docs/rules/index.md | Adds both new rules to the rules index. |
Comments suppressed due to low confidence (1)
scripts/ruamel_model_changes.py:165
load_request()raisesYamlHelperErroron invalid JSON, butmain()only catchesPatchError. As a result, malformed input will crash with a traceback instead of returning a clean non-zero exit and stderr message. CatchYamlHelperError(or broaden the handler) inmain()to keep error handling consistent with the other helper scripts.
def main() -> int:
try:
payload = load_request()
results = apply_updates(payload)
except PatchError as exc:
print(str(exc), file=sys.stderr)
return 1
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """True when no models or sources remain.""" | ||
| return not doc.get("models") and not doc.get("sources") | ||
|
|
||
|
|
There was a problem hiding this comment.
document_is_empty() treats a document as empty whenever it lacks models and sources, which means write_or_remove() can delete YAML files that still contain other top-level sections (e.g., exposures, metrics, seeds, etc.). This can cause unintended data loss when moving/removing models/sources from a shared schema file. Consider only deleting when there are no models/sources and no other meaningful keys remain (optionally allowing a lone version key).
| """True when no models or sources remain.""" | |
| return not doc.get("models") and not doc.get("sources") | |
| """True when no models/sources remain and no other meaningful keys exist. | |
| A document is considered empty if: | |
| - it has no models and no sources, and | |
| - it has no other top-level keys except an optional ``version``. | |
| """ | |
| # If there are any models or sources, the document is not empty. | |
| if doc.get("models") or doc.get("sources"): | |
| return False | |
| # Ignore bookkeeping keys when checking for remaining content. | |
| allowed_empty_keys = {"models", "sources", "version"} | |
| for key in doc.keys(): | |
| if key not in allowed_empty_keys: | |
| # Some other top-level section (e.g., exposures, metrics, seeds) remains. | |
| return False | |
| # No models/sources and no other meaningful keys (possibly only version). | |
| return True |
| /// A source file is empty (safe to delete) when it has no sources and no models. | ||
| /// Leftover top-level fields like `version: 2` are not meaningful on their own. | ||
| fn source_file_is_empty(doc: &PropertyFile) -> bool { | ||
| doc.models.as_ref().is_none_or(|m| m.is_empty()) | ||
| && doc.sources.as_ref().is_none_or(|s| s.is_empty()) |
There was a problem hiding this comment.
source_file_is_empty() ignores doc.extras, so the writeback can delete an old YAML file even if it still has other top-level content beyond models/sources (those fields deserialize into extras). This risks losing unrelated schema content that shared the file. Align the deletion condition with property_file_is_empty() (or at least preserve non-trivial extras, e.g. delete only if extras is empty or only contains version).
| /// A source file is empty (safe to delete) when it has no sources and no models. | |
| /// Leftover top-level fields like `version: 2` are not meaningful on their own. | |
| fn source_file_is_empty(doc: &PropertyFile) -> bool { | |
| doc.models.as_ref().is_none_or(|m| m.is_empty()) | |
| && doc.sources.as_ref().is_none_or(|s| s.is_empty()) | |
| /// A source file is empty (safe to delete) when it has no sources, no models, | |
| /// and no non-trivial extra top-level fields. | |
| /// We only ignore trivial extras like an optional `version` field. | |
| fn source_file_is_empty(doc: &PropertyFile) -> bool { | |
| let extras_are_trivial = match &doc.extras { | |
| None => true, | |
| Some(extras) => { | |
| if extras.is_empty() { | |
| true | |
| } else if extras.len() == 1 { | |
| extras.contains_key("version") | |
| } else { | |
| false | |
| } | |
| } | |
| }; | |
| extras_are_trivial | |
| && doc | |
| .models | |
| .as_ref() | |
| .is_none_or(|m| m.is_empty()) | |
| && doc | |
| .sources | |
| .as_ref() | |
| .is_none_or(|s| s.is_empty()) |
| if let Some(name) = current.file_name().and_then(|n| n.to_str()) { | ||
| if name == expected_dir { | ||
| return Some(current.join(filename)); | ||
| } |
There was a problem hiding this comment.
compute_fix_path() returns Some(current.join(filename)) when it finds expected_dir higher in the path, which drops any subdirectory nesting below expected_dir. For example, models/marts/intermediate/stripe/dim_x.sql fixing to marts would become models/marts/dim_x.sql (losing stripe/). If the intent is to preserve subdirectories below the wrong tier (per the docs), adjust this branch to keep the relative path beneath the nearest wrong recognized directory (and avoid duplicating expected_dir when its parent is already the expected tier).
| // doesn't actually require python help, so we can just do it in Rust and not include in batch updates | ||
| ModelChange::MoveModelFile { | ||
| patch_path, | ||
| new_path, | ||
| .. | ||
| } => { | ||
| let patch = | ||
| patch_path | ||
| .clone() | ||
| .ok_or_else(|| WriteBackError::PatchPathMissing { | ||
| model_id: model_changes.model_id.clone(), | ||
| })?; | ||
| let src = if patch.is_absolute() { | ||
| patch | ||
| } else { | ||
| project_root.join(patch) | ||
| }; | ||
| let dst = if new_path.is_absolute() { | ||
| new_path.clone() | ||
| } else { | ||
| project_root.join(new_path) | ||
| }; | ||
| if let Some(parent) = dst.parent() { | ||
| std::fs::create_dir_all(parent)?; | ||
| } | ||
| std::fs::rename(&src, &dst)?; | ||
| } |
There was a problem hiding this comment.
apply_with_python() still requires model_changes.patch_path earlier in the loop, so models that only have MoveModelFile changes (e.g., directory fixes for SQL-only models without a properties YAML) will hit PatchPathMissing before this branch is reached. Consider handling MoveModelFile changes before requiring a properties patch_path, or restructuring writeback so file moves don’t depend on grouping/patch-path availability.
No description provided.