fix: extend default ignore patterns and fix nested path matching #92
E-ChintanGohil wants to merge 3 commits into tirth8205:main from
Conversation
The `_should_ignore` function only used `fnmatch.fnmatch()`, which matches from the start of the path. This means nested dependency directories like `packages/app/node_modules/react/index.js` were NOT ignored; only top-level `node_modules/` was caught.

This is a problem in monorepos, workspaces, and any project with nested dependency directories. The graph would parse thousands of third-party files, inflating build time and polluting blast radius queries.

Fix: check if any path segment matches the pattern prefix (e.g., if `node_modules` appears anywhere in the path parts).

Also extends `DEFAULT_IGNORE_PATTERNS` to cover common frameworks:

- PHP/Laravel: `vendor/`, `storage/`, `bootstrap/cache/`, `public/build/`
- Ruby/Rails: `vendor/bundle/`, `.bundle/`
- Java/Kotlin: `.gradle/`, `*.jar`
- .NET/C#: `bin/`, `obj/`, `packages/`
- Dart/Flutter: `.dart_tool/`, `.pub-cache/`
- General: `coverage/`, `.cache/`, `tmp/`, `.nuxt/`
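The nested-path problem described above can be reproduced with the standard library alone. This is a minimal sketch that assumes the defaults contain a pattern shaped like `node_modules/**`; the actual entries in `DEFAULT_IGNORE_PATTERNS` may differ.

```python
import fnmatch

# Assumed pattern shape, chosen for illustration only.
PATTERN = "node_modules/**"

top_level = "node_modules/react/index.js"
nested = "packages/app/node_modules/react/index.js"

# fnmatch has to match the whole string, so the pattern is effectively
# anchored at the first character of the path.
top_level_ignored = fnmatch.fnmatch(top_level, PATTERN)
nested_ignored = fnmatch.fnmatch(nested, PATTERN)

print(top_level_ignored)  # True
print(nested_ignored)     # False: the nested dependency directory slips through
```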
tirth8205
left a comment
Thanks for tackling this! The nested path matching is a real problem. However, the current fix has a significant issue:
`packages/**` in `DEFAULT_IGNORE_PATTERNS` will break monorepos. Every npm workspace, Lerna, and Turborepo project uses a `packages/` directory for source code. This pattern would ignore all source files in monorepos.
Similarly, `bin/**`, `build/**`, `storage/**`, `vendor/**` are common source directory names that would cause false positives when matched against any path segment.
The root cause: the `PurePosixPath.parts` check matches single-segment prefixes anywhere in the path, so `packages/**` matches `packages/app/src/main.ts` (which is actual source code).
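The over-matching is easy to see in isolation. The function below is a hypothetical reconstruction of the "prefix appears as any path segment" check, written only from the description in this thread; it is not the code from the PR.

```python
from pathlib import PurePosixPath


def naive_segment_match(path: str, pattern: str) -> bool:
    """Hypothetical check: ignore if the pattern's prefix is any path segment."""
    prefix = pattern[:-3] if pattern.endswith("/**") else pattern
    return prefix in PurePosixPath(path).parts


# Intended hit: a nested dependency directory.
print(naive_segment_match("packages/app/node_modules/react/index.js", "node_modules/**"))  # True

# False positives: real source files are ignored as well.
print(naive_segment_match("packages/app/src/main.ts", "packages/**"))  # True
print(naive_segment_match("src/build/helper.py", "build/**"))          # True
```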
Suggested approach: Instead of matching pattern prefixes against any path segment, use `**/` prefix patterns for things that are safe to match anywhere (like `node_modules`, `__pycache__`), and keep root-relative patterns for others. For example:

- `**/node_modules/**`: safe anywhere
- `**/vendor/**`: probably safe anywhere
- `build/**`: only at root (not `src/build/`)
Also: the `from pathlib import PurePosixPath` import is inside `_should_ignore()`, which is a hot path. Please move it to module level.
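For context on the import remark: a function-level `import` is re-executed on every call (after the first call it is mostly a `sys.modules` lookup plus a local name binding), which can add up in a function invoked once per file. The rough micro-benchmark below uses two hypothetical helper names and makes no claim about the actual numbers in this project; timings vary by machine and Python version.

```python
import timeit
from pathlib import PurePosixPath  # module-level import, resolved once at load time


def parts_with_local_import(path: str) -> tuple:
    # The import statement runs again on every call.
    from pathlib import PurePosixPath
    return PurePosixPath(path).parts


def parts_with_module_import(path: str) -> tuple:
    # Uses the name already bound at module level.
    return PurePosixPath(path).parts


if __name__ == "__main__":
    sample = "packages/app/node_modules/react/index.js"
    for fn in (parts_with_local_import, parts_with_module_import):
        seconds = timeit.timeit(lambda: fn(sample), number=50_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```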
Addresses @tirth8205's review concerns:

1. `packages/**` in defaults breaks monorepos (npm workspaces, Lerna, Turborepo all use `packages/` for source code). Removed from defaults; .NET users can add it via `.code-review-graphignore`.
2. `PurePosixPath.parts` matching anywhere was too aggressive: `packages/app/src/main.ts` would match `packages/**`. Split patterns into two explicit styles:
   - `**/name/**`: matches `name` as any path segment (safe-anywhere). Used only for dirs that are NEVER source code: `node_modules`, `__pycache__`, `.venv`, `venv`, `vendor`, `.bundle`, `.gradle`, `.dart_tool`, `.pub-cache`, `.cache`.
   - `name/**`: matches only at repo root. Used for dirs that may be valid source names: `bin/`, `obj/`, `build/`, `dist/`, `target/`, `.next/`, `.nuxt/`, `storage/`, `bootstrap/cache/`, `public/build/`, `coverage/`, `tmp/`, `.tmp/`.
3. Moved `from pathlib import PurePosixPath` to module level (it was inside the hot-path `_should_ignore` function).

Multi-segment root prefixes like `bootstrap/cache/**` are also supported correctly.

Test coverage added for:

- Safe-anywhere matching nested dependency dirs
- Root-relative patterns NOT matching nested `name/` dirs
- Multi-segment prefix matching
- Monorepo source code (`packages/app/src/`) preserved
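The two pattern styles described in this commit message could be implemented roughly as follows. This is a sketch under stated assumptions, not the code from the PR: the function name `should_ignore_sketch`, the short pattern list, and the `fnmatch` fallback for patterns such as `*.jar` are all illustrative.

```python
import fnmatch
from pathlib import PurePosixPath

# Illustrative subset; not the project's real DEFAULT_IGNORE_PATTERNS.
EXAMPLE_PATTERNS = ["**/node_modules/**", "build/**", "bootstrap/cache/**", "*.jar"]


def should_ignore_sketch(path: str, patterns=EXAMPLE_PATTERNS) -> bool:
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if len(pattern) > 6 and pattern.startswith("**/") and pattern.endswith("/**"):
            # Safe-anywhere style: the name may be any segment of the path.
            if pattern[3:-3] in parts:
                return True
        elif pattern.endswith("/**"):
            # Root-relative style: the leading segments must equal the prefix.
            # Slicing a shorter tuple yields a shorter tuple, so no separate
            # length check is needed in this formulation.
            prefix = PurePosixPath(pattern[:-3]).parts
            if parts[: len(prefix)] == prefix:
                return True
        elif fnmatch.fnmatch(path, pattern):
            # Assumed fallback for plain globs such as "*.jar".
            return True
    return False


print(should_ignore_sketch("packages/app/node_modules/react/index.js"))  # True
print(should_ignore_sketch("src/build/helper.py"))                       # False
print(should_ignore_sketch("bootstrap/cache/services.php"))              # True
print(should_ignore_sketch("packages/app/src/main.ts"))                  # False
```

The four calls mirror the test scenarios listed in the commit message: nested dependency directories, root-only matching, multi-segment prefixes, and preserved monorepo source.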
Thanks for the review, you're absolutely right,

1. Explicit pattern styles
2. Removed
3. Moved

New tests verify:

All 28 tests in
Review: PR #92 (fix: extend default ignore patterns and fix nested path matching)

The author addressed all three concerns from the owner's CHANGES_REQUESTED review:
The updated implementation is correct and the test suite is comprehensive — tests verify:
One minor issue: The line `if len(parts) >= len(prefix_parts) and tuple(parts[: len(prefix_parts)]) == tuple(prefix_parts):` is 104 characters. Please wrap it.

Verdict: Approve and merge once the line length issue is resolved and CI passes. The logic is sound, the tests are thorough, and all owner-requested changes were addressed.
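One way to wrap the flagged condition is to split it inside parentheses, which is the layout formatters such as Black typically produce. The surrounding function `matches_root_prefix` is hypothetical and exists only so the snippet runs on its own.

```python
def matches_root_prefix(parts, prefix_parts) -> bool:
    """Hypothetical wrapper around the condition quoted in the review."""
    # Same boolean expression as the original line, broken at the `and`
    # so each physical line stays short.
    if (
        len(parts) >= len(prefix_parts)
        and tuple(parts[: len(prefix_parts)]) == tuple(prefix_parts)
    ):
        return True
    return False


print(matches_root_prefix(("bootstrap", "cache", "services.php"), ("bootstrap", "cache")))  # True
print(matches_root_prefix(("src", "build", "helper.py"), ("build",)))                       # False
```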
Summary
- Fix `_should_ignore` to handle nested dependency directories (e.g., `packages/app/node_modules/`) by checking path segments, not just prefix matching
- Extend `DEFAULT_IGNORE_PATTERNS` with common framework patterns: PHP/Laravel (`vendor/`, `storage/`), Ruby/Rails, Java/Gradle, .NET, Dart/Flutter, and general (`coverage/`, `.cache/`, `tmp/`)

Closes #91
Changes
- `code_review_graph/incremental.py`: Updated `_should_ignore` with `PurePosixPath.parts` segment matching + added 15 new default patterns
- `tests/test_incremental.py`: Added `test_should_ignore_nested_paths` and `test_should_ignore_framework_patterns`

Test plan
`src/`, `app/`) confirmed not affected by new patterns