Skip to content

fix: extend default ignore patterns and fix nested path matching#92

Open
E-ChintanGohil wants to merge 3 commits intotirth8205:mainfrom
E-ChintanGohil:feat/extend-default-ignore-patterns
Open

fix: extend default ignore patterns and fix nested path matching#92
E-ChintanGohil wants to merge 3 commits intotirth8205:mainfrom
E-ChintanGohil:feat/extend-default-ignore-patterns

Conversation

@E-ChintanGohil
Copy link
Copy Markdown

Summary

  • Fix _should_ignore to handle nested dependency directories (e.g., packages/app/node_modules/) by checking path segments, not just prefix matching
  • Extend DEFAULT_IGNORE_PATTERNS with common framework patterns: PHP/Laravel (vendor/, storage/), Ruby/Rails, Java/Gradle, .NET, Dart/Flutter, and general (coverage/, .cache/, tmp/)

Closes #91

Changes

  • code_review_graph/incremental.py: Updated _should_ignore with PurePosixPath.parts segment matching + added 15 new default patterns
  • tests/test_incremental.py: Added test_should_ignore_nested_paths and test_should_ignore_framework_patterns

Test plan

  • All existing tests pass
  • New tests cover nested node_modules, vendor, storage paths
  • New tests verify framework-specific patterns (Laravel, .NET, Dart, Gradle)
  • Source files (src/, app/) confirmed not affected by new patterns

The `_should_ignore` function only used `fnmatch.fnmatch()` which matches
from the start of the path. This means nested dependency directories like
`packages/app/node_modules/react/index.js` were NOT ignored — only
top-level `node_modules/` was caught.

This is a problem in monorepos, workspaces, and any project with nested
dependency directories. The graph would parse thousands of third-party
files, inflating build time and polluting blast radius queries.

Fix: check if any path segment matches the pattern prefix (e.g., if
"node_modules" appears anywhere in the path parts).

Also extends DEFAULT_IGNORE_PATTERNS to cover common frameworks:
- PHP/Laravel: vendor/, storage/, bootstrap/cache/, public/build/
- Ruby/Rails: vendor/bundle/, .bundle/
- Java/Kotlin: .gradle/, *.jar
- .NET/C#: bin/, obj/, packages/
- Dart/Flutter: .dart_tool/, .pub-cache/
- General: coverage/, .cache/, tmp/, .nuxt/
Copy link
Copy Markdown

@Akashnkumar1 Akashnkumar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Copy Markdown
Owner

@tirth8205 tirth8205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this! The nested path matching is a real problem. However, the current fix has a significant issue:

packages/** in DEFAULT_IGNORE_PATTERNS will break monorepos. Every npm workspace, Lerna, and Turborepo project uses a packages/ directory for source code. This pattern would ignore all source files in monorepos.

Similarly, bin/**, build/**, storage/**, vendor/** are common source directory names that would cause false positives when matched against any path segment.

The root cause: the PurePosixPath.parts check matches single-segment prefixes anywhere in the path, so packages/** matches packages/app/src/main.ts (which is actual source code).

Suggested approach: Instead of matching pattern prefixes against any path segment, use **/ prefix patterns for things that are safe to match anywhere (like node_modules, __pycache__), and keep root-relative patterns for others. For example:

  • **/node_modules/** — safe anywhere
  • **/vendor/** — probably safe anywhere
  • build/** — only at root (not src/build/)

Also: the from pathlib import PurePosixPath import is inside _should_ignore() which is a hot path — please move it to module level.

Addresses @tirth8205's review concerns:

1. `packages/**` in defaults breaks monorepos (npm workspaces, Lerna,
   Turborepo all use `packages/` for source code). Removed from defaults;
   .NET users can add via `.code-review-graphignore`.

2. `PurePosixPath.parts` matching anywhere was too aggressive —
   `packages/app/src/main.ts` would match `packages/**`. Split patterns
   into two explicit styles:

   - `**/name/**` — matches `name` as any path segment (safe-anywhere).
     Used only for dirs that are NEVER source code: node_modules,
     __pycache__, .venv, venv, vendor, .bundle, .gradle, .dart_tool,
     .pub-cache, .cache.

   - `name/**` — matches only at repo root. Used for dirs that may be
     valid source names: bin/, obj/, build/, dist/, target/, .next/,
     .nuxt/, storage/, bootstrap/cache/, public/build/, coverage/,
     tmp/, .tmp/.

3. Moved `from pathlib import PurePosixPath` to module level (it was
   inside the hot-path `_should_ignore` function).

Multi-segment root prefixes like `bootstrap/cache/**` are also
supported correctly.

Test coverage added for:
- Safe-anywhere matching nested dependency dirs
- Root-relative patterns NOT matching nested `name/` dirs
- Multi-segment prefix matching
- Monorepo source code (packages/app/src/) preserved
@E-ChintanGohil
Copy link
Copy Markdown
Author

Thanks for the review — you're absolutely right, packages/** breaking monorepos was a critical oversight. Pushed a fix (b7c2793) that addresses all three points:

1. Explicit pattern styles

  • **/name/** — safe-anywhere (only for dirs never used as source code): node_modules, __pycache__, .venv, venv, vendor, .bundle, .gradle, .dart_tool, .pub-cache, .cache
  • name/** — root-relative only: bin, obj, build, dist, target, .next, .nuxt, storage, bootstrap/cache, public/build, coverage, tmp, .tmp

2. Removed packages/** from defaults. The npm/Lerna/Turborepo conflict makes it unsafe as a default — .NET users can add it via .code-review-graphignore if needed.

3. Moved PurePosixPath import to module level (out of the hot path).

New tests verify:

  • Nested packages/app/src/main.ts is NOT ignored (monorepo source preserved)
  • Nested packages/app/node_modules/* IS ignored
  • Multi-segment prefixes like bootstrap/cache/** work correctly
  • Root-only bin/** doesn't match services/api/bin/helper.sh

All 28 tests in test_incremental.py pass.

@tirth8205
Copy link
Copy Markdown
Owner

Review: PR #92 — fix: extend default ignore patterns and fix nested path matching

The author addressed all three concerns from the owner's CHANGES_REQUESTED review:

  • Removed packages/** from defaults (avoids monorepo false positives)
  • Implemented two-tier pattern semantics (/name/ for safe-anywhere, name/** for root-relative)
  • Moved PurePosixPath import to module level

The updated implementation is correct and the test suite is comprehensive — tests verify:

  • Nested node_modules, vendor, pycache ARE ignored
  • Nested packages/app/src/main.ts is NOT ignored (monorepo source preserved)
  • Root-only bin/, obj/, storage/, bootstrap/cache/ work correctly
  • Multi-segment prefix like bootstrap/cache/** matches only at root

One minor issue: The _should_ignore function has a long line (the tuple comparison on multi-segment prefixes) that exceeds the project's 100-char line length limit. This may cause a ruff lint failure:

if len(parts) >= len(prefix_parts) and tuple(parts[: len(prefix_parts)]) == tuple(prefix_parts):

This is 104 characters. Please wrap it.

Verdict: Approve and merge once the line length issue is resolved and CI passes. The logic is sound, the tests are thorough, and all owner-requested changes were addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Nested dependency dirs not ignored + missing framework patterns in defaults

3 participants