Skip to content

feat: parser refactoring, Jedi call resolution, and performance optimizations#158

Open
gzenz wants to merge 1 commit intotirth8205:mainfrom
gzenz:feat/lang-handlers-jedi-resolver-and-perf
Open

feat: parser refactoring, Jedi call resolution, and performance optimizations#158
gzenz wants to merge 1 commit intotirth8205:mainfrom
gzenz:feat/lang-handlers-jedi-resolver-and-perf

Conversation

@gzenz
Copy link
Copy Markdown
Contributor

@gzenz gzenz commented Apr 8, 2026

Summary

Major improvements spanning parser architecture, call graph accuracy, build performance, and dead code detection:

  • Parser refactoring: Extracted 16 per-language handler modules into code_review_graph/lang/ using a strategy pattern, replacing monolithic conditionals in parser.py. Thread-safe parser caches with double-check locking.
  • Jedi-based call resolution: New jedi_resolver.py resolves Python method calls at build time. Pre-scan filtering by project function names reduces enrichment from 36s to 3s on large repos. New [enrichment] optional dependency group.
  • PreToolUse search enrichment: New enrich.py module and code-review-graph enrich CLI command inject graph context (callers, flows, community, tests) into agent search results passively via hook.
  • Call graph improvements: Typed variable call enrichment (Python, JS/TS, Kotlin/Java), star import resolution, namespace imports, CommonJS require(), Angular template parsing, JSX handler tracking, module-qualified call resolution, function/class references as arguments.
  • Dead code FP reduction: Framework decorators recognized as entry points, CDK construct methods and abstract overrides excluded, e2e test directories filtered.
  • Community detection 21x speedup: Bulk node loading + adjacency-indexed cohesion computation (48.6s to 2.3s on 41k-node repos).
  • Build performance: Batch file storage (50-file transactions), batch risk_index (2 GROUP BY queries replace per-node loops).
  • DB schema v8: Composite edge index for upsert performance (v7 reserved by PR fix: add sqlite edge compound indexes #127).
  • Other: Weighted flow risk scoring, transitive TESTED_BY, --quiet/--json CLI flags, search deduplication, 829+ tests (up from 615).

Evaluation

Tested against Gadgetbridge (41k nodes, 280k edges):

  • 8/10 scorecard PASS (callers_of, callees_of, tests_for, communities, flows, impact radius, risk scores)
  • Call resolution rate improved from 28% to 39.6%
  • Community detection: 48.6s to 2.3s
  • Full build time: ~22s for 3,574 files

Migration note

Our composite edge index migration is numbered v8 to avoid conflict with v6 (summary tables, already on main) and v7 (reserved by PR #127).

Test plan

  • uv run pytest tests/ --tb=short -q -- 829 passed, 4 skipped
  • uv run ruff check code_review_graph/ -- all checks passed
  • Full rebuild + evaluation on Gadgetbridge (41k nodes)
  • Full rebuild + evaluation on internal Python/TS/React project (261 files)
  • CI pipeline (lint, type-check, security, test matrix)

@gzenz gzenz force-pushed the feat/lang-handlers-jedi-resolver-and-perf branch from 6df1421 to 702ac5b Compare April 8, 2026 20:43
@tirth8205
Copy link
Copy Markdown
Owner

PR #158 supersedes #100, #102, #103, #104, #108. Detailed review: Strengths: lang/ strategy pattern is clean, Jedi pre-filtering (36s to 3s) is correct, v8 migration correctly accounts for #127 v7 slot, 829 tests. Issues before merge: (1) CI not yet run - must pass first. (2) v7/v8 migration dependency: if #127 does not land first, add a no-op v7 stub here. (3) Confirm jedi_resolver is lazy-imported when jedi is absent. (4) Check for duplicate logger = getLogger in search.py carried from #94. Overall: excellent quality, ready once CI is green.

@tirth8205
Copy link
Copy Markdown
Owner

Merge conflict detected. This PR has merge conflicts with main (mergeStateStatus: DIRTY). Given the scope (parser.py refactoring, cli.py changes, migrations.py, skills.py), this is expected given ongoing merges. Please rebase on main to resolve conflicts. The v7/v8 migration chain should be re-verified after rebase — confirm what migration version main is at after any intervening merges.

@gzenz
Copy link
Copy Markdown
Contributor Author

gzenz commented Apr 11, 2026

i'm rebasing again ...

@gzenz gzenz force-pushed the feat/lang-handlers-jedi-resolver-and-perf branch 2 times, most recently from 9445cbe to 320fac5 Compare April 11, 2026 19:54
@gzenz
Copy link
Copy Markdown
Contributor Author

gzenz commented Apr 11, 2026

@tirth8205 given the size of this PR and your merging frequency, rebasing becomes a bit of a chore. do you want to ping me before you want to merge this, so i rebase then - or do you want to take care of rebasing after you merged whatever you like merging before this?

@gzenz gzenz force-pushed the feat/lang-handlers-jedi-resolver-and-perf branch 3 times, most recently from 9cbe3dc to f8da885 Compare April 11, 2026 21:07
@tirth8205
Copy link
Copy Markdown
Owner

You're completely right @gzenz, and I'm sorry for the rebase churn — the merge frequency was intentional for the Windows hotfix (#46/#136 was blocking real users) but I should have coordinated with you given the scope of #158.

Here's a concrete plan, and I'll stick to it.

The problem

#158 is 72 files, 3,226 lines in parser.py alone, 829 tests, DB schema v8. This is an architectural refactor — it's not a PR I can responsibly merge without a proper review pass, and I can't keep making you rebase while I'm shipping small fixes on top of main. So let's stop the churn.

Merge window proposal

I will pause all non-urgent merges to main until #158 lands or we explicitly decide to defer. "Non-urgent" means anything that isn't a fresh security fix or a Windows-blocker-class bug. The queue I'm deferring:

Ping me when you're ready and I'll:

  1. Not merge anything new to `main` from the moment you confirm until feat: parser refactoring, Jedi call resolution, and performance optimizations #158 is reviewed/merged/closed.
  2. Review feat: parser refactoring, Jedi call resolution, and performance optimizations #158 in one focused pass — architecture-level first (parser.py split into lang/*, jedi_resolver, enrich module boundaries), then per-file correctness, then the 829-test suite.
  3. Coordinate with you on any review comments in-thread so the last rebase is the right one.
  4. Merge cleanly once we're both happy.

What I need from you

Just tell me when to start. Either:

  • "Start now" — I begin the review today, pause all other merges, we iterate until it lands.
  • "I'll ping you when rebased on v2.3.1" — I hold the merge pause until you're ready.
  • "I want to split this into smaller PRs first" — totally valid given the scope; I can pair with you on the split.

If you're not sure which of those to pick: given the size, I'd actually suggest option 3 (split). The lang/ refactor, the Jedi resolver, the PreToolUse search enrichment, the community detection speedup, and the dead-code FP fixes are each strong contributions on their own and would be much easier to review as 4-5 PRs of ~500 lines each than one 3,226-line one. But that's your call — if you'd rather ship as-is, I'll do the full review.

Concrete "what's currently on main" for your next rebase

If you go with option 1 or 2, the base to rebase onto is:

```
02a7bc5 chore: release v2.3.1 (#233) ← current main
7093e56 fix: offload long-running MCP tools (#231)
e366db8 chore: release v2.3.0 (#229)
3288f59 feat: Elixir + dry-run + CRG_DATA_DIR + cloud-warn + docs (#228)
8c1a7fb feat: qwen + objc + bash (#227)
c586202 chore: release v2.2.4 (#225)
```

The main conflict surface you'll hit is `parser.py` (because #227/#228 added Obj-C, Bash, and Elixir support there and you're refactoring that whole file into `lang/`). If you'd like, I can start a `lang/_objc.py`, `lang/_bash.py`, `lang/_elixir.py` stub for you to fold the v2.3.0 language code into — that might save you one more rebase round. Let me know.

Again, sorry for the disruption. Thank you for the patience and for pushing this forward.

@gzenz
Copy link
Copy Markdown
Contributor Author

gzenz commented Apr 11, 2026

@tirth8205 i set CC to a 15 min loop to rebase, run my extended regression tests on some of my larger projects in case of conflicts in my core contributions. So i think we're fine and you can continue merging what's important first. I'll just go to bed soon, so expect some break in case you continue today, i'll pick up tomorrow. And yes, it's more a refactor so I understand it'll take some work to tackle. I'd prefer to have it merged in one go, but if you rather have it split I can also do that tomorrow.

@gzenz gzenz force-pushed the feat/lang-handlers-jedi-resolver-and-perf branch from f8da885 to d44aba2 Compare April 11, 2026 21:21
…izations

Major improvements to code-review-graph spanning parser architecture,
call graph accuracy, and build performance.

Parser refactoring:
- Extract 16 per-language handler modules into code_review_graph/lang/
  using a strategy pattern, replacing monolithic conditionals in parser.py
- Thread-safe parser caches with double-check locking

Call graph enrichment:
- Jedi-based Python method call resolution at build time (jedi_resolver.py)
- Pre-scan filtering by project function names (36s to 3s on large repos)
- Typed variable call enrichment (Python, JS/TS, Kotlin/Java)
- Star import resolution, namespace imports, CommonJS require()
- Angular template parsing, JSX handler tracking
- Module-level import tracking and module-qualified call resolution
- Function/class references passed as call arguments

PreToolUse search enrichment:
- New enrich.py module and code-review-graph enrich CLI command
- Injects graph context (callers, flows, community, tests) into agent
  search results passively via hook

Dead code false positive reduction:
- Framework decorators recognized as entry points
- CDK construct methods, abstract overrides excluded
- E2e test directories excluded from dead code detection

Performance:
- Community detection: 48.6s to 2.3s (21x speedup) via bulk node
  loading and adjacency-indexed cohesion computation
- Jedi enrichment: 36s to 3s (12x) via pre-scan filtering
- Batch file storage (50-file transactions)
- Batch risk_index (2 GROUP BY queries replace per-node loops)

Other:
- Weighted flow risk scoring by criticality
- Transitive TESTED_BY lookup for tests_for and risk scoring
- DB schema v8: composite edge index (v7 reserved by PR tirth8205#127)
- --quiet and --json CLI flags
- Search query deduplication, test function deprioritization
- New [enrichment] optional dependency group for Jedi
- 829+ tests across 26 test files (up from 615)

Evaluated against Gadgetbridge (41k nodes, 280k edges): 8/10 PASS,
call resolution rate improved from 28% to 39.6%.
@gzenz gzenz force-pushed the feat/lang-handlers-jedi-resolver-and-perf branch from d44aba2 to 3336fd5 Compare April 11, 2026 21:52
@gzenz
Copy link
Copy Markdown
Contributor Author

gzenz commented Apr 12, 2026

@tirth8205 Thanks for pausing merges -- I'll split #158 into smaller PRs to make review easier. Here's the plan:

PR 1: Lang handler refactor (~2,800 lines, targets main)

  • code_review_graph/lang/ (21 handler files + base class)
  • parser.py handler dispatch only
  • test_multilang.py additions

PR 2: Parser call resolution & accuracy (~3,500 lines, targets PR 1)

  • Typed variable walkers, star import resolution, method blocklist
  • Angular template parsing, Dart call extraction, receiver filtering
  • test_parser.py + test_pain_points.py

PR 3: Jedi resolver (~500 lines, targets main)

  • jedi_resolver.py + jedi optional dep

PR 4: Search enrichment (~1,200 lines, targets main)

  • enrich.py (PreToolUse hook for graph-enriched search results)
  • skills.py hook registration

PR 5: Community, flows & dead code (~2,500 lines, targets main)

  • Community detection scaling, flow improvements, dead code FP fixes
  • DB migrations v7 stub + v8 composite index

PR 6: Tools, CLI & infrastructure (~5,000 lines, after 1-5 merge)

  • MCP tools, CLI integration, visualization, incremental build

PRs 1, 3, 4, 5 can be reviewed in parallel (all target main independently). PR 2 chains off PR 1. PR 6 wires everything together after the rest lands.

Will start opening these shortly.

@gzenz
Copy link
Copy Markdown
Contributor Author

gzenz commented Apr 12, 2026

Split PRs are up:

  1. feat: lang handler refactor + parser improvements #246 -- Lang handler refactor + parser improvements (46 files, +7.1k/-1.2k) -- targets main
  2. feat: Jedi-based Python call resolution #247 -- Jedi-based Python call resolution (2 files, +307) -- targets main
  3. feat: search enrichment via PreToolUse hooks #248 -- Search enrichment via PreToolUse hooks (2 files, +540) -- targets main
  4. feat: community detection, flow analysis, and dead code improvements #249 -- Community detection, flows, dead code improvements (14 files, +2.6k/-321) -- targets main
  5. feat: tools, CLI, visualization, and infrastructure integration #250 -- Tools, CLI, visualization, infrastructure (39 files, +7.2k/-506) -- depends on feat: lang handler refactor + parser improvements #246-feat: community detection, flow analysis, and dead code improvements #249

PRs 1-4 are independent and can be reviewed/merged in any order. PR 5 wires everything together after the others land. All branches pass full test suites + lint + mypy + bandit.

This PR (#158) can be closed once the split PRs are merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants