feat: parser refactoring, Jedi call resolution, and performance optimizations#158
feat: parser refactoring, Jedi call resolution, and performance optimizations#158gzenz wants to merge 1 commit intotirth8205:mainfrom
Conversation
6df1421 to
702ac5b
Compare
|
PR #158 supersedes #100, #102, #103, #104, #108. Detailed review: Strengths: lang/ strategy pattern is clean, Jedi pre-filtering (36s to 3s) is correct, v8 migration correctly accounts for #127 v7 slot, 829 tests. Issues before merge: (1) CI not yet run - must pass first. (2) v7/v8 migration dependency: if #127 does not land first, add a no-op v7 stub here. (3) Confirm jedi_resolver is lazy-imported when jedi is absent. (4) Check for duplicate logger = getLogger in search.py carried from #94. Overall: excellent quality, ready once CI is green. |
|
Merge conflict detected. This PR has merge conflicts with main (mergeStateStatus: DIRTY). Given the scope (parser.py refactoring, cli.py changes, migrations.py, skills.py), this is expected given ongoing merges. Please rebase on main to resolve conflicts. The v7/v8 migration chain should be re-verified after rebase — confirm what migration version main is at after any intervening merges. |
702ac5b to
0cc35e7
Compare
|
i'm rebasing again ... |
9445cbe to
320fac5
Compare
|
@tirth8205 given the size of this PR and your merging frequency, rebasing becomes a bit of a chore. do you want to ping me before you want to merge this, so i rebase then - or do you want to take care of rebasing after you merged whatever you like merging before this? |
9cbe3dc to
f8da885
Compare
|
You're completely right @gzenz, and I'm sorry for the rebase churn — the merge frequency was intentional for the Windows hotfix (#46/#136 was blocking real users) but I should have coordinated with you given the scope of #158. Here's a concrete plan, and I'll stick to it. The problem#158 is 72 files, 3,226 lines in Merge window proposalI will pause all non-urgent merges to
Ping me when you're ready and I'll:
What I need from youJust tell me when to start. Either:
If you're not sure which of those to pick: given the size, I'd actually suggest option 3 (split). The lang/ refactor, the Jedi resolver, the PreToolUse search enrichment, the community detection speedup, and the dead-code FP fixes are each strong contributions on their own and would be much easier to review as 4-5 PRs of ~500 lines each than one 3,226-line one. But that's your call — if you'd rather ship as-is, I'll do the full review. Concrete "what's currently on main" for your next rebaseIf you go with option 1 or 2, the base to rebase onto is: ``` The main conflict surface you'll hit is `parser.py` (because #227/#228 added Obj-C, Bash, and Elixir support there and you're refactoring that whole file into `lang/`). If you'd like, I can start a `lang/_objc.py`, `lang/_bash.py`, `lang/_elixir.py` stub for you to fold the v2.3.0 language code into — that might save you one more rebase round. Let me know. Again, sorry for the disruption. Thank you for the patience and for pushing this forward. |
|
@tirth8205 i set CC to a 15 min loop to rebase, run my extended regression tests on some of my larger projects in case of conflicts in my core contributions. So i think we're fine and you can continue merging what's important first. I'll just go to bed soon, so expect some break in case you continue today, i'll pick up tomorrow. And yes, it's more a refactor so I understand it'll take some work to tackle. I'd prefer to have it merged in one go, but if you rather have it split I can also do that tomorrow. |
f8da885 to
d44aba2
Compare
…izations Major improvements to code-review-graph spanning parser architecture, call graph accuracy, and build performance. Parser refactoring: - Extract 16 per-language handler modules into code_review_graph/lang/ using a strategy pattern, replacing monolithic conditionals in parser.py - Thread-safe parser caches with double-check locking Call graph enrichment: - Jedi-based Python method call resolution at build time (jedi_resolver.py) - Pre-scan filtering by project function names (36s to 3s on large repos) - Typed variable call enrichment (Python, JS/TS, Kotlin/Java) - Star import resolution, namespace imports, CommonJS require() - Angular template parsing, JSX handler tracking - Module-level import tracking and module-qualified call resolution - Function/class references passed as call arguments PreToolUse search enrichment: - New enrich.py module and code-review-graph enrich CLI command - Injects graph context (callers, flows, community, tests) into agent search results passively via hook Dead code false positive reduction: - Framework decorators recognized as entry points - CDK construct methods, abstract overrides excluded - E2e test directories excluded from dead code detection Performance: - Community detection: 48.6s to 2.3s (21x speedup) via bulk node loading and adjacency-indexed cohesion computation - Jedi enrichment: 36s to 3s (12x) via pre-scan filtering - Batch file storage (50-file transactions) - Batch risk_index (2 GROUP BY queries replace per-node loops) Other: - Weighted flow risk scoring by criticality - Transitive TESTED_BY lookup for tests_for and risk scoring - DB schema v8: composite edge index (v7 reserved by PR tirth8205#127) - --quiet and --json CLI flags - Search query deduplication, test function deprioritization - New [enrichment] optional dependency group for Jedi - 829+ tests across 26 test files (up from 615) Evaluated against Gadgetbridge (41k nodes, 280k edges): 8/10 PASS, call resolution rate improved from 28% to 39.6%.
d44aba2 to
3336fd5
Compare
|
@tirth8205 Thanks for pausing merges -- I'll split #158 into smaller PRs to make review easier. Here's the plan: PR 1: Lang handler refactor (~2,800 lines, targets main)
PR 2: Parser call resolution & accuracy (~3,500 lines, targets PR 1)
PR 3: Jedi resolver (~500 lines, targets main)
PR 4: Search enrichment (~1,200 lines, targets main)
PR 5: Community, flows & dead code (~2,500 lines, targets main)
PR 6: Tools, CLI & infrastructure (~5,000 lines, after 1-5 merge)
PRs 1, 3, 4, 5 can be reviewed in parallel (all target main independently). PR 2 chains off PR 1. PR 6 wires everything together after the rest lands. Will start opening these shortly. |
|
Split PRs are up:
PRs 1-4 are independent and can be reviewed/merged in any order. PR 5 wires everything together after the others land. All branches pass full test suites + lint + mypy + bandit. This PR (#158) can be closed once the split PRs are merged. |
Summary
Major improvements spanning parser architecture, call graph accuracy, build performance, and dead code detection:
code_review_graph/lang/using a strategy pattern, replacing monolithic conditionals inparser.py. Thread-safe parser caches with double-check locking.jedi_resolver.pyresolves Python method calls at build time. Pre-scan filtering by project function names reduces enrichment from 36s to 3s on large repos. New[enrichment]optional dependency group.enrich.pymodule andcode-review-graph enrichCLI command inject graph context (callers, flows, community, tests) into agent search results passively via hook.require(), Angular template parsing, JSX handler tracking, module-qualified call resolution, function/class references as arguments.--quiet/--jsonCLI flags, search deduplication, 829+ tests (up from 615).Evaluation
Tested against Gadgetbridge (41k nodes, 280k edges):
Migration note
Our composite edge index migration is numbered v8 to avoid conflict with v6 (summary tables, already on main) and v7 (reserved by PR #127).
Test plan
uv run pytest tests/ --tb=short -q-- 829 passed, 4 skippeduv run ruff check code_review_graph/-- all checks passed