[experiment] Commit miner: extract migration hints and rules from completed migrations#925
fabianvf wants to merge 8 commits into
Split server.py (~1150 lines) into focused modules:
- settings.py: SolutionServerSettings
- resources.py: _SharedResources, KaiSolutionServerContext, with_db_recovery
- service.py: all business logic + new query/collection/bulk functions
- server.py: slimmed to ~200 lines (MCP tool wrappers only)

Added _get_kai_ctx() helper to eliminate repeated context extraction boilerplate. Moved session_maker guard into with_db_recovery decorator. Updated test imports accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
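For readers unfamiliar with the pattern, here is a minimal sketch of what a guard-plus-recovery decorator could look like. It is not the code from resources.py: `FakeContext`, the `RuntimeError` message, and the single retry on `ConnectionError` are all assumptions made for illustration. Only the idea that the decorator checks `session_maker` before calling the wrapped function comes from the commit message.

```python
import asyncio
import functools


class FakeContext:
    """Hypothetical stand-in for KaiSolutionServerContext (only one attribute)."""

    def __init__(self, session_maker=None):
        self.session_maker = session_maker


def with_db_recovery(func):
    """Sketch: guard on session_maker, then retry once on a connection error."""

    @functools.wraps(func)
    async def wrapper(ctx, *args, **kwargs):
        # Guard lives in the decorator, so wrapped functions need not repeat it.
        if ctx.session_maker is None:
            raise RuntimeError("session maker is not initialized")
        try:
            return await func(ctx, *args, **kwargs)
        except ConnectionError:
            # Assumed recovery step: one plain retry. The real behaviour is
            # not described in the commit message.
            return await func(ctx, *args, **kwargs)

    return wrapper


calls = []


@with_db_recovery
async def count_incidents(ctx):
    """Toy business function that fails on its first call only."""
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("connection dropped")
    return 42


result = asyncio.run(count_incidents(FakeContext(session_maker=object())))

try:
    asyncio.run(count_incidents(FakeContext()))
    guard_triggered = False
except RuntimeError:
    guard_triggered = True
```

Keeping the guard in one place means each service function can assume a usable session maker.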
Add a FastAPI REST API alongside the MCP server, sharing the same DB pool and lifespan. Endpoints under /api/v1/ for incidents, solutions, violations, hints, collections (CRUD), and bulk commit ingestion.

New DBCollection model with association tables for grouping mined solutions by source repo, migration type, or review batch. Alembic migration included.

Composite app startup: MCP at /, REST at /api/v1/ for streamable-http mode. Added fastapi and greenlet as dependencies. 20 REST API tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
New package that extracts migration hints and analysis rules from completed migrations. Given two git refs (before/after), it:
1. Infers the migration type from manifest diffs (pom.xml, etc.)
2. Auto-selects kantra label selectors matching the inferred migration
3. Runs static analysis on both refs via kantra (container mode)
4. Diffs analysis reports to find resolved violations
5. Attributes resolved violations to specific code changes in the diff
6. Generates hints per violation type via LLM (with skip/refine logic)
7. Discovers new analyzer-lsp rules from unattributed changes
8. Outputs rules as real analyzer-lsp YAML detecting pre-migration patterns
9. Produces self-contained HTML reports with relevance filtering

Pluggable analyzer backends (kantra, precomputed, none). Qualitative migration relevance scoring (high/medium/low/very_low with reasoning). LLM token tracking with cost estimates. Dry-run mode with rich JSON output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
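Steps 4 and 5 can be illustrated with a small sketch. This is not the package's implementation: the `Violation` and `Hunk` shapes, the rule IDs, and the choice to match on (rule, file) and attribute by line overlap are assumptions picked to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """One analyzer finding (hypothetical shape, not the kantra report schema)."""
    rule_id: str
    file: str
    line: int


@dataclass(frozen=True)
class Hunk:
    """A changed region of a pre-migration file, as an inclusive line range."""
    file: str
    start: int
    end: int


def resolved_violations(before, after):
    """Return 'before' violations whose rule no longer fires on the same file.

    Matching on (rule_id, file) rather than line number keeps findings that
    merely moved within a file from being counted as resolved.
    """
    still_firing = {(v.rule_id, v.file) for v in after}
    return [v for v in before if (v.rule_id, v.file) not in still_firing]


def attribute(violations, hunks):
    """Split violations into those inside a changed hunk and the rest."""
    attributed, unattributed = [], []
    for v in violations:
        hit = next(
            (h for h in hunks if h.file == v.file and h.start <= v.line <= h.end),
            None,
        )
        if hit is None:
            unattributed.append(v)
        else:
            attributed.append((v, hit))
    return attributed, unattributed


# Made-up rule IDs and paths, for illustration only.
before = [
    Violation("example-rule-1", "src/OrderService.java", 12),
    Violation("example-rule-2", "src/CartService.java", 40),
    Violation("example-rule-3", "pom.xml", 5),
]
after = [Violation("example-rule-3", "pom.xml", 7)]  # moved, not resolved
hunks = [Hunk("src/OrderService.java", 10, 15)]

resolved = resolved_violations(before, after)
attributed, unattributed = attribute(resolved, hunks)
```

In this toy data, `example-rule-1` is attributed to the hunk covering lines 10 to 15, while `example-rule-2` is resolved but left unattributed because no hunk touches its file.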
32 unit tests covering:
- git/diff_parser: unified diff parsing (6 tests)
- git/commit_walker: linear walking, start/end ranges (7 tests)
- diff/report_differ: resolved, new, moved, mixed changes (7 tests)
- attribution/fix_attributor: overlap, indirect, unattributed (4 tests)
- classifier/llm_classifier: hint gen, rule discovery, YAML parsing (8 tests)

8 coolstore integration tests (require local coolstore repo clone):
- Report diffing with real analysis data
- Git operations on quarkus migration branch
- Fix attribution with real diffs
- Full pipeline dry-run with precomputed reports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
Switch report output from self-contained HTML to markdown that renders natively on GitHub. Add comprehensive README with usage, architecture diagram, CLI reference, and example links.

Include sample report from mining the coolstore Java EE 7 to Quarkus migration (204 violations resolved, 34 hints, 7 new rules).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
Add second sample report from diff-only mode (no kantra): 15 rules discovered from raw diffs at $0.10.

Show inferred label selector in report header so users can see what analysis scope was auto-selected. Update README to link both example reports with cost comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
CodeRabbit: review skipped (draft detected). Configuration used: defaults. Review profile: CHILL. Plan: Pro.
Instead of sending all unattributed changes in one LLM call, group by file type (java, build, config, infrastructure, renames) and make separate calls. Each category gets focused LLM attention and a full output budget.

Within each category, files are sorted by diff size descending so larger, more interesting diffs get priority. Configurable via --max-prompt-tokens (default 16000) to accommodate different LLM context windows.

Results on coolstore (no-analysis mode):
- Before: 20 rules from 1 LLM call ($0.10)
- After: 27 rules from 6 LLM calls ($0.17)
- Coverage: 43% -> 58% of ground truth ruleset

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
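A rough sketch of the batching described in this commit, under stated assumptions: the suffix-to-category mapping, the extra `other` bucket, the four-characters-per-token estimate, and the decision to skip diffs that do not fit are all illustrative choices, not taken from the PR. Only the category names, the size-descending sort, and the 16000 default come from the commit message.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping; the commit names the categories but not the rules
# for assigning files to them. Treating every .xml file as "build" is crude.
CATEGORY_BY_SUFFIX = {
    ".java": "java",
    ".xml": "build",
    ".gradle": "build",
    ".properties": "config",
    ".yaml": "config",
    ".yml": "config",
}


def categorize(path, renamed=False):
    """Assign a changed file to one prompt category."""
    if renamed:
        return "renames"
    pure = PurePosixPath(path)
    if pure.name == "Dockerfile":
        return "infrastructure"
    return CATEGORY_BY_SUFFIX.get(pure.suffix, "other")


def estimate_tokens(text):
    """Very rough heuristic (about 4 characters per token), not a tokenizer."""
    return max(1, len(text) // 4)


def build_batches(changes, max_prompt_tokens=16000):
    """Group (path, diff_text, renamed) tuples into one batch per category."""
    grouped = defaultdict(list)
    for path, diff, renamed in changes:
        grouped[categorize(path, renamed)].append((path, diff))

    batches = {}
    for category, items in grouped.items():
        # Largest diffs first, so they get priority within the budget.
        items.sort(key=lambda item: len(item[1]), reverse=True)
        used, kept = 0, []
        for path, diff in items:
            cost = estimate_tokens(diff)
            if used + cost > max_prompt_tokens:
                continue  # does not fit; a smaller diff later may still fit
            used += cost
            kept.append(path)
        batches[category] = kept
    return batches


changes = [
    ("src/A.java", "x" * 400, False),    # about 100 tokens
    ("src/B.java", "x" * 2000, False),   # about 500 tokens
    ("pom.xml", "x" * 800, False),       # about 200 tokens
    ("src/Old.java", "x" * 40, True),    # rename
]
batches = build_batches(changes, max_prompt_tokens=550)
```

With a 550-token budget, the java batch keeps only `src/B.java`: it is taken first because it is larger, and `src/A.java` then no longer fits.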
Rules that detect target-framework patterns (quarkus, jakarta, etc.) in their when condition are now filtered out post-generation. These are after-state rules that would fire on already-migrated code.

Added build-file-specific prompt instructions telling the LLM to focus on removed dependencies (the pre-migration state) and generate one rule per dependency change.

Results: 4 after-state rules correctly filtered (quarkus-rest-client, quarkus-rest-client-jackson, quarkus-smallrye-reactive-messaging, quarkus-openshift-extension). 25 clean rules remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
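The post-generation filter could look roughly like the sketch below. The regular expression, the function names, and the simplified rule dictionaries are assumptions for illustration; the actual filter and the full analyzer-lsp rule schema are not shown in this PR text.

```python
import re

# Hypothetical target markers; the commit message mentions quarkus and jakarta.
TARGET_PATTERN = re.compile(r"quarkus|jakarta", re.IGNORECASE)


def mentions_target(condition):
    """Recursively check whether any string value in a `when` condition
    names the migration target."""
    if isinstance(condition, str):
        return bool(TARGET_PATTERN.search(condition))
    if isinstance(condition, dict):
        return any(mentions_target(value) for value in condition.values())
    if isinstance(condition, list):
        return any(mentions_target(value) for value in condition)
    return False


def drop_after_state_rules(rules):
    """Split rules into (kept, dropped) based on their `when` condition."""
    kept, dropped = [], []
    for rule in rules:
        if mentions_target(rule.get("when")):
            dropped.append(rule)
        else:
            kept.append(rule)
    return kept, dropped


# Simplified rule dictionaries with made-up contents, for illustration only.
rules = [
    {
        "ruleID": "example-ejb-stateless",
        "when": {"java.referenced": {"pattern": "javax.ejb.Stateless"}},
    },
    {
        "ruleID": "example-quarkus-rest-client",
        "when": {"java.dependency": {"name": "io.quarkus.quarkus-rest-client"}},
    },
]
kept, dropped = drop_after_state_rules(rules)
```

Only the condition values are inspected, so a rule whose message text mentions Quarkus as the migration target is not dropped by this check.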
This pull request has been automatically marked as stale because it has not had any activity for 60 days.
Summary
Experimental tooling to mine migration knowledge from completed code migrations. Two new components:
How it works
Sample results
Tested on coolstore Java EE 7 → Quarkus migration:
Solution server changes
Status
This is experimental -- looking for feedback on:
🤖 Generated with Claude Code