[experiment] Commit miner: extract migration hints and rules from completed migrations by fabianvf · Pull Request #925 · konveyor/kai

fabianvf · 2026-04-02T20:49:17Z

Summary

Experimental tooling to mine migration knowledge from completed code migrations. Two new components:

kai_mcp_solution_server REST API -- FastAPI endpoints alongside the existing MCP server for non-MCP clients
kai_commit_miner -- given two git refs (before/after migration), generates hints for existing analysis rules and discovers new analyzer-lsp rules

How it works

Infers the migration type from manifest diffs (pom.xml, etc.) and auto-selects kantra label selectors
Runs kantra analysis on the before/after states (or skips analysis for diff-only rule discovery)
Diffs analysis reports to find resolved violations
Attributes resolved violations to specific code changes
LLM generates hints per violation type (gotchas, accompanying changes, ordering -- things the rule description doesn't cover)
LLM discovers new rules from unattributed changes, output as real analyzer-lsp YAML detecting pre-migration patterns
Qualitative migration relevance scoring (high/medium/low/very_low with reasoning)
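
The report-diffing step above can be sketched as a set difference. This is a minimal illustration and not the code in this PR: the `Incident` shape, the rule IDs, and the file paths are hypothetical, and it assumes an incident is identified by rule, file, and line.

```python
from typing import NamedTuple


class Incident(NamedTuple):
    """Hypothetical, simplified shape of one incident in an analysis report."""
    rule_id: str
    file: str
    line: int


def resolved_incidents(before: set[Incident], after: set[Incident]) -> set[Incident]:
    """Incidents reported before the migration that no longer appear after it."""
    return before - after


def new_incidents(before: set[Incident], after: set[Incident]) -> set[Incident]:
    """Incidents that only appear in the post-migration report."""
    return after - before


# Made-up reports for the two git refs.
before_report = {
    Incident("example-rule-001", "src/main/java/Order.java", 12),
    Incident("example-rule-002", "src/main/resources/META-INF/persistence.xml", 3),
}
after_report = {
    Incident("example-rule-002", "src/main/resources/META-INF/persistence.xml", 3),
}

resolved = resolved_incidents(before_report, after_report)
```

Exact matching on line numbers would count an incident that merely moved as one resolved plus one new incident, so a real differ would likely need a looser key for that case.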

Sample results

Tested on coolstore Java EE 7 → Quarkus migration:

Mode	Violations	Hints	Rules	Cost
With kantra analysis	204 resolved	34 generated, 8 skipped	7 new	$0.67
No analysis (diff-only)	—	—	15 from scratch	$0.10

Solution server changes

Extracted service layer from server.py (settings.py, resources.py, service.py)
Added REST API (29 FastAPI endpoints) with collections for grouping mined solutions
DBCollection model + Alembic migration
All existing MCP tests pass unchanged

Status

This is experimental -- looking for feedback on:

Rule quality and direction (do generated rules correctly detect pre-migration patterns?)
Hint usefulness (do they add value beyond the rule description for an agent?)
Cost efficiency (currently ~$0.67 per full run with Sonnet)
Integration path with the solution server and existing Kai agent workflow

🤖 Generated with Claude Code

Split server.py (~1150 lines) into focused modules:
- settings.py: SolutionServerSettings
- resources.py: _SharedResources, KaiSolutionServerContext, with_db_recovery
- service.py: all business logic + new query/collection/bulk functions
- server.py: slimmed to ~200 lines (MCP tool wrappers only)

Added _get_kai_ctx() helper to eliminate repeated context extraction boilerplate. Moved session_maker guard into with_db_recovery decorator. Updated test imports accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>

Add a FastAPI REST API alongside the MCP server, sharing the same DB pool and lifespan. Endpoints under /api/v1/ for incidents, solutions, violations, hints, collections (CRUD), and bulk commit ingestion.

New DBCollection model with association tables for grouping mined solutions by source repo, migration type, or review batch. Alembic migration included.

Composite app startup: MCP at /, REST at /api/v1/ for streamable-http mode. Added fastapi and greenlet as dependencies. 20 REST API tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>

New package that extracts migration hints and analysis rules from completed migrations. Given two git refs (before/after), it:
1. Infers the migration type from manifest diffs (pom.xml, etc.)
2. Auto-selects kantra label selectors matching the inferred migration
3. Runs static analysis on both refs via kantra (container mode)
4. Diffs analysis reports to find resolved violations
5. Attributes resolved violations to specific code changes in the diff
6. Generates hints per violation type via LLM (with skip/refine logic)
7. Discovers new analyzer-lsp rules from unattributed changes
8. Outputs rules as real analyzer-lsp YAML detecting pre-migration patterns
9. Produces self-contained HTML reports with relevance filtering

Pluggable analyzer backends (kantra, precomputed, none). Qualitative migration relevance scoring (high/medium/low/very_low with reasoning). LLM token tracking with cost estimates. Dry-run mode with rich JSON output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
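
Attributing a resolved violation to a code change can be sketched as a line-range overlap check against the unified diff. This is an assumption-laden illustration and not the attributor in this package: `old_side_ranges` and `is_attributed` are hypothetical names, and the check is coarse because hunk ranges include context lines.

```python
import re

# Unified diff hunk header: @@ -old_start[,old_len] +new_start[,new_len] @@
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def old_side_ranges(diff_text: str) -> list[tuple[int, int]]:
    """Return inclusive (start, end) line ranges on the pre-change side of each hunk."""
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if not match:
            continue
        start = int(match.group(1))
        # An omitted length means the hunk covers exactly one line.
        length = int(match.group(2)) if match.group(2) is not None else 1
        if length > 0:  # length 0 is a pure insertion with no old-side lines
            ranges.append((start, start + length - 1))
    return ranges


def is_attributed(violation_line: int, diff_text: str) -> bool:
    """True if the violation's pre-migration line falls inside any hunk."""
    return any(lo <= violation_line <= hi for lo, hi in old_side_ranges(diff_text))


# Made-up diff for a single file.
SAMPLE_DIFF = """\
--- a/src/Order.java
+++ b/src/Order.java
@@ -10,3 +10,3 @@
 import java.util.List;
-import javax.persistence.Entity;
+import jakarta.persistence.Entity;
 import java.util.Map;
"""
```

With this sample, a violation reported on line 11 of the old file is attributed to the hunk, while one on line 40 stays unattributed and would be a candidate for rule discovery.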

32 unit tests covering:
- git/diff_parser: unified diff parsing (6 tests)
- git/commit_walker: linear walking, start/end ranges (7 tests)
- diff/report_differ: resolved, new, moved, mixed changes (7 tests)
- attribution/fix_attributor: overlap, indirect, unattributed (4 tests)
- classifier/llm_classifier: hint gen, rule discovery, YAML parsing (8 tests)

8 coolstore integration tests (require local coolstore repo clone):
- Report diffing with real analysis data
- Git operations on quarkus migration branch
- Fix attribution with real diffs
- Full pipeline dry-run with precomputed reports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>

Switch report output from self-contained HTML to markdown that renders natively on GitHub. Add comprehensive README with usage, architecture diagram, CLI reference, and example links. Include sample report from mining the coolstore Java EE 7 to Quarkus migration (204 violations resolved, 34 hints, 7 new rules).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>

Add second sample report from diff-only mode (no kantra): 15 rules discovered from raw diffs at $0.10. Show inferred label selector in report header so users can see what analysis scope was auto-selected. Update README to link both example reports with cost comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>

coderabbitai · 2026-04-02T20:49:25Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Instead of sending all unattributed changes in one LLM call, group by file type (java, build, config, infrastructure, renames) and make separate calls. Each category gets focused LLM attention and a full output budget. Within each category, files are sorted by diff size descending so larger, more interesting diffs get priority. Configurable via --max-prompt-tokens (default 16000) to accommodate different LLM context windows.

Results on coolstore (no-analysis mode):
- Before: 20 rules from 1 LLM call ($0.10)
- After: 27 rules from 6 LLM calls ($0.17)
- Coverage: 43% -> 58% of ground truth ruleset

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
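
The per-category batching described in this commit might look roughly like the sketch below. It is not the implementation in this PR: `FileDiff`, `categorize`, the file-name and suffix sets, and the four-characters-per-token estimate are all assumptions, and this version simply skips files that do not fit the budget, which may differ from what the tool does.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileDiff:
    """Hypothetical container for one changed file."""
    path: str
    diff: str
    renamed: bool = False


BUILD_FILES = {"pom.xml", "build.gradle"}                    # assumed list
CONFIG_SUFFIXES = {".xml", ".properties", ".yaml", ".yml"}  # assumed list


def categorize(fd: FileDiff) -> str:
    """Map a changed file to one of the five categories named in the commit."""
    p = PurePosixPath(fd.path)
    if fd.renamed:
        return "renames"
    if p.name in BUILD_FILES:
        return "build"
    if p.suffix == ".java":
        return "java"
    if p.suffix in CONFIG_SUFFIXES:
        return "config"
    return "infrastructure"


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return len(text) // 4


def build_batches(file_diffs: list[FileDiff],
                  max_prompt_tokens: int = 16000) -> dict[str, list[FileDiff]]:
    """Group diffs by category, largest first, keeping each group within budget."""
    groups: dict[str, list[FileDiff]] = defaultdict(list)
    for fd in file_diffs:
        groups[categorize(fd)].append(fd)

    batches: dict[str, list[FileDiff]] = {}
    for category, items in groups.items():
        items.sort(key=lambda fd: len(fd.diff), reverse=True)
        selected, used = [], 0
        for fd in items:
            cost = estimate_tokens(fd.diff)
            if used + cost > max_prompt_tokens:
                continue  # does not fit; a smaller diff further down may still fit
            selected.append(fd)
            used += cost
        batches[category] = selected
    return batches
```

Each entry in the returned mapping would then back one LLM call, so a category with many small files does not compete for prompt space with a large Java diff.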

Rules that detect target-framework patterns (quarkus, jakarta, etc.) in their when condition are now filtered out post-generation. These are after-state rules that would fire on already-migrated code.

Added build-file-specific prompt instructions telling the LLM to focus on removed dependencies (the pre-migration state) and generate one rule per dependency change.

Results: 4 after-state rules correctly filtered (quarkus-rest-client, quarkus-rest-client-jackson, quarkus-smallrye-reactive-messaging, quarkus-openshift-extension). 25 clean rules remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
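
A minimal sketch of the post-generation filter, assuming rules are already parsed from YAML into dicts and that a plain substring match on the `when` condition is enough. The marker list, the rule IDs, and the dependency name are illustrative only, not taken from the generated ruleset.

```python
# Assumed marker list, based on the examples named in the commit message.
TARGET_MARKERS = ("quarkus", "jakarta")


def iter_strings(node):
    """Yield every string value nested anywhere inside a dict/list structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def is_after_state(rule: dict, markers=TARGET_MARKERS) -> bool:
    """True if the rule's `when` condition matches target-framework patterns."""
    when = rule.get("when", {})
    return any(m in s.lower() for s in iter_strings(when) for m in markers)


def filter_rules(rules: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rules into (kept, dropped) without touching the message text."""
    kept = [r for r in rules if not is_after_state(r)]
    dropped = [r for r in rules if is_after_state(r)]
    return kept, dropped


# Hypothetical generated rules, simplified.
rules = [
    {
        "ruleID": "after-state-rest-client",
        "when": {"java.dependency": {"name": "io.quarkus.quarkus-rest-client"}},
    },
    {
        "ruleID": "pre-state-ejb",
        "message": "Replace @Stateless with a Quarkus CDI scope",
        "when": {"java.referenced": {"pattern": "javax.ejb.Stateless",
                                     "location": "ANNOTATION"}},
    },
]

kept, dropped = filter_rules(rules)
```

Only the `when` condition is inspected, so a pre-migration rule whose message mentions Quarkus as the fix target is still kept.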

github-actions · 2026-06-03T01:54:40Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days.
It will remain open for visibility and reporting purposes.
Please comment if this PR is still relevant.

fabianvf and others added 6 commits April 2, 2026 16:26

fabianvf and others added 2 commits April 3, 2026 12:35

github-actions Bot added the stale label Jun 3, 2026

fabianvf wants to merge 8 commits into konveyor:main from fabianvf:solution-server-from-commits
