[experiment] Commit miner: extract migration hints and rules from completed migrations #925

Draft
fabianvf wants to merge 8 commits into konveyor:main from fabianvf:solution-server-from-commits

fabianvf commented Apr 2, 2026

Summary

Experimental tooling to mine migration knowledge from completed code migrations. Two new components:

  • kai_mcp_solution_server REST API -- FastAPI endpoints alongside the existing MCP server for non-MCP clients
  • kai_commit_miner -- given two git refs (before/after migration), generates hints for existing analysis rules and discovers new analyzer-lsp rules

How it works

  1. Infers the migration type from manifest diffs (pom.xml, etc.) and auto-selects kantra label selectors
  2. Runs kantra analysis on the before/after states (or skips analysis for diff-only rule discovery)
  3. Diffs analysis reports to find resolved violations
  4. Attributes resolved violations to specific code changes
  5. Uses an LLM to generate hints per violation type (gotchas, accompanying changes, ordering -- things the rule description doesn't cover)
  6. Uses an LLM to discover new rules from unattributed changes, emitted as real analyzer-lsp YAML that detects pre-migration patterns
  7. Scores migration relevance qualitatively (high/medium/low/very_low, with reasoning)
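
Steps 3 and 4 can be illustrated with a minimal sketch. It assumes a violation is identified by rule ID and file path (line numbers shift during a migration, so they are left out of the match) and that attribution is a simple line-overlap check against changed hunks. The `Violation` and `Hunk` types, their fields, and the rule IDs are hypothetical and are not taken from the PR's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """One incident from an analysis report (field names are illustrative)."""
    rule_id: str
    path: str
    line: int


@dataclass(frozen=True)
class Hunk:
    """A changed line range in the 'before' version of a file (end is inclusive)."""
    path: str
    start: int
    end: int


def resolved_violations(before, after):
    """Return 'before' violations whose rule no longer fires on the same file."""
    still_firing = {(v.rule_id, v.path) for v in after}
    return [v for v in before if (v.rule_id, v.path) not in still_firing]


def attribute(resolved, hunks):
    """Split resolved violations by whether their line falls inside a changed hunk."""
    attributed, unattributed = [], []
    for violation in resolved:
        hit = next(
            (h for h in hunks
             if h.path == violation.path and h.start <= violation.line <= h.end),
            None,
        )
        if hit is not None:
            attributed.append((violation, hit))
        else:
            unattributed.append(violation)
    return attributed, unattributed


# Hypothetical data: one violation disappears, one rule still fires after migration.
before = [
    Violation("javax-to-jakarta", "src/Order.java", 12),
    Violation("ejb-stateless", "src/Cart.java", 30),
]
after = [Violation("ejb-stateless", "src/Cart.java", 28)]
hunks = [Hunk("src/Order.java", 10, 15)]

resolved = resolved_violations(before, after)
attributed, unattributed = attribute(resolved, hunks)
```

A real implementation would likely also need indirect attribution (a violation fixed by a change elsewhere), which the unit tests in this PR mention but this sketch does not attempt.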

Sample results

Tested on the coolstore Java EE 7 → Quarkus migration:

| Mode | Violations | Hints | Rules | Cost |
|---|---|---|---|---|
| With kantra analysis | 204 resolved | 34 generated, 8 skipped | 7 new | $0.67 |
| No analysis (diff-only) | - | - | 15 from scratch | $0.10 |

Solution server changes

  • Extracted service layer from server.py (settings.py, resources.py, service.py)
  • Added REST API (29 FastAPI endpoints) with collections for grouping mined solutions
  • DBCollection model + Alembic migration
  • All existing MCP tests pass unchanged

Status

This is experimental -- looking for feedback on:

  • Rule quality and direction (do generated rules correctly detect pre-migration patterns?)
  • Hint usefulness (do they add value beyond the rule description for an agent?)
  • Cost efficiency (currently ~$0.67 per full run with Sonnet)
  • Integration path with the solution server and existing Kai agent workflow

🤖 Generated with Claude Code

fabianvf and others added 6 commits April 2, 2026 16:26
Split server.py (~1150 lines) into focused modules:
- settings.py: SolutionServerSettings
- resources.py: _SharedResources, KaiSolutionServerContext, with_db_recovery
- service.py: all business logic + new query/collection/bulk functions
- server.py: slimmed to ~200 lines (MCP tool wrappers only)

Added _get_kai_ctx() helper to eliminate repeated context extraction
boilerplate. Moved session_maker guard into with_db_recovery decorator.
Updated test imports accordingly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
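
The commit above mentions moving the `session_maker` guard into the `with_db_recovery` decorator, but the decorator body is not shown on this page. The following is only a guess at the general pattern, assuming an async function that takes a context object as its first argument and a single retry after a connection error. `FakeContext`, its `recover` hook, and the name `with_db_recovery_sketch` are hypothetical.

```python
import asyncio
import functools


class FakeContext:
    """Stand-in for the server context; only what the guard needs (hypothetical)."""

    def __init__(self, session_maker=None):
        self.session_maker = session_maker
        self.recoveries = 0

    async def recover(self):
        # Hypothetical recovery hook; a real one might rebuild the engine or pool.
        self.recoveries += 1


def with_db_recovery_sketch(func):
    """Reject calls without a session maker, then retry once after recovery."""

    @functools.wraps(func)
    async def wrapper(ctx, *args, **kwargs):
        if ctx.session_maker is None:
            raise RuntimeError("session maker is not initialised")
        try:
            return await func(ctx, *args, **kwargs)
        except ConnectionError:
            await ctx.recover()
            return await func(ctx, *args, **kwargs)

    return wrapper


calls = {"n": 0}


@with_db_recovery_sketch
async def count_incidents(ctx):
    """Fails on the first call to simulate a dropped connection."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("connection dropped")
    return 42


ctx = FakeContext(session_maker=object())
result = asyncio.run(count_incidents(ctx))
```

Keeping the guard inside the decorator means each wrapped service function can assume a usable session maker instead of repeating the check.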
Add a FastAPI REST API alongside the MCP server, sharing the same DB pool
and lifespan. Endpoints under /api/v1/ for incidents, solutions, violations,
hints, collections (CRUD), and bulk commit ingestion.

New DBCollection model with association tables for grouping mined solutions
by source repo, migration type, or review batch. Alembic migration included.

Composite app startup: MCP at /, REST at /api/v1/ for streamable-http mode.
Added fastapi and greenlet as dependencies. 20 REST API tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
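
The association-table grouping described in the commit above can be sketched with the standard-library `sqlite3` module. This is not the PR's `DBCollection` model or its Alembic migration, which are not shown here. Table names, column names, and the sample values are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical schema: a many-to-many link between collections and solutions.
SCHEMA = """
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL
);
CREATE TABLE solutions (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL
);
CREATE TABLE collection_solutions (
    collection_id INTEGER NOT NULL REFERENCES collections(id),
    solution_id INTEGER NOT NULL REFERENCES solutions(id),
    PRIMARY KEY (collection_id, solution_id)
);
"""


def ensure_collection(conn, name, kind):
    """Create the collection if it is missing and return its id."""
    conn.execute(
        "INSERT OR IGNORE INTO collections (name, kind) VALUES (?, ?)", (name, kind)
    )
    row = conn.execute("SELECT id FROM collections WHERE name = ?", (name,)).fetchone()
    return row[0]


def add_solution(conn, summary):
    """Insert a solution and return its id."""
    return conn.execute(
        "INSERT INTO solutions (summary) VALUES (?)", (summary,)
    ).lastrowid


def link(conn, collection_id, solution_id):
    """Attach a solution to a collection; repeated links are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO collection_solutions VALUES (?, ?)",
        (collection_id, solution_id),
    )


def solutions_in(conn, name):
    """List solution summaries in a collection, oldest first."""
    rows = conn.execute(
        """SELECT s.summary FROM solutions s
           JOIN collection_solutions cs ON cs.solution_id = s.id
           JOIN collections c ON c.id = cs.collection_id
           WHERE c.name = ? ORDER BY s.id""",
        (name,),
    )
    return [summary for (summary,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
by_repo = ensure_collection(conn, "coolstore", "source_repo")
by_batch = ensure_collection(conn, "review-batch-1", "review_batch")
first = add_solution(conn, "Replace javax imports with jakarta")
second = add_solution(conn, "Remove persistence.xml")
link(conn, by_repo, first)
link(conn, by_repo, second)
link(conn, by_batch, first)  # the same solution can sit in several collections
```

Because the link lives in its own table, one mined solution can be grouped by source repo and by review batch at the same time without being duplicated.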
New package that extracts migration hints and analysis rules from completed
migrations. Given two git refs (before/after), it:

1. Infers the migration type from manifest diffs (pom.xml, etc.)
2. Auto-selects kantra label selectors matching the inferred migration
3. Runs static analysis on both refs via kantra (container mode)
4. Diffs analysis reports to find resolved violations
5. Attributes resolved violations to specific code changes in the diff
6. Generates hints per violation type via LLM (with skip/refine logic)
7. Discovers new analyzer-lsp rules from unattributed changes
8. Outputs rules as real analyzer-lsp YAML detecting pre-migration patterns
9. Produces self-contained HTML reports with relevance filtering

Pluggable analyzer backends (kantra, precomputed, none). Qualitative
migration relevance scoring (high/medium/low/very_low with reasoning).
LLM token tracking with cost estimates. Dry-run mode with rich JSON output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
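
The pluggable analyzer backends named in the commit above (kantra, precomputed, none) suggest a small common interface. Below is a minimal sketch of that idea, assuming each backend exposes one `analyze(ref)` method. The class names, the registry, and the report shape are hypothetical, and the kantra backend is omitted because it would shell out to the kantra CLI, whose invocation is not shown on this page.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Protocol


class AnalyzerBackend(Protocol):
    """Anything that can turn a git ref into a report, or None for no analysis."""

    def analyze(self, ref: str) -> Optional[list]:
        ...


class PrecomputedBackend:
    """Loads a JSON report that was produced earlier, keyed by git ref."""

    def __init__(self, reports):
        self.reports = reports  # mapping of ref name to report path

    def analyze(self, ref):
        return json.loads(Path(self.reports[ref]).read_text())


class NoneBackend:
    """Diff-only mode: no analysis is run, so there is no report to return."""

    def analyze(self, ref):
        return None


BACKENDS = {"precomputed": PrecomputedBackend, "none": NoneBackend}


def make_backend(name, **kwargs):
    """Look up a backend by name and construct it with backend-specific options."""
    if name not in BACKENDS:
        raise ValueError(f"unknown analyzer backend: {name}")
    return BACKENDS[name](**kwargs)


# Usage with a throwaway report file; the report content is made up.
with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "before.json"
    report_path.write_text(json.dumps([{"rule_id": "example-rule"}]))
    backend = make_backend("precomputed", reports={"before": report_path})
    loaded = backend.analyze("before")

skipped = make_backend("none").analyze("before")
```

Returning `None` from the no-analysis backend lets the rest of a pipeline branch on "no report" and fall back to diff-only rule discovery.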
32 unit tests covering:
- git/diff_parser: unified diff parsing (6 tests)
- git/commit_walker: linear walking, start/end ranges (7 tests)
- diff/report_differ: resolved, new, moved, mixed changes (7 tests)
- attribution/fix_attributor: overlap, indirect, unattributed (4 tests)
- classifier/llm_classifier: hint gen, rule discovery, YAML parsing (8 tests)

8 coolstore integration tests (require local coolstore repo clone):
- Report diffing with real analysis data
- Git operations on quarkus migration branch
- Fix attribution with real diffs
- Full pipeline dry-run with precomputed reports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
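
Unified diff parsing, which the `git/diff_parser` tests above cover, can be sketched as follows. This is an independent illustration rather than the PR's parser: it reads `+++` file headers and `@@` hunk headers, counts added and removed lines, and does not handle renames, deletions (`/dev/null`), or binary files. `FileDiff` and `parse_unified_diff` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Hunk header: "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class FileDiff:
    path: str
    hunks: list = field(default_factory=list)  # (old_start, old_len, new_start, new_len)
    added: int = 0
    removed: int = 0


def parse_unified_diff(text):
    """Parse unified diff text into one FileDiff per '+++' header."""
    files = []
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk body
    for line in text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: classify by prefix so that content such as
            # "--- comment" is not mistaken for a file header.
            if line.startswith("+"):
                current.added += 1
                new_left -= 1
            elif line.startswith("-"):
                current.removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file"
            else:
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:]
            current = FileDiff(path[2:] if path.startswith("b/") else path)
            files.append(current)
            continue
        match = HUNK_RE.match(line)
        if match and current is not None:
            old_start, old_len, new_start, new_len = match.groups()
            old_left = int(old_len) if old_len is not None else 1
            new_left = int(new_len) if new_len is not None else 1
            current.hunks.append((int(old_start), old_left, int(new_start), new_left))
    return files


sample = "\n".join([
    "diff --git a/src/Order.java b/src/Order.java",
    "--- a/src/Order.java",
    "+++ b/src/Order.java",
    "@@ -1,3 +1,3 @@",
    " package demo;",
    "-import javax.persistence.Entity;",
    "+import jakarta.persistence.Entity;",
    " public class Order {}",
])
parsed = parse_unified_diff(sample)
```

Tracking the remaining line counts from the hunk header is what lets the parser tell hunk content apart from the next file header.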
Switch report output from self-contained HTML to markdown that renders
natively on GitHub. Add comprehensive README with usage, architecture
diagram, CLI reference, and example links.

Include sample report from mining the coolstore Java EE 7 to Quarkus
migration (204 violations resolved, 34 hints, 7 new rules).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
Add second sample report from diff-only mode (no kantra): 15 rules
discovered from raw diffs at $0.10. Show inferred label selector in
report header so users can see what analysis scope was auto-selected.

Update README to link both example reports with cost comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>

coderabbitai Bot commented Apr 2, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4532072e-1328-43e4-a69d-c885e4b2d510

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

fabianvf and others added 2 commits April 3, 2026 12:35
Instead of sending all unattributed changes in one LLM call, group by
file type (java, build, config, infrastructure, renames) and make
separate calls. Each category gets focused LLM attention and a full
output budget.

Within each category, files are sorted by diff size descending so
larger, more interesting diffs get priority.

Configurable via --max-prompt-tokens (default 16000) to accommodate
different LLM context windows.

Results on coolstore (no-analysis mode):
- Before: 20 rules from 1 LLM call ($0.10)
- After: 27 rules from 6 LLM calls ($0.17)
- Coverage: 43% -> 58% of ground truth ruleset

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
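
The batching described in the commit above (group by file type, sort by diff size descending, stay within a prompt token budget) can be sketched like this. The commit does not say how files map to categories, how tokens are counted, or what happens to diffs that do not fit, so the mapping tables, the four-characters-per-token estimate, the "skip if over budget" rule, and the extra `other` category are all assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file names and suffixes to the commit's categories.
BUILD_FILES = {"pom.xml", "build.gradle"}
INFRA_FILES = {"Dockerfile", "Jenkinsfile"}
CONFIG_SUFFIXES = {".properties", ".xml", ".yaml", ".yml", ".json"}


def categorize(path, is_rename=False):
    """Assign a changed file to one prompt category."""
    if is_rename:
        return "renames"
    name = PurePosixPath(path).name
    if name in BUILD_FILES:
        return "build"
    if name.endswith(".java"):
        return "java"
    if name in INFRA_FILES:
        return "infrastructure"
    if PurePosixPath(path).suffix in CONFIG_SUFFIXES:
        return "config"
    return "other"  # fallback added for this sketch


def estimate_tokens(text):
    """Rough heuristic, not a real tokenizer."""
    return len(text) // 4


def build_batches(changes, max_prompt_tokens=16000):
    """Return {category: [paths]} with the largest diffs first, within budget.

    `changes` is a list of (path, diff_text, is_rename) tuples.
    """
    grouped = defaultdict(list)
    for path, diff_text, is_rename in changes:
        grouped[categorize(path, is_rename)].append((path, diff_text))
    batches = {}
    for category, items in grouped.items():
        items.sort(key=lambda item: len(item[1]), reverse=True)
        used, selected = 0, []
        for path, diff_text in items:
            cost = estimate_tokens(diff_text)
            if used + cost > max_prompt_tokens:
                continue  # over budget; a smaller diff later may still fit
            selected.append(path)
            used += cost
        batches[category] = selected
    return batches


changes = [
    ("src/A.java", "x" * 400, False),
    ("src/B.java", "y" * 4000, False),
    ("pom.xml", "z" * 40, False),
    ("src/Old.java", "", True),
]
roomy = build_batches(changes, max_prompt_tokens=2000)
tight = build_batches(changes, max_prompt_tokens=1000)
```

Each category would then be sent to the LLM in its own call, so a large set of Java diffs cannot crowd build-file changes out of the prompt.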
Rules that detect target-framework patterns (quarkus, jakarta, etc.) in
their when condition are now filtered out post-generation. These are
after-state rules that would fire on already-migrated code.

Added build-file-specific prompt instructions telling the LLM to focus
on removed dependencies (the pre-migration state) and generate one rule
per dependency change.

Results: 4 after-state rules correctly filtered (quarkus-rest-client,
quarkus-rest-client-jackson, quarkus-smallrye-reactive-messaging,
quarkus-openshift-extension). 25 clean rules remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian von Feilitzsch <fabian@fabianism.us>
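
The post-generation filter described in the commit above can be sketched as a substring check over the string values of a rule's `when` condition. The commit does not show how the detection is implemented, so the marker list, the case-insensitive substring match, and the example rules (including their IDs and dependency names) are assumptions. Rules are represented as already-parsed dictionaries to keep the sketch free of a YAML dependency.

```python
# Hypothetical markers for the target framework of a Java EE -> Quarkus migration.
TARGET_MARKERS = ("quarkus", "jakarta")


def iter_strings(node):
    """Yield every string value nested anywhere inside a dict/list structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def is_after_state(rule, markers=TARGET_MARKERS):
    """True if the rule's `when` condition matches target-framework patterns."""
    return any(
        marker in text.lower()
        for text in iter_strings(rule.get("when", {}))
        for marker in markers
    )


def filter_rules(rules, markers=TARGET_MARKERS):
    """Split rules into (kept, dropped); dropped rules detect the after-state."""
    kept = [rule for rule in rules if not is_after_state(rule, markers)]
    dropped = [rule for rule in rules if is_after_state(rule, markers)]
    return kept, dropped


rules = [
    {
        "ruleID": "example-00001",
        # The message may mention the target; only `when` is inspected.
        "message": "Replace @Stateless with a Quarkus CDI scope",
        "when": {"java.referenced": {"pattern": "javax.ejb.Stateless",
                                     "location": "ANNOTATION"}},
    },
    {
        "ruleID": "example-00002",
        "message": "REST client dependency",
        "when": {"java.dependency": {"name": "io.quarkus.quarkus-rest-client"}},
    },
]
kept, dropped = filter_rules(rules)
```

Inspecting only the `when` block matters because a correct pre-migration rule will often name the target framework in its message while matching on the old API.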

github-actions Bot commented Jun 3, 2026

This pull request has been automatically marked as stale because it has not had any activity for 60 days.
It will remain open for visibility and reporting purposes.
Please comment if this PR is still relevant.

github-actions Bot added the stale label Jun 3, 2026