Add criteria comparison and tool/skill usage to summary reports #89

Draft: samvaity wants to merge 1 commit into main from samvaity/improve-summary-report

@samvaity commented on Apr 4, 2026

Summary

Add criteria-level comparison and tool/skill usage tables to the summary markdown report, addressing #86.

New Sections

Criteria Comparison: a per-prompt table showing, for each criterion, whether it passed, failed, or was missing in each config:

| Criteria | azure-mcp | baseline-skills | baseline |
|---|---|---|---|
| Code Builds | pass | fail | fail |
| Best Practices | pass | pass | fail |
| Blob index tags | pass | pass | missing |

Strengths and issues are also listed for each config.
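A minimal sketch of how a cross-config criteria table like the one above could be rendered. It assumes each config's results are available as a hypothetical mapping from criterion name to a boolean pass flag; a criterion that a config never evaluated is shown as `missing`. The function name `render_criteria_table` and the input shape are illustrative assumptions, not the code in this PR.

```python
from typing import Dict, List

# Hypothetical input shape (assumption): config name -> {criterion name -> passed?}
Results = Dict[str, Dict[str, bool]]


def render_criteria_table(results: Results, configs: List[str]) -> str:
    """Render a markdown table with one row per criterion and one column per config."""
    # Collect criteria in first-seen order across all configs.
    criteria: List[str] = []
    for config in configs:
        for name in results.get(config, {}):
            if name not in criteria:
                criteria.append(name)

    lines = [
        "| Criteria | " + " | ".join(configs) + " |",
        "|---" * (len(configs) + 1) + "|",
    ]
    for name in criteria:
        cells = []
        for config in configs:
            outcome = results.get(config, {}).get(name)
            # None means this config has no result for the criterion.
            if outcome is None:
                cells.append("missing")
            else:
                cells.append("pass" if outcome else "fail")
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Collecting criteria across all configs (rather than from the first config only) is what lets a criterion that only some configs were graded on still get a row, with `missing` in the other columns.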

Tool and Skill Usage by Config: a quick-reference table of tools used, MCP calls, skills invoked, and duration for each config:

| Config | Tools Used | MCP Calls | Skills Invoked | Duration |
|---|---|---|---|---|
| azure-mcp | report_intent, azure-get_azure_bestpractices, create | 2 | n/a | 806s |
| baseline-skills | report_intent, create, skill, view | 0 | azure-storage-blob-java | 678s |
| baseline | report_intent, create, powershell | 0 | n/a | 565s |
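A minimal sketch of how one row per config in the usage table could be derived from a run's tool calls. The `RunUsage` record, the `render_usage_table` function, and especially the rule that MCP tools are identified by an `azure-` name prefix are hypothetical assumptions for illustration; the PR may classify MCP calls differently. "Tools Used" lists distinct tool names, while "MCP Calls" counts every MCP call, so a tool called twice appears once in the first column but counts twice in the second.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunUsage:
    """Hypothetical per-config record of one evaluation run."""
    config: str
    tool_calls: List[str]  # tool name for every call, in call order
    skills: List[str] = field(default_factory=list)
    duration_s: float = 0.0


def is_mcp_tool(name: str) -> bool:
    # Assumption: MCP tools are recognizable by an "azure-" name prefix.
    return name.startswith("azure-")


def render_usage_table(runs: List[RunUsage]) -> str:
    """Render a markdown table with one row per config."""
    lines = [
        "| Config | Tools Used | MCP Calls | Skills Invoked | Duration |",
        "|---|---|---|---|---|",
    ]
    for run in runs:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        tools = ", ".join(dict.fromkeys(run.tool_calls))
        mcp_calls = sum(1 for name in run.tool_calls if is_mcp_tool(name))
        skills = ", ".join(dict.fromkeys(run.skills)) or "n/a"
        lines.append(
            f"| {run.config} | {tools} | {mcp_calls} | {skills} | {run.duration_s:.0f}s |"
        )
    return "\n".join(lines)
```

Falling back to `n/a` when no skill was invoked mirrors the convention used in the table above.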

Motivation

The previous summary showed only scores (e.g., 11/12, 8/12, 5/12), not which criteria passed or failed in each config. This made it hard to tell which criteria the skills or MCP tools actually improved.

Fixes #86

@samvaity force-pushed the samvaity/improve-summary-report branch from e545800 to 6fa5ab4 on April 6, 2026 18:47
Add two new sections to the summary markdown:

1. Criteria Comparison — per-prompt cross-config table showing which
   criteria passed/failed in each config, with strengths and issues
   listed per config. Inspired by the manual evaluation-results.md
   format.

2. Tool & Skill Usage by Config — table showing tools used, MCP calls,
   skills invoked, and duration per config for quick comparison.

Fixes #86

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@samvaity force-pushed the samvaity/improve-summary-report branch from 6fa5ab4 to b5a67e9 on April 6, 2026 21:17
@samvaity closed this on Apr 8, 2026
@samvaity reopened this on Apr 8, 2026
Successfully merging this pull request may close this issue: Update the summary tab to generate evaluation results
