Hyoka bug fixes#640
Conversation
Manually port the logical changes from Larry's branch without dragging
the gofmt indentation churn that made up ~95% of his diff:
1. workspace/delta.go: TakeSnapshot now skips build-artifact directories
(target/, node_modules/, bin/, dist/, etc.) via utils.IsDefaultExcludedDir.
Generated files were overwhelming reviewer prompts.
2. review/{types,reviewer,buckets}.go: ReviewResult gains a
SkippedReviewers []SkippedReviewer field, populated by both
ReviewPanel and ReviewPanelBuckets when a reviewer model fails.
Users can now see which reviewers were skipped and why.
3. eval/copilot.go: Fixed two sites still using e.maxSessionActions
instead of the per-eval maxSessionActionsLimit (already in scope from
line 262, already used correctly at line 361). Also added counting
for assistant.reasoning events — per Ronnie's directive, every
action the agent takes (reasoning, tool calls, responses) counts
toward the action limit, not just tool invocations.
Tests added (Switch):
- TestTakeSnapshot_BuildArtifactDirsExcluded
- TestReviewPanel_SkippedReviewers (+ JSON marshaling test)
- TestReviewPanelBuckets_SkippedReviewers
- Per-eval limit + assistant.reasoning counter tests in eval package
Refs: #640
Co-authored-by: Larry Osterman <larryo@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Merged .squad/decisions/inbox/ (neo-pr640-port.md, switch-pr640-tests.md) into decisions.md - Deleted inbox files after merge - Added cross-agent notes to Oracle, Morpheus, Trinity, Tank history files: * Oracle: action semantics clarification (assistant.reasoning counts as action) * Morpheus: architectural implications of action counting * Trinity: SkippedReviewers field now surfaced in ReviewResult * Tank: build artifacts excluded from workspace snapshots - decisions.md now 73KB (no archival needed; no entries >30 days old) Commit: 703f638 (PR #640 port) — all 3 bugs fixed, 7 tests passing, all systems green.
|
Hey @LarryOsterman — thanks for the fixes! I've ported the three logical changes from this branch to
One addition on top of yours: I also added I skipped the All tests green on the touched packages. Closing this PR isn't required from your end — just wanted to acknowledge the carry-over and keep you in the loop. Appreciate the bug hunting! |
…matching, and inline graders - PR #640 Fixes: Generated files exclusion, skipped reviewers reporting, action counter for reasoning+tool_execution_start - Batch 2 (Per-property when negation): MatchSet polymorphic type, tags: matching with case-insensitive semantics - Batch 3 (Inline graders): graders: field on prompt files, breaking change removal of evaluation_criteria: from YAML Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document that actions include assistant.reasoning events, tool.execution_start events, tool calls, bash commands, and responses. Update description to match PR #640 fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- CHANGELOG: Entries for PR #640, MatchSet/tags:, inline graders - docs/prompt-authoring.md: graders:, tags:, inline graders guide - docs/graders/index.md: MatchSet polymorphic type, negation, tags: semantics - docs/graders/activity.md: Action counter semantics (reasoning + tools) - docs/architecture.md: Grader execution order, action counter - examples/prompts/inline-graders-example.prompt.md: Markdown format example Coverage: 100% complete. 0 gaps. 6 commits in dev. Build green. Authored-by: Oracle Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No description provided.