Skip to content

Hyoka bug fixes#640

Open
LarryOsterman wants to merge 1 commit into
ronniegeraghty/devfrom
larryo/for_ronnie
Open

Hyoka bug fixes#640
LarryOsterman wants to merge 1 commit into
ronniegeraghty/devfrom
larryo/for_ronnie

Conversation

@LarryOsterman
Copy link
Copy Markdown
Collaborator

No description provided.

ronniegeraghty pushed a commit that referenced this pull request May 5, 2026
Manually port the logical changes from Larry's branch without dragging
the gofmt indentation churn that made up ~95% of his diff:

1. workspace/delta.go: TakeSnapshot now skips build-artifact directories
   (target/, node_modules/, bin/, dist/, etc.) via utils.IsDefaultExcludedDir.
   Generated files were overwhelming reviewer prompts.

2. review/{types,reviewer,buckets}.go: ReviewResult gains a
   SkippedReviewers []SkippedReviewer field, populated by both
   ReviewPanel and ReviewPanelBuckets when a reviewer model fails.
   Users can now see which reviewers were skipped and why.

3. eval/copilot.go: Fixed two sites still using e.maxSessionActions
   instead of the per-eval maxSessionActionsLimit (already in scope from
   line 262, already used correctly at line 361). Also added counting
   for assistant.reasoning events — per Ronnie's directive, every
   action the agent takes (reasoning, tool calls, responses) counts
   toward the action limit, not just tool invocations.

Tests added (Switch):
- TestTakeSnapshot_BuildArtifactDirsExcluded
- TestReviewPanel_SkippedReviewers (+ JSON marshaling test)
- TestReviewPanelBuckets_SkippedReviewers
- Per-eval limit + assistant.reasoning counter tests in eval package

Refs: #640

Co-authored-by: Larry Osterman <larryo@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ronniegeraghty pushed a commit that referenced this pull request May 5, 2026
- Merged .squad/decisions/inbox/ (neo-pr640-port.md, switch-pr640-tests.md) into decisions.md
- Deleted inbox files after merge
- Added cross-agent notes to Oracle, Morpheus, Trinity, Tank history files:
  * Oracle: action semantics clarification (assistant.reasoning counts as action)
  * Morpheus: architectural implications of action counting
  * Trinity: SkippedReviewers field now surfaced in ReviewResult
  * Tank: build artifacts excluded from workspace snapshots
- decisions.md now 73KB (no archival needed; no entries >30 days old)

Commit: 703f638 (PR #640 port) — all 3 bugs fixed, 7 tests passing, all systems green.
@ronniegeraghty
Copy link
Copy Markdown
Owner

Hey @LarryOsterman — thanks for the fixes! I've ported the three logical changes from this branch to ronniegeraghty/dev as commit 703f638:

  1. Build-artifact dir skip in workspace/delta.go (utils.IsDefaultExcludedDir check inside the walk callback) — plus your TestTakeSnapshot_BuildArtifactDirsExcluded.
  2. SkippedReviewers field on ReviewResult, populated by both ReviewPanel and ReviewPanelBuckets.
  3. Action counter now uses the per-eval maxSessionActionsLimit at lines 600 and 622 (previously stuck on e.maxSessionActions).

One addition on top of yours: I also added assistant.reasoning as a counted event type in the action switch. The semantic we're going with is every agent action — reasoning, tool calls, bash, responses — counts against the same limit, not just tool invocations. Switch wrote tests covering per-eval limits, the new event type, and mixed-event counting.

I skipped the engine_eval.go portion of your diff since it was indentation-only churn (your editor reformatted on save). Cherry-picking the whole commit would have dragged ~170 lines of whitespace conflicts; the manual port keeps the diff minimal and surgical.

All tests green on the touched packages. Closing this PR isn't required from your end — just wanted to acknowledge the carry-over and keep you in the loop. Appreciate the bug hunting!

ronniegeraghty pushed a commit that referenced this pull request May 14, 2026
…matching, and inline graders

- PR #640 Fixes: Generated files exclusion, skipped reviewers reporting, action counter for reasoning+tool_execution_start
- Batch 2 (Per-property when negation): MatchSet polymorphic type, tags: matching with case-insensitive semantics
- Batch 3 (Inline graders): graders: field on prompt files, breaking change removal of evaluation_criteria: from YAML

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ronniegeraghty pushed a commit that referenced this pull request May 14, 2026
Document that actions include assistant.reasoning events, tool.execution_start events,
tool calls, bash commands, and responses. Update description to match PR #640 fix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ronniegeraghty pushed a commit that referenced this pull request May 14, 2026
- CHANGELOG: Entries for PR #640, MatchSet/tags:, inline graders
- docs/prompt-authoring.md: graders:, tags:, inline graders guide
- docs/graders/index.md: MatchSet polymorphic type, negation, tags: semantics
- docs/graders/activity.md: Action counter semantics (reasoning + tools)
- docs/architecture.md: Grader execution order, action counter
- examples/prompts/inline-graders-example.prompt.md: Markdown format example

Coverage: 100% complete. 0 gaps. 6 commits in dev. Build green.

Authored-by: Oracle
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants