Migrated from spboyer/waza#195
Summary
Abstract ExecutionResponse away from copilot.SessionEvent types to enable multi-agent executor support. Then add engines for additional agents (Claude Code, Codex, generic CLI agent).
Problem
ExecutionResponse in internal/execution/engine.go directly imports and uses copilot.SessionEvent — tightly coupling the execution interface to a single agent SDK. This prevents adding executors for other agents (Claude Code, Codex, OpenCode, etc.).
SkillsBench supports 5+ agents via their Harbor framework. Our primary audience uses Copilot, but cross-agent validation is increasingly valuable as skills become platform-agnostic.
Proposed Solution — Two Phases
Phase 1: Interface Decoupling (P1 — do now)
- Define generic event types in internal/execution/ that mirror the needed fields from copilot.SessionEvent without importing the Copilot SDK
- Update ExecutionResponse to use these generic types
- Copilot engine adapts SDK events → generic events internally
- No behavioral change — just cleaner abstraction
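A minimal sketch of what the Phase 1 decoupling could look like. All names here (EventType, Event, fromSDK, the field names) are hypothetical, and sdkEvent is only a stand-in for copilot.SessionEvent so the example compiles without the Copilot SDK; the real struct fields may differ.

```go
package main

import "fmt"

// EventType is a hypothetical agent-neutral event kind.
type EventType string

const (
	EventAssistantMessage EventType = "assistant.message"
	EventToolCall         EventType = "tool.call"
)

// Event is a hypothetical generic event that carries only the fields
// callers of ExecutionResponse are assumed to need.
type Event struct {
	Type    EventType
	Content string
}

// ExecutionResponse is a trimmed stand-in for the real struct.
type ExecutionResponse struct {
	Events []Event
}

// sdkEvent stands in for copilot.SessionEvent (assumed shape).
type sdkEvent struct {
	Kind string
	Text string
}

// fromSDK shows the adaptation the Copilot engine would do internally:
// SDK events go in, generic events come out.
func fromSDK(in []sdkEvent) []Event {
	out := make([]Event, 0, len(in))
	for _, e := range in {
		out = append(out, Event{Type: EventType(e.Kind), Content: e.Text})
	}
	return out
}

func main() {
	resp := ExecutionResponse{
		Events: fromSDK([]sdkEvent{{Kind: "assistant.message", Text: "done"}}),
	}
	fmt.Println(len(resp.Events), resp.Events[0].Type)
}
```

With this shape, only the Copilot engine package would import the SDK; everything that consumes ExecutionResponse would see the generic types.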
Phase 2: New Engines (P2 — do later)
- Claude Code engine: CLI wrapper that shells out to claude CLI, passes prompt, captures output
- Generic CLI agent engine: Configurable command template for any CLI-based agent
- Each engine: ~300-500 lines implementing the AgentEngine interface (Initialize, Execute, Shutdown)
Implementation Notes
- AgentEngine interface in engine.go is already well-abstracted; the issue is the ExecutionResponse coupling
- ExecutionResponse.Events []copilot.SessionEvent is the main coupling point
- ExtractMessages() method checks evt.Type == copilot.AssistantMessage and needs a generic equivalent
- New engines will need API keys for each provider, which complicates the getting-started experience
Acceptance Criteria
Phase 1
- ExecutionResponse uses generic event types (no copilot import)
Phase 2
- executor field in eval.yaml supports engine selection
Assignee Notes
Richard Park has explored supporting other engines, so assign Phase 2 to him. Phase 1 (decoupling) can be done by anyone on the team as a prerequisite.