[E1] Decouple ExecutionResponse from Copilot SDK + Multi-Agent Engine Support #10

@spboyer

Migrated from spboyer/waza#195

Summary

Abstract ExecutionResponse away from copilot.SessionEvent types to enable multi-agent executor support, then add engines for additional agents (Claude Code, Codex, generic CLI agent).

Problem

ExecutionResponse in internal/execution/engine.go directly imports and uses copilot.SessionEvent — tightly coupling the execution interface to a single agent SDK. This prevents adding executors for other agents (Claude Code, Codex, OpenCode, etc.).

SkillsBench supports 5+ agents via their Harbor framework. Our primary audience uses Copilot, but cross-agent validation is increasingly valuable as skills become platform-agnostic.

Proposed Solution — Two Phases

Phase 1: Interface Decoupling (P1 — do now)

  1. Define generic event types in internal/execution/ that mirror the needed fields from copilot.SessionEvent without importing the Copilot SDK
  2. Update ExecutionResponse to use these generic types
  3. Copilot engine adapts SDK events → generic events internally
  4. No behavioral change — just cleaner abstraction
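A minimal sketch of what Phase 1 could look like. The names `Event`, `EventType`, and `fromSDK` are hypothetical, and `sdkEvent` is only a stand-in for copilot.SessionEvent so the example compiles without the SDK; the real SDK type has different fields.

```go
package main

import "fmt"

// EventType is a hypothetical, SDK-neutral event kind.
type EventType string

const (
	EventAssistantMessage EventType = "assistant.message"
	EventToolCall         EventType = "tool.call"
	EventOther            EventType = "other"
)

// Event is a hypothetical generic event carrying only the fields the
// execution layer is assumed to need.
type Event struct {
	Type    EventType
	Content string
}

// ExecutionResponse references generic events only, so this package
// would no longer import the Copilot SDK.
type ExecutionResponse struct {
	Events []Event
}

// sdkEvent stands in for copilot.SessionEvent (assumption: the real
// type exposes some kind discriminator and a text payload).
type sdkEvent struct {
	Kind string
	Text string
}

// fromSDK is the adapter the Copilot engine would call internally to
// convert SDK events into generic events.
func fromSDK(in []sdkEvent) []Event {
	out := make([]Event, 0, len(in))
	for _, e := range in {
		t := EventOther
		switch e.Kind {
		case "assistant.message":
			t = EventAssistantMessage
		case "tool.call":
			t = EventToolCall
		}
		out = append(out, Event{Type: t, Content: e.Text})
	}
	return out
}

func main() {
	resp := ExecutionResponse{Events: fromSDK([]sdkEvent{
		{Kind: "assistant.message", Text: "hello"},
		{Kind: "session.idle"},
	})}
	for _, e := range resp.Events {
		fmt.Printf("%s %q\n", e.Type, e.Content)
	}
}
```

Keeping the conversion inside the Copilot engine means callers of ExecutionResponse never see SDK types, which is what makes step 4 (no behavioral change) plausible.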

Phase 2: New Engines (P2 — do later)

  1. Claude Code engine: CLI wrapper that shells out to the claude CLI, passes the prompt, and captures the output
  2. Generic CLI agent engine: Configurable command template for any CLI-based agent
  3. Each engine: roughly 300-500 lines implementing the AgentEngine interface (Initialize, Execute, Shutdown)

Implementation Notes

  • AgentEngine interface in engine.go is already well-abstracted — the issue is ExecutionResponse coupling
  • ExecutionResponse.Events []copilot.SessionEvent is the main coupling point
  • ExtractMessages() method checks evt.Type == copilot.AssistantMessage — needs generic equivalent
  • New engines will need API keys for each provider, which complicates the getting-started experience
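For the ExtractMessages() point, the generic equivalent might look like the sketch below. The `Event` type, the `EventAssistantMessage` constant, and the `[]string` return type are assumptions; the current method's actual signature is not shown in this issue.

```go
package main

import "fmt"

// Hypothetical generic event types (not the SDK's).
type EventType string

const EventAssistantMessage EventType = "assistant.message"

type Event struct {
	Type    EventType
	Content string
}

type ExecutionResponse struct {
	Events []Event
}

// ExtractMessages returns the content of assistant messages, comparing
// against the generic constant instead of copilot.AssistantMessage.
func (r *ExecutionResponse) ExtractMessages() []string {
	var msgs []string
	for _, evt := range r.Events {
		if evt.Type == EventAssistantMessage && evt.Content != "" {
			msgs = append(msgs, evt.Content)
		}
	}
	return msgs
}

func main() {
	r := &ExecutionResponse{Events: []Event{
		{Type: EventAssistantMessage, Content: "first"},
		{Type: "tool.call", Content: "ignored"},
		{Type: EventAssistantMessage, Content: "second"},
	}}
	fmt.Println(r.ExtractMessages())
}
```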

Acceptance Criteria

Phase 1

  • ExecutionResponse uses generic event types (no copilot import)
  • Copilot engine converts SDK events to generic types
  • All existing tests pass unchanged
  • No behavioral regression

Phase 2

  • At least one non-Copilot engine implemented (Claude Code recommended)
  • executor field in eval.yaml supports engine selection
  • Generic CLI engine supports configurable command templates
  • Documentation updated with multi-agent usage examples

Assignee Notes

Richard Park has explored supporting other engines — assign Phase 2 to him. Phase 1 (decoupling) can be done by anyone on the team as a prerequisite.
