# [E1] Decouple ExecutionResponse from Copilot SDK + Multi-Agent Engine Support (#10)

> _Migrated from [spboyer/waza#195](https://github.com/spboyer/waza/issues/195)_

## Summary

Abstract `ExecutionResponse` away from `copilot.SessionEvent` types to enable multi-agent executor support. Then add engines for additional agents (Claude Code, Codex, generic CLI agent).

## Problem

`ExecutionResponse` in `internal/execution/engine.go` directly imports and uses `copilot.SessionEvent` — tightly coupling the execution interface to a single agent SDK. This prevents adding executors for other agents (Claude Code, Codex, OpenCode, etc.).

SkillsBench supports 5+ agents via its Harbor framework. Our primary audience uses Copilot, but cross-agent validation is increasingly valuable as skills become platform-agnostic.

## Proposed Solution — Two Phases

### Phase 1: Interface Decoupling (P1 — do now)

1. Define generic event types in `internal/execution/` that mirror the needed fields from `copilot.SessionEvent` without importing the Copilot SDK
2. Update `ExecutionResponse` to use these generic types
3. Copilot engine adapts SDK events → generic events internally
4. No behavioral change — just cleaner abstraction
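
The decoupling steps above could look roughly like the sketch below. It is illustrative only: `EventType`, `Event`, the constant names, and `fromSDK` are hypothetical, and `sdkEvent` is a stand-in for an SDK-specific event rather than the real `copilot.SessionEvent`, whose fields are not reproduced here.

```go
package main

import "fmt"

// EventType is a hypothetical SDK-neutral event kind.
type EventType string

const (
	EventAssistantMessage EventType = "assistant.message"
	EventToolCall         EventType = "tool.call"
	EventOther            EventType = "other"
)

// Event is a hypothetical generic event carrying only the fields
// the execution layer needs.
type Event struct {
	Type    EventType
	Content string
}

// ExecutionResponse is a simplified stand-in for the real struct;
// the point is that Events no longer references an SDK type.
type ExecutionResponse struct {
	Events []Event
}

// sdkEvent stands in for an SDK-specific event (assumption: this is
// NOT the real copilot.SessionEvent shape).
type sdkEvent struct {
	Kind string
	Text string
}

// fromSDK is the adapter an engine would run internally to convert
// SDK events into generic events.
func fromSDK(in []sdkEvent) []Event {
	out := make([]Event, 0, len(in))
	for _, e := range in {
		t := EventOther
		switch e.Kind {
		case "assistant_message":
			t = EventAssistantMessage
		case "tool_call":
			t = EventToolCall
		}
		out = append(out, Event{Type: t, Content: e.Text})
	}
	return out
}

func main() {
	resp := ExecutionResponse{Events: fromSDK([]sdkEvent{
		{Kind: "assistant_message", Text: "hello"},
		{Kind: "session_idle"},
	})}
	for _, evt := range resp.Events {
		fmt.Printf("%s %q\n", evt.Type, evt.Content)
	}
}
```

Keeping the adapter inside the Copilot engine means the `copilot` import stays in one package, and unknown SDK event kinds degrade to a catch-all type instead of being dropped.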

### Phase 2: New Engines (P2 — do later)

1. **Claude Code engine:** CLI wrapper that shells out to the `claude` CLI, passes the prompt, and captures the output
2. **Generic CLI agent engine:** Configurable command template for any CLI-based agent
3. Each engine should be roughly 300-500 lines implementing the `AgentEngine` interface (`Initialize`, `Execute`, `Shutdown`)
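
A generic CLI engine with a command template might be sketched as follows. The method signatures on `AgentEngine` are assumptions (the issue only names the three methods), `CLIEngine` and the `{{prompt}}` placeholder are hypothetical, and `echo` stands in for a real agent binary.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// AgentEngine mirrors the three method names from the issue; the
// signatures here are assumed for illustration.
type AgentEngine interface {
	Initialize(ctx context.Context) error
	Execute(ctx context.Context, prompt string) (string, error)
	Shutdown(ctx context.Context) error
}

// CLIEngine is a hypothetical engine driven by a command template.
// Any argument containing "{{prompt}}" has it replaced by the prompt.
type CLIEngine struct {
	Command string
	Args    []string
}

// Initialize checks that the configured binary is on PATH.
func (e *CLIEngine) Initialize(ctx context.Context) error {
	_, err := exec.LookPath(e.Command)
	return err
}

// renderArgs substitutes the prompt into the argument template.
func (e *CLIEngine) renderArgs(prompt string) []string {
	args := make([]string, len(e.Args))
	for i, a := range e.Args {
		args[i] = strings.ReplaceAll(a, "{{prompt}}", prompt)
	}
	return args
}

// Execute runs the command without a shell and returns trimmed stdout.
func (e *CLIEngine) Execute(ctx context.Context, prompt string) (string, error) {
	cmd := exec.CommandContext(ctx, e.Command, e.renderArgs(prompt)...)
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("running %s: %w", e.Command, err)
	}
	return strings.TrimSpace(string(out)), nil
}

// Shutdown has nothing to release for a one-shot CLI process.
func (e *CLIEngine) Shutdown(ctx context.Context) error { return nil }

func main() {
	ctx := context.Background()
	var engine AgentEngine = &CLIEngine{Command: "echo", Args: []string{"{{prompt}}"}}
	if err := engine.Initialize(ctx); err != nil {
		fmt.Println("init failed:", err)
		return
	}
	out, err := engine.Execute(ctx, "hello from the harness")
	if err != nil {
		fmt.Println("execute failed:", err)
		return
	}
	fmt.Println(out)
	_ = engine.Shutdown(ctx)
}
```

Passing the prompt as a single argv element rather than through a shell avoids quoting and injection problems; a Claude Code engine could be a thin configuration of the same type once its actual CLI flags are confirmed.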

## Implementation Notes

- `AgentEngine` interface in `engine.go` is already well-abstracted — the issue is `ExecutionResponse` coupling
- `ExecutionResponse.Events []copilot.SessionEvent` is the main coupling point
- `ExtractMessages()` method checks `evt.Type == copilot.AssistantMessage` — needs generic equivalent
- New engines will need API keys for each provider, which complicates the getting-started experience
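
As a rough illustration of the generic equivalent mentioned above, `ExtractMessages()` could compare against an SDK-neutral constant instead of `copilot.AssistantMessage`. The types, the constant name, and the `[]string` return type are assumptions; the real method may return something richer.

```go
package main

import "fmt"

// EventType and Event are hypothetical generic types.
type EventType string

const EventAssistantMessage EventType = "assistant.message"

type Event struct {
	Type    EventType
	Content string
}

// ExecutionResponse is a simplified stand-in for the real struct.
type ExecutionResponse struct {
	Events []Event
}

// ExtractMessages returns the content of every assistant message,
// in order, without referencing any SDK-specific type.
func (r *ExecutionResponse) ExtractMessages() []string {
	var msgs []string
	for _, evt := range r.Events {
		if evt.Type == EventAssistantMessage && evt.Content != "" {
			msgs = append(msgs, evt.Content)
		}
	}
	return msgs
}

func main() {
	resp := &ExecutionResponse{Events: []Event{
		{Type: EventAssistantMessage, Content: "first"},
		{Type: "tool.call", Content: "ls"},
		{Type: EventAssistantMessage, Content: "second"},
	}}
	fmt.Println(resp.ExtractMessages())
}
```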

## Acceptance Criteria

### Phase 1
- [ ] `ExecutionResponse` uses generic event types (no `copilot` import)
- [ ] Copilot engine converts SDK events to generic types
- [ ] All existing tests pass unchanged
- [ ] No behavioral regression

### Phase 2
- [ ] At least one non-Copilot engine implemented (Claude Code recommended)
- [ ] `executor` field in eval.yaml supports engine selection
- [ ] Generic CLI engine supports configurable command templates
- [ ] Documentation updated with multi-agent usage examples
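
Engine selection from the `executor` field could be handled by a small registry, sketched below. The executor names (`copilot`, `claude-code`, `cli`), the default, and the trimmed `Engine` interface are all hypothetical placeholders, not the project's actual configuration values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Engine is a trimmed stand-in for the AgentEngine interface.
type Engine interface{ Name() string }

type stubEngine struct{ name string }

func (s stubEngine) Name() string { return s.name }

// registry maps hypothetical executor names to engine constructors.
var registry = map[string]func() Engine{
	"copilot":     func() Engine { return stubEngine{"copilot"} },
	"claude-code": func() Engine { return stubEngine{"claude-code"} },
	"cli":         func() Engine { return stubEngine{"cli"} },
}

// known returns the registered executor names in sorted order.
func known() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// NewEngine resolves the executor value read from eval.yaml.
// Assumption: an empty value falls back to the Copilot engine.
func NewEngine(executor string) (Engine, error) {
	if executor == "" {
		executor = "copilot"
	}
	factory, ok := registry[executor]
	if !ok {
		return nil, fmt.Errorf("unknown executor %q (known: %s)",
			executor, strings.Join(known(), ", "))
	}
	return factory(), nil
}

func main() {
	for _, name := range []string{"", "cli", "nope"} {
		e, err := NewEngine(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("selected:", e.Name())
	}
}
```

Defaulting to Copilot when the field is absent would keep existing eval files working unchanged.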

## Assignee Notes

Richard Park has explored supporting other engines — assign Phase 2 to him. Phase 1 (decoupling) can be done by anyone on the team as a prerequisite.
