Meta
type: DesignDecision
stage: draft
maturity: L1
created: 2026-05-10
inputs:
- "OpenAI harness engineering (2026-02) — prefer composable, API-stable, training-data-represented technologies"
- "#24 — version pinning (dependency cadence)"
- "#14 — architectural drivers"
related: ["#14", "#24", "#33"]
Purpose. Establish a technology selection criterion that prefers boring, auditable, and agent-legible dependencies over novel or opaque ones — and define when re-implementing a thin wrapper is preferable to taking a complex upstream dependency.
Context
From OpenAI's harness engineering experience:
"Technologies often described as 'boring' are significantly easier for agents to model, due to composability, API stability, and their representation in the training dataset."
"In some cases it was cheaper to have the agent re-implement subsets of functionality rather than work around opaque upstream behavior from public libraries. For example, instead of using a generic p-limit-style package, we implemented our own map-with-concurrency helper: it's tightly integrated with our OpenTelemetry instrumentation, has 100% test coverage, and behaves exactly as our runtime expects."
"More of the system in a form that the agent can directly inspect, validate, and modify increases leverage — not just for Codex, but for other agents as well."
This principle applies directly to every dependency decision in this project, and is distinct from the version pinning concern in #24 (which specifies what to pin; this issue specifies how to choose).
The preference hierarchy
When selecting a dependency or deciding whether to re-implement, evaluate in order:
1. Agent-legibility (highest weight)
Can an agent reason about the full behavior of this dependency from the repository alone?
- Prefer: dependencies with stable, well-documented APIs; behavior fully specified in docs or source; no magic or hidden state
- Avoid: libraries with opaque internals, plugin ecosystems, or undocumented edge cases that require reading issues/PRs to understand
2. API stability
Does this library have a stable public API that changes infrequently?
- Prefer: libraries with a long-stable public API and a clear deprecation path
- Avoid: libraries that break their API frequently or expose internals as the de facto interface
3. Training data representation
Is this technology well-represented in LLM training data?
- Prefer: established libraries with large communities and extensive Stack Overflow / documentation coverage
- Avoid: niche or recently published libraries with minimal public discussion
4. Re-implement threshold
When should we re-implement rather than depend?
Re-implement a thin wrapper when all three hold:
- The upstream library does ≥30% more than we need (we would carry complexity for surface area we never use)
- The replacement would be ≤ ~100 lines and can realistically carry 100% test coverage
- The re-implementation can be tightly integrated with our observability/instrumentation
Do not re-implement when:
- The upstream library handles security-sensitive concerns (crypto, auth, sandboxing)
- The re-implementation would require maintaining parsing logic for external formats (JSON, YAML, semver)
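On the other side of the line, external formats are where hand-rolling goes wrong. A short illustration with the npm `semver` package (a parser whose edge cases are easy to underestimate):

```ts
import semver from "semver";

// Range semantics hide edge cases (prerelease exclusion, wildcards, build
// metadata) that a hand-rolled parser will get wrong. Take the dependency.
semver.satisfies("1.2.3", "^1.2.0");      // true
semver.satisfies("1.3.0-rc.1", "^1.2.0"); // false: prereleases are excluded
                                          // by default, a classic footgun
```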
Specific decisions to document
This issue should produce ADR entries for each of the following decisions (or confirm the existing choice against the criteria above):
| Dependency area | Question | Notes |
| --- | --- | --- |
| Schema validation | Zod vs. hand-rolled vs. other | #31 covers `WorkflowPackageLoader` specifically; this covers the general policy |
| LLM client | Anthropic SDK vs. custom HTTP client | Should align with the `LlmProviderPort` abstraction in #14 |
| Git worktree management | `simple-git` vs. raw `execa` git calls vs. custom | SAO (#43) depends on this; opaque magic in wrappers is a risk |
| Concurrency utilities | `p-limit` style vs. custom (per harness engineering insight) | SAO admission control (#51) depends on this |
| Logging | `pino` vs. `winston` vs. console-only | Must align with the `LoggerPort` shape; #14 §3.1 specifies console-only for the public surface |
| Test runner | Vitest (decided in #26) | Already decided; document the rationale here |
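For the git worktree row, a sketch of what the raw-`execa` option looks like (function name and arguments are hypothetical; assumes execa's promise-based API):

```ts
import { execa } from "execa";

// Hypothetical helper: every git flag is visible at the call site,
// with no wrapper state or hidden defaults between us and git.
export async function addWorktree(
  repoDir: string,
  worktreePath: string,
  branch: string,
): Promise<void> {
  await execa("git", ["worktree", "add", worktreePath, branch], { cwd: repoDir });
}
```

Whether this beats `simple-git` is exactly what the ADR should decide; the sketch only shows what "no opaque magic in wrappers" means in practice.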
Acceptance
- Selection principles captured in an ADR (`docs/adr/XXXX-technology-selection-principles.md`) covering the decisions in the table above
- `docs/ARCHITECTURE.md` updated to reference the selection principles