
Phase 1 · Technology selection principles — prefer boring, auditable, and agent-legible dependencies #56

@Luis85

Description

Meta

type: DesignDecision
stage: draft
maturity: L1
created: 2026-05-10
inputs:
  - "OpenAI harness engineering (2026-02) — prefer composable, API-stable, training-data-represented technologies"
  - "#24 — version pinning (dependency cadence)"
  - "#14 — architectural drivers"
related: ["#14", "#24", "#33"]

Purpose. Establish a technology selection criterion that prefers boring, auditable, and agent-legible dependencies over novel or opaque ones — and define when re-implementing a thin wrapper is preferable to taking a complex upstream dependency.


Context

From OpenAI's harness engineering experience:

"Technologies often described as 'boring' are significantly easier for agents to model, due to composability, API stability, and their representation in the training dataset."

"In some cases it was cheaper to have the agent re-implement subsets of functionality rather than work around opaque upstream behavior from public libraries. For example, instead of using a generic p-limit-style package, we implemented our own map-with-concurrency helper: it's tightly integrated with our OpenTelemetry instrumentation, has 100% test coverage, and behaves exactly as our runtime expects."

"More of the system in a form that the agent can directly inspect, validate, and modify increases leverage — not just for Codex, but for other agents as well."

This principle applies directly to every dependency decision in this project, and is distinct from the version pinning concern in #24 (which specifies what to pin; this issue specifies how to choose).


The preference hierarchy

When selecting a dependency or deciding whether to re-implement, evaluate in order:

1. Agent-legibility (highest weight)

Can an agent reason about the full behavior of this dependency from the repository alone?

  • Prefer: dependencies with stable, well-documented APIs; behavior fully specified in docs or source; no magic or hidden state
  • Avoid: libraries with opaque internals, plugin ecosystems, or undocumented edge cases that require reading issues/PRs to understand

2. API stability

Does this library have a stable public API that changes infrequently?

3. Training data representation

Is this technology well-represented in LLM training data?

  • Prefer: established libraries with large communities and extensive Stack Overflow / documentation coverage
  • Avoid: niche or recently-published libraries with minimal public discussion

4. Re-implement threshold

When should we re-implement rather than depend?

Re-implement a thin wrapper when all three hold (a concrete sketch follows the lists below):

  1. The upstream library does ≥30% more than we need (the unused surface adds complexity without adding value)
  2. The implementation is ≤ ~100 lines and has 100% test coverage
  3. The re-implementation can be tightly integrated with our observability/instrumentation

Do not re-implement when:

  • The upstream library handles security-sensitive concerns (crypto, auth, sandboxing)
  • The re-implementation would require maintaining parsing logic for external formats (JSON, YAML, semver)
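
To make the threshold concrete, the quoted map-with-concurrency example translates into a helper along these lines. This is a minimal sketch, not the OpenAI implementation and not code in this repository; the OpenTelemetry integration mentioned in the quote is reduced to a comment marking where a per-item span would be opened:

```ts
// Sketch of a map-with-concurrency helper (illustrative only).
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }

  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index; claiming is safe because
  // there is no await between the bounds check and the increment.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      // An instrumented version would open an OpenTelemetry span here, per item.
      results[index] = await fn(items[index], index);
    }
  };

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}
```

At roughly twenty lines this sits well under the ≤ ~100-line criterion, and 100% test coverage is feasible because there is no hidden state beyond the shared index.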

Specific decisions to document

This issue should produce an ADR entry for each of the following decisions (or confirm the existing choice against the criteria above); an illustrative sketch of one option from the worktree row follows the table:

| Dependency area | Question | Notes |
| --- | --- | --- |
| Schema validation | Zod vs. hand-rolled vs. other | #31 covers WorkflowPackageLoader specifically; this covers the general policy |
| LLM client | Anthropic SDK vs. custom HTTP client | Should align with LlmProviderPort abstraction in #14 |
| Git worktree management | simple-git vs. raw execa git calls vs. custom | SAO (#43) depends on this; opaque magic in wrappers is a risk |
| Concurrency utilities | p-limit style vs. custom (per harness engineering insight) | SAO admission control (#51) depends on this |
| Logging | pino vs. winston vs. console-only | Must align with LoggerPort shape; #14 §3.1 specifies console-only for public surface |
| Test runner | Vitest (decided in #26) | Already decided; document the rationale here |
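
For the worktree row specifically, the "raw execa git calls" option would look roughly like the sketch below. The module layout and function names are hypothetical, not part of the current codebase; the point is that every git invocation is spelled out in-repo, so there is no wrapper behavior an agent cannot inspect:

```ts
// Hypothetical sketch of the "raw execa git calls" option for worktree management.
import { execa } from "execa";

export async function addWorktree(
  repoDir: string,
  worktreePath: string,
  branch: string,
): Promise<void> {
  // Equivalent to running `git worktree add <path> <branch>` inside the repository.
  await execa("git", ["worktree", "add", worktreePath, branch], { cwd: repoDir });
}

export async function removeWorktree(repoDir: string, worktreePath: string): Promise<void> {
  // `--force` is deliberately omitted; callers must clean up dirty worktrees explicitly.
  await execa("git", ["worktree", "remove", worktreePath], { cwd: repoDir });
}
```

The criteria above would then decide whether a helper like this stays local or whether simple-git earns its place despite the extra surface.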

Acceptance

  • Preference hierarchy written as an ADR (docs/adr/XXXX-technology-selection-principles.md)
  • Re-implement threshold criteria documented in the ADR
  • Each dependency area in the table above has a V1 decision recorded
  • All V1 dependency decisions satisfy the preference hierarchy or document an explicit exception
  • ADR referenced from docs/ARCHITECTURE.md
  • #24 (version pinning) updated to reference this ADR as the selection rationale

Labels

roadmap:architecture (Phase 1: ratified architecture proposal, data model, and design decisions before code)
