Fast-fail on agent errors instead of waiting for lifecycle timeout #3

@nev-offload

Problem

The client has a 300-second lifecycle timeout. When the agent errors immediately (e.g., auth failure, model error), the action still waits the full 300s before timing out. This wastes CI minutes and makes debugging slow.

Expected behavior

  • Detect error responses from the gateway/agent immediately
  • If the agent returns an error (non-2xx status, or an error in the response body), fail the action right away
  • Only use the 300s timeout for genuinely long-running agent tasks
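To make the expected behavior concrete, here is a minimal sketch of an immediate error check on a gateway response. The names `AgentError` and `check_response`, and the assumption that errors arrive as a JSON object with an `error` field, are hypothetical; the real response shape may differ.

```python
import json


class AgentError(RuntimeError):
    """Hypothetical exception type for failures reported by the gateway/agent."""


def check_response(status, body):
    """Raise AgentError right away if the response signals a failure.

    Assumes (not confirmed by the client code) that error bodies are JSON
    objects carrying an "error" field.
    """
    # Any non-2xx status is treated as a failure.
    if not 200 <= status < 300:
        raise AgentError(f"gateway returned HTTP {status}: {body[:200]}")
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # non-JSON body: nothing further to inspect
    if isinstance(payload, dict) and payload.get("error"):
        raise AgentError(f"agent error: {payload['error']}")
    return payload
```

Calling this as soon as the response arrives, before entering any wait loop, would let an auth failure surface in seconds rather than after the lifecycle timeout.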

Implementation notes

  • Check HTTP response status codes — fail immediately on 4xx/5xx
  • Parse streaming responses for error events
  • Add a heartbeat/keepalive mechanism — if no progress for N seconds, assume failure
  • Consider separate timeouts: connection timeout (30s), idle timeout (60s), max timeout (300s)
  • Surface the actual error message in the action output for easy debugging
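One possible shape for the stream parsing and the idle/max timeouts is sketched below. It assumes a reader thread pushes JSON-encoded events onto a `queue.Queue` and pushes `None` when the stream ends; the event fields `type` and `message`, and the names `wait_for_agent`, `AgentError`, and `AgentTimeout`, are hypothetical. The connection timeout is left to the HTTP client and is not shown.

```python
import json
import queue
import time


class AgentError(RuntimeError):
    """Hypothetical: the agent reported an error event."""


class AgentTimeout(RuntimeError):
    """Hypothetical: the idle or max timeout elapsed."""


def wait_for_agent(events, idle_timeout=60.0, max_timeout=300.0):
    """Consume events until the stream ends, an error arrives, or a timeout hits.

    `events` is a queue.Queue of JSON strings; None marks end of stream.
    """
    deadline = time.monotonic() + max_timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise AgentTimeout(f"agent exceeded max timeout of {max_timeout}s")
        # Wait for the next event, but never past the overall deadline.
        wait = min(idle_timeout, remaining)
        try:
            raw = events.get(timeout=wait)
        except queue.Empty:
            if wait < idle_timeout:
                raise AgentTimeout(f"agent exceeded max timeout of {max_timeout}s")
            raise AgentTimeout(f"no progress for {idle_timeout}s, assuming failure")
        if raw is None:
            return  # stream finished normally
        event = json.loads(raw)
        if event.get("type") == "error":
            # Surface the agent's own message so it shows up in the action output.
            raise AgentError(event.get("message", "unknown agent error"))
```

Any event, including a heartbeat, resets the idle wait, so a long-running task stays alive only while it keeps reporting progress, and the 300s ceiling still applies.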

Priority

P1 — Major DX issue. Waiting 5 minutes to see an auth error is painful.

Labels: bug
