fix(infra): Fargate task OOM-kills mid-stream — increase default memory from 1024 to 2048 MiB #56

@jonathan-nicoletti

Bug

Claude Code fails mid-response with "API Error: The socket connection was closed unexpectedly", particularly during large tool call outputs (e.g. writing large files). Small responses complete fine.

Root cause

The CDK stack sets memoryLimitMiB: 1024 (1 GiB). Under production load, the Rust runtime, Postgres connection pool, background loops (health poller, spend tracker, pricing refresh, cache poller), and one or more concurrent large streaming responses together push the container to its memory limit, and it is OOM-killed (exit code 137, i.e. 128 + SIGKILL). From the client's perspective the TCP connection drops without a clean close.

This is not an ALB timeout issue. The ALB idle timeout is already set to 900s and data is actively flowing during the drop. There is no tower/axum timeout middleware in CCAG. The culprit is the process being killed mid-stream.
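
To make the exit code reading concrete, here is a minimal sketch of the usual convention that codes above 128 mean "terminated by signal (code - 128)". The helper name describeExitCode is hypothetical and not part of CCAG. Note that 137 only proves the process received SIGKILL; an OOM kill is one common cause, but ECS also sends SIGKILL when a task does not stop within its stop timeout, so the code should be read together with the task's stopped reason.

```typescript
// Signals most relevant to container shutdowns (deliberately not a full table).
const SIGNAL_NAMES: { [signal: number]: string | undefined } = {
  9: "SIGKILL",
  15: "SIGTERM",
};

// Hypothetical helper: turn a container exit code into a readable description.
function describeExitCode(code: number): string {
  if (code > 128 && code < 256) {
    const signal = code - 128;
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`;
    return `killed by ${name}`;
  }
  return code === 0 ? "clean exit" : `exited with error code ${code}`;
}
```

For example, describeExitCode(137) yields "killed by SIGKILL", while describeExitCode(143) yields "killed by SIGTERM", which is what a graceful ECS stop normally produces.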

Evidence

  • Error is client-side socket close, not a timeout error
  • Small responses always succeed; large responses drop
  • No TimeoutLayer or .timeout() in src/main.rs or src/api/mod.rs
  • Fargate task definition: 512 CPU units, 1024 MiB — tight for a multi-threaded async server under load

Fix

Increase memoryLimitMiB from 1024 to 2048 in infra/stack.ts:

const taskDef = new ecs.FargateTaskDefinition(this, 'TaskDef', {
  memoryLimitMiB: 2048, // was 1024; insufficient under production load
  cpu: 512,             // unchanged
  // ...remaining properties unchanged
});

2048 MiB doubles the memory available to concurrent streaming responses without over-provisioning. 4096 MiB, the maximum Fargate allows at 512 CPU units, is an option if OOM kills persist under very high concurrency.
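
Fargate only accepts certain CPU/memory pairings, so any future bump has to stay inside the range allowed for cpu: 512. The sketch below encodes the pairings for task sizes up to 4 vCPU as I understand them from the AWS documentation; treat the table as an assumption to verify against the current docs. The function names are hypothetical and larger task sizes are omitted.

```typescript
// Memory values (MiB) Fargate accepts for a given CPU setting, up to 4 vCPU.
function validMemoryMiB(cpu: number): number[] {
  // 256 CPU units only supports three discrete sizes.
  if (cpu === 256) return [512, 1024, 2048];

  // Other sizes accept a [min, max] range in 1024 MiB increments.
  const ranges: { [cpu: number]: [number, number] | undefined } = {
    512: [1024, 4096],
    1024: [2048, 8192],
    2048: [4096, 16384],
    4096: [8192, 30720],
  };
  const range = ranges[cpu];
  if (!range) return [];

  const options: number[] = [];
  for (let m = range[0]; m <= range[1]; m += 1024) options.push(m);
  return options;
}

// True when the pair can be used in a Fargate task definition.
function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  return validMemoryMiB(cpu).indexOf(memoryMiB) !== -1;
}
```

Under this table both 2048 and 4096 are valid with cpu: 512, while anything above 4096 MiB would also require raising cpu to 1024 or more.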

Steps to reproduce

  1. Deploy CCAG on Fargate with default CDK stack (1024 MiB)
  2. Ask Claude Code to write a large file (>500 lines) in a single tool call
  3. Observe API Error: The socket connection was closed unexpectedly mid-response
  4. Check the stopped ECS task's details and its CloudWatch logs around the time of the drop: look for container exit code 137 or an OutOfMemoryError stopped reason
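
Step 4 can be partly automated. The sketch below classifies a stopped task from the JSON that ECS DescribeTasks returns, using only the stoppedReason field and each container's exitCode and reason. The interfaces are a hand-written subset of that response, the helper names are hypothetical, and the "OutOfMemory" substring match is an assumption about how ECS words the reason.

```typescript
// Minimal subset of the task shape returned by ECS DescribeTasks.
interface ContainerSummary {
  name?: string;
  exitCode?: number;
  reason?: string;
}

interface TaskSummary {
  taskArn?: string;
  stoppedReason?: string;
  containers?: ContainerSummary[];
}

type Verdict = "oom" | "sigkill" | "other";

// Assumption: ECS reports OOM kills with a reason containing "OutOfMemory".
function mentionsOom(text: string | undefined): boolean {
  return text !== undefined && text.indexOf("OutOfMemory") !== -1;
}

// Hypothetical helper: decide how strongly a stopped task points at an OOM kill.
function classifyStoppedTask(task: TaskSummary): Verdict {
  const containers = task.containers ?? [];
  if (mentionsOom(task.stoppedReason) || containers.some((c) => mentionsOom(c.reason))) {
    return "oom"; // ECS explicitly reported an out-of-memory kill
  }
  if (containers.some((c) => c.exitCode === 137)) {
    return "sigkill"; // SIGKILL without a stated reason: consistent with OOM, not proof
  }
  return "other";
}
```

A verdict of "oom" confirms the root cause described above; "sigkill" alone is worth correlating with the task's memory utilization metrics before drawing a conclusion.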
