Bug
Claude Code reports "API Error: The socket connection was closed unexpectedly" mid-response, particularly during large tool call outputs (e.g. writing large files). Small responses complete fine.
Root cause
The CDK stack sets memoryLimitMiB: 1024 (1 GiB). Under production load, the Rust runtime, the Postgres connection pool, the background loops (health poller, spend tracker, pricing refresh, cache poller), and one or more concurrent large streaming responses together push the process to the memory ceiling, and the OS kills it with SIGKILL (exit code 137). From the client's perspective the TCP connection drops without a clean close.
This is not an ALB timeout issue. The ALB idle timeout is already set to 900s and data is actively flowing during the drop. There is no tower/axum timeout middleware in CCAG. The culprit is the process being killed mid-stream.
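For reference, exit code 137 follows the common convention that a process terminated by a signal exits with 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which is what an out-of-memory kill delivers. A minimal sketch of that decoding is below; describeExitCode and SIGNAL_NAMES are hypothetical helpers written for illustration and are not part of CCAG.

```typescript
// Signal numbers relevant to this report (Linux numbering).
const SIGNAL_NAMES: Record<number, string> = {
  9: "SIGKILL",
  15: "SIGTERM",
};

// Hypothetical helper: interpret a container exit code using the
// "128 + signal number" convention.
function describeExitCode(code: number): string {
  if (code === 0) return "clean exit";
  if (code > 128 && code < 256) {
    const signal = code - 128;
    const name = SIGNAL_NAMES[signal] !== undefined ? SIGNAL_NAMES[signal] : `signal ${signal}`;
    return `terminated by ${name}`;
  }
  return `application error (exit code ${code})`;
}

console.log(describeExitCode(137)); // terminated by SIGKILL
```

A SIGKILL cannot be caught by the process, which would explain why the server never gets a chance to close the stream cleanly.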
Evidence
- Error is client-side socket close, not a timeout error
- Small responses always succeed; large responses drop
- No TimeoutLayer or .timeout() in src/main.rs or src/api/mod.rs
- Fargate task definition: 512 CPU units, 1024 MiB — tight for a multi-threaded async server under load
Fix
Increase memoryLimitMiB from 1024 to 2048 in infra/stack.ts:
    const taskDef = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      memoryLimitMiB: 2048, // was 1024; insufficient under production load
      cpu: 512,
      // ...any remaining task definition props stay unchanged
    });
2048 MiB provides enough headroom for concurrent streaming responses without being excessive. 4096 MiB is an option if issues persist under very high concurrency.
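Fargate only accepts certain CPU and memory pairings, so it is worth confirming that both proposed values are valid with cpu: 512. The sketch below encodes the pairings as I understand them from the Fargate task size documentation for sizes up to 4 vCPU; the table should be checked against the current AWS docs, and isValidFargateSize is a hypothetical helper, not a CDK API.

```typescript
// Build an inclusive list of values from start to end in fixed steps.
function range(start: number, end: number, step: number): number[] {
  const out: number[] = [];
  for (let v = start; v <= end; v += step) out.push(v);
  return out;
}

// Assumed table: memory values (MiB) accepted for each CPU setting (CPU units).
const FARGATE_MEMORY_BY_CPU: Record<number, number[]> = {
  256: [512, 1024, 2048],
  512: [1024, 2048, 3072, 4096],
  1024: range(2048, 8192, 1024),
  2048: range(4096, 16384, 1024),
  4096: range(8192, 30720, 1024),
};

// Hypothetical helper: true if the cpu/memory pair appears in the table above.
function isValidFargateSize(cpu: number, memoryMiB: number): boolean {
  const allowed = FARGATE_MEMORY_BY_CPU[cpu];
  return allowed !== undefined && allowed.indexOf(memoryMiB) !== -1;
}

console.log(isValidFargateSize(512, 2048)); // true
console.log(isValidFargateSize(512, 8192)); // false
```

Under this table, 4096 MiB is the largest value that works without also raising cpu above 512.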
Steps to reproduce
- Deploy CCAG on Fargate with default CDK stack (1024 MiB)
- Ask Claude Code to write a large file (>500 lines) in a single tool call
- Observe "API Error: The socket connection was closed unexpectedly" mid-response
- Check CloudWatch logs for the ECS task — look for exit code 137 around the time of the drop
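The exit code is also recorded on the stopped task itself, in the containers[].exitCode and containers[].reason fields of the ECS DescribeTasks response. The sketch below filters such a response for containers that exited with 137. The interfaces cover only the fields used here, findSigkilledContainers is a hypothetical helper, and the sample ARNs, container name, and reason string are made-up illustration data rather than output from a real deployment.

```typescript
// Minimal subset of the ECS DescribeTasks response shape used in this sketch.
interface ContainerInfo {
  name: string;
  exitCode?: number;
  reason?: string;
}
interface TaskInfo {
  taskArn: string;
  stoppedReason?: string;
  containers: ContainerInfo[];
}

// Hypothetical helper: list containers that exited with code 137 (SIGKILL).
function findSigkilledContainers(tasks: TaskInfo[]): string[] {
  const hits: string[] = [];
  for (const task of tasks) {
    for (const c of task.containers) {
      if (c.exitCode === 137) {
        const reason = c.reason !== undefined ? c.reason : "no reason recorded";
        hits.push(`${task.taskArn} / ${c.name}: ${reason}`);
      }
    }
  }
  return hits;
}

// Made-up sample data for illustration only.
const sampleTasks: TaskInfo[] = [
  {
    taskArn: "arn:aws:ecs:example:task/aaa",
    containers: [
      { name: "ccag", exitCode: 137, reason: "OutOfMemoryError: Container killed due to memory usage" },
    ],
  },
  {
    taskArn: "arn:aws:ecs:example:task/bbb",
    containers: [{ name: "ccag", exitCode: 0 }],
  },
];

console.log(findSigkilledContainers(sampleTasks));
```

The input could be the tasks array from the JSON printed by aws ecs describe-tasks for recently stopped tasks, parsed with JSON.parse.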