[1/4] Add AWS integration layer #74
…iting, environment detection
- DynamoDBProgressManager for distributed task coordination with atomic claim/complete/fail operations
- DistributedRateLimiter using DynamoDB with rate/period interface and exponential backoff on contention
- ExecutionContext detection for LOCAL/EC2/FARGATE/BATCH environments (handles IMDSv2)
- FAILED_PERMANENT status prevents exhausted-retry tasks from re-queuing
- Run-level cost/token aggregate rollups
- boto3 moved to [aws] extras, moto to [test] extras
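To illustrate the environment detection item above, here is a minimal sketch of how LOCAL/EC2/FARGATE/BATCH might be distinguished. It is not the code from this PR: the function names `detect_execution_context` and `imdsv2_reachable` are hypothetical, and the check order is an assumption. It relies on the `AWS_BATCH_JOB_ID` variable that AWS Batch sets in job containers, the `AWS_EXECUTION_ENV=AWS_ECS_FARGATE` value set on Fargate, and the IMDSv2 token endpoint on EC2.

```python
import os
import urllib.request
from enum import Enum
from typing import Callable, Mapping, Optional

# IMDSv2 requires a session token obtained with a PUT request.
IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


class ExecutionContext(Enum):
    LOCAL = "local"
    EC2 = "ec2"
    FARGATE = "fargate"
    BATCH = "batch"


def imdsv2_reachable(timeout: float = 0.5) -> bool:
    """Return True if an IMDSv2 session token can be fetched (hypothetical helper)."""
    request = urllib.request.Request(
        IMDS_TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def detect_execution_context(
    env: Optional[Mapping[str, str]] = None,
    imds_probe: Callable[[], bool] = imdsv2_reachable,
) -> ExecutionContext:
    """Classify the runtime environment (assumed order: Batch, Fargate, EC2, local)."""
    env = os.environ if env is None else env
    # Batch is checked first because Batch jobs can also run on Fargate or EC2.
    if env.get("AWS_BATCH_JOB_ID"):
        return ExecutionContext.BATCH
    if env.get("AWS_EXECUTION_ENV") == "AWS_ECS_FARGATE":
        return ExecutionContext.FARGATE
    # Only touch the network when the environment variables are inconclusive.
    if imds_probe():
        return ExecutionContext.EC2
    return ExecutionContext.LOCAL
```

Passing `env` and `imds_probe` as parameters keeps the function testable without network access, for example `detect_execution_context({}, imds_probe=lambda: False)`.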
b6c2492 to fdaddb8
b851227 to cffe810
…handle_error Lambda is source of truth)
ab53802 to abeecff
I could be missing something here, but - we lock the task in
...I could just be missing a task timeout somewhere too.
When a worker dies mid-task, the task stays IN_PROGRESS forever. This adds stale task detection to prevent task lockout:
- get_pending_tasks() now returns IN_PROGRESS tasks that haven't been updated within stale_timeout_seconds (default 15 minutes)
- claim_task() allows reclaiming stale IN_PROGRESS tasks
- Timeout configurable, set to 0 to disable
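The stale-task rule described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the helper names `is_claimable` and `build_claim_request`, the attribute names `status`, `worker_id`, `updated_at`, and `task_id`, and the `PENDING` status string are all hypothetical. The returned dictionary is shaped as keyword arguments for the low-level DynamoDB `update_item` call, where a condition expression makes the claim atomic.

```python
from typing import Any, Dict

DEFAULT_STALE_TIMEOUT_SECONDS = 15 * 60  # 15 minutes, per the commit message


def is_claimable(
    status: str,
    updated_at: int,
    now: int,
    stale_timeout_seconds: int = DEFAULT_STALE_TIMEOUT_SECONDS,
) -> bool:
    """Return True for pending tasks and for IN_PROGRESS tasks that went stale."""
    if status == "PENDING":
        return True
    # A timeout of 0 disables stale detection entirely.
    if status != "IN_PROGRESS" or stale_timeout_seconds <= 0:
        return False
    return now - updated_at > stale_timeout_seconds


def build_claim_request(
    table_name: str,
    task_id: str,
    worker_id: str,
    now: int,
    stale_timeout_seconds: int = DEFAULT_STALE_TIMEOUT_SECONDS,
) -> Dict[str, Any]:
    """Build update_item kwargs that claim a task only if it is still claimable."""
    condition = "#s = :pending"
    values: Dict[str, Any] = {
        ":pending": {"S": "PENDING"},
        ":in_progress": {"S": "IN_PROGRESS"},
        ":worker": {"S": worker_id},
        ":now": {"N": str(now)},
    }
    if stale_timeout_seconds > 0:
        # Allow takeover when the previous owner has not updated the task recently.
        condition += " OR (#s = :in_progress AND #u < :cutoff)"
        values[":cutoff"] = {"N": str(now - stale_timeout_seconds)}
    return {
        "TableName": table_name,
        "Key": {"task_id": {"S": task_id}},
        "UpdateExpression": "SET #s = :in_progress, #w = :worker, #u = :now",
        "ConditionExpression": condition,
        # "status" is a DynamoDB reserved word, so it needs a name placeholder.
        "ExpressionAttributeNames": {
            "#s": "status",
            "#w": "worker_id",
            "#u": "updated_at",
        },
        "ExpressionAttributeValues": values,
    }
```

With a boto3 client, the request could be sent as `client.update_item(**build_claim_request(...))`; if another worker wins the race, the condition check fails and the caller can move on to the next task.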
Good catch. Pushed a commit to address that.
Summary
Adds foundational AWS integration components for distributed benchmark execution:
- boto3 moved to [aws] extras, moto to [test] extras

Part of
This is PR 1/3 for AWS distributed execution:
Test plan