[PR 2/7] Add local checkpointing and progress tracking#70

Merged
gkamradt merged 3 commits into checkpoint/pr1-storage-abstraction from checkpoint/pr2-local-checkpointing on Jan 23, 2026

ericc59 commented Jan 22, 2026

Summary

Implements a two-level checkpointing system:

  • BatchProgressManager: tracks task status across the batch (pending, in_progress, completed, failed) with worker assignment and stale task recovery
  • TaskCheckpointManager: tracks within-task progress (attempts per test pair) for resume capability after interruption
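
To illustrate how the two levels relate, here is a minimal in-memory sketch. It is not the code from this PR: `BatchProgressSketch`, `TaskRecord`, `record_attempt`, and the field names are hypothetical, and persistence, locking, and stale-task recovery are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class TaskRecord:
    # Batch-level view: where the task is and which worker holds it.
    status: TaskStatus = TaskStatus.PENDING
    worker_id: Optional[str] = None
    # Task-level view: attempts finished so far, keyed by test pair index.
    attempts_done: dict[int, int] = field(default_factory=dict)


class BatchProgressSketch:
    """Hypothetical stand-in for the two managers, kept in memory only."""

    def __init__(self, task_ids: list[str]) -> None:
        self.tasks = {tid: TaskRecord() for tid in task_ids}

    def claim_next_task(self, worker_id: str) -> Optional[str]:
        # Hand the first pending task to the worker, or None if none remain.
        for tid, rec in self.tasks.items():
            if rec.status is TaskStatus.PENDING:
                rec.status = TaskStatus.IN_PROGRESS
                rec.worker_id = worker_id
                return tid
        return None

    def record_attempt(self, task_id: str, pair_index: int) -> None:
        # Count one finished attempt for a test pair so a resume can skip it.
        rec = self.tasks[task_id]
        rec.attempts_done[pair_index] = rec.attempts_done.get(pair_index, 0) + 1

    def mark_completed(self, task_id: str) -> None:
        self.tasks[task_id].status = TaskStatus.COMPLETED
```

After an interruption, a worker resuming a task would consult `attempts_done` to decide which test pairs still need attempts, while the batch-level status decides which tasks are picked up at all.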

Key features:

  • Persists to JSON files via storage abstraction
  • Schema versioning for future compatibility
  • retry_failed_tasks() to reset failed tasks for retry
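
As a rough sketch of how schema versioning and a failed-task reset could fit together, the snippet below serializes a status map to JSON with a version field. The `schema_version` key, the value `1`, and the flat `tasks` layout are assumptions for illustration; the real file format and the storage abstraction from PR 1/7 are not shown here.

```python
import json

SCHEMA_VERSION = 1  # assumed value, not taken from the PR


def dump_progress(tasks: dict[str, str]) -> str:
    # Wrap the task map with a version so future readers can detect old files.
    return json.dumps({"schema_version": SCHEMA_VERSION, "tasks": tasks}, indent=2)


def load_progress(text: str) -> dict[str, str]:
    payload = json.loads(text)
    version = payload.get("schema_version")
    if version != SCHEMA_VERSION:
        raise ValueError(f"unsupported checkpoint schema version: {version!r}")
    return payload["tasks"]


def retry_failed_tasks(tasks: dict[str, str]) -> int:
    # Reset every failed task to pending and report how many were reset.
    failed = [tid for tid, status in tasks.items() if status == "failed"]
    for tid in failed:
        tasks[tid] = "pending"
    return len(failed)
```

Rejecting an unknown version outright is only one option; a migration step per version would be another.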

Dependencies

  • Requires [PR 1/7] storage abstraction layer

Test plan

  • Run pytest src/arc_agi_benchmarking/tests/test_checkpoint.py (32 tests)
  • Run demo script: python scripts/demo_checkpoint.py

Implements two-level checkpointing system:
- BatchProgressManager: tracks task status across the batch (pending,
  in_progress, completed, failed) with worker assignment and stale
  task recovery
- TaskCheckpointManager: tracks within-task progress (attempts per
  test pair) for resume capability after interruption

Key features:
- Persists to JSON files via storage abstraction
- Schema versioning for future compatibility
- Decimal-based cost tracking (not float)
- retry_failed_tasks() to reset failed tasks for retry
- Comprehensive tests (32 tests) including edge cases
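
The reason for preferring Decimal over float for costs can be shown in a few lines. This is a generic illustration, not the PR's code: `add_cost`, `cost_to_json`, `cost_from_json`, and the `total_cost` key are hypothetical, and storing the value as a JSON string is an assumption about how one might keep it exact on disk.

```python
import json
from decimal import Decimal


def add_cost(total: Decimal, cost: str) -> Decimal:
    # Build the Decimal from a string so no binary float rounding sneaks in.
    return total + Decimal(cost)


def cost_to_json(total: Decimal) -> str:
    # JSON has no decimal type, so store the exact value as a string.
    return json.dumps({"total_cost": str(total)})


def cost_from_json(text: str) -> Decimal:
    return Decimal(json.loads(text)["total_cost"])


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_drifts = (0.1 + 0.2) != 0.3
```

Summing many small per-attempt costs with floats accumulates this drift, which is why an exact type is attractive for a batch total.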

Includes demo script (scripts/demo_checkpoint.py) demonstrating
checkpointing with simulated task failures and retries.

Bug fixes:
- Fix race condition in claim_next_task() with retry loop
- Fix cost aggregation on task failure - mark_failed() now accepts
  cost/token params and accumulates to batch total
- Fix S3 exists() to raise StorageReadError on non-404 errors
  instead of silently returning False
- Add run_id validation on load - mismatched run_id starts fresh
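
The run_id check in the last item could look roughly like the following. This is a sketch under assumptions: `load_or_start_fresh`, `new_state`, and the state layout are hypothetical, and the saved text is passed in directly rather than read through the storage layer.

```python
import json
from typing import Optional


def new_state(run_id: str) -> dict:
    return {"run_id": run_id, "tasks": {}}


def load_or_start_fresh(saved_text: Optional[str], run_id: str) -> dict:
    # No checkpoint on disk yet: begin a new run.
    if saved_text is None:
        return new_state(run_id)
    saved = json.loads(saved_text)
    # A checkpoint from a different run must not be resumed.
    if saved.get("run_id") != run_id:
        return new_state(run_id)
    return saved
```

Without such a check, progress recorded by an earlier run could be mistaken for progress in the current one.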

Code quality:
- Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
  throughout checkpoint module
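
For context, `datetime.utcnow()` returns a naive datetime and is deprecated as of Python 3.12, while `datetime.now(timezone.utc)` returns a timezone-aware one. The sketch below shows why mixing the two breaks elapsed-time checks such as stale-task detection; `is_stale` and its parameters are hypothetical names, not the PR's API.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    # Timezone-aware replacement for the deprecated datetime.utcnow().
    return datetime.now(timezone.utc)


def is_stale(started_at: datetime, now: datetime, timeout: timedelta) -> bool:
    # Both arguments must be timezone-aware; subtracting a naive datetime
    # from an aware one raises TypeError.
    return now - started_at > timeout


def roundtrip(ts: datetime) -> datetime:
    # Aware timestamps keep their UTC offset through ISO 8601 serialization.
    return datetime.fromisoformat(ts.isoformat())
```

A test that builds its timestamps with naive datetimes would hit that TypeError once the module switches to aware values, which is consistent with the test fix listed under Tests.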

Tests:
- Add test_mark_failed_accumulates_costs
- Add test_run_id_mismatch_starts_fresh
- Fix test_reset_stale_tasks to use timezone-aware datetime

Updates demo script to pass costs to mark_failed().
ericc59 force-pushed the checkpoint/pr2-local-checkpointing branch from 4902f2d to 3158bfa on January 22, 2026 20:51
gkamradt merged commit 57303da into checkpoint/pr1-storage-abstraction on Jan 23, 2026
4 checks passed