docs(benchmark): record ALCF retry rerun#320

Merged
JaimeCernuda merged 1 commit into develop from docs/alcf-benchmark-after-retry-20260524
May 24, 2026
Conversation

@JaimeCernuda
Collaborator

Summary

  • update the ALCF demo benchmark report after the transient-provider retry fix
  • record the improved result: 14/15 clean passes, 1 expected surfaced error, 0 partial recoveries, 0 failures
  • mark the transient-provider follow-up as merged/validated in TASK.md

Evidence

  • live backend: http://127.0.0.1:17964
  • provider/model: ALCF Metis gpt-oss-120b
  • evidence JSONL: tmp/clio-demo-benchmark-alcf-metis-20260524-after-retry.jsonl
  • key change: workflow_memory_followup is now a clean pass with six Parquet tool calls instead of a partial recovery
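A minimal sketch of how the pass/error tally above could be rechecked from the evidence JSONL. The PR does not show the file's schema, so the `outcome` field name and its label values are assumptions for illustration only; substitute whatever key the benchmark script actually writes.

```python
import json
from collections import Counter
from pathlib import Path


def tally_outcomes(jsonl_path, field="outcome"):
    """Count benchmark cases per outcome label in a JSONL file.

    `field` is a hypothetical key name; the real evidence file may
    store the per-case result under a different key.
    """
    counts = Counter()
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Records without the field are grouped under "unknown"
            counts[record.get(field, "unknown")] += 1
    return counts
```

Calling `tally_outcomes("tmp/clio-demo-benchmark-alcf-metis-20260524-after-retry.jsonl")` would return a `Counter` keyed by label, which can then be compared against the counts recorded in the summary.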

Verification

  • uv run python scripts/run_demo_benchmark.py --base-url http://127.0.0.1:17964 --data-dir tmp/clio-benchmark-data --output-jsonl tmp/clio-demo-benchmark-alcf-metis-20260524-after-retry.jsonl --report docs/ALCF_DEMO_BENCHMARK_REPORT.md --case-delay-s 5

@JaimeCernuda JaimeCernuda merged commit 79eb550 into develop May 24, 2026
1 check failed
@JaimeCernuda JaimeCernuda deleted the docs/alcf-benchmark-after-retry-20260524 branch May 24, 2026 05:46