chore(e2ebench): main-v2 cache baseline marker (do not merge)#3099

Closed
esengine wants to merge 1 commit into main-v2 from chore/e2e-baseline

Owner

@esengine esengine commented Jun 4, 2026

Do not merge. Empty commit off main-v2 HEAD (no file changes) so the e2e bot builds the current main-v2 agent verbatim. This gives a cache-hit baseline to compare the cache-sensitive PRs (#2958/#3027/#3037) against while we chase the cache-hit regression reported in #3091 — in particular the per-task cache hit on the compaction task.

/e2e

Owner Author

esengine commented Jun 4, 2026

Baseline run against main-v2 HEAD.

/e2e

@github-actions github-actions Bot added the v2 Go rewrite (1.x) — main-v2 branch, active development label Jun 4, 2026

github-actions Bot commented Jun 4, 2026

🤖 Reasonix e2e benchmark

Accuracy: 4/4 (100%) · Cache hit: 79% · Tokens: 160,565 (prompt 158,394 / completion 2,171) · Compactions: 3 · Cost: ¥ 0.0406

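| Task | Result | Steps | Prompt | Completion | Cache hit | Compact | Cost |
|---|---|---|---|---|---|---|---|
| compaction | ✅ pass | 8 | 102,643 | 1,061 | 68% | 3 | ¥ 0.0365 |
| fix-add-bug | ✅ pass | 4 | 24,858 | 420 | 98% | 0 | ¥ 0.0017 |
| fizzbuzz | ✅ pass | 2 | 12,258 | 244 | 99% | 0 | ¥ 0.0008 |
| palindrome | ✅ pass | 3 | 18,635 | 446 | 99% | 0 | ¥ 0.0015 |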

Real provider run. Cache-hit % is cached prompt tokens / total prompt tokens.

agent: PR head (00da2bc) · triggered by @esengine
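
As a rough check on the cache-hit definition above, the following Python sketch computes cache-hit % as cached prompt tokens over total prompt tokens, then estimates the overall figure from the per-task rows. The bot does not report per-task cached-token counts, so they are back-computed here from the rounded percentages (an assumption), and the helper name `cache_hit_pct` is hypothetical, not part of the benchmark tooling.

```python
def cache_hit_pct(cached_prompt_tokens: float, total_prompt_tokens: float) -> float:
    """Cache-hit % = cached prompt tokens / total prompt tokens * 100."""
    if total_prompt_tokens == 0:
        return 0.0  # avoid dividing by zero when a task sent no prompt tokens
    return 100.0 * cached_prompt_tokens / total_prompt_tokens


# (prompt tokens, reported cache-hit %) per task, copied from the table above.
tasks = {
    "compaction": (102_643, 68),
    "fix-add-bug": (24_858, 98),
    "fizzbuzz": (12_258, 99),
    "palindrome": (18_635, 99),
}

total_prompt = sum(prompt for prompt, _ in tasks.values())

# Assumption: cached tokens are estimated from the rounded per-task percentages,
# so the result is approximate rather than the bot's exact figure.
est_cached = sum(prompt * pct / 100 for prompt, pct in tasks.values())
overall_pct = cache_hit_pct(est_cached, total_prompt)

print(total_prompt)        # 158394
print(round(overall_pct))  # 79
```

Under these assumptions the prompt total matches the reported 158,394 and the estimate rounds to the reported 79%. The compaction task accounts for roughly 65% of all prompt tokens, so its 68% rate dominates the overall number.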

Owner Author

esengine commented Jun 4, 2026

Baseline captured: main-v2 HEAD = 79% cache hit overall / 68% on the compaction task / 4/4 tasks passing. Recorded for the #3091 cache investigation; closing the marker.

@esengine esengine closed this Jun 4, 2026
@esengine esengine deleted the chore/e2e-baseline branch June 4, 2026 09:43