Nightly eval REGRESSION: benchmark 'cli_args' has a passing baseline but failed BOTH trials on 2026-06-11.
Error category: [logic_error]
Model: opencode-qwen3-5-35b-a3b-mxfp8 (local, tiers:smoke,core)
Results: /tmp/nightly_eval_20260611
It passed before, so this is a genuine regression — not a known gap. Investigate.
Binary info (auto-attached):
ailang version: v0.24.2-42-ge54841f4
git commit: e54841f
Reported by: nightly-eval via ailang messages