
GateForge

CI  Release  License  Python >= 3.10

AI Agent for Physical Systems Modeling

Agentic Modelica Workflow Benchmark

Benchmark snapshot as of May 21, 2026.

All agents use the same foundation model family and are evaluated under the same benchmark and wall-clock conditions.

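On this benchmark, GateForge solves more tasks than both state-of-the-art (SOTA) coding agents tested. All three agents solve every easy task, so the margin comes entirely from medium and hard Modelica workflows.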

| Agent | Total | Easy | Medium | Hard |
|---|---|---|---|---|
| GateForge | 130/132 | 21/21 | 56/56 | 53/55 |
| Claude Code | 123/132 | 21/21 | 55/56 | 47/55 |
| OpenCode | 120/132 | 21/21 | 50/56 | 49/55 |
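As a quick sanity check on the results table, here is a minimal Python sketch that recomputes each agent's totals and per-tier pass rates. The `RESULTS` dictionary and the `totals` helper are hypothetical and only transcribe the counts in the table; they are not part of GateForge's tooling. For example, GateForge's 130/132 works out to about 98.5%.

```python
# Hypothetical transcription of the results table: (solved, total) per tier.
RESULTS = {
    "GateForge":   {"easy": (21, 21), "medium": (56, 56), "hard": (53, 55)},
    "Claude Code": {"easy": (21, 21), "medium": (55, 56), "hard": (47, 55)},
    "OpenCode":    {"easy": (21, 21), "medium": (50, 56), "hard": (49, 55)},
}


def totals(tiers: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum solved and total task counts across all difficulty tiers."""
    solved = sum(s for s, _ in tiers.values())
    total = sum(t for _, t in tiers.values())
    return solved, total


for agent, tiers in RESULTS.items():
    solved, total = totals(tiers)
    # Format each tier as a percentage with one decimal place.
    rates = ", ".join(f"{tier} {s / t:.1%}" for tier, (s, t) in tiers.items())
    print(f"{agent}: {solved}/{total} ({solved / total:.1%}); {rates}")
```

Summing the tiers reproduces the Total column exactly, which confirms the three tiers partition the 132 tasks.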

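GateForge also had the shortest wall time of the three agents. Compared with OpenCode, it used fewer reported tokens and less wall time; compared with Claude Code, it finished in less wall time with a higher success rate, although Claude Code reported fewer tokens.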

| Agent | Reported tokens* | Wall time (s) |
|---|---|---|
| GateForge | ~39.7M | ~14,658 |
| Claude Code | ~15.9M | ~35,191 |
| OpenCode | ~66.1M | ~20,843 |

* Reported tokens are runner-reported estimates. GateForge records provider usage directly, while other runners may omit local context management, compression, retries, or tool-output handling costs.
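To make the efficiency comparison concrete, the following minimal Python sketch expresses each baseline's reported tokens and wall time as a multiple of GateForge's. The `EFFICIENCY` table and the `relative_to` helper are hypothetical illustrations built from the approximate figures above; because token counts are runner-reported estimates, the token ratios inherit the same caveat.

```python
# Hypothetical transcription of the efficiency table:
# (reported tokens, wall time in seconds). Values are approximate.
EFFICIENCY = {
    "GateForge":   (39.7e6, 14_658),
    "Claude Code": (15.9e6, 35_191),
    "OpenCode":    (66.1e6, 20_843),
}


def relative_to(baseline: str, agent: str) -> tuple[float, float]:
    """Return (token ratio, wall-time ratio) of `agent` versus `baseline`."""
    base_tokens, base_wall_s = EFFICIENCY[baseline]
    tokens, wall_s = EFFICIENCY[agent]
    return tokens / base_tokens, wall_s / base_wall_s


for agent in ("Claude Code", "OpenCode"):
    tok, wall = relative_to("GateForge", agent)
    # A ratio above 1.0 means the baseline used more than GateForge.
    print(f"{agent}: {tok:.2f}x tokens, {wall:.2f}x wall time vs GateForge")
```

A ratio below 1.0 in the token column (as for Claude Code) means the baseline reported fewer tokens, while any wall-time ratio above 1.0 means it took longer than GateForge.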

Legal Notice

Without prior written permission, no content on this site may be used for AI model training, fine-tuning, evaluation, or dataset construction.

  • LEGAL_NOTICE.md
  • CONTENT_AUTHORIZATION_POLICY.md
  • robots.txt