GateForge

AI Agent for Physical Systems Modeling

Agentic Modelica Workflow Benchmark

Benchmark snapshot as of May 21, 2026.

All agents use the same foundation model family and are evaluated under the same benchmark and wall-clock conditions.

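GateForge outperforms state-of-the-art coding agents overall (130/132 tasks, versus 123/132 and 120/132). All three agents tie on easy workflows, so the margin comes entirely from medium and hard Modelica workflows.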

| Agent | Total | easy | medium | hard |
| --- | --- | --- | --- | --- |
| GateForge | 130/132 | 21/21 | 56/56 | 53/55 |
| Claude Code | 123/132 | 21/21 | 55/56 | 47/55 |
| OpenCode | 120/132 | 21/21 | 50/56 | 49/55 |
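The pass counts in the table above can be turned into percentages to make the per-tier gaps easier to compare. The following is a minimal sketch, not part of GateForge itself: the names `RESULTS`, `pass_rate`, and `summarize` are hypothetical, and the only data used are the counts copied from the table.

```python
# Pass counts copied from the benchmark table: (passed, total) per difficulty tier.
# All names in this sketch are illustrative, not part of any GateForge API.
RESULTS = {
    "GateForge": {"easy": (21, 21), "medium": (56, 56), "hard": (53, 55)},
    "Claude Code": {"easy": (21, 21), "medium": (55, 56), "hard": (47, 55)},
    "OpenCode": {"easy": (21, 21), "medium": (50, 56), "hard": (49, 55)},
}


def pass_rate(passed, total):
    """Return the pass rate as a percentage."""
    return 100.0 * passed / total


def summarize(results):
    """Compute per-tier and overall pass rates for every agent."""
    summary = {}
    for agent, tiers in results.items():
        passed = sum(p for p, _ in tiers.values())
        total = sum(t for _, t in tiers.values())
        summary[agent] = {
            "total": (passed, total),
            "rates": {tier: pass_rate(p, t) for tier, (p, t) in tiers.items()},
            "overall": pass_rate(passed, total),
        }
    return summary


SUMMARY = summarize(RESULTS)

if __name__ == "__main__":
    for agent, row in SUMMARY.items():
        passed, total = row["total"]
        tiers = ", ".join(f"{t} {r:.1f}%" for t, r in row["rates"].items())
        print(f"{agent}: {passed}/{total} ({row['overall']:.1f}%) | {tiers}")
```

The per-tier sums reproduce the Total column (130, 123, and 120 out of 132), which is a quick consistency check on the table.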

GateForge also beats both baselines on efficiency: it runs faster and uses fewer reported tokens than OpenCode, and it finishes faster with a higher success rate than Claude Code, although Claude Code reports fewer tokens.

| Agent | reported tokens* | wall time |
| --- | --- | --- |
| GateForge | ~39.7M | ~14,658s |
| Claude Code | ~15.9M | ~35,191s |
| OpenCode | ~66.1M | ~20,843s |

* Reported tokens are runner-reported estimates. GateForge records provider usage directly, while other runners may omit local context management, compression, retries, or tool-output handling costs.
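The efficiency comparison can be expressed as ratios against GateForge. The sketch below is illustrative only: `RUNS` and `relative_to` are hypothetical names, the figures are the approximate values from the table, and it assumes each wall time is the total for the full benchmark run. Because token counts are runner-reported estimates, the token ratios should be read as rough indications rather than exact comparisons.

```python
# Approximate figures copied from the table (tokens in millions, wall time in seconds).
# Assumption: each wall time covers the whole benchmark run for that agent.
RUNS = {
    "GateForge": {"tokens_m": 39.7, "wall_s": 14658},
    "Claude Code": {"tokens_m": 15.9, "wall_s": 35191},
    "OpenCode": {"tokens_m": 66.1, "wall_s": 20843},
}


def relative_to(runs, baseline="GateForge"):
    """Express each agent's wall time and tokens as a multiple of the baseline."""
    base = runs[baseline]
    return {
        agent: {
            "wall_ratio": run["wall_s"] / base["wall_s"],
            "token_ratio": run["tokens_m"] / base["tokens_m"],
        }
        for agent, run in runs.items()
    }


if __name__ == "__main__":
    for agent, ratios in relative_to(RUNS).items():
        print(
            f"{agent}: {ratios['wall_ratio']:.2f}x wall time, "
            f"{ratios['token_ratio']:.2f}x reported tokens"
        )
```

With these figures, Claude Code takes roughly 2.4 times GateForge's wall time while reporting about 0.4 times the tokens, and OpenCode takes roughly 1.4 times the wall time with about 1.7 times the tokens.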

Legal Notice

Without prior written permission, no content on this site may be used for AI model training, fine-tuning, evaluation, or dataset construction.

LEGAL_NOTICE.md
CONTENT_AUTHORIZATION_POLICY.md
robots.txt
