Untuned compile/torch throughput regressed 30-60% vs prior README on humanoid/halfcheetah #70

## Summary

After the merge of PR #68 (recompile-elimination + device_put fixes), untuned `torch.compile(vmap(single_step), fullgraph=True)` throughput on H200 regressed against the prior README numbers for several envs. Tuned numbers have not been re-measured, and the per-env root cause has not been identified yet.

## Observed vs prior README (H200, float64, 1000 steps, 7 batch sizes)

| env/B | prior (steps/s) | new (steps/s) | delta |
|---|---:|---:|---:|
| humanoid/32768 | 2.02M | 1.21M | -40% |
| humanoid/65536 | ~1.97M | 1.16M | -41% |
| humanoid/131072 | ~1.86M | 1.12M | -40% |
| halfcheetah/32768 | ~3.72M | 1.20M | -68% |

Compile time is also ~2-4x longer per env.
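The delta column can be reproduced from the prior and new rates. The sketch below is illustrative only: the helper names `parse_rate` and `relative_delta` are not part of the repo, and the approximate (`~`) README values are treated as exact.

```python
def parse_rate(text: str) -> float:
    """Parse a README-style rate such as '2.02M' or '~1.97M' into steps/s."""
    cleaned = text.strip().lstrip("~")
    if cleaned.endswith("M"):
        return float(cleaned[:-1]) * 1e6
    return float(cleaned)


def relative_delta(prior: str, new: str) -> int:
    """Return the change from prior to new as a whole-number percentage."""
    p, n = parse_rate(prior), parse_rate(new)
    return round((n - p) / p * 100)


# Rows copied from the table above: (env/B, prior, new).
rows = [
    ("humanoid/32768", "2.02M", "1.21M"),
    ("humanoid/65536", "~1.97M", "1.16M"),
    ("humanoid/131072", "~1.86M", "1.12M"),
    ("halfcheetah/32768", "~3.72M", "1.20M"),
]

deltas = {name: relative_delta(prior, new) for name, prior, new in rows}
for name, delta in deltas.items():
    print(f"{name}: {delta:+d}%")
```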

## Hypotheses to check

- Inductor tuning regression: try `--tuned` (coordinate-descent + aggressive fusion) and see if the gap closes.
- Graph structure: compare `TORCH_LOGS=inductor,graph_breaks` traces against the last-known-good commit.
- New overhead from the `torch.compiler.is_compiling()` guards (multiple new conditional branches inside `step()`).
- Extra host-side work in `make_data` (warm caches, precomp).
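For the graph-structure comparison, one possible workflow is to run the benchmark once per commit with `TORCH_LOGS` set, capture each run's log output, and diff the two captures. The helpers below are a hypothetical sketch using only the standard library. `logging_env` and `diff_traces` are made-up names, and the benchmark command itself is not shown because it is not specified here.

```python
import difflib
import os


def logging_env(components=("inductor", "graph_breaks"), base=None):
    """Return a copy of the environment with TORCH_LOGS set for a child process."""
    env = dict(os.environ if base is None else base)
    env["TORCH_LOGS"] = ",".join(components)
    return env


def diff_traces(good: str, bad: str) -> list[str]:
    """Unified diff of two captured traces (last-known-good vs current)."""
    return list(
        difflib.unified_diff(
            good.splitlines(),
            bad.splitlines(),
            fromfile="last-known-good",
            tofile="current",
            lineterm="",
        )
    )
```

Passing `env=logging_env()` to `subprocess.run` for each commit's benchmark run and feeding the two captured logs to `diff_traces` gives a first-pass view of what changed. Raw logs usually contain timestamps and addresses, so some normalization is likely needed before the diff is readable.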

## Data

JSONL outputs from the 305299 sweep are in `~/bench_all_305299/` on steve.
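A minimal sketch for summarizing those files follows. The record schema is not documented in this issue, so the field names `env`, `batch_size`, and `steps_per_s` are assumptions and should be adjusted to match the actual JSONL keys.

```python
import json
from pathlib import Path


def load_records(directory):
    """Yield one dict per non-empty line across all *.jsonl files in directory."""
    for path in sorted(Path(directory).expanduser().glob("*.jsonl")):
        with path.open() as fh:
            for line in fh:
                line = line.strip()
                if line:
                    yield json.loads(line)


def best_rate_by_key(records, env_key="env", batch_key="batch_size",
                     rate_key="steps_per_s"):
    """Map (env, batch size) to the highest recorded rate.

    The default key names are assumed, not taken from the sweep output.
    """
    best = {}
    for rec in records:
        key = (rec[env_key], rec[batch_key])
        best[key] = max(best.get(key, 0.0), float(rec[rate_key]))
    return best
```

On steve, `best_rate_by_key(load_records("~/bench_all_305299/"))` would then return one rate per env and batch size, assuming the keys match.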

## Not in scope

An immediate fix. A follow-up agent will investigate with `TORCH_LOGS=graph_breaks,recompiles,inductor` and compare against the pre-PR commit.
