Across 6 benchmark runs of deepseek-v4-flash on the M3 Max 128GB (3 per cooling strategy), maximum fan speed beats auto fan control on every metric measured:
📊 Core Conclusion: Maximum Fan Speed Dominates Auto Fan Control

| Metric | Auto Fan (Runs 1–3) | Max Fan (Runs 4–6) | Improvement |
|---|---|---|---|
| Avg Prefill Speed | 157.0 TPS | 178.2 TPS | +13.5% |
| Avg Generation Speed | 14.74 TPS | 16.73 TPS | +13.5% |
| Prefill Variability (CV) | 28.5% | 26.1% | More stable |
| Generation Variability (CV) | 21.1% | 15.3% | More stable |

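The Improvement column and the CV rows can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are illustrative, and it assumes CV means population standard deviation divided by the mean, which the post does not specify (a sample standard deviation would give slightly larger values).

```python
from statistics import mean, pstdev

def pct_improvement(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0

def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean, in percent.
    Assumption: the post does not say which standard deviation it used."""
    return pstdev(samples) / mean(samples) * 100.0

# Averages taken from the table above.
prefill_gain = pct_improvement(157.0, 178.2)
gen_gain = pct_improvement(14.74, 16.73)
print(f"prefill: +{prefill_gain:.1f}%  generation: +{gen_gain:.1f}%")

# CV of the three auto-fan prefill readings quoted for ctx=28672.
auto_prefill_28672 = [207.8, 124.6, 142.7]
print(f"auto-fan prefill CV at ctx=28672: {coefficient_of_variation(auto_prefill_28672):.1f}%")
```

Both averages work out to the same +13.5% shown in the table. The CV printed here covers a single context length, so it is not expected to match the table's CV values, which presumably span all measured context lengths.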
🔍 Key Findings
1. The Auto Fan Strategy Suffers from Severe Performance Jitter
- In the 20K–40K context range, Runs 2 and 3 show a sharp performance drop.
- The worst data point is at ctx=28672:
  - Run 2: prefill fell to 124.6 TPS (down 40% from Run 1’s 207.8 TPS) and generation fell to 10.8 TPS (down 44%).
  - Run 3: prefill 142.7 TPS, generation 12.1 TPS.
- This pattern is consistent with the auto fan curve failing to dissipate heat under sustained load, which triggers aggressive thermal throttling.
2. Max Fan Speed is Remarkably Consistent
- Runs 4, 5, and 6 trace nearly identical curves for both prefill and generation speed, degrading smoothly as context length increases.
- At the same critical point (ctx=28672), the three runs recorded prefill speeds of 215.4, 216.0, and 224.0 TPS, a spread of under 9 TPS.
3. The "Sweet Spot" for Gains is the 25K–35K Context Range
- This is where the auto fan’s thermal throttling is most severe, making the max fan advantage most pronounced:
  - Prefill peak gain: +60.1 TPS (+38%) at ctx=28672
  - Generation peak gain: +5.5 TPS (+39%) at ctx=28672
4. Long-Context Gains Persist (70K+)
- Even at 102K context length, max fan maintains a stable 11–15% speed advantage over auto fan.
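The ctx=28672 figures quoted in the findings can be cross-checked directly. The sketch below uses only the prefill readings listed in findings 1 and 2. The variable names and the choice of (max - min) / mean as a spread measure are illustrative assumptions, not necessarily how the post's numbers were produced.

```python
from statistics import mean

# Prefill TPS at ctx=28672, as quoted in the findings.
auto_prefill = [207.8, 124.6, 142.7]  # Runs 1-3, auto fan
max_prefill = [215.4, 216.0, 224.0]   # Runs 4-6, max fan

def relative_spread(samples):
    """Range of the samples as a percentage of their mean (illustrative metric)."""
    return (max(samples) - min(samples)) / mean(samples) * 100.0

auto_avg = mean(auto_prefill)
max_avg = mean(max_prefill)
gain_tps = max_avg - auto_avg
gain_pct = gain_tps / auto_avg * 100.0

print(f"gain: +{gain_tps:.1f} TPS (+{gain_pct:.0f}%)")  # gain: +60.1 TPS (+38%)
print(f"auto fan spread: {relative_spread(auto_prefill):.1f}% of mean")
print(f"max fan spread:  {relative_spread(max_prefill):.1f}% of mean")
```

The computed gain agrees with the +60.1 TPS (+38%) prefill peak reported in finding 3, and the spread comparison puts a number on the run-to-run consistency described in finding 2.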
💡 Recommendation
For sustained inference workloads like deepseek-v4-flash on the M3 Max 128GB:
Lock the fans to maximum speed. Auto fan control delivers lower average throughput (max fan averaged 13.5% higher for both prefill and generation) and, more critically, shows unpredictable thermal throttling in the mid-range context window (20K–40K), which causes large run-to-run swings. Maximum fan speed provides both higher average performance and better run-to-run consistency.
Detailed comparison charts are available for download:
Full Fan Strategy Comparison
Critical Region Stability Analysis
### AUTO fan control

[m3_max_128gb.1.csv](https://github.com/user-attachments/files/27739216/m3_max_128gb.1.csv)

[m3_max_128gb.2.csv](https://github.com/user-attachments/files/27739220/m3_max_128gb.2.csv)

[m3_max_128gb.3.csv](https://github.com/user-attachments/files/27739221/m3_max_128gb.3.csv)

### MAX fan speed

[m3_max_128gb.4.csv](https://github.com/user-attachments/files/27739230/m3_max_128gb.4.csv)

[m3_max_128gb.6.csv](https://github.com/user-attachments/files/27739252/m3_max_128gb.6.csv)