Speed benchmark on M3 Max MacBook Pro with different fan control settings. #126

@taozhiyuai

Based on six benchmark runs of deepseek-v4-flash on the M3 Max 128GB (three per cooling strategy), the comparison between the two strategies is clear-cut:


📊 Core Conclusion: Maximum Fan Speed Dominates Auto Fan Control

| Metric | Auto Fan (Runs 1–3) | Max Fan (Runs 4–6) | Improvement |
| --- | --- | --- | --- |
| Avg Prefill Speed | 157.0 TPS | 178.2 TPS | +13.5% |
| Avg Generation Speed | 14.74 TPS | 16.73 TPS | +13.5% |
| Prefill Variability (CV) | 28.5% | 26.1% | More Stable |
| Generation Variability (CV) | 21.1% | 15.3% | More Stable |
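
The Improvement column can be checked by hand. The sketch below recomputes it from the two averages in the table; `pct_improvement` is a helper name made up for this illustration and is not part of the benchmark tooling.

```python
def pct_improvement(baseline: float, candidate: float) -> float:
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Averages copied from the table (tokens per second).
prefill_gain = pct_improvement(157.0, 178.2)
gen_gain = pct_improvement(14.74, 16.73)

print(f"prefill:    {prefill_gain:+.1f}%")  # +13.5%
print(f"generation: {gen_gain:+.1f}%")      # +13.5%
```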

🔍 Key Findings

1. The Auto Fan Strategy Suffers from Severe Performance Jitter

  • In the 20K–40K context window range, Runs 2 and 3 experienced a catastrophic performance cliff.
  • The worst data point occurs at ctx=28672:
    • Run 2: Prefill crashed to 124.6 TPS (down 40% from Run 1’s 207.8), Gen crashed to 10.8 TPS (down 44%).
    • Run 3: Prefill 142.7 TPS, Gen 12.1 TPS.
  • This suggests the auto fan curve does not dissipate heat fast enough under sustained load, which triggers aggressive thermal throttling.

2. Max Fan Speed is Remarkably Consistent

  • Runs 4, 5, and 6 trace nearly identical curves for both prefill and generation speeds, degrading smoothly as context length increases.
  • At the same critical point (ctx=28672), the three runs recorded 215.4, 216.0, and 224.0 TPS respectively—minimal variance.
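
To put a number on the run-to-run spread at ctx=28672, here is a minimal sketch that computes the coefficient of variation (CV) of the prefill speeds quoted in this issue. It assumes the sample standard deviation (n-1); the issue does not say which CV definition the summary table uses, and the table's CVs appear to cover the full context sweep, so these single-point values are not expected to match it.

```python
from statistics import mean, stdev


def cv_percent(samples: list[float]) -> float:
    """Coefficient of variation in percent (sample std dev / mean)."""
    return stdev(samples) / mean(samples) * 100.0


# Prefill TPS at ctx=28672, as quoted in the findings.
auto_prefill = [207.8, 124.6, 142.7]  # Runs 1-3, auto fan
max_prefill = [215.4, 216.0, 224.0]   # Runs 4-6, max fan

print(f"auto fan CV: {cv_percent(auto_prefill):.1f}%")
print(f"max fan CV:  {cv_percent(max_prefill):.1f}%")
```

With only three runs per strategy the CV is a rough indicator, but the gap between the two strategies at this context length is an order of magnitude.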

3. The "Sweet Spot" for Gains is the 25K–35K Context Range

  • This is where the auto fan’s thermal throttling is most severe, making the max fan advantage most pronounced:
    • Prefill peak gain: +60.1 TPS (+38%) at ctx=28672
    • Generation peak gain: +5.5 TPS (+39%) at ctx=28672
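
The prefill peak gain can be reproduced from the per-run figures quoted earlier in this issue. This is a small sketch that assumes the gain compares the mean of the three max-fan runs against the mean of the three auto-fan runs at ctx=28672, which is consistent with the numbers reported.

```python
from statistics import mean

# Prefill TPS at ctx=28672, as quoted in the findings.
auto_runs = [207.8, 124.6, 142.7]  # Runs 1-3, auto fan
max_runs = [215.4, 216.0, 224.0]   # Runs 4-6, max fan

auto_mean = mean(auto_runs)
max_mean = mean(max_runs)

gain_tps = max_mean - auto_mean
gain_pct = gain_tps / auto_mean * 100.0

print(f"auto mean: {auto_mean:.1f} TPS")
print(f"max mean:  {max_mean:.1f} TPS")
print(f"gain:      {gain_tps:+.1f} TPS ({gain_pct:+.0f}%)")  # +60.1 TPS (+38%)
```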

4. Long-Context Gains Persist (70K+)

  • Even at 102K context length, max fan maintains a stable 11–15% speed advantage over auto fan.

💡 Recommendation

For sustained inference workloads like deepseek-v4-flash on the M3 Max 128GB:

Lock the fans to maximum speed. The auto fan strategy not only delivers lower average throughput, but more critically, it triggers unpredictable thermal throttling in the mid-range context window (20K–40K), causing wild performance jitter. Maximum fan speed provides both higher average performance and run-to-run consistency.

Detailed comparison charts are available for download:

Full Fan Strategy Comparison

Critical Region Stability Analysis

### AUTO fan control

  • [m3_max_128gb.1.csv](https://github.com/user-attachments/files/27739216/m3_max_128gb.1.csv)
  • [m3_max_128gb.2.csv](https://github.com/user-attachments/files/27739220/m3_max_128gb.2.csv)
  • [m3_max_128gb.3.csv](https://github.com/user-attachments/files/27739221/m3_max_128gb.3.csv)

### MAX fan speed

  • [m3_max_128gb.4.csv](https://github.com/user-attachments/files/27739230/m3_max_128gb.4.csv)
  • [m3_max_128gb.6.csv](https://github.com/user-attachments/files/27739252/m3_max_128gb.6.csv)
