Replies: 2 comments 2 replies
What's your hardware / OS? To reproduce those numbers you need the latest MLX (0.30.0) on macOS 26.2 (beta release) on the M5.
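For anyone replying with their setup, here is a minimal sketch for collecting those details in one go. It only uses the Python standard library. The distribution names `mlx` and `mlx-lm` are assumed to be the names the packages were installed under, and `environment_report` is a made-up helper name, not part of MLX.

```python
import platform
from importlib import metadata


def environment_report(packages=("mlx", "mlx-lm")):
    """Collect the details needed to compare benchmark runs.

    `packages` holds distribution names as installed with pip; the
    defaults are an assumption, so adjust them if your install differs.
    """
    report = {
        # mac_ver() returns an empty version string on non-macOS systems.
        "macos": platform.mac_ver()[0] or "not macOS",
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a reply makes it easier to see whether a run matches the MLX 0.30.0 / macOS 26.2 combination mentioned above.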
If you want to compare those numbers to an M3 Ultra (stand-alone or clustered), or run some more tests for comparison, I've published a bunch at https://github.com/guruswami-ai/mlx-benchmarks. I haven't got my hands on an M5 system yet, but your results seem impressive. I'm looking forward to building a distributed M5 mesh if/when they are available. To date, the lesson seems to be: fit everything into one node if you can. An M5 Ultra with 512GB RAM would be impressive and would likely narrow the gap to NVIDIA GPU hardware even further. I need to update my benchmarks guide (https://github.com/guruswami-ai/mlx-benchmarks/blob/main/docs/APPLE_SILICON_GUIDE.md) and the 'cluster simulator' at https://chakra.guruswami.ai with your M5 results.
Hey all,
I came across this article: https://machinelearning.apple.com/research/exploring-llms-mlx-m5, where Apple claims to have achieved a 2.87 sec time to first token (TTFT) on the MacBook Pro M5-24GB for the GPT-OSS-20B-MXFP4-Q4 model using MLX. However, I can't seem to replicate those numbers; I'm getting a TTFT of ~8 sec.
Note: None of the models listed in the article are performing as claimed.
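To make clear what is being timed, here is a minimal sketch of a TTFT measurement. It is not the code from the article or from my PR: `measure_ttft` and `fake_stream` are hypothetical names, and the stand-in generator only imitates prefill and decode with sleeps so the snippet runs without a model. TTFT here means the wall-clock time from starting the stream until the first token arrives, so model loading and any warm-up run are outside the timed region.

```python
import time


def measure_ttft(token_stream_factory, clock=time.perf_counter):
    """Return (ttft_seconds, total_seconds, n_tokens) for one generation run.

    token_stream_factory: zero-argument callable returning an iterator
    that yields one item per generated token (hypothetical interface).
    """
    start = clock()
    first = None
    count = 0
    for _ in token_stream_factory():
        if first is None:
            # The first yielded token marks the end of prompt processing.
            first = clock()
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no tokens")
    return first - start, end - start, count


def fake_stream(prefill_s=0.05, n_tokens=5, per_token_s=0.01):
    """Stand-in generator: sleeps for 'prefill', then for each decode step."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        if i:
            time.sleep(per_token_s)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream)
    print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, tokens: {n}")
```

With a real model, the factory would wrap whatever streaming generation call is in use. Whether the timed region includes model loading, a warm-up run, or a cold cache changes the figure considerably, so it is worth stating which of those apply when comparing numbers.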
Here’s my benchmarking setup:
mlx_lm/generate.py script. Here's the PR containing those changes: https://github.com/ml-explore/mlx-lm/pull/633/files
It would be great if anyone has observed similar or different results and could share their setup here. Thanks in advance.