Interactive Distributed Inference Simulator — chakra.guruswami.ai #1070
Replies: 1 comment
A bit of context on how this happened: it started because I wanted a cool real-time visualisation of distributed inference running on the cluster, similar to what the Exo team had built for their setup. I thought it would be a quick weekend project. Several months later, there are 185 music tracks, a talking guru character with 1,100 voice clips, and an interactive tutorial that teaches you distributed inference by deliberately crashing the cluster. The simulator runs entirely client-side on static benchmark data, but there's also a live mode that connects to the real cluster via MQTT and shows real-time TPS, memory, GPU utilisation, and TB5 mesh status. If anyone running MLX distributed inference wants a real-time dashboard/visualisation for their cluster, I'm happy to make that available as a separate tool. It works with any number of Apple Silicon nodes.
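For anyone wondering what consuming the live-mode data could look like on the dashboard side, here is a minimal TypeScript sketch. The payload shape and field names (`node`, `tps`, `memoryUsedGb`, `gpuUtilPct`) and both function names are assumptions made up for illustration, not the simulator's actual message format. The MQTT client itself is left out, so this only covers validating a message body and rolling the latest per-node readings into a cluster summary.

```typescript
// Hypothetical shape of one node's metrics message.
// The real payload format is not described in this thread.
interface NodeMetrics {
  node: string;         // node identifier (assumed field)
  tps: number;          // tokens per second reported by the node (assumed field)
  memoryUsedGb: number; // memory in use, in GB (assumed field)
  gpuUtilPct: number;   // GPU utilisation, 0-100 (assumed field)
}

// Parse and validate a raw message body. Returns null for anything that is
// not valid JSON or does not match the assumed shape, so one malformed
// message cannot break the dashboard.
function parseNodeMetrics(payload: string): NodeMetrics | null {
  let raw: unknown;
  try {
    raw = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const { node, tps, memoryUsedGb, gpuUtilPct } = raw as Record<string, unknown>;
  if (
    typeof node !== "string" ||
    typeof tps !== "number" ||
    typeof memoryUsedGb !== "number" ||
    typeof gpuUtilPct !== "number"
  ) {
    return null;
  }
  return { node, tps, memoryUsedGb, gpuUtilPct };
}

interface ClusterSummary {
  nodeCount: number;
  totalMemoryUsedGb: number;
  meanGpuUtilPct: number;
}

// Summarise the latest reading from each node. Per-node TPS is deliberately
// not summed: in a distributed run the cluster-level rate is not simply the
// sum of what each node reports.
function summariseCluster(latest: NodeMetrics[]): ClusterSummary {
  const nodeCount = latest.length;
  const totalMemoryUsedGb = latest.reduce((sum, m) => sum + m.memoryUsedGb, 0);
  const totalGpu = latest.reduce((sum, m) => sum + m.gpuUtilPct, 0);
  return {
    nodeCount,
    totalMemoryUsedGb,
    meanGpuUtilPct: nodeCount === 0 ? 0 : totalGpu / nodeCount,
  };
}
```

In a real dashboard, `parseNodeMetrics` would be called from whatever MQTT client's message callback is in use, with the latest value per node kept in state and passed to `summariseCluster` on each render.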
We've been building an interactive simulator to teach distributed inference concepts on Apple Silicon clusters using MLX. It uses real benchmark data from a 5-node M3 Ultra cluster (512GB each, TB5 mesh) to visualise how model configuration choices affect TPS, TTFT, memory, and topology.
Try it: https://chakra.guruswami.ai
What it does
Caveats — it's a work in progress
What it teaches (without ever saying "distributed inference")
Real benchmarks behind it
All the TPS, TTFT, and memory numbers come from actual MLX distributed runs on our cluster. The scaling factors, error conditions, and topology constraints are measured, not estimated. The simulator lets people explore these tradeoffs interactively without needing the hardware.
Next steps
If you have ideas on how to improve the teaching, or spot things that are wrong, we'd love to hear about them. The goal is to make distributed inference on Apple Silicon accessible to people who don't have a 5-node cluster to experiment with.
Built with React, XState, and real data from mlx-lm benchmarks.