A real-time D3.js training dashboard that visualizes GPU metrics, loss curves, gradient flow, and more — with live streaming from Google Colab or Kaggle notebooks via cloudflared tunnels.
- 12 interactive D3.js charts: Loss + Accuracy, GPU Utilization, Memory, Throughput, Gradient Norms, LR Schedule, Step Time Histogram, Gradient Flow Heatmap, ZeRO Memory Breakdown, System Health, and more
- Live streaming from Colab/Kaggle: Paste a tunnel URL and watch training in real-time
- Model-agnostic: Works with any model — titles and charts adapt dynamically
- Gated Self-Attention: Novel architecture modification with per-layer gate visualization
- Real GPU metrics: `pynvml` for utilization/memory, CUDA events for timing, per-layer profiling
- Single-file dashboard: No build step, no dependencies beyond the D3.js CDN
- DeepSpeed ZeRO Stage 2: FP16, CPU optimizer offload, gradient accumulation
Open the GitHub Pages dashboard — it auto-loads real BERT-Large training data (460 steps, 11 evaluations, trained on IMDB with Gated Attention on a T4 GPU).
You can also drag & drop any training_metrics.json onto the dashboard to visualize a different run.
BERT-Large with Gated Attention (recommended):
GPT-2 Fine-Tuning (beginner-friendly, self-contained server + tunnel):
```
Colab/Kaggle (GPU)                        Your Browser
┌──────────────────┐                    ┌──────────────────────┐
│ Training Loop    │    HTTP POST       │  D3.js Dashboard     │
│ + Monitor class  ├───────────────────►│  (GitHub Pages or    │
│ + pynvml GPU     │    /api/push       │  local server.py)    │
│ + CUDA timing    │                    │                      │
└──────────────────┘                    └──────────────────────┘
         │                                        ▲
         │           cloudflared tunnel           │
         └── https://xxx.trycloudflare.com ───────┘
```
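The push path above can be sketched with nothing beyond the Python standard library. Note that the payload field names and helper names below are illustrative assumptions, not the project's API; consult `deepseed_monitor.py` for the schema the dashboard actually expects:

```python
import json
import urllib.request

def build_push_request(dashboard_url, step, payload):
    """Build the POST request for one metrics sample.

    Field names in `payload` are illustrative; see deepseed_monitor.py
    for the schema the dashboard actually expects.
    """
    body = json.dumps({"step": step, **payload}).encode("utf-8")
    return urllib.request.Request(
        dashboard_url.rstrip("/") + "/api/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def push_metrics(dashboard_url, step, payload, timeout=5):
    """Send one sample to the dashboard; returns the HTTP status code."""
    req = build_push_request(dashboard_url, step, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# Usage (hypothetical tunnel URL):
# push_metrics("https://xxx.trycloudflare.com", 42,
#              {"loss": 0.31, "gpu_util": 87})
```

Because only `urllib` and `json` are used, the same snippet runs unchanged inside a Colab or Kaggle cell without extra installs.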
Option A — Self-contained (GPT-2 notebook):
- Open the GPT-2 Colab notebook above
- Run All — it installs deps, starts a dashboard server + cloudflared tunnel inside Colab
- Click the tunnel URL printed in the output — that's your live dashboard
- Training starts automatically and charts update in real-time
Option B — Remote dashboard (BERT notebook):
- On your local machine: `python3 server.py --tunnel`
- Copy the tunnel URL printed in the terminal
- Open the BERT Colab notebook and paste the URL into `DASHBOARD_URL`
- Run All — metrics stream to your local dashboard
```
deepseed/
├── index.html                   # Single-file D3.js dashboard (~3300 lines)
├── training_metrics.json        # Real BERT-Large training data (460 steps)
├── server.py                    # Python HTTP server with SSE + CORS + tunnel support
├── deepseed_monitor.py          # Python client library for pushing metrics (stdlib only)
├── deepspeed_bert_colab.ipynb   # BERT-Large + Gated Attention notebook
├── deepspeed_gpt2_colab.ipynb   # GPT-2 fine-tuning notebook (self-contained)
├── run_charts.py                # Static chart generation (matplotlib)
├── orchestrator/                # K8s control plane (job store, Kaggle controller)
├── k8s/                         # Kustomize manifests for K8s deployment
├── k8s_deploy.py                # K8s deployer CLI
├── jobs/                        # Job runner implementations
├── charts/                      # Pre-rendered chart assets
└── multigpu_lora/               # Multi-GPU LoRA fine-tuning experiments
```
| Chart | What it shows |
|---|---|
| Training & Validation Loss | Loss curve with EMA overlay (α=0.1) + validation dots |
| Validation Accuracy & F1 | Eval metrics over training steps |
| GPU Utilization | Real pynvml GPU utilization percentage |
| GPU Memory Usage | VRAM consumption over time |
| Training Throughput | Samples/second with moving average |
| LR Schedule | Learning rate warmup + linear decay |
| Gradient Norms | Gradient magnitude tracking |
| Step Time Breakdown | Forward / Backward / Optimizer / Communication stacked bars |
| Gradient Flow Heatmap | Per-layer forward timing heatmap |
| ZeRO Memory | Parameter / Gradient / Optimizer / Activation memory breakdown |
| Step Time Histogram | Distribution of step durations |
| System Health | Live GPU/memory stats (live mode only) |
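The EMA overlay on the loss chart applies standard recursive exponential smoothing with α = 0.1. A minimal sketch, assuming the series is seeded with its first raw value (the dashboard's exact initialization may differ):

```python
def ema(values, alpha=0.1):
    """Exponential moving average as used for the loss overlay:
    smoothed[t] = alpha * value[t] + (1 - alpha) * smoothed[t-1],
    seeded with the first raw value."""
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(v)
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed
```

With α = 0.1, each point carries only 10% of the newest sample, which is why the overlay tracks the trend of a noisy loss curve rather than its spikes.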
The BERT notebook includes a novel Gated Self-Attention mechanism:
```
g = sigmoid(W_g * attention_output)        # learnable gate per position
output = g * attention_output + (1 - g) * x  # blend attend vs. skip
```
- Adds only 24,576 parameters to BERT-Large (0.007% overhead)
- Early layers learn to partially skip attention, late layers attend fully
- Improves convergence and gradient flow
- Gate evolution is tracked and visualized in the dashboard
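The 24,576-parameter figure is consistent with BERT-Large's dimensions (hidden size 1024, 24 encoder layers) under the assumption of one bias-free gate vector per layer, each mapping a position's hidden state to a scalar gate. A minimal numeric sketch; the real module is a batched PyTorch matmul, and `gate` below is a hypothetical single-position illustration:

```python
import math

HIDDEN = 1024   # BERT-Large hidden size (assumption from model config)
LAYERS = 24     # BERT-Large encoder layers

# One gate vector W_g per layer, each of length HIDDEN, no bias
extra_params = HIDDEN * LAYERS  # 24,576

def gate(attn_out, x, w_g):
    """Per-position gated blend g*attn_out + (1-g)*x for one position.

    attn_out, x, w_g are equal-length vectors (plain lists here);
    g = sigmoid(w_g . attn_out) is a scalar per position.
    """
    g = 1.0 / (1.0 + math.exp(-sum(a * w for a, w in zip(attn_out, w_g))))
    return [g * a + (1.0 - g) * xi for a, xi in zip(attn_out, x)]
```

At initialization with `w_g = 0`, the gate sits at 0.5, so each layer starts as an even blend of attention output and residual input; training then pushes early layers toward the skip path and late layers toward full attention, as described above.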
The included training_metrics.json contains real training data from fine-tuning BERT-Large (335M params) with Gated Attention on IMDB sentiment classification:
- GPU: NVIDIA Tesla T4 (15.8 GB)
- Optimizer: DeepSpeed ZeRO Stage 2 + FP16 + CPU offload
- Dataset: IMDB (25K train / 25K test)
- Results: 460 logged steps, 11 evaluations, ~93% validation accuracy
- Metrics: Real GPU utilization, memory, per-layer timing via CUDA events
```bash
# Start the dashboard server with a cloudflared tunnel
python3 server.py --tunnel

# Or just serve locally
python3 server.py
# Open http://localhost:8080
```
- Dashboard: Vanilla HTML + CSS + D3.js v7 (single file, no build step)
- Training: PyTorch + DeepSpeed + Hugging Face Transformers
- Metrics: pynvml (GPU), CUDA events (timing), custom Monitor class
- Tunnel: cloudflared (Cloudflare Tunnel) for remote access
- Orchestration: Kubernetes + custom control plane for Kaggle job management
MIT

