Solutions Architect @ NVIDIA · HPC · AI Infrastructure · Energy Efficiency
Santa Clara, CA
I design and optimize large-scale AI systems at NVIDIA, focusing on:
- 🏗️ HPC + AI infrastructure design (multi-node, scheduling, deployment)
- ⚡ LLM inference optimization (vLLM · TensorRT-LLM · DeepSpeed)
- 🔥 Performance-per-watt benchmarking across GPU clusters
How fast can we run, and at what energy cost?
- Multi-node GPU benchmarking across:
  - Engines: vLLM · TensorRT-LLM · DeepSpeed · HuggingFace
  - Metrics: throughput · latency · memory · power efficiency
- Performance-per-watt analysis across workloads (see the sketch after this list)
- Sustainable AI system design
- Trade-offs: performance vs. energy cost
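
A minimal sketch of how such a tokens-per-joule measurement can look, using `pynvml` to sample board power while an inference call runs; `run_inference` is a hypothetical stand-in for whatever engine is under test (vLLM, TensorRT-LLM, ...), not an NVIDIA tool:

```python
# Sketch: tokens/s and tokens/J for one inference run, via NVML power sampling.
import time
import threading
import pynvml

def _sample_power(handle, samples, stop, interval_s=0.05):
    # Append (timestamp, watts) pairs until asked to stop.
    while not stop.is_set():
        mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # board power in milliwatts
        samples.append((time.time(), mw / 1000.0))
        time.sleep(interval_s)

def tokens_per_joule(run_inference, gpu_index=0):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    samples, stop = [], threading.Event()
    sampler = threading.Thread(target=_sample_power, args=(handle, samples, stop))
    sampler.start()
    t_start = time.time()
    n_tokens = run_inference()  # hypothetical: returns the number of generated tokens
    elapsed = time.time() - t_start
    stop.set()
    sampler.join()
    pynvml.nvmlShutdown()
    # Trapezoidal integration of the power trace gives energy in joules.
    energy_j = sum(
        0.5 * (w0 + w1) * (t1 - t0)
        for (t0, w0), (t1, w1) in zip(samples, samples[1:])
    )
    return n_tokens / elapsed, n_tokens / energy_j  # tokens/s, tokens/J
```

Sampling at 20 Hz is coarse, so the measured run has to last seconds rather than milliseconds for the integral to be meaningful.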
- Slurm-based HPC scheduling
- Kubernetes-based AI deployment
- Hybrid cluster orchestration
- Multi-node GPU scaling (see the sketch below)
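
A small illustration of where Slurm scheduling and multi-node GPU scaling meet: bootstrapping a NCCL process group from Slurm's environment. This sketch assumes `srun` launched one task per GPU and that the job script exported `MASTER_ADDR` / `MASTER_PORT`; the all-reduce at the end is just a fabric smoke test.

```python
# Sketch: multi-node NCCL initialization from Slurm environment variables.
import os
import torch
import torch.distributed as dist

def init_from_slurm():
    rank = int(os.environ["SLURM_PROCID"])         # global rank across all nodes
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of GPU tasks
    local_rank = int(os.environ["SLURM_LOCALID"])  # GPU index within this node
    torch.cuda.set_device(local_rank)
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        init_method=f"tcp://{os.environ['MASTER_ADDR']}:{os.environ['MASTER_PORT']}",
    )
    return rank, world_size, local_rank

if __name__ == "__main__":
    rank, world_size, local_rank = init_from_slurm()
    x = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(x)  # sum across every rank in the job
    if rank == 0:
        print(f"all_reduce total = {x.item():.0f} over {world_size} ranks")
```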
Tech stack: AI / ML · GPU / Systems · Infrastructure · Languages
- LLM inference engine benchmarking (see the sketch after this list)
- GPU power and energy-efficiency profiling
- Multi-node AI cluster performance analysis
- HPC scheduling and AI infrastructure design
- Sustainable AI system optimization
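
On the engine-benchmarking side, a minimal offline throughput harness using vLLM's Python API; the model name, prompt, and batch size are illustrative placeholders:

```python
# Sketch: offline generation throughput with vLLM.
import time
from vllm import LLM, SamplingParams

def bench(model="facebook/opt-125m", n_prompts=64, max_tokens=128):
    llm = LLM(model=model)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    prompts = ["Explain performance-per-watt in one paragraph."] * n_prompts
    t_start = time.time()
    outputs = llm.generate(prompts, params)
    elapsed = time.time() - t_start
    gen_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{gen_tokens} tokens in {elapsed:.1f}s -> {gen_tokens / elapsed:.0f} tok/s")

if __name__ == "__main__":
    bench()
```

Pairing a run like this with the NVML sampler above turns the same measurement into a tokens-per-joule figure.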
Website · LinkedIn · Google Scholar · ORCID

