Solutions Architect @ NVIDIA · HPC · AI Infrastructure · Energy Efficiency
Santa Clara, CA

Website LinkedIn Google Scholar ORCID


🧠 About Me

I design and optimize large-scale AI systems at NVIDIA, focusing on:

  • πŸ—οΈ HPC + AI infrastructure design (multi-node, scheduling, deployment)
  • ⚑ LLM inference optimization (vLLM Β· TensorRT-LLM Β· DeepSpeed)
  • πŸ”₯ Performance-per-watt benchmarking across GPU clusters

How fast can we run β€” and at what energy cost?


📊 System Activity

Chenxu Niu GitHub streak


🚀 Featured Work

⚡ LLM Inference Benchmarking

  • Multi-node GPU benchmarking across:
    • vLLM · TensorRT-LLM · DeepSpeed · HuggingFace
  • Throughput · Latency · Memory · Power efficiency
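To illustrate the kind of numbers such a benchmark reports, here is a minimal self-contained sketch that aggregates per-request results into throughput and latency percentiles. All names and sample timings are hypothetical, not taken from any of the repositories listed on this profile:

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class RequestResult:
    """Timing for one inference request (hypothetical schema)."""
    tokens_generated: int
    latency_s: float  # end-to-end wall time for the request

def summarize(results: list[RequestResult], wall_time_s: float) -> dict:
    """Aggregate benchmark metrics across completed requests."""
    total_tokens = sum(r.tokens_generated for r in results)
    lats = sorted(r.latency_s for r in results)
    # quantiles(n=100) returns 99 cut points; index 49 ~ p50, index 94 ~ p95
    q = quantiles(lats, n=100)
    return {
        "throughput_tok_per_s": total_tokens / wall_time_s,
        "p50_latency_s": q[49],
        "p95_latency_s": q[94],
        "requests": len(results),
    }

# Example with synthetic timings: 20 requests, 128 tokens each
demo = [RequestResult(tokens_generated=128, latency_s=0.5 + 0.01 * i)
        for i in range(20)]
print(summarize(demo, wall_time_s=10.0))
```

In a real harness the `RequestResult` records would come from instrumented calls to the engine under test; only the aggregation step is shown here.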

🔋 GPU Energy Efficiency

  • Performance-per-watt analysis across workloads
  • Sustainable AI system design
  • Trade-offs: performance vs energy cost
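One way to ground the performance-vs-energy trade-off in numbers: integrate sampled GPU power over time to get joules, then normalize by tokens generated. A hedged sketch, with illustrative function names and sample values; real power readings would come from NVML or `nvidia-smi`:

```python
def energy_joules(power_w: list[float], dt_s: float) -> float:
    """Integrate evenly spaced power samples (watts) via the trapezoidal rule."""
    if len(power_w) < 2:
        return 0.0
    return dt_s * (sum(power_w) - 0.5 * (power_w[0] + power_w[-1]))

def perf_per_watt(tokens: int, power_w: list[float], dt_s: float) -> dict:
    """Energy-normalized metrics for one run (hypothetical report format)."""
    e = energy_joules(power_w, dt_s)
    duration_s = dt_s * (len(power_w) - 1)
    return {
        "energy_j": e,
        "avg_power_w": e / duration_s,
        "joules_per_token": e / tokens,
        "tokens_per_joule": tokens / e,
    }

# Hypothetical 1 Hz samples from a GPU power sensor during a 4-second run
samples = [250.0, 300.0, 320.0, 310.0, 260.0]
print(perf_per_watt(tokens=1200, power_w=samples, dt_s=1.0))
```

Tokens-per-joule is the quantity that "performance per watt" reduces to once both sides are normalized by time, which makes it a convenient single figure for comparing engines or GPU configurations.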

πŸ—οΈ AI / HPC Infrastructure

  • Slurm-based HPC scheduling
  • Kubernetes-based AI deployment
  • Hybrid cluster orchestration
  • Multi-node GPU scaling

βš™οΈ Tech Stack

AI / ML

PyTorch vLLM TensorRT-LLM DeepSpeed Hugging Face

GPU / Systems

NVIDIA CUDA Multi-Node GPU Power Profiling

Infrastructure

Slurm Kubernetes Docker Linux

Languages

Python C++ Bash YAML


📌 Focus Areas

  • LLM inference engine benchmarking
  • GPU power and energy-efficiency profiling
  • Multi-node AI cluster performance analysis
  • HPC scheduling and AI infrastructure design
  • Sustainable AI system optimization

🔗 Links

Website · LinkedIn · Google Scholar · ORCID

Pinned Repositories

  1. TokenPowerBench (Python · 11 stars)
     chenxuniu/TokenSpark-Benchmark-Benchmarking-Power-Consumption-of-LLM-Inference-on-Multi-Node-Clusters

  2. LLM-Inference-Engine-Benchmark (Python · 5 stars)
     A comprehensive benchmarking tool for measuring energy consumption and power efficiency of Large Language Model (LLM) inference engines including vLLM, DeepSpeed, TensorRT-LLM, and Transformers. Fe…

  3. awesome-disaggregated-llm-serving (Python)
     A curated map of AFD, PD disaggregation, KV-cache systems, MoE serving, and re-aggregation baselines for LLM serving.

  4. llm-power-profiler (Python)
     Lightweight local monitor for watts, tokens, and joules per token on OpenAI-compatible LLM servers.

  5. vllm-project/semantic-router (Go · 4.2k stars · 670 forks)
     System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge