Solutions Architect @ NVIDIA · HPC · AI Infrastructure · Energy Efficiency
Santa Clara, CA

Website LinkedIn Google Scholar ORCID


🧠 About Me

I design and optimize large-scale AI systems at NVIDIA, focusing on:

  • πŸ—οΈ HPC + AI infrastructure design (multi-node, scheduling, deployment)
  • ⚑ LLM inference optimization (vLLM Β· TensorRT-LLM Β· DeepSpeed)
  • πŸ”₯ Performance-per-watt benchmarking across GPU clusters

How fast can we run β€” and at what energy cost?


📊 System Activity

Chenxu Niu GitHub streak


🚀 Featured Work

⚡ LLM Inference Benchmarking

  • Multi-node GPU benchmarking across:
    • vLLM · TensorRT-LLM · DeepSpeed · HuggingFace
  • Throughput · Latency · Memory · Power efficiency
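To illustrate the kind of numbers such a benchmark reports, here is a minimal self-contained sketch that aggregates per-request results into throughput and latency percentiles. All names and sample timings are hypothetical, not taken from any of the repositories listed on this profile:

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class RequestResult:
    """Timing for one inference request (hypothetical schema)."""
    tokens_generated: int
    latency_s: float  # end-to-end wall time for the request

def summarize(results: list[RequestResult], wall_time_s: float) -> dict:
    """Aggregate benchmark metrics across completed requests."""
    total_tokens = sum(r.tokens_generated for r in results)
    lats = sorted(r.latency_s for r in results)
    # quantiles(n=100) returns 99 cut points; index 49 ~ p50, index 94 ~ p95
    q = quantiles(lats, n=100)
    return {
        "throughput_tok_per_s": total_tokens / wall_time_s,
        "p50_latency_s": q[49],
        "p95_latency_s": q[94],
        "requests": len(results),
    }

# Example with synthetic timings: 20 requests, 128 tokens each
demo = [RequestResult(tokens_generated=128, latency_s=0.5 + 0.01 * i)
        for i in range(20)]
print(summarize(demo, wall_time_s=10.0))
```

In a real harness the `RequestResult` records would come from instrumented calls to the engine under test; only the aggregation step is shown here.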

🔋 GPU Energy Efficiency

  • Performance-per-watt analysis across workloads
  • Sustainable AI system design
  • Trade-offs: performance vs energy cost
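One way to ground the performance-vs-energy trade-off in numbers: integrate sampled GPU power over time to get joules, then normalize by tokens generated. A hedged sketch, with illustrative function names and sample values; real power readings would come from NVML or `nvidia-smi`:

```python
def energy_joules(power_w: list[float], dt_s: float) -> float:
    """Integrate evenly spaced power samples (watts) via the trapezoidal rule."""
    if len(power_w) < 2:
        return 0.0
    return dt_s * (sum(power_w) - 0.5 * (power_w[0] + power_w[-1]))

def perf_per_watt(tokens: int, power_w: list[float], dt_s: float) -> dict:
    """Energy-normalized metrics for one run (hypothetical report format)."""
    e = energy_joules(power_w, dt_s)
    duration_s = dt_s * (len(power_w) - 1)
    return {
        "energy_j": e,
        "avg_power_w": e / duration_s,
        "joules_per_token": e / tokens,
        "tokens_per_joule": tokens / e,
    }

# Hypothetical 1 Hz samples from a GPU power sensor during a 4-second run
samples = [250.0, 300.0, 320.0, 310.0, 260.0]
print(perf_per_watt(tokens=1200, power_w=samples, dt_s=1.0))
```

Tokens-per-joule is the quantity that "performance per watt" reduces to once both sides are normalized by time, which makes it a convenient single figure for comparing engines or GPU configurations.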

πŸ—οΈ AI / HPC Infrastructure

  • Slurm-based HPC scheduling
  • Kubernetes-based AI deployment
  • Hybrid cluster orchestration
  • Multi-node GPU scaling

βš™οΈ Tech Stack

AI / ML

PyTorch vLLM TensorRT-LLM DeepSpeed Hugging Face

GPU / Systems

NVIDIA CUDA Multi-Node GPU Power Profiling

Infrastructure

Slurm Kubernetes Docker Linux

Languages

Python C++ Bash YAML


📌 Focus Areas

  • LLM inference engine benchmarking
  • GPU power and energy-efficiency profiling
  • Multi-node AI cluster performance analysis
  • HPC scheduling and AI infrastructure design
  • Sustainable AI system optimization

🔗 Links

Website · LinkedIn · Google Scholar · ORCID

Pinned Repositories

  1. TokenPowerBench (Python · 11 stars)
     chenxuniu/TokenSpark-Benchmark-Benchmarking-Power-Consumption-of-LLM-Inference-on-Multi-Node-Clusters

  2. LLM-Inference-Engine-Benchmark (Python · 5 stars)
     A comprehensive benchmarking tool for measuring energy consumption and power efficiency of Large Language Model (LLM) inference engines including vLLM, DeepSpeed, TensorRT-LLM, and Transformers. Fe…

  3. awesome-disaggregated-llm-serving (Python)
     A curated map of AFD, PD disaggregation, KV-cache systems, MoE serving, and re-aggregation baselines for LLM serving.

  4. llm-power-profiler (Python)
     Lightweight local monitor for watts, tokens, and joules per token on OpenAI-compatible LLM servers.

  5. vllm-project/semantic-router (Go · 4.2k stars · 670 forks)
     System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge