Infrastructure engineer focused on LLM inference systems.
- M.S. Computer Science — Shanghai Jiao Tong University
- B.S. Computer Science — Harbin Institute of Technology
- 2 years at Alibaba
Currently contributing to vllm-project/vllm — KV cache transfer, scheduler optimization, and hybrid KV cache management (HMA).
LLM Inference
- vLLM internals
- KV cache transfer
- Prefill-decode disaggregation
- PagedAttention
- Speculative decoding
| Project | Area | Highlights |
|---|---|---|
| vLLM | Scheduler / KV Cache | Bounded prefetch scheduling, HMA default behavior, metrics fixes |
→ Full contribution list: vllm-contributions
Python · CUDA · Triton · C++ · PyTorch · Linux

