I focus on building cloud-native AI inference infrastructure β making LLM serving on Kubernetes declarative, autoscaling, and scale-to-zero.
2 years of SRE (Prometheus, multi-cloud) before this β I bring that operational lens to a stack where GPUs and inference engines are still black boxes.
- π₯ Building Hearth β a vendor-neutral K8s operator for scale-to-zero serving of open-source LLMs (Qwen / DeepSeek / GLM),with NVIDIA / Ascend as pluggable backends. Verified end-to-end on real GPUs.
- π§° Focus: queue-driven autoscaling, cold-start UX, model caching, observability
- π« jzlyy68@gmail.com



