I build and optimize LLM systems. Active open-source contributor to the vLLM ecosystem (llm-compressor, compressed-tensors). I also develop production-grade LLM applications spanning RAG, fine-tuning, and autonomous agents.
vllm-project/llm-compressor : Quantization toolkit for LLM deployment with vLLM
- Add iMatrix-weighted MSE observer and IMatrixGatherer : importance-weighted quantization that reduces perplexity (PPL) across RTN/GPTQ/AWQ (see the sketch after this list)
- Add norm calibration context for unit-offset RMSNorm (Gemma/Qwen3Next) : fixes AWQ/SmoothQuant on Gemma models
- Add MoE calibration module for GlmMoeDsa (GLM-5) : packed 3D tensor handling for MoE architectures
- Fix topological ordering in FX graph cleanup : resolves an erase_node crash in Granite4 GPTQ
- Handle packed weights in granite4 to_3d_expert (W4A16)
- Fix SmoothQuant regex for DeepSeek/GLM-5
- Add SmoothQuant mapping for GLM-5
- Add AWQ mapping for GLM-5
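A minimal sketch of the idea behind the iMatrix item above (not the llm-compressor implementation; `imatrix_mse_scale` and its signature are illustrative): per-input-channel importance statistics from calibration weight the quantization error during scale search, so channels that matter most for the outputs are penalized hardest. Roughly, the gatherer half of the contribution accumulates those statistics during calibration forward passes.

```python
# Illustrative sketch of importance-weighted MSE scale search (not llm-compressor code).
import torch


def imatrix_mse_scale(
    w: torch.Tensor,           # weight matrix to quantize, shape (out, in)
    importance: torch.Tensor,  # per-input-channel importance, shape (in,)
    n_bits: int = 4,
    n_grid: int = 80,
) -> torch.Tensor:
    """Return one symmetric scale per output row, chosen by weighted MSE."""
    qmax = 2 ** (n_bits - 1) - 1
    w_absmax = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)   # (out, 1)

    best_scale = w_absmax / qmax
    best_err = torch.full(
        (w.shape[0], 1), float("inf"), dtype=w.dtype, device=w.device
    )

    # Shrink the abs-max scale over a small grid and keep the scale whose
    # dequantization error, weighted by channel importance, is smallest.
    for i in range(n_grid):
        shrink = 1.0 - i / n_grid * 0.5            # scan [1.0, 0.5)
        scale = w_absmax * shrink / qmax
        q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
        err = ((q * scale - w) ** 2 * importance).sum(dim=1, keepdim=True)
        better = err < best_err
        best_err = torch.where(better, err, best_err)
        best_scale = torch.where(better, scale, best_scale)
    return best_scale.squeeze(1)
```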
vllm-project/compressed-tensors : Safetensors extension for sparse and quantized tensor storage
- Support N-dimensional tensors in pack/unpack_int32 : fixes 3D MoE expert weight packing
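A minimal sketch of that packing idea (not the compressed-tensors code; function names are illustrative): pack 4-bit values into int32 along the last dimension only, so any leading dimensions, including a 3D experts axis, pass through unchanged.

```python
# Illustrative N-dimensional int4 <-> int32 packing (not compressed-tensors code).
import torch


def pack_int4_to_int32(x: torch.Tensor) -> torch.Tensor:
    """Pack int4 values in [-8, 7] into int32, 8 values per word, last dim."""
    assert x.shape[-1] % 8 == 0, "last dim must be divisible by 8"
    u = x.to(torch.int32) & 0xF                         # two's-complement nibbles
    u = u.reshape(*x.shape[:-1], x.shape[-1] // 8, 8)   # group 8 nibbles per word
    shifts = torch.arange(8, dtype=torch.int32, device=x.device) * 4
    return (u << shifts).sum(dim=-1).to(torch.int32)


def unpack_int32_to_int4(p: torch.Tensor) -> torch.Tensor:
    """Inverse of pack_int4_to_int32; restores the original last-dim length."""
    shifts = torch.arange(8, dtype=torch.int32, device=p.device) * 4
    u = (p.unsqueeze(-1) >> shifts) & 0xF
    signed = torch.where(u >= 8, u - 16, u)             # back to [-8, 7]
    return signed.reshape(*p.shape[:-1], p.shape[-1] * 8).to(torch.int8)
```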
finsight : Visual RAG for French financial documents using ColQwen2.5 + Qdrant + Claude Sonnet/Opus. Indexed 10 annual reports (~5,982 pages). 90% Recall@10, 100% citation accuracy. Async FastAPI backend with SSE streaming + background adversarial verification, React + base-ui frontend. 183 tests, CI/CD.
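A minimal sketch of the streaming pattern, assuming FastAPI's `StreamingResponse` and `BackgroundTasks`; the endpoint path, `generate_answer`, and `verify_citations` are illustrative stand-ins, not the finsight code:

```python
# Illustrative SSE streaming endpoint with a deferred verification task.
import asyncio
import json
from typing import AsyncIterator

from fastapi import BackgroundTasks, FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def generate_answer(question: str) -> AsyncIterator[str]:
    """Stand-in for retrieval + generation; yields answer chunks."""
    for chunk in ("The report states ", "revenue grew 12% ", "[p. 42]."):
        await asyncio.sleep(0)
        yield chunk


async def verify_citations(question: str, answer: str) -> None:
    """Stand-in for the background adversarial check of each cited page."""


@app.get("/ask")
async def ask(question: str, background_tasks: BackgroundTasks):
    async def event_stream() -> AsyncIterator[str]:
        answer = ""
        async for chunk in generate_answer(question):
            answer += chunk
            yield f"data: {json.dumps({'delta': chunk})}\n\n"
        # Queue verification once the full answer is known; it runs after
        # the response finishes streaming.
        background_tasks.add_task(verify_citations, question, answer)
        yield "data: [DONE]\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        background=background_tasks,
    )
```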
reasonforge : Iterative LLM fine-tuning for Text-to-SQL using STaR (Self-Taught Reasoner). Ministral-8B: 60.1% baseline → 68.8% SFT → 78.0% after 3 STaR iterations on the Spider dev set.
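A minimal sketch of one STaR iteration as applied to Text-to-SQL (not the reasonforge code; all helpers are illustrative stubs): sample a rationale and SQL, keep traces whose SQL matches the gold query by execution, rationalize failures with the gold SQL as a hint, then fine-tune on the collected traces.

```python
# Illustrative single STaR iteration for Text-to-SQL; the point is the
# filter-then-finetune loop, not the stubbed implementation details.
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    schema: str
    db_path: str
    gold_sql: str


def generate(model, prompt: str) -> tuple[str, str]:
    """Sample (rationale, sql) from the current model; stub."""
    ...


def execute_sql(db_path: str, sql: str):
    """Execute SQL against the example's database and return rows; stub."""
    ...


def fine_tune(model, traces):
    """Supervised fine-tuning on (example, rationale, sql) traces; stub."""
    ...


def star_iteration(model, dataset: list[Example]):
    traces = []
    for ex in dataset:
        rationale, sql = generate(model, f"{ex.schema}\n-- {ex.question}")
        if execute_sql(ex.db_path, sql) == execute_sql(ex.db_path, ex.gold_sql):
            # Keep self-generated traces whose SQL is execution-equivalent to gold.
            traces.append((ex, rationale, sql))
        else:
            # Rationalization: show the gold SQL as a hint and keep the resulting
            # explanation, so hard examples still provide training signal.
            hint = f"{ex.schema}\n-- {ex.question}\n-- target: {ex.gold_sql}"
            rationale, _ = generate(model, hint)
            traces.append((ex, rationale, ex.gold_sql))
    # STaR repeats this loop, typically re-fine-tuning from the base model
    # on the growing set of traces each iteration.
    return fine_tune(model, traces)
```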
ai-watch : Autonomous AI news agent built with LangGraph + Claude tool-use. Aggregates HuggingFace Papers, GitHub Trending, and Simon Willison's RSS feed into daily markdown briefings. Deployed to GitHub Pages and run via GitHub Actions.
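A minimal sketch of the daily briefing loop in plain Python (the real agent orchestrates this with LangGraph and Claude tool-use; the fetchers and `summarize` here are illustrative stubs):

```python
# Illustrative daily briefing pipeline: fetch items, summarize, write a dated
# markdown file that a static site such as GitHub Pages can publish.
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class Item:
    source: str
    title: str
    url: str


def fetch_hf_papers() -> list[Item]: ...
def fetch_github_trending() -> list[Item]: ...
def fetch_blog_rss() -> list[Item]: ...


def summarize(items: list[Item]) -> str:
    """Stand-in for the LLM call that turns raw items into a short digest."""
    return "\n".join(f"- [{i.title}]({i.url}) ({i.source})" for i in items)


def build_briefing(out_dir: Path = Path("briefings")) -> Path:
    items = fetch_hf_papers() + fetch_github_trending() + fetch_blog_rss()
    body = f"# AI briefing for {date.today():%Y-%m-%d}\n\n{summarize(items)}\n"
    out_dir.mkdir(exist_ok=True)
    path = out_dir / f"{date.today():%Y-%m-%d}.md"
    path.write_text(body, encoding="utf-8")
    return path
```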
