ML & AI Engineer — Fine-Tuning · Agentic Systems · Edge Deployment · Production LLM Ops
I build production LLM systems from the metal up — from quantized models running on Jetson edge hardware to multi-agent cloud deployments with tool-use, permission gating, and audit trails. Currently focused on MoE fine-tuning (ZAYA1-8B), Blackwell-native FP4 quantization (NVFP4), and SOTA agentic coding benchmarks.
Dallas-Fort Worth, TX · ttimmsinternational@gmail.com
| Project | What | Why It Matters |
|---|---|---|
| zaya1-godspeed | Fine-tuning ZAYA1-8B MoE for agentic tool calling | 760M active params matching 14B models — closing a deliberate gap Zyphra left in the tech report |
| llama.cpp NVFP4 | Blackwell-native FP4 quantization with MSE-optimal scales | First consumer NVFP4 tooling on RTX 5070 Ti — PR #22897 awaiting upstream review |
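The NVFP4 work picks per-block scales by minimizing quantization error rather than using plain max-abs scaling. A minimal sketch of that search, assuming the standard FP4 (E2M1) value grid and a brute-force candidate sweep (helper names are illustrative, not llama.cpp's API):

```python
import numpy as np

# FP4 (E2M1) representable magnitudes; sign is handled separately.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(block: np.ndarray, scale: float) -> np.ndarray:
    """Round each value to the nearest representable FP4 value at a given scale."""
    scaled = block / scale
    signs = np.sign(scaled)
    mags = np.abs(scaled)
    idx = np.abs(mags[:, None] - FP4_GRID[None, :]).argmin(axis=1)  # snap to grid
    return signs * FP4_GRID[idx] * scale

def mse_optimal_scale(block: np.ndarray, n_candidates: int = 64) -> float:
    """Sweep candidate scales around max-abs and keep the one with lowest MSE."""
    base = np.abs(block).max() / FP4_GRID[-1]  # max-abs scale as the starting point
    if base == 0.0:
        return 1.0
    best_scale, best_err = base, np.inf
    for frac in np.linspace(0.5, 1.2, n_candidates):
        scale = base * frac
        err = np.mean((block - quantize_fp4(block, scale)) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

Because NVFP4 blocks are small, sweeping a few dozen candidate scales per block adds little to overall quantization time.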
Security-first open-source coding agent. Hand-rolled async ReAct loop with a 4-tier deny-first permission engine, SHA-256 hash-chained audit trail (sketched after the feature list below), and 200+ LLM providers via LiteLLM. 854 tests.
- 30+ built-in tools with JSON Schema validation, MCP server + client
- Parallel + speculative tool dispatch, cost budget enforcement
- Self-evolution via LLM-guided mutations, multi-language verify gate with retry
- Training data export (openai/chatml/sharegpt), per-step reward annotations for GRPO
- SWE-bench Lite: 34.8% single-shot · 52.2% oracle best-of-5
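The hash-chained audit trail ties each log entry to the SHA-256 digest of the previous one, so any edit, deletion, or reordering breaks verification. A minimal sketch of the pattern, with illustrative class and field names rather than the agent's actual API:

```python
import hashlib
import json
import time

class AuditChain:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"ts": time.time(), "event": event, "prev": prev_hash}
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute every hash; any tampered or reordered entry fails."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```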
Autonomous multi-agent personal intelligence system on NVIDIA Jetson Orin Nano. 5 LangGraph expert agents, LiteLLM gateway (4 providers + Ollama), 3-tier ONNX intent router. 393 tests. Fully on-device — zero cloud dependencies.
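A tiered intent router keeps most requests off the LLM entirely: cheap rules first, then a small on-device classifier, and a model call only as a last resort. A rough sketch of that shape (the actual tier composition, patterns, and thresholds in the project may differ):

```python
# Illustrative 3-tier routing: regex rules, then a local ONNX intent classifier,
# then an LLM fallback for low-confidence queries.
import re

RULES = {
    r"\b(weather|forecast)\b": "weather_agent",
    r"\b(remind|reminder|schedule)\b": "calendar_agent",
}

def route(query: str, onnx_classifier, llm_fallback, threshold: float = 0.8) -> str:
    # Tier 1: keyword rules cost microseconds and need no model load.
    for pattern, agent in RULES.items():
        if re.search(pattern, query, re.IGNORECASE):
            return agent
    # Tier 2: small on-device ONNX classifier returns (agent, confidence).
    agent, confidence = onnx_classifier(query)
    if confidence >= threshold:
        return agent
    # Tier 3: only ambiguous queries pay for an LLM call.
    return llm_fallback(query)
```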
Multi-agent algorithmic trading pipeline with DeepSeek R1 reasoning at every stage. 4-agent pipeline (TA → Chief → Risk → Execution), Kelly Criterion position sizing, Monte Carlo risk simulation, real-time WebSocket market data.
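Kelly sizing converts an estimated win probability and payoff ratio into a bankroll fraction; real systems usually run fractional Kelly and cap the result to absorb estimation error. A minimal sketch (the pipeline's actual sizing logic may add further constraints):

```python
def kelly_fraction(win_prob: float, win_loss_ratio: float,
                   cap: float = 0.25, fraction: float = 0.5) -> float:
    """Classic Kelly: f* = p - (1 - p) / b, scaled by a safety fraction and capped.

    win_prob        p: estimated probability the trade wins
    win_loss_ratio  b: average win size divided by average loss size
    """
    if win_loss_ratio <= 0:
        return 0.0
    f_star = win_prob - (1.0 - win_prob) / win_loss_ratio
    f_star = max(0.0, f_star) * fraction   # half-Kelly is a common hedge
    return min(f_star, cap)                # never risk more than the cap

# Example: 55% win rate at 1.5:1 payoff -> risk 12.5% of capital at half-Kelly.
print(kelly_fraction(0.55, 1.5))
```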
Qwen3.5-4B fine-tuned with ORPO for biblical Q&A. Hybrid RAG (ChromaDB + BM25 + cross-encoder reranking), constitutional AI guardrails, voice pipeline (Whisper + Kokoro TTS), Gradio UI. 183 tests, 34 W&B runs, 5,925 training steps.
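Hybrid retrieval in this style typically fuses lexical and dense candidate lists before a cross-encoder reranks the survivors. A sketch using Reciprocal Rank Fusion as the fusion step (the project's exact weighting and reranker are assumptions here):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """rankings: lists of doc IDs, best first, one list per retriever."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def retrieve(query, bm25_search, vector_search, cross_encoder, top_k=5):
    bm25_ids = bm25_search(query, n=20)      # lexical candidates (BM25)
    dense_ids = vector_search(query, n=20)   # semantic candidates (vector store)
    fused = reciprocal_rank_fusion([bm25_ids, dense_ids])[:20]
    # Cross-encoder scores (query, passage) pairs jointly for the final order.
    reranked = sorted(fused, key=lambda d: cross_encoder(query, d), reverse=True)
    return reranked[:top_k]
```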
Comprehensive GPU fleet validation modeled on NVIDIA DCGM. 16 diagnostic modules, Prometheus + Grafana, fault injection, JUnit XML for CI. 188 tests.
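Each diagnostic module follows the same shape: poll a hardware metric, export it for Prometheus, and flag limit violations. An illustrative check using NVML temperature polling (assumes the pynvml and prometheus_client packages; the real suite's checks and thresholds may differ):

```python
import time

import pynvml
from prometheus_client import Gauge, start_http_server

GPU_TEMP = Gauge("gpu_temperature_celsius", "GPU core temperature", ["gpu"])

def poll_temperatures(interval_s: float = 5.0, temp_limit: int = 85):
    pynvml.nvmlInit()
    count = pynvml.nvmlDeviceGetCount()
    start_http_server(9400)  # scrape target for Prometheus
    while True:
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            GPU_TEMP.labels(gpu=str(i)).set(temp)
            if temp > temp_limit:
                print(f"GPU {i}: {temp} C exceeds {temp_limit} C limit")
        time.sleep(interval_s)
```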
ML research control plane — experiment lifecycle management, model registry, cloud training launcher. Orchestrates gpu-server-test-suite (preflight checks) and llm-wiki (knowledge persistence). 28 tests, v0.1.0.
Git-backed knowledge base — Karpathy's LLM Wiki pattern. LangGraph ingest/query pipelines, instructor + Pydantic structured output, BM25 search, Groq → Gemini → Ollama fallback via LiteLLM. 117 tests, 40 wiki pages.
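The provider fallback is worth sketching: try each backend in order and return the first successful completion. The model identifiers below are placeholders, not the project's configuration; the litellm.completion call itself is the real API:

```python
from litellm import completion

FALLBACK_CHAIN = [
    "groq/llama-3.1-8b-instant",   # fast hosted tier
    "gemini/gemini-1.5-flash",     # second hosted tier
    "ollama/llama3.1",             # local last resort
]

def ask(messages: list[dict]) -> str:
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            resp = completion(model=model, messages=messages)
            return resp.choices[0].message.content
        except Exception as err:  # provider down, rate-limited, etc.
            last_error = err
    raise RuntimeError(f"all providers failed: {last_error}")
```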
SQL + Python ETL pipeline for semiconductor quality analysis — supplier performance scoring, defect Pareto distributions, yield trend analysis.
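The Pareto step ranks defect categories by count and reports each category's cumulative share, so the "vital few" driving most defects stand out. A minimal pandas sketch with assumed column names:

```python
import pandas as pd

def defect_pareto(df: pd.DataFrame) -> pd.DataFrame:
    counts = (
        df.groupby("defect_category")["defect_count"].sum()
          .sort_values(ascending=False)
          .to_frame("count")
    )
    counts["cum_pct"] = counts["count"].cumsum() / counts["count"].sum() * 100
    # The "vital few": categories that together explain 80% of defects.
    counts["vital_few"] = counts["cum_pct"] <= 80
    return counts
```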
Multi-model ML pipeline for Tesla tire wear prediction. Random Forest, XGBoost, Neural Network ensemble with Claude AI integration.
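The ensemble step is a straightforward weighted average of the individual regressors' predictions. An illustrative sketch with placeholder hyperparameters and weights:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

def fit_ensemble(X_train, y_train):
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
    xgb = XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0).fit(X_train, y_train)
    return rf, xgb

def predict_wear(models, X, weights=(0.5, 0.5)):
    preds = np.column_stack([m.predict(X) for m in models])
    return preds @ np.array(weights)  # weighted average of model outputs
```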
- llama.cpp #22897 — NVFP4 default type mapping + per-tensor scale tensors + MSE-optimal correction
- llama.cpp #22858 — Missing `LLAMA_FTYPE_MOSTLY_NVFP4` case fix (closed, replaced by #22897)
- Zyphra/ZAYA1-8B — Agentic fine-tuning to complete the model's post-training (SFT + GRPO)
| Area | Technologies |
|---|---|
| LLMs & Agents | LiteLLM, 200+ providers, Ollama, llama.cpp, multi-agent orchestration, ReAct loops |
| Fine-Tuning | Unsloth, TRL (SFT/DPO/GRPO/ORPO), QLoRA, PEFT, MoE architectures, RLHF/RLAIF |
| Inference | vLLM (custom forks), speculative decoding (750 tok/s), TensorRT-LLM, EXL2 |
| Quantization | NVFP4 (Blackwell-native), GGUF, EXL2, FP8, NF4, GPTQ, AWQ |
| ML Infrastructure | PyTorch, CUDA 12.8, torch.compile, DeepSpeed, lm-eval, W&B, MLflow |
| Systems | Python, Rust, TypeScript, Docker, GitHub Actions CI/CD, systemd |
| Edge / Hardware | NVIDIA Jetson Orin Nano, RTX 5070 Ti (Blackwell sm_120), 16 GB VRAM optimization |
| Data | PostgreSQL, SQL, pandas, SQLAlchemy, ChromaDB, LanceDB, BM25 |



