From Blank Whiteboard to Production
A collection of eight production-grade system designs covering MLOps, data engineering, AI security, orbital autonomy, infrastructure architecture, and Database Architecture.
"Software bug = satellite is now space junk" — This is why systems thinking matters, especially at 2 am.
Design a RAG system that doesn't hallucinate. Covers data ingestion, vector databases, context building, monitoring, and the 7 attack surfaces.
Key Insight: Most RAG failures happen at the retrieval layer, not the LLM.
Tech Stack: Python, LangChain, Pinecone/Weaviate, MLflow, Prometheus
Dashboards:
Build cloud-agnostic ML deployment that works on AWS, GCP, and Azure. Single Terraform config deploys anywhere—no vendor lock-in.
Key Insight: Vendor lock-in isn't technical—it's architectural. Design for portability from day one.
Tech Stack: Terraform, Kubernetes, MLflow, Prometheus, Grafana
Impact: Eliminated vendor lock-in risk, preserved negotiating power
Process 10TB/day across medallion architecture (Bronze → Silver → Gold). Optimize for cost and performance at scale.
Key Insight: Architecture determines cost. Bronze/Silver/Gold + Delta Lake + Z-ordering = 63% cost reduction.
Tech Stack: Databricks, Delta Lake, Spark, Airflow, Terraform
Impact: $244K → $91K/year (63% cost reduction)
Secure AI systems across 7 attack surfaces: Data, Model, Inference API, Output, Infrastructure, Supply Chain, and Humans.
Key Insight: Most teams secure ONE layer; attackers exploit the other SIX. Defense in depth is non-negotiable.
Tech Stack: Python, OWASP, HashiCorp Vault, Prometheus, Kubernetes
Dashboards:
Design K8s infrastructure for distributed ML training. GPU scheduling, storage strategy, cost optimization.
Key Insight: K8s for ML ≠ K8s for web services. Treat them the same = waste money.
Tech Stack: Kubernetes, CUDA, PyTorch, Databricks, Kubecost
Impact: $200K → $80K/month (60% cost reduction)
Build autonomous systems that work at 240ms latency with 92% disconnection time. Covers orbital mechanics, launch constraints, and safe mode design.
Key Insight: You can't remote-control what physics won't allow. Autonomy is required, not optional.
Tech Stack: C++, Python, RTOS, Fault-tolerant systems
Signature: "Software bug = satellite is now space junk"
Dashboards:
Everything runs on Linux. Master the foundation: kernel, networking, security, performance tuning.
Key Insight: You can't optimize what you don't understand. Know your stack from silicon to application.
Tech Stack: Linux, Bash, systemd, eBPF, perf
Design database architecture that matches data structure to database type. Covers the 5 database types (In-Memory, Time-Series, Graph, Relational, Distributed), polyglot persistence, and decision framework.
Key Insight: Performance problems are often architecture problems, not hardware problems. 30-second queries → 100 milliseconds = architecture change (300x faster, same hardware).
Tech Stack: PostgreSQL, Redis, Neo4j, Cassandra, InfluxDB, MongoDB, Elasticsearch
Impact: 300x performance improvement, $50K/month infrastructure savings, no emergency migrations
Tracy Manning
Staff MLOps Engineer | Multi-Cloud + Linux Infrastructure Architect | Austin, TX
💼 LinkedIn
🐦 X/Twitter
📊 Tableau Public
📱 WhatsApp Channel
MIT License - Feel free to use for learning and adapt for your systems.
"Software bug = satellite is now space junk."
This is why systems thinking matters.
When you can't send a technician, you design better systems. When you can't afford downtime, you design for failure. When you can't waste money, you architect for cost.
Production systems that work at 2am.
That's the standard.
Last updated: January 2026