KLDP - Kubernetes Local Data Platform

A batteries-included local data engineering platform running on Minikube. Start with Airflow, add Spark, and scale to a full data stack - all on your laptop.

🎯 Vision

KLDP provides a production-like Kubernetes data platform for local development, testing, and learning. No cloud costs, no complex setup - just run a script and start building pipelines.

🚀 Quick Start

# Prerequisites: Docker, Minikube, Helm, kubectl

# Clone and setup
git clone https://github.com/gridatek/kldp.git
cd kldp

# Start with Airflow only
./scripts/install-airflow.sh

# Or install everything
./scripts/install-all.sh

# Access Airflow UI
minikube service airflow-webserver -n airflow
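
Before running the install scripts, it can help to confirm that the four prerequisites are on your PATH. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the install scripts may already perform their own checks.

```shell
# Hypothetical helper (not shipped with KLDP): print every tool from the
# argument list that cannot be found on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_tools docker minikube helm kubectl || echo "Install the tools listed above first"` lists whatever is missing and prints a reminder.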

📦 Components

Phase 1 (Current)

  • Apache Airflow - Workflow orchestration with KubernetesExecutor
  • PostgreSQL - Metadata database
  • MinIO - S3-compatible object storage
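
As a rough illustration of what an Airflow install with the KubernetesExecutor can look like, here is a sketch that uses the official Apache Airflow Helm chart. It is an assumption that `install-airflow.sh` works this way; the release name and namespace `airflow` are inferred from the Quick Start command, and `run` and `install_airflow` are hypothetical helper names. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Sketch only: KLDP's own script may use different charts or values.
install_airflow() {
  # Register the official Apache Airflow chart repository.
  run helm repo add apache-airflow https://airflow.apache.org
  # Install (or upgrade) the release with the KubernetesExecutor enabled.
  run helm upgrade --install airflow apache-airflow/airflow \
    --namespace airflow --create-namespace \
    --set executor=KubernetesExecutor
}
```

Running `DRY_RUN=1; install_airflow` shows the two Helm commands without touching the cluster.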

Phase 2 (Planned)

  • 🔄 Spark Operator - Distributed data processing
  • 🔄 Sample Pipelines - Airflow + Spark integration examples

Phase 3 (Future)

  • 📋 Prometheus + Grafana - Monitoring and observability
  • 📋 Kafka - Streaming data platform
  • 📋 JupyterHub - Interactive notebooks
  • 📋 Data Catalog - Metadata management

🛠️ Project Structure

kldp/
├── core/
│   └── airflow/              # Airflow Helm configs
├── compute/
│   └── spark/                # Spark operator configs
├── storage/
│   ├── minio/                # Object storage
│   └── postgresql/           # Shared database
├── monitoring/
│   └── observability/        # Prometheus, Grafana
├── scripts/
│   ├── init-cluster.sh       # Initialize minikube
│   ├── install-airflow.sh    # Install Airflow
│   ├── install-spark.sh      # Install Spark operator
│   ├── install-all.sh        # Full stack installation
│   └── destroy.sh            # Cleanup everything
├── examples/
│   ├── airflow-basics/       # Basic Airflow DAGs
│   └── spark-pipeline/       # Airflow + Spark examples
└── docs/
    ├── GETTING_STARTED.md
    ├── ARCHITECTURE.md
    └── TROUBLESHOOTING.md

💻 System Requirements

Minimal (Airflow only)

  • 4 CPU cores
  • 8 GB RAM
  • 20 GB disk space

Recommended (Full stack)

  • 6 CPU cores
  • 12 GB RAM
  • 40 GB disk space
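
If you start Minikube yourself rather than through `init-cluster.sh`, the two tiers above translate into `minikube start` flags roughly as follows. This is a sketch under the assumption that the listed figures are what you allocate to the cluster; `minikube_flags` is a hypothetical helper, and the repository's script may choose different values.

```shell
# Hypothetical helper: print minikube start flags for a sizing profile.
# "minimal" and "full" mirror the two tiers listed above
# (--memory is given in MB: 8192 = 8 GB, 12288 = 12 GB).
minikube_flags() {
  case "$1" in
    minimal) echo "--cpus=4 --memory=8192 --disk-size=20g" ;;
    full)    echo "--cpus=6 --memory=12288 --disk-size=40g" ;;
    *)       echo "unknown profile: $1" >&2; return 1 ;;
  esac
}
```

For example: `minikube start $(minikube_flags minimal)`.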

🎓 Use Cases

  • Learning: Hands-on experience with production data tools
  • Development: Test pipelines locally before deploying to production
  • Prototyping: Experiment with data architectures risk-free
  • Teaching: Workshop and training material

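📚 Documentation

  • docs/GETTING_STARTED.md - Getting started
  • docs/ARCHITECTURE.md - Architecture
  • docs/TROUBLESHOOTING.md - Troubleshooting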

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

📄 License

MIT License - See LICENSE for details.

🙏 Acknowledgments

Built on top of Minikube, Helm, and the open-source components listed above.

Note: KLDP is optimized for local development. For production deployments, use managed services or properly configured production clusters.
