feat: Add MinIO S3-compatible object storage #1

Merged
klagrida merged 2 commits into main from feat/add-minio-storage on Dec 14, 2025

Conversation

@klagrida
Contributor

Implement MinIO as the data lake storage layer for KLDP with:

Infrastructure:

  • Helm values configuration (core/storage/minio-values.yaml)
  • Standalone mode deployment optimized for local development
  • NodePort service for easy local access
  • 10Gi persistent volume for data storage
  • Resource limits: 256Mi/512Mi memory, 100m/500m CPU
  • Default buckets: datalake, raw, processed, curated
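For orientation, the settings listed above might map onto Helm values roughly as in the sketch below. The key names (`mode`, `service.type`, `persistence.size`, `resources`, `defaultBuckets`) are assumptions based on the usual layout of the Bitnami chart and are not copied from core/storage/minio-values.yaml. Reading "256Mi/512Mi" and "100m/500m" as request/limit pairs is also an assumption.

```python
import json

# Hypothetical reconstruction of the settings described in this PR; the real
# core/storage/minio-values.yaml may use different keys or structure.
MINIO_VALUES = {
    "mode": "standalone",
    "service": {"type": "NodePort"},
    "persistence": {"enabled": True, "size": "10Gi"},
    "resources": {
        # Assumed interpretation: first number is the request, second the limit.
        "requests": {"memory": "256Mi", "cpu": "100m"},
        "limits": {"memory": "512Mi", "cpu": "500m"},
    },
    "defaultBuckets": "datalake,raw,processed,curated",
}


def render_values(values: dict) -> str:
    """Serialize the values as JSON (JSON documents are also valid YAML)."""
    return json.dumps(values, indent=2)


if __name__ == "__main__":
    print(render_values(MINIO_VALUES))
```

Keeping the values in one structure like this makes it easy to compare them against the file actually committed in the PR.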

Installation:

  • Installation script (scripts/install-minio.sh)
  • Uses Bitnami MinIO Helm chart v17.0.21 via OCI registry
  • Default credentials: minioadmin/minioadmin
  • Automatic bucket provisioning on startup
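The installation step can be pictured as a single `helm upgrade --install` call. The sketch below only builds and prints the command; it is not the contents of scripts/install-minio.sh. The release name `minio` and namespace `storage` are inferred from the in-cluster DNS name mentioned in this PR, and the OCI URL is assumed to be Bitnami's public chart registry.

```python
import shlex

# Assumption: Bitnami's public OCI chart location; the script may use another.
CHART = "oci://registry-1.docker.io/bitnamicharts/minio"


def helm_install_args(release="minio", namespace="storage",
                      version="17.0.21",
                      values_file="core/storage/minio-values.yaml"):
    """Build the argv for an idempotent `helm upgrade --install` call."""
    return [
        "helm", "upgrade", "--install", release, CHART,
        "--version", version,
        "--namespace", namespace, "--create-namespace",
        "--values", values_file,
        "--wait",
    ]


if __name__ == "__main__":
    # Print rather than execute, so the sketch is safe to run anywhere.
    print(shlex.join(helm_install_args()))
```

Using `upgrade --install` rather than `install` means the same command works for both the first install and later re-runs.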

Developer Experience:

  • Makefile targets: install-minio, minio-console, minio-api, logs-minio
  • Enhanced status command to include MinIO pods and release
  • Port forwarding helpers for console (9001) and API (9000)
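The port forwarding helpers presumably wrap `kubectl port-forward`. A minimal sketch of the equivalent commands follows. The service name `minio` and namespace `storage` are assumptions, and depending on the chart version the console may be exposed by a separately named service, so the name is left as a parameter.

```python
# Ports taken from the PR description: console on 9001, S3 API on 9000.
PORTS = {"console": 9001, "api": 9000}


def port_forward_args(target, namespace="storage", service="minio"):
    """Build a `kubectl port-forward` argv for the console or the API.

    `service` and `namespace` are hypothetical defaults; adjust them to
    match the services the chart actually creates.
    """
    port = PORTS[target]  # raises KeyError for unknown targets
    return [
        "kubectl", "port-forward", f"svc/{service}",
        f"{port}:{port}", "--namespace", namespace,
    ]


if __name__ == "__main__":
    for name in PORTS:
        print(" ".join(port_forward_args(name)))
```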

Documentation:

  • Comprehensive setup guide (docs/MINIO_SETUP.md)
  • Data lake architecture patterns (bronze/silver/gold)
  • Integration examples with Airflow, Spark, Pandas
  • Troubleshooting and monitoring guide
  • Updated CLAUDE.md with MinIO installation instructions

Examples:

  • Example DAG (examples/dags/example_minio_s3.py)
  • Demonstrates S3 operations: connect, upload, list, download
  • Uses KubernetesPodOperator with boto3
  • Shows proper configuration for in-cluster connectivity
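The four operations the example DAG demonstrates (connect, upload, list, download) can be sketched with plain boto3 as below. This is not the code from examples/dags/example_minio_s3.py. The environment variable names and the bucket and key in the usage section are hypothetical, and the fallback credentials are the defaults stated in this PR.

```python
import os

# In-cluster endpoint taken from the PR description.
ENDPOINT = "http://minio.storage.svc.cluster.local:9000"


def make_client(endpoint_url=ENDPOINT):
    """Connect: create an S3 client pointed at MinIO instead of AWS."""
    import boto3  # imported lazily so round_trip() is usable without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        region_name="us-east-1",
        # Hypothetical variable names; fall back to the PR's default credentials.
        aws_access_key_id=os.environ.get("MINIO_ACCESS_KEY", "minioadmin"),
        aws_secret_access_key=os.environ.get("MINIO_SECRET_KEY", "minioadmin"),
    )


def round_trip(client, bucket, key, payload: bytes):
    """Upload, list, and download one object; return (listed keys, body)."""
    client.put_object(Bucket=bucket, Key=key, Body=payload)
    listing = client.list_objects_v2(Bucket=bucket, Prefix=key)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return keys, body


if __name__ == "__main__":
    # Requires boto3 and network access to the MinIO service.
    print(round_trip(make_client(), "raw", "demo/hello.txt", b"hello"))
```

Passing the client into `round_trip` keeps the S3 logic independent of how the connection is created, which also makes it straightforward to run inside a KubernetesPodOperator task.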

Integration:

  • S3-compatible API accessible to all components
  • Uses Kubernetes DNS: minio.storage.svc.cluster.local:9000
  • Ready for Spark, Airflow, and other data processing tools
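To make the integration points concrete, the sketch below builds the connection settings that Pandas (through s3fs) and Spark (through the Hadoop S3A connector) would typically need for this endpoint. The helper names are hypothetical; the option keys are the standard s3fs and S3A ones rather than anything defined in this PR.

```python
# In-cluster endpoint taken from the PR description.
ENDPOINT = "http://minio.storage.svc.cluster.local:9000"


def pandas_storage_options(access_key, secret_key, endpoint=ENDPOINT):
    """Options for pandas readers/writers that go through s3fs."""
    return {
        "key": access_key,
        "secret": secret_key,
        "client_kwargs": {"endpoint_url": endpoint},
    }


def spark_s3a_conf(access_key, secret_key, endpoint=ENDPOINT):
    """Spark configuration entries for the Hadoop S3A connector."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is normally addressed path-style (endpoint/bucket/key).
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled":
            str(endpoint.startswith("https://")).lower(),
    }


if __name__ == "__main__":
    print(pandas_storage_options("minioadmin", "minioadmin"))
    print(spark_s3a_conf("minioadmin", "minioadmin"))
```

With pandas, the first dictionary would be passed as `storage_options=` to a call such as `pd.read_parquet("s3://raw/...", storage_options=...)`; with Spark, each entry of the second dictionary would be set through `SparkSession.builder.config(key, value)`.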

MinIO provides the foundational storage layer for:

  • Raw data ingestion and landing zones
  • Processed data transformation pipelines
  • Curated analytics-ready datasets
  • Multi-stage data lake architecture
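One way to picture the multi-stage layout is a small helper that maps each stage to one of the default buckets and builds object URIs. The bronze/silver/gold to raw/processed/curated mapping and the `dataset/dt=YYYY-MM-DD/` key layout are assumptions made for illustration; the PR does not specify either.

```python
from datetime import date

# Assumed mapping between medallion layers and the PR's default buckets.
LAYER_BUCKETS = {"bronze": "raw", "silver": "processed", "gold": "curated"}


def object_uri(layer: str, dataset: str, day: date, filename: str) -> str:
    """Build an s3:// URI for a dataset partition in the given layer."""
    bucket = LAYER_BUCKETS[layer]  # raises KeyError for unknown layers
    return f"s3://{bucket}/{dataset}/dt={day.isoformat()}/{filename}"


if __name__ == "__main__":
    # Hypothetical dataset and file name.
    print(object_uri("bronze", "orders", date(2025, 12, 14), "part-0000.parquet"))
```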

🤖 Generated with Claude Code

khalil and others added 2 commits December 14, 2025 12:10
Implement MinIO as the data lake storage layer for KLDP (the remainder of this commit message is identical to the pull request description).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Replace deprecated schedule_interval with schedule parameter
- Fixes CI validation error: "unexpected keyword argument 'schedule_interval'"
- Compatible with Airflow 3.1.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
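
The second commit's fix can be illustrated without importing Airflow: Airflow 3 no longer accepts the `schedule_interval` keyword on `DAG`, while `schedule` (available since Airflow 2.4) works. The sketch below renames the key in a dictionary of DAG keyword arguments. The helper name is hypothetical, and the `dag_id` is only guessed from the example file name.

```python
def migrate_dag_kwargs(kwargs: dict) -> dict:
    """Return a copy of DAG kwargs with `schedule_interval` renamed to `schedule`.

    If both keys are present, the existing `schedule` value wins and the
    deprecated key is dropped.
    """
    migrated = dict(kwargs)
    if "schedule_interval" in migrated:
        value = migrated.pop("schedule_interval")
        migrated.setdefault("schedule", value)
    return migrated


if __name__ == "__main__":
    old = {"dag_id": "example_minio_s3", "schedule_interval": "@daily"}
    print(migrate_dag_kwargs(old))
    # The result could then be passed as DAG(**migrate_dag_kwargs(old)).
```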
@klagrida klagrida merged commit 1dfddb2 into main Dec 14, 2025
3 checks passed
@klagrida klagrida deleted the feat/add-minio-storage branch December 14, 2025 11:43