
feat: Add Spark Operator for running Spark jobs on Kubernetes#2

Merged
klagrida merged 1 commit into main from feat/add-spark-operator
Dec 14, 2025

Conversation

@klagrida
Contributor

Implement Kubeflow Spark Operator for distributed data processing:

Infrastructure:

  • Helm values configuration (core/compute/spark-operator-values.yaml)
  • Kubeflow Spark Operator with webhook enabled
  • Resource limits: 128Mi/256Mi memory, 100m/200m CPU
  • RBAC and service account configuration
  • Prometheus metrics integration
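
The values file described above can be sketched roughly as follows. This is an illustrative sketch, not the contents of the actual file; key names follow common Kubeflow Spark Operator chart conventions and should be checked against the chart version in use:

```yaml
# Sketch of core/compute/spark-operator-values.yaml (illustrative, structure assumed)
webhook:
  enable: true          # mutating webhook for driver/executor pod customization
metrics:
  enable: true          # expose Prometheus metrics
controller:
  resources:
    requests:
      memory: 128Mi     # request/limit figures taken from this PR's description
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 200m
serviceAccounts:
  sparkoperator:
    create: true        # RBAC / service account managed by the chart
```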

Installation:

  • Installation script (scripts/install-spark.sh)
  • Uses Kubeflow Spark Operator Helm chart
  • Automatic namespace and RBAC setup
  • Webhook for pod customization

Developer Experience:

  • Makefile targets: install-spark, logs-spark, spark-apps
  • Enhanced status command that reports the Spark operator and running applications
  • Easy Spark job submission and monitoring

Examples:

  • spark-pi.yaml: Basic Spark Pi calculation
  • spark-minio-example.yaml: Spark with MinIO S3 integration
  • example_spark_job.py: Airflow DAG for Spark job orchestration
  • Comprehensive README with usage examples
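
A minimal spark-pi.yaml along the lines listed above might look like this (a sketch, not the file from the PR; the image tag, namespace, and service account name are assumptions):

```yaml
# Illustrative SparkApplication manifest for the Spark Pi example
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default                      # assumed namespace
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.0                      # assumed image tag
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark                 # assumed service account name
  executor:
    instances: 1
    cores: 1
    memory: 512m
```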

Integration:

  • SparkApplication CRD for declarative job submission
  • S3A configuration for MinIO integration
  • Airflow orchestration support via KubernetesPodOperator
  • Driver and executor pod templates
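
The S3A-for-MinIO wiring typically reduces to a handful of sparkConf entries on the SparkApplication spec. A hedged sketch follows; the endpoint, bucket, and paths are placeholders, not values taken from this PR:

```yaml
# Illustrative S3A configuration fragment for a SparkApplication spec
spec:
  sparkConf:
    spark.hadoop.fs.s3a.endpoint: http://minio.minio.svc.cluster.local:9000  # placeholder endpoint
    spark.hadoop.fs.s3a.path.style.access: "true"     # MinIO requires path-style addressing
    spark.hadoop.fs.s3a.connection.ssl.enabled: "false"
    spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
  mainApplicationFile: s3a://my-bucket/jobs/etl.py    # placeholder bucket/path
```

Credentials are usually injected via a Kubernetes secret referenced from the driver/executor pod specs rather than hard-coded in sparkConf.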

Features:

  • Declarative Spark application management
  • Automatic driver/executor pod creation
  • Resource quota and limits
  • Job monitoring and lifecycle management
  • S3-compatible storage (MinIO) access

Spark Operator enables:

  • Batch data processing at scale
  • ETL/ELT pipelines
  • Machine learning workloads
  • Real-time stream processing (with Structured Streaming)
  • SQL analytics on distributed data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@klagrida klagrida merged commit d6ef85a into main Dec 14, 2025
3 checks passed
@klagrida klagrida deleted the feat/add-spark-operator branch December 14, 2025 12:08
