Distributed Stream Processing Platform

A production-grade stream processing system inspired by Apache Flink and written in Python, providing exactly-once semantics, fault tolerance, and high-throughput data processing.

Teammates

Uditanshu Tomar (uditanshu.tomar@colorado.edu), Ishneet Chadha (ishneet.chadha@colorado.edu)

How to Run

Prerequisites

- Docker & Docker Compose (for local deployment)
- Python 3.9+ (for development)
- Google Cloud SDK (only for GCP deployment)
- kubectl (only for GCP deployment)

Option 1: Quick Start with Docker Compose (Recommended)

The easiest way to run the entire platform locally.

Navigate to deployment directory:
```
cd deployment
```
Start all services:
```
docker-compose up -d
```

Wait for services to be ready (~30 seconds):

```
# Check if services are running
docker-compose ps
```

Access the Web Dashboard: Open http://localhost:5000 in your browser.

Verify Cluster Health:

```
curl http://localhost:8081/cluster/metrics
```
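
The "wait about 30 seconds, then check" step can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `http_ok` and `wait_until_ready` are hypothetical, and it assumes that an HTTP 200 from the metrics endpoint means the cluster is up.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed
        # response: all are treated as "not ready yet".
        return False


def wait_until_ready(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe(url)` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_ready("http://localhost:8081/cluster/metrics")` after `docker-compose up -d` returns `True` as soon as the JobManager answers, or `False` after roughly 30 seconds. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running cluster.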

Stop the cluster:
```
docker-compose down
```

Option 2: Development Setup (Run Components Individually)

For development and debugging, you can run components individually.

Setup development environment:
```
./scripts/setup_dev.sh
```
Activate virtual environment:
```
source venv/bin/activate
```

Start dependencies (PostgreSQL, Kafka, Zookeeper):

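```
cd deployment
docker-compose up -d postgres zookeeper kafka
# Return to the repository root, where the jobmanager/ and taskmanager/ packages live
cd ..
```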

Start JobManager:

```
python -m jobmanager.api
# JobManager API will be available at http://localhost:8081
```

Start TaskManager (in a separate terminal):

```
source venv/bin/activate
python -m taskmanager.task_executor
```

Start Web GUI (in another terminal):

```
source venv/bin/activate
cd gui
python app.py
# GUI will be available at http://localhost:5000
```

Option 3: Run on Google Cloud Platform (GKE)

Deploy the platform to a Google Kubernetes Engine cluster.

Configure GCP Project:

```
export GCP_PROJECT_ID="your-project-id"
gcloud config set project $GCP_PROJECT_ID
```

Run the deployment script, which sets up GKE, builds the images, and deploys all services:
```
./deploy_to_gcp.sh
```

Access Services:

```
# Get External IP of the GUI
kubectl get svc -n stream-processing gui

# Access JobManager API
kubectl get svc -n stream-processing jobmanager
```

Running Jobs

1. Run the Demo (Web GUI)

- Start the platform using Docker Compose (see Option 1 above).
- Open the Dashboard at http://localhost:5000.
- Click "Start Demo" in the "Control Panel".
- Watch real-time metrics update as the DemoWeatherProcessing job runs.
- See data flowing in the "Live Data Stream" panel.

2. Submit a Custom Job (CLI)

Example: Word Count

```
# 1. Generate the job file
python examples/word_count.py
# This creates word_count_job.pkl

# 2. Submit to the cluster
curl -X POST http://localhost:8081/jobs/submit \
  -F "job_file=@word_count_job.pkl"

# 3. Note the job_id from the response
```

Monitor the Job:

```
# Replace {job_id} with the ID returned when the job was submitted

# Check job status
curl http://localhost:8081/jobs/{job_id}/status

# Get job metrics
curl http://localhost:8081/jobs/{job_id}/metrics

# List all jobs
curl http://localhost:8081/jobs
```
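
The status endpoint above can be polled from a script until the job stops. This is a rough sketch under stated assumptions: the README does not document the response schema, so the `"status"` JSON key and the terminal state names (`FINISHED`, `FAILED`, `CANCELLED`) are guesses to adjust against the real API, and `job_url`, `fetch_json`, and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Sequence

BASE_URL = "http://localhost:8081"  # JobManager API address used in this README


def job_url(job_id: str, resource: str, base_url: str = BASE_URL) -> str:
    """Build {base_url}/jobs/{job_id}/{resource}; resource is 'status' or 'metrics'."""
    quoted_id = urllib.parse.quote(job_id, safe="")
    return f"{base_url}/jobs/{quoted_id}/{resource}"


def fetch_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    # ASSUMPTION: these state names are illustrative, not taken from the API.
    terminal_states: Sequence[str] = ("FINISHED", "FAILED", "CANCELLED"),
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the status endpoint and return the last state that was observed."""
    state = "UNKNOWN"
    for poll in range(max_polls):
        payload = fetch(job_url(job_id, "status"))
        # ASSUMPTION: the state is reported under a top-level "status" key.
        state = str(payload.get("status", "UNKNOWN")).upper()
        if state in terminal_states:
            return state
        if poll < max_polls - 1:
            sleep(interval)
    return state
```

With a running cluster, `wait_for_job("<your job id>")` blocks until the job reaches one of the assumed terminal states or the poll budget runs out.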

3. Run Example Jobs

The examples/ directory contains several example jobs:

```
# Word Count - Simple text processing
python examples/word_count.py

# Simple Pipeline - Map and filter operations
python examples/simple_pipeline.py

# Windowed Aggregation - Time-based aggregations
python examples/windowed_aggregation.py

# Stateful Deduplication - Remove duplicate records
python examples/stateful_deduplication.py

# Stream Join - Join two data streams
python examples/stream_join.py

# Data Generators - Generate test data
python examples/data_generator_iot.py
python examples/data_generator_ecommerce.py
python examples/data_generator_financial.py
```

Each example generates a .pkl file that can be submitted to the cluster.

Architecture

- JobManager (Master): Coordinates execution, manages resources, and handles checkpoints.
- TaskManager (Worker): Executes tasks in parallel slots.
- Kafka: Handles data ingestion and inter-operator communication.
- gRPC: Used for internal control plane communication.
- RocksDB: Embedded state backend for stateful operations.
- GCS/S3: Distributed storage for fault-tolerance checkpoints.

Features

- Exactly-Once Processing: Distributed snapshots (Chandy-Lamport).
- Fault Tolerance: Automatic failure recovery.
- High Throughput: Operator chaining & flow control.
- Stateful Operations: Windowing, Aggregations, Joins.
- Observability: Prometheus metrics & Grafana dashboards.
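
To make the distributed-snapshot feature more concrete, here is a small sketch of barrier alignment, the form of Chandy-Lamport snapshotting used by Flink-style engines. It illustrates the technique only and is not this project's implementation: the `Barrier` and `AligningCounter` names are hypothetical, and the operator is a toy word counter with in-memory state.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Barrier:
    """Marker injected into every input stream to delimit a checkpoint."""
    checkpoint_id: int


Item = Union[str, Barrier]


class AligningCounter:
    """Counts records from several input channels and snapshots on aligned barriers."""

    def __init__(self, num_channels: int) -> None:
        self.num_channels = num_channels
        self.counts: Dict[str, int] = defaultdict(int)
        self.blocked: Set[int] = set()              # channels whose barrier has arrived
        self.buffered: List[Tuple[int, Item]] = []  # input held back during alignment
        self.snapshots: Dict[int, Dict[str, int]] = {}

    def receive(self, channel: int, item: Item) -> None:
        if channel in self.blocked:
            # Anything after the barrier belongs to the next checkpoint,
            # so it must not touch the state until the snapshot is taken.
            self.buffered.append((channel, item))
        elif isinstance(item, Barrier):
            self._on_barrier(channel, item)
        else:
            self.counts[item] += 1

    def _on_barrier(self, channel: int, barrier: Barrier) -> None:
        self.blocked.add(channel)
        if len(self.blocked) < self.num_channels:
            return  # still waiting for the barrier on the other channels
        # Every channel delivered its barrier: the state now reflects exactly
        # the records that preceded the barrier on each input.
        self.snapshots[barrier.checkpoint_id] = dict(self.counts)
        pending, self.buffered = self.buffered, []
        self.blocked.clear()
        for buffered_channel, buffered_item in pending:
            self.receive(buffered_channel, buffered_item)


op = AligningCounter(num_channels=2)
op.receive(0, "a")
op.receive(1, "b")
op.receive(0, Barrier(1))   # channel 0 is now blocked
op.receive(0, "a")          # held back: arrived after the barrier
op.receive(1, "a")          # still counted: channel 1 has not seen the barrier
op.receive(1, Barrier(1))   # alignment completes, snapshot 1 is taken
print(op.snapshots[1])      # {'a': 2, 'b': 1}
print(dict(op.counts))      # {'a': 3, 'b': 1}
```

The snapshot leaves out the record that arrived on channel 0 after its barrier. That is what lets a recovery restore the snapshot and replay the sources from the barrier position without counting any record twice.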

Project Structure

```
stream-processing-platform/
├── jobmanager/              # Control Plane (Scheduler, API)
├── taskmanager/             # Data Plane (Execution, State)
├── common/                  # Shared Utils (Proto, Config)
├── gui/                     # Web Dashboard
├── examples/                # Example Jobs
├── deployment/              # Docker & K8s Configs
└── scripts/                 # Deployment Scripts
```

Configuration

Key environment variables in deployment/docker-compose.yml:

- TASK_SLOTS: Number of concurrent tasks per TaskManager (Default: 4).
- CHECKPOINT_INTERVAL: Interval between checkpoints in milliseconds (Default: 10000).
- STATE_BACKEND: State backend to use, either rocksdb or memory.
- GCS_CHECKPOINT_PATH: GCS bucket path where checkpoints are stored.
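
For illustration, a component could read these variables roughly as follows. This is a sketch and not the project's actual configuration code: the `PlatformConfig` and `load_config` names are hypothetical, and the `rocksdb` fallback for `STATE_BACKEND` is an assumption, since no default is listed for it above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_BACKENDS = {"rocksdb", "memory"}


@dataclass(frozen=True)
class PlatformConfig:
    task_slots: int
    checkpoint_interval_ms: int
    state_backend: str
    gcs_checkpoint_path: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> PlatformConfig:
    """Read the documented variables, applying the documented defaults."""
    task_slots = int(env.get("TASK_SLOTS", "4"))
    interval_ms = int(env.get("CHECKPOINT_INTERVAL", "10000"))
    # ASSUMPTION: "rocksdb" as the fallback backend is a guess, not documented.
    backend = env.get("STATE_BACKEND", "rocksdb").lower()

    if task_slots < 1:
        raise ValueError("TASK_SLOTS must be at least 1")
    if interval_ms < 1:
        raise ValueError("CHECKPOINT_INTERVAL must be a positive number of milliseconds")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"STATE_BACKEND must be one of {sorted(VALID_BACKENDS)}")

    return PlatformConfig(
        task_slots=task_slots,
        checkpoint_interval_ms=interval_ms,
        state_backend=backend,
        gcs_checkpoint_path=env.get("GCS_CHECKPOINT_PATH") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a combination of settings before putting it into `deployment/docker-compose.yml`.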

Monitoring

When running with Docker Compose, monitoring services are automatically available:

Grafana Dashboard: http://localhost:3000
- Username: admin
- Password: admin
Prometheus Metrics: http://localhost:9090

Troubleshooting

Services not starting

```
# Check service logs
docker-compose logs jobmanager
docker-compose logs taskmanager
docker-compose logs kafka

# Check if ports are already in use
netstat -an | grep -E "5000|8081|9092|5432"
```
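
Where `netstat` is not available, the same port check can be approximated in Python. This is a minimal sketch: the helper names are hypothetical, and the port-to-service mapping is inferred from this README (5432 is the PostgreSQL default port, not something stated here).

```python
import socket
from typing import Dict

# Ports from the netstat command above, with the services inferred for them.
PLATFORM_PORTS: Dict[int, str] = {
    5000: "Web GUI",
    8081: "JobManager API",
    9092: "Kafka",
    5432: "PostgreSQL",
}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def occupied_ports(ports: Dict[int, str] = PLATFORM_PORTS) -> Dict[int, str]:
    """Return the subset of `ports` that already has a listener."""
    return {port: name for port, name in ports.items() if port_in_use(port)}
```

Run `occupied_ports()` before `docker-compose up -d`: any entry it returns is a port that another process holds and that will probably stop the corresponding service from starting.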

Job submission fails

```
# Verify JobManager is running
curl http://localhost:8081/health

# Check if Kafka is accessible
docker-compose exec kafka kafka-topics --list --bootstrap-server localhost:9092
```

Development mode issues

```
# Regenerate gRPC stubs
bash scripts/generate_proto.sh

# Reinstall dependencies
pip install -r jobmanager/requirements.txt
pip install -r taskmanager/requirements.txt
```

Built with: Python, FastAPI, gRPC, Kafka, RocksDB, Docker, Kubernetes.
