A distributed real-time data streaming and processing system for analyzing taxi location data using Apache Kafka, Apache Storm, and Redis, with a modern web dashboard for visualization.
The system consists of four main components:
- Apache Kafka: Message broker for streaming taxi location data
- Apache Storm: Distributed stream processing engine for real-time analytics
- Redis: In-memory data store for caching taxi state and alerts
- Web Dashboard: Real-time visualization of taxi positions and alerts
- Docker and Docker Compose
- A data/ folder in the project root containing data files with taxi location data
  - Supported file extensions: .txt or .csv
  - CSV format: taxi_id, timestamp, longitude, latitude (comma-separated)
  - Supported timestamp formats:
    - YYYY-MM-DD HH:MM:SS (e.g., 2024-01-15 10:30:45)
    - YYYY/MM/DD HH:MM:SS (e.g., 2024/01/15 10:30:45)
    - Unix epoch in seconds or milliseconds
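The three supported timestamp formats can be normalized to a single representation before comparison. The following is a minimal sketch, not the feeder's actual code: the function name `parse_timestamp`, the choice to treat date strings as UTC, and the 1e11 threshold used to tell milliseconds from seconds are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# The two date-string formats listed above.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S")


def parse_timestamp(raw: str) -> float:
    """Return seconds since the Unix epoch for a supported timestamp string."""
    text = raw.strip()
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Assumption: timestamps without a zone are interpreted as UTC.
        return parsed.replace(tzinfo=timezone.utc).timestamp()
    # Otherwise expect a numeric epoch; float() raises ValueError if it is not.
    value = float(text)
    # Assumption: values of 1e11 or more are milliseconds, smaller ones seconds.
    return value / 1000.0 if value >= 1e11 else value
```

With these assumptions, all four spellings of the same instant (both date formats, epoch seconds, epoch milliseconds) map to the same float.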
Ensure you have a data/ folder in the project root with your CSV files (.txt or .csv extensions):
mkdir -p data
# Add your CSV files here
A real-world taxi trajectory dataset (T-Drive) is available for download from Microsoft Research.
After downloading, extract the files and place the trajectory files into the local data/ directory so the feeder can ingest them.
To start all services, run:
docker compose up --build -d
This will:
- Start the Kafka broker and topic initialization
- Start the feeder service (reads data from the data/ folder into Kafka)
- Start the Storm topology for stream processing
- Start Redis for state storage
- Start the Web API and Dashboard
- Start Kafka UI for monitoring
The feeder will run as a one-shot service and exit once all data is processed.
Optional: Clean up before starting (removes orphaned containers):
docker compose down --remove-orphans
Optional: Monitor feeder progress in real time:
docker compose logs -f feeder
Open your browser and navigate to:
http://localhost:800
Open the Kafka UI in your browser:
http://localhost:8085/
Query Redis directly for current taxi state and tracking information:
# View taxi state (current position, speed, etc.)
docker exec -it redis redis-cli hgetall taxi:100:state
# View taxi track history (last 5 updates)
docker exec -it redis redis-cli lrange taxi:100:track 0 4
Replace 100 with the desired taxi ID.
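The same two lookups can be issued from a script. This is a sketch under stated assumptions: the helper name `read_taxi` is hypothetical, and `client` is expected to be any object exposing `hgetall` and `lrange` with the semantics of the Redis commands of the same name (for example a redis-py client). Only the key patterns `taxi:<id>:state` and `taxi:<id>:track` come from the commands above.

```python
from typing import Any, Dict


def read_taxi(client: Any, taxi_id: int, count: int = 5) -> Dict[str, Any]:
    """Fetch the state hash and the first `count` track entries for one taxi."""
    state = client.hgetall(f"taxi:{taxi_id}:state")
    # LRANGE bounds are inclusive, so 0..count-1 returns `count` entries.
    track = client.lrange(f"taxi:{taxi_id}:track", 0, count - 1)
    return {"state": state, "track": track}
```

With the redis-py package installed, a client could be created with `redis.Redis(host="localhost", port=6379, decode_responses=True)` and passed in as `read_taxi(client, 100)`. Whether the Redis port is published to the host depends on compose.yaml, so the host and port here are assumptions.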
View all service logs together:
docker compose logs -f
Or monitor specific services:
Monitor the feeder progress:
docker compose logs -f feeder
Monitor Storm nimbus (main processing):
docker compose logs -f storm-nimbus
Monitor Storm supervisor:
docker compose logs -f storm-supervisor
View API logs:
docker compose logs -f api
Stop all services:
docker compose down
To remove all data and start fresh:
docker compose down --remove-orphans -v
- producer/: Kafka producer that ingests CSV data from the data/ folder
- storm/: Apache Storm topology for stream processing and analytics
- webGui/api/: FastAPI backend for serving taxi and alert data via REST/WebSocket
- webGui/web/: Frontend dashboard for real-time visualization
- data/: Input CSV files (not included in repo)
- compose.yaml: Docker Compose configuration for all services
Edit compose.yaml to adjust:
- File glob pattern: Change --glob in the feeder service to match different file types (default is /data/*.txt; use /data/*.{txt,csv} or /data/* for all files)
- Processing speed (pace): Modify the --pace parameter in the feeder service:
  - 0 = Process data as fast as possible (no delay between records)
  - 1 = Process at recorded speed (real-time playback, sleeps based on actual timestamp deltas)
  - 0.001 = 1000x faster than recorded speed (sleep = 0.1% of time delta between records)
  - 0.5 = 2x faster (sleep = 50% of time delta)
  - 2 = 0.5x speed, twice as slow (sleep = 200% of time delta)

  Formula: sleep_time = (next_timestamp - prev_timestamp) × pace
- Number of taxi records: Set --max-files to limit the number of data files processed (0 = all files)
- Kafka partitions: Change --partitions in the topic-init service
- Batch settings: Adjust --linger-ms and --batch-bytes for performance tuning
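The pace formula above can be illustrated with a short replay loop. This is a sketch rather than the feeder's implementation: the names `sleep_time` and `replay`, the injectable `sleep` argument, and the clamping of negative deltas to zero are assumptions added for illustration. Timestamps are taken to be in seconds.

```python
import time
from typing import Callable, Iterable, Iterator


def sleep_time(prev_timestamp: float, next_timestamp: float, pace: float) -> float:
    """Apply sleep_time = (next_timestamp - prev_timestamp) * pace."""
    delta = next_timestamp - prev_timestamp
    # Assumption: out-of-order records (negative delta) cause no wait.
    return max(0.0, delta * pace)


def replay(
    timestamps: Iterable[float],
    pace: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[float]:
    """Yield each timestamp after waiting the scaled gap since the previous one."""
    prev = None
    for ts in timestamps:
        if prev is not None:
            delay = sleep_time(prev, ts, pace)
            if delay > 0:
                sleep(delay)
        prev = ts
        yield ts
```

For two records 5 seconds apart, a pace of 0.5 gives a 2.5 second wait, a pace of 2 gives 10 seconds, and a pace of 0 gives no wait at all.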
CSV files should follow this format (tab or comma-separated):
taxi_id,timestamp,longitude,latitude
100,2024-01-15 10:30:45,13.404954,52.520008
100,2024-01-15 10:30:50,13.405100,52.520100
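To show how rows in this layout map to typed fields, here is a minimal parsing sketch. The `TaxiRecord` class and `parse_records` function are hypothetical names, and skipping rows whose coordinates are not numeric (such as a header line) is an assumption; the actual producer may handle headers and malformed rows differently.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class TaxiRecord:
    taxi_id: str
    timestamp: str  # kept as text; parsing the timestamp is a separate step
    longitude: float
    latitude: float


def parse_records(text: str, delimiter: str = ",") -> List[TaxiRecord]:
    """Parse rows of taxi_id, timestamp, longitude, latitude into records."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) != 4:
            continue  # blank or malformed line
        taxi_id, timestamp, lon, lat = (field.strip() for field in row)
        try:
            records.append(TaxiRecord(taxi_id, timestamp, float(lon), float(lat)))
        except ValueError:
            continue  # header line or non-numeric coordinates
    return records
```

Passing `delimiter="\t"` covers tab-separated files.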
See CONTRIBUTIONS.md for authorship and contributions.