Skip to content

Real-time taxi movement tracking system using Kafka and Spark

License

Notifications You must be signed in to change notification settings

vijethph/Navflux

Repository files navigation

Status GitHub issues Contributors GitHub forks GitHub stars GitHub license

Test and Coverage Quality Gate Status Bugs Code Smells Reliability Rating Security Rating Maintainability Rating Known Vulnerabilities

Navflux Logo

Navflux

Real-time taxi tracking system with distributed stream processing

Report Bug · Request Feature


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Deployment
  5. Project Structure
  6. Contributing
  7. License
  8. Contact
  9. Acknowledgments

About The Project

Navflux Architecture

Navflux is a real-time taxi tracking system that uses big data technologies. The system ingests synthetic GPS events, processes them via stream processing, stores historical data, and serves real-time locations through REST and WebSocket APIs.

Architecture: Kafka (streaming) → Spark Streaming (processing) → HBase (storage) + Redis (cache) → FastAPI (web UI)

Key Features

  • Real-time Event Streaming: Fault-tolerant event delivery via Kafka (KRaft mode)
  • Stream Processing: Spark Structured Streaming for real-time location updates and trip aggregations
  • Dual Storage: HBase for historical analytics, Redis for hot data with 5-minute TTL
  • Async Web API: FastAPI with REST endpoints and WebSocket support for live updates
  • Data Generation: Synthetic taxi GPS data with realistic movement patterns using Faker and geopy
  • Container-Ready: Docker Compose for local development, Kubernetes manifests for production
  • Quality Assurance: Comprehensive test coverage with pytest, Ruff linting, Snyk security scanning

Built With

Python Apache Kafka Apache Spark HBase Redis FastAPI

(back to top)


Getting Started

Follow these steps to set up Navflux on your local machine.

Prerequisites

  • Python: 3.11+ (managed via uv)
  • Docker: 20.10+ and Docker Compose 2.0+
  • Kubernetes: 1.28+ (optional, for K8s deployment)
  • System Resources: 8GB RAM minimum, 16GB recommended

Installation

  1. Clone the repository

    git clone https://github.com/vijethph/Navflux.git
    cd Navflux
  2. Install uv package manager (if not installed)

    curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Install Python dependencies

    uv sync
  4. Create environment configuration

    cp .env.example .env
    # Edit .env with your configuration
  5. Start infrastructure with Docker Compose

    docker compose up -d
  6. Verify services are healthy

    docker compose ps
    # All services should show status: Up (healthy)

(back to top)


Usage

Running the Application

Start all services:

docker compose up -d

Generate synthetic taxi data:

uv run python -m src.data_generator.generator

Publish events to Kafka:

uv run python -m src.kafka_producer.producer

Start Spark Streaming job:

spark-submit \
  --master local[*] \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1 \
  src/spark_streaming/streaming_job.py

Launch FastAPI server:

uv run uvicorn src.web_ui.app:app --host 0.0.0.0 --port 8000

Access Web UI:

http://localhost:8000

API Endpoints

Health Check:

curl http://localhost:8000/health

Get All Taxis:

curl http://localhost:8000/api/taxis

Get Taxi Location:

curl http://localhost:8000/api/taxis/{taxi_id}

Get Trip History:

curl http://localhost:8000/api/trips?start_date=2024-01-01&end_date=2024-01-31

WebSocket Live Updates:

const ws = new WebSocket("ws://localhost:8000/ws/taxis");
ws.onmessage = (event) => console.log(JSON.parse(event.data));

Screenshots

Navflux Web UI

(back to top)


Deployment

Docker Compose

Production deployment:

docker compose -f docker-compose.yml up -d

View logs:

docker compose logs -f spark-streaming

Scale services:

docker compose up -d --scale spark-worker=3

Kubernetes

Apply manifests:

kubectl apply -f kubernetes/

Check deployment status:

kubectl get pods -n navflux

Access FastAPI service:

kubectl port-forward svc/navflux-web-ui 8000:8000 -n navflux

Management scripts:

# Deploy all services
./scripts/k8s-manager.sh deploy-all

# Access web UI
./scripts/k8s-manager.sh access-ui

# View Spark logs
./scripts/k8s-manager.sh logs-spark

(back to top)


Project Structure

Navflux/
├── src/
│   ├── data_generator/       # Synthetic GPS data generation
│   ├── kafka_producer/        # Event streaming to Kafka
│   ├── spark_streaming/       # Real-time processing jobs
│   ├── hbase_connector/       # Historical storage client
│   ├── redis_cache/           # Hot cache management
│   └── web_ui/                # FastAPI web gateway
├── tests/                     # pytest test suite
├── config/                    # Configuration management
├── docker/                    # Dockerfiles (multi-stage builds)
├── kubernetes/                # K8s manifests
├── scripts/                   # Deployment and management scripts
├── data/                      # Data directories (checkpoints, raw, processed)
├── logs/                      # Application logs
├── docker-compose.yml         # Docker Compose configuration
├── pyproject.toml             # Python dependencies (uv)
└── README.md                  # This file

(back to top)


Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the Project
  2. Create your Feature Branch
    git checkout -b feature/AmazingFeature
  3. Make your changes following code standards:
    • PEP 8 compliance
    • Type hints for all functions
    • reStructuredText docstrings
  4. Run tests and quality checks:
    uv run ruff check src/ tests/
    uv run pytest --cov
    snyk code test
  5. Commit your Changes
    git commit -m 'feat: Add AmazingFeature'
  6. Push to the Branch
    git push origin feature/AmazingFeature
  7. Open a Pull Request

(back to top)


License

Distributed under the Apache 2.0 License. See LICENSE for more information.

(back to top)


Contact

Vijeth P H - @vijethph

Project Link: https://github.com/vijethph/Navflux

(back to top)


Acknowledgments

(back to top)


About

Real-time taxi movement tracking system using Kafka and Spark

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published