RaghavGohil/kitagawa

Kitagawa Inference Pipeline

Description

Kitagawa is a Python-based maritime vessel inference pipeline for Sentinel imagery.
It runs vessel detection on:

  • Sentinel-1 SAR scenes
  • Sentinel-2 EO scenes

Then it post-processes detections, enriches them with timestamps, correlates detections with AIS tracks, and exports submission and visualization artifacts.

The project is orchestrated by a Prefect flow (inference_pipeline.py) and is designed to run inside Docker with GPU support.

Features

  • Multi-sensor inference:
    • SAR detection from Sentinel-1 (inference/imagery/SAR/inference.py)
    • EO detection from Sentinel-2 (inference/imagery/EO/inference.py)
  • Detection post-processing:
    • NMS filtering (inference/filter/nms_filter.py)
    • Landmask filtering against GSHHG shoreline polygons (inference/filter/landmask_filter.py)
    • EO cloud-based filtering using ConvNeXt (inference/filter/cloudmask_filter.py)
  • Detection timestamp generation:
    • SAR timestamps from calibration vectors (generate_detection_datetime_SAR.py)
    • EO timestamps from granule sensing metadata (generate_detection_datetime_EO.py)
  • Output generation:
    • COCO-like detections.json
    • Per-scene shapefiles from detections
    • AIS interpolation output (ais_interpolated_data.csv)
    • AIS correlation output (AIS_correlation.csv)
  • Visualization outputs (HTML/PNG):
    • EO and SAR detection visualization images
    • Detection bbox map
    • AIS interpolation/correlation/track maps
  • Data preparation utilities:
    • Input CSV extraction and image list generation
    • Sentinel image download
    • NOAA AIS download/extraction
    • Landmask download/extraction

Tech Stack

  • Python
  • Prefect (pipeline orchestration)
  • PyTorch + Ultralytics YOLO (inference)
  • Rasterio / GDAL / GeoPandas / Shapely / PyProj (geospatial processing)
  • Pandas / NumPy / SciPy (data processing/interpolation)
  • Folium / Matplotlib / OpenCV (visualization)
  • Docker + Docker Compose (containerized execution)
  • GitHub Actions (ruff lint + Docker build/push workflow)

Installation

Option 1: Docker Compose (recommended)

  1. Ensure Docker and the NVIDIA container runtime are available.
  2. From repository root:
docker compose up --build

This builds the image and starts the kitagawa-inference service defined in docker-compose.yml.

Option 2: Docker run manually

Build image:

docker build -t kitagawa-inference:latest -f dockerfile .

Run container (from docker_run.sh):

docker run --gpus all -it \
  -v $(pwd)/mnt:/app/mnt \
  -p 4200:4200 \
  kitagawa-inference:latest bash

Usage

1) Prepare inference data (outside the container flow, if needed)

From scripts/:

python prepare_inference_data_pipeline.py

This runs:

  1. extract_data_from_input.py
  2. download_AIS_data.py
  3. download_image_data.py
  4. download_landmask.py

2) Run full inference pipeline

Main orchestration:

python inference_pipeline.py

Flow order:

  1. SAR inference
  2. EO inference
  3. NMS, landmask, cloudmask filters
  4. Detection datetime generation (SAR + EO)
  5. Submission file generation (detections.json, shapefiles, AIS interpolation, AIS correlation)
  6. Visualization generation

3) Run with Docker entrypoint

docker_entrypoint.sh starts Prefect server (0.0.0.0:4200) and then launches:

python inference_pipeline.py

Project Structure

.
├── inference_pipeline.py
├── dockerfile
├── docker-compose.yml
├── docker_entrypoint.sh
├── requirements_inference.txt
├── scripts/
│   ├── prepare_inference_data_pipeline.py
│   ├── extract_data_from_input.py
│   ├── download_image_data.py
│   ├── download_AIS_data.py
│   ├── download_landmask.py
│   └── requirements_prepare.txt
├── inference/
│   ├── imagery/
│   │   ├── EO/
│   │   │   ├── inference.py
│   │   │   ├── generate_detection_datetime_EO.py
│   │   │   └── generate_inference_visualizations.py
│   │   ├── SAR/
│   │   │   ├── inference.py
│   │   │   ├── generate_detection_datetime_SAR.py
│   │   │   └── generate_inference_visualizations.py
│   │   ├── generate_detection_json_file.py
│   │   ├── generate_detection_shape_files.py
│   │   └── generate_bbox_visualizations.py
│   ├── filter/
│   │   ├── nms_filter.py
│   │   ├── landmask_filter.py
│   │   └── cloudmask_filter.py
│   └── AIS/
│       ├── generate_AIS_interpolated_file.py
│       ├── generate_AIS_correlation_file.py
│       ├── generate_AIS_interpolation_visualization.py
│       ├── generate_AIS_correlation_visualizations.py
│       └── generate_AIS_track_visualizations.py
└── .github/workflows/
    ├── linter_test.yml
    └── build_and_push_image.yml

File-by-File Walkthrough

This section explains what each file does, in execution order and by responsibility, so a new user can understand the complete codebase quickly.

Repository root

  • inference_pipeline.py: main Prefect flow; runs all inference, filtering, enrichment, artifact generation, and visualization steps in sequence.
  • dockerfile: builds GPU-ready runtime (PyTorch CUDA base image), installs GDAL/geospatial dependencies, copies pipeline code, and sets container entrypoint.
  • docker-compose.yml: defines kitagawa-inference service, GPU access, mnt volume mount, and Prefect port mapping (4200).
  • docker_entrypoint.sh: starts Prefect server (if not already up), waits for readiness, then runs inference_pipeline.py.
  • docker_run.sh: helper command for running the built image interactively with GPU and mounted mnt directory.
  • create_sha.sh: exports the Docker image as a tar, zips it, and prints the SHA256 checksum.
  • requirements_inference.txt: Python dependencies used by inference and post-processing pipeline.
  • README.md: project documentation.
  • .gitignore: Python/tooling ignore rules; also ignores mnt/ generated data.

CI workflows

  • .github/workflows/linter_test.yml: installs Ruff and runs ruff check . on push/PR to main.
  • .github/workflows/build_and_push_image.yml: builds the Docker image and pushes it to Docker Hub on pushes to main and on manual dispatch.

Data preparation scripts (scripts/)

  • scripts/prepare_inference_data_pipeline.py: lightweight orchestrator; runs data prep steps and executes downloads concurrently with retries.
  • scripts/extract_data_from_input.py: parses imagery metadata CSV and generates:
    • mnt/data/vessel_detection_image_names.txt
    • mnt/data/AIS_correlation_image_names.txt
  • scripts/download_image_data.py: authenticates with Copernicus Data Space (COPERNICUS_USERNAME/COPERNICUS_PASSWORD), searches products by name, downloads ZIPs, and extracts SAFE folders.
  • scripts/download_AIS_data.py: reads required dates from image names, downloads NOAA AIS daily archives (.zip or .zst depending on year), and extracts CSVs to mnt/data/AIS.
  • scripts/download_landmask.py: downloads and extracts GSHHG shoreline shapefile package to mnt/data/mask.
  • scripts/requirements_prepare.txt: dependencies specifically for data prep scripts.
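
As an illustration of how the AIS download step can derive its required dates from image names, here is a minimal sketch. It relies on the Sentinel naming convention, in which the first YYYYMMDDTHHMMSS token is the sensing start time. The function names and the sample product names are hypothetical and are not taken from scripts/download_AIS_data.py.

```python
import re
from datetime import date, datetime

# Sentinel product names embed times as YYYYMMDDTHHMMSS; the first
# such token is the sensing start.
_TIMESTAMP = re.compile(r"(\d{8}T\d{6})")


def sensing_date(product_name: str) -> date:
    """Return the sensing start date encoded in a Sentinel product name."""
    match = _TIMESTAMP.search(product_name)
    if match is None:
        raise ValueError(f"no timestamp found in {product_name!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S").date()


def required_ais_dates(product_names):
    """Unique, sorted dates for which a daily AIS archive would be needed."""
    return sorted({sensing_date(name) for name in product_names})


# Made-up product names, for illustration only.
names = [
    "S1A_IW_GRDH_1SDV_20230104T232938_20230104T233003_046640_0596F1_ABCD.SAFE",
    "S2B_MSIL1C_20230105T153619_N0509_R068_T18TWL_20230105T173025.SAFE",
    "S2A_MSIL1C_20230104T154611_N0509_R111_T18TWL_20230104T174512.SAFE",
]
print(required_ais_dates(names))
```

A scene acquired close to midnight UTC can span two calendar days, so a real implementation may need the neighbouring day's archive as well.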

Inference imagery modules (inference/imagery/)

  • inference/imagery/SAR/inference.py:
    • scans Sentinel-1 SAFE folders,
    • reads VV TIFFs in tiles,
    • runs YOLO detections,
    • converts pixel boxes to georeferenced polygon WKT + centroid lat/lon,
    • writes mnt/data/detections/detections_s1.csv.
  • inference/imagery/EO/inference.py:
    • scans Sentinel-2 SAFE folders,
    • reads RGB bands (B04, B03, B02) in patches,
    • runs YOLO detections,
    • georeferences detections to WKT + centroid,
    • writes mnt/data/detections/detections_s2.csv.
  • inference/imagery/SAR/generate_detection_datetime_SAR.py: estimates per-detection timestamps by interpolating calibration XML azimuth timing using detection line position.
  • inference/imagery/EO/generate_detection_datetime_EO.py: estimates per-detection timestamps from granule sensing time and row fraction in image height.
  • inference/imagery/SAR/generate_inference_visualizations.py: produces per-scene SAR PNG summary (full downsampled scene + detection crop grid).
  • inference/imagery/EO/generate_inference_visualizations.py: produces per-scene EO PNG summary (full downsampled scene + detection crop grid).
  • inference/imagery/generate_detection_json_file.py:
    • merges S1/S2 detection CSVs,
    • attaches image metadata,
    • emits COCO-like mnt/results/submissions/detections.json.
  • inference/imagery/generate_detection_shape_files.py: reads detections.json + input metadata CSV and writes per-scene ESRI shapefiles with annotation geometry and metadata.
  • inference/imagery/generate_bbox_visualizations.py: renders all detection polygons in an interactive Folium map and saves HTML.
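
The EO timestamp estimate described above (granule sensing time plus the detection's row fraction) can be sketched as a linear interpolation. This is a simplified illustration, not the code in generate_detection_datetime_EO.py: the function name, the use of separate sensing start and end times, and the example values are assumptions.

```python
from datetime import datetime, timedelta, timezone


def detection_time(sensing_start, sensing_end, row, image_height):
    """Linearly interpolate a timestamp from the detection's image row.

    Assumes the sensor scans rows top to bottom at a constant rate
    between sensing_start and sensing_end (a simplifying assumption).
    """
    if image_height <= 0:
        raise ValueError("image_height must be positive")
    # Clamp so detections touching the image border stay inside the window.
    fraction = min(max(row / image_height, 0.0), 1.0)
    return sensing_start + (sensing_end - sensing_start) * fraction


# Hypothetical values: a 10-second acquisition window over a
# 10980-row raster, with a detection centred halfway down.
start = datetime(2023, 1, 5, 15, 36, 19, tzinfo=timezone.utc)
end = start + timedelta(seconds=10)
print(detection_time(start, end, row=5490, image_height=10980))
```

The SAR variant described above interpolates azimuth timing from the calibration XML instead, which this sketch does not cover.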

Detection filters (inference/filter/)

  • inference/filter/nms_filter.py: applies torchvision NMS to each image’s detection set and overwrites detection CSVs with kept boxes only.
  • inference/filter/landmask_filter.py: removes detections whose centroid falls on land using GSHHG polygons and a spatial index.
  • inference/filter/cloudmask_filter.py:
    • reconstructs cloud model archive from split parts,
    • loads ConvNeXt binary classifier weights,
    • classifies small EO patches around detection centroids,
    • keeps rows meeting cloud confidence threshold,
    • overwrites detections_s2.csv with filtered rows.
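
For readers unfamiliar with NMS, the following plain-Python sketch shows the greedy algorithm that nms_filter.py delegates to torchvision. It is a stand-in for illustration only: the 0.5 IoU threshold and the sample boxes are assumptions, not the pipeline's settings.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when its IoU with an already kept, higher-scoring
    box exceeds iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one isolated box (illustrative data).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

Because tiled inference can report the same vessel from neighbouring tiles, running this per image is what collapses such duplicates into a single box.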

AIS modules (inference/AIS/)

  • inference/AIS/generate_AIS_interpolated_file.py: fills AIS interpolation target CSV using physics-informed cubic interpolation (lat/lon/speed/course) and saves ais_interpolated_data.csv.
  • inference/AIS/generate_AIS_correlation_file.py:
    • loads detections from S1/S2 CSVs (filtered to configured image list),
    • streams AIS CSVs, builds per-MMSI tracks in time windows,
    • interpolates tracks in parallel,
    • matches detections to nearest AIS points with time + distance tolerances,
    • writes mnt/results/submissions/AIS_correlation.csv.
  • inference/AIS/generate_AIS_interpolation_visualization.py: interactive map comparing original interpolation input points and interpolated AIS paths.
  • inference/AIS/generate_AIS_correlation_visualizations.py: per-day/per-file map showing detection markers and AIS points/tracks within square windows around detections.
  • inference/AIS/generate_AIS_track_visualizations.py: fast AIS-only track maps (recent points per MMSI) for exploratory review.
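
The matching step in generate_AIS_correlation_file.py (nearest AIS point within time and distance tolerances) can be sketched as follows. This is a simplified brute-force illustration: the record layout, the function names, and the tolerance values of 600 s and 1000 m are assumptions, not the pipeline's actual configuration.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_detection(detection, ais_points, max_seconds=600, max_meters=1000):
    """Return (ais_point, distance_m) for the nearest point within both
    tolerances, or None when nothing qualifies."""
    best = None
    for point in ais_points:
        dt = abs((point["time"] - detection["time"]).total_seconds())
        if dt > max_seconds:
            continue
        dist = haversine_m(detection["lat"], detection["lon"], point["lat"], point["lon"])
        if dist <= max_meters and (best is None or dist < best[1]):
            best = (point, dist)
    return best


# Illustrative data: one close match, one too late, one too far away.
detection = {"lat": 40.0, "lon": -74.0, "time": datetime(2023, 1, 5, 12, 0, 0)}
ais_points = [
    {"mmsi": 111, "lat": 40.001, "lon": -74.0, "time": datetime(2023, 1, 5, 12, 1, 0)},
    {"mmsi": 222, "lat": 40.0005, "lon": -74.0, "time": datetime(2023, 1, 5, 13, 0, 0)},
    {"mmsi": 333, "lat": 40.1, "lon": -74.0, "time": datetime(2023, 1, 5, 12, 0, 30)},
]
match = match_detection(detection, ais_points)
print(match[0]["mmsi"], round(match[1], 1))
```

Checking the time tolerance first avoids computing distances for points that could never match; at scale, a spatial index over the interpolated tracks would replace the inner loop.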

End-to-end data flow (quick mental model)

  1. Input metadata CSVs + credentials drive data prep scripts.
  2. Downloaded SAFE imagery + AIS CSVs are stored under mnt/data.
  3. EO/SAR inference generates raw detections CSVs.
  4. Filters clean detections (NMS, landmask, cloudmask).
  5. Datetime scripts append detection_datetime to both detection CSVs.
  6. Submission generators produce detections.json, shapefiles, AIS interpolation, and AIS correlation outputs.
  7. Visualization scripts create HTML/PNG artifacts for manual QA.
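
To make the landmask part of step 4 concrete: it reduces to a point-in-polygon test on each detection centroid. The sketch below uses a plain-Python ray-casting test on a single ring as a stand-in for the GSHHG polygons and spatial index used by landmask_filter.py. It treats coordinates as planar and ignores polygon holes; the function names and data are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring.

    ring is a list of (lon, lat) vertices; the closing edge is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def drop_land_detections(detections, land_rings):
    """Keep only detections whose centroid is outside every land ring."""
    return [
        d for d in detections
        if not any(point_in_polygon(d["lon"], d["lat"], ring) for ring in land_rings)
    ]


# A unit-square "island" and two detections (illustrative data).
island = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
detections = [
    {"id": "on_land", "lon": 0.5, "lat": 0.5},
    {"id": "at_sea", "lon": 2.0, "lat": 2.0},
]
print([d["id"] for d in drop_land_detections(detections, [island])])  # -> ['at_sea']
```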

Configuration

Required environment variables

  • COPERNICUS_USERNAME (used by scripts/download_image_data.py)
  • COPERNICUS_PASSWORD (used by scripts/download_image_data.py)

If either variable is missing, the image download script raises a runtime error.
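
A fail-fast check of this kind might look like the sketch below. It is an illustration only: the function name and error message are hypothetical, and the use of RuntimeError is an assumption based on the description above rather than on the code in scripts/download_image_data.py.

```python
import os

REQUIRED_VARS = ("COPERNICUS_USERNAME", "COPERNICUS_PASSWORD")


def load_copernicus_credentials(env=None):
    """Return (username, password), or raise if either is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return env["COPERNICUS_USERNAME"], env["COPERNICUS_PASSWORD"]


# Passing a plain dict makes the check easy to exercise without
# touching the real environment.
print(load_copernicus_credentials({
    "COPERNICUS_USERNAME": "user@example.com",
    "COPERNICUS_PASSWORD": "secret",
}))
```

Reporting every missing name at once saves a second failed run when both variables are unset.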

Runtime paths and outputs

The pipeline uses /app/mnt (mounted from local ./mnt) as working data root.

Common generated locations:

  • mnt/data/inference_images/ (downloaded SAFE folders)
  • mnt/data/detections/detections_s1.csv
  • mnt/data/detections/detections_s2.csv
  • mnt/results/submissions/detections.json
  • mnt/results/submissions/ais_interpolated_data.csv
  • mnt/results/submissions/AIS_correlation.csv
  • mnt/results/additional_information/... (visualizations + shapefiles)

Scripts

There is no package.json or Makefile; commands are script-driven.

  • Data preparation:
    • python scripts/prepare_inference_data_pipeline.py
    • python scripts/extract_data_from_input.py
    • python scripts/download_image_data.py
    • python scripts/download_AIS_data.py
    • python scripts/download_landmask.py
  • Inference orchestration:
    • python inference_pipeline.py
  • Container helpers:
    • docker compose up --build
    • bash docker_run.sh
    • bash create_sha.sh (save image tar, zip, sha256)

API Endpoints

Application code does not expose a custom HTTP API for inference.

The container startup script uses Prefect server endpoints:

  • http://0.0.0.0:4200/api
  • http://0.0.0.0:4200/api/ready
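
The readiness wait performed by docker_entrypoint.sh can be expressed in Python as a simple polling loop. This sketch is not a translation of the shell script: the function names, the attempt count, the delay, and the use of 127.0.0.1 as the client-side address are assumptions.

```python
import time
import urllib.error
import urllib.request


def prefect_is_ready(url="http://127.0.0.1:4200/api/ready", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_until_ready(probe=prefect_is_ready, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll probe() up to `attempts` times, sleeping `delay` seconds
    after each failure. Returns True as soon as the probe succeeds."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False


# Offline demonstration with a stub probe that succeeds on the third call.
responses = iter([False, False, True])
print(wait_until_ready(probe=lambda: next(responses), delay=0.0))  # -> True
```

Accepting the probe and sleep function as parameters keeps the loop testable without a running Prefect server.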

Contributing

  1. Create a feature branch.
  2. Keep code style compatible with Ruff checks (ruff check .).
  3. Validate pipeline behavior on representative input data under mnt/input.

About

This project focuses on multi-sensor maritime vessel localization and classification, together with vessel trajectory interpolation, forecasting, and AIS correlation, for Sentinel-1 and Sentinel-2 imagery datasets.
