Kitagawa is a Python-based maritime vessel inference pipeline for Sentinel imagery.
It runs vessel detection on:
- Sentinel-1 SAR scenes
- Sentinel-2 EO scenes
Then it post-processes detections, enriches them with timestamps, correlates detections with AIS tracks, and exports submission and visualization artifacts.
The project is orchestrated by a Prefect flow (inference_pipeline.py) and is designed to run inside Docker with GPU support.
Features:
- Multi-sensor inference:
  - SAR detection from Sentinel-1 (`inference/imagery/SAR/inference.py`)
  - EO detection from Sentinel-2 (`inference/imagery/EO/inference.py`)
- Detection post-processing:
  - NMS filtering (`inference/filter/nms_filter.py`)
  - Landmask filtering against GSHHG shoreline polygons (`inference/filter/landmask_filter.py`)
  - EO cloud-based filtering using ConvNeXt (`inference/filter/cloudmask_filter.py`)
- Detection timestamp generation:
  - SAR timestamps from calibration vectors (`generate_detection_datetime_SAR.py`)
  - EO timestamps from granule sensing metadata (`generate_detection_datetime_EO.py`)
- Output generation:
  - COCO-like `detections.json`
  - Per-scene shapefiles from detections
  - AIS interpolation output (`ais_interpolated_data.csv`)
  - AIS correlation output (`AIS_correlation.csv`)
- Visualization outputs (HTML/PNG):
  - EO and SAR detection visualization images
  - Detection bbox map
  - AIS interpolation/correlation/track maps
- Data preparation utilities:
  - Input CSV extraction and image list generation
  - Sentinel image download
  - NOAA AIS download/extraction
  - Landmask download/extraction
Tech stack:
- Python
- Prefect (pipeline orchestration)
- PyTorch + Ultralytics YOLO (inference)
- Rasterio / GDAL / GeoPandas / Shapely / PyProj (geospatial processing)
- Pandas / NumPy / SciPy (data processing/interpolation)
- Folium / Matplotlib / OpenCV (visualization)
- Docker + Docker Compose (containerized execution)
- GitHub Actions (ruff lint + Docker build/push workflow)
Quick start with Docker Compose:
- Ensure Docker and the NVIDIA runtime are available.
- From the repository root, run:

  ```bash
  docker compose up --build
  ```

  This builds the image and starts the `kitagawa-inference` service defined in `docker-compose.yml`.
Build the image:

```bash
docker build -t kitagawa-inference:latest -f dockerfile .
```

Run the container (from `docker_run.sh`):

```bash
docker run --gpus all -it \
  -v $(pwd)/mnt:/app/mnt \
  -p 4200:4200 \
  kitagawa-inference:latest bash
```

Prepare the input data from `scripts/`:

```bash
python prepare_inference_data_pipeline.py
```

This runs:
- `extract_data_from_input.py`
- `download_AIS_data.py`
- `download_image_data.py`
- `download_landmask.py`
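
`prepare_inference_data_pipeline.py` executes the downloads concurrently with retries. The sketch below illustrates that pattern, assuming each download step can be wrapped as a plain Python callable. The helper names, retry count, linear backoff, and worker count are illustrative assumptions, not the script's actual settings.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_with_retries(step: Callable[[], None], retries: int = 3, delay_s: float = 2.0) -> int:
    """Call `step` until it succeeds and return how many attempts it took.

    The last exception is re-raised once all `retries` attempts have failed.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(1, retries + 1):
        try:
            step()
            return attempt
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay_s * attempt)  # linear backoff (assumed policy)
    raise AssertionError("unreachable")


def run_downloads_concurrently(
    steps: Dict[str, Callable[[], None]], max_workers: int = 3, **retry_kwargs
) -> Dict[str, int]:
    """Run independent download steps in parallel threads, each with its own retries."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_with_retries, fn, **retry_kwargs) for name, fn in steps.items()}
        # .result() blocks until a step finishes and re-raises its final failure, if any
        return {name: future.result() for name, future in futures.items()}
```

Only independent downloads belong in the pool. The extraction step writes the image-name lists that the downloads read, so it would run before them.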
Main orchestration:

```bash
python inference_pipeline.py
```

Flow order:
- SAR inference
- EO inference
- NMS, landmask, and cloudmask filters
- Detection datetime generation (SAR + EO)
- Submission file generation (`detections.json`, shapefiles, AIS interpolation, AIS correlation)
- Visualization generation
`docker_entrypoint.sh` starts the Prefect server (`0.0.0.0:4200`) and then launches `python inference_pipeline.py`.
Repository layout:

```text
├── inference_pipeline.py
├── dockerfile
├── docker-compose.yml
├── docker_entrypoint.sh
├── requirements_inference.txt
├── scripts/
│   ├── prepare_inference_data_pipeline.py
│   ├── extract_data_from_input.py
│   ├── download_image_data.py
│   ├── download_AIS_data.py
│   ├── download_landmask.py
│   └── requirements_prepare.txt
├── inference/
│   ├── imagery/
│   │   ├── EO/
│   │   │   ├── inference.py
│   │   │   ├── generate_detection_datetime_EO.py
│   │   │   └── generate_inference_visualizations.py
│   │   ├── SAR/
│   │   │   ├── inference.py
│   │   │   ├── generate_detection_datetime_SAR.py
│   │   │   └── generate_inference_visualizations.py
│   │   ├── generate_detection_json_file.py
│   │   ├── generate_detection_shape_files.py
│   │   └── generate_bbox_visualizations.py
│   ├── filter/
│   │   ├── nms_filter.py
│   │   ├── landmask_filter.py
│   │   └── cloudmask_filter.py
│   └── AIS/
│       ├── generate_AIS_interpolated_file.py
│       ├── generate_AIS_correlation_file.py
│       ├── generate_AIS_interpolation_visualization.py
│       ├── generate_AIS_correlation_visualizations.py
│       └── generate_AIS_track_visualizations.py
└── .github/workflows/
    ├── linter_test.yml
    └── build_and_push_image.yml
```
This section explains what each file does, in execution order and by responsibility, so a new user can understand the complete codebase quickly.
- `inference_pipeline.py`: main Prefect flow; runs all inference, filtering, enrichment, artifact generation, and visualization steps in sequence.
- `dockerfile`: builds a GPU-ready runtime (PyTorch CUDA base image), installs GDAL/geospatial dependencies, copies the pipeline code, and sets the container entrypoint.
- `docker-compose.yml`: defines the `kitagawa-inference` service, GPU access, the `mnt` volume mount, and the Prefect port mapping (`4200`).
- `docker_entrypoint.sh`: starts the Prefect server (if not already up), waits for readiness, then runs `inference_pipeline.py`.
- `docker_run.sh`: helper command for running the built image interactively with GPU access and the `mnt` directory mounted.
- `create_sha.sh`: exports the Docker image as a tar, zips it, and prints its SHA256 checksum.
- `requirements_inference.txt`: Python dependencies used by the inference and post-processing pipeline.
- `README.md`: project documentation.
- `.gitignore`: Python/tooling ignore rules; also ignores generated data under `mnt/`.
- `.github/workflows/linter_test.yml`: installs Ruff and runs `ruff check .` on push/PR to `main`.
- `.github/workflows/build_and_push_image.yml`: builds and pushes the Docker image to DockerHub on `main` and on manual dispatch.
- `scripts/prepare_inference_data_pipeline.py`: lightweight orchestrator; runs the data prep steps and executes downloads concurrently with retries.
- `scripts/extract_data_from_input.py`: parses the imagery metadata CSV and generates:
  - `mnt/data/vessel_detection_image_names.txt`
  - `mnt/data/AIS_correlation_image_names.txt`
- `scripts/download_image_data.py`: authenticates with Copernicus Data Space (`COPERNICUS_USERNAME`/`COPERNICUS_PASSWORD`), searches products by name, downloads ZIPs, and extracts SAFE folders.
- `scripts/download_AIS_data.py`: reads the required dates from image names, downloads NOAA AIS daily archives (`.zip` or `.zst` depending on year), and extracts CSVs to `mnt/data/AIS`.
- `scripts/download_landmask.py`: downloads and extracts the GSHHG shoreline shapefile package to `mnt/data/mask`.
- `scripts/requirements_prepare.txt`: dependencies specifically for the data prep scripts.
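
`download_AIS_data.py` reads the required dates from the image names. Here is a minimal sketch of that step. It assumes the names follow the standard Sentinel-1/Sentinel-2 product naming, where the first `YYYYMMDDTHHMMSS` field is the sensing start time. The function name `required_ais_dates` is hypothetical, and the script's own parsing may differ.

```python
import re
from datetime import date, datetime
from typing import Iterable, List

# First "_YYYYMMDDTHHMMSS_" field in a Sentinel-1/2 product name (sensing start time).
_SENSING_RE = re.compile(r"_(\d{8})T\d{6}_")


def required_ais_dates(image_names: Iterable[str]) -> List[date]:
    """Return the sorted, de-duplicated acquisition dates found in the image names."""
    dates = set()
    for name in image_names:
        match = _SENSING_RE.search(name)  # leftmost match = sensing start
        if match is None:
            continue  # skip names that do not follow the Sentinel naming scheme
        dates.add(datetime.strptime(match.group(1), "%Y%m%d").date())
    return sorted(dates)
```

A scene acquired just before midnight may also overlap the next day's AIS file. This sketch does not add that extra day.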
- `inference/imagery/SAR/inference.py`:
  - scans Sentinel-1 SAFE folders,
  - reads VV TIFFs in tiles,
  - runs YOLO detections,
  - converts pixel boxes to georeferenced polygon WKT + centroid lat/lon,
  - writes `mnt/data/detections/detections_s1.csv`.
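
Reading a large scene in tiles means the detector sees fixed-size windows, and each tile-local box must be shifted back into full-scene pixels. The sketch below shows one way to lay out overlapping tiles that cover the whole scene. The tile size of 1024 and overlap of 128 are hypothetical values, not the script's settings. Each `(col_off, row_off, width, height)` tuple could be passed to `rasterio.windows.Window` for a windowed read.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (col_off, row_off, width, height)


def _tile_starts(size: int, tile: int, stride: int) -> List[int]:
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the far edge
    return starts


def tile_windows(width: int, height: int, tile: int = 1024, overlap: int = 128) -> Iterator[Window]:
    """Yield overlapping windows that together cover the whole scene."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap
    for row_off in _tile_starts(height, tile, stride):
        for col_off in _tile_starts(width, tile, stride):
            yield col_off, row_off, min(tile, width - col_off), min(tile, height - row_off)


def to_scene_pixels(
    box: Tuple[float, float, float, float], col_off: int, row_off: int
) -> Tuple[float, float, float, float]:
    """Shift a tile-local (x1, y1, x2, y2) box into full-scene pixel coordinates."""
    x1, y1, x2, y2 = box
    return x1 + col_off, y1 + row_off, x2 + col_off, y2 + row_off
```

With overlapping tiles, a vessel on a seam can be detected twice. That is one reason a per-image NMS pass after inference is useful.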
- `inference/imagery/EO/inference.py`:
  - scans Sentinel-2 SAFE folders,
  - reads RGB bands (`B04`, `B03`, `B02`) in patches,
  - runs YOLO detections,
  - georeferences detections to WKT + centroid,
  - writes `mnt/data/detections/detections_s2.csv`.
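
Georeferencing a pixel box for a north-up raster such as a Sentinel-2 band amounts to applying the raster's affine geotransform to the box corners. The sketch below uses the GDAL ordering of the six coefficients (the tuple returned by GDAL's `GetGeoTransform()`, or by `to_gdal()` on a rasterio `affine.Affine` transform). The helper names are hypothetical, and the result is in the raster's own CRS. Producing lat/lon would need an extra CRS transform, for example with pyproj's `Transformer`.

```python
from typing import Sequence, Tuple

# GDAL-ordered geotransform: (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
GeoTransform = Sequence[float]


def pixel_to_map(gt: GeoTransform, col: float, row: float) -> Tuple[float, float]:
    """Apply the affine geotransform to a (column, row) pixel position."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def box_to_wkt(gt: GeoTransform, box: Tuple[float, float, float, float]) -> Tuple[str, Tuple[float, float]]:
    """Convert a pixel (x1, y1, x2, y2) box to a WKT polygon and its centroid in map coordinates."""
    x1, y1, x2, y2 = box
    ring = [(x1, y1), (x2, y1), (x2, y2), (x1, y2), (x1, y1)]  # closed ring
    coords = ", ".join(f"{x:.2f} {y:.2f}" for x, y in (pixel_to_map(gt, c, r) for c, r in ring))
    # An affine map sends the box centre to the polygon centroid, so transform the centre directly.
    centroid = pixel_to_map(gt, (x1 + x2) / 2, (y1 + y2) / 2)
    return f"POLYGON (({coords}))", centroid
```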
- `inference/imagery/SAR/generate_detection_datetime_SAR.py`: estimates per-detection timestamps by interpolating calibration XML azimuth timing at the detection's line position.
- `inference/imagery/EO/generate_detection_datetime_EO.py`: estimates per-detection timestamps from the granule sensing time and the detection's row as a fraction of image height.
- `inference/imagery/SAR/generate_inference_visualizations.py`: produces a per-scene SAR PNG summary (full downsampled scene + detection crop grid).
- `inference/imagery/EO/generate_inference_visualizations.py`: produces a per-scene EO PNG summary (full downsampled scene + detection crop grid).
- `inference/imagery/generate_detection_json_file.py`:
  - merges S1/S2 detection CSVs,
  - attaches image metadata,
  - emits COCO-like `mnt/results/submissions/detections.json`.
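
The exact schema of `detections.json` is not documented here, so the sketch below only illustrates a COCO-like merge of S1/S2 rows. The input column names (`image_name`, `x1`, `y1`, `x2`, `y2`, `score`), the extra `sensor` field, and the single `vessel` category are all assumptions. The sketch follows the COCO convention of `[x, y, width, height]` boxes and integer image ids referenced by each annotation.

```python
import json
from typing import Dict, Iterable, List


def build_coco_like(rows_by_sensor: Dict[str, Iterable[dict]]) -> dict:
    """Merge per-sensor detection rows into one COCO-style dict (hypothetical schema)."""
    images: List[dict] = []
    annotations: List[dict] = []
    image_ids: Dict[str, int] = {}
    for sensor, rows in rows_by_sensor.items():
        for row in rows:
            name = row["image_name"]
            if name not in image_ids:  # register each image once
                image_ids[name] = len(image_ids) + 1
                images.append({"id": image_ids[name], "file_name": name, "sensor": sensor})
            x1, y1, x2, y2 = (float(row[k]) for k in ("x1", "y1", "x2", "y2"))
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_ids[name],
                "category_id": 1,  # single "vessel" class (assumed)
                "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO convention: [x, y, width, height]
                "area": (x2 - x1) * (y2 - y1),
                "score": float(row["score"]),
            })
    return {"images": images, "annotations": annotations, "categories": [{"id": 1, "name": "vessel"}]}


def save_coco_like(doc: dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(doc, fh, indent=2)
```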
- `inference/imagery/generate_detection_shape_files.py`: reads `detections.json` + the input metadata CSV and writes per-scene ESRI shapefiles with annotation geometry and metadata.
- `inference/imagery/generate_bbox_visualizations.py`: renders all detection polygons in an interactive Folium map and saves it as HTML.
- `inference/filter/nms_filter.py`: applies torchvision NMS to each image's detection set and overwrites the detection CSVs with the kept boxes only.
- `inference/filter/landmask_filter.py`: removes detections whose centroid falls on land, using GSHHG polygons and a spatial index.
- `inference/filter/cloudmask_filter.py`:
  - reconstructs the cloud model archive from split parts,
  - loads ConvNeXt binary classifier weights,
  - classifies small EO patches around detection centroids,
  - keeps rows meeting the cloud confidence threshold,
  - overwrites `detections_s2.csv` with the filtered rows.
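
`nms_filter.py` relies on torchvision's NMS (`torchvision.ops.nms(boxes, scores, iou_threshold)`). That function visits boxes in descending score order and discards any box whose IoU with an already kept box exceeds the threshold. The dependency-free sketch below reproduces that greedy rule and applies it per image, as the filter does. The row layout (`image_name`, `bbox`, `score`) and the 0.5 threshold are assumptions, not the filter's configured values.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float) -> List[int]:
    """Indices of kept boxes, highest score first (same rule as torchvision.ops.nms)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def nms_per_image(rows: Sequence[dict], iou_threshold: float = 0.5) -> List[dict]:
    """Run NMS separately for each image so boxes from different scenes never suppress each other."""
    by_image: Dict[str, List[int]] = defaultdict(list)
    for idx, row in enumerate(rows):
        by_image[row["image_name"]].append(idx)
    kept: List[int] = []
    for idxs in by_image.values():
        boxes = [tuple(rows[i]["bbox"]) for i in idxs]
        scores = [rows[i]["score"] for i in idxs]
        kept.extend(idxs[j] for j in greedy_nms(boxes, scores, iou_threshold))
    return [rows[i] for i in sorted(kept)]  # preserve original row order
```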
- `inference/AIS/generate_AIS_interpolated_file.py`: fills the AIS interpolation target CSV using physics-informed cubic interpolation (lat/lon/speed/course) and saves `ais_interpolated_data.csv`.
- `inference/AIS/generate_AIS_correlation_file.py`:
  - loads detections from the S1/S2 CSVs (filtered to the configured image list),
  - streams AIS CSVs and builds per-MMSI tracks in time windows,
  - interpolates tracks in parallel,
  - matches detections to the nearest AIS points within time and distance tolerances,
  - writes `mnt/results/submissions/AIS_correlation.csv`.
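
The matching step pairs each detection with the nearest AIS point that also falls inside the time and distance tolerances. The sketch below shows that rule with a great-circle (haversine) distance on a spherical Earth. The 5-minute and 500 m tolerances, the `AisPoint` tuple layout, and the helper names are hypothetical. A plain linear scan stands in for the script's streamed, interpolated per-MMSI tracks.

```python
import math
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0
AisPoint = Tuple[int, datetime, float, float]  # (mmsi, timestamp, lat, lon)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_detection(
    det_time: datetime,
    det_lat: float,
    det_lon: float,
    ais_points: Sequence[AisPoint],
    max_dt: timedelta = timedelta(minutes=5),  # hypothetical tolerance
    max_dist_m: float = 500.0,  # hypothetical tolerance
) -> Optional[Tuple[int, float]]:
    """Return (mmsi, distance_m) of the closest AIS point inside both tolerances, or None."""
    best: Optional[Tuple[int, float]] = None
    for mmsi, ts, lat, lon in ais_points:
        if abs(ts - det_time) > max_dt:
            continue  # outside the time window
        dist = haversine_m(det_lat, det_lon, lat, lon)
        if dist <= max_dist_m and (best is None or dist < best[1]):
            best = (mmsi, dist)
    return best
```

Because each detection picks its own nearest point, two detections can match the same vessel. A stricter matcher would enforce one-to-one assignment, and this section does not state whether the pipeline does.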
- `inference/AIS/generate_AIS_interpolation_visualization.py`: interactive map comparing the original interpolation input points with the interpolated AIS paths.
- `inference/AIS/generate_AIS_correlation_visualizations.py`: per-day/per-file maps showing detection markers and AIS points/tracks within square windows around detections.
- `inference/AIS/generate_AIS_track_visualizations.py`: fast AIS-only track maps (recent points per MMSI) for exploratory review.
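
The exact formulation of the physics-informed cubic interpolation in `generate_AIS_interpolated_file.py` is not documented here. One common way to build such an interpolant is a cubic Hermite spline between two AIS fixes, using each fix's speed over ground and course over ground as the endpoint velocity. The sketch below takes that approach under a spherical-Earth, short-distance assumption. The `Fix` type and helper names are hypothetical, and antimeridian crossings are not handled.

```python
import math
from datetime import datetime
from typing import NamedTuple, Tuple

KNOT_MS = 1852.0 / 3600.0  # 1 knot in metres per second
M_PER_DEG = 6_371_000.0 * math.pi / 180.0  # metres per degree of latitude (spherical Earth)


class Fix(NamedTuple):
    time: datetime
    lat: float
    lon: float
    sog_kn: float  # speed over ground, knots
    cog_deg: float  # course over ground, degrees clockwise from true north


def _rates(fix: Fix) -> Tuple[float, float]:
    """SOG/COG converted to (dlat/dt, dlon/dt) in degrees per second."""
    speed = fix.sog_kn * KNOT_MS
    course = math.radians(fix.cog_deg)
    v_north, v_east = speed * math.cos(course), speed * math.sin(course)
    return v_north / M_PER_DEG, v_east / (M_PER_DEG * math.cos(math.radians(fix.lat)))


def hermite_position(a: Fix, b: Fix, when: datetime) -> Tuple[float, float]:
    """Cubic Hermite position between fixes a and b, with velocities as endpoint tangents."""
    dt = (b.time - a.time).total_seconds()
    s = (when - a.time).total_seconds() / dt  # 0 at fix a, 1 at fix b
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    (a_dlat, a_dlon), (b_dlat, b_dlon) = _rates(a), _rates(b)
    lat = h00 * a.lat + h10 * dt * a_dlat + h01 * b.lat + h11 * dt * b_dlat
    lon = h00 * a.lon + h10 * dt * a_dlon + h01 * b.lon + h11 * dt * b_dlon
    return lat, lon
```

With constant speed and course, the curve reduces to straight-line motion, which makes a quick sanity check.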
End-to-end data flow:
- Input metadata CSVs and credentials drive the data prep scripts.
- Downloaded SAFE imagery and AIS CSVs are stored under `mnt/data`.
- EO/SAR inference generates raw detection CSVs.
- Filters clean the detections (NMS, landmask, cloudmask).
- Datetime scripts append `detection_datetime` to both detection CSVs.
- Submission generators produce `detections.json`, shapefiles, AIS interpolation, and AIS correlation outputs.
- Visualization scripts create HTML/PNG artifacts for manual QA.
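
For SAR, the datetime step interpolates the calibration XML azimuth timing at the detection's line position. Here is a minimal sketch of that idea. It assumes linear interpolation between neighboring calibration vectors and the Sentinel-1 calibration annotation layout, with `calibrationVector` elements holding `azimuthTime` and `line` children. The helper names are hypothetical, and `generate_detection_datetime_SAR.py` may parse and interpolate differently.

```python
import bisect
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

LineTime = Tuple[int, datetime]


def load_line_times(calibration_xml: str) -> List[LineTime]:
    """Read (line, azimuthTime) pairs from the calibration vectors, sorted by line."""
    root = ET.fromstring(calibration_xml)
    pairs = [
        (int(vec.findtext("line")), datetime.fromisoformat(vec.findtext("azimuthTime")))
        for vec in root.iter("calibrationVector")
    ]
    return sorted(pairs)


def line_to_time(line: float, pairs: List[LineTime]) -> datetime:
    """Linearly interpolate the azimuth time at an image line (extrapolates past the ends)."""
    if len(pairs) < 2:
        raise ValueError("need at least two calibration vectors")
    lines = [ln for ln, _ in pairs]
    i = bisect.bisect_right(lines, line)
    i = min(max(i, 1), len(pairs) - 1)  # pick the bracketing (or nearest edge) segment
    (l0, t0), (l1, t1) = pairs[i - 1], pairs[i]
    return t0 + (t1 - t0) * ((line - l0) / (l1 - l0))
```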
Required environment variables:
- `COPERNICUS_USERNAME` (used by `scripts/download_image_data.py`)
- `COPERNICUS_PASSWORD` (used by `scripts/download_image_data.py`)

If either is missing, the image download script raises a runtime error.
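
The sketch below shows this fail-fast check. The function name and error message are illustrative; the real script's check may be worded or structured differently. Accepting an optional mapping instead of always reading `os.environ` keeps the check easy to test.

```python
import os
from typing import Mapping, Optional, Tuple


def copernicus_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (username, password), failing fast if either variable is unset or empty."""
    env = os.environ if env is None else env
    user = env.get("COPERNICUS_USERNAME")
    password = env.get("COPERNICUS_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set COPERNICUS_USERNAME and COPERNICUS_PASSWORD before downloading imagery")
    return user, password
```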
The pipeline uses /app/mnt (mounted from local ./mnt) as working data root.
Common generated locations:
- `mnt/data/inference_images/` (downloaded SAFE folders)
- `mnt/data/detections/detections_s1.csv`
- `mnt/data/detections/detections_s2.csv`
- `mnt/results/submissions/detections.json`
- `mnt/results/submissions/ais_interpolated_data.csv`
- `mnt/results/submissions/AIS_correlation.csv`
- `mnt/results/additional_information/...` (visualizations + shapefiles)
There is no package.json or Makefile; commands are script-driven.
- Data preparation:

  ```bash
  python scripts/prepare_inference_data_pipeline.py
  python scripts/extract_data_from_input.py
  python scripts/download_image_data.py
  python scripts/download_AIS_data.py
  python scripts/download_landmask.py
  ```

- Inference orchestration:

  ```bash
  python inference_pipeline.py
  ```

- Container helpers:

  ```bash
  docker compose up --build
  bash docker_run.sh
  bash create_sha.sh  # save image tar, zip, sha256
  ```
Application code does not expose a custom HTTP API for inference.
The container startup script uses these Prefect server endpoints:
- `http://0.0.0.0:4200/api`
- `http://0.0.0.0:4200/api/ready`
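
`docker_entrypoint.sh` waits for Prefect readiness before starting the flow. The entrypoint is a shell script, and its exact polling mechanism is not shown here. The Python sketch below expresses the same idea by polling the readiness endpoint listed above until it returns HTTP 200 or a deadline passes. The timeout and interval values are hypothetical, and the `opener` parameter exists only so the loop can be tested without a server.

```python
import time
import urllib.request
from typing import Any, Callable


def wait_for_prefect(
    url: str = "http://0.0.0.0:4200/api/ready",
    timeout_s: float = 60.0,  # hypothetical
    interval_s: float = 2.0,  # hypothetical
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bool:
    """Poll the readiness endpoint until it answers HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # connection refused or HTTP error (urllib.error.URLError subclasses OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```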
Contributing:
- Create a feature branch.
- Keep code style compatible with Ruff checks (`ruff check .`).
- Validate pipeline behavior on representative input data under `mnt/input`.