ACO (Ant Colony Optimization) solver for the Capacitated Vehicle Routing Problem (CVRP) with three C/CUDA backends:
- seq: sequential CPU baseline
- openmp-mpi: hybrid shared/distributed-memory backend
- cuda: single-device GPU backend
The project is designed for reproducible HPC experiments on clusters, with a make + sbatch pipeline and CSV/manifest result collection.
Given a depot and n customers with unit demand, CVRP requires building K routes that:
- start/end at the depot,
- satisfy vehicle capacity constraints,
- minimize total travel cost.
Instances are provided in .vrp (TSPLIB-like) format and generated/filtered in instances/test_aligned.
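For orientation, a minimal TSPLIB-style CVRP instance looks roughly like the fragment below (field names follow the TSPLIB95 conventions; this toy instance is illustrative and the exact subset of fields emitted by the project's generator may differ):

```
NAME : toy-5
TYPE : CVRP
DIMENSION : 6
EDGE_WEIGHT_TYPE : EUC_2D
CAPACITY : 3
NODE_COORD_SECTION
1 0 0
2 1 5
3 4 4
4 6 1
5 3 7
6 5 5
DEMAND_SECTION
1 0
2 1
3 1
4 1
5 1
6 1
DEPOT_SECTION
1
-1
EOF
```

Node 1 is the depot (demand 0); the remaining nodes are unit-demand customers, matching the problem statement above.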
The core method is ACO: each ant builds a solution using pheromone information (tau) + heuristic information (eta), then pheromones are updated via evaporation/deposition.
Main architectural choices (aligned with internal technical docs):
- V2 OpenMP+MPI as the stable parallel baseline (persistent threading, parallel updates, optimized MPI sync).
- V3 excluded from main runs: collaborative intra-ant parallelism introduces synchronization overhead that is not beneficial on general-purpose CPUs.
Internal references:
Solver code:
- src/seq/
- src/openmp-mpi/
- src/cuda/
- src/common/
Public headers:
include/
Operational tooling:
- tools/makefile/: make modules
- tools/bash/: shell launchers (solve_*)
- tools/python/: generators/analysis/report tools
- tools/batch/: Slurm job scripts
Technical documentation:
docs/
Minimum:
- make
- gcc
- python3
For MPI:
- mpicc
- mpirun or srun
For CUDA:
- nvcc + NVIDIA GPU
make seq
make openmp_mpi
make cuda

Full build:
make all

Instance generation:
make generate_problems

Main outputs:
- instances/test_aligned/*.vrp
- instances/test_aligned/manifest.csv
- instances/test_aligned/manifest_openmp_mpi.csv
- instances/test_aligned/manifest_cuda.csv
Standard manifest-based runs:
make solve_seq
make solve_mpi
make solve_cuda

Useful variables (passed via --make-args in batch jobs):
- SOLVE_CLIENTS
- SOLVE_*_REPEATS
- SOLVE_*_RUNTIME_S
- SOLVE_*_STAGNATION_EPOCHS
- SOLVE_*_MIN_REL_IMPROVEMENT
Main targets:
make exp_strong_openmp
make exp_strong_mpi
make exp_strong_hybrid
make exp_weak_openmp
make exp_weak_mpi
make exp_weak_hybrid

Practical campaign pipeline:
- details: practical_experiment_campaign.md
- aggregated data: merged_by_run_backend/*.csv
- summary report: REPORT.md
Source: merged_by_run_backend/REPORT.md
Coverage (status=ok rows):
- seq_performance: 7
- cuda_performance: 10
- openmp_strong: 14
- mpi_strong: 12
- hybrid_strong: 13
- openmp_weak: 6
- mpi_weak: 6
- hybrid_weak: 3
- seq_quality: 11
- mpi_quality: 16
- cuda_quality: 30
Key findings:
- OpenMP strong: best average tradeoff at 4 threads (with variation at the largest size).
- MPI strong: best average configuration at 2 ranks.
- Hybrid strong: best average configuration 4x4 (ranks x threads) on available data.
- CUDA vs SEQ: CUDA is faster on all overlapping sizes, with observed speedups of ~2.87x–111.21x.
Generated plots:
- merged_by_run_backend/plots/strong_*
- merged_by_run_backend/plots/weak_*
- merged_by_run_backend/plots/seq_vs_cuda_elapsed.png
- merged_by_run_backend/plots/quality_*
Regenerate plots from aggregated CSV files:
python3 tools/python/plot_merged_by_run_backend.py

Output:
- merged_by_run_backend/plots/*.png
- merged_by_run_backend/plots/README.md
- Large-size results include single-run points; variance may be underestimated there.
- For final academic tables, use medians and report the standard deviation when repeats > 1.
- The main branch objective is keeping the solver code stable; tuning/campaign work should be done through orchestration (make, batch, scripts), not continuous core-solver rewrites.
Use this repository for educational/research purposes and comparative CVRP benchmarking across CPU/MPI/CUDA backends.