An efficient and faithful Python implementation of the Bounded Multi-Source Shortest Path (BMSSP) algorithm, along with baselines using pure-Python Dijkstra and SciPy-accelerated Dijkstra.
This implementation follows the algorithmic structure in the pseudocode (Algorithm 3: BMSSP) and preserves its key guarantees, including never writing distances ≥ B, not splitting tie groups in the controller structure D, and correctly maintaining boundary keys.
Paper reference: "A New Algorithm for the Single-Source Shortest Path Problem" (Ruobing Chen et al.).
BMSSP solves SSSP by recursively partitioning the problem. Compared to classic Dijkstra (≈ O(m + n log n)), BMSSP leverages a divide-and-conquer strategy with theoretical complexity O(m log^{2/3} n) and a specialized controller structure D to orchestrate bounded subproblems efficiently.
- `bmssp_py/common.py`: `Graph` and `Edge` classes, plus an `init_distances()` helper. Exposes `Graph.to_csr()` for SciPy integration.
- `bmssp_py/dijkstra.py`: Pure-Python Dijkstra (multi-source, optional distance boundary).
- `bmssp_py/dijkstra_scipy.py`: SciPy-accelerated Dijkstra using `scipy.sparse.csgraph.dijkstra` with optimized multi-source support.
- `bmssp_py/bmssp.py`: Full BMSSP implementation with recursion, base case, pivot selection, the `_DataStructureD` controller, and the `choose_parameters()` function.
- `tests/`: Unit tests and benchmark tests (pytest-benchmark).
- `example.py`: Minimal example to run BMSSP.
- `scripts/quick_bench.py`: Fast micro-benchmarks with beautiful terminal output and performance analysis.
- `scripts/performance_analysis.py`: Detailed stability analysis and performance-variation study.
- `scripts/advanced_benchmark.py`: Comprehensive performance analysis with sophisticated graph generators, memory profiling, and crossover-point detection.
- `scripts/enterprise_benchmark.py`: Enterprise-level benchmarking with parallel execution, statistical analysis, and hardware-utilization monitoring.
- Initialize and Insert (lines 6–7): `_DataStructureD.initialize(M, B)` and insertion of pivots with their current estimates.
- Pull (line 8): `_DataStructureD.pull()` drains the entire tie group for the smallest key even when `M = 1`, ensuring tie groups are not split. It returns `B_i` as the next strictly larger key (or `B` if none), exactly as required.
- BaseCase (line 2): `_base_case` runs a bounded Dijkstra-like search and never writes distances ≥ B. The base-case singleton/split behavior mirrors the pseudocode.
- Relaxations (lines 13–19): After recursion, edges out of `U_i` are relaxed with the ≤ condition, but distances ≥ B are never written; insertions and the `K` set respect `[B'_i, B_i)`.
- BatchPrepend (line 21): `_DataStructureD.batch_prepend` merges the smallest distance per vertex and inserts them as a batch.
- Return (line 22): Returns `B' = min{B'_i, B}` and augments `U` with all `x ∈ W` whose current estimate is < `B'`.
We also provide regression tests to ensure tie groups are fully drained by `_DataStructureD.pull()`.
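The tie-group guarantee can be illustrated with a minimal stand-alone queue. This is a sketch for exposition, not the repo's `_DataStructureD` (which additionally supports `batch_prepend` and block partitioning by `M`): it shows how `pull()` can drain the entire tie group at the minimum key and report the next strictly larger key, or `B` if none remains.

```python
import heapq

class TieGroupQueue:
    """Illustrative sketch: a heap plus a best-key-per-vertex map,
    where pull() never splits a tie group."""

    def __init__(self, B):
        self.B = B          # global upper bound; keys >= B are never stored
        self.heap = []      # (key, vertex) entries, possibly stale
        self.best = {}      # smallest key currently held per vertex

    def insert(self, v, key):
        if key >= self.B:
            return          # never store keys >= B
        if key < self.best.get(v, float("inf")):
            self.best[v] = key
            heapq.heappush(self.heap, (key, v))

    def _sweep_stale(self):
        # drop heap entries that no longer match the vertex's best key
        while self.heap and self.best.get(self.heap[0][1]) != self.heap[0][0]:
            heapq.heappop(self.heap)

    def pull(self):
        """Return (S, B') where S is the full tie group at the minimum key
        and B' is the next strictly larger key (or B if none)."""
        self._sweep_stale()
        if not self.heap:
            return set(), self.B
        k_min = self.heap[0][0]
        group = set()
        while self.heap and self.heap[0][0] == k_min:
            key, v = heapq.heappop(self.heap)
            if self.best.get(v) == key:
                group.add(v)
                del self.best[v]
        self._sweep_stale()
        next_key = self.heap[0][0] if self.heap else self.B
        return group, next_key
```

For example, inserting vertices 0 and 1 at key 3.0 and vertex 2 at key 5.0 makes the first `pull()` return the whole group `{0, 1}` together with the boundary 5.0, rather than splitting the tie.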
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Requirements: Python 3.9–3.12, pytest, pytest-benchmark, numpy, scipy>=1.3.0, psutil, matplotlib, networkx.
Note: SciPy 1.3.0+ is required for the `min_only` parameter in `scipy.sparse.csgraph.dijkstra`, which enables optimized multi-source shortest-path computation.
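A small example of what `min_only=True` does, on a toy path graph of my own (the graph is illustrative, not from the repo): instead of returning one distance row per source, SciPy runs a single search and reports each vertex's distance to its nearest source.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Undirected path 0 -1- 1 -2- 2 -1- 3, stored symmetrically in CSR form
row = [0, 1, 1, 2, 2, 3]
col = [1, 0, 2, 1, 3, 2]
w = [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
graph = csr_matrix((w, (row, col)), shape=(4, 4))

# min_only=True collapses all sources into one search: dist[v] is the
# distance from the *nearest* of the given sources, as a 1-D array,
# rather than a (num_sources x n) matrix.
dist = dijkstra(graph, directed=False, indices=[0, 3], min_only=True)
print(dist)  # distance of each vertex to the nearest of sources {0, 3}
```

This single-search behavior is what makes the SciPy baseline's multi-source runs fast compared with looping `dijkstra` once per source.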
from bmssp_py.common import Graph
from bmssp_py.bmssp import BMSSPAlgorithm, choose_parameters
g = Graph(5)
g.add_undirected(0, 1, 1.0)
g.add_undirected(1, 2, 2.0)
g.add_undirected(2, 3, 1.0)
g.add_undirected(3, 4, 3.0)
# Use adaptive parameters
k, t, l = choose_parameters(g.n, mode="performance")
algo = BMSSPAlgorithm(g, l=l, B=1000.0, sources=[0], k_override=k, t_override=t)
dist = algo.solve()
print(f"Parameters: k={k}, t={t}, l={l}")
print(f"Metrics: {algo.metrics}")

Run the example:

PYTHONPATH=. python example.py

BMSSP uses three key parameters derived from the graph size n:
- k = ⌊log^(1/3) n⌋: controls pivot selection and base-case size
- t = ⌊log^(2/3) n⌋: determines the recursion branching factor
- l = ⌈log n / t⌉: number of recursion levels
The `choose_parameters(n, mode)` function provides three tuning modes:

- performance (default): reduces k and increases t for small graphs to minimize overhead
- completeness: increases k for better exploration; may use more recursion levels
- balanced: uses the theoretical parameters k = ⌊log^(1/3) n⌋, t = ⌊log^(2/3) n⌋
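The balanced formulas above can be computed directly. This is a sketch under stated assumptions (base-2 logarithm, a floor of 1 for tiny graphs), not the repo's `choose_parameters()`, which additionally applies the per-mode adjustments described above:

```python
import math

def theoretical_parameters(n):
    """Balanced-mode sketch: k = floor(log^(1/3) n), t = floor(log^(2/3) n),
    l = ceil(log(n) / t). Assumes log base 2 and clamps everything to >= 1."""
    log_n = max(math.log2(n), 1.0)  # guard n <= 2
    k = max(1, math.floor(log_n ** (1 / 3)))
    t = max(1, math.floor(log_n ** (2 / 3)))
    l = max(1, math.ceil(log_n / t))
    return k, t, l

# For n = 1,000,000: log2(n) ~ 19.93, so k = 2, t = 7, l = 3
print(theoretical_parameters(1_000_000))
```

Note how slowly k and t grow: even at a million vertices the recursion is only three levels deep, which is why small graphs are dominated by constant overhead rather than asymptotics.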
Both `BMSSPAlgorithm` and `DijkstraAlgorithm` track execution metrics:

- BMSSP: `edge_relaxations`, `basecase_pq_pops`, `controller_inserts`, `vertices_discovered`
- Dijkstra: `edge_relaxations`, `pq_pops`, `vertices_processed`
These provide hardware-independent performance measures for algorithm comparison.
PYTHONPATH=. pytest -q

Highlights:

- tests/test_bmssp.py: BMSSP correctness (linear, cycle, disconnected, fractional weights, boundary behavior).
- tests/test_dijkstra.py: Dijkstra correctness (single/multi-source, bounded, cycle, disconnected).
- tests/test_data_structure_d.py: Ensures `_DataStructureD.pull()` drains tie groups and keeps the smallest values on batch prepend.
Fast micro-benchmarks with execution metrics:
# Default performance mode, 3 repetitions
PYTHONPATH=. python scripts/quick_bench.py --reps 3
# Completeness mode, 5 repetitions
PYTHONPATH=. python scripts/quick_bench.py --mode completeness --reps 5

- Beautiful Terminal Output: Color-coded results with formatted tables
- Performance Analysis: Automatic winner detection and improvement calculations
- Algorithm Metrics: Detailed execution statistics for each algorithm
- Recommendations: Smart suggestions based on results
Tests linear (n=100), grid (10×10), and random (n=200, m=400) graphs, showing:
- Runtime statistics (mean, median, stdev, min, max)
- Performance improvements vs baseline
- BMSSP parameters used (k, t, l)
- Execution metrics for both algorithms
Comprehensive analysis of performance variations and stability:
# Run detailed stability analysis
PYTHONPATH=. python scripts/performance_analysis.py

Performance variations between runs are normal and expected due to:
System Factors:
- CPU frequency scaling and thermal throttling
- Background processes and system load
- Memory allocation patterns and garbage collection
- Cache warming and CPU branch prediction
Python-Specific Factors:
- Python's garbage collector timing
- Memory allocation and deallocation patterns
- Import time and module loading
Algorithm-Specific Factors:
- Random number generation (for graph generation)
- Hash table collisions and resizing
- Memory access patterns and cache misses
- Heap operations and priority queue behavior
- Coefficient of Variation (CV): Measures consistency (lower = more stable)
- CV < 5%: Excellent stability
- CV 5-10%: Good stability
- CV 10-20%: Fair stability
- CV > 20%: Poor stability
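The CV thresholds above are easy to apply to your own timing samples. A minimal sketch (the timing values are made up for illustration):

```python
import statistics

def coefficient_of_variation(samples):
    """CV = stdev / mean, as a percentage; lower means more stable timings."""
    mean = statistics.mean(samples)
    return 100.0 * statistics.stdev(samples) / mean

# Hypothetical benchmark timings in seconds from five repetitions
runs = [0.101, 0.098, 0.105, 0.097, 0.102]
cv = coefficient_of_variation(runs)
print(f"CV = {cv:.1f}%  ({'stable' if cv < 10.0 else 'unstable'})")
```

Here the five runs vary by a few milliseconds around ~0.1 s, giving a CV of roughly 3%, i.e. "excellent stability" on the scale above.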
- Use multiple runs (10+ recommended for reliable results)
- Report mean/median with standard deviation
- Consider CV < 10% as stable
- Run on dedicated hardware for best results
- Single measurements are not reliable for performance comparison
Comprehensive performance analysis with sophisticated graph generators, memory profiling, and crossover point detection:
# Basic run with default settings
PYTHONPATH=. python scripts/advanced_benchmark.py
# Custom configuration
PYTHONPATH=. python scripts/advanced_benchmark.py \
--sizes 32 64 128 256 512 \
--types sparse_random scale_free small_world \
--trials 5 \
--mode balanced

- Sophisticated Graph Generators: Scale-free, small-world, grid, sparse random graphs
- Memory Profiling: Peak memory usage tracking with `psutil`
- Crossover Point Detection: Automatic identification of when BMSSP becomes faster
- Theoretical vs Empirical Analysis: Comparison of complexity predictions vs actual performance
- Comprehensive Reporting: Detailed markdown reports and visualization plots
- Multiple Baselines: Comparison against both pure-Python and SciPy-accelerated Dijkstra
- ADVANCED_PERFORMANCE_REPORT.md: Detailed analysis with crossover points and performance tables
- advanced_performance_analysis.png: Four-panel visualization showing runtime comparison, improvement percentages, memory usage, and theoretical vs empirical analysis
High-performance benchmarking designed for maximum hardware utilization with parallel execution, statistical analysis, and enterprise-level reporting:
# Basic enterprise run
PYTHONPATH=. python scripts/enterprise_benchmark.py
# High-performance configuration for powerful machines
PYTHONPATH=. python scripts/enterprise_benchmark.py \
--sizes 64 128 256 512 1024 2048 4096 \
--types sparse_random scale_free small_world hierarchical power_law \
--trials 10 \
--mode balanced \
--workers 16

- Parallel Execution: Multi-process benchmarking using all available CPU cores
- Advanced Graph Topologies: Hierarchical, power-law, and community-structured graphs
- Statistical Robustness: Confidence intervals, coefficient of variation, and statistical significance testing
- Hardware Monitoring: Real-time CPU and memory utilization tracking
- Enterprise Reporting: Comprehensive markdown reports with statistical analysis
- Adaptive Configuration: Automatic worker scaling based on hardware capabilities
- Progress Tracking: Real-time progress monitoring with ETA calculations
- results/ENTERPRISE_PERFORMANCE_REPORT.md: Comprehensive statistical analysis
- results/enterprise_performance_analysis.png: Four-panel visualization with confidence intervals
- Detailed performance metrics including percentiles, stability analysis, and hardware utilization
This repo includes four families of benchmarks:
- Quick Benchmarks: Fast micro-benchmarks with beautiful output for development and testing
- Performance Analysis: Detailed stability analysis and variation study
- Advanced Benchmarks: Comprehensive analysis with sophisticated graph generators
- Enterprise Benchmarks: High-performance parallel benchmarking for maximum hardware utilization
Pure-Python comparison (BMSSP vs Dijkstra):
- Linear graphs (small/medium/large)
- Grid graphs (small/medium)
- Random graphs (sparse/dense)
- Multi-source and bounded runs
SciPy-accelerated baseline (Dijkstra via SciPy) vs BMSSP on linear graphs:
- Demonstrates a fast compiled baseline for Dijkstra using `scipy.sparse.csgraph.dijkstra`.
- Optimized multi-source computation using the `min_only=True` parameter (SciPy 1.3.0+).
Run only the SciPy benchmarks (fast):
PYTHONPATH=. pytest -q -k scipy --benchmark-only

Run the whole benchmark suite (short runs by default):

PYTHONPATH=. pytest -q -k benchmark --benchmark-only

Benchmark config (short by default) is set in pytest.ini:
--benchmark-max-time=0.2
--benchmark-min-rounds=5
Tips for larger runs:
- Increase time/rounds on the command line, e.g. `--benchmark-max-time=2 --benchmark-min-rounds=10`.
- Select specific groups, e.g. `-k comparison_random_sparse` or `-k comparison_grid_medium`.
- Throughput and latency of complete runs for each algorithm on identically constructed graphs and sources.
- Impact of recursion depth `l` and boundary `B` on BMSSP performance.
- Effect of topology (linear vs grid vs random sparse/dense) and number of sources.
- With SciPy, a compiled Dijkstra baseline to gauge the overheads of Python vs an optimized implementation.
- On small/simple graphs, Dijkstra (especially SciPy) generally outperforms BMSSP due to lower constant overhead.
- As graph sizes grow and in certain sparsity/topology regimes, BMSSP's bounded recursion can reduce the number of effective relaxations. In pure Python, the crossover point can require larger instances than in low-level languages.
- Performance variations are normal - use multiple runs and statistical analysis for reliable comparisons.
- CPU: 2+ cores, 2.0+ GHz
- RAM: 4 GB
- Storage: 1 GB free space
- OS: Linux/macOS/Windows with Python 3.9+
- CPU: 4+ cores, 2.5+ GHz (Intel i5/AMD Ryzen 5 or better)
- RAM: 8 GB
- Storage: 2 GB free space
- OS: Linux/macOS with Python 3.9+
- CPU: 8+ cores, 3.0+ GHz (Intel i7/AMD Ryzen 7 or better)
- RAM: 16+ GB
- Storage: 5+ GB free space (SSD recommended)
- OS: Linux/macOS with Python 3.9+
- CPU: 16+ cores, 3.5+ GHz (Intel i9/AMD Ryzen 9, Xeon, or EPYC)
- RAM: 32+ GB (64+ GB for very large graphs)
- Storage: 10+ GB free space (NVMe SSD recommended)
- OS: Linux/macOS with Python 3.9+
- Parallel Processing: Support for multiprocessing with high core count
- CPU: 4+ cores recommended for parallel processing
- RAM: 8+ GB for large graph generation and memory profiling
- Graphics: Not required (plots are generated as PNG files)
- Network: Not required (all processing is local)
- CPU: 8+ cores for optimal parallel execution
- RAM: 16+ GB for large-scale graph generation and statistical analysis
- Storage: Fast SSD for I/O-intensive operations
- Process Isolation: Support for multiprocessing with memory isolation
- Small graphs (n ≤ 1,000): Any modern laptop
- Medium graphs (n ≤ 10,000): Recommended specs or better
- Large graphs (n ≤ 100,000): High-performance specs required
- Very large graphs (n > 100,000): Enterprise specs recommended
- Massive graphs (n > 1,000,000): Server-grade hardware required
- Quick benchmarks: 10-30 seconds
- Performance analysis: 1-3 minutes
- Standard benchmarks: 1-5 minutes
- Advanced benchmarks: 5-30 minutes (depending on graph sizes and trials)
- Enterprise benchmarks: 10-60 minutes (depending on hardware and configuration)
- Large-scale testing: 1-6 hours
- CPU: Enterprise benchmarks automatically scale to use all available cores
- Memory: Efficient memory management with garbage collection optimization
- Storage: Results are saved incrementally to prevent data loss
- Network: All processing is local for maximum performance
- `_DataStructureD` uses a single global heap and a map of best keys per vertex. `pull()` drains a full tie group for the current minimum key before advancing `B_i`. This strictly follows the pseudocode and prevents the tie-splitting bugs common in multi-heap designs.
- BMSSP never writes distances ≥ B (base case and relaxations). Batch inserts keep only the smallest value per vertex.
- Pivot selection runs `k` rounds of ≤-relaxations within bound `B`, constructs the equality forest, and selects pivots with subtree size ≥ `k`.
- SciPy integration uses `min_only=True` for multi-source shortest paths when available (SciPy 1.3.0+), providing significant performance improvements over manual iteration.
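The k rounds of bounded ≤-relaxations used by pivot selection can be sketched as follows. This is a hypothetical helper for illustration, not the repo's code; the `(u, v, w)` edge triples and the `dist` dict are assumptions of the sketch, which also omits the equality-forest construction:

```python
def bounded_relax_rounds(edges, dist, B, k):
    """Sketch: run up to k rounds of <=-relaxations, never writing a
    distance >= B. `edges` is a list of (u, v, w) triples; `dist` maps
    vertex -> current estimate (missing vertices are treated as infinity).
    Returns the set of vertices whose estimate was set by a <=-relaxation."""
    touched = set()
    for _ in range(k):
        improved = False
        for u, v, w in edges:
            nd = dist.get(u, float("inf")) + w
            # <= condition, but estimates >= B are never written
            if nd < B and nd <= dist.get(v, float("inf")):
                if nd < dist.get(v, float("inf")):
                    improved = True
                dist[v] = nd
                touched.add(v)
        if not improved:
            break  # a fixed point within the bound was reached early
    return touched

# On a path 0 -> 1 -> 2 -> 3 with unit weights and B = 2.5, vertex 3
# (distance 3.0) is never written because its estimate exceeds B.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
dist = {0: 0.0}
touched = bounded_relax_rounds(edges, dist, B=2.5, k=3)
```

In the real algorithm the vertices touched within these rounds feed the equality forest, from which vertices rooting subtrees of size ≥ k are selected as pivots.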
If you use this repository in academic work, please cite the BMSSP paper (Chen et al.) and reference this implementation.