The correct unit of optimization in heterogeneous computing is not the operation; it is the crossing.
CrossingBench is a reproducible microbenchmark for validating the Domain Crossing Law: in heterogeneous compute systems, energy efficiency is dominated by crossing volume weighted by boundary cost in crossing-dominated regimes (ε → 1).
The formal law statement, definitions, and empirical basis are in
docs/CROSSING_LAW.md.
All energy parameters and their sources are in
docs/HYPOTHESES.md.
Working Paper (Feb 2026)
CrossingFlow: Minimizing Domain Crossings in Heterogeneous Systems Under Error Constraints
Download PDF
Heterogeneous systems (analog CIM, chiplets, near-memory, multi-voltage) achieve impressive intra-domain efficiency. Yet system-level gains consistently fall short because domain crossings (ADC/DAC, DRAM fetch, inter-die transfer, level shifter) cost 5× to 32,000× more per byte than the compute they enable.
Figure 1. Crossing energy fraction vs. crossing volume for three boundary types. The analog curve shifts left due to ultra-low intra-domain compute energy (0.1 pJ/B vs 0.25 pJ/B digital baseline). Shaded regions: ±20% uncertainty on β. Reproduce with
python examples/plot_dominance.py.
CrossingBench lets you:
- Measure crossing dominance via the elasticity metric ε
- Sweep crossing volume and observe the compute-to-crossing transition
- Compare baseline vs crossing-reduced schedules
- Validate the law on your own boundary parameters
In late 2025, OpenAI announced an effort with Broadcom to co-design custom silicon for large-scale LLM inference. Public framing of these next-generation accelerators echoes a broader industry shift: at scale, the dominant cost of inference is increasingly moving data across boundaries (between memory and compute, between dies, between voltage and signal domains), not the arithmetic itself.
That is exactly the regime CrossingBench was built to measure.
CrossingBench does not model any specific product, and makes no claim about the internal design of any vendor's hardware. What it offers is the missing unit of analysis for boundary-dominated systems: the crossing. When an accelerator's efficiency story depends on keeping data local and minimizing domain transfers, the relevant metric is not FLOPs, or even operations, but crossing volume weighted by boundary cost (the elasticity ε this benchmark measures).
In short:
- The industry is moving toward inference accelerators whose value comes largely from reducing data movement, not just adding raw compute.
- CrossingBench provides a reproducible, vendor-neutral way to quantify that value: to ask how much of a system's energy is spent crossing boundaries, and how much a given schedule actually saves.
- The Domain Crossing Law makes the intuition explicit and falsifiable, so claims about "less data movement" can be measured rather than asserted.
If you are evaluating or building boundary-dominated hardware, this benchmark is a small, dependency-free way to test the assumption the field is now betting on.
C_total = C_intra + Σ_b (α_b · events_b + β_b · bytes_b)
Default parameters match published measurements at 7nm (see
docs/HYPOTHESES.md). Override any parameter on the
command line.
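
As a minimal sketch of the cost model above, the total can be evaluated in a few lines of Python. This is an illustration, not the code in `src/crossingbench/core.py`: the `Boundary` tuple and `total_cost_pj` helper are hypothetical names, and the traffic volumes and event counts are made up for the example. The β values come from the default-parameter table, and 0.25 pJ/B is the digital baseline compute energy quoted in the Figure 1 caption.

```python
from typing import NamedTuple, Sequence


class Boundary(NamedTuple):
    """Hypothetical container for one boundary's cost parameters and traffic."""
    alpha_pj_per_event: float  # fixed cost per crossing event (alpha_b)
    beta_pj_per_byte: float    # cost per byte crossed (beta_b)
    events: int                # number of crossing events (events_b)
    bytes_crossed: int         # crossing volume (bytes_b)


def total_cost_pj(bytes_compute: int,
                  compute_pj_per_byte: float,
                  boundaries: Sequence[Boundary]) -> float:
    """C_total = C_intra + sum_b (alpha_b * events_b + beta_b * bytes_b)."""
    c_intra = bytes_compute * compute_pj_per_byte
    c_cross = sum(
        b.alpha_pj_per_event * b.events + b.beta_pj_per_byte * b.bytes_crossed
        for b in boundaries
    )
    return c_intra + c_cross


# Illustrative traffic only; beta values are the documented defaults.
memory = Boundary(alpha_pj_per_event=0.0, beta_pj_per_byte=1.25,
                  events=1024, bytes_crossed=65536)
chiplet = Boundary(alpha_pj_per_event=0.0, beta_pj_per_byte=5.00,
                   events=256, bytes_crossed=16384)

total = total_cost_pj(262144, 0.25, [memory, chiplet])
print(f"C_total = {total} pJ")
```

With these made-up volumes the two boundaries together cost more than twice the intra-domain compute, even though they move under a third as many bytes.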
python -m pip install -e .

Or run directly without installing:

python -m crossingbench --help

Requires Python ≥ 3.9. Zero runtime dependencies (pure Python).
crossingbench sweep \
--boundary analog --compute analog \
--bytes 262144 --cross_min 256 --cross_max 524288 \
--steps 12 --out analog.csv

Output: CSV with crossing_fraction and epsilon_local at each point.
crossingbench compare \
--boundary analog --compute analog \
--bytes 262144 --cross_bytes 196608 --reduce_factor 3

Output: energy gain, crossing fractions, and effective elasticity.
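
The arithmetic behind such a comparison can be sketched by hand. The snippet below is a rough illustration under stated assumptions, not the tool's implementation: it assumes `--reduce_factor` divides the crossing volume, that α = 0 (as in the defaults), and that the analog compute energy is the 0.1 pJ/B quoted in the Figure 1 caption. The function name `compare_schedules` is hypothetical.

```python
def compare_schedules(bytes_compute: int, cross_bytes: int, reduce_factor: float,
                      beta_pj_per_byte: float, compute_pj_per_byte: float) -> dict:
    """Compare a baseline schedule with one that crosses 1/reduce_factor as many bytes.

    Assumes alpha = 0, so crossing energy is beta * bytes only.
    """
    c_intra = bytes_compute * compute_pj_per_byte
    base_cross = beta_pj_per_byte * cross_bytes
    reduced_cross = beta_pj_per_byte * (cross_bytes / reduce_factor)
    base_total = c_intra + base_cross
    reduced_total = c_intra + reduced_cross
    return {
        "energy_gain": base_total / reduced_total,
        "crossing_fraction_base": base_cross / base_total,
        "crossing_fraction_reduced": reduced_cross / reduced_total,
    }


# Same volumes as the command above; beta is the analog default (3.20 pJ/byte).
result = compare_schedules(262144, 196608, 3,
                           beta_pj_per_byte=3.20, compute_pj_per_byte=0.1)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

Under these assumptions a 3x cut in crossing volume gives an energy gain of about 2.78x rather than 3x, because the intra-domain compute term does not shrink; the crossing fraction drops from 0.96 to about 0.89.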
bash examples/reproduce_three_boundaries.sh
python examples/real_workloads.py
python examples/plot_dominance.py
| Boundary | β (pJ/byte) | α (pJ/event) | Source |
|---|---|---|---|
| `analog` | 3.20 | 0.0 | Wan et al., Nature 2022; Murmann Survey |
| `memory` | 1.25 | 0.0 | Horowitz 2014, 7nm scaled |
| `chiplet` | 5.00 | 0.0 | UCIe 1.0 Specification |
| `hbm` | 10.00 | 0.0 | Samsung HBM3 Datasheet |
| `voltage` | 0.80 | 0.0 | Multi-Vdd DVFS literature |
Override: --beta 2.5 --alpha 50 --compute_pj_per_byte 0.1
| Column | Description |
|---|---|
| `bytes_compute` | Intra-domain compute volume |
| `bytes_cross` | Crossing volume |
| `events_cross` | Number of crossing events |
| `c_intra_pj` | Compute energy (pJ) |
| `c_cross_pj` | Crossing energy (pJ) |
| `c_total_pj` | Total energy (pJ) |
| `crossing_fraction` | E_cross / E_total |
| `epsilon_local` | Local elasticity ∂log(C)/∂log(V_b) |
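
To make the last two columns concrete: with α = 0 and the compute volume held fixed, the model reduces to C(V) = C_intra + β·V, so ∂log(C)/∂log(V) = β·V / C, which is the crossing fraction itself. The sketch below checks that identity numerically with a central difference in log space. It is an illustration only; whether the package computes `epsilon_local` analytically or numerically is not stated here, and the constants (analog β, 0.1 pJ/B compute from the Figure 1 caption) and function names are assumptions for the example.

```python
import math

BYTES_COMPUTE = 262144       # intra-domain compute volume (bytes)
COMPUTE_PJ_PER_BYTE = 0.1    # assumed analog compute energy (Figure 1 caption)
BETA_PJ_PER_BYTE = 3.20      # analog boundary default; alpha assumed to be 0


def total_pj(cross_bytes: float) -> float:
    """C(V) = C_intra + beta * V for a single boundary with alpha = 0."""
    return BYTES_COMPUTE * COMPUTE_PJ_PER_BYTE + BETA_PJ_PER_BYTE * cross_bytes


def crossing_fraction(cross_bytes: float) -> float:
    """Share of total energy spent on crossings."""
    return BETA_PJ_PER_BYTE * cross_bytes / total_pj(cross_bytes)


def epsilon_numeric(cross_bytes: float, h: float = 1e-4) -> float:
    """Central-difference estimate of d log(C) / d log(V)."""
    up = math.log(total_pj(cross_bytes * math.exp(h)))
    down = math.log(total_pj(cross_bytes * math.exp(-h)))
    return (up - down) / (2 * h)


# Low, middle, and high ends of the crossing-volume range used in the sweep example.
rows = [(v, crossing_fraction(v), epsilon_numeric(v)) for v in (256, 8192, 524288)]
for v, frac, eps in rows:
    print(f"{v:>7d} bytes  crossing_fraction={frac:.4f}  epsilon={eps:.4f}")
```

The two columns agree at every point, and both approach 1 as crossing volume grows, which is the crossing-dominated regime the law describes. With a nonzero α the two quantities would no longer coincide exactly.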
CrossingBench/
├── docs/
│   ├── figures/
│   │   └── crossing_dominance.png # Three-boundary dominance figure
│   ├── CROSSING_LAW.md # The law (definitions, decomposition, results)
│   └── HYPOTHESES.md # Every pJ/byte value traced to its source
├── paper/
│   └── CrossingFlow_Domain_Crossing_Law_v1_2026.pdf
├── examples/
│   ├── plot_dominance.py # Regenerate the dominance figure
│   ├── reproduce_three_boundaries.sh
│   └── real_workloads.py # ResNet-50 + GPT-2 validation
├── src/crossingbench/
│   ├── core.py # Cost model, sweep, compare
│   ├── cli.py # Command-line interface
│   └── io.py # CSV output
├── tests/
│   └── test_core.py # Unit tests
├── data/ # Reference CSVs
└── pyproject.toml # Package metadata (hatchling)
python -m pip install -e '.[dev]'
pytest
ruff check .

If you use CrossingBench in academic work, please cite:
Morissette, J. (2026). CrossingBench v0.1.0: Microbenchmark for Boundary-Dominated Energy Scaling. Zenodo. https://doi.org/10.5281/zenodo.18653510
BibTeX:
@software{morissette2026crossingbench,
author = {Morissette, Jessy},
title = {CrossingBench v0.1.0: Microbenchmark for Boundary-Dominated Energy Scaling},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.18653510},
url = {https://doi.org/10.5281/zenodo.18653510}
}

Apache-2.0 (see LICENSE).
