Shuffle coding is a general method for optimal compression of unordered objects using bits-back coding. Data structures that can be compressed with our method include sets, multisets, graphs, hypergraphs, and others. Shuffle coding achieves state-of-the-art compression rates for unordered arrays (multisets), various molecule datasets and large network graphs, at practical, competitive speeds.
This implementation can be easily adapted to different data types and statistical models.
We published "Practical Shuffle Coding" at NeurIPS 2024, building on our earlier ICLR 2024 publication "Entropy Coding of Unordered Data Structures". This is the official implementation for both papers.
The library has these optional features, disabled by default:

- `experimental`: Enables experimental algorithms, including complete shuffle coding on graphs based on nauty and Traces. Requires a C compiler on your system.
- `bench`: Enables benchmarks used for the research experiments. Includes `experimental`.
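As a sketch of how these feature flags are passed to Cargo (standard `--features` syntax; the feature names come from the list above):

```shell
# Build with the experimental algorithms enabled (needs a C compiler for nauty/Traces):
cargo build --release --features experimental

# bench includes experimental, so this enables both:
cargo build --release --features bench
```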
The binary lets you run benchmark experiments and requires the `bench` feature. To see the available commands, run:

```shell
cargo run --release --features bench -- --help
```

Graph datasets (TU, SZIP and REC) are downloaded automatically as needed.
To replicate experiments from our "Practical Shuffle Coding" paper, run `./practical.sh`.
To replicate experiments from our earlier paper "Entropy Coding of Unordered Data Structures", run `./complete.sh`.
The code has been cleaned up and optimized since publication. Compression speeds are higher than in the published results; compression rates are unchanged.
If you find this code useful, please cite it in your paper:
```bibtex
@article{kunze2024shuffle,
  title={Practical Shuffle Coding},
  author={Kunze, Julius and Severo, Daniel and van de Meent, Jan-Willem and Townsend, James},
  journal={NeurIPS},
  year={2024}
}

@article{kunze2024entropy,
  title={Entropy Coding of Unordered Data Structures},
  author={Kunze, Julius and Severo, Daniel and Zani, Giulio and van de Meent, Jan-Willem and Townsend, James},
  journal={ICLR},
  year={2024}
}
```