juliuskunze/shuffle-coding

Shuffle Coding

Shuffle coding is a general method for optimal compression of unordered objects using bits-back coding. Data structures that can be compressed with our method include sets, multisets, graphs, hypergraphs, and others. Shuffle coding achieves state-of-the-art compression rates for unordered arrays (multisets), various molecule datasets and large network graphs, at practical, competitive speeds.
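As a rough illustration of where the savings come from, the sketch below computes how many bits are needed just to record the order of a sequence. This is the information an order-agnostic code can avoid storing: log2(n!) bits for n distinct elements, and log2(n! / (m_1! · ... · m_k!)) bits for a multiset with multiplicities m_1, ..., m_k. The function names `log2_factorial` and `order_bits` are hypothetical and are not part of this library's API.

```rust
/// log2(n!) computed as a sum of logarithms, which avoids overflowing n!.
fn log2_factorial(n: u64) -> f64 {
    (2..=n).map(|k| (k as f64).log2()).sum()
}

/// Bits needed to encode the order of a sequence whose distinct elements
/// occur with the given multiplicities: log2(n! / (m_1! * ... * m_k!)),
/// where n is the total number of elements.
fn order_bits(multiplicities: &[u64]) -> f64 {
    let n: u64 = multiplicities.iter().sum();
    let repeats: f64 = multiplicities.iter().map(|&m| log2_factorial(m)).sum();
    log2_factorial(n) - repeats
}

fn main() {
    // A set of 4 distinct elements has 4! = 24 distinguishable orderings.
    println!("{:.3}", order_bits(&[1, 1, 1, 1])); // prints 4.585
    // The multiset {a, a, b} has 3! / 2! = 3 distinguishable orderings.
    println!("{:.3}", order_bits(&[2, 1])); // prints 1.585
}
```

The saving grows quickly with size, which is why discarding order matters most for large multisets and graphs. For graphs, the number of distinguishable orderings additionally depends on the graph's symmetries, which this sketch does not model.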

This implementation can be easily adapted to different data types and statistical models.

We published "Practical Shuffle Coding" at NeurIPS 2024, building on our earlier ICLR 2024 paper "Entropy Coding of Unordered Data Structures". This is the official implementation for both papers.

Features

The library has the following optional features, which are disabled by default:

  • experimental: Enable experimental algorithms, including complete shuffle coding on graphs based on nauty and Traces. Requires a C compiler on your system.

  • bench: Enable the benchmarks used for the research experiments. Also enables experimental.

Running benchmarks

The binary runs the benchmark experiments and requires the bench feature. To see the available commands, run:

cargo run --release --features bench -- --help

Graph datasets (TU, SZIP and REC) are downloaded automatically as needed.

To replicate experiments from our "Practical Shuffle Coding" paper, run ./practical.sh. To replicate experiments from our earlier paper "Entropy Coding of Unordered Data Structures", run ./complete.sh.

The code has been cleaned up and optimized since publication, so compression speeds are higher than in the published results. Compression rates are unchanged.

Citing

If you find this code useful, please cite it in your paper:

@article{kunze2024shuffle,
  title={Practical Shuffle Coding},
  author={Kunze, Julius and Severo, Daniel and van de Meent, Jan-Willem and Townsend, James},
  journal={NeurIPS},
  year={2024}
}

@article{kunze2024entropy,
  title={Entropy Coding of Unordered Data Structures},
  author={Kunze, Julius and Severo, Daniel and Zani, Giulio and van de Meent, Jan-Willem and Townsend, James},
  journal={ICLR},
  year={2024}
}

About

Lossless compression of sets, graphs and other unordered data
