Releases: peterrrock2/binary-ensemble
v1.0.0
Binary Ensemble 1.0.0 is the first stable release of the rewritten BEN toolkit. This release stabilizes the Rust crate, command-line tools, Python package, and byte-level fixture contract for the BEN/XBEN/BENDL format family.
This is a broad stabilization release: the codec internals were reorganized, the Python package was rebuilt as binary-ensemble, BENDL bundles were added, and the format now has committed v1.0.0 fixtures, mutation tests, fuzz targets, and public specifications.
Highlights
- Added the `.bendl` bundle format: a single self-describing file that embeds a BEN or XBEN assignment stream with graph, metadata, node permutation maps, and custom assets.
- Stabilized three BEN encoding variants: `standard`, `mkv_chain`, and `twodelta`. `twodelta` is now the default encode variant for better compression on ReCom ensembles.
- Added the `bendl` CLI for creating, inspecting, extracting, appending to, removing assets from, and compacting `.bendl` files.
- Reworked the `ben` CLI around explicit subcommands for encode/decode, XBEN conversion, lookup, relabeling, canonicalization, graph sorting, re-encoding, and PCompress interop.
- Rebuilt the Python package as `binary-ensemble` with the import module `binary_ensemble`.
- Added ergonomic Python APIs for plain streams and bundles, including `BenEncoder`, `BenDecoder`, `BendlEncoder`, `BendlDecoder`, whole-file codec helpers, relabeling, recompression, graph ordering, and typed stubs.
- Added public format specifications for BEN/XBEN, TwoDelta, and BENDL, plus a format stability policy.
- Added extensive Rust, Python, CLI, property-based, fixture-mutation, interop, soak, and fuzz coverage.
Breaking Changes
- The Python package has moved from the old `pyben` layout to `ben-py` in the repository, with the published package named `binary-ensemble` and imports under `binary_ensemble`.
- Standalone `pben` and `reben` binaries were removed. Their workflows now live under the `ben` subcommand tree, including `ben pcompress` and `ben reencode`.
- The Rust crate internals were reorganized around `codec`, `io`, `ops`, `json`, `format`, and `cli` modules. Any downstream code using non-public/internal modules will need updates.
- `ben canonicalize` now relabels district IDs in first-seen order starting at `0`.
- Variant selection is explicit and normalized around `standard`, `mkv_chain`, and `twodelta`. Readers auto-detect variants, but writers default to `twodelta` unless another variant is chosen.
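The first-seen relabeling described for `ben canonicalize` can be illustrated with a minimal Python sketch. The `canonicalize` function below is a hypothetical helper written for this note, not part of the `binary_ensemble` API, and it assumes an assignment is simply a list of district IDs in node order.

```python
def canonicalize(assignment):
    """Relabel district IDs in first-seen order, starting at 0."""
    mapping = {}  # original label -> canonical label
    relabeled = []
    for label in assignment:
        if label not in mapping:
            # The next unused canonical ID equals the number of labels seen so far.
            mapping[label] = len(mapping)
        relabeled.append(mapping[label])
    return relabeled


print(canonicalize([3, 3, 1, 2, 1, 3]))  # [0, 0, 1, 2, 1, 0]
```

Two assignments that describe the same partition under different district names map to the same canonical list, which is why downstream code that relied on the original IDs may see different labels after canonicalization.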
CLI
ben
The ben command now covers the main stream workflows:
- `ben encode` converts JSONL to BEN.
- `ben xencode` converts JSONL or BEN to XBEN.
- `ben decode` converts BEN to JSONL, or XBEN one layer down to BEN.
- `ben xdecode` converts XBEN directly to JSONL.
- `ben lookup` extracts a single sample from a BEN stream.
- `ben canonicalize` relabels districts in first-seen order.
- `ben relabel` applies a permutation map, graph key sort, or topology-based graph ordering.
- `ben reencode` changes BEN variants and can collapse repeated runs.
- `ben sort-graph` emits a reordered graph and permutation map.
- `ben pcompress` converts between BEN/XBEN and PCompress.
- `ben xz-compress` and `ben xz-decompress` expose general XZ helpers.
Notable CLI behavior changes:
- `--variant` supports `standard`, `mkvchain`/`mkv_chain`, and `twodelta`/`two_delta`.
- `--n-cpus -1` means all available cores for XZ compression.
- XZ options include compression level and block-size controls.
- Progress output has been cleaned up with quieter spinner behavior and a global `--quiet` flag.
- Output path derivation and suffix behavior were tightened for relabeling, canonicalization, and re-encoding.
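The alias handling for `--variant` can be pictured as a small lookup table. The sketch below is only an illustration of the accepted spellings and the `twodelta` writer default listed in these notes; `ALIASES` and `normalize_variant` are hypothetical names, and the real CLI parser may differ (for example in case handling or error text).

```python
# Accepted spellings mapped to the normalized variant names used in these notes.
ALIASES = {
    "standard": "standard",
    "mkvchain": "mkv_chain",
    "mkv_chain": "mkv_chain",
    "twodelta": "twodelta",
    "two_delta": "twodelta",
}


def normalize_variant(name=None):
    """Return the normalized variant name; writers default to twodelta."""
    if name is None:
        return "twodelta"
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown variant: {name!r}") from None


print(normalize_variant("mkvchain"))  # mkv_chain
print(normalize_variant())            # twodelta
```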
bendl
The new bendl command manages bundle files:
- `bendl create` packages a `.ben` or `.xben` stream with optional graph, metadata, node-permutation map, and custom assets.
- `bendl inspect` prints bundle header and asset-directory information.
- `bendl extract` extracts the embedded stream or an asset.
- `bendl append` adds assets to a finalized bundle.
- `bendl remove` removes assets and compacts the bundle.
- `bendl compact` rewrites a bundle in place to reclaim unreferenced bytes.
Assets are checksummed with CRC32C and may be xz-compressed on disk. Known assets use standardized names, and custom assets are prevented from claiming those names.
Python
The Python package is now a stable binary-ensemble package requiring Python 3.11+ with NetworkX as its only runtime dependency.
New Python surface:
- `BendlEncoder` and `BendlDecoder` for `.bendl` bundle creation, iteration, asset access, and verification.
- `BenEncoder` and `BenDecoder` for plain `.ben`/`.xben` streams.
- Whole-file helpers for JSONL, BEN, and XBEN conversion.
- Bundle helpers such as `compress_stream`, `relabel_bundle`, and bundle compaction.
- Variant-aware subsampling helpers for bundles and streams.
- `binary_ensemble.graph` helpers for key-based ordering, MLC ordering, and RCM ordering.
- Type stubs and `py.typed` support for editor and type-checker integration.
- `binary_ensemble.__version__`.
The Python docs were rebuilt with installation instructions, quickstarts, API reference pages, concept guides, how-to recipes, troubleshooting, and executable documentation-snippet tests.
Formats And Stability
This release establishes the v1.0.0 fixture contract:
- BEN fixtures for `standard`, `mkv_chain`, and `twodelta`.
- XBEN fixtures for the same variants.
- BENDL fixtures covering stream checksums, header checksums, asset checksums, known assets, compressed assets, unknown forward-compatible flags, and finalized bundle metadata.
- A real PCompress interop fixture generated by the upstream PCompress encoder.
Readers are expected to continue accepting the committed v1.0.0 fixtures in later releases. The format stability policy explicitly forbids regenerating stable fixtures in place after release.
Integrity hardening in this release includes:
- CRC32C checksums for assignment streams, BENDL headers, and BENDL assets.
- Strict payload-length enforcement for streams and assets.
- Protection against overlong assignment lengths and oversized assets.
- Error returns instead of panics for corrupt input paths.
- Forward-compatible handling of reserved BENDL header and asset flags.
- Crash-safety improvements for bundle finalization, append, remove, and compaction flows.
- Protection against failed finalization, failed flushes, interrupted writes, and stale/dead bundle payload regions.
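To make the checksum-based hardening concrete, here is a minimal, unoptimized sketch of CRC32C (the Castagnoli polynomial, reflected form `0x82F63B78`) together with a tiny single-byte mutation check. The helper names are hypothetical and the payload is made up; how BEN and BENDL actually lay out and verify their checksums is defined by the format specifications, not by this snippet.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def undetected_single_byte_mutations(payload: bytes) -> int:
    """Count single-byte substitutions that leave the checksum unchanged."""
    expected = crc32c(payload)
    misses = 0
    for index in range(len(payload)):
        for value in range(256):
            if value == payload[index]:
                continue  # not a mutation
            mutated = payload[:index] + bytes([value]) + payload[index + 1:]
            if crc32c(mutated) == expected:
                misses += 1
    return misses


# Standard CRC32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283
print(undetected_single_byte_mutations(b"0,0,1,1,2"))
```

A 32-bit CRC detects every error confined to a single byte, so the count printed here is zero; a reader that compares the stored and recomputed checksum can therefore return an error for any such corruption instead of decoding bad data.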
Compression And Performance
- Added and stabilized the `twodelta` variant, including XBEN support and random-access lookup via snapshot replay.
- Improved lookup behavior for standard, MKV-chain, and TwoDelta BEN streams.
- Reduced large tail-payload buffering so bundle operations stay streaming-friendly.
- Added graph-ordering tools that can materially improve run-length compression.
- Added MLC and RCM graph ordering implementations, plus key-based graph sorting.
- Added asset auto-compression for larger bundle payloads.
- Added multithreaded XZ tuning controls for compression level, CPU count, and block size.
- Preserved PCompress interoperability through `ben pcompress`.
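Why node ordering matters for run-length compression can be seen on a toy example. The sketch below assumes an assignment is a list of district IDs in node order and uses a hand-picked permutation; it does not implement MLC or RCM ordering, and `run_length_encode` and `reorder` are hypothetical helpers rather than library functions.

```python
from itertools import groupby


def run_length_encode(assignment):
    """Collapse consecutive equal labels into (label, run_length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(assignment)]


def reorder(assignment, order):
    """Apply a node order: order[i] is the original index placed at position i."""
    return [assignment[i] for i in order]


# Two districts whose nodes alternate under the original node numbering.
assignment = [0, 1, 0, 1, 0, 1, 0, 1]
# Hand-picked order that places nodes of the same district next to each other.
order = [0, 2, 4, 6, 1, 3, 5, 7]

print(len(run_length_encode(assignment)))             # 8 runs
print(run_length_encode(reorder(assignment, order)))  # [(0, 4), (1, 4)]
```

Orderings derived from graph topology aim for the same effect on real plans: neighboring nodes tend to share a district, so placing neighbors close together in the node order produces fewer, longer runs.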
Documentation
New and expanded documentation includes:
- Repository README overhaul with format overview, CLI examples, BENDL workflow, relabeling guide, Python quickstart, assumptions, limitations, and testing policy.
- `docs/ben-format-spec.md`
- `docs/twodelta-format-spec.md`
- `docs/bendl-format-spec.md`
- `docs/format-stability.md`
- `docs/glossary.md`
- `docs/coding-standards.md`
- Full ReadTheDocs source under `ben-py/docs/`.
Testing And CI
This release adds a much broader validation matrix:
- Rust unit and integration tests for codecs, readers, writers, CLI paths, graph ordering, relabeling, bundle read/write/append/remove/compact, and format stability.
- Python tests for bundle APIs, stream APIs, graph helpers, relabeling, recompression, documentation snippets, public surface, and type assertions.
- Property-based tests for boundary cases, operation equivalence, BENDL append behavior, and encode/decode pipelines.
- Exhaustive single-byte fixture mutation tests to ensure corrupt binary inputs fail loudly.
- Fuzz targets for BEN, XBEN, and BENDL readers.
- Cross-architecture and wheel smoke-test CI updates.
- Fast PR checks for the main Rust and Python test suites.
Upgrade Notes
- Update Python imports to `binary_ensemble`.
- Replace uses of removed `pben` and `reben` binaries with `ben pcompress` and `ben reencode`.
- Expect newly encoded streams to default to `twodelta`; pass `--variant standard` or `--variant mkvchain` when older behavior is required.
- Use `.bendl` for shareable ensembles where graph order, metadata, and assets must travel with the assignment stream.
- Treat the committed v1.0.0 fixture set as the compatibility baseline for future format changes.
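A quick way to confirm that the import rename has taken effect in an environment is to import `binary_ensemble` and read `binary_ensemble.__version__`, both of which are named in these notes. The `installed_version` wrapper below is a hypothetical convenience that returns `None` when the package is not installed, so the snippet runs either way.

```python
import importlib


def installed_version(module_name="binary_ensemble"):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# The distribution is published as binary-ensemble; the import name uses an underscore.
print(installed_version())
```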
v0.3.0
This release includes the long-awaited Python hooks for the BEN package. These hooks are now available in the py-ben Python package hosted on PyPI.
Ben Changes
- Introduced the notion of "frames" to the decoders to make searching through compressed ensembles much easier.
- Added a frame subsampler type to make reading through compressed ensembles more intuitive.
- Added more intensive fuzzing tests.
PyBen
Can be installed with `pip install binary-ensemble`
- Added PyBenEncoder and PyBenDecoder class hooks to make interacting with BEN and XBEN files easier
- Added the following functions so people do not need to use the CLI all of the time to compress files:
  - `compress_jsonl_to_ben`
  - `compress_jsonl_to_xben`
  - `compress_ben_to_xben`
  - `decompress_ben_to_jsonl`
  - `decompress_xben_to_jsonl`
  - `decompress_xben_to_ben`
Full Changelog: v0.3.0...v0.2.0
v0.2.0
This release adds a new method into BEN that allows for much better compression of ensembles arising from Markov Chains. For low-rejection chains, the file sizes tend to be halved or so, and for high-rejection chains, the space savings can be an order of magnitude or more.
- The previous version of BEN is still supported, and is now referred to as `BenVariant::Standard`. This version saves every single plan without trying to look for repetition. This is still the better option when considering ensembles of unique plans.
- The new version of BEN is now denoted by `BenVariant::MkvChain` within the source code.
- Several structs and implementations have been added to make creating an encoder for BEN simpler.
- There is now a `pben` binary file that allows for the conversion between PCompress files and BEN files.
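The intuition behind the savings on Markov chain ensembles is that a rejected proposal repeats the previous plan, so consecutive duplicates can be stored once with a count. The sketch below illustrates only that idea on in-memory lists; it is not the on-disk layout of `BenVariant::MkvChain`, and `collapse_repeats` and `expand` are hypothetical helpers.

```python
from itertools import groupby


def collapse_repeats(plans):
    """Store each run of identical consecutive plans once, with its count."""
    return [
        (plan, sum(1 for _ in group))
        for plan, group in groupby(map(tuple, plans))
    ]


def expand(collapsed):
    """Invert collapse_repeats, restoring every step of the chain."""
    return [list(plan) for plan, count in collapsed for _ in range(count)]


# Five chain steps, but only two distinct consecutive plans.
chain = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

collapsed = collapse_repeats(chain)
print(collapsed)  # [((0, 0, 1, 1), 3), ((0, 1, 1, 1), 2)]
assert expand(collapsed) == chain
```

The higher the rejection rate, the longer the runs of identical plans, which is consistent with the larger savings reported for high-rejection chains. For ensembles of unique plans every count is 1, so the counts add overhead without saving anything.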
Full Changelog: v0.1.3...v0.2.0
v0.1.3
Another hot patch for a demo today that fixes an error in the `xz-decompress` mode.
Full Changelog: v0.1.2...v0.1.3
v0.1.2
This is a hot patch that fixes a major bug in the `write_ben_file` function.
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- Added functionality to the `ben` CLI that allows for piping of files into `ben`.
- The `ben` CLI will now print to the console if an output file is not specified.
- The `ben::encode` module now has a `BenWriter` struct with some write implementations that will take care of adding the `b"BEN STANDARD FILE"` heading to any outputs in the BEN format so that the user does not need to remember to do this themselves.
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Initial Release
This is the first release of the binary-ensemble package. Please see the README for more information on what is available and how to use the tool!