Releases: peterrrock2/binary-ensemble

v1.0.0

16 Jun 05:18

Binary Ensemble 1.0.0 is the first stable release of the rewritten BEN toolkit. This release stabilizes the Rust crate, command-line tools, Python package, and byte-level fixture contract for the BEN/XBEN/BENDL format family.

This is a broad stabilization release: the codec internals were reorganized, the Python package was rebuilt as binary-ensemble, BENDL bundles were added, and the format now has committed v1.0.0 fixtures, mutation tests, fuzz targets, and public specifications.

Highlights

  • Added the .bendl bundle format: a single self-describing file that embeds a BEN or XBEN assignment stream with graph, metadata, node permutation maps, and custom assets.
  • Stabilized three BEN encoding variants: standard, mkv_chain, and twodelta. twodelta is now the default encode variant for better compression on ReCom ensembles.
  • Added the bendl CLI for creating, inspecting, extracting, appending to, removing assets from, and compacting .bendl files.
  • Reworked the ben CLI around explicit subcommands for encode/decode, XBEN conversion, lookup, relabeling, canonicalization, graph sorting, re-encoding, and PCompress interop.
  • Rebuilt the Python package as binary-ensemble with the import module binary_ensemble.
  • Added ergonomic Python APIs for plain streams and bundles, including BenEncoder, BenDecoder, BendlEncoder, BendlDecoder, whole-file codec helpers, relabeling, recompression, graph ordering, and typed stubs.
  • Added public format specifications for BEN/XBEN, TwoDelta, and BENDL, plus a format stability policy.
  • Added extensive Rust, Python, CLI, property-based, fixture-mutation, interop, soak, and fuzz coverage.

Breaking Changes

  • The Python package has moved from the old pyben layout to ben-py in the repository, with the published package named binary-ensemble and imports under binary_ensemble.
  • Standalone pben and reben binaries were removed. Their workflows now live under the ben subcommand tree, including ben pcompress and ben reencode.
  • The Rust crate internals were reorganized around codec, io, ops, json, format, and cli modules. Any downstream code using non-public/internal modules will need updates.
  • ben canonicalize now relabels district IDs in first-seen order starting at 0.
  • Variant selection is explicit and normalized around standard, mkv_chain, and twodelta. Readers auto-detect variants, but writers default to twodelta unless another variant is chosen.
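
As an illustration of what "first-seen order starting at 0" means, here is a minimal sketch in plain Python. The function name canonicalize is hypothetical and this is not the crate's implementation; it only shows the relabeling rule on a single assignment vector.

```python
def canonicalize(assignment):
    """Relabel district IDs in first-seen order, starting at 0.

    Illustrative only: `assignment` is a list of district IDs, one per node.
    """
    mapping = {}
    relabeled = []
    for district in assignment:
        # The first time a district ID appears, give it the next unused label.
        if district not in mapping:
            mapping[district] = len(mapping)
        relabeled.append(mapping[district])
    return relabeled


print(canonicalize([7, 7, 3, 9, 3, 7]))  # [0, 0, 1, 2, 1, 0]
```

Two plans that differ only in how their districts are numbered map to the same vector under this rule, which is the usual reason to canonicalize before comparing or compressing.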

CLI

ben

The ben command now covers the main stream workflows:

  • ben encode converts JSONL to BEN.
  • ben xencode converts JSONL or BEN to XBEN.
  • ben decode converts BEN to JSONL, or XBEN one layer down to BEN.
  • ben xdecode converts XBEN directly to JSONL.
  • ben lookup extracts a single sample from a BEN stream.
  • ben canonicalize relabels districts in first-seen order.
  • ben relabel applies a permutation map, graph key sort, or topology-based graph ordering.
  • ben reencode changes BEN variants and can collapse repeated runs.
  • ben sort-graph emits a reordered graph and permutation map.
  • ben pcompress converts between BEN/XBEN and PCompress.
  • ben xz-compress and ben xz-decompress expose general XZ helpers.
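
To make the relabeling step concrete, the sketch below applies a node permutation map to one assignment vector. It assumes the map is stored as new_to_old (position i of the reordered vector takes the value at old position new_to_old[i]); the direction used by ben relabel and by .bendl permutation assets is not stated in these notes, so treat that and both function names as assumptions.

```python
def apply_permutation(assignment, new_to_old):
    """Reorder an assignment vector so that new position i holds the
    district of old position new_to_old[i] (assumed map direction)."""
    if sorted(new_to_old) != list(range(len(assignment))):
        raise ValueError("map must be a permutation of the node indices")
    return [assignment[old] for old in new_to_old]


def invert(new_to_old):
    """Build the inverse map, so a relabeled vector can be restored."""
    old_to_new = [0] * len(new_to_old)
    for new, old in enumerate(new_to_old):
        old_to_new[old] = new
    return old_to_new


original = [0, 1, 0, 1]
new_to_old = [0, 2, 1, 3]
reordered = apply_permutation(original, new_to_old)
print(reordered)                                         # [0, 0, 1, 1]
print(apply_permutation(reordered, invert(new_to_old)))  # [0, 1, 0, 1]
```

Keeping the inverse map available is what lets a relabeled stream be mapped back to the original node order.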

Notable CLI behavior changes:

  • --variant supports standard, mkvchain/mkv_chain, and twodelta/two_delta.
  • --n-cpus -1 means all available cores for XZ compression.
  • XZ options include compression level and block-size controls.
  • Progress output has been cleaned up with quieter spinner behavior and a global --quiet flag.
  • Output path derivation and suffix behavior were tightened for relabeling, canonicalization, and
    re-encoding.
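
The alias and core-count rules above can be sketched as follows. This is a hypothetical helper pair, not the CLI's own parsing code: the canonical spellings on the right-hand side of the table and the rejection of values such as 0 are choices made for this sketch.

```python
import os

# Accepted spellings come from the release notes; the canonical names they
# map to are this sketch's choice, not necessarily what the CLI stores.
_VARIANT_ALIASES = {
    "standard": "standard",
    "mkvchain": "mkv_chain",
    "mkv_chain": "mkv_chain",
    "twodelta": "twodelta",
    "two_delta": "twodelta",
}


def normalize_variant(name):
    """Map an accepted --variant spelling to one canonical name."""
    try:
        return _VARIANT_ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown variant: {name!r}") from None


def resolve_n_cpus(n_cpus):
    """Treat -1 as 'all available cores'; other values must be positive
    (the handling of 0 and other negatives is an assumption)."""
    if n_cpus == -1:
        return os.cpu_count() or 1
    if n_cpus < 1:
        raise ValueError("n_cpus must be -1 or a positive integer")
    return n_cpus


print(normalize_variant("two_delta"))  # twodelta
print(resolve_n_cpus(4))               # 4
```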

bendl

The new bendl command manages bundle files:

  • bendl create packages a .ben or .xben stream with optional graph, metadata,
    node-permutation map, and custom assets.
  • bendl inspect prints bundle header and asset-directory information.
  • bendl extract extracts the embedded stream or an asset.
  • bendl append adds assets to a finalized bundle.
  • bendl remove removes assets and compacts the bundle.
  • bendl compact rewrites a bundle in place to reclaim unreferenced bytes.

Assets are checksummed with CRC32C and may be xz-compressed on disk. Known assets use standardized names, and custom assets are prevented from claiming those names.
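
For readers unfamiliar with CRC32C (the Castagnoli polynomial, which differs from the CRC-32 in zlib), a bitwise reference sketch is shown below. It is a plain-Python illustration of the checksum itself, not the code used by the toolkit, and it says nothing about where the checksum sits in a .bendl file.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial when that bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Standard check value for the ASCII string "123456789".
print(hex(crc32c(b"123456789")))  # 0xe3069283
```

A table-driven or hardware-accelerated version is what one would use in practice; the bitwise loop is only the easiest form to verify by hand.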

Python

The Python package is now a stable binary-ensemble package requiring Python 3.11+ with NetworkX as its only runtime dependency.

New Python surface:

  • BendlEncoder and BendlDecoder for .bendl bundle creation, iteration, asset access, and verification.
  • BenEncoder and BenDecoder for plain .ben/.xben streams.
  • Whole-file helpers for JSONL, BEN, and XBEN conversion.
  • Bundle helpers such as compress_stream, relabel_bundle, and bundle compaction.
  • Variant-aware subsampling helpers for bundles and streams.
  • binary_ensemble.graph helpers for key-based ordering, MLC ordering, and RCM ordering.
  • Type stubs and py.typed support for editor and type-checker integration.
  • binary_ensemble.__version__.

The Python docs were rebuilt with installation instructions, quickstarts, API reference pages, concept guides, how-to recipes, troubleshooting, and executable documentation-snippet tests.

Formats And Stability

This release establishes the v1.0.0 fixture contract:

  • BEN fixtures for standard, mkv_chain, and twodelta.
  • XBEN fixtures for the same variants.
  • BENDL fixtures covering stream checksums, header checksums, asset checksums, known assets,
    compressed assets, unknown forward-compatible flags, and finalized bundle metadata.
  • A real PCompress interop fixture generated by the upstream PCompress encoder.

Readers are expected to continue accepting the committed v1.0.0 fixtures in later releases. The format stability policy explicitly forbids regenerating stable fixtures in place after release.

Integrity hardening in this release includes:

  • CRC32C checksums for assignment streams, BENDL headers, and BENDL assets.
  • Strict payload-length enforcement for streams and assets.
  • Protection against overlong assignment lengths and oversized assets.
  • Error returns instead of panics for corrupt input paths.
  • Forward-compatible handling of reserved BENDL header and asset flags.
  • Crash-safety improvements for bundle finalization, append, remove, and compaction flows.
  • Protection against failed finalization, failed flushes, interrupted writes, and stale/dead bundle
    payload regions.
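
The length-enforcement and error-instead-of-panic items can be illustrated with a toy reader. The framing below (a 4-byte little-endian length prefix followed by the payload), the cap MAX_PAYLOAD, and the CorruptInput type are all hypothetical; the real BEN and BENDL layouts are defined in the format specifications, not here.

```python
import io
import struct

MAX_PAYLOAD = 1 << 20  # hypothetical cap for this sketch


class CorruptInput(ValueError):
    """Raised for malformed input instead of crashing the process."""


def read_frame(stream, max_len=MAX_PAYLOAD):
    """Read one length-prefixed payload, validating every length."""
    header = stream.read(4)
    if len(header) != 4:
        raise CorruptInput("truncated length prefix")
    (length,) = struct.unpack("<I", header)
    # Reject oversized declarations before allocating or reading anything.
    if length > max_len:
        raise CorruptInput(f"declared length {length} exceeds cap {max_len}")
    payload = stream.read(length)
    if len(payload) != length:
        raise CorruptInput("payload shorter than declared length")
    return payload


good = struct.pack("<I", 3) + b"abc"
print(read_frame(io.BytesIO(good)))  # b'abc'
```

Checking the declared length before reading is what keeps a corrupt 4 GiB length field from turning into a 4 GiB allocation.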

Compression And Performance

  • Added and stabilized the twodelta variant, including XBEN support and random-access lookup via snapshot replay.
  • Improved lookup behavior for standard, MKV-chain, and TwoDelta BEN streams.
  • Reduced large tail-payload buffering so bundle operations stay streaming-friendly.
  • Added graph-ordering tools that can materially improve run-length compression.
  • Added MLC and RCM graph ordering implementations, plus key-based graph sorting.
  • Added asset auto-compression for larger bundle payloads.
  • Added multithreaded XZ tuning controls for compression level, CPU count, and block size.
  • Preserved PCompress interoperability through ben pcompress.
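
A small sketch of why node order matters for run-length compression: the same two-district plan on a 2x4 grid produces four runs in row-major node order and two runs in column-major order. The grid, the district rule, and the helper names are invented for illustration; the MLC and RCM orderings shipped in this release are more general than this toy reordering.

```python
from itertools import groupby


def run_lengths(assignment):
    """Run-length encode an assignment vector as (district, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(assignment)]


ROWS, COLS = 2, 4


def district(row, col):
    # Toy plan: the left two columns are district 0, the right two district 1.
    return 0 if col < 2 else 1


row_major = [district(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [district(r, c) for c in range(COLS) for r in range(ROWS)]

print(run_lengths(row_major))  # [(0, 2), (1, 2), (0, 2), (1, 2)]
print(run_lengths(col_major))  # [(0, 4), (1, 4)]
```

An ordering that keeps neighboring nodes close together in the vector tends to keep each district's nodes contiguous, which means fewer and longer runs.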

Documentation

New and expanded documentation includes:

  • Repository README overhaul with format overview, CLI examples, BENDL workflow, relabeling guide, Python quickstart, assumptions, limitations, and testing policy.
  • docs/ben-format-spec.md
  • docs/twodelta-format-spec.md
  • docs/bendl-format-spec.md
  • docs/format-stability.md
  • docs/glossary.md
  • docs/coding-standards.md
  • Full ReadTheDocs source under ben-py/docs/.

Testing And CI

This release adds a much broader validation matrix:

  • Rust unit and integration tests for codecs, readers, writers, CLI paths, graph ordering, relabeling, bundle read/write/append/remove/compact, and format stability.
  • Python tests for bundle APIs, stream APIs, graph helpers, relabeling, recompression, documentation snippets, public surface, and type assertions.
  • Property-based tests for boundary cases, operation equivalence, BENDL append behavior, and encode/decode pipelines.
  • Exhaustive single-byte fixture mutation tests to ensure corrupt binary inputs fail loudly.
  • Fuzz targets for BEN, XBEN, and BENDL readers.
  • Cross-architecture and wheel smoke-test CI updates.
  • Fast PR checks for the main Rust and Python test suites.
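
The idea behind an exhaustive single-byte mutation test can be sketched on a toy frame. The layout below (payload followed by a 4-byte checksum) is hypothetical, and zlib.crc32 stands in for CRC32C only because it is in the standard library; the real tests run against the committed v1.0.0 fixtures and the real readers.

```python
import struct
import zlib


def make_frame(payload):
    """Toy frame: payload followed by a little-endian CRC-32 of the payload."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def parse_frame(frame):
    """Return the payload, or raise ValueError if the frame is corrupt."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    payload = frame[:-4]
    (stored,) = struct.unpack("<I", frame[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


def surviving_mutations(frame):
    """Try every other value at every offset; list mutations that still parse."""
    survivors = []
    for offset in range(len(frame)):
        for value in range(256):
            if value == frame[offset]:
                continue
            mutated = bytearray(frame)
            mutated[offset] = value
            try:
                parse_frame(bytes(mutated))
            except ValueError:
                continue  # corrupt input failed loudly, as intended
            survivors.append((offset, value))
    return survivors


fixture = make_frame(b"toy fixture payload")
assert surviving_mutations(fixture) == []
```

An empty survivor list means no single-byte corruption of the fixture is silently accepted.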

Upgrade Notes

  • Update Python imports to binary_ensemble.
  • Replace uses of removed pben and reben binaries with ben pcompress and ben reencode.
  • Expect newly encoded streams to default to twodelta; pass --variant standard or
    --variant mkvchain when older behavior is required.
  • Use .bendl for shareable ensembles where graph order, metadata, and assets must travel with the
    assignment stream.
  • Treat the committed v1.0.0 fixture set as the compatibility baseline for future format changes.

v0.3.0

17 Oct 00:23

This release includes the long-awaited Python hooks for the BEN package. These hooks are now available in the py-ben Python package hosted on PyPI.

Ben Changes

  • Introduced the notion of "frames" to the decoders to make searching through compressed ensembles much easier.
  • Added a frame subsampler type to make reading through compressed ensembles more intuitive.
  • Added more intensive fuzzing tests.

PyBen

Can be installed with pip install binary-ensemble

  • Added PyBenEncoder and PyBenDecoder class hooks to make interacting with BEN and XBEN files easier.
  • Added the following functions so that files can be compressed and decompressed without going through the CLI:
    • compress_jsonl_to_ben
    • compress_jsonl_to_xben
    • compress_ben_to_xben
    • decompress_ben_to_jsonl
    • decompress_xben_to_jsonl
    • decompress_xben_to_ben

Full Changelog: v0.2.0...v0.3.0

v0.2.0

13 Jun 21:36
367a3a8

This release adds a new method to BEN that allows for much better compression of ensembles arising from Markov chains. For low-rejection chains, file sizes tend to be roughly halved, and for high-rejection chains, the space savings can be an order of magnitude or more.

  • The previous version of BEN is still supported and is now referred to as BenVariant::Standard. This version saves every single plan without trying to look for repetition. It is still the better option for ensembles of unique plans.
  • The new version of BEN is denoted by BenVariant::MkvChain within the source code.
  • Several structs and implementations have been added to make creating an encoder for BEN simpler.
  • There is now a pben binary that converts between PCompress files and BEN files.
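
To see why repetition matters for Markov chain ensembles, the sketch below collapses consecutive identical plans into (plan, count) pairs. A rejected proposal repeats the previous plan, so high-rejection chains contain long runs of identical plans. This is only an illustration of that idea: these notes do not describe the MkvChain on-disk layout, and the function names are hypothetical.

```python
from itertools import groupby


def collapse_repeats(plans):
    """Collapse consecutive identical plans into (plan, count) pairs."""
    return [
        (list(key), sum(1 for _ in group))
        for key, group in groupby(plans, key=tuple)
    ]


def expand_repeats(collapsed):
    """Inverse of collapse_repeats: yield each plan `count` times."""
    for plan, count in collapsed:
        for _ in range(count):
            yield list(plan)


chain = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]]
collapsed = collapse_repeats(chain)
print(collapsed)  # [([0, 0, 1], 3), ([0, 1, 1], 2), ([0, 0, 1], 1)]
assert list(expand_repeats(collapsed)) == chain
```

For an ensemble of unique plans every count is 1, so nothing is gained, which matches the note above that the standard variant remains the better choice in that case.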

Full Changelog: v0.1.3...v0.2.0

v0.1.3

09 Apr 14:59

Another hot patch, made for a demo today, that fixes an error in the xz-decompress mode.

Full Changelog: v0.1.2...v0.1.3

v0.1.2

09 Apr 14:31

This is a hot patch that fixes a major bug in the write_ben_file function.

Full Changelog: v0.1.1...v0.1.2

v0.1.1

05 Apr 17:56

What's Changed

  • Added support for piping files into the ben CLI.
  • The ben CLI now prints to the console if an output file is not specified.
  • The ben::encode module now has a BenWriter struct whose write implementations add the bBEN STANDARD FILE heading to any output in the BEN format, so users do not need to remember to do this themselves.

Full Changelog: v0.1.0...v0.1.1

v0.1.0

26 Mar 14:49

Initial Release

This is the first release of the binary-ensemble package. Please see the README for more information on what is available and how to use the tool!