feat: ZSTDMT-style parallel compress/decompress + wasm-simd128-mt payload #383

Description

@polaz

Summary

Add ZSTDMT-style parallelism: split the input into independent jobs so both
compression and decompression can run across multiple threads, and ship a
thread-capable wasm payload (wasm-simd128-mt) so the same parallelism is
available in the browser.

Core: parallel compress + decompress via job splitting

Mirror libzstd's ZSTDMT model. The unit of parallelism is an independent job,
not a single zstd block.

So the deliverable is: a job-splitting layer (compress side) that produces a
stream whose job boundaries are independently decodable, plus a parallel driver
on both sides. Within one job/frame, decode stays sequential by construction.
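The shape of that job-splitting layer and parallel driver can be sketched with scoped threads. This is a minimal illustration under stated assumptions, not the project's implementation: `encode_job` / `decode_job` are hypothetical stand-ins that store each job behind a 4-byte length prefix (a real codec would emit an independently decodable zstd unit there), and `compress_parallel`, `split_jobs`, and `decompress_parallel` are illustrative names. It spawns one thread per job for brevity, where a real driver would use a bounded pool.

```rust
use std::thread;

/// Placeholder per-job codec (assumption): 4-byte little-endian length prefix
/// followed by the raw bytes. A real implementation would compress here.
fn encode_job(chunk: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + chunk.len());
    out.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
    out.extend_from_slice(chunk);
    out
}

/// Inverse of `encode_job`: drop the length prefix.
fn decode_job(job: &[u8]) -> Vec<u8> {
    job[4..].to_vec()
}

/// Split `input` into jobs of at most `job_size` bytes, encode each job on its
/// own scoped thread, and concatenate the results in input order.
fn compress_parallel(input: &[u8], job_size: usize) -> Vec<u8> {
    assert!(job_size > 0 && job_size <= u32::MAX as usize);
    let encoded: Vec<Vec<u8>> = thread::scope(|s| {
        // Collect the handles first so every job is spawned before any join.
        let handles: Vec<_> = input
            .chunks(job_size)
            .map(|chunk| s.spawn(move || encode_job(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("job panicked"))
            .collect()
    });
    encoded.concat()
}

/// Walk the length prefixes sequentially to recover job boundaries.
/// Returns `None` if the stream is truncated.
fn split_jobs(stream: &[u8]) -> Option<Vec<&[u8]>> {
    let mut jobs = Vec::new();
    let mut pos = 0usize;
    while pos < stream.len() {
        let header = stream.get(pos..pos.checked_add(4)?)?;
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(header);
        let len = u32::from_le_bytes(len_bytes) as usize;
        let end = pos.checked_add(4)?.checked_add(len)?;
        jobs.push(stream.get(pos..end)?);
        pos = end;
    }
    Some(jobs)
}

/// Decode every job in parallel; each individual job is decoded sequentially.
fn decompress_parallel(stream: &[u8]) -> Option<Vec<u8>> {
    let jobs = split_jobs(stream)?;
    let decoded: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = jobs
            .iter()
            .copied()
            .map(|job| s.spawn(move || decode_job(job)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("job panicked"))
            .collect()
    });
    Some(decoded.concat())
}
```

Only the boundary scan in `split_jobs` is sequential. With a single job there is nothing to fan out, which matches the sequential limit noted for single-frame decode.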

wasm delivery: wasm-simd128-mt payload

A third wasm payload, built with the WebAssembly threads proposal, that runs the
parallel compress/decompress across Web Workers — shipped alongside the existing
simd and scalar payloads.

  1. Build with -C target-feature=+atomics,+bulk-memory,+mutable-globals and
    -Z build-std (shared-memory std).
  2. Bootstrap a worker pool (e.g. wasm-bindgen-rayon): the main module hands
    each worker the compiled module + shared memory.
  3. Runtime gate in the npm loader: select mt only when crossOriginIsolated
    is true (SharedArrayBuffer available); otherwise fall back to simd /
    scalar. Document the required COOP (same-origin) + COEP
    (require-corp) response headers for consumers.
  4. Package the third payload directory (mt/) and extend capability detection
    + lazy load.
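The runtime gate in step 3 reduces to a small decision table. The loader itself would be JavaScript in the npm package; the sketch below only models the decision as a pure Rust function so it can be unit-tested. `HostCaps`, `Payload`, and `select_payload` are hypothetical names, and treating simd128 support as a prerequisite for the mt payload is an assumption based on the payload's name.

```rust
/// Which wasm payload the loader should fetch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Payload {
    Mt,
    Simd,
    Scalar,
}

/// Host capabilities as the loader would probe them (hypothetical struct).
#[derive(Debug, Clone, Copy)]
struct HostCaps {
    /// Value of `crossOriginIsolated` in the host page.
    cross_origin_isolated: bool,
    /// Whether `SharedArrayBuffer` is defined.
    shared_array_buffer: bool,
    /// Whether the engine validates a simd128 module.
    simd128: bool,
}

/// Pick the most capable payload the host can run, falling back step by step.
fn select_payload(caps: HostCaps) -> Payload {
    if caps.cross_origin_isolated && caps.shared_array_buffer && caps.simd128 {
        Payload::Mt
    } else if caps.simd128 {
        Payload::Simd
    } else {
        Payload::Scalar
    }
}
```

Keeping the decision in one pure function makes each fallback path testable without a browser.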

Dependencies

The wasm-mt payload is the browser delivery of the same job-splitting parallel
codec; the native tracks (#19, #72) provide the parallel driver that this
payload packages.

Acceptance criteria

  • Job-split compression produces a valid stream whose jobs are independently
    decodable; parallel compress shows a multi-core speedup, and the output
    decodes in any compliant zstd decoder.
  • Parallel decompression across independent jobs/frames shows a multi-core
    speedup; single-frame decode is unchanged (documented sequential limit, not a
    regression).
  • wasm-simd128-mt builds and loads under a cross-origin-isolated host; falls
    back cleanly to simd / scalar when SharedArrayBuffer is unavailable.
  • Round-trip + cross-decoder parity tests pass on all payloads.
  • Consumer docs cover the COOP/COEP requirement.
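For the COOP/COEP criterion, consumers may want a quick server-side check that their responses carry the two header values named in this issue. The helper below is a hypothetical sketch (`header_token` and `sends_isolation_headers` are not part of any existing API). It accepts only the `same-origin` and `require-corp` values listed above and ignores any `; param` suffix.

```rust
/// Find a header by case-insensitive name and return its first token,
/// ignoring any `; param=...` suffix.
fn header_token<'a>(headers: &[(&str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|&(_, v)| v.split(';').next().unwrap_or("").trim())
}

/// True when the response sends COOP `same-origin` and COEP `require-corp`.
fn sends_isolation_headers(headers: &[(&str, &str)]) -> bool {
    header_token(headers, "Cross-Origin-Opener-Policy") == Some("same-origin")
        && header_token(headers, "Cross-Origin-Embedder-Policy") == Some("require-corp")
}
```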

Estimate

The wasm-mt payload + loader: 3d 4h (build-std + worker bootstrap + capability
detection + COOP/COEP docs + packaging + parity tests). The parallel
compress/decompress engine itself is covered by the native tracks (#19, #72)
and is estimated there.

Labels: P2-medium, P3-low, enhancement, performance