GPR v2.0: 41% smaller files + embedded ARM encoder mode #57

Open
dcliftreaves wants to merge 8 commits into gopro:master from dcliftreaves:feature/v2-final-2

@dcliftreaves

Summary

This PR extends the GPR codec with three production-validated features:

  1. Joint RLV ANS entropy coding — adaptive entropy coder replacing the fixed VLC codebook. Up to 41% smaller files on GoPro hardware.
  2. Noise-aware adaptive quantization — estimates sensor noise and raises quantization to the noise floor, removing noise entropy while preserving signal.
  3. Embedded encoder mode — single-thread, arena-allocated code path designed for ARM SoC integration (tested with -E flag on desktop).

All features are opt-in via CLI flags. Without flags, the codec produces identical output to the existing version.

Compression Results

| Camera | Resolution | Raw | VLC (current) | ANS+DN (new) | Improvement |
|---|---|---|---|---|---|
| GoPro Hero6 | 12 MP | 24 MB | 5.3 MB | 3.1 MB | +41% |
| GoPro HERO7 | 12 MP | 24 MB | 7.7 MB | 4.5 MB | +42% |
| GoPro HERO10 (ISO 1600) | 23 MP | 46.5 MB | 13.8 MB | 10.0 MB | +27% |
| Nikon Z8 | 45 MP | 91 MB | 15.5 MB | 15.0 MB | +3% |
| Hasselblad X2D | 100 MP | 204 MB | N/A | 36.1 MB | 5.7x |

Per-band auto-selection guarantees output is never larger than VLC.
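
A minimal sketch of how a per-band choice can guarantee that bound, assuming both candidate sizes are known before the band is written. The names `band_mode`, `choose_band_mode`, and `chosen_band_bytes` are hypothetical and not taken from the PR, and a failed ANS attempt is modeled here as size 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coder tags; the PR's real bitstream mode values are not shown here. */
enum band_mode { BAND_MODE_VLC = 0, BAND_MODE_ANS = 1 };

/* Pick the coder for one band from the two candidate payload sizes.
 * ans_bytes == 0 models an ANS attempt that failed or overflowed its buffer.
 * Ties go to VLC, so the selected payload is never larger than the VLC one. */
static enum band_mode choose_band_mode(size_t vlc_bytes, size_t ans_bytes)
{
    assert(vlc_bytes > 0);
    return (ans_bytes != 0 && ans_bytes < vlc_bytes) ? BAND_MODE_ANS
                                                     : BAND_MODE_VLC;
}

/* Size of the payload that ends up in the bitstream for this band. */
static size_t chosen_band_bytes(size_t vlc_bytes, size_t ans_bytes)
{
    return choose_band_mode(vlc_bytes, ans_bytes) == BAND_MODE_ANS ? ans_bytes
                                                                   : vlc_bytes;
}
```

Because ties and failures fall back to VLC, the selected payload can only match or beat the VLC size. Any per-band mode flag would be extra and is not counted in this sketch.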

Mass Validation: 1,231 Nikon Z8 Images

  3-4x:   32  ██
  4-5x:  271  ██████████████████
  5-6x:  312  █████████████████████  ← peak
  6-7x:  211  ██████████████
  7-8x:  131  █████████
  8-9x:   77  █████
 9-10x:   48  ███
10-12x:   90  ██████
12-15x:   51  ███
15-20x:    8  █

Zero failures. Min 3.3x, median 6.0x, max 18.8x.

Embedded ARM Viability (GoPro Camera Integration)

The codec includes an embedded mode (-E flag) designed for ARM SoC evaluation:

Decoder: Ready for On-Camera Use

| Property | Value | Impact |
|---|---|---|
| Decode table | 20 KB | Fits in L1 cache |
| Working memory | ~3 MB transient per band | Freed after each band |
| Arithmetic | Integer-only (no FP) | No FPU required |
| Division | None in decode path | Table lookup + multiply + shift |
| Heap operations | 0 per band (decoder) | No malloc in hot path |
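
To illustrate the "table lookup + multiply + shift" property, here is a small single-state rANS round trip in the style of the well-known byte-wise rANS construction. It is a sketch under assumptions, not the PR's code: the 8-bit `SCALE_BITS`, the `dec_entry` layout, and all function names are hypothetical, and the PR's coder is 4-way interleaved with per-band tables. The encoder is included only so the decoder can be exercised; note that only the encoder divides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCALE_BITS 8u                 /* assumed precision; not from the PR */
#define SLOT_COUNT (1u << SCALE_BITS)
#define RANS_L     (1u << 23)         /* lower bound of the normalized state */

/* Hypothetical packed entry: everything one decode step needs, in 8 bytes. */
typedef struct {
    uint16_t sym;
    uint16_t freq;
    uint16_t cum_freq;
    uint16_t pad;
} dec_entry;

/* Expand per-symbol frequencies (summing to SLOT_COUNT) into a slot table. */
static void build_decode_table(const uint16_t *freq, size_t nsym, dec_entry *table)
{
    uint32_t cum = 0;
    for (size_t s = 0; s < nsym; s++) {
        for (uint32_t i = 0; i < freq[s]; i++) {
            dec_entry e = { (uint16_t)s, freq[s], (uint16_t)cum, 0 };
            table[cum + i] = e;
        }
        cum += freq[s];
    }
    assert(cum == SLOT_COUNT);
}

/* Encoder (uses division). Symbols are pushed in reverse order and bytes are
 * written backwards from `end`; returns the first byte of the stream. */
static uint8_t *rans_encode(const uint8_t *syms, size_t n, const uint16_t *freq,
                            const uint16_t *cum, uint8_t *end)
{
    uint32_t x = RANS_L;
    uint8_t *p = end;
    for (size_t i = n; i-- > 0;) {
        uint32_t f = freq[syms[i]];
        uint32_t x_max = ((RANS_L >> SCALE_BITS) << 8) * f;
        while (x >= x_max) { *--p = (uint8_t)x; x >>= 8; }
        x = ((x / f) << SCALE_BITS) + (x % f) + cum[syms[i]];
    }
    for (int b = 3; b >= 0; b--) *--p = (uint8_t)(x >> (8 * b));
    return p;
}

/* Decoder: one table lookup, one multiply, shifts and adds. No division. */
static void rans_decode(const uint8_t *p, const dec_entry *table,
                        uint8_t *out, size_t n)
{
    uint32_t x = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    p += 4;
    for (size_t i = 0; i < n; i++) {
        uint32_t slot = x & (SLOT_COUNT - 1);
        dec_entry e = table[slot];
        out[i] = (uint8_t)e.sym;
        x = e.freq * (x >> SCALE_BITS) + slot - e.cum_freq;
        while (x < RANS_L) x = (x << 8) | *p++;   /* byte-wise renormalization */
    }
}
```

Packing symbol, frequency, and cumulative frequency into one entry means each decoded symbol touches a single table slot, which is the property the table above attributes to the decode path.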

Estimated decode throughput on Cortex-A78 (GP2-class): ~40 MP/s. At that rate a 27 MP HERO13 frame would decode in about 0.67 seconds (27 / 40), fast enough for gallery browsing and thumbnail generation on-device.

Encoder: Embedded Mode (-E)

| Property | Normal Mode | Embedded Mode |
|---|---|---|
| Threads | 4 (parallel) | 1 (serial) |
| Peak additional memory | 451 MB | 113 MB |
| Heap ops / image | 36 | 36 |
| Output | byte-identical | byte-identical |
| Encode (GoPro 12MP) | 76 ms | 75 ms |
| Encode (Z8 45MP) | 1.2 s | 1.2 s |

Arena allocator: 1 malloc + 1 free per band (was 6 each). All encode buffers bump-allocated from a single contiguous block.
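
The bump-allocation pattern described above can be sketched as follows. This is an illustrative arena, not the PR's allocator: the `arena` struct and the `arena_init`, `arena_alloc`, and `arena_release` names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-band arena: one contiguous block, handed out front to back. */
typedef struct {
    uint8_t *base;
    size_t   cap;
    size_t   used;
} arena;

/* The single malloc for the band. Returns nonzero on success. */
static int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump-allocate `size` bytes at an offset aligned to `align` (a power of two).
 * malloc returns maximally aligned memory, so aligning the offset is enough
 * for ordinary scalar types. Returns NULL when the block is exhausted. */
static void *arena_alloc(arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* The single free for the band; every buffer carved from it dies together. */
static void arena_release(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

With this shape, the token, residual, and rANS buffers for a band would each be one `arena_alloc` call, and the heap is touched only in `arena_init` and `arena_release`.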

What This Means for GoPro Hardware

For a HERO13 Black shooting GPR at 27 MP:

  • 41% smaller files → about 69% more photos per SD card (1 / 0.59 ≈ 1.69)
  • 41% fewer SD card writes → reduced storage wear
  • On-device GPR decode without phone → faster gallery
  • Encoder fits in 113 MB additional RAM in embedded mode
  • Integer-only decoder — no FPU required for decode path
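
The capacity and decode-time figures in this PR follow from two simple ratios, sketched here so they can be rechecked with other inputs. The helper names are illustrative only.

```c
#include <assert.h>

/* Fraction of additional files that fit in the same space when each file
 * shrinks by `size_reduction` (0.41 means 41% smaller). */
static double extra_files_fraction(double size_reduction)
{
    assert(size_reduction >= 0.0 && size_reduction < 1.0);
    return 1.0 / (1.0 - size_reduction) - 1.0;
}

/* Seconds to decode one frame at a given throughput in megapixels per second. */
static double decode_seconds(double megapixels, double mp_per_second)
{
    assert(mp_per_second > 0.0);
    return megapixels / mp_per_second;
}
```

A 40% reduction gives about 67% more files and a 41% reduction gives about 69%. A 27 MP frame at the estimated 40 MP/s comes out at roughly 0.675 s.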

Technical Approach

Joint RLV ANS Coder

  • 160 joint symbols (10 run classes × 16 magnitude classes)
  • 4-way interleaved rANS states for reduced pipeline stalls
  • Packed decode table: sym + freq + cum_freq in single 8-byte lookup
  • Fast bitbuf: word-aligned reads (10x fewer operations than bit-by-bit)
  • Per-band auto-selection: tries both ANS and VLC, picks the smaller
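
One way to picture the 160-symbol joint alphabet is a run class and a magnitude class packed into a single index, with the exact value recovered from residual bits. The class boundaries below are hypothetical (bit-length magnitude classes, runs of 9 or more sharing the last run class) and are not taken from the PR; only the 10 × 16 shape is.

```c
#include <assert.h>
#include <stdint.h>

#define RUN_CLASSES   10u
#define MAG_CLASSES   16u
#define JOINT_SYMBOLS (RUN_CLASSES * MAG_CLASSES)   /* 160 */

/* Hypothetical magnitude class: bit length of mag (0 for mag == 0).
 * Class c >= 1 covers [2^(c-1), 2^c) and carries c-1 residual bits. */
static unsigned mag_class(uint32_t mag)
{
    assert(mag < (1u << (MAG_CLASSES - 1)));   /* 15-bit magnitudes in this sketch */
    unsigned c = 0;
    while (mag) { c++; mag >>= 1; }
    return c;
}

/* Offset of mag inside its class; sent as raw bits next to the symbol. */
static uint32_t mag_residual(uint32_t mag)
{
    unsigned c = mag_class(mag);
    return c ? mag - (1u << (c - 1)) : 0;
}

/* Inverse of mag_class + mag_residual. */
static uint32_t mag_from_class(unsigned c, uint32_t residual)
{
    return c ? (1u << (c - 1)) + residual : 0;
}

/* Hypothetical run class: zero runs 0..8 map to themselves, longer runs
 * share class 9 (a real coder would carry the excess separately). */
static unsigned run_class(uint32_t zero_run)
{
    return zero_run < RUN_CLASSES - 1 ? (unsigned)zero_run : RUN_CLASSES - 1;
}

/* Pack both classes into one symbol index in [0, 160). */
static unsigned joint_symbol(unsigned run_cls, unsigned mag_cls)
{
    assert(run_cls < RUN_CLASSES && mag_cls < MAG_CLASSES);
    return (run_cls << 4) | mag_cls;
}

/* Unpack with shift and mask, so the decode side needs no division. */
static void split_joint_symbol(unsigned sym, unsigned *run_cls, unsigned *mag_cls)
{
    assert(sym < JOINT_SYMBOLS);
    *run_cls = sym >> 4;
    *mag_cls = sym & (MAG_CLASSES - 1);
}
```

Coding the pair as one symbol lets a single entropy-coded token describe both the preceding zero run and the coefficient's size class.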

Noise-Aware Quantization

  • Pre-transform 4-bin signal-dependent MAD estimation
  • BayesShrink adaptive per-band thresholding
  • Never reduces quality below selected quality preset
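
For reference, the textbook BayesShrink rule sets a band's threshold to T = σ_n² / σ_x, where σ_n is the noise sigma (commonly estimated as MAD / 0.6745) and σ_x² = max(σ_y² − σ_n², 0) is the estimated signal variance. The sketch below is a simplified integer-only rendering of that rule with a single noise bin; the PR describes a 4-bin signal-dependent estimate, and none of these function names come from its code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Robust noise sigma from absolute detail coefficients: MAD / 0.6745,
 * i.e. MAD * 1.4826, rounded. Sorts absval in place; for even n this
 * takes the upper median. */
static uint32_t mad_sigma(uint32_t *absval, size_t n)
{
    assert(n > 0);
    qsort(absval, n, sizeof *absval, cmp_u32);
    uint32_t median = absval[n / 2];
    return (uint32_t)(((uint64_t)median * 14826u + 5000u) / 10000u);
}

/* Integer square root (floor), bit-by-bit method. */
static uint32_t isqrt_u64(uint64_t v)
{
    uint64_t r = 0, bit = (uint64_t)1 << 62;
    while (bit > v) bit >>= 2;
    while (bit) {
        if (v >= r + bit) { v -= r + bit; r = (r >> 1) + bit; }
        else r >>= 1;
        bit >>= 2;
    }
    return (uint32_t)r;
}

/* T = sigma_n^2 / sigma_x. Returns UINT32_MAX when the band is
 * indistinguishable from noise, meaning every coefficient is zeroed. */
static uint32_t bayes_shrink_threshold(uint64_t var_y, uint32_t sigma_n)
{
    uint64_t var_n = (uint64_t)sigma_n * sigma_n;
    if (var_y <= var_n) return UINT32_MAX;
    uint32_t sigma_x = isqrt_u64(var_y - var_n);
    if (sigma_x == 0) return UINT32_MAX;
    uint64_t t = var_n / sigma_x;
    return t > UINT32_MAX ? UINT32_MAX : (uint32_t)t;
}

/* Soft thresholding: shrink the magnitude by t, clamping at zero. */
static int32_t soft_threshold(int32_t c, uint32_t t)
{
    uint32_t mag = c < 0 ? 0u - (uint32_t)c : (uint32_t)c;
    if (mag <= t) return 0;
    uint32_t shrunk = mag - t;
    return c < 0 ? -(int32_t)shrunk : (int32_t)shrunk;
}
```

Bands with strong signal get a small threshold and are left nearly untouched, while bands dominated by noise get a large one, which is what makes the thresholding adaptive per band.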

Embedded Mode

  • -E flag: serial wavelet, no parallel pre-encode, arena allocator
  • Byte-identical output — testable on desktop, deployable on SoC
  • Reciprocal table precomputed for future division-free encode
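
As background on the reciprocal-table idea, a division by a small frequency can be replaced by a multiply and a shift using a precomputed reciprocal. The sketch below uses m = floor(2^32 / f) + 1, which gives the exact quotient whenever x · f < 2^32; outside that range the result can be off by one, so a coder with wider states needs a more careful construction. The table size, the 64-bit entry type, and the helper names are assumptions, not the PR's `rcp_freq[]` layout.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FREQ 4096u   /* assumed frequency range 1..4096; not from the PR */

/* Hypothetical reciprocal table; index 0 is unused. */
static uint64_t rcp_table[MAX_FREQ + 1];

static void build_rcp_table(void)
{
    for (uint32_t f = 1; f <= MAX_FREQ; f++)
        rcp_table[f] = (((uint64_t)1 << 32) / f) + 1;
}

/* floor(x / f) via multiply + shift. Exact under the asserted precondition
 * x * f < 2^32; this sketch makes no claim beyond that range. */
static uint32_t div_by_freq(uint32_t x, uint32_t f)
{
    assert(f >= 1 && f <= MAX_FREQ);
    assert((uint64_t)x * f < ((uint64_t)1 << 32));
    return (uint32_t)(((uint64_t)x * rcp_table[f]) >> 32);
}
```

Call `build_rcp_table()` once at startup; after that `div_by_freq(x, f)` stands in for `x / f` wherever the precondition holds.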

Commits (8)

  1. 021ffdc 16-bit Bayer pixel format infrastructure
  2. 7ee77d5 Noise model, BayesShrink denoise, FPN calibration
  3. a8c5865 Joint RLV ANS entropy coder with 4-way interleaved decode
  4. 78cfe1a Encoder/decoder pipeline integration
  5. 0041a49 SDK + CLI tools
  6. fb1bab6 Test infrastructure and calibration tools
  7. 1026b05 Pre-PR review fixes (license, Linux -lm, backward compat)
  8. 4f6b5d9 Embedded encoder mode + arena allocator for ARM SoC

Full docs: architecture.md | gotchas.md | compression-results.html

Test Plan

  • ANS round-trip unit test (5 coefficient distributions)
  • GoPro smoke test (Hero6, HERO7, HERO10 at ISO 100 and 1600)
  • Nikon Z8 mass scan (1,231 files, zero failures)
  • Hasselblad X2D round-trip (4 files, ISO 64 and 200)
  • Backward compat: original GPR files decode correctly
  • Per-band auto-select: never larger than VLC
  • Embedded mode: byte-identical to normal mode
  • Linux build: -lm linkage, NEON auto-detect
  • License headers on all new files

🤖 Generated with Claude Code

hh-decr and others added 7 commits April 20, 2026 08:00
- PIXEL_FORMAT_RAW_RGGB_16 and GBRG_16 formats
- Updated wavelet, companding, and log curve for 16-bit range
- Prescale values {2,3,3} for 16-bit (vs {0,2,2} for 14-bit)
- Component clamping in decoder for Q6-Q8 overflow prevention
- PutBuffer overflow handling (assert → error return)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Noise model (noise_model.c/h):
- Poisson-Gaussian noise estimation from raw pixels
- noise_remove: quantize to noise floor (encoder, LUT-accelerated)
- noise_restore: PRNG triangular noise reconstruction (decoder)
- FPN polynomial model with row/column offsets and PRNU

Wavelet denoise (denoise.c):
- Phase 0.5: pre-transform signal-dependent MAD estimation
- BayesShrink adaptive per-band thresholding
- NoiseAwareRequantize: round coefficients to noise step size
- Prescale-aware wavelet noise gain computation

Tools: noise_analysis, calibrate, fpn_extract

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ANS coder (ans.c, ans_joint.c):
- Joint RLV: single symbol per coefficient (160 joint symbols)
- 10 run classes × 16 magnitude classes with residual bits
- 4-way interleaved rANS for reduced pipeline stalls
- Packed decode table (sym+freq+cum_freq in one lookup)
- Fast bitbuf_read: word-aligned reads instead of bit-by-bit
- Per-band frequency tables for adaptive compression
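
As a rough picture of what word-aligned bit reading means in the list above: the reader keeps a 64-bit accumulator and tops it up 32 bits at a time, instead of fetching and shifting one bit per call. This is a generic sketch that assumes MSB-first bit order; the struct layout and the `bitbuf_init` / `bitbuf_refill` helpers are hypothetical and this is not the PR's `bitbuf_read`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: unread bits are left-aligned in acc. */
typedef struct {
    const uint8_t *p, *end;
    uint64_t acc;
    unsigned bits;      /* number of valid bits in acc */
} bitbuf;

static void bitbuf_init(bitbuf *b, const uint8_t *data, size_t len)
{
    b->p = data;
    b->end = data + len;
    b->acc = 0;
    b->bits = 0;
}

/* Top up the accumulator: a whole 32-bit word when available,
 * single bytes only for the last few bytes of the stream. */
static void bitbuf_refill(bitbuf *b)
{
    if (b->bits > 32) return;
    if (b->end - b->p >= 4) {
        uint32_t w = (uint32_t)b->p[0] << 24 | (uint32_t)b->p[1] << 16 |
                     (uint32_t)b->p[2] << 8  | (uint32_t)b->p[3];
        b->acc |= (uint64_t)w << (32 - b->bits);
        b->p += 4;
        b->bits += 32;
    } else {
        while (b->bits <= 56 && b->p < b->end) {
            b->acc |= (uint64_t)*b->p++ << (56 - b->bits);
            b->bits += 8;
        }
    }
}

/* Read the next n bits (1..32), most significant bit first. */
static uint32_t bitbuf_read(bitbuf *b, unsigned n)
{
    assert(n >= 1 && n <= 32);
    bitbuf_refill(b);
    assert(b->bits >= n);            /* caller must not read past the end */
    uint32_t v = (uint32_t)(b->acc >> (64 - n));
    b->acc <<= n;
    b->bits -= n;
    return v;
}
```

Most calls then cost a shift and a subtract, with memory touched only once every 32 bits.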

Modes: 3 (companded 14-bit), 4 (raw 16-bit)
Backward compatible decoder for modes 1/2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Encoder:
- Phase 0.5: pre-transform noise estimation + adaptive quantization
- Phase 1: parallel wavelet transform (4 threads)
- Phase 1.8: parallel ANS pre-encoding (4 threads)
- Phase 2: serial bitstream with per-band VLC/ANS auto-selection
- Negative quant sentinel for 16-bit skip-uncompand path

Decoder:
- ANS mode dispatch (modes 1-4) with jans_decode_band_x4
- Negative quant → skip uncompanding in dequantization
- Component clamping for Q6-Q8 overflow prevention
- NEON-accelerated dequantization paths
- Production hardening: assert(0) → proper error returns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SDK (gpr.cpp):
- noise_remove/noise_restore pipeline with DNG unit conversion
- Auto-triggers noise_restore on decode when noise seed present
- 16-bit pixel format support in all conversion functions

CLI tools:
- gpr_tools: -A (ANS), -D (denoise), -R (noise replace), -F (FPN) flags
- gpr_batch.sh: production batch encoder with parallel jobs
- compare_quality: PSNR, SSIM, noise preservation, per-region analysis
- ans_test: ANS round-trip unit test
- fuzz_ans: libFuzzer target for ANS decode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI (.github/workflows/ci.yml):
- Ubuntu + macOS builds with smoke tests
- ANS unit test and compare_quality build verification
- VLC, ANS, and ANS+DN round-trip tests

Test data (data/test_sets/):
- 3-tier structure: smoke, medium, corner_cases
- High/low ISO, high/low entropy test categories
- Test suite script (data/tests/test_suite.sh)

Calibration tools (tools/):
- Phocus capture sequences for automated dark/flat frames
- GoPro USB calibration script
- Interactive calibration guide

Build: ans_test and compare_quality added to CMake

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review fixes:
- Add GoPro dual license headers to 4 new files
- Add -lm linkage for Linux builds (vc5_common, vc5_encoder)
- Fix backward compat: route existing GPR→RAW through original decode
  path (not _ex which applies noise_restore to files with NoiseProfile)
- Remove hardcoded personal paths from test scripts and docs
- Fix batch_encode.sh: add nproc fallback for Linux
- NEON auto-detect on any ARM64 (not just Apple)
- Remove CI workflow (separate branch — not for upstream PR)

Documentation:
- docs/gotchas.md: 11 integration notes (LSB rounding, scope, embedded
  memory, division cost, malloc count, stack, FP, thread safety)
- docs/compression-results.html: interactive charts with bar graphs,
  histogram of 1,231 Z8 compression ratios, speed tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dcliftreaves dcliftreaves force-pushed the feature/v2-final-2 branch 2 times, most recently from 26fb8e7 to 05afc82 Compare April 20, 2026 17:26
Embedded mode (-E / --Embedded CLI flag):
- Wavelet transform runs serially (no pthreads)
- Phase 1.8 parallel ANS pre-encode skipped entirely
- Bands encode inline in Phase 2, one at a time
- Peak memory: 113 MB (vs 451 MB with 4 threads)
- Output is byte-identical to normal mode

Arena allocator:
- Single malloc per band instead of 6 separate allocations
- Bump-allocates tokens, residual, and rANS buffers from one block
- 36 heap operations per image (was 216)

Reciprocal frequency table (rcp_freq[]):
- Precomputed for future division-free encode
- Currently unused (32-bit approximation not exact for full state range)

Also:
- Remove Jetraw brand references (replaced with generic descriptions)
- Remove Phocus capture sequences (vendor-specific tooling)
- Rename PHOCUS env var to CALIBRATION_CAPTURES in test script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>