GPR v2.0: 41% smaller files + embedded ARM encoder mode#57
Open
dcliftreaves wants to merge 8 commits into gopro:master
- PIXEL_FORMAT_RAW_RGGB_16 and GBRG_16 formats
- Updated wavelet, companding, and log curve for 16-bit range
- Prescale values {2,3,3} for 16-bit (vs {0,2,2} for 14-bit)
- Component clamping in decoder for Q6-Q8 overflow prevention
- PutBuffer overflow handling (assert → error return)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
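The "assert → error return" change listed in the commit above can be illustrated with a minimal sketch. The type and function names here (`out_buffer`, `out_buffer_put`, `write_status`) are hypothetical and are not taken from the GPR sources; the point is only that an overflowing write reports a status code to the caller instead of aborting the process.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes; the real codec has its own error enum. */
typedef enum { WRITE_OK = 0, WRITE_ERR_OVERFLOW = 1 } write_status;

/* Hypothetical fixed-capacity output buffer. */
typedef struct {
    uint8_t *data;
    size_t capacity;
    size_t length;
} out_buffer;

/* Append n bytes, or return an error and leave the buffer unchanged
   if they do not fit (instead of assert(0)). */
static write_status out_buffer_put(out_buffer *b, const uint8_t *src, size_t n)
{
    if (n > b->capacity - b->length) {
        return WRITE_ERR_OVERFLOW;
    }
    memcpy(b->data + b->length, src, n);
    b->length += n;
    return WRITE_OK;
}
```

A caller would check the returned status after every write and propagate it upward, so that a malformed or oversized stream fails cleanly.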
Noise model (noise_model.c/h):
- Poisson-Gaussian noise estimation from raw pixels
- noise_remove: quantize to noise floor (encoder, LUT-accelerated)
- noise_restore: PRNG triangular noise reconstruction (decoder)
- FPN polynomial model with row/column offsets and PRNU

Wavelet denoise (denoise.c):
- Phase 0.5: pre-transform signal-dependent MAD estimation
- BayesShrink adaptive per-band thresholding
- NoiseAwareRequantize: round coefficients to noise step size
- Prescale-aware wavelet noise gain computation

Tools: noise_analysis, calibrate, fpn_extract

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
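For readers unfamiliar with BayesShrink, the following is a minimal sketch of the textbook rule, not the code in denoise.c. It assumes the noise standard deviation `sigma_n` has already been estimated (the commit uses a MAD-based estimate) and computes the per-band threshold T = sigma_n^2 / sigma_x, where sigma_x^2 = max(mean(c^2) - sigma_n^2, 0). All function names are hypothetical, and a small Newton square root is used only so the sketch does not depend on libm.

```c
#include <stddef.h>

/* Newton-iteration square root so the sketch needs no libm. */
static double sqrt_newton(double v)
{
    if (v <= 0.0) return 0.0;
    double x = v > 1.0 ? v : 1.0;
    for (int i = 0; i < 60; i++) {
        x = 0.5 * (x + v / x);
    }
    return x;
}

/* Textbook BayesShrink threshold for one wavelet band. */
static double bayes_shrink_threshold(const double *coeffs, size_t n, double sigma_n)
{
    double energy = 0.0, max_abs = 0.0;
    for (size_t i = 0; i < n; i++) {
        double a = coeffs[i] < 0.0 ? -coeffs[i] : coeffs[i];
        energy += a * a;
        if (a > max_abs) max_abs = a;
    }
    double var_y = n ? energy / (double)n : 0.0;
    double var_x = var_y - sigma_n * sigma_n;
    if (var_x <= 0.0) {
        /* Band is indistinguishable from noise: threshold everything. */
        return max_abs;
    }
    return sigma_n * sigma_n / sqrt_newton(var_x);
}

/* Soft thresholding: shrink each coefficient toward zero by t. */
static double soft_threshold(double c, double t)
{
    if (c > t) return c - t;
    if (c < -t) return c + t;
    return 0.0;
}
```

Bands dominated by signal get a small threshold and are left almost untouched, while bands dominated by noise are shrunk heavily.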
ANS coder (ans.c, ans_joint.c):
- Joint RLV: single symbol per coefficient (160 joint symbols)
- 10 run classes × 16 magnitude classes with residual bits
- 4-way interleaved rANS for reduced pipeline stalls
- Packed decode table (sym+freq+cum_freq in one lookup)
- Fast bitbuf_read: word-aligned reads instead of bit-by-bit
- Per-band frequency tables for adaptive compression

Modes: 3 (companded 14-bit), 4 (raw 16-bit)
Backward compatible decoder for modes 1/2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Encoder:
- Phase 0.5: pre-transform noise estimation + adaptive quantization
- Phase 1: parallel wavelet transform (4 threads)
- Phase 1.8: parallel ANS pre-encoding (4 threads)
- Phase 2: serial bitstream with per-band VLC/ANS auto-selection
- Negative quant sentinel for 16-bit skip-uncompand path

Decoder:
- ANS mode dispatch (modes 1-4) with jans_decode_band_x4
- Negative quant → skip uncompanding in dequantization
- Component clamping for Q6-Q8 overflow prevention
- NEON-accelerated dequantization paths
- Production hardening: assert(0) → proper error returns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SDK (gpr.cpp):
- noise_remove/noise_restore pipeline with DNG unit conversion
- Auto-triggers noise_restore on decode when noise seed present
- 16-bit pixel format support in all conversion functions

CLI tools:
- gpr_tools: -A (ANS), -D (denoise), -R (noise replace), -F (FPN) flags
- gpr_batch.sh: production batch encoder with parallel jobs
- compare_quality: PSNR, SSIM, noise preservation, per-region analysis
- ans_test: ANS round-trip unit test
- fuzz_ans: libFuzzer target for ANS decode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI (.github/workflows/ci.yml):
- Ubuntu + macOS builds with smoke tests
- ANS unit test and compare_quality build verification
- VLC, ANS, and ANS+DN round-trip tests

Test data (data/test_sets/):
- 3-tier structure: smoke, medium, corner_cases
- High/low ISO, high/low entropy test categories
- Test suite script (data/tests/test_suite.sh)

Calibration tools (tools/):
- Phocus capture sequences for automated dark/flat frames
- GoPro USB calibration script
- Interactive calibration guide

Build: ans_test and compare_quality added to CMake

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review fixes:
- Add GoPro dual license headers to 4 new files
- Add -lm linkage for Linux builds (vc5_common, vc5_encoder)
- Fix backward compat: route existing GPR→RAW through original decode path (not _ex, which applies noise_restore to files with NoiseProfile)
- Remove hardcoded personal paths from test scripts and docs
- Fix batch_encode.sh: add nproc fallback for Linux
- NEON auto-detect on any ARM64 (not just Apple)
- Remove CI workflow (separate branch, not for upstream PR)

Documentation:
- docs/gotchas.md: 11 integration notes (LSB rounding, scope, embedded memory, division cost, malloc count, stack, FP, thread safety)
- docs/compression-results.html: interactive charts with bar graphs, histogram of 1,231 Z8 compression ratios, speed tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
26fb8e7 to 05afc82
Embedded mode (-E / --Embedded CLI flag):
- Wavelet transform runs serially (no pthreads)
- Phase 1.8 parallel ANS pre-encode skipped entirely
- Bands encode inline in Phase 2, one at a time
- Peak memory: 113 MB (vs 451 MB with 4 threads)
- Output is byte-identical to normal mode

Arena allocator:
- Single malloc per band instead of 6 separate allocations
- Bump-allocates tokens, residual, and rANS buffers from one block
- 36 heap operations per image (was 216)

Reciprocal frequency table (rcp_freq[]):
- Precomputed for future division-free encode
- Currently unused (32-bit approximation not exact for full state range)

Also:
- Remove Jetraw brand references (replaced with generic descriptions)
- Remove Phocus capture sequences (vendor-specific tooling)
- Rename PHOCUS env var to CALIBRATION_CAPTURES in test script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
05afc82 to 44bc5af
Summary
This PR extends the GPR codec with three production-validated features:
-E flag on desktop).
All features are opt-in via CLI flags. Without flags, the codec produces output identical to the existing version.
Compression Results
Per-band auto-selection guarantees output is never larger than VLC.
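The guarantee follows from taking the minimum per band. As a minimal sketch, assuming both candidate encodings of each band are available with known sizes and ignoring any per-band signaling overhead, the selection could look like the code below. The names `band_coder`, `select_band_coder`, and `total_selected_bytes` are hypothetical and not taken from the PR.

```c
#include <stddef.h>

/* Hypothetical tag for the entropy coder chosen for a band. */
typedef enum { CODER_VLC = 0, CODER_ANS = 1 } band_coder;

/* Pick the smaller candidate for one band; ties go to VLC. */
static band_coder select_band_coder(size_t vlc_bytes, size_t ans_bytes)
{
    return ans_bytes < vlc_bytes ? CODER_ANS : CODER_VLC;
}

/* Total payload size when every band independently takes the smaller
   candidate. Each term is <= the VLC size of that band, so the sum is
   <= the all-VLC total. */
static size_t total_selected_bytes(const size_t *vlc_bytes,
                                   const size_t *ans_bytes,
                                   size_t bands)
{
    size_t total = 0;
    for (size_t i = 0; i < bands; i++) {
        if (select_band_coder(vlc_bytes[i], ans_bytes[i]) == CODER_ANS) {
            total += ans_bytes[i];
        } else {
            total += vlc_bytes[i];
        }
    }
    return total;
}
```

If ANS happens to be larger for a particular band, that band simply falls back to VLC, so the worst case equals the all-VLC size under the stated assumption.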
Mass Validation: 1,231 Nikon Z8 Images
Zero failures. Min 3.3x, median 6.0x, max 18.8x.
Embedded ARM Viability (GoPro Camera Integration)
The codec includes an embedded mode (-E flag) designed for ARM SoC evaluation:
Decoder: Ready for On-Camera Use
Estimated decode throughput on Cortex-A78 (GP2-class): ~40 MP/s. A 27 MP HERO13 frame decodes in 0.67 seconds — fast enough for gallery browsing and thumbnail generation on-device.
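The per-frame figure is the frame size divided by the estimated throughput, using the two numbers quoted above:

```latex
t_{\text{decode}} \approx \frac{27\ \text{MP}}{40\ \text{MP/s}} = 0.675\ \text{s}
```

Because the throughput itself is an estimate, the result should be read as roughly two thirds of a second rather than a measured value.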
Encoder: Embedded Mode (-E)
Arena allocator: 1 malloc + 1 free per band (was 6 each). All encode buffers bump-allocated from a single contiguous block.
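A bump (arena) allocator of this kind can be sketched as follows. This is an illustration of the general technique, not the allocator in the PR: the `band_arena` type, the function names, and the 16-byte alignment are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-band arena: one malloc, many sub-allocations. */
typedef struct {
    uint8_t *base;  /* start of the single heap block */
    size_t size;    /* total bytes in the block */
    size_t used;    /* bytes handed out so far */
} band_arena;

/* One malloc for the whole band. Returns 0 on success, -1 on failure. */
static int arena_init(band_arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Bump-allocate bytes at a 16-byte aligned offset (alignment is an
   assumption). Returns NULL when the block is exhausted. */
static void *arena_alloc(band_arena *a, size_t bytes)
{
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->size || bytes > a->size - start) {
        return NULL;
    }
    a->used = start + bytes;
    return a->base + start;
}

/* One free releases every buffer carved from the arena. */
static void arena_free(band_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

In use, the encoder would size the arena for the band's worst case, carve the token, residual, and rANS buffers out of it with `arena_alloc`, and call `arena_free` once the band has been written.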
What This Means for GoPro Hardware
For a HERO13 Black shooting GPR at 27 MP:
Technical Approach
Joint RLV ANS Coder
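The ANS commit describes 10 run classes × 16 magnitude classes, giving 160 joint symbols with residual bits. A minimal sketch of one way such a joint alphabet could be indexed is shown below. The row-major packing and the bit-length magnitude classing are assumptions made for illustration; the actual mapping in ans_joint.c may differ.

```c
#include <stdint.h>

/* Alphabet dimensions from the commit message. */
enum {
    RUN_CLASSES = 10,
    MAG_CLASSES = 16,
    JOINT_SYMBOLS = RUN_CLASSES * MAG_CLASSES  /* 160 */
};

/* Pack a (run class, magnitude class) pair into one symbol index in
   [0, 160). Row-major packing is an assumption. */
static inline int joint_symbol(int run_class, int mag_class)
{
    return run_class * MAG_CLASSES + mag_class;
}

/* Recover both classes from a joint symbol. */
static inline void joint_split(int sym, int *run_class, int *mag_class)
{
    *run_class = sym / MAG_CLASSES;
    *mag_class = sym % MAG_CLASSES;
}

/* Assumed magnitude classing: class = bit length of |value|, capped at
   15. The bits below the leading one would travel as raw residual bits. */
static inline int magnitude_class(uint32_t mag)
{
    int cls = 0;
    while (mag != 0 && cls < MAG_CLASSES - 1) {
        mag >>= 1;
        cls++;
    }
    return cls;
}
```

With a single symbol per coefficient, the entropy coder performs one table lookup where a separate run/value scheme would need two.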
Noise-Aware Quantization
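The idea of quantizing to the noise floor and restoring triangular noise on decode can be sketched as below. This is an illustration under stated assumptions, not the code in noise_model.c: the Q8 fixed-point parameters (`gain_q8`, `k_q8`), the xorshift PRNG, and all function names are hypothetical. The Poisson-Gaussian model is taken as variance = gain × signal + read variance.

```c
#include <stdint.h>

/* Integer square root (digit-by-digit), so the sketch needs no libm. */
static uint32_t isqrt_u64(uint64_t v)
{
    uint64_t r = 0, bit = (uint64_t)1 << 62;
    while (bit > v) bit >>= 2;
    while (bit != 0) {
        if (v >= r + bit) { v -= r + bit; r = (r >> 1) + bit; }
        else { r >>= 1; }
        bit >>= 2;
    }
    return (uint32_t)r;
}

/* Step size = k * sigma, with sigma^2 = gain * pixel + read_var.
   gain_q8 and k_q8 are Q8 fixed point (256 == 1.0); both hypothetical. */
static uint32_t noise_step(uint32_t pixel, uint32_t gain_q8,
                           uint32_t read_var, uint32_t k_q8)
{
    uint64_t var = (((uint64_t)gain_q8 * pixel) >> 8) + read_var;
    uint64_t step = ((uint64_t)k_q8 * isqrt_u64(var)) >> 8;
    return step < 1 ? 1u : (uint32_t)step;
}

/* Encoder side: round the pixel to the nearest multiple of step. */
static uint32_t quantize_to_step(uint32_t pixel, uint32_t step)
{
    return ((pixel + step / 2) / step) * step;
}

/* xorshift32 PRNG (state must be nonzero); stands in for the codec's PRNG. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Decoder side: triangular noise in (-step, step), formed as the
   difference of two uniform draws in [0, step). */
static int32_t triangular_noise(uint32_t *state, uint32_t step)
{
    int32_t a = (int32_t)(xorshift32(state) % step);
    int32_t b = (int32_t)(xorshift32(state) % step);
    return a - b;
}
```

Seeding the PRNG from a value stored in the file makes the restored noise reproducible across decodes, which matches the commit's note that noise_restore triggers when a noise seed is present.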
Embedded Mode
-E flag: serial wavelet, no parallel pre-encode, arena allocator
Commits (8)
021ffdc 16-bit Bayer pixel format infrastructure
7ee77d5 Noise model, BayesShrink denoise, FPN calibration
a8c5865 Joint RLV ANS entropy coder with 4-way interleaved decode
78cfe1a Encoder/decoder pipeline integration
0041a49 SDK + CLI tools
fb1bab6 Test infrastructure and calibration tools
1026b05 Pre-PR review fixes (license, Linux -lm, backward compat)
4f6b5d9 Embedded encoder mode + arena allocator for ARM SoC
Full docs: architecture.md | gotchas.md | compression-results.html
Test Plan
🤖 Generated with Claude Code