High-performance GPU-accelerated converter for 12-bit JPEG sequences (.8ij format) to H.264/HEVC MP4 video files.
Converts proprietary .8ij files (12-bit JPEG sequences with custom headers) to standard MP4 video format using a hybrid CPU/GPU pipeline:
- CPU: Multi-threaded 12-bit JPEG decoding (parallel batch processing)
- GPU: CUDA kernels for bit depth normalization, gamma correction, and color space conversion
- GPU: Hardware-accelerated H.264/HEVC encoding via NVENC
The converter supports both 8-bit (H.264) and 10-bit (HEVC) output. It uses NVENC hardware encoding automatically and falls back to software encoding (8-bit only) if GPU acceleration is unavailable.
- NVIDIA GPU: NVENC-capable (Turing generation or newer recommended)
- Maxwell (GTX 900 series): NVENC 6.0
- Pascal (GTX 10 series): NVENC 7.0
- Turing (RTX 20 series, GTX 16 series): NVENC 7.0
- Ampere (RTX 30 series, A-series): NVENC 8.0
- Ada Lovelace (RTX 40 series): NVENC 8.1
- GPU Memory: Minimum 4GB VRAM (8GB+ recommended for 4K content)
- CPU: Multi-core processor (32+ threads recommended for optimal JPEG decoding)
- CUDA Toolkit: 11.x or 12.x (the release must support your GPU architecture)
- NVIDIA Driver: 530+ recommended (for latest NVENC features)
- Python: 3.8 or later
- FFmpeg: With NVENC support (default path: /opt/8i/bin/ffmpeg)
- H.264 (NVENC): Maximum 4096×4096 pixels
- HEVC (NVENC): Maximum 8192×8192 pixels
- Converter automatically selects HEVC for resolutions exceeding H.264 limits
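The selection rule above can be sketched as a small function. This is a minimal illustration, not the converter's actual code: the name `select_codec` and the per-dimension check are assumptions, while the 4096 and 8192 limits come from the list above.

```python
H264_MAX = 4096  # max width/height for H.264 (NVENC), from the limits above
HEVC_MAX = 8192  # max width/height for HEVC (NVENC), from the limits above


def select_codec(width: int, height: int, bit_depth: int = 8) -> str:
    """Pick an NVENC encoder name for the given frame size and bit depth.

    Hypothetical helper: the real converter may apply the rule differently.
    """
    if bit_depth not in (8, 10):
        raise ValueError("bit_depth must be 8 or 10")
    longest_side = max(width, height)
    if longest_side > HEVC_MAX:
        raise ValueError(f"{width}x{height} exceeds the HEVC limit of {HEVC_MAX}")
    # 10-bit output always uses HEVC; oversized 8-bit frames are promoted to HEVC.
    if bit_depth == 10 or longest_side > H264_MAX:
        return "hevc_nvenc"
    return "h264_nvenc"
```

For example, `select_codec(1920, 1080)` yields `h264_nvenc`, while `select_codec(8192, 4096)` yields `hevc_nvenc` even at 8-bit.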
Download and install CUDA 11.x or 12.x from NVIDIA:
# Verify CUDA installation
nvcc --version
nvidia-smi

python3 -m venv venv
source venv/bin/activate

# For CUDA 11.x
pip install -r requirements.txt
# For CUDA 12.x (edit requirements.txt first)
# Change: cupy-cuda11x to cupy-cuda12x
pip install -r requirements.txt

/opt/8i/bin/ffmpeg -encoders | grep nvenc

Expected output should include:
- h264_nvenc (for 8-bit encoding)
- hevc_nvenc (for 10-bit encoding)
PyNvVideoCodec provides significantly better performance by eliminating CPU<->GPU memory copies:
pip install pycuda
# Download and install PyNvVideoCodec from:
# https://github.com/NVIDIA/VideoProcessingFramework

# Basic conversion (8-bit H.264, the default)
python3 converter_8ij_to_mp4.py input.8ij output.mp4

# 10-bit HEVC output
python3 converter_8ij_to_mp4.py input.8ij output.mp4 --bit-depth 10

# All options specified
python3 converter_8ij_to_mp4.py input.8ij output.mp4 \
--bit-depth 10 \
--workers 32 \
--gpu 0 \
--batch 16 \
--ffmpeg /opt/8i/bin/ffmpeg

# Process only the first 100 frames (testing)
python3 converter_8ij_to_mp4.py input.8ij test.mp4 --max-frames 100

# Batch-convert a directory
python3 batch_convert.py /path/to/input/dir /path/to/output/dir

# Convert 2 files at once on same GPU
python3 batch_convert.py /path/to/input/dir /path/to/output/dir --parallel 2
# Use multiple GPUs (2 GPUs, 4 parallel jobs)
python3 batch_convert.py /path/to/input/dir /path/to/output/dir \
--parallel 4 --gpus 0,1

# Skip files that are already converted
python3 batch_convert.py /path/to/input/dir /path/to/output/dir --resume

# Preview the conversion plan without converting
python3 batch_convert.py /path/to/input/dir /path/to/output/dir --dry-run

# Monitor all parallel conversion logs
./monitor_logs.sh /path/to/output/dir
# Watch two specific conversions side-by-side
./watch_parallel.sh /path/to/output/dir

| Option | Default | Description |
|---|---|---|
| `--bit-depth` | 8 | Output bit depth: 8 (H.264) or 10 (HEVC) |
| `--workers` | 32 | CPU threads for JPEG decoding |
| `--gpu` | 0 | GPU device ID |
| `--batch` | 16 | Number of frames per batch |
| `--max-frames` | None | Process only first N frames (testing) |
| `--ffmpeg` | /opt/8i/bin/ffmpeg | Path to ffmpeg binary |

| Option | Default | Description |
|---|---|---|
| `--parallel` | 1 | Number of files to convert simultaneously |
| `--gpus` | 0 | Comma-separated GPU IDs (e.g., "0,1") |
| `--workers` | 32 | CPU threads per conversion job |
| `--batch` | 16 | Frames per batch |
| `--resume` | False | Skip already converted files |
| `--dry-run` | False | Preview conversion plan |
| `--ffmpeg` | /opt/8i/bin/ffmpeg | Path to ffmpeg binary |

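As an illustration of what `--resume` implies, the sketch below lists the jobs that still need to run. It is not taken from `batch_convert.py`: the function name `pending_jobs`, the `<stem>.mp4` output naming, and the "non-empty file counts as done" test are all assumptions.

```python
from pathlib import Path


def pending_jobs(input_dir, output_dir, resume=False):
    """Return (source, target) pairs that still need converting.

    Hypothetical helper: assumes each input.8ij maps to output_dir/input.mp4.
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    jobs = []
    for src in sorted(in_dir.glob("*.8ij")):
        dst = out_dir / (src.stem + ".mp4")
        # With resume enabled, a non-empty existing output is treated as done.
        if resume and dst.exists() and dst.stat().st_size > 0:
            continue
        jobs.append((src, dst))
    return jobs
```

Printing the result of `pending_jobs(...)` without running anything is also roughly what a dry run would show.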
- NVENC (GPU): 80-300 fps (depending on GPU model and resolution)
- libx264 (CPU fallback): 5-10 fps
- GPU VRAM: ~2-4GB per conversion (depends on resolution)
- System RAM: ~2-4GB per conversion (JPEG decoding buffers)
- Single stream: ~20 fps on 20-core CPU + RTX A5000
- Multiple streams: Up to 80 fps with proper parallelization
- Batch size: Increase --batch for higher throughput (higher VRAM usage)
- CPU workers: Match to available CPU cores (default: 32)
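To get a feel for how batch size drives memory, the following back-of-the-envelope sketch counts only the two largest buffers the pipeline description implies: the decoded uint16 frames and a float32 working copy. The function name and the choice of buffers are assumptions; real usage also includes encoder and output buffers.

```python
def batch_buffer_bytes(width, height, batch, channels=3):
    """Rough size of the per-batch frame buffers, in bytes.

    Illustrative estimate only; it ignores encoder and output buffers.
    """
    samples = width * height * channels * batch
    return {
        "uint16_input": samples * 2,   # decoded 12-bit samples stored as uint16
        "float32_work": samples * 4,   # normalized float32 intermediate
    }


sizes = batch_buffer_bytes(3840, 2160, batch=16)
total_gb = sum(sizes.values()) / 1e9
print(f"{total_gb:.2f} GB")  # about 2.39 GB for 16 frames of 3840x2160 RGB
```

Under these assumptions a 16-frame batch of 3840×2160 RGB lands in the same range as the ~2-4GB figure quoted above, and halving `--batch` halves the estimate.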
When using multiple GPUs with batch conversion:
# 4 parallel conversions on 2 GPUs (2 jobs per GPU)
python3 batch_convert.py input/ output/ --parallel 4 --gpus 0,1

The batch converter automatically distributes jobs across GPUs in round-robin fashion.
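Round-robin distribution can be sketched in a few lines. The helper name `assign_gpus` is hypothetical and the batch converter's real scheduling code is not shown in this README; the sketch only illustrates the assignment order.

```python
from itertools import cycle


def assign_gpus(files, gpu_ids):
    """Pair each input file with a GPU ID, cycling through gpu_ids in order."""
    if not gpu_ids:
        raise ValueError("at least one GPU ID is required")
    gpu_cycle = cycle(gpu_ids)
    return [(name, next(gpu_cycle)) for name in files]


print(assign_gpus(["a.8ij", "b.8ij", "c.8ij", "d.8ij"], [0, 1]))
# [('a.8ij', 0), ('b.8ij', 1), ('c.8ij', 0), ('d.8ij', 1)]
```

With `--parallel 4 --gpus 0,1` this gives two concurrent jobs per GPU.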
- Multi-threaded 12-bit JPEG decoding using imagecodecs/pylibjpeg
- Batch parallel processing (default: 32 workers)
- Output: uint16 arrays (0-4095 range)
Custom CUDA kernels perform:
- Normalize 12-bit → float32 (0.0-1.0)
- Apply gamma correction (default: γ = 1/2.2 ≈ 0.45)
- Scale to target bit depth:
  - 8-bit: 0-255 (uint8)
  - 10-bit: 0-1023 (uint16)
- For 8-bit: Convert RGB → RGBA (add alpha channel)
- For 10-bit: Convert RGB → P010 (YUV420 10-bit)
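The first three kernel steps reduce to simple per-sample arithmetic. The scalar, CPU-only sketch below shows that arithmetic for a single sample; it is not the CUDA kernel itself, and the function name `tone_map` and the round-to-nearest choice are assumptions.

```python
def tone_map(sample: int, bit_depth: int = 8, gamma: float = 1 / 2.2) -> int:
    """Map one 12-bit sample (0-4095) to an 8-bit or 10-bit output value."""
    if not 0 <= sample <= 4095:
        raise ValueError("sample must be in the 12-bit range 0-4095")
    if bit_depth not in (8, 10):
        raise ValueError("bit_depth must be 8 or 10")
    normalized = sample / 4095.0            # 12-bit -> float in [0.0, 1.0]
    corrected = normalized ** gamma         # gamma correction
    max_out = (1 << bit_depth) - 1          # 255 for 8-bit, 1023 for 10-bit
    return int(round(corrected * max_out))  # scale to the target bit depth
```

Black and white map to the ends of the output range (`tone_map(0)` is 0, `tone_map(4095, 10)` is 1023), and mid-range inputs are brightened because the exponent is below 1.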
Three encoding paths (tried in order):

PyNvVideoCodec (zero-copy):
- Direct GPU-to-encoder using CUDA Array Interface
- Eliminates CPU<->GPU memory transfers
- Requires: PyNvVideoCodec + PyCUDA

FFmpeg NVENC:
- Pipe raw frames to FFmpeg NVENC encoder
- Hardware-accelerated H.264/HEVC encoding
- Requires: FFmpeg with NVENC support

libx264 (CPU fallback):
- CPU-based software encoding
- 8-bit only (10-bit requires NVENC)
- Used when GPU encoding unavailable
The converter implements a three-tier fallback strategy:
- Try PyNvVideoCodec: Zero-copy GPU encoding
  - If unavailable: Try FFmpeg NVENC
- Try NVENC via FFmpeg: Hardware encoding with subprocess
  - If NVENC unavailable: Try libx264 (8-bit only)
- Use libx264: CPU software encoding (8-bit only)
  - Displays warning with expected performance
10-bit output requires HEVC with NVENC. If NVENC is unavailable:
RuntimeError: Cannot use 10-bit output with libx264 fallback encoder.
10-bit encoding requires NVENC with HEVC.
Please ensure ffmpeg with NVENC support is available.
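The fallback order and the 10-bit restriction can be summarized in one decision function. This is a sketch under assumptions: the function name, its boolean parameters, and the returned labels are illustrative, and how the converter actually probes for each backend is not shown here.

```python
def choose_encoder(bit_depth: int, has_pynvvideocodec: bool, has_ffmpeg_nvenc: bool) -> str:
    """Return the first usable encoding path, following the three-tier order."""
    if has_pynvvideocodec:
        return "pynvvideocodec"   # tier 1: zero-copy GPU encoding
    if has_ffmpeg_nvenc:
        return "ffmpeg_nvenc"     # tier 2: NVENC through an FFmpeg subprocess
    if bit_depth == 10:
        # Tier 3 (libx264) is 8-bit only, so 10-bit output cannot fall back.
        raise RuntimeError(
            "Cannot use 10-bit output with libx264 fallback encoder."
        )
    return "libx264"              # tier 3: CPU software encoding
```

Calling `choose_encoder(10, False, False)` raises the error shown above, while `choose_encoder(8, False, False)` returns `libx264`.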
- Try imagecodecs: Primary 12-bit JPEG decoder
  - If 12-bit support unavailable: Try pylibjpeg
- Try pylibjpeg: Fallback 12-bit JPEG decoder
  - If unavailable: Raise error with installation instructions
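A simplified version of this decoder selection is sketched below. It only checks whether each package can be imported, which is weaker than the converter's check for actual 12-bit support in imagecodecs; the function name `pick_decoder` and the injectable `is_available` callback are assumptions made to keep the example testable.

```python
import importlib.util

DECODER_PREFERENCE = ("imagecodecs", "pylibjpeg")  # primary first, then fallback


def _importable(name: str) -> bool:
    """True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def pick_decoder(is_available=_importable) -> str:
    """Return the name of the first available decoder package."""
    for name in DECODER_PREFERENCE:
        if is_available(name):
            return name
    raise ImportError(
        "No 12-bit JPEG decoder found. Install one with: "
        "pip install pylibjpeg pylibjpeg-libjpeg"
    )
```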
- Container for 12-bit JPEG sequences
- Custom format with stripped JPEG headers:
  - Missing: SOI marker (FF D8) and APP0 marker (FF E0)
  - Starts with: Length field (00 10) + JFIF identifier
- Frame structure: 16-byte header + JPEG data
  - Header: "8IJ1" magic (4 bytes) + frame size (4 bytes) + frame index (4 bytes) + reserved (4 bytes)
- Companion .8ii file contains metadata (chunk-based binary format)
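The frame layout above maps naturally onto `struct`. The sketch below reads one frame and restores the missing SOI and APP0 markers so the payload becomes a standard JPEG stream. Two details are assumptions because the format description does not state them: the integers are treated as little-endian, and the frame size is taken to count only the JPEG payload, not the 16-byte header.

```python
import struct

HEADER = struct.Struct("<4sIII")      # magic, frame size, frame index, reserved
MAGIC = b"8IJ1"
JPEG_PREFIX = b"\xff\xd8\xff\xe0"     # SOI + APP0 markers stripped from .8ij frames


def read_frame(buf: bytes, offset: int = 0):
    """Parse one frame; return (frame_index, jpeg_bytes, next_offset)."""
    magic, size, index, _reserved = HEADER.unpack_from(buf, offset)
    if magic != MAGIC:
        raise ValueError(f"bad magic {magic!r} at offset {offset}")
    start = offset + HEADER.size
    payload = buf[start:start + size]
    if len(payload) != size:
        raise ValueError("truncated frame")
    # Re-attach the stripped markers so a regular JPEG decoder accepts the data.
    return index, JPEG_PREFIX + payload, start + size
```

Calling `read_frame` repeatedly with the returned `next_offset` walks through all frames in a buffer.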
- 8-bit: H.264 High Profile, YUV420p, MP4 container
- 10-bit: HEVC Main10 Profile, YUV420p10le, MP4 container
- GOP size: Matches frame rate (keyframe every second)
- Rate control: VBR with CQ=19 (high quality)
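For the FFmpeg NVENC path, these output settings correspond to a command line along the following lines. The exact arguments the converter passes are not shown in this README, so treat the flag selection as an assumption: the input pixel formats (`rgba`, `p010le`) follow the kernel output described earlier, and `-rc vbr`, `-cq`, `-g`, and `-profile:v` are standard FFmpeg/NVENC options.

```python
def build_nvenc_command(ffmpeg, width, height, fps, bit_depth, output):
    """Assemble an FFmpeg argv list that reads raw frames from stdin."""
    if bit_depth == 8:
        in_fmt, codec, profile, out_fmt = "rgba", "h264_nvenc", "high", "yuv420p"
    elif bit_depth == 10:
        in_fmt, codec, profile, out_fmt = "p010le", "hevc_nvenc", "main10", "p010le"
    else:
        raise ValueError("bit_depth must be 8 or 10")
    return [
        ffmpeg, "-y",
        "-f", "rawvideo", "-pix_fmt", in_fmt,       # raw frames piped on stdin
        "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
        "-c:v", codec, "-profile:v", profile,
        "-rc", "vbr", "-cq", "19",                  # VBR with constant quality 19
        "-g", str(fps),                             # one keyframe per second
        "-pix_fmt", out_fmt,
        output,
    ]
```

The resulting list can be handed to `subprocess.Popen(..., stdin=subprocess.PIPE)`, with each processed frame written to the pipe.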
# Check if NVENC encoder exists
/opt/8i/bin/ffmpeg -encoders | grep nvenc
# Check NVIDIA driver version
nvidia-smi
# Verify CUDA libraries are in LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib64:/usr/lib:/opt/8i/lib:$LD_LIBRARY_PATH

- Reduce batch size: --batch 8
- Use single GPU per conversion
- Check VRAM usage: nvidia-smi

# Rebuild PyAV against system FFmpeg
./fix_nvenc.sh

# Install fallback decoder
pip install pylibjpeg pylibjpeg-libjpeg
# Or rebuild imagecodecs with 12-bit support
pip install imagecodecs --no-binary imagecodecs

Check that:
- NVENC is available in FFmpeg
- NVIDIA driver version is 530+
- GPU has sufficient VRAM
- No other processes using GPU
- Check individual job logs in <output_dir>/.conversion_logs/
- Monitor with: ./monitor_logs.sh <output_dir>
- Kill hung process and use --resume to continue
The converter uses several strategies to reduce memory usage and transfer overhead:
- Pinned Memory: 2x faster CPU<->GPU transfers
- Batch Processing: Amortizes overhead across multiple frames
- Immediate Cleanup: Frees large arrays after each stage
- Zero-Copy Path: PyNvVideoCodec eliminates redundant transfers
- Single conversion: Multi-threaded JPEG decoding + single GPU stream
- Batch conversion: Multi-process with one converter per parallel job
- GPU allocation: Round-robin distribution across specified GPUs
- Input: 12-bit RGB (uint16, 0-4095)
- Gamma correction: Compensate for 12-bit sensor gamma
- Output color space:
  - 8-bit: RGB → YUV420p (NVENC handles conversion)
  - 10-bit: RGB → P010 (manual YUV420 with 2x2 chroma subsampling)
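The 10-bit path can be illustrated on a single 2x2 pixel block: each pixel keeps its own luma value, while the four chroma values are averaged into one Cb/Cr pair. This scalar sketch is not the CUDA kernel. The BT.709 coefficients and full-range scaling are assumptions (the README does not state which matrix or range is used); the left shift by 6 reflects the P010 layout, which stores each 10-bit value in the upper bits of a 16-bit word.

```python
KR, KB = 0.2126, 0.0722   # BT.709 luma coefficients (assumed matrix)
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b, max_val=1023):
    """Full-range RGB -> YCbCr for one 10-bit pixel (floats, not yet rounded)."""
    y = KR * r + KG * g + KB * b
    mid = (max_val + 1) / 2
    cb = (b - y) / (2 * (1 - KB)) + mid
    cr = (r - y) / (2 * (1 - KR)) + mid
    return y, cb, cr


def clamp10(value):
    """Round and clamp to the 10-bit range."""
    return max(0, min(1023, int(round(value))))


def block_to_p010(block):
    """Convert a 2x2 block of (r, g, b) pixels to P010 words.

    Returns (four luma words, (cb word, cr word)).
    """
    converted = [rgb_to_ycbcr(*px) for px in block]
    luma = [clamp10(y) << 6 for y, _, _ in converted]
    cb = clamp10(sum(c[1] for c in converted) / 4) << 6   # 2x2 chroma average
    cr = clamp10(sum(c[2] for c in converted) / 4) << 6
    return luma, (cb, cr)
```

A uniform mid-gray block, for instance, produces four identical luma words and neutral chroma.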
This project is provided as-is for internal use. Refer to your organization's licensing terms.
Recent optimizations have achieved:
- 20 fps for single stream conversion
- 80 fps maximum on multi-core system (20 cores + RTX A5000)
- Reduced CPU usage through efficient GPU memory pinning
- 12-bit JPEG decoding at ~70 fps (with further GPU optimizations pending)
Commit history shows progressive performance improvements targeting both CPU and GPU bottlenecks.