
GPU-accelerated utility to convert proprietary file formats to standard encodings (both 10-bit and 8-bit)


8i-labs/conversion_utility


8ij to MP4 Converter

High-performance GPU-accelerated converter for 12-bit JPEG sequences (.8ij format) to H.264/HEVC MP4 video files.

Overview

Converts proprietary .8ij files (12-bit JPEG sequences with custom headers) to standard MP4 video format using a hybrid CPU/GPU pipeline:

  1. CPU: Multi-threaded 12-bit JPEG decoding (parallel batch processing)
  2. GPU: CUDA kernels for bit depth normalization, gamma correction, and color space conversion
  3. GPU: Hardware-accelerated H.264/HEVC encoding via NVENC

The converter supports both 8-bit (H.264) and 10-bit (HEVC) output. It encodes with NVENC when available and falls back to software encoding (libx264, 8-bit only) if GPU acceleration is unavailable.

Requirements

Hardware

  • NVIDIA GPU: NVENC-capable (Turing generation or newer recommended)
    • Maxwell (GTX 900 series): NVENC 6.0
    • Pascal (GTX 10 series): NVENC 7.0
    • Turing (RTX 20 series, GTX 16 series): NVENC 7.0
    • Ampere (RTX 30 series, A-series): NVENC 8.0
    • Ada Lovelace (RTX 40 series): NVENC 8.1
  • GPU Memory: Minimum 4GB VRAM (8GB+ recommended for 4K content)
  • CPU: Multi-core processor (32+ threads recommended for optimal JPEG decoding)

Software

  • CUDA Toolkit: 11.x or 12.x (choose a release that supports your GPU architecture)
  • NVIDIA Driver: 530+ recommended (for latest NVENC features)
  • Python: 3.8 or later
  • FFmpeg: With NVENC support (default path: /opt/8i/bin/ffmpeg)

Resolution Limits

  • H.264 (NVENC): Maximum 4096×4096 pixels
  • HEVC (NVENC): Maximum 8192×8192 pixels
  • Converter automatically selects HEVC for resolutions exceeding H.264 limits
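
The selection rule above can be sketched in a few lines. This is an illustration only, not the converter's actual code: the function name `choose_codec` and the per-side interpretation of the limits are assumptions.

```python
H264_MAX = 4096   # per-side NVENC limit for H.264, from the list above
HEVC_MAX = 8192   # per-side NVENC limit for HEVC, from the list above

def choose_codec(width: int, height: int, bit_depth: int = 8) -> str:
    """Return 'h264' or 'hevc' for a frame size and output bit depth.

    Hypothetical helper: 10-bit output always maps to HEVC, and 8-bit
    output switches to HEVC once either side exceeds the H.264 limit.
    """
    if bit_depth not in (8, 10):
        raise ValueError("bit_depth must be 8 or 10")
    longest = max(width, height)
    if longest > HEVC_MAX:
        raise ValueError(f"{width}x{height} exceeds the HEVC NVENC limit")
    if bit_depth == 10 or longest > H264_MAX:
        return "hevc"
    return "h264"
```

For example, `choose_codec(6000, 3000)` selects HEVC even for 8-bit output because the width exceeds 4096.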

Installation

1. Install CUDA Toolkit

Download and install CUDA 11.x or 12.x from NVIDIA:

# Verify CUDA installation
nvcc --version
nvidia-smi

2. Create Python Virtual Environment

python3 -m venv venv
source venv/bin/activate

3. Install Python Dependencies

# For CUDA 11.x
pip install -r requirements.txt

# For CUDA 12.x (edit requirements.txt first)
# Change: cupy-cuda11x to cupy-cuda12x
pip install -r requirements.txt

4. Verify FFmpeg NVENC Support

/opt/8i/bin/ffmpeg -encoders | grep nvenc

Expected output should include:

  • h264_nvenc (for 8-bit encoding)
  • hevc_nvenc (for 10-bit encoding)
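
The same check can be scripted. The sketch below parses the text printed by `ffmpeg -encoders`; the function names are hypothetical and the converter may detect NVENC differently.

```python
import subprocess

def nvenc_encoders(encoder_listing: str) -> set:
    """Extract NVENC encoder names from `ffmpeg -encoders` output."""
    found = set()
    for line in encoder_listing.splitlines():
        fields = line.split()
        # Encoder rows look like: " V....D h264_nvenc  NVIDIA NVENC H.264 encoder"
        if len(fields) >= 2 and fields[1].endswith("_nvenc"):
            found.add(fields[1])
    return found

def check_ffmpeg(ffmpeg_path: str = "/opt/8i/bin/ffmpeg") -> set:
    """Run ffmpeg and return the NVENC encoders it reports."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return nvenc_encoders(result.stdout)
```

If `check_ffmpeg()` returns a set without `hevc_nvenc`, 10-bit output will not be possible with that binary.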

5. (Optional) Install PyNvVideoCodec for Zero-Copy Encoding

PyNvVideoCodec provides significantly better performance by eliminating CPU<->GPU memory copies:

pip install pycuda
# Download and install PyNvVideoCodec from:
# https://github.com/NVIDIA/VideoProcessingFramework

Usage

Single File Conversion

8-bit H.264 (default)

python3 converter_8ij_to_mp4.py input.8ij output.mp4

10-bit HEVC (higher quality)

python3 converter_8ij_to_mp4.py input.8ij output.mp4 --bit-depth 10

With Custom Settings

python3 converter_8ij_to_mp4.py input.8ij output.mp4 \
  --bit-depth 10 \
  --workers 32 \
  --gpu 0 \
  --batch 16 \
  --ffmpeg /opt/8i/bin/ffmpeg

Test on Subset

python3 converter_8ij_to_mp4.py input.8ij test.mp4 --max-frames 100

Batch Conversion

Sequential (one file at a time)

python3 batch_convert.py /path/to/input/dir /path/to/output/dir

Parallel (multiple files simultaneously)

# Convert 2 files at once on same GPU
python3 batch_convert.py /path/to/input/dir /path/to/output/dir --parallel 2

# Use multiple GPUs (2 GPUs, 4 parallel jobs)
python3 batch_convert.py /path/to/input/dir /path/to/output/dir \
  --parallel 4 --gpus 0,1

Resume Previous Batch

python3 batch_convert.py /path/to/input/dir /path/to/output/dir --resume

Dry Run (preview without converting)

python3 batch_convert.py /path/to/input/dir /path/to/output/dir --dry-run

Monitoring Parallel Conversions

# Monitor all parallel conversion logs
./monitor_logs.sh /path/to/output/dir

# Watch two specific conversions side-by-side
./watch_parallel.sh /path/to/output/dir

Command-Line Options

converter_8ij_to_mp4.py

Option        Default              Description
--bit-depth   8                    Output bit depth: 8 (H.264) or 10 (HEVC)
--workers     32                   CPU threads for JPEG decoding
--gpu         0                    GPU device ID
--batch       16                   Number of frames per batch
--max-frames  None                 Process only the first N frames (for testing)
--ffmpeg      /opt/8i/bin/ffmpeg   Path to ffmpeg binary

batch_convert.py

Option      Default              Description
--parallel  1                    Number of files to convert simultaneously
--gpus      0                    Comma-separated GPU IDs (e.g., "0,1")
--workers   32                   CPU threads per conversion job
--batch     16                   Frames per batch
--resume    False                Skip already converted files
--dry-run   False                Preview conversion plan
--ffmpeg    /opt/8i/bin/ffmpeg   Path to ffmpeg binary

Performance

Encoding Performance (Typical)

  • NVENC (GPU): 80-300 fps (depending on GPU model and resolution)
  • libx264 (CPU fallback): 5-10 fps

Memory Usage

  • GPU VRAM: ~2-4GB per conversion (depends on resolution)
  • System RAM: ~2-4GB per conversion (JPEG decoding buffers)

Throughput Optimization

  • Single stream: ~20 fps on 20-core CPU + RTX A5000
  • Multiple streams: Up to 80 fps with proper parallelization
  • Batch size: Increase --batch for higher throughput (higher VRAM usage)
  • CPU workers: Set --workers to the number of available CPU cores (default: 32)

Multi-GPU Scaling

When using multiple GPUs with batch conversion:

# 4 parallel conversions on 2 GPUs (2 jobs per GPU)
python3 batch_convert.py input/ output/ --parallel 4 --gpus 0,1

The batch converter automatically distributes jobs across GPUs in round-robin fashion.
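
Round-robin assignment means the Nth job goes to GPU N modulo the number of GPUs. A minimal sketch, with `assign_gpus` as a hypothetical name rather than a function from batch_convert.py:

```python
from itertools import cycle

def assign_gpus(files, gpu_ids):
    """Pair each input file with a GPU ID, cycling through gpu_ids in order."""
    if not gpu_ids:
        raise ValueError("at least one GPU ID is required")
    return list(zip(files, cycle(gpu_ids)))

# The value of --gpus "0,1" would be parsed into [0, 1] before assignment.
gpu_ids = [int(part) for part in "0,1".split(",")]
plan = assign_gpus(["a.8ij", "b.8ij", "c.8ij", "d.8ij"], gpu_ids)
```

With four files and two GPUs, each GPU receives two jobs, which matches the `--parallel 4 --gpus 0,1` example above.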

Encoding Pipeline

Stage 1: JPEG Decoding (CPU)

  • Multi-threaded 12-bit JPEG decoding using imagecodecs/pylibjpeg
  • Batch parallel processing (default: 32 workers)
  • Output: uint16 arrays (0-4095 range)

Stage 2: GPU Conversion (CUDA)

Custom CUDA kernels perform:

  1. Normalize 12-bit → float32 (0.0-1.0)
  2. Apply gamma correction (default exponent: γ = 1/2.2)
  3. Scale to target bit depth:
    • 8-bit: 0-255 (uint8)
    • 10-bit: 0-1023 (uint16)
  4. For 8-bit: Convert RGB → RGBA (add alpha channel)
  5. For 10-bit: Convert RGB → P010 (YUV420 10-bit)
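
Steps 1 to 3 amount to a per-sample transfer function. The scalar Python sketch below shows the arithmetic only; the real work happens in CUDA kernels, and the rounding and clamping choices here are assumptions.

```python
def convert_sample(value: int, bit_depth: int = 8, gamma: float = 1 / 2.2) -> int:
    """Map one 12-bit sample (0-4095) to an 8-bit or 10-bit code value."""
    if bit_depth not in (8, 10):
        raise ValueError("bit_depth must be 8 or 10")
    clamped = min(max(value, 0), 4095)
    normalized = clamped / 4095.0          # step 1: 12-bit -> 0.0-1.0
    corrected = normalized ** gamma        # step 2: gamma correction
    peak = (1 << bit_depth) - 1            # 255 for 8-bit, 1023 for 10-bit
    return int(round(corrected * peak))    # step 3: scale (rounding is assumed)
```

Because the exponent is below 1, mid-range input values are lifted: a half-scale 12-bit sample maps well above half of the output range.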

Stage 3: Video Encoding (GPU)

Three encoding paths (tried in order):

1. PyNvVideoCodec (Fastest - Zero-Copy)

  • Direct GPU-to-encoder using CUDA Array Interface
  • Eliminates CPU<->GPU memory transfers
  • Requires: PyNvVideoCodec + PyCUDA

2. NVENC via FFmpeg (Fast - Subprocess Pipe)

  • Pipe raw frames to FFmpeg NVENC encoder
  • Hardware-accelerated H.264/HEVC encoding
  • Requires: FFmpeg with NVENC support

3. libx264 via PyAV (Slow - Software Fallback)

  • CPU-based software encoding
  • 8-bit only (10-bit requires NVENC)
  • Used when GPU encoding unavailable

Fallback Behavior

The converter implements a three-tier fallback strategy:

GPU Encoding Fallback

  1. Try PyNvVideoCodec: Zero-copy GPU encoding
    • If unavailable: Try FFmpeg NVENC
  2. Try NVENC via FFmpeg: Hardware encoding with subprocess
    • If NVENC unavailable: Try libx264 (8-bit only)
  3. Use libx264: CPU software encoding (8-bit only)
    • Displays warning with expected performance

10-bit Encoding Requirements

10-bit output requires HEVC with NVENC. If NVENC is unavailable:

RuntimeError: Cannot use 10-bit output with libx264 fallback encoder.
10-bit encoding requires NVENC with HEVC.
Please ensure ffmpeg with NVENC support is available.
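
The fallback order and the 10-bit restriction can be summarized as a small selection function. This is a sketch under assumptions: the backend labels and `pick_encoder` are illustrative names, not identifiers from the converter.

```python
# Backends in the order described above (labels are hypothetical).
ENCODER_ORDER = ("pynvvideocodec", "ffmpeg_nvenc", "libx264")

def pick_encoder(available, bit_depth=8):
    """Return the first usable backend, refusing 10-bit output on libx264."""
    for name in ENCODER_ORDER:
        if name not in available:
            continue
        if name == "libx264" and bit_depth == 10:
            raise RuntimeError(
                "Cannot use 10-bit output with libx264 fallback encoder."
            )
        return name
    raise RuntimeError("No usable encoder found")
```

With only libx264 available, an 8-bit request succeeds and a 10-bit request raises the error shown above.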

JPEG Decoding Fallback

  1. Try imagecodecs: Primary 12-bit JPEG decoder
    • If 12-bit support unavailable: Try pylibjpeg
  2. Try pylibjpeg: Fallback 12-bit JPEG decoder
    • If unavailable: Raise error with installation instructions

File Format Details

Input Format (.8ij)

  • Container for 12-bit JPEG sequences
  • Custom format with stripped JPEG headers:
    • Missing: SOI marker (FF D8) and APP0 marker (FF E0)
    • Starts with: Length field (00 10) + JFIF identifier
  • Frame structure: 16-byte header + JPEG data
    • Header: "8IJ1" magic (4 bytes) + frame size (4 bytes) + frame index (4 bytes) + reserved (4 bytes)
  • Companion .8ii file contains metadata (chunk-based binary format)
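
Reading one frame therefore means parsing the 16-byte header and prepending the two missing markers. The sketch below assumes little-endian integers and that the frame size counts only the JPEG payload; neither is stated above, so verify both against real files.

```python
import struct

FRAME_HEADER = struct.Struct("<4sIII")  # magic, size, index, reserved (endianness assumed)
JPEG_PREFIX = b"\xff\xd8\xff\xe0"       # SOI + APP0 markers that .8ij omits

def read_frame(buf: bytes, offset: int = 0):
    """Return (frame_index, jpeg_bytes, next_offset) for the frame at offset."""
    magic, size, index, _reserved = FRAME_HEADER.unpack_from(buf, offset)
    if magic != b"8IJ1":
        raise ValueError(f"bad magic {magic!r} at offset {offset}")
    start = offset + FRAME_HEADER.size
    payload = buf[start:start + size]   # assumes size excludes the header
    if len(payload) != size:
        raise ValueError("truncated frame")
    # The payload starts with the APP0 length field (00 10) and "JFIF",
    # so prepending the markers yields a standard JPEG byte stream.
    return index, JPEG_PREFIX + payload, start + size
```

Calling `read_frame` repeatedly with the returned `next_offset` walks the sequence frame by frame.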

Output Format (.mp4)

  • 8-bit: H.264 High Profile, yuv420p, MP4 container
  • 10-bit: HEVC Main10 Profile, yuv420p10le, MP4 container
  • GOP size: Matches frame rate (keyframe every second)
  • Rate control: VBR with CQ=19 (high quality)
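
For orientation, an FFmpeg command line that produces output with these properties could be assembled as follows. The options used are standard FFmpeg and NVENC options, but the exact command the converter issues is not documented here, so treat the function and its raw input formats as assumptions.

```python
def nvenc_args(width, height, fps, bit_depth, output_path,
               ffmpeg="/opt/8i/bin/ffmpeg"):
    """Build an argument list for piping raw frames into an NVENC encoder."""
    ten_bit = bit_depth == 10
    return [
        ffmpeg, "-y",
        "-f", "rawvideo",
        "-pix_fmt", "p010le" if ten_bit else "rgba",   # raw input layout (assumed)
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",                                     # frames arrive on stdin
        "-c:v", "hevc_nvenc" if ten_bit else "h264_nvenc",
        "-profile:v", "main10" if ten_bit else "high",
        "-pix_fmt", "p010le" if ten_bit else "yuv420p",
        "-rc", "vbr", "-cq", "19",                     # VBR with CQ=19
        "-g", str(round(fps)),                         # one keyframe per second
        output_path,
    ]
```

The list can be passed to `subprocess.Popen(..., stdin=subprocess.PIPE)` so that frames are written to the encoder's standard input.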

Troubleshooting

NVENC Not Available

# Check if NVENC encoder exists
/opt/8i/bin/ffmpeg -encoders | grep nvenc

# Check NVIDIA driver version
nvidia-smi

# Add the CUDA library directories to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib64:/usr/lib:/opt/8i/lib:$LD_LIBRARY_PATH

CUDA Out of Memory

  • Reduce batch size: --batch 8
  • Use single GPU per conversion
  • Check VRAM usage: nvidia-smi
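
To judge how far to reduce --batch, a rough estimate of per-batch buffer size helps. The sketch below assumes three buffers per frame are resident at once (uint16 input, float32 intermediate, encoder-ready output); the real kernels may hold more or fewer, so use it only as a lower-bound guide.

```python
def batch_bytes(width, height, batch=16, bit_depth=8):
    """Estimate GPU bytes held by one batch of frames (assumed buffer set)."""
    pixels = width * height
    source = pixels * 3 * 2      # uint16 RGB input
    working = pixels * 3 * 4     # float32 RGB intermediate
    # 8-bit output: RGBA, 4 bytes per pixel.
    # 10-bit output: P010, 1.5 samples per pixel at 2 bytes each.
    output = pixels * 4 if bit_depth == 8 else pixels * 3
    return batch * (source + working + output)

estimate_gb = batch_bytes(3840, 2160, batch=16) / 1e9
```

Under these assumptions a 3840×2160 batch of 16 frames needs about 2.9 GB, and halving --batch halves the estimate.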

PyAV Missing NVENC Support

# Rebuild PyAV against system FFmpeg
./fix_nvenc.sh

12-bit JPEG Decoding Fails

# Install fallback decoder
pip install pylibjpeg pylibjpeg-libjpeg

# Or rebuild imagecodecs with 12-bit support
pip install imagecodecs --no-binary imagecodecs

Slow Performance (Falling Back to CPU)

Check that:

  1. NVENC is available in FFmpeg
  2. NVIDIA driver version is 530+
  3. GPU has sufficient VRAM
  4. No other processes using GPU

Batch Conversion Hangs

  • Check individual job logs in <output_dir>/.conversion_logs/
  • Monitor with: ./monitor_logs.sh <output_dir>
  • Kill hung process and use --resume to continue
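
When resuming after a hang, it helps to know which inputs still need conversion. The sketch below assumes output files share the input's base name with an .mp4 extension and treats any non-empty output as complete; how --resume actually decides this is not documented here.

```python
from pathlib import Path

def pending_conversions(input_dir, output_dir):
    """List (source, destination) pairs that have no non-empty output yet."""
    out_root = Path(output_dir)
    pending = []
    for src in sorted(Path(input_dir).glob("*.8ij")):
        dst = out_root / (src.stem + ".mp4")
        if dst.exists() and dst.stat().st_size > 0:
            continue  # looks converted already
        pending.append((src, dst))
    return pending
```

A killed job can leave a partial, non-empty .mp4 behind, so delete the output of the hung job before resuming if the tool uses a similar existence check.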

Technical Architecture

Memory Optimization

The converter uses several strategies to limit memory usage and transfer overhead:

  1. Pinned Memory: 2x faster CPU<->GPU transfers
  2. Batch Processing: Amortizes overhead across multiple frames
  3. Immediate Cleanup: Frees large arrays after each stage
  4. Zero-Copy Path: PyNvVideoCodec eliminates redundant transfers

Thread/Process Model

  • Single conversion: Multi-threaded JPEG decoding + single GPU stream
  • Batch conversion: Multi-process with one converter per parallel job
  • GPU allocation: Round-robin distribution across specified GPUs

Color Pipeline

  • Input: 12-bit RGB (uint16, 0-4095)
  • Gamma correction: Compensate for 12-bit sensor gamma
  • Output color space:
    • 8-bit: RGB → YUV420p (NVENC handles conversion)
    • 10-bit: RGB → P010 (manual YUV420 with 2x2 chroma subsampling)
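
Two details of the 10-bit path are easy to get wrong: 2x2 chroma subsampling and the P010 sample layout, which stores each 10-bit value in the upper bits of a 16-bit word. The pure-Python sketch below illustrates both; averaging with rounding is an assumption about the kernel, and the RGB to YUV matrix is omitted because it is not specified here.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a chroma plane given as a list of rows.

    Frame dimensions are expected to be even; a trailing odd row or
    column is dropped.
    """
    out = []
    for y in range(0, len(plane) - 1, 2):
        row = []
        for x in range(0, len(plane[y]) - 1, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)  # rounded integer mean
        out.append(row)
    return out

def to_p010_word(value10):
    """Place a 10-bit value in the upper 10 bits of a 16-bit P010 word."""
    return (value10 & 0x3FF) << 6
```

In P010 the subsampled U and V samples are then interleaved into a single plane that follows the full-resolution Y plane.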

License

This project is provided as-is for internal use. Refer to your organization's licensing terms.

Performance Benchmarks

Current figures, measured on a 20-core CPU with an RTX A5000:

  • 20 fps for single-stream conversion
  • 80 fps maximum with multiple parallel streams
  • Reduced CPU usage through pinned memory for CPU<->GPU transfers
  • 12-bit JPEG decoding at ~70 fps (further GPU optimizations pending)

The commit history records the individual optimizations targeting CPU and GPU bottlenecks.
