Skip to content

TarqAbdullah/COE506

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

4 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

GPU-Accelerated Video Feature Detection

Language Acceleration

This project implements FAST (Features from Accelerated Segment Test) corner detection using three distinct GPU programming models: OpenACC, CUDA Python (Numba), and Native CUDA C++. The goal is to demonstrate the performance trade-offs between development ease and raw hardware control, achieving real-time video processing speeds that far exceed naive CPU implementations.

๐Ÿš€ Key Features

  • Hybrid Pipeline: Python handles video I/O (using imageio) while C++/CUDA handles heavy computation.
  • Three Acceleration Methods:
    1. OpenACC: Directive-based acceleration for rapid C++ parallelization.
    2. CUDA Python: High-level kernel implementation using Numba JIT.
    3. CUDA C++: Low-level implementation using the CUDA Runtime API.
  • Scalable Kernels: Implements Grid-Stride Loops to handle arbitrary image resolutions (1080p, 4K) safely.
  • Performance: Achieved up to ~1958 FPS with OpenACC and a 176x speedup for the Python pipeline.

๐Ÿ“‚ Project Structure

File Description
main.py Primary entry point. Implements the Numba/CUDA Python kernels and visualizer.
fast_both.cpp Contains the OpenACC GPU implementation and the CPU baseline logic.
fast_both.cu Contains the Native CUDA C++ implementation using the Runtime API.
run_cpu.py Python wrapper to benchmark and visualize the CPU Baseline (via C++ shared library).
run_gpu.py Python wrapper to benchmark and visualize the OpenACC GPU implementation.
run_fast.py Helper script for video I/O and data preprocessing (flattening frames).

๐Ÿ› ๏ธ Prerequisites

  • Hardware: NVIDIA GPU (Compute Capability 6.0+ recommended).
  • Software:
    • NVIDIA HPC SDK (for nvc++ OpenACC compiler).
    • CUDA Toolkit (for nvcc).
    • Python 3.8+ with libraries: numpy, imageio, numba, opencv-python.

โš™๏ธ Compilation & Installation

  1. Compile Native CUDA (fast_both.cu) nvcc -Xcompiler -fPIC -shared -o libfast_cuda.so fast_both.cu

  2. Install Python Dependencies pip install numpy imageio numba opencv-python

๐Ÿ’ป Usage Running the CUDA Python (Numba) Implementation python main.py

Running the OpenACC Implementation

Ensure libfast_both.so is in the same directory:

python run_gpu.py

Running the CPU Baseline

โš ๏ธ This will be slow:

python run_cpu.py

๐Ÿ“Š Performance Results Implementation Frame Rate (FPS) Speedup vs Baseline OpenACC ~1958.7 7.92ร— (vs Optimized C++) CUDA Python ~92.0 176.35ร— (vs Python CPU) CUDA C++ ~1267.9 3.76ร— (vs Optimized C++)

๐Ÿ‘ฅ Authors

Fahad Aloraini

Tariq Alshaya

Ziyad Hussein

Before running the Python scripts, you must compile the C++ and CUDA files into shared libraries (.so files).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors