GPU-Accelerated Video Feature Detection

This project implements FAST (Features from Accelerated Segment Test) corner detection using three distinct GPU programming models: OpenACC, CUDA Python (Numba), and Native CUDA C++. The goal is to demonstrate the performance trade-offs between development ease and raw hardware control, achieving real-time video processing speeds that far exceed naive CPU implementations.

🚀 Key Features

Hybrid Pipeline: Python handles video I/O (using imageio) while C++/CUDA handles heavy computation.
Three Acceleration Methods:
1. OpenACC: Directive-based acceleration for rapid C++ parallelization.
2. CUDA Python: High-level kernel implementation using Numba JIT.
3. CUDA C++: Low-level implementation using the CUDA Runtime API.
Scalable Kernels: Implements Grid-Stride Loops to handle arbitrary image resolutions (1080p, 4K) safely.
Performance: Achieved up to ~1958 FPS with OpenACC and a 176x speedup for the Python pipeline.

📂 Project Structure

File	Description
`main.py`	Primary entry point. Implements the Numba/CUDA Python kernels and visualizer.
`fast_both.cpp`	Contains the OpenACC GPU implementation and the CPU baseline logic.
`fast_both.cu`	Contains the Native CUDA C++ implementation using the Runtime API.
`run_cpu.py`	Python wrapper to benchmark and visualize the CPU Baseline (via C++ shared library).
`run_gpu.py`	Python wrapper to benchmark and visualize the OpenACC GPU implementation.
`run_fast.py`	Helper script for video I/O and data preprocessing (flattening frames).

🛠️ Prerequisites

Hardware: NVIDIA GPU (Compute Capability 6.0+ recommended).
Software:
- NVIDIA HPC SDK (for nvc++ OpenACC compiler).
- CUDA Toolkit (for nvcc).
- Python 3.8+ with libraries: numpy, imageio, numba, opencv-python.

⚙️ Compilation & Installation

Compile Native CUDA (fast_both.cu) nvcc -Xcompiler -fPIC -shared -o libfast_cuda.so fast_both.cu
Install Python Dependencies pip install numpy imageio numba opencv-python

💻 Usage Running the CUDA Python (Numba) Implementation python main.py

Running the OpenACC Implementation

Ensure libfast_both.so is in the same directory:

python run_gpu.py

Running the CPU Baseline

⚠️ This will be slow:

python run_cpu.py

📊 Performance Results Implementation Frame Rate (FPS) Speedup vs Baseline OpenACC ~1958.7 7.92× (vs Optimized C++) CUDA Python ~92.0 176.35× (vs Python CPU) CUDA C++ ~1267.9 3.76× (vs Optimized C++)

👥 Authors

Fahad Aloraini

Tariq Alshaya

Ziyad Hussein

Before running the Python scripts, you must compile the C++ and CUDA files into shared libraries (.so files).

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
CUDA_C++		CUDA_C++
CUDA_Python		CUDA_Python
ProjectReportTemplate.pdf		ProjectReportTemplate.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU-Accelerated Video Feature Detection

🚀 Key Features

📂 Project Structure

🛠️ Prerequisites

⚙️ Compilation & Installation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GPU-Accelerated Video Feature Detection

🚀 Key Features

📂 Project Structure

🛠️ Prerequisites

⚙️ Compilation & Installation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages