This project was completed under the supervision of Dr. Ayaz ul Hassan Khan as part of the "GPU Programming & Architecture" course at King Fahd University of Petroleum and Minerals, by Haya Alhuraib and Yasmin Alshalabi.
This project implements Latent Dirichlet Allocation (LDA), a probabilistic topic modeling algorithm, across four different computing platforms to compare performance:
- Sequential CPU Implementation (C++)
- OpenACC GPU Implementation (C++ with OpenACC directives)
- CUDA Python Implementation (Python with Numba/CUDA)
- CUDA C/C++ Implementation (Native CUDA with Thrust)
The goal is to benchmark and analyze the performance differences between CPU and GPU implementations of LDA topic modeling, and to compare different GPU programming paradigms.
NOTE: After selecting the dataset you want to process, make sure to update the dataset filename in the preprocessing script
- Input: CSV files with text data
- Processing:
- Tokenization and cleaning (lowercase, remove short words, stopwords)
- Vocabulary building
- Bag-of-Words (BoW) representation
- Output:
  - `training_bow.csv`: document-word-count matrix
  - `vocab_map.txt`: vocabulary mapping (word_id → word)
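The preprocessing steps above (tokenize, clean, build vocabulary, emit BoW rows) can be sketched in plain Python. The stopword list, minimum word length, and function name are illustrative, not the repo's actual choices:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # illustrative subset

def build_bow(docs, min_len=3):
    """Tokenize and clean each document, build a vocabulary,
    and return (doc_id, word_id, count) rows as in training_bow.csv."""
    tokenized = []
    for text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())          # lowercase + tokenize
        tokens = [t for t in tokens
                  if len(t) >= min_len and t not in STOPWORDS]  # drop short/stop words
        tokenized.append(tokens)
    # word -> word_id, as written to vocab_map.txt
    vocab = {w: i for i, w in enumerate(sorted({t for d in tokenized for t in d}))}
    rows = []
    for d, tokens in enumerate(tokenized):
        for w, c in sorted(Counter(tokens).items()):
            rows.append((d, vocab[w], c))
    return vocab, rows
```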
- Pure C++ implementation running on CPU
- Uses standard C++ libraries (`<random>`, `<chrono>`, etc.)
- Serial Gibbs sampling for topic assignment
- Output: `Nw_k_matrix.csv`, `Nd_k_matrix.csv`, `corpus_tokens.csv`
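The serial collapsed Gibbs sampling loop, with the word-topic (`Nw_k`) and document-topic (`Nd_k`) count matrices named after the output files above, can be sketched as follows. The hyperparameters `alpha` and `beta`, the iteration count, and the sampling details are illustrative assumptions, not the repo's actual values:

```python
import numpy as np

def gibbs_lda(tokens, doc_ids, V, K, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler over a flat token array.
    tokens[i] is the word id of token i, doc_ids[i] its document id."""
    rng = np.random.default_rng(seed)
    D = doc_ids.max() + 1
    z = rng.integers(0, K, size=len(tokens))           # random initial topics
    Nw_k = np.zeros((V, K)); Nd_k = np.zeros((D, K)); Nk = np.zeros(K)
    for w, d, t in zip(tokens, doc_ids, z):
        Nw_k[w, t] += 1; Nd_k[d, t] += 1; Nk[t] += 1
    for _ in range(iters):
        for i, (w, d) in enumerate(zip(tokens, doc_ids)):
            t = z[i]
            Nw_k[w, t] -= 1; Nd_k[d, t] -= 1; Nk[t] -= 1   # remove current token
            # unnormalized conditional p(topic | rest)
            p = (Nw_k[w] + beta) / (Nk + V * beta) * (Nd_k[d] + alpha)
            t = rng.choice(K, p=p / p.sum())               # sample new topic
            z[i] = t
            Nw_k[w, t] += 1; Nd_k[d, t] += 1; Nk[t] += 1   # add token back
    return z, Nw_k, Nd_k
```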
- C++ with OpenACC directives for GPU offloading
- Uses `#pragma acc` directives for parallelization
- Includes a custom xorshift64* PRNG for the GPU
- Memory management with `malloc`/`free`
- Compilation: `nvc++ -acc -gpu=managed,cc70`
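The xorshift64* PRNG used on the GPU keeps one 64-bit state per thread; a plain-Python sketch of one step (masking emulates the uint64 overflow that C gets for free) looks like this:

```python
MASK = (1 << 64) - 1  # emulate unsigned 64-bit wraparound

def xorshift64star(state):
    """One xorshift64* step: returns (new_state, 64-bit output)."""
    state ^= state >> 12
    state ^= (state << 25) & MASK
    state ^= state >> 27
    state &= MASK
    # output is the state scrambled by an odd multiplicative constant
    return state, (state * 0x2545F4914F6CDD1D) & MASK

def uniform01(state):
    """Advance the state and map the output to [0, 1)."""
    state, out = xorshift64star(state)
    return state, out / 2**64
```

Threading the state through each call keeps the generator deterministic per seed, which is what makes GPU runs reproducible without a global RNG.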
- Python implementation using Numba CUDA
- Kernels for zero initialization, count building, and topic sampling
- Uses Numba's random number generators
- Automatic block/thread configuration
- Dependencies: `numpy`, `numba`
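The automatic block/thread configuration amounts to launching one thread per element with a ceiling-divided grid; a minimal sketch (the 256-thread block size is an assumption, not necessarily the repo's choice):

```python
import math

def launch_config(n, threads_per_block=256):
    """1-D launch: ceil(n / threads_per_block) blocks, one thread per element.
    Each kernel then guards with `if i < n` since the last block over-covers."""
    blocks = math.ceil(n / threads_per_block)
    return blocks, threads_per_block
```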
- Native CUDA with Thrust library
- Uses Thrust device vectors and algorithms
- Custom CUDA kernels for count building
- Functor-based topic sampling with Thrust
- Compilation: `nvcc -arch=sm_75`