
GPU-Accelerated NLP for Topic Modeling

Acknowledgement

This project was completed under the supervision of Dr. Ayaz ul Hassan Khan as part of the "GPU Programming & Architecture" course at King Fahd University of Petroleum and Minerals, by Haya Alhuraib and Yasmin Alshalabi.

Project Overview

This project implements Latent Dirichlet Allocation (LDA), a probabilistic topic modeling algorithm, across four different computing platforms to compare performance:

  1. Sequential CPU Implementation (C++)
  2. OpenACC GPU Implementation (C++ with OpenACC directives)
  3. CUDA Python Implementation (Python with Numba/CUDA)
  4. CUDA C/C++ Implementation (Native CUDA with Thrust)

Purpose

Benchmark and analyze the performance differences between CPU and GPU implementations of LDA topic modeling, comparing different GPU programming paradigms.
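To compare the implementations fairly, each one can be timed under identical inputs. A minimal timing harness might look like the following sketch (the `benchmark` helper is illustrative and not part of the repository):

```python
import time

def benchmark(fn, *args, repeats=3):
    """Run fn several times and report the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best
```

Taking the minimum over repeats reduces noise from OS scheduling and, for the GPU variants, amortizes one-time costs such as kernel compilation.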

The datasets used are available in the zip folder.

NOTE: After selecting the dataset you want to process, make sure to update the dataset filename in the preprocessing script

📁 Project Structure

Data Preprocessing (Data Import & Preprocessing section)

  • Input: CSV files with text data
  • Processing:
    • Tokenization and cleaning (lowercase, remove short words, stopwords)
    • Vocabulary building
    • Bag-of-Words (BoW) representation
  • Output:
    • training_bow.csv: Document-word-count matrix
    • vocab_map.txt: Vocabulary mapping (word_id → word)
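The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's script: the function names, the stopword list, and the in-memory BoW layout (`doc_id, word_id, count` triples, mirroring `training_bow.csv`) are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # illustrative subset

def tokenize(text, min_len=3):
    """Lowercase, keep alphabetic tokens, drop short words and stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]

def build_bow(documents):
    """Return a vocabulary map (word -> word_id) and (doc_id, word_id, count) triples."""
    vocab = {}
    rows = []
    for d, text in enumerate(documents):
        for word, c in Counter(tokenize(text)).items():
            wid = vocab.setdefault(word, len(vocab))
            rows.append((d, wid, c))
    return vocab, rows

vocab, bow = build_bow(["The cats sat on the mat", "dogs and cats play"])
```

Writing `vocab` out as `word_id → word` pairs then yields the `vocab_map.txt` mapping, and the triples form the document-word-count matrix.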

Implementation Details

1. Sequential CPU Implementation (lda_cpu.cpp)

  • Pure C++ implementation running on CPU
  • Uses standard C++ libraries (<random>, <chrono>, etc.)
  • Serial Gibbs sampling for topic assignment
  • Output: Nw_k_matrix.csv, Nd_k_matrix.csv, corpus_tokens.csv
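The core of the serial version is a collapsed Gibbs sampling sweep: each token's topic is resampled from its full conditional, given the current count matrices. A minimal NumPy sketch of one sweep follows; it uses symmetric priors `alpha` and `beta` and count arrays named after the output files (`Nw_k` for word-topic counts, `Nd_k` for document-topic counts), but the exact update order and priors in the repository's C++ code may differ.

```python
import numpy as np

def gibbs_sweep(tokens, docs, z, Nw_k, Nd_k, Nk, alpha=0.1, beta=0.01, rng=None):
    """One collapsed Gibbs sampling sweep over all tokens.

    tokens[i] is the word id of token i, docs[i] its document id,
    z[i] its current topic; Nw_k is the (V x K) word-topic count matrix,
    Nd_k the (D x K) document-topic counts, Nk the per-topic totals.
    All count arrays are updated in place.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    K = Nk.shape[0]
    V = Nw_k.shape[0]
    for i in range(len(tokens)):
        w, d, k = tokens[i], docs[i], z[i]
        # Remove token i from the counts
        Nw_k[w, k] -= 1; Nd_k[d, k] -= 1; Nk[k] -= 1
        # Unnormalized full conditional p(z_i = k | rest)
        p = (Nw_k[w] + beta) / (Nk + V * beta) * (Nd_k[d] + alpha)
        k = rng.choice(K, p=p / p.sum())
        # Add the token back under its newly sampled topic
        Nw_k[w, k] += 1; Nd_k[d, k] += 1; Nk[k] += 1
        z[i] = k
    return z
```

The sequential data dependence here (each sample sees the counts updated by the previous one) is exactly what the GPU versions below must relax or restructure to expose parallelism.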

2. OpenACC GPU Implementation (lda_openacc.cpp)

  • C++ with OpenACC directives for GPU offloading
  • Uses #pragma acc for parallelization
  • Includes custom xorshift64* PRNG for GPU
  • Memory management with malloc/free
  • Compilation: nvc++ -acc -gpu=managed,cc70

3. CUDA Python Implementation (lda_gpu.py)

  • Python implementation using Numba CUDA
  • Kernels for zero initialization, count building, and topic sampling
  • Uses Numba's random number generators
  • Automatic block/thread configuration
  • Dependencies: numpy, numba

4. CUDA C/C++ Implementation (lda_gpu_c.cu)

  • Native CUDA with Thrust library
  • Uses Thrust device vectors and algorithms
  • Custom CUDA kernels for count building
  • Functor-based topic sampling with Thrust
  • Compilation: nvcc -arch=sm_75

