Skip to content

DPorichis/parallel-experimentation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

159 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Parallel Systems Experimentation

This repository contains a collection of small assignments exploring serial and parallel implementations using Pthreads, OpenMP, MPI, and CUDA. Each assignment demonstrates different parallelization techniques and performance considerations, accompanied by detailed reports analyzing the results.

Problems include:

  • The Game of Life
  • Gauss Elimination
  • Readers/Writers on shared data structures
  • Matrix Multiplication
  • N particle simulation

This work was developed as part of the THP04 Parallel Systems Course at NKUA

Development Team

Repository Structure

  • You can find all implementations of the assignments under src/[Framework]/.

  • Scripts for gathering experiment data can be found under scripts/, with plotting scripts being in scripts/graph-gen.

  • Scripts will produce their own directories named hwX_data inside of scripts/, while plots will be saved in the graphs-gen folder.

  • Reports for all experiments can be found in the reports folder

Compilation and running

You can compile this project by running make all in the main directory.

All executables will be placed inside bin/, so you can execute each assignment using ./bin/[Framework]-[assigment_name] ....

You can run all data-gathering experiments using the bash script in the main directory, by running ./runall.bash

The graphs generation script requires specific Python packages to function. You can find the dependencies inside the scripts\graphs-gen\requirements.txt

Assignment notes

Here you can find notes about each assignment.

Pthreads

Pi Calculation

The program can be executed by running:

$ ./bin/pthread-pi [number of threads] [number of iterations] [optional: exclusivity flag -s/-p]

The optional flag was implemented for easier data gathering

  • -s runs only the serial implementation.
  • -p runs only the parallel implementation.
  • Not specifying a flag will execute both implementations

General notes:

Our parallel implementation does not use synchronization mechanisms. Instead, it stores each thread's results and sums them up at the end.

Parallel Increment Counter Loop

The program contains two thread functions.

For mutex locking, run with:

$ ./bin/pthread-loop [-m] [thread_count] [incr_num]

For atomic increments, use the following:

$ ./bin/pthread-loop [-a] [thread_count] [incr_num]

Parallel Sum Calculation

The program contains two thread functions.

For standard implementation (multiple cache updates):

$ ./bin/pthread-parallel-sum [thread_count] [incr_num]

For optimized cache-aware implementation:

$ ./bin/pthread-parallel-sum [thread_count] [incr_num] [-o]

Parallel List Accessing

The program can be executed by running:

$ ./bin/pthread-parallel-list [priority flag -r/-w] [thread_count] [keys_count] [ops_count] [search_percentage] [insert_percentage]
  • -w runs the writer's priority implementation.
  • -r runs the reader's priority implementation.

General notes:

You can find detailed logic information about each implementation in the source code comments


OpenMP

Game of Life

The program can be executed by running:

$ ./bin/omp-game-of-life [#threads (0 for serial implementation)] [dimensions count] [generation count] [optional: per item or per row parallelism (i/r)]

The optional flag was implemented to select the parallelism type

  • -i each time a thread is assigned a cell.
  • -r each time a thread is assigned a row.
  • Not specifying a flag will execute the per-row implementation

General notes:

  • For our grid, we use an oversized table of (dim + 2)x(dim + 2). By filling the frame with zeros, we can make neighbour calculations easier, without the need for checking corner cases (e.g a[0][0]).
  • As there is a clear dependency on the current generation to calculate the next, parallelism can only be used in the calculation process of a generation and NOT on the outer generation loop.
  • Our per-row implementation allows threads to use the locality of items in the same row, while also preventing cache refreshes for the other threads when editing a cell.

Gauss Elimination

The program can be executed by running:

$ ./bin/omp-gauss-elimination [matrice dimension] [-s (serial) | -p (parallel)] [-r (per row) | -c (per column)] [thread_count]

The source code contains implementations of four algorithms in total (2 with serial implementation and 2 with parallel).

Parallelization occurs in the inner loop in both algorithms.


MPI

Game of Life

The program can be executed by running:

$ mpiexec  [optional: -f ../machines]  [-n num of processes] ../bin/mpi-game-of-life [dimensions count] [generation count]

General notes:

  • The root process initializes a (dim + 2)*(dim + 2) size grid and scatters it to every process in chunks of rows_per_proc = dim / commsz.
  • Every process has its own local allocated chunk of size rows_per_proc + 2. These chunks have two extra upper and lower rows to make space for the fringes that it will receive from neighboring processes.
  • Sends its (central) rows to neighboring processes as fringes (nonblocking send) and receives from the neighboring processes its own fringes.
  • Calculates the (central) rows_per_proc with the help of the received fringes and repeats from the second step for generation count times.

Table Matrix Multiplication

The program can be executed by running:

$ mpiexec [optional: -f ../machines ] [-n num_of_processes] ../bin/mpi-table-matrix-mult [dimension]

General notes:

  • The program creates a random matrix and vector and performs the multiplication calculation upon it. To minimize cache misses we take as granted that the resulted random matrix is the TRANSPOSED version of the intended one.
  • Contrary to standard, columns are represented as table rows and vice versa.
  • Each process receives a part of the matrix and vector via the scatter function and performs its stand-alone calculations later merging them using reduction.
  • Time takes into account only the calculation aspect (including communication times) and not time need for allocating memory

About

Parallel Systems Experimentation - Serial and parallel implementations with Pthreads, OpenMP, MPI, and CUDA, developed as part of the THP04 Parallel Systems course at NKUA.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors