Parallel Systems Experimentation

This repository contains a collection of small assignments exploring serial and parallel implementations using Pthreads, OpenMP, MPI, and CUDA. Each assignment demonstrates different parallelization techniques and performance considerations, accompanied by detailed reports analyzing the results.

Problems include:

The Game of Life
Gauss Elimination
Readers/Writers on shared data structures
Matrix Multiplication
N particle simulation

This work was developed as part of the THP04 Parallel Systems Course at NKUA.

Development Team

Stamoulis Livas (LinkedIn)
Dimitrios Stefanos Porichis (LinkedIn)

Repository Structure

You can find all implementations of the assignments under src/[Framework]/.
Scripts for gathering experiment data are under scripts/, and the plotting scripts are in scripts/graphs-gen.
The data-gathering scripts create their own directories named hwX_data inside scripts/, while plots are saved in the graphs-gen folder.
Reports for all experiments are in the reports folder.

Compilation and running

You can compile this project by running make all in the main directory.

All executables are placed inside bin/, so you can execute each assignment using ./bin/[Framework]-[assignment_name] ....

You can run all data-gathering experiments with the bash script in the main directory, by running ./runall.bash.

The graph generation script requires specific Python packages. You can find the dependencies in scripts/graphs-gen/requirements.txt.

Assignment notes

Here you can find notes about each assignment.

Pthreads

Pi Calculation

The program can be executed by running:

$ ./bin/pthread-pi [number of threads] [number of iterations] [optional: exclusivity flag -s/-p]

The optional flag was implemented for easier data gathering:

-s runs only the serial implementation.
-p runs only the parallel implementation.
If no flag is specified, both implementations are executed.

General notes:

Our parallel implementation does not use synchronization mechanisms. Instead, it stores each thread's results and sums them up at the end.
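
The lock-free approach described above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the estimate uses the Leibniz series, and the names pi_task, pi_worker, and estimate_pi are hypothetical. Each thread writes only to its own result slot, so no mutex is needed; the main thread adds the slots after joining.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work descriptor (not taken from the repository). */
typedef struct {
    long first;      /* first series term handled by this thread */
    long last;       /* one past the last term */
    double partial;  /* private result slot, written only by its owner */
} pi_task;

static void *pi_worker(void *arg) {
    pi_task *t = arg;
    double sum = 0.0;
    for (long i = t->first; i < t->last; i++) {
        double sign = (i % 2 == 0) ? 1.0 : -1.0;
        sum += sign / (2.0 * i + 1.0);   /* assumed: Leibniz series for pi/4 */
    }
    t->partial = sum;                    /* single write, no lock required */
    return NULL;
}

/* Returns an approximation of pi, or -1.0 on bad input or failure. */
double estimate_pi(int thread_count, long iterations) {
    if (thread_count < 1 || iterations < 1) return -1.0;
    pthread_t *threads = malloc(thread_count * sizeof *threads);
    pi_task *tasks = malloc(thread_count * sizeof *tasks);
    if (!threads || !tasks) { free(threads); free(tasks); return -1.0; }

    long chunk = iterations / thread_count;
    int started = 0;
    for (int t = 0; t < thread_count; t++) {
        tasks[t].first = t * chunk;
        /* the last thread also takes the remainder */
        tasks[t].last = (t == thread_count - 1) ? iterations : (t + 1) * chunk;
        tasks[t].partial = 0.0;
        if (pthread_create(&threads[t], NULL, pi_worker, &tasks[t]) != 0) break;
        started++;
    }

    double total = 0.0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);  /* join first, then read the slot */
        total += tasks[t].partial;
    }
    free(threads);
    free(tasks);
    return (started == thread_count) ? 4.0 * total : -1.0;
}
```

Calling estimate_pi(4, 1000000) mirrors running the program with 4 threads and one million iterations.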

Parallel Increment Counter Loop

The program contains two thread functions.

For mutex locking, run with:

$ ./bin/pthread-loop [-m] [thread_count] [incr_num]

For atomic increments, use the following:

$ ./bin/pthread-loop [-a] [thread_count] [incr_num]
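
To make the difference between the two thread functions concrete, here is a hedged sketch of both variants. It assumes C11 atomics from <stdatomic.h> for the atomic version (the repository may use a different mechanism), and the names incr_with_mutex, incr_with_atomic, and run_counter are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

static long mutex_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long atomic_counter = 0;

/* Variant 1: every increment is protected by a mutex. */
static void *incr_with_mutex(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        mutex_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Variant 2: every increment is a single atomic read-modify-write. */
static void *incr_with_atomic(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        atomic_fetch_add(&atomic_counter, 1);
    return NULL;
}

/* Runs thread_count threads, each incrementing incr_num times.
   Returns the final counter value, or -1 on bad input or failure. */
long run_counter(int use_atomic, int thread_count, long incr_num) {
    pthread_t threads[MAX_THREADS];
    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;

    mutex_counter = 0;
    atomic_store(&atomic_counter, 0);

    int started = 0;
    for (int t = 0; t < thread_count; t++) {
        void *(*fn)(void *) = use_atomic ? incr_with_atomic : incr_with_mutex;
        if (pthread_create(&threads[t], NULL, fn, &incr_num) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    if (started != thread_count) return -1;
    return use_atomic ? atomic_load(&atomic_counter) : mutex_counter;
}
```

Both variants must end with a counter equal to thread_count * incr_num; they differ only in how much each increment costs under contention.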

Parallel Sum Calculation

The program contains two thread functions.

For standard implementation (multiple cache updates):

$ ./bin/pthread-parallel-sum [thread_count] [incr_num]

For optimized cache-aware implementation:

$ ./bin/pthread-parallel-sum [thread_count] [incr_num] [-o]
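
The contrast between the two thread functions can be sketched as below. This is an illustration under assumptions, not the repository's code: it assumes each thread adds 1 to its own slot of a shared array incr_num times, and the names slots, sum_shared, sum_local, and parallel_sum are hypothetical. In the standard variant, neighbouring slots sit on the same cache line, so every write can invalidate that line in other cores (false sharing). The cache-aware variant accumulates in a local variable and writes the shared slot once.

```c
#include <pthread.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Adjacent slots share cache lines; volatile keeps the per-iteration writes. */
static volatile long slots[MAX_THREADS];

typedef struct { int rank; long incr_num; } sum_task;

/* Standard variant: every iteration writes to shared memory. */
static void *sum_shared(void *arg) {
    sum_task *t = arg;
    for (long i = 0; i < t->incr_num; i++)
        slots[t->rank]++;
    return NULL;
}

/* Cache-aware variant: accumulate privately, publish once at the end. */
static void *sum_local(void *arg) {
    sum_task *t = arg;
    long local = 0;
    for (long i = 0; i < t->incr_num; i++)
        local++;
    slots[t->rank] = local;
    return NULL;
}

/* Returns the total over all threads, or -1 on bad input or failure. */
long parallel_sum(int optimized, int thread_count, long incr_num) {
    pthread_t threads[MAX_THREADS];
    sum_task tasks[MAX_THREADS];
    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;

    int started = 0;
    for (int t = 0; t < thread_count; t++) {
        slots[t] = 0;
        tasks[t].rank = t;
        tasks[t].incr_num = incr_num;
        if (pthread_create(&threads[t], NULL,
                           optimized ? sum_local : sum_shared, &tasks[t]) != 0)
            break;
        started++;
    }

    long total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += slots[t];
    }
    return (started == thread_count) ? total : -1;
}
```

Both variants return the same total; only the memory traffic differs, which is what the timing experiments would be expected to expose.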

Parallel List Accessing

The program can be executed by running:

$ ./bin/pthread-parallel-list [priority flag -r/-w] [thread_count] [keys_count] [ops_count] [search_percentage] [insert_percentage]

-w runs the writer's priority implementation.
-r runs the reader's priority implementation.

General notes:

You can find a detailed description of each implementation's logic in the source code comments.

OpenMP

Game of Life

The program can be executed by running:

$ ./bin/omp-game-of-life [#threads (0 for serial implementation)] [dimensions count] [generation count] [optional: per item or per row parallelism (i/r)]

The optional flag selects the parallelism type:

-i assigns one cell at a time to a thread.
-r assigns one row at a time to a thread.
If no flag is specified, the per-row implementation is executed.

General notes:

For our grid, we use an oversized table of (dim + 2)x(dim + 2). By filling the frame with zeros, we simplify the neighbour calculations and avoid special handling of border cells (e.g. a[0][0]).
Because each generation depends on the previous one, parallelism can only be applied within the calculation of a single generation and NOT to the outer generation loop.
Our per-row implementation lets each thread exploit the locality of cells in the same row, while also reducing cache-line invalidations in other threads when a cell is written.
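
The notes above can be sketched as follows. This is a minimal illustration rather than the repository's code: the flat row-major layout, the unsigned char cell type, and the names life_step and life_run are assumptions. The zero frame removes all bounds checks, rows are distributed across threads, and the generation loop stays serial.

```c
#include <stdlib.h>
#include <string.h>

/* One generation on a zero-padded (dim + 2) x (dim + 2) grid, row-major.
   Only interior cells 1..dim are written, so the frame of `next` stays zero. */
void life_step(const unsigned char *cur, unsigned char *next, int dim) {
    int stride = dim + 2;
    /* per-row parallelism: each thread gets whole rows */
    #pragma omp parallel for schedule(static)
    for (int r = 1; r <= dim; r++) {
        for (int c = 1; c <= dim; c++) {
            const unsigned char *up = cur + (r - 1) * stride + c;
            const unsigned char *mid = cur + r * stride + c;
            const unsigned char *down = cur + (r + 1) * stride + c;
            int n = up[-1] + up[0] + up[1]
                  + mid[-1] + mid[1]
                  + down[-1] + down[0] + down[1];
            next[r * stride + c] = (n == 3) || (mid[0] && n == 2);
        }
    }
}

/* Advances `grid` in place by `generations` steps; returns 0 on success. */
int life_run(unsigned char *grid, int dim, int generations) {
    size_t cells = (size_t)(dim + 2) * (dim + 2);
    unsigned char *a = grid;
    unsigned char *b = calloc(cells, 1);   /* second buffer with a zero frame */
    if (!b) return -1;

    /* serial on purpose: generation g + 1 needs all of generation g */
    for (int g = 0; g < generations; g++) {
        life_step(a, b, dim);
        unsigned char *tmp = a; a = b; b = tmp;
    }

    if (a != grid) {           /* result ended up in the scratch buffer */
        memcpy(grid, a, cells);
        free(a);
    } else {
        free(b);
    }
    return 0;
}
```

Compiled without OpenMP support, the pragma is ignored and the same code runs serially, which is convenient for checking results against the parallel run.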

Gauss Elimination

The program can be executed by running:

$ ./bin/omp-gauss-elimination [matrix dimension] [-s (serial) | -p (parallel)] [-r (per row) | -c (per column)] [thread_count]

The source code contains four implementations in total: two algorithms (per row and per column), each in a serial and a parallel version.

In both parallel versions, it is the inner loop that is parallelized.
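
The README does not spell out the algorithms themselves, so the sketch below rests on a loud assumption: that the per-row and per-column variants correspond to row-oriented and column-oriented back substitution on an upper triangular system. The function names and the flat row-major matrix layout are hypothetical. In both cases the outer loop carries a dependency and stays serial, while the inner loop is parallelized.

```c
/* Row-oriented (assumed): x[row] = (b[row] - sum of A[row][col] * x[col]) / A[row][row].
   The inner loop is a sum, so it is parallelized with a reduction. */
void back_substitute_rows(const double *A, const double *b, double *x, int n) {
    for (int row = n - 1; row >= 0; row--) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int col = row + 1; col < n; col++)
            sum += A[row * n + col] * x[col];
        x[row] = (b[row] - sum) / A[row * n + row];
    }
}

/* Column-oriented (assumed): once x[col] is final, subtract its contribution
   from every row above it. The inner iterations touch distinct x[row]
   entries, so they are independent. */
void back_substitute_cols(const double *A, const double *b, double *x, int n) {
    for (int row = 0; row < n; row++)
        x[row] = b[row];
    for (int col = n - 1; col >= 0; col--) {
        x[col] /= A[col * n + col];
        #pragma omp parallel for
        for (int row = 0; row < col; row++)
            x[row] -= A[row * n + col] * x[col];
    }
}
```

Opening a parallel region on every outer iteration has a cost, so for small matrices the serial versions may well be faster.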

MPI

Game of Life

The program can be executed by running:

$ mpiexec [optional: -f ../machines] [-n num of processes] ../bin/mpi-game-of-life [dimensions count] [generation count]

General notes:

The root process initializes a grid of size (dim + 2)*(dim + 2) and scatters it to all processes in chunks of rows_per_proc = dim / commsz rows.
Every process allocates its own local chunk of rows_per_proc + 2 rows. The two extra rows, one above and one below, make space for the fringes it will receive from neighboring processes.
Each process sends its boundary (central) rows to the neighboring processes as fringes (nonblocking send) and receives its own fringes from them.
Each process calculates its (central) rows_per_proc rows with the help of the received fringes and repeats from the second step for generation count times.
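
The chunk layout and fringe exchange can be illustrated without MPI. The sketch below is a serial stand-in, not the repository's code: plain memcpy calls take the place of the nonblocking sends and receives, all "processes" live in one address space, dim is assumed to be divisible by commsz, and the names exchange_fringes and step_chunk are hypothetical.

```c
#include <string.h>

/* chunks[p] holds (rows_per_proc + 2) rows of (dim + 2) cells for rank p.
   Row 0 and row rows_per_proc + 1 are the fringe rows. The copies below
   stand in for the nonblocking send/receive pairs between neighbours. */
void exchange_fringes(unsigned char **chunks, int commsz,
                      int rows_per_proc, int dim) {
    size_t stride = (size_t)dim + 2;
    for (int p = 0; p < commsz; p++) {
        if (p > 0)            /* first owned row -> upper neighbour's bottom fringe */
            memcpy(chunks[p - 1] + (rows_per_proc + 1) * stride,
                   chunks[p] + 1 * stride, stride);
        if (p < commsz - 1)   /* last owned row -> lower neighbour's top fringe */
            memcpy(chunks[p + 1],
                   chunks[p] + rows_per_proc * stride, stride);
    }
    /* The top fringe of rank 0 and the bottom fringe of the last rank are
       never written, so they keep the zero frame of the global grid. */
}

/* Computes the next generation for the owned rows 1..rows_per_proc only. */
void step_chunk(const unsigned char *cur, unsigned char *next,
                int rows_per_proc, int dim) {
    int stride = dim + 2;
    for (int r = 1; r <= rows_per_proc; r++) {
        for (int c = 1; c <= dim; c++) {
            const unsigned char *up = cur + (r - 1) * stride + c;
            const unsigned char *mid = cur + r * stride + c;
            const unsigned char *down = cur + (r + 1) * stride + c;
            int n = up[-1] + up[0] + up[1]
                  + mid[-1] + mid[1]
                  + down[-1] + down[0] + down[1];
            next[r * stride + c] = (n == 3) || (mid[0] && n == 2);
        }
    }
}
```

In a real MPI run each rank would hold only its own chunk and perform the two copies as a send to and a receive from each neighbour before calling the update step.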

Table Matrix Multiplication

The program can be executed by running:

$ mpiexec [optional: -f ../machines] [-n num_of_processes] ../bin/mpi-table-matrix-mult [dimension]

General notes:

The program creates a random matrix and a random vector and multiplies them. To minimize cache misses, we treat the generated random matrix as the TRANSPOSED version of the intended one.
Contrary to the usual convention, columns are stored as table rows and vice versa.
Each process receives a part of the matrix and of the vector via the scatter function and performs its calculations independently; the partial results are then merged using a reduction.
The reported time covers only the calculation (including communication time), not the time needed for allocating memory.
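
The arithmetic behind this decomposition can be checked without MPI. The sketch below is a serial stand-in under stated assumptions, not the repository's code: a loop over ranks replaces the scatter, an element-wise sum replaces the reduction, the dimension is assumed to be divisible by the number of processes, and the names local_matvec and simulate_matvec are hypothetical. Because the matrix is stored transposed, each process's block of columns is a contiguous block of table rows, and the inner loop walks memory sequentially.

```c
#include <stdlib.h>

/* One process's share: `cols` holds local_n matrix columns, each stored
   contiguously as a table row of length n (the transposed layout).
   Produces a full-length partial result vector. */
void local_matvec(const double *cols, const double *x_part,
                  double *y_partial, int local_n, int n) {
    for (int i = 0; i < n; i++)
        y_partial[i] = 0.0;
    for (int j = 0; j < local_n; j++)        /* one stored row at a time */
        for (int i = 0; i < n; i++)          /* sequential memory access */
            y_partial[i] += cols[j * n + i] * x_part[j];
}

/* Serial stand-in for scatter + local compute + sum reduction.
   At is the n x n matrix in transposed storage. Returns 0 on success. */
int simulate_matvec(const double *At, const double *x, double *y,
                    int n, int commsz) {
    if (commsz < 1 || n % commsz != 0) return -1;
    int local_n = n / commsz;
    double *partial = malloc((size_t)n * sizeof *partial);
    if (!partial) return -1;

    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int p = 0; p < commsz; p++) {
        /* rank p would receive these two slices from the scatter */
        local_matvec(At + (size_t)p * local_n * n, x + p * local_n,
                     partial, local_n, n);
        for (int i = 0; i < n; i++)          /* what a sum reduction does */
            y[i] += partial[i];
    }
    free(partial);
    return 0;
}
```

For the matrix with rows (1, 2) and (3, 4) and the vector (5, 6), the transposed storage is {1, 3, 2, 4}, and two simulated processes contribute (5, 15) and (12, 24), which sum to (17, 39).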
