# 💻 Shared-Memory Scheduler Benchmark for Parallel Row Sorting

This project investigates different parallel scheduling strategies for sorting a  three-dimensional byte tensor stored in System V shared memory.

The program creates one original random input tensor, copies it into a separate shared-memory segment for each scheduler, sorts every row of every frame in place, checks that the result is correctly sorted, and reports the execution time of each scheduler in milliseconds.

---

## 📌 Project Overview

The input is treated as a flat `uint8_t` buffer with logical dimensions:

```text
F × H × W
```

where:

| Symbol | Meaning |
|--------|---------|
| `F` | Number of frames / tasks |
| `H` | Number of rows per frame |
| `W` | Number of elements per row |
| `K` | Number of workers used by the scheduler |

For every frame, the benchmark sorts all `H` rows of width `W` using `quick_sort`. Each scheduler receives the same task function, so the comparison focuses on the scheduling strategy rather than the sorting algorithm itself.

---

## 🧠 Implementations - File Map

### Main Executables

| File | Purpose |
|------|---------|
| `shm_generator.cpp` | Creates the original random tensor in shared memory or deletes all project shared-memory segments. |
| `shm_benchmark.cpp` | Runs all scheduler variants on identical copies of the generated tensor and prints timing / correctness results. |

### Core Files

| File | Purpose |
|------|---------|
| `row.hpp`, `row.cpp` | A small templated row container used by the sorting algorithms. |
| `sort_algorithms.hpp` | Contains sorting implementations, including the `quick_sort` used in the benchmark. |
| `scheduler_iface.hpp`, `scheduler_iface.cpp` | Defines the common task callback type and CPU affinity helper. |
| `shm_keys.hpp` | Stores the System V shared-memory keys used by the generator and benchmark copies. |
| `utilization_monitor.hpp`, `utilization_monitor.cpp` | Reads `/proc/stat` and computes CPU utilization metrics for the AIMD scheduler. |

### Thread-Based Schedulers

| Scheduler | Files | Description |
|----------|-------|-------------|
| Static | `static_scheduler.cpp`, `static_scheduler.hpp` | Splits frames into fixed contiguous blocks. |
| Dynamic | `dynamic_scheduler.cpp`, `dynamic_scheduler.hpp` | Uses a shared counter so workers grab one frame at a time. |
| Chunk | `chunk_scheduler.cpp`, `chunk_scheduler.hpp` | Workers grab fixed-size chunks of frames to reduce mutex traffic. |
| Chunk Stealing | `chunk_steal_scheduler.cpp`, `chunk_steal_scheduler.hpp` | Each worker has a deque and can steal chunks from other workers. |
| Guided | `guided_scheduler.cpp`, `guided_scheduler.hpp` | Starts with larger chunks and gradually reduces chunk size. |
| Adaptive | `adaptive_scheduler.cpp`, `adaptive_scheduler.hpp` | Adjusts chunk size according to each worker's observed speed. |
| AIMD | `aimd.cpp`, `aimd.hpp` | Uses an additive-increase / multiplicative-decrease policy driven by CPU utilization. |

### Process-Based Schedulers

| Scheduler | Files | Description |
|----------|-------|-------------|
| Bounded Prolific | `bounded_prolific_scheduler.cpp`, `bounded_prolific_scheduler.hpp` | Uses child processes, with each worker handling a strided subset of tasks. |
| Bounded Collective | `bounded_collective_scheduler.cpp`, `bounded_collective_scheduler.hpp` | Uses a bounded group of child processes to divide the task space collectively. |

### Pipe-Based Schedulers

| Scheduler | Files | Description |
|----------|-------|-------------|
| One-to-One Pipe | `one_to_one_pipe_scheduler.cpp`, `one_to_one_pipe_scheduler.hpp` | Coordinates worker processes through one-to-one pipe communication. |
| One-to-Many Pipe | `one_to_many_scheduler.cpp`, `one_to_many_scheduler.hpp` | Uses one controller-to-many worker pipe distribution. |
| Many-to-Many Pipe | `many_to_many_scheduler.cpp`, `many_to_many_scheduler.hpp` | Uses many-to-many pipe communication between scheduling components. |

---

## 🛠️ Requirements

This project is intended for Linux systems because it depends on:

- System V shared memory: `shmget`, `shmat`, `shmctl`
- POSIX process utilities: `fork`, `wait`, pipes, CPU affinity helpers
- `/proc/stat` for CPU utilization monitoring
- `g++` with C++14 support
- GNU Make
- POSIX threads / pthread
- Boost Thread
- Boost System
- Boost Chrono

The Makefile uses:

```bash
g++ -std=c++14 -Wall -Wextra -O3 -fpermissive
```

and links the benchmark with:

```bash
-lboost_thread -lboost_system -lboost_chrono -lpthread
```

---

## 🛠️ Compilation

After downloading the project folder, run:

```bash
make clean
```

before compiling for the first time. This removes old object files and executables so the build starts from a clean state.

Then compile with:

```bash
make
```

This creates two executables:

```text
shm_generator
shm_benchmark
```

On later runs, `make` recompiles only the files that changed.

---

## 🚀 Execution

The benchmark has two steps:

1. Generate the original shared-memory tensor.
2. Run the benchmark using the same dimensions.

### 1. Generate Input Data

```bash
./shm_generator F H W
```

Example:

```bash
./shm_generator 1000 640 640
```

Expected output format:

```text
Random shared vector created.
Key: 12345
Dimensions: F=1000, H=640, W=640 (flat 1D buffer of 409600000 bytes = 640000 rows of 640)
```

### 2. Run the Benchmark

```bash
./shm_benchmark F H W K [affinity: 0|1]
```

where:

| Argument | Meaning |
|----------|---------|
| `F` | Number of frames / tasks. Must match the generator. |
| `H` | Rows per frame. Must match the generator. |
| `W` | Elements per row. Must match the generator. |
| `K` | Number of worker threads or processes. |
| `affinity` | Optional. `1` enables CPU affinity, `0` disables it. Default is `1`. |

Example:

```bash
./shm_benchmark 1000 640 640 24 1
```

Expected output format:

```text
Benchmark input prepared.
F=1000 H=640 W=640 K=24 affinity=on

one_to_one              ... ms  sorted=yes
one_to_many             ... ms  sorted=yes
many_to_many            ... ms  sorted=yes
prolific                ... ms  sorted=yes
collective              ... ms  sorted=yes
static                  ... ms  sorted=yes
dynamic                 ... ms  sorted=yes
chunk                   ... ms  sorted=yes
chunk_steal             ... ms  sorted=yes
guided                  ... ms  sorted=yes
adaptive                ... ms  sorted=yes
aimd                    ... ms  sorted=yes

Fastest: scheduler_name (... ms)
```

The exact times depend on the machine, number of CPU cores, system load, tensor size, and affinity setting.

---

## 🧹 Cleaning Shared Memory

The benchmark deletes the temporary copy segments it creates, but it leaves the original generated tensor available so the same input can be reused for another run.

To delete all shared-memory segments used by the project, run:

```bash
./shm_generator delete
```

Expected output:

```text
Project shared-memory segments deleted.
```

This is useful after experiments or if a previous run was interrupted.

---

## 📌 Example Full Run

```bash
make clean
make

./shm_generator 1000 640 640
./shm_benchmark 1000 640 640 24 1
./shm_generator delete
```

---

## 📌 Additional Notes

- in order to change the task size of the pipe schedulers, one needs to do it manually by changing the PIPE_SCHEDULER_CHUNK parameter within their respective source files.
For example:
```bash
nano many_to_many_scheduler.cpp
```
on line 19 the parameter is initialized as PIPE_SCHEDULER_CHUNK = 12 which is the best for F=1000 on our implementation.

- Similarly with the AIMD scheduler, one may alter its CHUNK and overload_pct parameters within aimd.cpp.

# Paper link

Gailla, Argyro and Kontodimos, Dimitrios and Papouda, Anastasia and Ioannou, Theofanis and Kastrinaki, Stamatia and Kimpouropoulou, Hermione and Kontogianni, Kleopatra and Kontogiannis, Sotirios and Dedaj, Mejgan and Panagiotidis-Kannas, Michail and Sidiropoulou, Anna-Maria and Tavridis, George, Performance evaluation of scheduling tasks in multicore systems utilizing
processes and threads