DNN Partitioning and Offloading Testbed

License: MIT | Python 3.8+ | TensorFlow 2.x

A research testbed for evaluating DNN (Deep Neural Network) partitioning and offloading strategies on edge computing networks. The system distributes DNN inference across multiple Raspberry Pi devices connected to a central Base Station, enabling systematic benchmarking and optimization of partition schemes.

   ┌──────────┐         WiFi          ┌──────────────┐
   │ End Dev 1├─────────────────────┐ │              │
   │  (RPi)   │                     ├─┤ Base Station │
   ├──────────┤                     │ │ (Controller) │
   │ End Dev 2├─────────────────────┤ │              │
   │  (RPi)   │                     │ └──────────────┘
   ├──────────┤                     │
   │ End Dev N├─────────────────────┘
   │  (RPi)   │
   └──────────┘
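To make the partitioning idea concrete, here is a minimal sketch of an end-to-end delay model. It assumes inference runs layer by layer, each layer executes on exactly one device, and a fixed transfer cost is paid whenever consecutive layers sit on different devices. The function name `end_to_end_delay`, the device labels, and all numbers are hypothetical illustrations; the testbed's own `delay_optimizer.py` may model delay differently (for example, by scaling transfer time with the size of the intermediate activations).

```python
from typing import Dict, List, Sequence, Tuple


def end_to_end_delay(
    assignment: Sequence[str],
    compute_ms: Dict[str, List[float]],
    link_ms: Dict[Tuple[str, str], float],
) -> float:
    """Return total inference delay (ms) for a layer-to-device assignment.

    assignment[i] is the device that executes layer i; compute_ms[d][i] is
    the measured time of layer i on device d; link_ms[(a, b)] is the cost of
    forwarding intermediate activations from device a to device b.
    """
    total = 0.0
    for layer, device in enumerate(assignment):
        # Pay a transfer cost whenever the partition boundary moves to another device.
        if layer > 0 and assignment[layer - 1] != device:
            total += link_ms[(assignment[layer - 1], device)]
        total += compute_ms[device][layer]
    return total


# Hypothetical per-layer latencies (ms) for a 4-layer model on two end devices.
compute_ms = {
    "ed1": [4.0, 9.0, 12.0, 3.0],
    "ed2": [5.0, 6.0, 7.0, 2.0],
}
link_ms = {("ed1", "ed2"): 8.0, ("ed2", "ed1"): 8.0}

local_only = end_to_end_delay(["ed1"] * 4, compute_ms, link_ms)
split_after_1 = end_to_end_delay(["ed1", "ed2", "ed2", "ed2"], compute_ms, link_ms)
print(local_only, split_after_1)  # 28.0 27.0
```

Under this model, a split only pays off when the compute time saved on the faster device outweighs the transfer cost at the boundary.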

Key Features

  • Layer-Level DNN Partitioning: Split DNN inference at any layer boundary across devices.
  • Four Optimization Algorithms: Greedy, Simulated Annealing, Genetic Algorithm, and Trend-Based Greedy.
  • Automated Benchmarking: Per-layer execution latency, CPU, and memory profiling across all devices.
  • Network-Aware Optimization: Incorporates measured communication latencies into partition decisions.
  • Power Monitoring: Optional INA260 sensor integration for per-device power consumption measurement.
  • Web Dashboard: Flask-based GUI with interactive map, scenario management, and real-time terminal output.
  • Configurable Scenarios: Per-node latency injection and computational load generation.
  • Task Scheduling: SJF (Shortest Job First) and Round Robin scheduling algorithms.
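The two scheduling policies listed above can be compared with a small sketch. It assumes all DNN tasks arrive at time 0, that each task's run time is known in advance (for example, from benchmarking), and that SJF is non-preemptive. The functions `sjf_completion` and `round_robin_completion` and the task names are hypothetical and are not taken from `scheduling.py`.

```python
from collections import deque
from typing import Dict


def sjf_completion(burst_ms: Dict[str, float]) -> Dict[str, float]:
    """Non-preemptive Shortest Job First; all tasks assumed to arrive at t=0."""
    clock, done = 0.0, {}
    for task in sorted(burst_ms, key=burst_ms.get):
        clock += burst_ms[task]
        done[task] = clock
    return done


def round_robin_completion(burst_ms: Dict[str, float], quantum_ms: float) -> Dict[str, float]:
    """Round Robin with a fixed time quantum, serving tasks in insertion order."""
    remaining = dict(burst_ms)
    queue = deque(burst_ms)
    clock, done = 0.0, {}
    while queue:
        task = queue.popleft()
        run = min(quantum_ms, remaining[task])
        clock += run
        remaining[task] -= run
        if remaining[task] > 0:
            queue.append(task)  # Unfinished: goes to the back of the queue.
        else:
            done[task] = clock
    return done


# Hypothetical estimated run times (ms) of three DNN inference tasks.
bursts = {"t1": 30.0, "t2": 10.0, "t3": 20.0}
sjf = sjf_completion(bursts)
rr = round_robin_completion(bursts, quantum_ms=10.0)
print("SJF mean completion:", sum(sjf.values()) / len(sjf))
print("RR  mean completion:", sum(rr.values()) / len(rr))
```

With these run times, SJF finishes the short task first and achieves the lower mean completion time, while Round Robin interleaves tasks in fixed quanta, trading a higher mean for more even progress across tasks.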

Project Structure

dnn-partitioning-testbed/
├── src/
│   ├── config.py                    # Central configuration loader
│   ├── models/                      # DNN model definitions
│   │   ├── alexnet.py               # AlexNet, MediumNet, DeepNet
│   │   ├── training.py              # Model training (CIFAR-10)
│   │   ├── inference.py             # Partial inference & benchmarking
│   │   └── data_utils.py            # Dataset utilities
│   ├── partitioning/                # Partitioning & optimization
│   │   ├── partitioner.py           # Greedy, SA, GA, Trend-Based
│   │   ├── delay_optimizer.py       # End-to-end delay calculation
│   │   ├── task.py                  # DNN task representation
│   │   └── scheduling.py            # SJF & Round Robin schedulers
│   ├── benchmarking/                # Benchmark processing
│   │   ├── benchmark_results.py     # Result data structures
│   │   ├── result_aggregation.py    # Cross-node aggregation
│   │   └── partition_calculator.py  # Feasible partition computation
│   ├── network/                     # Network operations
│   │   ├── ssh_executor.py          # Concurrent SSH execution
│   │   ├── latency_measurement.py   # Latency & system info
│   │   ├── power_measurement.py     # INA260 power sensing
│   │   └── node_statistics.py       # Node metric averaging
│   └── testbed/                     # Testbed control
│       ├── load_generator.py        # CPU & memory load injection
│       └── latency_injector.py      # Network latency injection (tc)
├── gui/
│   ├── app.py                       # Flask web application
│   ├── static/styles.css            # Dashboard styles
│   └── templates/                   # Jinja2 HTML templates
│       ├── index.html               # Main dashboard
│       └── map.html                 # Standalone map view
├── scripts/
│   ├── base_station/                # BS deployment scripts
│   │   ├── run_command.sh           # Remote command execution
│   │   ├── run_full_benchmarking.sh # Complete benchmarking pipeline
│   │   ├── benchmark_alexnet.sh     # AlexNet benchmarking
│   │   └── train_and_run_benchmark.sh
│   └── end_device/                  # ED deployment scripts
│       ├── benchmark_alexnet.sh     # AlexNet benchmarking (ED)
│       ├── train_and_run_benchmark.sh
│       ├── transfer_files.sh        # SCP file transfer
│       ├── apply_latency.sh         # tc latency injection
│       └── create_load.sh           # CPU/memory load generator
├── data/sample/                     # Example data files
│   └── nodes.example.json           # Network topology example
├── docs/
│   └── architecture.md              # Detailed architecture docs
├── results/                         # Benchmark output (gitignored)
├── config.example.yaml              # Configuration template
├── requirements.txt                 # Python dependencies
├── CONTRIBUTING.md                  # Contribution guidelines
└── LICENSE                          # MIT License
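`latency_injector.py` and `apply_latency.sh` are described as injecting network latency with `tc`. A common way to do this on Linux is the `netem` queueing discipline; the sketch below only builds the command lines, assuming root access via `sudo` and the `wlan0` interface used in the configuration example. The helper names `netem_delay_cmd` and `netem_clear_cmd` are hypothetical, and the repository's scripts may use different `tc` options.

```python
import shlex
from typing import List, Optional


def netem_delay_cmd(interface: str, delay_ms: int, jitter_ms: Optional[int] = None) -> List[str]:
    """Build a tc command that adds (or replaces) a netem delay on `interface`."""
    cmd = ["sudo", "tc", "qdisc", "replace", "dev", interface,
           "root", "netem", "delay", f"{delay_ms}ms"]
    if jitter_ms is not None:
        cmd.append(f"{jitter_ms}ms")  # Optional jitter around the mean delay.
    return cmd


def netem_clear_cmd(interface: str) -> List[str]:
    """Build a tc command that removes the root qdisc, restoring the default."""
    return ["sudo", "tc", "qdisc", "del", "dev", interface, "root"]


# Only print the commands; on an end device the list could be passed to
# subprocess.run(cmd, check=True) instead.
print(shlex.join(netem_delay_cmd("wlan0", 50, jitter_ms=5)))
print(shlex.join(netem_clear_cmd("wlan0")))
```

Using `replace` rather than `add` lets the same command be rerun with a new delay without first deleting the existing qdisc.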

Prerequisites

  • Base Station: Linux machine (tested on Ubuntu 22.04)
  • End Devices: Raspberry Pi 4 (tested with Raspberry Pi OS)
  • Python: 3.8 or higher
  • Network: All devices on the same WiFi/LAN network
  • Optional: INA260 power sensor connected via I2C on end devices
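For the optional INA260 sensor, converting raw register values to physical units is a small amount of arithmetic. This sketch assumes the INA260 datasheet scaling (1.25 mA per bit for the signed current register, 1.25 mV per bit for bus voltage, 10 mW per bit for power) and big-endian 16-bit registers. Reading the bytes over I2C (for example with `smbus2`) is left out, and the function names are hypothetical rather than taken from `power_measurement.py`.

```python
# INA260 register addresses, listed for reference (not used below).
CURRENT_REG = 0x01
BUS_VOLTAGE_REG = 0x02
POWER_REG = 0x03


def word(msb: int, lsb: int) -> int:
    """Combine two bytes read from a big-endian 16-bit register."""
    return (msb << 8) | lsb


def to_signed16(raw: int) -> int:
    """Interpret a 16-bit value as two's complement."""
    return raw - 0x10000 if raw & 0x8000 else raw


def current_ma(raw: int) -> float:
    return to_signed16(raw) * 1.25  # 1.25 mA per bit, signed


def bus_voltage_v(raw: int) -> float:
    return raw * 1.25 / 1000.0  # 1.25 mV per bit, converted to volts


def power_mw(raw: int) -> float:
    return raw * 10.0  # 10 mW per bit


# Hypothetical raw register readings.
print(bus_voltage_v(word(0x0F, 0xA0)))  # 5.0
print(current_ma(word(0x01, 0x90)))     # 500.0
print(current_ma(word(0xFF, 0x38)))     # -250.0
print(power_mw(word(0x00, 0xFA)))       # 2500.0
```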

Installation

# Clone the repository
git clone https://github.com/Dimitrios-Kafetzis/DNN-Partitioning-and-Offloading-Testbed.git
cd DNN-Partitioning-and-Offloading-Testbed

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and edit configuration
cp config.example.yaml config.yaml
# Edit config.yaml with your network settings

Configuration

Copy config.example.yaml to config.yaml and update the values:

ssh_username: pi                    # SSH username for end devices
base_ip_prefix: "192.168.0."        # Network IP prefix
network_interface: wlan0            # Network interface on end devices

Sensitive and machine-specific values (the SSH password and the installation directories) should be set via environment variables:

export TESTBED_SSH_PASSWORD="your_password"   # Or use SSH keys
export TESTBED_BS_DIR="/path/to/base/station"
export TESTBED_ED_DIR="/path/to/end/device"
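A minimal sketch of how file values and environment variables might be combined, assuming `config.yaml` has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that node addresses are formed by appending a host number to `base_ip_prefix`. The target keys `ssh_password`, `bs_dir`, and `ed_dir`, and the helpers `merge_config` and `node_ip`, are hypothetical; `src/config.py` may behave differently.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Environment variable -> config key (the keys on the right are hypothetical).
ENV_OVERRIDES = {
    "TESTBED_SSH_PASSWORD": "ssh_password",
    "TESTBED_BS_DIR": "bs_dir",
    "TESTBED_ED_DIR": "ed_dir",
}


def merge_config(file_values: Mapping[str, Any],
                 environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Overlay environment variables on values parsed from config.yaml."""
    env = os.environ if environ is None else environ
    merged = dict(file_values)  # Copy so the parsed file values stay untouched.
    for var, key in ENV_OVERRIDES.items():
        if var in env:
            merged[key] = env[var]
    return merged


def node_ip(config: Mapping[str, Any], host_id: int) -> str:
    """Form a node address by appending a host number to base_ip_prefix."""
    return f"{config['base_ip_prefix']}{host_id}"


file_values = {"ssh_username": "pi", "base_ip_prefix": "192.168.0.", "network_interface": "wlan0"}
cfg = merge_config(file_values, environ={"TESTBED_BS_DIR": "/home/user/testbed"})
print(cfg["bs_dir"], node_ip(cfg, 21))
```

Passing `environ` explicitly keeps the function testable; by default it reads the real process environment, so secrets never need to appear in `config.yaml`.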

Quick Start

1. Train the DNN Model

python3 src/models/training.py --filepath models --filename alexnet_cifar10

2. Run Benchmarking

# On each end device, or remotely via the GUI
python3 src/models/inference.py benchmark \
    --image_path data/test_image.jpg \
    --model_path models/alexnet_cifar10
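Per-layer benchmarking amounts to timing each layer in sequence and averaging over repeated runs. The sketch below uses plain Python callables as stand-in layers; for a Keras `Sequential` model, each entry of `model.layers` is also callable and could be passed in instead. `profile_layers`, its `warmup` and `runs` parameters, and the toy layers are hypothetical and are not the interface of `inference.py`.

```python
import time
from statistics import mean
from typing import Any, Callable, List, Sequence, Tuple

Layer = Callable[[Any], Any]


def profile_layers(layers: Sequence[Layer], sample: Any,
                   runs: int = 10, warmup: int = 1) -> Tuple[Any, List[float]]:
    """Run `layers` in order; return the final output and mean ms per layer.

    The first `warmup` passes are executed but not recorded, so one-off costs
    (e.g., graph tracing or cache warm-up) do not skew the averages.
    `runs` must be at least 1.
    """
    samples: List[List[float]] = [[] for _ in layers]
    output = sample
    for run in range(warmup + runs):
        x = sample
        for i, layer in enumerate(layers):
            start = time.perf_counter()
            x = layer(x)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            if run >= warmup:
                samples[i].append(elapsed_ms)
        output = x
    return output, [mean(s) for s in samples]


# Toy stand-in layers operating on a list of floats.
toy_layers = [
    lambda x: [2.0 * v for v in x],      # "scale"
    lambda x: [v - 1.0 for v in x],      # "shift"
    lambda x: [max(0.0, v) for v in x],  # "ReLU"
]
out, per_layer_ms = profile_layers(toy_layers, [0.25, 1.0, 3.0], runs=5)
print(out, [round(ms, 4) for ms in per_layer_ms])
```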

3. Optimize Partitioning

python3 src/partitioning/delay_optimizer.py \
    calculated_max_partition_points.json \
    total_averages_of_benchmark_results.json \
    latencies/final_latencies_averages.json \
    --scheduling_algorithm SJF \
    --mode optimize

4. Launch the GUI

cd gui
python3 app.py
# Open http://localhost:8001 in your browser

Optimization Algorithms

| Algorithm | Description | Best for |
|---|---|---|
| Greedy | Iteratively assigns partitions, minimizing delay at each step | Fast initial solutions |
| Simulated Annealing | Probabilistic search with temperature-based acceptance | Escaping local optima |
| Genetic Algorithm | Population-based evolution with crossover and mutation | Exploring large search spaces |
| Trend-Based Greedy | Multi-solution greedy with a time budget | Balanced speed/quality |
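As an illustration of the greedy idea, here is one plausible layer-by-layer variant: each layer goes to whichever device adds the least delay given where the previous layer ran. It uses a simple delay model (per-layer compute time plus a fixed hop cost) and compares the result against an exhaustive search that is only feasible for tiny examples. All names and numbers are hypothetical, and `partitioner.py` may define its greedy step differently.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Compute = Dict[str, List[float]]      # device -> per-layer latency (ms)
Links = Dict[Tuple[str, str], float]  # (from, to) -> transfer cost (ms)


def delay(assignment: Sequence[str], compute_ms: Compute, link_ms: Links) -> float:
    """Per-layer compute time plus a fixed cost at each device change."""
    total, prev = 0.0, None
    for layer, dev in enumerate(assignment):
        if prev is not None and prev != dev:
            total += link_ms[(prev, dev)]
        total += compute_ms[dev][layer]
        prev = dev
    return total


def greedy_partition(compute_ms: Compute, link_ms: Links) -> List[str]:
    """Place each layer on the device that adds the least delay right now."""
    n_layers = len(next(iter(compute_ms.values())))
    assignment: List[str] = []
    prev: Optional[str] = None
    for layer in range(n_layers):
        step = {
            dev: compute_ms[dev][layer]
            + (link_ms[(prev, dev)] if prev is not None and prev != dev else 0.0)
            for dev in compute_ms
        }
        prev = min(step, key=step.get)
        assignment.append(prev)
    return assignment


def exhaustive_partition(compute_ms: Compute, link_ms: Links) -> List[str]:
    """Try every assignment; only practical for a handful of layers/devices."""
    n_layers = len(next(iter(compute_ms.values())))
    best = min(product(compute_ms, repeat=n_layers),
               key=lambda a: delay(a, compute_ms, link_ms))
    return list(best)


compute_ms = {"ed1": [4.0, 9.0, 12.0, 3.0], "ed2": [5.0, 6.0, 7.0, 2.0]}
link_ms = {("ed1", "ed2"): 8.0, ("ed2", "ed1"): 8.0}

greedy = greedy_partition(compute_ms, link_ms)
optimal = exhaustive_partition(compute_ms, link_ms)
print(greedy, delay(greedy, compute_ms, link_ms))    # ['ed1', 'ed1', 'ed1', 'ed1'] 28.0
print(optimal, delay(optimal, compute_ms, link_ms))  # ['ed2', 'ed2', 'ed2', 'ed2'] 20.0
```

Here the greedy pass stays on `ed1` because every individual switch to `ed2` costs 8 ms up front, whereas running everything on `ed2` is cheaper overall (20 ms vs 28 ms). Myopic choices like this are the local optima that Simulated Annealing and the Genetic Algorithm are meant to escape.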

Citation

If you use this testbed in your research, please cite:

@inproceedings{kafetzis2024demo,
  author    = {Kafetzis, Dimitrios and Koutsopoulos, Iordanis},
  title     = {Demo: An Experimental Platform for AI Model Partitioning on Resource-constrained Devices},
  booktitle = {Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc '24)},
  year      = {2024},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3641512.3690629},
  url       = {https://doi.org/10.1145/3641512.3690629},
}

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License — see the LICENSE file for details.

Author

Dimitrios Kafetzis (GitHub)
