
DREAMS

Decentralized Resource Allocation and Service Management across the Compute Continuum Using Service Affinity


DREAMS is a decentralized framework that optimizes microservice placement decisions collaboratively across distributed computational domains. Each domain is managed by an autonomous Local Domain Manager (LDM) that coordinates with peers through a Raft-based consensus algorithm and cost-benefit voting to achieve globally optimized service placements with high fault tolerance.

Paper: DREAMS: Decentralized Resource Allocation and Service Management across the Compute Continuum Using Service Affinity (IEEE ISNCC 2025)

Research Context

Modern manufacturing systems require adaptive computing infrastructures that respond to dynamic workloads across the compute continuum. Traditional centralized placement solutions struggle to scale, suffer from latency bottlenecks, and introduce single points of failure. DREAMS addresses this through decentralized, privacy-preserving coordination using Service Affinity -- a multi-dimensional metric capturing runtime communication, design-time dependencies, operational patterns, and data privacy constraints.

Key Contributions

  1. A decentralized decision-making framework for collaborative resource allocation and service management across the compute continuum.
  2. The design and implementation of a reusable Local Domain Manager (LDM), capable of autonomous operation and coordination through consensus mechanisms.
  3. An extensive evaluation demonstrating feasibility and sub-linear scalability as the number of domains increases.

Architecture

Each LDM operates autonomously within its domain while coordinating globally through Raft consensus and a two-phase migration protocol.

LDM Architecture

flowchart TB
    subgraph LDM1["LDM 1 (us-east4)"]
        direction TB
        AM1["AM\nAdministrative"]
        CCM1["CCM\nConfiguration"]
        ODM1["ODM\nObservability"]
        DMM1["DMM\nDomain Monitoring"]
        MIM1["MIM\nMigration Intelligence"]
        CMM1["CMM\nConsensus Management"]
        MEM1["MEM\nMigration Execution"]
        ICM1["ICM\nInter-Domain Comm."]
    end

    subgraph LDM2["LDM 2 (europe-west3)"]
        direction TB
        CMM2["CMM"]
        ICM2["ICM"]
    end

    subgraph LDM3["LDM 3 (asia-southeast1)"]
        direction TB
        CMM3["CMM"]
        ICM3["ICM"]
    end

    CMM1 <-->|"Raft Consensus\n+ Voting"| CMM2
    CMM2 <-->|"Raft Consensus\n+ Voting"| CMM3
    CMM1 <-->|"Raft Consensus\n+ Voting"| CMM3
    ICM1 <-.->|"Gossip + Latency\nMeasurement"| ICM2
    ICM2 <-.->|"Gossip + Latency\nMeasurement"| ICM3

LDM Modules

| Module | Abbreviation | Responsibility |
| --- | --- | --- |
| Administrative | AM | Dashboard, policy management, visualization |
| Configuration Control | CCM | Configuration repository, dynamic updates, validation |
| Observability & Diagnostics | ODM | Metrics aggregation, event logging |
| Domain Monitoring | DMM | Service Health Monitor, Service Affinity Calculator |
| Migration Intelligence | MIM | Migration Eligibility Evaluator (leader), Cost-Benefit Analyzer (follower) |
| Consensus Management | CMM | Proposal Manager, Voting Engine, Leader Coordinator, Fault Recovery |
| Migration Execution | MEM | Migration Orchestrator, Rollback Manager, Health Validator |
| Inter-Domain Communication | ICM | LDM Discovery, Inter-domain Migration Coordinator |

Repositories

| Repository | Description |
| --- | --- |
| Event Log (ELR) | System health metrics, migration events, error traces |
| Domain State (DSR) | Intra-domain affinity scores, topology, resource availability |
| Consensus Log (CLR) | Raft messages, committed proposals, leader election history |
| Migration State (MSR) | Ongoing/completed migrations, checkpoints, rollback support |
| Peer Domain (PDR) | Peer health, inter-domain latency, membership |

Service Affinity

Service Affinity captures five dimensions of microservice relationships to quantify placement quality (Eq. 12.1):

  • Data affinity -- normalized ratio of bytes exchanged between services to total data volume
  • Coupling affinity -- normalized message frequency and API call count between services
  • Functional affinity -- dependency graph membership (shared service dependency groups)
  • Operational affinity -- hardware similarity vs. resource contention balance
  • Security & privacy affinity -- penalty-based constraint against cross-group placements
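As an illustration only, the five dimensions can be folded into a single score. The weights and the additive privacy penalty below are assumptions for demonstration; the paper defines the exact formulation:

```java
/** Illustrative only: folds the five affinity dimensions into one score.
 *  The weights and the additive privacy penalty are assumed, not the paper's formula. */
public final class ServiceAffinitySketch {

    /** Weighted sum of normalized dimensions (each in [0, 1]), minus a privacy penalty. */
    public static double affinity(double data, double coupling, double functional,
                                  double operational, double privacyPenalty) {
        final double wData = 0.3, wCoupling = 0.3, wFunctional = 0.2, wOperational = 0.2; // assumed weights
        double score = wData * data + wCoupling * coupling
                     + wFunctional * functional + wOperational * operational;
        // Security & privacy affinity acts as a penalty against cross-group placements.
        return score - privacyPenalty;
    }
}
```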

QoS Improvement Score

For a microservice m currently in cluster c_current, the QoS improvement score is:

Q = A_inter - A_intra - L

where:

  • A_intra = affinity of m to services in its current cluster
  • A_inter = max affinity of m to any other cluster
  • L = latency penalty (sigmoid-scaled by affinity gain)

Migration is proposed when Q > θ_proposal.
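The proposal check follows directly from these definitions, using the sigmoid-scaled latency penalty from Phase 1 of the migration protocol, L = ℓ / (1 + exp(ΔA / γ_proposal)). The class name and parameter values here are placeholders, not the deployed defaults:

```java
/** Sketch of the QoS improvement check; thresholds shown are placeholders. */
public final class QosScore {

    /** Sigmoid-scaled latency penalty: larger affinity gains shrink the penalty. */
    static double latencyPenalty(double latency, double affinityGain, double gammaProposal) {
        return latency / (1.0 + Math.exp(affinityGain / gammaProposal));
    }

    /** Q = A_inter - A_intra - L; a migration is proposed when Q > thetaProposal. */
    static boolean proposeMigration(double aInter, double aIntra, double latency,
                                    double gammaProposal, double thetaProposal) {
        double gain = aInter - aIntra;                               // ΔA
        double q = gain - latencyPenalty(latency, gain, gammaProposal);
        return q > thetaProposal;
    }
}
```

Because the penalty is sigmoid-scaled by the gain, a large affinity improvement suppresses the latency penalty and makes a proposal more likely.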

Two-Phase Migration Protocol

Phase 1 -- Candidate Selection (Leader LDM)

  1. Filter out non-migratable microservices and those already in their highest-affinity cluster
  2. Compute cluster affinity A_c(m) for each candidate across all clusters
  3. Determine intra-cluster affinity A_intra and best inter-cluster affinity A_inter
  4. Calculate affinity gain: ΔA = A_inter - A_intra
  5. Retrieve inter-domain latency to the target cluster
  6. Apply latency penalty with sigmoid scaling: L = ℓ / (1 + exp(ΔA / γ_proposal))
  7. If Q = ΔA - L > θ_proposal, broadcast migration proposal to all LDMs

Phase 2 -- Voting (All LDMs)

Each LDM independently evaluates the proposal:

  1. Compute local impact score: I_local = Σ a(m,v) for all local microservices connected to m
  2. If no local impact (I_local = 0), cast positive vote immediately (positive-vote-by-default)
  3. Normalize impact: Ĩ = (I_local - I_min) / (I_max - I_min + ε)
  4. Compute latency difference: Δℓ = ℓ(target, voter) - ℓ(source, voter)
  5. Calculate affinity penalty weight: W_aff = 1 / (1 + exp(-Ĩ / γ_vote))
  6. Compute scaled latency penalty: P_lat = (Δℓ × W_aff) / ℓ_max
  7. If P_lat < θ_vote, cast positive vote; otherwise, negative vote
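The voting steps above can be condensed into a single pure function; the class and parameter names are illustrative, not taken from the actual implementation:

```java
/** Sketch of a single LDM's vote on a migration proposal (illustrative names). */
public final class VoteSketch {

    static boolean castPositiveVote(double localImpact, double iMin, double iMax,
                                    double latencyToTarget, double latencyToSource,
                                    double lMax, double gammaVote, double thetaVote) {
        // Positive-vote-by-default: no locally connected microservices are affected.
        if (localImpact == 0.0) return true;
        final double eps = 1e-9;
        double normImpact = (localImpact - iMin) / (iMax - iMin + eps);  // Ĩ
        double deltaL = latencyToTarget - latencyToSource;               // Δℓ
        double wAff = 1.0 / (1.0 + Math.exp(-normImpact / gammaVote));   // affinity penalty weight
        double pLat = (deltaL * wAff) / lMax;                            // scaled latency penalty
        return pLat < thetaVote;
    }
}
```

Note that a negative Δℓ (the target is closer to the voter than the source) always yields a positive vote, since the scaled penalty falls below any positive threshold.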

Consensus is reached via Raft quorum (majority vote). The theoretical complexity of the voting procedure is O(1) -- all LDMs evaluate independently in parallel.

Tunable Parameters

| Parameter | Symbol | Role |
| --- | --- | --- |
| Affinity Gain Sensitivity | γ_proposal | Smaller = conservative; larger = aggressive migrations |
| Proposal Threshold | θ_proposal | Minimum net benefit to propose a migration |
| Local Impact Sensitivity | γ_vote | Smaller = favors local stability; larger = global cooperation |
| Voting Threshold | θ_vote | Maximum acceptable latency penalty for a positive vote |

Design Principles

  1. Self-governed Decentralization -- Each LDM operates autonomously; no central controller
  2. Privacy-Preserving Computation -- Domains share only aggregated metrics, not raw data
  3. Collaborative Optimization -- Global optimality through local decisions and consensus
  4. Heuristic-Driven Placement -- Affinity-based cost-benefit analysis with sigmoid scaling
  5. Fault Tolerance -- Raft consensus tolerates ⌊(N-1)/2⌋ node failures
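The fault-tolerance bound in the last principle is the standard Raft majority-quorum property; as a trivial sketch:

```java
/** Raft majority quorum: a cluster of N nodes tolerates floor((N - 1) / 2) failures. */
public final class QuorumMath {
    static int tolerableFailures(int n) {
        return (n - 1) / 2; // integer division is floor for n >= 1
    }
}
```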

Evaluation Results

Evaluated on LDM clusters deployed across three Google Cloud regions (us-east4, europe-west3, asia-southeast1) on e2-standard-4 VMs (4 vCPUs, 16 GB RAM, Ubuntu 22.04), with cluster sizes ranging from 3 to 20 LDMs.

Optimization Correctness

Microservice Distribution Before and After Optimization

| Metric | Value |
| --- | --- |
| Initial inter-domain affinity | 630 |
| Globally optimal solution | 395 |
| Required migrations | MS3 → us-east4, MS10 → asia-southeast1 |
| Result | Converged to optimal state |

Under fault tolerance testing, the system correctly recovered after leader failure via Raft election and resumed optimization without manual intervention.

LDM Registration Latency


| Cluster Size | Mean Registration Latency | Trend |
| --- | --- | --- |
| 3 nodes | 556 ms | Baseline |
| 10 nodes | ~1,500 ms | Sub-linear |
| 20 nodes | 3,121 ms | Sub-linear |

Best-case (seed nodes): average 623.9 ms, stable across all configurations. Registration scales sub-linearly with the number of LDMs.

Migration Voting Latency


| Cluster Size | Mean Voting Time | Std Dev |
| --- | --- | --- |
| 3 nodes | 642.80 ms | Low |
| 10 nodes | ~850 ms | Low |
| 20 nodes | 1,054.10 ms | Low |

Sub-linear growth with low variance -- voting remains predictable and stable as the cluster scales. Although each LDM evaluates proposals independently (theoretically O(1)), the mild residual growth stems from second-order effects: a minor increase in leader workload and the latency of the median node.

Tech Stack

| Component | Technology |
| --- | --- |
| Language | Java 17 |
| Framework | Quarkus with GraalVM |
| Consensus | Apache Ratis (Raft) |
| Clustering & Event Sourcing | Apache Pekko (Protobuf serialization) |
| Caching | Caffeine |
| Database | PostgreSQL with Liquibase migrations |
| Architecture | Hexagonal (Ports & Adapters), Domain-Driven Design |
| Frontend | Next.js 15 + Cytoscape.js |
| Deployment | Docker, Kubernetes (GKE) |

Getting Started

Prerequisites

  • Java 17+
  • Docker & Docker Compose
  • PostgreSQL 17+ (or use the provided docker-compose)

Running with Docker Compose

# Clone the repository
git clone https://github.com/haidinhtuan/DREAMS.git
cd DREAMS

# Start a 3-node LDM cluster with PostgreSQL
docker compose up -d

# Monitor logs
docker compose logs -f ldm1 ldm2 ldm3

The first LDM (ldm1) initializes the database schema via Liquibase. Other LDMs must have LIQUIBASE_MIGRATE_AT_START=false.

Running from IDE

Add the following to your hosts file:

127.0.0.1 host.docker.internal
127.0.0.1 ldm1
127.0.0.1 ldm2
127.0.0.1 ldm3

Then run with:

./gradlew quarkusDev

Building

# Build the application
./gradlew build

# Build Docker image via JIB
./gradlew clean build -Dquarkus.container-image.build=true -Dquarkus.container-image.push=false

# Build native executable (requires GraalVM)
./gradlew build -Dquarkus.native.enabled=true

Compiling Protobuf Files

protoc -I=src/main/proto/com/dreams/infrastructure/serialization \
  --java_out=src/main/java \
  src/main/proto/com/dreams/infrastructure/serialization/migration_action.proto \
  src/main/proto/com/dreams/infrastructure/serialization/ping_pong.proto \
  src/main/proto/com/dreams/infrastructure/serialization/evaluate_migration_proposal.proto

Dashboard

Access the React dashboard for real-time microservice graph visualization and migration statistics:

http://localhost:3000/graph

The frontend connects via WebSocket to the LDM backend (e.g., ws://localhost:8080/dashboard).

REST APIs

| Endpoint | Description |
| --- | --- |
| `GET /api/migrations` | Read migration actions from Raft storage |
| `GET /api/ratis/trigger-leader-change/{raftPeerId}` | Trigger leadership change (must be called on the current leader) |
| `GET /q/health` | Combined health check (liveness + readiness) |
| `GET /q/health/live` | Liveness probe |
| `GET /q/health/ready` | Readiness probe |
| `GET /q/openapi` | OpenAPI specification |
| `GET /q/swagger-ui` | Swagger UI (dev mode only) |

Project Structure

DREAMS/
├── src/main/java/com/dreams/
│   ├── application/
│   │   ├── port/                   # Port interfaces (MigrationService, ClusterMonitoringService)
│   │   └── service/                # MigrationEligibilityEvaluator, ServiceHealthMonitor, MetricsAggregator
│   ├── domain/
│   │   ├── model/                  # Microservice, K8sCluster, MigrationAction, MigrationCandidate
│   │   ├── measurement/            # MeasurementData, MeasurementDataDTO
│   │   └── service/impl/           # ServiceAffinityCalculator, CostBenefitAnalyzer, QoSImprovementCalculator
│   ├── infrastructure/
│   │   ├── adapter/in/
│   │   │   ├── pekko/             # LdmDiscoveryService, HealthExchangeService, MigrationProposalVoter
│   │   │   ├── ratis/             # LDMStateMachine, LeaderCoordinator
│   │   │   ├── rest/              # MigrationActionResource, RaftActionResource
│   │   │   ├── websocket/         # DashboardWebSocket
│   │   │   └── projection/        # ClusterStateProjectionR2dbcHandler
│   │   ├── adapter/out/pekko/      # ProposalManager, ConsensusVotingEngine, MigrationOrchestrator
│   │   ├── config/                 # LdmConfig, ActorSystemManager, RaftServerManager
│   │   ├── serialization/          # Protobuf generated classes
│   │   ├── mapper/                 # MapStruct mappers
│   │   ├── json/                   # Custom JSON serializers/deserializers
│   │   └── persistence/            # JPA entities and converters
│   ├── modules/                     # LDM module facades (AM, CCM, ODM, DMM, MIM, CMM, MEM, ICM)
│   └── shared/                      # Constants and utilities
├── src/test/java/com/dreams/        # Unit tests (20 tests)
├── src/main/proto/                  # Protobuf definitions
├── src/main/resources/
│   ├── application.yaml             # Quarkus + LDM configuration
│   ├── application.conf             # Apache Pekko configuration
│   └── db/changelog/               # Liquibase migrations
├── frontend/ldm-frontend/           # Next.js 15 dashboard
│   ├── app/dashboard/              # Dashboard page (WebSocket with auto-reconnect)
│   └── app/components/             # Graph.tsx (Cytoscape.js), KeyFigureCard.tsx
├── experiments/
│   ├── exp1/ -- exp5/              # Test scenarios with JSON topology files
│   ├── run-experiment.sh           # Experiment automation script
│   └── collect-results.sh         # Results collection script
├── .github/workflows/ci.yml        # GitHub Actions CI/CD
├── docker-compose.yml               # 3-node LDM cluster
├── docker-compose-exp.yml           # 6-node experimental cluster
└── build.gradle                     # Gradle build configuration

Experiments

| # | Description | Expected Result |
| --- | --- | --- |
| Exp 1 | All clusters already optimal | No migration |
| Exp 2 | MS3 misplaced in Berlin | MS3 migrates to New York (highest affinity) |
| Exp 3 | MS10 misplaced in Singapore | MS10 migrates to Berlin (highest affinity) |
| Exp 4 | MS3 and MS10 both misplaced | Both migrate to highest-affinity clusters |
| Exp 5 | 6 LDMs, 20 microservices | E2E QoS optimization across 6 domains |

Select experiment data by updating the volume mounts in docker-compose.yml:

volumes:
  - ./experiments/exp4/LDM1.json:/data/LDM1.json

Set LEADER_ELECTION_MODE=DEFAULT for realistic Raft-based leader election, or TESTING for fixed leader (faster iteration).

Publications

DREAMS builds on a line of prior work on service affinity and microservice management:

  1. H. Dinh-Tuan, T. H. Nguyen, and S. R. Pandey, "DREAMS: Decentralized Resource Allocation and Service Management across the Compute Continuum Using Service Affinity," 2025 12th International Symposium on Networks, Computers and Communications (ISNCC), IEEE, 2025. [IEEE] [arXiv]

  2. H. Dinh-Tuan and F. F. Six, "Optimizing Cloud-Native Services with SAGA: A Service Affinity Graph-Based Approach," 2024 International Conference on Smart Applications, Communications and Networking (SmartNets), IEEE, 2024, pp. 1-6. [IEEE] [arXiv]

  3. H. Dinh-Tuan and F. Beierle, "MS2M: A message-based approach for live stateful microservices migration," 2022 5th Conference on Cloud and Internet of Things (CIoT), IEEE, 2022, pp. 100-107. [IEEE] [arXiv]

  4. H. Dinh-Tuan, K. Katsarou, and P. Herbke, "Optimizing microservices with hyperparameter optimization," 2021 17th International Conference on Mobility, Sensing and Networking (MSN), IEEE, 2021. [IEEE]

Citation

If you use this work, please cite:

@INPROCEEDINGS{11250481,
  author={Dinh-Tuan, Hai and Nguyen, Tien Hung and Pandey, Sanjeet Raj},
  booktitle={2025 International Symposium on Networks, Computers and Communications (ISNCC)}, 
  title={DREAMS: Decentralized Resource Allocation and Service Management across the Compute Continuum Using Service Affinity}, 
  year={2025},
  volume={},
  number={},
  pages={1-8},
  keywords={Fault tolerance;Scalability;Fault tolerant systems;Microservice architectures;Production;Fourth Industrial Revolution;Resource management;Optimization;Smart manufacturing;Manufacturing systems;compute continuum;microservices;decentralized optimization;Industry 4.0;smart manufacturing},
  doi={10.1109/ISNCC66965.2025.11250481}}

License

This project is licensed under the Apache License 2.0. See LICENSE for details.
