Huanyu Wang, Ziyu Xia, Zhuoming Chen, Beidi Chen
Carnegie Mellon University
WWW.Serve operates as an intermediate decentralized serving layer between users and LLM service providers, offering users access to an open and competitive market of worldwide LLM services while preserving service providers’ anonymity and flexibility. Within WWW.Serve, inference requests follow a collaborative workflow that performs decentralized routing, execution, and quality-aware evaluation.
Three key designs are integrated:
- a credit-based transaction system for trustless collaboration.
- a gossip-driven protocol for dynamic peer synchronization.
- a duel-and-judge mechanism for robust contributor evaluation.
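As a rough illustration of the second design, one round of gossip-style peer synchronization can be sketched as below. This is a minimal, hypothetical sketch: the class, field names, and last-writer-wins merge rule are assumptions for illustration, not WWW.Serve's actual protocol.

```python
import random
import time

class GossipNode:
    """Hypothetical sketch of gossip-driven peer synchronization.
    Each node keeps a timestamped view of every peer's status and
    periodically reconciles it with one random neighbor.
    (Illustrative only; not the WWW.Serve implementation.)"""

    def __init__(self, node_id):
        self.node_id = node_id
        self.peers = []  # other GossipNode instances this node knows about
        # local view of the network: node_id -> (timestamp, status)
        self.view = {node_id: (time.time(), "idle")}

    def update_status(self, status):
        # record a change in this node's own serving status
        self.view[self.node_id] = (time.time(), status)

    def merge(self, other_view):
        # last-writer-wins: keep whichever entry has the newer timestamp
        for nid, (ts, status) in other_view.items():
            if nid not in self.view or ts > self.view[nid][0]:
                self.view[nid] = (ts, status)

    def gossip_round(self):
        # push-pull exchange with one random peer; after a few rounds,
        # status updates propagate through the whole network
        if not self.peers:
            return
        peer = random.choice(self.peers)
        peer.merge(self.view)
        self.merge(peer.view)
```

With periodic rounds, any status change eventually reaches all nodes without a central coordinator.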
Under various configurations, WWW.Serve improves global SLO attainment by up to 1.5x and lowers latency by 27.6%. Its performance approaches, and in some cases surpasses, centralized scheduling, while preserving the benefits of decentralization.
The repository is organized as follows:
WWWServe/
├── experiments/
│   ├── simulation/
│   │   └── simu_xxx.py
│   └── visualization/
│       └── visualize_xxx.ipynb
├── node_configs/
│   └── nodex.yaml
├── www_serve/
│   ├── policies/
│   │   └── default_policy.py
│   └── core_codes.py
├── README.md
└── requirements.txt
- experiments/: simulation scripts for network experiments and notebooks for visualization.
- node_configs/: YAML configuration files specifying parameters for each node.
- www_serve/: core implementation of WWW.Serve, including policies and scheduling logic.
conda create -n wwwserve python=3.12
conda activate wwwserve
pip install -r requirements.txt
Note: The above installs only the core dependencies for scheduling. To deploy actual LLM servers, you will need to manually install additional backends (e.g., SGLang, vLLM) depending on your experimental setup.
The typical workflow of WWW.Serve consists of the following steps:
Start your preferred LLM backend (OpenAI-Compatible Server) and obtain its base URL and API key. For example:
# SGLang
# Note: use "--enable-metrics" to expose server status for scheduling
python3 -m sglang.launch_server --model-path $MODEL_PATH --host 0.0.0.0 --port $PORT --enable-metrics
# vLLM
vllm serve $MODEL_PATH --max-model-len=16384 --host 0.0.0.0 --port $PORT

Each node is specified via a YAML configuration file placed in node_configs/:
server_params:
  ip: 127.0.0.1  # node communication address
  port: 5778
  policy: default_sglang  # scheduling / dispatching policy
  offload_frequency: 0.8
  queue_frequency: 0.2
  accept_frequency: 0.8
ledger_params:
  initial_credit: 1000.0
  initial_staked: 0.0
models:
  - model_path: Qwen/Qwen3-8B
    base_url: <BASE_URL>:<PORT>  # from launched LLM server (Step 1)
    api_key: None
gen_params:
  max_tokens: 8192
  temperature: 0.0
  top_p: 0.95
dispatch_params:
  target_token_usage: 0.6

Use the scripts in experiments/simulation/ to start a network simulation. For example:

cd experiments/simulation
python simu_decentralized.py
# python simu_centralized.py
# python simu_single.py

By default, the simulation outputs are saved in experiments/results/, including:
- nodex.json: runtime status log of each node.
- result.json: aggregated results for all requests.
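As a quick sanity check before opening the notebooks, the aggregated results can be summarized directly. Note this is a hedged sketch: the per-request field names ("latency", "slo") are assumptions, not the actual result.json schema; adjust them to match your simulation output.

```python
import json

def summarize(path="experiments/results/result.json"):
    """Sketch of post-processing result.json. Assumes the file holds a
    list of per-request records with (hypothetical) "latency" and "slo"
    fields; adapt the keys to the real schema."""
    with open(path) as f:
        requests = json.load(f)
    latencies = [r["latency"] for r in requests]
    # a request attains its SLO if it finished within its latency target
    attained = sum(1 for r in requests if r["latency"] <= r["slo"])
    return {
        "num_requests": len(requests),
        "slo_attainment": attained / len(requests),
        "mean_latency": sum(latencies) / len(latencies),
    }
```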
The simulation results can be analyzed with the Jupyter notebooks in experiments/visualization/, which visualize global SLO attainment, request latency distribution, and server load status.
TODO