
WWW.Serve: Interconnecting Global LLM Services through Decentralization

Huanyu Wang, Ziyu Xia, Zhuoming Chen, Beidi Chen

Carnegie Mellon University


[Paper] | [Blog]

TL;DR

WWW.Serve operates as an intermediate decentralized serving layer between users and LLM service providers, offering users access to an open and competitive market of worldwide LLM services while preserving service providers’ anonymity and flexibility. Within WWW.Serve, inference requests follow a collaborative workflow that performs decentralized routing, execution, and quality-aware evaluation.

WWW.Serve integrates three key designs:

  • a credit-based transaction system for trustless collaboration.
  • a gossip-driven protocol for dynamic peer synchronization.
  • a duel-and-judge mechanism for robust contributor evaluation.

Under various configurations, WWW.Serve improves global SLO attainment by up to 1.5x and lowers latency by 27.6%. Its performance approaches, and in some cases surpasses, centralized scheduling, while preserving the benefits of decentralization.
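To make the credit-based transaction idea concrete, here is a minimal sketch of a per-node ledger. The field names (`initial_credit`, `initial_staked`) mirror the `ledger_params` in the node configs below; the stake-then-settle logic itself is illustrative, not WWW.Serve's actual implementation.

```python
class CreditLedger:
    """Hypothetical credit ledger for trustless collaboration (sketch)."""

    def __init__(self, initial_credit=1000.0, initial_staked=0.0):
        self.credit = initial_credit   # spendable balance
        self.staked = initial_staked   # collateral locked while serving

    def stake(self, amount):
        """Lock credit as collateral before accepting a request."""
        if amount > self.credit:
            raise ValueError("insufficient credit to stake")
        self.credit -= amount
        self.staked += amount

    def settle(self, reward):
        """Release the stake and collect the reward after evaluation."""
        self.credit += self.staked + reward
        self.staked = 0.0

ledger = CreditLedger()
ledger.stake(100.0)    # lock 100 credits while serving a request
ledger.settle(25.0)    # evaluation passed: stake returned plus reward
print(ledger.credit)   # 1025.0
```

Staking collateral up front gives nodes a financial incentive to serve honestly, since a failed evaluation could forfeit the stake.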

Repository Structure

The repository is organized as follows:

WWWServe/
├── experiments/
│   ├── simulation/
│   │   └── simu_xxx.py
│   └── visualization/
│       └── visualize_xxx.ipynb
├── node_configs/
│   └── nodex.yaml
├── www_serve/
│   ├── policies/
│   │   └── default_policy.py
│   └── core_codes.py
├── README.md
└── requirements.txt

  • experiments/: simulation scripts for network experiments and notebooks for visualization.
  • node_configs/: YAML configuration files specifying parameters for each node.
  • www_serve/: core implementation of WWW.Serve, including policies and scheduling logic.

Installation

conda create -n wwwserve python=3.12
conda activate wwwserve
pip install -r requirements.txt

Note: The above installs only the core dependencies for scheduling. To deploy actual LLM servers, you will need to manually install additional backends (e.g., SGLang, vLLM) depending on your experimental setup.

Usage

The typical workflow of WWW.Serve consists of the following steps:

1. Launch LLM Servers

Start your preferred LLM backend (any OpenAI-compatible server) and note its base URL and API key. For example:

# SGLang
# Note: use "--enable-metrics" to expose server status for scheduling
python3 -m sglang.launch_server --model-path $MODEL_PATH --host 0.0.0.0 --port $PORT --enable-metrics

# vLLM
vllm serve $MODEL_PATH --max-model-len=16384 --host 0.0.0.0 --port $PORT
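Once a server is up, you can verify it by listing its models via the standard OpenAI-compatible `/v1/models` route. The `BASE_URL` below is a placeholder for the host and port you chose above; the `parse_models` helper is illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # hypothetical host/port from the launch command

def parse_models(payload):
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url):
    """Query an OpenAI-compatible endpoint and return its model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return parse_models(json.load(resp))

# Example response shape from an OpenAI-compatible server:
sample = {"object": "list", "data": [{"id": "Qwen/Qwen3-8B", "object": "model"}]}
print(parse_models(sample))  # ['Qwen/Qwen3-8B']
```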

2. Configure Nodes

Each node is specified via a YAML configuration file placed in node_configs/:

server_params:
  ip: 127.0.0.1                  # node communication address
  port: 5778
  policy: default_sglang         # scheduling / dispatching policy
  offload_frequency: 0.8
  queue_frequency: 0.2
  accept_frequency: 0.8

ledger_params:
  initial_credit: 1000.0
  initial_staked: 0.0

models:
  - model_path: Qwen/Qwen3-8B
    base_url: <BASE_URL>:<PORT>  # from launched LLM server (Step 1)
    api_key: None

    gen_params:
      max_tokens: 8192
      temperature: 0.0
      top_p: 0.95

    dispatch_params:
      target_token_usage: 0.6
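As a sketch of how a node might turn the `models` section above into an actual inference call, the snippet below maps `model_path` and `gen_params` onto an OpenAI-style chat-completions payload. The `build_request` helper is hypothetical; field names follow the YAML above.

```python
def build_request(model_cfg, messages):
    """Map a node's model config onto an OpenAI-style request payload (sketch)."""
    gen = model_cfg["gen_params"]
    return {
        "model": model_cfg["model_path"],
        "messages": messages,
        "max_tokens": gen["max_tokens"],
        "temperature": gen["temperature"],
        "top_p": gen["top_p"],
    }

# Mirrors the YAML config above:
model_cfg = {
    "model_path": "Qwen/Qwen3-8B",
    "gen_params": {"max_tokens": 8192, "temperature": 0.0, "top_p": 0.95},
}
req = build_request(model_cfg, [{"role": "user", "content": "hello"}])
print(req["model"], req["max_tokens"])  # Qwen/Qwen3-8B 8192
```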

3. Run Simulation

Use the scripts in experiments/simulation/ to start a network simulation. For example:

cd experiments/simulation
python simu_decentralized.py

# python simu_centralized.py
# python simu_single.py

By default, the simulation outputs are saved in experiments/results/, including:

  • nodex.json: runtime status log of each node.
  • result.json: aggregated results for all requests.
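For quick post-processing outside the notebooks, per-request records from result.json can be reduced to an SLO attainment figure. The actual schema is not documented here, so the `latency` field and the SLO threshold below are assumptions for illustration.

```python
def slo_attainment(records, slo_seconds):
    """Fraction of requests whose end-to-end latency meets the SLO (sketch)."""
    met = sum(1 for r in records if r["latency"] <= slo_seconds)
    return met / len(records)

# Hypothetical per-request records with latencies in seconds:
records = [{"latency": 0.8}, {"latency": 1.4}, {"latency": 2.5}, {"latency": 0.6}]
print(slo_attainment(records, slo_seconds=1.5))  # 0.75
```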

4. Visualize Results

The simulation results can be analyzed with the Jupyter notebooks in experiments/visualization/, which visualize global SLO attainment, request latency distributions, and server load status.

Citation

TODO
