RouterBench

Paper | Dataset

The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System

Setup process

If you want to use MongoDB as an embedding cache, create a .env file in the root directory with the following variable:

CONNECTION_STRING='your mongodb connection string'
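As a rough illustration of how the CONNECTION_STRING variable could be picked up, here is a minimal standard-library sketch. The helper names read_env_file and get_connection_string are hypothetical and not part of this repository; the actual code may load the .env file differently (for example through a dotenv-style package).

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments.

    Hypothetical helper for illustration; not part of the repository.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_connection_string(path: str = ".env"):
    """Prefer the process environment, then the .env file; None if unset."""
    return os.environ.get("CONNECTION_STRING") or read_env_file(path).get(
        "CONNECTION_STRING"
    )
```

If get_connection_string returns None, no MongoDB cache is configured and local caching is the natural fallback.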

We use Martian as it provides a unified gateway to access all the models we use. Please visit withmartian.com to create a new account and get started.

In the root directory, run pip install -e . to install the packages.

Running the pipeline

The pipeline relies on various command line arguments to specify the configuration. Alternatively, you can specify the configuration in a yaml file and pass it to the command line. Example configurations are in the configs/ directory.
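The interplay between a configuration file and command line arguments can be sketched as follows. This is an assumption-laden illustration, not the repository's parser: the --local_cache flag and the rule that command line values override file values are hypothetical, and the file configuration is passed in as an already-loaded dict rather than read from YAML.

```python
import argparse


def merge_config(file_config: dict, argv: list) -> dict:
    """Overlay command line values on top of values loaded from a config file.

    Hypothetical sketch: the flag names and the precedence rule are assumptions.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default=None)
    parser.add_argument("--local_cache", default=None, choices=["true", "false"])
    args = vars(parser.parse_args(argv))

    merged = dict(file_config)
    for key, value in args.items():
        # Skip the config path itself and any flag the user did not pass.
        if key == "config" or value is None:
            continue
        merged[key] = value

    # Normalize a string flag such as "true" into a boolean.
    if isinstance(merged.get("local_cache"), str):
        merged["local_cache"] = merged["local_cache"] == "true"
    return merged
```

With this sketch, merge_config({"local_cache": False}, ["--local_cache", "true"]) yields a configuration in which local_cache is True.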

First, if you want to use MongoDB, make sure there is a running MongoDB instance that you can connect to. Otherwise, set local_cache: true so that the code uses only local files for caching.

Second, run convert_data.py --config=configs/convert_data.yaml to convert the different data formats into a common format. This script accepts the raw format from the martian-evals repo, as well as other relevant input formats.

Third, run evaluate_routers.py --config=configs/evaluate_routers.yaml to use the processed data to evaluate different routers. It generates a csv file (long format) with the results of the evaluation, and creates an EvaluationCollection containing the results.
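A long-format CSV stores one measurement per row. The sketch below shows how such a file could be pivoted into one entry per router. The column names router, metric, and value, and the numbers in the example, are made up for illustration; check the header of the CSV actually produced by evaluate_routers.py.

```python
import csv
import io
from collections import defaultdict


def pivot_long_results(csv_text: str) -> dict:
    """Turn long-format rows into {router: {metric: value}}.

    The column names are hypothetical placeholders.
    """
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[row["router"]][row["metric"]] = float(row["value"])
    return dict(table)


# Made-up example data, not results from the paper.
example = """router,metric,value
router_a,performance,0.71
router_a,cost,1.20
router_b,performance,0.65
router_b,cost,0.40
"""
wide = pivot_long_results(example)
print(wide["router_a"])
```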

Fourth, run visualize_results.py --config=configs/visualize.yaml, which uses the EvaluationCollection to visualize the results in a performance-vs-cost plot.
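When reading a performance-vs-cost plot, it often helps to know which points are not dominated by a cheaper, better alternative. The following is a minimal sketch of that Pareto-frontier filter; it is an illustration only and is not claimed to be what visualize_results.py computes. The tuple layout (name, cost, performance) is an assumption.

```python
def pareto_frontier(points):
    """Return the points that no cheaper-or-equal point beats on performance.

    points: iterable of (name, cost, performance) tuples (assumed layout).
    """
    frontier = []
    best_performance = float("-inf")
    # Visit points from cheapest to most expensive; on cost ties, best first.
    for name, cost, performance in sorted(points, key=lambda p: (p[1], -p[2])):
        if performance > best_performance:
            frontier.append((name, cost, performance))
            best_performance = performance
    return frontier
```

A point survives only if it performs strictly better than every point that costs the same or less.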

For these configurations, the paths to the data files will need to be updated to use your local paths. Example files to recreate results from the paper are available on Hugging Face.

Contribution Guide

The code is designed to be easily extended. To add a new router, or convertor for a different input data format, simply look at the abstract classes AbstractRouter and AbstractConvertor in routers/ and convertors/ respectively.
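The extension pattern can be sketched as an abstract base class with one concrete subclass. The class RouterSketch below is a stand-in and not the real AbstractRouter; the method name route, its arguments, and the CheapestModelRouter example are all hypothetical. Consult routers/ for the actual interface before implementing a router.

```python
from abc import ABC, abstractmethod


class RouterSketch(ABC):
    """Stand-in for an abstract router; the real interface may differ."""

    @abstractmethod
    def route(self, prompt: str, models: list) -> str:
        """Return the name of the model that should answer the prompt."""


class CheapestModelRouter(RouterSketch):
    """Hypothetical router that always picks the lowest-cost model."""

    def __init__(self, cost_per_model: dict):
        self.cost_per_model = cost_per_model

    def route(self, prompt: str, models: list) -> str:
        return min(models, key=lambda m: self.cost_per_model[m])


router = CheapestModelRouter({"model_a": 2.0, "model_b": 0.5})
print(router.route("What is 2 + 2?", ["model_a", "model_b"]))
```

Because route is declared abstract, instantiating RouterSketch directly raises a TypeError, which catches incomplete subclasses early.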

For each PR, please run flake8, black, and isort:

flake8 $(git ls-files '*.py')
black $(git ls-files '*.py')
isort $(git ls-files '*.py')

$(git ls-files '*.py') restricts each tool to the Python files tracked by git, which excludes virtual environment files and data files. You may need to run pip install flake8 black isort if you don't have them installed.

MMLU Ground Truth Extractor

This script extracts ground truth answers from the MMLU (Massive Multitask Language Understanding) dataset using the Hugging Face datasets library.

Features

- Extracts questions and answers from all MMLU dataset splits
- Saves output in JSON format

Requirements

pip install datasets tqdm

Usage

cd to extractors

python mmlu_extract_ground_truth_hf.py

This will:

- Process all splits (test, auxiliary_train, dev, validation)
- Save the output to all_mmlu_splits.json
- Use default cache directory for dataset downloads

Output Format

The script generates a JSON file with the following structure:

{
    "question1": "answer1",
    "question2": "answer2",
    ...
}
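A mapping of this shape could be built as in the sketch below. It assumes each dataset row has a question string, a choices list, and an integer answer index, and it stores the text of the correct choice. Whether the actual script stores the choice text, a letter, or the index is not stated here, so treat that detail and the function name build_ground_truth as assumptions.

```python
import json


def build_ground_truth(rows) -> dict:
    """Map each question to the text of its correct choice.

    Assumes rows with "question", "choices", and an integer "answer" index.
    """
    mapping = {}
    for row in rows:
        mapping[row["question"]] = row["choices"][row["answer"]]
    return mapping


# Made-up rows for illustration only.
rows = [
    {"question": "What is 2 + 2?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Which planet is closest to the Sun?",
     "choices": ["Mercury", "Venus", "Earth", "Mars"], "answer": 0},
]
print(json.dumps(build_ground_truth(rows), indent=4))
```

Note that keying by question text means two rows with identical question text collapse into one entry, with the later row winning.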

Add Ground Truth

The add_ground_truth_mmlu.py script processes JSONL files containing MMLU evaluations and adds ground truth answers to them.

Usage

Setup:
- Place your JSONL files in the data/ directory
- Ensure all_mmlu_splits.json (ground truth data) is in the root directory

Run the script: cd to extractors

python mmlu_add_ground_truth_.py

Output:
- Processed files will be saved in data/processed/ directory
- Each output file will have "_with_gt" suffix
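The processing step can be sketched as follows, under stated assumptions: each JSONL record is assumed to carry a question field, and the answer is written to a ground_truth field. Both field names and the function name add_ground_truth are hypothetical; only the "_with_gt" suffix and the output directory convention come from the description above.

```python
import json
from pathlib import Path


def add_ground_truth(jsonl_path, ground_truth: dict, out_dir) -> Path:
    """Copy a JSONL file, adding a ground truth answer to every record.

    Field names "question" and "ground_truth" are assumptions.
    """
    src = Path(jsonl_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / f"{src.stem}_with_gt{src.suffix}"
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            # Questions missing from the mapping get None instead of failing.
            record["ground_truth"] = ground_truth.get(record.get("question"))
            fout.write(json.dumps(record) + "\n")
    return dst
```

For example, add_ground_truth("data/evals.jsonl", mapping, "data/processed") would write data/processed/evals_with_gt.jsonl, where evals.jsonl is a placeholder file name.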

Modal Update Guide

To deploy the updated Modal app, run the following command:

modal deploy modal_router.py

Citation

If you use this code, please cite the following paper:

@article{hu2024routerbench,
  title   = {ROUTERBENCH: A Benchmark for Multi-LLM Routing System},
  author  = {Qitian Jason Hu and Jacob Bieker and Xiuyu Li and Nan Jiang and Benjamin Keigwin and Gaurav Ranganath and Kurt Keutzer and Shriyash Kaustubh Upadhyay},
  year    = {2024},
  journal = {arXiv preprint arXiv: 2403.12031}
}
