Compact Automated Reproducible Assessment of Machine Learning (CARAML) is a benchmark framework designed to assess mainstream Computer Vision (CV) and Natural Language Processing (NLP) workloads on novel accelerators. It has been developed and extensively tested on systems at the Jülich Supercomputing Centre (JSC).
CARAML provides a compact and automated benchmarking tool that leverages JUBE, a scripting-based framework for creating benchmark sets, running them across different systems, and evaluating results. Additionally, it includes power/energy measurements through the jpwr tool.
CARAML has been tested on the JURECA-DC Evaluation Platform, JURECA-DC, JEDI, and the WEST-AI Nodes. These systems include the following accelerators:

- AMD MI200 node with 4× MI250 GPUs (tag: `MI250`)
- Graphcore IPU-POD4 M2000 with 4× GC200 IPUs (tag: `GC200`)
- NVIDIA Ampere node (SXM) with 4× A100 GPUs (tag: `A100`)
- NVIDIA Hopper node (PCIe) with 4× H100 GPUs (tag: `H100`)
- NVIDIA Hopper node (NVLink) with 4× H100 GPUs (tag: `WAIH100`)
- NVIDIA Grace-Hopper chip with 1× GH200 GPU (tag: `GH200`)
- NVIDIA Grace-Hopper node with 4× GH200 GPUs (tag: `JEDI`)
CARAML currently provides two main benchmarks implemented in Python:
The image_classification model training benchmark is implemented in PyTorch and is designed to test image classification models such as ResNet50 on various accelerators. For IPUs, the graphcore/examples repository is used. Performance is measured in images/sec and energy is measured in Wh.
Note: Support for the Image Classification benchmark in TensorFlow has been discontinued.
The LLM-training benchmark is implemented in PyTorch with:
- Megatron-LM at commit `f7727433293427bef04858f67b2889fe9b177d88`, with a patch applied, for NVIDIA
- Megatron-LM-ROCm at commit `21045b59127cd2d5509f1ca27d81fae7b485bd22`, with a patch applied, for AMD
- graphcore/examples (forked version) for Graphcore
Performance is measured in tokens/sec and energy is recorded in Wh.
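These pinned versions are set up by the benchmark itself. For reference, a manual sketch of the NVIDIA checkout (the patch file name and location are hypothetical):

```bash
# Sketch only: reproduces the pinned Megatron-LM state used for NVIDIA.
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout f7727433293427bef04858f67b2889fe9b177d88
# Apply the CARAML patch for NVIDIA (hypothetical file name and path):
git apply ../megatron_nvidia.patch
```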
To run the benchmarks, you must install JUBE. Follow the JUBE Installation Documentation for setup instructions. The benchmarks are deployed using Apptainer containers and executed using SLURM on the tested accelerators.
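A minimal sketch of the installation (the module name on JSC systems and the PyPI package name are assumptions; the JUBE Installation Documentation is authoritative):

```bash
# On JSC systems, JUBE is typically available as an environment module:
module load JUBE
# Alternatively, install it into a local Python environment:
pip install JUBE
# Check that the jube command is available:
jube --version
```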
The benchmarks use the following data:

- Image Classification: Synthetic data is generated on the host machine for benchmarking. The IPU tag `synthetic` additionally allows for the generation of synthetic data directly on the IPU.
- LLM Training: A subset of the OSCAR dataset (790 samples, ~10 MB) is pre-processed using GPT-2 tokenizers. This data is provided in the `llm_data` directory.
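The pre-processed OSCAR subset ships with the repository, so no further preparation is needed. For reference, such data is typically tokenized with Megatron-LM's preprocessing tool roughly as follows (paths and file names are illustrative; flag names can differ between Megatron-LM versions, so verify against the pinned commit):

```bash
# Illustrative only: the llm_data directory already contains this output.
python tools/preprocess_data.py \
    --input oscar_subset.jsonl \
    --output-prefix oscar_gpt2 \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file gpt2-vocab.json \
    --merge-file gpt2-merges.txt \
    --append-eod \
    --workers 8
```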
Clone the repository and navigate into it:

```bash
git clone https://github.com/FZJ-JSC/CARAML.git
cd CARAML
```
To run the image classification benchmark:

- Modify the `system` and `model` parameters in the JUBE config.
- To pull the required container, use the `container` tag:

  ```bash
  jube run image_classification/image_classification_torch_benchmark.xml --tag container H100
  ```

  For JSC systems, `H100` can be replaced with `GH200`, `MI250`, and `GC200` as required.
- To run the benchmark with the defined configuration:

  ```bash
  jube run image_classification/image_classification_torch_benchmark.xml --tag H100
  ```

  `H100` can be replaced with `A100`, `WAIH100`, `GH200`, `JEDI`, `MI250`, and `GC200` as required.
- After the benchmark has been executed, use `jube continue` to postprocess the results:

  ```bash
  jube continue image_classification/image_classification_torch_benchmark_run -i last
  ```
- To generate the result table:

  ```bash
  jube result image_classification/image_classification_torch_benchmark_run -i last
  ```
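For convenience, the full sequence for a single system tag (`H100` here) as one copy-pasteable block; note that `jube run` submits a SLURM job, so wait for it to complete before postprocessing:

```bash
jube run image_classification/image_classification_torch_benchmark.xml --tag container H100
jube run image_classification/image_classification_torch_benchmark.xml --tag H100
# ...once the submitted SLURM job has finished:
jube continue image_classification/image_classification_torch_benchmark_run -i last
jube result image_classification/image_classification_torch_benchmark_run -i last
```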
To run the LLM training benchmark:

- Set the required `system` and `model` parameters in `llm_benchmark_nvidia_amd.yaml` for NVIDIA and AMD devices, and in `llm_benchmark_ipu.yaml` for Graphcore.
- To run the benchmark with the defined configuration for the `800M` GPT model with OSCAR data:

  ```bash
  jube run llm_training/llm_benchmark_nvidia_amd.yaml --tag 800M A100
  ```

  `A100` can be replaced with `H100`, `WAIH100`, `GH200`, `JEDI`, and `MI250` for the respective systems, and `800M` can be replaced with `13B` or `175B` on systems with more node resources, such as `JEDI`, `H100`, `A100`, and `MI250`.
- To run the benchmark with the defined configuration for the `117M` GPT model on Graphcore with synthetic data:

  ```bash
  jube run llm_training/llm_benchmark_ipu.yaml --tag 117M synthetic
  ```

  If the `synthetic` tag is not given, the benchmark uses OSCAR data.
- After the benchmark has been executed, use `jube continue` to postprocess the results:

  ```bash
  jube continue llm_training/llm_benchmark_nvidia_amd_run -i last
  ```
- To generate the result table:

  ```bash
  jube result llm_training/llm_benchmark_nvidia_amd_run -i last
  ```
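For a Graphcore run, postprocessing follows the same pattern. The run directory name below is an assumption, mirroring the `llm_benchmark_nvidia_amd_run` naming above:

```bash
# Assumed run directory name for the IPU benchmark:
jube continue llm_training/llm_benchmark_ipu_run -i last
jube result llm_training/llm_benchmark_ipu_run -i last
```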
To use the PyTorch `torchrun` API on JSC systems, the fixed_torch_run.py fix is required. The fix resolves the issue described here.
Additionally, the hostname is appended with an `i` to allow communication over InfiniBand, as described here.
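A sketch of that idiom in a SLURM job script (illustrative; the benchmark's own scripts handle this):

```bash
# Illustrative: take the first node of the allocation as rendezvous host
# and append "i" so that traffic goes over the InfiniBand interface.
MASTER_ADDR="$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)i"
export MASTER_ADDR
```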
If you use CARAML in your work, please cite:

```bibtex
@INPROCEEDINGS{10820809,
  author={John, Chelsea Maria and Nassyr, Stepan and Penke, Carolin and Herten, Andreas},
  booktitle={SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  title={Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML},
  year={2024},
  pages={1164-1176},
  doi={10.1109/SCW63240.2024.00158}
}
```

