Skip to content

scale-snu/LLMSimulator

Repository files navigation

LLMSimulator

Overview

LLMSimulator simulator is a c++ based cycle-accurate simulator, which based on graph execution of Large Language Models. This simulator supports state-of-the-art LLMs such as DeepSeek, Llama, Mixtral and etc. This simulator supports not only Multi-Head Attention (MHA) mechanism, but also Grouped-Query Attention(GQA), Multi-Query Attention(MQA) and Multi-head Latent Attention (MLA). LLMSimulator equipped with simulation of Mixture of Expert (MoE). It integrates with modified Ramulator 2.0 for detailed memory modeling. LLMSimulator can evaluate various type of GPU generation such as H100, B100 and B200, and also including bank-level PIM, bank-group-level PIM, and Logic-PIM.

Key features:

  • Supports flexible input/output length, batch sizes, request injection rates, and multi-node hardware configurations
  • Models energy consumption and performance metrics across various memory systems

Prerequisites

  • Compiler: g++ version 11.4.0
  • cmake, clang++

LLMSimulator is tested under the following system.

Getting Started

Building LLMSimulator

  1. Clone the repository
   $ git clone https://github.com/scale-snu/LLMSimulator.git
   $ cd LLMSimulator
   $ git submodule update --init --recursive
  1. Apply patch
   $ cd src/dram/ramulator2
   $ git apply ../../../patch/ramulator2_pim.patch
   $ cd ../../../
  1. Build executable files
   $ mkdir build && cd build
   $ cmake ..
   $ make -j 

How to run

LLMSimulator has config file (config.yaml) and you can modify it with your configuration. After modifying config.yaml and saving it, you can run with command below

   $ ./run > test.log

Contact

Sungmin Yun sungmin.yun@snu.ac.kr

Kwanhee Kyung kwanhee.kyung@scale.snu.ac.kr

Juhwan Cho juhwan.cho@scale.snu.ac.kr

Note

This simulator builds upon the simulator introduced in the MICRO 2024 paper “Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching.”

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages