From b42fc56966157a9902bcb623ee313278bddc17fd Mon Sep 17 00:00:00 2001 From: tianhao909 <843101550@qq.com> Date: Wed, 18 Mar 2026 05:22:19 +0000 Subject: [PATCH 1/3] docs: complete bilingual EN/ZH documentation system for SimAI 1.6 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Build a comprehensive bilingual (English/Chinese) documentation system covering: - SimAI 1.6 Technical Report (EN/ZH) with GPU memory inference architecture - Complete user guides: installation, quickstart, workload generation, simulation - Developer guides: contributing, architecture overview - Project READMEs in EN/ZH/JA with feature highlights and quickstart - vidur-alibabacloud module documentation (EN/ZH + original Vidur README) - CONTRIBUTING, SECURITY, CODE_OF_CONDUCT, CHANGELOG in bilingual format 构建全面的双语(中英文)文档体系,涵盖: - SimAI 1.6 技术报告(中英文),含 GPU 内存推理架构说明 - 完整用户指南:安装、快速开始、工作负载生成、仿真运行 - 开发者指南:贡献指南、架构概览 - 项目 README 中英日三语版本,含功能亮点与快速开始 - vidur-alibabacloud 模块文档(中英文 + 原始 Vidur README) - CONTRIBUTING、SECURITY、CODE_OF_CONDUCT、CHANGELOG 双语版本 Co-authored-by: tianhao909 <843101550@qq.com> Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com> --- CHANGELOG.md | 62 +++ CHANGELOG_CN.md | 62 +++ CODE_OF_CONDUCT.md | 136 ++++++ CODE_OF_CONDUCT_CN.md | 91 ++++ CONTRIBUTING.md | 487 +++++++++++++++++++ CONTRIBUTING.zh-CN.md | 487 +++++++++++++++++++ README.ja.md | 3 +- README.md | 129 +++-- README_CN.md | 273 +++++++++++ SECURITY.md | 47 ++ SECURITY_CN.md | 47 ++ docs/SimAI_1.6_Tech_Report.md | 429 ++++++++++++++++ docs/SimAI_1.6_Tech_Report_CN.md | 429 ++++++++++++++++ docs/en/benchmarking/index.md | 33 ++ docs/en/benchmarking/test_suite.md | 139 ++++++ docs/en/community/index.md | 90 ++++ docs/en/components/aicb.md | 212 ++++++++ docs/en/components/astra_sim.md | 147 ++++++ docs/en/components/index.md | 80 +++ docs/en/components/ns3.md | 175 +++++++ docs/en/components/simccl.md | 82 ++++ docs/en/components/vidur.md | 196 ++++++++ 
docs/en/developer_guide/adding_models.md | 204 ++++++++ docs/en/developer_guide/architecture.md | 176 +++++++ docs/en/developer_guide/contributing.md | 224 +++++++++ docs/en/developer_guide/extending_ns3.md | 219 +++++++++ docs/en/developer_guide/index.md | 25 + docs/en/getting_started/index.md | 25 + docs/en/getting_started/installation.md | 96 ++++ docs/en/getting_started/quickstart.md | 85 ++++ docs/en/index.md | 73 +++ docs/en/technical_reference/cli_reference.md | 185 +++++++ docs/en/technical_reference/configuration.md | 163 +++++++ docs/en/technical_reference/index.md | 13 + docs/en/technical_reference/memory_module.md | 155 ++++++ docs/en/user_guide/index.md | 25 + docs/en/user_guide/inference_simulation.md | 176 +++++++ docs/en/user_guide/result_analysis.md | 183 +++++++ docs/en/user_guide/simai_analytical.md | 104 ++++ docs/en/user_guide/simai_physical.md | 123 +++++ docs/en/user_guide/simai_simulation.md | 113 +++++ docs/en/user_guide/supported_models.md | 141 ++++++ docs/en/user_guide/workload_generation.md | 160 ++++++ docs/images/simai_dingtalk.jpg | Bin 94912 -> 305265 bytes docs/images/simai_wechat.jpeg | Bin 158457 -> 268153 bytes docs/zh/benchmarking/index.md | 33 ++ docs/zh/benchmarking/test_suite.md | 139 ++++++ docs/zh/community/index.md | 91 ++++ docs/zh/components/aicb.md | 212 ++++++++ docs/zh/components/astra_sim.md | 147 ++++++ docs/zh/components/index.md | 80 +++ docs/zh/components/ns3.md | 175 +++++++ docs/zh/components/simccl.md | 82 ++++ docs/zh/components/vidur.md | 194 ++++++++ docs/zh/developer_guide/adding_models.md | 204 ++++++++ docs/zh/developer_guide/architecture.md | 176 +++++++ docs/zh/developer_guide/contributing.md | 224 +++++++++ docs/zh/developer_guide/extending_ns3.md | 219 +++++++++ docs/zh/developer_guide/index.md | 25 + docs/zh/getting_started/index.md | 25 + docs/zh/getting_started/installation.md | 96 ++++ docs/zh/getting_started/quickstart.md | 85 ++++ docs/zh/index.md | 73 +++ 
docs/zh/technical_reference/cli_reference.md | 185 +++++++ docs/zh/technical_reference/configuration.md | 163 +++++++ docs/zh/technical_reference/index.md | 13 + docs/zh/technical_reference/memory_module.md | 155 ++++++ docs/zh/user_guide/index.md | 25 + docs/zh/user_guide/inference_simulation.md | 151 ++++++ docs/zh/user_guide/result_analysis.md | 183 +++++++ docs/zh/user_guide/simai_analytical.md | 104 ++++ docs/zh/user_guide/simai_physical.md | 113 +++++ docs/zh/user_guide/simai_simulation.md | 113 +++++ docs/zh/user_guide/supported_models.md | 133 +++++ docs/zh/user_guide/workload_generation.md | 160 ++++++ vidur-alibabacloud/README-vidur.md | 162 +++++- vidur-alibabacloud/README.md | 314 ++++++++---- vidur-alibabacloud/README_CN.md | 458 +++++++++++++++++ 78 files changed, 11068 insertions(+), 148 deletions(-) create mode 100644 CHANGELOG.md create mode 100644 CHANGELOG_CN.md create mode 100644 CODE_OF_CONDUCT.md create mode 100644 CODE_OF_CONDUCT_CN.md create mode 100644 CONTRIBUTING.md create mode 100644 CONTRIBUTING.zh-CN.md create mode 100644 README_CN.md create mode 100644 SECURITY.md create mode 100644 SECURITY_CN.md create mode 100644 docs/SimAI_1.6_Tech_Report.md create mode 100644 docs/SimAI_1.6_Tech_Report_CN.md create mode 100644 docs/en/benchmarking/index.md create mode 100644 docs/en/benchmarking/test_suite.md create mode 100644 docs/en/community/index.md create mode 100644 docs/en/components/aicb.md create mode 100644 docs/en/components/astra_sim.md create mode 100644 docs/en/components/index.md create mode 100644 docs/en/components/ns3.md create mode 100644 docs/en/components/simccl.md create mode 100644 docs/en/components/vidur.md create mode 100644 docs/en/developer_guide/adding_models.md create mode 100644 docs/en/developer_guide/architecture.md create mode 100644 docs/en/developer_guide/contributing.md create mode 100644 docs/en/developer_guide/extending_ns3.md create mode 100644 docs/en/developer_guide/index.md create mode 100644 
docs/en/getting_started/index.md create mode 100644 docs/en/getting_started/installation.md create mode 100644 docs/en/getting_started/quickstart.md create mode 100644 docs/en/index.md create mode 100644 docs/en/technical_reference/cli_reference.md create mode 100644 docs/en/technical_reference/configuration.md create mode 100644 docs/en/technical_reference/index.md create mode 100644 docs/en/technical_reference/memory_module.md create mode 100644 docs/en/user_guide/index.md create mode 100644 docs/en/user_guide/inference_simulation.md create mode 100644 docs/en/user_guide/result_analysis.md create mode 100644 docs/en/user_guide/simai_analytical.md create mode 100644 docs/en/user_guide/simai_physical.md create mode 100644 docs/en/user_guide/simai_simulation.md create mode 100644 docs/en/user_guide/supported_models.md create mode 100644 docs/en/user_guide/workload_generation.md create mode 100644 docs/zh/benchmarking/index.md create mode 100644 docs/zh/benchmarking/test_suite.md create mode 100644 docs/zh/community/index.md create mode 100644 docs/zh/components/aicb.md create mode 100644 docs/zh/components/astra_sim.md create mode 100644 docs/zh/components/index.md create mode 100644 docs/zh/components/ns3.md create mode 100644 docs/zh/components/simccl.md create mode 100644 docs/zh/components/vidur.md create mode 100644 docs/zh/developer_guide/adding_models.md create mode 100644 docs/zh/developer_guide/architecture.md create mode 100644 docs/zh/developer_guide/contributing.md create mode 100644 docs/zh/developer_guide/extending_ns3.md create mode 100644 docs/zh/developer_guide/index.md create mode 100644 docs/zh/getting_started/index.md create mode 100644 docs/zh/getting_started/installation.md create mode 100644 docs/zh/getting_started/quickstart.md create mode 100644 docs/zh/index.md create mode 100644 docs/zh/technical_reference/cli_reference.md create mode 100644 docs/zh/technical_reference/configuration.md create mode 100644 
docs/zh/technical_reference/index.md create mode 100644 docs/zh/technical_reference/memory_module.md create mode 100644 docs/zh/user_guide/index.md create mode 100644 docs/zh/user_guide/inference_simulation.md create mode 100644 docs/zh/user_guide/result_analysis.md create mode 100644 docs/zh/user_guide/simai_analytical.md create mode 100644 docs/zh/user_guide/simai_physical.md create mode 100644 docs/zh/user_guide/simai_simulation.md create mode 100644 docs/zh/user_guide/supported_models.md create mode 100644 docs/zh/user_guide/workload_generation.md create mode 100644 vidur-alibabacloud/README_CN.md diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 00000000..9ad03ac6 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,62 @@ +
+[中文](./CHANGELOG_CN.md) | [English](./CHANGELOG.md)
+
+ +# Changelog + +All notable changes to SimAI will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). + +> **Note**: This changelog covers v1.0 (initial open-source release) and later versions. + +## [Unreleased] + +## [1.6.0] - 2026-03-16 + +### Added + +- GPU memory calculation module: accurate parameter counting and KV cache management for DeepSeek-V3-671B, Qwen3-MoE-235B, and Qwen3-Next-80B +- PD-separation memory planning for independent Prefill/Decode memory budgets +- Improved AICB decode time estimation with linear interpolation and global cache +- 4-scenario end-to-end inference test suite (`run_scenarios.sh`) +- SimAI 1.6 Technical Report (EN/ZH) +- Complete bilingual documentation system (30+ files under `docs/en/`, `docs/zh/`) +- GitHub community health files: issue/PR templates, Code of Conduct, Security Policy, Contributing Guide + +### Changed + +- Replaced print statements with logging across vidur-alibabacloud modules +- Added bilingual docstrings for public APIs +- Standardized TODO comments format + +### Removed + +- Removed ~390 lines of dead code in vidur-alibabacloud +- Cleaned personal debug markers across 8 files + +## [1.5.0] - 2025-12-30 + +### Added + +- **End-to-end multi-request inference simulation**: Full simulation support for multi-request inference workloads. +- **Prefill/Decode separation**: Model complex inference scenarios with Prefill/Decode phase separation. +- **Modern model support**: Added support for DeepSeek, Qwen3-MoE, and Qwen3-Next models. +- **Request scheduling via Vidur**: Integrated request scheduling component adapted from Microsoft's [Vidur](https://github.com/microsoft/vidur) (see [vidur-alibabacloud](./vidur-alibabacloud/)). +- **AICB inference workload generation**: AICB now supports generating prefill/decode inference workloads for DeepSeek, Qwen3-MoE, and Qwen3-Next. 
+- **DeepSeek training workload support**: AICB now supports generating training workloads for DeepSeek (contributed by [@parthpower](https://github.com/parthpower)). +- **SimCCL initial release**: First public release of the SimCCL collective communication transformation module. + +## [1.0.0] - 2024-10-18 + +### Added + +- Initial open-source release of SimAI: full-stack simulator for AI large-scale training +- Core components: AICB, SimCCL, astra-sim-alibabacloud, ns-3-alibabacloud +- SimAI-Analytical: fast simulation using bus bandwidth abstraction +- SimAI-Simulation: full-stack NS3-based network simulation +- SimAI-Physical (Beta): CPU RDMA cluster physical traffic generation + +### Academic + +- SimAI paper accepted by **NSDI'25 Spring**. See [paper](https://arxiv.org/abs/2410.07346). diff --git a/CHANGELOG_CN.md b/CHANGELOG_CN.md new file mode 100644 index 00000000..a9e50a03 --- /dev/null +++ b/CHANGELOG_CN.md @@ -0,0 +1,62 @@ ++ 中文  |  English +
+ +# 更新日志 + +SimAI 的所有重要变更均记录在此文件中。 + +格式基于 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)。 + +> **注意**:本更新日志涵盖 v1.0(首次开源发布)及之后的版本。 + +## [未发布] + +## [1.6.0] - 2026-03-16 + +### 新增 + +- GPU 内存计算模块:支持 DeepSeek-V3-671B、Qwen3-MoE-235B、Qwen3-Next-80B 的精确参数计数与 KV Cache 管理 +- PD 分离内存规划:Prefill/Decode 阶段独立的内存预算计算 +- 改进 AICB decode 时间估算(首尾线性插值 + 全局缓存) +- 4 场景端到端推理测试套件(`run_scenarios.sh`) +- SimAI 1.6 技术报告(EN/ZH) +- 完整双语文档系统(`docs/en/`、`docs/zh/` 下 30+ 文件) +- GitHub 社区规范文件:Issue/PR 模板、行为准则、安全政策、贡献指南 + +### 变更 + +- vidur-alibabacloud 各模块 print 输出替换为 logging +- 公开 API 添加双语 docstring +- TODO 注释格式统一规范化 + +### 移除 + +- 清理 vidur-alibabacloud 中约 390 行死代码 +- 清理 8 个文件中的个人调试标记 + +## [1.5.0] - 2025-12-30 + +### 新增 + +- **端到端多请求推理仿真**:全面支持多请求推理工作负载的端到端仿真。 +- **Prefill/Decode 分离**:支持 Prefill/Decode 阶段分离等复杂推理场景建模。 +- **主流模型支持**:新增对 DeepSeek、Qwen3-MoE 和 Qwen3-Next 模型的支持。 +- **基于 Vidur 的请求调度**:集成了基于微软 [Vidur](https://github.com/microsoft/vidur) 适配的请求调度组件(详见 [vidur-alibabacloud](./vidur-alibabacloud/))。 +- **AICB 推理工作负载生成**:AICB 现已支持为 DeepSeek、Qwen3-MoE 和 Qwen3-Next 生成 prefill/decode 推理工作负载。 +- **DeepSeek 训练工作负载支持**:AICB 新增 DeepSeek 训练工作负载生成支持(由 [@parthpower](https://github.com/parthpower) 贡献)。 +- **SimCCL 首次发布**:SimCCL 集合通信转换模块首次对外公开发布。 + +## [1.0.0] - 2024-10-18 + +### 新增 + +- SimAI 首次开源发布:业界首个全栈高精度 AI 大规模训练模拟器 +- 核心组件:AICB、SimCCL、astra-sim-alibabacloud、ns-3-alibabacloud +- SimAI-Analytical:基于总线带宽抽象的快速仿真 +- SimAI-Simulation:基于 NS3 的全栈网络仿真 +- SimAI-Physical(Beta):CPU RDMA 集群物理流量生成 + +### 学术 + +- SimAI 论文被 **NSDI'25 Spring** 接收。详见 [论文](https://arxiv.org/abs/2410.07346)。 diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..1a84e5ce --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,136 @@ ++ 中文  |  English +
+ +# Contributor Covenant Code of Conduct + +## Our Pledge + +We as members, contributors, and leaders pledge to make participation in our +community a harassment-free experience for everyone, regardless of age, body +size, visible or invisible disability, ethnicity, sex characteristics, gender +identity and expression, level of experience, education, socio-economic status, +nationality, personal appearance, race, religion, or sexual identity +and orientation. + +We pledge to act and interact in ways that contribute to an open, welcoming, +diverse, inclusive, and healthy community. + +## Our Standards + +Examples of behavior that contributes to a positive environment for our +community include: + +* Demonstrating empathy and kindness toward other people +* Being respectful of differing opinions, viewpoints, and experiences +* Giving and gracefully accepting constructive feedback +* Accepting responsibility and apologizing to those affected by our mistakes, + and learning from the experience +* Focusing on what is best not just for us as individuals, but for the + overall community + +Examples of unacceptable behavior include: + +* The use of sexualized language or imagery, and sexual attention or + advances of any kind +* Trolling, insulting or derogatory comments, and personal or political attacks +* Public or private harassment +* Publishing others' private information, such as a physical or email + address, without their explicit permission +* Other conduct which could reasonably be considered inappropriate in a + professional setting + +## Enforcement Responsibilities + +Community leaders are responsible for clarifying and enforcing our standards of +acceptable behavior and will take appropriate and fair corrective action in +response to any behavior that they deem inappropriate, threatening, offensive, +or harmful. 
+ +Community leaders have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions that are +not aligned to this Code of Conduct, and will communicate reasons for moderation +decisions when appropriate. + +## Scope + +This Code of Conduct applies within all community spaces, and also applies when +an individual is officially representing the community in public spaces. +Examples of representing our community include using an official e-mail address, +posting via an official social media account, or acting as an appointed +representative at an online or offline event. + +## Enforcement + +Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported to the community leaders responsible for enforcement at: + +* yunding.lg@alibaba-inc.com +* xuefeiyang.xfy@alibaba-inc.com +* qingxu.lqx@alibaba-inc.com + +All complaints will be reviewed and investigated promptly and fairly. + +All community leaders are obligated to respect the privacy and security of the +reporter of any incident. + +## Enforcement Guidelines + +Community leaders will follow these Community Impact Guidelines in determining +the consequences for any action they deem in violation of this Code of Conduct: + +### 1. Correction + +**Community Impact**: Use of inappropriate language or other behavior deemed +unprofessional or unwelcome in the community. + +**Consequence**: A private, written warning from community leaders, providing +clarity around the nature of the violation and an explanation of why the +behavior was inappropriate. A public apology may be requested. + +### 2. Warning + +**Community Impact**: A violation through a single incident or series +of actions. + +**Consequence**: A warning with consequences for continued behavior. No +interaction with the people involved, including unsolicited interaction with +those enforcing the Code of Conduct, for a specified period of time. 
This +includes avoiding interactions in community spaces as well as external channels +like social media. Violating these terms may lead to a temporary or +permanent ban. + +### 3. Temporary Ban + +**Community Impact**: A serious violation of community standards, including +sustained inappropriate behavior. + +**Consequence**: A temporary ban from any sort of interaction or public +communication with the community for a specified period of time. No public or +private interaction with the people involved, including unsolicited interaction +with those enforcing the Code of Conduct, is allowed during this period. +Violating these terms may lead to a permanent ban. + +### 4. Permanent Ban + +**Community Impact**: Demonstrating a pattern of violation of community +standards, including sustained inappropriate behavior, harassment of an +individual, or aggression toward or disparagement of classes of individuals. + +**Consequence**: A permanent ban from any sort of public interaction within +the community. + +## Attribution + +This Code of Conduct is adapted from the [Contributor Covenant][homepage], +version 2.0, available at +https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. + +Community Impact Guidelines were inspired by [Mozilla's code of conduct +enforcement ladder](https://github.com/mozilla/diversity). + +[homepage]: https://www.contributor-covenant.org + +For answers to common questions about this code of conduct, see the FAQ at +https://www.contributor-covenant.org/faq. Translations are available at +https://www.contributor-covenant.org/translations. diff --git a/CODE_OF_CONDUCT_CN.md b/CODE_OF_CONDUCT_CN.md new file mode 100644 index 00000000..79faac72 --- /dev/null +++ b/CODE_OF_CONDUCT_CN.md @@ -0,0 +1,91 @@ ++ 中文  |  English +
+ +# 贡献者公约行为准则 + +## 我们的承诺 + +作为成员、贡献者和领导者,我们承诺让参与我们社区的每个人都能获得无骚扰的体验,无论其年龄、体型、可见或不可见的残疾、种族、性别特征、性别认同与表达、经验水平、受教育程度、社会经济地位、国籍、个人外貌、种族、宗教或性取向如何。 + +我们承诺以有助于建设开放、包容、多元、友好和健康社区的方式行事与互动。 + +## 我们的标准 + +有助于为我们社区创造积极环境的行为示例包括: + +* 对他人表现出同理心和善意 +* 尊重不同的意见、观点和经历 +* 给予建设性的反馈,并能优雅地接受建设性反馈 +* 承担责任,向受到我们错误影响的人道歉,并从中学习 +* 关注对整个社区最有利的事情,而不仅仅是对个人最有利的事情 + +不可接受的行为示例包括: + +* 使用性暗示的语言或图像,以及任何形式的性关注或性挑逗 +* 恶意评论、侮辱性或贬损性言论,以及人身攻击或政治攻击 +* 公开或私下的骚扰行为 +* 未经明确许可,发布他人的私人信息(如实际地址或电子邮件地址) +* 在职业环境中可能被合理认为不适当的其他行为 + +## 执行职责 + +社区领导者负责阐明和执行我们的可接受行为标准,并将对任何被认为不适当、威胁性、冒犯性或有害的行为采取适当且公平的纠正措施。 + +社区领导者有权利和责任删除、编辑或拒绝与本行为准则不一致的评论、提交、代码、Wiki 编辑、Issue 及其他贡献,并在适当时说明审核决定的原因。 + +## 适用范围 + +本行为准则适用于所有社区空间,也适用于个人在公共场合正式代表社区的情形。代表我们社区的示例包括:使用官方电子邮件地址、通过官方社交媒体账户发帖,或在线上或线下活动中担任指定代表。 + +## 执行 + +若发生滥用、骚扰或其他不可接受的行为,可向负责执行的社区领导者举报: + +* yunding.lg@alibaba-inc.com +* xuefeiyang.xfy@alibaba-inc.com +* qingxu.lqx@alibaba-inc.com + +所有投诉都将得到及时、公平的审查和调查。 + +所有社区领导者都有义务尊重任何事件举报者的隐私和安全。 + +## 执行指南 + +社区领导者在确定违反本行为准则的行为的处理后果时,将遵循以下社区影响指南: + +### 1. 纠正 + +**社区影响**:使用不恰当的语言或其他被认为在社区中不专业或不受欢迎的行为。 + +**处理结果**:由社区领导者发出私下书面警告,说明违规行为的性质,并解释为何该行为不恰当。可能要求公开道歉。 + +### 2. 警告 + +**社区影响**:通过单一事件或一系列行为造成的违规。 + +**处理结果**:发出警告,并说明持续此类行为的后果。在规定时间内,禁止与相关人员互动,包括禁止与执行行为准则的人员进行未经请求的互动。这包括避免在社区空间以及社交媒体等外部渠道进行互动。违反这些条款可能导致临时或永久禁止参与。 + +### 3. 临时禁止 + +**社区影响**:严重违反社区标准,包括持续的不适当行为。 + +**处理结果**:在规定时间内,临时禁止与社区进行任何形式的互动或公开交流。在此期间,禁止与相关人员进行任何公开或私下互动,包括禁止与执行行为准则的人员进行未经请求的互动。违反这些条款可能导致永久禁止参与。 + +### 4. 
永久禁止 + +**社区影响**:表现出违反社区标准的规律性行为,包括持续的不适当行为、骚扰某个人,或对某类人群的攻击或贬低。 + +**处理结果**:永久禁止在社区内进行任何形式的公开互动。 + +## 署名 + +本行为准则改编自 [贡献者公约][homepage] 2.0 版本, +原文见 https://www.contributor-covenant.org/version/2/0/code_of_conduct.html。 + +社区影响指南参考了 [Mozilla 的行为准则执行阶梯](https://github.com/mozilla/diversity)。 + +[homepage]: https://www.contributor-covenant.org + +有关本行为准则常见问题的解答,请参阅 https://www.contributor-covenant.org/faq。 +译文见 https://www.contributor-covenant.org/translations。 diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000..a6a3658b --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,487 @@ +# Contributing to SimAI + +[中文版](CONTRIBUTING.zh-CN.md) + +Thank you for your interest in contributing to SimAI! This guide will help you get started with contributing code, documentation, and ideas. + +--- + +## What We're Building + +**Vision**: The industry's first full-stack, high-precision simulator for AI large-scale inference and training. + +**Goal**: Provide end-to-end modeling and simulation of AI training/inference processes—encompassing framework, collective communication, network layers, and more—so researchers can analyze performance, evaluate optimizations, and explore infrastructure designs without real hardware. + +**Current Progress**: SimAI 1.5 released (Dec 2025), with end-to-end multi-request inference simulation, DeepSeek/Qwen3 model support, and Prefill/Decode separation. + +**Academic Background**: Accepted by NSDI'25 Spring. See our [paper](https://arxiv.org/abs/2410.07346) for technical details. + +--- + +## How to Contribute + +### Ways to Contribute + +1. **New features** — Add model support, parallelism strategies, scheduling policies, etc. +2. **Bug fixes** — Fix simulation inaccuracies, crashes, or incorrect results +3. **Performance optimization** — Improve simulation speed, memory usage, or scalability +4. **Documentation** — Improve tutorials, add examples, fix errors +5. 
**Benchmarks & validation** — Add validation against real hardware results +6. **Issue reports** — Report bugs, request features, or share feedback + +--- + +## Project Architecture + +SimAI is a modular project composed of 5 core submodules (Git submodules) and several supporting directories: + +``` +SimAI/ +├── aicb/ # AI Computation Benchmark — workload generation (Python) +│ ├── workload_generator/ # Generates training/inference workloads +│ └── aicb.py # Main entry point +├── astra-sim-alibabacloud/ # Simulation engine — core simulator (C++) +│ ├── astra-sim/ # Extended from astra-sim 1.0 +│ └── build.sh # Build script +├── ns-3-alibabacloud/ # NS-3 network simulator backend (C++) +├── vidur-alibabacloud/ # LLM inference simulation (Python) +│ ├── vidur/ # Core simulation framework +│ └── setup.py # Python package config +├── SimCCL/ # Collective communication transformation +├── docs/ # Documentation and tutorials +├── example/ # Example workloads and configurations +├── scripts/ # Build and utility scripts +│ └── build.sh # Main build script +├── results/ # Simulation output directory +├── bin/ # Compiled binary output +├── Dockerfile # Docker container definition +└── README.md # Project documentation +``` + +--- + +## Development Environment Setup + +### Prerequisites + +- **Python** 3.8+ (3.12 recommended with Docker image) +- **CMake** 3.16+ +- **GCC/G++** 9.4+ +- **Git** with submodule support + +### Option A: Docker (Recommended) + +```bash +# Build the Docker image +docker build -t simai:latest . + +# Run a container with GPU support +docker run --gpus all -it --rm \ + -v $(pwd)/results:/workspace/SimAI/results \ + simai:latest /bin/bash +``` + +### Option B: Build from Source + +```bash +# 1. Clone with submodules +git clone --recurse-submodules https://github.com/aliyun/SimAI.git +cd SimAI + +# 2. 
Build C++ components (choose one mode) +# Analytical mode (fast, no network detail): +bash scripts/build.sh -c analytical + +# NS-3 simulation mode (full-stack, detailed network): +bash scripts/build.sh -c ns3 + +# Physical mode (beta, RDMA clusters): +bash scripts/build.sh -c phy + +# 3. Install Python dependencies +pip install -r aicb/requirements.txt +pip install -r vidur-alibabacloud/requirements.txt + +# 4. Verify the build +ls bin/ # Should contain SimAI_analytical or SimAI_simulator +``` + +### Verify Installation + +```bash +# Quick check: run a simple analytical simulation +cd bin +./SimAI_analytical \ + --workload_path=../example/workload_analytical.txt \ + --comm_group_type=TP_GROUP \ + --busbw_path=../example/busbw.yaml +``` + +--- + +## Working with Submodules + +SimAI uses Git submodules for its core components. Understanding this is crucial for contributing. + +### Submodule Overview + +| Submodule | Repository | Language | Description | +|-----------|-----------|----------|-------------| +| `aicb` | [aliyun/aicb](https://github.com/aliyun/aicb) | Python | Workload generation | +| `SimCCL` | [aliyun/SimCCL](https://github.com/aliyun/SimCCL) | Python | Collective communication | +| `ns-3-alibabacloud` | [aliyun/ns-3-alibabacloud](https://github.com/aliyun/ns-3-alibabacloud) | C++ | Network simulation | +| `astra-sim-alibabacloud` | In-tree | C++ | Simulation engine | +| `vidur-alibabacloud` | In-tree | Python | Inference simulation | + +### Key Rules + +1. **Submodules have independent Git histories.** Changes inside a submodule directory are tracked by that submodule's own repo, not the parent. +2. **The parent repo only tracks the commit hash** of each submodule. After modifying a submodule, you must commit in both the submodule and the parent repo. +3. 
**Always initialize submodules** after cloning: + ```bash + git submodule update --init --recursive + ``` + +### Cross-Submodule Changes + +If your contribution spans multiple submodules (e.g., adding a new model in `aicb` and simulation support in `astra-sim-alibabacloud`): + +1. Make and commit changes in each submodule separately +2. Update the parent repo to point to the new submodule commits +3. Create separate PRs for each submodule repository if they have independent remotes +4. Reference the related PRs in your descriptions + +--- + +## Development Workflow + +### Step 1: Fork and Clone + +```bash +# Fork the repository on GitHub, then: +git clone --recurse-submodules https://github.com/YOUR_USERNAME/SimAI.git +cd SimAI + +# Add upstream remote +git remote add upstream https://github.com/aliyun/SimAI.git +``` + +### Step 2: Create a Feature Branch + +```bash +# Sync with upstream first +git fetch upstream +git checkout -b feature/your-feature-name upstream/master + +# Branch naming conventions: +# feature/xxx — New features +# fix/xxx — Bug fixes +# docs/xxx — Documentation +# perf/xxx — Performance improvements +# refactor/xxx — Code refactoring +``` + +### Step 3: Develop and Test + +```bash +# Make your changes... +# Run relevant tests (see Testing section below) + +# For C++ changes, rebuild: +bash scripts/build.sh -c analytical # or ns3 + +# For Python changes, verify imports and basic functionality +python -c "from aicb import ..." 
+``` + +### Step 4: Commit Your Changes + +```bash +# Stage your changes +git add -A + +# Commit with a descriptive message (see Commit Convention below) +git commit -m "feat(aicb): add Llama-4 model workload generation" +``` + +### Step 5: Push and Create PR + +```bash +# Push to your fork +git push origin feature/your-feature-name + +# Then create a Pull Request on GitHub +``` + +--- + +## Code Style + +### Python + +- **Formatter**: [black](https://github.com/psf/black) (default settings) +- **Import sorting**: [isort](https://pycqa.github.io/isort/) (compatible with black) +- **Linter**: [flake8](https://flake8.pycqa.org/) +- **Max line length**: 120 characters + +```bash +# Format your Python code +black --line-length 120 your_file.py +isort your_file.py +flake8 your_file.py --max-line-length 120 +``` + +### C++ + +- Follow the existing code style in `astra-sim-alibabacloud/` +- Use 4-space indentation +- Keep function and variable names in `snake_case` +- Add comments for non-trivial logic + +### Shell Scripts + +- Use `#!/bin/bash` shebang +- Quote all variables: `"${VAR}"` not `$VAR` +- Use `set -e` for error handling where appropriate + +### General Rules + +- Write comments in **English** +- All new functions/classes should have docstrings or header comments +- Avoid hardcoded paths; use relative paths or configuration variables +- Keep changes focused — one feature/fix per PR + +--- + +## Commit Message Convention + +Use [Conventional Commits](https://www.conventionalcommits.org/) format: + +``` ++ 中文  |  English +
+ +# SimAI + +[](LICENSE) +[](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf) + # Latest News ### Recent Updates +- [2026/03] **SimAI 1.6 Released!** This release adds GPU memory modeling for inference simulation. Key features include: + + - **GPU Memory Module:** Accurate parameter counting and KV cache management for DeepSeek-V3-671B, Qwen3-MoE-235B, and Qwen3-Next-80B. See [SimAI 1.6 Tech Report](./docs/SimAI_1.6_Tech_Report.md). + - **PD-Separation Memory Planning:** Independent parameter memory and KV cache budget calculation for Prefill and Decode phases. See [memory_planner.py](./vidur-alibabacloud/vidur/scheduler/utils/memory_planner.py). + - **Improved Decode Time Estimation:** Linear interpolation replacing nearest-neighbor for AICB decode time prediction, with global cache for cross-run reuse. See [execution_time.py](./vidur-alibabacloud/vidur/entities/execution_time.py). + - **4-Scenario Test Suite:** End-to-end validation covering Qwen3-Next-80B, DeepSeek-671B, and Qwen3-MoE-235B. See [run_scenarios.sh](./vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh). + - **Bilingual Documentation:** Complete EN/ZH documentation system. See [English Docs](./docs/en/index.md) | [中文文档](./docs/zh/index.md). + - **GitHub Community Health Files:** Added [Issue Templates](./.github/ISSUE_TEMPLATE/), [PR Template](./.github/pull_request_template.md), [Code of Conduct](./CODE_OF_CONDUCT.md), [Security Policy](./SECURITY.md), [Contributing Guide](./CONTRIBUTING.md), and [Changelog](./CHANGELOG.md). + - **Code Quality:** Replaced print with logging, added bilingual docstrings, removed ~390 lines of dead code, standardized TODOs, and added type annotations across vidur-alibabacloud modules. + - [2025/12] **SimAI 1.5 Released!** This release brings end-to-end simulation for multi-request **inference** workloads. Key features include: - - - **Advanced Inference Simulation:** Model complex scenarios with Prefill/Decode separation. 
- - **Modern Model Support:** Now includes DeepSeek, Qwen3Moe and Qwen3Next. See [AICB's README](./aicb/README.md) for more detailed information. - - **Request Scheduling:** Request scheduling is now handled by a component adapted from Microsoft's [Vidur](https://github.com/microsoft/vidur). See [Vidur-Alibabacloud's README](./vidur-alibabacloud/README.md) for more detailed information. + + - **Advanced Inference Simulation:** Model complex scenarios with Prefill/Decode separation. + - **Modern Model Support:** Now includes DeepSeek, Qwen3Moe and Qwen3Next. See [AICB's README](./aicb/README.md) for more detailed information. + - **Request Scheduling:** Request scheduling is now handled by a component adapted from Microsoft's [Vidur](https://github.com/microsoft/vidur). See [Vidur-Alibabacloud's README](./vidur-alibabacloud/README.md) for more detailed information. - [2025/11] [AICB](https://github.com/aliyun/aicb/tree/master) now supports generating **prefill/decode** inference workloads for **DeepSeek**, **Qwen3-MoE** and **Qwen3-Next**. @@ -14,7 +33,8 @@ - [2025/06] The code of SimCCL is first released in the branch [SimCCL](https://github.com/aliyun/SimAI/tree/SimCCL) and will be released in SimCCL repository soon. -**We warmly welcome contributions from the community!** If you are interested in helping shape the future of SimAI, please feel free to open an issue to discuss your ideas or submit a pull request. +**We warmly welcome contributions from the community!** If you are interested in helping shape the future of SimAI, please feel free to open an issue to discuss your ideas or submit a pull request. +
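The 1.6 release notes above describe a GPU memory module and PD-separation memory planning with independent Prefill/Decode budgets. The sketch below illustrates that kind of bookkeeping in isolation. It is a hedged approximation: the function names, the KV-cache layout, and every number are assumptions made up for illustration, not SimAI's actual interfaces (the real logic lives in the `memory_planner.py` referenced above).

```python
# Hedged sketch (not SimAI's actual API): estimate how many tokens of KV cache
# fit on one GPU once parameter memory and a safety reserve are carved out,
# the kind of budget a PD-separated planner computes independently for the
# Prefill pool and the Decode pool.

def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes one token occupies in cache: a K and a V tensor in every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def max_cached_tokens(gpu_mem_gb: int, param_bytes: int,
                      reserve_frac: float, per_token_bytes: int) -> int:
    """Tokens that fit after parameters and a fractional reserve are deducted."""
    budget = gpu_mem_gb * 1024**3 * (1 - reserve_frac) - param_bytes
    return max(0, int(budget // per_token_bytes))

# Illustrative numbers only, not the official DeepSeek/Qwen3 configurations:
per_tok = kv_cache_bytes_per_token(num_layers=48, num_kv_heads=4, head_dim=128)
print(max_cached_tokens(gpu_mem_gb=80, param_bytes=40 * 1024**3,
                        reserve_frac=0.1, per_token_bytes=per_tok))  # prints 349525
```

Calling `max_cached_tokens` twice, once with the Prefill pool's parameter footprint and once with the Decode pool's, mirrors the independent per-phase budgeting that the PD-separation feature describes.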
+ 中文  |  English +
+ +# SimAI + +[](LICENSE) +[](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf) + +# 最新动态 + +### 近期更新 + +- [2026/03] **SimAI 1.6 正式发布!** 本版本新增推理仿真的 GPU 内存建模能力。主要特性包括: + + - **GPU 内存计算模块:** 支持 DeepSeek-V3-671B、Qwen3-MoE-235B、Qwen3-Next-80B 的精确参数计数与 KV Cache 管理。详见 [SimAI 1.6 技术报告](./docs/SimAI_1.6_Tech_Report_CN.md)。 + - **PD 分离内存规划:** Prefill 与 Decode 阶段独立的参数内存和 KV Cache 预算计算。详见 [memory_planner.py](./vidur-alibabacloud/vidur/scheduler/utils/memory_planner.py)。 + - **Decode 时间估算改进:** 首尾线性插值替代最近邻的 AICB decode 时间预测,全局缓存支持跨运行复用。详见 [execution_time.py](./vidur-alibabacloud/vidur/entities/execution_time.py)。 + - **4 场景端到端测试:** 覆盖 Qwen3-Next-80B、DeepSeek-671B、Qwen3-MoE-235B 的完整验证套件。详见 [run_scenarios.sh](./vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh)。 + - **双语文档体系:** 全套 EN/ZH 文档系统。详见 [English Docs](./docs/en/index.md) | [中文文档](./docs/zh/index.md)。 + - **GitHub 社区规范文件:** 新增 [Issue 模板](./.github/ISSUE_TEMPLATE/)、[PR 模板](./.github/pull_request_template.md)、[行为准则](./CODE_OF_CONDUCT.md) ([中文](./CODE_OF_CONDUCT_CN.md))、[安全政策](./SECURITY.md) ([中文](./SECURITY_CN.md))、[贡献指南](./CONTRIBUTING.md) ([中文](./CONTRIBUTING.zh-CN.md)) 和 [更新日志](./CHANGELOG.md) ([中文](./CHANGELOG_CN.md))。 + - **代码质量提升:** logging 替换 print 输出、双语 docstring、清理 ~390 行死代码、TODO 规范化、类型标注补全。 + +- [2025/12] **SimAI 1.5 正式发布!** 本版本新增对多请求**推理**工作负载的端到端仿真支持,主要特性包括: + + - **高级推理仿真:** 支持 Prefill/Decode 分离等复杂场景建模。 + - **主流模型支持:** 新增 DeepSeek、Qwen3Moe 和 Qwen3Next 模型。详见 [AICB README](./aicb/README.md)。 + - **请求调度:** 请求调度组件基于微软 [Vidur](https://github.com/microsoft/vidur) 适配,详见 [Vidur-Alibabacloud README](./vidur-alibabacloud/README_CN.md)。 + +- [2025/11] [AICB](https://github.com/aliyun/aicb/tree/master) 新增对 **DeepSeek**、**Qwen3-MoE** 和 **Qwen3-Next** 的 **prefill/decode** 推理工作负载生成支持。 + +- [2025/09] [AICB](https://github.com/aliyun/aicb/tree/master) 新增 DeepSeek 训练工作负载生成支持。感谢 [@parthpower](https://github.com/parthpower) 的贡献。 + +- [2025/06] SimCCL 代码首次在 [SimCCL](https://github.com/aliyun/SimAI/tree/SimCCL) 
分支发布,后续将在独立仓库正式开源。 + +**欢迎社区贡献!** 如有想法,欢迎提交 Issue 讨论或发起 Pull Request。 + ++ |--- AICB +SimAI --|--- SimCCL + |--- astra-sim-alibabacloud + |--- ns-3-alibabacloud + |--- vidur-alibabacloud ++ +在纯仿真能力基础上,SimAI 已演进为一个由四个组件([aicb](https://github.com/aliyun/aicb)、[SimCCL](https://github.com/aliyun/SimCCL)、[astra-sim-alibabacloud](https://github.com/aliyun/SimAI/tree/master/astra-sim-alibabacloud)、[ns-3-alibabacloud](https://github.com/aliyun/ns-3-alibabacloud))构成的全栈工具套件。这些组件可以灵活组合以实现不同功能。我们鼓励用户探索更多可能性。 + +下图为 SimAI 模拟器架构图: + + +astra-sim-alibabacloud 基于 [astra-sim](https://github.com/astra-sim/astra-sim/tree/ASTRA-sim-1.0) 扩展开发。感谢 astra-sim 团队的优秀工作和开源贡献。我们在其基础上集成了 NCCL 算法并添加了若干新特性。 + +## 应用场景 + +SimAI 支持三种主要运行模式: + +**SimAI-Analytical** 通过使用总线带宽(busbw)抽象网络通信细节来估算集合通信时间,实现快速仿真。目前支持用户自定义 busbw,自动计算 busbw 功能即将推出。 + +**SimAI-Simulation** 提供基于细粒度网络通信建模的全栈仿真。利用 NS-3 或其他网络模拟器(当前 NS-3 已开源)实现对所有通信行为的详细仿真,力求高保真还原真实训练环境。 + +**SimAI-Physical** *(Beta)* 支持在 CPU RDMA 集群环境下生成物理流量,通过生成类 NCCL 的流量模式深入研究 LLM 训练中的 NIC 行为。当前处于内测阶段。 + +| 场景 | 描述 | 组件组合 | +|------|------|----------| +| 1. AICB 测试套件 | 在 GPU 集群上使用 AICB 测试套件运行通信模式 | [AICB](https://github.com/aliyun/aicb) | +| 2. AICB/AIOB 工作负载 | 建模**推理**/训练过程的计算/通信模式以生成工作负载 | [AICB](https://github.com/aliyun/aicb) | +| 3. 集合通信分析 | 将集合通信操作分解为点对点通信集合 | [SimCCL](https://github.com/aliyun/SimCCL) | +| 4. 无 GPU 集合通信 | 在非 GPU 集群上执行 RDMA 集合通信流量 | [AICB](https://github.com/aliyun/aicb) + [SimCCL](https://github.com/aliyun/SimCCL) + [astra-sim-alibabacloud](https://github.com/aliyun/SimAI/tree/master/astra-sim-alibabacloud)(physical) | +| 5. SimAI-Analytical | 在任意服务器上快速进行 AICB 工作负载分析与仿真(忽略底层网络细节) | [AICB](https://github.com/aliyun/aicb) + [astra-sim-alibabacloud](https://github.com/aliyun/SimAI/tree/master/astra-sim-alibabacloud)(analytical) | +| 6. 
SimAI-Simulation | 在任意服务器上进行全栈仿真 | [AICB](https://github.com/aliyun/aicb) + [SimCCL](https://github.com/aliyun/SimCCL) + [astra-sim-alibabacloud](https://github.com/aliyun/SimAI/tree/master/astra-sim-alibabacloud)(simulation) + [ns-3-alibabacloud](https://github.com/aliyun/ns-3-alibabacloud) | +| 7. 多请求推理仿真 | 在单 GPU 服务器上进行多请求**推理**全栈仿真 | [AICB](https://github.com/aliyun/aicb) + [SimCCL](https://github.com/aliyun/SimCCL) + [vidur-alibabacloud](./vidur-alibabacloud) + [astra-sim-alibabacloud](https://github.com/aliyun/SimAI/tree/master/astra-sim-alibabacloud)(analytical/simulation) | + +## 引用 + +SimAI 论文已被 NSDI'25 Spring 接收,详情请参阅: + +*SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision.* + +[[pdf](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf)] / [[slides](./docs/SimAI_Intro_Online.pdf)] / [[video](https://n.dingtalk.com/dingding/live-room/index.html?roomId=OF5BkBUXVxmgsK7x&liveUuid=305736cd-aa70-498b-8003-2b471a53decd)] + +欢迎基于 SimAI 开展创新研究和功能扩展。欢迎加入社区群或通过邮件联系我们交流,我们可提供技术支持。 + +# 快速开始 + +以下为简单示例。完整教程请参见:[**SimAI@Tutorial**](./docs/Tutorial.md)、[**aicb@Tutorial**](https://github.com/aliyun/aicb/blob/master/training/tutorial.md)、[SimCCL@Tutorial]、[ns-3-alibabacloud@Tutorial] + +## 环境搭建 + +请按照以下步骤快速搭建环境并运行 SimAI。 + +### 从源码安装 + +以下步骤已在 Ubuntu 20.04 的 GCC/G++ 9.4.0、python 3.8.10 环境下验证。 + +可使用官方 Ubuntu 20.04 镜像,**不要安装 ninja**。 + +(对于工作负载生成场景,推荐直接使用 NGC 容器镜像。) + +```bash +# 克隆仓库 +$ git clone https://github.com/aliyun/SimAI.git +$ cd ./SimAI/ + +# 初始化子模块 +$ git submodule update --init --recursive +# 更新到最新提交 +$ git submodule update --remote + +# 编译 SimAI-Analytical +$ ./scripts/build.sh -c analytical + +# 编译 SimAI-Simulation (ns3) +$ ./scripts/build.sh -c ns3 +``` + +## 使用 SimAI-Analytical + +```bash +$ ./bin/SimAI_analytical -w example/workload_analytical.txt -g 9216 -g_p_s 8 -r test- -busbw example/busbw.yaml +``` + +若需自动计算总线带宽,请尝试: + +```bash +$ ./bin/SimAI_analytical -w 
./example/workload_analytical.txt -g 9216 -nv 360 -nic 48.5 -n_p_s 8 -g_p_s 8 -r example- +``` + +## 使用 SimAI-Simulation + +```bash +# 生成网络拓扑 +$ python3 ./astra-sim-alibabacloud/inputs/topo/gen_Topo_Template.py -topo Spectrum-X -g 128 -gt A100 -bw 100Gbps -nvbw 2400Gbps + +# 运行仿真 +$ AS_SEND_LAT=3 AS_NVLS_ENABLE=1 ./bin/SimAI_simulator -t 16 -w ./example/microAllReduce.txt -n ./Spectrum-X_128g_8gps_100Gbps_A100 -c astra-sim-alibabacloud/inputs/config/SimAI.conf +``` + +## 使用多请求推理仿真 + +详情请参见 `vidur-alibabacloud` 目录下的 [README](./vidur-alibabacloud/README_CN.md)。该模块利用 AICB 对**推理**工作负载的计算时间进行 profiling。由于依赖 DeepGEMM 和 FlashMLA 等特定硬件加速库,目前仅兼容基于 **Hopper(SM90)** 和 **Blackwell(SM100)** 架构的 NVIDIA GPU。 + +```bash +# 从 Dockerfile 构建 +docker build -t image:latest . +docker run --gpus all -it --rm image:latest +``` + +**注意:** 若使用 Hopper GPU,请在 Dockerfile 中添加 `ENV FLASH_MLA_DISABLE_SM100=1`。 + +如需快速验证所有支持的推理场景(Qwen3-Next-80B、DeepSeek-671B、Qwen3-MoE-235B),可使用内置的四场景测试套件: + +```bash +# 前置条件:conda activate vidur +bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --all +# 或单独运行某个场景: +bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --scenario 1 +``` + +> **前置条件:** 需先激活 `conda activate vidur` 环境。详见 [环境配置](./vidur-alibabacloud/README_CN.md#-环境配置)。 +> +> 完整场景配置表与输出文件说明请参见 [Vidur-AlibabaCloud README](./vidur-alibabacloud/README_CN.md#四场景配置说明)。 + +# 致谢 + +衷心感谢以下人员和机构对本项目的贡献: + + +- TianHao Fu (Peking University) and [TELOS-syslab](https://github.com/TELOS-syslab/) +- Parth Parikh (KEYSIGHT) +- Sarah-Michelle Hammer & Ziyi Wang (TU-Berlin) +- Xinyue Li (BUPT) +- Tong Chen (Zhejiang University) +- Ming Wang (BUPT) +- Tao Jiang (Institute of Computing Technology, Chinese Academy of Sciences) + +……以及众多来自社区的个人贡献者(详见 [Contributors to aliyun/SimAI](https://github.com/aliyun/SimAI/graphs/contributors))。 + +同时感谢 Chenning Li(MIT CSAIL)发起了将 SimAI 集成到 [M4](https://github.com/netiken/m4) 的合作——M4 是一个新型创新模拟器。 + +**本项目持续欢迎更多贡献与建议。** + +# 贡献指南 + 
+欢迎参与贡献!开始前请阅读以下指引: + +| | | +|---|---| +| [贡献指南](./CONTRIBUTING.zh-CN.md) | 如何提交 Issue 和 Pull Request | +| [安全政策](./SECURITY_CN.md) | 如何报告安全漏洞 | +| [行为准则](./CODE_OF_CONDUCT_CN.md) | 社区行为规范 | +| [更新日志](./CHANGELOG_CN.md) | v1.5 起的版本历史 | + +# 联系我们 + +如有任何问题,欢迎发送邮件至:Gang Lu(yunding.lg@alibaba-inc.com)、Feiyang Xue(xuefeiyang.xfy@alibaba-inc.com)或 Qingxu Li(qingxu.lqx@alibaba-inc.com)。 + +欢迎加入 SimAI 社区交流群,左侧为钉钉群,右侧为微信群。 + +
+
++ 中文  |  English +
+ +# Security Policy + +## Reporting a Vulnerability + +The SimAI team takes security issues seriously. We appreciate your efforts to responsibly disclose any security vulnerabilities you find. + +**Please do NOT report security vulnerabilities through public GitHub issues.** + +Instead, please report them via email to: + +- Gang Lu: yunding.lg@alibaba-inc.com +- Feiyang Xue: xuefeiyang.xfy@alibaba-inc.com +- Qingxu Li: qingxu.lqx@alibaba-inc.com + +Please include the following information in your report: + +- Description of the vulnerability +- Steps to reproduce the issue +- Potential impact +- Suggested fix (if any) + +## Response Timeline + +- We will acknowledge receipt of your vulnerability report within **3 business days**. +- We will provide a more detailed response within **10 business days**, indicating the next steps for handling your report. +- We will keep you informed of the progress towards a fix and full announcement. + +## Supported Versions + +| Version | Supported | +|---------|--------------------| +| 1.5.x | :white_check_mark: | +| < 1.5 | :x: | + +## Disclosure Policy + +When we receive a security bug report, we will: + +1. Confirm the problem and determine the affected versions. +2. Audit code to find any similar problems. +3. Prepare fixes and release them as soon as possible. + +Thank you for helping keep SimAI and its users safe! diff --git a/SECURITY_CN.md b/SECURITY_CN.md new file mode 100644 index 00000000..87c911be --- /dev/null +++ b/SECURITY_CN.md @@ -0,0 +1,47 @@ ++ 中文  |  English +
+ +# 安全政策 + +## 报告漏洞 + +SimAI 团队高度重视安全问题。我们感谢您以负责任的方式披露所发现的安全漏洞。 + +**请勿通过公开的 GitHub Issue 报告安全漏洞。** + +请通过电子邮件将漏洞报告发送至以下联系人: + +- Gang Lu:yunding.lg@alibaba-inc.com +- Feiyang Xue:xuefeiyang.xfy@alibaba-inc.com +- Qingxu Li:qingxu.lqx@alibaba-inc.com + +报告中请包含以下信息: + +- 漏洞描述 +- 复现步骤 +- 潜在影响 +- 修复建议(如有) + +## 响应时间线 + +- 我们将在 **3 个工作日**内确认收到您的漏洞报告。 +- 我们将在 **10 个工作日**内提供更详细的回复,说明后续处理步骤。 +- 我们将持续向您通报修复进展及公告发布情况。 + +## 支持的版本 + +| 版本 | 是否支持 | +|---------|-----------------------| +| 1.5.x | :white_check_mark: | +| < 1.5 | :x: | + +## 披露政策 + +收到安全漏洞报告后,我们将: + +1. 确认问题并确定受影响的版本。 +2. 对代码进行审查,排查类似问题。 +3. 尽快准备并发布修复补丁。 + +感谢您帮助保障 SimAI 及其用户的安全! diff --git a/docs/SimAI_1.6_Tech_Report.md b/docs/SimAI_1.6_Tech_Report.md new file mode 100644 index 00000000..b00809b7 --- /dev/null +++ b/docs/SimAI_1.6_Tech_Report.md @@ -0,0 +1,429 @@ ++ 中文  |  English +
+ +# SimAI 1.6 Technical Report + +> This report covers all features from SimAI 1.5 as well as the new enhancements introduced in SimAI 1.6. + +## 1. Overview + +**SimAI** is the industry's first full-stack, high-precision **Sim**ulator for **AI** large-scale inference and training, open-sourced by Alibaba Cloud. SimAI provides detailed modeling and simulation of the entire LLM inference and training process, encompassing the framework layer, collective communication layer, and network transport layer, delivering end-to-end performance data. The SimAI paper was accepted by NSDI'25 Spring [1]. + +SimAI 1.6 builds upon SimAI 1.5 with further enhancements, primarily introducing the **GPU Memory Calculation Module** (supporting accurate parameter counting and KV cache management for DeepSeek-V3-671B, Qwen3-MoE-235B, and Qwen3-Next-80B), a **4-Scenario End-to-End Test Suite**, and comprehensive code quality improvements (bilingual documentation, logging system, dead code cleanup, etc.). + +### Component Overview + +``` + |--- AICB (Workload generation & compute profiling) +SimAI --|--- SimCCL (Collective communication algorithm analysis) + |--- astra-sim-alibabacloud (Simulation engine: Analytical / Simulation / Physical) + |--- ns-3-alibabacloud (NS-3 network backend) + |--- vidur-alibabacloud (Multi-request inference scheduling & memory management) +``` + +--- + +## 2. 
Key Milestones + +The following are the key development events from November 2025 to March 2026: + +| Date | Event | Description | +|------|-------|-------------| +| 2025/11 | AICB PR [#58](https://github.com/aliyun/aicb/pull/58) | AICB adds inference workload generation with prefill/decode phase separation, supporting DeepSeek, Qwen3-MoE, and Qwen3-Next | +| 2025/12 | AICB PR [#60](https://github.com/aliyun/aicb/pull/60) | AICB further update, refining inference workload generation | +| 2025/12 | SimAI PR [#203](https://github.com/aliyun/SimAI/pull/203) | SimAI 1.5 core update: end-to-end inference simulation, PD disaggregation, Vidur scheduling integration, modern model support | +| 2025/12 | ns-3 commit [7e3cb5b](https://github.com/aliyun/ns-3-alibabacloud/commit/7e3cb5b88c99abcb582c5abc3919484a4805111b) | ns-3-alibabacloud README documentation enhancement with detailed NS3 backend modifications | +| 2026/01 | Memory module commits | Completed accurate memory calculation for DeepSeek-V3-671B, Qwen3-Next-80B, and Qwen3-MoE-235B | +| 2026/02 | PD disaggregation memory planning | Implemented independent parameter memory and KV cache budget calculation for Prefill/Decode phases | +| 2026/03 | Code quality improvements | Comprehensive bilingual comments/docs/logs, dead code cleanup, TODO standardization, type annotations | + +--- + +## 3. 
End-to-End Inference Simulation + +SimAI supports complete multi-request LLM inference simulation with the following core features: + +### 3.1 Prefill-Decode (PD) Disaggregation Architecture + +The inference process is divided into two phases: + +- **Prefill phase**: Processes all input prompt tokens and generates the first output token (compute-intensive) +- **Decode phase**: Autoregressively generates subsequent output tokens one at a time (memory-bandwidth-intensive) + +PD disaggregation allows deploying Prefill and Decode phases on different GPU nodes, enabling: +- Elastic resource allocation (Prefill nodes can be configured with more compute, Decode nodes with more memory) +- Performance isolation (avoiding resource contention between Prefill and Decode) +- Flexible P:D node ratio configuration (via `--replica_config_pd_node_ratio`) + +This design was inspired by [splitwise-sim](https://github.com/Mutinifni/splitwise-sim) [6]. + +### 3.2 Multi-Request Inference Scheduling + +The request scheduling component is adapted from Microsoft's [Vidur](https://github.com/microsoft/vidur) [5] (vidur-alibabacloud), supporting the following scheduling strategies: + +| Scheduler Type | Level | Description | +|---------------|-------|-------------| +| `split_wise` | Global | Global scheduling for PD disaggregation, dispatching requests to Prefill and Decode replicas | +| `lor` | Global | Least Outstanding Requests, dispatching to the least-loaded replica | +| `round_robin` | Global | Round-robin dispatch | +| `sarathi` | Per-replica | Intra-replica batch scheduling | +| `split_wise` | Per-replica | Per-replica scheduling for PD disaggregation | + +### 3.3 Flexible Parallelism + +Supports combinations of multiple parallelism strategies: + +- **Data Parallel (DP)** — via `--cluster_config_num_replicas` +- **Tensor Parallel (TP)** — via `--replica_config_tensor_parallel_size` +- **Pipeline Parallel (PP)** — via `--replica_config_num_pipeline_stages` +- **Expert Parallel (EP)** 
— via `--replica_config_expert_model_parallel_size` + +Works for both dense and MoE (Mixture-of-Experts) models. + +### 3.4 Multiple Execution-Time Prediction Backends + +| Backend | Description | +|---------|-------------| +| **AICB/AIOB** | Partially supports compute kernels and TP/DP/PP/EP communication size modeling for DeepSeek-V3-671B, Qwen3-MoE-235B, Qwen3-Next-80B | +| **SimAI Simulation** | SimAI NS-3-based full-stack network simulation (currently supports TP) | +| **SimAI Analytical** | SimAI analytical performance model (currently supports TP) | +| **Native Vidur** | Original Vidur backend, supports TP, DP, PP | + +--- + +## 4. Modern Model Support + +SimAI 1.6 supports the following three state-of-the-art MoE large models, with configuration files located in `vidur-alibabacloud/data/hf_configs/`: + +### 4.1 DeepSeek-V3-671B + +| Attribute | Value | +|-----------|-------| +| Total Layers | 61 | +| Attention Type | MLA (Multi-head Latent Attention) | +| Attention Heads | 128 | +| Hidden Size | 7168 | +| KV LoRA Rank | 512 | +| Q LoRA Rank | 1536 | +| QK RoPE Head Dim | 64 | +| QK NoPE Head Dim | 128 | +| V Head Dim | 128 | +| MoE Routed Experts | 256 | +| Experts Per Token | 8 | +| Shared Experts | 1 | +| Dense Layers (first 3) | Fixed activation of 8 routed experts + 1 shared expert | +| Sparse Layers (layers 3-60) | Dynamically select 8 from 256 routed experts + 1 shared expert | + +Configuration file: `data/hf_configs/deepseek_v3_config.json` + +### 4.2 Qwen3-MoE-235B + +| Attribute | Value | +|-----------|-------| +| Total Layers | 94 | +| Attention Type | MHA/GQA | +| Attention Heads | 64 | +| KV Heads | 4 | +| Hidden Size | 4096 | +| Head Dim | 128 | +| MoE Routed Experts | 128 | +| Experts Per Token | 8 | +| MoE Intermediate Size | 1536 | + +Configuration file: `data/hf_configs/qwen3_moe_config.json` + +### 4.3 Qwen3-Next-80B + +| Attribute | Value | +|-----------|-------| +| Total Layers | 48 | +| Attention Type | Hybrid (full + linear attention, 
alternating every 4 layers) | +| Full Attention Heads | 16 | +| KV Heads | 2 | +| Hidden Size | 2048 | +| Head Dim | 256 | +| Linear Attention Key Heads | 16 | +| Linear Attention Value Heads | 32 | +| MoE Routed Experts | 512 | +| Experts Per Token | 10 | +| MoE Intermediate Size | 512 | + +Configuration file: `data/hf_configs/qwen3-next-80B-A3B_config.json` + +--- + +## 5. GPU Memory Calculation Module + +This is the core new feature in SimAI 1.6. The module provides accurate GPU memory estimation for inference simulation, covering model parameter memory, KV cache memory, and maximum batch size calculation, with separate memory budget computation for Prefill and Decode phases under PD disaggregation. + +### 5.1 Parameter Counting (ParamCounter) + +**File path**: `vidur-alibabacloud/vidur/utils/param_counter.py` + +ParamCounter supports per-layer and per-device parameter counting, returning a triple `(total_params, prefill_params, decode_params)` under PD disaggregation. + +#### MLA Parameters (DeepSeek-V3-671B) + +Per-layer MLA parameter components: + +- **Q LoRA down-projection**: `wq_down = hidden_size * q_lora_rank` = 7168 * 1536 +- **Q LoRA up-projection**: `wq_up = q_lora_rank * num_attention_heads * qk_head_dim` = 1536 * 128 * 192, where `qk_head_dim = qk_nope_head_dim + qk_rope_head_dim = 128 + 64 = 192` +- **KV LoRA down-projection**: `wkv_down = hidden_size * kv_lora_rank` = 7168 * 512 +- **KV LoRA up-projection**: `wkv_up = kv_lora_rank * num_attention_heads * (qk_nope_head_dim + v_head_dim)` = 512 * 128 * 256 +- **Output projection**: `wo = hidden_size * num_attention_heads * v_head_dim` = 7168 * 128 * 128 + +Under FP8 quantization, each parameter element uses 1 byte; under FP16/BF16, each uses 2 bytes. 
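
The per-layer arithmetic above can be checked with a short sketch. This is illustrative only — the function name and keyword defaults are ours, not the actual `ParamCounter` API in `vidur/utils/param_counter.py`:

```python
# Sketch of the per-layer MLA parameter arithmetic for DeepSeek-V3-671B,
# using the dimensions quoted above. Illustrative only: the function name
# and defaults are not the real ParamCounter API.

def mla_params_per_layer(hidden_size: int = 7168,
                         num_attention_heads: int = 128,
                         q_lora_rank: int = 1536,
                         kv_lora_rank: int = 512,
                         qk_nope_head_dim: int = 128,
                         qk_rope_head_dim: int = 64,
                         v_head_dim: int = 128) -> int:
    qk_head_dim = qk_nope_head_dim + qk_rope_head_dim        # 128 + 64 = 192
    wq_down = hidden_size * q_lora_rank                      # Q LoRA down-projection
    wq_up = q_lora_rank * num_attention_heads * qk_head_dim  # Q LoRA up-projection
    wkv_down = hidden_size * kv_lora_rank                    # KV LoRA down-projection
    wkv_up = kv_lora_rank * num_attention_heads * (qk_nope_head_dim + v_head_dim)
    wo = hidden_size * num_attention_heads * v_head_dim      # output projection
    return wq_down + wq_up + wkv_down + wkv_up + wo

n = mla_params_per_layer()
print(f"MLA attention params per layer: {n:,}")  # 186,646,528 elements
print(f"FP8  bytes per layer: {n * 1:,}")        # 1 byte per element under FP8
print(f"BF16 bytes per layer: {n * 2:,}")        # 2 bytes per element under FP16/BF16
```

Summing this over all 61 layers — together with the MoE expert, embedding, and norm terms — yields the full per-device count, which `ParamCounter` then splits across TP/EP ranks.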
+ +References: [3] [4] + +#### MHA/GQA Parameters (Qwen3-MoE-235B) + +Per-layer MHA parameters: + +``` +wq = hidden_size * num_attention_heads * head_dim +wk = hidden_size * num_key_value_heads * head_dim +wv = hidden_size * num_key_value_heads * head_dim +wo = hidden_size * num_attention_heads * head_dim +total = (wq + wk + wv + wo) * bytes_per_element +``` + +#### Linear Attention Parameters (Qwen3-Next-80B) + +Qwen3-Next-80B uses a hybrid attention architecture, alternating between full attention and linear (GDN) attention every 4 layers. Linear attention layers use independent key/value head configurations (`linear_key_head_dim`, `linear_num_key_heads`, etc.). + +#### MoE Expert Parameters + +Per-expert FFN parameters (3 weight matrices W1, W2, W3): + +``` +expert_params = 3 * hidden_size * moe_intermediate_size * bytes_per_element +``` + +#### PD Disaggregation Parameter Calculation + +Under PD disaggregation, the expert parallelism (EP) may differ between Prefill and Decode clusters: + +- **Prefill cluster**: Uses `prefill_world_size` as EP, experts per device = `num_routed_experts / prefill_world_size` +- **Decode cluster**: Uses `decode_world_size` as EP, experts per device = `num_routed_experts / decode_world_size` + +This results in different parameter memory for Prefill and Decode clusters, which in turn affects their respective available KV cache capacity. + +### 5.2 KV Cache Memory Management + +**File path**: `vidur-alibabacloud/vidur/scheduler/utils/memory_planner.py`, `vidur-alibabacloud/vidur/entities/replica.py` + +#### MHA/GQA KV Cache Calculation + +``` +kv_cache_per_token = 2 * num_kv_heads * head_dim * num_layers * bytes_per_element +``` + +The factor of 2 represents the K (Key) and V (Value) caches. + +#### MLA KV Cache Calculation (DeepSeek-V3-671B) + +The MLA architecture uses compressed KV representations. 
Unlike MHA which stores separate K and V caches, MLA stores a single compressed latent vector (`kv_lora_rank`) that jointly encodes K and V, plus the RoPE position keys (`qk_rope_head_dim`). Per-token KV cache size: + +``` +kv_cache_per_token = (kv_lora_rank + qk_rope_head_dim) * num_layers * bytes_per_element +``` + +Where `kv_lora_rank = 512` and `qk_rope_head_dim = 64`. Compared to MHA's per-token cache of `2 * num_kv_heads * head_dim` = 2 * 128 * 128 = 32768 elements, MLA reduces this to 576 elements — a **~57x** reduction. + +#### Per-Request KV Cache Tracking + +The `Replica` entity (`vidur/entities/replica.py`) maintains the following state: + +- `_allocated_kv_cache_memory`: Currently allocated KV cache memory (bytes) +- `_max_kv_cache_memory`: Maximum KV cache capacity (computed on first call by MemoryPlanner) +- `_kv_cache_allocation_map`: Per-request KV cache allocation mapping + +Supported operations: +- `allocate_request_kv_cache_memory(request, num_blocks, block_size)` — Allocate KV cache for a request +- `release_request_kv_cache_memory(request)` — Release KV cache for a completed request +- `get_remaining_kv_cache_capacity()` — Query remaining KV cache capacity and serviceable request count + +### 5.3 MemoryPlanner + +**File path**: `vidur-alibabacloud/vidur/scheduler/utils/memory_planner.py` + +MemoryPlanner is the central component for memory management, with the following calculation flow: + +1. **Compute available GPU memory**: `available_memory = total_GPU_memory * (1 - memory_margin_fraction)` +2. **Get model parameter memory**: Computed via ParamCounter; under PD disaggregation returns `(total, prefill, decode)` triple +3. **Compute KV cache available memory**: `kv_cache_available = available_memory - param_memory` +4. 
**Compute maximum concurrent requests**: `max_requests = kv_cache_available / kv_cache_per_request` + +Under PD disaggregation: +- Prefill replicas use `prefill_param_mem` for KV cache budget calculation +- Decode replicas use `decode_param_mem` for KV cache budget calculation + +Includes OOM detection: when parameter memory exceeds available memory, error messages are output with suggestions to increase TP/EP, use larger GPUs, or enable FP8 quantization. + +--- + +## 6. AICB Inference Workload Generation + +[AICB](https://github.com/aliyun/aicb) introduces inference workload generation capabilities (PR [#58](https://github.com/aliyun/aicb/pull/58), [#60](https://github.com/aliyun/aicb/pull/60)), with key features: + +- **Prefill/Decode phase separation**: Generates separate compute and communication workloads for Prefill and Decode phases +- **Compute kernel profiling**: Relies on the following hardware-accelerated libraries (requires Hopper SM90 or Blackwell SM100 GPUs): + - [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) — FP8 matrix multiplication + - [FlashMLA](https://github.com/deepseek-ai/FlashMLA) — MLA attention acceleration + - [FlashInfer](https://github.com/flashinfer-ai/flashinfer) — High-performance inference kernels +- **Communication size modeling**: Supports communication size calculation for TP, DP, PP, EP parallelism strategies +- **Model support**: DeepSeek-V3-671B, Qwen3-MoE-235B, Qwen3-Next-80B + +--- + +## 7. Four-Scenario End-to-End Test Suite + +**File path**: `vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh` + +Provides 4 pre-configured end-to-end test scenarios covering different models, parallelism strategies, and PD disaggregation configurations. 
+ +### Shared Hardware Configuration + +- GPU: H20 (h20_dgx) +- NVLink bandwidth: 1600 Gbps +- RDMA bandwidth: 800 Gbps +- PD P2P bandwidth: 800 Gbps +- Data type: fp8 +- Requests: Poisson QPS=100, 4 requests, fixed prefill=100 / decode=8 tokens + +### Scenario Configuration + +| Scenario | Model | PD Separation | World Size | TP | PP | EP | Global Scheduler | +|----------|-------|---------------|-----------|----|----|-----|-----------------| +| 1 | Qwen3-Next-80B (MoE) | No | 32 (dp=32) | 1 | 1 | 1 | lor | +| 2 | Qwen3-Next-80B (MoE) | Yes (P=2, D=6) | 8 | 1 | 1 | 1 | split_wise | +| 3 | DeepSeek-671B (MoE) | Yes (P=2, D=6) | 8 | 8 | 1 | 8 | split_wise | +| 4 | Qwen3-MoE-235B (MoE) | Yes (P=2, D=6) | 8 | 4 | 1 | 4 | split_wise | + +### Running + +```bash +# Run all 4 scenarios +bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --all + +# Run a single scenario +bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --scenario 3 +``` + +For detailed performance data, please run the test suite. Each run produces output files including `request_metrics.csv` (per-request metrics), `chrome_trace.json` (timeline trace), `config.json` (configuration snapshot), and metric files under the `plots/` directory. + +--- + +## 8. 
Code Quality Improvements + +SimAI 1.6 includes systematic code quality improvements: + +### 8.1 Bilingual Comments and Documentation + +- Added bilingual (Chinese/English) docstrings to all public APIs +- Added bilingual comments to config, scheduler, predictor, and utils modules +- Added bilingual comments to entity modules +- Shell script outputs and Python runtime outputs use bilingual format + +### 8.2 Logging System Improvements + +- Comprehensive replacement of `print` statements with the `logging` module (~12 files) +- Unified log format using parenthetical bilingual style (e.g., `"GPU总内存 (Total GPU mem): 96.00 GB"`) + +### 8.3 Dead Code Cleanup + +- Removed approximately 390 lines of dead code blocks +- Cleaned up personal debug markers + +### 8.4 TODO Standardization + +- Unified to `TODO(author): description` format +- Added missing type annotations + +--- + +## 9. System Architecture + +### Inference Simulation Data Flow + +``` +Request Generator + | Generate synthetic / real-trace requests + v +Global Scheduler + | Dispatch requests to Prefill / Decode replicas + v +Replica Scheduler + | Batch assembly and scheduling + v +Memory Management (MemoryPlanner + Replica) + | KV cache allocation and capacity checking + v +Execution Time Predictor + | AICB / SimAI Simulation / SimAI Analytical / Vidur + v +Metrics Store + | TTFT, TBT, E2E, communication / compute cost + v +Output (request_metrics.csv, chrome_trace.json, plots/) +``` + +--- + +## 10. Quick Start + +### Environment Setup + +#### Option 1: Docker (Recommended) + +```bash +# Build from project root +docker build -t simai:latest . +docker run --gpus all -it --rm simai:latest +``` + +> If using Hopper GPUs, add `ENV FLASH_MLA_DISABLE_SM100=1` to the Dockerfile. 
+ +#### Option 2: Conda + +```bash +cd vidur-alibabacloud +conda env create -p ./env -f ./environment.yml +conda activate vidur +pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ +``` + +### Run 4-Scenario Test Suite + +```bash +# Prerequisites: conda activate vidur +bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --all +``` + +### Compile and Run SimAI Training Simulation + +```bash +# Compile SimAI-Analytical +./scripts/build.sh -c analytical + +# Run +./bin/SimAI_analytical -w example/workload_analytical.txt -g 9216 -g_p_s 8 -r test- -busbw example/busbw.yaml +``` + +--- + +## 11. References + +[1] SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision. NSDI'25 Spring. [[pdf](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf)] + +[2] InferSim — Alibaba. Parameter counting and KV cache estimation. [[GitHub](https://github.com/alibaba/InferSim)] + +[3] DeepSeek V3 Parameter Derivation (Chinese). Zhihu. [[link](https://zhuanlan.zhihu.com/p/21455638257)] + +[4] DeepSeek V3 Parameter Size Analysis. Yang Wenbo. [[link](https://yangwenbo.com/articles/deepseek-v3-parameter-size.html)] + +[5] Vidur: A Large-Scale Simulation Framework For LLM Inference. Microsoft Research. [[GitHub](https://github.com/microsoft/vidur)] + +[6] splitwise-sim — Prefill-Decode Disaggregation Simulation. [[GitHub](https://github.com/Mutinifni/splitwise-sim)] diff --git a/docs/SimAI_1.6_Tech_Report_CN.md b/docs/SimAI_1.6_Tech_Report_CN.md new file mode 100644 index 00000000..228ae90a --- /dev/null +++ b/docs/SimAI_1.6_Tech_Report_CN.md @@ -0,0 +1,429 @@ ++ 中文  |  English +
+
+# SimAI 1.6 Technical Report
+
+> This report covers all features of SimAI 1.5 plus the new capabilities introduced in SimAI 1.6.
+
+## 1. Overview
+
+**SimAI** is the industry's first full-stack, high-precision simulator for large-scale AI inference and training (**Sim**ulator for **AI**), open-sourced by Alibaba Cloud. SimAI models and simulates the entire LLM inference and training pipeline in detail, covering the framework layer, the collective communication layer, and the network transport layer, and delivers end-to-end performance data. The SimAI paper was accepted at NSDI'25 Spring [1].
+
+SimAI 1.6 builds on SimAI 1.5 and mainly adds a **GPU memory computation module** (exact parameter counting and KV Cache management for three MoE models: DeepSeek-V3-671B, Qwen3-MoE-235B, and Qwen3-Next-80B), a **4-scenario end-to-end test suite**, and comprehensive code-quality improvements (bilingual documentation, a unified logging system, dead-code cleanup, and more).
+
+### Components
+
+```
+        |--- AICB (workload generation & compute profiling)
+SimAI --|--- SimCCL (collective communication algorithm analysis)
+        |--- astra-sim-alibabacloud (simulation engine: Analytical / Simulation / Physical)
+        |--- ns-3-alibabacloud (NS-3 network backend)
+        |--- vidur-alibabacloud (multi-request inference scheduling & memory management)
+```
+
+---
+
+## 2. Key Milestones
+
+Key development events from November 2025 through March 2026:
+
+| Date | Event | Description |
+|------|-------|-------------|
+| 2025/11 | AICB PR [#58](https://github.com/aliyun/aicb/pull/58) | AICB gains inference workload generation, distinguishing the prefill/decode phases and supporting DeepSeek, Qwen3-MoE, and Qwen3-Next |
+| 2025/12 | AICB PR [#60](https://github.com/aliyun/aicb/pull/60) | Further AICB updates refining inference workload generation |
+| 2025/12 | SimAI PR [#203](https://github.com/aliyun/SimAI/pull/203) | SimAI 1.5 core update: end-to-end inference simulation, PD disaggregation, Vidur scheduling integration, modern model support |
+| 2025/12 | ns-3 commit [7e3cb5b](https://github.com/aliyun/ns-3-alibabacloud/commit/7e3cb5b88c99abcb582c5abc3919484a4805111b) | Enhanced ns-3-alibabacloud README detailing the NS-3 network backend modifications |
+| 2026/01 | Memory module commit series | Exact GPU memory computation for DeepSeek-V3-671B, Qwen3-Next-80B, and Qwen3-MoE-235B |
+| 2026/02 | PD-disaggregated memory planning | Independent parameter-memory and KV Cache budget computation for the Prefill and Decode phases |
+| 2026/03 | Code-quality improvements | Comprehensive bilingual comments/docs/logs, dead-code cleanup, TODO standardization, added type annotations |
+
+---
+
+## 3. End-to-End Inference Simulation
+
+SimAI supports full multi-request LLM inference simulation. Core features:
+
+### 3.1 Prefill–Decode (PD) Disaggregation
+
+Inference proceeds in two phases:
+
+- **Prefill phase**: processes all tokens of the input prompt and produces the first output token (compute-bound)
+- **Decode phase**: autoregressively generates subsequent output tokens one at a time (memory-bandwidth-bound)
+
+PD disaggregation places the Prefill and Decode phases on separate GPU nodes, enabling:
+- Elastic resource allocation (more compute for Prefill nodes, more memory for Decode nodes)
+- Performance isolation (no resource contention between Prefill and Decode)
+- Flexible P:D node ratios (controlled via `--replica_config_pd_node_ratio`)
+
+This design is informed by [splitwise-sim](https://github.com/Mutinifni/splitwise-sim) [6].
+
+### 3.2 Multi-Request Inference Scheduling
+
+The request scheduler (vidur-alibabacloud) is adapted from Microsoft's [Vidur](https://github.com/microsoft/vidur) [5] and supports the following policies:
+
+| Scheduler Type | Level | Description |
+|----------------|-------|-------------|
+| `split_wise` | Global | Global scheduling under PD disaggregation; routes requests to Prefill and Decode replicas |
+| `lor` | Global | Least Outstanding Requests; routes each request to the least-loaded replica |
+| `round_robin` | Global | Round-robin assignment |
+| `sarathi` | Replica-level | Batch scheduling within a single replica |
+| `split_wise` | Replica-level | Replica-level scheduling under PD disaggregation |
+
+### 3.3 Flexible Parallelism Strategies
+
+Multiple parallelism strategies can be combined:
+
+- **Data Parallelism (DP)** — controlled via `--cluster_config_num_replicas`
+- **Tensor Parallelism (TP)** — controlled via `--replica_config_tensor_parallel_size`
+- **Pipeline Parallelism (PP)** — controlled via `--replica_config_num_pipeline_stages`
+- **Expert Parallelism (EP)** — controlled via `--replica_config_expert_model_parallel_size`
+
+Both dense and MoE (Mixture-of-Experts) models are supported.
+
+### 3.4 Multiple Execution-Time Prediction Backends
+
+| Backend | Description |
+|---------|-------------|
+| **AICB/AIOB** | Partial support for compute-kernel and TP/DP/PP/EP communication-volume modeling of DeepSeek-V3-671B, Qwen3-MoE-235B, and Qwen3-Next-80B |
+| **SimAI Simulation** | Full-stack network simulation on the SimAI NS-3 backend (currently TP) |
+| **SimAI Analytical** | SimAI analytical performance model (currently TP) |
+| **Native Vidur** | Original Vidur backend supporting TP, DP, and PP |
+
+---
+
+## 4. Modern Model Support
+
+SimAI 1.6 supports the following three state-of-the-art MoE models; configuration files live in `vidur-alibabacloud/data/hf_configs/`:
+
+### 4.1 DeepSeek-V3-671B
+
+| Attribute | Value |
+|-----------|-------|
+| Total layers | 61 |
+| Attention type | MLA (Multi-head Latent Attention) |
+| Attention heads | 128 |
+| Hidden size | 7168 |
+| KV LoRA rank | 512 |
+| Q LoRA rank | 1536 |
+| QK RoPE head dim | 64 |
+| QK NoPE head dim | 128 |
+| V head dim | 128 |
+| Routed experts | 256 |
+| Activated experts per token | 8 |
+| Shared experts | 1 |
+| Dense layers (first 3) | Fixed activation of 8 routed experts + 1 shared expert |
+| Sparse layers (layers 3–60) | Dynamically select 8 of 256 routed experts + 1 shared expert |
+
+Config file: `data/hf_configs/deepseek_v3_config.json`
+
+### 4.2 Qwen3-MoE-235B
+
+| Attribute | Value |
+|-----------|-------|
+| Total layers | 94 |
+| Attention type | MHA/GQA |
+| Attention heads | 64 |
+| KV heads | 4 |
+| Hidden size | 4096 |
+| Head dim | 128 |
+| Routed experts | 128 |
+| Activated experts per token | 8 |
+| MoE intermediate size | 1536 |
+
+Config file: `data/hf_configs/qwen3_moe_config.json`
+
+### 4.3 Qwen3-Next-80B
+
+| Attribute | Value |
+|-----------|-------|
+| Total layers | 48 |
+| Attention type | Hybrid (full + linear attention, alternating every 4 layers) |
+| Full-attention heads | 16 |
+| KV heads | 2 |
+| Hidden size | 2048 |
+| Head dim | 256 |
+| Linear-attention key heads | 16 |
+| Linear-attention value heads | 32 |
+| Routed experts | 512 |
+| Activated experts per token | 10 |
+| MoE intermediate size | 512 |
+
+Config file: `data/hf_configs/qwen3-next-80B-A3B_config.json`
+
+---
+
+## 5. GPU Memory Computation Module
+
+This is the flagship new feature of SimAI 1.6. The module provides accurate GPU memory estimation for inference simulation, covering model parameter memory, KV Cache memory, and maximum batch capacity, and under PD disaggregation it computes separate memory budgets for the Prefill and Decode phases.
+
+### 5.1 Parameter Counting (ParamCounter)
+
+**File path**: `vidur-alibabacloud/vidur/utils/param_counter.py`
+
+ParamCounter computes model parameter counts per layer and per device, and under PD disaggregation returns the triple `(total_params, prefill_params, decode_params)`.
+
+#### MLA Parameters (DeepSeek-V3-671B)
+
+Per-layer MLA parameters consist of:
+
+- **Q LoRA down-projection**: `wq_down = hidden_size * q_lora_rank` = 7168 * 1536
+- **Q LoRA up-projection**: `wq_up = q_lora_rank * num_attention_heads * qk_head_dim` = 1536 * 128 * 192, where `qk_head_dim = qk_nope_head_dim + qk_rope_head_dim = 128 + 64 = 192`
+- **KV LoRA down-projection**: `wkv_down = hidden_size * kv_lora_rank` = 7168 * 512
+- **KV LoRA up-projection**: `wkv_up = kv_lora_rank * num_attention_heads * (qk_nope_head_dim + v_head_dim)` = 512 * 128 * 256
+- **Output projection**: `wo = hidden_size * num_attention_heads * v_head_dim` = 7168 * 128 * 128
+
+Under FP8 quantization each parameter element occupies 1 byte; under FP16/BF16, 2 bytes.
+
+References: [3] [4]
+
+#### MHA/GQA Parameters (Qwen3-MoE-235B)
+
+Per-layer MHA parameters:
+
+```
+wq = hidden_size * num_attention_heads * head_dim
+wk = hidden_size * num_key_value_heads * head_dim
+wv = hidden_size * num_key_value_heads * head_dim
+wo = hidden_size * num_attention_heads * head_dim
+total = (wq + wk + wv + wo) * bytes_per_element
+```
+
+#### Linear-Attention Parameters (Qwen3-Next-80B)
+
+Qwen3-Next-80B uses a hybrid attention architecture, alternating full attention and linear (GDN) attention every 4 layers. Linear-attention layers use their own key/value head configuration (`linear_key_head_dim`, `linear_num_key_heads`, etc.).
+
+#### MoE Expert Parameters
+
+FFN parameters per expert (three weight matrices W1, W2, W3):
+
+```
+expert_params = 3 * hidden_size * moe_intermediate_size * bytes_per_element
+```
+
+#### Parameter Counting Under PD Disaggregation
+
+Under PD disaggregation, the Prefill and Decode clusters may run with different expert-parallel (EP) degrees:
+
+- **Prefill cluster**: uses `prefill_world_size` as EP; experts loaded per device = `num_routed_experts / prefill_world_size`
+- **Decode cluster**: uses `decode_world_size` as EP; experts loaded per device = `num_routed_experts / decode_world_size`
+
+As a result the Prefill and Decode clusters hold different amounts of parameter memory, which in turn changes the KV Cache capacity available to each.
+
+### 5.2 KV Cache Memory Management
+
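As a quick numerical check of the per-token KV Cache formulas detailed in this section, the MHA and MLA footprints can be compared in a short standalone sketch (illustrative only; the real implementation lives in `memory_planner.py`):

```python
# Per-token KV Cache size for MHA vs. MLA, using the model numbers from the
# tables above (128 KV heads of dim 128 vs. kv_lora_rank=512 + rope dim 64).
# bytes_per_element=1 assumes FP8.

def mha_kv_cache_per_token(num_kv_heads, head_dim, num_layers, bytes_per_element=1):
    # Factor 2: separate K (Key) and V (Value) caches.
    return 2 * num_kv_heads * head_dim * num_layers * bytes_per_element

def mla_kv_cache_per_token(kv_lora_rank, qk_rope_head_dim, num_layers, bytes_per_element=1):
    # Compressed latent (jointly encodes K and V) plus RoPE positional keys.
    return (kv_lora_rank + qk_rope_head_dim) * num_layers * bytes_per_element

layers = 61
mha = mha_kv_cache_per_token(num_kv_heads=128, head_dim=128, num_layers=layers)
mla = mla_kv_cache_per_token(kv_lora_rank=512, qk_rope_head_dim=64, num_layers=layers)
print(mha // layers, mla // layers, round(mha / mla))  # 32768 576 57
```

This reproduces the roughly 57x compression claimed for MLA relative to an MHA layout with 128 KV heads.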
+**File paths**: `vidur-alibabacloud/vidur/scheduler/utils/memory_planner.py`, `vidur-alibabacloud/vidur/entities/replica.py`
+
+#### MHA/GQA KV Cache
+
+```
+kv_cache_per_token = 2 * num_kv_heads * head_dim * num_layers * bytes_per_element
+```
+
+The factor 2 accounts for the two caches, K (Key) and V (Value).
+
+#### MLA KV Cache (DeepSeek-V3-671B)
+
+MLA uses a compressed KV representation. Instead of storing separate K and V caches as MHA does, MLA stores a compressed latent vector that jointly encodes K and V (`kv_lora_rank`), plus the RoPE positional keys (`qk_rope_head_dim`). The per-token KV Cache size is:
+
+```
+kv_cache_per_token = (kv_lora_rank + qk_rope_head_dim) * num_layers * bytes_per_element
+```
+
+with `kv_lora_rank = 512` and `qk_rope_head_dim = 64`. Compared with MHA's per-token footprint of `2 * num_kv_heads * head_dim` = 2 * 128 * 128 = 32768 elements, MLA needs only 576 elements, roughly a **57x** compression.
+
+#### Per-Request KV Cache Tracking
+
+The `Replica` entity (`vidur/entities/replica.py`) maintains the following state:
+
+- `_allocated_kv_cache_memory`: KV Cache memory currently allocated (bytes)
+- `_max_kv_cache_memory`: maximum KV Cache capacity (computed by the MemoryPlanner on first call)
+- `_kv_cache_allocation_map`: per-request KV Cache allocation map
+
+Supported operations:
+- `allocate_request_kv_cache_memory(request, num_blocks, block_size)` — allocate KV Cache for a request
+- `release_request_kv_cache_memory(request)` — release the KV Cache of a completed request
+- `get_remaining_kv_cache_capacity()` — query the remaining KV Cache capacity and the number of additional servable requests
+
+### 5.3 MemoryPlanner
+
+**File path**: `vidur-alibabacloud/vidur/scheduler/utils/memory_planner.py`
+
+MemoryPlanner is the core memory-management component. Its workflow:
+
+1. **Compute available GPU memory**: `available_memory = total_GPU_memory * (1 - memory_margin_fraction)`
+2. **Obtain model parameter memory**: computed via ParamCounter; under PD disaggregation this is the `(total, prefill, decode)` triple
+3. **Compute memory available for KV Cache**: `kv_cache_available = available_memory - param_memory`
+4. **Compute the maximum number of concurrent requests**: `max_requests = kv_cache_available / kv_cache_per_request`
+
+Under PD disaggregation:
+- Prefill replicas compute their KV Cache budget from `prefill_param_mem`
+- Decode replicas compute their KV Cache budget from `decode_param_mem`
+
+OOM detection is included: when parameter memory exceeds available memory, an error message is emitted suggesting a larger TP/EP degree, a larger GPU, or FP8 quantization.
+
+---
+
+## 6. AICB Inference Workload Generation
+
+[AICB](https://github.com/aliyun/aicb) gained inference workload generation (PRs [#58](https://github.com/aliyun/aicb/pull/58) and [#60](https://github.com/aliyun/aicb/pull/60)). Highlights:
+
+- **Prefill/Decode phase separation**: compute and communication workloads are generated separately for the Prefill and Decode phases
+- **Compute-kernel profiling**: depends on the following hardware-accelerated libraries (requires Hopper SM90 or Blackwell SM100 GPUs):
+  - [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) — FP8 matrix multiplication
+  - [FlashMLA](https://github.com/deepseek-ai/FlashMLA) — MLA attention acceleration
+  - [FlashInfer](https://github.com/flashinfer-ai/flashinfer) — high-performance inference kernels
+- **Communication-volume modeling**: supports the TP, DP, PP, and EP parallelism strategies
+- **Model support**: DeepSeek-V3-671B, Qwen3-MoE-235B, Qwen3-Next-80B
+
+---
+
+## 7. 4-Scenario End-to-End Test Suite
+
+**File path**: `vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh`
+
+Four pre-configured end-to-end test scenarios cover different models, parallelism strategies, and PD-disaggregation settings.
+
+### Shared Hardware Configuration
+
+- GPU: H20 (h20_dgx)
+- NVLink bandwidth: 1600 Gbps
+- RDMA bandwidth: 800 Gbps
+- PD P2P bandwidth: 800 Gbps
+- Data type: fp8
+- Requests: Poisson, QPS=100, 4 requests, fixed prefill=100 / decode=8 tokens
+
+### Scenario Matrix
+
+| Scenario | Model | PD Disaggregation | World Size | TP | PP | EP | Global Scheduler |
+|----------|-------|-------------------|------------|----|----|----|------------------|
+| 1 | Qwen3-Next-80B (MoE) | No | 32 (dp=32) | 1 | 1 | 1 | lor |
+| 2 | Qwen3-Next-80B (MoE) | Yes (P=2, D=6) | 8 | 1 | 1 | 1 | split_wise |
+| 3 | DeepSeek-671B (MoE) | Yes (P=2, D=6) | 8 | 8 | 1 | 8 | split_wise |
+| 4 | Qwen3-MoE-235B (MoE) | Yes (P=2, D=6) | 8 | 4 | 1 | 4 | split_wise |
+
+### Running
+
+```bash
+# Run all 4 scenarios
+bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --all
+
+# Run a single scenario
+bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --scenario 3
+```
+
+Run the test suite to obtain detailed performance numbers. Each run produces `request_metrics.csv` (per-request metrics), `chrome_trace.json` (timeline trace), `config.json` (configuration snapshot), and metric files under `plots/`.
+
+---
+
+## 8. Code-Quality Improvements
+
+SimAI 1.6 includes systematic code-quality work:
+
+### 8.1 Bilingual Comments and Documentation
+
+- Bilingual (EN/ZH) docstrings added for all public APIs
+- Bilingual comments across the config, scheduler, predictor, and utils modules
+- Bilingual comments in the entities module
+- Bilingual output from both shell scripts and the Python runtime
+
+### 8.2 Logging Improvements
+
+- Replaced `print` statements with the `logging` module throughout (~12 files)
+- Unified log format using a parenthesized bilingual style (e.g., `"GPU总内存 (Total GPU mem): 96.00 GB"`)
+
+### 8.3 Dead-Code Cleanup
+
+- Removed roughly 390 lines of dead code
+- Cleaned up personal debugging markers
+
+### 8.4 TODO Standardization
+
+- Unified on the `TODO(author): description` format
+- Added missing type annotations
+
+---
+
+## 9. System Architecture
+
+### Inference Simulation Data Flow
+
+```
+Request Generator
+    │  produces synthetic / real-trace requests
+    ▼
+Global Scheduler
+    │  assigns requests to Prefill/Decode replicas
+    ▼
+Replica Scheduler
+    │  batch assembly and scheduling
+    ▼
+Memory Management (MemoryPlanner + Replica)
+    │  KV Cache allocation and capacity checks
+    ▼
+Execution Time Predictor
+    │  AICB / SimAI Simulation / SimAI Analytical / Vidur
+    ▼
+Metrics Store
+    │  TTFT, TBT, E2E, communication/compute overheads
+    ▼
+Output (request_metrics.csv, chrome_trace.json, plots/)
+```
+
+---
+
+## 10. Quick Start
+
+### Environment Setup
+
+#### Option 1: Docker (recommended)
+
+```bash
+# Build from the project root
+docker build -t simai:latest .
+docker run --gpus all -it --rm simai:latest
+```
+
+> On Hopper GPUs, add `ENV FLASH_MLA_DISABLE_SM100=1` to the Dockerfile.
+
+#### Option 2: Conda
+
+```bash
+cd vidur-alibabacloud
+conda env create -p ./env -f ./environment.yml
+conda activate vidur
+pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
+```
+
+### Running the 4-Scenario Test Suite
+
+```bash
+# Prerequisite: conda activate vidur
+bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --all
+```
+
+### Building and Running SimAI Training Simulation
+
+```bash
+# Build SimAI-Analytical
+./scripts/build.sh -c analytical
+
+# Run
+./bin/SimAI_analytical -w example/workload_analytical.txt -g 9216 -g_p_s 8 -r test- -busbw example/busbw.yaml
+```
+
+---
+
+## 11. References
+
+[1] SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision. NSDI'25 Spring. [[pdf](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf)]
+
+[2] InferSim — Alibaba. 
Parameter counting and KV cache estimation. [[GitHub](https://github.com/alibaba/InferSim)]
+
+[3] A Detailed Derivation of DeepSeek V3 Parameter Counts. Zhihu. [[link](https://zhuanlan.zhihu.com/p/21455638257)]
+
+[4] DeepSeek V3 Parameter Size Analysis. Yang Wenbo. [[link](https://yangwenbo.com/articles/deepseek-v3-parameter-size.html)]
+
+[5] Vidur: A Large-Scale Simulation Framework For LLM Inference. Microsoft Research. [[GitHub](https://github.com/microsoft/vidur)]
+
+[6] splitwise-sim — Prefill-Decode Disaggregation Simulation. [[GitHub](https://github.com/Mutinifni/splitwise-sim)]
diff --git a/docs/en/benchmarking/index.md b/docs/en/benchmarking/index.md
new file mode 100644
index 00000000..2431cf17
--- /dev/null
+++ b/docs/en/benchmarking/index.md
@@ -0,0 +1,33 @@
+# Benchmarking
+
+This section covers benchmarking and validation approaches for SimAI.
+
+---
+
+## Contents
+
+| Document | Description |
+|----------|-------------|
+| [4-Scenario End-to-End Test Suite](test_suite.md) | Pre-configured test scenarios covering different models, parallelism strategies, and PD configurations |
+
+---
+
+## Benchmarking Approaches
+
+SimAI supports several benchmarking methodologies:
+
+### Architecture Comparison
+
+Compare different network architectures (e.g., Spectrum-X vs DCN+) under identical workloads to evaluate their performance characteristics.
+
+### Algorithm Comparison
+
+Compare different collective communication algorithms (e.g., RING vs NVLS) to understand their performance trade-offs at various message sizes.
+
+### Parameter Optimization
+
+Use SimAI-Analytical for rapid exploration of parallel parameter combinations (TP, PP, EP, DP) to find optimal configurations.
+
+### Validation Against Real Hardware
+
+Use AICB physical execution results as ground truth to validate simulation accuracy.
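As a minimal sketch of that last step, simulated and measured iteration times can be compared via relative error (names and numbers here are illustrative, not SimAI output):

```python
# Relative-error check of simulated vs. measured iteration times.
def relative_error(simulated: float, measured: float) -> float:
    return abs(simulated - measured) / measured

# Hypothetical per-iteration times in milliseconds: (simulated, measured).
pairs = [(102.0, 100.0), (251.0, 255.0), (98.5, 97.0)]
errors = [relative_error(s, m) for s, m in pairs]
print(max(errors))  # worst-case relative error across iterations
```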
diff --git a/docs/en/benchmarking/test_suite.md b/docs/en/benchmarking/test_suite.md
new file mode 100644
index 00000000..54a54404
--- /dev/null
+++ b/docs/en/benchmarking/test_suite.md
@@ -0,0 +1,139 @@
+# 4-Scenario End-to-End Test Suite
+
+SimAI provides a pre-configured test suite covering 4 representative inference scenarios, enabling quick validation of all supported configurations.
+
+---
+
+## Overview
+
+The test suite is located at `vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh` and covers different combinations of models, parallelism strategies, and PD disaggregation configurations.
+
+---
+
+## Running
+
+```bash
+# Run all 4 scenarios
+bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --all
+
+# Run a single scenario
+bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --scenario 1
+
+# Show help
+bash vidur-alibabacloud/examples/vidur-ali-scenarios/run_scenarios.sh --help
+```
+
+> **Prerequisites**: `conda activate vidur` environment must be active.
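The request pattern shared by all scenarios (Poisson arrivals at QPS=100, 4 requests, fixed token counts; see the configuration below) can be reproduced with a short sketch. This is illustrative only, not the simulator's actual generator:

```python
import random

def poisson_arrivals(qps: float, num_requests: int, seed: int = 0) -> list:
    """Arrival timestamps (seconds) for a Poisson process with rate `qps`."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(num_requests):
        t += rng.expovariate(qps)  # exponential inter-arrival gaps
        arrivals.append(t)
    return arrivals

arrivals = poisson_arrivals(qps=100, num_requests=4)
requests = [{"arrival_s": a, "prefill_tokens": 100, "decode_tokens": 8} for a in arrivals]
print(len(requests))  # 4
```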
+
+---
+
+## Shared Hardware Configuration
+
+All scenarios share the following hardware settings:
+
+| Parameter | Value |
+|-----------|-------|
+| GPU | H20 (h20_dgx) |
+| NVLink Bandwidth | 1600 Gbps |
+| RDMA Bandwidth | 800 Gbps |
+| PD P2P Bandwidth | 800 Gbps |
+| PD P2P Data Type | fp8 |
+| Request Generator | Poisson, QPS=100 |
+| Request Count | 4 |
+| Prefill Tokens | 100 (fixed) |
+| Decode Tokens | 8 (fixed) |
+
+---
+
+## Scenario Configuration
+
+| Scenario | Model | PD Separation | World Size | TP | PP | EP | Global Scheduler |
+|----------|-------|---------------|------------|----|----|----|------------------|
+| **1** | Qwen3-Next-80B (MoE) | No | 32 (dp=32) | 1 | 1 | 1 (default) | lor |
+| **2** | Qwen3-Next-80B (MoE) | Yes (P=2, D=6) | 8 | 1 | 1 | 1 (default) | split_wise |
+| **3** | DeepSeek-671B (MoE) | Yes (P=2, D=6) | 8 | 8 | 1 | 8 | split_wise |
+| **4** | Qwen3-MoE-235B (MoE) | Yes (P=2, D=6) | 8 | 4 | 1 | 4 | split_wise |
+
+### Scenario Details
+
+- **Scenario 1**: Large-scale DP without PD separation — tests baseline throughput
+- **Scenario 2**: Same model with PD separation — tests PD disaggregation overhead
+- **Scenario 3**: DeepSeek-671B with large TP/EP — tests MoE with MLA attention
+- **Scenario 4**: Qwen3-MoE-235B with moderate TP/EP — tests MHA/GQA attention model
+
+---
+
+## Output
+
+### Output Directory
+
+- **Via run_scenarios.sh**: `examples/vidur-ali-scenarios/simulator_output/`
+- **Direct python**: `./simulator_output/`
+
+### Output Files
+
+```
+
++ 中文  |  English +
+
+[](../../LICENSE)
+[](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf)
+
+**SimAI** is the industry's first full-stack, high-precision **Sim**ulator for **AI** large-scale **inference** and **training**, open-sourced by Alibaba Cloud. It provides detailed modeling and simulation of the entire LLM training and inference process, encompassing the framework layer, collective communication layer, and network transport layer, delivering end-to-end performance data.
+
+SimAI enables researchers to:
+
+- Analyze inference/training process details
+- Evaluate the time consumption of AI tasks under specific conditions
+- Evaluate E2E performance gains from various algorithmic optimizations (framework parameters, collective communication algorithms, network protocols, congestion control, routing, topology, etc.)
+
+---
+
+## Documentation Overview
+
+| Section | Description |
+|---------|-------------|
+| [Getting Started](getting_started/index.md) | Installation, environment setup, and quickstart guide |
+| [User Guide](user_guide/index.md) | Detailed usage for SimAI-Analytical, SimAI-Simulation, SimAI-Physical, and Inference Simulation |
+| [Components](components/index.md) | In-depth documentation for each submodule: AICB, SimCCL, astra-sim, ns-3, vidur |
+| [Technical Reference](technical_reference/index.md) | GPU memory module, CLI parameters, and configuration reference |
+| [Benchmarking](benchmarking/index.md) | 4-scenario end-to-end test suite and benchmark results |
+| [Developer Guide](developer_guide/index.md) | Architecture, contributing guide, adding models, and extending NS-3 |
+| [Community](community/index.md) | Events, contact information, and citation |
+
+---
+
+## Architecture
+
+```
+        |--- AICB (Workload generation & compute profiling)
+SimAI --|--- SimCCL (Collective communication algorithm analysis)
+        |--- astra-sim-alibabacloud (Simulation engine: Analytical / Simulation / Physical)
+        |--- ns-3-alibabacloud (NS-3 network backend)
+        |--- vidur-alibabacloud (Multi-request inference scheduling & memory management)
+```
+
+
+---
+
+## Three Operation Modes
+
+| Mode | Description | Use Cases |
+|------|-------------|-----------|
+| **SimAI-Analytical** | Fast simulation using bus bandwidth (busbw) to estimate collective communication time | Performance analysis, parallel parameter optimization, scale-up exploration |
+| **SimAI-Simulation** | Full-stack simulation with NS-3 network backend for fine-grained network modeling | CC algorithm research, network protocol evaluation, novel architecture design |
+| **SimAI-Physical** *(Beta)* | Physical traffic generation on CPU RDMA clusters | NIC behavior study during LLM training |
+
+---
+
+## Supported Models
+
+- **DeepSeek-V3-671B** — MLA attention, 256 routed experts
+- **Qwen3-MoE-235B** — MHA/GQA, 128 routed experts
+- **Qwen3-Next-80B** — Hybrid full + linear attention, 512 routed experts
+- **Meta-Llama-3-8B / 70B**, **Llama-2-7b / 70b**, **CodeLlama-34b**, **InternLM-20B**, **Qwen-72B**
+
+---
+
+## Quick Links
+
+- [GitHub Repository](https://github.com/aliyun/SimAI)
+- [NSDI'25 Paper (PDF)](https://ennanzhai.github.io/pub/nsdi25spring-simai.pdf)
+- [Slides](../../docs/SimAI_Intro_Online.pdf)
+- [Technical Report (1.6)](../SimAI_1.6_Tech_Report.md)
+- [Contributing Guide](../../CONTRIBUTING.md)
diff --git a/docs/en/technical_reference/cli_reference.md b/docs/en/technical_reference/cli_reference.md
new file mode 100644
index 00000000..689d2058
--- /dev/null
+++ b/docs/en/technical_reference/cli_reference.md
@@ -0,0 +1,185 @@
+# CLI Reference
+
+Complete command-line parameter reference for all SimAI tools.
+
+---
+
+## SimAI-Analytical
+
+**Binary**: `bin/SimAI_analytical`
+
+### Required Parameters
+
+| Flag | Long Form | Description |
+|------|-----------|-------------|
+| `-w` | `--workload` | Path to workload file |
+| `-g` | `--gpus` | Simulation GPU scale |
+| `-g_p_s` | `--gpus-per-server` | Scale-up size (GPUs per server) |
+| `-r` | `--result` | Output file path and prefix (default: `./results/`) |
+| `-busbw` | `--bus-bandwidth` | Path to busbw.yaml file |
+
+### Optional Parameters
+
+| Flag | Long Form | Description |
+|------|-----------|-------------|
+| `-v` | `--visual` | Generate visualization files |
+| `-dp_o` | `--dp-overlap-ratio` | DP overlap ratio [0.0-1.0] |
+| `-ep_o` | `--ep-overlap-ratio` | EP overlap ratio [0.0-1.0] |
+| `-tp_o` | `--tp-overlap-ratio` | TP overlap ratio [0.0-1.0] |
+| `-pp_o` | `--pp-overlap-ratio` | PP overlap ratio [0.0-1.0] |
+
+### Auto Busbw Calculation
+
+| Flag | Description |
+|------|-------------|
+| `-nv` | NVLink bandwidth (GB/s) |
+| `-nic` | NIC bandwidth (GB/s) |
+| `-n_p_s` | NICs per server |
+
+---
+
+## SimAI-Simulation
+
+**Binary**: `bin/SimAI_simulator`
+
+### Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AS_LOG_LEVEL` | Log level: DEBUG/INFO/WARNING/ERROR | `INFO` |
+| `AS_PXN_ENABLE` | Enable PXN | `0` |
+| `AS_NVLS_ENABLE` | Enable NVLS | `0` |
+| `AS_SEND_LAT` | Send latency (us) | `6` |
+| `AS_NVLSTREE_ENABLE` | Enable NVLS Tree | `false` |
+
+### Parameters
+
+| Flag | Long Form | Description | Default |
+|------|-----------|-------------|---------|
+| `-t` | `--thread` | Number of threads | `1` |
+| `-w` | `--workload` | Path to workload | Required |
+| `-n` | `--network-topo` | Topology file path | Required |
+| `-c` | `--config` | SimAI.conf path | Required |
+
+---
+
+## SimAI-Physical
+
+**Binary**: `bin/SimAI_phynet`
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `hostlist` | Path to host IP list | Required |
+| `-w` / `--workload` | Workload file path | `./microAllReduce.txt` |
+| `-i` / `--gid_index` | GID index for RDMA | `0` |
+| `-g` / `--gpus` | Number of GPUs | `8` |
+
+---
+
+## Topology Generator
+
+**Script**: `astra-sim-alibabacloud/inputs/topo/gen_Topo_Template.py`
+
+| Level | Flag | Description |
+|-------|------|-------------|
+| Global | `-topo` | Template: Spectrum-X / AlibabaHPN / DCN+ |
+| | `-g` | Number of GPUs |
+| | `--dp` | Enable dual plane |
+| | `--ro` | Enable rail-optimized |
+| | `--dt` | Enable dual ToR |
+| | `-er` | Error rate |
+| Intra-Host | `-gps` | GPUs per server |
+| | `-gt` | GPU type (A100/H100) |
+| | `-nsps` | NV switches per server |
+| | `-nvbw` | NVLink bandwidth |
+| | `-nl` | NVLink latency |
+| | `-l` | NIC latency |
+| Intra-Segment | `-bw` | NIC to ASW bandwidth |
+| | `-asw` | ASW switch count |
+| | `-nps` | NICs per switch |
+| Intra-Pod | `-psn` | PSW switch count |
+| | `-apbw` | ASW to PSW bandwidth |
+| | `-app` | ASW per PSW |
+
+---
+
+## AICB Workload Generator
+
+**Script**: `scripts/megatron_workload_with_aiob.sh` or `python -m workload_generator.SimAI_training_workload_generator`
+
+### Core Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `--frame` | Framework: Megatron / DeepSpeed / DeepSeek |
+| `-m` / `--model_size` | Model size: 7/13/22/175/moe/deepseek |
+| `--world_size` | Total GPU count |
+| `--global_batch` | Total batch size |
+| `--micro_batch` | Micro-batch size |
+| `--seq_length` | Sequence length |
+| `--epoch_num` | Number of iterations |
+
+### Parallelism Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `--tensor_model_parallel_size` | TP degree |
+| `--pipeline_model_parallel` | PP degree |
+| `--expert_model_parallel_size` | EP degree |
+| `--enable_sequence_parallel` | Enable SP |
+
+### Model Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `--num_layers` | Transformer layers |
+| `--hidden_size` | Hidden size |
+| `--num_attention_heads` | Attention heads |
+| `--ffn_hidden_size` | FFN hidden size |
+| `--vocab_size` | Vocabulary size |
+
+### MoE Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `--moe_enable` | Enable MoE |
+| `--num_experts` | Number of experts |
+| `--moe_router_topk` | Experts per token |
+| `--moe_grouped_gemm` | Enable grouped GEMM |
+
+### DeepSeek Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `--qk_rope_dim` | RoPE dimension for QK |
+| `--qk_nope_dim` | Non-RoPE dimension for QK |
+| `--q_lora_rank` | Q LoRA rank |
+| `--kv_lora_rank` | KV LoRA rank |
+| `--v_head_dim` | V head dimension |
+| `--n_shared_expert` | Shared experts per MoE layer |
+| `--n_dense_layer` | Dense layers count |
+
+### Optimization Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `--use_flash_attn` | FlashAttention |
+| `--swiglu` | SwiGLU activation |
+| `--aiob_enable` | AIOB computation profiling |
+| `--comp_filepath` | Pre-computed times file |
+
+---
+
+## Vidur Inference Simulation
+
+**Command**: `python -m vidur.main`
+
+Run `python -m vidur.main -h` for the full parameter list. Key parameters are documented in the [vidur component page](../components/vidur.md).
+
+---
+
+## See Also
+
+- [Configuration Reference](configuration.md) — Config file formats
+- [SimAI-Analytical Guide](../user_guide/simai_analytical.md) — Usage examples
+- [AICB Component](../components/aicb.md) — Full parameter details
diff --git a/docs/en/technical_reference/configuration.md b/docs/en/technical_reference/configuration.md
new file mode 100644
index 00000000..7ad67fb6
--- /dev/null
+++ b/docs/en/technical_reference/configuration.md
@@ -0,0 +1,163 @@
+# Configuration Reference
+
+This document covers all configuration files used by SimAI.
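As background for the busbw-based config below: SimAI-Analytical's first-order estimate of a collective's duration is message size divided by bus bandwidth. A simplified sketch (the real model also accounts for latency and overlap ratios):

```python
def collective_time_s(message_bytes: int, busbw_gbps: float) -> float:
    """First-order collective time estimate: size / bus bandwidth.

    `busbw_gbps` is in GB/s, matching the busbw.yaml values
    (e.g. allreduce in the TP group: 300).
    """
    return message_bytes / (busbw_gbps * 1e9)

# A 1 GiB AllReduce in the TP group at 300 GB/s (value from the example busbw.yaml).
t = collective_time_s(1 << 30, 300.0)
```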
+
+---
+
+## SimAI.conf
+
+**Path**: `astra-sim-alibabacloud/inputs/config/SimAI.conf`
+
+The main simulation configuration file used by both SimAI-Analytical and SimAI-Simulation modes. It controls communication algorithms, buffer sizes, and timing parameters.
+
+---
+
+## busbw.yaml
+
+**Path**: `example/busbw.yaml`
+
+Used by SimAI-Analytical to specify bus bandwidth for different communication groups and collective operations.
+
+### Format
+
+```yaml
+test
+TP:
+  allreduce,: 300 # AllReduce busbw 300GB/s in TP group
+  allgather,: 280
+  reducescatter,: 280
+  alltoall,: 230
+DP:
+  allreduce,: null # null = not used in this group
+  allgather,: 380
+  reducescatter,: 380
+  alltoall,: null
+EP:
+  allreduce,: null
+  allgather,: 45
+  reducescatter,: 45
+  alltoall,: 80
+```
+
+### Communication Groups
+
+| Group | Description |
+|-------|-------------|
+| `TP` | Tensor Parallelism — intra-server NVLink communication |
+| `DP` | Data Parallelism — inter-server RDMA communication |
+| `EP` | Expert Parallelism — MoE expert communication |
+
+### Collective Operations
+
+| Operation | Description |
+|-----------|-------------|
+| `allreduce` | Reduce + broadcast across all ranks |
+| `allgather` | Gather data from all ranks |
+| `reducescatter` | Reduce and scatter |
+| `alltoall` | All-to-all personalized exchange |
+
+Set value to `null` for operations not used in a particular group.
+
+---
+
+## Topology Files
+
+Generated by `gen_Topo_Template.py`, topology files define the network structure for SimAI-Simulation.
+
+### Generation
+
+```bash
+python3 ./astra-sim-alibabacloud/inputs/topo/gen_Topo_Template.py \
+    -topo Spectrum-X -g 128 -gt A100 -bw 100Gbps -nvbw 2400Gbps
+```
+
+The output file is named based on parameters, e.g., `Spectrum-X_128g_8gps_100Gbps_A100`.
+
+### Template Defaults
+
+| Template | GPUs | Topo | Bandwidth | GPU Type |
+|----------|------|------|-----------|----------|
+| Spectrum-X | 4096 | Rail-optimized, single ToR | 400Gbps | H100 |
+| AlibabaHPN (single) | 15360 | Rail-optimized, dual ToR | 200Gbps | H100 |
+| AlibabaHPN (dual) | 15360 | Rail-optimized, dual ToR, dual plane | 200Gbps | H100 |
+| DCN+ (single ToR) | 512 | Non rail-optimized | 400Gbps | A100 |
+| DCN+ (dual ToR) | 512 | Non rail-optimized, dual ToR | 200Gbps | H100 |
+
+---
+
+## Model Configuration Files
+
+### Inference Model Configs
+
+Located in `vidur-alibabacloud/data/hf_configs/`:
+
+| Model | Config File |
+|-------|------------|
+| DeepSeek-V3-671B | `deepseek_v3_config.json` |
+| Qwen3-MoE-235B | `qwen3_moe_config.json` |
+| Qwen3-Next-80B | `qwen3-next-80B-A3B_config.json` |
+
+These files follow the HuggingFace `config.json` format and define model architecture parameters.
+
+### Profiling Data
+
+Located in `vidur-alibabacloud/data/profiling/`:
+
+```
+profiling/
+├── compute/
+│   ├── a100/
+│   │   └── 
+```