87 changes: 87 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,87 @@
name: Bug Report
description: Submit a bug report to help improve SimAI
title: "[BUG]: "
labels: ["bug"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!

        It would be very helpful if you could provide as much detail as possible.

  - type: textarea
    id: bug-description
    attributes:
      label: Describe the Bug
      description: A clear and concise description of what the bug is.
    validations:
      required: true

  - type: textarea
    id: reproduction
    attributes:
      label: Reproduction Details
      description: |
        Please provide detailed steps to reproduce the issue.
        Include the branch names or commit IDs of SimAI/AICB you are using.
      placeholder: |
        1. **Branches / Commit IDs**: SimAI branch `master` (commit `abc1234`), AICB branch `master`
        2. Go to '...'
        3. Run `...`
        4. See error: ...
    validations:
      required: true

  - type: textarea
    id: expected
    attributes:
      label: Expected Behavior
      description: What did you expect to happen?
    validations:
      required: true

  - type: textarea
    id: actual
    attributes:
      label: Actual Behavior
      description: What actually happened? Please include any error messages or logs.
    validations:
      required: true

  - type: textarea
    id: environment
    attributes:
      label: Environment
      description: Please provide details about your environment.
      placeholder: |
        - OS: Ubuntu 20.04
        - GCC/G++: 9.4.0
        - Python: 3.8.10
        - Docker image (if applicable): ...
        - CUDA version (if applicable): ...
        - SimAI branch/commit: master / abc1234
        - AICB branch/commit: master / def5678
    validations:
      required: true

  - type: textarea
    id: usage-scenario
    attributes:
      label: Usage Scenario (Optional)
      description: |
        If possible, please describe your usage scenario for SimAI:
        - What task or project you are working on
        - The underlying goals or business context

        This information will help us collect relevant use cases and optimize the SimAI simulator to better meet your needs.
    validations:
      required: false

  - type: textarea
    id: screenshots
    attributes:
      label: Screenshots / Logs
      description: If applicable, add screenshots or log snippets to help explain your problem.
    validations:
      required: false
8 changes: 8 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: SimAI Documentation
    url: https://github.com/aliyun/SimAI/tree/master/docs
    about: Refer to the SimAI documentation to help you get started.
  - name: SimAI Community (DingTalk / WeChat)
    url: https://github.com/aliyun/SimAI#contact-us
    about: Join our DingTalk or WeChat community groups for discussion and support.
37 changes: 37 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,37 @@
name: Feature Request
description: Suggest an improvement for SimAI
title: "[FEATURE]: "
labels: ["enhancement"]
body:
  - type: markdown
    attributes:
      value: |
        Thank you for suggesting a feature to improve SimAI!

  - type: textarea
    id: feature-description
    attributes:
      label: Feature Description
      description: A clear and concise description of the feature you'd like.
    validations:
      required: true

  - type: textarea
    id: problem
    attributes:
      label: Problem / Motivation
      description: |
        What problem does this feature solve? Why is it needed?
        Please describe the use case or scenario where this feature would be helpful.
    validations:
      required: true

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives Considered
      description: |
        Have you considered any alternative solutions or workarounds?
        Please describe them if applicable.
    validations:
      required: false
34 changes: 34 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,34 @@
## Description

<!-- Please provide a clear and concise description of what this PR does. -->

## Related Issue

<!-- Link to the related issue, e.g., Fixes #123 or Resolves #456 -->

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance improvement
- [ ] Refactoring (no functional changes)
- [ ] Documentation update
- [ ] Build / CI configuration change

## Checklist

- [ ] I have read the [CONTRIBUTING.md](../CONTRIBUTING.md) guide
- [ ] My code follows the existing code style of this project
- [ ] I have tested my changes locally
- [ ] I have added/updated documentation as needed
- [ ] My changes do not introduce new warnings or errors
- [ ] I have verified that simulation accuracy is not degraded (if applicable)

## Test Results

<!-- Describe the tests you ran and their results. -->
<!-- For simulation changes, include before/after accuracy comparison if possible. -->

## Additional Notes

<!-- Any additional information that reviewers should know. -->
52 changes: 52 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,52 @@
name: Lint

on:
  push:
    branches: [master, main]
  pull_request:
    branches: [master, main]

jobs:
  python-lint:
    name: Python Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install linters
        run: pip install flake8

      - name: Run flake8
        run: |
          # Stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # Treat all other issues as warnings (non-blocking)
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

  markdown-lint:
    name: Markdown Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: markdownlint
        uses: DavidAnson/markdownlint-cli2-action@v19
        with:
          globs: |
            README.md
            CONTRIBUTING.md
            CHANGELOG.md
            docs/**/*.md
          config: |
            {
              "default": true,
              "MD013": false,
              "MD033": false,
              "MD041": false
            }
        continue-on-error: true
15 changes: 14 additions & 1 deletion .gitignore
@@ -1,4 +1,4 @@
.vscode
# .vscode
astra-sim-alibabacloud/build/simai_analytical/build/
astra-sim-alibabacloud/build/astra_ns3/build/
astra-sim-alibabacloud/extern/
@@ -8,3 +8,16 @@ test/log/
*.log
.cur*
.DS_Store

# fth add
*.csv
*.txt
tmp_simai_inference_workload/
aicb/
Spectrum-X*

fth-test/*

# Personal dev / fth files
fth.sh
**/fth.sh
62 changes: 62 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,62 @@
<p align="left">
<a href="CHANGELOG_CN.md">中文</a>&nbsp;|&nbsp;English
</p>

# Changelog

All notable changes to SimAI will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

> **Note**: This changelog covers v1.0 (initial open-source release) and later versions.

## [Unreleased]

## [1.6.0] - 2026-03-16

### Added

- GPU memory calculation module: accurate parameter counting and KV cache management for DeepSeek-V3-671B, Qwen3-MoE-235B, and Qwen3-Next-80B
- PD-separation memory planning for independent Prefill/Decode memory budgets
- Improved AICB decode time estimation with linear interpolation and global cache
- 4-scenario end-to-end inference test suite (`run_scenarios.sh`)
- SimAI 1.6 Technical Report (EN/ZH)
- Complete bilingual documentation system (30+ files under `docs/en/`, `docs/zh/`)
- GitHub community health files: issue/PR templates, Code of Conduct, Security Policy, Contributing Guide
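
The decode time estimation entry above can be illustrated with a minimal sketch. It assumes the estimator profiles only the first and last context lengths of a range and interpolates linearly between them, caching each profiled endpoint process-wide. `measure_decode_time_ms` and every other name here is a hypothetical placeholder, not the actual AICB API.

```python
from functools import lru_cache


def measure_decode_time_ms(context_len: int) -> float:
    """Hypothetical stand-in for an expensive profiling run (not an AICB function)."""
    return 5.0 + 0.002 * context_len


@lru_cache(maxsize=None)
def endpoint_time_ms(context_len: int) -> float:
    # Process-wide cache: each endpoint is profiled at most once.
    return measure_decode_time_ms(context_len)


def estimate_decode_time_ms(context_len: int, first_len: int, last_len: int) -> float:
    """Linearly interpolate between the first and last profiled context lengths."""
    if first_len == last_len:
        return endpoint_time_ms(first_len)
    t_first = endpoint_time_ms(first_len)
    t_last = endpoint_time_ms(last_len)
    frac = (context_len - first_len) / (last_len - first_len)
    return t_first + frac * (t_last - t_first)
```

Repeated calls with the same endpoints, for example `estimate_decode_time_ms(1500, 1000, 2000)` followed by `estimate_decode_time_ms(1750, 1000, 2000)`, reuse the two cached measurements instead of profiling again.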

### Changed

- Replaced print statements with logging across vidur-alibabacloud modules
- Added bilingual docstrings for public APIs
- Standardized TODO comments format

### Removed

- Removed ~390 lines of dead code in vidur-alibabacloud
- Removed personal debug markers from 8 files

## [1.5.0] - 2025-12-30

### Added

- **End-to-end multi-request inference simulation**: Full simulation support for multi-request inference workloads.
- **Prefill/Decode separation**: Model complex inference scenarios with Prefill/Decode phase separation.
- **Modern model support**: Added support for DeepSeek, Qwen3-MoE, and Qwen3-Next models.
- **Request scheduling via Vidur**: Integrated request scheduling component adapted from Microsoft's [Vidur](https://github.com/microsoft/vidur) (see [vidur-alibabacloud](./vidur-alibabacloud/)).
- **AICB inference workload generation**: AICB now supports generating prefill/decode inference workloads for DeepSeek, Qwen3-MoE, and Qwen3-Next.
- **DeepSeek training workload support**: AICB now supports generating training workloads for DeepSeek (contributed by [@parthpower](https://github.com/parthpower)).
- **SimCCL initial release**: First public release of the SimCCL collective communication transformation module.

## [1.0.0] - 2024-10-18

### Added

- Initial open-source release of SimAI: a full-stack simulator for large-scale AI training
- Core components: AICB, SimCCL, astra-sim-alibabacloud, ns-3-alibabacloud
- SimAI-Analytical: fast simulation using bus bandwidth abstraction
- SimAI-Simulation: full-stack NS3-based network simulation
- SimAI-Physical (Beta): CPU RDMA cluster physical traffic generation

### Academic

- SimAI paper accepted by **NSDI'25 Spring**. See [paper](https://arxiv.org/abs/2410.07346).
62 changes: 62 additions & 0 deletions CHANGELOG_CN.md
@@ -0,0 +1,62 @@
<p align="left">
中文&nbsp;|&nbsp;<a href="CHANGELOG.md">English</a>
</p>

# 更新日志

SimAI 的所有重要变更均记录在此文件中。

格式基于 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)。

> **注意**:本更新日志涵盖 v1.0(首次开源发布)及之后的版本。

## [未发布]

## [1.6.0] - 2026-03-16

### 新增

- GPU 内存计算模块:支持 DeepSeek-V3-671B、Qwen3-MoE-235B、Qwen3-Next-80B 的精确参数计数与 KV Cache 管理
- PD 分离内存规划:Prefill/Decode 阶段独立的内存预算计算
- 改进 AICB decode 时间估算(首尾线性插值 + 全局缓存)
- 4 场景端到端推理测试套件(`run_scenarios.sh`)
- SimAI 1.6 技术报告(EN/ZH)
- 完整双语文档系统(`docs/en/`、`docs/zh/` 下 30+ 文件)
- GitHub 社区规范文件:Issue/PR 模板、行为准则、安全政策、贡献指南

### 变更

- vidur-alibabacloud 各模块 print 输出替换为 logging
- 公开 API 添加双语 docstring
- TODO 注释格式统一规范化

### 移除

- 清理 vidur-alibabacloud 中约 390 行死代码
- 清理 8 个文件中的个人调试标记

## [1.5.0] - 2025-12-30

### 新增

- **端到端多请求推理仿真**:全面支持多请求推理工作负载的端到端仿真。
- **Prefill/Decode 分离**:支持 Prefill/Decode 阶段分离等复杂推理场景建模。
- **主流模型支持**:新增对 DeepSeek、Qwen3-MoE 和 Qwen3-Next 模型的支持。
- **基于 Vidur 的请求调度**:集成了基于微软 [Vidur](https://github.com/microsoft/vidur) 适配的请求调度组件(详见 [vidur-alibabacloud](./vidur-alibabacloud/))。
- **AICB 推理工作负载生成**:AICB 现已支持为 DeepSeek、Qwen3-MoE 和 Qwen3-Next 生成 prefill/decode 推理工作负载。
- **DeepSeek 训练工作负载支持**:AICB 新增 DeepSeek 训练工作负载生成支持(由 [@parthpower](https://github.com/parthpower) 贡献)。
- **SimCCL 首次发布**:SimCCL 集合通信转换模块首次对外公开发布。

## [1.0.0] - 2024-10-18

### 新增

- SimAI 首次开源发布:业界首个全栈高精度 AI 大规模训练模拟器
- 核心组件:AICB、SimCCL、astra-sim-alibabacloud、ns-3-alibabacloud
- SimAI-Analytical:基于总线带宽抽象的快速仿真
- SimAI-Simulation:基于 NS3 的全栈网络仿真
- SimAI-Physical(Beta):CPU RDMA 集群物理流量生成

### 学术

- SimAI 论文被 **NSDI'25 Spring** 接收。详见 [论文](https://arxiv.org/abs/2410.07346)。