Support magic_init #1075

Open
DanielSun11 wants to merge 2 commits into PaddlePaddle:develop from DanielSun11:magic_init

Contributor

@DanielSun11 DanielSun11 commented May 30, 2026

PR Category

Operator Mechanism

PR Types

New features

Description

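This PR adds an optional switch, magic_init, to TransformerConfig so that parameter initialization matches ernie-core, giving PaddleFleet's Transformer models a consistent choice of initialization distribution and variance.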

Changes:

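  • Add magic_init to TransformerConfig. When enabled, std is computed with the fixed formula sqrt(0.3333 / hidden_size), and the same initialization method is reused for init/output/embedding.
  • Add unit tests covering the basic behavior of magic_init and the sigma calculation.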
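
As a minimal sketch of the formula quoted above (not the PR's actual code), the standard deviation can be computed as follows. The helper name `magic_init_std` and its `numerator` parameter are hypothetical and introduced only for illustration.

```python
import math


def magic_init_std(hidden_size: int, numerator: float = 0.3333) -> float:
    """Return sqrt(numerator / hidden_size), the std formula quoted in the PR.

    `magic_init_std` is a hypothetical helper name, not part of PaddleFleet.
    """
    if hidden_size <= 0:
        raise ValueError("hidden_size must be positive")
    return math.sqrt(numerator / hidden_size)


# The std shrinks as the hidden size grows.
for hidden_size in (768, 4096):
    print(hidden_size, magic_init_std(hidden_size))
```

Because the std depends only on hidden_size, every layer sharing that width gets the same value, which is what allows one init method to be reused.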

Does this cause a precision change?

Copilot AI review requested due to automatic review settings May 30, 2026 05:36
Contributor

Copilot AI left a comment

Pull request overview

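This PR adds an optional switch, use_magic_weight_init, to TransformerConfig so that parameter initialization matches ernie-core, giving PaddleFleet's Transformer models a consistent choice of initialization distribution and variance.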

Changes:

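  • Adds use_magic_weight_init to TransformerConfig. When enabled, std is computed with the fixed formula sqrt(0.3333 / hidden_size), and the same initialization method is reused for init/output/embedding.
  • Adds the initialization function erniecore_init_method_normal(sigma) to paddlefleet.utils for the config layer to call.
  • Adds unit tests covering the basic behavior of use_magic_weight_init and the sigma calculation.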
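
The pattern described in this list, one init-method factory whose result is shared across three roles, can be sketched with the standard library alone. This is an illustration under stated assumptions: `normal_init_method` is a hypothetical stand-in that fills a plain Python list, whereas the real erniecore_init_method_normal operates on Paddle tensors and its body is not shown in this conversation.

```python
import math
import random
from typing import Callable, List


def normal_init_method(sigma: float, seed: int = 0) -> Callable[[List[float]], List[float]]:
    """Hypothetical factory: returns a function that fills a flat weight
    buffer in place with samples from N(0, sigma^2)."""
    rng = random.Random(seed)

    def init_(weight: List[float]) -> List[float]:
        for i in range(len(weight)):
            weight[i] = rng.gauss(0.0, sigma)
        return weight

    return init_


hidden_size = 768
sigma = math.sqrt(0.3333 / hidden_size)
shared = normal_init_method(sigma)

# One callable reused for all three roles, as the overview describes.
init_method = output_layer_init_method = embedding_init_method = shared

# Sanity check: the empirical std of a large sample should be close to sigma.
weights = init_method([0.0] * 20000)
mean = sum(weights) / len(weights)
sample_std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
print(sigma, sample_std)
```

Sharing a single callable is what makes an identity check such as assertIs meaningful in the unit tests.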

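In addition: the PR title does not follow the repository's suggested format (e.g. [BugFix]...), and the PR description template does not yet explain why the change is made, what problem it solves, or whether it causes a precision change. Please fill these in to ease later maintenance and traceability.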

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

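| File | Description |
| --- | --- |
| tests/single_card_tests/test_transformer_config.py | Adds unit tests for use_magic_weight_init covering the sigma calculation, method consistency, and the MoE scenario. |
| src/paddlefleet/utils.py | Adds the ernie-core-aligned initialization function erniecore_init_method_normal. |
| src/paddlefleet/transformer/transformer_config.py | Adds the config field and wires the magic init strategy into __post_init__. |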

Comment thread src/paddlefleet/utils.py Outdated
Comment thread src/paddlefleet/transformer/transformer_config.py Outdated
Comment on lines +206 to +229
    def test_use_magic_weight_init_true_sigma_calculation(self):
        """When use_magic_weight_init is True, sigma should be sqrt(0.3333 / hidden_size)."""
        import math

        hidden_size = 768
        config = TransformerConfig(
            num_hidden_layers=12,
            hidden_size=hidden_size,
            use_magic_weight_init=True,
        )
        expected_sigma = math.sqrt(0.3333 / hidden_size)
        self.assertAlmostEqual(config.init_method_std, expected_sigma, places=6)

    def test_use_magic_weight_init_true_all_methods_same(self):
        """When use_magic_weight_init is True, all init methods should be the same."""
        config = TransformerConfig(
            num_hidden_layers=12,
            hidden_size=768,
            use_magic_weight_init=True,
        )
        # All init methods should be the same function
        self.assertIs(config.init_method, config.output_layer_init_method)
        self.assertIs(config.init_method, config.embedding_init_method)

PaddlePaddle-bot

This comment was marked as outdated.


@risemeup1111 risemeup1111 left a comment

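Initial review pass complete. Found two blocking issues that need to be fixed first; specific suggestions are in the line-level comments.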

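Priority: P3
Non-line-level: the PR description is not part of the diff, so no line-level comment can be attached. The description still contains only template placeholders, and Check PR Description has failed; this leaves future maintainers unable to judge the use cases, precision impact, and validation scope of this initialization switch. Please complete PR Category, PR Types, the motivation for the change, whether it causes a precision change, and the tests/CI results already run.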

Powered by Nyanpasu with gpt-5.5 xhigh, please check the suggestions carefully.

Comment thread src/paddlefleet/transformer/transformer_config.py Outdated
Comment thread src/paddlefleet/utils.py Outdated
Comment thread src/paddlefleet/utils.py Outdated
Comment thread src/paddlefleet/transformer/transformer_config.py Outdated
Comment thread src/paddlefleet/transformer/transformer_config.py Outdated
@PaddlePaddle PaddlePaddle deleted a comment from risemeup1111 May 30, 2026
Collaborator

@From00 From00 left a comment

LGTM


@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-30 23:35:29

📋 Review Summary

PR overview: Adds a magic_init field to TransformerConfig. When enabled, all init methods are set through a unified initialization with sigma = sqrt(0.3333 / hidden_size).
Scope of changes: transformer_config.py, utils.py, unit tests
Impact tags: Config, Init

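Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | transformer_config.py:882 | Replace the magic constant 0.3333 with 1.0 / 3.0 for better precision and readability |
| 🟡 Suggestion | transformer_config.py:943 | With magic_init=True, a user-specified output_layer_init_method / embedding_init_method is unconditionally overwritten |
| ❓ Question | utils.py:147 | The docstring claims "under fp32 default dtype guard", but the implementation uses weight.dtype |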
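
To put the first suggestion in numbers, the short check below compares the std obtained from the truncated constant 0.3333 with the one from 1.0 / 3.0. It is only an illustration; hidden_size = 768 is taken from the unit tests quoted earlier in the conversation.

```python
import math

hidden_size = 768

sigma_truncated = math.sqrt(0.3333 / hidden_size)
sigma_exact = math.sqrt((1.0 / 3.0) / hidden_size)

# The relative gap equals 1 - sqrt(0.9999), so it does not depend on hidden_size.
rel_diff = abs(sigma_exact - sigma_truncated) / sigma_exact

print(f"truncated={sigma_truncated:.8f} exact={sigma_exact:.8f} rel_diff={rel_diff:.2e}")
```

The gap is roughly 5e-5 in relative terms. Whether ernie-core itself uses the truncated constant is not stated in this conversation, so switching to 1.0 / 3.0 could trade exact alignment for readability.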

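Status of Previous Findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | The new field is not explicitly registered in training/arguments.py or training/yaml_arguments.py | ⚠️ Still present (the field was renamed from use_magic_weight_init to magic_init; the dynamic forwarding in core_transformer_config_from_args picks up the field automatically, so this is not functionally blocking, but the lack of explicit registration hurts discoverability) |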
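
The difference between explicit registration and dynamic forwarding can be sketched as follows. Everything here is hypothetical: the `--magic-init` flag name, `SketchConfig`, and `config_from_args` are illustrative stand-ins, and the real core_transformer_config_from_args may work differently.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    """Hypothetical stand-in for TransformerConfig."""
    hidden_size: int = 768
    magic_init: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--hidden-size", type=int, default=768)
    # Explicit registration: the flag shows up in --help and is discoverable.
    parser.add_argument(
        "--magic-init",
        action="store_true",
        help="Use sqrt(0.3333 / hidden_size) as the init std (hypothetical flag).",
    )
    return parser


def config_from_args(args: argparse.Namespace) -> SketchConfig:
    # Dynamic forwarding: copy every dataclass field that has a
    # same-named attribute on the parsed arguments.
    kwargs = {
        f.name: getattr(args, f.name)
        for f in fields(SketchConfig)
        if hasattr(args, f.name)
    }
    return SketchConfig(**kwargs)


cfg = config_from_args(build_parser().parse_args(["--magic-init"]))
print(cfg)
```

With forwarding alone the field works as soon as an attribute of that name exists, but only explicit registration documents it for users.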

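📝 PR Convention Check

The title lacks a Tag prefix, and the required sections of the description template are empty.

Suggested title (can be copied directly):

  • [New features] Support use_magic_weight_init
Suggested PR description (click to expand, can be copied directly)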
### PR Category
<!-- One of [ User Experience | Execute Infrastructure | Operator Mechanism | Custom Device | Performance Optimization | Distributed Strategy | Parameter Server | Communication Library  | Environment Adaptation ] -->
Distributed Strategy
### PR Types
<!-- One of [ New features | Bug fixes | Improvements | Performance | BC Breaking | Deprecations | Docs | Devs | Not User Facing | Security | Others ] -->
New features
### Description
<!-- Describe what you've done -->
Add a boolean `magic_init` field (default False) to TransformerConfig. When enabled, parameters are initialized in a way aligned with ernie-core: sigma = sqrt(0.3333 / hidden_size), and `get_magic_init_method` is used to set init_method, output_layer_init_method, and embedding_init_method uniformly.

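Overall Assessment

The implementation logic is clear and the unit tests give good coverage. Please look at the precision of the magic constant and at the silent overwriting of user-specified init methods.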

Comment thread src/paddlefleet/transformer/transformer_config.py
        )

        if self.output_layer_init_method is None:
            if self.magic_init:
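🟡 Suggestion: magic_init=True unconditionally overwrites an explicitly set output_layer_init_method

With the current logic, magic_init=True assigns directly, so even a custom output_layer_init_method passed explicitly to the constructor is silently overwritten. The same problem exists in the embedding_init_method handling below (lines 956-958).

Suggested fix: overwrite only when the user has not specified a value, consistent with the elif ... is None pattern: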

if self.magic_init and self.output_layer_init_method is None:
    self.output_layer_init_method = self.init_method
elif self.output_layer_init_method is None:
    ...

If forced unification is the design intent, the docstring should state explicitly that magic_init=True ignores any other user-specified init method.
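
A self-contained sketch of the "fill in only what the user left unset" behaviour might look like this. `SketchConfig` and `make_init` are hypothetical names, and the 0.02 fallback std is a placeholder for this sketch only; the real TransformerConfig has many more fields and its non-magic defaults are not shown in this conversation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

InitFn = Callable[[list], list]


def make_init(sigma: float) -> InitFn:
    """Hypothetical factory; a real method would fill `weight` from N(0, sigma^2)."""
    def init_(weight: list) -> list:
        return weight
    return init_


@dataclass
class SketchConfig:
    hidden_size: int
    magic_init: bool = False
    init_method: Optional[InitFn] = None
    output_layer_init_method: Optional[InitFn] = None
    embedding_init_method: Optional[InitFn] = None

    def __post_init__(self) -> None:
        # 0.02 is a placeholder default for this sketch only.
        sigma = math.sqrt(0.3333 / self.hidden_size) if self.magic_init else 0.02
        if self.init_method is None:
            self.init_method = make_init(sigma)
        # Only fill in the methods the caller left unset.
        if self.output_layer_init_method is None:
            self.output_layer_init_method = (
                self.init_method if self.magic_init else make_init(sigma)
            )
        if self.embedding_init_method is None:
            self.embedding_init_method = (
                self.init_method if self.magic_init else make_init(sigma)
            )


custom = make_init(0.01)
cfg = SketchConfig(hidden_size=768, magic_init=True, output_layer_init_method=custom)
print(cfg.output_layer_init_method is custom)    # explicit value is preserved
print(cfg.embedding_init_method is cfg.init_method)  # unset value is unified
```

In this sketch an explicit argument always wins. If forced unification is preferred instead, the None checks would be dropped and the behaviour documented.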

Comment thread src/paddlefleet/utils.py
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@45ea854). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #1075   +/-   ##
==========================================
  Coverage           ?   92.59%           
==========================================
  Files              ?        4           
  Lines              ?       27           
  Branches           ?        7           
==========================================
  Hits               ?       25           
  Misses             ?        1           
  Partials           ?        1           
| Flag | Coverage Δ |
| --- | --- |
| coverage_combine | 92.59% <100.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/paddlefleet/transformer/transformer_config.py | 94.44% <100.00%> (ø) |
| src/paddlefleet/utils.py | 100.00% <100.00%> (ø) |

@DanielSun11 DanielSun11 changed the title Support use_magic_weight_init Support magic_init May 30, 2026
6 participants