Fix DeepSeek V4 gate softplus tail #655

Merged
zhangqi-chen merged 1 commit into hw-native-sys:main from zhaozhaozz:fix/dsv4-gate-softplus on Jul 1, 2026

@zhaozhaozz zhaozhaozz commented Jun 30, 2026

Contributor

Summary

  • Preserve a non-zero softplus tail for negative DeepSeek V4 gate logits.
  • Align the gate golden path with torch.nn.functional.softplus.
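
To illustrate what a "non-zero softplus tail" means, here is a minimal sketch in plain Python floats. It is not the code from gate.py: the function names are hypothetical, and float64 is used only because it reproduces the effect without an NPU. The PR does not state why the tail collapsed on device, so the absorption shown here is one common mechanism, not a diagnosis.

```python
import math


def softplus_absorbing(x: float) -> float:
    # relu(x) + log(exp(-|x|) + 1): once exp(-|x|) is smaller than half an
    # ulp of 1.0, the "+ 1" absorbs it and the log term becomes exactly 0.
    return max(x, 0.0) + math.log(math.exp(-abs(x)) + 1.0)


def softplus_tail_preserving(x: float) -> float:
    # Same formula, but log1p keeps the tiny exp(-|x|) contribution,
    # so negative inputs still yield a small positive value.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    for logit in (-40.0, -5.0, 3.0):
        print(logit, softplus_absorbing(logit), softplus_tail_preserving(logit))
```

At a logit of -40 the first form returns exactly 0.0, while the second returns a value close to exp(-40). For moderate inputs the two forms agree.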

Validation

  • pre-commit: pass
  • unit-tests: pass
  • a2a3 real NPU: pass
  • Simulator numeric goldens are not used as the gate for this fix; the real NPU path is authoritative for the negative-logit softplus behavior.

Related Issues

None

coderabbitai Bot commented Jun 30, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The MoE router softplus computation in gate() is replaced with a two-part formula: a softplus-log term that covers the positive region and an explicit negative-region floor (masked near −10). The two terms are combined via max before the sqrt. The golden reference in golden_gate_core() now calls F.softplus(logits) instead of the prior manual numerically stable formula.
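
The two-part structure described in the walkthrough can be sketched as follows. This is an illustration under stated assumptions, not the kernel in gate.py: the floor is assumed to be exp(x) (the leading-order value of softplus for very negative x), the threshold is taken as exactly -10, and float32 rounding is emulated with struct to stand in for a lower-precision device path. The names f32, gate_score_sketch, and NEG_FLOOR_THRESHOLD are hypothetical.

```python
import math
import struct

# "Masked near -10" per the walkthrough; the exact mask is an assumption.
NEG_FLOOR_THRESHOLD = -10.0


def f32(value: float) -> float:
    """Round a Python float to float32 precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


def gate_score_sketch(logit: float, use_floor: bool = True) -> float:
    # Positive-region term: relu(x) + log(exp(-|x|) + 1), rounded at each step.
    tail = f32(math.exp(-abs(logit)))
    softplus_log = f32(max(logit, 0.0) + f32(math.log(f32(tail + 1.0))))
    # Negative-region floor (assumed form): softplus(x) ~= exp(x) for x << 0.
    if use_floor and logit < NEG_FLOOR_THRESHOLD:
        neg_floor = f32(math.exp(logit))
    else:
        neg_floor = 0.0
    # Combine via max, then take the square root.
    return f32(math.sqrt(max(softplus_log, neg_floor)))


if __name__ == "__main__":
    print(gate_score_sketch(-20.0, use_floor=False))  # tail lost without the floor
    print(gate_score_sketch(-20.0))                   # small positive value
```

With the floor disabled, a logit of -20 produces a score of exactly 0.0 in this emulation. With the floor enabled, the score stays positive and close to sqrt(exp(-20)).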

Changes

DeepSeek-V4 MoE gate softplus update

| Layer / File(s) | Summary |
|---|---|
| gate() softplus with negative-region floor<br>models/deepseek/v4/gate.py | Replaces the single relu + log(exp(-\|x\|)+1) softplus with a two-part expression: gp_softplus_log (positive region) plus a masked gp_neg_floor correction near logits < −10, then max(gp_softplus_log, gp_neg_floor) before sqrt. |
| golden_gate_core() reference simplified<br>models/deepseek/v4/gate.py | Imports torch.nn.functional as F and replaces the manual stable softplus with F.softplus(logits) in the Python reference. |
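
For context on what the golden path is now compared against: torch.nn.functional.softplus is documented as (1/beta) * log(1 + exp(beta * x)), reverting to the identity when beta * x exceeds threshold (defaults beta=1, threshold=20). The sketch below mirrors that documented behavior in pure Python so it can be run without PyTorch. The name softplus_reference is hypothetical and this is not the body of golden_gate_core().

```python
import math


def softplus_reference(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    # Above the threshold the documented behavior is linear: return x unchanged.
    if beta * x > threshold:
        return x
    # Otherwise evaluate (1 / beta) * log(1 + exp(beta * x)); log1p keeps the
    # small positive tail for very negative inputs.
    return math.log1p(math.exp(beta * x)) / beta


if __name__ == "__main__":
    for logit in (-30.0, 0.0, 25.0):
        print(logit, softplus_reference(logit))
```

The point relevant to this PR is the negative side: the reference stays strictly positive for very negative logits, which is the tail the gate() change preserves.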

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A bunny hopped through logit space,
Where negative floors found their place,
Max of two paths, softplus redone,
The golden ref now uses F.softplus — done! 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing the DeepSeek V4 gate softplus tail. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The description matches the code changes by describing the negative-logit softplus tail fix and the golden path alignment to torch.nn.functional.softplus. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the gate computation in models/deepseek/v4/gate.py by introducing a negative floor mask and floor calculation to the softplus computation in the gate function. Additionally, it simplifies the golden_gate_core function by replacing the manual softplus implementation with PyTorch's native torch.nn.functional.softplus. There are no review comments, and I have no feedback to provide.

@zhaozhaozz zhaozhaozz force-pushed the fix/dsv4-gate-softplus branch from 8092485 to 5014443 on July 1, 2026, 05:04
@zhangqi-chen zhangqi-chen merged commit 779f611 into hw-native-sys:main on Jul 1, 2026
5 of 7 checks passed