dsv4 decode_fwd: 7-layer perf config, drop lm_head tail#664

Draft
zhangqi-chen wants to merge 1 commit into hw-native-sys:main from zhangqi-chen:decode-fwd-perf

@zhangqi-chen

Collaborator

Summary

  • Reduce HCA_NUM_LAYERS 20->2 (CSA 3, FWD 7): 2 SWA + 2x(CSA+HCA) + 1 tail CSA = 7 layers, for a fast ep2 smoke/perf run.
  • Drop the lm_head_tp tail from l3_decode_fwd: host now outputs hidden_norm [N_RANKS, T, D] (post-final-norm hidden) instead of logits. Removes the lm_head_weight input, the logits output, and the lm_head window setup loop; spec builder outputs hidden_norm.
  • PASS a2a3 ep2 (--ptoas 0.46) with l2-swimlane + scope-stats.
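The layer arithmetic in the first bullet can be checked with a small sketch. The function name `build_schedule`, the constant `SWA_NUM_LAYERS`, and the CSA-before-HCA ordering inside each pair are assumptions for illustration; only the counts (2 SWA, 2 HCA, 3 CSA, 7 total) come from the summary above.

```python
# Hypothetical reconstruction of the 7-layer schedule described in the summary.
# Only the counts are taken from the PR text; names and ordering are assumed.
SWA_NUM_LAYERS = 2  # leading SWA layers (assumed constant name)
HCA_NUM_LAYERS = 2  # reduced from 20 in this PR


def build_schedule(n_swa: int, n_hca: int) -> list[str]:
    """Return the layer kinds: n_swa SWA, n_hca (CSA, HCA) pairs, one tail CSA."""
    schedule = ["SWA"] * n_swa
    for _ in range(n_hca):
        schedule += ["CSA", "HCA"]  # pair ordering is an assumption
    schedule.append("CSA")  # tail CSA
    return schedule


schedule = build_schedule(SWA_NUM_LAYERS, HCA_NUM_LAYERS)
CSA_NUM_LAYERS = schedule.count("CSA")
FWD_NUM_LAYERS = len(schedule)
print(CSA_NUM_LAYERS, FWD_NUM_LAYERS)
```

Under these assumptions the CSA count is always one more than the HCA count, which matches the "CSA 3" figure quoted for `HCA_NUM_LAYERS = 2`.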

@coderabbitai

coderabbitai Bot commented Jul 1, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request removes the language model head (lm_head) processing from the l3_decode_fwd function, shifting its output from logits to hidden_norm. It also reduces the layer counts for testing or smaller configurations and updates the tensor specifications accordingly. The review feedback recommends cleaning up now-unused imports (VOCAB_PER_TP) and dead code (LM_HEAD_NAMES and _make_lm_head_spec) resulting from these changes.

Comment on lines 97 to 100
from lm_head import (
    TP_SIZE as LM_HEAD_ACTIVE_TP_SIZE,
    T_MAX as LM_HEAD_T_MAX,
    VOCAB_PER_TP,
    lm_head_tp,
)

medium

Since the lm_head tail has been dropped from l3_decode_fwd, VOCAB_PER_TP is no longer used in any active code path. We should remove it from the import statement to keep the code clean. Note that LM_HEAD_ACTIVE_TP_SIZE is still used in an assertion on line 109, so we must keep it for now.

from lm_head import (
    TP_SIZE as LM_HEAD_ACTIVE_TP_SIZE,
)

Comment on lines 1240 to 1242
elif name in FINAL_NORM_NAMES:
    specs.append(_make_final_norm_spec(name))
elif name in LM_HEAD_NAMES:
    specs.append(_make_lm_head_spec(name))
else:

medium

With the removal of the lm_head spec generation, LM_HEAD_NAMES (defined on line 125) and _make_lm_head_spec (defined on lines 899-913) are now dead code. Consider removing them from the file to improve maintainability.
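One way to confirm that names such as `VOCAB_PER_TP`, `LM_HEAD_NAMES`, and `_make_lm_head_spec` are really unreferenced is to scan the module's AST. The sketch below is a rough check, not a replacement for a linter: `unreferenced_top_level_names` is a hypothetical helper, and `SAMPLE` is a made-up stand-in for the real file, whose contents are not shown in this PR.

```python
import ast


def unreferenced_top_level_names(source: str) -> set[str]:
    """Return names bound at module top level that are never read anywhere."""
    tree = ast.parse(source)
    bound: set[str] = set()
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # `import a.b` binds `a`; `import x as y` binds `y`.
            bound.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.Assign):
            bound.update(t.id for t in node.targets if isinstance(t, ast.Name))
    loaded = {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }
    return bound - loaded


# Hypothetical stand-in for the module after the lm_head tail is dropped.
SAMPLE = '''
from lm_head import TP_SIZE as LM_HEAD_ACTIVE_TP_SIZE, VOCAB_PER_TP

LM_HEAD_NAMES = ("lm_head_weight",)

def _make_lm_head_spec(name):
    return name

assert LM_HEAD_ACTIVE_TP_SIZE >= 1
'''

dead = unreferenced_top_level_names(SAMPLE)
print(sorted(dead))
```

The check only parses the source and never imports it, so it can run without the `lm_head` module present. It will miss names used via `getattr`, `__all__`, or other modules, so treat its output as candidates to review rather than a definitive list.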
