dsv4 decode_fwd: 7-layer perf config, drop lm_head tail by zhangqi-chen · Pull Request #664 · hw-native-sys/pypto-lib

zhangqi-chen · 2026-07-01T09:10:14Z

Summary

Reduce HCA_NUM_LAYERS 20->2 (CSA 3, FWD 7): 2 SWA + 2x(CSA+HCA) + 1 tail CSA = 7 layers, for a fast ep2 smoke/perf run.
Drop the lm_head_tp tail from l3_decode_fwd: host now outputs hidden_norm [N_RANKS, T, D] (post-final-norm hidden) instead of logits. Removes the lm_head_weight input, the logits output, and the lm_head window setup loop; spec builder outputs hidden_norm.
PASS a2a3 ep2 (--ptoas 0.46) with l2-swimlane + scope-stats.
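
The layer arithmetic in the first summary item can be checked with a short sketch. The helper name `decode_fwd_layer_counts` and its parameters are hypothetical, not code from the PR. It assumes the pattern stated above: a fixed 2 SWA layers, one CSA+HCA pair per HCA layer, and one tail CSA layer.

```python
def decode_fwd_layer_counts(hca_num_layers: int,
                            swa_num_layers: int = 2,
                            tail_csa_layers: int = 1) -> dict:
    """Count layers under the assumed layout: SWA + N x (CSA + HCA) + tail CSA."""
    csa_num_layers = hca_num_layers + tail_csa_layers  # one CSA per pair, plus the tail
    fwd_num_layers = swa_num_layers + csa_num_layers + hca_num_layers
    return {
        "SWA": swa_num_layers,
        "CSA": csa_num_layers,
        "HCA": hca_num_layers,
        "FWD": fwd_num_layers,
    }


if __name__ == "__main__":
    # HCA_NUM_LAYERS = 2, as set by this PR
    print(decode_fwd_layer_counts(2))
    # → {'SWA': 2, 'CSA': 3, 'HCA': 2, 'FWD': 7}
```

With `hca_num_layers=2` this reproduces the CSA 3 / FWD 7 figures quoted in the summary. Whether the previous 20-layer configuration followed the same pattern is not stated in the PR.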


coderabbitai · 2026-07-01T09:10:22Z

Important

Review skipped

Draft detected.


gemini-code-assist

Code Review

This pull request removes the language model head (lm_head) processing from the l3_decode_fwd function, shifting its output from logits to hidden_norm. It also reduces the layer counts for testing or smaller configurations and updates the tensor specifications accordingly. The review feedback recommends cleaning up now-unused imports (VOCAB_PER_TP) and dead code (LM_HEAD_NAMES and _make_lm_head_spec) resulting from these changes.


gemini-code-assist · 2026-07-01T09:12:20Z

 from lm_head import (
    TP_SIZE as LM_HEAD_ACTIVE_TP_SIZE,
-    T_MAX as LM_HEAD_T_MAX,
    VOCAB_PER_TP,
-    lm_head_tp,
 )


Since the lm_head tail has been dropped from l3_decode_fwd, VOCAB_PER_TP is no longer used in any active code path. We should remove it from the import statement to keep the code clean. Note that LM_HEAD_ACTIVE_TP_SIZE is still used in an assertion on line 109, so we must keep it for now.

from lm_head import (
    TP_SIZE as LM_HEAD_ACTIVE_TP_SIZE,
)

gemini-code-assist · 2026-07-01T09:12:20Z

        elif name in FINAL_NORM_NAMES:
            specs.append(_make_final_norm_spec(name))
-        elif name in LM_HEAD_NAMES:
-            specs.append(_make_lm_head_spec(name))
        else:


With the removal of the lm_head spec generation, LM_HEAD_NAMES (defined on line 125) and _make_lm_head_spec (defined on lines 899-913) are now dead code. Consider removing them from the file to improve maintainability.
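
One way to confirm this kind of cleanup is a small static check that lists top-level names which are defined or imported but never read. The following is a minimal sketch using only the standard-library `ast` module. The function name `unused_top_level_names` is hypothetical, and `SAMPLE` is a simplified stand-in, not the actual file contents from this PR. The check is a heuristic: it ignores `__all__`, dynamic lookups, and names used only from other modules.

```python
import ast


def unused_top_level_names(source: str) -> set[str]:
    """Return module-level names that are bound but never loaded in `source`."""
    tree = ast.parse(source)
    defined = set()
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    defined.add(target.id)
    # Every identifier that is read anywhere in the module
    used = {
        n.id for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }
    return defined - used


# Hypothetical stand-in for the module after the lm_head branch is removed.
SAMPLE = '''
from lm_head import (
    TP_SIZE as LM_HEAD_ACTIVE_TP_SIZE,
    VOCAB_PER_TP,
)

FINAL_NORM_NAMES = ("final_norm",)
LM_HEAD_NAMES = ("lm_head_weight",)

def _make_final_norm_spec(name):
    return ("final_norm", name)

def _make_lm_head_spec(name):
    return ("lm_head", name)

def build_specs(names):
    assert LM_HEAD_ACTIVE_TP_SIZE > 0
    specs = []
    for name in names:
        if name in FINAL_NORM_NAMES:
            specs.append(_make_final_norm_spec(name))
    return specs

if __name__ == "__main__":
    build_specs(["final_norm"])
'''

if __name__ == "__main__":
    print(sorted(unused_top_level_names(SAMPLE)))
    # → ['LM_HEAD_NAMES', 'VOCAB_PER_TP', '_make_lm_head_spec']
```

On the stand-in module, the check reports the same three names the review flags, while `LM_HEAD_ACTIVE_TP_SIZE` is kept because the assertion still reads it.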

zhangqi-chen wants to merge 1 commit into hw-native-sys:main from zhangqi-chen:decode-fwd-perf