
LATX, opt: Inline PCLMULQDQ/VPCLMULQDQ #293

Merged: luzeng87 merged 1 commit into lat-opensource:master from phorcys:la64_pclmul on May 14, 2026

@phorcys phorcys commented May 12, 2026

Uses a simple CTZ loop for acceleration.
OpenSSL's 4-bit table lookup is not used; it has higher latency but higher GCM bandwidth.

| Cipher | baseline (kB/s) | build64 vpaes (kB/s) | vpaes / base | with pclmul (kB/s) | vpaes+pclmul / base |
|---|---|---|---|---|---|
| AES-128-GCM | 16599.14 | 40809.53 | 2.459 | 80756.74 | 4.865 |
| AES-192-GCM | 13885.03 | 38278.98 | 2.757 | 73515.01 | 5.294 |
| AES-256-GCM | 12097.68 | 36680.10 | 3.032 | 67323.35 | 5.565 |

@luzeng87 luzeng87 left a comment

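Review summary

Replaces the helper-function calls for PCLMULQDQ/VPCLMULQDQ with an inline CTZ-loop implementation. For the GF(2^128) polynomial multiplication, CTZ skips the zero bits, so the iteration count drops from 64 to popcount(lhs). GCM throughput improves by about 2x, which is a significant gain.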
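To make the iteration-count argument concrete, here is a minimal scalar sketch in C of a 64x64 -> 128-bit carry-less multiply, written both as the classic 64-step shift-and-XOR loop and as a CTZ loop. It is only a model of the idea, not the LoongArch sequence the PR emits; the names `clmul64_ref` and `clmul64_ctz` are hypothetical, and `__builtin_ctzll` assumes GCC or Clang.

```c
#include <stdint.h>

/* Reference: test all 64 bits of lhs, XOR in a shifted rhs for each set bit. */
static void clmul64_ref(uint64_t lhs, uint64_t rhs,
                        uint64_t *res_lo, uint64_t *res_hi)
{
    uint64_t lo = 0, hi = 0;
    for (int i = 0; i < 64; i++) {
        if ((lhs >> i) & 1) {
            lo ^= rhs << i;
            if (i) {                 /* a shift by 64 would be undefined in C */
                hi ^= rhs >> (64 - i);
            }
        }
    }
    *res_lo = lo;
    *res_hi = hi;
}

/* CTZ variant: jump straight to the next set bit of lhs.
 * Returns the number of loop iterations, which equals popcount(lhs). */
static int clmul64_ctz(uint64_t lhs, uint64_t rhs,
                       uint64_t *res_lo, uint64_t *res_hi)
{
    uint64_t lo = 0, hi = 0;
    int iters = 0;
    while (lhs) {
        int i = __builtin_ctzll(lhs);    /* index of the lowest set bit */
        lo ^= rhs << i;
        if (i) {
            hi ^= rhs >> (64 - i);
        }
        lhs &= lhs - 1;                  /* clear that bit */
        iters++;
    }
    *res_lo = lo;
    *res_hi = hi;
    return iters;
}
```

For example, `clmul64_ctz(3, 3, &lo, &hi)` runs two iterations and yields `lo == 5`, `hi == 0`, since (x + 1)^2 = x^2 + 1 over GF(2). The worst case (all 64 bits of `lhs` set) still takes 64 iterations, so the saving depends on the data.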

Issues

1. The 128-bit path in translate_pclmulqdq lacks register-aliasing protection

The 256-bit path explicitly creates a copy when s0 == s1:

if (s0 == s1) {
    src1_copy = ra_alloc_ftemp();
    la_xvori_b(src1_copy, src1, 0);
}

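The 128-bit path (translate_pclmulqdq), however, passes dest directly as the v argument. In the current implementation the reads happen before the write (the function first does vpickve2gr_d on v and only at the end vinsgr2vr_d into d), so the result is correct, but this is inconsistent with the 256-bit path. I suggest adding the same protection so that a later refactor cannot introduce a bug.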
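The hazard can be illustrated with a small scalar model in C. This is a sketch under assumptions, not LATX code: `vec128`, `pclmulqdq_model`, and `pclmulqdq_guarded` are hypothetical names, and `__builtin_ctzll` assumes GCC or Clang. The first function is safe in place because it reads both operands before writing; the second accumulates directly into dest, which is only correct if an aliased source is copied first, analogous to the s0 == s1 check in the 256-bit path.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct { uint64_t d[2]; } vec128;

/* imm bit 0 selects the qword of dest, imm bit 4 selects the qword of src,
 * following the PCLMULQDQ operand selection. */

/* Read-before-write ordering: correct even when dest == src. */
static void pclmulqdq_model(vec128 *dest, const vec128 *src, int imm)
{
    uint64_t lhs = dest->d[imm & 1];          /* all reads happen first */
    uint64_t rhs = src->d[(imm >> 4) & 1];
    uint64_t lo = 0, hi = 0;
    while (lhs) {
        int i = __builtin_ctzll(lhs);
        lo ^= rhs << i;
        if (i) {
            hi ^= rhs >> (64 - i);
        }
        lhs &= lhs - 1;
    }
    dest->d[0] = lo;                          /* the only writes, at the end */
    dest->d[1] = hi;
}

/* A refactored variant that accumulates straight into dest. Without the
 * copy, the early writes below would clobber src when dest == src. */
static void pclmulqdq_guarded(vec128 *dest, const vec128 *src, int imm)
{
    vec128 src_copy;
    if (dest == src) {                        /* aliasing guard */
        src_copy = *src;
        src = &src_copy;
    }
    uint64_t lhs = dest->d[imm & 1];
    int sel = (imm >> 4) & 1;
    dest->d[0] = 0;                           /* early write into dest */
    dest->d[1] = 0;
    while (lhs) {
        int i = __builtin_ctzll(lhs);
        dest->d[0] ^= src->d[sel] << i;       /* src is re-read every iteration */
        if (i) {
            dest->d[1] ^= src->d[sel] >> (64 - i);
        }
        lhs &= lhs - 1;
    }
}
```

With `vec128 a = {{3, 7}}` and `imm = 0x10`, both `pclmulqdq_model(&a, &a, imm)` and `pclmulqdq_guarded(&a, &a, imm)` produce `{9, 0}`, since (x + 1)(x^2 + x + 1) = x^3 + 1 over GF(2). Removing the guard from the second function would make the aliased call read zeros.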

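2. The CTZ loop body is duplicated three times

emit_pclmulqdq_ctz, emit_vpclmulqdq_ctz_128, and emit_vpclmulqdq_ctz_256 contain a nearly identical ~20-line CTZ loop. I suggest extracting a common function emit_pclmul_ctz_loop(lhs, rhs, res_lo, res_hi).

3. Commit title capitalization

Per the project convention it should be LATX, opt: Inline PCLMULQDQ/VPCLMULQDQ (capital "I" in Inline).

Minor notes

  • The variable renames a → lhs, b → rhs, resl → res_lo noticeably improve clarity
  • The old code's cal_pclmulqdq called ra_free_temp(ctrlp); the new code drops it. This matches the LATX convention of relying on a unified release at the end of the TB, so nothing leaks

Conclusion

Approved. The performance gain is real and significant, the algorithm is correct, and there are no blocking issues. In a follow-up iteration I suggest extracting the common CTZ loop body and unifying the aliasing protection.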

@phorcys phorcys changed the title from "LATX, opt: inline PCLMULQDQ/VPCLMULQDQ" to "LATX, opt: Inline PCLMULQDQ/VPCLMULQDQ" May 14, 2026
@phorcys phorcys requested a review from luzeng87 May 14, 2026 09:52
@luzeng87 luzeng87 merged commit f24f73b into lat-opensource:master May 14, 2026
10 checks passed