Optimize the implementation of scale. by Xreki · Pull Request #12 · lixinqi/Athena

Xreki · 2025-04-08T06:38:42Z

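Optimize the code generated for the scale operator so that it no longer uses immediate values (floating-point literals) directly in expressions. The compiler treats an unsuffixed floating-point literal as double by default, so the computation is promoted to double, which hurts overall performance.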

Before optimization

// Note: need to support vectorized operation
__forceinline__ __host__ __device__
T operator()(T x, const Arguments& args, const MatrixCoord& coord) const {
  T out;
  float op1_out0 = static_cast<float>(0.100000 * x + 0.000000);
  out = op1_out0;
  return out;
}

After optimization

  T out;
  float op1_scale = (0.100000);
  float op1_bias = (0.000000);
  float op1_out0 = static_cast<float>(op1_scale * x + op1_bias);
  out = op1_out0;
  return out;


Xreki added 2 commits April 8, 2025 13:04

Optimize the implementation of scale.

4000cdf

Merge branch 'ap' into opt_scale

1060704

hxzd5568 pushed a commit to hxzd5568/Athena that referenced this pull request Aug 18, 2025

Merge pull request lixinqi#12 from hxzd5568/add_full_graph

468719a

[Add] full graph extraction function

Xreki wants to merge 2 commits into lixinqi:ap from Xreki:opt_scale
