Optimize the implementation of scale. #12

Open
Xreki wants to merge 2 commits into lixinqi:ap from Xreki:opt_scale

Xreki commented Apr 8, 2025

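Optimize the code generated for the scale operator so that it no longer uses immediate values (numeric literals) directly in expressions. The compiler treats an unsuffixed floating-point literal as double by default, which promotes the arithmetic to double precision and hurts overall performance.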

  • Before optimization

    // Note: need to support vectorized operation
    __forceinline__ __host__ __device__
    T operator()(T x, const Arguments& args, const MatrixCoord& coord) const {
      T out;
      float op1_out0 = static_cast<float>(0.100000 * x + 0.000000);
      out = op1_out0;
      return out;
    }
  • After optimization

      T out;
      float op1_scale = (0.100000);
      float op1_bias = (0.000000);
      float op1_out0 = static_cast<float>(op1_scale * x + op1_bias);
      out = op1_out0;
      return out;

hxzd5568 pushed a commit to hxzd5568/Athena that referenced this pull request Aug 18, 2025
[Add] full graph extraction function