Skip to content

Conversation

@usamoi
Copy link
Contributor

@usamoi usamoi commented Jan 1, 2026

@folkertdev
Copy link
Contributor

Can you add something like this

    /// This is a trick used in the adler32 algorithm to get a widening addition. The
    /// multiplication by 1 is trivial, but must not be optimized out because then the vpmaddwd
    /// instruction is no longer selected.
    #[target_feature(enable = "avx2")]
    #[cfg_attr(test, assert_instr(vpmaddwd))]
    unsafe fn _mm256_madd_epi16_mul_one(mad: __m256i) -> __m256i {
        let one_v = _mm256_set1_epi16(1);
        _mm256_madd_epi16(mad, one_v)
    }

So that we remember in the future why we can't just use that generic implementation?

@usamoi
Copy link
Contributor Author

usamoi commented Jan 1, 2026

I have added the following comments:

// It's a trick used in the Adler-32 algorithm to perform a widening addition.
//
// ```rust
// #[target_feature(enable = "sse2")]
// unsafe fn widening_add(mad: __m128i) -> __m128i {
//     _mm_madd_epi16(mad, _mm_set1_epi16(1))
// }
// ```
//
// If we implement this using generic vector intrinsics, the optimizer
// will eliminate this pattern, and `pmaddwd` will no longer be emitted.
// For this reason, we use x86 intrinsics.

Comment on lines -213 to 227
#[rustc_const_unstable(feature = "stdarch_const_x86", issue = "149298")]
pub const fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
unsafe {
let r: i32x8 = simd_mul(simd_cast(a.as_i16x8()), simd_cast(b.as_i16x8()));
let even: i32x4 = simd_shuffle!(r, r, [0, 2, 4, 6]);
let odd: i32x4 = simd_shuffle!(r, r, [1, 3, 5, 7]);
simd_add(even, odd).as_m128i()
}
pub fn _mm_madd_epi16(a: __m128i, b: __m128i) -> __m128i {
// It's a trick used in the Adler-32 algorithm to perform a widening addition.
//
// ```rust
// #[target_feature(enable = "sse2")]
// unsafe fn widening_add(mad: __m128i) -> __m128i {
// _mm_madd_epi16(mad, _mm_set1_epi16(1))
// }
// ```
//
// If we implement this using generic vector intrinsics, the optimizer
// will eliminate this pattern, and `pmaddwd` will no longer be emitted.
// For this reason, we use x86 intrinsics.
unsafe { transmute(pmaddwd(a.as_i16x8(), b.as_i16x8())) }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider using https://doc.rust-lang.org/std/intrinsics/fn.const_eval_select.html so we don't loose all of the const stuff. Up to @sayantn though, I don't have full context on what we'd like to be const fn currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[target_feature] functions do not implement the Fn traits, while const_eval_select restricts FnOnce. So this does not seem feasible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants