REF: Use moments accumulator from libalgos to compute variance #64848

Closed
Alvaro-Kothe wants to merge 4 commits into pandas-dev:main from Alvaro-Kothe:refactor/combine-var-impl

Conversation

@Alvaro-Kothe
Member

@Alvaro-Kothe Alvaro-Kothe commented Mar 25, 2026


Use the moments accumulator functions introduced in #64366 to compute variance, standard deviation and standard error of the mean.

These changes are not applied to nanvar because they had a large performance impact in several cases, mainly for integer dtypes.
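For readers unfamiliar with the idea, here is a minimal pure-Python sketch of a moments accumulator that yields variance, standard deviation and standard error of the mean from one pass over the data. It assumes a Welford-style update; the class name `Moments` and its methods are hypothetical and are not the libalgos API from #64366, whose actual implementation may differ.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical accumulator: count, running mean, and M2
    (the sum of squared deviations from the running mean)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford-style update (an assumption, not taken from the PR):
        # update the mean first, then accumulate M2 using both the old
        # and the new deviation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def var(self, ddof: int = 1) -> float:
        # Undefined when there are not enough observations.
        if self.n - ddof <= 0:
            return math.nan
        return self.m2 / (self.n - ddof)

    def std(self, ddof: int = 1) -> float:
        return math.sqrt(self.var(ddof))

    def sem(self, ddof: int = 1) -> float:
        # Guard n == 0 explicitly to avoid dividing by zero.
        if self.n == 0:
            return math.nan
        return self.std(ddof) / math.sqrt(self.n)


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
acc = Moments()
for v in values:
    acc.add(v)

# Cross-check against the standard library.
print(math.isclose(acc.var(), statistics.variance(values)))
print(math.isclose(acc.std(), statistics.stdev(values)))
```

Because all three statistics derive from the same `(n, mean, m2)` state, a single accumulator can serve `var`, `std` and `sem`, which is presumably the appeal of sharing one implementation.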


Benchmarks

| Change | Before [3ba8677] <refactor/combine-var-impl~2> | After [23b6f0f] <refactor/combine-var-impl> | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| + | 69.6±9μs | 81.1±0.6μs | 1.17 | groupby.GroupByMethods.time_dtype_as_field('float', 'std', 'direct', 1, 'cython') |
| + | 70.9±7μs | 81.0±0.1μs | 1.14 | groupby.GroupByMethods.time_dtype_as_field('float', 'sem', 'direct', 1, 'cython') |
| + | 13.6±0.7ms | 15.4±0.08ms | 1.14 | series_methods.NanOps.time_func('sem', 1000000, 'boolean') |
| - | 53.3±0.7μs | 48.4±0.3μs | 0.91 | series_methods.NanOps.time_func('std', 1000, 'float64') |
| - | 40.6±0.5μs | 36.9±0.4μs | 0.91 | series_methods.NanOps.time_func('std', 1000, 'int8') |
| - | 5.75±0.01ms | 5.20±0.5ms | 0.9 | rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'float', 'sem') |
| - | 4.40±0.03ms | 3.90±0.4ms | 0.89 | rolling.Methods.time_method('DataFrame', ('expanding', {}), 'int', 'sem') |
| - | 2.73±0ms | 2.44±0.3ms | 0.89 | rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'float', 'std') |
| - | 140±0.7ms | 121±0.6ms | 0.86 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int64', 'var') |
| - | 99.8±1ms | 79.8±0.5ms | 0.8 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Int32', 'var') |
| - | 92.7±1ms | 72.0±0.6ms | 0.78 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Float64', 'var') |

@jbrockmendel
Member

jbrockmendel commented Mar 25, 2026

possibly also #51332? #61677?

@Alvaro-Kothe
Member Author

I checked the reproduction from #51332. The results are exact when bottleneck is not used. Because of bottleneck, though, I don't think the issue can be closed.

#61677 may be closed due to #63048.

@Alvaro-Kothe Alvaro-Kothe changed the title BUG: Use moments accumulator for variance in libalgos and fix float computation BUG: Use moments accumulator for variance in libalgos and fix complex variance computation Mar 25, 2026
@Alvaro-Kothe Alvaro-Kothe force-pushed the refactor/combine-var-impl branch from 98703da to 0a569f9 Compare March 26, 2026 23:39
@Alvaro-Kothe Alvaro-Kothe changed the title BUG: Use moments accumulator for variance in libalgos and fix complex variance computation BUG: Use moments accumulator for variance in libalgos Mar 26, 2026
@Alvaro-Kothe Alvaro-Kothe changed the title BUG: Use moments accumulator for variance in libalgos REF: Use moments accumulator from libalgos to compute variance Apr 2, 2026
@Alvaro-Kothe Alvaro-Kothe force-pushed the refactor/combine-var-impl branch from 0a569f9 to 3b74275 Compare April 2, 2026 00:29
@Alvaro-Kothe Alvaro-Kothe force-pushed the refactor/combine-var-impl branch from 3b74275 to 23b6f0f Compare April 4, 2026 23:08
@Alvaro-Kothe Alvaro-Kothe marked this pull request as ready for review April 4, 2026 23:12
@Alvaro-Kothe Alvaro-Kothe force-pushed the refactor/combine-var-impl branch from 23b6f0f to e857f81 Compare April 4, 2026 23:16
Comment thread pandas/_libs/algos.pxd Outdated
Comment thread pandas/_libs/groupby.pyx Outdated
@jbrockmendel
Member

@Alvaro-Kothe is this worth spending more time reviewing, or is it superseded by the SIMD PR?

@Alvaro-Kothe
Member Author

> is this worth spending more time reviewing, or is it superseded by the SIMD PR?

This isn't superseded by the SIMD PR (#64905). But I think it's best to put this on hold for now, because the SIMD code can be used for nanvar.
