BUG: combine all implementations of kurtosis and skewness computations and align results with SciPy by Alvaro-Kothe · Pull Request #64366 · pandas-dev/pandas

Alvaro-Kothe · 2026-03-02T15:48:19Z

Unifies implementation for groupby, nanops and rolling window.
Limits nanskew and nankurt to support only axis={0, 1, None}
fix the skewness and kurtosis computation to follow the results from scipy, where it returns NaN on degenerate distributions (everything is constant).
On window methods, it no longers returns NaN on low variance distributions for kurtosis and skewness.
I tried to make the refactor without the bugfix first, but, unfortunately, the tests asserts inconsistencies, namely, kurtosis on degenerate distributions are expected to be -3.0 for window computations, while it's expected to be 0.0 on nanops.
Everything is using online computation to accumulate moments, so the implementation is robust against degenerate distributions and no longer needs to check for floating point summation error.
The benchmark results (in details) have shown improvements in most of the operations, with some performance degradation in the case of SeriesOps and FrameOps (with axis={None,0})

Details

Change	Before [`d778f33`] <refactor/skew-kurt-reductions~6>	After [`56add91`] <refactor/skew-kurt-reductions>	Ratio	Benchmark (Parameter)
+	2.69±0.02ms	4.12±0.01ms	1.53	stat_ops.FrameOps.time_op('kurt', 'int', 0)
+	635±4μs	968±3μs	1.52	stat_ops.SeriesOps.time_op('kurt', 'int')
+	3.13±0.01ms	4.50±0.02ms	1.44	stat_ops.FrameOps.time_op('kurt', 'int', None)
+	684±6μs	943±4μs	1.38	stat_ops.SeriesOps.time_op('skew', 'int')
+	2.90±0.02ms	3.97±0.02ms	1.37	stat_ops.FrameOps.time_op('skew', 'int', 0)
+	3.31±0.01ms	4.35±0.01ms	1.31	stat_ops.FrameOps.time_op('skew', 'int', None)
-	3.23±0.01ms	2.92±0.01ms	0.91	rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'kurt')
-	3.33±0.01ms	3.02±0.02ms	0.91	rolling.Methods.time_method('DataFrame', ('rolling', {'window': 1000}), 'float', 'kurt')
-	3.05±0.01ms	2.76±0.01ms	0.91	rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'int', 'skew')
-	2.96±0.01ms	2.68±0.01ms	0.91	rolling.Methods.time_method('Series', ('rolling', {'window': 1000}), 'int', 'skew')
-	2.85±0.01ms	2.58±0.01ms	0.91	rolling.Methods.time_method('Series', ('rolling', {'window': 10}), 'float', 'skew')
-	3.26±0.03ms	2.96±0.01ms	0.91	rolling.VariableWindowMethods.time_method('Series', '1d', 'int', 'skew')
-	3.27±0.02ms	2.95±0.01ms	0.9	rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'kurt')
-	3.20±0.01ms	2.89±0.01ms	0.9	rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'kurt')
-	2.95±0.01ms	2.65±0.02ms	0.9	rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'float', 'skew')
-	3.49±0.01ms	3.16±0.02ms	0.9	rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'int', 'kurt')
-	1.94±0.02ms	1.74±0.02ms	0.9	rolling.Methods.time_method('Series', ('expanding', {}), 'float', 'skew')
-	2.83±0.02ms	2.55±0.01ms	0.9	rolling.Methods.time_method('Series', ('rolling', {'window': 1000}), 'float', 'skew')
-	3.30±0.02ms	2.95±0.01ms	0.89	rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'kurt')
-	2.15±0.1ms	1.92±0ms	0.89	rolling.Methods.time_method('DataFrame', ('expanding', {}), 'int', 'skew')
-	30.4±0.2ms	26.7±0.1ms	0.88	stat_ops.FrameOps.time_op('skew', 'Int64', 1)
-	11.6±0.2ms	9.31±0.03ms	0.8	series_methods.NanOps.time_func('kurt', `1000000`, 'int8')
-	12.1±0.1ms	9.43±0.01ms	0.78	series_methods.NanOps.time_func('kurt', `1000000`, 'int32')
-	12.4±0.8ms	9.65±0.02ms	0.78	series_methods.NanOps.time_func('kurt', `1000000`, 'int64')
-	12.1±0.1ms	8.99±0.02ms	0.75	series_methods.NanOps.time_func('skew', `1000000`, 'int8')
-	12.5±0.2ms	9.12±0.04ms	0.73	series_methods.NanOps.time_func('skew', `1000000`, 'int32')
-	12.9±1ms	9.34±0.01ms	0.72	series_methods.NanOps.time_func('skew', `1000000`, 'int64')
-	21.0±0.1ms	15.2±0.3ms	0.72	stat_ops.FrameOps.time_op('skew', 'int', 1)
-	15.4±0.9ms	9.74±0.03ms	0.63	series_methods.NanOps.time_func('kurt', `1000000`, 'Int64')
-	15.1±0.07ms	9.44±0.03ms	0.63	series_methods.NanOps.time_func('kurt', `1000000`, 'boolean')
-	15.8±0.9ms	9.47±0.01ms	0.6	series_methods.NanOps.time_func('skew', `1000000`, 'Int64')
-	26.1±1ms	15.7±0.3ms	0.6	stat_ops.FrameOps.time_op('kurt', 'int', 1)
-	14.9±0.1ms	8.75±0.01ms	0.59	series_methods.NanOps.time_func('kurt', `1000000`, 'float64')
-	15.5±0.06ms	9.12±0.02ms	0.59	series_methods.NanOps.time_func('skew', `1000000`, 'boolean')
-	15.3±0.05ms	8.43±0.01ms	0.55	series_methods.NanOps.time_func('skew', `1000000`, 'float64')
-	42.0±0.6ms	15.1±0.3ms	0.36	stat_ops.FrameOps.time_op('kurt', 'float', 1)
-	13.4±0.1ms	4.65±0.01ms	0.35	stat_ops.FrameOps.time_op('kurt', 'Int64', None)
-	13.8±1ms	4.74±0.02ms	0.34	stat_ops.FrameOps.time_op('kurt', 'float', None)
-	13.3±0.1ms	4.52±0.01ms	0.34	stat_ops.FrameOps.time_op('skew', 'Int64', None)
-	13.5±1ms	4.62±0.04ms	0.34	stat_ops.FrameOps.time_op('skew', 'float', None)
-	65.3±2μs	21.7±0.3μs	0.33	series_methods.NanOps.time_func('kurt', 1000, 'Int64')
-	66.1±0.8μs	21.6±0.4μs	0.33	series_methods.NanOps.time_func('kurt', 1000, 'boolean')
-	67.9±2μs	22.4±0.4μs	0.33	series_methods.NanOps.time_func('skew', 1000, 'boolean')
-	43.6±0.6ms	14.4±0.3ms	0.33	stat_ops.FrameOps.time_op('skew', 'float', 1)
-	58.9±1μs	18.3±0.3μs	0.31	series_methods.NanOps.time_func('kurt', 1000, 'int64')
-	58.4±0.5μs	18.3±0.5μs	0.31	series_methods.NanOps.time_func('kurt', 1000, 'int8')
-	70.8±1μs	21.6±0.2μs	0.31	series_methods.NanOps.time_func('skew', 1000, 'Int64')
-	61.6±2μs	18.3±0.5μs	0.3	series_methods.NanOps.time_func('kurt', 1000, 'int32')
-	61.5±1μs	18.5±0.5μs	0.3	series_methods.NanOps.time_func('skew', 1000, 'int32')
-	64.0±1μs	18.4±0.3μs	0.29	series_methods.NanOps.time_func('skew', 1000, 'int64')
-	61.8±0.4μs	18.1±0.2μs	0.29	series_methods.NanOps.time_func('skew', 1000, 'int8')
-	64.0±0.7μs	17.9±0.1μs	0.28	series_methods.NanOps.time_func('kurt', 1000, 'float64')
-	67.3±1μs	17.7±0.5μs	0.26	series_methods.NanOps.time_func('skew', 1000, 'float64')

mroeschke

This is very exciting! I would love to see all our reductions implementations in nanops, groupby, and rolling use a shared implementation like this

jbrockmendel · 2026-03-06T18:12:17Z

Any idea why the perf hit for the slower cases? Is it a perf-vs-accuracy tradeoff?

jbrockmendel · 2026-03-06T18:12:37Z

I think there's at least one other "align with scipy" PR that this will close

Alvaro-Kothe · 2026-03-06T19:23:28Z

I think there's at least one other "align with scipy" PR that this will close

I believe this also closes #53639. Thanks; please let me know if there are others.

Any idea why the perf hit for the slower cases? Is it a perf-vs-accuracy tradeoff?

I didn't manage to figure it out. All the regressions are focused on int data types with axis=0 or axis=None. Previously, I experienced some performance regressions with the flloat case, which was mitigated by assuming the 2D array was F-contiguous.

Alvaro-Kothe · 2026-03-06T19:31:23Z

A hypothesis is that NumPy have optimizations for it's sum method (e.g., simd, loop unrolling) that may improve the performance. It's not possible to use simd in the Welford algorithm. The best improvement that I know of is using the pairwise version (Divide and Conquer) that permits using multiprocessing.

rhshadrach · 2026-03-06T21:52:50Z

Previously, I experienced some performance regressions with the flloat case, which was mitigated by assuming the 2D array was F-contiguous.

Can this cause issues such as #61031?

Alvaro-Kothe · 2026-03-06T22:04:04Z

Can this cause issues such as #61031?

No. This PR doesn't enforce contiguity in the memoryview.

It also introduces behavior changes, mainly on degenerate distributions for nanops and window methods, now it returns NaN. Also changes the behavior on low variance windows, to make it tolerant against low variance distributions, where it no longer clips the result to 0.0.

type: fix mypy errors docs: use round in skew() and kurt() methods to ignore fperr

mroeschke

Generally looks good. A few follow up questions

jbrockmendel · 2026-03-11T19:30:58Z

On today's dev call we discussed #62536 and whether this (and follow-ups for other reductions) could make it irrelevant. Does numba do something smart that we can adapt here?

Alvaro-Kothe · 2026-03-11T19:47:21Z

On today's dev call we discussed #62536 and whether this (and follow-ups for other reductions) could make it irrelevant. Does numba do something smart that we can adapt here?

I am not familiar with Numba to know how it compiles the code.

But performance-wise it's still possible to optimize this implementation even further, mainly by adding parallelization and SIMD (Wikipedia says it's possible).

If necessary, I can include a merge function in this PR that will be useful for future optimizations.

jbrockmendel · 2026-03-11T22:29:11Z

I know it would break the "just one implementation", but what if we just kept the numpy version for int dtype?

Alvaro-Kothe · 2026-03-13T13:57:25Z

I know it would break the "just one implementation", but what if we just kept the numpy version for int dtype?

Not fond of the idea, but if it's necessary to maintain performance may be valid. Another option is to try to create moments reduction for integer types.

Alvaro-Kothe · 2026-03-13T14:04:39Z

I put up an implementation using parallelization with OpenMP for 1D reduction in #64582. Had some performance benefits on my machine, but I ended up implementing it in C for better control of OMP directives.

mroeschke

Overall I think this is a net improvement. @jbrockmendel feel free to merge if satisfied

jbrockmendel · 2026-03-13T17:51:43Z

Handful of small things, nothing big

jbrockmendel · 2026-03-13T20:33:03Z

Thanks @Alvaro-Kothe

Alvaro-Kothe changed the title ~~fix: combine all implementations of kurtosis and skewness computations and align results with SciPy~~ BUG: combine all implementations of kurtosis and skewness computations and align results with SciPy Mar 3, 2026

Alvaro-Kothe force-pushed the refactor/skew-kurt-reductions branch from 11a22d6 to 56add91 Compare March 3, 2026 23:59

mroeschke reviewed Mar 4, 2026

View reviewed changes

Comment thread pandas/_libs/algos.pxd

mroeschke added the Reduction Operations sum, mean, min, max, etc. label Mar 4, 2026

Alvaro-Kothe commented Mar 4, 2026

View reviewed changes

Comment thread pandas/_libs/algos.pyx Outdated

jbrockmendel mentioned this pull request Mar 7, 2026

BUG: Aligned skew and kurt results with scipy.stats #62925

Closed

5 tasks

Alvaro-Kothe added 8 commits March 9, 2026 18:52

Fix CI errors

3c895ea

type: fix mypy errors docs: use round in skew() and kurt() methods to ignore fperr

docs: correct statement about Series.kurt

175f9bd

baseline

e5d1be8

baseline 2

551e805

baseline3

f3eb25d

refactor: add axis parameter

abf39a1

test: remove if block

f9d932f

Alvaro-Kothe force-pushed the refactor/skew-kurt-reductions branch from 9378baf to f9d932f Compare March 9, 2026 21:57

mroeschke reviewed Mar 11, 2026

View reviewed changes

Comment thread pandas/_libs/algos.pxd

mroeschke reviewed Mar 11, 2026

View reviewed changes

Comment thread pandas/_libs/algos.pxd

mroeschke reviewed Mar 11, 2026

View reviewed changes

mroeschke added this to the 3.1 milestone Mar 11, 2026

mroeschke approved these changes Mar 13, 2026

View reviewed changes