Skip to content

BUG: combine all implementations of kurtosis and skewness computations and align results with SciPy#64366

Merged
jbrockmendel merged 12 commits into
pandas-dev:mainfrom
Alvaro-Kothe:refactor/skew-kurt-reductions
Mar 13, 2026
Merged

BUG: combine all implementations of kurtosis and skewness computations and align results with SciPy#64366
jbrockmendel merged 12 commits into
pandas-dev:mainfrom
Alvaro-Kothe:refactor/skew-kurt-reductions

Conversation

@Alvaro-Kothe
Copy link
Copy Markdown
Member

@Alvaro-Kothe Alvaro-Kothe commented Mar 2, 2026


  • Unifies implementation for groupby, nanops and rolling window.
  • Limits nanskew and nankurt to support only axis={0, 1, None}
  • fix the skewness and kurtosis computation to follow the results from scipy, where it returns NaN on degenerate distributions (everything is constant).
  • On window methods, it no longers returns NaN on low variance distributions for kurtosis and skewness.
  • I tried to make the refactor without the bugfix first, but, unfortunately, the tests asserts inconsistencies, namely, kurtosis on degenerate distributions are expected to be -3.0 for window computations, while it's expected to be 0.0 on nanops.
  • Everything is using online computation to accumulate moments, so the implementation is robust against degenerate distributions and no longer needs to check for floating point summation error.
  • The benchmark results (in details) have shown improvements in most of the operations, with some performance degradation in the case of SeriesOps and FrameOps (with axis={None,0})
Details
Change Before [d778f33] <refactor/skew-kurt-reductions~6> After [56add91] <refactor/skew-kurt-reductions> Ratio Benchmark (Parameter)
+ 2.69±0.02ms 4.12±0.01ms 1.53 stat_ops.FrameOps.time_op('kurt', 'int', 0)
+ 635±4μs 968±3μs 1.52 stat_ops.SeriesOps.time_op('kurt', 'int')
+ 3.13±0.01ms 4.50±0.02ms 1.44 stat_ops.FrameOps.time_op('kurt', 'int', None)
+ 684±6μs 943±4μs 1.38 stat_ops.SeriesOps.time_op('skew', 'int')
+ 2.90±0.02ms 3.97±0.02ms 1.37 stat_ops.FrameOps.time_op('skew', 'int', 0)
+ 3.31±0.01ms 4.35±0.01ms 1.31 stat_ops.FrameOps.time_op('skew', 'int', None)
- 3.23±0.01ms 2.92±0.01ms 0.91 rolling.ForwardWindowMethods.time_rolling('Series', 10, 'float', 'kurt')
- 3.33±0.01ms 3.02±0.02ms 0.91 rolling.Methods.time_method('DataFrame', ('rolling', {'window': 1000}), 'float', 'kurt')
- 3.05±0.01ms 2.76±0.01ms 0.91 rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'int', 'skew')
- 2.96±0.01ms 2.68±0.01ms 0.91 rolling.Methods.time_method('Series', ('rolling', {'window': 1000}), 'int', 'skew')
- 2.85±0.01ms 2.58±0.01ms 0.91 rolling.Methods.time_method('Series', ('rolling', {'window': 10}), 'float', 'skew')
- 3.26±0.03ms 2.96±0.01ms 0.91 rolling.VariableWindowMethods.time_method('Series', '1d', 'int', 'skew')
- 3.27±0.02ms 2.95±0.01ms 0.9 rolling.ForwardWindowMethods.time_rolling('DataFrame', 10, 'float', 'kurt')
- 3.20±0.01ms 2.89±0.01ms 0.9 rolling.ForwardWindowMethods.time_rolling('Series', 1000, 'float', 'kurt')
- 2.95±0.01ms 2.65±0.02ms 0.9 rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'float', 'skew')
- 3.49±0.01ms 3.16±0.02ms 0.9 rolling.Methods.time_method('DataFrame', ('rolling', {'window': 10}), 'int', 'kurt')
- 1.94±0.02ms 1.74±0.02ms 0.9 rolling.Methods.time_method('Series', ('expanding', {}), 'float', 'skew')
- 2.83±0.02ms 2.55±0.01ms 0.9 rolling.Methods.time_method('Series', ('rolling', {'window': 1000}), 'float', 'skew')
- 3.30±0.02ms 2.95±0.01ms 0.89 rolling.ForwardWindowMethods.time_rolling('DataFrame', 1000, 'float', 'kurt')
- 2.15±0.1ms 1.92±0ms 0.89 rolling.Methods.time_method('DataFrame', ('expanding', {}), 'int', 'skew')
- 30.4±0.2ms 26.7±0.1ms 0.88 stat_ops.FrameOps.time_op('skew', 'Int64', 1)
- 11.6±0.2ms 9.31±0.03ms 0.8 series_methods.NanOps.time_func('kurt', 1000000, 'int8')
- 12.1±0.1ms 9.43±0.01ms 0.78 series_methods.NanOps.time_func('kurt', 1000000, 'int32')
- 12.4±0.8ms 9.65±0.02ms 0.78 series_methods.NanOps.time_func('kurt', 1000000, 'int64')
- 12.1±0.1ms 8.99±0.02ms 0.75 series_methods.NanOps.time_func('skew', 1000000, 'int8')
- 12.5±0.2ms 9.12±0.04ms 0.73 series_methods.NanOps.time_func('skew', 1000000, 'int32')
- 12.9±1ms 9.34±0.01ms 0.72 series_methods.NanOps.time_func('skew', 1000000, 'int64')
- 21.0±0.1ms 15.2±0.3ms 0.72 stat_ops.FrameOps.time_op('skew', 'int', 1)
- 15.4±0.9ms 9.74±0.03ms 0.63 series_methods.NanOps.time_func('kurt', 1000000, 'Int64')
- 15.1±0.07ms 9.44±0.03ms 0.63 series_methods.NanOps.time_func('kurt', 1000000, 'boolean')
- 15.8±0.9ms 9.47±0.01ms 0.6 series_methods.NanOps.time_func('skew', 1000000, 'Int64')
- 26.1±1ms 15.7±0.3ms 0.6 stat_ops.FrameOps.time_op('kurt', 'int', 1)
- 14.9±0.1ms 8.75±0.01ms 0.59 series_methods.NanOps.time_func('kurt', 1000000, 'float64')
- 15.5±0.06ms 9.12±0.02ms 0.59 series_methods.NanOps.time_func('skew', 1000000, 'boolean')
- 15.3±0.05ms 8.43±0.01ms 0.55 series_methods.NanOps.time_func('skew', 1000000, 'float64')
- 42.0±0.6ms 15.1±0.3ms 0.36 stat_ops.FrameOps.time_op('kurt', 'float', 1)
- 13.4±0.1ms 4.65±0.01ms 0.35 stat_ops.FrameOps.time_op('kurt', 'Int64', None)
- 13.8±1ms 4.74±0.02ms 0.34 stat_ops.FrameOps.time_op('kurt', 'float', None)
- 13.3±0.1ms 4.52±0.01ms 0.34 stat_ops.FrameOps.time_op('skew', 'Int64', None)
- 13.5±1ms 4.62±0.04ms 0.34 stat_ops.FrameOps.time_op('skew', 'float', None)
- 65.3±2μs 21.7±0.3μs 0.33 series_methods.NanOps.time_func('kurt', 1000, 'Int64')
- 66.1±0.8μs 21.6±0.4μs 0.33 series_methods.NanOps.time_func('kurt', 1000, 'boolean')
- 67.9±2μs 22.4±0.4μs 0.33 series_methods.NanOps.time_func('skew', 1000, 'boolean')
- 43.6±0.6ms 14.4±0.3ms 0.33 stat_ops.FrameOps.time_op('skew', 'float', 1)
- 58.9±1μs 18.3±0.3μs 0.31 series_methods.NanOps.time_func('kurt', 1000, 'int64')
- 58.4±0.5μs 18.3±0.5μs 0.31 series_methods.NanOps.time_func('kurt', 1000, 'int8')
- 70.8±1μs 21.6±0.2μs 0.31 series_methods.NanOps.time_func('skew', 1000, 'Int64')
- 61.6±2μs 18.3±0.5μs 0.3 series_methods.NanOps.time_func('kurt', 1000, 'int32')
- 61.5±1μs 18.5±0.5μs 0.3 series_methods.NanOps.time_func('skew', 1000, 'int32')
- 64.0±1μs 18.4±0.3μs 0.29 series_methods.NanOps.time_func('skew', 1000, 'int64')
- 61.8±0.4μs 18.1±0.2μs 0.29 series_methods.NanOps.time_func('skew', 1000, 'int8')
- 64.0±0.7μs 17.9±0.1μs 0.28 series_methods.NanOps.time_func('kurt', 1000, 'float64')
- 67.3±1μs 17.7±0.5μs 0.26 series_methods.NanOps.time_func('skew', 1000, 'float64')

@Alvaro-Kothe Alvaro-Kothe changed the title fix: combine all implementations of kurtosis and skewness computations and align results with SciPy BUG: combine all implementations of kurtosis and skewness computations and align results with SciPy Mar 3, 2026
@Alvaro-Kothe Alvaro-Kothe force-pushed the refactor/skew-kurt-reductions branch from 11a22d6 to 56add91 Compare March 3, 2026 23:59
Copy link
Copy Markdown
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very exciting! I would love to see all our reductions implementations in nanops, groupby, and rolling use a shared implementation like this

Comment thread pandas/_libs/algos.pxd
@mroeschke mroeschke added the Reduction Operations sum, mean, min, max, etc. label Mar 4, 2026
Comment thread pandas/_libs/algos.pyx Outdated
@jbrockmendel
Copy link
Copy Markdown
Member

Any idea why the perf hit for the slower cases? Is it a perf-vs-accuracy tradeoff?

@jbrockmendel
Copy link
Copy Markdown
Member

I think there's at least one other "align with scipy" PR that this will close

@Alvaro-Kothe
Copy link
Copy Markdown
Member Author

I think there's at least one other "align with scipy" PR that this will close

I believe this also closes #53639. Thanks; please let me know if there are others.

Any idea why the perf hit for the slower cases? Is it a perf-vs-accuracy tradeoff?

I didn't manage to figure it out. All the regressions are focused on int data types with axis=0 or axis=None. Previously, I experienced some performance regressions with the flloat case, which was mitigated by assuming the 2D array was F-contiguous.

@Alvaro-Kothe
Copy link
Copy Markdown
Member Author

Alvaro-Kothe commented Mar 6, 2026

A hypothesis is that NumPy have optimizations for it's sum method (e.g., simd, loop unrolling) that may improve the performance. It's not possible to use simd in the Welford algorithm. The best improvement that I know of is using the pairwise version (Divide and Conquer) that permits using multiprocessing.

@rhshadrach
Copy link
Copy Markdown
Member

Previously, I experienced some performance regressions with the flloat case, which was mitigated by assuming the 2D array was F-contiguous.

Can this cause issues such as #61031?

@Alvaro-Kothe
Copy link
Copy Markdown
Member Author

Can this cause issues such as #61031?

No. This PR doesn't enforce contiguity in the memoryview.

It also introduces behavior changes, mainly on degenerate distributions
for nanops and window methods, now it returns NaN.
Also changes the behavior on low variance windows, to make it
tolerant against low variance distributions, where it no longer clips
the result to 0.0.
type: fix mypy errors

docs: use round in skew() and kurt() methods to ignore fperr
@Alvaro-Kothe Alvaro-Kothe force-pushed the refactor/skew-kurt-reductions branch from 9378baf to f9d932f Compare March 9, 2026 21:57
Comment thread pandas/_libs/algos.pxd
Comment thread pandas/_libs/algos.pxd
Copy link
Copy Markdown
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. A few follow up questions

@mroeschke mroeschke added this to the 3.1 milestone Mar 11, 2026
@jbrockmendel
Copy link
Copy Markdown
Member

On today's dev call we discussed #62536 and whether this (and follow-ups for other reductions) could make it irrelevant. Does numba do something smart that we can adapt here?

@Alvaro-Kothe
Copy link
Copy Markdown
Member Author

On today's dev call we discussed #62536 and whether this (and follow-ups for other reductions) could make it irrelevant. Does numba do something smart that we can adapt here?

I am not familiar with Numba to know how it compiles the code.

But performance-wise it's still possible to optimize this implementation even further, mainly by adding parallelization and SIMD (Wikipedia says it's possible).

If necessary, I can include a merge function in this PR that will be useful for future optimizations.

@jbrockmendel
Copy link
Copy Markdown
Member

I know it would break the "just one implementation", but what if we just kept the numpy version for int dtype?

@Alvaro-Kothe
Copy link
Copy Markdown
Member Author

I know it would break the "just one implementation", but what if we just kept the numpy version for int dtype?

Not fond of the idea, but if it's necessary to maintain performance may be valid. Another option is to try to create moments reduction for integer types.

@Alvaro-Kothe
Copy link
Copy Markdown
Member Author

I put up an implementation using parallelization with OpenMP for 1D reduction in #64582. Had some performance benefits on my machine, but I ended up implementing it in C for better control of OMP directives.

Copy link
Copy Markdown
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is a net improvement. @jbrockmendel feel free to merge if satisfied

Comment thread pandas/core/nanops.py Outdated
Comment thread pandas/core/nanops.py Outdated
Comment thread pandas/_libs/algos.pyx
Comment thread pandas/core/nanops.py Outdated
Comment thread pandas/_libs/algos.pyx Outdated
Comment thread pandas/_libs/algos.pyx Outdated
@jbrockmendel
Copy link
Copy Markdown
Member

Handful of small things, nothing big

@jbrockmendel jbrockmendel merged commit b935209 into pandas-dev:main Mar 13, 2026
77 of 79 checks passed
@jbrockmendel
Copy link
Copy Markdown
Member

Thanks @Alvaro-Kothe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Reduction Operations sum, mean, min, max, etc.

Projects

None yet

4 participants