ENH: add sort() method to StringArray and ArrowStringArray by ssam18 · Pull Request #65052 · pandas-dev/pandas

ssam18 · 2026-04-03T16:21:36Z

Fixes #64977
In pandas 3.x, Series.unique() returns a StringArray or ArrowStringArray instead of a numpy array, but neither class had the in-place .sort() method that numpy arrays provide. This broke code that called .sort() on the result of .unique().

The fix adds a sort() method to NDArrayBackedExtensionArray (which covers StringArray) and ArrowExtensionArray (which covers ArrowStringArray). For numpy-backed arrays, the underlying _ndarray is reordered in-place; for Arrow-backed arrays (which are immutable), the internal reference is swapped.
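The difference between the two strategies can be illustrated with a toy model. This is only a sketch: `MutableBacked` and `ImmutableBacked` are hypothetical stand-ins that use a list and a tuple in place of a numpy buffer and an Arrow buffer, and they are not the pandas classes touched by this PR.

```python
class MutableBacked:
    """Hypothetical stand-in for a numpy-backed array (mutable buffer)."""

    def __init__(self, values):
        self._data = list(values)

    def sort(self):
        # Reorder the existing buffer; the buffer object stays the same.
        self._data.sort()


class ImmutableBacked:
    """Hypothetical stand-in for an Arrow-backed array (immutable buffer)."""

    def __init__(self, values):
        self._data = tuple(values)

    def sort(self):
        # The buffer cannot be mutated, so build a sorted copy and
        # rebind the internal reference to it.
        self._data = tuple(sorted(self._data))


mutable = MutableBacked(["b", "c", "a"])
buffer_before = mutable._data
mutable.sort()
same_buffer = mutable._data is buffer_before  # reordered in place

immutable = ImmutableBacked(["b", "c", "a"])
old_buffer = immutable._data
immutable.sort()
swapped = immutable._data is not old_buffer  # a new buffer replaced the old one
```

In both cases the caller sees the same object sorted after the call, which is the behaviour numpy's `ndarray.sort()` offers; only the mechanism behind it differs.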

Tests cover ascending and descending sort, NA placement with na_position='first'/'last', and the exact repro from the issue.
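As a rough illustration of the behaviour those tests describe, the sketch below models NA placement in plain Python. The helper name `sorted_with_na` and the `ascending` parameter name are assumptions made for this example, and `None` stands in for the dtype's NA value; only `na_position='first'/'last'` comes from the PR description.

```python
def sorted_with_na(values, ascending=True, na_position="last"):
    """Return a sorted list with missing values grouped at one end.

    Hypothetical helper: None plays the role of the NA value.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    # Sort only the non-missing values; direction is set by `ascending`.
    present = sorted((v for v in values if v is not None), reverse=not ascending)
    # Missing values are placed by `na_position`, independent of direction.
    missing = [None] * sum(v is None for v in values)
    return missing + present if na_position == "first" else present + missing


ascending_last = sorted_with_na(["b", None, "a"])
descending_first = sorted_with_na(["b", None, "a"], ascending=False, na_position="first")
```

The sketch returns a new list rather than sorting in place, which keeps the NA-placement rule separate from the storage question.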

jbrockmendel · 2026-04-04T02:02:24Z

is the idea to add it for just these arrays or all EAs? I think @rhshadrach would want all EAs

ssam18 · 2026-04-04T02:43:13Z

@jbrockmendel Thank you for the clarification. I have moved the implementation to the base ExtensionArray class so all EAs that support setitem get sort() for free. The numpy-backed and Arrow-backed overrides are kept for efficiency.

github-actions · 2026-05-05T00:24:58Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

ssam18 · 2026-05-05T13:18:41Z

can someone review this PR?

rhshadrach

Looks good - just some requests on testing style. Also needs a whatsnew in v3.1.0.rst under "Other Enhancements"

In pandas 3.x, Series.unique() started returning StringArray/ArrowStringArray instead of numpy.ndarray. Since numpy arrays have an in-place sort() method but extension arrays didn't, code that called .sort() on unique() results broke. Added sort() to NDArrayBackedExtensionArray and ArrowExtensionArray, and tests covering ascending, descending, NA placement, and the original repro.

1. Add whatsnew entry under Other enhancements in v3.1.0.rst
2. Add GH issue reference comments to new sort tests
3. Simplify NA handling in test_sort_with_na to use dtype.na_value directly
4. Parametrize test_sort_with_na on na_position to remove duplicated body

rhshadrach

lgtm

jbrockmendel · 2026-05-07T21:04:39Z

How necessary are the subclass overrides? For NDArrayBacked in particular I don't expect it to make much difference

jbrockmendel · 2026-05-07T21:15:30Z

A few questions, no objections.

Parth-Dholariya · 2026-05-14T12:59:03Z

Hi, I noticed the discussion about moving the sort() tests to the extension array test suite. I’m new to contributing to pandas, but I’d be happy to help look into the existing extension test structure and check whether a common ExtensionArray.sort test can be added.

Please let me know if this would be useful, and I can start exploring it.

Per review feedback on GH#65052, hoist the in-place sort tests from the StringArray-specific test file into BaseMethodsTests so every ExtensionArray subclass exercises the inherited sort() method through data_for_sorting / data_missing_for_sorting fixtures.

SparseArray.__setitem__ raises TypeError, so the base ExtensionArray.sort() implementation (which does self[:] = self.take(...)) failed with a confusing internal error. Override sort() to raise NotImplementedError cleanly and assert that behavior in the extension test suite. Surfaced by the new shared sort tests in BaseMethodsTests.
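A minimal sketch of the pattern this commit message describes, using toy classes rather than pandas code. `ToyArray` and `ReadOnlyToyArray` are hypothetical names; the generic `sort()` follows the `self[:] = self.take(...)` shape quoted above, and the subclass shows how an array without item assignment can fail with a clear error instead of an internal one.

```python
class ToyArray:
    """Hypothetical array with a generic in-place sort built on __setitem__."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __setitem__(self, key, value):
        # Toy implementation: only slice keys are exercised here.
        self._values[key] = list(value)

    def argsort(self):
        # Positions that would sort the values in ascending order.
        return sorted(range(len(self._values)), key=self._values.__getitem__)

    def take(self, indices):
        return [self._values[i] for i in indices]

    def sort(self):
        # Generic path: compute the sorted values, then write them back
        # through __setitem__ so the same object ends up sorted.
        self[:] = self.take(self.argsort())


class ReadOnlyToyArray(ToyArray):
    """Hypothetical array whose __setitem__ is unsupported."""

    def __setitem__(self, key, value):
        raise TypeError("ReadOnlyToyArray does not support item assignment")

    def sort(self):
        # Fail up front with a clear message instead of letting the
        # TypeError from __setitem__ surface from inside the base method.
        raise NotImplementedError("in-place sort is not supported for ReadOnlyToyArray")


arr = ToyArray([3, 1, 2])
arr.sort()

try:
    ReadOnlyToyArray([3, 1, 2]).sort()
    message = None
except NotImplementedError as exc:
    message = str(exc)
```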

JSONArray.__setitem__ in the extension test helper didn't iterate over slice keys, so any self[:] = ... path (including the inherited ExtensionArray.sort()) blew up with "slice object is not iterable". Expand slice keys to a range before dispatching. Three pre-existing xfail markers on test_setitem_slice* now XPASS, so remove or scope them accordingly.
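The slice handling can be sketched as follows. `ToyJSONArray` is a hypothetical, simplified stand-in for the test helper, not its actual code; the point is only that a slice key is expanded to the positions it selects before the per-item assignment loop runs.

```python
import numbers


class ToyJSONArray:
    """Hypothetical list-of-dicts array with slice-aware __setitem__."""

    def __init__(self, values):
        self.data = list(values)

    def __len__(self):
        return len(self.data)

    def __setitem__(self, key, value):
        if isinstance(key, numbers.Integral):
            self.data[key] = value
            return
        if isinstance(key, slice):
            # A slice object is not iterable; expand it to a range of
            # concrete positions for the current length.
            key = range(*key.indices(len(self.data)))
        for position, item in zip(key, value):
            self.data[position] = item


arr = ToyJSONArray([{"b": 2}, {"a": 1}, {"c": 3}])
arr[:] = [{"a": 1}, {"b": 2}, {"c": 3}]  # full-slice assignment, as used by sort()
arr[1:] = [{"x": 0}, {"y": 9}]           # partial slice
```

`slice.indices(length)` resolves open-ended and negative bounds, so the same branch covers `arr[:]`, `arr[1:]`, and stepped slices.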

jbrockmendel · 2026-06-01T19:19:49Z

Do we need to add sort in doc/source/reference/extensions.rst?

Simplified test_sort_unique_result to construct the StringArray directly rather than going through a DataFrame, since the bug is just about unique followed by sort. Also listed ExtensionArray.sort in the extensions reference page next to argsort so the new public method is documented.

github-actions Bot added the Stale label May 5, 2026

rhshadrach requested changes May 5, 2026

Comment thread pandas/core/arrays/base.py

Comment thread pandas/tests/arrays/string_/test_string.py Outdated

Comment thread pandas/tests/arrays/string_/test_string.py Outdated

Comment thread pandas/tests/arrays/string_/test_string.py Outdated

rhshadrach changed the title ~~BUG: add sort() method to StringArray and ArrowStringArray~~ ENH: add sort() method to StringArray and ArrowStringArray May 5, 2026

rhshadrach added Enhancement and removed Stale labels May 5, 2026

ssam18 added 5 commits May 5, 2026 19:36

CLN: fix line length in sort() methods

6fe3a9e

DOC: fix doctest dtype strings in sort() methods

07e3281

BUG: add sort() to base ExtensionArray; keep efficient overrides in s…

6f0b324

…ubclasses

ssam18 force-pushed the BUG-GH64977-sort-method-for-StringArray branch from 2ebe0fd to 0a477cf Compare May 6, 2026 00:41

Merge branch 'main' into BUG-GH64977-sort-method-for-StringArray

351a15d

rhshadrach approved these changes May 7, 2026

rhshadrach requested a review from jbrockmendel May 7, 2026 20:30

rhshadrach added the ExtensionArray and Sorting labels May 7, 2026

rhshadrach added this to the 3.1 milestone May 7, 2026

jbrockmendel reviewed May 7, 2026

Comment thread pandas/tests/arrays/string_/test_string.py Outdated

ssam18 added 4 commits May 15, 2026 15:31

Merge remote-tracking branch 'origin/main' into BUG-GH64977-sort-meth…

6bcc1a1

…od-for-StringArray

jbrockmendel reviewed Jun 1, 2026

Comment thread pandas/tests/arrays/string_/test_string.py Outdated
