Enhance ivset indexing#567
Conversation
…ttempt to preserve metadata. This will prevent metadata to be dropped when indexing is unsorted list or a reverse slice. Did not any test, don't seem necessary, but could add some if needed. Still need to do: Did not make any modification to indexing with tuple, but planed to. The logic seems a little different there. Need to think about it.
Although this passed all test as an elif block following the condition for given column index. Since I am writing a slightly different behavior (not returning but modify the key (sort or reverse slice) so in the final else block the metadata will not be dropped due to unsorted input, maybe it is better that this is a separate block from above column index. Shouldn't make an actual difference since the column indexing will do early return.
There was a problem hiding this comment.
Pull request overview
This PR aims to address #560 by preserving IntervalSet metadata longer during row-style indexing/slicing that would otherwise produce an unsorted start/end order and trigger metadata dropping in IntervalSet.__init__.
Changes:
- Normalize unsorted integer-list row indexing by sorting indices (with a warning) before slicing.
- Attempt to normalize descending slices (negative
step) into ascending slices (with a warning) before slicing. - Keep existing behavior for metadata-column selection via
self[[*str]].
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| # check if list is sorted (ascending) | ||
| if not all(x < y for x, y in zip(key, key[1:])): |
There was a problem hiding this comment.
The “already sorted” check uses a strict comparison (x < y), so keys with repeated indices like [0, 0, 1] will be treated as unsorted and will emit a warning even though sorting doesn’t change the order. If repeated indices are meant to be allowed, use a non-decreasing check (<=) to avoid spurious warnings.
| # check if list is sorted (ascending) | |
| if not all(x < y for x, y in zip(key, key[1:])): | |
| # check if list is sorted (non-decreasing) | |
| if not all(x <= y for x, y in zip(key, key[1:])): |
| warnings.warn("Recieved unsorted index, this is sorted to preserve the invariant that " | ||
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | ||
| "indexing semantics as index order is not preserved.", | ||
| UserWarning) |
There was a problem hiding this comment.
These new warnings.warn(...) calls don’t set stacklevel, so they will point users at interval_set.py internals rather than their own indexing call site. Elsewhere in this file user-facing warnings typically use stacklevel=2; consider doing the same here for consistency.
| # A separate if-elif block to reorder key if given row style index of type | ||
| # list[int] or slice that is not in ascending order. | ||
| if isinstance(key, list) and all(isinstance(x, int) for x in key): |
There was a problem hiding this comment.
PR description mentions potentially adjusting .loc[tuple] behavior for a tuple of (row_index, column_keys), but this change only normalizes row-style list/slice keys in IntervalSet.__getitem__ and doesn’t affect IntervalSet.loc (implemented via _IntervalSetSliceHelper). If .loc[(rows, [cols...])] support is intended to be part of this PR, it looks missing here.
| # A separate if-elif block to reorder key if given row style index of type | ||
| # list[int] or slice that is not in ascending order. | ||
| if isinstance(key, list) and all(isinstance(x, int) for x in key): | ||
| # check if list is sorted (ascending) | ||
| if not all(x < y for x, y in zip(key, key[1:])): | ||
| key = sorted(key) | ||
| warnings.warn("Recieved unsorted index, this is sorted to preserve the invariant that " | ||
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | ||
| "indexing semantics as index order is not preserved.", | ||
| UserWarning) | ||
|
|
||
| elif isinstance(key,slice): | ||
| if key.step is None: | ||
| pass # prevent None step to go into next condition. No action needed | ||
| elif key.step < 0: | ||
| warnings.warn("Recieved descending slice, this is reversed to preserve the invariant that " | ||
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | ||
| "indexing semantics as index order is not preserved.", | ||
| UserWarning) | ||
| key = slice(key.stop + 1, key.start + 1, -key.step) |
There was a problem hiding this comment.
New behavior is introduced here (sorting unsorted integer lists / normalizing descending slices to preserve metadata), but there are existing IntervalSet indexing tests in tests/test_interval_set.py and none appear to cover these new cases. Please add tests asserting that metadata is preserved for unsorted integer lists and descending slices, and that the appropriate warning(s) are emitted.
| warnings.warn("Recieved unsorted index, this is sorted to preserve the invariant that " | ||
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | ||
| "indexing semantics as index order is not preserved.", | ||
| UserWarning) |
There was a problem hiding this comment.
The warning message has a spelling error: "Recieved" should be "Received" (also appears in the descending-slice warning below).
| elif isinstance(key,slice): | ||
| if key.step is None: | ||
| pass # prevent None step to go into next condition. No action needed | ||
| elif key.step < 0: | ||
| warnings.warn("Recieved descending slice, this is reversed to preserve the invariant that " | ||
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | ||
| "indexing semantics as index order is not preserved.", | ||
| UserWarning) | ||
| key = slice(key.stop + 1, key.start + 1, -key.step) |
There was a problem hiding this comment.
Rewriting a descending slice via slice(key.stop + 1, key.start + 1, -key.step) will raise a TypeError when key.stop is None (e.g. ep[5::-1] or ep[::-1]), because None + 1 is invalid. Handle None start/stop explicitly before doing arithmetic.
| elif isinstance(key,slice): | |
| if key.step is None: | |
| pass # prevent None step to go into next condition. No action needed | |
| elif key.step < 0: | |
| warnings.warn("Recieved descending slice, this is reversed to preserve the invariant that " | |
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | |
| "indexing semantics as index order is not preserved.", | |
| UserWarning) | |
| key = slice(key.stop + 1, key.start + 1, -key.step) | |
| elif isinstance(key, slice): | |
| step = key.step | |
| if step is None or step > 0: | |
| # No reordering needed for ascending slices or when step is omitted. | |
| pass | |
| else: | |
| warnings.warn("Recieved descending slice, this is reversed to preserve the invariant that " | |
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | |
| "indexing semantics as index order is not preserved.", | |
| UserWarning) | |
| # Normalize slice bounds to concrete integers before rewriting. | |
| start, stop, step = key.indices(len(self)) | |
| # At this point, step is guaranteed to be negative. | |
| key = slice(stop + 1, start + 1, -step) |
| elif isinstance(key,slice): | ||
| if key.step is None: | ||
| pass # prevent None step to go into next condition. No action needed | ||
| elif key.step < 0: | ||
| warnings.warn("Recieved descending slice, this is reversed to preserve the invariant that " | ||
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | ||
| "indexing semantics as index order is not preserved.", | ||
| UserWarning) | ||
| key = slice(key.stop + 1, key.start + 1, -key.step) |
There was a problem hiding this comment.
The slice “reversal” logic only preserves the same index set for step == -1. For steps like -2, converting to slice(stop+1, start+1, 2) produces a different set of rows than the original slice, changing the result. Consider materializing the indices (e.g., via np.arange(len(self))[key]), sorting them, and using that for indexing, or restrict the normalization to step == -1 and otherwise fall back to index materialization.
| elif isinstance(key,slice): | |
| if key.step is None: | |
| pass # prevent None step to go into next condition. No action needed | |
| elif key.step < 0: | |
| warnings.warn("Recieved descending slice, this is reversed to preserve the invariant that " | |
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " \ | |
| "indexing semantics as index order is not preserved.", | |
| UserWarning) | |
| key = slice(key.stop + 1, key.start + 1, -key.step) | |
| elif isinstance(key, slice): | |
| if key.step is None or key.step > 0: | |
| # ascending slice or default step; nothing to normalize | |
| pass | |
| elif key.step < 0: | |
| warnings.warn( | |
| "Recieved descending slice, this is reordered to preserve the invariant that " | |
| "nap.IntervalSet remains ordered. This differs from standard NumPy/Pandas " | |
| "indexing semantics as index order is not preserved.", | |
| UserWarning, | |
| ) | |
| if key.step == -1: | |
| # For simple reversal, we can safely convert to an ascending slice | |
| key = slice(key.stop + 1, key.start + 1, 1) | |
| else: | |
| # For other negative steps, materialize indices, sort them, and use that | |
| idx = np.arange(len(self))[key] | |
| idx.sort() | |
| key = idx |
| if isinstance(key, list) and all(isinstance(x, int) for x in key): | ||
| # check if list is sorted (ascending) |
There was a problem hiding this comment.
all(isinstance(x, int) for x in key) misses common index lists containing NumPy integer scalars (e.g. list(np.array([2, 0])) yields np.int64 elements). In that case unsorted indices won’t be normalized and metadata will still be dropped. Consider checking numbers.Integral (or (int, np.integer)) instead of int only.
The previous way will not handle descending slice with step < -1 correctly. Now instead of modify the slice parameters I sort the recreated a sorted list of integers from the slice parameters using list comprehension. Now that I realized this.. Maybe I do need to add test...
Modified reorder of list keys to remove duplicate index. Also added test case to assert output returns the values intended by the slice (except being reversed). Added testing that metadata is properly handled when handling reverse index.
For some reason can't do tox and cannot pass lint... Lemme try reformatting myself.
As mentioned in #560 metadata columns are dropped when unsorted or descending row index are given. This modification should allow the metadata to be kept a little longer withou effecting other behaviors of the object (since sorting will be done anyway in the else block when calling the initializer)
I also mentioned also the behavior of loc seems a little off when a tuple of row index and column keys are given. Since the current logical statements would only allow tuple where the column key is a single string. I wonder if this behavior was the intial intention?
I don't really have good idea of dealing with list of column keys. There will be an error if both start/end and metadata columns are in the list of strings. Seems like in
__getitem__()for the IntervalSet class column index with list of string only allowed on metadata columns. Should.loc[tuple]be consistant with that?