perf: internal integer indexing by selmanozleyen · Pull Request #209 · scverse/annbatch

selmanozleyen · 2026-05-18T08:16:30Z

This PR uses row indices for fetching, instead of slices, wherever possible.

If we adopt zarr's native integer indexing in the future, the transition will be easier. Also, according to @ilan-gold, it has no performance impact for bigger chunk sizes and gives a 20% boost for integer indexing.
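
As a rough illustration of what "row indices instead of slices" means, here is a minimal sketch that expands a list of half-open, step-1 slices into one flat integer index array. The helper name `slices_to_rows` is hypothetical and is not taken from annbatch; the real loader code may build its indices differently.

```python
import numpy as np


def slices_to_rows(slices: list[slice]) -> np.ndarray:
    """Expand half-open, step-1 slices into one flat array of row indices.

    Hypothetical helper for illustration only; not part of annbatch.
    """
    if not slices:
        return np.empty(0, dtype=np.int64)
    return np.concatenate(
        [np.arange(s.start, s.stop, dtype=np.int64) for s in slices]
    )


rows = slices_to_rows([slice(0, 3), slice(10, 12)])
print(rows.tolist())
```

The resulting array can be passed to any indexer that accepts integer arrays, which is what would make a later switch to native integer indexing a smaller change.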

…ay differ (scverse#207)

* fix: indexing indices and data separately because their chunk grids may differ
* chore: relnote
* chore: move to hatch-vcs
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* version type
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

ilan-gold · 2026-05-18T13:58:43Z

+            b_end = b_start + shape[0]
+            mask = (global_index >= b_start) & (global_index < b_end)
+            if mask.any():
+                offset = 0 if use_original_space else b_start


Is use_original_space used anywhere anymore? I am fine with getting rid of its use (in generating the indices), but then let's remove the arg, because it's dead code.

true, removed it

ilan-gold

Let's remove that arg!
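
For readers following the hunk above, here is a minimal sketch of the masking step, under the assumption that the behaviour kept after removing the argument is the `offset = b_start` branch (the thread does not state which branch remains). The function name `split_global_rows` and its parameters are hypothetical.

```python
import numpy as np


def split_global_rows(
    global_index: np.ndarray, n_rows_per_dataset: list[int]
) -> dict[int, np.ndarray]:
    """Map global row indices to per-dataset local row indices.

    Hypothetical sketch; assumes datasets are concatenated in list order
    and that local indices are obtained by subtracting the dataset start.
    """
    lookup: dict[int, np.ndarray] = {}
    b_start = 0
    for dataset_idx, n_rows in enumerate(n_rows_per_dataset):
        b_end = b_start + n_rows
        # Select the requested rows that fall inside this dataset.
        mask = (global_index >= b_start) & (global_index < b_end)
        if mask.any():
            # Request order is preserved; nothing is sorted here.
            lookup[dataset_idx] = global_index[mask] - b_start
        b_start = b_end
    return lookup


lookup = split_global_rows(np.array([7, 0, 12, 3, 9]), [5, 5, 5])
print({k: v.tolist() for k, v in lookup.items()})
```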

codecov · 2026-05-18T16:25:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.33%. Comparing base (254ad41) to head (d6155ba).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #209      +/-   ##
==========================================
+ Coverage   93.18%   93.33%   +0.15%     
==========================================
  Files          14       14              
  Lines        1130     1111      -19     
==========================================
- Hits         1053     1037      -16     
+ Misses         77       74       -3

Files with missing lines	Coverage Δ
src/annbatch/loader.py	`93.70% <100.00%> (+0.56%)`	⬆️

selmanozleyen · 2026-05-19T11:37:02Z

-            dataset_index_to_slices
-                A lookup of the list-placement index of a dataset to the request slices.
+            dataset_index_to_rows
+                A lookup of the list-placement index of a dataset to the sorted row indices to fetch.


A lookup of the list-placement index of a dataset to the sorted row indices to fetch.

@ilan-gold this isn't correct, right? Are the row indices actually sorted?

Don't think anything is sorted. Let me check

Yeah, I mean you can double check, but I don't think we have any sorting enforcement. That's why constructing the batches to yield is completely the sampler's responsibility and relies on the sampler knowing its own fetch order.

selmanozleyen · 2026-05-19T12:28:34Z

I can't access the cluster right now, but my worry was creating integers from slices and then creating slices again from those integers. I just want to point out that when the number of slices to loop over is small (which I thought was the goal of having slices), main should win, right?

So I ran this local isolated benchmark: https://gist.github.com/selmanozleyen/199b73288ce23cd3928e0b5f00425924

The cutoff chunk size above which main wins is shown below:

preload_nchunks  cutoff
32               32-64
64               64-128
128              64-128
256              64-128
512              64-128
32768            64-128

When the number of datasets is 5, I found these cutoffs to be higher, around 64-128. So main should win when preload_nchunks is low and the chunk size is high. The more datasets there are, the harder it gets for main to win.

But I'd consider this a branch win in general.
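
The round trip described above (integers back to slices) can be sketched as collapsing an integer index array into contiguous runs. This is only an illustration: `rows_to_runs` is a hypothetical name, and it is not the implementation used in the linked gist or in annbatch.

```python
import numpy as np


def rows_to_runs(rows: np.ndarray) -> list[slice]:
    """Collapse row indices into half-open slices of consecutive rows.

    Hypothetical helper; input order is kept, so unsorted input simply
    produces more (shorter) runs.
    """
    if rows.size == 0:
        return []
    # A new run starts wherever the next index is not previous + 1.
    breaks = np.flatnonzero(np.diff(rows) != 1) + 1
    starts = np.concatenate(([0], breaks))
    stops = np.concatenate((breaks, [rows.size]))
    return [
        slice(int(rows[a]), int(rows[b - 1]) + 1)
        for a, b in zip(starts, stops)
    ]


runs = rows_to_runs(np.array([0, 1, 2, 10, 11, 5]))
print(runs)
```

With large chunks and few requested chunks, the number of runs is small, which is the regime where looping over slices directly would be expected to be cheapest.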

ilan-gold

I want to just point out when the number of slices to loop over is small (which I thought was the goal for having slices) main should win

Maybe, but running locally, it seems that once you get to bigger chunks, the time spent on a small number of slices is 1/5th-1/10th regardless of the method. So it isn't really a bottleneck, because the absolute time is much smaller either way. For example, yes, the case below is 3x slower on this branch:

❯ python tester.py --n-rows=100_000_000 --n-datasets=1 --preload-nchunks=32 --chunk-sizes=512 
n_rows=100000000 n_datasets=1 preload_nchunks=32 mean_nnz=20.0
chunk_size      rows/request    main_us branch_us       branch/main
512     16384   80.7    264.7   3.28

but compared to the amount of time the chunk_size=1 case takes in absolute numbers

❯ python tester.py --n-rows=20_000_000 --n-datasets=1 --preload-nchunks=4096 --chunk-sizes=1 
n_rows=20000000 n_datasets=1 preload_nchunks=4096 mean_nnz=20.0
chunk_size      rows/request    main_us branch_us       branch/main
1       4096    8516.1  5411.5  0.64

it's irrelevant. Also, the row_runs bit in your benchmark can happen somewhat in parallel with fetching because our _async_array._get_selection releases the GIL, but I'm not sure how much impact that has.

Are you seeing this as well?
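
To make the relative-versus-absolute point concrete, the two runs quoted above can be compared with a few lines of arithmetic. The numbers are copied from the terminal output in this comment; the dictionary labels and variable names are only illustrative.

```python
# (main_us, branch_us) taken from the two benchmark runs quoted above.
results = {
    "chunk_size=512, preload_nchunks=32": (80.7, 264.7),
    "chunk_size=1, preload_nchunks=4096": (8516.1, 5411.5),
}

summary = {}
for label, (main_us, branch_us) in results.items():
    ratio = branch_us / main_us      # relative cost of the branch
    delta_us = branch_us - main_us   # absolute change in microseconds
    summary[label] = (ratio, delta_us)
    print(f"{label}: branch/main={ratio:.2f}, branch-main={delta_us:+.1f} us")
```

The branch costs roughly 184 µs more in the large-chunk case but saves roughly 3105 µs in the chunk_size=1 case, which is why the 3x relative slowdown matters little in absolute terms.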

selmanozleyen · 2026-05-19T15:48:29Z

That's very convincing! Thanks! And it's kind of mind-blowing.

ilan-gold and others added 10 commits May 6, 2026 14:26

perf: use integer indexes internall as much as possible

9c75b92

fix: more _dataset_intervals removal

e7053a8

fix: no sort

8501235

man this ai stuff really stinks

fca804d

fix: indexing indices and data separately because their chunk grids m…

cb00eaf

…ay differ

Merge branch 'ig/fix_indexer' into ig/internal_integer_indexing

47465aa

fix: ok we still need those contiguous runs

c84957c

Merge branch 'main' into perf/internal-integer-indexing

cd92534

dont call _slices_to_slices_with_array_index twice

0d520d3

selmanozleyen changed the title ~~Perf: Internal integer indexing~~ perf: internal integer indexing May 18, 2026

ilan-gold reviewed May 18, 2026

Merge branch 'main' into perf/internal-integer-indexing

99a3643

ilan-gold added the run-gpu-ci Signal that gpu ci should be run label May 18, 2026

selmanozleyen and others added 3 commits May 19, 2026 13:18

Merge branch 'main' into perf/internal-integer-indexing

4a8d1d7

remove that arg

65fc1e3

rename from slices to rows

49a8075

selmanozleyen commented May 19, 2026

selmanozleyen marked this pull request as ready for review May 19, 2026 12:29

selmanozleyen requested a review from ilan-gold May 19, 2026 12:30

ilan-gold reviewed May 19, 2026

Comment thread src/annbatch/loader.py Outdated

selmanozleyen added 2 commits May 19, 2026 22:02

nnnz to nnz

214ede3

running_nnz = dest[-1] works

d6155ba

ilan-gold approved these changes May 20, 2026

ilan-gold merged commit 91a6c3e into scverse:main May 20, 2026
15 checks passed

ilan-gold merged 16 commits into scverse:main from selmanozleyen:perf/internal-integer-indexing
