perf: internal integer indexing#209
Conversation
…ay differ (scverse#207) * fix: indexing indices and data separately because their chunk grids may differ * chore: relnote * chore: move to hatch-vcs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * version type * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
| b_end = b_start + shape[0] | ||
| mask = (global_index >= b_start) & (global_index < b_end) | ||
| if mask.any(): | ||
| offset = 0 if use_original_space else b_start |
There was a problem hiding this comment.
Is use_original_space used anywhere anymore? I am fine getting rid its use (in generating the indices), but then lets remove the arg because it's dead code
ilan-gold
left a comment
There was a problem hiding this comment.
Let's remove that arg!
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #209 +/- ##
==========================================
+ Coverage 93.18% 93.33% +0.15%
==========================================
Files 14 14
Lines 1130 1111 -19
==========================================
- Hits 1053 1037 -16
+ Misses 77 74 -3
🚀 New features to boost your workflow:
|
| dataset_index_to_slices | ||
| A lookup of the list-placement index of a dataset to the request slices. | ||
| dataset_index_to_rows | ||
| A lookup of the list-placement index of a dataset to the sorted row indices to fetch. |
There was a problem hiding this comment.
A lookup of the list-placement index of a dataset to the sorted row indices to fetch.
@ilan-gold this isn't correct right? Is it sorted row indices?
There was a problem hiding this comment.
Don't think anything is sorted. Let me check
There was a problem hiding this comment.
YEah, I mean you can double check, but I don't think we have any sorting enforcement. That's why constructing the batches to yield is completely the sampler's responsibility and relies on the sampler knowing its own fetch order
|
I can't access the cluster right now but my worry was creating integers from slices to then creating slices again from those integers. I want to just point out when the number of slices to loop over is small (which I thought was the goal for having slices) So I ran this local isolated benchmark: https://gist.github.com/selmanozleyen/199b73288ce23cd3928e0b5f00425924 The cutoff chunksize for main to win is like below: When the number of datasets is 5, I found these cutoffs to be higher around 64-128. So the main should win when preload_nchunks is low and chunksize is high. Whenever the dataset number is higher main win gets harder. But I'd consider this a branch win in general. |
ilan-gold
left a comment
There was a problem hiding this comment.
I want to just point out when the number of slices to loop over is small (which I thought was the goal for having slices) main should win
Maybe, but running locally, it seems that the amount of time it takes on small numbers of slices once you get bigger chunks is 1/5th-1/10th regardless of the method. So it isn't really a bottleneck because the absolute time is much smaller regardless. For example, yes, the below case is 3x slower on this branch
❯ python tester.py --n-rows=100_000_000 --n-datasets=1 --preload-nchunks=32 --chunk-sizes=512
n_rows=100000000 n_datasets=1 preload_nchunks=32 mean_nnz=20.0
chunk_size rows/request main_us branch_us branch/main
512 16384 80.7 264.7 3.28
but compared to the amount of time the chunk_size=1 case takes in absolute numbers
❯ python tester.py --n-rows=20_000_000 --n-datasets=1 --preload-nchunks=4096 --chunk-sizes=1
n_rows=20000000 n_datasets=1 preload_nchunks=4096 mean_nnz=20.0
chunk_size rows/request main_us branch_us branch/main
1 4096 8516.1 5411.5 0.64
it's irrelevant. Also, your benchmark with row_runs bit can happen in somewhat in parallel to fetching because our _async_array._get_selection releases the GIL, but not sure how much impact that has.
Are you seeing this as well?
|
Thats very convincing! Thanks! And its kind of mind blowing |
In this PR we use row indices for the fetching as much as possible instead of slices.
In the future if we bring zarrs native integer indexing the transition will be easier. Also according to @ilan-gold for bigger chunk sizes it has no performance impact and it gives a %20 boost for integer indexing.