perf(reader): Implement AsyncFileReader get_byte_ranges and coalesce close ranges #2181

Open
mbutrovich wants to merge 4 commits into apache:main from mbutrovich:get_byte_ranges

@mbutrovich (Collaborator) commented Feb 26, 2026

Which issue does this PR close?

What changes are included in this PR?

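  • Implement `AsyncFileReader::get_byte_ranges` and coalesce close byte ranges (`merge_ranges`), adapting the range-coalescing logic from `object_store`.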
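The coalescing idea can be illustrated with a minimal, std-only sketch. This is not the PR's code: `merge_ranges` and `get_byte_ranges` below are hypothetical synchronous stand-ins, the `coalesce` gap threshold is an illustrative parameter, and `fetch` is an assumed closure standing in for one ranged GET. The sketch sorts the requested ranges, folds together any that overlap or lie within `coalesce` bytes of each other, fetches each merged range once, and slices every original request back out in the caller's order. The real `AsyncFileReader::get_byte_ranges` in the `parquet` crate is async and returns `Bytes` buffers.

```rust
use std::ops::Range;

/// Hypothetical sketch: sort the requested ranges and merge any range whose
/// start is within `coalesce` bytes of the previous merged range's end
/// (overlapping ranges always merge).
fn merge_ranges(ranges: &[Range<u64>], coalesce: u64) -> Vec<Range<u64>> {
    let mut sorted = ranges.to_vec();
    sorted.sort_unstable_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::with_capacity(sorted.len());
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            // Small gap (or overlap): extend the current merged range
            // instead of starting a new request.
            if r.start <= last.end.saturating_add(coalesce) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}

/// Hypothetical synchronous `get_byte_ranges`-style helper: calls `fetch`
/// once per merged range, then returns the bytes for each requested range
/// in the caller's original order. Assumes `fetch(r)` returns exactly
/// `r.end - r.start` bytes.
fn get_byte_ranges<F>(ranges: &[Range<u64>], coalesce: u64, mut fetch: F) -> Vec<Vec<u8>>
where
    F: FnMut(Range<u64>) -> Vec<u8>,
{
    let merged = merge_ranges(ranges, coalesce);
    let fetched: Vec<Vec<u8>> = merged.iter().cloned().map(|r| fetch(r)).collect();

    ranges
        .iter()
        .map(|r| {
            // Merged ranges are sorted and disjoint, so the one containing `r`
            // is the last one that starts at or before `r.start`.
            let idx = merged.partition_point(|m| m.start <= r.start) - 1;
            let base = merged[idx].start;
            let (lo, hi) = ((r.start - base) as usize, (r.end - base) as usize);
            fetched[idx][lo..hi].to_vec()
        })
        .collect()
}
```

For example, requesting `[20..30, 0..10, 12..15, 100..110]` with `coalesce = 5` merges into `[0..30, 100..110]`, so four requested ranges cost two fetches. A larger threshold trades fewer requests for extra bytes read from the gaps between ranges.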

Are these changes tested?

Existing tests, plus some new ones to sanity-check `merge_ranges`. I also ran the full Iceberg Java test suite via Comet. Benchmarks below.

@mbutrovich changed the title from "perf(reader): Add get_byte_ranges and merge_ranges." to "perf(reader): Add get_byte_ranges and merge_ranges" Feb 26, 2026
@mbutrovich (Collaborator, Author)

I ran `SELECT SUM(l_extendedprice * l_discount)` with Comet on an SF100 TPC-H `lineitem` table accessed via S3 (essentially just the scan from TPC-H query 6). Here's the breakdown:

| Metric | main (uses `get_bytes`) | PR #2181 (uses `get_byte_ranges`) | Change |
|---|---|---|---|
| Total GETs | 773 | 500 | 35% fewer |
| Total bytes | 3,535 MB | 3,535 MB | same |

| GET size percentile | main (uses `get_bytes`) | PR #2181 (uses `get_byte_ranges`) | Change |
|---|---|---|---|
| P50 (median) | 1.5 MB | 3.6 MB | 2.4x larger |
| P75 | 3.3 MB | 17.0 MB | 5.2x larger |
[Figure: histogram]
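The derived columns in the tables can be cross-checked from the raw numbers with a few lines of arithmetic (the function names here are illustrative). The mean GET size is not reported in the benchmark; it is simply total bytes divided by GET count, roughly 4.6 MB on main versus roughly 7.1 MB with this PR.

```rust
/// Percentage reduction from `before` to `after` (e.g. GET counts).
fn percent_fewer(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// How many times larger `after` is than `before` (e.g. a size percentile).
fn ratio(after: f64, before: f64) -> f64 {
    after / before
}

/// Mean size per GET, in the same unit as `total` (MB here).
fn mean_get_size(total: f64, gets: f64) -> f64 {
    total / gets
}
```

`percent_fewer(773.0, 500.0)` is about 35.3, and `ratio(17.0, 3.3)` is about 5.15, which rounds to the table's 5.2x.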

@mbutrovich marked this pull request as ready for review February 26, 2026 20:54
@mbutrovich self-assigned this Feb 26, 2026
@mbutrovich changed the title from "perf(reader): Add get_byte_ranges and merge_ranges" to "perf(reader): Implement AsyncFileReader get_byte_ranges and coalesce close ranges" Feb 26, 2026

Labels: None yet

Projects: None yet

Participants: 1