Optimize buffer ops #8322
Merging this PR will improve performance by 46.77%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 20.2 µs | 34.5 µs | -41.34% |
| ❌ | Simulation | `chunked_varbinview_canonical_into[(100, 100)]` | 274.4 µs | 309.8 µs | -11.41% |
| ⚡ | Simulation | `slice_empty_vortex` | 2,599.4 ns | 368.3 ns | ×7.1 |
| ⚡ | Simulation | `append_buffer_vortex_buffer[65536]` | 95.4 µs | 27 µs | ×3.5 |
| ⚡ | Simulation | `append_buffer_vortex_buffer[16384]` | 32 µs | 12.9 µs | ×2.5 |
| ⚡ | Simulation | `append_buffer_vortex_buffer[128]` | 11.6 µs | 5.4 µs | ×2.2 |
| ⚡ | Simulation | `append_buffer_vortex_buffer[1024]` | 13.6 µs | 8.5 µs | +61.24% |
| ⚡ | Simulation | `slice_vortex_buffer[1024]` | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | `slice_vortex_buffer[16384]` | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | `slice_vortex_buffer[2048]` | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | `slice_vortex_buffer[128]` | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | `slice_vortex_buffer[65536]` | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | `append_buffer_vortex_buffer[2048]` | 11.4 µs | 7.9 µs | +45.37% |
| ⚡ | Simulation | `search_index_below_min_chunked` | 1.5 ms | 1.3 ms | +15.71% |
| ⚡ | Simulation | `search_index_mixed_out_of_range_chunked` | 1.5 ms | 1.3 ms | +15.27% |
| ⚡ | Simulation | `search_index_full_range_random_chunked` | 1.6 ms | 1.4 ms | +13.51% |
| ⚡ | Simulation | `compare[6]` | 79.3 µs | 70 µs | +13.31% |
| ⚡ | Simulation | `compare[6]` | 79 µs | 69.8 µs | +13.11% |
| ⚡ | Simulation | `compare[6]` | 80.7 µs | 71.4 µs | +12.94% |
| ⚡ | Simulation | `compare[5]` | 75.9 µs | 68.5 µs | +10.83% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing adamg/buffer-slice-fast (ae9bc12) with develop (3d7bbfb)
## Summary

Adds a basic benchmark for slicing, including an Arrow baseline. Hopefully building up to #8322, but I want a baseline first.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
fd451bf to 341039a
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.090x ➖
datafusion / vortex-file-compressed (1.090x ➖, 0↑ 4↓)
No file size changes detected.

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.112x ❌, 0↑ 6↓)
datafusion / vortex-compact (1.080x ➖, 0↑ 2↓)
datafusion / parquet (1.121x ❌, 0↑ 5↓)
duckdb / vortex-file-compressed (1.141x ❌, 0↑ 8↓)
duckdb / vortex-compact (1.083x ➖, 0↑ 4↓)
duckdb / parquet (1.102x ❌, 0↑ 3↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.062x ➖, 0↑ 2↓)
datafusion / vortex-compact (0.932x ➖, 10↑ 0↓)
datafusion / parquet (1.041x ➖, 1↑ 4↓)
datafusion / arrow (0.960x ➖, 4↑ 3↓)
duckdb / vortex-file-compressed (1.041x ➖, 0↑ 6↓)
duckdb / vortex-compact (1.000x ➖, 1↑ 0↓)
duckdb / parquet (0.990x ➖, 0↑ 0↓)
duckdb / duckdb (1.006x ➖, 0↑ 0↓)
File Size Changes (9 files changed, +0.2% overall, 9↑ 0↓)

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.991x ➖, 6↑ 1↓)
datafusion / vortex-compact (1.003x ➖, 0↑ 2↓)
datafusion / parquet (0.996x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 2↑ 1↓)
duckdb / vortex-compact (0.994x ➖, 0↑ 1↓)
duckdb / parquet (0.999x ➖, 1↑ 1↓)
duckdb / duckdb (0.997x ➖, 0↑ 3↓)
File Size Changes (7 files changed, +0.0% overall, 7↑ 0↓)

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.022x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.020x ➖, 0↑ 0↓)
duckdb / parquet (1.025x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)

Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.971x ➖, 1↑ 1↓)
datafusion / vortex-compact (1.014x ➖, 2↑ 1↓)
datafusion / parquet (0.920x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.853x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.968x ➖, 0↑ 1↓)
duckdb / parquet (0.972x ➖, 0↑ 0↓)

BENCHMARK FAILED

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.993x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.006x ➖, 0↑ 0↓)
datafusion / parquet (0.997x ➖, 0↑ 0↓)
datafusion / arrow (0.926x ➖, 7↑ 0↓)
duckdb / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
duckdb / duckdb (0.998x ➖, 0↑ 0↓)
File Size Changes (26 files changed, -0.0% overall, 8↑ 18↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.969x ➖, 13↑ 2↓)
datafusion / parquet (0.913x ➖, 13↑ 0↓)
duckdb / vortex-file-compressed (1.011x ➖, 4↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
duckdb / duckdb (1.013x ➖, 0↑ 1↓)
File Size Changes (107 files changed, -0.0% overall, 56↑ 51↓)

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.873x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.867x ➖, 5↑ 0↓)
datafusion / parquet (1.267x ➖, 1↑ 13↓)
duckdb / vortex-file-compressed (0.875x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.933x ➖, 0↑ 0↓)
duckdb / parquet (0.917x ➖, 0↑ 0↓)

Benchmarks: Appian on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.008x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.031x ➖, 0↑ 0↓)
duckdb / parquet (1.006x ➖, 0↑ 0↓)
duckdb / duckdb (1.007x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 1↑ 3↓)

Benchmarks: Compression
Vortex (geomean): 1.016x ➖
unknown / unknown (1.045x ➖, 2↑ 29↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.870x ➖, 4↑ 1↓)
datafusion / vortex-compact (0.867x ➖, 3↑ 0↓)
datafusion / parquet (1.044x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (0.929x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.065x ➖, 0↑ 2↓)
duckdb / parquet (0.921x ➖, 0↑ 0↓)

I made #8162 that optimizes some of these code paths.
06ac2f8 to dde48e4

@robert3005 do you want to merge that first?

That would be ideal; the PR I made is a revival of an older PR already.

I'll review it.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
dde48e4 to ae9bc12
## Summary

This PR includes a few optimizations for buffer-level ops:

- `BitBufferMut::append_buffer` uses Arrow's word-sized append for unaligned bit buffers instead of `bitvec`, which appends one bit at a time.
- […] `Alignment`, instead of having less specific checks in different call sites.

After this PR is merged, I'll follow up and remove `bitvec` as a dependency. It's currently used in a couple of pretty random places, and I suspect there's nothing special about them compared to our own `BitBuffer`.
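
To illustrate why a word-sized append helps for unaligned bit buffers, here is a minimal sketch that compares the two strategies. It assumes LSB-first packing into 64-bit words. The `Bits` type and all of its methods are hypothetical names made up for this example; this is not Vortex's `BitBufferMut`, Arrow's implementation, or the `bitvec` API.

```rust
/// Minimal growable bit buffer, LSB-first within each 64-bit word.
/// Hypothetical type for illustration only.
#[derive(Default, Clone, PartialEq, Debug)]
struct Bits {
    words: Vec<u64>,
    len: usize, // length in bits; bits past `len` are always zero
}

impl Bits {
    fn get(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Append a single bit.
    fn push(&mut self, bit: bool) {
        if self.len % 64 == 0 {
            self.words.push(0);
        }
        if bit {
            self.words[self.len / 64] |= 1 << (self.len % 64);
        }
        self.len += 1;
    }

    /// Read `n <= 64` bits starting at bit `start`, packed into the low bits.
    fn read_word(&self, start: usize, n: usize) -> u64 {
        debug_assert!(n <= 64 && start + n <= self.len);
        if n == 0 {
            return 0;
        }
        let (w, shift) = (start / 64, start % 64);
        let mut value = self.words[w] >> shift;
        // An unaligned read may straddle two source words.
        if shift != 0 && shift + n > 64 {
            value |= self.words[w + 1] << (64 - shift);
        }
        if n == 64 { value } else { value & ((1u64 << n) - 1) }
    }

    /// Append the low `n <= 64` bits of `value` in one step.
    fn push_word(&mut self, value: u64, n: usize) {
        debug_assert!(n <= 64);
        if n == 0 {
            return;
        }
        let value = if n == 64 { value } else { value & ((1u64 << n) - 1) };
        let shift = self.len % 64;
        if shift == 0 {
            self.words.push(value);
        } else {
            *self.words.last_mut().unwrap() |= value << shift;
            // Bits that did not fit in the current word spill into a new one.
            if shift + n > 64 {
                self.words.push(value >> (64 - shift));
            }
        }
        self.len += n;
    }

    /// Baseline: one loop iteration per bit.
    fn append_bitwise(&mut self, src: &Bits, start: usize, len: usize) {
        for i in start..start + len {
            self.push(src.get(i));
        }
    }

    /// Word-sized: one loop iteration per 64 bits, using shifts for alignment.
    fn append_wordwise(&mut self, src: &Bits, start: usize, len: usize) {
        let mut done = 0;
        while done < len {
            let n = (len - done).min(64);
            self.push_word(src.read_word(start + done, n), n);
            done += n;
        }
    }
}

fn main() {
    let mut src = Bits::default();
    for i in 0..1000 {
        src.push(i % 3 == 0 || i % 7 == 0);
    }

    // Make both destinations unaligned before appending an unaligned slice.
    let mut a = Bits::default();
    let mut b = Bits::default();
    for bit in [true, false, true] {
        a.push(bit);
        b.push(bit);
    }

    a.append_bitwise(&src, 5, 900);
    b.append_wordwise(&src, 5, 900);
    assert_eq!(a, b);
}
```

Both methods produce the same buffer, but the word-sized variant runs its loop body roughly once per 64 bits rather than once per bit, which is the kind of difference the `append_buffer_vortex_buffer` benchmarks would be expected to reflect.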