Skip to content

Make GPU-native DuckDB scan the default#893

Draft
kevkrist wants to merge 3 commits into
sirius-db:devfrom
kevkrist:test_bench_refactor
Draft

Make GPU-native DuckDB scan the default#893
kevkrist wants to merge 3 commits into
sirius-db:devfrom
kevkrist:test_bench_refactor

Conversation

@kevkrist

@kevkrist kevkrist commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Turns on the GPU-native duckdb scan by default so seq_scan over duckdb tables runs on the GPU without a flag. Falls back to the CPU scan for in-memory tables and anything the native path can't read, so nothing should break. Also points the TPC-H rigging at it and drops a short SF30 perf writeup in docs/. Draft for now to get some eyes on it.

kckristensen and others added 3 commits June 5, 2026 14:24
Flip enable_gpu_duckdb_native_scan to true so DuckDB seq_scan routes to the
GPU-native scan operator by default (no flag/SET needed), and point the perf
rigging at it.

Engine:
- src/include/sirius_config.hpp: default enable_gpu_duckdb_native_scan = true
- src/sirius_extension.cpp: update the SET-option description (default on)
- src/pipeline/sirius_pipeline_converter.cpp: only route seq_scan to native when
  bind_data is a TableScanBindData; otherwise fall through to the CPU scan
- src/sirius_context.cpp: stop force-enabling use_sirius_datasource when native is
  on (the io_uring/O_DIRECT datasource is now opt-in; native reads via DuckDB
  BufferManager by default, which works everywhere incl. hosts without io_uring)
- src/scan_manager/duckdb_native_split_provider.cpp: skip the uring io_object for
  ":memory:" databases and catch create_io_object failures -> BufferManager fallback
- src/op/scan/duckdb_native_metadata.cpp: mark tables with transient/in-memory
  segments (non-CONSTANT block_id<0) non-viable

Rigging / docs:
- test/tpch_performance/{run_tpch_duckdb.sh,benchmark_and_validate.sh,CLAUDE.md}:
  the duckdb data source now uses GPU-native scan by default; --gpu-native-scan and
  --data-source duckdb-native are redundant aliases kept for compatibility
- test/cpp/integration/integration-gb10.yaml: config tuned for the GB10 unified-memory box
- docs/super-sirius/native-duckdb-scan-perf-findings.md: SF30 native-vs-parquet analysis

KNOWN WIP (this is a checkpoint, not green):
- In-memory (":memory:") / transient-segment tables are correctly detected as
  non-viable, but the scan-manager's non-viable throw happens during execution and
  does NOT trigger the transparent CPU fallback, so it surfaces as a hard error.
  The [transparent] integration tests (which use :memory: tables) currently fail.
  Next step: move the viability check to plan-conversion time in the converter so
  non-viable tables fall through to the CPU scan before execution commits.
- Native-scan component tests ([duckdb_native_walker], [duckdb_native_split_provider])
  pass; SF30 file-backed benchmarks validate 22/22 against DuckDB CPU.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tests

Make insert_duckdb_native_scan_operator a try-insert that returns false (caller
falls through to the CPU duckdb scan) instead of throwing, and decide viability at
plan-conversion time so non-viable tables never hard-fail mid-execution (the
scan-manager's runtime non-viable throw does NOT trigger transparent fallback).

Converter (src/pipeline/sirius_pipeline_converter.cpp + header):
- insert_duckdb_native_scan_operator now returns bool; split_table_scan_source falls
  through to the CPU scan when it returns false.
- Fall back to CPU for: non-base-table bind data, null client_context, in-memory
  databases (StorageManager::InMemory() — transaction-free, so it also covers
  plan generation paths without an active transaction, e.g. plan printing/EXPLAIN),
  metadata-walk failure (wrapped in try/catch), walker-rejected tables (unsupported
  types/codecs), and tables with unreadable transient segments (non-CONSTANT
  block_id<0, e.g. uncheckpointed data).

Walker (src/op/scan/duckdb_native_metadata.cpp):
- Revert the block_id<0 rejection added in the checkpoint commit. The walker is a
  pure metadata describer (its unit tests build in-memory fixtures and expect
  viable=true); the "is this readable" decision belongs in the converter, which now
  scans the walked descriptors for non-CONSTANT block_id<0 segments.

Verified on the GB10 box:
- Full unit suite: 1555/1556 pass. The one failure ("Task scheduler dispatches tasks
  with device preference") is a pre-existing 2-GPU test on this 1-GPU host
  ("Requested number of GPUs exceeds available GPUs"), unrelated to this change.
- [transparent], [plan_printer], [duckdb_native_walker], [duckdb_native_split_provider]
  all green.
- SF30 benchmark_and_validate: --data-source duckdb (native by default, no flag) and
  --data-source parquet both validate 22/22 vs DuckDB CPU; sirius uses
  GPU_DUCKDB_NATIVE_SCAN and GPU_PARQUET_SCAN respectively.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ents

Addresses findings from a review of the native-scan-default change.

Perf (high): the plan-time viability gate added in this PR walked the DuckDB
segment metadata (DataTable::GetColumnSegmentInfo over every segment — the
dominant cold-scan cost) and discarded the result, then the split provider
re-walked the identical metadata at execution. Cache the converter's validated
walk on duckdb_native_scan_info::prewalked_metadata; the split provider moves it
out (zero-copy) and only walks when absent (direct construction / unit tests).
The ctor now seals the const _scan_info in the body, after the cached metadata
is moved out of the by-value param, to avoid a deep copy.

Docs/comments:
- sirius_context.cpp: the comment claimed the native scan reads via DuckDB's
  BufferManager by default. It does not — gpu_ioctxs_ is built unconditionally,
  the scan gets one, and the decoder reads via host_read by default; BufferManager
  is only the per-block fallback when the io_object can't be opened. Also drop the
  false claim that use_sirius_datasource gates the native path (it is read in no
  live code path).
- test/tpch_performance/CLAUDE.md: fix stale line ref (:384 catch block -> :451
  routing gate), name the function to resist drift.
- run_tpch_duckdb.sh: --gpu-native-scan and SIRIUS_NATIVE_SCAN_VERIFY are
  independent; stop implying the flag is needed for verify.

Note: deliberately did NOT fold the block_id<0 readability check into the walker
(a suggested but rejected option) — the walker is a pure metadata describer whose
unit tests expect viable=true on in-memory fixtures; folding it would re-break them.

Verified: builds clean in pixi; [duckdb_native_split_provider], [duckdb_native_walker],
[plan_printer], [transparent] unit suites all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants