
Bugfix desc null fix + add tpcds fixture #810

Merged
ran-yuan-rui merged 2 commits into sirius-db:dev from ran-yuan-rui:bugfix-desc-null-fix-addtpcds-fixture on May 30, 2026

ran-yuan-rui commented May 27, 2026

Summary

Split out from #806 per @wmalpica's review comment.

This PR contains two independent pieces:

  1. Fix GPU ORDER BY / TOP-N correctness for DESC ... NULLS FIRST/LAST.
  2. Add a reusable SF0.01 TPC-DS DuckDB fixture for integration / plan-translation tests.

RCA: DESC NULL ordering bug

DuckDB SQL treats NULLS FIRST / NULLS LAST as the final, absolute placement of NULLs in the result:

  • DESC NULLS FIRST => NULLs first, then values descending
  • DESC NULLS LAST => values descending, then NULLs last
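A minimal C++ model of these reference semantics for a single nullable integer key, assuming std::nullopt stands in for SQL NULL. It is illustrative only, not Sirius or DuckDB code; the point is that NULL placement is decided before, and independently of, the ASC/DESC direction.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

using Key = std::optional<int>;  // std::nullopt models SQL NULL

// Reference SQL semantics: NULLS FIRST/LAST is an absolute placement that
// does not depend on ASC vs DESC; only non-NULL values follow the direction.
std::vector<Key> sql_order_by(std::vector<Key> rows, bool descending, bool nulls_first) {
  std::stable_sort(rows.begin(), rows.end(), [&](const Key& a, const Key& b) {
    if (!a && !b) return false;   // NULLs tie with each other
    if (!a) return nulls_first;   // NULL goes before a value iff NULLS FIRST
    if (!b) return !nulls_first;  // value goes before NULL iff NULLS LAST
    return descending ? *a > *b : *a < *b;
  });
  return rows;
}
```

For the input {2, NULL, 3, 1}, DESC NULLS FIRST yields {NULL, 3, 2, 1} and DESC NULLS LAST yields {3, 2, 1, NULL}.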

Sirius previously mapped SQL null ordering directly to cuDF:

NULLS_FIRST -> cudf::null_order::BEFORE
NULLS_LAST  -> cudf::null_order::AFTER

That is correct for ASC but wrong for DESC: cuDF applies null_order in the ascending frame, so a DESC key reverses the NULL placement. As a result, GPU ORDER BY k DESC NULLS LAST could place NULLs first, disagreeing with DuckDB CPU.
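To make the mismatch concrete, here is a small C++ model of the behaviour described above. GpuOrder and GpuNullOrder are hypothetical stand-ins for cudf::order and cudf::null_order, and std::stable_sort replaces the GPU sort; the model only encodes the stated rule that null_order is fixed in the ascending frame and DESC reverses the whole ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

using Key = std::optional<int>;  // std::nullopt models SQL NULL

// Hypothetical stand-ins for cudf::order / cudf::null_order (not the real types).
enum class GpuOrder { Ascending, Descending };
enum class GpuNullOrder { Before, After };

// Model: null_order places NULL within the ascending ordering (Before =
// smaller than every value, After = larger); Descending reverses everything,
// NULL placement included.
std::vector<Key> gpu_model_sort(std::vector<Key> rows, GpuOrder order, GpuNullOrder nulls) {
  auto asc_less = [nulls](const Key& a, const Key& b) {
    if (!a && !b) return false;
    if (!a) return nulls == GpuNullOrder::Before;
    if (!b) return nulls == GpuNullOrder::After;
    return *a < *b;
  };
  std::stable_sort(rows.begin(), rows.end(), [&](const Key& a, const Key& b) {
    return order == GpuOrder::Ascending ? asc_less(a, b) : asc_less(b, a);
  });
  return rows;
}

// The naive mapping described above: NULLS FIRST -> Before, ignoring direction.
GpuNullOrder naive_null_order(bool nulls_first) {
  return nulls_first ? GpuNullOrder::Before : GpuNullOrder::After;
}
```

With the input {2, NULL, 3, 1}, gpu_model_sort(rows, Descending, naive_null_order(false)) returns {NULL, 3, 2, 1}, violating NULLS LAST; passing Before for the DESC key yields {3, 2, 1, NULL}. ASC keys come out right under either mapping, which is why the bug only showed up for DESC.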

The same naive mapping was duplicated across the distributed sort path:

  • ORDER_BY
  • SORT_SAMPLE
  • SORT_PARTITION
  • MERGE_SORT
  • TOP_N

There was one additional TOP-N issue: cudf::top_k_order has no null_order parameter, so single-key TOP-N on a nullable key could select the wrong top-k rows before any final sorting step.

Fix

  • Added src/include/op/cudf_sort_order.hpp as the single source of truth:
    • to_cudf_order(...)
    • to_cudf_null_order(...)
  • to_cudf_null_order(...) flips BEFORE / AFTER for descending keys so SQL NULLS FIRST/LAST matches DuckDB semantics.
  • Routed all sort/top-n operators through the helper.
  • For single-key TOP-N:
    • keep cudf::top_k_order fast path when the key has no NULLs
    • use sort_by_key + slice when the key is nullable, so NULL placement is honored
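A minimal sketch of the flip rule such a helper encodes. It assumes simplified enums in place of DuckDB's order types and cudf::order / cudf::null_order; the actual signatures of to_cudf_order(...) and to_cudf_null_order(...) are not shown in this PR, so to_gpu_order and to_gpu_null_order below are hypothetical names.

```cpp
#include <cassert>

// Hypothetical simplified types; not DuckDB's or cuDF's actual enums.
enum class SqlOrder { Asc, Desc };
enum class SqlNulls { First, Last };
enum class GpuOrder { Ascending, Descending };
enum class GpuNullOrder { Before, After };

// Sort direction maps one-to-one.
constexpr GpuOrder to_gpu_order(SqlOrder o) {
  return o == SqlOrder::Asc ? GpuOrder::Ascending : GpuOrder::Descending;
}

// SQL NULLS FIRST/LAST is absolute, while the GPU null order is read in the
// ascending frame, so Before/After is flipped for descending keys.
constexpr GpuNullOrder to_gpu_null_order(SqlOrder o, SqlNulls n) {
  const bool nulls_first = (n == SqlNulls::First);
  const bool descending = (o == SqlOrder::Desc);
  return (nulls_first != descending) ? GpuNullOrder::Before : GpuNullOrder::After;
}

// Compile-time check of all four ASC/DESC x NULLS FIRST/LAST combinations.
static_assert(to_gpu_null_order(SqlOrder::Asc, SqlNulls::First) == GpuNullOrder::Before, "ASC NULLS FIRST");
static_assert(to_gpu_null_order(SqlOrder::Asc, SqlNulls::Last) == GpuNullOrder::After, "ASC NULLS LAST");
static_assert(to_gpu_null_order(SqlOrder::Desc, SqlNulls::First) == GpuNullOrder::After, "DESC NULLS FIRST");
static_assert(to_gpu_null_order(SqlOrder::Desc, SqlNulls::Last) == GpuNullOrder::Before, "DESC NULLS LAST");
```

Keeping the rule in one function is what lets ORDER_BY, SORT_SAMPLE, SORT_PARTITION, MERGE_SORT and TOP_N agree; a per-operator ternary is the duplication this change removes.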

TPC-DS fixture

Update (reworked per @mbrobbel's review): Dropped the committed 12 MB tpcds.duckdb binary. The fixture is now generated at build time via an opt-in CMake target tpcds-fixture (runs generate_tpcds_duckdb.sh into ${CMAKE_BINARY_DIR}/test-fixtures/, SF 0.01 via DuckDB's dsdgen). Run cmake --build . --target tpcds-fixture once before the TPC-DS integration tests.
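Because the database is no longer committed, a test has to find it in the build tree. The sketch below is hypothetical (tpcds_fixture_path and tpcds_fixture_available are illustrative names, not Sirius functions); it only assumes the output location stated above and shows one way to turn a missing fixture into an actionable message.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: where the opt-in tpcds-fixture target writes the
// database, relative to the CMake binary directory.
fs::path tpcds_fixture_path(const fs::path& cmake_binary_dir) {
  return cmake_binary_dir / "test-fixtures" / "tpcds.duckdb";
}

// Returns true if the generated fixture exists; otherwise fills `hint` with
// the command that produces it, so a test can skip with a clear reason.
bool tpcds_fixture_available(const fs::path& cmake_binary_dir, std::string& hint) {
  const fs::path p = tpcds_fixture_path(cmake_binary_dir);
  if (fs::exists(p)) return true;
  hint = "missing " + p.string() + "; run: cmake --build . --target tpcds-fixture";
  return false;
}
```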

Tests

Added test_gpu_execution_order_nulls.cpp:

  • ORDER BY
  • TOP-N single-key
  • TOP-N multi-key
  • ASC/DESC × NULLS FIRST/LAST

The test materializes GPU output once, adds observed_position to make ordering differences visible, and compares against DuckDB CPU with bidirectional EXCEPT ALL.
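The comparison strategy can be modelled in a few lines of C++, assuming each result is a list of nullable integer keys. Tagging every row with its observed position turns an ordering difference into a content difference, and the multiset difference below mirrors EXCEPT ALL (each matching row cancels at most one copy, and NULLs compare equal as they do in SQL set operations). The function names are hypothetical; the real test does this in SQL against DuckDB.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using Row = std::pair<long, std::optional<int>>;  // (observed_position, key)

// Tag each output row with its 0-based position.
std::vector<Row> with_positions(const std::vector<std::optional<int>>& out) {
  std::vector<Row> rows;
  for (std::size_t i = 0; i < out.size(); ++i) rows.emplace_back(static_cast<long>(i), out[i]);
  return rows;
}

// Multiset difference (a EXCEPT ALL b): each row of b cancels at most one equal row of a.
std::vector<Row> except_all(const std::vector<Row>& a, const std::vector<Row>& b) {
  std::map<Row, int> remaining;
  for (const Row& r : b) ++remaining[r];
  std::vector<Row> diff;
  for (const Row& r : a) {
    auto it = remaining.find(r);
    if (it != remaining.end() && it->second > 0) --it->second;
    else diff.push_back(r);
  }
  return diff;
}

// GPU and CPU outputs match, order included, iff both directions are empty.
bool same_ordered_result(const std::vector<std::optional<int>>& gpu,
                         const std::vector<std::optional<int>>& cpu) {
  const auto g = with_positions(gpu), c = with_positions(cpu);
  return except_all(g, c).empty() && except_all(c, g).empty();
}
```

Checking both directions matters: a one-way EXCEPT ALL would miss rows that the GPU dropped but the CPU produced.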

Validation:

pre-commit run --all-files
pixi run -e default make release
build/release/extension/sirius/test/cpp/sirius_unittest "[nulls]"

cuDF applies null_order in the ascending frame and reverses it for a
descending column, so the naive NULLS_FIRST ? BEFORE : AFTER mapping misplaced
NULLs for every DESC sort key (e.g. ORDER BY x DESC NULLS LAST put NULLs
first, disagreeing with DuckDB CPU). The mapping was duplicated across the
order, sort-sample, sort-partition, merge-sort and top-n operators.

Centralize the contract in op/cudf_sort_order.hpp (to_cudf_order /
to_cudf_null_order, which flips BEFORE<->AFTER for descending keys) and
route every sort/top-n path through it.

Also fix single-key top-n with a nullable key: cudf::top_k_order takes no
null_order and cannot honor NULLS FIRST/LAST when selecting the top k, so a
nullable key now falls back to a full sort that honors null placement.
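A sketch of the single-key TOP-N dispatch, under stated assumptions: std::partial_sort over a NULL-free key stands in for the cudf::top_k_order fast path, and a full std::stable_sort followed by truncation stands in for sort_by_key + slice. top_n_single_key and sort_then_slice are hypothetical names, not Sirius code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Key = std::optional<int>;  // std::nullopt models SQL NULL

// Fallback: full sort honoring absolute NULLS FIRST/LAST, then keep k rows.
std::vector<Key> sort_then_slice(std::vector<Key> keys, std::size_t k, bool descending, bool nulls_first) {
  std::stable_sort(keys.begin(), keys.end(), [&](const Key& a, const Key& b) {
    if (!a && !b) return false;
    if (!a) return nulls_first;
    if (!b) return !nulls_first;
    return descending ? *a > *b : *a < *b;
  });
  keys.resize(std::min(k, keys.size()));
  return keys;
}

// Dispatch: the fast path is only taken when the key has no NULLs, because
// the selection step it models has no way to express NULL placement.
std::vector<Key> top_n_single_key(const std::vector<Key>& keys, std::size_t k, bool descending, bool nulls_first) {
  const bool has_nulls = std::any_of(keys.begin(), keys.end(), [](const Key& x) { return !x; });
  if (has_nulls) return sort_then_slice(keys, k, descending, nulls_first);
  std::vector<Key> out = keys;
  const std::size_t n = std::min(k, out.size());
  std::partial_sort(out.begin(), out.begin() + static_cast<std::ptrdiff_t>(n), out.end(),
                    [&](const Key& a, const Key& b) {
                      return descending ? *a > *b : *a < *b;  // safe: no NULLs here
                    });
  out.resize(n);
  return out;
}
```

For keys {5, NULL, 7, 1} with k = 2, DESC NULLS FIRST must return {NULL, 7} and DESC NULLS LAST {7, 5}; only the fallback can tell those apart.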

Tests: test_gpu_execution_order_nulls.cpp adds 3 Catch2 cases (12 dynamic
scenarios) covering ORDER BY / TOP-N single-key / TOP-N multi-key under
ASC/DESC x NULLS FIRST/LAST, comparing GPU output to DuckDB CPU via
bidirectional EXCEPT ALL.
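The 3 x 2 x 2 structure behind the 12 scenarios can be sketched as query-text generation. Table t(k, v), the LIMIT of 3 and the exact SQL are hypothetical illustrations, not the queries in test_gpu_execution_order_nulls.cpp; the LIMIT variants are assumed to exercise the TOP-N path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical query text over t(k, v) with a nullable key k:
// 3 query shapes x {ASC, DESC} x {NULLS FIRST, NULLS LAST} = 12 scenarios.
std::vector<std::string> null_order_scenarios() {
  const std::vector<std::string> suffixes = {
      "",             // ORDER BY
      " LIMIT 3",     // TOP-N, single key
      ", v LIMIT 3",  // TOP-N, multi key
  };
  const std::vector<std::string> dirs = {"ASC", "DESC"};
  const std::vector<std::string> nulls = {"NULLS FIRST", "NULLS LAST"};
  std::vector<std::string> out;
  for (const auto& s : suffixes)
    for (const auto& d : dirs)
      for (const auto& n : nulls)
        out.push_back("SELECT k, v FROM t ORDER BY k " + d + " " + n + s);
  return out;
}
```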
ran-yuan-rui changed the title from "Bugfix desc null fix addtpcds fixture" to "Bugfix desc null fix + add tpcds fixture" on May 27, 2026
Review comment thread on test/cpp/integration/data/duckdb/tpcds.duckdb (outdated)
Add an opt-in CMake target `tpcds-fixture` that runs
test/cpp/integration/data/duckdb/generate_tpcds_duckdb.sh into
${CMAKE_BINARY_DIR}/test-fixtures/tpcds.duckdb (SF 0.01, generated with dsdgen
from DuckDB's tpcds extension). Run `cmake --build . --target tpcds-fixture`
once before the TPC-DS integration tests; the database is not committed.

TPC-H integration.duckdb keeps its committed form for now; migrating it
to the same scheme can be a follow-up.
ran-yuan-rui force-pushed the bugfix-desc-null-fix-addtpcds-fixture branch from 669a5ce to 7be5bff on May 30, 2026 15:36
ran-yuan-rui added this pull request to the merge queue on May 30, 2026
Merged via the queue into sirius-db:dev with commit 7ade2b6 on May 30, 2026
15 checks passed
ran-yuan-rui deleted the bugfix-desc-null-fix-addtpcds-fixture branch on May 30, 2026 16:24