Bugfix desc null fix + add tpcds fixture#810
Merged
ran-yuan-rui merged 2 commits intoMay 30, 2026
Merged
Conversation
cuDF applies null_order in the ascending frame and reverses it for a descending column, so the naive NULLS_FIRST?BEFORE:AFTER mapping mis-placed NULLs for every DESC sort key (e.g. ORDER BY x DESC NULLS LAST put NULLs first, disagreeing with DuckDB CPU). The mapping was duplicated across the order, sort-sample, sort-partition, merge-sort and top-n operators. Centralize the contract in op/cudf_sort_order.hpp (to_cudf_order / to_cudf_null_order, which flips BEFORE<->AFTER for descending keys) and route every sort/top-n path through it. Also fix single-key top-n with a nullable key: cudf::top_k_order takes no null_order and cannot honor NULLS FIRST/LAST when selecting the top k, so a nullable key now falls back to a full sort that honors null placement. Tests: test_gpu_execution_order_nulls.cpp adds 3 Catch2 cases (12 dynamic scenarios) covering ORDER BY / TOP-N single-key / TOP-N multi-key under ASC/DESC x NULLS FIRST/LAST, comparing GPU output to DuckDB CPU via bidirectional EXCEPT ALL.
mbrobbel
reviewed
May 27, 2026
wmalpica
approved these changes
May 29, 2026
Add an opt-in CMake target `tpcds-fixture` that runs
test/cpp/integration/data/duckdb/generate_tpcds_duckdb.sh into
${CMAKE_BINARY_DIR}/test-fixtures/tpcds.duckdb (SF 0.01, via DuckDB's
tpcds extension dsdgen). Run `cmake --build . --target tpcds-fixture`
once before the TPC-DS integration tests; the database is not committed.
TPC-H integration.duckdb keeps its committed form for now; migrating it
to the same scheme can be a follow-up.
669a5ce to
7be5bff
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Split out from #806 per @wmalpica review comment.
This PR contains two independent pieces:
ORDER BY/TOP-Ncorrectness forDESC ... NULLS FIRST/LAST.RCA: DESC NULL ordering bug
DuckDB SQL treats
NULLS FIRST/NULLS LASTas the final, absolute placement of NULLs in the result:DESC NULLS FIRST=> NULLs first, then values descendingDESC NULLS LAST=> values descending, then NULLs lastSirius previously mapped SQL null ordering directly to cuDF:
That is correct for
ASC, but wrong forDESC: cuDF appliesnull_orderin its sort-direction frame, so DESC effectively reverses the NULL placement. As a result, GPUORDER BY k DESC NULLS LASTcould place NULLs first, disagreeing with DuckDB CPU.The same naive mapping was duplicated across the distributed sort path:
ORDER_BYSORT_SAMPLESORT_PARTITIONMERGE_SORTTOP_NThere was one additional TOP-N issue:
cudf::top_k_orderhas nonull_orderparameter, so single-key TOP-N on a nullable key could select the wrong top-k rows before any final sorting step.Fix
src/include/op/cudf_sort_order.hppas the single source of truth:to_cudf_order(...)to_cudf_null_order(...)to_cudf_null_order(...)flipsBEFORE/AFTERfor descending keys so SQLNULLS FIRST/LASTmatches DuckDB semantics.cudf::top_k_orderfast path when the key has no NULLssort_by_key + slicewhen the key is nullable, so NULL placement is honoredTPC-DS fixture
Update (reworked per @mbrobbel's review): Dropped the committed 12 MB tpcds.duckdb binary. The fixture is now generated at build time via an opt-in CMake target tpcds-fixture (runs generate_tpcds_duckdb.sh into ${CMAKE_BINARY_DIR}/test-fixtures/, SF 0.01 via DuckDB's dsdgen). Run cmake --build . --target tpcds-fixture once before the TPC-DS integration tests.
Tests
Added
test_gpu_execution_order_nulls.cpp:ORDER BYTOP-N single-keyTOP-N multi-keyASC/DESC × NULLS FIRST/LASTThe test materializes GPU output once, adds
observed_positionto make ordering differences visible, and compares against DuckDB CPU with bidirectionalEXCEPT ALL.Validation:
pre-commit run --all-files pixi run -e default make release build/release/extension/sirius/test/cpp/sirius_unittest "[nulls]"