gpu_execution is the recommended execution path. It provides out-of-core execution with tiered memory management (GPU, host, and disk), automatic data partitioning, and spilling. It currently works with data in Parquet format.
Clone the Sirius repository:
git clone --recurse-submodules https://github.com/sirius-db/sirius.git
cd sirius
Set up the environment with Pixi and build:
pixi shell
CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) make
Note that if building consumes too much memory, try reducing the CMAKE_BUILD_PARALLEL_LEVEL value.
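As a rough illustration of reducing the parallel level, the sketch below derives a job count from the number of cores. The halving heuristic and the `reduced_level` helper are assumptions for illustration, not a Sirius recommendation; the right value depends on how much memory the machine has.

```shell
#!/usr/bin/env bash
# Hypothetical helper: halve a core count, never going below 1 job.
# The halving heuristic is an assumption, not something Sirius prescribes.
reduced_level() {
  local cores=$1
  local level=$(( cores / 2 ))
  if [ "$level" -lt 1 ]; then
    level=1
  fi
  echo "$level"
}

level=$(reduced_level "$(nproc)")
echo "Suggested CMAKE_BUILD_PARALLEL_LEVEL: $level"
# Inside the pixi shell, the build would then be started with:
#   CMAKE_BUILD_PARALLEL_LEVEL=$level make
```

If the build still runs out of memory, lowering the value further (down to 1) trades build time for a smaller memory footprint.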
gpu_execution requires a config file in YAML format. See the Configuration documentation for the full reference, including config file resolution order, all available options, and byte suffixes. An example config file is provided at test/cpp/integration/integration.yaml.
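Because a missing config file is an easy mistake, a small pre-flight check can be run before launching the shell. This is only a sketch: the `config_ok` helper is hypothetical, and it checks only the path given in SIRIUS_CONFIG_FILE, not any other location covered by the resolution order described in the Configuration documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the path names a regular,
# readable, non-empty file. It does not validate the YAML contents.
config_ok() {
  [ -n "$1" ] && [ -f "$1" ] && [ -r "$1" ] && [ -s "$1" ]
}

if config_ok "${SIRIUS_CONFIG_FILE:-}"; then
  echo "config found: $SIRIUS_CONFIG_FILE"
else
  echo "SIRIUS_CONFIG_FILE is unset or does not point to a readable file" >&2
fi
```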
export SIRIUS_CONFIG_FILE=/path/to/sirius.yaml
./build/release/duckdb
From the DuckDB shell, create views pointing to your Parquet files and run queries with gpu_execution:
-- Create views for parquet data
CREATE VIEW lineitem AS SELECT * FROM read_parquet('/data/lineitem/*.parquet');
CREATE VIEW orders AS SELECT * FROM read_parquet('/data/orders/*.parquet');
CREATE VIEW customer AS SELECT * FROM read_parquet('/data/customer/*.parquet');
-- Run a query on GPU
CALL gpu_execution('SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
FROM lineitem
WHERE l_shipdate <= date ''1998-09-02''
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus');
When gpu_execution is enabled (the default after loading the extension), all DuckDB queries are automatically intercepted by the optimizer hook and run on GPU, so no CALL gpu_execution('...') wrapper is needed:
-- Plain SQL, runs on GPU automatically
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
FROM lineitem
WHERE l_shipdate <= date '1998-09-02'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
Queries with unsupported operators fall back silently to DuckDB CPU execution. To disable transparent execution for a connection:
SET gpu_execution = false;
To re-enable:
SET gpu_execution = true;
How it works: Two optimizer extensions are registered at extension load time. A pre-optimizer hook disables DuckDB optimizers incompatible with Sirius (such as IN_CLAUSE, COMPRESSED_MATERIALIZATION, and LATE_MATERIALIZATION). A post-optimizer hook captures the optimized logical plan and attempts GPU plan generation via sirius_physical_plan_generator. If plan generation succeeds, a PhysicalSiriusExecution node replaces the DuckDB physical plan and the query runs on GPU; if plan generation throws, the original DuckDB CPU plan runs unchanged.
For TPC-H benchmarking, use the provided data generation script:
cd test/tpch_performance
pixi run bash generate_tpch_data.sh 100 # generates SF100 parquet data
This produces partitioned Parquet files under test_datasets/tpch_parquet_sf100/. Then create views from the DuckDB shell:
CREATE VIEW lineitem AS SELECT * FROM read_parquet('test_datasets/tpch_parquet_sf100/lineitem/*.parquet');
-- repeat for other tables...
For your own data, point read_parquet() at any Parquet file or glob:
CREATE VIEW my_table AS SELECT * FROM read_parquet('/path/to/my_data/*.parquet');
gpu_execution uses C++ unit tests built with Catch2. Test files are in test/cpp/.
Run all unit tests:
CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) make
build/release/extension/sirius/test/cpp/sirius_unittest
Run tests associated with a specific tag or a specific test:
build/release/extension/sirius/test/cpp/sirius_unittest "[cpu_cache]"
build/release/extension/sirius/test/cpp/sirius_unittest "test_cpu_cache_basic_string_single_col"
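Catch2 test specs also accept several tags separated by commas, which selects tests matching any of them. The sketch below builds such a spec and runs it only if the test binary exists. The `join_tags` helper and the `[example_tag]` tag are hypothetical placeholders; only `[cpu_cache]` is taken from the commands above.

```shell
#!/usr/bin/env bash
# Binary path as given in the commands above.
UNITTEST_BIN="build/release/extension/sirius/test/cpp/sirius_unittest"

# Hypothetical helper: join tags with commas, which Catch2 reads as "match any".
join_tags() {
  local IFS=,
  echo "$*"
}

# "[cpu_cache]" appears in the text above; "[example_tag]" is a made-up placeholder.
spec=$(join_tags "[cpu_cache]" "[example_tag]")
echo "test spec: $spec"

if [ -x "$UNITTEST_BIN" ]; then
  "$UNITTEST_BIN" "$spec"
else
  echo "unit test binary not found at $UNITTEST_BIN; build it first" >&2
fi
```

Writing tags back to back without a comma, as in "[a][b]", instead selects only tests that carry both tags.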
Test logs are saved in:
build/release/extension/sirius/test/cpp/log
For in-depth documentation on the gpu_execution engine, see the Super Sirius Documentation.