Add warning log to EvictionQueue::Purge for slow purge detection by krleonid · Pull Request #45 · krleonid/duckdb

krleonid · 2026-05-12T09:49:01Z

Summary

Add DUCKDB_LOG_WARNING to EvictionQueue::Purge() when iterations > 10 or elapsed time > 1 second
Logs queue_size and dead_nodes count for diagnosis
Also prints to stderr in DEBUG builds via Printer::PrintF
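
The thresholds in the summary can be sketched as a small predicate. This is only an illustration of the stated condition (more than 10 iterations, or more than 1 second elapsed); `IsSlowPurge`, `kMaxPurgeIterations` and `kMaxPurgeTime` are hypothetical names and not taken from the actual `EvictionQueue::Purge()` change.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical thresholds mirroring the PR summary:
// warn when iterations > 10 or elapsed time > 1 second.
constexpr uint64_t kMaxPurgeIterations = 10;
constexpr std::chrono::milliseconds kMaxPurgeTime {1000};

// Returns true when a purge run should be reported as slow.
// Both comparisons are strict, so exactly 10 iterations or exactly 1 s is still "fast".
inline bool IsSlowPurge(uint64_t iterations, std::chrono::milliseconds elapsed) {
	assert(elapsed.count() >= 0); // elapsed time is never negative
	return iterations > kMaxPurgeIterations || elapsed > kMaxPurgeTime;
}
```

In the real change, a `true` result would presumably be the point where `DUCKDB_LOG_WARNING` is called with `queue_size` and `dead_nodes`; that call is omitted here because its exact arguments are not shown in this description.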

Usage

SET GLOBAL enable_logging = true;
SET GLOBAL logging_level = 'warning';
-- run workload, then:
SELECT * FROM duckdb_logs() WHERE message LIKE '%Purge%';

Test plan

Run concurrent workload with multiple attached databases
Verify warning appears in duckdb_logs() when purge takes many iterations
Verify no output when purge is fast (at most 10 iterations and at most 1s)

🤖 Generated with Claude Code

Hi team, I found for unsigned integer cast, stats are not applied as expected, which leads to unnecessary table scan.

```sql
memory D CREATE TABLE t_signed AS SELECT i::INT AS id FROM range(0, 100) t(i);
memory D EXPLAIN SELECT * FROM t_signed WHERE id::BIGINT > 100;
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│        EMPTY_RESULT       │
└───────────────────────────┘
memory D CREATE TABLE t_unsigned AS SELECT i::UINTEGER AS id FROM range(0, 100) t(i);
memory D EXPLAIN SELECT * FROM t_unsigned WHERE id::BIGINT > 100;
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│          SEQ_SCAN         │
│    ────────────────────   │
│           Table:          │
│   memory.main.t_unsigned  │
│                           │
│   Type: Sequential Scan   │
│      Projections: id      │
│                           │
│          Filters:         │
│ (CAST(id AS BIGINT) > 100)│
│                           │
│          ~20 rows         │
└───────────────────────────┘
```

In the above example, table `t_unsigned`'s min/max stats should be kept IMO, which could avoid the scan.
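
To make the argument concrete: every `UINTEGER` value fits in a `BIGINT` and the cast preserves ordering, so min/max stats can be carried through the cast unchanged. The sketch below illustrates only that reasoning; `MinMax64`, `PropagateUIntegerToBigint` and `GreaterThanIsAlwaysFalse` are hypothetical helpers, not DuckDB's statistics propagation API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical min/max statistics for a BIGINT expression.
struct MinMax64 {
	int64_t min;
	int64_t max;
};

// Widening UINTEGER -> BIGINT: every uint32_t is representable as int64_t
// and the conversion is monotonic, so the bounds map directly.
inline MinMax64 PropagateUIntegerToBigint(uint32_t min, uint32_t max) {
	assert(min <= max);
	return {static_cast<int64_t>(min), static_cast<int64_t>(max)};
}

// True when `expr > constant` cannot match any row given the stats.
inline bool GreaterThanIsAlwaysFalse(const MinMax64 &stats, int64_t constant) {
	return stats.max <= constant;
}
```

With `range(0, 100)` the column holds 0..99, so the propagated max is 99 and `id::BIGINT > 100` could be pruned the same way it is for the signed table.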

- Compute version matrix using git tags instead of using hardcoded version list.
- Group bwc test jobs by minor duckdb version.
- Fix test-utils commit. We should not use `Update extension` commits because that [fails](https://github.com/duckdb/duckdb/actions/runs/25500841929/job/74839369343#step:8:111):

```
  File "/home/runner/work/duckdb/duckdb/test/bwc/runner.py", line 201, in do_run_test
    raise RuntimeError(
RuntimeError: test-utils extension version mismatch: v1.1.0 is @ 5b9c733 and v1.6.0-dev5484 is @ 7074208
Error: Process completed with exit code 1.
```

### CI failure is unrelated

```
Run python scripts/plan_cost_runner.py --old build/base/release/duckdb --new build/current/release/duckdb --dir=benchmark/imdb_plan_cost
tcache_thread_shutdown(): unaligned tcache chunk detected
Aborted (core dumped)
```
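
The grouping step described in the list above can be sketched as follows. This is an assumption-heavy illustration: the real CI logic is not shown here and is presumably not written in C++, and `GroupTagsByMinor` and `MinorKey` are hypothetical names. It assumes release tags have the exact form `v<major>.<minor>.<patch>` and skips everything else, such as dev builds.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (major, minor) pair used as the grouping key.
using MinorKey = std::pair<int, int>;

// Groups release tags by minor version, e.g. v1.1.0 and v1.1.3 share key (1, 1).
inline std::map<MinorKey, std::vector<std::string>> GroupTagsByMinor(const std::vector<std::string> &tags) {
	std::map<MinorKey, std::vector<std::string>> groups;
	for (const auto &tag : tags) {
		int major = 0, minor = 0, patch = 0;
		char trailing = 0;
		// Exactly three fields must match; a fourth match means trailing text (e.g. "-dev5484").
		if (std::sscanf(tag.c_str(), "v%d.%d.%d%c", &major, &minor, &patch, &trailing) != 3) {
			continue;
		}
		if (major < 0 || minor < 0 || patch < 0) {
			continue;
		}
		assert(tag[0] == 'v'); // guaranteed by the literal 'v' in the format string
		groups[{major, minor}].push_back(tag);
	}
	return groups;
}
```

Each map entry would then correspond to one job group in the test matrix.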

…n` to settings and test configs (duckdb#22553)

Previously these passes were done only in debug mode:

* `ColumnLifetimeAnalyzer` would add a bunch of extra projections to stress test projection maps
* We would call `ColumnBindingResolver::Verify` after planning and after every optimizer pass to ensure we never created broken plans, also not as intermediate optimizer steps

This PR splits these off into separate options and adds test config runs for them.

This actually found a problem with the `CommonAggregateOptimizer` that was previously hidden by the verification projections.

Currently, only `src/libduckdb.so` exists in the uploaded artifact:

```
release artifact includes:
duckdb
repository/c3d4868dfa/linux_amd64/core_functions.duckdb_extension
repository/c3d4868dfa/linux_amd64/parquet.duckdb_extension
src/libduckdb.so
test/extension/loadable_extension_demo.duckdb_extension
test/extension/loadable_extension_optimizer_demo.duckdb_extension
test/unittest
```

but `src/libduckdb_static.a` is missing. This PR adds the static archive file also to the uploaded artifact file.

Part of https://github.com/duckdblabs/duckdb-internal/issues/6380

The goal is to generate the parts of the `Transformer` that can be generated. Specifically, extracting the types from `ParseResult` at the correct indices.

The script `gen_transformer_v2` (Will delete the other script and rename this one in the future) takes the parsed grammar files, adds the type information in `grammar_types.yml` and generates `TransformRuleInternal` methods. This automates the manual, index-based retrieval of `ParseResult` children and makes it easier for developers to write transformer functions.

The script does not register generated functions yet (to be done). It currently generates only for the following grammar files:

- `transaction.gram`
- `use.gram`
- `export.gram`
- `detach.gram`

Take `use.gram` as an example:

```
UseStatement <- 'USE' UseTarget
UseTarget <- UseTargetCatalogSchema / SchemaName / CatalogName
UseTargetCatalogSchema <- CatalogName '.' ReservedSchemaName DotIdentifier*
DotIdentifier <- '.' Identifier
```

The script generates the following file:

```cpp
// AUTO-GENERATED by scripts/parser/gen_transformer_v2.py -- DO NOT EDIT
#include "duckdb/parser/peg/transformer/peg_transformer.hpp"

namespace duckdb {

unique_ptr<SQLStatement> PEGTransformerFactory::TransformUseStatementInternal(PEGTransformer &transformer, ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto use_target = transformer.Transform<QualifiedName>(list_pr, 1);
	return TransformUseStatement(use_target);
}

QualifiedName PEGTransformerFactory::TransformUseTargetInternal(PEGTransformer &transformer, ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto &choice_pr = list_pr.Child<ChoiceParseResult>(0);
	return TransformUseTarget(transformer, choice_pr.GetResult());
}

QualifiedName PEGTransformerFactory::TransformUseTargetCatalogSchemaInternal(PEGTransformer &transformer, ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto catalog_name = list_pr.Child<IdentifierParseResult>(0).identifier;
	auto reserved_schema_name = list_pr.Child<IdentifierParseResult>(2).identifier;
	auto &dot_identifier_opt = list_pr.Child<OptionalParseResult>(3);
	vector<string> dot_identifier;
	if (dot_identifier_opt.HasResult()) {
		auto &dot_identifier_repeat = dot_identifier_opt.GetResult().Cast<RepeatParseResult>();
		for (auto &dot_identifier_item : dot_identifier_repeat.GetChildren()) {
			dot_identifier.push_back(transformer.Transform<string>(dot_identifier_item));
		}
	}
	return TransformUseTargetCatalogSchema(catalog_name, reserved_schema_name, dot_identifier);
}

string PEGTransformerFactory::TransformDotIdentifierInternal(PEGTransformer &transformer, ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto identifier = list_pr.Child<IdentifierParseResult>(1).identifier;
	return TransformDotIdentifier(identifier);
}

} // namespace duckdb
```

The script can currently handle simple rule references, optional results, repeat, choices and simple macros. Nested macro (`Parens(List(Rule))`) is not yet supported.

Some grammar rules provide no semantic value, for example since they only match a choice of keywords. Those grammar rules can be excluded from generation by adding them to the `excluded` section in `grammar_types.yml`.

The script also generates declarations for the `Internal` and `user-implemented` stubs with the correct types:

```cpp
static unique_ptr<SQLStatement> TransformUseStatementInternal(PEGTransformer &transformer, ParseResult &parse_result);
static unique_ptr<SQLStatement> TransformUseStatement(QualifiedName use_target);
```

This is included in the `PEGTransformerFactory`: https://github.com/Dtenwolde/duckdb/blob/0761edb240deb637cf747fbdeb746157c54d95a0/src/include/duckdb/parser/peg/transformer/peg_transformer.hpp#L1213

I added the generate script to `build_grammar.sh` together with a `make format-fix` since the generated code does not always pass the format checker, but perhaps this is not the cleanest. The generator script can be improved in this aspect.

For future work:

- Make the script work with nested macros
- Generate more complex grammar files / rewrite the user-implemented stubs
- Make script emit properly formatted code

Fix NightlyTests CI job `Release Assertions with Clang` linking [failure](https://github.com/duckdb/duckdb/actions/runs/25645238791/job/75274381203#step:8:546):

```
mold: error: undefined symbol: __atomic_is_lock_free
>>> referenced by ub_test_common.cpp
>>> test/common/CMakeFiles/test_common.dir/ub_test_common.cpp.o:(____C_A_T_C_H____T_E_S_T____40())>>> referenced by ub_test_common.cpp
>>> test/common/CMakeFiles/test_common.dir/ub_test_common.cpp.o:(____C_A_T_C_H____T_E_S_T____40())
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
[17/18] Linking CXX executable duckdb
```

The C++ test `test/common/test_checked_integer.cpp:247` calls `std::atomic<int64_t>().is_lock_free()`:

```c++
TEST_CASE("CheckedInteger atomic operations", "[checked_integer]") {
	SECTION("is_lock_free / is_always_lock_free forward to underlying type") {
		std::atomic<ci64> a(0);
		REQUIRE(a.is_lock_free() == std::atomic<int64_t>().is_lock_free());
		REQUIRE(std::atomic<ci64>::is_always_lock_free == std::atomic<int64_t>::is_always_lock_free);
	}
```

On Linux/clang, `is_lock_free()` can lower to the runtime symbol `__atomic_is_lock_free`, which lives in `libatomic`.

Fixes https://github.com/duckdblabs/duckdb-internal/issues/9167.

Fix: https://github.com/duckdblabs/duckdb-internal/issues/9063

The following query failed with an internal exception, while this was previously a parser error:

```sql
n''''+ST�&6>ttacrotablesampimportleT�&6>tta�rordeT�Niqeger[], e�tfilter% t_not_currorderent; ;
```

It looks like an odd query, but it seems to be caused by the `[]` which matches a `SliceExpression`. Every element of this subrule is optional:

`SliceBound <- Expression? EndSliceBound? StepSliceBound?`

But a completely empty `SliceExpression` should not be accepted (Previously led to a generic `syntax error`). So now we check if the result is empty and throw a descriptive error if this happens.

I am surprised this wasn't previously caught in other tests.
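
The empty check described above can be illustrated with a minimal sketch. The `SliceBound` struct here is a hypothetical stand-in with one flag per optional grammar element, and the exception type and message are placeholders, not the ones used in the actual fix.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the parsed result of
// SliceBound <- Expression? EndSliceBound? StepSliceBound?
struct SliceBound {
	bool has_expression = false;
	bool has_end = false;
	bool has_step = false;
};

// Because every element is optional, `[]` parses successfully with nothing set.
inline bool IsEmptySlice(const SliceBound &bound) {
	return !bound.has_expression && !bound.has_end && !bound.has_step;
}

// Rejects the completely empty case with a descriptive error (placeholder message).
inline void ValidateSlice(const SliceBound &bound) {
	if (IsEmptySlice(bound)) {
		throw std::invalid_argument("subscript expression cannot be empty");
	}
}
```

Checking after the match, rather than making one element mandatory in the grammar, keeps forms such as `[:]` or `[1]` valid while still rejecting `[]`.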

…ed > 1s

Logs via DUCKDB_LOG_WARNING with queue_size and dead_nodes count.
Also prints to stderr in DEBUG builds for local debugging.

Requires:
SET GLOBAL enable_logging = true;
SET GLOBAL logging_level = 'warning';

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Dtenwolde and others added 30 commits May 7, 2026 10:23

Starting on auto generation

f397f55

Clean up inline_grammar.py

2c150b3

Add --write mode to write autogenerated transformer files

973dd16

Add generated subdirectory

6ac4e6b

Add generated header

b62515f

Add generated Internal wrapper, fix up written case

056c085

Update variable name

270e9a3

Handle choice rules as well

512b491

Move DotIdentifier to separate rule

794e79e

Auto generate move of use.gram

a5ba7cc

Remove old TransfomrUseTargetCatalog

ad7fcc8

Support List and Parens more explicitly

8959074

Auto generate parts of transform_transaction

faf9139

Update gen to write into class

602886c

Add first version of generated file as well

0c7c697

Move excluded rules to yml file

1e27d70

Auto generate more of transaction

42e4bbd

Format fix

c9b4fa1

Bump ci

24f763f

Bump CI again

5ac4e26

Bump CI

3fa616e

Merge remote-tracking branch 'upstream/main' into autogen-transformer

42f033d

Remove regex node

06471fb

remove negation node

336acca

Make * and + more like matcher

7193b72

Dead code removal

fe0f62b

Update docstring

dab3435

Add transformer generation to build_grammar.sh

4862981

Move format-venv to .cache/ because build/ is often nuked

bb6c6ba

Ignore four more slow hash-zero tests

57083d4

dentiny and others added 27 commits May 9, 2026 17:52

empty commit to trigger CI: rerun flaky test

d6ff89d

disallow cast between enums

d09782b

Move verification projection to test config

84e502c

Split off debug_verify_column_bindings into a setting as well

263a7b2

Add to Main.yml

a974dd6

Auto generate export as well

0761edb

Format fix

29fc453

Make parameters const & if possible

c1841a3

Format fix

b139b06

add define guard to fix tidy check

100399d

Move generated rules to a single file

77f2a6e

Auto register the generated rules

b57d8be

Remove move if not needed

4941c7a

Remove old declaration

637037d

Add libduckdb_static.a to build artifact in CI

af35978

Guard against empty subscript expression

a9936bb

Add test on empty subscript

7bfa2fc

Change to empty check

d702be2

Bump CI

5230c02

Add libatomic for missing __atomic_is_lock_free

cfdc4fc

krleonid changed the base branch from v1.5-variegata to main May 12, 2026 09:49

krleonid force-pushed the duckdb_log_clean branch from cf4e592 to 1c06de8 Compare May 12, 2026 09:50

krleonid wants to merge 65 commits into main from duckdb_log_clean


5 participants
