Skip to content

Add warning log to EvictionQueue::Purge for slow purge detection#45

Open
krleonid wants to merge 65 commits into
mainfrom
duckdb_log_clean
Open

Add warning log to EvictionQueue::Purge for slow purge detection#45
krleonid wants to merge 65 commits into
mainfrom
duckdb_log_clean

Conversation

@krleonid
Copy link
Copy Markdown
Owner

Summary

  • Add DUCKDB_LOG_WARNING to EvictionQueue::Purge() when iterations > 10 or elapsed time > 1 second
  • Logs queue_size and dead_nodes count for diagnosis
  • Also prints to stderr in DEBUG builds via Printer::PrintF

Usage

SET GLOBAL enable_logging = true;
SET GLOBAL logging_level = 'warning';
-- run workload, then:
SELECT * FROM duckdb_logs() WHERE message LIKE '%Purge%';

Test plan

  • Run concurrent workload with multiple attached databases
  • Verify warning appears in duckdb_logs() when purge takes many iterations
  • Verify no output when purge is fast (< 10 iterations, < 1s)

🤖 Generated with Claude Code

dentiny and others added 27 commits May 9, 2026 17:52
Hi team, I found for unsigned integer cast, stats are not applied as
expected, which leads to unnecessary table scan.
```sql
memory D CREATE TABLE t_signed AS SELECT i::INT AS id FROM range(0, 100) t(i);
memory D EXPLAIN SELECT * FROM t_signed WHERE id::BIGINT > 100;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│        EMPTY_RESULT       │
└───────────────────────────┘
memory D CREATE TABLE t_unsigned AS SELECT i::UINTEGER AS id FROM range(0, 100) t(i);
memory D EXPLAIN SELECT * FROM t_unsigned WHERE id::BIGINT > 100;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│          SEQ_SCAN         │
│    ────────────────────   │
│           Table:          │
│   memory.main.t_unsigned  │
│                           │
│   Type: Sequential Scan   │
│      Projections: id      │
│                           │
│          Filters:         │
│ (CAST(id AS BIGINT) > 100)│
│                           │
│          ~20 rows         │
└───────────────────────────┘
```
In the above example, table `t_unsigned`'s min/max stats should be kept
IMO, which could avoid the scan.
- Compute version matrix using git tags instead of using hardcoded
version list.
- Group bwc test jobs by minor duckdb version.
- Fix test-utils commit.

We should not use `Update extension` commits because that
[fails](https://github.com/duckdb/duckdb/actions/runs/25500841929/job/74839369343#step:8:111):
```
  File "/home/runner/work/duckdb/duckdb/test/bwc/runner.py", line 201, in do_run_test
    raise RuntimeError(
RuntimeError: test-utils extension version mismatch:
v1.1.0 is @ 5b9c733 and v1.6.0-dev5484 is @ 7074208
Error: Process completed with exit code 1.
```

### CI failure is unrelated

```
Run python scripts/plan_cost_runner.py --old build/base/release/duckdb --new build/current/release/duckdb --dir=benchmark/imdb_plan_cost
tcache_thread_shutdown(): unaligned tcache chunk detected
Aborted (core dumped)
```
…n` to settings and test configs (duckdb#22553)

Previously these passes were done only in debug mode:

* `ColumnLifetimeAnalyzer` would add a bunch of extra projections to
stress test projection maps
* We would call `ColumnBindingResolver::Verify` after planning and after
every optimizer pass to ensure we never created broken plans, also not
as intermediate optimizer steps

This PR splits these off into separate options and adds test config runs
for them. This actually found a problem with the
`CommonAggregateOptimizer` that was previously hidden by the
verification projections.
Currently, only `src/libduckdb.so` exists in the uploaded artifact:
```
release artifact includes:
  duckdb
  repository/c3d4868dfa/linux_amd64/core_functions.duckdb_extension
  repository/c3d4868dfa/linux_amd64/parquet.duckdb_extension
  src/libduckdb.so
  test/extension/loadable_extension_demo.duckdb_extension
  test/extension/loadable_extension_optimizer_demo.duckdb_extension
  test/unittest
```
but `src/libduckdb_static.a` is missing.

This PR adds the static archive file also to the uploaded artifact file.
Part of https://github.com/duckdblabs/duckdb-internal/issues/6380

The goal is to generate the parts of the `Transformer` that can be
generated. Specifically, extracting the types from `ParseResult` at the
correct indices. The script `gen_transformer_v2` (Will delete the other
script and rename this one in the future), takes the parsed grammar
files, adds the type information in `grammar_types.yml` and generated
`TransformRuleInternal` methods. This will automate the extraction of
manual retrieval of `ParseResult` with indices and make it easier for
developers to write transformer functions. The script does not register
generated functions yet (to be done).

It currently generates only for the following grammar files:
- `transaction.gram`
- `use.gram`
- `export.gram`
- `detach.gram`

Take `use.gram` as an example:
```
UseStatement <- 'USE' UseTarget
UseTarget <- UseTargetCatalogSchema / SchemaName / CatalogName
UseTargetCatalogSchema <- CatalogName '.' ReservedSchemaName DotIdentifier*
DotIdentifier <- '.' Identifier
```

The script generates the following file:
```cpp
// AUTO-GENERATED by scripts/parser/gen_transformer_v2.py -- DO NOT EDIT
#include "duckdb/parser/peg/transformer/peg_transformer.hpp"

namespace duckdb {

unique_ptr<SQLStatement> PEGTransformerFactory::TransformUseStatementInternal(PEGTransformer &transformer,
                                                                              ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto use_target = transformer.Transform<QualifiedName>(list_pr, 1);
	return TransformUseStatement(use_target);
}

QualifiedName PEGTransformerFactory::TransformUseTargetInternal(PEGTransformer &transformer,
                                                                ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto &choice_pr = list_pr.Child<ChoiceParseResult>(0);
	return TransformUseTarget(transformer, choice_pr.GetResult());
}

QualifiedName PEGTransformerFactory::TransformUseTargetCatalogSchemaInternal(PEGTransformer &transformer,
                                                                             ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto catalog_name = list_pr.Child<IdentifierParseResult>(0).identifier;
	auto reserved_schema_name = list_pr.Child<IdentifierParseResult>(2).identifier;
	auto &dot_identifier_opt = list_pr.Child<OptionalParseResult>(3);
	vector<string> dot_identifier;
	if (dot_identifier_opt.HasResult()) {
		auto &dot_identifier_repeat = dot_identifier_opt.GetResult().Cast<RepeatParseResult>();
		for (auto &dot_identifier_item : dot_identifier_repeat.GetChildren()) {
			dot_identifier.push_back(transformer.Transform<string>(dot_identifier_item));
		}
	}
	return TransformUseTargetCatalogSchema(catalog_name, reserved_schema_name, dot_identifier);
}

string PEGTransformerFactory::TransformDotIdentifierInternal(PEGTransformer &transformer, ParseResult &parse_result) {
	auto &list_pr = parse_result.Cast<ListParseResult>();
	auto identifier = list_pr.Child<IdentifierParseResult>(1).identifier;
	return TransformDotIdentifier(identifier);
}

} // namespace duckdb
```

The script can currently handle simple rule references, optional
results, repeat, choices and simple macros. Nested macro
(`Parens(List(Rule))`) is not yet supported.

Some grammar rules provide no semantic value, for example since they
only match a choice of keywords. Those grammar rules can be excluded
from generation by adding them to the `excluded` section in
`grammar_types.yml`

The script also generates declarations for the `Internal` and
`user-implemented` stubs with the correct types:
```cpp
static unique_ptr<SQLStatement> TransformUseStatementInternal(PEGTransformer &transformer, ParseResult &parse_result);
static unique_ptr<SQLStatement> TransformUseStatement(QualifiedName use_target);
```
This is included in the `PEGTransformerFactory`:
https://github.com/Dtenwolde/duckdb/blob/0761edb240deb637cf747fbdeb746157c54d95a0/src/include/duckdb/parser/peg/transformer/peg_transformer.hpp#L1213

I added the generate script to `build_grammar.sh` together with a `make
format-fix` since the generated code does not always pass the format
checker, but perhaps this is not the cleanest. The generator script can
be improved in this aspect

For future work:
- Make the script work with nested macros
- Generate more complex grammar files / rewrite the user-implemented
stubs
- Make script emit properly formatted code
Fix NightlyTests CI job `Release Assertions with Clang` linking
[failure](https://github.com/duckdb/duckdb/actions/runs/25645238791/job/75274381203#step:8:546):
```

mold: error: undefined symbol: __atomic_is_lock_free
>>> referenced by ub_test_common.cpp
>>>               test/common/CMakeFiles/test_common.dir/ub_test_common.cpp.o:(____C_A_T_C_H____T_E_S_T____40())>>> referenced by ub_test_common.cpp
>>>               test/common/CMakeFiles/test_common.dir/ub_test_common.cpp.o:(____C_A_T_C_H____T_E_S_T____40())
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
[17/18] Linking CXX executable duckdb
```

The C++ test `test/common/test_checked_integer.cpp:247` calls
`std::atomic<int64_t>().is_lock_free()`:
```c++
TEST_CASE("CheckedInteger atomic operations", "[checked_integer]") {
        SECTION("is_lock_free / is_always_lock_free forward to underlying type") {
                std::atomic<ci64> a(0);
                REQUIRE(a.is_lock_free() == std::atomic<int64_t>().is_lock_free());
                REQUIRE(std::atomic<ci64>::is_always_lock_free == std::atomic<int64_t>::is_always_lock_free);
        }
```

On Linux/clang, `is_lock_free()` can lower to the runtime symbol
`__atomic_is_lock_free`, which lives in `libatomic`.

Fixes https://github.com/duckdblabs/duckdb-internal/issues/9167.
Fix: https://github.com/duckdblabs/duckdb-internal/issues/9063

The following query failed with an internal exception, while this was
previously a parser error:
```sql
n''''+ST�&6>ttacrotablesampimportleT�&6>tta�rordeT�Niqeger[], e�tfilter%
t_not_currorderent;
;
```

It looks like an odd query, but it seems to be caused by the `[]` which
matches a `SliceExpression`. Every element of this subrule is optional:
`SliceBound <- Expression? EndSliceBound? StepSliceBound?`
But a completely empty `SliceExpression` should not be accepted
(Previously led to a generic `syntax error`). So now we check if the
result is empty and throw a descriptive error if this happens.

I am surprised this wasn't previously caught in other tests
@krleonid krleonid changed the base branch from v1.5-variegata to main May 12, 2026 09:49
…ed > 1s

Logs via DUCKDB_LOG_WARNING with queue_size and dead_nodes count.
Also prints to stderr in DEBUG builds for local debugging.

Requires: SET GLOBAL enable_logging = true; SET GLOBAL logging_level = 'warning';

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@krleonid krleonid force-pushed the duckdb_log_clean branch from cf4e592 to 1c06de8 Compare May 12, 2026 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants