Add warning log to EvictionQueue::Purge for slow purge detection#45
Open
krleonid wants to merge 65 commits into
Open
Add warning log to EvictionQueue::Purge for slow purge detection#45krleonid wants to merge 65 commits into
krleonid wants to merge 65 commits into
Conversation
Hi team, I found for unsigned integer cast, stats are not applied as expected, which leads to unnecessary table scan. ```sql memory D CREATE TABLE t_signed AS SELECT i::INT AS id FROM range(0, 100) t(i); memory D EXPLAIN SELECT * FROM t_signed WHERE id::BIGINT > 100; ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Physical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ EMPTY_RESULT │ └───────────────────────────┘ memory D CREATE TABLE t_unsigned AS SELECT i::UINTEGER AS id FROM range(0, 100) t(i); memory D EXPLAIN SELECT * FROM t_unsigned WHERE id::BIGINT > 100; ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Physical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ SEQ_SCAN │ │ ──────────────────── │ │ Table: │ │ memory.main.t_unsigned │ │ │ │ Type: Sequential Scan │ │ Projections: id │ │ │ │ Filters: │ │ (CAST(id AS BIGINT) > 100)│ │ │ │ ~20 rows │ └───────────────────────────┘ ``` In the above example, table `t_unsigned`'s min/max stats should be kept IMO, which could avoid the scan.
- Compute version matrix using git tags instead of using hardcoded version list. - Group bwc test jobs by minor duckdb version. - Fix test-utils commit. We should not use `Update extension` commits because that [fails](https://github.com/duckdb/duckdb/actions/runs/25500841929/job/74839369343#step:8:111): ``` File "/home/runner/work/duckdb/duckdb/test/bwc/runner.py", line 201, in do_run_test raise RuntimeError( RuntimeError: test-utils extension version mismatch: v1.1.0 is @ 5b9c733 and v1.6.0-dev5484 is @ 7074208 Error: Process completed with exit code 1. ``` ### CI failure is unrelated ``` Run python scripts/plan_cost_runner.py --old build/base/release/duckdb --new build/current/release/duckdb --dir=benchmark/imdb_plan_cost tcache_thread_shutdown(): unaligned tcache chunk detected Aborted (core dumped) ```
…n` to settings and test configs (duckdb#22553) Previously these passes were done only in debug mode: * `ColumnLifetimeAnalyzer` would add a bunch of extra projections to stress test projection maps * We would call `ColumnBindingResolver::Verify` after planning and after every optimizer pass to ensure we never created broken plans, also not as intermediate optimizer steps This PR splits these off into separate options and adds test config runs for them. This actually found a problem with the `CommonAggregateOptimizer` that was previously hidden by the verification projections.
Currently, only `src/libduckdb.so` exists in the uploaded artifact: ``` release artifact includes: duckdb repository/c3d4868dfa/linux_amd64/core_functions.duckdb_extension repository/c3d4868dfa/linux_amd64/parquet.duckdb_extension src/libduckdb.so test/extension/loadable_extension_demo.duckdb_extension test/extension/loadable_extension_optimizer_demo.duckdb_extension test/unittest ``` but `src/libduckdb_static.a` is missing. This PR adds the static archive file also to the uploaded artifact file.
Part of https://github.com/duckdblabs/duckdb-internal/issues/6380 The goal is to generate the parts of the `Transformer` that can be generated. Specifically, extracting the types from `ParseResult` at the correct indices. The script `gen_transformer_v2` (Will delete the other script and rename this one in the future), takes the parsed grammar files, adds the type information in `grammar_types.yml` and generated `TransformRuleInternal` methods. This will automate the extraction of manual retrieval of `ParseResult` with indices and make it easier for developers to write transformer functions. The script does not register generated functions yet (to be done). It currently generates only for the following grammar files: - `transaction.gram` - `use.gram` - `export.gram` - `detach.gram` Take `use.gram` as an example: ``` UseStatement <- 'USE' UseTarget UseTarget <- UseTargetCatalogSchema / SchemaName / CatalogName UseTargetCatalogSchema <- CatalogName '.' ReservedSchemaName DotIdentifier* DotIdentifier <- '.' Identifier ``` The script generates the following file: ```cpp // AUTO-GENERATED by scripts/parser/gen_transformer_v2.py -- DO NOT EDIT #include "duckdb/parser/peg/transformer/peg_transformer.hpp" namespace duckdb { unique_ptr<SQLStatement> PEGTransformerFactory::TransformUseStatementInternal(PEGTransformer &transformer, ParseResult &parse_result) { auto &list_pr = parse_result.Cast<ListParseResult>(); auto use_target = transformer.Transform<QualifiedName>(list_pr, 1); return TransformUseStatement(use_target); } QualifiedName PEGTransformerFactory::TransformUseTargetInternal(PEGTransformer &transformer, ParseResult &parse_result) { auto &list_pr = parse_result.Cast<ListParseResult>(); auto &choice_pr = list_pr.Child<ChoiceParseResult>(0); return TransformUseTarget(transformer, choice_pr.GetResult()); } QualifiedName PEGTransformerFactory::TransformUseTargetCatalogSchemaInternal(PEGTransformer &transformer, ParseResult &parse_result) { auto &list_pr = parse_result.Cast<ListParseResult>(); auto catalog_name = list_pr.Child<IdentifierParseResult>(0).identifier; auto reserved_schema_name = list_pr.Child<IdentifierParseResult>(2).identifier; auto &dot_identifier_opt = list_pr.Child<OptionalParseResult>(3); vector<string> dot_identifier; if (dot_identifier_opt.HasResult()) { auto &dot_identifier_repeat = dot_identifier_opt.GetResult().Cast<RepeatParseResult>(); for (auto &dot_identifier_item : dot_identifier_repeat.GetChildren()) { dot_identifier.push_back(transformer.Transform<string>(dot_identifier_item)); } } return TransformUseTargetCatalogSchema(catalog_name, reserved_schema_name, dot_identifier); } string PEGTransformerFactory::TransformDotIdentifierInternal(PEGTransformer &transformer, ParseResult &parse_result) { auto &list_pr = parse_result.Cast<ListParseResult>(); auto identifier = list_pr.Child<IdentifierParseResult>(1).identifier; return TransformDotIdentifier(identifier); } } // namespace duckdb ``` The script can currently handle simple rule references, optional results, repeat, choices and simple macros. Nested macro (`Parens(List(Rule))`) is not yet supported. Some grammar rules provide no semantic value, for example since they only match a choice of keywords. Those grammar rules can be excluded from generation by adding them to the `excluded` section in `grammar_types.yml` The script also generates declarations for the `Internal` and `user-implemented` stubs with the correct types: ```cpp static unique_ptr<SQLStatement> TransformUseStatementInternal(PEGTransformer &transformer, ParseResult &parse_result); static unique_ptr<SQLStatement> TransformUseStatement(QualifiedName use_target); ``` This is included in the `PEGTransformerFactory`: https://github.com/Dtenwolde/duckdb/blob/0761edb240deb637cf747fbdeb746157c54d95a0/src/include/duckdb/parser/peg/transformer/peg_transformer.hpp#L1213 I added the generate script to `build_grammar.sh` together with a `make format-fix` since the generated code does not always pass the format checker, but perhaps this is not the cleanest. The generator script can be improved in this aspect For future work: - Make the script work with nested macros - Generate more complex grammar files / rewrite the user-implemented stubs - Make script emit properly formatted code
Fix NightlyTests CI job `Release Assertions with Clang` linking [failure](https://github.com/duckdb/duckdb/actions/runs/25645238791/job/75274381203#step:8:546): ``` mold: error: undefined symbol: __atomic_is_lock_free >>> referenced by ub_test_common.cpp >>> test/common/CMakeFiles/test_common.dir/ub_test_common.cpp.o:(____C_A_T_C_H____T_E_S_T____40())>>> referenced by ub_test_common.cpp >>> test/common/CMakeFiles/test_common.dir/ub_test_common.cpp.o:(____C_A_T_C_H____T_E_S_T____40()) clang++: error: linker command failed with exit code 1 (use -v to see invocation) [17/18] Linking CXX executable duckdb ``` The C++ test `test/common/test_checked_integer.cpp:247` calls `std::atomic<int64_t>().is_lock_free()`: ```c++ TEST_CASE("CheckedInteger atomic operations", "[checked_integer]") { SECTION("is_lock_free / is_always_lock_free forward to underlying type") { std::atomic<ci64> a(0); REQUIRE(a.is_lock_free() == std::atomic<int64_t>().is_lock_free()); REQUIRE(std::atomic<ci64>::is_always_lock_free == std::atomic<int64_t>::is_always_lock_free); } ``` On Linux/clang, `is_lock_free()` can lower to the runtime symbol `__atomic_is_lock_free`, which lives in `libatomic`. Fixes https://github.com/duckdblabs/duckdb-internal/issues/9167.
Fix: https://github.com/duckdblabs/duckdb-internal/issues/9063 The following query failed with an internal exception, while this was previously a parser error: ```sql n''''+ST�&6>ttacrotablesampimportleT�&6>tta�rordeT�Niqeger[], e�tfilter% t_not_currorderent; ; ``` It looks like an odd query, but it seems to be caused by the `[]` which matches a `SliceExpression`. Every element of this subrule is optional: `SliceBound <- Expression? EndSliceBound? StepSliceBound?` But a completely empty `SliceExpression` should not be accepted (Previously led to a generic `syntax error`). So now we check if the result is empty and throw a descriptive error if this happens. I am surprised this wasn't previously caught in other tests
…ed > 1s Logs via DUCKDB_LOG_WARNING with queue_size and dead_nodes count. Also prints to stderr in DEBUG builds for local debugging. Requires: SET GLOBAL enable_logging = true; SET GLOBAL logging_level = 'warning'; Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cf4e592 to
1c06de8
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
DUCKDB_LOG_WARNINGtoEvictionQueue::Purge()when iterations > 10 or elapsed time > 1 secondPrinter::PrintFUsage
Test plan
duckdb_logs()when purge takes many iterations🤖 Generated with Claude Code