From 94e36d59a8da6c7711c008e2936d84e26f6a62b5 Mon Sep 17 00:00:00 2001 From: Kai Huang Date: Mon, 11 May 2026 16:25:14 -0700 Subject: [PATCH 1/3] Forward PPL_SYNTAX_LEGACY_PREFERRED through UnifiedQueryContext `RestUnifiedQueryAction.applyClusterOverrides` previously only forwarded `PPL_REX_MAX_MATCH_LIMIT` into the per-request `UnifiedQueryContext`. As a result, cluster-side updates to `plugins.ppl.syntax.legacy.preferred` were ignored on the analytics-engine route: `PPLQueryParser` -> `AstBuilder` -> `ArgumentFactory` read the legacy-preferred flag from the unified context's settings map, which never received the override. This caused queries like `top age` / `rare state` with `usenull` defaulting off to behave as if `usenull=true` on the analytics route. Refactor the override builder into a small helper and forward both `PPL_REX_MAX_MATCH_LIMIT` and `PPL_SYNTAX_LEGACY_PREFERRED`. Future keys can be added with a one-liner. Fixes `CalciteTopCommandIT.testTopCommandLegacyFalse` and `CalciteRareCommandIT.testRareCommandLegacyFalse` against the analytics route (`-Dtests.analytics.force_routing=true`). Signed-off-by: Kai Huang --- .../plugin/rest/RestUnifiedQueryAction.java | 26 ++++++++++--------- 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java b/plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java index e1bb84778d..95ff51b0b9 100644 --- a/plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java +++ b/plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java @@ -190,23 +190,25 @@ private UnifiedQueryContext buildContext(QueryType queryType, boolean profiling) * setting(String, Object)} API, keeping {@link UnifiedQueryContext} decoupled from any specific * {@link org.opensearch.sql.common.setting.Settings} implementation. * - *
Currently scoped to {@code plugins.ppl.rex.max_match.limit} — required so the unified path - * honors {@code _cluster/settings} updates for {@code rex max_match} (CalciteRexCommandIT's - * testRexMaxMatchConfigurableLimit). Add keys here if a future PR / IT depends on cluster-side - * fidelity for one of the other planning settings. + *
Add keys here if a future PR / IT depends on cluster-side fidelity for one of the other + * planning settings. */ private UnifiedQueryContext.Builder applyClusterOverrides(UnifiedQueryContext.Builder builder) { - Object rexLimit = - pluginSettings.getSettingValue( - org.opensearch.sql.common.setting.Settings.Key.PPL_REX_MAX_MATCH_LIMIT); - if (rexLimit != null) { - builder.setting( - org.opensearch.sql.common.setting.Settings.Key.PPL_REX_MAX_MATCH_LIMIT.getKeyValue(), - rexLimit); - } + forwardClusterSetting( + builder, org.opensearch.sql.common.setting.Settings.Key.PPL_REX_MAX_MATCH_LIMIT); + forwardClusterSetting( + builder, org.opensearch.sql.common.setting.Settings.Key.PPL_SYNTAX_LEGACY_PREFERRED); return builder; } + private void forwardClusterSetting( + UnifiedQueryContext.Builder builder, org.opensearch.sql.common.setting.Settings.Key key) { + Object value = pluginSettings.getSettingValue(key); + if (value != null) { + builder.setting(key.getKeyValue(), value); + } + } + /** * Extract the source index name by parsing the query and visiting the AST to find the Relation * node. Uses the context's parser which supports both PPL and SQL. From f6b8b98ab9e84bc5dc786a27c5fad92c8d007e76 Mon Sep 17 00:00:00 2001 From: Kai Huang Date: Mon, 11 May 2026 16:25:35 -0700 Subject: [PATCH 2/3] Stabilize rare/top tie-break by appending field columns to ROW_NUMBER order `CalciteRelNodeVisitor.visitRareTopN` lowers `rare`/`top` to a `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY count [DESC])` window. With only the count column in the ORDER BY clause, ties at the same count resolved via the upstream operator's insertion order, which differed between backends (in-process Calcite vs. analytics-engine vs. Lucene pushdown). On the analytics-engine route, `testRareWithGroup` failed because ROW_NUMBER picked NV at count=8 while the test expected AR. Append the rare/top field columns as secondary ASC keys so ties resolve alphabetically and deterministically across backends. 
This matches the behavior of the existing OpenSearch terms-aggregation pushdown, which tie-breaks on `_key:asc`. Update `RareTopPushdownRule` to accept the new shape: 1 or 2 order keys, where the (optional) second key must be the rare/top target field in ASC direction. The pushdown's wire payload is unchanged. Update the matching unit-test expectations in `CalcitePPLRareTopNTest` (11 RelNode/result/SparkSQL strings) and 5 explain YAML fixtures. Fixes `CalciteRareCommandIT.testRareWithGroup` against the analytics route (and removes the same class of tie-break flakiness across other rare/top tests). Signed-off-by: Kai Huang --- .../sql/calcite/CalciteRelNodeVisitor.java | 10 +- .../calcite/explain_nested_agg_top_push.yaml | 2 +- .../calcite/explain_rare_usenull_false.yaml | 2 +- .../calcite/explain_rare_usenull_true.yaml | 4 +- .../calcite/explain_top_usenull_false.yaml | 2 +- .../calcite/explain_top_usenull_true.yaml | 4 +- .../planner/rules/RareTopPushdownRule.java | 30 ++++- .../ppl/calcite/CalcitePPLRareTopNTest.java | 106 +++++++++--------- 8 files changed, 95 insertions(+), 65 deletions(-) diff --git a/core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java b/core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java index 1251f51b13..eae02484c9 100644 --- a/core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java +++ b/core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java @@ -3038,6 +3038,14 @@ public RelNode visitRareTopN(RareTopN node, CalcitePlanContext context) { countField = context.relBuilder.field(countFieldName); } + // Append the rare/top field columns as secondary order keys so ties in the count column + // resolve deterministically. Without this, ROW_NUMBER's tie-break is insertion-order + // dependent and varies between backends (e.g. analytics-engine vs in-process Calcite). 
+ List<RexNode> tieBreakKeys = rexVisitor.analyze(fieldList, context); + List<RexNode> orderKeys = new ArrayList<>(tieBreakKeys.size() + 1); + orderKeys.add(countField); + orderKeys.addAll(tieBreakKeys); + RexNode rowNumberWindowOver = PlanUtils.makeOver( context, @@ -3045,7 +3053,7 @@ public RelNode visitRareTopN(RareTopN node, CalcitePlanContext context) { null, List.of(), partitionKeys, - List.of(countField), + orderKeys, WindowFrame.toCurrentRow()); context.relBuilder.projectPlus( context.relBuilder.alias(rowNumberWindowOver, ROW_NUMBER_COLUMN_FOR_RARE_TOP)); diff --git a/integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml b/integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml index 65f3ed320f..7fedc9a434 100644 --- a/integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml +++ b/integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_top_push.yaml @@ -3,7 +3,7 @@ calcite: LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) LogicalProject(address.city=[$0], count=[$1]) LogicalFilter(condition=[<=($2, 10)]) - LogicalProject(address.city=[$0], count=[$1], _row_number_rare_top_=[ROW_NUMBER() OVER (ORDER BY $1 DESC)]) + LogicalProject(address.city=[$0], count=[$1], _row_number_rare_top_=[ROW_NUMBER() OVER (ORDER BY $1 DESC, $0)]) LogicalAggregate(group=[{0}], count=[COUNT()]) LogicalProject(address.city=[$3]) LogicalFilter(condition=[IS NOT NULL($3)]) diff --git a/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_false.yaml b/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_false.yaml index 9c5157c72b..eb0c361ad8 100644 --- a/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_false.yaml +++ b/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_false.yaml @@ -3,7 +3,7 @@ calcite: LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) LogicalProject(gender=[$0],
state=[$1], count=[$2]) LogicalFilter(condition=[<=($3, 2)]) - LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2)]) + LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2, $1)]) LogicalAggregate(group=[{0, 1}], count=[COUNT()]) LogicalProject(gender=[$4], state=[$7]) LogicalFilter(condition=[AND(IS NOT NULL($4), IS NOT NULL($7))]) diff --git a/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_true.yaml b/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_true.yaml index 7bc4aa1e21..f936ad008e 100644 --- a/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_true.yaml +++ b/integ-test/src/test/resources/expectedOutput/calcite/explain_rare_usenull_true.yaml @@ -3,12 +3,12 @@ calcite: LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) LogicalProject(gender=[$0], state=[$1], count=[$2]) LogicalFilter(condition=[<=($3, 2)]) - LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2)]) + LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2, $1)]) LogicalAggregate(group=[{0, 1}], count=[COUNT()]) LogicalProject(gender=[$4], state=[$7]) CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) physical: | EnumerableLimit(fetch=[10000]) EnumerableCalc(expr#0..3=[{inputs}], expr#4=[2], expr#5=[<=($t3, $t4)], proj#0..2=[{exprs}], $condition=[$t5]) - EnumerableWindow(window#0=[window(partition {0} order by [2] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]) + EnumerableWindow(window#0=[window(partition {0} order by [2, 1] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]) CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], 
PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},count=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":true,"missing_order":"first","order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":true,"missing_order":"first","order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) \ No newline at end of file diff --git a/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_false.yaml b/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_false.yaml index 21457c6170..bcb8a6c16f 100644 --- a/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_false.yaml +++ b/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_false.yaml @@ -3,7 +3,7 @@ calcite: LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) LogicalProject(gender=[$0], state=[$1], count=[$2]) LogicalFilter(condition=[<=($3, 2)]) - LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC)]) + LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC, $1)]) LogicalAggregate(group=[{0, 1}], count=[COUNT()]) LogicalProject(gender=[$4], state=[$7]) LogicalFilter(condition=[AND(IS NOT NULL($4), IS NOT NULL($7))]) diff --git a/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_true.yaml b/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_true.yaml index 51ffb88340..704b043dce 100644 --- a/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_true.yaml +++ b/integ-test/src/test/resources/expectedOutput/calcite/explain_top_usenull_true.yaml @@ -3,12 +3,12 @@ calcite: 
LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT]) LogicalProject(gender=[$0], state=[$1], count=[$2]) LogicalFilter(condition=[<=($3, 2)]) - LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC)]) + LogicalProject(gender=[$0], state=[$1], count=[$2], _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC, $1)]) LogicalAggregate(group=[{0, 1}], count=[COUNT()]) LogicalProject(gender=[$4], state=[$7]) CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) physical: | EnumerableLimit(fetch=[10000]) EnumerableCalc(expr#0..3=[{inputs}], expr#4=[2], expr#5=[<=($t3, $t4)], proj#0..2=[{exprs}], $condition=[$t5]) - EnumerableWindow(window#0=[window(partition {0} order by [2 DESC] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]) + EnumerableWindow(window#0=[window(partition {0} order by [2 DESC, 1] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]) CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},count=COUNT())], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":true,"missing_order":"first","order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":true,"missing_order":"first","order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) \ No newline at end of file diff --git a/opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java b/opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java index 5a815e8745..2bee10c130 100644 --- 
a/opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java +++ b/opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java @@ -8,6 +8,7 @@ import java.util.List; import java.util.function.Predicate; import org.apache.calcite.plan.RelOptRuleCall; +import org.apache.calcite.rel.RelFieldCollation; import org.apache.calcite.rel.logical.LogicalFilter; import org.apache.calcite.rel.logical.LogicalProject; import org.apache.calcite.rel.rules.SubstitutionRule; @@ -49,22 +50,39 @@ protected void onMatchImpl(RelOptRuleCall call) { List<Integer> groupIndices = PlanUtils.getSelectColumns(windows.getFirst().partitionKeys); List<String> byList = groupIndices.stream().map(fieldNameList::get).toList(); - if (windows.getFirst().orderKeys.size() != 1) { + // ORDER BY must be either the count column alone, or the count column followed by the + // rare/top target field ASC (the stable tie-break inserted by visitRareTopN). The pushdown + // to the OpenSearch terms aggregation naturally tie-breaks on `_key:asc`, so the latter + // shape lowers to the same wire request as the single-key shape.
+ List<RexFieldCollation> orderKeys = windows.getFirst().orderKeys; + if (orderKeys.isEmpty() || orderKeys.size() > 2) { return; } - RexFieldCollation orderKey = windows.getFirst().orderKeys.getFirst(); - List<Integer> orderIndices = PlanUtils.getSelectColumns(List.of(orderKey.getKey())); - List<String> orderList = orderIndices.stream().map(fieldNameList::get).toList(); + RexFieldCollation primaryOrderKey = orderKeys.getFirst(); + List<Integer> primaryOrderIndices = + PlanUtils.getSelectColumns(List.of(primaryOrderKey.getKey())); + List<String> primaryOrderList = primaryOrderIndices.stream().map(fieldNameList::get).toList(); List<String> targetList = fieldNameList.stream() .filter(Predicate.not(byList::contains)) - .filter(Predicate.not(orderList::contains)) + .filter(Predicate.not(primaryOrderList::contains)) .toList(); if (targetList.size() != 1) { return; } String targetName = targetList.getFirst(); - digest = new RareTopDigest(targetName, byList, number, orderKey.getDirection()); + if (orderKeys.size() == 2) { + RexFieldCollation tieBreakKey = orderKeys.get(1); + if (tieBreakKey.getDirection() != RelFieldCollation.Direction.ASCENDING) { + return; + } + List<Integer> tieBreakIndices = PlanUtils.getSelectColumns(List.of(tieBreakKey.getKey())); + List<String> tieBreakList = tieBreakIndices.stream().map(fieldNameList::get).toList(); + if (tieBreakList.size() != 1 || !tieBreakList.getFirst().equals(targetName)) { + return; + } + } + digest = new RareTopDigest(targetName, byList, number, primaryOrderKey.getDirection()); } catch (Exception e) { return; } diff --git a/ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRareTopNTest.java b/ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRareTopNTest.java index 2fcee84931..21aa15c2e6 100644 --- a/ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRareTopNTest.java +++ b/ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLRareTopNTest.java @@ -27,7 +27,7 @@ public void testRare() { "LogicalProject(JOB=[$0], count=[$1])\n" + " LogicalFilter(condition=[<=($2, 10)])\n"
+ " LogicalProject(JOB=[$0], count=[$1], _row_number_rare_top_=[ROW_NUMBER() OVER" - + " (ORDER BY $1)])\n" + + " (ORDER BY $1, $0)])\n" + " LogicalAggregate(group=[{0}], count=[COUNT()])\n" + " LogicalProject(JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -38,14 +38,14 @@ public void testRare() { + "JOB=PRESIDENT; count=1\n" + "JOB=ANALYST; count=2\n" + "JOB=MANAGER; count=3\n" - + "JOB=SALESMAN; count=4\n" - + "JOB=CLERK; count=4\n"; + + "JOB=CLERK; count=4\n" + + "JOB=SALESMAN; count=4\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `JOB`, `count`\n" + "FROM (SELECT `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (ORDER BY COUNT(*) NULLS" - + " LAST) `_row_number_rare_top_`\n" + + " LAST, `JOB` NULLS LAST) `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -60,7 +60,7 @@ public void testRareBy() { "LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2, $1)])\n" + " LogicalAggregate(group=[{0, 1}], count=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -69,20 +69,20 @@ public void testRareBy() { String expectedResult = "" + "DEPTNO=20; JOB=MANAGER; count=1\n" - + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=20; JOB=ANALYST; count=2\n" - + "DEPTNO=10; JOB=MANAGER; count=1\n" + + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=10; JOB=CLERK; count=1\n" + + "DEPTNO=10; JOB=MANAGER; count=1\n" + "DEPTNO=10; JOB=PRESIDENT; count=1\n" - + "DEPTNO=30; JOB=MANAGER; count=1\n" + "DEPTNO=30; JOB=CLERK; count=1\n" + + "DEPTNO=30; JOB=MANAGER; count=1\n" + "DEPTNO=30; JOB=SALESMAN; count=4\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT 
`DEPTNO`, `JOB`, `count`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST, `JOB` NULLS LAST) `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `DEPTNO`, `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -97,7 +97,7 @@ public void testRareDisableShowCount() { "LogicalProject(DEPTNO=[$0], JOB=[$1])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2, $1)])\n" + " LogicalAggregate(group=[{0, 1}], count=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -106,20 +106,20 @@ public void testRareDisableShowCount() { String expectedResult = "" + "DEPTNO=20; JOB=MANAGER\n" - + "DEPTNO=20; JOB=CLERK\n" + "DEPTNO=20; JOB=ANALYST\n" - + "DEPTNO=10; JOB=MANAGER\n" + + "DEPTNO=20; JOB=CLERK\n" + "DEPTNO=10; JOB=CLERK\n" + + "DEPTNO=10; JOB=MANAGER\n" + "DEPTNO=10; JOB=PRESIDENT\n" - + "DEPTNO=30; JOB=MANAGER\n" + "DEPTNO=30; JOB=CLERK\n" + + "DEPTNO=30; JOB=MANAGER\n" + "DEPTNO=30; JOB=SALESMAN\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST, `JOB` NULLS LAST) `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `DEPTNO`, `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -134,7 +134,7 @@ public void testRareCountField() { "LogicalProject(DEPTNO=[$0], JOB=[$1], my_cnt=[$2])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], my_cnt=[$2]," - + " 
_row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2, $1)])\n" + " LogicalAggregate(group=[{0, 1}], my_cnt=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -143,20 +143,20 @@ public void testRareCountField() { String expectedResult = "" + "DEPTNO=20; JOB=MANAGER; my_cnt=1\n" - + "DEPTNO=20; JOB=CLERK; my_cnt=2\n" + "DEPTNO=20; JOB=ANALYST; my_cnt=2\n" - + "DEPTNO=10; JOB=MANAGER; my_cnt=1\n" + + "DEPTNO=20; JOB=CLERK; my_cnt=2\n" + "DEPTNO=10; JOB=CLERK; my_cnt=1\n" + + "DEPTNO=10; JOB=MANAGER; my_cnt=1\n" + "DEPTNO=10; JOB=PRESIDENT; my_cnt=1\n" - + "DEPTNO=30; JOB=MANAGER; my_cnt=1\n" + "DEPTNO=30; JOB=CLERK; my_cnt=1\n" + + "DEPTNO=30; JOB=MANAGER; my_cnt=1\n" + "DEPTNO=30; JOB=SALESMAN; my_cnt=4\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`, `my_cnt`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `my_cnt`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST, `JOB` NULLS LAST) `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `DEPTNO`, `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -171,7 +171,7 @@ public void testRareUseNullFalse() { "LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2, $1)])\n" + " LogicalAggregate(group=[{0, 1}], count=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalFilter(condition=[AND(IS NOT NULL($7), IS NOT NULL($2))])\n" @@ -181,20 +181,20 @@ public void testRareUseNullFalse() { String expectedResult = "" + "DEPTNO=20; JOB=MANAGER; count=1\n" - 
+ "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=20; JOB=ANALYST; count=2\n" - + "DEPTNO=10; JOB=MANAGER; count=1\n" + + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=10; JOB=CLERK; count=1\n" + + "DEPTNO=10; JOB=MANAGER; count=1\n" + "DEPTNO=10; JOB=PRESIDENT; count=1\n" - + "DEPTNO=30; JOB=MANAGER; count=1\n" + "DEPTNO=30; JOB=CLERK; count=1\n" + + "DEPTNO=30; JOB=MANAGER; count=1\n" + "DEPTNO=30; JOB=SALESMAN; count=4\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`, `count`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) NULLS LAST, `JOB` NULLS LAST) `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "WHERE `DEPTNO` IS NOT NULL AND `JOB` IS NOT NULL\n" + "GROUP BY `DEPTNO`, `JOB`) `t2`\n" @@ -230,7 +230,7 @@ public void testTop() { "LogicalProject(JOB=[$0], count=[$1])\n" + " LogicalFilter(condition=[<=($2, 10)])\n" + " LogicalProject(JOB=[$0], count=[$1], _row_number_rare_top_=[ROW_NUMBER() OVER" - + " (ORDER BY $1 DESC)])\n" + + " (ORDER BY $1 DESC, $0)])\n" + " LogicalAggregate(group=[{0}], count=[COUNT()])\n" + " LogicalProject(JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -238,8 +238,8 @@ public void testTop() { String expectedResult = "" - + "JOB=SALESMAN; count=4\n" + "JOB=CLERK; count=4\n" + + "JOB=SALESMAN; count=4\n" + "JOB=MANAGER; count=3\n" + "JOB=ANALYST; count=2\n" + "JOB=PRESIDENT; count=1\n"; @@ -248,7 +248,7 @@ public void testTop() { String expectedSparkSql = "SELECT `JOB`, `count`\n" + "FROM (SELECT `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC" - + " NULLS FIRST) `_row_number_rare_top_`\n" + + " NULLS FIRST, `JOB` NULLS LAST) `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -263,7 +263,7 @@ public void testTopBy() { "LogicalProject(DEPTNO=[$0], 
JOB=[$1], count=[$2])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC, $1)])\n" + " LogicalAggregate(group=[{0, 1}], count=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -271,21 +271,22 @@ public void testTopBy() { String expectedResult = "" - + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=20; JOB=ANALYST; count=2\n" + + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=20; JOB=MANAGER; count=1\n" - + "DEPTNO=10; JOB=MANAGER; count=1\n" + "DEPTNO=10; JOB=CLERK; count=1\n" + + "DEPTNO=10; JOB=MANAGER; count=1\n" + "DEPTNO=10; JOB=PRESIDENT; count=1\n" + "DEPTNO=30; JOB=SALESMAN; count=4\n" - + "DEPTNO=30; JOB=MANAGER; count=1\n" - + "DEPTNO=30; JOB=CLERK; count=1\n"; + + "DEPTNO=30; JOB=CLERK; count=1\n" + + "DEPTNO=30; JOB=MANAGER; count=1\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`, `count`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST, `JOB` NULLS LAST)" + + " `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `DEPTNO`, `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -300,7 +301,7 @@ public void testTopDisableShowCount() { "LogicalProject(DEPTNO=[$0], JOB=[$1])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC, $1)])\n" + " LogicalAggregate(group=[{0, 1}], count=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " 
LogicalTableScan(table=[[scott, EMP]])\n"; @@ -308,21 +309,22 @@ public void testTopDisableShowCount() { String expectedResult = "" - + "DEPTNO=20; JOB=CLERK\n" + "DEPTNO=20; JOB=ANALYST\n" + + "DEPTNO=20; JOB=CLERK\n" + "DEPTNO=20; JOB=MANAGER\n" - + "DEPTNO=10; JOB=MANAGER\n" + "DEPTNO=10; JOB=CLERK\n" + + "DEPTNO=10; JOB=MANAGER\n" + "DEPTNO=10; JOB=PRESIDENT\n" + "DEPTNO=30; JOB=SALESMAN\n" - + "DEPTNO=30; JOB=MANAGER\n" - + "DEPTNO=30; JOB=CLERK\n"; + + "DEPTNO=30; JOB=CLERK\n" + + "DEPTNO=30; JOB=MANAGER\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST, `JOB` NULLS LAST)" + + " `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `DEPTNO`, `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -337,7 +339,7 @@ public void testTopCountField() { "LogicalProject(DEPTNO=[$0], JOB=[$1], my_cnt=[$2])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], my_cnt=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC, $1)])\n" + " LogicalAggregate(group=[{0, 1}], my_cnt=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalTableScan(table=[[scott, EMP]])\n"; @@ -345,21 +347,22 @@ public void testTopCountField() { String expectedResult = "" - + "DEPTNO=20; JOB=CLERK; my_cnt=2\n" + "DEPTNO=20; JOB=ANALYST; my_cnt=2\n" + + "DEPTNO=20; JOB=CLERK; my_cnt=2\n" + "DEPTNO=20; JOB=MANAGER; my_cnt=1\n" - + "DEPTNO=10; JOB=MANAGER; my_cnt=1\n" + "DEPTNO=10; JOB=CLERK; my_cnt=1\n" + + "DEPTNO=10; JOB=MANAGER; my_cnt=1\n" + "DEPTNO=10; JOB=PRESIDENT; my_cnt=1\n" + "DEPTNO=30; JOB=SALESMAN; my_cnt=4\n" - + "DEPTNO=30; JOB=MANAGER; my_cnt=1\n" - + 
"DEPTNO=30; JOB=CLERK; my_cnt=1\n"; + + "DEPTNO=30; JOB=CLERK; my_cnt=1\n" + + "DEPTNO=30; JOB=MANAGER; my_cnt=1\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`, `my_cnt`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `my_cnt`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST) `_row_number_rare_top_`\n" + + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST, `JOB` NULLS LAST)" + + " `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "GROUP BY `DEPTNO`, `JOB`) `t1`\n" + "WHERE `_row_number_rare_top_` <= 10"; @@ -374,7 +377,7 @@ public void testTopUseNullFalse() { "LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2])\n" + " LogicalFilter(condition=[<=($3, 10)])\n" + " LogicalProject(DEPTNO=[$0], JOB=[$1], count=[$2]," - + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC)])\n" + + " _row_number_rare_top_=[ROW_NUMBER() OVER (PARTITION BY $0 ORDER BY $2 DESC, $1)])\n" + " LogicalAggregate(group=[{0, 1}], count=[COUNT()])\n" + " LogicalProject(DEPTNO=[$7], JOB=[$2])\n" + " LogicalFilter(condition=[AND(IS NOT NULL($7), IS NOT NULL($2))])\n" @@ -383,21 +386,22 @@ public void testTopUseNullFalse() { String expectedResult = "" - + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=20; JOB=ANALYST; count=2\n" + + "DEPTNO=20; JOB=CLERK; count=2\n" + "DEPTNO=20; JOB=MANAGER; count=1\n" - + "DEPTNO=10; JOB=MANAGER; count=1\n" + "DEPTNO=10; JOB=CLERK; count=1\n" + + "DEPTNO=10; JOB=MANAGER; count=1\n" + "DEPTNO=10; JOB=PRESIDENT; count=1\n" + "DEPTNO=30; JOB=SALESMAN; count=4\n" - + "DEPTNO=30; JOB=MANAGER; count=1\n" - + "DEPTNO=30; JOB=CLERK; count=1\n"; + + "DEPTNO=30; JOB=CLERK; count=1\n" + + "DEPTNO=30; JOB=MANAGER; count=1\n"; verifyResult(root, expectedResult); String expectedSparkSql = "SELECT `DEPTNO`, `JOB`, `count`\n" + "FROM (SELECT `DEPTNO`, `JOB`, COUNT(*) `count`, ROW_NUMBER() OVER (PARTITION BY" - + " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST) `_row_number_rare_top_`\n" + 
+ " `DEPTNO` ORDER BY COUNT(*) DESC NULLS FIRST, `JOB` NULLS LAST)" + + " `_row_number_rare_top_`\n" + "FROM `scott`.`EMP`\n" + "WHERE `DEPTNO` IS NOT NULL AND `JOB` IS NOT NULL\n" + "GROUP BY `DEPTNO`, `JOB`) `t2`\n" From 59c9f7dcc6a8a67959bbd41ab07e0aa06f91eddf Mon Sep 17 00:00:00 2001 From: Kai Huang Date: Mon, 11 May 2026 16:43:39 -0700 Subject: [PATCH 3/3] Reorder doctest null rows in rare.md / top.md to match new ASC tie-break The stable tie-break added in the previous commit appends the rare/top field columns as secondary ASC keys to the `ROW_NUMBER` `ORDER BY`. ASC ordering uses NULLS LAST by default, so the existing `usenull=true email` examples in `docs/user/ppl/cmd/rare.md` and `docs/user/ppl/cmd/top.md` now emit `null` last instead of first. Update the doctest expected output blocks accordingly. No behavior change for the non-null rows. Signed-off-by: Kai Huang --- docs/user/ppl/cmd/rare.md | 2 +- docs/user/ppl/cmd/top.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/user/ppl/cmd/rare.md b/docs/user/ppl/cmd/rare.md index c3f28eb1a7..8014e95785 100644 --- a/docs/user/ppl/cmd/rare.md +++ b/docs/user/ppl/cmd/rare.md @@ -152,10 +152,10 @@ fetched rows / total rows = 4/4 +-----------------------+-------+ | email | count | |-----------------------+-------| -| null | 1 | | amberduke@pyrami.com | 1 | | daleadams@boink.com | 1 | | hattiebond@netagy.com | 1 | +| null | 1 | +-----------------------+-------+ ``` diff --git a/docs/user/ppl/cmd/top.md b/docs/user/ppl/cmd/top.md index 4f93c8fd58..56f8ba7481 100644 --- a/docs/user/ppl/cmd/top.md +++ b/docs/user/ppl/cmd/top.md @@ -166,10 +166,10 @@ fetched rows / total rows = 4/4 +-----------------------+-------+ | email | count | |-----------------------+-------| -| null | 1 | | amberduke@pyrami.com | 1 | | daleadams@boink.com | 1 | | hattiebond@netagy.com | 1 | +| null | 1 | +-----------------------+-------+ ```