Skip to content

Route analytics queries by index setting, not table-name prefix#5429

Merged
ahkcs merged 2 commits into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:experiment/routing-by-dataformat-setting
May 11, 2026
Merged

Route analytics queries by index setting, not table-name prefix#5429
ahkcs merged 2 commits into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:experiment/routing-by-dataformat-setting

Conversation

@bowenlan-amzn
Copy link
Copy Markdown
Member

Description

Today `RestUnifiedQueryAction.isAnalyticsIndex` routes queries to the analytics engine when the source index name starts with `parquet_`. That's brittle — it conflates naming convention with storage type: an index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path, and an index named `parquet_foo` without the setting is mis-dispatched.

Switch to the authoritative signal: the `index.pluggable.dataformat.enabled` cluster-state setting. This is the same flag integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`) use to create analytics-backed indices, and what `FieldStorageResolver` reads to resolve field storage.

Routing behavior

`index.pluggable.dataformat.enabled` Dispatch
`true` analytics engine (DataFusion via `QueryPlanExecutor`)
absent / `false` / index missing sql-plugin Calcite → OpenSearch DSL path

Issues Resolved

Mustang rollout pre-work — aligns PPL/SQL routing with the index-setting-based model already used elsewhere in the engine.

Check List

  • New functionality includes testing.
  • New functionality has been documented. (javadoc on `isAnalyticsIndex`)
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 10, 2026

PR Reviewer Guide 🔍

(Review updated until commit f2e83ee)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The method isPluggableDataformatIndex returns false when indexMetadata is null (index does not exist), but the calling code in isAnalyticsIndex catches all exceptions and returns false. If indexMetadata.getSettings() or the setting getters throw an exception (e.g., due to malformed settings), the exception is silently swallowed. This could mask configuration errors or unexpected states, making debugging difficult when an index should route to analytics but doesn't due to a settings issue.

private boolean isPluggableDataformatIndex(String indexName) {
  var indexMetadata = clusterService.state().metadata().index(indexName);
  if (indexMetadata == null) {
    return false;
  }
  var settings = indexMetadata.getSettings();
  return IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(settings)
      && "parquet".equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings));
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 10, 2026

PR Code Suggestions ✨

Latest suggestions up to f2e83ee

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Cache cluster state reference

The method accesses cluster state without synchronization, which could lead to race
conditions if the cluster state changes between the null check and settings
retrieval. Consider caching the cluster state reference or using a snapshot to
ensure consistency throughout the method execution.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [106-114]

 private boolean isPluggableDataformatIndex(String indexName) {
-  var indexMetadata = clusterService.state().metadata().index(indexName);
+  var clusterState = clusterService.state();
+  var indexMetadata = clusterState.metadata().index(indexName);
   if (indexMetadata == null) {
     return false;
   }
   var settings = indexMetadata.getSettings();
   return IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(settings)
       && "parquet".equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings));
 }
Suggestion importance[1-10]: 4

__

Why: While caching the cluster state reference is a minor improvement for code clarity, the concern about race conditions is overstated. The clusterService.state() already returns a consistent snapshot, and the operations are read-only. The suggestion provides marginal benefit in terms of readability but doesn't address a critical issue.

Low

Previous suggestions

Suggestions up to commit 7f221e1
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null-safety for cluster state

Add null-safety check for clusterService.state() and metadata() to prevent potential
NullPointerException when cluster state is unavailable. This ensures the method
gracefully handles cluster state initialization or unavailability scenarios.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [105-109]

 private boolean isPluggableDataformatIndex(String indexName) {
-  var indexMetadata = clusterService.state().metadata().index(indexName);
+  var clusterState = clusterService.state();
+  if (clusterState == null || clusterState.metadata() == null) {
+    return false;
+  }
+  var indexMetadata = clusterState.metadata().index(indexName);
   return indexMetadata != null
       && IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(indexMetadata.getSettings());
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds defensive null checks for clusterService.state() and metadata(), which could prevent potential NullPointerException in edge cases. However, in typical OpenSearch plugin contexts, these are rarely null during normal operation, making this a minor defensive improvement rather than a critical fix.

Low
Suggestions up to commit f2bc431
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null-safety for cluster state

Add null-safety check for clusterService.state() and metadata() to prevent potential
NullPointerException. The cluster state or metadata could be null during cluster
initialization or shutdown scenarios.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [108-114]

 private boolean isPluggableDataformatIndex(String indexName) {
-  var indexMetadata = clusterService.state().metadata().index(indexName);
+  var clusterState = clusterService.state();
+  if (clusterState == null || clusterState.metadata() == null) {
+    return false;
+  }
+  var indexMetadata = clusterState.metadata().index(indexName);
   if (indexMetadata == null) {
     return false;
   }
   return indexMetadata.getSettings().getAsBoolean(PLUGGABLE_DATAFORMAT_ENABLED_SETTING, false);
 }
Suggestion importance[1-10]: 5

__

Why: While adding null checks for clusterService.state() and metadata() could prevent potential NPEs in edge cases, this is a defensive programming suggestion rather than a critical bug fix. The ClusterService typically guarantees non-null state in normal operation, and the existing null check for indexMetadata already handles the most common failure case.

Low
Suggestions up to commit cbc0ca7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add exception handling for cluster state access

The method accesses cluster state without proper synchronization or error handling.
Cluster state operations can throw exceptions if the cluster is in an unstable
state. Wrap the cluster state access in a try-catch block to prevent unexpected
failures from propagating.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [105-111]

 private boolean isPluggableDataformatIndex(String indexName) {
-  var indexMetadata = clusterService.state().metadata().index(indexName);
-  if (indexMetadata == null) {
+  try {
+    var indexMetadata = clusterService.state().metadata().index(indexName);
+    if (indexMetadata == null) {
+      return false;
+    }
+    return indexMetadata.getSettings().getAsBoolean(PLUGGABLE_DATAFORMAT_ENABLED_SETTING, false);
+  } catch (Exception e) {
     return false;
   }
-  return indexMetadata.getSettings().getAsBoolean(PLUGGABLE_DATAFORMAT_ENABLED_SETTING, false);
 }
Suggestion importance[1-10]: 5

__

Why: While adding error handling for cluster state access is a reasonable defensive practice, the suggestion doesn't provide strong evidence that clusterService.state().metadata().index() commonly throws exceptions in this context. The score reflects that this is a minor improvement for robustness rather than fixing a critical bug.

Low

@bowenlan-amzn bowenlan-amzn force-pushed the experiment/routing-by-dataformat-setting branch from cbc0ca7 to f2bc431 Compare May 10, 2026 04:48
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit f2bc431

Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics
engine when the source index name starts with `parquet_`. That's brittle —
it conflates naming convention with storage type. An index created without
the prefix but with pluggable dataformat enabled is silently sent to the
Lucene path; an index named `parquet_foo` without the setting is
mis-dispatched to analytics.

Use the authoritative signal instead: the `index.pluggable.dataformat.enabled`
setting on cluster-state metadata. This is the same setting integration tests
(`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to
create analytics-backed indices, and it's what `FieldStorageResolver` reads
to decide field-level storage.

Behavior:
- `index.pluggable.dataformat.enabled=true`  → analytics engine (DataFusion)
- flag absent / false / index missing         → Calcite→OpenSearch DSL path

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn force-pushed the experiment/routing-by-dataformat-setting branch from f2bc431 to 7f221e1 Compare May 10, 2026 05:56
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 7f221e1

private boolean isPluggableDataformatIndex(String indexName) {
var indexMetadata = clusterService.state().metadata().index(indexName);
return indexMetadata != null
&& IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(indexMetadata.getSettings());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    "index.pluggable.dataformat.enabled": true,
    "index.pluggable.dataformat": "composite",
    "index.composite.primary_data_format": "parquet"

I think we also need to check on index.composite.primary_data_format which would indicate if this is Parquet backed index.

If the pluggable dataformat is enabled index.pluggable.dataformat.enabled , but index.composite.primary_data_format : lucene , then we can still go the old route until AnalyticsPlugin supports pure Lucene Indices and DocValues.

Only route to the analytics path when both pluggable.dataformat.enabled=true
AND pluggable.dataformat=parquet. If the format is lucene (or anything else),
fall through to the standard Calcite→DSL path.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit f2e83ee

@ahkcs ahkcs merged commit f006e29 into opensearch-project:feature/mustang-ppl-integration May 11, 2026
34 of 36 checks passed
}
var settings = indexMetadata.getSettings();
return IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(settings)
&& "parquet".equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLUGGABLE_DATAFORMAT_VALUE_SETTING should be composite

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants