Route analytics queries by index setting, not table-name prefix#5429
Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics engine when the source index name starts with `parquet_`. That's brittle: it conflates naming convention with storage type. An index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path; an index named `parquet_foo` without the setting is mis-dispatched to analytics.

Use the authoritative signal instead: the `index.pluggable.dataformat.enabled` setting on cluster-state metadata. This is the same setting integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to create analytics-backed indices, and it's what `FieldStorageResolver` reads to decide field-level storage.

Behavior:
- `index.pluggable.dataformat.enabled=true` → analytics engine (DataFusion)
- flag absent / false / index missing → Calcite→OpenSearch DSL path

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
    private boolean isPluggableDataformatIndex(String indexName) {
        var indexMetadata = clusterService.state().metadata().index(indexName);
        return indexMetadata != null
            && IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(indexMetadata.getSettings());
"index.pluggable.dataformat.enabled": true,
"index.pluggable.dataformat": "composite",
"index.composite.primary_data_format": "parquet"
I think we also need to check `index.composite.primary_data_format`, which indicates whether this is a Parquet-backed index.
If pluggable dataformat is enabled (`index.pluggable.dataformat.enabled`) but `index.composite.primary_data_format` is `lucene`, we can still take the old route until AnalyticsPlugin supports pure Lucene indices and DocValues.
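To make the suggestion concrete, here is a minimal, self-contained sketch of a check that reads all three settings quoted above. It is not the PR's code: a plain `Map<String, String>` stands in for the index settings object, and the class and method names (`ParquetIndexCheckSketch`, `isParquetBackedIndex`) are hypothetical. Only the setting keys and values come from the comment.

```java
import java.util.Map;

// Hypothetical stand-in for the routing check; not the actual OpenSearch classes.
final class ParquetIndexCheckSketch {
    // Setting keys as quoted in the review comment.
    static final String ENABLED = "index.pluggable.dataformat.enabled";
    static final String DATAFORMAT = "index.pluggable.dataformat";
    static final String PRIMARY_FORMAT = "index.composite.primary_data_format";

    /**
     * Returns true only when pluggable dataformat is enabled, the dataformat
     * is "composite", and the primary data format is "parquet".
     * A null map models a missing index and yields false.
     */
    static boolean isParquetBackedIndex(Map<String, String> settings) {
        if (settings == null) {
            return false; // index missing: fall through to the old route
        }
        // Boolean.parseBoolean(null) is false, so an absent flag also falls through.
        return Boolean.parseBoolean(settings.get(ENABLED))
            && "composite".equals(settings.get(DATAFORMAT))
            && "parquet".equals(settings.get(PRIMARY_FORMAT));
    }
}
```

With this shape, an index that has the flag enabled but `lucene` as its primary data format returns false and stays on the old route, which is the case the comment is concerned about.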
Only route to the analytics path when both pluggable.dataformat.enabled=true AND pluggable.dataformat=parquet. If the format is lucene (or anything else), fall through to the standard Calcite→DSL path.

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
f006e29
into
opensearch-project:feature/mustang-ppl-integration
    }
    var settings = indexMetadata.getSettings();
    return IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(settings)
        && "parquet".equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings));
The value checked for `PLUGGABLE_DATAFORMAT_VALUE_SETTING` should be `composite`, not `parquet`.
Description
Today `RestUnifiedQueryAction.isAnalyticsIndex` routes queries to the analytics engine when the source index name starts with `parquet_`. That's brittle — it conflates naming convention with storage type: an index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path, and an index named `parquet_foo` without the setting is mis-dispatched.
Switch to the authoritative signal: the `index.pluggable.dataformat.enabled` cluster-state setting. This is the same flag integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`) use to create analytics-backed indices, and what `FieldStorageResolver` reads to resolve field storage.
Routing behavior
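Per the commit message, an index with `index.pluggable.dataformat.enabled=true` goes to the analytics engine (DataFusion), while an absent flag, a false flag, or a missing index falls through to the Calcite→OpenSearch DSL path. The sketch below illustrates that the decision no longer depends on the index name. It is an illustration only: the nested `Map` stands in for cluster-state metadata, and `QueryRouteSketch`, `Route`, and `routeFor` are hypothetical names rather than identifiers from this PR. It covers only the enabled flag described here, not the additional format check raised in review.

```java
import java.util.Map;

// Hypothetical model of the routing decision; not the actual RestUnifiedQueryAction code.
final class QueryRouteSketch {
    enum Route { ANALYTICS_ENGINE, CALCITE_DSL }

    static final String ENABLED = "index.pluggable.dataformat.enabled";

    /**
     * clusterIndices maps index name to its settings and stands in for
     * cluster-state metadata. The index name itself is never inspected.
     */
    static Route routeFor(String indexName, Map<String, Map<String, String>> clusterIndices) {
        Map<String, String> settings = clusterIndices.get(indexName);
        if (settings == null) {
            return Route.CALCITE_DSL; // index missing
        }
        // Absent or non-"true" flag parses to false.
        boolean enabled = Boolean.parseBoolean(settings.get(ENABLED));
        return enabled ? Route.ANALYTICS_ENGINE : Route.CALCITE_DSL;
    }
}
```

Under this model an index named `parquet_foo` without the setting takes the DSL path, and an index with any other name but with the flag set takes the analytics path, which are the two mis-dispatch cases the description calls out.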
Issues Resolved
Mustang rollout pre-work — aligns PPL/SQL routing with the index-setting-based model already used elsewhere in the engine.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.