Integrate syntax errors with error report #5371

Open
ritvibhatt wants to merge 28 commits into opensearch-project:main from ritvibhatt:syntax-exception-error-message

Conversation

@ritvibhatt
Contributor

@ritvibhatt ritvibhatt commented Apr 20, 2026

Description

Integrates syntax errors with the error reporting infrastructure and adds suggestions for common syntax error patterns.

Suggestion System

  • Added SyntaxErrorSuggestionRegistry with pattern-based providers:
    • SelectStarSuggestionProvider: Suggests PPL syntax when SQL is used in PPL context
    • UnmatchedParenthesesSuggestionProvider: Detects parentheses mismatches
    • UnquotedTableNameSuggestionProvider: Suggests backticks for special characters in table names
    • ExpectedTokensSuggestionProvider: Falls back to ANTLR's expected tokens
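
The provider list above can be sketched as a small priority-ordered registry that returns the first non-empty suggestion. This is a minimal sketch, not the PR's code: the interface name, its `priority()` and `suggest()` methods, the "lower value first" ordering, and both suggestion messages are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-in for the PR's provider interface; the method names are assumed.
interface SuggestionProviderSketch {
  int priority(); // assumed: lower values are consulted first

  Optional<String> suggest(String query);
}

final class SuggestionRegistrySketch {
  private final List<SuggestionProviderSketch> providers = new ArrayList<>();

  void register(SuggestionProviderSketch provider) {
    providers.add(provider);
    providers.sort(Comparator.comparingInt(SuggestionProviderSketch::priority));
  }

  // Returns the first non-empty suggestion in priority order.
  Optional<String> findSuggestion(String query) {
    for (SuggestionProviderSketch provider : providers) {
      Optional<String> suggestion = provider.suggest(query);
      if (suggestion.isPresent()) {
        return suggestion;
      }
    }
    return Optional.empty();
  }

  // Builds a registry with one pattern-based provider and one catch-all fallback.
  static SuggestionRegistrySketch withDefaults() {
    SuggestionRegistrySketch registry = new SuggestionRegistrySketch();
    registry.register(new SuggestionProviderSketch() {
      @Override public int priority() { return 100; }
      @Override public Optional<String> suggest(String query) {
        // Illustrative fallback text, standing in for the expected-tokens message.
        return Optional.of("Check the tokens the parser expected at this position.");
      }
    });
    registry.register(new SuggestionProviderSketch() {
      @Override public int priority() { return 10; }
      @Override public Optional<String> suggest(String query) {
        boolean looksLikeSql = query.trim().toLowerCase(Locale.ROOT).startsWith("select");
        return looksLikeSql
            ? Optional.of("PPL queries usually start with source=<index>, not SELECT.")
            : Optional.empty();
      }
    });
    return registry;
  }
}
```

With these defaults, a query beginning with SELECT gets the pattern-based hint, and any other query falls through to the catch-all provider.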

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

PR Reviewer Guide 🔍

(Review updated until commit 5620d46)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The catch block at line 75 checks if the cause is a SyntaxCheckException and re-throws the ErrorReport. However, if the cause is not a SyntaxCheckException, it wraps the ErrorReport in an IllegalStateException. This means that ErrorReports with non-syntax causes are double-wrapped, losing the original ErrorReport's structured context. This occurs whenever an ErrorReport is thrown from the strategy.plan() call but its cause is not a SyntaxCheckException.

} catch (ErrorReport e) {
  if (e.getCause() instanceof SyntaxCheckException) throw e;
  throw new IllegalStateException("Failed to plan query", e);
Possible Issue

The code builds an ErrorReport with a SyntaxCheckException as the cause, but the SyntaxCheckException is constructed with only the 'details' string. The original RecognitionException 'e' (which contains the parser state and expected tokens) is not passed to the SyntaxCheckException constructor. This loses valuable debugging information. If SyntaxCheckException supports a cause parameter, 'e' should be passed to preserve the exception chain.

SyntaxCheckException cause = new SyntaxCheckException(details);
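
If SyntaxCheckException turns out to have only a message constructor, the parser exception could still be attached with `Throwable.initCause`. The sketch below illustrates that option under this assumption; `MessageOnlyException` is a hypothetical stand-in, not the project's class.

```java
// Hypothetical stand-in for an exception type that only accepts a message.
final class MessageOnlyException extends RuntimeException {
  MessageOnlyException(String message) {
    super(message);
  }
}

final class CauseChainSketch {
  // Builds the message-only exception and links the original parser error to it.
  static RuntimeException withCause(String details, Exception parserError) {
    MessageOnlyException cause = new MessageOnlyException(details);
    // initCause may be called once, and only if no cause was set by the constructor.
    cause.initCause(parserError);
    return cause;
  }
}
```

The resulting exception keeps the `details` message while `getCause()` still leads back to the original parser error.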

@github-actions
Contributor

github-actions Bot commented Apr 20, 2026

PR Code Suggestions ✨

Latest suggestions up to 5620d46

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Use locale-independent string comparison

The no-argument toLowerCase() uses the JVM's default locale, so its result can vary
between environments. Use toLowerCase(Locale.ROOT) for locale-independent behavior,
consistent with the String.format(Locale.ROOT, ...) calls elsewhere in the file.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SelectStarSuggestionProvider.java [28-32]

 // Check if query starts with "select" (case-insensitive)
 String query = context.getQuery().trim();
-if (!query.toLowerCase().startsWith("select")) {
+if (!query.toLowerCase(Locale.ROOT).startsWith("select")) {
   return Optional.empty();
 }
Suggestion importance[1-10]: 6

Why: Good suggestion to use toLowerCase(Locale.ROOT) for locale-independent behavior, which is consistent with the String.format(Locale.ROOT, ...) pattern used elsewhere in the file. This prevents potential locale-dependent issues, though the impact is relatively minor.

Low
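
The locale dependence can be sketched as follows. The keyword "select" contains no letter 'i', so this particular check is not affected by the Turkish casing rule; the sketch uses "insert" only to make the effect visible. The class and method names are hypothetical.

```java
import java.util.Locale;

final class LocaleLowercaseSketch {
  // Locale-independent prefix check, as in the suggested change.
  static boolean startsWithKeyword(String query, String keyword) {
    return query.trim().toLowerCase(Locale.ROOT).startsWith(keyword);
  }

  // Same check, but lowercasing with an explicit locale to show the difference.
  static boolean startsWithKeywordIn(Locale locale, String query, String keyword) {
    return query.trim().toLowerCase(locale).startsWith(keyword);
  }
}
```

Under a Turkish locale, an uppercase 'I' lowercases to the dotless 'ı', so `startsWithKeywordIn(Locale.forLanguageTag("tr"), "INSERT INTO t", "insert")` is false while the `Locale.ROOT` variant is true.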
Use typed map for position

The position map is declared as Map<String, Object> even though both values are
ints. Consider a more structured approach, or at least narrow the value type, to
keep the entries type-consistent and avoid casts when reading position data.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [62-64]

 // Create a SyntaxCheckException as the underlying cause
 SyntaxCheckException cause = new SyntaxCheckException(details);
 
 // Build position information
-Map<String, Object> position = new HashMap<>();
+Map<String, Integer> position = new HashMap<>();
 position.put("line", line);
 position.put("column", charPositionInLine);
 
 // Build ErrorReport with structured context
 ErrorReport.Builder reportBuilder =
     ErrorReport.wrap(cause)
Suggestion importance[1-10]: 5

Why: Valid suggestion to improve type safety by changing Map<String, Object> to Map<String, Integer> for the position map. However, this is a minor improvement that only enhances code clarity and type checking without fixing any actual bug.

Low

Previous suggestions

Suggestions up to commit 81d2b94
Category | Suggestion | Impact
General
Prevent suggestion failures from hiding errors

The SyntaxErrorSuggestionRegistry.findSuggestion() call may throw an exception if
any provider fails. This would prevent the original syntax error from being
reported. Wrap the suggestion lookup in a try-catch to ensure the primary error is
always thrown even if suggestion generation fails.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [76-84]

 // Use the suggestion registry to find pattern-based suggestions
-SyntaxErrorContext context =
-    new SyntaxErrorContext(recognizer, offendingToken, tokens, query, e);
-Optional<String> customSuggestion = SyntaxErrorSuggestionRegistry.findSuggestion(context);
+try {
+  SyntaxErrorContext context =
+      new SyntaxErrorContext(recognizer, offendingToken, tokens, query, e);
+  Optional<String> customSuggestion = SyntaxErrorSuggestionRegistry.findSuggestion(context);
 
-if (customSuggestion.isPresent()) {
-  reportBuilder.suggestion(customSuggestion.get());
+  if (customSuggestion.isPresent()) {
+    reportBuilder.suggestion(customSuggestion.get());
+  }
+} catch (Exception suggestionError) {
+  // Log but don't fail - the primary syntax error must be reported
 }
 
 throw reportBuilder.build();
Suggestion importance[1-10]: 7

Why: Valid concern about preventing suggestion generation failures from hiding the primary syntax error. However, this is a defensive programming suggestion that asks to verify/ensure robustness rather than fixing a critical bug, so it should not score above 7.

Medium
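
The guard described in this suggestion can be sketched in isolation as below. This is a minimal illustration, assuming the lookup can be expressed as a `Supplier`; the class and method names are hypothetical, and a real implementation would presumably log the swallowed exception.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class GuardedSuggestionSketch {
  // Runs the lookup and treats any failure as "no suggestion available".
  static Optional<String> safeSuggestion(Supplier<Optional<String>> lookup) {
    try {
      Optional<String> result = lookup.get();
      return result != null ? result : Optional.empty();
    } catch (RuntimeException suggestionError) {
      // Suggestion generation is best-effort; the primary error is reported regardless.
      return Optional.empty();
    }
  }
}
```

The caller can then attach the suggestion only when one is present and always go on to throw the primary syntax error.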
Add null check for token text

The parenthesis counting logic does not explicitly handle null token text. Because
the comparisons are written literal-first ("(".equals(text)), a null text is already
safe and simply matches neither branch; the added check only makes the skip explicit.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/UnmatchedParenthesesSuggestionProvider.java [19-29]

 int openParens = 0;
 int closeParens = 0;
 for (Token t : ctx.getAllTokens()) {
   if (t.getChannel() != Token.DEFAULT_CHANNEL) continue;
   String text = t.getText();
+  if (text == null) continue;
   if ("(".equals(text)) {
     openParens++;
   } else if (")".equals(text)) {
     closeParens++;
   }
 }
Suggestion importance[1-10]: 5

Why: This is a defensive null check suggestion. While it could prevent potential issues, Token.getText() in ANTLR typically doesn't return null for valid tokens on the default channel, making this a minor defensive improvement rather than a critical fix.

Low
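
The counting logic can be sketched over plain strings to show that literal-first `equals` tolerates null without an extra check. The class name is hypothetical, and token texts stand in for ANTLR `Token` objects.

```java
import java.util.List;

final class ParenCountSketch {
  // Returns open minus close; a positive result means unclosed "(" remain.
  static int balance(List<String> tokenTexts) {
    int openParens = 0;
    int closeParens = 0;
    for (String text : tokenTexts) {
      // "(".equals(null) is false, so null entries fall through both branches.
      if ("(".equals(text)) {
        openParens++;
      } else if (")".equals(text)) {
        closeParens++;
      }
    }
    return openParens - closeParens;
  }
}
```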
Suggestions up to commit c0fe174
Category | Suggestion | Impact
Possible issue
Handle all exception paths correctly

The method should handle all ErrorReport exceptions consistently. Currently, if an
ErrorReport is caught but its cause is not a SyntaxCheckException, control falls out
of the catch block. Either the method then has no return value on that path, which is
a compile-time error in Java, or the non-syntax error is silently swallowed by
whatever return follows the try statement.

async-query-core/src/main/java/org/opensearch/sql/spark/utils/SQLQueryUtils.java [89-93]

 try {
   flintSparkSqlExtensionsParser.statement();
   return true;
 } catch (ErrorReport e) {
   if (e.getCause() instanceof SyntaxCheckException) {
     return false;
   }
+  throw e;
 }
Suggestion importance[1-10]: 9

Why: This identifies a critical bug where the method can fall through without returning a value when ErrorReport is caught but its cause is not SyntaxCheckException, causing compilation errors or undefined behavior.

High
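
The three outcomes the fixed method should have (success, wrapped syntax error, any other wrapped error) can be sketched as follows. `WrappedError` and `SyntaxProblem` are hypothetical stand-ins for ErrorReport and SyntaxCheckException, and the parser call is modelled as a `Runnable`.

```java
// Hypothetical stand-in for ErrorReport: a runtime exception that wraps a cause.
final class WrappedError extends RuntimeException {
  WrappedError(Throwable cause) {
    super(cause);
  }
}

// Hypothetical stand-in for SyntaxCheckException.
final class SyntaxProblem extends RuntimeException {
  SyntaxProblem(String message) {
    super(message);
  }
}

final class ExceptionPathsSketch {
  // true: parsed; false: wrapped syntax error; anything else propagates unchanged.
  static boolean parses(Runnable parse) {
    try {
      parse.run();
      return true;
    } catch (WrappedError e) {
      if (e.getCause() instanceof SyntaxProblem) {
        return false;
      }
      throw e; // every path now either returns or throws
    }
  }
}
```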
General
Protect against suggestion generation failures

The code should handle potential exceptions from
SyntaxErrorSuggestionRegistry.findSuggestion() to prevent suggestion generation
failures from masking the original syntax error. Wrap the suggestion lookup in a
try-catch to ensure the primary error is always reported.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [78-82]

-Optional<String> customSuggestion = SyntaxErrorSuggestionRegistry.findSuggestion(context);
-
-if (customSuggestion.isPresent()) {
-  reportBuilder.suggestion(customSuggestion.get());
+try {
+  Optional<String> customSuggestion = SyntaxErrorSuggestionRegistry.findSuggestion(context);
+  if (customSuggestion.isPresent()) {
+    reportBuilder.suggestion(customSuggestion.get());
+  }
+} catch (Exception suggestionError) {
+  // Log but don't fail - the original syntax error is more important
 }
Suggestion importance[1-10]: 6

Why: Valid defensive programming suggestion to prevent suggestion generation failures from masking the original syntax error. However, the impact is moderate since suggestion providers should be robust.

Low
Preserve exception context properly

The catch block for ErrorReport should rethrow the exception directly without
wrapping it in IllegalStateException when the cause is not a SyntaxCheckException.
This preserves the original error context and prevents double-wrapping of
exceptions.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [75-77]

 } catch (ErrorReport e) {
-  if (e.getCause() instanceof SyntaxCheckException) throw e;
+  if (e.getCause() instanceof SyntaxCheckException) {
+    throw e;
+  }
   throw new IllegalStateException("Failed to plan query", e);
Suggestion importance[1-10]: 3

Why: The suggestion is correct but offers minimal improvement. The code already preserves the exception context by passing e to IllegalStateException. The formatting change doesn't impact functionality.

Low
Suggestions up to commit e3547be
Category | Suggestion | Impact
General
Use locale-independent string comparison

The no-argument toLowerCase() uses the JVM's default locale, so its result can vary
between environments. Use toLowerCase(Locale.ROOT) for consistent,
locale-independent behavior and to avoid potential locale-specific issues.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SelectStarSuggestionProvider.java [28-32]

 // Check if query starts with "select" (case-insensitive)
 String query = context.getQuery().trim();
-if (!query.toLowerCase().startsWith("select")) {
+if (!query.toLowerCase(Locale.ROOT).startsWith("select")) {
   return Optional.empty();
 }
Suggestion importance[1-10]: 6

Why: Using toLowerCase(Locale.ROOT) ensures consistent, locale-independent behavior and avoids potential issues with Turkish locale (where 'I'.toLowerCase() != 'i'). This is a good practice for string comparisons in code that may run in different locales.

Low
Use specific type for position map

The position map uses raw Object values which loses type safety. Consider using a
more specific type like Integer or creating a dedicated Position class to ensure
type safety and prevent potential ClassCastException when consuming this data.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [58-65]

 // Create a SyntaxCheckException as the underlying cause
 SyntaxCheckException cause = new SyntaxCheckException(details);
 
 // Build position information
-Map<String, Object> position = new HashMap<>();
+Map<String, Integer> position = new HashMap<>();
 position.put("line", line);
 position.put("column", charPositionInLine);
Suggestion importance[1-10]: 5

Why: Using Map<String, Integer> instead of Map<String, Object> improves type safety for the position map. This is a valid improvement that prevents potential ClassCastException when consuming the data, though the impact is moderate since the context is internal.

Low
Remove redundant exception re-wrapping

The ErrorReport exception is caught but then immediately re-thrown without any
additional handling or logging. This pattern doesn't add value and creates
unnecessary exception wrapping. Consider removing this catch block or adding
meaningful error handling/logging before re-throwing.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [75-77]

 } catch (ErrorReport e) {
-  if (e.getCause() instanceof SyntaxCheckException) throw e;
-  throw new IllegalStateException("Failed to plan query", e);
+  throw e;
Suggestion importance[1-10]: 3

Why: The suggestion correctly identifies that the ErrorReport is re-thrown without additional handling. However, the code intentionally wraps non-SyntaxCheckException causes in IllegalStateException, which provides meaningful context. The suggestion oversimplifies the logic and would lose this distinction.

Low
Suggestions up to commit f65c32d
Category | Suggestion | Impact
Possible issue
Fix duplicate variable name

There is a duplicate variable name 'e' in the catch clause. This is a syntax error
that will prevent compilation. Remove the duplicate 'e' to fix the issue.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [73-74]

-} catch (SyntaxCheckException | UnsupportedOperationException e e) {
+} catch (SyntaxCheckException | UnsupportedOperationException e) {
   throw e;
Suggestion importance[1-10]: 10

Why: The duplicate variable e in the catch clause is a critical syntax error that will prevent compilation. This is a major bug that must be fixed.

High
General
Preserve original exception type

The condition checks if the cause is SyntaxCheckException but then rethrows the
ErrorReport wrapper. This may lose the original exception type. Consider unwrapping
and rethrowing the cause directly.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [75-77]

 } catch (ErrorReport e) {
-  if (e.getCause() instanceof SyntaxCheckException) throw e;
+  if (e.getCause() instanceof SyntaxCheckException) throw (SyntaxCheckException) e.getCause();
   throw new IllegalStateException("Failed to plan query", e);
Suggestion importance[1-10]: 4

Why: The suggestion to unwrap and rethrow SyntaxCheckException directly may improve exception handling clarity, but the current approach of rethrowing ErrorReport is also valid and maintains the structured error context. This is a minor improvement.

Low
Remove unnecessary intermediate exception

The SyntaxCheckException is created but never used as a standalone exception - it's
immediately wrapped in ErrorReport. Consider whether the intermediate exception is
necessary or if ErrorReport should be constructed directly.

common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java [59-70]

-SyntaxCheckException cause = new SyntaxCheckException(details);
-...
 ErrorReport.Builder reportBuilder =
-    ErrorReport.wrap(cause)
+    ErrorReport.builder()
+        .message(details)
Suggestion importance[1-10]: 3

Why: The SyntaxCheckException serves as the underlying cause for ErrorReport, which is a valid design pattern for wrapping exceptions with additional context. Removing it would require verifying that ErrorReport.builder() supports direct message construction, making this a minor refactoring suggestion.

Low
Suggestions up to commit 5556f7b
Category | Suggestion | Impact
Possible issue
Fix invalid multi-catch syntax

The multi-catch syntax SyntaxCheckException | UnsupportedOperationException e |
ErrorReport e is invalid Java — each exception type in a multi-catch must share a
single variable name declared once at the end. This will cause a compilation error.

api/src/main/java/org/opensearch/sql/api/UnifiedQueryPlanner.java [73]

-} catch (SyntaxCheckException | UnsupportedOperationException e | ErrorReport e) {
+} catch (SyntaxCheckException | UnsupportedOperationException | ErrorReport e) {
Suggestion importance[1-10]: 10

Why: The multi-catch syntax SyntaxCheckException | UnsupportedOperationException e | ErrorReport e is invalid Java — the variable name e is declared twice, which is a compilation error. The fix SyntaxCheckException | UnsupportedOperationException | ErrorReport e is the correct syntax.

High
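
For reference, a valid multi-catch lists the alternatives separated by `|` and declares the variable once at the end; the alternatives must not be subclasses of one another. The sketch below uses three unrelated JDK exception types rather than the PR's classes, and the class name is hypothetical.

```java
final class MultiCatchSketch {
  // Returns "ok" or the simple name of whichever listed exception was thrown.
  static String classify(Runnable action) {
    try {
      action.run();
      return "ok";
    } catch (IllegalArgumentException | UnsupportedOperationException | ArithmeticException e) {
      // e is implicitly final and typed as the common supertype of the alternatives.
      return e.getClass().getSimpleName();
    }
  }
}
```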
Guard against null token list

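SyntaxErrorContext.getAllTokens() returns tokens.getTokens() directly, and callers
like UnmatchedParenthesesSuggestionProvider iterate over the result without a null
check. In ANTLR 4 the no-argument CommonTokenStream.getTokens() returns the stream's
backing list, which is empty rather than null before fill() is called (only the
ranged overload getTokens(start, stop) can return null), so the empty-list fallback
is a defensive guard rather than a fix for a likely NullPointerException.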

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorContext.java [37-39]

 public List<Token> getAllTokens() {
-  return tokens.getTokens();
+  List<Token> t = tokens.getTokens();
+  return t != null ? t : List.of();
 }
Suggestion importance[1-10]: 6

Why: Callers like UnmatchedParenthesesSuggestionProvider iterate over the result of getAllTokens() without a null check. In ANTLR 4 the no-argument getTokens() returns an empty list rather than null before fill() is called, so the guard is defensive: harmless, but unlikely to prevent a real runtime error.

Low
General
Fix race condition in provider registration

The register method appends the new providers and then sorts the list as two
separate operations. Each call is atomic on a CopyOnWriteArrayList, but the pair is
not, so a concurrent findSuggestion call can observe the list after the append and
before the sort. Consider copying to a temporary list, sorting, and replacing
atomically, or using a synchronized block.

common/src/main/java/org/opensearch/sql/common/antlr/suggestion/SyntaxErrorSuggestionRegistry.java [29-32]

 public static void register(SyntaxErrorSuggestionProvider... providers) {
-  PROVIDERS.addAll(Arrays.asList(providers));
-  PROVIDERS.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
+  synchronized (PROVIDERS) {
+    PROVIDERS.addAll(Arrays.asList(providers));
+    List<SyntaxErrorSuggestionProvider> sorted = new ArrayList<>(PROVIDERS);
+    sorted.sort(Comparator.comparingInt(SyntaxErrorSuggestionProvider::getPriority));
+    PROVIDERS.clear();
+    PROVIDERS.addAll(sorted);
+  }
 }
Suggestion importance[1-10]: 5

Why: The CopyOnWriteArrayList.sort() is not atomic with addAll, creating a potential race condition during concurrent register() calls. However, register() is only called from a static initializer and tests, making this a low-probability issue in practice.

Low
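
One way to realize the "copy, sort, replace atomically" option is to keep an immutable snapshot in an `AtomicReference` and swap it in a single step. This is a sketch under assumptions, not the PR's implementation: `PrioritizedProvider` and `AtomicRegistrySketch` are hypothetical names. Note that clearing and refilling the shared list, as in the diff above, would briefly expose an empty list to readers that do not take the same lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical provider shape: only the priority matters for ordering here.
final class PrioritizedProvider {
  final String name;
  final int priority;

  PrioritizedProvider(String name, int priority) {
    this.name = name;
    this.priority = priority;
  }
}

final class AtomicRegistrySketch {
  // Readers always see one complete, sorted, immutable snapshot.
  private final AtomicReference<List<PrioritizedProvider>> providers =
      new AtomicReference<>(Collections.<PrioritizedProvider>emptyList());

  void register(PrioritizedProvider... added) {
    // The update function is side-effect free, so a retry under contention is safe.
    providers.updateAndGet(current -> {
      List<PrioritizedProvider> next = new ArrayList<>(current);
      Collections.addAll(next, added);
      next.sort(Comparator.comparingInt((PrioritizedProvider p) -> p.priority));
      return Collections.unmodifiableList(next);
    });
  }

  List<PrioritizedProvider> snapshot() {
    return providers.get();
  }
}
```

A reader that calls `snapshot()` during a concurrent `register` gets either the old list or the new one, never an appended-but-unsorted intermediate state.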
Strengthen error type assertion in fallback test

The test previously verified that the fallback was triggered specifically by a
SyntaxCheckException. After the change, it only checks for ErrorReport, but the
fallback condition in RestSQLQueryAction also triggers for SyntaxCheckException and
UnsupportedCursorRequestException. The test should additionally verify that the
ErrorReport has the expected ErrorCode.SYNTAX_ERROR code to ensure the correct error
type is being propagated.

legacy/src/test/java/org/opensearch/sql/legacy/plugin/RestSQLQueryActionTest.java [131]

 assertTrue(exception instanceof ErrorReport);
+assertEquals(ErrorCode.SYNTAX_ERROR, ((ErrorReport) exception).getCode());
Suggestion importance[1-10]: 4

Why: The suggestion to also assert ErrorCode.SYNTAX_ERROR would make the test more precise, but the improved_code references ErrorCode which may not be imported in the test file, and the suggestion is a minor test quality improvement rather than a critical fix.

Low

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 14475e5

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit c9ad1f2

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit e473f7f

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit f7cbe56

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 575202d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 8e8ab9e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 71ad3bb

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit bd46b1e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 3cd01e4

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 24, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5620d46.

Path | Line | Severity | Description
common/src/main/java/org/opensearch/sql/common/antlr/SyntaxAnalysisErrorListener.java | 72 | low | The full raw query string is embedded into the ErrorReport context object (context("query", query)). If ErrorReport objects are serialized to logs or returned in API responses, this could inadvertently expose sensitive query content (e.g., credential values embedded in queries). This appears to be for debugging/suggestion purposes but warrants a review of how ErrorReport context fields are surfaced to callers.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

Persistent review updated to latest commit 078dc07

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 940310d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 51952fb

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 283ebd9

Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 5556f7b

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit f65c32d

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit e3547be

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit c0fe174

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 8, 2026

Persistent review updated to latest commit 81d2b94

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented May 8, 2026

Persistent review updated to latest commit 5620d46
