synq: auto-detect sqlite3 shell scripts so .read is not a syntax error #268
Draft
LalitMaganti wants to merge 3 commits into
The VSCode extension / LSP / CLI flagged sqlite3 CLI shell scripts (files using `.read foo.sql` dot-commands, column-0 `#` comments, and `GO`/`/` terminators) with a spurious "syntax error near '.'" (issue #88). Those constructs belong to the sqlite3 shell language, a layer above the SQL library language, so the SQL parser correctly rejects them; pragmatically we must handle them because such scripts are ubiquitous.

Treat the shell language as a separate language, auto-detected per file from its content, and reuse the existing embedded-SQL machinery (EmbeddedFragment / EmbeddedAnalyzer / OffsetMap) with the roles flipped: find non-SQL shell lines *around* SQL instead of SQL *inside* a host language. Shell fragments are contiguous verbatim slices with no holes, so offsets map back to host coordinates via a pure base offset.

- embedded/shell.rs: new `extract_shell` + `is_shell_script`. A line is a dot-command (leading whitespace tolerated, only outside a pending statement), a column-0 `#` comment, a lone `GO`/`/` terminator, blank, or SQL. The first dot-command or column-0 `#` switches the file into shell mode; `GO`/`/` are terminators only in shell mode, so a stray `GO` in pure SQL stays a parse error. Handles CRLF line endings.
- lsp/host.rs: `ensure_analysis` auto-detects shell scripts and routes the SQL fragments through the embedded analyzer, mapping diagnostics back to host offsets so shell lines never reach the SQL parser.
- cli (analyze.rs / cli.rs): add a `Shell` host language and auto-detect shell scripts when no `--experimental-lang` is given.
- Tests at three layers: shell extractor units (incl. CRLF + offset invariants), EmbeddedAnalyzer integration, and the LSP-host #88 regression plus a stray-`GO` guard.

Semantic tokens / hover / go-to-definition for shell files are a follow-up (TODO); this fixes the spurious parse error only.
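The base-offset mapping described in this commit can be illustrated with a minimal sketch. It is not the PR's code: `Fragment`, `to_host`, and `sql_fragments` are hypothetical names standing in for the real EmbeddedFragment / OffsetMap types, and the line classification is deliberately reduced to "starts with `.` or `#` at column 0" to keep the focus on offsets.

```rust
/// Hypothetical stand-in for a SQL fragment cut verbatim out of a shell script.
#[derive(Debug, PartialEq)]
struct Fragment<'a> {
    /// Byte offset of the fragment's first byte in the host file.
    base: usize,
    /// The SQL text: one contiguous slice of the host, with no holes.
    text: &'a str,
}

impl Fragment<'_> {
    /// Map an offset inside the fragment back to host coordinates.
    /// Because the slice is verbatim, this is a single addition.
    fn to_host(&self, fragment_offset: usize) -> usize {
        self.base + fragment_offset
    }
}

/// Simplified extraction (an assumption, not the PR's rules): lines that
/// start with `.` or `#` at column 0 are shell lines, and every run of
/// other lines becomes one SQL fragment.
fn sql_fragments(host: &str) -> Vec<Fragment<'_>> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None; // start of the current SQL run
    let mut pos = 0; // byte offset of the current line in `host`
    for line in host.split_inclusive('\n') {
        let is_shell = line.starts_with('.') || line.starts_with('#');
        match (is_shell, start) {
            // A shell line closes the SQL run that was open.
            (true, Some(s)) => {
                out.push(Fragment { base: s, text: &host[s..pos] });
                start = None;
            }
            // A SQL line opens a new run if none is open.
            (false, None) => start = Some(pos),
            _ => {}
        }
        pos += line.len();
    }
    if let Some(s) = start {
        out.push(Fragment { base: s, text: &host[s..] });
    }
    out
}
```

A diagnostic reported at offset `n` inside a fragment lands at `fragment.to_host(n)` in the original file, so the shell lines between fragments never need to be blanked out or re-encoded.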
…ests

Harden the sqlite3 shell-language support against the upstream CLI documentation (https://sqlite.org/cli.html). The docs are explicit that dot-commands must begin with "." at the left margin with no preceding whitespace, so an indented `.read` is NOT a dot-command: it reaches the SQL core and is a syntax error. The previous heuristic tolerated leading whitespace; this aligns it with the documented behavior.

- embedded/shell.rs: classify_line now requires the dot at column 0 (`raw.as_bytes().first() == Some(&b'.')`, mirroring the `#` rule), still gated on not being mid-statement. Updated doc comments and the lenient unit tests (indented `.read` is now SQL; indented `.read` no longer triggers shell-mode detection).
- embedded/mod.rs: flip the now-misnamed indented-dot detection test.
- tests/lsp_diff_tests/shell.py: new LSP diagnostics diff suite (13 cases) covering every documented input-parsing rule observably end to end through the real `syntaqlite lsp` server.
- integration_tests/suites/analyze.py: 24 CLI `analyze` tests covering the same rules via auto-detection, exit codes, and error mapping.

Coverage spans every documented rule:

- dot-command column-0 requirement (positive + negative)
- single-line and mid-statement (continuation) rules
- no-comment-in-dot-command
- trailing-bare-semicolon
- `GO`/`/` terminators (case-insensitive, surrounding whitespace, shell-mode-only)
- `;` statement separation
- column-0 `#` comments (vs indented `#`)

plus consequences: correct line/col mapping, semantic diagnostics still flow, CRLF handling, dot-only files, blank lines, and explicit `--experimental-lang shell`.
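The column-0 rule above can be sketched as a small classifier. Only the `raw.as_bytes().first() == Some(&b'.')` check is taken from the commit message; the `LineKind` enum, the function signature, and the choice not to gate `#` comments on the pending state are assumptions made for illustration.

```rust
/// Hypothetical line categories, following the five kinds named in the PR.
#[derive(Debug, PartialEq, Clone, Copy)]
enum LineKind {
    DotCommand,
    HashComment,
    Terminator,
    Blank,
    Sql,
}

/// Classify one raw line (which may still carry its `\n` or `\r\n`).
/// `pending`: a SQL statement is still open from earlier lines.
/// `shell_mode`: the file has already been detected as a shell script.
fn classify_line(raw: &str, pending: bool, shell_mode: bool) -> LineKind {
    // Strip the line ending so CRLF files classify like LF files.
    let line = raw.trim_end_matches(|c: char| c == '\n' || c == '\r');
    let first = line.as_bytes().first();

    // Dot-commands need the dot at column 0 and no open statement.
    if !pending && first == Some(&b'.') {
        return LineKind::DotCommand;
    }
    // Assumption: a column-0 `#` is a comment regardless of `pending`.
    if first == Some(&b'#') {
        return LineKind::HashComment;
    }
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return LineKind::Blank;
    }
    // `GO` (any case) and `/` terminate statements only in shell mode.
    if shell_mode && (trimmed.eq_ignore_ascii_case("go") || trimmed == "/") {
        return LineKind::Terminator;
    }
    LineKind::Sql
}
```

With this shape, an indented `  .read foo.sql` falls through to `LineKind::Sql` and is reported by the SQL parser, and a lone `GO` in a file that never entered shell mode is also left as SQL, matching the behavior the commit describes.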
sqlite3 shell scripts commonly end a SQL statement with a semicolon followed by an inline comment before the next dot-command. The shell extractor must recognize that as a complete statement so the dot-command stays shell syntax instead of being parsed as SQL.

- Track the last significant SQL byte outside strings and comments for shell pending-state detection.
- Add extractor and LSP diff regressions for semicolon-before-comment lines followed by dot-commands.
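One way to track the last significant byte is sketched below. The function names are hypothetical and the scanner is an assumption about how such tracking could work, not the PR's implementation: it skips `--` line comments, `/* */` block comments, and single- or double-quoted spans, and ignores backtick and bracket identifiers for brevity.

```rust
/// Return the last byte of `sql` that is not whitespace and not inside a
/// comment. A quoted span contributes only its quote character.
fn last_significant_byte(sql: &str) -> Option<u8> {
    let b = sql.as_bytes();
    let mut last = None;
    let mut i = 0;
    while i < b.len() {
        match b[i] {
            // Line comment: skip to the end of the line.
            b'-' if b.get(i + 1) == Some(&b'-') => {
                while i < b.len() && b[i] != b'\n' {
                    i += 1;
                }
            }
            // Block comment: skip to the closing `*/` or end of input.
            b'/' if b.get(i + 1) == Some(&b'*') => {
                i += 2;
                while i < b.len() && !(b[i] == b'*' && b.get(i + 1) == Some(&b'/')) {
                    i += 1;
                }
                i = (i + 2).min(b.len());
            }
            // Quoted span: skip to the matching quote. A doubled quote
            // simply closes one span and opens the next.
            q @ (b'\'' | b'"') => {
                i += 1;
                while i < b.len() && b[i] != q {
                    i += 1;
                }
                last = Some(q);
                i += 1;
            }
            c if c.is_ascii_whitespace() => i += 1,
            c => {
                last = Some(c);
                i += 1;
            }
        }
    }
    last
}

/// A statement is pending when there is significant text and it does not
/// end in `;`. Comment-only or blank input is not pending.
fn is_statement_pending(sql: &str) -> bool {
    matches!(last_significant_byte(sql), Some(c) if c != b';')
}
```

Under this sketch, `SELECT 1; -- done` is complete, so a `.read` on the following line can be classified as a dot-command, while `SELECT 1 -- ; not a terminator` still counts as an open statement.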
Closes #88.
The VSCode extension / LSP / CLI flagged sqlite3 CLI shell scripts (files using `.read foo.sql` dot-commands, column-0 `#` comments, and `GO`/`/` terminators) with a spurious "syntax error near '.'" (issue #88). Those constructs belong to the sqlite3 shell language, a layer above the SQL library language, so the SQL parser correctly rejects them; pragmatically we must handle them because such scripts are ubiquitous.

Treat the shell language as a separate language, auto-detected per file from its content, and reuse the existing embedded-SQL machinery (EmbeddedFragment / EmbeddedAnalyzer / OffsetMap) with the roles flipped: find non-SQL shell lines around SQL instead of SQL inside a host language. Shell fragments are contiguous verbatim slices with no holes, so offsets map back to host coordinates via a pure base offset.

- embedded/shell.rs: new `extract_shell` + `is_shell_script`. A line is a dot-command (leading whitespace tolerated, only outside a pending statement), a column-0 `#` comment, a lone `GO`/`/` terminator, blank, or SQL. The first dot-command or column-0 `#` switches the file into shell mode; `GO`/`/` are terminators only in shell mode, so a stray `GO` in pure SQL stays a parse error. Handles CRLF line endings.
- lsp/host.rs: `ensure_analysis` auto-detects shell scripts and routes the SQL fragments through the embedded analyzer, mapping diagnostics back to host offsets so shell lines never reach the SQL parser.
- cli (analyze.rs / cli.rs): add a `Shell` host language and auto-detect shell scripts when no `--experimental-lang` is given.
- Tests at three layers: shell extractor units (incl. CRLF + offset invariants), EmbeddedAnalyzer integration, and the LSP-host #88 regression plus a stray-`GO` guard.

Semantic tokens / hover / go-to-definition for shell files are a follow-up (TODO); this fixes the spurious parse error only.