feat(py): DataSourceReader bridge for ggsql database pushdown by cpsievert · Pull Request #221 · posit-dev/querychat

cpsievert · 2026-04-18T03:31:46Z

Motivation

querychat's current ggsql integration splits execution into two phases: run the SQL on the real database, then replay the VISUALISE portion locally in an in-memory DuckDB. This has two problems:

Scaling — the full SQL result must be pulled into Python memory, even when ggsql's stat transforms (histogram, density, boxplot) would reduce it to a small summary. A histogram of 10M rows pulls all 10M rows into memory only to bin them into ~30 buckets.
Multi-source layers — ggsql supports per-layer data sources (e.g., a CTE fed to a different DRAW clause). The two-phase approach loses intermediate tables at the DataSource boundary, so querychat rejects these queries entirely.
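Here is a minimal sketch of the scaling argument. It uses the standard-library sqlite3 module as a stand-in for the real database. The table name, the 100,000-row size, and the 30-bucket equal-width binning are illustrative assumptions; they are not the SQL ggsql actually generates for a histogram.

```python
import random
import sqlite3

# Stand-in for a large table living on the "real" database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (value REAL)")
rng = random.Random(0)
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    ((rng.uniform(0, 100),) for _ in range(100_000)),
)

# Two-phase style: pull every row into Python before any binning happens.
all_rows = conn.execute("SELECT value FROM measurements").fetchall()

# Pushdown style: the database computes ~30 equal-width buckets over [0, 100)
# and only the per-bucket counts cross the wire.
BIN_WIDTH = 100 / 30
binned = conn.execute(
    """
    SELECT CAST(value / ? AS INTEGER) AS bucket, COUNT(*) AS n
    FROM measurements
    GROUP BY bucket
    ORDER BY bucket
    """,
    (BIN_WIDTH,),
).fetchall()

print(len(all_rows), "rows transferred without pushdown")
print(len(binned), "rows transferred with pushdown")
```

The bucket query returns a few dozen rows no matter how large the table grows. The two-phase approach gives up that reduction because it fetches everything first.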

Both problems stem from the same root cause: querychat splits the query at the SQL/VISUALISE boundary rather than letting ggsql run the full pipeline against the real database.

Approach

For SQLAlchemySource data sources, this PR implements a DataSourceReader — a Python object that satisfies ggsql's reader protocol (execute_sql(), register(), unregister()) by routing SQL to the real database via SQLAlchemy. ggsql runs its entire pipeline (parsing, CTEs, stat transforms, layer queries) against the real DB.
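The following minimal sketch shows the shape of that protocol. To run without extra packages, it uses the standard-library sqlite3 module in place of a SQLAlchemy engine. The three method names come from the PR. The class name SqliteReaderSketch, the argument types, and the (columns, rows) return value are assumptions, and ggsql's real reader protocol may exchange a different data type.

```python
import sqlite3
from typing import Any, List, Sequence, Set, Tuple


class SqliteReaderSketch:
    """Hypothetical reader exposing execute_sql(), register(), unregister().

    Assumption: sqlite3 stands in for SQLAlchemy, and tabular data is passed
    as (columns, rows) rather than ggsql's actual frame type.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._registered: Set[str] = set()

    def execute_sql(self, sql: str) -> Tuple[List[str], List[Tuple[Any, ...]]]:
        # Route the query to the real database; return column names and rows.
        cur = self._conn.execute(sql)
        columns = [d[0] for d in cur.description] if cur.description else []
        return columns, cur.fetchall()

    def register(self, name: str, columns: Sequence[str], rows: Sequence[tuple]) -> None:
        # Materialise intermediate data as a TEMP table so later layer queries
        # can refer to it by name. (Real code should validate identifiers.)
        col_list = ", ".join(f'"{c}"' for c in columns)
        self._conn.execute(f'CREATE TEMP TABLE "{name}" ({col_list})')
        placeholders = ", ".join("?" for _ in columns)
        self._conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
        self._registered.add(name)

    def unregister(self, name: str) -> None:
        # Drop the temp table once the pipeline no longer needs it.
        self._conn.execute(f'DROP TABLE IF EXISTS temp."{name}"')
        self._registered.discard(name)


# Usage: register intermediate data, query it, then clean up.
conn = sqlite3.connect(":memory:")
reader = SqliteReaderSketch(conn)
reader.register("layer_data", ["x", "y"], [(1, 2), (3, 4)])
cols, rows = reader.execute_sql('SELECT x + y AS total FROM "layer_data" ORDER BY total')
reader.unregister("layer_data")
```

Temp tables are one plausible way to back register(). They also explain why a "temp table permission denied" error is a realistic trigger for the fallback path.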

sqlglot transpiles ggsql's ANSI-generated SQL to the target database dialect. The dialect mapping covers 22 SQLAlchemy backends, verified by installing the actual driver packages and checking engine.dialect.name.
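The dialect lookup and transpilation step might look roughly like the sketch below. The mapping is a deliberately partial, hypothetical subset: it is not the PR's 22-entry SQLGLOT_DIALECTS. The helper names are illustrative, and add_dialect() is only a stand-in for the PR's register_sqlglot_dialect(), whose real signature is not shown here. The sketch imports sqlglot lazily, so the lookup runs even where sqlglot is not installed. It relies on sqlglot.transpile(sql, read=..., write=...), which returns one string per statement.

```python
from typing import Dict, Optional

# Hypothetical partial map from SQLAlchemy's engine.dialect.name to a
# sqlglot dialect name; the PR's real table covers 22 verified backends.
DIALECT_MAP: Dict[str, str] = {
    "postgresql": "postgres",
    "mssql": "tsql",
    "mysql": "mysql",
    "sqlite": "sqlite",
    "snowflake": "snowflake",
    "duckdb": "duckdb",
}


def add_dialect(sqlalchemy_name: str, sqlglot_name: str) -> None:
    """Hypothetical stand-in for register_sqlglot_dialect(): extend the map."""
    DIALECT_MAP[sqlalchemy_name] = sqlglot_name


def lookup_dialect(sqlalchemy_name: str) -> Optional[str]:
    """Return the sqlglot dialect for an SQLAlchemy dialect, or None if unknown."""
    return DIALECT_MAP.get(sqlalchemy_name)


def transpile_for(sql: str, sqlalchemy_name: str) -> str:
    """Rewrite generic SQL into the target dialect using sqlglot."""
    target = lookup_dialect(sqlalchemy_name)
    if target is None:
        # Unknown dialect: the caller can log a warning and fall back.
        raise ValueError(f"no sqlglot dialect mapped for {sqlalchemy_name!r}")
    import sqlglot  # lazy import: only needed once a target dialect is known

    # read=None parses with sqlglot's default dialect; one output per statement.
    return ";\n".join(sqlglot.transpile(sql, read=None, write=target))
```

With sqlglot installed, a call such as transpile_for("SELECT a FROM t LIMIT 5", "mssql") would return the query rewritten for T-SQL. An unmapped dialect raises before sqlglot is touched, which gives the caller a clean signal to fall back.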

querychat falls back to the current two-phase approach when the bridge fails (e.g., temp table permission denied, unsupported dialect, transpilation error), and always uses it for non-SQLAlchemy data sources.
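The fallback rule can be sketched as a small wrapper. The function name execute_with_fallback and its callable-based signature are hypothetical. The PR's actual execute_ggsql() may check the source type differently and catch narrower exception types.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def execute_with_fallback(
    is_sqlalchemy_source: bool,
    bridge: Callable[[], T],
    two_phase: Callable[[], T],
) -> T:
    """Try the pushdown bridge first; otherwise use two-phase execution."""
    if not is_sqlalchemy_source:
        # Non-SQLAlchemy sources never attempt the bridge.
        return two_phase()
    try:
        return bridge()
    except Exception as exc:  # e.g. temp-table permission denied, transpile error
        logger.warning("ggsql bridge failed (%s); using two-phase execution", exc)
        return two_phase()


def failing_bridge() -> str:
    # Simulates a database that refuses to create temp tables.
    raise PermissionError("cannot create temp table")


result = execute_with_fallback(True, failing_bridge, lambda: "two-phase")
direct = execute_with_fallback(True, lambda: "bridge", lambda: "two-phase")
skipped = execute_with_fallback(False, lambda: "bridge", lambda: "two-phase")
```

Catching a broad Exception keeps the chart rendering when the bridge hits an unexpected driver error, at the cost of possibly masking bugs. The logged warning keeps those failures visible.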

Why a separate PR

This is split off from feat/ggsql-integration because the DataSourceReader bridge is Python-specific — it depends on SQLAlchemy and sqlglot, neither of which has an R equivalent. The parent branch (feat/ggsql-integration) contains ggsql prompt/syntax updates and other changes that apply to both R and Python. Keeping this separate makes it clear that the R package doesn't need a corresponding change.

Changes

New: _datasource_reader.py — DataSourceReader class, SQLGLOT_DIALECTS mapping (22 dialects), transpile_sql(), register_sqlglot_dialect() for custom additions
Modified: _viz_ggsql.py — execute_ggsql() tries bridge path first, falls back to execute_two_phase() (renamed current logic); logs a warning for unknown dialects
Modified: _viz_tools.py, _shiny_module.py — updated callers to pass original query string
New dep: sqlglot>=26.0 added to the viz extra (pure Python, zero transitive deps, 6.6 MB)
Tests: 19 new tests covering dialect mapping, transpilation, DataSourceReader lifecycle, and end-to-end ggsql integration against real SQLite

Test plan

All existing ggsql tests pass with updated 3-arg execute_ggsql() signature
New DataSourceReader unit tests pass (SQLite in-memory)
End-to-end: ggsql.execute(query, reader) works for scatter, filter, Form B, aggregation
Manual test with a real Snowflake/Postgres connection (TODO)

🤖 Generated with Claude Code

Implements a DataSourceReader that routes ggsql's full pipeline through the real database via SQLAlchemy, using sqlglot for dialect transpilation. For SQLAlchemySource with a known dialect, ggsql runs CTEs, stat transforms, and layer queries directly on the real DB, avoiding the need to pull large result sets into Python memory. Falls back to the existing two-phase approach (now `execute_two_phase`) for other DataSource types or on bridge failure. Includes verified dialect mappings for 22 SQLAlchemy backends and register_sqlglot_dialect() for custom additions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cpsievert wants to merge 1 commit into feat/ggsql-integration from feat/datasource-reader-bridge

cpsievert commented Apr 18, 2026