Add BigQuery CREATE EXTERNAL TABLE WITH PARTITION COLUMNS parsing support #66

Open
mesmacosta wants to merge 2 commits into tobilg:main from mesmacosta:feat/bigquery-create-external-table

@mesmacosta

Summary

BigQuery's CREATE EXTERNAL TABLE syntax supports clauses like WITH PARTITION COLUMNS and WITH CONNECTION that use the WITH keyword differently from the generic WITH properties parser. The generic parser was causing parse failures whenever column definitions or partition columns were present.

Example: failing query before this fix

CREATE EXTERNAL TABLE IF NOT EXISTS `{project}.{dataset}.{table}`
WITH PARTITION COLUMNS (
  table_name  STRING,
  sync_date   DATE,
  start_date  DATE,
  end_date    DATE,
  sync_id     STRING
)
OPTIONS (
  format                        = 'PARQUET',
  uris                          = ['gs://{bucket}/data/table_name={table}/*'],
  hive_partition_uri_prefix     = 'gs://{bucket}/data',
  require_hive_partition_filter = false
);
Error: Expected identifier, got LParen

Fix:

  • Consolidate all BigQuery EXTERNAL TABLE WITH handling into a single location at the WITH token consumption site, with a clean early return
  • Add EXTERNAL to BigQuery's special_modifier dialect list
  • Add generation support (builder + generator) for the new AST fields (with_partition_columns, with_connection)

This does not affect parsing of regular BigQuery CREATE TABLE statements, CTEs in AS SELECT, or any other dialect's WITH handling — the logic only activates when table_modifier == "EXTERNAL" AND dialect == BigQuery.
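
The scoping condition described above can be sketched as a small predicate. This is only an illustration: the `Dialect` enum and the `Option<&str>` representation of the table modifier are assumptions made for the example, and only the name `is_bigquery_external` comes from the commit message.

```rust
/// Hypothetical, reduced dialect enum; the real crate defines its own dialect type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Dialect {
    BigQuery,
    Snowflake,
}

/// Returns true only when both conditions named in the PR description hold:
/// the dialect is BigQuery AND the table modifier is exactly "EXTERNAL".
/// (Representing the modifier as Option<&str> is an assumption of this sketch.)
fn is_bigquery_external(dialect: Dialect, table_modifier: Option<&str>) -> bool {
    dialect == Dialect::BigQuery && table_modifier == Some("EXTERNAL")
}
```

With a guard of this shape, every other dialect and every non-EXTERNAL BigQuery statement falls through to the generic WITH properties path unchanged.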

Changes

  • parser.rs — BigQuery EXTERNAL TABLE parsing + 19 new tests
  • expressions.rs — New with_partition_columns and with_connection fields on CreateTable
  • builder.rs — Builder support for new fields
  • generator.rs — SQL generation for BigQuery EXTERNAL TABLE syntax
  • dialects/mod.rs — Add EXTERNAL to BigQuery's special modifiers

Test plan

  • 12 new BigQuery EXTERNAL TABLE tests (basic, with columns, with partition columns, with connection, full, roundtrips, IF NOT EXISTS, OR REPLACE, PR description SQL)
  • 8 new BigQuery WITH syntax compatibility tests (regular AS SELECT, single CTE, multiple CTEs, PARTITION BY + CLUSTER BY + OPTIONS, OR REPLACE, IF NOT EXISTS)
  • All 20 BigQuery-specific tests pass
  • Full test suite: 917 passed, 0 failed, 0 regressions

- Parse CREATE EXTERNAL TABLE with column definitions, WITH PARTITION
  COLUMNS, WITH CONNECTION, and OPTIONS clauses
- Add is_bigquery_external guard to prevent generic with_properties
  from consuming the WITH token needed by BigQuery-specific clauses
- Add BigQuery EXTERNAL TABLE generation (builder + generator)
- Add EXTERNAL to BigQuery dialect's special_modifier list
- Add 19 tests covering parsing, roundtrip, and WITH syntax
  compatibility (CTEs, PARTITION BY, CLUSTER BY, OR REPLACE, etc.)
- All 916 tests pass with no regressions
@tobilg
Owner

tobilg commented Mar 15, 2026

Nice work! Clean early-return pattern that keeps the BigQuery-specific logic well-scoped. A few things to address:

Must fix

match_identifier("COLUMNS") ignores its return value (parser.rs, around the WITH PARTITION COLUMNS parsing block):

  self.advance(); // consume PARTITION
  self.match_identifier("COLUMNS");   // ← return value discarded
  if self.check(TokenType::LParen) {

If someone writes WITH PARTITION (dt DATE) (missing COLUMNS), this silently accepts it — match_identifier returns false without advancing, and the LParen check still succeeds.
Should check the return value and error, e.g.:

  if !self.match_identifier("COLUMNS") {
      // error or handle bare WITH PARTITION case if BigQuery allows it
  }
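
To make the failure mode concrete, here is a self-contained toy version of the check. The `Token` enum, the `Parser` struct, and `parse_partition_columns_keyword` are hypothetical stand-ins, not the crate's real types. Only the behavior of `match_identifier` (returns false without advancing on a mismatch) is taken from the description above.

```rust
/// Toy token type; the real parser's TokenType has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    LParen,
}

/// Hypothetical minimal parser state for this sketch.
struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Consumes the next token if it is the identifier `word`
    /// (case-insensitive) and reports whether it did so.
    /// On a mismatch nothing is consumed and false is returned.
    fn match_identifier(&mut self, word: &str) -> bool {
        match self.tokens.get(self.pos) {
            Some(Token::Ident(s)) if s.eq_ignore_ascii_case(word) => {
                self.pos += 1;
                true
            }
            _ => false,
        }
    }

    /// Parses the keywords that follow WITH. Both return values are
    /// checked, so `WITH PARTITION (` is rejected instead of accepted.
    fn parse_partition_columns_keyword(&mut self) -> Result<(), String> {
        if !self.match_identifier("PARTITION") {
            return Err("Expected PARTITION after WITH".to_string());
        }
        if !self.match_identifier("COLUMNS") {
            return Err(format!(
                "Expected COLUMNS after WITH PARTITION, got {:?}",
                self.tokens.get(self.pos)
            ));
        }
        Ok(())
    }
}
```

Feeding this parser `PARTITION COLUMNS (` succeeds and leaves the position at the `(`, while `PARTITION (` now produces an error rather than being silently accepted.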

Suggestions

  • Reversed clause order: The while loop allows WITH PARTITION COLUMNS and WITH CONNECTION in any order (good!), but there's no test for WITH CONNECTION ... WITH PARTITION COLUMNS .... Worth adding one to lock in that behavior.
  • Bare WITH PARTITION COLUMNS (no column list): BigQuery allows this for auto-detected hive partition columns. Currently the code just skips column parsing if no ( follows, which happens to work, but a dedicated test would make this intentional.
  • Duplicate clauses: Multiple WITH PARTITION COLUMNS clauses silently overwrite. Not a real-world concern but worth a comment in the code.
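
The three suggestions can be pictured with a toy clause loop over pre-split tokens. Everything here is an assumption made for illustration: the `ExternalClauses` struct, the `parse_with_clauses` function, the `&[&str]` token representation, and the choice to encode a bare WITH PARTITION COLUMNS as `Some(empty list)`. Only the two field names mirror the PR's new AST fields.

```rust
/// Clauses collected after the table name (hypothetical struct for this sketch).
#[derive(Debug, Default, PartialEq)]
struct ExternalClauses {
    /// None: clause absent. Some(empty): bare WITH PARTITION COLUMNS
    /// (this encoding is an assumption, not necessarily the PR's).
    with_partition_columns: Option<Vec<(String, String)>>,
    with_connection: Option<String>,
}

/// Accepts WITH PARTITION COLUMNS and WITH CONNECTION in any order and
/// stops at the first token that is not WITH (for example OPTIONS).
fn parse_with_clauses(tokens: &[&str]) -> Result<ExternalClauses, String> {
    let mut out = ExternalClauses::default();
    let mut i = 0;
    while i < tokens.len() && tokens[i].eq_ignore_ascii_case("WITH") {
        i += 1; // consume WITH
        let keyword = tokens.get(i).map(|t| t.to_ascii_uppercase());
        match keyword.as_deref() {
            Some("PARTITION") => {
                i += 1; // consume PARTITION
                if !tokens.get(i).map_or(false, |t| t.eq_ignore_ascii_case("COLUMNS")) {
                    return Err("Expected COLUMNS after WITH PARTITION".to_string());
                }
                i += 1; // consume COLUMNS
                let mut cols = Vec::new();
                // The column list is optional: the bare form has no "(".
                if tokens.get(i) == Some(&"(") {
                    i += 1;
                    loop {
                        let tok = *tokens.get(i).ok_or("Unterminated partition column list")?;
                        if tok == ")" {
                            i += 1;
                            break;
                        }
                        if tok == "," {
                            i += 1;
                            continue;
                        }
                        let ty = *tokens.get(i + 1).ok_or("Expected a column type")?;
                        if ty == ")" || ty == "," {
                            return Err(format!("Expected a type for column {}", tok));
                        }
                        cols.push((tok.to_string(), ty.to_string()));
                        i += 2;
                    }
                }
                // A duplicate clause silently overwrites the earlier one.
                out.with_partition_columns = Some(cols);
            }
            Some("CONNECTION") => {
                i += 1; // consume CONNECTION
                let name = *tokens.get(i).ok_or("Expected a connection name")?;
                out.with_connection = Some(name.to_string()); // later clause wins
                i += 1;
            }
            other => return Err(format!("Unexpected token after WITH: {:?}", other)),
        }
    }
    Ok(out)
}
```

Tests in the spirit of the suggestions would then assert that the reversed order parses to the same fields, that the bare form yields an empty column list, and that a repeated clause keeps the last value.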

Nits

  • Extra blank line at the end of the BigQuery early-return block (between the closing } and // For DYNAMIC/ICEBERG/EXTERNAL tables...).

What looks good

  • Early-return keeps the generic WITH properties path untouched
  • Generator placement is correct (WITH PARTITION COLUMNS → WITH CONNECTION → OPTIONS)
  • Pretty-print and compact modes both handled
  • Great test coverage — especially the 8 compatibility tests verifying non-EXTERNAL BigQuery syntax isn't broken
  • Roundtrip tests for all major variants
  • Proper #[serde(default, skip_serializing_if = ...)] on new fields
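
As a rough picture of the clause ordering praised above, the following sketch emits the clauses in the order WITH PARTITION COLUMNS, WITH CONNECTION, OPTIONS in a compact, single-line form. The function `generate_external_table` and its parameters are hypothetical; the PR's actual generator works on the `CreateTable` AST and also supports pretty-printing, which is not modeled here.

```rust
/// Compact-mode sketch of the generation order from the review:
/// WITH PARTITION COLUMNS, then WITH CONNECTION, then OPTIONS.
/// All parameter shapes are assumptions made for this example.
fn generate_external_table(
    table: &str,
    partition_columns: Option<&[(&str, &str)]>,
    connection: Option<&str>,
    options: &[(&str, &str)],
) -> String {
    let mut sql = format!("CREATE EXTERNAL TABLE {}", table);

    if let Some(cols) = partition_columns {
        sql.push_str(" WITH PARTITION COLUMNS");
        // An empty list is emitted as the bare clause with no parentheses.
        if !cols.is_empty() {
            let list: Vec<String> = cols
                .iter()
                .map(|(name, ty)| format!("{} {}", name, ty))
                .collect();
            sql.push_str(&format!(" ({})", list.join(", ")));
        }
    }

    if let Some(conn) = connection {
        sql.push_str(&format!(" WITH CONNECTION {}", conn));
    }

    if !options.is_empty() {
        // Option values are passed through as already-rendered SQL literals.
        let list: Vec<String> = options
            .iter()
            .map(|(key, value)| format!("{} = {}", key, value))
            .collect();
        sql.push_str(&format!(" OPTIONS ({})", list.join(", ")));
    }

    sql
}
```

Keeping the order fixed in the generator is what makes roundtrip tests deterministic even though the parser accepts the two WITH clauses in either order.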

mesmacosta force-pushed the feat/bigquery-create-external-table branch from 1a9add1 to 0f99e5f on March 16, 2026 17:11
@mesmacosta
Author

@tobilg addressed the review items.

@tobilg
Owner

tobilg commented Mar 16, 2026

Will have a look, thank you!
