
sqlcapture+source-mysql: Discovery filtering improvements#4481

Merged: willdonnelly merged 6 commits into main from wgd/2026-05-14-sqlcapture-table-filters on May 15, 2026

@willdonnelly (Member)

Description:

Implements #4476 in the shared sqlcapture logic and in source-mysql. Other databases are largely unchanged, with only the minimal changes required to fit the new list-then-details interface.

Workflow steps:

Nothing should change for databases other than MySQL.

Users of source-mysql will see a new "Discovery Filters" section in the task configuration, where they can configure more flexible options for selecting which tables are discovered.

Documentation links affected:

The configuration sections in all source-mysql docs should be updated:

Notes for reviewers:

It might help to review this commit by commit: this is a conceptually straightforward refactor-then-optimize-then-extend sequence, and seeing everything at once makes it look more complicated than it really is.

Refactors the 'DiscoverTables' operation into two distinct ones:
- ListTables: returns a list of table names and nothing else
- DiscoverTableDetails: returns a map of full discovery info
  corresponding to a specific set of tables.

This commit is just the minimal change to refactor that interface
and shouldn't introduce any significant behavioral differences.
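A minimal Go sketch of the list-then-details split described above. The method names come from the commit message; `TableID`, `TableInfo`, the `Discoverer` interface, and the in-memory `fakeDB` are hypothetical, and the real signatures presumably take a context and return much richer metadata.

```go
package main

import (
	"fmt"
	"sort"
)

// TableID identifies a table by schema and name (hypothetical type).
type TableID struct{ Schema, Table string }

// TableInfo is a placeholder for full discovery info; a real connector would
// also carry column types, primary keys, and so on.
type TableInfo struct{ Columns []string }

// Discoverer sketches the list-then-details split. Method names follow the
// commit message; the signatures are assumptions (context arguments omitted).
type Discoverer interface {
	// ListTables returns table names and nothing else.
	ListTables() ([]TableID, error)
	// DiscoverTableDetails returns full info for just the requested tables.
	DiscoverTableDetails(tables []TableID) (map[TableID]*TableInfo, error)
}

// fakeDB is an in-memory stand-in used only to exercise the interface.
type fakeDB struct{ tables map[TableID]*TableInfo }

func (db *fakeDB) ListTables() ([]TableID, error) {
	ids := make([]TableID, 0, len(db.tables))
	for id := range db.tables {
		ids = append(ids, id)
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Slice(ids, func(i, j int) bool {
		if ids[i].Schema != ids[j].Schema {
			return ids[i].Schema < ids[j].Schema
		}
		return ids[i].Table < ids[j].Table
	})
	return ids, nil
}

func (db *fakeDB) DiscoverTableDetails(tables []TableID) (map[TableID]*TableInfo, error) {
	out := make(map[TableID]*TableInfo, len(tables))
	for _, id := range tables {
		if info, ok := db.tables[id]; ok {
			out[id] = info // tables that don't exist are simply omitted
		}
	}
	return out, nil
}

func main() {
	var db Discoverer = &fakeDB{tables: map[TableID]*TableInfo{
		{"sales", "items"}:  {Columns: []string{"id", "name"}},
		{"sales", "orders"}: {Columns: []string{"id", "item_id"}},
	}}
	ids, _ := db.ListTables()
	// Fetch details for only the first listed table.
	details, _ := db.DiscoverTableDetails(ids[:1])
	fmt.Println(len(ids), len(details), details[ids[0]].Columns) // 2 1 [id name]
}
```

The point of the split is that listing names is cheap, so callers can decide which tables they actually need before paying for the expensive per-table metadata queries.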

Modifies the places where table discovery is performed outside of the
Discovery RPC so that they explicitly request just the tables we care
about (those with active bindings). Right now this is not a huge change,
but it means that improvements to the selectivity of DiscoverTableDetails
can yield meaningful efficiency gains.
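A sketch of deriving the table list from active bindings before calling DiscoverTableDetails. The `Binding` and `TableID` types and the `activeTables` helper are hypothetical; the connector's real binding types are not shown in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// TableID identifies a table by schema and name (hypothetical type).
type TableID struct{ Schema, Table string }

// Binding is a hypothetical stand-in for an active capture binding; real
// bindings carry much more (resource config, backfill state, and so on).
type Binding struct{ Schema, Table string }

// activeTables returns the deduplicated, sorted set of tables referenced by
// the active bindings. Only these need to be passed to DiscoverTableDetails
// when discovery runs outside the Discovery RPC.
func activeTables(bindings []Binding) []TableID {
	seen := make(map[TableID]bool, len(bindings))
	var out []TableID
	for _, b := range bindings {
		id := TableID{b.Schema, b.Table}
		if !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Schema != out[j].Schema {
			return out[i].Schema < out[j].Schema
		}
		return out[i].Table < out[j].Table
	})
	return out
}

func main() {
	bindings := []Binding{{"sales", "orders"}, {"sales", "items"}, {"sales", "orders"}}
	fmt.Println(activeTables(bindings)) // [{sales items} {sales orders}]
}
```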

Also removes the discovery call from the Validate RPC entirely. It
was added long ago as a simple "make sure things are basically sane
with this capture" check, but these days it is both expensive and
entirely redundant with the SetupPrerequisites / SetupTablePrerequisites
checks.

Just fixing linter complaints while I'm here.

Replaces the old discovery queries (which would, for instance, list
all columns of all tables and then aggregate the results) with
narrower versions which query for just the metadata of a specific
list of desired tables.

This should already be much more efficient than before, but when
we add some additional table/schema filtering options it will be
even more useful.
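To illustrate querying metadata for only a specific list of tables, here is a sketch that chunks the requested tables and builds one parameterized `information_schema.columns` query per chunk. The query text, the chunk size, and the `buildColumnQueries` helper are assumptions for illustration, not the connector's actual queries.

```go
package main

import (
	"fmt"
	"strings"
)

// TableID identifies a table by schema and name (hypothetical type).
type TableID struct{ Schema, Table string }

// chunkQuery is one parameterized query plus its positional arguments.
type chunkQuery struct {
	SQL  string
	Args []any
}

// buildColumnQueries splits the requested tables into chunks of at most
// chunkSize and builds one information_schema.columns query per chunk, so the
// database only returns metadata for the tables we asked about.
func buildColumnQueries(tables []TableID, chunkSize int) []chunkQuery {
	if chunkSize < 1 {
		chunkSize = 1 // avoid an infinite loop on a nonsensical chunk size
	}
	var queries []chunkQuery
	for start := 0; start < len(tables); start += chunkSize {
		end := start + chunkSize
		if end > len(tables) {
			end = len(tables)
		}
		var placeholders []string
		var args []any
		for _, t := range tables[start:end] {
			placeholders = append(placeholders, "(?, ?)")
			args = append(args, t.Schema, t.Table)
		}
		query := "SELECT table_schema, table_name, column_name, ordinal_position, data_type" +
			" FROM information_schema.columns" +
			" WHERE (table_schema, table_name) IN (" + strings.Join(placeholders, ", ") + ")" +
			" ORDER BY table_schema, table_name, ordinal_position"
		queries = append(queries, chunkQuery{SQL: query, Args: args})
	}
	return queries
}

func main() {
	tables := []TableID{{"sales", "items"}, {"sales", "orders"}, {"hr", "staff"}}
	for _, q := range buildColumnQueries(tables, 2) {
		fmt.Println(q.SQL)
		fmt.Println(q.Args)
	}
}
```

Chunking keeps each query's placeholder list bounded no matter how many tables are requested, while still avoiding a scan over every column of every table.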

Implements the "Discovery Filters" config section as described in
#4476.
willdonnelly requested a review from a team on May 14, 2026 22:51
willdonnelly added the change:planned (This is a planned change) label on May 14, 2026

@Alex-Bair (Member) left a comment:


(This is mostly a note to myself for when I take another look later.) The commits through the chunked discovery queries LGTM. I held off on reviewing the last commit and approving since, based on your Slack thread, it sounds like we're considering using globbing instead of regex to match table patterns.

Implements a new helper package `go/tableglob` which knows how to
match glob patterns against `(schema, table)` name tuples, with a
bare pattern `"foo"` matching against the table in any schema and
a qualified pattern `"foo.bar"` matching against both the schema
and the table name.
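A minimal sketch of the matching semantics described above, using the translate-to-regex-and-compile approach mentioned later in the thread. `Pattern`, `Compile`, and `globToRegexp` are hypothetical names, not the real `go/tableglob` API; support for `?`, splitting on the first `.`, and case-sensitive matching are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pattern is a hypothetical compiled table pattern; it is not the actual
// go/tableglob API, which the PR does not show.
type Pattern struct {
	schema *regexp.Regexp // nil for a bare pattern, meaning "any schema"
	table  *regexp.Regexp
}

// globToRegexp translates a glob into an anchored regex: '*' matches any run
// of characters, '?' matches exactly one (an assumption), and everything else
// is matched literally via regexp.QuoteMeta.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	var sb strings.Builder
	sb.WriteString(`(?s)^`) // (?s) lets '.' match newlines too
	for _, r := range glob {
		switch r {
		case '*':
			sb.WriteString(`.*`)
		case '?':
			sb.WriteString(`.`)
		default:
			sb.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	sb.WriteString(`$`)
	return regexp.Compile(sb.String())
}

// Compile parses "table" (any schema) or "schema.table" (both must match).
// Splitting on the first '.' is an assumption of this sketch.
func Compile(pattern string) (*Pattern, error) {
	schemaGlob, tableGlob, qualified := strings.Cut(pattern, ".")
	var p Pattern
	var err error
	if qualified {
		if p.schema, err = globToRegexp(schemaGlob); err != nil {
			return nil, err
		}
	} else {
		tableGlob = pattern // bare pattern: the whole thing is the table glob
	}
	if p.table, err = globToRegexp(tableGlob); err != nil {
		return nil, err
	}
	return &p, nil
}

// Match reports whether the (schema, table) tuple matches the pattern.
func (p *Pattern) Match(schema, table string) bool {
	if p.schema != nil && !p.schema.MatchString(schema) {
		return false
	}
	return p.table.MatchString(table)
}

func main() {
	bare, _ := Compile("items")
	qual, _ := Compile("sales.*")
	fmt.Println(bare.Match("sales", "items"), bare.Match("archive", "items"))   // true true
	fmt.Println(qual.Match("sales", "orders"), qual.Match("sales_archive", "orders")) // true false
}
```

Matching schema and table separately means the `.` in a qualified pattern is always a separator, never a wildcard.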

We could in theory have been lazier and just done regex matching
against `schema + "." + table` or something, but this felt wrong
because the natural impulse to write something like `sales.items` to match
a specific qualified table name would basically work but would be
misusing the `.` metacharacter. It also just felt wrong to make
users write `.*` instead of `*` for prefix/suffix matching and I
liked the idea of supporting unqualified table names.
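To make the regex pitfall concrete, here is a small Go sketch of the rejected approach: a hypothetical `naiveMatch` helper that matches a regex against `schema + "." + table`. A user who writes `sales.*` meaning "every table in `sales`" also picks up any schema that merely starts with `sales`, and a bare `*` is not a valid regex at all.

```go
package main

import (
	"fmt"
	"regexp"
)

// naiveMatch is the rejected alternative: treat the user's pattern as a regex
// over schema + "." + table. Shown only to illustrate the pitfall.
func naiveMatch(pattern, schema, table string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(schema + "." + table), nil
}

func main() {
	// The intent is "every table in the sales schema", but in a regex ".*"
	// means "any characters", so schemas that merely start with "sales" match.
	for _, schema := range []string{"sales", "sales_archive", "salesforce"} {
		ok, _ := naiveMatch("sales.*", schema, "orders")
		fmt.Println(schema, ok) // true in all three cases
	}
	// A glob-style "*" on its own fails to compile as a regex.
	if _, err := naiveMatch("*", "sales", "orders"); err != nil {
		fmt.Println("error:", err)
	}
}
```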

@willdonnelly (Member, Author)

Yeah, I think we settled on globbing so I've gone ahead and implemented that now.

I ended up adding a helper package in go/tableglob for that: it didn't seem worth pulling in a non-stdlib dependency, using filepath.Match would have been a little awkward, and the translate-to-regex-and-compile code turned out not to be too hairy.

@Alex-Bair (Member) left a comment:


LGTM!

willdonnelly merged commit 579c7aa into main on May 15, 2026
68 of 69 checks passed
willdonnelly deleted the wgd/2026-05-14-sqlcapture-table-filters branch on May 15, 2026 16:26

Labels: change:planned (This is a planned change)


2 participants