sqlcapture+source-mysql: Discovery filtering improvements#4481
Merged
Refactors the 'DiscoverTables' operation into two distinct ones:
- ListTables: returns a list of table names and nothing else
- DiscoverTableDetails: returns a map of full discovery info corresponding to a specific set of tables.
This commit is just the minimal change to refactor that interface and shouldn't introduce any significant behavioral differences.
Modifies various places where table discovery is performed in non-Discovery RPC contexts to explicitly specify just the list of what tables we care about thanks to active bindings. Right now this is not a huge change but it means that improvements to the selectivity of DiscoverTableDetails can yield meaningful efficiency gains. Also removes the discovery call from the Validate RPC entirely. It was added long long ago as a simple "make sure things are basically sane with this capture" check, but is both expensive and entirely redundant with SetupPrerequisites / SetupTablePrerequisites checks these days.
Just fixing linter complaints while I'm here.
Replaces the old discovery queries (which would, for instance, list all columns of all tables and then aggregate the results) with more restrained versions which query for just the metadata of a specific list of desired tables. This should already be much more efficient than before, but when we add some additional table/schema filtering options it will be even more useful.
Implements the "Discovery Filters" config section as described in #4476
Alex-Bair
reviewed
May 15, 2026
Member
(This is mostly a note for myself for when I take another look later). The commits through the chunked discovery queries LGTM. I held off reviewing the last commit & approving since, based off your Slack thread, it sounds like we're considering using globbing instead of regex to match table patterns.
Implements a new helper package `go/tableglob` which knows how to match glob patterns against `(schema, table)` name tuples, with a bare pattern `"foo"` matching against the table in any schema and a qualified pattern `"foo.bar"` matching against both the schema and the table name. We could in theory have been lazier and just done regex matching against `schema + "." + table` or something, but this felt wrong because the natural impulse to write like `sales.items` to match a specific qualified table name would basically work but would be misusing the `.` metacharacter. It also just felt wrong to make users write `.*` instead of `*` for prefix/suffix matching and I liked the idea of supporting unqualified table names.
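To make the matching rule concrete, here is a small sketch of glob matching against `(schema, table)` tuples. It is not the `go/tableglob` implementation: the `matchTable` and `globToRegexp` names are hypothetical, and it assumes that `*` is the only wildcard and that a qualified pattern is split at its first `.`.

```go
package main

import (
	"regexp"
	"strings"
)

// globToRegexp turns a glob into an anchored regular expression.
// Assumption: '*' is the only wildcard; every other character, including
// '.', is matched literally.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	quoted := regexp.QuoteMeta(glob) // escapes '*' as `\*`
	return regexp.Compile("^" + strings.ReplaceAll(quoted, `\*`, ".*") + "$")
}

// matchTable reports whether pattern matches the (schema, table) pair.
// A bare pattern such as "foo" matches the table name in any schema; a
// qualified pattern such as "foo.bar" must match both parts.
func matchTable(pattern, schema, table string) (bool, error) {
	schemaGlob, tableGlob := "*", pattern
	if before, after, found := strings.Cut(pattern, "."); found {
		schemaGlob, tableGlob = before, after
	}
	schemaRe, err := globToRegexp(schemaGlob)
	if err != nil {
		return false, err
	}
	tableRe, err := globToRegexp(tableGlob)
	if err != nil {
		return false, err
	}
	return schemaRe.MatchString(schema) && tableRe.MatchString(table), nil
}
```

With this sketch, `matchTable("sales.items", "sales", "items")` matches, `matchTable("items", "other", "items")` also matches because the pattern is unqualified, and `matchTable("sales.item*", "sales", "items_archive")` matches through the trailing `*`.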
Member
Author
Yeah, I think we settled on globbing so I've gone ahead and implemented that now. I ended up adding a helper package in |
Description:
Implements #4476 in the shared `sqlcapture` logic and in `source-mysql`. Other databases remain largely unchanged with only minimal changes as required to fit the new list-then-details interface.
Workflow steps:
Nothing should change for databases other than MySQL.
Users of `source-mysql` will see a new "Discovery Filters" section in the task configuration, where they can configure more flexible discovery selection options.
Documentation links affected:
The configuration sections in all source-mysql docs should be updated:
Notes for reviewers:
It might help to go commit-by-commit here. This is a conceptually straightforward refactor-then-optimize-then-extend sequence, and seeing everything at once makes it look more complicated than it really is.