Add /refresh endpoint #55
Merged
Conversation
Replace tests using the removed /discover endpoint with tests for the new /refresh endpoint. Remove unused discover_connection_handler and DiscoverConnectionResponse.
- Consolidate prepare_cache_write to always use versioned directories
- Add CacheWriteHandle struct for atomic data refresh operations
- Remove delete_stale_tables (too error-prone, doesn't handle cleanup)
- Rename get_due_deletions to get_pending_deletions
- Add include_uncached flag to connection data refresh (default: false)
Resolve conflicts from origin/main's new migration system refactor. Add v2.sql and v3.sql migrations for pending_deletions table.
- Add warning log on timestamp parse failure instead of silent fallback
- Defer deletion by 24h on parse error to prevent premature data loss
- Expand include_uncached documentation with usage context
- Add comment explaining temp_dir lifetime in test harness
- Add retry_count to pending_deletions for stuck deletion handling
- Remove records after MAX_DELETION_RETRIES (5) failed attempts
- Make deletion worker interval configurable via builder
- Make parallel refresh count configurable via builder
- Extract magic numbers to named constants
- Handle S3 temp file removal failure gracefully with warning
S3Storage::delete now properly handles directory URLs by listing and deleting all objects with that prefix, matching FilesystemStorage behavior. Migrated S3 tests to use testcontainers for automatic MinIO orchestration.
The tables_removed field was misleading - it reported tables detected as removed from the remote source but didn't actually delete them from the catalog. Removed the field entirely to avoid confusing API consumers.
All call sites that delete versioned cache directories now use delete_prefix instead of delete. Reverted S3Storage::delete to only handle single files. This makes the API clearer: delete for files, delete_prefix for directories/prefixes.
FilesystemStorage::delete_prefix now strips the file:// prefix if present, fixing cache cleanup for local filesystem storage. Added integration tests to verify delete_prefix works with both file:// URLs and raw paths.
## Summary

## Details

## Breaking Changes

## API
The `/refresh` endpoint provides a unified interface for refreshing schema metadata and cached data. It supports multiple operation modes depending on the parameters provided.

### Endpoint

`POST /refresh`

### Request Parameters

| Parameter | Description |
| --- | --- |
| `connection_id` | Target a specific connection. Required for data refresh. |
| `schema_name` | Target a specific table together with `table_name`. |
| `table_name` | Target a specific table. Requires `schema_name` and `connection_id`. |
| `data` | If `true`, refreshes cached parquet data. If `false` (default), refreshes schema metadata only. |
| `include_uncached` | When `data=true` on a connection-wide refresh, also sync tables that aren't already cached. Default: `false`. |

### Operation Modes
#### 1. Schema Refresh: All Connections
Discovers tables from all configured connections and updates the catalog.
```
POST /refresh
{}
```

Response:

```json
{
  "connections_refreshed": 2,
  "connections_failed": 0,
  "tables_discovered": 15,
  "tables_added": 3,
  "tables_modified": 1
}
```

If any connections fail (e.g., due to missing credentials), the operation continues with the remaining connections and reports failures:

```json
{
  "connections_refreshed": 1,
  "connections_failed": 1,
  "tables_discovered": 10,
  "tables_added": 2,
  "tables_modified": 0,
  "errors": [
    {
      "connection_id": "broken_conn",
      "error": "Failed to connect: missing credentials"
    }
  ]
}
```

#### 2. Schema Refresh: Single Connection
Discovers tables from a specific connection.
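A sketch of such a request, built from the documented parameters (the connection id is illustrative):

```
POST /refresh
{ "connection_id": "my_postgres" }
```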
Response:
```json
{
  "connections_refreshed": 1,
  "connections_failed": 0,
  "tables_discovered": 8,
  "tables_added": 1,
  "tables_modified": 0
}
```

#### 3. Data Refresh: Single Table
Re-fetches data for a specific table and updates the parquet cache.
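A sketch of a single-table data refresh request, per the documented parameters (values illustrative):

```
POST /refresh
{ "connection_id": "my_postgres", "schema_name": "public", "table_name": "orders", "data": true }
```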
Response:
```json
{
  "connection_id": "my_postgres",
  "schema_name": "public",
  "table_name": "orders",
  "rows_synced": 15000,
  "duration_ms": 1234
}
```

Returns `404` if the table doesn't exist in the catalog.

#### 4. Data Refresh: All Tables in Connection
Re-fetches data for all cached tables in a connection.
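A sketch of a connection-wide data refresh request (connection id illustrative):

```
POST /refresh
{ "connection_id": "my_postgres", "data": true }
```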
Response:
```json
{
  "connection_id": "my_postgres",
  "tables_refreshed": 5,
  "tables_failed": 1,
  "total_rows": 50000,
  "duration_ms": 5678,
  "errors": [
    {
      "schema_name": "public",
      "table_name": "broken_table",
      "error": "Query failed: relation does not exist"
    }
  ]
}
```

By default, only tables that already have cached data are refreshed. To also sync uncached tables:
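A sketch of such a request with `include_uncached` set (values illustrative):

```
POST /refresh
{ "connection_id": "my_postgres", "data": true, "include_uncached": true }
```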
### Warnings

Non-fatal issues (like failed cleanup of old cache files) are reported in a `warnings` array without failing the operation:

```json
{
  "connection_id": "my_postgres",
  "schema_name": "public",
  "table_name": "orders",
  "rows_synced": 15000,
  "duration_ms": 1234,
  "warnings": [
    {
      "schema_name": "public",
      "table_name": "orders",
      "message": "Failed to schedule deletion of old cache: disk full"
    }
  ]
}
```

The `warnings` field is omitted when empty.

### Invalid Combinations
| Combination | Result |
| --- | --- |
| `data=true` without `connection_id` | `400 Bad Request` - data refresh requires `connection_id` |
| `schema_name` without `table_name` (`data=false`) | `400 Bad Request` - schema-level refresh not supported |
| `schema_name` without `table_name` (`data=true`) | `400 Bad Request` - data refresh with schema requires `table_name` |
| `table_name` without `schema_name` | `400 Bad Request` - `table_name` requires `schema_name` |
| `schema_name` or `table_name` with schema refresh | `400 Bad Request` - schema refresh cannot target specific tables |

### Error Responses

- `400 Bad Request` - Invalid parameter combination
- `404 Not Found` - Connection or table not found
- `500 Internal Server Error` - Unexpected server error
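The invalid-combination rules above can be sketched as a standalone client-side check. This is an illustration of the documented rules only, not the server's actual implementation; the function name and message strings are assumptions:

```python
def validate_refresh_params(connection_id=None, schema_name=None,
                            table_name=None, data=False,
                            include_uncached=False):
    """Return None if the parameter combination is valid for /refresh,
    else a 400-style error message mirroring the documented rules."""
    # table_name requires schema_name
    if table_name and not schema_name:
        return "table_name requires schema_name"
    # data refresh requires connection_id
    if data and not connection_id:
        return "data refresh requires connection_id"
    # schema_name without table_name is never valid
    if schema_name and not table_name:
        if data:
            return "data refresh with schema requires table_name"
        return "schema-level refresh not supported"
    # schema refresh (data=false) cannot target specific tables
    if not data and (schema_name or table_name):
        return "schema refresh cannot target specific tables"
    return None
```

For example, `validate_refresh_params()` and `validate_refresh_params(connection_id="my_postgres", data=True)` are valid modes and return `None`, while `validate_refresh_params(data=True)` reports the missing `connection_id`.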