Conversation

zfarrell commented Jan 16, 2026

Summary

  • Replaces /connections/{id}/discover with unified POST /refresh for schema + data refresh
  • Adds versioned cache writes (atomic swaps) and deferred deletion with a background worker (see the sketch after this list)
  • Adds refresh warnings and better error mapping for not-found tables
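
For orientation, here is a minimal sketch of the versioned-write idea: new data lands in a fresh version-stamped directory and only becomes visible once a "current" pointer is swapped atomically, leaving the old version to be deleted later. The name CacheWriteHandle mirrors the PR, but the fields, paths, and pointer-file scheme below are illustrative assumptions, not the actual implementation.

use std::fs;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Illustrative stand-in for the PR's CacheWriteHandle: data is written into a
/// fresh versioned directory and only made visible once `commit` succeeds.
struct CacheWriteHandle {
    table_root: PathBuf,  // e.g. cache/my_postgres/public/orders
    version_dir: PathBuf, // e.g. cache/my_postgres/public/orders/v1737057600
}

impl CacheWriteHandle {
    fn begin(table_root: &Path) -> std::io::Result<Self> {
        let version = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
        let version_dir = table_root.join(format!("v{version}"));
        fs::create_dir_all(&version_dir)?;
        Ok(Self { table_root: table_root.to_path_buf(), version_dir })
    }

    /// Where the refresh writes the new parquet file(s).
    fn data_path(&self) -> PathBuf {
        self.version_dir.join("data.parquet")
    }

    /// Atomically repoint "current" to the new version and return the previous
    /// version directory (if any) so it can be scheduled for deferred deletion.
    fn commit(self) -> std::io::Result<Option<PathBuf>> {
        let pointer = self.table_root.join("CURRENT");
        let previous = fs::read_to_string(&pointer).ok().map(PathBuf::from);
        let tmp = self.table_root.join("CURRENT.tmp");
        fs::write(&tmp, self.version_dir.to_string_lossy().as_bytes())?;
        fs::rename(&tmp, &pointer)?; // rename is atomic on the same filesystem
        Ok(previous)
    }
}

fn main() -> std::io::Result<()> {
    let handle = CacheWriteHandle::begin(Path::new("cache/my_postgres/public/orders"))?;
    fs::write(handle.data_path(), b"...parquet bytes...")?;
    if let Some(old_version) = handle.commit()? {
        println!("schedule deferred deletion of {}", old_version.display());
    }
    Ok(())
}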

Details

  • New refresh request/response models with optional warnings
  • Storage refactor: versioned directories + cache write handle (filesystem + S3)
  • Catalog support for pending deletions with retry tracking (see the sketch after this list)
  • Extensive new tests for refresh behavior, storage, and deletion flow
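
A rough sketch of the deferred-deletion flow: each pending_deletions record carries a retry_count, and the background worker drops the record after MAX_DELETION_RETRIES (5) failed attempts, as described in the commit notes below. The Catalog and Storage traits here are illustrative stand-ins, not the crate's real interfaces.

use std::path::PathBuf;
use std::time::Duration;

const MAX_DELETION_RETRIES: u32 = 5; // after this many failures the record is dropped

/// One row of the pending_deletions table (illustrative shape).
struct PendingDeletion {
    id: i64,
    path: PathBuf,
    retry_count: u32,
}

/// Minimal interfaces the worker needs from the catalog and storage layers.
trait Catalog {
    fn get_pending_deletions(&self) -> Vec<PendingDeletion>;
    fn remove_deletion(&self, id: i64);
    fn bump_retry(&self, id: i64);
}

trait Storage {
    fn delete_prefix(&self, path: &PathBuf) -> Result<(), String>;
}

/// One pass of the background worker: try each due deletion, retry later on
/// failure, and give up (dropping the record) once the retry cap is reached.
fn run_deletion_pass(catalog: &dyn Catalog, storage: &dyn Storage) {
    for pending in catalog.get_pending_deletions() {
        match storage.delete_prefix(&pending.path) {
            Ok(()) => catalog.remove_deletion(pending.id),
            Err(e) if pending.retry_count + 1 >= MAX_DELETION_RETRIES => {
                eprintln!("giving up on {}: {e}", pending.path.display());
                catalog.remove_deletion(pending.id);
            }
            Err(_) => catalog.bump_retry(pending.id),
        }
    }
}

fn main() {
    // In the service this pass runs on a configurable interval, e.g.:
    let _interval = Duration::from_secs(60);
    // loop { run_deletion_pass(&catalog, &storage); std::thread::sleep(_interval); }
}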

Breaking Changes

  • Removes /connections/{id}/discover; use POST /refresh instead

API

The /refresh endpoint provides a unified interface for refreshing schema metadata and cached data. It supports multiple operation modes depending on the parameters provided.

Endpoint

POST /refresh
Content-Type: application/json

Request Parameters

Parameter        | Type    | Required | Description
connection_id    | string  | No       | External connection ID to target. If omitted, operates on all connections.
schema_name      | string  | No       | Database schema name. Required when targeting a specific table.
table_name       | string  | No       | Table name. Requires schema_name and connection_id.
data             | boolean | No       | If true, refreshes cached parquet data. If false (default), refreshes schema metadata only.
include_uncached | boolean | No       | When data=true on a connection-wide refresh, also sync tables that aren't already cached. Default: false.

Operation Modes

1. Schema Refresh: All Connections

Discovers tables from all configured connections and updates the catalog.

POST /refresh
{}

Response:

{
  "connections_refreshed": 2,
  "connections_failed": 0,
  "tables_discovered": 15,
  "tables_added": 3,
  "tables_modified": 1
}

If any connections fail (e.g., due to missing credentials), the operation continues with remaining connections and reports failures:

{
  "connections_refreshed": 1,
  "connections_failed": 1,
  "tables_discovered": 10,
  "tables_added": 2,
  "tables_modified": 0,
  "errors": [
    {
      "connection_id": "broken_conn",
      "error": "Failed to connect: missing credentials"
    }
  ]
}

2. Schema Refresh: Single Connection

Discovers tables from a specific connection.

POST /refresh
{
  "connection_id": "my_postgres"
}

Response:

{
  "connections_refreshed": 1,
  "connections_failed": 0,
  "tables_discovered": 8,
  "tables_added": 1,
  "tables_modified": 0
}

3. Data Refresh: Single Table

Re-fetches data for a specific table and updates the parquet cache.

POST /refresh
{
  "connection_id": "my_postgres",
  "schema_name": "public",
  "table_name": "orders",
  "data": true
}

Response:

{
  "connection_id": "my_postgres",
  "schema_name": "public",
  "table_name": "orders",
  "rows_synced": 15000,
  "duration_ms": 1234
}

Returns 404 if the table doesn't exist in the catalog.

4. Data Refresh: All Tables in Connection

Re-fetches data for all cached tables in a connection.

POST /refresh
{
  "connection_id": "my_postgres",
  "data": true
}

Response:

{
  "connection_id": "my_postgres",
  "tables_refreshed": 5,
  "tables_failed": 1,
  "total_rows": 50000,
  "duration_ms": 5678,
  "errors": [
    {
      "schema_name": "public",
      "table_name": "broken_table",
      "error": "Query failed: relation does not exist"
    }
  ]
}

By default, only tables that already have cached data are refreshed. To also sync uncached tables:

POST /refresh
{
  "connection_id": "my_postgres",
  "data": true,
  "include_uncached": true
}

Warnings

Non-fatal issues (like failed cleanup of old cache files) are reported in a warnings array without failing the operation:

{
  "connection_id": "my_postgres",
  "schema_name": "public",
  "table_name": "orders",
  "rows_synced": 15000,
  "duration_ms": 1234,
  "warnings": [
    {
      "schema_name": "public",
      "table_name": "orders",
      "message": "Failed to schedule deletion of old cache: disk full"
    }
  ]
}

The warnings field is omitted when empty.
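
One common way to get this "omitted when empty" behavior in a Rust/serde response model is a skip_serializing_if attribute. The sketch below is an assumption about how the types might look (field names copied from the example above; serde and serde_json as dependencies), not the PR's actual models.

use serde::Serialize;

#[derive(Serialize)]
struct RefreshWarning {
    schema_name: String,
    table_name: String,
    message: String,
}

#[derive(Serialize)]
struct TableRefreshResponse {
    connection_id: String,
    schema_name: String,
    table_name: String,
    rows_synced: u64,
    duration_ms: u64,
    // Leave `warnings` out of the JSON entirely when there are none.
    #[serde(skip_serializing_if = "Vec::is_empty")]
    warnings: Vec<RefreshWarning>,
}

fn main() {
    let resp = TableRefreshResponse {
        connection_id: "my_postgres".into(),
        schema_name: "public".into(),
        table_name: "orders".into(),
        rows_synced: 15000,
        duration_ms: 1234,
        warnings: Vec::new(),
    };
    // Prints the response without a "warnings" key.
    println!("{}", serde_json::to_string_pretty(&resp).unwrap());
}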

Invalid Combinations

Request                                       | Error
data=true without connection_id               | 400 Bad Request - data refresh requires connection_id
schema_name without table_name (data=false)   | 400 Bad Request - schema-level refresh not supported
schema_name without table_name (data=true)    | 400 Bad Request - data refresh with schema requires table_name
table_name without schema_name                | 400 Bad Request - table_name requires schema_name
schema_name or table_name with schema refresh | 400 Bad Request - schema refresh cannot target specific tables
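
As a rough illustration, the combinations above could be enforced by a single validation pass before dispatching the request. The RefreshRequest shape follows the documented parameters, but the function and the check ordering are assumptions, not the handler's actual code.

/// Mirrors the request parameters documented above.
struct RefreshRequest {
    connection_id: Option<String>,
    schema_name: Option<String>,
    table_name: Option<String>,
    data: bool,
    include_uncached: bool,
}

/// Returns the 400 error message for an invalid combination, per the table above.
fn validate(req: &RefreshRequest) -> Result<(), &'static str> {
    if req.table_name.is_some() && req.schema_name.is_none() {
        return Err("table_name requires schema_name");
    }
    if req.data && req.connection_id.is_none() {
        return Err("data refresh requires connection_id");
    }
    if req.schema_name.is_some() && req.table_name.is_none() {
        return Err(if req.data {
            "data refresh with schema requires table_name"
        } else {
            "schema-level refresh not supported"
        });
    }
    if !req.data && (req.schema_name.is_some() || req.table_name.is_some()) {
        return Err("schema refresh cannot target specific tables");
    }
    Ok(())
}

fn main() {
    let req = RefreshRequest {
        connection_id: None,
        schema_name: None,
        table_name: None,
        data: true,
        include_uncached: false,
    };
    // data=true without connection_id -> expect the 400 message from the table above
    println!("{:?}", validate(&req));
}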

Error Responses

  • 400 Bad Request - Invalid parameter combination
  • 404 Not Found - Connection or table not found
  • 500 Internal Server Error - Unexpected server error

Replace tests using the removed /discover endpoint with tests for the new /refresh endpoint. Remove unused discover_connection_handler and DiscoverConnectionResponse.

- Consolidate prepare_cache_write to always use versioned directories
- Add CacheWriteHandle struct for atomic data refresh operations
- Remove delete_stale_tables (too error-prone, doesn't handle cleanup)
- Rename get_due_deletions to get_pending_deletions
- Add include_uncached flag to connection data refresh (default: false)

Resolve conflicts from origin/main's new migration system refactor. Add v2.sql and v3.sql migrations for pending_deletions table.

- Add warning log on timestamp parse failure instead of silent fallback
- Defer deletion by 24h on parse error to prevent premature data loss
- Expand include_uncached documentation with usage context
- Add comment explaining temp_dir lifetime in test harness
- Add retry_count to pending_deletions for stuck deletion handling
- Remove records after MAX_DELETION_RETRIES (5) failed attempts
- Make deletion worker interval configurable via builder
- Make parallel refresh count configurable via builder
- Extract magic numbers to named constants
- Handle S3 temp file removal failure gracefully with warning

S3Storage::delete now properly handles directory URLs by listing and deleting all objects with that prefix, matching FilesystemStorage behavior. Migrated S3 tests to use testcontainers for automatic MinIO orchestration.

The tables_removed field was misleading - it reported tables detected as removed from the remote source but didn't actually delete them from the catalog. Removed the field entirely to avoid confusing API consumers.

All call sites that delete versioned cache directories now use delete_prefix instead of delete. Reverted S3Storage::delete to only handle single files. This makes the API clearer: delete for files, delete_prefix for directories/prefixes.
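
To make that last point concrete, here is a minimal sketch of the delete vs. delete_prefix split, assuming a shared Storage trait implemented by both backends. The signatures and the filesystem implementation are illustrative, not the crate's actual API; the S3 side is only described in a comment.

/// Illustrative split between single-object and prefix deletion described above.
trait Storage {
    /// Delete exactly one object/file. Directories and prefixes are not accepted.
    fn delete(&self, key: &str) -> Result<(), String>;

    /// Delete every object under a prefix, e.g. a versioned cache directory
    /// such as "my_postgres/public/orders/v1737057600/".
    fn delete_prefix(&self, prefix: &str) -> Result<(), String>;
}

struct FilesystemStorage { root: std::path::PathBuf }

impl Storage for FilesystemStorage {
    fn delete(&self, key: &str) -> Result<(), String> {
        std::fs::remove_file(self.root.join(key)).map_err(|e| e.to_string())
    }
    fn delete_prefix(&self, prefix: &str) -> Result<(), String> {
        // remove_dir_all plays the role of "list and delete everything under the prefix".
        std::fs::remove_dir_all(self.root.join(prefix)).map_err(|e| e.to_string())
    }
}

// An S3-backed implementation would delete a single key in `delete` and
// list-then-delete all keys under the prefix in `delete_prefix`.

fn main() {
    let storage = FilesystemStorage { root: std::path::PathBuf::from("cache") };
    // Single parquet file vs. a whole versioned directory:
    let _ = storage.delete("my_postgres/public/orders/v1/data.parquet");
    let _ = storage.delete_prefix("my_postgres/public/orders/v1");
}
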
zfarrell marked this pull request as ready for review on January 16, 2026 at 21:37.
zfarrell merged commit 3206eb7 into main on Jan 16, 2026.
6 checks passed.