Skip to content

Refactor: Consolidate schema inference (Phase 3)#16

Merged
jayendra13 merged 2 commits into
mainfrom
refactor/phase3-schema-inference
Feb 4, 2026
Merged

Refactor: Consolidate schema inference (Phase 3)#16
jayendra13 merged 2 commits into
mainfrom
refactor/phase3-schema-inference

Conversation

@jayendra13
Copy link
Copy Markdown
Owner

Summary

Phase 3 of refactoring to eliminate duplicated code. This builds on Phase 1 (#13) and Phase 2 (#15).

Note: This PR includes Phase 1 and Phase 2 commits. After those PRs are merged, this PR will auto-update to show only Phase 3 changes.

Phase 3 Changes

Extract the duplicated schema building logic from three functions into a single build_schema_from_store_meta() helper:

  • infer_schema_from_zmetadata_json
  • infer_schema_with_meta
  • infer_schema_with_meta_async

All three had identical logic for:

  1. Building Dictionary-encoded coordinate fields (with CF time support)
  2. Building regular array fields for data variables

The helper centralizes this logic, making future schema changes easier to maintain consistently.

Impact

  • Phase 3 net change: -50 lines (57 additions, 107 deletions)
  • Single source of truth for schema construction
  • No functional changes - all tests pass

Test plan

  • cargo test - All tests pass
  • cargo clippy - No warnings
  • cargo fmt - Properly formatted

Depends on: #13 (Phase 1), #15 (Phase 2)

Part of #5

Extract common patterns from execute_remote, execute_virtualizarr, and
execute_virtualizarr_with_adapter into shared helpers:

- build_exec_projected_schema(): Common schema projection logic
- AsyncReadParams: Encapsulates common async read parameters
- execute_async_read(): Generic async execution with store setup closure

This consolidates the repeated stream creation pattern:
1. Build projected schema
2. Create async stream with store setup
3. Call read_zarr_async
4. Collect batches and wrap in RecordBatchStreamAdapter

Impact: -17 lines (132 additions, 149 deletions) with clearer separation
of concerns. Each execute function now focuses only on its unique store
setup logic.

Part of: #5
Extract the duplicated schema building logic from three functions into
a single `build_schema_from_store_meta()` helper:

- infer_schema_from_zmetadata_json
- infer_schema_with_meta
- infer_schema_with_meta_async

All three had identical logic for:
1. Building Dictionary-encoded coordinate fields (with CF time support)
2. Building regular array fields for data variables

The helper centralizes this logic, making future schema changes easier
to maintain consistently.

Impact: -50 lines (57 additions, 107 deletions)

Part of: #5
@jayendra13 jayendra13 merged commit b9dc89b into main Feb 4, 2026
1 of 3 checks passed
@jayendra13 jayendra13 deleted the refactor/phase3-schema-inference branch February 4, 2026 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant