Skip to content

Mission-critical audit: downloader safety, endpoint hardening, and test stabilization#33

Open
christopherkarani wants to merge 1 commit intomainfrom
automation/check-frameworks-issues-20260301
Open

Mission-critical audit: downloader safety, endpoint hardening, and test stabilization#33
christopherkarani wants to merge 1 commit intomainfrom
automation/check-frameworks-issues-20260301

Conversation

@christopherkarani
Copy link
Owner

Summary

This PR applies mission-critical correctness and safety fixes identified during the framework audit, plus regression test stabilization to prevent false negatives.

What Changed

1) Build/Test gate correctness

  • Fixed DiffusionModelDownloaderTests compile guard to match production symbol availability:
    • from #if canImport(Hub)
    • to #if CONDUIT_TRAIT_MLX && canImport(MLX) && (canImport(Hub) || canImport(HuggingFace))
  • Prevents test target compile failures in default trait configurations.

2) Diffusion downloader correctness hardening

File: Sources/Conduit/ImageGeneration/DiffusionModelDownloader.swift

  • Added in-flight reserved disk accounting:
    • reservedDiskBytesByModel tracks reservation per model download.
    • checkAvailableDiskSpace now accounts for in-progress reservations to avoid concurrent overcommit false safety checks.
  • Added in-flight task reuse check before creating a new task for the same model.
  • Fixed deletion race:
    • deleteModel(modelId:) now cancels and awaits any in-flight task before deleting/removing registry state.
    • Prevents model re-registration after deletion from a completing stale task.
  • Fixed state consistency for bulk delete:
    • deleteAllModels() no longer silently ignores deletion failures and then wipes registry.
    • It removes successfully deleted/missing entries and throws AIError.fileError when real deletion failures occur.
  • Strengthened checksum integrity behavior:
    • If checksum is expected but no .safetensors file is found, now throws AIError.checksumMismatch instead of silently succeeding.
  • Ensured reserved disk state is released on cleanup/cancellation (cancelDownload, cancelAllDownloads, deferred completion).

3) ModelManager task lifecycle safety

File: Sources/Conduit/ModelManagement/ModelManager.swift

  • Replaced Task.detached with Task for background download kickoff in downloadTask(for:).
  • Improves cancellation/lifecycle semantics by keeping task behavior tied to surrounding async context instead of fully detached fire-and-forget semantics.

4) OpenAI Azure endpoint crash hardening

File: Sources/Conduit/Providers/OpenAI/OpenAIEndpoint.swift

  • Removed force-unwrapped URL construction from user-provided Azure resource input path.
  • Added resource sanitization (trim/lowercase/allowed charset/collapse separators) and URL construction through URLComponents.
  • Added safe fallback URL if component construction fails.
  • Eliminates runtime crash risk from malformed resource strings.

5) OpenAI endpoint regression tests

File: Tests/ConduitTests/Providers/OpenAI/OpenAIProviderTests.swift

  • Added tests for:
    • invalid Azure resource sanitization
    • empty Azure resource fallback to default

6) Registry/provider test stability improvements

Files:

  • Tests/ConduitTests/Core/ModelIdentifierTests.swift

  • Tests/ConduitTests/Core/ProtocolCompilationTests.swift

  • Replaced brittle hardcoded model/provider totals with invariant assertions based on current registry contents.

  • Updated ProviderType case-iterable expectations to include newer providers (kimi, minimax) and current case count.

  • Prevents unrelated provider catalog growth from causing false failing tests.

7) Compiler-safety cleanup

Files:

  • Sources/Conduit/Core/Types/GenerationSchema.swift

  • Sources/Conduit/Core/Types/DeviceCapabilities.swift

  • GenerationSchema: changed EncodingError.invalidValue(Any, Context) to EncodingError.invalidValue(String, Context) to avoid strict-concurrency Sendable warnings around Any payloads.

  • DeviceCapabilities: replaced deprecated C-string conversion with safe UTF-8 byte decoding path.

Verification

  • swift test -> PASS (352 tests in 60 suites)
  • swift build -> PASS

Risk Notes

  • Downloader changes affect runtime state transitions around cancellation/deletion and disk-space checks. Tests/build pass; behavior is now stricter and safer under concurrent operations.
  • Registry test updates intentionally shift from static-count assertions to invariant-based assertions to reduce brittleness as model catalogs evolve.

Mission-critical audit fixes:
- Align DiffusionModelDownloaderTests compile gating with production MLX/Hub/HuggingFace availability
- Harden DiffusionModelDownloader correctness:
  - reserve disk budget per in-flight model to prevent concurrent overcommit checks
  - cancel/await in-flight task during deleteModel to prevent re-registration races
  - release reserved bytes on cancellation and cleanup
  - make deleteAllModels report file deletion failures instead of silently clearing state
  - enforce checksum verification failure when expected checksum is provided but no safetensors file is present
- Replace detached model download launcher with child Task in ModelManager for better cancellation/lifecycle semantics
- Eliminate Azure endpoint crash risk by sanitizing resource names and constructing URL via URLComponents (with safe fallback)
- Add Azure endpoint regression tests for invalid/empty resource inputs
- Remove brittle hardcoded registry/provider totals in tests and assert invariant behavior against current registry data
- Update ProviderType case-iterable test coverage for newly added providers
- Clean up strict-concurrency warning path in GenerationSchema encoding error payload type
- Update DeviceCapabilities C-string decoding to non-deprecated UTF-8 decoding

Verification:
- swift test (352 tests, 60 suites) passes
- swift build passes
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63e604a278

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 202 to +204
activeDownloads[modelId]?.cancel()
activeDownloads.removeValue(forKey: modelId)
releaseReservedDiskSpace(for: modelId)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve disk reservation until cancellation completes

Releasing reserved bytes inside cancelDownload before the underlying task has finished makes the new disk-safety accounting inaccurate under cooperative cancellation. The download task only checks cancellation around the long snapshot await, so a canceled transfer can keep consuming disk for a while; if another download starts in that window, checkAvailableDiskSpace no longer includes the in-flight reservation and can admit an overcommit that this change is meant to prevent. Keep the reservation until task termination (e.g., via the existing deferred cleanup) rather than clearing it immediately on cancel.

Useful? React with 👍 / 👎.

@christopherkarani
Copy link
Owner Author

@claude Releasing reserved bytes inside cancelDownload before the underlying task has finished makes the new disk-safety accounting inaccurate under cooperative cancellation. The download task only checks cancellation around the long snapshot await, so a canceled transfer can keep consuming disk for a while; if another download starts in that window, checkAvailableDiskSpace no longer includes the in-flight reservation and can admit an overcommit that this change is meant to prevent. Keep the reservation until task termination (e.g., via the existing deferred cleanup) rather than clearing it immediately on cancel

@claude fix merge conflicts with main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant