Skip to content

Preserve KeyError for missing artifacts#3181

Open
talsperre wants to merge 1 commit intomasterfrom
fix/load-artifacts-missing-keyerror
Open

Preserve KeyError for missing artifacts#3181
talsperre wants to merge 1 commit intomasterfrom
fix/load-artifacts-missing-keyerror

Conversation

@talsperre
Copy link
Copy Markdown
Collaborator

@talsperre talsperre commented May 7, 2026

Summary

  • remove the DataException wrapper for artifacts missing from _objects
  • let the natural _objects[name] lookup preserve the historical KeyError behavior
  • keep the old metadata fallback path: if an artifact has an object entry but no _info entry, load it with very old pickle defaults
  • add focused serializer integration coverage for missing artifact, missing metadata, and metadata-without-object cases

Context

The pluggable serializer load path already uses self._info.get(name, {}), which is the right compatibility behavior for artifacts without metadata: assume the very old pickle serializer defaults. The regression came from wrapping the later _objects[name] miss in a DataException, which made truly absent artifacts look like metadata/blob corruption.

Validation

  • python -m pytest test/unit/test_serializer_integration.py::test_missing_artifact_raises_key_error test/unit/test_serializer_integration.py::test_missing_info_with_object_uses_pickle_defaults test/unit/test_serializer_integration.py::test_info_without_object_raises_key_error -q
  • python -m pytest test/unit/test_serializer_integration.py -q
  • pre-commit run --files metaflow/datastore/task_datastore.py test/unit/test_serializer_integration.py

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR fixes a regression in the pluggable-serializer load path where self._info.get(name, {}) caused truly-absent artifacts to masquerade as having default metadata, turning their _objects miss into a DataException (corruption report) instead of the expected KeyError. The fix removes the surrounding try/except and lets the raw KeyError from self._objects[name] propagate for both the absent-artifact and the metadata-with-no-blob cases.

  • task_datastore.py: Drops the try/except KeyError → DataException wrapper around to_load[self._objects[name]].append(name), so both the "artifact was never saved" and "metadata exists but no stored blob" paths now raise a bare KeyError.
  • test_serializer_integration.py: Adds three focused tests: missing artifact raises KeyError, missing _info entry falls back to pickle defaults, and metadata-only artifact raises KeyError.

Confidence Score: 5/5

Safe to merge; the one-line change is narrowly scoped, the tests directly cover both affected paths, and no existing callers of load_artifacts catch DataException specifically for the corruption case.

The change is minimal and well-tested. Both the absent-artifact and metadata-without-blob paths now consistently raise KeyError, matching the historical contract. The only outstanding item is a documentation discrepancy in the PR description which does not affect runtime behavior.

No files require special attention.

Important Files Changed

Filename Overview
metaflow/datastore/task_datastore.py Removes try/except around self._objects[name] so a bare KeyError propagates for both truly-absent artifacts and the metadata-with-no-blob corruption case, restoring the historical contract.
test/unit/test_serializer_integration.py Adds three new tests covering the missing-artifact KeyError path, the no-metadata-with-object pickle-default path, and the metadata-only KeyError path; tests accurately reflect the new code behavior.

Reviews (2): Last reviewed commit: "Preserve KeyError for missing artifact b..." | Re-trigger Greptile

@talsperre talsperre force-pushed the fix/load-artifacts-missing-keyerror branch from 4a6a1d8 to ba21c10 Compare May 7, 2026 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant