Default DataFrame source to pandas driver when none is given (fixes #1403) #1483
Merged
LuukBlom merged 3 commits on Jun 23, 2026
When a DataFrame source was created without an explicit driver, DataSource._infer_default_driver searched all drivers (BaseDriver) for one supporting the file extension. Both the DataFrame 'pandas' driver and the GeoDataFrame 'geodataframe_table' driver claim .csv/.parquet, and the latter is enumerated first, so 'geodataframe_table' was picked. It is not a valid driver for DataFrameSource, so a pydantic ValidationError was raised. Override _infer_default_driver on DataFrameSource to restrict the search to DataFrame drivers, mirroring GeoDatasetSource. RasterDataset, GeoDataFrame and GeoDataset sources already do this; Dataset is unaffected because no other driver claims its extensions. Fixes Deltares#1403
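The selection logic can be illustrated with a minimal, self-contained sketch. The class names (PandasDriver, GeoDataFrameTableDriver, GeoDataFrameDriver), the helper _first_matching_driver and the extension sets are hypothetical simplifications, not hydromt's actual implementation. The two driver families are deliberately declared so that geodataframe_table is enumerated first, which reproduces the conflict: the generic search returns the wrong driver, while the DataFrame-only search returns pandas.

```python
from __future__ import annotations

import os
from typing import ClassVar


# Hypothetical, simplified stand-ins: these mimic the shape of the problem,
# they are not hydromt's actual driver or source classes.
class BaseDriver:
    name: ClassVar[str] = ""  # an empty name marks an abstract family base
    SUPPORTED_EXTENSIONS: ClassVar[frozenset[str]] = frozenset()

    @classmethod
    def find_all_possible_types(cls) -> list[type[BaseDriver]]:
        """Collect concrete subclasses depth-first, in definition order."""
        found: list[type[BaseDriver]] = []
        for sub in cls.__subclasses__():
            if sub.name:
                found.append(sub)
            found.extend(sub.find_all_possible_types())
        return found


# Declared first, so its drivers are enumerated first by the generic search.
class GeoDataFrameDriver(BaseDriver):
    pass


class DataFrameDriver(BaseDriver):
    pass


class GeoDataFrameTableDriver(GeoDataFrameDriver):
    name = "geodataframe_table"
    SUPPORTED_EXTENSIONS = frozenset({".csv", ".parquet"})


class PandasDriver(DataFrameDriver):
    name = "pandas"
    SUPPORTED_EXTENSIONS = frozenset({".csv", ".parquet", ".xlsx"})


def _first_matching_driver(uri: str, root: type[BaseDriver]) -> str:
    """Return the name of the first driver under `root` that claims the extension."""
    ext = os.path.splitext(uri)[1]
    for driver in root.find_all_possible_types():
        if ext in driver.SUPPORTED_EXTENSIONS:
            return driver.name
    raise ValueError(f"no driver under {root.__name__} supports {ext!r}")


class DataSource:
    @classmethod
    def _infer_default_driver(cls, uri: str) -> str:
        # Generic behaviour: search every registered driver.
        return _first_matching_driver(uri, BaseDriver)


class DataFrameSource(DataSource):
    @classmethod
    def _infer_default_driver(cls, uri: str) -> str:
        # The fix: only DataFrame drivers are valid here, so search that family only.
        return _first_matching_driver(uri, DataFrameDriver)


print(DataSource._infer_default_driver("table.csv"))       # geodataframe_table
print(DataFrameSource._infer_default_driver("table.csv"))  # pandas
```

Passing the search root as a parameter keeps the extension lookup in one place, so each source type only has to name the driver family it accepts.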
Contributor · Author
I checked the currently failing CI jobs on this branch. I did not push code changes because both failures are outside the scope of this PR.
LuukBlom approved these changes on Jun 23, 2026
Problem
Defining a DataFrame source without an explicit driver raised a ValidationError (#1403): DataSource._infer_default_driver searches all drivers (BaseDriver.find_all_possible_types()) for one that supports the file extension. Both the DataFrame pandas driver and the GeoDataFrame geodataframe_table driver claim .csv/.parquet, and geodataframe_table is enumerated first, so it gets selected even though it is not a valid driver for DataFrameSource.
Fix
Override _infer_default_driver on DataFrameSource to restrict the search to DataFrameDriver subclasses, mirroring the existing override on GeoDatasetSource. RasterDatasetSource, GeoDataFrameSource and GeoDatasetSource already do this; DatasetSource is unaffected because no other driver claims its extensions.
Reproduction / verification
- tests/data_catalog/sources/test_dataframe_source.py checks that _infer_default_driver returns "pandas" for .csv/.parquet/.xlsx, and that model_validate of a driver-less DataFrame dict yields the pandas driver. These tests fail on main and pass with this change. An explicitly specified driver is still honoured.
- The tests/data_catalog/sources/ suite passes (81 passed); the broader tests/data_catalog/ run is green apart from one pre-existing failure that requires the Deltares P-drive (test_export_deltares_data[grwl_mask]), unrelated to this change.
- ruff check and ruff format --check are clean.
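The checks listed above can be sketched as plain test functions. This is a simplified, self-contained stand-in, not the PR's actual test module: the DATAFRAME_DRIVERS table, infer_default_dataframe_driver, validate_dataframe_source and the driver name "my_custom_driver" are all hypothetical, and a plain dict replaces the pydantic model so the example runs with the standard library only.

```python
from __future__ import annotations

import os

# Hypothetical stand-in for the DataFrame driver family and the extensions
# each driver claims. No GeoDataFrame driver appears here, because the fixed
# DataFrameSource only searches DataFrame drivers.
DATAFRAME_DRIVERS: dict[str, frozenset[str]] = {
    "pandas": frozenset({".csv", ".parquet", ".xlsx"}),
}


def infer_default_dataframe_driver(uri: str) -> str:
    """Return the first DataFrame driver that claims the file extension."""
    ext = os.path.splitext(uri)[1]
    for name, extensions in DATAFRAME_DRIVERS.items():
        if ext in extensions:
            return name
    raise ValueError(f"no DataFrame driver supports {ext!r}")


def validate_dataframe_source(data: dict) -> dict:
    """Hypothetical analogue of model_validate: infer a driver only if none is given."""
    validated = dict(data)  # do not mutate the caller's dict
    if validated.get("driver") is None:
        validated["driver"] = infer_default_dataframe_driver(validated["uri"])
    return validated


def test_infer_default_driver() -> None:
    for ext in (".csv", ".parquet", ".xlsx"):
        assert infer_default_dataframe_driver(f"table{ext}") == "pandas"


def test_driverless_source_defaults_to_pandas() -> None:
    source = validate_dataframe_source({"name": "my_table", "uri": "my_table.csv"})
    assert source["driver"] == "pandas"


def test_explicit_driver_is_honoured() -> None:
    # "my_custom_driver" is a made-up name used only for this check.
    source = validate_dataframe_source({"uri": "my_table.csv", "driver": "my_custom_driver"})
    assert source["driver"] == "my_custom_driver"


if __name__ == "__main__":
    test_infer_default_driver()
    test_driverless_source_defaults_to_pandas()
    test_explicit_driver_is_honoured()
    print("all checks passed")
```

The functions follow pytest naming, so a test runner would also collect them; the `__main__` guard only lets the file run on its own.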