
Default DataFrame source to pandas driver when none is given (fixes #1403) #1483

Merged: LuukBlom merged 3 commits into Deltares:main from gaoflow:fix/1403-dataframe-default-driver on Jun 23, 2026

@gaoflow gaoflow commented Jun 1, 2026

Contributor

Problem

Defining a DataFrame source without an explicit driver raised a ValidationError (#1403):

import hydromt

data_catalog = hydromt.DataCatalog()
data_catalog.from_dict({
    "agroforestry_dataframe": {
        "data_type": "DataFrame",
        "uri": "path/to/dataframe.csv",
        # no "driver" specified
    },
})
# pydantic_core.ValidationError: ... driver
#   Value error, Unknown 'name': 'geodataframe_table'

DataSource._infer_default_driver searches all drivers (BaseDriver.find_all_possible_types()) for one that supports the file extension. Both the DataFrame pandas driver and the GeoDataFrame geodataframe_table driver claim .csv/.parquet, and geodataframe_table is enumerated first, so it is selected even though it is not a valid driver for DataFrameSource. Validation then rejects it with the error shown above.
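The collision can be reproduced without hydromt. The following is a minimal sketch, not hydromt's implementation: REGISTRY, infer_driver_unscoped and validate_driver are hypothetical names, and the registry order and extension sets are assumptions chosen to match the behaviour described above.

```python
from pathlib import Path

# Hypothetical registry of (driver name, data type served, extensions claimed).
# The order imitates the enumeration described above; it is an assumption,
# not hydromt's real driver registry.
REGISTRY = [
    ("geodataframe_table", "GeoDataFrame", {".csv", ".parquet"}),
    ("pandas", "DataFrame", {".csv", ".parquet", ".xlsx"}),
]


def infer_driver_unscoped(uri: str) -> str:
    """Pick the first registered driver that claims the URI's extension."""
    suffix = Path(uri).suffix
    for name, _data_type, extensions in REGISTRY:
        if suffix in extensions:
            return name
    raise ValueError(f"No driver claims extension {suffix!r}")


def validate_driver(source_type: str, driver_name: str) -> str:
    """Accept the driver only if it is registered for the source's data type."""
    valid = {name for name, data_type, _ in REGISTRY if data_type == source_type}
    if driver_name not in valid:
        raise ValueError(f"Unknown 'name': {driver_name!r}")
    return driver_name


# The unscoped search returns the GeoDataFrame driver for a .csv file ...
picked = infer_driver_unscoped("path/to/dataframe.csv")

# ... which a DataFrame source then refuses during validation.
try:
    validate_driver("DataFrame", picked)
    outcome = "accepted"
except ValueError as err:
    outcome = str(err)

print(picked, "->", outcome)
```

Only extensions claimed by more than one data type are affected; in this toy registry .xlsx still resolves to pandas because nothing else claims it.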

Fix

Override _infer_default_driver on DataFrameSource to restrict the search to DataFrameDriver subclasses. RasterDatasetSource, GeoDataFrameSource and GeoDatasetSource already override it the same way (this change mirrors the GeoDatasetSource override); DatasetSource is unaffected because no other driver claims its extensions.
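The shape of the override can be sketched with stand-in classes. This is an illustration under assumptions, not the code merged in this PR: the classes reuse names from the description, but their bodies, the first_match helper, the method signature and the extension sets are all hypothetical.

```python
from pathlib import Path
from typing import ClassVar, FrozenSet, List, Optional, Type


class BaseDriver:
    """Stand-in driver base class (hypothetical body)."""

    name: ClassVar[str] = ""
    extensions: ClassVar[FrozenSet[str]] = frozenset()

    @classmethod
    def find_all_possible_types(cls) -> List[Type["BaseDriver"]]:
        """Collect every subclass below cls that defines its own name."""
        found: List[Type["BaseDriver"]] = []
        for sub in cls.__subclasses__():
            found.extend(sub.find_all_possible_types())
            if "name" in vars(sub):
                found.append(sub)
        return found


class GeoDataFrameDriver(BaseDriver):
    """Abstract base for GeoDataFrame drivers (defined first, so enumerated first)."""


class GeoDataFrameTableDriver(GeoDataFrameDriver):
    name = "geodataframe_table"
    extensions = frozenset({".csv", ".parquet"})  # assumed extension set


class DataFrameDriver(BaseDriver):
    """Abstract base for DataFrame drivers."""


class PandasDriver(DataFrameDriver):
    name = "pandas"
    extensions = frozenset({".csv", ".parquet", ".xlsx"})  # assumed extension set


def first_match(base: Type[BaseDriver], uri: str) -> Optional[str]:
    """Return the first driver below base that claims the URI's extension."""
    suffix = Path(uri).suffix
    for driver in base.find_all_possible_types():
        if suffix in driver.extensions:
            return driver.name
    return None


class DataSource:
    @classmethod
    def _infer_default_driver(cls, uri: str) -> Optional[str]:
        # Generic behaviour: search every driver, whatever its data type.
        return first_match(BaseDriver, uri)


class DataFrameSource(DataSource):
    @classmethod
    def _infer_default_driver(cls, uri: str) -> Optional[str]:
        # Override: only drivers deriving from DataFrameDriver are candidates.
        return first_match(DataFrameDriver, uri)


print(DataSource._infer_default_driver("table.csv"))
print(DataFrameSource._infer_default_driver("table.csv"))
```

Scoping the search to the source's own driver base class makes the result independent of the order in which unrelated drivers happen to be enumerated.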

Reproduction / verification

import hydromt
dc = hydromt.DataCatalog()
dc.from_dict({"df": {"data_type": "DataFrame", "uri": "path/to/dataframe.csv"}})
print(dc.get_source("df").driver.name)
# before: ValidationError (geodataframe_table)
# after : "pandas"

  • New tests in tests/data_catalog/sources/test_dataframe_source.py: _infer_default_driver returns "pandas" for .csv/.parquet/.xlsx, and model_validate of a driver-less DataFrame dict yields the pandas driver. They fail on main and pass with this change. An explicitly specified driver is still honoured.
  • Full tests/data_catalog/sources/ suite passes (81 passed); the broader tests/data_catalog/ run is green apart from one pre-existing failure that requires the Deltares P-drive (test_export_deltares_data[grwl_mask]), unrelated to this change.
  • ruff check and ruff format --check clean.

gaoflow and others added 2 commits June 1, 2026 15:42
When a DataFrame source was created without an explicit driver,
DataSource._infer_default_driver searched all drivers (BaseDriver) for one
supporting the file extension. As both the DataFrame 'pandas' driver and the
GeoDataFrame 'geodataframe_table' driver claim .csv/.parquet, the latter was
picked first, which is not a valid driver for DataFrameSource and raised a
pydantic ValidationError.

Override _infer_default_driver on DataFrameSource to restrict the search to
DataFrame drivers, mirroring GeoDatasetSource. RasterDataset, GeoDataFrame
and GeoDataset sources already do this; Dataset is unaffected as no other
driver claims its extensions.

Fixes Deltares#1403

gaoflow commented Jun 19, 2026

Contributor Author

I checked the current red CI jobs for this branch.

  • pytest low-314 (ubuntu-latest) has a single failure in tests/data_catalog/test_data_catalog.py::test_azure_get_rasterdataset[esa_worldcover_azure]. The traceback is from fetching the Azure SAS token and ends with ConnectionResetError: [Errno 104] Connection reset by peer, so this looks like an external Azure/network integration failure rather than the DataFrame driver change.
  • run test coverage and SonarQube scan completed the test suite successfully (1018 passed, 9 skipped, 77 deselected) and then failed in the SonarQube upload because the workflow does not have an authorized SONAR_TOKEN in this context.

I did not push code changes because both failures are outside the scope of this PR.

@LuukBlom LuukBlom merged commit b96b3f7 into Deltares:main Jun 23, 2026
