chore(deps): upgrade to DataFusion 52 #1997

Open
ethan-tyler wants to merge 4 commits into apache:main from ethan-tyler:chore/datafusion-52-validation

@ethan-tyler ethan-tyler commented Jan 6, 2026

Which issue does this PR close?

Validates and adopts DataFusion 52

What changes are included in this PR?

  • Upgrade DataFusion integration from 51.x to 52.x
  • Keep the DataFusion Python dependency dynamic within major version 52
  • Update the Python FFI table provider bridge for DataFusion 52 API/ABI expectations:
    • session-aware __datafusion_table_provider__(session) integration
    • DF52 compatible FFI table provider construction with task context and logical codec handling
  • Update sqllogictest physical plan expectations for DataFusion 52 planner output changes
  • Refresh lockfiles impacted by the upgrade

DataFusion FFI API change

DataFusion 52 expanded table provider FFI construction to include task context and optional logical codec parameters.

Updated the Rust/Python bridge accordingly so that filter/logical expression serialization remains compatible across the FFI boundary.
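
To make the session-aware hook concrete, here is a toy Python model of the calling convention. It is only a sketch under stated assumptions: `ToySession`, `ToyProvider`, `LegacyProvider`, and `export_provider` are hypothetical stand-ins, the zero-argument legacy form is assumed, and plain strings take the place of the PyCapsule that the real bridge would hand across the FFI boundary. The only name taken from this PR is `__datafusion_table_provider__(session)`.

```python
import inspect


class ToySession:
    """Hypothetical stand-in for a DataFusion session object."""

    def __init__(self, name):
        self.name = name


class ToyProvider:
    """Provider exposing the session-aware hook described in this PR."""

    def __datafusion_table_provider__(self, session):
        # The real bridge would derive a task context from the session
        # and return a PyCapsule; a string stands in for it here.
        return f"capsule(session={session.name})"


class LegacyProvider:
    """Provider exposing an assumed older, zero-argument form of the hook."""

    def __datafusion_table_provider__(self):
        return "capsule(no session)"


def export_provider(provider, session):
    """Call the hook, passing the session only if the hook accepts it."""
    hook = provider.__datafusion_table_provider__
    # For a bound method, the signature excludes `self`, so an empty
    # parameter mapping means the hook takes no session argument.
    if inspect.signature(hook).parameters:
        return hook(session)
    return hook()


session = ToySession("demo")
print(export_provider(ToyProvider(), session))
print(export_provider(LegacyProvider(), session))
```

Inspecting the signature is just one way a caller could stay compatible with both forms; the actual bridge in this PR may handle it differently.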

Are these changes tested?

Yes

@ethan-tyler
Author

The audit failure (RUSTSEC-2026-0001 for rkyv) is unrelated to this PR - it's being addressed in #1994. Will rebase once that lands.

@ethan-tyler
Author

Fix for Python Bindings CI Failure

The initial PR failed the Bindings Python CI workflow due to a breaking API change in the DataFusion 52 FFI module.

Root cause: FFI_TableProvider::new signature changed from 3 to 5 arguments.

Fix (commit 33d5608):

  • Added datafusion and datafusion-execution dependencies to bindings/python/Cargo.toml
  • Updated datafusion_table_provider.rs to create a TaskContextProvider from SessionContext and pass it to FFI_TableProvider::new

The core iceberg-rust crates were already compatible with DataFusion 52 - only the Python bindings needed this update.

@timsaucer
Member

Please let me know if you run into difficulties with this PR also regarding the FFI change. I think that my approach in apache/datafusion-python#1337 will help resolve the missing elements here.

@ethan-tyler force-pushed the chore/datafusion-52-validation branch 2 times, most recently from ae7b70e to 723e3a6, on January 23, 2026 23:59

@Smith-Cruise

Is there any progress now?

@ethan-tyler force-pushed the chore/datafusion-52-validation branch from 723e3a6 to a19062d on February 21, 2026 18:03

@ethan-tyler
Author

> Is there any progress now?

I rebased and got CI cleaned up. DF 52 Python wheels should be landing in the next day or so - https://lists.apache.org/thread/76v9pmqh7cflgjwx4wnqsmdzw00v62bl. To avoid an additional follow-up, I am waiting to include that and will then open the PR.

@timsaucer
Member

Let me know if you need any help with this PR.

@ethan-tyler
Author

> Let me know if you need any help with this PR.

Thanks Tim - it's been a hot minute since I worked on Iceberg and got myself into a CI pickle. I think we should be good after this most recent push. I'll let you know if I run into any trouble.

@ethan-tyler changed the title from "[WIP] chore(deps): validate DataFusion 52 compatibility" to "chore(deps): upgrade to DataFusion 52" on Feb 23, 2026
@ethan-tyler force-pushed the chore/datafusion-52-validation branch from 19690bf to 3309d6c on February 23, 2026 19:59
@ethan-tyler marked this pull request as ready for review on February 23, 2026 20:15
@mbutrovich self-requested a review on February 23, 2026 20:20

- uses: PyO3/maturin-action@v1
  with:
    working-directory: "bindings/python"
    maturin-version: "v1.12.2"
Contributor

#2166 should also resolve the CI issue. I would lean towards that solution. Otherwise, we should pin all the PyO3/maturin-action@v1 usages

Author

Thanks! I agree and like this better!

Contributor

#2166 is merged now, could you rebase and undo the current change?

Author

Aye aye captain

Cargo.toml (outdated)
- datafusion = "51.0"
- datafusion-cli = "51.0"
- datafusion-sqllogictest = "51.0"
+ datafusion = "52.0"
Contributor

52.1.0 is already out 😄 should we use that instead?

Contributor

+1

Author

Yes absolutely thanks for the catch

- assert (
-     datafusion.__version__ >= "45"
- )  # iceberg table provider only works for datafusion >= 45
+ if Version(datafusion.__version__) < Version("52.0.0"):
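
The move from a string comparison to `Version(...)` matters because strings compare character by character, not numerically. A minimal stdlib-only sketch of the difference follows. `version_tuple` is a hypothetical helper that handles only plain dotted numeric versions; the `Version` in the diff (presumably `packaging.version.Version`) covers the full version grammar and is the better choice in real code.

```python
def version_tuple(text):
    """Parse a dotted numeric version such as '52.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Lexicographic string comparison gets multi-digit majors wrong:
# "1" sorts before "4", so version 100 looks older than version 45.
string_check = "100.0.0" >= "45"

# It also lets a genuinely old version through: "5" sorts after "4".
string_check_old = "5.0.0" >= "45"

# Comparing integer tuples orders the versions numerically.
numeric_check = version_tuple("100.0.0") >= version_tuple("45")
numeric_check_old = version_tuple("5.0.0") >= version_tuple("45")

print(string_check, string_check_old)    # False True
print(numeric_check, numeric_check_old)  # True False
```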
Contributor

Not related to this PR, but is it possible to extract the version from the workspace Cargo.toml?

Author

Yes - I can do this in a follow-up PR.

@ethan-tyler force-pushed the chore/datafusion-52-validation branch from 3309d6c to 196c528 on February 24, 2026 04:22

datafusion-pruning@51.0.0 X
datafusion-session@51.0.0 X
datafusion-sql@51.0.0 X
datafusion@52.0.0 X
Contributor

We forgot to update this?

  name = "pyiceberg-core"
  readme = "project-description.md"
- requires-python = ">=3.10,<4"
+ requires-python = ">=3.10,<3.13"
Contributor

Why this change? cc @kevinjqliu Is this reasonable?
