Skip to content

Comments

Add support for FFI config extensions#19469

Open
timsaucer wants to merge 7 commits intoapache:mainfrom
timsaucer:feat/ffi-config-options
Open

Add support for FFI config extensions#19469
timsaucer wants to merge 7 commits intoapache:mainfrom
timsaucer:feat/ffi-config-options

Conversation

@timsaucer
Copy link
Member

@timsaucer timsaucer commented Dec 23, 2025

Which issue does this PR close?

This addresses part of #17035

This is also a blocker for #20450

Rationale for this change

Currently we cannot support user defined configuration extensions via FFI. This is because much of the infrastructure on how to add and extract custom extensions relies on knowing concrete types of the extensions. This is not supported in FFI. This PR adds an implementation of configuration extensions that can be used across a FFI boundary.

What changes are included in this PR?

  • Implement FFI_ExtensionOptions.
  • Update ConfigOptions to check if a datafusion_ffi namespace exists when setting values
  • Add unit test

Are these changes tested?

Unit test added.

Also tested against datafusion-python locally. With this code I have the following test that passes. I have created a simple python exposed MyConfig:

from datafusion import SessionConfig
from datafusion_ffi_example import MyConfig

def test_catalog_provider():
    config = MyConfig()
    config = SessionConfig().with_extension(config)
    config.set("my_config.baz_count", "42")

Are there any user-facing changes?

New addition only.

@github-actions github-actions bot added common Related to common crate ffi Changes to the ffi crate labels Dec 23, 2025
@timsaucer timsaucer force-pushed the feat/ffi-config-options branch from bd9be45 to b774113 Compare January 7, 2026 14:20
@timsaucer timsaucer force-pushed the feat/ffi-config-options branch from b774113 to d8195fc Compare February 17, 2026 02:48
@timsaucer timsaucer marked this pull request as ready for review February 17, 2026 13:55
@timsaucer
Copy link
Member Author

FYI @davisp I think this is ready for review and since you've been looking at these lately would you mind taking a look?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an FFI-safe representation of configuration extensions so non-Rust consumers (via datafusion-ffi) can register and mutate custom ConfigOptions / TableOptions extensions across an FFI boundary (per #17035).

Changes:

  • Introduces FFI_ExtensionOptions (FFI-stable container for extension key/value pairs) plus FFI_ConfigOptions / FFI_TableOptions wrappers.
  • Updates ConfigOptions::set / TableOptions::set to route unknown namespaces to the datafusion_ffi extension namespace when present.
  • Adds integration/unit tests to validate extension round-tripping and setting via the FFI extension path.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
datafusion/ffi/tests/ffi_config.rs New integration tests covering FFI extension behavior for ConfigOptions, TableOptions, and SessionConfig.
datafusion/ffi/src/tests/mod.rs Exposes test config module + adds create_extension_options export in the test root module.
datafusion/ffi/src/tests/config.rs Defines a sample ExternalConfig extension and exports create_extension_options returning FFI_ExtensionOptions.
datafusion/ffi/src/session/config.rs Changes FFI_SessionConfig to transport config options via FFI_ConfigOptions instead of a string map.
datafusion/ffi/src/lib.rs Exposes new config module.
datafusion/ffi/src/config/mod.rs Implements FFI_ConfigOptions / FFI_TableOptions and ExtensionOptionsFFIProvider.
datafusion/ffi/src/config/extension_options.rs Implements FFI_ExtensionOptions and conversion helpers (add_config, merge, to_extension).
datafusion/common/src/config.rs Adds DATAFUSION_FFI_CONFIG_NAMESPACE and updates set logic to support routing to the FFI namespace.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I wish we didn't need the datafusion_ffi namespace, but after spending a couple days staring at the existing config API, I don't see a better approach.

/// be used to simplify accessing FFI extensions.
#[repr(C)]
#[derive(Debug, Clone, StableAbi)]
pub struct FFI_TableOptions {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for adding these preemptively.

/// the extension from the local options when possible. Should that fail, it
/// should attempt to extract the FFI options and then convert them to the
/// desired [`ConfigExtension`].
fn local_or_ffi_extension<C: ConfigExtension + Clone + Default>(&self) -> Option<C>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea having this built in.

@timsaucer
Copy link
Member Author

@milenkovicm would you mind reviewing this? I suspect it will be helpful to ballista's python integration as well. We've already had one reviewer from a community member who has been looking at the same issue.

{
inner_key = key;
prefix = DATAFUSION_FFI_CONFIG_NAMESPACE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in case the user does something like this directly:

SET datafusion_ffi.custom_namespace.foo;

I imagine that:

  • If no datafusion_ffi namespace exists in the extension options, it will attempt datafusion_ffi.datafusion_ffi.custom_namespace.foo?
  • If datafusion_ffi actually exists in the namespace, then the FFI mechanism will kick in and the value will be set in the Python world. Is that right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this helps. In my current datafusion-python branch I have the following:

from datafusion import SessionConfig, SessionContext
from datafusion_ffi_example import MyConfig


def test_catalog_provider():
    config = MyConfig()
    config = SessionConfig(
        {"datafusion.catalog.information_schema": "true"}
    ).with_extension(config)
    config.set("my_config.baz_count", "42")
    ctx = SessionContext(config)

    ctx.sql("SHOW my_config.baz_count;").show()

    ctx.sql("SET my_config.baz_count=1;")
    ctx.sql("SHOW my_config.baz_count;").show()

    ctx.sql("SET datafusion_ffi.my_config.baz_count=99;")
    ctx.sql("SHOW my_config.baz_count;").show()

This generates

DataFrame()
+---------------------+-------+
| name                | value |
+---------------------+-------+
| my_config.baz_count | 42    |
+---------------------+-------+
DataFrame()
+---------------------+-------+
| name                | value |
+---------------------+-------+
| my_config.baz_count | 1     |
+---------------------+-------+
DataFrame()
+---------------------+-------+
| name                | value |
+---------------------+-------+
| my_config.baz_count | 99    |
+---------------------+-------+

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, the edge case I was trying to reach is, in a context without DataFusion Python, if SET datafusion_ffi.custom_namespace.foo could break in an unexpected way, but I imagine the only thing will happen is that both datafusion_ffi.custom_namespace.foo and datafusion_ffi.datafusion_ffi.custom_namespace.foo will be tried and none will set anything.

@milenkovicm
Copy link
Contributor

will do @timsaucer

@timsaucer
Copy link
Member Author

Downstream PR in draft because it requires DF 53 release: apache/datafusion-python#1391

Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! having the "datafusion_ffi" doesn't seem like the most elegant thing in the world, but it's the most elegant thing that comes to mind. I'm not super familiar with the FFI code, so I'll let @milenkovicm chime in.

}
}

pub const DATAFUSION_FFI_CONFIG_NAMESPACE: &str = "datafusion_ffi";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is something public, and therefore, part of the public API, it would be nice to add a doc comment explaining what it is and how does it work. Probably a comment similar to what you posted in https://github.com/apache/datafusion/pull/19469/changes#r2842595482 but de-Pythonified would be a great doc comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use this namespace for anything other than FFI.

Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good @timsaucer, thanks a lot.

I haven nothing to add to @gabotechs and @davisp reviews, I believe they did a good review

}
}

pub const DATAFUSION_FFI_CONFIG_NAMESPACE: &str = "datafusion_ffi";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use this namespace for anything other than FFI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate ffi Changes to the ffi crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants