Adding PFN Surrogate #731

Open
KislayaRavi wants to merge 7 commits into experimental-design:main from KislayaRavi:pfn

@KislayaRavi
Contributor

Motivation

This PR adds PFN surrogates to BoFire.
It closes issue #655.
I used the botorch_community version of PFN, as suggested in that issue.

Have you read the Contributing Guidelines on pull requests?

Yes

Have you updated CHANGELOG.md?

Yes

Test Plan

  • Added checks for valid and invalid specs.
  • Added test_pfn.py, which tests the PFN surrogates.

Some important things to note:

  • The PFN surrogate in botorch_community does not apply outcome_transform automatically, so I needed to apply it explicitly in the _predict function of PFNSurrogates. This also requires overriding the _load and _dump functions for PFNsurrogate.
  • The performance depends considerably on the trained models. As of now, one can access the models in the PFN4BO repo. Users can also load models that they trained themselves via URL requests.
  • As of now, TabPFN models cannot be used. Adding them requires more work (and there are some license issues). I have one eye on TabICL too. Its implementation is hopefully something for further down the line.
  • Preferably run PFN on GPUs. It is considerably slower on CPUs.
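
To illustrate the first point above, here is a minimal sketch of what applying an outcome transform explicitly at prediction time can look like. It assumes the transform is a plain standardization of the training targets; the actual transform and the surrogate code in this PR are not shown here, and the names Standardizer, predict, and fake_model are hypothetical stand-ins, not BoFire or BoTorch APIs.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

Prediction = Tuple[List[float], List[float]]  # (means, variances)


@dataclass
class Standardizer:
    """Hypothetical stand-in for an outcome transform (assumed: standardization)."""

    mu: float
    sigma: float

    @classmethod
    def fit(cls, y: Sequence[float]) -> "Standardizer":
        # Learn the shift and scale from the training targets.
        return cls(mu=mean(y), sigma=stdev(y))

    def transform(self, y: Sequence[float]) -> List[float]:
        # Map targets into the standardized space the model works in.
        return [(v - self.mu) / self.sigma for v in y]

    def untransform(
        self, means: Sequence[float], variances: Sequence[float]
    ) -> Prediction:
        # Means are shifted and scaled; variances scale with sigma squared.
        return (
            [m * self.sigma + self.mu for m in means],
            [v * self.sigma**2 for v in variances],
        )


def predict(
    model: Callable[[Sequence[Sequence[float]]], Prediction],
    standardizer: Standardizer,
    x: Sequence[Sequence[float]],
) -> Prediction:
    """Predict in standardized space, then map back to the original scale."""
    means_std, vars_std = model(x)
    return standardizer.untransform(means_std, vars_std)


def fake_model(x: Sequence[Sequence[float]]) -> Prediction:
    # Placeholder for a model that predicts in standardized space.
    return [0.5 for _ in x], [0.04 for _ in x]


y_train = [10.0, 20.0, 30.0]
standardizer = Standardizer.fit(y_train)
means, variances = predict(fake_model, standardizer, [[0.1], [0.2]])
```

Because the transform's fitted state (here mu and sigma) is needed at prediction time, it has to be saved and restored together with the model, which is presumably why the dump and load functions need overriding as well.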

@jduerholt
Contributor

Thanks @KislayaRavi,

I think we should have a discussion (including @bertiqwerty and @LukasHebing) on how to handle this. I am a bit hesitant to merge it due to the kind of dependency hell that comes with the pfn4bo package (https://github.com/experimental-design/bofire/pull/731/changes#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R55).

Maybe we should also add a bofire community directory (in analogy to botorch community) to handle this kind of stuff. From an architectural perspective, we would need an easy way of registering new data models and their functional equivalents. Then people could easily plug in their custom models without needing to change the codebase.

What do you think?

Best,

Johannes

@KislayaRavi
Contributor Author

I totally agree with having a separate bofire_community folder. Including PFN in the main package will cause a lot of headaches in the future.
I have not thought very clearly about the architecture yet. In my opinion, the bofire_community folder can have the same structure as the bofire folder, and its tests should be kept separate.
The idea of an easy plug-and-play custom data model, where users can test their custom surrogate without going through a lot of data_model stuff, also intrigues me. This would increase flexibility.
I am also interested to hear from others. After that, I will create a separate bofire_community folder, move the PFN-related stuff there, and submit the changes in the same PR.

@LukasHebing
Contributor

LukasHebing commented Feb 20, 2026

I like the idea of making it simpler to add content (surrogates, objectives, etc.) with data models, without having to add it in 5 different bofire files. The architecture with data-model objects and "real" objects makes this difficult.

A nice solution would be an abstract data model, e.g. for surrogates, which also has all the abstract methods (including torch callables, _fit, etc.). The behavior of the map functions could be pre-defined. But this requires some effort.

If we have that in bofire, other custom content can be added much more easily.

@bertiqwerty
Contributor

bertiqwerty commented Feb 20, 2026

Hi all, thanks @KislayaRavi.

I agree with @jduerholt that we should not merge this PR into BoFire due to the dependencies. In particular, the restriction to Python versions older than 3.12 is a clear deal breaker.

Some time ago I already thought it could be helpful to have BoFire "extensible from the outside". Currently, even if one did not want to use any mappers, Pydantic would scream at you if you passed the data model of a surrogate that is not part of the corresponding union (e.g., AnyBotorchSurrogate) to the data model of a strategy, I guess. @jduerholt, what was the reason for creating these union types alongside the inheritance hierarchy again? If this were fixed, one could create a BoFire community package, maybe even in a separate repository under experimental-design. However, a bofire-community package should also not be limited to Python versions < 3.12.

@LukasHebing, I did not understand your proposal. Maybe something to discuss in person or in a call?

@jduerholt
Contributor

jduerholt commented Feb 20, 2026

I agree with you all. From my perspective, we need a register function for the core components of BoFire, like

register_surrogate(data_model, surrogate)

This method should then be exposed in the high-level API, so that people can use it to add new surrogates, strategies, priors, etc. without needing to modify the code. Under the hood, the method would register the data model and its functional equivalent; for a surrogate, that would be at this place:

SURROGATE_MAP: Dict[Type[data_models.Surrogate], Type[Surrogate]] = {

What do you think?
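
A minimal sketch of how such a register function could work is given below. It is an illustration under stated assumptions, not BoFire's implementation: SurrogateDataModel, Surrogate, map_surrogate, PFNDataModel, and PFNSurrogate are hypothetical stand-ins defined only for this example, and the registry starts empty rather than reproducing the real contents of SURROGATE_MAP.

```python
from typing import Dict, Type


class SurrogateDataModel:
    """Hypothetical stand-in for a surrogate data model."""


class Surrogate:
    """Hypothetical stand-in for the functional surrogate built from a data model."""

    def __init__(self, data_model: SurrogateDataModel) -> None:
        self.data_model = data_model


# Registry from data model class to functional class (starts empty in this sketch).
SURROGATE_MAP: Dict[Type[SurrogateDataModel], Type[Surrogate]] = {}


def register_surrogate(
    data_model: Type[SurrogateDataModel], surrogate: Type[Surrogate]
) -> None:
    """Register a data model class together with its functional equivalent."""
    if data_model in SURROGATE_MAP:
        raise ValueError(f"{data_model.__name__} is already registered.")
    SURROGATE_MAP[data_model] = surrogate


def map_surrogate(data_model: SurrogateDataModel) -> Surrogate:
    """Build the functional surrogate registered for this data model's class."""
    try:
        surrogate_cls = SURROGATE_MAP[type(data_model)]
    except KeyError:
        raise KeyError(
            f"No surrogate registered for {type(data_model).__name__}."
        ) from None
    return surrogate_cls(data_model)


# A user-defined pair, added without touching the library code.
class PFNDataModel(SurrogateDataModel):
    pass


class PFNSurrogate(Surrogate):
    pass


register_surrogate(PFNDataModel, PFNSurrogate)
surrogate = map_surrogate(PFNDataModel())
```

Note that this sketch only covers the mapping from data model to functional object. It does not address the Pydantic union validation mentioned earlier in the thread, which would need a separate solution.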

I agree, a call for this could be helpful. Who wants to join?

@KislayaRavi @LukasHebing @bertiqwerty @TobyBoyne

@TobyBoyne
Collaborator

If you schedule a call for any time next week, I will join :)

Some quick thoughts:

  1. I'm in favour of a bofire_community. For example with entmoot, it means that the main pipeline tests won't fail when entmoot packages aren't up to scratch. The biggest upside is that it lowers the bar for academic collaborators to contribute to BoFire. The biggest downside is that it implicitly says "don't rely on the things in here, because they aren't being actively maintained". That might make it less likely for these strategies to be adopted, which would be a shame!

  2. I'm very much in favour of a register_* functionality. For BARK, when I wanted to compare my approach to standard BoFire models, I ended up with this subpackage. Specifically, I had to make this mapper, which first checked my custom dictionary before checking the bofire.strategies.api mapper. I found myself sorely wanting some register_* functions!
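
The workaround described in point 2 can be sketched roughly as follows: a wrapper mapper that checks a custom dictionary first and falls back to the library mapper otherwise. This is a simplified illustration, not the BARK code. builtin_map is a hypothetical stand-in for the library mapper, and all class names are made up for the example.

```python
from typing import Any, Callable, Dict


class BuiltinDataModel:
    """Hypothetical data model known to the library."""


class BuiltinStrategy:
    """Hypothetical functional strategy shipped with the library."""

    def __init__(self, data_model: Any) -> None:
        self.data_model = data_model


# Stand-in for the library's own mapping, which users cannot extend.
_BUILTIN_MAP: Dict[type, Callable[[Any], Any]] = {BuiltinDataModel: BuiltinStrategy}


def builtin_map(data_model: Any) -> Any:
    """Hypothetical stand-in for the library mapper."""
    return _BUILTIN_MAP[type(data_model)](data_model)


class CustomDataModel:
    """Hypothetical user-defined data model."""


class CustomStrategy:
    """Hypothetical user-defined functional strategy."""

    def __init__(self, data_model: Any) -> None:
        self.data_model = data_model


# User-side dictionary that is consulted before the library mapper.
CUSTOM_STRATEGY_MAP: Dict[type, Callable[[Any], Any]] = {
    CustomDataModel: CustomStrategy,
}


def map_strategy(data_model: Any) -> Any:
    """Check the custom dictionary first, then fall back to the library mapper."""
    custom_cls = CUSTOM_STRATEGY_MAP.get(type(data_model))
    if custom_cls is not None:
        return custom_cls(data_model)
    return builtin_map(data_model)


custom = map_strategy(CustomDataModel())
builtin = map_strategy(BuiltinDataModel())
```

Every project that needs custom components has to carry a wrapper like this, which is the duplication a register_* function would remove.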

@KislayaRavi
Contributor Author

Hello everyone,
I can also join the meeting this week.

5 participants