Skip to content

Small Refactorings to the DoE Strategy#514

Draft
jduerholt wants to merge 1 commit intomainfrom
refactor/doe_strategy
Draft

Small Refactorings to the DoE Strategy#514
jduerholt wants to merge 1 commit intomainfrom
refactor/doe_strategy

Conversation

@jduerholt
Copy link
Contributor

I was looking to the DoEStrategyand spotted some issues and possible improvements:

  • Make strategy.set_candidates versatile enough to be able to deal also with partially fixed one without overwriting the method just for the DoEStrategy.
  • For dealing with categoricals, a one-hot encoding is performed and new continuous features are introduced. We should use there the already implemented method for one-hot encoding of categoricals, so that we do not duplicate code. Furthermore, the continuous features are named based on the category names, but this will break when you have two categorical variables which share at least one category with the same name. We should follow there the general convention to name it always <feature.key>_<category_name>.
  • Currently, it is only conditioning the generated experiments on the candidates. It should also do this for self.experiments, as this is the usual way most ACB APIs work, by using ask and tell and conditioning on self.experiments. For this, I assume that it needs to be combined with the partially_fixed_candidates under the hood.

As I am not so deep in the DoEStrategy, I only started to work on these improvements. Maybe you @Osburg have time to proceed on it. I assume that you are much faster on this as I am. I was struggling especially on how to combine the candidates and experiments. The rest is in principle done. If you do not have time, just feel free to tell, then I dig deeper ;)

cc: @dlinzner-bcs

@Osburg
Copy link
Collaborator

Osburg commented Feb 7, 2025

Hey Johannes :)

thanks for pointing this out! I can definitely do it, but I cannot promise when I will have time for it (certainly not until the end of next week :/). If it is not urgent for you, we can assign this to me and I'll work on it when I have time for it. If it is urgent, I think it is better to declare this an open task for @dlinzner-bcs, you and me, and the first person who has time and feels like it works on it. What do you think?

Cheers
Aaron

@jduerholt
Copy link
Contributor Author

jduerholt commented Feb 7, 2025

Thanks @Osburg, it is not urgent, if we can get it sorted out somewhere over the next month, it would be great.

Two other questions in this context (where the first is kind of urgent :))

  • For what do you need this sorting here:
    torch.sum(torch.sort(torch.pdist(X))[0][: self.n_experiments]).backward()
    I do not understand it. If not needed, we can also get rid of it. An answer here is a bit more ugent as I try to understand it for trying out something similar.
  • Why do you go here over pandas:
    X = pd.DataFrame(
    It is not needed, or? And creates only overhead, or?

@jduerholt
Copy link
Contributor Author

First question is clarified ;)

@Osburg
Copy link
Collaborator

Osburg commented Feb 7, 2025

Towards the second: I think you are right, this is not necessary and I guess we can remove this.

@dlinzner-bcs
Copy link
Contributor

dlinzner-bcs commented Feb 18, 2025

@jduerholt thank you! Yes, I agree with all your points and will help. I also think that this might be a good starting point to add support for constraining categorical/discrete variables. Currently the validator of all constraints complain if a non-continuous variable is added. My only fear is that this will have larger implications for the rest of the package. What do you think?
cc: @Osburg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants