🤖 BO-ICL: Bayesian Optimization using in-context learning

BO-ICL performs regression with uncertainty estimates using frozen Large Language Models (LLMs) by reading their token probabilities. It uses LangChain to select examples from the training data and build in-context learning prompts. Because only the selected examples go into each prompt, it can draw on more training data than fits in the model's context window. Having uncertainty estimates makes techniques such as Bayesian Optimization possible.
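To make the idea of regression from token probabilities concrete, here is a minimal sketch. It is not BO-ICL's implementation: it assumes the LLM has already returned a few candidate numeric completions together with their log-probabilities, and it summarizes that discrete distribution as a mean and a standard deviation. The function name `summarize_completions` and the numbers in `candidates` are invented for illustration.

```python
import math


def summarize_completions(candidates):
    """Turn (value, log_probability) pairs into a predictive mean and std.

    `candidates` is a hypothetical list of numeric completions from an LLM,
    each paired with the summed log-probability of its tokens.
    """
    # Subtract the largest log-probability before exponentiating so the
    # weights stay numerically stable, then normalize them to sum to 1.
    max_logp = max(logp for _, logp in candidates)
    weights = [math.exp(logp - max_logp) for _, logp in candidates]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Probability-weighted mean and standard deviation.
    values = [value for value, _ in candidates]
    mean = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return mean, math.sqrt(variance)


# Invented numbers, for illustration only.
candidates = [(-2.0, math.log(0.5)), (-3.0, math.log(0.3)), (-4.0, math.log(0.2))]
mean, std = summarize_completions(candidates)
print(f"{mean:.2f} +/- {std:.2f}")
```

A wider spread of plausible completions gives a larger standard deviation, which is the quantity an optimizer can later exploit.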

Table of contents

BO-ICL

Install 📦

boicl can simply be installed using pip:

pip install boicl

Some additional requirements are needed to use the Gaussian Process Regressor (GPR) module. They can also be installed using pip:

pip install boicl[gpr]

Usage 💻

You need to set up your OpenAI API key in order to use BO-ICL. You can do that using the os Python library:

import os
os.environ["OPENAI_API_KEY"] = "<your-key-here>"
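If you would rather not hard-code the key in a script, one option is to export it in your shell and only check for it in Python. The helper below is a small sketch of that check; `require_api_key` is a hypothetical name and is not part of boicl.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the key stored in the environment, or fail with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before using BO-ICL.")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key immediately instead of at the first request to the API.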

Quickstart 🔥

boicl provides a simple interface to use the model.

import boicl

# Create the model object
asktell = boicl.AskTellFewShotTopk()

# Tell some points to the model
asktell.tell("1-bromopropane", -1.730)
asktell.tell("1-bromopentane", -3.080)
asktell.tell("1-bromooctane", -5.060)
asktell.tell("1-bromonaphthalene", -4.35)

# Make a prediction
yhat = asktell.predict("1-bromobutane")
print(yhat.mean(), yhat.std())

This prediction returns $-2.92 \pm 1.27$.

The model can be improved further by using Bayesian Optimization to choose which point to label next.

# Create a list of examples
pool_list = [
  "1-bromoheptane",
  "1-bromohexane",
  "1-bromo-2-methylpropane",
  "butan-1-ol"
]

# Create the pool object
pool = boicl.Pool(pool_list)

# Ask the next point
asktell.ask(pool)

# Output:
(['1-bromo-2-methylpropane'], [-1.284916344093158], [-1.92])

The first value is the selected point, the second is the value of the acquisition function at that point, and the third is the predicted mean.

Let's tell this point to the model with its correct label and make a prediction:

asktell.tell("1-bromo-2-methylpropane", -2.430)

yhat = asktell.predict("1-bromobutane")
print(yhat.mean(), yhat.std())

This prediction returns $-1.866 \pm 0.012$, which is closer to the label of -2.370 for 1-bromobutane. The uncertainty also decreased.

Customising the model

boicl provides different models depending on the prompt you want to use. For example:

import boicl
asktell = boicl.AskTellFewShotTopk(
  x_formatter=lambda x: f"iupac name {x}",
  y_name="measured log solubility in mols per litre",
  y_formatter=lambda y: f"{y:.2f}",
  model="gpt-4",
  selector_k=5,
  temperature=0.7,
)
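To show how these arguments could fit together, here is a self-contained sketch that picks the `k` most similar training examples and formats them into a few-shot prompt. It does not call boicl: the string similarity from `difflib` is a stand-in for whatever selector the library configures through LangChain, and the "Q: ... A:" template, `select_top_k`, and `build_prompt` are hypothetical.

```python
from difflib import SequenceMatcher


def select_top_k(query, examples, k):
    """Keep the k (x, y) examples whose x is most similar to the query.

    Plain string similarity is an assumption made for this sketch only.
    """
    def similarity(example):
        return SequenceMatcher(None, query, example[0]).ratio()

    return sorted(examples, key=similarity, reverse=True)[:k]


def build_prompt(query, examples, x_formatter, y_formatter, y_name):
    """Format the selected examples and the query as a completion prompt."""
    blocks = [
        f"Q: What is the {y_name} of {x_formatter(x)}?\nA: {y_formatter(y)}\n"
        for x, y in examples
    ]
    # The final block is left open so the model completes the answer.
    blocks.append(f"Q: What is the {y_name} of {x_formatter(query)}?\nA:")
    return "\n".join(blocks)


examples = [
    ("1-bromopropane", -1.730),
    ("1-bromopentane", -3.080),
    ("1-bromooctane", -5.060),
    ("1-bromonaphthalene", -4.35),
]
selected = select_top_k("1-bromobutane", examples, k=2)
prompt = build_prompt(
    "1-bromobutane",
    selected,
    x_formatter=lambda x: f"iupac name {x}",
    y_formatter=lambda y: f"{y:.2f}",
    y_name="measured log solubility in mols per litre",
)
print(prompt)
```

Only the selected examples reach the prompt, so the training set can grow without the prompt growing with it.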

Other arguments can be used to customize the prompt (prefix, prompt_template, suffix) and the in-context learning procedure (use_quantiles, n_quantiles). Additionally, we implemented other models. A brief list can be seen below:

AskTellFewShotTopk;
AskTellFinetuning;
AskTellRidgeKernelRegression;
AskTellGPR;
AskTellNearestNeighbor.

Refer to the notebooks available in the paper directory to see examples of how to use boicl and the paper for a detailed description of the classes.

Inverse design

boicl can also propose new data through inverse design. After telling data points to the model in the same way as before, call inv_predict with the label you want. The model then generates an input that should correspond to that label:

data_x = [
"A 15 wt% tungsten carbide catalyst was prepared with Fe dopant metal at 0.5 wt% and carburized at 835 °C. The reaction was run at 280 °C, resulting in a CO yield of",
"A 15 wt% tungsten carbide catalyst was prepared with Fe dopant metal at 0.5 wt% and carburized at 835 °C. The reaction was run at 350 °C, resulting in a CO yield of",
...
]

data_y = [
1.66,
3.03,
...
]


# Tell every (x, y) pair to the model
for x, y in zip(data_x, data_y):
  asktell.tell(x, y)

asktell.inv_predict(20.0)

The data for this example is available in the paper directory. The call above generated the following procedure:

the synthesis procedure:"A 30 wt% tungsten carbide catalyst was prepared with Cu dopant metal at 5 wt% and carburized at 835 C. The reaction was run at 350 ºC"
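One way to picture inverse prediction is as a few-shot prompt with the roles swapped: each example states the label first and the input second, and the final entry states only the desired label. The sketch below builds such a prompt. The template wording and the `build_inverse_prompt` helper are hypothetical and do not reflect how `inv_predict` is implemented.

```python
def build_inverse_prompt(examples, target_y, y_name="CO yield"):
    """Few-shot prompt that gives the label and asks for the matching input."""
    blocks = [
        f'If the {y_name} is {y:.2f}, the synthesis procedure: "{x}"'
        for x, y in examples
    ]
    # The last entry is left open so the model writes the procedure.
    blocks.append(f"If the {y_name} is {target_y:.2f}, the synthesis procedure:")
    return "\n\n".join(blocks)


# Procedures from the example above, with the trailing
# "resulting in a CO yield of" clause dropped.
examples = [
    ("A 15 wt% tungsten carbide catalyst was prepared with Fe dopant metal at "
     "0.5 wt% and carburized at 835 °C. The reaction was run at 280 °C", 1.66),
    ("A 15 wt% tungsten carbide catalyst was prepared with Fe dopant metal at "
     "0.5 wt% and carburized at 835 °C. The reaction was run at 350 °C", 3.03),
]
inverse_prompt = build_inverse_prompt(examples, target_y=20.0)
print(inverse_prompt)
```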

Citation

Please cite Ramos et al.:

@misc{ramos2023bayesian,
      title={Bayesian Optimization of Catalysts With In-context Learning},
      author={Mayk Caldas Ramos and Shane S. Michtavy and Marc D. Porosoff and Andrew D. White},
      year={2023},
      eprint={2304.05341},
      archivePrefix={arXiv},
      primaryClass={physics.chem-ph}
}
