This repository contains code for predicting the reduction potentials of molecular species in various solvents. This approach enables simultaneous learning of electron affinity (EA) and solvent-dependent corrections for redox potentials and can generalise to previously unseen solvents.
This implementation supports two training modes, with and without explicit conditioning on the solvent-independent term (-EA). Select the mode in `run_experiment.py` or `train.py` by setting `explicit_ea = False` (or `True`).
Place your CSV file (e.g., `ReSolvedData.csv`) in the project folder. It should contain columns for:
- `smiles` (string)
- `EA` (float)
- `RP_ACN`, `RP_H2O`, `RP_THF`, `RP_DMSO`, `RP_DMF` (float): the solvent-dependent properties.
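Before training, it can be useful to check that the file actually has the expected header. A minimal sketch using only the standard library (the file name `ReSolvedData_demo.csv` and the helper `check_columns` are illustrative, not part of the codebase):

```python
import csv

REQUIRED = ["smiles", "EA", "RP_ACN", "RP_H2O", "RP_THF", "RP_DMSO", "RP_DMF"]

def check_columns(path):
    """Return the list of required columns missing from the CSV header."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return [c for c in REQUIRED if c not in header]

# Tiny demo file with the expected header and one row:
with open("ReSolvedData_demo.csv", "w", newline="") as f:
    f.write("smiles,EA,RP_ACN,RP_H2O,RP_THF,RP_DMSO,RP_DMF\n"
            "CCO,0.12,-2.1,-2.0,-2.3,-2.2,-2.1\n")

print(check_columns("ReSolvedData_demo.csv"))  # → []
```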
- Modify hyperparameters in `run_experiment.py` or `train.py` if desired (e.g., `num_layers`, `emb_dim`, `epochs`).
- Run:
python run_experiment.py
- `best_model.pth`: the best-scoring model checkpoint.
- `loss_curve.png`: training vs. validation loss plot.
- Scatter plots of predicted vs. reference values for each solvent.
- CSV files (e.g., `train_data.csv`, `test_data.csv`) with ground-truth and predicted values.
- Provide the path to the weights in `eval_trained.py`, or use the provided weights in `/weights`.
- Run:
python eval_trained.py
We estimate the solution-phase free energy change for the reduction

$$A + e^- \rightarrow A^-$$

using a thermodynamic cycle:

$$\Delta G_{\text{solution}} = \Delta G_{\text{gas}} + \Delta G_{\text{solv},A^-} - \Delta G_{\text{solv},A}$$

Here:

- $\Delta G_{\text{gas}}$ is the gas-phase electron attachment free energy (approximately $-\mathrm{EA}$).
- $\Delta G_{\text{solv},A}$ and $\Delta G_{\text{solv},A^-}$ are the solvation free energies of the neutral $A$ and the anion $A^-$, respectively.

The electrode potential is then

$$E = -\frac{\Delta G_{\text{solution}}}{nF} - E_{\text{abs}}(\text{ref})$$

where $n$ is the number of electrons transferred, $F$ is the Faraday constant, and $E_{\text{abs}}(\text{ref})$ is the absolute potential of the reference electrode.
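As a worked numerical example of the thermodynamic cycle, here is a minimal sketch. The numbers are purely illustrative (not from the dataset), and the reference value of ≈4.44 V is one commonly quoted absolute potential for the standard hydrogen electrode:

```python
# Thermodynamic cycle: dG_solution = dG_gas + dG_solv(A-) - dG_solv(A).
# All energies in eV; for a one-electron reduction, E in volts is numerically
# -dG_solution (in eV) shifted to the chosen reference electrode's scale.

def reduction_potential(ea_eV, dG_solv_neutral, dG_solv_anion, e_ref_abs=4.44):
    """Illustrative one-electron reduction potential vs. a reference electrode.

    ea_eV           : electron affinity, so dG_gas ≈ -EA
    dG_solv_neutral : solvation free energy of A (eV)
    dG_solv_anion   : solvation free energy of A- (eV)
    e_ref_abs       : absolute potential of the reference electrode (V)
    """
    dG_gas = -ea_eV
    dG_solution = dG_gas + dG_solv_anion - dG_solv_neutral
    return -dG_solution - e_ref_abs

# Illustrative inputs: EA = 1.0 eV; the anion is solvated much more strongly.
print(round(reduction_potential(1.0, -0.10, -2.20), 2))  # → -1.34
```

The anion's stronger solvation makes the solution-phase reduction more favourable than the gas-phase electron attachment alone, which is exactly the solvent-dependent correction the model learns on top of -EA.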
In this codebase, node (atom) and edge (bond) features are updated using multiple rounds of message passing. Each MPNN layer:
- Computes messages from neighboring nodes and edges.
- Produces updated node features, which we combine residually with the previous layer’s node features.
- Similarly, the edge states are updated in a residual fashion.
After the final layer, we concatenate the final node and edge embeddings.
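The update scheme above can be sketched in plain NumPy. This is a toy stand-in for the actual PyG layer in `model/mpnn_layer.py`: the dimensions, weight shapes, ReLU nonlinearities, and sum aggregation are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # node/edge feature dimension (illustrative)

def relu(x):
    return np.maximum(x, 0.0)

def mpnn_layer(h, e, edge_index, W_msg, W_node, W_edge):
    """One round of message passing with residual node and edge updates."""
    src, dst = edge_index                                     # directed edges j -> i
    msgs = relu(np.concatenate([h[src], e], axis=1) @ W_msg)  # messages from neighbours + edges
    agg = np.zeros_like(h)
    np.add.at(agg, dst, msgs)                                 # sum-aggregate per target node
    h_new = h + relu(np.concatenate([h, agg], axis=1) @ W_node)                     # residual node update
    e_new = e + relu(np.concatenate([h_new[src], h_new[dst], e], axis=1) @ W_edge)  # residual edge update
    return h_new, e_new

# Toy graph: 3 nodes, 2 directed edges (0->1, 1->2)
h = rng.normal(size=(3, D))
e = rng.normal(size=(2, D))
edge_index = (np.array([0, 1]), np.array([1, 2]))
W_msg  = rng.normal(size=(2 * D, D)) * 0.1
W_node = rng.normal(size=(2 * D, D)) * 0.1
W_edge = rng.normal(size=(3 * D, D)) * 0.1

h, e = mpnn_layer(h, e, edge_index, W_msg, W_node, W_edge)
print(h.shape, e.shape)  # → (3, 8) (2, 8)
```

The residual additions (`h + ...`, `e + ...`) keep feature dimensions fixed across layers, so layers can be stacked and early features are preserved through deep stacks.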
We pass the concatenated node-edge feature set through two parallel Set Transformer aggregations:
- One dedicated to predicting electron affinity (EA).
- Another to incorporate solvent information (via learnable embeddings of each solvent’s dielectric constant and refractive index) and generate the solvent-dependent correction.
By concatenating these aggregated representations with the solvent embeddings, the model predicts:
- The negative of the EA.
- The combined (-EA + solvent contribution) for each solvent.
Thus, each forward pass yields a multi-output prediction vector:

$$\hat{y} = \left[\,-\widehat{\mathrm{EA}},\ \widehat{RP}_{\text{ACN}},\ \widehat{RP}_{\text{H2O}},\ \widehat{RP}_{\text{THF}},\ \widehat{RP}_{\text{DMSO}},\ \widehat{RP}_{\text{DMF}}\,\right]$$

with one entry for the negative EA and one combined prediction per solvent.
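At the shape level, the readout can be sketched as follows. This is not the repository's `model/readout.py`: mean pooling stands in for the Set Transformer aggregations, the weight matrices are random, and the (dielectric constant, refractive index) pairs are illustrative descriptor values.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8                                     # embedding dimension (illustrative)
solvents = ["ACN", "H2O", "THF", "DMSO", "DMF"]
# (dielectric constant, refractive index) pairs per solvent -- illustrative values
descriptors = np.array([[37.5, 1.344], [78.4, 1.333], [7.6, 1.407],
                        [46.7, 1.479], [36.7, 1.430]])

def forward(node_edge_feats, W_ea, W_solv, W_emb):
    """Toy multi-output head: mean pooling stands in for the Set Transformer."""
    g_ea = node_edge_feats.mean(axis=0) @ W_ea      # pooled branch for -EA
    g_solv = node_edge_feats.mean(axis=0) @ W_solv  # pooled branch for corrections
    s_emb = descriptors @ W_emb                     # solvent embeddings from descriptors
    neg_ea = g_ea.sum()                             # scalar: predicted -EA
    corrections = s_emb @ g_solv                    # one correction per solvent
    return np.concatenate([[neg_ea], neg_ea + corrections])

x = rng.normal(size=(10, D))                        # concatenated node+edge features
out = forward(x, rng.normal(size=(D, D)), rng.normal(size=(D, D)),
              rng.normal(size=(2, D)) * 0.01)
print(out.shape)  # → (6,)  i.e. [-EA, -EA + correction per solvent]
```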
resolved_project/
├── Evolving_EA.ipynb # Example of generating new molecules in a target range of EA (evomol needed)
├── data_utils.py # Reading CSV, dataset creation, SMILES->PyG conversion
├── features.py # RDKit-based atom and bond feature extraction
├── model/
│ ├── mpnn_layer.py # Custom MPNN layer (MessagePassing in PyG)
│ ├── readout.py # Set Transformer-based readout + solvent embeddings
│ └── mpnn_model.py # Full model combining MPNN layers + readout
├── train.py # Setting up the model & hyperparameters, evaluating the training
├── train_loop.py # Main training loop
├── evaluate.py # Evaluation loop, metrics and plotting tools
├── eval_trained.py # Loading the weights, evaluating and plotting
└── run_experiment.py # Script orchestrating data loading, training, and evaluation
