Caption molecules and materials for pretraining neural networks.
ChemCaption is a tool designed to generate prompts for molecular features to train neural networks.
Here is a quick example of one of the featurizers designed to count the number of elements in a molecule.
from chemcaption.presets import ORGANIC
from chemcaption.molecules import SMILESMolecule
from chemcaption.featurize.composition import ElementCountFeaturizer
# Molecule we want to featurize
molecule = SMILESMolecule("C1(Br)=CC=CC=C1Br")
# We can either specify the symbol or the full name
el_count_name = ElementCountFeaturizer(['carbon', 'hydrogen', 'oxygen', 'bromine'])
# Featurize the molecule
prompt = el_count_name.text_featurize(molecule=molecule)
The generated prompt has the following QA pair.
Question: What are the atom counts of Carbon, Hydrogen, Oxygen, and Bromine of the molecule with SMILES Brc1ccccc1Br?
Answer: 6, 4, 0, and 2
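To make the idea behind an element-count prompt concrete, here is a minimal, dependency-free sketch. It is not ChemCaption's implementation: it works on a plain molecular formula (`C6H4Br2`, the formula of the dibromobenzene above) rather than on SMILES, and the names `NAME_TO_SYMBOL`, `count_elements`, and `element_count_qa` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical lookup from full element names to symbols (illustration only,
# not part of ChemCaption).
NAME_TO_SYMBOL = {"carbon": "C", "hydrogen": "H", "oxygen": "O", "bromine": "Br"}


def count_elements(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H4Br2' (no parentheses)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number after a symbol means a single atom.
        counts[symbol] += int(number) if number else 1
    return counts


def element_count_qa(formula: str, elements: list[str]) -> tuple[str, str]:
    """Build a question/answer pair about the requested element counts."""
    counts = count_elements(formula)
    # Accept either full names ('carbon') or symbols ('C').
    symbols = [NAME_TO_SYMBOL.get(e.lower(), e) for e in elements]
    names = ", ".join(e.capitalize() for e in elements)
    question = (
        f"What are the atom counts of {names} "
        f"in the molecule with formula {formula}?"
    )
    # Elements absent from the formula are reported as 0.
    answer = ", ".join(str(counts[s]) for s in symbols)
    return question, answer


question, answer = element_count_qa(
    "C6H4Br2", ["carbon", "hydrogen", "oxygen", "bromine"]
)
print(question)
print(answer)
```

Oxygen does not occur in the formula, so its count comes out as 0, which matches the third value in the answer above.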
For more details and all other available featurizers please visit the documentation.
The most recent release can be installed from PyPI with:
pip install chemcaption
The most recent code and data can be installed directly from GitHub with:
pip install git+https://github.com/lamalab-org/chem-caption
Some of the ChemCaption featurizers depend on morfeus and might require additional dependencies to be installed. The optional dependencies are listed in the morfeus-ml documentation.
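After installing, one way to confirm which release ended up in the environment is to query the package metadata from the standard library. This is a small sketch: `installed_version` is a hypothetical helper name, and it assumes the distribution is registered under the name `chemcaption`, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("chemcaption"))
```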
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
The code in this package is licensed under the MIT License.
This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.
See developer instructions
This final section of the README is for those who want to get involved by making a code contribution.
To install in development mode, use the following:
$ git clone https://github.com/lamalab-org/chem-caption
$ cd chem-caption
$ pip install -e .
After cloning the repository and installing nox with pip install nox, the unit tests in the tests/ folder can be run reproducibly with:
$ nox
Additionally, these tests are automatically re-run with each commit in a GitHub Action.
The documentation can be built locally using the following:
$ git clone https://github.com/lamalab-org/chem-caption
$ cd chem-caption
$ nox --session docs
$ open docs/build/html/index.html
The documentation build automatically installs the package as well as the docs extra specified in the setup.cfg. Sphinx plugins like texext can be added there. Additionally, they need to be added to the extensions list in docs/source/conf.py.
After installing the package in development mode and installing nox with pip install nox, the commands for making a new release are contained within the finish environment in noxfile.py. Run the following from the shell:
$ nox --session finish
This script does the following:
- Uses Bump2Version to switch the version number in the setup.cfg, src/chemcaption/version.py, and docs/source/conf.py to not have the -dev suffix
- Packages the code in both a tar archive and a wheel using build
- Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
- Pushes to GitHub. You'll need to make a release that goes with the commit where the version was bumped.
- Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can use nox -e bumpversion -- minor afterwards.
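The version arithmetic in the steps above can be sketched in a few lines of Python. This only illustrates the numbering scheme (drop the -dev suffix for the release, then move to the next patch or minor development version); it is not how Bump2Version is implemented, the helper names `strip_dev` and `bump` are hypothetical, and the version numbers are made up.

```python
DEV_SUFFIX = "-dev"


def strip_dev(version: str) -> str:
    """Return the release version, e.g. '0.1.0-dev' -> '0.1.0'."""
    if version.endswith(DEV_SUFFIX):
        return version[: -len(DEV_SUFFIX)]
    return version


def bump(version: str, part: str = "patch") -> str:
    """Return the next development version after bumping the given part."""
    major, minor, patch = (int(x) for x in strip_dev(version).split("."))
    if part == "minor":
        # A minor bump resets the patch number.
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unsupported part: {part!r}")
    return f"{major}.{minor}.{patch}{DEV_SUFFIX}"


release = strip_dev("0.1.0-dev")          # version that gets packaged and uploaded
next_dev = bump(release)                  # default: next patch
next_minor_dev = bump(next_dev, "minor")  # optional follow-up minor bump
print(release, next_dev, next_minor_dev)
```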
