Hi,
In `kermt/util/utils.py`, the `filter_invalid_smiles` function is supposed to filter out invalid SMILES. However, when a SMILES string cannot be parsed, `mol = Chem.MolFromSmiles(datapoint.smiles)` returns `None`, so the next line, `if mol.GetNumHeavyAtoms() == 0:`, raises an `AttributeError` because `mol` is `None`. This can be prevented by adding the following 3 lines (marked below):
```python
def filter_invalid_smiles(data: MoleculeDataset) -> MoleculeDataset:
    """
    Filters out invalid SMILES.
    :param data: A MoleculeDataset.
    :return: A MoleculeDataset with only valid molecules.
    """
    datapoint_list = []
    for idx, datapoint in enumerate(data):
        if datapoint.smiles == '':
            print(f'invalid smiles {idx}: {datapoint.smiles}')
            continue
        mol = Chem.MolFromSmiles(datapoint.smiles)
        ######################## NEW LINES ########################
        if mol is None:
            print(f'invalid smiles parse {idx}: {datapoint.smiles}')
            continue
        ######################## NEW LINES ########################
        if mol.GetNumHeavyAtoms() == 0:
            print(f'invalid heavy {idx}')
            continue
        datapoint_list.append(datapoint)
    return MoleculeDataset(datapoint_list)
```
Some SMILES strings that parsed fine under a different version of RDKit triggered this error when I ran KERMT. Hopefully this makes the filtering function more robust.
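To check the ordering of the three checks (empty string, parse failure, zero heavy atoms) without installing RDKit or KERMT, here is a minimal sketch. It assumes a hypothetical helper, `split_valid_smiles`, that is not part of KERMT: it works on a plain list of SMILES strings instead of a `MoleculeDataset` and takes the parser as an argument, so in real use you would pass `Chem.MolFromSmiles`.

```python
from typing import Callable, Iterable, List, Optional, Protocol, Tuple


class HasHeavyAtoms(Protocol):
    """Anything exposing RDKit's Mol.GetNumHeavyAtoms() method."""

    def GetNumHeavyAtoms(self) -> int: ...


def split_valid_smiles(
    smiles: Iterable[str],
    parse: Callable[[str], Optional[HasHeavyAtoms]],
) -> Tuple[List[int], List[Tuple[int, str, str]]]:
    """Hypothetical helper (not in KERMT) mirroring filter_invalid_smiles.

    Returns (kept indices, rejected (index, smiles, reason) tuples).
    """
    kept: List[int] = []
    rejected: List[Tuple[int, str, str]] = []
    for idx, smi in enumerate(smiles):
        # Check 1: empty string, as in the original function.
        if smi == '':
            rejected.append((idx, smi, 'empty'))
            continue
        mol = parse(smi)
        # Check 2: the proposed guard; a failed parse yields None.
        if mol is None:
            rejected.append((idx, smi, 'parse'))
            continue
        # Check 3: only reached when mol is a real object.
        if mol.GetNumHeavyAtoms() == 0:
            rejected.append((idx, smi, 'no heavy atoms'))
            continue
        kept.append(idx)
    return kept, rejected
```

With RDKit installed, the call would be `split_valid_smiles(smiles_list, Chem.MolFromSmiles)`. Injecting the parser also makes it easy to compare results across RDKit versions, since the same list can be run in each environment and the `rejected` lists diffed.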