One liner descriptors added #554
97be849 to a02cf3f
@@ -0,0 +1,59 @@
"""Atom count descriptors implemented with direct RDKit atom access.
We always start docstrings on a new line.
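For illustration, the convention described above would look roughly like this. The function name `calc_example` is hypothetical and only serves to show the layout; the module docstring text is taken from the diff.

```python
"""
Atom count descriptors implemented with direct RDKit atom access.
"""


def calc_example() -> None:
    """
    Summary text starts on the line after the opening quotes.
    """
```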
FEATURE_NAMES = [
    "nH",
    "nB",
    "nC",
    "nN",
    "nO",
    "nS",
    "nP",
    "nF",
    "nCl",
    "nBr",
    "nI",
    "nX",
]
values.extend(
    sum(atomic_num == _ELEMENTS[name] for atomic_num in atomic_nums)
    for name in FEATURE_NAMES[1:-1]
)
This is inefficient. Just make a list of atomic numbers above (with comments for elements) and iterate over it here.
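A minimal sketch of the suggestion, assuming the atomic numbers have already been collected into a plain list. The names `_ATOMIC_NUMS` and `count_elements` are hypothetical, and tallying with `Counter` is an addition beyond what the comment asks for: it counts all atoms in one pass instead of once per element.

```python
from collections import Counter

# Atomic numbers in the same order as FEATURE_NAMES[1:-1] (hypothetical name).
_ATOMIC_NUMS = [
    5,   # B
    6,   # C
    7,   # N
    8,   # O
    16,  # S
    15,  # P
    9,   # F
    17,  # Cl
    35,  # Br
    53,  # I
]


def count_elements(atomic_nums: list[int]) -> list[int]:
    """Count occurrences of each element of interest in one pass over the atoms."""
    counts = Counter(atomic_nums)  # missing keys read as 0
    return [counts[atomic_num] for atomic_num in _ATOMIC_NUMS]
```

With heavy-atom numbers for ethanol, `count_elements([6, 6, 8])` gives two carbons, one oxygen, and zeros elsewhere.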
Hydrogen counts use RDKit's total hydrogen count on each atom, so implicit
hydrogens are included without adding explicit hydrogen atoms to the
molecule. Heavy-element counts are taken from the molecule's atom list.
These are technical details, unnecessary in a docstring.
def calc(mol: Mol) -> tuple[np.ndarray, list[str]]:
    """
    Compute the remaining Mordred atom count descriptors.
Why "remaining"? These are just atom counts, and we should have all general atom counts in a single module.
def _default_int_dict() -> defaultdict[int, int]:
    """
    Create a nested default dictionary for carbon type counts.
    """
    return defaultdict(int)
Don't wrap this in a function (extra call overhead); just use defaultdict(int) where necessary.
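One way this could look, as a sketch only: `defaultdict(int)` is created inline, and a tuple key stands in for the nested dictionary. The function name and the `(hybridization, carbon_neighbors)` key layout are assumptions for illustration, not the PR's actual data structure.

```python
from collections import defaultdict


def count_carbon_types(carbons: list[tuple[str, int]]) -> dict[tuple[str, int], int]:
    """Tally carbons by a hypothetical (hybridization, carbon_neighbors) key."""
    counts: defaultdict[tuple[str, int], int] = defaultdict(int)  # no helper function
    for key in carbons:
        counts[key] += 1
    return dict(counts)
```

A flat tuple key avoids needing any factory for inner dictionaries; whether that fits depends on how the counts are read back later.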
"nAtom",
"nHeavyAtom",
"nSpiro",
"nBridgehead",
"nHetero",
This looks perfect for atom_count.py
"nSpiro",
"nBridgehead",
"nHetero",
"FCSP3",
FCSP3 is the fraction of C atoms that are SP3 hybridized; you calculate this in carbon_types.py anyway.
I'm not sure which one will be faster, though; check that.
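A quick way to check could be a small `timeit` harness like the one below. The helper name `best_time` is hypothetical; in practice the two callables would wrap the RDKit call and the carbon_types.py computation on the same molecule.

```python
import timeit
from typing import Callable


def best_time(func: Callable[[], object], number: int = 1000, repeat: int = 5) -> float:
    """Return the best observed per-call time in seconds for a zero-argument callable."""
    # Taking the minimum over repeats reduces noise from other processes.
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


# Stand-in workloads; replace with the two implementations being compared.
time_a = best_time(lambda: sum(range(100)))
time_b = best_time(lambda: len(list(range(100))))
```

Comparing `time_a` and `time_b` on a representative set of molecules, rather than a single one, would give a fairer picture.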
_safe_value(
    rdMolDescriptors.CalcNumRings,
    mol_regular,
),
_safe_value(
    rdMolDescriptors.CalcNumHeterocycles,
    mol_regular,
),
_safe_value(
    rdMolDescriptors.CalcNumAromaticRings,
    mol_regular,
),
_safe_value(rdMolDescriptors.CalcNumAromaticHeterocycles, mol_regular),
_safe_value(rdMolDescriptors.CalcNumAliphaticRings, mol_regular),
_safe_value(rdMolDescriptors.CalcNumAliphaticHeterocycles, mol_regular),
_safe_value(rdMolDescriptors.CalcNumRotatableBonds, mol_regular),
You calculate this in ring_count.py, right?
_safe_value(rdMolDescriptors.CalcPMI3, mol_with_3d_conformer),
_safe_value(rdMolDescriptors.CalcPMI2, mol_with_3d_conformer),
_safe_value(rdMolDescriptors.CalcPMI1, mol_with_3d_conformer),
0ea4a46 to 41f47cd