Skip to content

One liner descriptors added#554

Open
pkacprzak5 wants to merge 9 commits intomordredfrom
pk-rdkit-descriptors
Open

One liner descriptors added#554
pkacprzak5 wants to merge 9 commits intomordredfrom
pk-rdkit-descriptors

Conversation

@pkacprzak5
Copy link
Copy Markdown
Collaborator

Changes

Short description of changes

Checklist before requesting a review

  • Docstrings added/updated in public functions and classes
  • Tests added, reasonable test coverage (at least ~90%, make test-coverage)
  • Sphinx docs added/updated and render properly (make docs and see docs/_build/index.html)

@@ -0,0 +1,59 @@
"""Atom count descriptors implemented with direct RDKit atom access.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always start docstrings from new line

Comment on lines +12 to +25
FEATURE_NAMES = [
"nH",
"nB",
"nC",
"nN",
"nO",
"nS",
"nP",
"nF",
"nCl",
"nBr",
"nI",
"nX",
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer "num_" for clarity

Comment on lines +53 to +56
values.extend(
sum(atomic_num == _ELEMENTS[name] for atomic_num in atomic_nums)
for name in FEATURE_NAMES[1:-1]
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inefficient. Just make a list of atomic numbers above (with comments for elements) and iterate over it here.

Comment on lines +46 to +48
Hydrogen counts use RDKit's total hydrogen count on each atom, so implicit
hydrogens are included without adding explicit hydrogen atoms to the
molecule. Heavy-element counts are taken from the molecule's atom list.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technical details, unnecessary in a docstring


def calc(mol: Mol) -> tuple[np.ndarray, list[str]]:
"""
Compute the remaining Mordred atom count descriptors.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "remaining"? Those are just atom counts, and we should have all general atom counts in a single module

Comment on lines +79 to +83
def _default_int_dict() -> defaultdict[int, int]:
"""
Create a nested default dictionary for carbon type counts.
"""
return defaultdict(int)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't call a function like that (overhead), just use defaultdict(int) where necessary

Comment on lines +26 to +30
"nAtom",
"nHeavyAtom",
"nSpiro",
"nBridgehead",
"nHetero",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks perfect for atom_count.py

"nSpiro",
"nBridgehead",
"nHetero",
"FCSP3",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns the fraction of C atoms that are SP3 hybridized - you calculate this in carbon_types.py anyway

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which one will be faster, though, check that

Comment on lines +149 to +164
_safe_value(
rdMolDescriptors.CalcNumRings,
mol_regular,
),
_safe_value(
rdMolDescriptors.CalcNumHeterocycles,
mol_regular,
),
_safe_value(
rdMolDescriptors.CalcNumAromaticRings,
mol_regular,
),
_safe_value(rdMolDescriptors.CalcNumAromaticHeterocycles, mol_regular),
_safe_value(rdMolDescriptors.CalcNumAliphaticRings, mol_regular),
_safe_value(rdMolDescriptors.CalcNumAliphaticHeterocycles, mol_regular),
_safe_value(rdMolDescriptors.CalcNumRotatableBonds, mol_regular),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You calculate this in ring_count.py, right?

Comment on lines +185 to +187
_safe_value(rdMolDescriptors.CalcPMI3, mol_with_3d_conformer),
_safe_value(rdMolDescriptors.CalcPMI2, mol_with_3d_conformer),
_safe_value(rdMolDescriptors.CalcPMI1, mol_with_3d_conformer),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this ordering?

@pkacprzak5 pkacprzak5 force-pushed the pk-rdkit-descriptors branch from 0ea4a46 to 41f47cd Compare April 30, 2026 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants