One liner descriptors added by pkacprzak5 · Pull Request #554 · MLCIL/scikit-fingerprints

pkacprzak5 · 2026-04-27T17:02:16Z

Changes

Short description of changes

Checklist before requesting a review

Docstrings added/updated in public functions and classes
Tests added, reasonable test coverage (at least ~90%, make test-coverage)
Sphinx docs added/updated and render properly (make docs and see docs/_build/index.html)

j-adamczyk · 2026-04-29T07:25:56Z

@@ -0,0 +1,59 @@
+"""Atom count descriptors implemented with direct RDKit atom access.


We always start docstrings on a new line.
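A minimal illustration of that convention, assuming it means the summary goes on the line after the opening quotes. The function `calc_example` is hypothetical and only carries the docstring:

```python
def calc_example() -> int:
    """
    Return a placeholder value.

    The summary starts on the line after the opening quotes.
    """
    return 0
```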

j-adamczyk · 2026-04-29T07:26:14Z

+FEATURE_NAMES = [
+    "nH",
+    "nB",
+    "nC",
+    "nN",
+    "nO",
+    "nS",
+    "nP",
+    "nF",
+    "nCl",
+    "nBr",
+    "nI",
+    "nX",
+]


Prefer the "num_" prefix for clarity.

j-adamczyk · 2026-04-29T07:27:51Z

+    values.extend(
+        sum(atomic_num == _ELEMENTS[name] for atomic_num in atomic_nums)
+        for name in FEATURE_NAMES[1:-1]
+    )


This is inefficient. Just make a list of atomic numbers above (with comments for elements) and iterate over it here.
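One possible reading of this suggestion, as a rough sketch rather than the code that ended up in the PR. The names `ATOMIC_NUMS` and `count_elements` are hypothetical, the input is assumed to be a plain list of atomic numbers (for example, collected from `atom.GetAtomicNum()`), and the use of `Counter` to count in a single pass is an addition beyond what the comment asks for:

```python
from collections import Counter

# Atomic numbers in the same order as FEATURE_NAMES[1:-1] (hypothetical constant)
ATOMIC_NUMS = [
    5,   # B
    6,   # C
    7,   # N
    8,   # O
    16,  # S
    15,  # P
    9,   # F
    17,  # Cl
    35,  # Br
    53,  # I
]


def count_elements(atomic_nums: list[int]) -> list[int]:
    """
    Count occurrences of each tracked element.
    """
    counts = Counter(atomic_nums)  # one pass over all atoms
    # Counter returns 0 for elements that do not occur
    return [counts[num] for num in ATOMIC_NUMS]
```

This drops the name-to-number dictionary lookup and scans the atom list once instead of once per element.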

j-adamczyk · 2026-04-29T07:28:01Z

+    Hydrogen counts use RDKit's total hydrogen count on each atom, so implicit
+    hydrogens are included without adding explicit hydrogen atoms to the
+    molecule. Heavy-element counts are taken from the molecule's atom list.


Technical details, unnecessary in a docstring

j-adamczyk · 2026-04-29T07:28:21Z

+
+def calc(mol: Mol) -> tuple[np.ndarray, list[str]]:
+    """
+    Compute the remaining Mordred atom count descriptors.


Why "remaining"? Those are just atom counts, and we should have all general atom counts in a single module.

j-adamczyk · 2026-04-29T08:33:38Z

+def _default_int_dict() -> defaultdict[int, int]:
+    """
+    Create a nested default dictionary for carbon type counts.
+    """
+    return defaultdict(int)


Don't call a function like that (it adds overhead); just use defaultdict(int) where necessary.
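A sketch of what using `defaultdict(int)` directly might look like. It assumes the nested dictionary can be replaced by a single dictionary keyed by tuples, which the PR does not confirm. The function `count_carbon_types` and its input format (hybridization label, number of carbon neighbors) are hypothetical:

```python
from collections import defaultdict


def count_carbon_types(carbons: list[tuple[str, int]]) -> dict[tuple[str, int], int]:
    """
    Count carbons by (hybridization, number of carbon neighbors).
    """
    # A flat defaultdict(int) with tuple keys needs no factory helper
    counts: defaultdict[tuple[str, int], int] = defaultdict(int)
    for hybridization, num_carbon_neighbors in carbons:
        counts[(hybridization, num_carbon_neighbors)] += 1
    return dict(counts)
```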

j-adamczyk · 2026-04-29T08:34:16Z

+    "nAtom",
+    "nHeavyAtom",
+    "nSpiro",
+    "nBridgehead",
+    "nHetero",


This looks perfect for atom_count.py

j-adamczyk · 2026-04-29T08:35:15Z

+    "nSpiro",
+    "nBridgehead",
+    "nHetero",
+    "FCSP3",


FCSP3 returns the fraction of C atoms that are SP3 hybridized; you calculate this in carbon_types.py anyway.

I'm not sure which one will be faster, though, so please check that.
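A quick way to run that comparison, sketched with the standard `timeit` module. The helper `time_candidates` is hypothetical. In practice the candidates would presumably be `rdMolDescriptors.CalcFractionCSP3` and whatever function in carbon_types.py produces the same value, called on a representative `Mol`:

```python
import timeit
from collections.abc import Callable


def time_candidates(
    candidates: dict[str, Callable[[object], object]],
    arg: object,
    number: int = 1000,
    repeat: int = 5,
) -> dict[str, float]:
    """
    Return the best per-call time in seconds for each candidate.
    """
    results = {}
    for name, func in candidates.items():
        # Take the minimum over repeats to reduce noise from other processes
        timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
        results[name] = min(timings) / number
    return results
```

Timing on a handful of molecules of different sizes would give a fairer picture than a single input.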

j-adamczyk · 2026-04-29T08:36:12Z

+            _safe_value(
+                rdMolDescriptors.CalcNumRings,
+                mol_regular,
+            ),
+            _safe_value(
+                rdMolDescriptors.CalcNumHeterocycles,
+                mol_regular,
+            ),
+            _safe_value(
+                rdMolDescriptors.CalcNumAromaticRings,
+                mol_regular,
+            ),
+            _safe_value(rdMolDescriptors.CalcNumAromaticHeterocycles, mol_regular),
+            _safe_value(rdMolDescriptors.CalcNumAliphaticRings, mol_regular),
+            _safe_value(rdMolDescriptors.CalcNumAliphaticHeterocycles, mol_regular),
+            _safe_value(rdMolDescriptors.CalcNumRotatableBonds, mol_regular),


You calculate this in ring_count.py, right?

j-adamczyk · 2026-04-29T08:36:24Z

+        _safe_value(rdMolDescriptors.CalcPMI3, mol_with_3d_conformer),
+        _safe_value(rdMolDescriptors.CalcPMI2, mol_with_3d_conformer),
+        _safe_value(rdMolDescriptors.CalcPMI1, mol_with_3d_conformer),


Why this ordering?

pkacprzak5 requested review from j-adamczyk, mjste and my-alaska as code owners April 27, 2026 17:02

pkacprzak5 force-pushed the pk-rdkit-descriptors branch from 97be849 to a02cf3f Compare April 27, 2026 18:15

j-adamczyk requested changes Apr 29, 2026


pkacprzak5 added 9 commits April 30, 2026 19:13

One liner descriptors added

01b348a

Apply pre-commit formatting

6b120e8

Add remaining Mordred atom count descriptors

d1efaeb

Add remaining Mordred carbon type descriptors

60ef314

Add Mordred rotatable bond ratio descriptor

9ab261b

Add remaining Mordred ring count descriptors

853f17b

Avoid lambda wrappers in RDKit descriptors

ccc6dbe

Fix 2D descriptor hydrogen handling

f2c93d3

Address RDKit descriptor review

41f47cd

pkacprzak5 force-pushed the pk-rdkit-descriptors branch from 0ea4a46 to 41f47cd Compare April 30, 2026 17:49

pkacprzak5 wants to merge 9 commits into mordred from pk-rdkit-descriptors
