topology featurizers debug by XiaoqZhang · Pull Request #460 · lamalab-org/mofdscribe

XiaoqZhang · 2024-08-14T08:46:47Z

Hi, I found a bug in the topology featurizers. The original way of splitting atom_types splits 'Cu' into ['C', 'u'], for example. Besides, the tuple type is error-prone because a one-element tuple needs a trailing comma, as highlighted by the red circle in the attached figure.
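A minimal sketch of both pitfalls in plain Python (not the mofdscribe code itself; the variable names are hypothetical): iterating over a bare string yields its characters, and parentheses around a single string without a trailing comma produce a string, not a tuple.

```python
# Hypothetical defaults, written the way the bug report describes
atom_types_bad = ("Cu")     # parentheses alone do not make a tuple: this is the str "Cu"
atom_types_good = ("Cu",)   # the trailing comma is what makes a one-element tuple
atom_types_list = ["Cu"]    # a list literal needs no trailing comma

print(type(atom_types_bad).__name__)   # str
print(list(atom_types_bad))            # ['C', 'u'] (iterating a str yields characters)
print(list(atom_types_good))           # ['Cu']
print(list(atom_types_list))           # ['Cu']
```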

I therefore made the following changes:

- Change the data type of atom_types from tuple to list.
- Add a function filter_element_for_ph, a modified version of filter_element for persistent homology. I added a new function instead of modifying the original one because filter_element is also called by mofdscribe.featurizers.chemistry.AMD.
- Uncomment two lines: https://github.com/kjappelbaum/mofdscribe/blob/44c44f7b9d847105073432b4fb66ad750e6e63b2/src/mofdscribe/featurizers/topology/_tda_helpers.py#L107 and https://github.com/kjappelbaum/mofdscribe/blob/44c44f7b9d847105073432b4fb66ad750e6e63b2/src/mofdscribe/featurizers/topology/_tda_helpers.py#L108

Summary by Sourcery

Fix the atom type splitting issue in topology featurizers by changing the data type from tuple to list, and introduce a new function filter_element_for_ph for persistent homology. Uncomment previously commented-out lines in _tda_helpers.py to restore functionality.

Bug Fixes:

Fix the incorrect splitting of atom types in topology featurizers by changing the data type from tuple to list.

Enhancements:

Introduce a new function filter_element_for_ph to handle element filtering specifically for persistent homology, ensuring compatibility with existing functionality.

sourcery-ai · 2024-08-14T08:46:55Z

Reviewer's Guide by Sourcery

This pull request addresses a bug in the topology featurizers of the mofdscribe library. The main changes modify how atom types are handled, introduce a new function for filtering elements in persistent homology calculations, and adjust data types to improve robustness. The changes span multiple files, primarily the topology-related modules.

File-Level Changes

Files	Changes
`src/mofdscribe/featurizers/utils/substructures.py`	Introduced a new function 'filter_element_for_ph' specifically for persistent homology calculations
`src/mofdscribe/featurizers/topology/ph_image.py` `src/mofdscribe/featurizers/topology/ph_hist.py` `src/mofdscribe/featurizers/topology/ph_stats.py` `src/mofdscribe/featurizers/topology/ph_vect.py`	Changed the data type of 'atom_types' from tuple to list in multiple classes
`src/mofdscribe/featurizers/topology/_tda_helpers.py`	Modified the 'get_persistent_images_for_structure' function to use the new 'filter_element_for_ph' function
`src/mofdscribe/featurizers/topology/_tda_helpers.py`	Uncommented two lines in the 'make_supercell' function to include original coordinates and elements
`src/mofdscribe/featurizers/topology/ph_image.py` `src/mofdscribe/featurizers/topology/ph_hist.py` `src/mofdscribe/featurizers/topology/ph_stats.py` `src/mofdscribe/featurizers/topology/ph_vect.py`	Updated the handling of atom types in various classes to accommodate the new list data type


sourcery-ai

Hey @XiaoqZhang - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good


sourcery-ai · 2024-08-14T08:48:12Z

        return Molecule.from_sites(keep_sites)
+
+
+def filter_element_for_ph(


issue (complexity): Consider refactoring to consolidate logic into a single function.

The new code introduces unnecessary complexity by duplicating logic in the filter_element_for_ph function, which increases maintenance overhead. The use of elements_group as a tuple doesn't add functional value and makes the code harder to read. Additionally, the handling of elements is inconsistent, which could lead to bugs. Consider refactoring to consolidate the logic into a single function, using list comprehensions for clarity, and ensuring consistent handling of elements. This will improve readability and maintainability.

can you make a suggestion?

Certainly! To address the complexity issue, you could refactor the filter_element and filter_element_for_ph functions to share common logic. Here's a suggestion:

Create a Helper Function: Extract the common logic into a helper function that both filter_element and filter_element_for_ph can call. This will reduce duplication and make the code easier to maintain.

Use List Comprehensions: If possible, use list comprehensions to make the filtering logic more concise and readable.

Consistent Handling of elements: Ensure that the handling of elements is consistent across both functions. This might involve standardizing the input format or processing steps.

Here's a rough example of how you might structure the refactoring:

    from pymatgen.core import Molecule, Structure


    def filter_elements_common(sites, elements, elements_group=()):
        # `elements` should be a collection of symbols such as ["Cu"]: with a bare
        # string like "Cu", the test `"C" in "Cu"` is True and carbon sites are kept too
        return [
            site
            for site in sites
            if site.specie.symbol in elements or site.specie.symbol in elements_group
        ]


    def filter_element(structure, elements, elements_group=()):
        keep_sites = filter_elements_common(structure.sites, elements, elements_group)
        if isinstance(structure, Structure):
            return Structure.from_sites(keep_sites)
        return Molecule.from_sites(keep_sites)


    def filter_element_for_ph(structure, elements, elements_group=()):
        # If there is any logic specific to persistent homology, handle it here
        keep_sites = filter_elements_common(structure.sites, elements, elements_group)
        if isinstance(structure, Structure):
            return Structure.from_sites(keep_sites)
        return Molecule.from_sites(keep_sites)

This approach keeps the core filtering logic in one place, making it easier to update and maintain. If filter_element_for_ph requires additional specific logic, you can still handle that separately within its function. Let me know if you need further clarification or assistance!

kjappelbaum · 2024-08-18T21:03:50Z

interesting, thanks for looking into it!

kjappelbaum · 2024-08-18T21:05:41Z

I agree that DeepSource is not heavily used at the moment, but why are those changes needed for this PR?

kjappelbaum · 2024-08-18T21:07:18Z

+    xyz_periodic_copies.append(coords)
+    element_copies.append(np.array(elements).reshape(-1,1))


OK, this seems like a good catch, because the x == y == z == 0 case (the original, unshifted coordinates) is currently skipped.
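A simplified, hypothetical sketch of building periodic copies, assuming Cartesian coordinates and a 3x3 lattice matrix whose rows are the lattice vectors. It is not the mofdscribe make_supercell function; it only illustrates that the zero shift (x == y == z == 0) has to contribute the original atoms, which is what the two uncommented lines restore.

```python
import itertools

import numpy as np


def make_periodic_copies(coords, elements, lattice, shifts=(-1, 0, 1)):
    """Hypothetical sketch: tile atoms into the neighbouring unit cells.

    coords: (n, 3) Cartesian coordinates; elements: n element symbols;
    lattice: (3, 3) matrix whose rows are the lattice vectors.
    The zero shift (x == y == z == 0) is not skipped, so the original
    atoms are part of the output.
    """
    coords = np.asarray(coords, dtype=float)
    lattice = np.asarray(lattice, dtype=float)
    xyz_periodic_copies = []
    element_copies = []
    for x, y, z in itertools.product(shifts, repeat=3):
        # Cartesian translation x*a + y*b + z*c
        offset = np.array([x, y, z]) @ lattice
        xyz_periodic_copies.append(coords + offset)
        element_copies.append(np.array(elements).reshape(-1, 1))
    return np.vstack(xyz_periodic_copies), np.vstack(element_copies)
```

Here the zero shift is handled inside the loop rather than appended separately; either way, the unshifted cell ends up in the output.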

kjappelbaum · 2024-08-18T21:08:36Z

    def __init__(
        self,
-        atom_types: Tuple[str] = (
+        atom_types: List[str] = [


Tuple was originally used because it is immutable and hence a better default for arguments.

But since we never mutate atom_types, this is probably a safe choice.
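The trade-off behind this comment, sketched in plain Python: a mutable default such as a list is created once and shared across calls, whereas a tuple cannot be mutated. A common middle ground, assumed here rather than taken from mofdscribe, is a None default converted to a fresh list per instance. DEFAULT_ATOM_TYPES, collect_bad, and FeaturizerSketch are hypothetical names.

```python
from typing import List, Optional, Sequence

# Hypothetical defaults, for illustration only
DEFAULT_ATOM_TYPES = ("C-H-N-O", "F-Cl-Br-I")


def collect_bad(item: str, seen: List[str] = []) -> List[str]:
    # The default list is created once, when the function is defined,
    # so every call without `seen` appends to the same object
    seen.append(item)
    return seen


class FeaturizerSketch:
    """Hypothetical featurizer showing a mutation-safe default."""

    def __init__(self, atom_types: Optional[Sequence[str]] = None) -> None:
        if atom_types is None:
            atom_types = DEFAULT_ATOM_TYPES
        elif isinstance(atom_types, str):
            # Guard against a bare string, which list() would split into characters
            atom_types = (atom_types,)
        # A fresh list per instance: mutating it never affects other instances
        self.atom_types: List[str] = list(atom_types)
```

Calling collect_bad("a") and then collect_bad("b") returns ["a", "b"] the second time, while two FeaturizerSketch() instances each get their own list.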

kjappelbaum · 2024-08-18T21:23:25Z

+    elements_ = []
+    elements_group = (elements,)
+    for atom_type in elements_group:


I'm not sure I understand this change. You are certainly right that the old code behaves unexpectedly for elements whose symbols have more than one letter. But wouldn't a clearer fix be something else? For example, always split by "-" first and then handle the other cases.
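One possible reading of the "split by - first" suggestion, as a hypothetical helper (parse_atom_types is not a mofdscribe function). It assumes atom-type groups are written as element symbols joined by "-", such as "C-H-N-O", and treats a bare string as a single group so that "Cu" is never iterated character by character.

```python
from typing import Iterable, List, Union


def parse_atom_types(spec: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: turn an atom-type spec into element symbols.

    Assumes groups are element symbols joined by "-", e.g. "C-H-N-O".
    A bare string is treated as one group rather than iterated, so
    "Cu" stays "Cu" instead of becoming ["C", "u"].
    """
    groups = [spec] if isinstance(spec, str) else list(spec)
    symbols: List[str] = []
    for group in groups:
        # Split on "-" first, then drop empty pieces and stray whitespace
        symbols.extend(part.strip() for part in group.split("-") if part.strip())
    return symbols
```

For example, parse_atom_types("Cu") gives ["Cu"] and parse_atom_types(["C-H-N-O", "Cu"]) gives ["C", "H", "N", "O", "Cu"].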

topology featurizers debug

eabe0dc

sourcery-ai Bot reviewed Aug 14, 2024


ci: update .deepsource.toml

376c598

kjappelbaum reviewed Aug 18, 2024

XiaoqZhang wants to merge 2 commits into lamalab-org:main from XiaoqZhang:main


2 participants
