[DRAFT] fix: skip builtin field names when copying atoms.arrays/info#16
Draft
speckhard wants to merge 2 commits intoLeMaterial:mainfrom
Draft
[DRAFT] fix: skip builtin field names when copying atoms.arrays/info#16speckhard wants to merge 2 commits intoLeMaterial:mainfrom
speckhard wants to merge 2 commits intoLeMaterial:mainfrom
Conversation
_extract_ase_record copied keys from atoms.arrays and atoms.info into
custom properties without filtering builtin field names. This caused
silent duplicate state for any molecule whose ASE Atoms object had
"forces" stashed in atoms.arrays or "energy" in atoms.info: the bridge
stored the calculator-derived value in builtins["forces"] AND the
arrays/info value in properties["forces"], producing two divergent
values per builtin field.
The two affected paths in _extract_ase_record:
- atoms.arrays loop only excluded {"positions", "numbers"}. Now also
skips _BUILTIN_FIELDS (energy/forces/charges/velocities/cell/stress/
pbc).
- _merge_properties (called for atoms.info and the info-override kwarg)
only special-cased "stress". Now skips all _BUILTIN_FIELDS, with the
existing stress-as-(3,3) builtin override behavior preserved.
The atoms.calc.results path (line 186) already filtered _BUILTIN_FIELDS
correctly; this PR brings the other two paths into line.
New test_from_ase_does_not_duplicate_builtins_into_custom_properties
constructs an ASE Atoms object with builtin keys deliberately stashed
into both arrays and info, and asserts the builtins remain
calculator-derived while none of the builtin keys leak into custom
properties.
Reviewer flagged a symmetric duplicate-state path on the to_ase side:
is_reserved_ase_array_key in molecule_helpers.rs only excluded
"numbers"/"positions", so a custom property named after a builtin field
(e.g. mol.set_property("forces", X)) would land in atoms.arrays["forces"]
right next to the calculator-attached builtin forces. Same divergence
class as the from_ase bug closed in the parent commit.
Extended the matches! list to also reject "energy", "forces", "charges",
"velocities", "cell", "stress", "pbc". The set_property side still
allows the user to set a custom property named after a builtin (only
"stress" is reserved at set_property time today), but to_ase will no
longer surface it in atoms.arrays. Tightening set_property to reject all
builtin keys is a more invasive behavior change worth its own PR.
New tests:
- test_from_ase_info_override_kwarg_filters_builtins covers the second
caller of _merge_properties (the from_ase info= kwarg) which the
earlier test only reached transitively.
- test_to_ase_does_not_duplicate_builtins_in_arrays sets a hostile
"forces" custom property and confirms to_ase keeps the calculator-
attached forces as the source of truth, while a legitimate custom key
("tags") still flows into atoms.arrays.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_extract_ase_record(from_asepath) ANDis_reserved_ase_array_key(to_asepath) both copied/emitted keys for builtin field names without filtering, causing silent duplicate state on round-trip:atoms.arrays["forces"]andatoms.info["energy"]would land in custompropertiesalongside the calculator-derived builtin values.mol.set_property("forces", X)) would land inatoms.arrays["forces"]next to the calculator-attached builtin forces.This PR fixes both directions.
What changes
atompack-py/python/atompack/ase_bridge.py:_merge_properties(called foratoms.infoand theinfo=override) now skips all_BUILTIN_FIELDS. The existingstress-as-(3,3) builtin override behavior is preserved inside the new_BUILTIN_FIELDSbranch.atoms.arraysloop previously skipped only{"positions", "numbers"}. Now also skips_BUILTIN_FIELDS.atoms.calc.resultsalready filtered_BUILTIN_FIELDScorrectly (line 186 on main); no change needed there.atompack-py/src/molecule_helpers.rs:is_reserved_ase_array_keypreviously matched only"numbers" | "positions". Now also rejects"energy" | "forces" | "charges" | "velocities" | "cell" | "stress" | "pbc".Out of scope (deferred)
The
set_propertyside still allows the user to set a custom property named after a builtin (only"stress"is reserved at set_property time today). Tighteningset_propertyto reject all builtin keys is a more invasive behavior change worth its own PR — the present defensive fix already prevents the symmetric duplicate-state onto_aseregardless of whatset_propertyaccepts.Backward compat
atoms.info["forces"]becoming a custom property afterfrom_ase, that behavior is gone; use the dedicatedfrom_ase(forces=...)kwarg instead.mol.set_property("forces", X)then surfacing inatoms.arrays["forces"]afterto_ase, that behavior is also gone; the calculator-attached forces are now the only path.Test plan
Three new tests:
test_from_ase_does_not_duplicate_builtins_into_custom_properties— hostileatoms.arraysandatoms.infowith builtin-named keys; confirms builtins remain calculator-derived and no leak into custom properties.test_from_ase_info_override_kwarg_filters_builtins— same filter applies to theinfo=kwarg path offrom_ase.test_to_ase_does_not_duplicate_builtins_in_arrays—mol.set_property("forces", X)does NOT contaminateatoms.arrays["forces"]afterto_ase; legitimate non-builtin custom property still flows.