[DRAFT] fix: skip builtin field names when copying atoms.arrays/info#16

Draft
speckhard wants to merge 2 commits into LeMaterial:main from
speckhard:fix/ase-bridge-builtin-filter

speckhard commented Apr 30, 2026

Summary

_extract_ase_record (from_ase path) and is_reserved_ase_array_key (to_ase path) both let builtin field names through without filtering. The first copied them into custom properties and the second emitted them into atoms.arrays, causing silent duplicate state on a round trip:

  • from_ase: atoms.arrays["forces"] and atoms.info["energy"] would land in custom properties alongside the calculator-derived builtin values.
  • to_ase: a custom property named after a builtin (e.g. via mol.set_property("forces", X)) would land in atoms.arrays["forces"] next to the calculator-attached builtin forces.
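The from_ase half of the bug can be reproduced without ASE at all. The sketch below uses plain dicts as stand-ins for atoms.calc.results, atoms.arrays and atoms.info, and approximates the pre-fix filtering rules described in this PR (arrays skipped only positions/numbers; info only special-cased stress). None of the names are atompack APIs; it only illustrates how one field name ends up holding two values.

```python
# Hypothetical reproduction of the pre-fix from_ase behavior.
# Plain dicts stand in for an ASE Atoms object; none of these are atompack APIs.
calc_results = {"forces": [[0.1, 0.0, 0.0]], "energy": -1.5}
atoms_arrays = {
    "positions": [[0.0, 0.0, 0.0]],
    "numbers": [1],
    "forces": [[9.9, 9.9, 9.9]],  # stale forces stashed in atoms.arrays
}
atoms_info = {"energy": 42.0}  # stale energy stashed in atoms.info

# Builtins come from the calculator (that path already filtered correctly).
builtins = dict(calc_results)

# Approximation of the pre-fix custom-property extraction:
# arrays skip only positions/numbers, info only special-cases stress.
properties = {
    k: v for k, v in atoms_arrays.items() if k not in {"positions", "numbers"}
}
properties.update({k: v for k, v in atoms_info.items() if k != "stress"})

# The same field name now holds two different values.
diverged = sorted(
    k for k in builtins if k in properties and properties[k] != builtins[k]
)
print(diverged)  # ['energy', 'forces']
```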

This PR fixes both directions.

What changes

atompack-py/python/atompack/ase_bridge.py:

  • _merge_properties (called for atoms.info and the info= override) now skips all _BUILTIN_FIELDS. The existing stress-as-(3,3) builtin override behavior is preserved inside the new _BUILTIN_FIELDS branch.
  • The atoms.arrays loop previously skipped only {"positions", "numbers"}. Now also skips _BUILTIN_FIELDS.
  • atoms.calc.results already filtered _BUILTIN_FIELDS correctly (line 186 on main); no change needed there.

atompack-py/src/molecule_helpers.rs:

  • is_reserved_ase_array_key previously matched only "numbers" | "positions". Now also rejects "energy" | "forces" | "charges" | "velocities" | "cell" | "stress" | "pbc".
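For readers less familiar with the Rust side, here is a Python mirror of the widened predicate and how a to_ase-style loop would use it. The real check is a matches! expression in molecule_helpers.rs; custom_arrays_for_to_ase is a hypothetical helper, and the sketch assumes every custom property is a candidate for atoms.arrays, which may not match how the real code routes per-molecule values.

```python
# Python mirror of the Rust predicate after this PR (the real one lives in
# atompack-py/src/molecule_helpers.rs).
_RESERVED_ASE_ARRAY_KEYS = frozenset({
    "numbers", "positions",  # reserved before this PR
    "energy", "forces", "charges", "velocities", "cell", "stress", "pbc",
})


def is_reserved_ase_array_key(key: str) -> bool:
    return key in _RESERVED_ASE_ARRAY_KEYS


def custom_arrays_for_to_ase(custom_properties: dict) -> dict:
    """Hypothetical helper: which custom properties would reach atoms.arrays."""
    return {
        key: value
        for key, value in custom_properties.items()
        if not is_reserved_ase_array_key(key)
    }


# A custom property named "forces" (e.g. from mol.set_property) is dropped,
# so the calculator-attached forces stay the only source; "tags" still flows.
arrays = custom_arrays_for_to_ase({"forces": [[9.0, 9.0, 9.0]], "tags": [1]})
print(arrays)  # {'tags': [1]}
```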

Out of scope (deferred)

The set_property side still allows the user to set a custom property named after a builtin (only "stress" is reserved at set_property time today). Tightening set_property to reject all builtin keys is a more invasive behavior change worth its own PR; the present defensive fix already prevents the symmetric duplicate state on to_ase regardless of what set_property accepts.
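For illustration only, the deferred change could look roughly like the guard below. SketchMolecule is a hypothetical stand-in rather than atompack's Molecule class, and raising ValueError is an assumption; this PR does not implement any of it.

```python
_BUILTIN_FIELDS = frozenset(
    {"energy", "forces", "charges", "velocities", "cell", "stress", "pbc"}
)


class SketchMolecule:
    """Hypothetical stand-in; not atompack's Molecule class."""

    def __init__(self):
        self.properties = {}

    def set_property(self, key, value):
        # Today only "stress" is rejected at this point; the deferred change
        # would reject every builtin name up front instead.
        if key in _BUILTIN_FIELDS:
            raise ValueError(
                f"{key!r} is a builtin field and cannot be set as a custom property"
            )
        self.properties[key] = value


mol = SketchMolecule()
mol.set_property("tags", [1, 2])
try:
    mol.set_property("forces", [[0.0, 0.0, 0.0]])
except ValueError as exc:
    print(exc)  # 'forces' is a builtin field and cannot be set as a custom property
```

Rejecting at write time would surface the mistake to the caller immediately, whereas the filter in this PR silently drops the key at to_ase time; that trade-off is why it is deferred.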

Backward compat

  • If anyone was relying on atoms.info["forces"] becoming a custom property after from_ase, that behavior is gone; use the dedicated from_ase(forces=...) kwarg instead.
  • If anyone was relying on mol.set_property("forces", X) then surfacing in atoms.arrays["forces"] after to_ase, that behavior is also gone; the calculator-attached forces are now the only path.
  • No on-disk format change. No public API signature change.

Test plan

Three new tests:

  • test_from_ase_does_not_duplicate_builtins_into_custom_properties: hostile atoms.arrays and atoms.info with builtin-named keys; confirms the builtins remain calculator-derived and that none of the builtin keys leak into custom properties.
  • test_from_ase_info_override_kwarg_filters_builtins: the same filter applies to the info= kwarg path of from_ase.
  • test_to_ase_does_not_duplicate_builtins_in_arrays: mol.set_property("forces", X) does NOT contaminate atoms.arrays["forces"] after to_ase; a legitimate non-builtin custom property still flows through.
$ uv run --extra dev --locked --with "maturin>=1.4,<2.0" maturin develop --release
🛠 Installed atompack-db-0.2.1

$ uv run --extra dev --locked pytest tests/test_from_ase.py -v
============================== 15 passed in 0.47s ==============================

$ uv run --extra dev --locked pytest tests/ -q
142 passed, 1 skipped in 7.03s   # 139 baseline + 3 new tests

_extract_ase_record copied keys from atoms.arrays and atoms.info into
custom properties without filtering builtin field names. This caused
silent duplicate state for any molecule whose ASE Atoms object had
"forces" stashed in atoms.arrays or "energy" in atoms.info: the bridge
stored the calculator-derived value in builtins["forces"] AND the
arrays/info value in properties["forces"], producing two divergent
values per builtin field.

The two affected paths in _extract_ase_record:

- atoms.arrays loop only excluded {"positions", "numbers"}. Now also
  skips _BUILTIN_FIELDS (energy/forces/charges/velocities/cell/stress/
  pbc).
- _merge_properties (called for atoms.info and the info-override kwarg)
  only special-cased "stress". Now skips all _BUILTIN_FIELDS, with the
  existing stress-as-(3,3) builtin override behavior preserved.

The atoms.calc.results path (line 186) already filtered _BUILTIN_FIELDS
correctly; this PR brings the other two paths into line.

New test_from_ase_does_not_duplicate_builtins_into_custom_properties
constructs an ASE Atoms object with builtin keys deliberately stashed
into both arrays and info, and asserts the builtins remain
calculator-derived while none of the builtin keys leak into custom
properties.

Reviewer flagged a symmetric duplicate-state path on the to_ase side:
is_reserved_ase_array_key in molecule_helpers.rs only excluded
"numbers"/"positions", so a custom property named after a builtin field
(e.g. mol.set_property("forces", X)) would land in atoms.arrays["forces"]
right next to the calculator-attached builtin forces. Same divergence
class as the from_ase bug closed in the parent commit.

Extended the matches! list to also reject "energy", "forces", "charges",
"velocities", "cell", "stress", "pbc". The set_property side still
allows the user to set a custom property named after a builtin (only
"stress" is reserved at set_property time today), but to_ase will no
longer surface it in atoms.arrays. Tightening set_property to reject all
builtin keys is a more invasive behavior change worth its own PR.

New tests:
- test_from_ase_info_override_kwarg_filters_builtins covers the second
  caller of _merge_properties (the from_ase info= kwarg) which the
  earlier test only reached transitively.
- test_to_ase_does_not_duplicate_builtins_in_arrays sets a hostile
  "forces" custom property and confirms to_ase keeps the calculator-
  attached forces as the source of truth, while a legitimate custom key
  ("tags") still flows into atoms.arrays.