add tutorial 2 without large self-energy files#39
Conversation
* refactor dptb_negf as dpnegf * fix: remove old version __init__ * add loggers.py * rename NEGF.py location * add __init__.py * add make_kpoints.py * add example: atomic_chain * force torch.float64 * fix model dtype as float32 * add atom_norbs in NEGFHamiltonianInit * 更新 .gitignore,添加 playground 目录以排除不必要的文件 * update atom_norbs * 在 pyproject.toml 中添加 dptb 依赖项 * 在 dpnegf/entrypoints 中添加 run.py 文件,实现运行 NEGF 和 TBtrans 计算的功能 * 在 dpnegf/runner 中添加 tbtrans_init.py 文件,实现 TBtrans 输入文件的生成和处理功能 * add unit tests * copy argcheck from dptb * add run_input.json * fix: remove deprecation warning when running unit tests * add __init__.py in tests * fix: set default dtype for torch model in NEGFHamiltonianInit * refactor: 移除未使用的代码和多余的执行计数
…nd self energy cal
create np data module
* update totorial to delete unnecessary log info. * add docs * update index.rst * add gen_input_docs * add tpye check * update gen_input_docs and conf.py * fix typos
* update totorial to delete unnecessary log info. * add docs * update index.rst * add gen_input_docs * add tpye check * update gen_input_docs and conf.py * fix typos * update tutorial by delete wrong outputs * update docs * remove INPUT TAG for temp * remove shell dependence * remove shell dependency * add docs * add warning
WalkthroughRefactors self-energy computation to a path-based on-disk cache with HDF5 utilities and parallel batch processing; adjusts NEGF runner to a prepare_self_energy flow; renames gate() to Dirichlet_BC() and expands option schemas; updates tests accordingly; and adds extensive graphene/hBN example configs and data. Also updates .gitignore to dpnegf paths. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant NEGF as NEGF
participant LeadProp as LeadProperty
participant IO as Disk I/O (H5/PTH)
participant Calc as self_energy_cal
User->>NEGF: negf_compute()
NEGF->>NEGF: prepare_self_energy(scf_require)
NEGF->>LeadProp: self_energy(k,e, save_path=dir, save_format=h5)
LeadProp->>LeadProp: _get_save_path(k,e,format,dir)
alt saved exists
LeadProp->>IO: _load_self_energy(path,k,e,format)
IO-->>LeadProp: Σ(k,e)
else compute
LeadProp->>Calc: compute Σ(k,e,eta,method)
Calc-->>LeadProp: Σ(k,e)
LeadProp->>IO: write_to_hdf5(path,k,e,Σ)
end
LeadProp-->>NEGF: Σ_L/Σ_R
NEGF-->>User: Continue NEGF workflow
sequenceDiagram
autonumber
actor User
participant Orchestrator as compute_all_self_energy
participant Pool as Parallel Workers
participant LeadProp as LeadProperty
participant IO as HDF5 Temp Files
participant Merge as merge_hdf5_files
User->>Orchestrator: (eta, lead_L, lead_R, k-grid, e-grid, save_dir)
loop for batches
Orchestrator->>Pool: dispatch self_energy_worker(k,e,...,save_dir)
Pool->>LeadProp: self_energy(k,e, save_path=tmp_h5)
LeadProp-->>Pool: Σ(k,e)
Pool->>IO: write_to_hdf5(tmp_h5,k,e,Σ)
end
Orchestrator->>Merge: merge temp H5 -> per-lead H5
Merge-->>Orchestrator: consolidated H5
Orchestrator-->>User: done
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 37
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
dpnegf/utils/argcheck.py (3)
558-565: Fix misusedoptionalparameter forlatent_kwargs(breaks validation).
optionalexpects a bool, not a dict. This will raise during schema construction.Apply:
- Argument( - "latent_kwargs", dict, - optional={ - "mlp_latent_dimensions": [128, 128, 256], - "mlp_nonlinearity": "silu", - "mlp_initialization": "uniform" - }, - default=None, - doc=doc_latent_kwargs - ), + Argument( + "latent_kwargs", + dict, + optional=True, + default=None, + doc=doc_latent_kwargs + ),
949-954: NameError: undefinedloinnormalize_test.
lois not defined; likely meantloss_options().Apply:
- base = Argument("base", dict, [co, da, to, lo]) + lo = loss_options() + base = Argument("base", dict, [co, da, to, lo])
1173-1178: Wrong type forFiori.integrate_way.Type is
intbut default is'direct'(string).Apply:
- Argument("integrate_way", int, optional=True, default='direct', doc=doc_integrate_way), + Argument("integrate_way", str, optional=True, default='direct', doc=doc_integrate_way),dpnegf/negf/device_property.py (3)
79-96: Type hints and E_ref assignment are inconsistent (can become dict)
efermiis typed as dict but used as a scalar energy reference. Ensure Optional[float] and robust assignment.- def __init__(self, hamiltonian, structure, results_path, e_T=300, - efermi: dict=None, chemiPot: dict=None, E_ref: float=None ) -> None: + def __init__(self, hamiltonian, structure, results_path, e_T=300, + efermi: Optional[float] = None, chemiPot: Optional[dict] = None, + E_ref: Optional[float] = None) -> None: @@ - # self.efermi = efermi self.chemiPot = chemiPot - if E_ref is None: - self.E_ref = efermi - log.info(f"Using efermi as E_ref in DeviceProperty: {self.E_ref}") - else: - self.E_ref = E_ref + if E_ref is None: + self.E_ref = float(efermi) if efermi is not None else 0.0 + log.info(f"Using efermi as E_ref in DeviceProperty: {self.E_ref}") + else: + self.E_ref = float(E_ref)
292-311: Wrong condition for NSCF efermi check; rename ambiguousI
- The inequality is inverted; it warns when left/right are equal.
- Avoid single-letter ambiguous variable names.
- if abs(self.lead_L.efermi-self.lead_R.efermi)<5e-4: - log.warning(msg="The Fermi energy of the left and right leads should be equal in nscf current calculation.") + if abs(self.lead_L.efermi - self.lead_R.efermi) > 5e-4: + log.warning(msg="The Fermi energies of the left and right leads should be equal in NSCF current calculation.") @@ - for dv in vv * 0.5: - I = simpson(y=(f(energy_grid+efermi, efermi-vm+dv) - -f(energy_grid+efermi, efermi-vm-dv)) * tc, x=energy_grid) - cc.append(I) + for dv in vv * 0.5: + curr = simpson( + y=(f(energy_grid + efermi, efermi - vm + dv) - f(energy_grid + efermi, efermi - vm - dv)) * tc, + x=energy_grid, + ) + cc.append(curr)
448-451: Likely wrong argument passed to Ozaki.integrate
devicepropshould be the DeviceProperty instance, not the string"cpu".- DM_eq, DM_neq = dm.integrate(deviceprop=self.device, kpoint=self.kpoint) + DM_eq, DM_neq = dm.integrate(deviceprop=self, kpoint=self.kpoint)dpnegf/runner/NEGF.py (2)
27-31: Import AtomicData from dpnegf first; fallback to dptbThis module defines AtomicData in dpnegf; requiring external dptb breaks out of the box.
Apply:
-try: - from dptb.data import AtomicData, AtomicDataDict -except ImportError: - raise ImportError("dptb.data is not available. Please install dptb package to use AtomicData.") +try: + from dpnegf.data import AtomicData, AtomicDataDict +except ImportError: + try: + from dptb.data import AtomicData, AtomicDataDict + except ImportError: + raise ImportError("Neither dpnegf.data nor dptb.data is available. Please install the appropriate package to use AtomicData.")
776-793: get_grid(): z-grid bug (uses atom z instead of grid z)You pass
zatwice; should passzgfor the z-grid.Apply:
- grid = Grid(xg,yg,za,xa,ya,za) #TODO: change back to zg + grid = Grid(xg,yg,zg,xa,ya,za) # use zg for the grid axis
| @@ -0,0 +1,7 @@ | |||
| urllib3 == 1.26.15 | |||
There was a problem hiding this comment.
🧩 Analysis chain
Upgrade urllib3; pinned 1.26.15 has multiple CVEs
Move to a safe version range.
-urllib3 == 1.26.15
+urllib3>=2.2.2,<3🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== CWD ==="
pwd
echo
echo "=== docs/requirements.txt (if present) ==="
if [ -f docs/requirements.txt ]; then
sed -n '1,200p' docs/requirements.txt || true
else
echo "docs/requirements.txt not found"
fi
echo
echo "=== Ripgrep search for urllib3 occurrences ==="
rg -n --hidden -S "urllib3" || true
echo
echo "=== Ripgrep search for pinned urllib3 versions (==) ==="
rg -nP --hidden -S "urllib3\s*==\s*[0-9]+\.[0-9]+\.[0-9]+" || trueLength of output: 429
Upgrade urllib3; pinned 1.26.15 has multiple CVEs
File: docs/requirements.txt:1 — replace pinned vulnerable version with a safe range.
-urllib3 == 1.26.15
+urllib3>=2.2.2,<3📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| urllib3 == 1.26.15 | |
| urllib3>=2.2.2,<3 |
🧰 Tools
🪛 OSV Scanner (2.2.2)
[HIGH] 1-1: urllib3 1.26.15: undefined
(PYSEC-2023-192)
[HIGH] 1-1: urllib3 1.26.15: undefined
(PYSEC-2023-212)
[HIGH] 1-1: urllib3 1.26.15: urllib3's Proxy-Authorization request header isn't stripped during cross-origin redirects
[HIGH] 1-1: urllib3 1.26.15: urllib3's request body not stripped after redirect from 303 status changes request method to GET
[HIGH] 1-1: urllib3 1.26.15: urllib3 redirects are not disabled when retries are disabled on PoolManager instantiation
[HIGH] 1-1: urllib3 1.26.15: Cookie HTTP header isn't stripped on cross-origin redirects
🤖 Prompt for AI Agents
In docs/requirements.txt lines 1-1 replace the pinned vulnerable urllib3 ==
1.26.15 with a safe, non-vulnerable range (for example urllib3>=1.26.16,<2.0) to
avoid the CVEs; update the requirements file accordingly, run dependency
resolution (pip-compile or your lockfile generator) and CI tests, and commit the
updated requirements plus any updated lockfile or SBOM so builds use the fixed
urllib3 version.
| import importlib.metadata | ||
|
|
||
| __version__ = importlib.metadata.version("dpnegf") |
There was a problem hiding this comment.
Guard version lookup to avoid import-time crash in editable/dev installs
importlib.metadata.version(...) raises PackageNotFoundError when the dist isn’t installed; importing dpnegf from source will fail. Wrap in try/except.
Apply this diff:
-import importlib.metadata
-
-__version__ = importlib.metadata.version("dpnegf")
+from importlib.metadata import PackageNotFoundError, version
+try:
+ __version__ = version("dpnegf")
+except PackageNotFoundError:
+ __version__ = "0+unknown"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import importlib.metadata | |
| __version__ = importlib.metadata.version("dpnegf") | |
| from importlib.metadata import PackageNotFoundError, version | |
| try: | |
| __version__ = version("dpnegf") | |
| except PackageNotFoundError: | |
| __version__ = "0+unknown" |
🤖 Prompt for AI Agents
In dpnegf/__init__.py around lines 2 to 4, the direct call to
importlib.metadata.version("dpnegf") can raise PackageNotFoundError during
editable/dev installs; wrap the lookup in a try/except that imports
PackageNotFoundError (from importlib.metadata) and on exception assign a safe
fallback version string (e.g. "0.0.0" or "0+unknown") to __version__ so
importing from source does not crash.
| __all__ = [ | ||
| AtomicData, | ||
| PBC, | ||
| register_fields, | ||
| deregister_fields, | ||
| block_to_feature, | ||
| feature_to_block, | ||
| _register_field_prefix, | ||
| OrbitalMapper, | ||
| _NODE_FIELDS, | ||
| _EDGE_FIELDS, | ||
| _GRAPH_FIELDS, | ||
| _LONG_FIELDS, | ||
| ] |
There was a problem hiding this comment.
all must contain strings, not objects.
Current __all__ lists symbols directly, which breaks from dpnegf.data import *. Use string names.
-__all__ = [
- AtomicData,
- PBC,
- register_fields,
- deregister_fields,
- block_to_feature,
- feature_to_block,
- _register_field_prefix,
- OrbitalMapper,
- _NODE_FIELDS,
- _EDGE_FIELDS,
- _GRAPH_FIELDS,
- _LONG_FIELDS,
-]
+__all__ = [
+ "AtomicData",
+ "PBC",
+ "register_fields",
+ "deregister_fields",
+ "block_to_feature",
+ "feature_to_block",
+ "_register_field_prefix",
+ "OrbitalMapper",
+ "_NODE_FIELDS",
+ "_EDGE_FIELDS",
+ "_GRAPH_FIELDS",
+ "_LONG_FIELDS",
+]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| __all__ = [ | |
| AtomicData, | |
| PBC, | |
| register_fields, | |
| deregister_fields, | |
| block_to_feature, | |
| feature_to_block, | |
| _register_field_prefix, | |
| OrbitalMapper, | |
| _NODE_FIELDS, | |
| _EDGE_FIELDS, | |
| _GRAPH_FIELDS, | |
| _LONG_FIELDS, | |
| ] | |
| __all__ = [ | |
| "AtomicData", | |
| "PBC", | |
| "register_fields", | |
| "deregister_fields", | |
| "block_to_feature", | |
| "feature_to_block", | |
| "_register_field_prefix", | |
| "OrbitalMapper", | |
| "_NODE_FIELDS", | |
| "_EDGE_FIELDS", | |
| "_GRAPH_FIELDS", | |
| "_LONG_FIELDS", | |
| ] |
🤖 Prompt for AI Agents
In dpnegf/data/__init__.py around lines 15 to 28, the __all__ list currently
contains symbol objects rather than string names, which breaks wildcard imports;
replace each entry with its corresponding string (e.g. "AtomicData", "PBC",
"register_fields", etc.), ensuring the list contains the exact export names as
strings and includes any private names intentionally exported (like
"_NODE_FIELDS") spelled exactly as in the module.
| def register_fields( | ||
| node_fields: Sequence[str] = [], | ||
| edge_fields: Sequence[str] = [], | ||
| env_fields: Sequence[str] = [], | ||
| onsitenv_fields: Sequence[str] = [], | ||
| graph_fields: Sequence[str] = [], | ||
| long_fields: Sequence[str] = [], | ||
| ) -> None: | ||
|
|
||
| r"""Register fields as being per-atom, per-edge, or per-frame. | ||
|
|
||
| Args: | ||
| node_permute_fields: fields that are equivariant to node permutations. | ||
| edge_permute_fields: fields that are equivariant to edge permutations. | ||
| """ | ||
|
|
||
| node_fields: set = set(node_fields) | ||
| edge_fields: set = set(edge_fields) | ||
| env_fields: set = set(env_fields) | ||
| onsitenv_fields: set = set(onsitenv_fields) | ||
| graph_fields: set = set(graph_fields) | ||
| long_fields: set = set(long_fields) | ||
| allfields = node_fields.union(edge_fields, graph_fields, env_fields, onsitenv_fields) | ||
| assert len(allfields) == len(node_fields) + len(edge_fields) + len(graph_fields) | ||
| _NODE_FIELDS.update(node_fields) | ||
| _EDGE_FIELDS.update(edge_fields) | ||
| _ENV_FIELDS.update(env_fields) | ||
| _ONSITENV_FIELDS.update(onsitenv_fields) | ||
| _GRAPH_FIELDS.update(graph_fields) | ||
| _LONG_FIELDS.update(long_fields) | ||
| if len(set.union(_NODE_FIELDS, _EDGE_FIELDS, _GRAPH_FIELDS)) < ( |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Fix mutable defaults and disjointness check in register_fields()
- Mutable defaults (lists) are unsafe; use None and initialize inside.
- Disjointness assert ignores env_fields/onsitenv_fields; it can pass with duplicates across these sets.
Apply:
-def register_fields(
- node_fields: Sequence[str] = [],
- edge_fields: Sequence[str] = [],
- env_fields: Sequence[str] = [],
- onsitenv_fields: Sequence[str] = [],
- graph_fields: Sequence[str] = [],
- long_fields: Sequence[str] = [],
-) -> None:
+def register_fields(
+ node_fields: Optional[Sequence[str]] = None,
+ edge_fields: Optional[Sequence[str]] = None,
+ env_fields: Optional[Sequence[str]] = None,
+ onsitenv_fields: Optional[Sequence[str]] = None,
+ graph_fields: Optional[Sequence[str]] = None,
+ long_fields: Optional[Sequence[str]] = None,
+) -> None:
@@
- node_fields: set = set(node_fields)
- edge_fields: set = set(edge_fields)
- env_fields: set = set(env_fields)
- onsitenv_fields: set = set(onsitenv_fields)
- graph_fields: set = set(graph_fields)
- long_fields: set = set(long_fields)
- allfields = node_fields.union(edge_fields, graph_fields, env_fields, onsitenv_fields)
- assert len(allfields) == len(node_fields) + len(edge_fields) + len(graph_fields)
+ node_fields_set: set = set(node_fields or [])
+ edge_fields_set: set = set(edge_fields or [])
+ env_fields_set: set = set(env_fields or [])
+ onsitenv_fields_set: set = set(onsitenv_fields or [])
+ graph_fields_set: set = set(graph_fields or [])
+ long_fields_set: set = set(long_fields or [])
+ allfields = node_fields_set.union(edge_fields_set, graph_fields_set, env_fields_set, onsitenv_fields_set)
+ assert len(allfields) == (
+ len(node_fields_set) + len(edge_fields_set) + len(graph_fields_set) + len(env_fields_set) + len(onsitenv_fields_set)
+ )
@@
- _NODE_FIELDS.update(node_fields)
- _EDGE_FIELDS.update(edge_fields)
- _ENV_FIELDS.update(env_fields)
- _ONSITENV_FIELDS.update(onsitenv_fields)
- _GRAPH_FIELDS.update(graph_fields)
- _LONG_FIELDS.update(long_fields)
+ _NODE_FIELDS.update(node_fields_set)
+ _EDGE_FIELDS.update(edge_fields_set)
+ _ENV_FIELDS.update(env_fields_set)
+ _ONSITENV_FIELDS.update(onsitenv_fields_set)
+ _GRAPH_FIELDS.update(graph_fields_set)
+ _LONG_FIELDS.update(long_fields_set)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def register_fields( | |
| node_fields: Sequence[str] = [], | |
| edge_fields: Sequence[str] = [], | |
| env_fields: Sequence[str] = [], | |
| onsitenv_fields: Sequence[str] = [], | |
| graph_fields: Sequence[str] = [], | |
| long_fields: Sequence[str] = [], | |
| ) -> None: | |
| r"""Register fields as being per-atom, per-edge, or per-frame. | |
| Args: | |
| node_permute_fields: fields that are equivariant to node permutations. | |
| edge_permute_fields: fields that are equivariant to edge permutations. | |
| """ | |
| node_fields: set = set(node_fields) | |
| edge_fields: set = set(edge_fields) | |
| env_fields: set = set(env_fields) | |
| onsitenv_fields: set = set(onsitenv_fields) | |
| graph_fields: set = set(graph_fields) | |
| long_fields: set = set(long_fields) | |
| allfields = node_fields.union(edge_fields, graph_fields, env_fields, onsitenv_fields) | |
| assert len(allfields) == len(node_fields) + len(edge_fields) + len(graph_fields) | |
| _NODE_FIELDS.update(node_fields) | |
| _EDGE_FIELDS.update(edge_fields) | |
| _ENV_FIELDS.update(env_fields) | |
| _ONSITENV_FIELDS.update(onsitenv_fields) | |
| _GRAPH_FIELDS.update(graph_fields) | |
| _LONG_FIELDS.update(long_fields) | |
| if len(set.union(_NODE_FIELDS, _EDGE_FIELDS, _GRAPH_FIELDS)) < ( | |
| def register_fields( | |
| node_fields: Optional[Sequence[str]] = None, | |
| edge_fields: Optional[Sequence[str]] = None, | |
| env_fields: Optional[Sequence[str]] = None, | |
| onsitenv_fields: Optional[Sequence[str]] = None, | |
| graph_fields: Optional[Sequence[str]] = None, | |
| long_fields: Optional[Sequence[str]] = None, | |
| ) -> None: | |
| r"""Register fields as being per-atom, per-edge, or per-frame. | |
| Args: | |
| node_permute_fields: fields that are equivariant to node permutations. | |
| edge_permute_fields: fields that are equivariant to edge permutations. | |
| """ | |
| node_fields_set: set = set(node_fields or []) | |
| edge_fields_set: set = set(edge_fields or []) | |
| env_fields_set: set = set(env_fields or []) | |
| onsitenv_fields_set: set = set(onsitenv_fields or []) | |
| graph_fields_set: set = set(graph_fields or []) | |
| long_fields_set: set = set(long_fields or []) | |
| allfields = node_fields_set.union(edge_fields_set, graph_fields_set, env_fields_set, onsitenv_fields_set) | |
| assert len(allfields) == ( | |
| len(node_fields_set) + len(edge_fields_set) + len(graph_fields_set) + len(env_fields_set) + len(onsitenv_fields_set) | |
| ) | |
| _NODE_FIELDS.update(node_fields_set) | |
| _EDGE_FIELDS.update(edge_fields_set) | |
| _ENV_FIELDS.update(env_fields_set) | |
| _ONSITENV_FIELDS.update(onsitenv_fields_set) | |
| _GRAPH_FIELDS.update(graph_fields_set) | |
| _LONG_FIELDS.update(long_fields_set) | |
| if len(set.union(_NODE_FIELDS, _EDGE_FIELDS, _GRAPH_FIELDS)) < ( |
🤖 Prompt for AI Agents
In dpnegf/data/AtomicData.py around lines 119 to 149, change the function
signature to avoid mutable default arguments by using None for each sequence
param and initialize them inside (e.g., if node_fields is None: node_fields =
[]), then convert those to sets; also fix the disjointness check so it considers
all field categories (node, edge, graph, env, onsitenv, long) by computing the
union of all those sets and asserting its size equals the sum of the individual
set sizes, and use set.union(...) consistently when forming the combined set
used in the final length comparison.
| def deregister_fields(*fields: Sequence[str]) -> None: | ||
| r"""Deregister a field registered with ``register_fields``. | ||
|
|
||
| Silently ignores fields that were never registered to begin with. | ||
|
|
||
| Args: | ||
| *fields: fields to deregister. | ||
| """ | ||
| for f in fields: | ||
| assert f not in _DEFAULT_NODE_FIELDS, "Cannot deregister built-in field" | ||
| assert f not in _DEFAULT_EDGE_FIELDS, "Cannot deregister built-in field" | ||
| assert f not in _DEFAULT_GRAPH_FIELDS, "Cannot deregister built-in field" | ||
| assert f not in _DEFAULT_ENV_FIELDS, "Cannot deregister built-in field" | ||
| assert f not in _DEFAULT_ONSITENV_FIELDS, "Cannot deregister built-in field" | ||
| _NODE_FIELDS.discard(f) | ||
| _EDGE_FIELDS.discard(f) | ||
| _ENV_FIELDS.discard(f) | ||
| _ONSITENV_FIELDS.discard(f) | ||
| _GRAPH_FIELDS.discard(f) | ||
|
|
There was a problem hiding this comment.
deregister_fields varargs type bug
The signature makes fields a tuple of sequences; comparisons assert against strings will misbehave.
Apply:
-def deregister_fields(*fields: Sequence[str]) -> None:
+def deregister_fields(*fields: str) -> None:
@@
- for f in fields:
+ for f in fields:
assert f not in _DEFAULT_NODE_FIELDS, "Cannot deregister built-in field"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def deregister_fields(*fields: Sequence[str]) -> None: | |
| r"""Deregister a field registered with ``register_fields``. | |
| Silently ignores fields that were never registered to begin with. | |
| Args: | |
| *fields: fields to deregister. | |
| """ | |
| for f in fields: | |
| assert f not in _DEFAULT_NODE_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_EDGE_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_GRAPH_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_ENV_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_ONSITENV_FIELDS, "Cannot deregister built-in field" | |
| _NODE_FIELDS.discard(f) | |
| _EDGE_FIELDS.discard(f) | |
| _ENV_FIELDS.discard(f) | |
| _ONSITENV_FIELDS.discard(f) | |
| _GRAPH_FIELDS.discard(f) | |
| def deregister_fields(*fields: str) -> None: | |
| r"""Deregister a field registered with ``register_fields``. | |
| Silently ignores fields that were never registered to begin with. | |
| Args: | |
| *fields: fields to deregister. | |
| """ | |
| for f in fields: | |
| assert f not in _DEFAULT_NODE_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_EDGE_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_GRAPH_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_ENV_FIELDS, "Cannot deregister built-in field" | |
| assert f not in _DEFAULT_ONSITENV_FIELDS, "Cannot deregister built-in field" | |
| _NODE_FIELDS.discard(f) | |
| _EDGE_FIELDS.discard(f) | |
| _ENV_FIELDS.discard(f) | |
| _ONSITENV_FIELDS.discard(f) | |
| _GRAPH_FIELDS.discard(f) |
🤖 Prompt for AI Agents
In dpnegf/data/AtomicData.py around lines 157 to 176, the function signature
uses "*fields: Sequence[str]" which makes "fields" a tuple of sequences and
causes the subsequent per-field string assertions to behave incorrectly; change
the varargs annotation to "*fields: str" so each item is a string, keep the
existing loop and assertions as-is, and update any type hints or docstring if
needed to reflect that the function accepts multiple field names as separate
string arguments.
| Argument("kline_type", str, optional=False, doc=doc_kline_type), | ||
| Argument("kpath", [str,list], optional=False, doc=doc_kpath), | ||
| Argument("high_sym_kpoints",dict,optional=True,default={},doc=doc_high_sym_kpoints), | ||
| Argument("number_in_line", int, optional=True, default=None, doc=doc_num_in_line), | ||
| Argument("klabels", list, optional=True, default=[''], doc=doc_klabels), |
There was a problem hiding this comment.
band.number_in_line default None vs type int.
Allow None in type to match default.
Apply:
- Argument("number_in_line", int, optional=True, default=None, doc=doc_num_in_line),
+ Argument("number_in_line", [int, None], optional=True, default=None, doc=doc_num_in_line),Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In dpnegf/utils/argcheck.py around lines 1376 to 1380, the Argument for
"number_in_line" has type int but default None; update its allowed types to
include None (e.g., use [int, type(None)] or equivalent in this codebase) so the
declared type matches the default, keep optional=True and default=None
unchanged.
| for name, func in modules.items(): | ||
| # arg = func() if callable(func) else func | ||
| # print(f"[DEBUG] {name} returned:", arg, type(arg)) | ||
| arg = func() | ||
| rst = f""" | ||
| ======================================== | ||
| {name.replace("_", " ").title()} | ||
| ======================================== | ||
|
|
||
| .. _`{name}`: | ||
|
|
||
| {render_argument(arg)} | ||
| """ | ||
| with open(os.path.join(output_dir, f"{name}.rst"), "w", encoding="utf-8") as f: | ||
| f.write(rst) |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Derive output name from Argument to avoid name mismatch
run_options returns an Argument named “run_op”, but files are written as run_options.rst. Derive from arg.name.
- for name, func in modules.items():
- # arg = func() if callable(func) else func
- # print(f"[DEBUG] {name} returned:", arg, type(arg))
- arg = func()
+ for _key, func in modules.items():
+ arg = func()
+ name = getattr(arg, "name", _key)
rst = f"""
========================================
{name.replace("_", " ").title()}
========================================
.. _`{name}`:
{render_argument(arg)}
"""
- with open(os.path.join(output_dir, f"{name}.rst"), "w", encoding="utf-8") as f:
+ with open(os.path.join(output_dir, f"{name}.rst"), "w", encoding="utf-8") as f:
f.write(rst)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for name, func in modules.items(): | |
| # arg = func() if callable(func) else func | |
| # print(f"[DEBUG] {name} returned:", arg, type(arg)) | |
| arg = func() | |
| rst = f""" | |
| ======================================== | |
| {name.replace("_", " ").title()} | |
| ======================================== | |
| .. _`{name}`: | |
| {render_argument(arg)} | |
| """ | |
| with open(os.path.join(output_dir, f"{name}.rst"), "w", encoding="utf-8") as f: | |
| f.write(rst) | |
| for _key, func in modules.items(): | |
| arg = func() | |
| name = getattr(arg, "name", _key) | |
| rst = f""" | |
| ======================================== | |
| {name.replace("_", " ").title()} | |
| ======================================== | |
| .. _`{name}`: | |
| {render_argument(arg)} | |
| """ | |
| with open(os.path.join(output_dir, f"{name}.rst"), "w", encoding="utf-8") as f: | |
| f.write(rst) |
| "if os.path.exists(output):\n", | ||
| " os.system('rm -rf %s' % output)\n", | ||
| "\n", |
There was a problem hiding this comment.
Avoid shell=True and use safe deletion for the output directory.
os.system('rm -rf %s' % output) is shell-injection prone and can delete unintended paths. Use shutil.rmtree with a sanity check.
Apply this diff:
- if os.path.exists(output):
- os.system('rm -rf %s' % output)
+from shutil import rmtree
+from pathlib import Path
+out_path = Path(output)
+if out_path.exists():
+ # Only allow removing if it's a directory within the repo/examples
+ if out_path.is_dir() and "examples" in str(out_path.resolve()):
+ rmtree(out_path)
+ else:
+ raise RuntimeError(f"Refusing to remove non-example path: {out_path}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "if os.path.exists(output):\n", | |
| " os.system('rm -rf %s' % output)\n", | |
| "\n", | |
| from shutil import rmtree | |
| from pathlib import Path | |
| out_path = Path(output) | |
| if out_path.exists(): | |
| # Only allow removing if it's a directory within the repo/examples | |
| if out_path.is_dir() and "examples" in str(out_path.resolve()): | |
| rmtree(out_path) | |
| else: | |
| raise RuntimeError(f"Refusing to remove non-example path: {out_path}") |
| @@ -0,0 +1 @@ | |||
| /personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep3000.pth No newline at end of file | |||
There was a problem hiding this comment.
🧩 Analysis chain
Remove absolute symlinked checkpoint from repo; ignore or use LFS instead
This file appears to be a symlink (or text stub) pointing to an absolute local path: /personal/DeepTB/dptb_Zjj/.../nnsk.ep3000.pth. It will be broken for all users, leaks local filesystem details, and contradicts the PR goal of avoiding large artifacts. Please:
- Remove this file from version control.
- Add patterns to .gitignore (or route through Git LFS if you truly need to distribute model weights).
- Ensure example scripts don’t hard-require this file; prefer a download/train step or a configurable path.
Suggested .gitignore additions (adjust as needed):
- examples/graphene/train/train_out/checkpoint/*.pth
- examples//train_out//checkpoint/*.pth
- examples//train_out//checkpoint/
If distribution is necessary, prefer Git LFS and add to .gitattributes:
- *.pth filter=lfs diff=lfs merge=lfs -text
Run this to verify symlink status and references:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
f='examples/graphene/train/train_out/checkpoint/nnsk.best.pth'
echo "== File mode via git (120000 indicates symlink) =="
git ls-files -s "$f" || true
echo "== File contents (should not be absolute path) =="
if [ -f "$f" ]; then
cat "$f"
else
echo "File not found in repo."
fi
echo "== Check for Git LFS pointer =="
if rg -n 'https?://git-lfs.github.com/spec/v1' "$f"; then
echo "Looks like a Git LFS pointer."
else
echo "Not a Git LFS pointer."
fi
echo "== Search for usages expecting this checkpoint path =="
rg -n -C2 -g '!**/node_modules/**' -P 'nnsk\.(best|ep\d+)\.pth|train_out/checkpoint|\.pth\b'Length of output: 591
🏁 Script executed:
#!/bin/bash
set -euo pipefail
f='examples/graphene/train/train_out/checkpoint/nnsk.best.pth'
echo "== git ls-files -s '$f' =="
git ls-files -s "$f" || true
sha=$(git ls-files -s "$f" 2>/dev/null | awk '{print $2}' || true)
echo "== blob sha: '$sha' =="
if [ -n "$sha" ]; then
echo "== git cat-file -p $sha (blob content - likely symlink target) =="
git cat-file -p "$sha" || true
else
echo "No sha found for $f"
fi
echo "== git ls-tree -l HEAD for checkpoint dir =="
git ls-tree -l HEAD examples/graphene/train/train_out/checkpoint || true
echo "== Search repo (including ignored/hidden files) for references to .pth or this filename =="
rg -n -S -uu -C2 -g '!**/node_modules/**' -P 'nnsk\.(best|ep\d+)\.pth|examples/graphene/train/train_out/checkpoint|\.pth\b' || trueLength of output: 836
Remove absolute symlinked checkpoint from repo; ignore or use LFS instead
examples/graphene/train/train_out/checkpoint/nnsk.best.pth is committed as a git symlink (mode 120000) whose blob content points to an absolute local path (/personal/DeepTB/dptb_Zjj/dpnegf/examples/graphene/train/train_out/checkpoint/nnsk.ep3000.pth) — this leaks local filesystem details and will be broken for other users.
- Remove the file from version control (delete / git rm[ --cached] and commit).
- Add .gitignore entries, e.g.:
- examples/graphene/train/train_out/checkpoint/*.pth
- examples//train_out//checkpoint/*.pth
- If you must distribute weights, use Git LFS and add to .gitattributes:
- *.pth filter=lfs diff=lfs merge=lfs -text
- Ensure example scripts don’t hard-depend on this path; use a configurable path, download step, or training fallback.
🤖 Prompt for AI Agents
In examples/graphene/train/train_out/checkpoint/nnsk.best.pth (lines 1-1) the
committed git symlink points to an absolute local path which leaks local
filesystem details and will break for others; remove the file from version
control (git rm or git rm --cached as appropriate and commit the deletion), add
.gitignore entries such as examples/graphene/train/train_out/checkpoint/*.pth
and examples/**/train_out/**/checkpoint/*.pth to prevent future commits, if you
need to ship model weights track them with Git LFS and add a .gitattributes rule
like *.pth filter=lfs diff=lfs merge=lfs -text, and update any example scripts
to accept a configurable checkpoint path or include a download/fallback step
instead of hard-coding this file.
| @@ -0,0 +1,18 @@ | |||
| { | |||
| "structure":"train/data/POSCAR", | |||
There was a problem hiding this comment.
🧩 Analysis chain
Verify structure path correctness
The path "train/data/POSCAR" may not exist relative to this example’s location. For an example under examples/hBN, consider pointing to a colocated POSCAR/STRU, or confirm the referenced path exists.
Run this to verify and suggest alternatives:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Validate JSON
jq empty examples/hBN/band.json
# 2) Extract structure path and check existence
STRUCT_PATH="$(jq -r '.structure' examples/hBN/band.json)"
echo "structure: $STRUCT_PATH"
if [ -f "$STRUCT_PATH" ]; then
echo "OK: structure file exists at $STRUCT_PATH"
else
echo "WARN: structure file not found at $STRUCT_PATH"
echo "Searching for likely candidates under examples/hBN..."
fd -a -H -t f -g 'POSCAR|STRU|structure*|POSCAR*' examples/hBN || true
fiLength of output: 134
Fix missing structure path in examples/hBN/band.json
The 'structure' value "train/data/POSCAR" does not exist in the repo (no matches under examples/hBN). Update the path to point to a colocated structure file or add the referenced file.
Location: examples/hBN/band.json (line 2)
🤖 Prompt for AI Agents
In examples/hBN/band.json around line 2, the "structure" value
"train/data/POSCAR" points to a non-existent path; either update the JSON to
reference a colocated structure file within examples/hBN (for example adjust to
"structure":"POSCAR" or "structure":"structure.vasp" or
"structure":"examples/hBN/POSCAR") or create the referenced file at
examples/hBN/train/data/POSCAR so the path resolves; ensure the JSON path
exactly matches the file name and relative location.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (8)
dpnegf/negf/lead_property.py (5)
157-170: Type-hint Optional for save_path (RUF013).PEP 484: default None requires Optional[str].
-def _get_save_path(self, kpoint, energy, save_format: str, save_path: str = None): +def _get_save_path(self, kpoint, energy, save_format: str, save_path: Optional[str] = None):Also add Optional import at the top of this file:
from typing import List, Optional
299-303: Avoid perpetual HS reload: check HLk, not HL.
hasattr(self, "HL")is always False (you set HLk), forcing unnecessary re-fetching.- if not hasattr(self, "HL") \ + if not hasattr(self, "HLk") \ or abs(self.voltage_old-self.voltage)>1e-6 \ or max(abs(self.kpoint-torch.tensor(kpoint)))>1e-6:
491-496: Ensure output directory exists before parallel runs.Absent dir causes write_to_hdf5 to fail.
if self_energy_save_path is None: @@ - self_energy_save_path = os.path.join(lead_L.results_path, "self_energy") + self_energy_save_path = os.path.join(lead_L.results_path, "self_energy") + os.makedirs(self_energy_save_path, exist_ok=True)
548-551: Create save directory in worker before writing temp files.Prevents races on first writer and missing-dir errors.
def self_energy_worker(k, e, eta, lead_L, lead_R, self_energy_save_path): @@ - save_tmp_L = os.path.join(self_energy_save_path, f"tmp_leadL_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5") + os.makedirs(self_energy_save_path, exist_ok=True) + save_tmp_L = os.path.join(self_energy_save_path, f"tmp_leadL_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5") save_tmp_R = os.path.join(self_energy_save_path, f"tmp_leadR_k{k[0]}_{k[1]}_{k[2]}_E{e:.8f}.h5")
103-110: Docstring mismatches the API (nosaveparam).Update the docstring to reflect path-based caching (
save_path,save_format) and removal ofsave.dpnegf/runner/NEGF.py (3)
51-51: Type-hint Optional for self_energy_save_path (RUF013).Default None requires Optional[str].
- use_saved_se: bool=False, self_energy_save_path: str=None, se_info_display: bool=False, + use_saved_se: bool=False, self_energy_save_path: Optional[str]=None, se_info_display: bool=False,
395-402: Gate profiling behind a flag to avoid hard dependency and overhead.Make profiling opt-in, e.g., via env var.
- profiler = Profiler() - profiler.start() + if os.getenv("DPNEGF_PROFILE") == "1": + profiler = Profiler() + profiler.start() @@ - profiler.stop() - output_path = os.path.join(self.results_path, "profile_report.html") - with open(output_path, 'w') as report_file: - report_file.write(profiler.output_html()) + if 'profiler' in locals(): + profiler.stop() + output_path = os.path.join(self.results_path, "profile_report.html") + with open(output_path, 'w') as report_file: + report_file.write(profiler.output_html())
828-829: Type-hint Optional for AtomicData_options (RUF013).Default None requires Optional[dict].
- def update_atomicdata_options(model,AtomicData_options: dict=None) -> dict: + def update_atomicdata_options(model, AtomicData_options: Optional[dict] = None) -> dict:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dpnegf/negf/lead_property.py(3 hunks)dpnegf/runner/NEGF.py(7 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
dpnegf/runner/NEGF.py (2)
dpnegf/utils/argcheck.py (4)
negf(1014-1068)poisson_options(1142-1151)get_cutoffs_from_model_options(1709-1771)model_options(670-681)dpnegf/negf/lead_property.py (2)
compute_all_self_energy(458-517)_has_saved_self_energy(614-624)
dpnegf/negf/lead_property.py (2)
dpnegf/utils/tools.py (1)
makedirs(736-737)dpnegf/negf/negf_hamiltonian_init.py (1)
get_hs_device(732-855)
🪛 Ruff (0.13.1)
dpnegf/runner/NEGF.py
51-51: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
552-552: Multiple statements on one line (semicolon)
(E702)
553-553: Multiple statements on one line (colon)
(E701)
828-828: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
dpnegf/negf/lead_property.py
157-157: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
194-194: Avoid specifying long messages outside the exception class
(TRY003)
211-211: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
241-241: Avoid specifying long messages outside the exception class
(TRY003)
250-253: Avoid specifying long messages outside the exception class
(TRY003)
255-255: Avoid specifying long messages outside the exception class
(TRY003)
258-258: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (2)
dpnegf/negf/lead_property.py (1)
564-567: Don’t create dataset after warning it exists.This still errors on overwrite; return early.
- if dset_name in grp: - log.warning(f"Dataset {dset_name} already exists in group {group_name}. Skipping it.") - grp.create_dataset(dset_name, data=se.cpu().numpy(), compression="gzip") + if dset_name in grp: + log.warning(f"Dataset {dset_name} already exists in group {group_name}. Skipping it.") + return + grp.create_dataset(dset_name, data=se.cpu().numpy(), compression="gzip")dpnegf/runner/NEGF.py (1)
539-545: Guard integrate_range access; fallback to uni_grid.Ozaki density lacks integrate_range; avoid AttributeError and make behavior robust.
- compute_all_self_energy(self.eta_lead, self.deviceprop.lead_L, self.deviceprop.lead_R, - self.kpoints, self.density.integrate_range, self.self_energy_save_path) + energies = getattr(self.density, "integrate_range", getattr(self, "uni_grid", [])) + compute_all_self_energy( + self.eta_lead, + self.deviceprop.lead_L, + self.deviceprop.lead_R, + self.kpoints, + energies, + self.self_energy_save_path, + )
| # Try load | ||
| if os.path.isfile(save_path): | ||
| if se_info_display: | ||
| log.info(f"Loading {self.tab} self-energy from {save_path}") | ||
| self.se = self._load_self_energy(save_path, kpoint, energy, save_format) | ||
| return | ||
|
|
There was a problem hiding this comment.
Fix HDF5 partial-load crash and persist newly computed self-energy.
If the H5 file exists but the (k, E) entry is missing, _load_self_energy raises and you never compute. Also, computed results aren’t saved, defeating caching and causing repeated recomputation.
Apply:
- if os.path.isfile(save_path):
- if se_info_display:
- log.info(f"Loading {self.tab} self-energy from {save_path}")
- self.se = self._load_self_energy(save_path, kpoint, energy, save_format)
- return
+ if os.path.isfile(save_path):
+ try:
+ if se_info_display:
+ log.info(f"Loading {self.tab} self-energy from {save_path}")
+ self.se = self._load_self_energy(save_path, kpoint, energy, save_format)
+ return
+ except KeyError:
+ if se_info_display:
+ log.info("Entry not found in existing self-energy file; computing and appending.")
+ except IOError:
+ log.warning(f"Failed to load {save_path}; recomputing.")
@@
- self.se = self.self_energy_cal( kpoint,
+ self.se = self.self_energy_cal( kpoint,
energy,
eta_lead=eta_lead,
method=method,
HS_inmem=HS_inmem)
+ # Persist to disk
+ if save_path is not None:
+ try:
+ os.makedirs(os.path.dirname(save_path), exist_ok=True)
+ except Exception:
+ pass
+ if save_format == "h5":
+ ev = float(energy.item() if hasattr(energy, "item") else energy)
+ write_to_hdf5(save_path, kpoint, ev, self.se)
+ elif save_format == "pth":
+ torch.save(self.se, save_path)Also applies to: 151-156
Uploading the saved self-energy files is unnecessary. Here we add grapehene/negf_out_k20/self_energy/* to .gitignore to save time.
Summary by CodeRabbit
New Features
Improvements
Breaking Changes
Examples