Feat/modular julia pardiso #310
…o/Julia band structure calculations and include related tests.
- Remove redundant solve_eigen_dense function and fix band window logic
- Replace custom eigenvalue solving with existing solve_eigen_at_k function
- Update example notebook with new imports and execution outputs

- Fix root_dir path from Pardiso_teach to To_pardiso
- Add sys.path manipulation for proper module imports
- Expand Julia installation instructions with detailed steps
- Add MKL and Pardiso environment variable configuration
- Include troubleshooting section for common issues
- Clear unnecessary cell outputs and update execution counts
…y check
Replace generic PardisoSolver with MKLPardisoSolver for consistent performance and add explicit architecture check for Apple Silicon systems to prevent runtime failures with helpful guidance to use numpy solver instead.
- Implement `dos_calculation.jl` for Density of States calculations
- Update `main.jl` to switch between `band` and `dos` tasks
- Refactor `band_calculation.jl` to export `bandstructure.h5` using native HDF5
- Add incremental `bands.dat` text output for real-time tracking
- Add verification notebook `examples/To_pardiso/dptb_to_Pardiso_new.ipynb`

- Refactored to use direct ASE integration in .
- Implemented modular Julia backend structure in .
- Optimized : compressed species, removed redundant orbital arrays, aligned formatting.
- Updated example notebook with new API usage.
- Improved logging: output logs to both console and .
- Renamed output directory to .
…so solver backend.
…fix spin handling
- Renamed `dptb/postprocess/julia` to `dptb/postprocess/pardiso`.
- Updated `io.jl` to support legacy `.dat` files via `load_structure_dat`.
- Fixed `load_structure_json` to correctly account for spin degeneracy.
- Updated `pdso.py` and tests to reflect directory rename.
…hod and rename the legacy text export to `to_pardiso_debug`.
…pdso` entrypoint with ill-conditioned state projection parameters.
…o backend, including platform support details.
📝 Walkthrough

This PR integrates a high-performance Julia-based Pardiso eigensolver backend into DeePTB. The addition includes a modular Julia backend with separate I/O, solver, and task modules; a new …

Changes: Pardiso Backend Integration

🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 18
In `@dptb/entrypoints/pdso.py`:
- Around line 18-22: The parameters log_level, log_path, and **kwargs on the
function signature in dptb/entrypoints/pdso.py are currently unused; either
remove them or configure logging at the start of that function: import logging,
call logging.basicConfig(level=log_level, **kwargs) (or pass specific kwargs
through), get a logger (e.g., log = logging.getLogger(__name__)), and if
log_path is set add a FileHandler with file_handler.setLevel(log_level) and
log.addHandler(file_handler); if you prefer to drop them, remove log_level,
log_path and **kwargs from the function signature and any docs/comments
referring to them.
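A minimal sketch of the second option, assuming the entrypoint keeps `log_level` (a standard `logging` level, where 20 is INFO) and an optional `log_path`. The helper name `setup_pdso_logging` and the logger name are hypothetical; unlike the `logging.basicConfig` route, this configures a named logger and leaves the root logger alone:

```python
import logging
from typing import Optional


def setup_pdso_logging(log_level: int = logging.INFO,
                       log_path: Optional[str] = None) -> logging.Logger:
    """Send log records to the console and, if log_path is given, to a file."""
    log = logging.getLogger("dptb.pdso")  # hypothetical logger name
    log.setLevel(log_level)

    # Drop handlers from a previous call so records are not duplicated
    for handler in list(log.handlers):
        log.removeHandler(handler)
        handler.close()

    fmt = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")

    console = logging.StreamHandler()
    console.setLevel(log_level)
    console.setFormatter(fmt)
    log.addHandler(console)

    if log_path is not None:
        file_handler = logging.FileHandler(log_path)
        file_handler.setLevel(log_level)
        file_handler.setFormatter(fmt)
        log.addHandler(file_handler)

    return log
```

Inside `pdso()`, a first line such as `log = setup_pdso_logging(log_level, log_path)` would make both parameters take effect.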
In `@dptb/postprocess/pardiso/io/io.jl`:
- Around line 116-124: The z_to_symbol Dict is missing atomic numbers 57–71
(lanthanides) and a few others so lookups like z_to_symbol[z] (used later in
this module) will KeyError; fix by either populating z_to_symbol with the full
mapping for 1–~84 (add 57=>"La", 58=>"Ce", 59=>"Pr", 60=>"Nd", 61=>"Pm",
62=>"Sm", 63=>"Eu", 64=>"Gd", 65=>"Tb", 66=>"Dy", 67=>"Ho", 68=>"Er", 69=>"Tm",
70=>"Yb", 71=>"Lu" and any other missing entries) or replace direct lookups with
a safe lookup using get(z_to_symbol, z, fallback_symbol) or integrate a
periodic-table package (e.g. PeriodicTable.jl) and use its lookup function
instead.
In `@dptb/postprocess/pardiso/main.jl`:
- Around line 118-130: Refactor pardiso_solver.jl to declare a module (e.g.,
module PardisoSolver) and export/define solve_eigen_k inside it (mirroring
DenseSolver), then update main.jl to select the implementation by qualifying the
function or importing selectively: when eig_solver == "dense" use
DenseSolver.solve_eigen_k (or import DenseSolver: solve_eigen_k), and when
eig_solver == "pardiso" use PardisoSolver.solve_eigen_k (or import
PardisoSolver: solve_eigen_k); ensure main.jl no longer relies on an unmodulized
global solve_eigen_k so the Pardiso implementation is not shadowed by
DenseSolver.
In `@dptb/postprocess/pardiso/solvers/pardiso_solver.jl`:
- Around line 1-13: This file is missing a module wrapper so the export
statements (used later around the exports near the end of the file) have no
effect; fix it by wrapping the file contents in a module declaration (e.g.,
module PardisoSolver) and a matching end at the bottom of pardiso_solver.jl,
moving the using statements (using Pardiso, Arpack, LinearMaps, LinearAlgebra)
and const default_dtype inside that module so exported symbols work correctly;
ensure the module name you choose matches the intended public API referenced by
the export statements.
In `@dptb/postprocess/pardiso/sparse_calc_npy_print.jl`:
- Around line 326-334: The z_to_symbol Dict in sparse_calc_npy_print.jl is
incomplete (missing Z=57–71, 85–88, 89+) and will raise KeyError for those
atoms; update the z_to_symbol mapping to include the full periodic table or
replace its usage with a reliable lookup (e.g., using a package like
PeriodicTable.jl or a complete array/map) and add a safe fallback (e.g., return
"X" or string(z) when an unknown Z is requested) wherever z_to_symbol is
accessed (search for z_to_symbol in this file to locate call sites).
- Around line 398-400: Remove the duplicated save call: there are two identical
invocations of save(sparse_file, "H_R", H_R, "S_R", S_R) back-to-back; keep a
single save(...) to avoid redundant I/O and leave the subsequent
tee_info("Sparse matrices constructed and cached", log_path) unchanged so the
caching message still logs after the one save completes.
In `@dptb/postprocess/pardiso/tasks/dos_calculation.jl`:
- Around line 94-98: The format string in the `@printf` call within the open(...,
"w") do f block is using an escaped backslash sequence ("\\n") which will write
a literal backslash and 'n' instead of a newline; update the `@printf` call in the
loop (for (ω, d) in zip(ωlist, dos)) to use a normal newline escape ("\n") so
each record is written on its own line and the output file dos.dat is correctly
formatted.
In `@dptb/tests/test_to_pardiso.py`:
- Around line 118-126: total_orbitals and expected_orbitals are assigned but
never asserted; compute expected_orbitals from the per-atom orbital count (e.g.
data["basis_info"]["orbitals_per_atom"] or similar field) multiplied by the
number of atoms in the test system, then if has_soc (tbsys.model.soc_param
present) adjust expected_orbitals for spin doubling and assert that
data["basis_info"]["total_orbitals"] == expected_orbitals; add a clear assertion
message referencing total_orbitals, expected_orbitals and the spinful flag to
fail loudly if mismatch.
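One way to compute the expected count, sketched under two assumptions: the basis maps each element to orbital labels that contain the angular-momentum letter (e.g. `"2s"`, `"2p"`), and spin doubles every orbital when SOC is on. The helpers are hypothetical, and the actual field names in the exported JSON may differ:

```python
# Orbitals per angular-momentum channel: 2l + 1
L_MULTIPLICITY = {"s": 1, "p": 3, "d": 5, "f": 7}


def orbitals_for_label(label: str) -> int:
    """Return 2l+1 for an orbital label such as '2s', '3d' or 's*'."""
    for letter, mult in L_MULTIPLICITY.items():
        if letter in label:
            return mult
    raise ValueError(f"cannot infer angular momentum from orbital label {label!r}")


def expected_total_orbitals(symbols, basis, spinful=False):
    """Sum per-atom orbital counts over all atoms, doubling for spin if needed."""
    per_element = {el: sum(orbitals_for_label(o) for o in orbs)
                   for el, orbs in basis.items()}
    total = sum(per_element[s] for s in symbols)
    return 2 * total if spinful else total


# In the test (names from the comment above):
#     expected = expected_total_orbitals(symbols, basis, spinful=has_soc)
#     assert data["basis_info"]["total_orbitals"] == expected, (
#         f"total_orbitals={data['basis_info']['total_orbitals']}, "
#         f"expected={expected}, spinful={has_soc}")
```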
In `@examples/To_pardiso/dptb_to_Pardiso_new.ipynb`:
- Around line 133-135: The julia_script path is wrong and will raise
FileNotFoundError; update the string used to build the Julia script path (the
julia_script variable where parent_path is joined) from
"dptb/postprocess/julia/main.jl" to the correct
"dptb/postprocess/pardiso/main.jl" so it matches other references (see pdso.py
usage) and ensure parent_path/julia_script points to the actual main.jl; verify
the related variables parent_path and config_path remain unchanged.
- Around line 107-108: Replace the non-existent method call tbsys.to_pardiso_new
with the correct existing method tbsys.to_pardiso throughout the notebook and
its documentation: update the call at the shown cell (lines calling
to_pardiso_new), any example/test references that mention to_pardiso_new, and
any explanatory text or docstrings that describe to_pardiso_new so they
reference to_pardiso instead; verify the notebook now invokes the actual
function name to_pardiso (the implementation is the existing to_pardiso method).
In `@examples/To_pardiso/dptb_to_Pardiso.ipynb`:
- Around line 263-265: The notebook sets julia_script using parent_path and the
outdated subpath "dptb/postprocess/julia/sparse_calc_npy_print.jl"; update the
julia_script assignment to point to the new location under
"dptb/postprocess/pardiso/" (replace the "dptb/postprocess/julia/..." segment
with "dptb/postprocess/pardiso/...") so the variable julia_script resolves to
the moved script while still constructing the path from parent_path.
In `@examples/To_pardiso/README.md`:
- Around line 85-89: The README k-point example uses "klabels": ["Γ", "Z"] which
is inconsistent with the project's config files that use "G"; update the
documentation to match the actual configs by replacing the "Γ" label with "G"
(or alternatively update the config files to use "Γ" if you prefer the Unicode
label) so that the "klabels" entries in the README and the "klabels" in
band.json / band_new.json are identical; locate the "klabels" array in the
README example and make the label change accordingly.
In `@examples/To_pardiso/test_pardiso_new.py`:
- Around line 15-16: The current sys.path insertion uses
os.path.dirname(os.path.dirname(os.getcwd())) which is fragile; change the
insertion to derive the project root from the script file location (use
__file__) instead — e.g. compute the absolute path via
os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')) or
Path(__file__).resolve().parents[2] and pass that to sys.path.insert(0, ...);
update the call site where sys.path.insert is invoked and remove the
os.getcwd()-based os.path.dirname(os.path.dirname(os.getcwd())) expression.
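A sketch of the `__file__`-based variant. The helper name `project_root_from` is hypothetical; it takes the path as an argument so the logic can be checked without running the example script:

```python
from pathlib import Path


def project_root_from(script_path: str, levels_up: int = 2) -> Path:
    """Return the directory `levels_up` levels above the folder containing script_path.

    For examples/To_pardiso/test_pardiso_new.py, levels_up=2 gives the repository
    root, independent of the current working directory.
    """
    return Path(script_path).resolve().parents[levels_up]


# In the example script itself (not executed here):
#     import sys
#     sys.path.insert(0, str(project_root_from(__file__)))
```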
In `@install_julia_packages.jl`:
- Around line 29-46: The loop that installs packages (iterating over packages
and calling Pkg.add) currently swallows failures and proceeds to Pkg.precompile,
so change it to collect failed package names when a Pkg.add throws (e.g., push!
to a failed_packages Vector), print a summary of failed packages after the loop
and abort immediately (exit(1) or throw an error) if failed_packages is
non-empty instead of calling Pkg.precompile or printing "Installation complete";
keep the success/failed prints per-package but ensure Pkg.precompile only runs
when failed_packages is empty.
In `@install_julia.sh`:
- Around line 68-74: The current call julia install_julia_packages.jl is
relative to the current working directory; make it path-independent by resolving
the script's directory and invoking Julia with the absolute path to
install_julia_packages.jl (i.e., compute the script folder from $0/BASH_SOURCE
and call Julia with "$SCRIPT_DIR/install_julia_packages.jl"). Update
install_julia.sh to resolve the script directory and use that absolute path when
running the install_julia_packages.jl command so the install works regardless of
where the user runs the shell script.
In `@README.md`:
- Around line 197-202: Update the README's "Manual Installation" bash snippet
comment that currently reads "# Linux/macOS" to clarify macOS support: change
the comment to explicitly state that the installer command works on macOS but
the Pardiso backend is not supported (e.g., "# Linux (macOS: Julia installs but
Pardiso backend won't work)") or remove "macOS" entirely; modify the comment
above the curl command in the Manual Installation section so readers know Julia
installs on macOS but Pardiso is unsupported, referencing the existing "#
Linux/macOS" comment to locate the change.
In `@test_mkl_pardiso.jl`:
- Around line 37-39: The condition uses the Ref{Bool} flag MKL_PARDISO_LOADED
but fails to dereference it; change the conditional and any access to
MKL_PARDISO_LOADED to use MKL_PARDISO_LOADED[] so the Bool value is read (e.g.,
if !MKL_PARDISO_LOADED[] ...), and update any other occurrences in this test
(such as prints or branches) to consistently use the [] dereference when
checking or reading the flag.
- Around line 10-13: Replace the hard-coded macOS library extension when
building mkl_path in test_mkl_pardiso.jl: instead of appending ".dylib"
directly, use a platform-aware extension (e.g. Libdl.dlext or conditional on
Sys.islinux()/Sys.isapple()/Sys.iswindows()) and construct mkl_path with
joinpath(MKL_jll.LIBPATH[], "libmkl_rt"*Libdl.dlext) so the test finds the
correct MKL runtime on Linux (.so), macOS (.dylib) or Windows (.dll).
🧹 Nitpick comments (34)
CLAUDE.md (1)
51-89: Consider documenting the new `pdso` CLI command. The PR introduces a new `pdso` entrypoint for the Pardiso workflow, but it's not documented in the "Running DeePTB" section. For completeness, consider adding:
```bash
# Run Pardiso workflow for postprocessing
uv run dptb pdso INPUT -i INIT_MODEL [-stu STRUCTURE] [-o OUTPUT]
```
examples/To_pardiso/band.json (2)
32-37: String booleans instead of native JSON booleans. Multiple fields use the string `"false"` instead of the JSON boolean `false`. This is consistent with `band_new.json`, but native booleans would be more idiomatic and avoid potential parsing issues.
```diff
   "device": "cpu",
-  "out_wfc": "false",
+  "out_wfc": false,
   "which_k": 0,
   "max_iter": 400,
   "num_band": 30,
-  "gamma_only": "false",
-  "isspinful": "false"
+  "gamma_only": false,
+  "isspinful": false
 }
```
37-38: Minor: Missing trailing newline. The file doesn't end with a newline character, contrary to the POSIX convention. Most editors and linters prefer files to end with a newline.
examples/To_pardiso/README.md (1)
63-69: Hardcoded relative path to Julia script may be fragile. The Python API example uses the relative path `../../dptb/postprocess/pardiso/main.jl`, which will break if the script is run from a different directory. Consider documenting the need to adjust the path or suggesting an absolute path approach.

Suggested improvement
```python
import subprocess
import os

# Get path relative to dptb package installation
import dptb
pardiso_main = os.path.join(os.path.dirname(dptb.__file__), "postprocess/pardiso/main.jl")

subprocess.run([
    "julia", pardiso_main,
    "--input_dir", "pardiso_data",
    "--output_dir", "results",
    "--config", "band_new.json"
])
```
install_julia_packages.jl (1)
7-10: Activate a project-local environment for reproducible installs.
Installing into the default environment can cause version drift across machines. Consider activating a local env (and committing Project/Manifest) so this script is deterministic.

♻️ Suggested adjustment
```diff
 using Pkg
+
+# Use a project-local environment for reproducible installs
+Pkg.activate(@__DIR__)
```
dptb/postprocess/pardiso/README.md (1)
161-166: Minor wording polish in Performance Tips. Consider removing “very” or using a more precise phrasing.

✏️ Suggested tweak
```diff
-4. **Memory**: For very large systems (>10000 orbitals), consider reducing `num_band`
+4. **Memory**: For large systems (>10,000 orbitals), consider reducing `num_band`
```
examples/To_pardiso/pardiso_tutorial.ipynb (1)
318-326: Small notebook cleanups (label name + f-string). Rename the list-comprehension variable for clarity and drop the f-string with no placeholders.

✏️ Suggested tweak
```diff
-    labels = [l.decode('utf-8') for l in labels_bytes]
+    labels = [label.decode('utf-8') for label in labels_bytes]
 ...
-print(f"Successfully loaded bandstructure.h5")
+print("Successfully loaded bandstructure.h5")
```
dptb/postprocess/pardiso/utils/kpoints.jl (1)
21-56: `klabels_vec` is returned but never populated from `kpath_config`. The function returns `klabels_vec`, which is either empty or a copy of the `labels` parameter, but the k-path configuration rows likely contain label information that isn't being extracted. If labels are expected to come from `kpath_config` (e.g., as a 5th element), they should be parsed; otherwise, the docstring should clarify that labels must be passed explicitly.

Additionally, when `n_segment == 0` for a segment, no intermediate k-points are added, but `high_sym_kpts` still accumulates the distance. This may cause a mismatch between `high_sym_kpts` indices and actual k-point positions in `klist_vec`.
dptb/postprocess/pardiso/solvers/dense_solver.jl (2)
70-74: Empty conditional block is dead code. The `if` block at lines 71-74 contains only a comment and no executable code. Either implement the padding logic or remove this block to reduce confusion.

♻️ Suggested fix
```diff
-    # Check if we have enough
-    if length(closest_indices) < num_band
-        # Pad with very large values if not enough states (unlikely for dense)
-        # But here we just return what we have
-    end
+    # Note: If length(closest_indices) < num_band, we return what we have.
+    # This is unlikely for dense diagonalization but handled gracefully.
```
46-68: Band selection strategy may cause discontinuities in band structure plots. Selecting the `num_band` eigenvalues closest to `fermi_level` independently at each k-point can cause band switching/discontinuities when bands cross the selection boundary. The inline comments acknowledge this concern. For production use, consider either:
- Selecting a fixed index range (e.g., bands `n` to `n+num_band-1`) determined at the first k-point
- Using an energy window `[fermi_level - delta, fermi_level + delta]`

This is acceptable for an initial implementation but worth revisiting if visual artifacts appear.
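To make the trade-off concrete, a small NumPy sketch of both strategies for the eigenvalues at one k-point. `closest_to_fermi` mirrors the current per-k selection; `fixed_window_indices` implements the first suggestion by fixing a contiguous index range once, at the first k-point, and reusing it. Both helpers are hypothetical, and centering the window on the level nearest E_F is only one possible choice:

```python
import numpy as np


def closest_to_fermi(eigvals: np.ndarray, fermi_level: float, num_band: int) -> np.ndarray:
    """Current strategy: the num_band eigenvalues nearest E_F, returned sorted."""
    idx = np.argsort(np.abs(eigvals - fermi_level))[:num_band]
    return np.sort(eigvals[idx])


def fixed_window_indices(eigvals_k0: np.ndarray, fermi_level: float, num_band: int) -> np.ndarray:
    """Suggested strategy: a contiguous band-index range chosen once, at the first k-point."""
    eig_sorted = np.sort(eigvals_k0)
    nearest = int(np.argmin(np.abs(eig_sorted - fermi_level)))
    # Center the window on the level nearest E_F, clipped to the valid index range
    start = max(0, min(nearest - num_band // 2, len(eig_sorted) - num_band))
    return np.arange(start, start + num_band)


# Usage: fix the window at the first k-point, then reuse the same indices everywhere
#     window = fixed_window_indices(eigvals_per_k[0], fermi_level, num_band)
#     bands = [np.sort(e)[window] for e in eigvals_per_k]
```

Because the same indices are used at every k-point, each plotted line stays attached to one band even where levels drift across the old selection boundary.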
dptb/postprocess/pardiso/io/io.jl (2)
78-84: Potential inconsistency between calculated `site_norbits` and `total_orbitals` from JSON. `site_norbits` is calculated locally (line 78) while `norbits` is read directly from `basis["total_orbitals"]` (line 84). If these don't match due to data inconsistency, downstream code may fail silently. Consider adding a validation check.

♻️ Suggested validation
```diff
     site_norbits = [orb_counts[sym] * (1 + spinful) for sym in symbols]
+
+    # Validate consistency
+    calculated_total = sum(site_norbits)
+    if calculated_total != basis["total_orbitals"]
+        @warn "Mismatch: calculated orbitals ($calculated_total) != JSON total ($(basis["total_orbitals"]))"
+    end

     structure = Dict{String, Any}(
```
183-184: Missing blank line before docstring. There's no blank line between the closing of `load_matrix_hdf5` (line 183) and the docstring for `expand_species` (line 184). This affects readability and may cause documentation generation issues.

♻️ Suggested fix
```diff
     end
 end
+
 """
 Expand chemical formula string (e.g. "C2H2") or list to list of symbols.
 """
```
dptb/entrypoints/main.py (2)
511-517: Argument naming inconsistency: `--init_model` vs `--init-model`. Other subparsers (train, test, run, export) use `--init-model` with a hyphen, but pdso uses `--init_model` with an underscore. This inconsistency may confuse users. Argparse converts hyphens to underscores internally, so `--init-model` would still map to `init_model` in the namespace.

♻️ Suggested fix for consistency
```diff
 parser_pdso.add_argument(
     "-i",
-    "--init_model",
+    "--init-model",
     type=str,
     default=None,
     help="Path to model checkpoint (triggers Export + Run mode)."
 )
```
Note: the `init_model` parameter in `dptb/entrypoints/pdso.py` can stay as it is, because argparse derives the destination `init_model` from `--init-model`; adding `dest="init_model"` to the argument definition makes this explicit.
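A quick check of the argparse behaviour this relies on, using a stand-in parser rather than DeePTB's actual `pdso` subparser:

```python
import argparse

# Stand-in parser; the real pdso subparser defines more options
parser = argparse.ArgumentParser(prog="pdso-demo")
parser.add_argument("-i", "--init-model", type=str, default=None,
                    help="Path to model checkpoint.")

args = parser.parse_args(["--init-model", "model.pth"])
# argparse turns the first long option "--init-model" into the attribute "init_model"
print(args.init_model)  # prints: model.pth
```

Because the attribute name stays `init_model`, code that reads `args.init_model` keeps working after the flag is renamed.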
543-548: Non-standard boolean argument parsing. Using `type=lambda x: x.lower() == 'true'` requires users to type `--ill_project true` or `--ill_project false`. The more common CLI pattern uses `--flag`/`--no-flag` with `store_true`/`store_false` actions, or `BooleanOptionalAction` in Python 3.9+.

The current approach works but may surprise users expecting standard flag behavior.

♻️ Alternative using BooleanOptionalAction (Python 3.9+)
```diff
 parser_pdso.add_argument(
-    "--ill_project",
-    type=lambda x: x.lower() == 'true',
+    "--ill-project",
+    action=argparse.BooleanOptionalAction,
     default=True,
-    help="Enable ill-conditioned state projection (default: True)."
+    help="Enable/disable ill-conditioned state projection (default: enabled)."
 )
```
`BooleanOptionalAction` generates the negated form from the single option string `--ill-project` (a combined string such as `"--ill-project/--no-ill-project"` would not work), so this allows `--ill-project` (enable) or `--no-ill-project` (disable), stored as `ill_project`.
examples/To_pardiso/dptb_to_Pardiso_new.ipynb (2)
47-50: Incorrect use of the `"__file__"` string literal. Line 47 uses `os.path.abspath("__file__")`, which resolves the literal string `"__file__"` against the current working directory instead of locating the notebook (in Jupyter the `__file__` variable is not defined anyway). As a result, `root_dir` is always the current working directory and the fallback at line 50 never runs, so the block is a roundabout `os.getcwd()`. Consider simplifying:

♻️ Suggested fix
```diff
-"root_dir = os.path.dirname(os.path.abspath(\"__file__\")) \n",
-"# Note: In Jupyter __file__ might not exist, using current dir if needed\n",
-"if not os.path.exists(root_dir) or root_dir == '':\n",
-"    root_dir = os.getcwd()\n",
+"# In Jupyter notebooks, use the current working directory\n",
+"root_dir = os.getcwd()\n",
```
191-193: Ambiguous variable name `l`. The variable `l` (lowercase L) can be confused with `1` (one) or `I` (uppercase i) in many fonts. Use a more descriptive name like `label` or `lbl`.

♻️ Suggested fix
```diff
-"    labels = [l.decode('utf-8') for l in labels_bytes]\n",
+"    labels = [label.decode('utf-8') for label in labels_bytes]\n",
```
examples/To_pardiso/test_pardiso_new.py (1)
13-13: Remove unused import. `numpy` is imported but never used in this file.

Proposed fix
```diff
 import os
 import sys
 import json
-import numpy as np
```
dptb/postprocess/pardiso/tasks/dos_calculation.jl (1)
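The DOS in `dos_calculation.jl` is a Gaussian-broadened sum over bands and k-points, which the nitpick below proposes to vectorize in Julia. For comparison, a NumPy sketch with all three loops replaced by one broadcast. It assumes eigenvalues in an array of shape `(num_band, nk_total)`, an energy grid `omegas` relative to the Fermi level and a width `eps` (stand-ins for `egvals_all`, `ωlist` and `ϵ`); the normalization `1/(sqrt(pi) * eps * nk_total)` is an assumption and may differ from the Julia `factor`:

```python
import numpy as np


def gaussian_dos(eigvals: np.ndarray, omegas: np.ndarray, fermi_level: float, eps: float) -> np.ndarray:
    """Gaussian-broadened DOS on the grid `omegas` (energies relative to E_F).

    eigvals has shape (num_band, nk_total); the result integrates to num_band
    (states per k-point) over a grid that covers all peaks.
    """
    nk_total = eigvals.shape[1]
    # diff[b, k, w] = E_bk - E_F - omega_w, built by broadcasting instead of loops
    diff = eigvals[:, :, None] - fermi_level - omegas[None, None, :]
    weights = np.exp(-(diff / eps) ** 2) / (np.sqrt(np.pi) * eps * nk_total)
    return weights.sum(axis=(0, 1))
```

The intermediate array has `num_band * nk_total * len(omegas)` entries, so for dense k-grids a loop over k-points with a broadcast over energies (as in the Julia suggestion) keeps memory bounded.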
89-92: Consider vectorizing the Gaussian broadening loop for performance. The nested triple loop iterates over `nk_total × num_band × length(ωlist)`, which can be expensive for large systems. Julia's broadcasting could significantly improve performance:

Vectorized alternative
```julia
# Vectorized approach using broadcasting
for ik in 1:nk_total
    for ib in 1:num_band
        diff = egvals_all[ib,ik] .- ωlist .- fermi_level
        dos .+= exp.(-(diff.^2 / ϵ^2)) .* factor
    end
end
```
dptb/tests/test_to_pardiso.py (1)
7-7: Remove unused import. `ast` is imported but never used in this file.

Proposed fix
```diff
 import json
-import ast
 import numpy as np
```
dptb/postprocess/pardiso/main.jl (1)
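The `main.jl` nitpick below concerns flags such as `isspinful` that may arrive as JSON booleans or as strings. If the Python side that writes or validates the config wants the same tolerance, a minimal sketch could look like this; `parse_config_bool` is hypothetical and, unlike the proposed Julia one-liner, raises on unrecognized values instead of silently returning false:

```python
def parse_config_bool(value) -> bool:
    """Interpret a config flag given as a bool, 0/1, or a string like 'True'."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "1", "yes"):
            return True
        if text in ("false", "0", "no"):
            return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


config = {"isspinful": "True"}
spinful = parse_config_bool(config.get("isspinful", False))
```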
100-101: Fragile boolean parsing from config. The expression `config["isspinful"] in [true, "true"]` mixes boolean and string comparisons. This could miss valid inputs like `"True"` or `1`.

Proposed fix
```diff
-    spinful = haskey(config, "isspinful") ? (config["isspinful"] in [true, "true"]) : false
+    spinful = haskey(config, "isspinful") ? (lowercase(string(config["isspinful"])) in ["true", "1"]) : false
```
docs/pardiso_architecture.md (1)
18-30: Add language specifier to fenced code block. The directory structure code block lacks a language specifier. Use `text` or `plaintext` for better markdown compatibility.

Proposed fix
````diff
-```
+```text
 dptb/postprocess/julia/
 ├── io/
````
dptb/entrypoints/pdso.py (2)
12-23: Use explicit `Optional` type hints per PEP 484. Parameters with a `= None` default should use `Optional[T]` for clarity and static analysis compliance.

Proposed fix
```diff
+from typing import Optional
+
 def pdso(
     INPUT: str,
-    init_model: str = None,
-    structure: str = None,
-    data_dir: str = None,
+    init_model: Optional[str] = None,
+    structure: Optional[str] = None,
+    data_dir: Optional[str] = None,
     output_dir: str = "./",
     log_level: int = 20,
-    log_path: str = None,
+    log_path: Optional[str] = None,
     ill_project: bool = True,
     ill_threshold: float = 5e-4,
     **kwargs
 ):
```
81-83: Simplify exception logging. `log.exception()` already includes the exception information automatically; passing `e` explicitly is redundant, and the `as e` binding can then be dropped as well.

Proposed fix
```diff
-except Exception as e:
-    log.exception(f"Export failed: {e}")
+except Exception:
+    log.exception("Export failed")
     sys.exit(1)
```
dptb/postprocess/pardiso/utils/hamiltonian.jl (2)
14-44: Consider using JLD2 instead of Serialization for caching. The `Serialization` module in Julia is not safe for loading untrusted data and can execute arbitrary code during deserialization. For a cache file that might be shared or persisted, `JLD2.jl` provides a safer and more portable alternative.

Additionally, the cache file extension `.jld` is misleading since it's actually using Julia's native `Serialization` format, not the JLD format.

♻️ Suggested improvement using JLD2
```diff
-using Serialization
+using JLD2

 function get_HR_SR_sparse(input_dir::String, structure::Dict, matrix_loader::Function, use_cache::Bool=true)
-    cache_file = joinpath(input_dir, "sparse_matrices.jld")
+    cache_file = joinpath(input_dir, "sparse_matrices.jld2")

     if use_cache && isfile(cache_file)
         println("Loading H/S matrices from cache: $cache_file")
         try
-            data = deserialize(cache_file)
-            if isa(data, Tuple) && length(data) == 2
-                return data[1], data[2]
-            else
-                println("Cache format invalid. Rebuilding...")
-            end
+            @load cache_file H_R S_R
+            return H_R, S_R
         catch e
             println("Failed to load cache: $e. Rebuilding...")
         end
     end

     # ... rest of function ...

     if use_cache
         println("Saving sparse H/S matrices to cache: $cache_file")
-        serialize(cache_file, (H_R, S_R))
+        @save cache_file H_R S_R
     end
```
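Both `hamiltonian.jl` nitpicks concern the path from the real-space blocks H(R) to the Bloch matrix built by `HR2HK`. As background, a dense NumPy sketch of that Bloch sum, H(k) = Σ_R H(R) exp(2πi k·R) with k in reduced coordinates, keyed by immutable `(rx, ry, rz)` tuples as the next nitpick recommends. The phase sign and the dense storage are assumptions made for brevity; the Julia code works with sparse matrices and may use the opposite convention:

```python
import numpy as np


def hr_to_hk(k_frac, blocks_R: dict) -> np.ndarray:
    """Bloch sum H(k) = sum_R H(R) exp(2*pi*i k.R), with R given as integer tuples."""
    k = np.asarray(k_frac, dtype=float)
    first = next(iter(blocks_R.values()))
    H_k = np.zeros(first.shape, dtype=complex)
    for R, H_R in blocks_R.items():  # R is an immutable, hashable (rx, ry, rz) tuple
        phase = np.exp(2j * np.pi * np.dot(k, R))
        H_k += phase * H_R
    return H_k
```

For a one-orbital chain with hopping -1 to both neighbours this gives H(k) = -2 cos(2πk), i.e. -2 at Γ and +2 at the zone boundary. S(k) follows from the same function applied to the overlap blocks.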
101-102: Consider using `Tuple{Int,Int,Int}` instead of `Vector{Int}` for dictionary keys. Using `Vector{Int}` as dictionary keys is less efficient because vectors are mutable and require content-based hashing. Tuples are immutable and have faster hash/equality operations.

♻️ Suggested improvement
```diff
-    H_data = Dict{Vector{Int}, Tuple{Vector{Int}, Vector{Int}, Vector{ComplexF64}}}()
-    S_data = Dict{Vector{Int}, Tuple{Vector{Int}, Vector{Int}, Vector{ComplexF64}}}()
+    H_data = Dict{Tuple{Int,Int,Int}, Tuple{Vector{Int}, Vector{Int}, Vector{ComplexF64}}}()
+    S_data = Dict{Tuple{Int,Int,Int}, Tuple{Vector{Int}, Vector{Int}, Vector{ComplexF64}}}()

     # And later:
-    R = [rx, ry, rz]
+    R = (rx, ry, rz)
```
This would also require updating the return type and `HR2HK` to use tuples for R vectors.
dptb/postprocess/pardiso/tasks/band_calculation.jl (2)
92-120: Inefficient file I/O: opening/closing file on every k-point. The `bands.dat` file is opened and closed for each k-point in the loop (lines 112-120). For large k-point calculations, this creates significant I/O overhead. Keeping one handle open buffers the output, so a `flush(f)` per k-point preserves the real-time tracking that the incremental output was added for.

♻️ Suggested improvement: Keep file handle open during calculation
```diff
     # Initialize text output (bands.dat)
     txt_path = joinpath(output_dir, "bands.dat")
-    open(txt_path, "w") do f
-        @printf(f, "# %4s %10s %10s %10s %12s %s\n", "Idx", "Kx", "Ky", "Kz", "Dist", "Eigenvalues(eV, shifted by Fermi)")
-    end

     all_egvals = Vector{Vector{Float64}}()
     start_time = time()

-    # Main calculation loop
-    for (ik, kpt) in enumerate(klist)
+    # Main calculation loop - keep file open for efficiency
+    open(txt_path, "w") do f
+        @printf(f, "# %4s %10s %10s %10s %12s %s\n", "Idx", "Kx", "Ky", "Kz", "Dist", "Eigenvalues(eV, shifted by Fermi)")
+
+        for (ik, kpt) in enumerate(klist)
         # Construct H(k) and S(k)
         H_k, S_k = HR2HK(kpt, H_R, S_R, norbits)

         # Solve eigenvalue problem using provided solver
         egvals, _, _ = solver_func(H_k, S_k, fermi_level, num_band, max_iter, false, solver_opts.ill_project, solver_opts.ill_threshold)
         push!(all_egvals, egvals)

-        # Append to text file incrementally
-        open(txt_path, "a") do f
-            # Write K-point info
-            @printf(f, "%6d %10.6f %10.6f %10.6f %12.6f", ik, kpt[1], kpt[2], kpt[3], xlist[ik])
-            # Write eigenvalues (shifted by Fermi level for consistency with plot)
-            for e in egvals
-                @printf(f, " %12.6f", e - fermi_level)
-            end
-            @printf(f, "\n")
+        # Write K-point info
+        @printf(f, "%6d %10.6f %10.6f %10.6f %12.6f", ik, kpt[1], kpt[2], kpt[3], xlist[ik])
+        for e in egvals
+            @printf(f, " %12.6f", e - fermi_level)
         end
+        @printf(f, "\n")
+        flush(f)  # keep bands.dat current for real-time tracking

         # Progress logging
         if ik % 10 == 0 || ik == length(klist)
             elapsed = (time() - start_time) / 60
             log_message(@sprintf("K-point %4d/%d done | Elapsed: %.2f min", ik, length(klist), elapsed))
         end
+        end
     end
```
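The same single-handle pattern in Python, as a sketch: one `open` for the whole loop plus a `flush()` after each k-point, so `bands.dat` can still be followed while the run is in progress. `write_bands` and its `solve` callback are hypothetical stand-ins, and the column layout copies the Julia header in the diff above:

```python
from typing import Callable, Sequence

import numpy as np


def write_bands(path: str, klist: Sequence[Sequence[float]], xlist: Sequence[float],
                solve: Callable[[Sequence[float]], np.ndarray], fermi_level: float) -> None:
    """Write bands.dat incrementally through a single open file handle."""
    with open(path, "w") as f:
        f.write("# %4s %10s %10s %10s %12s %s\n"
                % ("Idx", "Kx", "Ky", "Kz", "Dist", "Eigenvalues(eV, shifted by Fermi)"))
        for ik, (kpt, x) in enumerate(zip(klist, xlist), start=1):
            egvals = solve(kpt)
            f.write("%6d %10.6f %10.6f %10.6f %12.6f" % (ik, kpt[0], kpt[1], kpt[2], x))
            # Eigenvalues shifted by the Fermi level, one column each
            f.write("".join(" %12.6f" % (e - fermi_level) for e in egvals))
            f.write("\n")
            f.flush()  # make this k-point visible to readers (e.g. `tail -f`) immediately
```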
59-68: Consider logging file errors instead of silently ignoring them. The empty `catch` block makes debugging difficult if file writes consistently fail. At minimum, log a warning; `maxlog=1` keeps a persistent failure from flooding the console.

♻️ Suggested improvement
```diff
 catch e
-    # Ignore file errors to prevent crash
+    @warn "Failed to write to log file: $e" maxlog=1
 end
```
dptb/postprocess/pardiso/sparse_calc_npy_print.jl (1)
559-584: Executing embedded Python scripts is fragile and potentially insecure. This approach has several concerns:
- Assumes `python3` is available in PATH
- Creates temporary files that could fail to clean up on errors
- The Python script path contains the output directory, which could have special characters

Consider using Julia's `NPZ.jl` package for direct NPY file writing, or at minimum add proper error handling and path escaping.

♻️ Alternative using NPZ.jl
```julia
using NPZ

function save_bandstructure_npy(klist, xlist, eigenvalues, e_fermi, high_sym, labels, output_dir, log_path)
    try
        # Convert data to arrays
        klist_arr = hcat(klist...)
        eig_arr = hcat(eigenvalues...)

        npzwrite(joinpath(output_dir, "bandstructure.npz"), Dict(
            "klist" => klist_arr,
            "xlist" => collect(xlist),
            "eigenvalues" => eig_arr,
            "E_fermi" => e_fermi,
            "high_sym_kpoints" => collect(high_sym),
            # Note: NPZ doesn't handle string arrays well, save separately if needed
        ))

        tee_info("Generated bandstructure.npz", log_path)
    catch e
        @warn "Failed to generate bandstructure.npz: $e"
        tee_log("[Warn] Failed to generate bandstructure.npz: $e", log_path)
    end
end
```
18-29: Add language specifier to fenced code block. The code block starting at line 18 is missing a language identifier. Based on a static analysis hint, it should specify the language for proper syntax highlighting.

♻️ Proposed fix
````diff
-```
+```text
 dptb/postprocess/julia/
 ├── io/
 │   ├── structure_io.jl    # Load JSON structure
````
dptb/postprocess/pardiso/solvers/pardiso_solver.jl (1)
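For orientation before the refactor below: both branches of `solve_eigen_k` solve the generalized Hermitian problem H x = E S x by shift-invert around the Fermi level, so that the largest-magnitude eigenvalues found by ARPACK (`which=:LM`) belong to the states nearest E_F. SciPy's `eigsh` offers the same mode through its `sigma` and `M` arguments (factorizing with SuperLU rather than Pardiso). A sketch, not the project's solver:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def eigs_near_fermi(H: sp.spmatrix, S: sp.spmatrix, fermi_level: float, num_band: int) -> np.ndarray:
    """Return the num_band eigenvalues of H x = E S x closest to fermi_level, sorted."""
    # sigma switches ARPACK to shift-invert: it iterates on (H - sigma*S)^-1 S, whose
    # largest-magnitude eigenvalues 1/(E - sigma) belong to the E nearest sigma.
    vals = eigsh(H.tocsc(), k=num_band, M=S.tocsc(), sigma=fermi_level,
                 which="LM", return_eigenvectors=False)
    return np.sort(vals)
```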
59-114: Code duplication: the `make_shift_invert_map` call is identical in both branches. Lines 61 and 101 create the same shift-invert map. The map creation and Pardiso setup can be hoisted before the `if` statement.

♻️ Suggested refactor
```diff
 function solve_eigen_k(H_k, S_k, fermi_level, num_band, max_iter, out_wfc, ill_project, ill_threshold)
+    # Common setup for both branches
+    lm, ps = make_shift_invert_map(Hermitian(H_k) - fermi_level * Hermitian(S_k), Hermitian(S_k))
+
     if ill_project
-        lm, ps = make_shift_invert_map(Hermitian(H_k) - fermi_level * Hermitian(S_k), Hermitian(S_k))
-
         if out_wfc
             egval_inv, egvec_sub = eigs(lm, nev=num_band, which=:LM, ritzvec=true, maxiter=max_iter)
             # ... rest of ill_project branch ...
-
-        set_phase!(ps, Pardiso.RELEASE_ALL)
-        pardiso(ps)
     else
-        lm, ps = make_shift_invert_map(Hermitian(H_k) - fermi_level * Hermitian(S_k), Hermitian(S_k))
-
         if out_wfc
             egval_inv, egvec = eigs(lm, nev=num_band, which=:LM, ritzvec=true, maxiter=max_iter)
             # ... rest of non-ill_project branch ...
-
-        set_phase!(ps, Pardiso.RELEASE_ALL)
-        pardiso(ps)
     end
+
+    # Common cleanup
+    set_phase!(ps, Pardiso.RELEASE_ALL)
+    pardiso(ps)
```
dptb/postprocess/unified/system.py (4)
536-544: Unused variable `orb_type` and redundant inner loop. The variable `orb_type` is assigned but never used (Ruff F841). The inner loop iterates through `l_map` keys, and the matched key `t` is used directly instead.

♻️ Proposed fix
```diff
 for elem, orbs in basis.items():
     norb = 0
     for orb in orbs:
-        orb_type = orb[-1]
         for t in l_map:
             if t in orb:
                 norb += l_map[t]
                 break
     orbital_counts[elem] = norb
```
571-574: Remove the extraneous `f` prefix from strings without placeholders. Per Ruff F541, these f-strings contain no placeholders and should be regular strings.

♻️ Proposed fix
```diff
-log.info(f"Successfully saved all Pardiso data (NEW format).")
+log.info("Successfully saved all Pardiso data (NEW format).")
 log.info(f"  - Hamiltonian blocks: {len(hr)}")
 log.info(f"  - Structure: {len(self.atoms)} atoms, {basis_info['total_orbitals']} orbitals")
-log.info(f"  - Files: predicted_hamiltonians.h5, predicted_overlaps.h5, structure.json")
+log.info("  - Files: predicted_hamiltonians.h5, predicted_overlaps.h5, structure.json")
```
588-606: Complex regex-based JSON formatting may be fragile. The regex approach for collapsing JSON arrays and objects is clever but can be brittle with edge cases (e.g., strings containing brackets, deeply nested structures). Consider using Python's `json` module with a custom encoder or the `simplejson` library's formatting options.
Also, line 594 has multiple statements on one line (Ruff E701).
♻️ Minor fix for line 594
```diff
 try:
-    if not tokens: return "[]"
+    if not tokens:
+        return "[]"
     # Check if numbers
```
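One regex-free alternative, sketched under the assumption that the goal is "arrays of scalars on one line, everything else indented": walk the structure recursively and let `json.dumps` serialize every leaf, so strings containing brackets or commas are escaped correctly and nesting depth does not matter. `dumps_compact` and the two-space indent are illustrative choices, not taken from the PR. Plain Python types are assumed; NumPy scalars and arrays would need converting (e.g. with `.tolist()`) first.

```python
import json


def _is_scalar(x):
    return x is None or isinstance(x, (bool, int, float, str))


def dumps_compact(obj, indent=2, level=0):
    """Like json.dumps(obj, indent=indent), but keep lists of scalars on one line."""
    pad = " " * (indent * level)
    inner = " " * (indent * (level + 1))
    if isinstance(obj, dict):
        if not obj:
            return "{}"
        items = [
            f"{inner}{json.dumps(str(k))}: {dumps_compact(v, indent, level + 1)}"
            for k, v in obj.items()
        ]
        return "{\n" + ",\n".join(items) + "\n" + pad + "}"
    if isinstance(obj, (list, tuple)):
        if all(_is_scalar(x) for x in obj):
            # Leaf array (including the empty list): one line, escaped by json itself.
            return json.dumps(list(obj))
        items = [inner + dumps_compact(x, indent, level + 1) for x in obj]
        return "[\n" + ",\n".join(items) + "\n" + pad + "]"
    return json.dumps(obj)
```

Because every token is produced by `json.dumps`, the output always round-trips through `json.loads`, which is easy to assert in a unit test.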
519-520: Move `import dptb` to the top of the file. Importing inside a method is unconventional and can mask import errors until the method is called. Since `dptb` is the package this file belongs to, it should always be available.
♻️ Suggested improvement
Add at the top of the file:
```python
import dptb
```
And remove it from line 519.
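@YJQ-7 Please take a look: I refactored the code. The previous version is still supported for now; we can sort it out later and clean it up.
OK, professor.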
…at/modular-julia-pardiso
…lver in a module, and add ill-conditioned projection and improved eigenvector handling to the dense solver. (#2)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@dptb/postprocess/pardiso/solvers/pardiso_solver.jl`:
- Around line 60-115: The solve_eigen_k_pardiso function can leak MKL/Pardiso
resources if eigs (or other linear algebra) throws because set_phase!(ps,
Pardiso.RELEASE_ALL) and pardiso(ps) are only called on the happy path; wrap the
creation/use of the Pardiso state (lm, ps from make_shift_invert_map) and all
eigs/linear-algebra work in a try/finally block and move set_phase!(ps,
Pardiso.RELEASE_ALL) and pardiso(ps) into the finally so they always run; apply
this to both branches (ill_project true and false) and ensure variables (egval,
egvec, etc.) are assigned before rethrowing or returning so callers still see
deterministic state.
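The guarantee being asked for, illustrated in Python rather than Julia (Julia's `try ... finally ... end` gives the same "cleanup always runs" behaviour). A hypothetical `SolverState` stands in for the Pardiso handle, and `release()` plays the role of `set_phase!(ps, Pardiso.RELEASE_ALL); pardiso(ps)`. Binding the result before `try` is unnecessary in Python, but it mirrors what the Julia version needs, because there `try` opens a new local scope.

```python
class SolverState:
    """Hypothetical stand-in for a factorized sparse-solver handle."""

    def __init__(self):
        self.released = False

    def release(self):
        # Analogue of set_phase!(ps, Pardiso.RELEASE_ALL); pardiso(ps)
        self.released = True


def solve_with_cleanup(state, eigensolve):
    """Run eigensolve(state) and always release the solver, even if it raises."""
    result = None  # bound before try; Julia needs this because try opens a new scope
    try:
        result = eigensolve(state)
    finally:
        state.release()
    return result
```

If `eigensolve` raises (for example, an Arpack convergence failure), the exception still propagates to the caller, but only after the factorization memory has been released.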
🧹 Nitpick comments (4)
dptb/postprocess/pardiso/solvers/dense_solver.jl (3)
13-15: `default_dtype` is not exported but may be intended as public API. The constant `default_dtype` is defined but not included in the `export` statement. If this constant is intended for external use (e.g., by callers needing to match the expected data type), consider exporting it:
```diff
-export solve_eigen_k_dense
+export solve_eigen_k_dense, default_dtype
```
Otherwise, if it's internal-only, prefix it with an underscore per Julia convention (`_default_dtype`).
17-32: Docstring is incomplete: 4 parameters are missing. The docstring documents only 4 of the 8 function parameters. Consider adding documentation for:
- `max_iter`: Unused in dense solver (kept for API compatibility)
- `out_wfc`: Whether to compute and return eigenvectors
- `ill_project`: Enable ill-conditioning handling via S-eigenspace projection
- `ill_threshold`: Threshold for filtering small S-eigenvalues
33-36: Consider an underscore prefix for the unused `max_iter` parameter. The `max_iter` parameter is never used in this dense solver (expected, since dense diagonalization doesn't iterate). Julia convention suggests prefixing unused parameters with an underscore to signal intent:
```diff
-function solve_eigen_k_dense(H_k, S_k, fermi_level, num_band, max_iter, out_wfc, ill_project, ill_threshold)
+function solve_eigen_k_dense(H_k, S_k, fermi_level, num_band, _max_iter, out_wfc, ill_project, ill_threshold)
```
dptb/postprocess/pardiso/solvers/pardiso_solver.jl (1)
43-47: Reduce per-iteration allocation inside the `LinearMap` apply. `S * x` allocates on every multiply, and Arpack calls this many times. Consider a preallocated buffer with `mul!` to cut allocations and improve solver throughput.
♻️ Suggested refactor (preallocate & mul!)
```diff
-    lm = LinearMap{ComplexF64}(
-        (y, x) -> begin
-            set_phase!(ps, Pardiso.SOLVE_ITERATIVE_REFINE)
-            pardiso(ps, y, H_pardiso, S * x)
-        end,
-        size(H, 1);
-        ismutating=true
-    )
+    tmp = similar(b)
+    lm = LinearMap{ComplexF64}(
+        (y, x) -> begin
+            mul!(tmp, S, x)
+            set_phase!(ps, Pardiso.SOLVE_ITERATIVE_REFINE)
+            pardiso(ps, y, H_pardiso, tmp)
+        end,
+        size(H, 1);
+        ismutating=true
+    )
```
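The same buffer-reuse idea in NumPy terms, where `np.matmul(..., out=...)` writes into an existing array much as `mul!` does in Julia. This is only a sketch: a dense precomputed inverse stands in for the Pardiso factorization of (H - σS), and `make_apply` is a hypothetical name. The scratch buffer is allocated once when the operator is built and then overwritten on every application.

```python
import numpy as np


def make_apply(a, s):
    """Return apply(y, x) computing y = A^-1 (S x) with one reused scratch buffer."""
    n = s.shape[0]
    a_inv = np.linalg.inv(a)  # dense stand-in for the factorized matrix; demo only
    tmp = np.empty(n, dtype=np.complex128)  # allocated once, cf. LinearMap{ComplexF64}

    def apply(y, x):
        np.matmul(s, x, out=tmp)      # tmp = S x, written in place (cf. mul!)
        np.matmul(a_inv, tmp, out=y)  # y = A^-1 tmp, written in place
        return y

    apply.scratch = tmp  # exposed only so the in-place reuse can be inspected
    return apply
```

An iterative eigensolver calls `apply` many times, so moving the allocation out of the closure is where the saving comes from.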
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dptb/entrypoints/pdso.py`:
- Around line 54-55: Resolve and validate the INPUT-derived path before calling
Julia: after computing config_path = os.path.abspath(INPUT) (and before any
Julia invocation), verify the file exists and is readable (e.g.,
os.path.exists(config_path) and os.path.isfile(config_path) or equivalent) and
raise a clear exception or exit with an informative error if not; update error
message to reference config_path so failures are fast and not opaque. Also
ensure output_path resolution remains unchanged and only proceed to invoke the
Julia logic when config_path validation passes.
In `@dptb/postprocess/unified/system.py`:
- Around line 588-607: The l_map used to compute orbital counts is missing the
'g' orbital mapping, causing incorrect orbital_counts, site_norbits, and
total_orbitals for bases that include g functions; update the mapping in the
block that defines l_map so it includes "g": 9 (alongside
"s":1,"p":3,"d":5,"f":7), then regenerate
orbital_counts/site_norbits/total_orbitals (the code that populates
orbital_counts, computes site_norbits, and sets total_orbitals in basis_info) so
g orbitals are counted correctly.
In `@examples/To_pardiso/test_pardiso_new.py`:
- Around line 93-99: The test is swallowing backend failures by catching
subprocess.CalledProcessError and FileNotFoundError and only printing messages;
modify the test around the subprocess.run(cmd, check=True) call so failures
cause the test to fail or be explicitly skipped: for FileNotFoundError call
pytest.skip with a clear message, and for CalledProcessError either re-raise the
exception (remove the except block) or assert the subprocess return code / let
check=True propagate the error so the test fails; reference the subprocess.run
call and the surrounding test_* function to locate and update the handling.
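A minimal sketch of the requested behaviour, assuming the test calls a small helper: a missing executable becomes a skip, while a non-zero exit propagates as `subprocess.CalledProcessError` and fails the test. It raises `unittest.SkipTest`, which pytest also reports as a skip; in a pytest-only module, `pytest.skip(...)` would be the more usual spelling. `run_backend` is a hypothetical helper name.

```python
import shutil
import subprocess
import unittest


def run_backend(cmd):
    """Run the backend command; skip if the executable is absent, fail on errors."""
    if shutil.which(cmd[0]) is None:
        # Missing toolchain: report the test as skipped rather than silently passing.
        raise unittest.SkipTest(f"{cmd[0]!r} not found on PATH; skipping backend run")
    # check=True lets CalledProcessError propagate, so a failing backend fails the test.
    completed = subprocess.run(cmd, check=True)
    return completed.returncode
```

The key design choice is to separate "the environment cannot run this" (skip) from "the code under test is broken" (fail), instead of folding both into a printed message.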
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 265088cf-af47-49e9-ac82-e1883be17b9f
📒 Files selected for processing (6)
- dptb/entrypoints/pdso.py
- dptb/postprocess/pardiso/io/io.jl
- dptb/postprocess/unified/calculator.py
- dptb/postprocess/unified/system.py
- dptb/tests/test_to_pardiso.py
- examples/To_pardiso/test_pardiso_new.py
✅ Files skipped from review due to trivial changes (1)
- dptb/postprocess/unified/calculator.py
🚧 Files skipped from review as they are similar to previous changes (2)
- dptb/postprocess/pardiso/io/io.jl
- dptb/tests/test_to_pardiso.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dptb/postprocess/pardiso/solvers/pardiso_solver.jl (1)
60-106: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Variable scoping error: `egval_inv` and `egvec_sub` are used outside the `try` block where they are assigned.
In Julia, a `try` block opens a new local scope, so variables first assigned inside it (and not already defined in the enclosing function scope) are local to that block and cannot be accessed afterward. Lines 66–69 assign `egval_inv` and `egvec_sub` inside the `try` block, but lines 80 and 82 use them outside it, causing an `UndefVarError` at runtime.
Initialize these variables before the `try` block:
Fix: Declare variables before try block
```diff
     if ill_project
         lm, ps = make_shift_invert_map(Hermitian(H_k) - fermi_level * Hermitian(S_k), Hermitian(S_k))
+        egval_inv = nothing
+        egvec_sub = nothing
         try
             if out_wfc
                 egval_inv, egvec_sub = eigs(lm, nev=num_band, which=:LM, ritzvec=true, maxiter=max_iter)
             else
                 egval_inv = eigs(lm, nev=num_band, which=:LM, ritzvec=false, maxiter=max_iter)[1]
                 egvec_sub = zeros(default_dtype, size(H_k, 1), 0)
             end
         finally
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dptb/postprocess/pardiso/solvers/pardiso_solver.jl` around lines 60 - 106, The variables egval_inv and egvec_sub are assigned only inside the try in solve_eigen_k_pardiso, causing a scoping error when used later; to fix, declare and initialize egval_inv and egvec_sub just before the try (e.g. egval_inv = similar(...) or egval_inv = zeros(...); egvec_sub = zeros(default_dtype, size(H_k,1), 0)) so they exist in the outer scope regardless of which branch inside the try runs, then keep the existing assignments inside the try (where eigs is called) and leave the resource-release code (set_phase!/pardiso) unchanged.
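For context on what `egval_inv` holds: `make_shift_invert_map` builds the shift-invert operator for the generalized problem H v = λ S v. Since (H - σS)v = (λ - σ)Sv, the operator (H - σS)⁻¹S has eigenvalues μ = 1/(λ - σ); the largest |μ| (Arpack's `which=:LM`) belong to the λ nearest the shift σ (here the Fermi level), and λ = σ + 1/μ is the back-transform `real(1 ./ egval_inv) .+ fermi_level`. The dense NumPy sketch below checks this relation on random test matrices; it is an illustration, not the PR's sparse Julia implementation, and all helper names are hypothetical.

```python
import numpy as np


def random_hermitian_pair(n, seed=0):
    """Random Hermitian H and Hermitian positive-definite S (test data only)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (a + a.conj().T) / 2
    b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    s = b @ b.conj().T + n * np.eye(n)  # shifted Gram matrix -> positive definite
    return h, s


def generalized_eigvals(h, s):
    """Reference eigenvalues of H v = lam S v via Cholesky reduction S = L L^H."""
    linv = np.linalg.inv(np.linalg.cholesky(s))
    return np.linalg.eigvalsh(linv @ h @ linv.conj().T)


def shift_invert_eigvals(h, s, sigma, num_band):
    """The num_band eigenvalues nearest sigma, via the operator (H - sigma S)^-1 S."""
    op = np.linalg.solve(h - sigma * s, s)        # dense stand-in for the Pardiso-backed map
    mu = np.linalg.eigvals(op)                    # mu = 1 / (lam - sigma)
    top = mu[np.argsort(-np.abs(mu))[:num_band]]  # largest |mu|, like which=:LM
    return np.sort(np.real(1.0 / top) + sigma)    # back-transform: lam = sigma + 1/mu
```

In the sparse setting the factorization of (H - σS) is computed once and reused for every operator application, which is why releasing it afterwards matters.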
♻️ Duplicate comments (2)
dptb/entrypoints/pdso.py (1)
57-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Validate `config_path` before invoking Julia. Fail fast if the config file is missing or unreadable to avoid opaque backend errors.
Proposed fix
```diff
 config_path = os.path.abspath(INPUT)
+if not (os.path.isfile(config_path) and os.access(config_path, os.R_OK)):
+    log.error(f"Configuration file not found or not readable: {config_path}")
+    sys.exit(1)
 output_path = os.path.abspath(output_dir)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dptb/entrypoints/pdso.py` around lines 57 - 58, Validate that the resolved config_path (created from INPUT) exists and is readable before calling into Julia: after computing config_path = os.path.abspath(INPUT) check os.path.isfile(config_path) and os.access(config_path, os.R_OK) (or equivalent) and if the check fails log a clear error and exit/raise so the process fails fast instead of letting the backend produce opaque errors; update the code around the config_path/output_path assignment to perform this validation and use INPUT, config_path, output_dir, and output_path names to locate where to add the check.
examples/To_pardiso/test_pardiso_new.py (1)
94-100: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Do not swallow backend failures in this test flow.
This block only prints on failure, so the test can appear successful when Julia execution fails or is missing.
Proposed fix
```diff
 print(f"Running: {' '.join(cmd)}")
-try:
-    subprocess.run(cmd, check=True)
-    print("Julia backend run successfully!")
-except subprocess.CalledProcessError as e:
-    print(f"Julia execution failed with code {e.returncode}")
-except FileNotFoundError:
-    print("Julia executable not found. Skipping execution.")
+try:
+    subprocess.run(cmd, check=True)
+    print("Julia backend run successfully!")
+except FileNotFoundError as e:
+    raise RuntimeError("Julia executable not found in PATH.") from e
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/To_pardiso/test_pardiso_new.py` around lines 94 - 100, The test currently swallows subprocess.run failures by only printing in the except blocks; update the try/except around subprocess.run (the block handling CalledProcessError and FileNotFoundError) to not hide failures — either remove the except for CalledProcessError so the exception propagates and fails the test, or re-raise the CalledProcessError after logging; for FileNotFoundError, replace the print with a proper test skip (e.g., call pytest.skip) so missing Julia marks the test as skipped rather than passing. Ensure references to subprocess.run, CalledProcessError, and FileNotFoundError are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dptb/postprocess/pardiso/solvers/pardiso_solver.jl`:
- Around line 107-127: The variables egval_inv, egval, and egvec are only
assigned inside the try block but are referenced after the finally; to fix,
predeclare and initialize egval_inv, egval, and egvec before the try so they are
in scope for later use (use appropriate shapes/types matching later code, e.g.,
egvec = zeros(default_dtype, size(H_k,1), 0) or egval_inv = similar(empty array)
so the later logic that computes egval = real(1 ./ egval_inv) .+ fermi_level and
uses egvec will always have defined values); update the else branch around
make_shift_invert_map, try, and finally to declare these variables before
entering try and only assign them inside try.
---
Outside diff comments:
In `@dptb/postprocess/pardiso/solvers/pardiso_solver.jl`:
- Around line 60-106: The variables egval_inv and egvec_sub are assigned only
inside the try in solve_eigen_k_pardiso, causing a scoping error when used
later; to fix, declare and initialize egval_inv and egvec_sub just before the
try (e.g. egval_inv = similar(...) or egval_inv = zeros(...); egvec_sub =
zeros(default_dtype, size(H_k,1), 0)) so they exist in the outer scope
regardless of which branch inside the try runs, then keep the existing
assignments inside the try (where eigs is called) and leave the resource-release
code (set_phase!/pardiso) unchanged.
---
Duplicate comments:
In `@dptb/entrypoints/pdso.py`:
- Around line 57-58: Validate that the resolved config_path (created from INPUT)
exists and is readable before calling into Julia: after computing config_path =
os.path.abspath(INPUT) check os.path.isfile(config_path) and
os.access(config_path, os.R_OK) (or equivalent) and if the check fails log a
clear error and exit/raise so the process fails fast instead of letting the
backend produce opaque errors; update the code around the
config_path/output_path assignment to perform this validation and use INPUT,
config_path, output_dir, and output_path names to locate where to add the check.
In `@examples/To_pardiso/test_pardiso_new.py`:
- Around line 94-100: The test currently swallows subprocess.run failures by
only printing in the except blocks; update the try/except around subprocess.run
(the block handling CalledProcessError and FileNotFoundError) to not hide
failures — either remove the except for CalledProcessError so the exception
propagates and fails the test, or re-raise the CalledProcessError after logging;
for FileNotFoundError, replace the print with a proper test skip (e.g., call
pytest.skip) so missing Julia marks the test as skipped rather than passing.
Ensure references to subprocess.run, CalledProcessError, and FileNotFoundError
are updated accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0103fc8-b753-415d-9ee4-656157ebdcf3
📒 Files selected for processing (15)
- CLAUDE.md
- README.md
- dptb/entrypoints/pdso.py
- dptb/postprocess/pardiso/io/io.jl
- dptb/postprocess/pardiso/solvers/pardiso_solver.jl
- dptb/postprocess/pardiso/sparse_calc_npy_print.jl
- dptb/postprocess/pardiso/tasks/dos_calculation.jl
- examples/To_pardiso/README.md
- examples/To_pardiso/dptb_to_Pardiso.ipynb
- examples/To_pardiso/dptb_to_Pardiso_new.ipynb
- examples/To_pardiso/pardiso_tutorial.ipynb
- examples/To_pardiso/test_pardiso_new.py
- install_julia.sh
- install_julia_packages.jl
- test_mkl_pardiso.jl
✅ Files skipped from review due to trivial changes (3)
- README.md
- examples/To_pardiso/README.md
- CLAUDE.md
Summary by CodeRabbit
New Features
- `dptb pdso` CLI command to run Pardiso-based postprocessing workflows.
- `to_pardiso_json()` method for structured data export to the Julia backend.
Documentation