Summary
I ran PTRUNK with both PBFGS and PLSE to cover both kinds of element matrices: dense (PBFGS) and linear operators (PLSE).
The main costs of PTRUNK are:
- cg, with intensive use of partitioned vector products, 25%;
- partitioned updates (limited-memory or not), 25% or more;
  - vectors should be preallocated, at least for the numerical tests;
- f(x) evaluation via ExpressionTreeForge, 30%;
- gradient evaluation, 10%.
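A minimal sketch of the two partitioned kernels behind the first two costs, assuming dense element matrices as in PBFGS: an in-place product $y = \sum_i U_i^\top B_i U_i v$ and a per-element BFGS update, both reusing a buffer allocated once per element. `DenseElement`, `partitioned_mul!` and `bfgs_update!` are hypothetical names for illustration, not the PartiallySeparableNLPModels API, and skipping the update when the curvature condition fails is only one common safeguard.

```julia
using LinearAlgebra

# Hypothetical element type: dense approximation B_i of one element Hessian,
# the global indices of its variables (the rows selected by U_i), and a
# work buffer allocated once and reused by every product and update.
struct DenseElement
    idx::Vector{Int}
    B::Matrix{Float64}
    buf::Vector{Float64}
end

# Start every element from the identity.
DenseElement(idx::Vector{Int}) =
    DenseElement(idx, Matrix{Float64}(I, length(idx), length(idx)), zeros(length(idx)))

# In-place partitioned product y = sum_i U_i' * B_i * (U_i * v).
function partitioned_mul!(y::Vector{Float64}, elts::Vector{DenseElement}, v::Vector{Float64})
    fill!(y, 0.0)
    for e in elts
        # buf = B_i * (U_i v); the view reads v[idx] without copying it
        mul!(e.buf, e.B, view(v, e.idx))
        # scatter-add the element contribution into the global result
        @inbounds for (k, j) in enumerate(e.idx)
            y[j] += e.buf[k]
        end
    end
    return y
end

# Dense BFGS update of one element from its step s_i = U_i s and its gradient
# difference y_i; skipped when y_i' s_i <= eps() (curvature condition fails).
function bfgs_update!(e::DenseElement, si::AbstractVector, yi::AbstractVector)
    ys = dot(yi, si)
    ys <= eps() && return e
    mul!(e.buf, e.B, si)          # buf = B_i s_i, stored in the preallocated buffer
    sBs = dot(si, e.buf)
    n = length(e.idx)
    # B_i <- B_i - (B_i s_i)(B_i s_i)' / (s_i' B_i s_i) + y_i y_i' / (y_i' s_i)
    @inbounds for c in 1:n, r in 1:n
        e.B[r, c] += -e.buf[r] * e.buf[c] / sBs + yi[r] * yi[c] / ys
    end
    return e
end

# Usage: three overlapping 2-variable elements on 4 variables.
elts = [DenseElement([1, 2]), DenseElement([2, 3]), DenseElement([3, 4])]
v = [1.0, 2.0, 3.0, 4.0]
y = zeros(4)                      # allocated once, reused across calls
partitioned_mul!(y, elts, v)
bfgs_update!(elts[1], [1.0, 0.5], [2.0, 1.5])
```

With identity elements the product matches the assembled matrix diag(1, 2, 2, 1) applied to `v`, and after the update the first element satisfies the secant equation $B_1 s_1 = y_1$.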
Remark: a problem solved in few iterations may need only a few cg iterations before convergence, so the partitioned vector products performed by cg and those performed during the evaluations of $m_k(s)$ are hard to tell apart.
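Assuming $m_k$ is the usual quadratic trust-region model, each evaluation of $m_k(s)$ in this form needs a partitioned product with $B_k$, the same kernel cg applies at every inner iteration:

$$
m_k(s) = f(x_k) + \nabla f(x_k)^\top s + \tfrac{1}{2}\, s^\top B_k s,
\qquad
B_k s = \sum_{i=1}^{N} U_i^\top B_{k,i}\, U_i s,
$$

where $N$ is the number of element functions, $U_i$ selects the variables of element $i$, and $B_{k,i}$ is its element approximation (dense for PBFGS, a linear operator for PLSE). When cg stops after one or two iterations, the products spent in model evaluations are of the same order as those spent in cg.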
The FlameGraphs are listed after the detailed script:
using PartiallySeparableSolvers, PartiallySeparableNLPModels
using BenchmarkTools, ProfileSVG
using ADNLPModels
using OptimizationProblems, OptimizationProblems.ADNLPProblems
path = pwd()*"/dvpt/profilage/profiles/"
n = 500
# select the problem to profile (uncomment one):
# nlp = ADNLPProblems.arwhead(; n)
# nlp = ADNLPProblems.rosenbrock(; n)
nlp = ADNLPProblems.dixmaanh(; n)
psnlp = PartiallySeparableNLPModel(nlp)
ProfileSVG.set_default(maxdepth=100)
# Partitioned structure summary (dixmaanh100):
# element functions: ████████████████████ 297
# distinct element functions: ██████████⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 135
# Element statistics:
# linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 convex: ███████████⋅⋅⋅⋅⋅⋅⋅⋅⋅ 100
# quadratic: █████████████████⋅⋅⋅ 132 concave: █⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 1
# cubic: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 general: ████████████████████ 197
# general: ████████████████████ 164
# Element function dimensions: Variable overlaps:
# min: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0.0 min: ████████████████⋅⋅⋅⋅ 4.0
# mean: █████████████████⋅⋅⋅ 1.6 mean: ████████████████████ 4.9
# max: ████████████████████ 2.0 max: ████████████████████ 5.0
# Partitioned structure summary (rosenbrock100):
# element functions: ████████████████████ 297
# distinct element functions: ██████████⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 135
# Element statistics:
# linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 convex: ███████████⋅⋅⋅⋅⋅⋅⋅⋅⋅ 100
# quadratic: █████████████████⋅⋅⋅ 132 concave: █⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 1
# cubic: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 general: ████████████████████ 197
# general: ████████████████████ 164
# Element function dimensions: Variable overlaps:
# min: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0.0 min: ████████████████⋅⋅⋅⋅ 4.0
# mean: █████████████████⋅⋅⋅ 1.6 mean: ████████████████████ 4.9
# max: ████████████████████ 2.0 max: ████████████████████ 5.0
ges = PTRUNK(nlp; name=:pbfgs, verbose=0, max_iter=1000)
@code_warntype PTRUNK(nlp; name=:pbfgs, verbose=0, max_iter=1000) # type stability
@benchmark PTRUNK(nlp; name=:pbfgs, verbose=0, max_iter=1000)
p = ProfileSVG.@profview PTRUNK(nlp; name=:pbfgs, verbose=0, max_iter=1000)
ProfileSVG.save(path * "pbfgs" * nlp.meta.name * string(nlp.meta.nvar) * ".svg")
# (dixmaanh100) 555 iter
# BenchmarkTools.Trial: 16 samples with 1 evaluation.
# Range (min … max): 301.036 ms … 328.704 ms ┊ GC (min … max): 5.20% … 8.99%
# Time (median): 311.552 ms ┊ GC (median): 5.10%
# Time (mean ± σ): 313.714 ms ± 9.387 ms ┊ GC (mean ± σ): 6.92% ± 2.18%
# ▃ █▃
# █▁▇▁▁▁▁▁▁▁▇▁▁▇▁▇▇▁▇▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁██▁▁▁▁▁▁▁▁▁▁▇▁▁▁▇ ▁
# 301 ms Histogram: frequency by time 329 ms <
# Memory estimate: 188.44 MiB, allocs estimate: 3711279.
# (rosenbrock100) 1000 iter
# BenchmarkTools.Trial: 11 samples with 1 evaluation.
# Range (min … max): 473.243 ms … 494.008 ms ┊ GC (min … max): 8.11% … 7.94%
# Time (median): 482.832 ms ┊ GC (median): 7.91%
# Time (mean ± σ): 482.779 ms ± 6.672 ms ┊ GC (mean ± σ): 8.17% ± 0.69%
# ▁ ▁ ▁ ▁ ▁ ▁ █ ▁ ▁ ▁
# █▁▁▁█▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁█ ▁
# 473 ms Histogram: frequency by time 494 ms <
# Memory estimate: 405.67 MiB, allocs estimate: 3126986.
ges = PTRUNK(nlp; name=:plse, verbose=0, max_iter=1000)
@code_warntype PTRUNK(nlp; name=:plse, verbose=0, max_iter=1000) # type stability
@benchmark PTRUNK(nlp; name=:plse, verbose=0, max_iter=1000)
p = ProfileSVG.@profview PTRUNK(nlp; name=:plse, verbose=0, max_iter=1000)
ProfileSVG.save(path * "plse" * nlp.meta.name * string(nlp.meta.nvar) * ".svg")
# (dixmaanh100) 25 iter
# BenchmarkTools.Trial: 87 samples with 1 evaluation.
# Range (min … max): 52.061 ms … 76.085 ms ┊ GC (min … max): 0.00% … 23.51%
# Time (median): 54.743 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 57.979 ms ± 7.506 ms ┊ GC (mean ± σ): 5.64% ± 9.46%
# ▁▁█▄▄▇▅▄
# ▆█████████▃▁▁▁▃▅▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▅▃▅▅▁▁▅▃▅▃ ▁
# 52.1 ms Histogram: frequency by time 75.6 ms <
# Memory estimate: 24.72 MiB, allocs estimate: 466225.
# (rosenbrock100) 346 iter
# BenchmarkTools.Trial: 26 samples with 1 evaluation.
# Range (min … max): 181.323 ms … 210.233 ms ┊ GC (min … max): 0.00% … 6.92%
# Time (median): 199.135 ms ┊ GC (median): 6.60%
# Time (mean ± σ): 197.935 ms ± 8.653 ms ┊ GC (mean ± σ): 4.53% ± 3.29%
# ▃ ▃ █
# ▇▇▁▁▇▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇█▇▇▁▁▁█▁▇▁▁▁▇▁▁▇▇█▁▁▇▇▁▇▇▁▁▁▇▁▇ ▁
# 181 ms Histogram: frequency by time 210 ms <
# Memory estimate: 86.98 MiB, allocs estimate: 1495942.
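Dividing the median times and allocation estimates above by the reported iteration counts gives a rough per-iteration cost for each run. The numbers are copied from the BenchmarkTools output; `runs` and `per_iter` are illustrative names, and the division spreads any setup done inside `PTRUNK` over the iterations, so the results are only approximate.

```julia
# Median time (ms), iteration count and allocation estimate of each run,
# copied from the BenchmarkTools output above.
runs = [
    (solver = "pbfgs", problem = "dixmaanh100",   ms = 311.552, iters = 555,  allocs = 3_711_279),
    (solver = "pbfgs", problem = "rosenbrock100", ms = 482.832, iters = 1000, allocs = 3_126_986),
    (solver = "plse",  problem = "dixmaanh100",   ms = 54.743,  iters = 25,   allocs = 466_225),
    (solver = "plse",  problem = "rosenbrock100", ms = 199.135, iters = 346,  allocs = 1_495_942),
]

# Average time and allocations per iteration (setup cost included).
per_iter(r) = (ms = r.ms / r.iters, allocs = r.allocs / r.iters)

for r in runs
    p = per_iter(r)
    println(rpad(r.solver, 6), rpad(r.problem, 15),
            round(p.ms; digits = 3), " ms/iter, ",
            round(Int, p.allocs), " allocs/iter")
end
```

On dixmaanh100, PLSE converges in 25 iterations instead of 555, but each iteration takes roughly 2.19 ms against 0.56 ms for PBFGS and performs almost three times as many allocations (about 18.6k against 6.7k), which fits the remark about preallocating vectors.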
FlameGraphs:
- plse for dixmaanh99;
- plse for dixmaanh498;
- plse for genrose100;
- plse for genrose500;
- pbfgs for dixmaanh99;
- pbfgs for dixmaanh498;
- pbfgs for genrose100;
- pbfgs for genrose500.