I encounter an error when using the ILP solver while compiling Llama3-Mini with nnscaler. Can you help me understand why this is happening?
2024-11-03 08:56:37 | INFO | nnscaler.autodist.spmd_solver | finish spmd solver initializetion
Welcome to the CBC MILP Solver
Version: 2.10.3
Build Date: Dec 15 2019
command line - /opt/conda/envs/andy/nnscaler/lib/python3.10/site-packages/pulp/solverdir/cbc/linux/64/cbc /tmp/fd2f9f31b8df4c399e4cf1c02094c54b-pulp.mps -sec 600 -threads 104 -timeMode elapsed -branch -printingOptions all -solution /tmp/fd2f9f31b8df4c399e4cf1c02094c54b-pulp.sol (default strategy 1)
At line 2 NAME MODEL
At line 3 ROWS
At line 2660 COLUMNS
Duplicate row C0001129 at line 2757 < X0000017 C0001129 1.000000000000e+00 >
Duplicate row C0001130 at line 2758 < X0000017 C0001130 1.000000000000e+00 >
Duplicate row C0001132 at line 2759 < X0000017 C0001132 1.000000000000e+00 >
Duplicate row C0001134 at line 2760 < X0000017 C0001134 1.000000000000e+00 >
Duplicate row C0001135 at line 2761 < X0000017 C0001135 1.000000000000e+00 >
Duplicate row C0001137 at line 2762 < X0000017 C0001137 1.000000000000e+00 >
Duplicate row C0001129 at line 2773 < X0000019 C0001129 1.000000000000e+00 >
Duplicate row C0001130 at line 2774 < X0000019 C0001130 1.000000000000e+00 >
Duplicate row C0001133 at line 2775 < X0000019 C0001133 1.000000000000e+00 >
Duplicate row C0001134 at line 2776 < X0000019 C0001134 1.000000000000e+00 >
Duplicate row C0001135 at line 2777 < X0000019 C0001135 1.000000000000e+00 >
Duplicate row C0001138 at line 2778 < X0000019 C0001138 1.000000000000e+00 >
Duplicate row C0001129 at line 2790 < X0000021 C0001129 1.000000000000e+00 >
Duplicate row C0001131 at line 2791 < X0000021 C0001131 1.000000000000e+00 >
Duplicate row C0001132 at line 2792 < X0000021 C0001132 1.000000000000e+00 >
Duplicate row C0001134 at line 2793 < X0000021 C0001134 1.000000000000e+00 >
Duplicate row C0001136 at line 2794 < X0000021 C0001136 1.000000000000e+00 >
Duplicate row C0001137 at line 2795 < X0000021 C0001137 1.000000000000e+00 >
Duplicate objective at line 2796 < X0000021 OBJ 8.920574188232e-03 >
Duplicate row C0001129 at line 2807 < X0000023 C0001129 1.000000000000e+00 >
Duplicate row C0001131 at line 2808 < X0000023 C0001131 1.000000000000e+00 >
Duplicate row C0001133 at line 2809 < X0000023 C0001133 1.000000000000e+00 >
Duplicate row C0001134 at line 2810 < X0000023 C0001134 1.000000000000e+00 >
Duplicate row C0001136 at line 2811 < X0000023 C0001136 1.000000000000e+00 >
Duplicate row C0001138 at line 2812 < X0000023 C0001138 1.000000000000e+00 >
Duplicate row C0001666 at line 7085 < X0000779 C0001666 1.000000000000e+00 >
Duplicate row C0001667 at line 7086 < X0000779 C0001667 1.000000000000e+00 >
Duplicate row C0001669 at line 7087 < X0000779 C0001669 1.000000000000e+00 >
Duplicate row C0001671 at line 7088 < X0000779 C0001671 1.000000000000e+00 >
Duplicate row C0001672 at line 7089 < X0000779 C0001672 1.000000000000e+00 >
Duplicate row C0001674 at line 7090 < X0000779 C0001674 1.000000000000e+00 >
Duplicate row C0001666 at line 7101 < X0000781 C0001666 1.000000000000e+00 >
Duplicate row C0001667 at line 7102 < X0000781 C0001667 1.000000000000e+00 >
Duplicate row C0001670 at line 7103 < X0000781 C0001670 1.000000000000e+00 >
Duplicate row C0001671 at line 7104 < X0000781 C0001671 1.000000000000e+00 >
Duplicate row C0001672 at line 7105 < X0000781 C0001672 1.000000000000e+00 >
Duplicate row C0001675 at line 7106 < X0000781 C0001675 1.000000000000e+00 >
Duplicate row C0001666 at line 7118 < X0000783 C0001666 1.000000000000e+00 >
Duplicate row C0001668 at line 7119 < X0000783 C0001668 1.000000000000e+00 >
Duplicate row C0001669 at line 7120 < X0000783 C0001669 1.000000000000e+00 >
Duplicate row C0001671 at line 7121 < X0000783 C0001671 1.000000000000e+00 >
Duplicate row C0001673 at line 7122 < X0000783 C0001673 1.000000000000e+00 >
Duplicate row C0001674 at line 7123 < X0000783 C0001674 1.000000000000e+00 >
Duplicate objective at line 7124 < X0000783 OBJ 8.920574188232e-03 >
Duplicate row C0001666 at line 7135 < X0000785 C0001666 1.000000000000e+00 >
Duplicate row C0001668 at line 7136 < X0000785 C0001668 1.000000000000e+00 >
Duplicate row C0001670 at line 7137 < X0000785 C0001670 1.000000000000e+00 >
Duplicate row C0001671 at line 7138 < X0000785 C0001671 1.000000000000e+00 >
Duplicate row C0001673 at line 7139 < X0000785 C0001673 1.000000000000e+00 >
Duplicate row C0001675 at line 7140 < X0000785 C0001675 1.000000000000e+00 >
Duplicate row C0002203 at line 11222 < X0001509 C0002203 1.000000000000e+00 >
Duplicate row C0002204 at line 11223 < X0001509 C0002204 1.000000000000e+00 >
Duplicate row C0002206 at line 11224 < X0001509 C0002206 1.000000000000e+00 >
Duplicate row C0002208 at line 11225 < X0001509 C0002208 1.000000000000e+00 >
Duplicate row C0002209 at line 11226 < X0001509 C0002209 1.000000000000e+00 >
Duplicate row C0002211 at line 11227 < X0001509 C0002211 1.000000000000e+00 >
Duplicate row C0002203 at line 11238 < X0001511 C0002203 1.000000000000e+00 >
Duplicate row C0002204 at line 11239 < X0001511 C0002204 1.000000000000e+00 >
Duplicate row C0002207 at line 11240 < X0001511 C0002207 1.000000000000e+00 >
Duplicate row C0002208 at line 11241 < X0001511 C0002208 1.000000000000e+00 >
Duplicate row C0002209 at line 11242 < X0001511 C0002209 1.000000000000e+00 >
Duplicate row C0002212 at line 11243 < X0001511 C0002212 1.000000000000e+00 >
Duplicate row C0002203 at line 11255 < X0001513 C0002203 1.000000000000e+00 >
Duplicate row C0002205 at line 11256 < X0001513 C0002205 1.000000000000e+00 >
Duplicate row C0002206 at line 11257 < X0001513 C0002206 1.000000000000e+00 >
Duplicate row C0002208 at line 11258 < X0001513 C0002208 1.000000000000e+00 >
Duplicate row C0002210 at line 11259 < X0001513 C0002210 1.000000000000e+00 >
Duplicate row C0002211 at line 11260 < X0001513 C0002211 1.000000000000e+00 >
Duplicate objective at line 11261 < X0001513 OBJ 8.920574188232e-03 >
Duplicate row C0002203 at line 11272 < X0001515 C0002203 1.000000000000e+00 >
Duplicate row C0002205 at line 11273 < X0001515 C0002205 1.000000000000e+00 >
Duplicate row C0002207 at line 11274 < X0001515 C0002207 1.000000000000e+00 >
Duplicate row C0002208 at line 11275 < X0001515 C0002208 1.000000000000e+00 >
Duplicate row C0002210 at line 11276 < X0001515 C0002210 1.000000000000e+00 >
Duplicate row C0002212 at line 11277 < X0001515 C0002212 1.000000000000e+00 >
Duplicate row C0000590 at line 13735 < X0001952 C0000590 1.000000000000e+00 >
Duplicate row C0000591 at line 13736 < X0001952 C0000591 1.000000000000e+00 >
Duplicate row C0000593 at line 13737 < X0001952 C0000593 1.000000000000e+00 >
Duplicate row C0000595 at line 13738 < X0001952 C0000595 1.000000000000e+00 >
Duplicate row C0000596 at line 13739 < X0001952 C0000596 1.000000000000e+00 >
Duplicate row C0000598 at line 13740 < X0001952 C0000598 1.000000000000e+00 >
Duplicate row C0000590 at line 13751 < X0001954 C0000590 1.000000000000e+00 >
Duplicate row C0000591 at line 13752 < X0001954 C0000591 1.000000000000e+00 >
Duplicate row C0000594 at line 13753 < X0001954 C0000594 1.000000000000e+00 >
Duplicate row C0000595 at line 13754 < X0001954 C0000595 1.000000000000e+00 >
Duplicate row C0000596 at line 13755 < X0001954 C0000596 1.000000000000e+00 >
Duplicate row C0000599 at line 13756 < X0001954 C0000599 1.000000000000e+00 >
Duplicate row C0000590 at line 13768 < X0001956 C0000590 1.000000000000e+00 >
Duplicate row C0000592 at line 13769 < X0001956 C0000592 1.000000000000e+00 >
Duplicate row C0000593 at line 13770 < X0001956 C0000593 1.000000000000e+00 >
Duplicate row C0000595 at line 13771 < X0001956 C0000595 1.000000000000e+00 >
Duplicate row C0000597 at line 13772 < X0001956 C0000597 1.000000000000e+00 >
Duplicate row C0000598 at line 13773 < X0001956 C0000598 1.000000000000e+00 >
Duplicate objective at line 13774 < X0001956 OBJ 8.920574188232e-03 >
Duplicate row C0000590 at line 13785 < X0001958 C0000590 1.000000000000e+00 >
Duplicate row C0000592 at line 13786 < X0001958 C0000592 1.000000000000e+00 >
Duplicate row C0000594 at line 13787 < X0001958 C0000594 1.000000000000e+00 >
Duplicate row C0000595 at line 13788 < X0001958 C0000595 1.000000000000e+00 >
Duplicate row C0000597 at line 13789 < X0001958 C0000597 1.000000000000e+00 >
At line 22450 RHS
At line 25106 BOUNDS
At line 28108 ENDATA
Problem MODEL has 2655 rows, 2988 columns and 11651 elements
Coin0008I MODEL read with 116 errors
There were 116 errors on input
seconds was changed from 1e+100 to 600
threads was changed from 0 to 104
Option for timeMode changed from cpu to elapsed
** Current model not valid
Option for printingOptions changed from normal to all
** Current model not valid
No match for /tmp/fd2f9f31b8df4c399e4cf1c02094c54b-pulp.sol - ? for list of commands
Total time (CPU seconds): 0.01 (Wallclock seconds): 0.02
Traceback (most recent call last):
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/examples/llama3_8B_128K/train.py", line 291, in <module>
    main(args)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/examples/llama3_8B_128K/train.py", line 240, in main
    trainer.run()
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/cli/trainer.py", line 98, in run
    self._setup()
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/cli/trainer.py", line 209, in _setup
    pmodel_class = nnscaler.parallelize(
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/parallel.py", line 996, in parallelize
    regen_status = _gencode(
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/parallel.py", line 761, in _gencode
    graph = pas_policy(graph, compute_config)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/policies.py", line 307, in pas_autodist
    return parallelize_graph(graph, autodist_cfg)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/autodist/apis.py", line 120, in parallelize_graph
    search_out = calc_parallel_plan(graph, autodist_config)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/autodist/apis.py", line 101, in calc_parallel_plan
    pp_out = calc_optimal_spmd_plan(autodist_graph, autodist_config)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/autodist/spmd_solver.py", line 1516, in calc_optimal_spmd_plan
    spmd_outs = spmd_solver.solve([(0, model_graph.op_num - 1)], 1)[0]
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/autodist/spmd_solver.py", line 1385, in solve
    return self.do_ilp(intervals, topk)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/autodist/spmd_solver.py", line 1178, in do_ilp
    solver_out = self._solve_by_ilp(start, end)
  File "/data/haiqwa/zevin_nfs/andy/Auto-Parallelization/nnscaler_group1/qinghe/nnscaler-main/nnscaler/autodist/spmd_solver.py", line 1119, in _solve_by_ilp
    prob.solve(solver)
  File "/opt/conda/envs/andy/nnscaler/lib/python3.10/site-packages/pulp/pulp.py", line 1867, in solve
    status = solver.actualSolve(self, **kwargs)
  File "/opt/conda/envs/andy/nnscaler/lib/python3.10/site-packages/pulp/apis/coin_api.py", line 112, in actualSolve
    return self.solve_CBC(lp, **kwargs)
  File "/opt/conda/envs/andy/nnscaler/lib/python3.10/site-packages/pulp/apis/coin_api.py", line 190, in solve_CBC
    raise PulpSolverError("Pulp: Error while executing " + self.path)
pulp.apis.core.PulpSolverError: Pulp: Error while executing /opt/conda/envs/andy/nnscaler/lib/python3.10/site-packages/pulp/solverdir/cbc/linux/64/cbc
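
To make the log above easier to read, here is a small sketch that groups the "Duplicate row" and "Duplicate objective" messages by column name. It only parses the CBC output text shown in this report; the function name `summarize_duplicates` and the sample string are illustrative and not part of nnscaler or PuLP. My working assumption, not a confirmed diagnosis, is that each affected column (for example X0000017) is written more than once in the COLUMNS section of the generated .mps file, which would be consistent with two variables in the model sharing one name.

```python
import re
from collections import defaultdict

# Matches CBC reader messages such as
#   Duplicate row C0001129 at line 2757 < X0000017 C0001129 1.000000000000e+00 >
#   Duplicate objective at line 2796 < X0000021 OBJ 8.920574188232e-03 >
DUP_RE = re.compile(
    r"Duplicate (?:row (?P<row>\S+)|objective) at line (?P<line>\d+) "
    r"< (?P<col>\S+) (?P<entry>\S+) (?P<coef>\S+) >"
)


def summarize_duplicates(log_text):
    """Group duplicate-entry messages by MPS column name.

    Returns a dict mapping column name -> list of (row_or_OBJ, mps_line_number).
    """
    by_column = defaultdict(list)
    for m in DUP_RE.finditer(log_text):
        by_column[m.group("col")].append((m.group("entry"), int(m.group("line"))))
    return dict(by_column)


# A few lines copied from the log in this report.
sample = """\
Duplicate row C0001129 at line 2757 < X0000017 C0001129 1.000000000000e+00 >
Duplicate row C0001130 at line 2758 < X0000017 C0001130 1.000000000000e+00 >
Duplicate objective at line 2796 < X0000021 OBJ 8.920574188232e-03 >
"""

summary = summarize_duplicates(sample)
for col, entries in summary.items():
    # Print each affected column, how many duplicate entries it has,
    # and the first duplicated (row, line) pair.
    print(col, len(entries), entries[0])
```

Running this over the full solver output lists the affected columns. The X… and C… names look like generated aliases rather than the original variable and constraint names, so mapping them back to the nnscaler operators would still need the .mps file or the model itself.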