Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
rm -rf *.o *.so
g++ -O3 -fPIC -I/usr/include/python3.10 -c planner.cpp -ljsoncpp -o planner.o
planner.cpp: In member function ‘void SailorPlanner::get_plans_no_heuristics(std::unordered_map<std::__cxx11::basic_string<char>, std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<int> > > >&)’:
planner.cpp:903:34: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘std::vector<std::__cxx11::basic_string<char> >::size_type’ {aka ‘long unsigned int’} [-Wformat=]
903 | printf("size is %d\n", available_gpu_types.size());
| ~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| int std::vector<std::__cxx11::basic_string<char> >::size_type {aka long unsigned int}
| %ld
planner.cpp:922:46: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<std::__cxx11::basic_string<char> > > >::size_type’ {aka ‘long unsigned int’} [-Wformat=]
922 | printf("Permutation size is %d\n", region_list.size());
| ~^ ~~~~~~~~~~~~~~~~~~
| | |
| int std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<std::__cxx11::basic_string<char> > > >::size_type {aka long unsigned int}
| %ld
planner.cpp: In member function ‘void SailorPlanner::get_plans_num_gpus_dp(std::unordered_map<std::__cxx11::basic_string<char>, std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<int> > > >&)’:
planner.cpp:1123:42: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<std::__cxx11::basic_string<char> > > >::size_type’ {aka ‘long unsigned int’} [-Wformat=]
1123 | printf("Permutation size is %d\n", region_list.size());
| ~^ ~~~~~~~~~~~~~~~~~~
| | |
| int std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<std::__cxx11::basic_string<char> > > >::size_type {aka long unsigned int}
| %ld
planner.cpp: In member function ‘ParallelismConfig* SailorPlanner::solve_dp(int, int, int, int, int, int, const string&, const std::vector<std::__cxx11::basic_string<char> >&, std::unordered_map<std::__cxx11::basic_string<char>, std::vector<std::pair<std::__cxx11::basic_string<char>, std::vector<int> > > >&, std::string, std::vector<std::vector<int> >&, std::vector<std::vector<int> >&)’:
planner.cpp:717:31: note: parameter passing for argument of type ‘std::pair<double, double>’ when C++17 is enabled changed to match C++14 in GCC 10.1
717 | id_to_zone);
| ^
g++ -O3 -fPIC -c utils/read_json.cpp -ljsoncpp -o read_json.o
g++ -O3 -fPIC -c training.cpp -ljsoncpp -o training.o
g++ -O3 -fPIC -c planner_utils.cpp -ljsoncpp -o planner_utils.o
planner_utils.cpp: In function ‘std::pair<double, double> find_p2p_time_cost(int, int, int, int, double, std::vector<StageConfig>&, std::unordered_map<std::__cxx11::basic_string<char>, std::unordered_map<int, std::unordered_map<int, std::unordered_map<std::__cxx11::basic_string<char>, std::unordered_map<int, std::unordered_map<int, std::pair<std::vector<double>, double> > > > > > >&, std::unordered_map<long unsigned int, double>&, std::unordered_map<std::__cxx11::basic_string<char>, std::unordered_map<std::__cxx11::basic_string<char>, double> >&, std::vector<std::__cxx11::basic_string<char> >&, bool)’:
planner_utils.cpp:122:20: note: parameter passing for argument of type ‘std::pair<double, double>’ when C++17 is enabled changed to match C++14 in GCC 10.1
122 | bool activation)
| ^
g++ -O3 -fPIC -c utils/network_utils.cpp -o network_utils.o
g++ -O3 -fPIC -shared -I/usr/include/python3.10 read_json.o training.o planner_utils.o network_utils.o planner.o -ljsoncpp -o libplanner.cpython-310-aarch64-linux-gnu.so
/root/sailor/sailor/Planner/sailor_planner/profiles/LLAMA-3-8/
/root/sailor/sailor/providers/multizone_bandwidths_het.json
/root/sailor/sailor/Planner/simulations/configs/training_config_llama_3.json
/root/sailor/sailor/Planner/llm_info.json
/root/sailor/sailor/providers/gcp/communication_cost.json
AT QUOTAS DICT: A100-40
/root/sailor/sailor/Planner/sailor_planner/profiles/LLAMA-3-8/A100-40/profile.json
AT QUOTAS DICT: V100-16
/root/sailor/sailor/Planner/sailor_planner/profiles/LLAMA-3-8/V100-16/profile.json
Check for GPU A100-40
Check for GPU V100-16
build_structs took 21849 ms
NUM_VALID_GPUS is 1
hash_string is us-central1-a_A100-40_32_V100-16_0
region: us-central1
zone: us-central1-a
32,0,
-------------------------------------------------------------------------------------------------------- Check for PP: 1
**************************************************** Check for MBS: 1
MIN_TMPS - Check for GPU 0
[]
MIN_TMPS - Check for GPU 1
[]
Permutation size is 1
++++++++++++++++++++++++++++++++++++++++ Check for regions: us-central1,
MAX_DP is 0
----------- CHECK WITH D 683, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 23 us, extra_cost is 0
----------- CHECK WITH D 512, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 410, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 13 us, extra_cost is 0
----------- CHECK WITH D 342, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 15 us, extra_cost is 0
----------- CHECK WITH D 293, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 12 us, extra_cost is 0
----------- CHECK WITH D 256, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 13 us, extra_cost is 0
----------- CHECK WITH D 228, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 13 us, extra_cost is 0
----------- CHECK WITH D 205, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 187, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 171, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 158, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 147, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 137, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 128, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 121, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 114, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 108, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 103, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 98, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 94, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 90, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 86, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 82, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 79, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 76, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 74, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 71, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 69, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 67, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 64, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 63, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 61, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 59, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 57, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 56, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 4 us, extra_cost is 0
----------- CHECK WITH D 54, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 53, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 52, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 50, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 49, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 48, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 47, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 46, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 6 us, extra_cost is 0
----------- CHECK WITH D 45, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 44, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 43, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 42, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 41, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 40, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 39, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 38, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 37, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 36, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 35, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 34, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 33, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 32, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 31, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 30, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 4 us, extra_cost is 0
----------- CHECK WITH D 29, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 28, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 27, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 26, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 25, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 24, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 23, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 22, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 21, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 20, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 1 us, extra_cost is 0
----------- CHECK WITH D 19, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 18, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 17, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 16, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 15, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 14, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 13, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 12, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 11, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 10, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 9, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 8, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 7, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 6, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 5, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 4, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 3, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 2, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
----------- CHECK WITH D 1, max_dp is 0
max cur budget is 0.00
----------------------------------------------- Tpp is 0.000000, solve_dp duration is 2 us, extra_cost is 0
**************************************************** Check for MBS: 2
[1ff2ae71d04c:3173 :0:3173] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xaaa000000029)
==== backtrace (tid: 3173) ====
0 0x00000000000300c0 find_tmp_degrees() ???:0
1 0x00000000000505a4 SailorPlanner::get_plans_num_gpus_dp() ???:0
2 0x000000000005450c SailorPlanner::get_sorted_plans() ???:0
3 0x000000000008590c pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<std::vector<Config, std::allocator<Config> >, SailorPlanner, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double, pybind11::name, pybind11::is_method, pybind11::sibling>(std::vector<Config, std::allocator<Config> > (SailorPlanner::*)(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(SailorPlanner*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double)#1}, std::vector<Config, std::allocator<Config> >, SailorPlanner*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<std::vector<Config, std::allocator<Config> >, SailorPlanner, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double, pybind11::name, pybind11::is_method, pybind11::sibling>(std::vector<Config, std::allocator<Config> > (SailorPlanner::*)(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(SailorPlanner*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double)#1}&&, std::vector<Config, std::allocator<Config> > (*)(SailorPlanner*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > > > >, float, double), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN() :0
4 0x00000000000791d8 pybind11::cpp_function::dispatcher() :0
5 0x0000000000103c34 PyObject_CallFunctionObjArgs() ???:0
6 0x00000000000fa240 _PyObject_MakeTpCall() ???:0
7 0x0000000000113c9c PyMethod_New() ???:0
8 0x00000000000f0d2c _PyEval_EvalFrameDefault() ???:0
9 0x00000000001139a8 PyMethod_New() ???:0
10 0x00000000000ed8fc _PyEval_EvalFrameDefault() ???:0
11 0x00000000001048c8 _PyFunction_Vectorcall() ???:0
12 0x00000000000ed8fc _PyEval_EvalFrameDefault() ???:0
13 0x00000000001048c8 _PyFunction_Vectorcall() ???:0
14 0x00000000000ec8a4 _PyEval_EvalFrameDefault() ???:0
15 0x00000000001e8250 PyEval_EvalCode() ???:0
16 0x00000000001e80d4 PyEval_EvalCode() ???:0
17 0x000000000021b3ac PyUnicode_Tailmatch() ???:0
18 0x0000000000213ab8 PyInit__collections() ???:0
19 0x000000000021b05c PyUnicode_Tailmatch() ???:0
20 0x000000000021a1c4 _PyRun_SimpleFileObject() ???:0
21 0x0000000000219d90 _PyRun_AnyFileObject() ???:0
22 0x000000000020a7b0 Py_RunMain() ???:0
23 0x00000000001d9208 Py_BytesMain() ???:0
24 0x00000000000273fc __libc_init_first() ???:0
25 0x00000000000274cc __libc_start_main() ???:0
26 0x00000000001d90f0 _start() ???:0
=================================
python /root/sailor/sailor/Planner/simulations/simulator.py --sailor_path /root --trace_file /root/sailor/sailor/Planner/simulations/configs/gpu_trace_scaled.csv --basic_cluster_config_json /root/sailor/sailor/Planner/simulations/configs/basic_cluster_config.json --training_config_json /root/sailor/sailor/Planner/simulations/configs/training_config_llama_3.json --result_dir_path ae_results/planner/test --objective throughput --planner SAILOR --sailor_profile_file_dir /root/sailor/sailor/Planner/sailor_planner/profiles/LLAMA-3-8/ --quotas_dict /root/sailor/sailor/Planner/sailor_planner/dummy_quotas_dict.json --simulator_profile_file /root/sailor/sailor/Planner/simulations/profiles_tmp.json
I have successfully gotten the baseline planner evaluation experiments from the artifact evaluation working with the OPT-350 model. However, I have been unable to do so with the provided LLAMA-3-8 model. When I use the run_homogenous.sh (Figure 7) and run_het.sh (Figure 8b) scripts with LLAMA-3-8, I run into different errors depending on which planner I use, and none of them returns a correct result. Sailor runs into a very strange segfault, Metis does not have enough profile files (and I expect its runtime to be too slow to use anyway), and Piper returns an empty result. I made as few code changes as possible to avoid introducing bugs: I modified args.baseline in run_all_sim.py so that I can selectively pass in any of the planner frameworks, and I created a new llama config file and added an entry for it to the train_config_files dictionary in run_all_sim.py.
As a sanity test, I also created another llama config file that contains the same values as the OPT-350 config file, except for the "model" and "num_all_layers" entries. The same errors still show up.
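One quick way to confirm that the sanity-check config really differs from the OPT-350 config only in those two entries is to diff the two JSON files key by key. Below is a minimal sketch; the helper name diff_configs is hypothetical, and the two dictionaries are small placeholders rather than the real file contents (the OPT-350 values are not reproduced in this report). In practice each dictionary would come from json.load on the corresponding config file.

```python
import json

_MISSING = object()  # sentinel so "key absent" is distinguishable from None


def diff_configs(base: dict, other: dict) -> dict:
    """Return {key: (base_value, other_value)} for every key that differs.

    A key present in only one config is reported with None on the other side.
    """
    differences = {}
    for key in sorted(set(base) | set(other)):
        if base.get(key, _MISSING) != other.get(key, _MISSING):
            differences[key] = (base.get(key), other.get(key))
    return differences


# Placeholder configs for illustration only (NOT the real file contents).
base_cfg = {"hidden_size": 1024, "heads": 16, "model": "OPT-350", "num_all_layers": 1}
sanity_cfg = {"hidden_size": 1024, "heads": 16, "model": "LLAMA-3-8", "num_all_layers": 2}

changed = diff_configs(base_cfg, sanity_cfg)
print(json.dumps(changed, indent=2))
```

If the printed keys are anything other than "model" and "num_all_layers", the two configs differ in more places than intended.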
Could you guys provide some insight into how to run the planner eval experiments on LLAMA-3-8B? Doing so would help us understand how to explore other models using Sailor. Please let me know if I have made a mistake or if I am missing something. I would be happy to provide more details if necessary!
Thank you for reading!
I added all of the files and error log messages when running run_homogenous.sh below to help reproduce and debug:

sailor/Planner/simulations/configs/training_config_llama_3.json

    {
        "global_batch_size": 2048,
        "type": "gpt2",
        "hidden_size": 4096,
        "sequence_length": 8192,
        "num_layers": 32,
        "vocab_size": 128256,
        "model": "LLAMA-3-8",
        "optimizer": "Adam",
        "heads": 32,
        "head_dim": 128,
        "max_position_embeddings": 8192,
        "num_all_layers": 35
    }

sailor/Planner/simulations/configs/sanity_check.json

    {
        "global_batch_size": 1024,
        "type": "gpt2",
        "hidden_size": 1024,
        "sequence_length": 2048,
        "num_layers": 32,
        "vocab_size": 50272,
        "model": "LLAMA-3-8",
        "optimizer": "Adam",
        "heads": 16,
        "head_dim": 64,
        "max_position_embeddings": 2048,
        "num_all_layers": 35
    }

ae_scripts/planner/run_homogenous.sh

    #!/bin/bash
    mkdir -p /root/sailor/ae_results/planner/fig7
    python ae_scripts/planner/run_all_sim.py \
        --model-name LLAMA-3-8 \
        --gpu-type A100-40 \
        --trace_file /root/sailor/sailor/Planner/simulations/configs/gpu_trace_scaled.csv \
        --basic_cluster_config_json /root/sailor/sailor/Planner/simulations/configs/basic_cluster_config.json \
        --simulator_profile_file /root/sailor/sailor/Planner/simulations/profiles_tmp.json \
        --simulator_profile_file_op /root/sailor/sailor/Planner/simulations/profiles_tmp_aceso.json \
        --quotas_dict /root/sailor/sailor/Planner/sailor_planner/dummy_quotas_dict.json \
        --gpus-per-node 4 \
        --sailor_path /root \
        --res_dir ae_results/planner/test \
        --objective throughput \
        --baselines sailor

ae_scripts/planner/run_het.sh

    #!/bin/bash
    mkdir -p ae_results/planner/fig8b
    python ae_scripts/planner/run_all_sim.py \
        --model-name LLAMA-3-8 \
        --gpu-type A100-40 \
        --trace_file /root/sailor/sailor/Planner/simulations/configs/gpu_trace_heterogeneous_imbalanced.csv \
        --basic_cluster_config_json /root/sailor/sailor/Planner/simulations/configs/basic_cluster_config.json \
        --simulator_profile_file /root/sailor/sailor/Planner/simulations/profiles_tmp.json \
        --quotas_dict /root/sailor/sailor/Planner/sailor_planner/dummy_quotas_dict.json \
        --gpus-per-node 4 \
        --sailor_path /root \
        --res_dir ae_results/planner/test \
        --objective throughput \
        --baselines sailor

Diff for ae_scripts/planner/run_all_sim.py

Sailor Output & Error Log

Piper Output & Error Log