Commit 9bc57a2
Stabilize MPI test timing
Synchronize ranks before timed sections so scheduler skew and barrier waits are not counted as task runtime, preventing rare timeout flakes like these:
```
[ RUN ] PicMatrixTests/NesterovARunFuncTestsProcesses3.MatmulFromPic/nesterov_a_test_task_processes_3_mpi_enabled_3_3
unknown file: error: C++ exception with description "
Task execute time need to be: time < 1 secs.
Original time in secs: 1.21769
" thrown in the test body.
[ OK ] PicMatrixTests/NesterovARunFuncTestsProcesses3.MatmulFromPic/nesterov_a_test_task_processes_3_mpi_enabled_3_3 (1224 ms)
[ FAILED ] PicMatrixTests/NesterovARunFuncTestsProcesses3.MatmulFromPic/nesterov_a_test_task_processes_3_mpi_enabled_3_3, where GetParam() = (64-byte object <20-AA 75-60 F6-7F 00-00 C0-6C 6E-60 F6-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 88-77 B8-48 FD-01 00-00>, "nesterov_a_test_task_processes_3_mpi_enabled", (3, "3")) (1225 ms)
[ RUN ] PicMatrixTests/NesterovARunFuncTestsProcesses3.MatmulFromPic/nesterov_a_test_task_processes_3_mpi_enabled_7_7
job aborted:
[ranks] message
[0] terminated
[1] application aborted
aborting MPI_COMM_WORLD (comm=0x44000000), error 1, comm rank 1
[2] terminated
---- error analysis -----
[1] on runnervmqq1k9
D:\a\parallel_programming_course\parallel_programming_course\install\bin\ppc_func_tests aborted the job. abort code 1
---- error analysis -----
[ PROCESS 1 ] [ PROCESS 1 ] Traceback (most recent call last):
File "D:\a\parallel_programming_course\parallel_programming_course\scripts\run_tests.py", line 308, in <module>
_execute(args_dict, env_copy)
File "D:\a\parallel_programming_course\parallel_programming_course\scripts\run_tests.py", line 283, in _execute
runner.run_processes(args_dict["additional_mpi_args"])
File "D:\a\parallel_programming_course\parallel_programming_course\scripts\run_tests.py", line 247, in run_processes
self.__run_exec(
File "D:\a\parallel_programming_course\parallel_programming_course\scripts\run_tests.py", line 122, in __run_exec
raise Exception(f"Subprocess return {result.returncode}.")
Exception: Subprocess return 1.
Error: Process completed with exit code 1.
```
1 parent 626e4be commit 9bc57a2
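The idea in the commit message, synchronizing before the timer starts so that late-arriving ranks do not inflate the measured runtime, can be illustrated with a small standalone sketch. This is not the code from this commit: it uses threads in place of MPI ranks and a hand-written `SimpleBarrier` in place of what would presumably be an `MPI_Barrier` call, and the names `SimpleBarrier`, `RankTiming`, and `RunSynchronized` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Minimal reusable barrier (hypothetical helper). It stands in for an
// MPI barrier in this thread-based sketch.
class SimpleBarrier {
 public:
  explicit SimpleBarrier(std::size_t count) : threshold_(count) {
    assert(count > 0);
  }

  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t gen = generation_;
    if (++waiting_ == threshold_) {
      // The last arrival releases everyone and resets the barrier.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
      return;
    }
    // The predicate guards against spurious wakeups.
    cv_.wait(lock, [&] { return gen != generation_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t threshold_;
  std::size_t waiting_ = 0;
  std::size_t generation_ = 0;
};

struct RankTiming {
  Clock::time_point arrived;   // when the worker reached the barrier
  Clock::time_point started;   // when its timed section began
  Clock::time_point finished;  // when its timed section ended
};

// Runs `ranks` workers whose start-up is deliberately skewed. Each worker
// starts its timer only after every worker has reached the barrier.
std::vector<RankTiming> RunSynchronized(std::size_t ranks) {
  SimpleBarrier barrier(ranks);
  std::vector<RankTiming> timings(ranks);
  std::vector<std::thread> workers;
  for (std::size_t r = 0; r < ranks; ++r) {
    workers.emplace_back([&, r] {
      // Simulated scheduler skew: higher ranks show up later.
      std::this_thread::sleep_for(
          std::chrono::milliseconds(20 * static_cast<int>(r)));
      timings[r].arrived = Clock::now();
      barrier.Wait();                     // synchronize before timing
      timings[r].started = Clock::now();  // skew is excluded from here on
      volatile unsigned long sink = 0;
      for (unsigned long i = 0; i < 100000; ++i) {
        sink = sink + i;  // stand-in for the task body
      }
      timings[r].finished = Clock::now();
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return timings;
}
```

With this ordering, `finished - started` for each worker covers only the task body. If the timer were started before `barrier.Wait()`, the fastest worker would also be charged for the time it spent waiting on the slowest one, which is the kind of skew the commit message describes.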
5 files changed
Lines changed: 28 additions & 6 deletions
File tree
- modules
- runners/src
- util
- include
- src
| File (in diff order) | Hunk (original line numbers) | Additions | Deletions |
|---|---|---|---|
| 1 | 82-103 | 6 | 6 |
| 2 | 103-108 | 1 | 0 |
| 3 | 85-90 | 1 | 0 |
| 4 | 75-80 | 1 | 0 |
| 5 | 1-5, 65-67 | 19 | 0 |