Commit 16ba018
committed
portable: accumulate in fp32 for Half/BFloat16 in mean and sum
Problem:
The fast-path and generic reduction loops in mean.out and sum.IntList_out
accumulated the running sum in the tensor dtype. For BFloat16, the sum
saturates around 256, so a mean over N=512 all-ones elements gives 0.5
instead of 1.0, and summing 512 all-ones elements gives 256 instead of
512.
Changes:
Accumulate in float for Half/BFloat16 by promoting the loop accumulator
to ACC in both the fast path and the generic path. The final result is
cast back to the tensor dtype on store.
Continues the fp32-accumulation work in #19117.1 parent 9a39008 commit 16ba018
4 files changed
Lines changed: 102 additions & 19 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| 10 | + | |
| 11 | + | |
10 | 12 | | |
11 | 13 | | |
12 | 14 | | |
| |||
58 | 60 | | |
59 | 61 | | |
60 | 62 | | |
| 63 | + | |
| 64 | + | |
61 | 65 | | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
62 | 71 | | |
63 | 72 | | |
64 | | - | |
| 73 | + | |
65 | 74 | | |
66 | 75 | | |
67 | | - | |
| 76 | + | |
68 | 77 | | |
69 | 78 | | |
70 | 79 | | |
71 | | - | |
| 80 | + | |
72 | 81 | | |
73 | 82 | | |
74 | 83 | | |
| |||
83 | 92 | | |
84 | 93 | | |
85 | 94 | | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
86 | 100 | | |
87 | 101 | | |
88 | 102 | | |
89 | 103 | | |
90 | 104 | | |
91 | | - | |
| 105 | + | |
92 | 106 | | |
93 | | - | |
94 | | - | |
95 | | - | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
96 | 110 | | |
97 | 111 | | |
98 | | - | |
| 112 | + | |
| 113 | + | |
99 | 114 | | |
100 | 115 | | |
101 | 116 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| 10 | + | |
| 11 | + | |
10 | 12 | | |
11 | 13 | | |
12 | 14 | | |
| |||
60 | 62 | | |
61 | 63 | | |
62 | 64 | | |
| 65 | + | |
| 66 | + | |
63 | 67 | | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
64 | 73 | | |
65 | 74 | | |
66 | 75 | | |
67 | 76 | | |
68 | | - | |
| 77 | + | |
69 | 78 | | |
70 | 79 | | |
71 | 80 | | |
72 | | - | |
| 81 | + | |
73 | 82 | | |
74 | 83 | | |
75 | 84 | | |
| |||
108 | 117 | | |
109 | 118 | | |
110 | 119 | | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
111 | 125 | | |
112 | 126 | | |
113 | 127 | | |
114 | 128 | | |
115 | 129 | | |
116 | | - | |
| 130 | + | |
117 | 131 | | |
118 | | - | |
119 | | - | |
120 | | - | |
121 | | - | |
122 | | - | |
123 | | - | |
124 | | - | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
125 | 135 | | |
126 | 136 | | |
127 | | - | |
| 137 | + | |
128 | 138 | | |
129 | 139 | | |
130 | 140 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
263 | 263 | | |
264 | 264 | | |
265 | 265 | | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
266 | 295 | | |
267 | 296 | | |
268 | 297 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
307 | 307 | | |
308 | 308 | | |
309 | 309 | | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
310 | 339 | | |
311 | 340 | | |
312 | 341 | | |
| |||
0 commit comments