@@ -201,8 +201,8 @@ __global__ void reduceFinal(const float* __restrict input, int N)
 
     __shared__ float data[BLOCK_SIZE];
     // Already combine two values upon load from global memory.
-    data[threadIdx.x] = id < N / 2 ? input[id] : 0;
-    data[threadIdx.x] += id + N / 2 < N ? input[id + N / 2] : 0;
+    data[threadIdx.x] = id < N ? input[id] : 0;
+    data[threadIdx.x] += (id + N < 2 * N) ? input[id + N] : 0;
 
     for (int s = blockDim.x / 2; s > 16; s /= 2)
     {
@@ -312,4 +312,4 @@ Can you observe any difference in terms of speed / computed results?
 2) Do you have any other ideas how the reduction could be improved?
 Making it even faster should be quite challenging, but if you have
 some suggestions, try them out and see how they affect performance!
-*/
+*/
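Note on the first hunk: the change alters what `N` means in the load step of `reduceFinal`. Before, `N` was the number of input elements and each thread paired `input[id]` with `input[id + N / 2]`. After, each thread pairs `input[id]` with `input[id + N]`, which only makes sense if `N` is the number of pairs and the buffer holds `2 * N` floats. The call site is not part of this diff, so that reading is an assumption. The host-side C++ sketch below emulates both variants with a plain loop standing in for the threads; `loadSumOld`, `loadSumNew`, and `numThreads` are hypothetical names used for illustration only.

```cpp
#include <vector>

// Host-side emulation of the "combine two values upon load" step.
// Each loop iteration plays the role of one GPU thread with global index id.
// numThreads is a hypothetical stand-in for the total number of launched threads.

// Old convention: N is the number of input elements.
// Thread id pairs input[id] with input[id + N / 2].
float loadSumOld(const std::vector<float>& input, int N, int numThreads)
{
    float total = 0.0f;
    for (int id = 0; id < numThreads; ++id)
    {
        float v = id < N / 2 ? input[id] : 0.0f;
        v += id + N / 2 < N ? input[id + N / 2] : 0.0f;
        total += v; // stands in for the later shared-memory reduction
    }
    return total;
}

// New convention: N is the number of pairs.
// ASSUMPTION: input holds 2 * N elements (the caller is not shown in the diff).
float loadSumNew(const std::vector<float>& input, int N, int numThreads)
{
    float total = 0.0f;
    for (int id = 0; id < numThreads; ++id)
    {
        float v = id < N ? input[id] : 0.0f;
        v += (id + N < 2 * N) ? input[id + N] : 0.0f;
        total += v; // stands in for the later shared-memory reduction
    }
    return total;
}
```

Under that assumption, `loadSumOld(input, 8, 8)` and `loadSumNew(input, 4, 8)` cover the same eight elements. Note that `id + N < 2 * N` reduces to `id < N`, so both loads in the new version share one bounds check. The old version also handled an odd element count (the last thread read only its second value), whereas the new version always expects an even-sized buffer of `2 * N` elements.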