CUDA: Summation of results


I'm using CUDA to run a problem where I need to evaluate a complex equation over many input matrices. Each matrix has an ID depending on its set (between 1 and 30; there are 100,000 matrices), and the result for each matrix is stored in a float[n] array, where n is the number of input matrices.

After this, the result I want is the sum of every float in the array for each ID, so with 30 IDs there are 30 result floats.

Any suggestions on how I should do this?

Right now, I read the float array (400 KB) back from the device to the host and run this on the host:

    // allocate result_array for 100,000 floats on the device
    // CUDA processes the input matrices
    // read result_array back from the device to the host
    float result[31] = { 0 };  // IDs run from 1 to 30; slot 0 is unused
    for (int i = 0; i < n; i++) {
        result[input[i].id] += result_array[i];
    }

But I'm wondering if there's a better way.

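You could use cublasSasum(). It is a bit easier than adapting one of the SDK reductions (but less general, of course). Note that it returns the sum of absolute values, so it matches a plain sum only when the results are non-negative. Check out the CUBLAS examples in the CUDA SDK.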
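For comparison, here is a minimal sketch of doing the per-ID summation directly on the device with atomics instead of CUBLAS. It is not taken from the SDK. It assumes the IDs are available on the device as a separate int array (the question keeps them in input[i].id), that IDs run from 1 to 30, and that the GPU supports float atomicAdd (compute capability 2.0 or later). The names sum_by_id, sum_by_id_device, and NUM_IDS are hypothetical.

```cuda
#include <cuda_runtime.h>

#define NUM_IDS 30  // assumption: IDs run from 1 to 30, as in the question

// Each block accumulates per-ID partial sums in shared memory, then
// merges them into the global totals with one atomicAdd per ID.
__global__ void sum_by_id(const float *values, const int *ids, int n,
                          float *totals)
{
    __shared__ float partial[NUM_IDS];

    // Zero this block's partial sums.
    for (int k = threadIdx.x; k < NUM_IDS; k += blockDim.x)
        partial[k] = 0.0f;
    __syncthreads();

    // Grid-stride loop over all per-matrix results.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        int slot = ids[i] - 1;  // map ID 1..30 to slot 0..29
        if (slot >= 0 && slot < NUM_IDS)
            atomicAdd(&partial[slot], values[i]);
    }
    __syncthreads();

    // Merge the block's partial sums into the global totals.
    for (int k = threadIdx.x; k < NUM_IDS; k += blockDim.x)
        atomicAdd(&totals[k], partial[k]);
}

// Hypothetical host wrapper: d_values and d_ids are device pointers,
// h_totals is a host array of NUM_IDS floats (slot k holds ID k + 1).
cudaError_t sum_by_id_device(const float *d_values, const int *d_ids, int n,
                             float *h_totals)
{
    float *d_totals = NULL;
    cudaError_t err = cudaMalloc((void **)&d_totals, NUM_IDS * sizeof(float));
    if (err != cudaSuccess)
        return err;

    err = cudaMemset(d_totals, 0, NUM_IDS * sizeof(float));
    if (err == cudaSuccess) {
        sum_by_id<<<64, 256>>>(d_values, d_ids, n, d_totals);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_totals, d_totals, NUM_IDS * sizeof(float),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_totals);
    return err;
}
```

With this approach only 30 floats are copied back to the host instead of the full 400 KB array. Because the atomic additions happen in no fixed order, the totals can differ in the last bits from run to run and from the host loop. For 100,000 values the host loop is already cheap, so it is worth timing both before switching.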

