caching - Cache Optimization of C++ Code -


this might seem kind of open-ended, i'm having trouble optimizing piece of c++ code multiple processors , cache.

more important multiple processors cache: iterating on 2 nested loops

for(int i=0; i<n; i++){   //do little here single array   for(int j=0; j<whoaanotherarray[n].size(); j++){     * access array[i][j] , otherarray[i][j] , store in variable        - example is: "int x = array[i][j] + otherarray[i][j]"     * compare variable other array[index calculated , j]        - example is: "if (x < yetanotherarray[i*n+j]){ //do yetanotherarray }"   } } 

my arrays (array , otherarray) of very large size. n size.

is there way make more cache friendly? switched using linked-lists, terrible cache. read somewhere access order [i][j] cache efficient.

fwiw, part of negative-weight cycle detection algorithm.

i thinking maybe since arrays huge (they're arrays of integers btw), better break them bit fit in cache better? i'm not sure if that's right or if is, how go it.

i've started use openmp. thing i've been doing adding

#pragma omp parallel 

before right loops, , i'm getting decent utilization. i'd learn how better use parallelism beyond loops in code not sure can do. , time: trying cache friendly.

one possibility cache-use improvement modify pattern of access array , otherarray. when read array[i][j] machine will, of course, move 'line' of memory cache. when read otherarray[i][j] machine will, of course, move 'line' of memory cache. possible read second 'line' first has flushed cache ram. , make situation worse (potentially) reading value yetanotherarray.

what happens depends lot on else going on @ same time, else in cache , other operations being carried out. can difficult figure out.

if (dominant) pattern of array access require element[i][j] both (or 3) arrays @ same time want arrange matters such in same 'line' of memory read. 1 way merge 3 arrays single m*n*3 array, in superarray[i][j][1] next superarray[i][j][2] next superarray[i][j][3] , in 3 planes of array each represent 1 of original arrays. of course, works if i've got index ordering right, give more thought have.

finally:

  1. this may transform elegant program spaghetti mess -- that's small price pay improved speed !

  2. by 'line' mean whatever chunk platform loads ram cache in 1 go.

  3. google around loop tiling , strip mining. compilers not terrifically @ yet , can provide should rewarded in improved execution speed.

Comments

Popular posts from this blog

Add email recipient to all new Trac tickets -

400 Bad Request on Apache/PHP AddHandler wrapper -

php - Change action and image src url's with jQuery -