Loop Unrolling:
Loop unrolling involves, copying the loop body a few times, so that the number of iterations are decreased. To understand this, let us look at a program which adds an array of 400 items
Normal loop:
for(int i = 0;i < 400;i++){
x += arr[i];
}
Unrolled loop:
for(int i = 0;i < 100;i+=4){
x += arr[i];
x += arr[i+1];
x += arr[i+2];
x += arr[i+3];
}
The second (unrolled) loop will run much faster than the first, because there are fewer jumps and compare instructions. The number of times a loop is unrolled (here it is 4) should be 2 or 4 for maximum efficiency.
Moving Stuff Out Of Loops
Generally loops and recursive functions are the most time consuming parts of a program, so the body of loops and recursive functions need to be shortened. For example, the following program snippet makes 'n' number of copies of a string 's', and stores it in an array 'arr':
char *arr = new char*[n];
for(int i = 0;i < n;i++){
arr[i] = new char[strlen(s)+1];
strcpy(arr[i], s);
}
You might have noticed that we are calling the strlen function inside the loop, because of which it gets executed 'n' times. This is unnecessary as the length of the string 's' does not change, therefore a simple modification can make the program run much faster. Here is the modified code:
char *arr = new char*[n];
int len = strlen(s)+1;
for(int i = 0;i < n;i++){
arr[i] = new char[len];
strcpy(arr[i], s);
}
It is important to note that some compilers are smart enough to notice this error, and might move the strlen function call out of the loop body. In such a case, you will not notice a performance increase, because the program is already optimized!!
Optimizing Sequential Memory Reads
There is a vast scope for optimization while reading memory sequentially. The following unoptimized program snippet finds the maximum out of 10000 numbers (char's):
char *arr;//*arr is pointing to a 10000 byte array
max = arr[0];
for(int i = 1;i < 10000;i++){
if(arr[i] > max)max = arr[i];
}
The following program does that much more efficiently:
char *arr;//*arr is pointing to a 10000 byte array
max = arr[0];
for(int i = 1;i < 10000;i+=64){
if(arr[i] > max)max = arr[i];
}
for(int i = 1;i < 10000;i++){ //this will read every 64th element the second time, but the overhead is small
if(arr[i] > max)max = arr[i];
}
Explaining why this is faster is out of the scope of this blog. But if you read about how memory (RAM) is structured you will notice that it can handle 4 or more sequential memory requests in parallel, so the memory is loaded into the cache, after which it is read with hardly any delay.
No comments:
Post a Comment