Hello, I have downloaded the Intel® 64 and IA-32 Architectures Optimization Reference Manual and picked up some nice tricks for optimizing code, so I will be posting them one by one.
This is the first trick that I got from this manual:
Try to keep your loops shorter than 16 instructions. A loop that small can be held entirely in the instruction queue, so the processor replays it from there instead of fetching and decoding it again on every iteration, which avoids many penalties. The instruction fetch and pre-decode stages can also stay idle while the loop runs, which reduces power consumption.
However, the manual also says that in many cases loop unrolling can be faster than trying to fit the whole loop in the instruction queue. So to make the right decision, profile your code and see which version runs faster.