I've been reading up on design patterns and I have a question about the cache friendliness of methods.
Here is some sample code:
class a
{
public:
    int num;
    void update();
};
Ignoring constructors and overall syntax/grammar, I have an array of these objects like so:
a* cacheArrayofa = new a[10];
for (int i = 0; i < 10; i++) {
    cacheArrayofa[i].update();
}
Where is this method 'update()' stored? I'm assuming there is only one copy of this update method that all instances use, which will cause cache misses.
I did some searching online and the only thing I saw was to be wary of virtual members, but nothing specifically about a run of the mill class method.
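To make my assumption concrete, here is roughly how I picture it (just my mental model with made-up names, not real compiler output): the compiler emits one copy of update() in the code segment, and every call passes the object's address as a hidden 'this' parameter.

// Just my mental model: one function, shared by every instance.
struct a { int num; };

// roughly what 'void a::update()' turns into
void a_update(a* self)
{
    self->num *= 2;   // hypothetical body, just so it does something
}

int main()
{
    a objs[10] = {};
    for (int i = 0; i < 10; i++)
        a_update(&objs[i]);   // same code address every iteration
}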
Hm, I was assuming that the method for the class would not be in the same cache line. I define the class and the method, and it ends up 'who knows where?' in memory. Later I initialize an array of objects of this class that are contiguous in memory, so all the data members are in a cache line together. But where are the methods?
Now I'm realizing that as long as I'm not touching any data members I'm probably fine, aren't I? I guess a real problem may arise when I try something like this:
for (int i = 0; i < 10; i++) {
    cacheArrayofa[i].num++;
    cacheArrayofa[i].update();
}
This is when I'd expect cache misses to become a problem, since I'm jumping from my tightly packed contiguous data members to the class method over and over... correct?
First of all, you're assuming that data and instructions share the same cache, which is not necessarily so. In fact, IIRC, x86 has separate instruction and data caches (at least at the L1 level). I might be wrong, though.
Second, you seem to assume that the CPU will arbitrarily flush the cache each time through the loop. Given a general update() function, it's impossible to know how often cache misses will occur in this code. Maybe update() just multiplies num by 2, or maybe it has to fetch data from disk and send commands to the graphics device while waiting for a network packet. If we assume the simpler case, the cache behavior will be optimal. A cache that missed simply because it had to fetch memory from three disparate locations (the loop, the array, and the function) each time through a loop would not be very useful.
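To illustrate that simpler case (a minimal sketch with an assumed trivial update() body, not your actual class), the loop below walks one contiguous array of data while jumping to the same small piece of code every iteration, which is about as friendly to the caches as a loop can get:

// Sketch of the "simple" case: update() only touches the object's own data.
class a
{
public:
    int num = 0;
    void update() { num *= 2; }   // assumed trivial body
};

int main()
{
    a cacheArrayofa[10];
    for (int i = 0; i < 10; i++)
    {
        cacheArrayofa[i].num++;     // sequential data accesses
        cacheArrayofa[i].update();  // same instructions each time
    }
}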
I was assuming that the method for the class would not be in the cache line
There is more than one cache line, not to mention several different cache levels. Usually, by the time you reach a function call, the function's code will already be loaded and ready to execute, thanks to branch prediction and code prefetching.
Processor makers continuously optimise their chips to work well with common programming approaches, so code runs faster without you having to worry about how it is executed. For example, a modern CPU can recognise virtual call tables and handle them efficiently, severely reducing the performance impact of dynamic dispatch. Generally, any attempt to second-guess the processor's inner workings, or to hand-optimise for them, is doomed to fail.
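For what I mean by dynamic dispatch, here is a small sketch (made-up types, just to show the shape of a virtual call): the call site reads the target address from the object's vtable at run time instead of knowing it at compile time, and it is this indirect branch that modern predictors handle well.

#include <memory>
#include <vector>

struct base
{
    virtual void update() = 0;
    virtual ~base() = default;
};

struct doubler : base
{
    int num = 1;
    void update() override { num *= 2; }
};

int main()
{
    std::vector<std::unique_ptr<base>> objects;
    for (int i = 0; i < 10; i++)
        objects.push_back(std::make_unique<doubler>());

    for (auto& obj : objects)
        obj->update();   // indirect call through the vtable
}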
Ah, that makes sense. I was looking at the cache-friendly design pattern, but I think I may need to dig into how CPU caches work in general before I can make use of it.