How does cache of function calls work?

Dear experts,

I am currently learning how the CPU cache works and which data structures suit it (e.g. contiguous arrays, the stack memory region, etc.).

Now I am wondering how memory access (caching) works when calling functions.
Are function calls never cached? Or are there rules for how they get cached, and ways to optimize them?
(1) Are ordinary function calls automatically cached?
(probably each scope (~stack frame) knows which functions it calls)
(2) How are dynamically dispatched calls (I mean virtual functions or std::function) treated from the caching point of view?

Kind regards
most cpus have a dedicated instruction cache (google that term), and a few cpus share one "unified" cache across both data and instructions.

jumps can cause cache misses (and, at a coarser granularity, page faults). jumps happen when you call a non-inlined function, take a branch, loop, etc. Most loops and branches will not miss on their jumps, because the target is usually in the same cache line or page. Calling a function may miss the first time, but remember the hardware and OS have smart prefetch and replacement algorithms, and the compiler does a fair bit of optimizing as well.
consider:

1)
void foo()
{
    // loop a bunch of times; called once from main
}

vs

2)
int main()
{
    // loop a bunch of times, calling foo each iteration
}

do you see the difference? In reality, though, it stops mattering after just a few calls, because the cache management will quickly 'learn' that it needs the block holding main and the block holding foo resident at the same time until the loop is done. The first version only needs foo's block hot, but the difference is tiny and it may not make sense to restructure your code for it.

basically, trying to optimize at this level may be maddening to do in C++ (assembly, maybe, but in C++ it will be rough). You are trying to outguess a very smart compiler, the operating system, and the hardware itself. That said, there are small things you can do. You can try to arrange your code so that it does not overflow the instruction cache, forcing it to constantly swap blocks in and out to do the work. So if you have a loop over a function that calls another function that calls yet another, and so on, at some point (depending on the size of the functions in machine code) you run out of cache and it thrashes. There are probably other tips and tricks you can find, but the biggest one I know is to do your best to avoid that scenario.

you may want to run diagnostics on your code (or read the generated assembly) to see what the compiler inlines and what it does not, to get a feel for that.

2) assembly / cpu language has no idea what you just said. It does not care. Code is code, and it supports just a few things: jumps, comparisons, arithmetic, etc. function calls are not even 'real' in the sense that you manually push the call stack and pop it yourself in assembly -- functions are little more than syntactic sugar around a goto to a hard address! These concepts are compiled into assembly language like anything else, and after that they all behave the same at the cpu level. The high-level constructs of c++ are quite often stripped away ... assembly does not have classes or objects, constants are embedded into the instructions themselves ... it is totally different.

just like data, instructions are split into equal-sized blocks in memory and moved as blocks. The block where the function's code sits (the cache line or page) is what moves, not the function itself, and a function can of course span more than one block. Data works the same way: the machine may move 64 bytes (a cache line) or several KB (a page) just to let you read one double. Moving blocks is extremely efficient, but you still don't want to do it constantly.

intel cpus and many others prefetch instructions, so the fetch stage runs ahead of execution most of the time. This hides instruction cache misses in a lot of scenarios.
some cpus also have cache-management / prefetch instructions that, embedded in the program, tell the cpu what to put in the cache so it does not have to guess.
See this somewhat dated paper
What Every Programmer Should Know About Memory
https://akkadia.org/drepper/cpumemory.pdf

The author, Ulrich Drepper, was for many years the maintainer of GNU libc and other software that you use, in some form or another, every day. The paper is fairly long; I found time to read it on a lengthy bus ride some years ago.

See also: more good material on the author's web site
https://akkadia.org/drepper/

function calls are not even 'real' in the sense that you manually push the call stack and pop it yourself in assembly
Which CPUs do this? I'm not aware of any. Every CPU I've worked with has dedicated instructions to "jump to subroutine" and "return from subroutine," which save and retrieve the return address automatically. I believe MIPS and SPARC save the return address in a register, meaning the called function has to push it on the stack itself if necessary, but the return address is still saved for you.
Yes, you are correct. I probably said that badly.
What I was trying to say is that machine / assembly language washes the C++ concepts away, down to what the CPU can understand: it does not know about OOP, virtual, const, classes, and many other such things; functions look very different; etc.

int foo(int a, int b, int c, int d);
in intel becomes something like

push a     // manually set up the call-stack stuff
push b
push c
push d
call foo   // saves the instruction pointer, then jumps to foo's first instruction by changing the IP
// read foo's result from a register (or pop it) -- manually handle the call stack again

anyway, in the context of (2) above: at this point in the game, functions are concrete blocks of code that are jumped to, no matter how the program arrived at that address (the compiler and linker do this work). The high-level concepts in the c++ code have all been resolved to blocks of code and addresses.
Dear all,

Thank you for your detailed replies.
I understand that caching of instructions is automatic, so as far as possible I should rely on the CPU (and the compiler?).

But is there a minimum set of practices (or rules) software engineers should remember to help such caching?
(This is my speculation, e.g.: (1) call the same functions as much as possible, rather than many different ones; (2) function objects might be cache-efficient.)
In addition to the instruction cache, is there a way for a software engineer to tell the CPU which memory range should be cached?
(Particularly for data caches. Or should I just rely on the CPU?)

> mbozzi

Thank you for kindly telling me. I would like to study the document and improve my knowledge.

Kind regards
[I]s there minimum practice (or rules) for software engineers to remember to enhance such a caching?
Yes. The practice is don't worry about it. You will get much better performance by paying attention to your data structures and algorithms than you will from worrying about caching.

That isn't to say that you should ignore caching 100%. It's good to be aware that it occurs, but it's usually the last thing you need to optimize.
The document I linked discusses software design practice which can be used to help improve memory performance, e.g., by helping the computer decide what to load or evict from the caches and when.

Drepper shows that major performance improvements are possible, because the memory subsystem relies on heuristics which aren't always optimal for particular programs. You'll get reasonable performance from "typical" C code, but the computer decides what's typical.

Dear dhayden

Thank you for your reply. I will stop worrying about caching of function calls and concentrate on data structures that suit the data cache.

Dear mbozzi

I deeply appreciate the information about the document. I am now reading it and studying what we should be aware of, and what we should leave to the CPU.

Kind regards
Topic archived. No new replies allowed.