I would put it like this:
Virtual functions are going to be equal at best, or possibly slightly slower.
The performance hit really should not be that steep. A very simple design pattern that can help you avoid the hit in many cases:
instead of this:

    for (1 to 10 grillion)
        thing.virt_foo(); // one virtual dispatch per iteration

do this:

    thing.virt_foo(10 grillion); // put the loop inside the function, one dispatch total
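A minimal sketch of that loop-hoisting pattern, with "Thing" and its members invented for illustration:

```cpp
#include <vector>

struct Thing {
    virtual ~Thing() = default;
    // Called per element: one virtual dispatch per iteration.
    virtual void foo(int& x) const { x += 1; }
    // Called once per job: a single virtual dispatch, loop inside.
    virtual void foo_all(std::vector<int>& xs) const {
        for (int& x : xs) x += 1;   // the per-element work, loop hoisted inside
    }
};

int sum_after(const Thing& t, std::vector<int> xs) {
    t.foo_all(xs);                  // one virtual dispatch for the whole job
    int s = 0;
    for (int x : xs) s += x;
    return s;
}
```

Derived classes can still override `foo_all` to change the per-element behavior; the dispatch cost just stops scaling with the element count.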
Ok, so do you mean to say that virtual functions are slower than statically bound calls because a virtual call requires a table lookup at runtime before the call is made, and that lookup is what causes the slowdown? Or is it because the class must use memory to maintain a table of virtual function pointers, and that results in the performance issue?
My understanding is that it takes longer to invoke a virtual method and that extra memory is required to store the information needed for the lookup. Virtual function calls must be resolved at run time by performing a vtable lookup, whereas non-virtual function calls can be resolved at compile time. This can make virtual function calls slower than non-virtual calls. In reality, this overhead may be negligible, particularly if our function does non-trivial work or if it is not called frequently.
Your closing points seem fine, but I thought I'd add this to the inquiry:
Usually, when used properly, a virtual function call is a decision based on type.
When I say "properly", consider this as part of my meaning:
If the decision to be made is well known at compile time, using a virtual function in the design may be an error. It may be wiser to use a template parameter, or some other means, to make the decision at compile time.
If the decision to be made must be at runtime, in part because the purpose is to decide, based on a derived type, what to do when the perspective of the function call is from a base class that does not know (or shouldn't have to know) the derived type, then a virtual function design may be best suited.
When a decision must be made at runtime, there is going to be a cost. All decisions take some time and effort. One of the fastest runtime decisions is an inline test of a bool. If your decision is between two options, and that choice can be encoded as a bool, that may be the fastest way to write that decision.
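For instance, a two-way decision encoded as a bool and tested inline might look like this (the names are invented for illustration):

```cpp
// A predictable inline test of a bool is often the cheapest
// runtime decision available.
int scale(int x, bool double_it) {
    if (double_it)
        return x * 2;   // option A
    return x * 3;       // option B
}
```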
The typical virtual function example from decades past has been the "shape" base object with derived shape classes like circles, rectangles, triangles, etc. This was never a great example, but it shows that the container stores everything as the "shape" base object, and the derived types decide how to draw through the virtual function paradigm.
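That textbook pattern, sketched minimally (the names are the usual illustrative ones, not from any real library):

```cpp
#include <memory>
#include <string>
#include <vector>

// A Shape base; derived types decide behavior through a virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string draw() const = 0;
};
struct Circle : Shape { std::string draw() const override { return "circle"; } };
struct Rectangle : Shape { std::string draw() const override { return "rectangle"; } };

std::string draw_all(const std::vector<std::unique_ptr<Shape>>& shapes) {
    std::string out;
    for (const auto& s : shapes)
        out += s->draw() + " ";   // decision by type, once per element
    return out;
}
```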
However, when performance consideration is paramount, one must weigh the cost of the decision mechanism to know if they've chosen wisely. If there are only 2 or 3 choices (ever), an inline test of an integer in an "if/else if/else" paradigm is likely faster than 3 types derived from a base. There is a cost to this 3 way jumping around, but it is likely less than the virtual function mechanism, especially if the programmer knows which result is the most likely and can make that the first "if" test.
However, as the number of options grows, the "C" style approach was often a switch based on an integer. There is a cost to this, much like an "if/else if/else if" ladder of options, which gets longer as the options grow, give or take which branch is taken.
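A sketch of that C-style integer dispatch, with invented names, matching the shape theme:

```cpp
// An integer tag tested with a switch (or an if/else ladder with the
// most likely case first): one decision per call, no derived classes.
enum Kind { kCircle, kRect, kTriangle };

double area(Kind k, double a, double b) {
    switch (k) {
        case kCircle:   return 3.14159 * a * a;
        case kRect:     return a * b;
        case kTriangle: return 0.5 * a * b;
    }
    return 0.0;
}
```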
C style would, in slightly more advanced hands, then resort to a pointer to a function (in place of an integer specifying the item's selection). This drops the overhead of each call to a fixed time, that of a call through a function pointer, no matter how many options there may be.
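A minimal sketch of that pointer-to-function dispatch (the operation names are invented):

```cpp
// Dispatch through a pointer to a function: the call cost is fixed
// regardless of how many options exist.
int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

int apply(int (*op)(int, int), int a, int b) {
    return op(a, b);   // one indirect call, no matter the option count
}
```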
The C++ version of this, given the caveat of a derivative class for each option making the selection, is the virtual function.
To make the choice intelligently, one must recognize there is a cost to the decision, so make the least cost choice for the performance advantage, but only when performance is paramount and not the convenience of a design. Sometimes the "time" we prioritize is development time, not runtime.
If performance is paramount and there will never be more than two options, an "if" testing a bool is faster than most anything else.
Also, one must consider, when looping through a long array or vector (the reason there's a loop and thus a performance issue), whether the decision is made at each element, or made only once before the entire "job".
C's qsort function operates on an array, sorting the array. It takes a pointer to a function as the parameter, where that function compares two elements during the sort.
This function could be called millions of times in a large sort, and the only option with the qsort function of the C library is to call the comparison by pointer to a function.
The decision about what the comparison must be to match the type of data is made once for all elements.
This is a similar cost to a virtual function call.
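The qsort shape described above, sketched out: the comparison is selected once (by passing its address), then invoked through that pointer for every comparison during the sort.

```cpp
#include <cstdlib>

// A comparison function matching qsort's required signature; it must
// return negative, zero, or positive.
int cmp_int(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}
```

Usage, for an `int` array `v` of length `n`: `qsort(v, n, sizeof(int), cmp_int);`.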
In the stl, the sort algorithm takes a template parameter representing a comparison "object", which can be a lambda or a function object which performs the comparison.
This offers the same choice, one decision for all elements being sorted.
The difference, though, is that while that can be a pointer to a function, the function object is much faster because the compiler has more information than merely a pointer to a function (which it has no choice but to call), and thus can inline the comparison code...eliminating overhead.
The way the decision is supplied to the compiler can imply a zero overhead method in that approach.
The point here is that the choice is still available to the programmer, and that choice is made once before the algorithm fires, and is the only choice applied to all elements subsequently until the "job" is completed.
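The std::sort counterpart looks like this: the comparison arrives as a template parameter, so a lambda (or function object) can be inlined into the sort, with no indirect call per comparison.

```cpp
#include <algorithm>
#include <vector>

// Sort descending with an inlinable lambda comparison.
std::vector<int> sorted_desc(std::vector<int> v) {
    std::sort(v.begin(), v.end(),
              [](int a, int b) { return a > b; });
    return v;
}
```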
A decision like this isn't a good fit for either a pointer to a function or a virtual function design.
In C++, such choices, which are known at compile time, are best implemented as template parameters or other similar fixed implementations that have no overhead but still offer morphing options in the design.
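A sketch of a compile-time decision expressed as a template parameter instead of a virtual function (the policy names are invented):

```cpp
// Each policy supplies the same static interface; the choice between
// them is resolved at compile time, with no dispatch cost.
struct Double { static int apply(int x) { return x * 2; } };
struct Triple { static int apply(int x) { return x * 3; } };

template <typename Policy>
int run(int x) {
    return Policy::apply(x);   // direct, inlinable call
}
```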
The key point is that a decision is being made; ask where that decision is made, how often (and why), and what the alternatives cost when the test occurs inside loops, where a virtual function may be applicable.
The virtual function paradigm, much like the pointer to a function paradigm, is a fixed cost for the decision no matter how many options there are. Compare this to a switch with 15 cases, as opposed to 15 classes derived from a base, or a pointer to one of 15 different functions...each method makes a choice, but the costs are different.
In a 3D engine, for example, one might find the rendering system implemented as a base object interface, with derived classes targeting the various major 3D interfaces, ranging from OpenGL (in various versions) to DirectX (in various versions), to Metal (on Apple) or Vulkan (wherever).
If the derived classes represent a choice as to which target engine is configured, then every function from the base interface may well be a virtual function, adding cost to every call into the engine.
Instead, it may be better to use one of two alternative designs.
In C++, the rendering "base" can be refashioned as a template class, where the parameter it takes represents the "implementation" being selected at compile time (or instantiated as required). We rarely need to switch the rendering choice at runtime, except for various testing. When the morphing option is of this kind of requirement, the template approach to the design eliminates the virtual functions and, since they are usually inside loops, lightens their burden on performance.
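A hedged sketch of that templated renderer idea: the backend is a template parameter, so calls forward directly (and can inline) rather than going through a vtable. The backend names and members here are invented for illustration.

```cpp
struct GLBackend     { int api_id() const { return 1; } };
struct VulkanBackend { int api_id() const { return 2; } };

template <typename Backend>
class Renderer {
    Backend impl;
public:
    int api_id() const { return impl.api_id(); }
    // Real draw calls would forward to impl the same way, with no
    // virtual dispatch anywhere in the hot path.
};
```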
There is an older approach that does something similar....just choose different compilation units by "defines"...this is a C style approach.
The way that works is that one writes each of the targets (DirectX, Vulkan, Metal, OpenGL) in separate CPP files, all with the same interface header file.
Then, only one of those targets is compiled, selected by a define, while all of them are required to conform to the same interface (the header of function calls to the renderer).
This old style approach actually works well, and is quite simple...the choice is made by a define, and it eliminates virtual function calls throughout.
It just doesn't use C++ language features.
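A sketch of that define-based selection; the macro and function names are invented for illustration, and in a real build each branch would live in its own CPP file behind one shared header:

```cpp
// The build defines exactly one backend macro; every call site calls
// backend() (and the rest of the renderer interface) directly, with
// no virtual dispatch anywhere.
#if defined(USE_VULKAN)
const char* backend() { return "Vulkan"; }
#elif defined(USE_METAL)
const char* backend() { return "Metal"; }
#else
const char* backend() { return "OpenGL"; }   // default target
#endif
```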
Just keep in mind that saying virtual functions take longer than non-virtual functions ignores the reason one uses virtual functions...to make a decision at runtime, which has a cost no matter how it is implemented, and under the right circumstances the virtual function paradigm can be the faster option.
When it isn't, it may be more convenient, and that might be the priority.
The difference is in the 90 nanosecond range per function call (or per decision), give or take the hardware.
Sometimes decisions can be made in a few nanoseconds, and they're not best served by virtual functions when performance is paramount.
Otherwise, decision by type using virtual functions is very quick.