Calculating the Duration of a Factorial Calculation

Pages: 12
Hi, I'm trying the calculate a factorial calculation's duration. But the code that I wrote is giving me the zeros all the time. If it's 1! giving me 0 even while calculating 20!, it's again the zero microseconds. Am I doing it wrong? Is there anything wrong with the code according to the general C++ rules or the practicals?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <iostream>
#include <chrono>

unsigned long long facto (unsigned long long);

unsigned long long facto (unsigned long long a)
{
	if (a > 1)
		a *= facto(a-1);

	return a;
}

int main ()
{
	std::cout << std::fixed;

	for (unsigned long long i {1}; i < 21; i++ )
	{
		unsigned long long a {i};

		auto start = std::chrono::high_resolution_clock::now();
		facto(a);
		auto stop = std::chrono::high_resolution_clock::now();

		auto duration = std::chrono::duration_cast<std::chrono::microseconds> (stop - start);

		double* tDuration = new double;
		*tDuration = (static_cast<double> (duration.count() / 1000000.0));

		std::cout << "***************************" << "\n|   " << i << "!\t| " << *tDuration << " micSec |\n";

		delete tDuration;
		tDuration = NULL;
	}

	return 0;
}
try nanoseconds instead?
your computer is very, very fast. factorial of a small number like 20 computed the slow way is still only 20 multiplications and a little loop scaffolding. Your computer can do upwards of 3.x billion things per second, and you asked it to do maybe 50 things. 50/3.something billion is a small number.

if you want to see if it is right, try sleeping for a constant amount in a test function. does your timer return the right value give or take a little lag (so answer is slightly higher than slept amount) ... sleep for say a full second or two... ?

I think its OK, you just asked for a very small number that may at times be zero.

in practice you would never do this, you would have a lookup table of factorials since you can only store a few of them even if the result is a 64 bit integer. Fast as it is, computing them is still inefficient and may be an aggravation if you need to do a lot of them (eg some statistics ask for 3 or 4 factorials in a single equation).
also in practice, whats up with the dynamic memory double mess? That is just unnecessary bloat. use nullptr instead of null, and smart pointers if you must use a pointer (but you do not need one here at all).

eg try this to test:
1
2
3
4
5
6
7
8
9
10
#include <thread>
unsigned long long facto (unsigned long long);

unsigned long long facto (unsigned long long a)
{
//	if (a > 1)
//		a *= facto(a-1);
   std::this_thread::sleep_for (std::chrono::seconds(1));
	return 1;
}
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <iostream>
#include <chrono>

unsigned long long facto(unsigned long long a) {
	if (a > 1)
		a *= facto(a - 1);

	return a;
}

int main() {
	std::cout << std::fixed;

	for (unsigned long long i { 1 }; i < 21; ++i) {
		const auto start { std::chrono::high_resolution_clock::now() };
		const auto a { facto(i) };
		const auto stop { std::chrono::high_resolution_clock::now() };
		const auto duration { std::chrono::duration_cast<std::chrono::nanoseconds> (stop - start) };

		std::cout << i << '\t' << a << " took " << duration << '\n';
	}
}


Which on my laptop displays:


1       1 took 395ns
2       2 took 395ns
3       6 took 0ns
4       24 took 395ns
5       120 took 0ns
6       720 took 0ns
7       5040 took 395ns
8       40320 took 0ns
9       362880 took 0ns
10      3628800 took 0ns
11      39916800 took 0ns
12      479001600 took 0ns
13      6227020800 took 396ns
14      87178291200 took 0ns
15      1307674368000 took 0ns
16      20922789888000 took 0ns
17      355687428096000 took 395ns
18      6402373705728000 took 395ns
19      121645100408832000 took 0ns
20      2432902008176640000 took 0ns


which just shows that each factorial calculation is too quick to be accurately measured. Even in nanoseconds it's really beyond measuring capability.

1
2
3
	auto start = std::chrono::high_resolution_clock::now();
	facto(a);
	auto stop = std::chrono::high_resolution_clock::now();

Watch out for compiler optimisations.
Since the function has no side effects, and you ignore the return result, the compiler is free to optimise it away.
This is indeed the case when I compiled the code with -O2.

If you're trying to time really fast things, try putting them in a loop.
1
2
3
4
5
6
volatile unsigned long long result = 0;
auto start = std::chrono::high_resolution_clock::now();
for ( int r = 0 ; r < 100 ; r++ ) {
    result += facto(a);
}
auto stop = std::chrono::high_resolution_clock::now();

Divide your result by 100 to get an average per iteration.

Experiment with different loop limits, or maybe just count loops until duration passes a meaningful threshold.
Thanks for your help George P
jonnin and seeplus, thanks for your help. I actually tried the nanoseconds and it worked for my station. As seeplus said it's really fast to compute. It's the only reasonable thing that I found out.

About the pointers, as a beginner, I'm trying to use the things that I learned like pointers. And I will keep asking these to yours.

We're having data structures and algorithm class this year in college. Our teacher is not making things easy for us. He's just reading the slides. So I'm watching a udemy course by Abdul Bari. It's going such good on recursions. Can I ask the questions about this class here or can you suggest me a site?
salem c, thanks for your help. You're right compiler optimizations are making my work harder sometimes. But I got it with nanoseconds. cpp.sh is giving us optimization choices and it explains why I am seeing different numbers other than my pc.
yes, you can ask questions, please do. A lot of us refuse to do your work FOR you, but help is here when you get stuck, just post the problem, the code, and what you are struggling with as you did this time.

data structures and algorithms -- the thing I would try to take away from the class is the strengths and weaknesses of each container for data structures, for example an array is weak at maintaining sorted order because inserting in the middle requires shuffling a lot of data over and over, while a linked list is good at this but is slow at finding a specific item. Algorithms... most of what you see in the class are the basis for similar solutions, for example the counting sort can also deduplicate ... such things take time to absorb but the idea is to learn to use similar techniques to what you are given to make new things that perform well, and also how to realize that you can't do any better (for example if you need to add 1 to every item in a container, you can't do it in less operations than touching every item once -- seems obvious, but its not always for more complex problems).

recursion is a powerful tool, as it can often express a difficult to code (without it) problem in a few statements. A lot of recursion can be summed up as a loop with a free stack data structure, if you ever need to emulate it in a language that does not allow it.
Last edited on
Hello. I like to test some codes in my projects using std::chrono so as to improve them. Sometimes we have many alternatives to do a task, and there are different executions durations - some are just bad. The main explanation is here :

https://en.cppreference.com/w/cpp/chrono/duration/duration_cast

Duration type can be really precise :
https://en.cppreference.com/w/cpp/chrono/duration

In this topic, I did some tests according to loop(s) and recursive functions.
It was just a test about processes execution, but it could be useful for you ++
https://cplusplus.com/forum/beginner/284837/#msg1234968
Last edited on
jonnin, thanks for your big helps and ideas.
Geckoo, thanks for your help.
Way back when I did my degree, it was so much easier to measure things like this. Computers were much slower and these sort of calculations could produce useful results in milli-seconds... The price of progress!
I heard a tale about program that did "work fine" in late '70, early '80, but on next computer in mid '80 it did stop to give expected results.

Apparently it had:
1
2
3
// in loop
  srand( time(NULL) );
  x = rand();

At least a second per iteration vs current "whole loop under nanosecond". :-)
if it can't be measured in MS when stress tested, its not worth timing and tuning it further: you are not doing enough of whatever it is to matter.

Apparently the early days were "fun". There is a persistent story that a team was paid big money over a long period to greatly increase the performance of a critical loop that was dominating the execution time in an OS. They did so, getting gigantic returns in its execution speed, but nothing improved in the OS performance. Additional study led to the understanding that the busy wait/ idle loop of the OS was now about as good as it could be. Its probably a fake story, but it does make a point.
Last edited on
RE: Sleep.
Using any form of “sleep” is never as precise as you’d like it to be. Better to repeat a test multiple times.

RE: Timing Precision.
C++ and the like are designed to permit very precise timing measurements. That does not mean that the underlying hardware/OS/whatever can be that precise. Even Windows OS functions return more precision than the vast majority of machines Windows is running on can deliver.

RE: Factorial Recursion
While that is a nicely linear recursion, it has stack-smashing possibilities: it creates a new call frame for every one of the a recursive invocations.

You can elide that with tail-call optimization, or TCO.

Do that by moving your result into arguments:

1
2
3
4
unsigned long long factorial( unsigned long long a, unsigned long long r = 1 )
{
  return (a < 2) ? r : factorial( a-1, r*a );
}

This won’t stop the answer from being incorrect after 20! (2,432,902,008,176,640,000) which is as large a factorial as you can compute without overflowing a 64-bit integer. It also doesn’t stop the function from trying to compute a larger factorial (continuing to invoke itself a times). But it does use only a single call frame on the stack.

You could easily write it as an explicitly iterative solution as well:

1
2
3
4
5
6
7
8
9
unsigned long long factorial( unsigned long long a )
{
  unsigned long long r = 1;
  while (a > 1)
  {
    r *= a--;
  }
  return r;
}

It isn’t as pretty as the recursive solution, though.

In either case, you now don’t have to pay for a normal function call more than once and no input number can crash your program. (It can make it run a really long time, though — it is totally worth adding a condition to refuse any numbers larger than you can compute, or at least an upper limit to try computing.)
My qualm about tail-recursion is that it makes the success of the program dependent on optimization! Often, debug modes are built without much optimization enabled, so that debugging is easier.

The following program will crash without optimization, but will succeed with O2 optimization (gcc).
[results will vary depending on your default stack size]
1
2
3
4
5
6
7
8
9
10
11
12
13
using Num = unsigned long long;
Num footorial( Num a, Num r = 1 )
{
  return (a < 2) ? r : footorial( a-1, r + a );
}

#include <iostream>
using namespace std;

int main()
{
    cout << footorial(50000) << '\n';
}

Not to mention, if a less-trivial function has a destructor called within it, it can make it less obvious that they compiler isn't able to tail-call optimize. I'd be overjoyed if somebody could offer some reassurance here, but I've yet to hear it.
Last edited on
If you happen to be using Clang 13+ you can decorate the return statement with [[musttail]], or at least I think that's how you spell it.
https://clang.llvm.org/docs/AttributeReference.html#musttail

Edit: it's [[clang::musttail]]
Last edited on
Very cool, do you know if it results in a compiler error if the compiler is unable to fulfill the tail requirement? (I see there's the caveat about virtual functions, so let's set that aside.)
Woah, that is indeed very much extra cool!

(Too bad only LLVM/Clang is that cool.)
Pages: 12