optimising my code problem

i want to find the sum of the following series.

1*2 + 1*3 + 1*4 + 1*5 ..... 1*n +
2*3 + 2*4 + 2*5 + 2*6 + ....2*n +
3*4 + 3*5 + 3*6 + 3*7 +.....3*n +
....
..
..
and so on.where n=10^5

i used the following logic by using two loops but it is not efficient.. i am getting time limit exceeded.please suggest me some other way.

1
2
3
4
5
6
7
8
int s=0;
for(int i=1;i<n;i++)
{
for(int j=i+1;j<=n;j++)
{
s=s+ i*j;
}
}


where s=summation of my result..
Last edited on
How would you solve it if this was a math problem? You wouldn't go through all n² terms by hand, would you?
i am getting time limit exceeded

Are you saying the code does not compute fast enough?

if so, maybe using bitwise operations instead of multiplication.
@Peter87
are you saying that 1*(2+3+4+5.....n)+2*(3+4+5+6+7....n)+3*(4+5+6+7+.....n)...
Is this the way you are talking about...??
@codekiddy
Actualy the time complexicity of the code is O(n^2)..But i want to do it in O(n)..So thats why i was asking for some alternative method..
One more thing i want to tell that i have taken the example of consecutive numbers just for clarification..But it can be any series of numbers...i.e. a1,a2,a3,a4,a5,a6.....an.
a1,a2,a3,a4,a5......an- belong to natural numbers smaller than 10^9.
i wanted to know the generalised method of implementation
samarth123 wrote:
@Peter87
are you saying that 1*(2+3+4+5.....n)+2*(3+4+5+6+7....n)+3*(4+5+6+7+.....n)...
Is this the way you are talking about...??

Yes. Now, can you find a way to calculate (or keep track of) what's inside the the parentheses without using the inner loop?
Last edited on
@Peter87
Is this method applicable for any kind of series,i mean to say a list of numbers which do not have any relation.
Because the above example which i mentioned had all the numbers in Arithmetic Progression.
Yes, there is a way.
use an auxiliar S_j = \sum_{k=1}^{j} a_k
@Peter87
What is the way to solve it.?

@ne555
what do you mean by that formula..I cant understand.
Hi, this question is actually good for practicing, not only math but coding as well.

I was using boost::timer::auto_cpu_timer to measure execution time, and also cut generated assembly code to count number of instructions.

Initially I wrote the code dealing with individual bits in a part of the code,
(bitwise addition / multiplication), in hope it will be "fast", but that didn't work since compiler optimization did much better job than than my "bits" lol.

Here is statistics (as well as the code) for the code you posted:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
#include <iostream>
using std::cin;
using std::cout;
using std::endl;
#include <boost/timer/timer.hpp>
using boost::timer::auto_cpu_timer;

/*
1*2 + 1*3 + 1*4 + 1*5 ..... 1*n +
2*3 + 2*4 + 2*5 + 2*6 + ....2*n +
3*4 + 3*5 + 3*6 + 3*7 +.....3*n +
*/

int main()
{
	typedef unsigned long long var;
	const var n = 100'000;
	var s = 0;

	{	// begin measuring cpu time
		auto_cpu_timer timer;

		for (var i = 1; i < n; i++)
		{
			for (var j = i + 1; j <= n; j++)
			{
				s = s + i * j;
			}
		}
	}

	cout << "result is: " << s << endl;
	cin.get();
	return 0;
} 



and here is program output of the above code together with execution time:
20.072684s wall, 20.000000s user + 0.000000s system = 20.000000s CPU (99.6%)
result is: 12500083332083325000



Here is my version of the code:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#include <iostream>
using std::cin;
using std::cout;
using std::endl;
#include <boost/timer/timer.hpp>
using boost::timer::auto_cpu_timer;

/*
1*2 + 1*3 + 1*4 + 1*5 ..... 1*n +
2*3 + 2*4 + 2*5 + 2*6 + ....2*n +
3*4 + 3*5 + 3*6 + 3*7 +.....3*n +
*/

int main()
{
	typedef unsigned long long var;
	const var n = 100'000;
	var check = n;
	var factor = 1;
	var result = 0;

	{	// begin measuring cpu time
		auto_cpu_timer timer;

		for (var s; (s = 0, check = n) > factor; result += s * factor++)
			while (check > factor) s += check--;
	}

	cout << "result is: " <<  result << endl;
	cin.get();
	return 0;
} 


and here is program output of the above code together with execution time:
17.963140s wall, 17.796875s user + 0.015625s system = 17.812500s CPU (99.2%)
result is: 12500083332083325000


As you can see the second version of the code runs 3 seconds faster (huh, not bad),
however time length is platform dependent of course.



and finally here is generated assembly code, in both cases it begins on timer construction and ends at timer destruction.

original version:
	call	??0auto_cpu_timer@timer@boost@@QEAA@F@Z	; boost::timer::auto_cpu_timer::auto_cpu_timer
; Line 26
	jmp	SHORT $LN4@main
$LN2@main:
	mov	rax, QWORD PTR s$5[rsp]
	imul	rax, QWORD PTR factor$[rsp]
	mov	rcx, QWORD PTR result$[rsp]
	add	rcx, rax
	mov	rax, rcx
	mov	QWORD PTR result$[rsp], rax
	mov	rax, QWORD PTR factor$[rsp]
	inc	rax
	mov	QWORD PTR factor$[rsp], rax
$LN4@main:
	mov	QWORD PTR s$5[rsp], 0
	mov	QWORD PTR check$[rsp], 100000		; 000186a0H
	mov	rax, QWORD PTR factor$[rsp]
	cmp	QWORD PTR check$[rsp], rax
	jbe	SHORT $LN3@main
$LN5@main:
; Line 27
	mov	rax, QWORD PTR factor$[rsp]
	cmp	QWORD PTR check$[rsp], rax
	jbe	SHORT $LN6@main
	mov	rax, QWORD PTR check$[rsp]
	mov	rcx, QWORD PTR s$5[rsp]
	add	rcx, rax
	mov	rax, rcx
	mov	QWORD PTR s$5[rsp], rax
	mov	rax, QWORD PTR check$[rsp]
	dec	rax
	mov	QWORD PTR check$[rsp], rax
	jmp	SHORT $LN5@main
$LN6@main:
	jmp	$LN2@main
$LN3@main:
; Line 28
	lea	rcx, QWORD PTR timer$4[rsp]
	call	??1auto_cpu_timer@timer@boost@@QEAA@XZ	; boost::timer::auto_cpu_timer::~auto_cpu_timer


modified version:
	call	??0auto_cpu_timer@timer@boost@@QEAA@F@Z	; boost::timer::auto_cpu_timer::auto_cpu_timer
; Line 25
	mov	QWORD PTR i$5[rsp], 1
	jmp	SHORT $LN4@main
$LN2@main:
	mov	rax, QWORD PTR i$5[rsp]
	inc	rax
	mov	QWORD PTR i$5[rsp], rax
$LN4@main:
	cmp	QWORD PTR i$5[rsp], 100000		; 000186a0H
	jae	SHORT $LN3@main
; Line 26
	mov	rax, QWORD PTR i$5[rsp]
	inc	rax
	mov	QWORD PTR j$6[rsp], rax
	jmp	SHORT $LN7@main
$LN5@main:
	mov	rax, QWORD PTR j$6[rsp]
	inc	rax
	mov	QWORD PTR j$6[rsp], rax
$LN7@main:
	cmp	QWORD PTR j$6[rsp], 100000		; 000186a0H
	ja	SHORT $LN6@main
; Line 27
	mov	rax, QWORD PTR i$5[rsp]
	imul	rax, QWORD PTR j$6[rsp]
	mov	rcx, QWORD PTR s$[rsp]
	add	rcx, rax
	mov	rax, rcx
	mov	QWORD PTR s$[rsp], rax
	jmp	SHORT $LN5@main
$LN6@main:
	jmp	SHORT $LN2@main
$LN3@main:
; Line 28
	lea	rcx, QWORD PTR timer$4[rsp]
	call	??1auto_cpu_timer@timer@boost@@QEAA@XZ	; boost::timer::auto_cpu_timer::~auto_cpu_timer


as you can see small optimization has been achieved in both execution time and generated code.

I'm not saying it's perfect of course, there could a lot of room to generate better code but at this moment I'm not sure how.
I hope there are math gurus in this forums to jump here into discussion and give some genius solutions.
@codekiddy: don't sweat the small stuff.

@OP: use an auxiliar array that would hold the partial sum from the first to the `j' element.

By instance you have 3*(4+5+6+7+.....n) that you generalize as 3*(a[4]+a[5]+a[6]+a[7]+.....a[n])
that can be write condensed as \sum_{k=1}^{n} a[k] - \sum_{k=1}^{j} a[k]
so you may do 3 * (S[n] - S[3])
Last edited on
Hi, this question is actually good for practicing, not only math but coding as well.

Well, it's good for illustrating that optimizing a suboptimal algorithm doesn't buy you much.

http://melpon.org/wandbox/permlink/BVmAIsEKkctmuLuH
Work out the math. Even with ak being arbitrary numbers, I estimate that you can compute the answer for at least 10,000,000 numbers in 1 second on a modern PC.
Last edited on
@cire
nice catch cire, I was reading the wikipedia article myself, and wrote the code according to formula, but you was faster.

however it looks my version runs faster than yours ;)

http://melpon.org/wandbox/permlink/G8ezPPPEhYhsJ0cu
Last edited on
Math is your friend. Here it is without any loops at all. The sum is:
2 + 3*sum(1 to 2) + 4*sum(1 to 3) + ... + n*sum(1 to n-1)

= sum(for k=2 to n of k*sum(i=1 to k-1 of i))
= sum(for k=2 to n of k*(k-1)*k/2)
= (1/2) * sum(for k=2 to n of k3 - k2

Wolfram alpha says that's
(1/2) * (1/12 * (N-1) * N * (N+1) * (3N+2)
http://www.wolframalpha.com/input/?i=sum+k^3-k^2++from+k+%3D+2+to+N

No loops at all!

The hard part is coding this without having the value overflow. It turns out that with N=100,000, the value is near the limit of the 64 bit integer. In fact, if you factor out the division by 2 in the existing solutions, you'll find that the result overflows. I had to do it with long double arithmetic and even then I pulled tricks:
1
2
3
4
5
6
7
8
9
10
11
12
13
#include <iostream>

int main()
{
    uint64_t N=1E5;
    long double result = (N-1) * N;
    result /= 2;
    result *= (N+1);
    result /= 3;
    result *= (3*N+2);
    result /= 4;
    std::cout << static_cast<uint64_t>(result) << '\n';
}

$ ./foo
12500083332083325000


@dhayden
congrats, with Boost.Multiprecision it should be possible to hold values larger than 64 bits.

Another idea would be the combination of formula and loop:

1+2+3+4+5.....n -> (n2 + n) / 2

1
2
3
4
5
6
7
uint64_t sum = 1;
uint64_t prem_sum = (n * (n + 1)) / 2;
for(int i = 1; i <= n; ++i)
{
  prem_sum -= i;
  sum += i * prem_sum;
}
Not tested!
Last edited on
Topic archived. No new replies allowed.