Which is faster??

Apr 7, 2012 at 1:14am
I want to know which one is faster at run time: loops, or writing out all of the code. Ever since I was a newbie programmer and found out about loops, I have preferred writing out all of the code, as it gave me more control over the little details. However, I find I am using loops more recently, for example to initialize 81 variables that I had previously initialized one at a time... Which one would be faster: writing out all of the code (as I had been doing from the beginning), or using loops (I'd be all for using loops, but I want to know about performance first)?

Sure, I may not be creating all sorts of huge games like CoD, or anything. I'm just very... curious, for lack of a better word.
Apr 7, 2012 at 1:35am
Hello.

That depends on how large your loops are. If "unrolling" them would produce a lot of code, the unrolled version could potentially be much slower. For smaller loops it could be faster, but probably not enough to make a huge difference. :)

For the sake of making your code easy to read, though, for now I'd probably suggest using loops. They're much easier to read and debug than 100s of lines of repeated code, aren't they?
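
To make the readability point concrete, here is a minimal sketch. It assumes the 81 variables form a 9x9 grid of ints, which the question does not actually say; `Grid` and `make_grid` are names made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical layout: the 81 variables stored as a 9x9 grid of ints.
using Grid = std::array<std::array<int, 9>, 9>;

// Sets all 81 cells to `value` with two short loops instead of
// 81 hand-written assignments.
Grid make_grid(int value)
{
    Grid g{};
    for (std::size_t row = 0; row < g.size(); ++row)
        for (std::size_t col = 0; col < g[row].size(); ++col)
            g[row][col] = value;
    return g;
}
```

Changing the grid size or the initial value here means touching one line, where the written-out version would need 81 edits.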

-Albatross
Apr 7, 2012 at 1:36am
For very small loops with a small number of iterations unrolling the loop can be faster, otherwise it will be slower. However, you should not do the unrolling yourself - that's the compiler's job.
Edit: ninja'ed.

In addition, for any type of optimization the same rules apply:
1. don't worry about code parts that do not contribute significantly to the running time of the program (a profiler will tell you whether that's the case or not).
2. only optimize if you can prove that the optimization really does make the code significantly faster - proof is always obtained by timing sample runs on real-world data.
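
A rough sketch of what "timing sample runs" could look like, using `std::chrono::steady_clock`. The helper `average_microseconds` and the workload `sum_with_loop` are hypothetical names for illustration, and this is no substitute for a real profiler.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls `work` `runs` times and returns the
// average wall-clock time per call in microseconds.
template <typename Work>
double average_microseconds(Work work, int runs)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) work();
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / runs;
}

// Example workload (an assumption, not from the thread): sum a vector
// with a plain loop.
long long sum_with_loop(const std::vector<int>& data)
{
    long long total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) total += data[i];
    return total;
}
```

You would time both candidates on the same real data with something like `average_microseconds([&]{ total += sum_with_loop(data); }, 1000)` and compare the numbers. Make sure the result of the work is actually used, or the optimizer may throw the whole thing away.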
Last edited on Apr 7, 2012 at 1:39am
Apr 7, 2012 at 2:44am
Writing all the code out instead of using loops and function calls can be faster, since it skips the loop and call overhead, but we as programmers should not care: if we did, we would be writing in assembly and not in a high-level language like C++.

The compilers we use will try to optimize our code as best they can and that's all we need to know.

If you really want to write the most efficient program you possibly can, you will have to write it in assembly, which uses the CPU's actual instruction set directly, and let me tell you, you do not want to debug those types of programs! lol

In assembly you truly learn how expensive it is to loop and call a function!

Here is a taste of some MIPS assembly I had to write for a CPU architecture class. Even with the comments, notice how hard it is to understand. The advantage is that it can run faster than code written in a high-level language.

main:
	la		$a0, array	#address of the array
	li		$a1, 9		#size of the array
	jal		heappermute	#call heap permute
	
	j		exit			#exit the program


heappermute:
	sw		$fp, -4($sp)	#save the frame pointer in the frame
	subiu	$fp, $sp, 4	#set the frame pointer to the top of the frame
	sw		$ra, -4($fp)	#save the return address in the frame
	subiu	$sp, $fp, 4	#set the stack pointer for the new frame
	
	sw		$s0, -4($sp)
	sw		$s1, -8($sp)
	sw		$s2, -12($sp)
	subiu	$sp, $sp, 12
	
	move	$s0, $a1			#$s0 holds the size of the array
	move	$s2, $a0			#$s2 holds the address of array	
	
	bne		$s0, 1, startloop	#check to see if the size is = to 1
	move	$a0, $s2			#move $s2 into $a0
	jal		checksq			#check to see if square is magic
	j		endpermute		#exit the permutation

startloop:
	li		$s1, 0			#$s1 = 0
loop:
	move	$a0, $s2			#move $s2 into $a0 address
	move	$a1, $s0			# size into $a1 size of array
	subiu	$a1, $a1, 1		#subtract one from the size

	jal		heappermute		#call heappermute
	
	li		$t1, 2
	div		$s0, $t1			#divide size by 2
	mfhi		$t1				#move the remainder into $t1
	bne		$t1, 1, swap2		#if the remainder is not 1 branch to swap 2

	move	$a0, $s2			#move address into $a0
	move	$a1, $s0			#move size into $a1
	sub		$a1, $a1, 1		#subtract 1 from the size
	mul		$a1, $a1, 4		#multiply size by 4
	add		$a1, $a1, $a0
	jal		swap			#call the swap

	j		endswap			#skip over the swap2
swap2:
	mul		$a0, $s1, 4
	add		$a0, $a0, $s2
	move	$a1, $s0
	sub		$a1, $a1, 1
	mul		$a1, $a1, 4
	add		$a1, $a1, $s2
	
	jal		swap			#call the swap
	
endswap:
	move	$a0, $s2
	jal		printarray
	addi		$s1, $s1, 1		#increment i + 1
	bne		$s1, $s0, loop		# if i is not equal to the size	
endpermute:	
	lw		$s0, 8($sp)
	lw		$s1, 4($sp)
	lw		$s2, 0($sp)
	addiu	$sp, $sp, 12
	
	lw		$ra, -4($fp)		#restore the return address
	addiu	$sp, $fp, 4		#restore the stack pointer
	lw		$fp, 0($fp)		#restore the frame pointer
	
	jr		$ra				#return to the calling routine

printgood:
	sw		$fp, -4($sp)    	# save the frame pointer in the frame
	subiu	$fp, $sp, 4      	# Set the frame pointer to the top of the frame
	sw		$ra, -4($fp)    	# save the return address in the frame
	subiu	$sp, $fp, 4    	# set the stack pointer for the new frame
	
	move	$t0, $a0 		#temp storage for address
	
	la		$a0, printsq  	#address of the null-terminated string
	li		$v0, 4   		#prints the string
	syscall
	
	la		$a0, eol  		#address of the null-terminated string
	li		$v0, 4   		#prints the string
	syscall
	
	move	$a0, $t0
	jal		printarray
	
	lw		$ra, -4($fp)   	# restore the return address
	addiu	$sp, $fp, 4    	# restore the stack pointer
	lw		$fp,0($fp)     	# restore the frame pointer
	jr		$ra               	# return
	
exit:
	li		$v0, 10
	syscall

printarray:
	
	lw		$t0, 0($a0)	#load the values in the array into $t0 -> $t8
	lw		$t1, 4($a0)
	lw		$t2, 8($a0)
	lw		$t3, 12($a0)
	lw		$t4, 16($a0)
	lw		$t5, 20($a0)	
	lw		$t6, 24($a0)
	lw		$t7, 28($a0)
	lw		$t8, 32($a0)	
		
	li		$v0, 1	
	move	$a0, $t0
	syscall	
	
	li		$v0, 1	
	move	$a0, $t1
	syscall
	
	li		$v0, 1	
	move	$a0, $t2
	syscall	
	
	la		$a0, eol  		#address of the null-terminated string
	li		$v0, 4 		#prints the string
	syscall	
	li		$v0, 1	
	move	$a0, $t3
	syscall
		
	li		$v0, 1	
	move	$a0, $t4
	syscall
	
	li		$v0, 1	
	move	$a0, $t5
	syscall	
	
	la		$a0, eol  		#address of the null-terminated string
	li		$v0, 4   		#prints the string
	syscall	
	li		$v0, 1	
	move	$a0, $t6
	syscall	
	
	li		$v0, 1	
	move	$a0, $t7
	syscall
	
	li		$v0, 1	
	move	$a0, $t8
	syscall	
	
	la		$a0, eol 		#address of the null-terminated string
	li		$v0, 4   		#prints the string
	syscall
	
	la		$a0, eol  		#address of the null-terminated string
	li		$v0, 4   		#prints the string
	syscall
		
	
	jr		$ra	

checksq:
	sw		$fp, -4($sp)    	# save the frame pointer in the frame
	subiu	$fp, $sp, 4      	# Set the frame pointer to the top of the frame
	sw		$ra, -4($fp)    	# save the return address in the frame
	subiu	$sp, $fp, 4     	# set the stack pointer for the new frame
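
For comparison, here is a rough C++ sketch of what the assembly above appears to be doing: Heap's algorithm over a 9-element array, checking each permutation for a magic square. It is not a line-for-line translation (the assembly also prints the array and swaps on every pass), the body of `checksq` is cut off in the post, so `is_magic` is an assumption, and all names here are made up for illustration.

```cpp
#include <array>
#include <utility>

using Square = std::array<int, 9>;   // 3x3 square stored row by row

// Assumed check (the original checksq is not shown): every row,
// column, and diagonal must sum to 15.
bool is_magic(const Square& s)
{
    const int lines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   // rows
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   // columns
        {0, 4, 8}, {2, 4, 6}               // diagonals
    };
    for (const auto& line : lines)
        if (s[line[0]] + s[line[1]] + s[line[2]] != 15) return false;
    return true;
}

// Heap's algorithm: visits every permutation of the first n elements
// and adds one to `found` for each magic arrangement.
void heap_permute(Square& s, int n, int& found)
{
    if (n == 1) {
        if (is_magic(s)) ++found;
        return;
    }
    for (int i = 0; i < n - 1; ++i) {
        heap_permute(s, n - 1, found);
        // even n: swap element i with the last; odd n: swap the first with the last
        std::swap(s[n % 2 == 0 ? i : 0], s[n - 1]);
    }
    heap_permute(s, n - 1, found);
}
```

Starting from the values 1 to 9 and calling `heap_permute(s, 9, found)` leaves `found` at 8, the number of 3x3 magic squares that use 1 through 9. Whether a compiler turns this into something slower or faster than the hand-written assembly is exactly the kind of thing you would have to measure.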


Last edited on Apr 7, 2012 at 2:51am
Apr 7, 2012 at 3:37am
> I want to know which one is faster (executing-wise): loops, or writing out all of the code.

The compiler knows how to unroll loops if it is worthwhile; the writers of the compiler know at least as much about what is more efficient on a particular platform as the programmer who uses the compiler.

Write clear, simple, transparent code - that is your job. Let the compiler do the low level optimizations - that is the compiler's job.

inline bool find_it( const int a[], int sz, int value )
{
   for( int i = 0 ; i < sz ; ++i ) if( a[i] == value ) return true ;
   return false ;
}

bool foobar( int a, int b, int c, int x )
{
    const int seq[10] = { 0, 1, 2, 3, 4, 5, 6, a, b, c } ;
    return find_it( seq, 10, x ) ;
}


GCC 4.7, with -O3 -fomit-frame-pointer, unrolled the loop and generated:
__Z6foobariiii:
	subl	$48, %esp
	movl	52(%esp), %edx
	movl	64(%esp), %eax
	movl	%edx, 36(%esp)
	movl	56(%esp), %edx
	testl	%eax, %eax
	movl	%edx, 40(%esp)
	movl	60(%esp), %edx
	movl	%edx, 44(%esp)
	je	L11
	cmpl	$1, %eax
	je	L11
	cmpl	$2, %eax
	je	L11
	cmpl	$3, %eax
	.p2align 4,,2
	je	L11
	cmpl	$4, %eax
	.p2align 4,,2
	je	L11
	cmpl	$5, %eax
	.p2align 4,,2
	je	L11
	cmpl	$6, %eax
	.p2align 4,,2
	je	L11
	cmpl	36(%esp), %eax
	je	L11
	cmpl	40(%esp), %eax
	je	L11
	cmpl	%edx, %eax
	sete	%al
	addl	$48, %esp
	ret
L11:
	movl	$1, %eax
	addl	$48, %esp
	ret
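
Read back into C++, the generated code is roughly a chain of ten comparisons with no loop counter and no indexing. The sketch below puts the loop version next to a hand-unrolled one; `foobar_unrolled` is a hypothetical name for illustration, not something the compiler emits.

```cpp
// The loop version from the post, with the same behaviour.
inline bool find_it(const int a[], int sz, int value)
{
    for (int i = 0; i < sz; ++i) if (a[i] == value) return true;
    return false;
}

bool foobar(int a, int b, int c, int x)
{
    const int seq[10] = { 0, 1, 2, 3, 4, 5, 6, a, b, c };
    return find_it(seq, 10, x);
}

// Hypothetical hand-unrolled equivalent: ten comparisons in a row,
// which is roughly the shape of the assembly above.
bool foobar_unrolled(int a, int b, int c, int x)
{
    return x == 0 || x == 1 || x == 2 || x == 3 || x == 4 ||
           x == 5 || x == 6 || x == a || x == b || x == c;
}
```

Both functions return the same result for every input, so there is nothing to gain from writing the second one by hand: the compiler already gets there from the readable version.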
