Prime

Pages: 123
I enlarged the index vektor by 152 items to keep outside the undefined nowhere and removed all the if (m <= x), alas with no success:

From 1..100000000 found 5761455 primes in 0.265854 seconds.


While running my routine several times to find the best time it happens every now and then that I get the result from others, once I even had to play tic-tac-toe.
On my (admittedly quite old) laptop, this simple implementation is comparable to yours if not a little faster.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <cstdint>

using Int = uint32_t;
const Int UpTo = 100000000;
const Int L1D_CACHE_SIZE = 32768;

Int sieve(Int limit) {
    Int sqrt = (Int) std::sqrt(limit);
    Int segment_size = std::max(sqrt, L1D_CACHE_SIZE);
    Int count = (limit < 2) ? 0 : 1;
    Int i = 3, n = 3, s = 3;
    std::vector<char> sieve(segment_size), is_prime(sqrt + 1, true);
    std::vector<Int> primes, multiples;

    for (Int low = 0; low <= limit; low += segment_size) {
        std::fill(sieve.begin(), sieve.end(), true);
        Int high = low + segment_size - 1;
        high = std::min(high, limit);

        for ( ; i * i <= high; i += 2)
            if (is_prime[i])
                for (Int j = i * i; j <= sqrt; j += i)
                    is_prime[j] = false;

        for ( ; s * s <= high; s += 2)
            if (is_prime[s]) {
                primes.push_back(s);
                multiples.push_back(s * s - low);
            }

        for (Int i = 0; i < primes.size(); i++) {
            Int j = multiples[i];
            for (Int k = primes[i] * 2; j < segment_size; j += k)
                sieve[j] = false;
            multiples[i] = j - segment_size;
        }

        for ( ; n <= high; n += 2)
            if (sieve[n - low])
                count++;
    }
    return count;
}

int main() {
  clock_t t0 = clock();
  Int count = sieve(UpTo);
  std::cout << count << " primes found in "
            << double(clock() - t0) / CLOCKS_PER_SEC << " seconds.\n";
}

this simple implementation is comparable to yours if not a little faster


Comparable to mine? No, i) mine is a first try with almost no knowledge of C++, ii) I do not (not yet) grasp all details in yours, it is not beginners' level, and iii) yours is faster.

Before I bore you with silly questions I'll try to dig into it on my own. I'm still not finished with the tutorials of this site. In an other example I found something beyond my level, here an snippet from https://github.com/kimwalisch/primesieve/blob/master/src/EratSmall.cpp#L123
1
2
3
4
5
    switch (wheelIndex)
    {
      for (;;)
      {
        case 0: ...

The for (;;) is at first glance astonishing but corresponds to do forever in REXX. What puzzles me is, that this for (;;) is injected between switch (wheelIndex) and successive gridiron of case 0: to n.
Where may I find information about such kind of coding? Are the tutorials of this site enough?
Last edited on
I thought you were saying that your code was particularly fast. It's certainly complicated! I didn't realize you were a beginner, but now that I look through it I can see some clues.

I wouldn't say for(;;) is "astonishing". You can leave out any of the three parts. If you leave out the middle part (the condition) then it defaults to true. So for(;;) is another way of saying while(true).

As for the loop in the odd position inside the switch, the switch will jump to the case label, just like a goto. The cases don't have breaks so they will "fall through". And they are inside the infinite loop so they will fall through and then around again until the condition that they are testing causes one of them to execute a goto, which jumps to one of the labels that are created with the UPDATE_SIEVING_PRIME macros. The macros also create continue statements, which cause a jump back to the head of the outer loop.

Here is a simplified version of what is going on. The OUT macro, when called with, e.g., OUT(2), will expand to this:
1
2
3
    out2:    // the ## appends the argument to the preceding token
    cout << "out" "2" "\n";    // the # "stringifies" the argument (and adjacent string literals are concatenated)
    continue;


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#include <iostream>

#define OUT(x) \
    out ## x: \
    cout << "out" #x "\n"; \
    continue;

int main()
{
    using std::cout;
    for (int i = 0; i < 15; ++i)
    {
        int n = 0;
        switch (i % 6)
        {
            for (;;)
            {
                case 0:
                    cout << "0 ";
                    if (n++ > i) goto out0;
                case 1:
                    cout << "1 ";
                    if (n++ > i) goto out1;
                case 2:
                    cout << "2 ";
                    if (n++ > i) goto out2;
            }

            OUT(0);
            OUT(1);
            OUT(2);

            for (;;)
            {
                case 3:
                    cout << "3 ";
                    if (n++ > i) goto out3;
                case 4:
                    cout << "4 ";
                    if (n++ > i) goto out4;
                case 5:
                    cout << "5 ";
                    if (n++ > i) goto out5;
            }

            OUT(3);
            OUT(4);
            OUT(5);
        }
    }
}

Thank you for your effort to explain me this. For the first line of output I see the reasons, alas already the second not.
0 1 out1
1 2 0 out0

So I describe the single steps as I see it from switch (i % 6) on:
i is 0, i % 6 also, so switch jumps to case 0: (without execution of for(;;)?), cout puts out "0 ", n++ > i is false as n is incremented after the comparison, so without break the execution falls through to case 1:, cout appends "1 ", now n++ > i is true and so the goto out1 is executed. (Well, it took a while until I found the corresponding label.) Next cout appends "out1\n"... next statement is continue; ... and I am lost.

Next line in output shows 1 in first place, that would mean the continue; triggers an iteration of the loop before switch. This for (int i = 0; i < 15; ++i) is the only spot where i is changed. For me, new to C++, this is challenging. Why does continue; skip over the for(;;) loop? What information do I miss? Without it I am not able neither to "read" such routines nor to code this way.
(<grin> the more I know about C++ I like "my" REXX -- https://en.wikipedia.org/wiki/Rexx)

Again, thank you for your time.
continue is a probably sadly under-utilized keyword.
it 'resets' the loop. that is, it skips the loop body (from where it is forward) and returns to the loop header to execute the next iteration of the loop (if there would be one). Its one of those things they don't get to in beginner material for quite a long while because as handy as it is, real life use is infrequent for most types of code (but its really handy in math heavy code).

by the way, your own code could benefit from partial for loops.
you have one with a bogus n=n, you could have just said for(; n< cond; n++) instead.

also you have excess return statements. They don't hurt, but the return at the end of a void function is optional and almost no one would put it in.


Last edited on
continue is a probably sadly under-utilized keyword.

In REXX it's called iterate and there I use it when appropriate. It passes control to the start of the innermost loop -- but(!) giving a name as argument, controll goes to the loop assisiated with that name. So you may "break out" from an inner loop, or even several nested loops.

This is the reason why I was puzzled by the execution sequence of the above example of dutch. But with a thorough look at the { ... } braces, the first execcuted goto jumps out of the for(;;)-loop, so the continue; acts on the for (int i = 0; i < 15; ++i)-loop. Sorry, it took a while until I saw it.
no need to be sorry, its probably a big mental adjustment learning a very different type of syntax for c++. All in all you are doing well. The code you did here is pretty quick (yea, it could be better, but it could be a LOT worse too) and most of the things you did that are weird are very minor.

my pc must be better than I thought it was. I got the OP's in .125 on my gaming home pc. I got a few .109s but .125 was the most consistent result.

I was attacking that pf function to see if I could get it in a line or 2 without jumps but I apparently noobed it up somewhere. I was hoping it would give some solid returns with a rewrite with a microscopic inline footprint.
Last edited on
I was attacking that pf function

Is there a chance to messure where the running program spends the most time? I once used such a tool for FORTRAN to see where in the code some fiddling is promising.
You say "without execution of for(;;)?", but there is no "execution" of for(;;) since there's nothing for it to do. It just defines a point that will be jumped back to. "Skipping" it by jumping to a label means nothing. Think about the underlying assembly code.

EDIT: I forgot to put the switch part into my pseudo-assembly code! It's there now (kind of).

    set i to 0
outer_for_head:
    compare i to 15
    jump to end_outer_for if compare was greater or equal

    # switch
    mod i and 6
    jump to case0 if it's 0
    etc...

        inner_for_head0:
            case0:
                output "0 "
                compare n to i
                increment n
                jump to out0 if compare was greater
            case1:
                output "1 "
                compare n to i
                increment n
                jump to out1 if compare was greater
            case2:
                output "2 "
                compare n to i
                increment n
                jump to out2 if compare was greater
        inner_for_tail0:
                jump to inner_for_head0

            out0:
                output "0 out\n"
                jump to outer_for_tail
            out1:
                output "1 out\n"
                jump to outer_for_tail
            out2:
                output "2 out\n"
                jump to outer_for_tail

        inner_for_head1:
            etc (similar to the first inner for)
            ...
            ...
        inner_for_tail1:
            jump to inner_for_head1

    end_switch:

outer_for_tail:
    increment i
    jump to outer_for_head
end_outer_for:

Last edited on
For profiling under linux you can use gprof.
I don't currently have a good setup, I don't have a debugger nor a profiler, just a text editor and compiler. I need to at least get my home setup right (work, we don't use much c++ so I am stuck with the crude tools). I was targeting that function via educated guesswork, not actual evidence that it would help. That is not really a sound approach, but we are playing here, after all.
Last edited on
but we are playing here

Yes, we do. If it was for serious, I would attend a training.
Hello dutch!
Thank you for your pseudo-assembly code. I assume there is a compiler option that shows what real assembly code it generated, human readable preferably.

Edit:
You say "without execution of for(;;)?", but there is no "execution" of for(;;) since there's nothing for it to do.

No "execution"? At least something similar like do forever or do while .true. what checks at the start or end of the loop if execution of the block hast to be repeated once ore or not. Could it be, as C++ is compiled, that this stucture is already defined during compilation? Nevertheless -- and it's still shaking me -- the switch (i % 6) jumps to case 0: ... in other words, in the middle of a repetitive loop/block/name-it ... and it works. Well, different languages, a bit different ideas.
Last edited on
yes, the compiler will generate assembly. Human readable is another story, but it at least dumps the line of code represented by the asm block as a comment. Compilers make... weird assembly at times.
Last edited on
Of course you can get the assembly from the compiler (with g++ it's the -S switch), but it's much more complicated:

	.file	"switchloop.cpp"
	.section	.rodata
	.type	_ZStL19piecewise_construct, @object
	.size	_ZStL19piecewise_construct, 1
_ZStL19piecewise_construct:
	.zero	1
	.local	_ZStL8__ioinit
	.comm	_ZStL8__ioinit,1,1
.LC0:
	.string	"0 "
.LC1:
	.string	"1 "
.LC2:
	.string	"2 "
.LC3:
	.string	"out0\n"
.LC4:
	.string	"out1\n"
.LC5:
	.string	"out2\n"
.LC6:
	.string	"3 "
.LC7:
	.string	"4 "
.LC8:
	.string	"5 "
.LC9:
	.string	"out3\n"
.LC10:
	.string	"out4\n"
.LC11:
	.string	"out5\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB1383:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	$0, -8(%rbp)
.L19:
	cmpl	$14, -8(%rbp)
	jg	.L2
	movl	$0, -4(%rbp)
	movl	-8(%rbp), %ecx
	movl	$715827883, %edx
	movl	%ecx, %eax
	imull	%edx
	movl	%ecx, %eax
	sarl	$31, %eax
	subl	%eax, %edx
	movl	%edx, %eax
	addl	%eax, %eax
	addl	%edx, %eax
	addl	%eax, %eax
	subl	%eax, %ecx
	movl	%ecx, %edx
	cmpl	$5, %edx
	ja	.L3
	movl	%edx, %eax
	movq	.L5(,%rax,8), %rax
	jmp	*%rax
	.section	.rodata
	.align 8
	.align 4
.L5:
	.quad	.L4
	.quad	.L6
	.quad	.L7
	.quad	.L8
	.quad	.L9
	.quad	.L10
	.text
.L29:
	nop
.L4:
	movl	$.LC0, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L27
.L6:
	movl	$.LC1, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L28
.L7:
	movl	$.LC2, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	je	.L29
	jmp	.L23
.L27:
	nop
.L11:
	movl	$.LC3, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L28:
	nop
.L12:
	movl	$.LC4, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L23:
.L14:
	movl	$.LC5, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L32:
	nop
.L8:
	movl	$.LC6, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L30
.L9:
	movl	$.LC7, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L31
.L10:
	movl	$.LC8, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	je	.L32
	jmp	.L26
.L30:
	nop
.L15:
	movl	$.LC9, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L31:
	nop
.L16:
	movl	$.LC10, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L26:
.L18:
	movl	$.LC11, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	nop
.L3:
	addl	$1, -8(%rbp)
	jmp	.L19
.L2:
	movl	$0, %eax
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1383:
	.size	main, .-main
	.type	_Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
.LFB1567:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	%edi, -4(%rbp)
	movl	%esi, -8(%rbp)
	cmpl	$1, -4(%rbp)
	jne	.L35
	cmpl	$65535, -8(%rbp)
	jne	.L35
	movl	$_ZStL8__ioinit, %edi
	call	_ZNSt8ios_base4InitC1Ev
	movl	$__dso_handle, %edx
	movl	$_ZStL8__ioinit, %esi
	movl	$_ZNSt8ios_base4InitD1Ev, %edi
	call	__cxa_atexit
.L35:
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1567:
	.size	_Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
	.type	_GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1568:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$65535, %esi
	movl	$1, %edi
	call	_Z41__static_initialization_and_destruction_0ii
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1568:
	.size	_GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
	.section	.init_array,"aw"
	.align 8
	.quad	_GLOBAL__sub_I_main
	.hidden	__dso_handle
	.ident	"GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609"
	.section	.note.GNU-stack,"",@progbits
Interleaved with the source code it shows which statement provoked what. In the following excerpt the for(,,) in question is related to exactly -- nothing, as you said.
* * * * * snip * * * * *
; 14   :         switch (i % 6)

	mov	eax, DWORD PTR _i$23046[ebp]
	cdq
	mov	ecx, 6
	idiv	ecx
	mov	DWORD PTR tv67[ebp], edx
	cmp	DWORD PTR tv67[ebp], 5
	ja	$LN23@main
	mov	edx, DWORD PTR tv67[ebp]
	jmp	DWORD PTR $LN42@main[edx*4]
$LN20@main:

; 15   :         {
; 16   :             for (;;)
; 17   :             {
; 18   :                 case 0:
; 19   :                     cout << "0 ";
* * * * * snap * * * * *
One more try, now vector replaced by bitset in the hope its count() function is optimized:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
#include <iostream>		/* cout, cin, endl */
#include <ctime>		/* clock */
#include <cmath>		/* sqrt */
#include <bitset>
using namespace std;

unsigned long const z(100000000);
bitset<((z-2)/30+20)*8> p(string("0")); // +20 * 8 for a simpler set up 

// from bitset index to int number
unsigned long i2n(unsigned long i)
{
    unsigned long b;
	b = i % 8;
	switch (b) {
		case 0:
			b = 7;
			break;
		case 1:
			b = 11;
			break;
		case 2:
			b = 13;
			break;
		case 3:
			b = 17;
			break;
		case 4:
			b = 19;
			break;
		case 5:
			b = 23;
			break;
		case 6:
			b = 29;
			break;
		case 7:
			b = 31;
			break;
	}
	return 30 * (i / 8) + b;
}

// sieve out single number in bitset
void pf(unsigned long n)
{
    unsigned long b;
	b = n % 30;
	switch (b) {
		case 7:
			b = 0;
			break;
		case 11:
			b = 1;
			break;
		case 13:
			b = 2;
			break;
		case 17:
			b = 3;
			break;
		case 19:
			b = 4;
			break;
		case 23:
			b = 5;
			break;
		case 29:
			b = 6;
			break;
		case 1:
			n -= 2;
			b = 7;
			break;
		default:
			return;
	}
	p[(n / 30) * 8 + b] = false;
	return;
}

int main()
{
	unsigned long n, c, i, rz, m, x;

	clock_t t0 = clock();		/* time(R) */
	
	p.flip();
	
	rz = ((sqrt(z+.0)-2)/30)*8 + 1;		/* roundabout */
	x = p.size() - 1;       /* it is few bits too big */
    for (; i2n(x) > z; --x)
        p[x] = false;
    
    for (m = 12; m <= x; m+=12)     // \7
    {
        p[m] = false; m+=7;
        p[m] = false; m+=4;
        p[m] = false; m+=7;
        p[m] = false; m+=4;
        p[m] = false; m+=7;
        p[m] = false; m+=12;
        p[m] = false; m+=3;
        p[m] = false;
    }

    for (m = 31; m <= x; m+=12)     // \11
    {
        p[m] = false; m+=6;
        p[m] = false; m+=11;
        p[m] = false; m+=6;
        p[m] = false; m+=12;
        p[m] = false; m+=18;
        p[m] = false; m+=5;
        p[m] = false; m+=18;
        p[m] = false;
    }

    for (m = 44; m <= x; m+=7)     // \13
    {
        p[m] = false; m+=13;
        p[m] = false; m+=7;
        p[m] = false; m+=14;
        p[m] = false; m+=21;
        p[m] = false; m+=7;
        p[m] = false; m+=21;
        p[m] = false; m+=14;
        p[m] = false;
    }

    for (m = 76; m <= x; m+=19)     // \17
    {
        p[m] = false; m+=9;
        p[m] = false; m+=18;
        p[m] = false; m+=27;
        p[m] = false; m+=9;
        p[m] = false; m+=27;
        p[m] = false; m+=18;
        p[m] = false; m+=9;
        p[m] = false;
    }

    for (m = 95; m <= x; m+=10)     // \19
    {
        p[m] = false; m+=20;
        p[m] = false; m+=30;
        p[m] = false; m+=11;
        p[m] = false; m+=30;
        p[m] = false; m+=20;
        p[m] = false; m+=10;
        p[m] = false; m+=21;
        p[m] = false;
    }

	for (n = 5; n <= rz; ++n)	/* only up to sqrt(z) */
	{
		if (p[n])				/* candidate is prime */
		{
			m = i2n(n);
			for (i = m*m; i <= z; i += m+m)
				pf(i);			/* sieve out multiple */
		}
	}

/* sieving is done, all remaining are prime */
    c = p.count() + 3;      /* +3 for 2, 3, and 5 */

	double te = (double)( clock() - t0 ) / CLOCKS_PER_SEC;  /* time(E) */

	cout << "From 1.." << z << " found " << c << " primes in " << te << " seconds." << endl;
}

From 1..100000000 found 5761455 primes in 0.244158 seconds.

Not that much better, but for me another little mosaic stone for the tessellated work.

Edit: the simpler set up tripped on nonexistent bitset bits.
Last edited on
Topic archived. No new replies allowed.
Pages: 123