Prime

On my (admittedly quite old) laptop, this simple implementation is comparable to yours if not a little faster.

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <cstdint>

using Int = uint32_t;
const Int UpTo = 100000000;
const Int L1D_CACHE_SIZE = 32768;

Int sieve(Int limit) {
    Int sqrt = (Int) std::sqrt(limit);
    Int segment_size = std::max(sqrt, L1D_CACHE_SIZE);
    Int count = (limit < 2) ? 0 : 1;
    Int i = 3, n = 3, s = 3;
    std::vector<char> sieve(segment_size), is_prime(sqrt + 1, true);
    std::vector<Int> primes, multiples;

    for (Int low = 0; low <= limit; low += segment_size) {
        std::fill(sieve.begin(), sieve.end(), true);
        Int high = low + segment_size - 1;
        high = std::min(high, limit);

        for ( ; i * i <= high; i += 2)
            if (is_prime[i])
                for (Int j = i * i; j <= sqrt; j += i)
                    is_prime[j] = false;

        for ( ; s * s <= high; s += 2)
            if (is_prime[s]) {
                primes.push_back(s);
                multiples.push_back(s * s - low);
            }

        for (Int i = 0; i < primes.size(); i++) {
            Int j = multiples[i];
            for (Int k = primes[i] * 2; j < segment_size; j += k)
                sieve[j] = false;
            multiples[i] = j - segment_size;
        }

        for ( ; n <= high; n += 2)
            if (sieve[n - low])
                count++;
    }
    return count;
}

int main() {
  clock_t t0 = clock();
  Int count = sieve(UpTo);
  std::cout << count << " primes found in "
            << double(clock() - t0) / CLOCKS_PER_SEC << " seconds.\n";
}

Edit & run on cpp.sh

Mar 8, 2019 at 6:00am

this simple implementation is comparable to yours if not a little faster

Comparable to mine? No, i) mine is a first try with almost no knowledge of C++, ii) I do not (not yet) grasp all details in yours, it is not beginners' level, and iii) yours is faster.

Before I bore you with silly questions I'll try to dig into it on my own. I'm still not finished with the tutorials of this site. In an other example I found something beyond my level, here an snippet from https://github.com/kimwalisch/primesieve/blob/master/src/EratSmall.cpp#L123

    switch (wheelIndex)
    {
      for (;;)
      {
        case 0: ...

The for (;;) is at first glance astonishing but corresponds to do forever in REXX. What puzzles me is, that this for (;;) is injected between switch (wheelIndex) and successive gridiron of case 0: to n.
Where may I find information about such kind of coding? Are the tutorials of this site enough?

Last edited on Mar 8, 2019 at 6:00am

Mar 8, 2019 at 7:15am

I thought you were saying that your code was particularly fast. It's certainly complicated! I didn't realize you were a beginner, but now that I look through it I can see some clues.

I wouldn't say for(;;) is "astonishing". You can leave out any of the three parts. If you leave out the middle part (the condition) then it defaults to true. So for(;;) is another way of saying while(true).

As for the loop in the odd position inside the switch, the switch will jump to the case label, just like a goto. The cases don't have breaks so they will "fall through". And they are inside the infinite loop so they will fall through and then around again until the condition that they are testing causes one of them to execute a goto, which jumps to one of the labels that are created with the UPDATE_SIEVING_PRIME macros. The macros also create continue statements, which cause a jump back to the head of the outer loop.

Here is a simplified version of what is going on. The OUT macro, when called with, e.g., OUT(2), will expand to this:

1
2
3

    out2:    // the ## appends the argument to the preceding token
    cout << "out" "2" "\n";    // the # "stringifies" the argument (and adjacent string literals are concatenated)
    continue;

#include <iostream>

#define OUT(x) \
    out ## x: \
    cout << "out" #x "\n"; \
    continue;

int main()
{
    using std::cout;
    for (int i = 0; i < 15; ++i)
    {
        int n = 0;
        switch (i % 6)
        {
            for (;;)
            {
                case 0:
                    cout << "0 ";
                    if (n++ > i) goto out0;
                case 1:
                    cout << "1 ";
                    if (n++ > i) goto out1;
                case 2:
                    cout << "2 ";
                    if (n++ > i) goto out2;
            }

            OUT(0);
            OUT(1);
            OUT(2);

            for (;;)
            {
                case 3:
                    cout << "3 ";
                    if (n++ > i) goto out3;
                case 4:
                    cout << "4 ";
                    if (n++ > i) goto out4;
                case 5:
                    cout << "5 ";
                    if (n++ > i) goto out5;
            }

            OUT(3);
            OUT(4);
            OUT(5);
        }
    }
}

Edit & run on cpp.sh

Mar 8, 2019 at 8:55am

Thank you for your effort to explain me this. For the first line of output I see the reasons, alas already the second not.

0 1 out1
1 2 0 out0

So I describe the single steps as I see it from switch (i % 6) on:
i is 0, i % 6 also, so switch jumps to case 0: (without execution of for(;;)?), cout puts out "0 ", n++ > i is false as n is incremented after the comparison, so without break the execution falls through to case 1:, cout appends "1 ", now n++ > i is true and so the goto out1 is executed. (Well, it took a while until I found the corresponding label.) Next cout appends "out1\n"... next statement is continue; ... and I am lost.

Next line in output shows 1 in first place, that would mean the continue; triggers an iteration of the loop before switch. This for (int i = 0; i < 15; ++i) is the only spot where i is changed. For me, new to C++, this is challenging. Why does continue; skip over the for(;;) loop? What information do I miss? Without it I am not able neither to "read" such routines nor to code this way.
(<grin> the more I know about C++ I like "my" REXX -- https://en.wikipedia.org/wiki/Rexx)

Again, thank you for your time.

Mar 8, 2019 at 12:18pm

continue is a probably sadly under-utilized keyword.
it 'resets' the loop. that is, it skips the loop body (from where it is forward) and returns to the loop header to execute the next iteration of the loop (if there would be one). Its one of those things they don't get to in beginner material for quite a long while because as handy as it is, real life use is infrequent for most types of code (but its really handy in math heavy code).

by the way, your own code could benefit from partial for loops.
you have one with a bogus n=n, you could have just said for(; n< cond; n++) instead.

also you have excess return statements. They don't hurt, but the return at the end of a void function is optional and almost no one would put it in.

Last edited on Mar 8, 2019 at 12:24pm

Mar 8, 2019 at 2:38pm

continue is a probably sadly under-utilized keyword.

In REXX it's called iterate and there I use it when appropriate. It passes control to the start of the innermost loop -- but(!) giving a name as argument, controll goes to the loop assisiated with that name. So you may "break out" from an inner loop, or even several nested loops.

This is the reason why I was puzzled by the execution sequence of the above example of dutch. But with a thorough look at the { ... } braces, the first execcuted goto jumps out of the for(;;)-loop, so the continue; acts on the for (int i = 0; i < 15; ++i)-loop. Sorry, it took a while until I saw it.

Mar 8, 2019 at 2:43pm

no need to be sorry, its probably a big mental adjustment learning a very different type of syntax for c++. All in all you are doing well. The code you did here is pretty quick (yea, it could be better, but it could be a LOT worse too) and most of the things you did that are weird are very minor.

my pc must be better than I thought it was. I got the OP's in .125 on my gaming home pc. I got a few .109s but .125 was the most consistent result.

I was attacking that pf function to see if I could get it in a line or 2 without jumps but I apparently noobed it up somewhere. I was hoping it would give some solid returns with a rewrite with a microscopic inline footprint.

Last edited on Mar 8, 2019 at 2:47pm

Mar 8, 2019 at 3:07pm

I was attacking that pf function

Is there a chance to messure where the running program spends the most time? I once used such a tool for FORTRAN to see where in the code some fiddling is promising.

Mar 8, 2019 at 3:51pm

You say "without execution of for(;;)?", but there is no "execution" of for(;;) since there's nothing for it to do. It just defines a point that will be jumped back to. "Skipping" it by jumping to a label means nothing. Think about the underlying assembly code.

EDIT: I forgot to put the switch part into my pseudo-assembly code! It's there now (kind of).

    set i to 0
outer_for_head:
    compare i to 15
    jump to end_outer_for if compare was greater or equal

    # switch
    mod i and 6
    jump to case0 if it's 0
    etc...

        inner_for_head0:
            case0:
                output "0 "
                compare n to i
                increment n
                jump to out0 if compare was greater
            case1:
                output "1 "
                compare n to i
                increment n
                jump to out1 if compare was greater
            case2:
                output "2 "
                compare n to i
                increment n
                jump to out2 if compare was greater
        inner_for_tail0:
                jump to inner_for_head0

            out0:
                output "0 out\n"
                jump to outer_for_tail
            out1:
                output "1 out\n"
                jump to outer_for_tail
            out2:
                output "2 out\n"
                jump to outer_for_tail

        inner_for_head1:
            etc (similar to the first inner for)
            ...
            ...
        inner_for_tail1:
            jump to inner_for_head1

    end_switch:

outer_for_tail:
    increment i
    jump to outer_for_head
end_outer_for:

Last edited on Mar 8, 2019 at 4:33pm

Mar 8, 2019 at 3:54pm

For profiling under linux you can use gprof.

Mar 8, 2019 at 4:25pm

I don't currently have a good setup, I don't have a debugger nor a profiler, just a text editor and compiler. I need to at least get my home setup right (work, we don't use much c++ so I am stuck with the crude tools). I was targeting that function via educated guesswork, not actual evidence that it would help. That is not really a sound approach, but we are playing here, after all.

Last edited on Mar 8, 2019 at 4:26pm

Mar 8, 2019 at 4:50pm

but we are playing here

Yes, we do. If it was for serious, I would attend a training.

Mar 8, 2019 at 4:58pm

Hello dutch!
Thank you for your pseudo-assembly code. I assume there is a compiler option that shows what real assembly code it generated, human readable preferably.

Edit:

You say "without execution of for(;;)?", but there is no "execution" of for(;;) since there's nothing for it to do.

No "execution"? At least something similar like do forever or do while .true. what checks at the start or end of the loop if execution of the block hast to be repeated once ore or not. Could it be, as C++ is compiled, that this stucture is already defined during compilation? Nevertheless -- and it's still shaking me -- the switch (i % 6) jumps to case 0: ... in other words, in the middle of a repetitive loop/block/name-it ... and it works. Well, different languages, a bit different ideas.

Last edited on Mar 8, 2019 at 6:15pm

Mar 8, 2019 at 5:07pm

yes, the compiler will generate assembly. Human readable is another story, but it at least dumps the line of code represented by the asm block as a comment. Compilers make... weird assembly at times.

Last edited on Mar 8, 2019 at 5:07pm

Mar 8, 2019 at 5:09pm

Of course you can get the assembly from the compiler (with g++ it's the -S switch), but it's much more complicated:

	.file	"switchloop.cpp"
	.section	.rodata
	.type	_ZStL19piecewise_construct, @object
	.size	_ZStL19piecewise_construct, 1
_ZStL19piecewise_construct:
	.zero	1
	.local	_ZStL8__ioinit
	.comm	_ZStL8__ioinit,1,1
.LC0:
	.string	"0 "
.LC1:
	.string	"1 "
.LC2:
	.string	"2 "
.LC3:
	.string	"out0\n"
.LC4:
	.string	"out1\n"
.LC5:
	.string	"out2\n"
.LC6:
	.string	"3 "
.LC7:
	.string	"4 "
.LC8:
	.string	"5 "
.LC9:
	.string	"out3\n"
.LC10:
	.string	"out4\n"
.LC11:
	.string	"out5\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB1383:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	$0, -8(%rbp)
.L19:
	cmpl	$14, -8(%rbp)
	jg	.L2
	movl	$0, -4(%rbp)
	movl	-8(%rbp), %ecx
	movl	$715827883, %edx
	movl	%ecx, %eax
	imull	%edx
	movl	%ecx, %eax
	sarl	$31, %eax
	subl	%eax, %edx
	movl	%edx, %eax
	addl	%eax, %eax
	addl	%edx, %eax
	addl	%eax, %eax
	subl	%eax, %ecx
	movl	%ecx, %edx
	cmpl	$5, %edx
	ja	.L3
	movl	%edx, %eax
	movq	.L5(,%rax,8), %rax
	jmp	*%rax
	.section	.rodata
	.align 8
	.align 4
.L5:
	.quad	.L4
	.quad	.L6
	.quad	.L7
	.quad	.L8
	.quad	.L9
	.quad	.L10
	.text
.L29:
	nop
.L4:
	movl	$.LC0, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L27
.L6:
	movl	$.LC1, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L28
.L7:
	movl	$.LC2, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	je	.L29
	jmp	.L23
.L27:
	nop
.L11:
	movl	$.LC3, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L28:
	nop
.L12:
	movl	$.LC4, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L23:
.L14:
	movl	$.LC5, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L32:
	nop
.L8:
	movl	$.LC6, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L30
.L9:
	movl	$.LC7, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	jne	.L31
.L10:
	movl	$.LC8, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	movl	-4(%rbp), %eax
	leal	1(%rax), %edx
	movl	%edx, -4(%rbp)
	cmpl	-8(%rbp), %eax
	setg	%al
	testb	%al, %al
	je	.L32
	jmp	.L26
.L30:
	nop
.L15:
	movl	$.LC9, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L31:
	nop
.L16:
	movl	$.LC10, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	jmp	.L3
.L26:
.L18:
	movl	$.LC11, %esi
	movl	$_ZSt4cout, %edi
	call	_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	nop
.L3:
	addl	$1, -8(%rbp)
	jmp	.L19
.L2:
	movl	$0, %eax
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1383:
	.size	main, .-main
	.type	_Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
.LFB1567:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	%edi, -4(%rbp)
	movl	%esi, -8(%rbp)
	cmpl	$1, -4(%rbp)
	jne	.L35
	cmpl	$65535, -8(%rbp)
	jne	.L35
	movl	$_ZStL8__ioinit, %edi
	call	_ZNSt8ios_base4InitC1Ev
	movl	$__dso_handle, %edx
	movl	$_ZStL8__ioinit, %esi
	movl	$_ZNSt8ios_base4InitD1Ev, %edi
	call	__cxa_atexit
.L35:
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1567:
	.size	_Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
	.type	_GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1568:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$65535, %esi
	movl	$1, %edi
	call	_Z41__static_initialization_and_destruction_0ii
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1568:
	.size	_GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
	.section	.init_array,"aw"
	.align 8
	.quad	_GLOBAL__sub_I_main
	.hidden	__dso_handle
	.ident	"GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609"
	.section	.note.GNU-stack,"",@progbits

Mar 8, 2019 at 10:30pm

Interleaved with the source code it shows which statement provoked what. In the following excerpt the for(,,) in question is related to exactly -- nothing, as you said.

* * * * * snip * * * * *
; 14   :         switch (i % 6)

	mov	eax, DWORD PTR _i$23046[ebp]
	cdq
	mov	ecx, 6
	idiv	ecx
	mov	DWORD PTR tv67[ebp], edx
	cmp	DWORD PTR tv67[ebp], 5
	ja	$LN23@main
	mov	edx, DWORD PTR tv67[ebp]
	jmp	DWORD PTR $LN42@main[edx*4]
$LN20@main:

; 15   :         {
; 16   :             for (;;)
; 17   :             {
; 18   :                 case 0:
; 19   :                     cout << "0 ";
* * * * * snap * * * * *

Mar 9, 2019 at 1:30am