Seive Sundaram MUCH faster than Eratosthenes?

So I implemented these two seives to test out which is faster, and was very surprised to find that the seive of sundaram is insanely faster than eratosthenes. In fact, the difference in timing between the two was so great, that I am a bit suspicious of the results.

From what I've read, it seemed that the difference between the two was not so great, and many even said seive of eratosthenes is faster, so Im a bit puzzled about my results.

Is my implementation okay?
Note: I commented out the printing for both functions. I only wanted to time the actual "seiving", not the printing.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
 #include <iostream>
#include <set>
#include <cmath>
#include <ctime>
#include <cstdlib>
using namespace std;

void seiveEra(set<int>&, int);
void seiveSundaram(set<int>&, int);
int main()
{
    cout << "Enter n: ";
    int n;
    cin >> n;

    set<int> S;

    clock_t start, stop;
    double duration;

    start = clock();
    seiveEra(S, n);
    stop = clock();

    duration = (double)(stop-start)/CLOCKS_PER_SEC;
    cout << "Eratosthenes took " << duration << " seconds\n\n";


    start = clock();
    seiveSundaram(S, n);
    stop = clock();

    duration = (double)(stop-start)/CLOCKS_PER_SEC;
    cout << "Sundaram took " << duration << " seconds\n\n";


}

void seiveEra(set<int>& S, int n)
{
    for(int i = 2; i<=n; ++i)
        S.insert(i);

    for(int i = 2; i<=sqrt(n); ++i)
    {
        if(S.find(i) == S.end()) continue;

        for(int j = i*2; j<=n; ++j)
        {
            if(S.find(j) == S.end()) continue;
            if(j%i==0)
                S.erase(j);
        }
    }

    /************PRINT***********************/

    /*int i = 1;
    cout << "\nSEIVE OF ERATOSTHENES\n";
    cout << "All prime numbers up to " << n << " are: \n\n";

    for(set<int>::iterator it = S.begin(); it != S.end(); ++it)
    {
        cout << *it << " ";
        if(i%10==0) cout << "\n";
        i++;
    }
    cout << "\n\n";*/
}

void seiveSundaram(set<int>& S, int n)
{
    int N = (n-1)/2;

    for(int i = 1; i<= N; ++i)
        S.insert(i);

    for(int i=1; i<=n; ++i)
    {
        for(int j = i; j<((n-i)/(2*i+1)); ++j)
        {
            S.erase(i+j+2*i*j);
        }

    }

    /************PRINT***********************/
    /*int i = 2;
    cout << "\nSEIVE OF SUNDARAM\n";
    cout << "All prime numbers up to " << n << " are: \n\n";
    cout << 2 << " ";
    for(set<int>::iterator it = S.begin(); it != S.end(); ++it)
    {
        cout << 2*(*it)+1 << " ";
        if(i%10==0) cout << "\n";
        i++;
    }*/

    //cout << "\n\n";

}


As n got larger, the difference in timing between the two became greater. Here are my results:

n = 1000
sundaram = .001 s
Eratosthenes = .003 s

n = 10,000
sundaram = .009 s
Eratosthenes = .068 s

n = 100,000
sundaram = .131 s
eratosthenes = 1.907

n = 500,000
sundaram = .868 s
eratosthenes = 19.535 s

n = 1,000,000
sundaram = 1.923 s
eratosthenes = 54.696 s
Last edited on
With std::set, both insert() and find() are O( log N )
Use std::vector instead (the specialisation for std::vector<bool> saves space, but element access would be slower).

With reasonably naive implementations, in practice there would not be a startling difference in performance.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
#include <iostream>
#include <vector>
#include <ctime>

std::vector<std::size_t> primes_till_s( std::size_t n ) // sundaram
{
    const std::size_t m = n / 2 ;
    std::vector<int> sieve( m, true ) ;
    for( std::size_t i = 1 ; i < m ; ++i )
    {
        const std::size_t l = (m-i) / ( 2*i + 1 ) ;
        for( std::size_t j = i ; j <= l ; ++j ) sieve[ i + j + 2*i*j ] = false ;
    }

    std::vector<std::size_t> primes ;
    primes.push_back(2) ;
    for( unsigned i = 1 ; i < m ; ++i ) if( sieve[i] ) primes.push_back( i*2 + 1 ) ;
    return primes ;
}

std::vector<std::size_t> primes_till_e( std::size_t n ) // eratosthenes
{
    std::vector<int> sieve( n, true ) ;
    for( std::size_t i=2 ; i<n ; ++i ) if( sieve[i] )
    {
        for( std::size_t j = i*i ; j<n ; j += i ) sieve[j] = false ;
    }

    std::vector<std::size_t> primes ;
    for( std::size_t i = 2 ; i < n ; ++i ) if( sieve[i] ) primes.push_back(i) ;
    return primes ;
}

int main()
{
    const std::size_t n = 32'000'000 ;

    {
        const auto start = std::clock() ;
        const auto last = primes_till_s(n).back() ;
        const auto end = std::clock() ;
        std::cout << "    sieve of sundaram: last prime < " << n << " == " << last << "   processor time: "
                  << (end-start) * 1000.0 / CLOCKS_PER_SEC << " milliseconds.\n" ;
    }

    {
        const auto start = std::clock() ;
        const auto last = primes_till_e(n).back() ;
        const auto end = std::clock() ;
        std::cout << "sieve of eratosthenes: last prime < " << n << " == " << last << "   processor time: "
                  << (end-start) * 1000.0 / CLOCKS_PER_SEC << " milliseconds.\n" ;
    }
}

http://coliru.stacked-crooked.com/a/860020927ba06cb9

In case you haven't seen it, sieve of Atkin: https://en.wikipedia.org/wiki/Sieve_of_Atkin
In eratosthenes you were doing
1
2
3
4
5
6
7
for(int i = 2; i<=sqrt(n); ++i)
        for(int j = i*2; j<=n; ++j)
        {
            if(S.find(j) == S.end()) continue;
            if(j%i==0)
                S.erase(j);
        }
You are stepping on all the values and then check if they are multiple of `i'.
That'll give you O(n^2)

However you may do j = i*i ; j<n ; j += i, stepping only on the multiples. Then the number of pass would follow the harmonic series, giving you O(n lg n)
Repeat: it is the use of std::set that is causing the really huge difference in performance.

With std::vector:
--------- clang++/libc++ ------------

          sieve of sundaram: last prime < 32000000 == 31999939   processor time: 1050 milliseconds.
sieve of eratosthenes (i*i): last prime < 32000000 == 31999939   processor time: 1090 milliseconds.
sieve of eratosthenes (i*2): last prime < 32000000 == 31999939   processor time: 1450 milliseconds.

--------- g++/libstdc++ -------------

          sieve of sundaram: last prime < 32000000 == 31999939   processor time: 1050 milliseconds.
sieve of eratosthenes (i*i): last prime < 32000000 == 31999939   processor time: 1070 milliseconds.
sieve of eratosthenes (i*2): last prime < 32000000 == 31999939   processor time: 1450 milliseconds.

http://coliru.stacked-crooked.com/a/b854027b19ec74e9
Repeat: it is the use of std::set that is causing the really huge difference in performance.

I think ne555 is right. The OP's implementation of eratosthenes was suboptimal and that's why he saw such a huge performance difference. The problem wasn't the starting point of the inner loop (2*i vs i*i), the problem was the increment (1 vs. i)

Using std::vector will speed up both implementations as you showed.
Yes. Note that incrementing j by one in the inner loop is a possibility only when a set is used.
Thank you for all the responses. A bit busy right now, but will be reading them later.
Okay, I've read all your replies and re-implemented the two sieves. I have some follow-up questions, but first, here is my updated code (pretty much a re-hash of what JLBorges posted, but in my own novice style):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
#include <iostream>
#include <vector>
#include <cmath>
#include <ctime>
#include <cstdlib>
using namespace std;

void seiveEra(int);
void seiveSundaram(int);
int main()
{
    cout << "Enter n: ";
    int n;
    cin >> n;

    clock_t start, stop;
    double duration;

    start = clock();
    seiveEra(n);
    stop = clock();

    duration = 1000*(stop-start)/CLOCKS_PER_SEC;
    cout << "Eratosthenes took " << duration << " milliseconds\n\n";


    start = clock();
    seiveSundaram(n);
    stop = clock();

    duration = 1000.0*(stop-start)/CLOCKS_PER_SEC;
    cout << "Sundaram took " << duration << " milliseconds\n\n";


}

void seiveEra(int n)
{
    vector<int>s(n+1, true);

    for(int i = 2; i<=sqrt(n); ++i)
    {
        if(s[i])
            for(int j = i*i; j<=n; j+=i)
               s[j] = false;
    }


    /************PRINT***********************/

    int j = 0;
    cout << "\nSEIVE OF ERATOSTHENES\n";
    //cout << "All prime numbers up to " << n << " are: \n\n";

    vector<int> primes;
    for(int i = 2; i<=n; ++i)
    {
        if(s[i]) primes.push_back(i);
        /*
        {
            cout << i << " ";
            j++;

            if(j%10==0) cout << "\n";
        }*/

    }

    cout << primes.back() << "\n\n";

}

void seiveSundaram(int n)
{
    const int m = n/2; // why does this work?

    vector<int> s(m+1, true);

    for(int i = 1; i<=m; ++i)
    {
        const int l = (m-i)/(2*i+1);
        for(int j = i; j<=l; ++j)
        {
            s[i+j+2*i*j] = false;
        }
    }


    /************PRINT***********************/
    int j = 0;
    cout << "\nSEIVE OF SUNDARAM\n";
    //cout << "All prime numbers up to " << n << " are: \n\n";
    //cout << 2 << " ";
    vector<int> primes;
    for(int i = 1; i<=m; ++i)
    {
        if(s[i]) primes.push_back(2*i+1);/*
        {
            cout << 2*i+1 << " ";
            j++;

            if(j%10==0) cout << "\n";

        }*/

    }
    cout << primes.back();
    cout << "\n\n";

}


The code works and im getting pretty much the same time durations as JLBorges, but have some questions:

(1) I've commented out the printing in my code for now, but I noticed that when I was printing, it would not print out 'n' if 'n' was a prime. For example, for n=11, it would print 2, 3, 5, 7. I realized the size of the vectors had to be n+1, not 'n' (and similarly it should be 'm+1' for sieve of sundaram). When I made this change, it printed out all primes including 11. @JLBorges, did you notice this as well, and you made a mistake in your code?

(2) @JLBorges: My sieve of Eratosthenes duration was just about 80 or so milliseconds slower than the duration given by your code (so pretty much same duration, as we do we have different machines and that could be a factor). But I was surprised by this because I actually only loopeed up to sqrt(n) in my code. So why wasn't my duration for sieve of Eratosthenes faster since I only went up to sqrt(n) and you went up to n?

(3) @JLBorges: I noticed you used an int vector (called 'sieve'), but filled it up with boolean values ('true'). Is this because the compiler automatically translates 'true' into '1'? So an int can hold a Boolean variable? Would the opposite also work (fill up a bool vector with int values)?

(4) I used a bool vector at first, but my durations were a lot longer. Why is this?

Im sorry for so many questions, but I can't help but indulge :)
Last edited on
> (1) I noticed that when I was printing, it would not print out 'n' if 'n' was a prime. ... did you notice this as well

Yes, I was aware of it.
Therefore, std::cout << "... last prime < " << n ...
instead of std::cout << "... last prime <= " << n ...
Last prime number less than 11 is 7, Last prime number less than or equal to 11 is 11.


> (2) I actually only loopeed up to sqrt(n) in my code. So why wasn't my duration for sieve of Eratosthenes faster?

For sufficiently large values of n, your version is faster:
--------- clang++/libc++ ------------

      sieve of eratosthenes [i<n]: last prime < 32000000 == 31999939   processor time: 1070 milliseconds.
sieve of eratosthenes [i<sqrt(n)]: last prime < 32000000 == 31999939   processor time: 980 milliseconds.

--------- g++/libstdc++ -------------

      sieve of eratosthenes [i<n]: last prime < 32000000 == 31999939   processor time: 1080 milliseconds.
sieve of eratosthenes [i<sqrt(n)]: last prime < 32000000 == 31999939   processor time: 930 milliseconds.

http://coliru.stacked-crooked.com/a/42ed14fc92036bd2

Not very much faster because:
a. The inner loop for( std::size_t j = i*i ; j<n ... does not execute when i >= sqrt(n)
b. For small values of n, the time taken to compute the square root of n may not be insignificant.


> (3) @JLBorges: I noticed you used an int vector (called 'sieve'), but filled it up with boolean values ('true').
> (4) I used a bool vector at first, but my durations were a lot longer. Why is this?

I had mentioned (in passing) iearlier:
"the specialisation for std::vector<bool> saves space, but element access would be slower".
For details, see: http://en.cppreference.com/w/cpp/container/vector_bool


> Is this because the compiler automatically translates 'true' into '1'? So an int can hold a Boolean variable?

Yes. There are implicit conversions between int and bool.

int to bool: The value zero is converted to false; any other value is converted to true.

bool to int: false is converted to zero, true is converted to one.


> Would the opposite also work (fill up a bool vector with int values)?

This would work if and only if the integer values of interest are either zero or one.
Thank you, you've been of great help. Im learning so much.

I've tried implementing the sieve of Atkin, but before I post that, a small question regarding your sieve of sundaram implementation.

const std::size_t m = n / 2

since the number of primes generated by this algorithm are 2n+1, shouldn't this line be:
m = (n-1)/2

I understand the value is floored anyway so your line is as good as mine, but even then, for some values (like n=100), the two lines of code (yours vs mine) can produce different 'm' values. Entering 100 with your version gives me 101 as the last prime. I may be nitpicking here, and Im sorry about that, but its just a small issue I wanted to bring to light.

Anyway, here is my implementation for the sieve of Atkin, based directly off of the algorithm described on wiki:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
void sieveAtkins(int limit)
{
    vector<int> results;
    results.push_back(2);
    results.push_back(3);
    results.push_back(5);

    int n, r;
    int root = sqrt(limit);
    vector<int> sieve(limit+1, false);

    for(int i=7; i<=limit; ++i)
    {
        for(int x = 1; x < root; ++x)
        {
            for(int y = 1; y<root; ++y)
            {
                n = 4*x*x + y*y;
                r = n%60;
                if(n<=limit && (r==1 || r==13 || r==17 || r==29 || r==37 || r==41 || r==49 || r==53))
                    sieve[n] ? sieve[n] = false : sieve[n] = true;

                n = 3*x*x + y*y;
                r = n%60;
                if(n<=limit && (r==7 || r==19 || r==31 || r==43))
                    sieve[n] ? sieve[n] = false : sieve[n] = true;

                n = 3*x*x - y*y;
                r = n%60;
                if(n<=limit && (r==11 || r==23 || r==47 || r==59))
                    sieve[n] ? sieve[n] = false : sieve[n] = true;
            }
        }
    }

    for(int i = 7; i<=limit; ++i)
    {
        if(sieve[i])
        {
            results.push_back(i);
            // mark all multiples of he square of primes as false
            for(int j = i*i; j<=limit; j+=i*i) sieve[j] = false;
        }
    }

    // print the prime numbers
    for(int i=0; i<results.size(); ++i)
        cout << results[i] << " ";

}


Its behaving quite oddly. It prints all primes for some smaller n values, but stops at 5 or 7 for larger n values. I tried following the wiki algorithm step by step, so what am I doing wrong here?
Last edited on
> since the number of primes generated by this algorithm are 2n+1, shouldn't this line be: m = (n-1)/2

Even numbers (greater than two) can be ignored; and in any case, we have a special case for two:
primes.push_back(2) ;.

n ==  97  last prime <  97 ==  89    details: m == n/2 ==  48 ( (m-1) * 2 + 1 ) ==  95
n == 100  last prime < 100 ==  97    details: m == n/2 ==  50 ( (m-1) * 2 + 1 ) ==  99
n == 101  last prime < 101 ==  97    details: m == n/2 ==  50 ( (m-1) * 2 + 1 ) ==  99
n == 102  last prime < 102 == 101    details: m == n/2 ==  51 ( (m-1) * 2 + 1 ) == 101
n == 103  last prime < 103 == 101    details: m == n/2 ==  51 ( (m-1) * 2 + 1 ) == 101
n == 107  last prime < 107 == 103    details: m == n/2 ==  53 ( (m-1) * 2 + 1 ) == 105
n == 108  last prime < 108 == 107    details: m == n/2 ==  54 ( (m-1) * 2 + 1 ) == 107

http://coliru.stacked-crooked.com/a/9b4eb0ab48cf1270


> I tried following the wiki algorithm step by step, so what am I doing wrong here?

I haven't had a look at your code yet; I'll update this post when I do.

UPDATE:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
void sieveAtkins(int limit)
{
    // 1. initialise the result list with 2, 3, 5.
    vector<int> results;
    results.push_back(2);
    results.push_back(3);
    results.push_back(5);

    int n, r;
    // int root = sqrt(limit);
    // const int root =

    // 2. Create a sieve list with an entry for each positive integer;
    // all entries of this list should initially be marked non prime (composite)
    vector<int> sieve(limit+1, false);


    //for(int i=7; i<=limit; ++i)
    {
        // 3. For each entry number n in the sieve list, with modulo-sixty remainder r
        // for(int x = 1; x < root; ++x)
        for(int x = 1; x <= limit; ++x)
        {
            // for(int y = 1; y<root; ++y)
            for(int y = 1; y<=limit; ++y)
            {
                n = 4*x*x + y*y;
                r = n%60;

                // 3.1 If r is 1, 13, 17, 29, 37, 41, 49, or 53, flip the entry for each possible solution to 4x2 + y2 = n
                if(n<=limit && (r==1 || r==13 || r==17 || r==29 || r==37 || r==41 || r==49 || r==53))
                    sieve[n] ? sieve[n] = false : sieve[n] = true;

                // 3.2 If r is 7, 19, 31, or 43, flip the entry for each possible solution to 3x2 + y2 = n.
                n = 3*x*x + y*y;
                r = n%60;
                if(n<=limit && (r==7 || r==19 || r==31 || r==43))
                    sieve[n] ? sieve[n] = false : sieve[n] = true;

                // 3.3 If r is 11, 23, 47, or 59, flip the entry for
                // each possible solution to 3x2 − y2 = n when x > y
                n = 3*x*x - y*y;
                r = n%60;
                if( x > y ) // *** added
                    if(n<=limit && (r==11 || r==23 || r==47 || r==59))
                        sieve[n] ? sieve[n] = false : sieve[n] = true;
            }
        }
    }

    for(int i = 7; i<=limit; ++i)
    {
        if(sieve[i])
        {
            results.push_back(i);
            // mark all multiples of he square of primes as false
            for(int j = i*i; j<=limit; j+=i*i) sieve[j] = false;
        }
    }

    // print the prime numbers
    for(int i=0; i<results.size(); ++i)
        cout << results[i] << " ";

}



Fancier rendering of essentially the same code (as earlier, here too, the limit is an exclusive upper bound):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
#include <iostream>
#include <vector>
#include <initializer_list>

void toggle_if( std::vector<int>& sieve, std::size_t number, std::initializer_list<std::size_t> ilist )
{
    if( number < sieve.size() )
    {
        const std::size_t r = number%60 ;
        for( std::size_t v : ilist ) if( v == r )
        {
             sieve[number] = !sieve[number] ;
             break ;
        }
    }
}

std::vector<std::size_t> primes_till( std::size_t limit )
{
    // 1. initialise the result list with 2, 3, 5.
    std::vector<std::size_t> result { 2, 3, 5 } ;

    // 2. Create a sieve list with an entry for each positive integer;
    // all entries of this list should initially be marked non prime (composite)
    std::vector<int> sieve( limit, false );

    // 3. For each entry number n in the sieve list, with modulo-sixty remainder r
    for( std::size_t x = 1; x < limit; ++x )
    {
        for( std::size_t y = 1; y < limit; ++y )
        {
            // 3.1 If r is 1, 13, 17, 29, 37, 41, 49, or 53, flip the entry for each possible solution to 4x2 + y2 = n
            toggle_if( sieve, 4*x*x + y*y, { 1, 13, 17, 29, 37, 41, 49, 53 } ) ;

            // 3.2 If r is 7, 19, 31, or 43, flip the entry for each possible solution to 3x2 + y2 = n.
            toggle_if( sieve, 3*x*x + y*y, { 7, 19, 31, 43 } ) ;

            // 3.3 If r is 11, 23, 47, or 59, flip the entry for each possible solution to 3x2 − y2 = n when x > y
            if( x > y ) toggle_if( sieve, 3*x*x - y*y, { 11, 23, 47, 59 } ) ;
        }
    }

    for( std::size_t i = 7; i < limit; ++i ) if( sieve[i] )
    {
            result.push_back(i);
            // mark all multiples of he square of primes as false
            for( std::size_t j = i*i; j < limit; j += i*i ) sieve[j] = false;
    }

    return result ;
}

int main()
{
    for( std::size_t v : primes_till(1000) ) std::cout << v << ' ' ;
}
Last edited on
Ah, I see. Thank you for that. So here is my implementation of sieve of Atkin:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
void sieveAtkins(int limit)
{
    vector<int> results;
    results.push_back(2);
    results.push_back(3);
    results.push_back(5);

    int n, r;
    //int root = sqrt(limit);
    vector<int> sieve(limit+1, false);

    for(int x = 1; x <= limit; ++x)
    {
        for(int y = 1; y<=limit; ++y)
        {
            n = 4*x*x + y*y;
            r = n%60;
            if(n<=limit && (r==1 || r==13 || r==17 || r==29 || r==37 || r==41 || r==49 || r==53))
                sieve[n] = !sieve[n];

            n = 3*x*x + y*y;
            r = n%60;
            if(n<=limit && (r==7 || r==19 || r==31 || r==43))
                sieve[n] = !sieve[n];

            n = 3*x*x - y*y;
            r = n%60;
            if(x > y)
                if(n<=limit && (r==11 || r==23 || r==47 || r==59))
                    sieve[n] = !sieve[n];
        }
    }


    for(int i = 7; i<=limit; ++i)
    {
        if(sieve[i])
        {
            results.push_back(i);
            // mark all multiples of the square of primes as false
            for(int j = i*i; j<=limit; j+=i*i) sieve[j] = false;
        }
    }


    // print the prime numbers
    cout << "SIEVE OF ATKIN\n";
   cout << results.back() << "\n";

}


Only think im puzzled about is its speed. Its very slow and I did not expect that.

n=20000
Atkins took 4,864 ms.

Is this not supposed to be the fastest algorithm of the three? Or is my implementation simply not optimized?
> Its very slow and I did not expect that.
> Is this not supposed to be the fastest algorithm of the three?

It is theoretically (complexity analysis) the fastest algorithm of the three. However, an optimised implementation is not trivial.

The pseudocode in wiki https://en.wikipedia.org/wiki/Sieve_of_Atkin#Pseudocode is somewhat more optimised than the basic algorithm. But,
this pseudocode eliminates some obvious combinations of odd/even x's/y's in order to reduce computation ....
This pseudocode is written for clarity; although some redundant computations have been eliminated ... it still wastes almost half of its quadratic computations on non-productive loops ... To improve its efficiency, a method must be devised to minimize or eliminate these non-productive computations.


A fast implementation of the sieve of Atkin: http://cr.yp.to/primegen.html

An optimised 2/3/5/7 (modulo 210) wheel-factorised sieve of Eratosthenes implementation:
https://github.com/kimwalisch/primesieve

Theory: https://en.wikipedia.org/wiki/Wheel_factorization
Thanks so much JLBorges. I learned so much since my first implementation because of your helpful responses. If I have any more questions regarding the fascinating topic of sieving and prime number generators, I'll be sure to post them here.

Cache locality:

Effect of spacial cache locality:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include <iostream>
#include <vector>
#include <ctime>
#include <cmath>
#include <iomanip>

template < typename INT_TYPE> std::vector<std::size_t> primes_till( std::size_t n ) // eratosthenes, sqrt version
{
    std::vector<INT_TYPE> sieve( n, true ) ;
    const std::size_t sqrt = std::ceil( std::sqrt(n) ) ;
    for( std::size_t i=2 ; i < sqrt ; ++i ) if( sieve[i] )
    {
        for( std::size_t j = i*2 ; j<n ; j += i ) sieve[j] = false ;
    }

    std::vector<std::size_t> primes ;
    for( std::size_t i = 2 ; i < n ; ++i ) if( sieve[i] ) primes.push_back(i) ;
    return primes ;
}

int main()
{
    const std::size_t n = 32'000'000 ;

    {
        const auto start = std::clock() ;
        const auto last = primes_till<int>(n).back() ;
        const auto end = std::clock() ;
        std::cout << " [int]: last prime < " << n << " == " << last 
                  << "   sieve size == " << std::setw(3) << n / 1'000'000 * sizeof(int) << " MB"
                  << "   processor time: " << std::setw(4) << (end-start) * 1000.0 / CLOCKS_PER_SEC << " milliseconds.\n" ;
    }

    {
        const auto start = std::clock() ;
        const auto last = primes_till<char>(n).back() ;
        const auto end = std::clock() ;
        std::cout << "[char]: last prime < " << n << " == " << last 
                  << "   sieve size == " << std::setw(3) << n / 1'000'000 * sizeof(char) << " MB"
                  << "   processor time: " << std::setw(4) << (end-start) * 1000.0 / CLOCKS_PER_SEC << " milliseconds.\n" ;
    }
}

--------- clang++/libc++ ------------

 [int]: last prime < 32000000 == 31999939   sieve size == 128 MB   processor time: 1030 milliseconds.
[char]: last prime < 32000000 == 31999939   sieve size ==  32 MB   processor time:  670 milliseconds.

--------- g++/libstdc++ -------------

 [int]: last prime < 32000000 == 31999939   sieve size == 128 MB   processor time: 1030 milliseconds.
[char]: last prime < 32000000 == 31999939   sieve size ==  32 MB   processor time:  600 milliseconds.

http://coliru.stacked-crooked.com/a/8482f2b61f7fd1ab

Segmented sieve with better temporal cache locality (primes up to 128 million):
--------- clang++/libc++ ------------

n == 128000000   segment size ==  16384   processor time: 350 milliseconds.
n == 128000000   segment size ==  32768   processor time: 470 milliseconds.
n == 128000000   segment size ==  65536   processor time: 440 milliseconds.
n == 128000000   segment size == 131072   processor time: 480 milliseconds.

--------- g++/libstdc++ -------------

n == 128000000   segment size ==  16384   processor time: 280 milliseconds.
n == 128000000   segment size ==  32768   processor time: 320 milliseconds.
n == 128000000   segment size ==  65536   processor time: 380 milliseconds.
n == 128000000   segment size == 131072   processor time: 400 milliseconds.

http://coliru.stacked-crooked.com/a/cfe86d3707a27c65

An introduction to the concepts of CPU caching and performance:
http://arstechnica.com/gadgets/2002/07/caching/1/
Topic archived. No new replies allowed.