I tried both of those, Jonnin. It made no measurable difference. There is a small but measurable cost in actually returning the primes rather than just counting them (about 0.07 seconds; I can get down to 0.471 seconds just counting), but vectors are so heavily optimised that even that is small. I had assumed that vectorised slicing of valarrays would be better, but vector<bool> clearly outperforms valarray<bool>, sadly.
The sieve clearly visits some points twice, which it would be nice to eliminate. I would also like to be able to get rid of the one and only repeated multiply j = i * i, but carrying isq through as a variable and updating it by addition and bitshifting:
(i+2)^2 = i^2 + ((i+1) << 2)
didn't make any difference either.
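For reference, this is roughly the shape of what I tried (a sketch with illustrative names, not my actual code): isq is carried along and updated with one addition and one shift per outer step, instead of computing i*i.

#include <vector>
using namespace std;

/* Sketch only: carry the square of the candidate along incrementally.
   (i+2)^2 = i^2 + 4*(i+1) = i^2 + ((i+1) << 2), so no multiply is needed. */
void sieve_odds( vector<bool>& p, unsigned long z, unsigned long nmax )
{
    unsigned long isq = 9;                          /* 3*3, square of the first odd candidate */
    for ( unsigned long i = 3; i <= nmax; i += 2 )
    {
        if ( p[i] )
            for ( unsigned long j = isq; j <= z; j += i + i )
                p[j] = false;                       /* sieve out odd multiples */
        isq += ( i + 1 ) << 2;                      /* advance isq to (i+2)^2 */
    }
}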
As a beginner in the beginners' corner I learned something new: vectors. Thank you.
I learned also that M$VS2010 is waaaaay too slooooow manipulating a vector if compiled in debug mode. Compiled for release is ok.
Line 10 in your suggestion is int imax = sqrt( N + 0.5 );
Could you please explain the need for that +0.5?
#include <iostream> /* cout, cin, endl */
#include <ctime>    /* clock */
#include <vector>
#include <cmath>    /* sqrt */
using namespace std;

unsigned long const z(100000000);
vector<bool> p(z+1, true);

int main()
{
    unsigned long n, c(1), i, nmax;
    clock_t t0 = clock();               /* time(R) */
    nmax = sqrt( z + .0 );              // sqrt wants a real argument
    for (n = 3; n <= nmax; n += 2)      /* only up to sqrt(z) */
        if (p[n])                       /* candidate is prime */
        {
            c += 1;                     /* count it */
            for (i = n*n; i <= z; i += n+n)
                p[i] = false;           /* sieve out multiple */
        }
    /* sieving is done, all remaining are prime */
    for ( ; n <= z; n += 2)             /* n is already the first uncounted odd > nmax */
        if (p[n]) c += 1;
    double te = (double)( clock() - t0 ) / CLOCKS_PER_SEC;   /* time(E) */
    cout << "From 1.." << z << " found " << c << " primes in " << te << " seconds." << endl;
    cout << "The first 50 of them are:" << endl;
    cout << 2;
    i = 0;
    for (n = 3; i < 50; n += 2)
        if (p[n])
        {
            cout << ' ' << n;
            i++;
        }
    cout << endl;
}
From 1..100000000 found 5761455 primes in 0.437989 seconds.
The first 50 of them are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233
BTW, the sqrt in the procedure has nothing to do with Eratosthenes as such; it only arises from restricting the sieve to a finite range out of the infinite supply of numbers.
Edit: M$VS2010 rejects sqrt with integer argument. Changed sqrt( z ) to sqrt( z + .0 )
Line 10 in your suggestion is int imax = sqrt( N + 0.5 );
Could you please explain the need for that +0.5?
The theoretical requirement would be sqrt( N ). Any composite number less than or equal to N must have at least one factor less than or equal to sqrt( N ), so this is part of the sieve algorithm and is the limit of how far you need to check before initiating the slicing.
However, sqrt( N ) is evaluated and returned as a floating-point number before being truncated down to an integer on assignment. As floating-point results are always subject to (in)accuracy, that extra 0.5 covers me for the unfortunate case where the true root is, say, 25 but comes back as 24.99999999 and gets truncated down to 24.
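To illustrate (just a sketch, not part of the sieve; the function name is made up): with the +0.5 the truncation lands on the right integer, and if one wants to be completely paranoid the result can be nudged so that imax*imax <= N always holds.

#include <cmath>

/* Sketch: integer square root with the 0.5 safety margin, plus a paranoid
   correction so that imax*imax <= N < (imax+1)*(imax+1) is guaranteed. */
int int_sqrt( int N )
{
    int imax = (int)std::sqrt( N + 0.5 );
    while ( imax * imax > N ) --imax;                      /* sqrt() came back a touch high */
    while ( ( imax + 1 ) * ( imax + 1 ) <= N ) ++imax;     /* sqrt() came back a touch low  */
    return imax;
}

For N of the size used here the int arithmetic is fine; near INT_MAX the (imax+1)*(imax+1) check would need a wider type.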
I suspect vector<bool> works so well because it packs its elements into a small amount of memory (typically 1 bit each, rather than 1 or 4 bytes). Thus you can keep more of the sieve in fast cache memory at any time. But I bet that is susceptible to whatever else you might be running on the machine at the time!
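A back-of-the-envelope illustration of the packing (the printed numbers are the theoretical payload, not a measurement): for the 10^8 flags used above, a packed vector<bool> needs roughly 12 MB, whereas one byte per flag would need roughly 100 MB, which matters a lot for cache behaviour.

#include <iostream>
#include <vector>

int main()
{
    const std::size_t z = 100000000;
    std::vector<bool> packed( z, true );    /* specialised: roughly z/8 bytes of storage */
    std::vector<char> bytes ( z, 1 );       /* one whole byte per flag: roughly z bytes  */

    std::cout << "vector<bool> payload ~ " << z / 8 / (1024*1024) << " MB\n";
    std::cout << "vector<char> payload ~ " << z / (1024*1024)     << " MB\n";
}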
#include <iostream>
#include <cmath>
#include <ctime>
#include <vector>
using namespace std;

vector<int> sieve( int N )
{
    int M = ( N - 1 ) / 2;                     // Upper limit of odd numbers counting from 3, 5, ...
    vector<bool> A( 1 + M, true );             // A[0] not used
    int oddmax = ( sqrt( N + 0.5 ) - 1 ) / 2;

    for ( int odd = 1; odd <= oddmax; odd++ )  // Only worry about odd numbers (first is 3, second is 5, etc.)
    {
        if ( A[odd] )
        {
            int odd2 = odd << 1;
            int oddstep = odd2 + 1;
            for ( int j = odd2 * ( odd + 1 ); j <= M; j += oddstep ) A[j] = false;
        }
    }

    vector<int> P = { 2 };                     // 2 is the only even prime
    for ( int odd = 1; odd <= M; odd++ ) if ( A[odd] ) P.push_back( odd + odd + 1 );   // Include odd primes
    return P;
}

int main()
{
    const int N = 100000000;
    clock_t t0 = clock();
    vector<int> primes = sieve( N );
    double t = (double)( clock() - t0 ) / CLOCKS_PER_SEC;

    cout << "Found " << primes.size() << " primes in " << t << " seconds\n\n";

    const int num = 20;
    cout << "The first " << num << " are: ";
    for ( int i = 0; i < num; i++ ) cout << primes[i] << " ";
}
Found 5761455 primes in 0.357052 seconds
The first 20 are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
1. Is there any rational justification for still using M$VS2010?
I was told that the 2010 compiler has fewer bugs than its successors. As a beginner I am not able to verify that claim. On the other hand, I was told the 2015 IDE is supposed to be better. No idea; I am still struggling with terms like Project and Solution, big notions whose boundaries are still obscure to me.
2. What are the grounds for the rejection?
sqrt for integer arguments is not implemented (in the system I am currently using).
3. An explicit cast would make it explicit: sqrt( static_cast<double>(z) )
Thank you. Meanwhile I use my own "integer RooT"
unsigned long rt( unsigned long z )
{
    unsigned long a, b;
    a = z;
    do {
        b = a;
        a = (a + z / a) / 2;
    } while (a < b);
    return b;
}
It is slower than int rt = sqrt( N + 0.0 ), but i) it is called only once, outside the loop, and ii) I have done it on my own. (I only see now that an argument of 0 would need red-carpet treatment, since it leads to z / a with a == 0 -- unless 0/0 == 0 is usual in C/C++.)
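A guarded variant (sketch only; the name rt_guarded is just for illustration) would handle the small arguments up front so the z / a step can never divide by zero:

unsigned long rt_guarded( unsigned long z )
{
    if ( z < 2 ) return z;          /* rt(0) = 0, rt(1) = 1; avoids 0/0 below */
    unsigned long a = z, b;
    do {
        b = a;
        a = (a + z / a) / 2;
    } while (a < b);
    return b;
}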
I was told that the compiler of 2010 has fewer errors than the successors.
You were IMO, to put it bluntly, lied to.
If your intent is to write C++ code that is not capable of being even C++11 compliant then by all means use VS2010.
VS2017 is more than capable of enforcing C++ standard compliant coding.
There is one very minor quibble. The earliest C++ standard it allows appears to be C++14, even when no explicit language standard is chosen.
I learned also that VS2010 is waaaaay too slooooow manipulating a vector if compiled in debug mode. Compiled for release is ok.
In my experience, debug mode with VS2017 can still be a major time bottleneck, if not as severe. With the debug libraries loaded and in use, speed is not even remotely something to be worried about; speed is just not important when debugging code. IMO.
Microsoft thinks they are trendy or even trendsetters.
Project and Solution.
Project should be obvious. At your level, it's one assignment or program, but it can also mean a group of related things; for example, a library you wrote specifically for your program could make two things in one project.
A solution is Microsoft-speak for "program". Lately they have moved on to "app", which is short for "application", which is millennial for "program". All these things mean about the same thing: a freaking executable file on your computing device. Try not to get hung up over the latest buzzword. It's idiotic, but they are going to keep changing the words, so you just have to sigh and roll with it.
Writing your own root finder is great practice, but the CPU probably has this as a circuit, and nothing you can do in C++ is ever going to get within an order of magnitude of *that* for performance. Use the tools they gave you, apart from the 'for fun' and 'for practice' stuff. If you can beat it, do so (pow for integer powers is very beatable in code, but sqrt -- I don't think it's possible; it's a case-by-case thing). You might do it with an approximation, but not a high-precision finder. This stuff can be fun, but you can sink a lot of time into shaving off a few nanoseconds too.
2010 was 9 years ago. That is an *eternity* in this world. Granted I have a few things that date back to 1990, but not my compilers :P
C++ has revisions. It had one each in '11, '14, and '17, so you are THREE generations behind using that tool. (This was already said, just not directly; I want to make it very clear how out of date you are.)
It has now been three weeks since I took my first steps with C++, first with MinGW (where I failed), now with M$VS2010, and I still do not feel comfortable with it, so I am in no position to have an opinion of my own about the pros and cons of compiler versions or IDEs. For me it is not important whether I was told a lie or some taradiddle. My goal was first and foremost to modify an existing program (the trace output of V41, a virtual HP41 you may get at www.hp41.org for free -- well, almost free, you have to hand over your e-mail address). I achieved that in the first week after installing VS2010. Now that I have C++, I am just trying to see whether it could be useful for something else.
In addition, I found this site with its tutorials and a helpful beginners' forum. Quite possibly, I will redo my simulations of an HP7470A pen plotter and an HP82162A printer; I did both in ooREXX some time ago.
Use the tools they gave you, apart from the 'for fun' and 'for practice' stuff.
As a beginner in C++ I just try this and that, nothing serious. Some kind of training. And as with tennis, for the first trials an old racket will be ok.
Now it's getting cruel. This suggestion is still the Sieve of Eratosthenes, but far away from the "aesthetically pleasing" first try.
The idea is: if reducing the size of the vector has such a distinct impact on execution time, I can squeeze out some more multiples (those of 3 and 5) and pay for it with some extra index calculation.
#include <iostream> /* cout, cin, endl */
#include <ctime>    /* clock */
#include <vector>
using namespace std;

unsigned long const z(100000000);
vector<bool> p(((z-2)/30+1)*8, true);

unsigned long rt( unsigned long z )
{
    unsigned long a, b;
    a = z;
    do {
        b = a;
        a = (a + z / a) / 2;
    } while (a < b);
    return b;
}

unsigned long i2n( unsigned long i )   /* bit index -> number */
{
    unsigned long b;
    b = i % 8;
    switch (b) {
    case 0: b =  7; break;
    case 1: b = 11; break;
    case 2: b = 13; break;
    case 3: b = 17; break;
    case 4: b = 19; break;
    case 5: b = 23; break;
    case 6: b = 29; break;
    case 7: b = 31; break;
    // default: /* should not come here */
    }
    return 30 * (i / 8) + b;
}

void pf( unsigned long n )   /* mark number n as composite */
{
    unsigned long a, b;
    b = n % 30;
    switch (b) {
    case  7: b = 0; break;
    case 11: b = 1; break;
    case 13: b = 2; break;
    case 17: b = 3; break;
    case 19: b = 4; break;
    case 23: b = 5; break;
    case 29: b = 6; break;
    case  1: n -= 2; b = 7; break;    /* residue 1 lives in the "31" slot */
    default: return;                  /* multiple of 2, 3 or 5: not stored */
    }
    a = n / 30;
    b = 8 * a + b;
    p[b] = false;
    return;
}

int main()
{
    unsigned long n, c(3), i, rz, m;
    clock_t t0 = clock();                  /* time(R) */
    rz = ((rt(z)-2)/30)*8 + 1;             /* roundabout */
    for (n = 0; n <= rz; ++n)              /* only up to sqrt(z) */
    {
        if (p[n])                          /* candidate is prime */
        {
            c += 1;                        /* increment counter */
            m = i2n(n);
            for (i = m*m; i <= z; i += m+m)
                pf(i);                     /* sieve out multiple */
        }
    }
    /* sieving is done, all remaining are prime */
    for (i = ((z-2)/30+1)*8; i2n(i) > z; --i);   /* largest index whose number is <= z */
    for ( ; n <= i; n++)
        if (p[n]) c++;
    double te = (double)( clock() - t0 ) / CLOCKS_PER_SEC;   /* time(E) */
    cout << "From 1.." << z << " found " << c << " primes in " << te << " seconds." << endl;
}
It looks awful but runs a few milliseconds faster (only with optimization level -O3, the maximum).
From 1..100000000 found 5761455 primes in 0.347293 seconds.
The runtime varies somewhat from run to run; this is one of the best. So I assume the value represents wall-clock time, not CPU time.
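To check rather than assume (a sketch; it assumes a C++11 compiler with <chrono>): the C standard says clock() measures processor time, but some implementations report elapsed wall-clock time instead, so comparing it against std::chrono::steady_clock, which is always a wall clock, would settle the question on a given system.

#include <chrono>
#include <ctime>
#include <iostream>

int main()
{
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();

    volatile unsigned long sum = 0;                       /* some busy work to time */
    for ( unsigned long i = 0; i < 300000000UL; ++i ) sum += i;

    double cpu  = (double)( std::clock() - c0 ) / CLOCKS_PER_SEC;
    double wall = std::chrono::duration<double>( std::chrono::steady_clock::now() - w0 ).count();
    std::cout << "clock(): " << cpu << " s,  steady_clock: " << wall << " s\n";
}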
#include <iostream>
#include <cmath>
#include <ctime>
#include <vector>
using namespace std;

int sieve( int N )
{
    int M = ( N - 1 ) / 2;                     // Upper limit of odd numbers counting from 3, 5, ...
    vector<bool> A( 1 + M, true );             // A[0] not used
    int oddmax = ( sqrt( N + 0.5 ) - 1 ) / 2;

    for ( int odd = 1; odd <= oddmax; odd++ )  // Only worry about odd numbers (first is 3, second is 5, etc.)
    {
        if ( A[odd] )
        {
            int odd2 = odd << 1;
            int oddstep = odd2 + 1;
            for ( int j = odd2 * ( odd + 1 ); j <= M; j += oddstep ) A[j] = false;
        }
    }

    int c = 1;                                 // 2 is the only even prime
    for ( int odd = 1; odd <= M; odd++ ) if ( A[odd] ) c++;   // Include odd primes
    return c;
}

int main()
{
    const int N = 100000000;
    clock_t t0 = clock();
    int p = sieve( N );
    double t = (double)( clock() - t0 ) / CLOCKS_PER_SEC;
    cout << "Found " << p << " primes in " << t << " seconds\n\n";
}
#include <iostream>
#include <cmath>
#include <ctime>
#include <vector>
using namespace std;

int sieve( int N )
{
    int odd;
    int M = ( N - 1 ) / 2;                     // Upper limit of odd numbers counting from 3, 5, ...
    vector<bool> A( 1 + M, true );             // A[0] not used
    int oddmax = ( sqrt( N + 0. ) - 1 ) / 2;
    int c = 1;                                 // 2 is the only even prime

    for ( odd = 1; odd <= oddmax; odd++ )      // Only worry about odd numbers (first is 3, second is 5, etc.)
    {
        if ( A[odd] )
        {
            ++c;
            int odd2 = odd << 1;
            int oddstep = odd2 + 1;
            for ( int j = odd2 * ( odd + 1 ); j <= M; j += oddstep ) A[j] = false;
        }
    }

    for ( ; odd <= M; odd++ ) if ( A[odd] ) c++;   // Include remaining odd primes (odd is already oddmax+1, not yet counted)
    return c;
}

int main()
{
    const int N = 100000000;
    clock_t t0 = clock();
    int p = sieve( N );
    double t = (double)( clock() - t0 ) / CLOCKS_PER_SEC;
    cout << "Found " << p << " primes in " << t << " seconds\n\n";
}
Because when I timed it this didn't make any measurable difference. sqrt(M) is several thousand times smaller than M, so it makes negligible difference whether I start the extra counting loop from 1 or sqrt( M ) + 1.
The timing swings wildly on c++ shell from about 0.27 to 0.33 seconds, with the occasional one above 0.4 seconds. So I guess it is very dependent on the size of the sieve array and how much of that fits in cache memory. (And what else is being done on the machine at the same time.)
So a good segmented sieve will probably perform better.
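Something along these lines is what I mean by segmented (an illustrative sketch only, not tuned; the name segmented_count and the block size of 2^18 are arbitrary choices): sieve [2, N] block by block, with each block small enough to stay in cache, using the primes up to sqrt(N) obtained from a small base sieve.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

long long segmented_count( long long N, long long segment = 1 << 18 )
{
    long long root = (long long)sqrt( N + 0.5 );

    /* Base sieve: primes up to sqrt(N) */
    vector<bool> base( root + 1, true );
    vector<long long> primes;
    for ( long long i = 2; i <= root; i++ )
        if ( base[i] )
        {
            primes.push_back( i );
            for ( long long j = i * i; j <= root; j += i ) base[j] = false;
        }

    /* Sieve [2, N] one cache-sized block at a time */
    long long count = 0;
    vector<bool> block;
    for ( long long low = 2; low <= N; low += segment )
    {
        long long high = min( low + segment - 1, N );
        block.assign( high - low + 1, true );
        for ( long long p : primes )
        {
            long long start = max( p * p, ( low + p - 1 ) / p * p );   /* first multiple of p in the block */
            for ( long long j = start; j <= high; j += p ) block[j - low] = false;
        }
        for ( long long j = low; j <= high; j++ )
            if ( block[j - low] ) ++count;
    }
    return count;
}

int main()
{
    cout << segmented_count( 100000000 ) << '\n';    /* should report 5761455 */
}

The point is that block, not the whole 10^8-entry vector, is what has to fit in cache while sieving.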
So I guess it is very dependent on the size of the sieve array
In addition, the speed of the inner (sieving) loop has the most impact. I should find a faster index calculation. Meanwhile I do the sieving for some primes everybody knows by heart in advance, just to avoid the slow index calculation for those.
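One thing I might try (a sketch only, not measured; the table names and n2i_table are made up for illustration): replace the two switch statements in i2n() and pf() with small lookup tables, so the residue mapping becomes an array access rather than a jump table.

/* Sketch: table-driven residue mapping for the mod-30 wheel.
   RES maps bit position 0..7 to the wheel residue; IDX maps n % 30 to the
   bit position, or 255 for residues the wheel never stores. */
unsigned long const RES[8] = { 7, 11, 13, 17, 19, 23, 29, 31 };

unsigned char const IDX[30] = {
    255,   7, 255, 255, 255, 255, 255,   0, 255, 255,   /* residue 1 lives in the "31" slot */
    255,   1, 255,   2, 255, 255, 255,   3, 255,   4,
    255, 255, 255,   5, 255, 255, 255, 255, 255,   6
};

unsigned long i2n_table( unsigned long i )               /* bit index -> number */
{
    return 30 * ( i / 8 ) + RES[i % 8];
}

bool n2i_table( unsigned long n, unsigned long& idx )    /* number -> bit index, if stored */
{
    unsigned char b = IDX[n % 30];
    if ( b == 255 ) return false;                        /* multiple of 2, 3 or 5: not stored */
    if ( b == 7 ) n -= 2;                                /* residue 1 is kept in the "31" column of n-2's row */
    idx = 8 * ( n / 30 ) + b;
    return true;
}

pf( n ) would then reduce to: unsigned long idx; if ( n2i_table( n, idx ) ) p[idx] = false;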
If statements (or really, jumps of any sort) are notorious for bogging down perfectly good code. I can't help but think that if you could avoid some of the jumps, it would earn you some clock cycles. In the grand scheme they are a tiny part of this code, though.