finding largest prime number in a file

Pages: 12
May 7, 2019 at 9:42pm
hello guys,Anyone having an idea of creating a programm that is able to find largest prime number in a file containing 100,000 numbers.
Also,to print out the first ten large prime number.
May 7, 2019 at 10:13pm
Open file. Look at each number in turn. If it's a prime number, check if it's the largest prime number you've seen so far.
May 7, 2019 at 10:14pm
Break the program into individual challenges.

Challenge 1: Given a number, determine if it is prime
Challenge 2: Open a file and iterate through the numbers in that file
Challenge 3: (Putting it together) Keeping track of a maximum prime number as you iterate over numbers.

Focus on just challenge 1, make a function like
bool isPrime(int n);
that returns whether or not a number is prime. There's plenty of resources out there to help you if you get stuck.

Here's some basic psuedo code that might be helpful:

1
2
3
4
5
6
7
8
9
10
11
12
13
std::ifstream file("filename");
int largest_prime = -1;
int number;
while (file >> number) // while getting a number from the file
{
    if (isPrime(number))
    {
        if (number > largest_prime)
        {
             largest_prime = number;
        }
    }
}


Of course, this is only to find the 1st largest prime number. You'd have to have an array if you want to keep track of the N largest prime numbers.

Alternatively, if you find a prime that is larger than the last prime you found, then add it to an std::vector, and at the end, sort the vector and print the last 10 elements.
Last edited on May 7, 2019 at 10:19pm
May 7, 2019 at 10:19pm
Find the largest number in the file.

Create a Sieve of Eratosthenes of that size.

Read off the last 10 primes.
May 8, 2019 at 12:23pm
a Sieve of Eratosthenes of that size

So big? IMO up to the sqare root would be sufficient. If you know all primes including 17 it is enough to filter for next primes up to (17+2)2-1 = 360. Or -2 = 359 to be precise (assuming you do not look for even primes > 2).
May 8, 2019 at 12:53pm
MikeStgt wrote:
So big?


Yes, so big.

If the largest number in the file is 100 then I require a sieve big enough to tell me that 89 and 97 are prime, not stop at sqrt( 100 ) = 10.
Last edited on May 8, 2019 at 1:19pm
May 8, 2019 at 1:57pm
I require a sieve big enough to tell me that 89 and 97 are prime

How do you test a number to be prime? By lookup a list of primes? Or by computing? (Using a significant smaller list of primes.)
May 8, 2019 at 2:16pm
1) there are a bunch of "very likely" primality testing algorithms. These have some tiny chance at being wrong.
2) you can sieve it up and see if its in your sieve. typical sieves are a vector of bools, so its literally if(list[value]) then its prime.
a simple version of it...
3) for very small numbers you can use GCD against a constant or two. the constants grow like a factorial and its not as efficient as the bool list, so its of dubious value.

1
2
3
4
5
6
7
8
9
vector<bool> pn(max_size,true);
  for(i = 4; i < pn.size(); i+=2)
	  pn[i] = false;  
  for(i = 3; i < pn.size(); i+=2)
  {
     if(pn[i])
      for(j = i*2; j < pn.size(); j+=i)
         pn[j] = false; 	
  }
Last edited on May 8, 2019 at 2:21pm
May 8, 2019 at 8:12pm
1) there are a bunch of "very likely" primality testing algorithms.
Rabin-Miller is one of them. AFAIK, it reports no false positives below 10^10 (at least not on an HP-42S and its emulators).

2) you can sieve it up...
For this you must have sieved up to max value found in list to build the vector of bools.

3) for very small numbers...
A counter proposal:
1
2
3
4
5
6
7
8
9
bool isPrime(int number)
{
    if(number < 2) return false;
    if(number == 2) return true;
    if(number % 2 == 0) return false;
    for(int i=3; (i*i)<=number; i+=2)
        if(number % i == 0 ) return false;
    return true;
}

My idea was to "improve" the loop with a small array of primes p[] to:
for(int i=0; (p[i]*p[i])<=number; ++i)
if(number % p[i] == 0 ) return false;


Well, I have to confess, I forgot the option of a vector of bools what is a grand saving in storage and "lookup" in contrast to an array of primes. In case the list contains values close to max of unsigned long long (80 bits I assume), the vector of bools will use more than 3.8e16 MB of your storage (if compacted), what is not nothing ;)
I assume, depending on the max value option 1) with verification by something like option 3) could be beneficial.

Edit: bits not bytes
Edit: size of vector of bools
Last edited on May 9, 2019 at 11:58pm
May 9, 2019 at 9:05am
Supplement
In 1876 Lucas proved that 2127-1 is prime.
That was some time before unsigned long long :)
May 19, 2019 at 11:02pm
Anyone having an idea of creating a programm that is able to find largest prime number in a file containing 100,000 numbers.

Took a while, sorry --
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include <iostream>	/* cout, cin, endl */
#include <ctime>	/* clock */
#include <cmath>	/* sqrt */
#include <cstdint>      /* __uint128_t */
using namespace std;

/* MRT from https://de.wikipedia.org/wiki/Miller-Rabin-Test#Implementierung */
#include <cstdint>
using std::uint64_t;
bool mrt (const uint64_t n, const uint64_t a) { // n ungerade, 1 < a < n-1
   if (n % 2 == 0) return false;
   const uint64_t m = n - 1;
   uint64_t d = m >> 1, e = 1;
   while (!(d & 1)) d >>= 1, ++e;
   __uint128_t p = a, q = a;
   while (d >>= 1) { // exponentiere modular: p = a^d mod n
      q *= q, q %= n; // quadriere modular: q = q^2 mod n
      if (d & 1) p *= q, p %= n; // multipliziere modular: p = (p * q) mod n
   }
   if (p == 1 || p == m) return true; // n ist wahrscheinlich prim
   while (--e) {
      p *= p, p %= n;
      if (p == m) return true;
      if (p <= 1) break;
   }
   return false; // n ist nicht prim
}

int main()
{
	unsigned long max(~0), delta2(100000), n, cnt(0);

	cnt = 0; clock_t t0 = clock();		/* time(R) */
	for (n = max - delta2; n < max; ++n)
		if (mrt(n,  2) && 
		    mrt(n,  3) &&
		    mrt(n,  5) &&
		    mrt(n,  7) &&
		    mrt(n, 11) &&
		    mrt(n, 13) &&
		    mrt(n, 17) &&
		    mrt(n, 19) &&
		    mrt(n, 23))		cnt++;
	double te = (double)( clock() - t0 ) / CLOCKS_PER_SEC;  /* time(E) */
	cout << "MRT counts " << cnt << " primes from " << max - delta2 << " to " << max << " within " << te << " seconds.\n";
}

My focus was on a way to scan 100'000 numbers for primes in some reasonable time, to present the 10 largest found is left as exercise for others.
Last edited on May 19, 2019 at 11:36pm
May 20, 2019 at 3:12pm
A hybrid approach would be faster if the range is 0-N ... do the vector bool lookup table sieve as big as you can tolerate it first, nearly instant check for the ones in that table, then do the harder check for anything bigger. MRT there is doing an awful lot of work to see if 11 is prime or not.
May 20, 2019 at 7:41pm
A hybrid approach would be faster
Correct. For that I am about to get the speed of different methods at different "spots", 0..100'000, 8000000..8100000, and 4294957295 to 4294967295 so far. See http://www.cplusplus.com/forum/beginner/253928/#msg1115830 A simple primality check seems to be sufficient in the typical number range of pocket calculators.
It is work in progress, as I am not yet familiar enough with C++ to code my ideas right away. So the method using a bool lookup table is currently missing. I've done it, partially, see http://www.cplusplus.com/forum/beginner/253520/#msg1114742 , but to put it as a function so far failed, bitset<z> s; insists of z being a constant.
May 20, 2019 at 8:16pm
my code above gives you a bool lookup table. if pn[value] == true then its prime. can tweak it to use unsigned long long or whatever, but the core code should be fine. You may be able to make it faster somehow too. Even cuter you can run it once and save the vector to a file or whatever.
Last edited on May 20, 2019 at 8:18pm
May 20, 2019 at 9:09pm
the core code should be fine

The core of your routine is SoE, Sieve of Eratosthenes :)

You may be able to make it faster somehow too.
Yes, I am. Instead of your sieving loop
for(j = i*2; j < pn.size(); j+=i) I suggest
for(j = i*i; j < pn.size(); j+=i+i) Next speed up would be using "wheels".

Even cuter you can run it once and save the vector to a file or whatever.
Exactly that is my goal, for reasons of size I'd prefere bitset instead of bool array. Drawback, its size must be known at compilation already. At least that's what I grasped.
May 21, 2019 at 12:03am
vector bool is optimized to use ~ 1 bit per bool.
May 21, 2019 at 12:59am
The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.
found at http://www.cplusplus.com/reference/vector/vector-bool
Not necessarily, may optimize storage is indefinite -- sorry, this is not so stringent as your statement. Things That Make You Go Hmmmm...
In contrast: "The class emulates an array of bool elements, but optimized for space allocation", see http://www.cplusplus.com/reference/bitset/bitset
May 21, 2019 at 3:30pm
You need to determine the primality of 100,000 numbers. To make each test fast, it might be useful to keep a collection of primes up to some value. As long as you have primes up to sqrt(N), you can quickly determine if N is prime by seeing if any of those primes is a factor of N.

That helps to check primality of 100,000 numbers, but only if they are relatively small. If they can range up to 1018 Then another test like MRT might be better.

So how big can these numbers get?

It seems to me that this is really an exercise in your ability to analyze a problem and choose an efficient algorithm. In this case, then key factors are how many numbers you must examine (100,000), and the range of values for those numbers.

May 21, 2019 at 4:13pm
It seems to me that this is really an exercise

Yes, Dave, I try to take my first steps with it in C++. For sure it is not for the primes, there are several known meanwhile :)

BTW, several years ago there was an information, it is faster to generate the first "few" prime numbers than reading them from a disk file. That was before SSD.
May 21, 2019 at 4:19pm
It is still true even on SSD. SSD is (very slightly) slower than ram, and ram is slower than cache. Rotating disk is much slower than SSD, but the delay still exists because ram does not have a file system / OS interface layer to contend with, and they are not always tied to the motherboard as optimally as ram (depends on the system).

'few' is system dependent so its hard to nail that down.

modern PC timings are hard to grasp ... Ive seen too many examples where doing more work is faster than having logic to avoid some of the work. A couple of the primality test examples thrown around suffer from this ... 6+ if statements out the gate take more time than they save, at least on my setup.
Last edited on May 21, 2019 at 4:23pm
Pages: 12