Allocation with many strings

I have to generate 3kk (3 million) different 6-character codes (I know it's a lot). I wrote this program and it has quite a few flaws. The first and main problem is that every time I start it I get a "bad allocation size" error, but it works anyway, so I tried to ignore it. However, the largest number of codes it ever generated was 30k, which is only 1/100 of what I need. For bigger sizes it just doesn't start; I believe it runs out of memory. So firstly, is there any way I can make it work? Secondly, I tried to make it save only unique codes to the file, but in the 30k case it saved all of them. Is there a mistake in the code, or did I just get lucky?

 int j=0;
	int w=0;
	string kod;
	string tab[30000];
	int z=0;
	int ch=0;
	ofstream File;
	File.open("C:\\file.txt");
	while (z <=30000)
	{
		while (j <=5)
		{
			char los = ( rand() % ( 'Z' - 'A' ) ) + 'A';
			tab[z] += los;
			
			j++;
		}
		while (w <= z-1)
		{
			if (tab[w] == tab[z])
			{
				ch=1;
			}
			w++;
		}
		if (ch==0)
		{
			File<<tab[z]<<endl;
		}
		ch=0;
		cout << tab[z] << endl;
		j=0;
		z++;
	}
	File.close();
	getch();

keskiverto (10402)

string tab[X];
int z=0;
while (z <=X)
{
  tab[z]; // this will be an error on the last iteration, because z==X
}

Furthermore, lines 18-25 of your code (the while (w <= z-1) loop) have a loop in which w increases. Shouldn't that loop start from 0 every time?

Your 'tab' is a local variable and therefore lives in stack memory, which is relatively limited. You must use dynamic allocation; the heap has much more space. Consider std::vector<string> tab(X); instead of an array.

Consider std::set<string> too. It might be more efficient. You don't actually need the whole array; you print every value, and write only the unique values, so a set of uniques should be enough.
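A minimal sketch of the two fixes above, assuming a hypothetical helper make_table (it is not part of the original code): the storage is a std::vector, whose elements are allocated dynamically, and the loop condition uses < so that the last index touched is count-1.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a table of `count` strings, each `len` filler characters long.
std::vector<std::string> make_table(std::size_t count, std::size_t len)
{
    // The vector object itself is small; its elements are allocated dynamically,
    // so a large `count` does not exhaust the stack.
    std::vector<std::string> tab(count);

    // Note `<`, not `<=`: the valid indices are 0 .. count-1.
    for (std::size_t z = 0; z < tab.size(); ++z)
        tab[z].assign(len, 'A');

    return tab;
}
```

Calling make_table(3000000, 6) should work where string tab[3000000]; as a local variable would most likely overflow the stack.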

Chervil (7320)

I've tried this out using the existing code with the various bugs ironed out (I think!). It does work, but as the size is increased, the program spends more and more time searching the table to see whether the value is already there.

#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>

    using namespace std;

int main()
{
    const int size = 10000;

    string * tab = new string[size];

    int z=0;

    ofstream File;
    File.open("C:\\file.txt");

    while (z <size)
    {
        int j = 0;
        while (j <=5)
        {
            char los = ( rand() % ( 'Z' - 'A' + 1 ) ) + 'A';
            tab[z] += los;

            j++;
        }

        bool dup = false;
        int w = 0;
        while (w < z)
        {
            if (tab[w] == tab[z])
            {
                dup = true;
                tab[z] = "";
                break;
            }
            w++;
        }

        if (!dup)
        {
            File<<tab[z]<<endl;
            z++;
        }

        //cout << tab[z] << endl;
    }

    File.close();
    delete [] tab;
    // getch();

    return 0;
}

I agree with the suggestion from keskiverto to use std::set as a better way of detecting duplicates.
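To put a rough number on that slowdown: with a linear scan, each new code is compared with up to all of the codes stored before it, so n unique codes cost about n(n-1)/2 comparisons in total. A small sketch of that arithmetic (the function name is made up for illustration):

```cpp
#include <cstdint>

// Worst-case number of string comparisons when every new code is checked
// against all codes already stored (linear search, no duplicate found).
constexpr std::uint64_t pairwise_comparisons(std::uint64_t n)
{
    return n * (n - 1) / 2;
}
```

For size = 10000 this gives 49,995,000 comparisons; for 3,000,000 it gives roughly 4.5 * 10^12, which is why the array version becomes impractical at that size.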


Piotrifek (8)

Thank you very much! I hope it really finds the duplicated keys :)

MrHutch (1822)

Here's an example using standard sets (that keskiverto mentioned) that should generate 30k codes with reasonable ease.

#include <iostream>
#include <ctime>
#include <cstdlib>
#include <fstream>
#include <string>
#include <set>

std::string generateCode()
{
   std::string ret;

   for( int i=0; i < 6; ++i )
      ret += ( rand() % ( 'Z' - 'A' + 1 ) ) + 'A';

   return ret;
}

int main( int argc, char* argv[] )
{
   srand( time( NULL ) );

   std::set<std::string> codes;
   std::set<std::string>::const_iterator it;

   std::cout << "Generating codes...\n";

   while( codes.size() < 3000000 )
      codes.insert( generateCode() );

   std::cout << "Codes generated. Printing to file...\n";

   std::ofstream of( "codes.txt", std::ofstream::out );

   for( it = codes.begin(); it != codes.end(); ++it )
      of << *it << std::endl;

   of.close();

   return 0;
}
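A side note on the + 1 in generateCode(): rand() % ( 'Z' - 'A' ), as in the original post, is rand() % 25 and can only yield 'A' to 'Y', while 'Z' - 'A' + 1 covers all 26 letters. The sketch below (hypothetical helper name, assuming an encoding such as ASCII where 'A' to 'Z' are contiguous) counts how many distinct codes each variant can produce.

```cpp
#include <cstdint>

// Number of distinct codes of `length` symbols drawn from an alphabet of `letters` symbols.
std::uint64_t code_space(std::uint64_t letters, unsigned length)
{
    std::uint64_t total = 1;
    for (unsigned i = 0; i < length; ++i)
        total *= letters;
    return total;
}
```

code_space(26, 6) is 308,915,776 and code_space(25, 6) is 244,140,625, so 3,000,000 unique codes use only about 1% of the available space and retries caused by duplicates should stay rare.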


Piotrifek (8)

I'm using Chervil's code right now, and I need 3kk, not 30k ;( It may take quite a while, as far as I can see.

MrHutch (1822)

Oh, I see. My apologies, I misread that. Still, you can change the 30000 to 3000000 in that code (I'll edit the code above).

It's not the actual generation that takes the time, it'll be the outputting to a text file. File I/O is notoriously slow.

Piotrifek (8)

Well, it's ridiculously slow: I got 100k in 30 min and it'll probably get slower :/ Anyway, I need it in the file, so I think I just have to wait, or maybe yours is somehow faster?


MrHutch (1822)

If I'm honest, I don't know how sets work in terms of checking for currently existing values. Someone else here might be able to chime in on that. Given that, in terms of generation, it could be faster. In terms of outputting to the file, it'll be the same speed.

It took roughly 90 seconds to two minutes to generate 3 million on my work laptop here, which is a reasonable (but not stupendous) spec.
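On how std::set checks for existing values: it keeps its elements sorted (typically in a balanced binary search tree), and the standard requires insertion of a single element to take logarithmic time in the size of the set. insert() also reports whether the value was new through the second member of its return value, so no separate search is needed. A minimal sketch, with a hypothetical helper name:

```cpp
#include <set>
#include <string>

// Returns true if `code` was not yet in `codes` (it has now been added),
// false if it was a duplicate. The cost is O(log n) in the size of the set.
bool add_if_new(std::set<std::string>& codes, const std::string& code)
{
    return codes.insert(code).second;
}
```

If ordering is irrelevant, std::unordered_set<std::string> (C++11) is a possible alternative with constant-time insertion on average; whether it is actually faster here would need measuring.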


keskiverto (10402)

1. I drew a wrong conclusion from the original code: that you store only the unique keys among 3E+6 generated ones. As you apparently need 3E+6 unique codes, you really do need space for that many. The later versions provide it.

2. @iHutch105: std::set is ordered. The order that you write keys out is not the order they were generated. This may or may not be significant. Therefore,

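const std::string key{ generateCode() };
auto pos = codes.lower_bound( key );       // first element not less than key
if ( codes.end() == pos || key != *pos ) { // key is not in the set yet
  codes.insert( pos, key );                // pos serves as an insertion hint
  of << key << '\n';                       // written in generation order
}
// otherwise key is a duplicate and is skipped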

Piotrifek (8)

Order doesn't matter at all, though. After 1.5 h I'm stuck at 150k and I'm getting really annoyed by the speed; this will take forever.

Chervil (7320)

I don't know whether this is better or worse than the other version posted, but it is definitely faster than my first attempt. It took somewhere between 45 seconds and 1 minute to run and used about 185 MB of RAM.

#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <cstdlib>

    using namespace std;

int main()
{
    const int size = 3000000;

    set<string> tab;
    
    const int len = 6;
    string kod(len,'-');  // create string of 6 characters
    
    int z=0;

    ofstream File("C:\\file.txt");

    while (z <size)
    {
        for (int j=0; j<len; j++)
        {
            kod[j]= ( rand() % ( 'Z' - 'A' + 1 ) ) + 'A';
        }

        if (tab.find(kod) == tab.end())
        {
            tab.insert(kod);
            File << kod << endl;
            z++;
        }
    }

    File.close();
    
    cout << "Done" << endl;

    return 0;
}

EDIT: this is the shortest, and about the fastest version I came up with. One significant change is the use of newline '\n' rather than std::endl when writing to the file. That gave a big time saving on my machine.

#include <iostream>
#include <fstream>
#include <set>
#include <cstdlib>
#include <string>

    using namespace std;

int main()
{  
    const unsigned int size = 3000000;   
    const int len = 6;

    set<string> tab;
    string kod(len,'-');
    ofstream File("C:\\file.txt");

    while (tab.size() < size)
    {
        for (int j=0; j<len; j++)
            kod[j]= ( rand() % ( 'Z' - 'A' + 1 ) ) + 'A';

        if (tab.insert(kod).second)
            File << kod << '\n';
    }

    return 0;
}


Piotrifek (8)

You mean one minute including saving to the file? How can there be that much of a difference when my PC isn't archaic at all?


Piotrifek (8)

OK, sorry for the double post, but I need a conclusion. First, I want to thank everyone here, and second, don't ever use Visual Studio :D But to be serious: I used the same code in Dev-C++ and it actually finished in about 1 min, whereas Visual Studio took 1.5 h for ~200k. I don't know whether I messed something up, but the Visual Studio build used about 200 MB of RAM and 15% of the CPU, while the Dev-C++ build took all my 4 GB and the whole CPU. Maybe I need to work on the configuration. Anyway, I'm so happy it's over :)

Chervil (7320)

Have you tested my latest version - how long did it take? Or has it not finished running yet?

I ran this on an old laptop; it's several years old, nothing special.

Piotrifek (8)

Yup, I tried the latest one now and it took about the same, ~1 min.

Topic archived. No new replies allowed.
