Reading and outputting random lines?

Hi. I want to read one random line from a file and output it into a new file. I want the program to run until all lines have been read and outputted in a random order.

Here is my code

#include <iostream>
#include <fstream>
#include <istream>
#include <ctime>
#include <string>

using namespace std;

int main()
{

    char c;
    int num=0;
    ifstream is;
    is.open ("Text.txt");
    while (is.good())
    {
        c = is.get();
        if (c=='\n')num++;
    }
    is.close();
    num+=1;

    int random;

    random = rand()%num+1;

    ifstream file1("Text.txt");
    ofstream file2("Text2.txt");

    string myNum;

    for (int z=1; z<=num; z++)
    {
        getline(file1, myNum);
        file2<<myNum<<endl;
        file2<<myNum<<endl;
    }

    return(0);
}


I know the code looks bad and whatnot. Sorry, I am still newish. I just need to find out how to go to a random line and never to the same line again.
I found out that if you use srand(time(NULL)); you won't get the same random values each time you run the program. IDK too much about the techie part of this, but it seeds the random number generator with the current time.

This is where you should place srand(time(NULL));

int random;

srand(time(NULL)); // before first call to rand()

random = rand()%num+1;


Hope I helped =]


P.S. if you find any "z's" in this post that dont make sense, the 's' on my keyboard iz broken, zo i subztitue 'z' alot if im too laZy to uze the "On-Screen Keyboard"
I actually already knew that, I just didn't put it into this code for some reason. But my program doesn't even actually use the random number for anything yet because I don't know how to go about it.

Thank you!
I would read all of the lines from the first file, store them in some container, then do a random_shuffle() on it. Then just dump the container to the other file.
You mean like an array or vector?

I'm going to have over 100 million lines; last time I tried to have an array store that many things, I got a "stack overflow" error.
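
A note on the "stack overflow": that error usually comes from a large array declared as a local variable, because local variables live on the stack, which is small. A std::vector keeps its elements in dynamically allocated memory instead, so its size is limited by available RAM rather than by the stack. A minimal sketch (make_sequence is a made-up name, and the 40 MB figure assumes a 4-byte int):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a large sequence without using stack storage.
std::vector<int> make_sequence(std::size_t count)
{
    // A local `int big[10000000];` would need about 40 MB of stack
    // (assuming 4-byte int) and would typically overflow it.
    std::vector<int> values(count);   // elements are allocated dynamically
    for (std::size_t i = 0; i < count; ++i) values[i] = static_cast<int>(i);
    return values;
}
```

Whether 100 million lines of text fit in memory is a separate question that depends on how long the lines are.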

Thank you.
Isn't this a duplicate thread?
http://www.cplusplus.com/forum/beginner/104705/

#include <fstream>
#include <string>
#include <cstdlib>
#include <ctime>

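// invariant: file contains at least one line
// picks one line at random in a single pass over the file
std::string random_line( const char* path )
{
    std::string selected ;
    std::ifstream file(path) ;

    std::string line ;
    std::streamsize nlines = 0 ;
    while( std::getline( file, line ) )
    {
        ++nlines ;
        // keep the line just read with probability of about 1/nlines
        // (this holds only while nlines is no larger than RAND_MAX)
        if( std::rand() % nlines == 0 ) selected = line ;
    }

    return selected ;
}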
Yes, sorry. I wasn't able to figure out how to use the code posted in the previous post but now I am trying to figure out how to use the code that you just provided.

Sorry and thanks.
Okay, so I figured out how to use your code this time. Thank you!

But, now I am stuck again. My output file should have every line written into it. But when I run the program, it picks one random line and outputs only that one into my file. Here is the code:

#include <fstream>
#include <string>
#include <cstdlib>
#include <ctime>

using namespace std;

string random_line( const char* path );

int main()
{

	ofstream file2("Text2.txt");
	string apple;

	for (int z=1; z<=15; z++);
	{
	apple = random_line("Text.txt");
	file2<<apple<<endl;
	}


	return(0);
}

// invariant: file contains at least one line
string random_line( const char* path )
{
    string selected ;
    ifstream file("Text.txt") ;

	srand((unsigned)(time)(NULL));

    string line ;
    streamsize nlines = 0 ;
    while( std::getline( file, line ) )
    {
        ++nlines ;
        if( std::rand() % nlines == 0 ) selected = line ;
    }

    return selected ;
}
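
A note for readers on why this writes only one line: the `;` right after `for (...)` is the whole loop body, so the braced block below it runs just once. Two further things would likely bite afterwards. srand() is called inside random_line(), so every call made within the same second restarts the same random sequence and picks the same line, and the function ignores its `path` argument. A sketch with those three points changed (write_random_lines is a made-up helper, and duplicates are still possible with this approach):

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Same selection logic as before, but without seeding and using `path`.
std::string random_line(const char* path)
{
    std::string selected;
    std::ifstream file(path);

    std::string line;
    std::streamsize nlines = 0;
    while (std::getline(file, line))
    {
        ++nlines;
        if (std::rand() % nlines == 0) selected = line;
    }
    return selected;
}

// Hypothetical helper: writes `count` randomly chosen lines to out_path.
// Seed the generator once, in main(), before calling this:
//     std::srand((unsigned)std::time(NULL));
void write_random_lines(const char* in_path, const char* out_path, int count)
{
    std::ofstream out(out_path);
    for (int z = 1; z <= count; z++)   // no ';' after the parenthesis
    {
        out << random_line(in_path) << '\n';
    }
}
```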
> I'm going to have over 100 million lines;

And how much memory do you have?
4 GB. I figured reading from and writing to text files would be more feasible than storing each line in an array or whatever.

I thought maybe if I could select a random line, store that as a string, delete the line, then output the string into a text file (then rinse and repeat), then I would never have a duplicate line and the program would run until all lines are deleted.

But I don't know how to do that, and I don't know whether the code you've provided picks a random line without ever picking the same line twice.

(The test file I am using has 15 lines. But my output file is only getting one random line..)

Thanks.
Ok, 4 GB is not bad.

One plausible way is

1. Make a first pass through the file looking for newlines, identifying the std::streampos where each line starts, and entering those into a std::vector<std::streampos>

2. std::shuffle() on the vector.

3. clear() the error state on the file stream

4. In a loop, seekg(), followed by std::getline() and write the line to the output file.


This will be very slow. Is performance of the essence?
Performance isn't that important. I have a desktop that I don't use much (that also has 4 GB of RAM) that I can just use to run the program if it will really take that long.

Okay, I'm still kinda new and haven't used any of those commands that you just named (besides loops and getline (and I only used getline because of your code)). But anyway...

1. So I would have to somehow tell std::streampos that each new line starts after "\n" and then store std::streampos into a vector? So, streampos is kind of like a number value for where the cursor is? So then...

2. Shuffling this means that the order in which a line is read is determined by the cursor position (which is based on streampos (which is in a random order))?

3. Not sure about this but I will look it up.

4. Not sure what seekg() is but I'll look it up. So how do seekg() and getline() give me the shuffled streampos?


Sorry, I understand if you don't want to reply because I've asked a lot, but I am very basic with this so far.
Something like this (untested):

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <ctime>
#include <cstdlib>
#include <algorithm>

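std::vector<std::streamoff> get_offsets( std::istream& file )
{
    std::vector<std::streamoff> offsets( 1, 0 ) ; // the first line starts at offset 0
    file >> std::noskipws ; // count every character, including whitespace
    std::streamoff pos = 0 ;
    char c ;
    while( file >> c )
    {
        pos = pos + 1 ;
        if( c == '\n' ) offsets.push_back(pos) ; // the next line starts right after '\n'
    }
    if( offsets.back() == pos ) offsets.pop_back() ; // no line starts at end of file
    file.clear() ; // reset the eof/fail state so that the stream can be used again
    return offsets ;
}

void write( std::istream& in, std::ostream& out,
            const std::vector<std::streamoff>& offsets )
{
    for( std::size_t i = 0 ; i < offsets.size() ; ++i )
    {
        in.clear() ; // reset any eof/fail state left by the previous read
        in.seekg( offsets[i] ) ;
        std::string line ;
        if( std::getline( in, line ) ) out << line << '\n' ;
    }
}

int main()
{
    // binary mode, so that the offsets correspond exactly to bytes in the file
    std::ifstream file( __FILE__, std::ios::binary ) ;
    std::vector<std::streamoff> offsets = get_offsets(file) ;

    std::srand( std::time(0) ) ;
    std::random_shuffle( offsets.begin(), offsets.end() ) ; // C++98; removed in C++17
    write( file, std::cout, offsets ) ;
}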


This is C++98; see if you can get the sense of it.
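
With a C++11 or later compiler, the srand()/random_shuffle() pair in main() could be swapped for std::shuffle and a <random> generator. random_shuffle was removed in C++17, and where it is built on rand(), a small RAND_MAX may give a poor shuffle for 100 million elements. A sketch of that replacement (shuffle_offsets is a made-up name):

```cpp
#include <algorithm>
#include <ios>
#include <random>
#include <vector>

// Possible C++11 replacement for the srand()/random_shuffle() pair.
void shuffle_offsets(std::vector<std::streamoff>& offsets)
{
    std::random_device seed_source;        // nondeterministic seed where available
    std::mt19937_64 rng(seed_source());
    std::shuffle(offsets.begin(), offsets.end(), rng);
}
```

In main() this would be called as shuffle_offsets(offsets) just before write().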
It works almost perfectly! I don't understand a lot of the code but I will study it and see how and why it works. Thanks again!

Testing it with a 15 line file, I was able to output all of the lines into a random order. The only problem was that sometimes there would be two "\n" between lines and sometimes there was only one.

Now I just need to test it on the actual 100 million line file and see what happens.

Thanks again!
> I don't understand a lot of the code but I will study it and see how and why it works.

Yes. You must do that before you use it.


> The only problem was that sometimes there would be two "\n" between lines and sometimes there was only one.

Your file may contain empty lines. If you want to skip empty lines, modify line 33 (the std::getline call in write()):

// if( std::getline( in, line ) ) out << line << '\n' ;
if( std::getline( in, line ) && !line.empty() ) out << line << '\n' ;
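
Another possible cause, assuming the input file uses Windows "\r\n" line endings: the file is opened with std::ios::binary, so std::getline leaves a '\r' at the end of each line. When that line is written with '\n' to a text-mode stream on Windows, the result is "\r\r\n", which some editors display as an extra blank line. A sketch of a helper that could be called on `line` before writing it (strip_cr is a made-up name):

```cpp
#include <string>

// Removes one trailing carriage return, if present.
// strip_cr is a hypothetical helper, not part of the code above.
void strip_cr(std::string& line)
{
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
}
```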

"The line endings in this file are not consistent. Do you want to normalize the line endings?"

Not sure what this means but I get this error. I'll look it up though.

Thanks for all of the help.
> "The line endings in this file are not consistent. Do you want to normalize the line endings?"

See: http://stackoverflow.com/questions/5665217/do-i-want-normalized-line-endings
Thank you!

I have one final question if you're still interested in helping, but you've already written my program for me and provided me with resources and knowledge so, yeah...

Anyway...

This is my code:
	int x=1;
	ofstream outt("Text.txt");
	for (x=1; x<117711536; x++);
	{
	outt<<x<<endl;
	}


Instead of writing each number onto a line, my file ends up only having the final number and that is it.

I want this:

1
2
3
4
...
117711535
117711536

but that isn't happening. Thanks anyone.
Oh, come on!

//for (x=1; x<117711536; x++); // *** get rid of that ;
for (x=1; x<117711536; x++)