Search & Extract from File

Mar 29, 2011 at 6:10pm
Hello,
First-time poster; I've come here often for help, but couldn't find anything on this topic:

For work I have to do some painstaking copy-pasting to convert one type of text file to another, but really all it amounts to is extracting the small bits that I actually need.

I want to write a C++ program that will help speed up this process.
I am in freshman programming courses (have about 3 semesters of C++ down) so I feel like I can handle this.

I've got the fstream ins and outs down, but am having trouble coming up with a solid way to do the following:

I want to be able to search for a certain character or string in the file, then copy the file contents up through another character, so I can export them to the new file. And repeat this line by line.

What method should I be using? I have looked into the fstream functions, such as seekg, but don't know how to put them together. I need someone to push me in the right direction.

Any help is much appreciated!
Mar 29, 2011 at 6:26pm
Is that to be performed on single lines?
If the two delimiting characters are 'a' and 'b' and the following is the file:
qwertyuiopasdfghjkl
zxcvbnm
qwertasdfgvbnm
zxcvbnmqwertyu
sdfghjkl
mnbvcxzdsa
What should the resulting file look like?
Last edited on Mar 29, 2011 at 6:29pm
Mar 29, 2011 at 6:46pm
Does this work?

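#include <iostream>
#include <string>

using namespace std;

// Reads characters from cin one at a time, trying to match trg.
// Returns true if every character of trg was matched in order.
// If copy is true, each character read is echoed to cout,
// including the characters of trg itself.
bool attemptToFind(const string& trg, bool copy = false) {
	string::size_type i = 0;
	char c; // a plain local is enough; no need for new/delete
	while( i < trg.length() && cin.get(c) ) {
		if(copy)
			cout << c;
		if( c == trg[i] )
			i++;
		else
			break; // mismatch: this attempt failed
	}
	return i == trg.length();
}

int main(int argc, char *argv[]) {
	// Empty strings would make attemptToFind succeed without reading (endless loop).
	if( argc < 3 || argv[1][0] == '\0' || argv[2][0] == '\0' ) {
		cerr << "Needs two non-empty arguments\n";
		return 1;
	}
	
	const string start = argv[1],
		end = argv[2];
	bool copy = false; // true while we are between start and end
	
	// Keep going while the last read succeeded. Checking eof() before
	// reading would process one extra, failed read at the end.
	while( cin ) {
		if(copy) {
			if( attemptToFind(end, true) )
				copy = false;
		} else {
			if( attemptToFind(start) )
				copy = true;
		}
	}
	
	if( cin.bad() ) {
		cerr << "Input error\n";
		return 2;
	}
	return 0;
}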


I just wrote and tested this by calling:
copy_paste ! ... <in.txt

Where the input file (in.txt) contained:
Hello, world!
Copy all this text...
However, not this text.

And I got:

Copy all this text...
Last edited on Mar 29, 2011 at 7:18pm
Mar 29, 2011 at 6:50pm
Yes, all of it would be done on single lines.
Actually, I'll just give an example.

(OFF,0,0,0) OX= WX= #[-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841] ;

This is part of the final output for a line. In the input file there is junk between the OX= and WX=:

(OFF,0,0,0) OX= fdasfdsafdsafdsafdsafdsafds WX= agdsafdsafdsafdsafdsafdas #[-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841] ; //this is all on one line

Essentially, what I need is to read and extract everything between the #[ and the ].
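For a single line like the one above, std::string::find and substr are enough. A minimal sketch, assuming the delimiters are exactly "#[" and "]" and that each line holds at most one such field; the name extractBetween is hypothetical:

```cpp
#include <string>

// Hypothetical helper: returns the text between the first occurrence of
// `open` and the next occurrence of `close` after it, or "" if either
// delimiter is missing. Assumes at most one field per line.
std::string extractBetween(const std::string& line,
                           const std::string& open = "#[",
                           const std::string& close = "]") {
    std::string::size_type start = line.find(open);
    if (start == std::string::npos)
        return "";
    start += open.size();  // skip past the opening delimiter itself
    std::string::size_type stop = line.find(close, start);
    if (stop == std::string::npos)
        return "";
    return line.substr(start, stop - start);
}
```

Called on the sample line above, it should return -40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841, with both delimiters left out.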

Thanks for the reply!
Last edited on Mar 29, 2011 at 6:50pm
Mar 29, 2011 at 7:03pm
@Mathhead:

Will definitely try that out as soon as I get home.

Thanks, and I'll get back to you :]
Mar 29, 2011 at 7:04pm
I ran my program by calling:
copy_paste #[ ]

I entered the input (followed by an eof):
(OFF,0,0,0) OX= fdasfdsafdsafdsafdsafdsafds WX= agdsafdsafdsafdsafdsafdas #[-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841] ;

and got:
-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841]

I'm sure you could edit the program to discard the ending string "]".
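One way to drop the ending string is to collect the copied characters in a buffer and, after each character, check whether the buffer now ends with the end string; if it does, chop it off and stop. A sketch under that approach (copyUntil is a hypothetical name); it reads from any std::istream, so it could be pointed at std::cin like the program above:

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function out
#include <string>

// Hypothetical helper: reads characters from `in` until the string `end`
// has been read, storing everything before `end` in `out`.
// Returns true if `end` was found, false if input ran out first
// (in which case `out` holds whatever was read).
bool copyUntil(std::istream& in, const std::string& end, std::string& out) {
    out.clear();
    if (end.empty())
        return true;  // nothing to look for
    char c;
    while (in.get(c)) {
        out += c;
        // Does the buffer now end with the terminator?
        if (out.size() >= end.size() &&
            out.compare(out.size() - end.size(), end.size(), end) == 0) {
            out.erase(out.size() - end.size());  // discard the terminator
            return true;
        }
    }
    return false;
}
```

Checking the suffix after every character is simple rather than fast, but for short end strings such as "]" or "..." that hardly matters.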
Mar 29, 2011 at 7:13pm
You can make a little find function pretty easily.

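#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
 fstream file;
 file.open("file.ext", ios::in);

 const string tofind = "#[";
 const char endchar = ']'; // copying stops at this character
 string found = "";
 vector<string> vecfound;

 char buf;
 // Test the read itself instead of eof(), so a failed final read isn't processed.
 while (file.get(buf)) {
  if (buf != tofind[0])
   continue;
  // Use peek() to check whether the following characters complete the key.
  string::size_type i = 1;
  while (i < tofind.size() && file.peek() == tofind[i]) {
   file.get();
   ++i;
  }
  if (i == tofind.size()) { // found the whole key
   // Copy up to (not including) the ending char.
   while (file.get(buf) && buf != endchar) {
    found += buf;
    // or output to the new file
   }
   vecfound.push_back(found);
   found = "";
  }
 }

 for (vector<string>::size_type k = 0; k < vecfound.size(); ++k)
  cout << vecfound[k] << '\n';
 return 0;
}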
Mar 29, 2011 at 7:33pm
Thank you ultifinitus. I'll try that as well.
I have a question about how the program would read the file.
Does it read continuously until EOF (and new lines are just the '\n' character), or would it stop at the end of a line?
Also, can I force it to start reading from the next line, or skip the next line? Is there an easy way to do this? I read somewhere about using a vector to index all the '\n' newlines, but if it's that complicated I don't really need it. Just wondering!

Again, thanks for the replies; much appreciated!
Mar 29, 2011 at 7:47pm
My program doesn't read by line, but by character.

You can edit it to use the getline() function http://cplusplus.com/reference/string/getline in place of the read() method http://cplusplus.com/reference/iostream/istream/read . (Also, you don't need to store all the lines in memory at once: just create one string, loop through the lines reading one at a time, copy each to the output stream, and discard it. Whatever you print goes to the stream, so I don't think you need an extra copy in memory as an array.)
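With getline(), each line can be read into one reused string, processed, and written straight to the output stream, so nothing piles up in memory. A sketch assuming the "#[" ... "]" delimiters from the example line; extractLines is a hypothetical name, and lines without a complete field are simply skipped:

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, handy for trying it out
#include <string>

// Hypothetical helper: for each line of `in` that contains "#[" ... "]",
// writes the text between them to `out`, one result per line.
// Only the current line is held in memory.
void extractLines(std::istream& in, std::ostream& out) {
    std::string line;  // reused for every line
    while (std::getline(in, line)) {
        std::string::size_type start = line.find("#[");
        if (start == std::string::npos)
            continue;  // no field on this line
        start += 2;    // length of "#["
        std::string::size_type stop = line.find(']', start);
        if (stop == std::string::npos)
            continue;  // unterminated field: skip it
        out << line.substr(start, stop - start) << '\n';
    }
}
```

With real files (this needs <fstream>, and the names are placeholders): std::ifstream in("in.txt"); std::ofstream out("out.txt"); extractLines(in, out);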
Last edited on Mar 29, 2011 at 7:50pm
Mar 29, 2011 at 7:48pm
You can use getline() to read a single line, and if you don't want that line, just discard the result.
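Besides reading a line into a scratch string and ignoring it, istream::ignore() can discard everything up to and including the next '\n' without storing it anywhere. A small sketch (skipLines is a hypothetical name):

```cpp
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying it out
#include <string>

// Hypothetical helper: discards the next `n` lines of `in`.
// ignore() throws away characters up to and including the next '\n',
// so the skipped text is never stored.
void skipLines(std::istream& in, int n) {
    for (int i = 0; i < n; ++i)
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
```

After skipLines(in, 2), the next std::getline(in, line) returns the third line.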

Each read advances the position in the file. Note that .eof() does not look ahead: it only becomes true after a read has already tried to go past the end of the file, so it is safer to test the read itself (e.g. while (getline(file, line))) than to loop on !file.eof().
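Because eofbit is only set once a read fails at the end, a loop that checks eof() before reading runs one extra time on the final, failed read. The sketch below counts loop iterations both ways (both function names are hypothetical):

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying it out

// Anti-pattern: checks eof() *before* reading, so the final get() fails
// but the loop body still runs once for it.
int countWithEofCheck(std::istream& in) {
    int iterations = 0;
    char c;
    while (!in.eof()) {
        in.get(c);  // the last call fails and sets eofbit
        ++iterations;
    }
    return iterations;
}

// Preferred: the loop condition is the read itself, so the body
// only runs for characters that were actually read.
int countWithReadCheck(std::istream& in) {
    int iterations = 0;
    char c;
    while (in.get(c))
        ++iterations;
    return iterations;
}
```

For the three-character input "abc", the first loop runs four times and the second three times.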
Mar 29, 2011 at 8:23pm
Ah yeah I get it now.

|My program doesn't read by line, but by character.
|
|You can edit it to use the getline() function http://cplusplus.com/reference/string/getline in place of the read() method http://cplusplus.com/reference/iostream/istream/read . (Also, you don't need to store all the lines in memory at once: just create one string, loop through the lines reading one at a time, copy each to the output stream, and discard it. Whatever you print goes to the stream, so I don't think you need an extra copy in memory as an array.)

I was confusing getline() with get(); for some reason I thought get() would stop reading once it reached the end of a line.

However, am I correct that .get() will return a newline as '\n'?
Mar 29, 2011 at 8:35pm
You are correct: get() returns every single character, including '\n'. When there is nothing left to read, it returns the special value EOF instead of a character.
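The int-returning overload of get() hands back every character, newlines included, and returns std::char_traits<char>::eof() (EOF, typically -1) once nothing is left. That is why its result is best kept in an int and compared before converting to char. A sketch (readAllChars is a hypothetical name):

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying it out
#include <string>

// Hypothetical helper: collects every character from `in`, newlines included,
// stopping when get() returns EOF.
std::string readAllChars(std::istream& in) {
    std::string result;
    // Keep the result in an int: EOF is not a valid char value, and storing
    // it in a char first can make it indistinguishable from real data.
    int ch;
    while ((ch = in.get()) != std::char_traits<char>::eof())
        result += static_cast<char>(ch);
    return result;
}
```

Once get() has returned EOF, the stream's eof() also reports true.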
Mar 29, 2011 at 8:45pm
Okay thanks again.

I'll try to get something working and will post the results later today/tomorrow.
Mar 30, 2011 at 3:34pm
Okay I'm trying something out and am a bit confused by the tellg() function.

I have this in a separate program:
for(int i=0; i<10; i++){
        cout<<"cur:"<<textIn.tellg()<<endl;
        char buf = textIn.get();
        cout<<"next char:"<<buf<<endl;
}

And it gives me the output:

cur:0
next char:f
cur:5
next char: 
cur:10
next char:o
cur:15
next char:h
cur:20
next char: 

...


How exactly does tellg() work? It returns the current location of the get pointer, but does it also move the get pointer? And why is it, in this case, incrementing by 5?

I hope I phrased that correctly; I'm just trying to understand the output.

Thank you!

EDIT:

The input file is:
first #second third;
#fourth; fifth
sixth seventh
either### ninth
tenth

Last edited on Mar 30, 2011 at 3:34pm
Mar 30, 2011 at 3:39pm
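Try opening the file with the ios::binary flag. In text mode the value tellg() reports is only guaranteed to be usable with seekg(); it need not match the number of characters read, and some implementations (on Windows in particular) behave oddly with files that use Unix-style '\n' line endings. Binary mode turns off newline translation: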

file.open("filename",ios::in|ios::binary);
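With the file opened in binary mode there is no newline translation, so each get() should advance the position reported by tellg() by exactly one; tellg() itself only reports the position and does not move it. A sketch that records tellg() before each get() (positionsBeforeEachGet is a hypothetical name):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: opens `path` in binary mode and records what tellg()
// reports just before each of the first `count` calls to get().
// Stops early if the file runs out of characters.
std::vector<std::streamoff> positionsBeforeEachGet(const std::string& path, int count) {
    std::ifstream textIn(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<std::streamoff> positions;
    char c;
    for (int i = 0; i < count; ++i) {
        std::streampos pos = textIn.tellg();  // reports the position only; it does not move it
        if (!textIn.get(c))
            break;
        positions.push_back(pos);  // std::streampos converts to std::streamoff
    }
    return positions;
}
```

Run against the input file from the question, this should report 0, 1, 2, ... rather than jumping by 5.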
Mar 30, 2011 at 3:49pm
Ah thank you!
Mar 30, 2011 at 4:41pm
no problem ;)
Apr 4, 2011 at 1:02am
Well I have wrapped this project up, thanks to you all!

On an unrelated note, is there an easy way to get two DOS box displays up in the same program? I would like to be able to key in input on one and have the other display data as the program refreshes it. Is this possible?