Search & Extract from File

Hello,
First-time poster; I've come here often for help, but couldn't find anything on this topic:

For work I have to do some painstaking copy-pasting to convert one type of text file to another, but really all I'm doing is extracting the small bits that I actually need.

I want to write a C++ program that will help speed up this process.
I am in freshman programming courses (have about 3 semesters of C++ down) so I feel like I can handle this.

I've gotten the ins and outs of fstream down, but am having trouble establishing a solid way to accomplish the following:

I want to be able to search for a certain character or string in the file, then copy the file contents up through another character, so I can export them to the new file. And repeat this line by line.

What method should I be using? I have looked into the fstream functions, such as seekg, but don't know how to put them together. I need someone to push me in the right direction.

Any help is much appreciated!
Is that to be performed on single lines?
If the two delimiting characters are 'a' and 'b' and the following is the file:
qwertyuiopasdfghjkl
zxcvbnm
qwertasdfgvbnm
zxcvbnmqwertyu
sdfghjkl
mnbvcxzdsa
What should the resulting file look like?
Does this work?

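#include <iostream>
#include <string>

using namespace std;

// Reads characters from cin for as long as they match trg, echoing them
// to cout if copy is set. Returns true only if all of trg was matched.
// Note: a mismatching character is consumed, so a delimiter such as "ab"
// is missed in the input "aab".
bool attemptToFind(const string &trg, bool copy = false) {
	string::size_type i = 0;
	char c;
	while( i < trg.length() && cin.get(c) ) {
		if(copy)
			cout << c;
		if( c == trg[i] )
			i++;
		else
			break;
	}
	return i == trg.length();
}

int main(int argc, char *argv[]) {
	if( argc < 3 ) {
		cerr << "Needs two arguments\n";
		return 1;
	}
	
	const string start = argv[1],
		end = argv[2];
	bool copy = false;
	
	// Keep going until a read fails (normally at the end of the input)
	while( cin.good() ) {
		if(copy) {
			if( attemptToFind(end, true) )
				copy = false;
		} else {
			if( attemptToFind(start) )
				copy = true;
		}
	}
	
	// badbit means a real I/O error rather than a plain end of file
	if( cin.bad() ) {
		cerr << "Input error\n";
		return 2;
	}
	
	return 0;
}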


I just wrote and tested this by calling:
copy_paste ! ... <in.txt

Where the input file (in.txt) contained:
Hello, world!
Copy all this text...
However, not this text.

And I got:

Copy all this text...
Yes all of it would be done on single lines.
Actually, I'll just find an example.

(OFF,0,0,0) OX= WX= #[-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841] ;

This is part of the end output for a line. In the important file there is crap between the OX= and WX= :

(OFF,0,0,0) OX= fdasfdsafdsafdsafdsafdsafds WX= agdsafdsafdsafdsafdsafdas #[-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841] ; //this is all on one line

Essentially, what I need is to be able to read and extract everything between the #[ and the ];

Thanks for the reply!
@Mathhead:

Will definitely try that out as soon as I get home.

Thanks, and I'll get back to you :]
I ran my program by calling:
copy_paste #[ ]

I entered the input (followed by an eof):
(OFF,0,0,0) OX= fdasfdsafdsafdsafdsafdsafds WX= agdsafdsafdsafdsafdsafdas #[-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841] ;

and got:
-40.7570,-8.5179,-9.5819,98.6692,-43.3770,-7.1841]

I'm sure you could edit the program to discard the ending string "]".
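If you end up working on whole lines held in a std::string instead, dropping the closing "]" is mostly a matter of where you cut. Here is a minimal sketch of that idea, not a tested drop-in: the helper name extractBetween is made up for illustration, and it assumes the first "#[" on the line is the one you want.

```cpp
#include <string>

// Hypothetical helper (name invented for this sketch): returns the text
// between the first occurrence of `start` and the next occurrence of `end`
// after it, excluding both delimiters. Returns an empty string if either
// delimiter is missing.
std::string extractBetween(const std::string &line,
                           const std::string &start,
                           const std::string &end) {
    const std::string::size_type from = line.find(start);
    if (from == std::string::npos)
        return "";
    const std::string::size_type begin = from + start.length();
    const std::string::size_type to = line.find(end, begin);
    if (to == std::string::npos)
        return "";
    return line.substr(begin, to - begin);
}
```

Calling extractBetween(line, "#[", "]") on the example line should give just the comma-separated numbers, without the brackets.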
You can make a little find function pretty easily.

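#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
 ifstream file("file.ext");

 const string tofind = "#[";
 const char endchar = ']'; // assumed ending character; change it to suit your file
 string found;
 vector<string> vecfound;

 char buf;
 while (file.get(buf)) {
  if (buf != tofind[0])
   continue;

  // use peek() to find out if the next char is part of what you're trying to find
  string::size_type i = 1;
  while (i < tofind.size() && file.peek() == static_cast<unsigned char>(tofind[i])) {
   file.get();
   ++i;
  }
  if (i < tofind.size())
   continue; // only part of the key matched

  // found the whole key: collect everything up to the ending char
  while (file.get(buf) && buf != endchar) {
   found += buf;
   // or output to the new file
  }
  vecfound.push_back(found);
  found = "";
 }

 for (const string &s : vecfound)
  cout << s << '\n';
}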
Thank you ultifinitus. I'll try that as well.
I have a question about how the program would read the file.
Does it read continuously until EOF (and new lines are just the '\n' character), or would it stop at the end of a line?
Also, can I force it to start reading from the next line, or skip the next line, etc. Is there an easy way to do this? I read somewhere about using a vector to index all the \n newlines, but if it's that complicated I don't really need it. Just wondering!

Again, thanks for the replies; much appreciated!
My program doesn't read by line, but by character.

You can edit it to use the getline() function http://cplusplus.com/reference/string/getline in place of the read() method http://cplusplus.com/reference/iostream/istream/read . (Also, you don't need to store all the lines in memory at once: just create one string and loop through the lines, reading one at a time, copying it to the output stream, and discarding it. Whatever you print to a stream is stored there, so I don't think you need an extra copy in memory as an array.)
you can use getline to get a single line, and if you don't want that line, just let the result fizzle.

The .eof() check only becomes true after a read has tried to go past the end of the file. Every read, whichever method you use, advances the position pointer, so once one of them runs off the end, .eof() reports it.
Ah yeah I get it now.

|My program doesn't read by line, but by character.
|
|You can edit it to use the getline() function http://cplusplus.com/reference/string/getline in place of the read() method http://cplusplus.com/reference/iostream/istream/read . (Also, you don't need to store all the lines in memory at once: just create one string and loop through the lines, reading one at a time, copying it to the output stream, and discarding it. Whatever you print to a stream is stored there, so I don't think you need an extra copy in memory as an array.)

I was confusing the getline with get, for some reason I thought it would stop reading once it reached the end of a line.

However, am I correct that .get will return a newline as '\n'?
You are correct, it will return every single character, newlines included. At the end of the file, the no-argument get() returns EOF instead of a character.
Okay thanks again.

I'll try to get something working and will post the results later today/tomorrow.
Okay I'm trying something out and am a bit confused by the tellg() function.

I have this in a separate program:
for(int i=0; i<10; i++){
        cout<<"cur:"<<textIn.tellg()<<endl;
        char buf = textIn.get();
        cout<<"next char:"<<buf<<endl;
}

And it gives me the output:

cur:0
next char:f
cur:5
next char: 
cur:10
next char:o
cur:15
next char:h
cur:20
next char: 

.
.
.


How exactly does tellg() work? It returns the current location of the get pointer, but does it also move the get pointer? And why is it, in this case, incrementing by 5?

I hope I phrased that correctly; I'm just trying to understand the output.

Thank you!

EDIT:

The input file is:
first #second third;
#fourth; fifth
sixth seventh
either### ninth
tenth

Try the ios::binary flag:

file.open("filename",ios::in|ios::binary);
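For reference, tellg() only reports the position of the get pointer; it doesn't move it. In text mode the runtime may translate line endings, which I suspect (but haven't verified for your compiler) is what throws the reported positions off. Here is a small sketch you can use to check the behaviour in binary mode; the function name positionsWhileReading is made up for the example.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (name invented for this sketch): opens `path` in
// binary mode and records tellg() before each of the first `n`
// single-character reads.
std::vector<std::streamoff> positionsWhileReading(const std::string &path, int n) {
    std::ifstream textIn(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<std::streamoff> positions;
    char buf;
    for (int i = 0; i < n; i++) {
        positions.push_back(textIn.tellg()); // reports the position, does not move it
        if (!textIn.get(buf))
            break; // ran out of characters
    }
    return positions;
}
```

In binary mode each get() should advance the position by exactly one, so the recorded values count up 0, 1, 2, and so on.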
Ah thank you!
no problem ;)
Well I have wrapped this project up, thanks to you all!

On an unrelated note, is there an easy way to get two DOS box displays up in the same program? I would like to be able to key in input on one and have the other display data as the program refreshes it. Is this possible?