merge same words in text files

May 30, 2016 at 9:13pm
I need to merge the same words from f.txt and n.txt into f1.txt, but I can only merge them when every word is on its own line and on the same line number in both files: it only compares 1st to 1st, 2nd to 2nd, and so on. How do I make it look through all the text and merge them?

#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    ifstream in("f.txt");
    ifstream in2("n.txt");
    ofstream end("f1.txt");
    while ((!in.eof()) && (!in2.eof())) {
        string line, line2;
        getline(in, line);
        getline(in2, line2);
        if (line == line2) {
            end << line << "\n";
        }
    }
    in.close();
    in2.close();
    end.close();
    return 0;
}
May 30, 2016 at 9:22pm
If you want to work with "words" then you probably should be using the extraction operator instead of getline().
May 30, 2016 at 10:24pm
What do you mean by "merge same words"?

Are you trying to find all the "words" common to two files?
For example:

    I have eaten a pizza.

and

    A few dollars is all they have.

The common words are:

    a have

Is that correct?
May 31, 2016 at 9:35am
Yes, that's correct.
May 31, 2016 at 12:34pm
Then you'll probably want to read one of the files, extracting the words into some kind of container (vector, map, etc.). Then read each word from the second file and search the container for it; if the word is there, write it to the third file.
May 31, 2016 at 4:21pm
Store the words from each file in a set<string>, then find the intersection using set_intersection().

You can use f >> s to read words. Do this with a loop.

Once you have a word (inside the loop), strip it of all non-alphanumeric characters and convert all uppercase letters to lowercase. Use a loop. (Create a function to do it.)
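
Such a function might look like the sketch below. The name normalize is an assumption, and it only handles single-byte characters in the current C locale:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: keep only letters and digits, lowercased.
std::string normalize(const std::string& word)
{
    std::string result;
    // Iterating as unsigned char keeps the <cctype> calls well defined
    // for characters outside the 0..127 range.
    for (unsigned char c : word) {
        if (std::isalnum(c)) {
            result += static_cast<char>(std::tolower(c));
        }
    }
    return result;
}
```

A token made only of punctuation comes back as an empty string, so the caller probably wants to skip empty results.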

Also create a function to read a file's words into a set. Once you have the two sets (one for each file), intersect them into a third set. The third set will contain the words you should output.

Find an example of using set_intersection() here:
http://www.cplusplus.com/forum/general/26732/#msg142721

Be sure to #include <algorithm>, <cctype>, <iterator>, <set>, and <string>.

Good luck!