-> open A
-> read first line from A
-> open B
-> read first line from B
-> open C
-> Run a loop comparing the current line from A to each of the lines in B. If you reach the end of B without finding a matching line, write the current line to C. Either way, read the next line from A and go back to the beginning of B.
I don't think you would have any memory issue by reading a text file line by line since you'd only be dealing with two strings at once.
As far as I understand OP's issue, that's what he's looking for, or is it?
and we have to find lines that are present in file A
but not present in file B and print result to file C
I was thinking in terms of text files as an example, where a "line" is any string ending with '\n', so it's highly unlikely that you get two identical lines.
In which way would you think it might not be the expected output? :)
I must apologise, but I'm not getting what you're saying at all about the relationship between A and B. Aren't they two separate text files, generated independently of one another?
You assume separate. I assume one evolving from the other. We can both be wrong.
The point is that we should know the relationship, etc. in order to select the "best" solution for it.
For example, if we know that each file is smaller than a megabyte, or that some are larger than a terabyte, that knowledge will guide our decisions.
Oh. Yes, I see what you mean. Indeed it's very relevant to know the source and size of the files, and the relationship between them. I hadn't taken that into consideration.
I guess it's up to OP to narrow down what files he wants to deal with..
Write a function that searches a file for a given text string and returns a bool: bool FileContains(const std::string& searchTerm);
Get that working.
Then write another function that reads each line from file A and, inside the loop, calls FileContains(thisLine) on file B to see if the line exists there. If it doesn't, write it to file C.
Edit: After reading keskiverto's responses I feel the need to add this code fragment:
for each line in A
    if (fileC.FileContains(thisLine))
        // already written to C, ignore
    elseif (!fileB.FileContains(thisLine))
        // not present in B, so it belongs in C
        fileC.write(thisLine)
    endif
endfor
As keskiverto says, inside knowledge will lead to better algorithms and performance. My example above is just a brute force method.
For example, if you know that all of the items in the list are in alphabetical order, you can tweak the FileContains() algorithm to bail out sooner: if it's looking for "dog" and gets to "house", you know "dog" cannot exist. But we know nothing about your data, so brute force seems the best suggestion at the moment.
#include <iostream>
#include <fstream>
#include <string>
#include <unordered_map>
using namespace std;

const int LINES_PER_BATCH = 1000000;

int main() {
    ifstream finA("fileA"); if (!finA) { cerr << "cannot open fileA\n"; return 1; }
    ifstream finB("fileB"); if (!finB) { cerr << "cannot open fileB\n"; return 1; }

    while (true) {
        string line;

        // read a batch of fileA lines
        unordered_map<string,bool> batch; // bool true means "found in fileB"
        for (int i = 0; i < LINES_PER_BATCH && getline(finA, line); ++i)
            batch[line] = false;
        if (batch.size() == 0)
            break;

        // process all of fileB with this batch of fileA's lines.
        while (getline(finB, line)) {
            auto it = batch.find(line);
            if (it != batch.end())
                it->second = true;
        }
        finB.clear();  // clear eof/fail flags, otherwise seekg does nothing
        finB.seekg(0); // reset fileB to the beginning

        // print lines from this batch of fileA not found in fileB
        // (order within a batch is unspecified; duplicates are printed once)
        for (const auto& x: batch)
            if (!x.second)
                cout << x.first << '\n';
    }
}