This programs checks a files for duplicate lines and creates a new file that is duplicate free. Would you please review it and give me some feed back on how it can be improved.
The code should be pretty self explanatory. Thank you for your feedback.
#include <fstream>
#include <iostream>
#include <string>
usingnamespace std;
string word[2000];
int i,j,k;
bool c = 0;
int main()
{
// opens/creates files
ifstream afile("word.txt");
ofstream bfile("list.txt");
// creates a list of names in memory
for (i=0;i<2000;i++)
{
getline(afile, word[i]);
} // for i -- getline
afile.close();
// checks for and count doops
// writes names to bfile("list.txt")
for (i=0; i<2000; i++)
{
c=0; // c = copy aka doop aka duplicate
k=0; // k = counter, counts number of doops
// checks to see is this name is a doop
for (j=0;j<i;j++)
{
// if word is doop
if (word[j] == word[i])
{ c++; }
}
// if word is not a doop
if (c==0)
{
// counts number of doops
for (j=i;j<2000;j++)
{
// doop was found lets count it
if (word[j] == word[i])
{ k++; } // counting is fun
} // for j=i
// writes word to bfile
bfile << k << " - " << word[i] << endl;
} // if (c==0)
} // i -- check list
bfile.close();
} // main
Operationally:
Your file cannot be more than 2000 lines long.
As the file gets larger, it takes exponentially longer to run.
Code:
Your code is littered with 2000. A constant should be used.
You keep checking a line even after establishing its a duplicate, and then you loop again just to count.
In the end, the output isn't the same as the input without the duplicates, as you have additional text.
You shouldn't need to call close on a stream, it closes itself on destruction.
main doesn't return a value.
As a rule your loops shoud use pre-increment rather than post-increment (++i rather than i++).
First, I'd man uniq if I were you since you just reinvented a simplistic version of it.
Second, the main problems I see with your program are:
1) You support files that are no larger than 2000 lines long.
2) Your program is not scalable in terms of memory usage (you cache the entire file in RAM).
3) Your program is not scalable in terms of execution time (linear search for every line = O(n^2)).
#include <iostream>
int main()
{
std::string lastline, line;
while (std::getline(std::cin, line))
{
if (line != lastline)
{
std::cout << line << std::endl;
lastline = line;
}
}
return 0;
}
Further:
Yes, you need to return a value from main.
post-increment needs a temporary value, pre-increment doesn't.
If you work with streams without storing, you don't have a limit.
If you look at the uniq above, it only ever stores two lines of the input. It reads from stdin and writes to stdout, and doesn't need to store anything much during the run, as it isn't using resources, it won't run out.
If your program tries to suck everything in before writing it out, then you have the immediate limit of how much you can actually hold.