Feedback please

This programs checks a files for duplicate lines and creates a new file that is duplicate free. Would you please review it and give me some feed back on how it can be improved.

The code should be pretty self explanatory. Thank you for your feedback.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

string word[2000];

int i,j,k;
bool c = 0;

int main()
{
// opens/creates files
    ifstream afile("word.txt");
    ofstream bfile("list.txt");

// creates a list of names in memory
    for (i=0;i<2000;i++)
    {
        getline(afile, word[i]);
    } // for i -- getline
afile.close();

// checks for and count doops
// writes names to bfile("list.txt")
    for (i=0; i<2000; i++)
    {
    c=0; // c = copy aka doop aka duplicate
    k=0; // k = counter, counts number of doops

    // checks to see is this name is a doop
        for (j=0;j<i;j++)
        {
        // if word is doop
            if (word[j] == word[i])
            { c++; }
        }
    // if word is not a doop
        if (c==0)
        {
        // counts number of doops
            for (j=i;j<2000;j++)
            {
            // doop was found lets count it
                if (word[j] == word[i])
                { k++; } // counting is fun

            } // for j=i

        // writes word to bfile
            bfile << k << " - " << word[i] <<  endl;

        } // if (c==0)

    } // i -- check list
bfile.close();


} // main 

Operationally:
Your file cannot be more than 2000 lines long.
As the file gets larger, it takes exponentially longer to run.

Code:
Your code is littered with 2000. A constant should be used.
You keep checking a line even after establishing its a duplicate, and then you loop again just to count.
In the end, the output isn't the same as the input without the duplicates, as you have additional text.
You shouldn't need to call close on a stream, it closes itself on destruction.
main doesn't return a value.
As a rule your loops shoud use pre-increment rather than post-increment (++i rather than i++).
First, I'd man uniq if I were you since you just reinvented a simplistic version of it.

Second, the main problems I see with your program are:
1) You support files that are no larger than 2000 lines long.
2) Your program is not scalable in terms of memory usage (you cache the entire file in RAM).
3) Your program is not scalable in terms of execution time (linear search for every line = O(n^2)).
Jsmith, I am on a windows machine. Does windows have man pages?

Thanks guys. The file this is used for is a list of names and I wanted a count of how many time each name was listed.

You keep checking a line even after establishing its a duplicate, and then you loop again just to count.


1st the program looks in bfile for a doop. If no doop is found it counts the doops in the afile then writes the count and string to bfile.

You shouldn't need to call close on a stream, it closes itself on destruction.

Thanks. Thats what the iofstream tutorial did so thats how i did it.

main doesn't return a value.

Does it need to?

As a rule your loops shoud use pre-increment rather than post-increment

Whats the different?

You support files that are no larger than 2000 lines long.

How can i fix this?

Your program is not scalable in terms of memory usage (you cache the entire file in RAM)

How can i fix this?

Thank you for you input. I don't know much about C++ and appreciate you input.

Last edited on
On unix, you'd do:
sort <filename> | uniq

Windows provides a sort.

Here's a uniq:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#include <iostream>

int main()
{
        std::string lastline, line;
        while (std::getline(std::cin, line))
        {
                if (line != lastline)
                {
                        std::cout << line << std::endl;
                        lastline = line;
                }
        }

        return 0;
}


Further:
Yes, you need to return a value from main.
post-increment needs a temporary value, pre-increment doesn't.
If you work with streams without storing, you don't have a limit.
Last edited on
Thanks KBW.
Can you explain this?
If you work with streams without storing, you don't have a limit.
If you look at the uniq above, it only ever stores two lines of the input. It reads from stdin and writes to stdout, and doesn't need to store anything much during the run, as it isn't using resources, it won't run out.

If your program tries to suck everything in before writing it out, then you have the immediate limit of how much you can actually hold.
I get it. Thanks!
Topic archived. No new replies allowed.