Deleting repeated lines in a text file

Jul 30, 2014 at 6:34pm
Hi everyone,

I have a text file with repeated lines, and I would like to get rid of the duplicate information. Can you help me with an algorithm to achieve this?

Example of my text file (the actual file may be huge, up to 500MB):

1
2
3
4
5
6
7
8
9
10
+Hello
+Bye
*The sun
-The moon
+Bye
/One coin
+Bye
+Bye
+Hello
*The sun


And i would expect to get something like this:

1
2
3
4
5
+Hello
+Bye
*The sun
-The moon
/One coin


I know how to open and read a file with fstream and getline(), but I don't know how to make the comparison.

Thanks.
Last edited on Jul 30, 2014 at 6:35pm
Jul 30, 2014 at 7:15pm
Each time you read a line, check whether it is already stored in a list.

If it is, ignore the line and proceed to the next one.

If it is not in the list, add it.

Once all lines from the file have been processed, output the list to a new file, which should then contain no duplicate lines.
Jul 30, 2014 at 7:50pm
Perhaps a set would be useful? As you read a line, try to put the line into a set.
http://www.cplusplus.com/reference/set/set/
It probably won't preserve the order that you see the lines. Would that be a problem?
Jul 30, 2014 at 9:24pm
Actually, a set would work nicely if we directly output each line to the output file after determining that it is not yet in the set and adding it.


Jul 30, 2014 at 9:25pm
At the end, the set itself may not preserve the original order, but by then that no longer matters.
Jul 30, 2014 at 9:31pm
Thanks to both of you, SIK and booradley60. It seems like using sets may solve my problem. Sorry if it was too obvious from the beginning, but I didn't know about the existence of sets.

And yes, the original order is not important.
Last edited on Jul 30, 2014 at 9:31pm
Jul 30, 2014 at 9:31pm
Yup, that would work nicely.