Program that looks for repetitive numbers in a file

I don't even know where to begin on this at the moment. I have some large text files with one 9-digit number on each line (probably thousands in total). I need to write a C++ program that goes through these files, compares each 9-digit number against the others to find duplicates, deletes the duplicates, and then copies the remaining unique numbers to another file. The numbers are NOT comma-delimited (something I'm starting to wish I had done, as I'm guessing it would have made this easier).

Creating the new file isn't the hard part. What I'm stuck on is a good method for pulling out each 9-digit number, comparing it to the rest of the file, finding the duplicates, and deleting them. I'm not really sure how to go about this.
I think you should load the whole file into an array (or vector).
pseudocode:
vector<int> numbers;

load_file("numbers.txt",numbers); // load numbers from file into vector

// search for duplicate numbers and delete them from the vector
// you'll need two nested loops for that

write_file("numbers2.txt",numbers);
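Here is one way the pseudocode above could be filled in, as a minimal sketch. load_file and write_file are the hypothetical helper names from the pseudocode (they are not library functions), and remove_duplicates is a hypothetical name for the two nested loops. It assumes the file holds whitespace-separated integers that fit in an int, which every 9-digit number does.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (name taken from the pseudocode): reads whitespace-separated
// integers from a file into `numbers`. Returns false if the file can't be opened.
bool load_file(const std::string& path, std::vector<int>& numbers)
{
    std::ifstream in(path);
    if (!in)
        return false;
    int n;
    while (in >> n)              // stops as soon as a read fails (end of file or bad data)
        numbers.push_back(n);
    return true;
}

// Hypothetical helper (name taken from the pseudocode): writes one number per line.
bool write_file(const std::string& path, const std::vector<int>& numbers)
{
    std::ofstream out(path);
    if (!out)
        return false;
    for (int n : numbers)
        out << n << '\n';
    return static_cast<bool>(out);
}

// Hypothetical name for the two nested loops: for each number, erase every later
// copy of it. The first occurrence is kept, so the original order survives.
void remove_duplicates(std::vector<int>& numbers)
{
    for (std::size_t i = 0; i < numbers.size(); ++i)
        for (std::size_t j = i + 1; j < numbers.size(); )
            if (numbers[j] == numbers[i])
                numbers.erase(numbers.begin() + j);  // don't advance j: the next element moved into slot j
            else
                ++j;
}
```

From main you would call load_file("numbers.txt", numbers), then remove_duplicates(numbers), then write_file("numbers2.txt", numbers). Because every value is compared with every later value, the work grows quadratically, which should still be manageable for a few thousand numbers.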


Do the numbers need to be in the same order when written to the new text file? If all you need is to delete the duplicates, you could use std::list: its sort() member function puts the numbers in order, and unique() then removes the duplicates, which are now next to each other.

#include <fstream>
#include <iostream>
#include <list>

using namespace std;

int main()
{
	list<int> List;
	int Number;

	ifstream File1("Old_File.txt");
	if (!File1)
	{
		cerr << "Could not open Old_File.txt\n";
		return 1;
	}

	// Loop on the read itself: testing eof() before reading would store one
	// extra, bogus value when the file ends with a newline.
	while (File1 >> Number)
		List.push_back(Number);   // Reads each number and inserts it into list

	File1.close();

	List.sort();     // Sorts the numbers in order of value
	List.unique();   // Removes adjacent duplicates (all duplicates, once sorted)

	ofstream File2("New_File.txt");

	for (list<int>::iterator It = List.begin(); It != List.end(); ++It)
		File2 << *It << '\n';   // Writes each element in list into new text file

	File2.close();

	return 0;
}
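The same sort-then-remove idea also works on a std::vector with std::sort and std::unique from <algorithm>. In this sketch the function names are hypothetical. Note that std::unique does not shrink the container: it only moves the unique values to the front and returns the new logical end, so the erase() call is what actually removes the leftovers. The writer also assumes every value is meant to be exactly nine digits and pads with zeros, because reading 012345678 into an int and writing it back would otherwise produce 12345678.

```cpp
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>

// Hypothetical name: sorts the numbers so equal values become adjacent, then
// removes the adjacent copies.
void sort_and_unique(std::vector<int>& numbers)
{
    std::sort(numbers.begin(), numbers.end());
    // std::unique returns the new logical end; erase() trims everything after it.
    numbers.erase(std::unique(numbers.begin(), numbers.end()), numbers.end());
}

// Hypothetical name: writes one number per line, zero-padded to 9 digits.
// Assumes every value is a non-negative number meant to have exactly 9 digits.
bool write_padded(const std::string& path, const std::vector<int>& numbers)
{
    std::ofstream out(path);
    if (!out)
        return false;
    for (int n : numbers)
        out << std::setw(9) << std::setfill('0') << n << '\n';  // setw applies to the next output only
    return static_cast<bool>(out);
}
```

A vector keeps its elements in one contiguous block, which usually makes sorting it faster than sorting a std::list, but either container gives the same result here.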
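If order does matter, you can use this version of the code above instead. It preserves the original order, but it compares every number against every later number, so it scales worse (roughly quadratically) than the sorted version as the file grows.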
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
	vector<int> List;
	int Number;

	ifstream File1("Old_File.txt");
	if (!File1)
	{
		cerr << "Could not open Old_File.txt\n";
		return 1;
	}

	while (File1 >> Number)       // Stops when a read fails, so no bogus final value
		List.push_back(Number);   // Reads each number and inserts it into the vector

	File1.close();

	// For each number, erase every later copy of it. The first occurrence stays,
	// so the original order is preserved.
	for (vector<int>::iterator i = List.begin(); i != List.end(); ++i)
		for (vector<int>::iterator j = i + 1; j != List.end(); )
		{
			if (*j == *i)
				j = List.erase(j);   // erase() returns an iterator to the element after the erased one
			else
				++j;
		}

	ofstream File2("New_File.txt");

	for (vector<int>::iterator i = List.begin(); i != List.end(); ++i)
		File2 << *i << '\n';   // Writes each element of the vector into the new text file

	File2.close();

	return 0;
}
@James2250
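You are doing a couple of dangerous things there: looping on eof() before reading, which stores one bogus extra value when the file ends with a newline (and never terminates if the file fails to open), and erasing elements from a vector while iterating over it, which invalidates iterators unless you carry on from the iterator that erase() returns.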
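One of the usual dangers in code like that is testing eof() before reading. eof() only becomes true after a read has already tried to go past the end of the input, so the final, failed extraction still gets stored. The two hypothetical functions below, which read from an in-memory string instead of a file, sketch the difference.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Risky pattern (hypothetical name): checks eof() before each read, so the
// failed read at the end still pushes a value. If the stream fails for another
// reason (for example non-numeric text), eof() never becomes true and this never ends.
std::vector<int> read_eof_loop(const std::string& text)
{
    std::istringstream in(text);
    std::vector<int> v;
    int n = 0;
    while (!in.eof())
    {
        in >> n;
        v.push_back(n);
    }
    return v;
}

// Safer pattern (hypothetical name): the extraction itself is the loop
// condition, so only values that were actually read are stored.
std::vector<int> read_checked(const std::string& text)
{
    std::istringstream in(text);
    std::vector<int> v;
    int n;
    while (in >> n)
        v.push_back(n);
    return v;
}
```

For the input "1\n2\n", read_eof_loop returns three values while read_checked returns {1, 2}. What the extra third value holds can differ between C++ versions and libraries, so it is not something to rely on.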

@dalydir
The std::set is specifically designed to hold a collection of unique elements.

#include <iostream>
#include <set>
#include <string>
using namespace std;

int main()
  {
  // We'll use a collection of unique 'string's.
  // (Since each line of the input file is just a number, and the only thing we are doing
  //  with the number is comparing it for uniqueness against another number/string.)
  typedef set <string> myset;

  myset unique_s;

  // Get all the input strings and keep the unique ones
  string s;
  while (getline( cin, s ))
    {
    unique_s.insert( s );
    }

  // Print all the unique strings to the output
  for (myset::const_iterator i = unique_s.begin(); i != unique_s.end(); ++i)
    cout << (*i) << endl;

  // All done!
  return 0;
  }
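If you want the set-based approach to keep the original order instead of printing in sorted order, one option is to use the bool that std::set::insert returns, which is true only when the value was not already present. This is a sketch under a few assumptions: print_unique_lines and unique_lines are hypothetical names, trailing spaces, tabs and '\r' characters (such as the '\r' left over when a file with Windows line endings is read on another system) are trimmed so they don't make equal numbers look different, and blank lines are skipped.

```cpp
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Copies each distinct line from `in` to `out` the first time it is seen,
// so the output keeps the input order rather than the set's sorted order.
void print_unique_lines(std::istream& in, std::ostream& out)
{
    std::set<std::string> seen;
    std::string s;
    while (std::getline(in, s))
    {
        s.erase(s.find_last_not_of(" \t\r") + 1);  // trim trailing whitespace (clears an all-blank line)
        if (s.empty())
            continue;                               // skip blank lines
        if (seen.insert(s).second)                  // .second is true only for a newly inserted value
            out << s << '\n';
    }
}

// Convenience wrapper for trying it out on an in-memory string.
std::string unique_lines(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    print_unique_lines(in, out);
    return out.str();
}
```

Calling print_unique_lines(cin, cout) from main works with input and output redirection as well.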

Here is an example run:

D:\Michael\prog>a
123
456
123
7890
456
^Z
123
456
7890

D:\Michael\prog>
(That ^Z in there is when I pressed Ctrl+Z to terminate the input; everything that follows it is output.)

You can also read and write files by using command-line redirection:

D:\Michael\prog>a < original.txt > outfile.txt

Hope this helps.