Hi, I was wondering if I could have some help optimising this code. I need to reduce the time it takes to search through an array of 1 million words against a list of 850 words that need to be removed.
#include <iostream>
#include <string>
#include <fstream>
#include <ctime>
#include <cstdlib>
using namespace std;

int main()
{
    time_t begin, end;
    time(&begin);

    // open the file containing the words that need to be removed
    ifstream WordList("WordList.txt");
    ifstream Original("word.txt");
    char str[15];
    int max = 10;
    string* words = new string[max];
    int x = 0;
    int y = 0;

    // read the text file holding the list of words to remove
    while (WordList.getline(str, 15))
    {
        // add each word to the dynamic array
        words[x] = str;
        x++;
        // if the array is too small, double it and copy the contents over
        if (x >= max)
        {
            max = max * 2;
            string* temp = new string[max];
            for (int i = 0; i < x; i++)
            {
                temp[i] = words[i];
            }
            delete[] words;
            words = temp;
        }
    }

    // reset max back to 10 for the file that is being read in,
    // grown the same way as the word array above
    max = 10;
    string* readin = new string[max];
    while (Original >> str)
    {
        readin[y] = str;
        y++;
        if (y >= max)
        {
            max = max * 2;
            string* temp2 = new string[max];
            for (int i = 0; i < y; i++)
            {
                temp2[i] = readin[i];
            }
            delete[] readin;
            readin = temp2;
        }
    }
    WordList.close();
    Original.close();

    int WordSize = x;
    int ReadinSize = y;
    cout << endl;

    // nested loops check whether each word from the read-in file
    // is one of the words that needs to be removed
    bool check = false;
    ofstream Final("dest.txt");
    for (int i = 0; i < ReadinSize; i++)
    {
        check = false;
        for (int j = 0; j < WordSize; j++)
        {
            if (readin[i] == words[j])
            {
                check = true;
            }
        }
        if (check == false)
        {
            Final << readin[i] << endl;
            cout << readin[i] << endl;
        }
    }
    Final.close();

    // deallocate the memory used by the arrays
    delete[] words;
    delete[] readin;

    time(&end);
    cout << "Time elapsed: " << difftime(end, begin) << " seconds" << endl;
    system("pause");
    return 0;
}
I would suggest building a small cache of the removal words for this. Caching is a standard optimisation technique; CPUs themselves use three levels of it (Level 1, Level 2 and Level 3) for the same reason.
Another way is to peek at the first character of each word. Only when the first character of the word at index n matches the starting character of one of the words you're looking for do you need to compare it in full; otherwise you can keep it straight away.
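A rough sketch of that first-character pre-filter (the bucket layout and helper names here are hypothetical, not from the original code): group the 850 removal words into buckets by first letter, so each lookup only scans the words that could possibly match.

#include <cctype>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// 26 buckets of removal words, indexed by lowercase first letter
vector<string> buckets[26];

void add_removal_word(const string& w)
{
    if (!w.empty() && isalpha((unsigned char)w[0]))
        buckets[tolower((unsigned char)w[0]) - 'a'].push_back(w);
}

// only the bucket matching the word's first letter is ever scanned
bool should_remove(const string& w)
{
    if (w.empty() || !isalpha((unsigned char)w[0]))
        return false;
    const vector<string>& b = buckets[tolower((unsigned char)w[0]) - 'a'];
    for (size_t i = 0; i < b.size(); i++)
        if (b[i] == w)
            return true;
    return false;
}

int main()
{
    add_removal_word("the");
    add_removal_word("and");
    cout << should_remove("the") << " " << should_remove("cat") << endl; // prints 1 0
    return 0;
}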
Another way is to check each word as it is pulled from the file stream: if the word is one of the ones you're looking for, discard it right there instead of storing it first.
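A minimal sketch of that idea, assuming the same file names as the code above and one whitespace-separated word at a time: keep the 850 removal words in a std::unordered_set and never store the big list at all (requires C++11).

#include <fstream>
#include <string>
#include <unordered_set>
using namespace std;

int main()
{
    // load the 850 removal words into a hash set (O(1) average lookup)
    ifstream WordList("WordList.txt");
    unordered_set<string> removeWords;
    string word;
    while (WordList >> word)
        removeWords.insert(word);

    // stream the big file and write each word straight out
    // unless it is in the removal set -- nothing is stored
    ifstream Original("word.txt");
    ofstream Final("dest.txt");
    while (Original >> word)
    {
        if (removeWords.find(word) == removeWords.end())
            Final << word << '\n';
    }
    return 0;
}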
If the lists aren't already sorted, sort one of them (either the big or the small one) so the lookup doesn't have to scan every entry.
To speed up the search, try a tree or a trie.
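For the sort-then-search route, here is a sketch using the standard library rather than a hand-written tree (same file names assumed): sort the 850-word list once, then binary-search it for every word pulled from the big file.

#include <algorithm>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    // read the small removal list and sort it once
    ifstream WordList("WordList.txt");
    vector<string> removeWords;
    string word;
    while (WordList >> word)
        removeWords.push_back(word);
    sort(removeWords.begin(), removeWords.end());

    // each lookup is now an O(log n) binary search instead of a full scan
    ifstream Original("word.txt");
    ofstream Final("dest.txt");
    while (Original >> word)
    {
        if (!binary_search(removeWords.begin(), removeWords.end(), word))
            Final << word << '\n';
    }
    return 0;
}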
Also, why are you handling the dynamic allocation yourself? Using a static array will save you all the reallocation.
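For example (assuming the 850 figure from the question is a hard upper bound), a fixed-size array removes the copy-and-double reallocation entirely:

#include <fstream>
#include <string>
using namespace std;

int main()
{
    const int MAX_WORDS = 850;   // known upper bound on the removal list
    string words[MAX_WORDS];
    int count = 0;

    ifstream WordList("WordList.txt");
    string word;
    while (count < MAX_WORDS && WordList >> word)
        words[count++] = word;   // no reallocation or copying ever happens
    return 0;
}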
And as Framework pointed out: do you actually need to store the words from the big list at all?
Edit:
If you really need to store them, and you can't guess a good number to reserve the space needed, use a container (like a vector or a set) that will handle the memory properly.
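A sketch of that container version (same file names assumed): a std::set owns and grows its own storage for the removal words, and a std::vector replaces the hand-rolled resizable array if the big list really has to be kept in memory.

#include <fstream>
#include <set>
#include <string>
#include <vector>
using namespace std;

int main()
{
    // the set manages its own memory and keeps the removal words ordered
    ifstream WordList("WordList.txt");
    set<string> removeWords;
    string word;
    while (WordList >> word)
        removeWords.insert(word);

    // the vector grows by itself -- no manual new/delete or copy loops
    ifstream Original("word.txt");
    vector<string> readin;
    while (Original >> word)
        readin.push_back(word);

    // filter with set::count instead of the nested loops
    ofstream Final("dest.txt");
    for (size_t i = 0; i < readin.size(); i++)
        if (removeWords.count(readin[i]) == 0)
            Final << readin[i] << '\n';
    return 0;
}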