program crashes

So I made a program that automatically goes through log files and cleans out any heavily repeated lines. The problem is that SOMETIMES it works and SOMETIMES it crashes, and I have no idea why. It crashes while running the block of code between lines 57 and 82 of the listing (the while(!log.eof()) loop in main). Does anyone have any idea what it could be? Any help would be greatly appreciated.

The log file here is actually a lot smaller than the ones I usually have. They're usually around 100 MB, so you can see why the log cleaner was necessary.

Here's a link to the file:
http://www.filedropper.com/logcleanerv1

And this is the code:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int compare(string a, string b); //compares two string values and returns percentage of how similar they are
int bigger(string a, string b); //sends the bigger string first from compare
int blacklist(string[], int size, string line);
void firstPass(void);
int blacklistF(string[], int size, string line);

//global variables

const unsigned int percent = 30; //percentage of string that is similar that qualifies it to be called 'similar'
const unsigned int numofrep = 50; //if any line repeats more than fifty times, it will be removed
string blacklisted[10000];//final blacklisted lines
string potentials[10000000]= {""};
int unsigned amounts[10000000] = {0};

int main(){
    cout<< "going through first pass with user defined blacklist:\n";
firstPass();
cout<< endl;

//ifstream variables
fstream log; //original log file
//ifstream altlog2; //for reading to see if next line should be killed or not

//ofstream variables
//ofstream altlog1; //loads lines from ifstream into this, but without ALL repeats including the ones that will be kept
fstream altlog;
fstream altlog2;



//other variables
string temp, temp2, potential,line;
unsigned int count1 = 0,count2 = 0, count3 = 0, numoflines = 0, totalBlacklisted = 0, crap = 0;
bool found = 0;

log.open("log1.log");
altlog.open("altlog.log", fstream::in | fstream::out | ios::trunc);
altlog2.open("altlog.log");

//generate blacklist file
//********************************************************************************************
cout<< "generating blacklist, this may take anywhere between 1 and 10 minutes\ndepending on the log file...\n";



unsigned int numofpotentials = 0;
unsigned int count4 = 0;



while(!log.eof()){
                  ++numoflines;
                  //cout<< numoflines << "\n";
                  getline(log,temp);
                  //altlog << temp << endl;
                  //altlog.flush();
                  count1 = 0;
                  while(count4 < numofpotentials){
                               if(compare(temp, potentials[count4])>= percent){
                                                found = true;
                                                ++amounts[count1];
                               }
                               ++count1;
                               ++count4;
                  }
                  if(!found){
                             ++numofpotentials;
                             potentials[numofpotentials] = temp;
                             ++amounts[count1];
                             }
                  found = false;
                  count4 = 0;
                  if((numoflines % 1000) == 0)
                  cout<<numoflines/1000<< "\n";

}






unsigned int blacklistnums[10000];


//i've obtained the repeats, now it's time to decide which ones to keep in
//***************************
cout<< "omitting repeats with more than " << numofrep << "\n";
altlog.clear();
altlog.seekg(0);
for(int i = 0; i < numoflines-1; ++i){
        //getline(altlog, potential);
        if(amounts[i+1] > numofrep){
                      blacklisted[count2] = potentials[i+1];
                      blacklistnums[count2] = amounts[i+1];
                      ++count2;
        }
}


//now that i've done that, it's time to write the new log file
cout<< "generating new log file\n";
altlog2.clear();
altlog2.seekg(0);
log.clear();
log.seekg(0);

altlog2 << "the following lines have been automatically removed from this log file:\n";
for(int i = 0; i < count2; ++i){
        altlog2<< blacklisted[i] << "\n" << blacklistnums[i] << "\n";
}
altlog2 << "\n\n\n\n\n\n\n\n\n\n";

while(getline(log, line)){
                   if(blacklist(blacklisted, count2, line)){
                                             altlog2 << line << endl;
                   }
}




altlog.close();
altlog2.close();
log.close();
remove("log1.log");
rename("altlog.log","newlog.log");

return 0;

}





int compare(string a, string b){

return(a.size() > b.size() ? bigger(a,b) : bigger(b,a));
}


So does anyone think they could figure out why it's crashing? I've been working on this program for a while now, and if I can get it to stop crashing while reading the file, I'll be finished.
Please post the code for these functions as well.

int bigger(string a, string b); //sends the bigger string first from compare
int blacklist(string[], int size, string line);
void firstPass(void);
int blacklistF(string[], int size, string line);
That's weird, I thought I had put everything there, but yeah, it's missing. Anyway, here it is. Something else I found weird is that some logs will work on some computers but not on others, and on the ones where it doesn't work, it breaks at different places.

int bigger(string a, string b){

int maxcount = 0, currentcount = 0;//used to see which set of concurrent characters were biggest
for(int i = 0; i < a.size(); i += 8){
	for(int j = 0; j < b.size(); ++j){
		if(a[i+j] == b[j]){
		 ++currentcount;
		 }
		else{
			if(currentcount > maxcount){
			 maxcount = currentcount;
             }//end if
			 currentcount = 0;
		    }//end else
		}//end inner for loop
	}//end outer for loop


return ((int)(((float)maxcount/((float)a.size()))*100));

}


int blacklist(string blacklist[], int size, string line){
    int counter = 0;
    for(int i = 0; i < size; ++i){
            if(compare(blacklist[i],line) >= percent){
                                          return 0;
            }
    }
    return 1;
}




void firstPass(void){
string blacklist[200];



int BLamount = 0;
ifstream blacklisttext;
	blacklisttext.open("blacklist.txt");
string BLline;
while(!blacklisttext.eof()){
	while(getline(blacklisttext, BLline)){
		blacklist[BLamount] = BLline;
		++BLamount;
		}
	}

string line;
ifstream sup;
    sup.open("log.log");            //opens your log file
    ofstream temp;
    temp.open("temp.txt");          //temporary file for copying all non flagged lines
    while(!sup.eof()){  //stops when it finds the end of the file
    while (getline(sup,line))  // reads each line in one by one
    {

	if(blacklistF(blacklist, BLamount, line))
        {                                                                    // keep the line only if no blacklist entry matches
           temp << line << endl;

        } // end if        
    }// end inner while loop
}// end outer while loop
    temp.close();   // close the two files that were being used
    sup.close();
    remove("newlog.log");    //if the program ran before it will delete the file that it had previously made
    rename("temp.txt","log1.log"); //rename the filtered file to log1.log, which main() reads next
}

int blacklistF(string blacklist[], int size, string line){

int counter = 0;
    for(int i = 0; i < size; ++i){
            if(line.find(blacklist[i]) != -1){
		++counter;
		}
	}
	if(counter == 0)
	 return(1);
	if(counter != 0)
	 return(0);
}
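
One possible cause worth checking in the listing above (an assumption, not something confirmed anywhere in this thread): in bigger(), the index i + j can run past the end of a, because i goes up to a.size() - 1 while j goes up to b.size() - 1. Reading a std::string out of range is undefined behavior, which would fit a crash that only happens sometimes and at different places. Below is a minimal sketch of a bounds-checked variant. The name biggerChecked is hypothetical, and unlike the original it resets the run at each offset and also counts a run that reaches the end of the string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical bounds-checked variant of bigger(); a is expected to be the
// longer of the two strings, as arranged by compare().
int biggerChecked(const std::string& a, const std::string& b) {
    if (a.empty()) return 0;  // avoids dividing by zero below

    std::size_t maxcount = 0;  // longest run of matching characters found
    for (std::size_t i = 0; i < a.size(); i += 8) {
        std::size_t currentcount = 0;
        // Stop j early so that i + j never reaches a.size().
        const std::size_t limit = std::min(b.size(), a.size() - i);
        for (std::size_t j = 0; j < limit; ++j) {
            if (a[i + j] == b[j]) {
                ++currentcount;
                maxcount = std::max(maxcount, currentcount);
            } else {
                currentcount = 0;
            }
        }
    }
    // Percentage of a covered by the longest run, truncated to an integer.
    return static_cast<int>(100 * maxcount / a.size());
}
```

Taking the strings by const reference also avoids copying both lines on every call, which matters when the function runs once per log line per potential.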
Declaring 10000000 std::string objects at once on the stack is very bad practice and the most likely cause of the crash.
You need to reconsider your approach; I suggest using a memory-mapped file technique.
If that were the problem, it would either 1. fail to compile or 2. fail as soon as the program starts, which it doesn't. So yes, I agree that it's a very lazy way of doing it and there are definitely better ways. However, that's not the problem I'm having.
No, it is undefined behavior, which means anything could happen, even working as expected.
This kind of bug is the most annoying kind.

If your algorithm works as expected (I haven't tested it), try allocating the memory from the heap instead, using the new operator.
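
Heap allocation does not have to mean calling new by hand: std::vector keeps its elements on the heap and grows as lines are added, so nothing has to be sized for 10000000 entries up front. Here is a minimal sketch under that assumption. PotentialTable, add, and bump are hypothetical names standing in for the potentials and amounts arrays.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical replacement for the fixed-size potentials[] / amounts[] arrays.
struct PotentialTable {
    std::vector<std::string> potentials;  // distinct lines seen so far
    std::vector<unsigned int> amounts;    // amounts[i] counts potentials[i]

    // Stores a new distinct line with a count of 1 and returns its index.
    std::size_t add(const std::string& line) {
        potentials.push_back(line);
        amounts.push_back(1);
        return potentials.size() - 1;
    }

    // Increments the count at index; at() throws std::out_of_range on a bad
    // index instead of silently touching memory outside the vector.
    void bump(std::size_t index) { ++amounts.at(index); }
};
```

Because both vectors grow together, an index returned by add is always valid for both of them, which removes the need to track numofpotentials separately.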
modoran wrote:
Declaring 10000000 std::string objects at once on the stack is very bad practice and the most likely cause of the crash.
He didn't. Those are global variables.

The program looks overly complicated. What are the blacklist and count1 ... count4 good for?

I would have thought that if you read a line that's already in the array 'potentials', then 'amounts' at that position is increased, and that's it?
I'll probably just remake it. I agree it looks overly complicated; the problem is that I put each part together several days apart, which is why it's all discombobulated. Now that I've done this version, I'll make a new one that's a lot more coherent. This time I'll do it with pointers and whatnot like modoran said, and maybe that will make it work as expected.

Thanks for your help.