Trying to identify problem in program.

The following program works properly (or at least accomplishes its intended function without failing) when compiled and run on the Windows 2000 computer at my lab.

It does not work properly on my own Windows XP computer. I believe something is going wrong with cstdio.

Be prepared: this program is a bit longer than the ones I normally see in here.
I'm totally open to suggested changes in the way I handle things.

Keep in mind that this program will be doing DNA sequence comparisons; functionality and speed are far more important than easily written code. The comparison algorithm WILL be adjusted in the future, since simple homology isn't really good enough for what we're looking for, but I don't think that's where the problem lies, since execution reaches the debug print statement that prints the string containing the sequences before it fails. This means that the sequence files are being loaded properly. I still don't understand why it's failing at that point.

If anyone intends to run a test of this, I can send you a sequence file, or you can make your own file to test it; just make sure it follows the FASTA format or the DNASTAR *.seq format (which allows comments at the beginning of the file; the sequence follows a line of two ^ characters). The fact that it works for you does not mean it's perfect, because in my case it works on one machine and not another. (It could be that my OS is just being strange, but I'd like to figure this out.)

I'll paste the code in follow-up messages, as it is quite long.
#include <cstdio>
#include <cstdlib>

#define BAD_ARGS 1
#define FILE_ERROR 2
#define BAD_FORMAT 3
/**/
#define DEBUG
//*/
/**/
#define HOMOLOGY_TEST
//*/
/*/
#define TEST_EQUIV
//*/
struct element 
{
   int start1,start2;
   struct element * next;
};
typedef struct element node;

int seqlength(FILE *file,int *size);
int readseq(FILE *file,char *seq,int size);
int findSeq(FILE *file,const char *input);
int equiv(char c1,char c2);
int backcheck(node **head, node **curr,int *num1, int *num2, int cursize,int maxsize);
int main (int argc, const char* argv[]) 
{
	int count,minsize,maxsize,num1,num2,size1,size2,cursize,selector;
	char flags,*ptr1,*ptr2,*ptr3, *seq1, *seq2, hashline[82]={"################################################################################\n"}, equalline[82]={"================================================================================\n"};
	const char *input1, *input2, *output;
	FILE *file;
	node **head,**curr,*newnode;
	count=1;
	flags=0;
	input1=NULL;
	input2=NULL;
	output=NULL;
	seq1=NULL;
	seq2=NULL;
	ptr1=NULL;
	ptr2=NULL;
	ptr3=NULL;
	minsize=0;
	maxsize=0;
	size1=0;
	size2=0;
	num1=0;
	num2=0;
	//==============================input handling==============================
#define INPUT_HANDLING
	if(!(argc%2))
	{
		puts("Number of arguments is wrong, indicating an input error.");
		puts(
				"Incorrect arguments!\n"
				"correct argument formats are:\n"
				"-i filename1\n-j filename2\n"
				"-o outputfilename\n"
				"-m min size of match\n"
				"-M max size of match\n"
				"repeats are not permitted"
			);
		return BAD_ARGS;
	}
	while(count<argc&&!(flags&0x01))
	{
		switch(argv[count][0])
		{
			case '-':
			switch(argv[count][1])
			{
				case 'i':
				if(argv[count][2]||flags&0x02)
				{
					flags=flags|0x01;
					break;
				}
				count++;
				input1=argv[count];
				count++;
				flags=flags|0x02;
				break;
				case 'j':
				if(argv[count][2]||flags&0x04)
				{
					flags=flags|0x01;
					break;
				}
				count++;
				input2=argv[count];
				count++;
				flags=flags|0x04;
				break;
				case 'o':
				if(argv[count][2]||flags&0x08)
				{
					flags=flags|0x01;
					break;
				}
				count++;
				output=argv[count];
				count++;
				flags=flags|0x08;
				break;
				case 'm':
				if(argv[count][2]||flags&0x10)
				{
					flags=flags|0x01;
					break;
				}
				count++;
				sscanf(argv[count],"%d",&minsize);
				count++;
				flags=flags|0x10;
				break;
				case 'M':
				if(argv[count][2]||flags&0x20)
				{
					flags=flags|0x01;
					break;
				}
				count++;
				sscanf(argv[count],"%d",&maxsize);
				count++;
				flags=flags|0x20;
				break;
				default:
				flags=flags|0x01;
				break;
			}
			break;
			default:
			flags=flags|0x01;
			break;
		}
	}
	if(flags!=0x3E)
	{
		if(!(flags&0x02)){puts("missing -i argument");}
		if(!(flags&0x04)){puts("missing -j argument");}
		if(!(flags&0x08)){puts("missing -o argument");}
		if(!(flags&0x10)){puts("missing -m argument");}
		if(!(flags&0x20)){puts("missing -M argument");}
		puts(
				"Incorrect arguments!\n"
				"correct argument formats are:\n"
				"-i filename1\n-j filename2\n"
				"-o outputfilename\n"
				"-m min size of match\n"
				"-M max size of match\n"
				"repeats are not permitted"
			);
		return BAD_ARGS;
	}
	if(!(minsize>0)||!(maxsize>0))
	{
		puts("Min and max sizes must be positive numbers.");
		return BAD_ARGS;
	}
	if(minsize>maxsize)
	{
		puts("Min size must be less than or equal to max size.");
		return BAD_ARGS;
	}
#ifdef DEBUG
	printf("input1: %s\n""input2: %s\n""output: %s\n""minsize: %d\n""maxsize: %d\n",input1,input2,output, minsize, maxsize);
#endif
	//==============================File Reading==============================
	//==========File1==========
#define FILE_1
	file=fopen(input1,"r");
	if(!file)
	{
		printf("Error opening %s!\n""Terminating program\n",input1);
		return FILE_ERROR;
	}
	flags=findSeq(file,input1);
	if(flags)
	{
		return flags;
	}
	flags=seqlength(file,&size1);
	if(flags)
	{
		return flags;
	}
	//==allocate space for sequence, check validity of format, and store==
	seq1=(char*)malloc(sizeof(char)*(size1+1));
	flags=readseq(file,seq1,size1);
	if(flags)
	{
		return flags;
	}
	fclose(file);
	file=NULL;
#ifdef DEBUG
	printf("%s\n",seq1);
#endif
	//==========File2==========
#define FILE_2
	file=fopen(input2,"r");
	if(!file)
	{
		printf("Error opening %s!\n""Terminating program\n",input2);
		return FILE_ERROR;
	}
	flags=findSeq(file,input2);
	if(flags)
	{
		return flags;
	}
	flags=seqlength(file,&size2);
	if(flags)
	{
		return flags;
	}
	//==allocate space for sequence, check validity of format, and store==
	seq2=(char*)malloc(sizeof(char)*(size2+1));
	flags=readseq(file,seq2,size2);
	if(flags)
	{
		return flags;
	}
	fclose(file);
	file=NULL;
#ifdef DEBUG
	printf("%s\n",seq2);
#endif
#ifdef TEST_EQUIV
//=======Begin Test=======
//This code block tests the equiv function that determines base-pair homology.
ptr1=(char*)malloc(sizeof(char)*17);
ptr1[0]='A';
ptr1[1]='C';
ptr1[2]='G';
ptr1[3]='T';
ptr1[4]='U';
ptr1[5]='R';
ptr1[6]='Y';
ptr1[7]='K';
ptr1[8]='M';
ptr1[9]='S';
ptr1[10]='W';
ptr1[11]='B';
ptr1[12]='D';
ptr1[13]='H';
ptr1[14]='V';
ptr1[15]='N';
ptr1[16]=0;
//"ACGTURYKMSWBDHVN"
ptr2=ptr1;
while(*ptr2)
{
	ptr3=ptr1;
	while(*ptr3)
	{
		printf("equiv(%c,%c)=%d\n",*ptr2,*ptr3,equiv(*ptr2,*ptr3));
		ptr3++;
	}
	ptr2++;
}
free(ptr1);
ptr1=NULL;
ptr2=NULL;
ptr3=NULL;
//=======End Test=======
#endif
	//==============================Sequence Analysis==============================	
#define ANALYSIS
	cursize=maxsize;
	//==================Initialize head and set pointers to NULL===================
	head=(node**)malloc(sizeof(node*)*(maxsize-minsize+1));
	curr=(node**)malloc(sizeof(node*)*(maxsize-minsize+1));
	num1=0;
	while(num1<=(maxsize-minsize+1))
	{
		head[num1]=NULL;
		curr[num1]=NULL;
		num1++;
	}
	
	while(!(cursize<minsize))
	{
		num1=0;
		while(num1<(size1-cursize))
		{
			num2=0;
			while(num2<(size2-cursize))
			{
				/*
				 *struct element 
				 *{
   				 *int start1,start2;
   				 *struct element * next;
				 *};
				 *typedef struct element node;
				 */
				count=0;
				flags=1;
				//==========Check if regions are homologous==========
				while((count<cursize)&&(flags))
				{
					flags=equiv(seq1[num1+count],seq2[num2+count]);
					count++;
				}
				//==========Check if regions are encompassed by Larger regions already found==========
				if(flags&&(cursize<maxsize))
					{
						flags=backcheck(head,curr,&num1,&num2,cursize,maxsize);
					}
				//==========Store info about region in a node==========
				if(flags)
				{
					newnode=(node*)malloc(sizeof(node));
					newnode->next=NULL;
					newnode->start1=num1;
					newnode->start2=num2;
					if(head[maxsize-cursize]==NULL)
					{
						head[maxsize-cursize]=newnode;
						curr[maxsize-cursize]=newnode;
					}
					else
					{
						curr[maxsize-cursize]->next=newnode;
						curr[maxsize-cursize]=newnode;
					}
					//num1=num1+count;
					num2=num2+count;
				}
				num2++;
			}
			num1++;
		}
#ifdef HOMOLOGY_TEST
		printf("cursize: %d,num1: %d,num2: %d\n",cursize,num1,num2);
#endif
		curr[maxsize-cursize]=head[maxsize-cursize];
		cursize--;
	}
	count=0;
	while(count<(maxsize-minsize+1))
	{
		while(curr[count])
		{
#ifdef HOMOLOGY_TEST
			printf("Homology found, size: %d, start1: %d,start2: %d\n",maxsize-count,curr[count]->start1+1,curr[count]->start2+1);
#endif
			curr[count]=curr[count]->next;
		}
		curr[count]=head[count];
#ifdef HOMOLOGY_TEST
		getchar();
#endif
		count++;
	}
Quick note here: I've been using #define and #ifdef statements to make it
easier to switch pieces of code on and off in various places, and also to
easily locate portions of code in Eclipse. Not every definition serves a
purpose within the program; hopefully that doesn't bother anything.

//==============================File Writing==============================
#define NO_FILE_WRITING
#ifdef FILE_WRITING
	file=fopen(output,"w");
	free(curr);
	curr=NULL;
	count=0;
	num1=0;
	num2=0;
	selector=0;
	/*/
	fputs(equalline,file);
	fputs(hashline,file);
	//*/
	fprintf(file,"Results of Comparison Between %s and %s:\n",input1,input2);
	fprintf(file,"Size of %s: %d\tSize of %s: %d\n",input1,size1,input2,size2);
	while(selector<maxsize-minsize+1)
	{
		fputs(hashline,file);
		fprintf(file,"Region Size: %d\n",maxsize-selector);
		newnode=head[selector];
		while(newnode)
		{
			fputs(equalline,file);
			count=0;
			while(count<maxsize-selector)
			{
				fputc(seq1[newnode->start1+count],file);
				count++;
			}
			fputc('\n',file);
			fprintf(file,"Start1: %d\t""Start2: %d\n""  End1: %d\t""  End2: %d\n",newnode->start1+1,newnode->start2+1,newnode->start1+maxsize-selector,newnode->start2+maxsize-selector);
			newnode=newnode->next;
			free(head[selector]);
			head[selector]=newnode;
		}
		selector++;
	}
	free(head);
	free(seq1);
	free(seq2);
	fclose(file);
	return 0;
#else
	return 0;
#endif

}
int readseq(FILE *file,char *seq,int size)
{
	char flags;
	int num=0;
	flags=fgetc(file);
	while(num<size-1)
	{
		if(('a'>'A')&&(flags>'Z')){flags=flags-('a'-'A');}
		if(('A'>'a')&&flags!='\n'&&(flags<'Z')){flags=flags+('A'-'a');}
		if(flags=='-')
		{
			puts("Gaps in sequence not handled by program. Terminating.");
			return BAD_FORMAT;
		}
		if((flags!='A')&&(flags!='C')&&(flags!='G')&&(flags!='T')&&(flags!='U')&&(flags!='R')&&(flags!='Y')&&(flags!='K')&&(flags!='M')&&(flags!='S')&&(flags!='W')&&(flags!='B')&&(flags!='D')&&(flags!='H')&&(flags!='V')&&(flags!='N')&&(flags!='\n'))
		{
			puts("Unrecognized character in sequence. Terminating.");
#ifdef DEBUG
			printf("character:%c, code:%d\n",flags,flags);
			printf("size:%d, num:%d\n",size,num);
#endif
			return BAD_FORMAT;
		}
		if(flags!='\n')
		{
			seq[num]=flags;
			num++;
		}
		flags=fgetc(file);
	}
	seq[size]=0x00;
	return 0;
}
int seqlength(FILE *file,int *size)
{
	char flags=0;
	fpos_t filepos;
	//==save position in file==
	if(fgetpos(file,&filepos))
	{
		puts("There was a problem saving file position, terminating program");
		return FILE_ERROR;
	}
	//==get sequence length==
	while(!feof(file))
	{
		if(fgetc(file)!='\n')
		{
			(*size)++;
		}
	}
#ifdef DEBUG
	printf("Size of Sequence: %d\n",*size);
	fsetpos(file,&filepos);
	flags=fgetc(file);
	printf("First letter in sequence: %c\n",flags);
#endif
	fsetpos(file,&filepos);
	return 0;
}

int findSeq(FILE *file,const char *input)
{
	char flags,filename[FILENAME_MAX],format;
	const char *ptr;
	int pos=0;
	ptr=input;
	while(*ptr!=0)
	{
		if(*ptr=='.')
		{
			pos=ptr-input;
		}
		ptr++;
	}
	ptr=input+pos;
	if(*(ptr+1)=='s'&&*(ptr+2)=='e'&&*(ptr+3)=='q'&&*(ptr+4)==0)
	{
		format=1;
	}
	switch(format)
	{
	case 1://Editseq file
		flags=fgetc(file);
		while(flags!='^')
		{
			flags=fgetc(file);
		}
		flags=fgetc(file);
		if(flags!='^')
		{
			puts("File Not formatted properly");
			return BAD_FORMAT;
		}
		flags=fgetc(file);
		if(flags!='\n')
		{
			puts("File Not formatted properly");
			return BAD_FORMAT;
		}
		return 0;
	case 0://any other extension assumes fasta
		flags=fgetc(file);
		if(flags!='>')
		{
			printf("%s is not in proper FASTA format, the first character should be a >.\n",input);
			return BAD_FORMAT;
		}
		while(flags!='\n')
		{
			flags=fgetc(file);
		}
		return 0;
	}
}

int backcheck(node **head, node **curr,int *num1, int *num2, int cursize,int maxsize)
{
	int selector,start1,end1,start2,end2,enum1,enum2;
	enum1=*num1+cursize-1;
	enum2=*num2+cursize-1;
	selector=0;
	while(cursize<maxsize-selector)
	{
		while(curr[selector])
		{
			start1=curr[selector]->start1;
			start2=curr[selector]->start2;
			end1=start1+maxsize-(selector+1);
			end2=start2+maxsize-(selector+1);
			if(
					(
						((*num1>start1)&&(*num1<end1))
						||
						((*num1==start1)||(*num1==end1))
						||
						((enum1>start1)&&(enum1<end1))
						||
						((enum1==start1)||(enum1==end1))
					)
					&&
					(
						((*num2>start2)&&(*num2<end2))
						||
						((*num2==start2)||(*num2==end2))
						||
						((enum2>start2)&&(enum2<end2))
						||
						((enum2==start2)||(enum2==end2))
					)
					
				)
			{
				curr[selector]=head[selector];
				return 0;
			}
			curr[selector]=curr[selector]->next;
		}
		curr[selector]=head[selector]; 
		selector++;
	}
	return 1;
}
int equiv(char c1,char c2)
{
/*
A 65> adenosine           M 77> A C (amino)
C 67> cytidine            S 83> G C (strong)
G 71> guanine             W 87> A T (weak)
T 84> thymidine           B 66> G T C (NOT A)
U 85> uridine             D 68> G A T (NOT C)
R 82> G A (purine)        H 72> A C T (NOT G)
Y 89> T C (pyrimidine)    V 86> G C A (NOT T)
K 75> G T (keto)          N 78> A G C T (any)
- 45>gap of indeterminate length
a<->z ==97<->122
a-A=32
*/
	switch(c1)
{
		case 'A':
		switch(c2)
		{
			default:return 0;
			case 'A':
			case 'R':
			case 'M':
			case 'W':
			case 'D':
			case 'H':
			case 'V':
			case 'N':
			return 1;
		}
		case 'C':
		switch(c2)
		{
			default:return 0;
			case 'C':
			case 'Y':
			case 'M':
			case 'S':
			case 'B':
			case 'H':
			case 'V':
			case 'N':
			return 1;
		}
		case 'G':
		switch(c2)
		{
			default:return 0;
			case 'G':
			case 'R':
			case 'K':
			case 'S':
			case 'B':
			case 'D':
			case 'V':
			case 'N':
			return 1;
		}
		case 'T':
		switch(c2)
		{
			default:return 0;
			case 'T':
			case 'U':
			case 'Y':
			case 'K':
			case 'W':
			case 'B':
			case 'D':
			case 'H':
			case 'N':
			return 1;
		}
		case 'U':
		switch(c2)
		{
			default:return 0;
			case 'U':
			case 'T':
			case 'Y':
			case 'K':
			case 'W':
			case 'B':
			case 'D':
			case 'H':
			case 'N':
			return 1;
		}
		case 'R':
		switch(c2)
		{
			default:return 0;
			case 'R':
			case 'G':
			case 'A':
			return 1;
		}
		case 'Y':
		switch(c2)
		{
			default:return 0;
			case 'Y':
			case 'T':
			case 'U':
			case 'C':
			return 1;
		}
		case 'K':
		switch(c2)
		{
			default:return 0;
			case 'K':
			case 'G':
			case 'T':
			case 'U':
			return 1;
		}
		case 'M':
		switch(c2)
		{
			default:return 0;
			case 'M':
			case 'A':
			case 'C':
			return 1;
		}
		case 'S':
		switch(c2)
		{
			default:return 0;
			case 'S':
			case 'G':
			case 'C':
			return 1;
		}
		case 'W':
		switch(c2)
		{
			default:return 0;
			case 'W':
			case 'A':
			case 'T':
			case 'U':
			return 1;
		}
		case 'B':
		switch(c2)
		{
			default:return 1;
			case 'A':
			return 0;
		}
		case 'D':
		switch(c2)
		{
			default:return 1;
			case 'C':
			return 0;
		}
		case 'H':
		switch(c2)
		{
			default:return 1;
			case 'G':
			return 0;
		}
		case 'V':
		switch(c2)
		{
			default:return 1;
			case 'T':
			case 'U':
			return 0;
		}
		case 'N':
		return 1;
		default:return 0;
	}

Could it be that not freeing the allocated space is causing the system to
believe there has been an error? (The space is freed when the file writing
portion is active, and there is STILL a problem even in that case, unless there
was allocated space that I forgot to free, which is possible.)
I appreciate any help that is offered. (I recommend checking the analysis algorithm, file writing portion, and backcheck subroutine for mistakes that would make the OS barf, as that is where I will be looking next.)

I use Eclipse to write the code, but I compile from the command line using the MinGW distribution of GCC.

-edit: I tried replacing the following line in the analysis algorithm:
flags=backcheck(head,curr,&num1,&num2,cursize,maxsize);
I commented it out and put the following in its place:
//flags=backcheck(head,curr,&num1,&num2,cursize,maxsize);
flags=1;

It STILL runs into a problem, so it does not appear that the backcheck subroutine is responsible.

-edit2
I just noticed I forgot to test my malloc operations to see if they failed. After I fixed that, the error still occurs.

Also, to be clear, I'm not sure WHAT the error is. It says "unhandled win32 exception occurred in compare.exe[3920]" when it opens a Visual Studio just-in-time debugger query.

It appears to be something in the analysis portion that's causing the trouble.
Most (almost all) of your code is C, as opposed to C++. Instead of using malloc/free, you should use new/delete. printf should be replaced with cout etc etc.

I would run this through a debugger. You can load GDB into your eclipse IDE and use that. I would also start putting some exception handling around your code to catch problems like you are having. You can disable the exception handling with #ifdef once your code is actually functioning.

The fact that it works on one OS but not another usually means you have allocated an insufficient amount of memory (or not allocated any), and writes to that memory fail on a different OS. It may work (by luck) on one OS but not another.
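
One spot in the posted code fits this description, offered here as a hedged observation rather than a confirmed diagnosis: `head` and `curr` are allocated with `maxsize-minsize+1` slots, but the loop that sets them to NULL runs while `num1<=(maxsize-minsize+1)`, which writes one slot past the end of each array. Below is a minimal sketch of the same initialization with the bound corrected. The helper name `makeBuckets` is hypothetical, and `std::vector` is swapped in for the raw `malloc` arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct node
{
    int start1, start2;
    node *next;
};

// Hypothetical helper: one list head per match size, from maxsize down to minsize.
// Assumes 0 < minsize <= maxsize, which the posted argument checks enforce.
std::vector<node*> makeBuckets(int minsize, int maxsize)
{
    assert(minsize > 0 && minsize <= maxsize);
    const std::size_t slots = static_cast<std::size_t>(maxsize - minsize + 1);
    std::vector<node*> buckets(slots);
    // The vector constructor already zero-initializes the pointers; the explicit
    // loop only mirrors the original code. Valid indices are 0 .. slots-1, so the
    // condition must be '<', not '<='.
    for (std::size_t i = 0; i < slots; ++i)
    {
        buckets[i] = NULL;
    }
    return buckets;
}
```

With `-m 10 -M 15` this gives six slots, indexed 0 through 5, matching the `maxsize-cursize` index used in the analysis loop.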

Keep in mind that this program will be doing DNA sequence comparisons; functionality and speed are far more important than easily written code.


Well-written code leads to increased functionality and speed, as well as fewer bugs. You should develop your code first with readability and reliability as the core fundamentals. Once you have achieved this, you should begin optimising the code. You shouldn't optimise as you code; this leads to many, many problems.

http://www.devx.com/go-parallel/Article/33534
Best Practices for Developing and Optimizing Threaded Applications, Part 1
You mean I should be using iostream instead of cstdio, right? Do you mind pointing me to a good tutorial for iostream? Looking at its operators makes me dizzy, since it reminds me of bit-shifting operations.

-edit: I think I might need fstream also.
I attempted to switch to using new/delete instead of malloc/free, and I'm running into this linker error now:

C:\Source\compare\src>gcc compare.cpp -o compare.exe
C:\DOCUME~1\goochmi\LOCALS~1\Temp/ccQandQS.o:compare.cpp:(.text+0x57d): undefined reference to `operator new[](unsigned int)'
C:\DOCUME~1\goochmi\LOCALS~1\Temp/ccQandQS.o:compare.cpp:(.text+0x6b3): undefined reference to `operator new[](unsigned int)'
C:\DOCUME~1\goochmi\LOCALS~1\Temp/ccQandQS.o:compare.cpp:(.text+0x750): undefined reference to `operator new[](unsigned int)'
C:\DOCUME~1\goochmi\LOCALS~1\Temp/ccQandQS.o:compare.cpp:(.text+0x76a): undefined reference to `operator new[](unsigned int)'
C:\DOCUME~1\goochmi\LOCALS~1\Temp/ccQandQS.o:compare.cpp:(.text+0x8c9): undefined reference to `operator new(unsigned int)'
collect2: ld returned 1 exit status
Use g++, not gcc. g++ is the C++ compiler driver; unlike gcc, it links the C++ runtime library that provides operator new.

http://www.cplusplus.com/doc/tutorial/files.html

That should give you a good primer on C++ IOStreams.

Just for reference, I have built a spatial abundance model for modelling the abundance of things (fish primarily, but also water, possums, etc.). It has to run an MCMC chain with 1 million to 10 million iterations, and each iteration requires hundreds of millions of calculations, so my code has to be extremely optimised. It's all done in OO C++ with no problems.

Did you try using gdb and try/catch statements to find where the error is occurring?
I hadn't built in any try/catch statements yet, as I wasn't sure which functions were capable of throwing exceptions.
Anything can throw an exception. It's just an easy way to isolate your code. I take it you're a scientist primarily?
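
One caveat, given as a hedged sketch: a write past the end of a raw array is undefined behavior and does not raise a C++ exception, so a try/catch block around it will not necessarily catch anything. Checked accessors such as `std::vector::at` do throw (`std::out_of_range`), which can make them useful while hunting a bug like this one. The function name `checkedRead` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true and stores the value if 'index' is valid,
// false if std::vector::at reported an out-of-range access.
bool checkedRead(const std::vector<int> &values, std::size_t index, int &out)
{
    try
    {
        out = values.at(index);   // at() checks the index; operator[] does not
        return true;
    }
    catch (const std::out_of_range &)
    {
        return false;
    }
}
```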
I'm currently working in a microbiology lab in a summer research internship. I'm interested in bioinformatics and I do have a little bit of experience with programming, but only what I've done in class and on my own, which has included Java, C, and assembly. Anything in C++ that differs from Java syntax I have to learn by looking it up or getting advice.

Anyway, new/delete has replaced malloc/free, and I rewrote the node structure into a class with constructors. All I need to do now is learn iostream, and I think things will be mostly converted over into C++. (I prefer to use char arrays in this case rather than strings, since I'm not really using the functions provided by the string class, and these sequences could become quite large.)
Size won't be a problem.

Just looking at your node code, you can greatly simplify it using the STL (Standard Template Library)

e.g.
#include <vector>
#include <iostream>

using namespace std;

struct element 
{
   int start1,start2;
};

int main() {
 element e1 = {0};
 element e2 = {0};

 e1.start1 = 10;
 e2.start2 = 5;

 vector<element> elementList;
 elementList.push_back(e1);
 elementList.push_back(e2);

 cout << "you have " << elementList.size() << " elements" << endl;

 vector<element>::iterator vPtr = elementList.begin();
 while (vPtr != elementList.end()) { 
  cout << "Element: Start1 = " << (*vPtr).start1 << " / Start2 = " << (*vPtr).start2 << endl;
  vPtr++;
 } 

 return 0;
}


Microbiology, cool :D I work for a marine, water and atmosphere research company. I am a software developer who assists the scientists in development.
The following code is what I turned my struct into. Is this less efficient than using a resizable array like a vector? This program is probably going to be iterated over a list of files, so I prefer to avoid any costs of resize operations.
I have taken two intro computer programming courses (both were in Java) and a computer architecture course that taught us to use some basic ANSI C and assembly. (They also taught us floating-point standards, though I have forgotten most of that.)

class node 
{
public:
	int start1,start2;
	node * next;
	node(void) : start1(0),start2(0),next(NULL){}
	node(int s1,int s2) : start1(s1),start2(s2),next(NULL){}
};
If you stored pointers to dynamically allocated elements, you'd be OK, as the vector would only have to reallocate storage for the pointers themselves.

Or you can reserve enough memory.
http://www.cplusplus.com/reference/stl/vector/reserve.html
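
A small sketch of the `reserve` approach, assuming an upper bound on the number of matches is known or can be estimated (the bound and the helper name `collectMatches` are made up for illustration). After `reserve`, `push_back` does not reallocate until the size would exceed the reserved capacity.

```cpp
#include <cstddef>
#include <vector>

struct element
{
    int start1, start2;
};

// Hypothetical helper: stores 'count' placeholder matches without reallocating.
std::vector<element> collectMatches(std::size_t count)
{
    std::vector<element> matches;
    matches.reserve(count);       // one allocation up front
    for (std::size_t i = 0; i < count; ++i)
    {
        element e = { static_cast<int>(i), static_cast<int>(i) + 1 };
        matches.push_back(e);     // no reallocation while size() <= capacity()
    }
    return matches;
}
```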

Which part of my "node code" do you expect vectors to simplify? I'd like to look and see if I want to make the switch; currently it's in linked-list format.

Also, I don't really understand what the "::iterator vPtr" part of the code does.
vector<element>::iterator vPtr = elementList.begin();
 while (vPtr != elementList.end()) { 
  cout << "Element: Start1 = " << (*vPtr).start1 << " / Start2 = " << (*vPtr).start2 << endl;
  vPtr++;
 } 
With your node code, you have to manage the attaching and detaching of the nodes yourself. A vector is a dynamically sized array that manages its own storage, so its elements can be accessed by index like those of an ordinary array.

An iterator behaves like a pointer. It can refer to an element in your vector, and by incrementing or decrementing it you can move to a different one.

You can also do cout << elementList[0].start1 << endl; etc. There is really very little need to write your own stacks, queues, lists, etc. when you can use the STL.
I also wrote a program (permutate.cpp) that takes in a text file and recombines filenames and lines of text to run another program through system() commands, enabling me to do batch runs of this program. (I made it extensible to any program that takes two input filename arguments and one output argument, though it currently puts all setting modifiers before the filenames, which may not be acceptable for other programs.) I just finished completely moving that program over to new/delete and iostream functions, and it doesn't use cstdio at all now. I'm getting to work changing this comparison program now; after that I'll try to tackle this strange bug and post an update whether or not I figure out what it is.

I'm still using my own node class at the moment. I may consider switching to vector later, but it's really not a big deal as long as I handle the linked list operations properly.
I think I might have discovered a clue to what is causing the strangeness. After rewriting the program into what seems to me to be proper C++, an interesting thing happened.

When I run this command:
C:\Source\compare\src>compare.exe -m 10 -M 15 -i seq3.txt -j seq2.seq -o output.txt

The program completes happily.

When I run THIS command:
C:\Source\compare\src>compare.exe -m 10 -M 15 -i seq3.txt -j seq2.seq -o compare_seq1_and_seq2_results.txt

Windows barks at me about an unhandled win32 exception.

It seems that something to do with the filename length is causing problems on my machine that didn't matter at all to the Windows 2000 machine.

-edit:
It maxes out at 20 characters (19 if you don't count the null terminator of the string).

-edit2
Also, it appears the error is occurring BEFORE the program gets to the file.open(filename,mode) operation, so this seems to be a command-line parameter size limitation. (I haven't reached the maximum command string limit, but the parameter might be over some per-parameter limit.)

-edit3
My other program (permutate.cpp) seems perfectly capable of taking a filename argument larger than the ones causing problems for compare.cpp, so it's not a command-line issue. There's probably something wrong in how I'm handling the arguments; I'm checking it now.

-edit4
I wrote a method that copies the argument character by character into an internal string, so as not to risk any accidental write operations to the arguments, but it still runs into an error.

-edit5
I just realized that the arguments aren't going anywhere, so I rewrote the code to use the arguments where they're needed by saving the location of the relevant inputs instead of copying them out; the copying seemed to be what made Windows think there was an access violation. It seems to be working so far.