Unusual Slowness in Processing Vector
Jan 19, 2009 at 2:25pm UTC
Dear all,
With this small dataset (mydata.txt):
```
2 AAAAA 21 21 21 21 21
1 AAAAC 9 18 12 18 15
1 AAATG 7 9 18 18 15
1 AACTA 17 9 12 15 18
1 AATGA 18 15 18 15 18
1 ACACT 17 18 17 15 12
1 ACGAT 18 8 12 18 15
1 ACTAC 15 8 12 12 18
1 ACTTA 15 6 18 6 18
1 ACTTC 21 21 21 21 21
1 CAAGA 18 15 17 9 18
1 CCCCC 21 21 21 21 21
1 CTACA 21 21 21 21 21
1 TACAC 12 21 12 21 21
1 TGATA 15 9 18 18 15
2 TGCTC 18 8 12 18 13.5
1 TTATA 18 18 18 18 18
1 TTCCC 21 12 21 21 21
1 TTCTC 18 7 9 18 12
1 TTCTG 21 21 21 21 21
```
I am wondering why this code of mine runs so extraordinarily slowly.
In particular, the "id2tagnum" function:
```cpp
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <cstdlib>
using namespace std;

std::pair<int, int> number_of_lines(char* myfn) {
    int lineno = 0;
    int tagLength;
    string line;
    ifstream myfilename(myfn);
    if (myfilename.is_open())
    {
        while (getline(myfilename, line))
        {
            stringstream ss(line);
            string DNA;
            ss >> DNA >> DNA;
            tagLength = DNA.size();
            lineno += 1;
        }
        myfilename.close();
    }
    return std::make_pair(lineno, tagLength);
}

void prn_vec(vector<int>& arg) {
    for (int n = 0; n < arg.size(); n++) {
        cout << arg[n];
    }
}

vector<int> id2tagnum(int id, int tgl) {
    // replicate
    vector<int> Rep;
    Rep.assign(tgl, 1);
    // this is also slow
    /*for (int k = 0; k < tgl; k++) {
        int nv = id - 1;
        Rep.push_back(nv);
    }*/
    prn_vec(Rep);
    cout << endl;
}

// End of functions
//
int main(int arg_count, char* arg_vec[]) {
    if (arg_count != 2) {
        cerr << "expected one argument" << endl;
        return EXIT_FAILURE;
    }
    pair<int, int> tuple = number_of_lines(arg_vec[1]);
    cout << "Number of Tags = " << tuple.first << endl;
    cout << "Tag length = " << tuple.second << endl << endl;
    int taglen = tuple.second;
    string line;
    ifstream myfile(arg_vec[1]);
    int idval = 0;
    if (myfile.is_open())
    {
        while (getline(myfile, line))
        {
            idval += 1;
            stringstream ss(line);
            string dummy;
            int tableEntry;
            ss >> tableEntry >> dummy;
            cout << "Tagno : " << idval << ", Rawcount: " << tableEntry << endl;
            // THIS LINE IS VERY2 SLOW
            vector<int> numTag = id2tagnum(idval, taglen);
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
```
Jan 19, 2009 at 3:40pm UTC
std::vector stores its elements in contiguous memory, so calling reserve() is recommended if you know in advance how large it will grow.
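A minimal sketch of the reserve() advice, assuming the final element count n is known before filling; make_filled is a hypothetical helper, not part of the original code. Capacity for n elements is allocated once up front, so the push_back calls never trigger a reallocation and copy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: build a vector holding n copies of value.
// reserve() allocates room for all n elements in one step, so the
// push_back calls below never have to reallocate and move the contents.
std::vector<int> make_filled(std::size_t n, int value) {
    std::vector<int> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(value);
    }
    return v;
}
```

For a vector of identical values, the fill constructor std::vector<int>(n, value) does the same job in one line; the loop form matters when the elements differ.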
Use pass-by-reference instead of returning a value, to avoid an extra copy.
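A sketch of the pass-by-reference suggestion; fill_tag is a hypothetical name for a variant of id2tagnum that fills a vector owned by the caller instead of returning a new one, and the value id - 1 is taken from the commented-out loop (an assumption about the intent). As an aside, compilers usually elide or move a returned vector (return value optimization, plus move semantics since C++11), so the main gain is that one buffer can be reused across loop iterations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical variant of id2tagnum: writes tgl copies of (id - 1) into a
// vector supplied by the caller. assign() replaces the old contents; once
// out already has enough capacity from an earlier call, implementations
// typically reuse that storage instead of allocating again.
void fill_tag(int id, std::size_t tgl, std::vector<int>& out) {
    out.assign(tgl, id - 1);
}
```

In main, numTag would then be declared once before the while loop and refilled with fill_tag(idval, taglen, numTag) on each iteration.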
How slow is it? With the data you listed, I can't see how it could be slow, even without my suggestions.
Jiryih Tsaur
Jan 20, 2009 at 12:41am UTC
id2tagnum doesn't return anything, although in your line
vector<int> numTag = id2tagnum(idval, taglen);
you seem to expect a return value.
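Reaching the end of a function declared to return a value without executing a return statement is undefined behavior in C++. Here the caller then copy-constructs numTag from a vector that was never actually returned, so the program may crash or misbehave unpredictably; GCC and Clang warn about this with -Wall (-Wreturn-type). The corrected sketch below assumes the intended contents are tgl copies of id - 1, as in the commented-out loop (the active code used 1 instead), and builds the vector at its final size with the fill constructor.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Print the elements with no separator, as the original prn_vec does.
// Taking a const reference avoids a copy and allows const arguments.
void prn_vec(const std::vector<int>& arg) {
    for (std::size_t n = 0; n < arg.size(); ++n) {
        std::cout << arg[n];
    }
}

// Corrected sketch of id2tagnum: it now returns the vector it builds.
// Assumption: the intended contents are tgl copies of (id - 1).
std::vector<int> id2tagnum(int id, int tgl) {
    std::vector<int> rep(tgl, id - 1);  // allocate and fill in one step
    prn_vec(rep);
    std::cout << '\n';
    return rep;
}
```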
'tagLength' just ends up holding the length of the last entry read, which seems rather pointless. Was your intention to get the maximum tag length?
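If the maximum was the intention, a sketch could track the longest tag seen so far. count_lines_and_max_tag is a hypothetical name; it reads from any std::istream, so a std::ifstream opened on mydata.txt works, as does a std::istringstream. Unlike the original, it starts the length at zero, so an empty or unopenable file yields (0, 0) instead of an uninitialized value.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical variant of number_of_lines: counts the lines and records
// the LONGEST tag (the second whitespace-separated field) rather than the
// length of whichever tag happened to be read last.
std::pair<int, std::size_t> count_lines_and_max_tag(std::istream& in) {
    int lineno = 0;
    std::size_t maxTagLength = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string count, tag;
        ss >> count >> tag;  // field 1: raw count, field 2: tag
        maxTagLength = std::max(maxTagLength, tag.size());
        ++lineno;
    }
    return std::make_pair(lineno, maxTagLength);
}
```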
Using stringstream is a heavyweight way of parsing. I think you could do it faster with plain string member functions, especially here, since you can treat all values as strings and don't need to convert them (if I understood your code correctly).
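One way to follow this advice is to scan the line with std::string::find_first_not_of, find_first_of and substr. split_first_two is a hypothetical helper that returns the first two fields (the raw count and the tag) as strings, assuming fields are separated by spaces or tabs.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: return the first two whitespace-separated fields
// of a line using only std::string member functions (no stringstream).
// Fields that are missing come back as empty strings.
std::pair<std::string, std::string> split_first_two(const std::string& line) {
    const char* ws = " \t";
    std::string fields[2];
    std::string::size_type pos = 0;
    for (int i = 0; i < 2; ++i) {
        std::string::size_type start = line.find_first_not_of(ws, pos);
        if (start == std::string::npos) break;             // no more fields
        std::string::size_type end = line.find_first_of(ws, start);
        if (end == std::string::npos) end = line.size();   // field ends the line
        fields[i] = line.substr(start, end - start);
        pos = end;
    }
    return std::make_pair(fields[0], fields[1]);
}
```

If the count is needed as a number after all, std::atoi(p.first.c_str()) or, in C++11, std::stoi(p.first) converts it.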
In your id2tagnum I am a bit puzzled by
Why do you do this? You don't seem to create anything useful in there, since idval is just a counter and taglen is invariant.
Actually, the whole id2tagnum seems redundant.