Unusual Slowness in Processing Vector

Dear all,

With this small dataset (mydata.txt)

2	AAAAA	21	21	21	21	21
1	AAAAC	9	18	12	18	15
1	AAATG	7	9	18	18	15
1	AACTA	17	9	12	15	18
1	AATGA	18	15	18	15	18
1	ACACT	17	18	17	15	12
1	ACGAT	18	8	12	18	15
1	ACTAC	15	8	12	12	18
1	ACTTA	15	6	18	6	18
1	ACTTC	21	21	21	21	21
1	CAAGA	18	15	17	9	18
1	CCCCC	21	21	21	21	21
1	CTACA	21	21	21	21	21
1	TACAC	12	21	12	21	21
1	TGATA	15	9	18	18	15
2	TGCTC	18	8	12	18	13.5
1	TTATA	18	18	18	18	18
1	TTCCC	21	12	21	21	21
1	TTCTC	18	7	9	18	12
1	TTCTG	21	21	21	21	21


I am wondering why this code of mine runs extraordinarily slowly,
in particular the "id2tagnum" function:

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
using namespace std;

std::pair<int,int> number_of_lines( char* myfn ) {

    int lineno = 0;
    int tagLength;

    string line;
    ifstream myfilename (myfn);

    if (myfilename.is_open())
    {
        while (getline(myfilename,line) )
        {
           stringstream ss(line);
           string DNA;
           
           ss >> DNA >> DNA;
           tagLength = DNA.size();

           lineno += 1;
        }
        myfilename.close();
    }


    return std::make_pair(lineno,tagLength); 

}

void prn_vec(vector<int>& arg) {
    for ( int n=0; n<arg.size() ; n++ ) {
        cout << arg[n];
    }
}

vector <int> id2tagnum(int id, int tgl ) {

    // replicate
    vector <int> Rep;
	Rep.assign(tgl,1);
	
	
	// this also slow
    /*for ( int k=0 ; k<tgl ; k++ ) {
        int nv = id -1;
        Rep.push_back(nv);
    }*/
    prn_vec(Rep);
    cout << endl;

}


// End of functions
//



int main  ( int arg_count, char *arg_vec[] ) {

    if (arg_count !=2 ) {
        cerr << "expected one argument" << endl;
        return EXIT_FAILURE;
    }

    pair <int,int> tuple = number_of_lines(arg_vec[1]); 
    cout << "Number of Tags = " << tuple.first << endl;
    cout << "Tag lengh = " << tuple.second << endl << endl;

    int taglen = tuple.second;

    string line;
    ifstream myfile (arg_vec[1]);
    int idval = 0; 

    if (myfile.is_open())
    {
        while (getline(myfile,line) )
        {
                idval += 1;
                stringstream ss(line);
                string dummy;
                int tableEntry;

                ss >> tableEntry >> dummy;
                cout << "Tagno : " << idval << ", Rawcount: " << tableEntry << endl;

                // THIS LINE IS VERY2 SLOW
                vector <int> numTag = id2tagnum(idval,taglen);
                

        }
        myfile.close();
    }

    else cout << "Unable to open file";    
    return 0;
}
std::vector stores its elements contiguously, so calling reserve() is recommended if you know in advance how large it may grow.

Pass the vector by reference instead of returning it by value to avoid an extra copy.
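
A minimal sketch of both suggestions together, assuming the caller owns one vector and reuses it on every loop iteration. The name fill_tags and its parameters are hypothetical and do not appear in the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): fills `out` with `count`
// copies of `value`, reusing whatever storage `out` already owns.
void fill_tags(std::vector<int>& out, std::size_t count, int value) {
    out.clear();          // drops the elements but keeps the capacity
    out.reserve(count);   // at most one allocation; no-op if capacity suffices
    for (std::size_t k = 0; k < count; ++k) {
        out.push_back(value);   // cannot reallocate after the reserve above
    }
}
```

Declaring the vector once outside the read loop and calling fill_tags(numTag, taglen, 1) inside it would then allocate only on the first iteration. Compilers can usually elide the copy when a local vector is returned by value, so any gain from the out-parameter is likely to come mostly from reusing the storage.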

How slow is it? With the data you listed, I can't see how it could be noticeably slow, even without my suggestions.

Jiryih Tsaur
id2tagnum doesn't return anything, although in your call

vector <int> numTag = id2tagnum(idval,taglen);

you seem to expect a return value.
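
Flowing off the end of a non-void function without a return statement is undefined behavior in C++, so the program may do anything, including appearing to hang or run very slowly. A minimal sketch of the function with the return added follows; the printing from the original is left out here, and the id parameter is kept only to match the original signature.

```cpp
#include <vector>

// Sketch of id2tagnum with the missing return added. `id` is unnamed
// because the body shown in the question never uses it.
std::vector<int> id2tagnum(int /*id*/, int tgl) {
    // tgl copies of 1, the same contents Rep.assign(tgl, 1) produced.
    std::vector<int> Rep(tgl, 1);
    return Rep;
}
```

Compiling with warnings enabled (for example -Wall on GCC or Clang) normally reports a missing return like this one.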

'tagLength' only ever holds the length of the tag on the last line read, which seems rather pointless. Was your intention to get the maximum tag length?
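
If the maximum was intended, a sketch along these lines would track it, assuming the tag is the second whitespace-separated field on each line as in the posted data. The function name count_lines_and_max_tag is hypothetical, and it takes a std::istream so it can be fed a file or a string. It also starts the maximum at 0, whereas the original leaves tagLength uninitialized when no line is read.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical replacement for number_of_lines:
// returns (number of lines, length of the longest tag).
std::pair<int, std::size_t> count_lines_and_max_tag(std::istream& in) {
    int lineno = 0;
    std::size_t maxTag = 0;   // well defined even for empty input
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string count, tag;
        if (ss >> count >> tag) {           // skip lines with fewer than 2 fields
            maxTag = std::max(maxTag, tag.size());
        }
        ++lineno;
    }
    return std::make_pair(lineno, maxTag);
}
```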

Using stringstream is a heavyweight way of parsing. I think you could do it faster with just the string member functions, especially in this context, since you can treat all the values as strings and don't need to convert them, if I understood your code correctly.
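
One way this could look, as a sketch only: it assumes the fields are separated by single tab characters, as they appear to be in the posted data, and the helper name nth_field is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index-th tab-separated field of `line`
// (0-based), or an empty string when the line has fewer fields.
std::string nth_field(const std::string& line, std::size_t index) {
    std::size_t start = 0;
    for (std::size_t i = 0; i < index; ++i) {
        std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos) {
            return std::string();   // ran out of fields
        }
        start = tab + 1;            // next field begins after the tab
    }
    std::size_t end = line.find('\t', start);
    if (end == std::string::npos) {
        return line.substr(start);  // last field on the line
    }
    return line.substr(start, end - start);
}
```

With this, nth_field(line, 1) would yield the tag without constructing a stream for every line. Whether that is measurably faster for a 20-line file is doubtful, so it is worth timing before adopting it.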

In your id2tagnum I am a bit puzzled by

Rep.assign(tgl,1);

Why do you do this? You don't seem to create anything useful in there, since idval is just a counter and taglen is invariant.

Actually, the whole id2tagnum function seems redundant.