I am building a hashmap using "unordered_map".
The key will be a unique id and its corresponding value will be a char*.
The file contains id lines, and the lines that follow each id make up its value (multiple lines of text).
So, I read the file line by line using "ifstream".
After reading an id line starting with ">", I read the following lines until the next id line.
For the first of those lines I build an initial char array; for each following line I replace it with the concatenation of the previous value and the current line.
EXAMPLE:
>id1
hellow
Iam
justin
The hashmap should then look like this:
Key    Value
id1    hellowIamjustin
I prefer a char array over a string because I will be reading a big text file in which each value will be a very long string.
My program runs fine until I get a "corrupted double-linked list" error after some successful line readings.
The program crashes in the "apppendString" function, right after the cout << "append6" statement.
Here's the code:
"trans.h"
#include <string>
//#define SIZE 10000000
using namespace std;
class trans {
public:
    char *trid;
    char *seqs;
    trans(char *_id);
    //void apppendString(string _s);
};
void readTranscriptome(const char* _inputfile) {
    ifstream inf(_inputfile);
    if (!inf) {
        cout << "Cannot open file.";
        exit(1);
    }
    int trcnt = 0;
    std::string str;
    getline(inf, str);
    while (!inf.eof()) {
        //cout << str << endl;
        if (str[0] == '>') { //for id line
            trcnt++;
            //cout << str << endl;
            char *dup = new char[str.length()+1];
            strcpy(dup, str.c_str());
            //strdup(str.c_str());
            char *pch;
            pch = strtok(dup, "\t");
            //while (pch != NULL)
            //{
            //    cout << "token1: " << pch << endl;
            //    pch = strtok(NULL, "\t");
            //}
            char *pch1 = new char[strlen(pch)-1];
            strncpy(pch1, pch+1, strlen(pch));
            cout << "token2: " << pch1 << endl;
            delete[] dup;
            //cout << "check1: " << str << endl;
            getline(inf, str);
            cout << "check2: " << pch1 << endl;
            trans *run99 = new trans(pch1);
            //cout << "check3: " << str << endl;
            int cnt = 0;
            while (!inf.eof() && str[0] != '>') { //until next id line
                cout << "check4: " << str << endl;
                if (cnt == 0) {
                    run99->seqs = new char[str.length()+1];
                    strcpy(run99->seqs, str.c_str());
                }
                else {
                    apppendString(run99, str);
                }
                cout << "check5: " << str << endl;
                cnt++;
                getline(inf, str);
            } //until end of file or the next id line
            transeq.insert(transcripts::value_type(pch1, run99));
            cout << "check6: " << pch1 << ": " << strlen(run99->seqs) << " : " << trcnt << endl;
            //cout << endl;
        } //if id line, get id only for hashmap
        //getline(inf, str);
    }
    inf.close();
}
I prefer a char array over a string because I will be reading a big text file in which each value will be a very long string.
In my book, that's an argument for std::string over a char array, especially since your code demonstrates a marked lack of familiarity with managing memory manually.
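One concrete illustration: pch1 is allocated strlen(pch)-1 bytes, but the id needs strlen(pch)-1 characters plus a terminating '\0', and strncpy then writes strlen(pch) bytes into it, i.e. one byte past the end on every id line. Heap overruns like that are exactly the kind of thing that later surfaces as "corrupted double-linked list". If you did stay with raw arrays, a correctly sized copy would look roughly like this (copyId is a made-up helper name, not part of your code):

#include <cstring>

// Sketch: copy an id such as ">id1" into its own buffer, dropping the '>'.
// We need strlen(pch) - 1 characters plus the '\0', i.e. strlen(pch) bytes.
char* copyId(const char* pch) {
    std::size_t n = std::strlen(pch);   // e.g. 4 for ">id1"
    char* id = new char[n];             // room for "id1" plus '\0'
    std::memcpy(id, pch + 1, n - 1);    // copy everything after '>'
    id[n - 1] = '\0';                   // terminate explicitly
    return id;
}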
typedef unordered_map<char* ,trans*> transcripts;
You don't want to have a char* as your key type without a custom comparison and hash function supplied to the constructor. You particularly don't want a pointer to memory that the map doesn't own and will be deleted before the map is fully populated as you currently have.
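For reference, keeping char* keys would mean supplying something like the following hash and equality functors as extra template (or constructor) arguments; the functor names here are mine, shown only to illustrate what that requirement means:

#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>

class trans;  // from trans.h

// Hash and equality that compare the characters, not the pointer values.
struct CStrHash {
    size_t operator()(const char* s) const {
        return std::hash<std::string>()(s);   // hash the character content
    }
};
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

typedef std::unordered_map<const char*, trans*, CStrHash, CStrEqual> transcripts;

Even then the map still would not own the memory the keys point to; a std::string key sidesteps both problems because the map stores its own copy.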
Which one do you think is more efficient for the value in the map, "string" or "char*"?
You can essentially assume that there is no performance difference between std::string and raw arrays. Until you have evidence (i.e., you profiled your code) that the internals of std::string itself are the cause of performance issues, use the standard strings, because standard strings are simpler.
Rule 0 of performance optimization:
Don't fix performance problems until you actually have performance problems.
If you have performance problems, target algorithmic complexity in an evidence-based manner. Use a profiler to speed up only the slow bits of your code. It's only in very rare cases that you will need to perform micro-optimizations.
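To make the "simpler" point concrete, here is a rough std::string-based sketch of the same reading loop. It stores the concatenated lines directly as a std::string value instead of going through your trans class; apart from the transcripts name, everything here is illustrative:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>

// Illustrative sketch: id -> concatenated sequence lines.
typedef std::unordered_map<std::string, std::string> transcripts;

transcripts readTranscriptome(const char* inputfile) {
    std::ifstream inf(inputfile);
    if (!inf) {
        std::cout << "Cannot open file.";
        std::exit(1);
    }
    transcripts transeq;
    std::string line, id;
    while (std::getline(inf, line)) {
        if (!line.empty() && line[0] == '>') {
            // id line: take everything after '>' up to the first tab (if any)
            id = line.substr(1, line.find('\t') - 1);
            transeq[id];                 // start an empty entry for this id
        } else if (!id.empty()) {
            transeq[id] += line;         // append; std::string grows as needed
        }
    }
    return transeq;
}

The map copies its keys and values, so nothing depends on buffers freed elsewhere, and there is no manual sizing to get wrong.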