corrupted double-linked list error

I am building a hash map using "unordered_map".
The key will be a unique id and its corresponding value will be a char*.

The file has a line for each id, and
the lines that follow it make up its value (a multi-line string).

So, I read the file line by line using "ifstream".
After reading an id line starting with "<", I read the following lines until the next id line.
A char array is created for the first line; for each following line it is replaced by the concatenation of the previous lines and the current line.

EXAMPLE:

<id1
hellow
Iam
justin

Hashmap should be like this:
Key value
id1 hellowIamjustin

I prefer a char array over string because I will read a big text file in which each value is a very long string.

My program runs fine until I get a "corrupted double-linked list" error after some lines have been read successfully.

The program crashes in the "apppendString" function after the "append6" output.

Here's the code:

"trans.h"
#include <string>
//#define SIZE 10000000
using namespace std;


class trans {
public:
    char *trid;
    char *seqs;
    trans(char *_id);
    //void apppendString(string _s);
};


test.cpp (main)
#include<stdio.h>
#include <string>
//#include<conio.h>
#include<iostream>
#include<cstring>
#include<stdlib.h>
#include "trans.h"
#include <fstream>
#include <unordered_map>

using namespace std;
typedef unordered_map<char* ,trans*> transcripts;
transcripts transeq;

void apppendString(trans *_one, string _s) {
    //cout << "append1: " << _s << endl;
    //cout << "append11: " << strlen(_one->seqs) << " : "<<_s.length() << endl;
    int bfsize = strlen(_one->seqs) + _s.length() + 1;
    char * buffer = new char[ bfsize ];
    //cout << "append2: " << _s.length() << " "<< strlen(buffer) << " "<< bfsize<< endl;
    strcpy(buffer, _one->seqs);
    //delete[] _one->seqs;
    //cout << "append3: " <<strlen(buffer) << endl;
    strcat(buffer, _s.c_str());
    //cout << "append4: " <<strlen(buffer) << endl;
    delete[] _one->seqs;
    _one->seqs = 0;
    int seqsize = strlen(buffer) + 1;
    _one->seqs = new char[seqsize];
    cout << "append5: " << strlen(_one->seqs) << " "<<strlen(buffer) << endl;
    strcpy(_one->seqs, buffer);
    cout << "append6: " << strlen(_one->seqs) << " "<<strlen(buffer) << endl;
    delete[] buffer;
    buffer = 0;
    cout << "append7: " << strlen(_one->seqs) << endl;
}

void readTranscriptome(const char* _inputfile) {
    ifstream inf(_inputfile);
    if (!inf) {
        cout << "Cannot open file.";
        exit(1);
    }
    int trcnt = 0;
    std::string str;
    getline(inf, str);
    while (!inf.eof()) {
        //cout << str << endl;
        if (str[0] == '>') { //for id line
            trcnt++;
            //cout << str << endl;
            char *dup = new char[str.length()+1];
            strcpy(dup, str.c_str());
            //strdup(str.c_str());
            char *pch;
            pch = strtok(dup, "\t");
            //while (pch != NULL)
            //{
            //cout << "token1: "<< pch << endl;
            // pch = strtok(NULL, "\t");
            //}
            char *pch1=new char[strlen(pch)-1];
            strncpy(pch1,pch+1,strlen(pch));
            cout << "token2: " << pch1 << endl;
            delete[] dup;
            //cout << "check1: " << str << endl;
            getline(inf, str);
            cout << "check2: " << pch1 << endl;
            trans *run99=new trans(pch1);
            //cout << "check3: " << str << endl;
            int cnt = 0;
            while (!inf.eof() && str[0] != '>' ) { //until next id line
                cout << "check4: "<< str << endl;
                if (cnt==0) {
                    run99->seqs = new char[str.length()+1];
                    strcpy(run99->seqs, str.c_str());
                }
                else {
                    apppendString(run99, str);
                }

                cout <<"check5: "<< str << endl;
                cnt++;
                getline(inf, str);
            }//until end of file or thenext id line

            transeq.insert(transcripts::value_type(pch1,run99));
            cout << "check6: " << pch1<< ": "<< strlen(run99->seqs)<<" : "<<trcnt << endl;
            //cout << endl;
        }//if id line get id only for hashmap
        //getline(inf, str);
    }
    inf.close();

}

void checkMap() {

    for (transcripts::iterator it = transeq.begin(); it != transeq.end(); ++it) {
        cout << " [" << it->first << ", " << strlen(it->second->seqs) << "]" << endl;
        cout << it->second->seqs << endl;
    }

}

int main()
{

    /*trans run99("testid23423423","143535642763678");
    apppendString(run99," modify!");
    cout<<run99.seqs<<endl;
    */
    string srcdir = "/data/rnafusion/challenge/data/trinity_sim13_core16/";
    cout << "Reading "<<srcdir + "Trinity.fasta" << endl;
    readTranscriptome((srcdir+"Trinity.fasta").c_str());
    //checkMap();
    return 0;
}
I prefer a char array over string because I will read a big text file in which each value is a very long string.

In my book, that's an argument for std::string over char array, especially since your code demonstrates a marked lack of familiarity with managing memory manually.

typedef unordered_map<char* ,trans*> transcripts;
You don't want to have a char* as your key type without a custom comparison and hash function supplied to the constructor. You particularly don't want a pointer to memory that the map doesn't own and will be deleted before the map is fully populated as you currently have.

Things get much simpler if you use strings:
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

std::istringstream in(
R"(<id1
hellow
Iam
justin
<id2
yo
iam
jesus
)");

using table_type = std::unordered_map<std::string, std::string>;

void checked_insert(table_type& table, std::string id, std::string value) {
    auto ret = table.insert(std::make_pair(std::move(id), std::move(value)));

    // id has been moved from at this point, so report the key already stored in the table.
    if (!ret.second)
        std::cerr << "Attempted to insert duplicate key " << ret.first->first << "\nInsertion ignored.\n";
}

int main() {
    table_type table;

    char ch;
    while (in >> ch && ch == '<') {
        std::string id, value_token;
        if (in >> id >> value_token) {
            std::string value(value_token);
            while (in >> std::ws && in.peek() != '<' && in >> value_token)
                value += value_token;

            checked_insert(table, id, value);
        } else {
            std::cerr << "Unexpected format in input stream.\n";
            std::cerr << "Terminating input extraction.\n";
            break;
        }
    }

    std::cout << "Key\tValue\n";
    for (const auto& element : table)
        std::cout << element.first << '\t' << element.second << '\n';
}


As a side note, the subject of this post is completely inaccurate. There is no double linked list here.
@cire: very nicely done! Proposing a minor edit for line 35, the value += value_token; statement:
value += " " + value_token;
Thanks a lot!

I got one more question.

I have a very big text file (600G) that has 50000 ids and a long string for each id.

In other words, I need to keep 50000 very long strings for 50000 ids.

There will be considerable overhead for long strings if I use the string type.

Which one do you think is more efficient, "string" or "char*", for the value in the map?

Thanks!
Is there a reason why you don't use a database for such a big file?
Which one do you think is more efficient, "string" or "char*", for the value in the map?

You can essentially assume that there is no performance difference between std::string and raw arrays. Until you have evidence (i.e., you profiled your code) that the internals of std::string itself are the cause of performance issues, use the standard strings, because standard strings are simpler.

Rule 0 of performance optimization:
Don't fix performance problems until you actually have performance problems.

If you have performance problems, target algorithmic complexity in an evidence-based manner. Use a profiler to speed up only the slow bits of your code. It's only in very rare cases that you will need to perform micro-optimizations.

You should consider using a database system.