Multimap Memory problem??

I'm trying to read a flat file, load data onto a structure and then insert the structured data into the std::multimap.

The problem is that the input data file is very large, about 30 GB in size.
I'm iterating through this file line by line, but I've only been able to read the first 10,000,000 lines before the program aborts!

Any assistance to overcome this problem would be highly appreciated.

Please, see the following minimal code:

#include <iostream>
#include <iomanip>
#include <map>
#include <fstream>
#include <string>
//#include <unistd.h>
#include "common.h"

using namespace std;

class Record
{
   public:
     string seq;
     string t_type;
     string s_id;
     string h_id;
     string i_id;
     string s_cnt;
     string s_grp;
     string s_nbr;
     string s_score;
     string h_cnt;
     string h_grp;
     string h_nbr;
     string h_score;
     string i_cnt;
     string i_grp;
     string i_nbr;
     string i_score;

     // std::string members default-construct to "", so explicitly
     // assigning empty strings is unnecessary. (Note: the qualified
     // name Record::Record() is not valid inside the class definition.)
     Record() {}

     //virtual ~Record() {}
};

void process ()
{
   long record_count (0);
   bool first    (true);
   string buffer ("");
   string inFile ("/home/daula/projects/test/new_duper_test_output.dat");
   multimap<string, Record *> myMap;   // const-qualified key types are not valid here
   ifstream inputFile ( inFile.c_str() );
   
   while (getline(inputFile, buffer))
   {
     Record  * temp_rec = new Record;

     temp_rec->seq    = trim(buffer.substr( 0,  26));
     temp_rec->t_type = trim(buffer.substr( 26,  1));
     temp_rec->s_id   = trim(buffer.substr( 27, 12));
     temp_rec->h_id   = trim(buffer.substr( 39, 12));
     temp_rec->i_id   = trim(buffer.substr( 51, 12));
     temp_rec->s_cnt  = trim(buffer.substr( 63,  5));
     temp_rec->s_grp  = trim(buffer.substr( 68, 10));
     temp_rec->s_nbr  = trim(buffer.substr( 78,  3));
     temp_rec->s_score= trim(buffer.substr( 81,  3));
     temp_rec->h_cnt  = trim(buffer.substr( 84,  5));
     temp_rec->h_grp  = trim(buffer.substr( 89, 10));
     temp_rec->h_nbr  = trim(buffer.substr( 99,  3));
     temp_rec->h_score= trim(buffer.substr( 102, 3));
     temp_rec->i_cnt  = trim(buffer.substr( 105, 5));
     temp_rec->i_grp  = trim(buffer.substr( 110,10));
     temp_rec->i_nbr  = trim(buffer.substr( 120, 3));
     temp_rec->i_score= trim(buffer.substr( 123, 3));

     myMap.insert( std::make_pair( temp_rec->s_grp, temp_rec ) );

     if ( (record_count % 1000000) == 0 && !first )
     {
         cout << "Record count is: " << record_count << endl;
     }

     // NOTE: myMap now owns temp_rec; deleting it here would leave a
     // dangling pointer in the map. The records must be freed only
     // after the map is done with them.
     ++record_count;
     first = false;
   }
}

int main ()
{
   process ();
   return 0;
}


output:
Record count is: 1000000
Record count is: 2000000
Record count is: 3000000
Record count is: 4000000
.
.
.
Record count is: 63000000
Record count is: 64000000
Record count is: 65000000
Record count is: 66000000
Aborted

By the way, I'm compiling and running this program on a 32-bit Linux box.

Thanks a lot

-D
I would guess that you probably do not always need all of those chars allocated to store your values; why not parse the record and store the fields in STL strings? Depending on your data, this should save some memory.
10M × (126 bytes + 17 pointers × 4 bytes) ≈ 1.81 GiB
(Overhead from multimap not included in the calculation.)

There's no way all that data is going to fit. Try rethinking your logic so that the entire file doesn't need to be loaded.
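One way to avoid loading everything is to process one key-group at a time. The sketch below assumes the input file is already pre-sorted by key (e.g. with an external sort), so a group is complete as soon as the key changes; `processGroup` and the three-character key field are hypothetical stand-ins for the OP's real per-key manipulation and layout. The map never holds more than one key's worth of records.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

static long processed = 0;   // just for demonstration

// Placeholder for the real per-key work; clears the group afterwards
// so its memory is released before the next group is read.
void processGroup(std::multimap<std::string, std::string>& group)
{
    processed += group.size();
    group.clear();
}

// Assumes 'in' is sorted by key: flush the current group whenever
// the key changes, then start accumulating the next one.
void processSorted(std::istream& in)
{
    std::multimap<std::string, std::string> group;
    std::string line, currentKey;

    while (std::getline(in, line))
    {
        std::string key = line.substr(0, 3);   // hypothetical key field
        if (key != currentKey && !group.empty())
            processGroup(group);               // key changed: flush group
        currentKey = key;
        group.insert(std::make_pair(key, line));
    }
    if (!group.empty())
        processGroup(group);                   // flush the final group
}
```

With sorted input, peak memory is bounded by the largest single group rather than by the whole file.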
I'm thinking of having process() work based on the key. That is, only insert records that share the same key into the map at a time.
BTW, this map (produced by the process routine) is further manipulated, and the entire manipulation is driven by the key.

Suggestions and/ or thoughts?
On a 32-bit system the maximum theoretical amount of addressable memory available to a program is 4GB. Trying to load 30GB? Nope, never, regardless of how much swap you have.

What about inserting into multiple multimaps and then merging them?
How can I deallocate the memory, for instance from myMap in the code above?
"What about inserting into multiple multimaps and then merging them?"
What about it?

"How can I deallocate the memory, for instance from myMap in the code above?"
myMap.clear()? Although you'll first have to iterate the map (with an iterator) to free each value.
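That iterate-then-clear step might look like this (a sketch; the `Record` struct here is a stand-in for the OP's class). `multimap::clear()` only destroys the map's own nodes, the keys and the pointer values; it does not delete what the pointers point to.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Record { std::string seq; };   // stand-in for the OP's Record class

// Delete every heap-allocated value first, then clear the map itself.
void freeAndClear(std::multimap<std::string, Record*>& m)
{
    for (std::multimap<std::string, Record*>::iterator it = m.begin();
         it != m.end(); ++it)
    {
        delete it->second;   // free the Record the map points at
    }
    m.clear();               // now drop the map's keys and nodes
}
```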
By the way, I just took a better look at your code, and it sucks:
temp_rec->seq    = const_cast<char *>(t01.data());
//... 

Why exactly are you initializing the members to arrays in the heap?
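If the map is going to own the records anyway, one alternative (a sketch, not the OP's code) is to store `Record` by value rather than as `Record *`: the multimap copies the record on insert and destroys it automatically on `clear()`/`erase()`, so no manual `delete` is ever needed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Field list abbreviated from the original class.
struct Record
{
    std::string seq;
    std::string s_grp;
    // ... remaining fields as in the original class
};

// Storing Record by value: the map owns the copy, so there is no
// pointer to free and no dangling-value risk.
void insertByValue(std::multimap<std::string, Record>& m, const Record& r)
{
    m.insert(std::make_pair(r.s_grp, r));
}
```

The trade-off is one copy per insert, which is usually cheaper than the bookkeeping risk of raw owning pointers.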
 
"By the way, I just took a better look at your code, and it sucks:"

Thanks for critiquing my code ;-)

It was just something I had put together in a hurry as I was assimilating the requirements.
Also, as you can tell, I'm not the best coder in the world, which is one of the reasons I'm here seeking guidance and assistance from good people like yourself.

If you have a minute, you can have another look at it... I've updated and improved it a little.

Thanks again

-D