Asking for suggestions on a data structure problem

Hi All,

For each element in data1, I need to figure out which elements in data2 are related to it, and for each element in data2, which elements in data1 are related to it. Therefore, I set up a mutually linked data structure, as you can see below:

class data1 {
  // class variables
  int id;
  .....

  map<string, float> m_to_data2; // string is the keyword of data2; float records the reference info when data1 and data2 are related
};

class data2 {
  // class variables
  .....
  list<int> src_id;  // int is the id of data1
};

map<string, data2 *> map_2;
map<int, data1 *> map_1;
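For context, the linking step while parsing looks roughly like this (link() and ref_info are just illustrative names for this sketch; it assumes the declarations above):

void link(data1 *d1, const string &keyword, float ref_info) {
    d1->m_to_data2[keyword] = ref_info;    // forward link: data1 -> data2
    data2 *d2 = map_2[keyword];            // assumes keyword was inserted earlier
    if (d2) d2->src_id.push_back(d1->id);  // back link: data2 -> data1
}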



Then I parse the file and fill map_1 and map_2. I found:
(1) Total memory usage after setting up the two mutually linked maps: 498.7M.
(2) Without setting the link from data2 to data1 (not filling list<int> src_id): 392.7M.
(3) Without filling map_1 and without filling list<int> src_id in data2: 182.0M.
(4) Without filling map_1, but filling list<int> src_id with the ids of data1: 289.7M.
(5) Without filling map<string, float> m_to_data2: 290.0M.

(6) The size of map_1: 77737
(7) The size of map_2: 1830009
(8) The size of map<string, float> m_to_data2 per element of map_1: from 3 to 17522
(9) The size of list<int> src_id per element of map_2: from 1 to 1377

I need to reduce the memory usage after setting up the mutually linked maps (ideally to less than 200M; it is currently 498M as shown above). I was considering tokenizing the string (the keyword of data2) to an int by setting up an extra map<string, int>, since an int needs less memory than a string, but that may not help much because the map<string, int> itself needs extra memory. Any suggestions?
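For what it's worth, the tokenizing idea would look roughly like this (token_of and intern() are illustrative names, not anything I have written yet). Since map_2 already stores one copy of every keyword, the saving would come from removing the string copies currently duplicated across all the m_to_data2 maps:

#include <map>
#include <string>
#include <utility>

std::map<std::string, int> token_of;              // one shared token table
int intern(const std::string &kw) {
    std::map<std::string, int>::iterator it = token_of.find(kw);
    if (it != token_of.end()) return it->second;  // keyword already tokenized
    int t = (int)token_of.size();                 // next unused token
    token_of.insert(std::make_pair(kw, t));
    return t;
}

// data1 would then key on 4-byte tokens instead of full strings:
//   map<int, float> m_to_data2;  // token of data2's keyword -> reference info

Alternatively, m_to_data2 could key on data2 * (or an index) directly, reusing the strings that map_2 already owns instead of duplicating them.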

Your comments/suggestions are highly appreciated.
Thank you very much.
This may be worth a look:

http://code.google.com/p/google-sparsehash/
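A drop-in replacement could look like the sketch below (untested; the header path and the default hasher for std::string vary between sparsehash versions, so check your install):

#include <google/sparse_hash_map>  // newer releases use <sparsehash/sparse_hash_map>
#include <string>

// sparse_hash_map trades some lookup speed for roughly 2 bits of
// overhead per entry, versus ~3 pointers per node for std::map
google::sparse_hash_map<std::string, data2 *> map_2;
google::sparse_hash_map<int, data1 *> map_1;

// note: set_deleted_key() must be called before any erase(), e.g.
//   map_1.set_deleted_key(-1);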

Most likely, you will need to find a solution where you trade time (increase time) for space (decrease memory). In the worst case, you may have to do some disk caching. But before you go down that path, see if you can find an implementation of map that uses minimal memory. You might also look into in-memory compression or other algorithms that depend on the statistical distribution of the data you are actually storing. Also consider simply getting more memory (it's cheap these days).
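For example, a sorted vector of pairs acts as a read-mostly map with near-zero per-entry overhead, where a node-based std::map typically pays three pointers plus allocator bookkeeping per entry. A sketch (find_key() and by_key are illustrative names, not a drop-in replacement for your maps):

#include <algorithm>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, float> > flat_map;

struct by_key {  // compares a stored entry against a bare key for lower_bound
    bool operator()(const std::pair<int, float> &a, int k) const {
        return a.first < k;
    }
};

// fill with push_back while parsing, std::sort once, then binary-search
float *find_key(flat_map &m, int key) {
    flat_map::iterator it =
        std::lower_bound(m.begin(), m.end(), key, by_key());
    return (it != m.end() && it->first == key) ? &it->second : 0;
}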