Best way to initialize class that needs large amount of supporting data

May 1, 2011 at 5:38am
Hello,

I am writing a class library in which several classes rely on up to 200 MB of supporting polynomial coefficients and lookup table data. I don't want to read this much data each time a new object is instantiated so I have declared a static member vector of the appropriate data type for each class:

class Test
{
private:

    static std::vector<mydatatype> ms_vData;

    ...
};

std::vector<mydatatype> Test::ms_vData;


I am tinkering with two different ways to initialize the data, which will take place at program startup.

1. Use a static function to initialize the data in advance, without having to instantiate an object:

class Test
{
public:

    static bool Init();
    ...
};

bool Test::Init()
{
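    // reserve() only preallocates; resize() would insert somenumber default
    // elements ahead of the pushed ones
    ms_vData.reserve(somenumber);
    while(moredata)  // placeholder condition; while(true) would never exit
    {
        ms_vData.push_back(somedata);
    }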
    ....
}

int main()
{
    bool result = Test::Init();
    ....
}


2. Add a static member instance counter that calls a non-static version of Init() when the first object is instantiated, and clears the vector when the last object is destroyed:

class Test
{
private:

    static std::vector<mydatatype> ms_vData;
    static unsigned int ms_nNumInstances;

    ...
};

std::vector<mydatatype> Test::ms_vData;
unsigned int Test::ms_nNumInstances = 0;

Test::Test()
{
    if(0 == ms_nNumInstances)
    {
        Init();
    }
    ++ms_nNumInstances;
    ....
}

Test::~Test()
{
    --ms_nNumInstances;
    if (0 == ms_nNumInstances)
    {
        ms_vData.clear();
    }
}


Does anyone have a preference for one method over the other? I would like to use the first (static function) version but I don't want to call another static function just to clear the vector before shutdown. In comparison, the second version's destructor elegantly handles vector cleanup.

Or, is it safe to just allow the program to clean up the vector as it shuts down, without having to explicitly clear it myself?


May 1, 2011 at 5:43am
closed account (3hM2Nwbp)
I would go with the first approach you posted. Any reasonably modern OS will automatically reclaim whatever memory your program didn't free on its own when the process exits.
Last edited on May 1, 2011 at 5:44am
May 1, 2011 at 5:02pm
Thanks Luc.

I'm writing under Windows 7 but I can't guarantee that the program will always be installed on an up-to-date machine.

If I were to write a simple test program that initialized a large vector and then terminated, how could I determine if it is properly cleaning up after itself? For example, would any problems show up in Process Explorer, or would I need something more sophisticated? My IDE is the LGPL version of Qt Creator.

Frank
Last edited on May 1, 2011 at 5:03pm
May 1, 2011 at 5:27pm
I suggest that you look at boost's smart pointers:

http://www.codeproject.com/KB/stl/boostsmartptr.aspx

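If you want to roll your own reference counting, you need to watch out for copy constructors, assignment operators, and other pitfalls if you want your class to work properly in general use. For example, a copy-constructed object never runs your normal constructor, so the instance count would miss it unless the copy constructor increments the count as well.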
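To make the copy-constructor pitfall concrete, here is a minimal sketch of an instance counter that also counts copies. The class name CountedTest, the element type double, and the size 1000 are placeholders I made up for illustration, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class: shares one static vector among all live instances.
class CountedTest
{
public:
    CountedTest() { Acquire(); }
    // A copy is a new instance too, so it must be counted.
    CountedTest(const CountedTest&) { Acquire(); }
    // Assignment creates no new object, so the count stays unchanged.
    CountedTest& operator=(const CountedTest&) { return *this; }
    ~CountedTest() { Release(); }

    static unsigned int Instances() { return ms_nNumInstances; }
    static std::size_t DataSize() { return ms_vData.size(); }

private:
    static void Acquire()
    {
        if (ms_nNumInstances++ == 0)
            ms_vData.assign(1000, 0.0);   // stand-in for the real load
    }
    static void Release()
    {
        if (--ms_nNumInstances == 0)
            std::vector<double>().swap(ms_vData);  // empties and frees capacity
    }

    static std::vector<double> ms_vData;
    static unsigned int ms_nNumInstances;
};

std::vector<double> CountedTest::ms_vData;
unsigned int CountedTest::ms_nNumInstances = 0;
```

Without the user-defined copy constructor, copying an object would leave the count one short, and the data would be released while a copy was still alive. The swap with an empty temporary is used because clear() alone destroys the elements but is not required to give the capacity back.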

Another approach which may work in your case is lazy initialization in combination with a singleton (if you're concerned about eating up 200MB of the same memory for each instance).
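One common way to get lazy initialization with a single shared copy is a function-local static, roughly as sketched below. Coefficient, LoadCoefficients, and SharedCoefficients are made-up names, and the loader just fabricates three values where real code would read the data files. Note that the initialization of a function-local static is only guaranteed to be thread-safe from C++11 onward.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the thread's mydatatype.
struct Coefficient {
    double value;
};

// Counts loader calls so the "load once" behaviour can be observed.
static int g_loadCalls = 0;

// Hypothetical loader; a real one would read the data files here.
static std::vector<Coefficient> LoadCoefficients()
{
    ++g_loadCalls;
    std::vector<Coefficient> data(3);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i].value = 0.5 * static_cast<double>(i);
    return data;
}

// The data is loaded on the first call only; every later call returns
// a reference to the same vector.
const std::vector<Coefficient>& SharedCoefficients()
{
    static const std::vector<Coefficient> data = LoadCoefficients();
    return data;
}
```

Nothing is loaded until some object first asks for the data, and the vector is destroyed automatically during normal program shutdown, so no explicit cleanup call is needed.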

Also, if the data is not already in a binary format, converting it to a binary format where you can read the data in blocks (vs OO streaming/parsing using small objects) could save you a TON of time. In fact, it could be SO FAST that you don't care about optimizing for speed any more. You will see the most benefit from this method if you have a gazillion small variable sized objects that can be organized into blocks of fixed-size objects. Then you use fread/fwrite with pointers to chunk in your data. In many, many cases, FILE I/O is the bottleneck. Using chunking/binary files gets around this bottleneck.
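As a rough sketch of the block I/O idea, assuming the data can be laid out as fixed-size, trivially copyable records: the Record layout, the function names, and the count header below are all my own invention for illustration, and the file is only portable between builds with the same struct layout and endianness.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical fixed-size record; must be trivially copyable for raw I/O.
struct Record {
    double coeff[4];
};

// Writes a record count followed by all records in a single fwrite call.
bool WriteRecords(const char* path, const std::vector<Record>& records)
{
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const unsigned long long count = records.size();
    bool ok = std::fwrite(&count, sizeof count, 1, f) == 1;
    if (ok && count > 0)
        ok = std::fwrite(&records[0], sizeof(Record), records.size(), f)
             == records.size();
    return std::fclose(f) == 0 && ok;
}

// Reads the count, sizes the vector once, then pulls in every record
// with a single fread call instead of parsing element by element.
bool ReadRecords(const char* path, std::vector<Record>& records)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    unsigned long long count = 0;
    bool ok = std::fread(&count, sizeof count, 1, f) == 1;
    if (ok) {
        records.resize(static_cast<std::size_t>(count));
        if (count > 0)
            ok = std::fread(&records[0], sizeof(Record), records.size(), f)
                 == records.size();
    }
    std::fclose(f);
    return ok;
}
```

The point is that the whole table arrives in one read into memory the vector already owns, so there is no per-element parsing or allocation.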
May 2, 2011 at 4:28pm
@kfmfe04

Thanks for the advice. I've never written a singleton before but I'll look into the idea (I am aware of the basic pattern).

I totally agree with your binary file comment. All but one group of files are (or will be) binary, and the ones that will remain ASCII are small and designed to be user maintainable.
May 2, 2011 at 6:47pm
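If the vector stores its elements by value (rather than raw pointers to objects you allocated with new), then the cleanup of the memory should be handled by the destructor of the vector, which also runs the destructor of each 'mydatatype' element, as far as I'm aware. A static vector is destroyed automatically when the program exits normally.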
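A quick way to convince yourself of this is to count live elements, as in the sketch below. Tracked and the two helper functions are hypothetical names made up for the experiment, and the vector here is a local one so the effect can be observed before the program ends.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Number of live elements while a vector of n elements exists.
int AliveInsideScope(std::size_t n)
{
    std::vector<Tracked> v(n);
    return Tracked::alive;
}

// Number of live elements after that vector has gone out of scope.
int AliveAfterScope(std::size_t n)
{
    {
        std::vector<Tracked> v(n);
    }   // the vector's destructor runs here and destroys every element
    return Tracked::alive;
}
```

If the vector held raw pointers instead, only the pointers would be released and the objects they point to would have to be deleted by hand.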