What would you do in this situation?

I am forced into a situation where I must reinvent the wheel due to limits beyond my control. I have a data structure to hold information, anything I want at all. The problem is that when the software using my DLL plugin needs to save information, the data structure is dumped to a file in its entirety. Later, when the application is started up again and needs to load data, the structure is loaded back in without my knowing, except that a function is called so I can update old versions of the structure to newer ones. This is a problem - pointers are not an option because they would be invalid once the file is loaded back. Normal data is fine, such as limited-length character arrays and the usual integral and floating-point data types. However, if I want to start using any variably sized data, I must first check whether the existing data is the same size and, if it isn't, request more or less space for the structure. Then I can copy the new data.

This is overkill already, but not too bad. The real nightmare comes when I need to have two or more variably sized data fields! This is absolute hell. First I have to store offsets for the data after the first data set. Then, when new data needs to be written and it is not the same size as the old data, I have to copy all the data after it into temporary holding areas, request the change in size, copy the new data, copy the data that was after it back into place, and adjust the offsets. After the nightmare is over, you wake up to code like this:
http://paste.pocoo.org/show/437202/

This is beyond overkill - it took me 3 rewrites to get a good, reliable system for this that worked every time without any flaws. Now that I understand how to correctly manage this horrid nightmare, I want to simplify it with some sort of manager function(s) and/or class(es). I can't figure out how I would want to do this, how I would store the information about the data, etc. Even if this simplifies things some, it would still be cumbersome to implement. I want your opinion on what I should do to solve this problem - and remember, these circumstances are real and out of my control. It is simply a design flaw in the software that uses my DLL plugin, and I have no control over this software whatsoever.

All I would need are simple suggestions as to the right thing to do - Classes? Functions? Both? How would I make it simple and easy to use so I could just make a few simple modifications and be ready for another plugin or more data to put in the structure?

Thank you for reading this long and boring post, or, if it was too long and you didn't read it, thanks for clicking on this topic...
The situation is similar to two processes on the same machine communicating through IPC/shared memory, or two processes on different machines communicating through some other mechanism. Further, the two processes may be written in different languages, and the machines may have different architectures (big-endian/little-endian). To simplify, we can expect a byte stream of data which has to be mapped into a data structure.
Couple of points:
(1) First of all, the object dump in the file (or the bytes received in the communication) should be written in a proper format (protocol). The format should be defined so that it caters for all types of data, both fixed size and variable size. If it is not, we are in for trouble. Any structure should start with its size, then its type, then its data.
(assume the size is a 4-byte integer)
simple -- Struct{int, int} -> TotalSize,<int>,<int> -> 8,<int>,<int> -> 4 bytes + 4 bytes + 4 bytes -> 12 bytes
nested -- StructA{int, int, StructB{int, int}} -> TotalSize,<int>,<int>,SizeOfStructB,<int>,<int> -> 20,<int>,<int>,8,<int>,<int>
-> 4 + 4 + 4 + 4 + 4 + 4 -> 24 bytes
Here TotalSize also includes the extra space taken to store SizeOfStructB, so it is not merely sizeof(struct). In both examples it does not count the 4 bytes of the TotalSize field itself.
You can further complicate the format, by storing a type info for primitive data types or nested user defined data types. It will be helpful in many cases.
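
As a rough illustration of the nested layout in point (1), here is a minimal C++ sketch. The names putU32, StructA, StructB and encode are made up for this example, and it assumes 4-byte unsigned fields written little-endian; it is only one way the format could be produced.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value one byte at a time (little-endian assumed), so the
// output does not depend on host byte order or on struct padding.
void putU32(Bytes& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Hypothetical structures matching the "nested" example.
struct StructB { std::uint32_t x, y; };
struct StructA { std::uint32_t p, q; StructB b; };

// SizeOfStructB followed by B's two ints: 4 + 8 = 12 bytes.
Bytes encode(const StructB& b) {
    Bytes body;
    putU32(body, b.x);
    putU32(body, b.y);
    Bytes out;
    putU32(out, static_cast<std::uint32_t>(body.size()));
    out.insert(out.end(), body.begin(), body.end());
    return out;
}

// TotalSize followed by A's ints and the size-prefixed B: 4 + 20 = 24 bytes.
Bytes encode(const StructA& a) {
    Bytes body;
    putU32(body, a.p);
    putU32(body, a.q);
    const Bytes nested = encode(a.b);
    body.insert(body.end(), nested.begin(), nested.end());
    assert(body.size() == 20);  // 4 + 4 + (4 + 8), as in the text
    Bytes out;
    putU32(out, static_cast<std::uint32_t>(body.size()));
    out.insert(out.end(), body.begin(), body.end());
    return out;
}
```

Calling encode(StructA{1, 2, StructB{3, 4}}) should give 24 bytes whose first field is 20 and whose fourth field is 8.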

(2) Padding differs between machines, so the sizes in this format should not include any padding. The reader should also account for big-endian versus little-endian byte order whenever it reads 4 bytes of data and puts them into a primitive data type.
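
One common way to sidestep the byte-order issue in point (2) is to assemble the value from individual bytes instead of casting the buffer. This is only a sketch: readU32LE is a hypothetical helper, and it assumes the file format stores integers little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 32-bit value stored little-endian at `offset` in a raw byte buffer.
// Shifting byte by byte gives the same result on big- and little-endian hosts.
std::uint32_t readU32LE(const std::uint8_t* data, std::size_t size,
                        std::size_t offset) {
    assert(offset + 4 <= size);  // caller must not read past the buffer
    return  static_cast<std::uint32_t>(data[offset])
         | (static_cast<std::uint32_t>(data[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(data[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(data[offset + 3]) << 24);
}
```

For the bytes 0x78 0x56 0x34 0x12 this returns 0x12345678 regardless of the machine it runs on.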

(3) You may have an ObjDumpManager class which maintains a queue of objects.
All your classes, after their full and final construction, should register with the ObjDumpManager and get queued. The queue should be built so that the outermost class is registered first.
e.g. composition
Class A{
B
C
D
primitive data types. like int,char etc...
}
Queue -> A,B,C,D
This way the information of A is written first, B second, then C, then D. The registration may be done in the relevant constructor. Each object should register itself first and then register its members.

A->objDumper->dump() function will first calculate the total size it has to write

A::dumpSize() {
return B->dumpSize() + C->dumpSize() + D->dumpSize() + sizes of primitive data types in A
}

Now the class should write its own dumpSize, then call the write functions of all its member objects. For primitive data types, write the content yourself.
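
The dumpSize-then-write idea could look roughly like the sketch below. The classes A and B and the helper appendU32 are placeholders, not anything from a real SDK. Note one assumption: A::dumpSize() here also counts the 4-byte size prefix of each member, which is what point (1) says about TotalSize including the nested size fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit value byte by byte (little-endian assumed for this sketch).
void appendU32(Bytes& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// A member object with variable-size content.
struct B {
    std::string text;
    std::uint32_t dumpSize() const {
        return static_cast<std::uint32_t>(text.size());
    }
    void write(Bytes& out) const {
        appendU32(out, dumpSize());                       // own size first
        out.insert(out.end(), text.begin(), text.end());  // then the data
    }
};

// The outer object: two members plus one primitive.
struct A {
    B b;
    B c;
    std::uint32_t count;
    // Each member costs its payload plus its 4-byte size prefix.
    std::uint32_t dumpSize() const {
        return (4 + b.dumpSize()) + (4 + c.dumpSize()) + 4;
    }
    void write(Bytes& out) const {
        const std::size_t start = out.size();
        appendU32(out, dumpSize());  // A's size first
        b.write(out);                // then each member writes itself
        c.write(out);
        appendU32(out, count);       // primitives are written directly
        assert(out.size() - start == 4u + dumpSize());
    }
};
```

The assert at the end of A::write is a cheap check that dumpSize() and write() have not drifted apart, which is the easiest mistake to make when a field is added later.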

You will have to add more complexity to this design to take care of the lifetime of the objects (whether an object still exists or was deleted, etc.).

While reading the dump file, you must read it in the constructors of all the classes. All the member variables of a class should be initialized from the dump file. Since B is constructed inside A's constructor, A, B, C and D need alternate constructors that take the file pointer and an offset, so that each object can read its data from the file pointer at the relevant offset. Only one file pointer should be opened, and it should be passed on to the different objects.
To reduce file traversals (seek calls) and disk operations, you can arrange the queue so that objects are written in the order in which C/C++ constructs them. You can also write the size at the end, or read the whole file into one large character array and then use that array to populate your objects (that may be the better option). Also take care of the fact that the vptr in C++ will not be populated from the dump; it is only raw data. If your scheme has objects that contain pointers to other objects, then life is more hell: the reader will have to create an internal object, populate it from the file, then store its pointer in the container object. Better not to store these things in the format. Store only the raw data of the objects in the file.

It's quite complex, but one should take care of three things.
(1) Define the format.
(2) Follow how, and in what sequence, C/C++ constructs objects; the format should be laid out to support that initialization order.
(3) Remember the architecture of the machine while populating the objects.



Both the application using my DLL and my DLL are written in C++.
1. I am using variably sized data, not primitive data types with a constant size.
2. Constructors and destructors are never called because the memory of this structure is managed by the application. It only knows the size, so for a new structure it initializes it to that size, and I can request changes in size later on. However, I cannot get the new size back; I have to remember it.
3. Again I do not have to worry about this, the data structure is allocated for me and it will always be allocated correctly, else my DLL is not even called.
I think you have misread my post...or that I explained it in a poor way.
This pattern in general is called "serialization" or "marshalling". Have a look at
http://www.boost.org/doc/libs/1_47_0/libs/serialization/doc/index.html

(you are not the first one)
I too would think to keep a serialized version of the data (that the user application can dump and load at will), which you then extract/convert to a more usable form on demand... The unserialized-with-every-access kind of thing will slow things down a bit, but if you are careful and lucky it should not impact performance much.
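
A minimal sketch of that idea, assuming the variable-size fields are just byte strings: work on an ordinary std::vector<std::string>, and pack it into one pointer-free blob only when the host structure needs updating. The names Fields, packedSize, pack and unpack are made up here, and the call that asks the host to resize the structure is not shown because it depends on the SDK. Lengths are stored in host byte order, which assumes the file is read back on the same kind of machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Fields = std::vector<std::string>;

// Bytes needed by pack(); this is the size to request from the host.
std::size_t packedSize(const Fields& f) {
    std::size_t n = sizeof(std::uint32_t);  // field count
    for (const auto& s : f) n += sizeof(std::uint32_t) + s.size();
    return n;
}

// Write all fields into a flat blob: [count][len][bytes][len][bytes]...
// No offsets need patching: the whole blob is rebuilt on every change.
void pack(const Fields& f, unsigned char* dst, std::size_t dstSize) {
    assert(dstSize >= packedSize(f));
    (void)dstSize;  // only used by the assert
    const std::uint32_t count = static_cast<std::uint32_t>(f.size());
    std::memcpy(dst, &count, sizeof count);
    dst += sizeof count;
    for (const auto& s : f) {
        const std::uint32_t len = static_cast<std::uint32_t>(s.size());
        std::memcpy(dst, &len, sizeof len);
        dst += sizeof len;
        std::memcpy(dst, s.data(), len);
        dst += len;
    }
}

// Rebuild the convenient in-memory form from the blob.
// The blob is trusted here; a real loader should bounds-check each length.
Fields unpack(const unsigned char* src) {
    std::uint32_t count = 0;
    std::memcpy(&count, src, sizeof count);
    src += sizeof count;
    Fields f;
    f.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t len = 0;
        std::memcpy(&len, src, sizeof len);
        src += sizeof len;
        f.emplace_back(reinterpret_cast<const char*>(src), len);
        src += len;
    }
    return f;
}
```

With this, changing one field becomes: unpack, modify the vector, ask the host for packedSize() bytes, pack. The trade-off is a full copy on every edit in exchange for not shuffling trailing data and offsets by hand.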

You are maintaining a plugin for something? Can you elaborate?
Boost's serialization looks awesome but I'm not sure the best way to use it in my case.

You are maintaining a plugin for something? Can you elaborate?
Yes, you may have heard of the Multimedia Fusion 2 software? It has an extension interface where C++ developers can write DLLs and then the software will load them as extensions, which 'extend' the abilities of the software. Anything that C++ can do, the software can too, if you write an extension.
There are a couple of SDKs based on the official SDK; they are used to correctly write these DLLs to work with the software. The situation I described is one I face in the edit-time version of the extension - the user can have data for the extension at edit time that will later be transferred to the runtime data structure (in which I can easily use pointers, etc.; it is not saved/loaded by the application, I do the saving/loading for that myself). The saving and loading that the application does happens when the user saves or loads the project file for their game or application.
It is hard for me to explain but I have a video demonstrating what I am talking about:
http://www.youtube.com/watch?v=Rmlvs9y8uog
I recommend watching it in fullscreen (my screen has a resolution of 1440 by 900)
Does anyone have any thoughts on what they would do in this situation? I'm not asking for help, just for ideas about how to simplify this in the future.