Buffer to store multiple values of different types

Hi all,

I've been working on a dependency graph (like the one found in Autodesk Maya). Every node in the graph derives from a `DependencyNode` class, a node will have a set of input and output plugs, where the value of the output plugs is calculated in a `compute` function which is overridden from the `DependencyNode` class. You can connect output plugs to input plugs of other nodes, thus by chaining together multiple nodes, can create complex behaviors.

Simplified DependencyNode super class header:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <unordered_map>
#include <string>

#include "Plug.h"

class DependencyNode
{
public:
    DependencyNode(std::string name);
    virtual ~DependencyNode();

    virtual void initialize() = 0;
    virtual void compute()=0;

    std::string getName() { return this->mName; }
    void setName(std::string name) { this->mName = name; }

    void addPlug(Plug plug) { this->mPlugs.insert(std::pair<std::string, Plug>(plug->getName(), plug)); }
    Plug& getPlug(std::string name) { return this->mPlugs.find(name)->second; }
private:
	std::string mName;

	std::unordered_map<std::string, Plug> mPlugs;
};


Now for example if you wanted to create a multiply node, you would subclass `DependencyNode`, add three plugs to it as member variables (two inputs, one output) and during the compute function would set the output to be the value of the two inputs multiplied together.

My current issue is each plug can be one of many types, it could store a float, double, a 4x4 matrix or even some kind of OpenGL type as an example. You can also only connect outputs to inputs that both have the same type.

Finally my question: Each node needs to store the data of all its plugs. I know Maya uses some kind of contiguous allocated memory block to do so for its implementation, and I was thinking of allocating a block of memory equivalent to the size of all the plugs value types added together. i.e if there were two floats and a double then the allocated memory size would be `sizeof(float) * 2 + size(double)`. Then each plug would store an offset on where to find its particular data, and what type of data it is. However I quickly ran into problems. Say I allocate memory with `malloc` and receive a void* in return. I can't perform pointer arithmetic on a void* as it's not an object type, nor can I cast the void* to a specific type as the data block could contain any number of types.(Unless I casted to char*, did the arithmetic, then casted to the appropriate type) but this feels really hacky. The other important factor here is speed, as the graph would be evaluated in real time, many many timers per second, with most of the time being taken up in user-defined compute functions, so the back-end of the graph needs to be as fast as possible.

Any ideas on how I could implement such a thing? Thanks!
depending on how you want to approach it, 3 simple ways..

union is a type that lets you swap what a group of bytes represent (it reuses the memory, so you can only use 1 of the types inside it). It sizes to the biggest item inside and may have struct type padding to the nearest word in some configurations.


pointers can do it, of course, just a void pointer that you cast out to the right type..


templates can do it, even a simple 1 data item template class can do it.

and the old hand-rolled memory manager unsigned char * approach can do it. An offshoot of the void pointer idea... eg
unsigned char membuffer[100000];
int *x = (int*)&(membuffer[100]); //stores x inside the byte buffer.

Is one of these an idea you would like to explore? The closest thing to your example is the last one, the 'byte pointer' approach, but this is *very* hands-on and you need to understand what you are doing very, very well to avoid making a mess. This will be quite fast.





Last edited on
Thanks for the ideas!

Well the reason I'm doing C++ at the moment is to kind of get a bit more dirty with the memory management side of things, as I've only ever used memory managed languages.

With the last approach I don't mind making a mess of things as that's normally how I learn best, so I might give that a shot.

Is there anything that's not so obvious I should look out for? Also I wouldn't know the size the buffer needed to be until the end of the initialization function where all the plugs have been created. Should I in that case do:

1
2
3
4
5
6
7
8
unsigned char* membuffer;

// create plugs & calculate size

membuffer = new char[size];

// Then sometime later
delete[] membuffer;
Well the reason I'm doing C++ at the moment is to kind of get a bit more dirty with the memory management side of things, as I've only ever used memory managed languages.


Well in modern C++ you should be moving away from that: Avoid raw pointers, memory management, new and delete. Instead use smart pointers (std::unique_ptr and std::shared_ptr), STL containers all of which use RAII concepts.

The down and dirty language you may be thinking of is is C. C++ and C are quite different animals :+)

With C++ one of the main ideas about polymorphic code is that a pointer to Derived is a valid pointer to Base. This means that one can make functions declared with arguments of pointer to Base, but one sends a parameter which is a pointer to Derived, the compiler calls the correct Derived function. So you could make a Plug Base class and derive from it. Also there are templates, so that allows different types, so you could have a PlugMatrix<Matrix> for example.
no, its all obvious, its just that if you scramble your pointers & offsets you will either exceed your allocated size, or you will damage the data by overwriting a byte that was part of another item, etc.

also be aware that complex types can get weird, like string or vector. It works for them, but if you wrote that buffer to a file and pulled it back on another run on another day, it would crash, because the vector's internal pointers would be scrambled -- it saved the pointer value, but not the values pointed to, and the pointer value is useless after a re-start of the program!


Yes, dynamic memory will work but consider a vector, and consider unsigned char for buffers (there are subtle reasons to avoid signed here that I won't go into right now).
vector<unsigned char> membuffer(size);
pointers work just fine here, too, exact same code:
pointer= (type*)(&membuffer[100])

but the vector will handle the new/delete statements for you cleaner, and you may find it has other features that are handy down the road.


somehow you have to track where your items are located. that is, every item in the buffer needs a pointer (or index into the buffer, either one works) so you can get back to it, or understand where it IS. (these can be hard-coded offsets if you have fixed data, but it sounds like you do not).

remember that when the buffer dies, ALL the variables inside it die too.
Last edited on
I know Maya uses some kind of contiguous allocated memory block to do so for its implementation, and I was thinking of allocating a block of memory equivalent to the size of all the plugs value types added together. i.e if there were two floats and a double then the allocated memory size would be `sizeof(float) * 2 + size(double)`.

Consider instead using a number of blocks of contiguous storage, one block per distinct type. For instance, if you wanted to store 2 floats and a double, your collection would manage two distinct blocks of memory, one for floats, and another for double. This usually offers decent performance characteristics and would simplify implementation, especially regarding alignment issues.

This is (roughly) the approach used by Boost.Polycollection:
https://www.boost.org/doc/libs/master/doc/html/poly_collection.html#poly_collection.introduction

Prefer using std::aligned_storage_t<sizeof (T)> over arrays of unsigned char[sizeof(T)].

Also consider std::any:
http://en.cppreference.com/w/cpp/utility/any
Last edited on
Another thing to consider here is std::function :

http://en.cppreference.com/w/cpp/utility/functional/function

I used some software called Feature Modification Engine (FME) a GIS program, which sounds similar in concept. It had a GUI with boxes and wires. Boxes were either a file or a procedure, and they were joined by wires. There could be multiple file input, and output could go into another procedure, so complex processing could be achieved. So you might be able to make use of std::function to help.
Last edited on
@TheIdeasMan Yeah I get that, but if I only teach myself modern C++ I feel like I'm missing out on a lot. Also instead of PlugMatrix<Matrix> wouldn't it be more like PlugGeneric<Matrix>, and then you could do like PlugGeneric<Float>?

@jonnin yeah each plug would keep track of its own offset, as well as have a pointer to the data.

@mbozzi
Yeah that had crossed my mind, however I would potentially later on implement support for any custom type to be a plug, so either each plug could store its own value or I could use one singe memory buffer. I'll check out the links, thanks.
Yeah I get that, but if I only teach myself modern C++ I feel like I'm missing out on a lot.


Well the choice is yours, I am saying it's bad form to mix C thinking with C++. If you want to think in C, then write in C - there is nothing wrong with that. But consider there might be big advantages of using OOP and RAII. The other thing is that to learn all this effectively isn't trivial, but if you have have experience with OOP elsewhere it may not be so bad.

Just going back to a Node storing Plugs, why wouldn't a Derived Node simply store them in a map like the base class does?

Also instead of PlugMatrix<Matrix> wouldn't it be more like PlugGeneric<Matrix>, and then you could do like PlugGeneric<Float>?


I was thinking there should be a Plug base class with an interface of pure virtual functions. I reckon you would need multiple derived classes to work in a polymorphic fashion, so that Matrix calls Matrix::operator* not any other type multiply. I hadn't thought the template thing through well enough, but there is all kinds of magic that can be achieved with templates, so much so that they and TMP are paradigms of their own.
Yeah I've had a fair bit of experience in Java and Python.

Well I don't think you'd define operators on plugs, as they themselves aren't technically the value, they're connectors that you use to connect nodes together. So for example in Autodesk Maya, you get a 'DataHandle' from a 'DataBlock' by passing in the the plug you want and then it has a bunch of functions like asFloat() asDouble() etc, so it's up to the user to know what type the plug is (which you always do, so it's not too bad) and then you manipulate the data and set it to output plugs.

EDIT:
So I was reading about the aligned_storage class and memory in general, what I came across was this stack exchange question:
https://stackoverflow.com/questions/11386946/whats-the-difference-between-sizeof-and-alignof

Of particular interest was this example of memory alignment:

1
2
3
memory byte    0 1 2 3  4 5 6 7
integer        goooood
                   baaaaaad


If I were to use aligned_storage_t, would I specify an alignment of the largest alignof(type) value that will be stored? Also if I used an unsigned char array, would I have an issue of memory alignment for my data when I go to cast them to the correct type?

Thanks!
Last edited on
If I were to use aligned_storage_t, would I specify an alignment of the largest alignof(type) value that will be stored?

You would specify alignof(std::max_align_t), which is the least strict alignment that works for every object, except overaligned types.

Would I have an issue of memory alignment for my data when I go to cast them to the correct type?

Only if you do something wrong. The purpose of aligned_storage_t is to make alignment requirements in raw storage explicit. You can do this with blocks of unsigned char, but you would need to make sure that every object is constructed at an appropriate boundary for that particular type. .
Last edited on
Thanks, I will give this a shot :)

EDIT:
Okay so I've been attempting to use aligned_storage_t. Failing miserably at the moment.

Could you provide a short example on the usage? There isn't a lot of information on the net regarding it.
Last edited on
Could you provide a short example on the usage?
An object of type std::aligned_storage_t<Len, Align> is a block of memory Len bytes in size aligned to an Align-byte boundary. It's mostly the same as alignas(Align) unsigned char object[Len].

Therefore, an object of type
1
2
using aligned_storage_fundamental =
  std::aligned_storage_t<alignof(std::max_align_t), alignof(std::max_align_t)>;

is a k-byte object aligned to a k-byte boundary. It's aligned suitably for any fundamentally-aligned type.

You can create a contiguous block of those and obtain a section of storage into which you can construct any object that's sufficiently small. We are doing this so that you won't easily end up with a situation where a char (e.g., at byte offset 0) follows a 4-byte int at byte offset 1, where it would be mis-aligned. For example, you might write
1
2
3
4
5
constexpr int blocks = 16; 
aligned_storage_fundamental mem_buffer[blocks];

new(mem_buffer + 0) char { 'h' }; 
new(mem_buffer + 1) float { 2.2f }; // correctly aligned 

instead of
1
2
3
4
unsigned char mem_buffer_2[sizeof mem_buffer];

new(mem_buffer_2 + 0) char { 'h' };
new(mem_buffer_2 + 1) float { 2.2f }; // (probably) incorrectly aligned 
.

Notice that doing this with objects which are not trivially destructible requires manual book-keeping, or if that can't be left to the caller, it requires type-erasure, like std::any, for example. Keeping an actual heterogenous collection is a dubious choice anyways. Not least because it's going to be exceptionally complicated.

In real life you'd just add some indirection. But you're doing this to learn, right?
Last edited on
Topic archived. No new replies allowed.