Multidimensional pointers

Forum

Forum
General C++ Programming
Multidimensional pointers

Multidimensional pointers

I'm writing a class that is a multidimensional array. Number of dimensions (as well as element type) is passed as template argument. The storage is a solid memory block that contains pointers at the beginning in hierarchical manner: at the beginning is an array of pointers (number of those pointers equals first size of the hyper cube), each pointer references to the following arrays of pointers of the "second level" (sizes of those arrays are all equal to the second size of the cube), which are located directly after first array and so on. The data elements are at the end of the memory block. Such structure allows fast access to element using multiple brackets except one inconvenience: I have to manually convert memory block pointer to pointer like T*** where number of stars equals number of dimensions before I can use brackets. It would be very useful to have syntax for declaring such pointer with specified number of stars, so I can return data pointer directly from template without additional conversion every time. Does such syntax exist? I don't raise my hopes much, but still...

hanst99 (2869)

Please use paragraphs. It's really hard to read blocks of text on a monitor.

Also, I don't think we need most of the information you provided here, just ask your question.
And in C++ there is 99% of the time a better alternative to using pointers. 100 if you count smart pointers.

Last edited on

coder777 (8443)

I'd suggest to not using pointer to pointer to pointer to ...

Instead use that solid memory block as it is: 1d. Calculate the index of the wanted field yourself. http://www.cplusplus.com/articles/G8hv0pDG/

thesourcehim (6)

I use multiple pointers because access to elements this way is faster than calculating (calculating is basically a sum of variable length products, which is much slower), otherwise I wouldn't bother writing such template while I have std::vector. I searched Google and it seems that there's no such syntax, so I asked here as a 'last resort'. Well, thank you anyway.

hanst99 (2869)

Actually, I don't know what exactly you are trying to do but probably a vector would be fast enough. If you are going to optimize, optimize bottle necks - what you are trying to do would probably, even IF your access was faster that way(which I doubt by the way), not grant you any considerable speed gain.

coder777 (8443)

if speed matters then I'd recommend offsets. If you limit the array size to 256 you can use 1 byte. That's reduction of n additions (where n is the dimension). Plus you can store the data...

calculating is basically a sum of variable length products, which is much slower

With pointers you have also a multiplication (sizeof(pointer) * index) loading the pointer and calculating the next from there which is not so much faster.

thesourcehim (6)

If you limit the array size to 256 you can use 1 byte.

Not an option, size of each dimension is defined by the user if this class.

With pointers you have also a multiplication (sizeof(pointer) * index) loading the pointer and calculating the next from there which is not so much faster

Pointers are calculated and stored before data in memory during initialization, so accessing element is just several consequential dereferences.

hanst99 (2869)

Dereferencing is also not free. Again, your "optimizations" here are rather unlikely to make any difference in a real world case, unless you are working with systems with very specific requirements.

Disch (13742)

I use multiple pointers because access to elements this way is faster than calculating (calculating is basically a sum of variable length products, which is much slower),

Are you sure about this?

Computers can do computations extremely fast. Doing a little math is nothing.

Hopping around and accessing different blocks of memory is what's slow. If you're dereferencing 3 pointers to get to a final element, that's likely to be slower than calculating a index with some simple math and having only one dereference.

So basically... +1 @ hanst99.

helios (17560)

I agree with Disch. A few additions and multiplications are nothing compared to the cost of hopping all over memory chasing pointers.

thesourcehim (6)

I tested 10 dimensional cube of 32bit integers with size 7 by each dimension on the following system:
Core i7 2000MHz, 8GB RAM 1333 MHz, Fedora 15 x86_64. 10000000 reads from random places of the cube were done using both methods.
Results:
Pointer dereference algorithm time = 4636148 microseconds
Sumproduct algorithm time = 8634137 microseconds

helios (17560)

Without code, your numbers are meaningless.

helios (17560)

#include <iostream>
#include <ctime>
#include <cstdlib>

//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
//!! WARNING: assumes sizeof(void *)==sizeof(T *) for any T that is a pointer !!
//!! or a basic type.                                                         !!
//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
struct pointers_10cubic_matrix{
	void *pointers;
	unsigned side;

	pointers_10cubic_matrix(size_t side);
	~pointers_10cubic_matrix();
	void *init_matrix(unsigned d);
	void delete_matrix(unsigned d,void *p);
	int get(unsigned index[10]);
};

pointers_10cubic_matrix::pointers_10cubic_matrix(size_t side):side(side){
	this->pointers=this->init_matrix(9);
}

void *pointers_10cubic_matrix::init_matrix(unsigned d){
	if (d){
		void **r=new void *[this->side];
		for (unsigned a=0;a<this->side;a++)
			r[a]=this->init_matrix(d-1);
		return r;
	}else{
		int *r=new int[this->side];
		for (unsigned a=0;a<this->side;a++)
			r[a]=rand()%10+10;
		return r;
	}
}

pointers_10cubic_matrix::~pointers_10cubic_matrix(){
	this->delete_matrix(9,this->pointers);
}

void pointers_10cubic_matrix::delete_matrix(unsigned d,void *p){
	if (d){
		void **p2=(void **)p;
		for (unsigned a=0;a<this->side;a++)
			this->delete_matrix(d-1,p2[a]);
		delete[] p2;
	}else
		delete[] (int *)p;
}

int pointers_10cubic_matrix::get(unsigned index[10]){
	void *p=this->pointers;
	for (unsigned a=0;a<9;a++)
		p=((void **)p)[index[a]];
	return ((int *)p)[index[9]];
}

////////////////////////////////////////////////////////////////////////////////

struct arithmetic_10cubic_matrix{
	int *pointers;
	size_t size;
	unsigned side;
	unsigned offsets[9];

	arithmetic_10cubic_matrix(size_t side);
	~arithmetic_10cubic_matrix();
	int get(unsigned index[10]);
};

arithmetic_10cubic_matrix::arithmetic_10cubic_matrix(size_t side):side(side){
	this->size=1;
	for (size_t a=0;a<10;a++){
		this->size*=side;
		if (a==9)
			break;
		this->offsets[8-a]=this->size;
	}
	this->pointers=new int[this->size];
	for (size_t a=0;a<this->size;a++)
		this->pointers[a]=rand()%10+10;
}

arithmetic_10cubic_matrix::~arithmetic_10cubic_matrix(){
	delete[] this->pointers;
}

int arithmetic_10cubic_matrix::get(unsigned index[10]){
	unsigned final_index=0;
	for (unsigned a=0;a<9;a++)
		final_index+=index[a]*this->offsets[a];
	return this->pointers[final_index+index[9]];
}

int main(){
	srand(time(0));
	const unsigned N=10000000,
		side=6;
	{
		arithmetic_10cubic_matrix m(side);
		unsigned index[10];
		clock_t t0=clock();
		for (unsigned a=0;a<N;a++){
			for (unsigned b=0;b<10;b++)
				index[b]=rand()%side;
			m.get(index);
		}
		clock_t t1=clock();
		std::cout <<t1-t0<<std::endl;
	}
	{
		pointers_10cubic_matrix m(side);
		unsigned index[10];
		clock_t t0=clock();
		for (unsigned a=0;a<N;a++){
			for (unsigned b=0;b<10;b++)
				index[b]=rand()%side;
			m.get(index);
		}
		clock_t t1=clock();
		std::cout <<t1-t0<<std::endl;
	}
	return 0;
}

I get very similar times with this code. arithmetic_10cubic_matrix takes ~101% as long, but uses 63% as much memory as pointers_10cubic_matrix. I imagine the difference in space efficiency will be even greater in a 64-bit system.

Last edited on

thesourcehim (6)

Optimized code including index calculations. New results:
Pointer algorithm time = 1730133 mksecs
Math algorithm time = 3389825 mksecs

Here's the code I use:
The template itself:

template<int D, typename T>
class CCube
{
	protected:
		void*			m_pData;
		T*				m_pBeginElems;
		int				m_iNumElements;
		QVector<int>	        m_aSizes;
		T				m_cDummy;
	protected:
		void CalcNumElements(int & NumPtrs, QVector<int>* NumPtrsLevel=0)
		{
			m_iNumElements=1;
			NumPtrs=0;
			int prod;
			if(NumPtrsLevel)
				NumPtrsLevel->resize(m_aSizes.size()-1);
			for(int i=0; i<m_aSizes.size()-1; ++i)
			{
				prod=1;
				for(int j=0; j<=i; ++j)
					prod*=m_aSizes[j];
				NumPtrs+=prod;
				m_iNumElements*=m_aSizes[i];
				if(NumPtrsLevel)
					(*NumPtrsLevel)[i]=prod;
			}
			if(m_aSizes.size())
				m_iNumElements*=m_aSizes.last();
		}
		
		void InitArray()
		{
			QVector<int> nelems;
			int numptrs;
			CalcNumElements(numptrs, &nelems);
			
			m_pData=malloc(numptrs*sizeof(void*)+m_iNumElements*sizeof(T));
			
			int base=0;
			for(int i=0; i<m_aSizes.size()-1; ++i)
			{
				for(int k=0; k<nelems[i]; ++k)
				{
					if(i<m_aSizes.size()-2)
						((void**)m_pData)[base+k]=(void*)(((void**)m_pData)+base+nelems[i]+m_aSizes[i+1]*k);
					else
						((void**)m_pData)[base+k]=(void*)(((T*)(((void**)m_pData)+base+nelems[i]))+m_aSizes[i+1]*k);
				}
				base+=nelems[i];
			}
			m_pBeginElems=(T*)(((void**)m_pData)+base);
		}
		
		void InitElemets()
		{
			for(int i=0; i<m_iNumElements; ++i)
				new (m_pBeginElems+i) T();
		}
		
		void CopyArray(const T* ptr)
		{
			InitArray();
			if(ptr)
				for(int i=0; i<m_iNumElements; ++i)
					new (m_pBeginElems+i) T(*(ptr+i));
		}
		
		void DeleteArray()
		{
			for(int i=0; i<m_iNumElements; ++i)
				(m_pBeginElems+i)->~T();
			free(m_pData);
		}
	public:
		CCube(): m_pData(0), m_pBeginElems(0), m_iNumElements(0), m_aSizes(QVector<int>(D, 0))
		{
		}
		
		CCube(std::initializer_list<int> lst): m_pData(0), m_pBeginElems(0),
					m_iNumElements(0), m_aSizes(QVector<int>(D))
		{
			int i=0;
			for(auto it=lst.begin(); it!=lst.end(); ++it)
			{
				m_aSizes[i]=*it;
				++i;
			}
			InitArray();
			InitElemets();
		}
		
		CCube(const CCube<D, T> & op): m_pData(0), m_aSizes(op.m_aSizes)
		{
			CopyArray(op.m_pBeginElems);
		} 
		
		~CCube()
		{
			DeleteArray();
		}
		
		CCube<D, T> & operator = (const CCube<D, T> & op)
		{
			if(m_pData)
				DeleteArray();
			m_aSizes=op.m_aSizes;
			CopyArray(op.m_pBeginElems);
			return (*this);
		}
		
		operator void* ()
		{
			return m_pData;
		}
		
		operator const void* () const
		{
			return m_pData;
		}
		
		void* data()
		{
			return m_pData;
		}
		
		const void* data() const
		{
			return m_pData;
		}
		
		operator QByteArray () const
		{
			static_assert(__is_pod(T), "Serializing objects is not implemented yet");
			QByteArray arr(m_aSizes.size()*sizeof(int), '\0');
			memcpy(arr.data(), m_aSizes.data(), arr.size());
			return arr+((m_pBeginElems)?
				(QByteArray((const char*)m_pBeginElems, m_iNumElements*sizeof(T))):(QByteArray()));
		}
		
		void FromByteArray(const QByteArray & Data)
		{
			static_assert(__is_pod(T), "Deserializing objects is not implemented yet");
			if(m_pData)
				DeleteArray();
			memcpy(m_aSizes.data(), Data.data(), m_aSizes.size()*sizeof(int)); 
			InitArray();
			memcpy(m_pBeginElems, Data.data()+m_aSizes.size()*sizeof(int), Data.size()); 
		}
		
		QVector<int> size() const
		{
			return m_aSizes;
		}
		
		int size(int n) const
		{
			return m_aSizes[n];
		}
		
		const T & GetElem_Math(std::initializer_list<int> lst) const
		{
			if(lst.size()!=(long uint)m_aSizes.size())
			{
				qDebug("CCube: wrong number of indexes: must be %d, given %lu", m_aSizes.size(), lst.size());
				return m_cDummy;
			}
			int ind=*(lst.begin()), n=1;
			auto it=lst.begin();
			++it;
			while(it!=lst.end())
			{
				ind=ind*m_aSizes[n]+*it;
				++it;
				++n;
			}
			return m_pBeginElems[ind];
		}
};

Test program:

#define CBT(a) ((int**********)(void*)a)
#define CBI(a) ((int***)(void*)a)
#define CBS(a) ((QString***)(void*)a)

int random(int v)
{
	return rand()%v;
}

int main(int, char**)
{	
	qDebug("Functionality test");
	CCube<3, int> int_cube({4, 4, 4});
	CCube<3, QString> str_cube({4, 4, 4});
	int n=1;
	for(int i=0; i<int_cube.size(0); ++i)
		for(int j=0; j<int_cube.size(1); ++j)
			for(int k=0; k<int_cube.size(2); ++k)
			{
				CBI(int_cube)[i][j][k]=n;
				CBS(str_cube)[i][j][k]=QString("Hello_%1").arg(n);
				++n;
			}
	CCube<3, QString> str_cube2;
	str_cube2=str_cube;
	
	for(int i=0; i<int_cube.size(0); ++i)
		for(int j=0; j<int_cube.size(1); ++j)
			for(int k=0; k<int_cube.size(2); ++k)
			{
				qDebug("Str value = %s, Str2 value = %s, int value (ptr) = %d, int value (math) = %d",
					CBS(str_cube)[i][j][k].toUtf8().data(),
					CBS(str_cube2)[i][j][k].toUtf8().data(),
					CBI(int_cube)[i][j][k], int_cube.GetElem_Math({i, j, k}));
			}
	
	qDebug("\nPerformance test");
	CCube<10, int> tb({7, 7, 7, 7, 7, 7, 7, 7, 7, 7});
	srand(time(0));
	struct timeval tv1, tv2;
	const int tries=10000000;
	qDebug("Pointer dereference readings");
	gettimeofday(&tv1, 0);
	for(n=0; n<tries; ++n)
	{
		CBT(tb)[random(7)][random(7)][random(7)][random(7)][random(7)]
			[random(7)][random(7)][random(7)][random(7)][random(7)];			
	}
	gettimeofday(&tv2, 0);
	printf("Pointer algorithm time = %lld mksecs\n", ((qint64)(tv2.tv_sec)-(qint64)(tv1.tv_sec))*1000000ll+
			((qint64)(tv2.tv_usec)-(qint64)(tv1.tv_usec)));
	
	qDebug("Index calculation readings");
	gettimeofday(&tv1, 0);
	for(n=0; n<tries; ++n)
	{
		tb.GetElem_Math({random(7), random(7), random(7), random(7), random(7),
			random(7), random(7), random(7), random(7), random(7)});			
	}
	gettimeofday(&tv2, 0);
	printf("Math algorithm time = %lld mksecs\n", ((qint64)(tv2.tv_sec)-(qint64)(tv1.tv_sec))*1000000ll+
			((qint64)(tv2.tv_usec)-(qint64)(tv1.tv_usec)));
	
    return 0;
}

thesourcehim (6)

Helios, you use DIFFERENT memory blocks, allocated by multiple new's. I use single consequential block for EVERYTHING. Pointers become cached in this case (since they are all close to each other) and dereferencing is done in the cache most of the time rather than in memory. And this is really fast. In short, you use different structure than I described.

helios (17560)

//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
//!! WARNING: assumes sizeof(void *)==sizeof(T *) for any T that is a pointer !!
//!! or a basic type.                                                         !!
//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
struct pointers_10cubic_matrix{
	void *pointers;
	int *data;
	unsigned side;

	pointers_10cubic_matrix(size_t side);
	~pointers_10cubic_matrix();
	int get(unsigned index[10]);
};

pointers_10cubic_matrix::pointers_10cubic_matrix(size_t side):side(side){
	size_t size=1,
		pointers_size=0;
	for (int a=0;a<9;a++){
		size*=side;
		pointers_size+=size;
	}
	this->pointers=new void **[pointers_size];
	this->data=new int[size*=side];

	void **out_pointer=(void **)this->pointers,
		**in_pointer=((void **)this->pointers)+this->side;
	const unsigned increment=this->side;
	unsigned step_size=increment;
	for (unsigned b=increment;b<pointers_size;){
		for (unsigned a=0;a<step_size;a++){
			*out_pointer++=(void *)in_pointer;
			in_pointer+=increment;
			b+=increment;
		}
		step_size*=this->side;
	}
	int **out_int_pointer=(int **)out_pointer;
	for (unsigned a=0;a<size;a+=side)
		*out_int_pointer++=this->data+a;
}

pointers_10cubic_matrix::~pointers_10cubic_matrix(){
	delete[] (void **)this->pointers;
	delete[] this->data;
}

int pointers_10cubic_matrix::get(unsigned index[10]){
	void *p=this->pointers;
	for (unsigned a=0;a<9;a++)
		p=((void **)p)[index[a]];
	return ((int *)p)[index[9]];
}

The constructor must be among the most annoying pieces of code I've ever written.
Now the times for pointers_10cubic_matrix are slightly worse, but it doesn't make much of a difference (as I expected).

Helios, you use DIFFERENT memory blocks, allocated by multiple new's. I use single consequential block for EVERYTHING. Pointers become cached in this case

No, they don't. It takes 48 MiB to store all the 32-bit pointers required to hold the structure. This amount doesn't change no matter how you organize it, thus, when the CPU tries to access an element it'll have to jump around all over memory. Some parts of the tree (usually near the root) might get cached, but they aren't likely to stay there for long, specially when accessing randomly.

Now, about your code. Come on, now. Your tests aren't anywhere near fair. Compare the amount of work this does:

CBT(tb)[random(7)][random(7)][random(7)][random(7)][random(7)][random(7)][random(7)][random(7)][random(7)][random(7)];

to the amount of work this does:

const T & GetElem_Math(std::initializer_list<int> lst) const
{
    if(lst.size()!=(long uint)m_aSizes.size())
    {
        qDebug("CCube: wrong number of indexes: must be %d, given %lu", m_aSizes.size(), lst.size());
        return m_cDummy;
    }
    int ind=*(lst.begin()), n=1;
    auto it=lst.begin();
    ++it;
    while(it!=lst.end())
    {
        ind=ind*m_aSizes[n]+*it;
        ++it;
        ++n;
    }
    return m_pBeginElems[ind];
}

And now look at how I wrote it:

int pointers_10cubic_matrix::get(unsigned index[10]){
	void *p=this->pointers;
	for (unsigned a=0;a<9;a++)
		p=((void **)p)[index[a]];
	return ((int *)p)[index[9]];
}

int arithmetic_10cubic_matrix::get(unsigned index[10]){
	unsigned final_index=0;
	for (unsigned a=0;a<9;a++)
		final_index+=index[a]*this->offsets[a];
	return this->pointers[final_index+index[9]];
}

You can't seriously be expecting to get any meaningful information out of that.

coder777 (8443)

@thesourcehim

your comparison is false. You're comparing fixed with variable dimensions and macro with function calls.

Topic archived. No new replies allowed.