Which substitute for non-type template p

Forum

Forum
General C++ Programming
Which substitute for non-type template p

Which substitute for non-type template parameter ?

Apr 18, 2012 at 8:55am

Hi all,

I would like to define a templated class with a non-type parameter like this :

template <int s>
class C40 {
public:
	int m[s];
	int other1;
	float other2[400];
	double other3[200];
};
template class C40<200>;
template class C40<300>;

But I need to declare variables with a non const, what is not authorized :

1
2

int i  = 200;
C40<i> toto[1000000];

Compiler says :

’i’ cannot appear in a constant-expression

For my usage, toto will end in a shared memory.

An alternative would be to separate m from C40 and to allocate it dynamically, but then for the sake of the shared memory, it would become a mess to manage it.

Any idea please ?

Last edited on Apr 18, 2012 at 8:57am

Apr 18, 2012 at 9:57am

Cubbi (4774)

Why would it be a mess to manage the individually dynamically allocated m?

#include <iostream>
#include <vector>

class C40 {
 public:
    std::vector<int> m;
    int other1;
    float other2[400];
    double other3[200];
    C40(int s) : m(s) {}
};

int main()
{
    int i = 200;
    std::vector<C40> toto(1000000, C40(i));
}

Edit & run on cpp.sh

Your C40<200> and C40<300> are two different types. C++ only supports arrays/vectors of objects of the same type, so there can never be a "C40<i> toto[...]" for some variable i: only "C40<200> toto[...]" or "C40<300> toto[...]"

Apr 18, 2012 at 10:36am

lalebarde (114)

Thanks Cubbi.

For my understanding, C40(int s) : m(s) {} initializes m with s isn't it ? As s is an int and not a std::vector<int>, it resizes m to size s ?

It is then equivalent to : C40(int s) {m.resize(s);} ?

Some more questions :

1) In my shared memory context where toto will be mapped in /dev/toto, shall I manage capacity with reserve to prevent memory waste ?

2) I am concern a lot with memory ressource because my data are huge. What is the overhead of vectors versus arrays ?

Last edited on Apr 18, 2012 at 10:49am

Apr 18, 2012 at 11:34am

Peter87 (11251)

C40(int s) : m(s) {} will use the vector constructor that takes the size as argument.

C40(int s) {m.resize(s);} will first construct m with the default constructor and after that call resize to resize the vector.

The end result will be the same but first way is possibly a little faster because it has to do less.

Apr 18, 2012 at 12:08pm

Cubbi (4774)

1) define memory waste. To place containers such as vectors in shared memory we use allocators. The allocator will receive the same request from the constructor above as from a call to reserve().

2) there is no overhead when compared to a dynamically allocated array. But indeed, your original structure, with a fixed size array member, would take up less room and avoid indirection . Perhaps you could deal with a vector<C40<200> > toto2 and vector<C40<300> > toto3?

Apr 18, 2012 at 1:07pm

lalebarde (114)

Thanks for the clarifications.

I also found this interesting :
http://stackoverflow.com/questions/381621/using-arrays-or-stdvectors-in-c-whats-the-performance-gap
Most contributors say the overhead of vectors vs arrays is not significant, except duli.

I have re-worked the example of duli :

class X42 {
public:
	vector<int> vec;
	X42(const int s) : vec(s) {}
	X42(const vector<int>& v) {vec = v;}
	int first() { return vec[0];}
};
class Y42 {
public:
	int f[3];
	Y42(int a[]) {f[0]=a[0]; f[1]=a[1];f[2]=a[2];}
	int first() { return f[0];}
};

int main() {
			int sum = 0;
			X42 v1(3);
			v1.vec[0]=1; v1.vec[1]=2;v1.vec[2]=3;
			int starttime1=time(NULL);
			for (int i=0;i<50000;i++) for (int j=0;j<10000;j++) {
				X42 x(v1);
				sum+=x.first() + i + j;
			}
			int endtime1=time(NULL);
			cout << "duration with vectors : " << endtime1 - starttime1 << endl; // = 19 - user	0m19.288s
//
//			int v2[3];
//			v2[0]=1; v2[1]=2;v2[2]=3;
//			int starttime2=time(NULL);
//			for (int i=0;i<500000;i++) for (int j=0;j<100000;j++) {
//				Y42 x(v2);
//				sum+=x.first() + i + j;
//				//cout << "i = " << i <<", j = " << j << ", sum = " << sum << endl;
//			}
//			int endtime2=time(NULL);
//			cout << "100 x duration with arrays : " << endtime2 - starttime2 << endl; // = 36 - user	0m36.317s
			cout << sum << endl;
}

When I comment the array part once and the vector part in a second time, I obtain that in this use case, arrays are more than 50 times faster than vectors.

$ time ./test 42
100 x duration with arrays : 36
-1482588160

real	0m36.389s
user	0m36.317s
sys	0m0.000s
$ time ./test 42
duration with vectors : 19
1974202368

real	0m19.326s
user	0m19.288s
sys	0m0.001s

duli says :

Sometimes arrays are indeed better than vectors. If you are always manipulating a fixed length set of objects, arrays are better.

I am in this case. And my application deals with very huge data sets. So memory footprints and cpu footprints are a concern. What do you thing about it ?

Then I am back to my former question, without the help of vectors.....

Last edited on Apr 18, 2012 at 1:09pm

Apr 18, 2012 at 5:18pm

clanmjc (666)

When I comment the array part once and the vector part in a second time, I obtain that in this use case, arrays are more than 50 times faster than vectors.

Am I reading this wrong, is 36ms quicker than 19ms? (Smaller is better right?)

Also I think this test case is a little off, the array is using fixed indexes in it's constructor where the vector is first using the default constructor for the vector then using the assignment operator, I don't think that is apples to oranges.

Also, why are you using a template like this? If you have many different sizes you will have serious code bloat, which doesn't make sense because other than the size of your array, the class functions identically.

Last edited on Apr 18, 2012 at 5:26pm

Apr 19, 2012 at 10:47am

lalebarde (114)

Am I reading this wrong, is 36ms quicker than 19ms? (Smaller is better right?)

For arrays, the loop is performed 100 times more, then you have to compare 0.36ms to 19ms.

the vector is first using the default constructor for the vector then using the assignment operator, I don't think that is apples to oranges.

Yes but this is done once before the loop, to not disadvantage vectors. So in my opinion, that is apples to apples. But I am not an expert. If you propose me some code, I will test it.

Also, why are you using a template like this?

I will have only 4 different sizes with tens to hundreds of Mb each, in a shared memory mapped to /dev. I will have other part of data in the class that will differentiate. I am open to suggestions of course.

Apr 19, 2012 at 12:45pm

Cubbi (4774)

but this is done once before the loop,

there is a vector constructed inside the loop, line 21, which allocates storage from the heap (by default), which is of course slower than constructing a fixed-size array on stack at line 31.

I will have only 4 different sizes with tens to hundreds of Mb each, in a shared memory mapped to /dev.

If I were you, I'd create 4 different vectors using an allocator that accesses your shared memory. If your API is one of the ones supported by boost, you could use boost.interprocess library for that http://www.boost.org/doc/libs/release/doc/html/interprocess/quick_guide.html#interprocess.quick_guide.qg_interprocess_container

Apr 19, 2012 at 2:09pm

coder777 (8449)

vectors only have a plus when you use the dynamic sizing. In your case you can gladly use POD arrays

Apr 19, 2012 at 2:47pm

clanmjc (666)

For arrays, the loop is performed 100 times more, then you have to compare 0.36ms to 19ms.

Yup, my mistake, I saw your comment about the 100 x's and looked at the loop counts (but must have just breezed over them because I thought they were the same).

Either way, as Duli stated in his post, the draw back of this test case is the constructor calls for the vector in the inner loop, so this test is definitely skewed to favor arrays.

Is the array/vector for the class truly going to be integers only (as is the test case)? After running some sample tests myself, with the algorithm a little different to give the vector a fair shot, I found the vector to take twice as long as the array, however, I may not have all the optimization settings correct. Also, I believe the difference in time is also attributed to cache misses, the vector has additional memory, since the array and vector are using ints, this is easily aligned for the array. Change the type to some user defined not so aligned type and you will see the gap close almost completely and in some cases the vector outperforms the array, as is this test case... using your modified code.

struct TestObject
{
	TestObject():int_type(1), string_type("Some string"), double_type(3.4), char_type('c')
	{
	}
	int int_type;
	string string_type;
	double double_type;
	char char_type;
};

int _tmain(int argc, _TCHAR* argv[])
{
	long sum = 0;
	vector<TestObject> vec1;
	vec1.push_back(TestObject());
	vec1.push_back(TestObject());
	vec1.push_back(TestObject());

	vector<TestObject> vec2;
	vec2.push_back(TestObject());
	vec2.push_back(TestObject());
	vec2.push_back(TestObject());

	TestObject arr1[3] = { TestObject(), TestObject(), TestObject()};
	TestObject arr2[3] = { TestObject(), TestObject(), TestObject()};

	time_t start1, start2, end1, end2;
	const int outer = 50000, inner = 10000;

	clock_t starttime1=clock();
	time(&start1);
	for (int i=0;i<outer;i++) for (int j=0;j<inner;j++) {
		vec1[0] = vec2[0];
		vec1[1] = vec2[1];
		vec1[2] = vec2[2];
		sum+=vec1[0].int_type + i + j;
	}
	time(&end1);
	clock_t endtime1=clock();
	cout << "Elapsed time with vectors    : " << difftime(end1,start1) << endl;
	cout << "Clocks Per/sec with vectors : " << ((double)endtime1-starttime1)/CLOCKS_PER_SEC << endl; // = 19 - user	0m19.288s
	cout << sum << endl;
	sum = 0;
	
	clock_t starttime2=clock();
	time(&start2);
	for (int i=0;i<outer;i++) for (int j=0;j<inner;j++) {
		arr1[0] = arr2[0];
		arr1[1] = arr2[1];
		arr1[2] = arr2[2];
		sum+=arr1[0].int_type + i + j;
	}
	time(&end2);
	clock_t endtime2=clock();
	cout << "Elapsed time with arrays   : " << difftime(end2,start2) << endl;
	cout << "Clocks Per/Sec with arrays : " << ((double)endtime2-starttime2)/CLOCKS_PER_SEC  << endl; // = 36 - user	0m36.317s
	cout << sum << endl;
	return 0;
}


Elapsed time with vectors    : 30
Clocks Per/sec with vectors : 30.125
1974202368
Elapsed time with arrays   : 36
Clocks Per/Sec with arrays : 36.611
1974202368

Last edited on Apr 19, 2012 at 3:02pm

Apr 19, 2012 at 4:12pm

lalebarde (114)

Thanks all for this very interresting discussion.

clanmjc, on my machine, I obtain :

Elapsed time with vectors    : 10
Clocks Per/sec with vectors : 9.96
15000000000000
Elapsed time with arrays   : 10
Clocks Per/Sec with arrays : 9.96
15000000000000

Here are my compiling flags : -O3 -Wall -c -fmessage-length=0 -std=c++0x -std=gnu++0x

Now, if I modify TestObject to contain four integers :

struct TestObject
{
	TestObject():int_type(1), int2(2), int3(3), int4(4) {}
	int int_type;
	int int2;
	int int3;
	int int4;
};

I obtain :

Elapsed time with vectors    : 2
Clocks Per/sec with vectors : 2
15000000000000
Elapsed time with arrays   : 0
Clocks Per/Sec with arrays : 0
15000000000000

And with 100x more loops, say with outer = 500000, inner = 100000 :

Elapsed time with vectors    : 127
Clocks Per/sec with vectors : 126.86
15000000000000000
Elapsed time with arrays   : 0
Clocks Per/Sec with arrays : 0
15000000000000000

And with 10000x more loops, say with outer = 5000000, inner = 1000000, it is the same. So I conclude the compiler makes too smart optimisation when code is useless.

Let's modify your code to use the data and avoid such optimisation :

	for (int i=0;i<outer;i++) for (int j=0;j<inner;j++) {
		vec1[0] = vec2[0];
		vec1[1] = vec2[1];
		vec1[2] = vec2[2];
		sum+=vec1[0].int_type + i + j + vec1[0].int2 + vec1[1].int3 + vec1[2].int4;
		vec2[1].int2 = vec1[2].int_type;
		vec2[1].int4 = vec1[2].int3;
		vec2[2].int2 = vec1[0].int_type;
		vec2[2].int4 = vec1[0].int3;
	}

	for (int i=0;i<outer;i++) for (int j=0;j<inner;j++) {
		arr1[0] = arr2[0];
		arr1[1] = arr2[1];
		arr1[2] = arr2[2];
		sum+=arr1[0].int_type + i + j + arr1[0].int2 + arr1[1].int3 + arr1[2].int4;
		arr2[1].int2 = arr1[2].int_type;
		arr2[1].int4 = arr1[2].int3;
		arr2[2].int2 = arr1[0].int_type;
		arr2[2].int4 = arr1[0].int3;
	}

Elapsed time with vectors    : 5
Clocks Per/sec with vectors : 5.05
15004000000001
Elapsed time with arrays   : 1
Clocks Per/Sec with arrays : 0.54
15004000000001

So with integers only, I obtain a gain of 10x times with arrays vs vectors !!! With -o2, I have the same result. With -o1, arrays is identical, vectors is worse : 7.17. With no optimisations at all, I get 25.75 for vectors vs 5.08 for arrays, 25.75/5.08=5.

As I will deal in my tables only with integers, as the size will not change nor the contents, and as my shared data are huge, I will stick to arrays.

vectors only have a plus when you use the dynamic sizing. In your case you can gladly use POD arrays

Yep, thanks coder777

Last edited on Apr 19, 2012 at 5:00pm

Apr 19, 2012 at 5:41pm

clanmjc (666)

As I will deal in my tables only with integers, as the size will not change nor the contents, and as my shared data are huge, I will stick to arrays.

Yeah that is what I was getting at as well, definitely a fun discussion.

Apr 19, 2012 at 5:58pm

Cubbi (4774)

So with integers only, I obtain a gain of 10x times with arrays vs vectors !!!

If you look at the generated code, you will notice that gcc threw your array out and simply calculated sum in the second case, without any memory accesses (at least that's what it does for me).

Notice that if you'd actually compared apples to apples (heap-allocated array to vector), you'd get the same results wth gcc.

Apr 19, 2012 at 10:53pm

lalebarde (114)

If you look at the generated code, you will notice that gcc threw your array out and simply calculated sum in the second case, without any memory accesses

That's a killing argument ! I am going to try to build another example where the compiler is not expected to perform such optimisation.

Notice that if you'd actually compared apples to apples (heap-allocated array to vector), you'd get the same results wth gcc.

Could you elaborate please, I am lost here.

Apr 19, 2012 at 11:21pm

lalebarde (114)

Here is the new code :

#define SOMESIZE 10000u

	long sum = 0;
	time_t start1, start2, end1, end2;
	const int outer = 50000, inner = 10000;

	vector<TestObject> vec1(SOMESIZE);
	// initialise vec1
	for (int i = 0; i < SOMESIZE; i++) {
		vec1[i].int_type = rand() % 1000;
		vec1[i].int2 = rand() % 1000;
		vec1[i].int3 = rand() % 1000;
		vec1[i].int4 = rand() % 1000;
	}
	clock_t starttime1=clock();
	time(&start1);
	for (int i=0;i<outer;i++) for (int j=0;j<inner;j++) {
		sum += vec1[(i*j) % SOMESIZE].int_type + vec1[(i+j) % SOMESIZE].int2 + vec1[rand() % SOMESIZE].int3 + vec1[rand() % SOMESIZE].int4;
	}
	time(&end1);
	clock_t endtime1=clock();
	cout << "Elapsed time with vectors    : " << difftime(end1,start1) << endl;
	cout << "Clocks Per/sec with vectors : " << ((double)endtime1-starttime1)/CLOCKS_PER_SEC << endl; // = 19 - user	0m19.288s
	cout << sum << endl;
	sum = 0;

	TestObject arr1[SOMESIZE];
	// initialise vec1
	for (int i = 0; i < SOMESIZE; i++) {
		arr1[i].int_type = rand() % 1000;
		arr1[i].int2 = rand() % 1000;
		arr1[i].int3 = rand() % 1000;
		arr1[i].int4 = rand() % 1000;
	}

	clock_t starttime2=clock();
	time(&start2);
	for (int i=0;i<outer;i++) for (int j=0;j<inner;j++) {
		sum += arr1[(i*j) % SOMESIZE].int_type + arr1[(i+j) % SOMESIZE].int2 + arr1[rand() % SOMESIZE].int3 + arr1[rand() % SOMESIZE].int4;
	}
	time(&end2);
	clock_t endtime2=clock();
	cout << "Elapsed time with arrays   : " << difftime(end2,start2) << endl;
	cout << "Clocks Per/Sec with arrays : " << ((double)endtime2-starttime2)/CLOCKS_PER_SEC  << endl; // = 36 - user	0m36.317s
	cout << sum << endl;

Elapsed time with vectors    : 12
Clocks Per/sec with vectors : 12.56

Elapsed time with arrays   : 12
Clocks Per/Sec with arrays : 12.05

Eventually, even with only aligned data (integers), I find the performances are very close. So I am going to change my decision and use vectors.

A big thank to all of you.

Apr 20, 2012 at 12:58am

Cubbi (4774)

Notice that if you'd actually compared apples to apples (heap-allocated array to vector), you'd get the same results wth gcc.

Could you elaborate please, I am lost here.

std::vector<int> v(size);, by default, places the array on heap -- it uses std::allocator<int>, which uses new. That's different from using, for example int arr[size];, which places the array on stack. *those* arrays are faster than vectors for some operations, and are amenable to more optimizations as you saw. The C++ way to write them is std::array<int> arr(size);, although it only became popular some 10 years ago and became standard last year.

But since your goal is to drop the data into shared memory, you'll either manually placement-new your raw array in there, or you'll use a vector with a shared memory allocator. I use those allocators a lot at work, but unlike vectors, they are not beginner stuff (unless you can use boost, then it's easy)

Last edited on Apr 20, 2012 at 12:58am

Topic archived. No new replies allowed.