I had a question about memory allocation and how iterators work for a std::vector<foo> of a user-defined class 'foo'. Say foo contains members of variable size, so that each element of the std::vector<foo> does not require the same amount of memory. Does C++ allocate the same amount of memory for each element, equal to the amount required for the largest element? Or does it use some sort of array of pointers to the location of each element in the vector to make the iterator work? Or does it use some other method? I am wondering because I wrote a program which reads data from binary files and stores most of it in std::vectors. The program seems to be using significantly more memory than the sum of the sizes of all the binary files, and I am using vectors of the datatype within the binary files (float). So I was wondering if internally the code was allocating space for each vector element equal to the size of the largest element as a way to handle indexing/iterators. I ran my code through a memory leak checker and it found no errors. Thanks
To be clear, elements of a vector must all be the same type; therefore each element must be the same size. You seem to be misunderstanding how pointers work.
#include <vector>

// Each foo is the same size because the size of a char* is always the same.
// The length of what fooStr points to varies for each foo instance, but the
// size of foo does not.
struct foo
{
    foo(const char* inStr, int inInt) : fooStr(inStr), value(inInt) {}

    const char* fooStr;
    int value;
};

// Clearly the length of the string within each foo is different, but the size
// of each foo is the same. The vector is a template; it doesn't do anything
// with fooStr, so it knows nothing about the length of the fooStr in each object.
std::vector<foo> containerOfFoo;
foo fooMeOnce("hello", 5);
containerOfFoo.push_back(fooMeOnce);
foo fooMeTwice("greetings sir", 20);
containerOfFoo.push_back(fooMeTwice);
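As a quick sanity check (a minimal sketch reusing the foo above), sizeof reports the same value for both instances no matter how long the pointed-to strings are:

#include <iostream>

int main()
{
    foo fooMeOnce("hello", 5);
    foo fooMeTwice("greetings sir", 20);
    // Both print the same value: foo stores only a pointer and an int;
    // the characters themselves live outside the object.
    std::cout << sizeof(fooMeOnce) << ' ' << sizeof(fooMeTwice) << '\n';
}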
If you do a sizeof on the std::vector, you'll get the size of the vector object itself, not of the elements it manages. Not sure if this is system-specific in any way, but I get 24. This is the contribution the vector makes to the sizeof of any class or struct containing it.
If you wanted to see how much memory the vector is using, you could get the product of the vector's size and the size of the elements it stores.
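For example, a rough sketch of that calculation (counting element storage only; the vector's own bookkeeping and any unused capacity are extra):

#include <iostream>
#include <vector>

int main()
{
    std::vector<float> v(1000);
    // Bytes occupied by the elements themselves:
    std::cout << v.size() * sizeof(float) << " bytes in use\n";
    // The underlying allocation may be larger; capacity() reports how many
    // elements fit in the block the vector currently owns:
    std::cout << v.capacity() * sizeof(float) << " bytes allocated\n";
}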
Because of data hiding and implementation-defined behavior, it is never possible to know how much memory is being consumed by a given object. (Unless all objects involved are POD types, of course.)
Now, going back to my original question... it seems to me there are 2 options for how to implement an iterator for the std::vector Vfoo...
1) allocate the same amount of memory for each element of Vfoo, equal to or greater than the amount needed by the largest element.
2) keep a table of pointers to the start of each element.
The first option would waste a lot of memory, but the iterator would know exactly how many bytes to skip to reach the next element of Vfoo. The second option would have the iterator skim through the table of pointers to access the various elements of Vfoo. But then there would be an issue if you resized the data within an element of Vfoo, since my understanding is that the memory in a std::vector is contiguous. In that case there would be no room to fit the extra data, so the memory would need to be copied somewhere else with enough space and the pointer table updated. In case 1) the memory would only need to be copied if you added more to the element that already had the most data and didn't leave any additional buffer.
I was originally asking because the memory usage in a program I wrote seems to be significantly larger than the size of the binary data I am reading in, by a factor of 3-5. The binary data is stored as floats, and I am saving them in std::vector<float>s, so I wouldn't expect that much bloat. That is what got me wondering if 1) is how vectors are typically implemented, which would explain the bloat. However, based upon cire's example, it seems like 2) is what is going on. Any more input would be appreciated, or point it out if it was already stated and I missed it/didn't understand it. Thanks a lot, you've already cleared up a bunch for me!
You're still misunderstanding the nature of arrays/vectors. Each element in an array/vector must have the exact same size. If this wasn't the case, the structure wouldn't be an array, but rather something like a linked list.
This works for any value of T:
std::vector<T> v;
fill_vector(v); // any code that puts at least one element into v
assert(v.size() > 0);
// Treat the vector's storage as raw bytes; every element sits exactly
// sizeof(T) bytes past the previous one.
char *b = (char *)&v[0];
for (size_t i = 0; i < v.size(); i++){
    T &objectA = *(T *)(b + sizeof(T) * i); // manual pointer arithmetic
    T &objectB = v[i];                      // normal indexing
    assert(objectA == objectB);
    assert(&objectA == &objectB);
}
I was originally asking because the memory usage in a program I wrote seems to be significantly larger than the size of the binary data I am reading in, by a factor of 3-5. The binary data is stored as floats, and I am saving them in std::vector<float>s, so I wouldn't expect that much bloat.
Depending on how you populate it, a vector may use much more memory than its elements actually require.
Post your code. You're probably using the vectors suboptimally.
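For instance (a generic sketch, not your code): growing a vector one push_back at a time can leave capacity() well above size(), while reserving up front avoids the overshoot:

#include <iostream>
#include <vector>

int main()
{
    std::vector<float> v;
    for (int i = 0; i < 1000000; i++)
        v.push_back(0.0f);
    // Growth-by-doubling can leave up to roughly 2x slack:
    std::cout << v.size() << " elements, capacity " << v.capacity() << '\n';

    std::vector<float> w;
    w.reserve(1000000); // one allocation, no overshoot
    for (int i = 0; i < 1000000; i++)
        w.push_back(0.0f);
    std::cout << w.size() << " elements, capacity " << w.capacity() << '\n';
}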
So I am trying to read in some binary logfiles, store each one in a class VLog, and then create a std::vector of these logfiles. I have multiple classes, so I'll just post the member variables or else it will get to be a bit much...
class VLog
{
//...
private:
//from binary file
std::string my_Signature;
std::string my_Version;
int my_Headsize;
int my_Sampling;
int my_NumAxis;
std::vector<int> my_Axis;
std::vector<int> my_SamplesPerAxis;
int my_AxisScale;
int my_NumSubbeams;
int my_IsTruncated;
int my_NumLeafPairs;
std::string my_LogType;
std::vector<subbeam> my_Subbeams;
int my_NumSnapshots;
std::vector<snapshot> my_SnapShots;
};
class subbeam
{
//...
private:
std::string my_Name; //name of subbeam
int my_CP; //control point??
float my_MU; //monitor units
float my_Radtime; //radiation time
int my_SeqNum; //sequence number
};
class snapshot
{
//...
private:
data my_Collimator; //collimator rotation
data my_Gantry; //gantry rotation
data my_JawX1; //X1 jaw position
data my_JawX2; //X2 jaw position
data my_JawY1; //Y1 jaw position
data my_JawY2; //Y2 jaw position
data my_CouchVrt; //couch vertical position
data my_CouchLat; //couch lateral position
data my_CouchLng; //couch longitudinal position
data my_CouchRtn; //couch rotation angle
data my_MU; //monitor units
data my_Beamhold; //beamhold
data my_Controlpoint; //control point
data my_CarriageA; //carriageA position
data my_CarriageB; //carriageB position
std::vector<leaf> my_LeavesA; //leaf positions A
std::vector<leaf> my_LeavesB; //leaf positions B
};
class leaf
{
//...
private:
data my_Pos; //leaf position
};
class data
{
//...
private:
std::string my_ID; //data value ID
float my_Actual; //actual value
float my_Expected; //expected value
tolerence my_Tolerence; //tolerences
};
and the tolerence class isn't very important (it contains 2 doubles). So I have multiple classes going on, which might make it hard for you guys to follow. But what it boils down to is: I create a std::vector<VLog> logs, and I am wondering what happens when one element of 'logs' contains much less data than another element. Does a std::vector increase the amount of memory allocated so that each element of 'logs' uses the same amount of memory? This would be 1) in my above post.
For instance, say I had 3 logfiles of 500 kB, 2 MB, and 4 MB. In that case, will the 500 kB logfile end up taking the same amount of memory as the 4 MB logfile when stored in a std::vector? But if I had created 3 separate VLog variables rather than placing them in a std::vector, would each VLog variable be much closer to the size of its respective logfile? This would explain my bloat if that's how it works. Thanks again
I am wondering what happens when one element of 'logs' contains much less data than another element. Does a std::vector increase the amount of memory allocated so that each element of 'logs' uses the same amount of memory?
As I and other people in this thread have said, an object is of fixed size. All snapshots are always the exact same size, no matter how many leaves their vectors contain. A vector stores its data in a memory region separate from the object that contains the vector, so how much data it holds has nothing to do with the size of the object that contains it.
Do you ever remove elements from your vectors, particularly with pop_back()? Also, are you compiling your program with optimizations? Certain compilers generate space-inefficient vectors when making debug builds.
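To make the first point concrete, here is a minimal sketch with a hypothetical stand-in for the snapshot class:

#include <iostream>
#include <vector>

// Stand-in for snapshot: just a vector member (hypothetical example).
struct holder { std::vector<int> leaves; };

int main()
{
    holder a, b;
    b.leaves.resize(100000);
    // Both print the same value: the 100000 ints live in a separate
    // heap block, not inside b itself.
    std::cout << sizeof(a) << ' ' << sizeof(b) << '\n';
}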
Ok, I see how your 'foo' should always be the same size, because it contains only a pointer and an int, which are both of fixed size. What about if I have this example...
#include <vector>

// (foo redefined here with a resizable member, as the question implies)
struct foo { std::vector<int> bar; };

int main()
{
    std::vector<foo> Vfoo;
    Vfoo.resize(2);
    Vfoo[0].bar.resize(20);
    Vfoo[1].bar.resize(1);
}
I would think Vfoo[0] would require more memory than Vfoo[1]?
Vfoo[0]'s vector certainly has a bigger capacity, but the vector implementation decides how it wants to expand capacity during a resize. Read up on the reserve member function. Two different vectors could have different sizes but the same capacity.
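A small sketch of the size/capacity distinction (exact capacities are implementation-defined):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> a, b;
    a.reserve(32);
    b.reserve(32);
    a.resize(20);
    b.resize(1);
    // Different sizes, (typically) the same capacity: how much is actually
    // allocated is up to the implementation, not the element count.
    std::cout << a.size() << '/' << a.capacity() << ' '
              << b.size() << '/' << b.capacity() << '\n';
}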
Ok, so maybe I am using bad vernacular? When I say size, I am talking about the physical amount of memory required to store the data contained within an element of a vector, not the number of elements. Maybe that is causing confusion?
So I'll continue to use size to mean the amount of bits required to store the data...
Based upon previous posts, I understand that the size of each vector element is the same. However, this only answers the 'what', and I am after more of the why/how this is done. I can think of 2 ways to implement a vector that would result in the size of each element being the same. Assume I have 2 log files I want to read, log1 being 1 kB in size and log2 being 1 GB in size.
1) The vector is implemented as a list/array of pointers to data. In this case, the size of any element would be the same regardless of the size of the data each pointer points to, since the size of a pointer is fixed.
2) The vector stores the actual data within itself. In this case, there is absolutely no reason why a 1 kB file must use the same amount of memory as the 1 GB file. However, if I want to store it in a container, there may (or may not, I'm not really sure) be very good reasons to force this to be true. Since you can't allocate 1 kB and fit the 1 GB logfile within that space, the only alternative is to allocate 1 GB per element and store the 1 kB logfile in there.
Both implementations result in each element of the vector being the same size, but the consequences in terms of memory management are extremely different, as the second option can easily lead to a ton of wasted memory. I am sure there are other ways to implement a vector which lead to different behaviour as well, but these are the two that pop into mind.
Once I read the data in I will work on it but not modify it, so maybe there is a better way to store it. I think I am using O2 optimization, but I'll have to check that tomorrow. It's always difficult to notice your own ignorance; if I am completely missing the point, is there anything else you guys could point me to for some reading? I'll check out the reserve member function after dinner. Thanks again for the help
Ah, yes you were using a poor choice of words. To answer your question:
LB wrote:
Because of data hiding and implementation-defined behavior, it is never possible to know how much memory is being consumed by a given object. (Unless all objects involved are POD types, of course.)
1) The vector is implemented as a list/array of pointers to data. In this case, the size of any element would be the same regardless of the size of the data each pointer points to, since the size of a pointer is fixed.
2) The vector stores the actual data within itself. In this case, there is absolutely no reason why a 1 kB file must use the same amount of memory as the 1 GB file. However, if I want to store it in a container, there may (or may not, I'm not really sure) be very good reasons to force this to be true. Since you can't allocate 1 kB and fit the 1 GB logfile within that space, the only alternative is to allocate 1 GB per element and store the 1 kB logfile in there.
Neither is generally true, but #1 is possibly true for particular element types.
Here's a basic outline of how vectors are implemented:
#include <algorithm>
#include <cassert>

template <typename T>
class vector{
    T *_front;
    unsigned _size, _capacity;
public:
    vector(): _front(nullptr), _size(0), _capacity(0){}
    void push_back(const T &new_object){
        if (!this->_front){
            //Note: In reality, the array will be allocated in a way that
            //doesn't call constructors on each of the elements.
            this->_front = new T[8];
            this->_capacity = 8;
        }
        if (this->_size == this->_capacity){
            //Grow geometrically: double the capacity and move everything over.
            T *temp = new T[this->_capacity *= 2];
            std::copy(this->_front, this->_front + this->_size, temp);
            delete[] this->_front;
            this->_front = temp;
        }
        this->_front[this->_size++] = new_object;
    }
    void pop_back(){
        assert(this->_size > 0);
        //In reality, the destructor would be called on this->_front[this->_size - 1]
        this->_size--;
    }
    T &operator[](int i){
        return this->_front[i];
    }
};
Note that:
1. sizeof(vector<T>), on many platforms, will equal 12 for all values of T. How many elements the vector has is irrelevant.
2. sizeof(T) is what matters when the compiler generates the code for handling arrays and pointer arithmetic.
3. vector<vector<T>> has elements of constant size, even if each vector element contains a different number of elements itself. This situation corresponds to your hypothesis #1. vector<vector<T>> is essentially a pointer to an array of pointers, but only because the element type of the outer vector is another vector. Other values of T may or may not generate a vector type that's an array of pointers. For example, vector<int> is simply a pointer to an array of ints.
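To illustrate point 3, a minimal check (the printed numbers are implementation-specific):

#include <iostream>
#include <vector>

int main()
{
    std::vector<std::vector<int>> vv(2);
    vv[0].resize(1000);
    vv[1].resize(1);
    // Both elements occupy sizeof(std::vector<int>) bytes inside the
    // outer vector's array; the 1000 ints live in a separate allocation.
    std::cout << sizeof(vv[0]) << ' ' << sizeof(vv[1]) << '\n';
}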