I'm a new programmer working toward a major engineering project. Right now I'm trying to put together a small program that performs a simple linear regression on a set of data. The set of data can be of arbitrary length.
Given that the set of data is of arbitrary length, what structure should I use to hold it? A linked list is the only one that I can think of, but it'd be nice to have consecutive memory addresses for efficiency's sake (since my ultimate goal is to build a very CPU-intensive regression program). Let's say that the values are all doubles.
Are there any more efficient data structures of arbitrary length?
Internally, is a vector a linked list with its operators overloaded, so that an operation like somePointer++; follows a link to the next node? Or are its elements actually stored at consecutive memory addresses?
And if they are stored at consecutive memory addresses, what does the computer do when you try to add an element and the next memory address isn't available?
vector is not a linked list (that would be std::list). Also:
Reference wrote:
vector containers have their elements stored in contiguous storage locations
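You can check that contiguity yourself with a couple of asserts - a minimal sketch (variable names are just for illustration):

#include <vector>
#include <cassert>

int main()
{
    std::vector<double> v(3);
    v[0] = 1.0; v[1] = 2.0; v[2] = 3.0;

    // Elements are guaranteed to sit in one contiguous block, so
    // pointer arithmetic works exactly as it does on a raw array.
    assert(&v[1] == &v[0] + 1);
    assert(&v[2] == &v[0] + 2);

    // In C++11 and later, v.data() also returns a pointer to the
    // first element of that block.
}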
If you're trying to access an element that is not in the container, different things can happen:
In Release mode there is (usually) no protection against that, for performance reasons - so you'll access an invalid memory address and possibly cause a segmentation fault, just like when using a raw array.
In Debug mode there are usually checks (for gcc, you additionally have to define the global macro _GLIBCXX_DEBUG) - so you might get an error message such as this:
/usr/include/c++/4.4/debug/vector:265:error: attempt to subscript container
with out-of-bounds index 6, but container only holds 5 elements.
Objects involved in the operation:
sequence "this" @ 0x0xbf846468 {
type = NSt7__debug6vectorIiSaIiEEE;
}
Aborted
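If you want a check that is always on, regardless of build mode, you can use the at() member function instead of operator[]; it throws std::out_of_range on a bad index. A small sketch:

#include <vector>
#include <stdexcept>
#include <iostream>

int main()
{
    std::vector<double> v(5);          // 5 elements, valid indices 0..4

    try
    {
        double x = v.at(6);            // at() checks the index and throws
        std::cout << x << '\n';
    }
    catch (const std::out_of_range& e)
    {
        std::cout << "out of range: " << e.what() << '\n';
    }

    // v[6] would perform no such check: undefined behaviour,
    // possibly a segfault, possibly silent garbage.
}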
I guess I'm just curious how it works internally. If the addresses are all contiguous, then it seems like the computer would have to know in advance how much free space to leave around the vector so that it can grow to whatever size it needs. And if it does preemptively set aside a large region for the vector, then why not just use an oversized fixed array instead?
I won't argue with the reference's claim, but it doesn't make much sense to me.
A vector does indeed preallocate some memory; the capacity member function tells you how many elements it currently has room for.
When the capacity runs out, the vector allocates a new, larger block somewhere else (implementations typically grow by a constant factor, often 1.5x or 2x), moves or copies all existing elements into it, and frees the old block - so it never needs free space sitting next to it.
To avoid these reallocations, you can call the reserve member function in advance if you have a good idea of how many elements you're going to add.
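A quick sketch of what that looks like in practice (the exact capacity values are implementation-dependent):

#include <vector>
#include <iostream>

int main()
{
    std::vector<double> samples;

    for (int i = 0; i < 20; ++i)
    {
        samples.push_back(i * 0.5);
        // size() is the number of elements stored; capacity() is the
        // number of elements the current allocation can hold before
        // the vector has to relocate everything.
        std::cout << "size " << samples.size()
                  << ", capacity " << samples.capacity() << '\n';
    }

    std::vector<double> known;
    known.reserve(1000000);               // one allocation up front...
    for (int i = 0; i < 1000000; ++i)
        known.push_back(i * 0.5);         // ...so no relocations here
}

Since your regression data is all doubles and you mostly push values in and then iterate over them, std::vector<double> is almost certainly the structure you want.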