Can the total element size of a dynamic array be decreased at run-time?

Is it possible? From the many webpages I have read online, the general notion seems to be that programmers will only ever need to keep increasing the size of an array at run-time, never decreasing it. Is that right?

And if you need an array which is smaller than the original,
then you have to delete[] the array, set array = NULL, and recreate the array at the smaller size manually?

And if you need an array which is smaller than the original,
then you have to delete[] the array, set array = NULL, and recreate the array at the smaller size manually?


If you're manually managing memory, the procedure for shrinking the size of the array is much the same as for growing it.

For instance, the following works whether you're shrinking or growing:
Allocate the new array. Copy the elements of the old array into the new array (being careful not to overrun the bounds of the new array if it's smaller than the old). Delete the old array.
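A minimal sketch of that copy-and-delete procedure for a plain int array (the helper name resize_array is just for illustration; std::vector does all of this for you):

#include <algorithm>
#include <cstddef>
#include <iostream>

// Reallocate a manually managed array to new_size elements, keeping as many
// of the old elements as fit. The same code handles growing and shrinking.
int* resize_array( int* old_array, std::size_t old_size, std::size_t new_size )
{
    int* new_array = new int[new_size]{} ;                    // allocate the new array
    std::copy_n( old_array, std::min(old_size, new_size),    // copy only what fits
                 new_array ) ;
    delete[] old_array ;                                      // delete the old array
    return new_array ;
}

int main()
{
    std::size_t size = 8 ;
    int* data = new int[size] ;
    for( std::size_t i = 0 ; i < size ; ++i ) data[i] = int(i) ;

    data = resize_array( data, size, 4 ) ;  // shrink: keeps the first four elements
    size = 4 ;

    for( std::size_t i = 0 ; i < size ; ++i ) std::cout << data[i] << ' ' ;
    std::cout << '\n' ;

    delete[] data ;
}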
closed account (48T7M4Gy)
Small point. NULL is out, nullptr is in.
It is probably best to track the array's current size and never decrease it, because you may need to increase it back up again. Resizing memory is very expensive time-wise, so you want to do it as infrequently as possible, especially if you're using the copy-and-destroy method.

I like to do these types of operations in large blocks -- a megabyte or more. That is, allocate 10 MB up front; if it gets full, allocate 20 MB, and so on. I don't think I have ever decreased. But the need to do this kind of thing is greatly reduced by vectors; I haven't needed to roll my own memory manager class in many years.
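A rough sketch of that approach, with size and capacity tracked separately and the capacity only ever growing, one fixed-size block at a time (the block size and the int_buffer name are arbitrary; a real implementation would also handle copying of the buffer object, exceptions, etc.):

#include <cstddef>
#include <cstring>

struct int_buffer
{
    static const std::size_t block_elements = 1024 * 1024 ;  // grow one big block at a time

    int* data = nullptr ;
    std::size_t size = 0 ;      // elements in use
    std::size_t capacity = 0 ;  // elements allocated

    void push_back( int value )
    {
        if( size == capacity )  // out of room: allocate another whole block
        {
            const std::size_t new_capacity = capacity + block_elements ;
            int* new_data = new int[new_capacity] ;
            if( data ) std::memcpy( new_data, data, size * sizeof(int) ) ;
            delete[] data ;
            data = new_data ;
            capacity = new_capacity ;  // never decreased
        }
        data[size++] = value ;
    }

    ~int_buffer() { delete[] data ; }
};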


Traditional advice is to multiply the size of the block by some constant factor. In std::vector, for instance, Dinkumware's implementation scales capacity by a factor of 3/2, while libstdc++ scales the capacity by a factor of 2. Because this causes exponential growth, it provides the amortized constant time guarantee on insertion at the end, and reduces the number of reallocations substantially. (Unless you allocate huge blocks, of course.)
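Roughly, the amortised-constant-time argument: with a growth factor \(r > 1\), pushing back \(n\) elements one at a time writes each element once and re-copies at most

\[ \frac{n}{r} + \frac{n}{r^{2}} + \frac{n}{r^{3}} + \cdots \;=\; \frac{n}{r-1} \]

elements during reallocations, so the total work is at most \(n + n/(r-1) = n \cdot r/(r-1)\), i.e. \(O(n)\) overall and amortised \(O(1)\) per insertion. With \(r = 2\) that bound is \(2n\) element writes; with \(r = 3/2\) it is \(3n\).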
I haven't been able to find any kind of rationale for what libstdc++ (GCC 6.3) does;
it alternates between the two factors 1.001 (!) and 1.999.

Is it some kind of crazy attempt at getting close to the golden ratio (about 1.6, which probably gives a good statistical trade-off between performance and memory footprint)?

Or is it because it knows about the innards of its default allocator and exploits the knowledge that with a factor of 1.001 the first time around, behind the scenes it can do the equivalent of a std::realloc without really relocating the buffer?

EDIT: No, the elements are moved even when the factor is 1.001
http://coliru.stacked-crooked.com/a/c227f12f7788739e

#include <iostream>
#include <vector>
#include <iomanip>

int main()
{
    std::vector<int> vec(1'000) ;

    while( vec.capacity() < 1'000'000 )
    {
        const auto old_cap = vec.capacity() ;
        vec.resize( old_cap + 1 ) ; // grow to one past the current capacity to force a reallocation
        std::cout << "capacity: " << std::setw(7) << vec.capacity() << "  increased by a factor of : " 
                                  << std::fixed << std::setprecision(4) << vec.capacity() / double(old_cap)  << '\n' ;
    }
} 


clang++ / libc++
capacity:    2000  increased by a factor of : 2.0000
capacity:    4000  increased by a factor of : 2.0000
capacity:    8000  increased by a factor of : 2.0000
capacity:   16000  increased by a factor of : 2.0000
capacity:   32000  increased by a factor of : 2.0000
capacity:   64000  increased by a factor of : 2.0000
capacity:  128000  increased by a factor of : 2.0000
capacity:  256000  increased by a factor of : 2.0000
capacity:  512000  increased by a factor of : 2.0000
capacity: 1024000  increased by a factor of : 2.0000



g++ / libstdc++
capacity:    2000  increased by a factor of : 2.0000
capacity:    2002  increased by a factor of : 1.0010
capacity:    4002  increased by a factor of : 1.9990
capacity:    4006  increased by a factor of : 1.0010
capacity:    8006  increased by a factor of : 1.9985
capacity:    8014  increased by a factor of : 1.0010
capacity:   16014  increased by a factor of : 1.9983
capacity:   16030  increased by a factor of : 1.0010
capacity:   32030  increased by a factor of : 1.9981
capacity:   32062  increased by a factor of : 1.0010
capacity:   64062  increased by a factor of : 1.9981
capacity:   64126  increased by a factor of : 1.0010
capacity:  128126  increased by a factor of : 1.9980
capacity:  128254  increased by a factor of : 1.0010
capacity:  256254  increased by a factor of : 1.9980
capacity:  256510  increased by a factor of : 1.0010
capacity:  512510  increased by a factor of : 1.9980
capacity:  513022  increased by a factor of : 1.0010
capacity: 1025022  increased by a factor of : 1.9980

http://coliru.stacked-crooked.com/a/e00f45707c32757b

Microsoft:
capacity:    1500  increased by a factor of : 1.5000
capacity:    2250  increased by a factor of : 1.5000
capacity:    3375  increased by a factor of : 1.5000
capacity:    5062  increased by a factor of : 1.4999
capacity:    7593  increased by a factor of : 1.5000
capacity:   11389  increased by a factor of : 1.4999
capacity:   17083  increased by a factor of : 1.5000
capacity:   25624  increased by a factor of : 1.5000
capacity:   38436  increased by a factor of : 1.5000
capacity:   57654  increased by a factor of : 1.5000
capacity:   86481  increased by a factor of : 1.5000
capacity:  129721  increased by a factor of : 1.5000
capacity:  194581  increased by a factor of : 1.5000
capacity:  291871  increased by a factor of : 1.5000
capacity:  437806  increased by a factor of : 1.5000
capacity:  656709  increased by a factor of : 1.5000
capacity:  985063  increased by a factor of : 1.5000
capacity: 1477594  increased by a factor of : 1.5000

http://rextester.com/YPX40463
Woah, thanks for the replies.
I've read through them all and you have said some interesting, in-depth stuff.
It really was just a general programming question; I am a naive uni student.
JLBorges wrote:
I haven't been able to find any kind of rationale for what libstdc++ (GCC 6.3) does;

libstdc++ uses the size instead of the capacity to decide the new capacity.

The new capacity is calculated in _M_check_len (stl_vector.h, line 1425).
https://gcc.gnu.org/onlinedocs/gcc-6.3.0/libstdc++/api/a01641_source.html#l01419

 
const size_type __len = size() + std::max(size(), __n);

__len will be used as the new capacity.
size() is the old size.
__n is the number of new elements to be inserted.
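That formula is enough to reproduce the alternating factors seen above. A quick stand-alone simulation of it (not the library's actual code), applied to the resize( old_cap + 1 ) loop from the earlier test:

#include <iostream>
#include <algorithm>
#include <cstddef>

int main()
{
    std::size_t size = 1000, capacity = 1000 ;  // as with std::vector<int> vec(1'000)

    while( capacity < 20'000 )
    {
        const std::size_t new_size = capacity + 1 ;     // vec.resize( old_cap + 1 )
        const std::size_t n = new_size - size ;          // number of elements to be appended
        capacity = size + std::max( size, n ) ;          // _M_check_len: size() + max(size(), __n)
        std::cout << "size " << new_size << " -> capacity " << capacity << '\n' ;
        size = new_size ;
    }
}

This prints capacities 2000, 2002, 4002, 4006, 8006, 8014, ... matching the g++ output above: the small factor appears whenever the old size is just over half the old capacity.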
> libstdc++ uses the size instead of the capacity to decide the new capacity.

I still can't see the logic behind what it does:

This is what it has to do: a buffer of initial size 1000, which has already grown to a capacity of 2000, is to be resized to allow a size of 2001.

This is what it does: it resizes the buffer, ending up with a new buffer of capacity 2002 and size 2001
(just two more insertions would result in yet another relocation of the buffer).

If it is going to use the size to decide the new capacity, why does it consider only the old size and ignore the new size? Isn't it easy to recognise that resizing the buffer from 2000 to 2002, 4002 to 4006 etc. is not a good idea?
closed account (48T7M4Gy)
https://groups.google.com/forum/#!msg/comp.lang.c++.moderated/asH_VojWKJw/Lo51JEmLVboJ
The ( 1 + sqrt(5) ) / 2 that Koenig talks about is the golden ratio that was mentioned earlier.
https://en.wikipedia.org/wiki/Golden_ratio

Statistically, it may be a good idea for the scaling factor to be just below the golden ratio (e.g. 1.5).
However, that does not explain libstdc++ scaling by a factor of just 1.001, particularly when it otherwise would scale by a factor well above the golden ratio.

My point was: the idea of an amortised scaling factor which is just below the golden ratio does not appear to make any sense.
closed account (48T7M4Gy)
That's right and why I posted it.

The interesting thing about Koenig's comment is that he says 'that's what they've found', or something like that, which raises all sorts of questions: how have they found it? Is there a mathematical proof that a near-golden ratio is the ideal? It doesn't seem to tally with your test. Somebody has probably written a paper on it. Maybe ask someone at Microsoft?

Without knowing the answers to these we are left like the Ancient Greeks, believing in phi, or being modern day phi-deniers. I don't have enough information to decide.
The theory behind it has been well-known (and has been routinely taught in computer science courses) for quite a long time. Here is a simplified explanation:
https://crntaylor.wordpress.com/2011/07/15/optimal-memory-reallocation-and-the-golden-ratio/
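A condensed version of the argument in that link, assuming the freed blocks end up adjacent in memory: with growth factor \(r\), the successive buffer sizes are \(c, cr, cr^{2}, \dots\). The new block of size \(c r^{\,n+1}\) can be placed in the space freed by all of the previous buffers only if

\[ c\,r^{\,n+1} \;\le\; c + c r + \dots + c r^{\,n-1} \;=\; c\,\frac{r^{\,n}-1}{r-1}, \]

which for large \(n\) reduces to \(r(r-1) \le 1\), i.e. \(r \le (1+\sqrt{5})/2 \approx 1.618\). So with a factor like 1.5 the allocator can, in principle, eventually reuse previously freed space, whereas with a factor of 2 the new block is always bigger than everything freed so far.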

However it may have limited relevance for allocators (like the one in libstdc++) that use different arenas for different ranges of block sizes.
closed account (48T7M4Gy)
Well, there we go.

Microsoft have done the 'right' thing.

Maybe g++ has taken a 'statistical' punt, because the average, (1.0010 + 1.9980)/2, is ~1.5.
I don't think libstdc++'s choice is as bad as you make it look.

Normally one would not use the capacity to decide how many elements to add to a vector. The number of elements that you add each time varies considerably (1, 1000, 2, 2000, 4, 4000, 8, etc.), but if you were to add the same numbers of elements that you do with clang++ (1, 1000, 2000, 4000, etc.) you would get nearly the same number of reallocations.

The best strategy of course depends on how the vector is used. Resizing the vector exponentially as above is probably rare. If you just want to set a fixed size and then use the vector for an extended period of time, it might be wasteful to allocate any more than necessary. When elements are repeatedly added, I think the number of elements added each time is more likely to be within some limited range, in which case the growth factor will tend towards 2 as the vector grows.

The choice to grow based on the size instead of the capacity seems to be a way of minimizing memory usage while still fulfilling the complexity requirements.
If you're using a really big vector and you know ahead of time the exact, or even approximate, number of items you'll need, then it may be wise to use the reserve() method to set the capacity. In my experience, you often do know the size ahead of time, but maybe that's just in the problem domains that I face.

If you remove many elements, then you can use the shrink_to_fit() method to reduce the capacity to the actual size.
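For example (shrink_to_fit() is only a non-binding request, so the final capacity depends on the implementation):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v ;
    v.reserve( 100'000 ) ;                 // approximate size known up front: one allocation
    for( int i = 0 ; i < 100'000 ; ++i ) v.push_back(i) ;

    v.erase( v.begin() + 100, v.end() ) ;  // remove most of the elements...
    std::cout << "size " << v.size() << "  capacity " << v.capacity() << '\n' ;

    v.shrink_to_fit() ;                    // ...then ask for the capacity to be reduced
    std::cout << "size " << v.size() << "  capacity " << v.capacity() << '\n' ;
}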

Ideally, vector<> would shrink its capacity just like it expands capacity. For example, if it expands capacity by a factor of 2 when growing, then wouldn't it make sense to shrink capacity when size() < 2*capacity()? I wonder if it does this.
closed account (48T7M4Gy)
shrink capacity when size() < 2*capacity()
Maybe shrink capacity when size() < 0.5*capacity()? Or why not shrink capacity when size() < (1/1.5)*capacity(), given the (new-found) technology of the golden ratio and Microsoft et al.'s expertise in these matters?
> I don't think libstdc++'s choice is as bad as you make it look.

No, it is not. That test was a very deliberately chosen edge-case.

The typical use-case - fairly representative of the most common way in which vectors are used, for example populating a vector from data read from an input stream or socket - is to repeatedly push back values into the vector. In this case, size() == capacity() when relocation occurs.

What is remarkable about the libstdc++ implementation is that it does not compare the newly estimated reserved size with the current reserved size (which was not considered in that estimation) to see if it is a reasonable proposition.

Doubling the size / reserved size, as libstdc++ (and compatible libraries) does, is also not an issue when the default allocator provided by the library is used. However, going above the golden ratio can have a significant impact on memory performance when a custom allocator is used (unless the programmer is willing to baby-sit the memory management mechanism).

#include <iostream>
#include <vector>
#include <malloc.h>

#ifdef USE_CUSTOM_ALLOCATOR
    #include <boost/pool/pool_alloc.hpp>
    template < typename T > using allocator = boost::pool_allocator<T> ;
    static const auto& init = std::cout << "boost::pool_allocator\n------------\n\n" ;
#else
    template < typename T > using allocator = std::allocator<T> ;
    static const auto& init = std::cout << "std::allocator\n-----------------\n\n" ;
#endif // USE_CUSTOM_ALLOCATOR


int main()
{
    std::size_t n = 32'000'000 ;
    std::vector< int, allocator<int> > vector ;
    while( vector.size() < n ) vector.push_back( vector.size() ) ; // simulate typical use-case

    std::cout << "bytes used by vector: " << vector.capacity() * sizeof(int)
              << "\n\nmalloc stats\n............\n" << std::flush ;

    malloc_stats() ;
}

g++ -std=c++14 -O2 -Wall -Wextra -pedantic-errors -pthread main.cpp && ./a.out
g++ -DUSE_CUSTOM_ALLOCATOR -std=c++14 -O2 -Wall -Wextra -pedantic-errors -pthread main.cpp && ./a.out

std::allocator
-----------------

bytes used by vector: 134217728
...
Total (incl. mmap):
system bytes     =  134430720
in use bytes     =  134298656
max mmap regions =          2
max mmap bytes   =  201334784


boost::pool_allocator
------------

bytes used by vector: 134217728

...
Total (incl. mmap):
system bytes     =  536993792
in use bytes     =  536992864
max mmap regions =         11
max mmap bytes   =  536653824


In general, experience has shown that aggressively releasing memory adversely impacts performance, while usually yielding no tangible benefit. (It doesn't make sense to cram all the people in a large hall into a tiny corner just because such a large hall was not required in the first place.) AFAIK, it is best to leave shrink_to_fit() to the discretion of the programmer using the vector.
dhayden wrote:
Ideally, vector<> would shrink its capacity just like it expands capacity.

I don't think shrinking the vector automatically is a good idea. If the size of the vector varies a lot it would just add a lot of unnecessary reallocations. It's also a problem if we don't want deletions to invalidate iterators, pointers and references to vector elements that are stored before the point of deletion.
If the size of the vector varies a lot it would just add a lot of unnecessary reallocations.

The number of reallocations upon deletion would depend on when you decide to shrink it and by how much, just as it does on when you expand it.

It's also a problem if we don't want deletions to invalidate iterators, pointers and references to vector elements that are stored before the point of deletion.
That's the real rub. I have an old working draft of C++11 and it says (section 23.3.6.5) that erase() "invalidates iterators and references at or after the point of the erase." Earlier it says that iterators are still valid unless stated otherwise, so it seems to me that this requires that erase() does not reallocate.

One thing I find odd though is that the section on shrink_to_fit() makes no statement about iterators and references. I'd think that shrink_to_fit() would invalidate both if it decides to reallocate.
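One way to observe both behaviours with a particular implementation (the old address is compared as an integer, since the original pointer must not be used after a reallocation; whether shrink_to_fit() actually reallocates is up to the library):

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v( 1'000, 7 ) ;
    const auto old_addr = reinterpret_cast<std::uintptr_t>( v.data() ) ;

    v.erase( v.begin() + 10, v.end() ) ;   // erase() does not reallocate; capacity is unchanged
    std::cout << std::boolalpha
              << "same buffer after erase: "
              << ( reinterpret_cast<std::uintptr_t>( v.data() ) == old_addr )
              << "  capacity: " << v.capacity() << '\n' ;

    v.shrink_to_fit() ;                    // may reallocate; if it does, iterators, pointers and
                                           // references obtained before the call must not be used
    std::cout << "same buffer after shrink_to_fit: "
              << ( reinterpret_cast<std::uintptr_t>( v.data() ) == old_addr )
              << "  capacity: " << v.capacity() << '\n' ;
}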