Vector reallocation strategies
August 11th, 2011I was interested in understanding a bit more of how C++ compilers reallocate vectors once their capacity (using the default allocator) is reached so I decided to build a small experiment. All I wanted to know was the new vector capacity, and it turns out it’s (obviously) based on the previously full vector’s size.
The small experiment is simple:
#include<iostream>
#include<vector>
using namespace std;
int main() {
vector<int> v;
auto prev = v.begin();
for(int i=0; i<1000000; i++) {
v.push_back(i);
if(prev != v.begin()) {
cout << "Reallocation occurred. i = " << i << endl;
prev = v.begin();
}
}
return 0;
}
In g++ 4.6.1 on Arch Linux 64-bit I get:
Reallocation occurred. i = 0 Reallocation occurred. i = 1 Reallocation occurred. i = 2 Reallocation occurred. i = 4 Reallocation occurred. i = 8 Reallocation occurred. i = 16 Reallocation occurred. i = 32 Reallocation occurred. i = 64 Reallocation occurred. i = 128 Reallocation occurred. i = 256 Reallocation occurred. i = 512 Reallocation occurred. i = 1024 Reallocation occurred. i = 2048 Reallocation occurred. i = 4096 Reallocation occurred. i = 8192 Reallocation occurred. i = 16384 Reallocation occurred. i = 32768 Reallocation occurred. i = 65536 Reallocation occurred. i = 131072 Reallocation occurred. i = 262144 Reallocation occurred. i = 524288
We can clearly notice the familiar 2^x pattern that gcc uses. If a vector of size n is full and performs a push_back(), the allocator simply allocates 2n. That’s pretty simple and requires very few allocations, with the potential of wasting a lot of space if the user uses only a small portion of the new allocated space.
On the other hand, VisualStudio 2010 Ultimate Build 10.0.30319.1 RTMRel on Windows 7 (Release mode) does something considerably different:
Reallocation occurred. i = 0 Reallocation occurred. i = 1 Reallocation occurred. i = 2 Reallocation occurred. i = 3 Reallocation occurred. i = 4 Reallocation occurred. i = 6 Reallocation occurred. i = 9 Reallocation occurred. i = 13 Reallocation occurred. i = 19 Reallocation occurred. i = 28 Reallocation occurred. i = 42 Reallocation occurred. i = 63 Reallocation occurred. i = 94 Reallocation occurred. i = 141 Reallocation occurred. i = 211 Reallocation occurred. i = 316 Reallocation occurred. i = 474 Reallocation occurred. i = 711 Reallocation occurred. i = 1066 Reallocation occurred. i = 1599 Reallocation occurred. i = 2398 Reallocation occurred. i = 3597 Reallocation occurred. i = 5395 Reallocation occurred. i = 8092 Reallocation occurred. i = 12138 Reallocation occurred. i = 18207 Reallocation occurred. i = 27310 Reallocation occurred. i = 40965 Reallocation occurred. i = 61447 Reallocation occurred. i = 92170 Reallocation occurred. i = 138255 Reallocation occurred. i = 207382 Reallocation occurred. i = 311073 Reallocation occurred. i = 466609 Reallocation occurred. i = 699913
So it does a lot more allocations, but it can potentially save same space. It also allows some rare allocations in situations where the process does not have space for a 2n allocation but it does have space for a smaller allocation. The problem is that reallocations are expensive, and this rare occasion is probably not worth the considerably greater number of reallocations.
But if you pay attention to the behavior of the numbers above, it’s not so easy to derive a simple formula such as 2^n. After some thought and observation I think the function behaves as (sorry no TeX plugin):
a(n+1) = { a(n)+1 if floor(a/2)=0
{ a(n)+floor(a/2) otherwise
In C++ that would be:
void next(int &a) {
int half = a/2;
if(half)
a = a+half;
else
a++;
}
But really, even though the behavior of these two sequences seem very different, they both are still exponential after the initial values (when half turns out to be 0). The difference is that gcc grows by a factor of n while VC10 grows by a factor of n/2.
Of course we can provide our own custom allocator and guarantee the performance that we expect instead of relying on the defaults provided by compilers (which could change at any time), but understanding the essentials (e.g. both gcc and VC10 do not allocate chunks of constant size or in a linear fashion) of these reallocations helps us build more reliable and predictable systems.