C++11: Vector Improved, How It Works?

C++11 comes with several interesting things, and in this post, we will talk a little about a new method in std::vector that comes with the new standard. This method can lead to improve the performance of your programs. This post is divided in two: The first part explains why it is good to have this new method (emplace_back) and how to use it, and the second part will try to make the way it works clear. To achieve this goal, we have to go through several new things (variadic template, constexpr, argument forwarding) that we explain a little to understand the whole thing.

In the previous standard, sometimes you had to create an array of let's say Point3D (or whatever) in a loop. Let's suppose we know the number of elements we have to put in the vector, we first show the slowest version I think about, and then we show a more optimized version thanks to C++11 standard.

1 A Case Study

struct Point3D
{
  Point3D(int xx, int yy, int zz)
    : x {xx}
    , y {yy}
    , z {zz}
  {
    std::cout << "Cstor" << std::endl;
  }

  Point3D(const Point3D& p)
    : x {p.x}
    , y {p.y}
    , z {p.z}
  {
    std::cout << "Copy Cstor" << std::endl;
  }

  int x;
  int y;
  int z;
};

int main()
{
  std::vector<Point3D> v;

  for (unsigned int i = 0; i < 10; ++i)
    {
      v.push_back(Point3D {i, i - 69, i + 42});
    }

  for (unsigned int i = 0; i < 10; ++i)
    {
      std::cout << v[i].x << std::endl;
    }
}

What we want in this program, is to have only 10 calls to the constructor because we just want ten objects, so our requirement seems legit right? But if we run this program, we can see 10 calls to the first constructor and 25 to the copy constructor. This is huge! There are two reasons for these copies:

Vectors are dynamic arrays. And each time it reaches its limit, it doubles its size. So all the elements are copied at each reallocation. It starts with a size 1, then has to double its size, and the same for 2, 4 and 8. If we add these numbers, we have 15 copies. These can be deleted by using the "reserve" method since we known the number of elements. By this call we avoid the reallocation and the copies.
The problem of who is responsible of the destruction of which object is a complex one. The STL handles this by copying the object it takes in their containers and to destroy them when the container is destroyed. This is why we still have 20 copies and not 10. But with C++11 comes an emplace_back method (emplace exists too, but we focus on the push_back dual here). This method removes the copy done by the push_back method. For a user, the only changes to make is to replace the call to push_back by a call to emplace_back. The arguments to give to the new method are the arguments to give to the constructor:

v.emplace_back(i, i - 69, i + 42);

2 How It Works

Now, let's see what are the changes done for being able to make this emplace_back method. For this post I have used the glibcxx version 4.7.1 to discover the changes.

Before going into the code, we have to present variadic templates and its power, the forwarding parameters concept, and how to use it. Then, we show the difference between push_back and emplace_back.

2.1 Variadic Template and Forwarding

Forwarding parameters is a way to take all the arguments received by a function, and to resend it as it comes. This comes with variadic templates. Variadic templates allow to pass an undefined number of arguments to a function/method. It allows to have a program which looks functional (a Head and the Rest). The code that follows combines two new features of the C++11, constexpr and variadic parameters. A little word about constexpr: it allows to make computation at compile time if they are possible (there are several conditions to respect for that, but for now, we just assume that they are respected). In this example, we find the min of a list of argument at compile time (thanks to constexpr). This is a generic version that works only with integers but works on 2, 3, 4, … n arguments.

constexpr
int min(int n, int p)
{
  return n < p ? n : p;
}

template<typename... Args>
constexpr
int min(int n, Args... args)
{
    return min(n, min(args...));
}

int main(int argc, char *argv[])
{
  static_assert(min(4, 5, 6, 42, 7, 3, 6) == 3, "min is incorrect");
}

The recursion mechanism is classical, but what I would have implemented with a vector in the past (in a run-time version) or template recursion (in a compile-time version) is fully expressible with the concept of variadic template, and it is enjoyable, because it is kind of beautiful. It allows to be computed at compile-time if all the arguments are deductible at this time, or simply computed at run-time. It is also invisible for the user.

It is important to note that the arguments are passed by copy. If we wanted to pass them by reference, the game would be harder… And to be honest, at this time, I don't know how I should do. It seems more complex to handle the recursion because it implies to implement different combination of the arguments ( int&, int&&; int&&, int&; …). There was a proposal to make it as a part of the standard library (proposition n772), or with initializer_list. The proposal shows it is easier to implement and faster with the second method. We keep this min implementation as a simple example to understand variadic templates. Any proposition of a fully generic working version (using constexpr and variadic template) of this is encouraged! A solution might be to use std::forward since it seems to be made for handling the forwarding, but my first tries were not successful.

Now you have seen why I think the variadic template are good to play with, let's talk about forwarding arguments. Forwarding is made by the std::forward function from the `utility' header (code can be found in `bits/move.h'). It forwards the arguments exactly as they are received (more information in the man). Here is a little example of what makes std::forward in the code:

#include <iostream>
#include <utility>

struct Item
{
  Item(): value{0} {}

  Item(const Item& p){ std::cout << "Copy" << std::endl; }

  int value;
};

void take_args(int& a, int&& b, int c, Item& f)
{
  std::cout << a << " - " << b << " - " << c
            << " - " <<  f.value << std::endl;
}

template <typename... Args>
void call_take_args(Args&&... args)
{
  take_args(std::forward<Args>(args)...);
}

int main()
{
  Item f;
  int i = 2;
  call_take_args(i, 4, 5, f);
}
// The program outputs "2 - 4 - 5 - 0".

If we remove the reference in the last take_args argument, "Copy" is also printed. We can see that the program won't compile if we remove a `&' for the second argument, or if we add an extra one to the first. This is because std::forward keeps the r/l-valueness of the argument received. How are they able to know this? Let's take a look at the source of this function (version 4.7.1 of glibcxx):

template<typename _Tp>
constexpr _Tp&&
forward(typename std::remove_reference<_Tp>::type& __t) noexcept
{ return static_cast<_Tp&&>(__t); }

template<typename _Tp>
constexpr _Tp&&
forward(typename std::remove_reference<_Tp>::type&& __t) noexcept
{
  static_assert(!std::is_lvalue_reference<_Tp>::value, "template argument"
                " substituting _Tp is an lvalue reference type");
  return static_cast<_Tp&&>(__t);
}

The first version of the function takes a lvalue and the second a rvalue. The static_assert just checks if the specialization works as expected. This specialization works because std::remove_reference takes a type `T' (possibly U, U&, U&&), and the `type' is a typedef of T. So we are able to manage a use case where we give as parameter T, T& or T&& the same way. As a result we have two functions with the following signature: forward(T& __t) and forward(T&& __t). If the argument is a lvalue it goes in the first one, otherwise to the second. noexcept just helps the compiler to know that the function/method will not throw an exception or that the program should be stopped if an exception tries to escape from here.

Now let's take a look at what they return. The fact that both functions seem to return the same thing might look weird. We have the feeling that they return the same type: `T&&'. But in fact, `T' could be a `U&' or a `U&&', and C++11 comes with a rule named reference collapsing rules (see this article which talk about it). This is simple:

U& & => U&
U&& & => U&
U& && => U&
U&& && => U&&

This looks like the `and' truth table where & is 0 and && is 1. A good way to remember I think. So: if _Tp is a U& (the first function), the returned object will be U& too, and if it was U&& it will be U&&. Which follows the rules of a perfect forwarding. Now we have understood the concept of the forwarding, and the way it is done, we can take a look at the code of push_back and emplace_back to know what makes the change.

2.2 Emplace_back and Push_back

Now we have all the tools in our hands to understand the difference between these two methods, we can just show the source code:

// Taken from bits/stl_vector.h
void
push_back(const value_type& __x)
{
   if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
     {
        _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
                                 __x);
        ++this->_M_impl._M_finish;
     }
    else
#ifdef __GXX_EXPERIMENTAL_CXX0X__
      _M_emplace_back_aux(__x);
#else
      _M_insert_aux(end(), __x);
#endif
}

// Taken from bits/vector.tcc
template<typename... _Args>
void
vector<_Tp, _Alloc>::
emplace_back(_Args&&... __args)
{
   if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
     {
        _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
                                 std::forward<_Args>(__args)...);
        ++this->_M_impl._M_finish;
     }
   else
       _M_emplace_back_aux(std::forward<_Args>(__args)...);
}

The main difference is the use of the std::forward. The macro __GXX_EXPERIMENTAL_CXX0X__ is defined when an option to activate C++11 is set. The _M_emplace_back_aux and _M_insert_aux are responsible of the reallocation when the number of available position (available means allocated and not already taken) is down to 0.

Since there is no other difference between the two methods, that means the interesting point is in the construct method (we just leave the *aux method for this post, it will take too much time to analyze them).

Before that, we have to present briefly the concept of putting `::' before a function/method name and the concept of using a placement new. If your are familiar with these concepts, you should skip these paragraphs and go directly after the code of the construct method.

It is possible for a class to override its operator new. This (static) method is responsible of the allocation of the object. If we put `::' before the operator new that means that we want to take the one in the global namespace and not the overridden one. The code below shows the difference between these two notation:

struct Foo
{
  Foo(int a)
    : value{a}
  {
  }

  static void* operator new(size_t)
  {
    return (void*)42; // g++ is able to detect if
                      // we return 0 and raises a warning.
  }

  int value;
};

int main(int argc, char *argv[])
{
  Foo* bad = new Foo(4); // segmentation fault.
  Foo* good = ::new Foo(4); // ok.
  return 0;
}

The operator new returns a pointer to an unallocated area, which leads to a segmentation fault when the constructor is called and tries to write the value of `a'. If we put `::' in front of new, this buggy operator new is not called, and there is no memory corruption.

Now, let's talk about placement new. This is just a way to use specific position in the memory, and this can be useful when working with memory pools, or when performance is needed: allocation and deallocation are expensive. Avoiding this can speed up the program. The syntax is just: new(position), where position is a pointer to where the object must be. This is as simple as that.

Now let's close the parenthesis and look at the code of construct (this is not the construct which is directly called, but the other functions that calls this one are not interesting since the same path is followed by emplace_back and push_back):

// Taken from ext/new_allocator.h
template<typename _Up, typename... _Args>
void
construct(_Up* __p, _Args&&... __args)
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }

This version is the C++11 version, which manages the case where we copy, and the case where we construct. Let's start by understanding this line, they use the placement new because this is not up to the `new' to decide where put this object, since it must be put at the end of the vector.

There is a call to the constructor and the arguments received are forwarded. These arguments are the ones given to the emplace_back method. And then it is up to the overloading to make it works. If we are making a copy (we arrived here by push_back), the copy constructor is called. Otherwise, we call another constructor if it exists (if not, we get a compile-time error).

And that's all! To arrive here, we have to know variadic template, argument forwarding, constexpr, placement new, operator new overriding and reference collapsing rules. But at least, we understand what are the changes needed to be able to make this emplace_back method, and how the C++11 makes it possible. Thanks to the new standard for this improvement! :-)

PS: Thanks to Christopher Chedeau and Gregoire Bouchetoun for their comments and corrections about this article.

Enki :: Technical Blog

Sunday, August 5, 2012