C++11 comes with several interesting things, and in this post, we
will talk a little about a new method in std::vector
that comes with the new
standard. This method can lead to improve the performance of your
programs. This post is divided in two: The first part explains why
it is good to have this new method (emplace_back) and how to use it,
and the second part will try to make the way it works clear. To
achieve this goal, we have to go through several new things (variadic
template, constexpr, argument forwarding) that we explain a little to understand the whole thing.
In the previous standard, sometimes you had to create an array of let's say Point3D (or whatever) in a loop. Let's suppose we know the number of elements we have to put in the vector, we first show the slowest version I think about, and then we show a more optimized version thanks to C++11 standard.
1 A Case Study
struct Point3D { Point3D(int xx, int yy, int zz) : x {xx} , y {yy} , z {zz} { std::cout << "Cstor" << std::endl; } Point3D(const Point3D& p) : x {p.x} , y {p.y} , z {p.z} { std::cout << "Copy Cstor" << std::endl; } int x; int y; int z; }; int main() { std::vector<Point3D> v; for (unsigned int i = 0; i < 10; ++i) { v.push_back(Point3D {i, i - 69, i + 42}); } for (unsigned int i = 0; i < 10; ++i) { std::cout << v[i].x << std::endl; } }
What we want in this program, is to have only 10 calls to the constructor because we just want ten objects, so our requirement seems legit right? But if we run this program, we can see 10 calls to the first constructor and 25 to the copy constructor. This is huge! There are two reasons for these copies:
- Vectors are dynamic arrays. And each time it reaches its limit, it doubles its size. So all the elements are copied at each reallocation. It starts with a size 1, then has to double its size, and the same for 2, 4 and 8. If we add these numbers, we have 15 copies. These can be deleted by using the "reserve" method since we known the number of elements. By this call we avoid the reallocation and the copies.
-
The problem of who is responsible of the destruction of which object
is a complex one. The STL handles this by copying the object it
takes in their containers and to destroy them when the container is
destroyed. This is why we still have 20 copies and not 10. But with
C++11 comes an
emplace_back
method (emplace
exists too, but we focus on thepush_back
dual here). This method removes the copy done by thepush_back
method. For a user, the only changes to make is to replace the call topush_back
by a call toemplace_back
. The arguments to give to the new method are the arguments to give to the constructor:
v.emplace_back(i, i - 69, i + 42);
2 How It Works
Now, let's see what are the changes done for being able to make this
emplace_back
method. For this post I have used the glibcxx version
4.7.1 to discover the changes.
Before going into the code, we have to present variadic templates
and its power, the forwarding parameters concept, and how to use it.
Then, we show the difference between
push_back
and emplace_back
.
2.1 Variadic Template and Forwarding
Forwarding parameters is a way to take all the arguments received by a function, and to resend it as it comes. This comes with variadic templates. Variadic templates allow to pass an undefined number of arguments to a function/method. It allows to have a program which looks functional (a Head and the Rest). The code that follows combines two new features of the C++11, constexpr and variadic parameters. A little word about constexpr: it allows to make computation at compile time if they are possible (there are several conditions to respect for that, but for now, we just assume that they are respected). In this example, we find the min of a list of argument at compile time (thanks to constexpr). This is a generic version that works only with integers but works on 2, 3, 4, … n arguments.
constexpr int min(int n, int p) { return n < p ? n : p; } template<typename... Args> constexpr int min(int n, Args... args) { return min(n, min(args...)); } int main(int argc, char *argv[]) { static_assert(min(4, 5, 6, 42, 7, 3, 6) == 3, "min is incorrect"); }
The recursion mechanism is classical, but what I would have implemented with a vector in the past (in a run-time version) or template recursion (in a compile-time version) is fully expressible with the concept of variadic template, and it is enjoyable, because it is kind of beautiful. It allows to be computed at compile-time if all the arguments are deductible at this time, or simply computed at run-time. It is also invisible for the user.
It is important to note that the arguments are passed by copy. If we
wanted to pass them by reference, the game would be harder… And to
be honest, at this time, I don't know how I should do. It seems more
complex to handle the recursion because it implies to implement
different combination of the arguments ( int&, int&&; int&&, int&;
…). There was a proposal to make it as a part of the standard
library (proposition n772), or with initializer_list
. The proposal shows it is
easier to implement and faster with the second method. We keep this min implementation
as a simple example to understand variadic templates. Any
proposition of a fully generic working version (using constexpr and variadic template) of this is encouraged! A solution might be to use std::forward
since it seems to be made for handling the forwarding, but my first tries were not
successful.
Now you have seen why I think the variadic template are good to play
with, let's talk about forwarding arguments. Forwarding is made by
the std::forward
function from the `utility' header (code can be
found in `bits/move.h'). It forwards the arguments exactly as they
are received (more information in the man). Here is a little example
of what makes std::forward
in the code:
#include <iostream> #include <utility> struct Item { Item(): value{0} {} Item(const Item& p){ std::cout << "Copy" << std::endl; } int value; }; void take_args(int& a, int&& b, int c, Item& f) { std::cout << a << " - " << b << " - " << c << " - " << f.value << std::endl; } template <typename... Args> void call_take_args(Args&&... args) { take_args(std::forward<Args>(args)...); } int main() { Item f; int i = 2; call_take_args(i, 4, 5, f); } // The program outputs "2 - 4 - 5 - 0".
If we remove the reference in
the last take_args argument, "Copy" is also printed. We can see that
the program won't compile if we remove a `&' for the second
argument, or if we add an extra one to the first. This is because
std::forward
keeps the r/l-valueness of the argument received.
How are they able to know this? Let's take a look at the source of
this function (version 4.7.1 of glibcxx):
template<typename _Tp> constexpr _Tp&& forward(typename std::remove_reference<_Tp>::type& __t) noexcept { return static_cast<_Tp&&>(__t); } template<typename _Tp> constexpr _Tp&& forward(typename std::remove_reference<_Tp>::type&& __t) noexcept { static_assert(!std::is_lvalue_reference<_Tp>::value, "template argument" " substituting _Tp is an lvalue reference type"); return static_cast<_Tp&&>(__t); }
The first version of the function takes a lvalue and the second a
rvalue. The static_assert just checks if the specialization works as
expected. This specialization works because std::remove_reference
takes a type `T' (possibly U, U&, U&&), and the `type' is a typedef
of T. So we are able to
manage a use case where we give as parameter T, T& or T&& the same
way. As a result we have two functions with the following signature:
forward(T& __t)
and forward(T&& __t)
. If the
argument is a lvalue it goes in the first one, otherwise to the
second. noexcept
just helps the compiler to know that the
function/method will not throw an exception or that the program
should be stopped if an exception tries to escape from here.
Now let's take a look at what they return. The fact that both functions seem to return the same thing might look weird. We have the feeling that they return the same type: `T&&'. But in fact, `T' could be a `U&' or a `U&&', and C++11 comes with a rule named reference collapsing rules (see this article which talk about it). This is simple:
- U& & => U&
- U&& & => U&
- U& && => U&
- U&& && => U&&
This looks like the `and' truth table where & is 0 and && is 1. A good way to remember I think. So: if _Tp is a U& (the first function), the returned object will be U& too, and if it was U&& it will be U&&. Which follows the rules of a perfect forwarding. Now we have understood the concept of the forwarding, and the way it is done, we can take a look at the code of push_back and emplace_back to know what makes the change.
2.2 Emplace_back and Push_back
Now we have all the tools in our hands to understand the difference between these two methods, we can just show the source code:
// Taken from bits/stl_vector.h void push_back(const value_type& __x) { if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) { _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x); ++this->_M_impl._M_finish; } else #ifdef __GXX_EXPERIMENTAL_CXX0X__ _M_emplace_back_aux(__x); #else _M_insert_aux(end(), __x); #endif } // Taken from bits/vector.tcc template<typename... _Args> void vector<_Tp, _Alloc>:: emplace_back(_Args&&... __args) { if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) { _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, std::forward<_Args>(__args)...); ++this->_M_impl._M_finish; } else _M_emplace_back_aux(std::forward<_Args>(__args)...); }
The main difference is the use of the std::forward
. The macro
__GXX_EXPERIMENTAL_CXX0X__
is defined when an option to
activate C++11 is set. The _M_emplace_back_aux
and
_M_insert_aux
are responsible of the reallocation when the number
of available position (available means allocated and not already taken) is
down to 0.
Since there is no other difference between the two methods, that
means the interesting point is in the construct
method (we just
leave the *aux method for this post, it will take too much time to
analyze them).
Before that, we have to present briefly the concept of
putting `::' before a function/method name and the concept of using a placement
new. If your are familiar with these concepts, you should skip these
paragraphs and go directly after the code of the construct
method.
It is possible for a class to override its operator new. This
(static) method is responsible of the allocation of the object. If
we put `::' before the operator new
that means that we want to
take the one in the global namespace and not the overridden one. The
code below shows the difference between these two notation:
struct Foo { Foo(int a) : value{a} { } static void* operator new(size_t) { return (void*)42; // g++ is able to detect if // we return 0 and raises a warning. } int value; }; int main(int argc, char *argv[]) { Foo* bad = new Foo(4); // segmentation fault. Foo* good = ::new Foo(4); // ok. return 0; }
The operator new returns a pointer to an unallocated area, which leads to a segmentation fault when the constructor is called and tries to write the value of `a'. If we put `::' in front of new, this buggy operator new is not called, and there is no memory corruption.
Now, let's talk about placement new. This is just a way to use
specific position in the memory, and this can be useful when working
with memory pools, or when performance is needed: allocation and
deallocation are expensive. Avoiding this can speed up the program.
The syntax is just: new(position)
, where position is a pointer
to where the object must be. This is as simple as that.
Now let's close the parenthesis and look at the code of construct (this is not the construct which is directly called, but the other functions that calls this one are not interesting since the same path is followed by emplace_back and push_back):
// Taken from ext/new_allocator.h template<typename _Up, typename... _Args> void construct(_Up* __p, _Args&&... __args) { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
This version is the C++11 version, which manages the case where we copy, and the case where we construct. Let's start by understanding this line, they use the placement new because this is not up to the `new' to decide where put this object, since it must be put at the end of the vector.
There is a call to the constructor and the arguments
received are forwarded. These arguments are the ones given to the emplace_back
method. And then it is up to the overloading to make it works. If we
are making a copy (we arrived here by push_back), the copy
constructor is called. Otherwise, we call another constructor if it
exists (if not, we get a compile-time error).
And that's all! To arrive here, we have to know variadic template,
argument forwarding, constexpr, placement new, operator new
overriding and reference collapsing rules. But at least, we
understand what are the changes needed to be able to make this
emplace_back
method, and how the C++11 makes it possible.
Thanks to the new standard for this improvement! :-)
PS: Thanks to Christopher Chedeau and Gregoire Bouchetoun for their comments and corrections about this article.
too awesome
ReplyDeleteReally good article.
ReplyDelete