Monday, July 30, 2012

How To Demangle C++ Symbols

I am finally on holiday! I have been pretty busy these last months… Now that I have some time for myself, I can work through the pile of stuff I wanted to share with the lost reader(s?) of this blog. Let's start with a few things about C++ and its name mangling.

C++ compilers mangle symbol names, and the mangled names can be hard to read. This matters, for example, when you work with a lot of different libraries and get an "undefined reference to <hardly-readable>" error. So in this post, I'll show you two different ways to decipher these symbols on GNU/Linux.

void leave_a_comment()
{
  return;
}

void foo(int v)
{
  int this_blog_is_cool = v;

  if (this_blog_is_cool)
    leave_a_comment();
}

int main(int argc, char *argv[])
{
  foo(42);
}

The code above is a simple and useless C++ program, meant to show what the C++ compiler does with these function names. Let's assume that this source code is in the file bar.cc.

$ gcc -c bar.cc
$ nm bar.o
00000000 T _Z15leave_a_commentv
00000005 T _Z3fooi
0000001e T main

It seems readable. The `_Z' prefix marks a mangled C++ name (identifiers that start with an underscore followed by an uppercase letter are reserved for the implementation). The number that follows is the number of characters in the function name: here it is 3 for `foo' and 15 for `leave_a_comment'. The name is then followed by the types of the arguments. The first function takes no argument, so it gets void (v). The second one takes an int (i). In this case it is easy to see what corresponds to what, but it is enough to look at our options for demangling these identifiers.
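To make the scheme concrete, here is a minimal sketch that rebuilds the two mangled names from the example. The helper `mangle_simple' is hypothetical (it is not part of any library) and it only covers free functions in the global namespace whose parameters are builtin types; real mangling (namespaces, classes, templates, pointers, ...) is much richer than this.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: builds the mangled name of
// a free, non-template function in the global namespace. Each parameter
// is given directly as its one-letter builtin type code
// (e.g. 'i' = int, 'c' = char, 'd' = double).
std::string mangle_simple(const std::string& name,
                          const std::vector<char>& param_codes)
{
    std::string out = "_Z";              // prefix of a mangled C++ name
    out += std::to_string(name.size());  // length of the identifier
    out += name;                         // the identifier itself

    if (param_codes.empty()) {
        out += 'v';                      // empty parameter list is encoded as void
    } else {
        for (char code : param_codes)
            out += code;                 // one type code per parameter
    }
    return out;
}
```

With this sketch, `mangle_simple("foo", {'i'})' gives `_Z3fooi' and `mangle_simple("leave_a_comment", {})' gives `_Z15leave_a_commentv', which matches the nm output above.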

I personally know two ways (maybe there are more, thanks for reporting them! :)). The first one is to use `nm' itself. It has a nice option named `-C' which demangles the identifiers. If we use it, we get these results:

$ gcc -c bar.cc
$ nm -C bar.o
00000000 T leave_a_comment()
00000005 T foo(int)
0000001e T main

Which is far better, right? In fact `-C' is a shortcut for the `--demangle' option. It can take a demangling style as an argument (see the man page for more information). Warning: it seems that this option is not defined by the Single Unix Specification.

Another way is to use c++filt, which is a tool from the `GNU Binary Utilities'. It demangles C++ symbols. It can either take mangled symbols on the command line or read them from stdin. You don't have to filter the input yourself (for example the nm output) to make it work. You just give it everything and it replaces what it has to. Here is an example:

$ gcc -c bar.cc
$ nm bar.o | c++filt
00000000 T leave_a_comment()
00000005 T foo(int)
0000001e T main
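If you need the same thing from inside a program, GCC and Clang expose a demangling function, `abi::__cxa_demangle', in the <cxxabi.h> header. Here is a minimal sketch; the wrapper name `demangle' is my own, and falling back to the original string on failure is just one possible choice.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang C++ runtimes)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical wrapper: returns the demangled form of `mangled', or the
// input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    // With a null output buffer, the function allocates the result with
    // malloc(), so the caller has to release it with free().
    char* buffer = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);

    if (status != 0 || buffer == nullptr)
        return mangled;               // not demangled: keep the name as it is

    std::string result(buffer);
    std::free(buffer);
    return result;
}
```

For example, `demangle("_Z3fooi")' gives `foo(int)' and `demangle("_Z15leave_a_commentv")' gives `leave_a_comment()', like the two tools above.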

That's all folks! I hope this will be useful ;)