The Google benchmark

First results

These few pages describe our participation in the programming exercise proposed by R. Hundt in his paper Loop Recognition in C++/Java/Go/Scala, where he presents the implementation of a well-specified benchmark in different languages…

We have chosen to compare Cawen only with the C++ version proposed by R. Hundt.

How will Cawen fare in this friendly derby?

Versions overview

The two C++ versions
Version V0

This is the version published by R. Hundt. You can download it with:

svn checkout http://multi-language-bench.googlecode.com/svn/trunk/ multi-language-bench-read-only
Version V1

R. Hundt did not open-source the C++ "pro" version.

However, he gave some hints for improving the V0 version significantly. From this list of changes we picked the easiest to implement, namely:

  • Additional gain for getting rid of BasicBlockMap and storing dfs_numbers on the BasicBlocks themselves.
  • Converted BasicBlockSet from set<> to vector<>.
  • Changed node_pool from a list<> to a vector<>. Moved definition out of the loop to recycle it.
  • Changed LoopSet from set<> to vector<>.
  • Changed LoopList from list<> to vector<>.
  • Changed EdgeList from list to vector.
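For illustration, the node_pool change can be sketched in C++ as follows; the type and function names here are our own stand-ins, not the benchmark's actual identifiers:

```cpp
#include <vector>

struct UnionFindNode { int dfs_number; };  // stand-in for the real node type

// V0 allocated a fresh list<UnionFindNode> inside the loop body; V1 hoists
// a single vector out of the loop and recycles it with clear(), so the
// capacity acquired on the first iteration is reused by all the others.
long process_all(int n_iterations, int n_nodes) {
  long processed = 0;
  std::vector<UnionFindNode> node_pool;    // moved out of the loop
  for (int i = 0; i < n_iterations; ++i) {
    node_pool.clear();                     // recycle, don't reallocate
    node_pool.resize(n_nodes);
    // ... run the loop-recognition pass over node_pool ...
    processed += static_cast<long>(node_pool.size());
  }
  return processed;
}
```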

The eight Cawen versions

Version V0.0

This version is a direct transliteration of the C++ V0 version, except for one thing: by default, pseudo-destructors of Cawen variables are not automatically called when their scope ends, so the V0.0 Cawen code explicitly cleans all the containers.

Version V0.1

Same as V0.0, except that all memory allocations of 64 bytes or less go through a home-made allocator.
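To give an idea of what such an allocator can look like, here is a rough C++ sketch of a small-block scheme with the same 64-byte threshold and a sized-deallocation interface; the design shown is our own illustration, not Cawen's actual allocator:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Rough sketch of a small-block scheme: requests of at most 64 bytes are
// bump-allocated from large chunks, bigger ones fall through to malloc/free.
class SmallBlockAllocator {
  static const std::size_t kThreshold = 64;
  static const std::size_t kChunkSize = 64 * 1024;
  std::vector<char*> chunks_;
  std::size_t used_ = kChunkSize;           // forces a first chunk on demand
 public:
  ~SmallBlockAllocator() {
    for (char* c : chunks_) std::free(c);   // small blocks die with the arena
  }
  void* allocate(std::size_t n) {
    if (n > kThreshold) return std::malloc(n);
    n = (n + 7) & ~std::size_t(7);          // keep 8-byte alignment
    if (used_ + n > kChunkSize) {
      chunks_.push_back(static_cast<char*>(std::malloc(kChunkSize)));
      used_ = 0;
    }
    void* p = chunks_.back() + used_;
    used_ += n;
    return p;
  }
  void deallocate(void* p, std::size_t n) { // caller must supply the size
    if (n > kThreshold) std::free(p);       // small blocks are not recycled
  }
};
```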

Version V0.2

Same as V0.1, except that cleanup is no longer explicit: automatic cleanup when variables go out of scope is activated.

Version V0.3

Same as V0.0, with implicit cleanup.

Version V1.0

Equivalent to C++ V1 + explicit cleanup.

Version V1.1

Equivalent to C++ V1 + explicit cleanup + allocator.

Version V1.2

Equivalent to C++ V1 + automatic cleanup + allocator.

Version V1.3

Equivalent to V1.0, but with implicit cleanup.

Platforms

We ran the benchmarks on four platforms:

  • Platform A: gcc 4.2.1 / FreeBSD 8.2 / Intel Core i5-2300 / 64-bit / 2.80 GHz / 16 GB RAM
  • Platform B: gcc 4.2.1 / FreeBSD 7.1 / Intel Celeron / 32-bit / 2.66 GHz / 1 GB RAM
  • Platform C: gcc 4.5.3 / Cygwin 6.0 / Intel Pentium Dual-Core T4200 / 64-bit / 2 GHz / 4 GB RAM
  • Platform D: gcc 4.0.1 / Mac OS X 10.5.8 / Intel Core 2 Duo / 32-bit / 2.4 GHz / 2 GB RAM

The criteria and their measurements

Run-time

We compute the average of (sys + user) reported by the time command over seven runs.

Memory footprint

We first used Valgrind, which gives the exact number of allocations and their cumulated size. Then, to stay consistent with R. Hundt's paper, we ran one top command every second throughout the run and computed the average over the coherent measures.

Code Size

We were surprised not to find, for each language under evaluation, a tool that counts only "useful" lines. That is to say, a tool that knows enough about a given language to tell a comment line, a blank line, and a statement apart.

We might not have searched enough.

We use:

wc -l

To stay fair to C++, we copied the comments and even the indentation mistakes…

Compilation time

Our preprocessor is installed only on platform A. On the other platforms we compiled the generated C99 code with gcc.

Results

All results are available in a table.

Code sizes and compilation times
Version 0

V0 - code sizes [number of lines]

V0 - compilation times [seconds]

Version 1

V1 - code sizes [number of lines]

V1 - compilation times [seconds]

The only difference between V0.0 and V0.3, V0.1 and V0.2, V1.0 and V1.3, and V1.1 and V1.2 is their code size and compilation times; their processing is identical and so is their performance. In the rest of this presentation, we will therefore show only one result for each of these pairs.

Binary sizes
Version 0

V0 - binary sizes [bytes]

Version 1

V1 - binary sizes [bytes]

Run-times
Version 0

V0 - run-times [seconds]

Version 1

V1 - run-times [seconds]

Memory footprints
Version 0

V0 - virt memory [Mbytes]

V0 - real memory [Mbytes]

Version 1

V1 - virt memory [Mbytes]

V1 - real memory [Mbytes]

Analysis and hypothesis

First of all, Cawen compilation times are huge, even on a rather powerful server. "We are working hard on it."

One can observe that the Vx.0 Cawen/C99 versions run faster (by 25% to 64%) and need less memory (by 14% to 41%) than the corresponding C++ versions.

Adding an allocator reduces run-times by 3% to 34%, but brings the memory footprint up to the C++ level.

We have not yet had time to find out why the C99 code generated by Cawen is more efficient than the corresponding C++ versions. Among the hypotheses:

    • differences between the C99 and C++ compilers
    • the STL's poor performance
    • the Judy library's efficiency

If you are curious about it (we are), don’t hesitate to download and inspect the C code.

What the benchmark tells about Cawen

Code sizes

According to our different implementations of the benchmark, Cawen seems to be a little less verbose than C++ (by 3% to 6%), mainly because we used "foreach" (available in C++11) and because the writing functions of our containers accept any type of data as their "from" argument.

To give a better idea of Cawen's expressiveness, we can add that govel, our baby STL, is 10,081 Cawen lines long; it includes a stack array, a heap array, a doubly-linked list, the wrapping of Judy1 and JudyL, their associated iterators, different foreach constructs (runtime and preprocessing), and a copying/cleaning policy.

Adding an allocator

With two Cawen lines, we redirected every memory allocation and deallocation to our aggHmg allocator. Please note that this allocator's deallocation function needs a "size to deallocate" argument.

Automatic cleanup

In one Cawen line, we activated automatic cleanup at scope end. This freeing policy, which is 330 Cawen lines long, belongs to govel (our container collection) and is not part of the language itself.
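For readers coming from C++, the behaviour this policy switches on is what C++ destructors provide implicitly. A toy illustration (our own, not govel code):

```cpp
static int g_cleanups = 0;   // counts destructor calls

// In C++, the destructor runs automatically when the scope closes. The
// one-line govel policy gives Cawen containers the same behaviour; without
// it, the equivalent of ~Tracked() must be called by hand (as in V0.0).
struct Tracked {
  ~Tracked() { ++g_cleanups; }
};

void use_tracked() {
  Tracked a, b;
  // ... work with a and b ...
}  // both cleanups fire here, with no explicit call
```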

Conclusion

Debates about languages and their respective advantages and disadvantages very often turn into religious wars.

We would like to thank R. Hundt for reminding us that a language's main purpose is to express, simply and clearly, possibly very complex processing, and then to execute that processing efficiently.

It follows that there must be objective criteria for estimating a language's utility in solving a given problem.

Following this very general principle, we hope we have helped to show that there is still plenty of room for new solutions.

We are well aware that Cawen is still very much perfectible and that we have a lot of work ahead of us… Although we don't know how far little Cawen will go, would you agree that its first steps look promising?