These few pages describe our take on the programming exercise proposed by R. Hundt in his paper Loop Recognition in C++/Java/Go/Scala, where he presents implementations of a well-specified benchmark in several languages…
We chose to compare Cawen only with the C++ version proposed by R. Hundt.
How will Cawen fare in this friendly derby?
The two C++ versions
V0 is the version published by R. Hundt; you can download it using:
svn checkout http://multi-language-bench.googlecode.com/svn/trunk/ multi-language-bench-read-only
R. Hundt did not open-source the « pro » C++ version.
However, he gave some hints for improving the V0 version significantly. From his list of changes we picked the easiest to implement, namely:
- Additional gain for getting rid of BasicBlockMap and storing dfs_numbers on the BasicBlocks themselves.
- Converted BasicBlockSet from set<> to vector<>.
- Changed node_pool from a list<> to a vector<>. Moved definition out of the loop to recycle it.
- Changed LoopSet from set<> to vector<>.
- Changed LoopList from list<> to vector<>.
- Changed EdgeList from list to vector.
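To give the flavour of these changes, here is a minimal C++ sketch of the node_pool transformation (the names and numbers are ours, not the benchmark's actual code): a container allocated inside the hot loop is replaced by a vector hoisted outside it, whose capacity is recycled across iterations.

```cpp
#include <cstddef>
#include <list>
#include <vector>

struct UnionFindNode { int dfs_number; };

// Before: a fresh std::list built inside every iteration,
// paying one heap allocation per element, every pass.
std::size_t run_before(int iterations, int nodes) {
    std::size_t last = 0;
    for (int i = 0; i < iterations; ++i) {
        std::list<UnionFindNode> node_pool;
        for (int j = 0; j < nodes; ++j) node_pool.push_back({j});
        last = node_pool.size();
    }
    return last;
}

// After: one std::vector hoisted out of the loop.
// clear() keeps the allocated capacity, so later passes reuse it.
std::size_t run_after(int iterations, int nodes) {
    std::vector<UnionFindNode> node_pool;
    node_pool.reserve(nodes);
    std::size_t last = 0;
    for (int i = 0; i < iterations; ++i) {
        node_pool.clear();                 // recycles the storage
        for (int j = 0; j < nodes; ++j) node_pool.push_back({j});
        last = node_pool.size();
    }
    return last;
}
```

The set<>-to-vector<> changes follow the same logic: the benchmark's sets are small and short-lived, so a contiguous vector beats the node-per-element red-black tree.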
The eight Cawen versions
This version is a cut-and-paste of the C++ V0 version, except for one thing: by default, the pseudo-destructors of Cawen variables are not called automatically when their scope ends, so the V0.0 Cawen code explicitly cleans all the containers.
Same as V0.0, except that all memory allocations of 64 bytes or less go through a home-made allocator.
Same as V0.1, except that cleanup is no longer explicit: automatic cleanup when variables go out of scope is activated.
Same as V0.0, with implicit cleanup.
Equivalent to C++ V1 + explicit cleanup.
Equivalent to C++ V1 + explicit cleanup + allocator.
Equivalent to C++ V1 + automatic cleanup + allocator.
Equivalent to V1.0 + implicit cleanup.
We ran the benchmarks on four platforms:
- Platform A: gcc 4.2.1 / FreeBSD 8.2 / Intel Core i5-2300 / 64-bit / 2.80 GHz / 16 GB RAM
- Platform B: gcc 4.2.1 / FreeBSD 7.1 / Intel Celeron / 32-bit / 2.66 GHz / 1 GB RAM
- Platform C: gcc 4.5.3 / Cygwin 6.0 / Intel Pentium Dual-Core T4200 / 64-bit / 2 GHz / 4 GB RAM
- Platform D: gcc 4.0.1 / Mac OS X 10.5.8 / Intel Core 2 Duo / 32-bit / 2.4 GHz / 2 GB RAM
The criteria and their measurements
We calculate the average of (sys + user) from the time command over seven runs.
We first used Valgrind, which gives the exact number of allocations and their cumulated size. Then, to stay consistent with R. Hundt’s paper, we ran a top command every second throughout the run and calculated the average over the consistent measures.
We were surprised not to find, for each language under evaluation, a tool that counts only “useful” lines. That is to say, a tool that knows enough about a given language to tell a comment line or a blank line from a statement.
We might not have searched enough.
We use :
To stay fair with C++, we copied comments and even indentation mistakes…
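For what it's worth, here is a minimal sketch of what such a counter could look like for C-family sources (this is our illustration, not the tool we actually used; it handles only // and /* */ comments and would be fooled by comment markers inside string literals):

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Counts "useful" lines: lines containing at least one character that is
// neither whitespace nor part of a comment.
int count_useful_lines(const std::string& source) {
    int useful = 0;
    bool in_block_comment = false;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        bool has_code = false;
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (in_block_comment) {
                if (line[i] == '*' && i + 1 < line.size() && line[i + 1] == '/') {
                    in_block_comment = false;
                    ++i;                       // skip the closing '/'
                }
            } else if (line[i] == '/' && i + 1 < line.size() && line[i + 1] == '/') {
                break;                         // rest of the line is a comment
            } else if (line[i] == '/' && i + 1 < line.size() && line[i + 1] == '*') {
                in_block_comment = true;
                ++i;                           // skip the opening '*'
            } else if (!std::isspace(static_cast<unsigned char>(line[i]))) {
                has_code = true;
            }
        }
        if (has_code) ++useful;
    }
    return useful;
}
```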
Our preprocessor is installed only on platform A. On the other platforms we compiled the generated C99 code with gcc.
All results are available in a table.
Code sizes and compilation times
The only difference between V0.0 and V0.3, V0.1 and V0.2, V1.0 and V1.3, and V1.1 and V1.2 is their code size and precompilation times; their processing is identical and so is their performance. In the rest of this presentation, we give only one result for each of these pairs.
Analysis and hypotheses
First of all, Cawen compilation times are huge, even on a rather beefy server. « We are working hard on it ».
One can observe that the Vx.0 Cawen/C99 versions run faster (by 25% to 64%) and need less memory (by 14% to 41%) than the corresponding C++ versions.
Adding an allocator reduces run times by a further 3% to 34%, but brings the memory footprint up to the C++ level.
We have not yet had time to find out why the C99 code generated by Cawen is more efficient than the corresponding C++ versions. Among the hypotheses:
- differences between the C99 and C++ compilers
- poor STL performance
- the efficiency of the Judy library
If you are curious about it (we are), don’t hesitate to download and inspect the C code.
What the benchmark tells about Cawen
According to our different implementations of the benchmark, Cawen seems to be a little less verbose than C++ (by 3% to 6%), mainly because we used « foreach » (available in C++ only since C++0x/C++11) and because the writing functions of our containers accept any type of data as their « from » argument.
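For comparison, the C++ side only gets this brevity with the C++11 range-based for; the C++98-era benchmark code has to spell out the iterators. A sketch (our own, not the benchmark's code):

```cpp
#include <vector>

// C++98: explicit iterator boilerplate for a simple traversal.
int sum_cxx98(const std::vector<int>& v) {
    int total = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        total += *it;
    return total;
}

// C++11: the range-based for, the analogue of a foreach.
int sum_cxx11(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) total += x;
    return total;
}
```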
To give a better idea of Cawen’s expressiveness, we can add that govel, our baby STL, is 10,081 Cawen lines long; it includes a stack array, a heap array, a doubly-linked list, wrappers for Judy1 and JudyL, their associated iterators, several foreach variants (runtime and preprocessing) and a copying/cleaning policy.
Adding an allocator
With two Cawen lines, we redirected every memory allocation and deallocation to our aggHmg allocator. Note that this allocator’s deallocation function needs a « size to deallocate » argument.
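The aggHmg source is not shown here, but a toy C++ sketch of a small-block allocator with segregated free lists (our illustration, assuming 8-byte size classes up to 64 bytes) shows why the size argument matters: with the size passed back at deallocation, a freed block can be pushed onto the right free list without any per-block header.

```cpp
#include <cstddef>
#include <cstdlib>

namespace toy_alloc {

const std::size_t kMaxSmall = 64;   // blocks above this go to malloc/free
const std::size_t kAlign = 8;       // one free list per 8-byte size class

struct FreeNode { FreeNode* next; };

// Free lists for classes 8, 16, ..., 64 bytes.
static FreeNode* free_lists[kMaxSmall / kAlign] = {};

inline std::size_t class_index(std::size_t size) {
    return (size + kAlign - 1) / kAlign - 1;
}

void* allocate(std::size_t size) {
    if (size == 0 || size > kMaxSmall) return std::malloc(size);
    std::size_t idx = class_index(size);
    if (FreeNode* node = free_lists[idx]) {   // reuse a recycled block
        free_lists[idx] = node->next;
        return node;
    }
    return std::malloc((idx + 1) * kAlign);   // fresh block, rounded up
}

// The caller must pass the size back: that is what lets us find the
// right free list without storing a header in front of each block.
void deallocate(void* p, std::size_t size) {
    if (p == nullptr) return;
    if (size == 0 || size > kMaxSmall) { std::free(p); return; }
    FreeNode* node = static_cast<FreeNode*>(p);
    std::size_t idx = class_index(size);
    node->next = free_lists[idx];
    free_lists[idx] = node;
}

}  // namespace toy_alloc
```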
With one Cawen line, we activated automatic cleanup at scope end. This freeing policy, which is 330 Cawen lines long, belongs to govel (our container collection) and is not part of the language itself.
Debates about languages and their respective advantages and disadvantages very often turn into religious wars.
We would like to thank R. Hundt for reminding us that their main purpose is to express possibly very complex processing simply and clearly, and then to execute that processing efficiently.
It follows that there must be objective criteria for estimating a language’s utility in solving a given problem.
Following this very general principle, we hope we have helped to show that there is still plenty of room for new solutions.
We are well aware that Cawen is still far from perfect and that we have a lot of work ahead of us… Although we don’t know how far little Cawen will go, will you agree that its first steps look promising?