You have probably heard a coder say "this way is faster" many times: nowadays nothing could be more wrong, unless you back the claim by measuring the same piece of code running with algorithm A and with algorithm B (the supposedly faster one). And there was a time when you could measure the execution time of a program 1, 2 or 10 times without noticeable variability: those times are gone. Measuring how fast a piece of software runs is about collecting a set of results with the lowest possible variability.
Variability in computation time on modern computer architectures is simply unavoidable: while we can guarantee the results of a computation, we cannot guarantee how fast that computation will run:
"Computers can reproduce answers, not performance" : Bryce Adelstein Lelbach, https://youtu.be/zWxSZcpeS8Q?t=6m45s
The reasons for variance in computation time can be summarized as:
- Hardware jitter: instruction pipelines, CPU frequency scaling and power management, shared caches and many other things
- OS activity: a huge list of things the kernel can do to screw up your benchmark performance
- Observer effect: every time we instrument code to measure performance, we introduce variance.
Warming up the CPU also seems to have become necessary to get meaningful results. Running hot rather than cold on a single piece of code is well described here:
With that said, the process for benchmarking a piece of code is:
- Write some benchmark code: define a subset of execution in your code (let's call it op1) and measure how many op1/sec you get. Do the same for other parts (op2, op3), but make sure all parts can be run independently from each other
- Make sure the sample population you get is normally distributed
In detail, I use C code to measure, and:
- Use the GNU Scientific Library (GSL) for running statistics: mean, median, variance, standard deviation, skew
- Use clock_gettime() with CLOCK_MONOTONIC. An example code here.
- Check whether the returned sample set has a 'normal' distribution: pick whatever algorithm you like here (I use |mean - median| / max(mean, median) < 1%, not really correct but fine for my purposes).
- Narrow down the size of the code you are measuring as much as possible
- Warm up the CPU before taking samples to avoid too much variability. You do this by running the code you want to measure many times before actually taking samples. Ideally you should run it until you get normally distributed results. If you don't, the piece of code you are measuring is probably too big and you are seeing the effects of other perturbations (OS? I/O?).
- Consider using a benchmark framework: Google Benchmark (C++) is great for this, but I'm afraid you will have to measure C++ code.
- Lock the CPU clock frequency (if you can) on the host you are using for benchmarks.
Prepare to see strange results 🙂