You have probably heard a coder say "this way is faster" many times: nowadays nothing could be more wrong, unless you back the claim by measuring the same piece of code running with algorithm A and with algorithm B (the supposedly faster one). And there was a time when you could measure the execution time of a program 1, 2 or 10 times without noticeable variability: those times are gone. Measuring how fast a piece of software runs is about collecting a set of results with the lowest possible variability.
Variability in computation time on modern computer architectures is simply unavoidable: while we can guarantee the results of a computation, we cannot guarantee how fast that computation will run:
"Computers can reproduce answers, not performance" : Bryce Adelstein Lelbach, https://youtu.be/zWxSZcpeS8Q?t=6m45s
The reasons for variance in computation time can be summarized as:
- Hardware jitter: instruction pipelines, CPU frequency scaling and power management, shared caches and many other things
- OS activity: a huge list of things the kernel can do to screw up your benchmark performance
- Observer effect: every time we instrument code to measure performance, we introduce variance.
Warming up the CPU also seems to have become necessary to get meaningful results. Running hot rather than cold on a single piece of code is well described here:
With that said, the process for benchmarking a piece of code is:
- Write some benchmark code: define a subset of execution in your code (let's call it op1) and measure how many op1/sec you get. Do the same for other parts (op2, op3), but make sure all parts can be run independently from each other
- Make sure the sample population you get is normally distributed
In detail, I use C code to measure, and:
- Use the GNU Scientific Library (GSL) for running statistics: mean, median, variance, standard deviation, skew
- Use clock_gettime() with CLOCK_MONOTONIC. An example code here.
- Check whether the returned sample set has a 'normal' distribution: pick whatever algorithm you like here (I use |mean - median| / max(mean, median) < 1%, not really correct but fine for my purposes).
- Narrow down the size of the code you are measuring as much as possible
- Warm up the CPU before taking samples to avoid too much variability. You do this by running the code you want to measure many times before actually taking samples. Ideally you should run it until you get normally distributed results. If you don't, the piece of code you are measuring is probably too big and you are seeing the effects of other perturbations (OS? I/O?).
- Consider using a benchmark framework: Google Benchmark (C++) is great for this, but I'm afraid you will have to measure C++ code.
- Lock the CPU clock frequency (if you can) on the host you are using for benchmarks.
Prepare to see strange results 🙂