Category Archives: benchmarks

Benchmarking golang code

Let’s say that you want to know if EncodeToString is faster than fmt.Sprintf : you will need to compare the speed of this method

func Md5Encode(str string) string {
	md5HashInBytes := md5.Sum([]byte(str))
	md5HashInString := hex.EncodeToString(md5HashInBytes[:])
	return md5HashInString

with this other one

func Md5EncodeFmt(str string) string {
	md5HashInBytes := md5.Sum([]byte(str))
	md5HashInString := fmt.Sprintf("%x", md5HashInBytes)
	return md5HashInString

Go provides benchmarking features in the testing package which is pretty usefull :

func BenchmarkMd5EncodeFmt(b *testing.B) {
	// run the md5Encode function b.N times
	for n := 0; n < b.N; n++ {
		Md5EncodeFmt("aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL")

func BenchmarkMd5Encode(b *testing.B) {
	// run the md5Encode function b.N times
	for n := 0; n < b.N; n++ {
		Md5Encode("aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL")


$ go test -bench=.
goos: linux
goarch: amd64
BenchmarkMd5EncodeFmt-8   	 1894791	       625 ns/op
BenchmarkMd5Encode-8      	 3068509	       363 ns/op
ok  	_/home/paul/LazyInit/bench	3.342s

Run 3 times the benchmarks :

$ go test -count 3 -bench=. 
goos: linux
goarch: amd64
BenchmarkMd5EncodeFmt-8   	 1882105	       627 ns/op
BenchmarkMd5EncodeFmt-8   	 1918942	       624 ns/op
BenchmarkMd5EncodeFmt-8   	 1902894	       625 ns/op
BenchmarkMd5Encode-8      	 3139585	       386 ns/op
BenchmarkMd5Encode-8      	 2937154	       397 ns/op
BenchmarkMd5Encode-8      	 3009801	       380 ns/op
ok  	_/home/paul/LazyInit/bench	10.217s

EncodeToString() makes your method almost twice faster !

In the case you need to isolate the code that will be measured from some init/fixture code you can use ResetTimer() and StopTimer() to accurately isolate what you want to measure :

func BenchmarkMd5EncodeFmt(b *testing.B) {
	// Some init code
	for n := 0; n < b.N; n++ {
	// some final code

If the function you’re measuring is very slow you might want to increase to maximum time of excecution of the benchmark (default is 1 s) with -benchtime=20s


  • without any benchmarks : go test .
  • with benchmarks (time) : go test -bench .
  • with benchmarks (time and memory) : go test -bench . -benchmem

The argument following -bench is a regular expression. All benchmark functions whose names match are executed. The . in the previous examples isn’t the current directory but a pattern matching all tests. To run a specific benchmark, use the regexp : -bench Suite (means everything containing Suite).

(thanks to )

Measuring memory footprint of a linux/macosx application


If you’re selling an API or an application which is deployed on production systems, one of the questions your customers might ask you is what is the memory footprint of your API/application in order for them to account for an increase of memory requirements due to using your product. After some research I think that the best tool for measuring and debugging any increases/decrease of your mem footprint is valgrind –tool=massif together with ms_print reporting tools.

Massif is a Heap memory profiler and will measure how much/when you allocate heap memory in your code and show the involved code. Run :

valgrind --tool=massif

this will execute the code and generate a massif.out.<pid> file that you may visualize with

ms_print massif.out.<pid>

Take a ride, the output is absolutely useful and you will have an histogram of how much memory is used at every sampling moment.


Hash functions – efficency and speed

hash function is any function that can be used to map data of arbitrary size to data of a fixed size. The values returned by a hash function are called hash valueshash codesdigests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup.

If you happen to need a non cryptographic hash function (and you might need it even if your are using languages that have builtin hashtables like java or c++ ) here are some references taken from various parts:

Interesting comparison of different hash tables :

Some more hashfunctions here :

Some really interesting benchmarks here by Peter Kankowsky.

And again hash benchmarks here.

Some (a mininum set) benchmarks ran using User-Agents as keys, 1.5 multiply factor on hash size, next power of 2 hash sizes :

Keys 3130591, Average fh_default_hash(8388608) time in nanosecs : 292.56
Total collisions 517939, longest chain 6, hash_dim_factor 1.500000
Keys 3130591, Average djb_hash(8388608) time in nanosecs : 181.24
Total collisions 517445, longest chain 7, hash_dim_factor 1.500000
Keys 3130591, Average sax_hash(8388608) time in nanosecs : 244.49
Total collisions 517431, longest chain 7, hash_dim_factor 1.500000
Keys 3130591, Average jsw_hash(8388608) time in nanosecs : 141.39
Total collisions 522951, longest chain 14, hash_dim_factor 1.500000
Keys 3130591, Average jen_hash(8388608) time in nanosecs : 148.54
Total collisions 518565, longest chain 7, hash_dim_factor 1.500000
Keys 3130591, Average djb2_hash(8388608) time in nanosecs : 182.37
Total collisions 516405, longest chain 7, hash_dim_factor 1.500000
Keys 3130591, Average sdbm_hash(8388608) time in nanosecs : 238.93
Total collisions 525513, longest chain 7, hash_dim_factor 1.500000
Keys 3130591, Average fnv_hash(8388608) time in nanosecs : 237.01
Total collisions 517530, longest chain 7, hash_dim_factor 1.500000
Keys 3130591, Average oat_hash(8388608) time in nanosecs : 294.78
Total collisions 517939, longest chain 6, hash_dim_factor 1.500000

Tests done with the hashtable code you can find here

And last a research study (which I’m still working on) for SIMD optimized hash functions : here  some theory and code here. The all use the vector extensions present in gcc.


Photo by Luca Campioni on Unsplash

Code benchmarks : how can I measure how fast my software is (and make it faster) ?

You probably heard many times a coder say “this way is faster” : nothing can be more wrong nowadays unless you prove your statement by measuring the same piece of code running with algorithm A and algorithm B (the faster one). And there were times when you could measure the speed of the execution time of a program 1, 2 or 10 times without noticeable variability : those times are gone. Measuring how fast a piece of software runs is about getting a set of results with the lowest possible variability.

Variability in computing time in modern computer architectures is just unavoidable; while we can guarantee the results of a computation we cannot guarantee how fast this computation will be :

“Computer can reproduce anwsers, not performance” : Bryce Adelstein Lellback,

Reasons for variance in computation time can be recap in :

  • Hardware jitter : instruction pipelines, cpu frequency scaling and power management, shared caches and many other things
  • OS activities : a huge list of things the kernel can do to screw up your benchmark performance
  • Observer effect : every time we instruments code to measure performance we introduce variance.

Also warming up the cpu seems to have become necessary to get meaningful results. Running hot instead of cold on a single piece of code is well described here :

This said the process for benchmarking an piece of code is :

  • Write some benchmark code : define a subset of execution (let’s call it op1) in your code and measure how many op1/sec. Do the same for other parts (op2, op3) but be sure that all parts can be run indipendently from each other   
  • Assure that samples population that we get is “normal” distributed

In details I use C code to measure and :

  • Use GNU Scientific Library (gsl) for mean, median, variance, stdev, skew running stats
  • Use CLOCK_MONOTONE clock_gettime() mode. An example code here.
  • Check if the set returned has ‘normal’ distribution : find the algorithm you like here (I use mean-median/max(mean,median) < 1%, not really correct but ok for my purposes).
  • Narrow down as much as possible the size of the code you are measuring
  • Warm up the cpu before taking samples to avoid having too much variability. You do this by running the code you want to measure many times before actually taking samples. Ideally you should run it until you have normal distributed results. If you don’t then probably the piece of code you are measuring is too big and you are experiencing the effects of other perturbations (os ? io ?).
  • Consider using a benchmark framework : google benchmark (c++) is great for this but I’m afraid you will have to measure C++ code.
  • Lock the cpu clock (if you can) on the host you are using for benchmarks.

Prepare to see strange results 🙂