
Benchmarking golang code

Let’s say you want to know whether hex.EncodeToString is faster than fmt.Sprintf. You need to compare the speed of this function:

func Md5Encode(str string) string {
	md5HashInBytes := md5.Sum([]byte(str))
	md5HashInString := hex.EncodeToString(md5HashInBytes[:])
	return md5HashInString
}

with this one:

func Md5EncodeFmt(str string) string {
	md5HashInBytes := md5.Sum([]byte(str))
	md5HashInString := fmt.Sprintf("%x", md5HashInBytes)
	return md5HashInString
}

Go provides benchmarking support in the testing package, which is very useful:

func BenchmarkMd5EncodeFmt(b *testing.B) {
	// run the Md5EncodeFmt function b.N times
	for n := 0; n < b.N; n++ {
		Md5EncodeFmt("aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL")
	}
}

func BenchmarkMd5Encode(b *testing.B) {
	// run the Md5Encode function b.N times
	for n := 0; n < b.N; n++ {
		Md5Encode("aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL")
	}
}

Run the benchmarks:

$ go test -bench=.
goos: linux
goarch: amd64
BenchmarkMd5EncodeFmt-8   	 1894791	       625 ns/op
BenchmarkMd5Encode-8      	 3068509	       363 ns/op
PASS
ok  	_/home/paul/LazyInit/bench	3.342s

Run the benchmarks 3 times:

$ go test -count 3 -bench=.
goos: linux
goarch: amd64
BenchmarkMd5EncodeFmt-8   	 1882105	       627 ns/op
BenchmarkMd5EncodeFmt-8   	 1918942	       624 ns/op
BenchmarkMd5EncodeFmt-8   	 1902894	       625 ns/op
BenchmarkMd5Encode-8      	 3139585	       386 ns/op
BenchmarkMd5Encode-8      	 2937154	       397 ns/op
BenchmarkMd5Encode-8      	 3009801	       380 ns/op
PASS
ok  	_/home/paul/LazyInit/bench	10.217s

hex.EncodeToString() makes the function roughly 1.6 to 1.7 times faster (about 363 to 397 ns/op against about 625 ns/op)!

If you need to keep init/fixture code out of the measurement, call ResetTimer() after the setup and StopTimer() before any final code, so that only the code in between is timed:

func BenchmarkMd5EncodeFmt(b *testing.B) {
	// Some init code
	initRandomBenchStrings()
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		Md5EncodeFmt(getRandomstrings())
	}
	b.StopTimer()
	// some final code
}
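The same idea can be tried outside of go test with testing.Benchmark, which runs a single benchmark function and returns its result. The sketch below is only an illustration: makeRandomStrings is a hypothetical stand-in for the fixture helpers used above (which are not shown in this post), and the sizes 1024 and 96 are arbitrary.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"math/rand"
	"testing"
)

// Md5Encode and Md5EncodeFmt are the two functions compared in this post.
func Md5Encode(str string) string {
	sum := md5.Sum([]byte(str))
	return hex.EncodeToString(sum[:])
}

func Md5EncodeFmt(str string) string {
	sum := md5.Sum([]byte(str))
	return fmt.Sprintf("%x", sum)
}

// makeRandomStrings is a hypothetical fixture helper (not from the post):
// it builds n random lowercase strings of the given length from a fixed seed.
func makeRandomStrings(n, length int) []string {
	rng := rand.New(rand.NewSource(1))
	out := make([]string, n)
	buf := make([]byte, length)
	for i := range out {
		for j := range buf {
			buf[j] = byte('a' + rng.Intn(26))
		}
		out[i] = string(buf)
	}
	return out
}

// benchWithFixture builds the inputs first, then resets the timer so that
// only the loop calling encode is measured.
func benchWithFixture(encode func(string) string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		inputs := makeRandomStrings(1024, 96) // setup, excluded from timing
		b.ResetTimer()
		for n := 0; n < b.N; n++ {
			encode(inputs[n%len(inputs)])
		}
	})
}

func main() {
	// Each result prints as "<iterations> <ns/op>", like a go test -bench line.
	fmt.Println("Md5EncodeFmt:", benchWithFixture(Md5EncodeFmt))
	fmt.Println("Md5Encode:   ", benchWithFixture(Md5Encode))
}
```

Cycling through a pre-built slice keeps the random generation out of the timed loop; the modulo and slice indexing are still measured, but their cost is small compared to an MD5 sum.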

If the function you’re measuring is very slow, you might want to increase the time each benchmark runs for (the default is 1 s) with -benchtime=20s

Recap

without any benchmarks : go test .
with benchmarks (time) : go test -bench .
with benchmarks (time and memory) : go test -bench . -benchmem

The argument following -bench is a regular expression. All benchmark functions whose names match are executed. The . in the previous examples isn’t the current directory but a pattern matching all benchmarks. To run specific benchmarks, pass a regexp: -bench Suite runs every benchmark whose name contains Suite.
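To get a feel for what the -benchmem columns mean, here is a small sketch that reads the same numbers from a testing.BenchmarkResult: NsPerOp, AllocedBytesPerOp and AllocsPerOp. The report helper and the input constant are my own names for this illustration; inside a regular _test.go file you would simply pass -benchmem or call b.ReportAllocs().

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"testing"
)

// input is an arbitrary sample string, used only for this illustration.
const input = "aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroi"

func Md5Encode(str string) string {
	sum := md5.Sum([]byte(str))
	return hex.EncodeToString(sum[:])
}

func Md5EncodeFmt(str string) string {
	sum := md5.Sum([]byte(str))
	return fmt.Sprintf("%x", sum)
}

// report is a hypothetical helper: it benchmarks f and prints time and
// memory per operation, the same quantities that -benchmem adds to the output.
func report(name string, f func(string) string) testing.BenchmarkResult {
	r := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for n := 0; n < b.N; n++ {
			f(input)
		}
	})
	fmt.Printf("%-14s %8d ns/op %6d B/op %4d allocs/op\n",
		name, r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
	return r
}

func main() {
	report("Md5EncodeFmt", Md5EncodeFmt)
	report("Md5Encode", Md5Encode)
}
```

B/op is the average number of bytes allocated per iteration and allocs/op the average number of heap allocations; a function that is slower often also allocates more.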

(thanks to https://github.com/samonzeweb/profilinggo )

Measuring memory footprint of a linux/macosx application

If you’re selling an API or an application that is deployed on production systems, one question your customers might ask is what its memory footprint is, so that they can plan for the extra memory your product requires. After some research, I think the best tool for measuring and debugging increases or decreases in your memory footprint is valgrind --tool=massif, together with the ms_print reporting tool.

Massif is a heap memory profiler: it measures how much heap memory your code allocates and when, and shows the code involved. Run:

valgrind --tool=massif <your_program> [args]

This executes the program and generates a massif.out.<pid> file that you can visualize with:

ms_print massif.out.<pid>

Give it a try: the output is really useful, and you get a histogram of how much memory is used at each sampling point.

Hash functions – efficiency and speed

A hash function is any function that can be used to map data of arbitrary size to data of a fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup.
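As a concrete illustration, here is a minimal sketch of one well-known non-cryptographic hash, 64-bit FNV-1a, together with the usual way of turning a hash value into a hash table slot when the table size is a power of two. The function names are mine, and the result is cross-checked against Go's hash/fnv package; this is not the C code used for the benchmarks later in this post.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Standard 64-bit FNV parameters.
const (
	fnvOffset64 = 14695981039346656037
	fnvPrime64  = 1099511628211
)

// fnv1a64 maps a key of arbitrary length to a fixed-size 64-bit value:
// XOR each byte into the state, then multiply by the FNV prime.
func fnv1a64(key string) uint64 {
	h := uint64(fnvOffset64)
	for i := 0; i < len(key); i++ {
		h ^= uint64(key[i])
		h *= fnvPrime64
	}
	return h
}

// bucketIndex maps a hash to a table slot. It assumes tableSize is a
// power of two, so the modulo can be done with a bit mask.
func bucketIndex(h, tableSize uint64) uint64 {
	return h & (tableSize - 1)
}

// stdlibFNV1a64 computes the same hash with hash/fnv, as a cross-check.
func stdlibFNV1a64(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

func main() {
	const tableSize = 1 << 22 // 4194304 slots
	for _, k := range []string{"", "a", "benchmark"} {
		h := fnv1a64(k)
		fmt.Printf("%q -> %#x (stdlib %#x) slot %d\n",
			k, h, stdlibFNV1a64(k), bucketIndex(h, tableSize))
	}
}
```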

If you happen to need a non-cryptographic hash function (and you might need one even if you are using languages that have built-in hash tables, like Java or C++), here are some references taken from various sources:

Interesting comparison of different hash tables :

http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Some more hash functions here :

http://www.cse.yorku.ca/~oz/hash.html

Some really interesting benchmarks by Peter Kankowski: strchr.com/hash_functions

And again hash benchmarks sanmayce.com/Fastest_Hash/.

Hash benchmarks in Rust : https://medium.com/@tprodanov/benchmarking-non-cryptographic-hash-functions-in-rust-2e6091077d11

And more benchmarks : https://medium.com/logos-network/benchmarking-hash-and-signature-algorithms-6079735ce05

A minimal set of benchmarks, run on random keys of random length, with a 1.5 multiplication factor on the hash size and hash sizes rounded up to the next power of 2 :

Tests are run on random keys
--- hash_function speed on random long (100-350) keys
Keys    HashFunc         HashSize AvgTime(ns) Collisions LongestChain DimFactor
1000000 djb_hash         4194304    191.03      110620    6            1.500000
1000000 murmur64a_hash   4194304    60.87       110830    7            1.500000
1000000 wyhash_hash      4194304    36.69       110170    5            1.500000
1000000 elf_hash         4194304    444.25      110488    5            1.500000
1000000 jen_hash         4194304    143.90      110471    5            1.500000
1000000 djb2_hash        4194304    189.63      111048    6            1.500000
1000000 sdbm_hash        4194304    247.92      110087    6            1.500000
1000000 fnv_hash         4194304    247.87      110170    6            1.500000
1000000 oat_hash         4194304    309.55      110538    6            1.500000

--- hash_function speed on short keys (10 to 45 char len)
Keys    HashFunc         HashSize AvgTime(ns) Collisions LongestChain DimFactor
1000000 djb_hash         4194304    31.14       110363    6            1.500000
1000000 murmur64a_hash   4194304    28.24       110264    6            1.500000
1000000 wyhash_hash      4194304    21.84       109798    5            1.500000
1000000 elf_hash         4194304    49.88       110845    5            1.500000
1000000 jen_hash         4194304    33.28       110541    5            1.500000
1000000 djb2_hash        4194304    31.21       110664    6            1.500000
1000000 sdbm_hash        4194304    35.60       109906    6            1.500000
1000000 fnv_hash         4194304    35.59       109940    6            1.500000
1000000 oat_hash         4194304    43.28       110123    6            1.500000

Tests were done with the hash table code you can find at github.com/paulborile/clibs

And last, a research study on SIMD-optimized hash functions : arxiv.org/pdf/1612.06257.pdf for some theory, and code at github.com/google/highwayhash/blob/master/c/highwayhash.c. They all use the vector extensions present in gcc.

Code benchmarks : how can I measure how fast my software is (and make it faster) ?

You have probably heard a coder say “this way is faster” many times : nowadays nothing could be more wrong, unless you prove the statement by measuring the same piece of code running with algorithm A and with algorithm B (the supposedly faster one). There were times when you could measure the execution time of a program 1, 2 or 10 times without noticeable variability : those times are gone. Measuring how fast a piece of software runs is about getting a set of results with the lowest possible variability.

Variability in computing time in modern computer architectures is just unavoidable; while we can guarantee the results of a computation we cannot guarantee how fast this computation will be :

“Computer can reproduce answers, not performance” : Bryce Adelstein Lelbach, https://youtu.be/zWxSZcpeS8Q?t=6m45s

The reasons for variance in computation time can be summarized as :

Hardware jitter : instruction pipelines, cpu frequency scaling and power management, shared caches and many other things
OS activities : a huge list of things the kernel can do to screw up your benchmark performance
Observer effect : every time we instrument code to measure performance we introduce variance.

Also warming up the cpu seems to have become necessary to get meaningful results. Running hot instead of cold on a single piece of code is well described here :

https://youtu.be/zWxSZcpeS8Q?t=18m51s

That said, the process for benchmarking a piece of code is :

Write some benchmark code : define a subset of execution (let’s call it op1) in your code and measure how many op1/sec you get. Do the same for other parts (op2, op3), but make sure that all parts can be run independently of each other
Make sure that the population of samples you get is “normally” distributed

In detail, I measure with C code and follow these rules :

Use GNU Scientific Library (gsl) for mean, median, variance, stdev, skew running stats
Use clock_gettime() with the CLOCK_MONOTONIC clock. An example code here.
Check whether the set of samples has a ‘normal’ distribution : find the algorithm you like here (I use |mean - median| / max(mean, median) < 1%, not really correct but ok for my purposes).
Narrow down as much as possible the size of the code you are measuring
Warm up the cpu before taking samples to avoid having too much variability. You do this by running the code you want to measure many times before actually taking samples. Ideally you should run it until you get normally distributed results. If you don’t, then the piece of code you are measuring is probably too big and you are experiencing the effects of other perturbations (os ? io ?).
Consider using a benchmark framework : Google Benchmark (C++) is great for this, but I’m afraid you will have to measure C++ code.
Lock the cpu clock (if you can) on the host you are using for benchmarks.
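The rules above can be sketched in a few lines. I use C with gsl for the real measurements; the version below is only a Go illustration of the same steps (warm up, take samples with a monotonic clock, compare mean and median), and every function name, the workload and the sample counts in it are made up for the example. In Go, time.Now() carries a monotonic clock reading and time.Since() uses it, which plays the role of clock_gettime() with CLOCK_MONOTONIC.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// mean returns the arithmetic mean of xs (0 for an empty slice).
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the median of xs without modifying it (0 if empty).
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// looksSymmetric applies the rough check from the list above:
// |mean - median| / max(mean, median) < tol. It is not a real normality test.
func looksSymmetric(xs []float64, tol float64) bool {
	m, md := mean(xs), median(xs)
	return math.Abs(m-md)/math.Max(m, md) < tol
}

// sample runs op warmup times without measuring, then returns n timings
// in nanoseconds taken with the monotonic clock.
func sample(op func(), warmup, n int) []float64 {
	for i := 0; i < warmup; i++ {
		op()
	}
	out := make([]float64, n)
	for i := range out {
		start := time.Now()
		op()
		out[i] = float64(time.Since(start).Nanoseconds())
	}
	return out
}

// sink keeps the result of the workload observable outside the loop.
var sink uint64

func main() {
	// Arbitrary small workload standing in for "op1".
	op := func() {
		var s uint64
		for i := uint64(0); i < 100000; i++ {
			s += i * i
		}
		sink = s
	}
	xs := sample(op, 1000, 2000)
	fmt.Printf("mean=%.0f ns median=%.0f ns symmetric(1%%)=%v\n",
		mean(xs), median(xs), looksSymmetric(xs, 0.01))
}
```

If the check fails, the options are the ones listed above: warm up longer, shrink the code being measured, or look for OS and I/O interference.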

Prepare to see strange results 🙂

Paul Stephen Borile

software teams management, golang/c coding, bass player, biking addicted