Benchmarking golang code

Let’s say you want to know whether hex.EncodeToString is faster than fmt.Sprintf: you need to compare the speed of this function

func Md5Encode(str string) string {
	md5HashInBytes := md5.Sum([]byte(str))
	md5HashInString := hex.EncodeToString(md5HashInBytes[:])
	return md5HashInString
}

with this other one

func Md5EncodeFmt(str string) string {
	md5HashInBytes := md5.Sum([]byte(str))
	md5HashInString := fmt.Sprintf("%x", md5HashInBytes)
	return md5HashInString
}

Go provides benchmarking support in the testing package, which is pretty useful:

func BenchmarkMd5EncodeFmt(b *testing.B) {
	// run the Md5EncodeFmt function b.N times
	for n := 0; n < b.N; n++ {
		Md5EncodeFmt("aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL")
	}
}

func BenchmarkMd5Encode(b *testing.B) {
	// run the Md5Encode function b.N times
	for n := 0; n < b.N; n++ {
		Md5Encode("aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL")
	}
}


$ go test -bench=.
goos: linux
goarch: amd64
BenchmarkMd5EncodeFmt-8   	 1894791	       625 ns/op
BenchmarkMd5Encode-8      	 3068509	       363 ns/op
ok  	_/home/paul/LazyInit/bench	3.342s

Run the benchmarks 3 times:

$ go test -count 3 -bench=. 
goos: linux
goarch: amd64
BenchmarkMd5EncodeFmt-8   	 1882105	       627 ns/op
BenchmarkMd5EncodeFmt-8   	 1918942	       624 ns/op
BenchmarkMd5EncodeFmt-8   	 1902894	       625 ns/op
BenchmarkMd5Encode-8      	 3139585	       386 ns/op
BenchmarkMd5Encode-8      	 2937154	       397 ns/op
BenchmarkMd5Encode-8      	 3009801	       380 ns/op
ok  	_/home/paul/LazyInit/bench	10.217s

EncodeToString() makes your function roughly 1.6 to 1.7 times faster (about 360 to 400 ns/op against about 625 ns/op)!
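If you want to try the comparison without setting up a _test.go file, the same measurement can be sketched as a standalone program. This is only a minimal sketch: it uses testing.Benchmark from the standard library to run each function, and the sink variable, the cases table and the call to ReportAllocs are my own additions, not part of the benchmark above.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"testing"
)

// Md5Encode hex-encodes the MD5 sum with encoding/hex.
func Md5Encode(str string) string {
	sum := md5.Sum([]byte(str))
	return hex.EncodeToString(sum[:])
}

// Md5EncodeFmt hex-encodes the MD5 sum with fmt.Sprintf.
func Md5EncodeFmt(str string) string {
	sum := md5.Sum([]byte(str))
	return fmt.Sprintf("%x", sum)
}

// sink keeps the results alive so the calls are not optimized away
// (an assumption of this sketch, the original benchmarks discard the result).
var sink string

func main() {
	const input = "aldfhasdl la fasdfeo8ekldjh asdkj fh lksdjfhwoieuxnroiAUN;laiDJ;ANIfub;OEIRBUF;OEfuN;ALFJ;AL"

	cases := []struct {
		name string
		fn   func(string) string
	}{
		{"Md5EncodeFmt", Md5EncodeFmt},
		{"Md5Encode", Md5Encode},
	}

	for _, c := range cases {
		fn := c.fn
		// testing.Benchmark picks b.N on its own, as "go test -bench" does.
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for n := 0; n < b.N; n++ {
				sink = fn(input)
			}
		})
		fmt.Printf("%-14s %s %s\n", c.name, res.String(), res.MemString())
	}
}
```

The allocation figures printed next to ns/op usually help explain where a difference like this comes from; the numbers will of course vary from machine to machine.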

Thanks year 2000 : less is immensely more (the 90s produced a lot of crap)

Thank god, after the year 2000 information technology started moving towards more pragmatic, simple and effective tools and languages. Some examples that, in my opinion, make this evident:

Languages and language tools

  • go, rust and swift were all born with the goal of simplifying their direct parents (C++, Objective-C) and removing their pitfalls.
  • UML abandoned: this is a relief for all coders who had to deal with it. I don’t know anyone using it nowadays.
  • git: finally someone (thanks Linus Torvalds) simplified svn/sourcesafe by putting the features developers need into a clear, pretty intuitive command line interface
  • atom/sublime: a reaction to the complexity of Visual Studio, IBM Rational, Eclipse? I think so


  • Key-value stores/NoSQL just take the ER/SQL model and make it simpler, providing only the features needed in 99% of applications. Boyce-Codd normal form is pretty nice and interesting, but in real world applications you’ll never use it.
  • Object databases completely disappeared, and in some way so did the idea that OO methodology/hierarchy could be applied everywhere (just because you were using OO languages)


  • docker/rkt are slim alternatives to virtualization and virtual machines


  • aren’t plain old REST APIs just a simple way of doing things without having to use Corba/SOAP?
  • gRPC: provides Corba-like features while being an order of magnitude more efficient and portable to any platform.

What I’m saying is that the 90s produced a lot of unnecessarily complicated tools and technologies which developers just did not need or like, and which are progressively being replaced with simpler stuff.


11-36 Cassette on Tiagra RD 4700 (hacking a tiagra rear derailleur)

The Tiagra rear derailleur is often mounted on gravel bikes, but I did not like its limit of 34 teeth for the largest cog. My hacker attitude comes out in these situations (and I don’t care about voiding the component warranty), so I decided to modify the derailleur to handle larger cogs.

The problem is that the B-screw is short and won’t let you increase the distance between the upper pulley and the largest cogs of the cassette. While some tutorials suggest fitting a longer B-screw, I tried what can be seen below:

I took a chain pin and inserted it to make the mechanical stop longer. Some thread locker or resin to keep it in place and you’re set.

It works 🙂


Milan skies after fires in the alps, 2018

Fridays for Future is running and I feel the need to make sure (firstly for myself) that the process that will take us completely away from fossil fuel consumption is possible: maybe long, but possible.

Decarbonization (the name given to the biggest revamping project in the world) is possible; it will require money and time, and it will require Politics, Science and Industry to work together toward the goals of:

  • producing electricity entirely from renewable sources and decentralizing production
  • reducing energy consumption in all areas where this is possible
  • developing decentralized smart grids and electricity storage
  • substituting direct fossil fuel consumption with renewable alternatives
  • stopping deforestation
  • substituting fossil fuel derived products with recycled fossil derived ones (or carbon free ones where possible)
  • 100% recycling, with waste to energy for what cannot be recycled

Ambitious plan? I think this is the biggest revamping project you can imagine and it is already running, but I think the message we all sent last Friday is that we need to ‘deliver’ sooner 🙂

Producing electricity totally from renewable sources

48% of CO2 is emitted producing heat or electricity. Many countries are already active in producing electricity from renewables; take Germany for example. 7 years ago (just after Fukushima) Germany started phasing out nuclear power by increasing the share of energy produced by renewables. Some data:


In 1 year Germany increased production from wind energy, for example, by 20 GWh. Continue this for 10 years and you reach more than half of the whole country’s energy requirements. In fact Germany has also started a plan for removing coal from energy production.

In the first 6 months of 2019 Germany produced more energy from renewables than from fossil/nuclear sources: here for some references.

Energy efficiency

Again from Germany: a national plan to increase the efficiency of systems in all areas, which is estimated to produce a saving of 12 to 20% over 2020. Reducing the current energy footprint is fundamental for allowing new segments of activity to start using clean energy (think of electric traction in automotive, which is going to increase national demand)

Decentralize smart grid and electricity storage development

The example here comes from Australia, where the private energy company GreenSync is encouraging customers to set up local electricity storage to be used when there is a shortage of power on the grid. Customers are paid for the storage. For reasons not known to me, the biggest development in decentralized grid and storage is taking place in Australia and Japan.

Substitute direct fossil fuel consumption with renewable alternatives, limit impact of direct CO2 emission

This is probably the biggest task in the project because it is spread over tens of different segments which need to be revamped to achieve the goal:

Road Transportation: around 15% of total CO2 emissions. Redesigning this segment is going to be one of the most serious tasks: cars and trucks make up 1/3 of the CO2 emissions in countries like the US and it is mostly a consumer segment. Battery powered electric cars, pickups and trucks seem to be the direction, with Tesla, the real game changer, paving the road. The whole automotive industry is trying to catch up: Volkswagen group announced 44 billion in investments over the next 5 years.

Agriculture: how much CO2 is produced by agriculture is the most controversial issue, with estimates ranging from 13% of total CO2 emissions to 18% in FAO docs, up to 51% when including the effect of not having forests where we grow food for cows, pigs and chickens. These emissions come mainly from cattle belching (CH4) and the addition of natural or synthetic fertilizers and wastes to soils. Here the only possible change is to reduce the use of fertilizers and to reduce cattle breeding by eating less meat. Read Jonathan Safran Foer’s book if you want to dig into this more.

Maritime Transportation: 5% of total CO2 emissions. The world’s merchant fleet consists of around 100,000 ships, which are estimated to consume 250 million tonnes of bunker fuel annually. Just one Capesize bulk carrier (bulker) can use 40 metric tonnes of fuel or more a day, leading to an annual fuel consumption of approximately 10,400 tonnes. This results in the emission of around 32,988 tonnes of CO2 and 959 tonnes of SOx or more. This is just from one ship. There are still no real prototypes in this area afaik, but there are good projects with potential, like Acquarius.

Air Transportation: 2% to 3.5% of total CO2 emissions. Various activities are under way to reduce the carbon footprint of aviation.

Substitute fossil fuel derived products with fossil derived recycled ones (or carbon free ones if possible)

This is probably the biggest task in the project because it is spread over tens of different segments which need to be revamped to achieve the goal:

  • Plastics
  • Lubricants
  • Process Chemicals
  • Carpeting
  • Pharmaceuticals
  • Rubber Goods
  • Adhesives
  • Cosmetics
  • Footwear
  • Paints
  • Detergents
  • Inks
  • Sealants
  • Fragrances
  • Solvents
  • Caulking
  • Compounds
  • Fertilizers
  • Fibers
  • Tires

This point requires a complete structured analysis of its own. The International Energy Agency dedicates a complete section to petrochemicals. They are not easy to replace: recycling will be the solution while Science and Industry find better substitutes.

I’ll stop here, at least for now: the message I’m trying to share is that the matter is highly complex and cannot be reduced to just switching off the air conditioning or things of that kind.

ALL activities have to be done at the same time (thanks Greta for having said this) and Politics IS the driver for all of them.

The Pragmatic Programmer

I think this book is full of valuable thoughts that I would like to recap in this post :

A broken window.
One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment—a sense that the powers that be don’t care about the building. So another window gets broken. People start littering. Graffiti appears. Serious structural damage begins. In a relatively short space of time, the building becomes damaged beyond the owner’s desire to fix it, and the sense of abandonment becomes reality.

How often this applies to software: you can have the best design guidelines, but leaving a broken window (bad design, wrong decisions, poor code) will slowly propagate that error to all the new code written.

Know when to stop

In some ways, programming is like painting. You start with a blank canvas and certain basic raw materials. You use a combination of science, art, and craft to determine what to do with them. You sketch out an overall shape, paint the underlying environment, then fill in the details. You constantly step back with a critical eye to view what you’ve done. Every now and then you’ll throw a canvas away and start again.
But artists will tell you that all the hard work is ruined if you don’t know when to stop. If you add layer upon layer, detail over detail, the painting becomes lost in the paint.

I read this as: don’t over-engineer. Let your code do its job for some time, don’t over-refine.

DRY (Don’t Repeat Yourself)

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

We all know this, right? But it is not a matter of duplicating code: it is about duplicating knowledge.
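A tiny Go sketch of what "duplicating knowledge" can mean in practice. The rule here (a name is at most 32 characters) and all the identifiers are invented for the example; the point is only that the limit is written down once, and both the check and the error message derive from it.

```go
package main

import (
	"fmt"
	"strings"
)

// MaxNameLen is the single authoritative representation of the rule
// "a name is at most 32 characters" (32 is an arbitrary value for this sketch).
const MaxNameLen = 32

// ErrNameTooLong builds its message from MaxNameLen instead of repeating "32".
var ErrNameTooLong = fmt.Errorf("name longer than %d characters", MaxNameLen)

// ValidateName is the only place where the rule is enforced.
func ValidateName(name string) error {
	if len([]rune(name)) > MaxNameLen {
		return ErrNameTooLong
	}
	return nil
}

func main() {
	fmt.Println(ValidateName("paul"))
	fmt.Println(ValidateName(strings.Repeat("x", MaxNameLen+1)))
}
```

If the same limit were also hard-coded in the error text, in a form validator and in a database migration, changing it would mean hunting down every copy: that is the duplication the book warns about, even when no two lines of code look alike.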

Orthogonality

In computing, the term has come to signify a kind of independence or decoupling. Two or more things are orthogonal if changes in one do not affect any of the others. In a well-designed system, the database code will be orthogonal to the user interface: you can change the interface without affecting the database, and swap databases without changing the interface.

You are familiar with orthogonality (modular, component-based and layered are synonyms). I read this as: think of your module/component as a service that exposes an API to its users:

  • efficient development (no one is waiting for anyone else to get stuff done)
  • easy to test : orthogonal systems can be tested independently
  • easy to understand how to use
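The database/interface example from the quote can be sketched in Go as follows. This is just my illustration under simple assumptions: Store, memStore and Greeting are hypothetical names, and an in-memory map stands in for a real database.

```go
package main

import "fmt"

// Store is the only thing the "interface" side knows about persistence.
type Store interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

// memStore is one possible backend; a SQL or key-value backed type could
// replace it without touching Greeting.
type memStore struct {
	data map[string]string
}

func newMemStore() *memStore {
	return &memStore{data: map[string]string{}}
}

func (m *memStore) Get(key string) (string, bool) {
	v, ok := m.data[key]
	return v, ok
}

func (m *memStore) Put(key, value string) {
	m.data[key] = value
}

// Greeting is presentation code: it depends on the Store API only.
func Greeting(s Store, userID string) string {
	if name, ok := s.Get(userID); ok {
		return "Hello, " + name
	}
	return "Hello, stranger"
}

func main() {
	s := newMemStore()
	s.Put("u1", "Paul")
	fmt.Println(Greeting(s, "u1"))
	fmt.Println(Greeting(s, "u2"))
}
```

Because Greeting only sees the Store interface, it can be tested with the in-memory backend alone, which is the "tested independently" point from the list above.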

To be continued!

Upload filters aka The Censorship machine

In 2016 the European Commission filed a proposal for a directive in the area of digital markets and copyright. As part of this proposal, Article 13 introduces a new concept:

Internet platforms hosting “large amounts” of user-uploaded content must monitor user behavior and filter their contributions to identify and prevent copyright infringement.

As you may imagine, this changes the game quite a lot.

Let’s take an example: a holder of music rights may ask a platform like Soundcloud (Germany) to keep watch over a set of their works. Soundcloud will have to start monitoring all uploads to make sure that those materials are not uploaded by anyone on their platform.

The impact of this regulation, if it passes, will be pretty strong on the economy of EU countries. Let’s try to put down some points:

  1. Putting all the control burden on internet platforms hosting content will probably:
    • make it much more difficult for EU companies to compete with US/Asia content providers
    • drive new startups and existing companies away from EU countries in order not to have to comply with the regulation

  2. Filter technology is too vast and complicated to be tackled by every single content provider: hundreds of rightholders requiring control over multiple sets of data (text, images, audio, video, music scores, software code) will create the need for content check providers that will de facto have censorship power.

  3. Guilty until proven innocent paradigm: if a filter erroneously blocks legal content, it will be up to the content owner to fight to have the content reinstated
  4. False positives: as in all automated checking procedures, the number of false positives could be extremely high, resulting in a limitation of freedom of expression

Many campaigns around this can be found :

  • Save Your Internet: “Stand up and ask Europe to protect Your Internet” (offers contact-your-MEP tool)
  • Say No to Online Censorship by the Civil Liberties Union for Europe: “Act now! It’s about our freedom to speak. It’s about censorship.” (offers email-your-MEP tool)
  • #SaveTheMeme, referring to parodies and other expressions of web culture that may be removed by such filtering technology
  • Create•Refresh: “These changes put the power of small, independent creators in jeopardy. Creative expression will effectively be censored, leaving only the bigger, more established players protected. Many of the sites that we use every day for information or entertainment may cease to exist.”
  • Save Codeshare

Thanks to Julia Reda (Pirate Party, EU Parliament member) for a lot of information on this topic.

Allocating memory inside a Varnish vmod

Writing varnish modules is pretty well documented by the standard varnish documentation, by tutorials, and thanks to valuable work from other people here. There are some areas I felt needed further clarification and this post tries to provide it.

Allocating memory inside a vmod is tricky if you need to free it when the current request is destroyed. Here are some ways:

  • per request memory allocation (i.e. the scope is the request lifetime, so memory will be freed when the request is destroyed):
void WS_Init(struct ws *ws, const char *id, void *space, unsigned len);
unsigned WS_Reserve(struct ws *ws, unsigned bytes);
void WS_MarkOverflow(struct ws *ws);
void WS_Release(struct ws *ws, unsigned bytes);
void WS_ReleaseP(struct ws *ws, char *ptr);
void WS_Assert(const struct ws *ws);
void WS_Reset(struct ws *ws, char *p);
char *WS_Alloc(struct ws *ws, unsigned bytes);
void *WS_Copy(struct ws *ws, const void *str, int len);
char *WS_Snapshot(struct ws *ws);
int WS_Overflowed(const struct ws *ws);
void *WS_Printf(struct ws *ws, const char *fmt, ...) __printflike(2, 3);

This is a per worker thread workspace allocation; no free is necessary as the data is removed when the request is destroyed. Ex:

VCL_STRING
vmod_hello(const struct vrt_ctx *ctx, VCL_STRING name)
{
	char *p;
	unsigned u, v;

	u = WS_Reserve(ctx->ws, 0); /* Reserve some work space */
	p = ctx->ws->f;             /* Front of workspace area */
	v = snprintf(p, u, "Hello, %s", name);
	v++;                        /* Count the terminating NUL as well */
	if (v > u) {
		/* No space, reset and leave */
		WS_Release(ctx->ws, 0);
		return (NULL);
	}
	/* Update work space with what we've used */
	WS_Release(ctx->ws, v);
	return (p);
}

Data is allocated starting with 64k and then when needed in 4k chunks in the ctx->ws area. No varnish imposed limit.

  • (since varnish 4.0) Private Pointers: a way to have multi-scoped private data per VCL or per TASK. You may access private data either as passed in the VCL function signature or by calling VRT_priv_task(ctx, "name") directly, for example to obtain a per request place to hold:
    • free function
    • pointer to allocated data

This method is very interesting if you need a cleanup function to be called when the varnish request is destroyed.


Webassembly/wasm and asm.js

The WebAssembly thing. I’ll try to clarify a few things that I learned while working on it:

  1. WASM: short for WebAssembly, a binary instruction format that runs on a stack-based virtual machine. Wasm is designed as a portable compilation target for high-level languages like C/C++/Rust, to be run on the Web. Reference here
  2. asm.js: a subset of JS, statically typed and highly optimizable, created to allow running applications written in higher level languages like C on the Web. Reference here and here

So you would say 1 and 2 have the same purpose: AFAIK yes. You can also convert asm.js to wasm and decode wasm back to asm.js (theoretically). It seems that WASM, unlike asm.js, is going to be extended in the future.

Let’s continue :

  1. emscripten: toolchain to compile high level languages to asm.js and WASM. It uses LLVM and also does some API conversion (OpenGL to WebGL, for example): it compiles to LLVM IR (LLVM bitcode) and then from LLVM IR bitcode to asm.js using Fastcomp.
  2. Binaryen (asm2wasm): compiles asm.js to wasm and is included in emscripten (?)

Supposing that you have a C/C++ project made of different libraries, I suggest compiling every single component to LLVM IR bitcode and generating asm.js/wasm for execution only during the link phase. This lets you keep your build/link steps as you would have them in a standard object code generation environment.
emscripten/LLVM offer a full set of tools to work on IR bitcode if you like:

  • emmake: use existing makefiles by running emmake make
  • emconfigure: use an existing configure script by running emconfigure ./configure <options>

Also if you want to dig deeper into llvm :

  • lli : directly executes programs in LLVM bitcode format. It takes a program in LLVM bitcode format and executes it using a just-in-time compiler or an interpreter
  • llc : compiles LLVM source inputs into assembly language for a specified architecture. The assembly language output can then be passed through a native assembler and linker to generate a native executable

Once you have all your compiled libraries/components in LLVM IR Bitcode you have to generate WASM. The basic compile command is :

emcc -s WASM=1 -o <prog>.html <prog>.c -l<anylibraryyouneed>

but :

  1. If you are using malloc/free you need to add: -s ALLOW_MEMORY_GROWTH=1
  2. If you are using pthreads in your code/libraries you need to add: -s USE_PTHREADS=1, but as of Jan 2019 you can’t have both malloc/free and pthreads. More info here.

More to come soon.

Profiling a golang REST API server

go tool profiling

Profiling :

is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

How can you profile your golang REST API server in a super simple way?

First, add an import to your server code:

import _ "net/http/pprof"

And then add a listener (I normally use a command line flag to trigger this) :

go func() {
	// nil handler: serve http.DefaultServeMux, where pprof registered itself
	http.ListenAndServe("localhost:6000", nil)
}()

Start your server and generate some load. While your code is running under the load you generated extract the profiler data :

go tool pprof http://localhost:6000/debug/pprof/profile
Fetching profile over HTTP from http://localhost:6000/debug/pprof/profile
Saved profile in /home/paul/pprof/pprof.wm-server.samples.cpu.008.pb.gz
File: wm-server
Build ID: c806572b51954da99ceb779f6d7eee3600eae0fb
Type: cpu
Time: Dec 19, 2018 at 1:41pm (CET)
Duration: 30.13s, Total samples = 17.35s (57.58%)
Entering interactive mode (type "help" for commands, "o" for options)

You have many commands at this point but what I prefer to do, having used kcachegrind for years, is to fire it up using the kcachegrind command :

(pprof) kcachegrind

This will generate a callgrind formatted file and run kcachegrind on it, letting you do all the usual analysis that you’re probably already used to (call graph, callers, callees ...)


glibc 2.25 bug : strstr() runs 10 times slower than on 2.24

Linux is used on 54.9% of the world’s websites: almost every application running on a linux machine uses glibc, which provides the core libraries to access almost every feature of a linux system. The Mighty Glibc started back in 1988 and is a wonderful and glorious project.
As far as the string functions are concerned, the sse/avx optimized versions of these functions (strlen, strcpy, strstr, strcmp and more) are up to 10 times faster than the corresponding standard C implementations (which you might find, for example, in musl libc) when run on an sse/avx capable cpu.

We rely a lot on glibc string functions and that’s how we found out that glibc 2.25 introduced some optimizations for AVX capable processors which disabled the sse* optimizations for functions that don’t have an avx2 optimized implementation (strstr, strcat, and I’m afraid parts of the math functions). For further details go here.
The bug affects Ubuntu 18, Debian 10, Fedora 26 to 28.
A fix will come for sure, hopefully in glibc 2.29.