Category Archives: coding

Code benchmarks : how can I measure how fast my software is (and make it faster) ?

You probably heard many times a coder say “this way is faster” : nothing can be more wrong nowadays unless you prove your statement by measuring the same piece of code running with algorithm A and algorithm B (the faster one). And there were times when you could measure the speed of the execution time of a program 1, 2 or 10 times without noticeable variability : those times are gone. Measuring how fast a piece of software runs is about getting a set of results with the lowest possible variability.

Variability in computing time in modern computer architectures is just unavoidable; while we can guarantee the results of a computation we cannot guarantee how fast this computation will be :

“Computer can reproduce anwsers, not performance” : Bryce Adelstein Lellback, https://youtu.be/zWxSZcpeS8Q?t=6m45s

Reasons for variance in computation time can be recap in :

  • Hardware jitter : instruction pipelines, cpu frequency scaling and power management, shared caches and many other things
  • OS activities : a huge list of things the kernel can do to screw up your benchmark performance
  • Observer effect : every time we instruments code to measure performance we introduce variance.

Also warming up the cpu seems to have become necessary to get meaningful results. Running hot instead of cold on a single piece of code is well described here :

https://youtu.be/zWxSZcpeS8Q?t=18m51s

This said the process for benchmarking an piece of code is :

  • Write some benchmark code : define a subset of execution (let’s call it op1) in your code and measure how many op1/sec. Do the same for other parts (op2, op3) but be sure that all parts can be run indipendently from each other   
  • Assure that samples population that we get is “normal” distributed

In details I use C code to measure and :

  • Use GNU Scientific Library (gsl) for mean, median, variance, stdev, skew running stats
  • Use CLOCK_MONOTONE clock_gettime() mode. An example code here.
  • Check if the set returned has ‘normal’ distribution : find the algorithm you like here (I use mean-median/max(mean,median) < 1%, not really correct but ok for my purposes).
  • Narrow down as much as possible the size of the code you are measuring
  • Warm up the cpu before taking samples to avoid having too much variability. You do this by running the code you want to measure many times before actually taking samples. Ideally you should run it until you have normal distributed results. If you don’t then probably the piece of code you are measuring is too big and you are experiencing the effects of other perturbations (os ? io ?).
  • Consider using a benchmark framework : google benchmark (c++) is great for this but I’m afraid you will have to measure C++ code.
  • Lock the cpu clock (if you can) on the host you are using for benchmarks.

Prepare to see strange results 🙂

gcc/clang -fsanitize is saving lifes

 There where times when you had to write C/C++ code and find all the bounds errors and memory leaks by hand or with ancient tools (who remembers Purify?). Valgrind introduced a lot of features but sanitize features added in gcc/clang are just awesome (thanks google guys for this). Just add -fsanitize=address both to compilation and link of your unit tests (we use google test framework), execute the test and get valuable info on memory leaks and out of bounds access to data. In detail :

Testing and inclusion of sanitize is under way in our lab : more info on this in a future post. Work from this team led to the inclusion in Golang of the Data Race detector.

 

Creating logstash development environment for core or plugins

I had to develop a ruby logstash plugin recently and I going to recap all necessary steps to create a clean jruby env for development :

  • i had to uninstall ruby and install a clean jruby only stack. Not saying that this is necessary but in my case it only worked this way
gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
curl -sSL https://get.rvm.io | bash -s stable --ruby=jruby-9.1.10.0
rvm alias create default jruby-9.1.10.0
source $HOME/.rvm/scripts/rvm
ruby -v
jruby 9.1.10.0 (2.3.3) 2017-05-25 b09c48a OpenJDK 64-Bit Server VM 25.151-b12 on 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12 +jit [linux-x86_64]
  • Install and compile logstash just to see if everything works. Check out logstash and from inside the directory :
jruby -S gem install rake bundler
rake bootstrap
rake plugin:install-default
bin/logstash -e 'input { stdin { } } output { stdout {} }'

  • Now compile the plugin from the plugin folder
bundle install
bundle exec rspec # test it
gem build logstash-filter-<yourplugin>.gemspec
  • Install the plugin gem and test it
bin/logstash-plugin install logstash-filter-<yourplugin>-1.0.0.gem

 

Google C++ style guide : something every C++ coder out there should read

Having over 100 million lines of C++ code, google took the C++ problem (language with so many features, different ways of doing the same things, dangerous constructs, some pointers to the problems with c++ can be found on wikipedia, and here for an authoritative opinion)  seriously and this is what came out.

Some interesting decisions taken :

“We do not use C++ exceptions.” from here

“Avoid defining macros, especially in headers; prefer inline functions, enums, and const variables. Name macros with a project-specific prefix. Do not use macros to define pieces of a C++ API.” from here.

“Avoid complicated template programming” from here

 

REST API Server and Windows : ASP.NET Core kestrel, OWIN or plain old IIS + ASP.NET Framework ?

If you are planning to build API and make them available as micro services and your code assets are C#/.NET Framework/Windows then you have some choices :

  1. ASP.NET Core comes with an internal  web server called Kestrel : kestrel claims to be a light and fast web framework which runs ASP.NET Core applications without the need for IIS in front (you might decide that IIS is usefull for other things). Kestrel allows for middleware modules (like IIS ISAPI filters) that allow for adding features to the basic server.
  2. Katana (OWIN Microsoft implementation) is a collection of NuGET packages for building OWIN applications.
  3. Plain old IIS + ASP.NET REST API code

Some goo information can be found here from uShip guys
I’m basically writing this for myself as a reminder for the new .NET Core Microsoft environment.

On the importance of having a second head to test code and TDD ( Coders just don’t want to break their code )

Good Coders don’t want to break their code because their code is like a creature and nobody wants to harm a creature. Take a note, this is how it is. Good Coders tests are just ‘combing’ the code they should test. But this creature has bugs, no matter how good the Coder is, so good Coders are bad at writing tests on their own code.

A good test tries to tear code apart, to abuse it doing the nastiest things, at the maximum possible speed, in the worst possible way : this is why you should consider making another set of tests by another coder which has not written the code itself.

But with TDD this is difficult : I like the idea that a Coder writes it’s software use cases before writing the software itself but I’m missing how to include in this process a second head to develop some really nasty tests on that piece of code and I’m not a great fan of pair programming (for many reasons).

So open issue for me here : how to include development by a second head of integration tests.

What’s wrong with word “simple” ? ( on over engineering code )

Over engineering code is a common plague that leads to a number of unwanted side effects in your software ranch. This post does not want to be an exhaustive guide on over engineering but with the help of many quotes  from different sources, I’ll try to summarize the impacts, when you can have evidence of over engineered code and the causes (to try to avoid them).

Impacts

  • unneccessary complexity => less agility in software
  • more code, more bugs
  • more code, more unit tests, more development time
  • performance impacts
  • longer catch up time for newcomers
  • difficult maintenance, fixing bugs requires more time than it should
  • small changes require big efforts and are prone to having many bugs
  • unwanted features are present even though there is not reference in requirements/backlogs and so you have to maintain them even if they are not used
  • false perception of complexity in the code that leads to over inflated estimates for fixes, changes.
  • extending an over engineered project makes extensions more complex than necessary
  • dismantling of over engineered code takes way longer than it took to over engineer it
  • Need for complete rewrite over time to reduce unneeded complexity when maintenance times become higher than rewrite time This is an interesting topic itself and needs a separate post)

Concrete symptons  of over engineered code

Really difficult here to point out some easy and simple guidelines to recognize over engineered code. Often it is an individual perception and this does not help. I’ll try :

  • When a well experienced coder, which has been working on that piece of code for at least an year, still takes way more than necessary/affordable to figure out where/how a specific feature works. I’m trying to find a better explanation for this concept
  • In code : using a factory even if it is making type of objects
  • In code : use an interface even if it is actually going to be implemented by just one class
  • From PrematureGeneralization

“This is really going to be a clean framework. I’ll make an abstract class out of this part so that folks can subclass it later, and I’ll put in a bunch of well-commented overridable hooks in the concrete subclasses so that folks can use them as templates, and just in case somebody ever needs to build special debug subclasses, I’ll put in extra stubs over there (somebody will thank me for ’em one of these days). Yeah, this is really going to be neat.Thus is bloated software produced when our artistic sense gets the better of us. — DaveSmith “

Some other signs of over engineering taken from stackoverflow :

One very strong warning sign of over engineering is when everything goes through so much indirection that it’s hard to find the piece of code that actually implements some concrete, domain-level piece of functionality. If you find that most of your functions do very little concrete work and just call other virtual functions, you may have a problem.

https://softwareengineering.stackexchange.com/questions/193929/too-object-oriented?newreg=f7722bef1ba7496e9d8b4e7d328cadf1

[..] Make factories where the factories make more then one type of object. Use dependency injection, where it immediately shows benefits. Make interfaces that are actually going to be implemented by more then one class [..] What I see too often in “true OO” is that advanced techniques are used to solve really simple problems in an overly complex way [..]

Causes of over engineering :

Accidental vs essential complexity

Interesting article, some really good points :

Software should behave predictably and accomplish its goals without too many surprises (that is, outages in production). The number of surprises directly correlates with the amount of unnecessary complexity found in a project. It’s therefore crucial to think about accidental complexity and essential complexity:

Accidental complexity relates to problems which engineers create and can fix, [whereas] essential complexity is caused by the problem to be solved, and nothing can remove it
— Fred Brooks in his seminal “No Silver Bullet” essay

Another interesting point on when a coder start to write unnecessary code :

Boredom is good precursor to over-engineered code. I’ll admit, when I got my first job, I felt so underutilized. I was just bored. And when I got bored, I wrote code. Not just any code — CATHEDRALS OF CODE.

No seriously, I had a mental picture of my code and abstractions as large towers with golden jutting spires, flying buttresses of glassy onyx, a wonderful vault supporting by arched domes topped with beautiful geometrical tracery, etc etc etc.

It was really fascinating to see the patterns working together for myself, but in retrospect, I am completely ashamed of the ungodly mess I left behind.

If you’re writing your own frameworks and DSLs code to while away the less stimulating hours at work, just stop. Time is better spent reading Wards Wiki, or writing an open source book, or you may just want to ask management for more work.

Some theory on why/when over engineering happens

  • Second system effect : you wrote a first good, simple, performant version of a product and it is getting traction so you decide to write 2.0 version of it which does everything.
  • feature creep : you just keep adding features that are used by less than 1 % of the user base/use cases.
  • inner-platform effect : you are making your system so customizable to become a replica of the features provided by the sw component of dev environment you are using.

Techniques to avoid over engineering

Beware of complicated solutions (that someone was paid to find) @ntaleb

I’m still working on this point : here are some  references to interesting concepts

The KISS principle

YAGNI : Your Arent Gonna Need It

DRY : Don’t repeat yourself

Simplicity is a prerequisite for reliability

The “Fix it or rewrite it from scratch” dilemma

I’m sure you found yourself in the situation of having to fix a piece of code that :

  • was not written by you
  • has not a clear set of requirements or the requirements are clear but the code is unnecessarily complex (at a first glance looks like you encountered the dreaded “big ball of mud”)

My first question here is to understand where is the break even between fix and rewrite. I tend to prefer rewriting code if :

  • besides needing fixes there are also performance problems
  • code needs to be fixed/changed often
  • code is a company asset
  • code complexity is exposed to customers
  • coders turnover is high

Remember that :

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
— Kernighan and Plauger in “The Elements of Programming Style”

SmartOS is like having vagrant/docker/virtualbox builtin in your OS

I recently had a chance to try out SmartOS after my last experience in application porting on it (2012) and I was impressed by the virtualization features that are made available directly by the os. One set of commands allows downloading images, running either vm like container or namespace like  containers. Zones also allow you to run debian/centos applications on SmartOS inside a lx zone with system call translation … Awesome.

If your interested in digging deeper take a look at this post from Tim Boudreau.

 

Yes, I like go programming language

Yes, I like Go programming language. I’m liking it so much that I have to resist from being a fan boy. I’m trying to understand where all this enthusiasm is coming from (I’m a seasoned coder)  so here’s an attempt to find why :

  1. Code readability — and maintainability — first, language features second
  2. Integrated test environment : go test <package> executes all tests for the package. Unit testing features are builtin.
  3. Code Coverage is builtin (with some limitations, for example if you use cgo it will not work).
  4. Integrated tool chain : no need to have makefile at the cost of rigid hierarchy of data.
  5. Exhaustive standard library containing everything you need to do server-side/network programming
  6. Good Multi-threading features/model included in language (sync package, goroutines, channels), fast goroutines thanks to segmented stack implementation.
  7. Basic set of OOP features, centered on composition, not inheritance : you won’t be able to mess up your code at the cost of not being perceived as an OO language by OO fanboys. For more details on whether go is oo or not go here.
  8. Go is backed by some Famous Names in computing, and this inspires confidence.
  9. CamelCase 🙂 ? Naaah, I hate camel case but I like the choice of having standard style, comments, indent; all supported by the language via go fmt package so that all code will look coherent.

So basically I like the fact that Go is a very opinionated language. You might like the single decisions or not but what I like most is that someone took care of taking them for you (so you don’t have to enforce them team wide or company wide)
Interesting read also on how and why go was born : quoting from Rob Pike speech at go conference SF 2012 :

“To put it another way, oversimplifying of course:

Python and Ruby programmers come to Go because they don’t have to surrender much expressiveness, but gain performance and get to play with concurrency.

C++ programmers don’t come to Go because they have fought hard to gain exquisite control of their programming domain, and don’t want to surrender any of it. To them, software isn’t just about getting the job done, it’s about doing it a certain way.

The issue, then, is that Go’s success would contradict their world view.

And we should have realized that from the beginning. People who are excited about C++11’s new features are not going to care about a language that has so much less.  Even if, in the end, it offers so much more.”