Category Archives: coding

The Pragmatic Programmer

I think this book is full of valuable thoughts that I would like to recap in this post :

A broken window.
One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment—a sense that the powers that be don’t care about the building. So another window gets broken. People start littering. Graffiti appears. Serious structural damage begins. In a relatively short space of time, the building becomes damaged beyond the owner’s desire to fix it, and the sense of abandonment becomes reality.

How often this applies to software : you can have the best design guidelines but leaving a broken windows (bad design, wrong decisions, poor code) will slowly propagate that error to all the new code written.

Know when to stop

In some ways, programming is like painting. You start with a blank canvas and certain basic raw materials. You use a combination of science, art, and craft to determine what to do with them. You sketch out an overall shape, paint the underlying environment, then fill in the details. You constantly step back with a critical eye to view what you’ve done. Every now and then you’ll throw a canvas away and start again.
But artists will tell you that all the hard work is ruined if you don’t know when to stop. If you add layer upon layer, detail over detail, the painting becomes lost in the paint.

I read this as don’t over engineer : let your code do the jobs for some time, don’t over refine.

Dry (Don’t Repeat Yourself)

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

We all know this right ? But it is not a matter od duplicating code : it is about duplicating knowledge.

Orthogonality

In computing, the term has come to signify a kind of independence or decoupling. Two or more things are orthogonal if changes in one do not affect any of the others. In a well-designed system, the database code will be orthogonal to the user interface: you can change the interface without affecting the database, and swap databases without changing the interface.

You are familiar with orthgonality ( modular, component-based, and layered are synonyms). I read this as : think at your module/component as a service that exposes an API to users :

  • efficient development (no one is waiting for now one else for stuff to be done)
  • easy to test : orthogonal systems can be tested independently
  • easy to understand how to use

To be continued!

Allocating memory inside a Varnish vmod


Writing varnish modules is pretty well documented by the standard varnish documentation, tutorials  and thanks to valuable work from other people here  . There are some areas I felt the need to be further clarified and this post tries to do that.

Allocating memory inside a vmod is tricky is you need to free it when the current Request is destroyed. Here are some ways :

  • per request memory allocation i.e. scope is the request lifetime so memory will be freed when the request is destroyed) :
void WS_Init(struct ws *ws, const char *id, void *space, unsigned len);
unsigned WS_Reserve(struct ws *ws, unsigned bytes);
void WS_MarkOverflow(struct ws *ws);
void WS_Release(struct ws *ws, unsigned bytes);
void WS_ReleaseP(struct ws *ws, char *ptr);
void WS_Assert(const struct ws *ws);
void WS_Reset(struct ws *ws, char *p);
char *WS_Alloc(struct ws *ws, unsigned bytes);
void *WS_Copy(struct ws *ws, const void *str, int len);
char *WS_Snapshot(struct ws *ws);
int WS_Overflowed(const struct ws *ws);
void *WS_Printf(struct ws *ws, const char *fmt, ...) __printflike(2, 3);

This is a per worker thread memory space allocation, no free necessary as data is removed when the request is detroyed. Ex :

VCL_STRING
vmod_hello(const struct vrt_ctx *ctx, VCL_STRING name)
{
   char *p;
   unsigned u, v;

   u = WS_Reserve(ctx->ws, 0); /* Reserve some work space */
   p = ctx->ws->f;         /* Front of workspace area */
   v = snprintf(p, u, "Hello, %s", name);
   v++;
   if (v > u) {
      /* No space, reset and leave */
      WS_Release(ctx->ws, 0);
      return (NULL);
   }
   /* Update work space with what we've used */
   WS_Release(ctx->ws, v);
   return (p);
}

Data is allocated starting with 64k and then when needed in 4k chunks in the cts->ws area. No varnish imposed limit.

  • (since varnish 4.0 up) Private Pointers : a way to have multi-scoped private data per each VCL, TASK. You may access private data either as passed on the VCL function signature or by calling directly VRT_priv_task(ctx, “name”) for example to obtain a per request place to hold :
    • free function
    • pointer to allocated data

This method is very interesting if you need a cleanup function to be called when the varnish request is destroyed.

 

Webassembly/wasm and asm.js

The web assembly thing. I’ll try to clarify things that I learned working on it:

  1. WASM : short for WebAssembly, a binary instructions format that runs on a stack based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust to be run on the Web. Reference here
  2. asm.js : a subset of js, static typed and highly optimizable, created to allow running  higher level languages like C application on the Web. Reference here and here

So you would say 1 and 2 have the same purpose : AFAIK yes. You can also convert asm.js to wasm and decode wasm back to asm.js (theoretically). Seems that WASM is going to be extended in the future compared to asm.js.

Let’s continue :

  1. emscripten  : toolchain to compile high level languages to asm.js and WASM. Uses LLVM and does also come conversion of API (openGL to WebGL for ex) and compiles to LLVM IR (llvm bitcode) and then from LLVM IR Bitcode to asm.js using Fastcomp.
  2. Binaryen (asm2wasm) : compiles asm.js to wasm and is included in emscripten (?)

Supposing that you have a C/C++ project, made of different libraries, I suggest to compile to LLVM IR Bitcode all the single components and just during the link phase generate asm.js/wasm for execution. This will allow you to maintain your building/linking steps as you would have in an standard object code generation environment.
emscripten/LLVM offer a full set of tools to compile.work on IR Bitcode if you like :

  • emmake : use existing makefiles by running emmake make
  • emconfigure : use existing configure command by running emconfigure configure <options>

Also if you want to dig deeper into llvm :

  • lli : directly executes programs in LLVM bitcode format. It takes a program in LLVM bitcode format and executes it using a just-in-time compiler or an interpreter
  • llc : compiles LLVM source inputs into assembly language for a specified architecture. The assembly language output can then be passed through a native assembler and linker to generate a native executable

Once you have all your compiled libraries/components in LLVM IR Bitcode you have to generate WASM. The basic compile command is :

emcc -s WASM=1 -o <prog>.html <prog>.c -l<anylibraryyouneed>

but :

  1. If you are using malloc/free you need to add : -s ALLOW_MEMORY_GROWTH=1
  2. If you are using pthreads in your code/libraries you need to add : -s USE_PTHREADS=1 but as of at Jan 2019 you can’t have both malloc/free and pthreads. More info here.

More to come soon.

Profiling a golang REST API server

go tool profiling

Profiling :

is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

How can you profile your golang REST API server in a super simple way :

First : add some lines to your server code

import _ "net/http/pprof"

And then add a listener (I normally use a command line flag to trigger this) :

go func() {
http.ListenAndServe("localhost:6000", nil)
}()

Start your server and generate some load. While your code is running under the load you generated extract the profiler data :

go tool pprof http://localhost:6000/debug/pprof/profile
Fetching profile over HTTP from http://localhost:6000/debug/pprof/profile
Saved profile in /home/paul/pprof/pprof.wm-server.samples.cpu.008.pb.gz
File: wm-server
Build ID: c806572b51954da99ceb779f6d7eee3600eae0fb
Type: cpu
Time: Dec 19, 2018 at 1:41pm (CET)
Duration: 30.13s, Total samples = 17.35s (57.58%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

You have many commands at this point but what I prefer to do, having used kcachegrind for years, is to fire it up using the kcachegrind command :

(pprof) kcachegrind

This will generate a callgrind formatted file and run kcachegrind on it to let you do all the usual analysis that you’re probably already used to do (call graph, callers, callees ..)

 

glibc 2.25 bug : strstr() runs 10 times slower than on 2.24

Linux is used on 54.9% of the world websites : almost every application running on a linux machine uses the glibc which provides the core libraries to access almost every feature of a linux system. The Mighty Glibc started back in 1988 and is a wonderful and glorious project.
As far as the string functions are concerned the sse / avx optimized versions of these functions (strlen, strcpy, strstr, strcmp and more) are up to 10 times faster than their corresponding standard c implementations (which for example you might find in the libmusl) when run on a sse/avx capable cpu.

We rely a lot on glibc string functions and that’s why we found that glibc 2.25 introduced some optimization on the AVX capable processors and this disabled sse* optimizations for methods that don’t have a avx2 optimized implementation (strstr, strcat, and I’m afraid parts of the math functions). For further details go here.
The bug affects ubuntu 18, debian 10, fedora 26 to 28.
A fix will come for sure, hopefully in glibc 2.29.

Measuring memory footprint of a linux/macosx application

 

If you’re selling an API or an application which is deployed on production systems, one of the questions your customers might ask you is what is the memory footprint of your API/application in order for them to account for an increase of memory requirements due to using your product. After some research I think that the best tool for measuring and debugging any increases/decrease of your mem footprint is valgrind –tool=massif together with ms_print reporting tools.

Massif is a Heap memory profiler and will measure how much/when you allocate heap memory in your code and show the involved code. Run :

valgrind --tool=massif

this will execute the code and generate a massif.out.<pid> file that you may visualize with

ms_print massif.out.<pid>

Take a ride, the output is absolutely useful and you will have an histogram of how much memory is used at every sampling moment.

 

Bounds checking, leak checks and race conditions check with gcc sanitize

If you are coding since many years in C or C++ you probably hit many tools that helped in finding memory leaks in your code as well as did bounds checking to avoid buffers overrun. Those were PurifyPlus (rational), BoundsChecker (numega). Then valgrind cameout and all linux developers were happy to run valgrind on their code to find leaks and bounds overruns.

Since gcc 4.8 and llvm 3.1 Sanitize has been introduce in the compiler as a compile option which gives you the ability to check for :

  • leak and bounds overrun : -fsanitize=address
  • race conditions : -fsanitize=thread

Using sanitizer in your normal Test environment (we use c++ google test framework to test api code) will drastically change the quality of your software products (C, C++ or GO) by allowing you to spot most of the bounds overrun and leaks during the testing phase, both in your code and in the code that you include not written by you.

In my sample makefiles we build all component that we use in test in debug mode (-g3) and with sanitize settable from outside via an environment variable :

SANITIZE = -fsanitize=address

CXXFLAGS = -std=c++11 -pthread -fpermissive -isystem ../../googletest/include -isystem ../../googletest -DDEBUG -g3 $(SANITIZE)

Compiling a library with sanitize is :

$ make -f libfh.mk debugclean debuglinux
rm -rf fh.do fh.mo libfh-debug.a
cc -DDEBUG -g3 -fsanitize=address -O0 -fPIC --std=c99 -Wall -Wextra -Wcomment -pthread -I . -c fh.c -o fh.do
ar crs libfh-debug.a fh.do

Compiling tests :

$ make -f libfh-test.mk testclean testlinux
rm -rf TestMain.o TestThread.o TestFH.o CommandLineParams.o testfh *.gcda *.gcno gtest-all.o ../../*_test_results.xml
g++ -std=c++11 -pthread -fpermissive -isystem ../../googletest/include -isystem ../../googletest -DDEBUG -g3 -fsanitize=address -I.. -I ../../libutil/src -c -o TestMain.o TestMain.cpp
g++ -std=c++11 -pthread -fpermissive -isystem ../../googletest/include -isystem ../../googletest -DDEBUG -g3 -fsanitize=address -I.. -I ../../libutil/src -c -o TestThread.o TestThread.cpp
g++ -std=c++11 -pthread -fpermissive -isystem ../../googletest/include -isystem ../../googletest -DDEBUG -g3 -fsanitize=address -I.. -I ../../libutil/src -c -o TestFH.o TestFH.cpp
g++ -std=c++11 -pthread -fpermissive -isystem ../../googletest/include -isystem ../../googletest -DDEBUG -g3 -fsanitize=address -I.. -I ../../libutil/src -c -o CommandLineParams.o CommandLineParams.cpp
g++ -isystem ../../googletest/include -I ../../googletest/ -pthread -c ../../googletest/src/gtest-all.cc
g++ -o testfh -fsanitize=address TestMain.o TestThread.o TestFH.o CommandLineParams.o gtest-all.o -L ../ -lfh-debug -lpthread

Now let’s run a test (after having introduced a bug) :


paul@paul-bigdesk:~/git/clibs/libfh/test$ ./testfh --gtest_filter=*hash_with_struct* Running main() in file c-libraries/libfh/test/TestMain.cpp Now running test executable: ./testfh Note: Google Test filter = *hash_with_struct* [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from FH [ RUN ] FH.hash_with_struct ================================================================= ==23907==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000eb5f at pc 0x7f37a13bc964 bp 0x7fff66eca0b0 sp 0x7fff66ec9858 WRITE of size 16 at 0x60200000eb5f thread T0 #0 0x7f37a13bc963 in __asan_memcpy (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x8c963) #1 0x466298 in fh_insert /home/paul/git/clibs/libfh/fh.c:390 #2 0x414ed5 in FH_hash_with_struct_Test::TestBody() /home/paul/git/clibs/libfh/test/TestFH.cpp:486 #3 0x45683e in void testing::internal::HandleSehExceptionsInMethodIfSupported&lt;testing::Test, void&gt;(testing::Test*, void (testing::Test::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x45683e) #4 0x44fdf6 in void testing::internal::HandleExceptionsInMethodIfSupported&lt;testing::Test, void&gt;(testing::Test*, void (testing::Test::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x44fdf6) #5 0x433737 in testing::Test::Run() (/home/paul/git/clibs/libfh/test/testfh+0x433737) #6 0x4340cf in testing::TestInfo::Run() (/home/paul/git/clibs/libfh/test/testfh+0x4340cf) #7 0x4347c2 in testing::TestCase::Run() (/home/paul/git/clibs/libfh/test/testfh+0x4347c2) #8 0x43b8b5 in testing::internal::UnitTestImpl::RunAllTests() (/home/paul/git/clibs/libfh/test/testfh+0x43b8b5) #9 0x457e16 in bool testing::internal::HandleSehExceptionsInMethodIfSupported&lt;testing::internal::UnitTestImpl, bool&gt;(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x457e16) #10 0x450c6e in bool testing::internal::HandleExceptionsInMethodIfSupported&lt;testing::internal::UnitTestImpl, bool&gt;(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x450c6e) #11 0x43a35b in testing::UnitTest::Run() (/home/paul/git/clibs/libfh/test/testfh+0x43a35b) #12 0x406819 in main /home/paul/git/clibs/libfh/test/TestMain.cpp:38 #13 0x7f37a07c582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) #14 0x406538 in _start (/home/paul/git/clibs/libfh/test/testfh+0x406538) 0x60200000eb5f is located 0 bytes to the right of 15-byte region [0x60200000eb50,0x60200000eb5f) allocated by thread T0 here: #0 0x7f37a13c8662 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98662) #1 0x466222 in fh_insert /home/paul/git/clibs/libfh/fh.c:384 #2 0x414ed5 in FH_hash_with_struct_Test::TestBody() /home/paul/git/clibs/libfh/test/TestFH.cpp:486 #3 0x45683e in void testing::internal::HandleSehExceptionsInMethodIfSupported&lt;testing::Test, void&gt;(testing::Test*, void (testing::Test::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x45683e) #4 0x44fdf6 in void testing::internal::HandleExceptionsInMethodIfSupported&lt;testing::Test, void&gt;(testing::Test*, void (testing::Test::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x44fdf6) #5 0x433737 in testing::Test::Run() (/home/paul/git/clibs/libfh/test/testfh+0x433737) #6 0x4340cf in testing::TestInfo::Run() (/home/paul/git/clibs/libfh/test/testfh+0x4340cf) #7 0x4347c2 in testing::TestCase::Run() (/home/paul/git/clibs/libfh/test/testfh+0x4347c2) #8 0x43b8b5 in testing::internal::UnitTestImpl::RunAllTests() (/home/paul/git/clibs/libfh/test/testfh+0x43b8b5) #9 0x457e16 in bool testing::internal::HandleSehExceptionsInMethodIfSupported&lt;testing::internal::UnitTestImpl, bool&gt;(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x457e16) #10 0x450c6e in bool testing::internal::HandleExceptionsInMethodIfSupported&lt;testing::internal::UnitTestImpl, bool&gt;(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/paul/git/clibs/libfh/test/testfh+0x450c6e) #11 0x43a35b in testing::UnitTest::Run() (/home/paul/git/clibs/libfh/test/testfh+0x43a35b) #12 0x406819 in main /home/paul/git/clibs/libfh/test/TestMain.cpp:38 #13 0x7f37a07c582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) SUMMARY: AddressSanitizer: heap-buffer-overflow ??:0 __asan_memcpy Shadow bytes around the buggy address: 0x0c047fff9d10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff9d20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff9d30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff9d40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c047fff9d50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa =&gt;0x0c047fff9d60: fa fa fa fa fa fa fa fa fa fa 00[07]fa fa 05 fa 0x0c047fff9d70: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa 0x0c047fff9d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa 0x0c047fff9d90: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa 0x0c047fff9da0: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa 0x0c047fff9db0: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap right redzone: fb Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe ==23907==ABORTING

You easily get to were the bug is fh.c line 390; we decremented the allocated size by 1:

new_opaque_obj = malloc(fh->h_datalen-1);

Easily you can switch to race conditions checking my using

$ make SANITIZE=-fsanitize=thread -f libfh-test.mk testclean testlinux

 

Design for simplicity : software options

xkcd.com

(I was highly inspired by Evan Martin post before trying to recap in this post)
Think twice before adding an option to a software; options are :

  • bad user experience : good software just works without asking to many things to the user.
  •  a maintenance cost : an option is a feature you have to maintain/test for the lifetime of the software
  • documentation cost : every option, even the most cryptical, should be documented to the user, explained and you should provide some examples of use. When you normally try to do this you end up removing the option from documentation
  • normally options are a way to delegate to user a decision that the designer failed to take : “shall we open on the last opened document or shall put an options ‘always open on blank document’ ?
  • remember the paradox of choice
  • options setting generates bad GUIs : they are hard to present, group or organize.

On the other hand consider that any change you make will probably break someone’s workflow, no doubt, so the temptation to create an option to enable the previous operational mode is high.

The old windows search wizard, awesome, read this : https://www.joelonsoftware.com/2000/04/12/choices/

WTF/Minute : code quality measure unit

 Some day your code will be reviewed, modified or even just read to understand the underlying algorithm by someone else that is not you that wrote the code….

My point here is that readability/maintainability of code is by far the biggest quality of a piece of software. Being able to read the code and understand it rapidly makes everything easier, either bugfixing or modifying/enhancing the code. Think that the next one that might be reading the code might not be as experienced as you with that language so

  • avoid using  unnecessarily complicated constructs that feed you coder ego but slow down the maintenance activity. You are a good coder, no doubt, but if your code is not readable by the average coder that will have to then you are not making a good service to your colleagues
  • Make clear what your code is doing and, if necessary use a comment to clarify what is not evident from the code (I’m not a huge fan of comments because most of them don’t tell anything more than the code). Use comments to refer to a requirement/user-story that is being implemented.
  • Stick to the language style guidelines that were set for your team/company
  • Tests are code too and often are read/reviewed more than real code : don’t fuck up in tests code just because “this is just a test, this code will not run in production”

I’m sure I have more of these but for know I’ll stop here.

Comic comes from here

 

Hash functions – efficency and speed

hash function is any function that can be used to map data of arbitrary size to data of a fixed size. The values returned by a hash function are called hash valueshash codesdigests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup.

If you happen to need a non cryptographic hash function (and you might need it even if your are using languages that have builtin hashtables like java or c++ ) here are some references taken from various parts:

Interesting comparison of different hash tables :

http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Some more hashfunctions here :

http://www.cse.yorku.ca/~oz/hash.html

Some really interesting benchmarks here by Peter Kankowsky.

And again hash benchmarks here.

Some (a mininum set) benchmarks ran using User-Agents as keys, 1.5 multiply factor on hash size, next power of 2 hash sizes :

Keys 3130591, Average fh_default_hash(8388608) time in nanosecs : 292.56
Total collisions 517939, longest chain 6, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average djb_hash(8388608) time in nanosecs : 181.24
Total collisions 517445, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average sax_hash(8388608) time in nanosecs : 244.49
Total collisions 517431, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average jsw_hash(8388608) time in nanosecs : 141.39
Total collisions 522951, longest chain 14, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average jen_hash(8388608) time in nanosecs : 148.54
Total collisions 518565, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average djb2_hash(8388608) time in nanosecs : 182.37
Total collisions 516405, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average sdbm_hash(8388608) time in nanosecs : 238.93
Total collisions 525513, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average fnv_hash(8388608) time in nanosecs : 237.01
Total collisions 517530, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average oat_hash(8388608) time in nanosecs : 294.78
Total collisions 517939, longest chain 6, hash_dim_factor 1.500000
----------------------


Tests done with the hashtable code you can find here

And last a research study (which I’m still working on) for SIMD optimized hash functions : here  some theory and code here. The all use the vector extensions present in gcc.