Wednesday, November 13, 2013

Getting the correct 'git hash' in your binaries w/o needless recompiling or linking

When processing bug reports, or when people doubt the output of your tools, it is tremendously helpful to know the provenance of the binaries. One way of doing this is to embed the git hash in your code.

A useful hash can be generated using:
$ git describe --always --dirty=+
This outputs something like '4120f32+', where the '+' means there have been local changes with respect to the commit that can be identified with '4120f32':
$ git show 4120f32
commit 4120f32ef7b684eb1ff42d136e37e8733f9811e1
Author: bert hubert
Date:   Wed Nov 13 12:46:43 2013 +0100
    rename git-version wording to git-hash, move the hash to a .o file so we only need to relink on a change
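When triaging a bug report, it can help to split such a string into the commit and the 'dirty' marker. A minimal sketch in shell, assuming the '+' suffix selected with --dirty=+ above; the helper names is_dirty and clean_hash are hypothetical and not part of Antonie:

```shell
#!/bin/sh
# Hypothetical helpers; they assume the '+' marker selected via --dirty=+.

# Succeeds if the hash string carries the dirty marker.
is_dirty() {
    case "$1" in
        *+) return 0 ;;
        *)  return 1 ;;
    esac
}

# Prints the hash with a trailing '+' removed, if present.
clean_hash() {
    printf '%s\n' "${1%+}"
}

h=4120f32+
if is_dirty "$h"; then
    echo "built from $(clean_hash "$h") plus local changes"
fi
```

The cleaned hash can then be passed to 'git show' as in the example above.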
What we want is this:
$ antonie --version
antonie  version: g4120f32+ 
To achieve this, we need to convince our Makefile to generate something that includes the hash from git-describe. Secondly, if this output changes, all binaries must be updated to reflect it. Finally, if the git hash did not change, we should not get spurious rebuilds.

Many projects, including PowerDNS, had given up on achieving the last goal. Running 'make' will always relink PowerDNS now, and even recompile a small file, even if nothing changed. And this hurts our sensibilities.

For Antonie, I got stuck on a boring issue today, and decided I needed to solve the git-hash-embedding issue once and for all instead.

As the first component, enter update-git-hash-if-necessary, which will update (or create) githash.h to the latest git hash, but only if it is different from what was in there already - effectively preserving the old timestamp if there were no changes.
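For illustration, here is a minimal sketch of what such a script could look like. It is not the actual Antonie script, which may differ. The GIT_HASH macro and the githash.h filename come from this post; the function name, the optional filename argument and the "unknown" fallback are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of update-git-hash-if-necessary; the real script may differ.

update_git_hash() {
    out=${1:-githash.h}

    # Fall back to "unknown" outside a git checkout or when git is missing.
    githash=$(git describe --always --dirty=+ 2>/dev/null) || githash=unknown
    new="#define GIT_HASH \"$githash\""

    # Current contents, or empty if the file does not exist yet.
    old=$(cat "$out" 2>/dev/null)

    # Rewrite only on a change, so an unchanged hash keeps the old timestamp.
    if [ "$new" != "$old" ]; then
        printf '%s\n' "$new" > "$out"
    fi
}

update_git_hash "$@"
```

The comparison before writing is the whole point: Make decides what to rebuild from timestamps, so leaving the file untouched when the hash is the same is what avoids the spurious rebuild. The sketch prints nothing, so it is safe to call from a context that captures its output.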

Secondly, we need to convince Make to always run update-git-hash-if-necessary before doing anything else. This can be achieved by adding the following to the Makefile:
CHEAT_ARG := $(shell ./update-git-hash-if-necessary)
CHEAT_ARG is never used, but because ':=' is evaluated immediately, Make runs the script while reading the Makefile, whatever target we ask for. This ensures that githash.h is updated, if required, before anything else happens. This clever trick was found here.

To complete the story, create githash.c, which #includes githash.h and defines a global variable, for example: 'const char* g_gitHash = GIT_HASH;'. Now link githash.o into every program that needs access to the hash.

Finally, add 'extern const char* g_gitHash;' somewhere so the programs can actually see it. Note that you can't use githash.h for that, since that would trigger recompilation where relinking would suffice.

If you now run 'make' or even 'make antonie', any change in the git hash will trigger a recompile of githash.o, followed by rapid relinking. If there was no change, nothing will appear to have happened.

Please let me know if you find ways to improve on the trick above!

2 comments:

  1. There's a slight problem: CHEAT_ARG := is a GNUism and is not portable.

  2. Ondřej, interesting, thanks! It works on Linux and OS X, and http://www.khmere.com/freebsd_book/html/ch01.html says FreeBSD does it too? But I don't worry that much; you need gmake for many things anyhow.
