Friday, October 23, 2015

How to do fast canonical ordering of domain names

A small post to document an arcane subject: how to quickly do a comparison of DNS names in canonical order. First, to recap, what is DNS canonical ordering? It is a case-insensitive but otherwise 8-bit comparison of the labels that make up the DNS name, taken in reverse order.

So, in human order, xxx.powerdns.com, aaa.powerdns.de and bbb.powerdns.net sort like this:

  1. aaa.powerdns.de
  2. bbb.powerdns.net
  3. xxx.powerdns.com
But in DNS canonical order, they sort like this:
  1. xxx.powerdns.com
  2. aaa.powerdns.de
  3. bbb.powerdns.net
This is because in the canonical order, we look at the 'com', 'de' and 'net' parts first. And only if those are equal, we look at the second-to-last label in the name. RFC 4034 section 6 has all the details. DNS ordering is more than an obscure subject: you need to order your records this way to calculate DNSSEC signatures for example. If you get the ordering wrong, your signatures won't match.

So how can we do the comparison quickly? The naive way is of course to use one of your language primitives to split up a domain name in labels, reverse the order, and do a case insensitive lexicographical comparison on them. That could look like this in C++ 2011:
 // Split both names into labels, then compare the label lists back to front
 auto ours=getRawLabels(), rhsLabels = rhs.getRawLabels();
 return std::lexicographical_compare(ours.rbegin(), ours.rend(),
   rhsLabels.rbegin(), rhsLabels.rend(), CIStringCompare());
While this is easy enough, it is also astoundingly slow since it splits up your domain name and does loads of allocations. Loading a 1.4 million record long zone into a container with canonical ordering this way took 40 seconds. Ordering based on naive case insensitive human compare loaded in 8 seconds.
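
For reference, here is a self-contained sketch of that naive approach, working on plain dotted strings. The names splitLabels, labelLess, dnsLower and naiveCanonLess are made up for this example and are not the PowerDNS functions; it also assumes names without escaped dots.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// ASCII-only lowercasing, so bytes above 127 are left alone.
inline unsigned char dnsLower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Hypothetical helper: split on dots. No escaping is handled here.
std::vector<std::string> splitLabels(const std::string& name)
{
  std::vector<std::string> labels;
  std::istringstream in(name);
  std::string label;
  while (std::getline(in, label, '.'))
    labels.push_back(label);
  return labels;
}

// Case-insensitive "less than" on two labels.
bool labelLess(const std::string& a, const std::string& b)
{
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
    [](unsigned char x, unsigned char y) { return dnsLower(x) < dnsLower(y); });
}

// True if a sorts before b in canonical order.
// Allocates a vector and a string per label on every call, which is the slow part.
bool naiveCanonLess(const std::string& a, const std::string& b)
{
  std::vector<std::string> la = splitLabels(a), lb = splitLabels(b);
  return std::lexicographical_compare(la.rbegin(), la.rend(), lb.rbegin(), lb.rend(), labelLess);
}
```

Handing naiveCanonLess to std::sort as the comparator puts the three example names in the canonical order listed earlier.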

Now, DNS names consist of labels with a length, so www.powerdns.com typically gets stored in a packet as the value 3, then "www", the value 8, then "powerdns", the value 3 and then "com". Note there are no dots in there!

It is highly recommended to also store DNS names as a series of length/label-content pairs, since otherwise you need to do lots of escaping to deal with embedded nulls, embedded . etc.
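
To make the length/label layout concrete, here is a small sketch that converts a dotted name into that form. toWireFormat is a hypothetical helper for this post, it assumes the input has no escaped or embedded dots, and it leaves off the terminating zero byte of the root label to match the description above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: "www.powerdns.com" -> 3 "www" 8 "powerdns" 3 "com".
// Assumption: plain input without escapes; a trailing dot is tolerated.
std::string toWireFormat(const std::string& dotted)
{
  std::string out;
  std::string::size_type start = 0;
  while (start < dotted.size()) {
    std::string::size_type dot = dotted.find('.', start);
    if (dot == std::string::npos)
      dot = dotted.size();
    std::string::size_type len = dot - start;
    if (len == 0 || len > 63)            // DNS labels are 1 to 63 bytes long
      throw std::invalid_argument("label must be 1 to 63 bytes");
    out.push_back(static_cast<char>(len)); // the length byte
    out.append(dotted, start, len);        // the label content, no dot
    start = dot + 1;
  }
  return out;
}
```

For www.powerdns.com this yields 17 bytes: three length bytes plus 14 bytes of label content.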

When stored like this however, it is not straightforward to do a canonical compare. So I asked around among our open source friends, and Marek Vavrusa of CZNIC quickly chimed in to explain how the excellent Knot nameserver products do it, and it is quite clever. First, you store the domain in reverse label order, so www.powerdns.com would turn into com.powerdns.www, which would normally look like "3com8powerdns3www" in memory.

However, if you naively compare 3com8powerdns3www with (say) 2de8powerdns3www, you'd decide, based on the '3' versus the '2', that www.powerdns.de sorts before www.powerdns.com, which is wrong.

So the clever bit is to zero out the label length fields, so you store the names as '0com0powerdns0www' and '0de0powerdns0www'. And then you can simply do a case-insensitive compare and get the right ordering. And of course there is no need to store the leading 0 in this case.
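
A rough sketch of building such a key could look like this. It is my own illustration of the idea and not Knot's actual code: makeSortKey and dnsLower are invented names, the input is assumed to be a well-formed name in length/label form without the trailing root byte, and I lowercase while building the key so that a plain string comparison does the job.

```cpp
#include <string>
#include <vector>

// ASCII-only lowercasing, so bytes above 127 are left alone.
inline unsigned char dnsLower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Hypothetical helper: reversed labels, separated by zero bytes, lowercased.
std::string makeSortKey(const std::string& wire)
{
  // Collect the labels in their original order.
  std::vector<std::string> labels;
  for (std::string::size_type pos = 0; pos < wire.size(); ) {
    std::string::size_type len = static_cast<unsigned char>(wire[pos]);
    labels.push_back(wire.substr(pos + 1, len));
    pos += len + 1;
  }

  // Emit them back to front, with a zero byte where the length used to be.
  std::string key;
  for (auto it = labels.rbegin(); it != labels.rend(); ++it) {
    if (it != labels.rbegin())
      key.push_back('\0');   // the leading zero is not stored
    for (unsigned char c : *it)
      key.push_back(static_cast<char>(dnsLower(c)));
  }
  return key;
}
```

Two keys can then be compared with the ordinary operator< of std::string, which compares bytes as unsigned values. The zero separator is the smallest possible byte, so a shorter label always sorts before a longer one that starts with the same characters.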

Now, there is a downside to this arrangement: you lose the information about what the domain name actually looked like. If there were embedded 0s in the domain name, and there could be, you can't recover the domain name anymore. However, if you don't care, or if you just use this as a key and have a copy of the original domain name somewhere, this works great. Thanks for the explanation Marek!

PowerDNS uses the most astoundingly great Boost Multi Index container. I had the great pleasure of meeting its author Joaquín Mª López Muñoz recently and I keep learning more about what is possible with this wonderful container. Boost Multi Index allows us to index objects based on what is in them, without a key that lives separately. So within PowerDNS we like to just use the DNSName that is embedded in an indexed object to sort, and we don't want to 0 out the label lengths in there.

After some trial and mostly error, I hit on the following rapid ordering procedure for DNS names stored in DNS native format (so: 3www8powerdns3com).

  1. Scan through both names and note the positions of the label boundaries in a simple stack-based array (so no malloc). Store how many labels each DNS name has.
  2. Starting at the last position in your arrays, which denotes the beginning of the last label, do a lexicographical compare starting at one position beyond the length byte and ending "length byte" bytes after that. Do this for both DNS names.
  3. If this comparison leads to 'smaller', your DNS name is definitely smaller. If it leads to 'larger', your DNS name is definitely not smaller and you are done.
  4. Otherwise, proceed one place back in the array of positions of both names.
  5. If you ended up at position 0 for one name and not yet for the other, the name that ran out of labels is smaller and you are done.
  6. If you ended up at position 0 for BOTH DNS names, neither is smaller than the other.
  7. Go to 2 (except don't look at the last label, but at the 'current' position).
To make this safe, either make the two arrays sufficiently large that no legal DNS name could overflow them, or use something plausible as a maximum, and fall back to allocating on the heap if your name is long enough to warrant it.
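
The steps above can be sketched roughly as follows. This is a simplified illustration and not the actual PowerDNS code: canonLess, scanLabels and dnsLower are names I made up here, the input is assumed to be in length/label form without the trailing root byte, and instead of a heap fallback this sketch simply rejects names that are not legal.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// ASCII-only lowercasing, so bytes above 127 are left alone.
inline unsigned char dnsLower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Step 1: note where each label (its length byte) starts. Returns the label count.
std::size_t scanLabels(const std::string& wire, std::uint8_t* pos, std::size_t maxLabels)
{
  if (wire.size() > 255)   // positions must fit in one byte
    throw std::invalid_argument("name longer than 255 bytes");
  std::size_t count = 0;
  for (std::size_t p = 0; p < wire.size(); ) {
    std::size_t len = static_cast<unsigned char>(wire[p]);
    if (len == 0 || p + 1 + len > wire.size() || count == maxLabels)
      throw std::invalid_argument("malformed name");
    pos[count++] = static_cast<std::uint8_t>(p);
    p += len + 1;
  }
  return count;
}

// True if a sorts before b in canonical order.
bool canonLess(const std::string& a, const std::string& b)
{
  // 255 bytes at 2 bytes per label minimum means at most 127 labels.
  std::uint8_t apos[128], bpos[128];   // on the stack, no malloc
  std::size_t acount = scanLabels(a, apos, 128);
  std::size_t bcount = scanLabels(b, bpos, 128);
  auto lower = [](unsigned char x, unsigned char y) { return dnsLower(x) < dnsLower(y); };

  for (;;) {
    if (bcount == 0) return false;  // steps 5 and 6: equal, or b ran out first
    if (acount == 0) return true;   // step 5: a ran out first, so a is smaller
    --acount; --bcount;             // step 4: one label back in both names

    // Step 2: compare the current labels, starting one past the length byte.
    auto ab = a.begin() + apos[acount] + 1;
    auto ae = ab + static_cast<unsigned char>(a[apos[acount]]);
    auto bb = b.begin() + bpos[bcount] + 1;
    auto be = bb + static_cast<unsigned char>(b[bpos[bcount]]);

    // Step 3: stop as soon as one label is smaller than the other.
    if (std::lexicographical_compare(ab, ae, bb, be, lower)) return true;
    if (std::lexicographical_compare(bb, be, ab, ae, lower)) return false;
  }
}
```

Running two lexicographical compares per label is the simplest way to tell 'smaller', 'larger' and 'equal' apart. A tuned version would want to do that in a single pass.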

We implemented this in PowerDNS. So, the big question of course: is it fast enough? After a little bit of tuning, the canonical comparison function implemented as above is just as fast as the 'naive' human order comparison. Loading the zone again takes 8 seconds. This is faster than you'd expect, but it turns out our canonical comparison function inlines better, since the original version secretly used a C library function for case-insensitive comparisons.

I hope this has been helpful - either do what Knot does, if you can get away with it, and then it is super fast, or ponder our suggested stack based array solution.

Good luck!

7 comments:

  3. Hi Bert,

    I think the comparison can be done without setting up the ourpos, rhspos auxiliary arrays. I've implemented the stuff at

    http://coliru.stacked-crooked.com/a/2143706a4e6d8e90

    You might want to check if this gains you some extra performance.

    1. I see that you have a fallback variant when ourpos is not large enough. In the same spirit, here's a version with an analogous fallback to avoid excessive recursive depths:

      http://coliru.stacked-crooked.com/a/6f17856a7340fcb2

      The good thing about canon_dns_slow_compare_same_num_segments is that it doesn't allocate dynamic memory. My measurements indicate that it is in fact almost as fast as canon_dns_fast_compare_same_num_segments! (although with real data results can be different).
