A small post to document an arcane subject: how to quickly compare DNS names in canonical order. First, to recap, what is DNS canonical ordering? It is case insensitive but otherwise 8-bit, and it is based on the labels that make up the DNS name, taken in reverse order.
So, in human order, xxx.powerdns.com, aaa.powerdns.de and bbb.powerdns.net sort like this:
- aaa.powerdns.de
- bbb.powerdns.net
- xxx.powerdns.com
But in DNS canonical order, they sort like this:
- xxx.powerdns.com
- aaa.powerdns.de
- bbb.powerdns.net
This is because in the canonical order, we look at the 'com', 'de' and 'net' parts first. And only if those are equal, we look at the second-to-last label in the name. RFC 4034 section 6 has all the details. DNS ordering is more than an obscure subject: you need to order your records this way to calculate DNSSEC signatures for example. If you get the ordering wrong, your signatures won't match.
So how can we do the comparison quickly? The naive way is of course to use one of your language primitives to split up a domain name in labels, reverse the order, and do a case insensitive lexicographical comparison on them. That could look like this in C++ 2011:

    auto ours = getRawLabels(), rhsLabels = rhs.getRawLabels();
    return std::lexicographical_compare(ours.rbegin(), ours.rend(), rhsLabels.rbegin(), rhsLabels.rend(), CIStringCompare());

While this is easy enough, it is also astoundingly slow since it splits up your domain name and does loads of allocations. Loading a 1.4 million record long zone into a container with canonical ordering this way took 40 seconds. Ordering based on naive case insensitive human compare loaded in 8 seconds.
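To try the naive approach outside of PowerDNS, here is a minimal self-contained sketch. It assumes names arrive as plain dotted strings without escaped dots. The helpers splitLabels, dnsToLower and naiveCanonLess are made-up names, and the CIStringCompare shown here is only a stand-in for the real functor, not the PowerDNS one.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split "www.powerdns.com" into {"www", "powerdns", "com"}.
// Assumes plain dotted text with no escaped dots; empty labels are skipped.
std::vector<std::string> splitLabels(const std::string& name)
{
  std::vector<std::string> labels;
  std::string::size_type start = 0;
  while (start < name.size()) {
    std::string::size_type dot = name.find('.', start);
    if (dot == std::string::npos)
      dot = name.size();
    if (dot > start)
      labels.push_back(name.substr(start, dot - start));
    start = dot + 1;
  }
  return labels;
}

// Fold only A-Z to lowercase; every other byte compares as an unsigned 8-bit value.
inline unsigned char dnsToLower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Stand-in for CIStringCompare: case-insensitive "less than" on a single label.
struct CIStringCompare
{
  bool operator()(const std::string& a, const std::string& b) const
  {
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
      [](char x, char y) { return dnsToLower(x) < dnsToLower(y); });
  }
};

// True if lhs sorts before rhs in canonical order (labels compared last to first).
bool naiveCanonLess(const std::string& lhs, const std::string& rhs)
{
  auto ours = splitLabels(lhs), rhsLabels = splitLabels(rhs);
  return std::lexicographical_compare(ours.rbegin(), ours.rend(),
                                      rhsLabels.rbegin(), rhsLabels.rend(),
                                      CIStringCompare());
}
```

Passing naiveCanonLess as the comparator to std::sort puts the three example names in the order xxx.powerdns.com, aaa.powerdns.de, bbb.powerdns.net. Every call allocates two vectors plus one string per label, which is where the time goes.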
Now, DNS names consist of labels with a length, so www.powerdns.com typically gets stored in a packet as the value 3, then "www", the value 8, then "powerdns", the value 3 and then "com". Note there are no dots in there!
It is highly recommended to also store DNS names as a series of length/label-content pairs, since otherwise you need to do lots of escaping to deal with embedded nulls, embedded . etc.
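As a small illustration of this storage format, the sketch below converts a dotted name into length/label-content pairs. The function name toWireLabels is hypothetical, the input is assumed to contain no escaped dots, and the terminating zero-length root label that a real packet carries is left out to match the notation used in this post.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "www.powerdns.com" becomes the bytes
// 3 'w' 'w' 'w' 8 'p' 'o' 'w' 'e' 'r' 'd' 'n' 's' 3 'c' 'o' 'm'.
// Assumes no escaped dots; the trailing root label is deliberately omitted.
std::string toWireLabels(const std::string& name)
{
  std::string out;
  std::string::size_type start = 0;
  while (start < name.size()) {
    std::string::size_type dot = name.find('.', start);
    if (dot == std::string::npos)
      dot = name.size();
    const std::string::size_type len = dot - start;
    if (len > 0) {
      assert(len <= 63);                      // a DNS label is at most 63 bytes
      out.push_back(static_cast<char>(len));  // length byte
      out.append(name, start, len);           // label content, stored verbatim
    }
    start = dot + 1;
  }
  return out;
}
```

Because the length byte says exactly how many bytes follow, a label may contain any byte value, including a dot or a zero, without any escaping.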
When stored like this however, it is not straightforward to do a canonical compare. So I asked around among our open source friends, and Marek Vavrusa of CZNIC quickly chimed in to explain how the excellent Knot nameserver products do it, and it is quite clever. First, you store the domain in reverse label order, so www.powerdns.com would turn into com.powerdns.www, which would normally look like "3com8powerdns3www" in memory.
However, if you naively compare 3com8powerdns3www with (say) 2de8powerdns3www, you'd decide, based on the '3' versus the '2', that www.powerdns.de sorts before www.powerdns.com, which is wrong.
So the clever bit is to zero out the label length fields, so you store the names as '0com0powerdns0www' and '0de0powerdns0www'. And then you can simply do a case-insensitive compare and get the right ordering. And of course there is no need to store the leading 0 in this case.
Now, there is a downside to this arrangement: you lose the information about what the domain actually looked like. If there were embedded 0s in the domain name, and there could be, you can't recover the domain name anymore. However, if you don't care, or if you just use this as a key and have a copy of the original domain name somewhere, this works great. Thanks for the explanation Marek!
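The trick described above can be sketched as follows. This is not Knot's actual code, only an illustration of the idea as explained here: makeSortKey, keyLess and dnsToLower are made-up names, and the input is assumed to be a well-formed name in DNS native format without the root label.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Fold only A-Z to lowercase; every other byte compares as an unsigned 8-bit value.
inline unsigned char dnsToLower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Build a sort key: labels in reverse order, each length byte replaced by 0,
// and the leading 0 dropped. 3www8powerdns3com becomes com 0 powerdns 0 www.
std::string makeSortKey(const std::string& wire)
{
  std::vector<std::string::size_type> starts; // offset of every length byte
  for (std::string::size_type pos = 0; pos < wire.size();
       pos += 1 + static_cast<unsigned char>(wire[pos]))
    starts.push_back(pos);

  std::string key;
  for (auto it = starts.rbegin(); it != starts.rend(); ++it) {
    if (it != starts.rbegin())
      key.push_back('\0');                    // zeroed-out length field
    key.append(wire, *it + 1, static_cast<unsigned char>(wire[*it]));
  }
  return key;
}

// With keys built this way, one flat case-insensitive compare gives the ordering.
bool keyLess(const std::string& a, const std::string& b)
{
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
    [](char x, char y) { return dnsToLower(x) < dnsToLower(y); });
}
```

The zero separator is smaller than any letter, so a shorter label such as "co" sorts before "com", and a parent like powerdns.com sorts before www.powerdns.com because its key is a prefix of the longer one. Building the key once and storing it next to the original name keeps the compare itself allocation free.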
PowerDNS uses the most astoundingly great Boost Multi Index container. I had the great pleasure of meeting its author Joaquín Mª López Muñoz recently and I keep learning more about what is possible with this wonderful container. But, Boost Multi Index allows us to index objects based on what is in them, without a key that lives separately. So within PowerDNS we like to just use the DNSName that is embedded in an indexed object to sort, and we don't want to 0 out the label lengths in there.
After some trial and mostly error, I hit on the following rapid ordering procedure for DNS names stored in DNS native format (so: 3www8powerdns3com).
1. Scan through both names and note the positions of the label boundaries in a simple stack-based array (so no malloc). Store how many labels each DNS name has.
2. Starting at the last position in your arrays, which denotes the beginning of the last label, do a case-insensitive lexicographical compare starting one position beyond the length byte and ending "length byte" bytes after that. Do this for both DNS names.
3. If this comparison leads to 'smaller', your DNS name is definitely smaller. If it leads to 'larger', your DNS name is definitely not smaller. In both cases you are done.
4. Otherwise, proceed one place back in the array of positions of both names.
5. If one name has run out of labels (you were already at position 0) and the other has not, that name is smaller and you are done.
6. If BOTH DNS names have run out of labels, neither is smaller than the other.
7. Go to step 2 (except don't look at the last label, but at the 'current' position).
To make this safe, either make the two arrays sufficiently large that no legal DNS name could overflow them, or use something plausible as a maximum, and fall back to allocating on the heap if your name is long enough to warrant it.
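The procedure above can be sketched roughly like this. It is an untuned illustration under stated assumptions, not the PowerDNS implementation: both names are assumed to be well-formed, in DNS native format without the root label, and at most 255 bytes long, so 128 array slots are always enough and no heap fallback is shown. The names canonLess, scanLabels, dnsToLower, ourpos and rhspos are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Fold only A-Z to lowercase; every other byte compares as an unsigned 8-bit value.
inline unsigned char dnsToLower(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c + ('a' - 'A')) : c;
}

// Step 1: record the offset of every length byte; returns the number of labels.
// A legal name is at most 255 bytes and a label takes at least 2, so 128 slots suffice.
inline std::size_t scanLabels(const std::string& name, std::size_t (&pos)[128])
{
  std::size_t count = 0;
  for (std::size_t p = 0; p < name.size() && count < 128;
       p += 1 + static_cast<unsigned char>(name[p])) {
    assert(p + 1 + static_cast<unsigned char>(name[p]) <= name.size()); // well-formed input
    pos[count++] = p;
  }
  return count;
}

// True if 'ours' sorts before 'rhs' in canonical order.
bool canonLess(const std::string& ours, const std::string& rhs)
{
  std::size_t ourpos[128], rhspos[128];        // on the stack, so no malloc
  std::size_t ourcount = scanLabels(ours, ourpos);
  std::size_t rhscount = scanLabels(rhs, rhspos);
  auto less = [](char a, char b) { return dnsToLower(a) < dnsToLower(b); };

  // Steps 2 to 7: walk both position arrays from the last label backwards.
  while (ourcount > 0 && rhscount > 0) {
    const std::size_t op = ourpos[--ourcount], rp = rhspos[--rhscount];
    const char* obeg = ours.data() + op + 1;   // one past the length byte
    const char* rbeg = rhs.data() + rp + 1;
    const char* oend = obeg + static_cast<unsigned char>(ours[op]);
    const char* rend = rbeg + static_cast<unsigned char>(rhs[rp]);
    if (std::lexicographical_compare(obeg, oend, rbeg, rend, less))
      return true;                             // our label is smaller
    if (std::lexicographical_compare(rbeg, rend, obeg, oend, less))
      return false;                            // our label is larger
  }
  // All compared labels were equal: the name that ran out of labels first is smaller.
  return ourcount == 0 && rhscount > 0;
}
```

This version compares each label pair twice to tell 'smaller' from 'larger', which keeps the sketch short. A single three-way pass over each label would do the same work once.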
So, the big question of course: is it fast enough? After a little bit of tuning, the canonical comparison function implemented as above is just as fast as the 'naive' human order comparison. Loading a zone again takes 8 seconds. This is faster than you'd expect, but it turns out our canonical comparison function inlines better, since the original version secretly used a C library function for case insensitive comparisons.
I hope this has been helpful - either do what Knot does, if you can get away with it, and then it is super fast, or ponder our suggested stack based array solution.
Good luck!
That's why I love C++ :)
especially C++ 2011/2014!
Hi Bert,
I think the comparison can be done without setting up the ourpos, rhspos auxiliary arrays. I've implemented the stuff at
http://coliru.stacked-crooked.com/a/2143706a4e6d8e90
You might want to check if this gains you some extra performance.
I see that you have a fallback variant when ourpos is not large enough. In the same spirit, here's a version with an analogous fallback to avoid excessive recursive depths:
http://coliru.stacked-crooked.com/a/6f17856a7340fcb2
The good thing about canon_dns_slow_compare_same_num_segments is that it doesn't allocate dynamic memory. My measurements indicate that it is in fact almost as fast as canon_dns_fast_compare_same_num_segments! (although with real data results can be different).