Wednesday, November 11, 2009

When DNS is cool and when it is not

Whenever massive query rates are desired for globally distributed data, with high redundancy and built-in positive and negative caching, people think of DNS. Popular examples are of course our day-to-day use of the Domain Name System (which is a lot more than a protocol) to look up IP addresses, but also the tremendous volume of blocklist lookups (RBLs) used to determine whether an IP address is likely to be a source of spam.
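To make the RBL case concrete, here is a minimal Python sketch of how such a lookup name is usually formed: the octets of the IPv4 address are reversed and prepended to the blocklist's zone. The zone `rbl.example.org` and the function name are made up for illustration, and the sketch only builds the name; it does not send any query.

```python
import ipaddress


def rbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    # Validate the address first; this raises ValueError on malformed input.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Blocklists are conventionally queried with the octets in reverse order.
    return ".".join(reversed(octets)) + "." + zone


# 'rbl.example.org' is a hypothetical zone, not a real blocklist.
print(rbl_query_name("192.0.2.1", "rbl.example.org"))
# prints 1.2.0.192.rbl.example.org
```

Whether the address is listed is then just a question of whether that name has an A record, which is exactly the kind of small question DNS is good at.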

In addition, "ENUM" has been designed to share reachability information for phone numbers over DNS, telling you, for example, which SIP identity can be used to reach the owner of a phone number over VoIP. This needs many of the aforementioned features of DNS, like high query rates, redundancy and caching.
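As a rough illustration of how ENUM maps a phone number into the DNS, the sketch below reverses the digits of the number, separates them with dots and appends `e164.arpa`. The phone number and the function name are invented for the example; the actual lookup (normally for NAPTR records on the resulting name) is left out.

```python
def enum_domain(phone_number: str) -> str:
    """Turn an E.164 phone number into its ENUM domain name."""
    # Keep only the digits; '+', spaces and dashes are dropped.
    digits = [c for c in phone_number if c in "0123456789"]
    if not digits:
        raise ValueError("phone number contains no digits")
    # Least significant digit first, one label per digit.
    return ".".join(reversed(digits)) + ".e164.arpa"


# An invented number, used only to show the shape of the name.
print(enum_domain("+31 15 1234567"))
# prints 7.6.5.4.3.2.1.5.1.1.3.e164.arpa
```

Because every digit is its own label, the name follows the numbering hierarchy: the country code sits closest to `e164.arpa` and the subscriber digits are furthest from it.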

Periodically, people ponder storing other things in DNS, most often because they are attracted to the huge query rates, built-in distribution, redundancy and caching. And indeed, these are the things that make the DNS very attractive.

In addition, DNS passes more firewalls by default than almost any other protocol, because the network's resolver acts as a sanctioned proxy.

It turns out, however, that there are severe limits to what you can do within DNS while retaining the attractive bits.

High query rates
Even with a very limited investment, it is possible to build solutions based on DNS enabling one to ask and answer over a million queries per second. Building such functionality on top of a SQL database would be an order of magnitude more expensive (at least).

Among the reasons why DNS can support such tremendous speeds is its use of the connectionless UDP protocol, which means that a question fits in a single packet, as does the answer. A TCP/IP session goes through at least 6 packets to achieve the same thing.
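To show how little there is to a DNS question, here is a minimal Python sketch that builds a query by hand and, optionally, sends it as a single UDP datagram. The function names, the fixed query ID and the timeout are choices made for this example only, and `ask()` needs a reachable resolver address that you supply yourself.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query packet for one name and one record type (1 = A)."""
    # Header: ID, flags (only 'recursion desired' set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Question: the name, the type, and class 1 (IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def ask(resolver_ip: str, name: str) -> bytes:
    """Send one query packet and wait for one answer packet (not parsed here)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(2.0)
        sock.sendto(build_query(name), (resolver_ip, 53))
        answer, _ = sock.recvfrom(512)
        return answer


packet = build_query("www.example.com")
print(len(packet))
# prints 33
```

The whole question for `www.example.com` is 33 bytes: a 12 byte header, 17 bytes of name and 4 bytes of type and class. No handshake is needed before sending it and no teardown afterwards.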

Passing firewalls
Almost all network environments have DNS connectivity to the outside world, often via the network's resolvers. In addition, these resolvers typically have an undisturbed view of UDP port 53 to the outside world. However, they often do not have such unfettered access to TCP or ICMP.

This is important because UDP packets have severe constraints on their size, with 1500 bytes (the usual Ethernet MTU) being the maximum before stuff needs to happen. The stuff that needs to happen either entails sending fragments (which have a hard time passing firewalls), or moving to TCP (which is blocked far more often than UDP for DNS).

DNS has a lot of rules
DNS was originally a replacement for the (then) famous HOSTS.TXT file, which contained IP addresses for host names that people wanted to share with the internet. This file was lovingly maintained by hand, and periodically downloaded by everybody.

When this no longer proved to be sustainable, the DNS was created so everybody could administer their own names, and publish these in an automated fashion.

Look closely, however, and the DNS shows its HOSTS.TXT roots. Even though each 'top level domain' can have its own set of servers, in the end the DNS fundamentally assumes it is one uniform list of records ('HOSTS.TXT'). This means that if the root says the nameservers for everything ending in NL are X, Y and Z, then if you ask any of X, Y and Z what the nameservers for NL are, it *has* to answer X, Y and Z (it may add U, V and W to the answer, though).

What it may NOT do is say 'oh, NL, I handed that over to servers A, B and C, ask them', because this would violate the 'HOSTS.TXT' view of the DNS, where everything in the root zone has to be identical to the stuff at the lower level.

DNS can only answer simple questions
DNS basically knows only one question: 'Do you have information of type X about name Y?'. And as an answer, you'll get all the information about Y of type X that fits in the answer packet. There is no way to say 'give me all names Y that have type X', for example. Nor is there a way to ask for all names that start with 'www'.
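One way to picture this is as a lookup table keyed on the exact (name, type) pair and nothing else. The toy Python model below is only an analogy with made-up records; it is not how any real nameserver stores its data.

```python
# Made-up records, keyed on the exact (name, type) pair.
RECORDS = {
    ("www.example.com", "A"): ["192.0.2.10"],
    ("example.com", "MX"): ["10 mail.example.com"],
    ("www.example.net", "A"): ["192.0.2.20"],
}


def query(name: str, rtype: str) -> list:
    """Answer the only question DNS knows: data of type rtype for this exact name."""
    # Names are case insensitive and may be written with a trailing dot.
    key = (name.lower().rstrip("."), rtype)
    return RECORDS.get(key, [])


print(query("WWW.example.com.", "A"))  # prints ['192.0.2.10']
print(query("www", "A"))               # prints []
```

Asking for 'www' does not return the two names that start with 'www'; it is simply a different name that has no data. Questions like 'all names with an A record' have no query form at all in this model, just as in the DNS.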

You can't mirror the DNS
The DNS is a fully distributed system, and one that can only answer simple questions (see above). There is no reliable way to make a complete copy of the DNS. This means that in order to use it, one has to rely on working network connectivity, and also has to trust other people's systems.

Unlike, say, a SQL database, it is not possible to have a full copy that still works without network connectivity.

So - what do these limitations mean?
Summarising - we like DNS because it is really fast, easily distributed, well cached and passes firewalls easily. However, the above means that if we want to keep all these cool features:
  • Responses to DNS queries must be small. Large answers mean UDP can't be used, which in turn means a significant slowdown because TCP needs so many more packets. In addition, TCP has a far harder time passing firewalls.
    Fundamentally, this means not storing photographs or other large things in DNS.
  • We must only ask simple questions that have direct answers.
  • Our questions and data distribution must fit the DNS rules.
    This means we can't "redelegate". A practical problem that runs into this restriction is so-called telephone number portability, where a phone number jumps outside of the hierarchy and is suddenly served by a wholly different company.
  • We must accept that queries will leave our network, and that we can't have an 'offline copy'.
All in all - this means that quite a lot of problems do not fit the constraints that DNS imposes.

But anytime you have simple questions, with small answers and you dare to rely on other people's servers, plus do not desire 'redelegation', DNS may be your best bet.

Some alternatives
Slightly more advanced than DNS is LDAP, which offers the possibility of asking more complicated questions. Slightly *less* advanced than DNS is memcached, which does however share the very high performance and easy redundancy. It does not offer delegation though.
