Friday, February 13, 2015

Some notes on sendmsg()

This post is mostly so other people can save themselves the two days of pain PowerDNS just went through.

When sending or receiving datagrams with metadata, POSIX offers us sendmsg() and recvmsg(). These are complicated calls, but they can do quite magical things. They are the "where does this go?" part of the socket API: functionality that fit nowhere else ended up in them, and a lot of it did.

For example, when you bind to 0.0.0.0 or ::, you receive datagrams sent to any address on the port you bound to. But when you reply, you need to pick the right source address, because otherwise your response might leave from a different IP address than the one the question was sent to. recvmsg() and sendmsg() can make this happen for you. We documented how this works elsewhere.
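As an illustration of that pattern, here is a minimal sketch assuming Linux, where the IP_PKTINFO socket option makes recvmsg() report the destination address of each datagram and lets sendmsg() choose the source address of the reply. Other systems (FreeBSD among them) may need different options for this. The helper names echo_from_dst() and pktinfo_demo() are hypothetical, and the demo assumes a working loopback interface.

```c
#define _GNU_SOURCE /* struct in_pktinfo on glibc; Linux-specific sketch */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: receive one datagram on fd and echo it back, using
   the address it was sent to as our source address. Assumes fd is bound to
   0.0.0.0 with IP_PKTINFO enabled (Linux). Returns bytes sent, or -1. */
static ssize_t echo_from_dst(int fd)
{
    char data[512];
    struct sockaddr_in peer;
    struct in_pktinfo pi, out;
    int have_pi = 0;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union { char buf[CMSG_SPACE(sizeof(struct in_pktinfo))]; struct cmsghdr align; } ctl;
    struct msghdr mh;
    memset(&mh, 0, sizeof(mh));
    mh.msg_name = &peer;
    mh.msg_namelen = sizeof(peer);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctl.buf;
    mh.msg_controllen = sizeof(ctl.buf);

    ssize_t n = recvmsg(fd, &mh, 0);
    if (n < 0)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&mh); c != NULL; c = CMSG_NXTHDR(&mh, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(&pi, CMSG_DATA(c), sizeof(pi)); /* ipi_addr = where it was sent */
            have_pi = 1;
        }
    }
    if (!have_pi)
        return -1;

    /* Reuse mh for the reply: msg_name still holds the sender's address.
       Setting ipi_spec_dst asks Linux to use that address as our source. */
    memset(&out, 0, sizeof(out));
    out.ipi_spec_dst = pi.ipi_addr;
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_len = (size_t)n;
    mh.msg_controllen = sizeof(ctl.buf);
    struct cmsghdr *c = CMSG_FIRSTHDR(&mh);
    c->cmsg_level = IPPROTO_IP;
    c->cmsg_type = IP_PKTINFO;
    c->cmsg_len = CMSG_LEN(sizeof(out));
    memcpy(CMSG_DATA(c), &out, sizeof(out));
    return sendmsg(fd, &mh, 0);
}

/* Hypothetical demo: bind to 0.0.0.0, query 127.0.0.1, check the reply's
   source address. Returns 1 on success, 0 otherwise. */
static int pktinfo_demo(void)
{
    int srv = socket(AF_INET, SOCK_DGRAM, 0);
    int cli = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1, ok = 0;
    char buf[16];
    struct sockaddr_in a, from;
    socklen_t alen = sizeof(a), flen = sizeof(from);
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_ANY); /* wildcard; port 0 = any free port */
    if (srv >= 0 && cli >= 0
        && setsockopt(srv, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) == 0
        && bind(srv, (struct sockaddr *)&a, sizeof(a)) == 0
        && getsockname(srv, (struct sockaddr *)&a, &alen) == 0) {
        a.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* send the query to 127.0.0.1 */
        if (sendto(cli, "hi", 2, 0, (struct sockaddr *)&a, sizeof(a)) == 2
            && echo_from_dst(srv) == 2
            && recvfrom(cli, buf, sizeof(buf), 0, (struct sockaddr *)&from, &flen) == 2)
            ok = from.sin_addr.s_addr == htonl(INADDR_LOOPBACK);
    }
    if (srv >= 0) close(srv);
    if (cli >= 0) close(cli);
    return ok;
}
```

The reply reuses the msghdr filled in by recvmsg(), so the sender's address in msg_name comes along for free; only the payload length and the control buffer need resetting.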

So, we learned two important things over the past few days.

Requesting timestamps
To request timestamp information, which is great when you want to plot the *actual* latency of the service you are providing, one uses setsockopt() to set the SO_TIMESTAMP option. This instructs the kernel to deliver packets with a timestamp describing when the packet hit the system. You get this timestamp via recvmsg() by going through the 'control messages' that came with the datagram.

On Linux, the type of the control message that carries the timestamp equals SO_TIMESTAMP, the same constant we passed to setsockopt(). However, this is a lucky accident. The actual type of the message is SCM_TIMESTAMP, and SO_TIMESTAMP == SCM_TIMESTAMP only happens to hold on Linux. On FreeBSD, the two constants differ.

So: to retrieve the timestamp, select the message with type SCM_TIMESTAMP. If you select for SO_TIMESTAMP, you will get no timestamps on FreeBSD.

Datagrams without control messages
Secondly, sendmsg() is not a very well-specified system call. Even though RFC 2292 was written by that master of documentation, Richard Stevens, it does not tell us everything we need to know. For example, we discovered that if you use sendmsg() to send a packet without control messages, on FreeBSD it is not enough to set the length of the control message buffer to 0 (which suffices on Linux).

FreeBSD additionally demands that the control message buffer address be 0 as well. Unless that address is 0, FreeBSD checks that the control message buffer is long enough to hold at least one control message, and a length of 0 fails that check.

So: if you use sendmsg() to send a datagram without any control messages, set both msg_control and msg_controllen to 0. That way your code stays portable.
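A minimal sketch of the portable form, assuming an IPv4 UDP socket. The helper send_plain() and the loopback demo send_plain_demo() are hypothetical names; the point is simply that msg_control is NULL and msg_controllen is 0 together.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram with sendmsg() and no control
   messages. Linux only looks at msg_controllen, but FreeBSD rejects a
   non-NULL msg_control that is too short for a control message, so both
   fields are cleared. */
static ssize_t send_plain(int fd, const void *buf, size_t len,
                          const struct sockaddr_in *dst)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr mh;
    memset(&mh, 0, sizeof(mh));
    mh.msg_name = (void *)dst;
    mh.msg_namelen = sizeof(*dst);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = NULL; /* not just msg_controllen = 0 */
    mh.msg_controllen = 0;
    return sendmsg(fd, &mh, 0);
}

/* Hypothetical demo: send "dns" over loopback and read it back.
   Returns 1 on success, 0 otherwise. */
static int send_plain_demo(void)
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    int ok = 0;
    char got[8];
    struct sockaddr_in a;
    socklen_t alen = sizeof(a);
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0 = any free port */
    if (rx >= 0 && tx >= 0
        && bind(rx, (struct sockaddr *)&a, sizeof(a)) == 0
        && getsockname(rx, (struct sockaddr *)&a, &alen) == 0
        && send_plain(tx, "dns", 3, &a) == 3
        && recv(rx, got, sizeof(got), 0) == 3)
        ok = memcmp(got, "dns", 3) == 0;
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return ok;
}
```

Zeroing the whole msghdr with memset() already clears both fields; the explicit assignments are there so the intent survives later edits to the code.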

We hope the above has been helpful for you.