Friday, October 23, 2015

How to do fast canonical ordering of domain names

A small post to document an arcane subject: how to quickly compare DNS names in canonical order. First, to recap: what is DNS canonical ordering? It is case insensitive, but 8-bit, and based on the labels that make up the DNS name, in reverse order.

So, in human order, xxx.powerdns.com, aaa.powerdns.de and bbb.powerdns.net sort like this:

  1. aaa.powerdns.de
  2. bbb.powerdns.net
  3. xxx.powerdns.com
But in DNS canonical order, they sort like this:
  1. xxx.powerdns.com
  2. aaa.powerdns.de
  3. bbb.powerdns.net
This is because in the canonical order, we look at the 'com', 'de' and 'net' parts first. And only if those are equal do we look at the second-to-last label in the name. RFC 4034 section 6 has all the details. DNS ordering is more than an obscure subject: you need to order your records this way to calculate DNSSEC signatures, for example. If you get the ordering wrong, your signatures won't match.

So how can we do the comparison quickly? The naive way is of course to use one of your language primitives to split up a domain name in labels, reverse the order, and do a case insensitive lexicographical comparison on them. That could look like this in C++ 2011:
 auto ours = getRawLabels(), rhsLabels = rhs.getRawLabels();
 return std::lexicographical_compare(ours.rbegin(), ours.rend(),
                                     rhsLabels.rbegin(), rhsLabels.rend(),
                                     CIStringCompare());
While this is easy enough, it is also astoundingly slow, since it splits up your domain name and does loads of allocations. Loading a zone of 1.4 million records into a container with canonical ordering this way took 40 seconds. Ordering based on the naive case-insensitive human compare loaded it in 8 seconds.

Now, DNS names consist of labels with a length, so www.powerdns.com typically gets stored in a packet as the value 3, then "www", the value 8, then "powerdns", the value 3 and then "com". Note there are no dots in there!

It is highly recommended to also store DNS names as a series of length/label-content pairs, since otherwise you need to do lots of escaping to deal with embedded nulls, embedded dots, etc.

When stored like this, however, it is not straightforward to do a canonical compare. So I asked around among our open source friends, and Marek Vavrusa of CZ.NIC quickly chimed in to explain how the excellent Knot nameserver products do it, and it is quite clever. First, you store the domain in reverse label order, so www.powerdns.com would turn into com.powerdns.www, which would normally look like "3com8powerdns3www" in memory.

However, if you naively compare 3com8powerdns3www with (say) 2de8powerdns3www, you'd decide, based on the '3' versus the '2', that www.powerdns.de sorts before www.powerdns.com, which is wrong.

So the clever bit is to zero out the label length fields, storing the names as '0com0powerdns0www' and '0de0powerdns0www'. Then you can simply do a case-insensitive compare and get the right ordering. And of course there is no need to store the leading 0 in this case.
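As a minimal sketch of this idea (makeCanonicalKey is an invented name, and the actual Knot code differs), building such a key could look like this:
#include <cctype>
#include <string>
#include <vector>

// Sketch: build a Knot-style comparison key. Labels come in human order;
// we reverse them and replace every length byte with a 0.
std::string makeCanonicalKey(const std::vector<std::string>& labels)
{
  std::string key;
  for(auto iter = labels.rbegin(); iter != labels.rend(); ++iter) {
    key.push_back('\0');                        // the zeroed-out length byte
    for(char c : *iter)
      key.push_back(tolower((unsigned char)c)); // lowercase once, compare often
  }
  return key.empty() ? key : key.substr(1);     // no need for the leading 0
}

// makeCanonicalKey({"www","powerdns","com"}) now sorts before
// makeCanonicalKey({"www","powerdns","de"}) with a plain operator<.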

Now, there is a downside to this arrangement: you lose the information about what the domain actually looked like. If there were embedded 0s in the domain name, and there could be, you can't recover the domain name anymore. However, if you don't care, or if you just use this as a key and have a copy of the original domain name somewhere, this works great. Thanks for the explanation, Marek!

PowerDNS uses the most astoundingly great Boost Multi Index container. I had the great pleasure of meeting its author Joaquín Mª López Muñoz recently, and I keep learning more about what is possible with this wonderful container. Boost Multi Index allows us to index objects based on what is in them, without a key that lives separately. So within PowerDNS we like to just use the DNSName that is embedded in an indexed object to sort, and we don't want to zero out the label lengths in there.

After some trial and mostly error, I hit on the following rapid ordering procedure for DNS names stored in DNS native format (so: 3www8powerdns3com).

  1. Scan through both names and note the positions of the label boundaries in a simple stack-based array (so no malloc). Store how many labels each DNS name has.
  2. Starting at the last position in your arrays, which denotes the beginning of the last label, do a lexicographical compare starting at one position beyond the length byte and ending "length byte" bytes after that. Do this for both DNS names.
  3. If this comparison leads to 'smaller', your DNS name is definitely smaller. If it leads to 'larger', your DNS name is definitely not smaller and you are done. 
  4. Otherwise, proceed one place back in the array of lengths of both names.
  5. If you ended up at position 0 for one name but not yet for the other, that name is smaller and you are done.
  6. If you ended up at position 0 for BOTH DNS names, neither is smaller than the other.
  7. Go to 2 (except don't look at the last label, but at the 'current' position).
To make this safe, either make the two arrays sufficiently large that no legal DNS name could overflow them, or use something plausible as a maximum and fall back to allocating on the heap if your name is long enough to warrant it.

In code, this looks something like the sketch below. So, the big question of course: is it fast enough? After a little bit of tuning, the canonical comparison function implemented as above is just as fast as the 'naive' human-order comparison. Loading a zone again takes 8 seconds. This is faster than you'd expect, but it turns out our canonical comparison function inlines better, since the original version secretly used a C library function for case-insensitive comparisons.
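Here is a minimal sketch of that procedure, assuming valid, uncompressed wire-format names as input; canonicalLessThan and the 128-slot arrays are illustrative, not the actual PowerDNS code:
#include <cctype>
#include <cstdint>

// Sketch: canonical 'less than' on wire-format names like
// "\x03www\x08powerdns\x03com\x00". No heap allocations anywhere.
bool canonicalLessThan(const uint8_t* a, const uint8_t* b)
{
  // Step 1: note the label boundaries in stack-based arrays. A legal
  // DNS name has at most 127 labels, so 128 slots are ample (do
  // validate the input first!).
  const uint8_t* aLabels[128]; int aCount = 0;
  const uint8_t* bLabels[128]; int bCount = 0;
  for(const uint8_t* p = a; *p; p += *p + 1) aLabels[aCount++] = p;
  for(const uint8_t* p = b; *p; p += *p + 1) bLabels[bCount++] = p;

  // Steps 2-7: walk both arrays backwards, so last label first.
  int i = aCount - 1, j = bCount - 1;
  for(; i >= 0 && j >= 0; --i, --j) {
    const uint8_t *la = aLabels[i], *lb = bLabels[j];
    int minLen = *la < *lb ? *la : *lb;
    for(int k = 1; k <= minLen; ++k) {       // skip the length byte itself
      int ca = tolower(la[k]), cb = tolower(lb[k]);
      if(ca != cb)
        return ca < cb;                      // definitely smaller, or not
    }
    if(*la != *lb)
      return *la < *lb;                      // shorter label sorts first
  }
  // One name ran out of labels before the other: fewer labels sorts first.
  return i < j;
}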

I hope this has been helpful. Either do what Knot does, if you can get away with it, and then it is super fast, or ponder our suggested stack-based array solution.

Good luck!








Monday, August 10, 2015

Startups don’t win RFPs: here’s why you might want to do one anyhow

(I’d like to thank Dirk Peeters who taught me most things RFP and Remco van Mook for commenting on and improving this post.)

As I find myself in RFP-land again, I've been pondering how my previous startups spent tremendous amounts of time working on these Requests For Proposals from huge customers. Enough so to warrant a blog post that may be helpful for current startups: how to choose between small customers and large customers, like governments and telcos with procurement departments.

Now, there are of course rare startups that sell straight to consumers, and their game is different. These typically are the startups everyone knows about, because they deal with customers (you) directly. But chances are your startup either sells to businesses, or will need to reach the consumer through established distributors or vendors that embed your stuff into their product. If you plan to sell to consumers directly, this post is not for you.

For context, most startups eventually have a ‘minimum viable product’ (MVP), or at least something that strives to be that. Product/market fit has not been achieved, let alone a perfect match. In other words, customers may want things you have not gotten round to offering, or did not know they wanted. And meanwhile you added lots of stuff that they perceive as excess baggage. Not only is the product not perfect, neither is your knowledge of the market. There may not BE a market yet!

At this stage, there should be good contact with potential customers already. There are lots of small ones to talk to and far fewer very large ones. Who should you spend your time on?

Everyone discovers early that lofty revenue/profit goals will not be achieved with smaller customers in a reasonable timeframe. If you need thousands of business engagements to get to where you want to be, a startup-sized salesforce is not going to deliver them. In fact, the salesforce that could make this happen does not want to work with you. Good salespeople work with companies with established products, so they know they can make quota (READ THIS LINK, by the way).

So when a potential ‘whale’ of a customer comes along, it is tempting to jump on that and give it all you got. And here is where I want to warn you. Large corporations and governments typically ‘tender’ deals. They don’t just pick a vendor, test their stuff, and make the deal. Instead, they write out a very confusing and conflicting list of requirements and instructions, and send that to any interested parties. Which may include you!

A typical RFP-process includes a spreadsheet filled with hundreds or thousands of numbered requirements, a set of documents outlining the procedure, and a number of questions like ‘outline security architecture of the product’, ‘provide copy of your sustainable sourcing policy’, and ‘describe in detail how the system deals with errors’. Next up might be a clarification meeting, where you can ask questions about the requirements and procedure. You then send in the huge stack of requested documents, after which you might be invited to present your company in person. This is then followed by interminable rounds of negotiations, references, proof of concept sessions etc.

Now, if your product is struggling to find a market (and at the beginning, it WILL be), this sure feels like traction! We’re getting somewhere, we have a potential customer, they have requirements, we can try to meet them, we have to show up for presentations etc. It almost feels like the real thing!

In my startups, I have wasted MONTHS on these processes. Turns out, however, that startups don’t win RFPs. Not until the word ‘startup’ starts to feel wrong for your (by now) serious company.

So why don’t you win an RFP as a small startup? For one, there is the kind of company that inflicts the RFP process on itself. These are not dynamic places. These are not the organisations that want to give a startup a chance. That’s why they do an RFP: to make sure nothing is bought without having it in (credible) writing that the product will do what it promises. You mostly sell to the procurement department, not the actual user. And no matter how fab your product or service, the procurement department sees only risk in your startup. For one, they will try to check your financials for the past three years. You have not HAD three years!

The second reason you don’t win is that an RFP is a compendium of every requirement someone ever voiced in the company. Hundreds of them at least. And this strongly favours incumbent vendors, who have had years or decades to add every such feature under the sun, whether it makes sense or not. The deck is stacked against you.

The third reason you don’t win an RFP is that it is typically heavily lobbied by existing relationships, making sure that only one vendor qualifies, or that new challengers (you) are immediately disqualified from the process because you don’t have 100 staff, over 10 million in annual revenue or 5 years of profitable business behind you.

In addition, the RFP process is highly depressing:

  • Seeing a list of features you don’t have and won’t have anytime soon is painful
  • Many requirements are in fact nonsense (‘system MUST be redundant against power failures’ - customer is trying to procure software!) - which makes you wonder about the state of the world
  • Finding out you didn’t actually have a chance because you are a startup is a blow

But the siren song of the RFP is still tempting for the business-to-business startup, since it sure feels like progress and traction! It may be hard to resist if no other actual sales are going on. So, here are some reasons why it might make sense to participate in an RFP anyhow:

  • You get a free list of competitor features! Most of them show up as requirements (see the lobbying above)
    • In general, the process is great ‘competitive intelligence’ - although this works both ways: your competition often learns about you too!
  • The whole process is very educational about how large customers think and operate, something most startup employees have little experience with
  • Attempting to meet the giant list of requirements is a great motivator for your development team, finally something concrete to aim for
  • The documents requested in an RFP might come in handy anyhow, like that ‘high-level overview of your architecture’. And with pressure, such documents get written a lot faster
  • It gives your salespeople something to do other than moan about lack of traction, although the flip side of this is that they waste their time on the RFP and don’t get any actual business done
  • Frequently, a whole RFP process fails (very frequently by the way, much more than you’d think), and if you managed to make a great impression, you might get invited to the ‘afterparty’ and do business anyhow

But always manage the process carefully. Taking part in such a large process can swallow all the time and resources of a small startup, and in the end you might have little to show for it. Be sure to drop out in time when it isn’t working. It’s better to lose quickly than not to win slowly. And in any case don’t neglect the rest of your business as the process goes on! Also, do realize that even if you send in a compliant RFP response, it still only sits in the sales pipeline. It is not a purchase order.

Finally, there is the risk that you might actually win! And that is the point where all those ‘FULLY COMPLIANT’s you optimistically put in the spreadsheet come back to haunt you: you don’t get paid until you are actually fully compliant! That, and the potentially huge size of the deal, could well overwhelm your startup.

So, getting back to the beginning of the post: the smaller customers that don’t fit your lofty long-term revenue goals. Well, they are your path to the market. For one, because they themselves are smaller, they don’t feel bad about doing business with small companies. In fact, when a small company tries to do business with a huge one, they feel they don’t get the attention they deserve.

Also, because no (formal) procurement department sits in between, if you find a small customer with strange requirements, you can talk to the people with the actual requirement and figure out what they mean, or convince them to drop it.

This does not mean your initial customer should necessarily be tiny. They might even be pretty large, as long as they are still procuring things ‘humanly’, and not by spreadsheet with macros that prevent you from entering explanations (not making this up). Your first goal is to get ANY revenue - it will help you sustain your business or help show (current and future) investors that you really are moving the needle.

Once you’ve established yourself through several approachable launching customers, you might start winning RFPs. And it still won’t be fun, but it will get you to your financial targets.

Good luck!





Wednesday, July 15, 2015

Developing open source: don't listen to the people that want you to live under a bridge

This post is for open source developers and all other people working on open source, while trying to make a living. You may also get referred to this page if you made an open source developer unhappy with your demands.

Let's start at the very beginning: authoring, documenting, packaging and supporting any software takes stupendous amounts of time. Quality requires serious, dedicated and sustained effort.

Secondly, we need to realise that people also need to eat, live somewhere, get health insurance and often eventually raise and support a family. It would also be great if they saved for their retirement.

Combining these two, doing a non-trivial open source project requires more than 'evening hours and weekend work'. It requires people dedicated to the task. But they also need to make money to live!

And eventually this collides with some folks' expectations of open source. It turns out you generally can't live on charitable donations, and I'm not even sure if you should - donations come and go, and they may also come with expectations that are contrary to those of your actual users. Most large open source projects will therefore need to make money the traditional way: by actually selling something.

There are loads of things you can sell. Consulting, support, new features (even open source, people will pay to get the features they need), training, training materials, value added services, perhaps even some non-free software on top of the stuff you give away. But no matter how you do it: if you ask money for things, some people who can't or won't pay get left out in the cold.

And this frequently leads to anger. People will accuse you of selling out, and this hurts. They may even claim you are stealing from the community. Oddly enough, they will also threaten to stop using your software! And all of this because you try to make a living so you can provide this great open source stuff for free.

So here's my word of encouragement: there is a segment of the open source community that you will never appease. They won't be satisfied until you live under a bridge, sell your body by day so you can code by night. For free.

You won't ever make these people happy. Whatever you do, it will not be free enough, and you should always do more. Their threat to stop using your software should tell you everything you need to know about them. Finally also realise that more often than not, the very people that accuse you of selling out work for horrible companies that would not DREAM of committing anything back to the community!

So separate out these people that want you to live on the streets from the parts of the community you should be listening to. The latter will help you navigate the complex and challenging landscape of 'making money with open source'. But if you try to make the wrong people happy, you'll fail, and you and your software will end up badly.

(On a side note - open source is a community, not just business. There is no need to sell all or even the majority of your work. Some things are just a great idea, and you should add them to your software. Also ponder that users who need other stuff from you might "pay" you in Q&A, documentation work, (performance) testing, etc. So don't get me wrong - this post is about making a living, not about asking money for everything!)

Summarising: making money with open source is ok, because delivering quality for a non-trivial project costs time, and that time can't come after dinner when you are tired from your day job. It should be a real job, and that requires income. Don't feel bad about it and don't try to make the wrong people happy. Do listen to the rest, as there is real tension between open source and making money, and they can guide you.

Good luck!



Saturday, June 6, 2015

Design of DelayPipe, a small class to let you execute millions of things in the near future

When developing, quite frequently we run into the situation where we want to delay an action a little bit, perhaps for half a second. If your software is not highly concurrent, this is as easy as a call to one of the more granular sleep() calls now available.

But what if you are facing 1000 DNS queries per second and want to delay answering all of them by a few hundred milliseconds? We can't spawn 1000 threads per second and just run nanosleep() on them!

This problem is all the more vexing because many end-users find it very easy to say the words 'just delay the answers a bit!', without realizing that this is in fact not an easy thing to do at high concurrency.

This week, we saw Pavel Odintsov run into this exact issue: nameservers from a big search engine were flooding him with questions, and Pavel was looking for solutions that would not involve actually blocking the search engine. We suggested adding a delay in answering queries, since this frequently shuts up 'back to back' generators of questions.

But then we actually had to do it. PowerDNS has a nice office these days, it takes me 30 minutes of cycling to get there, and I find those 30 minutes well suited to speccing out interesting solutions to problems.

What I came up with has now been implemented. It does have downsides, but it is remarkably simple and handles millions of delayed events per second with ease.

First, we employ a trick I documented back in 2007: using a pipe within the same process to transmit pointers. This gives you 'free' communication between threads, and since you pass pointers over the pipe, there is no need to serialize anything. We use the pipe to send a tuple containing the event and when it needs to be executed.

The nice thing too is that pipes guarantee that small writes are atomic. So we can have as many threads as we want put events in the pipe, without any locking on our side, and they will arrive at the worker thread uninterleaved. And no locking (in our code at least) is always a win!
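As a minimal sketch of the producer side (names invented, not the actual DelayPipe code):
#include <cstdlib>
#include <unistd.h>

struct DelayedEvent;    // the event, plus when it must run

static int g_fds[2];    // set up once with pipe(g_fds); [0]=read, [1]=write

void submitEvent(DelayedEvent* ev)
{
  // A write of sizeof(pointer) bytes is far below PIPE_BUF, so POSIX
  // guarantees it is atomic: any number of producer threads can call this
  // concurrently and the worker will read whole, uninterleaved pointers.
  if(write(g_fds[1], &ev, sizeof(ev)) != (ssize_t)sizeof(ev))
    abort();            // error handling elided in this sketch
}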

As an aside, we've been using the pipe trick since 2007 or so, and it has never shown up in any benchmarks (except in a positive fashion). However, people frequently associate pipes with "slow text based command line monstrosities", and assume pipes must be slow. In fact, pipes are core operating system infrastructure, and they are blazing fast.

Getting back to our 'DelayPipe', the worker thread reads pointers from the pipe, and puts them in a time ordered container. Subsequently, there is the question of how to 'wait until the first event that needs to happen'. And again, we can reuse some POSIX semantics, as follows.

We don't immediately read from the pipe, but we call poll() on it first with a timeout, and that timeout is equal to the amount of time we must wait until the first event needs to be executed. And if there is no work already queued, we wait infinitely long. For safety, we also check if there is actual work in the queue that is overdue for happening and in that case we don't read from the pipe at all. Once poll() either tells us we have a new event, or a timeout happened, we execute all events that have reached their appointed moment.
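Continuing the sketch above, the worker loop could look something like this (the DelayedEvent layout and callback are illustrative):
#include <chrono>
#include <map>
#include <poll.h>
#include <unistd.h>

using Clock = std::chrono::steady_clock;

struct DelayedEvent {            // fleshing out the declaration above
  Clock::time_point deadline;    // when it must run
  void (*callback)();            // what to run
};

void workerThread()
{
  std::multimap<Clock::time_point, DelayedEvent*> queue;   // time-ordered
  for(;;) {
    int timeoutMsec = -1;                    // empty queue: wait forever
    if(!queue.empty()) {
      auto wait = std::chrono::duration_cast<std::chrono::milliseconds>(
        queue.begin()->first - Clock::now()).count();
      timeoutMsec = wait < 0 ? 0 : (int)wait;
    }
    // If work is already overdue (timeout 0), don't read from the pipe at all.
    struct pollfd pfd = {g_fds[0], POLLIN, 0};
    if(timeoutMsec != 0 && poll(&pfd, 1, timeoutMsec) > 0) {
      DelayedEvent* ev;
      if(read(g_fds[0], &ev, sizeof(ev)) == (ssize_t)sizeof(ev))
        queue.emplace(ev->deadline, ev);
    }
    // Execute everything that has reached its appointed moment.
    while(!queue.empty() && queue.begin()->first <= Clock::now()) {
      DelayedEvent* ev = queue.begin()->second;
      queue.erase(queue.begin());
      ev->callback();
      delete ev;                 // new'd by the producer, freed here
    }
  }
}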

With this simple setup, we get a solution that is thread-safe, since many threads can put things in the pipe simultaneously. We do have only one executor of delayed events, which might be unacceptable if the execution of events is actually slow. However, we could easily spawn more of these threads, and give them their own pipe.

A potential downside is that this setup malloc()s stuff in a producer thread and free()s them in the worker thread, something that is known to be a heavily locked and potentially slow operation in many mallocs. Benchmarking on recent platforms however has not shown any appreciable overhead because of this.

Finally, after only a 30 minute trip on my bike and half a day of coding, we were able to get this up and running.


The actual code can be found here and here. The testcase shows how it works in practice. I hope this has been useful for you!

PS: before you cry "you don't know how deep the pipe is!", please do realize we don't actually use the pipe for queueing; we have a separate container for that. The pipe is only a nice way to get easy producer/consumer communication between threads with sane semantics.

Thursday, March 5, 2015

Some notes on shared_ptr atomicity and sharing configuration state

At PowerDNS, we've frequently run into this problem: a program has a complicated amount of state and configuration which determines how queries are processed, which happens non-stop. Meanwhile, occasionally we need to change this configuration, while everything is running.

The naive solution to this problem is to have a state which we access under a read/write lock. The state can in that case only be changed if no thread holds a read lock on it. This has at least two downsides. For one, locks aren't free. Even if they don't involve system calls, atomic operations cause inter-CPU communications and cache evictions. Secondly, if the worker threads hog the read lock (which they may need to do for consistency purposes), we can't guarantee that updates happen in a reasonable timeframe.

Effectively this means that a change in configuration might take a very long time, while we incur overhead every time we access the configuration, even if it isn't changing.

A very, very tempting solution is to keep the configuration in a shared_ptr, and have threads access the configuration through this shared_ptr. This would give us unlocked access to a consistent configuration. And if we read the C++ 2011 standard, it looks like this could work: it talks about how std::shared_ptr is thread safe under various scenarios. Simultaneously, the standard defines atomic update functions for shared_ptr (20.7.2.5), which are sadly unimplemented in many modern compilers. This is a hint.

So here is what one would hope would work:
if(!g_config->acl.check(clientIP)) dropPacket();
Where the global g_config would be something like shared_ptr<Config>. If the user updates the ACL, we would do this to propagate it:
auto newConfig = make_shared<Config>(*g_config);
newConfig->acl = newACL;
g_config = newConfig;
And we would fervently hope that the last statement was atomic in nature, so that a user of g_config either gets the old copy or the new copy, but never anything else. And this would be right in at least 999,999 out of 1 million cases. And in that other case, we crash. I know because I wrote a testcase for this this afternoon.

It turns out that internally, a shared_ptr consists of reference counts and the actual object. And sadly, when we assign to a shared_ptr, the reference counts and the object get assigned separately, sequentially. And a user of g_config above might thus end up with a shared_ptr in an inconsistent state that way.

By tweaking things a little bit, for example by utilizing swap(), you can increase the success rate of this mode of coding to the point where it fails almost never. You could fool yourself into thinking you solved the problem. Over at PowerDNS we thought that too, but then suddenly CPUs and compilers change, and it starts breaking again, leading to hard-to-debug crashes.

So, to summarise, whatever the C++ 2011 standard may or may not say about shared_ptr, as it stands in 2015, you can't atomically change a shared_ptr instance while someone tries to use it.

And of course we could add an RW-lock to our every use of g_config, but that would get us back to where we started, with heavy locking on everything we do.

Now, in general this problem (infrequent updates, non-stop access) is very well known, as is the solution: Read Copy Update. I'm not a big fan of software patents (to say the least), but I'll lovingly make an exception for RCU. IBM released the patent for use in GPL-licensed software, and unlike most patents, this one doesn't only prohibit other people from doing things: RCU also tells you exactly how to do it well. And RCU is sufficiently non-obvious that you actually need that help to do it well.

Now, the full glory of RCU may be a bit much, but it turns out we can very easily get most of its benefits:

  • Lock the g_config shared_ptr before changing it (this can be a simple mutex, not even an RW one, although it helps) 
  • Have the threads make a copy of this g_config ten times per second, fully locked. 
  • The threads actually only access this (private) copy
This means that if the configuration is changed, the operational threads will continue with the old configuration for at most 0.1 second. It also means that no matter how staggering the overhead of a lock is, we incur it only ten times per second. Furthermore, since the lock is only held very briefly for a copy, the updates will also happen very quickly.

In this way, we don't rely on unimplemented atomic shared_ptr functions, but we do get all the benefits of almost completely unlocked operations. 
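Here is a condensed sketch of that recipe; getConfig(), updateConfig() and the Config type are illustrative, and the real implementation (linked below) differs:
#include <chrono>
#include <memory>
#include <mutex>

struct Config { /* ACLs, settings, ... */ };

std::mutex g_mut;                  // guards g_config
std::shared_ptr<Config> g_config;  // the master copy, only touched under g_mut

const Config& getConfig()
{
  using namespace std::chrono;
  thread_local std::shared_ptr<Config> t_copy;  // this thread's private copy
  thread_local steady_clock::time_point t_last;
  auto now = steady_clock::now();
  if(!t_copy || now - t_last > milliseconds(100)) {  // refresh 10x per second
    std::lock_guard<std::mutex> lock(g_mut);         // brief and infrequent
    t_copy = g_config;
    t_last = now;
  }
  return *t_copy;                                    // unlocked fast path
}

void updateConfig(std::shared_ptr<Config> newConfig)
{
  std::lock_guard<std::mutex> lock(g_mut);
  g_config = std::move(newConfig);
}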

UPDATE: Many people have pointed out that instead of copying "10 times per second", you could refresh the copy whenever an atomic "generational" global counter no longer matches the local one. But some potential synchronisation issues linger in that case (you might miss a second, very rapid change, for example). So while interesting, we do lose simplicity in this case.

UPDATE: https://github.com/ahupowerdns/pdns/blob/dnsname/pdns/sholder.hh#L4 has the code for this idea.

Summarising: don't attempt to rely on potential shared_ptr atomic update behaviour. Instead, copy it infrequently, but frequently enough that changes in configuration propagate swiftly, and not so frequently that the locking overhead matters.

Enjoy! And if you know about the implementation plans and status of the atomic_load etc family of functions for shared_ptr in the various popular compilers, please let me know!

UPDATE: Maik Zumstrull found this thread about the atomic shared_ptr operations in gcc.


Friday, February 13, 2015

Some notes on sendmsg()

This post is mostly so other people can save themselves the two days of pain PowerDNS just went through.

When sending or receiving datagrams with metadata, POSIX offers us sendmsg() and recvmsg(). These are complicated calls, but they can do quite magical things. They are the "where does this go" part of the socket API. And a lot went there.

For example, when you bind to 0.0.0.0 or ::, you receive datagrams sent to any address on the port you bound to. But if you reply, you need to know the right source address, because otherwise you might send back a response from a different IP address than the one that received the question. recvmsg() and sendmsg() can make this happen for you. We documented how this works here.

So, we learned two important things over the past few days.

Requesting timestamps
To request timestamp information, which is great when you want to plot the *actual* latency of the service you are providing, one uses setsockopt() to set the SO_TIMESTAMP option. This instructs the kernel to deliver packets with a timestamp describing when the packet hit the system. You get this timestamp via recvmsg() by going through the 'control messages' that came with the datagram.

On Linux, the type of the control message that delivers the timestamp is equal to SO_TIMESTAMP, just like the option we passed to setsockopt(). However, this is a lucky accident. The actual type of the message is SCM_TIMESTAMP.  And it only happens to be the case that SO_TIMESTAMP==SCM_TIMESTAMP on Linux. This is not the case on FreeBSD.

So: to retrieve the timestamp, select the message with type SCM_TIMESTAMP. If you select for SO_TIMESTAMP, you will get no timestamps on FreeBSD.
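A sketch of both halves; enableTimestamps() and getArrivalTime() are invented names, and the recvmsg() call itself is elided:
#include <cstring>
#include <sys/socket.h>
#include <sys/time.h>

// Ask the kernel to timestamp incoming datagrams on this socket.
void enableTimestamps(int fd)
{
  int on = 1;
  setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
}

// Walk the control messages filled in by recvmsg() and pull out the arrival
// time. Note that we match on SCM_TIMESTAMP, not SO_TIMESTAMP!
bool getArrivalTime(struct msghdr* msgh, struct timeval* tv)
{
  for(struct cmsghdr* cmsg = CMSG_FIRSTHDR(msgh); cmsg != nullptr;
      cmsg = CMSG_NXTHDR(msgh, cmsg)) {
    if(cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMP &&
       cmsg->cmsg_len >= CMSG_LEN(sizeof(*tv))) {
      memcpy(tv, CMSG_DATA(cmsg), sizeof(*tv));
      return true;
    }
  }
  return false;
}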

Datagrams without control messages
Secondly, sendmsg() is not a very well specified system call. Even though RFC 2292 was written by that master of documentation, Richard Stevens, it does not tell us everything we need to know. For example, we discovered that if you use sendmsg() to send a packet without control messages, on FreeBSD it is not enough to set the length of the control message buffer to 0 (which suffices on Linux).

FreeBSD in addition demands that the control message buffer address is 0 too: it checks that the length of the control message buffer is enough for at least one control message, unless the address of the control message buffer is 0.

So: if you use sendmsg() to send a datagram without any control messages, set both msg_control and msg_controllen to 0. This way you are portable.
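As a sketch (sendPlainDatagram is an invented name), the portable way looks like this:
#include <cstring>
#include <sys/socket.h>
#include <sys/uio.h>

// Send a datagram with no control messages, portably: both msg_control and
// msg_controllen must be zero, or FreeBSD will reject the call.
ssize_t sendPlainDatagram(int fd, const void* payload, size_t len,
                          const struct sockaddr* dest, socklen_t destLen)
{
  struct iovec iov;
  iov.iov_base = const_cast<void*>(payload);
  iov.iov_len = len;

  struct msghdr msgh;
  memset(&msgh, 0, sizeof(msgh));  // zeroes msg_control AND msg_controllen
  msgh.msg_name = const_cast<struct sockaddr*>(dest);
  msgh.msg_namelen = destLen;
  msgh.msg_iov = &iov;
  msgh.msg_iovlen = 1;

  return sendmsg(fd, &msgh, 0);
}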

We hope the above has been helpful for you.



Sunday, January 18, 2015

On C++2011, Quality of Implementation & Continuous Integration

Over the past years, I've done a few projects outside the main PowerDNS tree, and for all of them I've used C++2011. A wonderful mark of how big of an improvement C++2011 is, is how much pain you feel when you return to programming in 'regular' C++.

Recently, I've started to note that the new powers of C++ can translate either into better productivity (i.e. 'more functionality added per hour of work') or, perhaps more importantly, into a higher quality of implementation ('QoI').

And this drove me to ponder the concept of QoI a bit, as I think it is underrated compared to writing fast (by some measure) and bug free code.

I recently had to pick a reasonable value as an estimate while writing C++03 code, and I found that my fingers considered it too much work to actually scan three vectors of objects to make a decent estimate. As a result, the code ended up with a hardcoded number which (for now) is reasonable.

This is not quality of implementation. For example, a low QoI implementation of a generally useful memory allocator functions well for the amount of memory the author used it for - say, 1 gigabyte. Unbeknownst to you, lots of the inner workings are overkill on an embedded platform - for example, an O(N) algorithm that is actually pretty slow for small N. Meanwhile, other parts of the library might scale badly to a 2GB memory arena.

A high QoI implementation of a generic memory allocator would find ways to scale itself to multiple domains of scale. It would not surprise you over the years as your project evolves. It would adapt.

We often hear (correctly) 'make it work, make it right, make it fast'. QoI is the part of making it right. It might in turn also make your code fast!

In my example, C++2011 would've allowed me to scan three different vectors like this:

for(const auto& vec : {vecA, vecB, vecC}) {
  for(const auto& entry : vec)
     totSize += entry.length();
}

Whereas the equivalent in C++03 is something like:
unsigned int countVec(const vector<string>& vec)
{
    unsigned int ret=0;
    for(vector<string>::const_iterator iter = vec.begin(); iter != vec.end(); ++iter)
       ret += iter->length();
    return ret;
}

... lots of code in between ... 
totSize = countVec(vecA);
totSize += countVec(vecB);
totSize += countVec(vecC);

You can see how the second variant might not happen ("100kb of entries is a good guess!").

If for this reason alone, I expect my C++2011 code to not only be more pleasing to read, but also to deliver higher quality of implementation.

It is therefore that it pains me to report that in 2015, I can't find a Continuous Integration provider taking C++2011 seriously (for free or for money).

Travis-CI, which I otherwise love dearly, uses an antiquated version of g++ that doesn't do C++2011 at all. If you modify the platform into installing g++-4.8, you find that the supplied version of Boost predates C++2011 and fails to compile. The deployed version of clang fares better, but can't do threads, and bails out the moment you #include anything thread-related.

Meanwhile, Circle CI does ship with a slightly more recent gcc (but not recent enough), but for some reason uses a version of Ubuntu that can't install 'libboost-all-dev', or even 'libboost-serialization-dev'.

I spent some hours on it this morning, and I'm sure there are solutions that don't involve compiling Boost for every commit, but I haven't found them yet.

So pretty please, with sugar on top, could the CI platforms up their game a bit? If the goal of CI is to quickly find bugs and issues, they should surely feel motivated to support a language that offers ways to do this.

Thanks.