Thursday, March 5, 2015

Some notes on shared_ptr atomicity and sharing configuration state

At PowerDNS, we've frequently run into this problem: a program has a large and complicated body of state and configuration that determines how queries are processed, and query processing happens non-stop. Meanwhile, we occasionally need to change this configuration while everything is running.

The naive solution to this problem is to keep the state behind a read/write lock. In that case, the state can only be changed when no thread holds a read lock on it. This has at least two downsides. First, locks aren't free: even when they don't involve system calls, the atomic operations behind them cause inter-CPU communication and cache-line invalidations. Second, if the worker threads hog the read lock (which they may need to do for consistency purposes), we can't guarantee that updates happen within a reasonable timeframe.

Effectively this means that a change in configuration might take a very long time, while we incur overhead every time we access the configuration, even if it isn't changing.
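For reference, the naive approach might look roughly like this. It is a minimal sketch under assumptions: Config, g_lockedConfig, g_configLock, isAllowed() and setACL() are hypothetical names, the ACL is reduced to a list of strings, and it uses C++14's std::shared_timed_mutex (C++17 also offers std::shared_mutex).

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical configuration type; the ACL is simplified to a list of strings.
struct Config {
    std::vector<std::string> acl;
};

Config g_lockedConfig;                 // the shared state itself
std::shared_timed_mutex g_configLock;  // protects g_lockedConfig

// Every query takes the read lock, paying for atomic operations even when
// the configuration is not changing at all.
bool isAllowed(const std::string& clientIP) {
    std::shared_lock<std::shared_timed_mutex> readLock(g_configLock);
    for (const auto& entry : g_lockedConfig.acl)
        if (entry == clientIP) return true;
    return false;
}

// An update has to wait until no thread holds the read lock.
void setACL(std::vector<std::string> newACL) {
    std::unique_lock<std::shared_timed_mutex> writeLock(g_configLock);
    g_lockedConfig.acl = std::move(newACL);
}
```

If workers hold the shared lock across a whole batch of queries for consistency, setACL() can be delayed indefinitely, which is the second downside described above.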

A very, very tempting solution is to keep the configuration in a shared_ptr and to have threads access the configuration through it. This would give us unlocked access to a consistent configuration. And if we read the C++ 2011 standard, it looks like this could work: it describes how std::shared_ptr is thread safe under various scenarios. At the same time, the standard defines atomic access functions for shared_ptr (std::atomic_load, std::atomic_store and friends), which are sadly unimplemented in many current compilers. This is a hint.

So here is what one would hope would work:
    if(!g_config->acl.check(clientIP)) dropPacket();
Here the global g_config would be something like a shared_ptr<Config>. If the user updates the ACL, we would do this to propagate it:
    auto newConfig = make_shared<Config>(*g_config);
    newConfig->acl = newACL;
    g_config = newConfig;
And we would fervently hope that the last statement is atomic in nature, so that a user of g_config gets either the old copy or the new copy, but never anything else. And this would work in at least 999,999 out of a million cases. In the remaining case, we crash. I know because I wrote a test case for it this afternoon.

It turns out that internally, a shared_ptr typically consists of a pointer to the actual object and a pointer to a control block holding the reference counts. Sadly, when we assign to a shared_ptr, these get updated separately, one after the other, along with the reference counts. A thread reading g_config at that moment might thus end up with a shared_ptr in an inconsistent state.
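To see the two-pointer layout on your own toolchain, the small sketch below compares sizes. It assumes one of the common standard libraries (libstdc++, libc++ or Microsoft's STL), which all store a shared_ptr as two raw pointers; the standard itself does not mandate this layout. Config, kPointerSize, kSharedPtrSize and useCountAfterCopy() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the real configuration type.
struct Config {
    int aclVersion = 0;
};

// Size of one raw pointer, and of a shared_ptr, on this platform.
constexpr std::size_t kPointerSize = sizeof(void*);
constexpr std::size_t kSharedPtrSize = sizeof(std::shared_ptr<Config>);

// Copying a shared_ptr copies both pointers and increments the reference
// count stored in the shared control block.
long useCountAfterCopy() {
    auto original = std::make_shared<Config>();
    std::shared_ptr<Config> copy = original;  // same object, same control block
    return original.use_count();              // both own the object: 2
}
```

Because the object pointer, the control-block pointer and the counts live in different places, an assignment cannot update all of them with a single plain store.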

By tweaking things a little, for example by using swap(), you can increase the success rate of this style of coding to the point where it almost never fails. You could fool yourself into thinking you solved the problem. At PowerDNS we thought that too, but then CPUs and compilers change, and suddenly it starts breaking again, leading to hard-to-debug crashes.

So, to summarise, whatever the C++ 2011 standard may or may not say about shared_ptr, as it stands in 2015, you can't atomically change a shared_ptr instance while someone tries to use it.
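For comparison, this is roughly what the update would look like with the standard's atomic access functions, assuming a standard library that implements them. Config, isAllowed() and updateACL() are hypothetical names, and the ACL is simplified to a list of strings. Note that these free functions were deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>, and that an implementation may use internal locks behind the scenes; std::atomic_is_lock_free() on the shared_ptr tells you whether yours does.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical configuration type; the ACL is simplified to a list of strings.
struct Config {
    std::vector<std::string> acl;
};

std::shared_ptr<Config> g_config = std::make_shared<Config>();

// Reader: atomically take our own reference. The snapshot stays valid even
// if a writer publishes a new Config while we are still using this one.
bool isAllowed(const std::string& clientIP) {
    std::shared_ptr<Config> snapshot = std::atomic_load(&g_config);
    for (const auto& entry : snapshot->acl)
        if (entry == clientIP) return true;
    return false;
}

// Writer: copy the current Config, modify the copy, publish it atomically.
// Assumes a single writer: two concurrent writers could lose each other's changes.
void updateACL(std::vector<std::string> newACL) {
    auto newConfig = std::make_shared<Config>(*std::atomic_load(&g_config));
    newConfig->acl = std::move(newACL);
    std::atomic_store(&g_config, std::move(newConfig));
}
```

The single-writer assumption matters: atomic_store only makes the publication itself atomic, not the whole copy-modify-publish sequence.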

And of course we could add an RW-lock around every use of g_config, but that would bring us back to where we started, with heavy locking on everything we do.

Now, this problem in general (infrequent updates, non-stop access) is very well known, as is the solution: Read-Copy-Update (RCU). I'm not a big fan of software patents (to say the least), but I'll lovingly make an exception for RCU. IBM released the patent for use in GPL-licensed software, and unlike most patents, this one doesn't just prohibit other people from doing things: it also tells you exactly how to do RCU well. And RCU is sufficiently non-obvious that you actually need that help.

Now, the full glory of RCU may be a bit much, but it turns out we can very easily get most of its benefits:

  • Lock the g_config shared_ptr before changing it (this can be a simple mutex; it doesn't even need to be an RW one, although that helps)
  • Have each worker thread make a copy of g_config ten times per second, under that lock.
  • The worker threads only ever access their own (private) copy
This means that if the configuration is changed, the worker threads will continue with the old configuration for at most 0.1 seconds. It also means that no matter how staggering the overhead of a lock is, each thread incurs it only ten times per second. Furthermore, since the lock is only held very briefly for a copy, updates can also happen very quickly.
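A minimal sketch of this scheme follows. The names (Config, g_configMutex, setConfig(), LocalConfig) are hypothetical, and triggering the refresh by checking std::chrono::steady_clock on each access is an assumption of this sketch rather than the only way to do it. The global shared_ptr is only ever read or replaced while holding the mutex, so no atomic shared_ptr operations are needed.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical configuration type; the ACL is simplified to a list of strings.
struct Config {
    std::vector<std::string> acl;
};

// Shared configuration: only read or replaced while holding g_configMutex.
std::mutex g_configMutex;
std::shared_ptr<const Config> g_config = std::make_shared<Config>();

// Writer: build the new Config outside the lock, then swap it in briefly.
void setConfig(std::shared_ptr<const Config> newConfig) {
    std::lock_guard<std::mutex> lock(g_configMutex);
    g_config = std::move(newConfig);
}

// Per-thread view: refreshes its private copy at most once per interval.
class LocalConfig {
public:
    explicit LocalConfig(std::chrono::steady_clock::duration interval =
                             std::chrono::milliseconds(100))
        : d_interval(interval) {
        refresh(std::chrono::steady_clock::now());
    }

    // Called for every query; takes the lock only when the interval has elapsed.
    const Config& get() {
        auto now = std::chrono::steady_clock::now();
        if (now - d_lastRefresh >= d_interval) refresh(now);
        return *d_local;
    }

private:
    void refresh(std::chrono::steady_clock::time_point now) {
        std::lock_guard<std::mutex> lock(g_configMutex);
        d_local = g_config;  // copies the shared_ptr, not the Config itself
        d_lastRefresh = now;
    }

    std::chrono::steady_clock::duration d_interval;
    std::chrono::steady_clock::time_point d_lastRefresh;
    std::shared_ptr<const Config> d_local;
};
```

Each worker thread would own one LocalConfig and call get() for every query. The worker's private shared_ptr keeps the old configuration alive until its next refresh, so a writer never frees a Config that is still in use. Reading the clock on every query is not free either; counting queries and refreshing every N of them would be a cheaper trigger if that matters.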

In this way, we don't rely on unimplemented atomic shared_ptr functions, yet we get all the benefits of operating almost entirely without locks.

UPDATE: Many people have pointed out that instead of copying "10 times per second", a thread could copy only when an atomic "generational" global counter no longer matches its local one. But some potential synchronisation issues linger in that case (you might miss a second, very rapid change, for example). So while interesting, we do lose simplicity this way.
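A sketch of that generational variant, with hypothetical names (Config, g_configMutex, g_generation, setConfig(), LocalConfig). Here the writer bumps the counter while holding the same mutex it uses to replace g_config, and the reader records the counter under that mutex when it copies, so the local copy and the local generation always belong together. That is one way to address the missed-change concern, at the cost of an atomic load on every query.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical configuration type; the ACL is simplified to a list of strings.
struct Config {
    std::vector<std::string> acl;
};

std::mutex g_configMutex;
std::shared_ptr<const Config> g_config = std::make_shared<Config>();
std::atomic<std::uint64_t> g_generation{0};

// Writer: replace the config and bump the generation under one lock.
void setConfig(std::shared_ptr<const Config> newConfig) {
    std::lock_guard<std::mutex> lock(g_configMutex);
    g_config = std::move(newConfig);
    g_generation++;
}

class LocalConfig {
public:
    LocalConfig() { refresh(); }

    // Fast path: one atomic load per query, no lock unless something changed.
    const Config& get() {
        if (g_generation.load() != d_generation) refresh();
        return *d_local;
    }

private:
    void refresh() {
        std::lock_guard<std::mutex> lock(g_configMutex);
        d_local = g_config;
        d_generation = g_generation.load();  // read under the same lock as d_local
    }

    std::shared_ptr<const Config> d_local;
    std::uint64_t d_generation = 0;
};
```

Because the counter only ever increases, a mismatch can only mean at least one change happened since the last refresh, and the refresh always picks up the latest configuration.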

UPDATE: the code for this idea is now available.

Summarising: don't rely on shared_ptr assignment being atomic. Instead, copy the shared_ptr under a lock, infrequently: often enough that configuration changes propagate swiftly, but not so often that the locking overhead matters.

Enjoy! And if you know about the implementation plans and status of the atomic_load etc. family of functions for shared_ptr in the various popular compilers, please let me know!

UPDATE: Maik Zumstrull found this thread about the atomic shared_ptr operations in gcc.