Consistency: Eventual vs. (the illusion of) Strong

All distributed database systems are eventually consistent.

Summary of argument:
Strong consistency algorithms (2PC, T2PC, D2PC, etc.) do not remove eventual consistency from our distributed systems; they simply allow the transaction manager to know when consistency has been achieved or when a problem has occurred. A lot of engineers don’t really appreciate this, and so we get discussions about eventual consistency vs. strong consistency.

In my mind, which, admittedly, could be considered a universe in its own right, there is no such thing as strong consistency. It’s a shared consensual illusion that allows us to sleep at night without having to worry about an entire class of problems. But it’s just an illusion.

Ok, it’s possible to synchronize multiple data sources so that they have the same information at the same time, using distributed transactions: two-phase commit (XA) and the like (T2PC, D2PC, etc.). But does this give us strong consistency, where all data sources are in sync? Not really, for a couple of reasons.

First, an XA transaction is considered complete when all data sources have committed their portion of the transaction and reported back to the transaction manager that the commit was successful. What happens if one of the commits fails? It happens. The XA manager will report the transaction as having failed (assuming it’s not the manager that’s at fault), but the database instances will be out of sync. In this case 2PC doesn’t guarantee consistency, but as long as the transaction manager isn’t the thing that failed, you should be informed of the problem. Assuming that someone takes action to resync the data at some point, we have dropped back to eventual consistency.
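To make that failure mode concrete, here is a minimal sketch in Python. It is a toy, not a real XA manager: the `Node` class, the `fail_on_commit` flag and the `two_phase_commit` function are all made up for illustration, and real participants would log, retry and recover. It only shows that the manager can know a commit failed while the nodes are left disagreeing.

```python
class Node:
    """Toy participant (hypothetical): holds one value, can be told to fail on commit."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.value = "old"
        self.pending = None
        self.fail_on_commit = fail_on_commit

    def prepare(self, value):
        # Phase 1: stage the write and vote yes.
        self.pending = value
        return True

    def commit(self):
        # Phase 2: make the staged write visible, unless this node "crashes".
        if self.fail_on_commit:
            raise RuntimeError(f"{self.name} crashed during commit")
        self.value = self.pending
        self.pending = None


def two_phase_commit(nodes, value):
    """Return True if every node committed, False if any vote or commit failed."""
    if not all(node.prepare(value) for node in nodes):
        return False
    ok = True
    for node in nodes:
        try:
            node.commit()
        except RuntimeError:
            # The manager learns about the failure, but earlier commits stand.
            ok = False
    return ok


a = Node("a")
b = Node("b", fail_on_commit=True)
succeeded = two_phase_commit([a, b], "new")
print(succeeded, a.value, b.value)  # False new old
```

The manager reports the failure correctly, and that is all it does: node `a` has the new value, node `b` still has the old one, and something else has to bring them back together later.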

Another issue is that distributed data sources take different amounts of time to commit. Say my XA manager progresses to the commit phase of a two-node transaction, everything is fine, and the transaction completes with both acks turning up. There is still a period of time during which the first node has committed but the second one hasn’t, and during that time a reader can see inconsistent data across these nodes. Note that this is the case with every possible model of distributed transactions: there is always a period of inconsistency. Therefore, we have eventual consistency again. Ok, we’ve got eventual consistency plus a callback to tell us when things are in sync again, but that’s all we have.
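The window is easy to see even when nothing goes wrong. The sketch below assumes two in-memory dictionaries standing in for the two nodes and a hypothetical `observe` callback standing in for any reader that happens to look between the two commits.

```python
def commit_in_order(stores, key, value, observe):
    """Apply the same write to each store in turn, calling observe() after each commit."""
    snapshots = []
    for store in stores:
        store[key] = value
        snapshots.append(observe())
    return snapshots


node_1 = {"balance": 100}
node_2 = {"balance": 100}


def read_both():
    # A reader that queries both nodes at the same instant.
    return (node_1["balance"], node_2["balance"])


snapshots = commit_in_order([node_1, node_2], "balance", 50, read_both)
print(snapshots)  # [(50, 100), (50, 50)]
```

The first snapshot is the period of inconsistency. The transaction succeeds, both acks arrive, and a reader in between still saw two different balances.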

Then, let’s consider the human element. If a human is driving the transaction somehow, say by using a credit card to pay for some goods, the databases involved in the transaction may be considered synchronised, but the human definitely is not in sync. Yet the human can be considered part of the system, a part that is always eventually consistent.

CQRS systems have even bigger problems. While XA may be used for the write side, it is rarely linked in to the updates for the read side, so in this case reads may be out of sync with writes.
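A rough sketch of that gap, assuming a write model that queues its changes in an outbox and a read model that only catches up when a projection step runs. The class and method names here are invented for the example and do not come from any particular CQRS framework.

```python
from collections import deque


class WriteSide:
    """Hypothetical write model: applies a deposit and queues an event for the read side."""

    def __init__(self):
        self.balance = 0
        self.outbox = deque()  # events not yet applied to the read side

    def deposit(self, amount):
        self.balance += amount
        self.outbox.append(amount)


class ReadSide:
    """Hypothetical read model: only changes when project() drains the outbox."""

    def __init__(self):
        self.balance = 0

    def project(self, outbox):
        while outbox:
            self.balance += outbox.popleft()


writes = WriteSide()
reads = ReadSide()

writes.deposit(25)
stale = reads.balance          # read before the projection has run
reads.project(writes.outbox)
fresh = reads.balance          # read after the projection has run
print(stale, fresh)  # 0 25
```

However strict the transaction on the write side is, any query that lands between `deposit` and `project` gets the old answer.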

Then we have the issue of where the data is going once it is read after a transaction. If it’s being pulled into a GUI, for example, it could be seconds between the commit and the display being updated, during which time another commit may have happened. What is the point of having the data sources up to date if the GUI is reporting stale data? Of course, this doesn’t have to be a GUI; that’s just an example. It could be any downstream process that uses this data.
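One way to make that staleness visible is to put a version number on the data and let the consumer compare. This is only an illustration of the problem, not a fix for it, and the `Source` and `Display` classes are hypothetical stand-ins for a data source and whatever is showing its data.

```python
class Source:
    """Hypothetical data source that bumps a version number on every commit."""

    def __init__(self):
        self.version = 0
        self.value = None

    def commit(self, value):
        self.version += 1
        self.value = value

    def read(self):
        return (self.version, self.value)


class Display:
    """Hypothetical consumer that shows whatever it last read."""

    def __init__(self, source):
        self.source = source
        self.shown = source.read()

    def refresh(self):
        self.shown = self.source.read()

    def is_stale(self):
        # Stale when the version on screen is not the source's current version.
        return self.shown[0] != self.source.version


source = Source()
source.commit("order placed")
display = Display(source)

source.commit("order shipped")   # a second commit lands before the next refresh
stale_before = display.is_stale()
display.refresh()
stale_after = display.is_stale()
print(stale_before, stale_after)  # True False
```

Until `refresh` runs, the display is showing "order placed" while the source already says "order shipped", no matter how consistent the data sources are with each other.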

So, all distributed systems are just eventually consistent. You need to decide whether you are interested in getting an event every time a limited group of data sources is found to be consistent (if that’s of any use at all), or whether you’re willing to live with the fact that parts of your system may be out of sync with other parts, and relax.
