Trying to understand CAP

The CAP theorem, stated by Brewer and proved by Gilbert and Lynch, specifies a property of distributed systems: such a system cannot guarantee Consistency, Availability and Partition tolerance at the same time. It is often summarized as a catchy phrase:

Consistency, Availability, Partition Tolerance – pick any two

used mostly when talking about NoSQL databases, suggesting that a distributed system can be characterized as either CA, AP or CP.

For some time I’ve been trying to understand the different combinations and what they mean in practice; having some time at various airports, I caught up on reading, and here’s what I came up with.

What’s C, A and P?

First, let’s define how I understand the three guarantees, based on some of the articles I’ve read.

Consistency is the easiest one. It roughly means that all clients get the same view of the data. By saying that a system is consistent we usually mean strong consistency, but consistency also comes in weaker flavors, e.g. causal.

Availability means that every request to a non-failing node returns a (meaningful) response. The response may not contain all the data (so the harvest will not be 100%, see the appropriate section in [3]), but it should be useful for the client.
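To illustrate, here’s a minimal sketch of a reduced-harvest response (a toy example of my own, not taken from [3]): a query that normally spans three shards returns data only from the two that answered, together with the fraction obtained.

```scala
object HarvestDemo extends App {
  // One logical query fans out to three shards; shard 3 is unreachable.
  val shardResults: Map[Int, Option[List[String]]] = Map(
    1 -> Some(List("a", "b")),
    2 -> Some(List("c")),
    3 -> None // failed, or partitioned away
  )

  val answered = shardResults.values.flatten.toList // results from shards that replied
  val harvest  = answered.size.toDouble / shardResults.size

  // A useful but incomplete response: 2 of 3 shards answered.
  println(s"data = ${answered.flatten}, harvest = $harvest")
}
```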

Partition tolerance means that the system continues working even if any number of messages sent between nodes is lost. This can be e.g. a network failure between two datacenters, where the nodes in each datacenter form a partition. Also note that a failure of any number of nodes forms a partition: it is not possible to distinguish between a network failure and a node that has failed and stopped responding to messages.

The hardest part for me is understanding the difference between Availability and Partition tolerance. Moreover, the various articles don’t specify what they mean by saying that a system is “working” after being e.g. partitioned – does it mean that every request gets a response with useful data, or are responses like “Sorry, I can’t give you data right now” also acceptable?

P+C?

Let’s assume that a system is partition tolerant and that it has more than one node. If a partition is formed, splitting the system in two, the system should continue working; hence both sides of the partition allow clients to write. But then how can we guarantee consistency, if one client writes to partition 1 and another to partition 2? The two sides cannot exchange messages, so neither can learn about the other’s write. Hence: P => ~C.
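To make this concrete, here’s a minimal sketch (the replica names and the toy key-value store are mine, purely for illustration) of two replicas that each accept a write during a partition and end up disagreeing:

```scala
class Replica(val name: String) {
  private var data = Map.empty[String, String]
  def write(key: String, value: String): Unit = data += (key -> value)
  def read(key: String): Option[String] = data.get(key)
}

object PartitionedWrites extends App {
  val p1 = new Replica("partition-1")
  val p2 = new Replica("partition-2")

  // The network is partitioned: no replication messages flow between p1 and p2.
  p1.write("x", "1") // one client writes to partition 1
  p2.write("x", "2") // another client writes to partition 2

  // Both sides kept "working", but reads now return different values
  // depending on which partition the client asks: P => ~C.
  println(p1.read("x")) // Some(1)
  println(p2.read("x")) // Some(2)
}
```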

A+~P?

Suppose now that a system is available and that we have more than one node. As the system is available, it should respond to requests even if some nodes die. As noted above, some nodes dying is equivalent to a partition. So if the system is still working, we have partition tolerance. Hence: A => P.

A+C?

Summarizing the two implications above (A => P and P => ~C), we get A => ~C, so an available system cannot be consistent (if it has more than one node). In practice, however, there are of course AC systems, e.g. a single-node RDBMS. Or even a master-slave/master-master replicated RDBMS, provided there’s a central router which knows which nodes are live and directs clients appropriately. Such a router is then a single point of failure (SPoF).
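To illustrate the router idea, here’s a minimal sketch (entirely hypothetical, not modelled on any concrete product): a single router forwards all client traffic to the first replica it believes is alive, so clients see one consistent, available node – until the router itself fails.

```scala
class Node(val name: String, var alive: Boolean = true) {
  def handle(request: String): String = s"$name handled: $request"
}

// The router is the only entry point: clients always reach a single live
// node, preserving both C and A - but the router is itself a SPoF.
class Router(nodes: List[Node]) {
  def route(request: String): Option[String] =
    nodes.find(_.alive).map(_.handle(request))
}

object RouterDemo extends App {
  val master = new Node("master")
  val slave  = new Node("slave")
  val router = new Router(List(master, slave))

  println(router.route("read x")) // Some(master handled: read x)
  master.alive = false            // the master dies; the router fails over
  println(router.route("read x")) // Some(slave handled: read x)
}
```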

Relax?

I suspect that in reality, when e.g. NoSQL/NewSQL systems are characterized with the CAP properties, some relaxed form of C/A/P is assumed. Unfortunately, all of the definitions flying around seem to be pretty vague and are more hand-waving than proper, precise statements. I think it would be much easier to explore the ever-growing ecosystem of new datastores if they could be characterized more easily; maybe the CAP vocabulary is just not enough?

Please correct me if I’m wrong somewhere, and I probably am! :)

Adam

Some articles I used for my research:

[1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
[2] http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/
[3] http://codahale.com/you-cant-sacrifice-partition-tolerance/

  • Michael

    Hello Adam,

    Thanks for pointing out these articles: this CAP problem has always been a mystery to me. So I’ll try to be of some help to you in return by explaining what I understood from these readings.

    From what I gather, the problem is that “partition tolerance” is a completely counter-intuitive wording. The definition you retained is about availability. The original one is a mathematical precept that needs to be applied to our real world: “The network will be allowed to lose arbitrarily many messages sent from one node to another”.

    However:
    – We know our networks suffer failures and lose messages (it has been proven that no communication protocol is perfect, even theoretically).
    – But we build distributed applications on our networks.
    – So any distributed application is implicitly allowed to lose arbitrarily many packets. If it were not, the application would stop at the first lost packet.

    Thus the real-world implication is: P is a given for any distributed application. You rightfully point out that single-node systems are AC: they are the only ones.
    – P is not a variable to be manipulated; it is a property of distributed systems.
    – P => ~(A,C).

    So, given a distributed system in the real world, you have P and then have to choose between A and C:
    – either your partitions are available: this is the definition some people give to “partition tolerance”, which leads to many misunderstandings,
    – or your partitions wait for the partition to disappear before responding again, guaranteeing consistency at the expense of availability.

    I’m however not knowledgeable about the NoSQL world: I assume that consistency is somewhat achieved, sometimes delayed, while availability is the main goal. So they indeed loosen the C constraint. Do you have any interesting articles about this question? :-)

  • Michael

    I did some more reading, including the original Gilbert and Lynch paper. Even after several re-readings, I do _not_ find a satisfactory definition in this paper. They do say how they _model_ partition tolerance, but they do not define what it is. So, contrary to [2], I would say that this mathematical demonstration is not precise enough.

    I think we could add this to make things more understandable:

    – A “partition tolerant” system is one that defines its behavior when partitions occur. It then chooses A or C.
    => “on a highly distributed network [...] it is important that the service still performs as expected”: what do you expect? A result anytime, anyhow, or only a good result when it is available?
    – A non-“partition tolerant” system is not built to sustain partitions. It would have unpredictable behavior when partitions occur. I guess that either A or C would be lost, maybe randomly between requests.

    All in all, my best bet would be to rephrase “CAP” into “CAD” (D for distributed) to end it all. I’m no expert however, so this would most certainly not hold up to scrutiny by the real guys :-)

  • http://www.warski.org Adam Warski

    I think it’s a very good observation – that if you want to have a distributed system, you need to tolerate partitions. And it may be that what people often mean when characterising a system as “P” is that it behaves non-randomly in the presence of a partition, not that all partitions “work” – for example, denying any operations until integrity is restored. So +1 for the CAD terminology :).

    Also, assuming a weaker definition of P – that a P-system behaves predictably when a partition happens – it is possible to have a PC-system: only the partition containing at least half the nodes plus one would complete requests (and in fact, decreasing this fraction is not possible in a symmetrical algorithm).
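    A minimal sketch of that majority rule (my own formulation, assuming a hypothetical cluster of five nodes):

    ```scala
    object QuorumDemo extends App {
      val totalNodes = 5

      // A partition may complete requests only if it holds a strict majority,
      // i.e. at least floor(n / 2) + 1 nodes. At most one side of any split
      // can satisfy this, so writes never diverge and consistency is kept.
      def hasQuorum(nodesInPartition: Int): Boolean =
        nodesInPartition > totalNodes / 2

      println(hasQuorum(3)) // true  - the majority side keeps serving requests
      println(hasQuorum(2)) // false - the minority side refuses, staying consistent
    }
    ```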

    Thanks for the comments!
    Adam

  • Michael

    Indeed, this is not unlike the classical quorum-based algorithm used in database clusters.

    Ironically, it is not considered an “available” distributed system, since you do not need a total network failure to make the system unavailable; however, it is still statistically more available than a single machine :)

    However, the discussions I’ve seen on this topic often seem to miss the point: context. While you may accept that half of the world will not see your last tweet for the next 15 minutes, it is unacceptable for a bank to transmit money that does not exist.

    So people saying that losing C is a “bad” tradeoff are probably in a context where consistency is worth the cost of a big-iron database: consistency is costly at high volumes, thus an AP system is cheaper to achieve.

    Michael

  • http://bazhenov.me/ Denis Bazhenov

    I agree that A implies ~C, but the premise of that conclusion is incorrect.

    P doesn’t imply ~C. There are systems with a PC guarantee. If a partition occurs, the system can end up in an A state or a C state, because messages cannot be routed between the partitions. Either the system performs the write locally and becomes inconsistent but available (A), or it returns an error upon writing (C). Therefore, A implies ~C (and C implies ~A). This conclusion contradicts “pick any two” – in fact, you choose any one :)

    One problem lies in the terminology of availability, I think. In the original paper there is no strict definition of what availability means in terms of database operations. After a partition occurs, a system could be available for reading, available for writing, or some combination of the two.

  • http://www.warski.org Adam Warski

    Yeah, all the confusion arises from the lack of clear definitions of A and P. And e.g. whether returning a write error in one partition can qualify as “working”. Thanks for the input!