Linearizability, Zookeeper etc.
·2 mins
- Linearizability:
- Defined on operation history as observed by clients, not something defined for system design or something.
- Despite the fact that a distributed database could have multiple copies of data (because of, say, replication or caching), it should behave as if it only had one copy if it were to be linearizable. So, operations on the data will have a total order.
- Same as “strong consistency”.
- Forbids us to serve reads from replicas - can only do from leader, otherwise clients may receive stale data. Kind of obvious.
- Zookeeper:
- Full-blown system as opposed to Raft-like library.
- Not fully linearizable because reads are allowed to go to replicas which may be stale.
- Guarantees:
- Writes are globally linearizable. I guess this is similar to how an atomic variable behaves in Java in that concurrent updates to it still result in an accurate value.
- All read or write requests from a single client are linearizable, irrespective of whether those go to different machines or whether some writes are sent asynchronously. ZK ensures this through zx-id which it sends back for every request.
- Seems like zx-id is similar to latest-committed-log-index in Raft.
- Different clients may still see stale data (because their zx-id might be from a stale replica), hence the system overall isn’t linearizable. So, no global order.
- This is how Zookeeper speeds up reads by allowing replicas to handle them. Otherwise, a fully linearizable system has to use a leader for reads and so won’t scale.
- If a client wants to read global latest data, it can apply a neat trick: send a dummy “write” request, called a “sync”, and then a read with that sync’s zx-id. I was thinking of a way to send the read request to the leader but this approach seems better.