Zookeeper and chain replication
·3 mins
- Zookeeper:
- One of the most interesting parts of ZK is how its APIs were designed so that a single system can act as a general-purpose coordination service.
- It supports hierarchy in its key structure, similar to how filesystems work. Two benefits:
- This hierarchy allows namespacing, so that multiple unrelated systems can use a single ZK instance without stepping on each other’s toes.
- It probably also allows fallback: for example, if you don’t find config at a node, keep walking up its parents until you find what you are looking for. The lecture doesn’t talk about this, though, so it’s just a hunch.
- Mostly CRUD operations, but with an interesting extra trick: watches. For example, the “exists” API lets you atomically set a “watch” if the file you are looking for doesn’t exist (the lock sketch after this list uses exactly this).
- Watch notifications are delivered in order relative to other events: a client sees the watch event before it sees the new data that triggered it.
- How to implement locks using ZK:
- One way is to have all clients try to create the same file, setting a watch if it already exists. Whoever manages to create the file gets the lock and the others wait. When the file gets deleted (either explicitly by the client that created it, once it no longer needs the lock, or implicitly if that client dies), all remaining clients try again and the cycle repeats. This suffers from the herd effect, though: every release wakes every waiter.
- We can avoid the herd effect by letting clients “get in line” and wait for their turn: each client creates a file under the lock’s directory and gets back a monotonically increasing sequence number for it. The client with the lowest number holds the lock; everyone else watches only the file immediately ahead of theirs, so each release wakes exactly one waiter (see the lock sketch after this list).
- You can also do optimistic concurrency control through ZK: every file carries a version number, and a write can be made conditional on the version you last read (see the compare-and-set sketch after this list).
- Good for storing small amounts of configuration/state: for example, which servers make up my service, which of them are currently alive, and so on.
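Below is a minimal sketch of the queue-style lock from the notes above, written against the kazoo Python client. The `/locks/my-resource` path, the `127.0.0.1:2181` address, and the `guid-lock-` prefix are illustrative choices of mine, not anything the lecture prescribes, and error/connection-loss handling is left out.

```python
import threading
from kazoo.client import KazooClient

LOCK_DIR = "/locks/my-resource"   # illustrative lock znode path

def acquire_lock(zk: KazooClient) -> str:
    """Create an ephemeral, sequential znode and wait until every znode
    with a smaller sequence number has been deleted."""
    zk.ensure_path(LOCK_DIR)
    me = zk.create(LOCK_DIR + "/guid-lock-", ephemeral=True, sequence=True)
    my_name = me.rsplit("/", 1)[1]

    while True:
        children = sorted(zk.get_children(LOCK_DIR))
        if children[0] == my_name:
            return me                         # lowest sequence number: lock is ours

        # Watch only the znode immediately ahead of us, so a release wakes
        # exactly one waiter instead of the whole herd.
        ahead = children[children.index(my_name) - 1]
        released = threading.Event()
        if zk.exists(LOCK_DIR + "/" + ahead, watch=lambda event: released.set()):
            released.wait()
        # If exists() returned None, the node ahead is already gone; re-check.

def release_lock(zk: KazooClient, me: str) -> None:
    zk.delete(me)

if __name__ == "__main__":
    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()
    node = acquire_lock(zk)
    try:
        print("holding lock as", node)        # critical section goes here
    finally:
        release_lock(zk, node)
        zk.stop()
```

(kazoo actually ships a ready-made Lock recipe built on this same pattern, so in practice you would likely use that rather than rolling your own.)

And a sketch of optimistic concurrency control with versioned writes: read a znode along with its version, compute a new value, and ask ZK to apply it only if the version hasn’t changed in the meantime. The `/config/counter` znode is again just an illustrative example.

```python
from kazoo.client import KazooClient
from kazoo.exceptions import BadVersionError

def occ_update(zk: KazooClient, path: str, transform) -> bytes:
    """Compare-and-set loop: retry whenever someone else wrote in between."""
    while True:
        data, stat = zk.get(path)
        new_data = transform(data)
        try:
            zk.set(path, new_data, version=stat.version)   # conditional write
            return new_data
        except BadVersionError:
            continue   # lost the race; re-read and retry

if __name__ == "__main__":
    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()
    zk.ensure_path("/config/counter")                      # illustrative znode
    occ_update(zk, "/config/counter",
               lambda d: str(int(d or b"0") + 1).encode()) # atomic increment
    zk.stop()
```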
- Chain replication:
- Seems like an interesting idea: you have a chain of servers, writes go to the head and are replicated one by one through the chain to the tail, and reads are served from the tail (see the toy model at the end of these notes).
- Good parts:
- Fully linearizable, so that’s interesting. Also, failure handling (not necessarily failure detection) is somewhat straightforward because you can just remove the culprit host from the chain.
- The head has to do less work than a Raft leader: the latter has to handle client write requests and talk to 2 or 4 other replicas, whereas the head only has to forward each write to a single successor.
- Issues:
- Split brain is possible: for example, if there is a network partition between the head and head->next, both of them may think they are the head. So you need an external system such as Zookeeper to manage the chain’s membership. Whenever ZK detects a failure, it updates the chain to add or remove nodes.
- This is still hard. For example, what happens if the head and head->next can both talk to ZK but not to each other? ZK might think both are up and leave the chain as it is.
- If one server in the chain is slow, it slows down the entire chain. Raft, by contrast, can keep making progress as long as a majority of its servers is responsive.
- CRAQ seems to be an improvement on chain replication (the acronym stands for Chain Replication with Apportioned Queries; its main trick is letting any replica serve reads instead of only the tail), but I don’t think the lecture explained how it works.
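To make the data flow concrete, here is a toy in-process model of chain replication (no networking, no failure handling; my own simplification of the scheme): writes enter at the head and are applied hop by hop toward the tail, and reads are answered by the tail, which only ever holds fully replicated data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ChainNode:
    """One replica in the chain, holding a full copy of the key/value state."""
    name: str
    store: Dict[str, str] = field(default_factory=dict)
    next: Optional["ChainNode"] = None       # successor toward the tail

    def write(self, key: str, value: str) -> None:
        # Apply locally, then forward down the chain. A write only counts
        # as committed once it reaches the tail, so anything the tail
        # returns is guaranteed to be on every replica -- which is why
        # tail reads are linearizable.
        self.store[key] = value
        if self.next is not None:
            self.next.write(key, value)

def build_chain(names: List[str]) -> List[ChainNode]:
    nodes = [ChainNode(n) for n in names]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes

if __name__ == "__main__":
    chain = build_chain(["head", "middle", "tail"])
    head, tail = chain[0], chain[-1]

    head.write("x", "1")              # clients send writes to the head...
    print(tail.store.get("x"))        # ...and read from the tail -> prints 1
```

The synchronous hop-by-hop forwarding is also exactly where the “one slow server stalls the whole chain” issue above comes from.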