Replication
- Replication:
- Multi-leader replication:
- Conflicts are the biggest problem.
- Leaderless replication:
- The key thing to remember: writes have no defined order in a leaderless system. There is no leader to sequence them, so different replicas may see the same writes in different orders. (Unlike single-leader replication.)
- Version numbers attached to each write help decide which value is the latest. Timestamps alone aren’t sufficient because of clock skew between nodes.
- Think: Dynamo-style architecture.
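- Quorum reads/writes: if n nodes hold a particular piece of data, every write must be acknowledged by w nodes and every read must query r nodes, with w + r > n. The read set and the write set then overlap in at least one node, so a read sees the latest acknowledged write. Example: n = 3, w = 2, r = 2.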
- In case of partial write failures, how do you ensure that stale replicas eventually get updated?
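- Read repair: when a read contacts several replicas and notices that one of them returned an older version, the newer value is written back to that stale replica.
- Anti-entropy process: a background process that looks for differences between replicas and copies the missing data across.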
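The quorum and read-repair bullets above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular database implements it: `Replica`, `quorum_write`, and `quorum_read` are hypothetical names, versions are plain integers supplied by the caller, and a node being "down" is just a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: maps key -> (version, value)."""
    store: dict = field(default_factory=dict)
    up: bool = True

    def write(self, key, version, value):
        if not self.up:
            return False  # no acknowledgement from a down node
        current = self.store.get(key)
        # Keep the value only if it is newer than what we already hold.
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True


def quorum_write(replicas, key, version, value, w):
    """Send the write to all n replicas; succeed if at least w acknowledge."""
    acks = sum(rep.write(key, version, value) for rep in replicas)
    return acks >= w


def quorum_read(contacted, key, r):
    """Read from the contacted replicas, return the newest value,
    and repair any contacted replica that returned a stale version."""
    responses = [(rep, rep.store.get(key)) for rep in contacted if rep.up]
    if len(responses) < r:
        raise RuntimeError("read quorum not reached")
    found = [resp for _, resp in responses if resp is not None]
    if not found:
        return None
    newest = max(found, key=lambda resp: resp[0])
    for rep, resp in responses:
        if resp is None or resp[0] < newest[0]:
            rep.write(key, newest[0], newest[1])  # read repair
    return newest[1]
```

With n = 3, w = 2, r = 2: if one replica misses a write while it is down, any later read that includes it also includes at least one replica that has the new version, so the read returns the new value and brings the stale replica up to date. Values that are rarely read are never repaired this way, which is why the background process is still needed.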
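- Unsuccessful writes, i.e. those that succeeded on at least 1 node but not on a quorum: what happens if you don’t roll them back? The replicas that accepted the write keep the value, so later reads may or may not return it.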
- Interesting edge cases can happen if concurrent writes happen, concurrent reads and writes happen, etc.
- Also, don’t assume that last write wins (LWW) is an acceptable trade-off when 2 writes are concurrent. One of the writes is silently discarded, which is data loss, and some applications may not be okay with it.
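A small sketch of why last-write-wins loses data. The shopping-cart values, the timestamps, and both function names are made up for illustration; the second function shows one possible alternative (keep both concurrent values and merge them), which only works for data that can be merged, and a plain union like this does not handle removals.

```python
def last_write_wins(writes):
    """writes: list of (timestamp, cart). Keep only the highest timestamp."""
    _, cart = max(writes, key=lambda write: write[0])
    return cart


def merge_siblings(writes):
    """Keep every concurrent value and merge them by set union."""
    merged = set()
    for _, cart in writes:
        merged |= cart
    return merged


# Two clients start from {"milk"} and each adds one item concurrently.
concurrent = [
    (1001, {"milk", "eggs"}),
    (1002, {"milk", "flour"}),
]

lww_cart = last_write_wins(concurrent)    # "eggs" is gone
merged_cart = merge_siblings(concurrent)  # all three items survive
```

Neither client did anything wrong here, yet under LWW the write with the lower timestamp simply disappears.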
- Sloppy quorums: let’s say a piece of data belongs on a particular set of n "home" nodes. Now fewer than w of those home nodes are reachable, but other nodes in the data center are.
- To increase availability, we could decide to accept the write anyway. The data is then written to some w nodes, not necessarily the home nodes for that data.
- So, next we need hinted handoff: once the home nodes are reachable again, the nodes that temporarily accepted the writes transfer the data to them.
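A rough sketch of a sloppy quorum with hinted handoff, under simplifying assumptions: `Node`, `sloppy_write`, and `hinted_handoff` are hypothetical names, each unreachable home node is covered by at most one stand-in, and version numbers are left out, so a handed-off value simply overwrites whatever the home node holds.

```python
class Node:
    """Hypothetical node with its own data plus hints held for others."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value stored on this node
        self.hints = []  # (home_node, key, value) kept for a down home node


def sloppy_write(home_nodes, fallback_nodes, key, value, w):
    """Write to reachable home nodes; for each unreachable one, park the
    write on a reachable fallback node together with a hint."""
    acks = 0
    spare = [node for node in fallback_nodes if node.up]
    for home in home_nodes:
        if home.up:
            home.data[key] = value
            acks += 1
        elif spare:
            stand_in = spare.pop(0)
            stand_in.hints.append((home, key, value))
            acks += 1
    return acks >= w


def hinted_handoff(node):
    """Forward parked writes to home nodes that are reachable again."""
    remaining = []
    for home, key, value in node.hints:
        if home.up:
            home.data[key] = value
        else:
            remaining.append((home, key, value))
    node.hints = remaining
```

Note that while a write sits only on stand-in nodes, a read that queries r home nodes is not guaranteed to see it, even when w + r > n. A sloppy quorum trades that guarantee for write availability.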