Replication

Replication:
- Multi-leader replication:
  - Conflicts are the biggest problem.
- Leaderless replication:
  - The key thing to remember is that leaderless replication imposes no fixed order on writes, unlike single-leader replication, where the leader sequences them.
    - Version numbers attached to each write help decide which value is the latest. Timestamps alone aren’t sufficient because of clock skew.
  - Think: Dynamo-style architecture.
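  - Quorum reads/writes: if a piece of data is stored on n nodes, every write must be acknowledged by w nodes and every read must query r nodes, with w + r > n. The read set and the write set then overlap in at least one node, so a read should reach at least one replica that has the latest successful write.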
  - In case of partial write failures, how do you ensure that stale replicas eventually get updated?
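    - Read repair: when a read queries several replicas and finds that some returned an older version, the newer value is written back to the stale replicas.
    - Anti-entropy process: a background process that looks for differences between replicas and copies missing data from one to another.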
  - For unsuccessful writes (those that succeeded on at least one node but not on a quorum), what happens if you don’t roll back? Subsequent reads may or may not return the value from that write.
  - Interesting edge cases can happen if concurrent writes happen, concurrent reads and writes happen, etc.
    - Also, don’t assume that, in the case of two concurrent writes, last-write-wins is an acceptable trade-off from the database perspective. It silently discards one of the writes, which is data loss, and some applications may not be okay with it.
  - Sloppy quorums: let’s say a piece of data belongs on a particular set of n “home” nodes. Now, fewer than w of the home nodes are reachable, but enough other nodes are available in the data center to make up w.
    - To increase availability, we could decide to accept the write anyway. All that means is that the data is written to some w nodes, not necessarily the home nodes, so w + r > n no longer guarantees that a read sees it.
    - So, next we need hinted handoff: once the home nodes are available again, the stand-in nodes transfer the writes they accepted to them.
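
To make the quorum and read-repair notes above concrete, here is a minimal in-memory sketch. Everything in it is hypothetical and for illustration only (the `Replica` class, `quorum_write`, `quorum_read`, and the single integer version supplied by the caller); it is not how any particular Dynamo-style database implements this, and real systems need richer versioning to detect concurrent writes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical storage node; maps key -> (version, value)."""
    name: str
    up: bool = True
    store: Dict[str, Tuple[int, str]] = field(default_factory=dict)


def is_strict_quorum(n: int, w: int, r: int) -> bool:
    # w + r > n means every read set overlaps every write set.
    return w + r > n


def quorum_write(replicas: List[Replica], key: str, version: int,
                 value: str, w: int) -> bool:
    """Send the write to every reachable replica; succeed if >= w acknowledge.

    There is no rollback: replicas that accepted a failed write keep it.
    """
    acks = 0
    for rep in replicas:
        if not rep.up:
            continue
        current = rep.store.get(key)
        # Assumption: a higher version number always supersedes a lower one.
        if current is None or current[0] < version:
            rep.store[key] = (version, value)
        acks += 1
    return acks >= w


def quorum_read(replicas: List[Replica], key: str, r: int) -> Optional[str]:
    """Query r reachable replicas, return the newest value, repair stale ones."""
    responders = [rep for rep in replicas if rep.up][:r]
    if len(responders) < r:
        raise RuntimeError("not enough replicas for a read quorum")
    answers = [(rep, rep.store.get(key)) for rep in responders]
    found = [ans for _, ans in answers if ans is not None]
    if not found:
        return None
    latest = max(found, key=lambda ans: ans[0])
    for rep, ans in answers:
        if ans is None or ans[0] < latest[0]:
            rep.store[key] = latest  # read repair
    return latest[1]


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c")]
    n, w, r = 3, 2, 2
    print(is_strict_quorum(n, w, r))
    quorum_write(nodes, "k", 1, "old", w)
    nodes[2].up = False                    # c misses the next write
    quorum_write(nodes, "k", 2, "new", w)
    nodes[2].up, nodes[0].up = True, False
    print(quorum_read(nodes, "k", r))      # reads b and c, repairs c
    print(nodes[2].store["k"])
```

Because the read set (b, c) overlaps the write set (a, b) in node b, the read sees the newer version and brings c up to date as a side effect. An anti-entropy process would do the same comparison in the background instead of waiting for a read.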