Replication

  • Replication:
    • Multi-leader replication:
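      • Conflicts are the biggest problem: different leaders can accept concurrent writes to the same record, and those writes then have to be reconciled.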
    • Leaderless replication:
      • The key thing to remember is that writes have no global order in a leaderless replication world (unlike single-leader replication, where the leader sequences them).
        • Version numbers associated with each write help decide which version is the latest. Timestamps alone aren’t sufficient because of clock skew.
      • Think: Dynamo-style architecture.
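      • Quorum reads/writes: if n nodes hold a particular piece of data, every write must be acknowledged by w of them and every read must query r of them, with w + r > n. That way the nodes read always overlap the nodes written in at least one node, so a read sees the latest successful write.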
      • In case of partial write failures, how do you ensure that stale replicas eventually get updated?
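        • Read repair: when a read gets responses from several replicas and notices that one holds an older version, the newer value is written back to that replica.
        • Anti-entropy process: some background process(es) that look for differences between replicas and copy over any missing data.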
      • Unsuccessful writes (those that succeeded on at least 1 node but not on a quorum): what happens if you don’t roll back? The nodes that accepted the write keep it, so later reads may or may not return that value.
      • Interesting edge cases can happen if concurrent writes happen, concurrent reads and writes happen, etc.
        • Also, don’t assume that, in case of 2 concurrent writes, last-write-wins is an acceptable trade-off from the database perspective. It silently discards one of the writes, which is data loss, and some applications may not be okay with it.
      • Sloppy quorums: let’s say a piece of data needs to go to a particular set of n nodes (its home nodes). Now suppose fewer than w of the home nodes are reachable, but enough other nodes in the data center are reachable to make up w.
        • To increase availability, we could decide to still accept the write. All that means is that the data is written to some w nodes, not necessarily the home nodes.
        • So, next we need hinted handoff: once the home nodes are reachable again, the nodes that temporarily accepted the data transfer it to them.
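
The quorum and read-repair notes above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Replica` class, `quorum_write`, and `quorum_read` are hypothetical names, each write carries a single integer version chosen by the writer, and there are no concurrent writers (handling those needs more than a plain version number).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]  # (version, value); hypothetical layout for this sketch


@dataclass
class Replica:
    """A hypothetical replica holding key -> (version, value)."""
    up: bool = True
    store: Dict[str, Record] = field(default_factory=dict)

    def put(self, key: str, record: Record) -> bool:
        """Apply a write if it is newer; return False if the node is down."""
        if not self.up:
            return False
        current = self.store.get(key)
        if current is None or record[0] > current[0]:
            self.store[key] = record
        return True


def quorum_write(replicas: List[Replica], key: str, record: Record, w: int) -> bool:
    """Send the write to all n replicas; it succeeds if at least w acknowledge."""
    acks = sum(1 for rep in replicas if rep.put(key, record))
    return acks >= w


def quorum_read(replicas: List[Replica], key: str, r: int) -> Optional[Record]:
    """Read from r reachable replicas, return the highest version, repair stale ones."""
    responders = [rep for rep in replicas if rep.up][:r]
    if len(responders) < r:
        raise RuntimeError("read quorum not reached")
    found = [rep.store[key] for rep in responders if key in rep.store]
    if not found:
        return None
    newest = max(found, key=lambda rec: rec[0])
    for rep in responders:
        rep.put(key, newest)  # read repair: no-op on replicas already up to date
    return newest


# n = 3, w = 2, r = 2, so w + r > n
a, b, c = Replica(), Replica(), Replica()
nodes = [a, b, c]
quorum_write(nodes, "k", (1, "old"), w=2)  # reaches all three replicas
c.up = False
quorum_write(nodes, "k", (2, "new"), w=2)  # c misses this write
c.up, a.up = True, False
latest = quorum_read(nodes, "k", r=2)      # reads b (fresh) and c (stale)
```

After the read, `latest` is `(2, "new")` and `c` has been repaired to the same record. Note that `quorum_write` does not undo anything when it returns `False`: whichever replicas accepted the write keep it, which is the no-rollback situation raised above.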