1. The problems of distributed state machines and distributed logs are essentially the same.
  2. 3 roles: leader, follower and candidate.
  3. Only 2 RPCs in the core algorithm: RequestVote and AppendEntries (which also serves as the heartbeat from leader to followers).
  4. Terms are tied to leadership changes (elections), not to wall-clock time. A term can have no leader at all if its election fails, e.g. on a split vote.
  5. Log indices advance with each command from clients, so several consecutive indices can share the same term.
  6. Leader is always right. It never overwrites its own log, but it can overwrite conflicting (uncommitted) entries on other nodes. To make that safe, the election only lets a candidate win if its log is at least as up to date as the logs of a majority.
  7. Invariant: if two nodes have an entry with the same term at the same index, their logs are identical at that index and at every index before it.
    1. If a follower finds a discrepancy during replication (its entry just before the new ones doesn’t match the index and term the leader sent), it rejects the AppendEntries call, and it’s the leader’s responsibility to retry from a smaller index. So, just those two APIs in the system. (Followers don’t make any API calls to resolve discrepancies.)
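  8. During election, a node votes for a candidate only if it hasn’t already voted for someone else in that term and the candidate’s log is at least as up to date as its own. “Up to date” compares the last log entries, in this order: 1) the higher last term wins and 2) if the last terms are equal, the higher last index wins.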
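  9. 2 different majorities will contain at least 1 common member, because each has more than half of the nodes. While obvious, this has an important implication. If 2 different leaders are elected in sequence, at least 1 node will be common between those that voted for them, and any committed entry is stored on at least 1 node in every later voting majority.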
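
Points 6 to 8 can be sketched in a few lines of Python. This is a minimal, in-memory illustration, not a full Raft node: the `Node` and `Entry` classes and the handler names are made up for these notes, there is no networking or persistence, and the commit index is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Node:
    # Hypothetical in-memory node; log[0] holds the entry at Raft index 1.
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Entry] = field(default_factory=list)

    def _see_term(self, term: int) -> None:
        # A newer term resets the vote for that term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None

    def last_log(self) -> Tuple[int, int]:
        """(term, index) of the last entry, or (0, 0) for an empty log."""
        if not self.log:
            return (0, 0)
        return (self.log[-1].term, len(self.log))

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_term: int, last_log_index: int) -> bool:
        if term < self.current_term:
            return False  # stale candidate
        self._see_term(term)
        # Tuple comparison: last term first, then last index.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if up_to_date and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

    def handle_append_entries(self, term: int, prev_index: int,
                              prev_term: int, entries: List[Entry]) -> bool:
        if term < self.current_term:
            return False  # stale leader
        self._see_term(term)
        # Consistency check: the entry just before the new ones must match.
        if prev_index > 0:
            if prev_index > len(self.log):
                return False
            if self.log[prev_index - 1].term != prev_term:
                return False
        for offset, entry in enumerate(entries):
            index = prev_index + 1 + offset
            if index <= len(self.log):
                if self.log[index - 1].term != entry.term:
                    # Conflict: drop this entry and everything after it.
                    del self.log[index - 1:]
                    self.log.append(entry)
            else:
                self.log.append(entry)
        return True
```

With a follower whose log has terms `[1, 1, 2]`, a term-3 leader that sends `prev_index=3, prev_term=3` is rejected; retrying with `prev_index=2, prev_term=1` succeeds and replaces the conflicting entry at index 3.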

Open questions:

  1. What happens if there is a split brain, i.e. a new real leader and an old one that hasn’t yet stepped down (even though it should), and both try to replicate to a 3rd node? What does that 3rd node do?
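    1. All the nodes that know about the new leader, which form a majority because they voted for it, will reject the old leader’s AppendEntries calls because it must be using an older term number. The rejection carries the newer term, so the old leader steps down when it sees it. If the 3rd node hasn’t heard of the new term yet, it may accept the old leader’s entries, but those can never reach a majority, so they are never committed and the new leader overwrites them later.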
  2. When do you apply a command? Right after storing in a log or after it’s committed?
    1. After it’s committed, not right after storing it in the log. The leader applies a command once it’s committed; followers learn the commit index from later AppendEntries calls and apply asynchronously. So, it’s possible that the data on the replicas is not always in sync with that on the leader.
    2. Committed = replicated to a majority but not necessarily applied on those nodes yet. Still useful on the leader because it can process that command on its end and reply to the client.
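
The commit-then-apply order from question 2 could look roughly like this. It is a sketch under assumptions: `advance_commit_index` and `apply_committed` are hypothetical helpers, `match_index` maps each node (leader included) to the highest index known to be stored on it, and logs are plain lists where position `i - 1` holds Raft index `i`. It also includes the Raft rule that a leader only counts replicas for entries from its own current term; older entries become committed indirectly when a later entry does.

```python
from typing import Callable, Dict, List


def advance_commit_index(match_index: Dict[str, int], log_terms: List[int],
                         current_term: int, commit_index: int) -> int:
    """Return the highest index the leader may mark as committed."""
    majority = len(match_index) // 2 + 1
    # Scan from the end of the log down to just above the old commit index.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index.values() if m >= n)
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


def apply_committed(log: List[str], commit_index: int, last_applied: int,
                    apply: Callable[[str], None]) -> int:
    """Apply committed-but-unapplied commands in order; return new last_applied."""
    while last_applied < commit_index:
        last_applied += 1
        apply(log[last_applied - 1])
    return last_applied
```

For example, with `match_index = {"a": 4, "b": 3, "c": 1}` and log terms `[1, 1, 2, 2]` in term 2, index 3 is the highest one stored on a majority, so the commit index moves to 3 and only then are the commands up to index 3 applied.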

Resources