Transactions, ACID etc.
·2 mins
- D = durability. Easiest to understand.
- C = consistency, i.e. application-level constraints or invariants are met. Property of the application that’s using the DB and not the DB itself. So, kind of useless.
- A = atomicity or, better explained as, abortability. Either all commands in a transaction go through or none of them.
- I = isolation is the trickiest one.
- Concurrent operations in a DB can lead to various race conditions.
- Dirty reads: read something that’s not yet committed.
- Dirty writes: transaction T1 writes on top of partial writes, which aren’t yet committed, by another transaction T2.
- Read skew: Let’s say a long-running client reads a lot of records from a DB. If transactions keep changing records during its run, the state of the DB the client captures may not be consistent.
- Write skew: Let’s say a transaction checks for a premise and then writes based on that. What happens if the premise is no longer valid by the time the write happens?
- DB isolation features:
- Read committed: doesn’t allow dirty reads or writes.
- Snapshot isolation or repeatable reads: no read skews.
- Need to track transaction level information per record to implement this.
- Serializable: no write skews.
- Strongest level but isn’t cheap to implement. So, not necessarily the default amongst all the DBs.
- Algorithms to implement:
- Literally serialize on the DB. Can work if all transactions are guaranteed to be fast.
- 2PL: lock all rows that match the premise before updating them. So, pessimistic locking.
- SSI, i.e. snapshot serializable isolation: Allow all transactions to run concurrently but don’t let them commit if they’d introduce a conflict. So, optimistic.
- Concurrent operations in a DB can lead to various race conditions.
- 2-phase commit and 2-phase locking, i.e. 2PC and 2PL are different things.
- You use 2PC to support transactions across multiple systems. (I still need to learn about those in more detail.)
Open questions:
- In 2PL, how do you handle situations where the premise was based on non-existent rows that appeared, because of some other transaction, before the write happened?