Chapter 7, "Transactions"
- Transactions provide ACID safety guarantees:
- Atomicity == abortability: either all or none of the commands in the transaction take effect.
- Consistency: not very useful here because it’s a property of the application and not the database.
- Isolation: 2 transactions don’t affect each other. (This is the primary focus of the chapter.)
- Durability: don’t lose written data.
- Isolation:
- Not a straightforward concept because databases provide different levels of isolation, sometimes while using inconsistent terminologies.
- True isolation == serializability. If not that, a database can provide some form of weak isolation.
- How do we classify weak vs strong isolation? Concurrent reads and writes during transactions can lead to a few types of race conditions - these are issues that wouldn’t have occurred if the same transactions had run serially. So, a database provides weak isolation if it suffers from any of those race conditions, strong if not.
- Names of isolation levels:
- Read committed.
- Snapshot isolation.
- Serializable.
- Race conditions that lead to weaker isolation levels. (A short summary is also present in chapter 5 of “Database Internals”.)
- Reads:
- Dirty reads: reading data from another transaction that hasn’t committed yet.
- Non-repeatable read == read skew: a given transaction gets different versions when it reads the same record at different times. (Think of a long-running query against an OLAP database.)
- Phantom reads: same problem as earlier but against a range of data. Somewhat related to write skew below.
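- A minimal SQL sketch of read skew, assuming a hypothetical accounts table and a concurrent transfer (the interleaving is shown as comments):

```sql
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- returns 500
-- Meanwhile another transaction commits a transfer:
--   UPDATE accounts SET balance = balance - 100 WHERE id = 1;
--   UPDATE accounts SET balance = balance + 100 WHERE id = 2;
--   COMMIT;
SELECT balance FROM accounts WHERE id = 2;  -- returns 600 (post-transfer)
COMMIT;
-- The reader saw a total of 1100 even though the total was always 1000:
-- the two reads observed different versions of the database.
```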
- Writes:
- Lost writes: for example, T1 and T2 read the current value of a counter at the same time, both increment it by 1, and write the updated result one after the other. (Note that neither the read nor the write was dirty in this case. Still, the 2nd write lost the previous one’s update.)
- Dirty writes: overwriting uncommitted data from another transaction.
- Write skew:
- 2 writes, when done individually, would be fine but violate an invariant when they occur concurrently.
- Another way to look at it: a transaction reads some data and decides to write something based on it, but the premise behind that decision is no longer true by the time the write happens.
- The data read can be a phantom. For example, “get the list of doctors that are on call during a particular shift” so that “I can remove myself if at least 2 doctors are available”. (Basically, this query is trying to remove one doctor if and only if at least 1 other on-call doctor remains after the removal. See the SQL sketch below.)
- Keep in mind, though, that write skew can occur without phantoms. For example, A has $100 in their bank account and 2 transactions of the form “remove $75 from the account if the account has more than that” run concurrently.
- Note that serializable isolation doesn’t suffer from any of these race conditions.
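- A sketch of the on-call write skew above in SQL, assuming a hypothetical doctors table with an on_call flag; Alice and Bob each run this concurrently under snapshot isolation:

```sql
BEGIN;
-- Both Alice's and Bob's transactions see 2 doctors on call for the shift.
SELECT COUNT(*) FROM doctors WHERE shift_id = 1234 AND on_call = true;
-- Each decides it's safe to go off call and updates only its own row
-- (Bob's transaction uses name = 'Bob').
UPDATE doctors SET on_call = false
 WHERE name = 'Alice' AND shift_id = 1234;
COMMIT;
-- Each write is fine on its own, but together they leave zero doctors on
-- call: the premise read at the top no longer holds at commit time.
```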
- Weak isolation levels:
- Read committed:
- Solves for dirty reads and writes.
- Simple row-level locks are sufficient. These are essentially binary: a transaction either holds or doesn’t hold the lock on a given row. (In snapshot isolation, by contrast, we need to track a list of transactions per row.)
- Another approach is for the database to maintain 2 versions of every record: last committed and currently uncommitted. When a transaction reads, just give the former.
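- A sketch of what read committed rules out, reusing the hypothetical accounts table; the comments show the interleaving between two sessions:

```sql
-- Session 1
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;  -- not committed yet

-- Session 2 (read committed)
SELECT balance FROM accounts WHERE id = 1;
-- Returns the last committed balance, not session 1's uncommitted +100
-- (no dirty read). An UPDATE on the same row from session 2 would block
-- until session 1 commits or aborts (no dirty write).

-- Session 1
COMMIT;  -- only now do other transactions see the new balance
```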
- Snapshot isolation:
- For each row, we need to track what should be visible to all ongoing transactions against the database. So, we create multiple records for each user-visible row.
- Whenever a transaction updates a row, we create 2 records: the old version is marked as deleted by that transaction and a new version is marked as created by it. So, a given transaction’s view of a particular record is based only on transactions that committed before it started. (Sketched below.)
- Called MVCC == multi-version concurrency control.
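- A sketch of what the version records might look like, assuming hypothetical created_by/deleted_by columns that hold transaction IDs (this mirrors the book’s illustration, not any specific database’s storage format):

```sql
UPDATE accounts SET balance = 400 WHERE id = 1;  -- run by transaction 13
-- Under MVCC this doesn't overwrite the row in place; it leaves two versions:
--
--   id | balance | created_by | deleted_by
--   ---+---------+------------+-----------
--    1 |     500 |         12 |         13   -- old version, deleted by txn 13
--    1 |     400 |         13 |       null   -- new version, created by txn 13
--
-- A transaction whose snapshot was taken before 13 committed still sees 500;
-- transactions that start after 13 commits see 400.
```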
- How to prevent lost writes?
- Atomic operations provided by the database.
- Locks such as `SELECT ... FROM ... FOR UPDATE`.
- Automatic detection and prevention of lost writes: database could do this, say, in conjunction with snapshot isolation. (The book doesn’t provide sufficient details but I think the way it would work is that, during a write, check if any new versions of a record have shown up since the current transaction started.)
- Compare and set: such as “update record if previous record is such and such”. (Seems similar to optimistic locking from DynamoDB.)
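- A sketch of the atomic-operation, explicit-lock, and compare-and-set options above, assuming hypothetical counters and pages tables:

```sql
-- Atomic operation: the read-modify-write happens inside the database.
UPDATE counters SET value = value + 1 WHERE name = 'page_views';

-- Explicit lock: hold the row lock across the whole read-modify-write cycle.
BEGIN;
SELECT value FROM counters WHERE name = 'page_views' FOR UPDATE;
-- ... application computes the new value ...
UPDATE counters SET value = 43 WHERE name = 'page_views';
COMMIT;

-- Compare-and-set: only apply the write if the value hasn't changed meanwhile.
UPDATE pages SET content = 'new text'
 WHERE id = 7 AND content = 'old text';
-- If this affects 0 rows, someone else wrote first; re-read and retry.
```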
- Serializability == strong isolation level.
- Doesn’t have any of the race conditions above (so no lost writes or write skew either).
- 3 approaches:
- Actual serial execution.
- Transactions need to be short and quick.
- Good when write throughput is low enough that all transactions can be handled by one CPU core. If write throughput is high, think of partitioning in such a way that each core handles a single partition and transactions don’t need to span multiple partitions. (Basically, avoid the overhead of coordinating across CPUs/partitions.)
- Disk can be another bottleneck. If all the data can fit in memory, that’ll improve the situation - but still need to ensure writes are durable. Write ahead log, possibly on SSDs, might help.
- Stored procedures also help: code that runs on the database directly. For example, Redis allows you to write such code in Lua. However, a bad code deployment can impact the whole database.
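- The book’s example is Redis + Lua; as a rough SQL-flavoured equivalent, here’s a minimal stored-procedure sketch (PostgreSQL syntax, hypothetical accounts table). The whole transaction runs inside the database, so a serial-execution engine never has to wait on the application mid-transaction:

```sql
CREATE FUNCTION transfer(from_id int, to_id int, amount numeric)
RETURNS void AS $$
BEGIN
  -- Both updates run as one unit on the database server; the application
  -- issues a single `SELECT transfer(1, 2, 100);` instead of several round trips.
  UPDATE accounts SET balance = balance - amount WHERE id = from_id;
  UPDATE accounts SET balance = balance + amount WHERE id = to_id;
END;
$$ LANGUAGE plpgsql;
```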
- 2-phase locking.
- Different from 2-phase commits. (Don’t get confused!)
- Yes, you first take locks (phase 1) and release them at the end (phase 2). But that’s not all, because the same thing happens when you explicitly take locks for, say, preventing lost writes. The key thing to remember: you take locks for both reads and writes, and the read locks should cover phantoms too.
- Shared and exclusive locks: similar to read/write locks in Java.
- How to prevent write skew by handling phantoms correctly?
- Materialized conflicts: make the thing that can conflict concrete in the database tables so that you (as the application) can take locks on it. For example, if you build a meeting room booking system that allows users to block 30-min slots, you could add rows for every 30-min time slot (such as 2 to 2:30 pm) and take locks on them as needed. (Sketched after this list.)
- Predicate locks: predicates are the WHERE clauses. So, a database could allow something like `SELECT ... FROM ... WHERE ... FOR UPDATE`. A read transaction takes a shared lock on its conditions and waits if matching data is exclusively locked by someone else; a write transaction waits if what it is writing matches conditions locked in shared/exclusive mode by anyone else.
- Index-range locks: predicate locks will work but can be difficult or performance-intensive to implement. However, the predicates you lock on might reference some table indexes. So, you approximate the predicate by locking a bigger subset at the index level. For example, for “book room X between 2 and 3 pm” and assuming there is an index on “room”, attach a lock to the index entry for “room == X”.
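- A sketch of the materialized-conflict approach for the room-booking example, assuming hypothetical room_slots and bookings tables where room_slots is pre-populated with one row per room per 30-min slot:

```sql
BEGIN;
-- Lock the concrete slot rows that any conflicting booking would also need.
SELECT * FROM room_slots
 WHERE room_id = 123
   AND slot_start >= '2024-01-01 14:00'
   AND slot_start <  '2024-01-01 15:00'
 FOR UPDATE;
-- Check for an existing booking, then record ours; a concurrent attempt to
-- book an overlapping slot blocks on the locks taken above.
INSERT INTO bookings (room_id, user_id, start_time, end_time)
VALUES (123, 42, '2024-01-01 14:00', '2024-01-01 15:00');
COMMIT;
```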
- Serializable snapshot isolation.
- Optimistic concurrency control algorithm that’s built on top of snapshot isolation: SSI figures out what writes will violate serialization and aborts transactions accordingly.
- Primary issue is write skew: a premise at the start of a transaction may no longer be valid by the time the transaction commits. (For example, the result of “are at least 2 people on call for a given shift so that I can remove myself” changes by the time “I try to remove myself”.) So, if a transaction reads and later writes, you can expect a causal connection between the two.
- SSI needs to act in 2 cases:
- Detecting reads of a stale MVCC object version (uncommitted write occurred before the read).
- At the time a transaction reads some snapshot of the data, it can also see that some other transaction(s) have uncommitted writes. So, the former may have to be aborted if, by the time it wants to commit, any of the latter are committed.
- Don’t abort during the stale read itself. The other transaction(s) may abort (for whatever reasons) or not commit by the time the particular transaction commits.
- Detecting writes that affect prior reads (the write occurs after the read).
- Somewhat an extension of index range locks from earlier.
- Keep track, at the index level, of which transactions have made read queries against it. (As discussed before, these reads can also be phantoms.)
- When a transaction writes (not commits), it checks this tracker and notifies all transactions that may be impacted by this write. At the time a notified transaction is ready to commit, it can decide whether it should abort (it must if a conflicting write has already committed).
- Both these ideas sound fairly similar.
- In the 1st case, the staleness has already occurred by the time the transaction reads. So, it (i.e. the reader) needs to track whether it should abort when the time comes.
- In the 2nd case, the reads were consistent at the time, but staleness was introduced when another transaction later wrote some data. So, it (i.e. the writer) notifies all transactions that may be impacted.
- Predictable latency (as opposed to 2-phase locking, wherein you don’t know how long you’ll have to wait for other transactions to finish before you can acquire your locks).
- Transactions should be short. The longer they are, the higher the chances that you’ll have to abort.
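- For reference, PostgreSQL’s SERIALIZABLE level is implemented with SSI, so using it mostly means opting in and retrying on serialization failures (the doctors table is the hypothetical one from earlier):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE shift_id = 1234 AND on_call = true;
UPDATE doctors SET on_call = false WHERE name = 'Alice' AND shift_id = 1234;
COMMIT;
-- If SSI decides a concurrent transaction invalidated the premise of the
-- SELECT, one of the statements or the COMMIT fails with a serialization
-- error (SQLSTATE 40001) and the application retries the whole transaction.
```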