Carbs vs fat in conjunction with strength training

Carbs: 3 types. Good ones: vegetables and fruits. Eat a lot of veggies; they are fibrous and satiating, so they will make you feel full. Bad ones: sugary things; avoid as much as possible. Middle ones: pasta, wheat, rice, oats and grains; decrease their intake. The keto diet has no (or negligible) carbs, but I don't think that's for me because I do want to consume vegetables (which contain carbs) for overall nutrition. A balanced diet still seems to be a good idea.

August 23, 2023

Rust book

Two main points on stack vs heap: To push something onto the stack, you need to know its size upfront (at compile time). So, any object that can change size needs to go on the heap. Primitive data types (such as int) remain on the stack? Accessing heap memory takes more time because you need to follow a pointer. The term "memory allocation" only applies to the heap. On assignment, heap-allocated values move while primitive types are copied. ...
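A minimal sketch of the move-vs-copy distinction (the variable names are just illustrative):

```rust
fn main() {
    // Primitive types like i32 have a known, fixed size, so they live on the
    // stack and are copied on assignment.
    let a: i32 = 5;
    let b = a; // copy: both `a` and `b` stay usable
    println!("{a} {b}");

    // A String owns a growable buffer on the heap; the stack part only holds
    // a pointer, a length, and a capacity. Assignment moves ownership.
    let s1 = String::from("hello");
    let s2 = s1; // move: `s1` is no longer valid
    // println!("{s1}"); // would not compile: value used here after move
    println!("{s2}");
}
```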

July 3, 2023

Logical clocks

How to reconcile logical clocks with wall clocks? Two algorithms: Lamport clocks: total order, consistent with causality. Vector clocks: related to, but not the same as, version vectors. Q: how to merge two event timestamps? Q: how to compare them? How to find incomparable ones? Timestamps summarize everything that happened before. Partial vs total order: Partial == happens-before: order exists only between causal events, none between concurrent ones. Basically, some events are incomparable. Total: order defined for each pair of events. Timestamps are incremented before a message is sent to another node and when one is received.
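A rough sketch of what merge and compare could look like for vector timestamps, assuming one counter per node ID; the type and function names are my own, not from any particular implementation:

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

type NodeId = String;
type VectorClock = HashMap<NodeId, u64>;

/// Merge = element-wise maximum. A node does this when it receives a message,
/// combining the sender's clock with its own (and then ticks its own entry).
fn merge(a: &VectorClock, b: &VectorClock) -> VectorClock {
    let mut out = a.clone();
    for (node, &count) in b {
        let entry = out.entry(node.clone()).or_insert(0);
        *entry = (*entry).max(count);
    }
    out
}

/// Compare two timestamps: Less/Greater if one dominates the other,
/// Equal if identical, None if they are incomparable (concurrent events).
fn compare(a: &VectorClock, b: &VectorClock) -> Option<Ordering> {
    let (mut a_ahead, mut b_ahead) = (false, false);
    for node in a.keys().chain(b.keys()) {
        let x = a.get(node).copied().unwrap_or(0);
        let y = b.get(node).copied().unwrap_or(0);
        if x > y { a_ahead = true; }
        if y > x { b_ahead = true; }
    }
    match (a_ahead, b_ahead) {
        (true, true) => None, // each is ahead somewhere: concurrent
        (true, false) => Some(Ordering::Greater),
        (false, true) => Some(Ordering::Less),
        (false, false) => Some(Ordering::Equal),
    }
}

fn main() {
    let a: VectorClock = HashMap::from([("n1".into(), 2), ("n2".into(), 1)]);
    let b: VectorClock = HashMap::from([("n1".into(), 1), ("n2".into(), 3)]);
    assert_eq!(compare(&a, &b), None); // concurrent: neither dominates
    let m = merge(&a, &b);             // {n1: 2, n2: 3}
    assert_eq!(compare(&a, &m), Some(Ordering::Less));
    println!("merged: {m:?}");
}
```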

June 23, 2023

How to think of Kubernetes

Here are some mental models that I am building while reading The Kubernetes Book: Pods are the smallest building block. They are a collection of containers that are created, destroyed, deployed, etc. together. Kubernetes is a generic control plane that allows you to run applications using these pods. For example: how to make sure X pods are running all the time, how to automatically recover from pod failures, how to deploy or roll back safely, how to route traffic to these pods, etc.? On the surface, the idea seems powerful. For example, the AWS service that I am currently part of manages its own control plane which takes care of Docker containers for customers. That's pretty similar to what Kubernetes would have provided us. However, I haven't operated Kubernetes myself, so my knowledge is theoretical and I don't know how it works in practice. Kubernetes provides a declarative interface: users define what end state they want and Kubernetes figures out how to get there.

April 11, 2023

Self-hosted certificate authority

I recently set up a private CA, i.e. certificate authority, using Step CA. Here are a few things I learnt: Step CA generates the following at the start: A root certificate and its corresponding private key. The former should be distributed to the required client devices. The latter is super-secret and should ideally be kept offline. An intermediate certificate and its corresponding private key. The latter is a secret, but not super-secret, and is meant to be kept online. These two are used to generate the end certificates that services should use. Those end certificates have a short expiry (such as 1 to 90 days). If the intermediate pair is compromised, it can be thrown away and a new {certificate, private-key} pair can be generated using the root pair. The intermediate {certificate, private-key} pair generates the new certificates that are used by the service. On the client side, the chain of trust, all the way back to the root certificate, allows it to set up a proper TLS connection with the service. Why Step CA and not regular X.509 certificates? ...

April 7, 2023

1ffcb882

Causality: Transaction A could happen before B, after B, or concurrently with B. In the first 2 cases, there is said to be a causal connection between the transactions. Cross-channel timing dependencies: if there are 2 communication channels between 2 systems, one slow channel can break causality. Causal order is not equal to total order. Linearizability, by definition, gives the latter because it makes the system behave as if there is a single copy of the data. However, it has a performance penalty, and many systems that appear to require linearizability in reality only require causality. Lamport clocks. How are they different from version vectors?

March 20, 2023

Logical clocks

Physical clocks aren't reliable in a distributed system. (Basically, assume that each server has a different timestamp all the time.) Therefore, logical clocks help. Two main algorithms: Lamport clocks and version vectors. (Version vectors are somewhat related to but different from "vector clocks". Haven't dug into details though.) What are "events"? A server receives a message or request from a client. Inter-server communication (i.e. when a server receives a message from another server). As for the differences, I won't describe how the two algorithms work as that would be too much stuff to write down. However, I'll describe the key implications of the two.

Lamport clocks. If a happened-before b, it implies L(a) < L(b). However, if you are given L(a) and L(b) such that L(a) < L(b), all that means is that b did not happen-before a. You can't deduce anything else. If you have a bunch of events, you can deduce a total order which is consistent with causality. So, if two events had a happened-before relationship, they will be ordered correctly in the result. However, you can't pick any two events from the order and deduce that the earlier one had happened-before the other. Basically, the total order is some order which is consistent with causality. It is not the actual order in which events occurred in physical time.

Version vectors. Each data record is associated with a vector. Each element in the vector is associated with a server that had processed that record. a -> b if and only if V(a) < V(b); the implication holds in both directions, and that's the key difference from Lamport clocks. Given any two events and their version vectors, you can identify whether one of them happened-before the other or they were concurrent.

Important note: happens-before has nothing to do with whether one event happened before the other in physical time. Instead, it means whether they are causally connected: one caused the other. Helpful resources: ...
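A minimal sketch of the Lamport clock rules described above (struct and method names are my own):

```rust
/// Minimal Lamport clock for a single node (a sketch, not a full protocol).
struct LamportClock {
    counter: u64,
}

impl LamportClock {
    fn new() -> Self {
        Self { counter: 0 }
    }

    /// Local event or message send: bump the counter and stamp the event.
    fn tick(&mut self) -> u64 {
        self.counter += 1;
        self.counter
    }

    /// Message receive: jump past the sender's timestamp, then tick.
    fn observe(&mut self, received: u64) -> u64 {
        self.counter = self.counter.max(received);
        self.tick()
    }
}

fn main() {
    let mut node_a = LamportClock::new();
    let mut node_b = LamportClock::new();

    let t_send = node_a.tick();          // a: L = 1
    let t_recv = node_b.observe(t_send); // b: max(0, 1) + 1 = 2, so L(send) < L(recv)

    // The converse doesn't hold: an unrelated event elsewhere can also get a
    // larger timestamp, so L(x) < L(y) alone doesn't prove x happened-before y.
    println!("send = {t_send}, recv = {t_recv}");
}
```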

March 3, 2023

Probabilistic data structures

Bloom filters. Set membership. A 1D array with k hash functions. False positives are possible but no false negatives. Applications: One-hit wonders can take up to 75% of cache space; a BF can help identify them so that you can skip caching such items and save cache space. Check for weak passwords, malicious URLs, usernames, etc. If you want 100% accuracy, check the BF first and fall back to the real data store as required. In a NoSQL-type database, check the BF for whether an item exists before going to disk, thereby decreasing disk accesses.

Count-min sketch. Estimates the frequency of all elements in a set. A 2D array. Each row is for a given hash function, so R hash functions. These functions give a number between 0 and C-1, where C is the number of columns. Once all elements from the stream are added, how do you find the one with the highest frequency? If you have the superset of all elements, you can make another pass over it and keep track of the most frequent element. A better solution is to keep a separate min heap. While adding an element to the sketch, compare its latest frequency with the minimum element in the heap. If the new frequency is higher than the minimum in the heap, remove the minimum and add the current element. Remember: a min heap helps find the top-k elements, and vice versa. Can over-estimate, never under-estimate. Applications: top-k elements in a stream (for example, trending products, etc.). A small sketch of the data structure follows at the end of this note.

LogLog. Cardinality estimation. I was thinking one way to estimate cardinality would be to use a bloom filter in conjunction with a counter: when a new item comes, check whether it already exists in the filter and, if it doesn't, increase the counter by 1. Should probably work, but I could be missing some edge case(s). Mental model: Let's say you hash items such that the binary representation of the result is an X-bit number. (X is a design parameter - up to you.) If the hash function is truly random, the bits should be evenly distributed in the result. If the cardinality of the dataset is N, we expect N/2 items to be 1XXX…, N/4 to be 01XXX…, N/8 to be 001XXX…, etc. If the maximum position of the leading 1 is k from the start, N/2^k is approximately 1. So, the cardinality is approximately 2^k. Randomness can cause issues with the calculation. Two ways to fix it. Multiple hash functions: apply multiple hash functions, each giving some cardinality estimate. The final result is the average of all these. Expensive. Buckets with a single hash function: use the first Y bits (out of X) to bucketize the results. So, Y == 10 will result in 2^10 buckets. Use the remaining X-Y bits per the original idea. So, for every bucket, you get some cardinality. Average the bucket-level cardinality numbers and multiply by 2^Y to get the final result. There is nothing special about choosing 00…1XX… (i.e. leading zeros). You are free to choose XX…100… (i.e. zeros at the end), 11…0XX… (i.e. leading 1s instead of 0s), etc. HyperLogLog is a variant on top of LogLog that decreases the error percentage. But the core idea is the same.
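To make the count-min sketch layout above concrete, here is a minimal sketch; the dimensions and the row-seeded hashing are my own illustrative choices, and the min-heap top-k tracking is left out:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A small count-min sketch: R rows (one per hash function), C columns.
/// Estimates can only over-count, never under-count.
struct CountMinSketch {
    rows: usize,
    cols: usize,
    table: Vec<Vec<u64>>,
}

impl CountMinSketch {
    fn new(rows: usize, cols: usize) -> Self {
        Self { rows, cols, table: vec![vec![0; cols]; rows] }
    }

    // Derive the r-th hash function by mixing the row index into one hasher.
    fn index<T: Hash>(&self, item: &T, row: usize) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() as usize) % self.cols
    }

    /// Add one occurrence: bump the counter in every row.
    fn add<T: Hash>(&mut self, item: &T) {
        for r in 0..self.rows {
            let c = self.index(item, r);
            self.table[r][c] += 1;
        }
    }

    /// Estimated frequency: the minimum across rows (collisions only inflate
    /// counters, so the minimum is the tightest estimate).
    fn estimate<T: Hash>(&self, item: &T) -> u64 {
        (0..self.rows)
            .map(|r| self.table[r][self.index(item, r)])
            .min()
            .unwrap_or(0)
    }
}

fn main() {
    let mut cms = CountMinSketch::new(4, 1024);
    for word in ["a", "b", "a", "c", "a", "b"] {
        cms.add(&word);
    }
    // Prints estimates >= the true counts (3, 2, 1 here).
    println!("a={} b={} c={}", cms.estimate(&"a"), cms.estimate(&"b"), cms.estimate(&"c"));
}
```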

January 21, 2023

How to Talk So Little Kids Will Listen

Here are my takeaways from reading the book: Playfulness instead of giving orders. For example, "the nail clipper is hungry and wants some baby nails". Problem solving. "We have a problem and we need ideas to solve it. Can you think of some?" Write all ideas down even if they are absurd. Go through them again to pick one solution. Give information. Describe the feeling, with the same intensity as the feeling itself. For example, show frustration while saying something like "you get frustrated while sitting in the car seat for a long time". Acknowledge feelings. "You are unhappy that we need to leave the park and go back home." Write a note or draw pictures. "You want to buy this car? Let me write it in our wishlist so that we don't forget." This works even if kids can't read. Offer (genuine) choices. Don't be a dick by offering superficial choices (like "do you want to get in the car or do you want to be smacked"). Hire their help. "I have an important project, can you help me with that?" Imagine things with them. Fantasize. "I wish we had a house made of ice cream." Adjust your expectations. Sometimes, they might just be having a bad day, might be hungry, etc. At a party where the kid meets a lot of new people, it is natural if he feels intimidated. (I myself do sometimes.) Don't label them by saying "oh, he is just shy". Instead, say "he will join you when he's ready." Don't say "good job" mindlessly or praise them unnecessarily. If they are doing well, describe (in some detail) what they did. "Look, you drew a nice circle." If they fail or are struggling, describe the effort and whatever progress they made. "You tried hard to stick these 2 pieces together." Don't praise or console falsely. (How would you feel if, when you failed, someone told you "don't worry, that's how life is" or "you did a good job"?)

January 19, 2023

Chapter 7, "Transactions"

Transactions provide the ACID safety guarantees: Atomicity == abortability: all or none of the commands in the transaction will go through. Consistency: not that useful, because it's a property of the application and not the database. Isolation: 2 transactions don't affect each other. (This is the primary focus of the chapter.) Durability: don't lose written data.

Isolation: not a straightforward concept, because databases provide different levels of isolation, sometimes while using inconsistent terminology. True isolation == serializability. If not that, a database can provide some form of weak isolation. How do we classify weak vs strong isolation? Concurrent reads and writes during transactions can lead to a few types of race conditions - issues that wouldn't have occurred if the same transactions had run serially. So, a database provides weak isolation if it suffers from any of those race conditions, strong if not. Names of isolation levels: read committed, snapshot isolation, serializable.

Race conditions that lead to weaker isolation levels. (A short summary is also present in chapter 5 of "Database Internals".) Reads: Dirty reads: don't read data from another uncommitted transaction. Non-repeatable read == read skew: a given transaction gets different versions when it reads the same record at different times. (Think of a long-running query against an OLAP database.) Phantom reads: the same problem as earlier but against a range of data. Somewhat related to write skew below. Writes: Lost updates: for example, T1 and T2 read the current value of a counter at the same time, increment it by 1 and write the updated result one after the other. (Note that neither the read nor the write was dirty in this case. Still, the 2nd write lost some information from the previous one.) Dirty writes: don't write on top of uncommitted data from another transaction. Write skew: 2 writes that, done individually, would be fine but violate an invariant when they occur concurrently. Another way to look at it: a transaction read some data and decided to write something based on that data, but the premise on which it made the decision is no longer true by the time it goes to write. That data can be a phantom. For example, "get the list of doctors that are on call during a particular shift" so that "I can remove myself if at least 2 doctors are available". (Basically, this query is trying to remove one doctor if-and-only-if there will be at least 1 other on-call doctor available after the removal.) Keep in mind though that write skew can occur without phantoms. For example, A had $100 in their bank account and 2 transactions of the form "remove $75 from the account if the account has more than that" run concurrently. Note that serializable isolation doesn't suffer from any of these race conditions.

Weak isolation levels: Read committed: solves for dirty reads and writes. Simple row-level locks are sufficient. These are kind of binary locks: a transaction either has or doesn't have a lock on a given row. (In snapshot isolation, we instead need to track a list of transactions per row.) Another approach is for the database to maintain 2 versions of every record: last committed and currently uncommitted. When a transaction reads, just give it the former.

Snapshot isolation: for each row, we need to track what should be visible to all ongoing transactions against the database. So, we create multiple records for each user-visible row. Whenever a transaction updates a row, we create 2 records: one marked "delete", another "create". So, a given transaction's view of a particular record is based only on transactions that committed before it started. This is called MVCC == multi-version concurrency control. (A small sketch of these visibility rules appears at the end of this note.) How to prevent lost updates? Atomic operations provided by the database. Locks such as SELECT ... FROM ... FOR UPDATE. Automatic detection and prevention of lost updates: the database could do this, say, in conjunction with snapshot isolation. (The book doesn't provide sufficient details, but I think the way it would work is that, during a write, it checks whether any new versions of a record have shown up since the current transaction started.) Compare-and-set: such as "update the record if the previous value is such and such". (Seems similar to optimistic locking in DynamoDB.)

Serializability == the strong isolation level. Doesn't have any of the race conditions above (so, no lost updates or write skew either). 3 approaches: Actual serial execution. Transactions need to be short and quick. Good for write throughput low enough that transactions can be handled by one CPU. If write throughput is high, think of partitioning in such a way that each CPU handles a single partition and transactions don't need to span multiple partitions. (Basically, reduce the performance overhead of CPU switching.) Disk can be another bottleneck. If all the data can fit in memory, that'll improve the situation - but you still need to ensure writes are durable. A write-ahead log, possibly on SSDs, might help. Stored procedures also help: code that runs on the database directly. For example, Redis allows you to write such code in Lua. However, a bad code deployment can impact the whole database.

2-phase locking. Different from 2-phase commit. (Don't get confused!) Yes, you first take locks (phase 1) and release them at the end (phase 2). But that's not all, because the same thing happens when you explicitly take locks for, say, preventing lost updates. So, the key thing to remember is: you take locks for both reads and writes, and the read locks should cover phantoms too. Shared and exclusive locks: similar to read/write locks in Java. How to prevent write skew by handling phantoms correctly? Materialized conflicts: make the thing that can conflict concrete in the database tables such that you (as the application) can take locks on it. For example, if you build a meeting room booking system that allows users to block 30-min slots, you could add rows for every 30-min time slot (such as 2 to 2:30 pm) and take locks as needed. Predicate locks: predicates are the WHERE clauses. So, a database could allow something like SELECT ... FROM ... WHERE ... FOR UPDATE. A read transaction will then check whether any of its conditions are already locked by someone else, and a write transaction will wait if any of its conditions are locked in shared/exclusive mode by anyone else. Index-range locks: predicate locks will work but can be difficult or expensive to implement. However, the predicates you lock on might reference some table indexes. So, you approximate the predicate by locking a bigger subset at the index level. For example, for "book room X between 2 and 3 pm" and assuming there is an index on "room", attach a lock for "room == X".

Serializable snapshot isolation. An optimistic concurrency control algorithm that's built on top of snapshot isolation: SSI figures out which writes would violate serializability and aborts transactions accordingly. The primary issue is write skew: a premise at the start of a transaction may no longer be valid by the time the transaction commits. (For example, the result of "are at least 2 people on call for a given shift so that I can remove myself" changes by the time "I try to remove myself".) So, if a transaction reads and later writes, you can expect a causal connection between the two. SSI needs to act in 2 cases: Detecting reads of a stale MVCC object version (an uncommitted write occurred before the read). At the time a transaction reads some snapshot of the data, it can also see that some other transaction(s) have uncommitted writes. So, the former may have to be aborted if, by the time it wants to commit, any of the latter have committed. Don't abort during the stale read itself; the other transaction(s) may abort (for whatever reason) or may not have committed by the time this transaction commits. Detecting writes that affect prior reads (the write occurs after the read). Somewhat an extension of the index-range locks from earlier. Keep track, at the index level, of which transactions have made read queries against it. (As discussed before, these can also be phantoms.) When a transaction writes (not commits), it can check this tracker and notify all transactions that may now be impacted by this write. At the time a transaction is ready to commit, it can decide whether it should abort. Both these ideas sound fairly similar. In the 1st case, the staleness already existed by the time the 2nd transaction read. So, it (i.e. the reader) needs to track whether it should abort when the time comes. In the 2nd case, the reads were consistent/accurate, but staleness was introduced when the 1st writer wrote some data. So, it (i.e. the writer) notifies all transactions that may be impacted. Predictable latency (as opposed to 2-phase locking, where you don't know how long you'll have to wait for other transactions to finish before you can acquire your locks). Transactions should be short: the longer they are, the higher the chances that you have to abort.
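To make the snapshot-isolation visibility rules above concrete, here is a rough sketch of how a transaction could decide which MVCC record versions it sees. The struct names, fields, and the simple in-progress list are my own illustrative choices, not the book's exact scheme (aborted transactions, in particular, are ignored here):

```rust
/// One stored version of a row under MVCC: it records which transaction
/// created it and, if applicable, which transaction marked it deleted.
struct RecordVersion {
    created_by: u64,         // txid that wrote this version
    deleted_by: Option<u64>, // txid that marked it deleted, if any
    value: i64,
}

/// What a transaction captured when it started.
struct Snapshot {
    txid: u64,
    in_progress_at_start: Vec<u64>, // txids still running at that moment
}

impl Snapshot {
    /// Writers visible to this snapshot: those that committed before it was
    /// taken. (A real implementation also excludes aborted writers.)
    fn committed_before_snapshot(&self, writer: u64) -> bool {
        writer < self.txid && !self.in_progress_at_start.contains(&writer)
    }

    /// A version is visible if its creator is visible (or is us) and it has
    /// not been deleted by any transaction that is itself visible (or us).
    fn is_visible(&self, v: &RecordVersion) -> bool {
        let creator_visible =
            v.created_by == self.txid || self.committed_before_snapshot(v.created_by);
        if !creator_visible {
            return false;
        }
        match v.deleted_by {
            Some(d) => !(d == self.txid || self.committed_before_snapshot(d)),
            None => true,
        }
    }
}

fn main() {
    // Txn 7 starts while txn 5 is still running; txn 3 already committed.
    let snapshot = Snapshot { txid: 7, in_progress_at_start: vec![5] };
    let versions = [
        RecordVersion { created_by: 3, deleted_by: Some(5), value: 100 }, // old row
        RecordVersion { created_by: 5, deleted_by: None, value: 200 },    // in-flight update
    ];
    for v in &versions {
        // Prints: the version written by txn 3 is visible (its deletion by the
        // still-running txn 5 is ignored); the version written by txn 5 is not.
        println!("value {} visible: {}", v.value, snapshot.is_visible(v));
    }
}
```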

January 10, 2023