Linux namespaces

There are seven namespaces. At any point in time there can be multiple instances of each namespace, and a process belongs to one, and only one, instance of every namespace. It can also move from one instance to another of a given namespace. This means that two processes can share an instance of one namespace while being in different instances of another. Virtual machines, running on a hypervisor, run an independent kernel and therefore provide all-or-nothing isolation. Namespaces, on the other hand, provide lightweight isolation: for example, you can isolate two processes on just one namespace.

List of namespaces:
- Mount: lets you mount a device at some directory. However, there is a single copy of the underlying device, so even if the device is mounted at different directories for two processes, any change made by one will be visible to the other. If you want to prevent that, not mounting the device at all is one option.
- PID: allows two processes in different instances to have the same PID. It is also hierarchical: a process can see the PIDs of all its child processes even if they were created in separate namespace instances, but not vice versa. I didn't know that PID 1 is special: if it is killed, the kernel panics and restarts.
- Network: something I've encountered before, but not fully understood. Helps set up iptables and routing rules, but I need to dig deeper into those. One rather mundane use of network namespaces is that you can run multiple web servers on port 80 on the same kernel if they run in different network namespace instances.
- UTS: hostname, domain name, etc.
- IPC: IIRC, processes in the same instance can communicate with each other through IPC (i.e., inter-process communication); otherwise they can't.
- Cgroup.
- User: maps user and group IDs in one instance to something else in another. For example, you may be root inside an unprivileged container, but that's mapped to a non-root user on the host.
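
To make this concrete, here is a minimal Kotlin sketch (Linux only, standard library): each entry under /proc/<pid>/ns is a symlink naming the namespace instance the process belongs to, and two processes share an instance of a namespace exactly when the corresponding IDs match.

```kotlin
import java.nio.file.Files
import java.nio.file.Paths

// Linux only: list which namespace instance this process belongs to, per
// namespace type. Each entry under /proc/self/ns is a symlink such as
// "net:[4026531840]"; if two processes show the same ID for a namespace,
// they are in the same instance of it.
fun main() {
    val nsDir = Paths.get("/proc/self/ns")
    Files.list(nsDir).use { entries ->
        entries.sorted().forEach { link ->
            println("${link.fileName} -> ${Files.readSymbolicLink(link)}")
        }
    }
}
```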

December 6, 2022


I learned a few random things lately:

India doesn't have many skyscrapers despite its high population, for two reasons. One, albeit minor, is that skyscrapers require a consistent power supply, and that's hard in India. Two, the bigger one, is the government-allowed FSI (floor space index): the ratio of a building's total floor area to the area of the plot it is built on. FSI varies by project and location but is usually small in India (2 to 5) compared to other countries, where it can go as high as 25. (For example, an FSI of 2 on a 1,000 m² plot caps the building at 2,000 m² of total floor area, which effectively caps its height.)

Third place: the YouTube presenter argues that a third place, other than your home and office, something like a local pub you can walk to, is socially important. It lets you serendipitously bump into the same set of people periodically; that's where new community members make their first friends. The following don't serve the same purpose: self-organized backyard parties, because you already know the people you'll meet; malls, or pubs you need to drive to, because you probably won't bump into the same people multiple times. (Also, I can't imagine going up to strangers in a random bar and starting a conversation.) Recently, as I've started going to the office every week or two, I've noticed how nice random conversations are. They just don't happen on video calls when you're working from home, because you only call people with an agenda.

November 26, 2022

Multi-region Kafka

Multi-region Kafka: the blog describes an architecture where stream producers only write to a stream in their own region, and there is another stream, the "aggregate stream/cluster", which is replicated across all the regions (in other words, its data is duplicated in every region). Data from the region-level streams is moved into this one. Consumers need to be more complex if they want to support failover. They work in two ways:
- Active-active: each region has an active consumer running redundant business logic. If the system overall needs to create one view of the data (for example, the blog suggests one database, across all regions, containing surge-pricing data), it needs a coordinator that tells all the region-level consumers which region holds the database they should write to. If that region has issues, the coordinator handles failover to another region. I found a couple of things odd in the article. Why did they create a single database for surge pricing? If the stream data is replicated across all regions anyway, they could have created region-level databases and had database clients pick the region closest to them. The article also says the Flink job is resource-intensive, so isn't it expensive to duplicate that job across all their regions?
- Active-passive: there is one active consumer in one region. How do you support failover to another region? They replicate checkpoints across regions, so a new consumer can restart from the last known checkpoint. One caveat is around offsets (I think these are the same as stream sequence IDs): the same records can end up at different offsets in different regions' copies of the aggregate stream (for example, writes from my region's stream arrive faster than those from other regions). So they describe an algorithm that maintains offset mappings across regions, letting a new consumer pick the right offset during failover. (I didn't read the algorithm in detail; a rough sketch of the idea is below.)
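
A rough sketch of the checkpoint/offset-mapping idea in Kotlin; the types and function here (Checkpoint, resumeOffsetFor) are made up for illustration, not the blog's actual design or any library API.

```kotlin
// Hypothetical sketch: because the same record can land at different offsets
// in each region's copy of the aggregate stream, a checkpoint records one
// offset per region instead of a single number.
data class Checkpoint(val offsetsByRegion: Map<String, Long>)

// On failover, resume from the offset mapped to the target region. If no
// mapping exists (e.g. the mapping lagged behind), fall back to an earlier,
// conservative offset and rely on duplicate-tolerant processing downstream.
fun resumeOffsetFor(targetRegion: String, checkpoint: Checkpoint, conservativeFallback: Long = 0L): Long =
    checkpoint.offsetsByRegion[targetRegion] ?: conservativeFallback

fun main() {
    val checkpoint = Checkpoint(mapOf("us-east" to 1_200L, "us-west" to 1_170L))
    println(resumeOffsetFor("us-west", checkpoint)) // 1170
}
```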

November 26, 2022

Man in the hole

There are roughly six ways of structuring a fictional story. Of those, one seems useful while interviewing at tech companies: "man in the hole". It goes like this:
- "This story occurred when I was a tech lead of X people." This is the baseline, or anchor.
- Shit happened, one thing after another. Roughly three setbacks is ideal.
- "I did X, Y and Z" to find success.
- "In the end, I, the team, or the product came out better than before because of ...". (Notice that the final line is better than the starting point.)
Important: the magnitude of both sides of the U, and consequently its depth, matters. The first half of the U is how big the challenge was; the second half is how big your actions were and whether they were appropriate for your targeted level. I should practice this idea to make it stick.

November 19, 2022

Liquidity vs solvency problem

I learnt about a new idea while reading how FTX recently went bankrupt (within a short period of time). A liquidity problem happens when a bank's assets are tied up in long-term investments but its customers, many or all of them, demand their money right now. The latter can happen if customers lose trust in the bank (i.e., a bank run) or for some other reason. One way to solve this is through a central bank, which literally has the power to print money at will: it steps in, lends against the bank's collateral, and charges interest and/or a penalty. A solvency problem is when a bank, crypto exchange, etc. has simply lost customer funds; for example, it invested in risky or volatile assets and those went bust. There is possibly no way out except filing for bankruptcy, because no one wants to lend to a company that has no collateral or assets left. FTX claimed it had a liquidity problem but more likely had a solvency one. It really fucked up.
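
A toy example with made-up numbers, just to keep the two failure modes straight:

```kotlin
// Toy balance sheet; the figures are purely illustrative.
data class Bank(val liquidAssets: Double, val illiquidAssets: Double, val liabilities: Double)

fun diagnose(b: Bank, withdrawalsDueNow: Double): String {
    val totalAssets = b.liquidAssets + b.illiquidAssets
    return when {
        totalAssets < b.liabilities -> "insolvent: assets can never cover what is owed"
        b.liquidAssets < withdrawalsDueNow -> "illiquid: solvent on paper, but the cash isn't available today"
        else -> "fine"
    }
}

fun main() {
    // Solvent but illiquid: plenty of assets overall, too little cash for a bank run.
    println(diagnose(Bank(liquidAssets = 10.0, illiquidAssets = 120.0, liabilities = 100.0), withdrawalsDueNow = 40.0))
    // Insolvent: no amount of lending against collateral fixes this.
    println(diagnose(Bank(liquidAssets = 10.0, illiquidAssets = 60.0, liabilities = 100.0), withdrawalsDueNow = 40.0))
}
```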

November 14, 2022

Geohash vs Quadtree for location-based services

Primary factors:
- Density / dynamic partitioning: if you care about not having more than X points in a node, quadtrees allow that, because a cell splits further when it gets too dense (see the sketch below). Geohashes are static in that sense.
- Quadtrees are probably better suited for read queries like "find the top X points near Y". With geohashes, I'll probably need to query recursively, expanding to coarser or neighbouring boxes, if point density is low (for example, in Yellowstone).
I think both approaches will work in all (or most) cases.
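
A minimal quadtree sketch (toy code, not from any particular library) showing the density point: a cell splits once it exceeds its capacity, so dense areas end up with small cells while sparse areas stay coarse.

```kotlin
// Toy point quadtree: a node splits once it holds more than `capacity`
// points. Assumes points are inserted within the root's half-open bounds.
data class Point(val lat: Double, val lon: Double)

class QuadNode(
    private val minLat: Double, private val minLon: Double,
    private val maxLat: Double, private val maxLon: Double,
    private val capacity: Int = 4,
) {
    private val points = mutableListOf<Point>()
    private var children: List<QuadNode>? = null

    fun insert(p: Point) {
        val kids = children
        if (kids != null) {
            // Already split: route the point to the child cell that contains it.
            kids.first { it.contains(p) }.insert(p)
            return
        }
        points.add(p)
        if (points.size > capacity) split()
    }

    private fun contains(p: Point) =
        p.lat >= minLat && p.lat < maxLat && p.lon >= minLon && p.lon < maxLon

    private fun split() {
        val midLat = (minLat + maxLat) / 2
        val midLon = (minLon + maxLon) / 2
        children = listOf(
            QuadNode(minLat, minLon, midLat, midLon, capacity),
            QuadNode(minLat, midLon, midLat, maxLon, capacity),
            QuadNode(midLat, minLon, maxLat, midLon, capacity),
            QuadNode(midLat, midLon, maxLat, maxLon, capacity),
        )
        // Redistribute the existing points into the four new children.
        points.forEach { p -> children!!.first { it.contains(p) }.insert(p) }
        points.clear()
    }
}
```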

October 26, 2022

Introduction to Coroutines

Continuation is a neat idea: a function executes partially, cedes control and, when it is invoked again, restarts from where it left off last time.

Why coroutines? Similar to Go's goroutines, coroutines are extremely lightweight compared to threads. For example, you can easily spin up 100k coroutines, but not 100k threads, on a regular machine. A key observation is that nowadays most of our code spends its time doing network IO; we don't need computation-level concurrency most of the time.

Compared with other languages: C#, Python etc. have async/await to implement concurrency. (If I understood correctly, they are similar to how Java's concurrency also works: you start something on a separate thread, get a Future or something back, and wait for it to complete at the point where you want its result.) The default way to use them is to write concurrent code. Kotlin decided to make sequential code the default: if you want concurrent code, you have to make that explicit. Good for whoever is reading the code, to understand where concurrency is happening.

Kotlin introduced a single modifier: suspend. I think the idea is that such a function suspends, sort of goes to sleep, until the thing it is waiting for is available or done. Coroutines run on top of threads. When a coroutine is suspended, its thread frees up to do other things; once the coroutine is ready, it will be scheduled on some, possibly different, thread. (So, I think this is the "continuation" part above.)
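
A small sketch, assuming the kotlinx-coroutines-core dependency: fetch reads like sequential code, suspends at delay() without blocking its carrier thread, and concurrency appears only where async is written explicitly.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// A suspend function reads like sequential code; at delay() it suspends and
// frees its thread, and later resumes on some (possibly different) thread.
suspend fun fetch(id: Int): Int {
    delay(100) // pretend network IO
    return id * 2
}

fun main() = runBlocking {
    // Launching 100k coroutines like this is cheap; 100k threads would not be.
    val results = (1..100_000).map { id ->
        async { fetch(id) } // concurrency is explicit: async marks the concurrent part
    }.awaitAll()
    println(results.take(3)) // [2, 4, 6]
}
```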

October 14, 2022

Replication

Replication:
- Multi-leader replication: conflicts are the biggest problem.
- Leaderless replication: the key thing to remember is that there is no order to the way writes happen in a leaderless world (unlike single-leader replication). Version numbers associated with each write help decide which version is the latest; needless to say, timestamps aren't sufficient because of clock-skew problems. Think Dynamo-style architecture.
- Quorum reads/writes: if n nodes hold a particular piece of data, require w + r > n.
- In case of partial write failures, how do you ensure that stale replicas eventually get updated? Read repair: as the name suggests, stale copies are fixed up when they are read (a rough sketch is below). Anti-entropy process: background process(es) that repair the records.
- For unsuccessful writes, those that succeeded on at least one node but not on a quorum, what happens if you don't roll back? Interesting edge cases can happen when writes are concurrent, or when reads and writes are concurrent. Also, don't assume that, in the case of two concurrent writes, last-one-wins is an acceptable trade-off from the database's perspective; that's a case of data loss, and some applications may not be okay with it.
- Sloppy quorums: say a piece of data needs to go to a particular set of n "home" nodes. The home nodes don't have enough of the w available, but some other set of w nodes is available in the data center. To increase availability, we could decide to still write; all that means is that the data is written to some w nodes, not necessarily the right ones. So, next we need hinted handoff: once the right nodes are available, transfer the data to them.
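
A rough sketch of a quorum read with read repair, using a hypothetical Replica interface (not any particular database's API): take the highest version among the r responses and push it back to any stale replica that answered.

```kotlin
// Hypothetical types: versions (not wall-clock timestamps) decide which copy is newest.
data class Versioned(val value: String, val version: Long)

interface Replica {
    fun read(key: String): Versioned?
    fun write(key: String, v: Versioned)
}

// Assumes the system was configured with w + r > n.
fun quorumRead(key: String, replicas: List<Replica>, r: Int): Versioned? {
    // Contact r replicas (a real client would send to all n and wait for the first r answers).
    val responses = replicas.take(r).mapNotNull { rep -> rep.read(key)?.let { rep to it } }
    val latest = responses.maxByOrNull { it.second.version }?.second ?: return null
    // Read repair: any contacted replica holding an older version gets the newest one.
    responses
        .filter { (_, v) -> v.version < latest.version }
        .forEach { (rep, _) -> rep.write(key, latest) }
    return latest
}
```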

October 13, 2022

Cache strategies

Data flow is: application -> cache -> database (i.e., the cache sits between the application and the database). The strategies basically differ in whether, during a write, the application writes to the underlying database, the cache, or both. There are 3 strategies: write-through: write to both; write-around: database only; write-back: cache only (the data is flushed to the database later).
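
A sketch with hypothetical Cache and Database interfaces, just to pin down how the three write paths differ:

```kotlin
// Hypothetical interfaces; only the write path differs between strategies.
interface Cache { fun put(key: String, value: String) }
interface Database { fun put(key: String, value: String) }

class Store(private val cache: Cache, private val db: Database) {
    fun writeThrough(k: String, v: String) { cache.put(k, v); db.put(k, v) } // both
    fun writeAround(k: String, v: String) { db.put(k, v) }                   // database only
    fun writeBack(k: String, v: String) { cache.put(k, v) }                  // cache only; flushed to the DB later
}
```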

October 7, 2022

Hash encoding

MD5 hashes and UUIDs are 128 bits. If you encode them using hexadecimal, each hex character encodes 4 bits of information, so the resulting string is 128/4 = 32 characters (32 bytes in ASCII) long. If you store UUIDs in the usual string form, you need 4 additional characters for the hyphens, i.e., 36 in total. If you store them in binary format, you need just 128/8 = 16 bytes, because each byte encodes 8 bits of information.
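
A quick sanity check of those sizes in Kotlin:

```kotlin
import java.nio.ByteBuffer
import java.util.UUID

fun main() {
    val id = UUID.randomUUID()
    println(id.toString().length)                  // 36: 32 hex chars + 4 hyphens
    println(id.toString().replace("-", "").length) // 32 hex chars, 4 bits each = 128 bits

    // Binary form: the two 64-bit halves fit in 16 bytes.
    val bytes = ByteBuffer.allocate(16)
        .putLong(id.mostSignificantBits)
        .putLong(id.leastSignificantBits)
        .array()
    println(bytes.size)                            // 16
}
```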

April 4, 2022