I am one of those people who function better by writing things down. One day, I realized that most of my notes don’t have to be private, so here they are - my second brain. Be warned that, if you stumble upon something here that doesn’t make sense to you, it isn’t meant to!
The Log - What every software engineer should know about real-time data's unifying abstraction.
Thoughts: What’s the difference between a log/stream and a messaging/notification service? Possibly that the former is for data flow, whereas the latter is for one-time notifications between systems. Data silos in a company don’t help but, once connected, can unlock new insights and functionality. Logs seem to be a good way to connect them. An interesting idea is that a log lets the same data feed both Hadoop-style batch systems and real-time systems (such as search indexing or monitoring). (It may sound simple but can be complicated. For example, on a previous team, I tried aggregating a stream of affiliate order events and it wasn’t easy.) In the following image, what happens if you realize one of the upstream systems was producing bad data for, say, a few hours or days? How do you backfill data downstream, and what do you do about the bad data inside the log?
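A minimal Python sketch of the batch-plus-real-time idea, under heavy assumptions: the log is just an in-memory list, and `Log`, `Consumer`, `poll` and `rewind` are hypothetical names made up for illustration, not the API of Kafka or any real system. The point is only that each consumer keeps its own offset, so a slow batch job and a fast real-time reader can share one log, and a consumer can replay by moving its offset back.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only sequence of records; a record's offset is its list index."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset: int, max_records: int = 100) -> list:
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Reads the log at its own pace by remembering the next offset to read."""
    log: Log
    offset: int = 0

    def poll(self, max_records: int = 100) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def rewind(self, offset: int) -> None:
        # Move back so that earlier records are delivered again.
        self.offset = offset


log = Log()
for order_id in range(5):
    log.append({"order_id": order_id, "amount": 10 * order_id})

realtime = Consumer(log)    # e.g. monitoring: reads one record at a time
batch_job = Consumer(log)   # e.g. a nightly job: reads everything available

first = realtime.poll(max_records=1)
everything = batch_job.poll()

# Backfill sketch: suppose records from offset 2 onward were processed wrongly.
# Rewind the batch consumer and read them again.
batch_job.rewind(2)
replayed = batch_job.poll()
```

This only touches the downstream half of the backfill question: a consumer can reprocess from an earlier offset, provided the log still retains those records. It says nothing about what to do with the bad records that stay in the log.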