
The Log - What every software engineer should know about real-time data's unifying abstraction.



  • Thoughts:
    • What’s the difference between a log/stream and a messaging/notification service? Possibly that the former carries a continuous flow of data, whereas the latter delivers one-time notifications between systems.
    • Data silos in a company don’t help on their own but, once connected, they can unlock new insights and functionality. Logs seem to be a good way to connect them.
      • An interesting idea is that a log lets you feed the same data into both Hadoop-style batch systems and real-time systems (such as search indexing or monitoring). (It may sound simple but can get complicated. For example, on a previous team, I tried aggregating a stream of affiliate order events and it wasn’t easy.)
    • In the following image, what happens if you realize that one of the upstream systems has been producing bad data for, say, a few hours or days? How do you backfill data downstream, and what do you do about the bad data already in the log?
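To make the thoughts above a bit more concrete, here is a minimal toy sketch of an append-only log with consumers that each track their own offset. It assumes a single in-memory process. The names `Log`, `Consumer`, `poll`, and the `order_total` field are hypothetical and not taken from the article or from any real library. It shows one log feeding both a "real-time" running total and a "batch" reader, plus a replay from an earlier offset, which may be one way to think about backfilling downstream.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Log:
    """Toy append-only log: records are ordered and never modified."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: Optional[int] = None) -> List[Any]:
        """Return records starting at `offset`, without removing them."""
        end = None if max_records is None else offset + max_records
        return self.records[offset:end]


@dataclass
class Consumer:
    """Each consumer keeps its own offset, so readers do not affect each other."""
    log: Log
    handler: Callable[[Any], None]
    offset: int = 0

    def poll(self, max_records: Optional[int] = None) -> int:
        """Process any records not yet seen and advance the offset."""
        batch = self.log.read(self.offset, max_records)
        for record in batch:
            self.handler(record)
        self.offset += len(batch)
        return len(batch)


log = Log()
for amount in (10, 20, 30):
    log.append({"order_total": amount})

# "Real-time" consumer: keeps a running total, polled frequently.
running_total: List[int] = []

def update_total(record: dict) -> None:
    previous = running_total[-1] if running_total else 0
    running_total.append(previous + record["order_total"])

# "Batch" consumer: collects rows and is polled rarely.
batch_rows: List[dict] = []

realtime = Consumer(log, update_total)
batch = Consumer(log, batch_rows.append)

realtime.poll()                      # sees the first three records
log.append({"order_total": 40})
realtime.poll()                      # sees only the new record
batch.poll()                         # sees all four records in one go

# Replay: a fresh consumer starting at offset 1 re-reads everything after
# the first record, e.g. to rebuild a downstream view.
rebuilt: List[dict] = []
Consumer(log, rebuilt.append, offset=1).poll()
```

Because reading never removes records, the slow batch consumer and the fast real-time consumer can share one log, which is roughly where this differs from a notification that is delivered once and then gone. The replay only re-reads what is in the log; it does not answer what to do with bad records that are already there, so that question stays open.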