Partitioning

  • How do you rebalance partitions (setting aside the question of whether to do it automatically or manually)? Conceptually, there are 4 ways:
    • “Hash % n”: not a good idea because changing n forces most keys to move to a different node (see the first sketch after this list).
    • Fix the number of partitions upfront (second sketch after this list):
      • Similar to what Elasticsearch does. For example, with 10 nodes, you may choose to create 1000 partitions.
      • The size of each partition grows proportionally with the data size.
      • You need to get the number of partitions just right, but that’s hard. Choose too many and you end up with lots of tiny partitions and high overhead managing them; choose too few and each partition becomes so large that moving it around gets expensive.
        • Things get complicated because data often starts small but grows big over time.
    • Fix the size of every partition = dynamic partitioning (third sketch after this list):
      • Similar to how DynamoDB maintains an upper cap of 10 GB per partition.
      • Requires splitting or merging partitions as they grow or shrink in size.
      • Number of partitions grows proportionally to data size.
      • Pre-splitting is a good idea when you are just starting: otherwise, if you start with a single partition and wait for it to split once it hits the 10 GB (or whatever) threshold, only one node does any work while the others sit idle.
    • Partitioning proportionally to nodes (fourth sketch after this list):
      • In the two schemes above, the number of partitions is independent of the number of nodes. In this approach, however, we fix the number of partitions per node - say, P.
      • Let’s say, at some point, there are X nodes in the system. While X stays fixed, the size of each partition grows proportionally with the data size. When we add Y new nodes, partition sizes shrink back down.
        • Each of the Y new nodes picks P existing partitions at random and steals half of each one’s data.
        • Picking partitions randomly can produce unfair splits, but averaged over a large number of partitions per node, the load ends up fairly distributed.
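
To make the “hash % n” problem concrete, here's a minimal sketch (my own, not from any particular system) showing that growing a cluster from 10 to 11 nodes remaps roughly 90% of keys:

```python
import hashlib

def node_for(key: str, n: int) -> int:
    # Stable hash (Python's built-in hash() is salted per process).
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when n goes 10 -> 11")  # ~91%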
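The fixed-partition-count scheme avoids that by adding a level of indirection: the key → partition mapping is frozen forever, and rebalancing only reassigns whole partitions to nodes. A sketch, where the names and the naive reassignment strategy are my assumptions:

```python
import hashlib

NUM_PARTITIONS = 1000  # fixed once, at cluster creation time

def partition_for(key: str) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS  # key -> partition mapping never changes

# partition -> node assignment is the ONLY thing rebalancing touches
assignment = {p: f"node-{p % 10}" for p in range(NUM_PARTITIONS)}  # 10 nodes

def rebalance(n_nodes: int) -> None:
    # Naive even spread for illustration; a real system would move as
    # few partitions as possible to reach a balanced state.
    for p in assignment:
        assignment[p] = f"node-{p % n_nodes}"

rebalance(12)  # two nodes added: whole partitions move, keys never re-hash
print(assignment[partition_for("user:42")])
```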
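Dynamic partitioning can be sketched as partitions owning key ranges that split at the median key once they exceed a size cap. A tiny cap below stands in for DynamoDB's 10 GB; the data structure is my own illustration:

```python
import bisect
from dataclasses import dataclass, field

MAX_KEYS = 4  # stand-in for the 10 GB cap

@dataclass
class Partition:
    lo: str                                  # inclusive lower bound
    hi: str                                  # exclusive upper bound ("" = unbounded)
    keys: list = field(default_factory=list)

partitions = [Partition(lo="", hi="")]       # start with one partition

def insert(key: str) -> None:
    part = next(p for p in partitions if p.lo <= key and (p.hi == "" or key < p.hi))
    bisect.insort(part.keys, key)
    if len(part.keys) > MAX_KEYS:            # over the cap: split at the median
        mid = part.keys[len(part.keys) // 2]
        new = Partition(lo=mid, hi=part.hi, keys=[k for k in part.keys if k >= mid])
        part.keys = [k for k in part.keys if k < mid]
        part.hi = mid
        partitions.append(new)

for k in ["apple", "fig", "kiwi", "mango", "pear", "plum"]:
    insert(k)
print([(p.lo, p.hi, p.keys) for p in partitions])
```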
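And the node-proportional scheme: a joining node picks P random existing partitions and takes half of each, roughly what Cassandra does with vnodes. All names and sizes below are illustrative:

```python
import random

P = 3  # partitions per node; real systems use many more (e.g. 256)

# (node, partition_id) -> partition size in MB
cluster = {("node-0", i): 100.0 for i in range(P)}
cluster.update({("node-1", i): 100.0 for i in range(P)})

def add_node(name: str) -> None:
    # The joining node splits P randomly chosen partitions, keeping one half
    # of each. A single random pick can be unfair, but over many partitions
    # per node the load averages out.
    victims = random.sample(sorted(cluster), P)
    for j, victim in enumerate(victims):
        cluster[victim] /= 2                  # old owner keeps one half...
        cluster[(name, j)] = cluster[victim]  # ...new node takes the other

add_node("node-2")
print({n: sum(s for (m, _), s in cluster.items() if m == n)
       for n in ["node-0", "node-1", "node-2"]})
```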
  • Request routing:
    • How are client requests routed to the right node for a given key?
      • Client connects to any node; if that node doesn’t own the key, it forwards the request to the node that does.
        • IIRC, this is how Elasticsearch behaves.
      • Client sends the request to a load-balancer-style “request router”. This is how DynamoDB behaves (see the sketch at the end of these notes).
      • Client itself knows what the right node is and directly sends request to it.
    • Still, how do you keep track of the key → partition → node mapping?
      • This is similar to the “service discovery” problem: ZooKeeper is a good solution.
        • What I find interesting is this: the ideas behind partitioning (and possibly other distributed-systems concepts) become conceptually straightforward once we introduce a low-level coordination service like ZooKeeper.
      • If ZK weren’t present, I would try to design a partition-management service that pushes this metadata to the request routers. I’d also consider consistent hashing to partition and replicate this metadata itself, to further increase availability and fault tolerance.
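
As a rough sketch of the request-router idea (the names and the hard-coded map are my assumptions): the router holds the partition → node map, ideally fed by a coordination service like ZooKeeper, and forwards each request to the owning node, so clients never need to know the cluster topology:

```python
import hashlib

NUM_PARTITIONS = 16
# In practice the router would learn this map from ZooKeeper and subscribe
# to change notifications; a hard-coded stand-in is used here.
partition_to_node = {p: f"node-{p % 4}" for p in range(NUM_PARTITIONS)}

def route(key: str) -> str:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return partition_to_node[h % NUM_PARTITIONS]

print(route("user:42"))  # the router forwards the request to this node
```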