Partitioning
- How do you rebalance partitions (apart from the simple concern of whether to do so automatically or manually)? Conceptually, there are 4 ways:
- “Hash % n”: not a good idea because changing n forces most keys to move to a different node (contrasted with the fixed-partitions approach in the sketch below).
- Fix the number of partitions upfront:
- Similar to what Elasticsearch does. For example, with 10 nodes you might create 1,000 partitions (100 per node).
- The size of each partition grows in proportion to the total data size.
- You need to get the number of partitions just right, but that’s hard. Choose too high a number and you get lots of tiny partitions with high management overhead; choose too low a number and each partition becomes huge, which makes it expensive to move around during rebalancing or recovery.
- Things get more complicated when your data starts small but grows large over time, since the “right” number of partitions keeps changing.
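To make the contrast concrete, here is a rough Python sketch (the node counts, partition count, and node names are made up for illustration) of why “hash % n” forces most keys to move when n changes, while a fixed partition count with a separate partition→node map only moves whole partitions:

```python
import hashlib

def stable_hash(key: str) -> int:
    # A stable hash (Python's built-in hash() is randomized per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user:{i}" for i in range(10_000)]

# Approach 1: hash % n, where n = number of nodes. Going from 10 to 11
# nodes changes the owner of most keys, so most of the data has to move.
moved = sum(1 for k in keys if stable_hash(k) % 10 != stable_hash(k) % 11)
print(f"hash % n: {moved / len(keys):.0%} of keys change nodes")  # roughly 90%

# Approach 2: fix the number of partitions upfront (say 1,000) and keep a
# separate partition -> node assignment. Keys never change partition;
# rebalancing only moves whole partitions between nodes.
NUM_PARTITIONS = 1_000

def partition_of(key: str) -> int:
    return stable_hash(key) % NUM_PARTITIONS

assignment = {p: f"node-{p % 10}" for p in range(NUM_PARTITIONS)}  # 10 nodes
# Add an 11th node: hand it roughly every 11th partition.
for p in range(0, NUM_PARTITIONS, 11):
    assignment[p] = "node-10"
# Only ~9% of the partitions (and hence ~9% of the data) moved.
```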
- Fix the size of every partition = dynamic partitioning:
- Similar to how DynamoDB maintains an upper cap of 10 GB per partition.
- Requires splitting or merging partitions as they grow or shrink in size.
- Number of partitions grows proportionally to data size.
- Pre-splitting is a good idea when you are just starting out: if you begin with a single partition and wait for it to hit the 10 GB (or whatever) threshold before splitting, only one node does any work while the others sit idle. (A split-on-threshold sketch follows.)
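A minimal sketch of the split step under dynamic partitioning, assuming key-range partitions and a size threshold like the 10 GB cap mentioned above (the Partition structure and the split-at-median rule are illustrative, not any particular system’s implementation):

```python
from dataclasses import dataclass, field

MAX_PARTITION_BYTES = 10 * 1024**3  # e.g. a DynamoDB-style 10 GB cap

@dataclass
class Partition:
    start_key: str                              # inclusive lower bound
    end_key: str                                # exclusive upper bound
    size_bytes: int = 0
    keys: list = field(default_factory=list)    # stand-in for real storage

def maybe_split(p: Partition) -> list[Partition]:
    """Split a partition into two halves once it exceeds the size cap."""
    if p.size_bytes <= MAX_PARTITION_BYTES:
        return [p]
    mid = sorted(p.keys)[len(p.keys) // 2]      # median key as split point
    left = Partition(p.start_key, mid, p.size_bytes // 2,
                     [k for k in p.keys if k < mid])
    right = Partition(mid, p.end_key, p.size_bytes - p.size_bytes // 2,
                      [k for k in p.keys if k >= mid])
    # One of the halves can now be handed to a less-loaded node.
    # Pre-splitting = creating several such Partition objects with chosen
    # boundaries before any data arrives, so all nodes get work from day one.
    return [left, right]
```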
- Partitioning proportionally to nodes:
- All 3 techniques above ignore the number of nodes. In this approach, however, we fix the number of partitions - say, P - per node.
- Say there are X nodes in the system at some point. With the node count fixed, the size of each partition on those X nodes grows in proportion to the total data size. If we add Y nodes, partition sizes go back down.
- Each new node picks P existing partitions at random, splits each one, and takes over half of each split (see the sketch after this list).
- Picking partitions at random can produce unfair splits, but averaged over many partitions and nodes, the data movement and load are fairly uniform.
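A sketch of the join step when partitioning proportionally to nodes, assuming a fixed count of P partitions per node (P and the node names are made up; real systems often use a few hundred partitions per node):

```python
import random

P = 4  # partitions per node; real systems often use a few hundred

# partition id -> owning node; start with X = 3 nodes, P partitions each.
partitions = {}
next_id = 0
for node in ["node-a", "node-b", "node-c"]:
    for _ in range(P):
        partitions[next_id] = node
        next_id += 1

def add_node(new_node: str) -> None:
    """The new node picks P existing partitions at random, splits each,
    and takes over one half of every split."""
    global next_id
    victims = random.sample(list(partitions), P)
    for pid in victims:
        # Conceptually the key range of `pid` is cut in two: one half stays
        # on its current node, the other half becomes a new partition that
        # now belongs to the new node.
        partitions[next_id] = new_node
        next_id += 1

add_node("node-d")
assert len(partitions) == 4 * P   # (X + Y) * P partitions in total
```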
- Request routing:
- How are client requests routed to the right node for a given key?
- Client could connect to a random node. That node calls the right node if it doesn’t have the data.
- IIRC, this is how Elasticsearch behaves.
- Client sends request to a load balancer style “request router”. This is how DynamoDB behaves.
- Client itself knows what the right node is and directly sends request to it.
- Still, how do you keep track of the key → partition → node mapping? (A routing-table sketch appears at the end of these notes.)
- Similar to the “service discovery” problem: ZooKeeper is a good solution.
- What I find interesting is this: the ideas behind partitioning (and possibly other distributed-systems concepts) become conceptually straightforward once we introduce a low-level coordination service like ZooKeeper.
- If ZK weren’t present, I would try to design a partition-management service that pushes this metadata to request routers. I’d also think about consistent hashing to partition and replicate this metadata itself, to further increase availability and fault tolerance.
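To make the routing discussion concrete: a minimal sketch of the key → partition → node lookup that all three routing styles have to perform somewhere. The in-memory dict stands in for metadata that would really live in a coordination service such as ZooKeeper; all names here are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 1_000

# partition -> node assignment. In a real deployment this metadata would be
# stored in a coordination service (e.g. ZooKeeper) and the routing tier
# would watch it for changes instead of hard-coding it like this.
partition_to_node = {p: f"node-{p % 10}" for p in range(NUM_PARTITIONS)}

def partition_of(key: str) -> int:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

def route(key: str) -> str:
    """Return the node that should handle this key."""
    return partition_to_node[partition_of(key)]

# The three routing styles only differ in where this lookup runs:
#   1. on whichever data node first receives the request (it then forwards),
#   2. on a dedicated request-router / load-balancer tier (DynamoDB-style), or
#   3. inside a partition-aware client library.
print(route("user:42"))
```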