Partitioning
·1 min
- Partitioning and replication, though closely related, can be thought of independently. This may sound obvious but can potentially simplify system designs - might be worth keeping in mind.
- Two ways to partition:
- Key range based: Allows range based queries on the keys. However, call leads to non-uniform key distribution.
- Based on key-hashes:
- Uniform distribution.
- However, range-based queries on keys went work. One workaround is what DynamoDB does: allow such queries only on “sort key”.
- Problems to worry about: skewed data and hot spots.
- Note: skewed data can happen in key-based partitioning too. What if all records are for a single key?
- Might be application developer’s responsibility.
- What about secondary indexes?
- Local indexes: colocated with primary partitions but queries are expensive.
- Global indexes: reads are optimized at the expense of writes. For example, eventual consistency problems in latter.