Skip to main content

Partitioning

·1 min
  • Partitioning and replication, though closely related, can be thought of independently. This may sound obvious but can potentially simplify system designs - might be worth keeping in mind.
  • Two ways to partition:
    • Key range based: Allows range based queries on the keys. However, call leads to non-uniform key distribution.
    • Based on key-hashes:
      • Uniform distribution.
      • However, range-based queries on keys went work. One workaround is what DynamoDB does: allow such queries only on “sort key”.
  • Problems to worry about: skewed data and hot spots.
    • Note: skewed data can happen in key-based partitioning too. What if all records are for a single key?
    • Might be application developer’s responsibility.
  • What about secondary indexes?
    • Local indexes: colocated with primary partitions but queries are expensive.
    • Global indexes: reads are optimized at the expense of writes. For example, eventual consistency problems in latter.