Rate limiting

Anycast is a neat trick: the IP address you vend out to a client remains constant but their requests go to a datacenter closest to them. That way, you need to do rate-limiting for them within a datacenter and not across. The latter would be hard because you’d have to replicate data in real time.
There are various algorithms for local rate limiting. Good ones seem to be token bucket, leaky bucket and sliding window.
1. Factors to evaluate these algorithms: do they allow bursts and the size of parameters you need to track per client-id.
Distributed rate limiting:
1. Gossip between hosts vs hosts using a single distributed cache: the difference probably is just that we don’t have to deal with the former ourselves when we use the latter. In other words, we offload replication to latter instead of managing ourselves in former.
  1. One issue with distributed cache is that we need to ensure it scales with the service fleet size, so that we don’t have a large fleet overwhelming a small fleet.
  2. I find gossip difficult to visualize. Also, it has lot of chattiness.
  3. An alternative to gossip is to elect a coordinator/leader that reads data from all the hosts and updates them back with updated numbers. Such leader election may not have to be super-accurate.
2. Consistent hashing may be a good idea to map key to host which stores the counters. So, service hosts know which host, out of a cluster of rate trackers, they have to go for a given client id.
  1. Probably think of consistent hashing whenever you want to deal with data placement.