<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ayush Makwana]]></title><description><![CDATA[Ayush Makwana]]></description><link>https://blog.ayushmakwana.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 03:14:24 GMT</lastBuildDate><atom:link href="https://blog.ayushmakwana.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How Rate Limiters Protect Systems and Boost Performance]]></title><description><![CDATA[Rate limiters are used to control the number of requests sent by clients or services. They ensure that requests are sent at an acceptable rate to the target machine or service. If the request count exceeds the threshold defined by the limiter over a ...]]></description><link>https://blog.ayushmakwana.com/how-rate-limiters-protect-systems-and-boost-performance</link><guid isPermaLink="true">https://blog.ayushmakwana.com/how-rate-limiters-protect-systems-and-boost-performance</guid><category><![CDATA[ratelimit]]></category><category><![CDATA[backend]]></category><category><![CDATA[System Architecture]]></category><dc:creator><![CDATA[Ayush Makwana]]></dc:creator><pubDate>Fri, 03 Jan 2025 03:28:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/CGWK6k2RduY/upload/7ff7b71314f59b05ff90d79822e0afa7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Rate limiters are used to control the number of requests sent by clients or services. They ensure that requests are sent at an acceptable rate to the target machine or service. If the request count exceeds the threshold defined by the limiter over a specified period, subsequent requests are managed according to predefined rules or policies.</p>
<p>Depending on the use cases, there are three main strategies to limit the rate of requests:</p>
<ol>
<li><p><strong>Slowing down the requests:</strong></p>
<ul>
<li><p>Incoming requests are buffered, and the system processes them at a predetermined, controlled rate.</p>
</li>
<li><p>A typical example of this is queues and message brokers. Incoming requests are stored in a queue, and consumers process them at their own pace.</p>
</li>
</ul>
</li>
<li><p><strong>Rejecting requests:</strong></p>
<ul>
<li>There can be multiple scenarios where we have to reject excess requests. For example, when a large volume of requests overwhelms the system's capacity, excess requests are rejected, and clients receive a <code>429 (Too Many Requests)</code> error code in response.</li>
</ul>
</li>
<li><p><strong>Ignoring requests:</strong></p>
<ul>
<li><p>Similar to rejecting requests, but instead of returning an error code, excess requests are silently dropped without notifying the client.</p>
</li>
<li><p>During a DDoS attack, silently dropping requests can create the illusion for the attacker that all of their requests are being accepted.</p>
</li>
</ul>
</li>
</ol>
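<p>To make these policies concrete, here is a minimal sketch in Python. It assumes a simple in-memory sliding window, and the class and method names are illustrative rather than taken from any particular library; the <code>now</code> parameter exists only so the behavior can be demonstrated deterministically.</p>

```python
import time
from collections import deque

class RateLimiter:
    """Minimal sliding-window limiter illustrating the three policies:
    "queue" (slow down), "reject" (return 429), and "ignore" (drop silently)."""

    def __init__(self, max_requests, window_seconds, policy="reject"):
        self.max_requests = max_requests
        self.window = window_seconds
        self.policy = policy
        self.timestamps = deque()  # arrival times inside the current window
        self.backlog = deque()     # buffered requests for the "queue" policy

    def handle(self, request, now=None):
        now = time.monotonic() if now is None else now
        # Evict arrivals that have fallen out of the current window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return ("processed", request)
        if self.policy == "queue":
            self.backlog.append(request)   # slow down: buffer for later
            return ("queued", request)
        if self.policy == "reject":
            return ("rejected", 429)       # reject: client sees 429
        return ("ignored", None)           # ignore: silently dropped
```

In a real deployment the buffered backlog would be drained by a separate consumer at a controlled rate, which is exactly the queue/message-broker pattern described above.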
<p>Depending on the use case, these strategies can be applied individually or in combination to manage system resources efficiently and prevent overload.</p>
<p>Rate limiters are typically deployed on the server side. Depending on the architecture and needs of the system, they can be implemented at the application layer (Layer 7) or network layer (Layer 3) of the OSI model.</p>
<p>Rate limiters can be used to handle both external requests (such as those coming from users or APIs) and internal requests (from internal services or third-party integrations). Here are some common use cases:</p>
<ol>
<li><p><strong>External Rate Limiters</strong></p>
<ul>
<li><p>External rate limiters are client-facing and use metrics like time windows and IP addresses to decide whether a request should be allowed.</p>
<ol>
<li><p><strong>Preventing DDoS attacks</strong></p>
<ul>
<li><p>DDoS attacks are common, and no matter how large the attack is, our system should not go down. External rate limiters ensure that only legitimate requests are forwarded by observing signals such as IP addresses or user access tokens.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726938325543/e325393d-e4be-456e-9b62-ac431ed9e54e.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
</li>
<li><p><strong>Managing traffic spikes gracefully</strong></p>
<ul>
<li><p>Sometimes a surge of entirely legitimate traffic arrives, with no attackers involved, but our infrastructure simply isn't provisioned to handle it.</p>
</li>
<li><p>In these scenarios, it's acceptable to block the excess requests and let users know that the system is under heavy load and they should try again later.</p>
</li>
</ul>
</li>
</ol>
</li>
</ul>
</li>
<li><p><strong>Internal Rate Limiters</strong></p>
<ul>
<li><p>Internal rate limiters base their decisions on resource utilization.</p>
<ol>
<li><p><strong>Managing subscription-based applications</strong></p>
<ul>
<li><p>For example, suppose you run a blogging website that allows free-tier users to read only 3 blogs per month. If a user crosses that free limit, further requests must be blocked for the rest of the month.</p>
</li>
<li><p>To achieve this, usage and request data is stored in a database or cache so that the rate limiter can check it and make a decision.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1726938337326/2fee6a8e-74cb-46d0-ad54-9ed6b70b1a5b.png" alt class="image--center mx-auto" /></p>
</li>
</ul>
</li>
<li><p><strong>Controlling usage of third-party apps</strong></p>
<ul>
<li><p>When internal services make calls to third-party services, those calls must be regulated according to the defined rules rather than left unchecked.</p>
</li>
<li><p>Suppose our website integrates with a third-party vendor that provides an LLM API. These calls are expensive, so we have to ensure the number of calls we make stays within the agreed-upon limits.</p>
</li>
</ul>
</li>
<li><p><strong>Preventing cascading failures</strong></p>
<ul>
<li><p>Some operations, such as bulk commands or database queries, can have cascading effects if executed all at once.</p>
</li>
<li><p>For instance, if you need to delete millions of rows from a table, doing it all at once can lead to catastrophic failures and make the database unavailable. Instead, you might want to delete a thousand rows every hour. In this case, a rate limiter helps to distribute the load uniformly across a longer time span, reducing the risk of system overload.</p>
</li>
</ul>
</li>
</ol>
</li>
</ul>
</li>
</ol>
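<p>The batch-deletion idea above can be sketched as follows. This is a minimal illustration using SQLite and a hypothetical <code>events</code> table with an <code>expired</code> flag; in production the pause between batches is the rate-limiting interval, and it is injected as a parameter here so the sketch can run without actually sleeping.</p>

```python
import sqlite3
import time

def delete_in_batches(conn, batch_size=1000, pause_seconds=3600.0, sleep=time.sleep):
    """Delete expired rows in fixed-size batches, pausing between batches
    so one massive delete never saturates the database."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE expired = 1 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount <= 0:
            break  # nothing left to delete
        total += cur.rowcount
        sleep(pause_seconds)  # rate limit: spread the load across a longer time span
    return total
```

Deleting a thousand rows per interval instead of millions at once keeps each transaction small, so replicas and concurrent readers are never starved by one long-running delete.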
<h3 id="heading-algorithms-to-implement-rate-limiter">Algorithms to Implement Rate Limiter</h3>
<p>Rate limiting can be implemented using different algorithms, each with its own advantages and limitations. Here is a list of some popular algorithms:</p>
<ul>
<li><p>Token bucket</p>
</li>
<li><p>Leaking bucket</p>
</li>
<li><p>Fixed window counter</p>
</li>
<li><p>Sliding window log</p>
</li>
<li><p>Sliding window counter</p>
</li>
</ul>
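<p>As one example from the list above, a minimal token bucket can be sketched as follows. The class name and interface are illustrative; a <code>now</code> parameter is exposed so the refill logic can be exercised deterministically rather than against the real clock.</p>

```python
import time

class TokenBucket:
    """Tokens refill continuously at `rate` per second, up to `capacity`.
    Each request consumes one token; requests without a token are denied."""

    def __init__(self, capacity, rate, now=None):
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.tokens = float(capacity)  # the bucket starts full
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The capacity absorbs short bursts while the refill rate bounds the sustained throughput, which is why the token bucket is often the default choice among these algorithms.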
<h3 id="heading-what-if-a-client-exceeds-the-rate-limit">What if a Client Exceeds the Rate Limit?</h3>
<p>When a request exceeds the rate limit, the API returns a <code>429 (Too Many Requests)</code> response to the client. Depending on the use case, rate-limited requests may instead be queued for later processing, such as delaying order processing during system overload.</p>
<h3 id="heading-how-does-a-client-know-whether-it-is-being-throttled">How Does a Client Know Whether It Is Being Throttled?</h3>
<p>Clients can track their rate limit status through HTTP response headers that convey:</p>
<ul>
<li><p>Number of remaining allowed requests.</p>
</li>
<li><p>Total allowed requests per time window.</p>
</li>
<li><p>Time to wait before making another request.</p>
</li>
</ul>
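<p>As a sketch, a client might read these values like this. Note that header names vary by API: the <code>X-RateLimit-*</code> names below are a widely used convention rather than a standard, while <code>Retry-After</code> is a standard HTTP header.</p>

```python
def throttle_status(headers):
    """Summarize rate-limit state from HTTP response headers.
    Missing headers default to -1 (unknown) or 0 (no wait required)."""
    return {
        # Total allowed requests per time window.
        "limit": int(headers.get("X-RateLimit-Limit", -1)),
        # Number of remaining allowed requests in this window.
        "remaining": int(headers.get("X-RateLimit-Remaining", -1)),
        # Seconds to wait before making another request.
        "retry_after_seconds": int(headers.get("Retry-After", 0)),
    }
```

A well-behaved client checks <code>remaining</code> before sending bursts and honors <code>Retry-After</code> instead of retrying immediately on a 429.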
<hr />
<h3 id="heading-references">References</h3>
<ul>
<li><p><strong>System Design Interview – An insider's guide</strong> by <a target="_blank" href="https://www.amazon.com/Alex-Xu/e/B08BNMFT7P/ref=dp_byline_cont_book_1">Alex Xu</a></p>
</li>
<li><p><a target="_blank" href="https://youtube.com/@asliengineering?si=pidz1fpZQJp7mwe3"><strong>Arpit Bhayani YouTube</strong></a></p>
</li>
</ul>
]]></content:encoded></item></channel></rss>