James Lao

Rate Limit Status Headers With GCRA

In the previous post we looked at how to efficiently implement a leaky bucket rate limiter using the generic cell rate algorithm. In this post we’ll look at how to calculate the rate limit status which is often returned in response headers.

Many services such as GitHub, Twitter, and Discord return response headers containing information about the rate limits and their current status. GitHub returns the following:

Header nameDescription
X-RateLimit-LimitThe maximum number of requests you’re permitted to make per hour.
X-RateLimit-RemainingThe number of requests remaining in the current rate limit window.
X-RateLimit-ResetThe time at which the current rate limit window resets in UTC epoch seconds.

The header names vary from service to service, but they all roughly translate to:

Rate limit

The rate limit is easy since that is just a value you define for your API. If your limit is 5 requests per second then you would return 5.

Remaining requests

To calculate the remaining number of requests let’s think back to the leaky bucket metaphor again. If you recall from my previous post, we have a bucket that has a hole in it. We try to add some number of tokens to the bucket for each request. We reject the request if the bucket would overflow. All the while, the bucket is leaking tokens at a rate of 1 per second.

If you want to enforce a limit of “\(X\) requests per \(P\) seconds”, our bucket will have a capacity of \(P\) and we add \(T = P/X\) tokens to the bucket per request. The request is allowed if the bucket has less than \(\tau = P - T\) tokens in it. GCRA only requires us to track the time at which the bucket will be empty \(TAT\).

To calculate the remaining number of requests \(R\):

$$R = \frac{P - (TAT - t_a)}{T}$$

Let’s break it down. We add tokens to our bucket for each request and reject requests if the bucket overflows. So the remaining space in the bucket is directly related to the number of requests we’re allowed to make. When the bucket is empty, we can make \(X = P/T\) requests. If the bucket has \(F\) tokens in it, we can make \(\frac{P - F}{T}\) requests. To calculate \(F\) we recall that \(TAT\) is the time at which the bucket will empty and the bucket leaks 1 token per second, so \(F = TAT - t_a\) where \(t_a\) is the time the request arrived. If we substitute \(F\) in we get our equation for \(R\) above.

Next request time

The next request time is easy to calculate since it is part of the GCRA limit evaluation. We reject a request \(t_a < TAT - \tau\) so the next request time is \(TAT - \tau\).

Reset time

The reset time is also trivial since that is the same as when the bucket will empty which is \(TAT\).

#rate limiting #gcra