Rate Limit Status Headers With GCRA

In the previous post we looked at how to efficiently implement a leaky bucket rate limiter using the generic cell rate algorithm. In this post we’ll look at how to calculate the rate limit status which is often returned in response headers.

Many services such as GitHub, Twitter, and Discord return response headers containing information about the rate limits and their current status. GitHub returns the following:

Header name Description
X-RateLimit-Limit The maximum number of requests you’re permitted to make per hour.
X-RateLimit-Remaining The number of requests remaining in the current rate limit window.
X-RateLimit-Reset The time at which the current rate limit window resets in UTC epoch seconds.

The header names vary from service to service, but they all roughly translate to:

  • The rate limit. This would be the X-RateLimit-Limit header.
  • The remaining number of requests that can be made before getting rate limited. This would be the X-RateLimit-Remaining header.
  • The time at which the limit resets. That is, when the client can make the up to the limit number of requests again without getting throttled. This would be the X-RateLimit-Reset header.
  • The time at which the next request can be made without getting limited. This would be the Retry-After header.

Rate limit

The rate limit is easy since that is just a value you define for your API. If your limit is 5 requests per second then you would return 5.

Remaining requests

To calculate the remaining number of requests let’s think back to the leaky bucket metaphor again. If you recall from my previous post, we have a bucket that has a hole in it. We try to add some number of tokens to the bucket for each request. We reject the request if the bucket would overflow. All the while, the bucket is leaking tokens at a rate of 1 per second.

If you want to enforce a limit of “$X$ requests per $P$ seconds”, our bucket will have a capacity of $P$ and we add $T = P/X$ tokens to the bucket per request. The request is allowed if the bucket has less than $\tau = P - T$ tokens in it. GCRA only requires us to track the time at which the bucket will be empty $TAT$.

To calculate the remaining number of requests $R$:

$$R = \frac{P - (TAT - t_a)}{T}$$

Let’s break it down. We add tokens to our bucket for each request and reject requests if the bucket overflows. So the remaining space in the bucket is directly related to the number of requests we’re allowed to make. When the bucket is empty, we can make $X = P/T$ requests. If the bucket has $F$ tokens in it, we can make $\frac{P - F}{T}$ requests. To calculate $F$ we recall that $TAT$ is the time at which the bucket will empty and the bucket leaks 1 token per second, so $F = TAT - t_a$ where $t_a$ is the time the request arrived. If we substitute $F$ in we get our equation for $R$ above.

Next request time

The next request time is easy to calculate since it is part of the GCRA limit evaluation. We reject a request $t_a < TAT - \tau$ so the next request time is $TAT - \tau$.

Reset time

The reset time is also trivial since that is the same as when the bucket will empty which is $TAT$.

James Lao
Staff Software Engineer