GCRA With Burst Limiting

2025-12-11

In a previous posts we looked at how to implement a leaky bucket rate limiter using the generic cell rate algorithm (GCRA) and then again how to derive several common rate limiting headers. In this post we’ll look at how to apply a burst limit.

Consider a rate limit of $X = 10$ requests per $P = 60$ seconds. With GCRA we can make all 10 requests in an instant because it allows us to consume the entire request budget immediately. This is often called bursting. What if we want to limit how high we can burst but still only allow 10 requests per 60 seconds?

If we think about this in terms of the leaky bucket again, the reason we can make 10 requests is because the bucket has capacity $P=60$. When we make a request, we add $P/X=6$ tokens to the bucket so we can make our 10 requests before the bucket is full and can no longer make any more requests. To reduce the burst limit, we simply make our bucket smaller. Let’s say we want to set the burst limit to 8. This means we need a bucket with capacity $6 \times 8 = 48$. The number of tokens we add per request is still 6 and tokens still drain from the bucket at one per second, so the overall rate limit is still 10 requests per 60 seconds. However, now the bucket is full after 8 requests!

So how do we adjust the math? Let $B$ be the desired burst limit. The math becomes:

$$ \begin{align*} T &= P/X \\ \tau &= BT - T \end{align*} $$

When a request comes in we set the arrival time $t_a$ to the current time. If this is the first request, we initialize $TAT = t_a$. If $t_a < TAT - \tau$ then reject the request. Otherwise, we allow the request and update $TAT$:

$$TAT = \max(t_a, TAT) + T$$

Notice how the only thing that changed from before is $\tau$. Before we had a bucket of size $P=XT$ and now we have a bucket of size $BT$.

#rate limiting #gcra