Go implements various current limiting

Go implements various current limiting


When developing a high-concurrency system, we may encounter excessive interface access frequency. In order to ensure the high availability and stability of the system, you need to restrict traffic at this time. You may use it.

This kind of
Web Server
To control the request, it may also be implemented with some popular class libraries. Current limiting is a big killer of high concurrency systems. Before designing current limiting algorithms, let's first understand what they are.


The purpose is to protect the system by limiting the rate of concurrent access requests or limiting the rate of requests within a time window. Once the rate is reached, the system can be rejected, queued or waited, and downgraded. Protect the system by limiting the rate of concurrent (or within a certain time window) requests. Once the rate is reached, it will deny service (direct to an error page or inform that resources are gone), wait in line (such as spike, comment, order), and downgrade (Return to bottom data or default data).

As shown in the figure:

As shown in the cartoon, when the traffic comes up in a certain period of time, the interface access frequency of the service may be very fast. If we do not limit the interface access frequency, the server may not be able to withstand excessive pressure and hang up. Data loss may occur, so it is necessary to limit the current.

The current-limiting algorithm can help us control the frequency at which the function of each interface or program is called. It is a bit like a fuse to prevent the system from being paralyzed by exceeding the access frequency or the amount of concurrency. We may see response headers like this when calling some third-party interfaces:

Limit-ratelimit-X-: 60 //60 requests per second X---ratelimit the Remaining: 22 is //how many times the current left X--ratelimit the Reset-: 1612184024 //reset time limit duplicated code


HTTP Response
It tells the caller what the current limit frequency of the server is through the response header, and guarantees the upper limit of the back-end interface access. In order to solve the current limit problem, many algorithms have appeared, and they all have different purposes. The usual strategy is to reject the exceeding request or to queue the exceeding request.

Generally speaking, the common processing methods for current limiting are:

  • counter
  • Sliding window
  • Leaky bucket
  • Token bucket


The counter is one of the simplest current-limiting algorithms, and its principle is:

In a period of time, the requests are counted, and the threshold is compared to determine whether current limiting is needed. Once the time critical point is reached, the counter is cleared.

This is just like when you go to a car. How many positions are specified in the car. When the car is full, you will not be allowed to get in the car. Otherwise, it will be overloaded. If you are caught by the traffic policeman, you will be fined. If our system is not a fine thing. , It may have collapsed directly.

  1. You can set a variable in the program
    , When a request comes, I will add this number
    , And record the request time at the same time.
  2. Judge when the next request comes
    Whether the count value exceeds the set frequency, and whether the current request time and the first request time are within
    Within minutes.
  3. If in
    Within minutes and exceeding the set frequency, it proves that there are too many requests, and subsequent requests will be rejected.
  4. If the interval between the request and the first request is greater than the counting period, and
    If the value is still within the current limit, reset


package main import ( "log" "sync" "time" ) type Counter struct { rate int //Maximum number of requests allowed in the counting period begin time.Time //Counting start time cycle time.Duration //Counting period count int //Accumulated number of requests received in the counting period lock sync.Mutex } func (l *Counter) Allow () bool { l.lock.Lock() defer l.lock.Unlock() if l.count == l.rate -1 { now := time.Now() if now.Sub(l.begin) >= l.cycle { //Reset the counter within the allowable speed range l.Reset(now) return true } else { return false } } else { //The rate limit is not reached, the count is increased by 1 l.count++ return true } } func (l *Counter) Set (r int , cycle time.Duration) { l.rate = r l.begin = time.Now() l.cycle = cycle l.count = 0 } func (l *Counter) Reset (t time.Time) { l.begin = t l.count = 0 } func main () { var wg sync.WaitGroup var lr Counter lr.Set( 3 , time.Second) //At most 3 requests within 1s for i := 0 ; i < 10 ; i++ { wg.Add( 1 ) log.Println( "Create request:" , i) go func (i int ) { if lr.Allow() { log.Println( "Response to request:" , i) } wg.Done() }(i) time.Sleep( 200 * time.Millisecond) } wg.Wait() } Copy code


2021/02/01 21:16:12 Create request: 0 2021/02/01 21:16:12 Response request: 0 2021/02/01 21:16:12 Create request: 1 2021/02/01 21:16:12 Response to request: 1 2021/02/01 21:16:12 Create request: 2 2021/02/01 21:16:13 Create request: 3 2021/02/01 21:16:13 Create request: 4 2021/02/01 21:16:13 Create request: 5 2021/02/01 21:16:13 Response request: 5 2021/02/01 21:16:13 Create request: 6 2021/02/01 21:16:13 Response to request: 6 2021/02/01 21:16:13 Request created: 7 2021/02/01 21:16:13 Response to request: 7 2021/02/01 21:16:14 Create request: 8 2021/02/01 21:16:14 Create request: 9 Copy code

You can see that we set every

Create a request, significantly higher than
Seconds at most
The request limit, after running it, it was found that the number was
2, 3, 4, 8, 9
The request is discarded, indicating that the current limit is successful.

Then the question is, if there is a requirement for a certain interface

A maximum of 200 visits are allowed per minute. Suppose a user sends 200 requests in the last few milliseconds of the 59th second, and when the 59th second ends
It is cleared, and he will send another 200 requests in the next second. Then within 1 second, the user sent twice the request. This is in line with our design logic. This is also a design flaw of the counter method. The system may withstand a large number of requests from malicious users, or even penetrate the system.

As shown below:

Although this method is simple, it also has a big problem that it does not handle the boundary of unit time well.

Sliding window

Sliding window
It is aimed at the critical point defect of the counter, the so-called
Sliding window
Is a flow control technology, the term appears in
In agreement.
Sliding window
Divide the fixed time slices and move them as time goes by. A fixed number of movable grids are counted and the threshold is judged.

As shown in the figure:

In the above figure, we use the red dashed line to represent a time window (

One minute
), each time window has
Grids, each grid is
Seconds. Every time
The second time window moves one square to the right, you can see the direction of the red arrow. We set an independent counter for each grid
, If a request is
Visited then we set the counter of the fifth grid
(It's also
), when judging the current limit, you need to add up the counts of all grids and compare with the set frequency.

So how does the sliding window solve the problems we encountered above? Look at the picture below:

When the user is

Sent in seconds
Requests will be recorded by the counter in the sixth grid
, When the time window moves one to the right in the next second, the counter has already recorded the user s sent
A request, so if it is sent again, the current limit will be triggered and the new request will be rejected

In fact, the counter is a sliding window, it only has one grid, so if you want to make the current limit more accurate, you only need to divide more grids. In order to be more precise, we don t know how many grids should be set.

The number of grids affects the accuracy of the sliding window algorithm. There is still the concept of time slices, which cannot fundamentally solve the critical point problem.

Leaky bucket

Leaky Bucket Algorithm (Leaky Bucket)
, The principle is a leaky bucket with a fixed capacity, and water drops flow out at a fixed rate. Everyone who has used the faucet knows that when the faucet is turned on, the water will drip into the bucket. A leaky bucket refers to a leak in the bucket where water can flow out. If the faucet is opened too large, the water flow rate will be too high, which may cause the bucket to fill up and overflow.

As shown in the figure:

A fixed-capacity bucket has water flowing in and out. For the water that flows in, we cannot predict how much water will flow in, nor can we predict the speed of the water flow. But for the water flowing out, this bucket can fix the rate of water flowing out (

Processing speed
) To achieve
Traffic shaping
flow control


type LeakyBucket struct { rate float64 //Fixed water output rate per second capacity float64 //The capacity of the bucket water float64 //The current amount of water in the bucket lastLeakMs int64 //The last time the bucket leaked water timestamp ms lock sync.Mutex } func (l *LeakyBucket) Allow() bool { l.lock.Lock() defer l.lock.Unlock() now := time.Now().UnixNano()/1e6 eclipse := float64((now - l.lastLeakMs)) * l.rate/1000 // l.water = l.water - eclipse // l.water = math.Max(0, l.water) // l.lastLeakMs = now if (l.water + 1 ) <l.capacity { //try to add water, and the water is not full l.water++ return true } else { //The water is full, refuse to add water return false } } func (l *LeakyBucket) Set (r, c float64 ) { l.rate = r l.capacity = c l.water = 0 l.lastLeakMs = time.Now().UnixNano()/1e6 } Copy code

The leaky bucket algorithm has the following characteristics:

  • The leaky bucket has a fixed capacity, and the water outlet rate is a fixed constant (outflow request)
  • If the bucket is empty, no water droplets need to flow out
  • Water droplets can be flowed into the leaky bucket at any rate (inflow request)
  • If the inflowing water droplets exceed the capacity of the bucket, the inflowing water droplets overflow (the new request is rejected)

The leaky bucket limits the constant outflow rate (that is, the outflow rate is a fixed constant value), so the maximum rate is the rate of water outflow, and no burst flow occurs.

Token bucket algorithm

Token bucket algorithm

(Token Bucket)
Is network traffic shaping
(Traffic Shaping)
And rate limiting
(Rate Limiting)
One of the most commonly used algorithms. Typically, the token bucket algorithm is used to control the amount of data sent to the network and allow burst data to be sent.

We have a fixed bucket in which tokens are stored

. The bucket is empty at the beginning, and the system sets a fixed time
Add tokens to the bucket until the number of tokens in the bucket is full, and the excess requests will be discarded. When a request comes, remove a token from the bucket, and reject the request or block if the bucket is empty.

Implementation code:

type TokenBucket struct { rate int64 //Fixed token placement rate, r/s capacity int64 //The capacity of the bucket tokens int64 //The current number of tokens in the bucket lastTokenSec int64 //The timestamp of the last token put in the bucket s lock sync.Mutex } func (l *TokenBucket) Allow () bool { l.lock.Lock() defer l.lock.Unlock() now := time.Now().Unix() l.tokens = l.tokens + (now-l.lastTokenSec)*l.rate //add token first if l.tokens> l.capacity { l.tokens = l.capacity } l.lastTokenSec = now if l.tokens> 0 { //There are tokens, get tokens l.tokens-- return true } else { //If there is no token, refuse to return false } } func (l *TokenBucket) Set (r, c int64 ) { l.rate = r l.capacity = c l.tokens = 0 l.lastTokenSec = time.Now().Unix() } Copy code

The token bucket has the following characteristics:

  • Tokens are put into the token bucket at a fixed rate
  • Maximum storage in bucket
    Tokens, when the bucket is full, the newly added tokens are discarded or rejected
  • If there are not enough tokens in the bucket
    , The token will not be deleted, and the request will be throttled (discarded or blocked waiting)

The token bucket limits the average inflow rate (allowing burst requests, as long as there is a token, it can be processed, supporting 3 tokens, 4 tokens...), and allows a certain degree of burst traffic.


Currently commonly used is

Token bucket
In this way, this article introduces the implementation of several common current-limiting algorithms. The article is not easy to write, so you can't get lost if you pay attention.