Rate limiting - How it works
Rate limits are a powerful way to protect a web application from abuse. When setting up a web application the key elements to consider are:
- What endpoints on the web application require protecting?
- Do different endpoints require separate handling?
- What is the standard request rate for the entire application over a time period?
- How many concurrent connections are typically employed by your clients?
- What errors does your API endpoint give back as feedback to requests?
Before answering these questions, it helps to understand how rate limits can protect your application from abuse or misuse, the types of attacks the technology defends against, and how rate limiting algorithms make their decisions.
What kinds of attacks are stopped by rate limiting?
When protecting an application using rate limiting, the following scenarios of attacks are considered:
- Brute force and enumeration attacks
- Denial of Service (DoS) and Distributed Denial of Service (DDoS)
- Site scraping
What else can rate limiting protect?
Public and authenticated APIs can be subject to both abuse and misuse. Sensible rate limit policies applied to these endpoints prevent attacks and maintain service availability. Rate limiting can protect:
How does rate limiting work with user logins?
A well designed web application will only allow a certain number of failed login attempts before locking down the account and requiring a password reset. This is to protect against brute force attacks on an account. Bots commonly attempt to brute force logins to WordPress and other popular web applications, and determined attackers can attempt to brute force API login endpoints.
Rate limiting on a login page can be applied to the IP address of the user attempting to log in. By rate limiting the IP address, we can stop not only password brute force attacks but also simpler username enumeration attacks.
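The idea above can be sketched as a small per-IP guard. This is an illustrative example, not Peakhour's implementation; the class name `LoginGuard` and its thresholds are hypothetical, chosen only to show the mechanism of tracking recent failures per IP and blocking for a defined period.

```python
import time
from collections import defaultdict

class LoginGuard:
    """Illustrative sketch: block an IP after too many failed logins.

    Hypothetical parameters: `max_failures` failures within `window`
    seconds trigger a block lasting `block_seconds`.
    """

    def __init__(self, max_failures=5, window=300, block_seconds=900):
        self.max_failures = max_failures
        self.window = window
        self.block_seconds = block_seconds
        self.failures = defaultdict(list)  # ip -> timestamps of recent failures
        self.blocked_until = {}            # ip -> time the block expires

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        return self.blocked_until.get(ip, 0) > now

    def record_failure(self, ip, now=None):
        now = time.time() if now is None else now
        # Keep only failures inside the rolling window, then add this one.
        recent = [t for t in self.failures[ip] if now - t < self.window]
        recent.append(now)
        self.failures[ip] = recent
        if len(recent) >= self.max_failures:
            self.blocked_until[ip] = now + self.block_seconds
```

A real deployment would also reset the counter on a successful login and persist state outside a single process, but the shape of the decision is the same.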
Utilising Peakhour.IO rate limiting, responses to requests can be monitored and IPs blocked for administrator-defined periods, saving origin servers valuable resources and stopping attacks dead in their tracks.
How could API rate limiting work?
APIs are ubiquitous around the modern web. From Single Page Applications (SPAs) composed entirely of REST or GraphQL APIs to legacy applications using form submissions, even browsing our blog means you have consumed a myriad of APIs.
As APIs are generally publicly available, limitations are placed on them so that they are not abused. Rate limiting for APIs protects against malicious attacks: an attacker could script a bot to perform so many API calls that the service becomes unavailable to other users, causing unplanned downtime - a layer 7 DoS or DDoS attack.
APIs
Public and private APIs can be subject to abuse or misuse. Public APIs are discoverable by anyone and, because they are there for everyone to see, can be scripted for data mining or attacks. Rate limiting these endpoints based on fair use policies is commonplace. Keeping track of usage within an endpoint can be expensive, so handling it via Peakhour offloads this burden from developers.
Overzealous 'good bots'
Peakhour has seen up to 65% of requests to websites come from automated bots. These are typically indiscriminate in their quest to mine information; after all, they don't get paged when your site goes down. Rate limiting good bots separately from your main users ensures these crawlers don't stop your site from generating revenue.
How is rate limiting implemented?
Rate limiting is typically implemented using various methods:
Fixed window
Fixed window rate limiting is the simplest to understand, and limits are easy to define, such as 5,000 requests per 60 minutes. However, fixed windows are subject to spikes at the edges of the window: for example, 5,000 requests in the first 5 minutes of the window may overwhelm a service.
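A minimal sketch of a fixed window counter, assuming a per-client limit keyed by IP address (the class and parameter names here are illustrative, not a real API):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow up to `limit` requests per `window` seconds, per client."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (client, window index) -> request count

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        # All timestamps in the same window share one counter.
        bucket = (client, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True
```

Note how the counter resets abruptly at each window boundary; that reset is exactly what allows the edge spikes described above.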
Sliding window
A sliding window keeps the simplicity benefits of a fixed window but uses a rolling window instead, which allows bursts at window boundaries to be smoothed out.
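One common way to approximate a rolling window is the sliding window counter, which weights the previous window's count by how much of it still overlaps the rolling window. This is an illustrative sketch under that assumption; other sliding window variants (such as a log of individual timestamps) also exist.

```python
class SlidingWindowLimiter:
    """Approximate rolling-window limit: `limit` requests per `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.windows = {}  # client -> {window index: count}

    def allow(self, client, now):
        idx = int(now // self.window)
        buckets = self.windows.setdefault(client, {})
        prev = buckets.get(idx - 1, 0)
        curr = buckets.get(idx, 0)
        # Fraction of the previous window still covered by the rolling window.
        overlap = 1 - (now % self.window) / self.window
        estimate = prev * overlap + curr
        if estimate >= self.limit:
            return False
        buckets[idx] = curr + 1
        return True
```

Because the previous window's contribution decays as time advances, a burst at a boundary no longer gets a fresh, fully reset counter.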
Token bucket
A token bucket is an algorithm in which tokens are placed in a fixed-capacity bucket. Tokens could represent bytes transferred or hits on an API. When a request is considered for rate limiting, tokens are removed from the bucket. If the bucket has a sufficient quantity of tokens, the request can proceed. If there are insufficient tokens, the request is considered non-conforming. Non-conforming requests are dropped.
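The token bucket above can be sketched as follows. The refill-on-demand style (topping up tokens based on elapsed time rather than a background timer) is a common implementation choice; the names here are illustrative.

```python
class TokenBucket:
    """Fixed-capacity bucket refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full
        self.last = 0.0         # time of the last refill

    def allow(self, now, cost=1):
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # non-conforming: drop the request
```

The capacity sets the maximum burst size, while the refill rate sets the sustained request rate, which is why token buckets are popular for APIs that want to permit short bursts.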
Leaky bucket
Leaky buckets are a mirror image of token buckets. Instead of removing tokens from a bucket, each request adds tokens to the bucket, and the bucket leaks tokens at a fixed rate. When a request is considered for rate limiting, it is compared against the space remaining in the bucket. If the bucket is full, the request is considered non-conforming and is dropped.
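The mirror-image relationship is visible in a sketch of the leaky bucket used as a meter (assuming, as with the earlier examples, illustrative names and an on-demand drain rather than a background timer):

```python
class LeakyBucket:
    """Bucket filled by requests, drained at `leak_rate` units per second."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0  # start empty
        self.last = 0.0   # time of the last drain

    def allow(self, now, cost=1):
        # Drain the bucket for the elapsed time, never below empty.
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + cost > self.capacity:
            return False  # bucket would overflow: drop the request
        self.level += cost
        return True
```

Compare this with the token bucket above: requests add to the level here instead of subtracting tokens, and the fixed leak rate enforces a smooth outflow rather than allowing saved-up bursts.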
If you need help protecting and securing your website and rate limiting seems like something you need to do, reach out to see how we can help.