Rate Limiter using Java – Python


What is Rate Limiter?

Rate limiting refers to preventing the frequency of an operation from exceeding a defined limit. In large-scale systems, rate limiting is commonly used to protect underlying services and resources. Rate limiting is generally used as a defensive mechanism in distributed systems, so that shared resources can maintain availability.

What are benefits of Rate Limiter?

Prevent systems from DoSS Attack, Pre-determine traffic for services, Secure systems by providing a limit based on IP, user, API key.

How to do Rate Limiting?

Rate Limiting is a process that is used to define the rate and speed at which consumers can access APIs. Throttling is the process of controlling the usage of the APIs by customers during a given period. Throttling can be defined at the application level and/or API level. When a throttle limit is crossed, the server returns HTTP status “429 – Too many requests“.

What are different types of throttling?

Here are the three famous throttling types that are used by different services: Hard Throttling: The number of API requests cannot exceed the throttle limit.
Soft Throttling: In this type, we can set the API request limit to exceed a certain percentage. For example, if we have rate-limit of 100 messages a minute and 10% exceed-limit, our rate limiter will allow up to 110 messages per minute.
Elastic or Dynamic Throttling: Under Elastic throttling, the number of requests can go beyond the threshold if the system has some resources available. For example, if a user is allowed only 100 messages a minute, we can let the user send more than 100 messages a minute when there are free resources available in the system.

Benefits – Why rate limiter is used for applications?

  1. Controlling cost: Most common reason to have rate limiting to control operational costs⁠ for an underlying resource is capable of auto-scaling to meet demand, by putting a virtual cap on scaling of resources where the budget for that resource usage is limited. An organization might use rate limiting to prevent experiments from running out of control and accumulating large bills.
  2. Security: Rate limiting prevents brute forcing of security intensive functionalities like login, promo code etc. Number of requests to these features is limited on a user level so brute force algorithms don’t work in these scenarios.
  3. Advertisement
  4. High availability: The most common reason for rate limiting is to improve the availability of API-based services by avoiding resource starvation. Load based denial of service (doS) attacks can be prevented if rate limiting is applied. When one user has high API or system usage then it should not impact other users, so a cap of number of requests per user is defined as rate limiting.

Token Bucket Algorithm for Rate-limiter

Java code for Rate-limiter Token Bucket

The token bucket algorithm is based on an analogy of a fixed capacity bucket into which tokens are added at a fixed rate. Before allowing an API to proceed, the bucket is inspected to see if it contains at least 1 token at that time. If so, one token is removed from the bucket, the API is allowed to proceed. In case there are no tokens available, the API is returned saying that the quota has exceeded during that window.

package com.ratelimiter;

import java.sql.Time;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

 * This class is implemented based on Token bucket algorithm
public class RateLimiter {

    private long capacity;
    private long windowTimeInSeconds;

    long lastRefillTimeStamp;

    long refillCountPerSecond;

    long availableTokens;

    public RateLimiter(long capacity, long windowTimeInSeconds){
        this.capacity = capacity;
        this.windowTimeInSeconds = windowTimeInSeconds;
        lastRefillTimeStamp = System.currentTimeMillis();
        refillCountPerSecond = capacity / windowTimeInSeconds;
        availableTokens = 0;
    public long getAvailableTokens(){
        return this.availableTokens;
    public boolean tryConsume(){
        if(availableTokens > 0)
            return true;
            return false;
    private void refill(){
        long now = System.currentTimeMillis();
        if(now > lastRefillTimeStamp)
            long elapsedTime = now - lastRefillTimeStamp;
            //refill tokens for this durationlong tokensToBeAdded = (elapsedTime/1000) * refillCountPerSecond;
            if(tokensToBeAdded > 0) {
                availableTokens = Math.min(capacity, availableTokens + tokensToBeAdded);
                lastRefillTimeStamp = now;

Python code for Token Bucket Rate-limiter

import math

# This class is implemented based on Token bucket algorithm
class RateLimiter :
    capacity = 0
    windowTimeInSeconds = 0
    lastRefillTimeStamp = 0
    refillCountPerSecond = 0
    availableTokens = 0
    def __init__(self, capacity,  windowTimeInSeconds) :
        self.capacity = capacity
        self.windowTimeInSeconds = windowTimeInSeconds
        self.lastRefillTimeStamp = System.currentTimeMillis()
        self.refillCountPerSecond = capacity / windowTimeInSeconds
        self.availableTokens = 0
    def  getAvailableTokens(self) :
        return self.availableTokens
    def  tryConsume(self) :
        if (self.availableTokens > 0) :
            self.availableTokens -= 1
            return True
        else :
            return False
    def refill(self) :
        now = System.currentTimeMillis()
        if (now > self.lastRefillTimeStamp) :
            elapsedTime = now - self.lastRefillTimeStamp
            # refill tokens for this duration
            tokensToBeAdded = (elapsedTime / 1000) * self.refillCountPerSecond
            if (tokensToBeAdded > 0) :
                self.availableTokens = min(self.capacity,self.availableTokens + tokensToBeAdded)
                self.lastRefillTimeStamp = now

Fixed Window Algorithm

The simplest approach to build a rate limiter is the “fixed window” implementation in which we cap the maximum number of requests in a fixed window of time.

For exmaple, if the window size is 1 hour, we can “fix” it at the top of the hour, like 12:00-12:59, 1:00-1:59, and so forth.

long currentWindowNum = Instant.now().toEpochMilli() / WINDOW_SIZE.toMillis();

Java Code for Fixed Window Algorithm Rate – Limiter

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class RateLimiter {

    private static final Duration WINDOW_SIZE = Duration.ofSeconds(5);
    private static final long REQUEST_LIMIT = 10000;

    private final Map<Integer, User> users = new HashMap<>();

    boolean rateLimit(int userId) {
        // Get the current window number. This breaks down time into discreet windows of WINDOW_SIZE width.
        long currentWindowNum = Instant.now().toEpochMilli() / WINDOW_SIZE.toMillis();
        User user = users.computeIfAbsent(userId, (k) -> new User(currentWindowNum));

        if (currentWindowNum > user.windowNum) {

        if (user.requestCount < REQUEST_LIMIT) {
            return true;
        return false;

    private static class User {
        private long windowNum;
        private long requestCount;

        User(long windowNum) {
            this.windowNum = windowNum;
            this.requestCount = 0;

        void reset(long windowNum) {
            this.windowNum = windowNum;
            this.requestCount = 0;

Python Code for Fixed Window Algorithm Rate – Limiter

class RateLimiter :
    WINDOW_SIZE = Duration.ofSeconds(5)
    REQUEST_LIMIT = 10000
    users =  dict()
    def  rateLimit(self, userId) :
        # Get the current window number. This breaks down time into discreet windows of WINDOW_SIZE width.
        currentWindowNum = Instant.now().toEpochMilli() / RateLimiter.WINDOW_SIZE.toMillis()
        user = self.users.computeIfAbsent(userId,lambda k : RateLimiter.User(currentWindowNum))
        if (currentWindowNum > user.windowNum) :
        if (user.requestCount < RateLimiter.REQUEST_LIMIT) :
            user.requestCount += 1
            return True
        return False
    class User :
        windowNum = 0
        requestCount = 0
        def __init__(self, windowNum) :
            self.windowNum = windowNum
            self.requestCount = 0
        def reset(self, windowNum) :
            self.windowNum = windowNum
            self.requestCount = 0

Sliding window algorithm

A sliding window rate limiter, unlike a fixed window, restricts requests for a discrete window prior to the current request under evaluation. As opposed to a fixed window rate limiter which groups the requests into a bucket based on a very definitive time window.

TODO: Java code


Leave a Reply

Your email address will not be published. Required fields are marked *

5 + nineteen =