How I managed to handle 1 million update requests per minute with Rust

Thiện Trần
7 min read · Jun 17, 2021

For the last month, I’ve been working on improving the performance of our services, all of which use Django REST Framework. After enhancing API endpoints such as CRUD, statistics, cron jobs… performance was fine. But now we want these services to handle more requests and perform even better. Don’t get me wrong, Django does its job of “rapid development and clean, pragmatic design” very well, and DRF helped us keep our deadlines. However, it’s time to build something that fits our needs.

At the starting point, we had built a monolith service, meaning it contains plenty of business logic inside. Looking at it, I started thinking about separating some of that logic into smaller services, so I could improve each one more easily and flexibly. So I chose one of them, a small piece of business logic, to improve and see how much better it could get. In this article, I will show you the best solution I could build for the problem.

Just to be clear, I implemented this piece of logic in two ways. One is Golang with the Gin (gin-gonic) framework, the other is Rust with the help of Actix and sqlx. The Golang result is very good. After tuning the database config and applying some other tricks, this is its best result at 15,000 rps:

If you want to know what those tricks are, I’ll come back to them in another article 😃. Some of them are also described here, where I implement the Rust server. However, this result doesn’t satisfy the title of this article: while the p95 and average are okay, the max is 7.44s, which is huge 😤. Golang’s garbage collector causes that peak, and that really hurts, right? 😢 That is why I turned to another language, Rust. For the rest of this article I will go into the details of my Rust implementation, the best result I have.

Okay, enough talking! Let’s Rust into it 👌!

1. Business Details

Let me describe the business first. Our service lets students practice online. For each exam, a student can take an attempt and submit a list of answer choices. Each time a student answers a question, the client sends all of the answer choices to the server, and we must save them to our database. The target here is to serve these updates as fast as possible.

Exam and Attempt tables
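To make the data model concrete, here is a minimal sketch of how an attempt row could map to a Rust struct with sqlx. The field names and types are my assumptions from the description above, not the actual schema:

use serde_json::Value as Json;

// Hypothetical mapping of the `attempt` table; the real schema may differ.
#[derive(sqlx::FromRow)]
struct Attempt {
    id: i64,
    exam_id: i64,
    // Answer choices stored as JSONB, e.g. {"1": [1], "2": [2]}:
    // question id -> list of chosen option ids.
    answers: Json,
}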

2. How To Measure

To stress-test the service, I use the k6 load-testing tool. It makes it easy to generate many requests per second from a script written in JavaScript. The code below is my test script:

import http from 'k6/http';

export let options = {
  scenarios: {
    constant_request_rate: {
      executor: 'constant-arrival-rate', // hold a fixed request rate
      rate: 10000,                       // requests per timeUnit
      timeUnit: '1s',
      duration: '1m',
      preAllocatedVUs: 6000,             // virtual users allocated up front
      maxVUs: 15000,                     // upper bound if more VUs are needed
    },
  },
};

// Return a random integer in [min, max], inclusive.
function getRandomInt(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

export default function () {
  let data = { answers: { "1": [1], "2": [2], "3": [3], "4": [4] } };
  let params = { headers: { "Content-Type": "application/json" } };
  http.patch('http://localhost:8000/attempts/' + getRandomInt(1, 100000), JSON.stringify(data), params);
}

That script generates PATCH requests to the server, each with a random attempt id to update. Each request body carries a student’s answers for an exam. For each test, we set the duration to 60s and the maximum number of virtual users (workers) to 15,000. k6 reports results including the average time, min time, max time, p90, p95, failure rate, …

Note that my table has 100,000 records, so the script only generates ids in the range [1, 100000].

To run the test, we use a command like this: k6 run script.js

I used my own computer to run the server, the database, and the test. In case you’re wondering, this is its specification:

  • CPU: Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz, 6 cores, 12 threads
  • RAM: 24GB
  • Disk: GX2 SSD 256GB

Ah! I almost forgot: the database I use is PostgreSQL 12. A powerful database, isn’t it? 😄

3. Server Implementation And Results

As I said above, I use the Actix web framework along with sqlx, which manages connections to my database through a connection pool. When creating the pool, I let sqlx use its own default config, meaning the connection timeout, idle timeout, and max connection lifetime are 30 seconds, 10 minutes, and 30 minutes respectively.
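Spelled out, the pool setup looks roughly like this. This is a minimal sketch using the sqlx 0.5-era PgPoolOptions API; the connection URL is a placeholder, and the three timeout calls merely restate sqlx’s defaults:

use std::time::Duration;
use sqlx::postgres::{PgPool, PgPoolOptions};

async fn make_pool() -> Result<PgPool, sqlx::Error> {
    PgPoolOptions::new()
        .max_connections(24)                         // tuned below
        .connect_timeout(Duration::from_secs(30))    // sqlx default
        .idle_timeout(Duration::from_secs(10 * 60))  // sqlx default
        .max_lifetime(Duration::from_secs(30 * 60))  // sqlx default
        .connect("postgres://user:password@localhost/exams")
        .await
}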

In this article we focus on update performance, so I won’t implement other concerns such as authentication here.

By default, Actix sets the number of workers handling requests to double my CPU’s thread count, which in this case is 24, and I set the max connections of my connection pool to 24 as well. After building and running this first test at 15,000 rps, I failed. The server could not handle that load even when I lifted the worker count up to 400. The highest rate at this point was just 6,000 rps 😧.
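For reference, below is a minimal sketch of what such a server could look like. It is my illustration rather than the actual production code: the route shape matches the k6 script, but the request struct, table, and SQL are assumptions, and sqlx’s json feature is assumed for binding the JSONB column.

use actix_web::{patch, web, App, HttpResponse, HttpServer};
use serde::Deserialize;
use sqlx::postgres::{PgPool, PgPoolOptions};

#[derive(Deserialize)]
struct AttemptUpdate {
    // question id -> chosen option ids, matching the k6 payload above
    answers: serde_json::Value,
}

#[patch("/attempts/{id}")]
async fn update_attempt(
    pool: web::Data<PgPool>,
    id: web::Path<i64>,
    body: web::Json<AttemptUpdate>,
) -> HttpResponse {
    // Persist the latest answers for this attempt.
    let result = sqlx::query("UPDATE attempt SET answers = $1 WHERE id = $2")
        .bind(&body.answers)
        .bind(*id)
        .execute(pool.get_ref())
        .await;

    match result {
        Ok(_) => HttpResponse::Ok().finish(),
        Err(_) => HttpResponse::InternalServerError().finish(),
    }
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let pool = PgPoolOptions::new()
        .max_connections(24) // one connection per worker to start with
        .connect("postgres://user:password@localhost/exams")
        .await
        .expect("failed to connect to Postgres");

    HttpServer::new(move || {
        App::new()
            .app_data(web::Data::new(pool.clone()))
            .service(update_attempt)
    })
    .workers(24)
    .bind("0.0.0.0:8000")?
    .run()
    .await
}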

I’ve heard that Rust is a very fast language, and in many cases Rust has been used to replace Golang to boost performance. I didn’t think 6,000 rps was its real potential.

Yes, I was right! After spending some time researching, I realized that I hadn’t built my server in release mode. Based on the Cargo profiles documentation, this is the best build configuration I have for my server, in Cargo.toml:

[profile.release]
opt-level = 3
debug = false
split-debuginfo = 'off'
debug-assertions = false
overflow-checks = false
lto = true
panic = 'unwind'
incremental = false
codegen-units = 1
rpath = false
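With this profile in place, build the optimized binary with cargo build --release and run that instead of the debug build.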

And that led to this result at 10,000 rps:

Still no match for the best of Golang. What the heck? 😔

I set the pool’s max connections to 400 to make sure there was no bottleneck there, but nah, no effect. Luckily, I found what was holding Rust back: logging. If I turn off the log, I get this with 400 workers and 100 database connections:
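The article’s exact logging setup isn’t shown; as an illustration, assuming env_logger behind the log facade, silencing it can be as simple as raising the filter level (or never initializing the logger, and not wrapping the App in actix-web’s Logger middleware):

use log::LevelFilter;

fn main() {
    // Disable all log output; with per-request logging enabled, every
    // request pays for formatting and synchronous I/O on the log sink.
    env_logger::Builder::new()
        .filter_level(LevelFilter::Off)
        .init();
}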

15000 rps result of Rust server

Bravo, the Rust server implementation has beaten Golang’s, running something like 10 times faster!

Let’s keep this server configuration, push it to the limit with a 20,000 rate, and see what happens:

As you can see, the server can still handle a max of 19,000 rps, with slightly longer response times per request. I think it’s time to improve things on the database side.

For this case, the workload is write-heavy, and one simple way to speed up writes to a table is to set it to unlogged, which skips the write-ahead log (caution: if your database crashes, all records in this table will be lost).

ALTER TABLE attempt SET UNLOGGED;
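PostgreSQL (9.5 and later) also supports the reverse, ALTER TABLE attempt SET LOGGED;, if you later need crash safety back.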

The result after setting the table to unlogged

Yeah! As the image above shows, at last we can handle 20,000 update requests per second 🎆, which is 1.2 million per minute. The average is 20ms, and even the max is under 1s, compared to Golang’s 7s at 15,000 rps. This is what I was looking for!

4. For The Future

I ran pgbench against my table, and it says the maximum tps of my table is about 55,000. So I could still push my server’s limit close to that number if the processing of each request were somehow optimized. As a next step, I will dive into my server code to improve it, perhaps by reducing serialization time, optimizing memory allocation, and so on.

I hope this article has helped you somehow. Please stay tuned for my next topic when I have more to tell you.

Thanks for reading!!!

Have a good day! 🌻
