Load Testing for Fun and Profit

2021.04.04 :: {web} :: #caddy #wrk

I've been doing a lot of load testing at work recently, using an internal load generation and measurement tool.

Since I seem to be so obsessed with making this website faster, I decided to run some numbers to get an idea of its true performance.

A little context

Let me start by saying that load testing is an imperfect science. The tests I'm going to detail today don't accurately represent the real-world experience of someone actually navigating around this website. Lots of other factors affect that user's perceived performance on this site, including their browser, my Cache-Control headers, and how far they are from my server's location, and the list goes on.

But these tests do provide a general idea of this website's maximum requests (or transactions) per second (TPS), or roughly how many people can load the home page in a single second.

This not only measures the relative speed at which a user loads my website, but also provides a realistic expectation of how my site will perform if it ever blows up on Hacker News, for example.

It should be noted that a tool like PageSpeed gives a better measure of the speed a user actually perceives.

The tool

Since I'm not at work, I decided to use a more rudimentary (but still quite good) load testing tool called wrk.

That's right, I'm not at work, so I do wrk... eghm.

wrk lets me configure the number of threads, connections, duration, and headers with which to hit my web server, Caddy. Since I can specify headers, I can use the Accept-Encoding header to control which version of my home page Caddy serves: the raw HTML, a gzip-compressed version, or a Brotli-compressed version.

As explained in this delightful post, wrk splits the total number of connections given on the command line evenly across the number of threads given on the command line.

For example, if I used 75 connections with 25 threads, this means that each thread would handle responses for 3 connections.
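The split is just integer arithmetic. Here's a small Python sketch of it using the thread and connection counts from my runs. It assumes wrk does an integer division of connections by threads (so any remainder would go unused) and rejects fewer connections than threads; that's my reading of wrk's behavior, not something I measured here.

```python
def connections_per_thread(connections: int, threads: int) -> int:
    """Connections each wrk thread drives, assuming an even integer split."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    if connections < threads:
        # Assumption: wrk refuses configs with fewer connections than threads.
        raise ValueError("connections must be >= threads")
    # Integer division: any leftover connections are assumed to be dropped.
    return connections // threads


if __name__ == "__main__":
    for conns in (75, 125):  # the two -c values used in my runs
        print(f"-t 25 -c {conns}: {connections_per_thread(conns, 25)} per thread")
```

So the 75-connection run gives 3 connections per thread, and the 125-connection runs give 5.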

In my case, with a little trial and error, I tweaked my connections and threads until I had a stable benchmark that saturated my 100 megabit home internet connection.

The tests

As hinted above, I ran three tests for each "version" of my home page: one for the uncompressed HTML, and two more for the gzip and Brotli compressed assets, respectively. I say "version" because I also tweaked my static site generator to produce a minified version of my home page, and then ran the same three tests against that.

I wish I could leave HTML minification on full time, but alas, a bug in Zola's minification dependency causes my <code> and <pre> blocks to collapse to a single line.

Once a new version of Zola is out with the fixed dependency, I'll definitely be upgrading and activating HTML minification.

The results

Okay, enough talking, let's get to the data.

Test Setup

Download speed of home connection, measured using speedtest CLI: 12.57 MB/s
Latency, as measured in same speed test: 15.78 ms
CPU (load test host): Ryzen Threadripper 3960X (24c/48t), performance governor activated
Mem (load test host): 64GB 3600 MHz DDR4
CPU (web server): Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Mem (web server): 989MB
All compressed and minified assets were prepared offline, so the server does no extra compression or minification work at request time.
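As a rough sketch of what offline precompression can look like (not necessarily how this site's build does it), here's a Python script that uses the standard library's gzip module to write a .gz copy next to a file. The paths are hypothetical, Brotli output would need a third-party package so it's left out, and the server configuration needed to serve the .gz file isn't shown.

```python
import gzip
from pathlib import Path


def precompress_gzip(path: Path) -> Path:
    """Write a maximally compressed .gz copy next to `path` and return its path."""
    data = path.read_bytes()
    out = path.with_name(path.name + ".gz")
    # Level 9 is the slowest but smallest; fine, since this runs once, offline.
    out.write_bytes(gzip.compress(data, compresslevel=9))
    return out


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python precompress.py public/index.html
    for arg in sys.argv[1:]:
        src = Path(arg)
        gz = precompress_gzip(src)
        print(f"{src}: {src.stat().st_size} -> {gz.stat().st_size} bytes")
```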

The web server is hosted on a cheap $5.00/month, 1GB/1CPU, 25GB SSD, 1000GB transfer droplet at DigitalOcean.

Tests were run for 5 minutes each, late at night on a quiet connection.

Commands

wrk -t 25 -c 75  -d 300s --latency https://austindw.com
wrk -t 25 -c 125 -d 300s --latency -H "Accept-Encoding: gzip" https://austindw.com
wrk -t 25 -c 125 -d 300s --latency -H "Accept-Encoding: br" https://austindw.com
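Since only the Accept-Encoding header and the connection count change between runs, the commands are easy to generate. Here's a minimal Python sketch, assuming wrk is on your PATH; it only uses the -t, -c, -d, --latency and -H flags shown above. It's a dry run by default, printing the commands, and the hypothetical --run argument actually starts the 5-minute tests. The minified runs reuse the same commands against the minified build.

```python
import shlex
import subprocess
import sys
from typing import List, Optional

URL = "https://austindw.com"


def wrk_command(connections: int, encoding: Optional[str] = None,
                threads: int = 25, duration: str = "300s") -> List[str]:
    """Build a wrk argv like the runs above; `encoding` sets Accept-Encoding."""
    cmd = ["wrk", "-t", str(threads), "-c", str(connections),
           "-d", duration, "--latency"]
    if encoding is not None:
        cmd += ["-H", f"Accept-Encoding: {encoding}"]
    return cmd + [URL]


# One run per encoding: raw (no header), gzip, and Brotli.
RUNS = [wrk_command(75), wrk_command(125, "gzip"), wrk_command(125, "br")]

if __name__ == "__main__":
    # Dry run by default; pass --run to actually start the 5-minute tests.
    really_run = "--run" in sys.argv[1:]
    for cmd in RUNS:
        print(shlex.join(cmd))
        if really_run:
            subprocess.run(cmd, check=True)  # wrk prints its own report
```

Passing the arguments as a list rather than a single string keeps the header value, with its embedded space, as one argument without any shell quoting.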

Data

Metric	raw HTML	GZIP compressed	Brotli compressed	minified HTML	minified + GZIP	minified + Brotli
Size (bytes)	16781	3712	3002	11108	3374	2765
Threads	25	25	25	25	25	25
Connections	75	125	125	125	125	125
Requests per sec	752.82	2963.38	3193.88	1130.91	3070.17	3246.17
Transfer per sec (MB/s)	12.22	11.27	9.98	12.23	10.69	9.41
P50 (ms)	98.34	40.93	37.61	67.70	38.90	37.39
P90 (ms)	114.28	52.35	50.94	74.82	52.63	50.10
P99 (ms)	151.74	79.53	75.61	79.71	83.91	70.37
Web server CPU 5min load	0.41	0.74	0.87	N/A	N/A	N/A

Takeaways

It should have been pretty obvious, but of course minifying and compressing HTML with Brotli results in the smallest file sizes, and thus the fastest response times.

Being able to handle ~3200 users requesting my home page in a single second at $5/month ain't nothin' to sneeze at, that's for sure (and another reason why static site generators are just the best).

What is interesting is that the difference between gzip and Brotli wasn't as large as I expected. The 5-minute CPU load, and the way "Transfer per sec" dropped as responses got smaller, hint at why: my server was starting to bottleneck on its single CPU core.
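A quick back-of-the-envelope check supports this. If the network link were the only limit, the best possible request rate would be the download speed divided by the response size. This Python sketch assumes the speed test's 12.57 MB/s means 12.57 million bytes per second and ignores HTTP headers and TLS overhead, so the ceilings are rough.

```python
LINK_BYTES_PER_SEC = 12.57e6  # assumption: speedtest's "MB/s" = 10**6 bytes/s

# (response size in bytes, measured requests/sec), copied from the table above
RESULTS = {
    "raw HTML": (16781, 752.82),
    "GZIP compressed": (3712, 2963.38),
    "Brotli compressed": (3002, 3193.88),
    "minified HTML": (11108, 1130.91),
    "minified + GZIP": (3374, 3070.17),
    "minified + Brotli": (2765, 3246.17),
}


def bandwidth_ceiling(size_bytes: int) -> float:
    """Max requests/sec the link could carry if the body were the only cost."""
    return LINK_BYTES_PER_SEC / size_bytes


def utilization(name: str) -> float:
    """Measured requests/sec as a fraction of the bandwidth ceiling."""
    size, rps = RESULTS[name]
    return rps / bandwidth_ceiling(size)


if __name__ == "__main__":
    for name, (size, rps) in RESULTS.items():
        print(f"{name:18} ceiling ~{bandwidth_ceiling(size):6.0f} req/s, "
              f"measured {rps:7.2f} ({utilization(name):.0%})")
```

By this rough estimate, the raw and minified HTML runs sit right at what the link can carry, while the four compressed runs reach only about 71% to 88% of their ceilings. That's what you'd expect if something other than bandwidth is the limit.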

To confirm this, I ran another test with a raw text file containing the word "hello", which measured at a total of 6 bytes.

The result? 4077.37 requests/sec.

At first glance, this might seem like a big difference compared to minified Brotli's 3246 requests/sec. But the minified + Brotli compressed asset is 2765 bytes, about 460x larger than the 6-byte text file, and the two results are only ~800 TPS apart. That suggests the CPU on my droplet (which was pegged at 100% pretty much the whole time for both tests) is much more the culprit than the response size or my network speed.
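Put as ratios, the comparison looks like this (a quick Python sketch using only the figures quoted above):

```python
HELLO = (6, 4077.37)          # (bytes, requests/sec) for the 6-byte "hello" file
MIN_BROTLI = (2765, 3246.17)  # (bytes, requests/sec) for minified + Brotli

size_ratio = MIN_BROTLI[0] / HELLO[0]        # how much bigger the response is
throughput_ratio = MIN_BROTLI[1] / HELLO[1]  # fraction of the "hello" rate kept
tps_gap = HELLO[1] - MIN_BROTLI[1]           # absolute gap in requests/sec

print(f"response is {size_ratio:.0f}x larger")
print(f"but keeps {throughput_ratio:.0%} of the throughput "
      f"({tps_gap:.0f} fewer req/s)")
```

A response about 460 times larger still keeps roughly 80% of the throughput, which points at per-request CPU cost rather than bytes on the wire.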

This suggests I'd get even better performance out of the minified + Brotli compressed assets with a beefier or multi-core CPU, even if all other specs (network, memory, etc.) stayed the same.

Summary

In summary, this website really is fast. Because science says so.

One thing I've learned in my 9+ years working in software is that it's always important to measure, and thus validate, your assumptions. Before this, my website always seemed fast, but now I know it can easily handle over 3000 visitors in a single second. That leaves a lot of headroom on this blog before I'd need to worry about upgrading to a bigger droplet.

Was this testing methodology perfect? No. Will this web server actually be able to handle 3000 visitors in one second? Maybe. A lot can affect that answer, but these results are very much "good enough", at least for now.