What I found load testing a Next.js app at scale

Have you ever load tested a Next.js application and found that CPU metrics looked completely fine while response times were quietly falling apart? I ran into exactly that.

I'd been working on a Next.js application and wanted to understand how it would actually hold up under real traffic before I shipped it. I hadn't load tested it yet, had no baseline to compare against, and the Lighthouse scores I had were taken during development. They told me very little about how the app would behave when it actually mattered.

Why client-driven load testing

The obvious approach to load testing a web app is to hit the server directly with synthetic HTTP requests. For a Next.js application with server-side rendering, that misses most of what matters.

Driving a real browser through real user flows gives you things you can't get from raw HTTP load. You get accurate Core Web Vitals under pressure, not just at rest. You get the real request pattern (RSC payload fetches, prefetching, streaming) without having to reverse-engineer what the client actually calls and in what order. And you get genuine end-to-end confidence rather than confidence that the API layer holds while the rendering layer goes untested.

The setup: Fargate, Artillery, Playwright

I used Artillery to coordinate multiple tasks on AWS Fargate spot instances, with a Playwright script driving a browser through a critical user flow.
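To give a feel for what such a flow looks like, here is a minimal sketch of a Playwright test function in the shape Artillery's Playwright engine calls it, with the arguments (page, vuContext, events, test). The flow itself, the selectors, the search term and the step names are all hypothetical, and I'm assuming the configured target URL is available as vuContext.vars.target. The small interfaces exist only so the sketch stands alone; in a real project page would be Playwright's own Page type.

```typescript
// Minimal structural types so the sketch is self-contained. In a real
// project `page` is Playwright's Page and the rest is supplied by Artillery.
interface PageLike {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<unknown>;
  click(selector: string): Promise<unknown>;
  waitForSelector(selector: string): Promise<unknown>;
}
interface VuContext {
  vars: Record<string, unknown>;
}
interface StepRunner {
  step(name: string, fn: () => Promise<void>): Promise<void>;
}

// Hypothetical critical flow: load the landing page, then run a search.
// Export this from the processor file so Artillery can find it.
async function criticalFlow(
  page: PageLike,
  vuContext: VuContext,
  _events: unknown,
  test: StepRunner,
): Promise<void> {
  // Assumption: the configured target is exposed as vars.target.
  const baseUrl = String(vuContext.vars.target);

  // Each step is timed separately, so slow stages stand out in the report.
  await test.step("landing", async () => {
    await page.goto(`${baseUrl}/`);
    await page.waitForSelector("main");
  });

  await test.step("search", async () => {
    await page.fill("input[name='q']", "shoes");
    await page.click("button[type='submit']");
    await page.waitForSelector("[data-testid='results']");
  });
}
```

Wrapping each stage in a named step is the design choice that matters here: it lets you see which part of the journey degrades first under load rather than only a single end-to-end number.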

Getting the environment into a state where it could actually be performance tested was most of the work. The application had a number of upstream dependencies, and making sure the right things were running, stubbed, or mocked at parity with production took significant effort. An OAuth2 mock had drifted out of sync with the real authentication provider in staging, and that needed fixing before a single meaningful test could run.

Modelling the load itself required getting analytics data on what the previous application had experienced on its highest-traffic day, scaling that up, and adding a buffer to deliberately represent a worst case rather than a typical peak. The goal was to stress the system, not simulate an average day.
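The arithmetic behind that is simple enough to sketch. The figures and field names below are made up for illustration and are not the numbers from my analytics data; the point is only the shape of the calculation: historical peak, times a growth factor, plus a buffer, converted to an arrival rate.

```typescript
// Hypothetical inputs: none of these figures come from real analytics data.
interface LoadModelInput {
  peakSessionsPerHour: number; // busiest hour on the highest-traffic day
  growthFactor: number; // scale-up multiplier, e.g. 2 = double the peak
  bufferFraction: number; // extra headroom on top, e.g. 0.25 = +25%
}

// Convert a historical peak into a worst-case arrival rate, expressed as
// new user sessions per second for the load generator.
function worstCaseArrivalRate(input: LoadModelInput): number {
  const sessionsPerHour =
    input.peakSessionsPerHour * input.growthFactor * (1 + input.bufferFraction);
  return sessionsPerHour / 3600;
}

const arrivalRate = worstCaseArrivalRate({
  peakSessionsPerHour: 36000,
  growthFactor: 2,
  bufferFraction: 0.25,
});
console.log(arrivalRate.toFixed(1)); // 25.0 new sessions per second
```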

At the scale I was testing, provisioning enough Fargate tasks to generate that load surfaced its own constraints. Running too many headless browsers on a single task means the task itself becomes the bottleneck, not the application. I ended up running a large number of tasks, and at that scale I hit bandwidth limits on the NAT gateway that needed addressing before the tests could run cleanly.
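A back-of-envelope version of that sizing, under invented numbers: how many generator tasks a given number of concurrent browsers needs, and roughly how much traffic all of them push through a shared NAT gateway. The per-task browser count and per-browser bandwidth here are placeholders you would have to measure for your own flow.

```typescript
interface GeneratorPlan {
  tasks: number; // Fargate tasks needed to host all the browsers
  aggregateMbps: number; // rough combined traffic through the NAT gateway
}

// All three inputs are hypothetical and need measuring for a real flow.
function planGenerators(
  concurrentBrowsers: number,
  browsersPerTask: number,
  mbpsPerBrowser: number,
): GeneratorPlan {
  if (browsersPerTask <= 0) {
    throw new Error("browsersPerTask must be positive");
  }
  return {
    // Round up: a partly filled task is still a whole task.
    tasks: Math.ceil(concurrentBrowsers / browsersPerTask),
    aggregateMbps: concurrentBrowsers * mbpsPerBrowser,
  };
}

const plan = planGenerators(1000, 4, 2);
console.log(plan); // { tasks: 250, aggregateMbps: 2000 }
```

Keeping browsersPerTask low protects the accuracy of the measurements, but it pushes the task count up, which is how the shared network path ends up being the next thing to give.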

What I found: CPU isn't the right signal for SSR

Once the environment was stable, something unexpected showed up in the results.

As request load ramped up, tail latency started to spike. But the timing was odd: requests would increase, and response times would deteriorate noticeably later, as if the application was absorbing load for a while before starting to struggle. More striking was that CPU utilisation stayed flat throughout. No throttling. No signal that anything was under stress.

The Horizontal Pod Autoscaler was configured to use CPU utilisation as the scaling trigger. This works well for services that spread their work across every core they are given. A Go API, for example, will show CPU pressure as load increases. But Node.js runs JavaScript on a single main thread. That thread can only use one core, regardless of how many cores you allocate to the pod.

When Node.js is responsible for server-side rendering HTML, that's CPU-bound work, but it saturates the event loop long before it registers in pod-level CPU metrics: one fully busy main thread is only a fraction of a multi-core allocation. Think of the event loop like a single-lane road. At low traffic, cars flow through fine. As volume increases, a queue builds up. The road itself isn't broken and isn't obviously congested from a distance, but every car is waiting longer. CPU metrics were looking at the road surface. What I needed to measure was the queue.
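The single-lane road can be simulated in a few lines. This is a toy model rather than a measurement of the real app: it assumes every render takes the same fixed time (a hypothetical 20 ms) and that requests arrive evenly spaced, and it simply processes them one at a time.

```typescript
// One thread rendering requests strictly one at a time.
// arrivalsMs: when each request arrives; renderMs: synchronous render cost.
function simulateLatencies(arrivalsMs: number[], renderMs: number): number[] {
  let threadFreeAt = 0;
  return arrivalsMs.map((arrival) => {
    // A request waits if the thread is still busy with earlier renders.
    const start = Math.max(arrival, threadFreeAt);
    threadFreeAt = start + renderMs;
    return threadFreeAt - arrival; // queueing delay plus render time
  });
}

// Evenly spaced arrival times for `count` requests at `perSecond`.
function evenArrivals(count: number, perSecond: number): number[] {
  return Array.from({ length: count }, (_, i) => (i * 1000) / perSecond);
}

// Hypothetical render cost: at 20 ms each, one thread tops out at 50 req/s.
const renderMs = 20;
const below = simulateLatencies(evenArrivals(100, 40), renderMs);
const above = simulateLatencies(evenArrivals(100, 60), renderMs);

console.log(Math.max(...below)); // 20: no queue at 40 req/s
console.log(above[0], above[99]); // 20 350: the queue builds at 60 req/s
```

Just past the threshold, the first request is still served in 20 ms and the hundredth takes 350 ms. That is the same delayed deterioration I saw in the results: the damage accumulates in the queue rather than appearing the moment load increases.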

I added Node.js VM measurements via New Relic, which surfaces event loop internals alongside the standard metrics. Once that was in place, I could query event loop lag directly:

SELECT average(nodejs.eventLoop.lag.median)
FROM Metric
TIMESERIES AUTO

Event loop lag was climbing steadily as request load increased. The rendering work was queuing behind itself, and nothing in the standard metrics was capturing it.
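If you don't have an APM agent to hand, Node.js can report the same kind of signal itself. The sketch below is not what New Relic does internally, and it isn't what I ran; it uses monitorEventLoopDelay from the built-in perf_hooks module, which records delays in nanoseconds. The startLagMonitor helper and its field names are my own invention for illustration.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical helper: sample event loop delay and expose it in milliseconds.
function startLagMonitor(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();

  // The histogram reports nanoseconds; convert for readability.
  const toMs = (ns: number): number => ns / 1e6;

  return {
    // Read the current window, then reset so each snapshot covers only
    // the interval since the previous one.
    snapshot() {
      const stats = {
        meanMs: toMs(histogram.mean),
        p99Ms: toMs(histogram.percentile(99)),
        maxMs: toMs(histogram.max),
      };
      histogram.reset();
      return stats;
    },
    stop() {
      histogram.disable();
    },
  };
}

// Usage idea: call monitor.snapshot() on a timer and ship the values to
// whatever metrics backend you already use.
const monitor = startLagMonitor();
const firstSnapshot = monitor.snapshot();
monitor.stop();
```

A snapshot taken immediately, as in the last lines, has no samples yet, so the values are not meaningful until the loop has run for a while.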

The fix

Once I understood what was actually happening, the fix was straightforward: switch the auto-scaling trigger from CPU utilisation to request rate, so the app scales in response to actual incoming load rather than a metric that doesn't reflect SSR pressure.
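To see why the trigger matters, it helps to plug numbers into the autoscaler's core rule, which the Kubernetes documentation gives as desired replicas = ceil(current replicas × current metric / target metric). The snapshot below is hypothetical (4 pods, 25% CPU against a 70% target, 50 requests per second per pod against a target of 20), and the sketch leaves out the tolerance band, stabilisation windows and min/max replica bounds that a real HPA applies.

```typescript
// Core HPA rule only; tolerance, stabilisation and replica bounds are omitted.
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number,
  targetMetric: number,
): number {
  return Math.ceil(currentReplicas * (currentMetric / targetMetric));
}

// Hypothetical mid-ramp snapshot with every main thread already saturated.
const byCpu = desiredReplicas(4, 25, 70); // 25% utilisation vs 70% target
const byRate = desiredReplicas(4, 50, 20); // 50 req/s per pod vs 20 target

console.log(byCpu); // 2: no scale-up at all, the rule leans towards scaling down
console.log(byRate); // 10: scale out to match the incoming load
```

The same moment in time produces opposite decisions depending on which metric the autoscaler is watching.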

I also right-sized the pods. Because Node.js can't use more than one core for the main thread, allocating multiple CPUs per pod is largely wasteful for an SSR workload. Reducing CPU allocation per pod and increasing the number of replicas gave more horizontal capacity for the same resource cost. Response times improved significantly.
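The trade-off is easy to put in numbers. The sketch below uses an invented 16-CPU budget and assumes, as a simplification, that only the main thread does meaningful work; in reality garbage collection and the libuv thread pool use a little CPU on other cores, so treat the ceiling as approximate.

```typescript
interface PodLayout {
  pods: number; // replicas that fit in the CPU budget
  mainThreads: number; // one Node.js main thread per pod
  cpuCeilingPercent: number; // rough max utilisation with only that thread busy
}

// Hypothetical helper comparing layouts for the same total CPU budget.
function layoutFor(totalCpus: number, cpusPerPod: number): PodLayout {
  const pods = Math.floor(totalCpus / cpusPerPod);
  return {
    pods,
    mainThreads: pods,
    // One saturated thread on an N-CPU pod reads as roughly 100/N percent.
    cpuCeilingPercent: Math.min(100, 100 / cpusPerPod),
  };
}

const large = layoutFor(16, 4);
const small = layoutFor(16, 1);

console.log(large); // { pods: 4, mainThreads: 4, cpuCeilingPercent: 25 }
console.log(small); // { pods: 16, mainThreads: 16, cpuCeilingPercent: 100 }
```

Same budget, four times the rendering threads, and as a side effect CPU utilisation becomes a meaningful number again because a busy main thread can now fill the pod's allocation.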

Further gains came from startup optimisation: using the output: 'standalone' option in Next.js to reduce image size, and tuning liveness and readiness probes so new pods received traffic faster. At scale this matters. If pods take a long time to become ready, the auto-scaler's reaction is always running behind the traffic curve.

Scaling horizontally with small single-CPU pods isn't the only approach here. Matteo Collina, a Node.js TSC member, has a detailed breakdown of this exact problem with benchmarks across several strategies. These include PM2, which carries roughly 30% overhead per request due to IPC coordination, and Watt, which sidesteps that by using SO_REUSEPORT to distribute connections at the kernel level. The numbers were surprisingly good. I haven't tried Watt yet, but if you're self-hosting Next.js and hitting this problem it's worth a look.

The broader point

Lighthouse scores in CI tell you about render performance in isolation. They don't tell you how the application behaves when the event loop is under concurrent load, or whether the infrastructure will scale in time to absorb a traffic spike.

For a self-hosted Next.js application, the things most likely to cause problems under real traffic aren't visible in a local development environment. They're in the interaction between your rendering workload, your pod sizing, and your auto-scaling triggers. The only way to find them is to apply real load and watch what the system actually does.