Metrics Can Fool You: Measuring Execution Time in Connection-Pooled Environments

Carc - Aug 14 - Dev Community

Measuring the execution time of requests to external services is critical for performance monitoring and optimization. However, when connections to these external services are pooled, you might be inadvertently measuring more than just the request time. Specifically, if requests are taking too long and you run out of available connections, your custom logic may start including the waiting time to get a connection from the pool. This can lead to misleading metrics, causing you to misinterpret the performance of your system. Let's delve into how this happens and how you can avoid being fooled by your own metrics.

The Pitfall: Including Waiting Time in Metrics

When all connections in the pool are in use, additional requests must wait until a connection becomes available. This waiting time can skew your metrics if not measured separately from the actual request time.

Scenario: Running Out of Connections

  1. Initial State: Your connection pool has a fixed number of connections, all of which are in use.
  2. New Request: A new request comes in but must wait for a connection to become available.
  3. Wait Time: The request waits (possibly for a significant amount of time) until a connection is free.
  4. Request Time: Once a connection is obtained, the actual request is made.

If your custom logic measures the total time from when the request was made until a response is received, you are including both the waiting time and the request time.
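
To make the distinction concrete, here is a minimal, library-agnostic sketch. The timestamps and the sleeps are hypothetical stand-ins for the pool wait and the HTTP exchange; they are not part of any client API.

import java.time.Duration;

// Hypothetical instrumentation points around a single call through a pooled HTTP client.
// Thread.sleep stands in for "waiting for a free connection" and "the actual exchange".
public class TimingSketch {

    public static void main(String[] args) throws InterruptedException {
        long enqueuedAt = System.nanoTime();          // request handed to the pooled client
        Thread.sleep(2000);                           // simulated wait for a connection from the pool
        long connectionLeasedAt = System.nanoTime();  // connection finally obtained
        Thread.sleep(2000);                           // simulated HTTP exchange with the external service
        long responseReceivedAt = System.nanoTime();

        long waitMs = Duration.ofNanos(connectionLeasedAt - enqueuedAt).toMillis();
        long requestMs = Duration.ofNanos(responseReceivedAt - connectionLeasedAt).toMillis();
        long totalMs = Duration.ofNanos(responseReceivedAt - enqueuedAt).toMillis();

        // A naive "request took" metric reports totalMs (wait + request);
        // only requestMs tells you how the external service is actually behaving.
        System.out.println("wait=" + waitMs + "ms, request=" + requestMs + "ms, total=" + totalMs + "ms");
    }
}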

Practical Example: Reproducing the Problem in Spring Boot with Apache HttpClient 5

To illustrate how you can be fooled by your own metrics in a connection-pooled environment, let's walk through a practical example using Spring Boot and Apache HttpClient 5. We'll set up a simple Spring Boot application that makes HTTP requests to an external service, measure the execution time of these requests, and demonstrate how connection pool exhaustion can lead to misleading metrics.

To simulate delays in the external service, we will use the httpbin Docker image. Httpbin provides an easy-to-use HTTP request and response service, which we can use to create artificial delays in our requests.

import java.io.IOException;

import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestClient;

@SpringBootApplication
@RestController
public class Server {

    public static void main(String... args) {
        SpringApplication.run(Server.class, args);
    }

    // Measures the time between handing the request to the client and receiving the response.
    class TimeClientHttpRequestInterceptor implements ClientHttpRequestInterceptor {

        @Override
        public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution)
                throws IOException {
            var t0 = System.currentTimeMillis();
            try {
                return execution.execute(request, body);
            } finally {
                System.out.println("Request took: " + (System.currentTimeMillis() - t0) + "ms");
            }
        }
    }

    @Bean
    public RestClient restClient() {
        var connectionManager = new PoolingHttpClientConnectionManager();
        connectionManager.setMaxTotal(2);           // max number of connections in the pool
        connectionManager.setDefaultMaxPerRoute(2); // max number of connections per route

        return RestClient.builder()
                .requestFactory(new HttpComponentsClientHttpRequestFactory(
                        HttpClients.custom().setConnectionManager(connectionManager).build()))
                .baseUrl("http://localhost:9091")
                .requestInterceptor(new TimeClientHttpRequestInterceptor())
                .build();
    }

    @GetMapping("/")
    String hello() {
        // httpbin's /delay/2 endpoint responds after a fixed delay of 2 seconds.
        return restClient().get().uri("/delay/2").retrieve().body(String.class);
    }
}

In the code above, we created a request interceptor (ClientHttpRequestInterceptor) to measure what we thought would be the execution time of requests to the external service backed by httpbin.

We also explicitly set the pool to a very small size of 2 connections to make it easy to reproduce the problem.

Now we just need to start httpbin, run our Spring Boot app, and conduct a simple test using ab (ApacheBench):

$ docker run -p 9091:80 kennethreitz/httpbin

$ ab -n 10 -c 4 http://localhost:8080/
...
Percentage of the requests served within a certain time (ms)
  50%   4049
  66%   4054
  75%   4055
  80%   4055
  90%   4057
  95%   4057
  98%   4057
  99%   4057
 100%   4057 (longest request)

Meanwhile, in the application log, our interceptor reports:

Request took: 2021ms
Request took: 2016ms
Request took: 2022ms
Request took: 4040ms
Request took: 4047ms
Request took: 4030ms
Request took: 4037ms
Request took: 4043ms
Request took: 4050ms
Request took: 4034ms

If we look at the numbers, we can see that even though we set an artificial delay of 2 seconds on the external server, most requests actually take about 4 seconds. Only the first few requests honor the configured 2-second delay; every subsequent request takes roughly 4 seconds.

Time to profile

Profiling is essential when encountering strange code behavior because it identifies performance bottlenecks, uncovers hidden issues like memory leaks, and shows how your application uses system resources.

This time we'll profile the running app with Java Flight Recorder (JFR) while running the ab load test.

$ jcmd <pid> JFR.start name=app-profile duration=60s filename=app-profile-$(date +%FT%H-%M-%S).jfr
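
If attaching jcmd is inconvenient, an equivalent recording can also be started from inside the application with the jdk.jfr API. A minimal sketch, assuming JDK 11+ (the ProgrammaticProfile helper is purely illustrative):

import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;

import jdk.jfr.Configuration;
import jdk.jfr.Recording;

class ProgrammaticProfile {

    // Same idea as "jcmd <pid> JFR.start": a 60-second recording with the built-in
    // "profile" settings, written to a file once the recording finishes.
    static Recording startProfile() throws IOException, ParseException {
        Recording recording = new Recording(Configuration.getConfiguration("profile"));
        recording.setDuration(Duration.ofSeconds(60));
        recording.setDestination(Path.of("app-profile.jfr"));
        recording.start();
        return recording;
    }
}

Either way, while the recording is running we repeat the ab test:
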
$ ab -n 50 -c 4 http://localhost:8080/
...
Percentage of the requests served within a certain time (ms)
  50%   4043
  66%   4051
  75%   4057
  80%   4060
  90%   4066
  95%   4068
  98%   4077
  99%   4077
 100%   4077 (longest request)

If we open the JFR file and look at the flamegraph, we can see that most of the execution time is spent by our HTTP client. The client's execution time is divided between waiting for our external service to respond and waiting to get a connection from the pool.

[Flame graph from the JFR recording]

That explains why the response times we see are double the expected fixed delay of 2 seconds we set for our external server. We configured a pool of 2 connections. However, in our test, we're performing 4 concurrent requests. So, only the first 2 requests will be served in the expected time of 2 seconds. Subsequent requests will have to wait for the pool to release a connection, thus increasing the observed response time.

If we look at the flame graph again, we can also see why the time measured by our ClientHttpRequestInterceptor does not reflect only the time the external server takes to respond: it is the time spent acquiring a connection from the pool plus the time of the actual request to the external server. The interceptor wraps a call stack that ends up asking the pool manager, PoolingHttpClientConnectionManager, for a connection before the request is ever sent.

Monitoring the response time of any HTTP client is best done using its built-in metrics because these metrics are specifically designed to capture precise timing information. They account for all aspects of the HTTP request lifecycle, including connection acquisition, data transmission, and response handling. This ensures that the measurements are accurate and consistent with the actual performance of the client.
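
For Apache HttpClient 5 in particular, the pooling connection manager already exposes the numbers that matter here. A minimal sketch along these lines (the PoolStatsLogger class and the one-second interval are illustrative choices, not part of the setup above) makes pool exhaustion show up as a growing pending count instead of silently inflated request timings:

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;
import org.apache.hc.core5.pool.PoolStats;

class PoolStatsLogger {

    // Periodically log the pool's own statistics: "leased" connections are in use,
    // "pending" are callers currently waiting for a free connection, and "available"
    // are idle connections kept in the pool.
    static void logPeriodically(PoolingHttpClientConnectionManager connectionManager) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            PoolStats stats = connectionManager.getTotalStats();
            System.out.println("pool leased=" + stats.getLeased()
                    + " pending=" + stats.getPending()
                    + " available=" + stats.getAvailable()
                    + " max=" + stats.getMax());
        }, 0, 1, TimeUnit.SECONDS);
    }
}

If you already export metrics with Micrometer, it also ships connection-pool metric binders for Apache HttpClient that publish the same leased/pending/available gauges, so time spent waiting for a connection becomes a separate, visible signal rather than being folded into a hand-rolled request timer.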
