When we talk about application performance, it is common to think first about load time and the metrics involved in that part of the user journey. While initial page load speed remains crucial, it is not sufficient on its own to ensure a great user experience, because most of the user's time is spent after the page has finished loading. This is why measuring and monitoring post-load experiences is essential.

For a long time, even our understanding of ‘responsiveness’ was bound to load-time metrics: “How long does it take to be interactive?”, “How long does it take for the browser to respond to the user’s first input?”, “How long did JS parsing block the main thread?”. All of these focus on load time and on making sure the browser is ready to start responding to interactions as soon as the loading stage is completed.

We now have Interaction to Next Paint (INP) as part of the web-vitals set and as a Core Web Vitals metric. It measures the responsiveness of a web page to user interactions and focuses on the delay between a user's action and the next frame shipped, providing better insight into the users’ experience as they navigate and interact with your site.

Alongside INP as a metric, the new Long Animation Frames (LoAF) API offers developers a great attribution model and a way of thinking about how to portion and divide work on the main thread. Introduced in Chrome 123, the LoAF API allows us to detect long animation frames that may cause visual jank or poor responsiveness in our applications, whether by delaying the response to an input, by introducing long processing times, or by creating bottlenecks in styling and layout that delay the presentation of the next frame.

While INP gives us a high-level view of our application's responsiveness, LoAF provides the granular data needed to build better attributions and fix the underlying causes of poor interactivity. This combination allows developers to not only measure overall responsiveness but also to drill down into specific problematic interactions and optimize them effectively.

But before we dive deeper into those new tools, let's get a bit of a history lesson on how our performance metrics got here and why they evolved the way they did!

History of Performance Metrics and Tooling

To fully appreciate the significance of Interaction to Next Paint (INP) and the Long Animation Frames (LoAF) API, it's crucial to understand the history of how we think about performance metrics, as it reflects our growing understanding of what constitutes the key indicators and guidelines of a good user experience.

The RAIL Model

In 2015, Google introduced the RAIL model, which stands for Response, Animation, Idle, and Load. This model provided a solid, user-centric approach to performance, breaking down the user experience into key parts and giving recommendations for each part:

Response: Respond to user input within 100ms.
Animation: Produce a frame in 10ms.
Idle: Maximize idle time to increase the odds of responding quickly to user input.
Load: Deliver content and become interactive in under 5 seconds.

The RAIL model was a significant milestone on our journey, as it encouraged developers to think about performance in terms of user perception and interaction, not just loading speed. It was also probably the first time we started thinking about interactivity and set some key targets to help the browser respond and ship frames on time.

It defined the 50ms window per task that allows the browser to process events and ship frames for a smooth 60fps experience.

50 ms or 100 ms? Timing the task execution window according to user input and the frame timing window on a 60fps ‘schedule’. From: https://web.dev/articles/rail

Early days of interactivity metrics

Soon after the RAIL model, we got a few key metrics focused on load time. Developers mostly measured how long it took for a page to fully load and become responsive:

Page Load Time: With metrics such as FCP, FMP and LCP
(first) CPU Idle Time: The point at which the CPU has the first 'quiet window' after the initial page load.
Time to Interactive (TTI): Initially named 'Time to Consistently Interactive', it indicates when the page was consistently able to respond to user input.
First Input Delay (FID): Measured the time from when a user first interacts with a page to the time when the browser is able to respond to that interaction.
Total Blocking Time (TBT): Lab metric that traditionally measures the total amount of time between First Contentful Paint (FCP) and Time to Interactive (TTI) where the main thread was blocked for long enough to prevent input responsiveness. This is the most common usage, though TBT is a metric that can be used over an entire session duration.
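To make the TBT definition above more concrete, here is a minimal sketch of how the blocking portion of long tasks can be summed. The `TaskTiming` shape, the function name and the sample numbers are assumptions made for illustration; it also simplifies by counting a task whole whenever it starts inside the window, which lab tools may handle differently.

```typescript
// Tasks longer than this are considered "long"; only the excess counts as blocking time.
const LONG_TASK_THRESHOLD_MS = 50;

// Hypothetical minimal shape, similar to what a long task entry exposes.
interface TaskTiming {
  startTime: number; // ms since navigation start
  duration: number;  // ms
}

// Sums the blocking portion (duration - 50ms) of every task that starts
// inside [windowStart, windowEnd). The window defaults to the whole session.
function totalBlockingTime(
  tasks: TaskTiming[],
  windowStart: number = 0,
  windowEnd: number = Infinity
): number {
  let total = 0;
  for (const task of tasks) {
    if (task.startTime < windowStart || task.startTime >= windowEnd) continue;
    total += Math.max(0, task.duration - LONG_TASK_THRESHOLD_MS);
  }
  return total;
}

// Made-up sample data for illustration only.
const sampleTasks: TaskTiming[] = [
  { startTime: 100, duration: 30 },  // short task, contributes 0
  { startTime: 200, duration: 120 }, // contributes 70
  { startTime: 900, duration: 65 },  // contributes 15
];

console.log(totalBlockingTime(sampleTasks));         // 85
console.log(totalBlockingTime(sampleTasks, 0, 500)); // 70
```

Passing a window such as FCP to TTI gives the traditional load-time reading, while leaving the defaults covers the whole session.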

TTI and FID were our first metrics focused on interactivity, but they were still mostly tied to how long it took the browser to download and parse assets before it could start responding to user input, and they lacked a deeper understanding of the different causes of poor interactivity.

These metrics were steps in the right direction, but they still had limitations. FID, for instance, only measured the first interaction, which didn't necessarily reflect the overall responsiveness of the page. And Time to Interactive was complex in nature, hard to reason about and somewhat unreliable.

How FCP, TTI and FID correlate to each other. Part of the https://web.dev/articles/fid article

TBT was an interesting addition to the toolkit. As a lab metric, it was used by lab tools to assess the ‘total blocking time’ during the loading of the page leading up to the TTI mark. But it is not necessarily a load-time metric, as its objective is to measure the impact of long tasks blocking the main thread over time. Although this metric is not necessarily connected to user interactions, it is a good indicator of the possible impact of long tasks blocking user interactions and visual updates.

Searching for a better, user-centric, metric

With the knowledge and experience gathered over time, and with the web-vitals being established alongside those metrics, the Chrome team started investigating how a better, user-centric responsiveness metric could be shaped: one that observes not only load time but also the post-load part of the user experience, and that encompasses all of the parts that can contribute to slow interactions. As mentioned in the article "Towards a better responsiveness metric", those earlier metrics do not focus on the user experience directly, but instead on how much JavaScript runs on the page as it is loading.

Sections of a user interaction as part of the https://web.dev/blog/better-responsiveness-metric article

In the image above, taken from the same article, we can already see a lot of resemblance to how INP functions as a metric and identify its different parts: the input delay, the processing time and the shipping of the next frame, which together account for the full duration of the interaction.
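The three parts described above can be derived from the timestamps that an Event Timing entry exposes (`startTime`, `processingStart`, `processingEnd` and `duration`). The following is a minimal sketch; the `EventTimingLike` interface, the function name and the sample values are assumptions for illustration, and because `duration` is reported with coarse rounding the presentation delay should be read as an approximation.

```typescript
// Hypothetical minimal shape mirroring the fields of an Event Timing entry.
interface EventTimingLike {
  startTime: number;       // when the user input happened
  processingStart: number; // when event handlers started running
  processingEnd: number;   // when event handlers finished
  duration: number;        // input to next frame presented
}

interface InteractionPhases {
  inputDelay: number;
  processingDuration: number;
  presentationDelay: number;
}

// Splits one interaction into the three phases described in the text.
function splitInteraction(entry: EventTimingLike): InteractionPhases {
  const presentationTime = entry.startTime + entry.duration;
  return {
    inputDelay: entry.processingStart - entry.startTime,
    processingDuration: entry.processingEnd - entry.processingStart,
    // Clamped at 0 because duration is rounded and may fall slightly short.
    presentationDelay: Math.max(0, presentationTime - entry.processingEnd),
  };
}

// Made-up sample interaction for illustration only.
const samplePhases = splitInteraction({
  startTime: 1000,
  processingStart: 1040,
  processingEnd: 1120,
  duration: 200,
});

console.log(samplePhases);
// { inputDelay: 40, processingDuration: 80, presentationDelay: 80 }
```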

Nowadays you can also observe this segmentation on DevTools when inspecting different ‘interaction’ entries on the ‘Interactions’ track of the Performance tab.

An interaction with the input delay and presentation delay being displayed as ‘whiskers’ and the processing duration as a solid bar.

Long Animation Frames (LoAF) and INP

The introduction of the Long Animation Frames (LoAF) API, alongside Interaction to Next Paint (INP), represents a significant improvement in how we measure and optimize interactions.

INP replaced FID as a Core Web Vital in March 2024, bringing a session-wide metric that is user-centered and better suited to help us understand real users’ experience around interactions.

The LoAF API also brought a shift, moving our base for measuring interactions away from tasks and toward animation frames. This brings several key advantages:

Animation Frames encompass all the work done by the browser in order to ship a new frame, including JavaScript execution, style calculations, layout, paint, and compositing. This provides a more comprehensive picture of performance compared to isolated task measurements.
Users perceive performance in terms of visual updates and responsiveness, which better correlates with the Animation Frame as a base metric.
Multiple small tasks that individually don't qualify as "long tasks" can collectively delay a frame. LoAF captures this cumulative effect, which task-based metrics might miss. Some interactions may incur several tasks and trigger multiple event handlers, where the Long Tasks API would only show the potential outliers, making it difficult to build attributions from.
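The cumulative effect described in the last point can be illustrated with a few lines of arithmetic. This is a simplified sketch that assumes the tasks run back-to-back within a single frame and ignores rendering work; the function name and the sample durations are made up for illustration.

```typescript
const LONG_THRESHOLD_MS = 50;

interface TaskVsFrameView {
  longTaskCount: number; // what a task-based view would flag
  frameWork: number;     // total work that delayed the frame
  isLongFrame: boolean;  // what a frame-based view would flag
}

// Compares a task-based view with a frame-based view of the same work.
function compareTaskAndFrameView(taskDurations: number[]): TaskVsFrameView {
  let longTaskCount = 0;
  let frameWork = 0;
  for (const duration of taskDurations) {
    if (duration > LONG_THRESHOLD_MS) longTaskCount += 1;
    frameWork += duration;
  }
  return { longTaskCount, frameWork, isLongFrame: frameWork > LONG_THRESHOLD_MS };
}

// Three tasks that each stay under 50ms but together delay the frame by 75ms.
const smallTasksView = compareTaskAndFrameView([30, 25, 20]);

console.log(smallTasksView);
// { longTaskCount: 0, frameWork: 75, isLongFrame: true }
```

A task-based view reports nothing here, while the frame-based view flags the frame as long.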

Another important advantage of this new segmentation of work is that, if you only consider long tasks as a source of interactivity problems, you are ignoring an entire class of performance problems related to styling and layout. That work also occupies the main thread, preventing the browser from responding to interactions and slowing down the production of new frames.

Because of that, frame-based measurements are better at identifying jank or stuttering in animations and interaction. This is crucial for ensuring smooth experiences and better correlations for your metrics, allowing for more effective optimization strategies.

Also important to note is that Animation Frames are not directly connected to user interactions, as they are more of a segmentation of work in order to ship a frame, and may originate from many different sources.

Utilizing the Long Animation Frames (LoAF) API

The LoAF API is a great candidate to monitor for issues around the user’s experience. As stated on the LoAF API article, a long animation frame is when a rendering update is delayed beyond 50 milliseconds, the same threshold for long tasks.
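As a starting point for monitoring, long animation frames can be observed with a `PerformanceObserver` using the `long-animation-frame` entry type. The sketch below is one possible way to wire it up; the function name and callback shape are assumptions, and the constructor is read from `globalThis` so the code also loads in environments where the API is not available.

```typescript
// Hypothetical minimal shape of the fields we read from each entry.
interface LongFrameSummary {
  startTime: number;
  duration: number;
}

// Starts observing long animation frames when the browser supports them.
// Returns a function that stops observing, or undefined when unsupported.
function observeLongAnimationFrames(
  onEntry: (entry: LongFrameSummary) => void
): (() => void) | undefined {
  // Read through globalThis so this also loads outside of a browser.
  const Observer = (globalThis as any).PerformanceObserver;
  const supported: readonly string[] = Observer?.supportedEntryTypes ?? [];
  if (!supported.includes("long-animation-frame")) return undefined;

  const observer = new Observer((list: { getEntries(): LongFrameSummary[] }) => {
    for (const entry of list.getEntries()) onEntry(entry);
  });
  // buffered: true also delivers frames recorded before the observer was created.
  observer.observe({ type: "long-animation-frame", buffered: true });
  return () => observer.disconnect();
}

const stopObserving = observeLongAnimationFrames((entry) => {
  console.log(`Long frame at ${entry.startTime}ms lasting ${entry.duration}ms`);
});
```

Checking `supportedEntryTypes` first keeps the snippet harmless in browsers that do not implement the API.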

The LoAF API provides detailed insights into frame performance. Rather than just a start time and a duration, we get an entire breakdown of the frame cycle (as shown in the LoAF API article):

startTime: the start time of the long animation frame relative to the navigation start time.
duration: the duration of the long animation frame (not including presentation time).
renderStart: the start time of the rendering cycle, which includes requestAnimationFrame callbacks, style and layout calculation, resize observer and intersection observer callbacks.
styleAndLayoutStart: the beginning of the time period spent in style and layout calculations.
firstUIEventTimestamp: the time of the first UI event (mouse/keyboard and so on) to be handled during the course of this frame.
blockingDuration: the duration in milliseconds for which the animation frame was being blocked.

These timestamps can be used to calculate the different parts of the frame cycle.

How LoAF entry timings can be used as a breakdown for a frame cycle. From: https://developer.chrome.com/docs/web-platform/long-animation-frames
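Following the breakdown in the LoAF API article, the timestamps listed above can be turned into durations for each part of the frame cycle. This is a sketch of that arithmetic; the `LoAFTimings` interface, the function name and the sample values are assumptions for illustration, and `renderStart` and `styleAndLayoutStart` are treated as 0 when the frame had no rendering phase.

```typescript
// Hypothetical minimal shape mirroring the LoAF entry fields listed above.
interface LoAFTimings {
  startTime: number;
  duration: number;
  renderStart: number;         // 0 when the frame did not render
  styleAndLayoutStart: number; // 0 when there was no style and layout phase
}

interface FrameBreakdown {
  endTime: number;
  workDuration: number;           // work before rendering started
  renderDuration: number;         // from renderStart to the end of the frame
  preLayoutDuration: number;      // rAF and observer callbacks before style/layout
  styleAndLayoutDuration: number; // style and layout until the end of the frame
}

function breakDownFrame(entry: LoAFTimings): FrameBreakdown {
  const endTime = entry.startTime + entry.duration;
  const hasRender = entry.renderStart > 0;
  const hasLayout = entry.styleAndLayoutStart > 0;
  return {
    endTime,
    workDuration: hasRender ? entry.renderStart - entry.startTime : entry.duration,
    renderDuration: hasRender ? endTime - entry.renderStart : 0,
    preLayoutDuration: hasLayout ? entry.styleAndLayoutStart - entry.renderStart : 0,
    styleAndLayoutDuration: hasLayout ? endTime - entry.styleAndLayoutStart : 0,
  };
}

// Made-up sample frame for illustration only.
const sampleBreakdown = breakDownFrame({
  startTime: 1000,
  duration: 200,
  renderStart: 1120,
  styleAndLayoutStart: 1160,
});

console.log(sampleBreakdown);
// { endTime: 1200, workDuration: 120, renderDuration: 80,
//   preLayoutDuration: 40, styleAndLayoutDuration: 40 }
```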

Some gotchas with LoAF entries

It is important to note that, similar to INP, LoAF entries aim to measure the entirety of the frame lifecycle, and entries are reported throughout the entire session. LoAF entries may therefore come in different shapes and have different root causes, not only script execution. Also, as pointed out in the LoAF article, script attribution is only provided for scripts running in the main thread of a page, including same-origin iframes. This means that work from extensions, cross-origin iframes and other sources outside the page's main thread won’t have script attribution, but may still contribute to LoAF entries.

As of this writing, there is also information missing about scripts without source information, such as event handler callbacks and inline scripts.
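When script attribution is available, each LoAF entry carries a `scripts` array whose items include a `sourceURL` and a `duration`. One way to work with both the attributed and the unattributed cases is to group script time by source and bucket empty sources together. The sketch below assumes a hypothetical `ScriptTimingLike` shape, a made-up bucket label and example URLs.

```typescript
// Hypothetical minimal shape mirroring a script attribution item on a LoAF entry.
interface ScriptTimingLike {
  sourceURL: string; // may be empty when source information is missing
  duration: number;  // ms
}

const NO_SOURCE_BUCKET = "(no source)";

// Sums script duration per source URL, grouping scripts without a source together.
function scriptTimeBySource(scripts: ScriptTimingLike[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const script of scripts) {
    const key = script.sourceURL || NO_SOURCE_BUCKET;
    totals.set(key, (totals.get(key) ?? 0) + script.duration);
  }
  return totals;
}

// Made-up sample data for illustration only.
const sampleTotals = scriptTimeBySource([
  { sourceURL: "https://example.com/app.js", duration: 40 },
  { sourceURL: "https://third-party.example/widget.js", duration: 65 },
  { sourceURL: "", duration: 12 },
  { sourceURL: "https://example.com/app.js", duration: 20 },
]);

console.log(sampleTotals.get("https://example.com/app.js")); // 60
console.log(sampleTotals.get(NO_SOURCE_BUCKET));             // 12
```

A large share of time in the unattributed bucket, or a gap between the frame duration and the attributed script time, is a hint that the cause lies in one of the sources described above.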

Visualizing INP and LoAF data in the wild

Here we have two examples of how you can visualize INP in the wild. On the left is the Vercel toolbar, showing a collection of INP entries in dev mode, and on the right is the trace viewer of a tool I am creating, PerfLab. You can see the INP entry highlighted on the displayed trace, along with the report cards for other data present in the trace.

Here’s another trace visualized on PerfLab, showcasing that Animation Frames, and LoAF entries, are not directly linked to INP as a metric, although they can add input delay to an interaction.

This particular Animation Frame was part of a trace session to analyze the loading experience of a page and to understand the visual jank around the experience. Even though there was no direct user input, the page seemed unresponsive, and you could see on the trace that the main thread was busy with different third-party scripts.

Using LoAF to Improve INP

Interaction to Next Paint (INP) measures the latency of interactions throughout a page session, so INP attribution may come from interactions at any point in time. Long Animation Frame (LoAF) data helps add better attribution to what could have contributed to poor INP scores by providing information on the entire frame duration and timings, helping you understand where the animation frame spent the most time blocking. INP issues can stem from various sources, including input delay, script execution, and layout and style operations, and those may come from code executed by first-party or third-party scripts.

By capturing both INP and LoAF data with the web-vitals attribution build, you can better assess not only the state of your interaction metrics, but also the causes of any potential problems.
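As a rough sketch of what that can look like, the function below condenses an INP attribution object into a small report. It assumes the attribution shape exposed by recent versions of the web-vitals attribution build (`interactionTarget`, `inputDelay`, `processingDuration`, `presentationDelay` and `longAnimationFrameEntries`); the local interface, the function name and the sample values are illustrative, and the wiring with `onINP` is shown only as a comment so the snippet stays self-contained.

```typescript
// Local, simplified stand-in for the INP attribution object (an assumption).
interface InpAttributionLike {
  interactionTarget: string;
  inputDelay: number;
  processingDuration: number;
  presentationDelay: number;
  longAnimationFrameEntries: Array<{
    duration: number;
    scripts: Array<{ sourceURL: string; duration: number }>;
  }>;
}

interface InpReport {
  value: number;
  target: string;
  slowestPhase: string;
  longestScriptURL: string | undefined;
}

function summarizeInp(value: number, attribution: InpAttributionLike): InpReport {
  // Find which of the three phases took the most time.
  const phases: Array<[string, number]> = [
    ["inputDelay", attribution.inputDelay],
    ["processingDuration", attribution.processingDuration],
    ["presentationDelay", attribution.presentationDelay],
  ];
  let slowestPhase = "inputDelay";
  let slowestMs = -1;
  for (const [name, ms] of phases) {
    if (ms > slowestMs) { slowestPhase = name; slowestMs = ms; }
  }

  // Find the longest script across the long animation frames of the interaction.
  let longestScriptURL: string | undefined;
  let longestMs = -1;
  for (const frame of attribution.longAnimationFrameEntries) {
    for (const script of frame.scripts) {
      if (script.duration > longestMs) {
        longestMs = script.duration;
        longestScriptURL = script.sourceURL;
      }
    }
  }
  return { value, target: attribution.interactionTarget, slowestPhase, longestScriptURL };
}

// Possible wiring in a page (not executed here):
//   import { onINP } from "web-vitals/attribution";
//   onINP(({ value, attribution }) => console.log(summarizeInp(value, attribution)));

// Made-up sample data for illustration only.
const sampleReport = summarizeInp(280, {
  interactionTarget: "button#checkout",
  inputDelay: 30,
  processingDuration: 190,
  presentationDelay: 60,
  longAnimationFrameEntries: [
    { duration: 260, scripts: [
      { sourceURL: "https://example.com/a.js", duration: 120 },
      { sourceURL: "https://example.com/b.js", duration: 40 },
    ] },
  ],
});

console.log(sampleReport.slowestPhase);     // "processingDuration"
console.log(sampleReport.longestScriptURL); // "https://example.com/a.js"
```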

We’ve come so far from the early days of performance tooling and metrics and now have such incredible tools at our disposal to help us better understand and improve our experiences on the web! It is a truly incredible journey and with INP and LoAF as the latest entries on the toolkit we can finally have a better understanding of user interactions and better deliver a delightful post-load experience to our users.

Long Frames and INP: Understanding Post-Load Performance