Measuring engineering impact. A pyramid of needs for product engineers.

Vadim - Aug 22 - Dev Community

Last time, we discussed how the role of a software engineer has evolved into the role of a product engineer and what the fundamental differences between them are. Today, I'd like to delve into the work routine and understand the key principles of operating as a product engineer.

Since the emphasis in this new role is on rapid iterations to gather feedback and validate your solution, it means we need specific tools to assess the code you release. When I talk about validation, I'm not referring to technical metrics, but rather how your solution impacts the overall business. Let's call it end-to-end (e2e) validation, encompassing the following points:

  • Impact on fundamental technical metrics such as database load, server resource consumption, and browser memory usage.
  • Impact on integration technical metrics, including the number of non-200 responses to users, increased response times for specific API endpoints, and a "laggy" interface.
  • Influence on local product metrics, such as the number of users who started using a feature after an improvement (adoption level) or the number of users who consistently use it.
  • Impact on business indicators - measured in the money generated by users utilizing the features you've worked on.

If we take another look at the points mentioned above, we'll find a common thread that unites them all - to measure impact, we need to "stay informed" about what's happening with the system in various dimensions. In the world of product development, there's a specific term for this - observability.

Observability in the context of software development refers to the system or application's ability to be easily observed and analyzed. This involves collecting data about the application's operation, its state, and performance to ensure transparency and the ability to quickly detect and address issues.

The list of previous points can be represented as an onion, where each ring is within the responsibility zone of different teams.

Software observability onion

  • Level One: Engineering Metrics. Owners - Software Engineers.
  • Level Two: Integration Metrics. Owners - Site Reliability Engineers (SRE).
  • Level Three: Feature Metrics. Owners - Product Managers.
  • Level Four: Business Metrics. Owners - Analytics Team.

Each level has its key metrics, and each team is responsible for its level in the onion, providing a comprehensive assessment of the impact on the product and business.

For most modern companies, evaluating based on 1-2 of these points is quite commonplace. The more advanced ones try to assess changes against three. However, only A-players thoroughly assess all four parameters. Why is that? Because carrying out a detailed evaluation demands substantial effort, advanced skills in multiple areas, and smooth cooperation across various team departments. For very young startups, implementing such processes would be prohibitively costly, while the advantages would be minimal.

In the early stages of product development, the basic shortcomings are usually clear, and all the low-hanging fruits are visible. However, as your solution matures and it's time to enhance or discover new growth areas, such a system becomes a primary tool in your arsenal if you don't want to move blindly forward.

Returning to engineering…

All of this, of course, depends on how teams are organized within the company, but let's get back closer to our daily work routine. When working on a task, you're still writing code, there are no changes there. However, with such an approach, you need to think a bit further than just how to write code; you should also be interested in how your code works in production and, most importantly, how your code impacts the product.

I like to envision this as a pyramid, something like Maslow's hierarchy but with an engineering twist. Each level becomes a concern only after we've managed to fill the previous one, and for success at each level, we'll need specific tools that best address the needs.

So, we've written the code, initiated the deployment process, and now it's time to observe.

Product engineer pyramid

Is my code working? 🛠️

Difficulty: Easy 🟢
Owner: Developer 👨‍💻
Tools: Bugtracker 🐛
Evaluation time: up to 1 hour ⏲️

The bottommost level. Before evaluating further, you must ensure that your code fundamentally works. The primary metric is a binary answer: yes if your code somehow accomplishes its task, or no if it fails 100% of the time, even for the simplest user story without corner cases.

Typically, simple bug trackers like Sentry or Bugsnag are used as tools. Evaluation time is usually less than an hour. Depending on the release process, validation can occur at various stagesβ€”some use QA department testing, others rely on team tests, and some may test independently directly on sandbox/production environments. These days there's nothing new about this, and hopefully, we all do it.
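To make this level concrete, here is a minimal sketch of hooking an application into a bug tracker, using Sentry's Python SDK as one example; the DSN and the handler function are placeholders, and your own setup will differ.

```python
import sentry_sdk

# Initialize the SDK once at application startup; the DSN below is a placeholder.
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    environment="production",
)

def handle_checkout(order_id: str) -> None:
    # Placeholder for the code path you just shipped. Any unhandled exception
    # raised here is reported to Sentry automatically, which is enough to answer
    # the binary "is my code working?" question.
    ...

try:
    handle_checkout("order-42")
except Exception as exc:
    # Exceptions you handle yourself can still be recorded explicitly.
    sentry_sdk.capture_exception(exc)
```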

Is my code working well? ✅

Difficulty: Medium 🟡
Owner: Developer, but the initial setup of integrations with tools typically falls on DevOps/SRE shoulders ⚙️
Tools: Log management systems, time-series databases 📉
Evaluation time: 10 minutes to 6 hours ⌚

After ensuring basic functionality, the next step is to assess how reliably our code is working and give it a quantitative evaluation. The primary metric here is the error/success rate.

Tools for this can include logging systems like the ELK stack or Grafana, or more time-series-oriented tools such as Prometheus, Datadog, or AWS CloudWatch if you operate under high load. To assess this parameter, you need to ensure that your code logs information or sends metrics to the relevant system. Configuring dashboards to have a visual representation is also crucial.
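As a rough sketch, assuming the official prometheus_client library and a metrics endpoint on port 8000, a success/error rate boils down to two counters around the code path you care about; the metric names and the `process` stub are illustrative.

```python
from prometheus_client import Counter, start_http_server

# Illustrative counters for the code path under observation.
CHECKOUT_SUCCESS = Counter("checkout_success_total", "Successful checkout submissions")
CHECKOUT_FAILURE = Counter("checkout_failure_total", "Failed checkout submissions")

def process(payload: dict) -> None:
    ...  # stand-in for the actual business logic

def submit_checkout(payload: dict) -> None:
    try:
        process(payload)
        CHECKOUT_SUCCESS.inc()
    except Exception:
        CHECKOUT_FAILURE.inc()
        raise

# Expose /metrics so Prometheus (or another scraper) can collect the counters.
start_http_server(8000)
```

The rate itself is then computed on the dashboard side, for example as a ratio of the two counters over a time window.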

What are the ideal metric values for moving to the next level? It all depends on the situation! For example, a billing system processing orders on your site should strive for a success rate approaching 100%, while tasks related to unvalidated image classification or the use of an unreliable 3rd-party API may tolerate a lower reliability level.

Sometimes evaluation can be challenging, especially if you need to introduce a new feature to users and teach them how to use it. In such cases, you can manually test the main user scenario with different inputs and settings 5-10 times.

Additionally, at this stage, automated alerts become your good friends. They send notifications if this metric starts deteriorating over time, allowing you to respond proactively before your users become dissatisfied with the product.
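What such an alert looks like depends on your stack; the sketch below assumes a Prometheus server at a hypothetical internal address and a hypothetical Slack webhook, and simply queries the error ratio on a schedule. In practice most teams express this as a Prometheus alerting rule handled by Alertmanager rather than a hand-rolled script.

```python
import requests

PROMETHEUS_URL = "http://prometheus.internal:9090"  # assumption: your Prometheus server
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # hypothetical webhook
QUERY = "rate(checkout_failure_total[5m]) / rate(checkout_success_total[5m])"

def check_error_ratio(threshold: float = 0.05) -> None:
    # Ask the Prometheus HTTP API for the current failure/success ratio.
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY})
    for sample in resp.json()["data"]["result"]:
        value = float(sample["value"][1])
        if value > threshold:
            # Notify the team before users notice the degradation.
            requests.post(SLACK_WEBHOOK, json={
                "text": f"Checkout error ratio is {value:.1%}, above {threshold:.0%}"
            })
```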

How does my code impact the system as a whole? 🌐

Difficulty: Medium 🟡
Owner: DevOps or developer 🧑‍💻
Tools: Log management systems, time-series databases 📊
Evaluation time: 10 minutes to 6 hours ⏳

There are numerous technical metrics reflecting the health of your service. Depending on the product, their criticality may vary, but your changes can easily cause spikes in any of them. For example, a database query placed inside a loop can multiply the number of database queries, loading an entire file into memory can spike server memory consumption, and synchronously calling a slow 3rd-party API can increase the average HTTP response time for your users.
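The query-in-a-loop case is worth a concrete illustration; the sketch below uses a hypothetical ORM-style `User` model to contrast the accidental N+1 pattern with a single batched query.

```python
# Anti-pattern: one extra database query per order, i.e. N+1 queries in total.
def load_customers_slow(orders):
    return [User.get(order.user_id) for order in orders]  # hypothetical per-item ORM call

# Better: collect the ids once and issue a single batched query.
def load_customers_fast(orders):
    user_ids = {order.user_id for order in orders}
    users_by_id = {user.id: user for user in User.get_many(user_ids)}  # hypothetical batch API
    return [users_by_id[order.user_id] for order in orders]
```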

Most modern hosting or cloud providers cover these metrics out of the box. Many services, such as databases or queuing systems, also default to supporting data transmission to monitoring systems like Prometheus or OpenTelemetry. You need to put in some effort to start working with these, but the good news is that you only need to do it once for each tool.

Evaluation time: In the best-case scenario, you can identify the issue immediately. More often, problems arise with an influx of traffic, so detailed logging becomes your best friend in investigation to quickly understand what exactly is leading to deteriorating metrics. Ideally, this can be caught during performance testing, but maintaining load testing for every change, especially in the development of new products, can be a challenging task.

For example, in Monite, we use a set of multiple tools that support developers in collaboration with the DevOps team. For monitoring hardware metrics, such as database load, we use metrics exported to Prometheus. For a more detailed analysis of microservice performance, we have deep integration with Sentry. And for exploring end-to-end performance, we implement distributed tracing.
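The article doesn't show Monite's actual configuration, but as a minimal sketch of what distributed tracing instrumentation can look like with the OpenTelemetry Python SDK (the exporter choice and span names are assumptions):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Console exporter is only for illustration; production setups export to a tracing backend.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def create_invoice(payload: dict) -> None:
    # Each span becomes one step of the end-to-end trace across services.
    with tracer.start_as_current_span("create_invoice"):
        with tracer.start_as_current_span("persist_invoice"):
            ...  # database write
        with tracer.start_as_current_span("notify_counterparty"):
            ...  # call to another microservice
```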

How does my code impact the success of a business feature? 📈

Difficulty: Hard 🟠
Owner: Product Manager, Analyst 📈
Tools: Google Analytics, Amplitude, Data Warehouse 🔍
Evaluation time: a few days for designing the A/B test, and from a few weeks to several months for data accumulation and drawing conclusions 📅

Once we've ensured that technical metrics are alright, we can move to the next level - assessing the impact of your changes at a local product level. Analytical tools such as Google Analytics or Amplitude can be helpful here.

Let's consider a simple example - the checkout form on your service requires users to input their country in a strict format. Data show that the conversion to the next step is at 40%, and the simplest hypothesis is that users don't understand the expected input format, leading them to close the page after the first unsuccessful attempt. You decide to weaken the validation, and the conversion increases to 50%. However, you feel it's not enough, so you add autocomplete to the form, further increasing the conversion to 80%. This is the product impact of your changes.
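None of this can be measured unless the form emits events in the first place. The sketch below uses a hypothetical `track` helper standing in for whatever analytics client you use (Amplitude, GA, and similar tools expose roughly this shape); the event names are made up.

```python
# Hypothetical thin wrapper around your analytics client.
def track(user_id: str, event: str, properties: dict | None = None) -> None:
    ...  # forwards the event to Amplitude / GA / your warehouse

def is_valid_country(value: str) -> bool:
    return bool(value.strip())  # stand-in for the relaxed validation under test

def render_country_step(user_id: str) -> None:
    track(user_id, "checkout_country_step_viewed")

def submit_country(user_id: str, value: str) -> None:
    if not is_valid_country(value):
        track(user_id, "checkout_country_step_failed")
        return
    track(user_id, "checkout_country_step_completed")

# Conversion = completed / viewed, sliced by release version in the analytics tool.
```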

It might not sound too difficult, but when trying to implement this in practice, many questions arise. What if I have a multi-platform application? How do I measure the impact of each change on a feature when we release them simultaneously? How can we understand that it's your changes affecting metrics and not external factors? Most of these questions can be answered with one solution - A/B testing. Conducting clean A/B tests requires analytical skills and, more importantly, a significant volume of traffic to draw statistically significant conclusions. The latter is a major hurdle for small companies.
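The "statistically significant" part is where traffic volume bites. As a worked sketch with made-up numbers, a two-proportion z-test (here via statsmodels) shows whether the 40% to 50% lift from the earlier example is distinguishable from noise:

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up sample: control converts at 40%, variant at 50%.
conversions = [400, 500]   # converted users in control / variant
visitors = [1000, 1000]    # users exposed to each variant

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# With 1,000 users per arm the lift is significant (p < 0.05);
# with only 50 users per arm the same percentages usually would not be.
```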

Another reason I categorized this as "hard" is that such an evaluation requires designing, bootstrapping, and implementing metrics for each feature of your product. As compensation, you can proudly wear the "data-driven" badge on your resume.

How does my code impact the success of the product? 🚀

Difficulty: Insane 🔴
Owner: Product Manager, Analyst, Finance 🧑‍💼
Tools: Data warehouses, reporting tools, BI tools 💾
Evaluation time: from 1 month 🗓️

While the previous level examined how users behave when interacting with your feature, this one shifts your focus from the feature to the user. Who is the user? What problem does your product solve for them? Where did they come from? What keeps them from leaving for competitors? You literally need to know everything about them. This requires significant investment in data collection, an analytics culture, and data infrastructure.

From an analytical standpoint, the complexity lies in establishing the correlation between key business metrics and local product metrics. You need to learn how changes, like increasing the form conversion rate by X%, will impact key business metrics (such as ARR, LTV, Churn rate, and any other metrics vital to your business). From a product standpoint, you need to understand not only your current users but also those contemplating using your product.
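As a toy sketch of what "establishing the correlation" can mean in practice, assume you can export a per-cohort table with a local product metric and a business metric from your warehouse; pandas is enough for a first signal. The numbers are made up, and correlation alone is not proof of causation.

```python
import pandas as pd

# Hypothetical warehouse export: one row per monthly cohort.
cohorts = pd.DataFrame({
    "checkout_conversion": [0.40, 0.45, 0.50, 0.62, 0.80],
    "arr_per_cohort_usd": [110_000, 118_000, 131_000, 150_000, 171_000],
})

# First-pass signal: how strongly does the local metric move with the business metric?
print(cohorts["checkout_conversion"].corr(cohorts["arr_per_cohort_usd"]))
```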


Honestly, this goes beyond the direct responsibilities of an engineer, but I believe that if you consider yourself a product engineer, you should at least be curious about how effective the solutions you release are and how they influence the successes and failures of the business as a whole.

How does your company measure the impact of engineering decisions on the product?
