Introduction

Power disaggregation, also known as Non-Intrusive Load Monitoring (NILM), is a fascinating yet challenging problem in the field of energy management and efficiency. It involves breaking down a household's total electricity consumption into individual appliance-level usage. This information can be invaluable for both consumers and utilities, enabling more informed decisions about energy usage and potentially leading to significant energy savings.
Despite its importance, power disaggregation remains a topic with limited practical resources available online, especially when it comes to code implementations. This scarcity of accessible information can make it challenging for researchers and practitioners to apply and improve upon existing methods.
In my previous role, I had the opportunity to work on this problem, implementing and enhancing a method described in the scientific article "Unsupervised extraction of electric water heaters from smart meter data" (https://arxiv.org/abs/2104.03120). This method, which I'll refer to as the "baseline method," offers an interesting approach to identifying water heaters from smart meter data.

The Baseline Method: A Brief Overview

The baseline method described in the paper relies on both active and reactive power measurements to detect and disaggregate water heater usage. Here's a simplified explanation of how it works:

It first analyzes the active power consumption profile, looking for large spikes of similar amplitude that correspond to the water heater's rated power.
The method creates a histogram of active power values above 2 kW, searching for outlier bins that indicate the presence of a water heater.
It then detects large jumps in both active and reactive power profiles, using these to identify potential ON/OFF periods of the water heater.
The approach leverages the fact that water heaters are purely resistive devices and therefore draw almost no reactive power, unlike many other appliances, which have inductive components. This allows it to use reactive power data to filter out false positives.
Finally, it estimates the water heater's power consumption based on these identified periods and the estimated rated power.
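
The resistive-load idea in the steps above can be illustrated with a small sketch. This is not the paper's algorithm: the function name `resistive_on_events`, both thresholds, and the toy series are assumptions chosen for illustration. It keeps only upward jumps in active power that are not accompanied by a comparable change in reactive power.

```python
import pandas as pd

def resistive_on_events(active, reactive, min_jump_w=1500.0, max_reactive_var=100.0):
    """Mark samples where active power jumps up by at least min_jump_w
    while reactive power barely changes.
    Both thresholds are illustrative assumptions, not values from the paper."""
    d_active = active.diff()
    d_reactive = reactive.diff().abs()
    return (d_active >= min_jump_w) & (d_reactive <= max_reactive_var)

# Toy data: a resistive heater switches on at index 2,
# an inductive load (for example a motor) switches on at index 5.
active = pd.Series([200, 200, 2000, 2000, 200, 2100, 2100], dtype=float)
reactive = pd.Series([50, 50, 55, 55, 50, 650, 650], dtype=float)

events = resistive_on_events(active, reactive)
print(events[events].index.tolist())  # only the jump at index 2 is kept
```

Without reactive power, the second jump cannot be rejected this way, which is the gap the adaptation described in the next section has to work around.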

Adapting the Method: Working with Limited Data

While the baseline method provides a robust approach to water heater disaggregation, I faced a significant challenge in my implementation: I only had access to active power data, lacking the reactive power measurements used in the original method.
This limitation required me to adapt and improve upon the baseline approach. By focusing solely on the active power signal, I needed to develop new strategies to maintain accuracy and reliability in the disaggregation process.

Code Implementation of the Baseline Method

NOTE: The sensor data was sampled once per minute.

# 1. Data Loading and Preprocessing

import pandas as pd
import numpy as np

# Load data
file_path = 'file/path' 
data = pd.read_csv(file_path, usecols=['reading_at', 'reading'])
data['reading_at'] = pd.to_datetime(data['reading_at'])

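# Preprocessing
data = data.set_index('reading_at').sort_index(ascending=True)
# Difference consecutive readings and scale by 60.
# Assumption: 'reading' is a cumulative energy counter in Wh sampled every minute,
# so the result is the average power in W over each minute.
data = data.diff()
data = data * 60

# Resample to a regular one-minute grid ('min' replaces the deprecated 'T' alias)
data = data.resample('min').mean()
# Treat implausible values (above 10 kW or negative) as missing
data[(data > 10000) | (data < 0)] = np.nan
data.reset_index(inplace=True)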

# 2. Estimating the Water Heater's Rated Power

filtered_data = data[(data['reading'] >= 1000) & (data['reading'] <= 4000)]
hist, bin_edges = np.histogram(filtered_data['reading'], bins=range(1000, 4001, 100))
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
max_density_bin = bin_centers[np.argmax(hist / hist.sum())]
estimated_rated_power_WH = max_density_bin

# 3. Detecting Water Heater Cycles

# Resample to 3-minute means before detecting cycles
# ('3min' replaces the deprecated '3T' alias; 'reading_at' is already a datetime column)
data = data.set_index('reading_at').sort_index(ascending=True)
data = data.resample('3min').mean()
data = data.reset_index()
data['power_diff'] = data['reading'].diff()
data['WH_state'] = 0  # Initialize with all states OFF

jump = estimated_rated_power_WH * 0.7
on_start = None
potential_off = None
on_off_periods = []

for i in range(len(data)):
    # Check for a jump to potentially start an 'ON' period
    if data.loc[i, 'power_diff'] > jump:
        if on_start is None:
            on_start = i
        continue

    # Track the potential 'OFF' as a decrease in power_diff
    if data.loc[i, 'power_diff'] < 0:
        potential_off = i

    # Check if reading is below the threshold and there was an 'ON' start
    if data.loc[i, 'reading'] <= estimated_rated_power_WH and on_start is not None:
        # If there is a potential 'OFF' and the 'ON' period spans at least 7 samples (21 minutes), keep it
        if potential_off is not None and (potential_off - on_start) >= 7:
            data.loc[on_start:potential_off-1, 'WH_state'] = 1
        else:
            # If the 'ON' period is too short, set it to 'OFF'
            data.loc[on_start:i, 'WH_state'] = 0
        # Reset for the next potential 'ON' period
        on_start = None
        potential_off = None

# 4. Post-processing to Refine 'ON' Periods

on_off_periods = data[data['WH_state'] == 1].index.to_series().groupby(data['WH_state'].ne(1).cumsum()).agg(['first', 'last'])

for _, row in on_off_periods.iterrows():
    start_idx, end_idx = row['first'], row['last']

    # Get the 'reading' value just before the 'ON' start and just after the 'ON' end
    pre_on_reading = data.loc[start_idx-2:start_idx+1, 'reading'].min()
    post_off_reading = data.loc[end_idx-1:end_idx+2, 'reading'].min()

    # Determine the minimum 'reading' value to use as a threshold
    min_reading_threshold = min(pre_on_reading, post_off_reading)

    # Check if during the entire 'ON' period the condition is met
    if all(data.loc[start_idx+2:end_idx-2, 'reading'] - estimated_rated_power_WH >= min_reading_threshold*0.7):
        # If condition is met, confirm 'ON' state
        data.loc[start_idx:end_idx, 'WH_state'] = 1
    else:
        # If condition is not met, set the entire period to 'OFF'
        data.loc[start_idx:end_idx, 'WH_state'] = 0

# 5. Finalizing the Dataset

# Update the 'estimated_WH_power' column based on the final 'WH_state' values
data['estimated_WH_power'] = np.where(data['WH_state'] == 1, estimated_rated_power_WH, 0)

data.set_index('reading_at', inplace=True)
# Copy so that later column assignments do not trigger SettingWithCopyWarning
final_dataset = data[['reading', 'estimated_WH_power', 'WH_state', 'power_diff']].copy()

Now, let's provide a brief description for each part of the implementation:

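Data Loading and Preprocessing: This section loads the power consumption data from a CSV file and performs initial preprocessing. It converts timestamps to datetime objects, differences consecutive readings and multiplies the result by 60 (which yields average power per minute if the readings are a cumulative energy counter), resamples the data to one-minute intervals, and marks outliers (values above 10 kW or below zero) as missing. This step ensures the data is in a suitable format for analysis.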
Estimating the Water Heater's Rated Power: Here, we estimate the rated power of the water heater by creating a histogram of power readings between 1000W and 4000W. The bin with the highest density is assumed to correspond to the water heater's rated power. This approach leverages the characteristic power consumption pattern of water heaters.
Detecting Water Heater Cycles: This part identifies potential ON/OFF cycles of the water heater. It resamples the data to 3-minute intervals and looks for jumps in power consumption larger than 70% of the estimated rated power, which could indicate the start of a water heater cycle. It also tracks potential OFF points and keeps an ON period only if it spans at least 7 samples (21 minutes).
Post-processing to Refine 'ON' Periods: This step further refines the detected ON periods by comparing the power consumption during these periods with the readings just before and after. It helps to eliminate false positives by ensuring that the power consumption during an ON period is consistently higher than the surrounding OFF periods.
Finalizing the Dataset: Finally, this section creates an 'estimated_WH_power' column based on the refined ON/OFF states and prepares the final dataset with relevant columns for further analysis or visualization.
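
The conversion in the preprocessing step can be checked on a toy series. This sketch assumes that `reading` is a cumulative energy counter in Wh sampled once per minute (the unit is an assumption, not something stated above). Under that assumption, the difference between consecutive readings is the energy used in one minute, and multiplying by 60 gives the average power in W.

```python
import pandas as pd

# Hypothetical cumulative meter readings in Wh, one per minute
index = pd.date_range("2024-01-01 00:00", periods=5, freq="min")
cumulative_wh = pd.Series([1000.0, 1005.0, 1035.0, 1065.0, 1070.0], index=index)

# Energy per minute (Wh) converted to average power over that minute (W)
power_w = cumulative_wh.diff() * 60
print(power_w.tolist())  # the first value is NaN because it has no predecessor
```

In this toy series, the two 30 Wh steps correspond to 1800 W, the kind of plateau the rated-power histogram is looking for.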

This implementation adapts the method described in the paper to work with only active power data. The key challenges addressed include estimating the water heater's rated power without reactive power data and refining the detection of ON/OFF cycles to minimize false positives.
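
The one-line idiom that step 4 uses to turn the `WH_state` column into a list of ON periods is compact but easy to misread, so here is a small sketch of it on toy data. The state values are invented for illustration. Every OFF sample increments a running counter, so consecutive ON samples share the same counter value and form one group.

```python
import pandas as pd

# Toy state column: two ON runs, at positions 1-2 and 5-7
data = pd.DataFrame({"WH_state": [0, 1, 1, 0, 0, 1, 1, 1, 0]})

# Positions of the ON samples
on_index = data[data["WH_state"] == 1].index.to_series()
# Run identifier: stays constant while the state is ON
run_id = data["WH_state"].ne(1).cumsum()

# First and last position of each ON run
periods = on_index.groupby(run_id).agg(["first", "last"])
print(periods)
```

Each row of `periods` is one ON run, which is what the post-processing loop iterates over.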

Here are some results:

Improved Method

While testing the baseline method, I found that it does not estimate estimated_rated_power_WH accurately. To improve the result, I recalculate this value after all ON/OFF periods have been identified, using only the power readings that fall inside ON periods. Since the water heater draws a roughly constant power whenever it is ON, its actual rated power should be the most frequently occurring value among those readings, which a histogram with narrower bins (4 W instead of 100 W) can locate.

The following code should be added to the baseline method to improve it:

# Improved Estimation of Water Heater's Rated Power

filtered_data_2 = final_dataset[final_dataset['estimated_WH_power'] != 0]
hist, bin_edges = np.histogram(filtered_data_2['reading'], bins=range(1000, 4001, 4))
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
estimated_rated_power_WH_2 = bin_centers[np.argmax(hist / hist.sum())]

# Update the 'estimated_WH_power' column with the refined estimation
final_dataset['estimated_WH_power'] = np.where(final_dataset['WH_state'] == 1, estimated_rated_power_WH_2, 0)
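
A toy comparison may help show why this second pass matters. The data below is invented for illustration, and `histogram_mode` is a hypothetical helper that mirrors the histogram logic above; it is not part of the original code. In the toy series, another load near 1300 W appears more often than the heater, so the first pass over all readings picks the wrong bin, while the second pass over ON samples only does not.

```python
import numpy as np
import pandas as pd

def histogram_mode(values, low=1000, high=4000, bin_width=100):
    """Center of the most populated histogram bin within [low, high]."""
    values = np.asarray(values, dtype=float)
    values = values[(values >= low) & (values <= high)]
    edges = np.arange(low, high + 1, bin_width)
    hist, edges = np.histogram(values, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[np.argmax(hist)]

# Toy data: a frequent 1300 W load plus a heater drawing about 1810 W
df = pd.DataFrame({
    "reading":  [300, 1300, 1300, 1300, 1810, 1810, 1810, 2400, 1300, 300],
    "WH_state": [0,   0,    0,    0,    1,    1,    1,    1,    0,    0],
})

# First pass: all readings, 100 W bins
coarse = histogram_mode(df["reading"])
# Second pass: ON samples only, 4 W bins
refined = histogram_mode(df.loc[df["WH_state"] == 1, "reading"], bin_width=4)
print(coarse, refined)
```

The second pass only helps if the ON periods were detected reasonably well in the first place, since it relies on them to select the samples.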

Results

To validate the method, I compared the actual energy consumption of the water heater, as measured by a smart plug over a 20-day period, to the results obtained from the method during the same period.

During this period, the water heater consumed 74.689 kWh. The baseline method predicted 48.661 kWh, an accuracy of 65% (predicted energy divided by measured energy). The improved method predicted 69.516 kWh, an accuracy of 93%.
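
The reported percentages match the ratio of predicted to measured energy, which can be checked directly. The helper names below are assumptions, and `energy_kwh` additionally assumes the 3-minute sampling used in the cycle-detection step; it sketches how a total in kWh could be obtained from the `estimated_WH_power` column.

```python
def energy_kwh(power_w, minutes_per_sample=3):
    """Energy in kWh of a power series in W sampled at a fixed interval."""
    return sum(power_w) * minutes_per_sample / 60 / 1000

def energy_ratio(predicted_kwh, measured_kwh):
    """Predicted energy as a fraction of measured energy."""
    return predicted_kwh / measured_kwh

# Toy check: 20 samples at 1800 W, 3 minutes each, is one hour at 1.8 kW
toy_energy = energy_kwh([1800] * 20)

measured = 74.689
baseline_ratio = energy_ratio(48.661, measured)
improved_ratio = energy_ratio(69.516, measured)
print(f"toy energy: {toy_energy:.2f} kWh")
print(f"baseline: {baseline_ratio:.1%}, improved: {improved_ratio:.1%}")
```

Because the estimated energy is the rated power multiplied by the total ON time, any error in the rated-power estimate carries over proportionally to the energy total.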

The actual rated power of the water heater is 1800 W. The baseline method estimated it at 1300 W, and the improved method estimated it at 1700 W.

Conclusion

This work demonstrates a significant improvement in power disaggregation for water heaters using only active power data. By refining the estimation of the water heater's rated power and focusing on detected ON periods, we were able to substantially increase the accuracy of energy consumption predictions.

These results highlight the potential of refining disaggregation methods even when working with limited data sources. By focusing on the characteristics of the specific appliance (in this case, the consistent power draw of a water heater during operation), we were able to extract more accurate information from the available data.

Future Work

While this improvement represents a significant step forward, there is still room for further enhancement of the method:

Event Detection Refinement: The accuracy of the disaggregation could potentially be improved by enhancing the event detection algorithm. This could involve more sophisticated methods for identifying the start and end of water heater cycles, possibly incorporating machine learning techniques to recognize patterns in the power consumption data.
Parameter Optimization: The current method uses several parameters (such as the resampling rate and thresholds for identifying ON/OFF periods) that were chosen based on empirical observation. A systematic study to optimize these parameters could lead to even better results. This could involve techniques like grid search or more advanced optimization algorithms.
Adaptive Thresholding: Implementing adaptive thresholds that can adjust to different household characteristics or seasonal changes could make the method more robust and applicable across a wider range of scenarios.
Incorporation of Additional Features: While this method focuses on active power, future work could explore ways to incorporate other available data (such as time of day, day of week, or external temperature) to improve predictions without requiring additional sensor data.
Validation Across Diverse Datasets: To ensure the generalizability of this method, it should be tested on a wider range of households with different types of water heaters and usage patterns.
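
As one possible starting point for the additional-features idea above, calendar features can be derived from the timestamps that are already available. This is only a sketch on invented data: the column names `hour` and `dayofweek` and the toy schedule (heater ON at 06:00 every day) are assumptions, not part of the method described in this article.

```python
import pandas as pd

# Toy hourly state series over two days: the heater is ON at 06:00 each day
index = pd.date_range("2024-01-01", periods=48, freq="60min")
df = pd.DataFrame({"WH_state": 0}, index=index)
df.loc[df.index.hour == 6, "WH_state"] = 1

# Calendar features derived from the timestamp alone
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek

# Share of samples in the ON state for each hour of the day
on_share_by_hour = df.groupby("hour")["WH_state"].mean()
print(on_share_by_hour[on_share_by_hour > 0])
```

A profile like `on_share_by_hour` could, for example, be used as a prior to down-weight candidate cycles at hours when the heater is rarely ON.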

By addressing these areas, we can continue to improve the accuracy and reliability of power disaggregation methods, making them more useful for energy management and conservation efforts. As we move towards more sustainable energy usage, such techniques will play a crucial role in providing detailed, appliance-level consumption data to both consumers and utilities, enabling more informed decisions and targeted efficiency improvements.

Improving Water Heater Power Disaggregation with Limited Data: A Practical Approach