ETL with simple Example

Pranav Bakare - Sep 3 - - Dev Community

Let's walk through another SQL-based ETL example. This time, we'll deal with a slightly more complex scenario where we have to aggregate data and calculate new metrics before loading it into a target table.

Scenario:

You have an orders table containing information about customer orders. You want to extract this data, transform it by calculating the total revenue per customer, and then load the results into a customer_revenue table.

Source Table: orders

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    amount DECIMAL(10, 2)
);

INSERT INTO orders (order_id, customer_id, order_date, amount) VALUES
(1, 101, '2024-08-01', 250.75),
(2, 102, '2024-08-03', 300.50),
(3, 101, '2024-08-05', 120.00),
(4, 103, '2024-08-07', 450.00),
(5, 104, '2024-08-09', 200.00),
(6, 102, '2024-08-10', 150.00),
(7, 101, '2024-08-12', 330.50);
Enter fullscreen mode Exit fullscreen mode

1. Extract:

Select data from the orders table.

SELECT * FROM orders;
Enter fullscreen mode Exit fullscreen mode

2. Transform:

Aggregate the data to calculate the total revenue per customer.

-- Example Transformation:
-- 1. Group the data by `customer_id`
-- 2. Calculate the total revenue for each customer
-- 3. Filter out customers with revenue below a certain threshold (e.g., 200)

SELECT
    customer_id,
    SUM(amount) AS total_revenue
FROM
    orders
GROUP BY
    customer_id
HAVING
    SUM(amount) >= 200;
Enter fullscreen mode Exit fullscreen mode

3. Load:

Insert the aggregated data into the customer_revenue table.

-- Create the target table if it doesn't exist
CREATE TABLE customer_revenue (
    customer_id INT PRIMARY KEY,
    total_revenue DECIMAL(10, 2)
);

-- Load the transformed data into the target table
INSERT INTO customer_revenue (customer_id, total_revenue)
SELECT
    customer_id,
    SUM(amount) AS total_revenue
FROM
    orders
GROUP BY
    customer_id
HAVING
    SUM(amount) >= 200;
Enter fullscreen mode Exit fullscreen mode

Complete Example in SQL:

-- Step 1: Extract
SELECT * FROM orders;

-- Step 2: Transform
SELECT
    customer_id,
    SUM(amount) AS total_revenue
FROM
    orders
GROUP BY
    customer_id
HAVING
    SUM(amount) >= 200;

-- Step 3: Load
CREATE TABLE IF NOT EXISTS customer_revenue (
    customer_id INT PRIMARY KEY,
    total_revenue DECIMAL(10, 2)
);

INSERT INTO customer_revenue (customer_id, total_revenue)
SELECT
    customer_id,
    SUM(amount) AS total_revenue
FROM
    orders
GROUP BY
    customer_id
HAVING
    SUM(amount) >= 200;

-- Verify the data in the target table
SELECT * FROM customer_revenue;
Enter fullscreen mode Exit fullscreen mode

Explanation:

  • Extract: The data is selected from the orders table.
  • Transform: The data is grouped by customer_id, and the total revenue is calculated for each customer. We use the HAVING clause to filter out customers with total revenue below a certain threshold (e.g., 200).
  • Load: The aggregated data is inserted into the customer_revenue table.

Output:

After running this ETL process, the customer_revenue table will contain the total revenue for each customer who has spent more than a specified amount. This table can be used for further analysis, such as identifying high-value customers or generating reports.

This example demonstrates how SQL can be used to perform more advanced transformations like aggregation and filtering during an ETL process.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player