I was working with my backend colleague on a project, going through his code, and I was amazed by how clean and simple the codebase was. I kept looking for the queries behind some of his implementations, because the code was so simple and clean. I wondered how he did it, until one day he told me to read about MongoDB Aggregation.

After reading about and understanding how MongoDB aggregation works, I was just as impressed, so I decided to write an article about it.

Before I get started, here is what you will be familiar with after reading this article:

What is aggregation in MongoDB
How does the MongoDB aggregation pipeline work
MongoDB aggregation pipeline syntax
Practical example of MongoDB aggregation

What is aggregation in MongoDB?

Aggregation is a way of processing a large number of documents in a collection by passing them through a sequence of stages; that sequence is called a pipeline. The stages in a pipeline can filter, sort, group, reshape, and modify documents, and much more.

Key Point🎯

A pipeline can have one or more stages.

The order of these stages is important.

Aggregation runs inside the database engine, so large datasets can be processed without first transferring every document to your application.
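To see why the order of stages matters, here is a plain-JavaScript sketch (not MongoDB code) that mimics a $match stage and a $group stage on a few hypothetical order documents. The field names, helper names, and data are made up for illustration; the point is that a stage can only work with what the previous stage produced.

```javascript
// Hypothetical sample documents, for illustration only.
const orders = [
  { orderType: "dine-in", day: "Mon", total: 20 },
  { orderType: "takeaway", day: "Mon", total: 10 },
  { orderType: "dine-in", day: "Tue", total: 30 },
];

// Rough stand-in for { $match: { orderType: "dine-in" } }.
const matchDineIn = (docs) => docs.filter((d) => d.orderType === "dine-in");

// Rough stand-in for { $group: { _id: "$day", total: { $sum: "$total" } } }.
// Its output documents have only _id and total, so orderType is gone.
const groupByDay = (docs) => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d.day, (totals.get(d.day) ?? 0) + d.total);
  }
  return [...totals].map(([day, total]) => ({ _id: day, total }));
};

// Apply the stages left to right; each stage receives the previous stage's output.
const run = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

console.log(run(orders, [matchDineIn, groupByDay])); // Mon: 20, Tue: 30
console.log(run(orders, [groupByDay, matchDineIn])); // [] (no grouped document has orderType)
```

MongoDB behaves the same way in this respect: a $match placed after a $group can only filter on fields that the $group stage actually outputs.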

How Does the MongoDB Aggregation Pipeline Work?

Here is a diagram to illustrate a typical MongoDB aggregation pipeline:

Image Credit: studio3t

Let's understand each of the stages and what they do:

$match stage - filters the documents, passing only those that meet the specified conditions to the next stage.
$group stage - this is where the aggregation happens. It groups documents by a specified key (the _id expression) and computes values per group, such as the sum, average, max, min, and so on.
$sort stage - this sorts the documents in ascending or descending order based on specified fields.
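The calculations mentioned for $group (sum, average, max, min) correspond to the $sum, $avg, $max, and $min accumulators. Below is a rough plain-JavaScript equivalent (not MongoDB code) of a $group stage using those accumulators, followed by a descending sort, over hypothetical sales documents. The field names item and amount and the helper names are made up for this sketch.

```javascript
// Hypothetical sales documents, for illustration only.
const sales = [
  { item: "tea", amount: 4 },
  { item: "coffee", amount: 6 },
  { item: "tea", amount: 2 },
  { item: "coffee", amount: 10 },
  { item: "juice", amount: 5 },
];

// Roughly what this stage computes:
// { $group: { _id: "$item", total: { $sum: "$amount" }, avg: { $avg: "$amount" },
//             max: { $max: "$amount" }, min: { $min: "$amount" } } }
function groupByItem(docs) {
  const groups = new Map();
  for (const { item, amount } of docs) {
    const g = groups.get(item) ?? { _id: item, total: 0, count: 0, max: -Infinity, min: Infinity };
    g.total += amount;
    g.count += 1;
    g.max = Math.max(g.max, amount);
    g.min = Math.min(g.min, amount);
    groups.set(item, g);
  }
  // The average is derived from the running sum and count; count itself is dropped.
  return [...groups.values()].map(({ count, ...g }) => ({ ...g, avg: g.total / count }));
}

// Roughly { $sort: { total: -1 } }: descending by total (a - b would give ascending, i.e. 1).
const sortByTotalDesc = (docs) => [...docs].sort((a, b) => b.total - a.total);

console.log(sortByTotalDesc(groupByItem(sales))); // coffee (16), tea (6), juice (5)
```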

There are many more stages; we will explore some others in the example below.

Key Point🎯

Each stage acts upon the results of the previous stage.

There can be one or more stages in a pipeline, depending on what you are planning to achieve.

Now that we understand how the pipeline works, let's take a look at the syntax.

MongoDB Aggregation Pipeline Syntax

The basic form of an aggregation query is:
db.collectionName.aggregate(pipeline, options)

where:
collectionName – the name of the collection,
pipeline – an array that contains the aggregation stages,
options – an optional document of settings for the aggregation, such as allowDiskUse

This is an example of the aggregation pipeline syntax:

pipeline = [
  { $match: { … } },
  { $group: { … } },
  { $sort: { … } }
]
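Because the pipeline is an ordinary array of stage documents, it can be assembled in code before being passed to aggregate(). A minimal sketch; the buildPipeline helper and its orderType and sortDirection parameters are hypothetical, and only the stage shapes come from this article.

```javascript
// Hypothetical helper: assembles a pipeline array from optional settings.
// The stage documents use the same shapes as the examples in this article.
function buildPipeline({ orderType, sortDirection = 1 } = {}) {
  const pipeline = [];
  // Only add a $match stage when a filter was requested.
  if (orderType !== undefined) {
    pipeline.push({ $match: { orderType } });
  }
  pipeline.push({
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
    },
  });
  pipeline.push({ $sort: { "_id.dayOfWeek": sortDirection } });
  return pipeline;
}

// In mongosh, the result could be passed straight to aggregate(), e.g.
// db.restaurantOrders.aggregate(buildPipeline({ orderType: "dine-in" }))
console.log(JSON.stringify(buildPipeline({ orderType: "dine-in" }), null, 2));
```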

Let's now see a practical example of how MongoDB aggregation works.

Practical Example Using MongoDB Aggregation

In this example, we'll use a sample collection of restaurant orders, restaurantOrders, where each document has an orderType, an orderDate (a date), an orderTotal, and a mainDish. The goal of this aggregation is to find the average order total and the most popular dish for each day of the week, for dine-in orders only. The output will be one document for each day of the week that has dine-in orders, containing the day's name, the average order total rounded to two decimal places, and the most frequently ordered dish.

Let's approach this step by step:

Step 1: Filter data by order type

db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } }
])

The $match stage filters the data to include only documents where the orderType is "dine-in".
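The article does not show the sample data, so here is a hypothetical shape for restaurantOrders documents, inferred only from the fields the pipeline uses (orderType, orderDate, orderTotal, mainDish), together with a plain-JavaScript filter that keeps the same documents as the $match stage above. The values are invented for illustration.

```javascript
// Hypothetical documents; only the field names come from the pipeline in this article.
const restaurantOrders = [
  { orderType: "dine-in", orderDate: new Date("2024-03-04T12:30:00Z"), orderTotal: 42.5, mainDish: "Pasta" },
  { orderType: "takeaway", orderDate: new Date("2024-03-04T13:00:00Z"), orderTotal: 18, mainDish: "Burger" },
  { orderType: "dine-in", orderDate: new Date("2024-03-05T19:15:00Z"), orderTotal: 30, mainDish: "Salad" },
];

// Same selection as { $match: { orderType: "dine-in" } }:
// an equality match on a top-level field.
const dineIn = restaurantOrders.filter((order) => order.orderType === "dine-in");

console.log(dineIn.length); // 2
```

Putting $match first also means every later stage sees fewer documents, and a $match at the start of a pipeline can use an index (for example on orderType) if one exists.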

Step 2: Group data by day of the week, calculate the average order total, and collect the dishes

db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  }
])

The $group stage groups the dine-in orders by the day of the week of orderDate ($dayOfWeek returns 1 for Sunday through 7 for Saturday), calculates the average order total for each day, and uses $push to build an array of all main dishes ordered on that day.
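Without a timezone argument, $dayOfWeek evaluates the date in UTC, so in plain JavaScript it corresponds to getUTCDay() + 1. The sketch below reproduces what this $group stage computes for a few hypothetical dine-in orders; the helper names and data are made up, and it illustrates the stage's logic rather than how MongoDB executes it.

```javascript
// Equivalent of { $dayOfWeek: date } with no timezone: 1 = Sunday ... 7 = Saturday, in UTC.
const dayOfWeek = (date) => date.getUTCDay() + 1;

// Hypothetical dine-in orders (the output of the $match stage).
const dineInOrders = [
  { orderDate: new Date("2024-03-04T12:30:00Z"), orderTotal: 40, mainDish: "Pasta" }, // Monday
  { orderDate: new Date("2024-03-11T18:00:00Z"), orderTotal: 20, mainDish: "Salad" }, // Monday
  { orderDate: new Date("2024-03-05T19:15:00Z"), orderTotal: 30, mainDish: "Pasta" }, // Tuesday
];

// Mirrors the $group stage: _id.dayOfWeek, average of orderTotal, array of mainDish.
function groupByDayOfWeek(orders) {
  const groups = new Map();
  for (const { orderDate, orderTotal, mainDish } of orders) {
    const key = dayOfWeek(orderDate);
    const g = groups.get(key) ?? { sum: 0, count: 0, dishes: [] };
    g.sum += orderTotal;
    g.count += 1;
    g.dishes.push(mainDish); // like $push, this keeps duplicates
    groups.set(key, g);
  }
  return [...groups].map(([day, g]) => ({
    _id: { dayOfWeek: day },
    avgOrderTotal: g.sum / g.count,
    dishes: g.dishes,
  }));
}

console.log(groupByDayOfWeek(dineInOrders));
```

One difference worth keeping in mind: this sketch returns groups in insertion order, whereas MongoDB does not guarantee the order of $group output, which is why the pipeline adds a $sort stage later.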

Step 3: Calculate the most popular dish

db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  },
  {
    $addFields: {
      popularDish: {
        $reduce: {
          input: "$dishes",
          initialValue: { name: "", count: 0 },
          in: {
            $cond: [
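              // Count how often the current dish ($$this) appears in the whole array.
              // "as" names the $filter variable "dish" so it does not shadow $reduce's $$this.
              { $gt: [{ $size: { $filter: { input: "$dishes", as: "dish", cond: { $eq: ["$$dish", "$$this"] } } } }, "$$value.count"] },
              { name: "$$this", count: { $size: { $filter: { input: "$dishes", as: "dish", cond: { $eq: ["$$dish", "$$this"] } } } } },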
              "$$value"
            ]
          }
        }
      }
    }
  }
])

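The $addFields stage adds a new field, popularDish, computed with $reduce. For each dish in the dishes array, a $filter counts how many times that dish appears in the whole array; if that count is higher than the best count seen so far, the dish becomes the new { name, count } value. Note that inside $filter, $$this refers to the element being filtered (its default variable name is this), so the filter's variable needs its own name, set with as, to compare against the dish that $reduce is currently visiting.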
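The same most-frequent-dish logic can be written as a plain-JavaScript reduce so it can be checked outside the database. As in the pipeline, each dish is counted by filtering the whole array, and a dish replaces the current best only when its count is strictly greater, so ties go to the dish that appears first in the array. This is a sketch for checking the logic, not MongoDB code; the function name is hypothetical.

```javascript
// Mirrors the $reduce/$filter expression: for each dish ($$this), count its
// occurrences in the whole array; keep it if that count beats the best so far ($$value).
function mostPopularDish(dishes) {
  return dishes.reduce(
    (best, dish) => {
      const count = dishes.filter((d) => d === dish).length;
      return count > best.count ? { name: dish, count } : best;
    },
    { name: "", count: 0 } // same initialValue as the pipeline
  );
}

console.log(mostPopularDish(["Pasta", "Salad", "Pasta", "Soup"])); // { name: 'Pasta', count: 2 }
```

Counting with a filter per element is quadratic in the length of each day's dishes array. For large collections, a common alternative is to group by day and dish first, count with $sum: 1, and then pick the top dish per day.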

Step 4: Sort data by day of the week

db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  },
  {
    $addFields: {
      popularDish: {
        $reduce: {
          input: "$dishes",
          initialValue: { name: "", count: 0 },
          in: {
            $cond: [
              { $gt: [{ $size: { $filter: { input: "$dishes", as: "dish", cond: { $eq: ["$$dish", "$$this"] } } } }, "$$value.count"] },
              { name: "$$this", count: { $size: { $filter: { input: "$dishes", as: "dish", cond: { $eq: ["$$dish", "$$this"] } } } } },
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $sort: { "_id.dayOfWeek": 1 } }
])

The $sort stage sorts the data by the day of the week in ascending order, from 1 (Sunday) to 7 (Saturday).
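Sorting on the numeric _id.dayOfWeek, before the numbers are turned into day names, is what keeps the output in calendar order; sorting the names instead would put the days in alphabetical order. A small plain-JavaScript comparison using hypothetical grouped results:

```javascript
const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Hypothetical $group output in arbitrary order.
const grouped = [{ _id: { dayOfWeek: 6 } }, { _id: { dayOfWeek: 1 } }, { _id: { dayOfWeek: 3 } }];

// Like { $sort: { "_id.dayOfWeek": 1 } }: ascending numeric order, Sunday first.
const byNumber = [...grouped]
  .sort((a, b) => a._id.dayOfWeek - b._id.dayOfWeek)
  .map((g) => DAY_NAMES[g._id.dayOfWeek - 1]);

// Sorting the names instead gives alphabetical order.
const byName = grouped.map((g) => DAY_NAMES[g._id.dayOfWeek - 1]).sort();

console.log(byNumber); // [ 'Sunday', 'Tuesday', 'Friday' ]
console.log(byName); // [ 'Friday', 'Sunday', 'Tuesday' ]
```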

Step 5: Project and format the output

db.restaurantOrders.aggregate([
  { $match: { orderType: "dine-in" } },
  {
    $group: {
      _id: { dayOfWeek: { $dayOfWeek: "$orderDate" } },
      avgOrderTotal: { $avg: "$orderTotal" },
      dishes: { $push: "$mainDish" }
    }
  },
  {
    $addFields: {
      popularDish: {
        $reduce: {
          input: "$dishes",
          initialValue: { name: "", count: 0 },
          in: {
            $cond: [
              { $gt: [{ $size: { $filter: { input: "$dishes", as: "dish", cond: { $eq: ["$$dish", "$$this"] } } } }, "$$value.count"] },
              { name: "$$this", count: { $size: { $filter: { input: "$dishes", as: "dish", cond: { $eq: ["$$dish", "$$this"] } } } } },
              "$$value"
            ]
          }
        }
      }
    }
  },
  { $sort: { "_id.dayOfWeek": 1 } },
  {
    $project: {
      _id: 0,
      dayOfWeek: {
        $switch: {
          branches: [
            { case: { $eq: ["$_id.dayOfWeek", 1] }, then: "Sunday" },
            { case: { $eq: ["$_id.dayOfWeek", 2] }, then: "Monday" },
            { case: { $eq: ["$_id.dayOfWeek", 3] }, then: "Tuesday" },
            { case: { $eq: ["$_id.dayOfWeek", 4] }, then: "Wednesday" },
            { case: { $eq: ["$_id.dayOfWeek", 5] }, then: "Thursday" },
            { case: { $eq: ["$_id.dayOfWeek", 6] }, then: "Friday" },
            { case: { $eq: ["$_id.dayOfWeek", 7] }, then: "Saturday" }
          ],
          default: "Unknown"
        }
      },
      avgOrderTotal: { $round: ["$avgOrderTotal", 2] },
      popularDish: "$popularDish.name"
    }
  }
])

The $project stage reshapes the output by excluding the _id field, converting the numeric day of the week to its string representation, rounding the average order total to two decimal places, and including only the name of the most popular dish.
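Because the day numbers are consecutive, the $switch mapping can be sketched as an array lookup. The plain-JavaScript function below shows the final document shape that $project produces for one hypothetical input document; formatResult is a made-up name, and Math.round is only an approximation of $round that may differ on exact halfway values.

```javascript
const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Sketch of the $project stage for one document coming out of $addFields.
function formatResult(doc) {
  return {
    // Mirrors the $switch: 1..7 map to names, anything else falls back to "Unknown".
    dayOfWeek: DAY_NAMES[doc._id.dayOfWeek - 1] ?? "Unknown",
    // Approximates { $round: ["$avgOrderTotal", 2] }.
    avgOrderTotal: Math.round(doc.avgOrderTotal * 100) / 100,
    // Keeps only the name, like popularDish: "$popularDish.name".
    popularDish: doc.popularDish.name,
  };
}

// Hypothetical input document.
const doc = {
  _id: { dayOfWeek: 2 },
  avgOrderTotal: 27.333333,
  dishes: ["Pasta", "Salad", "Pasta"],
  popularDish: { name: "Pasta", count: 2 },
};

console.log(formatResult(doc)); // dayOfWeek "Monday", avgOrderTotal 27.33, popularDish "Pasta"
```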

This example demonstrates how MongoDB Aggregation can be used to process restaurant order data, grouping by day of the week, calculating averages, finding the most popular dish, and formatting the output. It showcases several aggregation stages and operators, providing insights that could be valuable for restaurant management and menu planning.

Conclusion

MongoDB Aggregation is a powerful and flexible tool: it can streamline data processing workflows, reduce the complexity of application code, and let you extract insights from your data inside the database instead of in your application. Whether you're working with large volumes of data or complex, nested documents, the aggregation framework gives you a rich set of stages and operators to work with.

In our restaurant orders example, we've seen how we can use aggregation to:

Filter data ($match)
Group and calculate averages ($group)
Perform complex calculations ($addFields with $reduce)
Sort results ($sort)
Reshape and format output ($project)

These operations, when combined, allow us to transform raw data into meaningful insights that can drive business decisions. For instance, a restaurant owner could use this aggregation to see which dishes are most popular on different days and how average order totals vary throughout the week. To also find the busiest days, you could add an order count such as orderCount: { $sum: 1 } to the $group stage.

Remember, this is just scratching the surface of what's possible with MongoDB Aggregation. As you become more comfortable with these concepts, you'll find that aggregation can be applied to solve a wide variety of data processing challenges efficiently and elegantly.

For more MongoDB aggregation operators and stages, I recommend checking out the official MongoDB documentation or resources like Studio 3T's knowledge base.

Thanks for reading!

MongoDB Aggregation is really powerful!