The Mystery of MongoDB Indexing

ChunTing Wu - Apr 25 '22 - Dev Community

I believe we are all familiar with MySQL's indexing rules. Because MySQL indexes are built on B+ trees, they follow a leftmost match rule: a query can use a compound index only through its leading columns, read from left to right. For example,

SELECT * FROM table WHERE a = 1 AND b > 2 ORDER BY c;

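The most efficient index for such a query is a compound index like (a, b, c). However, such an index cannot fully serve

SELECT * FROM table WHERE a = 1 ORDER BY c;

This is because the query skips the b column. The leading column a can still narrow down the rows, but with b missing from the query the index cannot return them ordered by c.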
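
The leftmost match rule described above can be sketched in a few lines. This is a deliberately simplified model, not MySQL's actual optimizer, and the function name leftmostPrefix is hypothetical: it only answers which leading index columns remain usable once the query leaves a column out.

```javascript
// Simplified model of the leftmost match rule (not MySQL's real optimizer).
// indexColumns: the columns of a compound index, in order.
// queryColumns: the columns the query wants to use from that index.
// Returns the leading index columns that remain usable.
function leftmostPrefix(indexColumns, queryColumns) {
  const used = [];
  for (const col of indexColumns) {
    if (!queryColumns.includes(col)) break; // a gap ends the usable prefix
    used.push(col);
  }
  return used;
}

// Query touching a and b: both leading columns are usable.
console.log(leftmostPrefix(["a", "b", "c"], ["a", "b"]));
// Query touching a and c: the gap at b cuts c off, only a is usable.
console.log(leftmostPrefix(["a", "b", "c"], ["a", "c"]));
// Query touching b alone: no usable prefix at all.
console.log(leftmostPrefix(["a", "b", "c"], ["b"]));
```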

This is the MySQL indexing rule, and in general, relational databases implement pretty much the same rule. MongoDB indexes are also built on B-trees, but its rules differ in subtle ways.

MongoDB ESR Rule

We will continue to use the query mentioned in the previous section as a demonstration.

db.table.find({a: 1, b: {$gt: 2}}).sort({c: 1})

This is much the same as the earlier MySQL query, just rewritten in MongoDB's query language (MQL). However, the effective index for such a query is (a, c, b) instead of the (a, b, c) mentioned in the previous section.

This is because MongoDB's compound indexes should follow the ESR (Equality, Sort, Range) rule: equality fields come first, then sort fields, then range fields. In the above example, a is an equality match, c is used for sorting, and b is used for range matching, so c must come before b in the index.
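
As a rough illustration of the ESR ordering, the sketch below derives an index key spec from a filter and a sort document. The helper name esrIndexKeys is hypothetical, and the classification is a simplification: any operator object such as {$gt: 2} is counted as a range, and any plain value as an equality. Real queries have more cases (for example {$eq: 1} or date values), so treat this as a thinking aid rather than a planner.

```javascript
// Minimal sketch: order index keys as Equality, Sort, Range.
// Assumption: operator objects like {$gt: 2} are ranges, plain values are equalities.
function esrIndexKeys(filter, sort) {
  const equality = [];
  const range = [];
  for (const [field, condition] of Object.entries(filter)) {
    const isOperatorObject =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    (isOperatorObject ? range : equality).push(field);
  }

  const keys = {};
  for (const field of equality) keys[field] = 1; // E
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction; // S
  }
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // R
  }
  return keys;
}

const keys = esrIndexKeys({ a: 1, b: { $gt: 2 } }, { c: 1 });
console.log(keys); // { a: 1, c: 1, b: 1 }

// In mongosh, the resulting spec could be passed to createIndex:
//   db.table.createIndex(keys)
```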

MongoDB Index Intersection

In addition to the ESR rule, which differs from relational databases, MongoDB has another mystery: it can use multiple (let's say two) indexes to serve a single query, which is called index intersection.

We continue with the query from the previous section as an example.

db.table.find({a: 1, b: {$gt: 2}}).sort({c: 1})

To improve the query performance, we suggested the (a, c, b) compound index, but in fact, we can achieve a similar effect with two indexes.

  1. b
  2. (a, c)

That is, we create a single-field index on b and a compound index on (a, c), and this combination can also improve the query performance.

Why (a, c)? Because even index intersection must follow the ESR rule, so we split the index into an ES part, (a, c), and an R part, b.

What is the advantage of this? The biggest advantage is that the index layout becomes more flexible. If only the single index (a, c, b) exists, a query on b alone has no matching index, because b is not a leading field, so an additional index on b must be created anyway. As we all know, indexes come at a cost: they take up memory and slow down writes. In other words, the more query patterns a compact set of indexes can cover, the more valuable those indexes are.
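
The flexibility argument can be checked with a small sketch. It uses a simplified prefix test (an index can serve a filter on a field only if that field is the index's first key), and the helper name indexesStartingWith is hypothetical. It says nothing about whether the query planner actually picks an intersection plan for a given query.

```javascript
// Simplified check: which indexes have `field` as their leading key?
// Index specs are plain objects whose key order is the index key order.
function indexesStartingWith(field, indexes) {
  return indexes.filter((spec) => Object.keys(spec)[0] === field);
}

const singleCompound = [{ a: 1, c: 1, b: 1 }];
const splitIndexes = [{ b: 1 }, { a: 1, c: 1 }];

// A query on b alone finds no usable index in the single-compound layout...
console.log(indexesStartingWith("b", singleCompound)); // []
// ...but is served by the single-field index in the split layout.
console.log(indexesStartingWith("b", splitIndexes)); // [ { b: 1 } ]

// In mongosh, the split layout would be created with:
//   db.table.createIndex({ b: 1 })
//   db.table.createIndex({ a: 1, c: 1 })
```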

MongoDB ESR "Hidden" Rule

In MySQL we often think of IN as a range query, but it is actually an equality match. That is to say, WHERE a IN (2, 3) is equivalent to WHERE a = 2 OR a = 3.

However, in MongoDB, this is not always the case.

If you simply use $in on a single field, without a sort, then it behaves the same as in MySQL.

For example, find({a: {$in: [2, 3]}}) and find({$or: [{a: 2}, {a: 3}]}) are equivalent, and both belong to E of the ESR rule.

But if used with sort, then $in is treated as a range match, i.e. R. For instance,

find({a: {$in: [2, 3]}}).sort({b: 1})

Here a is treated as R, so the index that best serves this query is (b, a) instead of (a, b).
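
To see why the key order matters here, the toy model below treats an index as a list of entries sorted by its keys and walks it in order. The sample documents and the helper names buildIndex and scan are made up for illustration. The model also ignores optimizations a real query planner may apply, such as merging the already sorted runs for each $in value.

```javascript
// Toy model of an index: entries kept in order of the index keys.
// The sample documents are hypothetical.
const docs = [
  { a: 2, b: 5 },
  { a: 3, b: 1 },
  { a: 2, b: 9 },
  { a: 3, b: 4 },
  { a: 1, b: 7 },
];

function buildIndex(documents, keys) {
  return [...documents].sort((x, y) => {
    for (const k of keys) {
      if (x[k] !== y[k]) return x[k] - y[k];
    }
    return 0;
  });
}

// Walk the index in order, keep entries matching a IN values, report b.
function scan(index, values) {
  return index.filter((d) => values.includes(d.a)).map((d) => d.b);
}

// Index (a, b): matching entries come out grouped by a, so b is not ordered.
console.log(scan(buildIndex(docs, ["a", "b"]), [2, 3])); // [ 5, 9, 1, 4 ]
// Index (b, a): matching entries come out already ordered by b.
console.log(scan(buildIndex(docs, ["b", "a"]), [2, 3])); // [ 1, 4, 5, 9 ]
```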

Conclusion

If you only have experience with relational databases, it's easy to get confused by feature-rich NoSQL databases, especially since MongoDB's indexes are built on the same B-tree family that relational databases use. Even so, there are significant differences in the implementation details.

When creating MongoDB indexes, it is especially important to pay attention to the ESR rule. Many users who have moved from MySQL trip over this rule without noticing it.

In fact, even an (a, b, c) index doesn't cause problems when the data volume is small, because MongoDB simply sorts the results in memory. But once the data grows large enough, the sort no longer fits in memory and MongoDB has to fall back to disk, and then performance suffers badly.
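
One way to notice this problem before the data grows is to look for a SORT stage in the query's explain output, which indicates that the sort is not being served by an index. The helper below is a minimal sketch: hasStage is a hypothetical name, the sample plan is hand-written rather than captured from a server, and the exact shape of explain output varies between server versions.

```javascript
// Recursively look for a stage name in an explain() winning plan.
// Child plans live under `inputStage` (one child) or `inputStages` (several).
function hasStage(plan, stageName) {
  if (!plan) return false;
  if (plan.stage === stageName) return true;
  const children = [plan.inputStage, ...(plan.inputStages || [])];
  return children.some((child) => hasStage(child, stageName));
}

// Hand-written sample shaped like a winning plan; not real server output.
const samplePlan = {
  stage: "SORT",
  inputStage: {
    stage: "FETCH",
    inputStage: { stage: "IXSCAN", indexName: "a_1_b_1_c_1" },
  },
};

console.log(hasStage(samplePlan, "SORT")); // true
console.log(hasStage(samplePlan, "COLLSCAN")); // false

// In mongosh, the plan could be obtained with something like:
//   const plan = db.table.find({a: 1, b: {$gt: 2}}).sort({c: 1})
//     .explain().queryPlanner.winningPlan
//   hasStage(plan, "SORT")
```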

Furthermore, index intersection gives users more flexibility in designing indexes and lets a small set of indexes support more diverse queries. However, it is also important to keep the ESR rule in mind when using index intersection, so that you do not lose more than you gain.
