TL;DR: There is no Bitmap Scan (yet) in YugabyteDB, but we can still get scalable and predictable performance for multi-criteria conjunction queries.
In PostgreSQL, bitmaps are employed to minimize reads from the heap table when accessing data through single or multiple indexes. These index entries indicate which pages to retrieve from the heap table, preventing the need to scan all of them. When combining conditions from multiple indexes (or multiple entries from a GIN index), a bitmap is used to merge these page identifiers and arrange them in the physical order before fetching data from the table.
YugabyteDB differs in its approach by storing table columns within the primary key. Consequently, it doesn't face the random read issues seen in PostgreSQL or Oracle's heap tables. Moreover, YugabyteDB distributes rows based on their primary key, and index entries are distributed according to their indexed columns. When scanning a range, YugabyteDB doesn't require a list of physical pages but simply a range or list of keys. There are also additional features that enhance this efficiency. For instance, a single read operation can seek to multiple positions in the index (similar to a skip scan). Additionally, this list of keys can be batched when used in a nested loop join, reducing the number of distributed calls required.
Note that there may still be a need for bitmaps, especially with disjunctions (OR) as the following is about conjunction (AND), and it can still be implemented, differently than PostgreSQL. This is tracked by #4634.
Example
Consider a typical scenario: you have a query with multiple criteria on a table, and your goal is to achieve scalable and predictable performance by utilizing an index scan for each criterion.
I've created a table with a primary key (id
) and three column on which I can apply search conditions (a
, b
, c
) and fill it with rows with different selectivity:
create extension if not exists pgcrypto;
create extension if not exists orafce;
create table demo (
id uuid default gen_random_uuid() primary key
, a int, b int, c int, x text
);
create index on demo(a, id);
create index on demo(b, id);
create index on demo(c, id);
insert into demo(a,b,c,x)
select n%2, n%23, n%3, dbms_random.string('p',1000) x
from generate_series(1,1000000) n;
analyze demo;
I have created an index on each of the column a
, b
, c
and included the id
in each index. This design aims to allow each index to provide row identifiers without the need to access the actual table rows, optimizing for Index Only Scans.
I have a substantial dataset of one million rows where the columns a
, b
, c
individually exhibit relatively low selectivity, each returning approximately 50%, 4%, and 30% of the rows, respectively. However, when these filters are combined, the query becomes highly selective, resulting in only a few thousand rows being returned:
yugabyte=> select count(*) from demo;
count
---------
1000000
(1 row)
yugabyte=> select count(*) from demo where a=1;
count
--------
500000
(1 row)
yugabyte=> select count(*) from demo where b=1;
count
-------
43479
(1 row)
yugabyte=> select count(*) from demo where c=1;
count
--------
333334
(1 row)
yugabyte=> select count(*) from demo where a=1 and b=1 and c=1;
count
-------
7247
(1 row)
This scenario aligns with typical ad-hoc queries, often stemming from multi-criteria forms within the application. It's worth noting that I've also included a substantial column x
, containing random 1000-character strings, within the table.
My objective is to efficiently retrieve only the 7247 rows that match the criteria a=1 and b=1 and c=1
by leveraging all indexes to filter the data before accessing the table itself. This optimization is crucial for minimizing data retrieval and achieving consistent query performance.
Cost Based Optimizer
In my example, one criterion, b=1
, exhibits higher selectivity compared to the others and this information should be accessible by the query planner. I've performed ANALYZE on the tables to gather the selectivity, and for the following test, I enable the Cost-Based Optimizer:
yugabyte=> set yb_enable_optimizer_statistics to on;
SET
This allows the query planner to consider this selectivity when deciding which index to begin with. It's important to note that I'm conducting these tests in YugabyteDB version 2.18, where this setting isn't the default. While the techniques I'll demonstrate below remain consistent even without enabling it, they might not start with the most efficient index scan when not using the optimizer statistics.
Conditions combined with AND
The most straightforward way to construct a query with all three criteria is to combine them using the AND operator within a single WHERE clause:
explain (analyze, costs off, dist)
select * from demo where a=1 and b=1 and c=1
;
QUERY PLAN
----------------------------------------------------------------------------------------
Index Scan using demo_b_id_idx on demo (actual time=17.608..513.650 rows=7247 loops=1)
Index Cond: (b = 1)
Remote Filter: ((a = 1) AND (c = 1))
Storage Table Read Requests: 43
Storage Table Read Execution Time: 487.439 ms
Storage Index Read Requests: 43
Storage Index Read Execution Time: 4.961 ms
Planning Time: 0.121 ms
Execution Time: 514.129 ms
Storage Read Requests: 86
Storage Read Execution Time: 492.399 ms
Storage Write Requests: 0
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 492.399 ms
Peak Memory Usage: 8 kB
(17 rows)
The drawback of this approach is that it utilizes only one index. The remaining predicates are pushed down during the Index Scan (referred to as a Remote Filter
), which still necessitates access to the table rows. These rows might reside on different nodes, and the process involves reading them to subsequently discard some based on the remaining filter conditions.
The efficiency of this method relies heavily on the selectivity of a single condition, which may not be the most optimal strategy for multi-criteria queries where the objective is to attain consistent and predictable performance.
Conditions combined with INTERSECT
To make use of all available indexes, we can split the query into several subqueries and then combine their results using the INTERSECT operator:
explain (analyze, costs off, dist)
select * from demo where a=1
intersect
select * from demo where b=1
intersect
select * from demo where c=1
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
HashSetOp Intersect (actual time=17521.425..17523.206 rows=7247 loops=1)
-> Append (actual time=10934.606..17256.085 rows=355074 loops=1)
-> Result (actual time=10934.605..10946.849 rows=21740 loops=1)
-> HashSetOp Intersect (actual time=10934.602..10942.769 rows=21740 loops=1)
-> Append (actual time=25.525..10509.744 rows=543479 loops=1)
-> Subquery Scan on "*SELECT* 2" (actual time=25.525..820.742 rows=43479 loops=1)
-> Index Scan using demo_b_id_idx on demo (actual time=25.521..815.185 rows=43479 loops=1)
Index Cond: (b = 1)
Storage Table Read Requests: 43
Storage Table Read Execution Time: 774.230 ms
Storage Index Read Requests: 43
Storage Index Read Execution Time: 5.060 ms
-> Subquery Scan on "*SELECT* 1" (actual time=24.500..9652.965 rows=500000 loops=1)
-> Index Scan using demo_a_id_idx on demo demo_1 (actual time=24.497..9591.476 rows=500000 loops=1)
Index Cond: (a = 1)
Storage Table Read Requests: 489
Storage Table Read Execution Time: 9164.691 ms
Storage Index Read Requests: 489
Storage Index Read Execution Time: 6.277 ms
-> Subquery Scan on "*SELECT* 3" (actual time=24.660..6285.922 rows=333334 loops=1)
-> Index Scan using demo_c_id_idx on demo demo_2 (actual time=24.657..6245.744 rows=333334 loops=1)
Index Cond: (c = 1)
Storage Table Read Requests: 326
Storage Table Read Execution Time: 5971.567 ms
Storage Index Read Requests: 326
Storage Index Read Execution Time: 5.676 ms
Planning Time: 0.215 ms
Execution Time: 17552.637 ms
Storage Read Requests: 1716
Storage Read Execution Time: 15927.501 ms
Storage Write Requests: 0
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 15927.501 ms
Peak Memory Usage: 288687 kB
(36 rows)
The challenge with this approach is that, even though we are using the appropriate index for each condition, the combination of results occurs only at the end of the process. For conditions that are not very selective, this translates to retrieving a significant number of rows, including all selected columns, from the distributed storage to the PostgreSQL backend, only to discard them later in the query execution. This can lead to unnecessary data transfer and processing overhead.
A single non-selective condition can significantly impact overall performance, even when the other conditions are highly selective. This approach is clearly not suitable for achieving our objective of consistent and predictable performance.
Conditions combined with JOIN
I can employ the SQL join operation as a logical equivalent to INTERSECT. In this approach, I query all the necessary columns from my table and perform a self-join for each criterion:
explain (analyze, costs off, dist)
/*+ Set(yb_bnl_batch_size 1024) */
select demo.* from demo
join demo as demo_a using(id)
join demo as demo_b using(id)
join demo as demo_c using(id)
where demo_a.a=1 and demo_b.b=1 and demo_c.c=1
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
YB Batched Nested Loop Join (actual time=135.339..638.096 rows=7247 loops=1)
Join Filter: (demo_a.id = demo.id)
-> YB Batched Nested Loop Join (actual time=48.520..489.336 rows=7247 loops=1)
Join Filter: (demo_b.id = demo_a.id)
-> YB Batched Nested Loop Join (actual time=13.074..360.234 rows=14493 loops=1)
Join Filter: (demo_b.id = demo_c.id)
-> Index Only Scan using demo_b_id_idx on demo demo_b (actual time=4.565..17.808 rows=43479 loops=1)
Index Cond: (b = 1)
Heap Fetches: 0
Storage Index Read Requests: 43
Storage Index Read Execution Time: 4.426 ms
-> Index Only Scan using demo_c_id_idx on demo demo_c (actual time=7.034..7.115 rows=337 loops=43)
Index Cond: ((id = ANY (ARRAY[demo_b.id, $1, $2, ..., $1023])) AND (c = 1))
Heap Fetches: 0
Storage Index Read Requests: 1
Storage Index Read Execution Time: 6.795 ms
-> Index Only Scan using demo_a_id_idx on demo demo_a (actual time=7.519..7.635 rows=483 loops=15)
Index Cond: ((id = ANY (ARRAY[demo_c.id, $1025, $1026, ..., $2047])) AND (a = 1))
Heap Fetches: 0
Storage Index Read Requests: 1
Storage Index Read Execution Time: 7.266 ms
-> Index Scan using demo_pkey on demo (actual time=16.879..17.297 rows=906 loops=8)
Index Cond: (id = ANY (ARRAY[demo_a.id, $2049, $2050, ..., $3071]))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 16.246 ms
Planning Time: 2.173 ms
Execution Time: 639.000 ms
Storage Read Requests: 109
Storage Read Execution Time: 535.552 ms
Storage Write Requests: 0
Catalog Read Requests: 0
Catalog Write Requests: 0
Storage Flush Requests: 0
Storage Execution Time: 535.552 ms
Peak Memory Usage: 4351 kB
(35 rows)
What's particularly interesting, in contrast to INTERSECT, is that each index not only extracts the identifiers for its specific criterion but also reduces the scan to the list of primary keys obtained from the previous criterion. This showcases the significant advantage of the Nested Loop join, where the join condition is pushed down to the inner table scan. This is efficient only when a batch of values is pushed down, reason why I enabled Batched Nested Loops with a hint.
Since the indexes are covering the primary key, including id
, every scan is a fast Index Only Scan. Consequently, access to the table data only takes place in the final step when only the pertinent id
values remain for an access by primary key.
To Summarize...
To achieve the best performance with the JOIN execution plan in YugabyteDB 2.18, two parameters, which are not the default settings, can be configured either in the session or as a hint in the query:
-
yb_enable_optimizer_statistics
to start with the most selective index -
yb_bnl_batch_size
to reduce the number of loops from one index to the other. This optimization, visible in the execution plan asYB Batched Nested Loop Join
, is most effective whenyb_bnl_batch_size
is configured to its maximum value of 1024.
To benefit from this approach, it's advisable to create indexes for each criterion and include the primary key after the column on which we have a condition, to enable Index Only Scans (at least until #3574 is fixed). However, if a particular criterion exhibits very low selectivity, you may choose not to create an index for it, and the primary key will be used for the join in that case.
It's important to note that while the first approach, with all criteria in the same WHERE clause, may yield faster results for some queries, especially when one criterion is highly selective, the primary objective of the JOIN-based solution is to achieve consistent performance regardless of the combination of criteria. This ensures that your system maintains reliable and predictable performance levels across a wide range of query scenarios, and should avoid runaway queries for the worst cases.