SQL Performance: A Deep Dive into Indexing with examples

Firoj Mujawar - Oct 2 - - Dev Community

Most of the times we create a single-column index for join columns, but in practical I experienced the power of multiple column index which is called composite indexes.
There was a table with 10+ millions of records with approximate 3gb size, and query was already optimized to return response within 900ms.
But with the power of composite indexes I reduced response time to 50ms. This came after so many trials and errors, and I am glad to put this as a blog on internet.

We will dive into:

  • Single-column indexes and how they work.
  • An overview of B-tree, GIN, and GiST indexes.
  • The power of composite indexes, partial composite indexes, and how prefix columns work in composite indexes.

1. Single-Column Indexes

A single-column index is the simplest form of an index, created on just one column. It works well for queries that frequently filter or sort based on that column alone.

Example:

Let’s say we have a table employees:

CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary NUMERIC
);
Enter fullscreen mode Exit fullscreen mode

Now, if we frequently query based on the department column:

SELECT * FROM employees WHERE department = 'Engineering';
Enter fullscreen mode Exit fullscreen mode

We can create a single-column B-tree index to speed up this query:

CREATE INDEX idx_department ON employees (department);
Enter fullscreen mode Exit fullscreen mode

PostgreSQL will use this index whenever you query the department column, speeding up lookups by reducing the number of rows scanned.

2. B-Tree Indexes

The B-tree (Balanced Tree) index is the default and most commonly used index type in PostgreSQL. It is optimal for queries that involve comparison operators (=, <, >, <=, >=), as it efficiently organizes data in sorted order.

When to Use B-tree:
  • Exact matches (=).
  • Range queries (e.g., salary > 50000).
  • Sorting and grouping.
Example:
CREATE INDEX idx_salary ON employees (salary);
Enter fullscreen mode Exit fullscreen mode

Now, any query filtering or sorting by salary will be optimized:

SELECT * FROM employees WHERE salary > 60000;
Enter fullscreen mode Exit fullscreen mode

3. GIN and GiST Indexes

GIN (Generalized Inverted Index) and GiST (Generalized Search Tree) indexes are specialized for more complex data types like arrays, full-text search, and geometric types. We should stick to B-tree index for primitive data types as B-tree will outperform GIN and GiST for primitives. Usage is as below:

  • GIN is ideal for indexing array values or documents (e.g., full-text search, JSONB fields).
  • GiST is better suited for geometric data types and data that doesn't fit into a strict linear order, such as ranges or network types.
Example:

For a table with a JSONB field containing customer data:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    info JSONB
);
Enter fullscreen mode Exit fullscreen mode

A GIN index will optimize searches within the JSONB column:

CREATE INDEX idx_info ON customers USING gin (info);
Enter fullscreen mode Exit fullscreen mode

Now, querying within the JSONB field becomes efficient:

SELECT * FROM customers WHERE info @> '{"status": "active"}';
Enter fullscreen mode Exit fullscreen mode

4. Composite Indexes

A composite index is an index on multiple columns, and it can dramatically improve performance for queries involving multiple conditions.
When filtering is to be done on multiple columns in where clause and order by clause, composite indexes outperform single-column index with significant margin. Composite index uses prefix columns if not all the columns used in where clause for that we'll have to design composite index properly. More about prefix columns in this blog ahead.

Example:

For a table orders:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    status TEXT
);
Enter fullscreen mode Exit fullscreen mode

If you often query by both customer_id and order_date, a composite index can help:

CREATE INDEX idx_customer_order ON orders (customer_id, order_date);
Enter fullscreen mode Exit fullscreen mode

With this index, queries filtering by both customer_id and order_date are optimized:

SELECT * FROM orders WHERE customer_id = 1001 AND order_date > '2024-01-01';
Enter fullscreen mode Exit fullscreen mode
Composite Index Usage with Prefix Columns

When using composite indexes, PostgreSQL can efficiently use the prefix columns (the leading columns) even if not all the indexed columns are included in the query. For example, in the composite index (customer_id, order_date), queries filtering by customer_id alone can still use the index.

SELECT * FROM orders WHERE customer_id = 1001;
Enter fullscreen mode Exit fullscreen mode

However, PostgreSQL cannot use this index efficiently if the query filters only by order_date because customer_id is the leading column.

5. Partial Composite Indexes

A partial index is a powerful optimization technique in PostgreSQL, where the index is created on a subset of the table's data. This can reduce the size of the index and make lookups faster by indexing only the relevant rows.

Example:

Let’s say you only care about orders that are active:

CREATE INDEX idx_active_orders ON orders (customer_id, order_date) WHERE status = 'active';
Enter fullscreen mode Exit fullscreen mode

This index will optimize queries that filter by customer_id, order_date, and status = 'active'.

SELECT * FROM orders WHERE customer_id = 1001 AND order_date > '2024-01-01' AND status = 'active';
Enter fullscreen mode Exit fullscreen mode

By narrowing the index to active orders only, PostgreSQL can skip irrelevant rows, improving performance.

6. Single-Column vs Composite Indexes

The choice between single-column and composite indexes depends on your query patterns.

  • Single-column indexes are optimal when your queries focus on individual columns.
  • Composite indexes are better when your queries often filter or sort by multiple columns.
Key Point: Index Prefix Columns

In a composite index, PostgreSQL can use the index efficiently if the query involves leading columns (prefix columns). If your query only involves non-prefix columns, PostgreSQL cannot use the index optimally.

Example:

For a composite index on (customer_id, order_date, status):

  • Queries using just customer_id will benefit from the index.
  • Queries using customer_id and order_date will also benefit.
  • But queries using only status or order_date will not benefit.

7. B-tree vs GIN vs GiST: Which to Use?

  • B-tree is the go-to for most types of queries, especially those involving comparisons or sorting.
  • GIN is best for full-text search, arrays, and documents like JSONB.
  • GiST is used for geometric data, range queries, and other complex types.

Conclusion

Indexes are an essential tool for optimizing query performance in PostgreSQL, but choosing the right type of index can be a nuanced decision based on the nature of your queries and the data. Whether it’s single-column indexes for simplicity, composite indexes for multi-column queries, or GIN/GiST indexes for specialized data types, a well-thought-out indexing strategy can dramatically reduce query times.

For large tables and frequent queries, understanding the concepts of prefix columns and partial indexes is crucial. A carefully designed index can shave off milliseconds or even seconds from query performance — and over time, that difference adds up.

Happy optimizing!

.
Terabox Video Player