PostgreSQL Insights: Understanding MVCC in Transactions

0. Introduction

This article delves into the Multi-Version Concurrency Control (MVCC) mechanism commonly implemented in transaction modules. It explores the advantages and disadvantages of MVCC, along with its specific implementation in PostgreSQL.

1. Introduction to MVCC

MVCC (Multi-Version Concurrency Control) is a method utilized in database management systems for handling concurrent access. Traditional concurrency control often relies on locking, ensuring that only one transaction can modify data at a time. However, this approach can lead to reduced concurrency and performance bottlenecks, especially in high-concurrency scenarios. MVCC addresses some limitations of the traditional locking mechanism by maintaining multiple versions of data. Each transaction sees a specific version, allowing read and write operations to execute without interfering with each other. Essentially, when a modification is made, a new version of the data is created instead of altering the original data directly. Other transactions can still access the old data, enhancing concurrency. Despite its benefits, MVCC can consume more memory due to multiple versions, particularly in high-concurrency environments.

2. Common Implementations of MVCC

There are two prevalent approaches to implementing MVCC:

Backup Before Modifying: Before modifying the old data, it is backed up to a separate space while the new data is written. Other transactions can read from this backup space, such as the rollback segment in MySQL's InnoDB engine.
Insert Instead of Modify: Instead of modifying the existing data, new data is inserted.

Both methods achieve MVCC but require additional space. Comparing the two, the second method simplifies transaction rollback and avoids running out of backup space. The first method, however, simplifies cleanup and prevents data scans from increasing data reads. PostgreSQL adopts the second approach, implementing MVCC through insertion.

3. MVCC Implementation in PostgreSQL

3.1 Visibility Determination

For MVCC to work, the concept of data versioning must be defined. PostgreSQL's definition is as follows:

typedef struct HeapTupleFields {
    TransactionId t_xmin;    /* inserting xact ID */
    TransactionId t_xmax;    /* deleting or locking xact ID */

    union {
        CommandId  t_cid;    /* inserting or deleting command ID, or both */
        TransactionId t_xvac;  /* old-style VACUUM FULL xact ID */
    } t_field3;
} HeapTupleFields;

struct HeapTupleHeaderData {
    union {
        HeapTupleFields t_heap;
        DatumTupleFields t_datum;
    } t_choice;

    ItemPointerData t_ctid;    /* current TID of this or newer tuple (or a speculative insertion token) */
    ...
};

In this structure, each tuple's header stores the transaction ID for data insertion (t_xmin) and deletion or update (t_xmax). A zero t_xmax value indicates that the data hasn't been deleted or updated. These values are immutable once set.

Consider an example where a record with a=2 and b=2 was inserted by a transaction with ID 10 (t_xmin=10). Initially, t_xmax=0, meaning it hasn't been deleted or updated. An UPDATE statement changes a to 6, setting t_xmax to 11 (indicating that transaction 11 deleted it), and creates a new record (without modifying the original one). Although two records exist, visibility must be determined using transaction snapshots and commit logs.

PostgreSQL uses snapshots to determine which transactions are currently active. A transaction's updates and writes are invisible to others if it hasn't completed. Snapshot data structure:

typedef struct SnapshotData {
    SnapshotSatisfiesFunc satisfies;  /* tuple test function */
    TransactionId xmin;      /* all XID < xmin are visible */
    TransactionId xmax;      /* all XID >= xmax are invisible */

    TransactionId *xip;
    uint32 xcnt;      /* # of xact IDs in xip[] */

    TransactionId *subxip;
    int32 subxcnt;    /* # of xact IDs in subxip[] */
    bool suboverflowed;  /* has the subxip array overflowed? */

    bool takenDuringRecovery;  /* recovery-shaped snapshot? */
    bool copied;      /* false if it's a static snapshot */

    CommandId curcid;      /* in my xact, CID < curcid are visible */

    uint32 speculativeToken;

    uint32 active_count;  /* refcount on ActiveSnapshot stack */
    uint32 regd_count;    /* refcount on RegisteredSnapshots */
    pairingheap_node ph_node;  /* link in the RegisteredSnapshots heap */

    TimestampTz whenTaken;    /* timestamp when snapshot was taken */
    XLogRecPtr lsn;      /* position in the WAL stream when taken */
} SnapshotData;

Transactions with IDs less than xmin are visible; those with IDs greater than or equal to xmax are invisible. Transactions between xmin and xmax might still be active, requiring an array (xip) to track them. If found in xip, the transaction is ongoing and invisible.

When fetching data, it first checks the snapshot to see if the transaction is complete. If not, it's invisible. If complete, it checks whether it was committed or aborted by querying the commit log (CLOG). For performance, PostgreSQL also employs tuple header flags to reduce CLOG queries.

3.2 Commit/Abort

In PostgreSQL, a transaction can end in one of two states: Commit or Abort:

Commit: Writes to the Write-Ahead Log (WAL) and CLOG on commit. Post-commit, the transaction's changes are visible to others.
Abort: Writes to WAL and CLOG on abort. Post-abort, the transaction's changes remain invisible to others.