In SQL databases, isolation levels are a hierarchy of update anomaly prevention. Then, people think that the higher is the better, and that when a database provides Serializable there's no need for Read Committed. However:

Read Committed is the default in PostgreSQL. The consequence is that the majority of applications are using it (and use SELECT ... FOR UPDATE) to prevent some anomalies
Serializable doesn't scale with pessimistic locking. Distributed databases use optimistic locking, and you need to code their transaction retry logic

With those two, a distributed SQL database that doesn't provide Read Committed isolation cannot claim PostgreSQL compatibility, because running applications that were build for PostgreSQL defaults is impossible.

YugabyteDB started with the "the higher the better" idea and Read Committed is transparently using "Snapshot Isolation". This is correct for new applications. However, when migrating applications built for Read Committed, where you don't want to implement a retry logic on serializable failures (SQLState 40001), and expect the database to do it for you. You can switch to Read Committed with the **yb_enable_read_committed_isolation** gflag.

Note: a GFlag in YugabyteDB is a global configuration parameter for the database, documented in yb-tserver reference. The PostgreSQL parameters, which can be set by the ysql_pg_conf_csv GFlag concern only the YSQL API but GFlags covers all YugabyteDB layers

In this blog post I'll demo the real value of Read Committed isolation level: there's no need to code a retry logic because, at this level, YugabyteDB can do it itself.

Start YugabyteDB

I am starting a YugabyteDB single node database for this simple demo:



Franck@YB:~ $ docker  run --rm -d --name yb       \
 -p7000:7000 -p9000:9000 -p5433:5433 -p9042:9042  \
 yugabytedb/yugabyte                              \
 bin/yugabyted start --daemon=false               \
 --tserver_flags=""

53cac7952500a6e264e6922fe884bc47085bcac75e36a9ddda7b8469651e974c

I explicitly didn't set any GFlags to show the default behaviour. This is version 2.13.0.0 build 42.

I check the read committed related gflags



Franck@YB:~ $ curl -s http://localhost:9000/varz?raw | grep -E "\
(yb_enable_read_committed_isolation\
|ysql_output_buffer_size\
|ysql_sleep_before_retry_on_txn_conflict\
|ysql_max_write_restart_attempts\
|ysql_default_transaction_isolation\
)"

--yb_enable_read_committed_isolation=false
--ysql_max_write_restart_attempts=20
--ysql_output_buffer_size=262144
--ysql_sleep_before_retry_on_txn_conflict=true
--ysql_default_transaction_isolation=

Read Committed is the default isolation level, by PostgreSQL compatibility:



Franck@YB:~ $ psql -p 5433 \
-c "show default_transaction_isolation"

 default_transaction_isolation
-------------------------------
 read committed
(1 row)

I create a simple table:



Franck@YB:~ $ psql -p 5433 -ec "
create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;
"

create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;

INSERT 0 100000

I'll run the following update, setting the default isolation level to Read Committed (just in case - but it is the default):



Franck@YB:~ $ cat > update1.sql <<'SQL'
\timing on
\set VERBOSITY verbose
set default_transaction_isolation to "read committed";
update demo set val=val+1 where id=1;
\watch 0.1
SQL

This will update one row.
I'll run this from multiple sessions, on the same row:



Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session1.txt &
Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session2.txt &
[1] 760
[2] 761

psql:update1.sql:5: ERROR:  40001: Operation expired: Transaction a83718c8-c8cb-4e64-ab54-3afe4f2073bc expired or aborted by a conflict: 40001
LOCATION:  HandleYBStatusAtErrorLevel, pg_yb_utils.c:405

[1]-  Done                    timeout 60 psql -p 5433 -ef update1.sql > session1.txt

Franck@YB:~ $ wait

[2]+  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session1.txt

On session encountered Transaction ... expired or aborted by a conflict. If you run the same several times, you may also get Operation expired: Transaction aborted: kAborted, All transparent retries exhausted. Query error: Restart read required or All transparent retries exhausted. Operation failed. Try again: Value write after transaction start. They are all ERROR 40001 which are serialization errors that expect the application to retry.

In Serializable, the whole transaction must be retried, and this is generally not possible to do transparently by the database, that doesn't know what else the application did during the transaction. For example, some rows may have already been read, and sent to the user screen or a file. The database cannot rollback that. The applications must handle that.

I've set \Timing on to get the elapsed time and, as I'm running this on my laptop, there's not client-server network significant time:



Franck@YB:~ $ awk '/Time/{print 5*int($2/5)}' session?.txt | sort -n | uniq -c

    121 0
     44 5
     45 10
     12 15
      1 20
      1 25
      2 30
      1 35
      3 105
      2 110
      3 115
      1 120

Most updates were less than 5 millisecond here. But remember that the program failed on 40001 quickly so this is the normal one-session workload on my laptop.

By default yb_enable_read_committed_isolation is false and in this case the Read Committed isolation level of YugabyteDB's transactional layer falls back to the stricter Snapshot Isolation (in which case READ COMMITTED and READ UNCOMMITTED of YSQL use Snapshot Isolation).

yb_enable_read_committed_isolation=true

Now changing this setting, which is what you should do when you want to be compatible with your PostgreSQL application that doesn't implement any retry logic.



Franck@YB:~ $ docker rm -f yb

yb
[1]+  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session1.txt

Franck@YB:~ $ docker  run --rm -d --name yb       \
 -p7000:7000 -p9000:9000 -p5433:5433 -p9042:9042  \
 yugabytedb/yugabyte                \
 bin/yugabyted start --daemon=false               \
 --tserver_flags="yb_enable_read_committed_isolation=true"

fe3e84c995c440d1a341b2ab087510d25ba31a0526859f08a931df40bea43747

Franck@YB:~ $ curl -s http://localhost:9000/varz?raw | grep -E "\
(yb_enable_read_committed_isolation\
|ysql_output_buffer_size\
|ysql_sleep_before_retry_on_txn_conflict\
|ysql_max_write_restart_attempts\
|ysql_default_transaction_isolation\
)"

--yb_enable_read_committed_isolation=true
--ysql_max_write_restart_attempts=20
--ysql_output_buffer_size=262144
--ysql_sleep_before_retry_on_txn_conflict=true
--ysql_default_transaction_isolation=

Running the same as above:



Franck@YB:~ $ psql -p 5433 -ec "
create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;
"

create table demo (id int primary key, val int);
insert into demo select generate_series(1,100000),0;

INSERT 0 100000

Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session1.txt &
Franck@YB:~ $ timeout 60 psql -p 5433 -ef update1.sql >session2.txt &
[1] 1032
[2] 1034

Franck@YB:~ $ wait

[1]-  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session1.txt
[2]+  Exit 124                timeout 60 psql -p 5433 -ef update1.sql > session2.txt

I got no error at all, and both sessions have been updating the same row during 60 seconds.

Of course, it wasn't exactly at the same time as the database had to retry many transactions, which is visible in the elapsed time:



Franck@YB:~ $ awk '/Time/{print 5*int($2/5)}' session?.txt | sort -n | uniq -c

    325 0
    199 5
    208 10
     39 15
     11 20
      3 25
      1 50
     34 105
     40 110
     37 115
     13 120
      5 125
      3 130

While most of transactions are still less than 10 milliseconds, some when to 120 milliseconds because of retries.

retry backoff

A common retry waits an exponential amount of time between each retries, up to a maximum. This is what is implemented in YugabyteDB and the 3 following parameters, that can be set at session level, controls it:



Franck@YB:~ $ psql -p 5433 -xec "
select name, setting, unit, category, short_desc
from pg_settings
where name like '%retry%backoff%';
"

select name, setting, unit, category, short_desc
from pg_settings
where name like '%retry%backoff%';

-[ RECORD 1 ]---------------------------------------------------------
name       | retry_backoff_multiplier
setting    | 2
unit       |
category   | Client Connection Defaults / Statement Behavior
short_desc | Sets the multiplier used to calculate the retry backoff.
-[ RECORD 2 ]---------------------------------------------------------
name       | retry_max_backoff
setting    | 1000
unit       | ms
category   | Client Connection Defaults / Statement Behavior
short_desc | Sets the maximum backoff in milliseconds between retries.
-[ RECORD 3 ]---------------------------------------------------------
name       | retry_min_backoff
setting    | 100
unit       | ms
category   | Client Connection Defaults / Statement Behavior
short_desc | Sets the minimum backoff in milliseconds between retries.

With my local database, transactions are short and I don't have to wait so much time. When adding set retry_min_backoff to 10; to my update1.sql the elapsed time is not inflated too much by this retry logic:



Franck@YB:~ $ awk '/Time/{print 5*int($2/5)}' session?.txt | sort -n | uniq -c

    338 0
    308 5
    302 10
     58 15
     12 20
      9 25
      3 30
      1 45
      1 50

yb_debug_log_internal_restarts

The restarts are transparent. If you want to see the reason for restarts, or the reason why it is not possible, you can get it logged with yb_debug_log_internal_restarts=true



# log internal restarts
export PGOPTIONS='-c yb_debug_log_internal_restarts=true'

# run concurrent sessions
timeout 60 psql -p 5433 -ef update1.sql >session1.txt &
timeout 60 psql -p 5433 -ef update1.sql >session2.txt &

# tail the current logfile
docker exec -i yb bash <<<'tail -F $(bin/ysqlsh -twAXc "select pg_current_logfile()")'

Versions

This was implemented in YugabyteDB 2.13 and I'm using 2.13.1 here. It is not yet implemented when running the transaction from DO or ANALYZE commands, but works for procedures. You can follow and comment issue #12254 if you want it in DO or ANALYZE.

https://github.com/yugabyte/yugabyte-db/issues/12254

In conclusion

Implementing retry logic in the application is not a fatality but a choice in YugabyteDB. A distributed database may raise restart errors because of clock skew, but still needs to make it transparent to SQL applications when possible.

If you want to prevent all transactions anomalies (see this one as an example), you can run in Serializable and handle the 40001 exception. Don't be fooled by the idea that it requires more code because, without it, you need to test all race conditions, which may be a bigger effort. In Serializable, the database ensures that you have the same behavior than running serially so that your unit tests are sufficient to guarantee correctness of data.

However, with an existing PostgreSQL application, using the default isolation level, the behavior is validated by years of running in production. What you want is not avoiding the possible anomalies, because the application probably workaround them. You want to scale-out without changing the code. This is where YugabyteDB provides the Read Committed isolation level which requires no additional error handling code.

Read Committed is a must for Postgres-compatible distributed SQL databases