This post is about a lesser known 'feature' of postgres: the loading or creation of the pg_internal.init files (plural) that happens when logging on to a postgres database. Because YugabyteDB reuses the postgres source, YugabyteDB performs dealing with the pg_internal.init files identical to postgres.

Let me first talk about why I am performing this writeup. During testing I noticed the first connection to postgres taking more time than successive logons. This is easy to miss, because when I tested again, I didn't see it anymore. Then when the postmaster was restarted, the very first connection showed higher latency again. This means that in specific cases, which is if these deal with databases that are accessed just after postmaster startup, there might be extra work performed that increases logon latency. This is important for doing automated tests on newly build or created databases.

Before diving into the details: is this important for concurrent postgres/YugabyteDB access, and does it influence regular database activity: no(!)

The phenomenon of performing extra work to create the pg_internal.init files only happens with the very first backend to logon to a database on a given postmaster. So, why this writeup, where is this important for? This is important for latency testing, to understand this clear outlier.

Closely related to the latency differences for logging on: the first connection latency is influenced by the number of objects in the database, and can get significant with high (relative to processing power ~ > 1000) number of objects, where YugabyteDB is more sensitive to this than stock postgres, because YugabyteDB loads the catalog data from the master leader, whilst stock postgres loads it from local files or the buffer cache.

So, what is happening here?
During the logon to a database in postgres, the backend is created by forking the postmaster, and the backend builds caches in its private memory. To be more precise: each individual backend creates caches with the needed minimal catalog data into its private memory so it has just enough catalog data to function.

As part of the backend creation, it creates a cache called 'relcache' or relation cache, which contains relation descriptors. These descriptors are quite comprehensive, which is why the very first session to enter a database after postmaster startup will gather the global relcache information, and gather the local, database specific relcache information, and write these to a file in memory format. This way, any following session can use the files to load its relcache, without the need to perform the work to obtain all the information for each relation in the relcache.

Again: if you have a normal functioning postgres or YugabyteDB database cluster, this phenomenon should not influence normal, daily operations. However, in specific situations, such as carefully timing logon after builds, this is very important to understand, because the first logon will have a higher latency.

To see the effect of the pg_internal.init files creation, you can use the [connection-profiler project].(https://github.com/fritshoogland-yugabyte/connection-profiler.git).

install connection-profiler

In order to install connection-profiler, you need rust to compile the connection-profiler, and you need git to clone the connection-profiler project.

Get git and gcc

sudo yum install git gcc -y

Clone the connection-profiler repository

git clone https://github.com/fritshoogland-yugabyte/connection-profiler.git

Get rust

curl https://sh.rustup.rs -sSf | sh -s -- -y

Source the rust environment

source "$HOME/.cargo/env"

Go into the connection-profiler directory and build a production (release) binary:

cd connection-profiler
cargo build --release

Once finished, the executable is in the target/release directory with the name connection-profiler.

PostgreSQL

To test with postgres startup the postmaster. Like hopefully is clear from the description: do not attach with any client before the testing, or that client will have performed the initialisation of the pg_internal.init files.

Now run the connection-profiler by performing a number (5) connections, not executing any SQL:

[postgres@localhost connection-profiler]$ ./target/release/connection-profiler -u "user=postgres password=postgres port=5432 host=localhost" -q '' -n 5
create_connection                              5001 us
------------------------------------------------------------
============================================================
create_connection                              2724 us
------------------------------------------------------------
============================================================
create_connection                              2523 us
------------------------------------------------------------
============================================================
create_connection                              2365 us
------------------------------------------------------------
============================================================
create_connection                              2380 us
------------------------------------------------------------
============================================================

Clearly, the first connection is slower than the other connections. If the same sequence of connecting 5 times is repeated, the first connection does not show the higher latency again:

[postgres@localhost connection-profiler]$ ./target/release/connection-profiler -u "user=postgres password=postgres port=5432 host=localhost" -q '' -n 5
create_connection                              2602 us
------------------------------------------------------------
============================================================
create_connection                              3074 us
------------------------------------------------------------
============================================================
create_connection                              2799 us
------------------------------------------------------------
============================================================
create_connection                              2319 us
------------------------------------------------------------
============================================================
create_connection                              2503 us
------------------------------------------------------------
============================================================

YugabyteDB

YugabyteDB uses the postgres source-code, and there behaves identical. However, the principal difference of YugabyteDB versus postgres is that the storage is changed to networked sources. This gives a lot of flexibility and scalability, but currently means extra latency is introduced. This is how the latencies look like on a small, local virtual machine based cluster:

[vagrant@yb-1 connection-profiler]$ ./target/release/connection-profiler -u "user=yugabyte password=yugabyte port=5433 host=localhost" -q '' -n 5
create_connection                            187654 us
------------------------------------------------------------
============================================================
create_connection                             22404 us
------------------------------------------------------------
============================================================
create_connection                             17495 us
------------------------------------------------------------
============================================================
create_connection                             17815 us
------------------------------------------------------------
============================================================
create_connection                             18213 us
------------------------------------------------------------
============================================================

Redoing the repeated connections gives more consistent response times:

[vagrant@yb-1 connection-profiler]$ ./target/release/connection-profiler -u "user=yugabyte password=yugabyte port=5433 host=localhost" -q '' -n 5
------------------------------------------------------------
============================================================
create_connection                             18996 us
------------------------------------------------------------
============================================================
create_connection                             19581 us
------------------------------------------------------------
============================================================
create_connection                             21980 us
------------------------------------------------------------
============================================================
create_connection                             21924 us
------------------------------------------------------------
============================================================

As indicated: the higher latency is something we are working on, on making it better, although it's not extremely far off of native postgres latency.

The first connection above which builds the pg_internal.init files will request catalog data to build these files, which is done to its proxy thread in its local tablet server, which then performs an RPC request to the current master leader for the actual catalog data.

Postgres first database connection latency

install connection-profiler

PostgreSQL

YugabyteDB