Lesson goal

You will learn about some things that you might come across when discussing database administration with other engineers and DBAs.

Database course index

This is part of a full course on persistence in postgres with typeorm and sql!

There is a github repo to go with this course. See part 2 for instructions.

PgTune

This is a website that can help you to set the basic settings for a db based on available compute.

Check it out yourself at https://pgtune.leopard.in.ua/

Stored procedures

We tend to use procedural code in our application layer instead of stored procedures if using an ORM but it’s worth knowing that they’re there and are very useful.

A stored procedure is a procedural bit of code that’s stored on the database itself.

If you do your procedural code in a stored procedure it means that you have put business logic on your database which isn’t ideal.

But it does mean that you reduce round trips to the database which is always useful.

Stored procedures can also be used by any client of the database in the same way, rather than having multiple clients have slightly different implementations of a piece of code.

You create them with

create [or replace] procedure procedure_name(parameter_list)
language plpgsql
as $$
declare
-- variable declaration
begin
-- stored procedure body
end; $$

and to execute a stored procedure with type orm you use a query.

const result = await this.connection.query(
  'CALL myStoredProcedure (:param1value)',
  [param1value]
)

Triggers

You can set a database to perform an action automatically when something changes. E.g. on deletion of a record you can have the database log that somewhere else.

This is different to typeorms events which operate on changes to the object model. Triggers are on the database itself.

This is a powerful feature but generally not recommend. It leaks business logic into trigger definitions that are not part of your main code and will be difficult for other devs to find.

Triggers significantly increase cognitive load for engineers working on your application code.

Replication

This is a technique to improve database performance and provide some resiliency.

Replication is the process of creating a full copy of your dataset so more clients can use it. Consider if you have clients in USA and Australia. You can replicate the entire database in both locations so everyone gets the same speed for reads.

In the event of a disaster with one data base you know you have an instance of the database still running.

Sharding and partitioning

Partitioning is a technique that’s used to improve database performance where you split your data up to improve some metric or issue you’ve identified - performance or maintainability for example.

E.g. if a table grows too large to search you can horizontally partition it across multiple logic units by putting rows 1-100 million on one partition and 100-200 million on another partition. The database will be able to figure out which partition likely contains the item you want and will start looking there.

Another example of partitioning is vertical partitioning where you manually partition a wide table to be more normalised to improve performance. You might extract a large free text column to it’s own table to avoid selecting it for every query.

Sharding is a term for horizontal partitioning. Sharding specifically implies splitting data across different physical databases. Each shard would have the same schema, but a different set of data to the other shards.

Regional tenancy is an example of this horizontal partitioning/sharding, where you might put all USA customers on a USA database but all Australian customers on an Australian database.

The database knows where to look based on the shard key. In the case of a regional tenancy sharding process this would be “Region”.

If your application gets to the level of usage where you need to consider partitioning or sharding instead of increasing compute resources then you probably need a DBA.

A special case of partitioning that we always need to consider is when using document databases like dynamo db.

You have to set a partition key on a dynamoDB instance and it will be highly dependant on your access pattern. You should choose an key that will result in even distribution of your reads across partitions.

See https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/ for more details.

Database encryption

Encryption will come up in most database discussions. Clients will want to know that their data is secure.

Almost all managed instances of a database will store the data on disk using encryption. Most managed instances of a database will enforce ssl connections to the database so that you have end to end encryption.

If both of these items aren’t enforce by default then you should turn them on.

To use ssl in typeorm and postgres, use the extra params option in your database config and set it to true.

{
  "type": "postgres",
  "host": "host.docker.internal",
  "port": 5412,
  "username": "postgres",
  "password": "samplePassword",
  "database": "learn_databases",
  "migrationsTableName": "migrations",
  "migrationsRun": true,
  "logging": true,
  "synchronize": false,
  "entities": ["dist/**/*.entity.{ts,js}"],
  "migrations": ["dist/**/migrations/*.{ts,js}"],
  "cli": {
    "migrationsDir": "src/migrations"
  },
  "extra": {
    "ssl": true
  }
}

To connect to a postgres instance with ssl you might have to include a certificate from the provider. TypeORM facilitates this for postgres by allowing you to pass configuration data supported by the specific database. AWS RDS requires this.

{
  "type": "postgres",
  "host": "host.docker.internal",
  "port": 5412,
  "username": "postgres",
  "password": "samplePassword",
  "database": "learn_databases",
  "migrationsTableName": "migrations",
  "migrationsRun": true,
  "logging": true,
  "synchronize": false,
  "entities": ["dist/**/*.entity.{ts,js}"],
  "migrations": ["dist/**/migrations/*.{ts,js}"],
  "cli": {
    "migrationsDir": "src/migrations"
  },
  "extra": {
    "ssl": {
      "ca": "-----BEGIN CERTIFICATE----------END CERTIFICATE-----\n"
    }
  }
}

Backups!

Almost all managed database providers back up your data. This is different for each provider so make sure you understand what kind of back up frequency you have.

There are two measurements to consider here. The recovery point objective (RPO) and recovery time objective (RTO).

If you don’t have a replicated database sitting there waiting to take over from the main database, it might take you a bit of time to restore service from a backup. This results in a low recovery time. If you have a replica waiting there you can just switch to it and your RTO is minutes.

If your backups are every hour then you will have a maximum data loss of 1 hour. This is an RPO of 1 hour.

Which approach you need will be highly dependent on your application!

Summary

These are some common terms you will hear people mention when working on databases. You should have an understanding of why these activities might be required.

Managing a busy database is a skill that requires years of training and experience and should be left to an expert. If an application starts to get super successful you will need to hire a Database Administrator (DBA).

A glossary for database administration