Dirt simple SQL queries in F#

Kasey Speakman - Oct 3 '17 - Dev Community

SQL is something many of us have gone to great lengths to try to abstract away. The Repository pattern, ORMs, and Type Providers are examples of abstractions around SQL. But each of these has trade-offs which do not fit every use case.

Today I put forth something that is less of an abstraction and more of a set of helper functions. This has trade-offs too, but it fits my needs very well. See what you think.

What it's like to use

It doesn't try to hide SQL from you; it just tries to stay out of your way while you use it. Here's how you can define a query.

open QueryHelpers

let listCourses offset limit =
    sql
        """
        SELECT CourseId, CourseName
          FROM Course
        OFFSET @Offset ROWS
         FETCH
          NEXT @Limit ROWS ONLY
        ;
        """
        [
            p "Offset" offset
            p "Limit" limit
        ]

The helper functions in the above example are sql and p. Actually, p (short for parameter) is just a shortcut for making a tuple and boxing📌 the parameter value.

📌 Boxing parameter values

Boxing means converting something to type obj. For a primitive type like int, converting it to an obj has a memory/performance penalty. (Because the value has to be wrapped in a reference type and allocated on the GC heap, IIRC.)

At the time of this writing, ADO.NET requires parameter values to be boxed before it will use them. Since the values must be boxed anyway, doing it up front costs nothing extra, and it can prevent type inference problems in our F# code.
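
For illustration, boxing just converts a value to obj:

let n = 42            // int, a value type
let o : obj = box n   // boxed: wrapped in a heap-allocated reference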

Here's how you would run the query.

type Course =
    {
        CourseId : int
        CourseName : string
    }

let query = listCourses request.Offset request.Limit
let coursesAsync = Sql.query<Course> connectString query

// coursesAsync is Async<Course array>

Here, Sql.query<Course> is a function which runs the query and converts each data row into a Course object. As with most mappers, the property names and types have to match the returned columns.

Note that the query is created separately from the code which executes it. This is particularly handy for queries which perform updates. I can set up multiple writes ahead of time, even from separate pieces of code, then perform them later in the same transaction.

let deactivateCourse courseId =
    sql "UPDATE ..." [ p "CourseId" courseId ]

let cancelCourseRegistrations courseId =
    sql "DELETE ..." [ p "CourseId" courseId ]

...

let patches =
    [
        deactivateCourse courseId
        cancelCourseRegistrations courseId
    ]

...

Sql.writeBatch connectString patches

// returns Async<unit>


Implementing it

Here is the entire 40-line implementation for the usages above.

namespace Foo.MsSql

open System
open System.Data.SqlClient


type SqlQuery =
    {
        Query : string
        Parameters : (string * obj) list
    }


module QueryHelpers =

    let p name value =
        (name, box value)


    let sql query parameters =
        {
            Query = query
            Parameters = parameters
        }


module Sql =

    open Dapper  // adds QueryAsync and ExecuteAsync


    let query<'T> connectString q =
        async {
            use connection = new SqlConnection(connectString)
            do! connection.OpenAsync() |> Async.AwaitTask
            let! result =
                connection.QueryAsync<'T>(q.Query, dict q.Parameters)
                    |> Async.AwaitTask
            return Array.ofSeq result
        }


    let writeBatch connectString writes =
        async {
            use connection = new SqlConnection(connectString)
            do! connection.OpenAsync() |> Async.AwaitTask
            use transaction = connection.BeginTransaction()
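            // if any write below throws, the transaction is disposed
            // without Commit, which rolls the whole batch back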
            for write in writes do
                do!
                    connection.ExecuteAsync
                        (write.Query, dict write.Parameters, transaction)
                        |> Async.AwaitTask
                        |> Async.Ignore
            transaction.Commit()
        }


Dapper does the hard work of mapping rows to objects. Dapper is awesome.

This code is only a starting point, but it is easy to extend with new features. For instance, I have added the following over time, which I may cover in a later post:

  • returning multiple result sets from a query
  • more settings, such as command timeout (see the sketch after this list)
  • batching writes by combining them into one query, which differs from the above by making only one round-trip to the database
  • large result set handling, which returns a sequence that is not read into memory until iterated
  • a Postgres version
  • parameter types, mainly for the Postgres jsonb type, which Npgsql does not correctly infer
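
As a taste, here is a minimal sketch of the command timeout setting, assuming the same opens as the implementation above. The queryWithTimeout name is mine; the commandTimeout argument (in seconds) is part of Dapper's QueryAsync.

let queryWithTimeout<'T> connectString (timeoutSeconds : int) q =
    async {
        use connection = new SqlConnection(connectString)
        do! connection.OpenAsync() |> Async.AwaitTask
        let! result =
            // third argument: no transaction; fourth: Dapper's commandTimeout
            connection.QueryAsync<'T>(q.Query, dict q.Parameters, null, Nullable timeoutSeconds)
            |> Async.AwaitTask
        return Array.ofSeq result
    }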

I did have to add Dapper Type Handlers for option types. But with that, I can map null DB fields to F# Option, and avoid null handling.
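
Here is a sketch of such a handler, assuming open System and open Dapper are in scope. The OptionHandler name is mine; SqlMapper.TypeHandler and SqlMapper.AddTypeHandler are Dapper's API.

type OptionHandler<'T>() =
    inherit SqlMapper.TypeHandler<option<'T>>()

    // writing a parameter: unwrap Some to the raw value, None to DBNull
    override __.SetValue(param, value) =
        param.Value <-
            match value with
            | Some v -> box v
            | None -> box DBNull.Value

    // reading a column: NULL becomes None, anything else becomes Some
    override __.Parse(value) =
        if isNull value || value = box DBNull.Value
        then None
        else Some (value :?> 'T)

// register once at startup, for each wrapped type in use
SqlMapper.AddTypeHandler(OptionHandler<int>())
SqlMapper.AddTypeHandler(OptionHandler<string>())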

What about ...?

Getting only 1 row

Since the data is returned as an array, you can use F# Array functions like Array.tryHead to get the first row if one exists.
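
For example, reusing the Course type and helpers from above (the WHERE clause is illustrative):

let tryFindCourse connectString courseId =
    async {
        let q =
            sql "SELECT CourseId, CourseName FROM Course WHERE CourseId = @CourseId;"
                [ p "CourseId" courseId ]
        let! courses = Sql.query<Course> connectString q
        // Some course when a row matched, None otherwise
        return Array.tryHead courses
    }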

Getting a scalar

Dapper supports converting rows to primitives like int. So you can use Sql.query<int> and that will return the int value out of the first column for each row. Then you can use Array functions to get only the first row.
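
For example, a sketch of a count query using the helpers above:

let countCourses connectString =
    async {
        // Dapper maps the single COUNT(*) column of each row to int
        let! counts =
            Sql.query<int> connectString (sql "SELECT COUNT(*) FROM Course;" [])
        // COUNT(*) always returns exactly one row
        return Array.head counts
    }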

How far can this get you?

I use a few principles which make this (plus the additions I mention) satisfy all my SQL needs.

First, I separate decisions from side-effects. So my code returns messages representing what decisions and events happened as a result of the user's request. Later in the pipeline, these decisions are turned into side effects like saving to the database or sending emails.
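
In code, that shape might look like the sketch below, reusing the write helpers from earlier. The Decision type and the decide/execute names are hypothetical.

type Decision =
    | DeactivateCourse of int
    | CancelRegistrations of int

// pure decision logic: returns what should happen, performs no side effects
let decide courseId =
    [
        DeactivateCourse courseId
        CancelRegistrations courseId
    ]

// later in the pipeline: interpret decisions as SQL writes in one transaction
let execute connectString decisions =
    decisions
    |> List.map (function
        | DeactivateCourse id -> deactivateCourse id
        | CancelRegistrations id -> cancelCourseRegistrations id)
    |> Sql.writeBatch connectString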

Second, I model the data multiple ways. The first model is the authoritative model (source of truth) on which the others are based. This could be an event log or normalized relational tables. This is where most people stop when using a relational database. However, I additionally create models which are tailor-fit for specific purposes.

For example, I store a Course's data (which has several layers of nested objects) as a JSON object, since it is convenient to load and save it that way for Add/Edit/View scenarios. I have a separate model (table) for CourseSearch which consists of some of the same data, but as relational columns. These columns are used for filtering and sorting and populating a text search index, to enable users to perform Full Text Searches. Both models are useful for their specific purposes. This is Separation of Concerns at the data level. But if I tried to squeeze both into the same mega-model, it would be hard to work with. Because I'd always have to think of the mega-model from two different dimensions at once in order to make the right choices about it. My dimensionally-challenged brain can't do that consistently.
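
Keeping the two models in sync is then just another write batch. Here is a sketch; the table and column names are illustrative.

let saveCourse courseId courseJson courseName =
    [
        // authoritative model: the whole course as one JSON document
        sql "UPDATE Course SET Data = @Data WHERE CourseId = @CourseId;"
            [ p "CourseId" courseId; p "Data" courseJson ]

        // search model: flattened columns for filtering, sorting, text search
        sql "UPDATE CourseSearch SET CourseName = @CourseName WHERE CourseId = @CourseId;"
            [ p "CourseId" courseId; p "CourseName" courseName ]
    ]

// both updates commit or roll back together:
//   saveCourse courseId json name |> Sql.writeBatch connectString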

Third, as a rule, I do not have queries which both change data and return data. It's either/or. That way I can't introduce bugs due to unanticipated side effects. As a consequence, I cannot use auto-generated IDs. But in the end that turns out to be ok. Because they complicate other features, like retries. And they have to be abandoned (or outsourced) anyway once you outgrow a single server.
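
For example, the ID can be generated client-side, so the insert returns nothing and is safe to retry. A sketch (earlier examples used int IDs, so the Guid here is purely illustrative):

let createCourse courseName =
    // the caller knows the ID before the write ever runs
    let courseId = Guid.NewGuid()
    courseId,
    sql "INSERT INTO Course (CourseId, CourseName) VALUES (@CourseId, @CourseName);"
        [ p "CourseId" courseId; p "CourseName" courseName ]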

Due to the above, my queries stay relatively simple and targeted for a specific purpose. So a simple SQL API meets all my needs.

"for each desired change, make the change easy (warning: this may be hard), then make the easy change"

  – Kent Beck