Idempotent seeds in Elixir

bitcrowd - Mar 14 - Dev Community

A standard Phoenix app ships with a priv/repo/seeds.exs script that populates the database when it is run, so that developers can work in a conveniently prepared environment.

At bitcrowd, we like our seeds to be idempotent. In practice, this means that running mix run priv/repo/seeds.exs multiple times does not create new rows on every run, but upserts the existing ones instead. We could always drop and rebuild the local database with mix ecto.reset, but often we want to keep the current state of our development database.

Upsert all the things!

Our strategy is to call Repo.insert!/2 with the on_conflict and conflict_target options, using the :id primary key as the conflict target. Let's take a classic blog app with Post and Comment schemas (with UUID primary keys) as an example. In our seeds.exs file, we add this little helper:

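# Inside a module such as MyApp.Seeds, with `alias MyApp.Repo`.
# Inserts the struct, or overwrites every column of the existing row with the same :id.
def insert_idempotently(struct) do
  Repo.insert!(struct, on_conflict: :replace_all, conflict_target: :id)
end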

And instead of inserting a seed record like so:

Repo.insert!(%Post{title: "Hello World!"})

We wrap our insert in the helper:

id = "64af9d13-0f60-45fc-971f-07e6b490c059"
insert_idempotently(%Post{id: id, title: "Hello World!"})

Nice 🎉

The next time we run the seeds, no second Post row is created: a post with this :id already exists, so the insert turns into an update of that row. This keeps our development environment neat and clean.
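To make the idempotency property concrete without a database, here is a toy model: a map keyed by :id stands in for the posts table, and an upsert overwrites the entry for an existing id instead of adding a new one, much like on_conflict: :replace_all with conflict_target: :id. ToyTable and its functions are hypothetical, purely for illustration.

```elixir
defmodule ToyTable do
  # Hypothetical stand-in for a table: a map from primary key to row.
  # Map.put/3 overwrites an existing key, just like an upsert on :id.
  def upsert(table, %{id: id} = row), do: Map.put(table, id, row)

  # A stand-in for running seeds.exs once against the "table".
  def run_seeds(table) do
    upsert(table, %{id: "64af9d13-0f60-45fc-971f-07e6b490c059", title: "Hello World!"})
  end
end

once = ToyTable.run_seeds(%{})
twice = ToyTable.run_seeds(once)

IO.inspect(map_size(twice)) # 1: the second run updated the row instead of adding one
IO.inspect(once == twice)   # true: running the seeds again changes nothing
```

Running the seeds twice leaves the table exactly as a single run does, which is all "idempotent" means here.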

Idempotency for has_many associations

While the previous example is fairly simple, hardcoding UUIDs quickly hits its limits when seeding has_many associations. For example, let's say we want to insert 50 comments associated with the post:

id = "64af9d13-0f60-45fc-971f-07e6b490c059"
post = insert_idempotently(%Post{id: id, title: "Hello World!"})

Enum.each(1..50, fn index ->
  insert_idempotently(%Comment{
    id: ???, # which UUID goes here?
    message: "Comment #{index}",
    post_id: post.id
  })
end)

Of course, we could hardcode 50 UUIDs… Or could we generate deterministic UUIDs from the index value instead? Yes we can ⚡️!

Deterministic UUID v4 from a string

Our deterministic UUID generator takes a string as its argument and always returns the same UUID for the same string. To get there, we first hash the string, then extract the 128 bits a UUID needs from the digest (SHA-256 gives us 256, so there are plenty). UUIDs have a consistent textual structure, as in "64af9d13-0f60-45fc-971f-07e6b490c059": one group of 8 hexadecimal characters, then three groups of 4, and finally a group of 12, all separated by a hyphen. Since each hexadecimal character encodes 4 bits, these groups correspond to chunks of 32, 16, 16, 16 and 48 bits.
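The arithmetic behind those group sizes is easy to check in iex: split the example UUID on hyphens and multiply each group's length by 4 bits per hexadecimal character. The last line confirms that a SHA-256 digest is long enough to supply all 128 bits.

```elixir
uuid = "64af9d13-0f60-45fc-971f-07e6b490c059"

# Length of each hyphen-separated group, in hexadecimal characters
group_lengths = uuid |> String.split("-") |> Enum.map(&String.length/1)

# Each hexadecimal character encodes 4 bits
group_bits = Enum.map(group_lengths, &(&1 * 4))

IO.inspect(group_lengths)        # [8, 4, 4, 4, 12]
IO.inspect(group_bits)           # [32, 16, 16, 16, 48]
IO.inspect(Enum.sum(group_bits)) # 128

# A SHA-256 digest is 256 bits long, twice what a UUID needs
IO.inspect(bit_size(:crypto.hash(:sha256, "foo"))) # 256
```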

This is how it works:

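# At the top of the module: bor/2 and band/2 come from Bitwise
import Bitwise

def deterministic_uuid4(string) do
  # Hash the string and take its first 128 bits, split into chunks
  # matching the lengths of the UUID's character groups (32/16/16/16/48 bits)
  <<a::size(32), b::size(16), c::size(16), d::size(16), e::size(48), _rest::binary>> =
    :crypto.hash(:sha256, string)

  # Override some bits (necessary to create a valid UUID v4):
  # the version nibble becomes 4 and the variant bits become 0b10
  c = bor(band(c, 0x0FFF), 0x4000)
  d = bor(band(d, 0x0FFF), 0x8000)

  # Re-encode each chunk at its fixed width (so leading zero bytes are kept),
  # then glue the hex groups together with hyphens
  [<<a::32>>, <<b::16>>, <<c::16>>, <<d::16>>, <<e::48>>]
  |> Enum.map_join("-", &Base.encode16/1)
end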
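To see what the two bitwise lines do, here is the arithmetic for the input "foo". The snippet pulls the third and fourth chunks out of its SHA-256 digest and applies the same masks; the variable names version_chunk and variant_chunk are just for this illustration.

```elixir
import Bitwise

# Skip the first 48 bits (chunks a and b), then take chunks c and d of SHA-256("foo")
<<_::48, c::16, d::16, _rest::binary>> = :crypto.hash(:sha256, "foo")
IO.puts(Integer.to_string(c, 16)) # C68F
IO.puts(Integer.to_string(d, 16)) # F99B

# Keep the low 12 bits and force the top nibble to 4 (0b0100): version 4
version_chunk = bor(band(c, 0x0FFF), 0x4000)

# Keep the low 12 bits and force the top nibble to 8 (0b1000), whose
# leading bits 0b10 mark the RFC 4122 variant
variant_chunk = bor(band(d, 0x0FFF), 0x8000)

IO.puts(Integer.to_string(version_chunk, 16)) # 468F
IO.puts(Integer.to_string(variant_chunk, 16)) # 899B
```

Masking with 0x0FFF clears the whole top nibble, so the variant digit is always 8. Masking d with 0x3FFF instead would keep two more hash bits (variant digit 8 to B) and still be valid, but it would produce different UUIDs from the ones shown in this post.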

Let's see it in action:

iex(1)> MyApp.Seeds.deterministic_uuid4("foo")
"2C26B46B-68FF-468F-899B-453C1D304134"

iex(2)> MyApp.Seeds.deterministic_uuid4("foo")
"2C26B46B-68FF-468F-899B-453C1D304134"

iex(3)> MyApp.Seeds.deterministic_uuid4("bar")
"FCDE2B2E-DBA5-4BF4-8860-1FB721FE9B5C"

Finally, let's check that the generated UUID really is a valid v4 UUID, using UUID.info/1 from the uuid package:

iex(1)> UUID.info("2C26B46B-68FF-468F-899B-453C1D304134")
{:ok,
 [
   uuid: "2C26B46B-68FF-468F-899B-453C1D304134",
   binary: <<44, 38, 180, 107, 104, 255, 70, 143, 137, 155, 69, 60, 29, 48, 65,
     52>>,
   type: :default,
   version: 4,
   variant: :rfc4122
 ]}
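If you'd rather not add a dependency just for this check, the version and variant can also be read straight out of the bits with a binary pattern match. UUIDCheck and version_and_variant/1 are hypothetical names for this sketch: it strips the hyphens, decodes the 32 hexadecimal digits back into 128 bits, and picks out the version nibble (bits 48 to 51) and the two variant bits (bits 64 and 65).

```elixir
defmodule UUIDCheck do
  # Hypothetical helper: returns {version, variant_bits} for a UUID string.
  # A variant of 0b10 (2) means the RFC 4122 layout.
  def version_and_variant(uuid) do
    {:ok, <<_::48, version::4, _::12, variant::2, _::62>>} =
      uuid |> String.replace("-", "") |> Base.decode16(case: :mixed)

    {version, variant}
  end
end

IO.inspect(UUIDCheck.version_and_variant("2C26B46B-68FF-468F-899B-453C1D304134"))
# {4, 2}
```

Malformed input simply raises a MatchError, which is acceptable for a quick check in iex.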

Let's rewrite our seeds to make use of our brand new function:

id = "64af9d13-0f60-45fc-971f-07e6b490c059"
post = insert_idempotently(%Post{id: id, title: "Hello World!"})

Enum.each(1..50, fn index ->
  insert_idempotently(%Comment{
    # Same index, same UUID: re-running the seeds updates the existing comment
    id: deterministic_uuid4("comment-#{index}"),
    message: "Comment #{index}",
    post_id: post.id
  })
end)

Amazing! Running the seeds script again no longer adds 50 new Comment rows. Our development database stays clean, and our developers are happy ☕️.
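To double-check that claim, the following sketch regenerates the 50 comment IDs twice and verifies that they are stable across runs, pairwise distinct, and always 36 characters long. It redefines the generator in a throwaway SeedIds module (a hypothetical name) so that it runs on its own in iex or as a script. Each chunk is encoded at a fixed width because :binary.encode_unsigned/1 drops leading zero bytes (a 16-bit chunk of 0x00AB would come out as just "AB"), which would produce malformed UUIDs for about 1% of inputs.

```elixir
defmodule SeedIds do
  import Bitwise

  # Same idea as deterministic_uuid4/1 above, with each chunk encoded at a
  # fixed width so that leading zero bytes are preserved.
  def deterministic_uuid4(string) do
    <<a::32, b::16, c::16, d::16, e::48, _rest::binary>> = :crypto.hash(:sha256, string)
    c = bor(band(c, 0x0FFF), 0x4000)
    d = bor(band(d, 0x0FFF), 0x8000)

    [<<a::32>>, <<b::16>>, <<c::16>>, <<d::16>>, <<e::48>>]
    |> Enum.map_join("-", &Base.encode16/1)
  end
end

ids = Enum.map(1..50, fn index -> SeedIds.deterministic_uuid4("comment-#{index}") end)
rerun = Enum.map(1..50, fn index -> SeedIds.deterministic_uuid4("comment-#{index}") end)

IO.inspect(ids == rerun)                                 # true: same IDs on every run
IO.inspect(length(Enum.uniq(ids)) == 50)                 # true: no two comments share an ID
IO.inspect(Enum.all?(ids, &(String.length(&1) == 36)))   # true: always 36 characters
```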

UUID v5

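To ruin the party: deterministic UUID generation is exactly what UUID v5 is designed for, as it derives the UUID from a SHA-1 hash of a namespace and a name. And since Ecto does not validate UUIDs against the spec (it does not care which version a UUID claims to be), you might as well use the uuid package again and do: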

iex(7)> UUID.uuid5(:nil, "foo")
"aa752cea-8222-5bc8-acd9-555b090c0ccb"

iex(8)> UUID.uuid5(:nil, "foo")
"aa752cea-8222-5bc8-acd9-555b090c0ccb"

But what's the fun in that 🤷‍♀️.
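That said, if you're curious what v5 does under the hood, here is a minimal sketch of name-based generation following RFC 4122 (section 4.3): hash the 16 raw namespace bytes followed by the name with SHA-1, keep the first 128 bits, then overwrite the version nibble with 5 and the variant bits with 0b10. It is our own illustration (UUIDv5Sketch is a hypothetical module), not the uuid package's implementation, and it takes the namespace as a UUID string rather than an atom like :nil.

```elixir
defmodule UUIDv5Sketch do
  # Hypothetical sketch of RFC 4122 name-based UUIDs (version 5).
  # `namespace_uuid` is a UUID string, `name` is any binary.
  def uuid5(namespace_uuid, name) do
    # The namespace is hashed as its 16 raw bytes, not as text
    {:ok, namespace} =
      namespace_uuid |> String.replace("-", "") |> Base.decode16(case: :mixed)

    # SHA-1 yields 160 bits; keep the first 128 and drop the bits we overwrite
    <<a::48, _version::4, b::12, _variant::2, c::62, _rest::binary>> =
      :crypto.hash(:sha, namespace <> name)

    # Version nibble 5 (0b0101) and variant bits 0b10, then 8-4-4-4-12 groups
    hex = Base.encode16(<<a::48, 5::4, b::12, 0b10::2, c::62>>, case: :lower)

    <<g1::binary-size(8), g2::binary-size(4), g3::binary-size(4), g4::binary-size(4),
      g5::binary-size(12)>> = hex

    Enum.join([g1, g2, g3, g4, g5], "-")
  end
end

IO.puts(UUIDv5Sketch.uuid5("6ba7b810-9dad-11d1-80b4-00c04fd430c8", "python.org"))
```

As a cross-check, Python's uuid module documents uuid5(NAMESPACE_DNS, "python.org") as 886313e1-3b8a-5372-9b90-0c9aee199e5d, where NAMESPACE_DNS is the 6ba7b810-9dad-11d1-80b4-00c04fd430c8 namespace used above.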

If you enjoyed reading this, you might be interested in working with Elixir at bitcrowd. Check our job offerings!
