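In a distributed system, we often need to store data that we do not own. We might use it as a unique identifier across domains, or the system we do own might need to proxy it to yet another service. Take a company record as it lives in its origin system, the payload that system exposes, and the copy a consuming service keeps: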
+----+--------------+----------------------+
| id | name         | backend              |
+----+--------------+----------------------+
| 42 | Marley Spoon | elixir, ruby, python |
+----+--------------+----------------------+
{
  "id": 42,
  "name": "Marley Spoon",
  "backend": ["elixir", "ruby", "python"]
}
+------------+--------------+----------------------+
| company_id | company_name | backend              |
+------------+--------------+----------------------+
| 42 | Marley Spoon | elixir, ruby, python |
+------------+--------------+----------------------+
But here's the problem: as engineers, we look at _id fields and immediately think of them as integers. The consuming service, however, has no control over the data it receives, so the data type is only ever assumed.
If you use that external ID as a local foreign key, you tie your schema to a value that some other system controls, and an unforeseen change on their side (say, a move from integers to UUIDs) might break your setup.
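To make the failure mode concrete, here is a minimal Python sketch of a naive consumer that casts the upstream id to an integer before using it as a foreign key. The function name naive_company_id is hypothetical; the only other assumption is that the payload arrives as JSON, as in the example above.

```python
import json


def naive_company_id(payload: str) -> int:
    """Hypothetical naive consumer: assumes the upstream "id" is always an integer."""
    data = json.loads(payload)
    # The cast is the hidden assumption: it fails for any non-numeric ID.
    return int(data["id"])


# Works while the origin system happens to use integers...
print(naive_company_id('{"id": 42, "name": "Marley Spoon"}'))  # prints 42

# ...but raises ValueError once the origin system switches to UUIDs.
try:
    naive_company_id('{"id": "328129ae-df4e-4168-94d3-2572b4b343ef", "name": "Marley Spoon"}')
except ValueError as exc:
    print(f"cannot store upstream ID: {exc}")
```

The cast succeeds for as long as the origin system happens to use integers, which is exactly why the assumption goes unnoticed until it breaks.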
Identification
We have found it valuable to use the naming pattern *_identifier to indicate that:
- it is some kind of unique identifier
- some other system has control over it
If an *_identifier value is to be stored, it should always be saved as a string type. Almost anything can be coerced into a string, so the origin system stays free to choose whatever format it wants for its unique identifier.
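A minimal sketch of that rule in Python, assuming the origin systems we integrate with hand us identifiers as strings, integers or UUIDs. The accepted types and the helper name to_identifier are our own illustration, not a library API. The value is treated as opaque: it is converted to a string but never parsed, trimmed or re-cased.

```python
import uuid


def to_identifier(value: object) -> str:
    """Normalise an externally owned identifier to a string before storing it.

    The accepted input types (str, int, uuid.UUID) are an assumption of this sketch.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int, uuid.UUID)):
        raise TypeError(f"unsupported identifier type: {type(value).__name__}")
    # Treat the value as opaque: convert it, but do not trim, re-case or parse it.
    identifier = str(value)
    if identifier == "":
        raise ValueError("identifier must not be empty")
    return identifier


print(to_identifier(42))  # prints 42
print(to_identifier(uuid.UUID("328129ae-df4e-4168-94d3-2572b4b343ef")))  # prints 328129ae-df4e-4168-94d3-2572b4b343ef
```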
+--------------------------------------+--------------+----
| company_identifier                   | company_name | …
+--------------------------------------+--------------+----
| 328129ae-df4e-4168-94d3-2572b4b343ef | Marley Spoon | …
+--------------------------------------+--------------+----
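A sketch of the consuming side, using Python's built-in sqlite3 module and an in-memory database. The table layout mirrors the example above; the sample rows (including "Example Co") are hypothetical and only show that integer-style and UUID-style identifiers fit the same TEXT column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE companies (
        company_identifier TEXT PRIMARY KEY NOT NULL,  -- owned by another system
        company_name       TEXT NOT NULL
    )
    """
)

# Hypothetical sample rows: both identifier styles are stored in string form.
rows = [
    (str(42), "Example Co"),
    ("328129ae-df4e-4168-94d3-2572b4b343ef", "Marley Spoon"),
]
conn.executemany(
    "INSERT INTO companies (company_identifier, company_name) VALUES (?, ?)", rows
)

for identifier, name in conn.execute(
    "SELECT company_identifier, company_name FROM companies ORDER BY company_identifier"
):
    print(identifier, name)
```

SQLite's TEXT affinity would convert an inserted integer to text on its own, but calling str() explicitly keeps the intent visible and carries over to databases with stricter typing.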
Payload
If the system exposing the data is controlled by your organisation, you can support this at the source by exposing the field under an *_identifier name and as a string:
{
  "company_identifier": "328129ae-df4e-4168-94d3-2572b4b343ef",
  "company_name": "Marley Spoon",
  "backend": ["elixir", "ruby", "python"]
}
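On the source side, here is a minimal sketch of building that payload explicitly rather than serialising the database row as-is. CompanyRecord, its public_id field and to_payload are hypothetical names; the assumption is that the source keeps an internal integer key alongside a separately shared identifier, and only the latter is emitted, always as a string.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompanyRecord:
    """Hypothetical persistence model of the source system."""

    id: int  # internal primary key, deliberately not exposed
    public_id: str  # hypothetical column holding the externally shared identifier
    name: str
    backend: List[str] = field(default_factory=list)


def to_payload(record: CompanyRecord) -> str:
    """Build the public payload explicitly instead of dumping the row 1:1."""
    payload = {
        # Always emit the identifier as a string, whatever its stored type.
        "company_identifier": str(record.public_id),
        "company_name": record.name,
        "backend": list(record.backend),
    }
    return json.dumps(payload)


record = CompanyRecord(
    id=42,
    public_id="328129ae-df4e-4168-94d3-2572b4b343ef",
    name="Marley Spoon",
    backend=["elixir", "ruby", "python"],
)
print(to_payload(record))
```

Because the payload is assembled by hand, renaming or retyping a database column no longer changes the public contract.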
Summary
- Use *_identifier instead of *_id fields for externally owned data
- Prefer a string type over integer for *_identifier values
- Avoid a 1:1 mapping of your persistence model to your external API