A Universally Unique Identifier (UUID) is a 128-bit label used in computer systems to identify information uniquely. UUIDs are designed to be unique across space and time, allowing them to be generated independently without a central authority, minimising the risk of duplication.

UUIDs serve various purposes, including:

Identifying records in databases.
Tagging objects in distributed systems.
Serving as primary keys in applications where uniqueness is critical.

Real-world Use Cases

Databases: UUIDs are often used as primary keys in relational databases so that records can be identified uniquely, even when rows are created on different servers.
Microservices: UUIDs provide unique identifiers for requests and resources, which simplifies communication between services.
IoT Devices: Identify devices uniquely in a network, ensuring that data from multiple sources can be aggregated without conflicts.

Advantages and Disadvantages of Using UUIDs

Advantages:

Global Uniqueness: UUIDs are extremely unlikely to collide, making them suitable for distributed systems where multiple nodes generate identifiers independently.
No Central Authority Required: They can be generated without coordination, which simplifies operations in distributed environments.
Scalability: They work well in systems that require scaling across multiple servers or services.

Disadvantages:

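Storage Size: UUIDs occupy 128 bits (16 bytes) in binary form, and 36 characters when stored as text, compared to 32 or 64 bits for typical integer IDs. This increases the size of tables and of every index that references the key.
Performance Issues: Randomly generated UUIDs (such as version 4) arrive in no particular order, so inserts land at arbitrary positions in a B-tree index. This can cause page splits and poor cache locality, making inserts and lookups slower than with sequential IDs.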
User Unfriendliness: UUIDs are not easily memorable or user-friendly when presented in user interfaces.

The Standard

The standard representation of a UUID consists of 32 hexadecimal digits divided into five groups, separated by hyphens, following the format 8-4-4-4-12, for a total of 36 characters (32 hexadecimal digits plus 4 hyphens).

The UUID format can be visualized as follows:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

Where:

M indicates the UUID version.
N indicates the variant (in its most significant bits), which determines how the rest of the UUID is laid out.
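
As a rough illustration of the layout above, the following Python sketch checks that a string follows the 8-4-4-4-12 form and picks out the M and N digits by position. The helper name version_and_variant_digits is hypothetical, introduced only for this example.

```python
import re

# Canonical textual form: 32 hex digits in groups of 8-4-4-4-12, joined by hyphens.
CANONICAL = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def version_and_variant_digits(text):
    """Return the (M, N) hex digits of a canonical UUID string (hypothetical helper)."""
    if not CANONICAL.fullmatch(text):
        raise ValueError(f"not a canonical UUID string: {text!r}")
    # Index 14 is the first digit of the third group (M);
    # index 19 is the first digit of the fourth group (N).
    return text[14], text[19]


m, n = version_and_variant_digits("550e8400-e29b-41d4-a716-446655440000")
print(m, n)  # prints: 4 a
```

The positions are fixed because every canonical UUID string has the same length, so no parsing beyond the format check is needed.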

Components of a UUID

TimeLow: 4 bytes (8 hex characters) representing the low field of the timestamp.
TimeMid: 2 bytes (4 hex characters) representing the middle field of the timestamp.
TimeHighAndVersion: 2 bytes (4 hex characters) that include the version number and the high field of the timestamp.
ClockSequence: 2 bytes (4 hex characters) that include the variant bits and a counter used to help avoid collisions, for example when the system clock is set backwards or several UUIDs are generated in quick succession.
Node: 6 bytes (12 hex characters), typically representing the MAC address of the generating node.
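
To make the field boundaries concrete, here is a small Python sketch that splits a UUID into the five components listed above using the standard uuid module. The function name split_fields and the sample value are assumptions made for this example; the sample is simply an arbitrary version 1 UUID.

```python
import uuid


def split_fields(text):
    """Split a UUID string into its five named components (hypothetical helper)."""
    u = uuid.UUID(text)
    # uuid.UUID.fields yields six integers; the two clock-sequence bytes
    # are reported separately and are joined again here.
    (time_low, time_mid, time_hi_version,
     clock_seq_hi_variant, clock_seq_low, node) = u.fields
    return {
        "TimeLow": f"{time_low:08x}",
        "TimeMid": f"{time_mid:04x}",
        "TimeHighAndVersion": f"{time_hi_version:04x}",
        "ClockSequence": f"{clock_seq_hi_variant:02x}{clock_seq_low:02x}",
        "Node": f"{node:012x}",
    }


# Arbitrary version 1 UUID used only as sample input.
for field_name, hex_value in split_fields("c232ab00-9414-11ec-b3c8-9f6bdeced846").items():
    print(f"{field_name}: {hex_value}")
```

Each component corresponds to one hyphen-separated group of the textual form, which is why the output lines up with the 8-4-4-4-12 pattern.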

Types of UUIDs

Version 1: Time-based UUIDs that use a combination of the current timestamp and the MAC address of the generating node. This version ensures uniqueness across space and time.
Version 2: DCE Security UUIDs, similar to version 1 but with part of the timestamp and clock sequence replaced by a local domain and identifier (such as a POSIX user or group ID). It is rarely used because of these limitations.
Version 3: Name-based UUIDs generated using an MD5 hash of a namespace identifier and a name.
Version 4: Randomly generated UUIDs. Six of the 128 bits are fixed to encode the version and variant, leaving 122 random bits.
Version 5: Like version 3 but uses SHA-1 for hashing, which is generally preferred over MD5, although neither hash should be relied on for security today.
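
For a quick feel of how these versions differ in practice, the sketch below generates one UUID of each kind with Python's standard uuid module. Version 2 is omitted because that module does not provide a generator for it; the name "example.com" is just a placeholder input.

```python
import uuid

name = "example.com"  # placeholder name for the name-based versions

generated = {
    1: uuid.uuid1(),                          # timestamp + node
    3: uuid.uuid3(uuid.NAMESPACE_DNS, name),  # MD5 of namespace + name
    4: uuid.uuid4(),                          # random
    5: uuid.uuid5(uuid.NAMESPACE_DNS, name),  # SHA-1 of namespace + name
}

for version, value in generated.items():
    # value.version reads the version back out of the UUID itself.
    print(version, value, value.version)
```

Running this twice should give the same version 3 and version 5 values each time, while the version 1 and version 4 values change on every run.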

Variants

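The variant field in a UUID determines its layout and interpretation. It is encoded in the most significant bits of the 9th byte, which is the first hex digit of the fourth group. The defined variants are:

Variant 0 (bits 0xxx): Reserved for NCS backward compatibility.
Variant 1 (bits 10xx): The standard RFC 4122 layout used for most UUIDs.
Variant 2 (bits 110x): Reserved for Microsoft GUID backward compatibility.
Variant 3 (bits 111x): Reserved for future definitions.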
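
The variant can be read from the leading bits of the 9th byte. The following Python sketch classifies a byte into the variant numbers used above; the function name variant_of is hypothetical, and the final lines cross-check one result against the variant attribute of Python's uuid module.

```python
import uuid


def variant_of(byte8):
    """Return the variant number (0-3) encoded in the 9th byte of a UUID."""
    if (byte8 & 0b1000_0000) == 0b0000_0000:
        return 0  # 0xxx xxxx
    if (byte8 & 0b1100_0000) == 0b1000_0000:
        return 1  # 10xx xxxx
    if (byte8 & 0b1110_0000) == 0b1100_0000:
        return 2  # 110x xxxx (Python's uuid module names this RESERVED_MICROSOFT)
    return 3      # 111x xxxx


u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(variant_of(u.bytes[8]))        # 9th byte is 0xa7, so this prints 1
print(u.variant == uuid.RFC_4122)    # prints True
```

Because the variant uses a variable number of leading bits, the checks have to be made in order from the shortest prefix to the longest.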

Example

For Version 4, a UUID might look like this:

550e8400-e29b-41d4-a716-446655440000

Here:

The leading 4 in 41d4 indicates it's a version 4.
The leading a in a716 (binary 1010, which begins with the bits 10) represents the variant, in this case, the common "Leach-Salz" variant.

How UUIDs are Calculated

Version 1 (Time-based):
- The timestamp is a 60-bit count of 100-nanosecond intervals since 00:00:00 UTC on October 15, 1582 (the date of the Gregorian calendar reform).
- The node is typically the MAC address of the machine generating the UUID.
- The clock sequence helps ensure uniqueness when the clock is set backwards or its previous value is lost (e.g., due to system restarts).
Version 3 and Version 5 (Name-based):
- A namespace, itself a UUID (such as the predefined DNS or URL namespace), is concatenated with a name (like a file path or URL) and hashed.
- The hash (MD5 for version 3, SHA-1 for version 5) is then structured into a UUID format, ensuring the version and variant fields are properly set.
Version 4 (Random-based):
- Random or pseudo-random numbers are generated for 122 of the UUID's 128 bits.
- The remaining 6 bits are fixed: the version field is set to 0100 and the variant field to 10, ensuring compliance with the UUID standard.
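
Two of the calculations above can be sketched in a few lines of Python. The first function decodes the timestamp of a version 1 UUID back into a date, and the second builds a version 5 UUID by hand from a SHA-1 hash. Both function names are hypothetical, and the sample version 1 value is an arbitrary input chosen for illustration.

```python
import hashlib
import uuid
from datetime import datetime, timedelta, timezone

# Start of the version 1 timestamp scale.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp(u):
    """Decode the 60-bit timestamp of a version 1 UUID into a UTC datetime."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time counts 100-nanosecond intervals; ten of them make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def name_based_v5(namespace, name):
    """Build a version 5 UUID by hand from SHA-1(namespace bytes + name bytes)."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])      # keep the first 128 bits of the hash
    raw[6] = (raw[6] & 0x0F) | 0x50   # version field = 0101 (5)
    raw[8] = (raw[8] & 0x3F) | 0x80   # variant field = 10
    return uuid.UUID(bytes=bytes(raw))


print(v1_timestamp(uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")))
by_hand = name_based_v5(uuid.NAMESPACE_DNS, "example.com")
print(by_hand == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))  # prints True
```

A version 3 UUID would follow the same steps with hashlib.md5 and a version field of 0011.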

UUIDv4 Calculation Example

Step 1: Generate 128 Random Bits

Let's assume we generate the following 128-bit random value:

11001100110101101101010101111010101110110110111001011101010110110101111011010011011110100100101111001011100101100011110010100101

Step 2: Apply UUIDv4 Version and Variant

Version: Replace the four most significant bits of the 7th byte (the 13th hex character) with 0100 (for UUID version 4).
Original: 0101 becomes 0100 → the byte changes from 5d to 4d.
Variant: Replace the two most significant bits of the 9th byte with 10 (for the RFC 4122 variant).
Original: 01 becomes 10 → the byte changes from 5e to 9e.

Step 3: Format into Hexadecimal

Convert the 128-bit binary into 5 hexadecimal groups:

32-bit group: 11001100110101101101010101111010 → ccd6d57a
16-bit group: 1011101101101110 → bb6e
16-bit group: 0100110101011011 → 4d5b (with 0100 for version 4)
16-bit group: 1001111011010011 → 9ed3 (with 10 for the variant)
48-bit group: 011110100100101111001011100101100011110010100101 → 7a4bcb963ca5

Step 4: Combine the Groups

The final UUID would look like this:
ccd6d57a-bb6e-4d5b-9ed3-7a4bcb963ca5
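
The same four steps can be sketched in Python. This is a minimal illustration rather than a replacement for a library generator: the function name uuid4_from_bytes is hypothetical, and the fixed sample input is an arbitrary 16-byte value chosen for this example.

```python
import os
import uuid


def uuid4_from_bytes(random_bytes):
    """Turn 16 random bytes into a version 4 UUID by fixing 6 of the 128 bits."""
    if len(random_bytes) != 16:
        raise ValueError("expected exactly 16 bytes (128 bits)")
    raw = bytearray(random_bytes)
    raw[6] = (raw[6] & 0x0F) | 0x40  # top 4 bits of the 7th byte become 0100
    raw[8] = (raw[8] & 0x3F) | 0x80  # top 2 bits of the 9th byte become 10
    return uuid.UUID(bytes=bytes(raw))


# Fixed sample input, so the bit changes are easy to follow.
sample = bytes.fromhex("ccd6d57abb6e5d5b5ed37a4bcb963ca5")
print(uuid4_from_bytes(sample))  # prints: ccd6d57a-bb6e-4d5b-9ed3-7a4bcb963ca5

# Fresh random input from the operating system.
print(uuid4_from_bytes(os.urandom(16)))
```

In everyday code, calling uuid.uuid4() performs these steps for you; the manual version is mainly useful for seeing which bits are overwritten.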
