A Universally Unique Identifier (UUID) is a 128-bit label used in computer systems to identify information uniquely. UUIDs are designed to be unique across space and time, allowing them to be generated independently without a central authority, minimising the risk of duplication.
UUIDs serve various purposes, including:
- Identifying records in databases.
- Tagging objects in distributed systems.
- Serving as primary keys in applications where uniqueness is critical.
Real-world Use Cases
- Databases: UUIDs are used as primary keys in relational databases to ensure the unique identification of records.
- Microservices: UUIDs give requests and resources unique identifiers that services can pass along when communicating with each other.
- IoT Devices: UUIDs identify devices uniquely in a network, ensuring that data from multiple sources can be aggregated without conflicts.
Advantages and Disadvantages of Using UUIDs
Advantages:
- Global Uniqueness: UUIDs are extremely unlikely to collide, making them suitable for distributed systems where multiple nodes generate identifiers independently.
- No Central Authority Required: They can be generated without coordination, which simplifies operations in distributed environments.
- Scalability: They work well in systems that require scaling across multiple servers or services.
Disadvantages:
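- Storage Size: UUIDs consume more space (128 bits, or 36 characters when stored as text) compared to traditional integer IDs (typically 32 or 64 bits), which can lead to increased storage costs.
- Performance Issues: Indexing UUIDs can degrade database performance. Random values are inserted at arbitrary positions in an index rather than appended at the end, and the larger keys mean fewer entries fit in each index page, so inserts and lookups can be slower than with sequential IDs.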
- User Unfriendliness: UUIDs are not easily memorable or user-friendly when presented in user interfaces.
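To put the storage point in concrete terms, here is a minimal sketch using Python's standard uuid module. It only measures the UUID's own representations; the real cost in a database depends on the column type chosen, which is not covered here.

```python
import uuid

u = uuid.uuid4()

binary_size = len(u.bytes)  # raw 128-bit value, in bytes
hex_size = len(u.hex)       # 32 hexadecimal digits, no hyphens
text_size = len(str(u))     # canonical hyphenated form

print(binary_size, hex_size, text_size)  # 16 32 36
```

Storing the 16-byte binary form rather than the 36-character string is one common way to reduce the overhead, although it makes the values harder to read when debugging.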
The Standard
The standard representation of a UUID consists of 32 hexadecimal digits divided into five groups, separated by hyphens, following the format 8-4-4-4-12, resulting in a total of 36 characters (32 hexadecimal digits plus 4 hyphens).
The UUID format can be visualized as follows:
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
Where:
- M indicates the UUID version.
- N indicates the variant: its most significant bits (between one and three of them) determine how the rest of the UUID's layout is interpreted.
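The 8-4-4-4-12 structure can be checked with a few lines of Python. This is a rough sketch using the standard uuid module; the variable names are chosen for illustration only.

```python
import string
import uuid

text = str(uuid.uuid4())  # canonical textual form

groups = text.split("-")
group_lengths = [len(g) for g in groups]

# Every character other than the hyphens should be a hexadecimal digit.
all_hex = all(c in string.hexdigits for g in groups for c in g)

# Parsing the string and formatting it again should give the same text.
round_trip = str(uuid.UUID(text))

print(len(text), group_lengths, all_hex)
```

The round trip through uuid.UUID is a convenient way to validate user-supplied identifiers, since the constructor raises ValueError for strings that do not contain exactly 32 hexadecimal digits.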
Components of a UUID
- TimeLow: 4 bytes (8 hex characters) representing the low field of the timestamp.
- TimeMid: 2 bytes (4 hex characters) representing the middle field of the timestamp.
- TimeHighAndVersion: 2 bytes (4 hex characters) that include the version number and the high field of the timestamp.
- ClockSequence: 2 bytes (4 hex characters) holding the variant bits followed by a clock sequence, which helps avoid collisions, especially if the system clock is set backwards or the node identifier changes.
- Node: 6 bytes (12 hex characters), typically representing the MAC address of the generating node in version 1 UUIDs.
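As an illustration of how these fields fit together, the sketch below reads them from a version 1 UUID using the attributes of Python's uuid.UUID class and reassembles the 60-bit timestamp. It assumes the version 1 epoch of 15 October 1582 and 100-nanosecond ticks; the variable names are illustrative.

```python
import uuid
from datetime import datetime, timedelta, timezone

u = uuid.uuid1()

time_low = u.time_low                # 32 bits
time_mid = u.time_mid                # 16 bits
time_hi_version = u.time_hi_version  # 16 bits: 4-bit version + 12-bit time high
node = u.node                        # 48 bits

# The clock sequence is 14 bits: the top two bits of its first byte
# hold the variant and are masked off.
clock_seq = ((u.clock_seq_hi_variant & 0x3F) << 8) | u.clock_seq_low

# Reassemble the timestamp: high 12 bits, then mid 16 bits, then low 32 bits.
timestamp = ((time_hi_version & 0x0FFF) << 48) | (time_mid << 32) | time_low

# Convert 100-nanosecond ticks since 1582-10-15 into a datetime.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = GREGORIAN_EPOCH + timedelta(microseconds=timestamp // 10)

print(u, created.isoformat())
```

The fields are laid out low-to-high in the string, which is why version 1 UUIDs generated close together in time do not sort chronologically as text.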
Types of UUIDs
Version 1: Time-based UUIDs that use a combination of the current timestamp and the MAC address of the generating node. This version ensures uniqueness across space and time.
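Version 2: The DCE Security version. It is similar to version 1 but replaces part of the timestamp and clock sequence with a local domain and a local identifier (such as a POSIX user or group ID). Because this reduces the timestamp's resolution and the number of distinct values available, it is rarely used.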
Version 3: Name-based UUIDs generated using an MD5 hash of a namespace identifier and a name.
Version 4: Randomly generated UUIDs. Six of the 128 bits are fixed to encode the version and variant, leaving 122 random bits.
Version 5: Like version 3 but uses SHA-1 for hashing. SHA-1 is preferred over MD5 for new name-based UUIDs, although neither version is intended to provide security.
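The versions can be compared side by side with Python's standard uuid module, which provides generators for versions 1, 3, 4 and 5 (it has no generator for version 2). The name "example.com" below is an arbitrary value chosen for illustration.

```python
import uuid

name = "example.com"  # arbitrary illustrative name

v1 = uuid.uuid1()                          # timestamp + node
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, name)  # MD5 of namespace + name
v4 = uuid.uuid4()                          # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, name)  # SHA-1 of namespace + name

versions = [u.version for u in (v1, v3, v4, v5)]
print(versions)  # [1, 3, 4, 5]

# Name-based UUIDs are deterministic: the same inputs give the same result.
same_again = uuid.uuid5(uuid.NAMESPACE_DNS, name)
print(v5 == same_again)  # True
```

This determinism is the main reason to choose version 3 or 5: two systems can derive the same identifier for the same name without communicating.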
Variants
The variant field in a UUID determines its layout and interpretation. It is encoded in the most significant bits of the N position, and four variants are defined:
- Variant 0 (bits 0xxx): Reserved for NCS backward compatibility.
- Variant 1 (bits 10xx): The standard RFC 4122 layout used for most UUIDs.
- Variant 2 (bits 110x): Reserved for Microsoft backward compatibility (legacy GUIDs).
- Variant 3 (bits 111x): Reserved for future definitions.
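A small sketch of how the variant can be decoded from the bit patterns defined in RFC 4122. The function name variant_name and its return strings are hypothetical; the result is compared with the variant attribute that Python's uuid.UUID already provides.

```python
import uuid

def variant_name(u: uuid.UUID) -> str:
    """Classify a UUID by the leading bits of byte 8 (the N position)."""
    nibble = u.bytes[8] >> 4
    if nibble & 0b1000 == 0b0000:   # 0xxx
        return "NCS"
    if nibble & 0b1100 == 0b1000:   # 10xx
        return "RFC 4122"
    if nibble & 0b1110 == 0b1100:   # 110x
        return "Microsoft"
    return "future"                 # 111x

samples = [
    uuid.UUID("00000000-0000-0000-0000-000000000000"),
    uuid.UUID("550e8400-e29b-41d4-a716-446655440000"),
    uuid.UUID("00000000-0000-0000-c000-000000000046"),
    uuid.UUID("00000000-0000-0000-e000-000000000000"),
]

for u in samples:
    print(u, variant_name(u), "|", u.variant)
```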
Example
For Version 4, a UUID might look like this:
550e8400-e29b-41d4-a716-446655440000
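Here:
- The leading 4 in the third group (41d4) indicates that this is a version 4 UUID.
- The leading a in the fourth group (a716) carries the variant. In binary, a is 1010, and its top two bits, 10, identify the common "Leach-Salz" (RFC 4122) variant.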
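The same reading can be done programmatically. The sketch below extracts the version and variant bits from the example string by hand and checks them against Python's uuid.UUID attributes; the variable names are illustrative.

```python
import uuid

text = "550e8400-e29b-41d4-a716-446655440000"
groups = text.split("-")

# Version: the first hex digit of the third group.
version = int(groups[2][0], 16)

# Variant: the top two bits of the first hex digit of the fourth group.
variant_bits = int(groups[3][0], 16) >> 2

parsed = uuid.UUID(text)
print(version, bin(variant_bits), parsed.version, parsed.variant)
```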
How UUIDs are Calculated
Version 1 (Time-based):
- The timestamp is a 60-bit count of 100-nanosecond intervals since October 15, 1582 (the date of the Gregorian calendar reform).
- The node is typically the MAC address of the machine generating the UUID.
- The clock sequence helps ensure uniqueness when the clock is set backwards or the node identifier changes (e.g., after a system restart in which the previous state was lost).
Version 3 and Version 5 (Name-based):
- A namespace (like a DNS domain) is combined with a name (like a file path or URL) and hashed.
- The hash (MD5 for version 3, SHA-1 for version 5) is truncated to 128 bits if necessary and structured into a UUID format, with the version and variant fields overwritten by their required values.
Version 4 (Random-based):
- Random or pseudo-random numbers are generated for the 122 bits of the UUID that are not reserved.
- The remaining 6 bits, the version and variant fields, are set accordingly, ensuring compliance with UUID standards.
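The name-based procedure can be sketched by hand and compared with the library result. The function name name_based_uuid5 is hypothetical, and the sketch assumes the name is encoded as UTF-8, which is what Python's own uuid.uuid5 does for string names.

```python
import hashlib
import uuid

def name_based_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Build a version 5 UUID from a namespace UUID and a name."""
    # Hash the namespace's 16 raw bytes followed by the name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()

    # SHA-1 produces 20 bytes; only the first 16 are kept.
    raw = bytearray(digest[:16])

    raw[6] = (raw[6] & 0x0F) | 0x50  # version 5 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits 10 at the top of byte 8
    return uuid.UUID(bytes=bytes(raw))

manual = name_based_uuid5(uuid.NAMESPACE_DNS, "example.com")
library = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
print(manual, manual == library)
```

Version 3 follows the same steps with hashlib.md5 and 0x30 as the version nibble; MD5 already yields exactly 16 bytes, so no truncation is needed.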
UUIDv4 Calculation Example
Step 1: Generate 128 Random Bits
Let's assume we generate the following 128-bit random value (an arbitrary value chosen for illustration, split here into the five groups used in Step 3):
11001100110101101101010101111010 1011101101101110 1100010101000101 1110110111110010 010111010101101111010011011110100100101111001011
Step 2: Apply UUIDv4 Version and Variant
Version: Replace the four most significant bits of the 7th byte (bits 48-51 counting from 0, the 13th hex character) with 0100 (for UUID version 4).
Original: 1100 becomes 0100.
Variant: Replace the two most significant bits of the 9th byte (bits 64-65) with 10 (for the RFC 4122 variant).
Original: 11 becomes 10.
Step 3: Format into Hexadecimal
Convert the 128-bit binary into 5 hexadecimal groups:
- 32-bit group: 11001100110101101101010101111010 → ccd6d57a
- 16-bit group: 1011101101101110 → bb6e
- 16-bit group: 0100010101000101 → 4545 (with 0100 for version 4)
- 16-bit group: 1010110111110010 → adf2 (with 10 for the variant)
- 48-bit group: 010111010101101111010011011110100100101111001011 → 5d5bd37a4bcb
Step 4: Combine the Groups
The final UUID would look like this:
ccd6d57a-bb6e-4545-adf2-5d5bd37a4bcb
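The four steps can be sketched in Python as follows. The function name uuid4_from_bytes is hypothetical; in practice uuid.uuid4() does all of this in one call, and the manual version is shown only to make the bit manipulation visible.

```python
import secrets
import uuid

def uuid4_from_bytes(random_bytes: bytes) -> uuid.UUID:
    """Turn 16 random bytes into a version 4 UUID."""
    if len(random_bytes) != 16:
        raise ValueError("expected exactly 16 bytes (128 bits)")
    raw = bytearray(random_bytes)

    # Version: overwrite the high nibble of the 7th byte (index 6) with 0100.
    raw[6] = (raw[6] & 0x0F) | 0x40

    # Variant: overwrite the top two bits of the 9th byte (index 8) with 10.
    raw[8] = (raw[8] & 0x3F) | 0x80

    # uuid.UUID formats the 16 bytes into the 8-4-4-4-12 hex groups.
    return uuid.UUID(bytes=bytes(raw))

generated = uuid4_from_bytes(secrets.token_bytes(16))
print(generated, generated.version)
```

The secrets module is used here because it draws from the operating system's cryptographically secure random source, which is a reasonable default when identifiers must be hard to guess.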