UUIDs (Universally Unique Identifiers) and ULIDs (Universally Unique Lexicographically Sortable Identifiers) are widely used identifiers in databases and distributed systems. Each has distinct characteristics that suit different scenarios. In this article, we'll look at the features of UUIDs and ULIDs and discuss when to use each. If you currently use an auto-increment primary key without giving it much thought, this article may offer some useful insights.
Comparison Table
Feature | Auto Increment | UUID v4 | UUID v7 | ULID |
---|---|---|---|---|
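Data Type (MySQL) | INT, BIGINT | CHAR(36) or BINARY(16) | CHAR(36) or BINARY(16) | CHAR(26) or BINARY(16) |
Sortable | ✅ | ❌ | ✅ | ✅ |
Size | 4 bytes (INT), 8 bytes (BIGINT) | 16 bytes binary, 36 characters as text | 16 bytes binary, 36 characters as text | 16 bytes binary, 26 characters as text |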
Example | 1, 2, 3, ... | d61f91c3-d3bf-4b34-9894-e21bfa277ca4 | 019020e0-cd2a-730a-a8ea-11ec3ddc847f | 01J0GCBEEDPE3VDR0NBJ8TM8NQ |
If You Don't Want to Use Auto Increment Type
Auto Increment is a mechanism in which the database automatically assigns each new record a unique identifier, typically a numeric column that increases by one with each insert. However, when these IDs are exposed (for example, in URLs or APIs), they raise significant security and privacy concerns:
- Predictability: Because Auto Increment IDs are sequential, the next ID and the IDs of other records are easy to guess. An attacker can enumerate records (for example, by incrementing the ID in a URL), infer the internal structure of the system, and attempt unauthorized access.
- Risk of Information Leakage: Sequential IDs can reveal patterns in the company’s activities. For example, a competitor might analyze the sequential IDs to infer the frequency of product releases or user registrations.
Examples:
- A competitor could estimate how often a company releases new products by comparing sequential product IDs over time, then predict release timings and adjust its strategy accordingly.
- If the sequential IDs used to manage payments were exposed, they could reveal the number of user registrations and paid subscriptions.
UUID (Universally Unique Identifier)
A UUID is a 128-bit identifier used widely in distributed systems, with multiple versions available, each having a different generation method.
UUID v4
UUID v4 is commonly used because it is simple to generate and collisions are extremely unlikely. Of its 128 bits, 122 are random; the remaining 6 bits encode the version and variant.
Generation Method:
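- Generate Random Bits: Fill all 128 bits with random values.
- Set Version Bits: Set the 4-bit version field (the high nibble of byte 6) to `0100`.
- Set Variant Bits: Set the 2-bit variant field (the two most significant bits of byte 8) to `10`.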
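These three steps can be implemented with the Go standard library alone. The name `newUUIDv4` is hypothetical and chosen for illustration. In practice, a library such as github.com/google/uuid handles this for you.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 is a hypothetical, standard-library-only sketch of UUID v4 generation.
func newUUIDv4() ([16]byte, error) {
	var u [16]byte
	// Step 1: fill all 128 bits with cryptographically secure random data.
	if _, err := rand.Read(u[:]); err != nil {
		return u, err
	}
	// Step 2: version field = high nibble of byte 6, set to 0100 (4).
	u[6] = (u[6] & 0x0f) | 0x40
	// Step 3: variant field = two most significant bits of byte 8, set to 10.
	u[8] = (u[8] & 0x3f) | 0x80
	return u, nil
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	// Canonical 8-4-4-4-12 hex format.
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```

The masks clear and set exactly the 6 fixed bits and leave the other 122 random.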
Here’s a code snippet to generate a UUID v4 in Go:
```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

func main() {
	// uuid.New returns a random (version 4) UUID and panics if random data cannot be read.
	uuidV4 := uuid.New()
	fmt.Println(uuidV4)
}
```
Example Output:
d61f91c3-d3bf-4b34-9894-e21bfa277ca4
UUID v7
UUID v7 is a newer version, standardized in RFC 9562 (2024), that is designed to be sortable: it stores a Unix timestamp in milliseconds in its most significant 48 bits.
Generation Method:
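- Get Timestamp: Take the current Unix timestamp in milliseconds and store it as a 48-bit big-endian integer in the first 6 bytes.
- Generate Random Bits: Fill the remaining 80 bits with random values.
- Set Version Bits: Set the 4-bit version field to `0111` (7).
- Set Variant Bits: Set the 2-bit variant field to `10`, which leaves 74 random bits.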
Here’s how to generate a UUID v7 in Go:
```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

type UUID [16]byte

func NewUUIDv7() UUID {
	var uuid UUID

	// Bytes 0-5: 48-bit big-endian Unix timestamp in milliseconds.
	timestamp := uint64(time.Now().UnixMilli())
	uuid[0] = byte(timestamp >> 40)
	uuid[1] = byte(timestamp >> 32)
	uuid[2] = byte(timestamp >> 24)
	uuid[3] = byte(timestamp >> 16)
	uuid[4] = byte(timestamp >> 8)
	uuid[5] = byte(timestamp)

	// Bytes 6-15: random data (6 of these bits are overwritten below).
	if _, err := rand.Read(uuid[6:]); err != nil {
		panic(err)
	}

	// Set version (0111) and variant bits (2 MSBs as 10).
	uuid[6] = (uuid[6] & 0x0f) | (7 << 4)
	uuid[8] = (uuid[8] & 0x3f) | 0x80
	return uuid
}

func main() {
	uuidV7 := NewUUIDv7()
	// Print in the canonical 8-4-4-4-12 hex format.
	fmt.Printf("%x-%x-%x-%x-%x\n", uuidV7[0:4], uuidV7[4:6], uuidV7[6:8], uuidV7[8:10], uuidV7[10:16])
}
```
Example Output:
019020e0-cd2a-730a-a8ea-11ec3ddc847f
Extracting Timestamps from UUID v7:
```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

type UUID [16]byte

// NewUUIDv7 builds a version 7 UUID: a 48-bit millisecond timestamp followed by random bits.
func NewUUIDv7() UUID {
	var uuid UUID
	timestamp := uint64(time.Now().UnixMilli())
	uuid[0] = byte(timestamp >> 40)
	uuid[1] = byte(timestamp >> 32)
	uuid[2] = byte(timestamp >> 24)
	uuid[3] = byte(timestamp >> 16)
	uuid[4] = byte(timestamp >> 8)
	uuid[5] = byte(timestamp)
	if _, err := rand.Read(uuid[6:]); err != nil {
		panic(err)
	}
	// Set version (0111) and variant bits (2 MSBs as 10).
	uuid[6] = (uuid[6] & 0x0f) | (7 << 4)
	uuid[8] = (uuid[8] & 0x3f) | 0x80
	return uuid
}

// ExtractTimestampFromUUIDv7 reads the 48-bit millisecond timestamp back from bytes 0-5.
func ExtractTimestampFromUUIDv7(uuid UUID) time.Time {
	timestamp := uint64(uuid[0])<<40 |
		uint64(uuid[1])<<32 |
		uint64(uuid[2])<<24 |
		uint64(uuid[3])<<16 |
		uint64(uuid[4])<<8 |
		uint64(uuid[5])
	return time.UnixMilli(int64(timestamp)).UTC()
}

// String formats the UUID in the canonical 8-4-4-4-12 hex form.
func (uuid UUID) String() string {
	return fmt.Sprintf("%08x-%04x-%04x-%04x-%012x",
		uuid[0:4],
		uuid[4:6],
		uuid[6:8],
		uuid[8:10],
		uuid[10:16])
}

func main() {
	uuid := NewUUIDv7()
	fmt.Println(uuid.String())
	timestamp := ExtractTimestampFromUUIDv7(uuid)
	fmt.Println(timestamp)
}
```
Example Output:
019020e0-cd2a-730a-a8ea-11ec3ddc847f
2024-06-16 11:48:41.898 +0000 UTC
ULID (Universally Unique Lexicographically Sortable Identifier)
ULID is designed to be a sortable and human-readable alternative to UUIDs, with a focus on chronological order.
Generation Method:
- Get Timestamp: Take the current Unix timestamp in milliseconds as a 48-bit integer.
- Generate Random Values: Fill the remaining 80 bits with random values.
- Encoding: Encode the resulting 128 bits using Crockford's Base32. This produces 26 characters: 10 for the timestamp followed by 16 for the randomness.
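The layout and encoding steps can also be implemented without a library. This minimal illustration assumes the canonical ULID layout: a 48-bit big-endian timestamp followed by 80 random bits. The names `newULIDBytes` and `encodeULID` are hypothetical. Unlike the monotonic mode of oklog/ulid, this sketch does not guarantee ordering within the same millisecond.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// Crockford's Base32 alphabet: digits and letters, excluding I, L, O, and U.
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// newULIDBytes lays out a ULID: a 48-bit big-endian millisecond timestamp
// followed by 80 random bits.
func newULIDBytes(ms uint64) ([16]byte, error) {
	var id [16]byte
	for i := 0; i < 6; i++ {
		id[i] = byte(ms >> (40 - 8*i))
	}
	_, err := rand.Read(id[6:])
	return id, err
}

// encodeULID writes the 128-bit value as 26 Base32 characters (130 bits),
// so the first character carries only 3 bits and is always 0-7.
func encodeULID(id [16]byte) string {
	hi := binary.BigEndian.Uint64(id[:8])
	lo := binary.BigEndian.Uint64(id[8:])
	var out [26]byte
	for i := 25; i >= 0; i-- {
		out[i] = crockford[lo&0x1f] // take the lowest 5 bits
		lo = lo>>5 | hi<<59         // shift the 128-bit value right by 5
		hi >>= 5
	}
	return string(out[:])
}

func main() {
	id, err := newULIDBytes(uint64(time.Now().UnixMilli()))
	if err != nil {
		panic(err)
	}
	fmt.Println(encodeULID(id))
}
```

Because 26 characters hold 130 bits, the first character never exceeds 7. This is why the largest valid ULID is `7ZZZZZZZZZZZZZZZZZZZZZZZZZ`.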
Here’s how to generate a ULID in Go:
package main
import (
"fmt"
"github.com/oklog/ulid/v2"
"math/rand"
"time"
)
func main() {
entropy := ulid.Monotonic(rand.New(rand.NewSource(time.Now().UnixNano())), 0)
ulidInstance := ulid.MustNew(ulid.Timestamp(time.Now()), entropy)
fmt.Println(ulidInstance)
// Extracting and formatting the timestamp
timestamp := time.Unix(0, int64(ulidInstance.Time())*int64(time.Millisecond))
fmt.Println(timestamp.Format(time.RFC3339))
}
Example Output:
01HZYC2028WMB3NJ16WCV9Z9E0
2024-06-09 11:27:38.056 +0000 UTC
Performance Considerations and Recommendations
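While UUID v4 is purely random and cannot be sorted, UUID v7 and ULID provide identifiers that sort by timestamp. However, all of them carry performance costs compared to auto-incrementing numeric types. Their keys are larger: 16 bytes in binary form, or 26 to 36 characters as text, versus 4 or 8 bytes. In addition, random UUID v4 values scatter inserts across the index.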
If You Do Not Want to Use UUID or ULID
Even if UUIDs or ULIDs solve the auto-increment problems described above, they introduce issues of their own. To summarize:
- UUID v4:
  - Completely random values cannot be inserted in order, which degrades insert and index performance.
- UUID v7 / ULID:
  - Lower performance than auto-increment integers.
  - Leakage of the generation time (timestamp).
As a concrete example, consider a large-scale e-commerce site that handles millions of products.
Background:
- The database stores product details, user purchase history, reviews, and more. More data is added every day, and query performance is critical.
Challenges:
- Performance: Because so much data is added, database performance is critical, especially for frequent queries such as product searches and lookups of users' purchase histories.
- Privacy: Leaked timestamps of a user's purchases or reviews can reveal patterns in that user's behavior.
UUID v4 Issues:
- Random IDs are inserted at arbitrary positions in the index rather than appended at the end. This causes page splits and index fragmentation, which degrade insert and query performance.
UUID v7/ULID Issues:
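- IDs are inserted roughly in order, but they are larger than numeric keys (16 bytes in binary, or 26 to 36 characters as strings, versus 8 bytes for BIGINT), which increases index size. In MySQL's InnoDB, every secondary index also stores the primary key, so this overhead is repeated in each secondary index.
- Because each ID embeds a timestamp, anyone who sees it can deduce when the record was created, which is a risk to user privacy.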
Performance Concerns:
- UUID v4: Random writes can degrade performance due to reduced cache hit rates.
- UUID v7/ULID: Slightly better performance than UUID v4 but still less efficient than auto-increment numbers. Timestamps in UUID v7 and ULID can leak generation times.
Recommendation:
For large-scale applications, consider using auto-increment numeric types for primary keys to get the best performance. For public-facing identifiers, generate a separate random value, such as a UUID v4, so the internal key is never exposed. A UUID v7 or ULID in this role would still reveal when each record was created.
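The following is a minimal sketch of this two-identifier pattern, assuming a hypothetical `Product` model. The auto-increment `ID` stays internal, and a random `PublicID` is what appears in URLs and APIs. To avoid a third-party dependency, the sketch encodes 128 random bits with base64url instead of formatting them as a UUID. A UUID v4 from github.com/google/uuid would serve the same purpose. In either case, the public ID carries no timestamp.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// Product is a hypothetical model: ID is the internal auto-increment key used
// for joins and indexes; PublicID is the only identifier exposed externally.
type Product struct {
	ID       int64
	PublicID string
	Name     string
}

// newPublicID returns 128 random bits as a 22-character URL-safe string.
// Unlike UUID v7 or ULID, it embeds no timestamp.
func newPublicID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

func main() {
	pid, err := newPublicID()
	if err != nil {
		panic(err)
	}
	p := Product{ID: 42, PublicID: pid, Name: "Example"}
	fmt.Printf("/products/%s (internal id %d stays private)\n", p.PublicID, p.ID)
}
```

In the database, the public ID column would typically get its own unique index so that external lookups remain fast.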
Conclusion
Choosing the right identifier depends on your specific use case. While UUIDs and ULIDs offer unique advantages, they also come with performance and privacy trade-offs. By understanding these trade-offs, you can make informed decisions that balance security, performance, and usability.
For further reading and implementation details, refer to the official documentation and libraries for UUIDs and ULIDs. Implementing these identifiers thoughtfully can significantly enhance the robustness and security of your systems.