Introduction

As backend engineers, we're often tasked with building systems that can scale and handle numerous resources, users, and entities, each needing unique identification. In many cases, using sequential IDs (e.g., 1, 2, 3) seems like a straightforward solution, but this can quickly become problematic as your application grows and scales across distributed systems. This is where UUIDs (Universally Unique Identifiers) come in.

In this blog post, we'll explore:

What a UUID is
Real-world use cases of UUIDs
How to implement UUIDs in Python
Risks of ignoring UUIDs
Common mistakes when using UUIDs
Best practices for using UUIDs

What is a UUID?

A UUID (Universally Unique Identifier) is a 128-bit number used to uniquely identify information in computer systems. It is designed to be globally unique, meaning that UUIDs generated independently in different systems will not conflict.

A UUID looks like this:

66e69275-c6bc-800c-90a6-2f41cb991502

It consists of 32 hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12.

Real-World Use Cases for UUIDs

Database Keys in Distributed Systems: In systems where different databases or microservices need to generate unique IDs without communicating with each other, UUIDs ensure uniqueness. For example, in a distributed e-commerce platform, each service might independently generate order or transaction IDs, and UUIDs will avoid any collision.
Session IDs: UUIDs are commonly used to identify user sessions in web applications. They are particularly useful when you need to maintain session information without leaking sensitive or predictable data.
File or Resource Identifiers: When you need to track files, documents, or any resource across various platforms or databases, a UUID can be assigned to each resource for easy lookup without the risk of duplicates.
APIs and External References: Exposing sequential or easily guessable IDs (e.g., user/1, user/2) in an API can lead to privacy vulnerabilities. By using UUIDs (e.g., user/66e69275-c6bc-800c-90a6-2f41cb991502), you reduce the likelihood of users guessing and accessing resources that don't belong to them.

Implementing UUIDs in Python

Python’s uuid library makes it simple to generate and manage UUIDs. Here's how:

import uuid

# Generate a UUID
generated_uuid = uuid.uuid4()
print(f"Generated UUID: {generated_uuid}")

The uuid4() function generates a random UUID based on random or pseudo-random numbers, which is the most common variant used in web development.

Example: Using UUIDs as Primary Keys in Databases

When using databases like PostgreSQL, it’s common to use UUIDs as primary keys. Here’s how you could set it up in Python with SQLAlchemy:

from sqlalchemy import Column, String
from sqlalchemy.dialects.postgresql import UUID
import uuid
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True, nullable=False)
    username = Column(String, nullable=False)

# This will generate a UUID primary key for each new user.

In this example, we define the id field as a UUID, ensuring each user will have a unique identifier that won't conflict with other records, even across distributed databases.

Risks of Ignoring UUIDs

Ignoring UUIDs in favor of sequential or auto-incrementing IDs can pose several risks:

Security Vulnerabilities: Sequential IDs are predictable, making it easy for attackers to enumerate records and discover sensitive data. For example, if user IDs are sequential, an attacker might attempt to guess other user IDs and access unauthorized accounts.
Data Collisions: In a distributed system, relying on auto-incrementing integers can lead to ID collisions, especially when multiple services or databases are generating IDs without central coordination.
Data Migration and Merging Issues: When combining databases or migrating data across systems, having non-unique sequential IDs can cause conflicts. UUIDs avoid these problems by guaranteeing uniqueness.

Common Mistakes When Using UUIDs

Storing UUIDs as Strings: A common mistake is storing UUIDs as strings, which wastes space and can slow down queries, especially in large databases. Most modern databases, like PostgreSQL, have native UUID types that store UUIDs efficiently.

Wrong:
```
CREATE TABLE users (
    id VARCHAR(36) PRIMARY KEY
);
```
Right:
```
CREATE TABLE users (
    id UUID PRIMARY KEY
);
```
Not Using the Correct UUID Version: There are several versions of UUIDs (e.g., uuid1(), uuid3(), uuid4(), uuid5()), each suited to specific use cases. uuid4(), based on random numbers, is the most commonly used for generating unique IDs in web applications. Be mindful of which version you’re using and whether it fits your needs.
Ignoring Collision Possibilities: While UUIDs are designed to be unique, there’s a very small chance of collision. For most applications, the risk is negligible, but if you’re generating billions of UUIDs or operating in highly sensitive environments, you should implement collision detection.

Best Practices for Using UUIDs

Use UUIDs for External References: When exposing IDs in URLs or APIs, prefer UUIDs to sequential IDs. This enhances security and makes it harder for users to predict resource IDs.
Store UUIDs in Native Formats: Use the database's native UUID type to store UUIDs instead of strings. This reduces storage space and improves query performance.
Choose the Right UUID Version: In most cases, uuid4() (random-based UUID) is the best choice for generating unique identifiers in web applications. However, if you need deterministically generated UUIDs, you might consider uuid3() or uuid5() (namespace-based UUIDs).
Validate UUIDs: When accepting UUIDs from user input, always validate them to ensure they are properly formatted before processing. In Python, you can use UUID objects to check the validity of a string.

def is_valid_uuid(uuid_to_test, version=4):
    try:
        uuid_obj = uuid.UUID(uuid_to_test, version=version)
        return str(uuid_obj) == uuid_to_test
    except ValueError:
        return False

# Example usage
print(is_valid_uuid("66e69275-c6bc-800c-90a6-2f41cb991502"))  # True
print(is_valid_uuid("invalid-uuid-string"))  # False

Conclusion

UUIDs are powerful tools for generating unique identifiers in distributed systems and ensuring security in web applications. They help you avoid issues like data collisions, predictable ID attacks, and ID conflicts during database migrations. By understanding and following best practices for UUIDs, you can build more robust, scalable, and secure backend systems.

Remember to use the appropriate UUID version, store them correctly in your databases, and be mindful of their potential risks. With these tips, you’ll be well-equipped to handle UUIDs effectively in your projects!

Feel free to comment below if you have any questions or additional tips about UUIDs! Happy coding!

Understanding UUIDs: A Backend Engineer’s Guide for Junior Developers