Setup Guide: pgvector with Docker

yukaty · Nov 5 · Dev Community

Have you ever wondered how Netflix suggests movies you might like, or how Spotify creates personalized playlists? Features like these often rely on something called "vector similarity search", a powerful way to find related content. In this guide, we'll use Docker to set up a PostgreSQL database with the pgvector extension, the foundation for building similar features.


What is Vector Search?

When an AI model analyzes content (text, images, or products), it turns each item into a list of numbers, called a "vector" or "embedding", that represents the item's characteristics. Similar items end up with vectors that are close to each other. pgvector lets PostgreSQL store these vectors and efficiently find the ones closest to a given vector.
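To make "close to each other" concrete, here is a small Python sketch that compares toy vectors with cosine similarity, one common way to measure closeness. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only.
action_movie = [0.9, 0.1, 0.2]
another_action_movie = [0.8, 0.2, 0.1]
romance_movie = [0.1, 0.9, 0.3]

print(cosine_similarity(action_movie, another_action_movie))  # close to 1
print(cosine_similarity(action_movie, romance_movie))         # much lower
```

The two action movies point in nearly the same direction, so they score close to 1, while the romance movie scores far lower. A vector database does the same kind of comparison, just over many rows and far more dimensions.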

If you're not familiar with machine learning, don't worry! You can get embeddings from popular AI APIs such as OpenAI's without deep AI knowledge. These embeddings are the building blocks of recommendation engines and similarity search features.
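If you go the API route, fetching an embedding is a single HTTP request. Below is a minimal standard-library sketch, assuming OpenAI's /v1/embeddings endpoint and the text-embedding-3-small model, whose default output has 1536 dimensions; check OpenAI's API reference for current model names and limits. The function names build_embedding_request and fetch_embedding are hypothetical, and in real projects OpenAI's official Python SDK is the more common choice.

```python
import json
import urllib.request

# Assumed endpoint; see OpenAI's API reference for the current URL and models.
OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(text: str, api_key: str,
                            model: str = "text-embedding-3-small") -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking for the embedding of text."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_embedding(text: str, api_key: str) -> list[float]:
    """Send the request and return the first embedding from the JSON response."""
    with urllib.request.urlopen(build_embedding_request(text, api_key)) as resp:
        payload = json.load(resp)
    return payload["data"][0]["embedding"]
```

Usage would look like fetch_embedding("A heist movie with a twist ending", os.environ["OPENAI_API_KEY"]), which returns a plain list of floats ready to store in PostgreSQL. Keeping request building separate from sending makes it easy to inspect the request without spending API credits.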

Let's get started! 🚀


Prerequisites

Make sure Docker is installed on your computer, with Docker Compose available (Docker Desktop includes both).


Step-by-Step Setup

1. Create docker-compose.yml

Create a docker-compose.yml file in your project root to define the PostgreSQL container.

services:
  db:
    image: pgvector/pgvector:pg17 # PostgreSQL with pgvector support
    container_name: pgvector-db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
      POSTGRES_DB: example_db
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./postgres/schema.sql:/docker-entrypoint-initdb.d/schema.sql

volumes:
  pgdata: # Stores data outside the container to ensure persistence
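Programs on your own machine reach the database through the published port, so their connection settings come straight from the environment and ports entries above. Here is a small sketch of a hypothetical helper, build_dsn, that turns those values into a standard postgresql:// connection URI, a format that libpq-based clients such as psql and psycopg accept. The password is the example value from this file; pick a stronger one for anything beyond local development.

```python
from urllib.parse import quote


def build_dsn(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a postgresql:// URI, percent-encoding user, password, and database name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


# Values from the environment and ports sections of docker-compose.yml.
dsn = build_dsn("postgres", "password", "localhost", 5432, "example_db")
print(dsn)  # postgresql://postgres:password@localhost:5432/example_db
```

Percent-encoding matters once you switch to a real password: a character such as @ or / would otherwise be read as part of the URI's structure.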

2. Define Database Schema (schema.sql)

Create a postgres directory in the project root and add a schema.sql file to it to define your initial schema. This example enables the pgvector extension and creates a table for storing items together with their vector embeddings.

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create example table
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    metadata JSONB,
    embedding vector(1536) -- 1536 dimensions (e.g. OpenAI text-embedding-3-small); match your model
);
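pgvector accepts vectors as text in the form [1,2,3], and an insert fails if the number of values doesn't match the size declared in vector(1536). Below is a sketch of a hypothetical helper, to_vector_literal (not part of pgvector), that checks the length and rejects NaN or infinite values, which pgvector also refuses, before formatting the list. If you use psycopg, the separate pgvector Python package offers adapters that handle this conversion for you.

```python
import math

# Must match the vector(1536) column in schema.sql.
EXPECTED_DIMENSIONS = 1536


def to_vector_literal(embedding: list[float], dims: int = EXPECTED_DIMENSIONS) -> str:
    """Format floats as pgvector text input like '[0.1,0.2,0.3]' after basic checks."""
    if len(embedding) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(embedding)}")
    if not all(math.isfinite(x) for x in embedding):
        # pgvector rejects NaN and infinite values.
        raise ValueError("embedding contains NaN or infinite values")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


print(to_vector_literal([0.1, 0.2, 0.3], dims=3))  # [0.1,0.2,0.3]
```

Pass the resulting string as a query parameter rather than pasting it into SQL, for example with a statement like INSERT INTO items (name, embedding) VALUES (%s, %s::vector).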

3. Start Docker Compose

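Run Docker Compose to pull the pgvector image and start the PostgreSQL container in the background. On the first start, PostgreSQL also runs schema.sql; give it a few seconds to initialize before connecting.

docker compose up -d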

4. Verify the Database and Extensions

Once the container is running, connect to PostgreSQL to verify the setup.

docker exec -it pgvector-db psql -U postgres -d example_db

In the PostgreSQL shell, run:

-- Check installed extensions
\dx

-- Check if your table exists
\dt

Using Your Vector Database

Here's a simple example of how to find similar items:

-- Find the 5 items closest to a query vector (<-> is Euclidean/L2 distance)
SELECT id, name, metadata
FROM items
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 5;

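Replace [0.1, 0.2, ...] with a real query vector, for example an embedding from OpenAI's API. It must have the same number of dimensions as the column (1536 here), or PostgreSQL will raise an error. The <-> operator orders by Euclidean (L2) distance; pgvector also provides <=> for cosine distance and <#> for negative inner product. Because OpenAI's embeddings are normalized to length 1, <-> and <=> rank them in the same order.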
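To see what the query computes, here is a plain-Python sketch of the same idea: measure the distance from every row to the query vector, then keep the k smallest. It illustrates the math only and is not how pgvector is implemented. Without an index, PostgreSQL likewise scans every row for an exact result, while pgvector's HNSW or IVFFlat indexes trade a little accuracy for speed. The toy rows and the nearest helper are hypothetical.

```python
import heapq
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance: what pgvector's <-> operator measures (assumes equal lengths)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: what pgvector's <=> operator measures (assumes equal lengths)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest(query: list[float], rows: list[dict], k: int = 5, distance=l2_distance) -> list[dict]:
    """Exact k-nearest-neighbour search, like ORDER BY embedding <-> query LIMIT k."""
    return heapq.nsmallest(k, rows, key=lambda row: distance(row["embedding"], query))


# Toy rows standing in for the items table (2 dimensions instead of 1536).
items = [
    {"id": 1, "name": "Space opera", "embedding": [0.9, 0.1]},
    {"id": 2, "name": "Romantic comedy", "embedding": [0.1, 0.9]},
    {"id": 3, "name": "Sci-fi thriller", "embedding": [0.8, 0.3]},
]

for row in nearest([1.0, 0.0], items, k=2):
    print(row["id"], row["name"])
```

Swapping distance=cosine_distance mirrors switching the SQL from <-> to <=>.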


Troubleshooting

Error: Port 5432 already in use

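Another PostgreSQL server (or another program) is probably already using port 5432. Change the host side of the port mapping in docker-compose.yml, for example to "5433:5432", and connect to port 5433 from your machine. Commands run through docker exec are unaffected because they connect from inside the container.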
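If you'd rather check before editing the file, here is a rough sketch of a hypothetical is_port_free helper that tries to bind a TCP socket to the port. Treat the result as a hint only: behavior varies by operating system and by which interface the other program is listening on, and a port that is free now could be taken by the time Docker starts.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to (host, port) at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False
    return True


# Pick the first free candidate for the host side of the "HOST:5432" mapping.
candidates = [5432, 5433, 5434]
print(next((p for p in candidates if is_port_free(p)), None))
```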

Can't connect to database

Check if the container is up.

  docker ps

Database not initializing properly

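schema.sql, like every script in /docker-entrypoint-initdb.d, runs only when the data volume is empty, so changes to it are ignored on later starts. Remove the volume and start again. Warning: this deletes all data in the database.

  docker compose down -v  # Stop the container and remove its data volume
  docker compose up -d    # Start fresh; schema.sql runs again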

No idea what's wrong

Check the container logs.

  docker compose logs db

Next Steps

Now that your vector database is set up, you can:

  • Generate embeddings using AI services like OpenAI
  • Store your data with its embeddings
  • Build search features that find similar items


Spot any mistakes or have a better way? Please leave a comment below! 🙌