vec2pg: Migrate to pgvector from Pinecone and Qdrant

Yuri - Sep 3 - - Dev Community

vec2pg is a CLI utility for migrating data from vector databases to Supabase, or any Postgres instance with pgvector.

example of result

⚡️ More on Launch Week

Objective

Our goal with https://github.com/supabase-community/vec2pg is to create an easy on-ramp to efficiently copy your data from various vector databases into Postgres with associated ids and metadata. The data loads into a new schema with a table name that matches the source e.g. vec2pg.<collection_name> . That output table uses https://github.com/pgvector/pgvector's's vector type for the embedding/vector and the builtin json type for additional metadata.

example dashboard

Once loaded, the data can be manipulated using SQL to transform it into your preferred schema.

When migrating, be sure to increase your Supabase project's disk size so there is enough space for the vectors.

Vendors

At launch we support migrating to Postgres from Pinecone and Qdrant. You can vote for additional providers in the issue tracker and we'll reference that when deciding which vendor to support next.

Throughput when migrating workloads is measured in records-per-second and is dependent on a few factors:

  • the resources of the source data
  • the size of your Postgres instance
  • network speed
  • vector dimensionality
  • metadata size

When throughput is mentioned, we assume a Small Supabase Instance, a 300 Mbps network, 1024 dimensional vectors, and reasonable geographic colocation of the developer machine, the cloud hosted source DB, and the Postgres instance.

Pinecone

vec2pg copies entire Pinecone indexes without the need to manage namespaces. It will iterate through all namespaces in the specified index and has a column for the namespace in its Postgres output table.

example second output

Given the conditions noted above, expect 700-1100 records per second.

Qdrant

The qdrant subcommand supports migrating from cloud and locally hosted Qdrant instances

instances

Again, with the conditions mentioned above, Qdrant collections migrate at between 900 and 2500 records per second.

Why Use Postgres/pgvector?

The main reasons to use Postgres for your vector workloads are the same reasons you use Postgres for all of your other data. Postgres is performant, scalable, and secure. Its a well understood technology with a wide ecosystem of tools that support needs from early stage startups through to large scale enterprise.

A few game changing capabilities that are old hat for Postgres that haven't made their way to upstart vector DBs include:

Backups

Postgres has extensive supports for backups and point-in-time-recovery (PITR). If your vectors are included in your Postgres instance you get backup and restore functionality for free. Combining the data results in one fewer systems to maintain. Moreover, your relational workload and your vector workload are transactionally consistent with full referential integrity so you never get dangling records.

Row Security

Row Level Security (RLS) allows you to write a SQL expression to determine which users are allowed to insert/update/select individual rows.

For example:

create policy "Individuals can view their own todos."
  on public.todos
  for select
  using
    ( ( select auth.uid() ) = user_id );

Enter fullscreen mode Exit fullscreen mode

Allows users of Supabase APIs to update their own records in the todos table.

Since vector is just another column type in Postgres, you can write policies to ensure e.g. each tenant in your application can only access their own records. That security is enforced at the database level so you can be confident each tenant only sees their own data without repeating that logic all over API endpoint code or in your client application.

Performance

pgvector has world class performance in terms of raw throughput and dominates in performance per dollar. Check out some of our prior blog posts for more information on functionality and performance:

Keep an eye out for our upcoming post directly comparing pgvector with Pinecone Serverless.

Next Steps

To get started, head over to the vec2pg GitHub Page, or if you're comfortable with CLI help guides, you can install it using pip:

pip install vec2pg

Enter fullscreen mode Exit fullscreen mode

If your current vector database vendor isn't supported, be sure to weigh in on the vendor support issue.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player