Name: Vector Databases for generative AI applications
Rating: 4.1 (3808 reviews)
Author: abhirockzz

How to overcome LLM limitations using Vector databases and RAG

This blog is based on my session at GIDS 2024. If you attended it, thank you for coming and I hope you found it useful! If not, well, you have the resources and links anyway. I have written out the talk so that you can follow along with the slides if you need more context.

Hopefully the folks at GIDS will publish the video as well. I will add the link once it's available.

Key info

Slides available here
GitHub repository - code and instructions on how to get the demo up and running

Summarised version of the talk

I had 30 minutes, so I kept it short and sweet!

Setting the context

Foundation models (FMs) are the heart of generative AI. These models are pre-trained on vast amounts of data. Large language models (LLMs) are a class of FMs, for instance, the Claude family from Anthropic, Llama from Meta, etc.
You generally access these using dedicated platforms. One example is Amazon Bedrock, a fully managed service with a wide range of models accessible via APIs. These models are pretty powerful, and they can be used standalone to build generative AI apps.

So, why do we need vector databases?

To better understand this, let's take a step back and talk about the limitations of LLMs. I will highlight a few common ones.

LLM Limitations

Knowledge cut-off: The knowledge of these models is often limited to the data that was current at the time they were pre-trained or fine-tuned.
Hallucination: Sometimes, these models provide an incorrect response, quite “confidently”.

Another one is the lack of access to external data sources.

Think about it: you can set up an AWS account and start using models on Amazon Bedrock. But if you want to build generative AI applications that are specific to your business needs, you need domain or company-specific private data (for example, a customer service chatbot that can access customer details, order info, etc.).

Now, it's possible to train or fine-tune these models with your data, but it's neither trivial nor cost-effective. There are techniques to work around these constraints, though. RAG (discussed later) is one of them, and vector databases play a key role in it.

Dive into Vector Databases

Before we get into it, let's understand...

What is a Vector?

In simple terms, vectors are numerical representations of text.

There is input text (also called a prompt)
You pass it through something called an embedding model - think of it as a stateless function
You get an output, which is an array of floating point numbers

What’s important to understand is that vectors capture semantic meaning. So they can be used for relevance or context-based search, rather than simple text search.
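To make "closeness in meaning" a bit more concrete, here is a minimal sketch (not from the talk) of cosine similarity, one common way to compare two vectors. The three-dimensional vectors are hand-written, hypothetical values for illustration only. Real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written "embeddings" (assumed values, not model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related concepts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # prints True
```

A vector database does essentially this comparison at scale, usually with approximate nearest-neighbour indexes rather than a brute-force loop.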

I tend to categorise vector databases into two types:

Vector data type support within existing databases, such as PostgreSQL, Redis, OpenSearch, MongoDB, Cassandra, etc.
Specialised vector databases, like Pinecone, Weaviate, Milvus, Qdrant, ChromaDB, etc.

This field is also moving very fast and I’m sure we will see a lot more in the near future!

Now you can run these specialised vector stores on AWS, via their dedicated cloud offerings. But I want to quickly give you a glimpse of the choices in terms of the first category that I referred to.

Several native AWS databases support vector search.

This includes:

Amazon OpenSearch Service
Amazon Aurora with PostgreSQL compatibility
Amazon DocumentDB (with MongoDB compatibility)
Amazon MemoryDB for Redis, which has vector search in preview at the time of writing

Here is a simplified view of where vector databases sit in generative AI solutions:

You take your domain-specific data and split it up into chunks
Pass the chunks through an embedding model - this gives you the vectors or embeddings
Store these embeddings in a vector database
And then there are applications that execute semantic search queries and combine the results in various ways (RAG being one of them)
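The flow above can be sketched end to end in a few lines of Python. This is only an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in (it matches shared words, not meaning, so it is not a real embedding model), and `InMemoryVectorStore` is a hypothetical class that plays the role of the vector database. The demo in the GitHub repository uses OpenSearch and LangChain instead.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical, minimal stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk_text, vector) pairs

    def add(self, document):
        # Ingestion: chunk the document, embed each chunk, store both.
        for piece in chunk(document):
            self.items.append((piece, embed(piece)))

    def search(self, query, top_n=2):
        # Query: embed with the same function, rank chunks by similarity.
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: similarity(query_vector, item[1]), reverse=True)
        return [piece for piece, _ in ranked[:top_n]]


store = InMemoryVectorStore()
store.add("Orders ship within two business days.")
store.add("Refunds are issued to the original payment method.")
store.add("Support is available on weekdays from nine to five.")

print(store.search("When do orders ship?", top_n=1))
# prints ['Orders ship within two business days.']
```

Swapping `embed` for a real embedding model and the list for a real vector store changes the quality and scale, but not the shape of the pipeline.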

Demo 1 (of 3) - Semantic Search with OpenSearch and LangChain

Find the details here - https://github.com/abhirockzz/langchain-opensearch-rag

RAG – Retrieval Augmented Generation

We covered the limitations of LLMs: knowledge cut-off, hallucination, no access to internal data, etc. Of course, there are multiple ways to overcome these.

Prompt-engineering techniques: zero-shot, few-shot, etc. Sure, this is cost-effective, but how would it apply to domain-specific data?
Fine-tuning: Take an existing LLM and train it using a specific dataset. But what about the infra and costs involved? Do you want to become a model development company or focus on your core business?

These are just a few examples.

The RAG technique adopts a middle ground.

There are two key parts to a RAG workflow:
Part 1: Data ingestion is where you take your source data (PDF, text, images, etc.), break it down into chunks, pass them through an embedding model and store the resulting embeddings in the vector database.
Part 2: This involves the end-user application (e.g. a chatbot). The user sends a query, and this input is converted to a vector embedding using the same embedding model that was used for the source data. We then execute a semantic or similarity search to get the top-N closest results.

That’s not all.

Part 3: These results, also referred to as “context”, are then combined with the user input and a specialised prompt. Finally, this is sent to an LLM. Note that this is not the embedding model, it is a large language model. The added context in the prompt helps the model provide a more accurate and relevant response to the user’s query.
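As a rough sketch of the prompt augmentation step, the snippet below combines retrieved chunks with the user's question. The template wording and the `build_rag_prompt` helper are my own assumptions for illustration. Frameworks such as LangChain ship their own prompt templates, and the call to the LLM itself is left out here.

```python
# Hypothetical prompt template (the wording is an assumption, not from the talk).
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say that you don't know.

Context:
{context}

Question: {question}
"""


def build_rag_prompt(question, retrieved_chunks):
    """Combine the top-N search results (the context) with the user's question."""
    context = "\n".join(f"- {chunk_text}" for chunk_text in retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# In a real application, these chunks would come from the vector database search.
prompt = build_rag_prompt(
    "When do orders ship?",
    ["Orders ship within two business days."],
)
print(prompt)  # this string is what gets sent to the LLM
```

Telling the model to rely only on the supplied context is one common way to reduce hallucination, though it does not eliminate it.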

Demo 2 (of 3) - RAG with OpenSearch and LangChain

Find the details here - https://github.com/abhirockzz/langchain-opensearch-rag

Fully-managed RAG experience - Knowledge Bases for Amazon Bedrock

Another approach is to have a managed solution take care of the heavy lifting. For example, if you use Amazon Bedrock, then Knowledge Bases can make RAG easier and more manageable. It supports the entire RAG workflow, from ingestion to retrieval and prompt augmentation.

And it supports multiple vector stores to store vector embedding data.

Demo 3 (of 3) - Fully-managed RAG with Knowledge Bases for Amazon Bedrock

Find the details here - https://github.com/abhirockzz/langchain-opensearch-rag

Now, how do we build RAG applications using this?

For application integration, this is exposed via two APIs:

RetrieveAndGenerate: Call the API, get the response - that's it. Everything (query embedding, semantic search, prompt engineering, LLM orchestration) is handled for you!
Retrieve: For custom RAG workflows, where you simply extract the top-N results (like semantic search) and integrate the rest as per your choice.
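Here is a hedged sketch of what calling these two APIs could look like with the AWS SDK for Python (boto3) and its `bedrock-agent-runtime` client. The helper function names are hypothetical, and the knowledge base ID and model ARN are placeholders you would replace with your own values. Treat it as a starting point and check the current boto3 documentation, since it is not the exact code from the demo.

```python
def retrieve_and_generate_request(question, knowledge_base_id, model_arn):
    """Build keyword arguments for the RetrieveAndGenerate API."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def retrieve_request(question, knowledge_base_id, top_n=3):
    """Build keyword arguments for the Retrieve API (top-N semantic search only)."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_n}
        },
    }


def ask(question, knowledge_base_id, model_arn):
    """Fully managed RAG: returns the generated answer text."""
    import boto3  # imported here so the request builders work without the AWS SDK

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(
        **retrieve_and_generate_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"]


def search(question, knowledge_base_id, top_n=3):
    """Custom RAG: returns only the retrieved chunk texts."""
    import boto3

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**retrieve_request(question, knowledge_base_id, top_n))
    return [result["content"]["text"] for result in response["retrievalResults"]]


# Placeholder ID, not a real knowledge base; no AWS call is made here.
request = retrieve_request("When do orders ship?", "KB_ID_PLACEHOLDER", top_n=2)
print(request)
```

With `search`, you get the raw chunks back and can build your own prompt and pick your own model. With `ask`, the service does all of that for you.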

Where do I learn more?

Documentation is a great place to start! Specifically, Knowledge bases
Code samples for Amazon Bedrock
Lots of content and practical solutions on the generative AI community space!
Ultimately, there is no replacement for hands-on learning. Head over to Amazon Bedrock and start building!

Wrap up

And that's it. Like I said, I had 30 minutes and I kept it short and sweet! This area is evolving very quickly. This includes vector databases, LLMs (there is a new one every week, it feels like the JavaScript frameworks era!) and frameworks (like LangChain, etc.). It's hard to keep up, but remember, the fundamentals are the same. The key is to grasp them, and hopefully this helps with some of it.

Happy Building!