As AI continues to shape the way we work and interact with technology, many businesses are looking for ways to leverage their own data within intelligent applications. If you've used tools like ChatGPT or Azure OpenAI, you're already familiar with how generative AI can improve processes and enhance user experiences. However, for truly customized and relevant responses, your applications need to incorporate your proprietary data.
This is where Retrieval-Augmented Generation (RAG) comes in, providing a structured approach to integrating data retrieval with AI-powered responses. With frameworks like LlamaIndex, you can easily build this capability into your solutions, unlocking the full potential of your business data.
What is RAG - Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is an architecture that enhances AI text generation by adding a retrieval component, which fetches relevant information, including from your own data, before the model generates a response. It consists of two main parts:
- Retriever: A dense retriever model (e.g., based on BERT) that searches a large corpus of documents to find relevant passages or information related to a given query.
- Generator: A sequence-to-sequence model (e.g., based on BART or T5) that takes the query and the retrieved text as input and generates a coherent, contextually enriched response.
The retriever finds relevant documents, and the generator uses them to create more accurate and informative responses. This combination allows the RAG model to leverage external knowledge effectively, improving the quality and relevance of the generated text.
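To make the two-part flow concrete, here is a toy, self-contained sketch in TypeScript. The "retriever" is simple keyword overlap over a tiny in-memory corpus and the "generator" just stitches the retrieved context into a string; a real system would use dense embeddings and an LLM, respectively.

```typescript
// Toy corpus standing in for a large document collection.
const corpus = [
  "Azure Container Apps hosts serverless containerized applications.",
  "LlamaIndex ingests documents and builds vector indexes.",
  "Managed Identity removes the need to store API keys.",
];

// Retriever: score documents by naive keyword overlap with the query.
function retrieve(query: string, topK: number): string[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return corpus
    .map((doc) => ({
      doc,
      score: terms.filter((t) => doc.toLowerCase().includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ doc }) => doc);
}

// Generator: a real RAG system would call an LLM conditioned on the
// query plus retrieved passages; here we just show the composed prompt.
function generate(query: string, context: string[]): string {
  return `Answer "${query}" using: ${context.join(" | ")}`;
}

const question = "Where is the serverless app hosted?";
console.log(generate(question, retrieve(question, 2)));
```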
How does LlamaIndex implement RAG?
To implement a RAG system using LlamaIndex, follow these general steps:
Data Ingestion:
- Load your documents into LlamaIndex.ts using a document loader such as `SimpleDirectoryReader`, which helps import data from various sources like PDFs, APIs, or SQL databases.
- Break large documents into smaller, manageable chunks using the `SentenceSplitter` (both steps are sketched below).
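Here is a minimal sketch of the ingestion step. The `./data` directory, the chunking values, and the `Settings.nodeParser` wiring are assumptions for illustration; exact import paths and APIs vary slightly between llamaindex versions.

```typescript
import {
  Document,
  SentenceSplitter,
  Settings,
  SimpleDirectoryReader,
} from "llamaindex";

// Split long documents into smaller chunks before indexing;
// chunkSize/chunkOverlap values here are illustrative, not prescriptive.
Settings.nodeParser = new SentenceSplitter({
  chunkSize: 512,
  chunkOverlap: 20,
});

// Load every supported file (PDFs, text, etc.) from a local folder.
const reader = new SimpleDirectoryReader();
const documents: Document[] = await reader.loadData({
  directoryPath: "./data", // assumed location of your source files
});
```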
Index Creation:
- Create a vector index of these document chunks using `VectorStoreIndex`, allowing efficient similarity searches based on embeddings (see the sketch below).
- Optionally, for complex datasets, use recursive retrieval techniques to manage hierarchically structured data and retrieve relevant sections based on user queries.
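Continuing from the `documents` loaded above, index creation is a single call; the embeddings come from whichever embedding model is configured (Azure OpenAI in this sample's case):

```typescript
import { VectorStoreIndex } from "llamaindex";

// Embed each chunk and store it in a vector index so that
// similarity search can find the chunks closest to a query.
const index = await VectorStoreIndex.fromDocuments(documents);
```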
Query Engine Setup:
- Convert the vector index into a query engine using `asQueryEngine`, with parameters such as `similarityTopK` to define how many top documents should be retrieved (see the sketch below).
- For more advanced setups, create a multi-agent system where each agent is responsible for specific documents, and a top-level agent coordinates the overall retrieval process.
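A minimal sketch of the basic single-engine setup, reusing the `index` from above; the question and the `similarityTopK` value of 3 are illustrative:

```typescript
// similarityTopK controls how many of the closest chunks are
// retrieved and passed to the model for each query.
const queryEngine = index.asQueryEngine({ similarityTopK: 3 });

const response = await queryEngine.query({
  query: "What do the ingested PDFs say about pricing?", // hypothetical question
});
console.log(response.toString());
```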
Retrieval and Generation:
- Implement the RAG pipeline by defining an objective function that retrieves relevant document chunks based on user queries.
- Use the `RetrieverQueryEngine` to perform the actual retrieval and query processing, with optional post-processing steps like re-ranking the retrieved documents using tools such as `CohereRerank` (sketched below).
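Here is a hedged sketch of that lower-level pipeline, reusing the `index` from earlier. It assumes a `COHERE_API_KEY` environment variable, and the `RetrieverQueryEngine` constructor shape (optional response synthesizer and filters passed as `undefined`) varies between llamaindex versions:

```typescript
import { CohereRerank, RetrieverQueryEngine } from "llamaindex";

// Retrieve a generous candidate set first...
const retriever = index.asRetriever();
retriever.similarityTopK = 10;

// ...then re-rank it down to the best few chunks with Cohere.
const reranker = new CohereRerank({
  apiKey: process.env.COHERE_API_KEY!, // assumption: key provided via env
  topN: 3,
});

// Wire the retriever and the post-processor into a query engine.
const engine = new RetrieverQueryEngine(retriever, undefined, undefined, [
  reranker,
]);

const answer = await engine.query({ query: "Summarize the key terms." });
console.log(answer.toString());
```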
For a practical example, we have provided a sample application to demonstrate a complete RAG implementation using Azure OpenAI.
Practical RAG Sample Application
We'll now focus on building a RAG application using LlamaIndex.ts (the TypeScript implementation of LlamaIndex) and Azure OpenAI, and deploying it as a serverless web app on Azure Container Apps.
Requirements to Run the Sample
- Azure Developer CLI (azd): A command-line tool to easily deploy your entire app, including backend, frontend, and databases.
- Azure Account: You'll need an Azure account to deploy the application. Get a free Azure account with some credits to get started.
You will find the getting started project on GitHub. We recommend forking this template so you can freely edit it when needed.
High-Level Architecture
The getting started application is built on the following architecture:
- Azure OpenAI: The AI provider that processes the user's queries.
- LlamaIndex.ts: The framework that helps ingest, transform, and vectorize content (PDFs) and create a search index.
- Azure Container Apps: The container environment where the serverless application is hosted.
- Azure Managed Identity: Secures access between the deployed services and eliminates the need to handle credentials and API keys.
For more details on what resources are deployed, check the `infra` folder available in all our samples.
Example User Workflows
The sample application contains logic for two workflows:
- Data Ingestion: Data is fetched, vectorized, and search indexes are created. If you want to add more files, such as PDFs or Word documents, this is where you should add them. Run the ingestion with:

  ```sh
  npm run generate
  ```

- Serving Prompt Requests: The app receives user prompts, sends them to Azure OpenAI, and augments these prompts using the vector index as a retriever (a sketch of this flow follows the list).
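To make the prompt-serving workflow concrete, here is a hedged sketch of an HTTP handler; the Express route, request shape, and port are illustrative rather than the sample's actual code, and `index` is the vector index produced by the ingestion step:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical route; the sample app's real endpoint may differ.
app.post("/api/chat", async (req, res) => {
  // Augment the user's prompt with the most relevant indexed chunks,
  // then let Azure OpenAI produce the grounded answer.
  const queryEngine = index.asQueryEngine({ similarityTopK: 3 });
  const result = await queryEngine.query({ query: req.body.prompt });
  res.json({ answer: result.toString() });
});

app.listen(3000, () => console.log("Listening on http://localhost:3000"));
```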
Running the Sample
Before running the sample, ensure you have provisioned the necessary Azure resources.
To run the GitHub template in GitHub Codespaces, open the repository in a new Codespace.
In your Codespaces instance, sign in to your Azure account from your terminal:

```sh
azd auth login
```
Provision, package, and deploy the sample application to Azure using a single command:
```sh
azd up
```
To run and try the application locally, install the npm dependencies and run the app:
```sh
npm install
npm run dev
```
The app will run on port 3000 in your Codespaces instance or at http://localhost:3000 in your browser.
Conclusion
This guide demonstrated how to build a serverless RAG (Retrieval-Augmented Generation) application using LlamaIndex.ts and Azure OpenAI, deployed on Microsoft Azure. By following this guide, you can leverage Azure's infrastructure and LlamaIndex's capabilities to create powerful AI applications that provide contextually enriched responses based on your data.
We're excited to see what you build with this getting started application. Feel free to fork it and star the GitHub repository to receive the latest updates and features.