Most large language models (LLMs) are trained on vast amounts of general-purpose data, but there are situations where we need an LLM to answer questions grounded in our own data. In this example, I'll show you simple code that does this with open-source tools: Hugging Face models (GPT-Neo for generation), LangChain, and a Chroma vector database.
This is what we are going to build. We will write Python code that loads a PDF file from a local directory, splits it into text chunks, creates vector embeddings, loads a large language model (LLM) for text generation, and finally generates responses to our queries. Here is the step-by-step process.
Step 1: Clone the Repository
First, clone the repository using the following command:
git clone https://github.com/100daysofdevops/huggingface-book.git
Step 2: Navigate to the Project Directory
Change to the project directory:
cd huggingface-book
Step 3: Install the Required Dependencies
Install the required Python packages by running:
pip install -r requirements.txt
Step 4: Import the Necessary Libraries
import os
import torch
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.indexes import VectorstoreIndexCreator
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline
os: Provides a way to use operating system-dependent functionality.
torch: A library for machine learning, used here to check whether a GPU is available for faster computation (a short device-check sketch follows this list).
PyPDFLoader: Part of LangChain, used to load and process PDF documents.
RecursiveCharacterTextSplitter: Splits text into chunks based on specified separators.
HuggingFaceEmbeddings: Provides embeddings using models from the Hugging Face library.
Chroma: A vector store for storing and retrieving embeddings efficiently.
VectorstoreIndexCreator: Helps create an index from the text and embeddings.
pipeline: A Hugging Face utility to streamline the use of models for tasks like text generation.
HuggingFacePipeline: Wraps Hugging Face pipelines to integrate them with LangChain.
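As a quick sanity check of the device selection that torch enables (the same logic the text-generation pipeline uses later in this post), you can run a tiny snippet like this; the printed messages are just illustrative:
import torch

# GPU device 0 if CUDA is available, otherwise CPU (-1), matching the pipeline call later in this post
device = 0 if torch.cuda.is_available() else -1
print("Running on GPU" if device == 0 else "Running on CPU")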
Step 5: Create a function that loads the PDF, splits it into chunks, creates embeddings, and stores the result in ChromaDB.
def create_policy_index():
    print("Loading PDF file...")
    pdf_loader = PyPDFLoader('aws_ec2_faq.pdf')
    print("Splitting text from PDF...")
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", " ", ""],
        chunk_size=100,
        chunk_overlap=10)
    print("Loading Hugging Face model for embeddings...")
    embeddings_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    print("Creating ChromaDB vector store...")
    vector_store_creator = VectorstoreIndexCreator(
        text_splitter=text_splitter,
        embedding=embeddings_model,
        vectorstore_cls=Chroma)
    print("Creating index from loaded PDF...")
    index = vector_store_creator.from_loaders([pdf_loader])
    print("Index created successfully.")
    return index
Loading the PDF file: pdf_loader = PyPDFLoader('aws_ec2_faq.pdf'): This step loads the PDF file aws_ec2_faq.pdf, which we will use for further processing.
Splitting text from the PDF: text_splitter = RecursiveCharacterTextSplitter(…): This step splits the text extracted from the PDF into small chunks using different separators such as new lines and spaces. We use a chunk size of 100 with an overlap of 10 characters so that no information is lost at chunk boundaries (see the short sketch after this list).
Loading the Hugging Face embeddings model: embeddings_model = HuggingFaceEmbeddings(…): We use sentence-transformers/all-MiniLM-L6-v2 from Hugging Face to convert these chunks into vector embeddings.
Creating the ChromaDB vector store: vector_store_creator = VectorstoreIndexCreator(…): Using the text splitter and embeddings model created in the previous steps, we store these embeddings in a Chroma vector store.
Creating the index from the loaded PDF: index = vector_store_creator.from_loaders([pdf_loader]): In this step we create a searchable index from the PDF data.
Return: The function returns the created index.
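To get a feel for how the chunking parameters behave, here is a minimal, standalone sketch; the sample sentence is only an illustrative assumption, not text from the PDF:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Hypothetical sample text, used only to illustrate chunking
sample_text = "Amazon EC2 provides resizable compute capacity in the cloud. You pay only for the capacity you actually use."

splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " ", ""],
    chunk_size=100,
    chunk_overlap=10)

# Each chunk is at most ~100 characters; consecutive chunks share ~10 characters of overlap
for i, chunk in enumerate(splitter.split_text(sample_text)):
    print(i, repr(chunk))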
Step 6: Create a function load_llm that loads the GPT-Neo model.
def load_llm():
    print("Loading LLM...")
    llm_pipeline = pipeline(
        "text-generation",
        model="EleutherAI/gpt-neo-2.7B",
        device=0 if torch.cuda.is_available() else -1,
        max_new_tokens=50,
        clean_up_tokenization_spaces=True
    )
    llm_model = HuggingFacePipeline(pipeline=llm_pipeline)
    print("LLM model loaded successfully...")
    return llm_model
llm_pipeline = pipeline(…): This function loads the GPT-Neo-2.7B model from Hugging Face for text generation. It checks whether a GPU is available using torch.cuda.is_available() and, if so, assigns the computation to GPU device 0; otherwise, it falls back to the CPU (device=-1). The pipeline is also configured to generate up to 50 new tokens per response.
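Before wiring the pipeline into LangChain, you can sanity-check it on its own. Here is a minimal sketch; the prompt is just an example, and note that downloading GPT-Neo-2.7B requires several gigabytes of disk space:
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="EleutherAI/gpt-neo-2.7B",
    device=0 if torch.cuda.is_available() else -1,
    max_new_tokens=50)

# Example prompt; the model continues the text
print(generator("Amazon EC2 is")[0]["generated_text"])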
Step 7: Create a function retrieve_policy_response that loads the model and queries the index
def retrieve_policy_response(index, query):
    print(f"Processing query: {query}")
    llm = load_llm()
    response = index.query(question=query, llm=llm)
    print("Query processed successfully.")
    return response
This function loads the LLM (via load_llm) and queries the index created earlier. The index searches the vector store for the chunks that best match the question and then uses the LLM to generate a response.
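If you want to see what the retrieval step pulls out of the vector store before the LLM generates an answer, a small sketch like this can help; it assumes index is the object returned by create_policy_index, and the example question is only illustrative:
# Inspect the top matching chunks without running the LLM
docs = index.vectorstore.similarity_search("What is Amazon EC2?", k=3)
for doc in docs:
    print(doc.page_content)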
Step 8: Main function
if __name__ == '__main__':
    index = create_policy_index()
    llm = load_llm()
In the final step of the backend, the main block calls create_policy_index to create an index from the PDF and loads the LLM using the load_llm function.
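For a quick end-to-end test of the backend without the Streamlit frontend, you could extend this main block along these lines; the sample question is just an illustrative assumption, and since retrieve_policy_response loads the LLM internally, load_llm does not need to be called separately here:
if __name__ == '__main__':
    index = create_policy_index()
    # Hypothetical sample question, only to exercise the pipeline end to end
    answer = retrieve_policy_response(index, "What is Amazon EC2?")
    print(answer)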
Now that we have the backend ready (saved in a file named rag_backend.py), we will create a frontend using Streamlit. This application will allow users to ask questions related to AWS EC2. These questions will be processed by the backend RAG system we created, which will retrieve the most relevant answers from the vector index.
import streamlit as st
import rag_backend as backend

st.set_page_config(page_title="AWS EC2 FAQ using RAG")
new_title = '<p style="font-family:sans-serif; color:Green; font-size: 42px;">AWS EC2 FAQ using RAG</p>'
st.markdown(new_title, unsafe_allow_html=True)

if 'vector_index' not in st.session_state:
    with st.spinner("📀 Wait for the RAG magic..."):
        st.session_state.vector_index = backend.create_policy_index()

input_text = st.text_area("Enter your question", label_visibility="collapsed")
go_button = st.button("🔍 Get Answer", type="primary")

if go_button:
    with st.spinner("🔄 Processing your request... Please wait a moment."):
        response_content = backend.retrieve_policy_response(index=st.session_state.vector_index, query=input_text)
        st.write(response_content)
Imports Libraries: It imports Streamlit for the web interface and a custom rag_backend module for backend processing.
Page Setup: Sets the page title and displays a styled title on the web page.
Vector Index Management: Checks if a vector index for the document is already loaded in the session state. If not, it creates and stores the index for efficient querying.
User Interaction: Provides a text area for users to input their question and a button to submit it.
Query Processing: When the button is clicked, it retrieves the most relevant answer using the vector index and the RAG model, displaying the result on the page.
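Since retrieve_policy_response reloads the LLM on every call, one optional refinement is to cache the model in st.session_state alongside the vector index. This is only a sketch of the idea, not part of the original repository:
import streamlit as st
import rag_backend as backend

# Optional refinement (not in the original code): cache the LLM as well,
# so it is not reloaded from disk on every button click
if 'llm' not in st.session_state:
    with st.spinner("Loading the language model..."):
        st.session_state.llm = backend.load_llm()

# The cached index and LLM can then answer each question directly, e.g.:
# response_content = st.session_state.vector_index.query(question=input_text, llm=st.session_state.llm)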
Step 9: Execute the Code
Run the Streamlit frontend with the following command; the output is shown in the image below.
> streamlit run rag_frontend.py
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8503
Network URL: http://192.168.1.151:8503
NOTE: When I tried to execute this code, it was super slow on the CPU, but the performance was much better on the GPU.
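If you only have a CPU, one practical workaround (purely a suggestion, not something the original setup requires) is to swap in a much smaller model such as distilgpt2 while developing, then switch back to GPT-Neo-2.7B on a GPU machine. A hypothetical variant of load_llm might look like this:
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline

# Hypothetical CPU-friendly variant of load_llm(); distilgpt2 is far smaller than GPT-Neo-2.7B,
# so it responds faster on a CPU (at the cost of answer quality)
def load_llm_small():
    llm_pipeline = pipeline(
        "text-generation",
        model="distilgpt2",
        device=-1,  # CPU
        max_new_tokens=50)
    return HuggingFacePipeline(pipeline=llm_pipeline)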
📚 If you'd like to learn more about this topic, please check out my book, Building an LLMOps Pipeline Using Hugging Face:
https://pratimuniyal.gumroad.com/l/BuildinganLLMOpsPipelineUsingHuggingFace