Before running the code, install the required packages by executing the following commands in your terminal:
```shell
pip install langchain_community
pip install pypdf
```
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Path to the PDF file to load.
FILE_PATH = "c:/work/Test01.pdf"
loader = PyPDFLoader(file_path=FILE_PATH)

# Split text into chunks of at most 500 characters,
# with up to 50 characters of overlap between consecutive chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Load the PDF and split it into a list of chunked documents.
documents = loader.load_and_split(text_splitter)

# Print the text of each chunk, followed by a blank line.
for document in documents:
    print(document.page_content + "\n")
```
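To get a feel for what `chunk_size=500` and `chunk_overlap=50` mean, here is a minimal pure-Python sketch of overlapping character windows. It is a simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally: the real splitter tries to break on separators such as paragraphs and line breaks, so its chunks are usually shorter than the limit. The function name `split_with_overlap` and the sample text are hypothetical and exist only for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size character windows that overlap.

    Illustration only: this ignores paragraph, line, and word boundaries
    and cuts purely by character count.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical sample: 1200 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample, chunk_size=500, chunk_overlap=50)

for index, chunk in enumerate(chunks):
    print(index, len(chunk))
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last 50 characters of one chunk reappear at the start of the next. That repetition is what helps keep a sentence that straddles a chunk boundary readable in at least one chunk.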