Hii Hiiiii! 👋

Are you stuck between AI and AI?? I am too! But we have to go with the flow, or we won't be able to make a lasting impact!

This blog is about one such AI topic that is making a promising impact in the tech world. Whether you are a beginner or an expert, if you work in tech or have an interest in it, you should know about this.

In this blog, I'll cover Retrieval-Augmented Generation (RAG) in detail and build a quick prompt-based example using an exceptional framework, LLMWARE.

Let's start... 3️⃣... 2️⃣... 1️⃣... 🤓

What is RAG??

Let's start with the basics so that you can easily understand RAG.

So, first of all, what is AI? AI, or Artificial Intelligence, is the science and engineering of making intelligent machines.

Inside AI, there are so many subsets. Take a look at the diagram below:

Now, let's discuss another chaotic field: Machine Learning (ML). As the diagram shows, ML is a subset of AI. ML focuses on building computer systems that learn from data. In other words, it is the part of AI that trains a piece of software, called a model, to make useful predictions or generate content from data.

Fun Fact: An LLM (Large Language Model) is a type of AI program built on machine learning. LLMs are trained on huge sets of data, hence the name "large."



AI
├── ML (Machine Learning)
│   ├── LLM (Large Language Models)
│   └── RAG (Retrieval-Augmented Generation)

But What is RAG⁉️

RAG, or Retrieval-Augmented Generation, is a groundbreaking AI framework (in the same way that Next.js is a framework for JavaScript) for improving the quality of LLM-generated responses by grounding the model in external sources of knowledge.

RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
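
To make the "retrieve, then augment, then generate" idea concrete, here is a tiny sketch in plain Python. It is only an illustration under my own assumptions: the knowledge base, the word-overlap scoring, and the function names (tokenize, retrieve, build_prompt) are all hypothetical, and real RAG systems usually retrieve with embeddings and a vector store instead.

```python
import re

# Hypothetical in-memory "knowledge base"; a real system would use a document store.
KNOWLEDGE_BASE = [
    "Invoice 0001 from Services Vendor Inc. has a total amount of $22,500.00.",
    "Japan recorded a trade surplus of 62.4 billion yen for September.",
    "Payment on invoice 0001 is due within 30 days.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages, top_k=1):
    """Retrieval step: rank passages by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, passages):
    """Augmentation step: place the retrieved passages in front of the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query = "What was the trade surplus of Japan?"
top_passages = retrieve(query, KNOWLEDGE_BASE)
prompt = build_prompt(query, top_passages)
print(prompt)  # the generation step would send this prompt to an LLM
```

The model itself is never retrained here. Only the prompt changes, which is why RAG is a comparatively cheap way to bring in domain knowledge.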

I hope the RAG concept is now somewhat clear. To make it clearer, let's jump to the example, where we will create a simple project to test prompt-based RAG models using LLMWARE as the framework.

If you don't know about LLMWARE, please read the article below. It's the only prerequisite for building the project! 😝

LLMware.ai 🤖: An Ultimate Python Toolkit for Building LLM Apps

Rohan Sharma ・ Aug 29

#python #llm #rag #ai

Let's Prompt Model with LLMWare.ai 🤖

llmware provides a unified framework for building LLM-based applications (e.g., RAG, Agents), using small, specialized models that can be deployed privately, integrated with enterprise knowledge sources safely and securely, and cost-effectively tuned and adapted for any business process.

In this example, we will illustrate:

Discovery - how to discover models in the llmware ModelCatalog.
Load Model - how to load a selected model from the catalog.
Prompt - how to create a basic prompt and run an inference with the model.

So let's start 🟩:

1️⃣ Install llmware as explained in the article linked above, or simply run this command in the terminal:



 pip3 install llmware
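
If you want to double-check that the install worked before moving on, a small sketch like this can help. It only uses Python's standard importlib module. The helper name is_installed is my own and is not part of llmware.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be found by the current Python interpreter."""
    return importlib.util.find_spec(package_name) is not None

if is_installed("llmware"):
    print("llmware is installed")
else:
    print("llmware not found - try: pip3 install llmware")
```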

2️⃣ You probably don't have any test questions for this project yet, so you can use the ones below:



def hello_world_questions():

    """ This is a set of useful test questions to do a 'hello world' but there is nothing special about the
    questions - please feel free to edit and ask your own queries with your own context passages.

    --if you are using one of the llmware models, please take note that the models have been trained to answer
    based on the information provided, so if you ask a question without passing any context passage, then
    don't be surprised if the model responds with 'Not Found.' """

    test_list = [

    {"query": "What is the total amount of the invoice?",
     "answer": "$22,500.00",
     "context": "Services Vendor Inc. \n100 Elm Street Pleasantville, NY \nTO Alpha Inc. 5900 1st Street "
                "Los Angeles, CA \nDescription Front End Engineering Service $5000.00 \n Back End Engineering"
                " Service $7500.00 \n Quality Assurance Manager $10,000.00 \n Total Amount $22,500.00 \n"
                "Make all checks payable to Services Vendor Inc. Payment is due within 30 days. "
                "If you have any questions concerning this invoice, contact Bia Hermes. "
                "THANK YOU FOR YOUR BUSINESS!  INVOICE INVOICE # 0001 DATE 01/01/2022 FOR Alpha Project P.O. # 1000"},

    {"query": "What was the amount of the trade surplus?",
     "answer": "62.4 billion yen ($416.6 million)",
     "context": "Japan’s September trade balance swings into surplus, surprising expectations "
                "Japan recorded a trade surplus of 62.4 billion yen ($416.6 million) for September, "
                "beating expectations from economists polled by Reuters for a trade deficit of 42.5 "
                "billion yen. Data from Japan’s customs agency revealed that exports in September "
                "increased 4.3% year on year, while imports slid 16.3% compared to the same period "
                "last year. According to FactSet, exports to Asia fell for the ninth straight month, "
                "which reflected ongoing China weakness. Exports were supported by shipments to "
                "Western markets, FactSet added. — Lim Hui Jie"}
    ]

    return test_list
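
Since the models answer from the context passage, it is worth checking your own test entries before running them. The sketch below is one possible way to do that: it confirms each entry has the three keys used in this example and that the gold answer really appears in the context. The helper name check_test_entries and the sample list are hypothetical, made up just for this illustration.

```python
def check_test_entries(test_list):
    """Return a list of problems found in the entries; an empty list means all is well."""
    problems = []
    for i, entry in enumerate(test_list):
        # every entry needs the three keys used by the prompting script
        missing = [key for key in ("query", "answer", "context") if key not in entry]
        if missing:
            problems.append(f"entry {i}: missing keys {missing}")
            continue
        # the gold answer should be recoverable from the context passage
        if entry["answer"] not in entry["context"]:
            problems.append(f"entry {i}: answer not found in context")
    return problems

# Hypothetical sample: the second entry deliberately lacks a "context" key.
sample = [
    {"query": "What is the total amount of the invoice?",
     "answer": "$22,500.00",
     "context": "Quality Assurance Manager $10,000.00 \n Total Amount $22,500.00 \n"},
    {"query": "Who is the contact person?",
     "answer": "Bia Hermes"},
]
print(check_test_entries(sample))
```

You could run the same check on the real list with check_test_entries(hello_world_questions()).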

3️⃣ Create a Python file, say fast_start_rag.py, and paste in the test-question function from step 2, followed by the code below:



import time
from llmware.prompts import Prompt
from llmware.models import ModelCatalog

def fast_start_prompting(model_name):

    """ This is the main example script - it loads the question list, loads the model, and executes the prompts. """

    t0 = time.time()

    # load in the 'hello world' test questions above
    test_list = hello_world_questions()

    print(f"\n > Loading Model: {model_name}...")

    prompter = Prompt().load_model(model_name)

    t1 = time.time()
    print(f"\n > Model {model_name} load time: {t1-t0} seconds")

    for i, entries in enumerate(test_list):
        print(f"\n{i+1}. Query: {entries['query']}")

        #   run the prompt
        output = prompter.prompt_main(entries["query"],
                                      context=entries["context"],
                                      prompt_name="default_with_context",
                                      temperature=0.30)

        #   'output' is a dictionary with two keys - 'llm_response' and 'usage'
        #   --'llm_response' is the output from the model
        #   --'usage' is a dictionary with the usage stats

        llm_response = output["llm_response"].strip("\n")
        print(f"LLM Response: {llm_response}")

        #   note: the 'gold answer' is the answer we provided above in the hello_world question list
        print(f"Gold Answer: {entries['answer']}")

        print(f"LLM Usage: {output['usage']}")

    t2 = time.time()
    print(f"\nTotal processing time: {t2-t1} seconds")

    return 0


if __name__ == "__main__":

    #   Step 1 - we will pick a model from the ModelCatalog

    #   A few useful methods to discover and display a list of available models...

    #   all generative models
    llm_models = ModelCatalog().list_generative_models()

    #   if you only want to see the local models
    llm_local_models = ModelCatalog().list_generative_local_models()

    #   to see only the open source models
    llm_open_source_models = ModelCatalog().list_open_source_models()

    #   we will print out the local models
    for i, models in enumerate(llm_local_models):
        print("models: ", i, models["model_name"], models["model_family"])

    #   for purposes of demo, try a few selected models from the list

    #   each of these pytorch models is ~1b parameters and will run reasonably fast and accurately on CPU
    #   --note: these may require a separate pip3 install of: torch and transformers
    pytorch_generative_models = ["llmware/bling-1b-0.1", "llmware/bling-tiny-llama-v0", "llmware/bling-falcon-1b-0.1"]

    #   bling-answer-tool is 1b parameters quantized
    #   bling-phi-3-gguf is 3.8b parameters quantized
    #   dragon-yi-6b-gguf is 6b parameters quantized
    gguf_generative_models = ["bling-answer-tool", "bling-phi-3-gguf", "llmware/dragon-yi-6b-gguf"]

    #   by default, we will select a gguf model requiring no additional imports
    model_name = gguf_generative_models[0]

    #   to swap in a GPT-4 openai model - uncomment these three lines
    #   import os
    #   model_name = "gpt-4"
    #   os.environ["USER_MANAGED_OPENAI_API_KEY"] = "<insert-your-openai-key>"

    fast_start_prompting(model_name)
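
The script prints the LLM response and the gold answer side by side, so you compare them by eye. If you'd rather have a quick automatic check, here is a rough sketch. The helpers normalize and answers_match are my own hypothetical names, not llmware functions, and the containment test is deliberately lenient, so treat a match as a hint and not a proper evaluation.

```python
def normalize(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())

def answers_match(llm_response, gold_answer):
    """Lenient check: True if either normalized string contains the other."""
    response, gold = normalize(llm_response), normalize(gold_answer)
    if not response or not gold:
        return False
    return gold in response or response in gold

print(answers_match("The total amount is $22,500.00", "$22,500.00"))           # True
print(answers_match("62.4 billion yen", "62.4 billion yen ($416.6 million)"))  # True
print(answers_match("Not Found.", "$22,500.00"))                               # False
```

Inside the loop of fast_start_prompting, you could call answers_match(llm_response, entries["answer"]) right after printing the gold answer.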

4️⃣ Go back to the terminal and run the command below to start the application:



python fast_start_rag.py

Output 📃

Although the code is self-explanatory (check the comments), you might be wondering what just happened! You may have many questions. But wait! I have an explanation too, especially for visual learners. Kindly go through this video once: Prompt Models (Ex. 3): Fast Start to RAG (2024). And if you want to learn more, go through the playlist:

Fast Start to RAG (2024 updates) - YouTube

Learn how to master the basics of RAG in this easy to follow step-by-step series of tutorials


Moving to the End...

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response.

If you still have any questions, drop them in the comment section. Alternatively, you can join the official LLMWare Discord channel by following this link: https://discord.com/invite/fCztJQeV7J

Thank you! You're the most beautiful person! Keep learning, keep hustling. Have a good day!! 💝

Star LLMWare.ai ⭐

RAG Simplified!! 🐣