Optimizing AI Conversations: A Deep Dive into Semantic Search and Multi-Modal Input

Hasnain01-hub - Oct 5 - Dev Community

This is a submission for The Pinata Challenge

What I Built

I developed an AI Chat Assistant platform that uses semantic search to analyze and retrieve relevant data from diverse content. The platform routes user prompts through a LangGraph workflow, which invokes the RAG (retrieval-augmented generation) tool only when necessary. This avoids unneeded retrieval calls while still supplying the language model with relevant references when a prompt calls for them, improving the accuracy and relevance of its responses.
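The routing decision described above can be sketched in plain TypeScript. This is a minimal illustration, not the project's code and not LangGraph's API: the `AgentMessage` shape, the `rag_search` tool name, and the `route` function are all hypothetical. It assumes the agent model returns a message that optionally lists tool calls, and that the retrieval branch is taken only when the RAG tool is requested.

```typescript
// Hypothetical shapes for illustration; these are not LangGraph types.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface AgentMessage {
  content: string;
  toolCalls: ToolCall[];
}

type NextStep = "retrieve" | "respond";

// Assumed name of the retrieval tool exposed to the model.
const RAG_TOOL_NAME = "rag_search";

// Route to retrieval only when the model explicitly asked for the RAG tool.
function route(lastMessage: AgentMessage): NextStep {
  const wantsRetrieval = lastMessage.toolCalls.some(
    (call) => call.name === RAG_TOOL_NAME
  );
  return wantsRetrieval ? "retrieve" : "respond";
}

// A greeting needs no references, so no tool call is present.
const smallTalk: AgentMessage = {
  content: "Hello! How can I help?",
  toolCalls: [],
};

// A question about uploaded content triggers the retrieval tool.
const docQuestion: AgentMessage = {
  content: "",
  toolCalls: [{ name: RAG_TOOL_NAME, args: { query: "summary of the uploaded PDF" } }],
};

console.log(route(smallTalk));
console.log(route(docQuestion));
```

In LangGraph, a function like this would typically be attached as a conditional edge after the agent node, so the vector store is queried only on the branch that needs it.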

The system supports multiple input formats, including images, YouTube links, and PDFs. For videos, it retrieves metadata via the YouTube API and generates concise summaries using a large language model (LLM). Similarly, it summarizes content from images, providing a comprehensive and versatile user experience across different media types.
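As a rough sketch of the YouTube step, the snippet below extracts a video ID from a pasted link and builds a metadata request. The post only says that metadata comes from "the YouTube API"; the use of the YouTube Data API v3 `videos` endpoint with `part=snippet` is an assumption here, and the helper names `extractVideoId` and `buildMetadataUrl` are hypothetical.

```typescript
// Extract the 11-character video ID from common YouTube URL shapes.
function extractVideoId(link: string): string | null {
  let url: URL;
  try {
    url = new URL(link);
  } catch {
    return null; // not a valid absolute URL
  }
  const host = url.hostname.replace(/^www\./, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    // Short links carry the ID as the path: https://youtu.be/<id>
    candidate = url.pathname.slice(1);
  } else if (host === "youtube.com" || host === "m.youtube.com") {
    // Watch links carry the ID in the `v` query parameter.
    candidate = url.searchParams.get("v");
  }
  return candidate !== null && /^[\w-]{11}$/.test(candidate) ? candidate : null;
}

// Build a request URL for title and description metadata (assumed endpoint).
function buildMetadataUrl(videoId: string, apiKey: string): string {
  const params = new URLSearchParams({ part: "snippet", id: videoId, key: apiKey });
  return `https://www.googleapis.com/youtube/v3/videos?${params.toString()}`;
}

const videoId = extractVideoId("https://www.youtube.com/watch?v=dQw4w9WgXcQ");
if (videoId !== null) {
  // "YOUR_API_KEY" is a placeholder; the response would then be passed to the LLM for summarization.
  console.log(buildMetadataUrl(videoId, "YOUR_API_KEY"));
}
```

The title and description returned by such a request would then be handed to the LLM as the text to summarize.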

Additionally, the platform integrates Stable Diffusion to generate images based on textual prompts. These generated images are stored in Pinata (IPFS). The platform also maintains a complete chat history, ensuring smooth continuity in conversations and allowing users to reference previous interactions effortlessly.

LangGraph Flow:

Agent Flow

  • Advanced Semantic Search:

    • Efficiently analyzes and retrieves relevant data from a vector store.
    • Utilizes the LangGraph tool to optimize workflows by invoking the RAG tool only when necessary.
  • Multi-Format Input Support:

    • Processes images, YouTube links, PDFs, and text files.
    • Retrieves video metadata via the YouTube API and generates concise summaries using LLMs.
    • Summarizes content from images for seamless multi-media analysis.
  • Image Generation:

    • Integrates Stable Diffusion to generate images based on user-provided text prompts.
    • Stores generated images using Pinata (IPFS) for decentralized access.
  • Comprehensive Chat History:

    • Allows users to select which chats to store and easily reference previous interactions.
  • Optimized Performance:

    • LangGraph tool ensures that RAG is only called when necessary, enhancing response relevance and efficiency.

Demo

Demo Video: Link

My Code

AI Chat Assistant

Description

This platform uses semantic search to analyze and retrieve relevant data from a diverse content repository. After processing the input prompt, the system performs a semantic search to fetch pertinent content, which is then passed to the language model as references to improve the accuracy and relevance of its outputs.
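The retrieve-then-reference flow can be illustrated with a small in-memory example. This is only a sketch under stated assumptions: the project lists Pinecone as its vector store, whereas the code below ranks a hard-coded array by cosine similarity, and the three-dimensional embeddings, the `Chunk` type, and the helper names are invented for illustration. Real embeddings would come from an embedding model.

```typescript
// A stored piece of content with its (toy) embedding vector.
interface Chunk {
  text: string;
  embedding: number[];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Return the k chunks most similar to the query embedding.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Place the retrieved chunks in the prompt as numbered references.
function buildPrompt(question: string, references: Chunk[]): string {
  const context = references.map((ref, i) => `[${i + 1}] ${ref.text}`).join("\n");
  return `Use the references to answer.\n\nReferences:\n${context}\n\nQuestion: ${question}`;
}

const sampleChunks: Chunk[] = [
  { text: "Chat history lets users revisit earlier conversations.", embedding: [1, 0, 0] },
  { text: "Stable Diffusion generates images from text prompts.", embedding: [0, 1, 0] },
  { text: "Generated images are stored in Pinata (IPFS).", embedding: [0, 0.6, 0.8] },
];

// Toy embedding for a question about image generation.
const sampleQuery = [0.1, 0.9, 0.1];
const references = topK(sampleQuery, sampleChunks, 2);
console.log(buildPrompt("How are images created and stored?", references));
```

With a hosted vector store, the `topK` step would be replaced by a query against the index; the prompt assembly would stay much the same.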

The platform is designed to handle inputs across diverse formats, including images, YouTube links, and PDF files. For videos, it fetches metadata through the YouTube API, which is then summarized using an LLM to provide concise and relevant content overviews. Similarly, the system is equipped to summarize content from images, ensuring a comprehensive and versatile user experience that accommodates various types of media.

Additionally, the platform incorporates the capability to generate images using the Stable Diffusion algorithm, further enriching the user interaction experience by providing visual content generation based on textual prompts…

More Details

This platform integrates Stable Diffusion to generate images based on users' textual prompts. The generated images are stored in Pinata (IPFS), which also maintains the chat history, and a CDN is used for efficient access to the images.
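A hedged sketch of the storage step is shown below. It assumes Pinata's `pinFileToIPFS` pinning endpoint with a JWT bearer token and the public `gateway.pinata.cloud` gateway; the project may instead use Pinata's SDK or a dedicated gateway domain, and the helper names `gatewayUrl` and `pinImage` are hypothetical. It also assumes a runtime with global `fetch`, `FormData`, and `Blob` (a browser or Node 18+).

```typescript
// Public Pinata gateway (assumption); a dedicated gateway domain would replace this.
const GATEWAY_BASE = "https://gateway.pinata.cloud/ipfs";

// Turn a content identifier (CID) into a URL the chat UI can load.
function gatewayUrl(cid: string): string {
  return `${GATEWAY_BASE}/${encodeURIComponent(cid)}`;
}

// Upload a generated image and return its gateway URL.
async function pinImage(image: Blob, fileName: string, jwt: string): Promise<string> {
  const form = new FormData();
  form.append("file", image, fileName);

  const response = await fetch("https://api.pinata.cloud/pinning/pinFileToIPFS", {
    method: "POST",
    headers: { Authorization: `Bearer ${jwt}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Pinata upload failed with status ${response.status}`);
  }

  // The pinning response includes the CID of the stored file as `IpfsHash`.
  const result = (await response.json()) as { IpfsHash: string };
  return gatewayUrl(result.IpfsHash);
}

// "bafyexamplecid" is a made-up placeholder, not a real CID.
console.log(gatewayUrl("bafyexamplecid"));
```

Saving the returned URL alongside the chat message would let the history view reload generated images without regenerating them.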

Chat History

Tech stack used:

  • Next.js
  • LangChain
  • LangGraph
  • Pinata
  • Hugging Face API
  • Azure OpenAI LLM
  • Pinecone
  • Firebase
  • Redux