How to build a movie recommendation app without the complexities of vector databases

Streamlit - Sep 10 - Dev Community

Originally published on the Streamlit blog by Liz Acosta

You are what you eat; your model is what your model ingests.

Not only does data inform AI systems, data is the output you ultimately receive. That’s why it’s important to have “good” data. It doesn’t matter how powerful your model is, garbage in will always result in garbage out.

In software development, this isn’t a new concept or problem. However, AI demands a more sophisticated data strategy throughout the ETL process. This can slow the delivery of your AI-integrated applications.

In this recipe, you’ll use Weaviate to abstract away the complexity associated with vector databases, allowing you to implement a powerful search and recommendation system with way less technical overhead. Then we’ll use Streamlit to build the chatbot part of the app.

And don’t panic! There’s no frontend involved!

Read on to learn:

  • What Weaviate is
  • What Streamlit is
  • How to build a demo Weaviate movie recommendation Streamlit app
  • How to query a Collection in Weaviate Cloud

Don’t feel like reading? Here are some other ways to explore this demo:

What is Weaviate?

Weaviate is an AI-native database designed to help you build amazing, scalable, and production-grade AI-powered applications. It offers robust features for data storage, retrieval, and querying as well as integrations with AI models, making it an excellent choice for developers looking to integrate AI capabilities into their apps.

The Streamlit-Weaviate Connection

The Streamlit-Weaviate connection is a wrapper that simplifies the process of integrating Weaviate with Streamlit applications. This connection allows you to perform various operations, such as connecting to a remote or local Weaviate instance, performing queries, and using the underlying Weaviate Python client. The project is open-source so contributions are always welcome.

Key features

  • Connect to a Weaviate Cloud instance: Easily connect to a Weaviate cloud instance using a URL and API key
  • Connect to a locally running Weaviate instance: Easily connect to a Weaviate instance running locally
  • Perform queries: Execute simple and advanced queries using the query or GraphQL query methods
  • Use the Weaviate Python client: Leverage the full capabilities of the Weaviate Python client for more complex operations
  • Support for local instances: Connect to a local Weaviate instance using default parameters
  • Secrets management: Streamlit can handle secret management for secure connections

What is Streamlit?

Streamlit is an open-source Python framework to build highly interactive apps – in only a few lines of code. Streamlit integrates with all the latest tools in generative AI, such as any LLM, vector database, or various AI frameworks like LangChain, LlamaIndex, or Weights & Biases. Streamlit’s chat elements make it especially easy to interact with AI so you can build chatbots that “talk to your data.”

Combined with a platform like Replicate, Streamlit allows you to create generative AI applications without any of the app design overhead.

To learn more about Streamlit, check out the 101 guide.

💡 To learn more about how Streamlit biases you toward forward progress, check out this blog post.

Try the app recipe: Weaviate + Streamlit

In this demo, you’ll spin up a movie recommendation app that utilizes Weaviate for backend data management and Streamlit chat elements on the frontend for interaction. The app accepts a natural language input from a user and uses Weaviate to translate the input into a query and then generate a list of movie titles.

There are three different kinds of search modes available:

Keyword: This search mode uses BM25 to rank documents based on the relative frequencies of search terms. In this particular app, that means the results returned are based on how often the search keywords appear in the different movie properties.

Semantic: This type of search uses vectors to generate results based on their similarity to your search query. In other words, the results returned are based on a similarity of “meaning.” To learn more about vector databases, check out Weaviate’s Gentle Introduction to Vector Databases.

Hybrid: A hybrid search combines vector and BM25 searches to offer best-of-both-worlds search results.
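To make the three modes concrete, here is a minimal, self-contained sketch in plain Python. It is a toy illustration, not Weaviate's implementation: the corpus, the two-dimensional "embeddings", and the helper names (bm25_scores, cosine, hybrid_scores) are all made up for this example. It follows the same convention as Weaviate's alpha parameter, where 0 means keyword only and 1 means vector only.

```python
import math
from collections import Counter

# Toy corpus and hand-made 2-D "embeddings" (hypothetical stand-ins for real vectors).
DOCS = {
    "Space Quest": "astronauts explore a distant planet in space",
    "Love in Paris": "two strangers fall in love in paris",
    "Planet Heist": "thieves plan a heist on a mining planet",
}
EMBEDDINGS = {
    "Space Quest": [0.9, 0.1],
    "Love in Paris": [0.1, 0.9],
    "Planet Heist": [0.7, 0.3],
}


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Keyword mode: score each document with the BM25 formula."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.split():
            df = sum(1 for other in tokenized.values() if term in other)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[name] = score
    return scores


def cosine(a, b):
    """Semantic mode: cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalize(scores):
    """Min-max scale scores to [0, 1] so the two rankings are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def hybrid_scores(keyword, vector, alpha):
    """Hybrid mode: alpha=0 is keyword only, alpha=1 is vector only."""
    kw, vec = normalize(keyword), normalize(vector)
    return {name: (1 - alpha) * kw[name] + alpha * vec[name] for name in kw}


# "romance" appears in no overview, so keyword search finds nothing,
# while the (made-up) query vector still sits close to "Love in Paris".
query_vector = [0.2, 0.8]
keyword = bm25_scores("romance", DOCS)
vector = {name: cosine(query_vector, emb) for name, emb in EMBEDDINGS.items()}
blended = hybrid_scores(keyword, vector, alpha=0.5)
print(max(blended, key=blended.get))
```

The example query shows why hybrid search is useful: the keyword scores are all zero because no overview contains the word "romance", yet the vector side still surfaces a relevant title.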

Prerequisites

In this app, the Cohere API is used for two different operations:

Please note that both Weaviate and Cohere have limits on their trial accounts. Check their websites for more details.

💡 To learn more about API keys, check out the blog post here.

Environment setup

Local setup: Create a virtual environment

  1. Clone the Cookbook repo: git clone https://github.com/streamlit/cookbook.git
  2. From the Cookbook root directory, change directory into the recipe: cd recipes/weaviate
  3. Add the necessary secrets to the .streamlit/secrets_template.toml file:

    WEAVIATE_API_KEY = "your weaviate key goes here"
    WEAVIATE_URL = "your weaviate url goes here" 
    COHERE_API_KEY = "your cohere api key goes here" 
    
  4. Update the filename from secrets_template.toml to secrets.toml: mv .streamlit/secrets_template.toml .streamlit/secrets.toml
    (To learn more about secrets handling in Streamlit, refer to the documentation here.)

  5. Create a virtual environment: python3 -m venv weaviatevenv

  6. Activate the virtual environment: source weaviatevenv/bin/activate

  7. Install the dependencies: pip install -r requirements.txt

Add data to your Weaviate Cloud

  1. Create a Weaviate Cloud Collection and add data to it: python3 helpers/add_data.py
  2. (Optional) Verify the data: python3 helpers/verify_data.py

Query the MovieDemo collection in Weaviate Cloud

You can access the Query panel via the Weaviate Cloud UI.

  1. Copy and paste the following query in the editor:

    {
      Get {
        MovieDemo(
          limit: 3
          where: {
            path: ["release_year"],
            operator: Equal,
            valueInt: 1985
          }
        ) {
          budget
          movie_id
          overview
          release_year
          revenue
          tagline
          title
          vote_average
        }
      }
    }
    
  2. Click on the arrow to execute the query

Screenshot of the Weaviate Cloud UI

The Weaviate Cloud Query tool is a browser-based GraphQL IDE. The example query above tells Weaviate to return the budget, movie_id, overview, release_year, revenue, tagline, title, and vote_average properties for objects in the MovieDemo collection with a release_year of 1985. The where filter does this by setting the path to ["release_year"], the operator to Equal, and the valueInt to 1985. The limit: 3 argument caps the results at three objects.

You should get back a result of three movies with the release year 1985.

This is a simple query that forms the foundation of more complex queries. To learn more about different kinds of searches available with the Weaviate Cloud Query tool, check out the documentation.
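If you want to try other years or limits without editing the query by hand, a small helper can generate the same GraphQL text. This is a sketch only: the function name build_year_query is hypothetical, and it simply produces a string to paste into the Query tool rather than calling any Weaviate API.

```python
# Properties requested in the example query above.
PROPERTIES = [
    "budget",
    "movie_id",
    "overview",
    "release_year",
    "revenue",
    "tagline",
    "title",
    "vote_average",
]


def build_year_query(year, limit=3, collection="MovieDemo"):
    """Build a GraphQL Get query filtering a collection by release_year."""
    fields = "\n      ".join(PROPERTIES)
    # Doubled braces are literal braces inside an f-string.
    return f"""{{
  Get {{
    {collection}(
      limit: {int(limit)}
      where: {{ path: ["release_year"], operator: Equal, valueInt: {int(year)} }}
    ) {{
      {fields}
    }}
  }}
}}"""


print(build_year_query(1985))
```

Casting year and limit with int() keeps stray text out of the query, which matters because the values are interpolated directly into the string.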

Run the demo Weaviate Streamlit recommendation app

To run the demo app, use the Streamlit CLI: streamlit run demo_app.py.

Running this command serves the app on a local port (by default, http://localhost:8501) and prints the URL in your terminal. When you open that address in a browser, you should see the Streamlit app running. Please note that this version of the demo app does not include the poster images, so it will look different from the deployed app.

A gif of the movie recommendation app demo

Vector databases made easy

With the Streamlit-Weaviate connection, you can easily create and integrate vector databases in your Streamlit apps.

In demo_app.py, the Weaviate connection is created here:

import streamlit as st
from st_weaviate_connection import WeaviateConnection


def setup_weaviate_connection(env_vars):
    """Set up the Weaviate connection, passing the Cohere key as a request header."""
    return st.connection(
        "weaviate",
        type=WeaviateConnection,
        url=env_vars["WEAVIATE_URL"],
        api_key=env_vars["WEAVIATE_API_KEY"],
        additional_headers={"X-Cohere-Api-Key": env_vars["COHERE_API_KEY"]},
    )

Using Streamlit chat elements, a prompt is created and a query is made here:

        # Requires: from weaviate.classes.query import Filter
        with conn.client() as client:
            collection = client.collections.get("MovieDemo")
            # Hybrid search plus generation; alpha weights vector vs. keyword scores
            response = collection.generate.hybrid(
                query=movie_type,
                # Keep only movies released within the selected year range (inclusive)
                filters=(
                    Filter.by_property("release_year").greater_or_equal(year_range[0])
                    & Filter.by_property("release_year").less_or_equal(year_range[1])
                ),
                limit=SEARCH_LIMIT,
                alpha=SEARCH_MODES[mode][1],
                grouped_task=rag_prompt,
                grouped_properties=["title", "tagline"],
            )
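The query relies on two pieces defined elsewhere in demo_app.py: SEARCH_MODES, whose second element supplies alpha, and year_range, which bounds the release year. Their exact definitions are not shown here, so the sketch below is an assumption about their shape, with made-up descriptions, alpha values, and limit. It also mirrors the inclusive year filter in plain Python so the logic can be checked without a Weaviate instance.

```python
# Hypothetical mode table: name -> (description, alpha). The values are assumptions.
SEARCH_MODES = {
    "Keyword": ("BM25 ranking only", 0.0),
    "Semantic": ("Vector similarity only", 1.0),
    "Hybrid": ("Blend of keyword and vector scores", 0.5),
}
SEARCH_LIMIT = 5  # assumed value


def alpha_for(mode):
    """Look up the alpha passed to the hybrid query for a given search mode."""
    return SEARCH_MODES[mode][1]


def within_years(release_year, year_range):
    """Plain-Python mirror of the filter: start <= release_year <= end."""
    start, end = year_range
    return start <= release_year <= end


# Because all three modes map to an alpha, one hybrid call covers every mode.
for mode in SEARCH_MODES:
    print(mode, alpha_for(mode))

print(within_years(1985, (1980, 1990)))
```

Routing every mode through a single hybrid call keeps the query code in one place; only the alpha value changes between modes.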


The result is a fully interactive recommendation app with no JavaScript experience required!

If you would like to learn more about Weaviate, check out the Weaviate Quickstart and Weaviate Academy.

Unlock the potential of AI with Streamlit

With Streamlit, months and months of app design work are streamlined to just a few lines of Python. It’s the perfect framework for showing off your latest AI inventions.

Get up and running fast with other AI recipes in the Streamlit Cookbook. (And don’t forget to show us what you’re building in the forum!)

Happy Streamlit-ing! 🎈
