How to Use Ollama for Front-end with Streaming Output

ppaanngggg - Jun 17 - - Dev Community

Introduction

LLM applications are becoming increasingly popular. However, there are many LLMs, each with its own differences, and handling streaming output can be complex, especially for new front-end developers.

Thanks to the AI SDK developed by Vercel, implementing LLM chat with streaming output in Next.js has become much easier. In this post, I'll provide a step-by-step tutorial on how to integrate Ollama into your front-end project.

Install Ollama

Ollama is a popular tool for running LLMs locally. It downloads models directly and exposes an HTTP API that your backend can call. If you're seeking lower latency or improved privacy through local LLM deployment, Ollama is an excellent choice. To install it on Linux, run the following command:



curl -fsSL https://ollama.com/install.sh | sh



If you're using a different OS, please follow this link.
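
Once Ollama is installed, it can be useful to confirm that the local server is reachable before wiring it into a project. The following is a minimal sketch, assuming Ollama's default address http://localhost:11434 and its /api/tags endpoint, which lists the models available locally. The helper name listLocalModels, the FetchLike type, and the injectable fetchFn parameter are hypothetical and only there for illustration.

```typescript
// Minimal shape of the /api/tags response that this sketch relies on.
interface TagsResponse {
  models: { name: string }[];
}

// Just the part of fetch that the helper needs (hypothetical type).
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: returns the names of the locally available models.
// fetchFn is injectable so the helper can be exercised without a running server.
async function listLocalModels(
  baseUrl: string = "http://localhost:11434",
  fetchFn: FetchLike = (url) => fetch(url),
): Promise<string[]> {
  const response = await fetchFn(`${baseUrl}/api/tags`);
  if (!response.ok) {
    throw new Error(`Ollama responded with HTTP ${response.status}`);
  }
  const data = (await response.json()) as TagsResponse;
  return data.models.map((m) => m.name);
}
```

Calling listLocalModels() with no arguments queries the default local address. If the returned list is empty, you probably still need to pull a model (for example with ollama pull llama3:8b).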

Create a New Next.js Project

To create a new Next.js project, run the command npx create-next-app@latest your-new-project and choose the App Router when prompted. After that, run npm run dev inside the project folder and open localhost:3000 in your preferred browser to verify that the new project is set up correctly.

Next, you need to install the AI SDK:



npm install ai



The AI SDK uses a provider design, which lets you plug in different LLM backends or implement your own provider. For Ollama, you only need to install the third-party, community-maintained Ollama provider:



npm install ollama-ai-provider



Server-Side Code

Now that you've gathered all the prerequisites for your LLM application, create a new file named actions.ts in the app folder:



"use server";

import { ollama } from "ollama-ai-provider";
import { streamText } from "ai";
import { createStreamableValue } from "ai/rsc";

export interface Message {
  role: "user" | "assistant";
  content: string;
}

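export async function continueConversation(history: Message[]) {
  // Streamable value that the client reads with readStreamableValue.
  const stream = createStreamableValue();
  // Any model available in your local Ollama instance can be used here.
  const model = ollama("llama3:8b");

  // Start generating in the background so the action can return immediately
  // while text is still being produced.
  (async () => {
    const { textStream } = await streamText({
      model: model,
      messages: history,
    });

    // Forward each text delta to the client as soon as it arrives.
    for await (const text of textStream) {
      stream.update(text);
    }

    // Signal that the response is complete.
    stream.done();
  })();

  return {
    messages: history,
    newMessage: stream.value,
  };
}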



Here is a brief explanation of this code.

  1. interface Message is a shared interface that establishes the structure of a message. It includes two properties: 'role' (which can be either 'user' or 'assistant') and 'content' (the actual text of the message).
  2. The continueConversation function is a server action (not a server component) that uses the conversation history to generate the assistant's response. It calls the Ollama model (specifically llama3:8b, but you can replace it with any model you have pulled locally, for example with ollama pull llama3:8b) to generate text incrementally.
  3. The streamText function is part of the AI SDK. It returns a text stream that yields pieces of the assistant's response as they are generated. Each piece is pushed into the value created by createStreamableValue, which is what gets sent to the client.
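
To make the streaming loop more concrete, here is a small self-contained sketch of the same pattern without the AI SDK. It assumes nothing about the SDK internals: fakeTextStream is a hypothetical stand-in for textStream, and forwardDeltas is a hypothetical helper whose onDelta callback plays the role that stream.update plays in the server action.

```typescript
// Hypothetical stand-in for the SDK's textStream: yields the reply in small pieces.
async function* fakeTextStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Consumes any async iterable of text deltas, reports each delta as it
// arrives, and returns the full accumulated text once the stream ends.
async function forwardDeltas(
  textStream: AsyncIterable<string>,
  onDelta: (delta: string) => void,
): Promise<string> {
  let fullText = "";
  for await (const delta of textStream) {
    onDelta(delta);
    fullText += delta;
  }
  return fullText;
}
```

For example, forwardDeltas(fakeTextStream(["Hel", "lo", "!"]), console.log) logs the three pieces one at a time and resolves to the complete string. The point of the design is that the consumer never waits for the whole reply before it can show something.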

Client-Side Code

Next, replace the contents of page.tsx with the new code:



"use client";

import { useState } from "react";
import { continueConversation, Message } from "./actions";
import { readStreamableValue } from "ai/rsc";

export default function Home() {
  const [conversation, setConversation] = useState<Message[]>([]);
  const [input, setInput] = useState<string>("");

  return (
    <div>
      <div>
        {conversation.map((message, index) => (
          <div key={index}>
            {message.role}: {message.content}
          </div>
        ))}
      </div>

      <div>
        <input
          type="text"
          value={input}
          onChange={(event) => {
            setInput(event.target.value);
          }}
        />
        <button
          onClick={async () => {
            // Send the history plus the new user message to the server action.
            const { messages, newMessage } = await continueConversation([
              ...conversation,
              { role: "user", content: input },
            ]);

            let textContent = "";

            // Append each streamed delta and re-render the assistant's reply.
            for await (const delta of readStreamableValue(newMessage)) {
              textContent = `${textContent}${delta}`;

              setConversation([
                ...messages,
                { role: "assistant", content: textContent },
              ]);
            }
          }}
        >
          Send Message
        </button>
      </div>
    </div>
  );
}



This is a very simple UI, but it is enough to carry on a conversation with the LLM. Here are the important parts:

  1. The input field captures the user's input. It is controlled by a React state variable that gets updated every time the input changes.
  2. The button has an onClick event that triggers the continueConversation function. This function takes the current conversation history, appends the user's new message, and waits for the assistant's response.
  3. The conversation array holds the history of the conversation. Each message is displayed on the screen, and new messages are appended at the end. By using readStreamableValue from the AI SDK, we can read the streamed value returned by the server action and update the conversation in real time.
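
The state updates in the click handler can also be factored into small pure functions, which makes them easier to test outside React. This is only a sketch: appendUserMessage and withAssistantDraft are hypothetical helper names, and skipping blank input is an added assumption that the original handler does not make.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Returns a new history with the user's message appended, or null when the
// input is blank so the caller can skip the request entirely.
function appendUserMessage(history: Message[], input: string): Message[] | null {
  const content = input.trim();
  if (content === "") {
    return null;
  }
  return [...history, { role: "user", content }];
}

// Returns the conversation to render while the reply is still streaming:
// the sent history plus the assistant text accumulated so far.
function withAssistantDraft(history: Message[], textSoFar: string): Message[] {
  return [...history, { role: "assistant", content: textSoFar }];
}
```

In the handler, the result of appendUserMessage would be passed to continueConversation, and withAssistantDraft(messages, textContent) would be passed to setConversation on every delta. Neither helper mutates its input, which is what React state updates expect.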

Let’s Test Now

I typed "who are you" into the input field.

ollama input

Here is the output of llama3:8b served by Ollama. You'll notice that the output is printed in a streaming manner.

ollama output

References

  1. Documentation for the AI SDK: https://sdk.vercel.ai/docs/introduction
  2. Ollama GitHub: https://github.com/ollama/ollama
  3. Find more models supported by Ollama: https://ollama.com/library