LLM ChatBots 3.0: Merging LLMs with Dynamic UI Elements

Large Language Models (LLMs) have significantly transformed chatbots and conversational AI, making interactions more natural and intuitive. However, most chat interfaces still route every interaction through typed text, which leaves room for improvement.

Let's look at a typical interaction with a traditional chatbot:

User: I'm looking for warm travel destinations in Europe.

AI: Great! I can help with that. Which country are you interested in?
1. Spain
2. Italy
3. Greece

User: 2

AI: Excellent choice. What type of accommodation do you prefer?
1. Hotel
2. Hostel
3. Airbnb

Users often need to drill down or make selections based on LLM responses.
Selecting options requires typing text or using numbering systems.
This process can be tedious, especially with multiple selections or on mobile devices.

Dynamic UI Generation with LLMs

Looking at the same problem from a more user-friendly perspective, inspired by the typical UI elements used on the web (buttons, icons, etc.), we can integrate these same elements into the chat itself. The result could be something like this.

We ask the same question and instead of just getting a list or bullet points of countries, we get choices like this:

Selecting one or more items then automatically sends the choice back to the LLM as a reply, so it "knows" what we have chosen; in the case below, "Lisbon, Portugal".
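
To make the mechanism concrete, here is a minimal sketch of how a selection could be turned into an automatic reply. The function names `selectionToReply` and `appendSelection`, the `{ role, content }` message shape, and the "; " separator are assumptions for illustration, not necessarily how the demo app does it.

```javascript
// Hypothetical helper: turns the selected labels into the text of an automatic reply.
// Labels are joined with "; " because a label may itself contain a comma.
function selectionToReply(selectedLabels) {
  if (selectedLabels.length === 0) {
    return null; // nothing selected, nothing to send
  }
  return selectedLabels.join('; ');
}

// Appends the reply to the conversation history as a regular user message.
// Returns a new array; the original history is left untouched.
function appendSelection(history, selectedLabels) {
  const reply = selectionToReply(selectedLabels);
  if (reply === null) return history;
  return [...history, { role: 'user', content: reply }];
}

const history = [
  { role: 'user', content: "I'm looking for warm travel destinations in Europe." },
];
const updated = appendSelection(history, ['Lisbon, Portugal']);
console.log(updated[updated.length - 1].content);
```

Because the selection ends up as an ordinary user message, the LLM needs no special handling for it on the next turn.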

You can try it out here: https://fluidchat.vercel.app

Please note that the application is still under development and uses gpt-4o-mini, which sometimes outputs incorrect names for some icons.
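
One way to soften this on the client is to check every icon name against the icons the app actually ships and fall back to a neutral icon otherwise. The sketch below assumes the `{{fa-name}}` icon syntax from the prompt shown later in this article; the `KNOWN_ICONS` list, the `fa-circle` fallback, and the `sanitizeIcons` name are hypothetical choices.

```javascript
// Assumption: the icon names the client knows how to render.
const KNOWN_ICONS = new Set(['fa-home', 'fa-user', 'fa-cog', 'fa-car', 'fa-bicycle']);

// Assumption: a neutral icon used whenever the model emits an unknown name.
const FALLBACK_ICON = 'fa-circle';

// Replaces every {{icon-name}} whose name is not in knownIcons with the fallback icon.
function sanitizeIcons(text, knownIcons = KNOWN_ICONS) {
  return text.replace(/\{\{\s*([\w-]+)\s*\}\}/g, (match, name) =>
    knownIcons.has(name) ? `{{${name}}}` : `{{${FALLBACK_ICON}}}`
  );
}

console.log(sanitizeIcons('1. {{fa-home}} Home\n2. {{fa-hoem}} Oops'));
```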

The main differences from text-based chat conversations:

Dynamic UI Generation: The LLM can generate UI elements like buttons, checkboxes, and dropdown menus.
Interactive Selections: Users can make selections directly through the UI instead of typing responses.
Conversation Context: UI interactions are seamlessly integrated into the conversation history.
Mobile-Friendly: The interface is designed to be easily usable on both desktop and mobile devices.

How this approach differs from Artifacts (Claude) and Canvas (OpenAI)

Artifacts in Anthropic terminology and Canvas in OpenAI terminology introduce ways of presenting content such as HTML, SVG, or React components in a very user-friendly way. However, they do not yet provide a way of capturing information from the user, e.g. user input.

Technical Deep Dive

To enable this dynamic UI generation, we've implemented a very simple custom markup language for just a few elements. This language allows the LLM to generate not only text, but also instructions for rendering UI elements.

The markup language is designed to be distinct from typical programming languages. This distinction is crucial because it prevents confusion when the LLM generates actual snippets of code as part of the conversation.

Here's a table showcasing some elements of our custom markup language:

To make this work, we need to implement it on two sides: The LLM side (through prompting) and the client side that interprets and displays these UI elements.

LLM-side

Here is a simple prompt that tells the LLM when and how to use these elements.

You are an AI assistant capable of presenting options for user selection and using Font Awesome icons. When appropriate, use the following formats:

1. [SINGLE_SELECT] for single-choice options
2. [MULTI_SELECT] for multiple-choice options
3. [CHOICE] for general choice options

Present the options in a numbered list format. Use Font Awesome icons by wrapping the icon name in double curly braces, like {{icon-name}}. Examples:

[SINGLE_SELECT]
1. {{fa-home}} Home
2. {{fa-user}} Profile
3. {{fa-cog}} Settings

[MULTI_SELECT]
1. {{fa-pizza-slice}} Pizza
2. {{fa-hamburger}} Burger
3. {{fa-ice-cream}} Ice Cream

[CHOICE]
1. {{fa-car}} Drive
2. {{fa-bicycle}} Cycle
3. {{fa-walking}} Walk
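
For completeness, here is a small sketch of how such a prompt might be attached to every request. It assumes a chat API that accepts a list of `{ role, content }` messages with a `system` role; `buildMessages` is a hypothetical helper, and the prompt text below is an abbreviated stand-in for the full prompt above.

```javascript
// Builds the message list for one request: the UI prompt first, then the conversation so far.
// Assumption: the chat API takes messages shaped as { role, content }.
function buildMessages(systemPrompt, history) {
  return [{ role: 'system', content: systemPrompt }, ...history];
}

// Abbreviated stand-in for the full prompt shown above.
const systemPrompt = [
  'You are an AI assistant capable of presenting options for user selection.',
  'When appropriate, use [SINGLE_SELECT], [MULTI_SELECT] or [CHOICE],',
  'followed by the options in a numbered list format.',
].join('\n');

const messages = buildMessages(systemPrompt, [
  { role: 'user', content: "I'm looking for warm travel destinations in Europe." },
]);
console.log(messages.map((m) => m.role));
```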

Client-side

And on the client side (the chat interface) we have to integrate these elements as well.

// Render different types of content
const renderContent = () => {
  if (typeof message.content === 'string') {
    return <p>{message.content}</p>;
  }
  if (Array.isArray(message.content)) {
    return message.content.map((item, index) => (
      <div key={index}>
        {item.text && <p>{item.text}</p>}
        {item.options && renderOptions(item.options, item.type)}
      </div>
    ));
  }
  return null;
};

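// Render interactive UI elements.
// Assumption: options is an array of label strings, and onSelect, selected and
// toggleSelected are hypothetical props/state of the surrounding component.
const renderOptions = (options, type) => {
  switch (type) {
    case 'single-select':
      // One click sends the chosen option straight back to the LLM
      return options.map((option) => (
        <button key={option} onClick={() => onSelect([option])}>
          {option}
        </button>
      ));
    case 'multi-select':
      // Toggle any number of options, then confirm to send them all at once
      return (
        <>
          {options.map((option) => (
            <button key={option} onClick={() => toggleSelected(option)}>
              {option}
            </button>
          ))}
          <button onClick={() => onSelect(selected)}>Confirm</button>
        </>
      );
    default:
      return null;
  }
};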
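
The renderContent function above expects message.content to already be split into text items and option items. One possible way to get there from the raw LLM output is sketched below. The `parseMessage` name and the rule that any non-option text ends an option block are assumptions; the tags and the numbered-list format come from the prompt.

```javascript
// Maps markup tags to the type values used when rendering options.
const TAG_TYPES = {
  SINGLE_SELECT: 'single-select',
  MULTI_SELECT: 'multi-select',
  CHOICE: 'choice',
};

// Splits raw LLM output into items shaped as { text } or { type, options }.
function parseMessage(raw) {
  const items = [];
  let textLines = [];
  let current = null; // the option block currently being collected, if any

  // Emits the buffered plain text as one item (skipping empty buffers).
  const flushText = () => {
    const text = textLines.join('\n').trim();
    if (text) items.push({ text });
    textLines = [];
  };

  for (const line of raw.split('\n')) {
    const trimmed = line.trim();
    const tag = trimmed.match(/^\[(SINGLE_SELECT|MULTI_SELECT|CHOICE)\]$/);
    const option = trimmed.match(/^\d+\.\s+(.*)$/);
    if (tag) {
      flushText();
      current = { type: TAG_TYPES[tag[1]], options: [] };
      items.push(current);
    } else if (current && option) {
      current.options.push(option[1]);
    } else {
      if (trimmed !== '') current = null; // any other text ends the option block
      textLines.push(line);
    }
  }
  flushText();
  return items;
}

const raw = 'Which country?\n\n[SINGLE_SELECT]\n1. Spain\n2. Italy\n\nTake your time.';
console.log(JSON.stringify(parseMessage(raw), null, 2));
```

Numbered lists that appear outside a tagged block stay plain text, so ordinary enumerations in an answer are not turned into buttons by accident.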

Visual Enhancements: A Nice-to-Have

The implementation described above is not limited to input elements; it can also support richer visuals, such as Font Awesome icons or Markdown. Some of these visual elements are already supported by all major chatbot interfaces.

For instance, we could extend our markup language to include icon specifications:

[CHOICE:Small{icon:pizza-sm},Medium{icon:pizza-md},Large{icon:pizza-lg}]

This could render as buttons with both text and representative icons, further improving the user experience.
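
As a rough illustration, the inline syntax above could be parsed into label/icon pairs like this. `parseInlineChoice` is a hypothetical helper, and the sketch assumes that labels contain neither commas nor curly braces.

```javascript
// Parses "[CHOICE:Small{icon:pizza-sm},Medium{icon:pizza-md}]" into
// an array of { label, icon } objects. Returns null if the markup does not match.
function parseInlineChoice(markup) {
  const outer = markup.trim().match(/^\[CHOICE:(.*)\]$/);
  if (!outer) return null;

  return outer[1].split(',').map((entry) => {
    // A label, optionally followed by {icon:some-name}
    const m = entry.trim().match(/^([^{}]+?)(?:\{icon:([\w-]+)\})?$/);
    if (!m) return { label: entry.trim(), icon: null };
    return { label: m[1].trim(), icon: m[2] || null };
  });
}

const choices = parseInlineChoice(
  '[CHOICE:Small{icon:pizza-sm},Medium{icon:pizza-md},Large{icon:pizza-lg}]'
);
console.log(choices);
```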

What's Next

The fluid approach introduced here is just the beginning of new UI patterns for interaction between LLMs and users. It makes interaction smoother, especially when a lot of input is required from the user (as opposed to simple informational conversations).

It will also be interesting to see how voice interaction evolves, particularly with streaming voice APIs. In my opinion, however, text interaction will always have its place, especially in content-heavy applications where listening to everything would be too tedious. We already know the latter from hotline menus, which can sometimes feel like torture :)