GEN AI for JavaScript Devs: Decoding LLMs, from Tokens to Embeddings

Arsalan Ahmed Yaldram - Sep 17 - Dev Community

Introduction

In the first tutorial of our series, we explored foundation models and Large Language Models (LLMs), taking a first look at their basics, evolution, and use cases. In this part, we'll dive deeper into how they work. We'll explore how a computer program can be intelligent enough to produce emails, translate text, summarize data, and understand our language - all without us having to deal with the typical machine learning complexities.

LLMs are not human; they simply predict and generate text based on patterns. As a fellow JavaScript developer, I'll explain how this "magic" is possible in terms we can easily understand. We'll uncover how these seemingly intelligent programs process and generate human-like text.

How do LLMs Work?

Large Language Models (LLMs) are advanced AI systems designed to understand and generate human-like text. They learn from vast amounts of text data using a method called self-supervised learning. Here’s a simple breakdown of how they work:

Key Steps in LLM Functioning:

  1. Input Processing: When you type something, the LLM first breaks down your text into smaller pieces called tokens. For example, the sentence "Hello world!" might be split into tokens like ["Hello", "world", "!"].

  2. Vector Conversion: These tokens are then converted into vectors. Think of vectors as lists of numbers that represent the tokens in a way the model can understand. For instance, the token "Hello" might be represented as a vector like [0.1, 0.2, 0.3, 0.4].

  3. Model Processing: The LLM processes these vectors using its neural network, which is often based on a transformer architecture. This step is where the model uses its knowledge to understand the context and relationships between the tokens.

  4. Output Generation: The model generates a response in the form of new vectors. These output vectors are the model's predictions of what the next tokens should be.

  5. Text Conversion: Finally, the output vectors are converted back into tokens, and these tokens are combined to form human-readable text. So, the vectors might be translated back into words or parts of words to create a sentence.

Throughout this process, LLMs use their training on massive datasets to predict the most likely next word or sequence of words. This allows them to generate responses that are coherent and make sense in the given context.
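
To make these five steps concrete, here is a deliberately toy sketch in TypeScript. The vocabulary, the embeddings, and the "model" below are all invented for illustration; a real LLM uses a learned tokenizer, embeddings with thousands of dimensions, and a transformer with billions of parameters.

```typescript
// A toy walk-through of the five steps above. The vocabulary, the
// embeddings, and the "model" are all invented for illustration;
// real LLMs learn these values from massive amounts of text.

// 1. Input processing: split text into tokens and map them to ids.
const vocab = ["Hello", "world", "!", "there"];
const tokenize = (text: string): number[] =>
  (text.match(/\w+|[^\s\w]/g) ?? []).map((tok) => vocab.indexOf(tok));

// 2. Vector conversion: each token id gets a (made-up) embedding.
const embeddings: number[][] = [
  [0.1, 0.2, 0.3, 0.4], // "Hello"
  [0.5, 0.1, 0.9, 0.2], // "world"
  [0.3, 0.3, 0.1, 0.8], // "!"
  [0.2, 0.6, 0.4, 0.1], // "there"
];

// 3. Model processing: a real transformer mixes context across many
//    layers; here we just average the input vectors into one "context".
const contextVector = (vectors: number[][]): number[] =>
  vectors[0].map(
    (_, dim) => vectors.reduce((sum, v) => sum + v[dim], 0) / vectors.length
  );

// 4. Output generation: score every vocabulary entry against the context
//    and treat the highest-scoring one as the predicted next token.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

const predictNextTokenId = (context: number[]): number => {
  const scores = embeddings.map((emb) => dot(context, emb));
  return scores.indexOf(Math.max(...scores));
};

// 5. Text conversion: map the predicted id back to readable text.
const ids = tokenize("Hello world!");
const context = contextVector(ids.map((id) => embeddings[id]));
console.log(vocab[predictNextTokenId(context)]); // the toy "next token"
```

Real models repeat steps 3 to 5 over and over, feeding each predicted token back in as input, which is how a single prompt turns into a whole paragraph of output.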

Understanding Tokens in LLMs

Tokens are the fundamental units that Large Language Models (LLMs) use to process language. Tokenization is the process of breaking down text into these smaller, manageable units. Tokens can be words, subwords, or even characters, serving as the basic elements for processing language in LLMs.

When a user inputs a prompt, the LLM's tokenizer converts the text into a series of tokens. These tokens are typically represented as numbers, with each token corresponding to a unique identifier in the model's vocabulary. This numerical representation allows the LLM to process and manipulate text efficiently. Every LLM has a different tokenizer. For example, you can experiment with the tokenizers used by GPT-3.5 and GPT-4 in OpenAI's online tokenizer tool.

Understanding tokens is important, especially if you are integrating LLMs like ChatGPT into your application. This is because you are charged based on both input and output tokens: the text you send and the text the model generates. We will dive deeper into billing in the next post, but this is where tokens play a crucial role.
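
As a rough sketch of how you might inspect tokens from JavaScript, here is an example using the js-tiktoken npm package, a community port of OpenAI's tokenizer. The package name and API here are assumptions based on its documentation and may differ in your version.

```typescript
// Assumes: npm install js-tiktoken
import { getEncoding } from "js-tiktoken";

// cl100k_base is the encoding used by GPT-3.5-Turbo and GPT-4.
const enc = getEncoding("cl100k_base");

const prompt = "Hello world!";
const tokenIds = enc.encode(prompt);

console.log(tokenIds);             // numeric token ids from the model's vocabulary
console.log(tokenIds.length);      // how many input tokens you would be billed for
console.log(enc.decode(tokenIds)); // back to "Hello world!"
```

Counting tokens like this before calling an API is a cheap way to estimate cost and to stay within a model's context window.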

Vectors and Embeddings in Large Language Models (LLMs)

Vectors are lists of numbers used to represent words or tokens in LLMs. Think of them as numerical versions of words. For example, "cat" might be represented as [0.2, 0.4, 0.6]. Embeddings are a specific type of vector used in LLMs. They are dense, continuous representations of words or tokens that capture meaning.

Key Features:

  1. High-Dimensional Space: Vectors and embeddings exist in a space with many dimensions, allowing LLMs to capture complex relationships between words.

  2. Meaning Capture: Words with similar meanings have similar vectors. For example, "king" and "queen" might have close vector representations.

  3. Mathematical Operations: Machine learning models can perform mathematical operations on these vectors to reason about language. For example: "king" - "man" + "woman" ≈ "queen" (see the sketch after this list).
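
Here is a tiny sketch of that last point in TypeScript. The four-dimensional vectors are invented so the arithmetic works out neatly (real embeddings have hundreds or thousands of learned dimensions), but the idea is the same: subtract, add, then find the nearest word vector.

```typescript
// Toy 4-dimensional embeddings, invented for illustration only.
const words: Record<string, number[]> = {
  king:  [0.9, 0.8, 0.1, 0.7],
  man:   [0.8, 0.1, 0.1, 0.6],
  woman: [0.2, 0.1, 0.9, 0.6],
  queen: [0.3, 0.8, 0.9, 0.7],
  apple: [0.1, 0.2, 0.1, 0.9],
};

const add = (a: number[], b: number[]) => a.map((x, i) => x + b[i]);
const sub = (a: number[], b: number[]) => a.map((x, i) => x - b[i]);
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
const norm = (a: number[]) => Math.sqrt(dot(a, a));
const cosine = (a: number[], b: number[]) => dot(a, b) / (norm(a) * norm(b));

// "king" - "man" + "woman"
const result = add(sub(words.king, words.man), words.woman);

// Find the word whose vector is closest to the result (excluding the inputs).
const candidates = Object.entries(words)
  .filter(([w]) => !["king", "man", "woman"].includes(w))
  .map(([w, v]) => [w, cosine(result, v)] as const)
  .sort((a, b) => b[1] - a[1]);

console.log(candidates[0][0]); // "queen" with these toy numbers
```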

Similarity Searches in LLMs

Imagine you're trying to find a book in a massive library where every book is represented by a unique code. That's similar to how LLMs use embeddings to find relevant information. The actual process is very complex, involving many calculations and adjustments, but here is the basic idea:

  1. Convert Text to Vector: The LLM turns your search phrase into a special code (embedding). Example: You search for "healthy breakfast ideas" → [0.2, 0.5, 0.8]

  2. Compare Embeddings: It compares your search's code against the codes of stored texts. Example: It might find these codes:
    "Nutritious morning meals" → [0.3, 0.4, 0.7]
    "Fast food options" → [0.1, 0.2, 0.3]

  3. Measure Closeness: The LLM calculates how similar these codes are to your search. Example: "Nutritious morning meals" is closer to "healthy breakfast ideas" than "Fast food options".

  4. Retrieve Information: It presents the most closely related information.

This process helps LLMs quickly find relevant information from vast amounts of data. It is also the main reason LLMs work with vectors in the first place: machines understand numbers, not meaning the way humans do, and vectors let them match and retrieve information efficiently. This is how ChatGPT answers your questions so effectively. Pretty neat, right?
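
Here is a minimal sketch of that ranking step in TypeScript. The three-dimensional vectors are the toy numbers from the example above, not real model output; in practice you would obtain much longer embeddings from a model or an embeddings API, and cosine similarity is a common closeness measure. With these toy numbers we use plain Euclidean distance, which matches the "closeness" intuition in the list above.

```typescript
// Toy document "embeddings", reusing the numbers from the example above.
const documents = [
  { text: "Nutritious morning meals", embedding: [0.3, 0.4, 0.7] },
  { text: "Fast food options",        embedding: [0.1, 0.2, 0.3] },
];

// The user's query, already converted to a vector by the (imaginary) model.
const query = { text: "healthy breakfast ideas", embedding: [0.2, 0.5, 0.8] };

// Euclidean distance: how far apart two vectors are (smaller = more similar).
const distance = (a: number[], b: number[]) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

// Rank documents by how close their embedding is to the query embedding.
const ranked = documents
  .map((doc) => ({ ...doc, distance: distance(query.embedding, doc.embedding) }))
  .sort((a, b) => a.distance - b.distance);

console.log(ranked[0].text);
// "Nutritious morning meals" is much closer to the query than
// "Fast food options", so it is returned as the most relevant result.
```

Vector databases do essentially this, just with smarter indexing so the comparison scales to millions of stored embeddings.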

Conclusion

In conclusion, understanding the intricate workings of LLMs, from tokenization to embeddings, provides valuable insights into how these powerful AI systems process and generate human-like text. In the next tutorial, we will cover how to choose the right LLM for your use case. We’ll explore the different LLMs available in the market, their use cases, pricing, and more. See you in the next one!
