Edited By Siddhartha Reddy Jonnalagadda, PhD
Written By Hundreds of Parents
Introduction
Welcome back! You’re already an expert at understanding data, training models, and building software. You’ve learned how to reason with probability, solve problems with matrices and vectors, and build neural networks. Now, you’ll learn about a fascinating and powerful area of AI that’s changing the world: large language models (LLMs). These are not a mysterious black box; they are a logical extension of the neural networks you’ve already explored. Think of them as AI systems that can understand and create human language, but on a massive scale.
This book is your guide to understanding how these models work and how you can use them to solve problems. We’ll start with the building blocks, then move on to practical applications like generating text and answering questions. We will use the same clear, visual style you’ve come to expect, focusing on building intuition over complex math.
Let’s begin.
Welcome to the first chapter of our journey into large language models. This is where we’ll start at the very beginning, laying the groundwork for everything that comes after. Just as you learned the fundamental pieces of a computer (variables, functions, loops) in our first book and the core building block of a neural network (the neuron) in our last, we’ll now begin with the elementary components of a language model. This chapter will be a lot like unpacking a LEGO set. You’ll see all the basic pieces before we start putting them together to build something truly incredible.
1.1 What is a Language Model? A Prediction Engine
At its core, a language model is a powerful prediction machine that learns to predict the next word in a sentence. You already have an intuitive understanding of this from your own life. If someone says, "The cat sat on the…", your brain instantly fills in the blank with a word like "mat," "couch," or "floor." A language model does the same thing, but it does it on a massive scale.
Imagine a simple guessing game. The model is given a sequence of words, and its task is to assign a probability to the next word that might appear. For example, if the input is "The sun rises in the…", the model calculates the probability of every single word in its vocabulary that could come next.
P("east" | "The sun rises in the") = 0.98
P("west" | "The sun rises in the") = 0.01
P("moon" | "The sun rises in the") = 0.001
In this case, the model is very confident that the next word is "east" and would likely choose it. This is the fundamental task of a language model: to find patterns and relationships in a huge amount of text, which allows it to "guess" what word should come next. The more data a model sees, the better it becomes at making these predictions.
This core task of predicting the next word is what gives a language model its remarkable abilities. To summarize, the model’s fundamental job is to solve this simple prediction problem, and the surprising thing is that by getting very good at this one task, it becomes capable of doing so much more.
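You can see this prediction machinery for yourself with a few lines of Python. The sketch below is a minimal illustration, assuming the transformers library and the small GPT-2 model are available; the exact probabilities it prints will differ from the rounded numbers above.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load a small pre-trained language model (GPT-2 is used here only as an example)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The sun rises in the"
inputs = tokenizer(prompt, return_tensors="pt")

# Ask the model for the probability distribution over the next token
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, sequence_length, vocabulary_size)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

for word in [" east", " west", " moon"]:
    token_id = tokenizer.encode(word)[0]       # GPT-2 treats the leading space as part of the token
    print(f"P('{word.strip()}' | '{prompt}') = {next_token_probs[token_id]:.4f}")
Everything else in this book builds on this one operation: scoring every possible next token and choosing among the likely ones.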
1.2 A New Kind of Neural Network: The Transformer
In our book on Deep Learning, we introduced two types of neural networks: Convolutional Neural Networks (CNNs) for images and Recurrent Neural Networks (RNNs) for sequences. We learned that RNNs were models that could "remember" past information in a sequence, making them perfect for text. However, RNNs have a problem: they are slow because they can only process words one at a time. The first word is fed in, then the second, and so on. This is like building a car on a single assembly line, one piece at a time.
LLMs are built on a different and much more efficient kind of neural network called a transformer. A transformer is a new kind of neural network designed to handle sequences of words all at once. Think of a transformer like a parallel assembly line for text. Instead of processing words one at a time, it can process an entire sentence at once. This is a crucial difference that allows language models to be trained on massive amounts of data in a reasonable amount of time.
This parallel processing is a key part of how these models work. The transformer architecture is the foundation for almost all modern LLMs, and it’s what makes them so powerful.
1.3 From Words to Numbers: Tokens and Token Embeddings
Before a computer can understand words, it needs to turn them into numbers. A computer only understands numbers, so we have to translate all of our text into a format it can work with. This is done in two main steps: tokenization and embedding.
1.3.1 Tokenization: Speaking the AI’s Language
We’ll explain how text is broken down into small units called tokens. A token can be a whole word (like "hello"), a part of a word (like "ing"), or even a single character. This process is called tokenization. The model then works with these numbers, not the words themselves. The process is a lot like a computer reading a sentence and putting each word into a numbered list. For example, the sentence "I love my new car" might be tokenized as [100, 250, 50, 90]. Each number corresponds to a specific word in the model’s vocabulary.
1.3.2 The Power of Meaning: Word Embeddings
The genius of modern language models is that they don’t just assign a random number to each word. They use something called word embeddings, which are numerical representations of words that capture their meaning and relationship to other words. Think of it like a map of ideas. Every word is a point on this map, and the distance between two points represents how similar the words are in meaning. The word "king" and the word "queen" would be very close together, while the word "king" and the word "car" would be very far apart. This allows the model to understand context and nuance. The model learns these relationships over time.
1.4 How LLMs Understand: The "Semantic Hub"
LLMs are composed of many interconnected layers, similar to the deep networks you have already learned about. These layers work together to process information. We can think of the middle layers of the model as a kind of "semantic hub." This is where all the information from the input text is processed and understood. Researchers have found that these models process diverse data in a generalized way, similar to how the human brain integrates information.
1.4.1 The Attention Mechanism
The key to a transformer’s power is its attention mechanism. We’ll explain this concept simply: it allows the model to "pay attention" to the most important words in a sentence, which helps it understand context. This is crucial for long sentences. For example, in the sentence "The cat sat on the mat and it was very fluffy," the model knows that "it" refers to the "cat" because of the attention mechanism. It looks back at all the words in the sentence and gives more "attention" to the word "cat" when it’s trying to figure out what "it" refers to.
1.4.2 The Power of Scale
We’ll explain that LLMs are "large" because they have billions or even trillions of parameters and are trained on a vast amount of data. This scale allows them to learn complex patterns and perform a wide variety of tasks. The sheer size of the model is what gives it its powerful abilities. It’s a key part of the magic.
The following chapters will continue in this same detailed style, with each one providing a deep dive into a core concept, with analogies, visuals, and clear, step-by-step explanations.
Welcome back! In Chapter 1, we learned that a large language model (LLM) is a powerful prediction machine built on a special kind of neural network called a transformer. Now, we’ll dive into the most fundamental part of that process: how a language model takes in human language and turns it into a format it can understand. This is a crucial step in the journey from a string of letters to a deep comprehension of meaning.
The first step in this process is tokenization. You can think of this as a translation step. The LLM cannot work with letters and words directly; it can only work with numbers, just like the other neural networks we’ve discussed. The tokenizer is a program that breaks down text into small, manageable pieces called tokens. A token can be a whole word like "hello," a part of a word like "ing," a punctuation mark, or even a single character. For example, the phrase "unbelievable!" might be broken down into the tokens "un", "believ", "able", and "!". Each of these tokens is then assigned a unique number, or ID, from the model’s vocabulary. This process is how the LLM "reads" text and turns it into a sequence of numbers it can understand.
There are different tokenization methods, but subword tokenization strikes a balance between the huge vocabularies required by word-based tokenization and the very long token sequences produced by character-based tokenization. It works by breaking down rare or unknown words into smaller, known subwords. This helps language models handle words they haven’t seen before by composing them from familiar subwords, and it keeps the overall vocabulary at a manageable size, which makes the models more efficient.
Subword tokenization is typically performed using a pre-trained tokenizer that comes with the language model. The tokenizer uses a vocabulary of common words and subwords to break down the input text.
Let’s look at a conceptual example using a Python library like transformers.
from transformers import AutoTokenizer
# Load a pre-trained tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Example 1: Tokenizing a regular word
sentence = "Hello world."
tokens = tokenizer.tokenize(sentence)
print(f"Original: {sentence}")
print(f"Tokens: {tokens}")
In this simple case, the tokenizer identifies "hello", "world", and "." as separate tokens (the uncased BERT tokenizer also lowercases the text, which is why "Hello" appears as "hello").
The real power of subword tokenization is its ability to handle words that are not in the model’s vocabulary. It does this by breaking the word down into smaller, recognizable subwords.
# Example 2: Tokenizing a new or rare word
new_word = "unbelievable"
tokens_subword = tokenizer.tokenize(new_word)
print(f"Original: {new_word}")
print(f"Tokens: {tokens_subword}")
Depending on the tokenizer’s vocabulary, "unbelievable" may come back as a single token or be split into pieces such as "un", "##believ", and "##able". When a word is split, the ## prefix indicates that the token is a subword that should be attached to the previous one. This allows the model to understand the meaning of a word based on the meanings of its component parts.
After tokenization, each token is converted into a numerical ID from the model’s vocabulary. This is the format that the model’s embedding layer expects.
# Example 3: Converting tokens to IDs and back
sentence_with_new_word = "I find this unbelievable."
input_ids = tokenizer.encode(sentence_with_new_word)
tokens_from_ids = tokenizer.convert_ids_to_tokens(input_ids)
print(f"Original: {sentence_with_new_word}")
print(f"Input IDs: {input_ids}")
print(f"Tokens: {tokens_from_ids}")
Here, the tokenizer handles the new word either by finding it in its vocabulary as a single token or by breaking it down into subwords. The numbers 101 and 102 are the IDs of BERT’s special tokens, [CLS] and [SEP], which mark the beginning and end of the input.
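A quick way to see these special tokens, continuing the example above, is to decode the IDs back into text:
# Decode the IDs back into text to make the special tokens visible
print(tokenizer.decode(input_ids))
# Expected output (roughly): [CLS] i find this unbelievable. [SEP]
With tokenization in hand, the next step is turning these IDs into vectors that actually carry meaning.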
Simply assigning a random number or ID to each token isn’t enough. A random number, like 42, for the word "cat" doesn’t give the model any information about what a cat is or how it relates to other concepts. This is where token embeddings come in. An embedding is a numerical representation of a token that captures its meaning and its relationship to other tokens. Instead of a single, meaningless number, a token is represented by a long list of numbers, or a vector, which you’ve learned about in our book on linear algebra. The embedding process is a lot like creating a vast, multi-dimensional map of ideas.
Each token is a point on this map, and the distance between two points represents how similar the concepts are. Words that have a similar meaning or are used in similar contexts will have vectors that are numerically "close" to each other on this map. For example, the vectors for "king" and "queen" would be very close together because they are semantically related, while the vectors for "king" and "car" would be very far apart.
The beautiful part of this process is that the LLM learns these embeddings on its own as it’s being trained on vast amounts of data. The model adjusts the numbers in each token’s vector to reflect its understanding of how that token is used in different contexts. By the time a large language model is fully trained, its embedding layer contains a rich and complex map of all the knowledge it has absorbed, ready to be used for a wide variety of tasks. It’s the most crucial step in turning simple text into a deep and nuanced understanding of human language.
You don’t need to train a massive model yourself to see how embeddings work. You can use a pre-trained model and a simple Python library to generate them. This code uses a library to access a pre-trained embedding model, which is a powerful tool for converting text into its numerical representation.
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel

# This model is specifically for creating embeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Define three sentences
sentence1 = "The cat sat on the mat."
sentence2 = "A feline rested on the rug."
sentence3 = "The car drove down the street."

# Tokenize the sentences
tokens1 = tokenizer.encode(sentence1, return_tensors='pt')
tokens2 = tokenizer.encode(sentence2, return_tensors='pt')
tokens3 = tokenizer.encode(sentence3, return_tensors='pt')

# Generate the embeddings by mean-pooling the model's last hidden states
with torch.no_grad():
    embedding1 = model(tokens1)['last_hidden_state'].mean(dim=1).squeeze().numpy()
    embedding2 = model(tokens2)['last_hidden_state'].mean(dim=1).squeeze().numpy()
    embedding3 = model(tokens3)['last_hidden_state'].mean(dim=1).squeeze().numpy()

# Calculate the cosine similarity between the embeddings
def cosine_similarity(v1, v2):
    dot_product = np.dot(v1, v2)
    norm_v1 = np.linalg.norm(v1)
    norm_v2 = np.linalg.norm(v2)
    return dot_product / (norm_v1 * norm_v2)

sim_1_2 = cosine_similarity(embedding1, embedding2)
sim_1_3 = cosine_similarity(embedding1, embedding3)
print(f"Similarity between sentence 1 and 2: {sim_1_2:.2f}")
print(f"Similarity between sentence 1 and 3: {sim_1_3:.2f}")
Example output (your exact numbers may differ):
Similarity between sentence 1 and 2: 0.85
Similarity between sentence 1 and 3: 0.15
The output shows a high similarity score between the first two sentences, even though they don’t share a single word. This is because the embedding model understood that "cat" is similar to "feline" and "mat" is similar to "rug." The low score for the third sentence shows that it’s semantically different.
This simple example demonstrates the power of embeddings and how they allow a model to understand the meaning behind the words, which is the foundation for all the amazing applications that follow.
Welcome back! In the last chapter, you learned that an LLM’s first job is to transform human words into numbers through tokenization and embeddings. Now, we’ll dive into the heart of the model: the layers of the neural network where the real "understanding" happens. This is where those simple numerical vectors are transformed into a rich and deep representation of meaning. You can think of this part of the model as a central "semantic hub" where all the information comes together and is processed to find complex patterns and relationships.
Like the deep networks you’ve already explored, LLMs are composed of many interconnected layers, often numbering in the dozens or even hundreds. These layers work together, with the output of one layer becoming the input for the next. Each layer performs a series of calculations on the data it receives, and a key concept is that the layers learn to process information hierarchically.
For example, a model might have an early layer that learns to identify simple linguistic patterns, like which words tend to follow each other. The next layer might take that information and learn more complex patterns, like the grammatical structure of a sentence. A later layer might then combine all of this to form a complete understanding of the sentence’s meaning.
Researchers have found that these models process diverse data in a generalized way, similar to how the human brain integrates information. This is a beautiful idea: the model doesn’t need a specific rule for every type of sentence. Instead, it learns general principles that allow it to reason about and understand a wide range of topics.
The key to a transformer’s power is its attention mechanism, a clever component that allows the model to "pay attention" to the most important words in a sentence. This is crucial for understanding context, especially in long sentences or paragraphs.
Imagine you’re reading a long, complex sentence like, "The painter, who had just finished a beautiful mural on the wall of the old city hall, carefully put his brush back in the bag." If you’re asked, "What did the painter put in his bag?", you would immediately look back at the word "brush." Your brain knows to focus on that word to answer the question, while ignoring the extra details about the mural.
The attention mechanism works in a similar way. It’s a system that weighs the importance of each word in a sequence relative to every other word. For each word in a sentence, the model computes a score that indicates how much "attention" it should give to the other words when it’s processing that word. This creates a rich web of connections that allows the model to understand complex relationships. For example, in the sentence "The cat sat on the mat and it was very fluffy," the attention mechanism helps the model understand that "it" refers to the "cat" by giving that connection a high score.
The attention mechanism is what gives the transformer architecture its ability to handle all the words in a sentence at once. It can look at all the words simultaneously and figure out how they relate to each other, a process that is much more efficient and effective than the sequential processing of older models.
Since the full implementation of multi-headed attention is very complex, we’ll walk through a simplified, conceptual Python example to show the core idea of how attention scores are calculated. This code will illustrate how one word "attends" to every other word in a sentence by calculating a similarity score between their embeddings.
import numpy as np

# A simplified representation of word embeddings for three words
# In a real model, these would be high-dimensional vectors.
embeddings = {
    "cat": np.array([0.5, 0.2, 0.8]),
    "fluffy": np.array([0.4, 0.3, 0.7]),
    "it": np.array([0.5, 0.2, 0.8])
}

def calculate_attention_score(query_word, all_words):
    """
    Calculates the attention score of a query word to all other words.
    The score is based on a simple dot product similarity.
    """
    query_vector = embeddings[query_word]
    scores = {}
    for word, vector in all_words.items():
        # The dot product is a simple way to measure similarity (attention)
        score = np.dot(query_vector, vector)
        scores[word] = score
    return scores

# Let's see how much "it" attends to the other words
attention_scores = calculate_attention_score("it", embeddings)

# Print the scores
print("Attention scores for the word 'it':")
for word, score in attention_scores.items():
    print(f" - '{word}' : {score:.2f}")

# Example output:
# Attention scores for the word 'it':
#  - 'cat' : 0.93
#  - 'fluffy' : 0.82
#  - 'it' : 0.93
This simple example shows that the word "it" has a high attention score for "cat" because their embedding vectors are identical in this toy example, and a slightly lower score for "fluffy", whose vector is similar but not the same. The attention mechanism uses these scores to create a new, weighted representation of each word that is enriched with the context of the entire sentence. This is the core of how transformers understand the meaning behind the words.
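To make that last point concrete, here is a minimal continuation of the sketch above. It is not the full transformer calculation (real models use separate query, key, and value projections and a scaling factor), but it shows how raw scores become weights and how those weights blend the word vectors:
# Turn the raw scores into weights with a softmax
raw_scores = np.array(list(attention_scores.values()))
weights = np.exp(raw_scores) / np.sum(np.exp(raw_scores))

# Build a context-aware vector for "it" as a weighted sum of all the word vectors
weighted_vector = np.zeros(3)
for weight, vector in zip(weights, embeddings.values()):
    weighted_vector += weight * vector

print("Attention weights:", np.round(weights, 2))
print("Context-aware vector for 'it':", np.round(weighted_vector, 2))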
So, why are these models called "large"? The term refers to their immense size. LLMs have billions or even trillions of parameters. A parameter is a value that the model learns during its training. The more parameters a model has, the more complex patterns it can learn. The sheer size of the model is what gives it its powerful abilities.
LLMs are also "large" in the sense that they are trained on a vast amount of data. This includes books, articles, websites, and other forms of text from the internet. The model absorbs this data and learns from it, just as a human learns from reading.
This combination of a massive number of parameters and an enormous amount of training data is what makes these models so powerful. It allows them to learn complex patterns and perform a wide variety of tasks, from generating creative text to answering questions about science and history. It is this scale, combined with the power of the transformer architecture, that makes the current generation of language models so effective.
The term "semantic hub" is a great way to think about what happens inside a large language model. After the input text has been tokenized and embedded, it passes through the network’s layers. Each layer learns a more abstract and meaningful representation of the input. By the time the information reaches the final layer, it has been distilled into a rich, numerical representation of its complete meaning. This final representation is the "semantic hub" of the model. This is where the model has a deep and nuanced understanding of the text, ready to be used for a wide variety of tasks.
For example, when you ask an LLM a question, it first translates your question into a numerical representation. It then processes that representation through its layers to understand the meaning behind your words. It then uses that understanding to generate a response, which it then converts back into words that you can read. This process, from words to numbers and back to words, is at the core of how all LLMs work.
Welcome back! In the last chapter, you learned how LLMs process information and understand meaning. Now we’ll apply that understanding to a practical problem: text classification. This is a core task in many applications, from sorting emails into "spam" and "not spam" to categorizing customer reviews as "positive" or "negative." You’ve already learned the fundamentals of classification in our "Machine Learning" book, but now you’ll see how LLMs, with their deep understanding of language, can perform this task with remarkable accuracy.
Think back to how a traditional machine learning model might classify a movie review. It would likely use a simple approach, like counting keywords. If a review contains a lot of positive words like "great," "amazing," or "loved," the model would classify it as positive. This works, but it can be easily fooled. A sentence like "I can’t believe how bad this amazing movie was" would likely confuse a keyword-based model.
An LLM, however, goes beyond keywords. Because it has been trained on a vast amount of text, it has a deep understanding of nuance, sarcasm, and context. It can grasp the overall sentiment of the review, even when the words are contradictory. The model learns to find the relationships between words and sentences, which allows it to understand the true meaning of the text. This is a fundamental shift from keyword-based classification to meaning-based classification.
Let’s use the classic problem of sentiment analysis to demonstrate the power of an LLM. Our goal is to classify a movie review as either "Positive" or "Negative." We’ll use a pre-trained LLM and a simple Python library to do this. The model’s job is to read the review and, based on its understanding of the text, make a prediction.
For this example, we’ll use a fine-tuned version of a BERT model, which is a powerful transformer-based model. We’ll use the transformers library, which makes it easy to load and use a pre-trained model for a specific task.
from transformers import pipeline
# Load a pre-trained sentiment analysis model
classifier = pipeline("sentiment-analysis")
# Define a list of movie reviews
reviews = [
"I absolutely loved this movie!",
"The acting was great, but the plot was a little slow.",
"This movie was a complete waste of time. I hated it."
]
# Run the classifier on the reviews
results = classifier(reviews)
# Print the results
for review, result in zip(reviews, results):
    label = result['label']
    score = result['score']
    print(f"Review: '{review}'")
    print(f" - Prediction: {label} with a score of {score:.2f}\n")
What Happened Here?
pipeline("sentiment-analysis"): This is a powerful, high-level function from the transformers library that loads a pre-trained model for a specific task, in this case, sentiment analysis. It handles all the complex steps for you, from tokenization to classification.
classifier(reviews): We simply pass our list of reviews to the classifier. The model then performs the following steps for each review:
Tokenization: The text is broken down into tokens.
Embedding: The tokens are converted into numerical vectors.
Attention: The model processes the vectors through its layers, using its attention mechanism to understand the overall context and sentiment.
Prediction: The model’s final layer outputs a prediction, telling us whether the sentiment is POSITIVE or NEGATIVE and giving us a score (a confidence level) for that prediction.
The beauty of this is that the LLM, thanks to its deep understanding of language, can weigh the second review as a whole even though it contains both positive and negative words, instead of simply counting keywords. (The exact label and confidence score will depend on the specific model the pipeline loads.)
This simple example demonstrates how you can use a pre-trained LLM for a practical classification task. This same process can be applied to other classification problems, such as categorizing news articles, detecting spam, or even routing customer support requests.
In our "Machine Learning" book, you learned that a model needs to be trained on a set of features (e.g., the size of a house, the number of rooms). A key concept here is that an LLM can act as a powerful feature extractor for text data.
Instead of manually creating features like word counts or a list of positive keywords, you can use a pre-trained LLM to take a piece of text and convert it into a rich numerical vector (an embedding). This single vector, created by the LLM, contains a deep representation of the text’s meaning. You can then use this vector as an input to a simpler machine learning model, such as a logistic regression or a support vector machine, to perform classification.
This approach is powerful because it allows you to leverage the immense knowledge of a large language model without having to fine-tune it. The LLM does the hard work of understanding the text, and you just have to train a simpler model to perform the final classification. This is a common and effective technique in many real-world applications.
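Here is a minimal sketch of that idea, assuming the sentence-transformers and scikit-learn libraries are installed. The four-review dataset is made up purely for illustration; a real application would use hundreds or thousands of labeled examples.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# The pre-trained LLM acts as the feature extractor
embedder = SentenceTransformer("all-MiniLM-L6-v2")

train_texts = [
    "I loved every minute of this film.",
    "A wonderful, heartfelt story.",
    "Boring and far too long.",
    "I want my money back."
]
train_labels = [1, 1, 0, 0]   # 1 = positive, 0 = negative

# The LLM turns each review into a rich feature vector...
X_train = embedder.encode(train_texts)

# ...and a simple classifier learns the final decision boundary
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)

new_review = ["Surprisingly good, I would watch it again."]
print(clf.predict(embedder.encode(new_review)))   # likely prints [1]
The heavy lifting of understanding language is done once by the pre-trained embedding model; the logistic regression only has to learn a simple decision boundary on top of those features.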
Welcome back! In the last chapter, you learned how LLMs can be used for text classification, which is a type of supervised learning where you train a model to sort text into predefined categories. Now, we’ll explore the opposite approach: unsupervised learning. This is for when you have a large amount of text and no pre-existing labels. You want to discover the hidden patterns and group the texts based on their similarities. This is a task for text clustering and topic modeling.
Think of text classification like a librarian who has a set of perfectly labeled books and needs to place new books on the correct shelf. The categories are already known. In contrast, text clustering is like being given a massive, unsorted pile of books and being asked to organize them into meaningful groups without knowing what the groups should be. You’d have to read the books, find patterns in their content, and then create new categories for them based on what you found.
This is exactly what text clustering does. It’s an unsupervised machine learning technique that groups similar documents together based on their semantic content, without needing any labeled data. The goal is to ensure that documents within a single group are much more alike than they are to documents in other groups.
In traditional clustering methods, you might use simple keyword counts to group documents, but this can lead to poor results because it ignores the meaning and context of the words. For example, a document about a "car" and a document about an "automobile" might not be grouped together if the model only sees the keywords.
This is where LLMs and their powerful embeddings become a game-changer. By using a pre-trained LLM, you can convert each document into a single, rich numerical vector (an embedding) that captures its complete semantic meaning. Documents that are semantically similar will have embedding vectors that are close together in the multi-dimensional space, while dissimilar documents will be far apart. A clustering algorithm can then be used to find these natural groupings of vectors.
To cluster text and model topics, you can use a pre-trained LLM to create embeddings for your documents, then apply clustering and dimensionality reduction algorithms. Here’s a Python example that demonstrates this process conceptually.
Hands-On: Text Clustering and Topic Modeling 📊
The following example uses a pre-trained sentence transformer model to create embeddings for a list of sample documents. These embeddings are then reduced in dimensionality to make them easier to visualize, and finally, a clustering algorithm groups them based on their semantic similarity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer
# 1. Sample text data
documents = [
"The stock market is a complex system of trading shares.",
"Investing in bonds and mutual funds can be a good long-term strategy.",
"The camera lens captured the stunning landscape.",
"She is a professional photographer with a keen eye for detail.",
"The new financial regulations are a subject of debate.",
"This blog post discusses a new technique for digital image processing.",
]
# 2. Create document embeddings
# Load a pre-trained model for generating embeddings.
model = SentenceTransformer('all-MiniLM-L6-v2')
document_embeddings = model.encode(documents)
# 3. Dimensionality Reduction (for visualization)
# We reduce the embeddings to 2D for a conceptual plot.
pca = PCA(n_components=2)
reduced_embeddings = pca.fit_transform(document_embeddings)
# 4. Clustering the embeddings
# Use KMeans to group the documents into 2 clusters.
num_clusters = 2
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
clusters = kmeans.fit_predict(document_embeddings)
# 5. Conceptual Topic Modeling
# Use the cluster centers and a small prompt to get topic names
for i in range(num_clusters):
    cluster_docs = [documents[j] for j, cluster_id in enumerate(clusters) if cluster_id == i]
    print(f"\n--- Cluster {i+1} ---")
    print("Documents:", cluster_docs)
    # In a real application, you would use an LLM for topic naming.
    # Here we hand-label the clusters after inspecting their documents;
    # which cluster ends up with which label can vary between runs.
    print(f"Inferred Topic: {'Finance' if i == 0 else 'Photography'}")

# The 2D `reduced_embeddings` from step 3 could be plotted with matplotlib
# to visualize the clusters; the plot is omitted here.
How the Code Works 🧠
Step 1: The Raw Data We start with a list of documents. Notice that documents 1, 2, and 5 are related to finance, while 3, 4, and 6 are related to photography.
Step 2: The Embedding Layer We use the SentenceTransformer library, a type of pre-trained LLM, to convert each document into a numerical vector (an embedding). This process captures the semantic meaning of the entire document. Documents about finance will have embeddings that are numerically close together, and documents about photography will have embeddings that are close to each other, but far from the finance documents.
Step 3: The Clustering Algorithm We use the KMeans algorithm to group the embeddings into clusters. The algorithm works by finding cluster centers and assigning each document to the cluster center it is closest to. This process effectively sorts the documents into groups based on their meaning.
Step 4: The Topic Model Finally, we manually inspect the documents in each cluster to infer a topic name. In a more advanced application, you could pass the most representative sentences from each cluster to a large language model and ask it to generate a concise, human-readable topic name for you. The overall process demonstrates how an LLM’s understanding of meaning can be leveraged to discover and organize information in an unsupervised manner.
Once you have clustered your documents, you have a set of groups, but you don’t know what they’re about. Topic modeling is the next step in the process, which helps you understand and label these groups. It’s an unsupervised method for discovering the "abstract topics" that exist within a collection of documents. It does this by analyzing the words within each cluster to find the ones that are most representative of the theme.
For example, a cluster might contain documents with words like "stock market," "investing," and "financial," while another cluster contains words like "camera," "lens," and "photographer." A topic modeling algorithm would identify these word groups and might label the first topic as "Finance" and the second as "Photography."
LLMs can be used to make this process even more effective. After a clustering algorithm has grouped the documents, you can feed an LLM the most representative documents from each cluster and prompt it to generate a human-readable topic label. This makes the results much more intuitive and useful for a human to understand.
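As a conceptual sketch of that last step, the snippet below reuses the documents, clusters, and num_clusters variables from the clustering example above and asks a small instruction-tuned model to name each cluster. The prompt wording and the google/flan-t5-small model are illustrative choices, not the only way to do this.
from transformers import pipeline

# A small instruction-tuned model used here purely for illustration
namer = pipeline("text2text-generation", model="google/flan-t5-small")

for cluster_id in range(num_clusters):
    representative = [doc for doc, c in zip(documents, clusters) if c == cluster_id][:3]
    prompt = (
        "Give a short, two-word topic label for the following documents:\n- "
        + "\n- ".join(representative)
    )
    label = namer(prompt, max_new_tokens=10)[0]["generated_text"]
    print(f"Cluster {cluster_id + 1} topic: {label}")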
This table summarizes the core differences between classification and clustering. Understanding these distinctions will help you choose the right technique for your specific problem.
Feature          | Text Classification                                  | Text Clustering and Topic Modeling
Learning Type    | Supervised learning                                  | Unsupervised learning
Data Requirement | Requires a labeled training dataset                  | Does not require a labeled training dataset
Goal             | Assigns new documents to predefined categories       | Groups documents into meaningful categories based on similarity
Use Case         | Spam filtering, sentiment analysis, document routing | Uncovering hidden patterns, document organization, trend analysis
Both techniques are incredibly useful for analyzing and organizing large collections of text, but they serve different purposes. You’d use classification when you know what you’re looking for, and you’d use clustering when you want to discover something new.
Welcome back! In the last few chapters, we’ve explored the inner workings of LLMs, from how they tokenize and embed words to how their attention mechanisms process meaning. Now, we’ll turn our attention to the most important skill for a human working with an LLM: prompt engineering.
Think of prompt engineering as the art and science of "talking to an AI." A prompt is the input you provide to the model to elicit a specific response. By carefully crafting your prompts, you can guide the model to understand your intent, follow your instructions, and generate the output you want. It’s a skill that requires creativity, critical thinking, and a willingness to experiment.
When it comes to prompting, there’s a simple rule that will get you most of the way there: the 80/20 rule of communication. It suggests that a person can get about 80% of their desired results by focusing on a small number of crucial inputs, about 20% of the total effort.
In the context of LLMs, this means that if you can precisely explain what you want, your chances of getting what you want increase dramatically. You don’t need to use complex jargon or overly formal language. The rule is simple: be clear and specific. This is the most important thing to know, and it will get 80% of the job done. The other 20% is where the more advanced techniques come in, which we’ll discuss later.
Here are the key principles of this approach:
Be clear and specific: Avoid vague or broad prompts like "Tell me about AI". Instead, guide the model with more specific instructions, such as "Explain the main ethical concerns of AI in bullet points for a presentation".
Provide context: Give the model relevant background information or an explanation of the purpose of your request. For example, telling the model you need the information for a presentation helps it tailor its response.
Use complete sentences: Formulate your prompts as complete sentences or questions to help the LLM understand the structure of your request.
Set constraints: If you need a specific type of output, tell the model. You can instruct it to provide a concise answer, a list of ideas, or to avoid certain topics.
This simple, straightforward approach is the most effective way to communicate with an LLM and will significantly improve the quality of your responses.
Let’s put these principles into practice. We’ll start with a simple task and gradually make our prompts more effective.
Example 1: A Vague Prompt
A vague prompt gives the model little guidance, which can lead to a wide range of outputs that might not be what you’re looking for.
Tell me about machine learning.
The model’s response would likely be a general overview of machine learning, covering everything from supervised learning to deep learning. This might not be what you wanted if you were trying to write a product description.
Example 2: A Clear and Specific Prompt
A clear and specific prompt, on the other hand, guides the LLM toward the desired output. We’ll improve the previous prompt by adding more context and structure.
Write a blog post about the advantages and disadvantages of using AI in education.
Separate the advantages and disadvantages into subheadings, and list them using bullet points.
The tone should be professional and informative.
By providing clear instructions, a specific topic, a desired format, and a tone, you are giving the model a roadmap to the output you have in mind.
Once you’ve mastered the basics, you can use more advanced techniques to tackle complex problems.
Zero-shot prompting: This is the simplest technique, where you instruct an LLM to perform a task without providing any examples. It relies entirely on the model’s pre-trained knowledge.
Few-shot prompting: This technique includes a small number of examples in the prompt to demonstrate the task to the model. This helps the model better understand the context and the expected output (see the example just after this list).
Chain-of-Thought (CoT) prompting: This is a very effective technique that breaks down a complex task into simpler, logical sub-steps. It encourages the model to "think step by step" before arriving at a final answer, which enhances its reasoning abilities.
Role-based prompting: This technique involves asking the LLM to assume a specific persona or viewpoint. For example, you can say, "You are a seasoned software engineer helping a junior developer…". This helps the model provide more domain-specific and creative responses.
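For example, a few-shot prompt for sentiment labeling might look like this (the reviews are invented for illustration):
Classify the sentiment of each review as Positive or Negative.
Review: "The plot dragged on forever." -> Negative
Review: "A delightful surprise from start to finish." -> Positive
Review: "The soundtrack was the only good part." ->
By showing the model the pattern it should follow, you make both the task and the expected output format unambiguous, which usually improves the quality and consistency of the response.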
Prompt engineering isn’t just about getting the right answer; it’s about using the LLM as a creative partner. You can use it to brainstorm ideas, find logical flaws in a concept, or get alternative phrasings for a piece of writing. It’s a way to break through creative blocks and explore new ideas. The LLM won’t get tired or emotional if you keep changing your mind, which makes it a great partner for the iterative process of creation.
The best way to use an LLM for creative work is to engage in a back-and-forth conversation. You can introduce an element, and then see how the LLM builds on it. You can then refine its output or introduce a new element, creating a dynamic interplay of ideas.
By mastering prompt engineering, you are not just learning how to operate a tool; you are learning how to have a productive, creative partnership with a powerful AI. It’s a new way of working, and it’s a skill that will be invaluable for your future.
Welcome back! In the last chapter, you learned the art of prompt engineering, a powerful way to guide LLMs with clear instructions. Now, we’ll go beyond simple prompting and explore how LLMs generate new text. We’ll dive into the techniques and tools that give you more control over the model’s output, allowing you to use it for everything from creative writing to a productive, day-to-day assistant.
At its core, an LLM generates text one token at a time by predicting the next most likely token in a sequence. However, if the model just picked the single most likely word every time, the output would be predictable and uncreative. To make text generation more dynamic, we can use a variety of techniques that control how the model makes its choices.
Greedy Search: This is the simplest method. The model always picks the token with the highest probability. This can lead to repetitive or dull output.
Beam Search: The model keeps track of the most likely sequences of words, or "beams," and explores a few different paths at once. This can result in a more diverse and high-quality output.
Sampling: This method introduces an element of randomness. Instead of always picking the most likely word, the model chooses from a sample of the most probable words. This can lead to more creative and unexpected results.
Hands-On: Exploring Generation with Python 🧑💻
The transformers library provides a simple interface for controlling these generation techniques. You can use a single function to generate text and then adjust the parameters to explore different outcomes.
from transformers import pipeline
# Load a text generation model
generator = pipeline('text-generation', model='gpt2')
# Example 1: Simple generation (Greedy Search)
simple_output = generator("The quick brown fox jumps over", max_length=20, num_return_sequences=1, do_sample=False)
print("--- Greedy Search ---")
print(simple_output[0]['generated_text'])
# Example 2: Creative generation (Sampling)
creative_output = generator("The quick brown fox jumps over", max_length=20, num_return_sequences=1, do_sample=True, top_k=50, top_p=0.95)
print("\n--- Creative Sampling ---")
print(creative_output[0]['generated_text'])
# Example 3: Multiple outputs with creativity
multi_output = generator("The quick brown fox jumps over", max_length=20, num_return_sequences=3, do_sample=True, top_k=50)
print("\n--- Multiple Outputs ---")
for i, output in enumerate(multi_output):
    print(f"Output {i+1}: {output['generated_text']}")
LLMs aren’t just for creative tasks. They can be incredibly useful as a personal assistant for day-to-day work. You can use an LLM to:
Summarize Documents: You can provide the model with a long document and ask it to summarize the key points in a few sentences or bullet points. This can save you a lot of time (a short sketch follows this list).
Draft Emails and Letters: You can give the model a few key points and ask it to draft a professional email or a polite letter. This is a great way to handle tedious communication tasks.
Translate Languages: LLMs are excellent at language translation. You can provide the model with text in one language and ask it to translate it into another.
Write Code: You can ask the model to write code snippets, debug your code, or explain a complex function. This is a powerful way to accelerate your learning and development.
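As a quick illustration of the summarization use case, here is a minimal sketch using the transformers pipeline. The model name and the short article are illustrative; a real document could be much longer, up to the model’s input limit.
from transformers import pipeline

# Load a summarization model (this particular checkpoint is an illustrative choice)
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "Large language models are neural networks trained on huge amounts of text. "
    "They learn to predict the next word in a sequence, and this simple objective "
    "lets them summarize documents, answer questions, and draft text."
)
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])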
By mastering prompt engineering and understanding the various generation techniques, you can turn an LLM into a powerful and versatile tool that can assist you in almost any task. It’s a new way of working, and it’s a skill that will be invaluable for your future.
Welcome back! In the last chapter, you learned how to use LLMs to generate new and creative text. Now we’ll turn our attention to a different application: search. You’ve used search engines your entire life, but you might not realize that the way they work has fundamentally changed. Today, search isn’t just about finding matching keywords; it’s about understanding the meaning and intent behind your query. This is the power of semantic search.
Think about how a traditional, pre-LLM search engine works. It’s a simple, straightforward process: you type in a keyword, and the engine looks for documents that contain that exact keyword. For a search like "what is the capital of France," this works perfectly. The problem arises with more complex or ambiguous queries.
Imagine you’re trying to find information about "new phones with great cameras." A keyword-based search might look for documents that contain the words "new," "phones," "great," and "cameras," but it wouldn’t understand the relationship between them. It might return a document about "new" camera equipment, or a review of a "great" camera lens, but not necessarily a "new phone with a great camera". This is because it doesn’t understand the intent behind your search.
Semantic search is an advanced search methodology that uses LLMs and other AI technologies to understand the contextual meaning and intent behind a user’s query. It’s a two-phase process: first, it transforms the query and documents into numerical vectors, or embeddings; second, it uses a vector search to find documents that are conceptually similar to the query.
This is where the power of embeddings, which you learned about in Chapter 2, comes in. Just as words with similar meanings have embeddings that are close to each other, a query like "new phones with great cameras" is given an embedding that is semantically similar to documents about "budget smartphones with high-quality lenses." The search engine then finds documents whose embedding vectors are closest to the query’s embedding vector.
Here’s a breakdown of the process:
Query Analysis: The search engine uses natural language processing (NLP) to analyze your query and understand its intent, even if you use synonyms or ambiguous terms.
Embedding Creation: The query is converted into a numerical vector by an embedding model. The same is done for all the documents in the searchable database.
Vector Similarity: The search engine compares the query’s vector to all the document vectors, often using a method like cosine similarity to measure how similar they are.
Result Ranking: Documents are then ranked based on their semantic relevance to the query, providing you with more accurate and meaningful results.
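Here is a minimal sketch of steps 2 through 4, assuming the sentence-transformers library is installed; the three documents and the query are invented for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our latest smartphone features a 108-megapixel camera.",
    "A review of professional DSLR lenses for landscape photography.",
    "Tips for keeping your houseplants alive in winter."
]
query = "new phones with great cameras"

# Step 2: convert the query and the documents into embeddings
doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Steps 3 and 4: rank documents by cosine similarity to the query
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.2f}  {doc}")
The smartphone document should come out on top even though the query and the document share few exact words.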
Imagine you have a document about "sustainable food packaging" and another about "eco-friendly containers." A traditional search for "sustainable" would only find the first document. A semantic search for "eco-friendly" would likely return both documents, because the embedding model understands that "sustainable" and "eco-friendly" are very similar concepts.
This ability to find results that are conceptually similar, even without an exact keyword match, is what makes semantic search so powerful. It provides a more natural, intuitive, and effective search experience for the user.
Welcome back! So far, we’ve focused on models that work with one type of data: text. But what if a model could do more? Multimodal large language models (MLLMs) are a new and exciting type of AI that can process and understand multiple types of data at once, such as text and images.
Think of a traditional LLM as a very smart, well-read person who can only use language. They can’t see the world, but they can describe it in detail. An MLLM is like that same person, but now they can see. This allows them to bridge the gap between human interaction and technology, creating more powerful and fascinating applications in our daily lives. An MLLM can process and respond to inputs like text, voice, and images, leading to richer, more context-aware interactions.
The core idea behind MLLMs is to process different data types, or modalities, and convert them into a single, unified format that the language model can understand. This is a lot like how a human brain processes a word we read and a word we hear in a similar way.
The process has a few key steps:
Unimodal Encoders: Each data type gets its own specialized processor, or encoder. A vision encoder, for example, is a separate neural network that processes images and converts the visual information into a numerical representation, or embedding. Similarly, a text encoder processes text and turns it into embeddings, just like you learned in our earlier chapters.
Tokenization for Images: A key step in this process is how images are "tokenized". A visual tokenizer splits an image into small patches and converts each patch into a numerical vector. These patches act like a visual "word" for the model. This approach allows the model to process a large image by breaking it into smaller, manageable pieces, just as a text tokenizer breaks a sentence into words.
Alignment and Fusion: Once all the different data types have been converted into embeddings, they need to be brought together so the model can understand their relationship to one another. This is a crucial step called data fusion. Techniques like cross-attention allow the model to see how parts of an image relate to the text that describes them, which is a key part of how it learns to link different modalities.
Once the embeddings are fused, the language model can then use its powerful reasoning capabilities to perform tasks that a text-only model couldn’t do. The LLM acts as the "brain," providing language and context understanding, while the vision encoder does the "seeing".
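To build intuition for the visual tokenization step described above, here is a toy sketch using only NumPy. It splits a stand-in 224x224 image into 16x16 patches, the same kind of patching used by vision transformers; a real vision encoder would then project each flattened patch into the model’s embedding space.
import numpy as np

# A random array standing in for a real 224x224 RGB photo
image = np.random.rand(224, 224, 3)
patch_size = 16

patches = []
for row in range(0, 224, patch_size):
    for col in range(0, 224, patch_size):
        patch = image[row:row + patch_size, col:col + patch_size, :]
        patches.append(patch.flatten())   # one flattened patch = one "visual word"

patches = np.stack(patches)
print(patches.shape)   # (196, 768): 196 patches, each a vector of 16 * 16 * 3 = 768 values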
Here are some examples of what an MLLM can do:
Visual Question Answering (VQA): You can give the model an image and ask it a question about the content. For example, you could upload a picture of a fridge and ask, "What foods are missing?" The model would analyze the image and use its knowledge to provide a list of common foods that aren’t there.
Image and Video Summarization: An MLLM can analyze an image or video and generate a detailed description or summary of its contents. This is especially useful for accessibility, as it can automatically generate captions for visual media.
Cross-Modal Search: With an MLLM, you can search for a picture using a text query or search for a document using an image. For example, in an e-commerce context, a customer could upload a product image, and the system would return related textual descriptions, product listings, and reviews.
MLLMs represent a paradigm shift in AI because they can combine different data sources to achieve a more comprehensive and accurate understanding of complex tasks. They are transforming fields from content creation to education and healthcare. As research continues, these models are becoming more sophisticated, with better visual processing and the ability to understand longer videos.
The ability of LLMs to process and interpret multiple modalities gives rise to more impactful and fascinating applications in everyday life. This is the next frontier of AI, and your understanding of how these models work is a crucial step in preparing for the future.
Welcome to a crucial chapter in your journey: fine-tuning. So far, you’ve used large language models (LLMs) as general-purpose tools, but what if you need to solve a very specific problem? What if you need a model that understands complex legal jargon, writes in your company’s brand voice, or can accurately diagnose a medical condition? This is where fine-tuning comes in. It’s the process of taking a pre-trained LLM and training it on a smaller, task-specific dataset to specialize it for a particular use case.
Think of a pre-trained LLM like a brilliant but generalist student who has read a library’s worth of books on every subject. They have vast knowledge but lack deep specialization. Fine-tuning is like giving that student a master’s degree in a specific subject. You’re not teaching them everything again from scratch; you’re just refining their existing knowledge with targeted information. This is a supervised learning process that uses a dataset of labeled examples to update the model’s weights.
While prompt engineering (Chapter 6) is a powerful way to guide a model, it has its limits. If your task requires a high degree of accuracy or domain-specific knowledge, prompting alone may not be enough. Fine-tuning is necessary to:
Enhance Accuracy: For tasks that require precision, such as in the legal, medical, or security fields, fine-tuning can significantly improve a model’s performance.
Improve Alignment: It allows you to tailor a model’s responses to align with a specific brand voice, ethical guidelines, or safety protocols.
Handle Specialized Vocabulary: For industries with unique jargon and phrases, fine-tuning on domain-specific text helps the model understand and generate relevant content.
Go Beyond the Baseline: Fine-tuning allows an LLM to go from being a "jack of all trades" to a specialist in a particular subject.
Fine-tuning bridges the gap between a generic, pre-trained model and the unique requirements of a specific application.
The fine-tuning process is a form of transfer learning, which you learned about in our "Deep Learning" book. It takes a model that has already learned a vast amount of knowledge from a huge dataset and adapts that knowledge to a new, specific task. This approach is much more efficient than training a model from scratch because it requires far less data and computational power.
The process works as follows:
Objective Definition: Before you start, you must have a clear goal in mind. What specific task do you want the model to do better?
Dataset Preparation: You need to create a small, high-quality dataset of labeled examples that is representative of the task. These are often in the form of prompt-response pairs that show the model how it should respond.
Weight Adjustment: The model is exposed to this new, labeled dataset, and it calculates the error between its predictions and the correct answers. It then uses an optimization algorithm, like gradient descent, to adjust its weights incrementally to reduce that error.
Evaluation and Iteration: After fine-tuning, you must evaluate the model’s performance on a separate validation set to see how well it works on unseen data and to avoid overfitting.
This process is a direct application of the concepts you’ve already learned, but on a smaller, more targeted scale.
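Here is a heavily simplified sketch of that loop using the Hugging Face Trainer, assuming the transformers and datasets libraries are installed. The two-example dataset and the DistilBERT model are illustrative only; real fine-tuning needs far more data and a separate validation set.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A tiny, made-up labeled dataset of prompt/label pairs
data = Dataset.from_dict({
    "text": ["The contract was terminated for breach.", "I loved this movie!"],
    "label": [0, 1],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=64),
                batched=True)

args = TrainingArguments(output_dir="finetune-demo", num_train_epochs=1,
                         per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args, train_dataset=data)
trainer.train()   # adjusts the pre-trained weights on the new labeled examples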
There are a few different strategies for fine-tuning, each with its own advantages.
Supervised Fine-Tuning (SFT): This is the most common method, where the model is trained on a task-specific labeled dataset. It’s ideal for tasks like sentiment analysis, text classification, and summarization.
Instruction Fine-Tuning: This approach trains the model using a dataset of instructions paired with the expected responses. This helps the model generalize to new tasks and follow natural language instructions, making it useful for chatbots and question-answering systems.
Parameter-Efficient Fine-Tuning (PEFT): Full fine-tuning can be computationally expensive. PEFT methods, such as LoRA (Low-Rank Adaptation), are a clever way to adjust only a small subset of the model’s parameters. This is much cheaper and faster, while often achieving similar performance on specific tasks (a minimal sketch appears at the end of this chapter).
Reinforcement Learning from Human Feedback (RLHF): This is an advanced technique that uses human ratings to train a model to align its outputs with human values and preferences. It’s a key part of training models to be more helpful and safe.
The right approach depends on your specific task, the amount of data you have, and your computational resources. Fine-tuning gives you the flexibility to adapt these powerful models to your exact needs.
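As a closing illustration, here is a minimal sketch of the PEFT approach mentioned above, assuming the peft library is installed. The GPT-2 base model and the LoRA hyperparameters are illustrative choices.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=16,       # scaling factor for the update
    lora_dropout=0.1,
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
# Typically reports only a fraction of a percent of the parameters as trainable
Only the small LoRA matrices are trained; the original model weights stay frozen, which is what makes this approach so much cheaper than full fine-tuning.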