Edited By Siddhartha Reddy Jonnalagadda, PhD
Written By Hundreds of Parents
Welcome back! You have a solid foundation in Python programming and the principles of machine learning. This book, Deep Learning: Building Intuition for a New Generation of AI, is the next step in your journey. Deep learning is a subset of machine learning that uses multi-layered neural networks to process data and solve problems.
A deep learning model is not a mysterious black box; it’s a logical evolution of the models you’ve already encountered. This book will help you build intuition for deep learning concepts, showing you how they are a natural extension of probabilistic and statistical reasoning. We’ll continue to use a patient, step-by-step approach that values a deep understanding over speed.
Our journey will begin by exploring the fundamental building block of deep learning: the neuron. We will then build upon that to understand how these simple units, when organized into layers, can learn complex patterns. We’ll also introduce powerful, high-level tools like Keras and PyTorch, which simplify the process of building and training these networks. By the end of this book, you will have the knowledge and confidence to apply deep learning to new problems and continue your growth as a thoughtful programmer. Let’s begin.
In our last book, you learned that a machine learning model is a function that takes an input and produces an output. A single neuron in a neural network is a simple model that does the same thing, but it’s built to be stacked with others. This simple building block, when connected, forms a powerful tool.
The Single Neuron: A Weighted Sum and a Decision
Imagine a neuron as a tiny decision-maker. It takes in several pieces of information (inputs), multiplies each one by an importance factor (weight), adds them all up, and then makes a decision based on the total.
The inputs are represented by $x$. The weights are represented by $w$. Just like the parameters $\theta$ you learned about in machine learning, these weights are what the model learns during training. A single neuron’s calculation is a weighted sum:
$z = w_{1}x_{1} + w_{2}x_{2} + \dots + w_{n}x_{n} + b$
Here, $b$ is the bias, which you can think of as a constant that allows the neuron to activate even if all the inputs are zero. The result, $z$, is then passed through an activation function.
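To make this concrete, here is a tiny sketch of that weighted sum in Python with NumPy. The input, weight, and bias values below are made up purely for illustration.

import numpy as np

# Made-up inputs, weights, and bias for a single neuron
x = np.array([0.5, 0.8, 0.2])   # inputs x1, x2, x3
w = np.array([0.4, -0.2, 0.9])  # learned weights w1, w2, w3
b = 0.1                         # bias

# Weighted sum: z = w1*x1 + w2*x2 + ... + wn*xn + b
z = np.dot(w, x) + b
print(z)  # ~0.32; this value is handed to the activation function next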
The Activation Function: Introducing Non-Linearity
The activation function decides whether the neuron should "fire" or not. It takes the weighted sum $z$ and transforms it into the neuron’s final output.
A Sigmoid function, which you’ve seen in logistic regression, squishes the output between 0 and 1.
A ReLU (Rectified Linear Unit) function is simpler; it outputs the input if it’s positive and outputs zero otherwise. It’s the most common choice today.
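Both of these functions are only a line or two of Python. Here is a quick sketch, with a few arbitrary example values of $z$:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged, zeroes out negatives
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # approximately [0.119 0.5 0.953]
print(relu(z))     # [0. 0. 3.]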
Stacking Neurons: Building the Neural Network
A neural network is simply a collection of these neurons, organized into layers. The outputs of one layer become the inputs for the next. This layered structure is what allows a neural network to learn complex patterns.
Input Layer: The first layer, which receives the raw data.
Hidden Layers: One or more layers of neurons between the input and output layers. These are where the network does its "thinking" and finds deeper patterns. A network with more than one hidden layer is considered a "deep" network.
Output Layer: The final layer, which produces the model’s prediction.
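Here is a rough sketch of how that layered structure looks in PyTorch code. The sizes (4 inputs, hidden layers of 16 and 8 neurons, 3 outputs) are arbitrary choices made only to illustrate the shape of a network:

import torch.nn as nn

# A small network: raw data with 4 features in, 3 class scores out
network = nn.Sequential(
    nn.Linear(4, 16),  # input features -> first hidden layer (16 neurons)
    nn.ReLU(),
    nn.Linear(16, 8),  # first hidden layer -> second hidden layer (8 neurons)
    nn.ReLU(),
    nn.Linear(8, 3),   # second hidden layer -> output layer (3 neurons)
)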
A neural network learns in an iterative process with two key phases: a forward pass and a backward pass. This process is how the network finds the right weights to make accurate predictions.
The Forward Pass: Making a Guess
This is the part you’re already familiar with. You feed the data into the input layer, the signal travels through the hidden layers, and the output layer makes a prediction. This is a one-way trip, from input to output.
The Backward Pass: Learning from Mistakes
This is the magic of deep learning. The network compares its prediction to the correct answer using a loss function, which you learned about previously. The goal is to minimize this loss, and we do this with an algorithm called Backpropagation.
Calculate the Loss: The loss function tells us how wrong the network’s prediction was.
Find the Gradient: Just like in gradient descent, we calculate the gradient, which tells us the direction to change our weights to decrease the loss.
Propagate Backwards: The genius of backpropagation is that it efficiently sends this error signal backward through the network, from the output layer to the hidden layers. It calculates exactly how much each weight contributed to the total error.
Update the Weights: The algorithm then uses the gradient and the learning rate to slightly adjust each weight, pushing the network closer to a correct prediction.
This process repeats thousands or even millions of times, with the network’s guesses getting better and better with each iteration.
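To connect these two phases to code, here is a minimal sketch of a single forward-and-backward training step using PyTorch’s automatic differentiation. The one-neuron model, the single made-up example, and the learning rate are all illustrative choices, not a recipe:

import torch
import torch.nn as nn
import torch.optim as optim

# A single made-up training example: 2 input features and 1 target value
x = torch.tensor([[0.5, -1.0]])
y_true = torch.tensor([[1.0]])

model = nn.Linear(2, 1)                            # one neuron: z = w1*x1 + w2*x2 + b
loss_fn = nn.MSELoss()                             # measures how wrong the guess is
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Forward pass: make a guess
y_pred = model(x)
loss = loss_fn(y_pred, y_true)  # 1. calculate the loss

# Backward pass: learn from the mistake
optimizer.zero_grad()
loss.backward()                 # 2-3. find the gradients and propagate them backward
optimizer.step()                # 4. update the weights using the gradients and learning rate
print(loss.item())              # repeat this loop and the loss shrinks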
Building neural networks from scratch can be complicated, but powerful libraries like PyTorch provide high-level tools to do it efficiently.
PyTorch: The Researcher’s Framework
PyTorch is a popular deep learning framework well known for its simplicity and flexibility. It is especially useful for work that requires dynamic computation graphs, but for our purposes it handles the same tasks as Keras.
A Step-by-Step Example
We’ll build a simple network that classifies images of handwritten digits.
Load the Data: We’ll use the torchvision library, which contains many popular datasets, including the MNIST handwritten digits dataset.
Preprocess the Data: PyTorch uses a concept called tensors, which are similar to NumPy arrays, to handle data. We’ll define a series of transformations to prepare the data, including converting it to a tensor and normalizing the pixel values.
Build the Model: In PyTorch, you build a model by creating a class that inherits from torch.nn.Module. The constructor (__init__) defines the layers, and the forward method specifies how data flows through the network.
Define Loss and Optimizer: We’ll specify our loss function and optimizer, just like with Keras.
Train the Model: We’ll write a training loop that iterates through the data, makes predictions, calculates the loss, and then uses backpropagation to update the model’s weights.
Evaluate the Model: After training, we’ll evaluate the model’s accuracy on the test data.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# 1-2. Load and preprocess the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

# 3. Build the model
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)  # 28x28 image -> vector of 784 values
        x = self.fc1(x)      # weighted sums for 128 hidden neurons
        x = self.relu(x)     # non-linear activation
        x = self.fc2(x)      # one score per digit class (0-9)
        return x

model = SimpleNet()

# 4. Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 5. Train the model
epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # forward pass: make a guess
        loss = criterion(outputs, labels)  # how wrong was the guess?
        loss.backward()                    # backward pass: backpropagation
        optimizer.step()                   # update the weights
        running_loss += loss.item()
    print(f'Epoch {epoch + 1}, average loss: {running_loss / len(train_loader):.4f}')

# 6. Evaluate the model
correct = 0
total = 0
with torch.no_grad():  # no gradients needed for evaluation
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f} %')
Not all data is created equal. We have specialized neural networks designed for specific types of data. While a simple dense neural network can be a universal learner, it often fails to capture the unique structures found in certain data, such as the spatial relationships in images or the sequential dependencies in text.
Convolutional Neural Networks (CNNs) for Images 🖼️
A simple dense neural network would treat an image as a long list of numbers, losing all sense of the image’s two-dimensional structure. CNNs solve this problem by introducing convolutional layers. These layers act like a camera lens, scanning the image and learning to recognize patterns like edges, shapes, and textures. They are the go-to model for any computer vision task, from self-driving cars to medical image analysis.
The Convolutional Layer: This layer uses a small filter (or kernel) that slides over the input image. At each position, it performs a mathematical operation (a dot product) to create a new, smaller representation of the image. The values in this filter are the weights that the network learns through backpropagation. Different filters can learn to detect different features, such as horizontal lines, vertical lines, or corners.
Pooling Layers: After a convolutional layer, a pooling layer is often used to reduce the size of the feature maps. This reduces the number of parameters and makes the model more robust to small shifts in the input image. The most common type is Max Pooling, which simply takes the largest value from a small section of the feature map.
Fully Connected Layers: After the convolutional and pooling layers have extracted important features, the data is flattened and fed into one or more dense, or fully connected, layers to make a final prediction.
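Here is a rough sketch of how these three kinds of layers fit together in PyTorch for 28x28 grayscale images like MNIST. The number of filters and the layer sizes are illustrative choices, not prescribed values:

import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layer: 16 filters, each 3x3, sliding over a 1-channel image
        self.conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # Max pooling: keep the largest value in each 2x2 region, halving the size
        self.pool = nn.MaxPool2d(kernel_size=2)
        # Fully connected layer: flattened features -> 10 class scores
        self.fc = nn.Linear(16 * 14 * 14, 10)

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))  # extract and downsample features
        x = x.flatten(start_dim=1)              # flatten the feature maps for each image
        return self.fc(x)                       # final prediction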
Recurrent Neural Networks (RNNs) for Sequences 📜
For data that has a temporal order, like a sentence or a stock price over time, we need a network that can remember previous inputs. A Recurrent Neural Network (RNN) has a loop that allows information to persist, giving it a form of short-term memory. This makes them perfect for natural language processing, like translating languages or generating text.
The Recurrent Cell: Unlike a standard neural network layer, the recurrent layer’s output is not only passed to the next layer but also fed back into itself for the next time step. This feedback loop allows the network to process sequences one item at a time while maintaining a "hidden state" that summarizes all the previous information it has seen.
The Vanishing Gradient Problem: A key challenge with basic RNNs is the vanishing gradient problem, where the gradients become so small that the network can’t learn long-term dependencies. The error signal gets weaker and weaker as it propagates back through the network.
Long Short-Term Memory (LSTM): To combat this, more advanced architectures like LSTMs were developed. LSTMs introduce a "cell state" and a series of "gates" that control what information is remembered and what is forgotten. This allows them to learn long-term dependencies more effectively.
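As a small sketch, here is what an LSTM layer looks like in PyTorch. The batch size, sequence length, feature size, and hidden size below are arbitrary, and a real text model would feed in word embeddings rather than random numbers:

import torch
import torch.nn as nn

# A batch of 4 sequences, each 10 time steps long, with 8 features per step
x = torch.randn(4, 10, 8)

# LSTM with a hidden state of size 32; batch_first puts the batch dimension first
lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)

outputs, (hidden, cell) = lstm(x)
print(outputs.shape)  # torch.Size([4, 10, 32]) - the hidden state at every time step
print(hidden.shape)   # torch.Size([1, 4, 32])  - the final hidden state summarizing each sequence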
You’ve now built a solid mental model for deep learning. Here are some advanced concepts that will guide your next steps.
Transfer Learning 🧠
Training a deep network from scratch requires a lot of data and computational power. Transfer learning is a shortcut that lets you use a pre-trained model (one that has already learned from a massive dataset) and adapt it to your specific task. It’s like borrowing a master artist’s brushstrokes and using them to paint your own picture. This approach is highly effective because the pre-trained model has already learned a rich set of features from its original task. For example, a model trained on millions of images can recognize basic shapes and textures, and you can "transfer" that knowledge to a new task, like classifying different types of flowers, by training only the last few layers of the network on your new data.
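Here is a hedged sketch of that idea using a pre-trained ResNet from torchvision. The weights argument reflects recent torchvision versions, and the five flower classes are a made-up example, so treat this as a sketch rather than a recipe:

import torch.nn as nn
from torchvision import models

# Load a ResNet that has already been trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained layers so their learned features are kept as-is
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new one for our task,
# e.g. classifying 5 types of flowers (a made-up number for illustration)
model.fc = nn.Linear(model.fc.in_features, 5)

# During training, only the new layer's parameters will be updated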
The Transformer Architecture 🤖
For natural language processing, a newer architecture called the Transformer has largely replaced RNNs. Transformers use a mechanism called attention that allows them to weigh the importance of different words in a sentence, which leads to a deeper understanding of context. This is the architecture behind large language models like GPT-3 and BERT. Unlike RNNs, which process words one at a time, the Transformer processes all words in a sentence simultaneously, which makes it much more efficient and effective at capturing long-range dependencies.
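As a rough sketch of the building block behind this idea, PyTorch provides a Transformer encoder layer whose core is exactly this attention mechanism. The embedding size, number of attention heads, and random stand-in "sentences" below are illustrative only:

import torch
import torch.nn as nn

# A batch of 2 "sentences", each 12 tokens long, each token a 64-dimensional embedding
tokens = torch.randn(2, 12, 64)

# One Transformer encoder layer: self-attention over all tokens at once,
# followed by a small feed-forward network
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

out = encoder_layer(tokens)
print(out.shape)  # torch.Size([2, 12, 64]) - every token now carries context from every other token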
Beyond Supervised Learning 🌐
While supervised learning is dominant, the world of AI is bigger.
Unsupervised Learning: Models that find patterns in unlabeled data. This is useful for tasks like clustering similar data points or reducing the number of features in a dataset.
Reinforcement Learning: Agents that learn to make decisions by trying things and getting rewards or penalties. This is how AI learns to play games and control robots. It’s a powerful framework for training an agent to interact with an environment and achieve a specific goal.
Congratulations! You have successfully completed your first comprehensive introduction to deep learning. You’ve gone from understanding the fundamental building block of a neuron to building and training a neural network. You’ve also learned about specialized architectures like CNNs for images and RNNs for sequences, as well as the cutting-edge Transformer model. The concepts of transfer learning, unsupervised learning, and reinforcement learning will guide your future studies.
This is a significant milestone. It’s a testament to your hard work and a powerful demonstration of what you can accomplish. The skills you’ve acquired in this book are the bedrock upon which you can build anything.
Practice with Real Data: The best way to learn is by doing. Find a dataset that interests you on a site like Kaggle and try to build a model.
Explore a Sub-field: Dive into computer vision with CNNs or natural language processing with Transformers.
Stay Curious and Keep Building: Don’t be afraid to try new things and make mistakes. Remember that learning is a personal process, and it’s perfectly fine to go at your own pace. Happy deep learning!