Edited By Siddhartha Reddy Jonnalagadda, PhD
Written By Hundreds of Parents
Welcome to the next step of your journey. While the last book taught you how to speak the language of code, this one will teach you how to think in the language of AI. Linear algebra isn’t just a bunch of numbers and equations; it’s the foundation of how computers "see," "understand," and "learn". It’s a way of looking at the world in terms of vectors, matrices, and transformations.
Just like our previous guide, this book is designed for clarity and patience. We’ll use visual analogies and practical, hands-on examples to build your understanding piece by piece. There’s no rush to get it all at once; you can revisit any concept as many times as you need. The most important thing is to build a solid foundation so you can feel comfortable and confident as you explore the world of AI.
Let’s begin.
This chapter introduces the fundamental building block of linear algebra: the vector. A vector is a list of numbers arranged in a column. You can think of it as a set of instructions or a recipe. For example, a vector could be a travel route ("go 3 miles east, then 4 miles north") or a recipe for a specific color ("255 parts red, 128 parts green, 0 parts blue"). We’ll represent vectors using a powerful Python library called NumPy, which makes working with numbers incredibly fast and easy.
Setting Up Your Toolkit
To follow along, you’ll need to use Google Colab again. The first step is to import the NumPy library. It’s common practice to give it the nickname np so you don’t have to type out numpy every time.
import numpy as np
Creating Vectors
In Python, a vector is simply a NumPy array. You can create one by passing a Python list into the np.array() function.
Let’s create two vectors, v and w.
# A vector for a journey: 3 steps forward, 4 steps right
v = np.array([3, 4])
# A vector for a color recipe: 255 Red, 128 Green, 0 Blue
w = np.array([255, 128, 0])
print(f"Vector v: {v}")
print(f"Vector w: {w}")
You will see that the output looks similar to a Python list, but it’s a special kind of list that gives us extra powers for math.
Vector Operations: Adding and Scaling
Vector operations are like combining two sets of instructions. When we add two vectors, we add their corresponding elements. It’s like combining two separate journeys into a single, grand journey.
Let’s create two new two-dimensional vectors, u and v, and add them together.
u = np.array([1, -2])
v = np.array([2, 5])
sum_vector = u + v
print(f"Vector u: {u}")
print(f"Vector v: {v}")
print(f"The sum u + v is: {sum_vector}")
As you can see, NumPy knows to add 1 and 2 together to get 3, and -2 and 5 together to get 3, resulting in a new vector, [3, 3].
Scalar multiplication is a way to scale a vector, changing its length without changing the line it points along (a positive scalar keeps its direction; a negative one would flip it). A scalar is just a single number.
Let’s multiply our vector v by a scalar, 2.
v = np.array([3, 4])
scalar = 2
scaled_vector = v * scalar
print(f"Original vector: {v}")
print(f"Scaled vector: {scaled_vector}")
You’ll see that each number in the vector is multiplied by the scalar. Our original vector [3, 4] becomes [6, 8], which is a new vector in the same direction but twice as long.
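As an optional aside, NumPy can also measure a vector’s length with the np.linalg.norm() function. It isn’t needed for the rest of the chapter, but it confirms that the scaled vector really is twice as long.
# The length of [3, 4] is 5, and the length of [6, 8] is 10
print(f"Length of v: {np.linalg.norm(v)}")
print(f"Length of scaled vector: {np.linalg.norm(scaled_vector)}")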
The Dot Product: Measuring Similarity
The dot product is a special operation that takes two vectors and returns a single number: you multiply the matching elements and add up the results. That number is a powerful way to measure how much the two vectors point in the same direction. A large positive dot product means the vectors point in similar directions, a value near zero means they are nearly perpendicular, and a negative value means they point in roughly opposite directions. This is a core concept in machine learning, where we often compare vectors to find similarities.
NumPy gives us a simple way to calculate the dot product using the np.dot() function or the @ symbol, which is often used as a shorthand.
# Let's define two book vectors based on categories (e.g., [Sci-Fi, Fantasy, Mystery])
book_a = np.array([10, 2, 1])
book_b = np.array([9, 3, 2])
book_c = np.array([1, 1, 9])
# Calculate the dot product to see which is most similar
dot_product_ab = np.dot(book_a, book_b)
dot_product_ac = np.dot(book_a, book_c)
print(f"Dot product of book A and B: {dot_product_ab}")
print(f"Dot product of book A and C: {dot_product_ac}")
Notice that the dot product of book_a and book_b is much higher because they are both mostly "Sci-Fi" books. book_c is a "Mystery" book, so its dot product with book_a is low. This simple operation helps an AI find the book most similar to a user’s preference.
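If you want to see what np.dot() is doing behind the scenes, you can reproduce the "multiply and add" recipe yourself with NumPy’s element-wise multiplication and np.sum(). This is just an illustrative check, using the same book vectors as above.
# Multiply matching elements, then add them up: 10*9 + 2*3 + 1*2 = 98
manual_dot_ab = np.sum(book_a * book_b)
print(f"Manual dot product of book A and B: {manual_dot_ab}")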
Understanding how to use vectors and perform these basic operations is the first major step in understanding how AI works under the hood. In the next chapter, we’ll expand this concept by organizing these vectors into grids called matrices.
This chapter expands on the idea of vectors by introducing the matrix. A matrix is a rectangular grid of numbers, like a spreadsheet. You can also think of a digital image as a matrix, where each number represents the intensity of a pixel. Matrices are collections of vectors, and they are the language of data and transformations in AI.
Creating Matrices
In Python with NumPy, a matrix is simply a list of lists that you convert into a NumPy array. Each inner list represents a row in the matrix.
Let’s create a matrix A that represents a dataset of student test scores, where each row is a different student and each column is a different subject (e.g., Math, Science, History).
# A matrix representing test scores for three students in three subjects
A = np.array([
[90, 85, 92], # Student 1 scores
[78, 91, 88], # Student 2 scores
[95, 80, 75] # Student 3 scores
])
print(f"Matrix A:\n{A}")
print(f"Shape of Matrix A: {A.shape}")
You’ll notice the output is a clean grid. The A.shape tells us the number of rows and columns, in this case, (3, 3).
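As a quick aside, you can also pull out a single row or column of a matrix with indexing. This isn’t needed for the rest of the chapter, but it shows how a matrix really is a collection of vectors: each row and each column is itself a vector.
# Student 1's scores (the first row of the matrix)
print(f"Student 1's scores: {A[0]}")
# Every student's Science score (the second column of the matrix)
print(f"Science scores: {A[:, 1]}")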
Matrix Multiplication: Applying Transformations
Matrix multiplication is one of the most important concepts in linear algebra. We’ll present it not as a confusing formula, but as a way to apply a transformation. For example, a matrix can be a set of instructions to "rotate an image" or "resize a picture". In AI, it’s how we apply weights to data to make a prediction.
When you multiply a matrix by a vector, the matrix takes the vector and moves it, rotates it, or scales it in a new direction. Think of the matrix as a machine that transforms things.
Let’s use a simple example. We’ll create a scaling matrix A and a vector x. When we multiply them, the matrix will transform the vector.
# A scaling matrix that shrinks a vector by half
A = np.array([
[0.5, 0],
[0, 0.5]
])
# A vector representing a point in 2D space
x = np.array([10, 8])
# Matrix-vector multiplication using the @ operator
transformed_x = A @ x
print(f"Original vector x: {x}")
print(f"The transformed vector is: {transformed_x}")
The result is [5. 4.]. The scaling matrix has shrunk our original vector by half, just as we intended.
A Practical Example: Image Transformation
Let’s use a more visual example. Imagine a simple 2x2 image, represented by a matrix of pixel values. We can use matrix multiplication to apply a transformation to this image.
# A simple 2x2 image represented by a matrix
image = np.array([
[255, 0], # Top row: a white pixel and a black pixel
[0, 255] # Bottom row: a black pixel and a white pixel
])
# A "flip" matrix that swaps the two columns of whatever it multiplies
flip_matrix = np.array([
[0, 1],
[1, 0]
])
# Matrix-matrix multiplication uses the same @ operator as before
# (np.dot() would give exactly the same result)
flipped_image = image @ flip_matrix
print(f"Original image:\n{image}")
print(f"\nFlipped image:\n{flipped_image}")
You’ll see that the columns of flipped_image have been swapped, mirroring the "image" from left to right. True image rotations work in the same spirit: in computer graphics, a rotation matrix is multiplied with the coordinates of each pixel to move it to its new position. Either way, matrix multiplication is the tool that manipulates the data, which is exactly how it is used in computer graphics and many other AI applications.
Understanding matrices and how they transform vectors is a crucial step. In the next chapter, we’ll connect this to solving systems of equations, which is the heart of how an AI finds the right solution to a problem.
Chapter 3 connects the abstract ideas of linear algebra to real-world problems. We’ll start with systems of equations, which you can think of as a puzzle with a few known facts and a few unknown values. We’ll show how this puzzle can be represented neatly as a matrix equation.
The central idea is solving the puzzle: finding the solution vector that satisfies all the equations. We’ll provide a simple, intuitive process for this without getting lost in complex algorithms.
Finally, we’ll make a clear AI connection. We’ll explain that many AI problems, such as training a simple neural network, are fundamentally about solving massive systems of equations to find the right weights and biases. It’s the core of how a computer "learns" from data.
A system of linear equations is a set of two or more equations that share the same variables. It’s like having a list of clues to solve for a couple of secret numbers. For example, a system could be:
Equation 1: 2x + 1y = 5
Equation 2: 3x - 2y = 4
In linear algebra, we can simplify this by moving all the numbers into a matrix and all the variables into a vector.
Representing Systems with Matrices
We can represent the numbers on the left side of the equations as a matrix A, and the unknown variables (x and y) as a vector x_vec. The results on the right side of the equations become a vector b.
import numpy as np
# The coefficients of x and y form our matrix A
A = np.array([
[2, 1],
[3, -2]
])
# The unknown variables form our vector x
# Note: We don't know these values yet!
# We just represent them as a vector here for the concept.
# x_vec = np.array([x, y])
# The results form our vector b
b = np.array([5, 4])
Now, our entire system of equations can be written as one simple matrix equation: A * x_vec = b.
Solving the Puzzle with the Inverse Matrix
The central idea is solving the puzzle: finding the vector x_vec that makes the equation true. To do this, we need to "undo" the multiplication by matrix A. In regular math, we would divide by A, but with matrices, we use something called the inverse matrix, written as $A^{-1}$.
The inverse matrix is like an "undo" button. When you multiply a matrix by its inverse, you get the identity matrix—a special matrix that, when multiplied, leaves a vector unchanged.
# We can find the inverse of a matrix using NumPy's linalg.inv() function
A_inverse = np.linalg.inv(A)
print(f"The inverse of A is:\n{A_inverse}")
Now, we can find our unknown vector x_vec by multiplying the inverse of A by our vector b: x_vec = $A^{-1}$ * b.
A_inverse = np.linalg.inv(A)
b = np.array([5, 4])
# Solve for the unknown vector x_vec
solution_vec = A_inverse @ b
print(f"The solution for x and y is: {solution_vec}")
You’ll get a result of [2. 1.], which means x=2 and y=1 are the solutions to our original system of equations.
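One practical note: NumPy also provides np.linalg.solve(), which solves A @ x_vec = b directly without explicitly computing the inverse. In real code this is the more common and more numerically stable choice; here is a minimal sketch with the same A and b.
# Solve the system directly, no explicit inverse needed
solution_direct = np.linalg.solve(A, b)
print(f"Solution from np.linalg.solve: {solution_direct}")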
An AI Connection
This might seem like a simple math trick, but it’s the core of how many AI models, like linear regression, work. In those models, the matrix A represents your data, the vector b represents the correct answers, and the vector x_vec represents the model’s weights or parameters. The AI’s job is to solve the system of equations to find the perfect set of weights (x_vec) that best fits the data. It’s how the computer "learns" from examples to make predictions.
Understanding how to solve systems of linear equations is a key skill for a variety of scientific and engineering applications. In the next chapter, we’ll explore some special matrices that have unique properties and are especially useful in AI.
Chapter 4 will highlight specific types of matrices that are crucial in AI.
We’ll start with the identity matrix, which is the "do nothing" matrix. We’ll use a visual analogy to show that multiplying a vector by an identity matrix leaves it completely unchanged.
Next, we’ll introduce the inverse matrix, which is the "undo" matrix. We’ll explain that if one matrix transforms a vector in a certain way, its inverse matrix will transform it back to its original state. This is vital for solving systems of equations and is a core part of many AI algorithms.
Finally, we’ll discuss symmetric matrices, where the values are mirrored across the diagonal. We’ll introduce this concept as a way to represent relationships where the connection is the same in both directions (e.g., the distance between two cities). As you’ll see in later chapters, symmetric matrices have special properties that make them very useful in data analysis.
In the last chapter, we saw how the inverse matrix is like a secret key that unlocks the solution to a system of equations. In linear algebra and AI, there are several other special kinds of matrices that are just as important because of their unique properties. This chapter will introduce three of the most crucial ones.
The Identity Matrix: The "Do Nothing" Matrix
Think of the identity matrix as the number 1 for matrices. Just like multiplying any number by 1 doesn’t change it, multiplying a matrix by the identity matrix doesn’t change it either. It’s a square matrix (the same number of rows and columns) that has 1s along its main diagonal and 0s everywhere else.
The identity matrix is often referred to as I. In NumPy, we can easily create an identity matrix of any size using the np.identity() function.
Let’s see what happens when we multiply a vector by the identity matrix.
import numpy as np
# A 2x2 identity matrix
I_2x2 = np.identity(2)
# A simple vector
v = np.array([5, 3])
# Multiply the vector by the identity matrix
transformed_v = I_2x2 @ v
print(f"The identity matrix:\n{I_2x2}")
print(f"\nOriginal vector: {v}")
print(f"Vector after multiplication: {transformed_v}")
As you can see, the transformed_v is exactly the same as the original vector v. The identity matrix lives up to its name!
The Inverse Matrix: The "Undo" Matrix
We got a sneak peek at the inverse matrix in the last chapter, but let’s dive deeper. The inverse of a matrix A, written $A^{-1}$, is the matrix that, when multiplied by A, gives you the identity matrix. It’s the "undo" button for a transformation.
For example, if a matrix A scales your data by a factor of 2, its inverse $A^{-1}$ will scale it back by a factor of 1/2. Not all matrices have an inverse, but when they do, they are incredibly useful for solving problems.
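Here is a small sketch of that idea in NumPy, using a made-up doubling matrix just to make the "undo" picture concrete.
# A matrix that doubles every vector it multiplies
double = np.array([
[2., 0.],
[0., 2.]
])
# Its inverse halves every vector instead
double_inverse = np.linalg.inv(double)
point = np.array([4., 6.])
print(f"Doubled: {double @ point}")
print(f"Doubled, then undone: {double_inverse @ (double @ point)}")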
Symmetric Matrices: The Mirrored Matrix
A symmetric matrix is a special kind of square matrix where the values are mirrored across the main diagonal. This means the value at row i, column j is the same as the value at row j, column i.
You can think of a symmetric matrix as representing a relationship that is the same in both directions. For example, if you have a matrix showing the distances between cities, the distance from City A to City B is the same as the distance from City B to City A.
Here is an example of a symmetric matrix in NumPy:
# A symmetric matrix
S = np.array([
[1, 2, 3],
[2, 4, 5],
[3, 5, 6]
])
print(f"Symmetric Matrix S:\n{S}")
If you look at this matrix, you’ll see that the value in the first row, second column (2) is the same as the value in the second row, first column (2). The same is true for the values 3 and 5. This symmetry has some powerful mathematical properties that we’ll explore in a later chapter on eigenvalues.
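A handy way to confirm symmetry in code, as an optional check: a matrix is symmetric exactly when it equals its own transpose, written S.T in NumPy.
# True only if every value is mirrored across the main diagonal
print(np.array_equal(S, S.T))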
Understanding these special matrices is like knowing the special pieces on a chess board. Each one has a unique role that can be leveraged to solve complex problems in AI and data analysis. In the next chapter, we’ll talk about one of the most abstract but important concepts: eigenvectors and eigenvalues.
This chapter will tackle one of the most abstract but important concepts in linear algebra: eigenvectors and eigenvalues. It may feel a bit like a puzzle, but with the right mental picture, it’s a powerful tool for understanding data.
The Core Idea: Special Vectors
Imagine you have a machine (a matrix) that stretches and rotates every vector you put into it. Most vectors will come out pointing in a completely new direction. However, there are a few very special vectors—the eigenvectors—that only change their length, not their direction.
An eigenvalue is the number that tells you how much the eigenvector was stretched or shrunk by the matrix. It’s the scaling factor.
Think of it like a spinning globe. When the globe spins, every point on the surface moves, except the points at the very top and bottom (the North and South poles), which stay exactly where they are. The axis that runs through the poles never changes direction as the globe spins, and that axis is a perfect analogy for an eigenvector.
What Do They Represent?
In a dataset, eigenvectors show the most important "directions" or "principal components" of the data. They reveal the fundamental patterns and structures that are hidden within a matrix. For example, if your data tracks the height and weight of a group of people, an eigenvector might show the overall trend of how height and weight are related. The corresponding eigenvalue would tell you how strong that trend is.
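If you’d like to see these numbers concretely, NumPy’s np.linalg.eig() function computes them for us. Here is a tiny sketch with a simple made-up matrix whose answer is easy to check by hand: it stretches the x-direction by 3 and the y-direction by 2, so those two numbers are exactly its eigenvalues.
# A matrix that stretches x by 3 and y by 2
A = np.array([
[3., 0.],
[0., 2.]
])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors (one per column):\n{eigenvectors}")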
A Practical Application: Principal Component Analysis (PCA)
One of the most common uses of eigenvectors and eigenvalues in AI is in a technique called Principal Component Analysis (PCA). PCA is used for dimensionality reduction, which is a fancy way of saying "making big, complex datasets simpler."
Imagine you have a dataset with 50 different measurements for each person (e.g., height, weight, age, blood pressure, etc.). It would be nearly impossible to visualize this in 50 dimensions. PCA uses the eigenvectors to find the most important "directions" in the data and projects the data onto a much smaller number of dimensions (like 2 or 3) while keeping as much of the original information as possible.
Let’s see a simple example of this. We won’t perform the full PCA calculation, as it is complex, but we’ll show you what the output would look like. We can represent a 3-dimensional vector and project it down to 2 dimensions.
import numpy as np
# A "tall" data vector with 3 dimensions
original_data = np.array([5, 8, 3])
# In a real-world scenario, you would calculate eigenvectors and eigenvalues
# to get a transformation matrix. For this example, we'll use a simplified
# matrix to show the effect.
projection_matrix = np.array([
[1, 0, 0],
[0, 1, 0]
])
# Project the 3D data into 2D space
projected_data = projection_matrix @ original_data
print(f"Original 3D data: {original_data}")
print(f"Projected 2D data: {projected_data}")
As you can see, the projected_data vector only has two numbers. It is a simpler, two-dimensional representation of our original data. Eigenvectors and eigenvalues are the tools that allow us to do this "smartly" so that we keep the most important information, making it easier for an AI to learn from the data.
Understanding these concepts is a major step toward understanding the inner workings of many powerful machine learning algorithms. In the next chapter, we’ll bring all of these concepts together to build a small but complete AI model.
It’s time to bring all the concepts we’ve learned together. In this chapter, we’ll build a small but complete application: a simple linear regression model. This project will require us to use vectors, matrices, and the principles of solving systems of equations to find the right answer to a problem.
The Goal: Predicting House Prices
Imagine you have a dataset of house sizes and their corresponding prices. A linear regression model’s job is to find the best-fit line that can predict the price of a new house based on its size. The model "learns" from the data to find the optimal relationship between size and price.
Our model will be very simple. We’ll have:
Input Data: A vector of house sizes.
Output Data: A vector of house prices.
Model Parameters: A vector of unknown "weights" that the model needs to learn.
The goal is to find the weights that allow our model to make the most accurate predictions. This is fundamentally a system of equations, and we will solve it using the linear algebra concepts we have learned.
The Components
Let’s represent our data using NumPy.
X: The input data matrix. Each row is a house, and each column is a feature (in our simple case, just the size).
y: The output vector of correct house prices.
w: The weights vector that our model needs to find.
Our simple model will use the equation price = w0 + w1 * size. We can represent this as a matrix multiplication. To do this, we’ll add a column of 1s to our X matrix. This is a common trick in linear regression that allows us to find both the slope (w1) and the y-intercept (w0) using a single matrix equation.
import numpy as np
# A vector of house sizes in square feet
house_sizes = np.array([1000, 1500, 2000, 2500])
# A vector of corresponding prices in thousands of dollars
prices = np.array([200, 250, 300, 350])
# Add a column of 1s to the house_sizes vector to create our X matrix
X = np.column_stack([np.ones(len(house_sizes)), house_sizes])
print(f"Our input matrix X:\n{X}")
print(f"\nOur output vector y:\n{prices}")
Training the Model: Solving for the Weights
Finding the best weights is the "training" part of the model. We can solve for the weights vector w using a formula that is derived directly from linear algebra (often called the normal equation). It might look a little complicated, but NumPy handles all the hard work for us. The formula is: $w = (X^T X)^{-1} X^T y$, that is, the inverse of (X-transpose times X), multiplied by (X-transpose times y).
Let’s break this down into simple Python steps.
# Our input matrix X and output vector y
X = np.array([
[1., 1000.],
[1., 1500.],
[1., 2000.],
[1., 2500.]
])
y = np.array([200, 250, 300, 350])
# Step 1: Calculate the transpose of X. In NumPy, this is just X.T
X_T = X.T
# Step 2: Multiply the transpose of X by X
step_2 = X_T @ X
# Step 3: Find the inverse of the result from step 2
step_3 = np.linalg.inv(step_2)
# Step 4: Multiply the transpose of X by y
step_4 = X_T @ y
# Final Step: Multiply the result from step 3 and step 4 to get our weights vector w
w = step_3 @ step_4
print(f"The learned weights (w0, w1) are: {w}")
The output will give us a vector w with two values: the y-intercept (w0) and the slope (w1). The model has now learned the optimal relationship from the data.
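A practical note: real code usually avoids computing the inverse directly. NumPy’s np.linalg.lstsq() function solves the same least-squares problem in a single call and is more robust on messy data; here is a minimal sketch using the same X and y.
# Solve for the weights with NumPy's built-in least-squares solver
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(f"Weights from np.linalg.lstsq: {w_lstsq}")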
Making Predictions
Now that our model is "trained" and we have our weights, we can use them to make predictions for new house sizes.
Let’s predict the price of a house that is 1750 square feet.
# Our learned weights from the training step above (for our data they come out to approximately 100 and 0.1)
w0 = 100.
w1 = 0.1
# The new house size we want to predict
new_house_size = 1750
# Our prediction equation: price = w0 + w1 * size
predicted_price = w0 + w1 * new_house_size
print(f"The predicted price for a 1750 sq ft house is: ${predicted_price:.2f} thousand")
This is a small but complete example of how linear algebra is the engine that drives an AI. We used matrices to organize our data, matrix operations to perform the calculations, and the concept of an inverse matrix to solve for the model’s parameters.
Understanding these concepts is crucial for a variety of scientific and engineering applications. In the final chapter, we’ll reflect on your journey and provide guidance on how to continue your growth as a programmer and a data scientist.
Congratulations! You’ve completed your journey into linear algebra and its applications in AI. We’ll conclude by reflecting on the fact that linear algebra is a powerful mental model for understanding the world.
We’ll provide guidance on how to continue your growth as a programmer and a thoughtful learner, suggesting further topics like:
Deeper into Deep Learning: How linear algebra underpins the complex computations of neural networks.
Graphics and Computer Vision: How vectors and matrices are used to create and manipulate images, a field that relies heavily on the concepts you’ve learned.
Quantum Computing: A brief mention of how linear algebra is the language of this emerging field, a testament to the broad relevance of this subject.