Edited By Siddhartha Reddy Jonnalagadda, PhD
Written By Hundreds of Parents
Welcome back, thoughtful learner. This book is your guide to the world of algorithms and data structures, but with a unique approach. It’s designed for those who learn by seeing the patterns and logic behind the code, and who find comfort in clear, sequential steps. We’ll be using Python as our language for discovery.
Instead of focusing on raw memorization, we’ll build a deep, intuitive understanding of how and why algorithms work. We’ll use clear analogies, visuals, and thoughtful explanations to make complex topics approachable and meaningful. You’ll move from understanding what a single piece of code does to seeing how an entire program works together, like a complex machine with interconnected parts.
The skills you’ve developed are the bedrock for this next phase. You’ve learned that errors are not failures, but opportunities to learn and improve. You’ve also seen how to break down complex tasks into smaller, manageable pieces, a skill that is valuable in all aspects of life. Now, we’ll take that systematic thinking and apply it to some of the most powerful and elegant ideas in computer science.
This book is a journey of continuous learning. The skills you acquire here will be the bedrock upon which you can build anything, from complex data analysis tools to intricate video games. Let’s begin our journey and transform simple solutions into sophisticated, powerful ones.
1.1 The Essence of an Algorithm 🧠
At its heart, an algorithm is a clear, step-by-step set of instructions for solving a problem. Think of it as a blueprint or a recipe for a computer. Just as a recipe specifies ingredients and cooking steps, an algorithm defines the data it needs and the precise actions to take to get a result. They are the fundamental ideas behind all computer programs.
For example, a recipe for a cake is an algorithm. It takes ingredients (the input), gives you a list of actions (the steps), and produces a cake (the output). If you follow the steps exactly, you’ll always get the same result. Algorithms are the same way; they are predictable and repeatable.
1.2 Algorithms as a Tool 🛠️
Algorithms are a powerful technology. They allow us to solve problems efficiently, especially as the problems get bigger. We can choose from many different algorithms to solve the same problem, and they will have different performance.
For instance, imagine you need to find a specific word in a dictionary with thousands of pages.
Algorithm A: Start at the first page and read every single word until you find the one you’re looking for.
Algorithm B: Open the dictionary to the middle. If the word you’re looking for comes before that page, go to the middle of the first half. If it comes after, go to the middle of the second half. Repeat this process until you find your word.
Both algorithms solve the problem, but Algorithm B is much faster and more efficient. A well-designed algorithm can make the difference between a task that takes a few seconds and one that takes a very long time. Choosing the right algorithm for a task is a key skill for any programmer.
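Algorithm B is the idea behind binary search. Here is a minimal Python sketch of it, assuming the words are already stored in a sorted list:
def binary_search(sorted_words, target):
    """Returns the index of target in sorted_words, or -1 if it is absent."""
    low, high = 0, len(sorted_words) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_words[mid] == target:
            return mid
        elif sorted_words[mid] < target:
            low = mid + 1      # the word comes after the middle page
        else:
            high = mid - 1     # the word comes before the middle page
    return -1
Each comparison discards half of the remaining pages, which is why this approach scales so well.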
2.1 The Sorting Puzzle: Insertion Sort
We’ll start with a simple but powerful sorting algorithm. Insertion Sort is a method for arranging a list of items, like numbers or words, in a specific order. Think of it like arranging a hand of playing cards. You take one card at a time and insert it into its correct position in the cards you’ve already sorted.
The process works as follows: you assume the first item is a sorted list of one item. Then, you take the next item and compare it to the sorted list, moving items over until you find the right place for it. You repeat this for every item in the unsorted part of the list.
def insertion_sort(items):
    # We start with the second item and go all the way to the end
    for i in range(1, len(items)):
        current_item = items[i]
        j = i - 1
        # Move items in the sorted part to make space for the current item
        while j >= 0 and items[j] > current_item:
            items[j + 1] = items[j]
            j -= 1
        # Place the current item in the correct spot
        items[j + 1] = current_item
    return items
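A quick check of the routine on a small, made-up list:
# items = [5, 2, 4, 6, 1, 3]
# print(insertion_sort(items))  # [1, 2, 3, 4, 5, 6]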
2.2 Thinking About Performance: Analyzing Algorithms 📊
To compare how well different algorithms perform, we need a way to measure them. Instead of measuring in seconds, which can change depending on the computer, we’ll analyze an algorithm’s efficiency based on how it performs as the size of the input grows. This allows us to predict how an algorithm will behave with very large datasets, which is crucial for building real-world applications.
For example, for the insertion_sort function above, if we double the number of items in the list, the worst-case runtime roughly quadruples rather than merely doubling. This is because it has a loop inside another loop, which can be inefficient for large inputs. This concept of performance is what we’ll explore in detail in the next chapter.
2.3 Creating the Solution: Designing Algorithms ✍️
Designing an algorithm is a creative act. There isn’t a single right way to create an algorithm. We will explore different strategies for crafting solutions, from breaking a problem into smaller parts to building upon simple, existing solutions. For example, the insertion_sort algorithm is an example of a simple strategy: take a single step and repeat it until the whole problem is solved.
The process often involves:
Understanding the problem: What is the goal? What is the input, and what should the output be?
Thinking of a plan: What are the steps to get from the input to the output?
Writing the code: Translating the plan into a computer program.
Testing and refining: Checking if the code works and finding ways to make it better or more efficient.
3.1 The Language of Growth: Asymptotic Notation
When we analyze algorithms, we don’t care about the exact number of seconds or microseconds it takes to run. Instead, we want to understand how the runtime scales as the size of the input gets very, very large. This is what we call asymptotic notation.
Think of it this way: if you had to drive a car for 10 minutes and then walk for 5 minutes, the driving time is the most important part of your journey’s speed. If you drove for 10 hours and walked for 5 minutes, the walking time is almost meaningless in the grand scheme of things. Asymptotic notation lets us focus on the "driving time" of our algorithms—the part of the code that dominates the runtime as the input grows.
The most common notation is Big O notation. It describes the upper bound of an algorithm’s growth. It tells us the longest amount of time an algorithm will take for a given input size. It’s like saying, "This car will take at most this long to get there."
3.2 Key Symbols and Common Functions 🔑
Let’s look at the most common "growth rates" for algorithms and what they mean in plain language. We use a variable, n, to represent the size of the input.
O(1) - Constant Time: The algorithm takes the same amount of time no matter how big the input is. It’s like checking the first item on a list; the time it takes doesn’t depend on how long the list is.
O(log n) - Logarithmic Time: The algorithm’s runtime grows very slowly as the input size grows. This is incredibly efficient. A binary search on a sorted list is a perfect example: each step cuts the problem in half, so doubling the list size only adds one more step.
O(n) - Linear Time: The runtime grows in direct proportion to the input size. If the input doubles, the runtime doubles. A linear search on a list is an example; you have to look at every item to find what you’re looking for.
O(n log n) - Linearithmic Time: This is a very common and efficient runtime for many sorting algorithms. It’s slightly worse than linear time but much better than quadratic.
O(n²) - Quadratic Time: The runtime grows with the square of the input size. If the input doubles, the runtime quadruples. The insertion_sort algorithm we saw in the last chapter is an example of this. This is acceptable for small inputs but becomes very slow for large ones.
Understanding these different categories of growth allows us to choose an algorithm that is not just correct, but also a good fit for the scale of the problem we are trying to solve.
The Divide and Conquer approach is a powerful way to design algorithms. It’s a strategy that breaks a large, difficult problem into smaller, identical subproblems that are easier to solve. Once you have the solutions to these smaller problems, you combine them to get the solution for the original, large problem.
Think of it like a team of people trying to find a missing item in a vast, cluttered warehouse. Instead of one person searching the entire space, the team divides the warehouse into smaller sections. Each person searches their own section, and when someone finds the item, they announce it. The larger problem of finding the item in the entire warehouse is solved by conquering the smaller subproblems of searching individual sections.
The process has three main steps:
Divide: Break the problem into several smaller subproblems that are similar to the original problem but smaller in size.
Conquer: Solve the subproblems recursively. If a subproblem is small enough, solve it directly.
Combine: Merge the solutions to the subproblems to get the solution to the original problem.
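Merge Sort, which we will meet again when we analyze recurrences, is a textbook example of these three steps. A minimal sketch:
def merge_sort(items):
    """Sorts a list using the divide and conquer strategy."""
    if len(items) <= 1:
        return items                   # a list of one item is already sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])     # conquer the left half
    right = merge_sort(items[mid:])    # conquer the right half
    # Combine: merge the two sorted halves into one sorted list
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged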
4.1 Finding the Treasure: The Maximum-Subarray Problem 💰
We will solve a classic problem: finding the contiguous subarray within a one-dimensional array of numbers that has the largest sum. This problem is surprisingly useful, for example, in financial analysis to find periods where a stock price had the greatest increase.
A naive approach would be to check every possible subarray, which would be very slow. Instead, we’ll use a divide and conquer approach:
Divide: Split the array into two halves, a left half and a right half.
Conquer: Find the maximum subarray sum in the left half, the right half, and a third possibility: a subarray that crosses the midpoint.
Combine: The maximum subarray of the original array must be one of these three. We simply return the largest of the three sums.
To find the maximum crossing subarray, we can start at the midpoint and expand outwards in both directions (left and right) to find the largest sum. By using this method, we avoid checking every single subarray.
def find_max_crossing_subarray(A, low, mid, high):
    """Finds the maximum subarray that crosses the midpoint."""
    # Find the max sum on the left side
    left_sum = float('-inf')
    current_sum = 0
    max_left = mid
    for i in range(mid, low - 1, -1):
        current_sum += A[i]
        if current_sum > left_sum:
            left_sum = current_sum
            max_left = i
    # Find the max sum on the right side
    right_sum = float('-inf')
    current_sum = 0
    max_right = mid + 1
    for j in range(mid + 1, high + 1):
        current_sum += A[j]
        if current_sum > right_sum:
            right_sum = current_sum
            max_right = j
    return (max_left, max_right, left_sum + right_sum)

def find_maximum_subarray(A, low, high):
    """Finds the maximum subarray of a given array using divide and conquer."""
    # Base case: if the array has only one element
    if high == low:
        return (low, high, A[low])
    else:
        mid = (low + high) // 2
        # Recursively find the max subarray in the left and right halves
        left_low, left_high, left_sum = find_maximum_subarray(A, low, mid)
        right_low, right_high, right_sum = find_maximum_subarray(A, mid + 1, high)
        # Find the max subarray that crosses the midpoint
        cross_low, cross_high, cross_sum = find_max_crossing_subarray(A, low, mid, high)
        # Return the subarray with the largest sum
        if left_sum >= right_sum and left_sum >= cross_sum:
            return (left_low, left_high, left_sum)
        elif right_sum >= left_sum and right_sum >= cross_sum:
            return (right_low, right_high, right_sum)
        else:
            return (cross_low, cross_high, cross_sum)
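A small illustration on a made-up list of daily price changes:
# A = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
# print(find_maximum_subarray(A, 0, len(A) - 1))
# Expected output: (3, 6, 6) -- the items 4, -1, 2, 1 sum to 6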
4.2 A Clever Shortcut: Strassen’s Algorithm for Matrix Multiplication ✖️
Multiplying two matrices is a fundamental operation in many fields, from computer graphics to scientific computing. The traditional method for multiplying two n x n matrices is straightforward but can be slow as n gets large.
Strassen’s Algorithm is a more efficient method that uses the divide and conquer approach. Instead of the standard method, which requires 8 recursive multiplications for each subproblem, Strassen’s algorithm cleverly uses only 7.
This might not seem like a big difference, but as the size of the matrices grows, this one saved multiplication at each step of the recursion results in a significant speed-up. While it’s more complex to implement, it demonstrates the power of a well-designed algorithm to beat a simple, straightforward one.
Analogy: Imagine a factory that has to produce a large number of products. The standard method would be to build a separate assembly line for each product. Strassen’s algorithm is like figuring out a way to combine two assembly lines into one, which is more complex to set up but produces products much faster in the long run.
The implementation of Strassen’s algorithm is quite detailed, but here is the general idea:
Divide: The two n x n matrices are each divided into four n/2 x n/2 submatrices.
Conquer: Instead of the eight recursive multiplications of the standard method, we define seven new matrices, M1 through M7, each of which is the result of a single multiplication of two submatrices.
Combine: We then use these seven matrices to construct the four submatrices of the final product matrix, using addition and subtraction.
This is a beautiful example of how thinking about the problem in a different way can lead to a more efficient solution.
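For the curious, here is a compact sketch of those three steps, assuming the matrices are square with a side length that is a power of two and using NumPy arrays for the submatrix arithmetic; a production version would also need padding and a cutoff to the standard method for small sizes:
import numpy as np

def strassen_multiply(A, B):
    """Multiplies two square matrices whose side length is a power of two."""
    n = A.shape[0]
    if n == 1:
        return A * B
    half = n // 2
    # Divide: split each matrix into four half-sized submatrices
    A11, A12, A21, A22 = A[:half, :half], A[:half, half:], A[half:, :half], A[half:, half:]
    B11, B12, B21, B22 = B[:half, :half], B[:half, half:], B[half:, :half], B[half:, half:]
    # Conquer: seven recursive multiplications instead of eight
    M1 = strassen_multiply(A11 + A22, B11 + B22)
    M2 = strassen_multiply(A21 + A22, B11)
    M3 = strassen_multiply(A11, B12 - B22)
    M4 = strassen_multiply(A22, B21 - B11)
    M5 = strassen_multiply(A11 + A12, B22)
    M6 = strassen_multiply(A21 - A11, B11 + B12)
    M7 = strassen_multiply(A12 - A22, B21 + B22)
    # Combine: build the four quadrants of the product from M1..M7
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.vstack((np.hstack((C11, C12)), np.hstack((C21, C22))))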
4.3 Solving Recurrences: The Substitution Method 🧩
When an algorithm solves a problem by breaking it into smaller parts and calling itself, its runtime can be described with a mathematical formula called a recurrence relation. These relations express the runtime of a problem of size n in terms of the runtime of smaller subproblems. For example, the runtime of an algorithm might be:
T(n) = 2 * T(n/2) + n
This means the problem is split into two subproblems of half the size, plus an amount of extra work that is proportional to n.
The substitution method for solving a recurrence is a two-step process:
Guess a solution: Based on your intuition, you "guess" a formula for the runtime.
Prove the guess is correct: You use a mathematical tool called induction to prove that your guess holds for all possible input sizes.
For example, for the recurrence T(n) = 2 * T(n/2) + n, you might guess that the solution is O(n * log(n)). You would then use induction to prove that this guess is correct. This method is powerful because it forces you to deeply understand why an algorithm grows at a certain rate.
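Here is what the induction step of that proof looks like, using logarithms base 2 and assuming the guess T(m) <= c * m * log(m) already holds for all smaller sizes m:
T(n) = 2 * T(n/2) + n
     <= 2 * c * (n/2) * log(n/2) + n
     = c * n * (log(n) - 1) + n
     = c * n * log(n) - c * n + n
     <= c * n * log(n)        (whenever c >= 1)
So the guess also holds for n, which completes the inductive argument.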
4.4 Visualizing Recursion: The Recursion-Tree Method 🌳
The recursion-tree method is a powerful visual tool for solving recurrence relations and gaining an intuitive understanding of how a recursive algorithm works. It’s a way to draw out the work done at each level of the recursion.
Imagine a tree where each node represents the work done at a particular function call. The root of the tree is the original problem of size n, and its children are the subproblems it creates. The cost of each level of the tree is the sum of the work done by all the nodes at that level.
By summing up the costs of all the levels of the tree, you can find the total cost of the algorithm. This method makes it easier to see and sum up the work done at each level of the recursion.
For the recurrence T(n) = 2 * T(n/2) + n, the tree would look like this:
Level 0: The root does n work.
Level 1: Two subproblems of size n/2 are created. Each does n/2 work, for a total of 2 * (n/2) = n work.
Level 2: Four subproblems of size n/4 are created, with a total of 4 * (n/4) = n work.
…and so on, until you reach the leaves of the tree.
Since the height of this tree is log(n) (because you keep dividing the problem by 2 until it’s of size 1), and each level does n work, the total work is roughly n * log(n), which corresponds to our guess of O(n * log(n)). This method provides a clear and visual way to understand where the runtime of a recursive algorithm comes from.
4.5 The Ultimate Tool: The Master Method 📐
While the recursion-tree method gives us a great visual understanding of how an algorithm’s runtime grows, it can be a lot of work. The Master Method is a powerful shortcut for solving a large class of recurrences that come from divide and conquer algorithms. It’s like a cheat sheet for a whole category of problems.
This method applies to recurrences of the form:
T(n) = a * T(n/b) + f(n)
Where:
T(n) is the runtime of the algorithm for an input of size n.
a is the number of subproblems created in the "divide" step.
n/b is the size of each subproblem.
f(n) is the work done to "divide" the problem and "combine" the solutions.
The Master Method compares the work done at the root of the recursion tree (f(n)) to the work done at the leaves (n to the power of log_b a). It has three cases:
Case 1: If the work at the leaves is much larger than the work at the root, the runtime is dominated by the leaves.
If f(n) grows "slower than" n to the power of log_b a, then the total runtime grows like n to the power of log_b a.
Case 2: If the work at the leaves and the root are roughly the same, the runtime is the work of one level multiplied by the number of levels in the tree.
If f(n) grows "at the same rate as" n to the power of log_b a, then the total runtime grows like n to the power of log_b a multiplied by log(n).
Case 3: If the work at the root is much larger than the work at the leaves, the runtime is dominated by the work at the root.
If f(n) grows "faster than" n to the power of log_b a, then the total runtime grows like f(n).
Example: For the recurrence of Merge Sort, T(n) = 2 * T(n/2) + n. Here, a=2, b=2, and f(n) = n. We calculate n to the power of log_b a, which is n to the power of log_2 2, which simplifies to n. Since f(n) is proportional to n, this fits Case 2. Therefore, the runtime is T(n) = O(n * log(n)). This confirms our analysis from the recursion-tree method.
Sometimes, the best way to solve a problem isn’t to be perfectly logical at every step, but to introduce a bit of controlled randomness. Randomized algorithms use random choices to improve their performance or make their behavior more predictable. They can often be simpler and more efficient than their deterministic counterparts.
5.1 The Hiring Problem: Thinking About Probability 🤝
Let’s imagine you’re a manager looking to hire the best person for a job. You interview one candidate at a time and, after each interview, you must decide whether to hire them on the spot. If you hire someone new, you must pay a fee to fire your current assistant. Your goal is to hire the single best candidate and minimize the number of times you have to fire someone. You have no idea what the quality of the candidates will be.
If you interview n candidates in a random order, the number of times you hire a new candidate is not fixed. It could be as low as one (if the first candidate is the best) or as high as n (if the candidates arrive in increasing order of quality). This problem shows us how randomness can influence a process and how we can use probability to analyze it.
5.2 The Power of Random: Indicator Random Variables 💡
To help us analyze the hiring problem and other randomized algorithms, we’ll use a clever mathematical tool called indicator random variables. It’s a simple way to count events. An indicator random variable for an event A is a variable that is 1 if the event happens and 0 if it doesn’t.
Let’s use our hiring example. Let X be the total number of times we hire a new person. We can define n indicator random variables, X_1, X_2, ..., X_n, where X_i is 1 if we hire the i-th candidate and 0 otherwise.
Then, the total number of hires is the sum of all these variables: X = X_1 + X_2 + ... + X_n. This simple trick allows us to analyze the total number of hires by looking at the probability of each individual hire.
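A small simulation makes this concrete. The sketch below (the quality scores and trial counts are just illustrative) shuffles n candidates, hires whenever a candidate beats everyone seen so far, and averages the number of hires over many trials; the average comes out close to the harmonic number 1 + 1/2 + ... + 1/n, which is roughly the natural logarithm of n:
import random

def count_hires(n):
    """Simulates the hiring problem with n candidates in random order."""
    candidates = list(range(1, n + 1))   # quality scores 1..n
    random.shuffle(candidates)
    best_so_far = 0
    hires = 0
    for quality in candidates:
        if quality > best_so_far:        # X_i = 1: candidate i is hired
            best_so_far = quality
            hires += 1
    return hires

trials = 10000
n = 100
average = sum(count_hires(n) for _ in range(trials)) / trials
print(average)   # roughly 5.19 for n = 100, close to the 100th harmonic number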
5.3 Unleashing Randomness: Randomized Algorithms 🎉
A randomized algorithm is an algorithm that makes a random choice at one or more points in its execution. We’ll look at two main types:
Las Vegas algorithms: These algorithms always produce the correct answer, but their runtime varies depending on the random choices they make. The randomized version of Quicksort we’ll see later is an example of this.
Monte Carlo algorithms: These algorithms have a chance of producing an incorrect answer, but their runtime is often guaranteed to be fast. We won’t focus on these as much, but it’s good to know they exist.
The beauty of a randomized algorithm is that its performance, in the long run, is not dependent on the specific order of the input. For an algorithm like Quicksort, a "bad" input can make it very slow. But if we randomize the input, a bad case is extremely unlikely, and we can rely on its average-case performance.
6.1 The Heap as a Structure 🗂️
A heap is a specialized data structure that organizes information into a kind of binary tree. Think of it like a family tree, but with a specific rule: every parent node is either always larger or always smaller than its children. This is called the heap property. A max-heap ensures the largest item is always at the top (the root), while a min-heap ensures the smallest item is at the top.
The beautiful part is that we don’t need a complex tree structure to represent a heap. We can use a simple list or array. The relationships are determined by an item’s position. For any item at index i, its parent is at index i/2 (rounded down), and its children are at indices 2i and 2i+1 (using a 1-based index). This makes a heap very efficient to work with.
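Because Python lists are 0-based, the code later in this chapter shifts that arithmetic by one. A small sketch of the index bookkeeping (the later code computes the same indices inline):
def parent(i):
    return (i - 1) // 2

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2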
6.2 Maintaining the Heap Property 🔧
After we add or remove an item, the heap property might be broken. We need a way to fix it. We use two main functions for this:
max_heapify: This function ensures that a subtree rooted at a given node is a max-heap. If a parent is smaller than one of its children, we swap them and then recursively call max_heapify on the new position. This process "sinks" a smaller item down the tree until it finds its correct place.
build_max_heap: This function takes an unsorted list and turns it into a max-heap. We do this by calling max_heapify on every parent node, starting from the bottom of the tree and working our way up to the root.
6.3 Building a Heap from Scratch 🏗️
The build_max_heap function is surprisingly efficient. We can take any list of items and turn it into a perfectly organized heap in linear time, or O(n). The reason it’s so fast is that most of the work happens near the bottom of the tree, where there are many small subtrees that are quick to fix. The few large subtrees at the top don’t take much time relative to the whole process.
def max_heapify(A, heap_size, i):
    # A is the list, heap_size is how many items at the front of A belong
    # to the heap, and i is the index of the node to heapify
    l = 2 * i + 1  # left child index
    r = 2 * i + 2  # right child index
    largest = i
    # Check if the left child is larger than the parent
    if l < heap_size and A[l] > A[largest]:
        largest = l
    # Check if the right child is larger than the current largest
    if r < heap_size and A[r] > A[largest]:
        largest = r
    # If the largest is not the parent, swap and continue heapifying
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, heap_size, largest)

def build_max_heap(A):
    # Start from the last parent node and work backwards to the root
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(A, n, i)
6.4 The Heapsort Algorithm 📜
Now that we know how to build and maintain a heap, we can use it to create a powerful sorting algorithm called Heapsort. Heapsort is a sorting algorithm that is known for its efficiency and for not needing any extra space outside of the original list.
The process is simple and intuitive:
Build a Max-Heap: First, we transform our unsorted list into a max-heap using the build_max_heap function we learned about earlier. Now, the largest item is at the very top of our heap, at index 0.
Sort the List: We know the largest item is at the top. So, we swap it with the very last item in the list. This places the largest item in its final, sorted position.
Reduce the Heap: After the swap, the largest item is in the correct place, but the heap property is broken. We now have a heap of size n-1 to deal with. We use the max_heapify function on the root (index 0) to "sink" the new, smaller item down to its correct position, re-establishing the heap property for the remaining items.
Repeat: We repeat this process, swapping the new largest item with the next-to-last item, and so on, until the entire list is sorted.
def heapsort(A):
    """Sorts a list in place using the Heapsort algorithm."""
    n = len(A)
    # Step 1: Build a max-heap
    build_max_heap(A)
    # Steps 2 & 3: Repeat the sorting process
    for i in range(n - 1, 0, -1):
        # Swap the current largest item (at the root) with the last unsorted item
        A[0], A[i] = A[i], A[0]
        # Re-establish the heap property for the remaining unsorted part,
        # which now contains only i items
        max_heapify(A, i, 0)
    return A
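A quick check on a small, made-up list:
# numbers = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
# print(heapsort(numbers))  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]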
6.5 A Heap’s Secret Power: Priority Queues 👑
Heaps are also incredibly useful for a specific data structure called a priority queue. A priority queue is like a regular queue (a waiting line), but instead of serving the person who has been waiting the longest, it serves the person with the highest "priority" first. A max-heap is a perfect tool for implementing a priority queue because it always keeps the highest-priority item (the largest number, for example) at the top, ready to be "served".
A priority queue can be used for many tasks, such as:
Finding the item with the highest priority in a group.
Adding a new item to the group.
Updating an item’s priority.
With a heap, all of these operations can be done efficiently, which is why it’s a go-to data structure for scheduling and resource management problems.
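Python ships a binary min-heap in its standard heapq module; one common trick for a max-priority queue is to store negated priorities. A minimal sketch:
import heapq

pq = []
heapq.heappush(pq, (-3, "low-priority task"))
heapq.heappush(pq, (-10, "urgent task"))
heapq.heappush(pq, (-5, "normal task"))
priority, task = heapq.heappop(pq)  # the highest-priority item comes out first
print(-priority, task)              # 10 urgent task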
7.1 Description of Quicksort 📝
Quicksort is a powerful and widely-used sorting algorithm. It’s known for being very fast in practice, although it can be slow in rare cases. Like Merge Sort, it uses the divide and conquer strategy. The core idea is to break down a large sorting problem into smaller, more manageable subproblems.
Here’s how it works:
Choose a Pivot: First, you select one item from the list and call it the pivot. The choice of the pivot is very important for the algorithm’s performance.
Partition: You rearrange the rest of the list so that all items smaller than the pivot are on one side, and all items larger are on the other. The pivot is now in its final, sorted position.
Recurse: You then recursively apply the same process to the two sub-lists (the one with smaller items and the one with larger items). You continue this process until all sub-lists have a size of one, at which point the entire list is sorted.
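A minimal sketch of these three steps, using the Lomuto partitioning scheme and the last item of each sub-list as the pivot:
def partition(A, low, high):
    pivot = A[high]              # choose the last item as the pivot
    i = low - 1
    for j in range(low, high):
        if A[j] <= pivot:        # smaller items move to the left side
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[high] = A[high], A[i + 1]
    return i + 1                 # the pivot's final, sorted position

def quicksort(A, low=0, high=None):
    if high is None:
        high = len(A) - 1
    if low < high:
        p = partition(A, low, high)
        quicksort(A, low, p - 1)     # recurse on the smaller items
        quicksort(A, p + 1, high)    # recurse on the larger items
    return A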
7.2 Performance of Quicksort 📈
Quicksort’s performance depends heavily on the choice of the pivot.
Best Case: When the pivot divides the list into two roughly equal halves, the algorithm performs beautifully. The runtime in this case is O(n log n), which is very efficient.
Worst Case: If the pivot is always the smallest or largest item in the list, the partition produces a sub-list of size zero and another of size n-1. The recursion becomes completely lopsided, and the runtime degrades to O(n²). This happens with an already sorted list if you always choose the first item as the pivot.
7.3 A Fair Game: A Randomized Version of Quicksort 🎲
To avoid the worst-case scenario, we can use a randomized version of Quicksort. Instead of always choosing the first or last item as the pivot, we pick a pivot randomly from the list. This ensures that the worst-case input (an already sorted list) is no longer a problem. While a single random choice might still be bad, the probability of always making a bad choice in a row is extremely small. This makes Quicksort’s performance reliably fast on average.
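Building on the partition sketch from the previous section, the only change needed is to swap a randomly chosen item into the pivot position before partitioning:
import random

def randomized_partition(A, low, high):
    # Pick a random pivot, move it to the end, then partition as before
    r = random.randint(low, high)
    A[r], A[high] = A[high], A[r]
    return partition(A, low, high)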
7.4 Analysis of Quicksort 🧐
The average-case performance of randomized Quicksort is O(n log n). This is because, on average, the random pivot choices will lead to a balanced partitioning of the list. We can prove this using the concept of indicator random variables that we learned about in the previous chapter. Even though the analysis is complex, the end result is that Quicksort is one of the fastest sorting algorithms in practice.
Most of the sorting algorithms we’ve seen so far, like Heapsort and Quicksort, are known as comparison sorts. They work by comparing items to each other to figure out their correct order. We know from our analysis that comparison sorts can’t be faster than O(n log n) in the worst case. But what if we’re not just comparing items? What if we have extra information about the data itself?
In this chapter, we’ll explore some clever algorithms that sort in linear time, or O(n). These algorithms are faster because they avoid comparisons and instead use a different approach based on the specific properties of the data, such as the range of the numbers or the length of the strings.
8.1 The Limits of Sorting: Lower Bounds 🚧
There’s a fascinating theoretical idea called a lower bound for sorting. It states that any comparison sort, in the worst case, must perform at least a certain number of comparisons to sort a list. This is because each comparison can only tell us about the relative order of two items, and we need enough information to distinguish between all n! possible arrangements of the items. This minimum number of comparisons works out to be proportional to n log n, which is why we consider O(n log n) the best possible runtime for comparison-based sorting.
8.2 Counting Sort: The Count and Place Method 🔢
For certain types of data, we can use a clever trick called Counting Sort to sort in linear time. This algorithm is useful when the items to be sorted are integers within a small, known range.
How it works:
Count: We create a temporary array to count how many times each item appears in the original list.
Cumulative Sum: We then change the count array so that each position holds the total number of items that are less than or equal to its index. This new array tells us the correct sorted position for each item.
Place: We create a final, sorted output list. We go through the original list and use our cumulative count array to place each item in its correct, final position. As we place an item, we subtract one from its count to ensure we don’t place two items in the same spot.
def counting_sort(A, k):
    """Sorts a list A of integers in the range 0 to k."""
    n = len(A)
    # The list to store the sorted output
    B = [0] * n
    # The list to store the counts of each item
    C = [0] * (k + 1)
    # Step 1: Count the occurrences of each item
    for j in range(n):
        C[A[j]] += 1
    # Step 2: Calculate the cumulative sum
    for i in range(1, k + 1):
        C[i] += C[i - 1]
    # Step 3: Place items into the sorted list
    for j in range(n - 1, -1, -1):
        B[C[A[j]] - 1] = A[j]
        C[A[j]] -= 1
    return B
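A quick check, using a small list of integers between 0 and 5:
# A = [2, 5, 3, 0, 2, 3, 0, 3]
# print(counting_sort(A, 5))  # [0, 0, 2, 2, 3, 3, 3, 5]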
Counting Sort is a beautiful example of how knowing something about your data’s properties (in this case, the range of numbers) can help you design a much faster algorithm.
8.3 Radix Sort: The Digital Approach 🧮
Radix Sort is another linear-time sorting algorithm that doesn’t use comparisons. It’s a non-comparative integer sorting algorithm that works by processing numbers one digit at a time (the "radix" is the base of the number system, such as 10 for decimal digits), from the least significant digit to the most significant.
How it works:
Iterate by Digits: The algorithm sorts the list one digit at a time, starting from the rightmost digit (the ones place), then the tens place, and so on.
Stable Sort: For each digit, it uses a stable sorting algorithm, like Counting Sort, to rearrange the numbers. A stable sort is one that maintains the relative order of items with equal values. For example, if two numbers have the same digit, their original order is preserved.
Repeat: After the first pass, the list is sorted by the ones digit. After the second pass, it’s sorted by the first two digits, and so on. After the last pass (the most significant digit), the list is completely sorted.
Analogy: Think of sorting a large stack of index cards by name. You could first sort them all by the last letter of the last name, then by the second-to-last letter, and so on, until you get to the first letter. The stack would be perfectly sorted.
Radix Sort is incredibly fast for a specific type of data and shows how an understanding of the data’s structure can lead to a more efficient algorithm.
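A minimal sketch for non-negative integers, reusing the counting idea from the previous section as the stable per-digit pass:
def radix_sort(values):
    """Sorts a list of non-negative integers digit by digit."""
    if not values:
        return []
    result = list(values)
    exp = 1                      # 1s place, then 10s, then 100s, ...
    while max(result) // exp > 0:
        # Stable counting pass on the digit at position exp
        counts = [0] * 10
        for v in result:
            counts[(v // exp) % 10] += 1
        for d in range(1, 10):
            counts[d] += counts[d - 1]
        output = [0] * len(result)
        for v in reversed(result):
            d = (v // exp) % 10
            output[counts[d] - 1] = v
            counts[d] -= 1
        result = output
        exp *= 10
    return result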
8.4 Bucket Sort: The Distribution Method 🪣
Bucket Sort is another non-comparative sorting algorithm that works by distributing the elements of a list into a number of "buckets" or temporary containers.
How it works:
Distribute: You create a number of empty buckets. You then go through the original list and place each item into a bucket based on its value. For example, if you’re sorting numbers from 0 to 99, you could have 10 buckets, where the first bucket holds numbers 0-9, the second holds numbers 10-19, and so on.
Sort Buckets: You then sort each individual bucket. Since the items within each bucket are already in a smaller, manageable range, a simple sorting algorithm like Insertion Sort works well.
Combine: Finally, you concatenate the sorted buckets back together to get the fully sorted list.
Bucket Sort works best when the input data is uniformly distributed across the range. Its performance depends on how evenly the items are spread out. In the best case, it can be a linear O(n) time algorithm. It’s a great example of an algorithm that uses an initial distribution step to simplify the problem, making the final sorting a breeze.
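A minimal sketch, assuming the values are uniformly distributed in the range [0, 1):
def bucket_sort(values):
    """Sorts values assumed to lie in the range [0, 1)."""
    n = len(values)
    buckets = [[] for _ in range(n)]
    # Distribute each value into its bucket
    for v in values:
        buckets[int(n * v)].append(v)
    # Sort each bucket (the built-in sort here; insertion sort also works well)
    for bucket in buckets:
        bucket.sort()
    # Combine: concatenate the sorted buckets
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result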
9.1 The Simplest Cases: Minimum and Maximum 🕵️
Before we find the k-th smallest item in a list, let’s start with the simplest problem: finding the smallest and largest items. A naive approach might be to sort the entire list and then take the first and last items. However, sorting the whole list is overkill and inefficient if all you need are the minimum and maximum values.
A better approach is to simply iterate through the list once, keeping track of the smallest and largest items you’ve seen so far. You would start with the first item as both the minimum and maximum, and then for each subsequent item, you would compare it to your current minimum and maximum, updating them if the new item is smaller or larger. This is a simple linear O(n) algorithm. You can do even better by processing items in pairs, which brings the cost down to about three comparisons for every two items instead of four.
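A sketch of the pairwise trick, assuming a non-empty list; each pair costs one comparison against itself, plus one against the running minimum and one against the running maximum:
def min_and_max(items):
    """Finds the minimum and maximum of a non-empty list."""
    if len(items) % 2 == 0:
        low, high = (items[0], items[1]) if items[0] < items[1] else (items[1], items[0])
        start = 2
    else:
        low = high = items[0]
        start = 1
    for i in range(start, len(items) - 1, 2):
        a, b = items[i], items[i + 1]
        small, big = (a, b) if a < b else (b, a)   # one comparison per pair
        if small < low:                            # one more for the minimum
            low = small
        if big > high:                             # one more for the maximum
            high = big
    return low, high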
9.2 Selection in Expected Linear Time 🏃
The selection problem is a generalization of finding the minimum or maximum. It asks: what is the k-th smallest item in an unsorted list? For example, if k=1, you want the minimum. If k is the middle of the list, you’re looking for the median.
We can solve this problem in expected linear time O(n) using a randomized algorithm. The method is very similar to Quicksort’s partitioning process:
Choose a Pivot: Pick a random item from the list to be the pivot.
Partition: Partition the list around the pivot, so that all items smaller than the pivot are on the left and all items larger are on the right.
Recurse: After the partitioning, the pivot is in its final sorted position. You can now tell if the k-th smallest item is in the left sub-list, the right sub-list, or if the pivot itself is the answer. You then recursively search only the sub-list that contains the item you’re looking for, ignoring the other sub-list entirely.
Because we only need to recurse on one side, this algorithm is much faster on average than Quicksort, which has to recurse on both sides. On average, this method performs in O(n) time.
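A sketch of this idea, reusing the random-pivot partitioning idea from the Quicksort chapter; here i is 1-based, so i = 1 asks for the minimum:
import random

def randomized_select(A, low, high, i):
    """Returns the i-th smallest item (1-based) of A[low..high]."""
    if low == high:
        return A[low]
    # Partition around a randomly chosen pivot
    r = random.randint(low, high)
    A[r], A[high] = A[high], A[r]
    pivot = A[high]
    p = low - 1
    for j in range(low, high):
        if A[j] <= pivot:
            p += 1
            A[p], A[j] = A[j], A[p]
    A[p + 1], A[high] = A[high], A[p + 1]
    p += 1
    k = p - low + 1            # number of items up to and including the pivot
    if i == k:
        return A[p]
    elif i < k:
        return randomized_select(A, low, p - 1, i)
    else:
        return randomized_select(A, p + 1, high, i - k)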
9.3 Selection in Worst-Case Linear Time 🧗
The randomized selection algorithm is fast on average, but what if we get unlucky with our pivot choices? There is a more complex, deterministic algorithm that guarantees a worst-case linear time O(n) performance. This algorithm is known as the "median-of-medians" algorithm.
How it works:
Divide into Groups: The list is divided into groups of 5.
Find Medians: The median of each group is found.
Find the Pivot: The median of all these medians is found, and this value is used as the pivot.
Partition and Recurse: The list is partitioned around this pivot, and the algorithm recursively searches on one of the partitions.
This process ensures that the chosen pivot is always a "good" pivot, meaning it’s not too close to the smallest or largest value. By making this guarantee, the algorithm’s worst-case performance is linear, O(n). While it’s more complicated to implement, it provides a powerful guarantee that is useful in critical applications.
So far, you’ve mostly worked with simple lists to store data. In this chapter, we’ll begin exploring different ways to structure information to make operations like searching, adding, and removing items incredibly fast. These data structures are the containers you’ll use to organize your data and are fundamental to writing efficient code.
10.1 Orderly Lines: Stacks and Queues 🗂️
You’ve already been introduced to the basic concepts of stacks and queues, but let’s take a closer look at how they work and where they’re used.
A stack is a data structure where items are added and removed in a Last-In, First-Out (LIFO) order. Think of a stack of plates: you can only add a new plate to the top, and you can only take a plate from the top. The last plate you put on is the first one you take off. The two main operations are Push (adding an item) and Pop (removing an item). Stacks are often used to manage a program’s function calls or the "undo" feature in an application.
A queue is a data structure where items are added to one end and removed from the other in a First-In, First-Out (FIFO) order. Think of a waiting line at a store: the first person to get in line is the first one to be served. The two main operations are Enqueue (adding an item) and Dequeue (removing an item). Queues are used for managing tasks in a system, like print jobs or email sending.
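In Python, a plain list already behaves like a stack, and collections.deque gives an efficient queue:
from collections import deque

stack = []
stack.append("first plate")      # push
stack.append("second plate")
print(stack.pop())               # "second plate" -- last in, first out

queue = deque()
queue.append("first customer")   # enqueue
queue.append("second customer")
print(queue.popleft())           # "first customer" -- first in, first out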
10.2 The Connected Web: Linked Lists ⛓️
A linked list is a data structure where items are connected to each other like a chain. Each item in the list, called a node, contains two parts: the data itself and a reference (or pointer) to the next node in the sequence. This structure is very flexible for adding and removing items. Unlike an array or list, you don’t have to shift a lot of data around when you insert or delete something in the middle; you just need to change a few pointers.
10.3 The Underpinnings: Implementing Pointers and Objects 🧩
Python doesn’t have explicit "pointers" like some other languages, but we can simulate them using classes and object references. We’ll use this approach to create our own custom data structures, like a linked list or a tree.
For a linked list, we can create a Node class that holds the data and a reference to the next node.
class Node:
    def __init__(self, data=None):
        self.data = data
        self.next = None
This simple class gives us the basic building block to create a linked list by connecting nodes together. The self.next attribute acts as our pointer to the next item in the chain.
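For example, we can chain a few nodes together and walk the list by following each next reference (the values here are arbitrary):
head = Node("a")
head.next = Node("b")
head.next.next = Node("c")
node = head
while node is not None:
    print(node.data)   # prints a, b, c in order
    node = node.next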
10.4 The Family Tree: Representing Rooted Trees 🌳
A rooted tree is a hierarchical data structure where one item is the "parent" of others. Each node in the tree can have multiple children, but each child has only one parent. This structure is perfect for modeling things like a computer’s file system or a family tree.
Just like with a linked list, we can represent a rooted tree using classes. Each node would contain its data and a list of references to its children.
class TreeNode:
    def __init__(self, data=None):
        self.data = data
        self.children = []
11.1 The Simplest Way: Direct-Address Tables 📬
Imagine you have a small list of items, each with a unique key, like a number from 0 to 99. If you wanted to quickly find an item based on its key, you could use a direct-address table. This is an array where the index of each item in the array is its key.
For example, if you have a key with the value 57, you could go directly to index 57 in your array to find the item. The lookup time is constant, or O(1), no matter how many items are in your list. It’s the simplest and fastest way to find an item.
The problem with direct-address tables is that they require a huge amount of memory if the range of possible keys is large. If your keys are social security numbers, your array would need around a billion spaces, which is not practical. This is why we need a more flexible solution.
11.2 Hash Tables 🧪
A hash table is a clever, real-world version of a direct-address table. It allows you to store and retrieve items efficiently, even when the keys are large or non-numeric. A hash table uses a hash function to map a key to a specific, smaller index in an array.
For example, a hash function could take a person’s name as a key and convert it into a number between 0 and 99. That number is the index where the person’s information will be stored. When you want to find the person, you just run their name through the same hash function, get the index, and go directly to that spot in the array. This allows for very fast lookups.
The main challenge with hash tables is that two different keys can sometimes produce the same index. This is called a collision. We need to have a strategy for handling these collisions.
11.3 Hash Functions 🧠
A good hash function is the secret to a great hash table. A good hash function should be easy to compute and should distribute keys as evenly as possible across the available indices. If keys are clustered together, it will lead to more collisions and slow down the hash table. We’ll explore some common ways to design a hash function and the methods for dealing with collisions in the next section.
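One common starting point is the division method: reduce the key to a number and take its remainder when divided by the table size. A minimal sketch (the string handling here is just an illustration, not a production-quality hash):
def division_hash(key, table_size):
    """Maps a key to a slot index in the range 0 to table_size - 1."""
    if isinstance(key, str):
        # Turn a string key into a number first
        key = sum(ord(ch) for ch in key)
    return key % table_size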
11.4 Open Addressing: Finding an Empty Spot 🚶
When a hash function produces the same index for two different keys, a collision occurs. We need a way to handle this without overwriting existing data. Open addressing is a strategy for resolving collisions by "probing," or searching for, the next available empty spot in the hash table. Instead of putting a new item in the same location as another, you look for a new, open spot to place it.
The process of finding an empty spot is called probing. There are a few different ways to do it:
Linear Probing: When a collision occurs, you simply check the next index, and the next, and the next, until you find an open spot. This is simple to implement but can lead to a problem called "clustering," where a bunch of occupied spots form a long block.
Quadratic Probing: To avoid linear clustering, you can use quadratic probing. When a collision occurs at index h, you check indices h + 1^2, h + 2^2, h + 3^2, and so on. This spreads out the probes more evenly.
Double Hashing: This is a more sophisticated method that uses a second hash function to determine the size of the "steps" you take to find the next open spot. This is one of the best methods for avoiding clustering.
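A minimal sketch of a linear-probing table; for brevity it never resizes and assumes the table does not fill up:
class LinearProbingTable:
    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size   # each slot holds a (key, value) pair or None

    def _probe(self, key):
        index = hash(key) % self.size
        # Step forward one slot at a time until we find the key or an empty spot
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % self.size
        return index

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._probe(key)]
        return entry[1] if entry is not None else None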
12.1 What is a Binary Search Tree? 🌲
A binary search tree is a special kind of tree where the nodes are organized in a specific way to make searching very fast. It’s a logical, hierarchical way to store data. The key rule of a binary search tree is that for any given node, all the values in its left subtree are smaller than the node’s own value, and all the values in its right subtree are larger. This property is what makes it so useful.
For a tree to be a binary search tree, it must follow these rules:
The left subtree of a node contains only values less than the node’s value.
The right subtree of a node contains only values greater than the node’s value.
Both the left and right subtrees must also be binary search trees.
12.2 Finding What You Need: Querying a Binary Search Tree 🧭
Because of the rules we just learned, finding a value in a binary search tree is a lot like a binary search on a sorted list. The process is as follows:
Start at the root.
Compare: Check if the value you are looking for is equal to the current node’s value. If it is, you’ve found it!
Go Left or Right: If your target value is smaller, you move to the left child. If it’s larger, you move to the right child.
Repeat: You continue this process, moving down the tree, until you find the value or you reach an empty spot, which means the value isn’t in the tree.
You can also use this same logic to find the minimum and maximum values in the tree. The minimum value will always be the leftmost node, and the maximum will always be the rightmost node.
12.3 Growing and Shrinking a Tree: Insertion and Deletion 🌱
To add a new item to a binary search tree, you use the same logic as searching. You start at the root and travel down the tree until you find an empty spot where the new item should go, based on the less than and greater than rule.
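A minimal sketch of searching and inserting, using a node class in the spirit of the TreeNode from Chapter 10 (deletion, with its three cases, is left out for brevity; equal values go to the right here):
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_insert(root, value):
    """Inserts value and returns the root of the updated subtree."""
    if root is None:
        return BSTNode(value)        # found the empty spot for the new item
    if value < root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root

def bst_search(root, value):
    """Returns the node holding value, or None if it is not in the tree."""
    while root is not None and root.value != value:
        root = root.left if value < root.value else root.right
    return root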
Deleting an item is a bit more complex. You first find the item you want to delete. Then, you handle a few different cases:
If the node has no children, you simply remove it.
If the node has one child, you replace the node with its child.
If the node has two children, you find its successor (the smallest value in its right subtree), replace the node with the successor, and then delete the successor from its original position. This process keeps the tree’s order intact.
A red-black tree is a type of binary search tree that automatically keeps itself balanced to ensure operations remain fast, even in the worst case. While a simple binary search tree can become lopsided and slow down if items are inserted in a specific order, a red-black tree guarantees that the path from the root to any leaf is never more than twice as long as the shortest path. This is a powerful guarantee that ensures all operations, like searching, insertion, and deletion, maintain an efficient O(log n) runtime.
13.1 The Rules of Balance: Properties of Red-Black Trees 🚦
A red-black tree maintains its balance by following a set of simple, clever rules. Every node in the tree is colored either red or black, and the tree must satisfy the following properties:
Every node is either red or black.
The root is black.
Every leaf (which is a None or nil node) is black.
If a node is red, both of its children must be black.
For each node, all simple paths from the node to a descendant leaf contain the same number of black nodes.
The fourth rule is especially important; it means you can never have two red nodes in a row. The fifth rule, known as the black-height property, is the key to the tree’s balance. It guarantees that the longest possible path is not much longer than the shortest possible path, preventing the tree from becoming too lopsided.
13.2 The Magic of Rebalancing: Rotations 🔄
When you insert or delete a node in a red-black tree, you might break one of the properties. The tree fixes this with a local rearrangement called a rotation. A rotation is a simple, constant-time operation that restructures the tree to fix the balance issues without violating the binary search tree property. There are two types of rotations: left rotations and right rotations. Each one is the mirror image of the other.
The simplest way to think about a rotation is that it "pivots" the tree at a specific node, moving one subtree up and another down to change the height of different branches.
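A sketch of the left rotation on its own, leaving out the parent pointers and color bookkeeping that a full red-black tree implementation also has to update:
def left_rotate(x):
    """Rotates left around node x and returns the new root of this subtree."""
    y = x.right          # y must exist for a left rotation
    x.right = y.left     # y's left subtree becomes x's right subtree
    y.left = x           # x moves down to become y's left child
    return y             # y moves up to take x's old place
A right rotation is the mirror image: it moves the left child up instead.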
13.3 Adding Items: Insertion ➕
Inserting a new node into a red-black tree is a multi-step process.
Standard Binary Search Tree Insertion: You insert the new node just like you would in a regular binary search tree, following the less than and greater than rules. The new node is always colored red.
Fixing Violations: After the insertion, the new red node might violate the "no consecutive red nodes" rule (Property 4) if its parent is also red. The algorithm then enters a fix-up phase where it uses a series of rotations and recolorings to restore all the red-black properties. The fix-up process travels up the tree, making local adjustments until the entire tree is balanced again.
13.4 Deleting Items: Deletion ➖
Deleting a node from a red-black tree is more complex than insertion because it can cause multiple rule violations. The process is as follows:
Standard Binary Search Tree Deletion: You first find and remove the node using the standard binary search tree deletion process.
Fixing Violations: The removal of a node, especially a black node, can violate the black-height property. To fix this, the algorithm rebalances the tree using a series of rotations and recolorings. This process can be tricky because it involves a lot of different cases, but the end result is that the tree’s balance and properties are fully restored.
14.1 Beyond the Basics: Dynamic Order Statistics 📊
Sometimes, a standard data structure isn’t enough. We might need to ask more complex questions about the data it holds. For example, with a standard binary search tree, we can find an item in O(log n) time, but what if we need to find the i-th smallest item? We could traverse the tree and count, but that would be slow.
Dynamic order statistics are a way to extend a data structure to quickly answer questions about the rank or order of its elements. We can add extra information to each node in a binary search tree to make this possible. By storing the number of nodes in each subtree, we can find the i-th smallest item in O(log n) time, without having to sort the entire structure. This is a great example of augmenting a data structure to solve a new problem efficiently.
14.2 The Augmentation Blueprint
The process of enhancing a data structure follows a general blueprint.
Choose an underlying structure: You need to select a data structure that can be easily modified, like a red-black tree, which is a balanced and flexible base.
Add attributes: Decide what extra information each node needs to store. For dynamic order statistics, we add a size attribute to each node to keep track of the number of nodes in its subtree.
Modify operations: You must update the core operations of the data structure (like insertion, deletion, and rotation) to maintain this new information. When you insert a node, you need to update the size of all its ancestors.
Develop new operations: Finally, you create a new operation that uses the added information to solve your new problem. For example, a select(i) function would use the size attribute to find the i-th smallest item in logarithmic time.
This blueprint allows you to build a powerful, custom tool from a standard, flexible base.
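As a sketch of the new operation, here is a select function assuming every node carries left, right, and size attributes, where size counts the nodes in that node's subtree:
def select_ith(node, i):
    """Returns the node holding the i-th smallest key (i is 1-based)."""
    left_size = node.left.size if node.left is not None else 0
    rank = left_size + 1            # rank of this node within its own subtree
    if i == rank:
        return node
    elif i < rank:
        return select_ith(node.left, i)
    else:
        return select_ith(node.right, i - rank)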
14.3 Interval Trees ⏱️
An interval tree is a perfect example of an augmented data structure. It’s used to efficiently find all intervals that overlap with a given point or another interval. This is useful for things like scheduling, where you might want to find all meetings that overlap with a specific time.
An interval tree is a red-black tree that has been augmented to store intervals. In addition to the standard key, each node also stores the maximum endpoint of any interval in its subtree. When you search for overlapping intervals, you use this new information to quickly prune the search space and avoid checking intervals that could not possibly overlap. By cleverly augmenting a balanced tree, we can solve a complex geometric problem with high efficiency.
This section is about designing smart, powerful algorithms to solve complex problems. We’ll explore two general strategies: dynamic programming and greedy algorithms. These are not single algorithms, but rather a way of thinking about problems that can lead to incredibly efficient solutions.
Dynamic programming is an algorithmic technique for solving problems by breaking them down into smaller, simpler, overlapping subproblems. Instead of solving each subproblem over and over again, you solve each one just once and store the answer in a table. When you need the answer to a subproblem again, you can just look it up. This saves a lot of time and effort. It’s like building with LEGOs: once you build a single, common piece, you can reuse it many times instead of building it again from scratch.
15.1 The Smart Shortcut: Rod Cutting 📏
We’ll solve a classic problem to understand dynamic programming: how to cut a rod of a given length into smaller pieces to maximize profit. For each length of a rod, there is a different price. A 4-meter rod could be sold as one 4-meter piece, or two 2-meter pieces, or four 1-meter pieces, and so on. The number of combinations is huge.
A naive recursive approach would be to calculate the profit for every single combination, but that would be incredibly slow because it recomputes the solutions for the same subproblems many times. With dynamic programming, we use a different approach. We build a table to store the optimal profit for each rod length from 1 up to the total length. To find the best way to cut a rod of length n, we simply look up the best ways to cut smaller rods and combine them.
def cut_rod_dp(prices, n):
    """
    Computes the maximum profit for a rod of length n.
    prices is a list where prices[i] is the price of a rod of length i+1.
    """
    profits = [0] * (n + 1)
    # Iterate through all possible rod lengths from 1 up to n
    for i in range(1, n + 1):
        max_profit = float('-inf')
        # Consider all possible first cuts
        for j in range(i):
            current_profit = prices[j] + profits[i - j - 1]
            if current_profit > max_profit:
                max_profit = current_profit
        profits[i] = max_profit
    return profits[n]

# Example usage:
# prices list for lengths 1 through 10
# prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
# max_profit = cut_rod_dp(prices, 4)
# print(max_profit)  # Expected output: 10 (two pieces of length 2, at price 5 each)
In this code, the profits list is our table. We fill it up from the bottom (smaller rod lengths) to the top (the full rod length), ensuring that we never have to recompute a subproblem’s solution.
15.2 From Little to Big: Matrix-Chain Multiplication 📊
We’ll apply dynamic programming to another problem: finding the most efficient way to multiply a series of matrices. Multiplying matrices is not a simple task, and the order in which you multiply them can have a huge effect on the number of calculations required. By using dynamic programming, we can break the problem into subproblems of multiplying smaller chains of matrices. We solve these subproblems once and store the results, building up to the most efficient way to multiply the entire chain.
def matrix_chain_order(p):
    """
    Computes the minimum number of scalar multiplications needed
    to multiply a chain of matrices.
    p is a list where p[i] is the number of rows of matrix i and
    p[i+1] is the number of columns.
    """
    n = len(p) - 1
    # Create a table to store the costs
    costs = [[0] * n for _ in range(n)]
    # Iterate through all possible chain lengths
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            costs[i][j] = float('inf')
            # Find the best split point
            for k in range(i, j):
                cost = costs[i][k] + costs[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                if cost < costs[i][j]:
                    costs[i][j] = cost
    return costs[0][n - 1]

# Example usage:
# p represents dimensions: A(10x100), B(100x5), C(5x50)
# p = [10, 100, 5, 50]
# min_mult = matrix_chain_order(p)
# print(min_mult)  # Expected output: 7500
This algorithm builds a table of costs for multiplying smaller chains of matrices. The costs[i][j] entry stores the minimum cost for multiplying the chain of matrices from i to j. By filling this table systematically, we find the optimal solution for the entire chain.
15.3 Elements of Dynamic Programming 🧩
Not every problem can be solved with dynamic programming. For this approach to be a good fit, a problem must have two key properties:
Optimal Substructure: The optimal solution to the problem contains within it the optimal solutions to subproblems. This means you can find the best solution to the large problem by combining the best solutions to the smaller ones. For example, the maximum profit for a rod of length 7 is made up of the best profit for a smaller rod and a single remaining piece.
Overlapping Subproblems: The recursive solution to the problem revisits the same subproblems over and over again. This is where dynamic programming saves so much time. By storing the answers to these subproblems in a table, you avoid redundant calculations. The rod-cutting problem is a perfect example: finding the best way to cut a 4-meter rod and a 3-meter rod both require knowing the best way to cut a 2-meter rod.
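These two properties can also be exploited top-down: keep the natural recursion from the rod-cutting problem, but store (memoize) each subproblem's answer the first time it is computed. A sketch, using the same prices convention as cut_rod_dp:
def cut_rod_memoized(prices, n, memo=None):
    """Top-down rod cutting: the same subproblems, each solved only once."""
    if memo is None:
        memo = {}
    if n == 0:
        return 0
    if n in memo:
        return memo[n]              # reuse a stored answer instead of recomputing
    best = float('-inf')
    for j in range(n):              # the first piece has length j + 1
        best = max(best, prices[j] + cut_rod_memoized(prices, n - j - 1, memo))
    memo[n] = best
    return best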
15.4 The Longest Shared Sequence: Longest Common Subsequence 📜
We’ll use dynamic programming to find the longest common subsequence between two different strings. A subsequence is a sequence of characters that appear in the same relative order but not necessarily contiguously. For example, "ace" is a subsequence of "abcde".
We can solve this problem by building a table. Each cell in the table will store the length of the longest common subsequence for a prefix of the two strings. We fill this table systematically, starting from the top-left and working our way to the bottom-right. By doing so, we’re building up the solution to the larger problem by reusing the solutions for smaller prefixes.
def lcs_length(X, Y):
"""
Computes the length of the longest common subsequence of strings X and Y.
"""
m = len(X)
n = len(Y)
# Create a table to store the lengths of the common subsequences
# We use m+1 and n+1 to handle empty prefixes
table = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m):
for j in range(n):
if X[i] == Y[j]:
# If characters match, add 1 to the previous diagonal value
table[i + 1][j + 1] = table[i][j] + 1
else:
# If they don't match, take the max of the top or left cell
table[i + 1][j + 1] = max(table[i + 1][j], table[i][j + 1])
return table[m][n]
# Example usage:
# X = "ABCBDAB"
# Y = "BDCABA"
# print(lcs_length(X, Y)) # Expected output: 4 ("BCBA" is one possible subsequence)
15.5 Optimal Binary Search Trees 🌳
We can also use dynamic programming to solve the problem of arranging a binary search tree to minimize the average search time. Given a set of keys and the probability that each key will be searched, a key’s location in the tree affects how many comparisons are needed to find it.
A naive approach might be to try every single possible tree structure, but this is too slow. Instead, dynamic programming helps us find the optimal structure by breaking the problem into subproblems. We solve for the optimal trees for all possible subsets of keys, starting from single keys and building up. We store these optimal solutions, and the solution for the full set of keys is built on the solutions for these smaller subsets. This ensures that we make the best decision for each part of the tree, leading to the overall optimal structure.
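To make the table-filling idea concrete, here is a minimal sketch of the cost computation. The function name optimal_bst_cost and the example probabilities are ours, and this simplified version ignores the "dummy" probabilities for unsuccessful searches that a full treatment would include.
def optimal_bst_cost(p):
    """
    Expected search cost of an optimal BST for keys 1..n, where p[i-1] is
    the probability of searching for key i (simplified: misses are ignored).
    """
    n = len(p)
    # cost[i][j] = expected cost for keys i..j; cost[i][i-1] = 0 (empty subtree)
    cost = [[0.0] * (n + 2) for _ in range(n + 2)]
    # weight[i][j] = total probability of keys i..j
    weight = [[0.0] * (n + 2) for _ in range(n + 2)]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            weight[i][j] = weight[i][j - 1] + p[j - 1]
            # Try every key r as the root of the subtree holding keys i..j
            cost[i][j] = weight[i][j] + min(
                cost[i][r - 1] + cost[r + 1][j] for r in range(i, j + 1))
    return cost[1][n]
# Example usage:
# print(optimal_bst_cost([0.2, 0.3, 0.5]))  # 1.7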
Greedy algorithms are a class of algorithms that solve problems by making the choice that looks best at the moment. At each step, a greedy algorithm makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution. Think of it like navigating a maze by always choosing the path that seems to get you closer to the exit, without considering the long-term consequences. While this strategy doesn’t work for every problem, it is very effective for a specific set of problems with the right properties.
16.1 An Activity-Selection Problem 🗓️
A perfect example of where a greedy algorithm works is the activity-selection problem. Imagine you have a list of activities, each with a start and end time, and you can only participate in one activity at a time. The goal is to choose the largest number of non-overlapping activities.
A greedy solution is to sort the activities by their finish times and then, at each step, choose the earliest-finishing activity that doesn’t overlap with the ones already chosen. This is a locally optimal choice because it frees up your time as soon as possible, leaving the maximum amount of room for the remaining activities.
def greedy_activity_selector(start, finish):
"""
Selects the maximum number of non-overlapping activities.
start and finish are lists of start and finish times.
"""
n = len(start)
# The first activity is always selected
selected_activities = [0]
k = 0
# Iterate through the rest of the activities
for m in range(1, n):
# If the start time of the next activity is after or at the finish time
# of the last selected activity, select it.
if start[m] >= finish[k]:
selected_activities.append(m)
k = m
return selected_activities
# Example usage:
# start_times = [1, 3, 0, 5, 8, 5]
# finish_times = [2, 4, 6, 7, 9, 9]
# sorted by finish times
# print(greedy_activity_selector(start_times, finish_times)) # Expected output: [0, 1, 3, 4]
This simple, greedy strategy leads to the best possible solution for this problem.
16.2 Elements of the Greedy Strategy 🧭
For a greedy algorithm to work, a problem must exhibit two main properties:
Greedy-choice property: You can make a locally optimal choice, and that choice will always lead to a globally optimal solution. The decision you make at each step does not depend on future choices. In the activity-selection problem, choosing the earliest-finishing activity is always part of the optimal solution.
Optimal substructure: The optimal solution to the problem contains within it the optimal solutions to subproblems. This is a property that greedy algorithms share with dynamic programming. The optimal solution to the activity-selection problem for the whole set is built from the optimal solutions for smaller subsets.
16.3 Huffman Codes: A Clever Compression 📦
Huffman coding is a powerful data compression algorithm that uses a greedy approach. The goal is to represent a message using as few bits as possible. The algorithm does this by assigning shorter binary codes to characters that appear more frequently and longer codes to those that appear less frequently.
The greedy strategy is to always choose the two characters with the lowest frequencies and merge them into a new parent node. This process is repeated until all characters are part of a single tree. The final tree, called a Huffman tree, gives us the optimal prefix codes for each character, meaning no code is a prefix of another. This allows us to decompress the message without any confusion. It’s a beautiful example of how a simple, greedy rule can lead to a powerful, globally optimal result.
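Here is a small sketch of that greedy merge loop using Python’s heapq module as the priority queue. The function name huffman_codes and the example frequencies are ours; instead of building explicit tree nodes, each heap entry simply carries the codes assigned so far.
import heapq

def huffman_codes(frequencies):
    """
    Build prefix codes from a dict of character frequencies.
    Returns a dict mapping each character to its binary code string.
    """
    # Each heap entry is (frequency, tiebreaker, {char: code_so_far})
    heap = [(freq, i, {char: ""}) for i, (char, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # edge case: a single distinct character
        _, _, codes = heap[0]
        return {char: "0" for char in codes}
    counter = len(heap)
    while len(heap) > 1:
        # Greedy step: merge the two least frequent subtrees
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        merged = {c: "0" + code for c, code in codes1.items()}
        merged.update({c: "1" + code for c, code in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]
# Example usage:
# freqs = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
# print(huffman_codes(freqs))  # 'a' gets the shortest code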
Sometimes, a single operation can be very slow, but if you look at a whole sequence of operations, the average time per operation is very fast. Amortized analysis is a way of looking at the total cost of a sequence of operations to prove that the average cost is low, even if some individual operations are expensive. This is a more realistic way to analyze data structures that grow and shrink over time, like dynamic arrays.
Think of it like buying something with a coupon. The first item might be full price, but a coupon on the receipt gives you a discount on the next item. Amortized analysis is like calculating your average cost per item after a long shopping trip, where the cost of the first item is "amortized" over all the discounted ones.
17.1 Aggregate Analysis: Summing the Costs ➕
Aggregate analysis is the simplest way to perform an amortized analysis. You look at the entire sequence of n operations and calculate the total cost for all of them. Then, you divide that total cost by n to get the average cost per operation.
For example, a dynamic array might double in size whenever it gets full. Inserting an item is usually an O(1) operation, but when the array is full, the insertion requires a costly O(n) operation to copy all the items to a new, larger array. To analyze this, you would sum the cost of all the n insertions. The total cost turns out to be O(n), which means the average cost per operation is O(n)/n, or O(1). Even though some insertions are expensive, the average cost over a long sequence is constant.
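A quick experiment makes this concrete. The sketch below (our own simulate_append_costs helper) models a doubling array by counting one unit per append plus one unit per item copied during a resize; the average cost per append stays bounded by a small constant.
def simulate_append_costs(n):
    """
    Count the work done by a doubling dynamic array over n appends:
    each append costs 1, plus one unit per item copied when the array resizes.
    """
    capacity, size, total_cost = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            total_cost += size      # copy every existing item to the new array
            capacity *= 2
        total_cost += 1             # the append itself
        size += 1
    return total_cost
# Example usage:
# for n in [10, 100, 1000, 10000]:
#     print(n, simulate_append_costs(n) / n)   # the average cost stays below 3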
17.2 The Accounting Method: A Credit System 💰
The accounting method is a more intuitive way to perform amortized analysis. We assign an amortized cost to each operation, which is a little more than its actual cost. The difference is stored as a "credit." This credit can then be used to pay for more expensive operations later on.
Using our dynamic array example, we can assign an amortized cost of 3 to each insertion. The first insertion costs 1, and we store 2 credits. The next insertion costs 1, and we store 2 more credits. When a costly resizing operation happens, we use the banked credits to "pay" for the cost of the move. We can prove that for any sequence of operations, we will always have enough credits to pay for the expensive resizes, which proves that the average cost is constant.
17.3 The Potential Method: Stored Energy 🔋
The potential method is the most formal way to do amortized analysis. It’s like modeling the data structure as a physical system that can store "potential energy." An expensive operation decreases the potential, and a cheap one increases it. The total amortized cost of an operation is its actual cost plus the change in potential.
By defining a potential function that is never negative, we can prove that the total amortized cost of a sequence of operations is an upper bound on the actual total cost. This method is the most powerful because it doesn’t require us to know the sequence of operations in advance.
17.4 Dynamic Tables: A Real-World Example 📝
A dynamic table is a data structure that grows and shrinks as needed to accommodate the number of items it holds. We’ve seen a dynamic array that grows when it’s full. We can also make it shrink when it’s too empty to save space. We can use amortized analysis to show that both growing and shrinking a dynamic table has an amortized cost of O(1). This is a key reason why data structures like Python’s list are so efficient and flexible.
This part of the book introduces more complex and powerful data structures used in specialized applications. These structures are often built upon the foundations we’ve already covered, such as trees and heaps, but are designed to solve specific problems with exceptional efficiency, particularly when dealing with large datasets or operations that are not standard.
18.1 The Idea of B-Trees 🌳
A B-Tree is a type of tree data structure that is designed to work with data stored on a disk or other slow storage devices. The primary goal of a B-Tree is to minimize the number of times it has to access the disk, as this is a very slow operation compared to working with data in memory. B-Trees keep their height low by allowing each node to have many children. This means you can find an item by following a short path from the root, which requires only a few disk accesses.
18.2 Core Operations on B-Trees 🔎
Searching, inserting, and deleting keys in a B-Tree are similar to operations in a binary search tree, but with modifications to handle the fact that each node can hold many keys and have many children.
Searching: To search for a key, you start at the root and navigate down the tree, choosing the correct child node to follow at each step by comparing the key to the keys stored in the current node (a small sketch of this follows the list below).
Insertion: When inserting a key, you find the correct leaf node to add it to. If the leaf is full, it splits into two nodes, and the middle key is moved up to the parent node. This splitting process can continue all the way up to the root, which is how the tree grows taller.
Deletion: Deleting a key is more complex, as it might cause a node to become "underfull".
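As promised above, here is a minimal sketch of the search step. The BTreeNode class and btree_search function are our own simplified stand-ins: each node just holds a sorted list of keys and a list of children, and each recursive call corresponds to one disk read in a real B-Tree.
class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []       # sorted keys stored in this node
        self.children = []   # child pointers (len(keys) + 1 of them when internal)
        self.leaf = leaf

def btree_search(node, key):
    """Return (node, index) where key lives, or None if it is absent."""
    i = 0
    # Find the first key that is >= the one we are looking for
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return (node, i)
    if node.leaf:
        return None          # reached a leaf without finding the key
    # Descend into the child between keys[i-1] and keys[i]
    return btree_search(node.children[i], key)
# Example usage (a tiny two-level tree built by hand):
# root = BTreeNode(leaf=False); root.keys = [10, 20]
# for ks in ([3, 7], [12, 17], [25, 30]):
#     child = BTreeNode(); child.keys = ks; root.children.append(child)
# print(btree_search(root, 17))   # (<leaf node>, 1)
# print(btree_search(root, 18))   # None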
18.3 Deleting a Key from a B-Tree 🗑️
The process of deleting a key from a B-Tree involves three steps:
Find the key: Locate the key to be deleted.
Delete and potentially fix: Delete the key. If this causes the node to have too few keys, the tree must be rebalanced. This can be done by either borrowing a key from a sibling node or merging the underfull node with a sibling node.
Fixing up the tree: If a merge or borrow occurs, it can cause the parent node to become underfull, and the process repeats all the way up the tree. This ensures the tree’s height remains as short as possible, preserving its efficiency for disk-based operations.
19.1 Structure of Fibonacci Heaps 📈
A Fibonacci heap is a type of priority queue that is incredibly fast for certain operations, which makes it perfect for graph algorithms like Dijkstra’s algorithm. It is a collection of trees that are structured to make operations like decreasing a key and merging heaps very efficient.
19.2 Mergeable-Heap Operations 🤝
A Fibonacci heap is designed to perform a set of operations efficiently. The most common operations on a priority queue are:
insert: Adds a new item to the heap.
minimum: Finds the item with the smallest key.
extract-min: Removes the minimum item from the heap and returns it.
union: Merges two heaps into one.
These operations are "lazy," meaning they defer some of the work until it’s absolutely necessary, which helps keep the amortized time low.
19.3 Decreasing a Key and Deleting a Node 📉
The most notable advantage of a Fibonacci heap is its speed in two specific operations:
decrease-key: Changes the key of an item to a smaller value. This is a very fast operation, making it ideal for algorithms that frequently update item priorities, such as finding the shortest path in a graph.
delete: Removes an item from the heap. This is done by first decreasing the key of the item to negative infinity (making it the new minimum) and then using extract-min to remove it.
19.4 Bounding the Maximum Degree 📏
To ensure the high performance of a Fibonacci heap, it is important to keep the number of children for each node (its degree) from getting too large. A key property of the Fibonacci heap is that the degree of any node is bounded by a logarithmic function of the number of nodes in the heap, which helps maintain the overall efficiency of the data structure.
20.1 Preliminary Approaches: The Foundation of Speed 🚀
Before diving into the van Emde Boas tree (or vEB tree), let’s consider a simple problem: finding the next item in a set of integers. If the set is small and the numbers are in a tight range, a simple array or boolean list works perfectly. You can check if a number exists by looking up its index. But what if the numbers are very large and sparse (far apart)? This would require a huge array, wasting a lot of memory. This is the problem the vEB tree was designed to solve. It offers a way to get the best of both worlds: the speed of an array with the space efficiency of other data structures.
20.2 A Recursive Structure: The Core Idea 🧩
The vEB tree is a clever data structure that uses recursion. It’s built in a hierarchical way, where a large vEB tree is made up of smaller vEB trees. Think of a large city divided into districts, where each district is a smaller city with its own subdivisions. To find a specific house, you first find the right district, then the right neighborhood within that district, and so on.
The vEB tree for a universe of numbers from 0 to U-1 is built on two levels:
The top level: A single "summary" vEB tree that keeps track of which smaller vEB trees are not empty.
The bottom level: An array of smaller vEB trees, each of which is responsible for a smaller range of numbers.
To find the next item, you first look in the summary tree to find the next non-empty smaller tree. Once you find it, you search within that smaller tree. This recursive approach is what gives the vEB tree its incredible speed.
20.3 The van Emde Boas Tree: The Full Picture 🖼️
The full vEB tree structure ensures that operations like finding the next item, finding the previous item, insertion, and deletion can all be done in O(log log U) time, where U is the size of the universe of numbers. This is incredibly fast, especially for very large universes.
Here’s how the core operations work:
insert(x): To insert an item x, you first insert it into the correct smaller tree. You then update the summary tree to reflect that this smaller tree is now non-empty. This process is very quick.
find_next(x): To find the next item after x, you first look in the smaller tree that contains x. If you find one, you’re done. If not, you find the next non-empty smaller tree by searching the summary tree and then find the first item in that tree.
The vEB tree is a sophisticated data structure that is used in specialized scenarios where finding the next or previous item quickly is critical. It’s a testament to how creative recursive thinking can lead to breakthroughs in algorithm design.
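Here is a deliberately simplified two-level sketch (our own TwoLevelBitset class) that captures the summary-plus-clusters idea for insert and find_next. It scans clusters of size roughly √U, so it is not the full O(log log U) structure, in which the summary and the clusters would themselves be recursive vEB trees.
import math

class TwoLevelBitset:
    """A simplified two-level structure in the spirit of a vEB tree."""
    def __init__(self, universe_size):
        self.cluster_size = max(1, math.isqrt(universe_size))
        num_clusters = (universe_size + self.cluster_size - 1) // self.cluster_size
        self.summary = [False] * num_clusters                      # which clusters are non-empty
        self.clusters = [[False] * self.cluster_size for _ in range(num_clusters)]

    def insert(self, x):
        c, offset = divmod(x, self.cluster_size)
        self.clusters[c][offset] = True
        self.summary[c] = True                                     # mark the cluster non-empty

    def find_next(self, x):
        c, offset = divmod(x, self.cluster_size)
        # Look in x's own cluster first
        for i in range(offset + 1, self.cluster_size):
            if self.clusters[c][i]:
                return c * self.cluster_size + i
        # Otherwise use the summary to jump to the next non-empty cluster
        for c2 in range(c + 1, len(self.summary)):
            if self.summary[c2]:
                for i in range(self.cluster_size):
                    if self.clusters[c2][i]:
                        return c2 * self.cluster_size + i
        return None
# Example usage:
# s = TwoLevelBitset(16)
# s.insert(5); s.insert(11)
# print(s.find_next(5))    # 11
# print(s.find_next(11))   # None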
21.1 Disjoint-Set Operations 🤝
A disjoint-set data structure, also known as a union-find data structure, is a tool that keeps track of a collection of elements partitioned into a number of disjoint (non-overlapping) sets. It is used to model relationships where elements can be grouped together, and it provides two core operations:
FIND-SET(x): This operation returns a unique representative (or root) of the set containing element x. You can use this to check if two elements are in the same set by seeing if their representatives are the same.
UNION(x, y): This operation merges the two sets containing elements x and y into a single, new set.
This data structure is great for problems that involve grouping, like finding the connected components in a graph or in network connectivity problems.
21.2 Linked-List Representation of Disjoint Sets ⛓️
The simplest way to implement a disjoint-set data structure is to use a linked list for each set. Each node in the list contains a pointer to the next element and a pointer to the head of the list, which serves as the set’s representative.
FIND-SET(x): To find the representative of element x, you simply follow the pointer from x’s node to the head of its linked list. This takes constant time, O(1).
UNION(x, y): To merge the sets of x and y, you change the representative pointer of every node in one list to point to the representative of the other list. This can be slow, as the runtime depends on the size of the list being merged. To make it more efficient, we can always merge the smaller list into the larger one, a strategy called union by size.
21.3 Disjoint-Set Forests: A Faster Approach 🌳
A more advanced and highly efficient way to implement a disjoint-set data structure is to use a disjoint-set forest. In this representation, each set is a tree, and each node has a pointer to its parent. The root of the tree is the set’s representative.
To make operations even faster, we can use two clever techniques:
Union by Rank: When merging two sets, we attach the root of the tree with the smaller "rank" (or height) to the root of the tree with the larger rank. This keeps the trees short and bushy, which speeds up the find operation.
Path Compression: During a FIND-SET operation, after finding the root of the tree, we go back through the path from the original node to the root and make every node on that path point directly to the root. This "flattens" the tree and makes future FIND-SET operations much faster.
With these optimizations, the amortized running time for a sequence of operations is extremely fast, nearly constant. The disjoint-set forest is a great example of a data structure that achieves exceptional performance through a combination of clever implementation choices.
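Here is a compact sketch of a disjoint-set forest with both optimizations. The class name DisjointSet and the example elements are ours.
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""
    def __init__(self, elements):
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find_set(self, x):
        # Path compression: point x (and its ancestors) directly at the root
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        # Union by rank: attach the shorter tree under the taller one
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
# Example usage:
# ds = DisjointSet(['A', 'B', 'C', 'D'])
# ds.union('A', 'B'); ds.union('C', 'D')
# print(ds.find_set('A') == ds.find_set('B'))  # True
# print(ds.find_set('A') == ds.find_set('C'))  # False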
22.1 How to Describe a Graph: Representations 📝
Before we can work with graphs, we need to find a way to represent them in a computer. The two primary ways are an adjacency list and an adjacency matrix.
Adjacency List: Each node is paired with a list of the nodes it connects to. In Python this is often written as a dictionary that maps each node to its list of neighbors, as in the example below. This is an efficient way to represent a sparse graph (a graph with few connections).
# Adjacency list representation of a directed graph
# (A -> B, A -> C, B -> D)
graph_adj_list = {
'A': ['B', 'C'],
'B': ['D'],
'C': [],
'D': []
}
Adjacency Matrix: This is a two-dimensional array where the cell at [i][j] is 1 if there is a connection from node i to node j, and 0 otherwise. This is an efficient way to represent a dense graph (a graph with many connections).
# Adjacency matrix representation (for 4 nodes)
graph_adj_matrix = [
[0, 1, 1, 0], # Connections from node A (index 0)
[0, 0, 0, 1], # Connections from node B (index 1)
[0, 0, 0, 0], # Connections from node C (index 2)
[0, 0, 0, 0] # Connections from node D (index 3)
]
22.2 Exploring the Network: Breadth-First Search (BFS) 🌊
Breadth-First Search (BFS) is a simple algorithm for exploring a graph. It works by exploring the graph layer by layer, like ripples spreading out in a pond. It starts at a chosen node and explores all of its immediate neighbors, then all of their neighbors, and so on. This process is managed with a queue, which ensures that nodes are visited in the order they are discovered. BFS is used to find the shortest path between two nodes in an unweighted graph and for web crawling.
from collections import deque
def bfs(graph, start_node):
"""Performs a Breadth-First Search on a graph."""
visited = set()
queue = deque([start_node])
visited.add(start_node)
while queue:
current_node = queue.popleft()
print(current_node, end=" ")
for neighbor in graph[current_node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
# Example usage with the adjacency list
# graph_adj_list = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
# print("BFS traversal starting from 'A':")
# bfs(graph_adj_list, 'A') # Output: A B C D
22.3 The Deep Dive: Depth-First Search (DFS) 🤿
Depth-First Search (DFS) is another way to explore a graph. Instead of exploring layer by layer, it goes as deep as possible along each path before it has to backtrack. This process is managed with a stack, which keeps track of the path taken and allows the algorithm to return to a previous node when it hits a dead end. DFS is useful for many things, like checking for cycles in a graph and finding connected components.
def dfs(graph, start_node):
"""Performs a Depth-First Search on a graph."""
visited = set()
stack = [start_node]
while stack:
current_node = stack.pop()
if current_node not in visited:
visited.add(current_node)
print(current_node, end=" ")
# Add neighbors to the stack (in reverse order for consistent output)
for neighbor in reversed(graph.get(current_node, [])):
if neighbor not in visited:
stack.append(neighbor)
# Example usage with the adjacency list
# graph_adj_list = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
# print("\nDFS traversal starting from 'A':")
# dfs(graph_adj_list, 'A') # Output: A B D C
22.4 Ordering Tasks: Topological Sort 🔗
A topological sort is a linear ordering of nodes in a directed acyclic graph (a graph with no cycles) such that for every directed connection from node u to node v, u comes before v in the ordering. Think of it as a way to order a set of tasks that have dependencies, like laying the foundation before building the walls of a house. The topological sort algorithm uses DFS to find a valid ordering.
def topological_sort(graph):
"""
Performs a topological sort on a directed acyclic graph.
Returns a list of nodes in topological order.
"""
visited = set()
stack = [] # This stack will hold the sorted nodes
def dfs_visit(node):
visited.add(node)
for neighbor in graph.get(node, []):
if neighbor not in visited:
dfs_visit(neighbor)
stack.append(node)
for node in graph:
if node not in visited:
dfs_visit(node)
return stack[::-1] # Return the reversed list
# Example usage with a graph representing dependencies
# A -> B, A -> C, C -> D
graph_for_sort = {
'A': ['B', 'C'],
'B': [],
'C': ['D'],
'D': []
}
# print("Topological sort:", topological_sort(graph_for_sort))
# Output: ['A', 'C', 'D', 'B'] (a valid ordering)
22.5 Finding the Groups: Strongly Connected Components 🤝
In a directed graph, a strongly connected component is a set of nodes where every node in the set can be reached from every other node in the same set by following the connections. Finding these components is useful for analyzing network structures. We can use a two-pass algorithm that uses DFS to find these components efficiently.
The algorithm, known as Kosaraju’s algorithm, works as follows:
Perform a DFS on the original graph to get a finishing order of the nodes.
Compute the transpose of the graph, which is a new graph with all the connections reversed.
Perform a second DFS on the transpose graph, but this time, process the nodes in decreasing order of their finishing times from the first DFS. Each DFS tree in this second pass is a strongly connected component.
def get_transpose(graph):
"""Computes the transpose of a directed graph."""
transpose = {node: [] for node in graph}
for node in graph:
for neighbor in graph[node]:
transpose[neighbor].append(node)
return transpose
def find_strongly_connected_components(graph):
"""Finds the strongly connected components of a directed graph."""
visited = set()
stack = []
# First pass: DFS to get finishing times
def dfs_pass1(node):
visited.add(node)
for neighbor in graph.get(node, []):
if neighbor not in visited:
dfs_pass1(neighbor)
stack.append(node)
for node in graph:
if node not in visited:
dfs_pass1(node)
# Second pass: DFS on the transpose graph
transpose_graph = get_transpose(graph)
visited.clear()
sccs = []
while stack:
node = stack.pop()
if node not in visited:
component = []
def dfs_pass2(n):
visited.add(n)
component.append(n)
for neighbor in transpose_graph.get(n, []):
if neighbor not in visited:
dfs_pass2(neighbor)
dfs_pass2(node)
sccs.append(component)
return sccs
# Example usage:
# A <-> B, B -> C, C <-> D
graph_for_scc = {
'A': ['B'], 'B': ['A', 'C'], 'C': ['D'], 'D': ['C']
}
# print("\nStrongly connected components:", find_strongly_connected_components(graph_for_scc))
# Output: [['A', 'B'], ['C', 'D']]
23.1 Building the Minimal Bridge: Growing a Minimum Spanning Tree 🌱
Imagine a group of islands that you need to connect with a network of bridges. Each possible bridge has a different cost. Your goal is to build just enough bridges to connect all the islands while keeping the total cost as low as possible. This is the idea behind a minimum spanning tree.
A spanning tree is a subset of a graph’s connections that links all the nodes together without forming any cycles. A minimum spanning tree (MST) is a spanning tree where the total weight of all the connections is the smallest possible. We can "grow" a minimum spanning tree by repeatedly adding the cheapest connection that links a new node to the growing tree without creating a cycle.
23.2 The Algorithms of Kruskal and Prim 🤝
There are two classic greedy algorithms for finding a minimum spanning tree. Both make locally optimal choices that lead to a globally optimal solution. They are good examples of how different greedy strategies can solve the same problem.
Kruskal’s Algorithm: This algorithm sorts all the connections in the graph by their weight, from smallest to largest. It then goes through the sorted list and adds each connection to the MST if it doesn’t form a cycle. A disjoint-set data structure is used to efficiently check for cycles. Kruskal’s algorithm builds the MST by adding one connection at a time, creating a "forest" of trees that eventually merge into a single, complete tree.
Prim’s Algorithm: This algorithm grows the MST from a single starting node. At each step, it chooses the cheapest connection that connects a node inside the growing tree to a node outside the tree. This process is managed efficiently with a priority queue, which keeps track of the cheapest connections to the nodes that are not yet in the MST. Prim’s algorithm builds a single tree that eventually spans the entire graph.
Both algorithms guarantee that they will find the minimum spanning tree, but they work best in different situations. Kruskal’s is often faster on sparse graphs, while Prim’s is often better on dense graphs.
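Here is a short sketch of Kruskal’s algorithm with a tiny inline union-find (using path halving to keep it brief). The function name kruskal_mst, the edge format, and the example graph are ours.
def kruskal_mst(nodes, edges):
    """
    edges is a list of (weight, u, v) tuples.
    Returns the list of edges in a minimum spanning tree.
    """
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    # Greedy step: consider edges from lightest to heaviest
    for weight, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding this edge creates no cycle
            parent[ru] = rv
            mst.append((weight, u, v))
    return mst
# Example usage:
# nodes = ['A', 'B', 'C', 'D']
# edges = [(1, 'A', 'B'), (4, 'B', 'C'), (3, 'A', 'C'), (2, 'C', 'D'), (5, 'B', 'D')]
# print(kruskal_mst(nodes, edges))   # [(1, 'A', 'B'), (2, 'C', 'D'), (3, 'A', 'C')]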
24.1 The Bellman-Ford Algorithm: Handling Negative Costs 💸
Most graph algorithms assume that all the connections (or edges) have positive weights. But what if a connection has a negative weight? This could represent a profit in a financial network or a shortcut in a road map. The Bellman-Ford algorithm is a powerful tool for finding the shortest paths from a single starting point to all other nodes in a graph that might contain negative weights.
The algorithm works by repeatedly relaxing all the edges in the graph. Relaxing an edge means checking if the path to a node can be shortened by going through the current edge. The algorithm performs this relaxation process |V| - 1 times, where |V| is the number of nodes in the graph. After this many passes, the shortest paths are guaranteed to have been found, as any simple path in a graph can have at most |V| - 1 edges. The Bellman-Ford algorithm can also detect if the graph contains a negative-weight cycle, which would mean there is no shortest path.
def bellman_ford(graph, start_node):
distances = {node: float('inf') for node in graph}
distances[start_node] = 0
# Relax edges repeatedly
for _ in range(len(graph) - 1):
for node in graph:
for neighbor, weight in graph[node].items():
if distances[node] != float('inf') and distances[node] + weight < distances[neighbor]:
distances[neighbor] = distances[node] + weight
# Check for negative-weight cycles
for node in graph:
for neighbor, weight in graph[node].items():
if distances[node] + weight < distances[neighbor]:
return None # A negative cycle exists
return distances
24.2 Single-Source Shortest Paths in Directed Acyclic Graphs (DAGs) ➡️
When a graph is a directed acyclic graph (DAG), meaning it has no cycles, we can find the shortest paths much more efficiently than with Bellman-Ford. We first perform a topological sort of the graph and then relax the edges in that sorted order. Because the topological sort guarantees that every node is processed only after all nodes that point to it, a single pass over the edges is enough. This makes the algorithm very fast, running in O(|V| + |E|) time, where |V| is the number of nodes and |E| is the number of edges.
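A minimal sketch of that idea: topologically order the nodes with a depth-first search, then relax each node’s outgoing edges once, in that order. The function name dag_shortest_paths and the example graph are ours; the weighted dictionary-of-dictionaries format matches the bellman_ford code above.
def dag_shortest_paths(graph, start):
    """
    graph maps node -> {neighbor: weight}. Assumes the graph is a DAG.
    Returns shortest distances from start after one pass in topological order.
    """
    # Topological order via depth-first search
    order, visited = [], set()
    def visit(node):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visit(neighbor)
        order.append(node)
    for node in graph:
        if node not in visited:
            visit(node)
    order.reverse()

    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    # Relax each node's outgoing edges exactly once, in topological order
    for node in order:
        if distances[node] == float('inf'):
            continue
        for neighbor, weight in graph[node].items():
            if distances[node] + weight < distances[neighbor]:
                distances[neighbor] = distances[node] + weight
    return distances
# Example usage:
# dag = {'A': {'B': 2, 'C': 6}, 'B': {'C': 3}, 'C': {}}
# print(dag_shortest_paths(dag, 'A'))   # {'A': 0, 'B': 2, 'C': 5}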
24.3 Dijkstra’s Algorithm: A Greedy Choice for Positive Weights 🚦
Dijkstra’s algorithm is the go-to solution for finding the single-source shortest paths in a graph with non-negative weights. It uses a greedy approach, similar to Prim’s algorithm for MSTs. It maintains a set of nodes whose final shortest-path distances have been determined and a priority queue of nodes that have been "discovered" but not yet fully processed.
The algorithm works as follows:
Initialize all distances to infinity, except for the start node, which has a distance of 0.
Repeatedly extract the node with the smallest distance from the priority queue.
For each neighbor of that node, relax the edge, updating the neighbor’s distance if a shorter path is found.
Once a node has been extracted, its shortest path from the source is final.
This greedy choice of always exploring the closest node ensures that the shortest paths are found efficiently. The algorithm is much faster than Bellman-Ford on most graphs, especially with a good priority queue implementation.
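Here is a sketch of Dijkstra’s algorithm using Python’s heapq module as the priority queue. The function name dijkstra and the example graph are ours; it assumes every edge weight is non-negative.
import heapq

def dijkstra(graph, start):
    """graph maps node -> {neighbor: weight}; all weights must be non-negative."""
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    pq = [(0, start)]                      # priority queue of (distance, node)
    finalized = set()
    while pq:
        dist, node = heapq.heappop(pq)
        if node in finalized:
            continue                       # skip stale queue entries
        finalized.add(node)
        for neighbor, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(pq, (new_dist, neighbor))
    return distances
# Example usage:
# g = {'A': {'B': 1, 'C': 4}, 'B': {'C': 2, 'D': 6}, 'C': {'D': 3}, 'D': {}}
# print(dijkstra(g, 'A'))   # {'A': 0, 'B': 1, 'C': 3, 'D': 6}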
24.4 Difference Constraints and Shortest Paths 📐
Some problems that look completely unrelated to graphs and shortest paths can actually be modeled as such. A system of difference constraints is a set of inequalities that look like this: x_j - x_i <= w_ij. These inequalities arise in many different scenarios, such as finding a valid schedule of events where each event has a specific start time, and there are constraints on how long after one event another can begin.
A remarkable thing is that a system of difference constraints can be solved by finding the single-source shortest paths in a specially constructed graph. We can build a graph where each variable (x_i, x_j, etc.) is a node and each inequality (x_j - x_i <= w_ij) corresponds to a weighted edge from x_i to x_j with weight w_ij. By adding a new starting node and special zero-weight edges, we can then run an algorithm like Bellman-Ford to find the shortest paths. The shortest-path distances to each node then correspond to a valid assignment of values to the variables that satisfies all the original inequalities. This powerful connection shows how a seemingly complex problem can be simplified by thinking about it in a new way.
Sometimes, you don’t just need the shortest path from a single starting point; you need the shortest path between every pair of nodes in a graph. This is the all-pairs shortest paths problem. It’s useful for things like building a routing table for a network or pre-computing travel times between all major cities.
25.1 Shortest Paths and Matrix Multiplication 🧮
A fascinating way to think about the all-pairs shortest paths problem is through matrix multiplication. We can represent a graph with a weighted adjacency matrix, where each cell (i, j) contains the weight of the edge from node i to node j (or infinity if there is no edge). The key insight is that by performing a modified form of matrix multiplication on this matrix, we can find the lengths of the shortest paths that use a specific number of edges. For example, multiplying the matrix by itself gives us the shortest paths that use at most two edges. By repeatedly "multiplying" the matrix, we can find the shortest paths of any length, and after about log(n) multiplications, we have the shortest paths between all pairs.
This is not the most efficient algorithm in practice, but it’s a beautiful example of how a problem can be elegantly solved by transforming it into a different domain.
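Here is a small sketch of that modified "multiplication," where taking a minimum plays the role of addition and adding weights plays the role of multiplication. The function names and the example matrix are ours.
def extend_shortest_paths(D, W):
    """One min-plus 'multiplication': allow one more hop on every path."""
    n = len(W)
    result = [[float('inf')] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            result[i][j] = min(D[i][k] + W[k][j] for k in range(n))
    return result

def all_pairs_by_squaring(W):
    """Repeatedly square the distance matrix until paths of n - 1 edges are covered."""
    n = len(W)
    D = [row[:] for row in W]
    length = 1
    while length < n - 1:
        D = extend_shortest_paths(D, D)   # doubles the number of edges allowed
        length *= 2
    return D
# Example usage (inf means no direct edge):
# inf = float('inf')
# W = [[0, 3, inf], [inf, 0, 2], [1, inf, 0]]
# print(all_pairs_by_squaring(W))   # [[0, 3, 5], [3, 0, 2], [1, 4, 0]]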
25.2 The Floyd-Warshall Algorithm: The Dynamic Approach 🔄
The Floyd-Warshall algorithm is a powerful dynamic programming solution that finds the shortest path between every pair of nodes in a graph with non-negative or negative weights (but no negative-weight cycles). It works by systematically trying to improve every path by going through an intermediate node.
The algorithm has three nested loops. The outermost loop goes through every node k in the graph and uses it as a potential intermediate point for a path. The inner loops then check every pair of nodes i and j to see if the path from i to j is shorter by going through k.
# The graph is represented as a matrix of weights
def floyd_warshall(W):
n = len(W)
D = [[W[i][j] for j in range(n)] for i in range(n)]
for k in range(n):
for i in range(n):
for j in range(n):
# Check if the path through k is shorter
if D[i][k] + D[k][j] < D[i][j]:
D[i][j] = D[i][k] + D[k][j]
return D
This algorithm is simple to implement and works for all graphs that do not contain a negative-weight cycle, making it a very versatile tool.
25.3 Johnson’s Algorithm for Sparse Graphs 🌲
For sparse graphs (graphs with many nodes but few connections), the O(n^3) runtime of the Floyd-Warshall algorithm can be too slow. Johnson’s algorithm is a more efficient solution for these graphs.
It works in two main parts:
Re-weight the graph: First, the algorithm adds a helper node connected to every other node by a zero-weight edge and runs Bellman-Ford from it to compute a "potential" h(v) for each node v. Each edge weight w(u, v) is then replaced by w(u, v) + h(u) - h(v). The new weights are all non-negative, and they preserve which paths are shortest.
Run Dijkstra’s: Since the new weights are all non-negative, the algorithm can run Dijkstra’s algorithm from every single node in the graph. This gives us the shortest paths from every starting point.
While it’s more complex, Johnson’s algorithm is faster on sparse graphs because it leverages the efficiency of Dijkstra’s algorithm.
26.1 Flow Networks: Understanding the Pipes 🚰
A flow network is a directed graph where each connection (edge) has a capacity, which is the maximum amount of "stuff" (like water or data) that can flow through it. The network has a single source node (where the flow originates) and a single sink node (where the flow ends). The goal of a maximum flow problem is to determine the highest possible total flow from the source to the sink without exceeding any edge’s capacity. This is useful for modeling things like traffic in a city, data packets in a computer network, or liquids in a plumbing system.
26.2 The Ford-Fulkerson Method: Finding the Path to Capacity 🔍
The Ford-Fulkerson method is an algorithm that solves the maximum flow problem by repeatedly finding a path from the source to the sink that has available capacity. Such a path is called an augmenting path because it allows you to increase the total flow in the network.
The algorithm works as follows:
Initialize: Start with a flow of 0 on all edges.
Find a Path: Use an algorithm like Breadth-First Search or Depth-First Search to find an augmenting path from the source to the sink in the residual graph. The residual graph shows the remaining capacity on each edge.
Augment the Flow: Once a path is found, increase the flow along that path by the minimum available capacity on any of its edges (this is the path’s bottleneck).
Repeat: Continue finding augmenting paths and increasing the flow until no more augmenting paths can be found.
When no more paths can be found, the flow is at its maximum.
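Here is a sketch of the method using breadth-first search to find augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson). The function name max_flow, the residual-graph bookkeeping, and the example network are ours.
from collections import deque

def max_flow(capacity, source, sink):
    """capacity maps u -> {v: capacity of edge u->v}. Returns the maximum flow value."""
    # Residual capacities, including reverse edges initialized to 0
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    total_flow = 0
    while True:
        # Find an augmenting path in the residual graph with BFS
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break                       # no augmenting path left: the flow is maximum
        # Find the bottleneck capacity along the path
        bottleneck = float('inf')
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Augment: push the bottleneck along the path and update reverse edges
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total_flow += bottleneck
    return total_flow
# Example usage:
# caps = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 3}, 't': {}}
# print(max_flow(caps, 's', 't'))   # 5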
26.3 Maximum Bipartite Matching: The Perfect Match 🩷
A fascinating application of the maximum flow problem is in solving maximum bipartite matching. A bipartite graph is a graph whose nodes can be divided into two disjoint sets, say U and V, such that every connection links a node in U to a node in V. A matching is a set of connections where no two connections share a common node. A maximum matching is one with the largest possible number of connections.
We can solve this problem by transforming it into a maximum flow problem. We create a new graph with a source connected to all nodes in U and a sink connected to all nodes in V. We set the capacities of all connections to 1. The maximum flow we find in this new graph is equal to the size of the maximum matching in the original bipartite graph.
26.4 Push-Relabel Algorithms (An Advanced Topic) ➡️
The Ford-Fulkerson method can be slow on some graphs. A more modern and efficient class of algorithms called push-relabel algorithms solve the maximum flow problem in a different way. They work by maintaining an excess flow at each node and "pushing" this flow through the network until it reaches the sink. These algorithms are generally much faster in the worst-case scenario.
26.5 The Relabel-to-Front Algorithm (An Advanced Topic) ⏪
The relabel-to-front algorithm is a specific and highly efficient type of push-relabel algorithm. It maintains a list of nodes that have excess flow and systematically processes them, pushing the excess flow forward. This algorithm’s clever use of a list of "active" nodes ensures it runs quickly and is often the best choice for solving maximum flow problems in practice.
27.1 The Basics of Dynamic Multithreading 👯
So far, our algorithms have been designed for a single processor, running one instruction at a time. But modern computers often have multiple processors, and this allows for parallel computing. Dynamic multithreading is a programming model that allows you to specify when parts of your program can be run in parallel. It’s a way to tell the computer, "These tasks don’t depend on each other, so you can run them at the same time."
The main ideas behind dynamic multithreading are simple:
spawn: This keyword tells the computer to run a function in a new thread, allowing it to execute in parallel with the current thread. The spawn command returns immediately, and the program continues to the next instruction.
sync: This keyword forces the program to wait until all the child threads that were "spawned" from the current thread have completed their work.
This model allows for a flexible and intuitive way to write algorithms that can take advantage of multiple processors without having to manage the low-level details of each thread.
27.2 Multithreaded Matrix Multiplication ✖️
Matrix multiplication is a perfect problem for multithreading because the subproblems are independent of each other. To multiply two n x n matrices, we need to compute n^2 individual dot products. Since each dot product can be calculated independently, we can use a multithreaded approach to speed up the process. We can spawn a separate thread to calculate each dot product, and then sync to collect all the results and form the final matrix.
27.3 Multithreaded Merge Sort 🔄
We can also apply multithreading to the Merge Sort algorithm. Recall that Merge Sort works by recursively splitting a list in half until it gets down to single elements and then merging the sorted sub-lists back together. The "divide" step is a perfect place for multithreading. We can spawn a new thread to sort the right half of the list, while the main thread sorts the left half. After both subproblems are sorted, we can sync and then merge the results. This can provide a significant speed-up, especially for very large lists. The merge step itself can also be parallelized, though it is more complex.
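The sketch below mimics the spawn/sync idea with Python’s concurrent.futures module: submitting a task plays the role of spawn, and waiting on its result plays the role of sync. The names parallel_merge_sort and merge are ours, and because of Python’s global interpreter lock this illustrates the model more than it demonstrates a real speed-up.
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def parallel_merge_sort(items, executor, depth=1):
    """Sort items, handing the right half to another thread near the top of the recursion."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    if depth > 0:
        # "spawn": sort the right half in a worker thread
        right_future = executor.submit(parallel_merge_sort, items[mid:], executor, depth - 1)
        left = parallel_merge_sort(items[:mid], executor, depth - 1)
        right = right_future.result()   # "sync": wait for the spawned work to finish
    else:
        left = parallel_merge_sort(items[:mid], executor, depth)
        right = parallel_merge_sort(items[mid:], executor, depth)
    return merge(left, right)
# Example usage:
# with ThreadPoolExecutor(max_workers=2) as pool:
#     print(parallel_merge_sort([5, 2, 9, 1, 7, 3], pool))   # [1, 2, 3, 5, 7, 9]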
28.1 Solving Systems of Linear Equations 🧮
A system of linear equations is a set of equations with multiple variables, like 2x + 3y = 7 and 4x - y = 3. These systems come up everywhere, from computer graphics to engineering. One way to represent and solve these systems is with matrices. By putting the coefficients into a matrix, we can use powerful algorithms to find the values of the variables that satisfy all the equations.
The most common method for solving these systems is Gaussian elimination. It uses a series of simple row operations to transform the matrix into a simpler form from which the solution can be easily found. This process is systematic and guarantees a solution if one exists.
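Here is a compact sketch of Gaussian elimination with partial pivoting, assuming the system has a unique solution. The function name gaussian_elimination is ours, and the example reuses the two equations above.
def gaussian_elimination(A, b):
    """Solve A x = b for a square system with a unique solution."""
    n = len(A)
    # Build the augmented matrix [A | b]
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from the rows below
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x
# Example usage (2x + 3y = 7 and 4x - y = 3):
# print(gaussian_elimination([[2, 3], [4, -1]], [7, 3]))   # x = 8/7, y = 11/7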
28.2 Inverting Matrices: The Opposite Operation 🔄
The inverse of a matrix is another matrix that, when multiplied by the original, gives you the identity matrix. The identity matrix is like the number 1 for matrix multiplication. Finding the inverse is a key step in solving many matrix-related problems, especially those involving systems of equations.
The algorithm to find the inverse is very similar to Gaussian elimination. You start with your matrix next to an identity matrix. Then, you perform the same row operations on both matrices until your original matrix becomes the identity matrix. The matrix on the right will then be the inverse.
28.3 Symmetric Positive-Definite Matrices and Least-Squares Approximation 📈
A special type of matrix called a symmetric positive-definite matrix has properties that make it easier to work with. These matrices often appear in scientific computing and optimization problems.
One common problem is least-squares approximation. This is when you have a set of data points that don’t fit a perfect line or curve, and you want to find the line that comes closest to all of them. This problem can be represented and solved using matrices, and the unique properties of symmetric positive-definite matrices make the solution more stable and reliable. The solution to a least-squares problem often involves inverting a matrix, so having a good algorithm for it is essential.
29.1 Standard and Slack Forms 📝
Linear programming is a powerful mathematical method for finding the best possible outcome in a problem that can be represented by a linear relationship. It’s used to solve a wide range of problems, from optimizing a company’s production schedule to finding the most cost-effective way to transport goods.
A linear program has two main parts: an objective function that you want to maximize or minimize, and a set of constraints that are inequalities that the variables must satisfy.
Linear programs are typically written in standard form. A linear program is in standard form if:
The objective function is to be maximized.
All variables are non-negative.
All constraints are "less than or equal to" a constant.
For example, maximizing 2x + 3y subject to x + y <= 4, 2x + y <= 5, and x, y >= 0 is a linear program in standard form.
Another form is slack form, which is used internally by algorithms to solve the problem. In slack form, all the constraints are equalities rather than inequalities. This is achieved by introducing slack variables that take up the "slack" in each inequality. For example, x + y <= 4 becomes s_1 = 4 - x - y, where s_1 is the non-negative slack variable.
29.2 Formulating Problems as Linear Programs ✍️
Many real-world problems can be translated into a linear program. The key is to identify the variables, the objective, and the constraints. For example, a company that makes two products could use linear programming to decide how many of each product to make to maximize profit, given constraints on labor, materials, and machine time. Each product’s quantity would be a variable, the total profit would be the objective function, and the limitations on resources would be the constraints.
29.3 The Simplex Algorithm: The Go-To Solver 🚀
The simplex algorithm is a classic and widely-used method for solving linear programs. It works by starting at a corner of the solution space and then systematically moving to an adjacent corner that improves the objective function. The algorithm continues this process until it can no longer improve the objective function, which means it has found the optimal solution. The algorithm’s behavior can be visualized as moving along the edges of a high-dimensional shape until it reaches the highest point.
29.4 Duality: A Hidden Connection 🕵️
Every linear program has a corresponding dual linear program. The original is called the primal, and the dual is a related problem that can be used to prove properties about the primal solution. The solutions to the primal and dual problems are deeply connected by the Duality Theorem, which states that if one has an optimal solution, so does the other, and their optimal values are equal. Thinking about a problem’s dual can often provide new insights and different ways to approach the solution.
29.5 The Initial Basic Feasible Solution: A Starting Point 🏁
The simplex algorithm needs a valid starting point to begin its search for the optimal solution. This starting point is called an initial basic feasible solution. If a problem doesn’t have an obvious starting point, the algorithm can use a clever trick called the two-phase simplex method to find one. This method involves solving a temporary linear program to find a feasible solution for the original problem, which is then used as the starting point for the main algorithm.
30.1 Representing Polynomials 📜
A polynomial is a mathematical expression with one or more variables and coefficients, like $a_n x^n + a_{n-1} x^{n-1} + … + a_1 x + a_0$. Polynomials are used in many areas of computing, from signal processing to cryptography. There are two main ways to represent a polynomial in a computer:
Coefficient Representation: We can store the polynomial as a list of its coefficients. For a polynomial of degree n, we need n+1 coefficients. This is a compact and simple way to represent a polynomial for operations like addition.
Point-Value Representation: We can represent a polynomial by a set of points that it passes through. A polynomial of degree n is uniquely defined by n+1 point-value pairs, such as (x_0, y_0), (x_1, y_1), ..., (x_n, y_n), where $y_i = P(x_i)$. This representation is great for multiplication: if both polynomials are evaluated at the same points (and enough of them to pin down the product), multiplying them is a simple matter of multiplying their corresponding y-values.
30.2 The DFT and FFT: A Bridge Between Representations 🌉
The Discrete Fourier Transform (DFT) is a mathematical tool that converts a polynomial from its coefficient representation to its point-value representation. A related tool, the inverse DFT, converts a polynomial back from point-value to coefficient representation.
The Fast Fourier Transform (FFT) is a highly efficient algorithm for computing the DFT and its inverse. It’s a clever divide and conquer algorithm that performs this conversion in O(n log n) time, which is much faster than the naive O(n^2) method. The FFT is a cornerstone of digital signal processing and is used in everything from MP3 players to medical imaging.
30.3 Efficient FFT Implementations ⚡
The FFT algorithm’s efficiency comes from a clever recursive structure. To compute the DFT of a polynomial of degree n-1, it divides the problem into two subproblems: the DFT of the polynomial’s even-indexed coefficients and the DFT of its odd-indexed coefficients. The solutions to these subproblems are then combined to get the final result. This recursive structure is what gives the FFT its "fast" name and its incredible speed. It is a beautiful example of how an elegant algorithm can completely transform a computational problem.
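Here is a minimal sketch of that recursion using Python’s complex numbers. The function name fft is ours, and it assumes the number of coefficients is a power of two.
import cmath

def fft(coeffs):
    """Recursive radix-2 FFT; assumes len(coeffs) is a power of two."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = fft(coeffs[0::2])     # DFT of the even-indexed coefficients
    odd = fft(coeffs[1::2])      # DFT of the odd-indexed coefficients
    result = [0j] * n
    for k in range(n // 2):
        # Combine the two half-size DFTs using a "twiddle factor"
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result
# Example usage:
# print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])   # [4.0, 0.0, 0.0, 0.0]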
31.1 Elementary Number-Theoretic Notions 🧠
Number theory is a branch of mathematics that studies integers and their properties. It’s the foundation of modern cryptography and security. We’ll start by looking at some basic ideas:
Divisibility: An integer a divides an integer b if b = ak for some integer k. For example, 3 divides 12 because 12 = 3 * 4.
Prime Numbers: A prime number is a positive integer greater than 1 that has no positive divisors other than 1 and itself. The first few primes are 2, 3, 5, 7, 11, etc.
Relatively Prime: Two integers are relatively prime if their only common positive divisor is 1. For example, 10 and 21 are relatively prime.
31.2 Greatest Common Divisor (GCD) 🤝
The greatest common divisor (GCD) of two integers is the largest integer that divides both of them. For example, the GCD of 10 and 15 is 5. We can find the GCD using the Euclidean algorithm, which is a very old and efficient method. It works by repeatedly taking the remainder of a division until the remainder is 0. The last non-zero remainder is the GCD.
def gcd(a, b):
"""Computes the greatest common divisor of a and b."""
while b:
a, b = b, a % b
return a
31.3 Modular Arithmetic 🕰️
Modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" when they reach a certain value, called the modulus. It’s like a clock: on a 12-hour clock, 13 o’clock is 1 o’clock. We write this as 13 ≡ 1 (mod 12). This concept is fundamental to number-theoretic algorithms.
31.4 Solving Modular Linear Equations ✍️
A modular linear equation looks like ax ≡ b (mod n). We want to find all integer values of x that satisfy this equation. We can solve these equations by using the Extended Euclidean Algorithm, which finds integers x and y such that ax + ny = gcd(a, n). This algorithm is a critical building block for many cryptographic systems.
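Here is a sketch of the Extended Euclidean Algorithm and how it yields the solutions of ax ≡ b (mod n). The function names extended_gcd and solve_modular_linear and the example equation are ours.
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def solve_modular_linear(a, b, n):
    """Return all solutions x in [0, n) of a*x ≡ b (mod n), or [] if none exist."""
    g, x, _ = extended_gcd(a, n)
    if b % g != 0:
        return []                          # no solution when gcd(a, n) does not divide b
    x0 = (x * (b // g)) % n                # one particular solution
    return [(x0 + i * (n // g)) % n for i in range(g)]
# Example usage:
# print(solve_modular_linear(14, 30, 100))   # [95, 45]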
31.5 The Chinese Remainder Theorem 🧩
The Chinese Remainder Theorem is a powerful theorem that helps solve a system of simultaneous modular linear equations. For example, if you know that a number gives a remainder of 2 when divided by 3, and a remainder of 3 when divided by 5, the theorem helps you find the number. This theorem is essential in cryptography and number theory.
31.6 Powers of an Element 🔋
We often need to compute the powers of a number in modular arithmetic, such as a^k (mod n). A naive approach of multiplying a by itself k times would be very slow for large k. Instead, we can use a much faster algorithm called modular exponentiation that uses a clever binary approach to compute the result in O(log k) time.
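Here is a sketch of that binary, repeated-squaring approach; the function name mod_exp is ours, and Python’s built-in pow(a, k, n) does the same job.
def mod_exp(a, k, n):
    """Compute a^k mod n by repeated squaring."""
    result = 1
    a %= n
    while k > 0:
        if k & 1:           # if the current bit of k is 1, multiply it in
            result = (result * a) % n
        a = (a * a) % n     # square the base
        k >>= 1             # move to the next bit of the exponent
    return result
# Example usage:
# print(mod_exp(7, 128, 13))   # same result as pow(7, 128, 13)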
31.7 The RSA Public-Key Cryptosystem 🔐
The RSA cryptosystem is a famous public-key cryptography algorithm that is used to secure data on the internet. It’s a beautiful application of number theory. It relies on the fact that it is easy to multiply two large prime numbers together, but it is extremely difficult to factor the result back into the two original primes.
31.8 Primality Testing (An Advanced Topic) 🧐
Primality testing is the problem of determining whether a given number is prime or not. While we could try to divide a number by every integer up to its square root, this is very slow for large numbers. Instead, we can use fast, probabilistic algorithms like the Miller-Rabin test that can quickly determine if a number is likely to be prime.
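Here is a sketch of the Miller-Rabin test. The function name is_probably_prime, the number of rounds, and the examples are ours; a composite number slips past a single round with probability at most 1/4, so repeating the test makes an error vanishingly unlikely.
import random

def is_probably_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^r with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = (x * x) % n
            if x == n - 1:
                break
        else:
            return False       # a is a witness that n is composite
    return True
# Example usage:
# print(is_probably_prime(561))          # False (a Carmichael number)
# print(is_probably_prime(2**61 - 1))    # True (a Mersenne prime)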
31.9 Integer Factorization (An Advanced Topic) 🧩
Integer factorization is the problem of finding the prime factors of a composite number. It is a very hard problem that is used as the basis for the security of the RSA cryptosystem. There are many algorithms for integer factorization, but none of them are fast enough to break modern encryption that uses very large numbers.
32.1 The Naive String-Matching Algorithm 🗺️
String matching is the problem of finding all occurrences of a shorter string, called a pattern, within a longer string, called the text. The simplest way to solve this is with the naive algorithm. It’s a straightforward approach that’s easy to understand but not very efficient for large inputs.
The algorithm works as follows:
Align the pattern with the beginning of the text.
Check each character of the pattern against the corresponding character in the text.
If all characters match, a match is found.
If a character doesn’t match, you shift the pattern over by one character and start the process again from the beginning of the new alignment.
This process continues until the pattern has been checked against every possible position in the text.
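A minimal sketch of the naive approach; the function name naive_string_match and the example strings are ours.
def naive_string_match(text, pattern):
    """Return the starting index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    for shift in range(n - m + 1):            # every possible alignment
        if text[shift:shift + m] == pattern:  # character-by-character check
            matches.append(shift)
    return matches
# Example usage:
# print(naive_string_match("abracadabra", "abra"))   # [0, 7]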
32.2 The Rabin-Karp Algorithm: Using a Hash to Skip Ahead 💨
The Rabin-Karp algorithm is a more clever string-matching algorithm that uses hashing to quickly compare a pattern to a section of the text. It works by computing a hash value for the pattern and for each section of the text that has the same length as the pattern.
The key idea is that if the hash values don’t match, the strings can’t be a match, and you can quickly skip to the next section. If the hash values do match, you perform a character-by-character comparison to be sure, since a hash collision could cause different strings to have the same hash value. The genius of the algorithm is in its ability to compute the hash for the next section of text in constant time, O(1), using a rolling hash. This makes it very fast on average.
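Here is a sketch of the rolling-hash idea. The function name rabin_karp and the choice of base and modulus are ours; the explicit string comparison on a hash match guards against collisions.
def rabin_karp(text, pattern, base=256, mod=1000000007):
    """Rolling-hash string matching; verifies candidates to rule out hash collisions."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)            # weight of the leading character in the window
    p_hash = t_hash = 0
    for i in range(m):                      # hash the pattern and the first window of text
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for shift in range(n - m + 1):
        if p_hash == t_hash and text[shift:shift + m] == pattern:
            matches.append(shift)
        if shift < n - m:
            # Roll the hash: drop text[shift], bring in text[shift + m]
            t_hash = ((t_hash - ord(text[shift]) * high) * base + ord(text[shift + m])) % mod
    return matches
# Example usage:
# print(rabin_karp("abracadabra", "abra"))   # [0, 7]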
32.3 String Matching with Finite Automata: The State Machine 🤖
Another way to solve the string-matching problem is with a finite automaton. This is a theoretical machine that has a finite number of "states" and transitions between them based on the input. We can build a finite automaton that recognizes the pattern we are searching for.
The automaton starts in a specific state and, as it reads the text one character at a time, it moves from one state to another. If the automaton ends in a specific "accept" state, it means the pattern has been found. This algorithm is very fast, as it only requires one pass through the text, running in O(n) time, where n is the length of the text.
32.4 The Knuth-Morris-Pratt Algorithm: Smart Skipping 🧠
The Knuth-Morris-Pratt (KMP) algorithm is a very efficient string-matching algorithm that avoids a key inefficiency of the naive approach. In the naive algorithm, when a mismatch occurs, you might shift the pattern back over already-checked characters. The KMP algorithm uses precomputed information about the pattern to decide how far to shift after a mismatch, ensuring that it never checks any character in the text more than once. This gives it a worst-case runtime of O(m + n), where m is the length of the pattern and n is the length of the text.
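Here is a sketch of KMP built around the pattern’s prefix (failure) table; the function name kmp_search and the example are ours.
def kmp_search(text, pattern):
    """Knuth-Morris-Pratt matching using the pattern's prefix function."""
    m = len(pattern)
    if m == 0:
        return []
    # prefix[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    prefix = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[k] != pattern[i]:
            k = prefix[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        prefix[i] = k
    # Scan the text, never moving backwards in it
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and pattern[k] != ch:
            k = prefix[k - 1]      # fall back using the prefix table
        if pattern[k] == ch:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = prefix[k - 1]      # keep looking for overlapping matches
    return matches
# Example usage:
# print(kmp_search("abracadabra", "abra"))   # [0, 7]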
33.1 Line-Segment Properties 📏
Computational geometry is a field of computer science that deals with algorithms for geometric objects like points, lines, and polygons. Many problems in this area, from computer graphics to robotics, can be solved by thinking about the spatial relationships between these objects. We’ll start with the most basic building block: a line segment. A line segment is defined by its two endpoints. To understand the relationship between line segments, we need a way to determine if they intersect or which direction one turns from another.
We can determine the direction of a turn using a simple cross-product calculation. For three points p1, p2, and p3, this calculation tells us whether p3 is to the left of the line from p1 to p2, to the right, or on the line itself. This simple calculation is a fundamental tool for solving more complex geometry problems.
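Here is that calculation as a tiny helper; the function name cross and the example points are ours.
def cross(o, a, b):
    """
    Cross product of the vectors OA and OB.
    Positive -> b is to the left of the ray o->a (a counterclockwise turn),
    negative -> to the right (a clockwise turn), zero -> the points are collinear.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
# Example usage:
# print(cross((0, 0), (1, 0), (1, 1)))   # 1 > 0: (1, 1) lies to the left of the ray (0,0)->(1,0)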
33.2 Determining Whether Any Pair of Segments Intersects 🚧
A naive way to determine if any pair of line segments in a set intersects is to check every possible pair. If there are n segments, this would take O(n^2) time. A more efficient approach, called the sweep-line algorithm, solves this problem in O(n log n) time.
The algorithm works by imagining a vertical line "sweeping" across the plane from left to right. As the line moves, it stops at specific points called event points, which are the endpoints of the line segments. The algorithm maintains a data structure that keeps track of the line segments that the sweep line currently intersects. When the sweep line hits a new event point, the algorithm checks for intersections only with the segments that are currently active. This clever approach avoids unnecessary checks and provides a much faster solution.
33.3 Finding the Convex Hull: The Outer Boundary 🖼️
The convex hull of a set of points is the smallest convex polygon that contains all the points. It’s like finding a rubber band that stretches around the entire set of points. The points on the convex hull are the "extreme" points of the set.
The Graham scan is a common algorithm for finding the convex hull. It works in a few steps:
Find the point with the lowest y-coordinate. This point is guaranteed to be on the convex hull.
Sort all other points by the angle they make with the lowest point.
Go through the sorted points, adding them to a stack. At each step, check for a "right turn" by looking at the last three points added to the stack. If there is a right turn, it means the middle point is not on the convex hull, and it is removed from the stack.
The points remaining on the stack at the end form the convex hull.
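Here is a sketch of the Graham scan as described in the steps above, assuming at least three points and no duplicates. The function name graham_scan is ours, and it uses the same cross-product test from Section 33.1 to detect right turns.
import math

def graham_scan(points):
    """Return the points of the convex hull in counterclockwise order."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    # Step 1: the lowest point (ties broken by x) is always on the hull
    start = min(points, key=lambda p: (p[1], p[0]))
    # Step 2: sort the rest by polar angle around the start point (ties by distance)
    rest = sorted((p for p in points if p != start),
                  key=lambda p: (math.atan2(p[1] - start[1], p[0] - start[0]),
                                 (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2))
    # Step 3: walk the sorted points, popping whenever the last three make a right turn
    hull = [start]
    for p in rest:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
# Example usage:
# pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
# print(graham_scan(pts))   # [(0, 0), (2, 0), (2, 2), (0, 2)]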
33.4 Finding the Closest Pair of Points: The Nearest Neighbors 🫂
The closest pair of points problem is a fundamental problem in computational geometry. The goal is to find the two points in a set that are closest to each other. A naive approach would be to calculate the distance between every pair of points, but this takes O(n^2) time.
A much more efficient divide-and-conquer algorithm solves this in O(n log n) time, in three steps (a code sketch follows them).
Divide: Sort the points and split them into two equal halves with a vertical line.
Conquer: Recursively find the closest pair in the left half and the right half.
Combine: The closest pair is either in the left half, in the right half, or it’s a pair with one point in each half. We then find the shortest distance in a narrow "strip" around the dividing line, where the points could potentially be closer than the shortest distance found so far. By checking a limited number of points in this strip, the algorithm achieves a fast overall runtime.
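Here is a hedged Python sketch of the divide-and-conquer idea, with points as (x, y) tuples. For simplicity it re-sorts the strip by y at every level, which gives roughly O(n log^2 n) rather than the full O(n log n); the textbook version avoids this by also presorting the points by y.

    from math import dist   # Euclidean distance, available since Python 3.8

    def closest_pair_distance(points):
        """Smallest distance between any two of the given points (needs at least two)."""
        def solve(px):                       # px is sorted by x-coordinate
            n = len(px)
            if n <= 3:                       # tiny inputs: brute force
                return min(dist(px[i], px[j])
                           for i in range(n) for j in range(i + 1, n))
            mid = n // 2
            mid_x = px[mid][0]
            # Conquer: best distance found entirely inside either half.
            best = min(solve(px[:mid]), solve(px[mid:]))
            # Combine: only points within `best` of the dividing line can do better.
            strip = sorted((p for p in px if abs(p[0] - mid_x) < best),
                           key=lambda p: p[1])
            for i, p in enumerate(strip):
                for q in strip[i + 1:]:
                    if q[1] - p[1] >= best:  # every later point is too far above p
                        break
                    best = min(best, dist(p, q))
            return best

        return solve(sorted(points))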
34.1 The Realm of Polynomial Time ⏳
When we talk about an algorithm’s efficiency, we often use polynomial time as a benchmark for what is "fast" or "reasonable". An algorithm runs in polynomial time if its runtime is bounded by a polynomial function of the input size, such as O(n), O(n^2), or O(n^3). These runtimes grow slowly enough that the algorithm stays practical even as the input gets large. Problems that can be solved by a polynomial-time algorithm are considered tractable. Most of the algorithms we have studied so far, like sorting and shortest-path algorithms, are in this category.
34.2 Polynomial-Time Verification: The Easy Check ✅
Some problems are very hard to solve, but if you’re given a potential solution, it’s very easy to check if that solution is correct. A problem has the property of polynomial-time verification if a solution can be checked in polynomial time.
For example, consider the Traveling Salesman Problem (TSP). The goal is to find the shortest possible route that visits every city exactly once and returns to the origin. Finding the shortest route for many cities is incredibly difficult. However, if you’re given a proposed route and a target length, you can quickly check that the route visits every city exactly once and that its total length is within the target, simply by summing the lengths of all the connections in the route. Problems whose solutions can be verified in polynomial time are in a complexity class called NP.
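As a small illustration of polynomial-time verification, the hypothetical checker below takes a dictionary mapping city names to coordinates, a proposed tour, and a target length, and confirms in a single pass that the tour visits each city exactly once and stays within the target. The name verify_tour and its inputs are invented for this example.

    from math import dist

    def verify_tour(cities, tour, target_length):
        """Polynomial-time check of a proposed TSP tour against a target length."""
        if sorted(tour) != sorted(cities):            # every city exactly once
            return False
        # Sum the legs of the tour, including the leg back to the starting city.
        total = sum(dist(cities[a], cities[b])
                    for a, b in zip(tour, tour[1:] + tour[:1]))
        return total <= target_length

Notice that this checks a given answer quickly; it says nothing about how hard it was to find that answer, which is exactly the distinction NP captures.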
34.3 NP-Completeness and Reducibility 🧩
Some problems in NP are special: they are the "hardest" problems in the group. A problem is NP-complete if it satisfies two conditions:
It is in NP.
Any other problem in NP can be transformed, or reduced, to it in polynomial time.
The concept of reducibility is key here. If problem A can be reduced to problem B, it means that if you have a way to solve B, you can use it to solve A. This is a way of saying that problem A is "no harder than" problem B. If we can show that a problem is NP-complete, it means that if we could find a polynomial-time algorithm for it, we could use that algorithm to solve every other NP problem in polynomial time as well. This is a major open question in computer science, known as the P vs. NP problem.
34.4 NP-Completeness Proofs: A Domino Effect 🁢
To prove that a new problem is NP-complete, you don’t have to show that it can be reduced from all other NP problems. You just need to show two things:
The problem is in NP.
You can reduce a known NP-complete problem to your new problem in polynomial time.
This second step is a powerful technique. It’s like a domino effect: if you prove that one problem is NP-complete, you can use it as a basis to prove that other problems are also NP-complete. The first NP-complete problem ever discovered was the circuit satisfiability problem.
34.5 NP-Complete Problems: The Family of Hardness 🤯
There are thousands of problems that are known to be NP-complete. They appear in many different areas of computer science and mathematics. Some well-known examples include:
The Traveling Salesman Problem.
The Vertex-Cover Problem.
The Knapsack Problem (deciding whether some selection of items reaches a target total value without exceeding a weight limit).
The Subset-Sum Problem.
Since no one has found a polynomial-time algorithm for any NP-complete problem, if you encounter one, it’s a sign that you should not try to find an exact, fast solution. Instead, you should look for an alternative approach, like an approximation algorithm that finds a solution that is "close enough".
For problems that are NP-complete, finding an exact, optimal solution in a reasonable amount of time is likely impossible 🤯. In these cases, we often don’t need the absolute best answer; a solution close to the optimal one will do. An approximation algorithm is a method for finding such a solution efficiently, with a guaranteed quality, often within a fixed factor of the optimal solution.
35.1 The Vertex-Cover Problem: The Smallest Set of Guards 💂
The vertex-cover problem is a classic NP-complete problem. In a graph, a vertex cover is a set of nodes such that every connection (edge) in the graph has at least one of its endpoints in the set. The goal is to find the smallest possible vertex cover.
A simple and efficient approximation algorithm can find a vertex cover that is at most twice the size of the optimal one. The algorithm works as follows:
Initialize an empty set for the vertex cover.
While there are still edges in the graph, choose an arbitrary edge.
Add both of the edge’s endpoints to the vertex cover set.
Remove all edges connected to these two endpoints from the graph.
Repeat until no edges remain.
Although this procedure simply picks an arbitrary edge at each step, the edges it selects never share an endpoint, and any vertex cover must contain at least one endpoint of each of them; that is why the cover it builds is never more than twice the size of the smallest one.
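A minimal Python sketch of the procedure, with the graph given simply as a list of (u, v) edge pairs; the function name is a choice made for this illustration.

    def approx_vertex_cover(edges):
        """Return a vertex cover at most twice the size of an optimal one."""
        cover = set()
        remaining = list(edges)
        while remaining:
            u, v = remaining[0]               # choose an arbitrary remaining edge
            cover.update((u, v))              # add both of its endpoints
            # discard every edge that is now covered by u or v
            remaining = [(a, b) for a, b in remaining
                         if a not in (u, v) and b not in (u, v)]
        return cover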
35.2 The Traveling-Salesman Problem (TSP): The Shortest Route 🌍
The Traveling-Salesman Problem (TSP) is one of the most famous NP-complete problems. The goal is to find the shortest possible tour that visits a set of cities exactly once and returns to the starting city. For a small number of cities, you can check every possible route, but the number of routes grows so quickly that checking them all becomes infeasible once there are more than a handful of cities.
For a version of TSP where the cities are in a plane and the distances obey the triangle inequality, we can use an approximation algorithm that uses a minimum spanning tree.
Find the minimum spanning tree of the graph of cities.
Perform a preorder walk of the MST, listing each city the first time the walk reaches it, and then return to the starting city to close the tour.
This method gives a tour that is guaranteed to be at most twice the length of the optimal tour, which is a powerful and reliable guarantee for a problem that is so difficult to solve exactly.
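Here is one way this might look in Python for cities given as (x, y) points: a simple O(n^2) form of Prim's algorithm builds the minimum spanning tree, and a preorder walk of that tree gives the order in which to visit the cities. The function name and the choice of the O(n^2) Prim are assumptions made for this sketch, which needs at least one city.

    from math import dist

    def approx_tsp_tour(points):
        """Return the city indices in visiting order, ending back at city 0.
        Under the triangle inequality the tour is at most twice the optimal length."""
        n = len(points)
        in_tree = [False] * n
        parent = [0] * n
        best = [float("inf")] * n
        best[0] = 0.0
        children = [[] for _ in range(n)]
        for _ in range(n):                          # Prim's algorithm, O(n^2)
            u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
            in_tree[u] = True
            if u != 0:
                children[parent[u]].append(u)       # record the MST edge parent[u]-u
            for v in range(n):
                if not in_tree[v]:
                    d = dist(points[u], points[v])
                    if d < best[v]:
                        best[v], parent[v] = d, u
        tour, stack = [], [0]
        while stack:                                # preorder walk of the MST
            u = stack.pop()
            tour.append(u)
            stack.extend(reversed(children[u]))
        return tour + [0]                           # return to the starting city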
35.3 The Set-Covering Problem: The Fewest Sources 🗺️
The set-covering problem is another NP-complete problem that has many real-world applications. Imagine you have a set of items that you want to cover, and you have a collection of sets, each containing some of these items. The goal is to find the smallest number of sets that, when combined, cover all the items. This is useful for problems like deciding where to place fire stations to cover an entire city or where to install cell towers to cover a geographical area.
A greedy approximation algorithm can solve this problem by repeatedly choosing the set that covers the largest number of uncovered items. This algorithm provides a solution that is close to the optimal one.
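A short Python sketch of the greedy rule, where universe is the set of items to cover and subsets maps a name (for example, a candidate fire-station site) to the set of items it would cover; the names and data layout are assumptions made for this illustration.

    def greedy_set_cover(universe, subsets):
        """Pick subsets until everything is covered, always taking the one
        that covers the most still-uncovered items."""
        uncovered = set(universe)
        chosen = []
        while uncovered:
            name = max(subsets, key=lambda s: len(subsets[s] & uncovered))
            if not subsets[name] & uncovered:
                raise ValueError("the given subsets cannot cover every item")
            chosen.append(name)
            uncovered -= subsets[name]
        return chosen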
35.4 Randomization and Linear Programming: A New Approach 🎲
Some approximation algorithms use randomization or linear programming to find a good solution. A common recipe is to model the problem as a linear program, solve that relaxed version efficiently, and then, if the solution contains fractional values, round them to a valid integer solution; randomized rounding makes each rounding decision with a probability guided by the fractional value.
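As a sketch of the "relax, solve, round" recipe, the code below writes the vertex-cover problem as a linear program, solves the relaxation with SciPy's linprog (an external dependency assumed here), and then rounds every variable whose value is at least one half up to 1. This particular rounding is deterministic rather than randomized, but it follows the same pattern, and because every edge constraint forces at least one of its two variables to reach one half, the rounded set covers every edge and is at most twice the optimal size.

    from scipy.optimize import linprog   # assumed available; any LP solver would do

    def lp_rounding_vertex_cover(n, edges):
        """Vertex cover for vertices 0..n-1 via LP relaxation plus threshold rounding."""
        if not edges:
            return set()
        c = [1.0] * n                                  # minimize how many vertices we pick
        A_ub, b_ub = [], []
        for u, v in edges:
            row = [0.0] * n                            # constraint x_u + x_v >= 1,
            row[u] = row[v] = -1.0                     # written as -x_u - x_v <= -1
            A_ub.append(row)
            b_ub.append(-1.0)
        result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
        # Rounding: in every constraint at least one variable is >= 1/2,
        # so taking all such vertices covers every edge.
        return {v for v, x in enumerate(result.x) if x >= 0.5}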
35.5 The Subset-Sum Problem: The Exact Total 💰
The subset-sum problem is to determine if a subset of a given set of numbers sums to a specific target value. This problem is NP-complete. An approximation algorithm for this problem can find a subset whose sum is "close" to the target value. The algorithm can be made to run faster by using a clever method that trims the list of possible sums to keep its size manageable.
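The sketch below follows that trimming idea for a list of positive integers: it keeps a sorted list of achievable sums, merges in each new number, and drops any sum that is within a factor of (1 + delta) of the one kept just before it, where delta is derived from the desired accuracy eps. It returns the largest achievable sum that does not exceed the target, which stays within a factor of (1 + eps) of the best possible such sum; the function name and the assumption of at least one number are choices for this illustration.

    def approx_subset_sum(numbers, target, eps):
        """Largest subset sum not exceeding target, within a factor (1 + eps) of optimal."""
        sums = [0]                                   # achievable sums, kept sorted and trimmed
        delta = eps / (2 * len(numbers))
        for x in numbers:
            merged = sorted(set(sums + [s + x for s in sums]))
            trimmed = [merged[0]]
            for s in merged[1:]:
                if s > trimmed[-1] * (1 + delta):    # keep s only if it is "new enough"
                    trimmed.append(s)
            sums = [s for s in trimmed if s <= target]   # never overshoot the target
        return max(sums)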
This book is dedicated to Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
Their work, "Introduction to Algorithms," is not merely a book; it is the undisputed bible of the field and the gold standard for clarity and rigor. In this spirit, we have adopted their canonical structure, using the same chapter and section numbers as a guiding framework. This choice is deliberate, meant to spare us the task of charting a new path where a perfect one already exists. Instead, our goal is to build upon their profound insights, making their monumental work more accessible to a new generation of thoughtful learners. This is an homage to their dedication, and it is our hope that by following their lead, we can make the journey into the world of algorithms easier and more intuitive for everyone.