Grokking Algorithms An illustrated guide for programmers and other curious people, by Aditya Y. Bhargava
Chapter 4: Quicksort
Chapter 5: Hash tables
Chapter 6: Breadth-first search
If you read any of these chapters, please go ahead and discuss what you thought in the comments!
📝 you can read my notes below.
📆 The next post will be on chapters 7 - 8. I'll aim for August 2023.
If you're on Storygraph and want to join me, say hi and I'll start a Buddy Read.
Quicksort
key topics in this chapter: base case
, recursive case
, divide and conquer
, big O notation
, average vs worst case
...
Base and recursive cases for binary search
A base case is the simplest scenario for an algorithm.
The base case for binary search is an array with 1 item, that is
true
(matches) orfalse
(doesn't match).The recursive case for binary search, is to split the array in half, then call binary search on the true half.
Average case vs. worst case
The pivot you choose will impact quicksort's performance!
- A worst case scenario is a pivot of 1 on a sorted scenario O(n)
- It's much faster to choose the middle as the pivot, which is the best case scenario O(log n)
- The best case scenario is the average case! If you always choose a random element in an array as the pivot, quicksort will complete in _O(n log n) time (in average)
Hash Tables
Key topics covered Common hash table use cases
, hash functions
, caching
, collisions
, load factor
...
Hash functions
- A function where you put in a sequence of data and you get back a number: it maps 'strings' to numbers
- the number you get back is consistent
- no two 'strings' are mapped to the same number
Hash tables
This data structure has logic behind it, unlike an array which maps straight to memory. A hash function and an array together make a hash table.
some examples from standard code libraries...
Use cases
- Contacts/phonebook
- DNS resolution (mapping a web address to an IP address)
- Limiting survey submissions to one per user
- Caching (websites remembering data instead of recalculating it)
Performance
- On average, hash tables take O(1) which is constant time; the time taken will stay the same, regardless of the size of the table.
- In the worst case, they take linear time for everything, O(n), which is really slow.
A collision is when two keys are assigned to the same slot. If this happens, start a linked list in that slot. To avoid collisions, you need a low load factor and a good hash function.
Load factor measures how many empty slots remain in the hash table. Use resizing when the table is getting full, e.g. when the load factor is >0.7
load factor = number of items in hash table/total number of slots
Breadth-first Search
This chapter covers graphs
, network modelling
, data structures
, breadth-first search
(BFS
), queues
, directed graphs
, undirected graphs
, topological sort
, dependencies
between nodes...
What is a graph?
A graph models a set of connections, it's made up of nodes and edges, where connected nodes are called neighbors.
Breadth-first Search (BFS)
BFS is an algorithm to solve the "shortest-path" problem (e.g. how to get from A to B with the fewest transport changes). To use BFS you need to,
- Model a problem as a graph
- Solve the problem with BFS
BFS can answer
- Is there a path from node A to node B? (Search all 1st degree neighbors, then all 2nd degree connections, etc...)
- What is the shortest path from node A to node B? (Searching all connections in order of their degree to the starting node, will find the shortest path)
Queues
Are have a "first in, first out" structure (FIFO) structure and have only two operations, enqueue and dequeue
Implementing the graph
A hash table data structure allows you to represent relationships, by mapping a node (the key) to all its neighbors (the value).
- An undirected graph doesn't have any arrows, and a directed graph does.
- Making an ordered list out of a graph is a topological sort.
- If all directions are one-way, the graph is a tree.
- nodes with shared neighbors will cause an infinate loop, so you need to check off nodes that have been processed.
Implementing the algorithm
Code example in Python
from collections import dequeue
queue = dequeue() # create a queue
queue += graph["Ruth"] # add all my neighbors to the queue
searched[] # keep track of searched nodes
# While the queue isnt empty
while queue:
person = queue.popleft() # pop a node from the queue
if not name in searched :
if friend_match_criteria(name) :
print name + " matches the criteria!"
return True
else:
queue += graph[name] # enqueue all the node's neighbors to the queue
searched.append(name) # check off this node so it's not re-processed
return False # no friends met the criteria :(
def friend_match_criteria(name) :
return name[-1] == '!' # checks if their name ends in an exclaimation
Running time
- The running time will be at least O(number of edges)
- Adding a neighbor node to the queue takes constant time O(1), so doing this for every node or vertice will be O(number of vertices)
- Breadth-first search takes O(number of vertices + number of edges) which is written as O(V+E)
Key Takeaways
-
"Divide and conquer" in algorithms land is breaking a problem into smaller & smaller chunks
- the base case is probably an empty array/array with 1 element
Choose a random element for quicksort's pivot since the average runtime of quicksort is O(n log n)
Sometimes the constant in Big O matters
but it almost never matters for simple Vs binary searchUsing a lang's hash table will give you average case performance of constant time (the same amount of time regardless of the map size)
Collisions are bad, a good hash function will avoid them
-
Hashes are good for
- Catching duplicates
- Caching data
- Mapping a key to a value
- Fast search, insert, and delete behaviours
-
Breadth-first search (BFS) tells you if there's a path from A to B, and the shortest path
- the search list needed to be a queue
- nodes should not be re-checked (to avoid infinite looping)
An undirected graph has no arrows and goes both ways. A directed graph has arrows. If all arrows go one-way, the graph is a tree
Queues are FIFO (first in, first out) and stacks are LIFO (last in, first out)