Graphs - Minimum Spanning Trees
Introduction
- Examine 2 algorithms for finding the Minimum Spanning Tree (MST) of a
graph
- Prim's and Kruskal's Algorithms
- Both are Greedy Algorithms
Some Graph Terminology
- Graph consists of
- V = Set of Vertices (aka nodes)
- E = Set of Edges
- Edges:
- Edges connect 2 vertices
- Specified by a pair of vertices
- Weighted Graph: each edge has a (positive) weight
- Directed graphs:
- Edges have direction
- For example: distinguish between (A,B) and (B,A)
- Represent direction with arrowhead
- Undirected graphs:
- Edges have no direction
- For example: do NOT distinguish between (A,B) and (B,A)
- Edge has NO arrowhead
- Path between two nodes
- Path between two nodes: sequence of nodes and edges
- Begins and ends with a node
- Each edge connects the node preceding and following it
- In a directed graph, a path must follow arrows
- Connected graph: a path exists between every pair of nodes
- Unconnected graph: Some pair of nodes has no path between them
- Cycle: path that begins and ends at same node
- Cyclic graph: graph that contains a cycle
- Acyclic graph: graph that contains NO cycles
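The terminology above can be made concrete with a small sketch (Python here; the adjacency-dict layout and the helper name is_path are my choices, not part of the notes; the edge weights are borrowed from the Kruskal example later on):

```python
# A weighted undirected graph as an adjacency dict:
# each vertex maps to {neighbor: weight}.
graph = {
    "A": {"B": 10, "C": 12},
    "B": {"A": 10, "C": 9, "D": 8},
    "C": {"A": 12, "B": 9},
    "D": {"B": 8},
}

# Undirected: the edge (A, B) is stored in both directions,
# so we do not distinguish (A, B) from (B, A).
assert graph["A"]["B"] == graph["B"]["A"]

# A path is a sequence of nodes in which each consecutive
# pair of nodes is joined by an edge.
def is_path(g, nodes):
    return all(v in g[u] for u, v in zip(nodes, nodes[1:]))

print(is_path(graph, ["A", "B", "D"]))  # True: a valid path
print(is_path(graph, ["A", "D"]))       # False: no edge A-D
```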
Example Problems
- Set of cities connected by roads
- roads need repair before they can be used
- Each road connects exactly two cities
- Each road has a repair cost
- Problem: find lowest cost set of roads to repair so that all cities are connected
- connected means there is a path between each pair of cities
- This is a minimum spanning tree for the graph
- Nodes are a set of pins in an electronic circuit
- Goal is to connect all pins with minimal wire
- Edges are possible connections
- Edge weights are distances between pins
- MST gives solution that connects pins with least wire
MST Definition
- An MST is a minimum-weight tree that contains all nodes
of an undirected graph
MST Example
- Example (from CLRS IM):
- MST in green
- Is the MST unique?
MST Properties
- Graph is undirected
- MST is a TREE
- A tree is a connected acyclic graph: it has no cycles (ie no closed paths)
- Number of edges in a tree: |V| - 1
- MST is a SPANNING Tree
- Nodes of MST = Nodes of G
- MST contains a path between any two nodes
- MST is a MINIMUM Spanning Tree
- Sum of edges is a minimum
- MST may not be unique
- In a directed graph, the related problem is finding a tree that has
exactly one path from the root to each node. We don't consider this problem.
MST: Algorithms
- This algorithm shows the overall approach:
MST(G)
M := the empty graph
while M is not a MST of G loop
find an edge E in G that is in some MST of G, but not in M
add E to M (unless E makes a cycle)
end loop
return M
The trick with this algorithm is finding E
Two common MST algorithms find E in different ways:
- Prim's Algorithm
- Kruskal's Algorithm
Prim's and Kruskal's algorithms are Greedy Algorithms
- At each step, each algorithm adds the best available edge to M
MST: Kruskal's Algorithm
- Assume G is undirected, connected, weighted
- Kruskal's algorithm:
- M is a forest of trees
- E is smallest edge that joins two trees in the forest
- This reduces the number of trees by 1 at each step
- Kruskal's algorithm:
Kruskal(G, w) -- G: Graph; w: weights
M := empty set
make a singleton vertex set from each vertex in G
sort the edges of G into non-decreasing order
while M has fewer than |V| - 1 edges loop
(u, v) := next edge of G (from sorted order list)
if sets containing u and v are different then
add (u, v) to M
merge vertex sets containing u and v
end if
end loop
Tracing Kruskal's Algorithm
- Initialize
- M = {}
- vertex sets: {A},{B},{C},{D},{E},{F},{G},{H},{I}
- edgelist, sorted by weight: 1CF, 2GI, 3EF, 3CE, 5DH, 6FH, 7DE,
8BD, 9BC, 9GH, 10AB, 11HI, 12AC
- (u, v) = (C, F)
- C and F are in different vertex sets (ie {C} and {F})
- Add CF to M: M = {CF} (omit the comma from the edge)
- Merge {C} and {F}: {A},{B},{C,F},{D},{E},{G},{H},{I}
- (u, v) = (G, I)
- G and I are in different vertex sets (ie {G} and {I})
- Add GI to M: M = {CF, GI}
- Merge {G} and {I}: {A},{B},{C,F},{D},{E},{G,I},{H}
- (u, v) = (E, F)
- E and F are in different vertex sets (ie {E} and {C,F})
- Add EF to M: M = {CF, EF, GI}
- Merge {E} and {C,F}: {A},{B},{C,E,F},{D},{G,I},{H}
- (u, v) = (C, E)
- C and E are both in {C,E,F}: don't add CE since it would create a cycle
- Another look at the example:
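The pseudocode and trace above can be checked with a short Python sketch (function names and the parent-map representation of the vertex sets are mine; a stable sort on weight alone keeps the slide's tie order, EF before CE):

```python
def kruskal(vertices, edges):
    """edges: list of (weight, u, v); returns the MST edge list M."""
    parent = {v: v for v in vertices}     # make a singleton set per vertex

    def find(v):                          # walk up to the set representative
        while parent[v] != v:
            v = parent[v]
        return v

    m = []
    # Stable sort on weight only: ties keep their input order.
    for w, u, v in sorted(edges, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:                      # different trees, so no cycle
            m.append((u, v))
            parent[ru] = rv               # merge the two vertex sets
    return m

# The edge list from the trace: 1CF, 2GI, 3EF, 3CE, 5DH, 6FH, 7DE,
# 8BD, 9BC, 9GH, 10AB, 11HI, 12AC
edges = [(1, "C", "F"), (2, "G", "I"), (3, "E", "F"), (3, "C", "E"),
         (5, "D", "H"), (6, "F", "H"), (7, "D", "E"), (8, "B", "D"),
         (9, "B", "C"), (9, "G", "H"), (10, "A", "B"), (11, "H", "I"),
         (12, "A", "C")]
mst = kruskal("ABCDEFGHI", edges)
print(mst)  # begins ("C","F"), ("G","I"), ("E","F"), matching the trace
```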
Kruskal's Algorithm: How it works
- For each set of nodes S, M contains a tree that connects the nodes in S
- Alternatively: the edges (ie trees) in M divide the nodes of G into sets
- That is, the nodes for each tree, plus remaining singleton sets
- Why does it produce a tree:
- Start with |V| sets
- Each edge added to M merges 2 sets of nodes
- This reduces the number of sets by 1
- Thus, the number of sets is reduced |V|-1 times
- Thus, |V|-1 edges are added
- Thus, M is a connected graph with |V|-1 edges
- Thus, M is a tree
- Another way of looking at it:
- Each set of nodes is connected by a tree in M
- At each step, adding an edge connects two trees without making a loop (why?)
- Thus the final M is a tree
- Why does it produce a spanning tree:
- Each vertex starts in a set and ends in the final set of nodes
- Thus the final set contains all nodes and M is a tree that connects
them
Kruskal's Algorithm: Why does it work
- Why does it produce a minimal spanning tree (proof outline):
- Assume M1 and M2 are MSTs of two disjoint sets of nodes A1 and A2
- Assume the algorithm adds edge e=(u, v) that connects M1 and M2
- e is the smallest edge that connects M1 and M2 (why?)
- M1+M2+e is an MST for A1 + A2
- If not, there is a spanning tree of A1 + A2 with smaller total weight
- That tree must contain a smaller tree on A1, a smaller tree on A2,
or a smaller edge connecting them
- Each case contradicts an assumption, so we get a contradiction
Kruskal's Algorithm - Performance
- Sort edges: O(E lg E)
- Create, Find, and Union Sets:
- Initialization: O(V) make-set operations
- Main loop: O(E) find-set and union-set operations
- Total: O((V+E) α(V))
- Where α(n) is a very, very slow growing function
- α(n) ≤ 4 for n < 10^80
- Assumes use of a fast data structure for finding which disjoint set
a node belongs to, and for merging sets
- Store each set of nodes in a tree
- Use Union by rank and Path compression
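A minimal version of that structure might look like this in Python (class and method names are assumptions; union by rank plus path compression is what gives the near-constant α(V) cost per operation):

```python
class DisjointSet:
    def __init__(self, items):
        self.parent = {x: x for x in items}  # each item starts in its own set
        self.rank = {x: 0 for x in items}    # rank: upper bound on tree height

    def find(self, x):
        # Path compression: repoint x and its ancestors at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                     # already in the same set
        # Union by rank: attach the shorter tree under the taller one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet("ABCDEF")
ds.union("C", "F")
ds.union("E", "F")
print(ds.find("C") == ds.find("E"))  # True: C, E, F now share one set
```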
- Total: O(E lg E) + O((V+E) α(V))
- = O(E lg E) + O(E α(V)), since |E| ≥ |V| - 1 (G is connected)
- = O(E lg E) + O(E lg E), since α(V) = O(lg V) = O(lg E)
- = O(E lg E)
- = O(E lg V), since |E| ≤ |V|^2 ⇒ lg E ≤ 2 lg V = O(lg V)
- If edges are already sorted: O(E α(V)), almost linear
MST: Prim's Algorithm
- Assume G is undirected, connected, weighted
- Basic approach: M is part of a MST, and E is the smallest edge that
can be connected to the tree
- Produces MST rooted at r
- v.π gives predecessor of node v in MST
- v.π is null for root and for nodes not yet on MST
- MST is the set of edges (v, v.π) for all nodes v in V - {r}
- The MST edge set is never built explicitly
- For all v in Q (ie not yet on MST), v.key is weight of the smallest edge
(so far) that connects v to current MST
- v.key = Integer'Last for all nodes that have no edge to current MST
- Basic algorithm (based on CLRS IM):
PRIM(G: Graph, w: weights, r: start node)
Q := empty priority queue
for each vertex u in G loop
u.key := Integer'Last
u.π := nil
enqueue(Q, u)
end loop
decreasekey(Q, r, 0) -- r.key := 0, moves node to front of queue
while not isempty(Q) loop
u := front(Q) -- get/remove node w/ smallest key. u is on MST
-- Add u to (virtual) MST
for each v adjacent to u in G loop
-- is (u,v) a better edge to connect node v to current MST
if v in Q and w(u, v) < v.key then
v.π := u
decreasekey(Q, v, w(u,v)) -- v.key = w(u,v)
end if
end loop
end loop
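The pseudocode above can be sketched in Python. The standard library's heapq has no decrease-key, so this version pushes a fresh queue entry and skips stale ones instead (a common workaround, not the CLRS priority queue; all names are mine):

```python
import heapq

def prim(graph, root):
    """graph: {u: {v: weight}}; returns MST edges as (parent, child) pairs."""
    key = {v: float("inf") for v in graph}  # best edge weight to tree so far
    pred = {v: None for v in graph}         # v.pi in the pseudocode
    key[root] = 0
    pq = [(0, root)]                        # (key, vertex); duplicates allowed
    in_mst = set()
    while pq:
        k, u = heapq.heappop(pq)
        if u in in_mst:                     # stale entry: already extracted
            continue
        in_mst.add(u)                       # add u to the (virtual) MST
        for v, w in graph[u].items():
            # Is (u, v) a better edge to connect v to the current MST?
            if v not in in_mst and w < key[v]:
                key[v] = w                  # "decrease-key" by pushing anew
                pred[v] = u
                heapq.heappush(pq, (w, v))
    return [(pred[v], v) for v in graph if pred[v] is not None]

g = {
    "A": {"B": 10, "C": 12},
    "B": {"A": 10, "C": 9, "D": 8},
    "C": {"A": 12, "B": 9},
    "D": {"B": 8},
}
mst = prim(g, "A")
print(mst)  # [("A", "B"), ("B", "C"), ("B", "D")]
```

Each vertex can be pushed once per incident edge, so the queue holds O(E) entries and the total cost stays O(E lg V), the same bound as the binary-heap analysis below.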
Prim's Algorithm - Trace
- Let's trace it
- You pick a root
Prim's Algorithm - Performance
- Performance depends on implementation of priority queue:
- binary heap implementation: O(E lg V)
- Fibonacci heap implementation: O(V lg V + E)
- Option 1: implement priority queue with binary heap
- Cost of initializing Q and first for loop: O(V lg V)
- Cost of decreasing key of r: O(lg V)
- While loop:
- V calls to front (=extract min) → O(V lg V)
- E (or fewer) calls to decrease key → O(E lg V)
- Total for loop: O((V + E) lg V)
- Overall Total: O((V + E) lg V) = O(E lg V),
- Since |E| ≥ |V| - 1, because G is connected
- Option 2: implement priority queue with Fibonacci heap
- Decrease key can be done in O(1) (amortized) time
- Amortized means the cost is averaged over a sequence of operations
- A single decrease key may take longer, but the average over all calls is O(1)
- E Calls to decrease key take O(E) * O(1) = O(E) time
- Total: O(V lg V + E)