Graphs - Minimum Spanning Trees
Introduction
- Examine 2 algorithms for finding the Minimum Spanning Tree (MST) of a
graph
- Prim's and Kruskal's Algorithms
- Both are Greedy Algorithms
Some Graph Terminology
- Graph consists of
- V = Set of Vertices (aka nodes)
- E = Set of Edges
- Edges:
- Edges connect 2 vertices
- Specified by a pair of vertices
- Weighted Graph: each edge has a (positive) weight
- Directed graphs:
- Edges have direction
- For example: distinguish between (A,B) and (B,A)
- Represent direction with arrowhead
- Undirected graphs:
- Edges have no direction
- For example: do NOT distinguish between (A,B) and (B,A)
- Edge has NO arrowhead
- Path between two nodes
- Path between two nodes: sequence of nodes and edges
- Begins and ends with a node
- Each edge connects the node preceding and following it
- In a directed graph, a path must follow arrows
- Connected graph: a path exists between every pair of nodes
- Unconnected graph: Some pair of nodes has no path between them
- Cycle: path that begins and ends at same node
- Cyclic graph: graph that contains a cycle
- Acyclic graph: graph that contains NO cycles
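The terminology above can be made concrete with a small sketch (Python here; the adjacency-dict layout and the helper name is_path are my choices, not part of the notes; the edge weights are borrowed from the Kruskal example later on):

```python
# A weighted undirected graph as an adjacency dict:
# each vertex maps to {neighbor: weight}.
graph = {
    "A": {"B": 10, "C": 12},
    "B": {"A": 10, "C": 9, "D": 8},
    "C": {"A": 12, "B": 9},
    "D": {"B": 8},
}

# Undirected: the edge (A, B) is stored in both directions,
# so we do not distinguish (A, B) from (B, A).
assert graph["A"]["B"] == graph["B"]["A"]

# A path is a sequence of nodes in which each consecutive
# pair of nodes is joined by an edge.
def is_path(g, nodes):
    return all(v in g[u] for u, v in zip(nodes, nodes[1:]))

print(is_path(graph, ["A", "B", "D"]))  # True: a valid path
print(is_path(graph, ["A", "D"]))       # False: no edge A-D
```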
Example Problems
- Set of cities connected by roads
- roads need repair before they can be used
- Each road connects exactly two cities
- Each road has a repair cost
- Problem: find lowest cost set of roads to repair so that all cities are connected
- connected means there is a path between each pair of cities
- This is a minimum spanning tree for the graph
- Nodes are a set of pins in an electronic circuit
- Goal is to connect all pins with minimal wire
- Edges are possible connections
- Edge weights are distances between pins
- MST gives solution that connects pins with least wire
MST Definition
- An MST is a minimum-weight tree that contains all nodes
of an undirected graph
MST Example
- Example (from CLRS IM):
- MST in green
- Is the MST unique?
MST Properties
- Graph is undirected
- MST is a TREE
- A tree is a connected acyclic graph: it has no cycles (ie no closed paths)
- Number of edges in a tree: |V| - 1
- MST is a SPANNING Tree
- Nodes of MST = Nodes of G
- MST contains a path between any two nodes
- MST is a MINIMUM Spanning Tree
- Sum of edges is a minimum
- MST may not be unique
- In a directed graph, the related problem is finding a tree that has
exactly one path from the root to each node. We don't consider this problem.
MST: Algorithms
- This algorithm shows the overall approach:
MST(G)
M := the empty graph
while M is not a MST of G loop
find an edge E in G that is in some MST of G, but not in M
add E to M (unless E makes a cycle)
end loop
return M
The trick with this algorithm is finding E
Two common MST algorithms find E in different ways:
- Prim's Algorithm
- Kruskal's Algorithm
Prim's and Kruskal's algorithms are Greedy Algorithms
- At each step, each algorithm adds the best available edge to M
MST: Kruskal's Algorithm
- Assume G is undirected, connected, weighted
- Kruskal's algorithm:
- M is a forest of trees
- E is smallest edge that joins two trees in the forest
- This reduces the number of trees by 1 at each step
- Kruskal's algorithm:
Kruskal(G, w) -- G: Graph; w: weights
M := empty set
make a singleton vertex set from each vertex in G
sort the edges of G into non-decreasing order
while M has fewer than |V| - 1 edges loop
(u, v) := next edge of G (from sorted order list)
if sets containing u and v are different then
add (u, v) to M
merge vertex sets containing u and v
end if
end loop
Tracing Kruskal's Algorithm
- Initialize
- M = {}
- vertex sets: {A},{B},{C},{D},{E},{F},{G},{H},{I}
- edgelist, sorted by weight: 1CF, 2GI, 3EF, 3CE, 5DH, 6FH, 7DE,
8BD, 9BC, 9GH, 10AB, 11HI, 12AC
- (u, v) = (C, F)
- C and F are in different vertex sets (ie {C} and {F})
- Add CF to M: M = {CF} (omit the comma from the edge)
- Merge {C} and {F}: {A},{B},{C,F},{D},{E},{G},{H},{I}
- (u, v) = (G, I)
- G and I are in different vertex sets (ie {G} and {I})
- Add GI to M: M = {CF, GI}
- Merge {G} and {I}: {A},{B},{C,F},{D},{E},{G,I},{H}
- (u, v) = (E, F)
- E and F are in different vertex sets (ie {E} and {C,F})
- Add EF to M: M = {CF, EF, GI}
- Merge {E} and {C,F}: {A},{B},{C,E,F},{D},{G,I},{H}
- (u, v) = (C, E)
- C and E are both in {C,E,F}: don't add CE since it would create a cycle
- Another look at the example:
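The pseudocode and trace above can be checked with a short Python sketch (function names and the parent-map representation of the vertex sets are mine; a stable sort on weight alone keeps the slide's tie order, EF before CE):

```python
def kruskal(vertices, edges):
    """edges: list of (weight, u, v); returns the MST edge list M."""
    parent = {v: v for v in vertices}     # make a singleton set per vertex

    def find(v):                          # walk up to the set representative
        while parent[v] != v:
            v = parent[v]
        return v

    m = []
    # Stable sort on weight only: ties keep their input order.
    for w, u, v in sorted(edges, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:                      # different trees, so no cycle
            m.append((u, v))
            parent[ru] = rv               # merge the two vertex sets
    return m

# The edge list from the trace: 1CF, 2GI, 3EF, 3CE, 5DH, 6FH, 7DE,
# 8BD, 9BC, 9GH, 10AB, 11HI, 12AC
edges = [(1, "C", "F"), (2, "G", "I"), (3, "E", "F"), (3, "C", "E"),
         (5, "D", "H"), (6, "F", "H"), (7, "D", "E"), (8, "B", "D"),
         (9, "B", "C"), (9, "G", "H"), (10, "A", "B"), (11, "H", "I"),
         (12, "A", "C")]
mst = kruskal("ABCDEFGHI", edges)
print(mst)  # begins ("C","F"), ("G","I"), ("E","F"), matching the trace
```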
Kruskal's Algorithm: How it works
- For each set of nodes S, M contains a tree that connects the nodes in S
- Alternatively: the edges (ie trees) in M divide the nodes of G into sets
- That is, the nodes for each tree, plus remaining singleton sets
- Why does it produce a tree:
- Start with |V| sets
- Each edge added to M merges 2 sets of nodes
- This reduces the number of sets by 1
- Thus, the number of sets is reduced |V|-1 times
- Thus, |V|-1 edges are added
- Thus, M is a connected graph with |V|-1 edges
- Thus, M is a tree
- Another way of looking at it:
- Each set of nodes is connected by a tree in M
- At each step, adding an edge connects two trees without making a loop (why?)
- Thus the final M is a tree
- Why does it produce a spanning tree:
- Each vertex starts in a set and ends in the final set of nodes
- Thus the final set contains all nodes and M is a tree that connects
them
Kruskal's Algorithm: Why does it work
- Why does it produce a minimal spanning tree (proof outline):
- Assume M1 and M2 are MSTs of two disjoint sets of nodes A1 and A2
- Assume the algorithm adds edge e=(u, v) that connects M1 and M2
- e is the smallest edge that connects M1 and M2 (why?)
- M1+M2+e is an MST for A1 + A2
- If not, there is a spanning tree of A1 + A2 with smaller total weight
- That tree must contain a smaller tree on A1, a smaller tree on A2,
or a smaller edge connecting them
- Each case contradicts an assumption, so we get a contradiction
Kruskal's Algorithm - Performance
- Sort edges: O(E lg E)
- Create, Find, and Union Sets:
- Initialization: O(V) make-set operations
- Main loop: O(E) find-set and union-set operations
- Total: O((V+E) α(V))
- Where α(n) is a very, very slow growing function
- α(n) ≤ 4 for n < 10^80
- Assumes use of a fast data structure for finding which disjoint set
a node belongs to, and for merging sets
- Store each set of nodes in a tree
- Use Union by rank and Path compression
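A minimal version of that structure might look like this in Python (class and method names are assumptions; union by rank plus path compression is what gives the near-constant α(V) cost per operation):

```python
class DisjointSet:
    def __init__(self, items):
        self.parent = {x: x for x in items}  # each item starts in its own set
        self.rank = {x: 0 for x in items}    # rank: upper bound on tree height

    def find(self, x):
        # Path compression: repoint x and its ancestors at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                     # already in the same set
        # Union by rank: attach the shorter tree under the taller one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet("ABCDEF")
ds.union("C", "F")
ds.union("E", "F")
print(ds.find("C") == ds.find("E"))  # True: C, E, F now share one set
```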
- Total: O(E lg E) + O((V+E) α(V))
- = O(E lg E) + O(E α(V)), since |E| ≥ |V| - 1 (G is connected)
- = O(E lg E) + O(E lg E), since α(V) = O(lg V) = O(lg E)
- = O(E lg E)
- = O(E lg V), since |E| ≤ |V|^2 ⇒ lg E ≤ 2 lg V = O(lg V)
- If edges are already sorted: O(E α(V)), almost linear
MST: Prim's Algorithm
- Assume G is undirected, connected, weighted
- Basic approach: M is part of a MST, and E is the smallest edge that
can be connected to the tree
- Produces MST rooted at r
- v.π gives predecessor of node v in MST
- v.π is null for root and for nodes not yet on MST
- MST is the set of edges (v, v.π) for all nodes v in V - {r}
- The MST edge set is never built explicitly
- For all v in Q (ie not yet on MST), v.key is weight of the smallest edge
(so far) that connects v to current MST
- v.key = Integer'Last for all nodes that have no edge to current MST
- Basic algorithm (based on CLRS IM):
PRIM(G: Graph, w: weights, r: start node)
Q := empty priority queue
for each vertex u in G loop
u.key := Integer'Last
u.π := nil
enqueue(Q, u)
end loop
decreasekey(Q, r, 0) -- r.key := 0, moves node to front of queue
while not isempty(Q) loop
u := front(Q) -- get/remove node w/ smallest key. u is on MST
-- Add u to (virtual) MST
for each v adjacent to u in G loop
-- is (u,v) a better edge to connect node v to current MST
if v in Q and w(u, v) < v.key then
v.π := u
decreasekey(Q, v, w(u,v)) -- v.key = w(u,v)
end if
end loop
end loop
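The pseudocode above can be sketched in Python. The standard library's heapq has no decrease-key, so this version pushes a fresh queue entry and skips stale ones instead (a common workaround, not the CLRS priority queue; all names are mine):

```python
import heapq

def prim(graph, root):
    """graph: {u: {v: weight}}; returns MST edges as (parent, child) pairs."""
    key = {v: float("inf") for v in graph}  # best edge weight to tree so far
    pred = {v: None for v in graph}         # v.pi in the pseudocode
    key[root] = 0
    pq = [(0, root)]                        # (key, vertex); duplicates allowed
    in_mst = set()
    while pq:
        k, u = heapq.heappop(pq)
        if u in in_mst:                     # stale entry: already extracted
            continue
        in_mst.add(u)                       # add u to the (virtual) MST
        for v, w in graph[u].items():
            # Is (u, v) a better edge to connect v to the current MST?
            if v not in in_mst and w < key[v]:
                key[v] = w                  # "decrease-key" by pushing anew
                pred[v] = u
                heapq.heappush(pq, (w, v))
    return [(pred[v], v) for v in graph if pred[v] is not None]

g = {
    "A": {"B": 10, "C": 12},
    "B": {"A": 10, "C": 9, "D": 8},
    "C": {"A": 12, "B": 9},
    "D": {"B": 8},
}
mst = prim(g, "A")
print(mst)  # [("A", "B"), ("B", "C"), ("B", "D")]
```

Each vertex can be pushed once per incident edge, so the queue holds O(E) entries and the total cost stays O(E lg V), the same bound as the binary-heap analysis below.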
Prim's Algorithm - Trace
- Let's trace it
- You pick a root
Prim's Algorithm - Performance
- Performance depends on implementation of priority queue:
- binary heap implementation: O(E lg V)
- Fibonacci heap implementation: O(V lg V + E)
- Option 1: implement priority queue with binary heap
- Cost of initializing Q and first for loop: O(V lg V)
- Cost of decreasing key of r: O(lg V)
- While loop:
- V calls to front (=extract min) → O(V lg V)
- E (or fewer) calls to decrease key → O(E lg V)
- Total for loop: O((V + E) lg V)
- Overall Total: O((V + E) lg V) = O(E lg V),
- Since |E| ≥ |V| - 1, because G is connected
- Option 2: implement priority queue with Fibonacci heap
- Decrease key can be done in O(1) (amortized) time
- Amortized means the cost is averaged over a sequence of operations
- A single decrease key may take longer, but the average over all calls is O(1)
- E Calls to decrease key take O(E) * O(1) = O(E) time
- Total: O(V lg V + E)