B-Trees

B-Trees - Introduction

Motivation 1: Keep search tree balanced

All leaves are at same level

Motivation 2: For storing data sets that are too large for memory

Solution: Data sets are stored on disk
But ... disk access is much slower than memory access

Solution: Reduce disk access with a shallow tree with large nodes
Entire node should fit in memory

Basic ideas:

Each node has between t and 2t children

Typically, t is large (e.g., 1000 or 2000)

Fetch an entire node from disk at one time
Split nodes and push keys up, when needed

All leaves are at same level

What is t for a 2-3-4 tree?

Presentation based on CLRS

Invented by Bayer and McCreight (1970) and Kaufman (around 1970)

B-Trees (or their variants) are widely used

Properties of B-Trees

Each internal node (except perhaps the root) has between t and 2t children

The value of t (the minimum degree) characterizes the tree nodes
What is t for a 2-3-4 tree?

Each node (except perhaps the root) has between t-1 and 2t-1 keys
Subtrees and node keys are ordered

For example: keys in the leftmost child are less than or equal to the first key

Tree structure:

All leaves are at the same depth
How many nodes at depth h ≥ 1: at least 2t^(h-1)
A B-Tree with n keys has height h ≤ log_t((n+1)/2) = O(log_t n)
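
The height bound can be checked with a little arithmetic. The sketch below assumes the CLRS counting argument: the root holds at least 1 key and every other node at least t-1 keys, so a tree of height h holds at least 2t^h - 1 keys. The function names min_keys and max_height are illustrative only.

```python
def min_keys(h: int, t: int) -> int:
    """Fewest keys a B-tree of height h (root at height 0) can hold.

    1 key in the root, plus (t - 1) keys in each of the at least
    2 * t**(d - 1) nodes at every depth d = 1 .. h, which sums to 2 * t**h - 1.
    """
    return 2 * t**h - 1


def max_height(n_keys: int, t: int) -> int:
    """Largest height h such that a tree of that height fits in n_keys keys."""
    h = 0
    while min_keys(h + 1, t) <= n_keys:
        h += 1
    return h


if __name__ == "__main__":
    n = 10**9
    for t in (2, 1000):
        print(f"t={t}: height at most {max_height(n, t)} for {n} keys")
```

With a billion keys, t = 1000 keeps the height at 2 or less, while t = 2 allows a height of up to 28. Since a search reads roughly one node per level, this is the sense in which large nodes reduce disk access.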

B-Tree Implementation

B-Tree T has the following fields:
- T.root = root node of tree T
B-Tree node x has the following fields:
- x.leaf = whether x is a leaf node
- x.n = number of keys in node x
- x.key(1 .. x.n) = array of keys in x
- x.child(1 .. x.n+1) = array of pointers to children of x
- keys in x.child(i) ≤ x.key(i) ≤ keys in x.child(i+1)
All internal nodes of T (except perhaps the root) have between t and 2t children

All nodes (except perhaps the root) have between t-1 and 2t-1 keys
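
A minimal in-memory sketch of these fields, with a checker for the per-node rules. It assumes 0-based Python lists in place of the 1-based arrays, stores x.n implicitly as the list length, and compares a key only against the keys of its two adjacent children rather than whole subtrees. Node and check_node are names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    leaf: bool = True
    keys: List[int] = field(default_factory=list)          # x.key(1 .. x.n)
    children: List["Node"] = field(default_factory=list)   # x.child(1 .. x.n+1)


def check_node(x: Node, t: int, is_root: bool = False) -> bool:
    """Return True if x satisfies the local B-tree rules for minimum degree t."""
    n = len(x.keys)
    least = 1 if is_root else t - 1          # a non-empty root may hold 1 key
    if not (least <= n <= 2 * t - 1):
        return False
    if x.keys != sorted(x.keys):             # keys within a node are ordered
        return False
    if x.leaf:
        return not x.children                # leaves have no children
    if len(x.children) != n + 1:             # n keys separate n + 1 children
        return False
    for i, k in enumerate(x.keys):
        left, right = x.children[i], x.children[i + 1]
        if left.keys and max(left.keys) > k:
            return False                     # keys in child i must be <= key i
        if right.keys and min(right.keys) < k:
            return False                     # keys in child i+1 must be >= key i
    return True
```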

Searching a B-Tree

Search for key k in node x by comparing k to the keys in x and reading a new child node from disk, as needed

Generalizes BST algorithm

Recursive Code:

    -- x is a B-Tree node
    -- k is a key
    Search(x, k)
        -- Find first key in x that is greater than or equal to k
        i := 1
        while i ≤ x.n and then k > x.key(i) loop
            i := i + 1
        end loop

        -- Return x and location i if key is found
        if i ≤ x.n and then k = x.key(i) then
            return (x, i)
        elsif x.leaf then
            return not found
        else
            READ(x.child(i))   -- Fetch child node from disk
            return Search(x.child(i), k)
        end if
    end Search
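
The search can be sketched in runnable form as follows. This is an in-memory illustration, not the disk-based version: it assumes 0-based lists, treats a node with no children as a leaf, and returns None for "not found". The Node class and the sample tree are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) where k is stored, or None if k is absent."""
    # Find the first key in x that is greater than or equal to k.
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return (x, i)
    if x.leaf:
        return None
    # In the disk-based version, the child would be read from disk here.
    return search(x.children[i], k)


if __name__ == "__main__":
    root = Node([10, 20], [Node([2, 5]), Node([12, 17]), Node([25, 30])])
    print(search(root, 17))
    print(search(root, 11))
```

The linear scan mirrors the pseudocode. Because the keys in a node are sorted, a binary search within the node would also work and may be preferable when t is large.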

Insert(T, k)

Insert key k into B-Tree T

Code:

    -- Root of T may be full
    Insert(T, k)
        r := T.root

        -- if root is full, must split it
        if r.n = 2t - 1 then
            s := new node
            T.root := s
            s.leaf := false
            s.n := 0
            s.child(1) := r
            Split-Child(s, 1); -- Split old root
            Insert-Nonfull(s, k)

        else  -- root is not full and so no split needed
            Insert-Nonfull(r, k)

        end if
    end Insert

If the root is not full, then there is room to insert

If a child on the path down is full, split it and move a key up into its parent
Since the parent is never full when we descend from it, there is always room for this key

B-Tree Split a Child - Implementation

Split x.child(i) of node x

Child i of node x is full with 2t children (2t-1 keys).

Move key at t in child i up into x
- and create a new child i+1 in x with t-1 keys larger than key t
Create a new node to be new child i+1 of x

New node will hold upper t children (t-1 keys) of original child i

The keys in child i are distributed as follows:

Upper t-1 keys in child i go to new child i+1
One key moves from child i up into node x
Lower t-1 keys in child i remain in child i, unchanged.

B-Tree Split a Child

Code:

    Split-Child(x, i)
        ----------------
        -- Update child i
        -- Child i goes from 2t-1 keys (2t children) to t-1 keys (t children) 
        --    (1 key moves up and t-1 keys move right)
        y := x.child(i)   
        y.n := t - 1

        --------------------------------
        -- Create and fill the new node (for child i+1)
        z := new node     
        z.leaf := y.leaf

        -- Move upper t-1 keys from y (child i) to z (new child i+1)
        z.n := t - 1    
        for j in 1 .. t-1 loop
            z.key(j) := y.key(t+j)
        end loop

        -- Move upper t children from y to z, if needed
        if not y.leaf then
            for j in 1 .. t loop
                z.child(j) := y.child(t+j)
            end loop
        end if

        --------------------------------
        -- Update x
        for j in reverse i .. x.n loop  -- Shift children and keys right
            x.key(j+1) := x.key(j)         
            x.child(j+2) := x.child(j+1)
        end loop
                                           
        x.key(i) := y.key(t)                -- Move key up from child 
        x.child(i+1) := z                   -- Put new child into x
        x.n := x.n+1                        -- x now has n+1 keys

        WRITE(x)
        WRITE(y)
        WRITE(z)
    end Split-Child

B-Tree Split Child - Example

Assume t=4

Before splitting child i:

Node x contains L, M, W, X
Child i of x contains 7 keys: P,Q,R,S,T,U,V
Child i of x contains 8 children: T1, ..., T8

After splitting child i:

Node x contains L, M, S, W, X
Child i of x contains 3 keys: P,Q,R
Child i+1 of x contains 3 keys: T,U,V
Child i of x contains 4 children: T1, ..., T4
New Child i+1 of x contains 4 children: T5, ..., T8
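
The example above can be reproduced with a short in-memory sketch of Split-Child. It assumes 0-based lists (so key t of the pseudocode is index t-1) and omits the WRITE calls. The subtrees T1, ..., T8 and the other children of x are placeholder leaves whose single "key" serves only as a label.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    leaf: bool
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child x.children[i] (2t - 1 keys) around its median key."""
    y = x.children[i]
    z = Node(leaf=y.leaf)
    median = y.keys[t - 1]            # key t in the 1-based pseudocode
    z.keys = y.keys[t:]               # upper t - 1 keys move to the new node
    y.keys = y.keys[:t - 1]           # lower t - 1 keys stay in child i
    if not y.leaf:
        z.children = y.children[t:]   # upper t children move with their keys
        y.children = y.children[:t]
    x.keys.insert(i, median)          # median moves up into x
    x.children.insert(i + 1, z)       # new node becomes child i + 1


if __name__ == "__main__":
    t = 4
    subtrees = [Node(leaf=True, keys=[f"T{j}"]) for j in range(1, 9)]
    y = Node(leaf=False, keys=list("PQRSTUV"), children=subtrees)
    others = [Node(leaf=True, keys=[name]) for name in ("a", "b", "c", "d")]
    x = Node(leaf=False, keys=list("LMWX"),
             children=[others[0], others[1], y, others[2], others[3]])
    split_child(x, 2, t)
    print(x.keys, x.children[2].keys, x.children[3].keys)
```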

B-Tree Insert Nonfull

Insert key k into nonfull node x

Code:

    Insert-Nonfull(x, k)
        i := x.n
        if x.leaf then
            -- Shift keys larger than k right to open a slot for k
            while i ≥ 1 and then k < x.key(i) loop
                x.key(i+1) := x.key(i)
                i := i - 1
            end loop
            x.key(i+1) := k
            x.n := x.n + 1
            WRITE(x)
        else
            -- Find the child that should receive k
            while i ≥ 1 and then k < x.key(i) loop
                i := i - 1
            end loop
            i := i + 1
            READ(x.child(i))
            if x.child(i).n = 2t - 1 then
                Split-Child(x, i)
                -- A key moved up into x.key(i); decide which half gets k
                if k > x.key(i) then
                    i := i + 1
                end if
            end if
            Insert-Nonfull(x.child(i), k)
        end if
    end Insert-Nonfull
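
Putting Insert, Split-Child, and Insert-Nonfull together, a compact in-memory sketch might look like the following. It assumes 0-based lists, leaves out READ and WRITE, and uses list slicing and list.insert in place of the explicit shifting loops. The BTree class and the inorder and height helpers are illustrative names, not part of the pseudocode above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    leaf: bool = True
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


class BTree:
    def __init__(self, t: int):
        self.t = t
        self.root = Node()

    def insert(self, k: int) -> None:
        r = self.root
        if len(r.keys) == 2 * self.t - 1:          # root is full: grow upward
            s = Node(leaf=False, children=[r])
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _split_child(self, x: Node, i: int) -> None:
        t, y = self.t, x.children[i]
        z = Node(leaf=y.leaf, keys=y.keys[t:], children=y.children[t:])
        median = y.keys[t - 1]
        y.keys, y.children = y.keys[:t - 1], y.children[:t]
        x.keys.insert(i, median)                   # median moves up into x
        x.children.insert(i + 1, z)

    def _insert_nonfull(self, x: Node, k: int) -> None:
        i = len(x.keys) - 1
        while i >= 0 and k < x.keys[i]:
            i -= 1
        i += 1                                     # slot (leaf) or child index
        if x.leaf:
            x.keys.insert(i, k)
            return
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)
            if k > x.keys[i]:                      # k belongs in the new right half
                i += 1
        self._insert_nonfull(x.children[i], k)


def inorder(x: Node) -> List[int]:
    """All keys of the subtree rooted at x, in sorted order."""
    if x.leaf:
        return list(x.keys)
    out: List[int] = []
    for i, k in enumerate(x.keys):
        out += inorder(x.children[i]) + [k]
    return out + inorder(x.children[-1])


def height(x: Node) -> int:
    return 0 if x.leaf else 1 + height(x.children[0])
```

Because a full child is split before the descent enters it, the recursion never has to back up: every split finds room in a parent that is known to be nonfull.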

B-Tree Variant: B+-Tree

B+ trees keep only keys (and not records) in internal nodes
All records are at leaves
Enables easy sequential access
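
One common way to get the sequential access is to link each leaf to its right sibling, so a range scan descends once and then walks along the leaves. The sketch below assumes such a link (the next field), which is not described above, and follows the convention that keys in child i are less than or equal to separator key i. Leaf, Internal, find_leaf, and range_scan are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union


@dataclass
class Leaf:
    records: List[Tuple[int, str]]        # (key, record) pairs, sorted by key
    next: Optional["Leaf"] = None         # assumed link to the right sibling


@dataclass
class Internal:
    keys: List[int]                       # separator keys only, no records
    children: List[Union["Internal", Leaf]]


def find_leaf(node: Union[Internal, Leaf], k: int) -> Leaf:
    """Descend to the leaf whose key range covers k."""
    while isinstance(node, Internal):
        i = 0
        while i < len(node.keys) and k > node.keys[i]:
            i += 1
        node = node.children[i]
    return node


def range_scan(root: Union[Internal, Leaf], lo: int, hi: int) -> Iterator[Tuple[int, str]]:
    """Yield all records with lo <= key <= hi by walking the leaf chain."""
    leaf: Optional[Leaf] = find_leaf(root, lo)
    while leaf is not None:
        for key, rec in leaf.records:
            if key > hi:
                return
            if key >= lo:
                yield key, rec
        leaf = leaf.next


if __name__ == "__main__":
    l3 = Leaf([(25, "y"), (31, "z")])
    l2 = Leaf([(12, "l"), (20, "t")], next=l3)
    l1 = Leaf([(3, "c"), (10, "j")], next=l2)
    root = Internal([10, 20], [l1, l2, l3])
    print(list(range_scan(root, 10, 25)))
```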