B-Trees
B-Trees - Introduction
- Motivation 1: Keep search tree balanced
- All leaves are at same level
- Motivation 2: For storing data sets that are too large for memory
- Solution: Data sets are stored on disk
- But ... disk access is much slower than memory access
- Solution: Reduce disk accesses by using a shallow tree with large nodes
- Entire node should fit in memory
- Basic ideas:
- Each node has between t and 2t children
- Typically, t is large (e.g., 1000 or 2000)
- Fetch an entire node from disk at one time
- Split nodes and push keys up, when needed
- All leaves are at same level
- What is t for a 2-3-4 tree?
- Presentation based on CLRS
- Invented by Bayer and McCreight (1970) and Kaufman (around 1970)
- B-Trees (or their variants) are widely used
Properties of B-Trees
- Each node (except perhaps the root) has between t and 2t children
- The value of t (the minimum degree) characterizes the tree nodes
- What is t for a 2-3-4 tree?
- Each node has between t-1 and 2t-1 keys
- Subtrees and node keys are ordered
- For example: keys in the leftmost child are less than or equal to the first key
- Tree structure:
- All leaves are at the same depth
- How many nodes at depth h (for h ≥ 1): at least 2t^(h-1)
- A B-Tree with n keys has height
h ≤ log_t((n+1)/2) = O(log_t n)
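The height bound can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names min_keys and max_height are hypothetical. It assumes a nonempty tree whose root holds at least one key and whose other nodes each hold at least t-1 keys, and it counts the fewest keys a tree of height h can contain.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    # The root contributes 1 key. Depth d >= 1 has at least 2 * t**(d-1)
    # nodes, and each of those holds at least t - 1 keys.
    return 1 + (t - 1) * sum(2 * t ** (d - 1) for d in range(1, h + 1))


def max_height(t: int, n: int) -> int:
    """Largest height h such that a tree of height h can hold only n keys."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


# The sum telescopes to the closed form 2 * t**h - 1.
assert all(min_keys(t, h) == 2 * t ** h - 1 for t in (2, 3, 10) for h in range(6))

print(max_height(1000, 10 ** 9))  # 2
```

With t = 1000, a tree holding one billion keys has height at most 2, so a search reads at most three nodes.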
B-Tree Implementation
- B-Tree T has the following fields:
- T.root = root node of tree T
- B-Tree node x has the following fields:
- x.leaf = whether x is a leaf node
- x.n = number of keys in node x
- x.key(1 .. x.n) = array of keys in x, stored in nondecreasing order
- x.child(1 .. x.n+1) = array of pointers to children of x
- keys in x.child(i) ≤ x.key(i) ≤ keys in x.child(i+1)
- All internal nodes of T (except perhaps the root) have between t and 2t children
- All nodes (except perhaps the root) have between t-1 and 2t-1 keys
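The node fields above map naturally onto a small record type. This is one possible Python sketch, not the slides' own code: the names BTreeNode and node_is_valid are hypothetical, arrays are 0-indexed (unlike the 1-indexed pseudocode), and n is derived from the key list rather than stored. The validity check assumes a nonempty tree, so the root needs at least one key.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    leaf: bool = True                                   # x.leaf
    keys: List[int] = field(default_factory=list)       # x.key(1 .. x.n)
    children: List["BTreeNode"] = field(default_factory=list)  # x.child(1 .. x.n+1)

    @property
    def n(self) -> int:
        """Number of keys currently stored in this node (x.n)."""
        return len(self.keys)


def node_is_valid(x: BTreeNode, t: int, is_root: bool = False) -> bool:
    """Check the per-node bounds for a B-tree of minimum degree t."""
    min_keys = 1 if is_root else t - 1
    if not (min_keys <= x.n <= 2 * t - 1):
        return False
    if x.keys != sorted(x.keys):
        return False
    if x.leaf:
        return not x.children
    # An internal node with n keys has exactly n + 1 children.
    return len(x.children) == x.n + 1


node = BTreeNode(keys=[10, 20, 30])
print(node_is_valid(node, t=2))  # True: 1 <= 3 <= 3
print(node_is_valid(node, t=5))  # False: a non-root node needs at least 4 keys
```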
Searching a B-Tree
- Search for key k in node x by comparing k to the keys in x
and reading a new child node from disk, as needed
- Generalizes BST algorithm
- Recursive Code:
-- x is a B-Tree node
-- k is a key
Search(x, k)
   -- Find the first key in x that is greater than or equal to k
   i := 1
   while i ≤ x.n and then k > x.key(i) loop
      i := i + 1
   end loop
   -- Return x and location i if key is found
   if i ≤ x.n and then k = x.key(i) then
      return (x, i)
   elsif x.leaf then
      return not found
   else
      READ(x.child(i))   -- Fetch the child node from disk
      return Search(x.child(i), k)
   end if
end Search
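The search procedure can be sketched in Python as follows. This is an illustrative, in-memory version: there are no disk reads, indices are 0-based, and the linear scan of the pseudocode is swapped for a binary search with bisect_left, which is a reasonable choice when nodes hold many keys. The Node class and the sample tree are assumptions made for the example.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # empty for a leaf

    @property
    def leaf(self):
        return not self.children


def search(x, k):
    """Return (node, index) for key k, or None if k is not in the subtree."""
    i = bisect_left(x.keys, k)            # first index with x.keys[i] >= k
    if i < len(x.keys) and x.keys[i] == k:
        return (x, i)
    if x.leaf:
        return None
    return search(x.children[i], k)       # a disk read would happen here


root = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25, 30])])
print(search(root, 15) == (root.children[1], 1))  # True
print(search(root, 11))                           # None
```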
Insert(T, k)
- Insert key k into B-Tree T
- Code:
-- Root of T may be full
Insert(T, k)
   r := T.root
   -- If root is full, must split it
   if r.n = 2t - 1 then
      s := new node
      T.root := s
      s.leaf := false   -- New root is an internal node
      s.n := 0
      s.child(1) := r
      Split-Child(s, 1)   -- Split old root
      Insert-Nonfull(s, k)
   else   -- Root is not full, so no split is needed
      Insert-Nonfull(r, k)
   end if
end Insert
- If the root is not full, then there is room to insert
- If a lower node is full, split it and move a key up into its parent before descending into it
- Since the parent is never full when one of its children is split, there is always room for the key that moves up
B-Tree Split a Child - Implementation
- Split x.child(i) of node x
- Child i of node x is full with 2t children (2t-1 keys).
- Move key at t in child i up into x
- and create a new child i+1 in x with t-1 keys larger than key t
- Create a new node to be new child i+1 of x
- New node will hold upper t children (t-1 keys) of original child i
- The keys in child i of node x are distributed as follows:
- Upper t-1 keys in child i go to new child i+1
- One key moves from child i up into node x
- Lower t-1 keys in child i remain in child i, unchanged.
B-Tree Split a Child
Split-Child(x, i)
   ----------------
   -- Update child i
   -- Child i goes from 2t-1 keys (2t children) to t-1 keys (t children)
   -- (1 key moves up and t-1 keys move right)
   y := x.child(i)
   y.n := t - 1
   --------------------------------
   -- Create and fill the new node (for child i+1)
   z := new node
   z.leaf := y.leaf
   -- Move upper t-1 keys from y (child i) to z (new child i+1)
   z.n := t - 1
   for j in 1 .. t-1 loop
      z.key(j) := y.key(t+j)
   end loop
   -- Move upper t children from y to z, if needed
   if not y.leaf then
      for j in 1 .. t loop
         z.child(j) := y.child(t+j)
      end loop
   end if
   --------------------------------
   -- Update x
   for j in reverse i .. x.n loop   -- Shift children and keys right
      x.key(j+1) := x.key(j)
      x.child(j+2) := x.child(j+1)
   end loop
   x.key(i) := y.key(t)   -- Move key up from child
   x.child(i+1) := z      -- Put new child into x
   x.n := x.n + 1         -- x now has n+1 keys
   WRITE(x)
   WRITE(y)
   WRITE(z)
end Split-Child
B-Tree Split Child - Example
- Assume t=4
- Before splitting child i:
- Node x contains L, M, W, X
- Child i of x contains 7 keys: P,Q,R,S,T,U,V
- Child i of x contains 8 children: T1, ..., T8
- After splitting child i:
- Node x contains L, M, S, W, X
- Child i of x contains 3 keys: P,Q,R
- Child i+1 of x contains 3 keys: T,U,V
- Child i of x contains 4 children: T1, ..., T4
- New Child i+1 of x contains 4 children: T5, ..., T8
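The example above can be reproduced with a short Python sketch of the split step. This is an in-memory illustration under stated assumptions: the Node class and split_child function are hypothetical names, indices are 0-based (so child i in the slides is children[2] here), list slicing replaces the explicit copy loops, and the subtrees T1..T8 and the other children of x are placeholder leaves.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty for a leaf

    @property
    def leaf(self):
        return not self.children


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    median = y.keys[t - 1]
    z = Node(y.keys[t:], y.children[t:])   # upper t-1 keys, upper t children
    y.keys = y.keys[:t - 1]                # lower t-1 keys stay in y
    y.children = y.children[:t]            # lower t children stay in y
    x.keys.insert(i, median)               # median key moves up into x
    x.children.insert(i + 1, z)            # z becomes the next child of x


subtrees = [Node([f"T{j}"]) for j in range(1, 9)]   # stand-ins for T1..T8
full = Node("PQRSTUV", subtrees)
others = [Node([]) for _ in range(4)]               # x's other children
x = Node("LMWX", others[:2] + [full] + others[2:])

split_child(x, 2, t=4)
print(x.keys)              # ['L', 'M', 'S', 'W', 'X']
print(x.children[2].keys)  # ['P', 'Q', 'R']
print(x.children[3].keys)  # ['T', 'U', 'V']
```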
B-Tree Insert Nonfull
- Insert key k into nonfull node x
- Code:
Insert-Nonfull(x, k)
   i := x.n
   if x.leaf then
      -- Shift larger keys right to open a slot for k
      while i ≥ 1 and then k < x.key(i) loop
         x.key(i+1) := x.key(i)
         i := i - 1
      end loop
      x.key(i+1) := k
      x.n := x.n + 1
      WRITE(x)
   else
      -- Find the child that should receive k
      while i ≥ 1 and then k < x.key(i) loop
         i := i - 1
      end loop
      i := i + 1
      READ(x.child(i))
      if x.child(i).n = 2t - 1 then
         -- Child is full: split it, then pick the correct half
         Split-Child(x, i)
         if k > x.key(i) then
            i := i + 1
         end if
      end if
      Insert-Nonfull(x.child(i), k)
   end if
end Insert-Nonfull
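Putting the pieces together, the whole insertion path (Insert, Split-Child, Insert-Nonfull) might look like this in Python. It is a compact in-memory sketch rather than a disk-based implementation: READ and WRITE are omitted, indices are 0-based, and the names BTree, Node, inorder and leaf_depths are assumptions for the example. The bisect module stands in for the backward scans of the pseudocode.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf

    @property
    def leaf(self):
        return not self.children


class BTree:
    def __init__(self, t):
        self.t = t                         # minimum degree
        self.root = Node()

    def _split_child(self, x, i):
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])    # median key moves up
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def _insert_nonfull(self, x, k):
        if x.leaf:
            insort(x.keys, k)              # shift larger keys right, place k
            return
        i = bisect_right(x.keys, k)        # child that should receive k
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)
            if k > x.keys[i]:
                i += 1
        self._insert_nonfull(x.children[i], k)

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:  # full root: grow the tree upward
            s = Node(children=[r])
            self.root = s
            self._split_child(s, 0)
            r = s
        self._insert_nonfull(r, k)


def inorder(x):
    """Keys of the subtree rooted at x in sorted order."""
    if x.leaf:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out


def leaf_depths(x, d=0):
    """Set of depths at which leaves occur (a single value in a valid B-tree)."""
    if x.leaf:
        return {d}
    return set().union(*(leaf_depths(c, d + 1) for c in x.children))


tree = BTree(t=2)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(key)
print(inorder(tree.root))      # [5, 6, 7, 10, 12, 17, 20, 30]
print(leaf_depths(tree.root))  # {1}
```

Because full nodes are split on the way down, the insertion never has to back up the tree: a single root-to-leaf pass suffices.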
B-Tree Variant: B+-Tree
- B+ trees keep only keys (and not records) in internal nodes
- All records are at leaves
- Enables easy sequential access
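One common way to get sequential access is to chain the leaves together. The slides do not say how the leaves are connected, so the next pointer below is an assumption, as are the names Leaf and range_scan. The sketch shows only the leaf level; locating the first leaf through the internal nodes is omitted.

```python
class Leaf:
    def __init__(self, records, next_leaf=None):
        self.records = records   # sorted list of (key, record) pairs
        self.next = next_leaf    # assumed link to the next leaf in key order


def range_scan(first_leaf, lo, hi):
    """Yield (key, record) pairs with lo <= key <= hi by walking the leaf chain."""
    leaf = first_leaf
    while leaf is not None:
        for key, record in leaf.records:
            if key > hi:
                return           # keys are sorted, so nothing further can match
            if key >= lo:
                yield key, record
        leaf = leaf.next


c = Leaf([(30, "c"), (40, "d")])
b = Leaf([(20, "b")], c)
a = Leaf([(5, "x"), (10, "a")], b)
print(list(range_scan(a, 15, 30)))  # [(20, 'b'), (30, 'c')]
```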