To understand how binary search trees can implement sets
To learn how red-black trees provide performance guarantees for set operations
To choose appropriate methods for tree traversal
To become familiar with the heap data structure
To use heaps for implementing priority queues and for sorting
Basic Tree Concepts
A family tree shows the descendants of a common ancestor.
In computer science, a tree is a hierarchical data structure composed of nodes.
Each node has a sequence of child nodes.
The root is the node with no parent.
A leaf is a node with no children.
British royal family tree
Figure 1 A Family Tree
Some common tree terms:
Trees in Computer Science
Figure 2 A Directory Tree
Figure 3 An Inheritance Tree
A tree class holds a reference to a root node.
Each node holds:
A data item
A list of references to the child nodes
A Tree class:
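import java.util.ArrayList;
import java.util.List;

public class Tree
{
   private Node root;

   class Node
   {
      public Object data;
      public List<Node> children;
   }

   // Constructs a tree with a single node and no children.
   public Tree(Object rootData)
   {
      root = new Node();
      root.data = rootData;
      root.children = new ArrayList<>();
   }

   // Adds a subtree as the last child of the root.
   public void addSubtree(Tree subtree)
   {
      root.children.add(subtree.root);
   }
   . . .
}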
When computing tree properties, it is common to recursively visit smaller and smaller subtrees.
Size of tree with root r whose children are c1 ... ck
size(r) = 1 + size(c1) + ... + size(ck)
Recursive helper method in the Node class:
class Node
{
   . . .
   public int size()
   {
      int sum = 0;
      for (Node child : children) { sum = sum + child.size(); }
      return 1 + sum;
   }
}
The size method in the Tree class invokes the helper if the root exists:
public class Tree
{
   . . .
   public int size()
   {
      if (root == null) { return 0; }
      else { return root.size(); }
   }
}
Self Check 17.1
What are the paths starting with Anne in the tree shown in Figure 1?
Answer:
There are four paths:
Anne
Anne, Peter
Anne, Zara
Anne, Peter, Savannah
Self Check 17.2
What are the roots of the subtrees consisting of 3 nodes in the tree shown in Figure 1?
Answer:
There are three subtrees with three nodes—
they have roots Charles, Andrew, and Edward.
Self Check 17.5
Describe a recursive algorithm for counting all leaves in a tree.
Answer:
If n is a leaf, the leaf count is 1.
Otherwise
Let c1 ... ck be the children of n.
The leaf count is leafCount(c1) + ... + leafCount(ck).
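The recursive leaf count can be sketched in Java as follows. This is a minimal illustration, not the book's code: the stand-alone LeafNode class and the LeafCountDemo class are hypothetical names that mirror the Node class of this section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone node type, mirroring the Node class of this section.
class LeafNode
{
    Object data;
    List<LeafNode> children = new ArrayList<>();

    LeafNode(Object data) { this.data = data; }

    // A leaf counts as 1; otherwise add up the leaf counts of the children.
    int leafCount()
    {
        if (children.isEmpty()) { return 1; }
        int sum = 0;
        for (LeafNode child : children) { sum = sum + child.leafCount(); }
        return sum;
    }
}

class LeafCountDemo
{
    // Builds the subtree with root Anne: children Peter and Zara, Peter has child Savannah.
    static LeafNode sample()
    {
        LeafNode anne = new LeafNode("Anne");
        LeafNode peter = new LeafNode("Peter");
        LeafNode zara = new LeafNode("Zara");
        peter.children.add(new LeafNode("Savannah"));
        anne.children.add(peter);
        anne.children.add(zara);
        return anne;
    }

    public static void main(String[] args)
    {
        // The leaves are Savannah and Zara.
        System.out.println(sample().leafCount());
    }
}
```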
Self Check 17.6
Using the public interface of the Tree class in this section, construct a tree that is
identical to the subtree with root Anne in Figure 1.
Answer:
Tree t1 = new Tree("Anne");
Tree t2 = new Tree("Peter");
t1.addSubtree(t2);
Tree t3 = new Tree("Zara");
t1.addSubtree(t3);
Tree t4 = new Tree("Savannah");
t2.addSubtree(t4);
Self Check 17.7
Is the size method of the Tree class recursive? Why or why not?
Answer:
It is not. However, it calls a recursive
method—the size method of the Node class.
Binary Trees
In a binary tree, each node has at most two children: a left child and a right child.
Binary Trees Examples - Decision Tree
A decision tree contains
questions used
to decide among a
number of options.
In a decision tree:
Each non-leaf node contains a question.
The left subtree corresponds to a “yes” answer.
The right subtree corresponds to a “no” answer.
Every node has either two children or no children.
Figure 4 A Decision Tree for an Animal Guessing Game
Binary Tree Examples - Huffman Tree
In a Huffman tree:
The leaves contain symbols that we want to encode.
To encode a particular symbol:
Walk along the path from the root to the leaf containing the symbol, and produce:
A zero for every left turn.
A one for every right turn.
Figure 5 A Huffman Tree for Encoding the Thirteen Characters of Hawaiian Alphabet
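The encoding rule (a zero for every left turn, a one for every right turn) can be sketched as a recursive walk that records the path to every leaf. The HuffNode and HuffmanCodeDemo classes and the three-symbol tree are made up for illustration; this is not the tree of Figure 5.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node type: leaves carry a symbol, interior nodes do not.
class HuffNode
{
    char symbol;
    HuffNode left;
    HuffNode right;

    HuffNode(char symbol) { this.symbol = symbol; }
    HuffNode(HuffNode left, HuffNode right) { this.left = left; this.right = right; }
}

class HuffmanCodeDemo
{
    // Records the root-to-leaf path of every symbol: 0 = left turn, 1 = right turn.
    static void collect(HuffNode n, String path, Map<Character, String> codes)
    {
        if (n == null) { return; }
        if (n.left == null && n.right == null)
        {
            codes.put(n.symbol, path);
            return;
        }
        collect(n.left, path + "0", codes);
        collect(n.right, path + "1", codes);
    }

    static Map<Character, String> codes(HuffNode root)
    {
        Map<Character, String> table = new HashMap<>();
        collect(root, "", table);
        return table;
    }

    // A made-up tree: A is the left child of the root, E and K sit one level deeper.
    static HuffNode sample()
    {
        return new HuffNode(new HuffNode('A'),
            new HuffNode(new HuffNode('E'), new HuffNode('K')));
    }

    public static void main(String[] args)
    {
        Map<Character, String> table = codes(sample());
        // Prints: 0 10 11
        System.out.println(table.get('A') + " " + table.get('E') + " " + table.get('K'));
    }
}
```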
Binary Tree Examples - Expression Tree
An expression tree
shows the order
of evaluation in
an arithmetic
expression.
The leaves of the expression trees contain numbers.
Interior nodes contain the operators.
(3 + 4) * 5 and 3 + 4 * 5
Figure 6 Expression Trees
Balanced Trees
In a balanced binary tree, each
subtree has approximately the
same number of nodes.
Figure 7 Balanced and Unbalanced Trees
A binary tree of height h can have up to n = 2^h - 1 nodes.
A completely filled binary tree of height 4 has 1 + 2 + 4 + 8 = 15 = 2^4 - 1 nodes.
Figure 8 A Completely Filled Binary Tree of Height 4
For a completely filled binary tree: h = log2(n + 1)
For a balanced tree: h ≈ log2(n)
Example: the height of a balanced binary tree with 1,000 nodes
Approximately 10 (because 1000 ≈ 1024 = 2^10).
Example: the height of a balanced binary tree with 1,000,000 nodes
Approximately 20 (because 10^6 ≈ 2^20).
You can find any element in this tree in about 20 steps.
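The two examples can be checked with a few lines of Java. The sketch below computes the smallest height h for which a completely filled tree (2^h - 1 nodes) has room for n nodes; the class and method names are hypothetical.

```java
class BalancedHeightDemo
{
    // Returns the smallest h with 2^h - 1 >= n, that is, the least height
    // a binary tree needs in order to hold n nodes.
    static int minHeight(long n)
    {
        int h = 0;
        long capacity = 0; // A tree of height h holds at most 2^h - 1 nodes
        while (capacity < n)
        {
            h++;
            capacity = 2 * capacity + 1;
        }
        return h;
    }

    public static void main(String[] args)
    {
        System.out.println(minHeight(15));      // 4
        System.out.println(minHeight(1000));    // 10
        System.out.println(minHeight(1000000)); // 20
    }
}
```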
A Binary Tree Implementation
Binary tree node has a reference:
to a right child
to a left child
either can be null
Leaf: node in which both children are null.
BinaryTree class:
public class BinaryTree
{
   private Node root;

   public BinaryTree() { root = null; } // An empty tree

   public BinaryTree(Object rootData, BinaryTree left, BinaryTree right)
   {
      root = new Node();
      root.data = rootData;
      root.left = left.root;
      root.right = right.root;
   }

   class Node
   {
      public Object data;
      public Node left;
      public Node right;
   }
   . . .
}
To find the height of a binary tree t with left and right children l and r
Take the maximum height of the two children and add 1
height(t) = 1 + max(height(l), height(r))
Make a static recursive helper method height in the BinaryTree class:
public class BinaryTree
{
   . . .
   private static int height(Node n)
   {
      if (n == null) { return 0; }
      else { return 1 + Math.max(height(n.left), height(n.right)); }
   }
   . . .
}
Provide a public height method in the BinaryTree class:
public class BinaryTree
{
   . . .
   public int height() { return height(root); }
}
What is the difference between a tree, a binary tree, and a balanced binary tree?
Answer:
In a tree, each node can have any number of
children. In a binary tree, a node has at most
two children. In a balanced binary tree, all
nodes have approximately as many descendants
to the left as to the right.
Self Check 17.14
Are the left and right children of a binary search tree always binary search trees? Why or why not?
Answer:
Yes––because the binary search condition
holds for all nodes of the tree, it holds for all
nodes of the subtrees.
Self Check 17.15
Draw all binary search trees containing data values A, B, and C.
Answer:
Self Check 17.16
Give an example of a string that, when inserted into the tree of Figure 12, becomes a right child of Romeo.
Answer:
For example, Sarah. Any string between Romeo and Tom will do.
Self Check 17.17
Trace the removal of the node “Tom” from the tree in Figure 12.
Answer:
“Tom” has a single child. That child replaces “Tom” in the parent “Juliet”.
Self Check 17.18
Trace the removal of the node “Juliet” from the tree in Figure 12.
Answer:
“Juliet” has two children. We look for the smallest node in the right subtree, “Romeo”. Its data replaces “Juliet”, and the “Romeo” node is removed from its parent “Tom”.
Tree Traversal - Inorder Traversal
To print a Binary Search Tree in sorted order:
Print the left subtree.
Print the root data.
Print the right subtree.
Public print method starts the recursive process at the root:
public void print()
{
   print(root);
}
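The recursive helper that the public method calls is not shown above. A minimal sketch of the three steps (left subtree, root data, right subtree) might look like this; InorderNode and InorderDemo are hypothetical names, and the four-name tree is made up for the example. The sketch collects the output in a string instead of printing directly, which makes it easy to check.

```java
// Hypothetical binary tree node holding a string.
class InorderNode
{
    String data;
    InorderNode left;
    InorderNode right;

    InorderNode(String data, InorderNode left, InorderNode right)
    {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}

class InorderDemo
{
    // Recursive helper: left subtree, then root data, then right subtree.
    static void inorder(InorderNode n, StringBuilder out)
    {
        if (n == null) { return; }
        inorder(n.left, out);
        out.append(n.data).append(" ");
        inorder(n.right, out);
    }

    // Starts the recursive process at the root.
    static String inorder(InorderNode root)
    {
        StringBuilder out = new StringBuilder();
        inorder(root, out);
        return out.toString().trim();
    }

    // A small binary search tree: Diana < Juliet < Romeo < Tom.
    static InorderNode sample()
    {
        InorderNode romeo = new InorderNode("Romeo", null, null);
        InorderNode tom = new InorderNode("Tom", romeo, null);
        InorderNode diana = new InorderNode("Diana", null, null);
        return new InorderNode("Juliet", diana, tom);
    }

    public static void main(String[] args)
    {
        // Prints the names in sorted order: Diana Juliet Romeo Tom
        System.out.println(inorder(sample()));
    }
}
```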
Preorder and Postorder Traversals
Preorder
Visit the root
Visit left subtree
Visit the right subtree
Postorder
Visit left subtree
Visit the right subtree
Visit the root
A postorder traversal of an expression tree results in an expression in reverse Polish notation.
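To illustrate, the sketch below builds the two expression trees for (3 + 4) * 5 and 3 + 4 * 5 and traverses them in postorder. ExprNode and PostorderDemo are hypothetical names introduced only for this example.

```java
// Hypothetical expression tree node: leaves hold numbers, interior nodes hold operators.
class ExprNode
{
    String data;
    ExprNode left;
    ExprNode right;

    ExprNode(String data, ExprNode left, ExprNode right)
    {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}

class PostorderDemo
{
    static ExprNode leaf(String value) { return new ExprNode(value, null, null); }

    // Postorder: left subtree, right subtree, then the root.
    static void postorder(ExprNode n, StringBuilder out)
    {
        if (n == null) { return; }
        postorder(n.left, out);
        postorder(n.right, out);
        out.append(n.data).append(" ");
    }

    static String reversePolish(ExprNode root)
    {
        StringBuilder out = new StringBuilder();
        postorder(root, out);
        return out.toString().trim();
    }

    public static void main(String[] args)
    {
        // (3 + 4) * 5
        ExprNode first = new ExprNode("*",
            new ExprNode("+", leaf("3"), leaf("4")), leaf("5"));
        // 3 + 4 * 5
        ExprNode second = new ExprNode("+",
            leaf("3"), new ExprNode("*", leaf("4"), leaf("5")));
        System.out.println(reversePolish(first));  // 3 4 + 5 *
        System.out.println(reversePolish(second)); // 3 4 5 * +
    }
}
```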
Use postorder traversal to remove all directories from a directory tree.
A directory must be empty before you can remove it.
Use preorder traversal to copy a directory tree.
Can have pre- and post-order traversal for any tree.
Only a binary tree has an inorder traversal.
The Visitor Pattern
Visitor interface to define action to take when visiting the nodes:
public interface Visitor
{
   void visit(Object data);
}
Preorder traversal with a Visitor:
private static void preorder(Node n, Visitor v)
{
   if (n == null) { return; }
   v.visit(n.data);
   for (Node c : n.children) { preorder(c, v); }
}

public void preorder(Visitor v) { preorder(root, v); }
You can also create visitors with inorder or postorder.
Example: Count all the names with at most 5 letters.
public static void main(String[] args)
{
   BinarySearchTree bst = . . .;

   class ShortNameCounter implements Visitor
   {
      public int counter = 0;
      public void visit(Object data)
      {
         if (data.toString().length() <= 5) { counter++; }
      }
   }

   ShortNameCounter v = new ShortNameCounter();
   bst.inorder(v);
   System.out.println("Short names: " + v.counter);
}
Depth-First Search
Iterative traversal can stop when a goal has been met.
Depth-first search uses a stack to track the nodes that it still needs to visit.
Algorithm:
Push the root node on a stack.
While the stack is not empty
Pop the stack; let n be the popped node.
Process n.
Push the children of n on the stack, starting with the last one.
Figure 16 Depth-First Search
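The stack-based algorithm can be sketched in Java as follows. DfsNode and DepthFirstDemo are hypothetical names, and the five-node tree is made up for the example (it is not the tree of Figure 16). The sketch returns the visited data items as a list so that the visiting order is easy to inspect.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical general tree node with a list of children.
class DfsNode
{
    Object data;
    List<DfsNode> children = new ArrayList<>();

    DfsNode(Object data) { this.data = data; }
}

class DepthFirstDemo
{
    // Returns the data items in the order in which depth-first search processes them.
    static List<Object> depthFirst(DfsNode root)
    {
        List<Object> visited = new ArrayList<>();
        if (root == null) { return visited; }
        Deque<DfsNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty())
        {
            DfsNode n = stack.pop();
            visited.add(n.data); // "Process n"
            // Push the children last-to-first so that the first child is popped next.
            for (int i = n.children.size() - 1; i >= 0; i--)
            {
                stack.push(n.children.get(i));
            }
        }
        return visited;
    }

    // A has children B and C; B has children D and E.
    static DfsNode sample()
    {
        DfsNode a = new DfsNode("A");
        DfsNode b = new DfsNode("B");
        b.children.add(new DfsNode("D"));
        b.children.add(new DfsNode("E"));
        a.children.add(b);
        a.children.add(new DfsNode("C"));
        return a;
    }

    public static void main(String[] args)
    {
        System.out.println(depthFirst(sample())); // [A, B, D, E, C]
    }
}
```

To stop early when a goal has been met, the loop could return as soon as the processed node satisfies the goal.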
Breadth-First Search
Breadth-first search first visits all nodes on the same level before visiting the children.
Breadth-first search uses a queue.
Figure 17 Breadth-First Search
Modify the Visitor interface to return false when the traversal should stop
An implementation:
public interface Visitor
{
   boolean visit(Object data);
}

public void breadthFirst(Visitor v)
{
   if (root == null) { return; }
   Queue<Node> q = new LinkedList<Node>();
   q.add(root);
   boolean more = true;
   while (more && q.size() > 0)
   {
      Node n = q.remove();
      more = v.visit(n.data);
      if (more)
      {
         for (Node c : n.children) { q.add(c); }
      }
   }
}
Tree Iterators
The Java collection library has an iterator to process trees:
TreeSet<String> t = . . .;
Iterator<String> iter = t.iterator();
String first = iter.next();
String second = iter.next();
A breadth first iterator:
class BreadthFirstIterator
{
   private Queue<Node> q;

   public BreadthFirstIterator(Node root)
   {
      q = new LinkedList<Node>();
      if (root != null) { q.add(root); }
   }

   public boolean hasNext() { return q.size() > 0; }

   public Object next()
   {
      Node n = q.remove();
      for (Node c : n.children) { q.add(c); }
      return n.data;
   }
}
Self Check 17.19
What are the inorder traversals of the two trees in Figure 6?
Answer:
For both trees, the inorder traversal is 3 + 4 * 5.
Self Check 17.20
Are the trees in Figure 6 binary search trees?
Answer:
No—for example, consider the children of +.
Even without looking up the Unicode values
for 3, 4, and +, it is obvious that + isn't between
3 and 4.
Self Check 17.24
What are the first eight visited nodes in the breadth-first traversal of the tree in
Figure 1?
Answer:
That’s the royal family tree, the first tree in the
chapter: George V, Edward VIII, George VI,
Mary, Henry, George, John, Elizabeth II.
Red-Black Trees
A kind of binary search tree that rebalances itself after each insertion or removal.
Guaranteed O(log(n)) efficiency.
Additional requirements:
Every node is colored red or black.
The root is black.
A red node cannot have a red child (the “no double reds” rule).
All paths from the root to a null have the same number of black nodes (the “equal exit cost” rule).
Example
Figure 18 A Red-Black Tree
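The three coloring rules can be checked mechanically. The following sketch computes the black height of each subtree and reports a violation with -1. RbNode and RedBlackCheck are hypothetical names, and the check covers only the coloring rules; it does not verify the binary search tree ordering.

```java
// Hypothetical red-black tree node.
class RbNode
{
    int data;
    boolean red;
    RbNode left;
    RbNode right;

    RbNode(int data, boolean red, RbNode left, RbNode right)
    {
        this.data = data;
        this.red = red;
        this.left = left;
        this.right = right;
    }
}

class RedBlackCheck
{
    // Returns the number of black nodes on every path from n to a null,
    // or -1 if the subtree violates "no double reds" or "equal exit cost".
    static int blackHeight(RbNode n, boolean parentIsRed)
    {
        if (n == null) { return 0; }
        if (n.red && parentIsRed) { return -1; } // Double red
        int l = blackHeight(n.left, n.red);
        int r = blackHeight(n.right, n.red);
        if (l < 0 || r < 0 || l != r) { return -1; } // Unequal exit cost
        return l + (n.red ? 0 : 1);
    }

    static boolean isRedBlack(RbNode root)
    {
        if (root == null) { return true; }
        if (root.red) { return false; } // The root must be black
        return blackHeight(root, false) >= 0;
    }

    public static void main(String[] args)
    {
        // Black root with two red children: valid.
        RbNode good = new RbNode(2, false,
            new RbNode(1, true, null, null), new RbNode(3, true, null, null));
        // Three nodes with only right children: unequal exit cost.
        RbNode chain = new RbNode(1, false, null,
            new RbNode(2, true, null, new RbNode(3, false, null, null)));
        System.out.println(isRedBlack(good));  // true
        System.out.println(isRedBlack(chain)); // false
    }
}
```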
Think of each node of a red-black tree as a toll booth. The total toll to each exit is the same.
Figure 19 A Tree that Violates "Equal Exit Cost" Rule
Figure 20 A Tree that Violates the "No Double Red" Rule
The “equal exit cost” rule eliminates highly unbalanced trees.
You can’t have null references high up in the tree.
The nodes that aren't near the leaves need to have two children.
The “no double reds” rule gives some flexibility to add nodes without having to restructure the tree all the time.
Some paths can be a bit longer than others
None can be longer than twice the black height.
Black height of a node:
The cost of traveling on a path from a given node to a null
The number of black nodes on the path
Black height of the tree:
The cost of traveling from the root to a null
Tree with black height bh
must have at least 2^bh - 1 nodes
bh ≤ log2(n + 1)
The “no double reds” rule says that the total height h of a tree is at most twice the black height:
h ≤ 2 · bh ≤ 2 · log2(n + 1)
Traveling from the root to a null is O(log(n)).
Red-Black Trees - Insertion
First insert the node as into a regular binary search tree:
If it is the root, color it black
Otherwise, color it red
If the parent is black, it is a red-black tree.
If the parent is red, need to fix the "double red" violation.
We know the grandparent is black.
Four possible configurations given black grandparent:
Smallest, middle, and largest labeled n1, n2, n3
Their children are labeled in sorted order, starting with t1
Figure 21 The Four Possible Configurations of a "Double Red"
Re-arrange n2 with children n1 and n3 colored black.
Figure 22 Fixing the "Double Red" Violation
Move up the tree fixing any other "double-red" problems in the same way.
If the troublesome red parent is the root,
Turn it black.
That will add 1 to all paths, preserving "equal exit cost" rule.
When the height of the binary search tree is h:
Finding the insertion point takes at most h steps.
Fixing double-red violations takes at most h / 2.
Because we know h = O(log(n)), insertion is guaranteed to take O(log(n)) time.
Red-Black Trees - Removal
Before removing a node in a red-black tree, turn it red and fix any double-black and double-red violations.
First remove the node as in a regular binary search tree.
If the node to be removed is red, just remove it.
If the node to be removed is black and has a child:
Color that child black
Troublesome case is removal of a black leaf.
Just removing it will cause an "equal exit cost" violation.
So turn it into a red node.
To turn a black node into a red one:
Bubble up the cost by adding one to the parent and subtracting 1 from the children.
Results in six possible configurations:
May result in a double black and negative red.
Transform in this manner:
Figure 23 Eliminating a Negative-Red Node with a Double-Black Parent
Figure 24 Fixing a Double-Red Violation Also Fixes a Double-Black Grandparent
Figure 25 Bubbling Up a Double-Black Node
If a double black reaches the root, replace it with a regular black:
Reduces the cost of all paths by 1
Preserves the "equal exit cost" rule
Red-Black Trees - Efficiency
Self Check 17.25
Consider the extreme example of a tree with only right children
and at least three nodes. Why can’t this be a red-black tree?
Answer:
The root must be black, and the second or
third node must also be black, because of the
“no double reds” rule. The left null of the root
has black height 1, but the null child of the next
black node has black height 2.
Self Check 17.26
What are the shapes and colorings of all possible red-black
trees that have four nodes?
Answer:
Self Check 17.27
Why does Figure 21 show all possible configurations of a
double-red violation?
Answer:
The top red node can be the left or right child
of the black parent, and the bottom red node
can be the left or right child of its (red) parent,
yielding four configurations.
Self Check 17.28
When inserting an element, can there ever be a triple-red violation in Figure 21?
That is, can you have a red node with two red children? (For example, in the first
tree, can t1 have a red root?)
Answer: No. Look at the first tree. At the beginning, n2
must have been the inserted node. Because the
tree was a valid red-black tree before insertion,
t1 couldn't have had a red root. Now consider
the step after one double-red removal. The
parent of n2 in Figure 22 may be red, but then
n2 can't have a red sibling—otherwise the tree
would not have been a red-black tree.
Self Check 17.29
When removing an element, show that it is possible to have a triple-red violation
in Figure 23.
Answer: Consider this scenario, where X is the black
leaf to be removed.
Self Check 17.30
What happens to a triple-red violation when the double-red fix is applied?
Answer:
It goes away. Suppose the sibling of the red
grandchild in Figure 21 is also red. That means
that one of the ti has a red root. However, all of
them become children of the black n1 and n3 in
Figure 22.
Heaps
A heap (min-heap):
Binary tree
Almost completely filled:
All nodes are filled in, except the last level.
May have some nodes missing toward the right.
All nodes fulfill the heap property:
The value of any node is less than or equal to the values of its descendants.
The value of the root is the minimum of all the values in the tree.
Figure 26 An Almost Completely Filled Tree
In an almost complete tree, all layers but one are completely filled.
Figure 27 A Heap
The value of every node is no larger than the values of its descendants.
Differences from a binary search tree
The shape of a heap is very regular.
Binary search trees can have arbitrary shapes.
In a heap, the left and right subtrees both store elements that are larger than the root element.
In a binary search tree, smaller elements are stored in the left subtree and larger elements are stored in the right subtree.
Heaps - Insertion
Algorithm to insert a node
Add a vacant slot to the end of the tree.
If the parent of the vacant slot is larger than the element to be inserted:
Demote the parent by moving the parent value into the vacant slot.
Move the vacant slot up.
Repeat this demotion as long as the parent of the vacant slot is larger than the element to be inserted.
Insert the element into the vacant slot at this point:
Either the vacant slot is at the root.
Or the parent of the vacant slot is smaller than the element to be inserted.
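The insertion steps can be sketched in Java as follows. The sketch assumes that the heap is stored in an array list with slot 0 unused, so that the parent of the node at index i is at index i / 2. HeapInsertDemo and its methods are hypothetical names, not a definitive implementation.

```java
import java.util.ArrayList;
import java.util.List;

class HeapInsertDemo
{
    // Creates an empty heap; slot 0 stays unused (an assumption of this sketch).
    static List<Integer> emptyHeap()
    {
        List<Integer> elements = new ArrayList<>();
        elements.add(null);
        return elements;
    }

    // Adds newElement to a min-heap stored in elements.
    static void add(List<Integer> elements, int newElement)
    {
        elements.add(null); // Add a vacant slot to the end of the tree
        int index = elements.size() - 1;

        // Demote parents that are larger than the new element,
        // moving the vacant slot up.
        while (index > 1 && elements.get(index / 2) > newElement)
        {
            elements.set(index, elements.get(index / 2));
            index = index / 2;
        }

        // The vacant slot is at the root, or its parent is not larger.
        elements.set(index, newElement);
    }

    public static void main(String[] args)
    {
        List<Integer> heap = emptyHeap();
        add(heap, 20);
        add(heap, 75);
        add(heap, 43);
        add(heap, 10); // Moves up past 75 and 20 to the root
        System.out.println(heap); // [null, 10, 20, 43, 75]
    }
}
```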
Heaps - Insertion Step 1
Heaps - Insertion Step 2
Heaps - Insertion Step 3
Heaps - Removing the Root
The root contains the minimum of all the values in the heap.
Algorithm to remove the root:
Extract the root node value.
Move the value of the last node of the heap into the root node.
Remove the last node.
One or both of the children of the root may now be smaller - violating the heap property.
Promote the smaller child of the root node.
Repeat this process with the demoted child - promoting the smaller of its children.
Continue until the demoted child has no smaller children.
The heap property is now fulfilled again. This process is called “fixing the heap”.
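A minimal sketch of removing the root, again assuming an array list with slot 0 unused and the children of index i at 2 · i and 2 · i + 1. HeapRemoveDemo and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeapRemoveDemo
{
    // Removes and returns the minimum of a non-empty min-heap (slot 0 unused).
    static int removeMin(List<Integer> elements)
    {
        int minimum = elements.get(1); // Extract the root node value
        int lastIndex = elements.size() - 1;
        int last = elements.remove(lastIndex); // Remove the last node
        if (lastIndex > 1)
        {
            elements.set(1, last); // Move the last value into the root
            fixHeap(elements);
        }
        return minimum;
    }

    // Promotes smaller children until the root value has no smaller child.
    static void fixHeap(List<Integer> elements)
    {
        int root = elements.get(1);
        int lastIndex = elements.size() - 1;
        int index = 1;
        boolean more = true;
        while (more)
        {
            int childIndex = 2 * index;
            if (childIndex <= lastIndex)
            {
                // Use the right child instead if it is smaller
                if (childIndex + 1 <= lastIndex
                    && elements.get(childIndex + 1) < elements.get(childIndex))
                {
                    childIndex++;
                }
                if (elements.get(childIndex) < root)
                {
                    elements.set(index, elements.get(childIndex)); // Promote child
                    index = childIndex;
                }
                else { more = false; } // Root value is not larger than both children
            }
            else { more = false; } // No children
        }
        elements.set(index, root);
    }

    public static void main(String[] args)
    {
        List<Integer> heap = new ArrayList<>(Arrays.asList((Integer) null, 10, 20, 43, 75));
        System.out.println(removeMin(heap)); // 10
        System.out.println(heap);            // [null, 20, 75, 43]
    }
}
```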
Heaps - Removing the Root Steps 1 and 2
Heaps - Removing the Root Step 3
Heaps - Efficiency
Inserting or removing a heap element is an O(log(n)) operation.
These operations visit at most h nodes (where h is the height of the tree).
A tree of height h contains at least 2^(h-1) nodes and fewer than 2^h nodes (n).
2^(h-1) ≤ n < 2^h
h − 1 ≤ log2(n) < h
The regular layout of a heap makes it possible to store heap nodes very efficiently in an array.
Store the first layer, then the second, and so on.
Leave the 0 element empty.
Figure 30 Storing a Heap in an Array
The child nodes of the node with index i have index 2 • i and 2 • i + 1.
The parent node of the node with index i has index i / 2.
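The index formulas can be tried out with a short sketch. HeapIndexDemo and its helpers are hypothetical names, and the sample array is made up for the example; the 0 element is left unused as described above.

```java
class HeapIndexDemo
{
    static int leftChild(int i) { return 2 * i; }
    static int rightChild(int i) { return 2 * i + 1; }
    static int parent(int i) { return i / 2; } // Integer division

    // Checks the min-heap property for a[1] ... a[size]; a[0] is unused.
    static boolean isMinHeap(int[] a, int size)
    {
        for (int i = 2; i <= size; i++)
        {
            if (a[parent(i)] > a[i]) { return false; }
        }
        return true;
    }

    public static void main(String[] args)
    {
        // Layers: 20 | 75 43 | 84 90 57 71
        int[] a = { 0, 20, 75, 43, 84, 90, 57, 71 };
        System.out.println(a[leftChild(2)] + " " + a[rightChild(2)]); // 84 90
        System.out.println(a[parent(7)]);                             // 43
        System.out.println(isMinHeap(a, 7));                          // true
    }
}
```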
A max-heap has the largest element stored in the root.
A min-heap can be used to implement a priority queue.
The software that controls the events in a user interface keeps the events in a data structure. Whenever an event such as a mouse move or repaint request occurs, the event is added. Events are retrieved according to their importance. What abstract data type is appropriate for this application?
Answer:
A priority queue is appropriate because we
want to get the important events first, even if
they have been inserted later.
Self Check 17.32
In an almost-complete tree with 100 nodes, how many nodes are missing in the lowest level?
Answer:
27. The next power of 2 greater than 100 is 128, and a completely filled tree has 127 nodes.
Self Check 17.33
If you traverse a heap in preorder, will the nodes be in sorted order?
Answer: Generally not. For example, the heap in Figure
30 in preorder is 20 75 84 90 96 91 93 43 57 71.
Self Check 17.34
What is the heap that results from inserting 1 into the following?
Answer:
Self Check 17.35
What is the result of removing the minimum from the following?
Answer:
The Heapsort Algorithm
The heapsort algorithm:
Insert all elements into the heap.
Keep extracting the minimum.
Heapsort is an O(n log(n)) algorithm.
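The two steps can be sketched with the java.util.PriorityQueue class of the standard library, which is a min-heap for elements with a natural ordering. The SimpleHeapSortDemo class is a hypothetical name; this version uses a separate heap rather than sorting in place.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class SimpleHeapSortDemo
{
    static void sort(int[] a)
    {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        // Insert all elements into the heap: n insertions of O(log(n)) each.
        for (int x : a) { heap.add(x); }
        // Keep extracting the minimum: n removals of O(log(n)) each.
        for (int i = 0; i < a.length; i++) { a[i] = heap.remove(); }
    }

    public static void main(String[] args)
    {
        int[] a = { 9, 4, 5, 1, 3 };
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 9]
    }
}
```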
Can convert an existing array into a heap.
Fix small subtrees into heaps, then fix larger trees.
Subtrees of size 1 are already heaps.
Begin fixing subtrees at next-to-last level.
int n = a.length - 1;
for (int i = (n - 1) / 2; i >= 0; i--)
{
   fixHeap(a, i, n);
}
Tree to Heap
Figure 31 Turning a Tree into a Heap
Better to use a max-heap rather than a min-heap.
For each step, can swap root element to last position of array, then reduce tree size.
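An in-place heapsort along these lines might look as follows. This is a sketch under assumptions: the class name InPlaceHeapSortDemo is hypothetical, and the fixHeap(a, rootIndex, lastIndex) signature is modeled on the call shown earlier. The array is zero-based, so the children of index i are at 2 · i + 1 and 2 · i + 2.

```java
import java.util.Arrays;

class InPlaceHeapSortDemo
{
    static void sort(int[] a)
    {
        int n = a.length - 1;
        // Turn the array into a max-heap, starting at the next-to-last level.
        for (int i = (n - 1) / 2; i >= 0; i--)
        {
            fixHeap(a, i, n);
        }
        // Swap the root (the largest element) to the last position,
        // then reduce the tree size and fix the heap.
        while (n > 0)
        {
            int temp = a[0];
            a[0] = a[n];
            a[n] = temp;
            n--;
            fixHeap(a, 0, n);
        }
    }

    // Restores the max-heap property below rootIndex, using only a[0] ... a[lastIndex].
    static void fixHeap(int[] a, int rootIndex, int lastIndex)
    {
        int rootValue = a[rootIndex];
        int index = rootIndex;
        boolean more = true;
        while (more)
        {
            int childIndex = 2 * index + 1; // Left child in a zero-based array
            if (childIndex <= lastIndex)
            {
                // Use the right child instead if it is larger
                if (childIndex + 1 <= lastIndex && a[childIndex + 1] > a[childIndex])
                {
                    childIndex++;
                }
                if (a[childIndex] > rootValue)
                {
                    a[index] = a[childIndex]; // Promote child
                    index = childIndex;
                }
                else { more = false; } // Root value is not smaller than both children
            }
            else { more = false; } // No children
        }
        a[index] = rootValue;
    }

    public static void main(String[] args)
    {
        int[] a = { 1, 4, 9, 5, 3 };
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 5, 9]
    }
}
```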
Which algorithm requires less storage, heapsort or merge sort?
Answer: Heapsort requires less storage because it
doesn't need an auxiliary array.
Self Check 17.37
Why are the computations of the left child index and the right child index in the HeapSorter different than in MinHeap?
Answer: The MinHeap wastes the 0 entry to make the
formulas more intuitive. When sorting an
array, we don’t want to waste the 0 entry, so
we adjust the formulas instead.
Self Check 17.38
What is the result of calling HeapSorter.fixHeap(a, 0, 4) where a contains 1 4 9 5 3?
Answer: In tree form, that is
Remember, it’s a max-heap!
Self Check 17.39
Suppose after turning the array into a heap, it is 9 4 5 1 3. What happens in the
first iteration of the while loop in the sort method?
Answer:
The 9 is swapped with 3, and the heap is fixed up again, yielding
5 4 3 1 9.
Self Check 17.40
Does heapsort sort an array that is already sorted in O(n) time?
Answer:
Unfortunately not. The largest element is
removed first, and it must be moved to the
root, requiring O(log(n)) steps. The second largest
element is still toward the end of the
array, again requiring O(log(n)) steps, and
so on.