Binary Heaps
I've lost count of the number of times I have had to re-learn how binary heaps work. It's time for another refresher. Here are some notes.
Elegant Economy¶
The original binary heap representation appeared in 1964. By today's standards it is elegant and economical.
Data in a binary heap is stored in an array. New elements are inserted, initially, at the end of the array.
The array, A
, is a linearized binary tree. Each element in the array corresponds to a location in a fixed-structure binary tree. That is, A[n]
always represents a specific node in the binary tree.
The first element of A
is always the root of the tree – the top of the heap. Assuming array indexing is 1-based, the children of tree node A[i]
are always stored at A[i * 2]
and A[(i * 2) + 1]
.
Full and Balanced¶
In our current model of the electron "shells" of an atom, all of the inner shells are always filled to capacity with electrons. Only the outermost shell may be incompletely filled.
Similarly, new elements are always added to the end of A
. As a result the inner levels of the corresponding binary tree are always full. At most one leaf node is empty at any time.
Order¶
The values in a binary heap are partially ordered in one of two ways.
In a max heap, the value in any node is always ≥ the value of either of its children node. The maximum value in such a heap is always stored in the root node. (There it is again: the top of the heap.)
In a min heap, the value in any node is always ≤ the value of either child; so the minimum value is always at the top of the heap.
This ordering relationship could be encoded by an ordering function.
import typing as tp
import abc
import random
class Comparable(abc.ABC):
@abc.abstractmethod
def __ge__(self, other: tp.Any) -> bool: ...
@abc.abstractmethod
def __lt__(self, other: tp.Any) -> bool: ...
ValueType = tp.TypeVar('ValueType', bound=Comparable)
OrderFn = tp.Callable[[ValueType, ValueType], bool]
ValueList = tp.List[ValueType]
def max_heap(v1: ValueType, v2: ValueType) -> bool:
return v1 >= v2
Insertion¶
To insert a new value into a binary heap, first append the new value to the end of the heap array A
. Then compare the new value to its parent, swapping if the two values do not satisfy the ordering function (e.g., parent ≥ child) for this heap.
class InsertableBinaryHeap:
def __init__(self, order_fn: OrderFn) -> None:
self._ordered = order_fn
self._a: ValueList = []
def empty(self) -> bool:
return len(self._a) <= 0
def insert(self, new_value: ValueType) -> None:
i = len(self._a)
self._a.append(new_value)
while i > 0:
# Zero-based indexing is a bear.
# i = (i_parent + 1) * 2 - 1, so:
i_parent = (i + 1) // 2 - 1
v: ValueType = self._a[i]
v_parent: ValueType = self._a[i_parent]
if not self._ordered(v_parent, v):
self._a[i_parent], self._a[i] = v, v_parent
i = i_parent
def array(self) -> tp.List[ValueType]:
return self._a[:]
bh = InsertableBinaryHeap(max_heap)
print("Empty initially:", bh.empty())
for i in range(8):
bh.insert(i)
print("Empty after insertion:", bh.empty())
bh.array()
Removal¶
The binary heap is often used as the underlying structure for a priority queue, which supports add (aka push, insert) and remove (aka pop) operations. Removal from a priority queue always returns the top-priority item – the top of the underlying heap.
When the root is removed, one of the remaining items needs to be moved into the root position, and the heap needs to be re-organized to maintain its ordering.
Start by making the last array item the new heap root. (This is guaranteed to break the ordering relationship of the heap.) Then, starting from the root, compare each node to its children. If they satisfy the ordering relationship, stop. Otherwise, swap with the child which, according to the heap's ordering, should be root-most. Then continue rebalancing with the subtree whose child was swapped.
class PriorityQueue(InsertableBinaryHeap):
def remove(self) -> ValueType:
# Let Python raise exceptions if the heap is empty.
result: ValueType = self._a[0]
self._re_root()
return result
def _re_root(self) -> None:
new_root = self._a.pop()
# If the array is now empty, then new_root was the only item -- it will
# be returned by remove(). Otherwise, there is at least one child, and
# we need to rebalance.
newlen = len(self._a)
if newlen > 0:
self._a[0] = new_root
self._rebalance()
def _rebalance(self) -> None:
# From https://en.wikipedia.org/wiki/Binary_heap#Extract
i = 0
a = self._a
while True:
# zero-based indexing leads to this fenceposting nonsense.
i_left = (i + 1) * 2 - 1
i_right = i_left + 1
i_hoist = self._rvi(self._rvi(i, i_left), i_right)
if i_hoist == i:
return
a[i], a[i_hoist] = a[i_hoist], a[i]
i = i_hoist
def _rvi(self, i_parent: int, i_child: int) -> int:
# Given a parent "node" index and a child index, return the index
# whose value should be root-most, given self's ordered relationship.
a = self._a
if (i_child < len(a)) and not self._ordered(a[i_parent], a[i_child]):
return i_child
return i_parent
q = PriorityQueue(max_heap)
sep = "Values = ["
for i in range(20):
v = random.randint(1, 101)
print(f"{sep}{v}", end="")
q.insert(v)
sep = ", "
print("]")
print("Heap array:", q.array())
vprev = None
while not q.empty():
v = q.remove()
print(v, end=" ")
# Verify descending order of max_heap
assert((vprev is None) or (vprev >= v))
vprev = v
print()