Binary Heaps

I've lost count of the number of times I have had to re-learn how binary heaps work. It's time for another refresher. Here are some notes.

Elegant Economy

The original binary heap representation appeared in 1964. By today's standards it is elegant and economical.

Data in a binary heap is stored in an array. New elements are inserted, initially, at the end of the array.

The array, A, is a linearized binary tree. Each element in the array corresponds to a location in a fixed-structure binary tree. That is, A[n] always represents a specific node in the binary tree.

The first element of A is always the root of the tree – the top of the heap. Assuming array indexing is 1-based, the children of tree node A[i] are always stored at A[i * 2] and A[(i * 2) + 1].

No description has been provided for this image No description has been provided for this image

Full and Balanced

In our current model of the electron "shells" of an atom, all of the inner shells are always filled to capacity with electrons. Only the outermost shell may be incompletely filled.

Similarly, new elements are always added to the end of A. As a result the inner levels of the corresponding binary tree are always full. At most one leaf node is empty at any time.

Order

The values in a binary heap are partially ordered in one of two ways.

In a max heap, the value in any node is always ≥ the value of either of its children node. The maximum value in such a heap is always stored in the root node. (There it is again: the top of the heap.)

In a min heap, the value in any node is always ≤ the value of either child; so the minimum value is always at the top of the heap.

This ordering relationship could be encoded by an ordering function.

In [1]:
import typing as tp
import abc
import random
In [2]:
class Comparable(abc.ABC):
    @abc.abstractmethod
    def __ge__(self, other: tp.Any) -> bool: ...

    @abc.abstractmethod
    def __lt__(self, other: tp.Any) -> bool: ...


ValueType = tp.TypeVar('ValueType', bound=Comparable)
OrderFn = tp.Callable[[ValueType, ValueType], bool]
ValueList = tp.List[ValueType]
In [3]:
def max_heap(v1: ValueType, v2: ValueType) -> bool:
    return v1 >= v2

Insertion

To insert a new value into a binary heap, first append the new value to the end of the heap array A. Then compare the new value to its parent, swapping if the two values do not satisfy the ordering function (e.g., parent ≥ child) for this heap.

In [4]:
class InsertableBinaryHeap:
    def __init__(self, order_fn: OrderFn) -> None:
        self._ordered = order_fn
        self._a: ValueList = []

    def empty(self) -> bool:
        return len(self._a) <= 0

    def insert(self, new_value: ValueType) -> None:
        i = len(self._a)
        self._a.append(new_value)
        while i > 0:
            # Zero-based indexing is a bear.
            # i = (i_parent +  1) * 2 - 1, so:
            i_parent = (i + 1) // 2 - 1
            v: ValueType = self._a[i]
            v_parent: ValueType = self._a[i_parent]
            if not self._ordered(v_parent, v):
                self._a[i_parent], self._a[i] = v, v_parent
            i = i_parent

    def array(self) -> tp.List[ValueType]:
        return self._a[:]
In [5]:
bh = InsertableBinaryHeap(max_heap)
print("Empty initially:", bh.empty())

for i in range(8):
    bh.insert(i)
print("Empty after insertion:", bh.empty())
bh.array()
Empty initially: True
Empty after insertion: False
Out[5]:
[7, 6, 5, 3, 2, 1, 4, 0]

Removal

The binary heap is often used as the underlying structure for a priority queue, which supports add (aka push, insert) and remove (aka pop) operations. Removal from a priority queue always returns the top-priority item – the top of the underlying heap.

When the root is removed, one of the remaining items needs to be moved into the root position, and the heap needs to be re-organized to maintain its ordering.

Start by making the last array item the new heap root. (This is guaranteed to break the ordering relationship of the heap.) Then, starting from the root, compare each node to its children. If they satisfy the ordering relationship, stop. Otherwise, swap with the child which, according to the heap's ordering, should be root-most. Then continue rebalancing with the subtree whose child was swapped.

In [8]:
class PriorityQueue(InsertableBinaryHeap):
    def remove(self) -> ValueType:
        # Let Python raise exceptions if the heap is empty.
        result: ValueType = self._a[0]
        self._re_root()
        return result

    def _re_root(self) -> None:
        new_root = self._a.pop()
        # If the array is now empty, then new_root was the only item -- it will
        # be returned by remove().  Otherwise, there is at least one child, and
        # we need to rebalance.
        newlen = len(self._a)
        if newlen > 0:
            self._a[0] = new_root
            self._rebalance()

    def _rebalance(self) -> None:
        # From https://en.wikipedia.org/wiki/Binary_heap#Extract
        i = 0
        a = self._a

        while True:
            # zero-based indexing leads to this fenceposting nonsense.
            i_left = (i + 1) * 2 - 1
            i_right = i_left + 1

            i_hoist = self._rvi(self._rvi(i, i_left), i_right)
            if i_hoist == i:
                return

            a[i], a[i_hoist] = a[i_hoist], a[i]
            i = i_hoist

    def _rvi(self, i_parent: int, i_child: int) -> int:
        # Given a parent "node" index and a child index, return the index
        # whose value should be root-most, given self's ordered relationship.
        a = self._a
        if (i_child < len(a)) and not self._ordered(a[i_parent], a[i_child]):
            return i_child
        return i_parent
In [11]:
q = PriorityQueue(max_heap)

sep = "Values = ["
for i in range(20):
    v = random.randint(1, 101)
    print(f"{sep}{v}", end="")
    q.insert(v)
    sep = ", "
print("]")

print("Heap array:", q.array())

vprev = None
while not q.empty():
    v = q.remove()
    print(v, end=" ")
    # Verify descending order of max_heap
    assert((vprev is None) or (vprev >= v))
    vprev = v
print()
Values = [19, 78, 26, 86, 78, 23, 32, 52, 98, 37, 6, 82, 94, 42, 18, 70, 91, 32, 66, 101]
Heap array: [101, 98, 94, 86, 91, 82, 42, 78, 66, 78, 6, 23, 32, 26, 18, 19, 70, 32, 52, 37]
101 98 94 91 86 82 78 78 70 66 52 42 37 32 32 26 23 19 18 6 
In [ ]: