Last time, we introduced the concept of a priority queue data structure, and illustrated it with a couple of simple implementations using an array: firstly by adding new items at the end, and then getting the highest priority item by searching through the array; secondly by maintaining the array in sorted order by priority, forcing new items to be inserted in the correct position in the array, and getting an item is merely removing it from the end of the array. I called these fast insertion/slow deletion and slow insertion/fast deletion implementations.
Can we do better than this? Yes, we can, by abandoning the array and moving to another data structure entirely: the binary search tree. Here, insertions and deletions are both O(log N) operations - in other words, the time taken for both item insertion and item deletion are proportional to the logarithm of the number of items in the structure. I won’t go into this structure here today because the binary tree suffers from one big problem: you need to worry about balancing it to maintain its insertion and deletion time properties. There are a few balancing algorithms out there (red-black, AVL, and splaying, to name but three) and they deserve articles of their own. Of course you needn’t worry too much about them here because I have a better structure up my sleeve anyway.
In a binary search tree, the nodes are arranged so that for every node, it is greater than its left child and less than its right child. This is known as strict ordering. There is a lesser ordering called the heap property that can be applied to a binary tree that just states that any node in the tree must be greater than both its children. Note that the heap property doesn’t state that the left child is less than the right child (for then the heap property becomes a strict ordering again), but just that the parent is greater than both its children. There’s one more thing to the heap property: the tree to which it applies must be complete. A binary tree is called complete when all its levels are full, except for possibly the last. A complete tree is as balanced as it can be.
So how does this help us in our quest for the perfect priority queue structure? Well, it turns out that the insertion and deletion operations on a binary tree with the heap property are O(log N), but they are significantly faster than the same operations in a binary search tree. (By the way, let’s shorten “binary tree with the heap property” into “heap”, it’ll save on pixels.) This is one instance where the big-O notation falls short, it doesn’t give any quantitative feel for which of two operations with the same big-O value is actually faster.
Anyway, how do we insert and remove from a heap? To do an insertion we perform a bubble-up operation, with removal we perform a trickle-down operation. Let’s see what these cute terms actually mean.
Insertion first. To insert an item into a heap, we add it to the end of the heap (in the only place that continues to maintain the completeness attribute - in the figure above that would be the right child of the 4 node). At this point the heap property of the tree may be violated (the new node may be larger than its parent), so we need to patch the tree up. If this new child node is greater than its parent, swap it with the parent. Similarly our new node may also greater than its new parent, and so it needs to be swapped again. In this manner we continue working our way up the heap until we reach a point where our new node is no longer greater than its parent, or we’ve reached the root of the tree. We’ve now ensured that all nodes are greater than both their children again and the heap property has been restored. As you can see, we bubble up the new node until it reaches its correct place (the root or just under a node that is larger than it), hence the term.
If you think about it, the heap property ensures that the largest item is at the root (if it weren’t, it would have a parent smaller than itself, hence the heap property would be violated). We can therefore move onto removal: the item we want is at the root. The theory would seem to indicate that we delete the root node - passing the item back as a result - but this would leave us with two separate subtrees, an utter violation of the completeness attribute of our heap. Instead we replace the root node with the last node of the heap and thereby ensure that the tree remains complete. But, again, we’ve probably violated the heap property: the new root may be smaller than one or both of its children. So we find the larger of the node’s two children and swap it over. Again, this new position may violate the heap property, so we verify whether it’s again smaller than one (or both) of its two children, and repeat the process. Eventually, we’ll find that the node has sunk or trickled down to a level where it is greater than both its children, or it’s now a leaf with no children. Either way we’ve again restored the heap property.
Pretty simple, huh? Well, unfortunately using a tree like this is pretty wasteful of space. For every node we have to maintain three pointers: one for each child, so that we can trickle down the tree, and one for the parent, so that we can bubble up. Every time we swap nodes around we run the risk of having to update umpteen different pointers for numerous nodes. So, the usual trick is to leave the nodes where they are and just swap the items around inside the nodes instead.
There is however a simpler way. A complete binary tree can be easily represented by an array. Look at that image of a complete binary tree again. Notice how the node numbers are sequential and have been allocated to the nodes from top to bottom, level by level, and from left to right on each level. This top/bottom, left/right annotation provides a nice mapping from node to element number in an array. There are no ‘holes’ in the mapping; the fact that the heap is a complete tree helps there. Now have a look at the numbers of the children for each node. The children for node 0 are 1 and 2 respectively. The children for node 3 are 7 and 8, for node 5 they are 11 and 12. Notice any pattern? In fact the children for node n are nodes 2n+1 and 2n+2, and the parent for node n is |(n-1)/2| (where the || operator means take the integer part of the bit in between the bars - in Javascript the accepted way is to use Math.floor()
, unless you want to get into the bitwise operators). Suddenly we have a simple way of implementing a heap with an array and are able to work out where the children and parent of a node are in the array. And furthermore, we can use an array again!
function createPriorityQueue(comparer) {
"use strict";
var queue = [];
var bubbleUp = function(index, item) {
// if we're at the top of the tree, set the item there
if (index === 0) {
queue[0] = item;
return;
}
// compare the item with its parent, if greater move the parent down
// and then bubbleup from the parent position
// if not, we're at the place where the item should be
var parentIndex = Math.floor((index - 1)/2);
if (comparer(item, queue[parentIndex]) > 0) {
queue[index] = queue[parentIndex];
bubbleUp(parentIndex, item);
}
else {
queue[index] = item;
}
};
var trickleDown = function (index, item) {
// calculate the index of the left child
var child = (index * 2) + 1;
// if there is one...
if (child < queue.length) {
// work out the larger of the left and right children
if (child + 1 < queue.length) {
if (comparer(queue[child], queue[child + 1]) < 0) {
child += 1;
}
}
// if the item is less than the larger child, move that larger child up
// and trickledown from that child's position
if (comparer(queue[child], item) > 0) {
queue[index] = queue[child];
trickleDown(child, item);
}
else {
queue[index] = item;
}
}
else {
queue[index] = item;
}
};
var push = function (item) {
queue.push(item);
bubbleUp(queue.length - 1, item);
};
var pop = function () {
// take care of easy cases
if (queue.length === 0) {
return null;
}
if (queue.length === 1) {
return queue.pop();
}
// return the item at the root
var result = queue[0];
// replace the root with the child at the lowest, rightmost position
queue[0] = queue.pop();
// now trickle down the root item as far as it will go
if (queue.length > 0) {
trickleDown(0, queue[0]);
}
return result;
};
return {
push: push,
pop: pop
};
}
This listing shows this new code for insert and remove. Rather than describe what’s going on, I leave you to follow the description of bubble-up and trickle-down and check the code and comments yourself; it’s all pretty easy.
(Aside: In the literature, nodes are usually counted from one instead. This makes the arithmetic a little easier: node n’s children are at 2n and 2n+1 and its parent is at |n/2|. We’ll stick with our original formulae though, since we’re using a JavaScript array.)
Having obtained a heap implementation of a priority queue, we can observe that the heap can in fact be used as a sorting algorithm: add a bunch of items all in one go to the heap and then pick them off one by one in the correct order. (Note that, as written, the items are picked off in reverse order, largest first, but with a quick change of the compare method, we can get them in the correct ascending order instead. In fact, the heap we have developed so far is known as a max-heap - the largest is picked off first - and one that works in reverse order is known as a min-heap - the smallest is picked off first.) Sorting with heaps, or more strictly with the heap algorithm, is known as… heapsort.
So just for fun, let’s play around with the heap, at least with regard to sorting. The algorithm I’ve given so far is this: assume we have a min-heap, add all the items to it, and then retrieve them one by one. If the items were held in an array in the first place, this algorithm would mean that all the items would be copied from one array to another, and then copied back. Is there any way to order the items into a heap in situ without having to have a separate work array to hold them? In other words, can we make an existing array into a heap by ‘applying’ a heap structure to it? Amazingly enough, we can, and furthermore we can do so in O(N) time, rather than the O(NlogN) time required by adding the items one by one to a separate heap. We use an algorithm called Floyd’s Algorithm.
We start out with a filled unsorted array and identify the parent of the rightmost child node (i.e., the node furthest to the right on the last complete level of the heap). Apply the trickle-down algorithm to this parent. Select the node to the left of the parent on the same level (it’ll be a parent as well, of course). Apply the trickle-down algorithm again. Keep on moving left, applying the trickle-down algorithm, until you run out of nodes. Move up a level, to the rightmost node. Continue the same process from right to left, going up level by level, until you reach the root node. At this point the array has been ordered into a heap and we could start peeling off items one by one in the usual manner.
Having ordered an array into a heap, what then? Peeling off the items one by one still means we need somewhere to put them in sorted order, presumably some auxiliary array. Or does it? Think about it for a moment. If we peel off the largest item (so the heap should be a max-heap), the heap size reduces by one, leaving space at the end for the item we just obtained. In fact, the algorithm to remove an item from a heap requires the lowest, rightmost node to be copied to the root before being trickled down, so all we need to do is to swap the root with the lowest, rightmost node, reduce the count of items in the heap, and then apply the trickle-down algorithm. Continue doing this until we run out of items in the heap. What’s left are the items sorted in the original array.
And that is heapsort.
Heapsort is important for a couple of reasons. It’s an O(NlogN) algorithm, so it’s fast. The other reason is that heapsort is equally as fast in both the general case and in the worst case. Compare this with quicksort. In the general case quicksort is faster than heapsort, but quicksort can be easily tripped up by an already sorted set of items, causing it to crawl (it becomes an O(N2) algorithm, unless we apply some algorithm improvements). In this worst case scenario, heapsort is better.
var heapsort = function (array, comparer) {
"use strict";
var trickleDown = function (index, item, countOfItems) {
// calculate the index of the left child
var child = (index * 2) + 1;
// if there is one...
if (child < countOfItems) {
// work out the larger of the left and right children
if (child + 1 < countOfItems) {
if (comparer(array[child], array[child + 1]) < 0) {
child += 1;
}
}
// if the item is less than the larger child, move that larger child up
// and trickledown from that child's position
if (comparer(array[child], item) > 0) {
array[index] = array[child];
trickleDown(child, item, countOfItems);
}
else {
array[index] = item;
}
}
else {
array[index] = item;
}
};
// cater for the simple cases where there's nothing to do
if (array.length <= 1) {
return;
}
// first heapify the array items (Floyd's Algorithm)
// last parent is at (length / 2) - 1
var lastParent = Math.floor(array.length / 2) - 1;
for (var i = lastParent; i >= 0; i -= 1) {
trickleDown(i, array[i], array.length);
}
// now remove the items one at a time from the heap,
// placing them at the end of the array, replacing
// them with the old last item, then trickle down
for (i = array.length - 1; i >= 0; i -= 1) {
var item = array[0];
array[0] = array[i];
array[i] = item;
trickleDown(0, array[0], i);
}
};
This listing shows the heapsort routine. There’s something I didn’t point out before which I should. If you are using a max-heap, you retrieve the items in reverse order, from largest to smallest. However, if you are using a max-heap to do a heapsort, you get the items sorted in ascending order in your array, not in reverse order. With a min-heap you’d peel items off in ascending order, but you’d heapsort in descending order. Just something to be aware of.
And with that, we come to the end of this couple of posts on the priority queue. We’ve come a long way in the meantime, from a simplistic priority queue that allows addition of items and their removal in priority order (with one or the other operation inefficiently coded), to a more sophisticated version that performs these two operations equally efficiently using a data structure known as a heap. Knowing what a heap was and how it works, gave us heapsort.
(A long time ago I used to write a monthly algorithms column for The Delphi Magazine, called Algorithms Alfresco. This was a Delphi programmer magazine in the UK, which sadly closed down in 2007. I've decided to reuse/revamp those articles in a long series here on my blog, but coding in JavaScript instead of Delphi. This article first appeared in the November 1998 issue.)
No Responses
Feel free to add a comment...
Leave a response
Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.
_emphasis_
**strong**
[text](url)
`IEnumerable`
* an item
1. an item
> Now is the time...
Preview of response