Weirdly, someone pinged me about a week ago asking about the A* algorithm. Wut?, I thought to myself, and then remembered writing an article about it way back when. So, here goes…
The A* algorithm (pronounced “A Star”) is an algorithm that tries to find the best path to go from here to there (imagine me pointing and hand-waving at this juncture). As a definition that’s a little woolly, so let’s see if I can’t add some more weight. By “best path” I mean a path that is smallest in “distance” or “time” or “effort” (or all of the above) to go from one state to another in some problem space. Mmm, we seem to be drowning in abstractions so let’s be a little more concrete.
You are writing a strategy game. The computer is in charge of one or more workers (like farmers, woodsmen, soldiers) and has to maneuver them around some dynamic landscape that has obstacles in the form of walls, trees, or water or other workers. How does the software calculate the best path for a worker to go from point A to point B and avoid (or go round) the obstacles in its way?
You are writing a mapping program which has to provide driving directions to go from point A to point B. The program has to take account of various issues: highways tend to be faster than side roads; traveling at a faster speed uses more fuel and is therefore more expensive; road works can disrupt the traffic flow; toll roads cost money to use; and so on.
You are writing a solver program for a sliding tile puzzle. The most familiar example of such a puzzle is a 4x4 square board containing 15 square tiles (that is, one of the possible 16 is not present and so there’s an empty space), numbered from 1 to 15. The tiles can be slid into the empty space, one by one, left/right or up/down, and the board has a lip around it to stop you sliding the tiles off the board. The puzzle is to mix the tiles up (by sliding them around randomly) and then bring them back into numerical order. What’s the quickest way of doing that?
In each of these examples we have some starting point and a goal we’re trying to reach. Both of these are known as states and the problem space we’re considering is known as the state space. The state space governs whether (and how) we can move from one state to another. In effect, what we need to do in order to solve the problem is to generate states until we reach the goal. The generation of successive states would presumably be under the control of some algorithm, otherwise we’re generally left with a huge number of calculations to do.
In the sliding puzzle game I described above, there are 16! (16 factorial) different states in the game (in other words, there are 16! different ways to arrange the 15 tiles and the one space on the board). That’s 2.0 * 1013 different states, quite a few and enumerating them all just to find the quickest way to go from our original mixed up state to the final in-numeric-order state is a little overkill even if it were practical. If we wanted to write an application that worked on higher order puzzles (such as a 10x10 version with 100! or 10158 possible states) we’d have to find a better way to perform the calculation.
These problems have some common characteristics. Their state spaces are huge. Finding the optimal solution to a particular problem would generally take too long for the solution to be usable (imagine if the map program produced the optimal route in more time than it took to drive from point A to point B by the most direct route!). Since the optimal solution is so expensive to calculate, we’d settle for a “good enough” one, providing that it could be calculated in a practical amount of time.
Enter goal-seeking algorithms. These algorithms are designed to solve these kinds of path problems we’re talking about, but in an efficient, relatively quick manner. The solutions they produce may not be optimal (in fact, it’s unlikely we’ll ever produce an optimal result) but the solutions will be close enough not to matter.
The goal-seeking algorithm we’ll look at this time is known as the A* algorithm. (I’m afraid that I don’t know why it has that name, but it’s probably from the paper that originally introduced the concept.) We’ll discuss the algorithm after talking about where it came from and what it was designed to do.
One of the most famous algorithms in graph theory is Dijkstra’s algorithm, used for finding the smallest cost path from one node to another in a graph. Imagine a graph or network of nodes (sometimes called vertexes), connected by edges. Imagine furthermore that the edges have a cost associated with them, so that it costs you to move from one node to another along one of these edges. The cost could be time, it could be monetary, it could be effort, it could be distance, anything so long as it is a positive value.
So, for example, imagine a set of cities connected by highways. Here the cost of an edge (that is, the highway connecting two cities) would be the distance between the cities at its endpoints. Conversely it could be the cost of fuel to drive along that highway. What Dijkstra’s algorithm does is to find the route between two cities A and B such that this cost is the smallest (or, in the more general case, it finds the path from node A to node B along the various edges, visiting other nodes, such that the cost function is minimized).
It does this by a priority-first traversal. You’re starting at city A and let’s assume that the cost function is the distance to drive. A has highways to two other cities, D and E. The cost of going to D is the distance along the highway to D. Ditto for E. For these two cities we make a note of the distance to each from A. Select the city that has the smallest distance value from A that we haven’t visited yet. Suppose this is D and look at the cities we can get to from there. One of them is obviously A, but we’ve already been there so we ignore it. Say there are three other cities, G, H, and I. We can calculate the cost of reaching each of these cities from A by adding the distance from A to D to the distance from D to each of them. Now suppose that we can also reach G directly from E. The minimum cost of getting to G from A is obviously the smaller of the cost through D and the cost through E. (In this idealized example, if the distance to G through E were smaller than the distance via D, you’d travel through E.)
You continue this process, selecting the unvisited city with the smallest distance (cost) from A as you spread out from the cities already visited. It’s as if you are spreading a net wider and wider. If you hit a city already visited, you update its cost (if it is smaller). Eventually you’ll find that you select the goal city B (providing the graph is connected, etc). The cost associated with B is the solution to the problem.
If you want the actual path that has the smallest cost, when you update the cost for a city, you also save the city (or node) from whence you came with this smaller cost. At the end you walk these links backwards to find the path. (Use a stack to get it in the correct order.)
Notice a couple of things about Dijkstra’s algorithm. First, it’s thorough. Even though it’s guided by visiting the node in the graph with the smallest cost first, it could visit a lot of the nodes (if you let it visit all the nodes you’ll find out the smallest cost to get to each node from the start). Second, because it’s thorough, it is guaranteed to find the path with the smallest cost. Third, all costs have to be positive. This one is slightly more difficult to grasp, but imagine this scenario: the cost of an edge was the monetary cost of your fuel. For some reason there’s a petrol station on the highway between X and Y that’s giving away fuel as a promotion. The smallest cost in this scenario would involve continually driving back and forth along this highway: you’d make money every time.
Dijkstra’s algorithm is great then if you have a smallish number of nodes: it’ll calculate the smallest cost very quickly for moving from one node to another. However, if there is a very large number of nodes, it’s going to be really slow going. For example, in theory we could use Dijkstra’s algorithm to solve the sliding tile puzzle with each state being considered a node. But we might have to enumerate a lot of the possible states, and, as we’ve seen, there are 2.0 * 1013 of those. I’m afraid that I don’t have enough disk space for that, let alone main memory.
So this is one of those cases where Dijkstra’s algorithm, no matter how effective at finding the one true answer, just doesn’t cut it. I’m reminded of the days when Rubik’s Cube was all the rage: there were several papers on the mathematics behind solving the cube, and one of them included the assertion that any cube, not matter how disorganized, was only a dozen or so moves away from the solution. It was called God’s algorithm because it was computationally infeasible to calculate it.
Consider the city problem again. Say city B was to the east of city A. If we were using the distance metric as our cost function, we could “optimize” Dijkstra’s algorithm a little bit, and concentrate on cities that were to the east of city A. It makes sense to ignore the cities to the west since going that way would be going in the wrong direction and would just add to the cost for no benefit. Going to the north and south, well maybe, it just depends.
So we could use some kind of heuristic (a guiding function) to help us towards our goal. Instead of blindly checking all cities reachable from the current city and then all cities reachable from those, the function could give greater preference to those cities that are more eastwards and thereby guide us towards B instead of completely analyzing every city we visit. The hope is that the heuristic helps us calculate a pretty optimal solution rather than the optimal one, but does so in a much shorter time (and using much less space) so that the sub-optimality doesn’t matter too much.
This is the essence of the A* algorithm. For each node in the graph that we visit we calculate two values. The first value is the exact cost to reach that node from our starting node (this value is known in the literature as g). This is the same as the cost value we were calculating for Dijkstra’s algorithm. The second value, known as the heuristic, is an estimate of the cost to go from here to the target node (this value is known as h). These two values are added together to produce an overall estimated cost (known as f, and so f=g+h). Basically we then “expand” nodes with the smallest f that we’ve identified so far until we reach the target node.
Let’s assume that we start at node A and we want to get to node B. Create two lists, one called Open and one called Closed. The Open list is for nodes we have seen but have not expanded yet, whereas the Closed list is for nodes that have been expanded; we’ll see how they’re used in a moment. Consider node A. Its g value is 0 (the cost to get to A from A is zero, of course) and we calculate its h value using the heuristic function. Add the A node to the Open list.
Now we go into a loop, popping off nodes from the Open list and expanding them until we reach the target node B. Find the node in the Open list that has the smallest f value. Call this the current node. If it is the target node, we’re done, and we can break out of the loop.
If it isn’t, we now need to “expand” it, which means work out all of the successor nodes for it, that is, all of the nodes that are reachable from this current node. For every one of these successor nodes we need to work out first its g value, that is, the cost to get here from the start node through the current node (simple: it’s the g value for the current node plus the cost to go from the current node to this successor node).
Now the tricky bit. Try and find this node on the Open list. If it’s there it’ll either have a smaller (or equal) g value or a larger one. If the one in the list is smaller (or equal) then we’ll just discard the successor node we just found, and go process the next successor node.
Otherwise, do the same for the Closed list. Try and find the successor node on the Closed list, if it’s there and has a smaller (or equal) g cost then discard the successor node we’re holding and go process the next successor node.
Otherwise (that is, the successor node’s either not on either list or it’s smaller than the equal nodes there), we remove the equal nodes on both lists (remember they may not be there, so we have to be careful in the implementation), calculate its new h value (and hence the f value), set its parent to the current node, and add it to the Open list.
Finally, after processing all its successor nodes in the same manner, we add the current node to the Closed list.
This algorithm may seem complicated, especially with all those operations on the two lists, but think about what’s going on. If we come across a successor node that we’ve seen before (whether it has been expanded or not, that is, is on the Closed list or the Open list), but whose g value is now smaller than before, it behooves us to ignore the fact that we’re now seeing it again and instead treat it as being brand new (that is add it to the Open list again). That way, we can be sure that we recalculate all the right values for all of its successor nodes.
There are still several gotchas about this algorithm though. First: what on earth is this heuristic function? Where do we get it from? At the moment it seems almost magic. Second: we have to be careful with regard to the node class when we come to write this algorithm since we’ll have instances that are equal (that is, they describe the same node), but that have different g values and even different parents. Third: which container class should we use for these Open and Closed lists? They seem to have qualities that go best with a priority queue (get the node with the smallest f value) and other qualities that are better served by a hash table (get the node equal to this one). Two of my favorite containers in one!
So, with these caveats programmer in mind, next time we’ll see how we can implement the algorithm itself.