For June 2012, it was back to some computer science. The problem posed was: given a string of characters, find the longest subsequence in that string that is a palindrome. So, taking the example ‘abbaca’ (hey, computer science is full of nonsense strings), it’s fairly obvious there are two palindromic substrings – ignoring those of length 1, which are just silly – ‘abba’ and ‘aca’. The article delves into some algorithms for solving this problem, from the O(n^2) version to a more intricate O(n) one.
The first one is very simple to describe: at each character or between two characters, look at the previous and next characters. If they are the same, look at the previous and next to those. Continue like this until the character comparison fails, and make a note of the position and length of the palindrome substring. Since the number of positions to start making these comparison tests is proportional to the number of characters in the string (roughly 2n, if there are n characters), and there’s a possibility of comparing all the characters (for n/2 comparisons) at each point, the algorithm is O(n^2).
The second algorithm is more subtle and requires the use of a data structure derived from suffix trees, known as suffix arrays. A suffix tree is a structure that stores in sorted order all the suffixes of the string. A suffix is just the remainder of the string from a given character in the string. So for ‘abbaca’, the suffixes are ‘abbaca’, ‘bbaca’, ‘baca’, ‘aca’, ‘ca’, and ‘a’. There is a wrinkle though: we calculate the suffixes of the string concatenated to its reverse and use another array of information, the longest common prefix array, to determine the longest palindromic substring. (For more information, read the article.) Since this algorithm is dominated by the creation of the suffix array, a O(n.log n) algorithm, this more subtle algorithm turns out to be O(n.log n) itself.
I then outline another, more complex algorithm, Manacher’s Algorithm that uses some clever reasoning to be able to skip ahead in the string as it works out where the longest palindrome may be. Since it just makes a single pass through the string (with the skipping), it is considered to be a O(n) algorithm; that is, linear in time.
From a simple recreational puzzle, the article goes through some unfamiliar computer science to get at a pretty remarkable algorithm.
This article first appeared in issue 322, June 2012.
You can read the PDF here.
(I used to write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK, which sadly is no longer published. The column was called Theory Workshop and appeared in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so.)