PCPlus 322: Finding palindromic substrings

For June 2012, it was back to some computer science. The problem posed was: given a string of characters, find the longest subsequence in that string that is a palindrome. So, taking the example ‘abbaca’ (hey, computer science is full of nonsense strings), it’s fairly obvious there are two palindromic substrings – ignoring those of length 1, which are just silly – ‘abba’ and ‘aca’. The article delves into some algorithms for solving this problem, from the O(n^2) version to a more intricate O(n) one.

PCPlus logoThe first one is very simple to describe: at each character or between two characters, look at the previous and next characters. If they are the same, look at the previous and next to those. Continue like this until the character comparison fails, and make a note of the position and length of the palindrome substring. Since the number of positions to start making these comparison tests is proportional to the number of characters in the string (roughly 2n, if there are n characters), and there’s a possibility of comparing all the characters (for n/2 comparisons) at each point, the algorithm is O(n^2).

The second algorithm is more subtle and requires the use of a data structure derived from suffix trees, known as suffix arrays. A suffix tree is a structure that stores in sorted order all the suffixes of the string. A suffix is just the remainder of the string from a given character in the string. So for ‘abbaca’, the suffixes are ‘abbaca’, ‘bbaca’, ‘baca’, ‘aca’, ‘ca’, and ‘a’. There is a wrinkle though: we calculate the suffixes of the string concatenated to its reverse and use another array of information, the longest common prefix array, to determine the longest palindromic substring. (For more information, read the article.) Since this algorithm is dominated by the creation of the suffix array, a O(n.log n) algorithm, this more subtle algorithm turns out to be O(n.log n) itself.

I then outline another, more complex algorithm, Manacher’s Algorithm that uses some clever reasoning to be able to skip ahead in the string as it works out where the longest palindrome may be. Since it just makes a single pass through the string (with the skipping), it is considered to be a O(n) algorithm; that is, linear in time.

From a simple recreational puzzle, the article goes through some unfamiliar computer science to get at a pretty remarkable algorithm.

This article first appeared in issue 322, June 2012.

You can read the PDF here.

(I used to write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK, which sadly is no longer published. The column was called Theory Workshop and appeared in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so.)

Loading similar posts...   Loading links to posts on similar topics...

2 Responses

#1 Bruce said...
24-Jun-13 8:12 AM

I almost feel I should apologise for this pedantry before I start, so I will - sorry.

I think the example in the article has been nobbled by the printers. The article explicitly state that it will be strict and consider all characters as significant. The solution is given as "a racecar a" but the original sentence includes "a race car a". That is, the original sentence includes a space character between "race" and "car" that isn't palindromic, when including punctuation.

However, this didn't detract from the article which I, as always, found interesting. I really appreciate you re-publishing these articles. I find them a useful reminder that many problems already have interesting algorithmic solutions and there's no need to re-invent the wheel.

julian m bucknall avatar
#2 julian m bucknall said...
24-Jun-13 9:10 AM

Bruce: you're right. I'd spotted it when the article came out (and banged my head on the desk), but I'd forgotten about it when I wrote this post. I'd always had issues with mathematical expressions when they were typeset for the mag, but I thought I was immune from an expression in English being changed...

Cheers, Julian

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

  •  Emphasize with italics: surround word with underscores _emphasis_
  •  Emphasize strongly: surround word with double-asterisks **strong**
  •  Link: surround text with square brackets, url with parentheses [text](url)
  •  Inline code: surround text with backticks `IEnumerable`
  •  Unordered list: start each line with an asterisk, space * an item
  •  Ordered list: start each line with a digit, period, space 1. an item
  •  Insert code block: start each line with four spaces
  •  Insert blockquote: start each line with right-angle-bracket, space > Now is the time...
Preview of response