I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year — there's an Xmas issue as well — so it's a bit more than monthly). The column is called Theory Workshop and appears in the back of every issue. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after say a year or so. After all, the PDFs do appear on each issue's DVD after a couple of months.
This time around I decided to talk about how regular expressions work and how they're implemented. As I mention in the article, regular expressions still fascinate me after many years.
I suppose this all goes back to my fascination with parsers. I like writing parsers, they're hard but fun. I think the ability to write a functional parser (doesn't have to be pretty) is the entryway into becoming a better programmer. After all the main tool we use to create or applications is a compiler: a parser with a specific backend. I'm not saying that the parser you write should be all-encompassing, merely that the act of writing one gives you deeper insight into all kinds of programming ideas and issues. You have to decide whether to be bottom-up or top-down, recursive or not. You have to use trees and other basic data structures. You are learning by doing.
I've now written a couple of regular expression parsers and have written the code to apply them to a string to be matched. Were they the best regex compilers ever written? Heck, no. But I now understand the mechanics of parsing a regex, why certain regexes are very fast and why others would take centuries to complete. I understand about backtracking, and the different ways to implement it. And so on, so forth.
I used this paper as a reference: it reactivated some of the fascination and enthusiasm.
This article first appeared in issue 264, January 2008.
You can download the PDF here.
Now playing:
Daho, Etienne - La Baie
(from Corps et Armes)
No Responses
Feel free to add a comment...
Leave a response
Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.
_emphasis_
**strong**
[text](url)
`IEnumerable`
* an item
1. an item
> Now is the time...
Preview of response