JavaScript for C# programmers: refactoring the expression evaluator

Another in the series in learning JavaScript from the viewpoint of a C# programmer, using Firebug as our test engine.

In this episode, we take the functioning expression evaluator from the last post and clean it up JavaScript style.

Wrap in an object

The first step is to wrap this global-properties-all-over-the-place code tidily in a single global object. This happens to be quite easy, and I used an auto-executing function to create a single object called expressionEvaluator that has a single method called exec. At the top was this:

var expressionEvaluator = function() {

And then at the bottom of the original code, I replaced the old evaluateExpression method with this code:

    return {
        exec: function(expression) {
            var state = makeParserState(expression);
            var result = parseExpression(state);
            if (result.failed) { return NaN; }
            return result.expression.evaluate();
        }
    };
}();

As you can see the function is automatically executed (the execution () operator on the last line), and it returns an object (using the object literal syntax) that has one property, a method called exec. Everything else in this object is private, hidden from view from the closure created by the auto-executing function; it is a black box, and we are free to modify it completely so long as we retain the exec method.

I can't stress this enough: these patterns (closures and auto-executing functions to create other objects/functions) are very common in JavaScript. You might call these patterns idiomatic JavaScript. I'll agree that they can be hard to spot — heck, it's only until 200 lines later that you realize that expressionEvaluator is not a function after all but an object returned by a function that's executed immediately — but it's worth learning the pattern and how the natives speak.

So with this change we can check off that particular bug: we have one global object and no longer a dozen or so. The code to call it hasn't changed that much, and it's easy to see that everything still works properly:

console.log(expressionEvaluator.exec("1+2")); // outputs 3
console.log(expressionEvaluator.exec("1+2*3")); // outputs 7
console.log(expressionEvaluator.exec("((1+2)*(3-4))+((5*6)-(7/8))")); // outputs 26.125

State machine

Now we have a single object (since the auto-executing function is only run once, there can only ever be a single object, so it's essentially a singleton), we can get rid of the state object (there was only ever one of those too), and place all its data as fields of our evaluator object. This means we can get rid of the getCurrent method and generally clean things up.

    // State maintenance
    var expr, // the original expression string
        ch,   // the current character
        at,   // where we're at in the expression string
        advance = function() {
            ch = expr.charAt(++at);
        },
        skipWhite = function() {
            while (ch && (ch <= ' ')) { advance(); }
        },
        initialize = function(expression) {
            expr = expression;
            at = -1;
            advance();
        };

I've written this as a connected set of properties, connected in the sense of a series of variable declarations separated with commas; it's another JavaScript idiom that you may come across. The reason is that JavaScript is often minimized before use, that is, all comments and unnecessary white space is removed to make the interpreter work more efficiently. Since a comma is one character, it's often used like this instead of having a set of var declarations, which is more verbose. Notice I've added a skipWhite method to skip over any white space.

The first statement in exec now becomes this:

initialize(expression);

...but we're in a load of hurt because everything expects a state object. Nothing works. Let's forge on.

The tokens

Next up is the whole rpnToken thing, with its encapsulation-breaking isToken and isOperator. It's just nasty, people. It's a poor man's way of creating something like the is operator in C#. I hang my head in shame, but at least I could say I did it on purpose so I could show how to refactor it. Yeah, that's the reason. Anyway...

The thing to do here is to push down into the rpnToken object hierarchy the functionality that's being exposed at a higher level. For example, the isOperator function is only used in one place: when an RPN expression is being evaluated:

if (token.isOperator()) {
    var rhs = stack.pop();
    var lhs = stack.pop();
    stack.push(token.evaluate(lhs, rhs));
}
else {
    stack.push(token.value);
}

Better would be to get rid of this altogether, by declaring an evaluate method on the number token, and then making both evaluate functions accept a stack parameter. This way the operator token would pop, pop, calculate, push, and the number token would just push. But, at this higher level, all we'd do is call token.evaluate(stack); and be done with it. Nice one. Even better, we can now create a unaryOperator token as well that will pop, negate, push for unary minus, and nothing much at all for unary plus.

Having done that, let's look at the whole RPN expression thing. I wrote it originally to be an exact representation of how you'd write down an RPN expression: string of simple tokens, some of them numbers, some of them operators. But we don't have to be that literal, I could rephrase the definition to be recursive: an RPN expression is one or more operands, followed by an operator. The operator will determine how many operands there are. An operand could either be a number or another RPN expression.

Whoa. An RPN expression is then merely a token in another RPN expression. To evaluate an RPN expression, we evaluate its operands, and then its operator. This is great: we now have four types of tokens: number, unary operator, binary operator, RPN expression. They will all have an evaluate method.

    // Token hierarchy
    // ...base object
    var rpnToken = {
        evaluate: function(stack) {
            throw {
                type: "Abstract",
                message: "evaluate is an abstract method"
            };
        }
    };
    // ...number object
    var makeNumber = function(value) {
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) { stack.push(value); };
        return token;
    };
    // ...unary operator object
    var makeUnaryOp = function(eval) {
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            stack.push(eval(stack.pop()));
        };
        return token;
    };
    // ...binary operator object
    var makeBinaryOp = function(eval) {
        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            var rhs = stack.pop();
            var lhs = stack.pop();
            stack.push(eval(lhs, rhs));
        };
        return token;
    };

    // ...the pre-built operators
    var operator = {
        "unary-": makeUnaryOp(function(value) { return -value; }),
        "unary+": makeUnaryOp(function(value) { return +value; }),
        "+": makeBinaryOp(function(lhs, rhs) { return lhs + rhs; }),
        "-": makeBinaryOp(function(lhs, rhs) { return lhs - rhs; }),
        "*": makeBinaryOp(function(lhs, rhs) { return lhs * rhs; }),
        "/": makeBinaryOp(function(lhs, rhs) { return lhs / rhs; })
    };

I've left out the one for the RPN expression for a moment, but look at how all of these functions create objects that, first, descend from rpnToken and, second, depend implicitly on closures. Taking the number token as an example, the evaluate method uses the value parameter. The only way it gets that is through the closure since the execution of makeNumber will have long been completed by the time evaluate will run. Notice I've also explicitly made rpnToken.evaluate an abstract method by throwing an exception if it's ever called.

The RPN expression creation function is a little special:

    // ...RPN expression object
    var makeExpression = function() {
        var expr = arguments;

        var token = Object.create(rpnToken);
        token.evaluate = function(stack) {
            for (var i = 0; i < expr.length; i++) {
                expr[i].evaluate(stack);
            }
        };
        return token;
    };

It looks like it's been declared so that it accepts no parameters. Not quite. It's going to be called with either two parameters (operand, operator) for a unary operator, or three parameters (operand1, operand2, operator) for a binary operator. To get around this in C#, we'd have to either write overloaded methods or use parameter arrays, but in JavaScript we're merely going to use the arguments array and copy it to a local variable called expr. Later on, in the evaluate method, we're going to evaluate each argument in sequence as I discussed above and we'll use the copied arguments array provided by the closure.

Parsing functions

We're now ready for some parsing action.

    // parse a binary operator (either the adds or the multiplys)
    var parseOperator = function(ops) {
        skipWhite();
        if ((ch === ops[0]) || (ch === ops[1])) {
            var op = operator[ch];
            advance();
            return op;
        }
        return null;
    };

This method parses a binary operator. The operators come in pairs (plus/minus and multiply/divide) so I wrote a generic method that gets passes a two-element array containing the operator characters in question. Notice that now the parsing functions are returning an RPN expression and not one of the success/fail result objects (they're gone, toast). If there's a failure we just pass back null.

Next up is parsing a parenthesized expression:

    // parse a parenthesized expression
    var parseParens = function() {
        advance();

        var result = parseExpression();
        if (!result) { return null; }

        skipWhite();
        if (ch !== ')') { return null; } // missing right paren
        advance();

        return result;
    };

It is assumed here that the function will be called with the state machine already positioned on the opening parenthesis (which is how it's done in the only place it's called), so we can just move past it. This function also has an identifiable error: the missing right parenthesis. Later on, we can hook up an error function here to report that back to the caller of the expression evaluator, but for now we'll just note it as a comment.

Parsing a number:

    // parse a number
    var parseNumber = function() {
        var value = '';
        while (('0' <= ch) && (ch <= '9')) {
            value += ch;
            advance();
        }

        if (ch === '.') {
            value += '.';
            advance();
            while (('0' <= ch) && (ch <= '9')) {
                value += ch;
                advance();
            }
        }

        if (!value) { return null; } // number missing
        var number = +value; // force conversion to number type
        if (isNaN(number)) { return null; } // invalid number
        return makeNumber(number);
    };

Again we have a couple of identifiable errors (a number is expected but is missing, a number couldn't be converted from the string); again points at which we can add an error function.

Parsing a factor:

    // parse a factor (either expression in parentheses or number)
    var parseFactor = function() {
        skipWhite();
        if (ch === '(') { return parseParens(); }
        return parseNumber();
    };

As you can see, when parseParens is called, the state machine is pointing at the open parenthesis.

Now parsing a unary expression (yes, I added them):

    // parse a unary expression (factor or +/- factor)
    var parseUnaryExpr = function() {
        skipWhite();
        if ((ch === '-') || (ch === '+')) {
            var op = operator["unary" + ch];
            advance();
            var operand = parseFactor();
            if (!operand) { return null; }
            return makeExpression(operand, op);
        }
        return parseFactor();
    };

Notice how I index into the operator array to differentiate the unary minus and plus from their binary brethrens. Also I could "throw away" the unary plus here, if I wanted: it has no effect when evaluating an expression. Also note that this is where I call makeExpression with only two parameters for the unary operators.

And finally the remaining parser functions.

    // parse a binary expression (operand operator operand)
    var parseBinaryExpr = function(parseOperand, operators) {
        var operand = parseOperand();
        if (!operand) { return null; }
        var rpn = operand;

        var operator = parseOperator(operators);
        while (operator) {
            operand = parseOperand();
            if (!operand) { return null; }
            rpn = makeExpression(rpn, operand, operator);
            operator = parseOperator(operators);
        }

        return rpn;
    };
    // parse a term (unaryexpression multop unaryexpression)
    var parseTerm = function() {
        return parseBinaryExpr(parseUnaryExpr, ['*', '/']);
    };
    // parse an expression (term addop term)
    parseExpression = function() {
        return parseBinaryExpr(parseTerm, ['+', '-']);
    };

parseBinaryExpr does most of the work: it's called with a function that parses an operand, and with an array containing the binary operators to look out for. makeExpression is called with three parameters here.

After all these changes, we can now look at the refactored exec function:

        exec: function(expression) {
            initialize(expression);

            var result = parseExpression();
            if ((!result) || (ch !== '')) { return NaN; } // badly terminated expression

            var stack = [];
            result.evaluate(stack);
            return stack.pop();
        }

Here we see the final recognizable error where we've parsed the expression but there's still more left (an example would be "(1+2)3"). We also see where the stack gets created, the RPN expression evaluated, leaving the result on the stack, which can then be popped and returned.

Summary

Now that we've seen a non-trivial conversion of a C# project to JavaScript, what conclusions can we derive?

The first thing is closures are important. Really important. You have to understand closures and how they work because you'll see them all over the place. In this small example, we have the large closure that creates the expressionEvaluator object (where only one property is public and everything else, the vast majority of the code in fact, is private and hidden); we have the smaller closures that create the token objects.

Second, classical class hierarchies are not as important as in a classical class language like C#. I had to work hard to even get one: the tokens hierarchy. In fact, we could remove the base token object altogether and the evaluator would work just as well. The reason for this is of course JavaScript only tries to evaluate a property at run-time, there's no need to worry about it at coding time. If every token object has an evaluate method, it doesn't matter whether they're descended from a base object with that method, or whether they all independently define one, JavaScript will just call it. In fact, although this example didn't show it particularly, creating an object with some behavior (such as the expressionEvaluator object) is virtually a zero-energy exercise, compared with C#, where we have to write a full-blown class first.

Third, functions are objects. Create them when you need them, pass them around like candy. In this example, the operator tokens are created with function objects that know how to evaluate the operator.

Fourth, learn the idiom. In this example, I had only a few examples. The first and biggest, is the auto-executing function that creates the whole evaluator object in the first place (those two parentheses at the end are easy to miss when you're scanning code). Second, learn what values evaluate to false in a conditional expression. In this code, I was using the fact that null evaluates to false (an example: if (!operand) instead of writing the more long-winded if (operand === null), or the code in skipWhite which checks for an empty character).

Fifth, avoid global object pollution. My original code had umpteen methods created on the global object, the code in this post, just one. Like any language, global variables and methods are bad.

Album cover for (the rest of) New Order Now playing:
New Order - Confusion [Pump Panel Reconstruction Mix]
(from (the rest of) New Order)


Loading similar posts...   Loading links to posts on similar topics...

7 Responses

 avatar
#1 Jamie said...
08-Mar-09 4:41 PM

Thanks Julian. Enjoying this series, suitably detailed and nicely explained. Some subtleties that I've never really got my head around before!

julian m bucknall avatar
#2 julian m bucknall said...
08-Mar-09 4:52 PM

Jamie: Glad you're enjoying it. Lots more to come...

Cheers, Julian

#3 Dew Drop – March 9, 2009 | Alvin Ashcraft's Morning Dew said...
09-Mar-09 5:37 AM

Pingback from Dew Drop – March 9, 2009 | Alvin Ashcraft's Morning Dew

#4 JavaScript for C# programmers: refactoring the expression evaluator : Algorithms for the masses - Julian M Bucknall said...
09-Mar-09 6:07 PM

Thank you for submitting this cool story - Trackback from DotNetShoutout

 avatar
#5 Web & SEO developer indonesia said...
27-Nov-09 11:51 AM

Great post. Thanks for the code.

 avatar
#6 santaClara said...
02-Apr-11 2:48 AM

very good article(s) ... hence I hope you don't mind me posting typos {

a evaluate; an conditional ;descended form;

}

julian m bucknall avatar
#7 julian m bucknall said...
02-Apr-11 8:26 AM

santaClara: Thanks for these edits, much appreciated. The "form"/"from" thing is one of those typos my fingers just make on their own. It's like when I'm tired I type "calls" instead of "class": my right hand is way ahead of my left, or something.

Cheers, Julian

Leave a response

Note: some MarkDown is allowed, but HTML is not. Expand to show what's available.

  •  Emphasize with italics: surround word with underscores _emphasis_
  •  Emphasize strongly: surround word with double-asterisks **strong**
  •  Link: surround text with square brackets, url with parentheses [text](url)
  •  Inline code: surround text with backticks `IEnumerable`
  •  Unordered list: start each line with an asterisk, space * an item
  •  Ordered list: start each line with a digit, period, space 1. an item
  •  Insert code block: start each line with four spaces
  •  Insert blockquote: start each line with right-angle-bracket, space > Now is the time...
Preview of response