Testing for equality with floating-point types

published: Mon, 11-Jul-2005   |   updated: Thu, 21-Jul-2005
Helmsley Castle, N. Yorks

Every now and then, someone "discovers" that equality expressions using floating point types don't resolve into the correct boolean value. At least most of the time. (This is a requested reprint, updated, from my old Falafel posts.)

It should come as no surprise to anyone that decimal floating point values are held and manipulated as binary values in the processor, for exactly the same reasons as decimal integer values are held as binary integers: it's easier to design the silicon to manipulate them. I'm also sure everyone has read What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg, but just in case let's review some of the basics.

From our human (decimal-based) viewpoint a number like 0.1 is precise. We can add 10 of them together, one my one, and get the perfectly accurate result of 1.0. There's no round-off, there's no approximations. This doesn't surprise us, because we learned it all at school. Unfortunately although that may be true in our ten-fingered world, it's not in the computer's binary on/off world.

So what is decimal 0.1 in binary? Before we calculate that, let's review how to convert positive decimal integer values into binary. I'm sure you know that we divide continually by 2 and make a note of the remainder each time, until we reach zero as dividend. So to convert decimal 10 into binary, we divide it by 2 to get 5 with remainder 0. Write down the 0. Divide the 5 by 2 to get 2 remainder 1. Write down the 1 to the left of the previous 0 to make "10". Divide the 2 by 2 to get 1 and remainder 0. Write down the 0 to the left of the "10" we already have to make "010". Finally divide the final 1 by 2 to get 0 (which means stop) with remainder 1. Write this 1 to the left of our "010" to give the final answer "1010" as being the binary value of decimal 10. Easy.

(To those who'd forgotten their high-school math lessons: this also works for other bases as well. So 10 (base 10) is equal to 101 (base 3) and this answer is calculated in the same way, except you keep dividing by 3.)

Well converting decimal fractions into binary fractions is just as easy. Here, though, we multiply by 2 all the time. We look at the integer part of the answer (it'll be 0 or 1) and write that down after a binary radix point, resetting the answer's integer part to 0. So for 0.25, for example, we multiply by 2 to make 0.5. The integer part is 0, so we write 0 after a binary radix point: ".0". Multiply by 2 again and we get 1.0. The integer part is 1, so we write that to the right of what we have so far to make ".01". Reset the integer part of the dividend to 0, which makes 0., and this indicates to us that we stop. So, 0.25 decimal is 0.01 binary. (Notice that 0.01 in binary means zero halves plus one quarter, just like 0.01 in decimal means zero tenths plus one hundredth. Hence 0.01 binary = 1/4 decimal = 0.25.)

(And again this algorithm works for other bases: just keep multiplying by the base and make a note of the integer parts. 0.25 (base 10) is 0.02020202 recurring (base 3). Or, if you like: 1/4 == 2/9 + 2/81 + 2/729 + ...)

Back to 0.1 decimal now. What's this in binary? 0.1 times 2 gives 0.2, so our first binary digit is 0. 0.2 times 2 is 0.4, giving another 0 digit. 0.4 times 2 is 0.8: another zero. So far we have ".000" for our binary fraction. 0.8 times 2 is 1.6. This time we have a 1 digit to append (".0001") and we strip off this 1 from the dividend to give 0.6. 0.6 times 2 is 1.2, so we get another 1 digit for our fraction (".00011") and we strip off the 1 from the dividend to give 0.2. Before we go on, notice something. We've seen 0.2 before so we know what digits are now going to appear: we're in a never-ending loop. So, the answer we get for the binary representation of 0.1 decimal is 0.000110011...

This is an infinitely repeating fraction, with the "0011" repeating ad infinitum. There is no precise binary value that is equal to decimal 0.1, apart from this infinite fraction. Indeed, it can be shown that the only finite binary equivalents of decimal fractions are the sums of powers of 1/2. So, for example, 0.25, 0.5, 0.75, 0.625 can be converted to a precise finite binary fraction, but the majority of precise finite decimal fractions cannot. If you think about it, this is equivalent to saying that only sums of powers of 1/2 and 1/5 have finite decimal fractions; so for example, 1/7 has an infinite decimal representation. (But note that it has an exact representation in base 7: that is, 0.1.)

The big problem is that the floating point types we use, single, double, float, extended, are finite in size. In other words, to represent decimal 0.1 as a value of single type, we have to cut off this infinite binary fraction somewhere. In essence, the single type stores 24 significant binary digits from and including the very first 1 digit. So decimal 0.1 is stored in a single type as 0.000110011001100110011001101. (The final digit is a 1 because the compiler rounds it up for us in the conversion.) This is an approximation, remember, it's a little larger than the actual value. For those who really want to know, this binary approximation, when converted to decimal, is approximately 0.10000000149. And from this you can see that single is significant to roughly 7 or 8 decimal digits.

What happens when you add ten of these almost-there-but-not-quite approximations to decimal 0.1? Well, for a start, the processor will promote these single values to its internal type (known as extended in Delphi, but it doesn't exist in C#). This has 64 significant binary digits. Note that the FPU doesn't say "Whoa, dude, this is really decimal 0.1, so let's work out a more accurate representation with my 64 bits." Nope, it just moves the 24 bits from the single value, and adds 40 zeros on the end. It then does the addition to produce a 64-bit answer, and this then gets rounded back into 24 bits. If we're lucky we'll end up with 1.0, but in fact we end up with 1.0000001192 in decimal or 1.00000000000000000000001 in binary.

Of course, if we multiply our single value by 10 instead, we get the miraculous result of 1.0 exactly. Why? Because the calculation is with 64 significant bits in the FPU, producing one plus a single binary digit as the 27th significant bit, and this least significant bit is lost when the result is rounded back into a single. (The rounding the FPU does is to look at the first binary digit that will be lost -- here the 25th bit -- and if it is zero the FPU just truncates the answer to fit, if it is one, one is added to the previous binary digit.)

If you move a single value into a double value, you'll maintain the same accuracy as in the single, but of course, the answer you get won't be the same as the original value. (This applies to float in C# as well, float is the same as single). This is for essentially the same reason as above: a double has 53 significant binary digits.

So, let's assume we have the following code:

decimal A = 4.2M;
double B = 0.1F * 42F;
float C = 0.1F * 42F;
double D = 0.1D * 42D;

The variable A will hold exactly 4.2, because the decimal type has been designed to hold decimal values exactly. The decimal type in .NET is a scaled large integer and not really a floating point type at all. By scaled, I mean that values in a decimal instance are held as an integer together with a power-of-ten divisor. So, 4.2 is held as 42 with 1 as the power of ten divisor (in other words: 4.2 == 42 / 10^1), or as (42, 1) in some human readable array-like form. The exponent value can vary from 0 to 28, and the integer value is 96 bits long. The interesting thing about decimal is the number of different representations of a particular value: 4.2 can be represented as (42, 1) or (420, 2) or (4200, 3) and so on; no normalization is applied until the value is used in some expression. (See here for more information on decimal types.) Anyway, where were we...?

The second line is more problematic. The compiler will generate IL to set B to the hex value 0x4010CCCCD1000000 (approximately 4.20000006258488). Essentially what has happened here is that 0.1 was converted to a single value (as we saw above) and this was then translated to the internal representation for the FPU. Ditto for 42 (although note that this would be an exact conversion). The two numbers are multiplied and the result is rounded to fit into a double (actually no rounding takes place since the double type has enough room for all digits). It does not go through any "rounding to float" step.

In the third line, the result of the multiplication is rounded to fit into a float, losing some accuracy in the process.

The final statement converts the constants to doubles, promotes them to the internal FPU type, the multiplication takes place and the answer is rounded to fit into the variable D.

(By the way, looking at ILDASM or Reflector output here is misleading, since the disassembler converts the binary floating point value into a decimal floating point value for display. You are not looking at the real value being used in the IL. The only way to see exactly what's happening is with a hex viewer.)

So, in theory at least, we now have four variables all with the same value, 4.2. Except that they're not. If you compare any two of A, B, C, and D for equality, you will get false for all six possible tests. Huh?

We have A as exactly 4.2 for the reasons explained above. B is not quite equal to 0.1F * 42F, and is definitely not equal to 4.2F. C is equal to 0.1F * 42F, which due to rounding errors, is not equal to 4.2F. D is equal to 0.1D * 42D, which may or may not be equal to 4.2D (I haven't checked). Using this description in English, you can see why all of the tests for equality fail at this point. Marvelous.

To summarize, testing for equality between binary floating point values is doomed to nasty little bugs, and bugs that remain hidden because we generally only look at these values as strings that are the result of conversions to decimal base. You should never test for equality between two floating point values; instead you should check for the two values to be "close enough to be considered equal." What that closeness measurement is, is entirely up to you. (Essentially, subtract the two values, take the result's absolute value, and then test that to be less than some value you decide on, such as 0.000001. If it's less, the two original values are "equal", otherwise they are not.)

Another point is that you shouldn't mess around in converting between several different floating point types in your apps. Stick to one type, say double. Make sure all your floating point constants are suffixed with D. Of course, if you are mainly doing commercial or financial calculations, look at the decimal type. You won't suffer from these awkward round-off errors, but it's slower to use (since calculations are done in software, not hardware).

The final point is that if you really want to discover why equalities fail, be prepared to hunt for byte values in a hex viewer. Don't trust binary floating point numbers that have been converted to strings: they are only approximations.