Floating-point rounding error

Tharaka Dissanayake
Jun 25, 2022


Sometimes a computer calculates numbers differently from the way a person would. The following examples do not give the expected answers.

Example 1:

[Image: Example 1 problem]

This program is expected to stop after running 99 times, but it never stops: it is an infinite loop.
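The original program appears only as an image, so here is a hypothetical stand-in in Python (not the author's code): a loop that adds 0.1 until a counter equals an exact target. The function name `float_loop_hits_target` is illustrative. A guard returns early once the counter passes the target; without it, the `!=` test would never become false and the loop would run forever, just as in Example 1.

```python
def float_loop_hits_target(step: float, target: float) -> bool:
    """Hypothetical stand-in for Example 1: add `step` until x == target.

    Returns True if x lands exactly on target, or False if x jumps past
    target without ever equalling it. Without the guard, the loop
    `while x != target` would never end in the False case.
    """
    x = 0.0
    while x != target:
        if x > target:
            return False  # overshot: the exact target was never reached
        x += step
    return True


print(float_loop_hits_target(0.1, 1.0))  # False: ten steps give 0.9999999999999999
print(float_loop_hits_target(0.5, 1.0))  # True: 0.5 is exact in binary
```

The only difference between the two calls is the step size: 0.5 has an exact binary representation, while 0.1 does not.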

[Image: Result of Example 1]

Example 2:

[Image: Example 2 problem]

The expected output is 99.99, but the actual result is 100.

[Image: Result of Example 2]

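Why does this happen? Because of floating-point rounding error: most decimal fractions cannot be represented exactly in binary, so the computer stores the nearest value it can.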

Computers represent floating-point numbers using the IEEE 754 standard.

[Image: Double-precision IEEE 754 floating-point standard]

[Image: Floating-point representation with the IEEE 754 standard]

The following example converts 9.1 into the IEEE 754 single-precision (32-bit) format, which has 1 sign bit, 8 exponent bits and 23 mantissa bits.

Step 1:

Convert 9.1 into binary.

9.1 -> 1001.00011001100110011001100…
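Step 1 can be checked with repeated doubling: the integer part is written in binary as usual, and each doubling of the fractional part pushes the next binary digit to the left of the point. The sketch below uses a hypothetical helper, `to_binary`, that is not from the original article; it relies on Python's `fractions.Fraction` so that 9.1 is held exactly (a `float` would already be rounded).

```python
from fractions import Fraction


def to_binary(value: Fraction, frac_bits: int) -> str:
    """Write a non-negative rational number in binary, keeping `frac_bits`
    digits after the binary point (extra digits are cut off, not rounded)."""
    integer_part = value.numerator // value.denominator
    fraction = value - integer_part
    digits = []
    for _ in range(frac_bits):
        fraction *= 2                    # move the next binary digit left of the point
        bit = 1 if fraction >= 1 else 0  # that digit is 1 exactly when the result is >= 1
        digits.append(str(bit))
        fraction -= bit
    return f"{integer_part:b}." + "".join(digits)


print(to_binary(Fraction("9.1"), 23))  # 1001.00011001100110011001100
```

The fractional part never terminates: after the first 0, the pattern 0011 repeats forever, which is why 9.1 cannot be stored exactly in any number of bits.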

Step 2:

Convert the binary number into scientific (normalized) form.

To normalize the number, move the binary point so that only the first 1 is to its left. Here the point moves three places to the left, so the number must be multiplied by 2^3 to keep its value.

9.1 -> 1.0010001100110011001100… × 2^3
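The normalization step amounts to halving (or doubling) the number until it lies between 1 and 2, counting the moves. A minimal sketch, assuming a positive input held exactly as a `Fraction`; the helper name `normalize` is hypothetical.

```python
from fractions import Fraction


def normalize(value: Fraction) -> tuple[Fraction, int]:
    """Return (significand, exponent) with 1 <= significand < 2 and
    value == significand * 2**exponent, for a positive value."""
    if value <= 0:
        raise ValueError("this sketch handles positive numbers only")
    exponent = 0
    while value >= 2:  # each halving moves the binary point one place left
        value /= 2
        exponent += 1
    while value < 1:   # each doubling moves the binary point one place right
        value *= 2
        exponent -= 1
    return value, exponent


significand, exponent = normalize(Fraction("9.1"))
print(significand, exponent)  # 91/80 3
```

91/80 is 1.1375 in decimal, or 1.0010001100110011… in binary, and the exponent 3 matches the × 2^3 above.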

Step 3:

Convert the scientific form into the IEEE 754 format.

The sign bit is 1 if the number is negative and 0 if it is positive. 9.1 is positive, so its sign bit is 0.

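The exponent is 3 (the multiplier is 2^3 = 8). In single precision the exponent is stored in an 8-bit field, which can hold 2^8 = 256 different values. To allow negative exponents, the stored value is offset by 127, which is called the exponent bias. For normal numbers this covers exponents from -126 to +127; the stored values 0 and 255 are reserved for special cases such as zero, subnormal numbers, infinity and NaN. Our stored exponent is therefore 127 + 3 = 130.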

130 = 10000010₂
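The bias arithmetic is small enough to check directly. This hypothetical helper adds the single-precision bias and formats the result as an 8-bit field; it rejects exponents that would need the reserved field values 0 and 255.

```python
BIAS = 127  # single-precision (binary32) exponent bias


def encode_exponent(exponent: int) -> str:
    """Return the 8-bit biased exponent field for a normal binary32 number."""
    if not -126 <= exponent <= 127:
        raise ValueError("exponent out of range for a normal single-precision number")
    return format(exponent + BIAS, "08b")


print(encode_exponent(3))  # 10000010  (127 + 3 = 130)
```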

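The mantissa is the part after the leading 1. A normalized number always starts with 1, so that bit is not stored:

0010001100110011001100110011…

Putting the sign, exponent and mantissa together gives the following bit pattern for 9.1. Only the first 23 mantissa bits fit; the bits after the vertical bar do not:

0 10000010 00100011001100110011001 | 100110011…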

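A single-precision number has room for only 23 mantissa bits, so the rest must be rounded off. The computer looks at the 24th bit: if it is 1, 1 is added at the 23rd position (the number is rounded up); if it is 0, the extra bits are simply dropped. This is a simplified description of IEEE 754's default round-to-nearest, ties-to-even rule; the two differ only in the exact halfway case, where the 24th bit is 1 and every later bit is 0.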
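The rounding step can be sketched with exact arithmetic. This assumes the significand from Step 2 (9.1 / 2^3) as input and implements round-to-nearest, ties-to-even, which matches the "look at the 24th bit" rule above except in exact ties. The helper name `round_mantissa` is hypothetical.

```python
import math
from fractions import Fraction

MANTISSA_BITS = 23  # binary32 stores 23 fraction bits


def round_mantissa(significand: Fraction) -> str:
    """Round a normalized significand (1 <= s < 2) to 23 fraction bits with
    round-to-nearest, ties-to-even, and return the stored bits (leading 1 dropped).

    This sketch does not handle the carry that occurs when rounding up
    turns 1.111...1 into 2 (which would require bumping the exponent).
    """
    scaled = significand * 2**MANTISSA_BITS  # hidden 1 plus 23 fraction bits as the integer part
    kept = math.floor(scaled)                # the bits that fit
    remainder = scaled - kept                # value of the bits that do not fit, in [0, 1)
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and kept % 2 == 1):
        kept += 1                            # round up
    return format(kept - 2**MANTISSA_BITS, f"0{MANTISSA_BITS}b")


s = Fraction("9.1") / 8   # 9.1 = s * 2**3, so s = 1.0010001100110011...
print(round_mantissa(s))  # 00100011001100110011010
```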

In our example, the 24th bit is 1, so 1 is added at the 23rd position. The number the computer actually stores is:

0 10000010 00100011001100110011010
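Python's standard `struct` module can confirm this bit pattern: packing with the `'>f'` format code converts a value to big-endian IEEE 754 single precision, and unpacking the same four bytes with `'>I'` exposes them as an unsigned integer. Python floats are double precision, so `pack` rounds the double 9.1 to single precision; for 9.1 this gives the same bits as rounding 9.1 directly. The helper name `float32_fields` is hypothetical.

```python
import struct


def float32_fields(x: float) -> tuple[str, str, str]:
    """Split the binary32 encoding of x into (sign, exponent, mantissa) bit strings."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as a uint32
    text = format(bits, "032b")
    return text[0], text[1:9], text[9:]


print(*float32_fields(9.1))  # 0 10000010 00100011001100110011010
```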

Its exact decimal value is 9.1000003814697265625 (about 9.10000038), not the 9.1 we started with. This difference is called floating-point rounding error.
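The stored pattern can be turned back into decimal without any further rounding. A sketch, assuming Python: round-tripping through `struct` with the `'>f'` code yields the single-precision value (held in a Python float, which represents it exactly), and `decimal.Decimal` built from a float shows its exact binary value in decimal. `as_float32` is a hypothetical helper name.

```python
import struct
from decimal import Decimal


def as_float32(x: float) -> float:
    """Round x to binary32 and back; the result is exactly the stored single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


stored = as_float32(9.1)
print(Decimal(stored))                   # 9.1000003814697265625 (exact)
print(Decimal(stored) - Decimal("9.1"))  # 3.814697265625E-7 (the rounding error)
```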
