Understand how the floating point representation works and describe systematically (possibly using categories) all the possible problems that can happen. Try to classify the various issues and limitations (representation, comparison, rounding, propagation, approximation, loss of significance, cancellation, etc.) and provide simple examples for each of the categories you have identified.
How it works
Floating-point numbers are represented in computer hardware as base 2 (binary) fractions.
Since computer memory is limited, you cannot store numbers with infinite precision; no matter whether you use binary fractions or decimal ones, at some point you have to cut off.
The idea is to compose a number of two main parts:
- A significand that contains the number’s digits. Negative significands represent negative numbers.
- An exponent that says where the decimal (or binary) point is placed relative to the beginning of the significand. Negative exponents represent numbers that are very small (i.e. close to zero).
Nearly all hardware and programming languages use floating-point numbers in the same binary formats, which are defined in the IEEE 754 standard. The usual formats are 32 or 64 bits in total length.
This scheme lets us represent both very large and very small numbers in a compact way.
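To make the layout concrete, here is a small Python sketch (assuming a standard IEEE 754 64-bit double) that pulls the sign, exponent, and significand fields out of a float's bit pattern:

```python
import struct

# Unpack a 64-bit IEEE 754 double into its three fields:
# 1 sign bit, 11 exponent bits, 52 significand (fraction) bits.
bits = struct.unpack(">Q", struct.pack(">d", -6.25))[0]
sign = bits >> 63
exponent = (bits >> 52) & 0x7FF       # biased by 1023
fraction = bits & ((1 << 52) - 1)     # the implicit leading 1 is not stored

# -6.25 = -1.5625 * 2**2, so the sign bit is set
# and the unbiased exponent is 2.
print(sign)             # 1
print(exponent - 1023)  # 2

# Reconstructing the value from the three fields gives the number back:
print((-1) ** sign * (1 + fraction / 2 ** 52) * 2 ** (exponent - 1023))  # -6.25
```

The same decomposition works for any finite normal double; only the special exponent values (all zeros, all ones) encode zeros, subnormals, infinities and NaNs differently.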
Issues
Representation
Representation error refers to the fact that some decimal fractions cannot be represented exactly as binary (base 2) fractions:
```python
>>> # 1/10 and 2/10 are not exactly representable as binary fractions
>>> 0.1 + 0.2
0.30000000000000004
```
Rounding
As already said, floating-point numbers have a limited number of digits, so they cannot represent all real numbers accurately: when there are more digits than the format allows, the leftover ones are omitted – the number is rounded.
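A quick Python illustration of both effects: a double holds 53 significant bits (roughly 15–17 decimal digits), so above 2^53 adding 1 is rounded away entirely, and the stored value of 0.1 is itself already rounded:

```python
# Above 2**53 the spacing between adjacent doubles exceeds 1,
# so the +1.0 below is rounded off completely.
big = 2.0 ** 53
print(big + 1.0 == big)     # True

# Printing 0.1 with more digits reveals the rounded binary value
# actually stored:
print(format(0.1, ".20f"))  # 0.10000000000000000555
```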
Cancellation
There are two kinds of cancellation: catastrophic and benign.
Catastrophic cancellation occurs when subtracting operands that are themselves subject to rounding errors. The subtraction itself does not introduce any error; rather, it exposes the error introduced in earlier operations (e.g. multiplications).
Benign cancellation occurs when subtracting exactly known quantities: if x and y have no rounding error, then, provided the subtraction is done with a guard digit, the difference x − y has a very small relative error.
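A short Python sketch of catastrophic cancellation (the 1 − cos(x) example is a standard illustration, not taken from the text above): for tiny x, cos(x) rounds to exactly 1.0, so the subtraction wipes out every significant digit, while an algebraically equivalent formula avoids the subtraction entirely:

```python
import math

x = 1e-8

# Naive form: 1 - cos(x) subtracts two nearly equal numbers.
# The true value of (1 - cos(x)) / x**2 is ~0.5, but cos(1e-8)
# rounds to 1.0, so every significant digit cancels.
naive = (1 - math.cos(x)) / x ** 2
print(naive)   # 0.0

# Equivalent identity 1 - cos(x) = 2 * sin(x/2)**2 has no cancellation:
stable = 2 * math.sin(x / 2) ** 2 / x ** 2
print(stable)  # ≈ 0.5
```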
Guard Digits
One method of computing the difference between two floating-point numbers is to compute the difference exactly and then round it to the nearest floating-point number:
x = 2.15 × 10^12
y = 0.0000000000000000125 × 10^12
x − y = 2.1499999999999999875 × 10^12, which rounds to 2.15 × 10^12
But floating-point hardware normally operates on a fixed number of digits (computing the difference exactly is very expensive when the operands differ greatly in size), so the smaller operand is shifted and its excess digits are discarded:
x = 2.15 × 10^12
y = 0.00 × 10^12
x − y = 2.15 × 10^12
Here the answer happens to be exactly the same as if the difference had been computed exactly and then rounded; with other operands, however, discarding the shifted-out digits can make the result badly wrong, which is why hardware keeps one or more extra guard digits during the subtraction.
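The compute-exactly-then-round behaviour can be imitated with Python's decimal module, here configured (purely as an illustration) to 3 significant digits:

```python
from decimal import Decimal, getcontext

# Simulate 3-significant-digit arithmetic. decimal behaves like
# IEEE 754: each operation is computed exactly, then rounded to
# the working precision.
getcontext().prec = 3
x = Decimal("10.1")
y = Decimal("9.93")
print(x - y)  # 0.17 -- the exact result fits in 3 digits

# Without a guard digit, y would first be shifted and truncated to
# align with x (9.93 -> 9.9), giving 10.1 - 9.9 = 0.2: the last
# digit would be completely wrong.
```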
Observation
Since many floating-point numbers are merely approximations of the exact value, for a given approximation f of a real number r there can be infinitely many other real numbers r1, r2, … that map to exactly the same approximation. Those numbers lie in a certain interval.
https://stackoverflow.com/questions/2100490/floating-point-inaccuracy-examples
This is an important phenomenon because if you perform calculations on that number (adding, subtracting, multiplying, etc.), you lose precision. Every number is just an approximation, so you are actually performing calculations with intervals.
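This can be observed directly in Python: distinct decimal literals collapse onto the very same double, and (on Python 3.9+) math.nextafter exposes the neighbouring representable values, i.e. the boundaries of the interval of reals that all round to the double near 0.1:

```python
import math

# Two different decimal literals, one and the same double:
print(0.1 == 0.10000000000000001)  # True

# The adjacent representable doubles around 0.1 (requires Python 3.9+):
print(math.nextafter(0.1, 0.0))    # largest double below 0.1
print(math.nextafter(0.1, 1.0))    # smallest double above 0.1
```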
There is a mathematical technique used to put bounds on rounding errors and measurement errors in mathematical computation:
Interval arithmetic (also known as interval mathematics, interval analysis, or interval computation)
A standard for interval arithmetic, IEEE Std 1788-2015, was approved in June 2015. Two reference implementations are freely available, developed by members of the standard's working group: the libieeep1788 library for C++ and the interval package for GNU Octave.
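As a rough sketch of the idea (a toy class, not one of the IEEE 1788 reference implementations), interval addition with outward rounding might look like this: every operation widens the bounds by one ulp in each direction, so the true real result is always contained.

```python
import math

class Interval:
    """Toy interval: [lo, hi] is guaranteed to contain the true value."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Round the lower bound down and the upper bound up by one ulp
        # so rounding in the additions can never lose the true result.
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# 0.1 is not exactly representable, so start from the interval
# enclosing it:
tenth = Interval(math.nextafter(0.1, 0.0), math.nextafter(0.1, 1.0))
total = tenth + tenth + tenth        # encloses the real number 0.3
print(total.lo <= 0.3 <= total.hi)   # True
```

Real interval libraries also handle multiplication, division, sign changes and directed rounding modes; the sketch only shows the containment principle.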
Curiosity: a real-world example
https://en.wikipedia.org/wiki/Round-off_error#Real_world_example:_Patriot_missile_failure_due_to_magnification_of_roundoff_error
https://docs.python.org/2/tutorial/floatingpoint.html
https://floating-point-gui.de/errors/rounding/
https://floating-point-gui.de/formats/fp/
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
https://en.wikipedia.org/wiki/Interval_arithmetic#Application