Do not trust Floating Point

Revision as of 22 December 2021, 12:46

Introduction

The essence of this article in one sentence: "Do not blindly trust any result computed with floating point numbers."

This may sound strange, but you must always be aware that computations involving floating point numbers are in danger of being inexact, and are sometimes completely (absurdly) wrong.
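As a small illustration of "inexact", the classic decimal-fraction case can be tried in any JavaScript console. This is only a sketch: the variable names and the tolerance of 1e-12 are arbitrary choices made for this example, not recommendations from the article.

```javascript
// Decimal fractions such as 0.1 have no exact binary representation,
// so even a single addition can be slightly off.
const sum = 0.1 + 0.2;
console.log(sum);          // 0.30000000000000004
console.log(sum === 0.3);  // false

// Comparing against a tolerance is a common workaround.
// The tolerance 1e-12 is an arbitrary value chosen for this sketch.
const closeEnough = Math.abs(sum - 0.3) < 1e-12;
console.log(closeEnough);  // true
```

A tolerance only hides small errors like this one; it does not help with the kind of failure shown in the example that follows.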

An Example

Rump's Royal Pain¹⁾

This example shows absurdly wrong results from a seemingly innocent floating point computation. The result is completely wrong not only in double precision IEEE floats, but also at higher precision (80-bit or 128-bit).

Try to compute the following expression (in your favorite programming language):

333.75y⁶ + x²(11x²y² - y⁶ - 121y⁴ - 2) + 5.5y⁸ + x/(2y)

with:

 x = 77617 and y = 33096.

In Smalltalk, this could be written as:

x := 77617.
y := 33096.

(333.75 * (y ** 6)) 
+ ((x ** 2)
   * ((11 * (x ** 2) * (y ** 2))
      - (y ** 6)
      - (121 * (y ** 4))
      - 2))
+ (5.5 * (y ** 8))
+ (x / (2 * y))

or in JavaScript as:

const x = 77617;
const y = 33096;

(333.75 * (y ** 6)) 
+ ((x ** 2)
   * ((11 * (x ** 2) * (y ** 2))
      - (y ** 6)
      - (121 * (y ** 4))
      - 2))
+ (5.5 * (y ** 8))
+ (x / (2 * y))

When evaluated, the result is 1.18059162071741e+21, which is completely incorrect: the correct (approximate) result is -0.8273960599...
Even with higher precision IEEE floats (by using "x := 77617q" and "y := 33096q"), we get wrong results.

Not only is the sign wrong; the IEEE value is also off by 21 orders of magnitude!
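The double precision figure does not look arbitrary: to the digits shown it matches 2 ** 70, which is the spacing between adjacent doubles near the largest term, 5.5y⁸. The following sketch checks that. The helper `ulpNear` is a made-up name for this example and assumes a positive, normal double with 52 explicit mantissa bits.

```javascript
const y = 33096;
const largest = 5.5 * (y ** 8);   // roughly 7.9e36

// Spacing between adjacent doubles near v.
// Hypothetical helper for this sketch; assumes v is a positive, normal double.
function ulpNear(v) {
  return 2 ** (Math.floor(Math.log2(v)) - 52);
}

console.log(largest);
console.log(ulpNear(largest));              // 2 ** 70, about 1.18e21
console.log(ulpNear(largest) === 2 ** 70);  // true
```

Terms of this size cancel almost completely in the expression, so a rounding error of a single step at that magnitude is enough to swamp the true result, which is smaller than 1 in absolute value.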

In Smalltalk/X, you can compute the correct value by using large precision floats (LargeFloat); simply write:

x := 77617QL.
y := 33096QL.
...

to get:

-0.827396059946821368141165095479816291999033115784384819
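The value can also be cross-checked without any floating point arithmetic. The sketch below uses JavaScript's built-in BigInt and scales the expression by 4 so that the coefficients 333.75 and 5.5 become integers. The variable names (`poly4`, `numerator`, `denominator`) are chosen for this example only.

```javascript
const x = 77617n;
const y = 33096n;

// 333.75 = 1335/4 and 5.5 = 22/4, so four times the polynomial part
// (everything except x/(2y)) is an exact integer.
const poly4 =
    1335n * y ** 6n
  + 4n * x ** 2n * (11n * x ** 2n * y ** 2n - y ** 6n - 121n * y ** 4n - 2n)
  + 22n * y ** 8n;

// Whole expression = poly4/4 + x/(2y) = (poly4 * y + 2x) / (4y)
const numerator = poly4 * y + 2n * x;
const denominator = 4n * y;

console.log(poly4);                          // -8n, so the polynomial part is exactly -2
console.log(`${numerator}/${denominator}`);  // -109534/132384
// Only this final division uses floating point; it is harmless here
// because both operands are small integers.
console.log(Number(numerator) / Number(denominator));  // about -0.82739605994682
```

The polynomial part is exactly -2 for these inputs, so the whole expression reduces to -2 + x/(2y) = -54767/66192, in agreement with the LargeFloat result above.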

In expecco, the expression above yields the following results, depending on the precision used:

Precision	Literals	Result	Verdict
IEEE Single	x := 77617f. y := 33096f.	-1.18059162071741e+21	WRONG
IEEE Double	x := 77617. y := 33096.	1.18059162071741e+21	WRONG
IEEE Extended	x := 77617q. y := 33096q.	576460752303423489.2	WRONG
IEEE Quadruple	x := 77617Q. y := 33096Q.	1.17260394005317863185883490452	WRONG
IEEE Octuple	x := 77617QO. y := 33096QO.	-0.8273960599468213681411650...	OK
QDouble	x := 77617QD. y := 33096QD.	-0.8273960599468213681327577...	ALMOST
LargeFloat	x := 77617QL. y := 33096QL.	-0.8273960599468213681411650...	OK

1) See "Rump's Pain" on Google Books: https://books.google.de/books?id=ZqnuDwAAQBAJ&pg=PT244&lpg=PT244&dq=rumps+royal+pain&source=bl&ots=mQeqSQWHtM&sig=ACfU3U3DLU3vpu4yMlx99xFDeorzlXLGEA&hl=de&sa=X&ved=2ahUKEwiEhIjmlPL0AhVdS_EDHSw8A10Q6AF6BAgPEAM#v=onepage&q=rumps%20royal%20pain&f=false
