Numeric Limits/en
Revision as of 3 December 2020, 10:54
This page provides some computer science basics, which are not specific to expecco. However, in the past some users encountered problems and it is useful to provide some insight on number representations.
Expecco supports arbitrary precision integer arithmetic, arbitrary precision fractions and limited precision floating point numbers.
Integer Arithmetic
For integer operations, there is no overflow or error in the result for any legal operation. I.e. operations on two big numbers deliver a correct result.
This is a feature of the underlying Smalltalk runtime environment, in contrast to many other programming languages (especially Java and C), which provide int (usually 32bit) and long (usually 64bit) types that overflow:
4294967295 "(0xFFFFFFFF)" + 1 -> 4294967296 "(0x100000000)"
18446744073709551615 "(0xFFFFFFFFFFFFFFFF)" + 1 -> 18446744073709551616 "(0x10000000000000000)"
Very large values can be computed:
10000 factorial -> a huge number beginning with: 284625968091705451890641321211986889014....
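For contrast, the wraparound of fixed-width integers can be emulated in a language that, like Smalltalk, has arbitrary precision integers. The following is a minimal Python sketch, not expecco code; the helper name `wrap_unsigned` is a hypothetical name chosen for illustration.

```python
def wrap_unsigned(value, bits):
    """Emulate an unsigned fixed-width register by keeping only the low bits."""
    return value & ((1 << bits) - 1)

# Arbitrary precision: the sum is simply the next larger integer.
exact = 0xFFFFFFFF + 1

# 32-bit unsigned arithmetic: the carry is lost and the value wraps around.
wrapped = wrap_unsigned(0xFFFFFFFF + 1, 32)

print(exact)    # 4294967296
print(wrapped)  # 0
```

The masking step is what a 32bit register does implicitly; with arbitrary precision integers no such step exists, so no information is lost.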
When dividing integers, the "/" operator will deliver an exact result, possibly as a fraction:
5 / 3 -> 5/3
and reduce the result (possibly returning an Integer):
(5/3)*(3/2) -> 5/2
(5/3)/(3/2) -> 10/9
1000 factorial / 999 factorial -> 1000
whereas the division "//" will deliver an integer, truncated towards negative infinity (i.e. the next smaller integer). Note that this differs from integer division in Java or C, which truncates towards zero:
5 // 3 -> 1
-5 // 3 -> -2
The modulo operator "\\" provides the remainder, such that:
((a // b) * b) + (a \\ b) = a
There is also a truncating division operator which truncates towards zero, and a corresponding remainder operator, for which:
((a quo: b) * b) + (a rem: b) = a
For positive a and b, the two operator pairs deliver the same result. If a and b have different signs and the division is not exact, the results differ. Be aware of this and think about the domain of your arguments.
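The difference between the two operator pairs can be checked with a short sketch. It is written in Python, whose `//` and `%` happen to round towards negative infinity like "//" and "\\"; the function names `floor_div_mod` and `trunc_div_rem` are hypothetical and stand in for the Smalltalk operators.

```python
def floor_div_mod(a, b):
    # Quotient rounded towards negative infinity, with the matching remainder.
    return a // b, a % b

def trunc_div_rem(a, b):
    # Quotient truncated towards zero (as in C or Java), with the matching remainder.
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b

for a, b in [(5, 3), (-5, 3), (5, -3), (-5, -3)]:
    fq, fr = floor_div_mod(a, b)
    tq, tr = trunc_div_rem(a, b)
    # Both pairs satisfy: quotient * b + remainder = a
    assert fq * b + fr == a
    assert tq * b + tr == a
    print(a, b, (fq, fr), (tq, tr))
```

For (-5, 3) the floored pair is (-2, 1) while the truncated pair is (-1, -2); for operands of equal sign both pairs agree.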
In addition, the usual ceiling, floor and rounding operations are available (both on fractions and on limited precision reals):
(5 / 3) ceiling -> 2
(5 / 3) floor -> 1
(5 / 3) rounded -> 2
(5 / 3) roundTo: 0.1 -> 1.7
(5 / 3) roundTo: 0.01 -> 1.67
(-5 / 3) ceiling -> -1
(-5 / 3) floor -> -2
(-5 / 3) rounded -> -2
(-5 / 3) roundTo: 0.1 -> -1.7
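The same operations on exact fractions can be reproduced with Python's `fractions` module, as a rough cross-check of the values above. This is an illustrative sketch, not expecco code; note that Python's `round` resolves exact ties to the nearest even value, which may differ from expecco's tie handling (none of the values below is a tie).

```python
from fractions import Fraction
import math

x = Fraction(5, 3)
y = Fraction(-5, 3)

print(math.ceil(x), math.floor(x), round(x))  # 2 1 2
print(math.ceil(y), math.floor(y), round(y))  # -1 -2 -2

# Rounding to a number of decimal digits keeps the result an exact fraction.
print(round(x, 1), round(x, 2))               # 17/10 167/100
```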
Inexact Float and Double Numbers
Floating point numbers are inherently inexact. This is not a problem specific to expecco, but inherent to the way floating point numbers are represented (in the CPU). See "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
Floating point numbers are represented as a sum of powers of 2 (actually 1/2 + 1/4 + 1/8 + ...) multiplied by 2 raised to an exponent (i.e. mantissa * 2^exponent), with the mantissa being the above sum. Many numbers (most, actually) cannot be represented exactly by a finite sum of powers of 2. Such numbers are stored with an error in the last significant bit (of up to half the last bit).
The situation can be relaxed slightly by using more bits for the mantissa. Expecco gives you a choice of 32bit floats (called "ShortFloat"), 64bit floats ("Float") and 80bit floats ("LongFloat"), which are mapped to the corresponding IEEE formats (single, double and extended).
However, even with more bits, the fundamental restriction remains (although it shows up less frequently with higher precision).
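To see what a 64bit float actually stores for the literal 0.1, the value can be converted to an exact representation. The following Python sketch (an illustration, not expecco code) uses the standard `decimal` and `fractions` modules for that conversion.

```python
from decimal import Decimal
from fractions import Fraction

# Exact decimal expansion of the double that is nearest to 0.1.
stored = Decimal(0.1)

# The same value as an integer numerator over a power-of-two denominator.
as_ratio = Fraction(0.1)

print(stored)
print(as_ratio)
print(as_ratio.denominator == 2 ** 55)  # True
print(as_ratio == Fraction(1, 10))      # False
```

The denominator is a power of 2, which is why 1/10 itself can never be hit exactly.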
Floating Point Errors Propagate
Such errors accumulate with every math operation performed on an inexact value.
For example, the decimal 0.1 cannot be represented exactly as a floating point number; what is actually stored is the nearest representable value (about 0.1000000000000000055 for a double), with an error in the last bit. Adding this multiple times will result in a larger error in the final result:
1.0 - (0.1 + 0.1 + 0.1 ...(10 times)... + 0.1) -> 1.11022302462516E-16
The print functions will usually try to compensate for an error in the last bit(s), showing "0.1" although the stored value differs slightly from 0.1. Thus, even though the printed representation of such a number might look ok, it will inject more and more error when the value is used in further operations (until print is no longer able to cheat, and the error becomes visible).
This is especially inconvenient, when monetary values are represented as floats and a final sum is wrong in the cent value.
As an example for this, try to sum an array consisting of 10 values:
(Array new:10 withAll:0.1) sum printString
which results in "1.0" due to print's cheating,
whereas:
(Array new:100 withAll:0.1) sum printString
will show '9.99999999999998' (i.e. the error accumulated to a value too big for print's cheating to compensate).
In contrast to Float, expecco (actually the underlying Smalltalk) provides two number representations which are better suited for such computations: Fraction and FixedPoint (in other systems, these are also called "ScaledDecimal").
Both internally keep exact fractions, but use different print strategies: Fractions print as such (e.g. '1/3', '2/17' etc.), whereas FixedPoint numbers print themselves as a decimal expansion (e.g. '0.33' or '0.20'). FixedPoint constants can be entered by using 's' instead of 'e' (e.g. '1.23s' defines a FixedPoint, and '1.23s4' defines one which prints 4 valid digits after the decimal point).
No such rounding errors are encountered, if fractions are used:
1 - ((1/10) + (1/10) + (1/10) ...(10 times)... + (1/10)) -> 0
or if FixedPoint numbers are used:
(Array new:100 withAll:0.1s) sum printString -> '10.0'
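The accumulation effect and the two exact alternatives can be reproduced outside expecco. The Python sketch below uses `Fraction` as a stand-in for Smalltalk's Fraction and `Decimal` as a rough stand-in for FixedPoint; this mapping is an assumption made for illustration only, as the classes are not identical.

```python
from decimal import Decimal
from fractions import Fraction

# 100 additions of the inexact double nearest to 0.1.
float_sum = sum([0.1] * 100)

# The same sum with exact rational and exact decimal arithmetic.
fraction_sum = sum([Fraction(1, 10)] * 100)
decimal_sum = sum([Decimal("0.1")] * 100)

print(float_sum == 10.0)  # False
print(fraction_sum)       # 10
print(decimal_sum)        # 10.0
```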
Floating Point Number Comparison
Be aware of such errors, and do not compare floating point numbers for equality/inequality. Instead either use range-compares and/or use the special "compare-almost-equal" functions, where the number of bits of acceptable error can be specified. Expecco provides such functions both for elementary code and in the standard action block library.
Also, for this reason, do not compute money values using floats or doubles. Instead, use instances of FixedPoint. You will lose a cent/penny here and there if you use floats/doubles on big budgets.
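The idea behind an "almost equal" comparison can be sketched as follows. This is Python, not the expecco function itself: `math.isclose` is a standard library call using a relative tolerance, while `ulp_distance` is a hypothetical helper that counts how many representable doubles lie between two values, which approximates the "number of bits of acceptable error" idea. It assumes finite arguments of the same sign.

```python
import math
import struct

def ulp_distance(a, b):
    """Count the representable doubles between a and b (finite, same sign)."""
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)

total = sum([0.1] * 10)

print(total == 1.0)                            # False
print(math.isclose(total, 1.0, rel_tol=1e-9))  # True
print(ulp_distance(total, 1.0) <= 4)           # True
```

The tolerance values (1e-9 and 4 ULPs) are arbitrary choices for this example; a suitable tolerance depends on how many operations contributed to the value being checked.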
Limited Range of Float and Double Numbers
Floating point numbers also have a limited range. In expecco, the default float format is the IEEE double precision format (called "Float" in expecco). Floating point numbers with an absolute value greater than 1.79769313486232E+308 will lead to a +INF/-INF (infinite) result, and numbers with an absolute value smaller than 2.2250738585072E-308 (the smallest normalized value) will lose precision and eventually become zero. For IEEE single precision floats (called ShortFloat in expecco), the range is much smaller, and for IEEE extended precision (called LongFloat in expecco), the range is larger.
You can ask the classes for their limits, with fmin and fmax:
Float fmin -> 2.2250738585072E-308
Float fmax -> 1.79769313486232E+308
ShortFloat fmin -> 1.175494e-38
ShortFloat fmax -> 3.402823e+38
LongFloat fmin -> 3.362103143112093506E-4932
LongFloat fmax -> 1.189731495357231765E+4932
As a consequence, you cannot (using floats) compute very large numbers;
for example you cannot compute the number of decimal digits of huge numbers with floats:
10000 factorial asFloat log10 -> INF
whereas the integer computation does it:
10000 factorial log10 -> 35659.454274
This is not a problem specific to expecco,
but inherent to the way floating point numbers are represented. [https://docs.oracle.com/
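The range limits of IEEE doubles can be observed in any language that uses them. The Python sketch below is an illustration only; note one difference from the expecco behaviour described above: Python raises an OverflowError when converting a too-large integer to a float, instead of returning INF.

```python
import math
import sys

print(sys.float_info.max)  # 1.7976931348623157e+308
print(sys.float_info.min)  # 2.2250738585072014e-308

# Exceeding the range in float arithmetic yields infinity.
overflowed = sys.float_info.max * 10
print(math.isinf(overflowed))  # True

huge = math.factorial(10000)
try:
    float(huge)
    converted = True
except OverflowError:
    converted = False  # the integer does not fit into a double
print(converted)  # False

# The logarithm can still be computed from the exact integer.
digits_log = math.log10(huge)
print(int(digits_log) + 1)  # 35660 (number of decimal digits)
```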
Trigonometric and other Math Functions
Trigonometric and other math functions (sqrt, log, exp) will first convert the number to a limited precision real number (a C-double), and therefore may have a limited input value range and also generate inexact results.
For example, you will not get a valid result for:
10000 factorial sin
because it is not possible to represent such a large number as a real number (expecco will signal a domain error, as the input to sin will be +INF).
Also, the result from:
(9 / 64) sqrt
will be the (inexact) 0.375 (a double), instead of the exact 3/8 (a fraction). (This might change in a future release, which may provide exact results if both numerator and denominator are perfect squares.)
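An exact square root for fractions whose numerator and denominator are both perfect squares could look roughly like the following. This is a Python sketch of the idea only, not how expecco implements (or will implement) it; the function name `exact_sqrt` is hypothetical.

```python
from fractions import Fraction
import math

def exact_sqrt(q):
    """Return the exact root of a non-negative Fraction, or None if it is not rational."""
    n = math.isqrt(q.numerator)
    d = math.isqrt(q.denominator)
    # Fractions are kept in lowest terms, so both parts must be perfect squares.
    if n * n == q.numerator and d * d == q.denominator:
        return Fraction(n, d)
    return None

print(exact_sqrt(Fraction(9, 64)))  # 3/8
print(exact_sqrt(Fraction(1, 2)))   # None
print(math.sqrt(Fraction(9, 64)))   # 0.375
```

A caller would fall back to the floating point square root whenever `exact_sqrt` returns None.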
Different Results on Different CPUs
Since floating point arithmetic is performed by the underlying CPU hardware, different results (in the least significant bit) can be returned from math operations on different CPUs or even different versions (steppings) of the same CPU architecture. This applies especially to trigonometric and other math functions. Be prepared for this, and use the "almost-equal" comparison functions when results are to be verified.
Higher Precision Numbers
Expecco supports various inexact real formats, with different precision (i.e. number of mantissa bits):
Name                          overall      exponent  mantissa  decimal    Smalltalk
                              size         size      size      precision  class name
IEEE single precision floats  32 bit       8         24        6          ShortFloat
IEEE double precision floats  64 bit       11        53        15         Float
IEEE extended prec. floats    80/128 bit   15        64/112    19/34      LongFloat (1)
quad double                   4*64 bit     11        200       60         QDouble (2)
IEEE arbitrary                any          any       any       any        IEEEFloat (2)
arbitrary                     any          any       any       any        LargeFloat (2)
(1) on x86/x64 machines, LongFloats are represented as 80bit extended floats with 64bit mantissa; on other CPUs, these might be represented as 128bit quadFloats with 112 bit mantissa.
(2) these are currently being developed and are provided as a preview feature without warranty (meaning: they may be buggy at the moment; let us know if you need them).
Notice that the use of any but double precision floats (which are directly supported by the machine) comes at a price. The speed of operations degrades from double -> single -> extended -> quad double -> largeFloat.
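The precision difference between the single and double formats can be made visible by rounding a double to single precision and back. The Python sketch below is illustrative only; `to_single` is a hypothetical helper built on the standard `struct` module, and it says nothing about expecco's ShortFloat implementation.

```python
import struct
import sys

def to_single(x):
    """Round a double to the nearest IEEE single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Properties of the double format on this machine.
print(sys.float_info.mant_dig)  # 53
print(sys.float_info.dig)       # 15

# The single precision neighbour of 0.1 is visibly further away from 0.1.
print(to_single(0.1))           # 0.10000000149011612
```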