Numeric Limits

From expecco Wiki (Version 2.x)
Revision as of 13 October 2021, 14:23

This page provides some computer science basics which are not specific to expecco. However, some users have run into problems in the past, so some insight into number representations is useful.

Expecco supports arbitrary precision integer arithmetic, arbitrary precision fractions and limited precision floating point numbers.

Exact Integer Numbers

For integer operations, there is no overflow or error in the result for any legal operation. I.e. operations on two big numbers deliver a correct result.

This is a feature of the underlying Smalltalk runtime environment and in contrast to many other programming languages (especially: Java and C) which provide int (usually 32bit) and long (usually 64bit) integer types.
In expecco, you can write both in Smalltalk and in the builtin JavaScript syntax (1):

2147483647 "(0x7FFFFFFF)" + 1
    -> 2147483648 "(0x80000000)"
4294967295 "(0xFFFFFFFF)" + 1
    -> 4294967296 "(0x100000000)"
18446744073709551615 "(0xFFFFFFFFFFFFFFFF)" + 1
    -> 18446744073709551616 "(0x10000000000000000)"

Very large values can be computed:

10000 factorial
    -> a huge number beginning with: 284625968091705451890641321211986889014....

Smalltalk will automatically convert any result which is too large to fit into a machine-integer into a LargeInteger (with an arbitrary number of bits) and also automatically convert back to a small representation, if possible.
Thus, although the two operands to the division in the following example are large integers,

rslt := (1000 factorial) / (999 factorial)

the result will be a small integer (since the value 1000 fits easily into a machine word).

As a user, you do not have to care about these internals.

Hint: as a consequence, you can use a Workspace (Notepad) window as a calculator with arbitrary precision.

(1) Be aware that this only computes correct results if the elementary action is written in either Smalltalk or the builtin JavaScript syntax. Depending on the version, results may or may not be correct when using Java/Groovy, Python, C/C++, Node.js etc.
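
The wrap-around that fixed-size integer types produce can be reproduced outside expecco. The following is a minimal Python sketch (Python is used here only for illustration and is an assumption, the examples on this page are Smalltalk; `wrap_int32` is a hypothetical helper, not an expecco API). It emulates a 32 bit signed integer next to an exact computation:

```python
import math

def wrap_int32(value):
    """Hypothetical helper: the value a 32 bit two's complement integer would hold."""
    return (value + 2**31) % 2**32 - 2**31

exact = 2147483647 + 1                 # arbitrary precision: no overflow
wrapped = wrap_int32(2147483647 + 1)   # what a 32 bit Java `int` wraps around to
quotient = math.factorial(1000) // math.factorial(999)  # large operands, small result

print(exact, wrapped, quotient)
```

The exact sum is 2147483648, whereas the emulated 32 bit value wraps to -2147483648.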

Exact Fractions, ScaledDecimals and FixedDecimals

When dividing integers, the "/" operator will deliver an exact result, possibly as a fraction:

5 / 3 -> 5/3

and reduce the result (possibly returning an Integer):

(5/3)*(3/2) -> 5/2
(5/3)/(3/2) -> 10/9
(5/3)*(9/3) -> 5
1000 factorial / 999 factorial -> 1000

There is also a truncating division operator "//", which delivers an integer, truncated towards negative infinity (i.e. the next smaller integer). Note that this differs from integer division in Java or C, which truncates towards zero:

5 // 3 -> 1
-5 // 3 -> -2

The corresponding modulo operator "\\" provides the remainder, such that:

((a // b) * b) + (a \\ b) = a

The "\\" is the standard Smalltalk modulo operator; Smalltalk/X also provides "%" as an alias (for users with a C/Java background).
Thus you can also write:

((a // b) * b) + (a % b) = a

There is also a division operator which truncates towards zero, and a corresponding remainder operator, for which:

((a quo: b) * b) + (a rem: b) = a

If a and b have the same sign, the two operator pairs deliver the same result. If the signs differ, the results differ (for example, -5 // 3 is -2 and -5 \\ 3 is 1, whereas -5 quo: 3 is -1 and -5 rem: 3 is -2). Be aware of this and think about the domain of your arguments.
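
The two operator pairs can be compared side by side. This is a minimal sketch in Python (an assumption for illustration only; the function names are hypothetical). Python's own `//` and `%` happen to round towards negative infinity as well, and in both cases quotient * divisor + remainder gives back the dividend:

```python
import math
from fractions import Fraction

def floor_div_mod(a, b):
    # Counterpart of Smalltalk's // and \\ : quotient rounded towards negative infinity.
    return a // b, a % b

def trunc_quo_rem(a, b):
    # Counterpart of quo: and rem: : quotient truncated towards zero.
    quo = math.trunc(Fraction(a, b))
    return quo, a - quo * b

for a, b in [(5, 3), (-5, 3), (5, -3), (-5, -3)]:
    print(a, b, floor_div_mod(a, b), trunc_quo_rem(a, b))
```

For (-5, 3) the first pair yields (-2, 1) and the second (-1, -2); for operands of equal sign both pairs agree.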

In addition, the usual ceiling, floor and rounding operations are available (both on fractions and on limited precision reals):

(5 / 3) ceiling -> 2           "the next larger integer"
(5 / 3) floor -> 1             "the next smaller integer"
(5 / 3) truncated -> 1         "truncate towards zero"
(5 / 3) rounded -> 2           "round wrt. fraction >= 0.5"
(5 / 3) roundTo: 0.1 -> 1.7
(5 / 3) roundTo: 0.01 -> 1.67
(-5 / 3) ceiling -> -1         "the next larger integer"
(-5 / 3) floor -> -2           "the next smaller integer"
(-5 / 3) truncated -> -1       "truncate towards zero"
(-5 / 3) rounded -> -2
(-5 / 3) roundTo: 0.1 -> -1.7

Fractions print themselves as "(numerator / denominator)".
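
The exact fraction arithmetic and the rounding operations above can be cross-checked with any exact rational type. A minimal sketch using Python's `fractions` module (Python is an assumption for illustration; unlike Smalltalk, it keeps the `Fraction` type even when the value is a whole number):

```python
import math
from fractions import Fraction

x = Fraction(5, 3)

product = x * Fraction(3, 2)    # 5/2
quotient = x / Fraction(3, 2)   # 10/9
whole = x * Fraction(9, 3)      # reduces to the integer value 5

roundings = {
    "ceiling": (math.ceil(x), math.ceil(-x)),
    "floor": (math.floor(x), math.floor(-x)),
    "truncated": (math.trunc(x), math.trunc(-x)),
    "rounded": (round(x), round(-x)),
}
print(product, quotient, whole, roundings)
```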

ScaledDecimal

If you prefer a decimal representation with a defined number of fractional digits, use ScaledDecimals (which, for backward compatibility, are also called "FixedPoint") (*).

These are also exact fractions, but print differently: you specify the number of fractional digits to be printed, and the number prints itself rounded to the last of these digits:

(5 / 3) asScaledDecimal:2 -> 1.67
(5 / 3) asScaledDecimal:4 -> 1.6667
((5 / 3) asScaledDecimal:2) * 3 -> 5.00
1.2 asScaledDecimal:3 -> 1.200

(*) The class was previously called "FixedPoint" and the converters were called "asFixedPoint:". For compatibility with other Smalltalk dialects, these now have the aliases "ScaledDecimal" and "asScaledDecimal:".
Both the old class name and the old operators will remain supported for backward compatibility (as aliases), but you should use the new names, both for compatibility with other Smalltalk dialects and to avoid confusion with FixedDecimal numbers.

FixedDecimal

As mentioned above, ScaledDecimals keep the exact value internally, but print themselves rounded to a given number of fractional digits. Smalltalk/X provides an alternative class called "FixedDecimal", which always keeps a rounded value internally. These may be better suited for monetary values, especially for additive sums which are printed, as the printed sum of two FixedDecimals always equals the sum of their printed values.

For example:

v := 50.004 asScaledDecimal:2.
    v printString -> '50.00'.   "is actually 50.004"
v2 := v * 2.
    v2 printString -> '100.01'. "is actually 100.008" 

This leads to confusion if such numbers represent monetary values and are printed, e.g., in a summed-up table.

With FixedDecimals, you'll get:

v := 50.004 asFixedDecimal:2.
    v printString -> '50.00'.   "is actually 50.00"
v2 := v * 2.
    v2 printString -> '100.00'. "is actually 100.00"
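
The difference between "round only when printing" and "round the stored value" can be sketched independently of expecco. The following Python sketch is an illustration under assumptions: `show` and `round_stored` are hypothetical helpers, and half-up rounding is assumed for printing (the rounding mode actually used by Smalltalk/X is not stated on this page):

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

def show(value, digits):
    """Hypothetical print helper: render an exact Fraction with `digits` decimals."""
    quantum = Decimal(1).scaleb(-digits)   # Decimal('0.01') for digits=2
    approx = Decimal(value.numerator) / Decimal(value.denominator)
    return str(approx.quantize(quantum, rounding=ROUND_HALF_UP))

def round_stored(value, digits):
    """Round the stored value itself (the behavior described for FixedDecimal)."""
    return Fraction(show(value, digits))

scaled = Fraction("50.004")                   # exact value kept, rounded only when shown
fixed = round_stored(Fraction("50.004"), 2)   # value rounded once, on creation

print(show(scaled, 2), show(scaled * 2, 2))   # 50.00 100.01
print(show(fixed, 2), show(fixed * 2, 2))     # 50.00 100.00
```

In the first line the printed sum disagrees with the printed operands; in the second it does not, because the rounding error was discarded when the value was stored.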

Be aware that mixed arithmetic operations will usually return an instance of the class with a higher generality, and that Floats do have a higher generality than FixedDecimals. Thus, when multiplying a float and a fixed decimal, you'll get a float result, whereas if you multiply an integer and a fixed decimal, the result will be a fixed decimal.

Inexact Float and Double Numbers

Floating point numbers are inherently inexact.

This is not a problem specific to expecco, but inherent to the way floating point numbers are represented (in the machine).
See "What Every Computer Scientist Should Know About Floating-Point Arithmetic" and "Mindless Assessments of Roundoff in Floating-Point Computation" and "Some disasters attributable to bad numerical computing".

A floating point number is represented as a mantissa, which is a sum of negative powers of 2 (i.e. selected terms of 1/2 + 1/4 + 1/8 + ...), multiplied by 2 raised to an exponent. I.e.

value = mantissa * 2^exponent

with the mantissa being normalized to be a sum in the interval 0.5..1 (as listed above).

Floating point formats differ in the number of bits (single/double/extended precision); typically, a double precision float has 11 bits for the exponent and 53 for the mantissa, including one hidden bit (see IEEE floating point formats).
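
The mantissa/exponent decomposition described above can be inspected directly. A minimal Python sketch (Python is an assumption for illustration; `decompose` is a hypothetical wrapper around the standard `math.frexp`, which uses the same normalization of the mantissa to the interval 0.5..1):

```python
import math
import sys

def decompose(x):
    """Split a float into (mantissa, exponent) with x == mantissa * 2**exponent."""
    return math.frexp(x)   # mantissa magnitude lies in [0.5, 1) for nonzero finite x

mantissa, exponent = decompose(6.0)        # 6.0 == 0.75 * 2**3
rebuilt = math.ldexp(mantissa, exponent)   # the inverse operation

print(mantissa, exponent, rebuilt, sys.float_info.mant_dig)
```

Here 0.75 is 1/2 + 1/4, i.e. a sum of two negative powers of 2, and `mant_dig` reports the 53 mantissa bits of a double.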

Limited Precision

Due to the limited number of bits in the mantissa, different values may end up with the same floating point representation. For example, both 9223372036854776000 and 9223372036854775808 are represented as the same float when converted from integer to float. I.e. "9223372036854776000 asFloat = 9223372036854775808 asFloat" returns true, whereas comparing them as integers with "9223372036854776000 = 9223372036854775808" correctly returns false.
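
Since this is a property of IEEE doubles and not of expecco, the same collision shows up in any language that uses them. A minimal Python sketch for cross-checking (Python is an assumption for illustration):

```python
a = 9223372036854776000
b = 9223372036854775808   # 2**63

same_as_float = float(a) == float(b)   # both round to the same double
same_as_int = a == b                   # the exact integers differ
gap = a - b                            # exact difference, lost in the float conversion

print(same_as_float, same_as_int, gap)
```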

Also, many numbers (actually: most numbers) cannot be exactly represented by a finite sum of powers of 2. Such numbers will have an error in the last significant bit (actually half the last bit). When floating point numbers are added or multiplied, the result is usually computed internally with a few more bits as mantissa, and then rounded on the last bit, to fit the mantissa's number of bits.

The situation may be relaxed slightly by using more bits for the mantissa: expecco gives you a choice of 32bit (called "ShortFloat"), 64bit ("Float") and 80bit ("LongFloat"), which are mapped to the corresponding IEEE floats (single, double and extended).

However, even with more bits, the fundamental restriction remains (although appearing less frequently with higher precision).

The limited precision may lead to "strange" results, especially when operands are far apart; for example, when subtracting a very small value from a much larger one, as in:

2.15e12 - 1.25e-5

Here, the operands differ by 17 orders of magnitude, and there are not enough bits to represent the result, which will be rounded to give you 2.15e12 again.
Thus, the comparison "2.15e12 - 1.25e-5 = 2.15e12" returns true, which is obviously wrong.
In this special case, a better result is obtained when operating with extended precision:

2.15e12 asLongFloat - 1.25e-5 asLongFloat

which gives 2149999999999.999987 as result (still incorrect due to its 64 bit mantissa, but much better).

If you are willing to trade speed for precision, you can use expecco's builtin arbitrary precision package, and compute with 200 bits of precision:

(2.15e12 asLargeFloatPrecision:200) - (1.25e-5 asLargeFloatPrecision:200)

which returns the correct 2149999999999.9999875. Be aware that the arbitrary precision package is much slower, is still under development and has not yet been released for official use (at the time of writing, do not use its trigonometric and other math functions).
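
Why the small operand vanishes can be made visible by looking at the spacing between neighboring doubles. The following Python sketch is an illustration only; it uses Python's `decimal` module as a stand-in for a higher-precision type, which is an assumption and not the expecco LargeFloat package:

```python
import math
from decimal import Decimal

big, small = 2.15e12, 1.25e-5

absorbed = (big - small) == big   # True: `small` is below half the spacing at `big`
spacing = math.ulp(big)           # distance to the next double (needs Python 3.9+)

# Decimal's default of 28 significant digits keeps every digit of this difference.
exact = Decimal("2.15e12") - Decimal("1.25e-5")

print(absorbed, spacing, exact)
```

The spacing at 2.15e12 is 2^-12 (about 0.000244), so anything smaller than half of that is rounded away in double precision.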

Floating Point Errors Propagate

Such errors accumulate with every math operation performed on an inexact value (and may even do so wildly).

For example, the decimal 0.1 cannot be exactly represented as a floating point number; the stored value is merely the nearest representable one, with an error of up to half a unit in the last place (half an ULP). Adding this multiple times will result in a larger error in the final result:

1.0 - (0.1 + 0.1 + 0.1 ...(10 times)... + 0.1) -> 1.11022302462516E-16

The print functions will usually try to compensate for an error in the last bit(s), showing "0.1" although the stored value differs slightly from it (it rounds before printing). Thus, even though the printed representation of such a number might look OK, it injects more and more error when the value is used in further operations (until print is no longer able to cheat and the error becomes visible).

This is especially inconvenient, when monetary values or counts are represented as floats and a final sum is wrong in the penny value.
As an example, try to sum an array consisting of 10 values:

(Array new:10 withAll:0.1) sum printString

which results in "1.0" due to print's cheating,
whereas:

(Array new:100 withAll:0.1) sum printString

will show '9.99999999999998' (i.e. the error accumulated to a value too big for print's cheating to compensate).

In contrast to Float, expecco (actually the underlying Smalltalk) provides two number representations which are better suited for such computations: Fraction and ScaledDecimal (formerly called "FixedPoint").

Both are internally exact fractions, but use different print strategies: Fractions print as such (i.e. '1/3', '2/17' etc.) whereas ScaledDecimal numbers print themselves as a decimal expansion (i.e. '0.33' or '0.20'). ScaledDecimal constants can be entered by using "s" instead of "e" (i.e. '1.23s' defines a scaled decimal with 2 fractional digits, and '1.23s4' one which prints 4 digits after the decimal point).

No such rounding errors are encountered, if fractions are used:

1 - ((1/10) + (1/10) + (1/10) ...(10 times)... + (1/10)) -> 0

or if ScaledDecimal numbers are used:

(Array new:100 withAll:0.1s) sum printString -> '10.0'
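
The accumulation can be reproduced step by step. A minimal Python sketch (Python is an assumption for illustration; `repeated_sum` is a hypothetical helper that adds one term at a time, because some built-in summation routines compensate for rounding errors and would hide the effect):

```python
from fractions import Fraction

def repeated_sum(term, count):
    """Add `term` to itself `count` times, one addition at a time."""
    total = term * 0            # a zero of the same numeric type as `term`
    for _ in range(count):
        total += term
    return total

float_error = 1.0 - repeated_sum(0.1, 10)             # small but nonzero
exact_error = 1 - repeated_sum(Fraction(1, 10), 10)   # exactly zero
stored_tenth = Fraction(0.1)                          # the value the double really holds

print(float_error, exact_error, stored_tenth == Fraction(1, 10))
```

The float error is 2^-53, which is the 1.11022302462516E-16 shown above, while the fraction-based sum has no error at all. The last value printed is False: the double written as 0.1 is not exactly one tenth.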

Floating Point Number Comparison

Be aware of such errors, and do not compare floating point numbers for equality/inequality. Instead, use range comparisons and/or the special "compare-almost-equal" functions, where the acceptable error can be specified as a number of units in the last place (so-called "ULPs"). Expecco provides such functions both for elementary code and in the standard action block library.

Also, for this reason, do not compute money values using floats or doubles. Instead, use instances of ScaledDecimal. You will lose a cent/penny here and there if you use floats/doubles on big budgets.
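
How a comparison in ULPs works can be sketched as follows. This is an illustrative Python implementation under assumptions, not expecco's own comparison function: all names are hypothetical, and NaN and infinities are deliberately not handled.

```python
import struct

def ordered_bits(x):
    """Map a double to an integer whose ordering matches the ordering of the floats."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)

def ulp_distance(a, b):
    """Number of representable steps from a to b (NaN and infinities not handled)."""
    return abs(ordered_bits(a) - ordered_bits(b))

def almost_equal(a, b, max_ulps=4):
    """Hypothetical helper: equal within `max_ulps` units in the last place."""
    return ulp_distance(a, b) <= max_ulps

total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0, ulp_distance(total, 1.0), almost_equal(total, 1.0))
```

The sum of ten times 0.1 is not equal to 1.0, but it is only a single representable step away from it, so a tolerance of a few ULPs accepts it.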

Limited Range of Float and Double Numbers

Floating point numbers also have a limited range. In expecco, the default float format is the IEEE double precision format (called "Float" in expecco). Floating point numbers with an absolute value greater than 1.79769313486232E+308 lead to a +INF/-INF (infinite) result, and numbers with an absolute value smaller than 2.2250738585072E-308 become zero. For IEEE single precision floats (called "ShortFloat" in expecco), the range is much smaller, and for IEEE extended precision (called "LongFloat" in expecco), the range is larger.

You can ask the classes for their limits, with:

  • fmin (smallest representable number larger than zero)
  • fmax (largest representable number)
  • emin (smallest exponent)
  • emax (largest exponent)
  • precision (bits in mantissa, incl. any hidden bit)
  • decimalPrecision (digits when printed)

Float fmin -> 2.2250738585072E-308
Float fmax -> 1.79769313486232E+308
Float emin -> -1022
Float emax -> 1023
Float precision -> 53
Float decimalPrecision -> 15
ShortFloat fmin -> 1.175494e-38
ShortFloat fmax -> 3.402823e+38
ShortFloat emin -> -126
ShortFloat emax -> 127
ShortFloat precision -> 24
ShortFloat decimalPrecision -> 7
LongFloat fmin -> 3.362103143112093506E-4932
LongFloat fmax -> 1.189731495357231765E+4932
LongFloat emin -> -16382
LongFloat emax -> 16383
LongFloat precision -> 64
LongFloat decimalPrecision -> 19
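
The double precision limits listed above are those of the IEEE format and can be cross-checked in other environments. A minimal Python sketch (Python is an assumption for illustration; it only exposes the double precision limits):

```python
import math
import sys

info = sys.float_info
largest = info.max              # counterpart of `Float fmax`
smallest_normal = info.min      # counterpart of `Float fmin`
mantissa_bits = info.mant_dig   # counterpart of `Float precision`
decimal_digits = info.dig       # counterpart of `Float decimalPrecision`

overflowed = largest * 10       # exceeds the range and becomes +inf

print(largest, smallest_normal, mantissa_bits, decimal_digits, overflowed)
```

Note that Python reports the exponent limits as -1021 and 1024 rather than -1022 and 1023, because it follows the C convention of a mantissa normalized to the interval 0.5..1.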

As a consequence, you cannot (using floats) compute very large numbers;
for example you cannot compute the number of decimal digits of huge numbers with floats:

10000 factorial asFloat log10 -> INF

whereas the integer computation does it:

10000 factorial log10 -> 35659.454274
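
The same contrast between the float and the integer computation can be observed outside expecco. A minimal Python sketch (Python is an assumption for illustration; note that Python raises an error on the conversion instead of returning INF, which the sketch maps to infinity for comparison):

```python
import math

huge = math.factorial(10000)

digits_log = math.log10(huge)   # math.log10 accepts arbitrarily large integers

try:
    as_float = float(huge)      # far beyond the double range
except OverflowError:
    as_float = math.inf         # Python raises here rather than returning INF

print(digits_log, as_float)
```

The integer part of the logarithm plus one gives the number of decimal digits, here 35660.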

Again: this is not a problem specific to expecco, but inherent to the way floating point numbers are represented.

Speed of Operations

Machines have builtin floating point math operations, which usually work fastest in single or double precision (actually, some modern machines work faster in double than in single precision).

Unless you have special precision needs, it is best to stick with double precision (which is also portable across machines).

Trigonometric and other Math Functions

Some trigonometric and other math functions (sqrt, log, exp) will first convert the number to a limited precision real number (a C-double), and therefore may have a limited input value range and also generate inexact results.

For example, you will not get a valid result for:

10000 factorial sin

because it is not possible to represent such a large number as a real number (expecco will signal a domain error, as the input to sin will be +INF).

Also, the result from:

(9 / 64) sqrt

will be the (inexact) 0.375 (a double), instead of the exact 3/8 (a fraction). This might change in a future release, which may provide exact results if both numerator and denominator are perfect squares.
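
What an exact square root for such cases could look like is sketched below. This is a hypothetical Python illustration of the perfect-square idea, not the expecco implementation; `exact_sqrt` is an invented name. It relies on the fraction being reduced, so that the value is a rational square exactly when numerator and denominator both are perfect squares:

```python
import math
from fractions import Fraction

def exact_sqrt(value):
    """Hypothetical helper: exact root of a non-negative Fraction, or None if irrational."""
    num_root = math.isqrt(value.numerator)
    den_root = math.isqrt(value.denominator)
    if num_root**2 == value.numerator and den_root**2 == value.denominator:
        return Fraction(num_root, den_root)
    return None

float_root = math.sqrt(Fraction(9, 64))    # converts to a double first: 0.375
exact_root = exact_sqrt(Fraction(9, 64))   # Fraction(3, 8)
no_root = exact_sqrt(Fraction(2, 1))       # None: 2 is not a perfect square

print(float_root, exact_root, no_root)
```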

Complex Results

By default, Smalltalk/X will raise an error if a function applied to a real operand would yield a complex result.
For example, computing the square root of a negative number as in:

-2 sqrt

will raise an "ImaginaryResultError".

However, this is a proceedable exception, which can be caught and if the handler proceeds, a complex result is returned:

rslt := ImaginaryResultError ignoreIn:[ -2 sqrt ].

For readability, there is also an alias called "trapImaginary:" in the Number class:

rslt := Number trapImaginary:[ -2 sqrt ]

Both of the above would return the complex result:

(0+1.4142135623731 i)

Thus,

rslt := -2 sqrt squared

will result in an exception, whereas:

rslt := Number trapImaginary:[ -2 sqrt squared]

will generate a result of "-2.0".
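
The pattern "raise by default, deliver a complex result if the error is handled" can be sketched in other environments as well. A minimal Python illustration (an assumption for comparison only; `sqrt_allowing_complex` is a hypothetical helper and unrelated to the Smalltalk/X exception classes):

```python
import cmath
import math

def sqrt_allowing_complex(x):
    """Hypothetical helper: real root when it exists, complex root otherwise."""
    try:
        return math.sqrt(x)     # raises ValueError for negative input
    except ValueError:
        return cmath.sqrt(x)    # the handler supplies the complex result

root = sqrt_allowing_complex(-2)
squared = root ** 2

print(root, squared)
```

Squaring the complex root gives -2 again, up to the usual floating point rounding error in the last bit.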

Undefined Results

Similar to the way imaginary results are handled, some operations are not defined for certain values (values outside the function's domain).
For example, the receiver of the arcSin operation must be in [-1 .. 1].

By default, these situations are reported by raising an error, and:

-2 arcSin

will raise a "DomainError".

Similar to the above, this can be handled, although no useful value will be provided (in the above case, NaN (Not a Number) will be returned):

rslt := DomainError ignoreIn:[ -2 arcSin ]

or:

rslt := Number trapDomainError:[ -2 arcSin ].

Both of the above would generate a NaN as result.

Notice that if such a NaN is used in other arithmetic operations, either more exceptions or other NaNs will usually be generated (depending on the exception being handled or not).
Thus:

rslt := Number trapDomainError:[ -2 arcSin sin ].

will generate a NaN as result.
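
How a NaN propagates through subsequent operations can be reproduced with any IEEE implementation. A minimal Python sketch (Python is an assumption for illustration; `arcsin_or_nan` is a hypothetical helper that plays the role of the error handler):

```python
import math

def arcsin_or_nan(x):
    """Hypothetical helper: math.asin, but NaN instead of an error outside [-1, 1]."""
    try:
        return math.asin(x)
    except ValueError:          # Python's counterpart of a domain error
        return math.nan

result = arcsin_or_nan(-2)
propagated = math.sin(result)   # NaN in, NaN out

print(result, propagated, result == result)
```

The last value printed is False: a NaN does not even compare equal to itself, which is one more reason not to test such results with "=".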

Different Results on Different CPUs

Since floating point arithmetic is performed by the underlying CPU hardware, different results (in the least significant bit) may be returned from math operations on different CPUs or even different versions (steppings) of the same CPU architecture.

This applies especially to trigonometric and other math functions (which are computed by power or Taylor series).
Be prepared for this, and use the "almost-equal" comparison functions when results are to be verified.

Higher Precision Numbers

Expecco supports various inexact real formats, with different precision (i.e. number of mantissa bits):

Name                           overall  exponent  mantissa   decimal    fmin /           Smalltalk/X  ANSI Smalltalk
                               size     size      size (1)   precision  fmax             class name   class name (4)
                               (bits)   (bits)    (bits)     (digits)

IEEE single precision floats   32       8         24         6          1.175494e-038    ShortFloat   FloatE
                                                                        3.402823e+038

IEEE double precision floats   64       11        53         15         2.225074E-308    Float        FloatD
                                                                        1.797693E+308

IEEE extended prec. floats     80/128   15        64/112     19/34      3.362103E-4932   LongFloat    FloatQ (2)
                                                                        1.189731E+4932

quad double                    4*64     11        200        60         1.175494e-038    QDouble      (3)
                                                                        3.402823e+038

IEEE arbitrary                 any      any       any        any        any              IEEEFloat    (3)
                                                                        any

large float                    any      any       any        any        any              LargeFloat   (3)
                                                                        any

(1) mantissa incl. any hidden bit (normalized floats)

(2) on x86/x64 machines, LongFloats are represented as 80bit extended floats with 64bit mantissa; on other CPUs, these might be represented as 128bit quadFloats with 112 bit mantissa.

(3) these are currently being developed and are provided as a preview feature without warranty (meaning: they may be buggy at the moment; let us know if you need them).

(4) different Smalltalk dialects use different precisions for their floating point numbers: ST80 Floats are IEEE singles, V'Age Floats are IEEE doubles and ST/X Floats are IEEE doubles. Later, the ANSI standard defined FloatE, FloatD and FloatQ as aliases. You can use either interchangeably in expecco.

Notice that the use of any but double precision floats (which are directly supported by the machine) may come at a performance price. The speed of operations degrades from double -> single -> extended -> quad double -> IEEE arbitrary -> large float.



Copyright © 2014-2024 eXept Software AG