Numeric Limits/en: Unterschied zwischen den Versionen

Inhalt gelöscht Inhalt hinzugefügt

Inline

Aktuelle Version vom 14. Mai 2026, 10:39 Uhr

This page provides some computer science basics, which are not specific to expecco. However, in the past some users encountered problems and it is useful to provide some insight on number representations.

Expecco supports arbitrary precision integer arithmetic, arbitrary precision fractions, arbitrary and limited precision floating point numbers in various precisions and complex numbers. In addition, special purpose numbers for monetary and decimal presentations.

Being based on Smalltalk/X, expecco provides a complete set of number classes.

Syntax (In Smalltalk and FreezeValues)

See also: "Smalltalk_Syntax_Cheat_Sheet".

Integers (arbitrary size)

In addition to the standard Smalltalk radix/base-notation <b>r<digits> with bases between 2 and 36, the C notations 0x<digits> (hex), 0b<digits> (binary) and 0o<digits> (octal) are recognized.
The old C-style leading-zero octal notation is NOT recognized, but instead interpreted as decimal. Notice that in Smalltalk any negative sign precedes the number's digits, whereas in C it precedes the base.

1234567
0xFF00AA  (base 16)
0b01010101 (base 2)
-0xAFFE  (negative C notation)
3r121212 (base 3)
4r123123 (base 4)
16r-AFFE (negative Smalltalk notation)

Fractions (arbitrary integral numerator and denominator)

Numerator and denominator must be integers (of any size). Any negative sign must be inside the parentheses.

(1/2)
(1/101)
(-1/3)

ScaledDecimals

The number after the 's' character specifies the number of digits to be printed. However, internally the full precision is used in arithmetic operations.

123s2
123.456s2
123.009s2  (prints as 123.01)
  0.001s2  (prints as 0.00)
123.009s2 + 0.001s2  (prints as 123.01)

Floats / Float64 (actually 64bit IEEE doubles)

12.456
1e17
1.0e23
12e     (legal, but not recommended)
.5      (illegal)
.5e     (illegal)

ShortFloats / Float32 (32bit IEEE doubles)

An 'f' suffix makes it float32 constant.

12.456f
1e17f
1.0e23f
12f      (legal, not recommended)
.5f      (illegal)
.5ef     (illegal)

LongFloats / Float80 (80bit IEEE long doubles)

12.456q
1e17q
1.0e23q

QuadFloats / Float128 (128bit IEEE quadruple floats)

12.456Q
1e17Q
1.0e23Q

OctaFloats / Float256 (256bit IEEE octuple floats)

12.456QO
1e17QO
1.0e23QO

LargeFloats (arbitrary precision (defaults to 200bit) software floats)

12.456QL
1e17QL
1.0e23QL

QDoubles (4 IEEE doubles combined)

12.456QD
1e17QD
1.0e23QD

Complex

1+5i
4i
1.23+5.67i

Exact Integer Numbers

For integer operations, there is no overflow or error in the result for any legal operation. I.e. operations on two integers delivers a correct and exact result.

This is a feature of the underlying Smalltalk runtime environment and in contrast to many other programming languages (especially: Java and C) which provide int (usually 32bit) and long (usually 64bit) integer types.
In expecco, you can write both in Smalltalk and in the builtin JavaScript syntax:¹

2147483647 "(0x7FFFFFFF)" + 1
    -> 2147483648 "(0x80000000)"

4294967295 "(0xFFFFFFFF)" + 1
    -> 4294967296 "(0x100000000)"

18446744073709551615 "(0xFFFFFFFFFFFFFFFF)" + 1
    -> 18446744073709551616 "(0x10000000000000000)"

Very large values can be computed:

10000 factorial
    -> a huge number beginning with: 284625968091705451890641321211986889014....

Smalltalk will automatically convert any result which is too large to fit into a machine-integer into a LargeInteger (with an arbitrary number of bits) and also automatically return results converted back to a small representation, if possible.²
Thus, although the two operands to the division in the following example are large integers,

rslt := (1000 factorial) / (999 factorial)

the result will be a small integer (since the value 1000 fits easily into a machine word).
As a user, you do not have to care about these internals.

Hint: Therefore, you can use a Workspace (Notepad) window as a calculator with arbitrary precision.

¹) be aware that this only computes correct results if the elementary action is written in either Smalltalk or in the builtin JavaScript syntax.
Depending on the version of the external language interpreter, it may or may not be correct, if using Java/Groovy, Python, C/C++, Node.js etc.
²) the integer class provides functions which operate in the limited 32 or 64 bit range. These might be useful if you have to verify results or repeat computations as returned by corresponding C or Java operations.

Exact Fractions, ScaledDecimals and FixedDecimals

Fractions

When dividing integers, the "/" operator will deliver an exact result, possibly as a fraction:

5 / 3 -> 5/3

and reduce the result (possibly returning an Integer):

(5/3)*(3/2) -> 5/2
(5/3)/(3/2) -> 10/9
(5/3)*(9/3) -> 5
1000 factorial / 999 factorial -> 1000

There are also two truncating division operators:

"//", which returns an integer truncated towards negative infinity (i.e. the next smaller integer), and

"quo:". which truncates towards zero. "quo:" is what you'd get in Java or C.

5 // 3 -> 1
-5 // 3 -> -3

The corresponding modulo operators are "\\" and "rem:".

"\\" is the standard Smalltalk remainder operator, such that:

(a // b) + (a \\ b) = a

Smalltalk/X also provides "%" as an alias.
Thus you can also write:

(a // b) + (a % b) = a

But be aware that its semantic is different from C, when negative operands are involved (see below).

The corresponding remainder for the "quo:" division operator (which truncates towards zero), is the "rem:" remainder operator, such that:

(a quo: b) + (a rem: b) = a

For positive a and b, the two operator pairs deliver the same result. For negative arguments, these are different. Be aware and think about the domain of your arguments. And be reminded again that the Smalltalk remainder "%" returns different results as C or Java, when operands are negative.

In addition, the usual ceiling, floor and rounding operations are available (both on fractions and on limited precision reals):

1.5 ceiling -> 2      "the next larger integer"
1.5 floor -> 1        "the next smaller integer"
1.5 truncated -> 1    "truncate towards zero"
1.5 rounded -> 2           "round wrt. fraction >= 0.5)
1.55 roundTo: 0.1  -> 1.6.
1.54 roundTo: 0.1  -> 1.5.

-1.5 ceiling -> -1      "the next larger integer"
-1.5 floor -> -2        "the next smaller integer"
-1.5 truncated -> -1    "truncate towards zero"
-1.5 rounded -> -2           "round wrt. fraction >= 0.5)
-1.55 roundTo: 0.1  -> -1.6.
-1.54 roundTo: 0.1  -> -1.5.

(5 / 3) ceiling -> 2      "the next larger integer"
(5 / 3) floor -> 1        "the next smaller integer"
(5 / 3) truncated -> 1    "truncate towards zero"
(5 / 3) rounded -> 2           "round wrt. fraction >= 0.5)

(-5 / 3) ceiling -> -1.   "the next larger integer"
(-5 / 3) floor -> -2      "the next smaller integer"
(-5 / 3) truncated -> -1  "truncate towards zero"
(-5 / 3) rounded -> -2.

Fractions print themself as "(numerator / denominator)".

Be careful when rounding fractions to an inexact float (which cannot be represented in the IEEE binary format). A famous example is 0.1, which has a periodic binary mantissa. When rounding a fraction to such a number, Smalltalk does its best but has no chance to return a small fraction. Therefore:

(5 / 3) roundTo: 0.1  -> (1595024868027069/938249922368864).
(5 / 3) roundTo: 0.01  -> (652865570981689/390937467653706)
(-5 / 3) roundTo: 0.1 -> (-1595024868027069/938249922368864)

but:

(5 / 3) roundTo: (1/10)  -> (17/10).
(5 / 3) roundTo: (1/100)  -> (167/100)
(-5 / 3) roundTo: (1/10) -> (-17/10)

ScaledDecimal

If you prefer a decimal representation with a defined number of fractional digits, use ScaledDecimals (which for backward compatibility are also called "FixedPoint" (¹⁾).

These are also exact fractions, but print temself differently: you can specify the number of digits to be printed and it will print itself rounded on the last digit. In other words: the computation and internal value will be exact (as with Fractions), and therefore, no rounding errors will accumulate. Only when printed, will the external represenatation be rounded to the specified number of decimal places.

(5 / 3) asScaledDecimal:2 -> 1.67
(5 / 3) asScaledDecimal:4 -> 1.6667

((5 / 3) asScaledDecimal:2) * 3 -> 5.00

1.2 asScaledDecimal:3 -> 1.200

Float pi asScaledDecimal:5 -> 3.14159

1) the class was previously called "FixedPoint" and the converters were called "asFixedPoint:".
For compatibility with other Smalltalk dialects, these have aliases "ScaledDecimal" and "asScaledDecimal:".
Both the old class name and the old operators are and will be supported in the future for backward compatibility (as aliases), but you should use the new name, both for compatibility with other Smalltalk dialects, and to avoid confusion with FixedDecimal numbers.

FixedDecimal

As mentioned above, a ScaledDecimal keeps the exact value internally, but prints itself rounded to a given number of decimal digits. Smalltalk/X provides an alternative class called "FixedDecimal" ⁽¹⁾, which always keeps a rounded value internally. These may be better suited for monetary values, especially in computed additive sums which are printed in a table, as the sum of two FixedDecimals will always be the presented (printed) sum of two FixedDecimals. In contrast, with ScaledDecimals, you may see a sum which differs from what presented table values suggest.

For example:

v := 50.004 asScaledDecimal:2.
    v printString -> '50.00'.   "is actually 50.004"
v2 := v * 2.
    v2 printString -> '100.01'. "is actually 100.008"

this leads to confusion, iff such numbers represent monetary values and are printed eg. in a summed-up table.

With FixedDecimals, you'll get:

v := 50.004 asFixedDecimal:2.
    v printString -> '50.00'.   "is actually 50.00"
v2 := v * 2.
    v2 printString -> '100.00'. "is actually 100.00"

Be aware that mixed arithmetic operations will usually return an instance of the class with a higher generality, and that Floats do have a higher generality than FixedDecimals which have a higher generality than ScaledDecimals.

Thus, when multiplying a float and a fixed decimal, you'll get a float result, whereas if you multiply an integer and a fixed decimal, the result will be a fixed decimal. Finally, when multiplying a fixed decimal and a scaled decimal, the result will be a fixed decimal.

¹⁾ both names "ScaledDecimal" and "FixedDecimal" have been chosen a bit unwise, and may be confusing. However, these cannot easily be changed for backward and cross Smalltalk dialect compatibility reasons. We apologize.

Scaled- vs. FixedDecimal; which one to use and when?

Use ScaledDecimals to print tables or measurement values.

Use FixedDecimals when working with money, escpecially when printing accounting lists where monetary values are added up. With FixedDecimals, youwill get a sumwhich corresponds to what has been printed above.

Inexact Float and Double Numbers

Floating point numbers are inherently inexact and almost always represent an approximated value. The error depends on the floating point number's precision, which is the number of bits with which the value is approximated. there are numbers which cannot ever be represented as a floating point number, whatever precision is used. Even innocent looking numbers (eg. "0.1") are of this kind.

This is not a problem specific to expecco or Smalltalk, but inherent to the way floating point numbers are represented in the machine's CPU.
See "What Every Computer Scientist Should Know About Floating-Point Arithmetic", "Mindless Assessments of Roundoff in Floating-Point Computation" and "Some disasters attributable to bad numerical computing".
A very impressive example of how wrong double precision IEEE arithmetic can be is described in "Do_not_trust_Floating_Point" ⁽¹⁾.

Floating point numbers are represented as a sum of powers of 2 (actually 1/2 + 1/4 + 1/8 +...) called the "mantissa" then multiplied by 2 raised to an exponent. I.e.

value = mantissa * (2 ** exponent)

with the mantissa being normalized to be a sum in the interval 0.5..1 (as listed above). And the exponent stored with an offset (called "ebias"). The minimum exponent (0) is reserved for the zero number and non-normalized tiny numbers (called "subnormals"); the maximum exponent is reserved for infinities and NaNs ("Not a Number"). These might be returned from some operations if invalid arguments are provided (for example, trying to compute the logarithm or square root of a negative number).

The number of exponent bits determines the largest and smallest representable magnitudes, the number of mantissa bits determines the relative error. The absolute error depends on the magnitude of the last mantissa bit, which depends on the exponent. This value is called "Unit in the Last Place" or "'ULP'" (see Wikipedia).
For a large number like 1e100, one ULP is the very large 1.94266889222573e+84, whereas for a small number like 0.5, it is 1.11022302462516e-16.

Floating point formats differ in the number of bits (single/double/extended precision etc.).
A double precision IEEE float has 11 bits for the exponent and 53 for the mantissa (see IEEE floating point formats).

As a rule of thumb, the error in the last bit of a double precision IEEE float is roughly 15 to 16 orders of magnitudes smaller than the magnitude of the (double precision) floating point number. The error is larger for single precision (32bit) floats and smaller for extended floats (80, 128 or more bits).

¹⁾: If you do not believe me, try the following example from one of the mentioned papers in (eg. excel) or your favourite programming language:

v := 4/3.    "/ or maybe (4.0/3.0)
w := v - 1.
x := w*3.   
y := x - 1.  
z := (y*2)**52.

Limited Precision

Due to the limited number of bits in the mantissa, different values may end in the same floating point representation. For example, both 9223372036854776000 and 9223372036854775808 will end up being represented as the same float when converted from integer. The reason is that there are simply not enough bits in the mantissa.

Thus:

9223372036854776000 asFloat = 9223372036854775808 asFloat

will return "true", and the difference will be zero in:

9223372036854776000 asFloat - 9223372036854775808 asFloat

in contrast to the correct result being returned when comparing/subtracting them as integers:

9223372036854776000 = 9223372036854775808.
=> false
9223372036854776000 - 9223372036854775808.
=> 192

Try it in a workspace window.

Also, many numbers (actually: most numbers) cannot be exactly represented by a finite sum of powers of 2. Such numbers have an error in the last significant bit (actually half the last bit). When floating point numbers are added or multiplied, the result is usually computed internally with a few more bits as mantissa, and then rounded on the last bit, to fit the mantissa's number of bits.

Notice that this may not be immediately obvious, because the print functions (such as printf) cheat, and round again on the last bit. Thus, a result such as 0.9999999 would still be printed as "1.0").

The situation may be relaxed slightly, by using more bits for the mantissa (and expecco gives you a choice of 32bit (in Smalltalk called "ShortFloat" or "Float32"), 64bit ("Float" or "Float64") and 80bit ("LongFloat" or "Float80") and even more ("Float128", "Float256"), which are mapped to corresponding IEEE floats (single, double, extended, quadruple and octuple precision).

Repeating the above example with long floats, there are enough mantissa bits and the numbers are no longer represented or considered equal,
thus:

9223372036854776000 asLongFloat = 9223372036854775808 asLongFloat

yields "false" as answer, and the computation:

9223372036854776000 asLongFloat - 9223372036854775808 asLongFloat

will give 192.0 as answer.

However, even with more bits, the fundamental restriction remains, although appearing less frequently or less obvious given higher precision. But be aware that many numbers (such as 1/10, 1/5, 1/3) can never be represented exactly, no matter how many bits are used. So even the above mentioned innocent looking "0.1" is actually an approximation and wrong in the last bit.

The limited precision may lead to "strange" results, especially when operands are far apart; for example, when subtracting a very small value from a much larger one, as in:

2.15e12 - 1.25e-5

Here, the operands differ by 17 orders of magnitude, and there are not enough bits to represent the result, which will be rounded to give you 2.15e12 again.
Thus, the comparison "2.15e12 - 1.25e-5 = 2.15e12" returns true and "2.15e12 - (2.15e12 - 1.25e-5) = 0.0", which are both obviously wrong.
In this special case, a better result is obtained when operating with extended precision:

2.15e12 asLongFloat - 1.25e-5 asLongFloat

which gives 2149999999999.9999875q as result (although the default print precision will not print the final digit, so look at it in an inspector)

If you are willing to trade speed for precision, you can use one of expecco's builtin higher precision representations or even the arbitrary precision representation, and compute with more bits of precision. The QDouble class provides a compromise between speed and precision, proving roughly 200 bits of precision or alternatively represents a combination of up to 4 arbitrarily valued doubles (i.e. it can represent the sum of a very large and a small number):

(2.15e12 asQDouble) - (1.25e-5 asQDouble)
(2.15e12 asQDouble) - ((2.15e12 asQDouble) - (1.25e-5 asQDouble))

Another representation supports an arbitrary number of precision bits (here 200):

(2.15e12 asLargeFloatPrecision:200) - (1.25e-5 asLargeFloatPrecision:200)
(2.15e12 asLargeFloatPrecision:200) - ((2.15e12 asLargeFloatPrecision:200) - (1.25e-5 asLargeFloatPrecision:200))

Both return the correct results 2149999999999.9999875 and 1.25e-5 respectively.

Be aware, that the higher precision and arbitrary precision operations are much slower than the ones which are directly supported by the processor (which has special hardware, usually for single, double and extended precision). Also these extended classes are still being developed and not yet released for official use (meaning they may contain bugs, especially in their trigonometric and other math functions by the time of writing).

You may use fractions,

2150000000000 - (1 / 125000)

to compute the exact result: (268749999999999999/125000) (of course, these are less convenent to read, and should probably be presented to the end-user as ScaledDecimals.

Also, do not forget that a conversion of one number to a higher precision number cannot magically generate missing bits.
For example, given a 32 bit floating point number which is already an approximation (i.e. the real value cannot be represented as an exact sum of powers of 2), then the conversion will give you another such approximation, with the same error. Thus, "0.25 asLongFloat" will give you an exact 0.25 (because 0.25 is representable), whereas "0.1 asLongFloat" will not give an exact "0.1". Actually, the result of such a conversion will usually not give you the full possible precision.

If you need a constant with the max. precision, either enter it as such (i.e. "0.1q" instead of "0.1 asLongFloat") or read it from a string (i.e. LongFloat fromString:'0.1').

Floating Point Errors Propagate

The above rounding and last bit errors will accumulate, with every math operation performed on it (and may even do so wildly).

For example, the already mentioned 0.1 cannot be exactly represented as a floating point number, and is actually 0.099999..X with an error in the last bit (half a ULP).
Adding this multiple times will result in a larger and larger accumulated error in the final result:

1.0 - (0.1 + 0.1 + 0.1 ...(10 times)... + 0.1) -> 1.11022302462516E-16

The print functions will try to compensate an error in the last bit(s), showing "0.1" although in reality, it is "0.09999..." (it cheats and rounds before printing). Thus, even though the printed representation of such a number might look ok, it will inject more and more error when the value is used in further operations (up to the point when the error accumulates out of the last bit and print will no longer be able to cheat, and the error becomes visible).

Monetary Values

This is especially inconvenient, when monetary values or counts are represented as floats and a final sum is wrong in the penny value
(and therefore, real programmers never ever use floating point to represent monetary values!).

As an example, try to sum an array consisting of 10 values:

(Array new:10 withAll:0.1) sum printString

which results in "1.0" due to print's cheating,
whereas:

(Array new:100 withAll:0.1) sum printString

will show '9.99999999999998' (i.e. the error accumulated to a value too big for print's cheating to compensate).

Expecco (actually the underlying Smalltalk) provides additional number representations which are better suited for such computations: Fraction, ScaledDecimal and FixedDecimal as described above).

These are exact fractions internally, but use different print strategies: Fractions print as such (i.e. '1/3', '2/17' etc.) whereas ScaledDecimal and FixedDecimal numbers print themself as decimal expansion (i.e. '0.33' or '0.20'). ScaledDecimal constant numbers can be entered by using "s" instead of "e" (i.e. '1.23s' defines a scaled decimal with 2 and '1.23s4' which will print 4 valid digits after the decimal point).

No such rounding errors are encountered, if fractions are used:

1 - ((1/10) + (1/10) + (1/10) ...(10 times)... + (1/10)) -> 0

or if ScaledDecimal numbers are used:

(Array new:100 withAll:0.1s) sum printString -> '10.0'

Floating Point Number Comparison

Be aware of such errors, and do not compare floating point numbers for equality/inequality.

As a concrete example, try:

(0.2 + 0.1 - 0.3) = 0

which will return false and if you print "0.2 + 0.1 - 0.3", you might get something like: "5.55111512312578e-17".
Even increasing the precision does not really help; if we went to 200bits precision, we'd still get a small error:

(0.2QL + 0.1QL - 0.3QL) printString

gives "-3.111507638930570853572...e-61"

The problem also occurs when comparing numbers with different precision. For example, consider that a float32 value is to be compared against a constant. The float32 might be read from a file or provided by a measurement device via any communication mechanism.
If we compare it against a higher precision value, the missing bits in the shorter float are filled with zeros. Thus:

(Float32 readFrom:'0.125') = 0.125

leads to a true value, whereas:

(Float32 readFrom:'0.123') = 0.123

returns false.
The reason for this is that 0.125 can be represented as exact float in both float32 and float64 formats, whereas 0.123 is non-exact and has repeated binary digits in the mantissa. Its representation as float64 is:

0 01111111011 1111011111001110110110010001011010000111001010110000

and:

0 01111011 11110111110011101101101

as float32. When comparing, the float32 is expanded to:

0 01111011 111101111100111011011010000000000000000000000000000000

which is obviously different.

Instead of comparing against a constant, either use range-compares and/or use the special "compare-almost-equal" functions, where the number of bits of acceptable error can be specified (so called: "ULPs"). Expecco provides such functions both for elementary code and in the standard action block library.

Again, this is another reason to not compute monetary values using floats or doubles. Instead, use instances of ScaledDecimal. Otherwise you might loose a cent/penny here and there, if you use floats/doubles on big budgets.

With scaled decimals, the result is correct:

0.1s + 0.2s - 0.3s = 0.  => true

and:

(0.1 asScaledDecimal + 0.2 asScaledDecimal - 0.3 asScaledDecimal) printString => 0.00

Find further insight here

Limited Range of Float and Double Numbers

Floating point numbers also have a limited range; there are smallest and largest representable values.
In expecco, the default float format is IEEE double precision format (called "Float" or "Float64" in expecco). Numbers with an absolute value greater than 1.79769313486232E+308 will lead to a +INF/-INF (infinite) result, and numbers with absolute value smaller than 2.2250738585072E-308 will be zero.

For IEEE single precision floats (called "ShortFloat" or "Float32" in expecco), the range is much smaller, and for IEEE extended precision (called "LongFloat" or "Float80" in expecco), the range is larger.

You can ask the classes (or its instances ¹) for their limits, with:

fmin (smallest representable number larger than zero)
fmax (largest representable number),
emin (smallest exponent; binary)
emax (largest exponent binary),
precision (bits in mantissa, incl. any hidden bit),
decimalPrecision (digits when printed),

remember Float is the same as Float64
Float fmin -> 2.2250738585072E-308
Float fmax -> 1.79769313486232E+308
Float emin -> -1022
Float emax -> 1023
Float precision -> 53
Float decimalPrecision -> 15

remember ShortFloat is an alias for Float32
ShortFloat fmin -> 1.175494e-38
ShortFloat fmax -> 3.402823e+38
ShortFloat emin -> -126
ShortFloat emax -> 127
ShortFloat precision -> 24
ShortFloat decimalPrecision -> 7
 
remember LongFloat is an alias for Float80
LongFloat fmin -> 3.362103143112093506E-4932
LongFloat fmax -> 1.189731495357231765E+4932
LongFloat emin -> -16382
LongFloat emax -> 16383
LongFloat precision -> 64
LongFloat decimalPrecision -> 19
 
QuadFloat is an alias for Float128
Float128 fmin -> 3.36210314311209350626267781732e-4932
Float128 fmax -> 1.18973149535723176508575932662e+4932
Float128 emin -> -16382
Float128 emax -> 16383
Float128 precision -> 113
Float128 decimalPrecision -> 34
 
OctaFloat is an alias for Float256
Float256 fmin -> 2.48242795146434978829932822291387172...5329791379e-78913
Float256 fmax -> 1.61132571748576047361957211845200501...7125049607e+78913
Float256 emin -> -262142
Float256 emax -> 262143
Float256 precision -> 237
Float256 decimalPrecision -> 71

QDouble fmin -> same as float64
QDouble fmax -> same as float64
QDouble emin -> same as float64
QDouble emax -> same as float64
QDouble precision -> 204
QDouble decimalPrecision -> 61

aLargeFloat fmin -> 0.0 arbitrary small ¹
aLargeFloat fmax -> inf arbitrary large
aLargeFloat emin -> -inf arbitrary small
aLargeFloat emax -> inf arbitrary large
aLargeFloat precision -> 200 default; configurable
aLargeFloat decimalPrecision -> 60 default; configurable

Remember that the name "Float" refers to "Float64", which is called "double" in the C language. And also remember that the name "ShortFloat" refers to "Float32", which is called "float" in C. And finally, the name "LongFloat" refers to "Float80", which is called "long double" in C.

As a consequence of the limits, you cannot compute very large numbers using any of the CPU supported floats, and you will have to use one of the software computed float representations.
For example trying to compute the number of decimal digits of a huge number:

10000 factorial asFloat log10 -> INF

I.e. it returns infinity, because 10000 factorial asFloat already returns INF. (it could have been converted with "asFloatChecked" which raises an exception in that situation, which is probably a good idea to do. However, the regular asFloat conversion uses the underlying machine's CPU support, which returns INF, and is similar to the behavior in other programming languages).

In contrast, the integer computation does it:

10000 factorial log10 -> 35659.454274

Again: this is not a problem specific to expecco, but inherent to the way floating point numbers are represented in the CPU.

¹⁾because LargeFloats have an individual number of precision (per instance), you should ask the instance, not the class. The class will return the default values (which are valid for 200 bits of resolution).

Speed of Operations

Machines have builtin floating point math operations, which usually work fastest in single or double precision (actually, some modern machines work slightly faster in double than in single precision).

Unless you have special precision needs, it is best to stick with double precision which is also portable across machines. Therefore, double precision floats (aka "double") is the default and simply named "Float" in Smalltalk/X (and therefore also in expecco) ¹.

¹⁾The reason for calling them "Float" is historic. There exist Smalltalk dialects where Floats are 32bit IEEE floats, and others where they are 64bit. To be able to import code from either dialect, Smalltalk/X uses double precision for "Float".
To make you intention clear, it is recommended to use the alias names which are more descriptive and explicit; i.e. "Float32", "Float64" etc.

Trigonometric and other Math Functions

Some trigonometric and other math functions (sqrt, log, exp) will first convert the number to a limited precision real number (a C-double), and therefore may have a limited input value range and also generate inexact results.

For example, you will not get a valid result for:

10000 factorial sin

because it is not possible to represent that large number as real number. (expecco will signal a domain error, as the input to sin will be +INF)

Also, the result from:

(9 / 64) sqrt

will be the (inexact) 0.375 (a double), instead of the exact 3/4 (a fraction). (this might change in a future release and provide exact results if both numerator and denominator are perfect squares)

You can however first convert to a higher precision float and then apply the function. These will compute using a Taylor series or Newton approximation taking the number's precision into account:

2 sqrt
=> 1.4142135623731 (computed with float64 precision)

2 asFloat128 sqrt
=> 1.41421356237309504880168872421

2 asFloat256 sqrt
=> 1.41421356237309504880168872420969807856967187537694807317667973799073

(2 asLargeFloatPrecision:500) sqrt
=> 1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727350138462309122970249248360558507372126441214971

Complex Results

By default, Smalltalk/X will raise an error if the result of a function with a real operand would return a complex result.
For example, computing the square root of a negative number as in:

-2 sqrt

will raise an "ImaginaryResultError".

However, this is a proceedable exception ¹, which can be caught and if the handler proceeds, a complex result is returned:

rslt := ImaginaryResultError ignoreIn:[ -2 sqrt ].

for readability, there is also an alias called "trapImaginary:" in the number class:

rslt := Number trapImaginary:[ -2 sqrt ]

Both of the above would return the complex result:

(0+1.4142135623731 i)

Thus,

rslt := -2 sqrt squared

will result in an exception, whereas:

rslt := Number trapImaginary:[ -2 sqrt squared]

will generate a result of "-2.0".

All operations within "[" .. "]" which would produce an ImaginaryResultError will return a complex. Thus you can write:

Number trapImaginary:[ 
   |num1 num2|

   num1 := -2 sqrt.
   num2 := -3 sqrt.
   Transcript showCR: (num1 + num2).
]

and "(0+3.14626436994197i)" will be shown.

¹⁾Proceedable exceptions are among the unique features of the Smalltalk programming language; exceptions may be raised as being proceedable, and an exception handler may then "proceed" and provide an alternative return value from the failed operation. This mechanism is used here, where the exception handler - if pesent - can decide to return an imaginary result.

Undefined Results, NaN and Domain Errors

Similar to the way imaginary results are handled, some operations are not defined for certain values (values outside the function's domain).
For example, the receiver of the arcSin operation must be in [-1 .. 1].

By default, these situations are also reported by raising an error, and therefore:

-2 arcSin

will raise such a "DomainError".

Similar to the above, this can be handled, although no useful value will be provided (in the above case, NaN (Not a Number) will be returned:

rslt := DomainError ignoreIn:[ -2 arcSin ]

or:

rslt := Number trapDomainError:[ -2 arcSin ].

Both of the above would generate a NaN as result.

Notice that if such a NaN is used in other arithmetic operations, either more exceptions or other NaNs will usually be generated (depending on the exception being handled or not).
Thus:

rslt := Number trapDomainError:[ -2 arcSin sin ].

will also generate a NaN as result.

You can check for valid results with:

aNumber isNaN               - true for NaN
aNumber isInfinite          - true for infinities
aNumber isPositiveInfinity  - true for +inf
aNumber isNegativeInfinity  - true for -inf
aNumber isFinite            - false for NaN or infinities (i.e. true for valid numbers)

Overflow

When an operation's arguments are OK, but the result falls outside the range of representable numbers [fmin..fmax], you will get an infinite result ¹⁾.
Further operations on these might produce more infinities or a NaN ("Not a Number"). This may be especially troublesome, if a final result gets corrupted due to an intermediate computation, as in:

a := 1e+10.
b := 1e+300.
c := 1e+20.
d := 1e+300.

rslt := (a*b) / (c*d)
  -> NaN

Here, the final result is certainly representable, but the intermediate values (1e10 * 1e300) are out of the Float range [2.225E-308 .. 1.796e+308]. Thus, the computation will be:

a := 1e+10.
b := 1e+300.
c := 1e+20.
d := 1e+300.

(a*b) -> INF
(c*d) -> INF
INF / INF -> NaN

If the above intermediates are computed with a higher precision, the final result will be correct:

a := 1e+10.
b := 1e+300.
c := 1e+20.
d := 1e+300.

t := (a asLongFloat * b asLongFloat) / (c asLongFloat * d asLongFloat).
rslt := t asFloat
  -> 1e-10

Of course, with LongFloats, the problem is only shifted towards larger numbers; as soon as the temporary result cannot be represented by a LongFloat:

a := 1q+100.
b := 1q+4900.
c := 1q+200.
d := 1q+4900.
rslt := (a * b) / (c * d).
  -> NaN

then, you may use arbitrary precision LargeFloat numbers:

t := (a asLargeFloat * b asLargeFloat) / (c asLargeFloat * d asLargeFloat).
rslt := t asFloat.
  -> 1e-100

¹⁾ that is the current default behavior. Future versions may allow enabling exceptions in this situation, if there are customer requests. However, as most other programming languages behave similar in these situations, most programmers are aware of these pitfalls and avoid such problems.

Different Results on Different CPUs

Since FLOAT32 and FLOAT64 arithmetic is performed by the underlying CPU hardware, different results (in the least significant bit) may be returned from math operations on different CPUs or even different versions (steppings) of the same CPU architecture.

This applies especially to trigonometric and other math functions. Therse are computed by Power- or Taylor-series or Newton approximations with different algorithms on different systems, leading to different results.
Be prepared for this, and use the "almost-equal" comparison functions when results are to be verified.

Summary of Higher Precision Numbers

Expecco supports various inexact real formats, with different precision (i.e. number of mantissa bits). Some of those classes have alias names; these are provided to make Smalltalk/X code portable to other Smalltalk dialects (however, within expecco, you will probably not care, as it is not planned to port it to another dialect).

Name              Overall   Exponent   Mantissa    Decimal    fmin                ST/X        ANSI-ST      ST80/VW    IBM VA-ST
                  Size      Size       Size ¹⁾     Precision  fmax                 Name        Name ⁴⁾       Name ⁴⁾   Name ⁴⁾
                  Bit       Bit        Bit         Digits

IEEE single          32        8        24          6          1.175494e-038      ShortFloat   FloatE       Float      -
                                                               3.402823e+038      FloatE
                                                                                  Float32

IEEE double          64       11        53         15          2.225074e-308      Float        FloatD       Double    Float
                                                               1.797693e+308      FloatD
                                                                                  Float64
                                                                                  Double

IEEE extended    80/128       15       64/112      19/34       3.362103e-4932     LongFloat    FloatQ  ²⁾     -         -
                                                               1.189731e+4932     FloatQ  
                                                                                  Float80
                                                                                  or Float128

quad double        4*64       11       204         60          1.175494e-038      QDouble       -      ³⁾    -         -
                                                               3.402823e+038 

IEEE quadruple      128       15       112         34          3.362103e-4932     QuadFloat     -      ³⁾     -         -
                                                               1.189731e+4932     Float128

IEEE octuple        256       19       236         71          2.482427e-78913    OctaFloat     -      ³⁾     -         -
                                                               1.611325e+78913    Float256

IEEE arbitrary      any       any      any         any         any                IEEEFloat     -      ³⁾     -         -
                                                               any

large float         any       any      any         any         any                LargeFloat    -      ³⁾     -         -
                                                               any

(1) mantissa incl. any hidden bit (normalized floats)

(2) LongFloats use the underlying CPU's long double format.
On x86/x64 machines, LongFloats are represented as 80bit extended floats with 64bit mantissa; on other CPUs these might be represented as 128bit quadFloats with 112 bit mantissa (eg. a SPARC CPU does this).

(3) these are still been developed and provided as a preview feature without warranty (meaning: they may be buggy at the moment; let us know if you need them).

(4) different Smalltalk dialects use different precisions for their floating point numbers: ST80/VW Floats are IEEE singles, V'Age Floats are IEEE doubles and ST/X Floats are IEEE doubles. VW refers to Float64 as Double.
Later, the ANSI standard defined FloatE, FloatD and FloatQ as aliases. You can use either interchangable in expecco.

Notice that the use of any but double precision floats (which are directly supported by the machine) may come at a performance price.
The speed of operations degrades from double -> single -> extended -> ieee128 -> quad double -> ieee256 -> ieee arbitrary - largeFloat.
This is especially true for the trigonometric and math functions, where both more iterations are needed to get to the desired precision in the series computations and the individual operations are also much slower.

Be aware that LargeFloats are super precise, but also super slow.

Constants

Some well known and common constants can be acquired by asking a number class such as the Float class:

Float pi     -> pi (3.14159...)
Float halfPi -> pi / 2
Float twoPi  -> pi * 2 
Float phi    -> phi (golden ratio 1.6180...)
Float sqrt2  -> square root of 2 (1.4142...)
Float sqrt5  -> square root of 5 (2.2360...)
Float ln2    -> natural log of 2 (0.69314...)
Float ln10   -> natural log of 10 (2.30258...)
Float e      -> e (2.718281...)

Each number class will return a representation of that constant with its precision.
I.e. if you ask the Float class for the constant "pi", you'll get a pi with roughly 15 digits precision, whereas if you ask QDoubles, a more accurate representation will be returned.

ShortFloat pi   (= Float32 pi)      -> 3.141593
Float pi        (= Float64 pi)      -> 3.14159265358979
LongFloat pi    (= Float80 pi)      -> 3.141592653589793238
QuadFloat pi    (= Float128 pi)     -> 3.1415926535897932384626433832795027
QDouble pi      (= QDouble pi)      -> 3.1415926535897932384626433832795028841971693993751058209749446
OctaFloat pi    (= Float256 pi)     -> 3.14159265358979323846264338327950288419716939937510582097494459230781639
LargeFloat pi   (= LargeFloat64 pi) -> 3.1415926535897932384626433832795028841971693993751058209749445923078164 [many more digits...]

Examples

Code examples in Smalltalk syntax. Notice the float precision qualifiers:

'q'  -> longFloat (= IEEE extended precision = Float80)
'Q'  -> quadFloat (= IEEE quadruple precision = Float128)
'QO' -> octaFloat (= IEEE octuple precision = Float256)
'QD' -> qDouble
'QL' -> longFloat (arbitrary precision)

Square Root:

2.0 sqrt asShortFloat           -> 1.414214
2.0 sqrt                        -> 1.4142135623731
2.0q sqrt                       -> 1.414213562373095049
2.0Q sqrt                       -> 1.4142135623730950488016887242097
2.0QD sqrt                      -> 1.4142135623730950488016887242096980785696718753769
2.0QO sqrt                      -> 1.41421356237309504880168872420969807
                                     8569671875376948073176679737990732
(2.0QL precision:200) sqrt      -> 1.41421356237309504880168872420969807
                                     8569671875376948073176679
(2.0QL precision:400) sqrt      -> 1.41421356237309504880168872420969807
                                     85696718753769480731766797379907324
                                     78462107038850387534327641572735013
                                     846230912297025
(2.0QL precision:800) sqrt      -> 1.41421356237309504880168872420969807
                                     85696718753769480731766797379907324
                                     78462107038850387534327641572735013
                                     84623091229702492483605585073721264
                                     41214970999358314132226659275055927
                                     55799950501152782060571470109559971
                                     605970274...

Precision value from Wolfram:      1.41421356237309504880168872420969807
                                     85696718753769480731766797379907324
                                     78462107038850387534327641572735013
                                     8462309122970249248360...

Cubic Root:

2.0 cbrt asShortFloat           -> 1.259921
2.0 cbrt                        -> 1.25992104989487
2.0q cbrt                       -> 1.259921049894873165
2.0Q cbrt                       -> 1.25992104989487316476721060727823
2.0QD cbrt                      -> 1.25992104989487316476721060727822835
                                     05702514647015
2.0QO cbrt                      -> 1.25992104989487316476721060727822835
                                     05702514647015079800819751121553
(2.0QL precision:200) cbrt      -> 1.25992104989487316476721060727822835
                                     0570251464701507980081975
(2.0QL precision:400) cbrt      -> 1.25992104989487316476721060727822835
                                     05702514647015079800819751121552996
                                     76513959483729396562436255094154310
                                     256035615665259
(2.0QL precision:800) cbrt      -> 1.25992104989487316476721060727822835
                                     05702514647015079800819751121552996
                                     76513959483729396562436255094154310
                                     25603561566525939902404061373722845
                                     91103042693552469606426166250009774
                                     74526565480306867185405518689245872
                                     516764199373709695098382...

Wolfram:                           1.25992104989487316476721060727822835
                                     05702514647015079800819751121552996
                                     76513959483729396562436255094154310
                                     2560356156652593990240...

Exponentiation:

2.0 exp asShortFloat            -> 7.389056
2.0 exp                         -> 7.38905609893065
2.0q exp                        -> 7.389056098930650227
2.0Q exp                        -> 7.38905609893065022723042746057499
2.0QD exp                       -> 7.38905609893065022723042746057500781
                                     31803155705518
2.0QO exp                       -> 7.38905609893065022723042746057500781
                                     318031557055184732408712782252257266
(2.0QL precision:200) exp       -> 7.38905609893065022723042746057500781
                                     3180315570551847324087123
(2.0QL precision:400) exp       -> 7.38905609893065022723042746057500781
                                     31803155705518473240871278225225737
                                     96079057763384312485079121794773753
                                     161265478866123
(2.0QL precision:800) exp       -> 7.38905609893065022723042746057500781
                                     31803155705518473240871278225225737
                                     96079057763384312485079121794773753
                                     16126547886612388460369278127337447
                                     83922133980777749001228956074107537
                                     02391330947550682086581820269647868
                                     208404220982255234875742...

Wolfram:                           7.38905609893065022723042746057500781
                                     31803155705518473240871278225225737
                                     96079057763384312485079121794773753
                                     16126547886612388460369278...

Numeric Limits/en: Unterschied zwischen den Versionen

Aktuelle Version vom 14. Mai 2026, 10:39 Uhr

Inhaltsverzeichnis

Syntax (In Smalltalk and FreezeValues)

Integers (arbitrary size)

Fractions (arbitrary integral numerator and denominator)

ScaledDecimals

Floats / Float64 (actually 64bit IEEE doubles)

ShortFloats / Float32 (32bit IEEE doubles)

LongFloats / Float80 (80bit IEEE long doubles)

QuadFloats / Float128 (128bit IEEE quadruple floats)

OctaFloats / Float256 (256bit IEEE octuple floats)

LargeFloats (arbitrary precision (defaults to 200bit) software floats)

QDoubles (4 IEEE doubles combined)

Complex

Exact Integer Numbers

Exact Fractions, ScaledDecimals and FixedDecimals

Fractions

ScaledDecimal

FixedDecimal

Scaled- vs. FixedDecimal; which one to use and when?

Inexact Float and Double Numbers

Limited Precision

Floating Point Errors Propagate

Monetary Values

Floating Point Number Comparison

Limited Range of Float and Double Numbers

Speed of Operations

Trigonometric and other Math Functions

Complex Results

Undefined Results, NaN and Domain Errors

Overflow

Different Results on Different CPUs

Summary of Higher Precision Numbers

Constants

Examples

See Also

Navigationsmenü

@@ Zeile 3: / Zeile 3: @@
 Expecco supports arbitrary precision integer arithmetic,
-arbitrary precision fractions and limited precision floating point numbers.
+arbitrary precision fractions, arbitrary and limited precision floating point numbers in various precisions and complex numbers.
+In addition, special purpose numbers for monetary and decimal presentations.
+Being based on Smalltalk/X, expecco provides a complete set of number classes.
+== Syntax (In Smalltalk and FreezeValues) ==
+See also: "[[Smalltalk_Syntax_Cheat_Sheet]]".
+====Integers (arbitrary size)====
+In addition to the standard Smalltalk radix/base-notation <code>&lt;''b''&gt;r&lt;''digits''&gt;</code> with bases between 2 and 36, the C notations <code>0x&lt;''digits''&gt;</code> (hex), <code>0b&lt;''digits''&gt;</code> (binary) and <code>0o&lt;''digits''&gt;</code> (octal) are recognized.<br>The old C-style leading-zero octal notation is NOT recognized, but instead interpreted as decimal.
+Notice that in Smalltalk any negative sign precedes the number's digits, whereas in C it precedes the base.
+ 1234567
+xFF00AA  ''<small>(base 16)</small>''
+b01010101 ''<small>(base 2)</small>''
+ -0xAFFE  ''<small>(negative C notation)</small>''
+r121212 ''<small>(base 3)</small>''
+r123123 ''<small>(base 4)</small>''
+r-AFFE ''<small>(negative Smalltalk notation)</small>''
+====Fractions (arbitrary integral numerator and denominator)====
+Numerator and denominator must be integers (of any size). Any negative sign must be inside the parentheses.
+ (1/2)
+ (1/101)
+ (-1/3)
+====ScaledDecimals ====
+The number after the 's' character specifies the number of digits to be printed. However, internally the full precision is used in arithmetic operations.
+s2
+.456s2
+.009s2  ''<small>(prints as 123.01)</small>''
+.001s2  ''<small>(prints as 0.00)</small>''
+.009s2 + 0.001s2  ''<small>(prints as 123.01)</small>''
+====Floats / Float64 (actually 64bit IEEE doubles)====
+.456
+e17
+.0e23
+e     ''<small>(legal, but not recommended)</small>''
+ .5      ''<small>(illegal)</small>''
+ .5e     ''<small>(illegal)</small>''
+====ShortFloats / Float32 (32bit IEEE doubles)====
+An 'f' suffix makes it float32 constant.
+.456f
+e17f
+.0e23f
+f      ''<small>(legal, not recommended</small>)''
+ .5f      ''<small>(illegal</small>)''
+ .5ef     ''<small>(illegal</small>)''
+====LongFloats / Float80 (80bit IEEE long doubles)====
+.456q
+e17q
+.0e23q
+====QuadFloats / Float128 (128bit IEEE quadruple floats)====
+.456Q
+e17Q
+.0e23Q
+====OctaFloats / Float256 (256bit IEEE octuple floats)====
+.456QO
+e17QO
+.0e23QO
+====LargeFloats (arbitrary precision (defaults to 200bit) software floats)====
+.456QL
+e17QL
+.0e23QL
+====QDoubles (4 IEEE doubles combined)====
+.456QD
+e17QD
+.0e23QD
+====Complex====
++5i
+i
+.23+5.67i
 == Exact Integer Numbers ==
 For integer operations, there is no overflow or error in the result for any legal operation.
-I.e. operations on two big numbers deliver a correct result.
+I.e. operations on two integers delivers a correct and exact result.
-This is a feature of the underlying Smalltalk runtime environment and in contrast to many other programming languages (especially: Java and C) which provide int (usually 32bit) and long (usually 64bit) integer types.<br>In expecco, you can write both in Smalltalk and in the builtin JavaScript syntax<sup>1</sup>:
+This is a feature of the underlying Smalltalk runtime environment and in contrast to many other programming languages (especially: Java and C) which provide int (usually 32bit) and long (usually 64bit) integer types.<br>In expecco, you can write both in Smalltalk and in the builtin JavaScript syntax:<sup>1</sup>
  2147483647 "(0x7FFFFFFF)" + 1
-     -> 2147483648 "(0x80000000)"
+     <small>-> 2147483648 "(0x80000000)"</small>
  4294967295 "(0xFFFFFFFF)" + 1
-     -> 4294967296 "(0x100000000)"
+     <small>-> 4294967296 "(0x100000000)"</small>
  18446744073709551615 "(0xFFFFFFFFFFFFFFFF)" + 1
-     -> 18446744073709551616 "(0x10000000000000000)"
+     <small>-> 18446744073709551616 "(0x10000000000000000)"</small>
 Very large values can be computed:
 factorial
-     -> a huge number beginning with: 284625968091705451890641321211986889014....
+     <small>-> a huge number beginning with: 284625968091705451890641321211986889014....</small>
-Smalltalk will automatically convert any result which is too large to fit into a machine-integer into a LargeInteger (with an arbitrary number of bits) and also automatically convert back to a small representation, if possible.
+Smalltalk will automatically convert any result which is too large to fit into a machine-integer into a LargeInteger (with an arbitrary number of bits) and also automatically return results converted back to a small representation, if possible.<sup>2</sup>
 <br>Thus, although the two operands to the division in the following example are large integers,
  rslt := (1000 factorial) / (999 factorial)
@@ Zeile 33: / Zeile 112: @@
 Hint: Therefore, you can use a [[Tools_Notepad/en |Workspace (Notepad) window]] as a calculator with arbitrary precision.
-<sup>1</sup>) be aware that this only computes correct results if the elementary action is written in either Smalltalk or in the builtin JavaScript syntax. Depending on the version, it may or may not be correct, if using Java/Groovy, Python, C/C++, Node.js etc.
+<sup>1</sup>) be aware that this only computes correct results if the elementary action is written in either Smalltalk or in the builtin JavaScript syntax.<br>Depending on the version of the external language interpreter, it may or may not be correct, if using Java/Groovy, Python, C/C++, Node.js etc.
+<br>
+<sup>2</sup>) the integer class provides functions which operate in the limited 32 or 64 bit range. These might be useful if you have to verify results or repeat computations as returned by corresponding C or Java operations.
 == Exact Fractions, ScaledDecimals and FixedDecimals ==
 === Fractions ===
-When dividing integers, the "/" operator will deliver an exact result, possibly as a fraction:
+When dividing integers, the "<code>/</code>" operator will deliver an exact result, possibly as a fraction:
 / 3 -> 5/3
 and reduce the result (possibly returning an Integer):
@@ Zeile 45: / Zeile 126: @@
 factorial / 999 factorial -> 1000
+There are also two ''truncating division'' operators:
-There is also a ''truncating division'' operator "//", which will deliver an integer, truncated towards negative infinity (i.e. the next smaller integer), which is what you'd get in Java or C:
+: "<code>//</code>", which returns an integer truncated towards negative infinity (i.e. the next smaller integer), and
+: "<code>quo:</code>". which truncates towards zero. "quo:" is what you'd get in Java or C.
 // 3 -> 1
  -5 // 3 -> -3
-The corresponding modulo operator "\\" provides the remainder, such that:
+The corresponding modulo operators are "<code>\\</code>" and "<code>rem:</code>".
- (a // b) + (a \\ b) = a
-The "\\" is the standard Smalltalk modulo operator; Smalltalk/X also provides "%" as an alias (for users with a C/Java background).<br>Thus you can also write:
+"<code>\\</code>" is the standard Smalltalk remainder operator, such that:
+ (a // b) + (a \\ b) = a
+Smalltalk/X also provides "<code>%</code>" as an alias.<br>Thus you can also write:
  (a // b) + (a % b) = a
+But be aware that its semantic is different from C, when negative operands are involved (see below).
-There is also a division operator ("quo:") which truncates towards zero, and a corresponding remainder operator ("rem:") , for which:
+The corresponding remainder for the "<code>quo:</code>" division operator (which truncates towards zero), is the "<code>rem:</code>" remainder operator, such that:
  (a quo: b) + (a rem: b) = a
 For positive a and b, the two operator pairs deliver the same result.
 For negative arguments, these are different. Be aware and think about the domain of your arguments.
+And be reminded again that the Smalltalk remainder "%" returns different results as C or Java, when operands are negative.
 In addition, the usual ceiling, floor and rounding operations are available (both on fractions and on limited precision reals):
- (5 / 3) ceiling -> 2      "the next larger integer"
+.5 ceiling -> 2      <small>"the next larger integer"</small>
- (5 / 3) floor -> 1        "the next smaller integer"
+.5 floor -> 1        <small>"the next smaller integer"</small>
- (5 / 3) truncated -> 1    "truncate towards zero"
+.5 truncated -> 1    <small>"truncate towards zero"</small>
- (5 / 3) rounded -> 2           "round wrt. fraction >= 0.5)
+.5 rounded -> 2           <small>"round wrt. fraction >= 0.5)</small>
- (5 / 3) roundTo: 0.1  -> 1.7.
+.55 roundTo: 0.1  -> 1.6.
- (5 / 3) roundTo: 0.01  -> 1.67
+.54 roundTo: 0.1  -> 1.5.
- (-5 / 3) ceiling -> -1.   "the next larger integer"
+ -1.5 ceiling -> -1      <small>"the next larger integer"</small>
- (-5 / 3) floor -> -2      "the next smaller integer"
+ -1.5 floor -> -2        <small>"the next smaller integer"</small>
- (-5 / 3) truncated -> -1  "truncate towards zero"
+ -1.5 truncated -> -1    <small>"truncate towards zero"</small>
+ -1.5 rounded -> -2           <small>"round wrt. fraction >= 0.5)</small>
+ -1.55 roundTo: 0.1  -> -1.6.
+ -1.54 roundTo: 0.1  -> -1.5.
+ (5 / 3) ceiling -> 2      <small>"the next larger integer"</small>
+ (5 / 3) floor -> 1        <small>"the next smaller integer"</small>
+ (5 / 3) truncated -> 1    <small>"truncate towards zero"</small>
+ (5 / 3) rounded -> 2           <small>"round wrt. fraction >= 0.5)</small>
+ (-5 / 3) ceiling -> -1.   <small>"the next larger integer"</small>
+ (-5 / 3) floor -> -2      <small>"the next smaller integer"</small>
+ (-5 / 3) truncated -> -1  <small>"truncate towards zero"</small>
  (-5 / 3) rounded -> -2.
- (-5 / 3) roundTo: 0.1 -> -1.7
 Fractions print themself as "(''numerator'' / ''denominator'')".
+Be careful when rounding fractions to an inexact float (which cannot be represented in the IEEE binary format). A famous example is 0.1, which has a periodic binary mantissa.
+When rounding a fraction to such a number, Smalltalk does its best but has no chance to return a small fraction. Therefore:
+ (5 / 3) roundTo: 0.1  -> (1595024868027069/938249922368864).
+ (5 / 3) roundTo: 0.01  -> (652865570981689/390937467653706)
+ (-5 / 3) roundTo: 0.1 -> (-1595024868027069/938249922368864)
+but:
+ (5 / 3) roundTo: (1/10)  -> (17/10).
+ (5 / 3) roundTo: (1/100)  -> (167/100)
+ (-5 / 3) roundTo: (1/10) -> (-17/10)
 === ScaledDecimal ===
@@ Zeile 82: / Zeile 190: @@
 use ScaledDecimals (which for backward compatibility are also called "FixedPoint" (<sup>1)</sup>).
-These are also exact fractions, but print differently: you can specify the number of digits to be printed and it will print itself rounded on the last digit:
+These are also exact fractions, but '''print''' temself differently: you can specify the number of digits to be printed and it will print itself rounded on the last digit.
+In other words: the computation and internal value will be exact (as with Fractions), and therefore, no rounding errors will accumulate.
+Only when printed, will the external represenatation be rounded to the specified number of decimal places.
  (5 / 3) asScaledDecimal:2 -> 1.67
@@ Zeile 91: / Zeile 201: @@
 .2 asScaledDecimal:3 -> 1.200
- Float pi asScaledDecimal:5
+ Float pi asScaledDecimal:5 -> 3.14159
-) the class was previously called "FixedPoint" and the converters were called "asFixedPoint:". For compatibility with other Smalltalk dialects, these have aliases "ScaledDecimal" and "asScaledDecimal:".<br>Both the old class name and the old operators are and will be supported in the future for backward compatibility (as aliases),
+) the class was previously called "<code>FixedPoint</code>" and the converters were called "<code>asFixedPoint:</code>".<br>For compatibility with other Smalltalk dialects, these have aliases "<code>ScaledDecimal</code>" and "<code>asScaledDecimal:</code>".<br>Both the old class name and the old operators are and will be supported in the future for backward compatibility (as aliases), but you should use the new name, both for compatibility with other Smalltalk dialects, and to avoid confusion with FixedDecimal numbers.
-but you should use the new name, both for compatibility with other Smalltalk dialects, and to avoid confusion with FixedDecimal numbers.
 === FixedDecimal ===
-As mentioned above, ScaledDecimal keep the exact value internally, but print themself rounded to a given number of fractional digits.
+As mentioned above, a ScaledDecimal keeps the exact value internally, but prints itself rounded to a given number of decimal digits.
-Smalltalk/X provides an alternative class called "''FixedDecimal''", which always keeps a rounded value internally. These may be better suited for monetary values, especially in computed additive sums which are printed in a table, as the sum of two FixedDecimals will always be the presented (printed) sum of two FixedDecimals.
+Smalltalk/X provides an alternative class called "''FixedDecimal''" <sup>(1)</sup>, which always keeps a rounded value internally. These may be better suited for monetary values, especially in computed additive sums which are printed in a table, as the sum of two FixedDecimals will always be the presented (printed) sum of two FixedDecimals. In contrast, with ScaledDecimals, you may see a sum which differs from what presented table values suggest.
 For example:
@@ Zeile 117: / Zeile 226: @@
 Thus, when multiplying a float and a fixed decimal, you'll get a float result, whereas if you multiply an integer and a fixed decimal, the result will be a fixed decimal. Finally, when multiplying a fixed decimal and a scaled decimal, the result will be a fixed decimal.
+<sup>1)</sup> both names "<code>ScaledDecimal</code>" and "<code>FixedDecimal</code>" have been chosen a bit unwise, and may be confusing. However, these cannot easily be changed for backward and cross Smalltalk dialect compatibility reasons. We apologize.
+=== Scaled- vs. FixedDecimal; which one to use and when? ===
+Use ScaledDecimals to print tables or measurement values.
+Use FixedDecimals when working with money, escpecially when printing accounting lists where monetary values are added up. With FixedDecimals, youwill get a sumwhich corresponds to what has been printed above.
 == Inexact Float and Double Numbers ==
-Floating point numbers are inherently inexact and almost always represent an approximated value. The error depends on the floating point number's precision, which is the number of bits with which the value is approximated.
+Floating point numbers are inherently inexact and almost always represent an approximated value. The error depends on the floating point number's precision, which is the number of bits with which the value is approximated. there are numbers which cannot ever be represented as a floating point number, whatever precision is used. Even innocent looking numbers (eg. "0.1") are of this kind.
+This is not a problem specific to expecco or Smalltalk,
+but inherent to the way floating point numbers are represented in the machine's CPU.<br>See [https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html "What Every Computer Scientist Should Know About Floating-Point Arithmetic"],
+[https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf "Mindless Assessments of Roundoff in Floating-Point Computation"] and [https://www-users.math.umn.edu/~arnold/disasters "Some disasters attributable to bad numerical computing"].<br>A very impressive example of how wrong double precision IEEE arithmetic can be is described in "[[Do_not_trust_Floating_Point]]" <sup>(1)</sup>.
+Floating point numbers are represented as a sum of powers of 2 (actually 1/2 + 1/4 + 1/8 +...) called the "''mantissa''" then multiplied by 2 raised to an exponent. I.e.
+ value = mantissa * (2 ** exponent)
+with the mantissa being normalized to be a sum in the interval 0.5..1 (as listed above). And the exponent stored with an offset (called "''ebias''"). The minimum exponent (0) is reserved for the zero number and non-normalized tiny numbers (called "''subnormals''"); the maximum exponent is reserved for infinities and NaNs ("''Not a Number''"). These might be returned from some operations if invalid arguments are provided (for example, trying to compute the logarithm or square root of a negative number).
+The number of exponent bits determines the largest and smallest representable magnitudes, the number of mantissa bits determines the relative error. The absolute error depends on the magnitude of the last mantissa bit, which depends on the exponent. This value is called "''Unit in the Last Place''" or "'ULP'" (see [https://en.wikipedia.org/wiki/Unit_in_the_last_place Wikipedia]).<br>For a large number like 1e100, one ULP is the very large 1.94266889222573e+84, whereas for a small number like 0.5, it is 1.11022302462516e-16.
-This is not a problem specific to expecco,
-but inherent to the way floating point numbers are represented (in the machine). <br>See [https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html "What Every Computer Scientist Should Know About Floating-Point Arithmetic"],
-[https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf "Mindless Assessments of Roundoff in Floating-Point Computation"] and [http://www-users.math.umn.edu/~arnold//disasters "Some disasters attributable to bad numerical computing"]. A very impressive example of how wrong double precision IEEE arithmetic can be is described in "[[Do_not_trust_Floating_Point]]".
+Floating point formats differ in the number of bits (single/double/extended precision etc.).<br>
-Floating point numbers are represented as a sum of powers of 2 (actually 1/2 + 1/4 + 1/8 +...) multiplied by 2 raised to an exponent. I.e.
+A double precision IEEE float has 11 bits for the exponent and 53 for the mantissa (see [https://en.wikipedia.org/wiki/IEEE_754 IEEE floating point formats]).
- value = mantissa * 2 exponent
-with the mantissa being normalized to be a sum in the interval 0.5..1 (as listed above).
+As a rule of thumb, the error in the last bit of a double precision IEEE float is roughly 15 to 16 orders of magnitudes smaller than the magnitude of the (double precision) floating point number. The error is larger for single precision (32bit) floats and smaller for extended floats (80, 128 or more bits).
-The number of exponent bits determines the largest and smallest representable magnitudes, the number of mantissa bits determines the relative error. The error depends on the value of the last mantissa bit, which depends on the exponent. This value is called "''Unit in the Last Place''" or "ULP" (see [https://en.wikipedia.org/wiki/Unit_in_the_last_place Wikipedia]).<br>For a large number like 1e100, one ULP is the very large 1.94266889222573e+84, whereas for a small number like 0.5, it is 1.11022302462516e-16. As a rule of thumb, the error in the last bit is roughly 15 to 16 orders of magnitudes smaller than the magnitude of the (double precision) floating point number.
+<sup>1)</sup>: If you do not believe me, try the following example from one of the mentioned papers in (eg. excel) or your favourite programming language:
-Floating point formats differ in the number of bits (single/double/extended precision). A double precision IEEE float has 11 bits for the exponent and 53 for the mantissa (see [https://en.wikipedia.org/wiki/IEEE_754 IEEE floating point formats]).
+ v := 4/3.    "/ or maybe (4.0/3.0)
+ w := v - 1.
+ x := w*3.
+ y := x - 1.
+ z := (y*2)**52.
 === Limited Precision ===
-Due to the limited number of bits in the mantissa, different values may end in the same floating point representation. For example, both 9223372036854776000 and 9223372036854775808 will end up being represented as the same float when converted from integer to float. For example:
+Due to the limited number of bits in the mantissa, different values may end in the same floating point representation. For example, both 9223372036854776000 and 9223372036854775808 will end up being represented as the same float when converted from integer. The reason is that there are simply not enough bits in the mantissa.
+Thus:
  9223372036854776000 asFloat = 9223372036854775808 asFloat
-will return true, and the difference will be zero in:
+will return "true", and the difference will be zero in:
  9223372036854776000 asFloat - 9223372036854775808 asFloat
 in contrast to the correct result being returned when comparing/subtracting them as integers:
  9223372036854776000 = 9223372036854775808.
+ <small>=> false</small>
+ 9223372036854776000 - 9223372036854775808.
+ <small>=> 192</small>
+Try it in a workspace window.
-Also, many numbers (actually: most numbers) cannot be exactly represented by a finite sum of powers of 2. Such numbers will have an error in the last significant bit (actually half the last bit). When floating point numbers are added or multiplied, the result is usually computed internally with a few more bits as mantissa, and then rounded on the last bit, to fit the mantissa's number of bits.
+Also, many numbers (actually: most numbers) cannot be exactly represented by a finite sum of powers of 2. Such numbers have an error in the last significant bit (actually half the last bit). When floating point numbers are added or multiplied, the result is usually computed internally with a few more bits as mantissa, and then rounded on the last bit, to fit the mantissa's number of bits.
+Notice that this may not be immediately obvious, because the print functions (such as printf) cheat, and round again on the last bit. Thus, a result such as 0.9999999 would still be printed as "1.0").
-The situation may be relaxed slightly, by using more bits for the mantissa (and expecco gives you a choice of 32bit (called "''ShortFloat''"), 64bit ("''Float''") and 80bit ("''LongFloat''") and even more, which are mapped to corresponding IEEE floats (single, double and extended).
+The situation may be relaxed slightly, by using more bits for the mantissa (and expecco gives you a choice of 32bit (in Smalltalk called "''ShortFloat''" or "''Float32''"), 64bit ("''Float''" or "''Float64''") and 80bit ("''LongFloat''" or "''Float80''") and even more ("''Float128''", "''Float256''"), which are mapped to corresponding IEEE floats (single, double, extended, quadruple and octuple precision).
 Repeating the above example with long floats, there are enough mantissa bits and the numbers are no longer represented or considered equal,<br>thus:
  9223372036854776000 asLongFloat = 9223372036854775808 asLongFloat
-yields a false answer, and the computation:
+yields "false" as answer, and the computation:
  9223372036854776000 asLongFloat - 9223372036854775808 asLongFloat
 will give 192.0 as answer.
-However, even with more bits, the fundamental restriction remains, although appearing less frequently with higher precision. But be aware that many numbers (such as 1/3) can '''never''' be represented exactly, no matter how many bits are used.
+However, even with more bits, the fundamental restriction remains, although appearing less frequently or less obvious given higher precision. But be aware that many numbers (such as 1/10, 1/5, 1/3) can '''never''' be represented exactly, no matter how many bits are used. So even the above mentioned innocent looking "0.1" is actually an approximation and wrong in the last bit.
 The limited precision may lead to "strange" results, especially when operands are far apart; for example, when subtracting a very small value from a much larger one, as in:
@@ Zeile 158: / Zeile 293: @@
 Here, the operands differ by 17 orders of magnitude, and there are not enough bits to represent the result, which will be rounded to give you 2.15e12 again.
 <br>Thus, the comparison "2.15e12 - 1.25e-5 = 2.15e12" returns true and "2.15e12 - (2.15e12 - 1.25e-5) = 0.0", which are both obviously wrong.
-<br> In this special case, a better result is obtained when operating with extended precision:
+<br>In this special case, a better result is obtained when operating with extended precision:
 .15e12 asLongFloat - 1.25e-5 asLongFloat
-which gives 2149999999999.999987 as result (still incorrect due to its 64 bit mantissa, but much better).
+which gives 2149999999999.9999875q as result (although the default print precision will not print the final digit, so look at it in an inspector)
-If you are willing to trade speed for precision, you can use one of expecco's builtin higher precision representations or even the arbitrary precision representation, and compute with more bits of precision.
+If you are willing to trade speed for precision, you can use one of expecco's builtin higher precision representations or even the arbitrary precision representation, and compute with more bits of precision. The QDouble class provides a compromise between speed and precision, proving roughly 200 bits of precision or alternatively represents a combination of up to 4 arbitrarily valued doubles (i.e. it can represent the sum of a very large and a small number):
  (2.15e12 asQDouble) - (1.25e-5 asQDouble)
  (2.15e12 asQDouble) - ((2.15e12 asQDouble) - (1.25e-5 asQDouble))
-or an arbitrary number of precision bits (here 200):
+Another representation supports an arbitrary number of precision bits (here 200):
  (2.15e12 asLargeFloatPrecision:200) - (1.25e-5 asLargeFloatPrecision:200)
  (2.15e12 asLargeFloatPrecision:200) - ((2.15e12 asLargeFloatPrecision:200) - (1.25e-5 asLargeFloatPrecision:200))
-Which both return the correct 2149999999999.9999875 and 1.25e-5 respectively.
+Both return the correct results 2149999999999.9999875 and 1.25e-5 respectively.
+Be aware, that the higher precision and arbitrary precision operations are much slower than the ones which are directly supported by the processor (which has special hardware, usually for single, double and extended precision). Also these extended classes are still being developed and not yet released for official use (meaning they may contain bugs, especially in their trigonometric and other math functions by the time of writing).
+You may use fractions,
+ 2150000000000 - (1 / 125000)
+to compute the exact result: (268749999999999999/125000)
+(of course, these are less convenent to read, and should probably be presented to the end-user as ScaledDecimals.
+Also, do not forget that a conversion of one number to a higher precision number cannot ''magically'' generate missing bits.
+<br>For example, given a 32 bit floating point number which is already an approximation (i.e. the real value cannot be represented as an exact sum of powers of 2),
+then the conversion will give you another such approximation, with the same error.
+Thus, "0.25 asLongFloat" will give you an exact 0.25 (because 0.25 is representable), whereas "0.1 asLongFloat" will not give an exact "0.1".
+Actually, the result of such a conversion will usually not give you the full possible precision.
+If you need a constant with the max. precision, either enter it as such (i.e. "0.1q" instead of "0.1 asLongFloat") or read it from a string (i.e. LongFloat fromString:'0.1').
-Be aware, that the higher precision and arbitrary precision operations are much slower than the ones which are directly supported by the processor which has special hardware, usually for single, double and extended precision. Also they are currently being developed and not yet released for official use (meaning they may contain bugs, especially in their trigonometric and other math functions by the time of writing).
 === Floating Point Errors Propagate ===
 The above rounding and last bit errors will accumulate, with every math operation performed on it (and may even do so wildly).
-For example, the decimal 0.1 cannot be exactly represented as a floating point number, and is actually 0.099999... with an error in the last bit (half a ULP).
+For example, the already mentioned 0.1 cannot be exactly represented as a floating point number, and is actually 0.099999..X with an error in the last bit (half a ULP).
-Adding this multiple times will result in a larger error in the final result:
+<br>Adding this multiple times will result in a larger and larger accumulated error in the final result:
 .0 - (0.1 + 0.1 + 0.1 ...(10 times)... + 0.1) -> 1.11022302462516E-16
-The print functions will usually try to compensate for an error in the last bit(s), showing "0.1" although in reality, it is "0.09999..." (it rounds before printing).
+The print functions will try to compensate an error in the last bit(s), showing "0.1" although in reality, it is "0.09999..." (it ''cheats'' and rounds before printing).
-Thus, even though the printed representation of such a number might look ok, it will inject more and more error when the value is used in further operations (up to the point when print will no longer be able to cheat, and the error becomes visible).
+Thus, even though the printed representation of such a number might look ok, it will inject more and more error when the value is used in further operations (up to the point when the error accumulates out of the last bit and print will no longer be able to cheat, and the error becomes visible).
+==== Monetary Values ====
-This is especially inconvenient, when monetary values or counts are represented as floats and a final sum is wrong in the penny value<br>(and therefore, a ''real programmer'' will never ever use floating point numbers to represent monetary values!).
+This is especially inconvenient, when monetary values or counts are represented as floats and a final sum is wrong in the penny value<br>(and therefore, ''' ''real programmers'' never ever use floating point to represent monetary values'''!).
 <br>As an example, try to sum an array consisting of 10 values:
@@ Zeile 191: / Zeile 340: @@
 will show '9.99999999999998' (i.e. the error accumulated to a value too big for print's cheating to compensate).
-Expecco (actually the underlying Smalltalk) provides additional number representations which are better suited for such computations: Fraction, ScaledDecimal and FixedDecimal (in other systems, ScaledDecimals are also called "''FixedPoint''" numbers, and expecco knows that as an alias).
+Expecco (actually the underlying Smalltalk) provides additional number representations which are better suited for such computations: Fraction, ScaledDecimal and FixedDecimal as described above).
 These are exact fractions internally, but use different print strategies: Fractions print as such (i.e. '1/3', '2/17' etc.) whereas ScaledDecimal and FixedDecimal numbers print themself as decimal expansion (i.e. '0.33' or '0.20'). ScaledDecimal constant numbers can be entered by using "s" instead of "e" (i.e. '1.23s' defines a scaled decimal with 2 and '1.23s4' which will print 4 valid digits after the decimal point).
@@ Zeile 205: / Zeile 354: @@
 As a concrete example, try:
  (0.2 + 0.1 - 0.3) = 0
-which will return "false" and if you print "0.2 + 0.1 - 0.3", you might get something like: "5.55111512312578e-17".
+which will return <code>false</code> and if you print "<code>0.2 + 0.1 - 0.3</code>", you might get something like: "<code>5.55111512312578e-17</code>".
 <br>Even increasing the precision does not really help; if we went to 200bits precision, we'd still get a small error:
  (0.2QL + 0.1QL - 0.3QL) printString
-gives "-3.111507638930570853572...e-61"
+gives "<code>-3.111507638930570853572...e-61</code>"
+The problem also occurs when comparing numbers with different precision. For example, consider that a float32 value is to be compared against a constant. The float32 might be read from a file or provided by a measurement device via any communication mechanism.
+<br>If we compare it against a higher precision value, the missing bits in the shorter float are filled with zeros. Thus:
+ (Float32 readFrom:'0.125') = 0.125
+leads to a <code>true</code> value, whereas:
+ (Float32 readFrom:'0.123') = 0.123
+returns <code>false</code>.
+<br>The reason for this is that 0.125 can be represented as exact float in both float32 and float64 formats, whereas 0.123 is non-exact and has repeated binary digits in the mantissa.
+Its representation as float64 is:
+01111111011 1111011111001110110110010001011010000111001010110000
+and:
+01111011 11110111110011101101101
+as float32.
+When comparing, the float32 is expanded to:
+01111011 111101111100111011011010000000000000000000000000000000
+which is obviously different.
 Instead of comparing against a constant, either use range-compares and/or use the special "''compare-almost-equal''" functions, where the number of bits of acceptable error can be specified (so called: "''ULPs''"). Expecco provides such functions both for elementary code and in the standard action block library.
-Also, for this reason, do not compute money values using floats or doubles.
+Again, this is another reason to not compute monetary values using floats or doubles.
 Instead, use instances of ScaledDecimal.
 Otherwise you might loose a cent/penny here and there, if you use floats/doubles on big budgets.
@@ Zeile 220: / Zeile 385: @@
 and:
  (0.1 asScaledDecimal + 0.2 asScaledDecimal - 0.3 asScaledDecimal) printString => 0.00
+Find further insight [https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ here]
 === Limited Range of Float and Double Numbers ===
-Floating point numbers also have a limited range.
+Floating point numbers also have a limited range; there are smallest and largest representable values.
+<br>
 In expecco, the default float format is IEEE double precision format (called "''Float''" or "''Float64''" in expecco).
 Numbers with an absolute value greater than 1.79769313486232E+308 will lead to a +INF/-INF
-(infinite) result, and numbers with absolute value smaller than 2.2250738585072E-308 will be zero. For IEEE single precision floats (called "''ShortFloat''" or "''Float32''" in expecco), the range is much smaller, and for IEEE extended precision (called "''LongFloat''" or "''Float80''" in expecco), the range is larger.
+(infinite) result, and numbers with absolute value smaller than 2.2250738585072E-308 will be zero.
+For IEEE single precision floats (called "''ShortFloat''" or "''Float32''" in expecco), the range is much smaller, and for IEEE extended precision (called "''LongFloat''" or "''Float80''" in expecco), the range is larger.
-You can ask the classes for their limits, with:
+You can ask the classes (or its instances <sup>1</sup>) for their limits, with:
 * fmin (smallest representable number larger than zero)
 * fmax (largest representable number),
@@ Zeile 236: / Zeile 406: @@
 * decimalPrecision (digits when printed),
+ <small>remember Float is the same as Float64</small>
  Float fmin -> 2.2250738585072E-308
  Float fmax -> 1.79769313486232E+308
@@ Zeile 244: / Zeile 414: @@
  Float decimalPrecision -> 15
+ <small>remember ShortFloat is an alias for Float32</small>
  ShortFloat fmin -> 1.175494e-38
  ShortFloat fmax -> 3.402823e+38
@@ Zeile 250: / Zeile 421: @@
  ShortFloat precision -> 24
  ShortFloat decimalPrecision -> 7
+ <small>remember LongFloat is an alias for Float80</small>
  LongFloat fmin -> 3.362103143112093506E-4932
  LongFloat fmax -> 1.189731495357231765E+4932
@@ Zeile 257: / Zeile 429: @@
  LongFloat precision -> 64
  LongFloat decimalPrecision -> 19
+ <small>QuadFloat is an alias for Float128</small>
  Float128 fmin -> 3.36210314311209350626267781732e-4932
  Float128 fmax -> 1.18973149535723176508575932662e+4932
@@ Zeile 264: / Zeile 437: @@
  Float128 precision -> 113
  Float128 decimalPrecision -> 34
+ <small>OctaFloat is an alias for Float256</small>
- Float256 fmin -> 2.48242795146434978829932822291387172367768770607964686927095329791379e-78913
- Float256 fmax -> 1.61132571748576047361957211845200501064402387454966951747637125049607e+78913
+ Float256 fmin -> 2.48242795146434978829932822291387172...5329791379e-78913
+ Float256 fmax -> 1.61132571748576047361957211845200501...7125049607e+78913
  Float256 emin -> -262142
  Float256 emax -> 262143
  Float256 precision -> 237
  Float256 decimalPrecision -> 71
+ QDouble fmin -> <small>same as float64</small>
+ QDouble fmax -> <small>same as float64</small>
+ QDouble emin -> <small>same as float64</small>
+ QDouble emax -> <small>same as float64</small>
+ QDouble precision -> 204
+ QDouble decimalPrecision -> 61
+ aLargeFloat fmin -> 0.0 <small>arbitrary small</small> <sup>1</sup>
+ aLargeFloat fmax -> inf <small>arbitrary large</small>
+ aLargeFloat emin -> -inf <small>arbitrary small</small>
+ aLargeFloat emax -> inf <small>arbitrary large</small>
+ aLargeFloat precision -> 200 <small>default; configurable</small>
+ aLargeFloat decimalPrecision -> 60 <small>default; configurable</small>
 Remember that the name "Float" refers to "Float64", which is called "double" in the C language.
@@ Zeile 276: / Zeile 464: @@
 And finally, the name "LongFloat" refers to "Float80", which is called "long double" in C.
-As a consequence, you cannot compute very large numbers using floats.
+As a consequence of the limits, you cannot compute very large numbers using any of the CPU supported floats, and you will have to use one of the software computed float representations.
 <br>For example trying to compute the number of decimal digits of a huge number:
 factorial asFloat log10 -> INF
@@ Zeile 288: / Zeile 476: @@
 Again: this is not a problem specific to expecco,
 but inherent to the way floating point numbers are represented in the CPU.
+<sup>1)</sup>because LargeFloats have an individual number of precision (per instance), you should ask the instance, not the class. The class will return the default values (which are valid for 200 bits of resolution).
 === Speed of Operations ===
-Machines have builtin floating point math operations, which usually work fastest in single or double precision (actually, some modern machines work faster in double than in single precision).
+Machines have builtin floating point math operations, which usually work fastest in single or double precision (actually, some modern machines work slightly faster in double than in single precision).
+Unless you have special precision needs, it is best to stick with double precision which is also portable across machines.
+Therefore, double precision floats (aka "''double''") is the default and simply named "''Float''" in Smalltalk/X (and therefore also in expecco) <sup>1</sup>.
+<sup>1)</sup>The reason for calling them "Float" is historic. There exist Smalltalk dialects where Floats are 32bit IEEE floats, and others where they are 64bit. To be able to import code from either dialect, Smalltalk/X uses double precision for "Float".<br>To make you intention clear, it is recommended to use the alias names which are more descriptive and explicit; i.e. "Float32", "Float64" etc.
-Unless you have special precision needs, it is best to stick with double precision (which is also portable across machines).
 == Trigonometric and other Math Functions ==
@@ Zeile 309: / Zeile 502: @@
 will be the (inexact) 0.375 (a double), instead of the exact 3/4 (a fraction).
 (this might change in a future release and provide exact results if both numerator and denominator are perfect squares)
+You can however first convert to a higher precision float and then apply the function. These will compute using a Taylor series or Newton approximation taking the number's precision into account:
+sqrt
+ <small>=> 1.4142135623731 (computed with float64 precision)</small>
+asFloat128 sqrt
+ <small>=> 1.41421356237309504880168872421</small>
+asFloat256 sqrt
+ <small>=> 1.41421356237309504880168872420969807856967187537694807317667973799073</small>
+ (2 asLargeFloatPrecision:500) sqrt
+ <small>=> 1.4142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727350138462309122970249248360558507372126441214971</small>
 == Complex Results==
@@ Zeile 316: / Zeile 523: @@
 will raise an "ImaginaryResultError".
-However, this is a ''proceedable exception'', which can be caught and if the handler proceeds, a complex result is returned:
+However, this is a ''proceedable exception'' <sup>1</sup>, which can be caught and if the handler proceeds, a complex result is returned:
  rslt := ImaginaryResultError ignoreIn:[ -2 sqrt ].
 for readability, there is also an alias called "trapImaginary:" in the number class:
@@ Zeile 330: / Zeile 537: @@
 will generate a result of "-2.0".
+All operations within "[" .. "]" which would produce an ImaginaryResultError will return a complex. Thus you can write:
-== Undefined Results==
+ Number trapImaginary:[
+    |num1 num2|
+    num1 := -2 sqrt.
+    num2 := -3 sqrt.
+    Transcript showCR: (num1 + num2).
+ ]
+and "(0+3.14626436994197i)" will be shown.
+<sup>1)</sup>Proceedable exceptions are among the unique features of the Smalltalk programming language; exceptions may be raised as being proceedable, and an exception handler may then "proceed" and provide an alternative return value from the failed operation. This mechanism is used here, where the exception handler - if pesent - can decide to return an imaginary result.
+== Undefined Results, NaN and Domain Errors ==
 Similar to the way imaginary results are handled, some operations are not defined for certain values (values outside the function's domain).
 <br>For example, the receiver of the <code>arcSin</code> operation must be in [-1 .. 1].
@@ Zeile 354: / Zeile 573: @@
  ''aNumber'' isInfinite          - true for infinities
  ''aNumber'' isPositiveInfinity  - true for +inf
- ''aNumber'' isBegativeInfinity  - true for -inf
+ ''aNumber'' isNegativeInfinity  - true for -inf
  ''aNumber'' isFinite            - false for NaN or infinities (i.e. true for valid numbers)
 == Overflow ==
-When an operation's result falls outside the range of representable numbers [fmin..fmax], you will get an infinite result (1). Further operations on these might produce more infinities or a NaN ("Not a Number").
+When an operation's arguments are OK, but the result falls outside the range of representable numbers [fmin..fmax], you will get an infinite result <sup>1)</sup>.<br>Further operations on these might produce more infinities or a NaN ("''Not a Number''").
 This may be especially troublesome, if a final result gets corrupted due to an intermediate computation, as in:
  a := 1e+10.
@@ Zeile 397: / Zeile 616: @@
  t := (a asLargeFloat * b asLargeFloat) / (c asLargeFloat * d asLargeFloat).
  rslt := t asFloat.
-   -> 1r-100
+   -> 1e-100
-) that is the current default behavior. Future versions may allow enabling exceptions in this situation, if there are customer requests. However, as most other programming languages behave similar in these situations, most programmers are aware of these pitfalls and avoid such problems.
+<sup>1)</sup> that is the current default behavior. Future versions may allow enabling exceptions in this situation, if there are customer requests. However, as most other programming languages behave similar in these situations, most programmers are aware of these pitfalls and avoid such problems.
 == Different Results on Different CPUs ==
@@ Zeile 410: / Zeile 629: @@
 <br>Be prepared for this, and use the "''almost-equal''" comparison functions when results are to be verified.
-== Higher Precision Numbers ==
+== Summary of Higher Precision Numbers ==
 Expecco supports various inexact real formats, with different precision (i.e. number of mantissa bits).
@@ Zeile 416: / Zeile 635: @@
 (however, within expecco, you will probably not care, as it is not planned to port it to another dialect).
- Name              Overall   Exponent   Mantissa    Decimal    fmin                Smalltalk/X  ANSI-Smalltalk
+ Name              Overall   Exponent   Mantissa    Decimal    fmin                ST/X        ANSI-ST      ST80/VW    IBM VA-ST
-                   Size      Size       Size (1)    Precision  fmax                Name         Name (4)
+                   Size      Size       Size <sup>1)</sup>     Precision  fmax                 Name        Name <sup>4)</sup>       Name <sup>4)</sup>   Name <sup>4)</sup>
                    Bit       Bit        Bit         Digits
- IEEE single          32        8        24          6          1.175494e-038      ShortFloat   FloatE
+ IEEE single          32        8        24          6          1.175494e-038      ShortFloat   FloatE       Float      -
 .402823e+038      FloatE
                                                                                    Float32
- IEEE double          64       11        53         15          2.225074e-308      Float        FloatD
+ IEEE double          64       11        53         15          2.225074e-308      Float        FloatD       Double    Float
 .797693e+308      FloatD
                                                                                    Float64
                                                                                    Double
- IEEE extended    80/128       15       64/112      19/34       3.362103e-4932     LongFloat    FloatQ  (2)
+ IEEE extended    80/128       15       64/112      19/34       3.362103e-4932     LongFloat    FloatQ  <sup>2)</sup>     -         -
 .189731e+4932     FloatQ
                                                                                    Float80
                                                                                    or Float128
- quad double        4*64       11       200         60          1.175494e-038      QDouble       -      (3)
+ quad double        4*64       11       204         60          1.175494e-038      QDouble       -      <sup>3)</sup>    -         -
 .402823e+038
- IEEE quadruple      128       15       112         34          3.362103e-4932     QuadFloat     -      (3)
+ IEEE quadruple      128       15       112         34          3.362103e-4932     QuadFloat     -      <sup>3)</sup>     -         -
 .189731e+4932     Float128
- IEEE octuple        256       19       236         71          2.482427e-78913    OctaFloat     -      (3)
+ IEEE octuple        256       19       236         71          2.482427e-78913    OctaFloat     -      <sup>3)</sup>     -         -
 .611325e+78913    Float256
- IEEE arbitrary      any       any      any         any         any                IEEEFloat     -      (3)
+ IEEE arbitrary      any       any      any         any         any                IEEEFloat     -      <sup>3)</sup>     -         -
                                                                 any
- large float         any       any      any         any         any                LargeFloat    -      (3)
+ large float         any       any      any         any         any                LargeFloat    -      <sup>3)</sup>     -         -
                                                                 any
 (1) mantissa incl. any hidden bit (normalized floats)
-(2) LongFloats use the underlying CPU's long double format.<br>On x86/x64 machines, LongFloats are represented as 80bit extended floats with 64bit mantissa; on other CPUs, these might be represented as 128bit quadFloats with 112 bit mantissa.
+(2) LongFloats use the underlying CPU's long double format.<br>On x86/x64 machines, LongFloats are represented as 80bit extended floats with 64bit mantissa; on other CPUs these might be represented as 128bit quadFloats with 112 bit mantissa (eg. a SPARC CPU does this).
-(3) these are currently been developed and provided as a preview feature without warranty (meaning: they may be buggy at the moment; let us know if you need them).
+(3) these are still been developed and provided as a preview feature without warranty (meaning: they may be buggy at the moment; let us know if you need them).
 (4) different Smalltalk dialects use different precisions for their floating point numbers: ST80/VW Floats are IEEE singles, V'Age Floats are IEEE doubles and ST/X Floats are IEEE doubles. VW refers to Float64 as Double.<br>Later, the ANSI standard defined FloatE, FloatD and FloatQ as aliases. You can use either interchangable in expecco.
 Notice that the use of any but double precision floats (which are directly supported by the machine) may come at a performance price.
-The speed of operations degrades from double -> single -> extended -> quad double -> ieee arbitrary - largeFloat.
+<br>The speed of operations degrades from double -> single -> extended -> ieee128 -> quad double -> ieee256 -> ieee arbitrary - largeFloat.
+<br>This is especially true for the trigonometric and math functions, where both more iterations are needed to get to the desired precision in the series computations and the individual operations are also much slower.
+Be aware that LargeFloats are super precise, but also super slow.
 == Constants ==
@@ Zeile 474: / Zeile 696: @@
 Each number class will return a representation of that constant with its precision.<br>I.e. if you ask the Float class for the constant "pi", you'll get a pi with roughly 15 digits precision, whereas if you ask QDoubles, a more accurate representation will be returned.
- Float pi        -> 3.14159265358979
+ ShortFloat pi   (= Float32 pi)      -> 3.141593
+ Float pi        (= Float64 pi)      -> 3.14159265358979
- ShortFloat pi   -> 3.141593
- LongFloat pi    -> 3.141592653589793238
+ LongFloat pi    (= Float80 pi)      -> 3.141592653589793238
+ QuadFloat pi    (= Float128 pi)     -> 3.1415926535897932384626433832795027
- QDouble pi      -> 3.1415926535897932384626433832795028841971693993751058209749446
+ QDouble pi      (= QDouble pi)      -> 3.1415926535897932384626433832795028841971693993751058209749446
- QuadFloat pi    -> 3.1415926535897932384626433832795027
- OctaFloat pi    -> 3.14159265358979323846264338327950288419716939937510582097494459230781639
+ OctaFloat pi    (= Float256 pi)     -> 3.14159265358979323846264338327950288419716939937510582097494459230781639
- LargeFloat pi   -> 3.1415926535897932384626433832795028841971693993751058209749445923078164 [many more digits...]
+ LargeFloat pi   (= LargeFloat64 pi) -> 3.1415926535897932384626433832795028841971693993751058209749445923078164 [many more digits...]
 == Examples ==

Numeric Limits/en: Unterschied zwischen den Versionen

Aktuelle Version vom 14. Mai 2026, 10:39 Uhr

Syntax (In Smalltalk and FreezeValues)

Integers (arbitrary size)

Fractions (arbitrary integral numerator and denominator)

ScaledDecimals

Floats / Float64 (actually 64bit IEEE doubles)

ShortFloats / Float32 (32bit IEEE doubles)

LongFloats / Float80 (80bit IEEE long doubles)

QuadFloats / Float128 (128bit IEEE quadruple floats)

OctaFloats / Float256 (256bit IEEE octuple floats)

LargeFloats (arbitrary precision (defaults to 200bit) software floats)

QDoubles (4 IEEE doubles combined)

Complex

Exact Integer Numbers

Exact Fractions, ScaledDecimals and FixedDecimals

Fractions

ScaledDecimal

FixedDecimal

Scaled- vs. FixedDecimal; which one to use and when?

Inexact Float and Double Numbers

Limited Precision

Floating Point Errors Propagate

Monetary Values

Floating Point Number Comparison

Limited Range of Float and Double Numbers

Speed of Operations

Trigonometric and other Math Functions

Complex Results

Undefined Results, NaN and Domain Errors

Overflow

Different Results on Different CPUs

Summary of Higher Precision Numbers

Constants

Examples

See Also

Navigationsmenü

Suche