Elementary Numerical Mathematics
I) Round-off Errors and Computer Arithmetic
Winter Term 2020/21
Gerhard Wellein
HPC Services - Regionales Rechenzentrum Erlangen (RRZE) Department for Computer Science
Round-off Errors and Computer Arithmetic: Content
1. Number representation in computers
2. Decimal floating-point representation and round-off errors
3. Error analysis for computer arithmetics
1) Number representation in computers
Motivation:
▪ Finite-digit arithmetic in computers
→ A general number $x \in \mathbb{R}$ may not be stored exactly in a computer
▪ Example: $\sqrt{3}$
Exact arithmetic: $\sqrt{3} = 1.7320508075\ldots \iff (\sqrt{3})^2 = 3$
Computer: a=sqrt(3.0); b=a*a; print b;
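The computer line above can be tried directly. The following is a minimal sketch in Python (whose `float` is an IEEE 754 64-bit number on common platforms); the variable names follow the slide, and the comparison with 3.0 is added for illustration.

```python
import math

a = math.sqrt(3.0)   # nearest 64-bit floating-point number to sqrt(3)
b = a * a            # rounded product, not the exact square

print(b)                 # very close to 3, but not exactly 3
print(b == 3.0)          # False: the round-off error is visible
print(abs(b - 3.0))      # size of the absolute error
```

The error is tiny, but it shows that an identity from exact arithmetic need not survive finite-digit arithmetic.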
Binary Floating-Point Arithmetic Standard 754-1985 by IEEE:
▪ Representation of a 64-Bit Floating-Point number ("double"):
1 bit: $s$ → Sign: $(-1)^s$
11 bits: $c_i$ → Exponent: $c = \sum_{i=1}^{11} c_i 2^{i-1}$ $(0 \le c \le 2047)$
52 bits: $f_i$ → Mantissa: $f = \sum_{i=1}^{52} f_i \left(\frac{1}{2}\right)^i$ $(0 \le f < 1)$
▪ Normal form of 64-Bit Floating-Point numbers:
$(-1)^s \, 2^{c-1023} \, (1 + f)$
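To make the bit layout concrete, the sketch below extracts s, c and f from a Python `float` and rebuilds the value from the normal form. The helper name `decode_double` and the sample value -6.5 are illustrative choices, not part of the lecture.

```python
import struct

def decode_double(x):
    """Split a 64-bit float into sign s, exponent c and mantissa f."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63                         # 1 sign bit
    c = (bits >> 52) & 0x7FF               # 11 exponent bits
    f = (bits & ((1 << 52) - 1)) / 2**52   # 52 mantissa bits, 0 <= f < 1
    return s, c, f

s, c, f = decode_double(-6.5)              # -6.5 = -1.625 * 2**2
value = (-1) ** s * 2.0 ** (c - 1023) * (1 + f)

print(s, c, f)    # sign, biased exponent, fractional part
print(value)      # rebuilt from the normal form
```

Since -6.5 = -(1 + 0.625) * 2^2, the decoded fields are s = 1, c = 1025 and f = 0.625.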
Special cases (64 Bit):
▪ $s \in \{0, 1\}$; $c = 0$; $f = 0$ → $\pm 0$
▪ $c = 0$; $f \ne 0$ → "denormalized numbers"
▪ $c = 2047$ → $\infty$ ($f = 0$) and NaN ($f \ne 0$)
Positive ($s = 0$) number ranges (64 Bit):
▪ Smallest normalized number: $f = 0$; $c = 1$ → $2^{-1022} \approx 10^{-308}$
▪ Largest number: $f \approx 1$; $c = 2046$ → $\approx 2^{1024} \approx 10^{+308}$
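The special cases and range limits can be observed from Python, assuming its `float` is an IEEE 754 double (true on common platforms). This is only a sketch; the variable names are illustrative.

```python
import math
import sys

smallest_normal = sys.float_info.min   # f = 0, c = 1
largest = sys.float_info.max           # f just below 1, c = 2046

print(smallest_normal == 2.0 ** -1022)                   # True
print(largest == (2.0 - 2.0 ** -52) * 2.0 ** 1023)       # True

denormal = smallest_normal / 2.0       # c = 0, f != 0: still nonzero
overflow = largest * 2.0               # exceeds the range: c = 2047, f = 0
not_a_number = overflow - overflow     # inf - inf: c = 2047, f != 0

print(0.0 < denormal < smallest_normal)                  # True
print(math.isinf(overflow), math.isnan(not_a_number))    # True True
print(0.0 == -0.0, math.copysign(1.0, -0.0))             # True -1.0
```

The last line shows that +0 and -0 compare equal although their sign bits differ.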
Binary Floating-Point Arithmetic Standard 754-1985 by IEEE:
▪ Representation of a 32-Bit Floating-Point number ("float"):
1 bit: $s$ → Sign: $(-1)^s$
8 bits: $c_i$ → Exponent: $c = \sum_{i=1}^{8} c_i 2^{i-1}$ $(0 \le c \le 255)$
23 bits: $f_i$ → Mantissa: $f = \sum_{i=1}^{23} f_i \left(\frac{1}{2}\right)^i$ $(0 \le f < 1)$
▪ Normal form of 32-Bit Floating-Point numbers:
$(-1)^s \, 2^{c-127} \, (1 + f)$
▪ 1; 999,999; 1,000,000: these numbers can be represented exactly in the IEEE 754 standard (32-Bit)
▪ IEEE 754 32-Bit representation of 0.999999:
x1=0.999999 → x1_32=0.999998986721038…
▪ Compute (x0-x1) * 1,000,000
▪ Assume: float x0, x1 // 32 bit values
▪ x0 = 1,000,000.0; x1 = 999,999.0
→ (x0-x1)*1,000,000 = 1,000,000 // exact arithmetic
▪ x0 = 1.000000; x1 = 0.999999
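Both cases can be tried with emulated 32-bit arithmetic. Python has no built-in 32-bit float, so the sketch below rounds every intermediate result to 32 bits through the `struct` module; the helpers `to_float32` and `scaled_difference` are illustrative names, not part of the lecture.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest IEEE 754 32-bit value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def scaled_difference(x0, x1):
    """Evaluate (x0 - x1) * 1,000,000, rounding each step to 32 bits."""
    x0, x1 = to_float32(x0), to_float32(x1)
    diff = to_float32(x0 - x1)
    return to_float32(diff * to_float32(1_000_000.0))

case_large = scaled_difference(1_000_000.0, 999_999.0)
case_small = scaled_difference(1.000000, 0.999999)

print(to_float32(0.999999))   # stored value of x1, slightly below 0.999999
print(case_large)             # inputs are exact, so the result is exact
print(case_small)             # exact arithmetic would give 1
```

In the second case the representation error of x1 is amplified by the subtraction and the scaling, so the result differs from 1 by roughly one percent.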
2) Decimal floating-point representation & round-off errors
▪ Preliminaries:
▪ For simplicity, assume a real number $y$ is stored in a computer in decimal floating-point form using "k digits":
$fl(y) = \pm 0.d_1 d_2 d_3 \ldots d_k \times 10^n$ with $1 \le d_1 \le 9$ and $0 \le d_i \le 9$ for $i = 2, \ldots, k$
("k-digit decimal floating-point form of $y$")
▪ Any real number $y$ can be written as follows:
$y = \pm 0.d_1 d_2 d_3 \ldots d_k d_{k+1} \ldots \times 10^n$
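The k-digit form can be sketched with Python's `decimal` module. The helper `fl_decimal` is a hypothetical name, and two assumptions are made here that the text above does not fix: the digits beyond position k are rounded half-up, and the input is a nonzero number given as a decimal string.

```python
from decimal import Context, ROUND_HALF_UP

def fl_decimal(y, k):
    """Return (sign, mantissa, n) such that fl(y) = sign 0.d1...dk * 10**n.

    y is a decimal string for a nonzero real number; k is the digit count.
    Assumption: digits d_{k+1}, d_{k+2}, ... are removed by half-up rounding.
    """
    rounded = Context(prec=k, rounding=ROUND_HALF_UP).create_decimal(y)
    sign_bit, digits, exponent = rounded.as_tuple()
    n = len(digits) + exponent                        # exponent of 0.d1d2...
    padded = list(digits) + [0] * (k - len(digits))   # fill up to k digits
    mantissa = "0." + "".join(str(d) for d in padded)
    return ("-" if sign_bit else "+"), mantissa, n

print(fl_decimal("3.14159265", 5))    # ('+', '0.31416', 1)
print(fl_decimal("-1234.5678", 3))    # ('-', '0.123', 4)
```

The difference between y and fl(y) is the round-off error of the k-digit representation.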
3) Error analysis for computer arithmetics
▪ Problem:
▪ In general, floating-point arithmetic on computers is not exact!
▪ The associative law does not hold in general: the order of evaluation may change the binary result of a computation
$(a + b) + c \ne a + (b + c)$
▪ This section: qualitative analysis of the impact of finite-digit arithmetic
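A small illustration of the failing associative law in 64-bit arithmetic follows; the values 0.1, 0.2 and 0.3 are chosen only for this example.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c     # a + b is rounded first, then c is added
right = a + (b + c)    # b + c is rounded first, then a is added

print(left, right)     # the two sums differ in the last digit
print(left == right)   # False
```

Neither result is wrong as such; each is the correctly rounded outcome of a different sequence of rounded operations.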
Harmonic series: Divergence ?!
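The figure for this slide is not reproduced here. One common way to illustrate the question, offered as a sketch rather than as the lecture's own experiment, is to sum 1/n in emulated 32-bit arithmetic: the mathematical series diverges, but the computed sum stops changing once 1/n falls below half the spacing of the 32-bit numbers near the current sum. The helper names are illustrative, and the loop needs about two million iterations.

```python
import struct

_F32 = struct.Struct("f")

def to_float32(x):
    """Round a 64-bit Python float to the nearest IEEE 754 32-bit value."""
    return _F32.unpack(_F32.pack(x))[0]

def harmonic_sum_float32(max_terms=3_000_000):
    """Sum 1/n in 32-bit arithmetic until adding a term changes nothing."""
    total = 0.0
    for n in range(1, max_terms + 1):
        new_total = to_float32(total + to_float32(1.0 / n))
        if new_total == total:      # the term was lost to round-off
            return total, n
        total = new_total
    return total, None              # no stagnation within max_terms

total, stalled_at = harmonic_sum_float32()
print(total, stalled_at)
```

The computed sum therefore appears to converge to a finite value, which is purely an effect of finite-digit arithmetic.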
Accumulated relative error $e_{RE}$
#FP operations       32-Bit       64-Bit
MFlop ($10^{6}$)     $10^{-3}$    $10^{-12}$
TFlop ($10^{12}$)    $1$          $10^{-9}$
Accumulated relative error after a large series of FP (floating-point) operations, assuming a simple random-walk model for error propagation.
After $N$ successive arithmetic operations (each having a relative error $e_m$), the accumulated relative error becomes:
$e_{RE} \sim N^{1/2} \, e_m$
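The table entries follow from the random-walk estimate. In the sketch below, the per-operation errors 1e-6 (32-Bit) and 1e-15 (64-Bit) are assumptions inferred from the table values, not figures stated in the text, and `accumulated_relative_error` is an illustrative name.

```python
import math

def accumulated_relative_error(n_ops, e_m):
    """Random-walk estimate: e_RE ~ sqrt(N) * e_m."""
    return math.sqrt(n_ops) * e_m

E_M_32 = 1e-6    # assumed per-operation relative error, 32-Bit
E_M_64 = 1e-15   # assumed per-operation relative error, 64-Bit

for label, n_ops in (("MFlop", 1e6), ("TFlop", 1e12)):
    e32 = accumulated_relative_error(n_ops, E_M_32)
    e64 = accumulated_relative_error(n_ops, E_M_64)
    print(f"{label}: 32-Bit {e32:.0e}, 64-Bit {e64:.0e}")
```

After 10^12 operations the estimated 32-bit error is of order 1, meaning that no significant digit can be trusted any more.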
4) Stability of Computations
▪ Definition:
A computation M(x1,x2,…,xn) with input data (x1,x2,…,xn) is called stable if small errors in the input data (< e) lead to small errors in the output data (< c*e).
▪ Definition:
Let E0 > 0 denote an initial error and En the magnitude of the error after n subsequent operations. Let C > 1 be a constant independent of n:
1) If En ≈ C n E0, the error growth is called linear
2) If En Cn E