I) Round-off Errors and Computer Arithmetic

Academic year: 2021

(1)

Elementary Numerical Mathematics

I) Round-off Errors and Computer Arithmetic

Winter Term 2020/21

Gerhard Wellein

HPC Services - Regionales Rechenzentrum Erlangen (RRZE) Department for Computer Science

(2)

Round-off Errors and Computer Arithmetic: Content

1. Number representation in computers
2. Decimal floating-point representation and round-off errors
3. Error analysis for computer arithmetics


(4)

1) Number representation in computers

Motivation:

Finite-digit arithmetic in computers
→ A general number $x \in \mathbb{R}$ may not be stored exactly in a computer

Example: $\sqrt{3}$

Exact arithmetic: $\sqrt{3} = 1.7320508075\ldots \iff (\sqrt{3})^2 = 3$

Computer: a=sqrt(3.0); b=a*a; print b;
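The pseudocode on this slide can be tried directly. The following is a minimal sketch in Python (an assumption; the slide does not name a language), where a Python float is an IEEE 754 64-bit number. It only illustrates that the squared square root need not reproduce the input.

```python
import math

a = math.sqrt(3.0)   # nearest 64-bit float to sqrt(3), not sqrt(3) itself
b = a * a            # rounded product of two already rounded values

print(b)             # close to 3, but not necessarily equal to it
print(b == 3.0)      # exact comparison exposes the round-off
print(abs(b - 3.0))  # magnitude of the round-off error
```

The error is on the order of the spacing of 64-bit numbers near 3, which is why an exact comparison with 3.0 is not a reliable test.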

(5)

1) Number representation in computers

Binary Floating-Point Arithmetic Standard 754-1985 by IEEE:

Representation of a 64-bit floating-point number ("double"):

1 bit: $s$ → Sign: $(-1)^s$
11 bits: $c_i$ → Exponent: $c = \sum_{i=1}^{11} c_i \, 2^{i-1}$ ($0 \le c \le 2047$)
52 bits: $f_i$ → Mantissa: $f = \sum_{i=1}^{52} f_i \left(\tfrac{1}{2}\right)^i$ ($0 \le f < 1$)

Normal form of 64-bit floating-point numbers:

$(-1)^s \, 2^{c-1023} \, (1 + f)$
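To make the three fields concrete, the sketch below splits a 64-bit number into $s$, $c$ and $f$ and rebuilds the value from the normal form. It uses Python's standard `struct` module; the helper names `decode_double` and `normal_value` are hypothetical and chosen for this illustration only. Zero, denormalized numbers, infinities and NaN are not handled by the normal form.

```python
import struct

def decode_double(x: float):
    """Split a 64-bit float into sign s, biased exponent c and fraction f."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # raw 64 bits as an integer
    s = bits >> 63                                       # 1 sign bit
    c = (bits >> 52) & 0x7FF                             # 11 exponent bits
    f = (bits & ((1 << 52) - 1)) / 2**52                 # 52 mantissa bits, 0 <= f < 1
    return s, c, f

def normal_value(s: int, c: int, f: float) -> float:
    """Evaluate (-1)^s * 2^(c-1023) * (1+f); valid for 1 <= c <= 2046 only."""
    return (-1) ** s * 2.0 ** (c - 1023) * (1 + f)

for x in (1.0, -2.5, 0.1):
    s, c, f = decode_double(x)
    print(x, (s, c, f), normal_value(s, c, f) == x)
```

For example, -2.5 = -1.25 · 2^1 decodes to s = 1, c = 1024, f = 0.25.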

(6)

1) Number representation in computers

$(-1)^s \, 2^{c-1023} \, (1 + f)$

Special cases (64 bit):

$s = 0, 1$; $c = 0$; $f = 0$ → $\pm 0$
$c = 0$; $f \ne 0$ → "denormalized numbers"
$c = 2047$ → $\pm\infty$ ($f = 0$) and NaN ($f \ne 0$)

Positive ($s = 0$) number ranges (64 bit):

▪ Smallest normalized number: $f = 0$; $c = 1$ → $2^{-1022} \approx 10^{-308}$
▪ Largest number: $f \approx 1$; $c = 2046$ → $\approx 2^{1024} \approx 10^{+308}$
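The special cases and range limits can be inspected on an actual machine. This is a minimal sketch using only Python's standard library; the helper `fields` is a hypothetical name, and it returns the mantissa as the raw 52-bit integer rather than as the fraction $f$.

```python
import math
import struct
import sys

def fields(x: float):
    """Return (s, c, raw 52-bit mantissa field) of a 64-bit float."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

smallest_normal = 2.0 ** -1022                      # c = 1, f = 0
largest_finite = (2.0 - 2.0 ** -52) * 2.0 ** 1023   # c = 2046, all mantissa bits set

for name, x in [("+0", 0.0), ("-0", -0.0), ("denormalized", 5e-324),
                ("inf", math.inf), ("nan", math.nan)]:
    print(name, fields(x))

print(smallest_normal == sys.float_info.min)
print(largest_finite == sys.float_info.max)
```

The largest finite value is slightly below $2^{1024}$ because $f$ stays strictly below 1.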

(7)

1) Number representation in computers

Binary Floating-Point Arithmetic Standard 754-1985 by IEEE:

Representation of a 32-bit floating-point number ("float"):

1 bit: $s$ → Sign: $(-1)^s$
8 bits: $c_i$ → Exponent: $c = \sum_{i=1}^{8} c_i \, 2^{i-1}$ ($0 \le c \le 255$)
23 bits: $f_i$ → Mantissa: $f = \sum_{i=1}^{23} f_i \left(\tfrac{1}{2}\right)^i$ ($0 \le f < 1$)

Normal form of 32-bit floating-point numbers:

$(-1)^s \, 2^{c-127} \, (1 + f)$

(8)

1) Number representation in computers

1 ; 999,999 ; 1,000,000 : These numbers can be represented exactly in IEEE 754 standard (32-Bit)

IEEE 754 32-Bit representation of 0.999999

x1=0.999999 → x1_32= 0.999998986721038…
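The stored value of 0.999999 can be reproduced without a 32-bit data type. The sketch below rounds a Python float to 32 bits with the standard `struct` module and prints the exact decimal expansion of what is stored; `f32` is a hypothetical helper name.

```python
import struct
from decimal import Decimal

def f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

for x in (1.0, 999_999.0, 1_000_000.0, 0.999999):
    stored = f32(x)
    # Decimal(float) shows the exact binary value in decimal digits
    print(x, "->", Decimal(stored), "(exact)" if stored == x else "(rounded)")
```

The three integers survive unchanged because they need fewer than 24 significant bits, while 0.999999 has no finite binary expansion and must be rounded.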

(9)

1) Number representation in computers

Compute (x0-x1) * 1 000 000

Assume: float x0,x1 // 32 bit values

x0= 1,000,000.0; x1= 999,999.0

→(x0-x1)*1,000,000 = 1,000,000 //exact arithmetic

x0= 1.000000; x1= 0.999999
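Both variants of this computation can be emulated in 32-bit arithmetic. The following is a sketch under the assumption that every intermediate result is rounded to 32 bits, which is emulated here by rounding after each operation; `f32` and `scaled_difference` are hypothetical helper names.

```python
import struct

def f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def scaled_difference(x0: float, x1: float) -> float:
    """Compute (x0 - x1) * 1,000,000 with 32-bit rounding after every step."""
    x0, x1 = f32(x0), f32(x1)          # store the inputs as 32-bit values
    diff = f32(x0 - x1)                # subtraction, rounded to 32 bits
    return f32(diff * f32(1_000_000.0))

exact = scaled_difference(1_000_000.0, 999_999.0)  # inputs exactly representable
lossy = scaled_difference(1.0, 0.999999)           # x1 is rounded when stored

print(exact)  # 1000000.0, as in exact arithmetic
print(lossy)  # roughly 1.0133 instead of the exact result 1
```

The subtraction itself is exact in both cases. The error of about 1.3 % in the second case comes entirely from the rounding of 0.999999 on input, which the subtraction of nearly equal numbers then exposes.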


(11)

2) Decimal floating-point representation & round-off errors

Preliminaries:

For simplicity, assume a real number $y$ is stored in a computer in decimal floating-point form using "k digits":

$fl(y) = \pm 0.d_1 d_2 d_3 \ldots d_k \times 10^n$ with $1 \le d_1 \le 9$ and $0 \le d_i \le 9$ for $i = 2, \ldots, k$

("k-digit decimal floating-point form of $y$")

Any real number $y$ can be written as follows:

$y = \pm 0.d_1 d_2 d_3 \ldots d_k d_{k+1} \ldots \times 10^n$
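A k-digit decimal form can be experimented with using Python's standard `decimal` module. The sketch below is an illustration under assumptions: the slide does not say how the digits $d_{k+1}, d_{k+2}, \ldots$ are discarded, so both chopping (`ROUND_DOWN`) and rounding (`ROUND_HALF_UP`) are shown, and the helper name `fl` merely mirrors the notation above.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(y: str, k: int, rounding=ROUND_HALF_UP) -> Decimal:
    """Return y reduced to k significant decimal digits.

    y is passed as a string so that no binary rounding happens first.
    """
    ctx = Context(prec=k, rounding=rounding)
    return ctx.create_decimal(y)

y = "3.14159265"
print(fl(y, 5))              # rounding: 0.31416 * 10^1
print(fl(y, 5, ROUND_DOWN))  # chopping: 0.31415 * 10^1
```

`Decimal` prints 3.1416 rather than 0.31416 × 10^1, but the k significant digits are the same $d_1 \ldots d_k$.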

(13)

3) Error analysis for computer arithmetics

Problem:

In general, floating-point arithmetic on computers is not exact!

The associative law does not hold in general: the order of evaluation may change the binary result of a computation

$(a + b) + c \ne a + (b + c)$

This section: qualitative analysis of the impact of finite-digit arithmetic
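The failure of associativity is easy to observe. A minimal sketch in Python with 64-bit floats; the values 0.1, 0.2 and 0.3 are chosen for this illustration and do not come from the slide.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # a + b is rounded first, then c is added
right = a + (b + c)   # b + c is rounded first, then a is added

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False: the two orders give different binary results
```

Each individual addition is correctly rounded; the results differ because the intermediate sums are rounded at different points.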

(14)

3) Error analysis for computer arithmetics

Harmonic series: Divergence ?!
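The harmonic series diverges in exact arithmetic, yet a partial sum accumulated in finite-digit arithmetic stops growing once the next term is smaller than half the spacing of the current sum. The sketch below illustrates this with k-digit decimal arithmetic from Python's standard `decimal` module. This is an assumption made to keep the run short; it is not necessarily the experiment shown on the original slide, and `harmonic_stagnation` is a hypothetical helper name.

```python
from decimal import Context, Decimal

def harmonic_stagnation(k: int):
    """Sum 1/n in k-digit decimal arithmetic until the sum stops changing.

    Returns (n, s): the first index n whose term no longer changes the sum,
    and the stagnated partial sum s.
    """
    ctx = Context(prec=k)                 # every operation is rounded to k digits
    s = Decimal(0)
    n = 0
    while True:
        n += 1
        term = ctx.divide(Decimal(1), Decimal(n))
        new_s = ctx.add(s, term)
        if new_s == s:                    # adding 1/n has no effect any more
            return n, s
        s = new_s

n, s = harmonic_stagnation(3)
print(n, s)  # the computed sum appears to "converge" after finitely many terms
```

Binary 32-bit and 64-bit arithmetic show the same effect, only after many more terms.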

(15)

3) Error analysis for computer arithmetics

Accumulated relative error $e_{RE}$:

#FP operations       32-bit        64-bit
MFlop ($10^{6}$)     $10^{-3}$     $10^{-12}$
TFlop ($10^{12}$)    1             $10^{-9}$

Accumulated relative error after a large series of FP (floating-point) operations, assuming a simple random-walk theory for error propagation.

After $N$ successive arithmetic operations (each having a relative error $e_m$), the relative error becomes:

$e_{RE} \sim N^{1/2} \, e_m$
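The table entries follow from the random-walk estimate. In the sketch below the per-operation errors 1e-6 (32-bit) and 1e-15 (64-bit) are assumptions: the slide does not state $e_m$, and these values were chosen because they reproduce the orders of magnitude in the table. The function name is hypothetical.

```python
import math

def accumulated_relative_error(n_ops: float, e_m: float) -> float:
    """Random-walk estimate e_RE ~ sqrt(N) * e_m."""
    return math.sqrt(n_ops) * e_m

E_M_32 = 1e-6    # assumed per-operation relative error, 32-bit
E_M_64 = 1e-15   # assumed per-operation relative error, 64-bit

for label, n_ops in [("MFlop", 1e6), ("TFlop", 1e12)]:
    print(label,
          accumulated_relative_error(n_ops, E_M_32),
          accumulated_relative_error(n_ops, E_M_64))
```

Under this model the error grows only with the square root of the operation count, so a million times more operations cost three digits of accuracy rather than six.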

(17)

4) Stability of Computations

Definition:

A computation M(x1, x2, …, xn) with input data (x1, x2, …, xn) is called stable if small errors in the input data (< e) lead to small errors in the output data (< c·e).

Definition:

Let E0 > 0 denote an initial error and let En represent the magnitude of the error after n subsequent operations. Let C > 1 be a constant independent of n:

1) If En ≈ C n E0, the error growth is called linear

2) If En ≈ C^n E …
