Introduction to Scientific Computing Languages

(1)

Introduction to Scientific Computing Languages

Prof.Paolo Bientinesi

[email protected]

(2)

Numerical Representation

Numbers

123 29 = (first 40 digits) 4.241379310344827586206896551724137931034. . . π= 3.141592653589793238462643383279502884197. . .

In general:Infinitenumber of digits

Computers

(3)

Numerical Representation

Numbers

123 29 = (first 40 digits) 4.241379310344827586206896551724137931034. . . π= 3.141592653589793238462643383279502884197. . .

In general:Infinitenumber of digits

Computers

Finitememory

(4)

Computers

⇒

Inexact numbers

Infinitenumbers vs.finitememory

⇒Approximated numbers

How many digits? A pre-determined amount

π=3.141592653589793238462643383279502884197. . .

Using 4 digits:π= 3.141

Modern computers: normally 8 or 16 digits, “single” / “double” precision.

Alternatively?

Extended precision (“quad”, “double double”) Variable precision

(5)

Computers

⇒

Inexact numbers

π= 3.141592653589793238462643383279502884197. . .

Using 4 digits:π= 3.141

Alternatively?

Symbolic representation

(6)

Computers

⇒

Inexact numbers

π=3.141592653589793238462643383279502884197. . . Using 4 digits:π= 3.141

Alternatively?

(7)

Computers

⇒

Inexact numbers

π=3.141592653589793238462643383279502884197. . . Using 4 digits:π= 3.141

Alternatively?

Symbolic representation

(8)

Computers

⇒

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5

(9)

Computers

⇒

Approximated computations

Inexact arithmetic

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5 4 / 21

(10)

Computers

⇒

Approximated computations

Inexact arithmetic

Associativity?

No!

(11)

Computers

⇒

Approximated computations

Inexact arithmetic

Associativity?

No!

(12)

Computers

⇒

Approximated computations

Inexact arithmetic

Associativity?

No!

(13)

Computers

⇒

Approximated computations

Inexact arithmetic

Associativity?

No!

(14)

Error Analysis

f

:

A

→

B,

y

=

f

(

x

)

Es.:f(x) =x2+ sin(2∗x) x= π 123, f(x) =? Exact arithmetic: π 123 2 + sin2∗ π 123 =... Inexact arithmetic:

x

x,

ˆ

f

ˆ

f

ˆ

(ˆ

x

)

instead of

f

(

x

)

(15)

Error Analysis

f

:

A

→

B,

y

=

f

(

x

)

x

x,

ˆ

f

ˆ

f

ˆ

(ˆ

x

)

instead of

f

(

x

)

5 / 21

(16)

Error Analysis

f

:

A

→

B,

y

=

f

(

x

)

x

x,

ˆ

f

ˆ

f

ˆ

(ˆ

x

)

instead of

f

(

x

)

(17)

Errors

Representation Errors

Roundoff Errors

Algorithmic Errors

(18)

Known Disasters

http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html

Patriot Missile, 1991

Scud launched from Iraq againt US military base in South Arabia.

US Patriot’s missile missed the incoming Scud. 28 casualties.

Cancellation

Spaceship Ariane 5 launched in 1996,

destroyed 37 seconds after liftoff.

(19)

Known Disasters

Cancellation

Overflow

(20)

Known Disasters

Cancellation

(21)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe

β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383 8 / 21

(22)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

Normalization: (

y=±1.d1d2. . . dt−1×βe

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383

(23)

Floating Point Numbers

Normalization: (

y=±1.d1d2. . . dt−1×βe

(24)

Floating Point Numbers

Normalization: (

y=±1.d1d2. . . dt−1×βe

(25)

Floating Point Numbers

Normalization: (

y=±1.d1d2. . . dt−1×βe

(26)

Floating Point Numbers

Normalization: (

y=±1.d1d2. . . dt−1×βe

(27)

Floating Point Numbers

Normalization: (

y=±1.d1d2. . . dt−1×βe

(28)

Floating Point Numbers

Normalization: (

y=±1.d1d2. . . dt−1×βe

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499

(29)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20= (1 +1₂×0 + 1₄×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20_{= 1}_.₇₅ 1.00×21_{= 2} 9 / 21

(30)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20_{= 1}_.₅ 1.11×20= 1.75 1.00×21= 2

(31)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20_{= 1}_.₅ 1.11×20= 1.75 1.00×21= 2 9 / 21

(32)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20_{= 1}_.₅ 1.11×20= 1.75 1.00×21= 2

(33)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(34)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×1)× 1 4 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(35)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(36)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.10×2−2 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21_{= 2}

(37)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.10×2−2 _{= (1}_×_{1 +} 1 2×1 + 1 4×0)× 1 4 = (.5 +.25)/2 =.375 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21_{= 2} 9 / 21

(38)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 _{= (1}_×_{1 +} 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.10×2−2 _{= (1}_×_{1 +} 1 2×1 + 1 4×0)× 1 4 = (.5 +.25)/2 =.375 1.11×2−2 _{= (1}_×_{1 +} 1 2×1 + 1 4×1)× 1 4 = (.5 +.25 +.125)/2 =.4375 1.00×20= (1 +1₂×0 + 1₄×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20_{= 1}_.₇₅ 1.00×21_{= 2}

(39)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20_{= 1}_.₅ 1.11×20= 1.75 1.00×21= 2 9 / 21

(40)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1_{= (1 +} 1 2×0 + 1 4×0)× 1 2 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20_{= 1}_.₅ 1.11×20= 1.75 1.00×21= 2

(41)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1_{= (1 +} 1 2×0 + 1 4×0)× 1 2 =.5 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20_{= 1}_.₅ 1.11×20= 1.75 1.00×21= 2 9 / 21

(42)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1_{= (1 +} 1 2×0 + 1 4×0)× 1 2 =.5 1.01×2−1_{= (1 +} 1 2×0 + 1 4×1)× 1 2 = (.5 +.125) =.625 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20_{= (1 +}1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(43)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1_{= (1 +} 1 2×0 + 1 4×0)× 1 2 =.5 1.01×2−1_{= (1 +} 1 2×0 + 1 4×1)× 1 2 = (.5 +.125) =.625 1.10×2−1_{= (1 +} 1 2×1 + 1 4×0)× 1 2 = (.5 +.25) =.75 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21_{= 2} 9 / 21

(44)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1_{= (1 +} 1 2×0 + 1 4×0)× 1 2 =.5 1.01×2−1_{= (1 +} 1 2×0 + 1 4×1)× 1 2 = (.5 +.125) =.625 1.10×2−1_{= (1 +} 1 2×1 + 1 4×0)× 1 2 = (.5 +.25) =.75 1.11×2−1_{= (1 +} 1 2×1 + 1 4×1)× 1 2 = (.5 +.25 +.125) =.875 1.00×20= (1 +1₂×0 + 1₄×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20_{= 1}_.₇₅ 1.00×21_{= 2}

(45)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20_{= 1}_.₇₅ 1.00×21_{= 2} 9 / 21

(46)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20_{= 1}_.₇₅ 1.00×21_{= 2}

(47)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20_{= (1 +}1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1₂×0 + 1₄×1)×1 = 1.25 1.10×20= 1.5 1.11×20_{= 1}_.₇₅ 1.00×21_{= 2} 9 / 21

(48)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625

(49)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 10 / 21

(50)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 0.11×2−2= (1×0 +1₂×1 + 1₄×1)×1 4 = (.25 +.125)/2 =.1875

(51)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 0.11×2−2= (1×0 +1₂×1 + 1₄×1)×1 4 = (.25 +.125)/2 =.1875 0.10×2−2_{= (1}_×_{0 +}1 2×1 + 1 4×0)× 1 4 = (.25)/2 =.125 10 / 21

(52)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 0.11×2−2= (1×0 +1₂×1 + 1₄×1)×1 4 = (.25 +.125)/2 =.1875 0.10×2−2_{= (1}_×_{0 +}1 2×1 + 1 4×0)× 1 4 = (.25)/2 =.125 0.01×2−2_{= (1}_×_{0 +}1 2×0 + 1 4×1)× 1 4 = (.125)/2 =.0625

(53)

Floating Point Numbers (2)

Floating point numbers are non-equidistant!

How to represent 0? underflow? overflow? NaN?

(54)

Floating Point Numbers (2)

Floating point numbers are non-equidistant! How to represent 0?

(55)

Floating Point Numbers (2)

Floating point numbers are non-equidistant! How to represent 0?

underflow? overflow? NaN?

(56)

(57)

IEEE double precision – FYI

(58)

Machine Precision

NON-equivalent definitions

Machine Precision

u

Smallest positive number such that [1 +u]6=1

Largest positive number such that [1 +u] = 1 1

2β

1−t

Distance between 1 and the next floating point number “Machine epsilon”,M,u

(59)

Machine Precision

u

Smallest positive number such that [1 +u]6=1 Largest positive number such that [1 +u] = 1

1 2β

1−t

(60)

Machine Precision

u

1 2β

1−t

(61)

Machine Precision

u

1 2β

1−t

(62)

Representation Error

fmin=smallest positive floating point number fmax=largest positive floating point number

¯

x= [x] =floating point representation ofx

u=distance between 1 and the next floating point number Theorem:

Letx∈_Randx∈[fmin, fmax]

Then x¯=x(1 +δ1)where|δ1| ≤u

Also, x¯=x/(1 +δ2)where|δ2| ≤u

(63)

Representation Error

fmin=smallest positive floating point number fmax=largest positive floating point number

¯

x= [x] =floating point representation ofx

u=distance between 1 and the next floating point number Theorem:

Letx∈_Randx∈[fmin, fmax]

Then x¯=x(1 +δ1)where|δ1| ≤u

Also, x¯=x/(1 +δ2)where|δ2| ≤u

Note:δ1andδ2are functions ofx

(64)

2nd lecture

ucomputed according to definition #4

Representation error: examples of relative and absolute errors Normal vs. subnormal numbers; access via exponent

How the exponent range and the mantissa affect the arithmetic Examples: cancellation,π2_/_{6, bisection}

(65)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

−

y

|

1) 0.0· · ·01×20 _{= 2}1−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

−

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 _{= 2}−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin 17 / 21

(66)

Smallest distance

6= 0

Normalized numbers

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

−

y

|

1) 0.0· · ·01×20 _{= 2}1−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

−

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 _{= 2}−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

(67)

Smallest distance

6= 0

Normalized numbers

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

−

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

−

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 _{= 2}−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin 17 / 21

(68)

Smallest distance

6= 0

Normalized numbers

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

−

y

|

z

:=

|

x

−

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 _{= 2}−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

(69)

Smallest distance

6= 0

Normalized numbers

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

−

y

|

z

:=

|

x

−

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin _{= 2}emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin 17 / 21

(70)

Smallest distance

6= 0

Normalized numbers

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

−

y

|

z

:=

|

x

−

y

|

max

(

|

x

|

,

|

y

|

)

(71)

Roundoff Error

Notation:[exp]denotes the evaluation ofexpin floating point arithmetic. Assuming a left-to-right evaluation, it holds

[x+y+z/w] =h

[x] + [y] +

[z]/[w]i

Floating Point Arithmetic

Theorem: (Standard and Alternative Computational Models) Letxandybe floating point numbers

u=distance between 1 and the next floating point number Then

[xopy] = (xopy)(1 +1), where|1| ≤u, and op∈ {+,−,∗, /}

Also,

[xopy] = xopy

(1 +2), where|2| ≤u, and op∈ {+,−,∗, /} Note:1and2are functions ofx, yandop

(72)

Roundoff Error

[x+y+z/w] =h

[x] + [y] +

[z]/[w]i

Floating Point Arithmetic

Also,

[xopy] = xopy

(73)

Roundoff Error

[x+y+z/w] =h

[x] + [y] +

[z]/[w]i

Floating Point Arithmetic

Also,

[xopy] = xopy

(74)

Example: Dot Product

x, y∈Rn; κ:=xTy κ:= (χ0ψ0+χ1ψ1) +· · · +χn−2ψn−2 +χn−1ψn−1 ˇ κ = χ0ψ0(1 +(0)∗ ) +χ1ψ1(1 +(1)∗ )(1 +(1)+ ) +· · · (1 +(n₊−2)) +χn−1ψn−1(1 + (n−1) ∗ ) (1 +(n+−1)) = n−1 X i=0  χiψi(1 + (i) ∗ ) n−1 Y j=i (1 +(j)₊ )   where(0)+ = 0and| (0) ∗ |,|(j)∗ |,|+(j)| ≤uforj= 1, . . . , n−1

(75)

References

IEEE 754-1985 and IEEE 754-2008:

“Standard for Floating-Point Arithmetic” Book: “Accuracy and Stability of Numerical Algorithms”, by Nick Higham

Article: “What every computer scientist should know about floating-point arithmetic”, by David Goldberg Book: “Numerical Computing with IEEE Floating Point Arithmetic”, by Michael Overton