• No results found

Introduction to Scientific Computing Languages

N/A
N/A
Protected

Academic year: 2021

Share "Introduction to Scientific Computing Languages"

Copied!
79
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction to Scientific Computing Languages

Prof.Paolo Bientinesi

pauldj@aices.rwth-aachen.de

(2)

Numerical Representation

Numbers

123 29 = (first 40 digits) 4.241379310344827586206896551724137931034. . . π= 3.141592653589793238462643383279502884197. . .

In general:Infinitenumber of digits

Computers

(3)

Numerical Representation

Numbers

123 29 = (first 40 digits) 4.241379310344827586206896551724137931034. . . π= 3.141592653589793238462643383279502884197. . .

In general:Infinitenumber of digits

Computers

Finitememory

(4)

Computers

Inexact numbers

Infinitenumbers vs.finitememory

Approximated numbers

How many digits? A pre-determined amount

π=3.141592653589793238462643383279502884197. . .

Using 4 digits:π= 3.141

Modern computers: normally 8 or 16 digits, “single” / “double” precision.

Alternatively?

Extended precision (“quad”, “double double”) Variable precision

(5)

Computers

Inexact numbers

Infinitenumbers vs.finitememory

Approximated numbers

How many digits? A pre-determined amount

π= 3.141592653589793238462643383279502884197. . .

Using 4 digits:π= 3.141

Modern computers: normally 8 or 16 digits, “single” / “double” precision.

Alternatively?

Extended precision (“quad”, “double double”) Variable precision

Symbolic representation

(6)

Computers

Inexact numbers

Infinitenumbers vs.finitememory

Approximated numbers

How many digits? A pre-determined amount

π=3.141592653589793238462643383279502884197. . . Using 4 digits:π= 3.141

Modern computers: normally 8 or 16 digits, “single” / “double” precision.

Alternatively?

Extended precision (“quad”, “double double”) Variable precision

(7)

Computers

Inexact numbers

Infinitenumbers vs.finitememory

Approximated numbers

How many digits? A pre-determined amount

π=3.141592653589793238462643383279502884197. . . Using 4 digits:π= 3.141

Modern computers: normally 8 or 16 digits, “single” / “double” precision.

Alternatively?

Extended precision (“quad”, “double double”) Variable precision

Symbolic representation

(8)

Computers

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5

(9)

Computers

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5 4 / 21

(10)

Computers

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5

(11)

Computers

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5 4 / 21

(12)

Computers

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5

(13)

Computers

Approximated computations

4-digit representation

Inexact arithmetic

123.4 + .5678 = 123.9678 = Truncated 123.9 Rounded 124.0

Associativity?

No!

Exact arithmetic: (123.4 + .5678) + .5432 = 123.4 + (.5678 + .5432) Inexact arithmetic: (123.4 + .5678) + .5432 = 124.4 123.4 + (.5678 + .5432) = 124.5 4 / 21

(14)

Error Analysis

f

:

A

B,

y

=

f

(

x

)

Es.:f(x) =x2+ sin(2∗x) x= π 123, f(x) =? Exact arithmetic: π 123 2 + sin2∗ π 123 =... Inexact arithmetic:

x

x,

ˆ

f

f

ˆ

f

ˆ

x

)

instead of

f

(

x

)

(15)

Error Analysis

f

:

A

B,

y

=

f

(

x

)

Es.:f(x) =x2+ sin(2∗x) x= π 123, f(x) =? Exact arithmetic: π 123 2 + sin2∗ π 123 =... Inexact arithmetic:

x

x,

ˆ

f

f

ˆ

f

ˆ

x

)

instead of

f

(

x

)

5 / 21

(16)

Error Analysis

f

:

A

B,

y

=

f

(

x

)

Es.:f(x) =x2+ sin(2∗x) x= π 123, f(x) =? Exact arithmetic: π 123 2 + sin2∗ π 123 =... Inexact arithmetic:

x

x,

ˆ

f

f

ˆ

f

ˆ

x

)

instead of

f

(

x

)

(17)

Errors

Representation Errors

Roundoff Errors

Algorithmic Errors

(18)

Known Disasters

http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html

Patriot Missile, 1991

Scud launched from Iraq againt US military base in South Arabia.

US Patriot’s missile missed the incoming Scud. 28 casualties.

Cancellation

Spaceship Ariane 5 launched in 1996,

destroyed 37 seconds after liftoff.

(19)

Known Disasters

http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html

Patriot Missile, 1991

Scud launched from Iraq againt US military base in South Arabia.

US Patriot’s missile missed the incoming Scud. 28 casualties.

Cancellation

Spaceship Ariane 5 launched in 1996,

destroyed 37 seconds after liftoff.

Overflow

(20)

Known Disasters

http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html

Patriot Missile, 1991

Scud launched from Iraq againt US military base in South Arabia.

US Patriot’s missile missed the incoming Scud. 28 casualties.

Cancellation

Spaceship Ariane 5 launched in 1996,

destroyed 37 seconds after liftoff.

(21)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe

β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383 8 / 21

(22)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383

(23)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383 8 / 21

(24)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383

(25)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383 8 / 21

(26)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383

(27)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499 IBM 3090 16 14 -63 64 Setun 3 Quadruple prec. 2 113 -16382 16383 8 / 21

(28)

Floating Point Numbers

y=±d0.d1d2. . . dt−1×βe β: base (radix)

t: precision = # slots for the mantissa (d0.d1. . . dt−1)

0≤di≤β−1: digits

emin≤e≤emax: exponent range

Normalization: (

y=±1.d1d2. . . dt−1×βe

the bitd0is used for the sign±

Arithmetic β t emin emax

Single precision 2 24 -126 127 Double precision 2 53 -1022 1023 HP calculator 10 12 -499 499

(29)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20= (1 +12×0 + 14×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(30)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(31)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(32)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(33)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(34)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 = (1×1 + 1 2×0 + 1 4×1)× 1 4 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(35)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 = (1×1 + 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(36)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 = (1×1 + 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.10×2−2 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(37)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 = (1×1 + 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.10×2−2 = (1×1 + 1 2×1 + 1 4×0)× 1 4 = (.5 +.25)/2 =.375 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(38)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−2 = (1×1 + 1 2×0 + 1 4×0)× 1 4 =.25 1.01×2−2 = (1×1 + 1 2×0 + 1 4×1)× 1 4 = (.5 +.125)/2 =.3125 1.10×2−2 = (1×1 + 1 2×1 + 1 4×0)× 1 4 = (.5 +.25)/2 =.375 1.11×2−2 = (1×1 + 1 2×1 + 1 4×1)× 1 4 = (.5 +.25 +.125)/2 =.4375 1.00×20= (1 +12×0 + 14×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(39)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(40)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1= (1 + 1 2×0 + 1 4×0)× 1 2 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(41)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1= (1 + 1 2×0 + 1 4×0)× 1 2 =.5 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(42)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1= (1 + 1 2×0 + 1 4×0)× 1 2 =.5 1.01×2−1= (1 + 1 2×0 + 1 4×1)× 1 2 = (.5 +.125) =.625 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +1 2×0 + 1 4×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(43)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1= (1 + 1 2×0 + 1 4×0)× 1 2 =.5 1.01×2−1= (1 + 1 2×0 + 1 4×1)× 1 2 = (.5 +.125) =.625 1.10×2−1= (1 + 1 2×1 + 1 4×0)× 1 2 = (.5 +.25) =.75 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(44)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×2−1= (1 + 1 2×0 + 1 4×0)× 1 2 =.5 1.01×2−1= (1 + 1 2×0 + 1 4×1)× 1 2 = (.5 +.125) =.625 1.10×2−1= (1 + 1 2×1 + 1 4×0)× 1 2 = (.5 +.25) =.75 1.11×2−1= (1 + 1 2×1 + 1 4×1)× 1 2 = (.5 +.25 +.125) =.875 1.00×20= (1 +12×0 + 14×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(45)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(46)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2

(47)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

What are the floating point numbers?

-0 .25 .3125 .375 .4375 .5 .625 .75 .875 1.0 1.25 1.00×20= (1 +1 2×0 + 1 4×0)×1 = 1.0 1.01×20= (1 +12×0 + 14×1)×1 = 1.25 1.10×20= 1.5 1.11×20= 1.75 1.00×21= 2 9 / 21

(48)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625

(49)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 10 / 21

(50)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 0.11×2−2= (1×0 +12×1 + 14×1)×1 4 = (.25 +.125)/2 =.1875

(51)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 0.11×2−2= (1×0 +12×1 + 14×1)×1 4 = (.25 +.125)/2 =.1875 0.10×2−2= (1×0 +1 2×1 + 1 4×0)× 1 4 = (.25)/2 =.125 10 / 21

(52)

β

= 2, t

= 3, e

min

=

−2, e

max

= 2

Subnormal numbers -0 .25 .1875 .125 .0625 1.00×2−2=.25 0.11×2−2= (1×0 +12×1 + 14×1)×1 4 = (.25 +.125)/2 =.1875 0.10×2−2= (1×0 +1 2×1 + 1 4×0)× 1 4 = (.25)/2 =.125 0.01×2−2= (1×0 +1 2×0 + 1 4×1)× 1 4 = (.125)/2 =.0625

(53)

Floating Point Numbers (2)

Floating point numbers are non-equidistant!

How to represent 0? underflow? overflow? NaN?

(54)

Floating Point Numbers (2)

Floating point numbers are non-equidistant! How to represent 0?

(55)

Floating Point Numbers (2)

Floating point numbers are non-equidistant! How to represent 0?

underflow? overflow? NaN?

(56)
(57)

IEEE double precision – FYI

(58)

Machine Precision

NON-equivalent definitions

Machine Precision

u

Smallest positive number such that [1 +u]6=1

Largest positive number such that [1 +u] = 1 1

1−t

Distance between 1 and the next floating point number “Machine epsilon”,M,u

(59)

Machine Precision

NON-equivalent definitions

Machine Precision

u

Smallest positive number such that [1 +u]6=1 Largest positive number such that [1 +u] = 1

1 2β

1−t

Distance between 1 and the next floating point number “Machine epsilon”,M,u

(60)

Machine Precision

NON-equivalent definitions

Machine Precision

u

Smallest positive number such that [1 +u]6=1 Largest positive number such that [1 +u] = 1

1 2β

1−t

Distance between 1 and the next floating point number “Machine epsilon”,M,u

(61)

Machine Precision

NON-equivalent definitions

Machine Precision

u

Smallest positive number such that [1 +u]6=1 Largest positive number such that [1 +u] = 1

1 2β

1−t

Distance between 1 and the next floating point number “Machine epsilon”,M,u

(62)

Representation Error

fmin=smallest positive floating point number fmax=largest positive floating point number

¯

x= [x] =floating point representation ofx

u=distance between 1 and the next floating point number Theorem:

Letx∈Randx∈[fmin, fmax]

Then x¯=x(1 +δ1)where|δ1| ≤u

Also, x¯=x/(1 +δ2)where|δ2| ≤u

(63)

Representation Error

fmin=smallest positive floating point number fmax=largest positive floating point number

¯

x= [x] =floating point representation ofx

u=distance between 1 and the next floating point number Theorem:

Letx∈Randx∈[fmin, fmax]

Then x¯=x(1 +δ1)where|δ1| ≤u

Also, x¯=x/(1 +δ2)where|δ2| ≤u

Note:δ1andδ2are functions ofx

(64)

2nd lecture

ucomputed according to definition #4

Representation error: examples of relative and absolute errors Normal vs. subnormal numbers; access via exponent

How the exponent range and the mantissa affect the arithmetic Examples: cancellation,π2/6, bisection

(65)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin 17 / 21

(66)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

(67)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin 17 / 21

(68)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

(69)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin 17 / 21

(70)

Smallest distance

6= 0

Normalized numbers

x and y are floating point numbers, x 6= y; how small canzbe?

z:=|x−y| vs. z:= |x−y| max(|x|,|y|)

z

:=

|

x

y

|

1) 0.0· · ·01×20 = 21−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

z

:=

|

x

y

|

max

(

|

x

|

,

|

y

|

)

1) 0.0· · ·01×20 = 2−t 2) 0.0· · ·01×2emin = 2emin−t+1 3) 0.0· · ·01×2emin+t−1= 2emin

(71)

Roundoff Error

Notation:[exp]denotes the evaluation ofexpin floating point arithmetic. Assuming a left-to-right evaluation, it holds

[x+y+z/w] =h

[x] + [y] +

[z]/[w]i

Floating Point Arithmetic

Theorem: (Standard and Alternative Computational Models) Letxandybe floating point numbers

u=distance between 1 and the next floating point number Then

[xopy] = (xopy)(1 +1), where|1| ≤u, and op∈ {+,−,∗, /}

Also,

[xopy] = xopy

(1 +2), where|2| ≤u, and op∈ {+,−,∗, /} Note:1and2are functions ofx, yandop

(72)

Roundoff Error

Notation:[exp]denotes the evaluation ofexpin floating point arithmetic. Assuming a left-to-right evaluation, it holds

[x+y+z/w] =h

[x] + [y] +

[z]/[w]i

Floating Point Arithmetic

Theorem: (Standard and Alternative Computational Models) Letxandybe floating point numbers

u=distance between 1 and the next floating point number Then

[xopy] = (xopy)(1 +1), where|1| ≤u, and op∈ {+,−,∗, /}

Also,

[xopy] = xopy

(1 +2), where|2| ≤u, and op∈ {+,−,∗, /} Note:1and2are functions ofx, yandop

(73)

Roundoff Error

Notation:[exp]denotes the evaluation ofexpin floating point arithmetic. Assuming a left-to-right evaluation, it holds

[x+y+z/w] =h

[x] + [y] +

[z]/[w]i

Floating Point Arithmetic

Theorem: (Standard and Alternative Computational Models) Letxandybe floating point numbers

u=distance between 1 and the next floating point number Then

[xopy] = (xopy)(1 +1), where|1| ≤u, and op∈ {+,−,∗, /}

Also,

[xopy] = xopy

(1 +2), where|2| ≤u, and op∈ {+,−,∗, /} Note:1and2are functions ofx, yandop

(74)

Example: Dot Product

x, y∈Rn; κ:=xTy κ:= (χ0ψ0+χ1ψ1) +· · · +χn−2ψn−2 +χn−1ψn−1 ˇ κ = χ0ψ0(1 +(0)∗ ) +χ1ψ1(1 +(1)∗ )(1 +(1)+ ) +· · · (1 +(n+−2)) +χn−1ψn−1(1 + (n−1) ∗ ) (1 +(n+−1)) = n−1 X i=0  χiψi(1 + (i) ∗ ) n−1 Y j=i (1 +(j)+ )   where(0)+ = 0and| (0) ∗ |,|(j)∗ |,|+(j)| ≤uforj= 1, . . . , n−1

(75)

References

IEEE 754-1985 and IEEE 754-2008:

“Standard for Floating-Point Arithmetic” Book: “Accuracy and Stability of Numerical Algorithms”, by Nick Higham

Article: “What every computer scientist should know about floating-point arithmetic”, by David Goldberg Book: “Numerical Computing with IEEE Floating Point Arithmetic”, by Michael Overton

(76)

Case Studies

Bisection

Recursive implementation Base case?

Π

2

/

6

10000 X i=1 1 i2 vs. 1 X i=10000 1 i2

sqsqrt

vs.

sqrtsq

Ax

=

b

A= −1 1 1

(77)

Case Studies

Bisection

Recursive implementation Base case?

Π

2

/

6

10000 X i=1 1 i2 vs. 1 X i=10000 1 i2

sqsqrt

vs.

sqrtsq

Ax

=

b

A= −1 1 1 21 / 21

(78)

Case Studies

Bisection

Recursive implementation Base case?

Π

2

/

6

10000 X i=1 1 i2 vs. 1 X i=10000 1 i2

sqsqrt

vs.

sqrtsq

Ax

=

b

A= −1 1 1

(79)

Case Studies

Bisection

Recursive implementation Base case?

Π

2

/

6

10000 X i=1 1 i2 vs. 1 X i=10000 1 i2

sqsqrt

vs.

sqrtsq

Ax

=

b

A= −1 1 1 21 / 21

References

Related documents

Standard enthalpy change of a reaction is the amount of energy absorbed or released in a chemical reaction when the molar quantities of reactants stated in the chemical equation

We aimed to explore awareness and perception of the menopause; menopausal experiences and their impact across each individual’s life; ways that menopause with autism might differ from

D80 MUD MODULE MEZZANINE LEVEL D90 MUD MODULE PIPE DECK LEVEL J01 JACKET (EXCEPT RISERS) K13 DECK CRANE SOUTH K14 DECK CRANE NORTH T01 BRIDGE SUPPORT (TRIPOD) U10 UTILITY AREA

The United States formally withdraws from one landmark nuclear missile pact with Russia saying Moscow was in violation of valid treaty, take the Kremlin has repeatedly denied.. The

The University of Gloucestershire welcomes students from South Africa onto our courses at foundation, undergraduate and postgraduate level. For Accounting and Information

On the other hand, each speaker’s opposition between /-voice/ and /+voice/ seems remarkably consistent in maintaining the contrast, as shown by the comparable jumps made by

India military platforms procured from india was used treaties with indian treaty held at diplomat risk.. intelligence

Kaspersky Anti-Virus installed on a server under a Microsoft Windows operating system protects network storage systems against viruses and other security threats that infiltrate