Imprecise stochastic processes

(1)

Imprecise stochastic processes

Alexander Erreygers

11 th

November 2019

(2)

(3)

discrete-time

(4)

discrete-time

(5)

We consider an inﬁnite sequence

X

1 , X

2 , X

3 , . . . , X

n

, . . .

of uncertain variables that take values in the ﬁnite state space

X .

Example

X

n

is the weather in Oviedo n days from now, and

X

= {

,

}

.

We want to make inferences, for example answer the following questions:

• What is the probability of

in 4 days?

• What is the expected number of days until the next

day?

(6)

We consider an inﬁnite sequence

X

1 , X

2 , X

3 , . . . , X

n

, . . .

of uncertain variables that take values in the ﬁnite state space

X .

Example

X

n

is the weather in Oviedo n days from now, and

X

= {

,

}

.

We want to make inferences, for example answer the following questions:

• What is the probability of

in 4 days?

(7)

Modelling our uncertainty

First, we construct a tree with nodes (or situations)

s

= (

x

1 , . . . , x

n

)

,

x

i

∈

X .

For example,

(

x

1 , x

2 , x

3 ) = (

,

)

.

Second, we turn this into a probability tree by specifying a local probability mass

function p

s

:

X

→ [

0, 1

]

for every situation s

= (

x

1 , . . . , x

n

)

:

(8)

(9)

Modelling our uncertainty

First, we construct a tree with nodes (or situations)

s

= (

x

1 , . . . , x

n

)

,

x

i

∈

X .

Second, we turn this into a probability tree by specifying a local probability mass

function p

s

:

X

→ [

0, 1

]

for every situation s

= (

x

1 , . . . , x

n

)

:

(10)

(11)

This way, we construct a probability measure P and

we can make inferences

—that is, compute E

P

(

f

|

s

)

for suﬃciently nice functions f on

Ω—

by using backwards recursion due to the law of total probability (aka the law of

iterated expectation);

(12)

To make this tractable, one may assume that the local models

p

(

x

1 ,...,x

n

)

=

p

n,x

n

=

p

x

n

1. only depend on the present,

[Markovianity]

2. do not change over time.

[time homogeneity]

This way, we end up with a homogeneous

Markov chain

and

(13)

To make this tractable, one may assume that the local models

p

(

x

1 ,...,x

n

)

=

p

n,x

n

=

p

x

n

1. only depend on the present,

[Markovianity]

2. do not change over time.

[time homogeneity]

This way, we end up with a homogeneous

Markov chain

and

(14)

To make this tractable, one may assume that the local models

p

(

x

1 ,...,x

n

)

=

p

n,x

n

=

p

x

n

1. only depend on the present,

[Markovianity]

2. do not change over time.

[time homogeneity]

This way, we end up with a homogeneous

Markov chain

and

(15)

(16)

What if we cannot specify

the local models

p

_s

precisely?

¡impr

ecise

(17)

What if we cannot specify

the local models

p

_s

precisely?

¡impr

ecise

(18)

Imprecise probabilities

is a collection of theories

(19)

Let

L denote the real vector space of all real-valued functions on X ,

and let

Σ

_X

denote the subset of all probability mass functions on

X :

Σ

X

=

n

p

∈

L : p

≥

0,

∑

x

∈

X

p

(

x

) =

1 o

.

A probability mass function p induces an

expectation

operator E

p

:

L

→

R,

deﬁned by

E

p

(

f

) =

∑

x

∈

X

p

(

x

)

f

(

x

)

for all f

∈

L .

Recall that

E1. E

p

(

f

) ≥

min f for all f

∈

L ;

[boundedness]

E2. E

p

(

f

+

g

) =

E

p

(

f

) +

E

p

(

g

)

for all f , g

∈

L ;

[additivity]

(20)

Let

L denote the real vector space of all real-valued functions on X ,

and let

Σ

_X

denote the subset of all probability mass functions on

X :

Σ

X

=

n

p

∈

L : p

≥

0,

∑

x

∈

X

p

(

x

) =

1 o

.

A probability mass function p induces an

expectation

operator E

p

:

L

→

R,

deﬁned by

E

p

(

f

) =

∑

x

∈

X

p

(

x

)

f

(

x

)

for all f

∈

L .

Recall that

E1. E

p

(

f

) ≥

min f for all f

∈

L ;

[boundedness]

E2. E

p

(

f

+

g

) =

E

p

(

f

) +

E

p

(

g

)

for all f , g

∈

L ;

[additivity]

(21)

Let

L denote the real vector space of all real-valued functions on X ,

and let

Σ

_X

denote the subset of all probability mass functions on

X :

Σ

X

=

n

p

∈

L : p

≥

0,

∑

x

∈

X

p

(

x

) =

1 o

.

A probability mass function p induces an

expectation

operator E

p

:

L

→

R,

deﬁned by

E

p

(

f

) =

∑

x

∈

X

p

(

x

)

f

(

x

)

for all f

∈

L .

Recall that

E1. E

p

(

f

) ≥

min f for all f

∈

L ;

[boundedness]

E2. E

p

(

f

+

g

) =

E

p

(

f

) +

E

p

(

g

)

for all f , g

∈

L ;

[additivity]

(22)

Credal set

Instead of a single probability mass function p, we now consider

a

credal set

M

⊆

Σ

_X

,

a non-empty, closed and convex set of probability mass functions.

The credal set

M induces a set of expectations:

{

E

p

(

f

)

: p

∈

M

}

.

Speciﬁcally of interest are the bounds

E

_M

(

f

)

:

=

min

{

E

p

(

f

)

: p

∈

M

}

and

E

M

(

f

)

:

=

max

{

E

p

(

f

)

: p

∈

M

}

.

(23)

Credal set

Instead of a single probability mass function p, we now consider

a

credal set

M

⊆

Σ

_X

,

a non-empty, closed and convex set of probability mass functions.

A credal set is deﬁned by constraints of the form

c

f

≤

∑

x

∈

X

p

(

x

)

f

(

x

) =

E

p

(

f

)

.

The credal set

M induces a set of expectations:

{

E

p

(

f

)

: p

∈

M

}

.

Speciﬁcally of interest are the bounds

E

_M

(

f

)

:

=

min

{

E

p

(

f

)

: p

∈

M

}

and

E

M

(

f

)

:

=

max

{

E

p

(

f

)

: p

∈

M

}

.

(24)

Credal set

Instead of a single probability mass function p, we now consider

a

credal set

M

⊆

Σ

_X

,

a non-empty, closed and convex set of probability mass functions.

The credal set

M induces a set of expectations:

{

E

p

(

f

)

: p

∈

M

}

.

Speciﬁcally of interest are the bounds

(25)

Lower expectation

An operator E :

L

→

R is called a

lower expectation

(coherent lower prevision) if

LE1. E

(

f

) ≥

min f for all f

∈

L ;

[boundedness]

LE2. E

(

f

+

g

) ≥

E

(

f

) +

E

(

g

)

for all f , g

∈

L ;

[super-additivity]

LE3. E

(

λ f

) =

λE

(

f

)

for all f

∈

L and λ

∈

R

>

0 .

[positive homogeneity]

Theorem

An operator E :

L

→

R is a lower expectation if and only if it is the lower envelope of

some credal set

M

⊆

Σ

_X

, meaning that

(26)

Lower expectation

An operator E :

L

→

R is called a

lower expectation

(coherent lower prevision) if

LE1. E

(

f

) ≥

min f for all f

∈

L ;

[boundedness]

LE2. E

(

f

+

g

) ≥

E

(

f

) +

E

(

g

)

for all f , g

∈

L ;

[super-additivity]

LE3. E

(

λ f

) =

λE

(

f

)

for all f

∈

L and λ

∈

R

>

0 .

[positive homogeneity]

Theorem

An operator E :

L

→

R is a lower expectation if and only if it is the lower envelope of

some credal set

M

⊆

Σ

_X

, meaning that

(27)

Lower expectation

An operator E :

L

→

R is called a

lower expectation

(coherent lower prevision) if

LE1. E

(

f

) ≥

min f for all f

∈

L ;

[boundedness]

LE2. E

(

f

+

g

) ≥

E

(

f

) +

E

(

g

)

for all f , g

∈

L ;

[super-additivity]

LE3. E

(

λ f

) =

λE

(

f

)

for all f

∈

L and λ

∈

R

>

0 .

[positive homogeneity]

Theorem

An operator E :

L

→

R is a lower expectation if and only if it is the lower envelope of

some credal set

M

⊆

Σ

_X

, meaning that

(28)

Assume that we can only assess that

p

∈

M

and

p

(

x

1 ,...,x

n

)

∈

M

x

n

,

(1)

where

M

and

M

_x

for all x

∈

X are credal sets.

We consider three nested sets of probability trees that satisfy (1):

• P

CHM

: all compatible homogeneous Markov chains;

• P

CM

: all compatible Markov chains;

• P

C

: all compatible probability trees.

Can we compute

E

_P

(

f

|

s

)

:

=

inf

{

E

P

(

f

|

s

)

: P

∈

P

}

and

(29)

(30)

(31)

Assume that we can only assess that

p

∈

M

and

p

(

x

1 ,...,x

n

)

∈

M

x

n

.

(1)

We consider three nested sets of probability trees that satisfy (1):

• P

CHM

: all compatible homogeneous Markov chains;

• P

CM

: all compatible Markov chains;

• P

C

: all compatible probability trees.

Can we compute

E

_P

(

f

|

s

)

:

=

inf

{

E

P

(

f

|

s

)

: P

∈

P

}

and

(32)

Imprecise Markov chains

Computing these tight lower and upper bounds turns out to be

intractable for

P

CHM

,

intractable for

P

CM

—at least in general,

tractable for

P

C

, because we can use backwards recursion due to the

(33)

Gert de Cooman and Filip Hermans. “Imprecise probability trees: Bridging two

theories of imprecise probability”.

In: AI 172.11 (2008), pp. 1400–1427

Gert de Cooman, Filip Hermans, and Erik Quaeghebeur. “Imprecise Markov

chains and their limit behavior”.

In: PEIS 23.4 (2009), pp. 597–635

Damjan Škulj. “Discrete time Markov chains with interval probabilities”.

In:

IJAR 50.8 (2009), pp. 1314–1329

Stavros Lopatatzidis. “Robust modelling and optimisation in stochastic

processes using imprecise probalities, with an application to queueing theory”.

PhD thesis. Ghent University, 2017

(34)

continuous-time

(35)

We consider the collection

{

X

τ

: τ

∈

R

≥

0 }

of uncertain variables that take values in the ﬁnite state space

X .

Example

X

τ

is the weather in Oviedo τ time units from now, and

X

= {

,

}

.

We want to make inferences, for example answer questions like:

• What is the probability of

after 4 days?

• How long do I have to wait until it is

again?

(36)

We consider the collection

{

X

τ

: τ

∈

R

≥

0 }

of uncertain variables that take values in the ﬁnite state space

X .

Example

X

τ

is the weather in Oviedo τ time units from now, and

X

= {

,

}

.

We want to make inferences, for example answer questions like:

• What is the probability of

after 4 days?

(37)

(38)

Imprecise continuous-time Markov chains

Similar results as for imprecise discrete-time Markov chains,

but—for now—limited to inferences that depend on a ﬁnite number of time

points.

Damjan Škulj. “Eﬃcient computation of the bounds of continuous time

imprecise Markov chains”.

In: AMC 250 (2015), pp. 165–180

(39)

A counting process is a model for a stream of events

X

τ

: the number of events that have occurred up to time τ, so

X

=

Z

≥

0 .

Example

X

τ

: the number of

lightning strikes that have hit the cathedral of Oviedo.

We want to answer questions like:

• What is the probability of at least one

in some time period?

• What is the expected number of

in the following year?

(40)

A counting process is a model for a stream of events

X

τ

: the number of events that have occurred up to time τ, so

X

=

Z

≥

0 .

Example

X

τ

: the number of

lightning strikes that have hit the cathedral of Oviedo.

We want to answer questions like:

• What is the probability of at least one

in some time period?

• What is the expected number of

in the following year?

(41)

Counting processes in general

τ

|

0 |

t

1 X

t

1 =

x

1 |

t

n

X

t

n

=

x

n

|

t

X

t

=

x

|

t

+

∆

X

t

+

∆

=

y

We model our beliefs by means of the transition probabilities

P

(

X

t

+

∆

=

y

|

X

t

=

x, X

t

n

=

x

n

, . . . , X

t

1 =

x

1 |

{z

}

X

u

=

x

u

(42)

The Poisson process in particular

τ

|

0 |

t

1 X

t

1 =

x

1 |

t

n

X

t

n

=

x

n

|

t

X

t

=

x

|

t

+

∆

X

t

+

∆

=

y

For the Poisson process, we additionally assume that the transition probabilities

P

(

X

t

+

∆

=

y

|

X

t

=

x, X

u

=

x

u

)

=

P

(

X

∆

=

y

−

x

|

X

0 =

0 )

1. only depend on the present,

[Markovianity]

(43)

The Poisson process in particular

τ

|

0 |

t

1 X

t

1 =

x

1 |

t

n

X

t

n

=

x

n

|

t

X

t

=

x

|

t

+

∆

X

t

+

∆

=

y

For the Poisson process, we additionally assume that the transition probabilities

P

(

X

t

+

∆

=

y

|

X

t

=

x, X

u

=

x

u

) =

P

(

X

t

+

∆

=

y

|

X

t

=

x

)

1. only depend on the present,

[Markovianity]

(44)

The Poisson process in particular

τ

|

0 |

t

1 X

t

1 =

x

1 |

t

n

X

t

n

=

x

n

|

0 X

0 =

x

|

∆

X

∆

=

y

For the Poisson process, we additionally assume that the transition probabilities

P

(

X

t

+

∆

=

y

|

X

t

=

x, X

u

=

x

u

) =

P

(

X

∆

=

y

|

X

0 =

x

)

1. only depend on the present,

[Markovianity]

2. only depend on the length of the time period,

[time homogeneity]

(45)

The Poisson process in particular

τ

|

0 |

t

1 X

t

1 =

x

1 |

t

n

X

t

n

=

x

n

|

0 X

0 =

0 |

∆

X

∆

=

y

−

x

For the Poisson process, we additionally assume that the transition probabilities

P

(

X

t

+

∆

=

y

|

X

t

=

x, X

u

=

x

u

) =

P

(

X

∆

=

y

−

x

|

X

0 =

0 )

1. only depend on the present,

[Markovianity]

(46)

The rate parameter

A Poisson process is uniquely characterised by a single parameter:

the rate λ!

It has multiple interpretations, for instance:

the expected number of new events in any time period is proportional to λ:

E

P

(

X

t

+

∆

|

X

t

=

x, X

u

=

x

u

) =

x

+

λ

∆;

(47)

The rate parameter

A Poisson process is uniquely characterised by a single parameter:

the rate λ!

It has multiple interpretations, for instance:

the expected number of new events in any time period is proportional to λ:

E

P

(

X

t

+

∆

|

X

t

=

x, X

u

=

x

u

) =

x

+

λ

∆;

(48)

The rate parameter

A Poisson process is uniquely characterised by a single parameter:

the rate λ!

It has multiple interpretations, for instance:

the expected number of new events in any time period is proportional to λ:

E

P

(

X

t

+

∆

|

X

t

=

x, X

u

=

x

u

) =

x

+

λ

∆;

(49)

What if we do not know the rate λ precisely,

but only know that it belongs to

(50)

The general approach

Let

P be a set of counting processes characterised by the rate interval

[

λ

, λ

]

,

and deﬁne the lower expectation

E

_P

(

f

|

X

t

=

x, X

u

=

x

u

)

:

=

inf

{

E

P

(

f

|

X

t

=

x, X

u

=

x

u

)

: P

∈

P

}

.

Choose

P such that

(i) computing E

_P

(

f

|

X

t

=

x, X

u

=

x

u

)

is tractable,

(ii) E

_P

(· | ·)

is Poisson-like, in the sense that

(51)

A naive imprecise Poisson process

If

P is the set of all Poisson processes with rate λ in the rate interval

[

λ

, λ

]

, then

computing E

_P

(

f

|

X

t

=

x, X

u

=

x

u

)

is a one-parameter optimisation

problem;

E

_P

(· | ·)

is Poisson-like;

(52)

(53)

(54)

A more involved imprecise Poisson process

If

P is the set of processes that are consistent with the rate interval

[

λ

, λ

]

,

in the sense that

λ

∆

+

o

(

∆

) ≤

P

(

X

t

+

∆

=

x

+

1 |

X

t

=

x, X

u

=

x

u

) ≤

λ

∆

+

o

(

∆

)

,

then

a P in

P is not necessarily Markov nor homogeneous;

computing E

_P

(

f

|

X

t

=

x, X

u

=

x

u

)

is non-trivial (if not infeasible).

However, it turns out that

computing E

_P

(

g

(

X

t

+

∆

) |

X

t

=

x, X

u

=

x

u

)

is tractable;

(55)

A more involved imprecise Poisson process

If

P is the set of processes that are consistent with the rate interval

[

λ

, λ

]

,

in the sense that

λ

∆

+

o

(

∆

) ≤

P

(

X

t

+

∆

=

x

+

1 |

X

t

=

x, X

u

=

x

u

) ≤

λ

∆

+

o

(

∆

)

,

then

a P in

P is not necessarily Markov nor homogeneous;

computing E

_P

(

f

|

X

t

=

x, X

u

=

x

u

)

is non-trivial (if not infeasible).

However, it turns out that

computing E

_P

(

g

(

X

t

+

∆

) |

X

t

=

x, X

u

=

x

u

)

is tractable;

(56)