Mathematical Methods to Predict the Dynamic Shape Evolution of Cancer Growth based on Spatio-Temporal Bayesian and Geometrical Models


Superior School of Technologies and Experimental Sciences

DEPARTMENT OF MATHEMATICS

MATHEMATICAL METHODS TO PREDICT THE DYNAMIC SHAPE EVOLUTION OF CANCER GROWTH BASED ON SPATIO-TEMPORAL BAYESIAN AND GEOMETRICAL MODELS

Author: Iulian Teodor Vlad

Advisors: Prof. Dr. Jorge Mateu, Prof. Dr. José Joaquin Gual Arnau

Castellón de la Plana 2015



Doctoral Thesis


Author: Ing. Iulian Teodor Vlad

Supervisors: Prof. Dr. Jorge Mateu, Prof. Dr. José Joaquin Gual Arnau

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Mathematics



Copyright © 2015 - Ing. Iulian Teodor Vlad


I, Ing. Iulian Teodor Vlad, declare that this thesis entitled “Mathematical Methods to Predict the Dynamic Shape Evolution of Cancer Growth based on Spatio-Temporal Bayesian and Geometrical Models” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

DATE: SIGNATURE:

... ...


PHILOSOPHY ON MATHEMATICAL COMPUTATION running in the period Oct 2009 - Oct 2013, in particular fulfilling a period of two years (Oct 2009 - Oct 2011) as Formatting Professional Research and another two years (Oct 2011 - Oct 2013) as Member of the Research Team “Statistical Modeling to Environmental Problems”, code 145, coordinated by Prof. Dr. Jorge Mateu, as requirements of the course 14019 - “Matemática Computacional” at the Superior School of Technologies and Experimental Sciences, Department of Mathematics.

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy:

Director: Name: Prof. Dr. Jorge Mateu Signature: ...

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy:

Co-Director: Name: Prof. Dr. José Joaquin Gual Arnau Signature: ...

Committee Member: Name: ... Signature: ...


Axiom 5B: “Be good, beyond better, to become the best”.

I.T. Vlad


by Ing. Iulian Teodor Vlad

The aim of this research is to observe the dynamics of cancer tumors and to develop and implement new methods and algorithms for predicting tumor growth. We offer tools to help physicians better understand and treat this disease. By comparing a prediction with the real evolution, a physician can judge whether the prescribed treatment has the desired effect and, if necessary, decide on surgical intervention.

In this thesis we analyze the spatio-temporal dynamics of shape evolution and apply the resulting methods to the particular case of brain tumors.

The plan of the thesis is the following. In Chapter 1 we briefly recall some properties and classifications of point processes, with some examples of spatio-temporal point processes. Chapter 2 gives a short overview of the theory of Lévy bases and of integration with respect to such bases, recalls standard results about spatial Cox processes, and finally proposes different types of growth models and a new algorithm, the Cobweb, developed from the proposed methodology. Chapters 3, 4 and 5 present new prediction methods. The implementation in Matlab comes in Chapter 6. The thesis ends with some conclusions and future research.


Special thanks to my advisor Prof. Dr. Jorge Mateu for his collaboration, and to University “JAUME I” of Castellón for the inestimable help, support and invaluable experience offered: an educational institution close to citizens and nature, with a great human and material basis, designed to provide permanent support to the current development of society.

We also want to thank our colleague Francisco, who gave us the right to use the tumor image acquisitions on which we have shown the functionality of these algorithms.


Title
Copyright
Declaration
Approve
Motto
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations
Notations
Dedication

1 Introduction: basics of stochastic processes
  1.1 Temporal processes
  1.2 Spatial processes
    1.2.1 Types of spatial processes
  1.3 Point processes
  1.4 Spatial point processes
    1.4.1 Regular processes
    1.4.2 Cluster processes
    1.4.3 Complete spatial randomness
  1.5 Marked spatial point processes
  1.6 Poisson processes
    1.6.1 Homogeneous Poisson process
    1.6.2 Non-homogeneous (inhomogeneous) Poisson process
    1.6.3 Spatial Poisson processes
  1.7 Cluster processes
  1.8 Spatio-temporal point processes
    1.8.1 Earthquake processes
    1.8.2 Explosion processes
    1.8.3 Birth-death processes
    1.8.4 Point patterns sampled in time
    1.8.5 Intensity measures of spatio-temporal point patterns (First-Order Properties)
    1.8.6 Second-order intensities
2 Spatial point processes for tumor growth. Cobweb algorithm ¹
  2.1 Spatial Cox point processes
    2.1.1 Lévy-based Cox processes
    2.1.2 Lévy-based tumor growth modeling
  2.2 Modeling tumor growth: a new algorithm
  2.3 Software
    2.3.1 Input data
    2.3.2 Procedures that must be fulfilled
  2.4 Real data analysis
  2.5 Conclusions
3 Geometric prediction methods of tumor growth ²
  3.1 Methodology
    3.1.1 Shape and growth description
    3.1.2 Normal method
      3.1.2.1 Curve evolutions
    3.1.3 Radius method
  3.2 Simulations and application
    3.2.1 Simulated data: random curves
    3.2.2 Simulated data: parametric curves
    3.2.3 Real data
4 Bayesian prediction of tumor growth ³
  4.1 Data set
    4.1.1 Image registration
    4.1.2 Preparing the data
  4.2 Methodology
    4.2.1 Statistical framework
    4.2.2 Statistical inference

¹ This chapter is based on the published paper: “A geometric approach to cancer growth prediction based on Cox Processes” by Vlad and Mateu (2014)[1]

² This chapter is based on the published paper: “Two handy geometric prediction methods of cancer growth” by Vlad et al. (2015)[2]

³ This chapter is based on the published paper: “Bayesian spatio-temporal prediction of cancer dynamics” by Vlad et al. (2015)[3]

      4.2.2.1 SPDE approach
      4.2.2.2 Bayesian computation
  4.3 Modeling results
5 Functional prediction of tumor growth ⁴
  5.1 Parametric contour functions of Glioblastoma Multiforme: the problem of registration by FDA
  5.2 Principal differential analysis and tumor growth
  5.3 Application: a Glioblastoma Multiforme study
    5.3.1 Brain scans processing: from images to contour functions
  5.4 Results of principal differential analysis on the contour functions
6 Software: Prediction of the Dynamic Shape Evolution of Cancer
  6.1 Image processing
  6.2 Tumor contour
  6.3 Prediction of the dynamic shape evolution using the Cobweb algorithm
  6.4 Bayesian prediction
  6.5 Geometrical prediction
  6.6 Logical prediction in space and time
7 Conclusions and future research
A List of Matlab functions
References

⁴ This chapter is based on the submitted paper: “Principal differential analysis for modeling dynamic contour evolution. A distance-based approach for the analysis of Gioblastoma Multiform” by Romano et al. (2014)[4]


1.1 Several distributions of points in a 2-dimensional region: a) Cluster distribution; b) Poisson distribution; c) Regular distribution
1.2 A sequential inhibition process (Diggle, 1976)
1.3 A Poisson cluster process
1.4 A CSR pattern
1.5 An earthquake process
1.6 An explosion process
1.7 A birth-death process
1.8 Point patterns sampled in time
2.1 The star-shape object Y_t
2.2 Stochastic representation of Ā_t(φ)
2.3 Brain tumor (a) and its location (b)
2.4 Second image acquisition after one month: a) original image acquisition; b) boundary of tumor (red) within the sample space (blue)
2.5 Superimposed images of tumor
2.6 Tumor growth in star-shape
2.7 Calculation of the growth tumor at time t + ∆T
2.8 Error propagation
2.9 Predicted tumor after two months
2.10 Diagram of the script
2.11 Real data analysis. A yellow line represents the tumor at time t (time when it was discovered), and a red line represents the tumor after time t + ∆T
2.12 Predicted tumor
3.1 Signed curvature κ for a negatively oriented planar closed curve α
3.2 Evolution in time of a tumor brain cancer: a) first image acquisition of curve α_1 and points P_i^{t_1}; b) the same tumor after time ∆t
3.3 Calculus of the predicted points P_k^{t_3} with the normal method
3.4 Simulated tumors at time t, t + ∆t and t + 2∆t using random curves: a) prediction with radius method; b) prediction with normal method
3.5 Parametric and simulation shapes of curve at time t_1, t_2 and t_3: a) prediction with radius method; b) prediction with normal method
3.6 Evolution in time of the real brain tumor: a) boundary of the tumor at time t; b) boundary of the same tumor in December; c) boundary of the tumor in January


3.7 Brain tumor: real evolution vs. prediction: a) prediction with radius method; b) prediction with normal method
4.1 Original CT image acquisition
4.2 Registered and normalized images
4.3 Image histogram and thresholding
4.4 ROI in registered images (left) and overlapped points for infected cells in ROI (right)
4.5 Correlations between the predicted and observed data for all models. From top left to bottom right: model M1 for November, model M1 for December, model M1 for January, model M2 for November, December, January, model M3 (bottom left), model M4 (bottom middle) and the complete model M5 with covariable distance (bottom right)
4.6 Posterior distribution of parameters for model M5: Intercept = 2.7470 (top left); φ = 2.68945162 (top right); Normal Variance (σ²_X) = 0.5815021 (bottom left); Practical Range = 107.1021 (bottom right)
4.7 Prediction map based on model M5
5.1 Contour function. The red line identifies the first step of observation, the green one the second step
5.2 Tumor boundary from different patients at different times
5.3 Registered curves: X_i(s), i = 1, ..., 15 (top left); Y_i(s), i = 1, ..., 15 (top right); X_i*(s), i = 1, ..., 15 (bottom left); dY_i*(s), i = 1, ..., 15 (bottom right)
5.4 Functional boxplot for the first step
5.5 Residual function
5.6 Estimated coefficient functions for the X component. From top left to bottom right: αx*(s), βx*1(s) for the first step; αx(s), βx1(s) for the second step
5.7 Estimated coefficient functions β2x(s), β2x*(s) for the first and second step
6.1 Diagram of PreDySEC
6.2 The main interface of PreDySEC
6.3 Image processing module
6.4 Automatic tumor boundary
6.5 The prediction with Cobweb module
6.6 Bayesian prediction interface
6.7 Geometrical predictions


3.1 Absolute and relative errors for areas obtained from the normal and radius methods
4.1 DIC, CPO and nEp for each model
4.2 Fixed effects: Intercept
4.3 Fixed effects: Distance
4.4 Correlation coefficients for each model
5.1 Models fitting, R² goodness of fit


ACF Auto-Correlation Function

ARMA Auto-Regressive Moving Average

CPO Conditional Predictive Ordinate

CSMS Complete Separable Metric Space

CSR Complete Spatial Randomness

CT Computed Tomography

DIC Deviance Information Criterion

FDA Functional Data Analysis

GBM Glioblastoma Multiforme

GF Gaussian Field

GMRF Gaussian Markov Random Field

GUI Graphical User Interface

IGRT Image Guided Radiation Therapy

IMRT Intensity Modulated Radiotherapy

INLA Integrated Nested Laplace Approximation

LGCP Log-Gaussian Cox Processes

MCMC Markov Chain Monte Carlo

MRI Magnetic Resonance Imaging

PDA Principal Differential Analysis

PDE Principal Differential Equation

PDF Probability Density Function

PreDySEC Prediction of the Dynamic Shape Evolution of Cancer

RMSE Root Mean Square Error

ROI Region Of Interest

RV Random Variable

RW1 Random Walk model of order 1

SPDE Stochastic Partial Differential Equations


In this work, we use the following notations throughout the text:

R              real line
R^d            d-dimensional space
R+             nonnegative real numbers
Z, Z+          integers of R, R+
X              random variable
Ω              space of probability elements ω
(Ω, F)         measurable space
P(A)           probability measure
(Ω, F, P)      probability space
ε              measurable sets in probability space
(Ω, ε, P)      basic probability space
S              sample space
W              observable space
∅, ∅(·)        null set, null measure
A ⊆ Ω          subset of Ω
A^n            n-fold product set A × · · · × A
A              family of sets generating B
B              Borel σ-field
B_b(Ω)         bounded Borel subset of Ω
E(X)           expectation or mean or average value
E(X|A)         conditional expectation
f(x)           probability density function
F_X(x)         distribution function
N(A)           number of points in set A
N(a, b]        number of points in interval (a, b]
N(t)           N(t) = N(0, t] = N((0, t])
N(·)           point process
N_m(·)         marked point process
N_c(·)         cluster process


G[h]                       probability generating functional (p.g.fl.) of N
G[h|x]                     member of measurable family of p.g.fl.'s
G, G_I                     expected information gain of N
K(r)                       Ripley K-function
t_i                        instance of time
∆T                         time interval
T_i                        forward recurrence time
r = l(r) = ‖s_i − s_j‖     distance between s_i and s_j
λ(·)                       intensity function of a point process
λ_2(·,·)                   second-order intensity function
V(A) = Var N(A)            variance function of N or ξ in R
c(k) = Cov(X_t, X_{t−k})   covariance function
C(·), C[k](·)              factorial cumulant measure and density
L(A)                       Gaussian Lévy basis
k(·,·)                     probability kernel
P(x, A)                    Markov transition kernel
κ_j(·)                     the jth cumulant moment of the spot variable
Λ(ξ)                       intensity field
ℓ(·)                       Lebesgue measure in B(R^d)


1 Introduction: basics of stochastic processes

Cancer is a widespread disease that affects a large proportion of the human population. Recently developed technologies such as controlled chemotherapy, IMRT (intensity modulated radiotherapy), IGRT (image guided radiation therapy) and hadron therapy provide good results in the detection, control and follow-up of cancer growth. In this context, a precise detection of the tumor boundary is needed, together with a prediction of the tumor dynamics, to verify the result of a particular treatment.

Prediction, a statement about the future, is meant to provide a priori information about an observable event. Over time, this subject has created a lot of controversy and discussion. From ancient times to the present, people have tried to predict facts and how they will evolve in the future. The power to hold such information has sometimes been perceived as magic (e.g. knowing when a solar eclipse would happen in the Maya civilisation, knowing when the wind would change direction in a sea battle in the 17th-18th centuries, tracing the source of cholera from clusters of cases in London in 1854, etc.). This kind of scientific concern was once held in contempt, but a number of researchers continued to study and experiment, developing revolutionary theories and methodologies for prediction.

Talking about the future, it is obvious that we must refer to time and to the next values of this abstract variable. Indeed, mathematical formalism can contain multiple abstract notions that may be hard to explain and analyze intuitively. For example, the “line element” defined by eq. (3) in the article “The Foundation of the General Theory of Relativity” by Albert Einstein has no direct physical meaning.

In this research we are concerned with observing and modeling the effect of time on the evolution of cancer.

In modern societies, mathematicians and statisticians have developed theories and methodologies to assign a probability to the realization of an event in the experiment under study. Some aspects of the evolution over time and of the dynamics of a phenomenon can be defined a priori if we have enough input data and a developed theory to do so. Otherwise we can use empirical methods, based on input data and the researcher's experience, to find an approximate solution for the variable of interest.

If we look at the pairs of terms synthetic/analytic and a priori/a posteriori, a mathematical interpretation can be represented diagrammatically as follows: the overlap of the synthetic with the a priori indicates that there are synthetic a priori statements, and we would also regard logical laws and certain fundamental principles of mathematics as synthetic a priori. Working by means of these distinctions, between the a priori and the a posteriori and between the analytic and the synthetic, is called formalism. Formalism can also be defined, in simple words, as a description of something in formal mathematical or logical terms.


Mathematical formalism always tries to find a compromise between simplicity of analysis and requirements of realism. On the one hand, we have extremely complex natural and biological systems; on the other hand, we need to formally address some quantitative issues about these systems, which can often be done only through the use of mathematical models that may rest on grossly over-simplified assumptions.

On some occasions, a particular mathematical formalism seems to be “pre-adapted” to a variety of natural and biological systems and can be profitably used to model a diverse set of processes.

Stochastic geometry, Bayesian methods, inference methods, SPDEs and functional data analysis are some classes of such models, now used to solve real problems in fields such as:

- epidemiology (home locations of infected patients),
- computational neuroscience (spikes of neurons),
- forestry and plant ecology (positions of trees or plants in general),
- meteorology (weather prediction),
- geography (positions of human settlements, towns or cities),
- seismology (epicenters of earthquakes),
- materials science (positions of defects in industrial materials),
- astronomy (locations of stars or galaxies, revealing regularity in the spatial distribution of point-like objects, identification of important scales in the spatial distribution of point-like objects, etc.),
- stellar statistics (deriving distributions, testing of predicted distribution functions, identification of clusters and associations of stars, search for wide binaries and multiple systems),
- cosmological problems (testing of predicted distribution functions, identification of galaxy clusters, voids, etc.),
- medicine (snapshots of a growing brain tumor),
- zoology (burrows or nests of animals).


In this research we analyze the spatio-temporal dynamics of shape evolution in order to develop new prediction methods, and we apply these to the particular case of brain tumors. These objects are originally processed from CT and MRI images, and can be depicted as a collection of image pixels with varying levels of color intensity. We consider spatio-temporal stochastic processes within a Bayesian framework to model spatial heterogeneity, temporal dependence and spatio-temporal interactions amongst the pixels, providing a general modeling framework for such dynamics. We aim at predicting cancer growth in space and time. We analyze real data on a brain tumor, based on a set of images taken at several visits, and also simulated closed curves that randomly generate tumor contours.

Let us begin with a brief introduction to stochastic theory, define the basic notations and show some important characteristics and types of point processes.

A stochastic process, sometimes called a random process, is the counterpart in probability theory of a deterministic process (or deterministic system). Instead of dealing with only one possible “reality” of how the process might evolve over time (as is the case, for example, for solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions. This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, some more probable than others (Cox (1994)[5] and Daley and Vere-Jones (2003)[6]).

In the simplest possible case (“discrete time”), a stochastic process amounts to a sequence of random variables known as a time series (for example, a Markov chain). Another basic type of stochastic process is a random field, whose domain is a region of space; in other words, a random function whose arguments are drawn from a range of continuously changing values. One approach to stochastic processes treats them as functions of one or several deterministic arguments (“inputs”, in most cases regarded as “time”) whose values (“outputs”) are random variables: non-deterministic (single) quantities which have certain probability distributions. Random variables corresponding to various times (or points, in the case of random fields) may be completely different. The main requirement is that these different random quantities all have the same “type”. Although the random values of a stochastic process at different times may be independent random variables, in most commonly considered situations they exhibit complicated statistical correlations.

When we use the word “point” we shall refer to an object or an event in a “location” or in the “sample space”.

The sample space will be denoted by $\Omega \subseteq S \subset \mathbb{R}^d$, where $\mathbb{R}^d$ is the $d$-dimensional space (usually the Euclidean space, with $d = 2$ or $d = 3$ in applications). The distribution of $n$ such points in the sample space, or in the “study region” Ω, can be random or governed by some laws. Each of the $n$ points must satisfy at least one condition to be observed (it represents a value of the observed phenomenon) and can also carry some supplementary “information”.

If we talk about a “point process” in terms of stochastic theory, then we shall deal with words like “random” or “probability”, which refer to values, and “measure” or “topology”, which refer to the sample space.

Let us consider n points in an observable space Ω contained in the sample space S (Ω ⊆ S), and let A be a realization of these events. Then:

$$A = \{\omega_1, \omega_2, \ldots, \omega_n\}.$$

Definition 1.1. The “atoms” of Ω are the events {ω} having just one outcome ω ∈ Ω.

We say that the atom {ω} of the outcome ω is the event “ω occurs”.

The events are subsets of the sample space and, by additivity, any probability measure P must satisfy:

$$P(A) = P(\{\omega_1\}) + P(\{\omega_2\}) + \ldots + P(\{\omega_n\}) \tag{1.1}$$

Every probability measure P on a finite sample space Ω is determined by its values on the atoms. The value on an arbitrary event A ⊆ Ω is then computed by the formula:

$$P(A) = \sum_{\omega \in A} P(\{\omega\}) \tag{1.2}$$

The values of P on the atoms may be assigned arbitrarily as long as:

1. For every atom {ω}, $0 \le P(\{\omega\}) \le 1$
2. $\sum_{\omega \in \Omega} P(\{\omega\}) = 1$

Whenever (1) and (2) hold, P defines a consistent probability measure on Ω.
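As an illustration of formula (1.2) and of conditions (1) and (2), the following Python sketch stores a probability measure on a finite sample space as a table of atom probabilities. The fair die and the function names are assumptions made for this example only; they are not part of the thesis.

```python
from fractions import Fraction


def is_probability_measure(atom_probs):
    """Check conditions (1) and (2): every atom lies in [0, 1] and the total mass is 1."""
    in_range = all(0 <= p <= 1 for p in atom_probs.values())
    return in_range and sum(atom_probs.values()) == 1


def prob(event, atom_probs):
    """P(A) as the sum of P({w}) over the outcomes w in A, as in (1.2)."""
    return sum(atom_probs[w] for w in event)


# Hypothetical example: a fair six-sided die, with exact rational arithmetic.
die = {w: Fraction(1, 6) for w in range(1, 7)}
even = {2, 4, 6}
p_even = prob(even, die)
```

Exact fractions are used so that the check "the atoms sum to 1" is not affected by floating-point rounding.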

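Definition 1.2. Let Ω ≠ ∅. A field on Ω is a system $\mathcal{A}$ of subsets A ⊆ Ω satisfying the following conditions:

1. $\Omega \in \mathcal{A}$, $\emptyset \in \mathcal{A}$
2. If $A_1, A_2 \in \mathcal{A}$ then $A_1 \cup A_2 \in \mathcal{A}$ and $A_1 \cap A_2 \in \mathcal{A}$
3. If $A \in \mathcal{A}$ then $A^c \in \mathcal{A}$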
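For a finite Ω the three conditions of Definition 1.2 can be checked mechanically. The sketch below is only an illustration under that finiteness assumption; the helper name `is_field` and the two example families are hypothetical.

```python
from itertools import combinations


def is_field(omega, family):
    """Check the three conditions of Definition 1.2 for a finite set omega."""
    omega = frozenset(omega)
    sets = {frozenset(a) for a in family}
    # Condition 1: the whole space and the empty set belong to the family.
    if omega not in sets or frozenset() not in sets:
        return False
    # Condition 3: closed under complement.
    if any(omega - a not in sets for a in sets):
        return False
    # Condition 2: closed under pairwise union and intersection.
    for a, b in combinations(sets, 2):
        if a | b not in sets or a & b not in sets:
            return False
    return True


omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, omega]
bad = [set(), {1}, omega]  # the complement {2, 3, 4} is missing
```

Here `is_field(omega, good)` holds, while `bad` fails because it is not closed under complement.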


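Definition 1.3. A content is a set function µ defined on a field $\mathcal{A}$ such that:

1. $\mu(A) \in [0, \infty]$ whenever $A \in \mathcal{A}$
2. $\mu(\emptyset) = 0$
3. $\mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2)$ whenever $A_1, A_2 \in \mathcal{A}$ and $A_1 \cap A_2 = \emptyset$

Definition 1.4. Let $\mathcal{A}$ be a field and let $\mu|\mathcal{A}$ be a content. The content µ is called σ-additive if:

$$\mu\Big(\bigcup_{i \in \mathbb{N}} A_i\Big) = \sum_{i \in \mathbb{N}} \mu(A_i) \tag{1.3}$$

for every pairwise disjoint sequence $(A_i)_{i \in \mathbb{N}} \subseteq \mathcal{A}$ such that $\bigcup_{i \in \mathbb{N}} A_i \in \mathcal{A}$.

If a content is σ-additive then it has several continuity properties which make calculations easier.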

Definition 1.5. A field $\mathcal{F}$ on Ω is a σ-field if:

$$(A_i)_{i \in \mathbb{N}} \subseteq \mathcal{F} \;\Rightarrow\; \bigcup_{i \in \mathbb{N}} A_i \in \mathcal{F} \tag{1.4}$$

A pair (Ω, F) where F is a σ-field on Ω is called a measurable space.

Definition 1.6. A σ-additive content which is defined on a σ-field is called a measure.

A probability space is a measure space such that the measure of the whole space is equal to 1.

Definition 1.7. A probability space is a triplet (Ω, F, P) consisting of a set Ω (called the sample space), a σ-algebra (also called σ-field) F of subsets of Ω (these subsets are called events), and a measure P such that P(Ω) = 1 (called the probability measure).


Definition 1.8. Let (Ω, F, P) be a probability space and let $\mathcal{A} \subseteq \mathcal{F}$ be a sub-σ-field. Let X be a nonnegative or integrable random variable. The conditional expectation $E(X|\mathcal{A})$ of X given $\mathcal{A}$ is an $\mathcal{A}$-measurable random variable Y satisfying:

$$\int_A X \, dP = \int_A Y \, dP, \quad \text{for all } A \in \mathcal{A} \tag{1.5}$$

Definition 1.9. Let (Ω, F, P) be a probability space. Any F-measurable real-valued function $X : \Omega \to \mathbb{R}$ is called a random variable (R.V.).

An integer random variable is a function X defined on a sample space Ω that takes only integer values. Namely, for every sample point ω ∈ Ω, X(ω) is an integer. The probability distribution of X is the sequence of numbers $p_n$ such that $p_n$ is the probability of the event “X equals n”. The event “X equals n” can be written (X = n). As a subset of Ω this event is:

$$(X = n) = \{\omega \in \Omega : X(\omega) = n\}$$

Definition 1.10. Suppose X is an integer R.V. with distribution $p_n = P(X = n)$. The expectation or mean or average value of X is:

$$E(X) = \sum_n n \cdot p_n = \sum_n n \cdot P(X = n) \tag{1.6}$$

Definition 1.11. Let X be a random variable. Then the function $F_X : \mathbb{R} \to [0, 1]$ is the distribution function of X defined by:

$$F_X(x) := P(X \le x), \quad x \in \mathbb{R} \tag{1.7}$$

and satisfies:

1. $F_X(-\infty) = 0$
2. $F_X(\infty) = 1$
3. $F_X(x) \le F_X(x')$ if $x < x'$.

The distribution of X is $P_X$, i.e. the image of P under X, defined by:

$$P_X(B) := P(X^{-1}(B)) = P(X \in B), \quad B \in \mathcal{B},$$

where $\mathcal{B}$ is the Borel σ-algebra.

Thus, the distribution function $F_X$ determines the values of the distribution $P_X$ on intervals by:

$$P_X((a, b]) = P(\{a < X \le b\}) = F_X(b) - F_X(a).$$

A probability distribution may have a density $f$.

Definition 1.12. Let X be a random variable. Then the function $f : \mathbb{R} \to \mathbb{R}$ is the probability density function (P.D.F.) of the random variable X if:

$$P_X((a, b]) = P(\{a < X \le b\}) = \int_a^b f(x) \, dx \tag{1.8}$$

and satisfies:

1. $F_X(x) = P(\{X \le x\}) = \int_{-\infty}^{x} f(u) \, du$
2. $f(x) \ge 0$, $\forall x \in \mathbb{R}$
3. $\int_{-\infty}^{\infty} f(x) \, dx = 1$
4. if X takes values only in $(a, b)$, then $f(x) = 0$ for all $x \notin (a, b)$
5. $\int_a^b f(x) \, dx = F_X(b) - F_X(a)$

Example 1: Consider the experiment of flipping a coin once. Then:

Ω = {H, T} (the possible outcomes: “Heads” and “Tails”)

$\mathcal{F} = \mathcal{P}(\Omega)$ ($\mathcal{F}$ contains all subsets of Ω)

$$P(\{H\}) = P(\{T\}) = \frac{1}{2}$$


Example 2: We pick a real number at random in the interval [0, 2]. Ω = [0, 2], F is the Borel σ-field of [0, 2]. The probability of an interval [a, b] ⊂ [0, 2] is:

$$P([a, b]) = \frac{b - a}{2}$$
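The interval probability of Example 2 can be checked against a simple simulation. The sketch below is illustrative only: the function names, the interval [0.5, 1.25], the sample size and the seed are assumptions chosen for the example.

```python
import random


def cdf_uniform_0_2(x):
    """Distribution function F(x) = P(X <= x) for X uniform on [0, 2]."""
    return min(max(x / 2.0, 0.0), 1.0)


def interval_prob(a, b):
    """P([a, b]) = F(b) - F(a), which equals (b - a) / 2 for 0 <= a <= b <= 2."""
    return cdf_uniform_0_2(b) - cdf_uniform_0_2(a)


def monte_carlo_prob(a, b, n=100_000, seed=0):
    """Fraction of n uniform draws on [0, 2] that fall inside [a, b]."""
    rng = random.Random(seed)
    hits = sum(a <= rng.uniform(0.0, 2.0) <= b for _ in range(n))
    return hits / n


exact = interval_prob(0.5, 1.25)        # (1.25 - 0.5) / 2 = 0.375
estimate = monte_carlo_prob(0.5, 1.25)  # close to 0.375, up to sampling error
```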

Example 3: If A is an event in a probability space, the random variable:

$$\mathbf{1}_A(\omega) = \begin{cases} 1, & \omega \in A; \\ 0, & \omega \notin A. \end{cases}$$

is called the indicator function of A. Its probability law is called the Bernoulli distribution with parameter p = P(A).

The set of all possible sequences is called “the Bernoulli sample space” Ω, and the corresponding experiment process is called “the Bernoulli process”.

Example 4: We say that a random variable X has the Binomial law B(n, p) if:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \tag{1.9}$$

for k = 0, 1, 2, ..., n.
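A short numerical check of (1.9), given here as a sketch: the probabilities should sum to one, and their mean should equal the standard value np. The parameter values n = 10 and p = 0.3 are arbitrary choices for the illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X with Binomial law B(n, p), as in (1.9)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                             # should be 1 up to rounding
mean = sum(k * q for k, q in enumerate(pmf)) # should be n * p up to rounding
```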

Example 5: We say that a random variable X has the Normal law N(m, σ²) if:

$$P(a < X < b) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_a^b e^{-\frac{(x - m)^2}{2\sigma^2}} \, dx \tag{1.10}$$


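Example 6: If X is a random variable with normal law N(0, σ²) and λ is a real number,

$$E(\exp(\lambda X)) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\lambda x} e^{-\frac{x^2}{2\sigma^2}} \, dx = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{\frac{\sigma^2 \lambda^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{(x - \sigma^2 \lambda)^2}{2\sigma^2}} \, dx = e^{\frac{\sigma^2 \lambda^2}{2}} \tag{1.11}$$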
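Identity (1.11) can be verified numerically by approximating the integral directly. The sketch below uses a plain trapezoidal rule on a wide finite interval; the integration limits, the step count and the values λ = 0.7, σ = 1.5 are assumptions made for this illustration.

```python
import math


def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def mgf_numeric(lam, sigma, steps=20_000):
    """Trapezoidal approximation of E(exp(lam * X)) for X ~ N(0, sigma^2)."""
    # Wide symmetric range so that the neglected tails are negligible.
    half = 10 * sigma + abs(lam) * sigma**2
    h = 2 * half / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(lam * x) * normal_pdf(x, sigma)
    return total * h


lam, sigma = 0.7, 1.5
numeric = mgf_numeric(lam, sigma)
closed_form = math.exp(sigma**2 * lam**2 / 2)  # right-hand side of (1.11)
```

The two values agree to many decimal places, which is consistent with completing the square in the exponent as done in (1.11).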

Example 7: Consider an experiment that consists of counting the number of traffic accidents at a given intersection during a specified time interval.

Ω = {0, 1, 2, ...}

$\mathcal{F} = \mathcal{P}(\Omega)$ ($\mathcal{F}$ contains all subsets of Ω)

$$P(\{k\}) = e^{-\lambda} \frac{\lambda^k}{k!}$$

(Poisson probability with parameter λ > 0)
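The Poisson probabilities of Example 7 can be checked numerically: they sum to one, and their mean equals the parameter λ. In the sketch below the infinite sums are truncated at k = 59, which is an assumption that is harmless for the illustrative value λ = 2.5 because the neglected tail is far below rounding error.

```python
import math


def poisson_pmf(k, lam):
    """P({k}) = exp(-lam) * lam**k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 2.5
ks = range(60)  # truncation of the infinite sum
total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
```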

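Example 8: If X is a random variable with Poisson distribution of parameter λ > 0, then:

$$E(X) = \sum_{n=0}^{\infty} n \, \frac{e^{-\lambda} \lambda^n}{n!} = \lambda e^{-\lambda} \sum_{n=1}^{\infty} \frac{\lambda^{n-1}}{(n-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$

With these concepts in mind we can now state that: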

• if, in the probability space (Ω, F, P), the random variable X is associated with a time series, then we have a “temporal process”;

• if the sample space is d-dimensional (d ≥ 2) Euclidean, then we have a “spatial process”;

• if, in the probability space (Ω, F, P), the measure N(A) represents the number of points falling in the subset A of Ω ⊆ S, then we have a “point process”;

• if, in the probability space (Ω, F, P), each event occurring in the subset A of Ω ⊆ S has an associated location s, then the measure N(s) represents the number of events that are considered and we have a “spatial point process”;

• if the point process N(s) is “marked”, then we have a particular point process named a “marked point process”;

• if, for the point process N(s), the random variable X has a Poisson distribution, then we have a “Poisson point process”;

• if, for the point process N(s), the random variable X is distributed in clusters over the sample space, then we have a “cluster point process”;

• if the time series values t_i index the spatial random set Ω ⊆ S of a spatial point process N(s), then we have a “spatio-temporal point process” N(s, t).

Figure 1.1: Several distributions of points in a 2-dimensional region: a) Cluster distribution; b) Poisson distribution; c) Regular distribution

Figure 1.1 shows several possible distributions of points in a region. They are realizations of spatial point processes.
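The three kinds of pattern in Figure 1.1 can be imitated with a few lines of Python. This is only a sketch under simplifying assumptions that are not taken from the thesis: the region is the unit square, the number of points is fixed in advance, each cluster has a fixed number of Gaussian-displaced offspring (which may fall slightly outside the square), and regularity is produced by a simple sequential inhibition rule. The mean nearest-neighbour distance is used as a rough summary of how clustered or regular each pattern is.

```python
import math
import random


def csr(n, rng):
    """n independent uniform points in the unit square
    (a homogeneous Poisson pattern conditioned on its number of points)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def cluster(n_parents, per_parent, spread, rng):
    """Uniform parents, each replaced by per_parent Gaussian-displaced offspring."""
    pts = []
    for _ in range(n_parents):
        px, py = rng.random(), rng.random()
        for _ in range(per_parent):
            pts.append((rng.gauss(px, spread), rng.gauss(py, spread)))
    return pts


def regular(n, min_dist, rng, max_tries=100_000):
    """Sequential inhibition: keep a uniform proposal only if it lies
    at least min_dist away from every point kept so far."""
    pts, tries = [], 0
    while len(pts) < n and tries < max_tries:
        tries += 1
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, p) >= min_dist for p in pts):
            pts.append(candidate)
    return pts


def mean_nn_distance(pts):
    """Average distance from each point to its nearest neighbour."""
    return sum(
        min(math.dist(p, q) for q in pts if q is not p) for p in pts
    ) / len(pts)


rng = random.Random(1)
patterns = {
    "cluster": cluster(10, 10, 0.02, rng),
    "poisson": csr(100, rng),
    "regular": regular(100, 0.07, rng),
}
nn = {name: mean_nn_distance(pts) for name, pts in patterns.items()}
```

With these settings the clustered pattern has the smallest mean nearest-neighbour distance and the regular pattern the largest, with the Poisson pattern in between.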

1.1 Temporal processes

Definition 1.13. A stochastic process observed over time is called a time series.

Define a real-valued time series by:

$$\{X_t \in \mathbb{R} : -\infty < t < \infty\} \tag{1.12}$$

The stochastic process is said to be second-order (weakly) stationary if:

$$E(X_t) = \mu \ (= 0) \tag{1.13}$$

and

$$\mathrm{Cov}(X_t, X_{t-k}) = c(k) \tag{1.14}$$

where c(k) is the auto-covariance function, which is completely specified by the time lag k.
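To make the auto-covariance function c(k) concrete, the sketch below estimates it from a single simulated series. The first-order autoregressive model X_t = φ X_{t-1} + e_t is an assumption introduced only for this illustration; for that model with unit-variance Gaussian noise, the standard results are c(0) = 1 / (1 − φ²) and c(k) = φ^k c(0). The function names, seed and series length are likewise hypothetical.

```python
import random


def sample_autocovariance(x, k):
    """Estimate c(k) = Cov(X_t, X_{t-k}) from one observed series x."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n


def simulate_ar1(phi, n, seed=0, burn_in=500):
    """Simulate X_t = phi * X_{t-1} + e_t with standard normal e_t,
    discarding the first burn_in values so the start-up effect fades."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


phi = 0.6
series = simulate_ar1(phi, 20_000)
c0 = sample_autocovariance(series, 0)  # near 1 / (1 - phi**2) = 1.5625
c1 = sample_autocovariance(series, 1)  # near phi * c0
```

The ratio `c1 / c0` is the lag-one value of the autocorrelation function (ACF) and should be close to φ.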

Analysis of time series data is typically conducted in the time or in the frequency domain. The analysis of the time series autocorrela-tion funcautocorrela-tion (ACF) is discussed in Box et al. (1994)[7], Bhansali (1980)[8], Brockwell and Davis (1991)[9], Hamilton (1994)[10], etc. Estimating, for example, Auto-Regressive Moving Average (ARMA) models is straightforward in the time domain (Postcher and Srini-vasan (1994)[11]).

Fuller (1976)[12], Harvey (1981)[13], Lütkepohl (1991)[14], etc., prefer the time domain approach because of the relative ease of interpretation. Conversely, the frequency domain analysis of time series data is steeped in Hilbert space algebra. However, spectral analysis has advantages in its nonparametric approach to data analysis. Although ARMA models can be obtained in the frequency domain, the spectral approach does not require any parametric model for inference.


1.2 Spatial processes

Definition 1.14. A stochastic process with a spatial domain is called a spatial process. A spatial process is defined by:

{N(s) : s ∈ Ω ⊂ R^d} (1.15)

where Ω is an index set and N(s) is the attribute of interest at location s.

For simplicity, the space is taken to be R^2, representing observations in the plane. The main difference between (1.12) and (1.15) is that the “information” about the locations is not necessarily well-ordered, unlike a temporal process, which is directed in time (t = {t_1, t_2, t_3, . . . , t_n}, where t_1 < t_2 < t_3 < · · · < t_n). Time flows unidirectionally, whereas there is no equivalent to past, present, or future in a spatial domain. For this reason many of the methods used to analyze time series must be modified to be appropriate in the spatial context, and many techniques of spatial data analysis have been developed independently of time series analysis.

In the geostatistical literature N(s), the attribute of interest observed at location s, is often viewed in the context of random functions (see, e.g., Journel and Huijbregts (1978)[15], Goovaerts (1997)[16], Chilès and Delfiner (1999)[17]).

Briefly, N(s, ω) depends on the realization ω of a random experiment. For a given realization, N(·, ω) is a function of spatial locations, and n observations N(s_1), N(s_2), ..., N(s_n) represent an n-dimensional sample of size one from the set of all possible random functions. The stochastic behavior of the attribute N at location s is induced by considering all possible realizations of the random experiment at that location N(s).


1.2.1 Types of spatial processes

Viewing spatial processes as stochastic processes according to (1.15) is general in that the nature of the index set Ω permits the definition of different types of spatial data. Three spatial data types are defined as follows, according to Cressie (1993)[18]:

I. Geostatistical Data: N(s) is a random variable observed at locations s ∈ Ω, where Ω is fixed and continuous.

Examples: Random or systematic sampling of a surface, plant yields across a corn field, drilling for ore.

II. Lattice Data: N(s) is a random variable observed at locations s ∈ Ω, where Ω is fixed and discrete.

Examples: Unemployment rates by census tracts, coloring on remotely sensed pixel images.

III. Point Patterns: N(s) is a random variable observed at locations s ∈ Ω, where Ω is a random set of indices.

Examples: Positions of lunar craters, locations of trees in a forest, residences that reported break-ins in 1999.

1.3 Point processes

The theory of point processes has undergone an explosive expansion in the last two decades. Point processes and random measures are common in many physical applications found in engineering, astronomy, biology, etc. These processes can be observed in one dimension as a time series or in two dimensions as a spatial point pattern, and an extensive literature is devoted to their analysis.


The analysis of point pattern data in a compact subset S of R^n is a major object of study within spatial statistics. There are different ways to build and characterize a point process (using finite-dimensional distributions, void probabilities, capacity functionals, or generating functions). An easier way to build a point process is by transforming an existing point process (by thinning, superposition, or clustering).

Point processes are covered in detail by Bartlett (1975)[19] and Daley and Vere-Jones (2003)[6]. Bartlett (1964)[20], [21] and Bartlett (1975)[19] extend the analysis of spatial point processes to the frequency domain through spatial periodic functions.

Ripley (1976)[22] introduced the analysis in the spatial domain through the K-function. Currently, Ripley’s approach to studying the dependency structure of point patterns remains the dominant method of analysis. Daley and Vere-Jones (2003)[6], in their book An Introduction to the Theory of Point Processes, offer a complete background on this type of stochastic process.

Informally, a point process on a suitable state space S is understood to be a locally finite collection of distinct random elements k,(k = 1,2, ..., n) in S. With these specifications we can make the first general definition:

Definition 1.15. A point process is a random distribution of points in a sample space.

In mathematics, a point process is a random element whose values are “point patterns” on a set S ⊆ R^d. While in the exact mathematical definition a point pattern is specified as a locally finite counting measure, it is sufficient for more applied purposes to think of a point pattern as a countable subset of S that has no limit points.


Definition 1.16. A point process N is a stochastic model governing the locations of events si in some bounded set A.

When estimated from point process data, the empirical product density function (1.28) provides a description of the density of inter-event distances in an observed point pattern. For instance, high values for small distances are indicative of an overabundance of short inter-event distances (this is a typical situation for cluster processes, where data tend to form groups). Conversely, if short inter-event distances are rare, this indicates that an inhibitory structure is present and that points tend to separate from each other. In the homogeneous and isotropic case (Cressie (1993)[18]; Stoyan and Stoyan (1994)[23]; Stoyan et al. (1995)[24]), the product density (1.28) depends only on the distance r = ‖s_i − s_j‖ between the points s_i and s_j, and thus we write, for the sake of simplicity, l(r) for the product density.

If the points are located on the time axis, we have a temporal point process, and its random points P_i are time instants t_i, which are called the events.

Attention is typically restricted to points in some time interval [T0, T1], and to processes with only a finite number of points in any compact subset of S.

Traditionally the points of a point process are thought to be indistinguishable, other than by their times and locations. Often, however, there is other important “information” to be stored along with each point. For example, one may wish to analyze a list of points in time and space where a member of a certain species was observed, along with the size or age of the organism, or alternatively a catalog of arrival times and locations of hurricanes along with the amounts of damage attributed to each. Such processes may be viewed as marked spatiotemporal point processes, i.e. random collections of points, where each point has associated with it a further random variable called a mark.

Much of the theory of spatiotemporal point processes carries over from that of spatial point processes. However, the temporal aspect enables a natural ordering of the points that does not generally exist for spatial processes. Indeed, it may often be convenient to view a spatiotemporal point process as a purely temporal point process, with spatial marks associated with each point. Sometimes investigating the purely temporal (or purely spatial) behavior of the resulting marginalized point process is of interest.

The spatial region of interest is often a rectangular portion of R^2 or R^3, but not always. Cases where the points are spatially distributed in a sphere or an ellipse are investigated by Brillinger et al. (1997)[25] and Brillinger (2001)[26]. When the domain of possible spatial coordinates is discrete (e.g. a lattice) rather than continuous, it may be convenient to view the spatiotemporal point process as a sequence {N_i} of temporal point processes which may interact with one another.

For modeling and statistical inference purposes we consider point processes in a bounded region of space. Under this restriction it is possible to define point processes by writing down their probability densities.

A point process on the line may be taken as modeling the occurrences of some phenomenon at the time instants ti with i in some suitable

index set. For such a process, there are four equivalent descriptions of the sample paths:

1. counting measures;

2. nondecreasing integer-valued step functions;

3. sequences of points; and

4. sequences of intervals.

In describing a point process as a counting measure, it does not matter that the process is on the real line. However, for the three other methods of describing the process, the order properties of the reals are used in an essential way. While the methods of description may be extended into higher dimensions, they become less natural and, in the case of (4), definitely artificial. We mostly use the intuitive notion of a point process as a counting measure. To make this notion precise, take any subset A of the real line and let N(A) denote the number of occurrences of the process in the set A; i.e.

$$N(A) = \text{number of indices } i \text{ for which } t_i \text{ lies in } A = \#\{i : t_i \in A\}. \qquad (1.16)$$

When A is expressed as the union of the disjoint sets A_1, . . . , A_r, say, that is,

$$A = \bigcup_{i=1}^{r} A_i, \quad \text{where } A_i \cap A_j = \emptyset \text{ for } i \neq j,$$

it is a consequence of (1.16) that:

$$N\left(\bigcup_{i=1}^{r} A_i\right) = \sum_{i=1}^{r} N(A_i) \quad \text{for mutually disjoint } A_1, \ldots, A_r.$$
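The counting-measure view and its additivity over disjoint sets can be checked on a toy example. The sketch below is only illustrative: the event times, the intervals and the helper name `count_in` are hypothetical, and sets A are restricted to half-open intervals (a, b].

```python
def count_in(times, a, b):
    """N((a, b]): number of indices i with a < t_i <= b."""
    return sum(1 for t in times if a < t <= b)


times = [0.3, 0.9, 1.4, 2.2, 2.8, 3.5]          # hypothetical event times t_i
pieces = [(0.0, 1.0), (1.0, 2.5), (2.5, 4.0)]   # disjoint intervals A_1, A_2, A_3

total = count_in(times, 0.0, 4.0)               # N(A) for A = (0, 4]
by_parts = sum(count_in(times, a, b) for a, b in pieces)
```

Because the intervals are half-open and disjoint, every event is counted in exactly one piece, so the two counts agree.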

A natural way of measuring the average density of points of a point process is via its mean, or in the case of a stationary point process, its mean density, which Daley and Vere-Jones (2003)[6] define as

m = E(N(0,1]). (1.17)

Defining the function

$$M(x) = E(N(0, x]), \qquad (1.18)$$

it is a consequence of the additivity of N(·) over disjoint sets, of the linearity of expectation, and of the stationarity of the process that, for x, y ≥ 0,

$$M(x+y) = E(N(0, x+y]) = E(N(0, x] + N(x, x+y]) = E(N(0, x]) + E(N(x, x+y]) = E(N(0, x]) + E(N(0, y]) = M(x) + M(y).$$

In other words, M(·) is a nonnegative function satisfying Cauchy’s functional equation:

$$M(x+y) = M(x) + M(y) \qquad (0 \le x, y < \infty).$$

Consequently,

$$M(x) = M(1)\,x = m x \qquad (0 \le x < \infty), \qquad (1.19)$$

irrespective of whether M(x) is finite or infinite for finite x > 0.

There is another natural way of measuring the rate of occurrence of points of a stationary point process, due originally to Khintchine (1960)[27].

Proposition 1.17. For a stationary (or even crudely stationary) point process, the limit:

$$\lambda = \lim_{h \downarrow 0} \frac{\Pr\{N(0, h] > 0\}}{h} \qquad (1.20)$$

exists, though it may be infinite.

PROOF:

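Introduce the function

$$\varphi(x) = \Pr\{N(0, x] > 0\}. \qquad (1.21)$$

Then φ(x) ↓ 0 as x ↓ 0, and φ(·) is subadditive on (0,∞) because, for x, y > 0,

$$\varphi(x+y) = \Pr\{N(0, x+y] > 0\} = \Pr\{N(0, x] > 0\} + \Pr\{N(0, x] = 0,\ N(x, x+y] > 0\} \le \Pr\{N(0, x] > 0\} + \Pr\{N(x, x+y] > 0\} = \varphi(x) + \varphi(y).$$

For a subadditive function that tends to 0 at the origin, the ratio φ(h)/h has a limit as h ↓ 0 (equal to its supremum over h > 0, finite or infinite), which gives (1.20).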

Parameter λ is called the intensity of the point process, and when it is finite, (1.20) can be written as:

Pr{N(x, x+h] > 0} = Pr{there is at least one point in (x, x+h]}

= λh+o(h) (h ↓ 0) (1.22)

These two measures of the “rate” of a stationary point process coincide when the point process has the following property (text of proposition, proof and definition from Daley and Vere-Jones (2003)[6]).

Definition 1.18. A point process is simple when:

Pr{N({t}) = 0 or 1 for all t} = 1 (1.23)

Daley and Vere-Jones (2003)[6] called this sample-path property almost sure orderliness to contrast it with the following analytic property due to Khintchine (1960)[27].

Definition 1.19. A crudely stationary point process is orderly when

Pr{N(0, h] ≥ 2} = o(h) (h ↓ 0) (1.24)

Notice that stationarity plays no role in the definition of a simple point process. In addition, it does not matter whether the point process is defined on the real line or on a Euclidean space (Daley and Vere-Jones (2003)[6]).

Definition 1.20. A regular point process (see Snyder (1975)[28]) is such that the probability of an event occurring in the time interval [t, t+Δt) is given by:

$$\Pr[\text{one event in } [t, t+\Delta t) \mid N_t, w_t] = \mu(t; N_t, w_t)\,\Delta t$$

$$\Pr[\text{more than one event in } [t, t+\Delta t) \mid N_t, w_t] = o(t, \Delta t) \qquad (1.25)$$

where:

N_t is the number of events that have occurred up to time t (observations are assumed to start at time t = 0);

w_t is the vector of occurrence times of these N_t events: w_t = [w_1, . . . , w_{N_t}]; and

o(t, Δt) decreases to zero faster than linearly as Δt decreases: lim_{Δt→0} o(t, Δt)/Δt = 0.

These equations mean that no more than one event can occur in a sufficiently small interval and that the probability of one event occurring within a small interval is proportional to the interval’s duration. The quantities N_t and w_t describe the history of the process, giving the number and the times at which all events occurred prior to time t. Note that the probabilities are conditional probabilities: they depend on the point process’s history.

Definition 1.21. a) Let {Ni}i∈{1,2,...} be a sequence of nonnegative random variables on some probability space (Ω,F, P) such that 0 < Ni ≤ Ni+1. Then the sequence {Ni} is called a point process on

[0,∞). If in addition, Ni < Ni+1 ∀i then the point process is said to be a simple point process.

b) Let {Ni}i∈{1,2,...} be a simple point process on [0,∞), defined on (Ω,F, P), and let {Zi}i∈{1,2,...} be a sequence of {1,2, . . . , M}-valued random variables (also defined on (Ω,F, P), with 1 ≤ M < ∞). Then the double sequence {Ni, Zi}i∈{1,2,...} is called an M-variate point process on [0,∞). Define, for all m, 1 ≤ m ≤ M, and all t ≥ 0,

$$V_m(t) = \sum_{i \ge 1} \mathbf{1}(N_i \le t)\,\mathbf{1}(Z_i = m). \qquad (1.26)$$

Then the M-vector process V(t) = (V_1(t), V_2(t), . . . , V_M(t)) is the M-variate counting process associated with {Ni, Zi}.

In our context, N_i will be the occurrence time of the ith market event and Z_i will indicate the event’s type. V_m(t) gives the random number of events of type m that have occurred up to and including time t. Because {N_i} in the above definition is simple, the possibility of the simultaneous occurrence of two events (of either the same or different types) is ruled out.
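A direct transcription of (1.26) may help to fix the notation. In the sketch below the occurrence times N_i, the types Z_i and the function name `counting_process` are all hypothetical illustration values, with M = 2 event types.

```python
def counting_process(times, types, m, t):
    """V_m(t) = sum over i of 1(N_i <= t) * 1(Z_i = m), as in (1.26)."""
    return sum(1 for n_i, z_i in zip(times, types) if n_i <= t and z_i == m)


occurrence_times = [0.5, 1.2, 1.9, 2.4, 3.1]   # N_i, strictly increasing (simple process)
event_types = [1, 2, 1, 1, 2]                  # Z_i, values in {1, 2}

# The M-vector V(t) evaluated at t = 2.0
v_at_2 = [counting_process(occurrence_times, event_types, m, 2.0) for m in (1, 2)]
```

Summing the components of V(t) recovers the total number of events up to time t, whatever their type.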

Ripley’s K-function

The K-function, defined by Ripley (1976)[22], Ripley (1977)[29] is a good indicator for spatial structures (Besag and Diggle (1977)[30],

Cressie (1993)[18], Diggle (1983)[31]).

The probability of finding a neighbor at a given distance r is very important in applications. The neighbors of point i are all the points located at a distance less than or equal to a given value r (that is, the points lying in a circle of radius r centered on the point i). We denote the expected number of neighbors by ν(r). Its estimator, the observed average number of neighbors, is denoted by V(r).

Ripley (1977)[29] showed that:

$$\frac{\nu(r)}{\lambda} = \int_{\rho=0}^{r} g(\rho)\, 2\pi\rho \, d\rho \qquad (1.27)$$

which offers an interpretable measure of the spatial dependence in isotropic stationary point processes.


Thus the K-function is defined by

λK(r) = E[number of extra events within a distance r of an arbitrary event].

The K-function provides an interpretable measure of clustering in a point process. The expected number of pairs of events in a region A with area |A| whose pairwise distance is less than or equal to r is λ²|A|K(r). The K-function is a cumulative function, and its derivative K′(r) yields another interpretable function called “the product density function”, defined by:

$$\lambda^{(2)}(r) = \frac{\lambda^{2} K'(r)}{2\pi r}, \qquad r > 0 \qquad (1.28)$$

Definition 1.22. Ripley (1977)[29] defined the K function as:

$$K(r) = \int_{\rho=0}^{r} g(\rho)\, 2\pi\rho \, d\rho \qquad (1.29)$$

where g(ρ) is the pair-correlation function.

If points are distributed independently of each other, g(ρ) = 1 for all values of ρ, so K(r) = πr². This value is used as a benchmark:

• K(r) > πr² indicates that the average value of g(ρ) is greater than 1. The probability of finding a neighbor at the distance ρ is then greater than the probability of finding a point in the same area anywhere in the domain: points are aggregated.

• Inversely, K(r) < πr² indicates that the average neighbor density is smaller than the average point density on the studied domain. Points are then dispersed.

K(r) is estimated by the ratio of the average number of neighbors over the density, the latter being estimated by the total number of points divided by the domain area:

$$\hat{\lambda} = \frac{N}{|A|} \qquad (1.30)$$

Thus we have:

$$\hat{K}(r) = \frac{\hat{\nu}(r)}{\hat{\lambda}} = \frac{V(r)}{N/|A|} \qquad (1.31)$$

The average number of neighbors can be expressed more explicitly by defining the indicator c(i, j, r) = 1 if the distance between points i and j is at most r, and 0 otherwise:

$$\hat{K}(r) = \frac{1}{\hat{\lambda} N} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} c(i, j, r) \qquad (1.32)$$
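The uncorrected estimator (1.32) can be sketched in a few lines. The function name `k_hat`, the point coordinates and the unit-area window below are assumptions made for illustration. The double loop costs O(N²) and no edge correction is applied.

```python
import math


def k_hat(points, r, area):
    """Uncorrected estimator (1.32): ordered pairs within distance r, divided by lambda_hat * N."""
    n = len(points)
    lam = n / area                      # intensity estimate, as in (1.30)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1              # c(i, j, r) = 1
    return pairs / (lam * n)


pts = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.9)]   # hypothetical pattern on a unit square
k_small = k_hat(pts, 0.05, area=1.0)         # no pair is this close
k_medium = k_hat(pts, 0.2, area=1.0)         # one unordered pair, counted twice
```

Each unordered pair contributes twice to the double sum, once from each of its two points, which matches the ordered-pair sum in (1.32).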

Points located close to the domain borders are problematic because possible neighbors of these points lying outside the domain are not counted. This is the so-called “edge effect”. Ignoring it results in underestimating K. Ripley (1977)[29] proposed to correct the indicator c(i, j, r) introduced in equation (1.32). We denote by L_ir the portion of the circle of radius r centered on the point i that is located inside the domain. If a part of the crown of width dr, inside of which a neighbor is counted, is outside the domain, the neighbor is given a weight equal to the inverse of the ratio between the inside part of the crown (L_ir dr) and the whole crown (2πr dr). The idea is that the outside part of the crown could have contained the same neighbor density as the inside part. The corrected estimator is then given by:

$$\hat{K}(r) = \frac{1}{\hat{\lambda} N} \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \frac{2\pi r}{L_{ir}}\, c(i, j, r) \qquad (1.33)$$

Cressie (1993)[18] also refers to the K-function as the reduced second-order measure. If we assume the process is completely random then the extra number of events within a distance r will be uniform on a disc. From this we see that:

$$\lambda K(r) = \int_{0}^{2\pi} \int_{0}^{r} \{\lambda_2(x)/\lambda\}\, x \, dx \, d\theta = \frac{2\pi}{\lambda} \int_{0}^{r} \lambda_2(x)\, x \, dx \qquad (1.34)$$

and as a result, the second-order intensity function, λ2(·), is:

$$\lambda_2(r) = \frac{\lambda^{2}}{2\pi r}\, \frac{\partial K(r)}{\partial r} \qquad (1.35)$$

The K-function has many appealing features not shared by λ2(r), such as its invariance to random thinning, physical interpretation, and simple estimation (Cressie (1993)[18]). Several approaches for estimation and interpretation of the K-function are given by Cressie (1993)[18] and Diggle et al. (2003)[32]. However, it must be noted that K(r) does not uniquely determine the distribution of a point process. Different point processes can produce identical K-functions (Baddeley and Silverman (1984)[33]). Furthermore, though the K-function is used to analyze second-order properties of a spatial point pattern, it cannot distinguish between deviations from Complete Spatial Randomness (CSR) due to lack of uniformity or lack of independence of events. However, since K(r) is defined only for first-order stationary processes, uniformity is a requirement of K-function analysis.

Under CSR, λ2(r) ≡ λ², thus (1.35) reduces to:

K(r) = πr² (1.36)

Testing for CSR can be done by comparing the empirical K-function to πr². Quite often, inference is based on the L-function defined by:

$$L(r) = \sqrt{K(r)/\pi} \qquad (1.37)$$


Because the probability distribution of the K-function (or the L-function) is intractable, inference about a process is based on simulated K-functions. An envelope is built by simulating the theoretical point process a number of times and computing the K-function for a set of distances r.

For every distance r, the pair (K_min(r), K_max(r)) is stored. The upper and lower envelopes are then overlaid on the observed K-function K̂(r). Inference can be obtained by comparing the observed K-function to the simulated envelope. If K̂(r) > K̄_sim(r) then the number of events within a distance r of an arbitrary event is greater than expected under the hypothesized process. If the hypothesized process is CSR then this would imply an aggregated process if r is small or a regular process if r is large. Conversely, if K̂(r) < K̄_sim(r) then the number of events within a distance r of an arbitrary event is less than expected under the hypothesized process. Reversing our conclusion, we would infer that the observed process is regular if r is small or aggregated if r is large. If K̂(r_0) > K_max(r_0) or K̂(r_0) < K_min(r_0), the hypothesis is rejected for that particular distance r_0.
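The envelope procedure can be sketched as follows for a single distance r, taking CSR on the unit square as the hypothesized process and conditioning on the observed number of points. All names (`k_hat`, `csr_envelope`, `l_function`), the number of simulations and the seed are assumptions for illustration, and the K estimator used here is the uncorrected one of (1.32).

```python
import math
import random


def k_hat(points, r, area=1.0):
    """Uncorrected K estimate (1.32), written as |A| * pairs / N^2."""
    n = len(points)
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j
        and math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= r
    )
    return area * pairs / (n * n)


def csr_envelope(n, r, n_sim=39, seed=0):
    """Simulate n uniform points on the unit square n_sim times; return (K_min(r), K_max(r))."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        sims.append(k_hat(pts, r))
    return min(sims), max(sims)


def l_function(k):
    """L(r) = sqrt(K(r) / pi), as in (1.37)."""
    return math.sqrt(k / math.pi)


k_min, k_max = csr_envelope(n=50, r=0.1)
```

An observed value K̂(r) falling outside (k_min, k_max) would lead to rejecting CSR at that distance. In practice the same loop is repeated over a grid of distances r.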

1.4 Spatial point processes

A spatial point process differs from the first two types of spatial data in that the domain Ω is a random set containing the locations s of events. Whereas interest with geostatistical and lattice data lies in studying the properties of N(s) or E[N(s)], for spatial point processes, studying the properties of the set Ω is the primary goal. Note that we can write:

$$N(s) = \begin{cases} 1, & \text{if } s \in \Omega \\ 0, & \text{if } s \notin \Omega \end{cases} \qquad (1.38)$$


Definition 1.23. A process where N(s) = 1,∀ s ∈ Ω is called a

simple point process to emphasize that only the random locations at which events occur are of interest.

Furthermore, point processes are considered to be orderly in the sense that:

$$\lim_{|ds| \to 0} P(N(ds) > 1) = 0$$

where ds is an infinitesimal disk at location s with area (volume) |ds| and N(ds) denotes the number of events in the disk. In other words, only those processes where any given location can record at most one event are considered.

For geostatistical and lattice data, a weakly stationary process is defined similarly to a second-order (or weakly) stationary temporal process. Specifically, a spatial process is weakly stationary if:

E[N(s)] = µ (1.39)

and

Cov[N(si), N(sj)] = c(r) (1.40)

where r = si −sj is a two-dimensional vector containing the shift in

location from site si to site sj . This is analogous to weak temporal

stationarity in that the stochastic process is location invariant and self-replicating.

For spatial point patterns, weak stationarity is defined through the first and second-order intensities.

Definition 1.24. The first-order intensity of a spatial point process is defined as the expected number of points per unit area:

$$\lambda(s) = \lim_{|ds| \to 0} \frac{E[N(ds)]}{|ds|} \qquad (1.41)$$

Here ds is an infinitesimal region containing the site s, N(ds) represents the number of points located in ds, and |ds| is the area (volume) of the region ds. Throughout the text, the notation | · | will represent the Lebesgue measure on the spatial region Ω.

Definition 1.25. The second-order intensity is a measure of the dependency structure of the events in Ω and is given by:

$$\lambda_2(s_i, s_j) = \lim_{|ds_i| \to 0,\ |ds_j| \to 0} \frac{E[N(ds_i)\, N(ds_j)]}{|ds_i|\,|ds_j|} \qquad (1.42)$$

The second-order intensity λ2(s_i, s_j) contains information about the stochastic dependence between events in two regions. Although λ2(·,·) is akin to c(·) defined previously, it is not a covariance function.

As with the other types of spatial data, a spatial point process is weakly stationary if the process is location invariant. This is equivalent to saying that λ(s) ≡ λ, so that the expected number of events at an arbitrary location s is constant for all s ∈ Ω, and that λ2(s_i, s_j) = λ2(r), so that the dependence between events at two arbitrary locations s_i and s_j depends only on the distance r.

A spatial process differs from a temporal process in that its observations typically cannot be ordered. However, if a spatial process is weakly stationary, we can write the covariance between any two observations as a function of the distance between them. We can define two types of weakly stationary covariance functions: anisotropic and isotropic ones. An anisotropic process has a covariance function defined by (1.40) but covariances that differ with direction. In an isotropic process the covariance function does not depend on direction, and r can be replaced by ‖s_i − s_j‖, the Euclidean distance between s_i and s_j. An isotropic process is thus invariant under coordinate shifts and rotations.

Definition 1.26. Stationarity and isotropy mean that:

N +s = {xi +s} and rN = {rxi}

have the same distribution as N for any s ∈ Rd and any (Euclidean) rotation r around the origin, respectively.

Spatial point patterns can be classified into three classes:

- Regular processes

- Aggregated processes

- Complete spatial randomness

All three types can be observed with isotropic or anisotropic covariance functions (Cressie (1993)[18]).

1.4.1 Regular processes

The simplest type of regular process is one that does not allow two events to be within a distance r of each other (Cressie (1993)[18]). Other subclasses of regular processes are listed by Cressie (1993)[18] and are also discussed by Matérn (1986)[34], Stoyan and Stoyan (1994)[23], and Bartlett (1975)[19].


An example of a regular pattern is shown in Figure 1.2. Two hundred events were generated using an inhibition radius (r = 0.05) on a unit square.
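One common way to produce such a pattern is simple sequential inhibition: uniform proposals are accepted only if they respect the inhibition radius. The sketch below uses that construction as an assumption; it is not necessarily the procedure used to produce Figure 1.2, and the function name, seed and `max_tries` safeguard are hypothetical.

```python
import math
import random


def sequential_inhibition(n, r, seed=0, max_tries=100_000):
    """Propose uniform points on the unit square; keep a point only if it is
    at least r away from every point kept so far."""
    rng = random.Random(seed)
    kept = []
    tries = 0
    while len(kept) < n and tries < max_tries:
        tries += 1
        x, y = rng.random(), rng.random()
        if all(math.hypot(x - px, y - py) >= r for px, py in kept):
            kept.append((x, y))
    return kept   # may hold fewer than n points if max_tries is exhausted


pattern = sequential_inhibition(n=50, r=0.05, seed=3)
```

The `max_tries` bound matters because, for a large n or r, the window can fill up before n points have been placed.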

1.4.2 Cluster processes

Aggregated, or clustered, processes include the Poisson cluster process, the Neyman-Scott process, and the Cox process. A general aggregated point pattern can be thought of as a parent-offspring process where offspring are dispersed around a parent event. The Poisson cluster process is generated by first obtaining parent events from a homogeneous Poisson process with mean measure µ_p, where µ_p is the expected number of parents. Each parent event produces a random number of offspring positioned around the parent according to some bivariate probability density. The parent events are then removed, leaving only the offspring point process.

Figure 1.3: A Poisson cluster process

Figure 1.3 shows an example of two hundred events in a Poisson cluster process on a unit square with µ_p = 15, where the offspring events are positioned around their parents according to a bivariate normal distribution with radius 0.075.
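The parent-offspring construction can be sketched as below. Several choices are assumptions not fixed by the text: the number of offspring per parent is taken as Poisson with a hypothetical mean of 13, the "radius" 0.075 is read as the standard deviation of each coordinate, and offspring falling outside the unit square are dropped. `poisson_draw` is a small helper using Knuth's multiplication method.

```python
import math
import random


def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_cluster(mu_p=15, mean_offspring=13, sigma=0.075, seed=0):
    """Offspring of a Poisson cluster process, restricted to the unit square."""
    rng = random.Random(seed)
    offspring = []
    for _ in range(poisson_draw(rng, mu_p)):            # number of parents
        px, py = rng.random(), rng.random()             # parent location, uniform
        for _ in range(poisson_draw(rng, mean_offspring)):
            x, y = rng.gauss(px, sigma), rng.gauss(py, sigma)
            if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
                offspring.append((x, y))                # parents themselves are discarded
    return offspring


cluster_pattern = poisson_cluster(seed=1)
```

With these settings the expected number of offspring before truncation is 15 × 13 = 195, close to the two hundred events mentioned for Figure 1.3.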


1.4.3 Complete spatial randomness

A CSR process is one that generates events uniformly and independently in a region A with area |A|. The number of events N(A), A ⊂ Ω, follows a Poisson distribution with mean λ|A|, where λ is the average number of events per unit area. Further, given n events s_i ∈ A, the s_i are independent realizations from a uniform distribution on A.
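The two-step description above (a Poisson number of events, then uniform locations) translates directly into a simulation on a rectangular window. The function names, the window and the seed are assumptions for illustration. `binomial_pattern` fixes the number of points instead of drawing it.

```python
import math
import random


def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def csr_pattern(lam, width=1.0, height=1.0, seed=0):
    """N(A) ~ Poisson(lam * |A|); given N(A), locations are i.i.d. uniform on A."""
    rng = random.Random(seed)
    n = poisson_draw(rng, lam * width * height)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


def binomial_pattern(n, width=1.0, height=1.0, seed=0):
    """Exactly n i.i.d. uniform locations on A (the number of points is fixed)."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


csr_points = csr_pattern(lam=200, seed=2)
fixed_points = binomial_pattern(200, seed=2)
```

The only difference between the two generators is whether the number of points is random or fixed in advance.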

Figure 1.4: A CSR pattern

Figure 1.4 shows a CSR process with n = 200. Note that conditioning on n yields a Binomial process. The importance of the CSR hypothesis lies in the fact that it is often used as a null hypothesis for testing a spatial point pattern. Under CSR, a spatial point pattern has no structure, and thus failing to reject such a hypothesis warrants no further examination of the data (Diggle (1983)[31]). A CSR spatial point pattern implies uniformity of events (E[N(A)] = λ|A|) as well as independence:


1.5 Marked spatial point processes

A flexible marked point process of the form:

Nm = {[xi;m(xi)]}

can be built starting from an unmarked point process N = {xi} and providing each point xi ∈ N with a real-valued mark m(xi). This procedure is called “marking”.

Definition 1.27. Given a point process N a marked point process

Nm = {[xi;m(xi)]} is obtained if each point xi ∈ N is provided with

a random variable m(xi) called mark.

The focus here is on intensity-dependence, which means that the local point density affects the marks. For example, intensity-dependence allows the marks to be large (small) in areas of low point intensity and small (large) in areas of high intensity.

The simplest marking strategy is independent marking, where the marks are drawn for each point xi from a probability distribution

independent of each other and independent of the point process. This model is often used as a reference model. In geostatistical marking (Mase (1996)[35]; Schlather et al. (2004)[36]; Illian et al. (2008)[37]) the marks are drawn from a random field {U(s)} which is independent of the point process N: the marks are m(xi) =U(xi).

This marking generates correlated marks but is not able to model intensity-dependence by construction.
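The difference between independent and geostatistical marking can be illustrated with a toy example. Everything in the sketch is hypothetical: the standard normal mark distribution, the point locations, and in particular `toy_random_field`, which is a smooth function with random phases standing in for {U(s)} and not a proper Gaussian random field.

```python
import math
import random


def independent_marks(points, rng):
    """Independent marking: i.i.d. marks, unrelated to the locations."""
    return [rng.gauss(0.0, 1.0) for _ in points]


def toy_random_field(rng):
    """A smooth random surface U(s) with random phases (illustrative stand-in only)."""
    phase_x = rng.uniform(0.0, 2.0 * math.pi)
    phase_y = rng.uniform(0.0, 2.0 * math.pi)

    def field(x, y):
        return math.sin(2.0 * math.pi * x + phase_x) + math.cos(2.0 * math.pi * y + phase_y)

    return field


def geostatistical_marks(points, field):
    """Geostatistical marking: m(x_i) = U(x_i), with U independent of the point pattern."""
    return [field(x, y) for x, y in points]


rng = random.Random(0)
points = [(0.1, 0.1), (0.1, 0.1000001), (0.9, 0.4)]   # two nearly coincident points
field = toy_random_field(rng)
geo_marks = geostatistical_marks(points, field)
iid_marks = independent_marks(points, rng)
```

Nearby points receive nearly equal marks under geostatistical marking, which is the mark correlation described above. Neither scheme looks at the local point density, so neither can express intensity-dependence.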

A step forward is intensity-dependent marking suggested by Stoyan (2008)[38] and Myllymäki (2006)[39] for the stationary log Gaussian Cox process generated by a random intensity {Λ(s)}. In these markings the mean of the conditional mark distribution of m(xi)


independent given the intensity {Λ(s)}. Although the marks are conditionally independent, they are marginally correlated. The log Gaussian Cox process as a point process model is a natural choice for two reasons. First, intensity-dependent marking presumes the existence of local variation in the point intensity, and thus, only clustered or heterogeneous point process models are relevant. Cox processes are such models, see Møller et al. (1998)[40] and Møller and Waagepetersen (2004)[41]. Second, the log Gaussian Cox process is a flexible model with nice theoretical properties. The existing intensity-dependent markings are useful models but assume that the variance of the conditional mark distribution does not depend on the point intensity.

In this case the intensity function λ = E(Λ(s)), that gives the mean number of points per unit volume, can be used to write

λ² g(‖o − r‖) do dr

which gives the probability that two infinitesimal disjoint regions of volumes do and dr both contain exactly one point of N. For further details see e.g. Stoyan and Stoyan (1994)[23], Stoyan et al. (1995)[24] or Illian et al. (2008)[37].

Stationarity means that the translated process {[xi + s; m(xi)]} has the same distribution as Nm. Note that the marks are kept untouched in the translation. Isotropy is defined in a similar way when translation is replaced by rotation around the origin.

Various first-order and second-order mark characteristics have been suggested to describe the properties of marked point processes (Stoyan and Stoyan (1994)[23]; Stoyan et al. (1995)[24]; Schlather (1999)[42]; Schlather et al. (2004)[36] and Illian et al. (2008)[37]). Their empirical counterparts are used in model identification, parameter estimation, evaluation of goodness-of-fit and in model interpretation.


These mark characteristics are conditional quantities (in the Palm sense): let E_x and var_x stand for the conditional expectation and variance, respectively, given that there is a point of N at the location x. Further, let E_xy refer to the conditional expectation given that there are two points of N at locations x and y. Because of stationarity and isotropy, it suffices to consider expectations E_o and E_or with ‖r‖ = r.

Then, the mean mark µ_m = E_o(m(o)) and the mark variance σ_m² = Var_o(m(o)), which are the mean and variance of the mark distribution function F_M(m), are first-order characteristics of marks and are conditional on “there is a point of N at o”.

1.6 Poisson processes

Many processes in everyday life that “count” events up to a particular point in time can be accurately described by the so-called Poisson process, named after the French scientist Siméon Poisson (1781-1840; appointed as full professor at the Ecole Polytechnique, Paris, in 1806 as a successor of Fourier). An (ordinary) Poisson process is a special Markov process, in continuous time, in which the only possible jumps are to the next higher state.

The simplest Poisson process is the stationary Poisson process on the line, which is completely defined by the following equation, in which we use N(a_i, b_i] to denote the number of events of the process falling in the half-open interval (a_i, b_i] with a_i < b_i ≤ a_{i+1}:

$$\Pr\{N(a_i, b_i] = n_i,\ i = 1, \ldots, k\} = \prod_{i=1}^{k} \frac{[\lambda(b_i - a_i)]^{n_i}}{n_i!}\, e^{-\lambda(b_i - a_i)}. \qquad (1.44)$$

This definition embodies three important features:

(i) the number of points in each finite interval (a_i, b_i] has a Poisson distribution;

(ii) the numbers of points in disjoint intervals are independent random variables; and

(iii) the distributions are stationary: they depend only on the lengths b_i − a_i of the intervals.

Thus, the joint distributions are multivariate Poisson of the special type in which the variates are independent.

Let us first summarize a number of properties that follow directly from the above. The mean M(a, b] and variance V(a, b] of the number of points falling in the interval (a, b] are given by:

M(a, b] = λ(b−a) = V(a, b]. (1.45)

The constant λ here can be interpreted as the mean rate or mean density of points of the process. It also coincides with the intensity of the point process.

The fact that the mean and variance are equal, and that both are proportional to the length of the interval, provides a useful diagnostic test for the stationary Poisson process: estimate the mean M(a, b] and the variance V(a, b] for half-open intervals (a, b] over a range of different lengths, and plot the ratios V(a, b]/(b − a). The estimated ratios should be approximately constant for a stationary Poisson process and equal to the mean rate. Any systematic departure from this constant value indicates a departure either from the Poisson assumption or from stationarity (see Cox and Lewis (1966) [43], Section 6.3, for more discussion).
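The diagnostic described above can be sketched numerically as follows. The code simulates event times and tabulates mean/length and variance/length for windows of several lengths. The simulation uses independent exponential gaps between events, a standard construction that is assumed here rather than taken from this section. The rate, horizon, seed and function names are hypothetical.

```python
import math
import random
from statistics import fmean, variance

def simulate_poisson_times(rate, horizon, rng):
    """Event times on (0, horizon], built from exponential gaps (assumed construction)."""
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def window_counts(times, horizon, length):
    """Counts in consecutive half-open windows (k*length, (k+1)*length]."""
    n_windows = int(horizon // length)
    counts = [0] * n_windows
    for t in times:
        k = math.ceil(t / length) - 1  # index of the window containing t
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts

def dispersion_table(times, horizon, lengths):
    """Map each window length to (mean/length, variance/length) of the counts."""
    table = {}
    for length in lengths:
        counts = window_counts(times, horizon, length)
        table[length] = (fmean(counts) / length, variance(counts) / length)
    return table

rng = random.Random(12345)   # fixed seed so the run is reproducible
rate, horizon = 2.0, 20000.0
times = simulate_poisson_times(rate, horizon, rng)
table = dispersion_table(times, horizon, [1.0, 2.0, 5.0])
for length, (m, v) in table.items():
    print(f"length={length}: mean/length={m:.3f}, variance/length={v:.3f}")
```

For simulated Poisson data, both columns should stay close to the chosen rate at every window length. A perfectly regular pattern would instead give a variance ratio near zero, and a clustered pattern would give a ratio above the rate.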

A Poisson process may also be viewed as a counting process that has particular, desirable properties. A counting process {N(t), t ≥ 0} is a stochastic process that counts the number of events that have occurred up to time t. Obviously, N(t) is non-negative and integer-valued for all t ≥ 0. Furthermore, N(t) is nondecreasing in t, and N(t_{i+1}) − N(t_i) equals the number of events in the time interval (t_i, t_{i+1}], t_i < t_{i+1}.
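A counting process can be illustrated directly from a list of event times. In the sketch below, N(t) is the number of events with time at most t, and the increment N(t_{i+1}) − N(t_i) counts the events in (t_i, t_{i+1}]. The event times, the grid and the function names are hypothetical.

```python
from bisect import bisect_right

def count_up_to(event_times, t):
    """N(t): number of events with time <= t (event_times must be sorted)."""
    return bisect_right(event_times, t)

def increments(event_times, grid):
    """N(t_{i+1}) - N(t_i): number of events in each interval (t_i, t_{i+1}]."""
    return [count_up_to(event_times, hi) - count_up_to(event_times, lo)
            for lo, hi in zip(grid, grid[1:])]

events = [0.4, 1.1, 1.7, 3.2, 3.9]   # hypothetical sorted event times
grid = [0.0, 1.0, 2.0, 3.0, 4.0]

path = [count_up_to(events, t) for t in grid]   # values N(t) on the grid
steps = increments(events, grid)                # counts per grid interval
print(path, steps)
```

The path is non-negative, integer-valued and nondecreasing, and the increments sum to N(4) − N(0).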

N(t) could denote the number of arrivals of customers at a railway station in (0, t], or the number of accidents on a particular highway in that time interval, or the number of births of animals in a particular zoo in (0, t], or the number of calls to a telephone call-center during that period. A Poisson process is a counting process that has the desirable additional properties that the numbers of events in disjoint intervals are independent (“independent increments”) and that the number of events in any given interval depends only on the length of that interval, and not on its particular position in time (“stationary increments”). In the case of the arrivals at the railway station, the stationarity assumption is clearly not fulfilled; there will be many more arrivals between 5 p.m. and 6 p.m. than between, say, 5 a.m. and 6 a.m. Still, one might wish to study the arrival process at the railway station during the rush hour. Restricting oneself to subsequent working days between 5 p.m. and 6 p.m. does allow one to use the stationary increments assumption.

Definition 1.28. A Poisson process {N(t), t ≥ 0} is a counting process with the following additional properties:

(i) N(0) = 0.

(ii) The process has stationary and independent increments.

(iii) P(N(h) = 1) = λh + o(h) and P(N(h) ≥ 2) = o(h) as h ↓ 0, for some λ > 0.

Above, the o(h) symbol indicates that the ratio P(N(h) ≥ 2)/h tends to zero as h ↓ 0.
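The behaviour of the two probabilities in property (iii) can be checked numerically. The sketch below assumes the well-known consequence of the definition that N(h) is Poisson distributed with mean λh, and prints P(N(h) = 1)/h and P(N(h) ≥ 2)/h for shrinking h. The intensity value and the function names are hypothetical.

```python
import math

def prob_one_event(rate, h):
    """P(N(h) = 1), assuming N(h) is Poisson with mean rate * h."""
    return rate * h * math.exp(-rate * h)

def prob_two_or_more(rate, h):
    """P(N(h) >= 2) = 1 - P(N(h) = 0) - P(N(h) = 1)."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return -math.expm1(-rate * h) - prob_one_event(rate, h)

rate = 3.0  # hypothetical intensity
ratios = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    r1 = prob_one_event(rate, h) / h      # should approach rate
    r2 = prob_two_or_more(rate, h) / h    # should approach 0
    ratios.append((h, r1, r2))
    print(f"h={h:g}: P(N=1)/h={r1:.6f}, P(N>=2)/h={r2:.6f}")
```

Under this assumption the first ratio approaches λ while the second shrinks roughly like λ²h/2, which is what the o(h) statement expresses.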

The last property may look awkward at first sight, but is insightful. It states that having two or more events in a small time interval