• No results found

Layered continuous time processes in biology

N/A
N/A
Protected

Academic year: 2021

Share "Layered continuous time processes in biology"

Copied!
21
0
0

Loading.... (view fulltext now)

Full text

(1)

Layered continuous time processes in biology

Combining causal

statistical time series with

fossil measurements.

Tore Schweder and Trond Reitan University of Oslo

Jorijntje Henderiks University of Uppsala

BISP7, Madrid 2011

(2)

Overview

1. Introduction - Motivating example: • Coccolith data (microfossils)

• Phenotypic evolution: irregular time series related by (possibly common) latent processes

2. Causality in continuous time processes

3. Stochastic differential equation vector processes • Ito representation and diagonalization

• Tracking processes and hidden layers • Kalman filtering

• Model variants 4. Inference and results

• Bayesian inference on model properties, models, process parameters and process states

• Results for the coccolith data.

5. Second application: Phenotypic evolution on a phylogenetic tree • Primates - preliminary results

6. Conclusion

(3)

The size of a single cell algae (

Coccolithus

) is

measured by the diameter of its fossilized coccoliths

(calcite platelets). Want to model the evolution of a

lineage found at six sites.

•Phenotype (size) is a process in continuous time

and changes continuously.

Irregular time series related by latent processes:

evolution of body size in

Coccolithus

Henderiks – Schweder - Reitan

19,899 coccolith measurements, 205

sediment samples (1< n < 400) of body

size by site and time (0 to -60 my).

(4)

Our data – Coccolith size measurements

205 Sample mean log coccolith size (1 < n < 400) by time and site.

(5)

Evolution of size distribution

Fitness = expected number of

reproducing offspring.

The population tracks the

fitness curve (natural selection)

The fitness curve moves about,

the population follow.

With a known fitness,

µ

, the

mean phenotype should be an

Ornstein-Uhlenbeck process

(Lande 1976).

With fitness as a process,

µ(t)

,

,

we can make a tracking model:

5

(

)

(

)

)

(

t

X

t

dt

dW

t

dX

(

)

(

)

(

)

)

(

t

X

t

t

dt

dW

t

dX

(6)

The Ornstein-Uhlenbeck process

Attributes: • Stationary  Normally distributed  long-term level:   Standard deviation: s=/ •Markovian  α: pull  corr(x(0),x(t))=e-t

 Time for the

correlation to drop to 1/e: t =1/α

1.96 s-1.96 s

t

The parameters (, t, s) can be estimated from the data. In this case: 1.99,

t=1/α0.80Myr, s0.12. 6

(

)

(

)

)

(

t

X

t

dt

dW

t

(7)

One layer tracking another

Black process (t1=1/1=0.2,

s1=2) tracking red process (t2=1/2=2, s2=1)

Auto-correlation of the upper (black) process, compared to a one-layered SDE model.

A slow-tracking-fast can always be re-scaled to a fast-tracking slow process. Impose identifying restriction: 1 2 7

(

)

(

)

)

(

)

(

)

(

)

(

)

(

2 2 2 2 2 1 1 2 1 1 1

t

dW

dt

t

X

t

dX

t

dW

dt

t

X

t

X

t

dX

(8)

Process layers - illustration

Layer 1 – local phenotypic expression External series Stationary expectancy

Observations

T

Layer 2 – local fitness optimum

Layer 3 – hidden climate variations or primary optimum

(9)

Causality in SDE processes

We want to express in process terms the fact that climate

affects the phenotypic optimum which again affects the

actual phenotype.

Local independence (

Schweder 1970

):

Context: Composable continuous time processes.

Component A is locally independent of B iff transition

prob. of A does not depend on B.

If B is locally dependent on A but not vice versa, we

write A→B.

Local dependence can form a notion of causality in cont. time

processes the same way Granger causality does for

discrete time processes.

(10)

Stochastic differential equation (SDE)

vector processes

10 ) ( )) ( ) ( ( ) ( matrix diffusion l dimensiona matrix pull l dimensiona : matrix regression a process, level l dimensiona : ) ( ) ( motion) (Brownian process ener vector Wi l dimensiona : ) ( process state vector l dimensiona : ) ( t dW dt t t AX t dX q p p p A r p p t u t q t W p t X              matrix. r eigenvecto left the is , of matrix eigenvalue diagonal a is existing) (when ! / ) ( ) ( ) ( ) ( ) ( ) ( ) 0 ( ) ( ) ( : solution Ito 0 1 0 0 1 V A Λ V e V n t A e t B u dW u B du u u B X t B t X n t n n At t t

                     

(11)

Stochastic differential equation (SDE)

vector processes

11                                             L L L L L L L L L L L L L t dB dt X X t dB dt X X t dB dt X X X X               0 0 0 0 0 0 0 0 0 0 0 0 0 A : this like looking matrix pull a yield will this form, al in vectori Expressed ) ( ) ( dX ) ( ) ( dX ) ( ) ( dX processes tracking like -OU of system a as this express would we X structure causal the have want to we If : SDEs linear in structure Causal 1 1 2 2 1 1 1 1 2 1 1 1 1 1 1 1 1 -L L 1 2 1 L            

The zeros in this matrix determines the local independencies.

(12)

Model variants for Coccolith evolution

Model variations:

1, 2 or 3 layers

(possibly more)

Inclusion of external time series or extra internal time

series in connection to the one we are modelling

In a single layer:

Local or global parameters

Correlation between sites (inter-regional correlation)

Deterministic response to the lower layer

Random walk (no tracking)

(13)

Likelihood: Kalman filter

13

The Ito solution

gives, together with measurement variances, what is needed to calculate the likelihood using the Kalman filter:

and

t t

u

dW

u

B

du

u

u

B

X

t

B

t

X

0 0 1

)

(

)

(

)

(

)

(

)

0

(

)

(

)

(

i i i z z m z E( | 1,, 1)  Var(zi | z1,, zi1)  vi

)

(

)

(

)

(

t

1

X

t

2

X

t

n

X

1

z

z

2

z

n Process Observations

Need a linear, normal Markov chain with independent normal observations:

)

,

,

|

(

)

|

(

i 1 N i i i

n

v

m

z

f

z

f

(14)

Kalman smoothing (state inference)

A tracking model with 3 layers and fixed parameters

North Atlantic

Red curve: expectancy Black curve: realization Green curve: uncertainty

Snapshot:

(15)

Parameter and model inference

Wide but informative prior distributions respecting identifying

restrictions

MCMC for parameter inference on individual models:

MCMC samples+state samples from Kalman smoother ->

Possible to do inference on the

process state conditioned only

on the data.

Model likelihood (using an importance sampler) for model

comparison (posterior model probabilities).

Posterior weight of a property C

from model likelihoods:

15

   ' ' ' ) | ( 1 ) | ( 1 ) ( C C M C C M C M data f n M data f n C W

(16)

Bayesian inference on the 723 models

16       ' ' ' ) | ( 1 ) | ( 1 ) ( C C M C C M C M data f n M data f n C W

(17)

Results

Best 5 models in good

agreement. (together, 19.7% of

summed integrated likelihood):

Three layers.

Common expectancy in

bottom layer .

No impact of exogenous

temperature series.

Lowest layer: Inter-regional

correlations,

 

0.5.

Site-specific pull.

Middle layer: Intermediate

tracking.

Upper layer: Very fast

tracking.

Middle layers: fitness optima

Top layer: population mean log size

(18)

Phenotypic evolution on a phylogenetic tree:

Body size of primates

18

model. SDE the from s expression parametric as matrix covariance and r mean vecto with d distribute normally -multi be will ) , 0 ( ) ( ns Observatio . given tree for the equalities on these Condition . when ) ( ) ( . from split when Time : tree ic Phylogenet coccoliths for as model tracking same optimum primary t, environmen underlying ) ( optimum fitness ) ( mass) (log phenotype mean ) ( fossils) 117 ts, measuremen extant (92 209 , , 1 ts Measuremen 2 , , , , , , 1 , 2 , 3 , 3 , 2 , 1 , k k k i k i j i l j l i j i i i i i i i N t X Z T t t X t X j i T X X X t X t X t X i             

(19)

First results for some primates

(20)

Why linear SDE processes?

 Parsimonious: Simplest way of having a stochastic continuous time process that can track something else.

 Tractable: The likelihood, L()  f(Data | ), can be analytically

calculated by the Kalman filter or directly by the parameterized multi-normal model for the observations. ( = model parameter set)

 Can code for causal structure (local dependence/independence).

 Some justification in biology, see Lande (1976), Estes and Arnold (2007), Hansen (1997), Hansen et. al (2008).

 Great flexibility, widely applicable...

 Thinking and modelling might then be more natural in continuous time.

 Allows for varying observation frequency, missing data and observations arbitrary spaced in time.

 Extensions:

 Non-linear SDE models…

 Non-Gaussian instantaneous stochasticity (jump processes).

(21)

Bibliography

 Commenges D and Gégout-Petit A (2009), A general dynamical statistical model with causal interpretation. J.R. Statist. Soc. B, 71, 719-736

 Lande R (1976), Natural Selection and Random Genetic Drift in Phenotypic Evolution, Evolution 30, 314-334

 Hansen TF (1997), Stabilizing Selection and the Comparative Analysis of Adaptation, Evolution, 51-5, 1341-1351

 Estes S, Arnold SJ (2007), Resolving the Paradox of Stasis:

Models with Stabilizing Selection Explain Evolutionary Divergence on All Timescales, The American Naturalist, 169-2, 227-244

 Hansen TF, Pienaar J, Orzack SH (2008), A Comparative Method for Studying Adaptation to a Randomly Evolving Environment,

Evolution 62-8, 1965-1977

 Schuss Z (1980). Theory and Applications of Stochastic Differential

Equations. John Wiley and Sons, Inc., New York.

 Schweder T (1970). Decomposable Markov Processes. J. Applied

Prob. 7, 400–410

Source codes, examples files and supplementary information

can be found at http://folk.uio.no/trondr/layered/.

21

References

Related documents

Principal component analysis (PCA) was used to obtain main cognitive dimensions (based on the continuous neurocognitive test variables) and MCA to detect and

The price shall include the cost of labor, materials, equipment usage, utilities, and fuel for: [preparation of the Bench-Scale Test Plan] [collecting samples,] [sample

This implies employment in the retail of second hand goods sector, employment in repair activities by employment in the repair of machinery and equipment sectors and the repair of

This study used historical data to study the impact of the independent variable advances, on the dependent variables which were the gross &amp; net NPA% of a midsized

The addition of calcium sulfate alleviated the growth reduction in DN 1 caused by NaCl and in Pima Cobalt caused by Na 2 SO 4 salinity.

• Ensure to load the BIOS default settings to ensure system compatibility and stability. Select the Load Optimized Defaults item under the Exit menu. See section 2.10 Exit.. Menu

Depending on the specifics of the NoSQL architecture, this loosening of ACID consistency enables the database to provide high performance while scaling large data sets

For the purpose of Big Data and Analytics, this layer calls out several types of applications that support analytical, intelligence gathering, and performance management