• No results found

Spatial Semi-Parametric Bootstrap Method for Analysis of Kriging Predictor of Random Field

N/A
N/A
Protected

Academic year: 2021

Share "Spatial Semi-Parametric Bootstrap Method for Analysis of Kriging Predictor of Random Field"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

Procedia Environmental Sciences 3 (2011) 14–19

1878-0296 © 2011 Published by Elsevier doi:10.1016/j.proenv.2011.02.015

Procedia Environmental Sciences 3 (2011) 81–86

Procedia

Environmental

Sciences

Procedia Environmental Sciences 00 (2011) 000–000

www.elsevier.com/locate/procedia

1

st

Conference on Spatial Statistics 2011

Spatial Semi-Parametric Bootstrap Method for Analysis of

Kriging Predictor of Random Field

N. Iranpanah

a

, A. Mansourian

b

, B. Tashayo

b

, F. Haghighi

c

a*

aDepartment of statistics, University of Isfahan, Isfahan, Iran

bFaculty of Geodesy and Geomatics Engineering, K.N.Toosi University of Technology, Tehran, Iran cDepartment of Statistics, Facultiy of Mathematics, Alzahra University, Tehran, Iran

Abstract

In the spatial statistics it is often assumed that the data follow a Gaussian random field. Efron introduced bootstrap method for independent data analysis (IIDB) but it can not be applied in spatial data analysis because of dependency of observations. In this paper, an algorithm is given for spatial semi-parametric bootstrap (SSPB) method to estimate the precision measures of plug-in kriging predictor of random field. We also compare IIDB and SSPB methods for analysis of plug-in kriging predictor in a Monte-Carlo simulation study. Finally, we use SSPB method for analysis of finite strain data in geology.

© 2010 Published by Elsevier Ltd.

Selection and/or peer-review under responsibility of [name organizer]

Keywords: Kriging; Bootstrap; Spatial semi-parametric bootstrap; Monte-Carlo simulation; Finite strain data.

1. Introduction

Spatial data can be thought of as a realization of random field. Spatial models (Cressie, 1993) have been extensively used in many disciplines that works with dependent data collected from different spatial locations. Determination of the spatial correlation structure of the data and prediction are two important problems in statistical analysis of spatial data. To do so a parametric variogram model is often fitted to the empirical variogram of the data. Since, there is no closed form for the variogram parameters estimators, they are usually computed numerically. In addition, when data behave as realizations from a non-Gaussian random field, the bootstrap method can be used for statistical inference of spatial data.

The bootstrap technique (Efron, 1979; Efron and Tibshirani, 1991) is a very general method to measure the accuracy of statistical estimator, in particular for parameter estimation from independent identically

* Corresponding author. Tel.: +98-311-7934593; fax: +98-311-7934601. E-mail address: [email protected].

Open access under CC BY-NC-ND license.

(2)

distributed variables. This is an important aspect of the bootstrap in the dependent case as the problem of model misspecification is more prevalent under dependence and traditional statistical methods are often very sensitive to deviations from model assumptions.

Usually, the IIDB is applied for analysis of dependent data (e.g. time series and spatial data) incorrectly. For spatially dependent data, block bootstrap and semi-parametric bootstrap methods can be used. Like the IIDB for independent data, block bootstrap provides tools for statistical analysis of dependent data, such as spatial data (Hall, 1985; Bühlman and Künsch, 1999 and Lahiri, 2003) without requiring stringent structural assumptions. The semi-parametric bootstrap method has been proposed by Freedman and Peters (1984) for the linear models and by Bose (1988) for autoregression models in time series. Iranpanah et al. (2009) introduced spatial semi-parametric bootstrap (SSPB) method for spatial data analysis.

In this paper, we first propose the SSPB method to estimate the precision measures of plug-in kriging predictor. Then, the SSPB method is compared with IIDB method in a Monte-Carlo simulation study. It is shown that SSPB method is more accurate than the IIDB for the analysis of spatial data. Finally, the SSPB method is used to estimate the bias, variance and distribution of the plug-in kriging predictor and variogram parameter estimators for the analysis of the finite strain data in geology.

2. Spatial Statistics

Usually a random field { ( ) :Z s sD}is used for modeling a spatial data, where the index set D is a subset of Euclidean space Rd , d t1. Suppose ( ( ),..., (1 ))T

N

z s z s

= is a vector of observations of a second order stationary random field =(.) with constant unknown mean P E Z s[ ( )] and covariogram.V( )h Cov Z s Z s[ ( ), ( h s h)]; , D. At a given location s0D the best linear unbiased predictor for Z s( )0 namely the ordinary kriging predictor is Z sˆ ( )0 OTz with kriging predictor

variance 2( )0 [ ( )ˆ 0 ( )]0 2 (0) T k s E Z s Z s m V  V O V , where OT (V 1m)T 61 and

1 T 1

( T 1 ) 1 m 1 6V 1 61 , where 1 (1,...,1)T , ( ( 0 1),..., ( 0 ))T N s s s s V V  V  and 6 is an

N uN matrix with ( , )i j th element given by ( )

i j

s s

V  (Cressie, 1993).

In reality, the covariogram is unknown and needs to be estimated based on the observations. An empirical estimator of covariogram is defined as Vˆ( )h Nh1 N h( )[( ( )Z s Z Z s h)( ( ) Z)]

 ¦    , where 1 1 ( ) N i i

Z N ¦ Z s is sample mean, N h( ) {( , ) :s si j si sj h i j; , 1,...,N and Nh is the number of elements of ( )N h . To fit a valid parametric covariogram model ( ; )V hT such as exponential,

spherical, Gaussian, linear models (Journel and Hüijbregts, 1978), several methods such as maximum likelihood (ML), restricted maximum likelihood (REML), ordinary least squares (OLS) and generalized least squares (GLS) can be applied to estimate T . Kent and Mardia (1996) also introduced the spectral and circulant approximations to the likelihood for stationary Gaussian random fields and Kent and Mohammadzadeh (1999) obtained a spectral approximation to the likelihood for an intrinsic random field. These approximations can be used to compute ˆT numerically.

The plug-in kriging predictor and its variance are determined by using ˆT instead of T in covariance function Vˆ( , )s si j V( , ; )s si j Tˆ , respectively as (1) 2 2 0 0 0 0 ˆ ˆ ˆ ˆ( ) ˆ( ; ); ˆk( ) k( ; ) Z s Z s T V s V s T

Mardia, Southworth and Taylor (1999) discussed the bias of maximum likelihood estimators. Under the assumption that (.)Z is Gaussian, Zimmerman and Cressie (1992) show that

2 2 2

0 ˆ 0 ˆ 0 ˆ 0

[ k( ; )] k( ) [ ( ; ) ( )] ,

E V s T dV s dE Z s T Z s

Where ˆT is ML estimator of T . We estimate variance of the plug-in kriging predictor 2

0 ˆˆ 0

( )s Var Z s( ( )) V

(3)

3. IID Bootstrap

Assume that X1,...,Xn be n iid random variables with common unknown distribution F. Suppose, 1

{ ,...,X Xn}

X denotes the random data and let

T

t

( ; )

X

F

, be a random variables of interest. The goal is to find an accurate approximation to the unknown distribution of T or to some population characteristics, e.g., the variance of T. The bootstrap method of Efron (1979) provides an effective way of addressing these problems without any model assumptions on F. The IIDB algorithm is done as following steps:

Step 1. Resampling and bootstrap sample.

Given X , we draw a simple random sample X* {X1*,...,Xn*} with replacement from X . Thus, conditional on X , {X1*,...,Xn*} are iid random variables with P X*( 1* Xi) 1,i 1,...,n

n , where P*

denotes the conditional probability given X . Hence, the common distribution of Xi* is given by the empirical distribution F xn( ) n 1 ni 1n I X1 ( i x)

 ¦  d , where ( )I A is the indicator function equal to 1

when A is true and equal to 0 otherwise.

Step 2. Bootstrap version T*

We define the bootstrap version T* of T by replacing X with X* and F with n

F as T* t(X*; )Fn .

Step 3. Bootstrap estimators.

We can estimate the population characteristics, e.g. the bias, variance of T or the unknown distribution of T as following bootstrap estimators

* * * * * * * 2 * * * * * * ˆ ( ) ( ) , ( ) [( ) ( )] , ( ) ( ). Bias T E T T Var T E T E T G t P T t   d

Step 4. Monte- Carlo approximation.

Closed- form expressions bootstrap estimator of T* are obtained for some random variables T. In other

cases we may evaluate precision measures of T* by Monte-Carlo simulation as follows. We repeat steps

1-2, B (e.g., B=1000) times to obtain bootstrap replicates T1*,...,TB* of T*. The Monte- Carlo approximations to the above bootstrap estimators are given by

(2)

* * * 1 1 ˆ ( ) B b , b Bias T T T B ¦  (3)

* * * 2 *( ) 1 1( 1 1 ) , B B b b b b Var T T T B ¦ B ¦ (4) * * 1 1 ˆ ( ) B ( b ). b G t I T t B ¦ d

4. Spatial Semi-Parametric Bootstrap

Suppose ( ( ),..., (1 ))T N

z Z s Z s are observations of a stationary random field { ( ) :Z s sD Rd} with unknown constant mean P E Z s( ( )) and covariance matrix 6. The Cholesky decomposition allows 6 to be decomposed as the matrix product T

LL

(4)

matrix. Furthermore, suppose that ( ( ),..., (1 ))T 1( )

N

s s L z

H { H H  P , is a vector of iid random variables distributed according to a cumulative distribution function ( )F H . Let ( )F Hn is the empirical distribution that is an estimate of ( )F H . Then the SSPB algorithm is done as following steps:

Step 1. Estimation and removal of correlation structure.

Let the spatial- dependence structure of residual R s( )i Z s( )i Pˆ is estimated by the covariance matrix ˆ6. Then, define ˆ ( ( ),..., (ˆ 1 ˆ ))T ˆ1

N

s s L

H{ H H  ƒ, where ˆL is a lower triangular N Nu matrix from Cholesky decomposition 6 ˆ LLˆ ˆT and

1

( ( ),..., ( ))T N

R s R s

ƒ { is a vector of residuals.

Step 2. Computation of empirical distribution F HN( ). Suppose that ( ( ),..., (1 ))T

N

s s

H{ H H is centred value of ˆH , where, H( )si( )si H;i 1,...,N and 1 1ˆ( ) N j j N s

H  ¦ H . The empirical distribution function formed from central uncorrelated residuals collection { ( ),..., (H s1 H sN)} is FN( )H N1¦iN1I( ( )H si dH).

Step 3. Resampling and Bootstrap sample.

Now, the IIDB algorithm for iid is used for the vector of centred residuals H . We generate N random variables H*( ),..., (s1 H* sN) having common distribution F HN ( ). In other words, * * *

1

( ( ),..., ( ))T N

s s

H { H H

is a simple random sample with replacement from centred residuals collection { ( ),..., (H s1 H sN)}. The bootstrap sample * ( *( ),...,1 *( ))T

N

z { Z s Z s can be determined by inverse transform z* Pˆ LˆH*.

Step 4-6.

We continue steps 4-6 same as steps 2-4 in IIDB algorithm.

5. Simulation Study

In this section, we conducted a simulation study to compare IIDB and SSPB variance estimators of

2

0 ˆˆ 0

( )s Var Z s[ ( )]

V , where Z sˆˆ ( )0 is the plug- in kriging predictor in (1). Let { ( ) :Z s sD N2} be a zero mean stationary Gaussian process with the exponential covariogram

(5) 1 0 1

0

( ; )

0,

h a

c

c

h

h

c e

h

V

T



­ 

°

®

°

z

¯

where T ( , , )c c a0 1 T are nugget effect, partial sill and range, respectively. For the parameter values 1 (1,1,1)

T

T (weak dependence) and T2 (0, 2, 2)T (strong dependence), we generate 1

{ ( ),..., ( N)}

z Z s Z s as realizations of the Gaussian random field Z(.) over three rectangular regionsD un n n; 6,12, 24.

To compare IIDB and SSPB variance estimators of V2( )s0 Var Z s[ ( )]ˆ 0 , we define the T*version of

plug-in ordinary kriging predictor T Z sˆˆ ( )0 OˆTz, based on a bootstrap sample z* by

* * *

0 ˆ

( ) T

T Z s O z . The IIDB and SSPB estimators Vˆ ( )2 s0 Var Z s*[ *( )]0 of V2( )s0 have closed form expressions as

(5)

2 2 2 1 2 0 ˆ ˆ 1 ˆIIDB( ) N T ; N N [ ( )i ] , i s S S N Z s Z V O O  ¦  2 2 2 1 2 0 ˆ ˆ ˆ 1 ˆSSPB( ) T , N i. i s SH SH N V O 6O   ¦

H



Table 1: True values of Vˆ ( )2 s0 , estimates of normalized bias and MSE for IIDB and SSPB variance estimators Vˆ ( )2 s0 .

T T1

2

T

Method n s x0( ,0 y0) V2( )s0 Bias MSE V2( )s0 Bias MSE

IIDB SSPB 6 (3.5,3.5) 0.410 -0.541 -0.217 0.305 0.276 1.467 -0.704 -0.255 0.508 0.311 IIDB SSPB 12 (6.5,6.5) 0.407 -0.531 -0.162 0.286 0.099 1.459 -0.630 -0.072 0.405 0.108 IIDB SSPB 24 (12.5,12.5) 0.370 -0.486 -0.086 0.238 0.028 1.347 -0.569 0.061 0.327 0.039

Table 1 shows the true values of V2( )s0 Var Z s[ ( )]ˆˆ 0 , estimates of normalized bias

2 2

0 0

ˆ

[ ( ) / ( ) 1]

E V s V s  and mean square error, E[ ( ) /Vˆ2 s0 V2( ) 1]s0  2 for IIDB and SSPB methods for each region D and covariance parameters T and 1 T based on 10000 simulation. The simulation result 2 indicates that the SSPB estimators are dramatically preferable to the IIDB versions, especially for stronger dependence structure and larger sample size. The reason is that spatial dependence in the data makes the IIDB method inappropriate.

6. Analysis of FSD

In this section, we apply SSPB method for finite strain data (FSD) that are collected by Mukul (1998). These data are collected from a part of the Sheeprock thrust sheet in the southern Sheeprock Mountain and the West Tintic Mountains, north- central Utah. The strain was measured in the quartzite of the Sheeprock thrust sheet and the geostatistics analysis was illustrated using the X/Z strain axial ratios. A sample from 56 locations was collected systematically from the quartzite on a square grid with spacing of approximately 625 m on the map along and across the strike of the Sheeprock thrust. The sample grids covered a 5 km×3 km rectangular area in the Sheeprock thrust sheet. Iranpanah et al. (2009) introduced the best variogram model between 64 different variogram models and compared theses variogram models with Mukulǯs model (Mukul, 1998) for analysis of spatial kriging predictor. Mukul et al. (2004) used IIDB method for analysis of these data, that are spatially dependent. Our goal is estimates of bias, variance and distribution for plug- in kriging predictor and variogram parameter estimators with SSPB method for these data.

The SSPB algorithm is first done with estimate and removal of correlation structure. For estimate of correlation structure of residuals, first, a valid parametric variogram 2 (., )J T is fitted to the empirical variogram estimation ˆ2 (.)J and then, the variogram parameters T are estimated. The spherical variogram is fitted to the FSD with T ˆ ( , , ) (0.0045, 0.0030, 2818)c c aˆ ˆ ˆ0 1 . The correlation structure can be estimated from estimate of semivariogram function as 6 ˆ V( ; )h Tˆ V(0; )Tˆ J( ; )h Tˆ .

Then, we determine residuals Hˆ Lˆ1(z Pˆ) and compare centred residuals

1 1 ˆ ˆ ( ) ( ) N ( ); 1,..., j j i i s s s i N N

H H  ¦ H . Finally, the bootstrap sample are determined as z* Pˆ LˆH*, where bootstrap vector H is generated as sampling with replacement from centred residuals vector H . *

Now, suppose that the plug-in ordinary kriging predictor T1 Z sˆˆ ( )0 and variogram parameter estimators Tˆ ( , , ) ( , , )T T T2 3 4 c c aˆ ˆ ˆ0 1 are interest, where Ti t z ii( ); 1, 2, 3, 4 . For example, if

0 (1500, 3000)

s be a new location then, Z sˆˆ ( )0 OˆTz 1.266 and also 0 1

ˆ ( , , ) (0.0045, 0.0030, 2818)c c aˆ ˆ ˆ

(6)

SSPB sample. We estimate the precision measures Bias T( )i and Var T( )i and distribution GTi ( )t by SSPB method and B times bootstrap replicates Ti*,1,...,Ti B*, ;i 1, 2,3, 4 in relations 2-4 in step 4.

Table 2 shows estimates of SSPB bias and SSPB variance for plug-in kriging predictor and estimates of variogram parameters based on B= 1000 times bootstrap.

Table 2: Estimates of SSPB bias and SSPB variance for plug-in kriging predictor and parameter variogram estimators. Estimates based on B= 1000 times bootstrap replicates for FSD.

* i T

Bias

* V ar* * 0 ( ) Z s 0.0133 0.0020 * 0 c 0.0014 2.5×10-6 * 1 c 0.0022 6.7×10-5 * a -487 1.4×10+6 References

[1] Bose, A. Edgeworth correction by bootstrap in autoregression. The Annals of statistics 1998; 16:1709-1772.

[2] Bühlman, P. Künsch, H.R., Comments on ”Prediction of spatial cumulative distribution functions using subsampling”. Journal of the American Statistical Association 1999; 94:97-99.

[3] Cressie, N. Statistics for spatial data. John Wiley, 1993.

[4] Efron, B. Bootstrap method; another look at the jackknife. The Annals of statistics 1979; 7:1-26. [5] Efron, B., Tibshirani, R. An introduction to the bootstrap. Chapman and Hall, 1993.

[6] Freedman, D.A. Peters, S.C. Bootstrapping a regression equation: Some empirical results. Journal of the American Statistical Association 1984; 79:97-106.

[7] Hall, P. Resampling a coverage pattern. Stochastic processes and their Application 1985; 20:231-246.

[8] Iranpanah, N., Mohammadzadeh, M., Vahidi Asl, M.Q. Yassaghi, A. Spatial data analysis of finite strain data across a thrust sheet using R. Computer and Geosciences 2009; 35:626-634.

[9] Iranpanah, N., Mohammadzadeh, M. Taylor C.C. Comparison between block and semi-parametric bootstrap methods for spatial data analysis. Computation Statistics and Data Analysis 2011; 55:578-587.

[10] Journel, A.G., Hüijbregts, C.J. Mining Geostatistics. Academic Press, London; 1978.

[11] Kent, J. T. Mardia, K. V. Spectral and circulant approximations to the likelihood for stationary Gaussian random field. Journal of Statistical Planning and Inference 1996; 50:379-394.

[12] Kent, J. T. Mohammadzadeh, M. Spectral approximation to the likelihood for an intrinsic random field. Journal of Multivariate Analysis 1999; 70:136-155.

[13] Lahiri, S.N. Resampling methods for dependent data. Springer, New York; 2004.

[14] Mardia, K.V., Southworth, H.R., Taylor, C.C. On bias in maximum likelihood estimators. Journal of Statistical Planning and Inference 1999; 76:31-39.

[15] Mukul, M. A spatial statistics approach to the quantification of finite strain variation in penetratively deformed thrust sheet: an example from the Sheeprock thrust sheet, Sevier fold-and-thrust belt, Utah. Journal of Structural Geology 1998; 20(4):371-384. [16] Mukul, M., Roy, D., Satpathy, S., Kumar, V.A. Bootstrapped statistics: a more robust approach to the analysis of finite strain data. Journal of Structural Geology 2004; 26:595-600.

[17] Zimmerman, D.L. Cressie, N. Mean squared prediction error in the spatial linear model with estimated covariance parameters. Annals of the institute of Statistical Mathematics 1992; 44:27-43.

References

Related documents

These data, the first to examine several critical aspects of how providers experience, view and make decisions about unconventional reproductive arrangements, sug- gest that

AIRWAYS ICPs: integrated care pathways for airway diseases; ARIA: Allergic Rhinitis and its Impact on Asthma; COPD: chronic obstructive pulmonary disease; DG: Directorate General;

It was decided that with the presence of such significant red flag signs that she should undergo advanced imaging, in this case an MRI, that revealed an underlying malignancy, which

19% serve a county. Fourteen per cent of the centers provide service for adjoining states in addition to the states in which they are located; usually these adjoining states have

This ethnographic study examines the social effects of the introduction of an enterprise-level quality control program (CMMI) on professional engineers in a complex

The degree of resistance exhibited after 1, 10 and 20 subcultures in broth in the absence of strepto- mycin was tested by comparing the number of colonies which grew from the

Field experiments were conducted at Ebonyi State University Research Farm during 2009 and 2010 farming seasons to evaluate the effect of intercropping maize with

(STEVENS-JOHNSON SYNDROME): Resulting in Blindness in a Patient Treated SEVERE ERYTHEMA MULTIFORME OF THE PLURIORIFICIAL