
Predictive spatio-temporal models for spatially sparse environmental data




Xavier de Luna* and Marc G. Genton†

October 17, 2003

Abstract

We present a family of spatio-temporal models which are geared to provide time-forward predictions in environmental applications where data is spatially sparse but temporally rich. That is, measurements are made at few spatial locations (stations), but at many regular time intervals. When predictions in the time direction are the purpose of the analysis, the spatial-stationarity assumptions which are commonly used in spatial modeling are not necessary. The family of models proposed does not make such assumptions and consists of a vector autoregressive (VAR) specification, where there are as many time series as stations. However, by taking into account the spatial dependence structure, a model building strategy is introduced, which borrows its simplicity from the Box-Jenkins strategy for univariate autoregressive (AR) models for time series. As for AR models, model building may be performed either by displaying sample partial correlation functions, or by minimizing an information criterion. Two environmental data sets are studied. In particular, we find evidence that a parametric modeling of the spatio-temporal correlation function is not appropriate because it rests on too strong assumptions. Moreover, we propose to compare model selection strategies with an out-of-sample validation method based on recursive prediction errors.

Keywords: Accumulated prediction errors; Spatio-temporal correlation; Partial correlation; Vector autoregression.

* Department of Statistics, Umeå University, S-90187 Umeå, Sweden. E-mail: xavier.deluna@stat.umu.se

† Department of Statistics, North Carolina State University, Box 8203, Raleigh, NC 27695-8203, USA. E-mail: [...]

In this article we present a model building strategy designed to work within a family of vector autoregressive models for time series recorded at specific spatial locations. The methodology has been developed with environmental applications in mind, where measurements on a variable are made at regular time intervals and at several stations located within a specific area. More specifically, we focus on situations where measurements are available at a few stations: the spatio-temporal data is sparse in space but rich in time. We put the discussion into concrete form with two examples treated previously in the literature.

The first data set that we consider consists of average daily wind speeds measured at 11 synoptic meteorological stations located in Ireland during the period 1961-78, making up 6,570 observations per location. Gneiting (2002) used this data set to illustrate the lack of realism implied by the separability assumption, where the correlation function is written as a product of a purely spatial correlation function and a purely temporal correlation function. Modeling the space-time correlation directly by an appropriate parametric family of functions is inspired by the geostatistical tradition, where spatial dependences are of main interest. Such models make strong spatial stationarity assumptions: typically isotropy is assumed, where spatial correlations are a function only of the distance between stations. When spatio-temporal correlation functions are considered, an isotropy-like assumption is typically made, where the relative location of the stations is again considered as irrelevant; see, e.g., Mardia and Goodall (1993), Høst, Omre, and Switzer (1995). These spatial stationarity assumptions are mainly appropriate for data sets rich in space (so that they can be checked) and when the purpose is to obtain predictions at spatial locations where there are no measurements available. In order to apply the geostatistical approach to environmental space-time data, many efforts have been made, resulting in the introduction of transformations achieving spatial stationarity (e.g., Guttorp, Meiring, and Sampson, 1994) or in the construction of flexible enough covariance models, e.g., by allowing non-separable covariance functions, see Gneiting (2002). A geostatistically inspired analysis for spatio-temporal data which makes neither spatial stationarity nor separability assumptions was recently proposed by Stroud, Müller, and Sansó (2001). Their approach is flexible but is relevant for data which is rich both in time and in space.

A spatio-temporal data set with measurements available only at a few stations was analyzed in Tonellato (2001). Carbon monoxide atmospheric concentrations were observed at four [...] observations made at the different locations. This was achieved with a multivariate time series state space model together with the Bayesian inferential paradigm. Tonellato's model assumes isotropy. This was avoided with the Kalman filter proposed in Wikle and Cressie (1999).

When data sets are sparse in space, multivariate time series models are indeed most flexible to provide time-forward predictions taking into account spatio-temporal dependencies. We introduce in this paper a modeling approach which avoids making any spatial stationarity assumptions. We use vector autoregressive (VAR) models, and propose a model building strategy which borrows its simplicity from the widely applied model selection methods used for autoregressive (AR) models for univariate time series (Box and Jenkins, 1976). For AR models, the selection problem is crucially simplified because of the natural nestedness of the models: time lags for the time series are introduced in the model sequentially and only p+1 models need to be visited, where p is the maximum time lag. In a general vector autoregressive model, this nesting does not exist, but can be retrieved by considering a space-time hierarchy which is often almost as natural as the purely temporal hierarchy.

The article is set up as follows. Section 2 describes the vector autoregressive models with a spatial structure. The theory related to the inference on the parameters is available in the literature for general VAR models. The necessary stationarity assumptions are described. Crucially, spatial stationarity is not a necessary assumption and a different model can be built for each station. Temporal trends present in the data can be estimated and/or removed in a classical manner. The model building strategy that we then introduce is the main originality of this paper, and is essential since it makes the VAR family of models relatively simple and natural to use for the environmental applications of interest. Our modeling approach is also applied to the Venetian carbon monoxide data in Section 2. In Section 3, we perform a correlation analysis on the Irish wind data set which allows us to investigate isotropy and spatio-temporal correlation symmetry assumptions. Our analysis indicates that the latter assumption is not fulfilled for this application. We also show the flexibility of our models compared to the direct modeling of the space-time correlation function. Both modeling approaches are useful when used on the appropriate type of data. In Section 4 we focus on predictive performance and use an out-of-sample validation technique to compare different modeling strategies, both on the Venetian carbon monoxide and the Irish wind data. The article is concluded with a discussion [...]

The models developed in this section are specifically designed for the analysis of spatio-temporal data sets with the purpose of providing time-forward predictions at given spatial locations. Our aim is to provide predictions based on a minimum of assumptions.

2.1 Time-stationary models

We assume that observations $z(s_i,t)$ are made at $s_i$, $i=1,\ldots,N$ locations (stations) and $t=1,\ldots,T$ times. A predictive model for $z_t=(z(s_1,t),\ldots,z(s_N,t))'$ is

$$z_t - \mu = \sum_{i=1}^{p} R_i (z_{t-i} - \mu) + \varepsilon_t, \qquad (1)$$

where $\mu=(\mu(s_1),\ldots,\mu(s_N))'$ is a vector of spatial effects (spatial trend), $R_i$, $i=1,\ldots,p$, are unknown $N \times N$ parameter matrices, and $\varepsilon_t$ is an $N$-dimensional white noise or innovation process, that is, $E(\varepsilon_t)=0$, $E(\varepsilon_t \varepsilon_t')=\Sigma_\varepsilon$, and $E(\varepsilon_t \varepsilon_u')=0$ for $u \neq t$. This model is appropriate once temporal trends have been removed, see the discussion in the next section.

The model (1) is a vector autoregressive (VAR) model commonly used in multivariate time series analysis (e.g., Lütkepohl, 1991; Peña, Tiao, and Tsay, 2001). When $z_t$ is univariate (only one time series is observed), (1) is simply called an autoregressive (AR) model. The deterministic dynamic system defined by (1) where the error term $\varepsilon_t$ is dropped, also called the skeleton of the model, is said to be stable if its reverse characteristic polynomial has no roots in and on the complex unit circle, i.e.

$$\det(I - R_1 x - \cdots - R_p x^p) \neq 0, \quad \text{for } x \in \mathbb{C},\ |x| \leq 1. \qquad (2)$$

The stability property ensures that the iteration of the dynamic system yields a converging path towards a constant. Stability is an important property since it implies time-stationarity of the stochastic process (e.g., Lütkepohl, 1991; Peña et al., 2001), i.e., $E(z_t)=\mu$, for all $t$, and $\mathrm{Cov}(z_t, z_{t-\tau}) = \Gamma_z(\tau)$, i.e. is a function of $\tau$ only, for all $t$ and $\tau=0,1,2,\ldots$. The covariance matrix $\Gamma_z(\tau)$ can then be computed from the parameter matrices $R_1,\ldots,R_p$ and $\Sigma_\varepsilon$. For instance, when $p=1$, we have $\mathrm{vec}(\Gamma_z(0)) = (I_{N^2} - R_1 \otimes R_1)^{-1}\,\mathrm{vec}(\Sigma_\varepsilon)$ and $\Gamma_z(\tau) = R_1^{\tau}\,\Gamma_z(0)$, where $\mathrm{vec}(\cdot)$ denotes the operator vectorizing a matrix.
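The stability condition (2) and the $p=1$ covariance formulas can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and condition (2) is tested through the standard equivalent statement that all eigenvalues of the VAR companion matrix lie strictly inside the unit circle.

```python
import numpy as np

def is_stable(R_list):
    """Check condition (2) for parameter matrices R_1..R_p (each N x N).

    Uses the companion-matrix form: (2) holds if and only if every
    eigenvalue of the companion matrix has modulus < 1.
    """
    N = R_list[0].shape[0]
    p = len(R_list)
    companion = np.zeros((N * p, N * p))
    companion[:N, :] = np.hstack(R_list)          # first block row: R_1 ... R_p
    if p > 1:
        companion[N:, :-N] = np.eye(N * (p - 1))  # identity blocks below
    return bool(np.all(np.abs(np.linalg.eigvals(companion)) < 1.0))

def var1_covariances(R1, sigma_eps, max_lag):
    """Return [Gamma_z(0), ..., Gamma_z(max_lag)] for a stable VAR(1)."""
    R1 = np.asarray(R1, dtype=float)
    sigma_eps = np.asarray(sigma_eps, dtype=float)
    N = R1.shape[0]
    # vec(Gamma_z(0)) = (I - R1 kron R1)^{-1} vec(Sigma_eps), column-stacking vec
    lhs = np.eye(N * N) - np.kron(R1, R1)
    vec_gamma0 = np.linalg.solve(lhs, sigma_eps.reshape(-1, order="F"))
    gammas = [vec_gamma0.reshape(N, N, order="F")]
    for _ in range(max_lag):
        gammas.append(R1 @ gammas[-1])            # Gamma_z(tau) = R1 Gamma_z(tau - 1)
    return gammas
```

For a univariate AR(1) with coefficient 0.5 and unit innovation variance, `var1_covariances` returns the familiar variance $1/(1-0.5^2)$ at lag zero.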

Estimation of the parameters in model (1) can be carried out with maximum likelihood (if distributional assumptions are made), with least squares or with moments estimators [...] worth mentioning that robust estimation of the parameters may be obtained by using robust estimators of moments (Ma and Genton, 2000) together with Yule-Walker estimating equations, as was proposed in de Luna and Genton (2002). Estimation of the parameters can be carried out for all stations simultaneously or station-wise in an equivalent manner. However, model building must be performed separately for each station, see Section 2.3.

An important property of the presented models is that they do not assume any spatial stationarity, for instance that spatial correlations depend only on the distance between stations (isotropy). Such assumptions have often been made in the spatio-temporal literature, for instance to develop space-time ARMAX models (Pfeifer and Deutsch, 1980; Stoffer, 1986). For the applications of interest in this paper, spatial stationarity is an over-restrictive assumption which is, moreover, difficult to check in practice.

2.2 Spatio-temporal trends

In the univariate time series literature two types of trends are usually considered, deterministic trends (typically functions of time) and stochastic trends, see, e.g., Dagum and Dagum (1988) and references therein. Deterministic trends can often be removed by differencing with $\nabla^d$, the classical time series difference operator of order $d$. For instance, we have $\nabla z(s_i,t) = z(s_i,t) - z(s_i,t-1)$, $\nabla^2 z(s_i,t) = (z(s_i,t) - z(s_i,t-1)) - (z(s_i,t-1) - z(s_i,t-2))$, and so on. Model (1) can, therefore, be used after very general types of spatio-temporal trends have been removed. For instance, a natural deterministic trend specification is the following:

$$z(s,t) = \mu + g(s,t) + y(s,t),$$

where $y(s,t)$ is a time stationary process. Let us now take differences in time:

$$\nabla z(s,t) = g(s,t) - g(s,t-1) + \nabla y(s,t),$$

where $\nabla y(s,t)$ is a time stationary process by definition. Thus, $\nabla z(s,t)$ can be modeled by (1) if $\mu(s) = g(s,t) - g(s,t-1)$ is a function of $s$ only. This, we believe, will happen most often in practice, at least approximately. Indeed, it will happen as soon as $g(s,t)$ is a polynomial function in $t$ with coefficients possibly functions of $s$. A simple example, for which a difference of order one will suffice, is $g(s,t) = \beta_1(s) + \beta_2(s)\,t$. In other words, differencing eliminates deterministic polynomial time trends interacting with a spatial trend. Any spatio-temporal trend $g(s,t)$ which [...] trend, for instance, $z(s,t)$ may be an integrated (in time) process.

In the spatial dimension, differencing has also been suggested in order to remove trends, see e.g. the intrinsic random functions of order k, IRF-k, proposed by Matheron (1973). However, they can be used only if the spatial stations are located on a regular grid, which is seldom the case in practice.

Periodic time trends or cycles may also be tackled by taking differences. For instance, observations taken monthly will typically have to be stationarized with the seasonal difference operator $\nabla_{12}\, z(s,t) = z(s,t) - z(s,t-12)$. When other variables are observed at the same locations and times, these may also be used to model trends by regressing on them.
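The difference operators used in this section are simple to apply to a $T \times N$ data array (rows are times, columns are stations). The sketch below is illustrative only; the function name `difference` and the array layout are assumptions, not part of the paper.

```python
import numpy as np

def difference(z, lag=1, order=1):
    """Apply the time difference operator with the given lag, `order` times.

    z is an array whose first axis is time. lag=1 gives the ordinary
    difference, lag=12 the seasonal difference for monthly data.
    Each application shortens the series by `lag` observations.
    """
    out = np.asarray(z, dtype=float)
    for _ in range(order):
        out = out[lag:] - out[:-lag]
    return out
```

With a trend of the form $\beta_1(s) + \beta_2(s)\,t$, one call with `lag=1` leaves the station-specific constant $\beta_2(s)$ in every row, which then plays the role of the spatial effect in model (1).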

Another popular approach consists in modeling a deterministic trend with a weighted sum of known basis functions, where the weights are typically estimated by regression. For example, periodic functions can be used to account for seasonal effects along the time axis and polynomials can be used to model smooth variations in space. Further discussions can be found in the review article by Kyriakidis and Journel (1999).

2.3 Model building and checking

Although the model presented in Section 2.1 is essentially a VAR model, the spatial structure of the data is informative and, in particular, the parameter matrices $R_i$ will typically have a specific structure.

We, therefore, introduce a model building strategy to identify zeroes in the $R_i$ matrices of model (1). The $N$ rows of these matrices correspond to the $N$ locations at which time series are observed. These $N$ stations must be modeled separately if no spatial-stationarity assumption is to be made. For each station $s$, we have a covariate selection problem, where, for the response $z(s,t)$, the available predictors are the time-lagged values at all stations, i.e. $z(s_i,t-j)$, $i=1,\ldots,N$ and $j=1,\ldots$. This would be a complex model selection problem if all lagged values at all stations had to be considered as potential predictors, since then the number of potential models would rapidly explode with the number of stations and the time frequency. However, by taking into account the spatio-temporal structure, it becomes possible to define an ordering with which to sequentially introduce the predictors in the model for $z(s,t)$. Such an ordering is used in univariate time series models where, for explaining $x_t$ (a variable observed at time $t$), the lag one variable $x_{t-1}$ is considered first, then the lag two $x_{t-2}$ [...]

[Diagram not recovered: time levels t, t-1 and t-2 with the target station s marked.]

Figure 1: A schematic representation of the ordering of the stations (3).

In the spatio-temporal context we propose to use a natural ordering defined as follows. To explain $z(s,t)$, consider predictors in the following order

$$z(s,t-1),\ z(s(1),t-1),\ z(s(2),t-1),\ \ldots,\ z(s(N-1),t-1),\ z(s,t-2),\ z(s(1),t-2),\ \ldots \qquad (3)$$

where $s(1), s(2), \ldots, s(N-1)$ is an ordering of the stations, for instance, in ascending order with respect to their distance (using a given metric) to $s$. This ordering of the stations, and thereby the covariates, is most clearly explained graphically, see Figure 1. Other orderings can be considered as well, for instance orderings motivated by physical knowledge about the underlying process, see the Irish wind speed application in Section 3.
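The ordering (3) can be generated mechanically once station coordinates are available. The following is a sketch under stated assumptions: the function name `predictor_order` is hypothetical, stations are indexed from 0, and the Euclidean distance stands in for the "given metric".

```python
import numpy as np

def predictor_order(coords, s, max_lag):
    """Return sequence (3) as (station_index, time_lag) pairs for target station s.

    coords: (N, 2) array of station coordinates.
    Within each time lag, station s comes first, followed by the other
    stations in ascending Euclidean distance to s (an assumed metric).
    """
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords - coords[s], axis=1)
    others = sorted((i for i in range(len(coords)) if i != s),
                    key=lambda i: dist[i])
    stations = [s] + others
    return [(i, lag) for lag in range(1, max_lag + 1) for i in stations]
```

An ordering motivated by physical knowledge, such as the wind-direction ordering used for the Irish data, would only change how `others` is sorted.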

The fact that predictors can be entered in the model sequentially simplifies the model building stage, and several strategies may be followed to decide how many of these predictors should be used. A popular technique in time series modeling is to look at partial autocorrelations. For the spatio-temporal models under consideration and with a given ordering of the predictors we can straightforwardly generalize this time series technique by looking at partial correlations along the ordering of predictors. For three random variables $x$, $z$ and $y$, the latter possibly vector valued, the partial correlation of $x$ and $z$ given $y$ is defined as [...] of $x$ given $y$. This partial correlation has the property that it is equal to zero when $x$ and $z$ are independent conditional on $y$.

Renaming the sequence (3) as follows

$$x_1 = z(s,t-1),\ x_2 = z(s(1),t-1),\ \ldots,\ x_N = z(s(N-1),t-1),$$
$$x_{N+1} = z(s,t-2),\ x_{N+2} = z(s(1),t-2),\ \ldots,\ x_{2N} = z(s(N-1),t-2),$$
$$\ldots,$$

we can define a partial correlation function (PCF) for station $s$ as

$$\rho_s(h) = \mathrm{Corr}(z(s,t),\ x_h \mid x_1,\ldots,x_{h-1}).$$

The usefulness of the PCF for model selection is now clarified. Let us define $h_1$ to be such that $\rho_s(h_1) \neq 0$ and $\rho_s(h) = 0$ for $h_1 < h \leq N$. Similarly, a value $h_i$ can be defined for each time lag $i$, such that $\rho_s(h_i) \neq 0$ and $\rho_s(h) = 0$ for $h_i < h \leq iN$. The orders $h_i$ can be identified by looking at the sample partial correlation function

$$\hat\rho_s(h) = \widehat{\mathrm{Corr}}\big(z(s,t) - \hat P(z(s,t) \mid x_1,\ldots,x_{h-1}),\ x_h - \hat P(x_h \mid x_1,\ldots,x_{h-1})\big).$$

An approximate test for $\rho_s(h)=0$ is obtained by noting that, under joint normality of the variables, $\hat\rho_s(h)\sqrt{(n-h)/(1-\hat\rho_s(h)^2)}$ is t-distributed with $n-h$ degrees of freedom, where $n$ denotes the sample size utilized; see, e.g., Krzanowski (1988, Sec. 14.4). The normality assumption is fairly natural when linear models are utilized. This test statistic can be used to derive confidence intervals for $\rho_s(h)=0$, see Section 2.4. We propose the following strategy.

Identification strategy

Step 0: Choose one of the observed sites, $s$.

Step 1: Identify $h_1$ by looking at the sample PCF $\hat\rho_s(h)$, $h=1,\ldots,N$.

Step 2: Identify $h_2$ by looking at the sample PCF $\hat\rho_s(h)$, $h=N+1,\ldots,2N$, when $x_{h_1+1},\ldots,x_N$ have been put to zero since they have been shown to be unhelpful in explaining $z(s,t)$ in the previous step.

Step 3: Identify $h_3$ by looking at the sample PCF $\hat\rho_s(h)$, $h=2N+1,\ldots,3N$, when $x_{h_1+1},\ldots,x_N$ and $x_{h_2+1},\ldots,x_{2N}$ have been put to zero since they have been shown to be unhelpful in [...]

[...] $h_4$, $h_5$, etc.

Step 5: Repeat the previous steps for all observed sites.
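The sample PCF examined in Steps 1 to 3 can be computed by regressing both the response and the candidate predictor on the predictors already entered, and correlating the two residual series. The sketch below is one possible implementation, not the authors' code; the name `sample_pcf` is hypothetical, the projection $\hat P$ is taken to be a least squares fit with an intercept, and `X` is assumed to hold the predictors in the order of sequence (3).

```python
import numpy as np

def sample_pcf(y, X):
    """Sample partial correlations of y with each column of X, in order.

    y: (n,) response, e.g. z(s, t).
    X: (n, H) predictors x_1..x_H ordered as in sequence (3).
    Returns (rho_hat, t_stat), each of length H, where
    t_stat[h-1] = rho_hat * sqrt((n - h) / (1 - rho_hat**2)).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, H = X.shape
    rho = np.empty(H)
    t_stat = np.empty(H)
    for h in range(1, H + 1):
        # Condition on an intercept and on x_1 .. x_{h-1}
        C = np.column_stack([np.ones(n), X[:, :h - 1]])
        res_y = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
        res_x = X[:, h - 1] - C @ np.linalg.lstsq(C, X[:, h - 1], rcond=None)[0]
        # Residuals have zero mean, so the correlation is a normalized inner product
        rho[h - 1] = (res_y @ res_x) / np.sqrt((res_y @ res_y) * (res_x @ res_x))
        t_stat[h - 1] = rho[h - 1] * np.sqrt((n - h) / (1.0 - rho[h - 1] ** 2))
    return rho, t_stat
```

This version conditions on all earlier predictors; in Steps 2 and 3 the columns already judged unhelpful would be dropped from the conditioning set before the next lag block is examined.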

The deletion of uninteresting predictors at each time lag improves efficiency by avoiding the estimation of zero coefficients. However, this procedure does not avoid the estimation of all zero coefficients, since, for instance, the predictors included at later stages may put to zero coefficients that were not so when considering partial models. This issue is, however, also present in the identification of the order of linear AR models. For such models, all parameters are estimated up to a given time lag by convention. This is convenient since it avoids chasing zero coefficients by using a sequence of t-tests on the coefficients.

An alternative to the use of the PCF is the use of an automatic model selection criterion in each step of the identification strategy. AIC (Akaike information criterion, Akaike, 1974) or BIC (Bayesian information criterion, Schwarz, 1978) may be used, the former being usually preferred for predictive purposes.
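One way to read "an automatic model selection criterion in each step" is to choose, within each time-lag block of sequence (3), the number of leading predictors that minimizes the criterion, keeping the predictors retained at earlier lags. The sketch below follows that reading for BIC with Gaussian least squares fits; it is an interpretation rather than the authors' algorithm, and the names `bic` and `select_by_bic` are hypothetical.

```python
import numpy as np

def bic(y, C):
    """Gaussian BIC (up to a constant) of the least squares fit of y on C."""
    n = len(y)
    resid = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + C.shape[1] * np.log(n)

def select_by_bic(y, X, N):
    """Sequential selection along sequence (3), one time-lag block at a time.

    X: (n, H) predictors ordered as in (3), N columns per time lag.
    Returns the list of retained column indices of X.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, H = X.shape
    kept = []
    for start in range(0, H, N):
        block = list(range(start, min(start + N, H)))
        best_k, best_val = 0, None
        for k in range(len(block) + 1):        # keep the first k predictors of the block
            cols = kept + block[:k]
            C = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            val = bic(y, C)
            if best_val is None or val < best_val:
                best_k, best_val = k, val
        kept = kept + block[:best_k]
    return kept
```

Replacing the penalty `C.shape[1] * np.log(n)` by `2 * C.shape[1]` would give the corresponding AIC variant.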

Finally, model checking is commonly done by looking at residuals for signs of deviation from model assumptions. A feature that can be checked is, for instance, whether they are correlated in time.

2.4 Carbon monoxide in Venice

The data set available has T = 300 hourly observations of atmospheric concentration (micrograms per cubic meter) of carbon monoxide (CO) recorded in September 1995 at N = 4 monitoring stations located in Mestre (Venice, Italy). This data set has been analyzed by Tonellato (2001) in a Bayesian dynamic linear model framework.

Our methodology is particularly well suited for this application for several reasons. First, Italian law requires public authorities to produce short-term forecasts of air pollutant concentrations at locations where monitoring stations are present, that is, spatial interpolation is not of prime interest. Second, our model does not make spatial stationarity assumptions, since it models each station separately, in contrast with Tonellato (2001), who used a spatially stationary isotropic exponential correlation function. Such assumptions are arbitrary because with only four stations it is not possible to assess their validity. Moreover, wind speed and direction can [...]

[Four panels, Station 1 to Station 4; x-axis: Time (hours), 0 to 300; y-axis: CO concentrations, 0 to 6.]

Figure 2: The time series of CO concentrations at the 4 stations.

[Plot titled "Locations of stations", showing stations 1 to 4 on a coordinate plane; referred to in the text as Figure 3. Caption not recovered.]

[...] each station. The maximum lag allowed for with BIC was 30. Parameters are estimated with least squares. The models are written in matrix form as a VAR(2) model to facilitate the comparison with the VAR model with spatial structure identified in Table 2. The residual correlation matrix assumes that the time series are independent of each other.

$$\hat R_1 = \begin{pmatrix} 0.64 & 0 & 0 & 0 \\ 0 & 0.67 & 0 & 0 \\ 0 & 0 & 0.48 & 0 \\ 0 & 0 & 0 & 0.53 \end{pmatrix}, \quad
\hat R_2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0.19 & 0 \\ 0 & 0 & 0 & 0.17 \end{pmatrix}, \quad
\hat\Sigma_\varepsilon = \begin{pmatrix} 0.13 & 0 & 0 & 0 \\ 0 & 0.11 & 0 & 0 \\ 0 & 0 & 0.22 & 0 \\ 0 & 0 & 0 & 0.13 \end{pmatrix}$$

Table 2: The VAR(2) model identified for the CO data with BIC. The maximum temporal lag was fixed at six because of the results in Table 1. Parameters in model (1) are estimated with least squares. The residuals covariance matrix is the classical variance estimator.

$$\hat R_1 = \begin{pmatrix} 0.59 & 0 & 0 & 0.16 \\ 0 & 0.67 & 0 & 0 \\ 0.14 & 0.15 & 0.32 & 0.25 \\ 0 & 0 & 0 & 0.53 \end{pmatrix}, \quad
\hat R_2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0.15 & 0 \\ 0 & 0 & 0 & 0.17 \end{pmatrix}, \quad
\hat\Sigma_\varepsilon = \begin{pmatrix} 0.13 & 0.01 & 0.03 & 0.03 \\ 0.01 & 0.11 & 0.03 & 0.01 \\ 0.03 & 0.03 & 0.20 & 0.04 \\ 0.03 & 0.01 & 0.04 & 0.13 \end{pmatrix}$$

There are a few missing values in the data set. Following Tonellato (2001), they are replaced by the mean of the values at the same station and hour over the sample period. The data is plotted in Figure 2 according to the four different locations as depicted in Figure 3. We take the logarithm of the observations, thereby stabilizing the variance. Stations 2 and 4 are located along streets with high intensity of traffic, whereas station 1 is located in a garden and station 3 in a pedestrian area. Therefore, there are differences in CO levels among the stations. This spatial trend is readily included in our models since each station is allowed to have its own level, $\mu$ in model (1). Time trends must, on the other hand, be accounted for prior to fitting the time-stationary model (1). We follow Tonellato (2001) and estimate a trend for each station by regressing on a family of daily harmonics. We then subtract these temporal trends from the data.
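Regressing on daily harmonics and subtracting the fitted trend can be sketched as follows for one station's hourly series. This is an illustration only: the function name `remove_daily_harmonics` and the choice of two harmonics are assumptions, since the text does not state how many harmonics were used.

```python
import numpy as np

def remove_daily_harmonics(z, n_harmonics=2, period=24):
    """Subtract a least squares fit on an intercept and daily harmonics.

    z: (T,) hourly series for one station (e.g. log CO concentrations).
    n_harmonics: number of cosine/sine pairs (an assumed value).
    period: length of the cycle in observations (24 for hourly data).
    Note: the fitted intercept is removed as well, so the result is centered.
    """
    z = np.asarray(z, dtype=float)
    t = np.arange(len(z))
    cols = [np.ones(len(z))]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t / period))
        cols.append(np.sin(2 * np.pi * k * t / period))
    C = np.column_stack(cols)
    coef = np.linalg.lstsq(C, z, rcond=None)[0]
    return z - C @ coef
```

To keep the station level $\mu(s)$ in the data, as in model (1), the intercept coefficient can be added back to the returned series.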

[...] of them. Using the Bayesian information criterion we obtain the models described in Table 1.

The univariate time series approach ignores the spatial dependence structure. We, therefore, apply the model building strategy described in the previous section, where the station ordering (3) is defined by considering ascending distances, see Figure 3. We start by looking at the partial correlations to explore the existing dependences. Plots of these correlations are given in Figure 4 for station 3 as illustration, as a function of the ordered stations, along with 95% confidence intervals for $\rho_s(h)=0$ (horizontal lines). We see that all the partial correlations at time lag one are significant except for station 2, and that for time lag two, only the partial correlation corresponding to station 2 is strongly significant. The partial correlations at time lag three are not significant.

The model selection may be performed automatically and we use the algorithm described in the previous section associated with the BIC criterion. The models obtained are described in Table 2. For instance, for station 3 the model identified corresponds to the partial correlation pattern observed in Figure 4 at time $t-1$ and $t-3$, but differs slightly at time $t-2$.

Tables 1 and 2 can be compared directly. The model selection strategy we use identifies spatio-temporal dependencies as relevant for time-forward forecasting since the coefficient matrices $R_i$ are not diagonal. Note, however, that for stations 2 and 4 a univariate time series model is selected. The latter are located along streets with high intensity traffic. Results in Table 2 are in this respect reasonable since the stations with high levels of pollution are influential for those with low levels. Identification strategies are further evaluated in Section 4.1.

[Figure 4 (caption not recovered): three panels, time t-1, time t-2 and time t-3; x-axis: station (1 to 4); y-axis: PCF, from -0.6 to 0.6.]

In this section, we aim at showing the flexibility of our approach in modeling the correlation structure of the Irish wind data set of Haslett and Raftery (1989). The data consist of daily averages of wind speeds recorded at N = 11 synoptic meteorological stations in Ireland between 1961 and 1978. A map of the locations of the stations can be found in Haslett and Raftery (1989). Predictions of wind speeds may be useful for various reasons. An example is the production of renewable energies, see Alexiadis, Dokopoulos, and Sahsamanoglou (1999).

[Four panels (a) to (d); x-axis: distance (m), 0 to 400000; y-axis: correlation.]

Figure 5: Spatial correlation as a function of distance (in meters) for the Irish wind data set (open circles: from fitted VAR(3); pluses: empirical) at various temporal lags: (a) $\tau=0$; (b) $\tau=1$; (c) $\tau=2$; (d) $\tau=3$. The solid curve is the stationary correlation function fitted by [...]

[...] transformation to stabilize the variance over stations and time periods, and subtract an estimated seasonal effect and spatially varying mean from the data. The data set initially included one additional station, but it was omitted because its spatial correlation structure with respect to the other stations was not consistent with the stationarity assumptions (see Haslett and Raftery, 1989). Our analysis below indicates that some spatial stationarity assumptions are still being violated by the remaining stations.

Gneiting (2002) provided evidence that the assumption of full symmetry of the spatio-temporal correlation structure is not realistic, because wind patterns are predominantly westerly over Ireland. We incorporate this physical information about general wind directions in our model by defining a special ordering of the stations. Specifically, for each station $s$, we define an ordering of the stations to the west of $s$ in ascending order of distances, followed by the stations to the east of $s$. This ordering reflects the pattern of Irish winds to blow mainly from west to east. Note that this ordering does not imply any assumption of stationarity or even isotropy, but helps in improving the selection of a model. An inappropriate ordering would only imply that fewer coefficients in the matrices $R_i$ of model (1) would be identified as zeros, thereby implying larger prediction errors (see Section 4.2).

Next, we fit our model (1) to the modified data above. We use the BIC model selection criterion to identify a model for each station separately. The maximum temporal lag allowed for was three to make our results comparable to those in Gneiting (2002), and because univariate AR analyses of the different time series confirmed that higher temporal lags were not necessary. The resulting model is a VAR(3), where each matrix $R_1$, $R_2$, and $R_3$ is of dimension $11 \times 11$. The covariance matrix $\Sigma_\varepsilon$ is estimated from the residuals of the model and yields estimates of the matrices $\Gamma_z(\tau)$, $\tau = 0,1,2,3$, from straightforward formulae, see Lütkepohl (1991, p. 23-24). The estimated spatial correlation matrices corresponding to $\Gamma_z(\tau)$ from our model are plotted in Figure 5 as open circles for $\tau = 0,1,2,3$. The empirical spatial correlations are plotted as pluses along with the fitted stationary correlation functions (solid curve) proposed by Gneiting (2002). The estimated correlations from our model (open circles) are very close to the empirical correlations (pluses), indicating that the spatial VAR(3) model can capture the correlation structure well. Although the fitted stationary correlation function summarizes the correlation structure rather well at the temporal lag $\tau=0$ (Figure 5, panel (a)), this is less so [...]

[Six panels (a) to (f); x-axis: distance (m), 0 to 400000; y-axis: correlation.]

Figure 6: Panels (a)-(c): Spatial correlation (open circles: fitted VAR(3); pluses: empirical) as a function of distance (in meters) for the Irish wind data set at temporal lag $\tau=0$, for the directions WE (a), NS (b), and SW-NE (c). Panels (d)-(f): Spatial correlation from the fitted VAR(3) model at temporal lag $\tau=1$, for the directions WE, NS, and SW-NE (closed circles in panels (d), (e) and (f) respectively) and EW, SN, and NE-SW directions (open circles in panels (d), (e) and (f) respectively). In the right-hand-side panels empirical correlations are omitted [...]

[...] structure of the Irish wind data. For this purpose, we consider three main directions: horizontal west-east (WE), vertical north-south (NS), and diagonal (SW-NE). Panels (a)-(c) in Figure 6 depict the spatial correlation at temporal lag $\tau=0$ in the WE, NS, and SW-NE directions respectively. Open circles indicate the correlations from our fitted VAR(3) model, pluses indicate empirical correlations, and the solid curve is Gneiting's (2002) fit. We can see that the fitted correlation function (solid line) represents the correlation structure rather well in each of the three directions. Thus, the assumption of isotropy at temporal lag $\tau=0$ seems adequate.

Panels (d)-(f) in Figure 6 depict the spatial correlation from our fitted VAR(3) model at temporal lag $\tau=1$ in the WE, NS, and SW-NE directions (closed circles) and EW, SN, and NE-SW directions (open circles) respectively. We can see that although the correlations are rather symmetric in the NS and SN directions (panel (e)), there are strong asymmetries in the WE and EW directions and in the SW-NE and NE-SW directions. This asymmetric behavior, which is caught by the VAR(3) specification, cannot be modeled by a single correlation function (solid line).

4 Prediction performance analysis

In Section 2 we have proposed a strategy to identify a spatio-temporal model. Model selection strategies are seldom unique and it is necessary to evaluate them. The availability of different model selection strategies may be due to the availability of different classes of models. Even for a given family of models, different model strategies may be available, which are optimal under different circumstances; see de Luna and Skouras (2003). In the latter paper, a framework was advocated to compare model selection strategies through out-of-sample validation based on recursive prediction errors. This can be adapted to the spatio-temporal data studied herein. For a given station located at $s$, a model selection strategy can be evaluated with the accumulated prediction error criterion

$$\sum_{t=M}^{T} \big(z(s,t) - \hat z_{t-1}(s,t)\big)^2, \qquad (4)$$

where $\hat z_{t-1}(s,t)$ is the prediction of $z(s,t)$ obtained by applying the model selection strategy on the sub-sample $z_1,\ldots,z_{t-1}$. That is, a model is chosen and fitted to all sub-samples $t = M,\ldots,T$. The prediction errors $z(s,t) - \hat z_{t-1}(s,t)$ are also called recursive residuals. Other loss [...]

[...] strategies: AR models chosen with BIC and AIC (AR-BIC and AR-AIC respectively) and spatio-temporal autoregressions chosen with BIC and AIC (SVAR-BIC and SVAR-AIC respectively). The maximum temporal lag allowed for was six. The smallest root accumulated mean squared errors per station are represented in boldface.

strategy     AR-BIC   SVAR-BIC   AR-AIC   SVAR-AIC
Station 1    0.371    0.380      0.371    0.382
Station 3    0.480    0.466      0.480    0.494

[...] for several model selection strategies, possibly based on different model sets. In de Luna and Skouras (2003, Theorem 1) it was shown that choosing the model strategy minimizing (4) would eventually identify the best strategy with probability one.
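Criterion (4) amounts to a loop in which the whole strategy (model choice and fitting) is re-run on each growing sub-sample before predicting the next value. The sketch below shows this for a single series; it is a simplified illustration, not the authors' code. The names `accumulated_prediction_error` and `ar1_least_squares` are hypothetical, and the AR(1) strategy merely stands in for AR-BIC, SVAR-BIC and the other strategies compared in this section.

```python
import numpy as np

def accumulated_prediction_error(z, M, fit_predict):
    """Criterion (4) for one series z_1..z_T.

    fit_predict(history) must return the one-step-ahead prediction of the
    next value using only `history` (the sub-sample z_1..z_{t-1}).
    """
    z = np.asarray(z, dtype=float)
    total = 0.0
    for t in range(M, len(z) + 1):         # t is 1-based, as in (4)
        pred = fit_predict(z[:t - 1])      # strategy sees z_1..z_{t-1} only
        total += (z[t - 1] - pred) ** 2    # squared recursive residual
    return total

def ar1_least_squares(history):
    """Hypothetical strategy: least squares AR(1) with intercept.

    Requires at least three observations in `history`.
    """
    y, x = history[1:], history[:-1]
    C = np.column_stack([np.ones(len(x)), x])
    a, b = np.linalg.lstsq(C, y, rcond=None)[0]
    return a + b * history[-1]
```

In the spatio-temporal setting, `fit_predict` would receive the lagged values at all stations and apply the identification strategy of Section 2.3 for station $s$ at every step; comparing the resulting totals across strategies is what Tables 3 and 4 report on the root scale.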

We now apply this out-of-sample framework to the two data sets studied earlier in this paper. This allows us to evaluate model selection strategies associated with the VAR models and to compare them with a benchmark univariate time series modeling.

4.1 Carbon monoxide in Venice

The carbon monoxide data was described in Section 2.4, where we used BIC to identify a univariate time series model and a spatio-temporal model. These models need to be evaluated in order to decide which is most appropriate. We evaluate them here together with two other model selection strategies. We use both BIC and AIC to select a model within the AR family (univariate time series models) and the VAR family with spatial structure. Only stations 1 and 3 are considered because the models in Tables 1 and 2 were equivalent for stations 2 and 4.

From Table 3, we note that BIC provides the best prediction performance. Moreover, for station 3, the univariate time series model was outperformed by the model with spatial structure found with BIC. On the other hand, for station 1 the univariate time series model has lowest accumulated prediction error indicating that the spatial structure found with BIC


The Irish wind data set was described in Section 3, where stationarity hypotheses were investigated. The spatial structure (ordering of the stations) of the VAR models used was defined by taking into account that winds are predominantly westerly over Ireland. The spatial ordering of the stations is part of the model specification and can also be investigated with a prediction performance analysis. We, therefore, consider here six model identification strategies based on different families of models. As in the previous section, two strategies are based on univariate time series models: AR-BIC and AR-AIC. Moreover, AIC and BIC are used to select between two families of VAR models having different spatial structure: the first family uses an ordering of stations (3) defined solely on ascending distances between stations (SVAR-BIC and SVAR-AIC with dist. order), while the second family is based on an ordering which takes into account the information on wind directions as described in Section 3 (SVAR-BIC and SVAR-AIC with wind order).
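The distance-based ordering can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the station labels and planar coordinates are invented, Euclidean distance stands in for whatever distance the authors used, and `distance_order` is a hypothetical helper. The wind-based ordering of Section 3 is not reproduced here.

```python
import math

def distance_order(coords, target):
    """Return the other stations sorted by ascending Euclidean distance to `target`."""
    tx, ty = coords[target]
    others = [s for s in coords if s != target]
    return sorted(others, key=lambda s: math.hypot(coords[s][0] - tx, coords[s][1] - ty))

# Hypothetical station coordinates, for illustration only.
coords = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (1.0, 0.0), "D": (-2.0, 0.0)}
order = distance_order(coords, "A")  # distances from A: C = 1, D = 2, B = 5
```

Each target station receives its own ordering, so the neighbours entering its equation first are those closest to it.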

The root accumulated mean squared errors (square root of (4)) are presented in Table 4 for the eleven stations and the six model selection strategies. Two main general comments can be given. Firstly, the spatio-temporal models outperform the univariate time series models. Moreover, the differences in prediction performances between the different spatio-temporal modeling strategies are much less important than the differences observed between the spatio-temporal modeling and the univariate time series strategies. Note, however, that AIC performs better than BIC in many cases and that the models whose spatial structure takes into account the predominant wind direction (wind order) have as good or better prediction performance than the models ignoring this physical information (dist order) in all but one case (station 3).

The out-of-sample prediction performances summarized in Table 4 may be analyzed in more detail by displaying the accumulated prediction errors graphically. More precisely, for two strategies $S_1$ and $S_2$ it is informative to plot the sums

$$\sum_{t=M}^{i} \Bigl[ \bigl( z(s,t) - \hat{z}_{t-1}(s,t) \bigr)^2 - \bigl( z(s,t) - \tilde{z}_{t-1}(s,t) \bigr)^2 \Bigr] \qquad (5)$$

against $i = M, \ldots, T$, where $\hat{z}_{t-1}$ and $\tilde{z}_{t-1}$ are obtained with $S_1$ and $S_2$, respectively. Such cumulative sum plots allow us to understand whether a strategy which has best root accumulated mean squared error is really performing better recursively through time. This is illustrated in Figure 7 where for station 3 (Kilkenny), the best strategy SVAR-AIC (dist order)


six strategies: AR models chosen with BIC and AIC (AR-BIC and AR-AIC respectively) and spatio-temporal autoregressions chosen with BIC and AIC (SVAR-BIC and SVAR-AIC respectively). For the latter, two different orderings of the stations are examined: one based solely on distances between stations (dist order) and the other taking also into account the dominating wind direction over Ireland (wind order). The maximum temporal lag allowed for was three, see Section 3. The smallest root accumulated mean squared errors per station are represented in boldface.

Station             AR-BIC   AR-AIC   SVAR-BIC     SVAR-BIC     SVAR-AIC     SVAR-AIC
                                      dist order   wind order   dist order   wind order

Roche's Pt. (1)     0.492    0.492    0.474        0.473        0.473        0.473
Valentia (2)        0.503    0.503    0.498        0.498        0.499        0.499
Kilkenny (3)        0.446    0.446    0.424        0.424        0.423        0.424
Shannon (4)         0.461    0.460    0.455        0.454        0.452        0.452
Birr (5)            0.481    0.480    0.470        0.470        0.469        0.468
Dublin (6)          0.461    0.461    0.445        0.445        0.444        0.444
Claremorris (7)     0.488    0.488    0.485        0.485        0.483        0.482
Mullingar (8)       0.446    0.445    0.433        0.433        0.433        0.433
Clones (9)          0.477    0.477    0.467        0.468        0.467        0.467
Belmullet (10)      0.490    0.490    0.491        0.490        0.489        0.489
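The comparison between univariate and spatio-temporal strategies can be checked directly from the entries of Table 4. The short Python sketch below uses only the ten rows that appear in the table above. The column order (AR-BIC, AR-AIC, SVAR-BIC dist, SVAR-BIC wind, SVAR-AIC dist, SVAR-AIC wind) is read off the table header, and the name `gain` is a hypothetical label for the difference between the best univariate and the best spatio-temporal entry.

```python
# Rows of Table 4: (AR-BIC, AR-AIC, SVAR-BIC dist, SVAR-BIC wind, SVAR-AIC dist, SVAR-AIC wind)
table4 = {
    "Roche's Pt. (1)": (0.492, 0.492, 0.474, 0.473, 0.473, 0.473),
    "Valentia (2)":    (0.503, 0.503, 0.498, 0.498, 0.499, 0.499),
    "Kilkenny (3)":    (0.446, 0.446, 0.424, 0.424, 0.423, 0.424),
    "Shannon (4)":     (0.461, 0.460, 0.455, 0.454, 0.452, 0.452),
    "Birr (5)":        (0.481, 0.480, 0.470, 0.470, 0.469, 0.468),
    "Dublin (6)":      (0.461, 0.461, 0.445, 0.445, 0.444, 0.444),
    "Claremorris (7)": (0.488, 0.488, 0.485, 0.485, 0.483, 0.482),
    "Mullingar (8)":   (0.446, 0.445, 0.433, 0.433, 0.433, 0.433),
    "Clones (9)":      (0.477, 0.477, 0.467, 0.468, 0.467, 0.467),
    "Belmullet (10)":  (0.490, 0.490, 0.491, 0.490, 0.489, 0.489),
}

# Best univariate entry minus best spatio-temporal entry, per station.
gain = {name: min(row[:2]) - min(row[2:]) for name, row in table4.items()}

# Number of stations where some spatio-temporal strategy beats both AR strategies.
svar_better = sum(g > 0 for g in gain.values())

# Stations where the spatial structure helps least.
smallest_two = sorted(gain, key=gain.get)[:2]
```

In these rows the best spatio-temporal entry is smaller than the best univariate entry at every station, and the two smallest gains occur at Belmullet (10) and Valentia (2).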


time series models. The difference in performance when comparing spatio-temporal strategies to each other is much less conclusive, indicating that the different spatio-temporal strategies provide similar prediction performances.

The patterns observed in Figure 7 could also be observed for the other stations although we do not include the graphs. Notable exceptions were stations 2 and 10, for which univariate time series and spatio-temporal strategies had similar performances. This can be explained by the fact that these two stations are located on the west coast. Winds coming predominantly from this direction make these stations fairly independent from winds measured at stations located eastwards.
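The cumulative sums (5) are straightforward to compute once the one-step-ahead predictions of two strategies are stored. The following Python sketch uses small invented arrays, and the function name `cumulative_loss_difference` is hypothetical.

```python
import numpy as np

def cumulative_loss_difference(z, pred_s1, pred_s2):
    """Partial sums of (z - pred_s1)^2 - (z - pred_s2)^2, one value per i as in (5)."""
    d = (z - pred_s1) ** 2 - (z - pred_s2) ** 2
    return np.cumsum(d)

# Toy observations and predictions (invented for illustration).
z = np.array([1.0, 2.0, 3.0, 4.0])
pred_s1 = np.array([0.0, 2.0, 5.0, 4.0])  # squared errors of S1: 1, 0, 4, 0
pred_s2 = np.array([1.0, 1.0, 3.0, 3.0])  # squared errors of S2: 0, 1, 0, 1
curve = cumulative_loss_difference(z, pred_s1, pred_s2)
```

A rising curve indicates that the squared errors of the first strategy exceed those of the second over that stretch of time, which is the reading used for Figure 7.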

[Figure 7, four panels: AR-AIC vs SVAR-AIC (dist order); SVAR-BIC (dist order) vs SVAR-AIC (dist order); SVAR-BIC (wind order) vs SVAR-AIC (dist order); SVAR-AIC (wind order) vs SVAR-AIC (dist order). Horizontal axes run from 0 to 5000.]

Figure 7: The graphs plot the differences in accumulated squared prediction errors (5) between the two model selection strategies (given in titles) for station 3 (Kilkenny). Upward trends mean that the second strategy is performing better while downward trends show that the first


In this paper, we have advocated the use of VAR models when the available spatio-temporal data is rich in the time dimension but sparse in the spatial dimension, and when the purpose of the analysis is to provide time-forward predictions at the spatial location where historical data is available. VAR modeling is well understood and widely applied in many disciplines. Our main contribution is to propose a model identification strategy which takes advantage of the spatial location of the different time series.

We believe that the proposed modeling strategy will find wide applicability in the environmental sciences where time-forward prediction and monitoring of spatio-temporal processes is a major activity. This wide applicability should be eased by the fact that the models and identification strategy (which can be automated) are both easy to implement and general. The simplicity in implementation is a consequence of the consideration of linear models which are nested with respect to a natural spatio-temporal hierarchy. Splus codes are available from the authors upon request. The generality of the method is implied by treating each spatial location separately in the modeling process, thereby avoiding restrictive and often difficult-to-verify spatial-stationarity assumptions.
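To illustrate why treating each location separately keeps the implementation simple, the sketch below fits one linear equation for a single station by ordinary least squares, using its own lags and the lags of neighbouring stations. This is an assumption-laden illustration in Python, not the authors' Splus code: the function `station_design`, the two-station simulated data and the coefficient values are invented, and the paper's exact nesting hierarchy is not reproduced.

```python
import numpy as np

def station_design(Z, target, neighbours, p):
    """Lagged regressors for one station.

    Z has shape (T, n_stations). Columns of the design matrix are an intercept,
    then lags 1..p of the target series, then lags 1..p of each neighbour,
    in the order in which the neighbours are listed.
    """
    T = Z.shape[0]
    cols = [np.ones(T - p)]
    for s in [target] + list(neighbours):
        for k in range(1, p + 1):
            cols.append(Z[p - k:T - k, s])
    return np.column_stack(cols), Z[p:, target]

# Simulated data: station 0 depends on its own past and on the past of station 1.
rng = np.random.default_rng(1)
T = 2000
Z = np.zeros((T, 2))
for t in range(1, T):
    Z[t, 1] = 0.5 * Z[t - 1, 1] + rng.standard_normal()
    Z[t, 0] = 0.3 * Z[t - 1, 0] + 0.4 * Z[t - 1, 1] + rng.standard_normal()

X, y = station_design(Z, target=0, neighbours=[1], p=1)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # intercept, own lag 1, neighbour lag 1
```

Candidate models for a station are then obtained by varying the number of neighbours and lags included, and an information criterion or criterion (4) can be used to choose among them.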

We have aimed at showing both the simplicity and generality of the method with two real data sets. For instance, the estimation of temporal trends has been done with harmonic functions of time. As we mentioned in Section 2.3, our modeling strategy may readily be used together with other trend estimation techniques such as those using measurements on other variables when these are available.
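Trend estimation with harmonic functions of time amounts to a linear regression on sine and cosine terms. The Python sketch below is illustrative only: the period of 24 time units, the single harmonic, the simulated series and the helper name `harmonic_design` are assumptions, not the settings used for the two data sets.

```python
import numpy as np

def harmonic_design(t, period, n_harmonics):
    """Design matrix with an intercept and cos/sin pairs at multiples of 1/period."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)

# Simulated series with a known periodic trend plus noise (for illustration only).
t = np.arange(200, dtype=float)
true_trend = 2.0 + 1.5 * np.cos(2 * np.pi * t / 24) - 0.5 * np.sin(2 * np.pi * t / 24)
rng = np.random.default_rng(0)
z = true_trend + 0.1 * rng.standard_normal(t.size)

X = harmonic_design(t, period=24.0, n_harmonics=1)
beta, *_ = np.linalg.lstsq(X, z, rcond=None)  # intercept, cosine and sine coefficients
detrended = z - X @ beta                      # residual series passed on to AR or VAR modeling
```

The detrended series is what the autoregressive models would then be fitted to in this sketch.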

The modeling framework is, moreover, open to further generalizations, although at the cost of its simplicity and probably robustness. For instance, non-linear autoregressive models may be entertained. This would be much more difficult to do in the context of spatial-stationary processes. State-space representations could also be considered in a similar setting allowing us to deal optimally with missing observations at certain locations and times.

Acknowledgments


Akaike, H. (1974), "A new look at the statistical model identification," IEEE Transactions on Automatic Control, 19, 716-723.

Alexiadis, M.C., Dokopoulos, P.S., Sahsamanoglou, H.S. (1999), "Wind speed and power forecasting based on spatial correlation models," IEEE Transactions on Energy Conversion, 14, 836-842.

Box, G.E.P., Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Dagum, C., Dagum, E.B. (1988), "Trend," in Encyclopedia of Statistical Sciences, S. Kotz, N.L. Johnson, eds., New York: Wiley & Sons, 321-324.

de Luna, X., Genton, M.G. (2002), "Simulation-based inference for simultaneous processes on regular lattices," Statistics and Computing, 12, 125-134.

de Luna, X., Skouras, K. (2003), "Selecting a model selection strategy," Scandinavian Journal of Statistics, 30, 113-128.

Gneiting, T. (2002), "Nonseparable, stationary covariance functions for space-time data," Journal of the American Statistical Association, 97, 590-600.

Guttorp, P., Meiring, W., Sampson, P. (1994), "A space-time analysis of ground-level ozone data," Environmetrics, 5, 241-254.

Haslett, J., Raftery, A.E. (1989), "Space-time modeling with long-memory dependence: assessing Ireland's wind power resource," Applied Statistics, 38, 1-50.

Host, G., Omre, H., Switzer, P. (1995), "Spatial interpolation errors for monitoring data," Journal of the American Statistical Association, 90, 853-861.

Krzanowski, W.J. (1988), Principles of Multivariate Analysis: A User's Perspective, Oxford: Oxford University Press.

Kyriakidis, P.C., Journel, A.G. (1999), "Geostatistical space-time models: a review," Mathematical Geology, 31, 651-684.


Journal of Time Series Analysis, 21, 663-684.

Matheron, G. (1973), "The intrinsic random functions and their applications," Advances in Applied Probability, 5, 439-468.

Mardia, K.V., Goodall, C.R. (1993), "Factorized models in spatial modeling," in Multivariate Environmental Statistics, N.K. Bose, G.P. Patil, C.R. Rao, eds., Amsterdam: North Holland.

Peña, D., Tiao, G.C., Tsay, R.S. (2001), A Course in Time Series Analysis, Wiley.

Pfeifer, P.E., Deutsch, S.J. (1980), "A three-stage iterative procedure for space-time modeling," Technometrics, 22, 35-47.

Schwarz, G. (1978), "Estimating the dimension of a model," The Annals of Statistics, 6, 461-464.

Stoffer, D.S. (1986), "Estimation and identification of space-time ARMAX models in the presence of missing data," Journal of the American Statistical Association, 81, 762-772.

Stroud, J.R., Müller, P., Sansó, B. (2001), "Dynamic models for spatiotemporal data," Journal of the Royal Statistical Society, Series B, 63, 673-689.

Tonellato, S.F. (2001), "A multivariate time series model for the analysis and prediction of carbon monoxide atmospheric concentrations," Applied Statistics, 50, 187-200.

Wikle, C.K., Cressie, N. (1999), "A dimension reduction approach to space-time Kalman
