• No results found

Forecasting of air quality in Delhi using principal component regression technique

N/A
N/A
Protected

Academic year: 2021

Share "Forecasting of air quality in Delhi using principal component regression technique"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

Atmospheric Pollution Research

www.atmospolres.com

Forecasting

of

air

quality

in

Delhi

using

principal

component

regression

technique

Anikender

Kumar,

Pramila

Goyal

CentreforAtmosphericSciences,IndianInstituteofTechnologyDelhi,HauzKhas,NewDelhiͲ110016India

ABSTRACT

Overthepastdecade,anincreasinginteresthasevolvedbythepublicintheday–to–dayairqualityconditionsto whichtheyareexposed.Drivenbytheincreasingawarenessofthehealthaspectsofairpollutionexposure, especiallybymostsensitivesub–populationssuchaschildrenandtheelderly,short–termairpollutionforecasts arebeingprovidedmoreandmorebylocalauthorities.TheAirQualityIndex(AQI)isanumberusedby governmentalagenciestocharacterizethequalityoftheairatagivenlocation.AQIisusedforlocalandregional airqualitymanagementinmanymetropolitancitiesoftheworld.Themainobjectiveofthepresentstudyisto forecastshort–termdailyAQIthroughpreviousday’sAQIandmeteorologicalvariablesusingprincipal componentregression(PCR)technique.Thisstudyhasbeenmadeforfourdifferentseasonsnamelysummer, monsoon,postmonsoonandwinter.AQIwasestimatedfortheperiodofsevenyearsfrom2000–2006atITO(a busiesttrafficintersection)forcriteriapollutantssuchasrespirablesuspendedparticulatematter(RSPM),sulfur dioxide(SO2),nitrogendioxide(NO2)andsuspendedparticulatematter(SPM)usingamethodofUS

EnvironmentalProtectionAgency(USEPA),inwhichsub–indexandbreakpointpollutantconcentrationdepends onIndianNationalAmbientAirQualityStandard(NAAQS).ThePrincipalcomponentshavebeencomputedusing covarianceofinputdatamatrix.Onlythosecomponents,havingeigenvaluesш1,wereusedtopredicttheAQI usingprincipalcomponentregressiontechnique.TheperformanceofPCRmodel,usedforforecastingofAQI, wasbetterinwinterthantheotherseasonsasstudiedthroughstatisticalerroranalysis.Thevaluesof normalizedmeansquareerror(NMSE)werefoundas0.0058,0.0082,0.0241and0.0418forwinter,summer, postmonsoonandmonsoonrespectively.Theotherstatisticalparametersarealsosupportingthesameresult.

Keywords: Forecasting Principalcomponentregression(PCR) Airqualityindex(AQI) Meteorologicalvariables Pollutants ArticleHistory: Received:03November2010 Revised:02February2011 Accepted:08February2011 CorrespondingAuthor: PramilaGoyal Tel:

+91Ͳ11Ͳ26591309 Fax:

+91Ͳ11Ͳ26591386 EͲmail:

[email protected] ©Author(s)2011.ThisworkisdistributedundertheCreativeCommonsAttribution3.0License. doi:10.5094/APR.2011.050

1.

Introduction

Withcontinuousdevelopmentandincreaseofpopulationin

theurbanareas,aseriesofproblemsrelatedtoenvironmentsuch

asdeforestation,releaseoftoxicmaterials,solidwastedisposals,

airpollutionandmanymore,haveattractedattentionmuch

greaterthaneverbefore.Theproblemofairpollutionincitieshas

becomesoseverethatthereisaneedfortimelyinformationabout

changesinthepollutionlevel.Todayforecastingofairqualityis

oneofthemajortopicsofairpollutionstudiesduetothehealth

effectscausedbytheseairbornepollutantsinurbanareasduring

pollution episodes. Therefore, the development of effective

forecastingmodelsofAQIformajorairpollutantsinurbanareasis

ofprimeimportance.Withthisendinview,thereisaneedtohave

amodelthatwouldgeneratethefutureAQI.Althoughmany

forecastingmodelsexistandsomeareinuse,thereisstillneedfor

developingmoreaccuratemodels.TheGaussiandispersionmodels

aregenerallyusedinmostoftheairpollutionstudies.Eventhough

themodelshavesomephysicalbasis,detailedinformationabout

thesourceofthepollutantsandothervariablesaregenerallynot

known.Inordertoovercometheselimitations,statisticalmodels

areused,whichfacilitatethepredictionofpollutantconcen– trations(FinziandTebaldi,1982;Ziomassetal.,1995;Polydoraset

al.,1998).

Numerousstudiesbasedonthestatisticalmodelshavebeen

carriedoutindifferentregionstoidentifylocalmeteorological

conditions,moststronglyassociatedwithairpollutantconcen– trations,andtoforecasttheirvalues(McCollisterandWillson,

1975;AronandAron,1978;Lin,1982;Aron,1984;Katsoulis,1988;

RobesonandSteyn,1990).Manyofthepreviousstudies(Sanchez

et.al.,1990;Mantisetal.,1992;MilionisandDavies,1994)

analyzed the meteorological conditions associated with high

pollutant concentrations. These studies usually produced

qualitativeorsemi–quantitativeresultsandshedalightonthe

relationbetweenthemeteorological conditionsandpollutant

concentrations.ShiandHarrison(1997)developedaregression

modelforthepredictionofNOxandNO2inLondon.Somenon–

linearmodelsi.e.,ArtificialNeuralNetworkscanalsobeusedto

forecastthepollutantconcentrations(Boznaretal.,1993;Comrie,

1997).

Asforthehealthimpactofairpollutants,AQIisanimportant

indicatorforgeneralpublictounderstandeasilyhowbadorgood

theairqualityisfortheirhealthandtoassistindatainterpretation

for decisionmaking processes relatedtopollutionmitigation

measuresandenvironmentalmanagement.Basically,theAQIis

definedasanindexorratingscaleforreportingdailycombined

effectofambientairpollutantsrecordedinthemonitoringsites.

Recently,VandenElshoutetal.(2008)gaveareviewofexistingair

qualityindicesandaproposalofacommonalternative.Fuzzy

inferencesystemshavealsousedinmodelingofairqualityindices

byHajekandOlej(2009).Aregressionmodelwasalsousedby

Cogliani(2001)forairpollutionforecastincitiesbyanairpollution

indexhighlycorrelatedwithmeteorologicalvariables.However,

whenmulticollinearityispresent,thecomputationsofregression

coefficients in regression models become dubious. Principal

(2)

limitation.PCAisalsoaproceduretoreducethenumberof

variables.Itisusefulwhenobtaineddatahasanumberofvariables

(possiblyalargenumberofvariables),andbelievedthatthereis

somevariablesthosearecorrelatedwithoneanother.Sanchezet

al.(1986)usedtheprincipalcomponentfactoranalysisforstudying

thespatialandtemporaldistributionofSO2inanurbanarea.The

PCR technique was also used to forecast the long–range

forecastingofSouthwestmonsoonrainfalloverIndia(Rajeevanet

al.,2005).ThemostoftheairqualityforecastinginDelhihasbeen

donethroughindividualairpollutantswhereasthepresentstudy

wasconductedusingprincipalcomponentregressiontechnique

withrespecttodailyAQI.ThePCRmodelisusedinthepresent

studytoforecastthedailyairqualityindexonedayinadvance.

1.1.Descriptionandmeteorologyofstudyarea

TheDelhicity(Latitude28°35'N,Longitude77°12'E)islocated

inthenorthernpartofIndiaandsituatedbetweentheGreatIndian

Desert(TharDesert)ofRajasthantothewest,thecentralhot

plainstothesouthandthecoolerhillyregiontothenorthand

east.Delhihasasemi–aridclimatewithanextremelyhotsummer,

average rainfall and very cold winter. Due to the worst

meteorologicalscenario,themostimportantseasoninDelhiis

winter,whichstartsinDecemberandendsinFebruary.Thisperiod

isdominatedbycold,dryairandground–basedinversionwithlow

windconditions(ud1ms–1),whichoccurfrequentlyandincreases

theconcentrationofpollutants(Anfossietal.,1990).Thesummer

(March,April,May)isgovernedbyhightemperatureandhigh

winds,whilethemonsoon(June,July,August)isdominatedby

rainsandpost–monsoon(September,October,November)has

moderatetemperatureandwindconditions.

Delhi,thecapitalcityofIndiawith13.8millioninhabitants

spreadover1483km2(Anejaetal.,2001).Duetothepresenceof

large number of industries and migration of people from

neighboringstates,nearly5.4millionvehiclesarerunningonDelhi

roads.Theemissionofpollutantsfromthesesourcesdeteriorates

theambientairquality.Thesteepincreaseinvehicularpopulation

(majorsourceofairpollution)hasresultedincorresponding

increaseinpollutantsemittedbythesevehicles.Presently,more

than1300tonsofpollutantsareemittedbythevehiclesinDelhi.

Duetotheincreasedlevelofpollutants,Delhi’sairisblamedfor

40%ofemergencyhospitaladmissionsofpatientswithbreathing

and heartcomplaints.Theambientair qualitydataof Delhi

monitoredbyCentralPollutionControlBoard(CPCB)showsvery

highvaluesofsuspendedparticles,SO2andNOxwhichhavebeen

beyondthepermissiblelimitsfromlastseveralyearscontinuously

(GoyalandSidhartha,2003).AllIndiaInstituteofMedicalSciences

(AIIMS)reportsthattherewasamassive900%increaseinasthma

casesinDecember1999comparedtoDecember1998.Studyby

BrandonandHommann(1995),byusingthestandardUSmetric,

estimatedthatthe7490deathscouldbeavoidedinDelhibya

141.6μgm–3reductioninPM10.One,outofevery10school

childreninthecity,suffersfromasthmathatisworseningdueto

vehicularairpollution.

TheprimaryobjectiveofthisstudyistoforecastthedailyAQI

onedayinadvanceonseasonalbasis.Abusiesttrafficintersection

ITO has been chosen to forecast using the air pollutant

concentrationsmonitoredatITO,sinceitisacontinuousairquality

monitoring station at the same place. The daily air quality

parameters(dailyaverageconcentrationsofpollutants)namely

RSPM,SO2,NO2andSPMusedinthepresentstudyweremeasured

byCPCB,aregulatorymonitoringagencyinDelhi.Thelocationsof

monitoringstationswerecategorizedonalandusebasis(CPCB,

2005)i.e.,residential,industrialandtrafficintersections.The

stationthatisclassifiedastrafficintersectionisITO.Inthisstudy,

AQIwascalculatedusingUSEPAmethodinwhichsub–indexand

breakpointpollutantconcentrationsdependonIndianNAAQSand

aPCRtechniquewasalsousedtoforecasttheshorttermi.e.,daily

AQIthroughpreviousday’sAQIandmeteorologicalvariables.

The24–hourlyaveragedsurfacemeteorologicalvariablesat

Safdarjung airport like daily maximum temperature (tmax),

minimumtemperature(tmin),dailytemperaturerange(difference

between daily maximum and minimum temperature, trange),

averagetemperature(tavg),windspeed(wsp),winddirectionindex

(wdi),relativehumidity(rh),vaporpressure(vp),stationlevel

pressure(slp),rainfall(rf),sunshinehours(ssh),cloudcover(cc),

visibility(v)andradiation(rd)forDelhiwereacquiredfromthe

IndianMeteorologicalDepartment(IMD),Punefortheperiodof

2000–2006. There is no meteorological station at ITO. The

meteorologicalstationisatSafdarjungairportabout5.7kmfrom

theITOandthisistheonlystationinthestudyarea(ITO).Figure1

shows the air pollution monitoring (ITO) and meteorological

(Safdarjungairport)stationsontheMapofDelhi.

2.

Materials

and

Methods

ThereareprimarilytwostepsinvolvedinformulatinganAQI:

firsttheformationofsub–indicesofeachpollutant,secondthe

aggregation (breakpoints) of sub indices. Breakpoint concenͲ

trationsofeachpollutant,usedincalculationofAQI,arebasedon

IndianNAAQSandresultsofepidemiologicalstudiesindicatingthe

riskofadversehealtheffectsofspecificpollutants.Ithasbeen

noticedthatdifferentbreakpointconcentrationsanddifferentair

qualitystandardshavebeenreportedinliterature(Environmental

ProtectionAgency,1999).InIndia,toreflectthestatusofair

qualityanditseffectsonhumanhealth,therangeofindexvalues

hasbeendesignatedas“Good(0–100)”,“Moderate(101–200)”,

“Poor(201–300)”,“VeryPoor(301–400)”and“Severe(401–500)”

(Table1)(Nagendraetal.,2007).

AllthevaluesofSO2,NO2,RSPMandSPMareinμgm–3.

Theformula(EPA,1999)usedtocalculateAQIforfourcriteria

pollutantsRSPM,SO2,NO2andSPMfrom2000–2006isgiven

below:

Hi Lo

P P Lo Lo Hi Lo I I I C BP I BP BP ª º « » « » ¬ ¼ (1)

whereIPistheAQIforpollutant“p”,CPistheactualambient

concentrationofthepollutant“p”,BPHiisthebreakpointinTable1

thatisgreaterthanorequaltoCp,BPLoisthebreakpointinTable1

thatisless thanorequalto Cp,IHiisthesub index value

correspondingtoBPHi,and,ILoisthesubindexvaluecorresponding

toBPLo.

TheAQIisdeterminedonthebasisofAQIofstudypollutants

andthehighestamongthemisdeclaredastheoverallAQI.The

formulausedhereissameasusedbyUSEPA,inwhichsubͲindex

andbreakpointconcentrationdependsonIndianNAAQS.

2.1. Multiple Linear Regression and Principal Component

Regressionmodel

Aforecastcanbeexpressedasafunctionofacertainnumber

offactorsthatdetermineitsoutcome.Multiplelinearregression

(MLR)techniqueincludesonedependentvariabletobepredicted

andtwoormoreindependentvariables.Ingeneral,multiplelinear

regressioncanbeexpressedasinEquation(2): Y=b1+b2X2+…...+bkXk+e (2)

(3)

Figure1.MapofDelhiwithairpollutionmonitoring(ITO)andmeteorological(Safdarjungairport)stations(Source:http://www.mapmyindia.com/). Table1.ProposedsubͲindexandbreakpointpollutantconcentrationsforIndianͲAQI

SI.No. Indexvalues Descriptor SO2(24Ͳhavg.) NO2(24Ͳhavg.) RSPM(24Ͳhavg.) SPM(24Ͳhavg.)

1 0Ͳ100 Gooda 0Ͳ80 0Ͳ80 0Ͳ100 0Ͳ200 2 101Ͳ200 Moderateb 81Ͳ367 81Ͳ180 101Ͳ150 201Ͳ260 3 201Ͳ300 Poorc 368Ͳ786 181Ͳ564 151Ͳ350 261Ͳ400 4 301Ͳ400 VeryPoord 787Ͳ1572 565Ͳ1272 351Ͳ420 401Ͳ800 5 401Ͳ500 Severee >1572 >1272 >420 >800 a Good:Airqualityisacceptable;however,forsomepollutantstheremaybeamoderatehealthconcernforaverysmall numberofpeople. b Moderate:Membersofsensitivegroupsmayexperiencehealtheffects. c Poor:Membersofsensitivegroupsmayexperiencemoreserioushealtheffects. d Verypoor:Triggershealthalter,everyonemayexperiencemoreserioushealtheffects. e Severe:Triggershealthwarningsofemergencyconditions.

whereYisthe dependentvariable, X2,X3…..., Xkarethe

independent variables, b1, b2…...,bk are linear regression

parameters.Inthismodel,AQIisthedependentvariableand,

previousday’sAQIandmeteorologicalvariables,areindependent

variables,eisanestimatederrortermwhichisobtainedfrom

independentrandomsamplingfromthenormaldistributionwith

meanzeroandconstantvariance.Thetaskofregressionmodeling

is to estimatetheb1,b2...,bk,whichcanbedone using

minimumsquareerrortechnique.

Equation(2)canbewritteninthefollowingform:

Y=Xb+e ( (3) where Y= 1 2 : : n Y Y Y ª º « » « » « » « » « » « » ¬ ¼ ,X= 21 31 1 2 3 1 .... .... .. 1 .... k n n kn X X X X X X ª º « » « » « » ¬ ¼ ,b= 1 2 : : k b b b ª º « » « » « » « » « » « » ¬ ¼ ande= 1 2 : : n e e e ª º « » « » « » « » « » « » ¬ ¼ SoYisannx1,Xisannxk,bisakx1andeisannx1matrix.

Afterusingtheminimumsquareerrortechnique,thesolution

canbeobtainedasb=(X’X)Ͳ1(X’Y).Further,theF–testhasbeen

performedtodeterminewhetherarelationshipexistsbetweenthe

dependentvariableandtheregressors.Thet–testisperformedin

ordertodeterminethepotentialvalueofeachoftheregressor

variablesintheregressionmodel.Theresultingmodelcanbeused

topredictfutureobservations.

Whenmulticollinearity ispresent thecomputationof an

inversematrix(X’X)Ͳ1becomesdubious.PCAcanbeappliedto

overcome thislimitation. Itisusefulwhenlarge number of

variables are present, and also if there are some variables

correlatedwitheachother.

TheapplicationofPCAwithregressionmodelaimstoreduce

thecollinearityinthedatasetswhichleadstotheworstpredictions

andalsodeterminetherelevantindependentvariablesforthe

predictionofairpollutantconcentrations(Sousaetal.,2007).The

differencebetweenPCRandMLRismainlyduetoinputdata.PCR

modeltakesPCsofvariablesasinputdataandreducesthe

complexityduetolessnumberofinputvariables.

Computationofprincipalcomponents.Principalcomponentscan

becomputedbycovarianceofinputdatamatrix.Inthisstudy,the

covariance matrix of the initial data was considered. The

Safdarjung

ITO(Airqualitymonitoringstation) Safdarjung(Meteorologicalstation)

(4)

eigenvaluesofthecovariancematrix”C”areobtainedfromits

characteristicequation:

C ʄI 0 (4)

where,

Ȝ

istheeigenvalueandIistheidentitymatrix.

Foreacheigenvalue,anon–zerovectorecanbedefinedas:

C e ʄe (5)

wherethevectoreiscalledthecharacteristicvectororeigenvector

ofthecovariancematrixCassociatedwithits corresponding

eigenvalue.Theeigenvectorsderivedfromthecovariancematrix

represent the mutuallyorthogonal linearcombination of the

matrix.Theirassociatedeigenvaluerepresenttheamountoftotal

variance,whichisexplainedbyeachoftheeigenvectors.By

retainingonlythefirstfewpairsofeigenvalue–eigenvector,or

principalcomponents,asubstantialamountofthetotalvariance

canbe explainedwhileexplainingthehigher orderprincipal

componentswhichexplainminimalamountsofthetotalvariance

andcanbeviewedasnoise.VarianceexplainedbyithPCisgiven

by: i i n n ʄ Thevariance ʄ

¦

(6)

ThePCassociatedwiththegreatesteigenvalue,thefirstPC

(PC1), represents the linear combination of the variables

accountingforthemaximumtotalvariabilityinthedata.The

secondPCexplainsthemaximumvariabilitythatisnotaccounted

bythePC1andsoon.Allcomponentswitheigenvalues>1should

beretained.Therationalebehindthismethodisaneigenvalueof

1, represents amount of variance, explained by the original

variables,andcomponentsofeigenvalue<1explainlessvariance

thantheoriginalvariables.AftergettingthePC’s,theinitialdata

setistransformedintotheorthogonalsetbymultiplyingthe

eigenvectorstotheinitialdataset.Nowthistransformeddataset

isusedasinputtothemultiplelinearregressiontechnique.

0 1 1 2 2 n n

Y ɴ ɴ (PC )ɴ(PC ) ... ɴ (PC ) e (7)

where ɴ0,ɴ1, ɴ2……ɴn arethecoefficientsinthemodelequation.

Thecoefficientsofregressionmodelhavebeenestimatedusing

theleastsquaresmethod.Further,theF–testhasbeenperformed

todeterminewhetherarelationshipexistsbetweenthedependent

variableandtheregressors.Thet–testisperformedinorderto

determinethepotentialvalueofeachoftheregressorvariablesin

theregressionmodel.Theresultingmodelcanbeusedtopredict

futureobservations.

PrincipalComponentswerecomputedusingthedataforthe

years2000–2005andalsousedasaninputtoregressionmodelto

formPCRmodel.Thesameprocesswasadoptedforallfour

seasons.Thepreviousday’sAQIandmeteorologicalvariables(15

variables,asmentionedinSection1)fortheyears2000–2005were

usedastheinputtoPCRmodel.Thecovariancematrixofthegiven

inputisdetermined.ThePCshavebeendeterminedonthebasisof

thevarianceexplainedbytheeigenvaluesofthecovariancematrix.

Onlythoseprincipalcomponentswhoseeigenvaluesш1basedon

theanalysisof15variables,wereusedtoforecastonedailyair

qualityindex.TheapplicationofPCAwithregressionmodels

reducesthecollinearityofthedatasets,whichcanleadtoworst

predictions and also determines the relevant independent

variablesforthepredictionofAQI.ThearchitectureofthePCR

modeltoforecastAQIhasbeenshowninFigure2.Thedifference

between the PCR and MLR is due to the input variables.

Consequently,thenetworkarchitecturewillbelesscomplexinPCR

duetothedecreasednumberofinputvariables.

Oncethisprocesshasbeencompleted,theperformanceof

PCRmodelhasbeenvalidatedwithanindependentdata,observed

fortheyear2006,thathasbeentransformedtothenewdataset

usingthepreviouslydeterminedweightsofprincipalcomponents.

Itisimportanttomentionthat2006datawasnotusedtobuildthe

model.Theaccuracyofthemodelwasanalyzedthroughthe

statisticalparameters.

3.

Results

and

Discussion

TheforecastingofdailyAQI,basedonpreviousday’sAQIand

meteorologicalvariables,wasdoneusingMLRandPCRmodelon

theseasonalbasisfortheperiodof2000–2005andvalidated

throughthedailyAQIof2006.

The regression models for different seasons, summer,

monsoon,postmonsoonandwinterweredevelopedusingthe

MLRtechniqueonthebasisofdailydataof2000–2005usingthe

procedurediscussedinSection2.Theregressionequationsbased

onMLRtechniqueareobtainedasEquations(8),(9),(10)and(11)

for summer, monsoon, post monsoon and winter season,

respectively. Figure2.ArchitectureofPCRmodelfortheforecastingofAQI.

(5)

d 1 max [AQI] 131.26 0.503 [AQI ] 0.462 [rh] 1.689 [t ] 6.131 [cc] u u u u (8) d 1 min [AQI] 2493.64 0.599 [AQI ] 1.736 [rh] 22.89 [v] 3.44 [t ] u u u u (9) d 1 max [AQI] 361.33 0.537 [AQI ] 1.72 [slp] 1.67 [vp] 2.697 [ssh] 20.49[v] 1.49[t ] u u u u (10) d 1 [AQI] 1728.11 0.503 [AQI ] 15.60 [v] 7.98 [cc] 4.50 [wsp] 1.19[rh] 0.84[rf] u u u u (11)

Equations(8)–(11)showthatpreviousday’sairqualityindexis

thecommonvariableforallseasons.

ThedailyAQIoftheyear2006hasbeenforecastedusingthe

aboveequations,whichhasbeencomparedwithobservedAQIof

2006.ThestatisticalevaluationofforecastedandobservedAQI

valuesisshowninTable2.ResultsindicatedthattheMLRmodelis

performingsatisfactorilyinallseasonsandgivesbetterresultsin

winterwithrespecttotheNMSEandRootMeanSquareError

(RMSE).Itshowsaminordifferenceincoefficientofdetermination

comparedtotheotherseasons.Fractionalbiasshowsthatthe

modelisunder–predictinginsummer,postmonsoonandwinterin

trainingaswellasinvalidationandisover–predictinginmonsoon

season.

ThePCRmodelsforsummer,monsoon,postmonsoonand

winter,basedonthedailydatafortheyears2000–2005were

developedasdiscussedinSection2andwereanalyzedstatistically.

Asafirststep,datafortheyears2000–2005wasusedto

calculatethecovariancematrixforallfourseasons.Thepredictor

variablesweretransformedintoprincipalcomponentsthroughthe

eigenvaluematrixofvariablesthatwouldexplainmostofthetotal

variationinthedata.Table3representstheeigenvaluesand

amountofvariance,explainedbyeachprincipalcomponentwith

eigenvaluesш1.Restofthecomponentshavingeigenvalues<1,

explaininglessvariancethananyoforiginalvariableswereignored.

Table3alsoshowsthatonly5PCshaveeigenvaluesш1witha

cumulative variance of 69.98 in summer and 4 PCs have

eigenvaluesш1withcumulativevariancesof60.79,68.35and66.62

in monsoon, post monsoon and winter, respectively.

Communalitiesofeachoriginalvariableinallfourseasonsare

showninTable4,usingthefirst5PCsinsummerand4PCsin

monsoon,postmonsoonandwinterseasons.ThisTablereflects

thatthemostrelevantoriginalvariablesforPCRareaveragedaily

temperature,relativehumidity,dailyminimumtemperatureand

dailyaveragetemperatureinsummer,monsoon,postmonsoon

andwinter,respectively.

Table2.ComparisonofMLRmodelpredictedandobservedvaluesinyears2000Ͳ2005andyear2006

S.N. Season 2000Ͳ2005 2006

RMSE NMSE Coefficientof

determination

Fractional

Bias RMSE NMSE

Coefficientof determination Fractional Bias 1 Summer 35.62 0.0114 0.4983 0.0004 35.41 0.0112 0.5495 0.0509 2 Monsoon 53.18 0.0405 0.6362 Ͳ0.0003 56.53 0.0427 0.4203 Ͳ0.0022 3 PostMonsoon 40.63 0.0174 0.6680 0.0336 48.98 0.0260 0.5014 0.0309 4 Winter 40.03 0.1227 0.4590 0.0005 31.78 0.0080 0.3936 0.0446 Table3.EigenvaluesandexplainedvarianceofthecomputedPCsforsummer,monsoon,postmonsoonandwinterseasons Seasons Principal

Component Eigenvalue %ofVariance

Cumulative variance(%) Summer 1 4.7032 31.35 31.35 2 2.1172 14.11 45.47 3 1.5657 10.44 55.91 4 1.1039 7.36 63.27 5 1.0076 6.72 69.98 Monsoon 1 5.1592 34.39 34.39 2 1.5240 10.16 44.55 3 1.3698 9.13 53.69 4 1.0658 7.11 60.79 PostͲ monsoon 1 5.7652 38.43 38.43 2 2.2458 14.97 53.41 3 1.1746 7.83 61.24 4 1.0676 7.12 68.35 Winter 1 4.1107 27.40 27.40 2 2.7014 18.01 45.41 3 2.0011 13.34 58.75 4 1.1804 7.87 66.62

(6)

Table4.Communalitiesofeachoriginalvariableforsummer,monsoon, postͲmonsoonandwinterseasons

Variable Summer Monsoon PostͲ

Monsoon Winter AQIdͲ1 0.6370 0.6802 0.6958 0.7135 tavg 0.9597 0.8871 0.8585 0.9272 rh 0.7880 0.8949 0.7630 0.8267 vp 0.6849 0.5720 0.8899 0.7395 rf 0.5502 0.6015 0.2131 0.5305 wsp 0.7752 0.5598 0.7816 0.6079 wdi 0.7095 0.4124 0.6415 0.2377 rd 0.2247 0.2154 0.4218 0.4333 tmax 0.9463 0.8414 0.8516 0.9085 tmin 0.9503 0.7778 0.9116 0.8872 ssh 0.6525 0.6481 0.7016 0.7469 slp 0.7816 0.1948 0.4904 0.4580 v 0.5867 0.6419 0.6439 0.6962 cc 0.4232 0.4081 0.5845 0.4273 trange 0.8271 0.7825 0.8038 0.8526

The loadings (or coefficients) of each input variable

correspondingtoall5PCsinsummerand4PCsinmonsoon,post

monsoonandwinteraregiveninTablesS1,S2,S3andS4,

respectively(seetheSupportingMaterial,SM).Inthiscase,only5

newvariables(PCs)wereusedinsteadoforiginal15variablesin

summerand4newvariableswereusedfortheremainingthree

seasons.

ThePCRmodelsforallfourseasonsbasedonthetransferred

datafor2000–2005weredevelopedandanalyzedstatistically.The

T–test was used to test the significance of the variables.

Insignificant/statisticallyinvalidvariableswereremovedfromthe

modelequation.Itwasobservedthat3PCs(PC1,PC2andPC3)lied

in95%confidenceintervalandthesevariableswereretainedinthe

modelequations.ItwasalsoobservedthattwoPCs(PC1andPC2),

onePC(PC1)andtwoPCs(PC2andPC4)liedin95%confidence

intervalinmonsoon,postmonsoonandwinter,respectively.Thus,

theforecastingEquations(12,13,14,and15)usingPCRtechnique

forfourseasonsareshownbelow: [AQI] 450.53 0.773 PC1 1.040 PC2 0.969 PC3 u u u (12) [AQI] 257.59 1.639 PC1 0.398 PC2 u u (13) [AQI] 260.605 1.856 PC1 u (14) [AQI] 162.504 1.462 PC2 0.617 PC4 u u (15)

Figures3a,3b,3c,and3dshowgraphicalpresentationoffour

differentseasons. The coefficientofcorrelation(R) between

observedandforecastedvaluesfortheyears2000–2005were

foundas0.70,0.79,0.80and0.64insummer,monsoon,post

monsoonandwinter,respectively.ThedailyAQIoftheyear2006

wasforecastedusingtheEquations(12)–(15). (a) (b) (c) (d) Figure3.ComparisonofobservedandmodelpredictedvaluesofdailyAQIin(a)Summer,(b)Monsoon,

(c)PostMonsoonand(d)Winterseasonsduringtheyears2000Ͳ2005. 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 550 600 Days AQ I Observed Prediction 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 550 600 Days AQ I Ovserved Prediction 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 550 600 Days AQ I Observed Prediction 0 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 300 350 400 450 500 Days AQ I Observed Prediction

(7)

ThecomparisonofforecastedandobservedAQIvaluesforthe

year2006areshowninFigures 4a,4b,4c,and4dforsummer,

monsoon,postmonsoonandwinter,respectively.Figures3and4

showthatmaximumobservedAQIis497andtheminimumis48in

monsoonseasonof2003and2000respectively,whereasthe

predictedmaximumAQIis447andtheminimumis96.5in

monsoonfortheperiodsof2003and2000.Theobservedvalues

for2006werenotincludedinmodeldevelopmentforforecasting

AQIof2006.Figure4alsoshowsthatthereisonedayshifting

betweenthepredictedandobservedvaluesofAQI.Thereasonfor

thisshiftingmaybeduetotheuncertaintiesinvolvedintheair

qualitydatafortheyears2000–2005thatwasvalidatedwiththe

datafor2006.

Statisticalevaluationbetweenobservedandpredictedvalues

for2000–2005and2006wasmadefordifferentseasonsinTable5.

TheNMSEandcoefficientofdetermination(R2)arefoundas

(0.0082,0.5767)in summer, followedby(0.0418, 0.4225) in

monsoon;(0.0241,0.5155)inpostmonsoonand(0.0058,0.5625)

inwinterseasonsduring2006. ThisshowsthatforecastedAQI

could be explained by the selected input variables as

approximately 58% in summer, 57% in winter, 52% in post

monsoonand42%inmonsoonseasons.Fractionalbiasshowsthe

under–predictionofPCRmodelinalltheseasonsintrainingaswell

asinvalidation.However,theoverallperformanceofthePCR

modelwasfoundbetterincomparisontotheMLRmodelandalso

model’sperformancewasfoundtobebetterinwintercompared

tootherseasons.

4.

Conclusions

Inthepresentstudy,thedailyAQIatITOwasforecastedusing

theMLRandPCRmodelsbasedonthepreviousday’sAQIand

meteorologicalvariables.

Thestatisticallyerroranalysisofmodelevaluationforallfour

seasonsshowsthatmodelisperformingsatisfactorilyinallthe

seasonsbutisperformingbetterinwinterthantheotherseasons.

TheuseofPCsbasedmodelswasfoundusefulduetoelimination

ofcollinearityproblemsinMLRandreductionofthenumberof

predictors.ItisalsofoundthattheperformanceofthePCRmodel

wasfoundbetterincomparisontotheMLRmodelin2006

validationperiod.Finally,itcouldbeconcludedthattheairquality

forecastingwouldbehelpfultoconcernedauthoritiesinproviding

thenecessaryinformationtothegeneralpublic,toprotecttheir

healthandtakenecessaryprecautionarymeasures.

(a) (b)

(c) (d)

Figure4.ComparisonofobservedandmodelpredictedvaluesofdailyAQIin(a)Summer,(b)Monsoon,

(c)PostMonsoonand(d)Winterseasonsduringtheyear2006.

Table5.ComparisonofPCRmodelpredictedandobservedvaluesinyears2000Ͳ2005andyear2006

S.N. Season 2000Ͳ2005 2006

RMSE NMSE Coefficientof

determination

Fractional

Bias RMSE NMSE

Coefficientof determination Fractional Bias 1 Summer 35.91 0.0116 0.4902 0.0003 30.90 0.0082 0.5767 0.0229 2 Monsoon 53.94 0.0417 0.6241 0.0002 55.62 0.0418 0.4225 0.0094 3 Post Monsoon 40.44 0.0166 0.6467 0.0003 47.40 0.0241 0.5155 0.0019 4 Winter 41.52 0.1273 0.4096 0.0001 27.19 0.0058 0.5625 0.0360 0 50 100 150 200 250 300 350 400 450 500 0 10 20 30 40 50 60 70 80 90 100 Days AQI Observed Prediction 0 50 100 150 200 250 300 350 400 450 500 0 10 20 30 40 50 60 70 80 90 100 Days AQI Observed Prediction 0 50 100 150 200 250 300 350 400 450 500 0 10 20 30 40 50 60 70 80 90 100 Days AQI Observed Prediction 0 50 100 150 200 250 300 350 400 450 500 0 10 20 30 40 50 60 70 80 90 100 Days AQ I Observed Prediction

(8)

Appendix

Thestatisticalmeasuresusedforstatisticalevaluationofthe

performanceofmodelsweregivenbyChangandHanna(2004)as

follows:

CoefficientofCorrelation(R).Coefficientofcorrelation(R)is

relativemeasureoftheassociationbetweentheobservedand

predicted values. It can vary from 0 (which indicates no

correlation)to+1.0(whichindicatesperfectcorrelation).Avalueof

Rcloseto1.0impliesgoodagreementbetweentheobservedand

predictedvalues,i.e.goodmodelperformance.

p o o o p p C C C C C C R V V

CoefficientofDetermination(R2).Coefficientofdetermination

(R2),whichisthesquareofcoefficientofcorrelation,determines

theproportionofvariancethatcanbeexplainedbythemodel.

RootMeanSquareError(RMSE).RMSE,isameasureofthe

differencesbetweenvaluespredictedbyamodelandtheobserved

valuesandisexpressedasfollows:

2

o p RMSE C C

NormalizedMeanSquareError(NMSE).NMSE,asameasureof

performance,emphasizesthescatterintheentiredatasetandis

definedasfollows:

2 o p o p C C C .C NMSE

ThenormalizationbyC .Co pensuresthatNMSEwillnotbe biasedtowardsmodelsthatover–predictorunder–predict.Ideal

valueforNMSEiszero.SmallervaluesofNMSEdenotebetter

modelperformance.

FractionalBias(FB).Itisaperformancemeasureknownasthe

normalizedorfractionalbiasofthemeanconcentrations:

oo PP

C C FB 0.5 C C

whereCparethemodelpredictions,Coaretheobservations,

Overbar

C

istheaverageoverthedataset,and

V

Cisthe

standarddeviationoverthedataset.

Supporting

Material

Available

TheweightsofthePC’sforsummerseason(TableS1),The

weightsofthePC’sformonsoonseason(TableS2),Theweightsof

thePC’sforpostmonsoonseason(TableS3),Theweightsofthe

PC’sforwinterseason(TableS4).Thisinformationisavailablefree

ofchargeviatheInternetathttp://www.atmospolres.com.

References

Aneja,V.P.,Agarwal,A.,Roelle,P.A.,Phillips,S.B.,Tong,Q.S.,Watkins,N., Yablonsky,R.,2001.Measurementsandanalysisofcriteriapollutants inNewDelhi,India.EnvironmentInternational27,35Ͳ42.

Anfossi,D.,Brisasca,G.,Tinarelli,G.,1990.Simulationofatmospheric diffusioninlowwindspeedmeanderingconditionsbyaMonteCarlo dispersionmodel.IlNuovoCimento13C,995Ͳ1006.

Aron,R.,1984.Modelsforestimatingcurrentandfuturesulphurdioxide concentrationsinTaipei.BulletinofGeophysics25,47–52.

Aron,R.,Aron,I.M.,1978.Statisticalforecastingmodels:I.Carbon monoxideconcentrationsintheLosAngelesbasin.JournalofAir PollutionControlAssociation28,681–684.

Boznar,M.,Lesjak,M.,Mlakar,P.,1993.AneuralͲnetworkͲbasedmethod forshortͲtermpredictionsofambientSO2concentrationsinhighly

pollutedindustrialareasofcomplexterrain.AtmosphericEnvironment PartBͲUrbanAtmosphere27,221Ͳ230.

Brandon,C.,Hommann,K.,1995.Thecostofinaction:valuingthe economyͲwidecostofenvironmentaldegradationinIndia.Proceedings oftheSymposiumonGlobalSustainability,UnitedNationsUniversity, Tokyo.

CentralPollutionControlBoard,2005.PARIVESHhighlights2004ͲDelhi: CPCB.

Chang,J.C.,Hanna,S.R.,2004.Airqualitymodelperformanceevaluation. MeteorologyandAtmosphericPhysics87,167Ͳ196.

Cogliani,E.,2001.Airpollutionforecastincitiesbyanairpollutionindex highly correlated with meteorological variables. Atmospheric Environment35,2871Ͳ2877.

Comrie,A.C.,1997.Comparingneuralnetworksandregressionmodelsfor ozone forecasting.Journalof the Air and Waste Management Association47,653Ͳ663.

EPA,1999.AirQualityIndexReporting;FinalRule,FederalRegister,PartIII, 40CFRPart58.

Finzi,G.,Tebaldi,G.,1982.Amathematicalmodelforairpollutionforecast andalarminanurbanarea.AtmosphericEnvironment16,2055Ͳ2059. Goyal,P.,Sidhartha,2003.PresentscenarioofairqualityinDelhi:acase

studyofCNGimplementation.AtmosphericEnvironment37,5423Ͳ 5431.

Hajek,P.,Olej,V.,2009.Airqualityindicesandtheirmodellingby hierarchical fuzzy inference systems. WSEAS Transactions on EnvironmentandDevelopment10,661Ͳ672.

Katsoulis,B.D.,1988.Somemeteorologicalaspectsofairpollutionin Athens,Greece.MeteorologyandAtmosphericPhysics39,203Ͳ212. Lin,G.Y.,1982.Oxidantpredictionbydiscriminantanalysisinthesouth

coastairbasinofCalifornia.AtmosphericEnvironment16,135Ͳ143. Mantis,H.T.,Repapis,C.C.,Zerefos,C.S.,Ziomas,J.C.,1992.Assessmentof

thepotentialforphotochemicalairpollutioninAthensͲacomparison ofemissionsandairpollutantlevelsinAthenswiththoseinLos Angeles.JournalofAppliedMeteorology31,1467Ͳ1476.

McCollister,G.M.,Wilson,K.R.,1975.Linearstochasticmodelsfor forecastingdailymaximaandhourlyconcentrationsofairpollutants. AtmosphericEnvironment(1967)9,417Ͳ423.

Milionis,A.E.,Davies,T.D.,1994.Regressionandstochasticmodelsforair pollution. 1. Review, comments and suggestions. Atmospheric Environment28,2801Ͳ2810.

Nagendra,S.M.S.,Venugopal,K.,Jones,S.L.,2007.Assessmentofairquality neartrafficintersectionsinBangalorecityusingairqualityindex. TransportationResearchPartD:TransportandEnvironment12,167Ͳ 176.

Polydoras,G.N.,Anagnostopoulos,J.,Bergeles,G.C.,1998.Airquality predictions:dispersionmodelvsBoxͲJenkinsstochasticmodels.An implementationandcomparisonforAthens,Greece.AppliedThermal Engineering18,1037Ͳ1048.

Rajeevan,M.,Pai,D.S.,Rohilla,A.K.2005.Newstatisticalmodelsforlong rangeforecastingofsouthwestmonsoonrainfalloverIndia.NCC Research Report, National Climate Centre, India Meteorological Department,Pune–411005.

Robeson,S.M.,Steyn,D.G.,1990.Evaluationandcomparisonofstatistical forecastmodelsfordailymaximumozoneconcentrations.Atmospheric EnvironmentPartBͲUrbanAtmosphere24,303Ͳ312.

Sanchez, M.L., Pascual, D., Ramos, C., Perez, I., 1990. Forecasting particulatepollutantconcentrationsinacityfrommeteorological variablesandregionalweatherpatterns.AtmosphericEnvironment

(9)

PartAͲGeneralTopics24,1509Ͳ1519.

Sanchez,M.L.,Casanova,J.L.,Ramos,M.C.,Sanchez,J.L.,1986.Studying thespatialandtemporaldistributionofSO2inanurbanareaby

principalcomponentfactoranalysis.AtmosphericResearch20,53Ͳ65. Shi,J.P.,Harrison,R.M.,1997.RegressionmodellingofhourlyNOXandNO2

concentrationsinurbanairinLondon.AtmosphericEnvironment31, 4081Ͳ4094.

Sousa,S.I.V.,Martins,F.G.,AlvimͲFerraz,M.C.M.,Pereira,M.C.,2007. Multiplelinearregressionandartificialneuralnetworksbasedon principalcomponentstopredictozoneconcentrations.Environmental ModellingandSoftware22,97Ͳ103.

VandenElshout,S.,Leger,K.,Nussio,F.,2008.Comparingurbanairquality inEuropeinrealtimeͲareviewofexistingairqualityindicesandthe proposalofacommonalternative.EnvironmentInternational34,720Ͳ 726.

Ziomas,I.C.,Melas,D.,Zerefos,C.S.,Bais,A.F.,Paliatsos,A.G.,1995. Forecasting peak pollutantlevels from meteorological variables. AtmosphericEnvironment29,3703Ͳ3711.

References

Related documents

than 50% of the higher income households send their children to universities even if the government does not intervene, and thus necessarily support the policy t o ... An

The prompt recognition of skin manifestations in patients with prediabetes or diabetes is extremely important as it may trigger not only an adequate metabolic evaluation but also

It takes energy to produce hydrogen fuel, and that energy can be either from fossil fuels or from renewable energies like solar.. Hydrogen fuel is only renewable if it is made using

for children and the higher inhalation rate and higher dermal contact surface for adults.. For the other elements and for adults the cancer risk due to Cr, Ni and Cd inhalation is

2D &amp; 3D QSAR studies were performed on a set of 72 Biaryl analogues of PA-824 having various ether linkers using V-Life Molecular Design Suite (MDS 3.5) QSAR plus

As given in Table 6, in comparison with householders with none school level, those having primary and secondary school level are respectively three times and five times more

We utilize this background knowledge to generate SAM of PFOA on mild steel surface by dip coating (DC-PFOA) and in a new approach; it is intended to generate an

Leslie Bromberg and colleagues at the Plasma Sci- ence and Fusion Center at MIT are interested in the role of an MCF facility as a fuel source for a fleet of relatively