
Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa

Sliding window-based support vector regression for predicting micrometeorological data

Yukimasa Kaneda a,∗, Hiroshi Mineno b,c

a Graduate School of Integrated Science and Technology, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
b College of Informatics, Academic Institute, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
c JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan

Article info

Article history:
Received 4 February 2016
Revised 29 March 2016
Accepted 13 April 2016
Available online 23 April 2016

Keywords:
Predicting micrometeorological data
Data extraction
Dynamic aggregation
Support vector regression
Ensemble learning

Abstract

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors, such as smartphones and sensor nodes, have been used extensively. Since these devices can easily collect various kinds of micrometeorological data, such as temperature, humidity, and wind speed, an enormous amount of micrometeorological data has been accumulated. In recent years, it has been expected that such an enormous amount of data, called big data, will produce novel knowledge and value. Accordingly, many current applications have used data mining technology or machine learning to exploit big data. However, micrometeorological data has a complicated correlation among different features, and its characteristics vary over time. Therefore, it is difficult to predict micrometeorological data accurately with low computational complexity even if state-of-the-art machine learning algorithms are used. In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR), which involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs, each specialized for a representative data group in various natural environments, such as different seasons and climates, and dynamically changes the weights used to aggregate the SVRs depending on the characteristics of the test data. In our experiment, we predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data in Tokyo. As a result, regardless of the testing periods, training periods, and prediction horizons, the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with that of complicated models that have high prediction performance.

© 2016 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors have been used extensively. These devices can very easily obtain various kinds of micrometeorological data such as temperature, humidity, and wind speed. Micrometeorological data is strongly affected by the surface of the earth and is related to our lives and industrial activity. Accordingly, the data has been used by many applications such as environmental control systems for greenhouses (Othman & Shazali, 2012; Park & Park, 2011). Moreover, more advanced applications exploit the data to a greater extent by using machine learning and data mining technology. Furthermore, an enormous amount of micrometeorological data has been accumulated by many devices, and it has been expected that analyzing such an enormous amount of data, called big data, will produce novel knowledge and value.

∗ Corresponding author (Y. Kaneda).

To predict micrometeorological data effectively, a number of researchers have studied machine learning (Smith, Hoogenboom, & McClendon, 2009). These researchers described prediction methods for micrometeorological data; in particular, prediction performance and computational complexity were often discussed. Meanwhile, micrometeorological data has a complex correlation among different features such as temperature and humidity, and its characteristics vary over time. Therefore, even if big data is given as training data, it is not easy to predict micrometeorological data accurately. Furthermore, in many cases models must become complicated to achieve high prediction performance, and their computational complexity increases accordingly. As a result, some models probably cannot be built from big data in a practical amount of computing time. In other words, there is a trade-off between high prediction performance and low computational complexity. However, some practical uses require both. As the prediction performance of an application becomes higher, the quality of the service it provides becomes better. For example, in environmental control systems based on prediction (Kolokotsa, Pouliezos, Stavrakakis, & Lazos, 2009), higher prediction performance enables the systems to provide precise control, precise management, and better environments. On the other hand, models that need a long time for training are worthless in practical use. Now that the amount of usable data has increased remarkably, this trade-off has become a more critical issue.

http://dx.doi.org/10.1016/j.eswa.2016.04.012

Recently, one type of machine learning algorithm, support vector machines (SVMs), has been used successfully in various fields. SVMs are grounded in probably approximately correct (PAC) learning theory, which makes them an efficient learning method. Moreover, SVMs can separate non-linear data with low computational complexity. Since most data observed in the real world is likely to have non-linear relationships, SVMs have also been applied to micrometeorological data prediction (Antonanzas, Urraca, Martinez-de-Pison, & Antonanzas-Torres, 2015; Mohammadi, Shamshirband, Anisi, Alam, & Petković, 2015; Urraca, Antonanzas, Martinez-de-Pison, & Antonanzas-Torres, 2015). Moreover, SVMs achieved better prediction performance than other algorithms such as artificial neural networks (ANNs) and the autoregressive integrated moving average (ARIMA) model (Chevalier, Hoogenboom, McClendon, & Paz, 2011; Maity, Bhagwat, & Bhatnagar, 2010). However, when SVMs learn from big data, the computational complexity is still a matter of concern.

Another learning method, ensemble learning, has also been used increasingly for predicting micrometeorological data (Singh, Gupta, & Rai, 2013). The prediction performance of ensemble learning is greater than or equal to that of SVMs. The basic methodology is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability than a single model can represent. In particular, some researchers proposed improved methods that could be applied to micrometeorological data prediction (Wang & Japkowicz, 2009; Xie, Li, Ngai, & Ying, 2009). However, it is difficult to apply these methods to regression, and the resulting models may not be able to follow micrometeorological data whose characteristics always change with time.

In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR). SW-SVR involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs, each specialized for a representative data group in various natural environments, such as different seasons and climates. The specialized SVRs are built based on our previously proposed method, dynamic short-distance data collection (D-SDC), which extracts effective data for specific data prediction by taking account of movements, i.e., changes in data during prediction horizons. Each weak learner, built from its own extracted data, specializes in specific data and accurately predicts data similar to that specialized data. Then, SW-SVR aggregates all the predicted values with weights decided by the similarity between the test data and the data specialized by each weak learner. This new ensemble learning methodology, which changes weights dynamically, enables SW-SVR to follow micrometeorological data whose characteristics constantly change with time. Our results demonstrated that the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with that of complicated models that have high prediction performance.

2. Related work

As mentioned in the introduction, SVMs and ensemble learning have generally been used to predict micrometeorological data effectively. These algorithms have higher prediction performance for micrometeorological data than traditional methods. SVMs use not only a margin-maximizing algorithm, whose strong performance is supported by PAC learning theory, but also the kernel trick, which enables non-linear separation. Ensemble learning, on the other hand, provides a higher generalizing capability than a single model can represent. In this section, a brief summary of these algorithms and some improved algorithms is given. Moreover, several problems with these algorithms in practical use are discussed so that SW-SVR can draw on the advantages of both SVMs and ensemble learning.

2.1. Support vector regression

SVMs, introduced by Vapnik (1995), have been used successfully in various fields. In the simplest case, binary classification, SVMs obtain a separating hyperplane by maximizing the margin, i.e., the distance between the hyperplane and the closest training samples of each class. PAC learning theory showed that maximizing the margin produces high generalization ability. Moreover, the kernel trick enables SVMs to separate data non-linearly with low computational complexity. Various kinds of data observed in the real world are likely to have non-linear relationships. Accordingly, SVMs are used in many applications such as micrometeorological data prediction (Kisi & Cimen, 2012; Maity et al., 2010). Meanwhile, SVMs for regression, called support vector regression (SVR), use the same methodology that gives SVMs their high generalization ability. A brief summary of SVR is given as follows.

First, the linear function for regression is given as follows:

$$f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b.$$

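Then, as with SVMs, SVR also minimizes the norm of the weight vector $\mathbf{w}$; the squared L2 norm $\lVert \mathbf{w} \rVert^2$ is often used, and minimizing $\lVert \mathbf{w} \rVert^2$ corresponds to maximizing the margin. Meanwhile, SVR tolerates a prediction error of up to $\varepsilon$. Therefore, the primal problem of SVR is shown as follows:

$$\begin{aligned} \text{minimize} \quad & \frac{1}{2}\lVert \mathbf{w} \rVert^2 \\ \text{subject to} \quad & y_i - (\mathbf{w}^\top \mathbf{x}_i + b) \le \varepsilon, \\ & (\mathbf{w}^\top \mathbf{x}_i + b) - y_i \le \varepsilon. \end{aligned}$$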

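Moreover, to take larger errors into account, the same slack variables $\xi_i, \xi_i^*$ as in soft-margin SVMs are introduced. The slack variables act as penalties and increase in proportion to the errors between the true values and the predicted values. The problem with the slack variables introduced is shown as follows:

$$\begin{aligned} \text{minimize} \quad & \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_i (\xi_i + \xi_i^*) \\ \text{subject to} \quad & y_i - (\mathbf{w}^\top \mathbf{x}_i + b) \le \varepsilon + \xi_i, \\ & (\mathbf{w}^\top \mathbf{x}_i + b) - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0. \end{aligned}$$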

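where the constant C controls the balance between the effect of maximizing the margin and the penalties. To minimize the above objective, the slack variables in it must also be minimized. Accordingly, the slack variables depend on the errors as follows:

$$\xi_i = \begin{cases} 0 & \text{if } y_i - (\mathbf{w}^\top \mathbf{x}_i + b) \le \varepsilon \\ y_i - (\mathbf{w}^\top \mathbf{x}_i + b) - \varepsilon & \text{otherwise} \end{cases}$$

$$\xi_i^* = \begin{cases} 0 & \text{if } (\mathbf{w}^\top \mathbf{x}_i + b) - y_i \le \varepsilon \\ (\mathbf{w}^\top \mathbf{x}_i + b) - y_i - \varepsilon & \text{otherwise.} \end{cases}$$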

The above formulas mean that no penalty is given when the error is smaller than $\varepsilon$, whereas an error larger than $\varepsilon$ cannot be tolerated and is penalized by the amount by which it exceeds $\varepsilon$. In other words, SVR tolerates errors smaller than $\varepsilon$, and only errors beyond $\varepsilon$ are taken into account as penalties (the $\varepsilon$-insensitive loss). Finally, the dual problem is derived from the above primal problem by the method of Lagrange multipliers and, as with SVMs, corresponds to a convex quadratic programming problem. As a result, since a global optimal solution is guaranteed, SVR is superior to traditional algorithms that might fall into a local optimal solution, such as ANNs. The dual problem derived by Lagrange multipliers is shown as follows:

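$$\begin{aligned} \text{maximize} \quad & -\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, \mathbf{x}_i^\top \mathbf{x}_j - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*) \\ \text{subject to} \quad & \sum_i (\alpha_i - \alpha_i^*) = 0, \\ & \alpha_i, \alpha_i^* \in [0, C]. \end{aligned}$$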
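The dual constraints can be checked numerically on a fitted model. The following is a minimal sketch, assuming scikit-learn's `SVR` (a wrapper around libSVM), whose `dual_coef_` attribute holds one signed coefficient per support vector, i.e., the difference between $\alpha_i$ and $\alpha_i^*$; the synthetic data and the values of C and $\varepsilon$ are arbitrary illustrative choices. With a linear kernel, the standard dual-form prediction $f(\mathbf{x}) = \sum_i (\alpha_i - \alpha_i^*) \mathbf{x}_i^\top \mathbf{x} + b$ can be rebuilt from these coefficients, which also shows that only support vectors (nonzero coefficients) contribute.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

C, eps = 1.0, 0.1
svr = SVR(kernel="linear", C=C, epsilon=eps).fit(X, y)

coef = svr.dual_coef_.ravel()   # signed dual coefficient of each support vector
sv = svr.support_vectors_       # the x_i whose coefficient is nonzero

# Dual constraints: coefficients sum to zero and each lies within [-C, C].
print("sum of coefficients:", coef.sum())
print("all within [-C, C]:", np.all(np.abs(coef) <= C + 1e-9))

# Linear kernel K(x_i, x) = x_i^T x, so f(x) = sum_i coef_i * x_i^T x + b.
f_manual = (X @ sv.T) @ coef + svr.intercept_[0]
print("matches predict():", np.allclose(f_manual, svr.predict(X)))
```

Replacing `X @ sv.T` with a kernel matrix gives the same reconstruction for non-linear kernels.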

Moreover, the above dual problem can easily incorporate a non-linear map $\phi$ into a higher-dimensional space. To introduce the non-linear map $\phi$ into the above problem, the kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$ is defined and used instead of $\mathbf{x}_i^\top \mathbf{x}_j$. Then $\phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$ is obtained from $K(\mathbf{x}_i, \mathbf{x}_j)$ without any calculation in the mapped higher-dimensional space; this method is called the kernel trick. SVR based on margin maximization and the kernel trick yields high prediction performance.
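The kernel trick can be illustrated with a kernel whose feature map is finite. The sketch below uses the degree-2 homogeneous polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z})^2$, an illustrative choice not used in the paper (the RBF kernel used later has an infinite-dimensional feature map, so it cannot be expanded this way). It confirms that evaluating K directly gives the same value as the inner product in the explicit $d^2$-dimensional feature space.

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel K(x, z) = (x^T z)^2, O(d) work."""
    return float(x @ z) ** 2

def poly2_feature_map(x):
    """Explicit feature map phi(x) = vec(x x^T), of dimension d^2."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

k_trick = poly2_kernel(x, z)                              # kernel trick
k_explicit = poly2_feature_map(x) @ poly2_feature_map(z)  # inner product in mapped space
print(k_trick, k_explicit)  # prints: 20.25 20.25
```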

Meanwhile, conventional quadratic programming solvers, such as the steepest descent method, have very high computational complexity, approximately O(N³), where N is the number of training data. Accordingly, sequential minimal optimization (SMO), a quadratic programming solver for SVMs, has become the de facto standard (Platt, 1998). SMO, which is specialized for SVMs, reduces the computational complexity of SVMs to approximately O(N²). Nevertheless, when an enormous amount of data is input as training data, the computational complexity increases substantially. To solve this problem, the core vector machine (CVM), which regards the quadratic programming problem as a computational geometry problem, was proposed (Tsang, Kwok, & Cheung, 2005). The prediction performance of CVM is comparable to that of SVMs, and its computational complexity is substantially lower. However, according to Loosli (2007), the prediction performance and computational complexity of CVM strongly depend on the values of its parameters. Therefore, when the parameter tuning essential for practical use is taken into account, CVM does not always achieve both high prediction performance and low computational complexity.

SVR is one of the best algorithms in machine learning from the viewpoint of prediction performance. In particular, it has been expected that the kernel trick used in the dual problem is effective for predicting micrometeorological data, which has a complex correlation among different features. However, the computing time needed to solve the dual problem is often still too long for practical use. Thus, it is difficult to apply conventional SVR directly to micrometeorological data prediction.

2.2. Ensemble learning

Ensemble learning has been studied recently and used increasingly. The basic methodology of ensemble learning is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability than a single model can represent. As with SVMs, ensemble learning can represent non-linear relationships and has been used for predicting micrometeorological data. In particular, two kinds of approaches, bagging and boosting, have often been used in ensemble learning. The approaches differ greatly in how they build weak learners and aggregate them.

Algorithm 1 Bagging for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)}, where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
For t = 1 to n do
  1. D_t ← generate a sample from D with replacement
  2. H_t(X) ← build a weak learner from D_t
Output: H(X) = (1/n) Σ_{t=1}^{n} H_t(X)

Algorithm 2 Boosting for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)}, where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
  Weights: w_i = 1/N
For t = 1 to n do
  1. H_t(X) ← build a weak learner from D by using weights w_t
  2. ε_t ← compute the error rate of H_t(X)
  3. α_t ← compute the reliability of the prediction result of H_t(X) based on ε_t
  4. w_{t+1} ← update weights w_t based on α_t
Output: H(X) = Σ_{t=1}^{n} α_t H_t(X) / Σ_{t=1}^{n} α_t

Bagging uses several training data sets generated by bootstrap sampling. The basic bagging algorithm for regression is shown in Algorithm 1. In bagging, different kinds of training data are created by sampling the original input training data with replacement. Then, a weak learner is built from each sampled training data set. Finally, the predicted values are aggregated by majority vote (classification) or arithmetic average (regression). In particular, random forest, introduced by Breiman (2001), which additionally applies randomness to feature selection, often demonstrates better prediction performance than conventional models such as SVMs. Random forest is used in various applications and has been extended to other improved versions. For example, to predict imbalanced data, which is frequently observed in the real world, more accurately, improved balanced random forest (IBRF) has been proposed (Xie et al., 2009). IBRF involves an efficient sampling method for imbalanced data and cost-sensitive learning that penalizes misclassification of the minority class more strongly. The authors showed that IBRF was more effective at predicting imbalanced data than class-weighted SVMs and conventional improved random forests for imbalanced data prediction.
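Algorithm 1 can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn's `DecisionTreeRegressor` as the weak learner (the base learner the evaluation section also uses for bagging); the function names `fit_bagging` and `predict_bagging` and the synthetic sine data are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagging(X, y, n_learners=10, seed=0):
    """Algorithm 1: train n weak learners, each on a bootstrap sample of D."""
    rng = np.random.default_rng(seed)
    N = len(y)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, N, size=N)  # D_t: N draws from D with replacement
        learners.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))
    return learners

def predict_bagging(learners, X):
    """H(X) = (1/n) * sum_t H_t(X): arithmetic average for regression."""
    return np.mean([h.predict(X) for h in learners], axis=0)

# Usage on synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
models = fit_bagging(X, y, n_learners=20)
pred = predict_bagging(models, X)
```

Random forest differs mainly by also sampling a random subset of features at each split.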

Boosting repeatedly builds weak learners by using weights based on the error rate. The basic boosting algorithm for regression, such as Adaboost (Freund & Schapire, 1997), is shown in Algorithm 2. Unlike bagging, almost all boosting algorithms use the same training data, but the training data is reweighted repeatedly. Boosting alternates between building weak learners with the current weights and updating the weights. Finally, the predicted values are aggregated by weighted average. Various kinds of boosting algorithms have been studied and proposed; gradient boosting (Friedman, 2001) in particular has shown the best prediction performance in many competitions. Meanwhile, as with IBRF, a boosting algorithm for imbalanced data, boosting-SVM, has also been proposed (Wang & Japkowicz, 2009). The main characteristic of boosting-SVM is its use of asymmetric misclassification costs. The authors demonstrated that boosting-SVM enabled more accurate …
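Algorithm 2 leaves the error rate, reliability, and weight update unspecified. The sketch below fills them in with the linear loss and update rule of AdaBoost.R2-style boosting, an assumption made for illustration, and keeps the weighted-average output of Algorithm 2 (AdaBoost.R2 itself uses a weighted median). The function names and the depth-3 `DecisionTreeRegressor` weak learner are hypothetical choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_learners=10, max_depth=3):
    """Algorithm 2 sketch with AdaBoost.R2-style error rate and weight update."""
    N = len(y)
    w = np.full(N, 1.0 / N)                   # initial weights w_i = 1/N
    learners, alphas = [], []
    for _ in range(n_learners):
        h = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        h.fit(X, y, sample_weight=w)           # 1. build H_t using weights w_t
        abs_err = np.abs(y - h.predict(X))
        max_err = abs_err.max()
        loss = abs_err / max_err if max_err > 0 else np.zeros(N)  # linear loss in [0, 1]
        err_rate = float(np.sum(w * loss))     # 2. weighted error rate
        if err_rate >= 0.5:                    # learner too weak to help: stop early
            break
        err_rate = max(err_rate, 1e-10)        # guard against log(0) on a perfect fit
        beta = err_rate / (1.0 - err_rate)
        learners.append(h)
        alphas.append(np.log(1.0 / beta))      # 3. reliability alpha_t (> 0)
        w = w * beta ** (1.0 - loss)           # 4. shrink weights of well-predicted samples
        w = w / w.sum()
    return learners, np.array(alphas)

def predict_boosting(learners, alphas, X):
    """H(X) = sum_t alpha_t H_t(X) / sum_t alpha_t (assumes at least one learner)."""
    preds = np.array([h.predict(X) for h in learners])
    return alphas @ preds / alphas.sum()

# Usage on synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
learners, alphas = fit_boosting(X, y, n_learners=10)
pred = predict_boosting(learners, alphas, X)
```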

Fig. 1. Processing overview of SW-SVR: (a) extraction of training data by D-SDC; (b) weighted ensemble learning in SW-SVR. [Legend: training data, extracted data, center of cluster, test data, specialized object, movement of data, training data at end of prediction horizon, threshold of extraction; number of weak learners: 3.]

When micrometeorological data that includes many unusual natural environments is regarded as imbalanced data, the above methods are likely to classify micrometeorological data more accurately. However, these approaches cannot be applied to regression. Moreover, according to our previous research (Suzuki, Kaneda, & Mineno, 2015), the proper training data depends on the test data. In other words, the weights used to aggregate weak learners built from different kinds of training data should depend on the test data.

3. SW-SVR: Sliding window-based support vector regression

We propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression, which combines the methodologies of SVR and ensemble learning. Its basic theories are D-SDC, our previously proposed method for extracting effective data for specific data prediction, and a novel weighted ensemble learning scheme, as shown in Fig. 1. First, to represent complicated micrometeorological data easily, SW-SVR builds several SVRs, each specialized for a representative data group in various natural environments, such as different seasons and climates. The specialized SVRs are built based on D-SDC, which extracts effective data for specific data prediction by taking account of movements, i.e., changes of data during prediction horizons (Fig. 1(a)). Each weak learner, built from its own extracted data, specializes in specific data and accurately predicts data similar to that specialized data. Afterward, the weak learners are aggregated with weights determined dynamically at the time of prediction so as to maintain the prediction performance for micrometeorological data whose characteristics always change with time (Fig. 1(b)). The weights are decided by the similarity between the test data and the data specialized by each weak learner. Even if the characteristics of micrometeorological data always change with time, SW-SVR always gives priority to the weak learners that are more suitable for predicting the test data. The details of the SW-SVR algorithm are shown in Algorithm 3. The procedure consists of two kinds of preprocessing, iterated learning, and dynamic aggregation. Each part is described as follows.

The algorithms in SW-SVR described below use the L2 norm, i.e., the Euclidean distance, so their performance depends on the feature space. For example, if the feature space includes noisy features or non-linear relationships between features, the performance will probably be reduced substantially. In particular, micrometeorological data has a complex correlation among different features such as temperature and humidity. Accordingly, the feature space must be mapped into another feature space that takes into account the presence of noise and non-linear relationships. In our approach, we use kernel approximation (Rahimi & Recht, 2007) and partial least squares (PLS) regression (Tenenhaus, Vinzi, Chatelin, & Lauro, 2005) to map into a new feature space. Kernel approximation generates a new, higher-dimensional feature space that represents non-linear data as linear data at a very low computational cost. Indeed, a combination of kernel approximation and linear SVMs led to faster prediction with performance comparable to that of exact SVMs (Cao, Naito, & Ninomiya, 2008). On the other hand, PLS regression is a supervised dimension reduction methodology. It reduces dimensions by extracting latent variables that have a strong relationship with a dependent variable. If the feature space includes noisy features, their effect is reduced by PLS regression. The combination of kernel approximation and PLS regression enables SW-SVR to use an effective feature space for calculating the L2 norm in micrometeorological data.

Algorithm 3 Sliding window-based support vector regression.
Input:
  Training data set: S = {(x_1, y_1, x'_1), ..., (x_N, y_N, x'_N)}, where x_i ∈ X, y_i ∈ Y, x'_i ∈ X'
  Test data: P
  Number of weak learners: n
  Weight parameters: p, q
Preprocessing:
  1. apply normalization to X and X'
  2. fit kernel approximation and PLS regression to X and X'
  3. M_i = ||x_i − x'_i||, i = 1, ..., N
  4. G_t ← each center of k-means(X), t = 1, ..., n
For t = 1 to n do
  1. D_ti = ||G_t − x_i||, i = 1, ..., N
  2. r_t = Σ_{i=1}^{N} (w_i M_i) / Σ_{i=1}^{N} w_i, where w_i = 1 / D_ti^p
  3. S_t = {(x_i, y_i) | D_ti < r_t}, i = 1, ..., N
  4. H_t(X) ← train LinearSVR(S_t)
Output: H(P) = Σ_{t=1}^{n} (w_t H_t(P)) / Σ_{t=1}^{n} w_t, where w_t = 1 / ||G_t − P||^q

According to our previous research, to accurately predict particular specific data in micrometeorological data, it is necessary to extract effective training data for specific data prediction (Suzuki et al., 2015). In SW-SVR, several such specific data are selected in advance, and a weak learner is built from the effective training data extracted for predicting each of them. Meanwhile, micrometeorological data involves various natural environments such as different seasons and climates. Therefore, the selected specific data must represent the varied natural environments that are likely to appear, so that micrometeorological data can be represented by several models. In SW-SVR, each specific data point is selected by a clustering algorithm, k-means (MacQueen, 1967). k-means is one of the most famous non-hierarchical clustering algorithms and partitions data into several clusters faster than other clustering algorithms. In SW-SVR, k-means partitions all training data into as many clusters as the number of weak learners given by the user. Then, each cluster center is used as a specific data point that represents one of the various natural environments.
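Selecting the specialized objects G_t (step 4 of the preprocessing in Algorithm 3) amounts to reading off k-means cluster centers. A minimal sketch, assuming scikit-learn's `KMeans`; the three well-separated Gaussian blobs are a hypothetical stand-in for mapped training data from three natural environments.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical mapped training features: three distinct "environments".
true_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
Z = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in true_centers])

n_learners = 3                                   # one cluster per weak learner
km = KMeans(n_clusters=n_learners, n_init=10, random_state=0).fit(Z)
G = km.cluster_centers_                          # specialized objects G_1, ..., G_n
print(G.shape)                                   # prints: (3, 2)
```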

After selecting several specific data points, SW-SVR iterates data extraction and model building. First, SW-SVR extracts effective training data for predicting each specific data point by D-SDC (Suzuki, Kaneda, & Mineno, 2014). The idea of D-SDC is similar to that of the k-nearest neighbor (k-NN) algorithm: D-SDC also extracts training data similar to a specialized object. However, in D-SDC, the amount of extracted data depends on the movement of a specialized object with time. The movement r is the change of a specialized object during the prediction horizon, as shown in the following equation:

$$r_t = \lVert G_t - G'_t \rVert$$

where G is a specialized object, and G' is the specialized object after the prediction horizon. D-SDC extracts training data whose distance from a specialized object is shorter than the movement r. Accordingly, the training data S extracted by D-SDC is given as follows:

$$S_t = \{ (\mathbf{x}_i, y_i) \mid \lVert G_t - \mathbf{x}_i \rVert < \lVert G_t - G'_t \rVert \}$$

where x is the feature vector of the training data and y is the dependent variable of the training data. D-SDC is based on the movement r because the movement r is strongly related to the autocorrelation of the data surrounding a specialized object. In micrometeorological data, movements in specific natural environments are mutually similar, and the autocorrelation becomes lower when these movements are bigger. For example, in Japan, the weather changes drastically every spring, and the natural environments change into various other natural environments with time. Meanwhile, when we predict time series data such as micrometeorological data, autocorrelation means the correlation between the features and a dependent variable, and more training data is required for highly accurate prediction when the autocorrelation is lower. Since D-SDC extracts an amount of data surrounding a specialized object in proportion to the movement r, it achieves extraction that considers the autocorrelation of the data surrounding a specialized object. However, the movement r is unknown because G' is not observed. Meanwhile, as mentioned above, the movements of data surrounding a specialized object are mutually similar. Therefore, D-SDC estimates the movement r from the movements of training data similar to a specialized object by a weighted average, where the weights are the reciprocals of the distances between the specialized object and each training data point, raised to the power p. The movements of training data can be calculated by referring to the time when each training data point was observed. The estimated movement r is given as follows:

$$r_t = \lVert G_t - G'_t \rVert \approx \frac{\sum_{i=1}^{N} w_i \lVert \mathbf{x}_i - \mathbf{x}'_i \rVert}{\sum_{i=1}^{N} w_i}, \qquad w_i = \frac{1}{\lVert G_t - \mathbf{x}_i \rVert^{p}}$$

where N is the number of training data, and p is a weight parameter. Afterward, SW-SVR builds several linear SVRs as weak learners based on the extracted data. As described above, a combination of linear SVR and kernel approximation is comparable to SVR using a kernel method. Moreover, linear SVR can be built much faster by using liblinear (Fan, Chang, Hsieh, Wang, & Lin, 2008), an optimized implementation for linear SVMs, instead of other general implementations of SVMs such as libSVM (Chang & Lin, 2011). Although the only kernel usable in liblinear is the linear kernel, liblinear can build the model much faster by solving the primal problem instead of the dual problem. Furthermore, since all the training data is divided into smaller sets of extracted data, each model can be built faster, and the extracted data sets are easier to learn in parallel.
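The D-SDC extraction and weak-learner training (the loop of Algorithm 3) can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes scikit-learn's `LinearSVR` (which wraps liblinear) and `KMeans`, omits the kernel approximation and PLS mapping for brevity, and uses a synthetic two-feature series. The function names `dsdc_extract` and `fit_weak_learners` are hypothetical, and the `min_samples` fallback is an added safeguard that Algorithm 3 does not describe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVR

def dsdc_extract(G_t, X, X_next, p=2.0, min_samples=10):
    """Select S_t for specialized object G_t; returns (mask, estimated r_t).

    X[i] is a training sample and X_next[i] is the same series one prediction
    horizon later, so M_i = ||x_i - x'_i|| is that sample's movement.
    """
    D = np.linalg.norm(X - G_t, axis=1)            # D_ti = ||G_t - x_i||
    M = np.linalg.norm(X - X_next, axis=1)         # M_i  = ||x_i - x'_i||
    w = 1.0 / np.maximum(D, 1e-12) ** p            # w_i = 1 / D_ti^p (avoid division by 0)
    r_t = np.sum(w * M) / np.sum(w)                # estimated movement of G_t
    mask = D < r_t                                 # S_t = {(x_i, y_i) | D_ti < r_t}
    if mask.sum() < min_samples:                   # safeguard NOT in Algorithm 3:
        mask[np.argsort(D)[:min_samples]] = True   # fall back to the nearest samples
    return mask, r_t

def fit_weak_learners(G, X, X_next, y, p=2.0):
    """Train one liblinear-based linear SVR per specialized object G_t."""
    learners = []
    for G_t in G:
        mask, _ = dsdc_extract(G_t, X, X_next, p)
        svr = LinearSVR(C=1.0, epsilon=0.0, dual=True, max_iter=10000, random_state=0)
        learners.append(svr.fit(X[mask], y[mask]))
    return learners

# Usage on a synthetic series; the horizon is h = 6 steps.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.column_stack([np.sin(t / 20), np.cos(t / 20)]) + 0.05 * rng.normal(size=(600, 2))
h = 6
X, X_next = series[:-h], series[h:]
y = X_next[:, 0]                                   # predict the first feature h steps ahead
G = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X).cluster_centers_
learners = fit_weak_learners(G, X, X_next, y)
```

Because each learner sees only its extracted subset, the loop body is independent per G_t and could be run in parallel, as the text notes.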

The predicted values of SW-SVR take into account the change of natural environments with time. In general ensemble learning, prediction for regression relies on a weighted average whose weights are determined at the time of training. In contrast, SW-SVR determines the weights dynamically at the time of prediction. The weights are determined by the distance between the test data and the data specialized by each weak learner. The final hypothesis of SW-SVR is shown as follows:

$$H(P) = \frac{\sum_{t=1}^{n} w_t H_t(P)}{\sum_{t=1}^{n} w_t}, \qquad w_t = \frac{1}{\lVert G_t - P \rVert^{q}}$$

where P is the test data, n is the number of weak learners, H(X) is a hypothesis, and q is a weight parameter. In our approach, since the weights of ensemble learning are determined dynamically for every prediction, SW-SVR can follow micrometeorological data whose characteristics always change with time.
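The dynamic aggregation in the final hypothesis can be sketched independently of how the weak learners were trained. In this minimal sketch the function name `aggregate`, the two centers, and the constant predictions are hypothetical toy values chosen so that the effect of the per-sample weights is easy to see.

```python
import numpy as np

def aggregate(preds, G, P, q=2.0, eps=1e-12):
    """SW-SVR final hypothesis H(P) = sum_t w_t H_t(P) / sum_t w_t.

    preds: (n_learners, n_test) predictions H_t(P); G: (n_learners, d) centers;
    P: (n_test, d) test data. The weights w_t = 1 / ||G_t - P||^q are
    recomputed for every test sample, so they change with the data.
    """
    dist = np.linalg.norm(P[None, :, :] - G[:, None, :], axis=2)  # (n_learners, n_test)
    w = 1.0 / np.maximum(dist, eps) ** q                          # guard P == G_t
    return np.sum(w * preds, axis=0) / np.sum(w, axis=0)

G = np.array([[0.0, 0.0], [10.0, 0.0]])
P = np.array([[1.0, 0.0], [9.0, 0.0], [5.0, 0.0]])
preds = np.array([[20.0, 20.0, 20.0],   # H_1(P): learner specialized near G_1
                  [30.0, 30.0, 30.0]])  # H_2(P): learner specialized near G_2
out = aggregate(preds, G, P)
```

The first test point is dominated by H_1 (about 20.1), the second by H_2 (about 29.9), and the point equidistant from both centers receives the plain average, 25.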

Finally, we describe the computational complexity of SW-SVR. To represent complicated micrometeorological data easily, SW-SVR uses various conventional methods besides D-SDC, which we proposed: kernel approximation, PLS regression, k-means, and linear SVR. The computational complexity of these methods generally increases linearly; in other words, it is approximately O(N) when the number of training data N is much larger than the number of dimensions and each parameter of these methods. Moreover, the computational complexity of D-SDC corresponds to O(nN) because D-SDC just repeats N distance calculations n + 1 times, where n is the number of weak learners in SW-SVR. Therefore, if N is much larger than n, the computational complexity of D-SDC also increases linearly. The total computational complexity of SW-SVR is approximately O(N), which is much less than that of SVR.

4. Evaluation

4.1. Experiment

We compared the performance of SW-SVR with other standard methods for regression: k-NN, decision tree (DT), Adaboost, bagging, random forest (RF), gradient boosting (GB), linear SVR, and SVR using a radial basis function (RBF) kernel, which shows high performance in various fields (RBF-SVR). Note that the kernel approximated in SW-SVR is also the RBF kernel, and the base learner in Adaboost and bagging is the commonly used decision tree. Moreover, to evaluate SW-SVR in more detail, we evaluated the performance of linear SVR with mapping, i.e., standard linear SVR to which the same mapping as in SW-SVR is applied ("mapped SVR"). Mapped SVR separates the contribution of mapping the feature space from that of ensemble learning based on D-SDC. All parameters of the models were tuned by grid search. The baseline for this evaluation was the performance of the naivest model, the persistent model, shown in the following formula:

$$\hat{y}_{i+\Delta t} = y_i$$

where ŷ is the predicted value, y is the true value, and Δt is the prediction horizon.
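With data sampled every 10 minutes, as in the evaluation data described here, a prediction horizon of 1 h corresponds to 6 steps and 6 h to 36 steps. The sketch below implements the persistent baseline under that assumption; the function name `persistent_forecast` and the synthetic daily temperature cycle are hypothetical.

```python
import numpy as np

def persistent_forecast(series, horizon_steps):
    """Persistent model y_hat[i + h] = y[i]; returns aligned (y_true, y_pred).

    horizon_steps must be >= 1 (series[:-0] would be empty).
    """
    series = np.asarray(series, dtype=float)
    y_true = series[horizon_steps:]     # y[i + h]
    y_pred = series[:-horizon_steps]    # y_hat[i + h] = y[i]
    return y_true, y_pred

# 10-minute sampling: 1 h ahead = 6 steps, 6 h ahead = 36 steps.
steps_per_day = 144
temps = 15.0 + 5.0 * np.sin(2 * np.pi * np.arange(1000) / steps_per_day)  # synthetic daily cycle
y_true_1h, y_pred_1h = persistent_forecast(temps, 6)
y_true_6h, y_pred_6h = persistent_forecast(temps, 36)
print(np.mean(np.abs(y_true_1h - y_pred_1h)), np.mean(np.abs(y_true_6h - y_pred_6h)))
```

As expected for a persistent model, the error grows with the horizon, which is why longer horizons make this baseline harder to beat only for models that capture the underlying dynamics.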

We evaluated the performance in two ways: hold-out validation and 10-fold cross-validation. We predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data in Tokyo (Japan Meteorological Agency, n.d.). The data consists of atmospheric pressure, temperature, relative humidity, wind speed, and irradiance. In hold-out validation, the training periods are limited to periods earlier than the testing periods so as to reflect practical use, in which test data is always predicted based on past training data. The training periods ranged from 3 months to 5 years before September 1, 2014, and the testing periods ranged from 1 month to 1 year after that date. By varying the training periods and the testing periods, the performance under various usage scenarios is evaluated. On the other hand, the period for 10-fold cross-validation was 6 years, from September 1, 2009 to September 1, 2015. Note that the amount of data per month was approximately 4000 because the data was recorded every 10 minutes. In this evaluation, we used the mean absolute percentage error (MAPE) as the index of prediction error and the building time, measured as CPU clock time, as the index of computational complexity. MAPE is defined as follows:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

where N is the number of test data, y is the true value, and ŷ is the predicted value. Moreover, we evaluated the average extraction rate of D-SDC in each experimental condition so as to analyze the performance of SW-SVR and D-SDC further. All implementations for this evaluation are in Python, and the scikit-learn implementations (Pedregosa et al., 2012) were used for all methods except SW-SVR. This evaluation was performed on a single core of a machine with an Intel Core i5-2500K processor and 12 GB of RAM; even though several methods, such as random forest and SW-SVR, can run in parallel, all methods were run on a single core so as to compare their building times fairly.

Fig. 2. MAPE for prediction after 1 h for each algorithm: (a) testing period of 1 month, (b) 6 months, and (c) 12 months. Note that (b) and (c) are shown with log scale. [Each panel plots MAPE (%) against training periods of 3, 6, 12, 24, 36, and 60 months for Persistent, k-NN, DT, Adaboost, Bagging, RF, GB, Linear SVR, mapped SVR, RBF-SVR, and SW-SVR.]
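The MAPE definition translates directly into code. A minimal sketch: the function name `mape` is hypothetical, and the zero check is an added guard (not part of the definition), since MAPE divides by the true value and is undefined when any y_i is 0, which can matter for temperatures in °C.

```python
import numpy as np

def mape(y_true, y_pred):
    """MAPE = (100 / N) * sum_i |(y_i - y_hat_i) / y_i|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):          # added guard: division by a zero true value
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Errors of 10 %, 0 %, and 10 % give a MAPE of about 6.67 %.
score = mape([20.0, 25.0, 10.0], [22.0, 25.0, 9.0])
```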

4.2. Results and discussion

Figs. 2 and 3 show the prediction error for prediction horizons of 1 h and 6 h, respectively. Note that a log scale is used in Figs. 2(b), 2(c), 3(b), and 3(c). The results indicate that SW-SVR achieved the best average performance of all models across the testing periods, training periods, and prediction horizons. The advantage is particularly noticeable when the testing periods are longer than the training periods. In this situation, almost all methods except SW-SVR often performed worse than the persistent model, the naivest baseline. These results demonstrate that conventionally superior methods do not always perform well in micrometeorological data prediction, depending on how difficult the training periods, testing periods, and prediction horizons make the prediction. Moreover, among the SVR-based algorithms, SW-SVR achieved almost the best prediction performance, followed in order by RBF-SVR, mapped SVR, and linear SVR. The difference between mapped SVR and linear SVR is due to the effect of mapping the feature space, whereas the difference between SW-SVR and mapped SVR is due to the effect of ensemble learning based on D-SDC. These comparisons demonstrate that both feature-space mapping and ensemble learning based on D-SDC are effective for improving prediction performance. Meanwhile, mapped SVR also tended to perform worse than SW-SVR when the testing periods were longer than the training periods. Accordingly, ensemble learning based on D-SDC is particularly effective under this condition. When the testing periods are longer than the training periods, the amount of training data that is effective for predicting the test data is reduced. We consider that the small amount of training data that D-SDC extracted for building models corresponded to the training data that is effective for predicting the test data. Indeed, Fig. 4, which shows the average of the extraction rates of D-SDC, demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. Because SW-SVR predicts micrometeorological data accurately regardless of the amount of training data, it is very practical and useful.
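The distinction between linear SVR, mapped SVR, and RBF-SVR can be sketched with scikit-learn. This is an illustration under stated assumptions, not the evaluated configuration: we assume that "mapped SVR" means a linear SVR trained on an explicit approximation of the RBF feature map, and we use `RBFSampler` (random Fourier features, as in Rahimi & Recht, 2007) for that map, although the approximation actually used is not specified in this section. The values of `gamma`, `C`, and `n_components` and the synthetic target are hypothetical, and MAE is reported instead of MAPE because the synthetic target can be close to zero.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR, LinearSVR


def build_svr_family(gamma=0.5, C=10.0, n_components=500, random_state=0):
    """Return the three SVR variants discussed above (hypothetical settings)."""
    # Linear SVR on the raw (standardized) features.
    linear_svr = make_pipeline(
        StandardScaler(),
        LinearSVR(C=C, max_iter=10000, random_state=random_state),
    )
    # "Mapped" SVR: explicit random-Fourier approximation of the RBF feature map,
    # followed by the same linear solver.
    mapped_svr = make_pipeline(
        StandardScaler(),
        RBFSampler(gamma=gamma, n_components=n_components, random_state=random_state),
        LinearSVR(C=C, max_iter=10000, random_state=random_state),
    )
    # Exact RBF-kernel SVR (libsvm solver).
    rbf_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", gamma=gamma, C=C))
    return {"linear SVR": linear_svr, "mapped SVR": mapped_svr, "RBF-SVR": rbf_svr}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(2000, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(2000)
    X_train, X_test, y_train, y_test = X[:1500], X[1500:], y[:1500], y[1500:]
    for name, model in build_svr_family().items():
        model.fit(X_train, y_train)
        mae = np.mean(np.abs(model.predict(X_test) - y_test))
        print(f"{name}: MAE = {mae:.3f}")
```

The mapped variant keeps the linear solver, whose cost grows roughly linearly with the number of samples, while still approximating a nonlinear kernel; the exact RBF-SVR solves a kernelized problem whose cost typically grows much faster as the data grows.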

Table 1 shows the results of 10-fold cross-validation for prediction horizons of 1 h and 6 h. In hold-out validation, SW-SVR was often superior to all methods, including RBF-SVR. In 10-fold cross-validation, however, although SW-SVR achieved higher prediction performance than all methods except RBF-SVR, RBF-SVR was superior to SW-SVR, slightly at 1 h (MAPE of 5.17% vs. 5.19%) and more clearly at 6 h (20.94% vs. 23.50%). These results demonstrate that the prediction performance of SW-SVR is affected by the temporal order of the training and test data, and that SW-SVR is particularly suited to practical use, in which test data is always predicted from past training data. Meanwhile, even in 10-fold cross-validation, the ordering of the prediction errors of SW-SVR, mapped SVR, and linear SVR was the same as in hold-out validation. Therefore, both feature-space mapping and ensemble learning based on D-SDC are effective for improving prediction performance in cross-validation as well.
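The temporal-order effect described above can be made concrete by counting, for each split, how much of the training data was recorded after the earliest test sample. The sketch below assumes that samples are indexed in chronological order; whether the 10-fold cross-validation here shuffled the data is not stated, so both variants are shown. The helper names are hypothetical, and `KFold` is scikit-learn's standard splitter.

```python
import numpy as np
from sklearn.model_selection import KFold


def chronological_holdout(n_samples, n_train):
    """Train on the first n_train samples (the past), test on the rest (the future)."""
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:]


def kfold_splits(n_samples, n_splits=10, shuffle=True, random_state=0):
    """10-fold CV index pairs; random_state is only passed when shuffling."""
    kf = KFold(n_splits=n_splits, shuffle=shuffle,
               random_state=random_state if shuffle else None)
    return list(kf.split(np.zeros((n_samples, 1))))


def fraction_future_training(train_idx, test_idx):
    """Share of training samples recorded after the earliest test sample.

    Assumes sample indices are in chronological order.
    """
    return float(np.mean(np.asarray(train_idx) > np.min(test_idx)))


if __name__ == "__main__":
    n = 12 * 4320  # about one year of 10-minute samples
    tr, te = chronological_holdout(n, n_train=9 * 4320)
    print(f"hold-out: {fraction_future_training(tr, te):.2f}")
    for shuffle in (False, True):
        shares = [fraction_future_training(a, b) for a, b in kfold_splits(n, shuffle=shuffle)]
        print(f"10-fold, shuffle={shuffle}: " + ", ".join(f"{s:.2f}" for s in shares))
```

The chronological hold-out split never trains on future data, whereas even unshuffled 10-fold cross-validation trains on data recorded after some test data in every fold except the last, which is one way a model's ranking can change between the two protocols.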

Fig. 3. MAPE for prediction after 6 h for each algorithm. Panels: (a) testing periods: 1 month; (b) testing periods: 6 months; (c) testing periods: 12 months. x-axis: training periods [months] (3, 6, 12, 24, 36, 60); y-axis: MAPE [%]; series: Persistent, k-NN, DT, AdaBoost, Bagging, RF, GB, linear SVR, mapped SVR, RBF-SVR, SW-SVR. Note that (b) and (c) are shown with log scale.

Fig. 4. Average of each extraction rate by D-SDC in SW-SVR. Panels: (a) prediction horizon: 1 hour; (b) prediction horizon: 6 hours. x-axis: training periods [months] (3, 6, 12, 24, 36, 60); y-axis: extraction rate [%]; series: number of weak learners 10, 100, and 1000.

Table 1
MAPE [%] of 10-fold cross-validation for each algorithm (columns: prediction horizons).

Method         1 h        6 h
SW-SVR         5.18608    23.49826
k-NN           8.59929    26.52433
DT             5.81042    25.99290
AdaBoost      11.10375    29.93160
Bagging       10.24014    29.58125
RF             5.57213    25.55044
GB             5.27190    24.14987
Linear SVR     5.43892    24.68383
Mapped SVR     5.25274    24.26108
RBF-SVR        5.16985    20.94132
Persistent     5.96816    24.86800

Fig. 5. Building time for prediction after 1 h for each model. Panels: (a) different training periods (x-axis: training periods [months], 3 to 60; series: RF and GB with depth 5 and 10, RBF-SVR with σ = 0.1 and 0.00001, SW-SVR); (b) ensemble learning series (x-axis: number of weak learners, 10 to 1000; series: RF and GB with depth 5 and 10, SW-SVR); (c) SVR series (x-axis: cost, 1 to 100; series: linear SVR, RBF-SVR with σ = 0.1, 0.001, and 0.00001, SW-SVR with σ = 0.1, 0.001, and 0.00001). y-axis: time [sec] (log scale).

Fig. 6. Building time for prediction after 6 h for each model. Panels: (a) different training periods; (b) ensemble learning series, varying the number of weak learners; (c) SVR series, varying the cost; the series are the same as in the 1 h case. Note that all figures are shown with log scale.

Figs. 5 and 6 show the building time for prediction horizons of 1 h and 6 h, respectively. Figs. 5(a) and 6(a) show, for varying training periods, the building time of the models that achieved high prediction performance in Figs. 2 and 3: RF, GB, RBF-SVR, and SW-SVR. Note that the number of weak learners was 1000 for the ensemble learning methods, the cost parameter was 1 for the SVR-based methods, and σ of SW-SVR was 0.00001, where σ is the parameter of the RBF kernel used in kernel approximation. These results demonstrate that the building time of ensemble learning methods such as SW-SVR increases more gently than that of SVR. In particular, the building time of SW-SVR becomes the shortest as the training periods become longer. In other words, the building time of SW-SVR grows the most slowly of all methods as the training data increases. These results indicate that, as mentioned above, the computational complexity of SW-SVR is lower than that of conventional methods, including random forest and gradient boosting. SW-SVR is therefore effective for training on an enormous amount of data in terms of building time.
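Building time here is CPU clock time. A minimal way to reproduce that kind of measurement in Python is `time.process_time()`, which counts the CPU time of the current process; we assume, but the text does not confirm, that a comparable clock was used. The sketch below, with hypothetical helper names and synthetic data, records how the building time of linear SVR and RBF-SVR grows with the number of training samples.

```python
import time

import numpy as np
from sklearn.svm import SVR, LinearSVR


def building_time(model, X, y):
    """CPU time in seconds consumed by model.fit, via time.process_time()."""
    start = time.process_time()
    model.fit(X, y)
    return time.process_time() - start


def times_vs_size(make_model, sizes, n_features=5, seed=0):
    """Fit a fresh model on synthetic data of each size and record its building time."""
    rng = np.random.default_rng(seed)
    coef = rng.standard_normal(n_features)
    times = []
    for n in sizes:
        X = rng.standard_normal((n, n_features))
        y = X @ coef + 0.1 * rng.standard_normal(n)
        times.append(building_time(make_model(), X, y))
    return times


if __name__ == "__main__":
    sizes = [1000, 2000, 4000, 8000]
    lin = times_vs_size(lambda: LinearSVR(C=1.0, max_iter=10000), sizes)
    rbf = times_vs_size(lambda: SVR(kernel="rbf", C=1.0, gamma=0.1), sizes)
    for n, t_lin, t_rbf in zip(sizes, lin, rbf):
        print(f"n={n}: linear SVR {t_lin:.3f} s, RBF-SVR {t_rbf:.3f} s")
```

Because `process_time` sums the CPU time of all threads in the process, mirroring the single-core setup would also require limiting multithreaded numerical libraries to one thread, for example through the `OMP_NUM_THREADS` environment variable.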

Next, Figs. 5(b) and 6(b) show the building time of the better-performing ensemble learning models, RF, GB, and SW-SVR, when the number of weak learners was varied. Note that the cost parameter of SW-SVR was 1, σ of SW-SVR was 0.00001, and the training periods were 12 months. SW-SVR needs a longer building time than RF and GB with shallow DTs when the number of weak learners is small. However, when the DTs become deeper or the number of weak learners becomes larger, SW-SVR can build the model faster than, or at the same speed as, RF and GB. Moreover, SW-SVR, like RF, can easily be run in parallel environments, and its building time is expected to become even shorter.
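Because the weak learners are fitted independently of one another, their construction parallelizes in the same way as the trees of RF. The sketch below distributes the fits with joblib, the parallelism library that scikit-learn itself relies on. It is an assumption-laden illustration: each hypothetical weak learner is a plain `LinearSVR` fitted on a random subset of rows, standing in for the D-SDC-extracted data, which this sketch does not reproduce.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import LinearSVR


def fit_weak_learner(X, y, rows, C=1.0):
    """Fit one hypothetical weak learner on its own subset of training rows."""
    return LinearSVR(C=C, max_iter=10000).fit(X[rows], y[rows])


def fit_weak_learners(X, y, subsets, n_jobs=-1):
    """Fit all weak learners; n_jobs=-1 uses every available CPU core."""
    return Parallel(n_jobs=n_jobs)(
        delayed(fit_weak_learner)(X, y, rows) for rows in subsets
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((48_000, 5))   # roughly 12 months of 10-minute samples
    y = X @ rng.standard_normal(5)
    # Hypothetical extraction: 100 random subsets, each 1% of the rows.
    subsets = [rng.choice(len(X), size=480, replace=False) for _ in range(100)]
    learners = fit_weak_learners(X, y, subsets, n_jobs=-1)
    print(len(learners), "weak learners fitted")
```

With `n_jobs=1` the same code runs sequentially, which corresponds to the single-core condition used for the timing comparison.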

Finally, Figs. 5(c) and 6(c) show the building time of the SVR-based models when the SVR parameters were varied. Note that the number of weak learners was 100, and the training periods were 12 months. These results indicate that the building time of SW-SVR is significantly shorter than that of RBF-SVR but longer than that of linear SVR. Meanwhile, Fig. 4 demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. In particular, when the prediction horizon was 1 h, the average extraction rate was 0.47 percent at best and 1.82 percent at worst. When the prediction horizon was 6 h, the average extraction rate was 7.57 percent at best and 16.25 percent at worst. Nevertheless, the computational complexity of SW-SVR is larger than that of linear SVR because the additional cost of building several models outweighs the savings from the smaller training sets. However, since the amount of training data for each weak learner is substantially reduced, the computational complexity of building one model in SW-SVR is also reduced. Accordingly, when parallel processing reduces the number of models that each CPU builds, the computational complexity of SW-SVR as a whole can become lower than or equal to that of linear SVR. Meanwhile, as with linear SVR, the building time of SW-SVR does not depend on the SVR parameters and remains constant. As discussed above, the building time of SW-SVR depends solely on the number of weak learners and the training periods. Therefore, SW-SVR avoids unexpectedly long building times during parameter tuning, in which each parameter is varied widely.
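The parallelism argument above can be illustrated with rough arithmetic on the reported extraction rates. The sketch assumes, as a simplification, that building cost is proportional to the number of training rows processed, so it ignores the higher per-row cost of the mapped features and the overhead of D-SDC, k-means, and PLS regression; it is therefore only a lower bound on the relative cost of SW-SVR, not a prediction of the measured times in Figs. 5(c) and 6(c). The constants come from the text (about 4000 samples per month, 12 months, 100 weak learners); the helper names are hypothetical.

```python
import math

SAMPLES_PER_MONTH = 4000     # approximate, 10-minute data
TRAINING_MONTHS = 12         # setting used for Figs. 5(c) and 6(c)
N_WEAK_LEARNERS = 100        # setting used for Figs. 5(c) and 6(c)
N_TRAIN = SAMPLES_PER_MONTH * TRAINING_MONTHS


def rows_per_learner(rate_pct, n_train=N_TRAIN):
    """Training rows seen by one weak learner at a given extraction rate (%)."""
    return n_train * rate_pct / 100.0


def rows_per_cpu(rate_pct, n_cpus=1, n_learners=N_WEAK_LEARNERS, n_train=N_TRAIN):
    """Rows processed per CPU when the learners are spread evenly over n_cpus."""
    return n_learners * rows_per_learner(rate_pct, n_train) / n_cpus


def cpus_to_match_linear_svr(rate_pct, n_learners=N_WEAK_LEARNERS):
    """Smallest CPU count at which per-CPU rows do not exceed one linear SVR pass."""
    # Small tolerance guards against floating-point noise just above an integer.
    return max(1, math.ceil(n_learners * rate_pct / 100.0 - 1e-9))


if __name__ == "__main__":
    cases = [("1 h best", 0.47), ("1 h worst", 1.82),
             ("6 h best", 7.57), ("6 h worst", 16.25)]
    for label, rate in cases:
        print(f"{label}: {rows_per_learner(rate):.0f} rows per learner, "
              f"{rows_per_cpu(rate) / N_TRAIN:.2f}x the rows of linear SVR on 1 CPU, "
              f"{cpus_to_match_linear_svr(rate)} CPU(s) to break even")
```

For the worst 6 h case, each weak learner sees 7800 of about 48,000 rows, but 100 of them process 16.25 times as many rows as a single linear SVR, so about 17 CPUs are needed before the per-CPU row count drops to that of linear SVR.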

These results demonstrate that SW-SVR predicts complicated micrometeorological data with the best prediction performance in hold-out validation and with lower computational complexity than the other high-performing algorithms (RF, GB, and RBF-SVR). In particular, we found that dynamic aggregation of models built from very little data extracted by D-SDC is effective for achieving both high prediction performance and low computational complexity. However, some problems in SW-SVR remain to be solved. First, the prediction performance of SW-SVR sometimes deteriorates despite an increase in training data. This problem occurred in particular when the prediction horizon was 6 h, as shown in Fig. 3. This is because the data extracted by D-SDC includes training data that is unnecessary for highly accurate prediction. If D-SDC extracted the same data as it does when the training periods are shorter, the prediction performance of SW-SVR would not deteriorate as the training data increases. Therefore, we must review both the feature mapping and the D-SDC algorithm so as to avoid extracting unnecessary training data. Second, SW-SVR is based on a combination of several algorithms: kernel approximation, PLS regression, k-means, D-SDC, and linear SVR, each of which has several parameters. Therefore, SW-SVR has more parameters than the other methods, and tuning them takes more time. In this experiment, we used a coarse grid search so as to determine the parameters within a limited time. However, there is still room to improve the prediction performance by using other approaches, such as a genetic algorithm, instead of a grid search (Huang & Wang, 2006).
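A coarse grid search of the kind mentioned above could be set up with scikit-learn's `GridSearchCV`. The sketch is hypothetical: it tunes only two parameters, the RBF parameter of the kernel approximation (assumed here to correspond to `gamma` of `RBFSampler`) and the SVR cost `C`, on a single mapped-SVR pipeline rather than on the full SW-SVR ensemble. It uses `TimeSeriesSplit` so that each validation fold lies after its training data, in line with the hold-out protocol, and the grid values are illustrative. A genetic algorithm would replace the exhaustive loop over the grid with an evolutionary search; that is not shown.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR


def coarse_grid_search(X, y, n_splits=3):
    """Exhaustively evaluate a small log-spaced grid with forward-in-time validation."""
    pipe = make_pipeline(
        StandardScaler(),
        RBFSampler(n_components=200, random_state=0),
        LinearSVR(max_iter=10000, random_state=0),
    )
    # make_pipeline names each step after its lowercased class name.
    param_grid = {
        "rbfsampler__gamma": [1e-5, 1e-3, 1e-1],
        "linearsvr__C": [1, 10, 100],
    }
    search = GridSearchCV(
        pipe,
        param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),
        scoring="neg_mean_absolute_percentage_error",
    )
    return search.fit(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1200, 3))
    y = 20.0 + X[:, 0] + np.sin(X[:, 1])   # strictly positive, so MAPE is defined
    result = coarse_grid_search(X, y)
    # The scorer returns the negated MAPE as a fraction; convert it to percent.
    print(result.best_params_, f"{-100 * result.best_score_:.2f}% MAPE")
```

Every added parameter multiplies the number of grid points, which is why a grid over all of SW-SVR's parameters quickly becomes expensive.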

5. Conclusion and future work

In this paper, we proposed SW-SVR, a new methodology for predicting micrometeorological data that involves a novel combination of SVR and ensemble learning. To take advantage of both SVR and ensemble learning, SW-SVR builds several SVRs, each specialized for a representative data group arising in various natural environments, by using D-SDC, which extracts training data that is effective for predicting specific data. Moreover, to follow micrometeorological data whose characteristics change continually over time, SW-SVR makes predictions by dynamically weighted ensemble learning, based on the similarity between the test data and the data in which each weak learner specializes. Evaluation experiments using large-scale micrometeorological data showed that the prediction performance of SW-SVR is greater than or equal to that of other general methods, such as SVR, RF, and GB. Moreover, SW-SVR substantially reduces the building time compared with complicated models that have high prediction performance. We anticipate that dynamic aggregation of models built from various kinds of data extracted by D-SDC can contribute to more sophisticated studies of micrometeorological data prediction. In future work, we will evaluate SW-SVR in more varied situations to show that it works effectively. In particular, we will use more complicated data that consists of many features. Furthermore, when SW-SVR is applied in applications such as environmental control systems, the performance of the overall application should be evaluated. We have developed an agricultural support system using SW-SVR that controls the environments in greenhouses depending on the activity of the plants. We expect the evaluation of this application to demonstrate the advantages of SW-SVR in practical use.
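To make the overall structure described above concrete, the following is a toy sketch of a similarity-weighted ensemble of locally trained SVRs. It is not the authors' SW-SVR: D-SDC and the PLS regression step are not reproduced. Instead, as hypothetical stand-ins, k-means centroids define the representative data groups, each weak learner is trained on the `n_extract` training rows nearest to its centroid, and each learner's prediction is weighted by the inverse distance from the test point to its centroid. The class name and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR


class SimilarityWeightedEnsemble:
    """Toy ensemble of local SVRs weighted by similarity to each learner's data.

    Hypothetical illustration only; D-SDC and the PLS step of SW-SVR are not reproduced.
    """

    def __init__(self, n_learners=10, n_extract=100, gamma=0.5, C=1.0, random_state=0):
        self.n_learners = n_learners
        self.n_extract = n_extract
        self.gamma = gamma
        self.C = C
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        km = KMeans(n_clusters=self.n_learners, n_init=10, random_state=self.random_state)
        self.centers_ = km.fit(X).cluster_centers_
        self.learners_ = []
        for center in self.centers_:
            # Stand-in for D-SDC: keep the n_extract training rows closest to the centroid.
            rows = np.argsort(np.linalg.norm(X - center, axis=1))[: self.n_extract]
            learner = make_pipeline(
                RBFSampler(gamma=self.gamma, n_components=100, random_state=self.random_state),
                LinearSVR(C=self.C, max_iter=10000, random_state=self.random_state),
            )
            self.learners_.append(learner.fit(X[rows], y[rows]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every test row to every centroid: shape (n_test, n_learners).
        dist = np.linalg.norm(X[:, None, :] - self.centers_[None, :, :], axis=2)
        # Stand-in for dynamic weighting: inverse distance, normalized per test row.
        weights = 1.0 / (dist + 1e-12)
        weights /= weights.sum(axis=1, keepdims=True)
        preds = np.column_stack([learner.predict(X) for learner in self.learners_])
        return np.sum(weights * preds, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(2000, 2))
    y = np.sin(X[:, 0]) + X[:, 1]
    model = SimilarityWeightedEnsemble(n_learners=10, n_extract=100).fit(X[:1500], y[:1500])
    mae = np.mean(np.abs(model.predict(X[1500:]) - y[1500:]))
    print(f"extraction rate per learner: {100 * 100 / 1500:.1f}%, MAE = {mae:.3f}")
```

Each learner sees only a small fraction of the training rows, which mirrors the low extraction rates reported for D-SDC; because the learners are fitted independently, the loop in `fit` is also where parallel execution would apply.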

Acknowledgements

This study was partially supported by JST, PRESTO, and JSPS KAKENHI (26660198), Japan.

References

Antonanzas, J., Urraca, R., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Solar irradiation mapping with exogenous data from support vector regression machines estimations. Energy Conversion and Management, 100, 380–390.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. http://doi.org/10.1023/A:1010933404324.

Cao, H., Naito, T., & Ninomiya, Y. (2008). Approximate RBF kernel SVM and its applications in pedestrian classification. The 1st International Workshop on Machine Learning for Vision-based Motion Analysis - MLVMA'08, 1–9. http://hal.archives-ouvertes.fr/inria-00325810/.

Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–39. http://doi.org/10.1145/1961189.1961199.

Chevalier, R. F., Hoogenboom, G., McClendon, R. W., & Paz, J. A. (2011). Support vector regression with reduced training sets for air temperature prediction: A comparison with artificial neural networks. Neural Computing & Applications, 20(1), 151–159.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. http://doi.org/10.1006/jcss.1997.1504.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231–240. http://doi.org/10.1016/j.eswa.2005.09.024.

Japan Meteorological Agency. (n.d.). Japan Meteorological Agency. http://www.jma.go.jp/jma/indexe.html.

Kisi, O., & Cimen, M. (2012). Precipitation forecasting by using wavelet-support vector machine conjunction model. Engineering Applications of Artificial Intelligence, 25(4), 783–792. http://doi.org/10.1016/j.engappai.2011.11.003.

Kolokotsa, D., Pouliezos, A., Stavrakakis, G., & Lazos, C. (2009). Predictive control techniques for energy and indoor environmental quality management in buildings. Building and Environment, 44(9), 1850–1863. http://doi.org/10.1016/j.buildenv.2008.12.007.

Loosli, G. (2007). Comments on the core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 8, 291–301.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability: Vol. 1 (pp. 281–297).

Maity, R., Bhagwat, P., & Bhatnagar, A. (2010). Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrological Processes, 24(7), 917–923.

Mohammadi, K., Shamshirband, S., Anisi, M. H., Alam, K. A., & Petković, D. (2015). Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91, 433–441.

Othman, M. F., & Shazali, K. (2012). Wireless sensor network applications: A study in environment monitoring system. In Procedia Engineering: Vol. 41 (pp. 1204–1210). http://doi.org/10.1016/j.proeng.2012.07.302.

Park, D. H., & Park, J. W. (2011). Wireless sensor network-based greenhouse environment monitoring and automatic control system for dew condensation prevention. Sensors, 11(4), 3640–3651. http://doi.org/10.3390/s110403640.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods, 185–208.

Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20, 1177–1184.

Singh, K. P., Gupta, S., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437. http://doi.org/10.1016/j.atmosenv.2013.08.023.

Smith, B. A., Hoogenboom, G., & McClendon, R. W. (2009). Artificial neural networks for automated year-round temperature prediction. Computers and Electronics in Agriculture, 68(1), 52–61. http://doi.org/10.1016/j.compag.2009.04.003.

Suzuki, Y., Kaneda, Y., & Mineno, H. (2014). SW-SVR improved by short-distance data collection method. IPSJ SIG Technical Report, 2014-MBL-73(9), 1–8.

Suzuki, Y., Kaneda, Y., & Mineno, H. (2015). Analysis of support vector regression model for micrometeorological data prediction. Computer Science and Information Technology, 3(2), 37–48. http://doi.org/10.13189/csit.2015.030202.

Tenenhaus, M., Vinzi, V. E., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics and Data Analysis, 48(1), 159–205. http://doi.org/10.1016/j.csda.2004.03.005.

Tsang, I. W., Kwok, J. T., & Cheung, P.-M. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6, 363–392.

Urraca, R., Antonanzas, J., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Estimation of solar global irradiation in remote areas. Journal of Renewable and Sustainable Energy, 7(2), 023136.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer.

Wang, B. X., & Japkowicz, N. (2009). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1–20. http://doi.org/10.1007/s10115-009-0198-y.

Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3, Part 1), 5445–5449. http://doi.org/10.1016/j.eswa.2008.06.121.
