Sliding window-based support vector regression for predicting micrometeorological data


Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa


Yukimasa Kaneda a,∗, Hiroshi Mineno b,c

a Graduate School of Integrated Science and Technology, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan

b College of Informatics, Academic Institute, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan

c JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan

Article info

Article history: Received 4 February 2016; Revised 29 March 2016; Accepted 13 April 2016; Available online 23 April 2016

Keywords: Predicting micrometeorological data; Data extraction; Dynamic aggregation; Support vector regression; Ensemble learning

Abstract

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors, such as smartphones and sensor nodes, have been used extensively. Since these devices have more easily accumulated various kinds of micrometeorological data, such as temperature, humidity, and wind speed, an enormous amount of micrometeorological data has been accumulated. In recent years, it has been expected that such an enormous amount of data, called big data, will produce novel knowledge and value. Accordingly, many current applications have used data mining technology or machine learning to exploit big data. However, micrometeorological data has a complicated correlation among different features, and its characteristics change variously with time. Therefore, it is difficult to predict micrometeorological data accurately with low computational complexity even if state-of-the-art machine learning algorithms are used. In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR), that involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for each representative data group in various natural environments, such as different seasons and climates, and changes weights to aggregate the SVRs dynamically depending on the characteristics of test data. In our experiment, we predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data in Tokyo. As a result, regardless of testing periods, training periods, and prediction horizons, the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with those of complicated models that have high prediction performance.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors have been used extensively. The devices can very easily obtain various kinds of micrometeorological data such as temperature, humidity, and wind speed. Micrometeorological data is affected strongly by the surface of the earth and is related to our lives and industrial activity. Accordingly, the data has been used by many applications such as environmental control systems for greenhouses (Othman & Shazali, 2012; Park & Park, 2011). Moreover, more advanced applications exploit the data to a greater extent by using machine learning and data mining technology. Furthermore, an enormous amount of micrometeorological data has been accumulated by many devices, and it has been expected that analyzing such an enormous amount of data, called big data, will produce novel knowledge and value.

∗ Corresponding author. E-mail address: [email protected] (Y. Kaneda).

To predict micrometeorological data effectively, a number of researchers have studied machine learning (Smith, Hoogenboom, & McClendon, 2009). These researchers described prediction methods for micrometeorological data; in particular, prediction performance and computational complexity were often mentioned. Meanwhile, micrometeorological data has a complex correlation among different features such as temperature and humidity. Moreover, its characteristics change variously with time. Therefore, even if big data is given as training data, it is not easy to predict micrometeorological data accurately. Furthermore, in many cases, models have to become complicated in order to achieve high prediction performance, and the computational complexity increases. Accordingly, some models probably cannot be built from big data in a practical amount of computing time. In other words, there is a trade-off relationship between high prediction performance and low computational complexity. However, some practical uses require both at the same time. As the prediction performance in applications becomes higher, the quality provided by the applications becomes better. For example, in the case of environmental control systems based on prediction (Kolokotsa, Pouliezos, Stavrakakis, & Lazos, 2009), higher prediction performance enables the systems to provide precise control, precise management, and better environments. On the other hand, models that need a long time for training are worthless in practical use. In current situations where the amount of usable data has increased remarkably, this trade-off relationship has become a more critical issue.

http://dx.doi.org/10.1016/j.eswa.2016.04.012

Recently, one type of machine learning algorithm, support vector machines (SVMs), has been used successfully in various fields. The basic theory is a more efficient learning method based on probably approximately correct (PAC) learning. Moreover, SVMs can separate non-linear data with low computational complexity. Since most data observed in the real world is likely to have non-linear relationships, SVMs have also been applied to micrometeorological data prediction (Antonanzas, Urraca, Martinez-de-Pison, & Antonanzas-Torres, 2015; Mohammadi, Shamshirband, Anisi, Alam, & Petković, 2015; Urraca, Antonanzas, Martinez-de-Pison, & Antonanzas-Torres, 2015). Moreover, SVMs led to better prediction performance than other algorithms such as artificial neural networks (ANNs) and the autoregressive integrated moving average (ARIMA) model (Chevalier, Hoogenboom, McClendon, & Paz, 2011; Maity, Bhagwat, & Bhatnagar, 2010). However, when SVMs learn from big data, the computational complexity is still a matter of concern. An alternative learning method, ensemble learning, has also been used more widely for predicting micrometeorological data (Singh, Gupta, & Rai, 2013). The prediction performance of ensemble learning is greater than or equal to that of SVMs. The basic methodology is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability that a single model cannot represent. In particular, some researchers proposed improved methods that could be applied to micrometeorological data prediction (Wang & Japkowicz, 2009; Xie, Li, Ngai, & Ying, 2009). However, it is difficult to apply the methods to regression, and it is possible that the models will not be able to follow micrometeorological data whose characteristics always change with time.

In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR). SW-SVR involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for each representative data group in various natural environments, such as different seasons and climates. The specialized SVRs are built based on our previously proposed method, dynamic short-distance data collection (D-SDC), which extracts effective data for specific data prediction by taking account of movements: changes in data during prediction horizons. Each weak learner built from each extracted data set specializes in specific data and accurately predicts data similar to the specialized data. Then, SW-SVR aggregates all the predicted values based on weights decided by the similarity between the test data and each data specialized by the weak learners. This new ensemble learning methodology, which changes weights dynamically, makes it possible to follow micrometeorological data whose characteristics always change with time. Our results demonstrated that the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with that of complicated models that have high prediction performance.

2. Related work

As mentioned in the introduction, to predict micrometeorological data effectively, SVMs and ensemble learning have generally been used. These algorithms have higher prediction performance for micrometeorological data than traditional methods because SVMs use not only a margin maximizing algorithm whose great performance was proved by PAC learning but also the kernel trick that enables non-linear separation. On the other hand, ensemble learning provides higher generalizing capability that a single model cannot represent. In this section, a brief summary of these algorithms and some improved algorithms is given. Moreover, so that SW-SVR can draw advantages from both SVMs and ensemble learning, several problems of these algorithms for practical use are discussed.

2.1. Support vector regression

SVMs, introduced by Vapnik (1995), have been used successfully in various fields. In the simplest case, binary classification, SVMs obtain a separating hyperplane decided by maximizing the margin. The margin is the distance between the different classes. PAC learning proved that maximizing the margin produces high generalization ability. Moreover, the kernel trick enables SVMs to separate data non-linearly with low computational complexity. Various kinds of data observed in the real world are likely to have non-linear relationships. Accordingly, SVMs are used in many applications such as micrometeorological data prediction (Kisi & Cimen, 2012; Maity et al., 2010). Meanwhile, SVMs for regression, support vector regression (SVR), use the same methodology as SVMs and therefore share their high generalization ability. In this section, a brief summary of SVR is given as follows.

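First, the linear function for regression is given as follows:

$$f(x) = w^{T}x + b.$$

Then, as with SVMs, SVR also minimizes the norm of the weight vector $w$; the L2 norm $\lVert w \rVert_2$ is often used, and minimizing $\lVert w \rVert_2$ corresponds to maximizing the margin. Meanwhile, SVR tolerates prediction error $\varepsilon$. Therefore, the primal problem of SVR is shown as follows:

$$
\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\lVert w \rVert_2^2 \\
\text{subject to} \quad & y_i - (w^{T}x_i + b) \le \varepsilon, \\
& (w^{T}x_i + b) - y_i \le \varepsilon.
\end{aligned}
$$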

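Moreover, to take some errors into account further, the same slack variables $\xi$ as in soft margin SVMs are introduced. The slack variables represent penalties and increase in proportion to the errors between true values and predicted values. The problem into which the slack variables are introduced is shown as follows:

$$
\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\lVert w \rVert_2^2 + C\sum_i (\xi_i + \xi_i^{*}) \\
\text{subject to} \quad &
\begin{cases}
y_i - (w^{T}x_i + b) \le \varepsilon + \xi_i \\
(w^{T}x_i + b) - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ge 0,
\end{cases}
\end{aligned}
$$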

where the constant $C$ controls the balance between the effect of maximizing the margin and the penalties. To minimize the above formula, the slack variables in the formula must also be minimized. Accordingly, the slack variables depending on the errors are shown as follows:

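$$
\xi_i =
\begin{cases}
0 & \text{if } y_i - (w^{T}x_i + b) \le \varepsilon \\
y_i - (w^{T}x_i + b) - \varepsilon & \text{otherwise}
\end{cases}
$$

$$
\xi_i^{*} =
\begin{cases}
0 & \text{if } (w^{T}x_i + b) - y_i \le \varepsilon \\
(w^{T}x_i + b) - y_i - \varepsilon & \text{otherwise.}
\end{cases}
$$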
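As a purely illustrative sketch (not the authors' implementation), the two slack variables can be computed for a single sample as follows. The function name `slack_variables` and the sample values are assumptions introduced here; `y_pred` stands for $w^{T}x_i + b$.

```python
def slack_variables(y_true, y_pred, epsilon):
    """Return (xi, xi_star) for one sample under the epsilon-insensitive rule.

    y_pred plays the role of w^T x_i + b; the name is an assumption of this sketch.
    """
    residual = y_true - y_pred
    xi = max(0.0, residual - epsilon)        # prediction too low by more than epsilon
    xi_star = max(0.0, -residual - epsilon)  # prediction too high by more than epsilon
    return xi, xi_star


if __name__ == "__main__":
    # Hypothetical temperatures in degrees Celsius with a tolerance of 0.5.
    for y_true, y_pred in [(20.0, 19.8), (20.0, 18.5), (20.0, 21.0)]:
        print(y_true, y_pred, slack_variables(y_true, y_pred, epsilon=0.5))
```

At most one of the two values is non-zero for any sample, which mirrors the case analysis in the definitions of $\xi_i$ and $\xi_i^{*}$.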

The above formulas mean that a penalty is not given when the error is lower than $\varepsilon$, but the error is regarded as a penalty that cannot be tolerated when the error is higher than $\varepsilon$. In other words, SVR tolerates errors less than $\varepsilon$, and only errors over $\varepsilon$ are taken into account as penalties. Finally, the dual problem is derived from the above primal problem by Lagrange multipliers and corresponds to a quadratic programming problem as with SVMs. As a result, since a unique global optimal solution is obtained, SVR is superior to traditional algorithms that might fall into a local optimal solution, such as ANNs. The dual problem derived by Lagrange multipliers is shown as follows:

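$$
\begin{aligned}
\text{maximize} \quad & -\frac{1}{2}\sum_{i,j} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, x_i^{T}x_j - \varepsilon\sum_i (\alpha_i + \alpha_i^{*}) + \sum_i y_i(\alpha_i - \alpha_i^{*}) \\
\text{subject to} \quad & \sum_i (\alpha_i - \alpha_i^{*}) = 0, \\
& \alpha_i,\ \alpha_i^{*} \in [0, C].
\end{aligned}
$$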

Moreover, the above dual problem can easily involve a non-linear map $\phi$ to consider a higher dimension. To introduce the non-linear map $\phi$ in the above problem, the kernel function $K(x_i, x_j) = \phi^{T}(x_i)\,\phi(x_j)$ is defined and used instead of $x_i^{T}x_j$. Then $\phi^{T}(x_i)\,\phi(x_j)$ is determined based on $K(x_i, x_j)$ without calculation in the mapped higher dimension; this method is called the kernel trick. SVR based on maximizing the margin and the kernel trick yields high prediction performance.
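To make the kernel trick concrete, the following minimal sketch checks numerically that a kernel value equals an inner product in a mapped space. The homogeneous degree-2 polynomial kernel is used only because its feature map is easy to write down; it is an assumption of this example and is not the kernel used in the paper. The helper names `phi` and `poly2_kernel` are likewise hypothetical.

```python
from itertools import product


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel: all products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


def poly2_kernel(x, z):
    """K(x, z) = (x^T z)^2, evaluated without building the mapped vectors."""
    return dot(x, z) ** 2


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    z = [0.5, -1.0, 2.0]
    print(dot(phi(x), phi(z)))   # inner product in the 9-dimensional mapped space
    print(poly2_kernel(x, z))    # same value from the 3-dimensional inputs
```

The mapped vectors have $d^2$ entries for $d$ input features, while the kernel needs only one $d$-dimensional inner product, which is the saving the text refers to.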

Meanwhile, conventional quadratic programming solvers, such as the steepest descent method, have very high computational complexity; the computational complexity is approximately $O(N^3)$, where $N$ is the number of training data. Accordingly, a quadratic programming solver for SVMs, sequential minimal optimization (SMO), has become the de facto standard (Platt, 1998). SMO, which is specialized for SVMs, reduces the computational complexity of SVMs to approximately $O(N^2)$. Nevertheless, when an enormous amount of data is input as training data, the computational complexity increases substantially. To solve the problem, a method that regards the quadratic programming problem as a computational geometry problem, the core vector machine (CVM), was proposed (Tsang, Kwok, & Cheung, 2005). The prediction performance of CVM is comparable to that of SVMs, and the computational complexity decreases substantially. However, according to Loosli (2007), the prediction performance and computational complexity of CVM strongly depend on the values of its parameters. Therefore, when the parameter tuning that is essential for practical use is taken into account, CVM does not always satisfy both high prediction performance and low computational complexity.

SVR is one of the best algorithms in machine learning from the viewpoint of prediction performance. In particular, the kernel trick used in the dual problem is expected to be effective for predicting micrometeorological data that has a complex correlation among different features. However, the computational cost of solving the dual problem is often still too high for practical use. Thus, it is difficult to apply conventional SVR directly to micrometeorological data prediction.

2.2. Ensemble learning

Ensemble learning has been studied recently and used increasingly. The basic methodology of ensemble learning is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability that a single model cannot represent. As with SVMs, ensemble learning can represent non-linear relationships and has been used for predicting micrometeorological data. In particular, two kinds of approaches, bagging and boosting, have often been used in ensemble learning. The approaches differ greatly in the method used to build weak learners and aggregate them.

Algorithm 1 Bagging for regression.
Input:
    Training data: $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ where $x_i \in X$, $y_i \in Y$
    Number of weak learners: $n$
For $t = 1$ to $n$ do
    1. $D_t \leftarrow$ generate sample from $D$ with replacement
    2. $H_t(X) \leftarrow$ build a weak learner from $D_t$
Output: $H(X) = \frac{1}{n}\sum_{t=1}^{n} H_t(X)$

Algorithm 2 Boosting for regression.
Input:
    Training data: $D = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ where $x_i \in X$, $y_i \in Y$
    Weights: $w_i = 1/N$
For $t = 1$ to $n$ do
    1. $H_t(X) \leftarrow$ build a weak learner from $D$ by using weights $w_t$
    2. $\varepsilon_t \leftarrow$ compute error rate of $H_t(X)$
    3. $\alpha_t \leftarrow$ compute reliability of prediction result of $H_t(X)$ based on $\varepsilon_t$
    4. $w_{t+1} \leftarrow$ update weights $w_t$ based on $\alpha_t$
Output: $H(X) = \sum_{t=1}^{n} \alpha_t H_t(X) \,/\, \sum_{t=1}^{n} \alpha_t$

Bagging uses several training data sets generated by bootstrap sampling. The basic bagging algorithm for regression is shown in Algorithm 1. In bagging, different kinds of training data are created by sampling the original input training data with replacement. Then, weak learners are built from each sampled training data set. Finally, the predicted values are aggregated by majority vote or arithmetic average. In particular, random forest, introduced by Breiman (2001), to which randomness in feature selection is also applied, often demonstrates better prediction performance than conventional models such as SVMs. Random forest is used in various applications and has been extended to other improved versions. For example, to predict imbalanced data, which is observed frequently in the real world, more accurately, improved balanced random forest (IBRF) has been proposed (Xie et al., 2009). IBRF involves an efficient sampling method for imbalanced data and cost-sensitive learning that penalizes misclassification of the minority class more strongly. The authors showed that IBRF was more effective at predicting imbalanced data than class-weighted SVMs and conventional improved random forests for imbalanced data prediction.
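The bagging procedure for regression can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the weak learner here is a one-dimensional least-squares line (`fit_line`), chosen only to keep the example self-contained, whereas the paper's experiments use decision trees as base learners; all function names are hypothetical.

```python
import random


def fit_line(sample):
    """Least-squares line through (x, y) pairs; stands in for any weak learner."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in sample)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    slope = cov_xy / var_x if var_x > 0 else 0.0  # degenerate sample: constant model
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def bagging_fit(data, n_learners, rng):
    """Build n_learners weak learners, each from a bootstrap sample of data."""
    learners = []
    for _ in range(n_learners):
        bootstrap = [rng.choice(data) for _ in data]  # sampling with replacement
        learners.append(fit_line(bootstrap))
    return learners


def bagging_predict(learners, x):
    """Aggregate the weak learners by arithmetic average."""
    return sum(h(x) for h in learners) / len(learners)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(float(x), 2.0 * x + 1.0 + rng.gauss(0.0, 0.3)) for x in range(20)]
    learners = bagging_fit(data, n_learners=25, rng=rng)
    print(bagging_predict(learners, 4.0))
```

Because every weak learner sees a different resample, the averaged prediction is less sensitive to individual noisy points than a single fit.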

Boosting repeatedly builds weak learners by using weights based on the error rate. The basic boosting algorithm for regression, such as Adaboost (Freund & Schapire, 1997), is shown in Algorithm 2. Unlike bagging, almost all boosting algorithms use the same training data, but the training data is reweighted repeatedly. Boosting alternates between building weak learners by using weights and updating the weights. Finally, the predicted values are aggregated by weighted average. Various kinds of boosting algorithms have been studied and proposed; gradient boosting (Friedman, 2001) in particular has shown the best prediction performance in many competitions. Meanwhile, as with IBRF, a boosting algorithm for imbalanced data, boosting-SVM, has also been proposed (Wang & Japkowicz, 2009). The main characteristic of boosting-SVM is its use of an asymmetric misclassification cost. The authors demonstrated that boosting-SVM enabled more accurate prediction of imbalanced data.

Fig. 1. Processing overview of SW-SVR. (a) Extraction of training data by D-SDC. (b) Weighted ensemble learning in SW-SVR.

When micrometeorological data including many unusual natural environments is regarded as imbalanced data, the above methods are likely to classify micrometeorological data more accurately. However, these approaches cannot be applied to regression. Moreover, according to our previous research (Suzuki, Kaneda, & Mineno, 2015), the appropriate training data depends on the test data. In other words, the weights used to aggregate weak learners built from different kinds of training data should depend on the test data.

3. SW-SVR: Sliding window-based support vector regression

We propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression, combining the methodologies of SVR and ensemble learning. The basic theories are based on D-SDC, our previously proposed method to extract effective data for specific data prediction, and novel weighted ensemble learning, as shown in Fig. 1. First, to represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for each representative data group in various natural environments, such as different seasons and climates. The specialized SVRs are built based on D-SDC, which extracts effective data for specific data prediction by taking account of movements: changes of data during prediction horizons (Fig. 1(a)). Each weak learner built from each extracted data set specializes in specific data and accurately predicts data similar to the specialized data. Afterward, the weak learners are aggregated with weights determined dynamically at the time of prediction so as to maintain the prediction performance for micrometeorological data whose characteristics always change with time (Fig. 1(b)). The weights are decided by the similarity between the test data and each data specialized by the weak learners. Even if the characteristics of micrometeorological data always change with time, SW-SVR always gives priority to weak learners that are more suitable for predicting the test data. The details of the SW-SVR algorithm are shown in Algorithm 3. The procedure for training consists of two kinds of preprocessing, iterated learning, and dynamic aggregation. The procedures of each part are shown as follows.

The below-mentioned algorithms in SW-SVR use the L2 norm (the Euclidean distance), and their performance is related to the feature space. For example, if the feature space includes noisy features or non-linear relationships between features, the performance will probably be reduced substantially. In particular, micrometeorological data has a complex correlation among different features such as temperature and humidity. Accordingly, the feature space must be mapped into another feature space that takes into account the presence of noise and non-linear relationships. In our approach, we use kernel approximation (Rahimi & Recht, 2007) and partial least squares (PLS) regression (Tenenhaus, Vinzi, Chatelin, & Lauro, 2005) to map into a new feature space. Kernel approximation generates a new, higher-dimensional feature space that represents non-linear data as linear data with very low computational complexity. Actually, a combination of kernel approximation and linear SVMs led to faster computation with prediction performance comparable to that of an exact SVM (Cao, Naito, & Ninomiya, 2008). On the other hand, PLS regression is a supervised dimension reduction methodology. This method can reduce dimensions by extracting latent variables that have a strong relationship with a dependent variable. If the feature space includes noisy features, their effect is reduced by PLS regression. The combination of kernel approximation and PLS regression enables SW-SVR to use an effective feature space for calculation of the L2 norm in micrometeorological data.

Algorithm 3 Sliding window-based support vector regression.
Input:
    Training data set: $S = \{(x_1, y_1, x'_1), \ldots, (x_N, y_N, x'_N)\}$ where $x_i \in X$, $y_i \in Y$, $x'_i \in X'$ ($x'_i$ denotes $x_i$ after the prediction horizon)
    Test data: $P$
    Weight parameters: $p$, $q$
Preprocessing:
    1. apply normalization to $X$ and $X'$
    2. fit kernel approximation and PLS regression to $X$ and $X'$
    3. $M_i = \lVert x_i - x'_i \rVert$, $i = 1 \ldots N$
    4. $G_t \leftarrow$ each center of k-means($X$), $t = 1 \ldots n$
For $t = 1$ to $n$ do
    1. $D_{ti} = \lVert G_t - x_i \rVert$, $i = 1 \ldots N$
    2. $r_t = \sum_{i=1}^{N} (w_i M_i) \,/\, \sum_{i=1}^{N} w_i$ where $w_i = 1 / D_{ti}^{\,p}$
    3. $S_t = \{(x_i, y_i) \mid D_{ti} < r_t\}$, $i = 1 \ldots N$
    4. $H_t(X) \leftarrow$ train LinearSVR($S_t$)
Output: $H(P) = \sum_{t=1}^{n} (w_t H_t(P)) \,/\, \sum_{t=1}^{n} w_t$ where $w_t = 1 / \lVert G_t - P \rVert^{q}$

According to our previous research, to accurately predict particular specific data in micrometeorological data, it is necessary to extract effective training data for specific data prediction (Suzuki et al., 2015). In SW-SVR, several such specific data are selected in advance, and weak learners are built from the effective training data extracted for predicting each specific data. Meanwhile, micrometeorological data involves various natural environments such as different seasons and climates. Therefore, the selected specific data must together cover the varied natural environments that are likely to appear, so that micrometeorological data can be represented by several models. In SW-SVR, each specific data is selected by a clustering algorithm, k-means (Macqueen, 1967). The k-means algorithm is one of the most famous non-hierarchical clustering algorithms and partitions data into several clusters faster than other clustering algorithms. In SW-SVR, k-means partitions all training data into the same number of clusters as the number of weak learners given by users. Then, each cluster center is used as specific data that represents various natural environments.

After selecting several specific data, SW-SVR iterates data extraction and model building. First, SW-SVR extracts effective training data for predicting each specific data by D-SDC (Suzuki, Kaneda, & Mineno, 2014). The theory of D-SDC is similar to that of the k-nearest neighbor (k-NN) algorithm, and D-SDC also extracts some training data similar to a specialized object. However, in our D-SDC, the amount of extracted data depends on the movement of a specialized object with time. The movement $r$ means the change of a specialized object during the prediction horizon, as shown in the following equation:

$$r_t = \lVert G_t - G'_t \rVert$$

where $G$ is a specialized object, and $G'$ is the specialized object after the prediction horizon. D-SDC extracts training data whose norm from a specialized object is shorter than the movement $r$. Accordingly, the training data $S$ extracted by D-SDC is given as follows:

$$S_t = \{\, (x_i, y_i) \mid \lVert G_t - x_i \rVert < \lVert G_t - G'_t \rVert \,\}$$

where $x$ is the feature vector of the training data and $y$ is the dependent variable of the training data. D-SDC is based on the movement $r$ because the movement $r$ is strongly related to the autocorrelation of data surrounding a specialized object. In micrometeorological data, movements in specific natural environments are mutually similar, and the autocorrelation becomes lower when these movements are bigger. For example, in Japan, the weather changes drastically every spring, and the natural environment shifts among various other natural environments with time. Meanwhile, when we predict time series data such as micrometeorological data, autocorrelation means correlation between features and a dependent variable, and more training data is required for highly accurate prediction when autocorrelation is lower. Since D-SDC extracts an amount of data surrounding a specialized object in proportion to the movement $r$, extraction that considers the autocorrelation of data surrounding a specialized object is achieved. However, the movement $r$ is unknown because $G'$ (the specialized object after the prediction horizon) is not observed. Meanwhile, as mentioned above, movements of data surrounding a specialized object are mutually similar. Therefore, D-SDC estimates the movement $r$ from the movements of training data similar to a specialized object by a weighted average, where the weights are reciprocals of the norms between the specialized object and each training data. Movements of training data can be calculated by referring to the time when each training data is observed. The estimated movement $r$ is given as follows:

$$r_t = \lVert G_t - G'_t \rVert \approx \frac{\sum_{i=1}^{N} w_i \lVert x_i - x'_i \rVert}{\sum_{i=1}^{N} w_i} \quad \text{where} \quad w_i = \frac{1}{\lVert G_t - x_i \rVert^{p}},$$

$N$ is the number of training data, and $p$ is a weight parameter.

Afterward, SW-SVR builds several linear SVRs as weak learners based on the extracted data. As described above, a combination of linear SVR and kernel approximation is comparable to SVR using a kernel method. Moreover, linear SVR can be built much faster by using liblinear (Fan, Chang, Hsieh, Wang, & Lin, 2008), an optimized implementation for linear SVMs, instead of other general implementations of SVMs such as libSVM (Chang & Lin, 2011). Although the only usable kernel in liblinear is the linear kernel, liblinear can build the model much faster by solving the primal problem instead of the dual problem. Furthermore, since all training data is divided into smaller amounts of extracted data, each model can be built faster, and it is easier to learn each extracted data set by parallel processing.
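The D-SDC steps described above (estimating the movement by a weighted average, then keeping the training data that lies within that radius of the specialized object) can be sketched as follows. This is a minimal reading of the equations, not the authors' implementation; the function names, the zero-distance guard, and the toy data are assumptions introduced for illustration.

```python
import math


def euclidean(u, v):
    """L2 norm of the difference between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def estimate_movement(center, xs, xs_after, p):
    """Weighted mean of ||x_i - x'_i|| with weights 1 / ||center - x_i||**p."""
    num = den = 0.0
    for x, x_after in zip(xs, xs_after):
        dist = max(euclidean(center, x), 1e-12)  # guard for a zero distance (assumption)
        w = 1.0 / dist ** p
        num += w * euclidean(x, x_after)
        den += w
    return num / den


def d_sdc_extract(center, xs, ys, xs_after, p):
    """Return the estimated movement r_t and the pairs (x_i, y_i) closer than r_t."""
    r = estimate_movement(center, xs, xs_after, p)
    selected = [(x, y) for x, y in zip(xs, ys) if euclidean(center, x) < r]
    return r, selected


if __name__ == "__main__":
    center = [0.0, 0.0]                              # a cluster center G_t
    xs = [[1.0, 0.0], [0.0, 2.0], [4.0, 0.0]]        # training features x_i
    xs_after = [[1.0, 1.5], [0.0, 3.5], [4.0, 1.5]]  # the same data after the horizon
    ys = [10.0, 20.0, 30.0]
    print(d_sdc_extract(center, xs, ys, xs_after, p=2))
```

A larger `p` makes the estimate depend more strongly on the training data nearest to the specialized object.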

The predicted values of SW-SVR take into account the change of natural environments with time. In general ensemble learning, prediction for regression depends on the weighted average, and the weights are determined at the time of training. However, SW-SVR determines the weights dynamically at the time of prediction. The weights are determined by the norm between the test data and each data specialized by the weak learners. The final hypothesis of SW-SVR is shown as follows:

$$H(P) = \frac{\sum_{t=1}^{n} w_t H_t(P)}{\sum_{t=1}^{n} w_t} \quad \text{where} \quad w_t = \frac{1}{\lVert G_t - P \rVert^{q}},$$

$P$ is the test data, $n$ is the number of weak learners, $H(X)$ is a hypothesis, and $q$ is a weight parameter. In our approach, since the weights of ensemble learning are determined dynamically for every prediction, SW-SVR can follow micrometeorological data whose characteristics always change with time.

Finally, we describe the computational complexity of SW-SVR. To represent complicated micrometeorological data easily, SW-SVR uses various conventional methods besides the D-SDC we proposed: kernel approximation, PLS regression, k-means, and linear SVR. The computational complexity of these methods in general increases linearly; in other words, the computational complexity is approximately $O(N)$ when the number of training data $N$ is much larger than the number of dimensions and each parameter of these methods. Moreover, the computational complexity of D-SDC corresponds to $O(nN)$ because D-SDC just repeats $N$ distance calculations $n + 1$ times, where $n$ is the number of weak learners in SW-SVR. Therefore, if $N$ is much larger than $n$, the computational complexity of D-SDC also increases linearly. The total computational complexity of SW-SVR is approximately $O(N)$, which is much lower than that of SVR.

4. Evaluation

4.1. Experiment

We compared the performance of SW-SVR with other standard methods for regression: k-NN, decision tree (DT), Adaboost, bagging, random forest (RF), gradient boosting (GB), linear SVR, and SVR using a radial basis function (RBF) kernel, which shows higher performance in various fields (RBF-SVR). Note that the kernel of kernel approximation in SW-SVR is also the RBF kernel, and the base learner in Adaboost and bagging is the decision tree, which has been used generally. Moreover, to evaluate SW-SVR in more detail, we evaluated the performance of linear SVR with mapping: standard linear SVR to which the same mapping as SW-SVR is applied ("mapped SVR"). Mapped SVR separates the contribution of mapping the feature space from that of ensemble learning based on D-SDC. All parameters of the models used were adjusted by the grid search method. The baseline for this evaluation was the performance of the naivest persistent model, as shown in the following formula:

$$\hat{y}_{i+\Delta t} = y_i$$

where $\hat{y}$ is the predicted value, $y$ is the true value, and $\Delta t$ is the prediction horizon.

Fig. 2. MAPE for prediction after 1 h for each algorithm: (a) testing periods: 1 month; (b) testing periods: 6 months; (c) testing periods: 12 months. Each panel plots MAPE [%] against training periods [months] (3, 6, 12, 24, 36, 60) for Persistent, k-NN, DT, Adaboost, Bagging, RF, GB, Linear SVR, mapped SVR, RBF-SVR, and SW-SVR. Note that (b) and (c) are shown with log scale.

We evaluated the performance in two ways: hold-out validation and 10-fold cross-validation. We predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data in Tokyo (Japan Meteorological Agency, n.d.). The data consists of atmospheric pressure, temperature, relative humidity, wind speed, and irradiance. In hold-out validation, training periods are limited to periods earlier than the testing periods so as to assume practical use; in practical use, test data is always predicted based on past training data. The training periods were from 3 months to 5 years before September 1, 2014, and the testing periods were from 1 month to 1 year after the same day. By varying the training periods and the testing periods, the performance under various usage scenarios is evaluated. On the other hand, the period for 10-fold cross-validation was 6 years, from September 1, 2009 to September 1, 2015. Note that the amount of data per month was approximately 4000 because the data was accumulated every 10 minutes. In this evaluation, we used the mean absolute percentage error (MAPE) as the index of prediction error and the building time calculated based on the CPU clock time as the index of computational complexity. MAPE is shown as follows:

$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where $N$ is the number of test data, $y$ is the true value, and $\hat{y}$ is the predicted value. Moreover, we evaluated the average of each extraction rate by D-SDC in each experimental condition so as to analyze the performance of SW-SVR and D-SDC further. All implementations for this evaluation are in Python, and the implementations in scikit-learn (Pedregosa et al., 2012) were used for all methods except SW-SVR. This evaluation was performed on a single core of a machine with an Intel Core i5-2500K processor and 12 GB of RAM; even though several methods, such as random forest and SW-SVR, can be run with parallel processing, the methods were run on a single core so as to evaluate the building time of all methods fairly.
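For reference, the MAPE defined above can be computed as in the following sketch. The function name `mape` and the sample values are assumptions of this example; note that the measure is undefined when a true value is zero, in which case this sketch raises `ZeroDivisionError`.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, following the definition above."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Hypothetical true and predicted temperatures.
    print(mape([10.0, 20.0, 40.0], [11.0, 18.0, 40.0]))
```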

4.2.Resultsanddiscussion

Figs. 2 and 3 show the prediction error for prediction horizons of 1 h and 6 h, respectively. Note that a log scale is used in Figs. 2(b), 2(c), 3(b), and 3(c). The results indicate that SW-SVR produced the best average performance of all models over all testing periods, training periods, and prediction horizons. The effect is particularly noticeable when the testing periods are longer than the training periods. In this situation, by contrast, almost all methods except SW-SVR often perform worse than the naive persistent model used as the baseline. The results demonstrate that conventional high-performing methods do not always perform well for micrometeorological data prediction; their performance depends on the difficulty of the prediction, which is determined by the training periods, testing periods, and prediction horizons. Moreover, among the algorithms based on SVR, the prediction performance of SW-SVR is almost always the best, followed in order by RBF-SVR, mapped SVR, and linear SVR. The difference between mapped SVR and linear SVR is due to the effect of mapping the feature space. On the other hand, the difference between SW-SVR and mapped SVR is due to the effect of ensemble learning based on D-SDC. These comparisons demonstrate that both mapping the feature space and ensemble learning based on D-SDC are effective for improving prediction performance. Meanwhile, mapped SVR also tended to have lower prediction performance than SW-SVR when the testing periods were longer than the training periods. Accordingly, under this condition, ensemble learning based on D-SDC is particularly effective. When the testing periods are longer than the training periods, the amount of training data that is effective for predicting the test data is reduced. We consider that the small amount of training data that D-SDC extracted for building models corresponded to the training data that is effective for predicting the test data. Indeed, Fig. 4 shows the average of each extraction rate by D-SDC and demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. Because SW-SVR predicts micrometeorological data accurately regardless of the amount of training data, it is very practical and useful.
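The gap between linear SVR and mapped SVR can be illustrated with a small sketch. It assumes that "mapping the feature space" can be approximated by scikit-learn's `RBFSampler` (random Fourier features) placed in front of `LinearSVR`; whether the evaluated implementation used these exact classes is an assumption, and the synthetic sine data and all parameter values below are made up for illustration only.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Synthetic nonlinear target (hypothetical data, not micrometeorological data).
rng = np.random.RandomState(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sin(2.0 * X[:, 0])
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Linear SVR on the raw feature.
linear_svr = LinearSVR(C=10.0, max_iter=10000, random_state=0)

# "Mapped" SVR: approximate RBF feature map followed by the same linear SVR.
mapped_svr = make_pipeline(
    RBFSampler(gamma=1.0, n_components=200, random_state=0),
    LinearSVR(C=10.0, max_iter=10000, random_state=0),
)

linear_svr.fit(X_train, y_train)
mapped_svr.fit(X_train, y_train)

# Mean absolute error on held-out samples.
mae_linear = float(np.mean(np.abs(linear_svr.predict(X_test) - y_test)))
mae_mapped = float(np.mean(np.abs(mapped_svr.predict(X_test) - y_test)))
```

On a target like this, the mapped model can follow the curvature that a purely linear model cannot, while training still uses a linear solver.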

Table 1 shows the results of 10-fold cross-validation for prediction horizons of 1 h and 6 h. In hold-out validation, SW-SVR was often superior to all other methods, including RBF-SVR. In 10-fold cross-validation, however, although SW-SVR had higher prediction performance than all methods except RBF-SVR, RBF-SVR was slightly superior to SW-SVR. The results demonstrate that the prediction performance of SW-SVR is affected by the temporal order between training data and test data, and that SW-SVR is particularly suited to practical use in which test data is always predicted based on past training data. Meanwhile, even in 10-fold cross-validation, the order of the prediction errors of linear SVR, mapped SVR, and SW-SVR was the same as in hold-out validation. Therefore, both mapping the feature space and ensemble learning based on D-SDC are effective for improving prediction performance in cross-validation as well.
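The role of temporal order can be made concrete with a small sketch that compares index splits only. It assumes time-ordered sample indices and contiguous folds; how the folds were actually drawn in the evaluation is not stated, so the fold construction here is an assumption, and all function names are illustrative.

```python
def holdout_split(n_train, n_test):
    """Time-ordered hold-out: every test index is later than every training index."""
    train = list(range(n_train))
    test = list(range(n_train, n_train + n_test))
    return train, test


def kfold_splits(n_samples, k):
    """Contiguous k-fold: each fold is tested once, the rest is used for training."""
    bounds = [n_samples * i // k for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = list(range(bounds[i])) + list(range(bounds[i + 1], n_samples))
        yield train, test


def trains_on_future(train, test):
    """True if any training sample is newer than the oldest test sample."""
    return max(train) > min(test)


# Hold-out: 60 time steps for training, the following 12 for testing.
hold_train, hold_test = holdout_split(n_train=60, n_test=12)
holdout_uses_future = trains_on_future(hold_train, hold_test)

# 10-fold cross-validation over the same 72 time steps.
folds_using_future = sum(
    trains_on_future(train, test) for train, test in kfold_splits(72, 10)
)
```

In this sketch the hold-out split never trains on data newer than the test data, whereas nine of the ten folds do, which is the difference in temporal order that the paragraph above refers to.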

Figs. 5 and 6 show the building time for prediction horizons of 1 h and 6 h, respectively. Figs. 5(a) and 6(a) show the building time of the models that have high prediction performance in Figs. 2 and 3 (RF, GB, RBF-SVR, and SW-SVR) when the training periods were varied. Note that the number of weak learners was 1000 in the ensemble learning series, the cost parameter was 1 in the SVR series, and σ of SW-SVR was 0.00001; σ of SW-SVR is a parameter of the RBF kernel in kernel approximation. These results demonstrate that the building time of ensemble learning methods, such as SW-SVR, increases more gently than that of SVR. In particular, the

Fig. 3. MAPE for prediction after 6 h for each algorithm. Note that (b) and (c) are shown with log scale. Panels: (a) Testing periods: 1 month; (b) Testing periods: 6 months; (c) Testing periods: 12 months. Each panel plots MAPE [%] against the training period [months] (3, 6, 12, 24, 36, 60) for Persistent, k-NN, DT, Adaboost, Bagging, RF, GB, linear SVR, mapped SVR, RBF-SVR, and SW-SVR.

Fig. 4. Average of each extraction rate by D-SDC in SW-SVR. Panels: (a) Prediction horizons: 1 hour; (b) Prediction horizons: 6 hours. Each panel plots the extraction rate [%] against the training period [months] (3, 6, 12, 24, 36, 60) for 10, 100, and 1000 weak learners.

Table 1
MAPE of 10-fold cross-validation for each algorithm.

Method        Prediction horizon
              1 h         6 h
SW-SVR        5.18608     23.49826
k-NN          8.59929     26.52433
DT            5.81042     25.99290
Adaboost      11.10375    29.93160
Bagging       10.24014    29.58125
RF            5.57213     25.55044
GB            5.27190     24.14987
Linear SVR    5.43892     24.68383
mapped SVR    5.25274     24.26108
RBF-SVR       5.16985     20.94132
Persistent    5.96816     24.86800

Fig. 5. Building time for prediction after 1 h for each model. Panels: (a) Different training periods: time [sec] (log scale) against training periods [months] (3, 6, 12, 24, 36, 60) for RF and GB (depth 5 and 10), RBF-SVR (σ = 0.1 and 0.00001), and SW-SVR; (b) Ensemble learning series: time [sec] (log scale) against the number of weak learners (10, 50, 100, 500, 1000) for RF and GB (depth 5 and 10) and SW-SVR; (c) SVR series: time [sec] (log scale) against cost (1, 5, 10, 50, 100) for linear SVR, RBF-SVR, and SW-SVR (σ = 0.1, 0.001, and 0.00001).

Fig. 6. Building time for prediction after 6 h for each model. Note that all figures are shown with log scale. Panels: (a) Different training periods: time [sec] against training periods [months] (3, 6, 12, 24, 36, 60) for RF and GB (depth 5 and 10), RBF-SVR (σ = 0.1 and 0.00001), and SW-SVR; (b) Ensemble learning series: time [sec] against the number of weak learners (10, 50, 100, 500, 1000) for RF and GB (depth 5 and 10) and SW-SVR; (c) SVR series: time [sec] against cost (1, 5, 10, 50, 100) for linear SVR, RBF-SVR, and SW-SVR (σ = 0.1, 0.001, and 0.00001).

building time of SW-SVR is the shortest when the training periods become longer. In other words, the rate of increase in the building time of SW-SVR is the gentlest of all the methods as the amount of training data increases. These results indicate that, as mentioned above, the computational complexity of SW-SVR is less than that of conventional methods, including random forest and gradient boosting. SW-SVR is effective for training on an enormous amount of data in terms of building time.

Next, Figs. 5(b) and 6(b) show the building time of the better-performing ensemble learning models (RF, GB, and SW-SVR) when the number of weak learners was varied. Note that the cost parameter of SW-SVR was 1, σ of SW-SVR was 0.00001, and the training periods were 12 months. SW-SVR needs a longer building time than RF and GB using shallow DTs when the number of weak learners is small. However, when the DTs become deeper or the number of weak learners becomes larger, SW-SVR can build the model faster than, or at the same speed as, RF and GB. Moreover, SW-SVR, like RF, can easily be run in parallel environments, so the building time of SW-SVR is expected to become even shorter.

Finally, Figs. 5(c) and 6(c) show the building time of the models based on SVR when the parameters of SVR were varied. Note that the number of weak learners was 100, and the training periods were 12 months. These results indicate that the building time of SW-SVR is significantly shorter than that of RBF-SVR but longer than that of linear SVR. Meanwhile, Fig. 4 demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. In particular, when the prediction horizon was 1 h, the average of each extraction rate was 0.47 percent at best and 1.82 percent at worst. On the other hand, when the prediction horizon was 6 h, the average of each extraction rate was 7.57 percent at best and 16.25 percent at worst. Nevertheless, the computational complexity of SW-SVR is larger than that of linear SVR because the increase in computational complexity due to building several models outweighs this reduction. However, since the amount of training data of each weak learner is reduced substantially, the computational complexity of building one model in SW-SVR is also reduced. Accordingly, when the number of models that one CPU builds is reduced by using parallel processing, the computational complexity of the overall SW-SVR is lower than or equal to that of linear SVR. Meanwhile, as with linear SVR, the building time of SW-SVR does not depend on changes in the parameters related to SVR and remains constant. As mentioned in the above discussion, the building time of SW-SVR depends solely on the number of weak learners and the training periods. Therefore, SW-SVR can avoid an unexpectedly long building time during parameter tuning, in which each parameter is changed variously.
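The parallel-processing argument above can be checked with a back-of-the-envelope sketch. It rests on two loudly stated assumptions that are not in the text: the cost of training one linear SVR grows linearly with the number of training samples, and the cost of the D-SDC extraction step itself is ignored. The function name and the core counts are illustrative; the extraction rates are the averages quoted above.

```python
import math


def relative_build_cost(n_learners, extraction_rate, n_cores=1):
    """Per-core cost of building the weak learners, relative to one linear SVR
    trained on the whole training set (cost 1.0).

    Assumption: training cost is linear in the number of samples, and the
    extraction step is free.
    """
    learners_per_core = math.ceil(n_learners / n_cores)
    return learners_per_core * extraction_rate


# Worst average extraction rate quoted for a 1 h horizon: 1.82 %.
single_core_1h = relative_build_cost(100, 0.0182, n_cores=1)
two_cores_1h = relative_build_cost(100, 0.0182, n_cores=2)

# Worst average extraction rate quoted for a 6 h horizon: 16.25 %.
single_core_6h = relative_build_cost(100, 0.1625, n_cores=1)
twenty_cores_6h = relative_build_cost(100, 0.1625, n_cores=20)
```

Under these assumptions, 100 weak learners at a 1.82 percent extraction rate cost about 1.82 times one full linear SVR on a single core and about 0.91 times on two cores, whereas the 6 h case needs considerably more cores before the per-core cost drops below that of one linear SVR.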

These results demonstrate that SW-SVR predicts complicated micrometeorological data with the best prediction performance and the lowest computational complexity compared with standard algorithms. In particular, we found that dynamic aggregation of models built from very little data extracted by D-SDC is effective for achieving both high prediction performance and low computational complexity. However, there are problems to be solved in SW-SVR. Firstly, the prediction performance of SW-SVR sometimes deteriorates despite an increase in training data. In particular, this problem occurred when the prediction horizon was 6 h, as shown in Fig. 3. This is because the data extracted by D-SDC includes training data that is unnecessary for highly accurate prediction. If D-SDC extracted the same data as it does when the training periods are shorter, the prediction performance of SW-SVR would never deteriorate due to an increase in training data. Therefore, we must review both the feature mapping and the algorithms of D-SDC so as to avoid extracting unnecessary training data. Secondly, SW-SVR is based on a combination of several algorithms: kernel approximation, PLS regression, k-means, D-SDC, and linear SVR. Moreover, each algorithm has several parameters. Therefore, SW-SVR has more parameters of various kinds, and it takes more time to tune them. In this experiment, we used a coarse grid search so as to decide the parameters within a limited time. However, there is still room for improvement in the prediction performance by using other approaches, such as a genetic algorithm, instead of a grid search (Huang & Wang, 2006).
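The tuning cost mentioned above can be sketched with an exhaustive grid search over three parameters. The candidate values are the ones that appear on the axes and legends of the building-time figures; treating them as a tuning grid, the function `grid_search`, and the stand-in objective `toy_score` are all assumptions for illustration and do not reproduce the grid used in the experiment.

```python
from itertools import product

# Candidate values as they appear in the building-time figures.
grid = {
    "cost": [1, 5, 10, 50, 100],
    "sigma": [0.1, 0.001, 0.00001],
    "n_weak_learners": [10, 50, 100, 500, 1000],
}


def grid_search(grid, score):
    """Evaluate every combination; return (best_params, best_score, n_evaluated)."""
    names = list(grid)
    best_params, best_score, n_evaluated = None, float("inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        value = score(params)
        n_evaluated += 1
        if value < best_score:
            best_params, best_score = params, value
    return best_params, best_score, n_evaluated


def toy_score(params):
    """Hypothetical stand-in for a validation MAPE; not a real model."""
    return (
        abs(params["cost"] - 10)
        + abs(params["sigma"] - 0.001)
        + abs(params["n_weak_learners"] - 100) / 100
    )


best_params, best_score, n_evaluated = grid_search(grid, toy_score)
```

Even this small grid requires 5 × 3 × 5 = 75 model builds, and every additional parameter from kernel approximation, PLS regression, or k-means multiplies that count, which is why a coarse grid or a different search strategy becomes attractive.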

5. Conclusion and future work

In this paper, we proposed a new methodology for predicting micrometeorological data, SW-SVR, that involves a novel combination of SVR and ensemble learning. To take advantage of both SVR and ensemble learning, SW-SVR builds several SVRs, each specialized for a representative data group in various natural environments, by using D-SDC, which extracts effective training data for specific data prediction. Moreover, to follow micrometeorological data whose characteristics always change with time, the prediction of SW-SVR is based on dynamically weighted ensemble learning that depends on the similarity between the test data and the data on which each weak learner is specialized. In evaluation experiments using large-scale micrometeorological data, the prediction performance of SW-SVR was greater than or equal to that of other general methods such as SVR, RF, and GB. Moreover, SW-SVR reduces the building time substantially compared with complicated models that have high prediction performance. We anticipate that dynamic aggregation of models built from various kinds of data extracted by D-SDC can contribute to more sophisticated studies of micrometeorological data prediction.

In future work, we should evaluate SW-SVR in more varied situations to show that SW-SVR works effectively. In particular, we will use more complicated data that consists of many features. Furthermore, when SW-SVR is applied to applications such as environmental control systems, the performance of the overall applications should be evaluated. Currently, we have developed an agricultural support system using SW-SVR, which controls environments in greenhouses depending on the activity of the plants. The evaluation of these applications will demonstrate the superiority of SW-SVR in practical use.

Acknowledgements

This study was partially supported by JST, PRESTO, and JSPS KAKENHI (26660198), Japan.

References

Antonanzas, J., Urraca, R., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Solar irradiation mapping with exogenous data from support vector regression machines estimations. Energy Conversion and Management, 100, 380–390.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32 http://doi.org/10.1023/A:1010933404324.

Cao, H., Naito, T., & Ninomiya, Y. (2008). Approximate RBF kernel SVM and its applications in pedestrian classification. The 1st International Workshop on Machine Learning for Vision-based Motion Analysis - MLVMA'08, 1–9 http://hal.archives-ouvertes.fr/inria-00325810/.

Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–39 http://doi.org/10.1145/1961189.1961199.

Chevalier, R. F., Hoogenboom, G., McClendon, R. W., & Paz, J. A. (2011). Support vector regression with reduced training sets for air temperature prediction: A comparison with artificial neural networks. Neural Computing & Applications, 20(1), 151–159 Retrieved from <Go to ISI>://WOS:000286674800015.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. The Journal of Machine Learning, 9(2008), 1871–1874 http://doi.org/10.1038/oby.2011.351.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory, 55(1), 119–139 http://doi.org/10.1006/jcss.1997.1504.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.

Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231–240 http://doi.org/10.1016/j.eswa.2005.09.024.

Japan Meteorological Agency. (n.d.). Japan Meteorological Agency http://www.jma.go.jp/jma/indexe.html.

Kisi, O., & Cimen, M. (2012). Precipitation forecasting by using wavelet-support vector machine conjunction model. Engineering Applications of Artificial Intelligence, 25(4), 783–792 http://doi.org/10.1016/j.engappai.2011.11.003.

Kolokotsa, D., Pouliezos, A., Stavrakakis, G., & Lazos, C. (2009). Predictive control techniques for energy and indoor environmental quality management in buildings. Building and Environment, 44(9), 1850–1863 http://doi.org/10.1016/j.buildenv.2008.12.007.

Loosli, G. (2007). Comments on the core vector machines: fast SVM training on very large data sets. The Journal of Machine Learning Research, 8, 291–301.

Macqueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability: 1 (pp. 281–297). http://doi.org/citeulike-article-id:6083430.

Maity, R., Bhagwat, P., & Bhatnagar, A. (2010). Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrological Processes, 24(7), 917–923.

Mohammadi, K., Shamshirband, S., Anisi, M. H., Alam, K. A., & Petković, D. (2015). Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91, 433–441.

Othman, M. F., & Shazali, K. (2012). Wireless sensor network applications: A study in environment monitoring system. In Procedia Engineering: 41 (pp. 1204–1210). http://doi.org/10.1016/j.proeng.2012.07.302.

Park, D. H., & Park, J. W. (2011). Wireless sensor network-based greenhouse environment monitoring and automatic control system for dew condensation prevention. Sensors, 11(4), 3640–3651 http://doi.org/10.3390/s110403640.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: machine learning in python. The Journal of Machine Learning Research, 12, 2825–2830. http://doi.org/10.1007/s13398-014-0173-7.2

Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods, 185–208 http://doi.org/10.1109/ISKE.2008.4731075.

Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20, 1177–1184 http://doi.org/10.1.1.145.8736.

Singh, K. P., Gupta, S., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437 http://doi.org/10.1016/j.atmosenv.2013.08.023.

Smith, B. A., Hoogenboom, G., & McClendon, R. W. (2009). Artificial neural networks for automated year-round temperature prediction. Computers and Electronics in Agriculture, 68(1), 52–61 http://doi.org/10.1016/j.compag.2009.04.003.

Suzuki, Y., Kaneda, Y., & Mineno, H. (2014). SW-SVR improved by short-distance data collection method (pp. 1–8). IPSJ SIG Technical Report, 2014-MBL-73(9).

Suzuki, Y., Kaneda, Y., & Mineno, H. (2015). Analysis of support vector regression model for micrometeorological data prediction. Computer Science and Information Technology, 3(2), 37–48 http://doi.org/10.13189/csit.2015.030202.

Tenenhaus, M., Vinzi, V. E., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics and Data Analysis, 48(1), 159–205 http://doi.org/10.1016/j.csda.2004.03.005.

Tsang, I. W., Kwok, J. T., & Cheung, P.-M. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6, 363–392 http://doi.org/10.1111/j.1442-9993.2007.01810.x.

Urraca, R., Antonanzas, J., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Estimation of solar global irradiation in remote areas. Journal of Renewable and Sustainable Energy, 7(2), 023136.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory: Vol. 8. Springer http://doi.org/10.1109/TNN.1997.641482.

Wang, B. X., & Japkowicz, N. (2009). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1–20 http://doi.org/10.1007/s10115-009-0198-y.

Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3 PART 1), 5445–5449 http://doi.org/10.1016/j.eswa.2008.06.121.
