Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa
Sliding window-based support vector regression for predicting micrometeorological data
Yukimasa Kaneda a,∗, Hiroshi Mineno b,c
a Graduate School of Integrated Science and Technology, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
b College of Informatics, Academic Institute, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, Shizuoka 432-8011, Japan
c JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan
Article info
Article history:
Received 4 February 2016
Revised 29 March 2016
Accepted 13 April 2016
Available online 23 April 2016
Keywords:
Predicting micrometeorological data
Data extraction
Dynamic aggregation
Support vector regression
Ensemble learning
Abstract
Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors, such as smartphones and sensor nodes, have been used extensively. Since these devices can easily accumulate various kinds of micrometeorological data, such as temperature, humidity, and wind speed, an enormous amount of micrometeorological data has been accumulated. In recent years, it has been expected that such an enormous amount of data, called big data, will produce novel knowledge and value. Accordingly, many current applications have used data mining technology or machine learning to exploit big data. However, micrometeorological data has a complicated correlation among different features, and its characteristics change variously with time. Therefore, it is difficult to predict micrometeorological data accurately with low computational complexity even if state-of-the-art machine learning algorithms are used. In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR), which involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for each representative data group in various natural environments, such as different seasons and climates, and changes the weights used to aggregate the SVRs dynamically depending on the characteristics of the test data. In our experiment, we predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data in Tokyo. As a result, regardless of testing periods, training periods, and prediction horizons, the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with that of complicated models that have high prediction performance.
© 2016 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Sensor network technology is becoming more widespread and sophisticated, and devices with many sensors have been used extensively. These devices can easily obtain various kinds of micrometeorological data such as temperature, humidity, and wind speed. Micrometeorological data is strongly affected by the surface of the earth and is related to our lives and industrial activity. Accordingly, the data has been used by many applications such as environmental control systems for greenhouses (Othman & Shazali, 2012; Park & Park, 2011). Moreover, more advanced applications exploit the data to a greater extent by using machine learning and data mining technology. Furthermore, an enormous amount of micrometeorological data has been accumulated by many devices, and it has been expected that analyzing such an enormous amount of data, called big data, will produce novel knowledge and value.
To predict micrometeorological data effectively, a number of researchers have studied machine learning (Smith, Hoogenboom, & McClendon, 2009). These researchers described prediction methods for micrometeorological data; in particular, prediction performance and computational complexity were often discussed. Meanwhile, micrometeorological data has a complex correlation among different features such as temperature and humidity. Moreover, its characteristics change variously with time. Therefore, even if big data is given as training data, it is not easy to predict micrometeorological data accurately. Furthermore, in many cases, models have to become complicated to achieve high prediction performance, and the computational complexity increases. Accordingly, some models probably cannot be built from big data in a practical amount of computing time. In other words, there is a trade-off between high prediction performance and low computational complexity. However, some practical uses require both. As the prediction performance in applications becomes higher, the quality provided by the applications becomes better. For example, in the case of environmental control systems based on prediction (Kolokotsa, Pouliezos, Stavrakakis, & Lazos, 2009), higher prediction performance enables the systems to provide precise control, precise management, and better environments. On the other hand, models that need a long time for training are of little value in practical use. In current situations where the amount of usable data has increased remarkably, this trade-off has become a more critical issue.
∗ Corresponding author.
E-mail address: [email protected] (Y. Kaneda).
http://dx.doi.org/10.1016/j.eswa.2016.04.012
Recently, one type of machine learning algorithm, support vector machines (SVMs), has been used successfully in various fields. The basic theory is an efficient learning method based on probably approximately correct (PAC) learning. Moreover, SVMs can separate non-linear data with low computational complexity. Since most data observed in the real world is likely to have non-linear relationships, SVMs have also been applied to micrometeorological data prediction (Antonanzas, Urraca, Martinez-de-Pison, & Antonanzas-Torres, 2015; Mohammadi, Shamshirband, Anisi, Alam, & Petković, 2015; Urraca, Antonanzas, Martinez-de-Pison, & Antonanzas-Torres, 2015). Moreover, SVMs led to better prediction performance than other algorithms such as artificial neural networks (ANNs) and the autoregressive integrated moving average (ARIMA) model (Chevalier, Hoogenboom, McClendon, & Paz, 2011; Maity, Bhagwat, & Bhatnagar, 2010). However, when SVMs learn big data, the computational complexity is still a matter of concern. An alternative learning method, ensemble learning, has also been used more widely for predicting micrometeorological data (Singh, Gupta, & Rai, 2013). The prediction performance of ensemble learning is greater than or equal to that of SVMs. The basic methodology is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability that a single model cannot represent. In particular, some researchers proposed improved methods that could be applied to micrometeorological data prediction (Wang & Japkowicz, 2009; Xie, Li, Ngai, & Ying, 2009). However, it is difficult to apply these methods to regression, and it is possible that the models will not be able to follow micrometeorological data whose characteristics always change with time.
In this paper, we propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression (SW-SVR). SW-SVR involves a novel combination of support vector regression (SVR) and ensemble learning. To represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for each representative data group in various natural environments, such as different seasons and climates. The specialized SVRs are built based on our previously proposed method, dynamic short-distance data collection (D-SDC), which extracts effective data for specific data prediction by taking account of movements: changes in data during prediction horizons. Each weak learner built from each extracted data specializes in specific data and accurately predicts data similar to the specialized data. Then, SW-SVR aggregates all the predicted values based on weights decided by the similarity between the test data and each data specialized by weak learners. This new ensemble learning methodology, which changes weights dynamically, makes it possible to follow micrometeorological data whose characteristics always change with time. Our results demonstrated that the prediction performance of SW-SVR was always greater than or equal to that of other general methods such as SVR, random forest, and gradient boosting. At the same time, SW-SVR reduced the building time remarkably compared with that of complicated models that have high prediction performance.
2. Related work
As mentioned in the introduction, to predict micrometeorological data effectively, SVMs and ensemble learning have generally been used. These algorithms have higher prediction performance for micrometeorological data than traditional methods because SVMs use not only a margin maximizing algorithm, whose great performance was proved by PAC learning, but also the kernel trick that enables non-linear separation. On the other hand, ensemble learning provides a higher generalizing capability that a single model cannot represent. In this section, a brief summary of these algorithms and some improved algorithms is given. Moreover, so that SW-SVR can draw advantages from both SVMs and ensemble learning, several problems of these algorithms for practical use are discussed.
2.1. Support vector regression
SVMs, introduced by Vapnik (1995), have been used successfully in various fields. In the simplest case, binary classification, SVMs obtain a separating hyperplane decided by maximizing the margin. The margin is the distance, measured by a norm, between the different classes. PAC learning proved that maximizing the margin produces high generalization ability. Moreover, the kernel trick enables SVMs to separate data non-linearly with low computational complexity. Various kinds of data observed in the real world are likely to have non-linear relationships. Accordingly, SVMs are used in many applications such as micrometeorological data prediction (Kisi & Cimen, 2012; Maity et al., 2010). Meanwhile, the regression variant of SVMs, support vector regression (SVR), uses the same methodology and thereby inherits this high generalization ability. In this section, a brief summary of SVR is given as follows.
First, the linear function for regression is given as follows:

\[ f(x) = w^{T} x + b. \]

Then, as with SVMs, SVR also minimizes the norm of the weight vector $w$; the L2 norm $\|w\|_2$ is often used, and minimizing $\|w\|_2$ corresponds to maximizing the margin. Meanwhile, SVR tolerates prediction errors up to $\varepsilon$. Therefore, the primal problem of SVR is shown as follows:

\[
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\|w\|_2^2 \\
\text{subject to} \quad & y_i - (w^{T} x_i + b) \le \varepsilon, \\
& (w^{T} x_i + b) - y_i \le \varepsilon.
\end{aligned}
\]

Moreover, to take some errors into account further, the same slack variables $\xi$ as in soft margin SVMs are introduced. The slack variables act as penalties and increase in proportion to the errors between the true values and the predicted values. The problem into which the slack variables are introduced is shown as follows:

\[
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\|w\|_2^2 + C \sum_i (\xi_i + \xi_i^{*}) \\
\text{subject to} \quad & y_i - (w^{T} x_i + b) \le \varepsilon + \xi_i, \\
& (w^{T} x_i + b) - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0,
\end{aligned}
\]

where the constant $C$ controls the balance between the effect of maximizing the margin and the penalties. To minimize the above formula, the slack variables in the formula must also be minimized. Accordingly, the slack variables depending on the errors are shown as follows:

\[
\xi_i =
\begin{cases}
0 & \text{if } y_i - (w^{T} x_i + b) \le \varepsilon, \\
y_i - (w^{T} x_i + b) - \varepsilon & \text{otherwise,}
\end{cases}
\qquad
\xi_i^{*} =
\begin{cases}
0 & \text{if } (w^{T} x_i + b) - y_i \le \varepsilon, \\
(w^{T} x_i + b) - y_i - \varepsilon & \text{otherwise.}
\end{cases}
\]

The above formulas mean that no penalty is given when the error is lower than $\varepsilon$, whereas an error higher than $\varepsilon$ cannot be tolerated and is penalized. In other words, SVR tolerates errors less than $\varepsilon$, and only the portion of an error that exceeds $\varepsilon$ is taken into account as a penalty. Finally, the dual problem is derived from the above primal problem by the method of Lagrange multipliers and corresponds to a quadratic programming problem, as with SVMs. As a result, since a unique global optimal solution is obtained, SVR is superior to traditional algorithms that might fall into a local optimal solution, such as ANNs. The dual problem derived by the method of Lagrange multipliers is shown as follows:

\[
\begin{aligned}
\text{maximize} \quad & -\tfrac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, x_i^{T} x_j
 - \varepsilon \sum_i (\alpha_i + \alpha_i^{*})
 + \sum_i y_i (\alpha_i - \alpha_i^{*}) \\
\text{subject to} \quad & \sum_i (\alpha_i - \alpha_i^{*}) = 0, \\
& \alpha_i,\ \alpha_i^{*} \in [0, C].
\end{aligned}
\]

Moreover, the above dual problem can easily involve a non-linear map $\phi$ to consider a higher dimension. To introduce the non-linear map $\phi$ in the above problem, the kernel function

\[ K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j) \]

is defined and used instead of $x_i^{T} x_j$. Then $\phi(x_i)^{T}\phi(x_j)$ is determined based on $K(x_i, x_j)$ without calculation in the mapped higher dimension; this method is called the kernel trick. SVR based on maximizing the margin and the kernel trick yields high prediction performance.

Meanwhile, conventional quadratic programming solvers, such as the steepest descent method, have very high computational complexity; the computational complexity is approximately $O(N^3)$, where $N$ is the number of training data. Accordingly, a quadratic programming solver for SVMs, sequential minimal optimization (SMO), has become the de facto standard (Platt, 1998). SMO, which is specialized for SVMs, reduces the computational complexity of SVMs to approximately $O(N^2)$. Nevertheless, when an enormous amount of data is input as training data, the computational complexity increases substantially. To solve the problem, the core vector machine (CVM), a theory that regards the quadratic programming problem as a computational geometry problem, was proposed (Tsang, Kwok, & Cheung, 2005). The prediction performance of CVM is comparable to that of SVMs, and the computational complexity decreases substantially. However, according to a paper (Loosli, 2007), the prediction performance and computational complexity of CVM strongly depend on the values of parameters. Therefore, when the parameter tuning that is essential for practical use is taken into account, CVM does not always satisfy both high prediction performance and low computational complexity.
SVR is one of the best algorithms in machine learning from the viewpoint of prediction performance. In particular, it has been expected that the kernel trick used in the dual problem is effective for predicting micrometeorological data that has a complex correlation among different features. However, the computation time required to solve the dual problem is often still too long for practical use. Thus, it is difficult to apply conventional SVR directly to micrometeorological data prediction.
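The ε-insensitive penalty at the heart of the SVR primal problem can be illustrated numerically. The following is a minimal sketch, assuming a linear model whose weight vector and bias are already given; the function names (`predict`, `slack_variables`, `primal_objective`) are hypothetical helpers introduced only for illustration and do not come from the paper or from any library.

```python
def predict(w, b, x):
    """Linear regression function f(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slack_variables(w, b, x, y, eps):
    """Return (xi, xi_star) for one sample.

    xi penalizes under-prediction beyond eps, xi_star penalizes
    over-prediction beyond eps; both are zero inside the eps-tube.
    """
    residual = y - predict(w, b, x)
    xi = max(0.0, residual - eps)
    xi_star = max(0.0, -residual - eps)
    return xi, xi_star


def primal_objective(w, b, X, Y, eps, C):
    """Value of (1/2)||w||^2 + C * sum(xi + xi_star) for fixed w and b."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = sum(sum(slack_variables(w, b, x, y, eps)) for x, y in zip(X, Y))
    return regularizer + C * penalty
```

With `w = [1.0]`, `b = 0.0`, and `eps = 0.1`, a sample whose target lies within 0.1 of the prediction contributes no penalty, while a sample that is off by 0.5 contributes only the excess of 0.4.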
2.2. Ensemble learning
Ensemble learning has been studied recently and used increasingly. The basic methodology of ensemble learning is a combination of weak learners built from different kinds of training data. The combination yields a higher generalizing capability that a single model cannot represent. As with SVMs, ensemble learning can represent non-linear relationships and has been used for predicting micrometeorological data. In particular, two kinds of approaches, bagging and boosting, have often been used in ensemble learning. The approaches differ greatly in how they build weak learners and aggregate them.

Algorithm 1 Bagging for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
For t = 1 to n do
  1. D_t ← generate sample from D with replacement
  2. H_t(X) ← build a weak learner from D_t
Output: H(X) = (1/n) Σ_{t=1}^{n} H_t(X)

Algorithm 2 Boosting for regression.
Input:
  Training data: D = {(x_1, y_1), ..., (x_N, y_N)} where x_i ∈ X, y_i ∈ Y
  Number of weak learners: n
  Weights: w_i = 1/N
For t = 1 to n do
  1. H_t(X) ← build a weak learner from D by using weights w_t
  2. ε_t ← compute error rate of H_t(X)
  3. α_t ← compute reliability of prediction result of H_t(X) based on ε_t
  4. w_{t+1} ← update weights w_t based on α_t
Output: H(X) = Σ_{t=1}^{n} (α_t H_t(X)) / Σ_{t=1}^{n} α_t
Bagging uses several training data sets generated by bootstrap sampling. The algorithm of basic bagging for regression is shown in Algorithm 1. In bagging, different kinds of training data are created by sampling the original input training data with replacement. Then, weak learners are built from each sampled training data. Finally, the predicted values are aggregated by majority vote or arithmetic average. In particular, random forest, introduced by Breiman (2001), which also applies randomness in feature selection, often demonstrates better prediction performance than conventional models such as SVMs. Random forest is used in various applications and has been extended to other improved versions. For example, to predict imbalanced data observed frequently in the real world more accurately, improved balanced random forest (IBRF) has been proposed (Xie et al., 2009). IBRF involves an efficient sampling method for imbalanced data and cost-sensitive learning that penalizes misclassification of the minority class more strongly. The authors showed that IBRF was more effective for predicting imbalanced data than class-weighted SVMs and conventional improved random forests for imbalanced data prediction.
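The bagging procedure for regression (bootstrap sampling, one weak learner per sample, arithmetic average) can be sketched as follows. This is an illustrative sketch only: the one-dimensional nearest-neighbour weak learner and all function names are assumptions chosen to keep the example self-contained, not the learners evaluated in the paper.

```python
import random


def fit_nearest_neighbour(sample):
    """Hypothetical weak learner: returns the target of the closest sample point."""
    def hypothesis(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return hypothesis


def bagging_fit(data, n_learners, fit, seed=0):
    """Build n_learners weak learners, each from a bootstrap sample of data."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        # Sample N points from the original data with replacement.
        sample = [rng.choice(data) for _ in data]
        learners.append(fit(sample))
    return learners


def bagging_predict(learners, x):
    """Aggregate the weak learners by arithmetic average."""
    return sum(h(x) for h in learners) / len(learners)
```

Because every weak learner returns one of the observed targets, the averaged prediction always stays within the range of the training targets.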
Boosting repeatedly builds weak learners by using weights based on the error rate. The algorithm of basic boosting for regression, such as Adaboost (Freund & Schapire, 1997), is shown in Algorithm 2. Unlike bagging, almost all boosting algorithms use the same training data, but the training data is reweighted repeatedly. Boosting alternates between building weak learners by using weights and updating the weights. Finally, each predicted value is aggregated by weighted average. Various kinds of boosting algorithms have been studied and proposed; gradient boosting (Friedman, 2001) in particular has shown the best prediction performance in many competitions. Meanwhile, as with IBRF, a boosting algorithm for imbalanced data, boosting-SVM, has also been proposed (Wang & Japkowicz, 2009). The main characteristic of boosting-SVM is the use of an asymmetric misclassification cost. The authors demonstrated that boosting-SVM enabled more accurate prediction of imbalanced data.

Fig. 1. Processing overview of SW-SVR. (a) Extraction of training data by D-SDC. (b) Weighted ensemble learning in SW-SVR. Legend: training data, extracted data, center of cluster, test data, threshold of extraction, specialized object, movement of data, training data at end of prediction horizon; number of weak learners: 3.

When micrometeorological data including many unusual natural environments is regarded as imbalanced data, the above methods are likely to classify micrometeorological data more accurately. However, these approaches cannot be applied to regression. Moreover, according to our previous research (Suzuki, Kaneda, & Mineno, 2015), the proper training data depends on the test data. In other words, the weights used to aggregate weak learners built from different kinds of training data should depend on the test data.
3. SW-SVR: Sliding window-based support vector regression
We propose a new methodology for predicting micrometeorological data, sliding window-based support vector regression, combining the methodologies of SVR and ensemble learning. The basic theories are based on D-SDC, our previously proposed method to extract effective data for specific data prediction, and novel weighted ensemble learning, as shown in Fig. 1. First, to represent complicated micrometeorological data easily, SW-SVR builds several SVRs specialized for each representative data group in various natural environments, such as different seasons and climates. The specialized SVRs are built based on D-SDC, which extracts effective data for specific data prediction by taking account of movements: changes of data during prediction horizons (Fig. 1(a)). Each weak learner built from each extracted data specializes in specific data and accurately predicts data similar to the specialized data. Afterward, the weak learners are aggregated with weights determined dynamically at the time of prediction so as to maintain the prediction performance for micrometeorological data whose characteristics always change with time (Fig. 1(b)). The weights are decided by the similarity between the test data and each data specialized by weak learners. Even if the characteristics of micrometeorological data always change with time, SW-SVR always gives priority to weak learners that are more suitable for predicting the test data. The details of the SW-SVR algorithm are shown in Algorithm 3. The procedure for training consists of two kinds of preprocessing, iterated learning, and dynamic aggregation. The procedures of each part are described as follows.
The algorithms in SW-SVR described below use the L2 norm, that is, the Euclidean distance, so the performance depends on the feature space. For example, if the feature space includes noisy features or non-linear relationships between features, the performance will probably be reduced substantially. In particular, micrometeorological data has a complex correlation among different features such as temperature and humidity. Accordingly, the feature space must be mapped into another feature space that takes into account the presence of noise and non-linear relationships. In our approach, we use kernel approximation (Rahimi & Recht, 2007) and partial least squares (PLS) regression (Tenenhaus, Vinzi, Chatelin, & Lauro, 2005) to map into a new feature space. Kernel approximation generates a new, higher-dimensional feature space that represents non-linear data as linear data with very low computational complexity. In fact, a combination of kernel approximation and linear SVMs ran faster while achieving prediction performance comparable to that of an exact SVM (Cao, Naito, & Ninomiya, 2008). On the other hand, PLS regression is a supervised dimension reduction methodology. This method can reduce dimensions by extracting latent variables that have a strong relationship with a dependent variable. If the feature space includes noisy features, their effect is reduced by PLS regression. The combination of kernel approximation and PLS regression enables SW-SVR to use an effective feature space for calculation of the L2 norm in micrometeorological data.

Algorithm 3 Sliding window-based support vector regression.
Input:
  Training data set: S = {(x_1, y_1, x'_1), ..., (x_N, y_N, x'_N)} where x_i ∈ X, y_i ∈ Y, x'_i ∈ X'
  Test data: P
  Number of weak learners: n
  Weight parameters: p, q
Preprocessing:
  1. apply normalization to X and X'
  2. fit kernel approximation and PLS regression to X and X'
  3. M_i = ||x_i − x'_i||, i = 1 ... N
  4. G_t ← each center of kmeans(X), t = 1 ... n
For t = 1 to n do
  1. D_ti = ||G_t − x_i||, i = 1 ... N
  2. r_t = Σ_{i=1}^{N} (w_i M_i) / Σ_{i=1}^{N} (w_i) where w_i = 1 / D_ti^p
  3. S_t = {(x_i, y_i) | D_ti < r_t}, i = 1 ... N
  4. H_t(X) ← train LinearSVR(S_t)
Output: H(P) = Σ_{t=1}^{n} (w_t H_t(P)) / Σ_{t=1}^{n} (w_t) where w_t = 1 / ||G_t − P||^q
According to our previous research, to accurately predict particular specific data in micrometeorological data, it is necessary to extract effective training data for the specific data prediction (Suzuki et al., 2015). In SW-SVR, several such specific data are selected in advance, and weak learners are built from the effective training data extracted for predicting each specific data. Meanwhile, micrometeorological data involves various natural environments such as different seasons and climates. Therefore, so that several models can represent micrometeorological data, the selected specific data must represent the varied natural environments that will probably appear. In SW-SVR, each specific data is selected by a clustering algorithm, k-means (Macqueen, 1967). The k-means algorithm is one of the most famous non-hierarchical clustering algorithms and classifies data into several clusters faster than other clustering algorithms. In SW-SVR, k-means classifies all training data into the same number of clusters as the number of weak learners given by users. Then, each cluster center is used as specific data that represents various natural environments.
After selecting several specific data, SW-SVR iterates data extraction and model building. First, SW-SVR extracts effective training data for predicting each specific data by D-SDC (Suzuki, Kaneda, & Mineno, 2014). The theory of D-SDC is similar to that of the k-nearest neighbor (k-NN) algorithm, and D-SDC also extracts training data similar to a specialized object. However, in D-SDC, the amount of extracted data depends on the movement of a specialized object with time. The movement $r$ means the change of a specialized object during the prediction horizon, as shown in the following equation:

\[ r_t = \| G_t - G'_t \| \]

where $G$ is a specialized object, and $G'$ is the specialized object after the prediction horizon. D-SDC extracts training data whose norm from a specialized object is shorter than the movement $r$. Accordingly, the training data $S$ extracted by D-SDC is given as follows:

\[ S_t = \{ (x_i, y_i) \mid \| G_t - x_i \| < \| G_t - G'_t \| \} \]

where $x$ is the feature of training data and $y$ is the dependent variable of training data. D-SDC is based on the movement $r$ because the movement $r$ is strongly related to the autocorrelation of data surrounding a specialized object. In micrometeorological data, movements in specific natural environments are mutually similar, and the autocorrelation becomes lower when these movements are bigger. For example, in Japan, the change of weather is drastic every spring, and the natural environments change into various other natural environments with time. Meanwhile, when we predict time series data such as micrometeorological data, autocorrelation means correlation between features and a dependent variable, and more training data is required for highly accurate prediction when autocorrelation is lower. Since D-SDC extracts an amount of data surrounding a specialized object in proportion to the movement $r$, extraction that considers the autocorrelation of data surrounding a specialized object is achieved. However, the movement $r$ is unknown because $G'$ is not observed. Meanwhile, as mentioned above, movements of data surrounding a specialized object are mutually similar. Therefore, D-SDC estimates the movement $r$ based on the movements of training data similar to a specialized object by weighted average, where the weights are reciprocals of the norms between a specialized object and each training data. Movements of training data can be calculated by referring to the time when each training data is observed. The estimated movement $r$ is given as follows:

\[ r_t = \| G_t - G'_t \| \approx \frac{\sum_{i=1}^{N} w_i \, \| x_i - x'_i \|}{\sum_{i=1}^{N} w_i}, \qquad w_i = \frac{1}{\| G_t - x_i \|^{p}} \]

where $x'_i$ is the training data $x_i$ after the prediction horizon, $N$ is the number of training data, and $p$ is a weight parameter.

Afterward, SW-SVR builds several linear SVRs as weak learners based on the extracted data. As described above, a combination of linear SVR and kernel approximation is comparable to SVR using a kernel method. Moreover, linear SVR can be built much faster by using liblinear (Fan, Chang, Hsieh, Wang, & Lin, 2008), an optimized implementation for linear SVMs, instead of other general implementations of SVMs such as libSVM (Chang & Lin, 2011). Although the usable kernel in liblinear is restricted to the linear kernel, liblinear can build the model much faster by solving the primal problem instead of the dual problem. Furthermore, since all training data is divided into smaller amounts of extracted data, each model can be built faster, and it is easier to learn each extracted data by parallel processing.
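The D-SDC extraction step described above (estimate the movement by a distance-weighted average, then keep the training data closer to the specialized object than that movement) can be sketched in a few lines. The function names and the `tiny` guard against division by zero are assumptions added for this illustration; the paper does not specify how a zero distance is handled.

```python
import math


def norm(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def estimate_movement(center, X, X_after, p=2.0, tiny=1e-12):
    """Weighted average of the movements ||x_i - x'_i||.

    Weights are 1 / ||center - x_i||^p, so training data close to the
    specialized object dominate the estimate. `tiny` is an assumed guard
    for a training point that coincides with the center.
    """
    weights = [1.0 / max(norm(center, x), tiny) ** p for x in X]
    moves = [norm(x, xa) for x, xa in zip(X, X_after)]
    return sum(w * m for w, m in zip(weights, moves)) / sum(weights)


def d_sdc_extract(center, X, Y, X_after, p=2.0):
    """Return (r, S): estimated movement and the extracted training pairs."""
    r = estimate_movement(center, X, X_after, p)
    subset = [(x, y) for x, y in zip(X, Y) if norm(center, x) < r]
    return r, subset
```

In the full method, `center` would be one k-means cluster center and `X_after` the features observed one prediction horizon after each training sample; a linear SVR would then be trained on the returned subset.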
The predicted values of SW-SVR take into account the change of natural environments with time. In general ensemble learning, prediction for regression depends on the weighted average, and the weights are determined at the time of training. However, SW-SVR determines weights dynamically at the time of prediction. The weights are determined by the norm between the test data and each data specialized by weak learners. The final hypothesis of SW-SVR is shown as follows:

\[ H(P) = \frac{\sum_{t=1}^{n} w_t H_t(P)}{\sum_{t=1}^{n} w_t}, \qquad w_t = \frac{1}{\| G_t - P \|^{q}} \]

where $P$ is the test data, $n$ is the number of weak learners, $H_t(X)$ is the hypothesis of weak learner $t$, and $q$ is a weight parameter. In our approach, since the weights of ensemble learning are determined dynamically for every prediction, SW-SVR can follow micrometeorological data whose characteristics always change with time.
Finally, we describe the computational complexity of SW-SVR. To represent complicated micrometeorological data easily, SW-SVR uses various conventional methods besides the D-SDC that we proposed: kernel approximation, PLS regression, k-means, and linear SVR. The computational complexity of these methods in general increases linearly; in other words, the computational complexity is approximately equal to O(N) when the number of training data N is much larger than the number of dimensions and each parameter of these methods. Moreover, the computational complexity of D-SDC corresponds to O(nN) because D-SDC simply repeats N distance calculations n + 1 times, where n is the number of weak learners in SW-SVR. Therefore, if N is much larger than n, the computational complexity of D-SDC also increases linearly. The total computational complexity of SW-SVR is approximately equal to O(N), which is even less than that of SVR.
4. Evaluation
4.1. Experiment
We compared the performance of SW-SVR with other standard methods for regression: k-NN, decision tree (DT), Adaboost, bagging, random forest (RF), gradient boosting (GB), linear SVR, and SVR using a radial basis function (RBF) kernel, which shows high performance in various fields (RBF-SVR). Note that the kernel of kernel approximation in SW-SVR is also the RBF kernel, and the base learner in Adaboost and bagging is the decision tree, which has been generally used. Moreover, to evaluate SW-SVR in more detail, we evaluated the performance of linear SVR with mapping: standard linear SVR to which the same mapping as SW-SVR is applied (“mapped SVR”). Mapped SVR separates the respective contributions of mapping feature space and ensemble learning based on D-SDC. All parameters of the models used were adjusted by the grid search method. The baseline for this evaluation was the performance of the naive persistent model shown in the following formula:

\[ \hat{y}_{i+\Delta t} = y_i \]

where $\hat{y}$ is the predicted value, $y$ is the true value, and $\Delta t$ is the prediction horizon.
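The persistent baseline simply reuses the current observation as the forecast for one horizon ahead. A minimal sketch, assuming the series is a list of equally spaced observations and the horizon is expressed in number of samples (the function name is hypothetical):

```python
def persistence_forecast(series, horizon):
    """Pair each forecast y_hat[i + horizon] = y[i] with the true value.

    Returns a list of (predicted, true) tuples for every index where the
    true value one horizon ahead is available.
    """
    return [
        (series[i], series[i + horizon])
        for i in range(len(series) - horizon)
    ]


# With 10-minute sampling, a 1 h horizon corresponds to 6 samples.
temperatures = [10.0, 11.0, 12.0, 13.0]
pairs = persistence_forecast(temperatures, horizon=2)
```

Here `pairs` holds the forecasts 10.0 and 11.0 alongside the values actually observed two steps later, 12.0 and 13.0.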
Fig. 2. MAPE [%] for prediction after 1 h for each algorithm (Persistent, k-NN, DT, Adaboost, Bagging, RF, GB, Linear SVR, mapped SVR, RBF-SVR, SW-SVR) against training periods of 3, 6, 12, 24, 36, and 60 months. (a) Testing periods: 1 month. (b) Testing periods: 6 months. (c) Testing periods: 12 months. Note that (b) and (c) are shown with log scale.

We evaluated the performance in two ways: hold-out validation and 10-fold cross-validation. We predicted the temperature after 1 h and 6 h by using large-scale micrometeorological data in Tokyo (Japan Meteorological Agency, n.d.). The data consists of atmospheric pressure, temperature, relative humidity, wind speed, and irradiance. In hold-out validation, training periods are limited to periods earlier than the testing periods so as to reflect practical use; test data is always predicted based on past training data in practical use. The training periods were from 3 months to 5 years before September 1, 2014, and the testing periods were from 1 month to 1 year after the same day. By varying the training periods and the testing periods, the performance under various usage scenarios is evaluated. On the other hand, the period for 10-fold cross-validation was the 6 years from September 1, 2009 to September 1, 2015. Note that the amount of data per month was approximately 4000 because the data was accumulated every 10 minutes. In this evaluation, we used the mean absolute percentage error (MAPE) as the index of prediction error and the building time calculated based on the CPU clock time as the index of computational complexity. MAPE is shown as follows:

\[ \mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \]

where $N$ is the number of test data, $y$ is the true value, and $\hat{y}$ is the predicted value. Moreover, we evaluated the average of each extraction rate by D-SDC in each experimental condition so as to analyze the performance of SW-SVR and D-SDC further. All implementations for this evaluation are in Python, and the implementations in scikit-learn (Pedregosa et al., 2012) were used for all methods except SW-SVR. This evaluation was performed on a single core of a machine with an Intel Core i5-2500K processor and 12 GB of RAM; even though several methods, such as random forest and SW-SVR, can be run in parallel, the methods were run on a single core so as to evaluate the building time of all methods fairly.
4.2. Results and discussion
Figs. 2 and 3 show the prediction error in the prediction horizons of 1 h and 6 h, respectively. Note that a log scale is used in Figs. 2(b), 2(c), 3(b), and 3(c). The results indicate that SW-SVR produced the best average performance among all models across all testing periods, training periods, and prediction horizons. The effect is particularly noticeable when the testing periods are longer than the training periods. In this situation, on the other hand, almost all methods except SW-SVR often perform worse than the naive persistent model used as the baseline. The results demonstrate that conventional high-performing methods do not always perform well for micrometeorological data prediction; their performance depends on the difficulty of the prediction, which is determined by the training periods, testing periods, and prediction horizons. Moreover, among the algorithms based on SVR, the prediction performance of SW-SVR is almost always the best, followed in order by those of RBF-SVR, mapped SVR, and linear SVR. The difference between mapped SVR and linear SVR is due to the effect of mapping the feature space. On the other hand, the difference between SW-SVR and mapped SVR is due to the effect of ensemble learning based on D-SDC. These comparisons demonstrate that both mapping the feature space and ensemble learning based on D-SDC are effective for improving prediction performance. Meanwhile, mapped SVR also tended to have lower prediction performance than SW-SVR when the testing periods are longer than the training periods. Accordingly, under this condition, ensemble learning based on D-SDC is particularly effective. When the testing periods are longer than the training periods, the amount of training data that is effective for predicting the test data is reduced. We consider that the small amount of training data that D-SDC extracted for building models corresponded to the training data that is effective for predicting the test data. Indeed, Fig. 4 shows the average of each extraction rate by D-SDC and demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. SW-SVR, which predicts micrometeorological data accurately regardless of the amount of training data, is therefore very practical and useful.
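For reference, the persistent baseline mentioned above can be sketched in a few lines of Python: the forecast for time t + h is simply the value observed at time t. The function name `persistent_forecast` and the toy series are hypothetical; the only detail taken from the evaluation is the 10-minute sampling interval, which gives six records per hour.

```python
def persistent_forecast(series, horizon_steps):
    """Pair each observation with the value `horizon_steps` later.

    Returns (predictions, targets): the prediction for time t + h is
    simply the value observed at time t.
    """
    if horizon_steps <= 0 or horizon_steps >= len(series):
        raise ValueError("horizon_steps must be in [1, len(series) - 1]")
    predictions = series[:-horizon_steps]
    targets = series[horizon_steps:]
    return predictions, targets


SAMPLES_PER_HOUR = 6  # one record every 10 minutes

# Toy temperature series (hypothetical values), one value per 10 minutes.
temps = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5]
preds, targets = persistent_forecast(temps, 1 * SAMPLES_PER_HOUR)
```

A learned model is only useful for a given horizon if its error on the `targets` is lower than that of `preds`, which is the comparison the baseline provides.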
Table 1 shows the results of 10-fold cross-validation in the prediction horizons of 1 h and 6 h. In hold-out validation, SW-SVR was often superior to all other methods, including RBF-SVR. In 10-fold cross-validation, however, although SW-SVR had higher prediction performance than all methods except RBF-SVR, RBF-SVR was slightly superior to SW-SVR. The results demonstrate that the prediction performance of SW-SVR is affected by the temporal order between training data and test data, and that SW-SVR is particularly suited to practical use, in which test data is always predicted based on past training data. Meanwhile, even in 10-fold cross-validation, the ordering of the prediction errors of mapped SVR, linear SVR, and SW-SVR was the same as in hold-out validation. Therefore, both mapping the feature space and ensemble learning based on D-SDC are effective for improving prediction performance in cross-validation as well.
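The role of temporal order can be illustrated with a small Python sketch that compares a chronological hold-out split with k-fold splits over time-ordered samples. This is an illustration under stated assumptions, not the evaluation code: the function names are hypothetical, and the folds are taken as contiguous blocks without shuffling, which may differ from the cross-validation procedure actually used.

```python
def holdout_split(n_samples, n_test):
    """Chronological hold-out: the last n_test samples form the test set."""
    split = n_samples - n_test
    return list(range(split)), list(range(split, n_samples))


def kfold_splits(n_samples, n_folds):
    """Contiguous k-fold splits over time-ordered samples (no shuffling)."""
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = n_samples if k == n_folds - 1 else start + fold_size
        test = list(range(start, stop))
        train = list(range(start)) + list(range(stop, n_samples))
        yield train, test


def uses_future_data(train, test):
    """True if any training sample is newer than the oldest test sample."""
    return max(train) > min(test)


train, test = holdout_split(n_samples=100, n_test=20)
holdout_leak = uses_future_data(train, test)
fold_leaks = [uses_future_data(tr, te) for tr, te in kfold_splits(100, 10)]
```

In this sketch the hold-out split never trains on samples newer than the test period, whereas every fold except the last one does, which is one way the two validation schemes can rank methods differently.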
Figs. 5 and 6 show the building time in the prediction horizons of 1 h and 6 h, respectively. Figs. 5(a) and 6(a) show the building time of the models that have high prediction performance as shown in Figs. 2 and 3 (RF, GB, RBF-SVR, and SW-SVR) when the training periods were varied. Note that the number of weak learners was 1000 in the ensemble learning series, the cost parameter was 1 in the SVR series, and σ of SW-SVR was 0.00001; σ of SW-SVR is a parameter of the RBF kernel in kernel approximation. These results demonstrate that the building time of ensemble learning methods, such as SW-SVR, increases more gently than that of SVR. In particular, the building time of SW-SVR is the shortest when the training periods become longer. In other words, the rate of increase in the building time of SW-SVR is the gentlest among all the methods as the training data increases. These results indicate that, as mentioned above, the computational complexity of SW-SVR is less than that of conventional methods, including random forest and gradient boosting. SW-SVR is effective for training on an enormous amount of data in terms of building time.

Fig. 3. MAPE for prediction after 6 h for each algorithm. Note that (b) and (c) are shown with log scale. (a) Testing periods: 1 month. (b) Testing periods: 6 months. (c) Testing periods: 12 months. [Axes: training periods (3, 6, 12, 24, 36, 60 months) versus MAPE [%]. Series: Persistent, k-NN, DT, Adaboost, Bagging, RF, GB, Linear SVR, mapped SVR, RBF-SVR, SW-SVR.]

Fig. 4. Average of each extraction rate by D-SDC in SW-SVR. (a) Prediction horizons: 1 hour. (b) Prediction horizons: 6 hours. [Axes: training periods (3, 6, 12, 24, 36, 60 months) versus extraction rate [%]. Series: number of weak learners 10, 100, and 1000.]

Table 1
MAPE of 10-fold cross-validation for each algorithm.

Method        MAPE [%], horizon 1 h   MAPE [%], horizon 6 h
SW-SVR         5.18608                23.49826
k-NN           8.59929                26.52433
DT             5.81042                25.99290
Adaboost      11.10375                29.93160
Bagging       10.24014                29.58125
RF             5.57213                25.55044
GB             5.27190                24.14987
Linear SVR     5.43892                24.68383
mapped SVR     5.25274                24.26108
RBF-SVR        5.16985                20.94132
Persistent     5.96816                24.86800

[Fig. 5: building time for prediction after 1 h; panels, axes, and series as in Fig. 6.]

Fig. 6. Building time for prediction after 6 h for each model. Note that all figures are shown with log scale. (a) Different training periods. (b) Ensemble learning series. (c) SVR series. [Vertical axis: time [sec]. Horizontal axes: (a) training periods (3, 6, 12, 24, 36, 60 months); (b) number of weak learners (10, 50, 100, 500, 1000); (c) cost (1, 5, 10, 50, 100). Series: (a) RF and GB with depth 5 and 10, RBF-SVR with σ = 0.1 and 0.00001, SW-SVR; (b) RF and GB with depth 5 and 10, SW-SVR; (c) Linear SVR, RBF-SVR and SW-SVR with σ = 0.1, 0.001, and 0.00001.]
Next, Figs. 5(b) and 6(b) show the building time of the models with better performance in ensemble learning (RF, GB, and SW-SVR) when the number of weak learners was varied. Note that the cost parameter of SW-SVR was 1, σ of SW-SVR was 0.00001, and the training periods were 12 months. SW-SVR needs a longer building time than RF and GB using shallow DTs when the number of weak learners is small. However, when the DTs become deeper or the number of weak learners becomes larger, SW-SVR can build the model faster than or at the same speed as RF and GB. Moreover, SW-SVR, as with RF, can be run easily in parallel environments, so the building time of SW-SVR is expected to become even shorter.
Finally, Figs. 5(c) and 6(c) show the building time of the models based on SVR when the parameters of SVR were varied. Note that the number of weak learners was 100, and the training periods were 12 months. These results indicate that the building time of SW-SVR is significantly shorter than that of RBF-SVR but longer than that of linear SVR. Meanwhile, Fig. 4 demonstrates that the weak learners of SW-SVR are always built from a very small proportion of the whole training data. In particular, when the prediction horizon was 1 h, the average of each extraction rate was 0.47 percent at best and 1.82 percent at worst. On the other hand, when the prediction horizon was 6 h, the average of each extraction rate was 7.57 percent at best and 16.25 percent at worst. Nevertheless, the computational complexity of SW-SVR is larger than that of linear SVR because the increase in computational complexity due to building several models outweighs this reduction. However, since the amount of training data of each weak learner is reduced substantially, the computational complexity of building one model in SW-SVR is also reduced. Accordingly, when the number of models that one CPU builds is reduced by using parallel processing, the computational complexity of the overall SW-SVR is lower than or equal to that of linear SVR. Meanwhile, as with linear SVR, the building time of SW-SVR never depends on changes in the parameters related to SVR and is always constant. As mentioned in the above discussion, the building time of SW-SVR depends solely on the number of weak learners and the training periods. Therefore, SW-SVR can avoid an unexpectedly long building time during parameter tuning, in which each parameter is varied widely.
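To give a rough sense of scale for the extraction rates and the parallel-processing argument above, the following Python arithmetic applies the reported best and worst 1 h extraction rates to a hypothetical training set. The 12-month, gap-free training set of 365 days and the four workers are assumptions made purely for illustration, and the reported rates are averages that were not necessarily measured at that training period.

```python
import math

SAMPLES_PER_DAY = 6 * 24  # one record every 10 minutes


def samples_per_learner(n_train, extraction_rate_percent):
    """Approximate training-set size of one weak learner."""
    return round(n_train * extraction_rate_percent / 100.0)


def models_per_worker(n_learners, n_workers):
    """Largest number of weak learners any single worker has to build."""
    return math.ceil(n_learners / n_workers)


n_train = SAMPLES_PER_DAY * 365  # hypothetical gap-free 12-month training set
best_1h = samples_per_learner(n_train, 0.47)   # best reported 1 h rate
worst_1h = samples_per_learner(n_train, 1.82)  # worst reported 1 h rate
per_worker = models_per_worker(n_learners=100, n_workers=4)
```

Under these assumptions each weak learner would be trained on a few hundred records out of more than fifty thousand, and each of four workers would build 25 of the 100 weak learners.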
These results demonstrate that SW-SVR predicts complicated micrometeorological data with the best prediction performance and the lowest computational complexity compared with standard algorithms. In particular, we found that dynamic aggregation of models built from very little data extracted by D-SDC is effective for achieving both high prediction performance and low computational complexity. However, there are problems to be solved in SW-SVR. Firstly, the prediction performance of SW-SVR sometimes deteriorates despite an increase in training data. In particular, this problem occurred when the prediction horizon was 6 h, as shown in Fig. 3. This is because the data extracted by D-SDC includes training data that is unnecessary for highly accurate prediction. If D-SDC extracted the same data as it does when the training periods are shorter, the prediction performance of SW-SVR would never deteriorate due to an increase in training data. Therefore, we must review both the feature mapping and the algorithms of D-SDC so as to avoid extracting unnecessary training data. Secondly, SW-SVR is based on a combination of several algorithms: kernel approximation, PLS regression, k-means, D-SDC, and linear SVR. Moreover, each algorithm has several parameters. Therefore, SW-SVR has a wider variety of parameters, and tuning them takes more time. In this experiment, we used a coarse grid search so as to decide the parameters within a limited time. However, there is still room for improvement in the prediction performance by using other approaches, such as a genetic algorithm, instead of a grid search (Huang & Wang, 2006).
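A coarse grid search of the kind mentioned above can be sketched as follows. This is a generic illustration, not the tuning code used in the experiments: the parameter names, the grid values (chosen to mirror those varied in the building-time experiments), and the toy scoring function are all assumptions, and in practice the scoring function would train the model and return a validation error such as MAPE.

```python
from itertools import product


def coarse_grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score).

    score_fn maps a parameter dict to a validation error (lower is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:  # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score


# Coarse, hypothetical grid over three of the parameters discussed above.
grid = {
    "n_weak_learners": [10, 100, 1000],
    "cost": [1, 10, 100],
    "sigma": [0.1, 0.001, 0.00001],
}


def toy_validation_error(params):
    """Stand-in for 'train the model and return validation MAPE' (hypothetical)."""
    return abs(params["cost"] - 10) + abs(params["n_weak_learners"] - 100) / 100


best_params, best_score = coarse_grid_search(grid, toy_validation_error)
```

The number of evaluations is the product of the grid sizes (27 here), which is why a grid over many parameters has to stay coarse to finish within a fixed time budget.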
5. Conclusion and future work
In this paper, we proposed a new methodology for predicting micrometeorological data, SW-SVR, which involves a novel combination of SVR and ensemble learning. To take advantage of both SVR and ensemble learning, SW-SVR builds several SVRs specialized for each representative data group in various natural environments by using D-SDC, which extracts effective training data for specific data prediction. Moreover, to follow micrometeorological data whose characteristics always change with time, the prediction of SW-SVR is based on dynamically weighted ensemble learning that depends on the similarity between the test data and the data in which each weak learner specializes. In evaluation experiments using large-scale micrometeorological data, the prediction performance of SW-SVR was greater than or equal to that of other general methods such as SVR, RF, and GB. Moreover, SW-SVR reduces the building time substantially compared with complicated models that have high prediction performance. We anticipate that dynamic aggregation of models built from various kinds of data extracted by D-SDC can contribute to more sophisticated studies of micrometeorological data prediction.

In future work, we should evaluate SW-SVR in more varied situations to show that SW-SVR works effectively. In particular, we will use more complicated data that consists of many features. Furthermore, when SW-SVR is applied to applications such as environmental control systems, the performance of the overall applications should be evaluated. Currently, we have developed an agricultural support system using SW-SVR, which controls environments in greenhouses depending on the activity of the plants. The evaluation of these applications will demonstrate the superiority of SW-SVR in practical use.
Acknowledgements
This study was partially supported by JST, PRESTO, and JSPS KAKENHI (26660198), Japan.
References
Antonanzas, J., Urraca, R., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Solar irradiation mapping with exogenous data from support vector regression machines estimations. Energy Conversion and Management, 100, 380–390.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. http://doi.org/10.1023/A:1010933404324.
Cao, H., Naito, T., & Ninomiya, Y. (2008). Approximate RBF kernel SVM and its applications in pedestrian classification. The 1st International Workshop on Machine Learning for Vision-based Motion Analysis - MLVMA'08, 1–9. http://hal.archives-ouvertes.fr/inria-00325810/.
Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 1–39. http://doi.org/10.1145/1961189.1961199.
Chevalier, R. F., Hoogenboom, G., McClendon, R. W., & Paz, J. A. (2011). Support vector regression with reduced training sets for air temperature prediction: A comparison with artificial neural networks. Neural Computing & Applications, 20(1), 151–159.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. The Journal of Machine Learning Research, 9, 1871–1874. http://doi.org/10.1038/oby.2011.351.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory, 55(1), 119–139. http://doi.org/10.1006/jcss.1997.1504.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231–240. http://doi.org/10.1016/j.eswa.2005.09.024.
Japan Meteorological Agency. (n.d.). Japan meteorological agency. http://www.jma.go.jp/jma/indexe.html.
Kisi, O., & Cimen, M. (2012). Precipitation forecasting by using wavelet-support vector machine conjunction model. Engineering Applications of Artificial Intelligence, 25(4), 783–792. http://doi.org/10.1016/j.engappai.2011.11.003.
Kolokotsa, D., Pouliezos, A., Stavrakakis, G., & Lazos, C. (2009). Predictive control techniques for energy and indoor environmental quality management in buildings. Building and Environment, 44(9), 1850–1863. http://doi.org/10.1016/j.buildenv.2008.12.007.
Loosli, G. (2007). Comments on the core vector machines: Fast SVM training on very large data sets. The Journal of Machine Learning Research, 8, 291–301.
Macqueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability: 1 (pp. 281–297). http://doi.org/citeulike-article-id:6083430.
Maity, R., Bhagwat, P., & Bhatnagar, A. (2010). Potential of support vector regression for prediction of monthly streamflow using endogenous property. Hydrological Processes, 24(7), 917–923.
Mohammadi, K., Shamshirband, S., Anisi, M. H., Alam, K. A., & Petković, D. (2015). Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91, 433–441.
Othman, M. F., & Shazali, K. (2012). Wireless sensor network applications: A study in environment monitoring system. In Procedia Engineering: 41 (pp. 1204–1210). http://doi.org/10.1016/j.proeng.2012.07.302.
Park, D. H., & Park, J. W. (2011). Wireless sensor network-based greenhouse environment monitoring and automatic control system for dew condensation prevention. Sensors, 11(4), 3640–3651. http://doi.org/10.3390/s110403640.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830. http://doi.org/10.1007/s13398-014-0173-7.2
Platt, J. C. (1998). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods, 185–208. http://doi.org/10.1109/ISKE.2008.4731075.
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. Advances in Neural Information Processing Systems, 20, 1177–1184. http://doi.org/10.1.1.145.8736.
Singh, K. P., Gupta, S., & Rai, P. (2013). Identifying pollution sources and predicting urban air quality using ensemble learning methods. Atmospheric Environment, 80, 426–437. http://doi.org/10.1016/j.atmosenv.2013.08.023.
Smith, B. A., Hoogenboom, G., & McClendon, R. W. (2009). Artificial neural networks for automated year-round temperature prediction. Computers and Electronics in Agriculture, 68(1), 52–61. http://doi.org/10.1016/j.compag.2009.04.003.
Suzuki, Y., Kaneda, Y., & Mineno, H. (2014). SW-SVR improved by short-distance data collection method (pp. 1–8). IPSJ SIG Technical Report, 2014-MBL-73(9).
Suzuki, Y., Kaneda, Y., & Mineno, H. (2015). Analysis of support vector regression model for micrometeorological data prediction. Computer Science and Information Technology, 3(2), 37–48. http://doi.org/10.13189/csit.2015.030202.
Tenenhaus, M., Vinzi, V. E., Chatelin, Y. M., & Lauro, C. (2005). PLS path modeling. Computational Statistics and Data Analysis, 48(1), 159–205. http://doi.org/10.1016/j.csda.2004.03.005.
Tsang, I. W., Kwok, J. T., & Cheung, P.-M. (2005). Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6, 363–392. http://doi.org/10.1111/j.1442-9993.2007.01810.x.
Urraca, R., Antonanzas, J., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2015). Estimation of solar global irradiation in remote areas. Journal of Renewable and Sustainable Energy, 7(2), 023136.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory: Vol. 8. Springer. http://doi.org/10.1109/TNN.1997.641482.
Wang, B. X., & Japkowicz, N. (2009). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1–20. http://doi.org/10.1007/s10115-009-0198-y.
Xie, Y., Li, X., Ngai, E. W. T., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3 PART 1), 5445–5449. http://doi.org/10.1016/j.eswa.2008.06.121.