Top PDF Modeling pan evaporation using Gaussian Process Regression, K-Nearest Neighbors, Random Forest, and Support Vector Machines: Comparative analysis

Modeling pan evaporation using Gaussian Process Regression, K-Nearest Neighbors, Random Forest, and Support Vector Machines: Comparative analysis

Received: 4 November 2019; Accepted: 31 December 2019; Published: 4 January 2020. Abstract: Evaporation is one of the most critical factors in agricultural, hydrological, and meteorological studies. Because multiple climatic factors interact, evaporation is a complex and nonlinear phenomenon to model, and machine learning methods have therefore gained popularity in this area. In the present study, four machine learning methods, Gaussian Process Regression (GPR), K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Regression (SVR), were used to predict pan evaporation (PE). Meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W), and sunny hours (S) were collected from 2011 through 2017. The accuracy of the methods was assessed using the Root Mean Squared Error (RMSE), the correlation coefficient (R), and the Mean Absolute Error (MAE). Taylor charts were also used to evaluate the accuracy of the models. At the Gonbad-e Kavus, Gorgan, and Bandar Torkman stations, respectively, the best-performing configuration of each method gave an RMSE of 1.521, 1.244, and 1.254 mm/day for GPR; 1.991, 1.775, and 1.577 mm/day for KNN; 1.614, 1.337, and 1.316 mm/day for RF; and 1.55, 1.262, and 1.275 mm/day for SVR. GPR with input parameters T, W, and S at Gonbad-e Kavus Station, and GPR with input parameters T, RH, W, and S at the Gorgan and Bandar Torkman stations, gave the most accurate predictions and were proposed for precise estimation of PE. The findings indicate that PE values may be accurately estimated from a few easily measured meteorological parameters.
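The three accuracy indices named in the abstract (RMSE, MAE, and the correlation coefficient R) can be sketched in a few lines. This is a minimal illustration assuming the standard textbook definitions; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumes neither sequence is constant)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical daily pan evaporation values in mm/day (illustration only).
pe_observed = [4.2, 5.1, 6.3, 3.8, 7.0]
pe_predicted = [4.0, 5.6, 5.9, 4.1, 6.4]
scores = {
    "RMSE": rmse(pe_observed, pe_predicted),
    "MAE": mae(pe_observed, pe_predicted),
    "R": pearson_r(pe_observed, pe_predicted),
}
```

Lower RMSE and MAE and a higher R indicate a closer match between predicted and observed PE, which is how the methods are ranked in the abstract.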

Modeling Pan Evaporation Using Gaussian Process Regression K-Nearest Neighbors Random Forest and Support Vector Machines; Comparative Analysis

Their results revealed the high capability of the implemented firefly algorithm in decreasing the prediction error of the standalone ANFIS model at all studied stations. Khosravi et al. [24] examined the potential of five data mining and four ANFIS models for predicting reference evapotranspiration at two stations in Iraq. They stated that, for both stations, ANFIS-GA generated the most accurate predictions. Salih et al. [25] investigated the capabilities of co-ANFIS for predicting evaporation from reservoirs using meteorological parameters. Their findings indicated that the co-ANFIS model is suitably accurate for evaporation estimation. Recently, Feng et al. [26] examined the performance of two solar radiation-based models for the estimation of daily evaporation in different regions of China. They suggested that Stewart's model is preferable when meteorological data on sunny hours and air temperature are available. Therefore, it is possible to estimate evaporation through intrinsically nonlinear models. Qasem et al. [27] examined the applicability of wavelet support vector regression and wavelet artificial neural networks for predicting PE at the Tabriz and Antalya stations. The results indicated that artificial neural networks performed better, and that the wavelet transforms did not significantly reduce the prediction errors at either station. Yaseen et al. [28] predicted PE values using four machine learning models at two stations in Iraq. They reported that the SVM performed best compared with the other studied methods.

Modeling Daily Pan Evaporation in Humid Climates Using Gaussian Process Regression

most critical factors in agricultural, hydrological, and meteorological studies. Due to the interactions of multiple climatic factors, evaporation is a complex and nonlinear phenomenon; therefore, data-based methods can be used to estimate it precisely. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor (IBK), Random Forest (RF), and Support Vector Regression (SVR) were used to estimate the pan evaporation (PE) at the meteorological stations of Golestan Province, Iran. For this purpose, meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W), and sunny hours (S) were collected from the Gonbad-e Kavus, Gorgan, and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using the statistical indices of Root Mean Squared Error (RMSE), correlation coefficient (R), and Mean Absolute Error (MAE). Furthermore, Taylor charts were used to evaluate the accuracy of the models. The outcome indicates that, in the optimum state at the Gonbad-e Kavus, Gorgan, and Bandar Torkman stations, Gaussian Process Regression (GPR) with error values of 1.521, 1.244, and 1.254, Nearest-Neighbor (IBK) with error values of 1.991, 1.775, and 1.577, Random Forest (RF) with error values of 1.614, 1.337, and 1.316, and Support Vector Regression (SVR) with error values of 1.55, 1.262, and 1.275, respectively, have more appropriate performances in estimating

Comparative evaluation of static gesture recognition techniques based on nearest neighbor, neural networks and support vector machines

Ren and Zhang [18] used a Support Vector Machine combined with Minimum Enclosing Ball, a method they called MEB-SVM, to classify static gestures acquired from a video camera. After the image acquisition, image segmentation, contour selection, and classification, their work achieved a mean recognition rate of 92.89%. Liu et al. [19] used an SVM with Hu moments to classify hand postures acquired from a camera, automating the verification of hand integrality for the Chinese Driver Physical Examination System. An error rate of 3.5% was generated after tests were executed with data from 20 people. Chen and Tseng [20] developed a robotic visual system to recognize static gestures for finger guessing games. An SVM classifier was implemented and configured to be robust enough to work regardless of hand angles and skin colors. In their tests, their setup achieved a correct recognition rate of 95.0% for the paper, rock, and scissors game, using data from four people. Meng, Pears, and Bailey [21] presented a method to recognize human actions from video streams, using a linear SVM as classifier, trained with data acquired from Motion History Image (MHI) and Hierarchical Motion History Histogram (HMHH). Using examples of walking, jogging, running, boxing, hand clapping, and hand waving, recorded from 25 people in four different scenarios, the method achieved a maximum recognition rate of 93.1%.

Assessment of factors affecting tourism satisfaction using K nearest neighborhood and random forest models

The city of Hamadan, as an ecotourism and historic destination, can provide a platform for the development of sustainable tourism through proper planning and provision of infrastructure. This study aimed a) to identify factors affecting the satisfaction of tourists traveling to the city of Hamadan, in support of appropriate decision making for tourism development and for increasing tourist satisfaction; and b) to compare the performance of two data mining techniques, random forest (RF) and K-nearest neighborhood (KNN), in predicting tourism satisfaction.

Software Development Effort Duration and Cost Estimation using Linear Regression and K Nearest Neighbors Machine Learning Algorithms

Estimation is the most important task in software development. Various software effort estimation models came into existence when people started following the standard project management process. Researchers often use KLOC, Story Points, Function Points, and Use Case Points as measures of size. The techniques used to calculate effort are broadly classified into algorithmic and non-algorithmic models [23]. Algorithmic models such as COCOMO, COCOMO-II, and Putnam's cannot produce early estimates because the attributes they use can only be calculated after project completion. For early estimation, the best alternative is therefore non-algorithmic models, such as expert-based, learning-based, linguistic-based, and optimization-based models. In this research we discuss machine learning techniques that can perform early estimation, handle non-linear functions, adapt to any environment, and allow the confidence in a decision to be calculated.

Random Regression Forest Model using Technical Analysis Variables

According to the results of the analysis, technical indicators were found to be very successful in predicting stock prices (Aldin et al., 2012). Taylor and Allen studied technical analysis in the foreign exchange market by surveying foreign exchange dealers in London. The survey showed that 90% of dealers use technical analysis in their work. It also showed that dealers trust the results of technical analysis over short time periods but prefer fundamental analysis in the long term (Taylor and Allen, 1992). Blume and others investigated the applicability of stock exchange volume to technical analysis. They concluded that volume is very informative in determining the value of a stock exchange, and that investors who use statistical information are more successful in their investments than others (Blume et al., 1994). Lam integrated fundamental and technical analysis for financial performance prediction, using financial data of 364 S&P companies for the years between 1985 and 1995 together with a neural network method. The analysis showed that combining fundamental and technical analysis gives better results (Lam, 2004). Chavarnakul and Enke studied the performance of two technical indicators using a generalized regression neural network and S&P 500 index data. They concluded that stock trading assisted by the neural network performed better than stock trading without neural network assistance (Chavarnakul and Enke, 2008).

Imputing missing genotypes with weighted k nearest neighbors

Contrary to the application of KNNcatImpute to the GENICA data set, in which all SNPs/observations are considered in the search for the k nearest SNPs/observations, we here "just" use the SNPs without missing values to identify the k nearest neighbors of a SNP with missing values. Moreover, we restrict the search by considering the SNPs chromosomewise, such that the missing genotypes of a particular SNP are imputed using only SNPs that come from the same chromosome as the considered SNP. The latter is not only time-saving, but also biologically meaningful, as only SNPs from the same chromosome are inherited together. In Table 2, the mean fractions of falsely imputed values are summarized for the different settings of KNNcatImpute. This table shows that while employing the corrected Pearson's contingency coefficient works poorly also in the application to the HapMap data, the other three distance measures perform almost equally well, where the scaled Manhattan distance exhibits slightly lower error rates than d_SMC, which in turn leads to slightly less falsely imputed
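The restricted neighbor search described above can be sketched as follows. This is an illustrative sketch only, not the KNNcatImpute implementation: it assumes genotypes coded 1, 2, 3 with None for a missing value, a scaled Manhattan distance computed over the positions observed in the target SNP, and inverse-distance vote weights. The function names and the weighting scheme are assumptions made for the example.

```python
from collections import defaultdict

def scaled_manhattan(a, b, n_levels=3):
    """Manhattan distance over positions observed in both SNPs, scaled to [0, 1].

    Assumes at least one position is observed in both vectors.
    """
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    return sum(abs(u - v) for u, v in pairs) / (len(pairs) * (n_levels - 1))

def impute_snp(target, complete_snps, k=2, eps=1e-9):
    """Fill the missing entries of `target` by a weighted vote of its k nearest
    neighbours, searching only SNPs without missing values (for example, the
    complete SNPs on the same chromosome)."""
    neighbours = sorted(complete_snps, key=lambda s: scaled_manhattan(target, s))[:k]
    imputed = list(target)
    for i, value in enumerate(target):
        if value is not None:
            continue
        votes = defaultdict(float)
        for snp in neighbours:
            # Closer neighbours get larger weights (hypothetical weighting choice).
            votes[snp[i]] += 1.0 / (scaled_manhattan(target, snp) + eps)
        imputed[i] = max(votes, key=votes.get)
    return imputed

# Hypothetical genotype vectors across four individuals.
complete = [[1, 2, 3, 1], [1, 2, 3, 2], [3, 1, 1, 3]]
filled = impute_snp([1, 2, None, 1], complete, k=2)
```

Restricting `complete_snps` to one chromosome before the call mirrors the chromosomewise search in the text and also keeps the number of distance computations small.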

Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

We consider learning on graphs, guided by kernels that encode similarity between vertices. Our focus is on random walk kernels, the analogues of squared exponential kernels in Euclidean spaces. We show that on large, locally treelike graphs these have some counter-intuitive properties, specifically in the limit of large kernel lengthscales. We consider using these kernels as covariance functions of Gaussian processes. In this situation one typically scales the prior globally to normalise the average of the prior variance across vertices. We demonstrate that, in contrast to the Euclidean case, this generically leads to significant variation in the prior variance across vertices, which is undesirable from a probabilistic modelling point of view. We suggest the random walk kernel should be normalised locally, so that each vertex has the same prior variance, and analyse the consequences of this by studying learning curves for Gaussian process regression. Numerical calculations as well as novel theoretical predictions for the learning curves using belief propagation show that one obtains distinctly different probabilistic models depending on the choice of normalisation. Our method for predicting the learning curves using belief propagation is significantly more accurate than previous approximations and should become exact in the limit of large random graphs.

Derivation of regression models for pan evaporation estimation

Principal Component Analysis (PCA): In order to examine the relationships among a set of p correlated variables, it may be useful to transform the original set of variables to a new set of uncorrelated variables called principal components. These new variables are linear combinations of the original variables and are derived in decreasing order of importance so that, for example, the first principal component accounts for the largest variance of the original data. PCA originated in some work by Karl Pearson around the turn of the century, and was further developed in the 1930s by Harold Hotelling (Chatfield and Collins, 1980). The usual objective of the analysis is to see if the first few components account for most of the variation in the original data. In other words, if some of the original variables are highly correlated, they are effectively 'having the same information' and there may be near-linear constraints on the variables. This method will simply find components which are close to the original variables but arranged in decreasing order of variance (Liu et al. 2003). As a result, the information in the original variables is represented by the derived principal components without wasting aspects of the data's information (Konishi and Rao 2014). PCA can be described in the following four stages: A) Calculation of KMO 1 factor
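The idea that principal components are uncorrelated linear combinations ordered by decreasing variance can be illustrated with a small sketch. It assumes the textbook formulation (eigenvectors of the sample covariance matrix) and uses plain power iteration with deflation to stay dependency-free; it does not reproduce the four stages listed in the source, and all function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (rows of p variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(mat, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix by power iteration."""
    p = len(mat)
    v = [1.0 / (i + 1) for i in range(p)]  # deliberately asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigval, v

def principal_components(rows, k):
    """Return k (variance, direction) pairs in decreasing order of variance."""
    mat = covariance_matrix(rows)
    p = len(mat)
    components = []
    for _ in range(k):
        val, vec = leading_eigenpair(mat)
        components.append((val, vec))
        # Deflation: remove the variance explained by the component just found.
        mat = [[mat[i][j] - val * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    return components

# Hypothetical data in which the second variable is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
components = principal_components(data, 2)
explained = components[0][0] / sum(val for val, _ in components)
```

Because the two variables here carry the same information, the first component accounts for essentially all of the variance, which is the situation the passage describes for highly correlated variables.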

A Completion on Fruit Recognition System Using K Nearest Neighbors Algorithm

Abstract: Recognizing images of different fruits is a major challenge for computers. Most fruit recognition techniques combine different analysis methods, such as color-based, shape-based, size-based, and texture-based analysis. Different fruit images can have the same color and shape values, so these features alone are not robust and effective enough to recognize and identify the images. We introduce a new fruit recognition technique that combines four feature analysis methods (shape, size, color, and texture) to increase recognition accuracy. The proposed method uses the nearest neighbor classification algorithm, which classifies and recognizes a fruit image from the nearest training fruit example. The system takes a fruit image as input and outputs the fruit name. The proposed fruit recognition system analyses, classifies, and identifies the fruit. It can support educational learning for small children and can be used in grocery stores to automate labeling and price computation.

Estimation of monthly pan evaporation using support vector machine in Three Gorges Reservoir Area, China

In particular, a large number of data-driven models have been created. For example, the empirical model and machine learning algorithm have been extensively investigated. Stephens and Stewart (1963) developed an empirical model using radiation and air temperature. This model was found to perform best among 23 models in extremely arid areas (Al-Shalan and Salih 1987). Hanson (1989) presented an empirical equation using radiation and air temperature in the USA. Linacre (1977) proposed a simple model using temperature in Australia. Rotstayn et al. (2006) coupled the radiative component and the aerodynamic component to develop the PenPan model, which was later validated by Roderick et al. (2007) and Johnson and Sharma (2010) across Australia. Lim et al. (2016) modified the PenPan model to present the PenPan-V2 model, which was found to outperform the original PenPan model in Australia. Patel and Majmundar (2016) obtained empirical relations as functions of air temperature, relative humidity, wind velocity, and sunshine duration in India. Andreasen et al. (2017) developed multilinear regression models using various combinations of meteorological variables in the USA. The main benefit of empirical models is that the meteorological variables are routinely measured and easily available. However, they can only be applied to the places with similar climatic conditions (Goyal et al. 2014). Moreover, the empirical models cannot provide accurate estimations due to the complex process of evaporation (Shalamu 2011).

On Secured Blockchain Technology For K Nearest Neighbors Algorithm

In "Surveying converse inventory network effectiveness: maker's point," pushed by the significance of regular supportability and remanufacturing exercises, M. Kumar et al. use the dug in fuzzy data envelopment analysis (FDEA) approach to manage focus pivot creation network the board. They direct their exploration from the maker's point of view. In truth, they convert the proposed FDEA model into a new immediate programming improvement trouble. Thus, the issue is point by point as an interim programming trouble. They fight that their proposed model can deliver generous outcomes. They show that the ISO 14001 accreditation plot just hardly improves the stock system's confirmation of home grown viability. However, their revelations surprisingly show that associations that have completed alter assembling system practices for a shorter time allotment could practically outmaneuver those which have realized turn round store network practices for an extra drawn out range of time.

Predictive modelling for solar thermal energy systems: A comparison of support vector regression, random forest, extra trees and regression trees

Calculation of the performance of a solar thermal system is highly complex when using an analytical modelling approach. An overview of the theoretical equations governing the thermal dynamics of solar thermal collectors can be found in Duffie and Beckman (2013). Often, computational models are required to capture the physical phenomena at the expense of a large amount of computational time and power. A combination of finite difference and electrical analogy models were used in (Notton et al., 2013; Motte et al., 2013) to calculate the outlet temperature of a building integrated solar thermal collector. The accuracy of the numerical model was validated against experimental data, allowing the authors to simulate future geometric and material design alterations to improve the efficiency of the solar collector. A numerical modelling approach was applied to a building integrated, aerogel covered, solar air collector in Dowson et al. (2012). From this, the authors were able to calculate outlet temperatures and collector efficiency from weather conditions. The model outputs were validated to within 5% of the measured values over a short measurement period. As a result, the authors could simulate much longer time periods to demonstrate the potential efficiency and financial payback of their proposed solution. A numerical modelling approach within the MATLAB environment applied to a v-groove solar collector was developed in Karim et al. (2014). The resulting model can predict the air temperature at any part of the

Parallel based support vector regression for empirical modeling of nonlinear chemical process systems

As an additional analysis, the performance of both models is also evaluated under extrapolating conditions. Extrapolation is a term that is used to describe the scenario when a model is forced to perform prediction in regions beyond the space of the original training data set. Due to the varying nature of processes in industry, empirical models tend to suffer from reduced robustness performance due to the models’ incapability to maintain their original accuracy for data outside the original training range (Castillo 2003; Himmelblau 2008; Kordon 2004; Lennox et al. 2001; Nelles 2001). In process industries, extrapolation is completely unavoidable because plants often operate outside the range of the original identification data used to develop the model (Castillo 2003; Kordon 2004). The variations in processes are actually a dominant and frequently encountered event. Many factors dictate such

Forecast of fund volatility using least squares wavelet support vector regression machines

In this paper, LS-WSVR with three different wavelet kernels is applied to forecasting fund volatility, and its in-sample and out-of-sample forecasting performance is compared with that of LS-SVR with Gaussian kernel functions according to evaluation indices. The remainder of this paper is organized as follows. Section 2 presents the theory of the LS-WSVR algorithm. Empirical results on the SZSE fund index illustrating the effectiveness of LS-WSVR are provided in Section 3. Conclusions are given in the final section.

Support Vector Machines

or the Gaussian kernel, SVMs were able to obtain extremely good performance on this problem. This was particularly surprising since the input attributes x were just a 256-dimensional vector of the image pixel intensity values, and the system had no prior knowledge about vision, or even about which pixels are adjacent to which other ones. Another example that we briefly talked about in lecture was that if the objects x that we are trying to classify are strings (say, x is a list of amino acids, which strung together form a protein), then it seems hard to construct a reasonable, "small" set of features for most learning algorithms, especially if different strings have different lengths. However, consider letting φ(x) be a feature vector that counts the number of occurrences of each length-k substring in x. If we're considering strings of English letters, then there are 26^k such strings. Hence, φ(x)
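The substring-counting feature map φ(x) described above can be sketched without ever building the full 26^k-dimensional vector, by storing only the substrings that actually occur. This is a minimal illustration; the function names are hypothetical, and the sparse dictionary representation is a choice made for the example rather than something stated in the source.

```python
from collections import Counter

def substring_counts(x, k):
    """Sparse phi(x): number of occurrences of each length-k substring of x."""
    return Counter(x[i:i + k] for i in range(len(x) - k + 1))

def substring_kernel(x, z, k):
    """Inner product <phi(x), phi(z)>, computed over shared substrings only."""
    counts_x = substring_counts(x, k)
    counts_z = substring_counts(z, k)
    return sum(count * counts_z[sub] for sub, count in counts_x.items())

# Dimension of the explicit feature space for lowercase English letters, k = 3.
explicit_dimension = 26 ** 3

phi = substring_counts("banana", 2)
similarity = substring_kernel("banana", "ana", 2)
```

The kernel value depends only on substrings present in both strings, so strings of different lengths can be compared directly even though the explicit feature space grows exponentially in k.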

A comparison of random forests, boosting and support vector machines for genomic selection

We used regression trees as basis functions. Boosting regression trees involves generating a sequence of trees, each grown on the residuals of the previous tree [5,9]. Prediction is accomplished by weighting the ensemble outputs of all the regression trees. We used stochastic gradient boosting, assuming the Gaussian distribution for minimizing squared-error loss in the R package gbm [9]. We determined the main tuning parameter, the optimal number of iterations (or trees), using an out-of-bag estimate of the improvement in predictive performance. This evaluates the reduction in deviance based on observations not used in selecting the next regression tree. The minimum number of observations in the trees' terminal nodes was set to 1, the shrinkage factor applied to each tree in the expansion to 0.001 and the fraction of the training set observations randomly selected to propose the next tree in the expansion to 0.5. With these settings boosting regression trees with at most 8-way interactions between SNPs required 3656 iterations for the training dataset based on inspecting graphical plots of the out-of-bag change in squared error loss against the number of iterations [9].
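The core loop of boosting regression trees (fit each tree to the residuals of the current ensemble, then add it with a shrinkage factor) can be sketched as below. This is a simplified illustration and not the gbm package: it uses one-split trees (stumps) on a single predictor, squared-error loss, and no random subsampling, so the out-of-bag estimate mentioned above is not reproduced. All names and parameter values are assumptions for the example.

```python
def fit_stump(x, residuals):
    """Best single-split regression tree on a 1-D predictor (squared error).

    Assumes x contains at least two distinct values.
    """
    best = None
    levels = sorted(set(x))
    for low, high in zip(levels, levels[1:]):
        threshold = (low + high) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_trees=200, shrinkage=0.1):
    """Gradient boosting for squared-error loss with stumps as basis functions."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        tree = fit_stump(x, residuals)
        trees.append(tree)
        predictions = [pi + shrinkage * tree(xi) for pi, xi in zip(predictions, x)]
    return lambda xi: base + shrinkage * sum(tree(xi) for tree in trees)

# Hypothetical step-shaped training data.
x_train = [1, 2, 3, 4, 5, 6]
y_train = [1, 1, 1, 5, 5, 5]
model = boost(x_train, y_train)
```

A smaller shrinkage factor, such as the 0.001 used in the source, makes each tree contribute less, which is why several thousand iterations can be needed before the fit stops improving.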

Estimating Daily Pan Evaporation Using Data Mining Process

Many researchers have also investigated the applicability of the time series analysis, firstly proposed by Box and Jenkins [15], to hydrology studies, such as rainfall [16], flow [17,18], wind speed [19,20], and radiation [21,22]. Kisi (2004) used Artificial Neural Networks (ANN) to predict monthly flow and compared the results with autoregressive models. He stated that ANN predictions, in general, are better than those found with AR(4) [23]. Yurekli and Ozturk (2003) determined alternative autoregressive moving average process (ARMA) models using the graphs of autocorrelation (ACF) and partial autocorrelation functions (PACF). The plots of the ACF showed that ARMA (1,0) with a constant was the best model by considering Schwarz Bayesian Criterion (SBC) and error estimates [24]. Torres et al. (2005) used ARMA and persistence models to predict the hourly average wind speed up to 10 h in advance. They showed that the use of ARMA models significantly improved wind speed forecasts compared to those obtained with persistence models [25]. Wu and Chau (2010) investigated ARMA, K-Nearest-Neighbors (KNN), ANN and Phase Space Reconstruction-based Artificial Neural Network (ANN-PSR) models to determine the optimal approach of predicting the monthly stream flow time series. They compared these models by a one-month-ahead forecast. They determined that the KNN model gives the best performance among the four models, but only exhibits weak superiority to ARMA [26]. Alhassoun et al. (1997) generated annual and monthly evaporation sequences using the first order Markov model for ten stations in Saudi Arabia. They evaluated the performance of the developed models using the methods of fragments, Thomas-Fiering and Two-Tier, and defined their suitability [27]. Knapp et al. (1984) generated a weekly evaporation time series using the mass transfer method for Milford Lake. They also developed a mathematical model for the time series. The model

Bouligand Derivatives and Robustness of Support Vector Machines for Regression

In this paper, we will prove that many SVMs based on Lipschitz continuous loss functions have a bounded Bouligand influence function. To formulate our results we will use Bouligand-derivatives in the sense of Robinson (1991) as defined above. To the best of our knowledge, these directional derivatives have not been used in robust statistics so far, but they are successfully applied in approximation theory for non-smooth functions. Section 2 covers our definition of the Bouligand influence function (BIF) and contains the main result, which gives the BIF for support vector machines based on a bounded kernel and a B-differentiable Lipschitz continuous convex loss function. In Section 3 it is shown that this result covers the loss functions L_ε, L_τ-pin, L_c-Huber, and L_log as special cases. Section 4
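The four loss functions named above can be written out as functions of the residual r = y - f(x). The sketch below assumes the commonly used definitions of the ε-insensitive, pinball, Huber, and logistic regression losses; the excerpt itself does not give the formulas, so the exact parameterization here is an assumption and the function names are hypothetical.

```python
import math

def eps_insensitive_loss(r, eps):
    """L_eps: zero inside the tube |r| <= eps, linear outside it."""
    return max(0.0, abs(r) - eps)

def pinball_loss(r, tau):
    """L_tau-pin: asymmetric absolute loss used for quantile regression, 0 < tau < 1."""
    return tau * r if r >= 0 else (tau - 1.0) * r

def huber_loss(r, c):
    """L_c-Huber: quadratic for |r| <= c, linear beyond that."""
    if abs(r) <= c:
        return 0.5 * r * r
    return c * abs(r) - 0.5 * c * c

def logistic_loss(r):
    """L_log for regression: -ln(4 * s * (1 - s)) with s = 1 / (1 + exp(-r))."""
    s = 1.0 / (1.0 + math.exp(-r))
    return -math.log(4.0 * s * (1.0 - s))

# Residuals at which each loss is evaluated (illustration only).
residuals = [-2.0, -0.5, 0.0, 0.5, 2.0]
table = [(r, eps_insensitive_loss(r, 0.1), pinball_loss(r, 0.9),
          huber_loss(r, 1.0), logistic_loss(r)) for r in residuals]
```

Each of these losses grows at most linearly in |r| for large residuals, which is the Lipschitz continuity property the paragraph relies on, and two of them (ε-insensitive and pinball) are not differentiable everywhere, which is where a directional notion of derivative becomes useful.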