Rainfall Prediction using Gaussian Process Regression Classifier

(1)

Volume 8, Issue 8, August 2019, ISSN: 2278 – 1323

Rainfall Prediction using Gaussian Process

Regression Classifier

Niharika Mishra,M.Tech Scholar

Rungta College of Engineering & Technology

Ajay Kushwaha,Associate Professor (CSE)

Rungta College of Engineering & Technology

Abstract— Forecasting is a procedure of estimating or predicting the future depends on past and nearby data. Forecasting provides information about the impending future measures and their consequences for the administration. Prediction of rainfall can play impotent role for WRM (Water resource management). After studying different literature, work can be carried out using data mining techniques and machine learning model. Due to very much changing nature of atmosphere rainfall prediction became the vital research area. In this paper we have given an experimental review, in which we have taken Meteorological Data collected by Department of Agricultural Meteorology Indira Gandhi Agricultural University, Raipur (C.G.), applied different machine learning classifier and validated using Matlab 2019a. Eventually we have proposed to use Gaussian Process Regression as classifier, proposed prediction model achieved 95.4% accuracy.

Keywords— NB, DT, SVM, Regression

I. INTRODUCTION

India is essentially an agrarian nation and the achievement or disappointment of the reap and water shortage in any year is constantly considered with the best concern. The term tornado appears to have been gotten moreover from the Arabic or from the Malayan monsoon. As first utilized it was connected to southern Asia and the nearby waters, where it alluded to the occasional surface air streams which turn around their headings amongst winter and Summer, southwest in summer and north east in winter here. Amid the late spring the landmass is warmed, prompting rising movement and lower weight. This prompts wind stream from ocean to arrive at low heights.

Predicting of rainfall can be beneficial

for:- Assessing the seriousness of dry season.

 Predicting where/when there is a danger of flooding.

 Limiting the impact of flooding and drought by enabling government and other agencies to put emergency response plans into operation.

Identifying sensitive and risk prone areas for flooding and drought throughout the province. Forecast is merely a prediction about the future values of data. In any case, most extrapolative model estimates expect that the past is an intermediary for what's to come. There are numerous customary models for determining: exponential smoothing, relapse, time arrangement, and composite model figures, frequently including master conjectures. Relapse investigation

is a measurable procedure to dissect quantitative information to assess demonstrate parameters and make forecast.

Forecast is merely a prediction about the future values of data. In any case, most extrapolative model estimates expect that the past is an intermediary for what's to come. There are numerous customary models for determining: exponential smoothing, relapse, time arrangement, and composite model figures, frequently including master conjectures. Relapse investigation is a measurable procedure to dissect quantitative information to assess demonstrate parameters and make forecast.

Applying machine learning to any data need a priori knowledge, like type of data, size of data, missing values etc., and dataset can be categorized in two ways depicted in fig.1.

Input Dataset

Homogenous Data Heterogonous Data

Fig.-1 Types of Data

A system that maps an input to an output needs training to do this in a useful way. Just as people need to be trained to perform tasks, machine learning systems need to be trained. Training is accomplished by giving the system an input and the corresponding output and modifying the structure (models or data) in the learning machine so that mapping is learned. In some ways this is like curve fitting or regression. If we have enough training pairs, then the system should be able to produce correct outputs when new inputs are introduced. Training of data collected can be done by following ways as depicted in fig.2.

(2)

Volume 8, Issue 8, August 2019, ISSN: 2278 – 1323 II. LITERATURESURVEY

M.Kannan et. al. said that Rainfall is essential for nourishment creation design, water asset administration and all movement designs in the nature. The event of delayed dry period or overwhelming precipitation at the basic phases of the harvest development and improvement may prompt huge decrease edit yield. India is an agrarian nation and its economy is to a great extent in light of harvest profitability. Along these lines precipitation forecast turns into a noteworthy factor in agrarian nations like India. Precipitation anticipating has been a standout amongst the most logically and innovatively difficult issues the world over in the most recent century. Creator likewise inferred that Rainfall time arrangement might be unwarranted. The point of rainstorm precipitation information arrangement is very perplexing; the part that numerous straight relapses may play in this subject is one for future research—it shows up, from the confirmation here, not to be helpful as a prescient model. Regardless of whether it may be helpful for offering an inexact estimation of future storm precipitation stays to be seen. Utilizing this relapse strategy, we need to gauge precipitation for our state moreover [IJET 2010].

Yajnaseni Dash et. al. examined the performance of single layer feed-forward neural network (SLFN) in ISMR forecast and for the first time apply random vector functional link network (RVFL) and regularized online sequential network -RVFL (ROS--RVFL) for the forecast purpose and subsequently examine their relative performance with respect to SLFN. Seasonal forecast of ISMR is made a year in advance using the previous year’s data. Six sets of forecasts are made, by varying the length of the past data (training period), ranging from 5 to 10 years. It is found that ROS-RVFL is more accurate and computationally more efficient than SLFN and RVFL. Secondly, irrespective of the techniques, when we consider the last 8 years of past data for the training purpose the result is found to be relatively more accurate, and the ensemble mean of forecast with 8 and 9 years of training period gives the best result among all the trials. Based on statistical and detail analyses finally it is concluded that ROS-RVFL with ensemble mean of 8-9 years training period is a promising approach for ISMR forecast.[1].

S . N o . Author/Title/Y ear/Publicatio n Method Used Description 1 Shubhendu Trivedi e. al. The Utility of Clustering in Prediction Tasks Centre for Mathematics and Cognition gran 2011

K-Menas Observed that use of a predictor in conjunction with clustering improved the prediction accuracy in most datasets

2 Hakan Tongal et. al. Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake water levels Springer-Verlag Berlin Heidelberg 2013 k-nearest neighbour (k-NN) model & SETAR model

A comparison of two nonlinear model approaches was made. Author used the k-NN approach and SETAR model for prediction of water levels for the three largest lakes in Sweden.

3 Andrew Kusiak

et. al./ Modeling and Prediction of Rainfall Using Radar Reflectivity Data: A Data-Mining Approach/ IEEE 2013 k-NN, SVM,MLP , Random forest

Among the five data-mining algorithms tested in this paper,

the MLP has

performed best. It has been selected to predict rainfall for three models for all future time horizons. The baseline Model I has been constructed with radar reflectivity data only. The proposed methodology has demonstrated high-accuracy rainfall predictions in Oxford, Iowa. 4 Pinky Saikia

Dutta Et. Al. / Prediction Of Rainfall Using Datamining Technique Over Assam/ IJCSE 2014 Multiple linear regression

Uses data mining technique in forecasting monthly Rainfall of Assam. This was carried out using traditional statistical technique -Multiple Linear Regression. The data include Six years period [2007-2012] collected locally from Regional

Meteorological Center,

Guwahati,Assam,

India . The

(3)

Volume 8, Issue 8, August 2019, ISSN: 2278 – 1323 5 M.Kannan et.

al./Rainfall Forecasting Using Data Mining Technique/ IJET 2010

Regression Rainfall prediction becomes a significant factor in agricultural countries like India. Rainfall forecasting has been one of the most scientifically and technologically challenging problems around the world in the last century. Regression technique provides sifnificent accuracy.

6 Ravinesh C. Deo et. al./ Application of the extreme learning machine algorithm for the prediction of monthly Effective Drought Index in eastern Australia/ Elsevier 2014 ANN

model The ELM model isseen to enhance the prediction skill of the monthly Effective Drought Index over the ANN model, and therefore, can overcome

deficiencies in prediction when applied to climate analysis that typically requires thousands of training data points and time efficacy of the modeling framework.

7 Jae-Hyun Seo et. al./Feature Selection for Very Short-Term Heavy Rainfall Prediction Using Evolutionary Computation/ Hindawi 2013 k-NN and

k-VNN In comparative SVMtests using evolutionary

algorithms, the results showed that genetic algorithm was considerably

superior to

diﬀ erential

evolution. Te equitable treatment score of SVM with polynomial kernel was the highest

among our

experiments on average. k-VNN outperformed k-NN, but it was dominated

by SVM with

polynomial kernel. III. METHDOLOGY

Existing machine learning classifiers cannot able to achieve remarkable accuracy, due to homogeneity of input

dataset. Henceforth we need a proposed model which can reduce homogeneity and convert data into classful data.

In proposed method we will apply k-medoid clustering algorithm to make data classful. We have used K-medoid because there are some bottlenecks of k-nearest neighbour prediction are as follows:

1. The straightforward algorithm has a costO(n log(k)),not good if the dataset is large.

2. The model cannot be interpreted (there is no description of the learned concepts).

3. It is computationally affluent to treasure the k nearest neighbours when the dataset is very hefty.

4. Enactment be contingent on the number of dimensions that we have (curse of dimensionality) =⇒ Characteristic Selection.

5. The more magnitudes we have, the more instances we need to imprecise a suggestion.

6. This is especially bad for k-nearest neighbours i.e. if the dimensions of dataset is in elevation the resultant nearest neighbours may be very far away.

7. The number of instances that we have in a volume of space diminutions exponentially pertaining to number of dimensions.

8. K-means has difficulties when clusters are of opposing a. Dimensions

b. Densities 9. Problems with outliers

10. K-means is sluggish and scales unwell pertaining to the time it takes for large number of points.

Fig.-3 Proposed Prediction Model

k-medoid is a classical data partitioning algorithm for clustering that group the dataset of n data points into k clusters recognized a priori. A constructive tool for influential k is the shadow. It is added full-bodied to outliers and noise in comparison to k-means for the reason that it minimizes a sum of pair wise dissimilarities rather than a sum of squared Euclidean distances. A medoid can be explained as the object of a group whose typical dissimilarity to all the data objects in the cluster is minimal. That is it is a most centrally positioned point in the cluster. Cost function in K-Medoid algorithm is defined as equation 1.

Meteorological data

Apply Gaussian Process Regression Classifier

(4)

Volume 8, Issue 8, August 2019, ISSN: 2278 – 1323

Algorithm: K-Medoid

1. Initialize: randomly select (without replacement) k of the n data points as the medoids

2. Associate each data point to the closest medoid. 3. While the cost of the configuration decreases:

a. For each medoid m, for each non-medoid data point o:

i. Swap m and o, recompute the cost (sum of distances of points to their medoid)

ii. If the total cost of the configuration increased in the previous step, undo the swap

Proposed Algorithm

Regression (training_features, Trainclass, featTest, Testclass, featName, classifier)

/* training_features - A NUMERIC matrix of training features (N x M)

Trainclass- A NUMERIC vector represents the values of the dependent variable of the training data of order (N x 1) featTest- A NUMERIC matrix of testing features (Nts x M) Testclass- A NUMERIC vector representing the values of the dependent variable of the testing data (Nts x 1)

featName- The CELL vector of string representing the label of each features, (1 x M) cell*/

//classifier as KNN Regression

NNBestFeat = floor(Datapoints()/10) //nearest neighbor trainModel=KNN Regression model

NNSearch=Initialize earch function for KNNReg as linearsearch

//Set the distance measure for NNSearch

distFunc = Euclidean distance (or resemblance) function trainModel. Assign NearestNeighbourSearchAlgorithm (NNSearch)

trainModel.setKNN(NNBestFeat)

Finally to validate prediction model k-fold cross validation done as follows:

k-fold cross validation

1. Split the dataset into Kequalpartitions (or "folds")

 So if k = 5 and dataset has 150 observations

 Each of the 5 folds would have 30 observations

2. Use fold 1 as thetesting setand the union of the other folds as thetraining set

 Testing set = 30 observations (fold 1)

 Training set = 120 observations (folds 2-5)

3. Calculatetesting accuracy

4. Repeat steps 2 and 3 K times, using adifferent foldas the testing set each time

 We will repeat the process 5 times

 2nd iteration

 fold 2 would be the testing set

 union of fold 1, 3, 4, and 5 would be the training set

 3rd iteration

 fold 3 would be the testing set

 union of fold 1, 2, 4, and 5 would be the training set

 And so on...

5. Use theaverage testing accuracyas the estimate of out-of-sample accuracy

(5)

Volume 8, Issue 8, August 2019, ISSN: 2278 – 1323 IV. EXPERIMENTALEVALUATION

For implementation of our proposed algorithm we have used Matlab 2019a. We have used rainfall dataset from Department of Agricultural Meteorology Indira Gandhi Agricultural University, Raipur Station: Labhandi Monthly Meteorological Data. Example as follows:

Table-1: Data Sample year Max

_Te mp

Min_ Tem p

Relative _Humid

ity1

Relati ve_H umidi ty2

Wind _Vel ocity

Rai nfa ll

2013 26.5 11.4 91 37 2.8 9.4

2013 30.9 14.4 85 33 2.9 2.2

2013 33.6 19.1 75 34 3.8 19.

3

2013 37.3 23.1 73 35 7.2 51.

4

2013 41.9 27.4 58 28 7.1 13.

4

Fig.-5 Main UI

Fig.-5 shows the main UI of proposed model in which takes input as training set and test set. Here we have compared our proposed model with decision tree classifier and Naïve Bayes classifier.

Fig.-6 Decision Tree

Fig.-7 Accuracy of Proposed Model

(Decision Tree) (Naïve Bayes) Fig.-8 Predicted Values (DST and NB)

(6)

Volume 8, Issue 8, August 2019, ISSN: 2278 – 1323 V. CONCLUSION

In this research we have proposed a novel and better accurate rainfall prediction model. After experimental evaluation we came into conclusion that our proposed algorithm produces accuracy as 95.2%.

S. No. Classifier Accuracy

1. Decision Tree Classifier 32% 2. Naïve Bayes Classifier 43%

3. Proposed Model 95.2%

In future we can apply some preprocessing method to increase the prediction accuracy, we can optimize classifier using deep learning classifier.

References

[1]. Yajnaseni Dash, Saroj Kanta Mishra, Sandeep Sahany, Bijaya Ketan Panigrahi, Indian Summer Monsoon Rainfall Prediction: A Comparison of Iterative and Non-Iterative Approaches S1568-4946(17)30537-9.

[2]. Neelam Mishra, Hemant Kumar Soni, Development and Analysis of Artificial Neural Network Models for Rainfall Prediction by Using Time-Series Data, I.J. Intelligent Systems and Applications, 2018, 1, 16-23 Published Online January 2018 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2018.01.03

[3]. Isaac Mugume, Charles Basalirwa, Daniel Waiswa, Mary Nsabagwa, Triphonia Jacob Ngailo, Joachim Reuder, Schattler Ulrich, Musa Semujju, A Comparative Analysis of the Performance of COSMO and WRF Models in Quantitative Rainfall Prediction, World Academy of Science, Engineering and Technology International Journal of Marine and Environmental Sciences Vol:12, No:2, 2018

[4]. Shubhendu Trivedi e. al. The Utility of Clustering in Prediction Tasks Centre for Mathematics and Cognition gran 2011.

[5]. Hakan Tongal et. al. Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake water levels Springer-Verlag Berlin Heidelberg 2013 [6]. Andrew Kusiak et. al./ Modeling and Prediction of