**Volume 8, Issue 6, June 2019, ISSN: 2278 – 1323**

## Machine Learning Prediction Model for Rainfall

## Prediction: A Review

### Niharika Mishra

### M.Tech Scholar

Rungta College of Engineering & Technology mishraniharika28@gmail.com

### Ajay Kushwaha

Associate Professor (CSE) ajay.kushwaha@rungta.ac.in

**Abstract****— Prediction of rainfall is a perplexing task. A little**
**vacillation in the regular precipitation can impact sly affect**
**horticulture area. Precise precipitation forecast has a potential**
**advantage of forestalling causalities and harms caused by**
**catastrophic events. India, the achievement or disappointment of**
**the products and water shortage in any year is constantly seen**
**with most prominent concern. Under certain conditions, for**
**example, surge and dry spell, exceedingly exact precipitation**
**expectation is valuable for horticulture administration and**
**debacle anticipation. Numerous techniques are used for**
**prediction of rainfall such as data mining technique, machine**
**learning technique etc. In this paper we have discussed different**
**prediction models and bottle neck of existing prediction models.**

**Keywords—**

I. INTRODUCTION

Data forecast is the headway of making forecasting or approximating of the imminent based on chronological and contemporaneous data. Forecasting delivers information about the future events and acts like a planning tool for the association. Forecasting is the technique of making convinced assumptions based on the management’s information, involvement, and decision. Number of statistical techniques cast-off by forecasting that’s why we also called it as statistical analysis. Significance of forecasting implicates following

points:- Forecasting provides pertinent and dependable information about the historical and contemporary events and the likely future events. This is essential for wide-ranging planning.

Forecasting provides self-assurance to the managers for making essential assessments.

It is the foundation for making planning premises.

It keeps management people active and alert to face the challenges of future events and alter in the environment.

Boundaries of forecasting involves following

points:-I. The analysis and collection of data about the present, history and future involves lots of time and capital. Subsequently, managers have to equilibrium the cost of predicting with its reimbursement. Most of the

trifling firms don't do prediction on account of the high cost.

II. Forecasting task can only approximate the future measures. It cannot pledge that these measures will take place in the future. Long-term prediction will be fewer accurate in comparison with to short-range forecast.

III. Data Prediction is based on convinced assumptions. If these assumptions are mistaken, the forecasting will be incorrect. Forecasting is depend on past measures. Contrariwise, past may not reiterate itself at all times.

IV. Forecasting need proper skills and judgment on the part of managers. Forecasts may go incorrect due to terrible judgment and skills on the part of some of the managers. Consequently, predicting data are subject to human error.

Prediction is purely a prediction about the forthcoming values of data. Nevertheless, most predictive model conjectures shoulder that the historical is a delegation for the future. There are many outmoded models for predicting: regression, exponential smoothing, time series, and compound model predictions, frequently concerning skilled forecasts. Regression analysis is a statistical procedure to investigate measureable data to approximation model parameters and create predictions.

Predicting of rainfall can be beneficial

for:- Assessing the seriousness of dry season.

Predicting where/when there is a danger of flooding.

Limiting the impact of flooding and drought by enabling government and other agencies to put emergency response plans into operation.

**Volume 8, Issue 6, June 2019, ISSN: 2278 – 1323**

Applying machine learning to any data need a priori knowledge, like type of data, size of data, missing values etc., and dataset can be categorized in two ways depicted in fig.1.

Input Dataset

Homogenous Data Heterogonous Data

Fig.-1 Types of Data

If data is homogenous in nature then supervised machine learning algorithm is applicable, if the dataset is heterogonous in nature the supervised machine learning algorithm is applicable. In this research we will use rainfall dataset from Department of Agricultural Meteorology Indira Gandhi Agricultural University, Raipur Station: Labhandi Monthly Meteorological Data: 2015. Example as follows:

**Table-1 Meteorological Data**

**Month** _{Temp.}Max.**Min. Temp. _{(°C)}**

**Rainfall**

_{(mm)}**Relative Humidity**

_{(%)}**Wind Velocity**

_{(Kmph)}**(°C)**

**Dependent Variable** **Independent _{Variable}**

**Dependent Variable**

**Jan.** 26.5 11.4 9.4 91 37 2.8

**Feb.** 30.9 14.4 2.2 85 33 2.9

**Mar** 33.6 ? 19.3 75 34 3.8

**Apr.** 37.3 23.1 51.4 73 35 ?

**May** 41.9 27.4 13.4 58 28 7.1

**Jun.** 36 26 ? 80 54 8.4

**Jul.** ? 25.3 173.2 87 71 8.4

**Aug.** 31.3 25.2 267.4 91 73 7

**Sep.** 32.2 25.2 219.6 93 66 4.4

**Oct.** 33.3 22.3 0 91 47 2.7

**Nov.** 31.3 17.2 0 89 37 2.8

**Dec.** 29.1 14.9 13.8 85 39 2.6

**Total** 1041.3

**Average** 32.9 21 83 46 5

II. LITERATURESURVEY

M.Kannan et. al. said that Rainfall is essential for nourishment creation design, water asset administration and all movement designs in the nature. The event of delayed dry period or overwhelming precipitation at the basic phases of the harvest development and improvement may prompt huge decrease edit yield. India is an agrarian nation and its economy is to a great extent in light of harvest profitability. Along these lines precipitation forecast turns into a noteworthy factor in agrarian nations like India. Precipitation anticipating has been a standout amongst the most logically and innovatively difficult issues the world over in the most recent century. Creator likewise inferred that Rainfall time arrangement might be unwarranted. The point of rainstorm precipitation information arrangement is very perplexing; the part that numerous straight relapses may play in this subject is one for future research—it shows up, from the confirmation here, not to be helpful as a prescient model. Regardless of whether it may be helpful for offering an inexact estimation of future storm precipitation stays to be seen. Utilizing this relapse strategy, we need to gauge precipitation for our state moreover [IJET 2010].

Yajnaseni Dash et. al. examined the performance of single layer feed-forward neural network (SLFN) in ISMR forecast and for the first time apply random vector functional link network (RVFL) and regularized online sequential network -RVFL (ROS--RVFL) for the forecast purpose and subsequently examine their relative performance with respect to SLFN. Seasonal forecast of ISMR is made a year in advance using the previous year’s data. Six sets of forecasts are made, by varying the length of the past data (training period), ranging from 5 to 10 years. It is found that ROS-RVFL is more accurate and computationally more efficient than SLFN and RVFL. Secondly, irrespective of the techniques, when we consider the last 8 years of past data for the training purpose the result is found to be relatively more accurate, and the ensemble mean of forecast with 8 and 9 years of training period gives the best result among all the trials. Based on statistical and detail analyses finally it is concluded that ROS-RVFL with ensemble mean of 8-9 years training period is a promising approach for ISMR forecast.[1].

**Volume 8, Issue 6, June 2019, ISSN: 2278 – 1323**

and Levenberg- Marquardt training function has been used. The performance of both the models has been assessed based on Regression Analysis, Mean Square Error (MSE) and Magnitude of Relative Error (MRE). Proposed ANN model showed optimistic results for both the models for forecasting and found one month ahead forecasting model perform better than two months ahead forecasting model. This paper also gives some future directions for rainfall prediction and time series data analysis research [2].

Isaac Mugume et. al. compares the models’ ability to predict rainfall over Uganda for the period 21st April 2013 to 10th May 2013 using the root mean square (RMSE) and the mean error (ME). In comparing the performance of the models, this study assesses their ability to predict light rainfall events and extreme rainfall events. All the experiments used the default parameterization configurations and with same horizontal resolution (7 Km). The results show that COSMO model had a tendency of largely predicting no rain which explained its under–prediction. The COSMO model (RMSE: 14.16; ME: -5.91) presented a significantly (p = 0.014) higher magnitude of error compared to the WRF model (RMSE: 11.86; ME: -1.09). However the COSMO model (RMSE: 3.85; ME: 1.39) performed significantly (p = 0.003) better than the WRF model (RMSE: 8.14; ME: 5.30) in simulating light rainfall events. All the models under–predicted extreme rainfall events with the COSMO model (RMSE: 43.63; ME: -39.58) presenting significantly higher error magnitudes than the WRF model (RMSE: 35.14; ME: -26.95). This study recommends additional diagnosis of the models’ treatment of deep convection over the tropics [3].

Table-2 Comparison of Literature

**S**
**.**
**N**
**o**
**.**
**Author/Title/Y**
**ear/Publicatio**
**n**
**Method**
**Used** **Description**
1 Shubhendu
Trivedi e. al.
The Utility of
Clustering in
Prediction
Tasks Centre
for
Mathematics
and Cognition
gran 2011

K-Menas Observed that use of a predictor in conjunction with clustering improved the prediction accuracy in most datasets

2 Hakan Tongal et. al. Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake k-nearest neighbour (k-NN) model & SETAR model

A comparison of two nonlinear model approaches was made. Author used the k-NN approach and SETAR model for prediction of water levels for the three largest lakes in Sweden. water levels Springer-Verlag Berlin Heidelberg 2013

3 Andrew Kusiak et. al./ Modeling and Prediction of Rainfall Using Radar Reflectivity Data: A Data-Mining Approach/ IEEE 2013 k-NN, SVM,MLP , Random forest

Among the five data-mining algorithms tested in this paper, the MLP has performed best. It has been selected to predict rainfall for three models for all future time horizons. The baseline Model I has been constructed with radar reflectivity data only. The proposed methodology has demonstrated high-accuracy rainfall predictions in Oxford, Iowa. 4 Pinky Saikia

Dutta Et. Al. / Prediction Of Rainfall Using Datamining Technique Over Assam/ IJCSE 2014 Multiple linear regression

Uses data mining technique in forecasting monthly Rainfall of Assam. This was carried out using traditional statistical technique -Multiple Linear Regression. The data include Six years period [2007-2012] collected locally from Regional

Meteorological Center,

Guwahati,Assam, India . The performance of this model is measured in adjusted R-squared. 5 M.Kannan et.

al./Rainfall Forecasting Using Data Mining Technique/ IJET 2010

**Volume 8, Issue 6, June 2019, ISSN: 2278 – 1323**

around the world in the last century. Regression technique provides sifnificent accuracy.

6 Ravinesh C. Deo et. al./ Application of the extreme learning machine algorithm for the prediction of monthly Effective Drought Index in eastern Australia/ Elsevier 2014

ANN

model The ELM model isseen to enhance the prediction skill of the monthly Effective Drought Index over the ANN model, and therefore, can overcome

deficiencies in prediction when applied to climate analysis that typically requires thousands of training data points and time efficacy of the modeling framework.

7 Jae-Hyun Seo et. al./Feature Selection for Very Short-Term Heavy Rainfall Prediction Using Evolutionary Computation/ Hindawi 2013

k-NN and

k-VNN In comparative SVMtests using evolutionary

algorithms, the results showed that genetic algorithm was considerably superior to diﬀ erential

evolution. Te equitable treatment score of SVM with polynomial kernel was the highest among our experiments on average. k-VNN outperformed k-NN, but it was dominated by SVM with polynomial kernel.

III. MACHINELEARNING ANDRAINFALLPREDICTION

Predicting rainfall from Meteorological data involves following steps:

1. Data Preprocessing 2. Attribute Selection 3. Classification

Data preprocessing phase is required for dimensionality reduction, as we can see from Table-1, Dependent variable contains missing values, similarly independent variable contains missing values, and row contains missing value in response variable need to be eliminated from input dataset because it affects the machine learning classification performance. If data contains missing values in predictor or dependent variable need to identify using algorithm. Through Linear Regression machine learning algorithm we can understand it.

**Regression** examination is a measurable process for
assessing the connections among factors. It incorporates
numerous systems for displaying and breaking down a few
factors, when the emphasis is on the connection between a
needy variable and at least one free factors (or 'indicators'). All
the more particularly, relapse examination causes one see how
the average estimation of the reliant variable (or 'paradigm
variable') changes when any of the autonomous factors is
shifted, while the other free factors are held settled. Most
regularly, relapse examination evaluates the restrictive desire
of the reliant variable given the free factors – that is, the
normal estimation of the needy variable when the autonomous
factors are settled.

The level line is known as the X-axis and the vertical line the Y-axis. Regression examination searches for a connection between the X variable (in some cases called the "autonomous" or "informative" variable) and the Y variable (the "reliant" variable).

Fig.-2 Linear Regression

Regression analysis seeks to find the “line of best fit” through the points. Fundamentally, the regression line is attracted to best estimate the connection between the two factors. Systems for approximating the relapse line (i.e., its intercept on the Y-hub and its incline). Forecasts utilizing the regression line accept that the relationship which existed in the past between the two factors will keep on existing later on. Relapse investigation can be extended to incorporate more than one free factor. Regression concerning more than one autonomous variable are alluded to as various regression.

**Volume 8, Issue 6, June 2019, ISSN: 2278 – 1323**

investigation showings a factual affiliation or connection among factors, as opposed to a causal relationship among factors. The instance of basic, straight, minimum squares Regression might be composed in the frame:

Where Y, the dependent variable, is a linear function of X, the independent variable. The parameters α and β characterize the populace regression line and e is the arbitrarily scattered error term. The regression guesstimates of α and β will be derived from the norm of least squares.

Multivariate regression also allows us to ascertain the generally fit (variance) of the model and the comparative involvement of each of the predictors to the total variance discussed. For instance, we might want to discern how much of the difference in exam presentation can be explained by amendment time, test apprehension, lecture presence and gender "as a whole", but also the "relative involvement" of each independent data variable in explaining the discrepancy. Universal regression model as follows:

###

###

###

###

###

###

###

###

###

###

###

*x*

*x*

_{k}*x*

_{k}*Y*

_{0}

_{1}

_{1}

_{2}

_{2}

###

• 0,1,,kare parameters

• X1, X2, …,Xk are identified

constants

• , the error terms are independent
N(o,2_{)}

IV. CONCLUSION

India is essentially an agrarian nation and the achievement or disappointment of the reap and water shortage in any year is constantly considered with the best concern. Amid the late spring the landmass is warmed, prompting rising movement and lower weight. This prompts wind stream from ocean to arrive at low heights. Henceforth we need a prediction over rainfall so as to it will be beneficial for draught, agriculture system etc.

**References**

[1]. Yajnaseni Dash, Saroj Kanta Mishra, Sandeep Sahany, Bijaya Ketan Panigrahi, Indian Summer Monsoon Rainfall Prediction: A Comparison of Iterative and Non-Iterative Approaches S1568-4946(17)30537-9.

[2]. Neelam Mishra, Hemant Kumar Soni, Development and Analysis of Artificial Neural Network Models for Rainfall Prediction by Using Time-Series Data, I.J. Intelligent Systems and Applications, 2018, 1, 16-23 Published Online January 2018 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2018.01.03

[3]. Isaac Mugume, Charles Basalirwa, Daniel Waiswa, Mary Nsabagwa, Triphonia Jacob Ngailo, Joachim Reuder, Schattler Ulrich, Musa Semujju, A Comparative Analysis of the Performance of COSMO and WRF Models in Quantitative Rainfall Prediction, World Academy of Science, Engineering and Technology International

Journal of Marine and Environmental Sciences Vol:12, No:2, 2018

[4]. Shubhendu Trivedi e. al. The Utility of Clustering in Prediction Tasks Centre for Mathematics and Cognition gran 2011.

[5]. Hakan Tongal et. al. Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake water levels Springer-Verlag Berlin Heidelberg 2013 [6]. Andrew Kusiak et. al./ Modeling and Prediction of