Abstract. Rain prediction is an important topic that continues to gain attention throughout the world. Rain has a large impact on many aspects of human life, both socially and economically, for example in agriculture, health, and transportation. Rain also contributes to natural disasters such as landslides and floods. These wide-ranging impacts motivate building a model to understand and predict rain so that early warnings can be issued in various fields such as agriculture and transportation. This research aims to build a rain prediction model using a rule-based machine learning approach that exploits historical meteorological data. The experiment using the J48 method achieved up to 77.8% accuracy on the training model and gave accurate prediction results of 86% when tested against actual weather data from 2020.
In this research, the problem of predicting significant wave height was solved using wave parameters such as wave spectral density. Significant wave height prediction helps both wave power converters and onboard navigation systems. This research optimizes the wave parameters for fast and efficient wave height prediction; Particle Swarm Optimization (PSO) is used as the feature reduction technique. The reduced feature set is then used to predict wave height with neural network, extreme learning machine (ELM), random forest, and support vector machine forecasting techniques. In this work, performance evaluation metrics such as MSE, RMSE, and MAE are reduced, giving better classification performance than the methodology implemented in existing research. The experimental results show that the proposed algorithm gives the best prediction results with the PSO feature reduction technique combined with ELM forecasting.
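The PSO component above can be illustrated with a minimal sketch. The block below is a generic continuous PSO minimizing a toy objective (the sphere function), not the paper's binary feature-mask variant; all parameter values (inertia `w`, acceleration coefficients `c1`, `c2`, bounds) are illustrative assumptions.

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, seed=0):
    """Minimal Particle Swarm Optimization: each particle tracks its own
    best position; the swarm tracks a global best."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity update: pull toward personal best and global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the feature-reduction setting, each particle would instead encode a binary feature mask and the fitness would be the forecaster's validation error on that feature subset.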
In this study, the following parameters are used as input: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age. A number of machine learning and statistical techniques can be used to predict diabetes. Based on the extant literature, we settled on four of the most widely used machine learning classification algorithms (Random Forest (RF), KNN, Naïve Bayes, and J48) and ensembled/combined them into one using a base learner. The following section describes these classification techniques and their specific roles in this research study.
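The combination step can be sketched as a simple majority vote over the four classifiers' predicted labels. This is an illustrative sketch, not the authors' exact base-learner scheme; the label lists in the usage example are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one prediction per sample.
    `predictions` is a list of equal-length label lists, one per classifier."""
    combined = []
    for labels in zip(*predictions):  # labels from all classifiers for one sample
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Hypothetical outputs from three of the four classifiers on three samples:
rf  = ['pos', 'neg', 'pos']
knn = ['pos', 'pos', 'neg']
nb  = ['neg', 'pos', 'pos']
print(majority_vote([rf, knn, nb]))
```

A stacking base learner would instead train a meta-classifier on these per-classifier outputs rather than voting directly.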
Abstract—This paper presents a machine learning (ML) based model to predict the diffraction loss around the human body. In practice, it is not feasible to measure the diffraction loss for every possible body rotation angle, build, and line of sight (LoS) elevation angle. A diffraction loss variation prediction model based on a non-parametric learning technique, the Gaussian process (GP), is introduced. The analysed results show that 86% correlation and a normalised mean square error (NMSE) of 0.3 on the test data are achieved using only 40% of the measured data. This allows a 60% reduction in the measurements required to obtain a well-fitted ML loss prediction model, and it confirms the model's generalizability to non-measured rotation angles.
R. Sharmila et al. proposed using a non-linear classification algorithm for heart disease prediction, employing big data tools such as the Hadoop Distributed File System (HDFS) and MapReduce together with SVM over an optimized attribute set. The work investigated the use of different data mining techniques for predicting heart disease. It suggests storing large data across different HDFS nodes and executing the SVM-based prediction algorithm on more than one node simultaneously; running SVM in this parallel fashion yielded better computation time than sequential SVM. Jayami Patel et al. suggested heart disease prediction using data mining and machine learning algorithms. The goal of that study is to extract hidden patterns by applying data mining techniques; on the UCI data, J48 achieved the highest accuracy, outperforming LMT.
Table 5 shows the confusion matrix for BN and three other ML techniques, DT, kNN and SVM, based on the target classes 'no flood' (NO) and 'flood' (YES), using both the normal data and the SMOTE data. Several observations were made in the experiment. For example, the best kNN accuracy was produced when k was set to 1, while other values of k such as 3, 5 and 7 produced slightly worse accuracy; in other words, as the value of k increased, the accuracy achieved by kNN decreased. Besides, the water level plays an important role as a feature for flood prediction, since it forms the main rule of the DT ('no flood' if the water level is lower than or equal to 2182 cm and 'flood' if the water level is above 2182 cm) and points directly to the target class in BN. Meanwhile, BN continues to evolve over time and has the potential to be improved further. One survey claimed that BN has many advantages over other classification techniques for solving real-world problems; it explained and discussed every available discrete BN classifier, categorized them into three groups based on factorization, and stated that BN can be organized hierarchically from the simplest algorithm, such as naive Bayes, to the most complex, such as the Bayesian multinet. However, other work stated that the continuous development of machine learning algorithms may in time expand machine learning applications in the field of hydrology, especially flood risk assessment.
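The DT rule quoted above is simple enough to state directly as code; a minimal sketch, with the 2182 cm threshold taken verbatim from the text:

```python
def predict_flood(water_level_cm):
    """The main DT split reported in the text: 'no flood' (NO) if the water
    level is at or below 2182 cm, 'flood' (YES) otherwise."""
    return 'NO' if water_level_cm <= 2182 else 'YES'
```

A full decision tree would refine this with further splits on the remaining features, but the water-level threshold alone captures the dominant rule.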
must know before going to visit it. This paper presents rating prediction based on user reviews, with a step-by-step methodology for user-based review sentiment analysis. The first, a machine learning approach, tackles the problem as a text classification task employing a supervised classifier, the Naive Bayes algorithm, as it is well suited to text classification. The second, a lexicon-based method, uses a dictionary of words with assigned scores to calculate the polarity of a text-based review and decide whether the review is positive or negative. In this paper, we show that combining the lexicon and machine learning approaches improves the accuracy of Naïve Bayes classification by 5% to 10%, depending on review length and dataset size.
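The lexicon-based step can be sketched as follows. The mini-dictionary below is hypothetical; real systems use full sentiment lexicons such as SentiWordNet:

```python
# Hypothetical mini-lexicon; a real system would load a full sentiment dictionary.
LEXICON = {'good': 1, 'great': 2, 'excellent': 2,
           'bad': -1, 'terrible': -2, 'poor': -1}

def lexicon_polarity(review):
    """Sum the lexicon scores of the review's words; a positive total
    classifies the review as positive, otherwise negative."""
    score = sum(LEXICON.get(w, 0) for w in review.lower().split())
    return 'positive' if score > 0 else 'negative'
```

In the combined approach, this polarity signal would be fed alongside the text features into the Naive Bayes classifier.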
crimes in India have seen a spike. There is no single cause of criminal activity: society, cultural factors, different family systems, political influences and law enforcement can all contribute to an individual's criminal behaviour, and crime falls into various categories. To address this problem in the police sector, we must predict the crime rate using machine learning techniques. The aim is to investigate machine learning based techniques for crime rate prediction that yield the best accuracy, and to explore the applicability of data techniques to crime prediction with particular attention to the dataset. The dataset is analysed with a supervised machine learning technique (SMLT) to capture several kinds of information, such as variable identification, univariate analysis, bivariate and multivariate analysis, and missing value treatment; data validation, data cleaning/preparation and data visualization are performed on the entire dataset. Our analysis provides a comprehensive guide to sensitivity analysis of model parameters with regard to crime rate prediction performance, by comparing the accuracy of supervised machine learning classification algorithms.
S. Veenadhari et al. (2014): this research work forecasts crop yield based on climatic parameters. The authors adopted a decision tree classifier to classify crops on the basis of climatic parameters, and the algorithm was used to find the most influential climatic parameter on the yields of selected crops in selected districts of Madhya Pradesh. For example, for the soybean crop the most influential parameter was cloud cover, and for the wheat crop it was minimum temperature. Many other parameters, such as production area and pesticides, could be considered for crop yield prediction, but in this study only climatic parameters were used.
Abstract: Machine learning has played a major role in recent years in image detection, spam recognition, speech command recognition, product recommendation and medical diagnosis. Current machine learning algorithms help enhance security alerts, ensure public safety and improve medical care; machine learning systems also provide better customer service and safer automobile systems. In this paper we discuss the prediction of future housing prices generated by machine learning algorithms. To select a prediction method, we compare and explore various candidate methods. We adopt lasso regression as our model because of its adaptable, probabilistic approach to model selection. Our results show that this approach is successful and can produce predictions comparable with other house price prediction models. Beyond housing value indices, the development of a housing price prediction model can also support the development of real estate policy schemes. This study uses machine learning algorithms as a research method to develop housing price prediction models: we build models based on algorithms such as XGBoost, lasso regression and neural networks and compare their classification accuracy, and we then recommend a housing price prediction model to help a house seller or real estate agent make better-informed house valuations. The experiments show that the lasso regression algorithm, in terms of accuracy, consistently outperforms the alternative models in housing price prediction.
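Lasso's model-selection behaviour comes from the soft-thresholding operator, which shrinks coefficients and sets small ones exactly to zero. A minimal sketch; the closed form below holds for an orthonormal design, and `lam` (the regularization strength) is illustrative:

```python
def soft_threshold(beta_ols, lam):
    """Lasso solution for one coefficient under an orthonormal design:
    shrink the OLS estimate toward zero by lam, setting it exactly to
    zero when |beta_ols| <= lam (this is what performs model selection)."""
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0
```

For a general design matrix, coordinate descent applies this same operator repeatedly, one coefficient at a time.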
Image processing is a vast domain, and the wide range of applications it provides is worth the research done in this field. This paper focuses on how employing a machine learning approach can simplify the task of image processing, both in terms of processing complexity and output quality. In this paper, we use two aspects of machine learning, namely regularization and clustering (an unsupervised approach [Michael and Bishop 2012]), and some basic techniques such as bilateral filtering [Tomasi and Manduchi, 1998] to compare the performance of our approach. We primarily aim at solving the problem of rain removal and, secondarily, Gaussian noise: the former is a structured form of noise and the latter an unstructured noise pattern. We have successfully brought out the differences from other methods used and the dominance of our approach over them.
In this paper we have compared three machine learning techniques with a LSR model for predicting software project effort. These techniques have been compared in terms of accuracy, explanatory value and configurability. Despite finding that there are differences in prediction accuracy levels, we argue that it may be other characteristics of these techniques that will have an equal, if not greater, impact upon their adoption. We note that the explanatory value of both estimation by analogy (case-based reasoning) and rule induction gives them an advantage when considering their interaction with end-users. We also have found that problems of configuring neural nets tend to rather counteract their superior performance in terms of accuracy. This preliminary research has shown the need for further investigation, particularly in finding appropriate configuration heuristics for neural nets. Whilst some heuristics have been published (e.g. Walczak and Cerpa,
The reinforcement learning (RL) approach enables an agent to learn a mapping from states to actions by trial and error so that the expected cumulative reward in the future is maximized. RL is powerful since a learning agent is not told which action it should take; instead it has to discover through interactions with the system and its environment which action yields the highest reward.
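The trial-and-error update described above can be sketched with minimal tabular Q-learning. The chain environment, reward scheme and hyperparameters below are illustrative assumptions, not from the source:

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a chain MDP: states 0..n_states-1, actions
    0 = left and 1 = right, reward 1 only on reaching the rightmost state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy: explore with probability eps, else exploit
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-update: move toward reward plus discounted best future value
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```

After training, the learned values prefer moving right in every state, even though the agent was never told which action to take; it discovered this purely from the reward signal.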
We take a social media check-in dataset and visualize the users' check-in locations. There is a striking check-in pattern across prominent places such as the CBD, the airport and tech parks. Users' check-in activities offer a unique window into their lives, and the distribution patterns mirror their interests and preferences. Recently, the value of check-in data has been demonstrated in many applications, including but not limited to mobile advertising, promotion recommendation, traffic management and social surveillance. Behind check-in data processing, location prediction is a key task. However, it is very challenging due to the inherent attributes of check-in data. First, sparsity: there is a large space of places users could visit, but in reality they cover only a small set of them. Second, heterogeneity: location data comprises different kinds of features, i.e., location, content and temporal information.
Regression is a statistical approach for finding the relationship between variables. In machine learning, it is used to predict the outcome of an event based on the relationships between variables learned from the dataset. Linear regression is one type of regression used in machine learning.
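A minimal sketch of simple linear regression via ordinary least squares, using the closed form for one predictor:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept).
    slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

Fitting data that lies exactly on the line y = 2x + 1 recovers slope 2 and intercept 1; with noisy data the same formulas give the least-squares line.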
In , a method was proposed to forecast wind power in the short term, based on an evolutionary algorithm for the automated specification of neural networks (NN) combined with nearest neighbour search. In the same work, the forecast results were compared with two other algorithms based on particle swarm optimisation (PSO) and differential evolution. The proposed method used weather data combined with historical wind power data from several wind farms in Germany, and the system was tested on data from 2004 to 2007 with a time step of 1 hour.
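The nearest-neighbour-search component can be sketched as a simple analogue forecaster: find the historical window most similar to the recent past and predict the value that followed it. This is an illustrative sketch, not the cited paper's exact method; the window length is an assumed parameter:

```python
def knn_forecast(history, window=3):
    """1-nearest-neighbour forecast: find the past window with the smallest
    squared distance to the most recent window, and return the value that
    followed that past window."""
    target = history[-window:]
    best_d, best_next = float('inf'), None
    # candidate windows are those followed by at least one observed value
    for i in range(len(history) - window):
        cand = history[i:i + window]
        d = sum((a - b) ** 2 for a, b in zip(cand, target))
        if d < best_d:
            best_d, best_next = d, history[i + window]
    return best_next
```

For the history [1, 2, 3, 4, 1, 2, 3], the recent window [1, 2, 3] matches the opening window exactly, so the forecast is the value that followed it.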
Feature Extraction: This is a general method in which one transforms the current input space into a smaller-dimensional subspace that preserves the most pertinent information. Feature extraction can reduce complexity and give a much simpler representation of the variables in a feature space, such as a direct linear combination of the original variables. The most commonly used approach for feature extraction is Principal Component Analysis (PCA), introduced by Karl Pearson, and various iterations and variants of PCA have since been proposed. PCA is a non-parametric method for extracting the most pertinent information from a redundant set of data: a linear transformation of the data that minimizes redundancy and increases information efficiency. Feature Selection: In this method, a subset of the current data features is selected as input for a learning algorithm. The subset chosen is the one with the fewest dimensions that gives the best learning accuracy. Many studies have been performed to identify the best feature extraction and selection approaches. Approaches used in feature selection include RELIEF, CMIM, BW-ratio, the correlation coefficient, GA, SVM-RFE, non-linear PCA, Independent Component Analysis, and correlation-based feature selection.
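For concreteness, a minimal PCA sketch for the two-dimensional case, where the top eigenvector of the sample covariance matrix has a closed form:

```python
import math

def first_pc(points):
    """First principal component of 2-D data: the top eigenvector of the
    sample covariance matrix (closed form for the 2x2 symmetric case)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # sample covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # largest eigenvalue of a 2x2 symmetric matrix
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # corresponding eigenvector (handle the diagonal case b == 0)
    vx, vy = (b, lam - a) if abs(b) > 1e-12 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm
```

Projecting the data onto this direction keeps the axis of maximum variance; points lying along y = x, for instance, give the component (1/√2, 1/√2).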
Decision Tree is a supervised learning algorithm used to represent decisions made based on conditions; it is used for both classification and regression. A decision tree is always constructed from top to bottom. The first node at the top is called the root node, the last nodes are called leaf nodes, and internal nodes sit between the root node and the leaf nodes. The internal nodes are split based on some condition and, finally, the decisions are made at the leaves. In real applications, as the number of variables increases, the tree grows larger and the algorithm becomes complex. There are two types of decision tree: classification trees and regression trees. A classification tree is used to classify the dataset so that the data is easy to analyse, but it predicts class labels rather than continuous values; a regression tree is mainly used to predict continuous values. The growth of the tree depends on factors like:
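The condition at an internal node is found by searching for the split that best separates the classes; a minimal single-feature sketch using Gini impurity (binary 0/1 labels assumed):

```python
def best_split(xs, ys):
    """Find the threshold on a single feature that minimizes the weighted
    Gini impurity of the two child nodes: the split an internal
    decision-tree node would choose. Labels in ys are 0/1."""
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)          # impurity of a binary node

    best_t, best_score = None, float('inf')
    for t in sorted(set(xs)):           # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t
```

A full tree-growing algorithm such as CART repeats this search recursively on each child node, over every feature, until a stopping criterion is met.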
progression modeling, personalized medication, integrating genetics, and predictive modeling. The first 24 hours of patient information are often the most predictive of hospital mortality. In , the author discusses an overlapping and hierarchical social clustering model (OHSC) designed to classify vehicles into different social clusters by exploring the social relationships between them, which are measured by whether the vehicles are driven or parked in the same small area simultaneously. Based on the result of OHSC, an SBL algorithm is used to provide global location information for vehicular networks by predicting vehicle locations, even without GPS devices.
Vector Machine based method to predict protein functions using sequence-derived properties, with accuracy ranging from 94.23% to 100%. Statnikov A, Wang L, and Aliferis CF also carried out a comprehensive comparison of random forests and support vector machines, and suggested that both on average and in the majority of microarray datasets, random forests are outperformed by SVMs. Cai CZ, Wang WL, Sun LZ, and Chen YZ also proposed protein function classification by an SVM approach, in which the accuracy of the classification model for the protein classes was found to be in the range of 84-96%. This suggests the importance of SVM in the classification of protein functional classes and its potential application in protein function prediction.