
Available online at http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=12&IType=2 ISSN Print: 0976-6480 and ISSN Online: 0976-6499

DOI: 10.34218/IJARET.12.2.2020.007 © IAEME Publication Scopus Indexed

A HYBRID ALGORITHM IN BIG DATA FRAMEWORK FOR PEST CLASSIFICATION

R.P.L. Durgabai

Research Scholar, Department of Computer Science, Sri Padmavati Mahila Visvavidyalayam, Tirupati, India

P. Bhargavi

Assistant Professor, Department of Computer Science, Sri Padmavati Mahila Visvavidyalayam, Tirupati, India

S. Jyothi

Professor, Department of Computer Science, Sri Padmavati Mahila Visvavidyalayam, Tirupati, India

ABSTRACT

Agriculture is an intuitive practice that is usually passed from one generation to the next. Many problems affect crop growth, and pest attack is among the most complex of these threats. The major causes of pest attack include unpredictable weather conditions, soil quality and natural calamities; some attacks also result from poor seed quality. Crop protection has become a foremost challenge in agriculture, and predicting pest attack is one of its most important parts. With rising demand for food and a changing climate, policy makers and industry experts are turning to technologies such as Big Data for assistance. For analysis, the collected data is integrated with larger volumes of data to identify patterns and build models. Machine learning (ML) algorithms are trained on the data set and then used for prediction. ML classifies and predicts the types of pest attack on crops under different climatic conditions. Timely prediction and classification of pests helps farmers understand and protect their crops effectively. In this context, a hybrid machine learning algorithm is proposed, derived from a comparative study of individual algorithms, to attain accurate pest classification on a cotton data set. The proposed approach improves performance compared with the individual algorithms and helps to analyse the classification in detail.

Key words: Machine Learning, Hybrid Algorithm, Linear Regression, Decision Tree


Cite this Article: R.P.L. Durgabai, P. Bhargavi and S. Jyothi, A Hybrid Algorithm in Big Data Framework for Pest Classification, International Journal of Advanced Research in Engineering and Technology (IJARET), 12(2), 2021, pp. 76-84.

http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=12&IType=2

1. INTRODUCTION

Farmers' livelihoods depend on the monsoon rains. In India, 70% of the people depend on farming either directly or indirectly, and agriculture accounts for around 58% of employment. The south-west monsoon lasts four months, from June to September, and its climatic conditions decide the fate of the Kharif crop. A good rainy season yields a plentiful crop, which benefits farmers.

Crop cultivation depends heavily on the monsoon rains because irrigation is done manually. Good monsoon rains therefore benefit the economy, whereas a weak monsoon leads to crop failure and affects the economy negatively. A good monsoon keeps food inflation in check by improving the availability of food produce. Conversely, in a drought, prices soar significantly. Failure of the monsoon also has a huge impact on farmers' lives, since most farmers rely on a good monsoon crop to earn their living and to repay the debts they have incurred. The root cause of a pest attack may not be identifiable, but the weather conditions under which the crop becomes infested can be observed; classifying pests from those conditions is the problem addressed with the current dataset. The cotton pest dataset is handled using big data techniques.

Big data [3] is a term for enormous volumes of data that grow rapidly. Extracting information from the collected data, across all possible relations, is called data analytics; considering these relations makes the data appear larger still. Data analytics makes it easier to deal with big data by applying machine learning algorithms [2].

Machine learning algorithms [1] are classified into two groups, by learning style or by function. Machine learning can retrieve hidden information that supports decision making, which makes it useful in data mining applications. Its use for classifying crop pests is a new application. Most prediction techniques in agriculture are categorized as data classification approaches. Learning methods such as rule-based learning, case-based reasoning, linear regression (LR) and decision trees (DT) can be combined to form a hybrid machine learning method for better performance. In this study, a novel hybrid decision tree approach is used to form a new hybrid machine learning algorithm that will be useful to the agriculture department for predicting pests of the cotton crop. The hybrid machine learning algorithm is implemented in Python.

2. LITERATURE SURVEY

Yanbo Huang et al. [4] reviewed the use of soft computing techniques in agricultural and biological engineering, including soil and water applications that manage and support precision agriculture.

Basant Agarwal et al. [5] proposed a hybrid method for anomaly detection that combines entropy and SVM. It derives entropy values dynamically and uses a fixed range to decide whether an attack will occur. An intrusion detection dataset was used to evaluate the method, and in experimental analysis it detected attacks with high accuracy.

Shuangyin Liu et al. [6] proposed a hybrid approach that combines a genetic algorithm with support vector regression to predict aquaculture water quality. The data were collected from aquatic factories in China.


Najeebullah et al. [7] proposed a machine learning technique that combines feature selection and regression to predict wind power, with optimization based on an enhanced particle approach and a hybrid neural network.

Dieu Tien Bui et al. [8] introduced a hybrid model that combines particle swarm optimization with a neural fuzzy inference system for tropical forest fire susceptibility modelling, which helps in planning and managing fire-prone forest areas.

Wei Chen et al. [9] applied a novel hybrid artificial intelligence approach based on the rotation forest ensemble and naive Bayes tree classifiers; the results can be used in planning and managing landslide-prone areas to prevent damage caused by natural disasters.

K. Polaraju et al. [10] applied various data mining techniques to predict heart disease and found that multiple linear regression analysis predicts the chance of heart disease accurately.

Katherine M. Ransom et al. [11] used a boosted regression tree model to produce a 3D map of nitrate concentration. A hybrid multi-modelling method, serving as a numerical model, was used to predict outputs. The redox characteristics and the field-scale unsaturated zone N flux were the most important factors. Nitrate concentrations below 2 mg/L NO3-N generally conformed to the basin subregion, whereas concentrations in the eastern alluvial fans subregion exceeded 10 mg/L NO3-N.

Vaibhav Kumar et al. [12] developed a hybrid deep learning model that combines a deep neural network with fuzzy logic to predict the behaviour of a dependent variable.

Anitha Avula V. et al. [13] presented a hybrid machine learning model for medical datasets to improve prediction accuracy.

3. PROBLEM DOMAIN

The problem domain is the area of expertise or application that must be studied in order to solve a problem. It focuses on the topics of particular interest, apart from everything else. The present problem is to classify pests more accurately based on weather conditions.

3.1. Description of Cotton Pests Dataset

The data was collected from the Department of Entomology at the agricultural university in Guntur, A.P., India. The dataset covers many pests; from these, attacks by the six major pests of the cotton crop are predicted, and the predicted values are used to classify pests under different weather conditions. The dataset contains weather parameters and the pests that occurred in A.P., as shown in Table 1. The parameters are as follows:

• Temperature high (°C): the highest (maximum) temperature recorded (Temp.H).
• Temperature low (°C): the lowest (minimum) temperature recorded (Temp.L).
• Humidity: the relative humidity recorded in the morning (RH I (%)) and in the evening (RH II (%)).
• Rainfall (mm): the amount of rainfall recorded each week.
• Pest types (Pest): the six types of pest.

Table 1 Features of Cotton Crop Dataset

S.No.   Parameters
1       Pest
2       Temp.H
3       Temp.L
4       RH I
5       RH II
6       Rainfall


4. METHODOLOGY

Methodology is the general research approach: it outlines how the study is to be carried out and, among other things, identifies the techniques to be used.

4.1. Machine Learning

Machine learning provides automatic tools for collecting, integrating and analysing data. ML algorithms save time, particularly when machine learning is combined with cloud computing. A combination of supervised and unsupervised machine learning can reveal what one's real target audience looks like, what its major patterns of behaviour are, and what its preferences are.

4.1.1 Linear Regression

Linear regression [14] models a linear relationship between an input variable (X) and a single output variable (Y), where Y is calculated as a linear combination of the input variable X.

$$Y = \theta_1 + \theta_2 \cdot X$$

where the terms of the training model are as follows:

X: training data input
Y: data labels
θ1: intercept
θ2: coefficient of X

When training the model, the goal is to find the line that best predicts the value of Y for a given value of X. Once the best θ1 and θ2 values are found, we have the best-fit line.
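For a single input variable, the best θ1 and θ2 can be computed in closed form by ordinary least squares. The following is a minimal sketch under that assumption; the function names `fit_line` and `predict` and the toy data are illustrative only and are not taken from the paper or the cotton dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = theta1 + theta2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta2 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    theta1 = mean_y - theta2 * mean_x
    return theta1, theta2


def predict(x, theta1, theta2):
    """Evaluate the fitted line at x."""
    return theta1 + theta2 * x


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta1, theta2 = fit_line(xs, ys)
```

With several weather inputs, as in the cotton dataset, the same idea extends to multiple linear regression with one coefficient per input.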

4.1.2 Decision Tree

A decision tree [15] is a supervised learning algorithm in which the training data specifies both the input and the corresponding output. The tree consists of a root node, decision (internal) nodes and leaf nodes. An internal node represents a feature or attribute, a branch represents a decision rule, and a leaf node represents an outcome; the topmost node is the root node, as shown in the example in Figure 1.

There are two types of decision tree:

• Classification trees: the decision variable is categorical.
• Regression trees: the decision variable is continuous.

This work considers regression together with the ID3 decision tree algorithm. Understanding the algorithm requires the following concepts.


Entropy

Entropy, also known as Shannon entropy and denoted H(S) for a finite set S, is a measure of the randomness in the data. It indicates how predictable an event is: low values imply less uncertainty, while high values imply more uncertainty.

$$H(S) = \sum_{x \in X} p(x) \log_2 \frac{1}{p(x)}$$
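The entropy formula can be evaluated directly from class counts. The sketch below assumes that p(x) is estimated as the relative frequency of each class label in S; the function name `entropy` and the example labels are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S) in bits, with p(x) taken as class frequency."""
    n = len(labels)
    # p * log2(1 / p) with p = count / n, so 1 / p = n / count.
    return sum((count / n) * log2(n / count) for count in Counter(labels).values())


# Hypothetical label sets: an even two-class mix and a pure set.
mixed = entropy(["aphids", "aphids", "thrips", "thrips"])
pure = entropy(["aphids", "aphids", "aphids"])
```

An even two-class mix gives one bit of entropy, while a set containing a single class gives zero, which matches the low and high uncertainty cases described above.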

Information Gain

Information gain, which is closely related to the Kullback-Leibler divergence and is denoted IG(S, A) for a set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables.

$$IG(S, A) = H(S) - H(S, A)$$
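As a hedged illustration, the sketch below reads H(S, A) as the size-weighted average entropy of the subsets obtained by splitting S on attribute A, which is the usual ID3 interpretation; the paper does not spell this out. The attribute names and values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits, using class frequencies as probabilities."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """IG(S, A) = H(S) - H(S, A), with H(S, A) the weighted subset entropy."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical discretised weather records and pest labels.
rows = [
    {"rainfall": "low", "humidity": "high"},
    {"rainfall": "low", "humidity": "low"},
    {"rainfall": "high", "humidity": "high"},
    {"rainfall": "high", "humidity": "low"},
]
labels = ["aphids", "aphids", "thrips", "thrips"]
gain_rainfall = information_gain(rows, labels, "rainfall")
gain_humidity = information_gain(rows, labels, "humidity")
```

In this toy data, rainfall separates the two pests perfectly and so has the larger gain, while humidity carries no information about the label.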

4.2. Comparison of Predictive Model

The predictive model is built to classify pest occurrences under different climatic conditions using a linear decision tree; to build it, the best-performing classification and regression models are first identified.

Three classification algorithms, K-Nearest Neighbour (K-NN), Naive Bayes and decision tree, are applied to the dataset to find the best model based on accuracy. Figure 1 shows the accuracy of each classification algorithm together with its model prediction time in seconds. On this basis, the decision tree is considered the best-fit model, giving good accuracy in very little time.

Figure 1. Accuracy on classification algorithms

The second analysis applies six regression models to the dataset: linear regression, Elastic Net, Stochastic Gradient Descent (SGD), LASSO, Ridge and Least Angle Regression (LAR). Linear regression is considered the best-fit model because it has the minimum mean square error, as shown in Figure 2.

[Figure 1 chart: accuracy and prediction time in seconds for K-NN, Naive Bayes and Decision Tree]

Figure 2. Mean Square Error on Regression models
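Selecting a regression model by minimum mean square error can be sketched as follows. The model names and predictions are hypothetical placeholders chosen only to show the selection step; they are not the paper's results.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed values and predictions from two placeholder models.
y_true = [2.0, 4.0, 6.0]
predictions = {
    "model_a": [2.0, 4.0, 7.0],  # one error of size 1
    "model_b": [0.0, 4.0, 6.0],  # one error of size 2
}

scores = {name: mean_squared_error(y_true, pred) for name, pred in predictions.items()}
# The best-fit model is the one with the smallest mean square error.
best_model = min(scores, key=scores.get)
```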

4.3. Hybrid Model for Pest Classification on Cotton Crop

The hybrid model is applied to the dataset to predict the pest using the linear regression algorithm and to classify the pest based on weather conditions using the decision tree classifier. The resulting tree is easy to understand even for a non-expert. The model is assessed on all features, and the results are tested by generating and visually analysing the tree structure, which supports sound decisions during interpretation. In this study, the models were trained using all the weather parameters: high temperature (Temp.H), low temperature (Temp.L), morning relative humidity (RH I), evening relative humidity (RH II) and rainfall. Prediction and classification of cotton crop pests based on climatic conditions is modelled and tested with samples covering twenty-five years. The linear decision tree combines the two best models to predict the dataset more accurately. The hybrid model is implemented as shown in the flow chart in Figure 2.

4.4. Hybrid Machine Learning Algorithm

Algorithm: Generate a hybrid decision tree from the training tuples of data partition D.

Input: Data partition D, a set of training tuples and their associated class labels; attribute list, the set of candidate attributes.

Output: A hybrid model decision tree.

Method:

• Create the root node of the tree.
• If all tuples in the dataset are positive, return a leaf node labelled "positive".
• Else, if all tuples are negative, return a leaf node labelled "negative".
• Calculate the entropy of the current state, H(S).
• For each attribute X, calculate the entropy H(S, X).
• Select the attribute with the maximum information gain IG(S, X).
• Remove the attribute that offers the highest IG from the set of attributes.
• Repeat the process until all attributes run out or every branch of the decision tree ends in a leaf node.
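The tree-building steps listed above follow the general shape of ID3. The sketch below is a plain ID3 on categorical attribute values, generalised from positive/negative labels to any class labels; it is not the authors' implementation, and it omits the linear regression stage of the hybrid model. All names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits, using class frequencies as probabilities."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after the split."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    """Return a nested dict {attribute: {value: subtree}} or a leaf label."""
    if len(set(labels)) == 1:          # all tuples share one class: leaf
        return labels[0]
    if not attributes:                 # attributes exhausted: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]  # remove chosen attribute
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3([r for r, _ in subset],
                                [l for _, l in subset], remaining)
    return tree


def classify(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


# Hypothetical discretised weather records and pest labels.
rows = [
    {"rainfall": "low", "humidity": "high"},
    {"rainfall": "low", "humidity": "low"},
    {"rainfall": "high", "humidity": "high"},
    {"rainfall": "high", "humidity": "low"},
]
labels = ["aphids", "aphids", "thrips", "thrips"]
tree = id3(rows, labels, ["rainfall", "humidity"])
```

For continuous weather readings such as those in the cotton dataset, a practical tree would split on numeric thresholds instead of discrete values; the discretisation here only keeps the sketch short.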


Figure 2. Flow Chart for implementing hybrid algorithm

5. EXPERIMENTAL RESULTS

First, the packages required for LR and DT are imported into the Python program. Python is a simple and well-designed language for machine learning. It has clear syntax and easy text manipulation, a community of professional developers provides a good amount of documentation, and an interactive shell makes it possible to examine and view a program while writing its code.

Second, the cotton pest data is loaded into MongoDB and then retrieved into Python using a MongoDB connector.

The weather inputs are stored in the variable X and the pests in the variable Y. After the data is split into training and test sets, the pest attack is predicted and the pest is then classified based on the given weather conditions. The linear decision tree is used to classify the pests precisely.
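The separation into X and Y and the train/test split can be sketched in plain Python as follows. The records, the 25% test fraction and the seed are assumptions made for illustration; the paper does not state the split ratio, and its data comes from MongoDB rather than an inline list.

```python
import random


def split_train_test(X, Y, test_fraction=0.25, seed=42):
    """Shuffle row indices reproducibly and split X and Y consistently."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    Y_train = [Y[i] for i in train_idx]
    Y_test = [Y[i] for i in test_idx]
    return X_train, X_test, Y_train, Y_test


# Hypothetical weekly records: Temp.H, Temp.L, RH I, RH II, Rainfall, pest code.
records = [
    (32.1, 24.0, 85.0, 60.0, 12.0, 0),
    (30.5, 22.5, 88.0, 65.0, 40.0, 1),
    (29.0, 21.0, 90.0, 70.0, 55.0, 2),
    (33.4, 25.1, 80.0, 55.0, 0.0, 3),
    (31.2, 23.3, 86.0, 62.0, 20.0, 4),
    (28.7, 20.4, 92.0, 72.0, 60.0, 5),
    (34.0, 26.0, 78.0, 50.0, 5.0, 0),
    (30.0, 22.0, 87.0, 66.0, 35.0, 1),
]
X = [record[:5] for record in records]   # weather inputs
Y = [record[5] for record in records]    # pest codes
X_train, X_test, Y_train, Y_test = split_train_test(X, Y)
```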

Figure 3. Hybrid Decision Tree based on rainfall

The hybrid decision tree classified the dataset using all the independent variables: X0 = maximum temperature (Temp.H), X1 = minimum temperature (Temp.L), X2 = morning relative humidity (RH I), X3 = evening relative humidity (RH II) and X4 = rainfall. The dependent variable is the type of pest, encoded as 0 = Aphids, 1 = Jassids, 2 = Thrips, 3 = Whitefly, 4 = Leafhopper, 5 = Pink bollworm. Gini impurity is the criterion used to choose splits that reduce impurity. The tree yields the following rules for classifying pest attacks under different weather conditions.
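For reference, the Gini criterion mentioned above can be computed as one minus the sum of squared class proportions. This is a minimal sketch of the standard definition; the function name and sample label lists are illustrative, with integers standing for the pest codes 0 to 5.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Hypothetical nodes: one pure, one with two pests, one with all six pests.
pure_node = gini_impurity([3, 3, 3])
two_pests = gini_impurity([0, 0, 1, 1])
six_pests = gini_impurity([0, 1, 2, 3, 4, 5])
```

A pure node has impurity zero, and the value grows as the pests in a node become more evenly mixed, so a split is preferred when it produces child nodes with lower impurity.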

Rule 1: If the rainfall is less than or equal to 41.1 cm, then there is a heavy attack of pests.

Rule 2: If Rule 1 is true and the maximum temperature is less than or equal to 25.9 degrees, then there is an attack of pests.

Rule 3: If Rule 1 is false and the rainfall is less than or equal to 43.1 cm, then there is a heavy effect of concentrated pests.

Rule 4: If the maximum temperature is less than 29.88 degrees, then there is an effect of pests.

Rule 5: If the minimum temperature is less than 14.09 degrees, then there is an effect of pests.

Rule 6: If the relative humidity in the morning is less than or equal to 54 percent, then there is an effect of pests.

Rule 7: If the relative humidity in the evening is less than or equal to 64 percent, then pests attack the crop.

Rule 8: If the rainfall is less than or equal to 46 cm, then pests attack the crop.

Across all the rules obtained, attacks of aphids, jassids, thrips, whitefly, leafhopper and pink bollworm are observed under different weather conditions.
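The eight rules can be checked mechanically against a weather observation. The sketch below encodes only the thresholds as stated in the rules; the dictionary keys, the function name and the sample observations are hypothetical, and the rainfall value is assumed to be in the same unit as the rule thresholds. The rules do not say which pest each condition indicates, so the function reports only which rules fire.

```python
def matched_rules(obs):
    """Return the numbers of the rules whose conditions hold for one observation."""
    rule1 = obs["rainfall"] <= 41.1
    conditions = {
        1: rule1,
        2: rule1 and obs["temp_h"] <= 25.9,
        3: (not rule1) and obs["rainfall"] <= 43.1,
        4: obs["temp_h"] < 29.88,
        5: obs["temp_l"] < 14.09,
        6: obs["rh_morning"] <= 54,
        7: obs["rh_evening"] <= 64,
        8: obs["rainfall"] <= 46,
    }
    return [number for number, holds in conditions.items() if holds]


# Hypothetical observations, not taken from the cotton dataset.
dry_cool = {"rainfall": 40.0, "temp_h": 25.0, "temp_l": 14.0,
            "rh_morning": 50, "rh_evening": 60}
wet_warm = {"rainfall": 42.0, "temp_h": 31.0, "temp_l": 20.0,
            "rh_morning": 70, "rh_evening": 80}
dry_cool_rules = matched_rules(dry_cool)
wet_warm_rules = matched_rules(wet_warm)
```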

6. CONCLUSION

Agriculture depends mainly on the monsoon, which alone is not sufficient for agricultural production. Rainfall is distributed unevenly across regions, resulting in floods and droughts. For analysis, a large amount of data is collected and loaded into MongoDB for storage and retrieval, and then retrieved into Python. Machine learning algorithms are applied so that agriculture professionals and farmers can easily understand the predictions. This reduces uncertainty in their decisions, lowering risk and saving valuable time. In this study, a linear decision tree algorithm is proposed to demonstrate that the hybrid machine learning technique performs better than the individual algorithms on the dataset used. The hybrid algorithm combines the linear regression and decision tree classification algorithms of machine learning. The results show that the hybrid algorithm can predict pest attack and thereby help save the crop.

REFERENCES

[1] Saeed Banihashemi, Grace Ding, Jack Wang, 2017. "Developing a hybrid model of prediction and classification algorithms for building energy consumption", Energy Procedia 110, 371-376, 1st International Conference on Energy and Power.

[2] Annina Simon, Mahima Singh Deo, Venkatesan S, Ramesh Babu D R, 2015. "An overview of Machine Learning and its Applications", International Journal of Electrical Sciences & Engineering, Volume 1, Issue 1, pp. 22-24.

[3] Mashooque A Memon, Safeeullah Soomro, Awais K Jumani, Muneer A Kartio, 2017. "Big Data Analytics and its Applications", Annals of Emerging Technologies in Computing, Volume 1, No. 1.

[4] Yanbo Huang, Yubin Lan, Steven J Thomson, Alex Fang, 2010. "Development of Soft Computing and applications in agricultural and biological engineering", Computers and Electronics in Agriculture, 71(2):107-127.


[5] Basant Agarwal, Namita Mittal, 2012. "Hybrid Approach for Detection of Anomaly Network Traffic using Data Mining Techniques", Procedia Technology, 2nd International Conference on Communication, Computing & Security.

[6] Shuangyin Liu, Haijiang Tai, Qisheng Ding, Daoliang Li, 2013. "A hybrid approach of support vector regression with genetic algorithm optimization for aquaculture water quality prediction", Mathematical and Computer Modelling 58(3-4):458-465.

[7] Najeebullah, Aneela Zameer, Asifullah Khan, Syed Gibran Javed, 2014. "Machine Learning based Short Term Wind Power Prediction using a Hybrid Learning Model", Computers and Electrical Engineering.

[8] Dieu Tien Bui, Quang-Thanh Bui, Quoc Phi, Biswajeet Pradhan, 2017. "Hybrid Artificial Intelligence Approach Using GIS-Based Neural-Fuzzy Inference System and Particle Swarm Optimization for Forest Fire Susceptibility Modeling at A Tropical Area", Agricultural and Forest Meteorology, DOI: 10.1016/j.agrformet.2016.11.002.

[9] Wei Chen, Ataollah Shirzadi, Himan Shahabi, Baharin Bin Ahmad, Shuai Zhang, Haoyuan Hong, Ning Zhang, 2017. "A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naive bayes tree classifiers for a landslide susceptibility assessment in Langao County, China", ISSN: 1947-5705.

[10] Polaraju K, Durga Prasad D, 2017. "Prediction of Heart Disease using Multiple Linear Regression Model", International Journal of Engineering Development and Research, Volume 5, Issue 4.

[11] Katherine M Ransom, Bernard T Nolan, Jonathan A Traum, Claudia C Faunt, Andrew M Bell, Jo Ann M Gronberg, David C Wheeler, Celia Z Rosecrans, Bryant Jurgens, Gregory E Schwarz, Kenneth Belitz, Sandra M Eberts, George Kourakos, Thomas Harter, 2017. "A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA", Science of the Total Environment, Volume 601-602.

[12] Vaibhav Kumar, Garg M L, 2018. "Deep Learning as a Frontier of Machine Learning: A Review", International Journal of Computer Applications (0975-8887), Volume 182, No. 1.

[13] Anitha Avula V, Arba Asha, 2018. "Improving Prediction Accuracy Using Hybrid Machine Learning Algorithm on Medical Datasets", International Journal of Scientific & Engineering Research, Volume 9, Issue 10, ISSN 2229-5518.

[14] Shen Rong, Zhang Bao-wen, 2018. "The research on regression model in machine learning field", MATEC Web of Conferences 176, 01033.

[15] Neha Patel, Divakar Singh, 2015. "An Algorithm to Construct Decision Tree for Machine Learning based on Similarity Factor", International Journal of Computer Applications (0975-8887), Volume 111, No. 10.
