Classification Model Using Optimization Technique: A Review
Deoshree Diwathe (1), Snehlata S. Dongare (2)
(1) M. Tech Computer Science & Engineering, G. H. Raisoni College of Engineering, Nagpur, India
(2) Computer Science and Engineering, G. H. Raisoni College of Engineering, Nagpur, India
Abstract - Data mining is widely used in science and technology. Classification is an essential data mining technique, and it can be carried out with many types of classifiers. The decision tree is one of the most useful rule-based classifiers: it expresses its rules in IF-THEN form, generates them from the applicable conditions in a tree structure, and checks those conditions to classify the data. A decision tree is built with a greedy approach that generates a decision for every attribute. Its drawback is that it generates a large number of rules during classification, which lowers accuracy and efficiency. This drawback can be addressed with the Artificial Bee Colony (ABC) optimization algorithm, which optimizes the rules and updates the conditions during classification. Classification combined with an optimization algorithm therefore increases the accuracy and efficiency of the classification model.
Keywords - Data Mining, Classification Technique, Decision Tree Classifier, Artificial Bee Colony Optimization Algorithm (ABC Optimization Algorithm).
1. Introduction
Data mining is the extraction of required or useful information from large datasets. It involves collecting and managing data as well as analyzing it and making predictions from it, and classification models are among the data mining techniques used for this purpose. Data mining is one of the most important computational processes for discovering patterns and for forecasting and prediction from huge datasets. Its methods lie at the intersection of artificial intelligence (such as neural networks), machine learning (supervised and unsupervised learning), statistics, and database systems. The overall aim of the data mining process is to extract information from datasets and transform it into a grid structure organized by attributes. Data mining is predictive in nature: it is used to predict patterns, and the final prediction can be presented graphically. Data mining provides the methodology and technology to turn huge volumes of data into useful, accessible information for decision making. Data mining techniques include classification, clustering, and association. This paper focuses on classification.
decision-tree methods (IF-THEN rules), such as C4.5 algorithm, ID3 algorithm and neural networks.
Classification techniques in data mining have some disadvantages; for example, a decision tree generates a large number of rules while classifying data, which decreases accuracy and efficiency. The Artificial Bee Colony optimization algorithm is used to remove this drawback. An optimization algorithm searches for the optimal solution and so reduces effort and time. Here it is also used to reduce the number of rules produced by the decision tree classifier. It reaches a good solution within a short time and thereby increases accuracy and efficiency.
2. Literature Review
Many data mining techniques for classification have been introduced in recent years. This section reviews techniques that have been applied to classification over different datasets. The common idea is to classify a dataset with the help of an optimization algorithm in order to increase efficiency and accuracy; different optimization algorithms can be used for this purpose. In the first paper reviewed, the C4.5 decision tree algorithm is used for classification and Ant Colony Optimization is used to address problems of the classification algorithm. When C4.5 is applied to training data, all training cases are organized as IF-THEN rules and the rule list grows rapidly. The goals of that paper are:
• Land use suitability classification (LSC), which classifies specific areas of land according to their suitability for agriculture.
• Maximized efficiency and reduced complexity using Ant Colony Optimization.
The authors put forward an algorithm based on Ant-Miner. It improves system performance and extends the applicability and usability of Ant-Miner, which is used to handle non-spatial problems [1].
In this research paper, a Support Vector Machine (SVM) classifier is used to classify a limited set of unlabeled data, and a genetic algorithm is used for optimization. SVM is a strong classifier, but it does not handle tricky and complex classification problems well, so classification accuracy on the unlabeled dataset decreases. This is the main disadvantage of the SVM classifier. The paper therefore proposes a Multiobjective Genetic SVM approach, i.e. classification with a genetic algorithm that inflates the selected training set, identifies the difficult cases, and solves the problem in an optimized way. The approach produced improved results [2].
In this research paper, an artificial neural network (ANN) is used for classification and the Artificial Bee Colony optimization algorithm is applied to remove drawbacks of the classification algorithm. An ANN on its own is not sufficient to build a robust classification application, and it does not produce a reduced ANN design; this is a major drawback. Artificial Bee Colony is used to resolve the problem by updating the weights. The algorithm maximizes accuracy and minimizes complexity, i.e. the number of connections in the ANN decreases [3].
In this research paper, population is treated as an important factor that provides information for decisions in areas such as economics, business, and marketing, and for protecting the population from harm. Real-world fire evacuation datasets are sparse and noisy. Hazards, requirements, and environments vary widely, so the datasets contain many inconsistencies. As a result, classification results can be inaccurate or misleading, and response time is very limited. Conventional classification models therefore have drawbacks in this setting.
A particle swarm optimization algorithm for population classification in fire evacuation is used to remove those drawbacks and to optimize the recall measures related to population classification. The paper designs an effective method for encoding classification rules and uses a comprehensive learning strategy for the particles and for maintaining the diversity of the swarm. The proposed system performs comparatively better than the plain classification model and works well on real-world datasets. The hybrid model is a multiobjective classification model using particle swarm optimization [4].
list during classification, and therefore the accuracy and efficiency of that method decreased. The Ant Colony Optimization algorithm is used to solve the problem in an easy way. The goal of Ant-Miner is to extract useful information from a huge dataset. Ant-Miner is based on the behavior of real ant colonies and on data mining concepts and principles. In this research paper, Ant-Miner is compared with the CN2 algorithm for classification. The comparative results are as follows:
• Ant-Miner is better than the CN2 algorithm because its predictive accuracy is higher.
• Ant-Miner created smaller and simpler rule lists than the CN2 algorithm.
• The metaheuristic approach is both robust and versatile.
The hybrid of an optimization algorithm and a classification algorithm, i.e. Ant-Miner, is very effective for prediction in data mining [5].
In this research paper, soft computing techniques, which are widely used in machine learning, are applied. A fuzzy rule-based evolutionary algorithm creates IF-THEN rules, and this fuzzy rule-based system is paired with an optimization algorithm. Fuzzy logic or a neural network is used for classification, but the classification works only on labeled datasets and not on real datasets, which is a major problem. An optimization algorithm is therefore combined with the fuzzy logic method to solve it. The fuzzy rule-based method has the following drawbacks:
• It creates a large number of fuzzy rule lists for multivariate classification datasets.
• The fitness of the system decreases because of the number of rule lists generated.
• Fuzzy logic works only on labeled datasets, not on real datasets.
A genetic algorithm is used for optimization and the Naïve Bayesian algorithm is used for classification. The proposed hybrid approach is a fuzzy genetic algorithm, and it is compared with the Naïve Bayesian algorithm. The fuzzy genetic algorithm achieves better fitness values than the Naïve Bayesian algorithm in determining and classifying patterns. The analysis is based on tests on both training and test datasets, which are created from the original dataset. The paper concludes that the fuzzy genetic algorithm gives comparatively better fitness results, but that for classification the Naïve Bayesian algorithm is more reliable [6].
In this research paper, the k-Nearest Neighbors (kNN) algorithm is used for classification. The method finds the points in a dataset that are nearest to a query point. kNN can be used for both regression and classification and is widely used in machine learning and data mining. Nearest-neighbor methods are successful on many types of pattern classification problems. In this method, a set of prototypes has to be determined that accurately represents the input patterns. The classifier then assigns classes according to the nearest prototype in this set.
Particle Swarm Optimization is used to obtain the optimal result. The approach has two parts:
• First, the standard particle swarm optimization (PSO) algorithm is used to find those prototypes.
• Second, a new algorithm called adaptive Michigan PSO (AMPSO) is introduced, which reduces the dimension of the search space and provides more flexibility than the standard PSO algorithm.
The standard PSO algorithm and AMPSO are compared on different benchmark datasets, and AMPSO is reported to always find a better solution than standard PSO. It was also able to improve on the results of the k-Nearest Neighbor algorithm [7].
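The nearest-neighbor classification step that [7] builds on can be illustrated with a minimal Python sketch. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy points are hypothetical and are not taken from [7], and the PSO and AMPSO prototype search is not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to `query`."""
    # Sort labelled points by Euclidean distance to the query and keep k of them.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional points with two classes.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0), k=1))
```

With prototypes instead of raw training points, the same function would be called on the (much smaller) prototype set.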
improve the accuracy of classification of spinal cord disorders. A genetic algorithm is used for selecting the genes and the bagging technique is used to resolve the problem of class imbalance. The hybrid method is applied to three classification algorithms: naïve Bayes, neural networks, and k-nearest neighbor. The results show that the hybrid method improves the classification of spinal cord disorders for these classifiers [8].
In this research paper, classification is treated as a supervised learning process carried out with an Artificial Neural Network (ANN). The ANN accepts a number of inputs, builds a network from the input values and weights, and produces an output. An ANN needs a large number of parameters to achieve good classification accuracy and efficiency. The Artificial Bee Colony optimization algorithm is used to increase accuracy and efficiency. Its capability for both exploitation and exploration is used to update the weights and to solve the problem of the ANN classifier in an optimized way [9].
In this research paper, the back-propagation learning algorithm for neural networks is used for classification. The neural network classifier has the following drawbacks:
• The weights of the input vectors are very high, so the accuracy of the model decreases.
• The network is very complex because of the high weights, so efficiency is low and errors increase.
The Artificial Bee Colony optimization algorithm makes it easy to update the weights and reduces the complexity of the network. The algorithm is used to optimize the architecture of the neural network and increases accuracy and efficiency [10].
In this research paper, the presented method is the Ant-Tree-Miner model, which builds on decision tree classification. It is a hybrid of Ant Colony Optimization (ACO), a popular metaheuristic, and a decision tree classification algorithm. Ant-Tree-MinerM is introduced as an extension of Ant-Tree-Miner that learns multiple-tree classification models. A multiple-tree model consists of several decision trees, one for each class value, where each tree is responsible for discriminating between its class value and all other class values in the class domain. The authors' experiments use 40 popular benchmark datasets to check the accuracy of the proposed system and to identify quality functions that improve on the quality function already used in Ant-Tree-MinerM [11].
3. System Architecture
Fig1:- Flowchart of Classification Model Using Optimization Technique
4. Methodology
In the classification technique, a set of data is analyzed and classified, and a classification model is generated. An optimization technique reduces the effort needed to solve complex problems and gives an optimal solution. Classification using an optimization technique is effective and easy to understand. In this project, the classification datasets are taken from the UCI repository benchmark datasets used in the papers covered in the literature review.
4.1 Decision Tree Classifier
The decision tree classifier is a simple and effective classifier that classifies datasets using a tree structure.
A decision tree contains a number of conditions. If a condition is true, the next condition is generated and evaluation moves to the next level of the tree; if the condition is false, the process stops at that point.
The decision tree is a purely IF-THEN rule-based classifier. The rules are generated in IF-THEN form, and the conditions are checked according to the rules.
A decision tree can be expressed as a set of rules. An IF-THEN rule has the following pattern:-
“IF attrA > x AND attrB <= y AND … THEN Class1”
OR
“IF attrA < x AND attrB >= y AND … THEN Class2”
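The rule pattern above can be made concrete with a minimal Python sketch. The class names `Condition` and `Rule`, the helper `classify`, and the thresholds x = 5.0 and y = 2.0 are assumptions chosen only for illustration; they do not come from the reviewed papers.

```python
import operator
from dataclasses import dataclass

# Comparison operators allowed in a rule condition.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class Condition:
    attribute: str    # e.g. "attrA"
    op: str           # one of the keys of OPS
    threshold: float

    def holds(self, record):
        return OPS[self.op](record[self.attribute], self.threshold)


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # all conditions are joined with AND
    label: str         # class assigned when every condition holds

    def matches(self, record):
        return all(c.holds(record) for c in self.conditions)


def classify(rules, record, default="unknown"):
    """Return the label of the first rule that matches, else the default."""
    for rule in rules:
        if rule.matches(record):
            return rule.label
    return default


# Hypothetical thresholds x = 5.0 and y = 2.0.
rules = [
    Rule((Condition("attrA", ">", 5.0), Condition("attrB", "<=", 2.0)), "Class1"),
    Rule((Condition("attrA", "<", 5.0), Condition("attrB", ">=", 2.0)), "Class2"),
]

print(classify(rules, {"attrA": 6.1, "attrB": 1.5}))  # Class1
print(classify(rules, {"attrA": 3.0, "attrB": 2.5}))  # Class2
```

A record that satisfies neither rule falls through to the default label, which is one simple way to handle uncovered cases.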
Decision tree construction follows a greedy, top-down approach, so the tree checks its conditions from top to bottom.
The greedy approach makes the decision tree easy to design and its conditions easy to lay out.
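One greedy step of top-down tree construction can be sketched as follows. The sketch assumes information gain as the split criterion (the criterion used by ID3) and binary threshold splits on numeric attributes; the text above does not specify either, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(records, labels, attributes):
    """Greedy step: pick the (attribute, threshold) with the highest information gain."""
    base = entropy(labels)
    best = (None, None, 0.0)  # attribute, threshold, gain
    for attr in attributes:
        for threshold in sorted({r[attr] for r in records}):
            left = [y for r, y in zip(records, labels) if r[attr] <= threshold]
            right = [y for r, y in zip(records, labels) if r[attr] > threshold]
            if not left or not right:
                continue  # a split must send records to both branches
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[2]:
                best = (attr, threshold, gain)
    return best


# Hypothetical toy data: attrA separates the classes, attrB does not.
records = [
    {"attrA": 1, "attrB": 7},
    {"attrA": 2, "attrB": 3},
    {"attrA": 8, "attrB": 6},
    {"attrA": 9, "attrB": 2},
]
labels = ["Class2", "Class2", "Class1", "Class1"]

attribute, threshold, gain = best_split(records, labels, ["attrA", "attrB"])
print(attribute, threshold, gain)
```

A full tree would apply `best_split` recursively to each branch until the labels in a branch are pure or no split yields a positive gain; each root-to-leaf path then reads as one IF-THEN rule.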
4.2 Artificial Bee Colony Optimization Algorithm
The Artificial Bee Colony optimization algorithm is inspired by the foraging behavior of real honey bees. The algorithm also models the communication between bees, and it collects information about different parts of the environment.
There are three types of bees:-
1. Employed Bee
2. Onlooker Bee
3. Scout Bee
1. Employed Bee:-
• Each employed bee collects food from a different area.
• The employed bee leaves the hive, finds a food source, and determines a neighboring food source.
• The employed bee then evaluates the nectar amount of the food source.
• It then returns to the hive and dances.
• Through the dance, the employed bee shares information about the neighboring food source with the onlooker bees.
2. Onlooker Bee:-
• Each onlooker bee watches the dances of the employed bees and obtains information from them.
• The onlooker bee then becomes an employed bee and goes out to find the food source.
• It chooses a neighboring food source based on the dance, goes to it, and evaluates its nectar amount.
• It then returns to the hive after updating the nectar amount.
3. Scout Bee:-
When a food source is detected as abandoned, a scout bee replaces it with a new food source.
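The three bee phases described above can be sketched in Python as a minimal continuous-optimization loop. This is an illustrative sketch of the commonly used form of ABC, not the rule-mining variant discussed in this paper; the function name `abc_minimize`, its parameters (`n_sources`, `limit`, `cycles`, `seed`), and the sphere objective are assumptions.

```python
import random


def abc_minimize(cost, dim, bounds, n_sources=10, limit=20, cycles=200, seed=0):
    """Minimal Artificial Bee Colony sketch that minimizes `cost` over a box."""
    rng = random.Random(seed)
    low, high = bounds

    def random_source():
        return [rng.uniform(low, high) for _ in range(dim)]

    def neighbour(i):
        # Perturb one dimension of source i relative to another random source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = sources[i][:]
        step = rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = min(high, max(low, candidate[d] + step))
        return candidate

    def try_improve(i):
        # Greedy selection: keep the neighbour only if it is cheaper.
        candidate = neighbour(i)
        c = cost(candidate)
        if c < costs[i]:
            sources[i], costs[i], trials[i] = candidate, c, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources
    best_cost, best = min(zip(costs, sources))

    for _ in range(cycles):
        # Employed bees: one local search per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: choose sources with probability proportional to nectar.
        nectar = [1.0 / (1.0 + c) for c in costs]  # assumes cost >= 0
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=nectar)[0]
            try_improve(i)
        # Remember the best source before scouts may discard it.
        cycle_cost, cycle_best = min(zip(costs, sources))
        if cycle_cost < best_cost:
            best_cost, best = cycle_cost, cycle_best[:]
        # Scout bees: replace sources that failed to improve more than `limit` times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
    return best, best_cost


def sphere(x):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_cost = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_cost)
```

In the sketch, a food source is a candidate solution and its nectar amount is derived from the cost; for rule mining, a food source would instead encode the attribute bounds of a rule and the nectar amount would be the rule's fitness.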
The classification task with the Artificial Bee Colony optimization algorithm consists of the following components:-
o Rule Format:-
An IF-THEN rule has two parts. The IF part is the antecedent, which represents the conditions on the attributes. The THEN part is the consequent, which gives the class; the class depends on the features and on the conditions on the attributes. Each attribute has a low bound, i.e. the lower value in the rule, and a high bound, i.e. the upper value in the rule.
o Fitness Function:-
The fitness function evaluates the nectar amount, i.e. the quality, of a rule on the classified data. It is calculated over all features in the dataset. If the value of a feature lies between the lower bound and the upper bound, the feature is covered by the rule.
o Search Strategy:-
The search strategy governs the time consumed in classifying the datasets.
o Rule Discovery:-
Classification is carried out with a number of rules, and the attributes are categorized into different classes.
o Rule Pruning:-
Rule pruning removes redundant features from a rule.
o Prediction Strategy:-
The prediction strategy predicts the class of the data from the pruned rules.
Prediction value = (α × rule fitness value) + (β × rule cover percentage)
where α and β are two weighting parameters.
Cover percentage = TP / N
where TP is the number of records covered by the rule that have the class predicted by the rule, and N is the number of records.
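The prediction value formula can be worked through with a small Python sketch. The weights α = β = 0.5, the dictionary layout of a rule, and the choice to return the class of the covering rule with the highest prediction value are assumptions made for illustration; the text above does not fix them.

```python
def prediction_value(fitness, tp, n, alpha=0.5, beta=0.5):
    """alpha * rule fitness + beta * cover percentage, with cover = TP / N."""
    if n <= 0:
        raise ValueError("n must be positive")
    return alpha * fitness + beta * (tp / n)


def predict(rules, record, default=None):
    """Return the class of the covering rule with the highest prediction value."""
    best_label, best_value = default, float("-inf")
    for rule in rules:
        covered = all(lo <= record[a] <= hi for a, (lo, hi) in rule["bounds"].items())
        if covered and rule["value"] > best_value:
            best_label, best_value = rule["label"], rule["value"]
    return best_label


# Hypothetical pruned rules: fitness, TP and N are made-up numbers.
rules = [
    {"bounds": {"attrA": (5.0, 8.0)}, "label": "Class1",
     "value": prediction_value(fitness=0.9, tp=40, n=100)},
    {"bounds": {"attrA": (0.0, 6.0)}, "label": "Class2",
     "value": prediction_value(fitness=0.6, tp=30, n=100)},
]

# attrA = 5.5 is covered by both rules; the rule with the larger value wins.
print(predict(rules, {"attrA": 5.5}))
```

With these numbers the first rule scores 0.5 × 0.9 + 0.5 × 0.4 and the second 0.5 × 0.6 + 0.5 × 0.3, so a record covered by both is assigned Class1.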
Fig2:- Flowchart of Classification with Artificial Bee Colony Optimization Algorithm
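The rule format, fitness function, and rule pruning components of Section 4.2 can be sketched together in Python. The fitness used here, sensitivity × specificity (the rule quality measure used in Ant-Miner), is an assumption, since the section does not give an explicit formula; the function names and the toy data are hypothetical.

```python
def covers(bounds, record):
    """True when every constrained attribute lies within its [low, high] bounds."""
    return all(low <= record[attr] <= high for attr, (low, high) in bounds.items())


def rule_fitness(bounds, target, data):
    """Assumed fitness: sensitivity * specificity of the rule for class `target`."""
    tp = fp = tn = fn = 0
    for record, label in data:
        covered = covers(bounds, record)
        if covered and label == target:
            tp += 1
        elif covered:
            fp += 1
        elif label == target:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


def prune(bounds, target, data):
    """Drop attribute conditions one at a time while fitness does not decrease."""
    current = dict(bounds)
    changed = True
    while changed and len(current) > 1:
        changed = False
        base = rule_fitness(current, target, data)
        for attr in list(current):
            trial = {a: b for a, b in current.items() if a != attr}
            if rule_fitness(trial, target, data) >= base:
                current, changed = trial, True
                break
    return current


# Hypothetical toy data: attrB alone is enough to separate the classes.
data = [
    ({"attrA": 6.0, "attrB": 1.0, "attrC": 0.3}, "Class1"),
    ({"attrA": 7.5, "attrB": 1.8, "attrC": 0.9}, "Class1"),
    ({"attrA": 2.0, "attrB": 3.0, "attrC": 0.5}, "Class2"),
    ({"attrA": 6.5, "attrB": 4.0, "attrC": 0.1}, "Class2"),
]

# IF 5 <= attrA <= 8 AND 0 <= attrB <= 2 AND 0 <= attrC <= 1 THEN Class1
rule = {"attrA": (5.0, 8.0), "attrB": (0.0, 2.0), "attrC": (0.0, 1.0)}
pruned = prune(rule, "Class1", data)
print(pruned)  # {'attrB': (0.0, 2.0)}
```

On this toy data the conditions on attrA and attrC are redundant, so pruning leaves a single-condition rule with the same fitness as the original.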
5. Conclusion
In this paper, the Artificial Bee Colony optimization algorithm is applied to classification as a robust and efficient algorithm. The ABC algorithm is useful for training on the dataset and updating the weights; it is flexible in nature and minimizes the number of rules created by the decision tree algorithm. As a result, it improves accuracy and efficiency.
References
[1] Jia Yu, Yun Chen, “Ant Colony Optimization based Land Use Suitability Classification”, IEEE International Conference on Intelligent Computing Applications, 2016, pp. 1-6.
[2] Noureddine Ghoggali, Farid Melgani, Yakoub Bazi, “A Multiobjective Genetic SVM Approach for Classification Problems With Limited Training Samples”, IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 6, pp. 1707-1718, June 2009.
[3] Beatriz A. Garro, Humberto Sossa, Roberto A. Vázquez, “Artificial Neural Network Synthesis by means of Artificial Bee Colony (ABC) Algorithm”, IEEE, 2011, pp. 331-338.
[4] Yu-Jun Zheng, Hai-Feng Ling, Jin-Yun Xue, Sheng-Yong Chen, “Population Classification in Fire Evacuation: A Multiobjective Particle Swarm Optimization Approach”, IEEE Transactions on Evolutionary Computation, vol. 18, no. 1, pp. 70-81, February 2014.
[5] Rafael S. Parpinelli, Heitor S. Lopes, Alex A. Freitas, “Data Mining With an Ant Colony Optimization Algorithm”, IEEE Transactions on Evolutionary Computation, vol. 6, no. 4, pp. 321-332, August 2002.
[6] Biprodip Pal, Mumu Aktar, Firoz Mahmud, Syed Tauhid Zuhori, “An Evolutionary Fuzzy Genetic and Naïve Bayesian Approach for Multivariate Data Classification”, IEEE International Conference on Computer and Information Technology, 2014, pp. 20-24.
[7] Alejandro Cervantes, Inés María Galván, Pedro Isasi, “AMPSO: A New Particle Swarm Method for Nearest Neighborhood Classification”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 39, no. 5, pp. 1082-1091, October 2009.
[8] M. Duraisamy, F. Mary Magdalene Jane, “Cellular Neural Network Based Medical Image Segmentation Using Artificial Bee Colony Algorithm”, IEEE, 2012, pp. 1-6.
[10] Lv Qiongshuai, Wang Shiqing, “A Hybrid Model of Neural Network and Classification in Wine”, IEEE, 2011, pp. 58-61.
[11] Khalid M. Salama, Ashraf M. Abdelbar, Fernando E. B. Otero, “Investigating Evaluation Measures in Ant Colony Algorithms for Learning Decision Tree Classifiers”, IEEE Symposium Series on Computational Intelligence, 2015, pp. 1146-1153.
[12] Hadhami Kaabi, Khaled Jabeur, Talel Ladhari, “Genetic Algorithm to infer criteria weights for Multicriteria Inventory Classification”, IEEE International Conference on Engineering and Technology, 2014, pp. 276-281.
[13] Hanning Chen, Lianbo Ma, Maowei He, Xingwei Wang, “Artificial Bee Colony Optimizer Based on Bee Life-Cycle for Stationary and Dynamic Optimization”, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, pp. 1-20.
[14] Lae-Jeong Park, Cheol Hoon Park, “Fast Layer-by-Layer Training of The Feedforward Neural Network Classifier with Genetic Algorithm”, IEEE International Joint Conference on Neural Networks, 2010, pp. 2595-2598.
[15] Jiang Wu, Changjie Tang et al., “A Multiple Evolutionary Neural Network Classifier Based on Niche Genetic”, IEEE Fourth International Conference on Natural Computation, 2008, pp. 405-409.
[16] Hiteshkumar Nimbark, P. P. Kotak et al., “Optimizing Architectural Properties of Artificial Neural Network using Proposed Artificial Bee Colony Algorithm”, IEEE, 2014, pp. 1285-1289.
[17] Chin-Teng Lin, Mukesh Prasad, Amit Saxena, “An Improved Polynomial Neural Network Classifier Using Real-Coded Genetic Algorithm”, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 11, pp. 1389-1401, November 2015.
[18] Erdem Dilmen, Selim Yılmaz, Selami Beyhan, “Cascaded ABC-LM Algorithm Based Optimization and Nonlinear System Identification”, IEEE, 2013, pp. 243-246.
[19] Fernando E. B. Otero, Alex A. Freitas, Colin G. Johnson, “A New Sequential Covering Strategy for Inducing Classification Rules With Ant Colony Algorithms”, IEEE Transactions on Evolutionary Computation, vol. 17, no. 1, pp. 64-76, February 2013.
[20] Rodrigo C. Barros, Márcio P. Basgalupp, Alex A. Freitas, André C. P. L. F. de Carvalho, “Evolutionary Design of Decision-Tree Algorithms Tailored to Microarray Gene Expression Data Sets”, IEEE Transactions on Evolutionary Computation, vol. 18, no. 6, pp. 873-892, December 2014.
[21] Rizki Tri Prasetio, Dwiza Riana, “A Comparison of Classification Methods in Vertebral Column Disorder with the Application of Genetic Algorithm and Bagging”, 2015 4th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), Bandung, November 2-3, 2015, pp. 163-168.
[22] Urvesh Bhowan, Mark Johnston, Mengjie Zhang, Xin Yao, “Evolving Diverse Ensembles Using Genetic Programming for Classification With Unbalanced Data”, IEEE Transactions on Evolutionary Computation, vol. 17, no. 3, pp. 368-386, June 2013.
[23] Adam Byerly et al., “A New Parameter Adaptation Method for Genetic Algorithms and Ant Colony Optimization Algorithms”, IEEE, 2016, pp. 0668-0673.
[24] Chia-Feng Juang, Chi-Wei Hung, Chia-Hung Hsu, “Rule-Based Cooperative Continuous Ant Colony Optimization to Improve the Accuracy of Fuzzy System Design”, IEEE Transactions on Fuzzy Systems, vol. 22, no. 4, pp. 723-735, August 2014.
[25] Abdul Rauf Baig, Waseem Shahzad, Salabat Khan, “Correlation as a Heuristic for Accurate and Comprehensible Ant Colony Optimization Based Classifiers”, IEEE Transactions on Evolutionary