Bio Mining Method for Classifying Cancer


Volume 8, Issue 7, July 2019, ISSN: 2278-7798


Zahraa Naser Shah Weli

Abstract: Medical data on the symptoms of patients with various diseases, and on ways to support the diagnosis of these diseases, are now very extensive. It is usually difficult for one person to analyze and weigh all the factors involved. Data mining methods are therefore needed to link traits and information and to find relationships that are useful in classifying and diagnosing diseases, especially cancer. This paper describes data mining methods used to classify and diagnose cancer from bioinformatics data such as gene mutations.

Keywords: Data mining, Bio mining, Cancer, Classification.

1) INTRODUCTION

Data mining is a tool that establishes unseen relationships between related data and is crucial in examining vast amounts of routine data. It involves the exploration of large datasets to derive hidden, predictive, and previously unknown information that is useful in further analysis. Such automated analysis applies to healthcare, particularly to cancer diagnosis and management [1]. Traditional data mining tools include techniques such as association rules and decision trees. Association rules have the drawback of being generated so systematically that many of them are redundant or meaningless, which limits research. Examples of data mining classification algorithms used in medical diagnosis include Naïve Bayes, CN2, and the RBF Network, which are used to detect and diagnose malignant or benign tumors and their respective characteristics. The challenge faced by modern medicine is that research studies illnesses in isolation instead of focusing on the underlying relationships and their interactions, which might be the key to addressing severe conditions such as cancer and diabetes [2].

2) CANCER

Cancer may result from genetic mutations during cell division that produce defective cells. Instead of being destroyed, these cells develop and divide to produce a mass of defective cells that spreads and affects neighboring cells. Such a mass of defective cells is called a tumor and is regarded as either malignant or benign [3]. Cancer has become the leading cause of death in industrialized nations and the second leading cause in developing countries. The mortality rate results from a combination of several factors, including dietary composition, level of income, wine intake, age, race, gender, and smoking. These factors are related to specific cancers; for example, smoking is highly likely to cause throat and lung cancers. The causative clinical factors can be classified into immunohistochemistry data, the stage of cancer development, and the clinical background of the patient [4].

Benign tumors form lumps in a given location and do not spread to other parts of the body, and hence they are not classed as cancers. Samples from the lumps are at times necessary for further tests and confirmation of the diagnosis, and no other form of treatment is necessary for a benign tumor. A malignant tumor, on the other hand, can spread to other body parts and organs if left untreated. This is because cells from the primary tumor can break away and relocate to other organs, mainly through the bloodstream and the lymphatic system, where they multiply [5].

Historical and medical investigations show that certain types of cancer are significantly correlated. Cancers that show a strong correlation need to be studied together, because the likelihood of one can then be estimated from an analysis of the results for the other. Considering concurrent diseases in this way can lead to a better clinical decision-making model and improve the quality of decisions. Accurate and effective decisions by medical practitioners handling related cancers would eventually lead to early and accurate diagnosis, which would reduce mortality rates and the overall cost of treating cancer [6].

The existence of more than one form of cancer in an individual is known as multimorbidity, and the cancers are described as comorbid with each other. The predictability of a single cancer is affected by the existence of two coincident cancers because of differences in testing and the failure of medical practice to conduct comprehensive testing and diagnosis. The existence of comorbid cancers increases the mortality rate, and this impact is evident from the first years after treatment is completed. Comorbid cancers also reduce disease-specific survivability because of the diagnostic delay associated with two or more ailments occurring together. The more severe the comorbidity, the higher the mortality rate [7].

3) DATA MINING

Data mining allows for new ways of collecting, processing, analyzing, and presenting information that is more meaningful than the surface results alone. Data mining is applicable in market analysis, analysis of medical and healthcare data, risk assessment, crime detection, and other analyses that work with voluminous input and output data. Common data mining techniques applicable in virtually all fields of analysis include artificial neural networks (ANN), Naïve Bayes, the bagging algorithm, the support vector machine (SVM), decision trees, and K-nearest neighbors (KNN) [8].
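As a small illustration of one of the listed techniques, the following Python sketch shows how a K-nearest neighbors classifier could label a new sample by majority vote among its closest training samples. The function name `knn_predict`, the two numeric features, and the toy measurements are assumptions made for illustration only; they are not data or code from the cited studies.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature measurements (not real patient data).
points = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

if __name__ == "__main__":
    print(knn_predict(points, labels, (1.0, 1.0)))
    print(knn_predict(points, labels, (3.0, 3.0)))
```

In practice the choice of k and of the distance measure depends on the dataset, and features are usually scaled before distances are computed.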

Traditional data mining algorithms serve specific purposes and have well-defined roles. Because health data are complex, they need to be broken down during a pre-processing stage before any further exploitation can take place. Data mining algorithms in a healthcare setting provide new techniques and tools for finding useful information in vast health records, such as prescriptions, lab results, and examinations, which are stored in databases and are usually too extensive and complex to be handled by traditional methods [9].

4) CLASSIFICATION

Classification as a data mining tool works by identifying the properties of given profiles that determine the class of each profile, so that a new entry can be grouped automatically based on its attributes. The training methods for a biomining algorithm fall into categories that depend on the procedure applied to exploit the specific dataset, that is, on the neural learning technique used to train the algorithm. The four categories of general-purpose neural network learning algorithms are the online backpropagation algorithm, the conjugate gradient descent algorithm, the batch backpropagation algorithm, and the quick propagation algorithm (Quick Prop). Quick Prop is an iterative method for finding the minimum of the loss function of an artificial neural network, while online backpropagation corrects the network example by example and learns from the errors indicated on the validation set. The batch backpropagation algorithm trains and learns on the entire dataset, grouped in batches. Algorithm-based biomining methods are useful in diagnosing the kinds of cell mutations that may lead to cancer by employing data mining techniques that rely on neural network technology [10].
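The difference between the online and batch training modes can be sketched in a few lines of Python. The example below is a simplification under stated assumptions: it trains a single sigmoid neuron rather than a multilayer network, so the gradient is the one-layer case of backpropagation, and it does not cover Quick Prop, conjugate gradient descent, or validation-based learning-rate adaptation. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(weights, bias, x, y):
    """Gradient of the cross-entropy loss for one example (x, y)."""
    error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
    return [error * xi for xi in x], error

def train_online(data, epochs=1000, lr=0.5):
    """Online mode: update the weights after every single example."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            grad_w, grad_b = gradient(weights, bias, x, y)
            weights = [w - lr * g for w, g in zip(weights, grad_w)]
            bias -= lr * grad_b
    return weights, bias

def train_batch(data, epochs=1000, lr=0.5):
    """Batch mode: average the gradient over all examples, then update once."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    n = len(data)
    for _ in range(epochs):
        sum_w, sum_b = [0.0] * len(weights), 0.0
        for x, y in data:
            grad_w, grad_b = gradient(weights, bias, x, y)
            sum_w = [s + g for s, g in zip(sum_w, grad_w)]
            sum_b += grad_b
        weights = [w - lr * s / n for w, s in zip(weights, sum_w)]
        bias -= lr * sum_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return class 1 when the neuron's output is at least 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical, linearly separable toy data: label 1 only when both features are 1.
toy_data = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
```

Both modes use the same gradient; they differ only in when the weights are updated, which affects how noisy each step is and how often the model changes during one pass over the data.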

5) DIAGNOSIS OF CANCER BY BIOMINING METHODS

Biomining is a sequential method of data analysis that focuses on the mutation of genes and uses more than one backpropagation method to decide on the optimal approach. It has two distinct stages. The first is the selection of the optimal neural network (NN) algorithm to carry out the biomining on the given dataset. The second is the diagnosis and classification of the mutated genes in a person to determine which of them cause cancer [11].

6) SELECTION OF THE OPTIMAL NN ALGORITHM FOR BIOMINING

Most researchers use the backpropagation (BP) algorithm to train a multilayer perceptron. The chosen methods and algorithms require an up-to-date, curated database covering the gene types under study. The work of the algorithm is broken down into stages according to the activity it performs in mining the database [12]. The first stage is the extraction of the datasets related to the cancer types from the database. The second stage is the selection of the sets used for training, testing, and validation. These first two stages complete the pre-processing. The third stage is the selection of the appropriate neural network model, which fixes the architecture of the chosen network. The features and characteristics of the problem at hand also help to determine the most appropriate training and learning method for the algorithm [13].
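The two pre-processing stages described above can be sketched as follows. This is a minimal Python illustration, not the procedure of the cited works: the record fields (`cancer_type`, `gene`), the function names, and the 70/15/15 split are assumptions chosen for the example.

```python
import random

def extract_by_cancer_type(database, cancer_type):
    """Stage 1: keep only the records that relate to the given cancer type."""
    return [record for record in database if record["cancer_type"] == cancer_type]

def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Stage 2: shuffle records and split them into training, validation, and test sets."""
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test

# Hypothetical database of 30 records with made-up field values.
database = [
    {"id": i, "cancer_type": "lung" if i % 3 else "breast", "gene": f"G{i}"}
    for i in range(30)
]

if __name__ == "__main__":
    lung_records = extract_by_cancer_type(database, "lung")
    train, validation, test = split_dataset(lung_records)
    print(len(lung_records), len(train), len(validation), len(test))
```

The validation set is kept separate from the test set so that it can guide training choices, such as when to stop, without biasing the final evaluation.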

7) DIAGNOSIS AND CLASSIFICATION OF MUTATED GENES

The NN algorithm selected as optimal is then used to classify the types of mutations in the cell structure. The input stage requires a normal, healthy gene sequence, which serves as the standard, followed by the affected patient's gene sequence.

The system compares the normal gene sequence with the specific patient's sequence. From the similarity tests, the algorithm determines whether a mass of cells is malignant or benign. The algorithm also reports whether there is a risk and reveals information about previously unknown mutations.
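The comparison step can be illustrated with a short Python sketch. It assumes that the reference and patient sequences are already aligned and of equal length and that only substitutions occur; real sequences usually need an alignment step first. The malignant or benign decision made by the trained network is not shown. The function names, the short sequences, and the catalogue of known mutations are illustrative assumptions.

```python
def find_substitutions(reference, patient):
    """Return (position, reference_base, patient_base) for every mismatch."""
    if not reference or len(reference) != len(patient):
        raise ValueError("this sketch assumes non-empty, pre-aligned sequences of equal length")
    return [
        (i, ref_base, pat_base)
        for i, (ref_base, pat_base) in enumerate(zip(reference, patient))
        if ref_base != pat_base
    ]

def similarity(reference, patient):
    """Fraction of positions at which the two sequences agree."""
    mismatches = len(find_substitutions(reference, patient))
    return 1.0 - mismatches / len(reference)

def label_mutations(substitutions, known_mutations):
    """Split substitutions into already catalogued ones and previously unseen ones."""
    known, unknown = [], []
    for sub in substitutions:
        (known if sub in known_mutations else unknown).append(sub)
    return known, unknown

# Illustrative sequences and catalogue only; positions are 0-based.
reference_seq = "ATGGTGCACCTGACTCCTGAG"
patient_seq = "ATGGTGCACCTGACTCCTGTG"
known_catalogue = {(19, "A", "T")}

if __name__ == "__main__":
    subs = find_substitutions(reference_seq, patient_seq)
    print(subs, similarity(reference_seq, patient_seq))
    print(label_mutations(subs, known_catalogue))
```

Substitutions that are absent from the catalogue correspond to the unknown new mutations mentioned above and would need further review before any clinical interpretation.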

8) CONCLUSION

The use of optimal learning algorithms is the most effective and accurate biomining technique for determining and classifying the mutations that cause cancer. Data mining techniques are based on statistical and artificial intelligence methods. The aim of data mining is to pre-process quantitative information and present it uniformly, with fewer errors and in a user-friendly form. The results obtained from data mining help to inform and guide decision-making. Data mining is also known as Knowledge Discovery in Databases (KDD) because it cleans, analyzes, and integrates data to derive meaningful patterns of information. Research on cancer and its treatment has for years been based primarily on biological and clinical methods, but recently the importance, role, and scope of data-driven analytics in diagnosis and treatment have increased significantly. Data mining techniques should be part of healthcare systems, particularly in the diagnosis of critical ailments, including cancer and diabetes. The automated and self-adjusting algorithms underlying these techniques are more user-friendly: extracting and exploiting data becomes easier and more controllable, and intermediate results that aid medical operations such as diagnosis and treatment become easily accessible.

REFERENCES

[1] Altug Akay, Andrei Dragomir et al., “A Data Mining Approach for Investigating Social and Economic Geographical Dynamics of Thalassemia's Spread”, IEEE Transactions on Information Technology in Biomedicine, vol. 13, issue 5, pp. 774-780, Sept. 2009.

DOI: 10.1109/TITB.2009.2020062.

[2] Mahmoodi, S. A., Mirzaie, K., and Mahmoudi, S. M., “A new algorithm to extract hidden rules of gastric cancer data based on ontology”, SpringerPlus, vol. 5, 2016.

DOI: 10.1186/s40064-016-1943-9.

[3] Lobo, S., and Pallavi, M. S., “Predicting Protein in Cancer Diagnosis Using Effective Classification and Feature Selection Technique”, in 2018 International Conference on Communication and Signal Processing (ICCSP), pp. 156-159, IEEE, 2018.

[4] Song Wu, Wei Zhu, Patricia Thompson and Yusuf A Hannun, “Evaluating intrinsic and non-intrinsic cancer risk factors”, Nature Communications, vol. 9, issue 1, Aug. 28, 2018.

DOI: 10.1038/s41467-018-05467-z.

[5] Chaurasia, V., Pal, S., and Tiwari, B. B., “Prediction of benign and malignant breast cancer using data mining techniques”, Journal of Algorithms and Computational Technology, vol. 12(2), pp. 119-126, 2018.

[6] Zolbanin, H. M., Delen, D., and Hassan Zadeh, “Predicting overall survivability in comorbidity of cancers: A data mining approach”, Decision Support Systems, vol. 74, pp. 150-161, 2015.

[7] Luke T A Mounce, Sarah Price, Jose M Valderas and William Hamilton, “Comorbid conditions delay diagnosis of colorectal cancer: a cohort study using electronic primary care records”, British Journal of Cancer, vol. 116, pp. 1536-1543, 2017.

[8] Chauhan, D., and Jaiswal, V., “An efficient data mining classification approach for detecting lung cancer disease”, in 2016 International Conference on Communication and Electronics Systems (ICCES), pp. 1-8, IEEE, 2016.

[9] Domingos, Ana L. B. et al., “The profile of beta thalassemia obtained by data mining analysis in a database”, Revista Brasileira de Hematologia e Hemoterapia ABHH, vol. 32, pp. 78-79, ISSN 1516-8484, 2010.

[10] Ismaeel, A. G., “Diagnose Mutations Causes β-Thalassemia: Biomining Method Using an Optimal Neural Learning Algorithm”, International Journal of Engineering & Technology, vol. 8(1.11), pp. 1-8, 2019.

[11] Stefan Duffner and Christophe Garcia, “An Online Backpropagation Algorithm with Validation Error-Based Adaptive Learning Rate”, Springer Link, International Conference on Artificial Neural Networks ICANN, pp. 249-258, 2007.

DOI: 10.1007/978-3-540-74690-4_26.

[12] Mutasem Khalil Sari Alsmadi, Khairuddin Bin Omar and Shahrul Azman Noah, “Back Propagation Algorithm: The Best Algorithm Among the Multi-layer Perceptron Algorithm”, International Journal of Computer Science and Network Security, vol. 9, no. 4, April 2009.

[13] Insung Jung and Gi-Nam Wang, “Pattern Classification of Back-Propagation Algorithm Using Exclusive Connecting Network”, International Journal of Electrical, Computer, Energetic, Electronic and Communication Engineering, vol. 1, no. 12, 2007.

ZAHRAA NASER SHAH WELI: Assistant lecturer at Al-Nahrain University, Baghdad, Iraq. M.Sc. in computer science from the College of Science, Al-Nahrain University.