Universidade de São Paulo
2014-05
Optimum-path forest applied for breast
masses classification
International Symposium on Computer-Based Medical Systems, 27th, 2014, New York.
http://www.producao.usp.br/handle/BDPI/46708
Downloaded from: Biblioteca Digital da Produção Intelectual - BDPI, Universidade de São Paulo
Biblioteca Digital da Produção Intelectual - BDPI
Optimum-Path Forest Applied for Breast Masses
Classification
Patricia B. Ribeiro, Kelton A. P. da Costa, Jo˜ao P. Papa
S˜ao Paulo State University Departament of Computing
Bauru, S˜ao Paulo - Brazil
Email: [email protected], {kelton,papa}@fc.unesp.br
Roseli A. F. Romero
University of S˜ao Paulo Departament of Computing Science
S˜ao Carlos, S˜ao Paulo - Brazil Email: [email protected]
Abstract—In Computer-Aided Diagnosis-based schemes in mammography analysis each module is interconnected, which directly affects the system operation as a whole. The identification of mammograms with and without masses is highly needed to reduce the false positive rates regarding the automatic selection of regions of interest for further image segmentation. This study aims to evaluate the performance of three techniques in classifying regions of interest as containing masses or without masses (without clinical findings), as well as the main contribution of this work is to introduce the Optimum-Path Forest (OPF) classifier in this context, which has never been done so far. Thus, we have compared OPF against with two sorts of neural networks in a private dataset composed by 120 images: Radial Basis Function and Multilayer Perceptron (MLP). Texture features have been used for such purpose, and the experiments have demonstrated that MLP networks have been slightly better than OPF, but the former is much faster, which can be a suitable tool for real-time recognition systems.
I. INTRODUCTION
The main lesions identified in X-ray mammography, which is the most common technique used by radiologists for the analysis and diagnosis of breast cancer [1], [2], [3], are: micro-calcifications, that is one of the first signs of tumor formation with a high degree of suspicion of malignancy, and breast masses, which are responsible for the most cases of breast cancer. The lesions are described by their shapes and boundary properties. A benign tumor has well-defined margins, whereas the cancer is characterized by a indistinctive border that becomes more spiculated [4]. Distortions in the interpretation and classification of suspicious lesions by ex-perts involve a larger number of unnecessary biopsies [5] [6]. According to Kopans [4], “the failure of the diagnosis of breast cancer became the leading cause of negligence claims, and tends to increase”. Therefore, this can also reduce the cost-effectiveness of examinations as well as the possibility that the disease being no longer detected, characterizing the false negative diagnosis [7].
The interpretation of lesions in a mammogram is a com-plex task for the experts on whose experience affects the accuracy of their diagnosis [4]. The aim of aiding in the detection of mammographic structures with clinical interest
The authors would like to thank Capes, FAPESP grants #2009/16206-1, #2013/20387-7 and #2011/14094-1, and CNPq grants #303182/2011-3 and #470571/2013-6.
has increased in the last years, technologies such as Computer-Aided Diagnosis-based (CAD) schemes have been the subject of extensive researches in the past two decades [8], [9], [10], [11], [12], [13]. Intelligent techniques are used as classification procedures in CAD schemes [14] and their performances de-pend on several classification stages, as well as the extraction and selection of suitable features [14].
Sun et al. [15] presented a new scheme based on Multi-view in the context of mammograghic image analysis, that is defined as the average of the classification results in two different views of the same image. Such approach has demonstrated to reduce the number of false classifications [15]. Further, Zhang et al. [10] propose to classify suspicious masses using an ensemble of four classifiers: Decision Trees, Support Vector Machines, Case-based Reasoning and Artificial Neural Networks. The proposed approach has outperformed a single classifiers’ effectiveness.
Recently, Papa et al. [16], [17] have proposed a new classi-fication technique based on graph partitions called Optimum-Path Forest (OPF), which has demonstrated to overcome some state-of-the-art pattern recognition techniques in some applications. Basically, the idea that rules OPF is to perform a competition process between some key graph nodes (feature vectors) in order to conquer the remaining samples. After that, each key node will be a root of a cluster that represents a class, although we can have more than one cluster per class.
The purpose of this work is to introduce OPF for the classification of mammography images aiming to identify the presence of breast masses, as well as to compare OPF against with two Artificial Neural Networks classifiers-based and the well-known Support Vector Machines using texture features. As far as we know, OPF has never been applied do this context up to date. The remainder of this paper is organized as follows: Sections II and III introduce the OPF classifier and the methodology employed in this paper, respectively. Section IV presents the experimental results and Section V states conclusions and future works.
II. OPTIMUM-PATHFOREST
The OPF classifier works by modeling the problem of pattern recognition as a graph partition in a given feature space. The nodes are represented by the feature vectors and the
edges connect all pairs of them, defining a full connectedness graph. This kind of representation is straightforward, given that the graph does not need to be explicitly represented, allowing us to save memory. The partition of the graph is carried out by a competition process between some key samples (prototypes), which offer optimum paths to the remaining nodes of the graph. Each prototype sample defines its optimum-path tree (OPT), and the collection of all OPTs defines an optimum-path forest, which gives the name to the classifier [16]. The OPF can be seen as a generalization of the well known Dijkstra’s algorithm to compute optimum paths from a source node to the remaining ones [18]. The main difference relies on the fact that OPF uses a set of source nodes (prototypes) with any smooth path-cost function [19].
A. Background theory
Let Z = Z1∪Z2∪Z3be a dataset labeled with a function λ,
in which Z1, Z2and Z3are, respectively, training, evaluating
and test sets such that Z1and Z2are used to design a given
classifier and Z3 is used to assess its accuracy [20]. Let
S ⊆ Z1 a set of prototype samples. Essentially, the OPF
classifier creates a discrete optimal partition of the feature space such that any sample s ∈ Z2∪ Z3 can be classified
according to this partition. This partition is an optimum path forest (OPF) computed in ℜnby the image foresting transform
(IFT) algorithm [19].
The OPF algorithm may be used with any smooth path-cost function which can group samples with similar properties [19]. Particularly, we used the path-cost function fmax, which is
computed as follows: fmax(⟨s⟩) =
!
0 if s ∈ S, +∞ otherwise
fmax(π· ⟨s, t⟩) = max{fmax(π), d(s, t)}, (1)
in which d(s, t) means the distance between samples s and t, and a path π is defined as a sequence of adjacent samples. In such a way, we have that fmax(π)computes the maximum
distance between adjacent samples in π, when π is not a trivial path.
The OPF algorithm assigns one optimum path P∗(s)from
Sto every sample s ∈ Z1, forming an optimum path forest P
(a function with no cycles which assigns to each s ∈ Z1\S its
predecessor P (s) in P∗(s)or a marker nil when s ∈ S. Let
R(s) ∈ S be the root of P∗(s)which can be reached from
P (s). The OPF algorithm computes for each s ∈ Z1, the cost
C(s)of P∗(s), the label L(s) = λ(R(s)), and the predecessor P (s).
The OPF classifier is composed of two distinct phases: (i) training and (ii) classification. The former step consists, essen-tially, in finding the prototypes and computing the optimum-path forest, which is the union of all OPTs rooted at each prototype. After that, we take a sample from the test sample, connect it to all samples of the optimum-path forest generated in the training phase and we evaluate which node offered the optimum path to it. Notice that this test sample is not
permanently added to the training set, i.e., it is used only once. The next sections describe in details this procedure.
1) Training: We say that S∗ is an optimum set of
proto-types when the OPF algorithm minimizes the classification errors for every s ∈ Z1. S∗ can be found by exploiting the
theoretical relation between minimum-spanning tree (MST) and optimum-path tree for fmax[21]. The training essentially
consists in finding S∗and an OPF classifier rooted at S∗.
By computing an MST in the complete graph (Z1, A),
we obtain a connected acyclic graph whose nodes are all samples of Z1 and the arcs are undirected and weighted by
the distances d between adjacent samples. The spanning tree is optimum in the sense that the sum of its arc weights is minimum as compared to any other spanning tree in the com-plete graph. In the MST, every pair of samples is connected by a single path which is optimum according to fmax. That is,
the minimum-spanning tree contains one optimum-path tree for any selected root node. The optimum prototypes are the closest elements of the MST with different labels in Z1(i.e.,
elements that fall in the frontier of the classes).
2) Classification: For any sample t ∈ Z3(similar definition
is applied to Z2), we consider all arcs connecting t with
samples s ∈ Z1, as though t were part of the training graph.
Considering all possible paths from S∗ to t, we find the
optimum path P∗(t) from S∗ and label t with the class
λ(R(t))of its most strongly connected prototype R(t) ∈ S∗.
This path can be identified incrementally by evaluating the optimum cost C(t) as
C(t) = min{max{C(s), d(s, t)}}, ∀s ∈ Z1. (2)
Let the node s∗ ∈ Z1 be the one that satisfies Equation 2
(i.e., the predecessor P (t) in the optimum path P∗(t)). Given
that L(s∗) = λ(R(t)), the classification simply assigns L(s∗)
as the class of t. An error occurs when L(s∗)̸= λ(t).
III. MATERIALS ANDMETHODS
In this section we describe the experimental methodology applied in this work. In order to perform the classification of mammographic images, a dataset composed by 120 regions of interest (ROIs), i.e., masses, has been used. The original digital mammograms were obtained from films digitized by a Lumiscan (Lumisys, Inc.) scanner, with 12 bits of contrast resolution and spatial resolution of 0.15mm and 0.075mm per pixel. From these ROIs, Haralick texture features [22] were extracted, and a selection of the best features representing each class was performed using Gaussian distribution [23]. These attributes were used as the input for the classifiers employed in this work: Artificial Neural Networks with Multilayer Perceptrons (MLP), Artificial Neural Networks with Radial Basis Function (RBF) and Optimum-Path Forest. Next sections describe in more details the dataset as well as the feature selection.
A. Images Dataset
A set of 120 images containing regions of interest [24] of several sizes were selected in agreement with medical reports
supplied for each mammography image, being 60 of them containing suspect masses (Figure 1) and 60 without masses (Figure 2). In this work, we are interested in classifying each image in two classes: with (positive) and without breast mass (negative), being the data of the former class composed by images with benign and malign samples.
(a) (b)
Fig. 1. Example of ROIs with suspect masses used for the classification analysis: (a) benign and (b) malign.
(a) (b)
Fig. 2. Example of ROIs without masses.
B. Texture Features
The use of texture features for CAD-based applications has been largely studied. They provide several measurements as smoothness, rugosity and regularity, which can describe the intensity variation or subtle changes between the object and the image background. Texture features can be statistically represented by using the co-occurrence matrix of gray-levels (Spatial Grey-Level Dependence), which calculates the prob-ability of the combined occurrence between gray levels in different angles. Further, the direction of angles and distance between pairs of pixels with similar intensity values are calculated, and based on this matrix the 14 Haralick features are calculated: variance, correlation, contrast, entropy, sum entropy, entropy of difference, inverse difference moment, sum average, sum variance, difference average, difference variance, energy, information measures of correlation 1 and information measures of correlation 2 [25].
C. Feature Selection
In order to select the best features from the breast images, Gaussian distributions have been employed. In this method, the smaller the overlap of the curves the better the feature repre-sents each image. For each feature, two Gaussian distributions are fitted, being one for the positive images of the dataset and the other one for the negative samples. Thus, these two Gaussian distributions are compared to compute the amount of overlap between them. More details about this procedure can be obtained in the work of Ribeiro et al. [14].
IV. EXPERIMENTALRESULTS
In this section we state the experiments conducted to assess the effectiveness of OPF classifier for breast masses identification. Firstly, we have selected the best set of features using the method described in Section III-C. The following set of features have been chosen as the best representative: contrast, sum average, sum variance, sum entropy, difference of entropy and information of measure correlation 1.
We have compared OPF against with RBF and MLP net-works in a cross-validation procedure using 10 folds. For the MLP network, we used an architecture (empirically chosen) composed by two hidden layers with 8 neurons each, and two output layers and learning rate of 0.7. Notice we used the Fast Artificial Neural Network (FANN) library for MLP implemen-tation [26], and for RBF we used our implemenimplemen-tation.
Table IV presents the accuracy results, which are computed using the accuracy proposed by Papa et al. [16], which considers imbalanced datasets. We can observe MLP networks have obtained best results than OPF and RBF for all folds, except for folds #6 and #7 (OPF has obtained the best results for such sets). Therefore, we can observe that MLP networks have been 7% more accurate than OPF if we consider the mean accuracy rate. Table IV presents the mean training execution time [s] for the classifiers.
Cross Validation % Accuracy-RBF % Accuracy-OPF % Accuracy-MLP
1 70.00 86.67 90.00 2 70.00 83.33 96.67 3 76.67 56.67 76.67 4 70.00 63.33 83.33 5 83.33 76.67 90.00 6 73.33 76.67 73.33 7 63.33 86.67 76.67 8 70.00 80.00 83.33 9 70.00 80.00 90.00 10 90.00 86.67 96.67 % Mean Average 73.67 80.00 85.66 TABLE I
ACCURACIES OBTAINED BY THE CLASSIFICATION ALGORITHMS.
The reader can observe OPF has been the fastest approach, being 2.42 times faster than RBF and 2926.82 times faster than MLP for the training step. These results make OPF more suitable for large datasets, in which we may have hundreds of millions of images, in which a real-time learning system using MLP networks may be inviable.
Cross Validation % Time-RBF % Time-OPF % Time-MLP 1 0.001 0.0005 6.25 2 0.001 0.0004 0.13 3 0.001 0.0005 0.05 4 0.001 0.0005 0.41 5 0.001 0.0004 0.24 6 0.001 0.0005 0.35 7 0.001 0.0004 1.47 8 0.001 0.0005 0.24 9 0.001 0.0005 2.33 10 0.001 0.0004 0.53
% Mean execution time 0.001 0.00041 1.2
TABLE II
TRAINING EXECUTION TIMES[S]OBTAINED BY THE CLASSIFICATION ALGORITHMS.
V. CONCLUSIONS
This work focused on the application of a new technique called Optimum-Path Forest for the classification of regions of interest in mammography images. This technique demon-strated good results compared with other techniques, such as MLP and RBF networks, although MLP-based networks have been slightly more accurate than OPF.
However, the main advantage of OPF concerns with its fast training step, and also with its absence of parameters, which makes it interesting for classification problems. Notice the MLP computational load presented in this work do not consider the time expend for choosing the neural architecture. For future works we intend to consider more techniques in the experimental results, such as Support Vector Machines and Bayesian classifier.
REFERENCES
[1] C. J. Baines, D. V. McFarlane, and A. B. Miller, “The role of the reference radiologist. estimates of inter-observer agreement and potential delay in cancer detection in the national breast screening study,” Invest Radiol, vol. 25, no. 9, pp. 971–6, 1990.
[2] M. L. Giger, “Computer-aided diagnosis of breast lesions in medical images,” Computing in Science Engineering, vol. 2, no. 5, pp. 39–45, 2000.
[3] A. Mencattini, M. Salmeri, R. Lojacono, M. Frigerio, and F. Caselli, “Mammographic images enhancement and denoising for breast cancer detection using dyadic wavelet processing,” IEEE Transactions on Instrumentation and Measurement, vol. 57, no. 7, pp. 1422–1430, 2008. [4] D. Kopans, Breast Imaging. Lippincott-Raven, 1998.
[5] N. Mudigonda, R. Rangayyan, and J. E. L. Desautels, “Gradient and texture analysis for the classification of mammographic masses,” IEEE Transactions on Medical Imaging, vol. 19, no. 10, pp. 1032–1043, 2000. [6] M. Mavroforakis, H. Georgiou, N. Dimitropoulos, D. Cavouras, and S. Theodoridis, “Significance analysis of qualitative mammographic features, using linear classifiers, neural networks and support vector machines,” European Journal of Radiology, vol. 54, no. 1, pp. 80–89, 2005.
[7] O. R. Brook, A. M. Connell, E. Thornton, R. L. Eisenberg, M. Mendiratta-Lala, and J. B. Kruskal, “Quality initiatives: anatomy and pathophysiology of errors occurring in clinical radiology practice,” Radiographics, vol. 30, no. 5, pp. 1401–10, 2010.
[8] M. F. Angelo, A. Patrocinio, H. Schiabel, R. B. Medeiros, and S. R. Pires, “Comparing mammographic images,” Engineering in Medicine and Biology Magazine, IEEE, vol. 27, no. 3, pp. 74–81, 2008. [9] L. Sun, L. Li, W. Xu, W. Liu, J. Zhang, and G. Shao, “A novel
classification scheme for breast masses based on multi-view information fusion,” in 4th International Conference on Bioinformatics and Biomed-ical Engineering, 2010, pp. 1–4.
[10] Y. Zhang, N. Tomuro, J. Furst, and D. Raicu, “Building an ensemble system for diagnosing masses in mammograms,” International Journal of Computer Assisted Radiology and Surgery, vol. 7, no. 2, pp. 323–329, 2012.
[11] Y. Lan, H. Ren, and J. Wan, “A hybrid classifier for mammography cad,” in Fourth International Conference on Computational and Information Sciences, 2012, 2012, pp. 309–312.
[12] P. B. Ribeiro, R. A. Romero, P. R. Oliveira, H. Schiabel, and L. B. Verc¸osa, “Automatic segmentation of breast masses using enhanced ICA mixture model,” Neurocomputing, vol. 120, pp. 61–71, 2013. [13] F. Soares, F. Janela, M. Pereira, J. Seabra, and M. Freire, “Classification
of breast masses on contrast-enhanced magnetic resonance images through log detrended fluctuation cumulant-based multifractal analysis,” IEEE Systems Journal, no. 99, pp. 1–10, 2013.
[14] P. Ribeiro, H. Schiabel, and A. Patrocinio, “Improvement in ann perfor-mance by the selection of the best texture features from breast masses in mammography images,” in World Congress on Medical Physics and Biomedical Engineering 2006, ser. IFMBE Proceedings, R. Magjarevic and J. H. Nagel, Eds. Springer Berlin Heidelberg, 2007, vol. 14, pp. 2439–2442.
[15] L. Sun, L. Li, W. Xu, W. Liu, J. Zhang, and G. Shao, “A novel classification scheme for breast masses based on multi-view information fusion,” in 4th International Conference on Bioinformatics and Biomed-ical Engineering (iCBBE), 2010.
[16] J. P. Papa, A. X. Falc˜ao, and C. T. N. Suzuki, “Supervised pattern classification based on optimum-path forest,” International Journal of Imaging Systems and Technology, vol. 19, no. 2, pp. 120–131, 2009. [17] J. P. Papa, A. X. Falc˜ao, V. H. C. Albuquerque, and J. M. R. S.
Tavares, “Efficient supervised optimum-path forest classification for large datasets,” Pattern Recognition, vol. 45, no. 1, pp. 512–520, 2012. [18] E. W. Dijkstra, “A note on two problems in connexion with graphs,”
Numerische Mathematik, vol. 1, pp. 269–271, 1959.
[19] A. Falc˜ao, J. Stolfi, and R. Lotufo, “The image foresting transform theory, algorithms, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 19–29, 2004. [20] C. Marques, I. R. Guilherme, R. Y. M. Nakamura, and J. P. Papa, “New
trends in musical genre classification using optimum-path forest,” in Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011, pp. 699–704.
[21] C. All`ene, J. Y. Audibert, M. Couprie, J. Cousty, and R. Keriven, “Some links between min-cuts, optimal spanning forests and watersheds,” in Proceedings of the International Symposium on Mathematical Morphol-ogy. MCT/INPE, 2007, pp. 253–264.
[22] R. Haralick, “Statistical and structural approaches to texture,” Proceed-ings of the IEEE, vol. 67, no. 5, pp. 786–804, 1979.
[23] A. Patrocinio, H. Schiabel, R. Benatti, C. Goes, and F. Nunes, “Investiga-tion of clustered microcalcifica“Investiga-tion features for an automated classifier as part of a mammography cad scheme,” in Proceedings of the 22nd Annual International Conference of the IEEE in Engineering in Medicine and Biology Society, vol. 2, 2000, pp. 1203–1205.
[24] B. R. N. Matheus and H. Schiabel, “Online mammographic images database for development and comparison of cad schemes,” Journal of Digital Imaging, vol. 24, no. 3, pp. 500–506, 2011.
[25] R. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Transactions on Systems, Man, and Cyber-netics, vol. SMC-3, pp. 610–621, 1973.
[26] S. Nissen, Implementation of a Fast Artificial Neural Network Library (FANN), 2003, department of Computer Science University of Copen-hagen (DIKU). Software available at http://leenissen.dk/fann/.