6.8 Experimental Results
7.2.1 System development
The introduction of new types of atypical lymphoid cells adds complexity to the classification problem, which made it necessary to improve almost every step of the digital image processing methodology, particularly feature extraction and classification. The optimized methodology for the automatic classification of 7 different types of lymphoid cells consisted of the following steps: 1) blood sample preparation and digital image acquisition; 2) clustering color segmentation and Watershed transformation; 3) feature extraction; 4) feature selection; and 5) classification. The details of the methodology have been described in Chapters 4 and 6.
FIGURE 7.1: The whole process has two stages: 1) the system development (digital image processing is applied to the training set), and 2) the system validation (the methodology is applied to lymphoid cells of individual patients).
A System for Automatic Identification of PB Atypical Lymphoid Cells
7.2.1.1 Blood sample preparation and digital image acquisition
Samples from normal donors and patients with CLL, HCL, MCL and FL were included in this study. The diagnoses were established by clinical and morphologic findings as well as the characteristic immunophenotype of the lymphoid cells. Specifically, CLL cells had the phenotype CD5+, CD19+, CD23+, CD25+, weak CD20+, CD10-, FMC7- and dim surface immunoglobulin (sIg) expression. All the patients with HCL had lymphoid cells with the phenotype CD11c+, CD25+, FMC7+, CD103+ and CD123+. Patients with MCL showed lymphoid cells with the phenotype CD5+, FMC7+, CD43+, CD10- and BCL6-. Follicular lymphoma cells showed B-cell associated antigens (CD19, CD20, CD22, CD79a), BCL2+, BCL6+, CD10+, CD5- and CD43-. BPL images were obtained from transformed CLL. The reactive lymphocyte images were obtained from patients with a diagnosis of infectious mononucleosis. Blood samples were obtained from the routine workload of the Core Laboratory of the Hospital Clínic of Barcelona. Venous blood was collected into tubes containing K3EDTA as anticoagulant. Samples were analyzed with an Advia 2120 cell counter (Siemens Healthcare Diagnosis, Deerfield, USA) and PB films were automatically stained with May Grünwald-Giemsa in the SP1000i (Sysmex, Kobe, Japan) within 4 hours of blood collection.
Individual lymphoid cell images from PB had a resolution of 363 x 360 pixels and were obtained with the CellaVision DM96 system (Lund, Sweden). The quality of the smears was assessed by cytologists prior to the image study. A training set of 3617 lymphoid cell images from PB films was selected by the cytologist to evaluate the accuracy of the proposed methodology. The images were distributed as follows: 320 normal lymphocyte images from healthy patients (N), 408 RL, 529 HCL, 732 MCL, 551 FL and 1077 from patients with CLL. The CLL group was divided into 863 images of typical CLL lymphocytes with clumped chromatin and 214 BPL images. For the validation step, cell images from a total of 21 different patients previously selected by the cytologist were used.
7.2.1.2 Clustering color segmentation and Watershed transformation
As described in Chapter 4, lymphoid cells were segmented from other objects in the image using the sKFCM clustering technique over the XYZ and CMYK color spaces together with the Watershed Transformation (WT). Three regions were thus obtained: cell, nucleus and peripheral zone around the cell.
7.2.1.3 Feature extraction
The feature extraction methods described in Chapter 6 are used in this work. A summary of the implementation of the feature extraction step is presented here.
7.2 Material and methods
Geometric features. These features are numerical interpretations of morphologic attributes such as size, shape and nucleus-cytoplasm ratio. A total of 77 geometric features were calculated: 13 geometric-size features and 32 Elliptic Fourier Descriptors (EFD) for each region of interest. They also included a cytoplasmic profile feature, which estimates the external projections of the cytoplasm [103].
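As an illustration of how elliptic Fourier descriptors can be obtained from a region contour, the following minimal sketch computes the four coefficients of each harmonic for a closed polygonal contour, following the standard arc-length formulation. It is not the implementation used in this work: the function name, the contour representation (a list of (x, y) points with no consecutive duplicates) and the choice of 8 harmonics in the usage note are assumptions made for the example.

```python
from math import cos, sin, pi, hypot

def elliptic_fourier_descriptors(contour, n_harmonics):
    """Return [(a_k, b_k, c_k, d_k)] for k = 1..n_harmonics.

    contour: list of (x, y) vertices of a closed polygon; the segment from the
    last vertex back to the first is added implicitly. Consecutive vertices
    must differ, otherwise a segment of zero length causes a division by zero.
    """
    n = len(contour)
    # Increments along each segment p -> p+1 (wrapping around at the end).
    dx = [contour[(p + 1) % n][0] - contour[p][0] for p in range(n)]
    dy = [contour[(p + 1) % n][1] - contour[p][1] for p in range(n)]
    dt = [hypot(u, v) for u, v in zip(dx, dy)]
    # Cumulative arc length at every vertex; t[0] = 0, t[n] = perimeter.
    t = [0.0]
    for step in dt:
        t.append(t[-1] + step)
    perimeter = t[-1]

    coefficients = []
    for k in range(1, n_harmonics + 1):
        scale = perimeter / (2.0 * k * k * pi * pi)
        w = 2.0 * k * pi / perimeter
        d_cos = [cos(w * t[p + 1]) - cos(w * t[p]) for p in range(n)]
        d_sin = [sin(w * t[p + 1]) - sin(w * t[p]) for p in range(n)]
        a = scale * sum(dx[p] / dt[p] * d_cos[p] for p in range(n))
        b = scale * sum(dx[p] / dt[p] * d_sin[p] for p in range(n))
        c = scale * sum(dy[p] / dt[p] * d_cos[p] for p in range(n))
        d = scale * sum(dy[p] / dt[p] * d_sin[p] for p in range(n))
        coefficients.append((a, b, c, d))
    return coefficients
```

With 8 harmonics the function returns 32 coefficients. This matches the number of descriptors per region quoted above, although the text does not state how the 32 descriptors are composed, so the correspondence is only a plausible reading.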
Color and texture features. Three different methods were used to characterize color and texture, yielding the following three types of features for each color component of six color spaces (RGB, CMYK, XYZ, L*a*b*, L*u*v* and HSV): (1) statistical features; (2) wavelet statistical features; and (3) granulometric features. Each type of feature was computed for the nucleus and the cytoplasm.
1. Statistical Features
In Chapter 5, 6 first order and 7 second order statistical features were used [83]. As in Chapter 6, the second order statistical features were extended from 7 to 15, and 2 more features were added: cluster shade and cluster prominence [111]. These 23 features were calculated over each color component of the image.
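The two added features can be illustrated with a small sketch that builds a gray-level co-occurrence matrix (GLCM) and evaluates cluster shade and cluster prominence with their usual definitions. The function names, the single pixel offset (one pixel to the right) and the symmetric normalization are assumptions for the example; the text does not specify the offsets or quantization actually used.

```python
from collections import Counter

def glcm(levels_img, dr=0, dc=1):
    """Symmetric, normalized co-occurrence matrix as a dict {(i, j): p}.

    levels_img: 2-D list of integer gray levels (one quantized color component).
    (dr, dc): pixel offset between the two members of each pair (assumed).
    """
    rows, cols = len(levels_img), len(levels_img[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = levels_img[r][c], levels_img[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count the pair in both directions
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def cluster_shade_and_prominence(p):
    """Third and fourth moments of (i + j) about its mean under the GLCM p."""
    mu_i = sum(i * v for (i, _), v in p.items())
    mu_j = sum(j * v for (_, j), v in p.items())
    shade = sum((i + j - mu_i - mu_j) ** 3 * v for (i, j), v in p.items())
    prominence = sum((i + j - mu_i - mu_j) ** 4 * v for (i, j), v in p.items())
    return shade, prominence
```

Cluster shade measures the skewness of the co-occurrence distribution and cluster prominence its peakedness, so a perfectly uniform region gives zero for both.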
2. Wavelet statistical features
A novelty with respect to earlier work (Chapters 3 and 5) is that the above 23 statistical features were computed not only over the color components of the original image but also over 6 sub-images derived from a two-level wavelet decomposition [113–115] of each color component. As described in Chapter 6, this procedure consists of applying the discrete wavelet transform (DWT) to an image (in this case a color component).
The DWT decomposes the image into 4 sub-images: an approximation of the image and three versions that highlight the horizontal (H1), vertical (V1) and diagonal (D1) details.
This process was repeated in a second-level decomposition of the first approximation image, and four more sub-images were obtained: A2, H2, V2 and D2. The 6 detail sub-images were used to obtain the 23 x 6 wavelet statistical features.
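The two-level decomposition can be sketched as follows. The wavelet family is not stated here, so the example assumes the orthonormal Haar wavelet because it can be written in a few lines without external libraries; the function names and the sign convention used to label the horizontal, vertical and diagonal details are likewise assumptions.

```python
def haar_dwt2(img):
    """One level of a 2-D Haar DWT on a 2-D list of numbers.

    Returns (approximation, horizontal, vertical, diagonal), each half the size
    of the input. A trailing odd row or column is ignored in this sketch.
    """
    rows, cols = len(img), len(img[0])
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows - 1, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols - 1, 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2)  # local average
            h_row.append((p + q - s - t) / 2)  # top minus bottom: horizontal edges
            v_row.append((p - q + s - t) / 2)  # left minus right: vertical edges
            d_row.append((p - q - s + t) / 2)  # diagonal differences
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag

def two_level_details(component):
    """Return the six detail sub-images and the final approximation A2."""
    a1, h1, v1, d1 = haar_dwt2(component)
    a2, h2, v2, d2 = haar_dwt2(a1)  # second level acts on the approximation
    details = {"H1": h1, "V1": v1, "D1": d1, "H2": h2, "V2": v2, "D2": d2}
    return details, a2
```

Each of the six detail sub-images would then be passed to the same statistical feature routines as the original color component, which is where the 23 x 6 count comes from.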
3. Granulometric features
As in Chapter 6, 8 granulometric features were extracted: 4 were calculated on the granulometric curve, which uses successive opening and closing operations, and the remaining 4 were calculated on the pseudo-granulometric curve, which uses successive applications of the mathematical morphology operations dilation and erosion [84].
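The idea behind both curves can be sketched with a minimal example that records the image volume (sum of pixel values) after morphological operations of increasing size. Only the opening branch of the granulometric curve and the erosion branch of the pseudo-granulometric curve are shown. The square structuring element, the border handling and the function names are assumptions, and the 4 features derived from each curve are not reproduced because they are not specified here.

```python
def _window_filter(img, radius, reduce_fn):
    """Apply reduce_fn (min or max) over a (2*radius+1)^2 square window.

    Windows are clipped at the image border (an assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out

def erode(img, radius):
    return _window_filter(img, radius, min)

def dilate(img, radius):
    return _window_filter(img, radius, max)

def opening(img, radius):
    # Erosion followed by dilation removes bright details smaller than the window.
    return dilate(erode(img, radius), radius)

def volume(img):
    return sum(sum(row) for row in img)

def granulometric_curve(img, max_radius):
    """Volume remaining after openings of radius 0..max_radius."""
    return [volume(opening(img, k)) for k in range(max_radius + 1)]

def pseudo_granulometric_curve(img, max_radius):
    """Volume remaining after erosions of radius 0..max_radius."""
    return [volume(erode(img, k)) for k in range(max_radius + 1)]
```

The drop between consecutive points of the opening curve indicates how much image content has a size close to that of the structuring element, which is why such curves are used to describe the granularity of the chromatin and the cytoplasm.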
In summary, 23 statistical features for the original image, 23 x 6 wavelet statistical features and 8 granulometric features were obtained. These 169 features were calculated for each color component of the six color spaces, for both the nucleus and the cytoplasm regions. All features were stored in a numerical data matrix, which was used as the input data for the feature selection step.
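The per-component count can be checked with a few lines of arithmetic. The second function extends the count to the whole color and texture block of the data matrix under the explicit assumption that every component of each color space is used (3 components per space, 4 for CMYK); that assumption and all names below are illustrative and are not stated in the text.

```python
# Per-component counts taken from the summary above.
N_STATISTICAL = 23        # first and second order statistical features
N_DETAIL_SUBIMAGES = 6    # H1, V1, D1, H2, V2, D2
N_GRANULOMETRIC = 8

def features_per_component():
    """Statistical + wavelet statistical + granulometric features."""
    return N_STATISTICAL + N_STATISTICAL * N_DETAIL_SUBIMAGES + N_GRANULOMETRIC

# ASSUMPTION: all components of every color space contribute features.
COMPONENTS_PER_SPACE = {"RGB": 3, "CMYK": 4, "XYZ": 3,
                        "L*a*b*": 3, "L*u*v*": 3, "HSV": 3}
REGIONS = ("nucleus", "cytoplasm")

def color_texture_columns():
    """Number of color/texture columns in the data matrix under the assumption."""
    n_components = sum(COMPONENTS_PER_SPACE.values())
    return features_per_component() * n_components * len(REGIONS)
```

Calling `features_per_component()` reproduces the 169 features quoted above; the value of `color_texture_columns()` should be read as an estimate that holds only if the assumption about the color components is correct.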
7.2.1.4 Feature selection
Due to the large number of cell features extracted, feature selection was necessary to reduce their interdependence and redundancy and to make the classification process more feasible. The purpose of this step was to determine the most significant features for the subsequent classification step. In the present work, as studied in Chapter 6, information theoretic feature selection based on the Conditional Mutual Information Maximization (CMIM) criterion was used [105,122].
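A minimal sketch of the greedy CMIM criterion is given below for features that have already been discretized. At each step it selects the feature whose information about the class label, conditioned on each feature already chosen, is largest in the worst case. The function names, the column-wise data layout and the need for a prior discretization of the continuous features are assumptions of this example and not a description of the implementation used in this work.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits for two equally long sequences of discrete values."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z = z)."""
    n = len(z)
    total = 0.0
    for value, count in Counter(z).items():
        idx = [i for i in range(n) if z[i] == value]
        total += count / n * mutual_information([x[i] for i in idx],
                                                [y[i] for i in idx])
    return total

def cmim(feature_columns, labels, k):
    """Return the indices of k features chosen by the CMIM criterion.

    feature_columns: list of columns, each a list of discrete values.
    """
    # Initial score: plain mutual information with the class label.
    scores = [mutual_information(col, labels) for col in feature_columns]
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(feature_columns)) if i not in selected]
        best = max(candidates, key=lambda i: scores[i])
        selected.append(best)
        # Keep, for each remaining feature, its worst-case conditional information.
        for i in candidates:
            if i != best:
                cmi = conditional_mutual_information(
                    feature_columns[i], labels, feature_columns[best])
                scores[i] = min(scores[i], cmi)
    return selected
```

Because a duplicate of an already selected feature carries no additional information about the label, its score drops to zero and a complementary feature is preferred, which is the redundancy reduction sought in this step.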
7.2.1.5 Classification
The objective of this step was the automatic recognition of normal and reactive lymphocytes and five types of neoplastic lymphoid cells from PB. Accordingly, the most relevant features from the selection step were used as inputs to a supervised classifier based on Support Vector Machines (SVM) with a radial basis function kernel [91, 92], as developed and tested in Chapter 6. The classification performance was evaluated by applying 10-fold cross validation to the training set of 3617 lymphoid cell images. This technique randomly divides the data set into 10 subsets of equal size. A single subset is used as the testing data, while the remaining data are used for training. The process is repeated 10 times, so that each subset is used once for testing, and a confusion matrix is calculated to obtain overall performance measures.
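The evaluation protocol can be sketched as follows. The example splits the sample indices into 10 folds and accumulates a single confusion matrix over all held-out predictions. To keep it self-contained, a nearest-centroid classifier is used as a stand-in; it is not the SVM with radial basis function kernel used in this work, and all function names and the fixed random seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier (NOT the SVM of the text): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(model, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    return min(model, key=lambda label: sum((m - v) ** 2
                                            for m, v in zip(model[label], x)))

def cross_validated_confusion(X, y, classes, fit, predict, k=10, seed=0):
    """Rows are true classes, columns are predicted classes."""
    position = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:  # each sample is tested exactly once
            matrix[position[y[i]]][position[predict(model, X[i])]] += 1
    return matrix

def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)
```

Since 3617 is not a multiple of 10, the folds produced this way differ in size by at most one image. Per-class measures such as sensitivity or precision can be derived from the rows and columns of the accumulated matrix.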