Candidate based method performance - Human Metaphase Chromosome Analysis using Image Processing

Section 4.1 established that the proposed algorithm outperformed the centerline-based approach [1] with statistical significance. In this experiment, the proposed method was tested on a larger data set of 1400 chromosomes from 40 cell images (an average of 35 chromosomes per image), covering both DAPI and Giemsa staining, with and without sister chromatid separation. Table 4.11 provides the breakdown of these cell images by staining method and by the presence of sister chromatid separation (judged visually). In this experiment, the complete length of the width profile was utilized, which was made possible by using candidates for the centromere location rather than the global minimum of the profile. Therefore, all chromosomes that were not touching or overlapping neighboring chromosomes in each cell image were included in the analysis.

Table 4.11: Breakdown of chromosome cell images and chromosomes used for the larger data set based on the staining method and the sister chromatid separation (SC Sep.)

Abbr.    Label                  Images   Chromosomes
D-NSC    DAPI-No SC Sep.        4        114
D-WSC    DAPI-With SC Sep.      18       587
G-WSC    Giemsa-With SC Sep.    18       699
Total                           40       1,400

The centromere locations manually recorded by the author were used as the 'ground truth' in the analysis. The set of candidates generated by the algorithm was displayed superimposed on the chromosome, and the candidate(s) that closely represented the centromere location (that is, fell within the centromere region) were selected while the others were rejected. In cases where all the candidates provided by the algorithm were incorrect, all of them were marked as rejected (negative examples for the classifier). Since the centromere is a region, a pixel error in detection may not convey the accuracy of the algorithm effectively. Therefore, a binary detection accuracy measure was used for this test. At the current stage of the research, the intra-observer variability of the ground truth was not analyzed due to limited resources.

Chapter 4: Results

The 1400-chromosome data set yielded 7058 centromere candidates. A randomly selected 50% portion of the data set, along with the corresponding ground truth, was used for training a support vector machine for centromere localization. A Gaussian radial basis function kernel was used with sequential minimal optimization (which gives an L1-norm soft-margin classifier) for training the support vector machine classifier in this experiment. The trained SVM classifier was tested on the remaining 50% of the data set (two-fold cross validation) and obtained accuracy, sensitivity and specificity values of 92%, 96% and 72% respectively. Two-fold cross validation was selected as the validation method because it is less computationally expensive than methods such as the 'leave-one-out' approach. Furthermore, the importance was placed on the ranking of the candidates as opposed to the label given to each candidate. Therefore, the two-fold cross validation method yielded a reasonable estimate of the performance with minimal computation.
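The evaluation protocol described above (a random half of the data for training, the other half for testing, and three summary measures) can be illustrated with a minimal sketch. The function names `split_half` and `binary_metrics`, the toy labels, and the choice of which class counts as positive are assumptions made for illustration only. The SVM training itself is not reproduced here.

```python
import random


def split_half(n_items, seed=0):
    """Randomly split item indices into two halves (train, test)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    half = n_items // 2
    return indices[:half], indices[half:]


def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels.

    Label 1 is treated as the positive class (an assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical labels, not taken from the data set.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
acc, sens, spec = binary_metrics(truth, predicted)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Whether the split is drawn over chromosomes or over individual candidates is not stated in the text. The sketch only splits a generic list of indices.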

However, the key objective was to accurately detect the centromere location for each chromosome in the data set, as opposed to classifying all candidate points individually. The candidates in each chromosome were analyzed separately and the best candidate from the set was selected based on the distance metric value (ρ). After testing on 1400 chromosomes, the algorithm accurately located the centromere in 1220 chromosomes, a detection accuracy of 87%. It is also important to note that 124 of the 180 missed chromosomes were cases where none of the candidates included the centromere of the chromosome. Some of these were caused by segmentation of acrocentric chromosomes, where the lighter-intensity satellites were segmented out, while others were caused by extreme sister chromatid separation. The detection accuracy for each image group is given in table 4.12, where a slight reduction in accuracy was observed for the groups with sister chromatid separation. The lowest detection accuracy was observed with Giemsa stained images, which generally show a higher degree of chromosome boundary noise. It is important to notice that these observations are consistent with the conclusions derived through the Games-Howell post-hoc test (see table 4.9) discussed in section 4.1.1.
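The per-chromosome selection step described above can be sketched as follows, assuming that the best candidate is the one with the largest signed distance value. The function names, the data layout and the toy values are hypothetical and serve only to show how a chromosome-level detection accuracy differs from candidate-level classification.

```python
def select_best_candidate(distances):
    """Return the index of the candidate with the largest signed distance."""
    return max(range(len(distances)), key=lambda i: distances[i])


def detection_accuracy(chromosomes):
    """Fraction of chromosomes whose selected candidate is a correct one.

    `chromosomes` is a list of (distances, correct_indices) pairs, where
    `correct_indices` is the set of candidates accepted in the ground truth.
    An empty set means no candidate fell on the centromere.
    """
    hits = sum(
        1 for distances, correct in chromosomes
        if select_best_candidate(distances) in correct
    )
    return hits / len(chromosomes)


# Hypothetical values, not taken from the data set.
toy = [
    ([-1.2, 0.4, 1.7, -0.3], {2}),  # best candidate is correct
    ([0.9, -0.5, 0.2], {1}),        # a wrong candidate is ranked first
    ([-0.8, -0.1, -2.0], set()),    # no candidate includes the centromere
    ([0.3, 1.1], {1}),              # best candidate is correct
]
print(detection_accuracy(toy))
```

The third toy case corresponds to the situation noted in the text, where a chromosome is counted as missed because none of its candidates included the centromere.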

Figure 4.4 shows an example where 5 candidates were created from the local minima of the width profile given by the proposed Laplacian-based thickness measuring algorithm. In this instance, the 4th candidate, which had the largest positive signed distance from the separating hyperplane, was selected, yielding a truncated CBCC value of 1.00.

Figure 4.4: An example where 5 candidates were created for the chromosome in figure 4.4 (a) using the width profile in figure 4.4 (b). Figure 4.4 (c) shows the signed distance of each candidate from the separating hyperplane, with the selected candidate depicted in blue.

Figure 4.5 provides a sample of cases where the centromere was accurately localized. From a machine learning point of view, figure 4.5 (a), (b) and (c) are fairly straightforward centromere localizations. The very high truncated CBCC values of 1.000 in all three cases lend further validity to the CBCC measure, indicating that the selected candidate was clearly preferable to the other candidates in the chromosome. Figure 4.5 (e) represents a chromosome where sister chromatid separation had a significant effect on the chromosome segmentation. However, as a result of correcting for sister chromatid separation, the algorithm localized the centromere accurately with a CBCC value of 1.000. The chromosome segmentation in figure 4.5 (d) shows evidence of extensive sister chromatid separation, and the CBCC value was therefore 0.995, which was still a high value for the data set. Figure 4.5 (f) represents a chromosome which was highly bent and also had very significant sister chromatid separation. Yet the algorithm was capable of localizing an accurate centromere location, with a low CBCC value of 0.661, which indicated a less than ideal separation between the centromere candidates.
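Candidate generation from the local minima of a width profile, as in the figure 4.4 example, can be illustrated with a simplified sketch. This is not the Laplacian-based thickness algorithm itself. It assumes the width profile is already available as a list of numbers, and the function name `local_minima` and the sample profile are hypothetical.

```python
def local_minima(profile):
    """Return indices whose value is strictly lower than both neighbours.

    End points are skipped and flat plateaus are ignored, which is a
    simplification chosen for this sketch.
    """
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]
    ]


# Hypothetical width profile sampled along a chromosome (in pixels).
widths = [6, 8, 9, 7, 8, 10, 9, 5, 7, 9, 8, 9, 6]
candidates = local_minima(widths)
print(candidates)  # -> [3, 7, 10]
```

Each returned index would then be one centromere candidate to be ranked by the classifier, rather than relying on the single global minimum of the profile.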

Table 4.12: The detection accuracy values for chromosomes used for the larger data set based on the staining method and the sister chromatid separation (SC Sep.)

Abbreviation   Number of chromosomes   Number of accurate detections   Accuracy
D-NSC          114                     104                             91.2%
D-WSC          587                     517                             88.1%
G-WSC          699                     599                             85.6%
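As a quick consistency check, the group counts in table 4.12 can be summed to recover the overall figures quoted in the text (1220 accurate detections out of 1400 chromosomes, with 180 missed). The variable names below are chosen only for this illustration.

```python
# (number of chromosomes, number of accurate detections) per image group,
# copied from table 4.12.
groups = {
    "D-NSC": (114, 104),
    "D-WSC": (587, 517),
    "G-WSC": (699, 599),
}

total = sum(n for n, _ in groups.values())
hits = sum(h for _, h in groups.values())
missed = total - hits
overall = hits / total

print(total, hits, missed)          # -> 1400 1220 180
print(f"{100 * overall:.1f}%")      # -> 87.1%
```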

Figure 4.5: Sample results of the algorithm, panels (a) to (f), where the accurately detected centromere location (selected candidate) is depicted by a yellow dot while the segmented outline is drawn in blue. Figure 4.5 (a) is a result of DAPI stained chromosomes while figure 4.5 (b)-(f) are results of Giemsa stained chromosomes. These results reported CBCC measures of (a) 1.000, (b) 1.000, (c) 1.000, (d) 0.995, (e) 1.000 and (f) 0.661 respectively.

Figure 4.6 provides some cases where the algorithm failed to localize the centromere accurately. Most of these (68%) were cases where none of the candidates contained the actual centromere location, mainly due to segmentation problems and very high levels of sister chromatid separation. Figure 4.6 (b) depicts an example where the segmentation algorithm failed to capture the constriction in an acrocentric chromosome. The CBCC value in this example was as low as 0.066, which indicated that the algorithm picked a weak candidate for the centromere. Figure 4.6 (a) demonstrates a case where extreme sister chromatid separation caused the segmentation algorithm to treat each individual chromatid arm separately. This chromosome had a low CBCC value of 0.368, which reflected the (morphologically) acentric nature of the separated arm. Another adverse impact of high sister chromatid separation is shown in figure 4.6 (c), where the long-arm sister chromatids of an acrocentric chromosome were identified as a bent chromosome with no sister chromatid separation. The CBCC measure failed to distinguish this chromosome from a normal bent chromosome and yielded a relatively high value (compared to other misidentified localizations) of 0.655.

Figure 4.6: Sample results, panels (a) to (c), where the algorithm failed to yield an accurate centromere location. The detected centromere location (selected candidate) is depicted by a yellow dot while the segmented outline is drawn in blue. These results reported CBCC measures of (a) 0.368, (b) 0.066 and (c) 0.655 respectively.

A preliminary study was conducted to gauge the possibility of extending the proposed centromere detection algorithm to the detection of dicentric chromosomes (chromosomes with two centromere locations) in radiation biodosimetry. Given that the constriction at the second centromere carries similar characteristics to the first centromere location, in theory it should be ranked high along with the best candidate (primary centromere). Therefore, the top four ranked candidates of the dicentric chromosomes in the data set were analyzed manually. The purpose was to find out whether both centromere locations would be encompassed within the top four candidate positions. In all 31 dicentric chromosomes in the data set, the first candidate (the selected centromere) was accurate. Out of the 31 cases, there were only two instances where the second centromere was not within the top four candidates. This was caused mainly by high sister chromatid separation. The example given in figure 4.5 (f) was observed to be one of these cases. The breakdown of the candidate ranks which captured the second centromere location is given in table 4.13, where a majority of cases reported the second centromere location as the second highest ranked candidate. It is important to notice that in some of the cases, more than one candidate was created at the primary centromere location (in long chromosomes). This was observed to cause some of the cases where the second centromere was ranked as the third candidate.

Table 4.13: The results of the preliminary analysis in studying the feasibility of extending the proposed method for dicentric detection, presented by indicating the number of times different ranked candidates were able to encompass the second centromere.

Rank of the second centromere   Number of cases
2                               20
3                               6
4                               3
5                               1
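The counts in table 4.13 can be accumulated to show how many of the 31 dicentric chromosomes had their second centromere captured within the top k ranked candidates. The sketch below uses only the numbers from the table and the text. The names `rank_counts` and `covered_within` are introduced for illustration.

```python
# Rank at which the second centromere was captured -> number of cases,
# copied from table 4.13.
rank_counts = {2: 20, 3: 6, 4: 3, 5: 1}
n_dicentric = 31  # number of dicentric chromosomes stated in the text


def covered_within(k):
    """Number of cases whose second centromere is ranked k or better."""
    return sum(count for rank, count in rank_counts.items() if rank <= k)


for k in (2, 3, 4, 5):
    print(k, covered_within(k), f"{100 * covered_within(k) / n_dicentric:.1f}%")

# Cases not captured within the top four candidates.
print(n_dicentric - covered_within(4))  # -> 2
```

The result of 2 agrees with the two instances reported in the text. The table lists 30 of the 31 cases in total, so one case is not assigned to any of the listed ranks.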

Chapter 5 Conclusions & Future work

The dissertation presented a novel algorithm for effectively analyzing human metaphase chromosomes in lymphocyte cell images. The algorithm was tested for accuracy in width profile calculation as well as for centromere detection accuracy, as discussed in chapter 4. This chapter provides a summary of the algorithm along with some concluding remarks and feasible future work.
