This dissertation employs a number of existing image processing and machine learning algorithms. A brief description of each is given below. For a more detailed description, the reader is referred to appendix A.
Gradient Vector Flow (GVF) snakes - This is a commonly used segmentation algorithm based on the active contour model. GVF uses a static, edge-based vector field as the external energy for evolving a set of points that constitutes a closed or open snake. Compared with standard active contour models, this segmentation algorithm has a larger capture range and converges better into boundary concavities. Therefore, this algorithm was used to obtain smooth object boundaries of human metaphase chromosomes. A detailed description of the GVF snake algorithm, along with a comparison with the distance-based snake model, is given in appendix A1.
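As an illustration of how such a static external field can be obtained, the sketch below diffuses the gradient of an edge map f using the standard GVF iteration, u ← u + μ∇²u − (u − f_x)(f_x² + f_y²), and likewise for v. It is a minimal pure-Python sketch and not the implementation used in this research: the function names, the regularization weight `mu`, the iteration count and the toy edge map are all assumptions chosen for illustration.

```python
def gradient(f):
    """Central-difference gradient of a 2D grid (list of rows), replicated borders."""
    h, w = len(f), len(f[0])
    fx = [[0.0] * w for _ in range(h)]
    fy = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            fx[r][c] = (f[r][min(c + 1, w - 1)] - f[r][max(c - 1, 0)]) / 2.0
            fy[r][c] = (f[min(r + 1, h - 1)][c] - f[max(r - 1, 0)][c]) / 2.0
    return fx, fy


def laplacian(u):
    """5-point Laplacian of a 2D grid with replicated borders."""
    h, w = len(u), len(u[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            up, down = u[max(r - 1, 0)][c], u[min(r + 1, h - 1)][c]
            left, right = u[r][max(c - 1, 0)], u[r][min(c + 1, w - 1)]
            out[r][c] = up + down + left + right - 4.0 * u[r][c]
    return out


def gvf(edge_map, mu=0.1, iterations=80):
    """Diffuse the edge-map gradient into a GVF field (u, v).

    mu and iterations are illustrative values (assumptions); the edge map
    is assumed to be scaled to [0, 1] so that the explicit update is stable.
    """
    fx, fy = gradient(edge_map)
    h, w = len(edge_map), len(edge_map[0])
    u = [row[:] for row in fx]  # initialise the field with the raw gradient
    v = [row[:] for row in fy]
    for _ in range(iterations):
        lap_u, lap_v = laplacian(u), laplacian(v)
        for r in range(h):
            for c in range(w):
                mag2 = fx[r][c] ** 2 + fy[r][c] ** 2
                # smoothness term + data term that keeps the field near strong edges
                u[r][c] += mu * lap_u[r][c] - mag2 * (u[r][c] - fx[r][c])
                v[r][c] += mu * lap_v[r][c] - mag2 * (v[r][c] - fy[r][c])
    return u, v


# Toy edge map (assumption): a single vertical edge in column 5 of an 11x11 grid.
edge = [[1.0 if c == 5 else 0.0 for c in range(11)] for _ in range(11)]
raw_fx, _ = gradient(edge)
u, v = gvf(edge)
print(raw_fx[5][2], u[5][2])
```

In this toy example the raw gradient is zero three pixels away from the edge, whereas the diffused field at the same pixel is nonzero and points toward the edge. This is the larger capture range referred to above.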
Discrete Curve Evolution (DCE) - DCE is a polygonal shape simplification algorithm which evolves by iteratively deleting vertices of a given polygon based on a relevance measure. This measure captures the contribution of each individual vertex to the overall shape of the polygon. In this research, the algorithm was used to locate salient points on the chromosome contour in order to partition the object boundary. Appendix A2 provides a detailed description of the algorithm and the relevance measure, along with the advantages and disadvantages of this approach for detecting salient points.
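The vertex deletion process can be sketched as follows. The relevance measure used here is the commonly cited form K = β·l1·l2 / (l1 + l2), where β is the turn angle at a vertex and l1, l2 are the lengths of its two incident segments; this is an assumption of the sketch and may differ in detail from the measure given in appendix A2. The function names and the stopping criterion `n_keep` are likewise hypothetical.

```python
import math


def relevance(prev_pt, pt, next_pt):
    """Relevance of vertex pt: turn angle times l1*l2/(l1+l2).

    Lengths are not normalised by the polygon perimeter here, because a
    common scale factor does not change which vertex scores lowest.
    """
    l1 = math.hypot(pt[0] - prev_pt[0], pt[1] - prev_pt[1])
    l2 = math.hypot(next_pt[0] - pt[0], next_pt[1] - pt[1])
    if l1 == 0.0 or l2 == 0.0:
        return 0.0  # duplicate vertex contributes nothing to the shape
    a1 = math.atan2(pt[1] - prev_pt[1], pt[0] - prev_pt[0])
    a2 = math.atan2(next_pt[1] - pt[1], next_pt[0] - pt[0])
    turn = abs(a2 - a1)
    if turn > math.pi:  # wrap the angle difference into [0, pi]
        turn = 2.0 * math.pi - turn
    return turn * l1 * l2 / (l1 + l2)


def discrete_curve_evolution(polygon, n_keep):
    """Delete the least relevant vertex of a closed polygon until n_keep remain."""
    pts = list(polygon)
    while len(pts) > max(n_keep, 3):
        n = len(pts)
        scores = [relevance(pts[i - 1], pts[i], pts[(i + 1) % n]) for i in range(n)]
        del pts[scores.index(min(scores))]
    return pts


# A square with one redundant vertex on its bottom edge (illustrative data).
polygon = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(discrete_curve_evolution(polygon, n_keep=4))
```

The collinear vertex has a turn angle of zero and is therefore removed first, leaving the four corners, which are the salient points of this shape.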
Support Vector Machine (SVM) - SVM is a powerful kernel-based supervised learning technique. SVM maximizes the margin between the two classes using the training data set. This provides good generalization, so the classifier is more likely to perform well with unseen data. Furthermore, the use of kernels to map data into a higher dimensional space increases the probability of obtaining a better separation between the classes. In this research, SVM was used as a classifier in multiple learning problems, including contour partitioning, shape analysis and centromere detection. In some instances, the distance from the separating hyperplane (geometric margin) was used as a measure of goodness of fit of a given sample. The basic framework of SVM, along with the derivation of the classification problem, is given in appendix A3.
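To make the notion of distance from the separating hyperplane concrete, the sketch below computes the signed distance (w·x + b)/‖w‖ for a sample x. It assumes a linear kernel and an already trained classifier; the weight vector `w`, the bias `b` and the sample values are hypothetical and are not taken from the classifiers trained in this research.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of sample x from the hyperplane w.x + b = 0.

    The sign gives the predicted class and the magnitude indicates how far
    the sample lies from the decision boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical trained parameters and two samples, one on each side.
w, b = [3.0, 4.0], -5.0
print(signed_distance(w, b, [3.0, 4.0]))
print(signed_distance(w, b, [0.0, 0.0]))
```

A sample with a larger positive distance lies deeper inside the positive class, which is why this value can serve as a goodness-of-fit score in addition to a class label.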
Chapter 3 Proposed algorithm
Detecting abnormalities in the human metaphase chromosome structure is a key stage in the cytogenetic diagnosis process. Digital image analysis algorithms can speed up this process and make effective use of valuable and scarce expert time. However, the existing algorithms in the literature can only operate on a limited range of the shape variations that a chromosome can exhibit, and only with a specific staining method. Therefore, this research proposes an algorithm that can operate with multiple staining methods and chromosome morphologies. The proposed algorithm performs segmentation, extracts the centerline, detects the centromere location, and detects and corrects for sister chromatid separation. The algorithm also provides cytogenetic experts with a measure of confidence in a given centromere detection. It was developed and tested with both DAPI and Giemsa stained images and is readily adaptable to other staining methods.
The algorithm requires the user to manually pick a point within (or close to) each chromosome, after which the rest of the process proceeds autonomously. The algorithm assumes that the marked chromosome neither touches nor overlaps other chromosomes in the cell image. This assumption is reasonable because this approach uses the content-based ranking algorithm proposed by Kobayashi et al. [46]. The output of that algorithm is a ranked set of metaphase images, in which images whose chromosomes are well spread with minimal overlaps and which are complete (contain all 46 chromosomes) are ranked higher. Typically, from a given set of cell images, only the highest ranked 5% were selected for further processing. This is a critical step required to improve the accuracy of the proposed algorithm.
The proposed algorithm, which is designed as a sequential set of processes, is depicted in the flow diagram given in figure 3.1. The user-selected chromosome is first segmented out from the cell image. Next, the centerline of the chromosome is derived using the binary segmentation result. The algorithm then partitions the telomere regions of the chromosome in order to detect evidence of sister chromatid separation. If sister chromatid separation is detected, the proposed method corrects for the artifact. The correction is performed in order to obtain an approximately symmetric partitioning of the contour, which is a prerequisite for the IIL (Intensity Integrated Laplacian) thickness measurement algorithm. The Laplacian based thickness measurement algorithm was improved by integrating intensity information to utilize chromosome intensity bands. Once the thickness measurements are calculated, the proposed method creates multiple candidates for the centromere location based on local minima. Next, the candidates are ranked and the best candidate is selected as the centromere location. The proposed method then calculates a measure termed 'Candidate Based Centromere Confidence' (CBCC), which yields the confidence of the centromere detection based on the candidates.
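The candidate generation step can be illustrated with a minimal sketch that returns the positions of strict local minima in a one-dimensional thickness profile sampled along the centerline. The function name and the sample profile are hypothetical, and the sketch covers candidate generation only: the ranking of candidates and the CBCC measure are described in section 3.6 and are not reproduced here.

```python
def local_minima(profile):
    """Return indices whose value is strictly lower than both neighbours.

    The two end points are never returned, since they have only one neighbour.
    """
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i - 1] > profile[i] < profile[i + 1]
    ]


# Hypothetical thickness samples along a chromosome centerline.
thickness = [5.0, 4.0, 3.0, 4.0, 5.0, 4.0, 2.0, 3.0, 5.0]
candidates = local_minima(thickness)
print(candidates)
```

Each returned index is a constriction along the chromosome and hence a possible centromere location. In practice a measured profile would probably need smoothing before this step so that noise does not produce spurious candidates.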
The proposed algorithm will be explained in the following five functional stages:
• Preprocessing and segmentation (discussed in section 3.2)
• Finding the chromosome centerline (discussed in section 3.3)
• Contour partitioning & correcting for sister chromatid separation (discussed in section 3.4)
• Laplacian based thickness measurement (discussed in section 3.5)
• Candidate based centromere detection (discussed in section 3.6)