Chapter 4 Segmentation of Proliferation-Labelled Cell Nuclei
4.7 Segmentation Evaluation and Experimental Results
Having summarised these grey-level thresholding techniques, it is essential to evaluate them and compare them with one another, as well as with PCA, which uses colour information instead. For this application, the simplest evaluation is to count the number of cells detected, which is almost the same procedure as that followed in the previous chapter, except that now only the number of proliferation-labelled cells is required. For more comprehensive results one would also count the number of incorrectly labelled cells, giving two probabilities: those of true and false positives. However, this is again a very time-consuming and subjective procedure, considering that each image may contain a total of 800 cells with very complex appearance, as revealed in Chapter 3. Similarly, if comparison is to be made with manual counting, different observers would almost certainly mark different objects as wrongly detected. In addition, all erroneously labelled objects would carry equal weight in the eventual outcome, which is not always appropriate, and thus this kind of evaluation should be accompanied by other methods [Straters and Gerbrands, 1991].
Although the development of segmentation methods has attracted significant attention, and more than 1000 different algorithms have been proposed in the literature [Zhang, 1997], relatively little effort has been spent on their evaluation. Moreover, many newly developed algorithms are compared only with a few common algorithms and on a few particular images. In the previous chapter it was shown that one way to differentiate labelling errors is to measure the FOM (Figure Of Merit), an approach widely used for the evaluation of edge detectors. Owing to its simplicity and broad applicability, this measure has also been extended to segmentation algorithms other than edge detectors [Zhang, 1996]. In particular, to evaluate the outcome of the different algorithms studied, the FOC (Figure Of Certainty) measure described in [Straters and Gerbrands, 1991] was used here. For the purpose of proliferating cell detection this was defined as:
$$\mathrm{FOC} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{1 + A\,e_i} \qquad [4.11]$$
where $N$ is the number of objects detected in the image by a particular method, $A$ is a scaling factor usually set to 0.8, and $e_i$ is the normalised colour error associated with each cellular object $i$ found by the algorithm, defined as:
$$e_i = \frac{1}{l}\left(e_R^2 + e_G^2 + e_B^2\right) = \frac{1}{l}\sum_{j=R,G,B}\;\sum_{k}\left(g_{jk} - \bar{g}_j\right)^2 \qquad [4.12]$$
where $e_R^2$, $e_G^2$ and $e_B^2$ are the squared errors in the colour bands R, G and B respectively; $g_{jk}$ is the grey value of the $k$th pixel of the object in the $j$th band; and $\bar{g}_j$ is the mean value in band $j$ of a sample image containing a representative proliferating cell extracted from each image being evaluated. Note also that the overall error value is divided by $l$, a scaling factor used to normalise the error to the range [0,1].
FOC is essentially a measure of how well the representative colour properties of the extracted objects, defined by a sample cell nucleus, fit their member image elements. For example, if an algorithm has segmented cells in an image many of which are not labelled for proliferation, i.e. false positive responses, then the error in Eq. [4.12] would be high, since there will be a significant mismatch between the overall colour content of the 'false' cells detected in the image and that of the selected sample cell. Therefore, by Eq. [4.11], the value of FOC would be small, since it decreases as the colour error increases.
Similarly, if the colour properties of the detected objects match those of the sample nucleus perfectly, then the colour error would be low and FOC would consequently approach unity, which denotes perfect agreement (note that 0.5<FOC<1).
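The behaviour described above can be illustrated with a minimal Python sketch. It assumes that FOC is the mean over detected objects of 1/(1 + A e), with A = 0.8, and that the colour error is the sum of squared RGB deviations from the sample-cell mean divided by a scaling factor. The function names and the default choice of scaling factor (the largest possible sum of squared deviations for 8-bit data) are illustrative assumptions, not values taken from the text.

```python
from statistics import mean


def colour_error(pixels, sample_mean, scale=None):
    """Sum of squared RGB deviations from the sample-cell mean, divided by scale.

    pixels      -- list of (R, G, B) tuples belonging to one detected object
    sample_mean -- (R, G, B) mean of the representative sample cell
    scale       -- normalising factor; ASSUMPTION: defaults to the maximum
                   possible sum for 8-bit data, so the result lies in [0, 1]
    """
    if scale is None:
        scale = 3 * 255 ** 2 * len(pixels)
    total = 0.0
    for pixel in pixels:
        for value, reference in zip(pixel, sample_mean):
            total += (value - reference) ** 2
    return total / scale


def figure_of_certainty(errors, a=0.8):
    """Mean of 1 / (1 + a * e) over the per-object colour errors (assumed form)."""
    return mean(1.0 / (1.0 + a * e) for e in errors)


if __name__ == "__main__":
    sample = (120, 80, 40)                      # hypothetical brown sample cell
    good = [(118, 82, 41), (121, 79, 38)]       # close to the sample colour
    bad = [(60, 70, 200), (50, 60, 210)]        # bluish, unlabelled nucleus
    errors = [colour_error(good, sample), colour_error(bad, sample)]
    print(errors, figure_of_certainty(errors))
```

Objects whose colour departs from the sample contribute a larger error and therefore pull the mean down, which is the effect the text attributes to false positive responses.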
On the other hand, an algorithm may have found exclusively brown cell nuclei in an image, so that FOC would be high, and yet have failed to detect all of them, leading inevitably to a false negative error. To take this type of error into account, a different set of experiments involving manual counting was also conducted. Specifically, two different observers marked the brown cells in 10 full-scale test images, containing 40-500 (average 150) stained nuclei, and the individual counts were used as the 'ground truth'. Each of the thresholding algorithms, along with PCA, was then applied separately to detect the proliferating nuclei, followed by some standard morphological procedures, such as hole-filling to obtain solid objects and low-pass filtering to reject small particles.
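The post-processing chain mentioned above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: pixels darker than the threshold in the chosen channel are taken as nuclei, holes are filled by flood-filling the background from the image border, and small particles are rejected with a simple connected-component area filter, which stands in for the low-pass filtering used in the experiments. All function names and the 4-connectivity choice are hypothetical.

```python
from collections import deque


def threshold_below(channel, t):
    """Foreground = pixels darker than t (ASSUMED polarity for stained nuclei)."""
    return [[value < t for value in row] for row in channel]


def _flood(mask, seeds, target, seen):
    """Collect 4-connected pixels equal to `target` reachable from `seeds`."""
    h, w = len(mask), len(mask[0])
    queue, region = deque(), []
    for y, x in seeds:
        if mask[y][x] == target and (y, x) not in seen:
            seen.add((y, x))
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        region.append((y, x))
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen \
                    and mask[ny][nx] == target:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return region


def fill_holes(mask):
    """Background not connected to the image border is treated as a hole."""
    h, w = len(mask), len(mask[0])
    border = [(y, x) for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    outside = set(_flood(mask, border, False, set()))
    return [[mask[y][x] or (y, x) not in outside for x in range(w)]
            for y in range(h)]


def remove_small(mask, min_area):
    """Keep only connected foreground objects with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    seen = set()
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (y, x) not in seen:
                region = _flood(mask, [(y, x)], True, seen)
                if len(region) >= min_area:
                    for ry, rx in region:
                        out[ry][rx] = True
    return out
```

A typical call would chain the three steps, for example `remove_small(fill_holes(threshold_below(blue, t)), min_area)`, where `blue`, `t` and `min_area` are supplied by the user.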
Table 4.1 summarises the threshold values generated by each of the five thresholding algorithms for the test images studied. The blue component of each original colour image was used in this case, since it was found to present the best contrast between brown-stained nuclei and histological background. For comparison, the mean FOC values are also shown for each method, including PCA. From this table it can be seen that the first three algorithms generate threshold values that are very close to each other, thus leading to similar FOC values. Minimum Error Thresholding (D) produces a higher threshold for every image, resulting in many false positive responses, which can also be concluded from its lower FOC value. In contrast, method (E) finds rather low threshold values, leading to a rather high FOC since all the detected objects corresponded to proliferating cells; at the same time, however, the algorithm failed to detect many other positive cells, leading to a large false negative error, as will be shown in the following cell-counting experiments. Finally, the FOC for PCA is very close to that of methods (A), (B) and (C). The threshold values applied to the PC of the colour image are not shown, since that input differs from the blue component used as input to the other algorithms.
Table 4.1 Threshold values and FOC measures for five different methods.
Images  A. Iter. Select.  B. Entr. Thr.  C. Otsu  D. Min. Error  E. Fuzzy Sets  F. PCA
1       154               145            156      188            121            NA
2       183               176            184      201            119            NA
3       149               134            151      183            131            NA
4       160               163            163      183            129            NA
5       148               166            149      205            134            NA
6       149               140            151      182            129            NA
7       155               146            156      188            129            NA
8       141               135            140      190            109            NA
9       179               184            188      221            139            NA
10      136               128            133      167            110            NA
FOC     0.6235            0.6274         0.6249   0.5993         0.6440         0.6263
NA: Not Applicable
In the second set of experiments, the cell counts generated by each method were compared with the manual counts of two different observers. To substantiate these results, linear regression was applied to both sets of values (computer- versus human-generated counts) for each method studied. The correlation indices, slopes and intercepts from the validation studies are compiled in Table 4.2, where x in the regression equations represents the manual counts. From these figures one can note that PCA shows the best performance, with error rates comparable to the person-to-person variability (see regression equation in the last row of Table 4.2). Method (A) is the runner-up, but only by a marginal difference. Methods (B) and (C) come third, but their correlation with the blinded counts of the two observers is still strong.
Minimum error thresholding presents the worst performance, with an intercept of more than 100 nuclei, which was expected because this method tended to generate a large number of false positive responses as a result of an overestimated threshold. The results for 'Fuzzy Sets' are also very interesting: although the previously discussed FOC measure ranked this method first, the actual number of cell nuclei in the images is underestimated, leading to a negative intercept and a slope smaller than one. As commented earlier, this algorithm tended to find a rather low threshold, resulting in a large fraction of positive cells being incorrectly identified as negative.
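The regression of computer-generated against manual counts can be sketched with ordinary least squares, as below. The function name and the example counts are purely illustrative and are not data from these experiments; the correlation index is computed here as the squared Pearson correlation, which is an assumption about how it was obtained.

```python
def linear_fit(manual, computed):
    """Least-squares fit computed = slope * manual + intercept, plus R^2.

    manual   -- manual counts per image (x in the regression equations)
    computed -- counts produced by one segmentation method for the same images
    """
    n = len(manual)
    mean_x = sum(manual) / n
    mean_y = sum(computed) / n
    sxx = sum((x - mean_x) ** 2 for x in manual)
    syy = sum((y - mean_y) ** 2 for y in computed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(manual, computed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)   # squared Pearson correlation
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical counts for five images (NOT taken from Table 4.2).
    manual = [40, 90, 150, 300, 500]
    computed = [35, 95, 160, 310, 540]
    slope, intercept, r2 = linear_fit(manual, computed)
    print(f"y = {slope:.2f}x + {intercept:.1f}, R^2 = {r2:.3f}")
```

A slope near one with a small intercept indicates agreement with the observers, whereas a large positive intercept points to systematic over-counting and a negative intercept with a slope below one points to under-counting.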
Table 4.2 Regression data obtained from validation studies.
Methods vs Observers  Observer A                                  Observer B
                      Linear Regression  Correlation Index (R^2)  Linear Regression  Correlation Index (R^2)
(A) Iter. Select.     1.15x-7            0.96                     1.11x+2            0.96
(B) Entr. Thr.        0.80x+14           0.93                     0.83x+20           0.93
(C) Otsu              0.83x+15           0.97                     0.86x+23           0.96
(D) Min. Error        1.04x+108          0.78                     1.10x+117          0.79
(E) Fuzzy Sets        0.84x-49           0.93                     0.89x-33           0.95
(F) PCA               1.10x+12           0.96                     1.15x+9            0.95
In order to take into account both false positives and false negatives, reflected by the FOC measure and the cell counts respectively, an attempt was made to derive a unified estimate that includes both types of error. Specifically, for each image studied, the relative difference (RD) between the number of counts generated by each method and by the observers was measured according to the formula:
$$\mathrm{RD} = \frac{1}{1 + \dfrac{|n_i - n_0|}{n_0}} \qquad [4.13]$$
where $n_i$ is the number of cells detected by method $i$, and $n_0$ is the average number of counts found by the two observers in a particular image. This measure was then multiplied by the FOC for each image, in order to create a combined estimate representing both the number of objects identified incorrectly as cells, i.e. FOC, and the number of true cells identified incorrectly as false, i.e. RD. Table 4.3 shows the mean values of the figure (FOC x RD) for each method studied. It can be seen that the PCA algorithm yields the best results, providing a good compromise between a high FOC and a low number of false negatives. In contrast, method (D) again shows the worst performance, due to its lowest values of both FOC and RD in almost all test images. Methods (A), (B) and (C) have roughly the same outcome and may be placed together in the same rank after PCA, whereas method (E) has a slightly lower performance, mainly due to its tendency to underestimate the actual number of cells.
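The combined estimate can be illustrated with a short Python sketch. It assumes that RD takes the form 1 / (1 + |n_i - n_0| / n_0), so that RD equals one when the method's count matches the observers' average and decreases as the counts diverge; this exact form, the function names and the numbers in the usage lines are assumptions made for illustration only.

```python
def relative_difference(n_method, n_observers):
    """ASSUMED form of RD: 1 / (1 + |n_i - n_0| / n_0).

    n_method    -- number of cells detected by the method in one image
    n_observers -- average of the two observers' counts for the same image
    """
    return 1.0 / (1.0 + abs(n_method - n_observers) / n_observers)


def combined_estimate(foc, n_method, n_observers):
    """Product FOC x RD for one image; higher values indicate better agreement."""
    return foc * relative_difference(n_method, n_observers)


if __name__ == "__main__":
    # Hypothetical values, not results from the experiments.
    print(combined_estimate(0.63, 155, 150))   # count close to the observers
    print(combined_estimate(0.64, 90, 150))    # high FOC but under-counting
    print(combined_estimate(0.60, 260, 150))   # low FOC and over-counting
```

Under this form a method with a high FOC is still penalised when it misses many cells, which matches the motivation for combining the two measures.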
Table 4.3 Combined estimate of FOC and relative difference (RD) of cell counts (FOCxRD).
A. Iter. Select.  B. Entr. Thr.  C. Otsu  D. Min. Error  E. Fuzzy Sets  F. PCA