5. Evidence
5.3. Experiments
5.3.3. Tracking Performance
We evaluate the tracking performance of the chain graph model in comparison to other state-of-the-art methods on the segmented versions of the Drosophila and zebrafish datasets (cf. Sec. 5.1 on page 41). Tracking performance is measured in terms of precision, recall, and F-measure, as detailed in Sec. 5.2.3 on page 46. The parameters of all methods were optimized by exhaustive search as described in Sec. 4.5.1 on page 37. Unless otherwise stated, we used the basic chain graph model without the minimal cell cycle length extension (cf. Sec. 4.4.2 on page 36).
Zebrafish dataset
In a first experiment we compare two reasoners for solving the cell nuclei tracking-by-assignment problem on the zebrafish dataset: the baseline method employing an optimal joint assignment (Sec. 3.3 on page 19) and our proposed method based on a chain graphical model (Chap. 4 on page 21). For the zebrafish dataset we previously reported tracking results using the optimal joint assignment method (Lou et al., 2011). The dataset is challenging because the segmentation comprises not only cell nuclei but also false positives caused by speckle artifacts and inhomogeneous contrast. The results for the optimal joint assignment and the chain graph model are summarized in Table 5.3.
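As a quick consistency check on Table 5.3, the following minimal sketch recomputes the f-measure from the reported precision and recall. It assumes the f-measure is the usual harmonic mean of precision and recall (the balanced F1 score); the exact definition used in this work is the one given in Sec. 5.2.3. The function name f_measure and the dictionary layout are illustrative choices, not part of the original evaluation code.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    if precision + recall == 0.0:
        return 0.0  # convention chosen here to avoid division by zero
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in Table 5.3 (zebrafish dataset).
table_5_3 = {
    "optimal joint assignment": (0.807, 0.861),
    "chain graph": (0.897, 0.850),
}

for method, (p, r) in table_5_3.items():
    print(f"{method}: f-measure = {f_measure(p, r):.3f}")
```

Under this assumption the recomputed values agree with the reported f-measures of 0.833 and 0.873 to three decimal places.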
Drosophila dataset
In a second experiment we investigate the tracking performance of different methods, including the chain graph model, on the Drosophila dataset. No quantitative
5 Assuming no violations of consistency constraints; the energy range can be estimated from typical
             Optimal Joint Assignment   Chain Graph
precision    0.807                      0.897
recall       0.861                      0.850
f-measure    0.833                      0.873
Table 5.3.: Tracking results on the zebrafish dataset. The chain graph model improves f-measure and precision over the optimal joint assignment model, at slightly lower recall.
             Detections Given   Unconditioned Chain Graph   Chain Graph with τ = 3   Bise et al. (2011)
precision    0.889              0.953                       0.956                    0.550
recall       0.933              0.957                       0.960                    0.718
f-measure    0.911              0.955                       0.958                    0.623
Table 5.4.: Tracking results on the Drosophila dataset. The conditioned chain graph model with previously filtered false positives (detections given) is inferior to the full chain graph model optimizing detection and assignment variables at once (unconditioned chain graph). The extended chain graph with the condition that division events in a cell lineage must at least be three time slices apart from each other shows an improved performance over the unconditioned (or basic) chain graph model (chain graph with τ = 3). Finally, the method of Bise et al. (2011) shows decent recall but suffers in precision.
tracking performance was reported for this dataset before. In particular, we compare the following approaches:
• the basic chain graphical model,
• a chain graph model with fixed detection variables,
• a chain graph model with four-state detection variables, satisfying a minimal duration of three time slices between division events of a particular track (see Sec. 4.4.2 on page 36),
• a state-of-the-art cell tracking method published by Bise et al. (2011).
Chain graph with fixed detection variables. The chain graphical model is a holistic tracking approach that considers all time slices at once. That way it can reason about the states of the detection variables and the assignment variables simultaneously. In a less complex approach, we could first decide on the states of the detection variables and then, given the detection variables, optimize the assignment variables. To show the performance improvement that may be gained from the higher complexity of the holistic model, we therefore compare it with a chain graph model with fixed (i.e., given) detection random variables. The states of the detection variables are determined by thresholding the random forest predictions at a probability of 0.5, setting each variable to 'active' or 'inactive'. That is, we filter out all objects classified by the random forest as false positives before tracking. Since the assignment CRFs in the chain graph are conditioned on prior factors over the detection variables, this approach can be justified as observing the detection variables and reasoning over the assignment variables given the detection variables, without the need to refactorize the distribution: a conditioned chain graph. Note that we use the very same probabilities to parametrize the factor in the full chain graph model. The difference lies only in the inference scheme, not in the amount or quality of input information.
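The filtering step of the conditioned variant can be sketched as follows. This is a minimal illustration, assuming each segmented object comes with a random forest probability of being a true detection; the function names, the object identifiers, and the example probabilities are hypothetical, and treating a probability of exactly 0.5 as 'active' is an assumption, since the text does not specify how ties are handled.

```python
def fix_detection_variables(probabilities, threshold=0.5):
    """Map per-object true-detection probabilities to fixed states.

    Assumption: a probability equal to the threshold counts as 'active'.
    """
    return ["active" if p >= threshold else "inactive" for p in probabilities]


def filter_detections(object_ids, probabilities, threshold=0.5):
    """Keep only the objects whose detection variable is fixed to 'active'."""
    states = fix_detection_variables(probabilities, threshold)
    return [obj for obj, state in zip(object_ids, states) if state == "active"]


# Hypothetical classifier outputs for four segmented objects.
object_ids = ["a", "b", "c", "d"]
probabilities = [0.92, 0.31, 0.50, 0.08]

print(fix_detection_variables(probabilities))
print(filter_detections(object_ids, probabilities))
```

Only the surviving objects would then enter the assignment step. In the full chain graph model, by contrast, the same probabilities act as soft evidence and no object is discarded up front.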
Chain graph with four-state detection variables. In Sec. 4.4.2 on page 36 we describe an extension to the basic chain graph model that incorporates our prior knowledge about a minimal temporal distance between two divisions in a single cell lineage. Again, we want to show the performance gain that may be obtained at the cost of higher model complexity compared to other variants such as the basic or conditioned chain graph. For this experiment we set τ = 3; that is, we assume that the temporal distance between two divisions is at least three time slices. The dependence of the performance on τ is investigated in another experiment described in Sec. 5.3.4.
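The constraint itself can be illustrated with a small post-hoc check. This sketch is not the four-state model of Sec. 4.4.2, which enforces the constraint during inference; it only tests whether a given sequence of division times along a single lineage path respects a minimal distance τ. The function name and the example time indices are hypothetical, and interpreting "at least three slices" as a difference of time indices of at least τ is an assumption.

```python
def violates_min_division_gap(division_times, tau=3):
    """Return True if two successive divisions are fewer than tau slices apart.

    division_times: time-slice indices of the division events along one
    lineage path (mother to daughter to granddaughter, and so on).
    Assumption: the distance is the difference of the time-slice indices.
    """
    times = sorted(division_times)
    return any(later - earlier < tau for earlier, later in zip(times, times[1:]))


# Hypothetical lineage paths.
print(violates_min_division_gap([2, 5, 9]))  # gaps of 3 and 4 slices
print(violates_min_division_gap([2, 4]))     # gap of 2 slices
```

With τ = 3, the first path satisfies the constraint and the second violates it.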
Cell tracking method by Bise et al. (2011). Finally, we evaluate the cell tracking method recently proposed by Bise et al. (2011) on the Drosophila dataset to put the chain graph model variants in context with another state-of-the-art approach. This method is quite similar to the chain graph model and is therefore a suitable candidate for comparison: it is likewise a probabilistic model with similar random variables, and it is also optimized over all time slices at once. However, since it is not a graphical model, its factorization is ad hoc and differs from the chain graph factorization.
The tracking results for all four methods are presented in Table 5.4. A full synopsis of all lineage trees is reproduced in Appendix B. A comparison of some lineage trees obtained by manual tracking, the extended chain graph, and the method by Bise et al. (2011) is shown in Fig. 5.6. The method by Bise et al. (2011) shows convincing results in regions with high data quality (cf. Fig. 5.6). However, its f-measure is 33 percentage points lower than that of the chain graph model. This is most likely caused by the high false positive detection rate of 13%, as is evident from the low precision of 0.550 and from Fig. 5.6.