Decomposition techniques tend to operate in a top-down manner (either breadth-first or depth-first) and thus lend themselves to tree storage structures: quadtrees in the case of 2D data and octrees in the case of volumetric data. In the case of the overlapping decomposition, the boundary region is included, as presented in Section 4.4, leading to nine regions (hence a non-tree). In the context of the region-based representation techniques, the nature of this tree conceptualisation was not significant; what mattered were the regions identified by the nodes in the tree. In the case of the WIB representation techniques considered in this chapter, however, the structure of the tree is significant, as is the nature of the node and edge labelling. More formally, a tree $T$ is a 4-tuple $T = (N, E, nl, el)$, where $N$ is the set of nodes (representing regions), $E \subseteq N \times N$ is the set of edges, $nl$ is the set of node labels and $el$ is the set of edge labels. Each node label is a single value describing a node (region). Each edge label is also a single value, describing the relationship between a parent node and its child node. In the generated tree, every newly generated node is connected to an existing parent node by an edge whose label indicates this parent-child relationship. On completion, each node in the tree describes a region in terms of its node label.
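To make the 4-tuple concrete, the following Python sketch stores $T = (N, E, nl, el)$ directly and only lets a node enter the tree through a labelled edge from an existing parent, mirroring the generation process just described. The class name `LabelledTree`, the integer node identifiers and the use of dictionaries for $nl$ and $el$ (mapping each node or edge to its single label value) are illustrative assumptions, not the thesis implementation, and the label values in the usage lines are arbitrary.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (parent id, child id)


class LabelledTree:
    """Hypothetical container for T = (N, E, nl, el).

    nl and el are stored as dictionaries so that every node and every edge
    carries exactly one (single-valued) label.
    """

    def __init__(self, root: int, root_label: float) -> None:
        self.N: Set[int] = {root}
        self.E: Set[Edge] = set()
        self.nl: Dict[int, float] = {root: root_label}
        self.el: Dict[Edge, float] = {}

    def add_child(self, parent: int, child: int,
                  node_label: float, edge_label: float) -> None:
        """Attach a newly generated node to an existing parent via a labelled edge."""
        if parent not in self.N:
            raise KeyError(f"parent {parent} is not in the tree")
        if child in self.N:
            # A node reached from a second parent would break the tree property.
            raise ValueError(f"node {child} already has a parent")
        self.N.add(child)
        self.E.add((parent, child))  # E stays a subset of N x N
        self.nl[child] = node_label
        self.el[(parent, child)] = edge_label


# Usage: a root region with two child regions (arbitrary illustrative labels).
t = LabelledTree(root=0, root_label=1.8)
t.add_child(0, 1, node_label=2.4, edge_label=0.05)
t.add_child(0, 2, node_label=1.1, edge_label=0.31)
print(sorted(t.E))  # [(0, 1), (0, 2)]
```

Rejecting a node that already has a parent is a design choice that keeps every node reachable by exactly one edge, which is what makes the structure a tree rather than a general graph.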
Algorithm 7.1 describes the process of adding node and edge labels to trees generated as part of a decomposition, so that Frequent Sub-graph Mining (FSM) can be applied. The input to the algorithm is a decomposed image (represented as a tree) generated using Algorithm 4.1, presented in Section 4.2. The notation n.nodeLabel is used to indicate the node label belonging to node n, n_i to indicate the ith child node of node n, and n.edgeLabel_i to indicate the label of the edge connecting node n to its child node n_i. The functions nodeLabel (line 6) and edgeLabel (line 9) generate node and edge labels respectively; these functions are discussed further in Subsections 7.2.1 and 7.2.2 below. This labelling could have been integrated into the decomposition process, but it was kept separate because labelling is not required for the region-based representations.
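The recursive procedure formalised in Algorithm 7.1 can be sketched in Python as follows. The `Node` class, the storage of a region's pixel intensities on each node and the callable parameters `node_label` and `edge_label` are assumptions made for illustration; the labelling functions actually used are those of Subsections 7.2.1 and 7.2.2, and the two lambdas in the usage lines are placeholders only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical tree node produced by the decomposition."""
    pixels: List[int]                                        # intensities of the region's pixels/voxels
    children: List["Node"] = field(default_factory=list)
    node_label: Optional[float] = None                       # n.nodeLabel
    edge_labels: List[float] = field(default_factory=list)   # edge_labels[i] ~ n.edgeLabel_i


def tree_labelling(n: Node,
                   node_label: Callable[[Node], float],
                   edge_label: Callable[[Node, Node], float]) -> None:
    """Label n and, recursively, all of its descendants in place."""
    n.node_label = node_label(n)
    n.edge_labels = []
    # Iterate over the node's actual children (none for a leaf) instead of 1..max.
    for child in n.children:
        # As in Algorithm 7.1 the edge is labelled before the child is visited,
        # so edge_label must work from the regions themselves, not child.node_label.
        n.edge_labels.append(edge_label(n, child))
        tree_labelling(child, node_label, edge_label)


# Usage with placeholder labelling functions (mean intensity and pixel-count ratio).
root = Node([10, 20, 30, 40], [Node([10]), Node([20]), Node([30]), Node([40])])
tree_labelling(root,
               node_label=lambda n: sum(n.pixels) / len(n.pixels),
               edge_label=lambda p, c: len(c.pixels) / len(p.pixels))
print(root.node_label, root.edge_labels)  # 25.0 [0.25, 0.25, 0.25, 0.25]
```

Iterating over the children a node actually has, rather than from 1 to max, lets the same function serve quadtrees (four children), octrees (eight) or the nine-region overlapping decomposition without a separate max constant.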
Algorithm 7.1 Pseudocode for the tree labelling.
Input: An image decomposed into a tree t
Output: The labelled tree Gi
1: max = number of regions (children) per node
2: Start
3:    treeLabelling(t)
4: End
5: Function treeLabelling(n)
6:    n.nodeLabel ← nodeLabel(n)
7:    if n has child nodes then
8:       for i = 1 to max do
9:          n.edgeLabel_i ← edgeLabel(n, n_i)
10:         treeLabelling(n_i)
11:      end for
12:   end if
13: End Function

7.2.1 Node Labels
We refer to the different types of values that may be associated with a node as node features; node labels are formed from these features. The selection of appropriate node features plays an important role in the effectiveness of whole image tree representations. In the Average Intensity Value (AIV) approach, each node is labelled with the mean intensity of the pixels (voxels) making up the region it represents. This is probably the simplest way of assigning labels to nodes. However, it is conjectured here that such regional mean values do not provide a sufficient description of content. Instead, for the work described in this thesis, the kurtosis of each region is used as the node label. Kurtosis measures the "peakedness" of a distribution; in other words, it describes the shape of a distribution [79]. Here the distribution in question is the intensity histogram of the region, and kurtosis is calculated as follows (Equation 7.1):
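$$\mathrm{Kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(h_i - \bar{h}\right)^4}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(h_i - \bar{h}\right)^2\right)^2} \qquad (7.1)$$

where $h$ is the histogram vector associated with the region/volume represented by a given node, $h_i$ is the value of the $i$th bin of $h$, $\bar{h}$ is the mean of $h$ and $n$ is the number of bins in $h$.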
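As a minimal sketch of Equation 7.1, the Python below computes the kurtosis label from a region's intensity histogram and contrasts it with the "mean" (AIV) label. The histogram binning (equal-width bins over integer intensities from 0 to `max_intensity`) and the choice to return 0.0 for a flat histogram, where Equation 7.1 is undefined because its denominator is zero, are assumptions; the function names are hypothetical. The two toy regions share the same mean intensity but have differently shaped histograms, which is the kind of case the conjecture above is concerned with.

```python
from typing import List, Sequence


def intensity_histogram(pixels: Sequence[int], n_bins: int,
                        max_intensity: int) -> List[int]:
    """Equal-width histogram of integer intensities in [0, max_intensity] (assumed binning)."""
    h = [0] * n_bins
    for p in pixels:
        if not 0 <= p <= max_intensity:
            raise ValueError(f"intensity {p} outside [0, {max_intensity}]")
        h[p * n_bins // (max_intensity + 1)] += 1
    return h


def kurtosis_label(h: Sequence[float]) -> float:
    """Equation 7.1: fourth central moment of the bins over the squared second moment."""
    n = len(h)
    if n == 0:
        raise ValueError("histogram must have at least one bin")
    h_bar = sum(h) / n
    m2 = sum((x - h_bar) ** 2 for x in h) / n
    m4 = sum((x - h_bar) ** 4 for x in h) / n
    # Assumption: a flat histogram (m2 == 0) leaves Equation 7.1 undefined; use 0.0.
    return m4 / m2 ** 2 if m2 > 0 else 0.0


def mean_label(pixels: Sequence[int]) -> float:
    """The "mean" (AIV) node label: average intensity of the region."""
    return sum(pixels) / len(pixels)


region_a = [2, 2, 2, 2]   # every pixel has the same intensity
region_b = [1, 2, 3, 2]   # same mean intensity, spread over three bins
for name, pixels in (("A", region_a), ("B", region_b)):
    h = intensity_histogram(pixels, n_bins=5, max_intensity=4)
    print(name, h, mean_label(pixels), round(kurtosis_label(h), 3))
```

Both regions receive the mean label 2.0, whereas their kurtosis labels differ (3.25 for A, about 1.85 for B), so only the kurtosis label distinguishes them. Note that Equation 7.1 treats the bin values of $h$ as the sample, so the label reflects the shape of the histogram rather than the intensities themselves.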
The two node labelling techniques are compared later in this chapter. To avoid confusing AIV node labelling with the critical function of the same name, the term "mean" node labelling is used for the rest of this thesis to indicate AIV node labelling.
7.2.2 Edge Labels
In tree-based image representations, edge features are typically defined in terms of a node feature similarity measure that describes the similarity between a parent-child node pair. Edge features are essential for the envisioned volumetric tree classification because they are used, together with the node labels, to distinguish between sub-graphs (sub-trees). With respect to the work described in this thesis, edge labels are defined in terms of the Kullback-Leibler Divergence (KLD) [68] between parent-child node pairs. The concept of KLD was introduced in Section 4.3 in the context of histogram-based critical functions. Here KLD measures the divergence between the intensity histograms of the regions/volumes represented by the parent and the child node: the smaller the divergence, the more similar the two histograms. KLD is thus used to generate edge labels in much the same way as it was used to determine the intensity homogeneity between parent-child node pairs during the decomposition process (Section 4.3), where Equations 4.3 and 4.4 showed how the KLD measure is generated. Alternatives to KLD include the Euclidean Distance (ED), Dynamic Time Warping (DTW) and Longest Common Subsequence (LCS) measures, also introduced in Section 4.3. These were not selected because KLD is the expected log-likelihood ratio between two distributions (in our case the histograms of the two nodes) and is zero if and only if the two distributions are identical, whereas the other techniques aggregate differences between individual histogram values (bins).
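A minimal Python sketch of a KLD edge label follows, using $D(P \parallel Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$ over histograms normalised to sum to one. Because the KLD equations of Section 4.3 are not reproduced here, several details are assumptions: the natural logarithm, the small constant `eps` added to every bin so that empty bins do not make the divergence infinite, and the direction of the divergence (child measured against parent), which matters because KLD is not symmetric. The function names and toy histograms are hypothetical.

```python
import math
from typing import List, Sequence


def normalise(h: Sequence[float], eps: float) -> List[float]:
    """Turn bin counts into a probability distribution, smoothing empty bins with eps (assumption)."""
    smoothed = [x + eps for x in h]
    total = sum(smoothed)
    return [x / total for x in smoothed]


def kld(p_hist: Sequence[float], q_hist: Sequence[float], eps: float = 1e-10) -> float:
    """Kullback-Leibler divergence D(P || Q), in nats, between two histograms."""
    if len(p_hist) != len(q_hist):
        raise ValueError("histograms must have the same number of bins")
    p = normalise(p_hist, eps)
    q = normalise(q_hist, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def edge_label(parent_hist: Sequence[float], child_hist: Sequence[float]) -> float:
    """Hypothetical edge label: divergence of the child histogram from the parent's."""
    return kld(child_hist, parent_hist)


parent = [4, 4, 2, 2]       # toy parent histogram
child_same = [2, 2, 1, 1]   # same shape as the parent, half the counts
child_diff = [0, 1, 2, 3]   # differently shaped child histogram
print(edge_label(parent, parent))                # 0.0
print(round(edge_label(parent, child_diff), 4))  # 0.6648
```

A child whose histogram has the same shape as its parent's (such as `child_same`) receives an edge label of essentially zero, while a differently shaped child receives a clearly positive label, which is the property used above to justify KLD over the alternatives.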