4 WORD SEGMENTATION METHOD OVERVIEW
4.6 Graph-based word segmentation
4.6.1 Graph-based word segmentation formalism
In the template-warping phase, we determined a set of templates that have to be matched on the word image. We then assume that each warped template can be translated locally, and its final placement constitutes the solution of the word segmentation problem.
To formulate the segmentation problem as a graph, we associate each node with a specific translation of a warped letter template. The edges in the graph are directed and represent the link between consecutive letters for the given translations of the two templates involved. The cost associated with a graph edge has two components: first, the Normalized Cross Correlation (NCC) between the translated template and the corresponding word letter; and second, the distance between the end of one template and the beginning of the next, for each pair of consecutive letters. The NCC result is obtained through a Template Matching (TM) analysis in which each template is translated over the pixels surrounding its estimated best position in the image (the LK output). These surrounding pixels define the Search Window (SW) of each letter for the TM phase.
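The Template Matching step can be illustrated with a minimal sketch. It assumes grayscale images stored as 2-D Python lists, a square Search Window of hypothetical radius `radius` centred on an estimated position `(x0, y0)` (standing in for the LK output), and the usual zero-mean NCC definition. The function names `ncc` and `match_in_search_window` are illustrative and are not taken from the thesis implementation.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]


def ncc(patch: Image, template: Image) -> float:
    """Zero-mean normalized cross-correlation of two equally sized 2-D arrays."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches get a neutral score


def match_in_search_window(image: Image, template: Image, x0: int, y0: int,
                           radius: int) -> Dict[Tuple[int, int], float]:
    """NCC for every translation (dx, dy) of the template's top-left corner
    within `radius` pixels of the estimated position (x0, y0)."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    scores: Dict[Tuple[int, int], float] = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + tw > iw or y + th > ih:
                continue  # skip placements that fall outside the image
            patch = [row[x:x + tw] for row in image[y:y + th]]
            scores[(dx, dy)] = ncc(patch, template)
    return scores
```

Each entry of the returned dictionary corresponds to one candidate node of the graph: one translation of the template together with its NCC score.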
To build this graph, we use a tree structure and define an auxiliary node, called the source, as the root. The root node is connected to all possible positions of the first letter's template (the results of translating the warped template). Hence, the source node is connected to as many nodes as there are pixels in the first letter's Search Window. Edges always run from parent to child (Section 2.4.2).
In turn, each of these nodes is connected to all possible positions of the second letter's template within its Search Window. Again, every pixel in the second letter's Search Window is a possible position for the second template.
The same construction is applied to every letter in the word except the last one. The graph is therefore structured in levels, one per letter. The final graph differs slightly from a standard tree because all nodes of the last level are also connected to a single auxiliary node called the “sink”. The nodes of the last level represent all the pixels of the last letter's Search Window where the last letter's template can be placed.
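The level-by-level construction described above can be sketched as a directed adjacency list. This is a minimal illustration, assuming each node is the pair (letter index, pixel index within that letter's SW) and using hypothetical sentinel keys `SOURCE` and `SINK` for the two auxiliary nodes. Only the connectivity is built here; edge costs are left out.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]   # (letter index i, pixel index m in letter i's SW)
SOURCE: Node = (-1, -1)  # hypothetical sentinel key for the source node
SINK: Node = (-2, -2)    # hypothetical sentinel key for the sink node


def build_layered_graph(sw_sizes: List[int]) -> Dict[Node, List[Node]]:
    """Directed adjacency list: source -> level 0 -> ... -> last level -> sink.

    sw_sizes[i] is the number of pixels in the Search Window of letter i.
    """
    if not sw_sizes:
        raise ValueError("a word needs at least one letter")
    # the source links to every placement of the first letter
    graph: Dict[Node, List[Node]] = {SOURCE: [(0, m) for m in range(sw_sizes[0])]}
    last = len(sw_sizes) - 1
    for i, size in enumerate(sw_sizes):
        for m in range(size):
            if i < last:
                # every placement of letter i links to every placement of letter i + 1
                graph[(i, m)] = [(i + 1, n) for n in range(sw_sizes[i + 1])]
            else:
                # all placements of the last letter converge on the single sink
                graph[(i, m)] = [SINK]
    graph[SINK] = []
    return graph
```

For example, `build_layered_graph([3, 2, 4])` yields 11 nodes (9 placements plus source and sink) and 3 + 6 + 8 + 4 = 21 directed edges.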
Figure 31: Connection between two nodes in the graph-representation of the segmentation problem
Consider the connection between the two nodes $P_{i,m}$ and $P_{j,n}$ in Figure 31. As stated above, nodes represent template positions within the letters' Search Windows or, equivalently, specific translations of the warped letter templates. The first subscript indicates the letter with which the node is associated, and the second subscript indicates the index of a pixel in that letter's Search Window. Thus, node $P_{i,m}$ represents the case in which the template of letter $i$ is placed at pixel $m$ of its SW, and similarly for node $P_{j,n}$, where $i$ and $j$ are consecutive letters of the word.
The edge $D_{i,m-j,n}$ connects these two nodes. It therefore represents the connection between letters $i$ and $j$ when the template of letter $i$ is placed at position $m$ of its SW and the template of letter $j$ is placed at position $n$ of its SW.
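The indexing scheme can be made concrete with a small sketch. The thesis does not state how pixels inside a Search Window are numbered. The sketch therefore assumes a square window of side 2r + 1 centred on the estimated position, with pixels numbered row by row, so that the pixel index m maps to a translation (dx, dy). The helper names `pixel_to_offset` and `make_edge` are hypothetical.

```python
from typing import Tuple

Node = Tuple[int, int]    # P_{i,m} -> (i, m)
Edge = Tuple[Node, Node]  # D_{i,m-j,n} -> ((i, m), (j, n))


def pixel_to_offset(m: int, radius: int) -> Tuple[int, int]:
    """Translation (dx, dy) encoded by pixel index m of a square Search Window.

    Assumption: pixels are numbered row by row, so m = 0 is the top-left pixel.
    """
    side = 2 * radius + 1
    if not 0 <= m < side * side:
        raise ValueError("pixel index outside the Search Window")
    row, col = divmod(m, side)
    return col - radius, row - radius


def make_edge(i: int, m: int, j: int, n: int) -> Edge:
    """Edge D_{i,m-j,n}; edges only join consecutive letters (j = i + 1)."""
    if j != i + 1:
        raise ValueError("edges connect consecutive letters only")
    return (i, m), (j, n)
```

With `radius = 1`, pixel 4 is the centre of the 3 x 3 window and maps to the zero translation `(0, 0)`.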
Following this nomenclature, Figure 32 presents the general structure of the graph that represents the word segmentation problem.
Figure 32: Scheme of the graph representation of the segmentation problem
Figure 32 depicts a word with $q$ letters. It also shows that the Search Windows associated with the letters can differ in size: $n$ pixels in the first letter's SW, $m$ pixels in the second letter's SW, and $t$ pixels in the last letter's SW. Equivalently, these are the numbers of local translations available to the respective templates.
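The size of this graph follows directly from its structure. The count below is our own arithmetic, not a figure taken from the thesis. Write $s_k$ for the number of pixels in the SW of letter $k$, so that $s_1 = n$, $s_2 = m$ and $s_q = t$ in Figure 32. Counting the source and sink, plus the complete connections between consecutive levels, gives:

```latex
|V| = 2 + \sum_{k=1}^{q} s_k, \qquad
|E| = s_1 + \sum_{k=1}^{q-1} s_k \, s_{k+1} + s_q .
```

For instance, a three-letter word with 3 x 3 Search Windows ($s_k = 9$) gives $|V| = 29$ and $|E| = 9 + 81 + 81 + 9 = 180$. Because the product terms dominate, the edge count grows with the square of the window area, and this growth is what drives the cost of the shortest-path search.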
In a direct interpretation of the graph, the NCC results would be associated with the nodes, because each NCC value measures the match between the warped template and the word for the translation represented by that node. To work within standard graph theory, however, all costs must be placed on the edges rather than on the nodes, so the NCC results are transferred to the edges. Consequently, the cost of the edge connecting two nodes accounts for the NCC result of the parent node and the distance associated with the parent-child connection.
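The transfer of node costs onto edges can be sketched as follows. Several details are assumptions not specified in the thesis. NCC similarity is turned into a dissimilarity as 1 - NCC; because NCC lies in [-1, 1], this keeps every weight non-negative, as Dijkstra's algorithm requires. The gap between letters is measured as the absolute horizontal distance between the end of one template and the start of the next. Finally, the weight `lam` is a hypothetical trade-off parameter. Source edges carry no cost, and sink edges carry only the last letter's matching cost, which follows the "cost of the parent node" rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """One graph node: a letter template at one translation in its Search Window."""
    ncc: float    # NCC score of the template at this translation
    start_x: int  # column where the placed template begins
    end_x: int    # column where the placed template ends


def edge_cost(parent: Optional[Placement], child: Optional[Placement],
              lam: float = 0.1) -> float:
    """Cost of the directed edge parent -> child (hypothetical weighting).

    parent is None for edges leaving the source;
    child is None for edges entering the sink.
    """
    if parent is None:
        return 0.0  # source edges carry no matching cost
    match_cost = 1.0 - parent.ncc  # NCC moved from the parent node onto its edge
    if child is None:
        return match_cost  # sink edges carry only the last letter's cost
    gap = abs(child.start_x - parent.end_x)  # distance between consecutive templates
    return match_cost + lam * gap
```

A parent with NCC 0.9 ending at column 10, followed by a child starting at column 12, gives a cost of about 0.1 + 0.1 x 2 = 0.3.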
The segmentation solution is obtained on the graph representation by means of Dijkstra's shortest-path algorithm. The shortest path jointly minimizes the dissimilarity of the handwritten letters in the word with respect to the modified warped templates (based on the NCC results) and optimizes the spatial coherence between the letters (based on the distance between consecutive letters).
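The shortest-path step can be sketched with a standard binary-heap version of Dijkstra's algorithm built on Python's `heapq`. The thesis names Dijkstra but does not describe its implementation, so this is an illustrative sketch. The toy graph represents a two-letter word with two placements per letter, and its weights are made up rather than computed from real NCC scores.

```python
import heapq
from typing import Any, Dict, List, Tuple

Node = Any  # any hashable key, e.g. (letter, pixel) tuples or "source"/"sink"
Graph = Dict[Node, List[Tuple[Node, float]]]


def dijkstra_path(graph: Graph, source: Node, sink: Node) -> Tuple[float, List[Node]]:
    """Return (total cost, node sequence) of the cheapest source-to-sink path.

    Edge weights must be non-negative, as Dijkstra's algorithm requires.
    """
    dist: Dict[Node, float] = {source: 0.0}
    prev: Dict[Node, Node] = {}
    heap = [(0.0, 0, source)]  # (distance, insertion counter, node)
    counter = 1                # tie-breaker so node keys are never compared
    settled = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u == sink:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    if sink not in dist:
        raise ValueError("sink is unreachable from source")
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[sink], path[::-1]


# Toy two-letter word, two placements per letter; weights are made up.
toy: Graph = {
    "source": [((0, 0), 0.0), ((0, 1), 0.0)],
    (0, 0): [((1, 0), 0.5), ((1, 1), 0.9)],
    (0, 1): [((1, 0), 0.2), ((1, 1), 0.6)],
    (1, 0): [("sink", 0.4)],
    (1, 1): [("sink", 0.1)],
}
```

Calling `dijkstra_path(toy, "source", "sink")` returns a cost of approximately 0.6 along `["source", (0, 1), (1, 0), "sink"]`, so letter 0 is placed at pixel 1 and letter 1 at pixel 0. Because this graph is acyclic and organized in levels, relaxing the edges level by level (dynamic programming in topological order) would find the same optimal path. Dijkstra is simply the method the text names.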