5 ALGORITHM KEY ELEMENTS

5.3.2.3 Inter-letters distance cost

a) Distance cost preliminaries

The cost associated with the distance between consecutive letters, which depends on where each letter is located in the word, is obtained by evaluating the measured spatial distance with a cost function. The raw value is not used directly, so that its contribution to the final result can be controlled.

In the first place, it is necessary to accept a certain distance margin between letters without any penalization. This can be explained by comparing the letter templates that are used with the handwriting of complete words. In cursive handwriting, consecutive letters are linked by strokes, whereas the isolated letters in the templates do not include these lateral links.

Figure 55: Comparison between the letter “o” in the middle of the word “lion” (left) and isolated (right)

The approximate link distance between letters has been measured manually for a set of handwritten words taken from the reference word images. Each measurement is expressed as a ratio with respect to the mean letter width, so that it does not depend on the display characteristics or on scaling. The mean letter width is the quotient of the total word width and the number of letters in the word:

\[
\delta = \frac{\text{link distance}}{\text{letter width}} = \frac{\text{link distance}}{\text{word width} / \text{number of letters}}
\]

All the magnitudes measured and compared in this step are Euclidean distances, i.e., the length of the straight line that connects the end of one letter with the beginning of the following one.
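As an illustration of the ratio defined above, the following minimal sketch computes it from two reference points. The function name `link_ratio`, its parameters, and the pixel values are hypothetical and only serve the example; they are not taken from the original implementation.

```python
import math


def link_ratio(end_point, start_point, word_width, num_letters):
    """Return the link distance as a fraction of the mean letter width."""
    # Euclidean distance between the end of one letter and the start of the next.
    link_distance = math.dist(end_point, start_point)
    # Mean letter width: total word width divided by the number of letters.
    mean_letter_width = word_width / num_letters
    return link_distance / mean_letter_width


# Hypothetical 4-letter word, 200 px wide, with a 15 px link between two letters.
ratio = link_ratio((100.0, 40.0), (112.0, 49.0), word_width=200.0, num_letters=4)
print(round(ratio, 2))
```

Because both the link distance and the word width scale with the image, multiplying every coordinate and the word width by the same factor leaves the ratio unchanged.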

To measure the distance between letters, it is necessary to know where each letter starts and ends. In this step, these limits are marked by hand in an approximate way, using the reference points defined for the templates. A large amount of data is collected so that the statistical results are not dominated by the error of this approximation.

Statistical results for a set of 𝑁 = 80 samples are presented in Table 5:

Statistic   Symbol   Value
Max         𝜹̅        0.67
Min         𝜹        0.07
Mean        𝜹̂        0.34
Median      𝜹̌        0.33
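Summary statistics of this kind can be obtained with a few lines of code. The sketch below uses a short list of hypothetical link ratios chosen only for illustration; they are not the 80 measurements behind Table 5.

```python
import statistics

# Hypothetical link ratios (illustrative only, NOT the data behind Table 5).
ratios = [0.10, 0.25, 0.30, 0.35, 0.50]

summary = {
    "max": max(ratios),
    "min": min(ratios),
    "mean": statistics.mean(ratios),
    "median": statistics.median(ratios),
}

for name, value in summary.items():
    print(f"{name:<6} {value:.2f}")
```

When the mean and the median nearly coincide, as in this toy list, the distribution is not strongly skewed by a few extreme measurements.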

Section 5.3. Graph-based word segmentation 79

The fact that the median and the mean are so close supports the decision to use this reference value in the algorithm. For an inter-letter distance below the mean value, no penalization is applied. With this decision we accept that consecutive templates may be matched on the word separated by this offset. For distances above this threshold (approximately 0.33), on the other hand, the associated cost increases. The cost also increases for negative distances, because a negative value means that the reference point at the beginning of a template is found further to the left in the image than the reference point at the end of the previous letter. Because the reference points belong to the letter contours, this situation always corresponds to a bad positioning of the templates. Consequently, negative distances between points are evaluated with a high cost.

The particular functions that evaluate the link ratios below and above the selected threshold of 𝛿 = 0.33 are defined empirically. Several options have been tested, and we have selected the function that resulted in the best segmentation of the word.

In summary, the measured distances are expressed as distance ratios and are then evaluated with a custom-designed function.

b) Distance cost calculation

To turn the previous decisions into a mathematical function that evaluates the relative position between letters, a piecewise function is designed. This function and the constraints that have led to its design are presented below. The output range of this function is also adjusted so that it is comparable and coherent with the results of the NCC evaluation.

Because the magnitudes are divided by the estimated letter width, they are less sensitive to changes in the scale of the whole word, so this phase behaves more uniformly across all words.

i) Working range of values

Unlike the NCC evaluation, the distance between consecutive letters already has an intrinsic cost meaning: the further apart the letters are, the larger the distance magnitude and the worse the solution. The data in Table 5 are also used in this step to determine the typical values obtained when the distance between reference points is computed.

ii) Negative: Exponential function (I)

The reference points of consecutive letters must fulfil the following main spatial constraint: along the X axis, the right reference point of one letter must be located before the start point of the following letter. Therefore, negative inter-letter distances have to be penalized heavily, because they always correspond to a bad match of one of the two letters involved (or both). This penalization must lead the graph-based solution to consider other template shifts that also have a favourable NCC cost and for which the relative distance takes a feasible value.

For this purpose, we use an exponential function with a large base:

\[
f_{I}(x) = 20^{|x|}, \quad \text{if } x < 0
\]

Strictly speaking, values below the statistical minimum should also be penalized, even though they are positive. However, the measured minimum (0.07) is close to zero, so the interval between 0 and this minimum is not treated separately.
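The negative branch can be sketched as follows. The sign convention (negative when the next letter starts to the left of the end of the previous one) follows the description in the text, but the helper names `signed_link_ratio` and `cost_negative`, as well as the coordinates, are assumptions made for this example.

```python
import math


def signed_link_ratio(prev_end, next_start, mean_letter_width):
    """Link ratio with a sign: negative when the next letter starts to the
    left of the point where the previous letter ends (assumed convention)."""
    distance = math.dist(prev_end, next_start)
    sign = -1.0 if next_start[0] < prev_end[0] else 1.0
    return sign * distance / mean_letter_width


def cost_negative(x):
    """Exponential penalty with base 20, intended for negative ratios only."""
    return 20.0 ** abs(x)


# Hypothetical overlap: the next template starts 15 px to the left of the
# end of the previous one, with a mean letter width of 50 px.
overlap = signed_link_ratio((120.0, 40.0), (105.0, 40.0), mean_letter_width=50.0)
print(overlap, cost_negative(overlap))
```

With base 20, an overlap of a full letter width (ratio of -1) already costs 20, while a ratio of 0 would give 1, the same value as the constant section that follows.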

iii) Low values: Constant function (I)

Also according to Table 5, a link width up to the mean ratio (0.34, practically equal to the median of 0.33) should not carry any penalty:

\[
f_{II}(x) = 1, \quad \text{if } 0 < x < 0.34
\]

The cost is not set exactly to 0 because a perfect NCC match does not occur: the maximum achieved for each pair of images is usually around 0.7, so a null NCC cost is never obtained. Consequently, for ratios within this optimum interval the cost is set to the constant value 1. In this way, the best NCC solution and the best distance solution contribute with a similar magnitude, and neither of them forces the solution towards a less desirable result.

iv) Medium values: Exponential function (II)

The same statistical results collected in Table 5 show a maximum link ratio of approximately 0.66. Consequently, the penalization of link magnitudes below this threshold must not be very severe either, so that the associated template placements can still be selected as an option: they may be correct. Nonetheless, the probability of an incorrect placement in this interval is much higher than in the previous distance range. For this reason, the penalization is larger than in the previous case.

In this interval, the larger the distance, the less probable it is that the selected letter positioning is accurate. To assign the cost according to these facts, the following function is suggested:

\[
f_{III}(x) = 2^{(x - a)}, \quad \text{if } 0.34 < x < 0.66
\]

Continuity should be guaranteed over the entire cost function, so that the evaluated ratios produce smooth results. To force continuity between these two consecutive sections of the graph, we impose:

\[
f_{III}(x_l = 0.34) = f_{II}(x_l) = 1
\]

\[
f_{III}(x_l = 0.34) = 2^{(0.34 - a)} = 1 \;\rightarrow\; a = 0.34
\]

Finally, the second exponential function of the piecewise distance function is:

\[
f_{III}(x) = 2^{(x - 0.34)}, \quad \text{if } 0.34 < x < 0.66
\]

v) High values: Exponential function (III)

We consider that every link ratio above the statistically determined maximum of 0.66 corresponds to poor letter identification. Consequently, we want the cost associated with magnitudes above this threshold to increase strongly. With this objective, while still working in a range of values that is coherent with the overall cost context, we suggest the following function:

\[
f_{IV}(x) = 10^{(x - a)}, \quad \text{if } x > 0.66
\]

Once more, to guarantee continuity with the previous section of the graph, we impose:

\[
f_{IV}(x_l = 0.66) = f_{III}(x_l) = 2^{(0.66 - 0.34)} \approx 1.2483
\]

Finally, we obtain the last piece of the cost function for the distance ratio between letters:

\[
f_{IV}(x) = 10^{(x - 0.537)}, \quad \text{if } x > 0.66
\]
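The continuity conditions can be checked numerically. The sketch below solves both conditions for the offset 𝑎 and evaluates the last section at the threshold; the function and variable names are assumptions made for this example. Under these conditions, exact continuity at 𝑥 = 0.66 corresponds to an offset close to 0.564, whereas the offset 0.537 stated above evaluates to about 1.327 at the threshold, slightly above the target value of 1.2483.

```python
import math

LOWER, UPPER = 0.34, 0.66  # thresholds used in the text


def f_iii(x, a=0.34):
    """Medium-range section: 2 ** (x - a)."""
    return 2.0 ** (x - a)


def f_iv(x, a):
    """High-range section: 10 ** (x - a)."""
    return 10.0 ** (x - a)


# Offset of f_III from the condition 2 ** (LOWER - a) = 1.
a_iii = LOWER - math.log2(1.0)

# Value that f_IV has to reach at the upper threshold.
target = f_iii(UPPER, a_iii)

# Offset of f_IV from the condition 10 ** (UPPER - a) = target.
a_iv_exact = UPPER - math.log10(target)

print(round(target, 4))              # value of f_III at x = 0.66
print(round(a_iv_exact, 3))          # offset that makes f_IV continuous
print(round(f_iv(UPPER, 0.537), 3))  # f_IV at x = 0.66 with the stated offset
```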

vi) Piecewise distance cost function

The functions determined above are combined into the complete piecewise function that evaluates a link distance ratio and returns its associated cost. This piecewise function covers the whole range of the distance ratio 𝑥:

\[
f(x) =
\begin{cases}
20^{|x|}, & \text{if } x < 0 \\
1, & \text{if } 0 < x < 0.34 \\
2^{(x - 0.34)}, & \text{if } 0.34 < x < 0.66 \\
10^{(x - 0.537)}, & \text{if } x > 0.66
\end{cases}
\]

Figure 56: Graphical representation of the inter-letter distances ratio evaluation
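A minimal sketch of the complete piecewise function is given below. The constants are the ones listed in the text; the function name `distance_cost` and the assignment of the boundary points (0, 0.34 and 0.66) to the lower section are assumptions, since the text states the intervals with strict inequalities.

```python
def distance_cost(x):
    """Piecewise cost of a link distance ratio x, using the constants in the text."""
    if x < 0:
        return 20.0 ** abs(x)      # negative ratios: strong penalty
    if x <= 0.34:
        return 1.0                 # expected link widths: constant cost
    if x <= 0.66:
        return 2.0 ** (x - 0.34)   # plausible but less likely links
    return 10.0 ** (x - 0.537)     # beyond the observed maximum


# Evaluate a few ratios across the four sections.
for ratio in (-0.5, 0.0, 0.2, 0.5, 0.66, 1.0, 2.0):
    print(f"{ratio:5.2f} -> {distance_cost(ratio):.3f}")
```

The cost stays at 1 throughout the expected range, grows slowly up to the observed maximum ratio, and then increases by a factor of 10 for every additional unit of letter width.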

In document Graph-based segmentation of letters in handwriting words with Lucas Kanade template warping (Page 96-99)