5 ALGORITHM KEY ELEMENTS
5.2 Template warping through Lucas Kanade optimization algorithm
5.2.3 Template Matching with Lucas Kanade optimization algorithm: general
5.3.2.3 Inter-letters distance cost
a) Distance cost preliminaries
The cost associated with the distance between letters, which depends on where the letters are identified, is obtained by evaluating the real spatial magnitude through a function. The raw value is not used directly, so that its contribution to the final result can be controlled.
First, a certain distance margin between letters must be accepted without penalization. This follows from comparing the letter templates with the handwriting of complete words. In cursive handwriting, consecutive letters are linked by strokes, whereas the isolated letters used as templates do not include these lateral links.
Figure 55: Comparison between the letter "o" in the middle of the word "lion" (left) and isolated (right)
The approximate link distance between letters has been measured manually for a set of handwritten words from the reference word images. Each measurement is expressed as a ratio with respect to the mean letter width, so that it does not depend on the visualization characteristics or scaling. The mean letter width is the quotient between the total word width and the number of letters in the word:
$$\delta = \frac{\text{link distance}}{\text{letter width}} = \frac{\text{link distance}}{\text{word width} / \text{number of letters}}$$
All the magnitudes measured and compared in this step are lengths of the straight line that connects the end of one letter to the beginning of the following one, i.e., Euclidean distances.
To measure the distance between letters, it is necessary to know where the letters start and end. In this step, this is done approximately by hand, using as limits the reference points defined for the templates. A large amount of data is collected so that the statistical results are not affected by the imprecision of this approximation.
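The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the pixel coordinates are hypothetical, and the reference points are assumed to be available as (x, y) pairs. The sketch covers the unsigned, hand-measured case only.

```python
import math


def mean_letter_width(word_width: float, n_letters: int) -> float:
    """Mean letter width: total word width divided by the number of letters."""
    if n_letters <= 0:
        raise ValueError("a word must contain at least one letter")
    return word_width / n_letters


def link_distance_ratio(end_point, start_point, word_width, n_letters):
    """Euclidean link distance between two reference points, expressed as a
    ratio of the mean letter width (hypothetical helper, unsigned)."""
    link = math.dist(end_point, start_point)  # straight-line distance
    return link / mean_letter_width(word_width, n_letters)


# Hypothetical example: a 4-letter word that is 200 px wide, so the mean
# letter width is 50 px; the two reference points are 25 px apart.
ratio = link_distance_ratio((48, 30), (63, 50), word_width=200, n_letters=4)
print(ratio)
```

Because the link length is divided by a width measured on the same image, rescaling the whole word leaves the ratio unchanged.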
Statistical results for a set of n = 80 samples are presented in Table 5:

Statistic    Value
Max δ        0.67
Min δ        0.07
Mean δ       0.34
Median δ     0.33
Section 5.3. Graph-based word segmentation 79
The closeness of the median and mean values supports the decision to use this reference value in the algorithm. For a distance ratio between letters below the mean value, no penalization is applied. With this decision we accept that consecutive templates may be matched on the word separated by this offset. For distances above this threshold (0.33), on the other hand, the associated cost is increased. The same applies to negative distances: a negative value means that the reference point at the beginning of a template lies further to the left in the image than the reference point at the end of the previous letter. Because the reference points belong to the letter contours, this situation always corresponds to a bad positioning of the templates. Consequently, negative distances between points are evaluated with a high cost.
The particular functions that evaluate the link ratios below and above the selected threshold of δ = 0.33 are defined empirically. Several options have been tested, and the function that produced the best word segmentation has been selected.
In summary, the measured distances are expressed as distance ratios and then evaluated with a custom-designed function.
b) Distance cost calculation
To turn the previous decisions into a mathematical function that evaluates the relative position between letters, a piecewise function is designed. This function and the constraints that have led to its design are presented below. The output range of this function is also adjusted to be comparable and coherent with the NCC evaluation results.
Because the magnitudes are divided by the estimated letter width, they are less sensitive to changes in the scaling of the whole word, so this phase behaves more uniformly across words.
i) Working range of values
Unlike the NCC evaluation, the distance between consecutive letters already has an intrinsic cost meaning: the further apart the letters are, the larger the distance magnitude and the worse the solution. Data from Table 5 are also used in this step to determine the typical values obtained when the distance between reference points is computed.
ii) Negative: Exponential function (I)
The reference points of consecutive letters must fulfil the following main spatial constraint: the right reference point of one letter must be located on the X axis before the start point of the following letter. Therefore, negative inter-letter distances have to be particularly penalized, because they always correspond to a bad matching of one of the two letters involved (or both). This penalization must lead the graph-based solution to consider other template shifts that also have a favourable NCC cost and for which the relative distance takes a feasible value.
For this purpose, we use an exponential function with a large base:
$$f_{I}(x) = 20^{\mathrm{abs}(x)}; \quad \text{if } x < 0$$
Strictly, positive values below the statistical minimum should also be penalized. However, the statistical data show a minimum that is approximately null, so the interval between 0 and the statistical minimum is not treated separately.
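As a rough illustration of how quickly this penalization grows, the exponential with base 20 applied to the absolute value of a negative ratio gives the following values (rounded to two decimals; the sample ratios are chosen only for illustration):

$$f_{I}(-0.1) = 20^{0.1} \approx 1.35, \qquad f_{I}(-0.5) = 20^{0.5} \approx 4.47, \qquad f_{I}(-1) = 20^{1} = 20$$

As the ratio approaches 0 from the negative side, the cost tends to 1, which matches the constant cost assigned to small positive ratios.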
iii) Low values: Constant function (I)
Also according to Table 5, link widths up to a ratio of 0.33 should not carry any associated penalization.
$$f_{II}(x) = 1; \quad \text{if } 0 < x < 0.34$$
The cost is not set exactly to 0 because a perfect NCC match does not occur: the maximum achieved for each pair of images is usually around 0.7, so a null NCC cost is never obtained. Consequently, for ratios in this optimal interval the cost is set to the constant 1. In this way, the best NCC solution and the best distance solution contribute with a similar magnitude, and neither of them forces the solution towards a less desirable result.
iv) Medium values: Exponential function (II)
The same statistical results collected in Table 5 show a maximum link distance of 0.66. Consequently, the penalization of link magnitudes below this threshold must not be very severe either, so that the associated template placements remain selectable: they may be correct. Nonetheless, the probability of an incorrect placement in this interval is much higher than in the previous distance range. For this reason, the penalization is larger than in the previous case.
In this interval, the larger the distance, the less probable it is that the selected letter positioning is accurate. To assign the cost accordingly, the following function is suggested:
$$f_{III}(x) = 2^{(x - a)}; \quad \text{if } 0.34 < x < 0.66$$
Continuity of the entire cost function is required so that the evaluated ratios yield smooth results. To force continuity between these two consecutive sections of the graph, we impose:
$$f_{III}(x_c = 0.34) = f_{II}(x_c) = 1$$
$$f_{III}(x_c = 0.34) = 2^{(0.34 - a)} = 1 \;\Rightarrow\; a = 0.34$$
Finally, the function for this section of the piecewise distance function is:
$$f_{III}(x) = 2^{(x - 0.34)}; \quad \text{if } 0.34 < x < 0.66$$
v) High values: Exponential function (III)
We consider that all link ratios above the statistically determined maximum of 0.66 correspond to poor letter identification. Consequently, we want to strongly increase the cost associated with magnitudes above this threshold. With this objective, while still working in a range of values coherent with the whole cost context, we suggest the following function:
$$f_{IV}(x) = 10^{(x - b)}; \quad \text{if } x > 0.66$$
Once more, to guarantee continuity with the previous section of the graph, we impose:
$$f_{IV}(x_c = 0.66) = f_{III}(x_c) = 2^{(x_c - 0.34)} = 1.2483$$
Finally, we obtain the last section of the cost function for the distance ratios between letters:
$$f_{IV}(x) = 10^{(x - 0.537)}; \quad \text{if } x > 0.66$$
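The continuity conditions used for the two exponential sections can be checked numerically. The following is a minimal sketch rather than the implementation used in this work, and the helper name `continuity_offset` is hypothetical. Under these assumptions, the condition at 0.34 returns an offset of 0.34, while the condition at 0.66 gives an offset of roughly 0.564. With the offset 0.537, the last section evaluates to about 1.33 at a ratio of 0.66, slightly above the 1.2483 reached by the previous section, so a small jump may remain at that boundary.

```python
import math


def continuity_offset(base: float, x_c: float, target: float) -> float:
    """Solve base ** (x_c - offset) == target for offset (hypothetical helper)."""
    return x_c - math.log(target, base)


# Medium values: 2 ** (0.34 - a) must equal the constant cost 1.
a = continuity_offset(2.0, 0.34, 1.0)

# High values: 10 ** (0.66 - b) must equal the medium-value cost at 0.66.
f3_at_join = 2.0 ** (0.66 - 0.34)
b = continuity_offset(10.0, 0.66, f3_at_join)

# Value of the last section at 0.66 when the offset 0.537 is used instead.
f4_with_stated_offset = 10.0 ** (0.66 - 0.537)

print(a, f3_at_join, b, f4_with_stated_offset)
```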
vi) Piecewise distance cost function
The functions determined above are combined into the complete piecewise function that evaluates the link distance ratios and returns their associated cost. This piecewise function covers all possible values of the distance ratio:
$$f(x = \text{distance ratio}) = \begin{cases} 20^{\mathrm{abs}(x)}; & \text{if } x < 0 \\ 1; & \text{if } 0 < x < 0.34 \\ 2^{(x - 0.34)}; & \text{if } 0.34 < x < 0.66 \\ 10^{(x - 0.537)}; & \text{if } x > 0.66 \end{cases}$$
Figure 56: Graphical representation of the inter-letter distances ratio evaluation
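The piecewise cost can be written compactly in code. The sketch below is illustrative only: the function name `distance_cost` is hypothetical, and because the definition uses strict inequalities, assigning the boundary values 0, 0.34 and 0.66 to a particular section is an assumption made here so that every input returns a value.

```python
def distance_cost(x: float) -> float:
    """Cost of an inter-letter distance ratio x (hypothetical sketch)."""
    if x < 0:
        # Negative ratios: consecutive templates overlap, strong penalization.
        return 20.0 ** abs(x)
    if x < 0.34:
        # Low values: constant cost, no penalization (x == 0 assigned here).
        return 1.0
    if x <= 0.66:
        # Medium values: mild exponential growth (both boundaries assigned here).
        return 2.0 ** (x - 0.34)
    # High values: steeper exponential growth.
    return 10.0 ** (x - 0.537)


# Evaluate the cost for a few sample ratios chosen only for illustration.
for sample in (-0.5, 0.0, 0.2, 0.5, 0.66, 1.0):
    print(sample, round(distance_cost(sample), 4))
```

Keeping the thresholds and offsets as literals mirrors the definition directly; in practice they could be exposed as parameters so that alternative empirical functions are easy to test.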