Feature Normalisation - Object Merging Using Characteristic Rules

5.6 Object Merging Using Characteristic Rules

5.6.3 Feature Normalisation

Experimentally, every image tested has a variable number of RMBRs, and the number of feature vectors associated with each image depends on the number of RMBRs. In general, the numerical value of a feature v depends on the units used, i.e. on the scale. If v is multiplied by a scale factor a, then both the mean and the standard deviation are multiplied by a (and the variance is multiplied by a²). It is desirable to scale the data so that the resulting standard deviation is unity. Traditionally, this is done by dividing v by the standard deviation s. Similarly, in measuring the distance from v to µ (where µ is the mean of the respective feature), it often makes sense to measure it relative to the standard deviation. The mean µ and the standard deviation s (written σ in Equation 5.17) of a feature over all input samples are computed as in Equations 5.16 and 5.17:

$$\mu = \frac{1}{M} \sum_{i=1}^{M} x_i \tag{5.16}$$

$$\sigma = \sqrt{\frac{1}{M-1} \sum_{i=1}^{M} (x_i - \mu)^2} \tag{5.17}$$

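where M is the number of input samples. The calculation is performed for all features, resulting in N means and N standard deviations, where N is the number of features used. From a set of feature vectors I = (V1, V2, ..., VM), with V = (v1, v2, ..., vN)^T representing one feature vector, each feature value is normalised as

$$\tilde{v}_i = \frac{v_i - \mu}{s} \quad \forall\, i = 1, 2, \ldots, M, \tag{5.18}$$

where v_i here denotes the value of the feature in the i-th sample.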
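As a minimal sketch of Equations 5.16 to 5.18, the normalisation of one feature over M samples could be written as follows. The function name `normalise_feature` and the sample values are illustrative assumptions, not taken from the original implementation; `statistics.stdev` is used because it applies the M − 1 denominator of Equation 5.17.

```python
import statistics

def normalise_feature(values):
    """Return (v_i - mu) / s for one feature over all M samples (Equation 5.18)."""
    mu = statistics.mean(values)   # Equation 5.16
    s = statistics.stdev(values)   # Equation 5.17, M - 1 denominator
    return [(v - mu) / s for v in values]

# Hypothetical values of a single feature for M = 5 samples.
lengths = [2.0, 4.0, 4.0, 5.0, 10.0]
z = normalise_feature(lengths)

# Rescaling the raw feature (for example a change of units) should leave
# the normalised values unchanged, since mean and deviation scale together.
z_scaled = normalise_feature([100.0 * v for v in lengths])
```

After normalisation the feature has zero mean and unit standard deviation, and `z` and `z_scaled` agree up to rounding, which illustrates why the choice of units no longer matters.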

The same procedure is carried out on all features. An N-dimensional hierarchical clustering is performed using v = (v1, v2, ..., vN) as the features of interest. Ideally, the features should first be normalised over all RMBRs to produce a mean of zero and a standard deviation of unity for each feature element. However, this usual step cannot be applied to all features here, because θ is represented in polar coordinates (see Figure 5.8(b)) with r = 1, whereas all the other features are represented in Cartesian coordinates (see Figure 5.8(a)). These angular values do not represent the “true location” of the orientation: taking the plain difference between two angles in the polar representation may in some cases result in a false distance.

Looking at the polar representation in Figure 5.8(b), the distance between θ2 = π/16 and θ3 = −π/16, which is π/8, is a true account of the distance between the two radial points. However, the difference between θ1 = 7π/16 and θ4 = −7π/16 tells the opposite story. The distance between θ1 and θ4 should be π/8, the same as that between θ2 = π/16 and θ3 = −π/16. The maximum distance d between any two points in this version of the polar representation is π/2. Taking the absolute difference between θ1 and θ4 gives 7π/8, which exceeds this maximum. Thus, a different approach to finding the distance is needed.

To find the correct distance between two θ values, π and −π translated versions of one of the two orientation features are computed (see Figures 5.9(b) and (c)). This is necessary because two orientations in exactly opposite directions (a difference of π) should be considered the same. For instance, π/8 and −7π/8 are two different values, but they represent the same orientation as far as the object-of-interest is concerned.

In order to compute the actual orientation distance dθ, three distances are calculated. The first is the absolute difference between θ1 and θ2. The second is the absolute difference between θ1 and θ2 translated by π, and the third is the absolute difference between θ1 and θ2 translated by −π. These three distances are illustrated in Figure 5.9. The minimum of the three values is taken to represent the absolute distance |dθ| between the two orientations under consideration, θ1 and θ2. However, for the calculation of a distance matrix, the actual value dθ is used. Equation 5.19 further explains the method.

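$$|d_\theta| = \min\left(|\theta_1 - \theta_2|,\; |\theta_1 - (\theta_2 + \pi)|,\; |\theta_1 - (\theta_2 - \pi)|\right) \tag{5.19}$$

[Figure 5.8: panel (a) shows axes x, y and z; panel (b) is marked at π/2 and −π/2.]

Figure 5.8: Coordinate systems: a) Cartesian and b) polar.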


Figure 5.9: The three possibilities for the actual orientation distance between θ1 and θ2: a) difference between θ1 and θ2, b) difference between θ1 and the π-translated version of θ2, and c) difference between θ1 and the −π-translated version of θ2.
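The orientation distance of Equation 5.19 can be sketched in a few lines. This is an illustration only: the function name `orientation_distance` is an assumption, angles are assumed to be given in radians within the range −π/2 to π/2, and only the absolute distance |dθ| is computed. The values used are those of the example around Figure 5.8(b).

```python
import math

def orientation_distance(theta1, theta2):
    """Absolute orientation distance |d_theta| as in Equation 5.19."""
    return min(
        abs(theta1 - theta2),              # plain difference
        abs(theta1 - (theta2 + math.pi)),  # theta2 translated by +pi
        abs(theta1 - (theta2 - math.pi)),  # theta2 translated by -pi
    )

# theta2 = pi/16 and theta3 = -pi/16: the plain difference is already the minimum.
near = orientation_distance(math.pi / 16, -math.pi / 16)

# theta1 = 7*pi/16 and theta4 = -7*pi/16: the pi-translated difference is the minimum.
wrapped = orientation_distance(7 * math.pi / 16, -7 * math.pi / 16)

# The naive absolute difference for the same pair exceeds the pi/2 maximum.
naive = abs(7 * math.pi / 16 - (-7 * math.pi / 16))

# pi/8 and -7*pi/8 describe the same orientation, so their distance vanishes.
same = orientation_distance(math.pi / 8, -7 * math.pi / 8)
```

Both `near` and `wrapped` evaluate to π/8, whereas `naive` is 7π/8, which is the false distance the three-way minimum is designed to avoid.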

A slight complication occurs at this point because, in traditional techniques as well as in this initial approach, features are normalised before the distance is calculated for clustering. The unusual way in which the distance between orientations is evaluated requires the orientation feature to be normalised after the difference is computed. Implementation-wise, the distance is calculated first and is then normalised with respect to the standard deviation of the respective feature. Let {x1, x2, x3, ..., xm} be a set of data vectors representing feature values, where m is the number of data points and x = (x1, x2, ..., xn), with n being the number of features. The traditional approach to finding the normalised distance for feature i first normalises the data with regard to the mean µi and standard deviation σi, calculated as

$$\frac{x_{(1,i)} - \mu_i}{\sigma_i},\; \frac{x_{(2,i)} - \mu_i}{\sigma_i},\; \ldots,\; \frac{x_{(m,i)} - \mu_i}{\sigma_i} \tag{5.20}$$

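for all i = 1, 2, ..., n, which results in a set of normalised data $\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m\}$. Next, the distances between all possible data pairs are computed as

$$|\tilde{x}_{(1,i)} - \tilde{x}_{(2,i)}|,\; \ldots,\; |\tilde{x}_{(1,i)} - \tilde{x}_{(m,i)}|,\; |\tilde{x}_{(2,i)} - \tilde{x}_{(3,i)}|,\; \ldots,\; |\tilde{x}_{(m-1,i)} - \tilde{x}_{(m,i)}| \tag{5.21}$$

for all i = 1, 2, ..., n, which gives the normalised distance matrix represented by $\{\tilde{d}_1, \tilde{d}_2, \ldots, \tilde{d}_{m(m-1)/2}\}$, with $\tilde{d} = (\tilde{d}_1, \tilde{d}_2, \ldots, \tilde{d}_n)$.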
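The traditional order of operations in Equations 5.20 and 5.21 (normalise first, then take pairwise differences) might look like the sketch below. The helper name `normalised_pairwise_distances` and the data values are hypothetical; pairs are enumerated in the order (1,2), ..., (1,m), (2,3), ..., (m−1,m) used in Equation 5.21.

```python
import statistics
from itertools import combinations

def normalised_pairwise_distances(column):
    """For one feature i: normalise (Eq. 5.20), then take |x~_a - x~_b| (Eq. 5.21)."""
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)
    x_tilde = [(x - mu) / sigma for x in column]
    return [abs(a - b) for a, b in combinations(x_tilde, 2)]

# Hypothetical data: m = 4 samples, n = 2 Cartesian features.
data = [(1.0, 10.0), (2.0, 30.0), (4.0, 20.0), (5.0, 40.0)]

columns = list(zip(*data))                    # one tuple of m values per feature
per_feature = [normalised_pairwise_distances(c) for c in columns]
distance_rows = list(zip(*per_feature))       # m(m-1)/2 rows, n columns
```

For m = 4 this yields 6 rows of n = 2 normalised distances each, one row per data pair.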

Using any distance metric, a distance vector representing all possible distance pairs can be computed by using the schemas explained in Section 5.6.2.4.

In the modified version, the distances between all possible data pairs are computed first, and these distances are then normalised.

The distances between all possible data pairs are calculated to give an (m(m−1)/2) × n matrix $d = \{d_{(1,i)}, d_{(2,i)}, \ldots, d_{(m(m-1)/2,\,i)}\}$. Normalising a feature requires subtracting the mean and then dividing by the standard deviation of the feature (see Equation 5.18). When the difference between a data pair is taken, the translation term, which is the mean, does not contribute to the end result, since it cancels out. Thus, only the scaling factor, which is the standard deviation of the feature, is required in the computation of the normalised distance. Equation 5.22 shows mathematically how this is done for the first normalised distance $\tilde{d}_1$.

$$\tilde{d}_{(1,i)} = \frac{x_{(1,i)} - \mu_i}{\sigma_i} - \frac{x_{(2,i)} - \mu_i}{\sigma_i} = \frac{x_{(1,i)} - x_{(2,i)}}{\sigma_i} = \frac{d_{(1,i)}}{\sigma_i} \tag{5.22}$$

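Thus, to obtain the normalised distances from a vector of distances, each member is simply divided by the standard deviation σi of the respective feature:

$$\left\{ \frac{d_{(1,i)}}{\sigma_i},\; \frac{d_{(2,i)}}{\sigma_i},\; \ldots,\; \frac{d_{(m(m-1)/2,\,i)}}{\sigma_i} \right\} = \left\{ \tilde{d}_{(1,i)},\; \tilde{d}_{(2,i)},\; \ldots,\; \tilde{d}_{(m(m-1)/2,\,i)} \right\}$$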
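The modified order (distances first, then division by σi) can be sketched as below, together with a check that it matches the normalise-first result for an ordinary Cartesian feature, as Equation 5.22 implies. All names and data values are hypothetical. Two assumptions are made loudly here: absolute differences are used, as in Equation 5.21, and the standard deviation of the orientation feature is taken as the ordinary sample standard deviation of the raw angle values, which the text does not spell out.

```python
import math
import statistics
from itertools import combinations

def orientation_distance(theta1, theta2):
    """|d_theta| as in Equation 5.19 (minimum over 0, +pi and -pi translations)."""
    return min(abs(theta1 - theta2),
               abs(theta1 - (theta2 + math.pi)),
               abs(theta1 - (theta2 - math.pi)))

def distances_then_normalise(column, distance=lambda a, b: abs(a - b)):
    """Compute all pairwise distances first, then divide each by sigma_i."""
    sigma = statistics.stdev(column)  # assumption: plain sample deviation
    return [distance(a, b) / sigma for a, b in combinations(column, 2)]

# Hypothetical feature columns for m = 4 samples.
widths = [1.0, 2.0, 4.0, 5.0]
thetas = [7 * math.pi / 16, -7 * math.pi / 16, math.pi / 16, -math.pi / 16]

d_width = distances_then_normalise(widths)
d_theta = distances_then_normalise(thetas, orientation_distance)

# Reference: normalise first, then take differences (traditional order).
mu, sigma = statistics.mean(widths), statistics.stdev(widths)
x_tilde = [(x - mu) / sigma for x in widths]
d_reference = [abs(a - b) for a, b in combinations(x_tilde, 2)]
```

Here `d_width` and `d_reference` agree up to rounding because the mean cancels in each difference, while `d_theta` shows how the same routine accepts the orientation distance in place of the plain difference.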

In document Analysis of craquelure patterns for content based retrieval (Page 121-124)