3.3 Problem formulation

3.4.2 Approach

We can consider the problem formulated in (3.1) as having two parts: a vision part, in which the terrain type is learned, and a mechanical behavior part, in which the slip measurements act as supervision. The two parts are linked because they refer to the same terrain type, so both carry information about that terrain. In other words, during learning, we can use visual information to learn something about the nonlinear mechanical models and, conversely, use the mechanical slip feedback to supervise the vision-based terrain classification. Our goal now is to make those two different sets of information interact.

Figure 3.4: Schematics of the main idea of incorporating automatic ambiguous supervision into terrain classification. Examples for which the supervision signal is distinctively different can propagate this information to examples of ambiguous or noisy supervision through their similarity in vision space.

We provide a solution to (3.1) in a maximum likelihood framework. The difficulty is that classifying the terrain types in visual space and learning their slip behavior are not directly related (i.e., they are done in different, decoupled spaces), although they do refer to the same terrains. We therefore introduce hidden variables $L$ (from a multinomial distribution with parameter $\pi$) which define the class membership of each training example, similar to MoG [24]. In our case $L_{ij} = 1$ if the $i$th training example $(x_i, y_i, z_i)$ has been generated by the $j$th nonlinear model and belongs to the $j$th terrain class. Given that the label of an example is known, we assume that the mechanical measurements and the visual information are independent. So, the complete likelihood will factor as follows:

\[
P(X, Y, Z, L \mid \Theta) = P(X \mid L, \Theta)\, P(Y, Z \mid L, \Theta)\, P(L \mid \Theta),
\]

where $\Theta = \{\mu_j, \Sigma_j, \theta_j, \sigma_j, \pi_j\}_{j=1}^{K}$ contains all the unknown parameters that need to be estimated in the system (to be described below).

Figure 3.5: The graphical model for the maximum likelihood density estimation for learning from both vision and automatic mechanical supervision. The observed random variables are displayed in shaded circles.

The graphical model corresponding to this case is shown in Figure 3.5. The graphical model [25, 97] represents the assumptions about dependencies we have made for the variables in our problem. In particular, it encodes the above-mentioned assumption that the visual part of the data is independent of the mechanical data, given that the class label of the data point is known. We have also assumed that the number of terrain types $K$ is known and that we have a fixed appearance representation $x$ which is good enough for our purposes. The particular forms of the distributions $P(X \mid L, \Theta)$ and $P(Y, Z \mid L, \Theta)$ will be described below. The quantities $\pi_j = P(L_{ij} = 1 \mid \Theta)$ are the prior probabilities of each terrain class.
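The conditional independence stated by the graphical model can be illustrated by sampling from it: first draw a class label, then draw the appearance and the slip measurement independently given that label. The sketch below is only an illustration under stated assumptions. The function name `sample_terrain_data`, the uniform distribution of the input `y`, and the use of a basis-function regression for the mechanical part are choices made here, not details taken from the text.

```python
import numpy as np

def sample_terrain_data(n, pi, mus, covs, thetas, noise_std, basis, rng):
    """Draw n examples (x_i, y_i, z_i) with labels from the assumed model.

    pi:        (K,) class priors
    mus, covs: (K, d) means and (K, d, d) covariances of the vision clusters
    thetas:    (K, R+1) regression weights, bias first
    noise_std: (K,) standard deviation of the slip noise per class
    basis:     list of R scalar functions g_r (hypothetical choice)
    """
    K = len(pi)
    labels = rng.choice(K, size=n, p=pi)       # hidden class label of each example
    x = np.empty((n, mus.shape[1]))
    y = rng.uniform(0.1, 1.0, size=n)          # assumption: arbitrary input distribution
    z = np.empty(n)
    for i, j in enumerate(labels):
        # Given the label, vision and mechanical data are drawn independently.
        x[i] = rng.multivariate_normal(mus[j], covs[j])
        features = np.array([1.0] + [g(y[i]) for g in basis])
        z[i] = features @ thetas[j] + rng.normal(0.0, noise_std[j])
    return x, y, z, labels
```

Calling it with two well-separated vision clusters and two different regression curves produces data in which the vision features and the slip curves share the same hidden label, which is the structure the learning procedure relies on.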

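Using the hidden variables, the complete data likelihood function for the data $D = \{x_i, y_i, z_i\}_{i=1}^{N}$ can be written as follows¹:

\[
P(X, Y, Z, L \mid \Theta) = \prod_{i=1}^{N} \prod_{j=1}^{K} \left[ P(x_i \mid L_{ij} = 1, \Theta)\, P(y_i, z_i \mid L_{ij} = 1, \Theta)\, P(L_{ij} = 1 \mid \Theta) \right]^{L_{ij}},
\]

from which the complete log likelihood function (CL) for the data is written as follows:

\[
\begin{aligned}
CL(X, Y, Z, L \mid \Theta) = {} & \sum_{i=1}^{N} \sum_{j=1}^{K} L_{ij} \log P(x_i \mid L_{ij} = 1, \mu_j, \Sigma_j) \\
& + \sum_{i=1}^{N} \sum_{j=1}^{K} L_{ij} \log P(y_i, z_i \mid L_{ij} = 1, \theta_j, \sigma_j) \\
& + \sum_{i=1}^{N} \sum_{j=1}^{K} L_{ij} \log \pi_j.
\end{aligned}
\tag{3.2}
\]

¹ Since $L_i$ is an indicator variable for class membership of the $i$th example, $P(L_i) = \prod_{j=1}^{K} [P(L_{ij} = 1)]^{L_{ij}}$. Similarly, $P(x_i \mid L_i) = \prod_{j=1}^{K} [P(x_i \mid L_{ij} = 1)]^{L_{ij}}$ [25].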

The complete log likelihood will be optimized iteratively in the EM algorithm [32] to obtain the optimal set of parameters. As seen, the introduction of the hidden variables simplifies the problem and allows for it to be solved efficiently with the EM algorithm.
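As a rough illustration of how (3.2) is evaluated, the sketch below sums the three terms given per-class log densities that have already been computed. The function name `complete_log_likelihood` and the array layout are assumptions made for this example; it is not the implementation used in this work.

```python
import numpy as np

def complete_log_likelihood(L, log_px, log_pyz, pi):
    """Evaluate the complete log likelihood of (3.2).

    L:       (N, K) class-membership indicators L_ij
    log_px:  (N, K) values of log P(x_i | L_ij = 1, mu_j, Sigma_j)
    log_pyz: (N, K) values of log P(y_i, z_i | L_ij = 1, theta_j, sigma_j)
    pi:      (K,)   class priors pi_j
    """
    L = np.asarray(L, dtype=float)
    # Per-example, per-class sum of the vision, mechanical, and prior terms.
    per_class = np.asarray(log_px) + np.asarray(log_pyz) + np.log(np.asarray(pi))
    # Each entry contributes only in proportion to its indicator L_ij.
    return float(np.sum(L * per_class))
```

In EM the unknown indicators are typically replaced by their posterior expectations, so the same function can be called with soft memberships whose rows sum to one.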

The vision information $X$ and the mechanical information $Y, Z$ are considered to come from particular probability distributions, conditioned on the label. Those distributions are modeled so that a tractable solution to the complete maximum likelihood problem is achieved. The vision data is assumed to belong to any of the $K$ clusters (terrain types). For each of them, the mean and covariance parameters need to be estimated. The probability of a data point $x_i$ belonging to a terrain class $j$ is expressed as:

\[
P(x_i \mid L_{ij} = 1, \mu_j, \Sigma_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_j)^{T} \Sigma_j^{-1} (x_i - \mu_j) \right),
\tag{3.3}
\]

where $\mu_j$, $\Sigma_j$ are the means and covariances of the $K$ clusters of vision data and $d$ is the dimensionality of the vision space.
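For illustration, the density in (3.3) can be evaluated in log form, which is the quantity that enters the complete log likelihood. This is a minimal sketch; the function name `gaussian_log_density` is hypothetical, and the use of `slogdet` and `solve` instead of an explicit determinant and inverse is a numerical choice made here.

```python
import numpy as np

def gaussian_log_density(x, mu, sigma):
    """Return log P(x | mu, sigma) for a d-dimensional Gaussian, as in (3.3)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # log |Sigma| computed stably; the sign is ignored since Sigma is assumed
    # to be a valid (positive definite) covariance matrix.
    _, logdet = np.linalg.slogdet(sigma)
    # Mahalanobis term (x - mu)^T Sigma^{-1} (x - mu) without forming the inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (maha + d * np.log(2.0 * np.pi) + logdet)
```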

The mechanical measurement data is assumed to come from a nonlinear fit, which is modeled as a General Linear Regression (GLR) [105]. GLR is appropriate for expressing nonlinear behavior and is convenient for computation because it is linear in terms of the parameters to be estimated. For each terrain type $j$, the regression function $\tilde{Z}(Y) = E(Z \mid Y)$ is assumed to come from a GLR with Gaussian noise: $f_j(Y) \equiv Z(Y) = \tilde{Z}(Y) + \epsilon_j$, where $\tilde{Z}(Y) = \theta_{j0} + \sum_{r=1}^{R} \theta_{jr}\, g_r(Y)$, $\epsilon_j \sim \mathcal{N}(0, \sigma_j)$, and $g_r$ are several nonlinear functions selected before the learning has started. Some example functions are: $x$, $x^2$, $e^x$, $\log x$, and $\tanh x$ (those functions are used later on in our experiments, with the difference that the input parameter is scaled first). The parameters θ0

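model for $z_i$ belonging to the $j$th nonlinear model (conditioned on $y_i$), is assumed:

\[
P(z_i \mid y_i, L_{ij} = 1, \theta_j, \sigma_j) = \frac{1}{(2\pi)^{1/2} \sigma_j} \exp\left( -\frac{1}{2\sigma_j^{2}} \left( z_i - G(y_i, \theta_j) \right)^{2} \right)
\]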
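The slip model above can be sketched as follows. The code assumes that $G(y, \theta_j)$ denotes the GLR mean, i.e. a bias term plus a weighted sum of the basis functions $g_r$, and it uses the example functions listed in the text without the input scaling mentioned there. The names `BASIS`, `glr_mean`, and `slip_log_density` are hypothetical.

```python
import math

# Example basis functions g_r from the text (input scaling omitted here).
# math.log requires a positive input.
BASIS = [lambda y: y, lambda y: y ** 2, math.exp, math.log, math.tanh]

def glr_mean(y, theta, basis=BASIS):
    """Assumed form of G(y, theta): theta[0] + sum_r theta[r] * g_r(y)."""
    return theta[0] + sum(t * g(y) for t, g in zip(theta[1:], basis))

def slip_log_density(z, y, theta, sigma, basis=BASIS):
    """Log of the Gaussian density of z around the GLR mean, with std sigma."""
    resid = z - glr_mean(y, theta, basis)
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - resid ** 2 / (2.0 * sigma ** 2))
```

Because the mean is linear in `theta`, fitting the weights for a fixed class assignment reduces to a least squares problem, which is what makes this model convenient computationally.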