Stochastic Functional Regression - Scalable Functional Regression

5.2 Scalable Functional Regression

5.2.1 Stochastic Functional Regression

Any FGD planner optimises an objective function, such as in Eq. (4.1). As the objective is defined over an uncountable set of points along the path, it is estimated via samples. Therefore, the choice of sampling schedule is paramount for a successful and efficient planner. The importance of the sampling schedule is heightened in occupancy maps, where not every sample can generate an informative gradient, as discussed in Section 4.4. Consequently, sampling everywhere along the curve is most desirable, as this increases the chance of identifying transition areas in the map. Yet, with fixed-resolution sampling, defining a sufficient resolution a priori is difficult. Hence, most methods limit the sampling resolution according to their computational resources.

GP-based planners (Mukadam et al., 2016; Dong et al., 2016) use GPs for a smooth path representation. However, as the path is updated only at fixed support points, a dense representation is needed to ensure sufficient expressivity. Similar limitations also hold for the non-parametric approach of Marinho et al. (2016), where the support is taken from fixed-resolution samples of the objective function. CHOMP and STOMP perform batch optimisation by exploring the solution space, either using Hamiltonian Monte Carlo or by estimating the probability density of the objective using noisy path perturbations. As the path is waypoint based, the solution space exploration is performed in the robot’s workspace. Consequently, the optimisation process is highly sensitive to the choice of the exploration hyperparameters. For example, STOMP’s update rule fails if the variance of the perturbation is smaller than the size of the obstacles, which in occupancy maps is unknown a priori. The stochastic non-parametric approach of Chapter 4 addresses this problem by using continuous sampling in the trajectory domain. However, as the path is represented by a GP, the computational costs are high, i.e. of the order $O(N^3)$, where $N$ is the number of samples.

The approach taken in this work alleviates the limitations present in the previous chapter. Namely, it allows stochastic updates from continuous samples. To keep the computational cost low while maintaining a highly expressive representation, a parametric and thus concise path representation based on kernel approximation is employed.

In kernel machines, $\Upsilon(t)$ defines a mapping from $t \in [0, 1]$ into a potentially infinite-dimensional RKHS¹ $\mathcal{H}$ (Schölkopf and Smola, 2001). The kernel function $k(t, t')$ defines the inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ between two points in that space. In the approximate kernel approach we denote by $\hat{\Upsilon}$ a finite set of features that approximates the RKHS inner product by a dot product:

$$k(t, t') = \langle \Upsilon(t), \Upsilon(t') \rangle_{\mathcal{H}} \approx \hat{\Upsilon}(t)^{T} \cdot \hat{\Upsilon}(t'). \tag{5.1}$$

¹ The path RKHS is different to the one used by the Hilbert maps.

We note that the set of features only approximates the selected kernel in expectation, indicated by the $\hat{\Upsilon}$ notation. There are several methods to generate features to approximate a kernel. For the radial basis function (RBF) kernel defined by $k(t, t') = \exp(-\gamma \| t - t' \|^2)$, where $\gamma$ is a free parameter and $\| \cdot \|$ is the Euclidean norm, we employed two different approximations²:

1. Random Fourier features (RFF) (Rahimi and Recht, 2009)

This approximation requires $m$ random samples of two variables:

$$s_i \sim \mathcal{N}(0, 2\gamma I), \qquad b_i \sim \mathrm{uniform}[-\pi, \pi], \qquad i = 1, \ldots, m. \tag{5.2}$$

The feature vector is then given by

$$\hat{\Upsilon}_{RFF}(t) = \frac{1}{\sqrt{m}} [\cos(s_1 t + b_1), \ldots, \cos(s_m t + b_m)]. \tag{5.3}$$

2. RBF features

A kernel matrix $K$ with rank $n$ can be approximated by projecting it into a lower-rank matrix using a set of $m$ inducing points denoted by $\hat{t}_1, \ldots, \hat{t}_m$ (Schölkopf et al., 1998). Then, $K \approx K_{n,m} \hat{K}_m^{\dagger} K_{m,n}$, where the elements of the matrices $K_{n,m}$ and $\hat{K}_m$ are defined as:

$$(K_{n,m})_{(i,j)} = k(t_i, \hat{t}_j), \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, m,$$

$$(\hat{K}_m)_{(i,j)} = k(\hat{t}_i, \hat{t}_j), \quad i, j = 1, \ldots, m.$$

$\hat{K}_m^{\dagger}$ is the pseudo-inverse of $\hat{K}_m$. Using these $m$ inducing points, the approximate feature vector is given by Schölkopf et al. (1998):

$$\hat{\Upsilon}_{RBF}(t) = \hat{D}^{-1/2} \hat{V}^{T} [k(t, \hat{t}_1), \ldots, k(t, \hat{t}_m)]^{T}. \tag{5.4}$$

Here, $\hat{D}$ is the diagonal matrix of eigenvalues of $\hat{K}_m$ and $\hat{V}$ contains the corresponding eigenvectors. The $m$ inducing points can be modified during the optimisation, similar to Williams and Seeger (2001). However, in this work we employed evenly distributed fixed features. We note that since the planner of Marinho et al. (2016) also uses a fixed number of support points, it implicitly employed this approximation, though without stochastic updates.
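The two approximations can be compared against the exact RBF kernel with a minimal sketch for scalar $t$. Several choices here are assumptions made for illustration only: the values of `GAMMA` and `m`, the helper names (`make_rff`, `solve`, `inducing_kernel`), and the $\sqrt{2/m}$ scaling of Rahimi and Recht, which differs from the $1/\sqrt{m}$ of Eq. (5.3) by a constant factor that the weight vector can absorb. For the inducing-point case, the sketch does not form $\hat{D}$ and $\hat{V}$; it evaluates the inner product implied by Eq. (5.4), $k_m(t)^{T} \hat{K}_m^{-1} k_m(t')$, with a linear solve, using a plain inverse because $\hat{K}_m$ is well conditioned for the chosen points.

```python
import math
import random

GAMMA = 20.0  # RBF parameter gamma (illustrative value, not from the text)


def rbf_kernel(t, u, gamma=GAMMA):
    """Exact RBF kernel k(t, t') = exp(-gamma * (t - t')**2) for scalar t."""
    return math.exp(-gamma * (t - u) ** 2)


def make_rff(m, gamma=GAMMA, seed=0):
    """Draw s_i ~ N(0, 2*gamma) and b_i ~ U[-pi, pi]; return the feature map."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)  # variance 2*gamma, as in Eq. (5.2)
    s = [rng.gauss(0.0, std) for _ in range(m)]
    b = [rng.uniform(-math.pi, math.pi) for _ in range(m)]
    scale = math.sqrt(2.0 / m)  # assumed Rahimi-Recht normalisation

    def features(t):
        return [scale * math.cos(si * t + bi) for si, bi in zip(s, b)]

    return features


def solve(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def inducing_kernel(t, u, inducing, gamma=GAMMA):
    """Low-rank approximation k_m(t)^T K_m^{-1} k_m(t') on the inducing points."""
    k_m = [[rbf_kernel(p, q, gamma) for q in inducing] for p in inducing]
    k_t = [rbf_kernel(t, p, gamma) for p in inducing]
    k_u = [rbf_kernel(u, p, gamma) for p in inducing]
    alpha = solve(k_m, k_u)
    return sum(x * y for x, y in zip(k_t, alpha))


rff = make_rff(m=4000)
inducing = [i / 7 for i in range(8)]  # 8 evenly spaced inducing points on [0, 1]
t, u = 0.3, 0.45
exact = rbf_kernel(t, u)
rff_approx = sum(x * y for x, y in zip(rff(t), rff(u)))
ind_approx = inducing_kernel(t, u, inducing)
print(exact, rff_approx, ind_approx)
```

The RFF estimate is random and only matches the kernel in expectation, so its error shrinks as $m$ grows. The inducing-point estimate is deterministic and exact whenever one of its arguments coincides with an inducing point.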

Using a weight vector $w$, we can now express the robot configuration $\xi(t)$ at $t$ as a function of the finite set of approximating features, $\hat{\Upsilon}(t)$:

$$\xi(t; w) = \xi_o(t) + \xi_b(t) + w^{T} \hat{\Upsilon}(t). \tag{5.5}$$

Here, $\xi_o$ is an offset path, which may be used to bias the solution and can be computed by a crude and fast planner, and $\xi_b$ is a term used to enforce boundary conditions. Both $\xi_o$ and $\xi_b$ are represented by an approximated kernel representation with the same curve properties as $\xi$ (continuity, differentiability, etc.), although the feature set can be different.
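To make the three terms of Eq. (5.5) concrete, the following sketch evaluates a one-dimensional path. Everything specific in it is an assumption for illustration and is not taken from the text: the features are unwhitened RBF bumps at evenly spaced centres, the offset path is a straight line between hypothetical `start` and `goal` values, and the boundary term is a linear blend that cancels the weighted term at $t = 0$ and $t = 1$.

```python
import math

GAMMA = 50.0                          # illustrative kernel parameter
CENTRES = [j / 9 for j in range(10)]  # m = 10 evenly spaced feature centres


def features(t):
    """Hypothetical stand-in for the approximate feature vector at t."""
    return [math.exp(-GAMMA * (t - c) ** 2) for c in CENTRES]


def weighted(w, t):
    """Inner product w^T features(t)."""
    return sum(wj * fj for wj, fj in zip(w, features(t)))


def offset_path(t, start, goal):
    """xi_o(t): straight line from start to goal (assumed crude initial plan)."""
    return start + t * (goal - start)


def boundary_term(w, t):
    """xi_b(t): cancels the weighted term at t = 0 and t = 1 (assumed form)."""
    return -(1.0 - t) * weighted(w, 0.0) - t * weighted(w, 1.0)


def path(t, w, start=0.0, goal=1.0):
    """xi(t; w) = xi_o(t) + xi_b(t) + w^T features(t), following Eq. (5.5)."""
    return offset_path(t, start, goal) + boundary_term(w, t) + weighted(w, t)


w = [0.3 * math.sin(3.0 * j) for j in range(len(CENTRES))]  # arbitrary weights
print([round(path(t, w), 3) for t in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

With this assumed form of the boundary term, the path starts at `start` and ends at `goal` for any weight vector, so an optimiser can change $w$ freely without violating the end-point constraints.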

Once the path representation has been defined, we can treat path planning as a regression problem, i.e., optimising the weight vector w:

$$w_{\text{optimal}} = \operatorname*{argmin}_{w} U(w). \tag{5.6}$$

The advantage of using this approach is that the model can be learned through stochastic sequential updates by samples from the entire domain.
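The idea of stochastic sequential updates can be illustrated with a toy sketch, which uses plain stochastic gradient descent on the weights and is not the update rule developed in the following sections. All specifics are hypothetical: a two-dimensional configuration, a single Gaussian-shaped obstacle cost standing in for $U$, unwhitened RBF features with the boundary term folded in, and the step size and iteration count. Each iteration draws one continuous sample $t \in [0, 1)$ and applies the chain rule through the path model.

```python
import math
import random

GAMMA, M = 50.0, 10
CENTRES = [j / (M - 1) for j in range(M)]
OBSTACLE, RADIUS = (0.5, -0.05), 0.15     # hypothetical obstacle
START, GOAL = (0.0, 0.0), (1.0, 0.0)      # hypothetical end points


def phi(t):
    """Unwhitened RBF features at evenly spaced centres (assumed)."""
    return [math.exp(-GAMMA * (t - c) ** 2) for c in CENTRES]


PHI0, PHI1 = phi(0.0), phi(1.0)


def psi(t):
    """Features with the boundary term folded in, so psi(0) = psi(1) = 0."""
    return [f - (1.0 - t) * f0 - t * f1 for f, f0, f1 in zip(phi(t), PHI0, PHI1)]


def path(t, w):
    """xi(t; w): straight-line offset plus weighted features, per dimension."""
    p = psi(t)
    return tuple(
        START[d] + t * (GOAL[d] - START[d]) + sum(wj[d] * pj for wj, pj in zip(w, p))
        for d in range(2)
    )


def cost_and_grad(x):
    """Toy cost c(x) = exp(-|x - o|^2 / (2 r^2)) and its gradient in x."""
    diff = [x[d] - OBSTACLE[d] for d in range(2)]
    c = math.exp(-(diff[0] ** 2 + diff[1] ** 2) / (2.0 * RADIUS ** 2))
    return c, [-c * diff[d] / RADIUS ** 2 for d in range(2)]


def mean_cost(w, n=101):
    """Average cost over an evaluation grid (used only for reporting)."""
    return sum(cost_and_grad(path(i / (n - 1), w))[0] for i in range(n)) / n


def sgd(steps=2000, lr=0.02, seed=0):
    rng = random.Random(seed)
    w = [[0.0, 0.0] for _ in range(M)]
    for _ in range(steps):
        t = rng.random()                  # continuous sample from [0, 1)
        _, grad_x = cost_and_grad(path(t, w))
        p = psi(t)
        for j in range(M):                # chain rule: dc/dw_j = psi_j(t) * dc/dx
            for d in range(2):
                w[j][d] -= lr * p[j] * grad_x[d]
    return w


w0 = [[0.0, 0.0] for _ in range(M)]
w_opt = sgd()
print(mean_cost(w0), mean_cost(w_opt))
```

Because the samples are drawn from the whole interval rather than from a fixed grid, no sampling resolution has to be chosen in advance, and each update costs a number of operations proportional to $m$.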

As with any kernel method, the choice of $\gamma$ and $m$ is critical. The inverse lengthscale parameter $\gamma$ depends on the desired smoothness of the trajectory. Non-smooth paths or sharp transitions require a shorter lengthscale (larger $\gamma$). Alternatively, a different kernel function, e.g. Matérn, with an appropriate kernel approximation can be used. Choosing the number of features $m$ is a trade-off between approximation accuracy and computational complexity. With a larger feature set, the trajectory model can be more expressive; however, the model will require additional computational resources.
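As a small worked example of how $\gamma$ relates to a lengthscale, the RBF kernel can be rewritten in the common lengthscale form. The symbol $\ell$ and the numerical value of $\gamma$ are introduced here only for illustration and do not appear in the original text.

$$k(t, t') = \exp\left(-\gamma (t - t')^2\right) = \exp\left(-\frac{(t - t')^2}{2\ell^2}\right), \qquad \ell = \frac{1}{\sqrt{2\gamma}}.$$

For instance, $\gamma = 50$ gives $\ell = 0.1$, so two path points separated by $0.1$ on $t \in [0, 1]$ have kernel value $e^{-1/2} \approx 0.61$. Quadrupling $\gamma$ to $200$ halves the lengthscale to $\ell = 0.05$, which permits sharper transitions but typically calls for more features to cover $[0, 1]$.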

In the following sections we discuss how to implement FGD using the approximated kernel regression model. We revise the general update rule of Eq. (2.37) into a practical gradient update based on the choice of path representation. Then, we present the full algorithm of the stochastic approximate kernel path planner.

In document Autonomous Exploration over Continuous Domains (Page 108-111)