
n slowest SFA-outputs $s_{1 \ldots n}$ are the orientation-invariant encoding of the robot’s location and are computed instantaneously from a single image. The stated model parameters are in accordance with the originally proposed model. The concrete values have been slightly adapted in the experiments to account for different image resolutions. However, the model has been shown to be robust under a range of parameter settings for image resolution, number of layers, receptive field size and overlap [42]. An illustration of the model is given in Fig. 3.3.

Figure 3.3: Model architecture. (a) The robot’s view associated with a certain position $p := (x, y)$ is steadily captured and transformed to a panoramic view. (b) The view is processed by the four-layer network, where each node in the network performs linear SFA for dimensionality reduction followed by a quadratic SFA for slow feature extraction. (c) The n slowest SFA-outputs $s_{1 \ldots n}$ over all positions $p$. The color-coded outputs, so-called spatial firing maps, ideally show characteristic gradients along the coordinate axes and look the same independent of the specific orientation. Thus, the SFA-outputs $s_{1 \ldots n}$ at position $p$ are the orientation-invariant encoding of location.

3.5 Analysis of the Learned Representations

For the task of self-localization and navigation, the learned SFA representations ideally code for the position of the robot and are orientation invariant. According to [42], the sensitivity of an SFA-output function $s_j$, $j = 1 \ldots n$, to the spatial position $p := (x, y)$ is characterized by its mean positional variance $\eta_p$ over all orientations $\phi$: $\eta_p = \langle \mathrm{var}_p(s(p, \phi)) \rangle_\phi$. Similarly, the sensitivity to the orientation $\phi$ is characterized by its mean orientation variance $\eta_\phi$ over all positions $p$: $\eta_\phi = \langle \mathrm{var}_\phi(s(p, \phi)) \rangle_p$. In the ideal case, $\eta_p = 1$ and $\eta_\phi = 0$, if a function only codes for the robot’s position on the x- and y-axis and is completely orientation invariant. The spatial information encoded by an SFA-output will be visualized by two-dimensional spatial firing maps (see Fig. 3.3c). They illustrate the color-coded SFA-output value for every position $p := (x, y)$. An output which codes for the position on a certain axis ideally produces a map that shows a color gradient along this axis. If the SFA-outputs are perfectly orientation invariant, the gradients should be clearly visible regardless of the specific orientation.
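The two sensitivity measures can be sketched in a few lines of Python. This is only an illustration, assuming that the values of one SFA-output have been sampled on a grid of positions and orientations and stored as a nested list `s[i][k]` (position index `i`, orientation index `k`); this data layout and the function names are assumptions made for the example, not part of the original model.

```python
from statistics import mean, pvariance


def mean_positional_variance(s):
    """eta_p: variance over positions, averaged over all orientations.

    s[i][k] is the output value at position index i and orientation
    index k (hypothetical layout chosen for this sketch).
    """
    n_orientations = len(s[0])
    return mean(pvariance([row[k] for row in s]) for k in range(n_orientations))


def mean_orientation_variance(s):
    """eta_phi: variance over orientations, averaged over all positions."""
    return mean(pvariance(row) for row in s)


# Toy output that depends on position only and has unit variance:
# four positions with values -1, 1, -1, 1, each seen from 8 orientations.
position_code = [[v] * 8 for v in (-1.0, 1.0, -1.0, 1.0)]
eta_p = mean_positional_variance(position_code)
eta_phi = mean_orientation_variance(position_code)
```

For this toy output the positional variance is one and the orientation variance is zero, which corresponds to the ideal case described above.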

To perform a quantitative metric evaluation of the learned SFA-representation, we compute a regression function from the quadratically expanded slow feature outputs to the metric ground truth positions from a training run. The obtained mapping from slow feature to metric space will then be used to evaluate the localization accuracy on a separate test run and to determine the distance to a given target location in the navigation experiments. Please note that the ground truth coordinates are only used for evaluation purposes and that the slow feature representations are learned using visual input only.
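The evaluation step can be sketched as an ordinary least-squares fit on quadratically expanded features. The following dependency-free Python sketch is one possible reading of that procedure, not the implementation used in the experiments: the function names, the small ridge term added for numerical stability, and the Gauss-Jordan solver are all assumptions made for the example.

```python
def expand_quadratic(s):
    """Bias term, the features themselves, and all products s_i * s_j (i <= j)."""
    n = len(s)
    return [1.0] + list(s) + [s[i] * s[j] for i in range(n) for j in range(i, n)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_regression(S, P, ridge=1e-9):
    """Fit one weight vector per coordinate via the normal equations.

    S: slow feature vectors of the training run.
    P: corresponding ground truth positions (x, y).
    ridge: small regularizer (an assumption of this sketch).
    """
    Z = [expand_quadratic(s) for s in S]
    d = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    weights = []
    for c in range(len(P[0])):
        Zty = [sum(z[i] * p[c] for z, p in zip(Z, P)) for i in range(d)]
        weights.append(solve(ZtZ, Zty))
    return weights


def predict(weights, s):
    """Map one slow feature vector to an estimated metric position."""
    z = expand_quadratic(s)
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in weights]
```

The localization error on a test run would then be the Euclidean distance between `predict(weights, s)` and the ground truth position for each test sample. With a numerical library available, the normal equations would typically be replaced by a library least-squares routine.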

4 Data Recording and Ground Truth Acquisition

This chapter describes the procedures for generating the data used to evaluate the introduced methods in simulator and real-world experiments. For simplicity, and to benefit from a static environment and full control over the configuration space, a first validation of the approaches was conducted in simulated environments. Section 4.1 presents the simulator environments and the process for data generation and recording. A quantitative metric evaluation of the learned slow feature representation requires knowledge of the robot’s true position within the environment. In contrast to the simulator, this ground truth information is not directly available in the real world and therefore has to be monitored by an external system. A method for ground truth data acquisition based on optical marker detection is detailed in Section 4.2.1. The experimental platforms and the data generation procedures for the real-world experiments are described in Section 4.2.2.

4.1 Data Generation in the Simulator

Artificial data generated with a simulator is used in various experiments presented in this thesis to validate the introduced methods in a fully controllable setting. The simulator used in the experiments presented in Section 5.1.1 was based on existing software available at the Honda Research Institute. The virtual environment consists of green areas, trees and some houses and resembles a park or a garden. Images have been rendered once at discretized positions forming a regular grid of 30 × 30 units. From every position, the view of the virtual camera was mapped to a conic mirror to construct an omnidirectional image. The movement trajectory of the training and test runs was constructed afterwards by arranging the images and the corresponding coordinates into a continuous walk.
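How pre-rendered grid images can be arranged into a continuous walk may be illustrated with a simple random walk over grid cells. The text does not specify the actual walk rule, so the following Python sketch is purely an assumption: it takes the grid to have 30 positions per axis, moves one cell per step along a coordinate axis, and uses the hypothetical function name `continuous_walk`. Each returned cell would serve as the lookup key for the stored image and its coordinates.

```python
import random

GRID_SIZE = 30  # assumed number of discrete positions per axis


def continuous_walk(n_steps, start=(15, 15), seed=0):
    """Return a list of n_steps grid cells where consecutive cells are neighbors.

    The random-walk rule is an illustrative assumption; moves that would
    leave the grid are rejected and redrawn.
    """
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    x, y = start
    path = [(x, y)]
    while len(path) < n_steps:
        dx, dy = rng.choice(moves)
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID_SIZE and 0 <= ny < GRID_SIZE:
            x, y = nx, ny
            path.append((x, y))
    return path


# Example: a training run of 1000 steps; each cell indexes a pre-rendered image.
training_cells = continuous_walk(1000)
```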

For reasons of greater flexibility and to achieve a higher quality of the rendered images, we used the 3D software Blender¹ and its Python API to generate data for the further simulator experiments². The garden-like environment was created by randomly drawing from a pre-defined set of suitable 3D objects. The objects were placed on a textured ground plane at non-overlapping positions defined by randomly chosen polar coordinates with radii from a certain range. The area within the minimum radius defines the space where the virtual robot can freely move. The ground plane and the objects are enclosed by a spherical textured object to mimic a horizon and sky. The omnidirectional camera was created from a virtual camera pointing at an ellipsoid with a reflecting texture. Illustrations of the simulator environments and rendered images are shown in the corresponding experiment sections.

¹ https://www.blender.org/
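The placement of objects at non-overlapping positions given by random polar coordinates can be sketched with rejection sampling. The sketch below uses plain Python rather than the Blender API, and the function name, the minimum-distance criterion for "non-overlapping", and all parameter values are assumptions made for illustration.

```python
import math
import random


def place_objects(n_objects, r_min, r_max, min_dist, seed=0, max_tries=10000):
    """Sample object positions in the ring r_min <= r <= r_max.

    A candidate is accepted only if it keeps at least min_dist to every
    object placed so far (assumed overlap criterion). The disc inside
    r_min stays free for the virtual robot.
    """
    rng = random.Random(seed)
    placed = []
    tries = 0
    while len(placed) < n_objects and tries < max_tries:
        tries += 1
        r = rng.uniform(r_min, r_max)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x, y = r * math.cos(theta), r * math.sin(theta)
        if all(math.dist((x, y), q) >= min_dist for q in placed):
            placed.append((x, y))
    return placed


# Example: 20 objects between radius 5 and 15, at least 1 unit apart.
positions = place_objects(20, r_min=5.0, r_max=15.0, min_dist=1.0)
```

In a Blender script, each returned position would then be assigned to a randomly drawn object from the pre-defined set.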

4.2 Data Generation in the Real World