Experiment 4: Uncertainty in the Perception of the World

In this experiment, we focus on P(R|I). Until now, we have claimed that R would be picked out to be uttered by a speaker based on properties that an object has; here, however, we insert uncertainty into R, as explained below. How this is manifested in SIUM can be seen by contrasting Table 5.1-B, in which the properties in R for each object in I contain no uncertainty, with Table 5.2, in which each object has a distribution over property types.

5.6.1 Data

The data used in this experiment are the same as in Experiment 2, with some additional derivations of the scenes, which are described next.

5.6.2 Scene Processing

Following Kennington et al. (2015a), we want our model to work with images of real objects as input, even though for our particular data the scenes are represented symbolically (that is, we know without uncertainty each piece’s shape, colour, and position). Using the images that were generated from these symbolic descriptions by Kousidis et al. (2013) and performing computer vision on them does not introduce much uncertainty, as there is no variation in colour or appearance of individual shapes, and so the data cannot serve to form generalisations. To get closer to conditions as they would hold when working with camera images (e.g., variations of colour due to variations in lighting, distortion of shapes due to camera angles, etc.), we pre-processed these images. We shifted the colour spectrum as follows: the hue channel by a random number between -15 and 15, and the saturation and value channels by a random number between -50 and 50. For the object shapes, we applied affine transformations defined by two randomly generated triangles and warped the image using that transform. This generates more complex shapes that retain some notion of their original form. Figure 5.19 shows a game board that has been distorted from its original (Figure 5.18).

Figure 5.18: Example Pento board for gaze and deixis experiment; the yellow T in the top-right quadrant is the referent.

Figure 5.19: Example Pento board that has been distorted from its original form (Figure 5.18).
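The colour shift and the triangle-defined affine transform described above can be sketched as follows. This is a minimal illustration and not the code used in the experiment: it assumes an 8-bit HSV image with hue in [0, 180) and saturation/value in [0, 255], integer offsets drawn once per image and per channel, and the function names `shift_hsv`, `affine_from_triangles`, and `apply_affine` are hypothetical. The sketch only derives the transform and applies it to points; warping the pixels themselves would be done with an image library routine such as OpenCV's `cv2.warpAffine`.

```python
import numpy as np

def shift_hsv(hsv, rng):
    """Shift hue by an offset in [-15, 15] and saturation/value by offsets in [-50, 50]."""
    # Assumption: 8-bit HSV layout with hue in [0, 180), saturation/value in [0, 255].
    out = hsv.astype(np.int32)
    dh = rng.integers(-15, 16)   # upper bound is exclusive
    ds = rng.integers(-50, 51)
    dv = rng.integers(-50, 51)
    out[..., 0] = (out[..., 0] + dh) % 180           # hue is circular, so wrap
    out[..., 1] = np.clip(out[..., 1] + ds, 0, 255)  # saturation saturates
    out[..., 2] = np.clip(out[..., 2] + dv, 0, 255)  # value saturates
    return out.astype(np.uint8)

def affine_from_triangles(src, dst):
    """Return the 2x3 affine matrix that maps the three src vertices onto dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_h = np.hstack([src, np.ones((3, 1))])  # homogeneous coordinates, 3x3
    # Solve src_h @ X = dst for X (3x2); the affine matrix is its transpose.
    return np.linalg.solve(src_h, dst).T

def apply_affine(A, points):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    pts = np.asarray(points, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]

rng = np.random.default_rng(0)
board = np.full((4, 4, 3), 100, dtype=np.uint8)     # toy stand-in for a board image
shifted = shift_hsv(board, rng)
A = affine_from_triangles([(0, 0), (1, 0), (0, 1)],  # two illustrative triangles
                          [(1, 1), (3, 1), (1, 2)])
```

Because three point correspondences determine an affine map exactly, two random non-degenerate triangles are enough to specify the distortion.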

Using these distorted images, we processed each image using the Canny Edge Detector (Canny, 1986) and used mathematical morphology to find closed contours of the objects, thereby segmenting the objects from each other. We acquired the boundary of the objects (always 15 of them), following the inner contours as identified by the border tracing algorithm (Suzuki and Abe, 1985). For each individual object we then extracted the number of edges, the RGB (red, green, blue) values, the HSV (hue, saturation, value) values, and, from the object’s moments, its centroid, its horizontal and vertical skewness (third-order moments measuring the distortion in symmetry around the x and y axes), and the orientation value representing the direction of the principal axis (a combination of second-order moments).
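The moment-based features can be illustrated with a short sketch. It assumes the object is already segmented as a binary mask and uses the standard central-moment definitions (skewness as the normalised third-order moment, orientation as half the arctangent of the second-order moments); the exact normalisation used in the experiment is not stated, so these formulas and the name `moment_features` are assumptions.

```python
import numpy as np

def moment_features(mask):
    """Centroid, x/y skewness, and principal-axis orientation of a binary object mask."""
    ys, xs = np.nonzero(mask)          # pixel coordinates belonging to the object
    cx, cy = xs.mean(), ys.mean()      # centroid (first-order moments)
    dx, dy = xs - cx, ys - cy
    # Central moments, normalised by the object area.
    mu20, mu02, mu11 = (dx ** 2).mean(), (dy ** 2).mean(), (dx * dy).mean()
    mu30, mu03 = (dx ** 3).mean(), (dy ** 3).mean()
    # Skewness: third-order moment scaled by the spread along the same axis.
    skew_x = mu30 / mu20 ** 1.5 if mu20 > 0 else 0.0
    skew_y = mu03 / mu02 ** 1.5 if mu02 > 0 else 0.0
    # Orientation of the principal axis from the second-order moments (radians).
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return {"centroid": (cx, cy), "skew_x": skew_x, "skew_y": skew_y, "orientation": theta}

# A symmetric horizontal bar: no skew, principal axis along x.
mask = np.zeros((10, 12), dtype=np.uint8)
mask[4:7, 2:9] = 1
features = moment_features(mask)
```

A symmetric shape yields zero skewness on both axes, so non-zero values indicate how far the distortion has pushed an object away from symmetry.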

From these features, we compute a distribution over the set of colours and shapes using an SVM classifier for each. Position properties are computed using a set of rules: objects above a certain y value receive the top property, and those below it the bottom property; objects to the left of a certain x threshold receive the left property, and those to the right the right property (other position properties have 0 probability). With these properties, each object now has all colours and shapes, albeit with differing probabilities. This represents the uncertainty in the properties for each object.
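The rule-based position properties and the resulting per-object property distribution can be sketched as below. The sketch assumes image coordinates (origin at the top left, y growing downward, so "above" means a smaller y), a single threshold per axis, and classifier probabilities that are simply passed in; the function names and example values are hypothetical.

```python
def position_properties(cx, cy, x_split, y_split):
    """Rule-based position properties for an object centroid (cx, cy)."""
    # Assumption: image coordinates, so a smaller y is visually higher on the board.
    return {
        "top": 1.0 if cy < y_split else 0.0,
        "bottom": 1.0 if cy >= y_split else 0.0,
        "left": 1.0 if cx < x_split else 0.0,
        "right": 1.0 if cx >= x_split else 0.0,
    }

def object_properties(colour_probs, shape_probs, cx, cy, x_split, y_split):
    """Combine classifier distributions with the deterministic position properties."""
    props = {}
    props.update(colour_probs)   # probabilities from the colour classifier
    props.update(shape_probs)    # probabilities from the shape classifier
    props.update(position_properties(cx, cy, x_split, y_split))
    return props

# Illustrative values only: a mostly red, mostly T-shaped piece in the top-left quadrant.
props = object_properties({"red": 0.8, "pink": 0.2}, {"T": 0.7, "L": 0.3},
                          cx=10, cy=20, x_split=50, y_split=50)
```

Colours and shapes carry graded probabilities, whereas position properties remain binary, which mirrors the description above.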

5.6.3 Task & Procedure

The task is RR, as described earlier. At each increment, the model returns a distribution over all objects; the probability for each object represents the strength of the belief that it is the referred one. The argmax of the distribution is chosen as the hypothesised referent.
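Selecting the hypothesised referent from the returned distribution amounts to the following. The object identifiers and probabilities are made up for illustration, and the rank helper assumes that rank 1 denotes the most probable object.

```python
def hypothesised_referent(dist):
    """Return the object with the highest probability (the argmax)."""
    return max(dist, key=dist.get)

def rank_of(dist, gold):
    """1-based rank of the gold referent when objects are sorted by probability."""
    ordered = sorted(dist, key=dist.get, reverse=True)
    return ordered.index(gold) + 1

# Hypothetical distribution over three of the objects after some increment.
dist = {"obj1": 0.10, "obj2": 0.65, "obj3": 0.25}
guess = hypothesised_referent(dist)   # "obj2"
gold_rank = rank_of(dist, "obj3")     # 2
```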

Using 1000 episodes, we evaluate our model across 10 folds, where 900 episodes (utterances+scenes) were used for training, and the remaining 100 were used to test the model. Our baseline model is random selection (which gets an accuracy of 7%).
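The fold sizes and the random baseline can be checked with a small sketch. Whether the episodes were shuffled before splitting is not stated, so the shuffling and the seed here are assumptions, and `ten_fold_splits` is a hypothetical name. With 15 objects per scene, picking uniformly at random is correct 1 time in 15, which rounds to the 7% reported.

```python
import random

def ten_fold_splits(n_episodes=1000, k=10, seed=0):
    """Yield (train, test) index lists for k equally sized folds."""
    idx = list(range(n_episodes))
    random.Random(seed).shuffle(idx)   # assumption: episodes are shuffled once
    fold = n_episodes // k
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, test

splits = list(ten_fold_splits())
random_baseline = 1 / 15   # uniform choice among the 15 objects on a board
```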

Figure 5.20: Results of our model in accuracies; higher numbers are better results for accuracies, lower numbers denote better results for average rank.

5.6.4 Metrics

The metrics in this experiment are the same as in Experiment 1.

5.6.5 Results

Utterance-level Results

Compared to the speech-only results for Experiment 2 in Figure 5.10, when inserting uncertainty into the model, there is a fairly dramatic decrease (from 76.7% to 61%) in RR accuracy. The model still picks out the top referent out of 15 more than 61% of the time, but there is a clear performance hit. We will provide further discussion below.

Incremental Results

Compared to Experiment 2 (speech model only), the incremental results shown in Figure 5.21 have first-correct and first-final values that are much later. On average, when uncertainty exists in the representation of W, SIUM makes a final decision before the end of the RE, but it comes to that decision later than before and after an increased amount of edit overhead. Overall, the model remains fairly robust even when there is uncertainty in how the scene is represented.

measure            length        value
% edit overhead    1-6             3.8
% edit overhead    7-8            17.2
% edit overhead    9-14           27.5
% never correct    all lengths    32.0

Table 5.4: % edit overhead and never correct

Figure 5.21: Incremental Performance

5.6.6 Systematic Insertion of Uncertainty

In this section we explore how well SIUM performs when varied levels of uncertainty are inserted into R, specifically into the colours. In the data from Experiments 2 and 3, each object could have one of 7 colours (red, blue, green, yellow, pink, gray, cyan). In a scene without uncertainty, each object has only one colour that receives all of the probability mass. Increasing uncertainty amounts to removing probability mass from that colour and distributing it across the other colours, where colours that are closer in the colour spectrum (e.g., red is closer to pink than it is to green) get more mass than those that are farther away. As the amount of probability mass removed from the original colour increases, so does the entropy of the distribution over the colours. All non-colour properties kept their original values (i.e., they were fully observed).
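The redistribution of probability mass can be sketched as follows. The actual closeness measure is not given, so this sketch makes several assumptions: colours are placed at hypothetical hue angles, closeness decays with circular hue distance, gray (which has no hue) receives a fixed arbitrary weight, and entropy is measured in bits. All names and constants are illustrative.

```python
import math

# Assumption: hypothetical hue angles in degrees; gray has no hue.
HUE = {"red": 0, "yellow": 60, "green": 120, "cyan": 180,
       "blue": 240, "pink": 320, "gray": None}

def closeness(a, b):
    """Weight that is larger for colours nearer to each other on the hue circle."""
    if HUE[a] is None or HUE[b] is None:
        return 0.5                      # assumption: arbitrary fixed weight for gray
    d = abs(HUE[a] - HUE[b]) % 360
    d = min(d, 360 - d)                 # circular distance, 0..180 degrees
    return 1.0 / (1.0 + d / 60.0)

def add_uncertainty(true_colour, removed_mass):
    """Move removed_mass away from the true colour, favouring nearby colours."""
    others = [c for c in HUE if c != true_colour]
    weights = {c: closeness(true_colour, c) for c in others}
    total = sum(weights.values())
    dist = {c: removed_mass * weights[c] / total for c in others}
    dist[true_colour] = 1.0 - removed_mass
    return dist

def entropy(dist):
    """Shannon entropy in bits (the log base is an assumption)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

levels = [entropy(add_uncertainty("red", m)) for m in (0.0, 0.1, 0.3, 0.5)]
```

With no mass removed the entropy is zero, and it grows as more mass is moved to the other colours, which is the quantity plotted against accuracy in this section.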

Using a single fold of the data for training, we evaluated 100 boards 100 times, each time increasing the entropy over the colours for each object on each board. Figure 5.22 shows the accuracy of the model at each point of average entropy over the object colours. As expected, as the average entropy over the colours increases so does the uncertainty, which leads to decreased accuracy. A major drop-off occurs around 1.7, where the probability mass over the colours became more uniform. For comparison, the average entropy over the colours of the distorted images used in this experiment was 0.047, which is fairly low and, judging by Figure 5.22, should leave accuracy stable. However, since there is also uncertainty in the shapes (average entropy of 0.032), which is a more realistic setting, the results were considerably worse.

Figure 5.22: Accuracy of model (y-axis) decreases as average entropy over colours increases (x-axis).
