Conclusion - Incrementally resolving references in order to identify visually present objects i

We began this thesis with a look at the background of reference and meaning, beginning with SDSs and where an RR component would fit into an SDS, followed by a general approach to RR using FOL. The shortcomings of that approach were noted (i.e., how to define the set of classes, how to assign objects to those classes, and how to perform RR incrementally) and partially addressed by an appeal to grounded semantics. We then toured the philosophy of how meaning and reference are related, finding motivation for using reference as a task for learning meaning. We set out to overcome the shortcomings of FOL and traditional approaches to RR by automatically determining the classes that objects could be assigned to, using a learned, grounded semantics that could determine whether an object belongs to a specific class, and performing these tasks incrementally.

We reviewed the relevant literature with this in mind, noting that such an approach to incremental reference resolution and meaning representation had not yet been accomplished; hence the need for the work in this thesis. We then turned to a generative approach in the simple incremental update model (SIUM) of RR. We saw that SIUM worked well when varying different aspects of the model, from how the REs were represented, to how the world was represented, to how the two were connected. The model performed well in several experiments ranging over three different languages, and it does perform a kind of grounding. Even so, it did not quite satisfy the original problem of determining the set of possible classes (rather, it required a set of properties to be pre-defined), nor did it quite work as a mechanism for assigning objects to those classes (though it did to some degree). SIUM nevertheless improves over previous work in that it works in an update-incremental fashion. The model was also able to handle the two types of REs in which we were interested: definite descriptions and demonstratives (as well as, to a lesser degree, exophoric pronouns).
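To make the update-incremental idea concrete, the following is a minimal sketch of a generative, property-mediated resolver in the spirit of SIUM. It assumes a belief over objects I that is updated per word U as P(I | U) ∝ P(I) Σ_R P(U | R) P(R | I), with the previous posterior serving as the next prior. The property sets, the probability table, the smoothing constant, and all names below are hypothetical illustrations, not values or code from this thesis.

```python
# Pre-defined properties per object; SIUM requires such a set to be given.
OBJECT_PROPERTIES = {
    "obj1": ["red", "round"],
    "obj2": ["green", "round"],
    "obj3": ["red", "square"],
}

# Hypothetical P(word | property); a real model would estimate this from data.
WORD_GIVEN_PROPERTY = {
    "red": {"red": 0.8, "circle": 0.1},
    "green": {"green": 0.8, "circle": 0.1},
    "round": {"circle": 0.7, "red": 0.05, "green": 0.05},
    "square": {"square": 0.7, "red": 0.05, "green": 0.05},
}
SMOOTHING = 0.01  # assumed probability for unseen word/property pairs


def update(belief, word):
    """One incremental step: reweight each object by the word's likelihood."""
    scores = {}
    for obj, prior in belief.items():
        props = OBJECT_PROPERTIES[obj]
        # Sum over properties R, assuming a uniform P(R | I) for this sketch.
        likelihood = sum(
            WORD_GIVEN_PROPERTY[prop].get(word, SMOOTHING) / len(props)
            for prop in props
        )
        scores[obj] = prior * likelihood
    total = sum(scores.values())
    return {obj: score / total for obj, score in scores.items()}


def resolve(words):
    """Process an RE word by word, returning the belief after each word."""
    belief = {obj: 1.0 / len(OBJECT_PROPERTIES) for obj in OBJECT_PROPERTIES}
    trace = []
    for word in words:
        belief = update(belief, word)
        trace.append(belief)
    return trace


trace = resolve(["red", "circle"])
best_object = max(trace[-1], key=trace[-1].get)
```

In this toy scene, the belief after "red" is split between the two red objects, and "circle" then favours the round one. The sketch also shows the limitation noted above: the property inventory has to be supplied by hand.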

We then turned to the words-as-classifiers (WAC) model, which maps directly from low-level object features to words, thus removing the need for mediating properties. The set of classes is determined by the language itself, as the words are treated as individual classes. The model represents each word as a classifier, which acts as the mechanism for probabilistically assigning objects to classes. The model further specifies how the classifiers are to be applied in a reference resolution task and processed in an update-incremental fashion. It works robustly for definite descriptions (including relational REs). Though it can be combined with a separate model for resolving deixis (demonstratives, in a general way), we leave modelling demonstratives in WAC for future work.
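The words-as-classifiers idea can be illustrated with a small sketch in which each word is a logistic classifier over low-level object features, applied word by word to update a distribution over candidate objects. The feature layout, the hand-set weights, and the multiply-and-normalise combination are assumptions made for illustration only; in the thesis the classifiers are learned from data, and the composition used there may differ.

```python
import math

# Hypothetical feature layout per object: [redness, greenness, roundness].
# Hypothetical hand-set (weights, bias) per word; real classifiers are learned.
CLASSIFIERS = {
    "red": ([4.0, -4.0, 0.0], 0.0),
    "green": ([-4.0, 4.0, 0.0], 0.0),
    "circle": ([0.0, 0.0, 4.0], -2.0),
}


def word_fits(word, features):
    """Probability that `word` applies to an object with these features."""
    weights, bias = CLASSIFIERS[word]
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def resolve_incrementally(words, scene):
    """Update a distribution over objects after every word of the RE."""
    belief = {obj: 1.0 / len(scene) for obj in scene}
    history = []
    for word in words:
        if word in CLASSIFIERS:  # words without a classifier leave belief as is
            scores = {
                obj: belief[obj] * word_fits(word, feats)
                for obj, feats in scene.items()
            }
            total = sum(scores.values())
            belief = {obj: s / total for obj, s in scores.items()}
        history.append(dict(belief))
    return history


SCENE = {
    "obj1": [1.0, 0.0, 1.0],  # red and round
    "obj2": [0.0, 1.0, 1.0],  # green and round
    "obj3": [1.0, 0.0, 0.0],  # red, not round
}
history = resolve_incrementally(["the", "red", "circle"], SCENE)
best = max(history[-1], key=history[-1].get)
```

Here no property inventory is needed: the vocabulary itself defines the classes, and each classifier decides probabilistically whether an object belongs to its word's class.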

Given the experiments and evaluations in Chapters 5 and 6, we can conclude that the overarching goal of modelling and implementing a practical component of incremental RR has been realised. For completeness, Table 7.2 lists these aims and whether SIUM or WAC realises each one.

Thesis aim                                                                      SIUM  WAC
the model can resolve referring expressions update-incrementally                ✓     ✓
the model learns, given data, a mapping between visually present objects        ✓     ✓
given novel REs and novel scenes, the model can generalise                      ✓     ✓
the model can handle definite descriptions                                      ✓     ✓
the model can handle demonstratives                                             ✓
the model can be implemented as a component in an SDS                           ✓     ✓
the model can be evaluated to show that it can handle noise in ASR and scene    ✓     ✓
formulate the model in such a way that word meanings can be accounted for             ✓
fit the model into a semantic framework                                               ✓

Table 7.2: Thesis aims addressed by SIUM and WAC.

Both SIUM and WAC work in an update-incremental fashion. Both models are robust to ASR noise arising from spontaneous, often ungrammatical speech (though in somewhat limited tasks). The WAC model further performed well when the object representations themselves were noisy. Both models can be fused with separate models of deixis and gaze, and SIUM can incorporate deixis and gaze information as properties. Both models ground, in their own way, aspects of REs with aspects of the world (by object properties for SIUM or by low-level object features for WAC). Both models are fast and can perform their tasks in real time. These models improve upon previous work, which was either not grounded, not incremental, or could not function in real time.

A lesser goal of this thesis was to fit these models into a larger semantic framework, as shown above in the semantic comparison. With the words-as-classifiers model, we substantiate the claim made in Dahlgren (1976):

Extensions determine intensions, though in a complex way, and not the other way around.

The work presented in this thesis, I assert, brings us a step closer to understanding the meaning of words (i.e., visual words) and how that meaning is derived from interaction with the world. Specifically, through the words-as-classifiers model, we have estimated the meanings of words using the features of objects referred to in the real world. Furthermore, the model is not just theoretical: it has been implemented and tested in RR tasks with some degree of success.

In document Incrementally resolving references in order to identify visually present objects in a situated dialogue setting (Page 189-191)