Rock (1973) suggested that recognition of disorientated objects occurs by first identifying the orientation of the object (its principal axis). In this theory the object is described relative to the environmental and gravitational upright (see also Cooper and Shepard, 1973; Jolicoeur, 1985). An object becomes increasingly difficult to recognise as it is rotated away from an upright position, because it becomes increasingly hard to assign the specific orientation to the object. Rock argues that the effect on recognition ability caused by changes of orientation in the picture plane is greater than that caused by changes of orientation in the depth plane (i.e. changes in perspective view). This is because a change of view occurs independently of the direction of the object relative to the environmental and gravitational upright.

To account for recognition of the object rather than the object’s orientation, several authors (e.g. Jolicoeur, 1985; Tarr and Pinker, 1989; Ullman, 1989; 1992) suggest that the retinal image produced by an object is matched against a viewer-centred representation stored in memory by means of an alignment process. Different theories suggest different numbers of stored (view- and/or orientation-specific) representations. Palmer et al. (1981) and Jolicoeur (1985) stress that only a single (canonical) view and orientation of the object is needed for the representation of an object, whereas others (Koenderink and van Doorn, 1979; Perrett, Smith et al., 1985; Tarr and Pinker, 1989; Ullman, 1989; Edelman and Bülthoff, 1990; Cutzu and Edelman, 1992; 1994; Logothetis, Vetter et al., 1994) suggest that there are several views represented and stored in memory. Most theorists would agree that stored representations are based on views and orientations with which the observer is familiar. Some models which include multiple representations of different views do not require multiple representations of different image orientations and sizes (Seibert and Waxman, 1991; 1992a,b).

Jolicoeur (1992) suggests that the orientation of an object’s representation is specified using a retinal co-ordinate system (an object is upright when it is aligned with the naso-temporal division of the retina). If the object being viewed is not in its upright orientation, then the visual system either applies mental rotation to the image, or processes the image on the basis of certain critical features of the object. These two methods can also work in parallel. Mental rotation in the picture plane transforms the image using the shortest 2D rotational path possible to achieve a match with the canonical upright representation. Tarr and Pinker (1989) suggest that several representations are stored in memory (each representation corresponding to one specific orientation commonly experienced). An input image is matched to the canonical orientation or to the closest matching representation possible (through mental rotation).
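The difference between rotating to a single canonical upright and rotating to the closest of several stored orientations can be illustrated with a minimal sketch. It is not an implementation of either author's model: the function names and the stored orientations (0°, 120°, 240°) are assumptions chosen purely for illustration, and orientation is treated as a single picture-plane angle in degrees.

```python
def shortest_rotation(input_deg, stored_deg):
    """Signed smallest picture-plane rotation (degrees) from input to stored.

    The result lies in [-180, 180); its sign gives the direction of rotation.
    """
    return (stored_deg - input_deg + 180.0) % 360.0 - 180.0


def nearest_stored_orientation(input_deg, stored_orientations):
    """Return the stored orientation reachable by the smallest rotation."""
    return min(stored_orientations,
               key=lambda s: abs(shortest_rotation(input_deg, s)))


# Hypothetical example: only the canonical upright (0 deg) is stored.
single = [0.0]
# Hypothetical example: three familiar orientations are stored.
multiple = [0.0, 120.0, 240.0]

for label, stored in (("single", single), ("multiple", multiple)):
    target = nearest_stored_orientation(100.0, stored)
    print(label, target, abs(shortest_rotation(100.0, target)))
```

For an input at 100°, the single-representation case requires a rotation of 100°, whereas with the three assumed stored orientations the nearest match (120°) is only 20° away. On this simplified reading, storing more familiar orientations shortens the rotational path that has to be traversed.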

One major problem with transformation models is that they assume some kind of recognition before the correct transformation process can be carried out. As Corballis (1988) points out, the visual system cannot rotate something to its canonical upright position without knowing what the object is, since without knowing the object one does not know what its canonical orientation is. To overcome this problem, Tarr and Pinker (1989), Jolicoeur (1992) and Ullman (1989) acknowledge that specific details, characteristics or key features of objects are identified prior to matching with stored representations (using mental rotation or other transformation processes). The nature of these characteristics or features, and how they are determined, remain a weakness in these models.

Unlike the 2D approaches of the above models, Lowe's (1987) model of object recognition is based on matching the input images to stored 3D representations. This matching process is founded on the spatial organisation of the object’s edges, corners and similar features. As with the model of Marr and Nishihara (1978), there is no reason why the object’s orientation in the picture plane should affect the efficiency of matching. However, processing in both object-centred models can be more difficult when the object’s main features or principal axis are obscured due to an unusual perspective view (e.g. a foreshortened axis).

Interpolation models (Poggio and Edelman, 1990; Edelman and Weinshall, 1991; Cutzu and Edelman, 1992; Intrator and Gold, 1992) use multiple 2D descriptions of an object from different perspective views. These 2D views are stored as clusters of views in a ‘representational space’. When sufficient views are stored, the appearance of any view falling in between those stored can be specified by interpolation between (or a linear combination of) those views that lie adjacent to the input view. This is perhaps most easily imagined in the orientation domain. For example, the appearance of an object rotated 45° from upright could be interpolated from two stored descriptions: one where the object is represented in its upright position and a second description where the object has been rotated 90°. The effectiveness of matching is then judged on how well the input image fits a stored view or the interpolation between stored views. Interpolation models do not need to perform any processes akin to mental rotation (linear transformations or normalisation) in order to match an image to a stored representation. Increased RTs for unusual orientations or views occur because these views lie between stored views and require more complex or time-consuming interpolation.
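The 45° example can be made concrete with a small sketch. It illustrates only the 'linear combination of views' idea and is not the network model of any of the cited authors: the three-point shape, the function names and the use of an ordinary least-squares fit are all assumptions introduced here. A view is treated as a list of 2D feature coordinates; the weights that best reproduce the input from the upright and 90° views are estimated, and the residual serves as the measure of fit.

```python
import math


def rotate(points, deg):
    """Rotate 2D points about the origin in the picture plane."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def flatten(view):
    """Turn a list of (x, y) points into one coordinate vector."""
    return [v for p in view for v in p]


def fit_two_view_combination(view_a, view_b, target):
    """Least-squares weights (wa, wb) with wa*view_a + wb*view_b ~ target.

    Returns (wa, wb, residual); a small residual means a good match.
    """
    a, b, t = flatten(view_a), flatten(view_b), flatten(target)
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    at = sum(x * y for x, y in zip(a, t))
    bt = sum(x * y for x, y in zip(b, t))
    det = aa * bb - ab * ab  # non-zero when the two views are independent
    wa = (at * bb - bt * ab) / det
    wb = (bt * aa - at * ab) / det
    residual = math.sqrt(sum((wa * x + wb * y - z) ** 2
                             for x, y, z in zip(a, b, t)))
    return wa, wb, residual


# Hypothetical object: three feature points in its upright view.
upright = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
rotated_90 = rotate(upright, 90.0)   # second stored view
input_view = rotate(upright, 45.0)   # unfamiliar view to be recognised

print(fit_two_view_combination(upright, rotated_90, input_view))
```

For picture-plane rotation the 45° view is an exact linear combination of the two stored views (with weights cos 45° and sin 45°), so the residual is zero up to rounding error. An input from a different object would in general leave a non-zero residual, which is the sense in which goodness of fit can stand in for recognition.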

With a sufficient number of 2D descriptions contained in views, the same amount of information about an object can be specified as by a single 3D description (Poggio and Edelman, 1990). Models utilising multiple views also have the flexibility to expand the number of stored view/orientation representations. That is, if an unusual orientation of an object becomes familiar through viewing practice, a new representation of that view can be added to those already stored. This leaves the problem of how a collection of 2D representations is built up or linked together to represent the same object, though such links could depend on continuity of experience (Foldiak, 1991; Perrett, Oram et al., 1991; Seibert and Waxman, 1991; Logothetis, Vetter et al., 1994; Oram and Perrett, 1994a).