CHAPTER 2. BACKGROUND AND RELATED WORK

2.2 Related Work in Robotics

2.2.2 Using Behaviors to Recognize Objects

In addition to estimating specific object properties, behaviors have also been used by robots to recognize the identity of objects. Traditionally, object recognition has been treated as a computer vision problem. Indeed, the majority of robots today can only recognize objects using visual and/or 3D laser scan data (Quigley et al., 2007; Srinivasa et al., 2009; Rasolzadeh et al., 2010; Rusu et al., 2008). With a clear view of the target object, such systems can achieve high recognition rates, but they suffer from several limitations. For example, using vision alone, a robot cannot distinguish between a heavy object and a light object that otherwise look the same. Furthermore, such a system would be of little use if the object is outside the robot’s field of view (e.g., when grasping an object that is inside a bag).

Several lines of research have attempted to address the limitations of the visual sensory modality by enabling robots to recognize objects using proprioceptive, auditory, and tactile sensory feedback. One of the first such examples is the work of Natale et al. (2004), in which proprioceptive data captured by the robot’s hand was used to recognize objects. In their experiments, the robot grasped seven different objects and the resulting joint angles of the fingers were fed as inputs to a Self-Organizing Map (SOM). The SOM subsequently allowed the robot to distinguish objects of different sizes, as well as objects of similar size but different rigidity. Another approach to proprioceptive object recognition consists of estimating physical properties such as an object’s mass and moment of inertia, and using that information to detect whether a given object has been previously observed. Using this method, Kubus et al. (2007) performed an experiment in which a robot was able to recognize the identity of three different objects.
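The idea of detecting a previously observed object from estimated physical properties can be illustrated with a minimal sketch. This is not the method of Kubus et al. (2007); it assumes that each object is summarized by an estimated mass and moment of inertia and that a match is declared when both estimates fall within a relative tolerance of a stored object. The function names, the stored values, and the 10% tolerance are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# (mass in kg, moment of inertia in kg*m^2); all values are illustrative only.
Properties = Tuple[float, float]


def relative_error(estimate: float, reference: float) -> float:
    """Relative difference between an estimate and a stored reference value."""
    return abs(estimate - reference) / max(abs(reference), 1e-12)


def match_known_object(
    estimate: Properties,
    known: Dict[str, Properties],
    tolerance: float = 0.10,  # hypothetical 10% tolerance, not from the source
) -> Optional[str]:
    """Return the best-matching stored object, or None if none is within tolerance."""
    best_name: Optional[str] = None
    best_score = float("inf")
    for name, reference in known.items():
        # Score a candidate by its worst mismatch across the two properties.
        score = max(relative_error(e, r) for e, r in zip(estimate, reference))
        if score <= tolerance and score < best_score:
            best_name, best_score = name, score
    return best_name


known_objects = {
    "mug": (0.30, 0.0008),
    "book": (0.55, 0.0030),
    "bottle": (1.00, 0.0045),
}

print(match_known_object((0.31, 0.00082), known_objects))  # mug
print(match_known_object((2.50, 0.0200), known_objects))   # None
```

Returning `None` for an estimate that matches no stored object is one simple way to flag an object as not previously observed; a real system would also need to account for noise in the property estimates.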

Other studies in non-visual recognition have investigated how robots can recognize surface textures using various forms of tactile feedback. Tanaka et al. (2003) developed an artificial finger that uses strain gauges and polyvinylidene fluoride (PVDF) foil to generate tactile feedback when sliding across a surface. In subsequent experiments, Tanaka et al. (2007) demonstrated how their sensor can detect roughness and temperature changes in the textures of six different fabrics. A similar sensor was developed by Hosoda et al. (2006). By applying two different exploratory behaviors – pushing and rubbing – their robot was able to distinguish between five different materials. A robotic finger with randomly distributed strain gauges and PVDF films was also proposed by Jamali and Sammut (2010). In their experiments, a Naive Bayes classifier trained with the Fourier coefficients of the sensor’s output was used to recognize eight different surface textures. While these studies demonstrate the utility of tactile feedback for recognition tasks, they typically consider such feedback in isolation and make use of only a limited number of behaviors (usually just one). In contrast, the research proposed here will investigate how the tactile sensory modality, coupled with scratching behaviors, can be used in conjunction with other channels of information to build a multi-sensory object representation.

Most of the studies in behavior-based object recognition reviewed so far assume that the robot can perform only one behavior on the objects that it explores. More recently, it has been demonstrated that robots can boost their recognition rates by applying multiple different behaviors to the test object. For example, Sinapov et al. (2009) proposed a framework for auditory object recognition using a set of five behaviors: grasp, shake, drop, push, and tap. Using auditory information alone, the robot was able to achieve a recognition rate of over 99% (measured with 36 different household objects). Such a high rate was possible only by applying all five behaviors to each test object and combining the outputs of the recognition models associated with specific behaviors. In subsequent studies, the same boosting effect was also observed when performing recognition using proprioceptive (Bergquist et al., 2009) as well as tactile feedback (Sinapov et al., 2011b). More recently, the feature extraction and similarity estimation methods proposed by Sinapov et al. (2009) were used by Rebguns et al. (2011) to solve an acoustic object recognition task with 10 objects, in which the robot used reinforcement learning to select which behaviors to apply in order to maximize recognition performance.
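The effect of combining the outputs of behavior-specific recognition models can be illustrated with a minimal sketch. The combination rule is not specified in this section, so the example assumes that each behavior-specific model outputs a probability estimate per object and that the estimates are simply summed and renormalized. The function names, behaviors shown, and scores are hypothetical and are not taken from Sinapov et al. (2009).

```python
from typing import Dict

# Maps object label -> probability estimate from one behavior-specific model.
Scores = Dict[str, float]


def combine_behavior_outputs(per_behavior: Dict[str, Scores]) -> Scores:
    """Sum the per-behavior estimates for each object and renormalize.

    Summation is an assumed combination rule chosen for illustration.
    """
    totals: Scores = {}
    for scores in per_behavior.values():
        for obj, p in scores.items():
            totals[obj] = totals.get(obj, 0.0) + p
    norm = sum(totals.values())
    return {obj: total / norm for obj, total in totals.items()}


def recognize(per_behavior: Dict[str, Scores]) -> str:
    """Return the object label with the highest combined score."""
    combined = combine_behavior_outputs(per_behavior)
    return max(combined, key=combined.get)


# Illustrative outputs for one test object after three behaviors.
outputs = {
    "shake": {"can": 0.5, "box": 0.4, "ball": 0.1},
    "drop": {"can": 0.3, "box": 0.6, "ball": 0.1},
    "tap": {"can": 0.2, "box": 0.7, "ball": 0.1},
}

print(recognize({"shake": outputs["shake"]}))  # can
print(recognize(outputs))                      # box
```

In this toy example the shake model alone favors the wrong label, while the combined estimate over all three behaviors favors a different one, which is the kind of correction that combining multiple behaviors makes possible.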

Another limitation of most current methods for recognizing objects using behaviors is that they typically use only a single sensory modality. In a recent study, we have shown that a robot may further improve its object recognition rate not only by performing multiple behaviors, but also by using multiple sensory modalities (Sinapov et al., 2011a). In that experiment, the robot explored 50 household objects using five different behaviors. Using both auditory and proprioceptive feedback, the robot was able to achieve a recognition rate of over 98%. The results also showed that increasing the number of sensory modalities boosts object recognition rates in a manner similar to the boost observed when increasing the number of behaviors.

In another line of research, Gijsberts et al. (2010) describe a multi-modal object recognition approach that uses grasp affordance features that encode different ways in which an object can be grasped. Using a combination of the grasp affordance features and visual appearance features, the robot was able to recognize 7 different objects.

A further limitation of most methods used by robots to recognize objects is that they start with a fixed object representation in which the robot’s training data is labeled with one of a finite number of object identities (see Torres-Jara et al. (2005); Sinapov et al. (2009); Natale et al. (2004); Rasolzadeh et al. (2010); Bergquist et al. (2009); Rusu et al. (2008); Sinapov et al. (2011a); Marton et al. (2012) for a representative sample of such approaches). These methods implicitly assume that the object individuation task (i.e., inferring how many unique objects have been observed) has already been solved. Providing labeled data, however, becomes increasingly difficult as the number of objects increases.
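To make the object individuation task concrete, the following minimal sketch infers how many unique objects a set of unlabeled observations contains. It is not the method developed in this research; it assumes that each observation is a small numeric feature vector and uses a simple greedy clustering with a distance threshold. The feature values, the threshold, and the function name are hypothetical.

```python
from math import dist
from typing import List, Sequence

Observation = Sequence[float]


def individuate(
    observations: List[Observation],
    threshold: float,  # hypothetical distance threshold, chosen for illustration
) -> List[List[Observation]]:
    """Group unlabeled observations; each group stands for one inferred object."""
    clusters: List[List[Observation]] = []
    for obs in observations:
        best_index, best_distance = -1, float("inf")
        for i, members in enumerate(clusters):
            # Centroid of the observations already assigned to this cluster.
            centroid = [sum(values) / len(members) for values in zip(*members)]
            d = dist(obs, centroid)
            if d < best_distance:
                best_index, best_distance = i, d
        if best_index >= 0 and best_distance <= threshold:
            clusters[best_index].append(obs)
        else:
            # No existing cluster is close enough: treat this as a new object.
            clusters.append([obs])
    return clusters


# Five unlabeled observations (illustrative two-dimensional features).
observations = [
    (0.30, 0.10),
    (1.00, 0.80),
    (0.32, 0.11),
    (0.98, 0.83),
    (2.00, 0.20),
]

clusters = individuate(observations, threshold=0.2)
print(len(clusters))  # 3
```

No object labels are supplied at any point; the number of objects is an output of the procedure rather than an input, which is the distinction between individuation and recognition with a fixed label set.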

In summary, while object recognition in robotics has traditionally been addressed as a visual classification problem, more recent lines of research have explored how the robot’s own behaviors can be used to solve this task. Most approaches to date only use a single behavior and a single modality and are typically evaluated on a small set of objects. In addition, virtually all previous approaches assume that all of the training data is labeled with the correct object identity. This assumption, however, is impractical since it would be impossible for a human instructor to label the data for each individual object that a robot may possibly interact with in a home or an office. The research proposed here will address these limitations by developing methods that can scale up to a larger number of behaviors, sensory modalities, and objects.

In addition, as described in Chapter 10, the robot in this research is not only tasked with recognizing the identity of a previously explored object, but is also tasked with solving the object individuation problem. Thus, this research relaxes the assumption that all perceptual experience with objects that is used to train the robot must be annotated with an object label.