As pointed out in Section 5.1.2, with the utilized mobile robot it seems intuitive to learn new object models by gripping the object and moving it in front of the PTU (see Fig. 5.13). Therefore, the NBV algorithm as described in the previous chapter is adapted so that the object is moved instead of the sensor. However, this only works for objects the gripper can actually grasp. For instance, the pneumatic filter object can only be firmly gripped at its top, as depicted in
5.4. GRIPPED OBJECT MODELING WITH MOBILE ROBOT 117
Figure 5.13: Gripped object modeling: a pneumatic filter object is modeled by planning NBVs in order to move the robot, which holds the object, and observe the object with the stereo camera of the PTU. Here, the NBV algorithm is inverted, which means that the object is moved instead of the sensor.
Fig. 5.13. For all other parts of the object, either the maximum stroke of the two-finger gripper is too short or the grasp is not stable enough.
As the object is moved instead of the sensor, the general assumptions described in Section 4.1 differ slightly. The bounding box, which represents the area of the unknown object, is defined at the TCP of the robot arm, assuming a maximum object size of 100 × 100 × 200 mm. The origin of the bounding box is chosen so that only the two fingers of the gripper are modeled but no further parts of the robot. As a CAD model of the gripper is given, the fingers can be extracted. This is carried out automatically after the final object model has been generated. As a result, neither the side of the object facing the gripper nor the parts of the object occluded by the gripper can be modeled initially. To adapt the NBV algorithm, we pretend that the object is at a fixed position and the sensor is moved around the object. To achieve this, the WCS is defined at the TCP and the TCP at the actual base of the robot. Thus, in order to manipulate the robot based on NBV candidate transformations ${}^{W}T_{S}$, the inverse of Equation 3.2 on page 34 is determined:
Note that here, ${}^{T}T_{S}$ is the static transformation between the robot base and the stereo camera on the PTU. Furthermore, the transformations ${}^{W}T_{S}$ in the acquired range images are inverted to correctly align the data.
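The frame swap described above can be illustrated with a small sketch. It does not reproduce Equation 3.2 or its inverse; it only assumes that an NBV candidate is the camera pose expressed in the object-fixed frame at the TCP, and that the camera pose in the robot base frame is static. All names (`tcp_target_in_base`, `base_T_cam`, `obj_T_sensor`, `pose_z`) are hypothetical and chosen for illustration.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 homogeneous matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose_z(angle_deg, x, y, z):
    """Homogeneous transform with a rotation about z and translation (x, y, z)."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def tcp_target_in_base(base_T_cam, obj_T_sensor):
    """TCP pose in the base frame that realizes a sensor pose planned in the
    object frame (assumption: the object frame coincides with the TCP frame)."""
    # base_T_tcp = base_T_cam * cam_T_tcp, with cam_T_tcp = inverse(obj_T_sensor)
    return mat_mul(base_T_cam, invert_rigid(obj_T_sensor))
```

As a usage example, if the camera sits at `pose_z(30.0, 0.5, 0.1, 1.2)` in the base frame and the planner proposes a viewpoint `obj_T_sensor`, then `tcp_target_in_base(base_T_cam, obj_T_sensor)` gives the pose the arm would have to reach so that the fixed camera sees the gripped object from that viewpoint.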
In contrast to the autonomous object modeling with the industrial robot and laser striper (see Sections 5.2 and 5.3), both the range images and the robot poses are significantly noisier, resulting in object models of considerably lower quality. Additionally, some of the objects cannot always be firmly grasped with the two-finger gripper, causing the object to drift. Therefore, tracking the robot arm does not reliably reduce the pose error, as the object’s position relative to its initial grasp changes while the robot arm moves. Furthermore, tracking articulated models with a defined kinematic tree, as suggested by Schmidt et al. (2014), does not work satisfactorily because no model is given for the last element of the tree, namely the unknown object, which results in mismatches and tracking errors. Also, improving the ICP by adding color matching, as carried out by Krainin et al. (2011), is not possible for untextured objects, which is the case for industrial objects. Thus, KinectFusion would not perform well; in addition, it was developed for static environments. Moreover, there is no guarantee that the unknown object always remains in the FOV, which poses an additional challenge for tracking.
As shown in Fig. 5.13, the stereo camera on the PTU is utilized for the autonomous modeling of the gripped object. Thus, not NBS but NBV candidates as described in Section 4.3.1 are generated. After the object is manually placed into the gripper, an initial range image, which views the complete unknown bounding box, is obtained. Based on the initial range image, viewpoint candidates are generated and an NBV is selected considering exploration and surface quality. As the workspace of the LWR is very restricted in comparison to the KR16, for several NBV candidates the robot cannot move the object so that it is viewed from the required side. Since not a scan path but only a single viewpoint is required, the orientation of the viewpoint around the z-axis of the sensor does not make a difference as long as the unknown part of interest is within the FOV. Therefore, for each determined NBV where the initial robot movement fails, the motion planner is called for different transformations ${}^{W}T_{S}$.
Specifically, three additional transformations, obtained by rotating around the z-axis of the SCS by 90°, 180°, and 270°, are tested. This led to a significant increase in the number of NBV candidates that the robot could actually carry out.
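The retry strategy can be sketched as follows. This is a minimal illustration, not the implementation used here: the callback `try_plan` is a hypothetical stand-in for the motion planner and is assumed to return a plan, or `None` if the pose is unreachable. The function `plan_with_z_rotations` and its parameters are likewise illustrative names. Multiplying the candidate on the right by a z-rotation turns the viewpoint about the sensor's own z-axis, so the viewing direction stays the same.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 homogeneous matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_z(angle_deg):
    """Homogeneous transform for a pure rotation about the z-axis."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def plan_with_z_rotations(world_T_sensor, try_plan, angles_deg=(0, 90, 180, 270)):
    """Try the original viewpoint first, then its rotations about the sensor
    z-axis. Returns (angle, plan) for the first feasible variant, else None."""
    for angle in angles_deg:
        # Right-multiplication rotates in the sensor's local frame.
        candidate = mat_mul(world_T_sensor, rot_z(angle))
        plan = try_plan(candidate)  # hypothetical planner callback
        if plan is not None:
            return angle, plan
    return None
```

If none of the four variants is reachable, the function returns `None`, and the caller could then fall back to the next-ranked NBV candidate.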
With the mobile robot, three objects that have also been modeled with the industrial robot, namely the pneumatic filter, the Kinder Bueno, and the yellow shower gel, are autonomously modeled. All calculations are carried
Figure 5.14: The pneumatic filter object (left) is modeled with the mobile robot. The quality of the resulting point clouds from the different views (middle left) is poor. When performing ICP registration and streaming meshing, the resulting mesh (middle right) contains many holes and double walls. When applying mesh growing as suggested by Wiedemann and Kriegel (2014), the final mesh (right) is watertight and its shape seems more consistent with the actual object than that obtained with the streaming mesh generation method. However, the model is very uneven.
out on the I7 boards of the mobile robot, which are slower than the external computer utilized for the modeling with the industrial robot. Nevertheless, the object models are obtained in about two to three minutes, which is faster than on the industrial robot. The reason for this is that no laser scans but only single range images are acquired. The resulting 3D models proved to be much noisier than those obtained with the industrial robot and laser striper system. Fig. 5.14 shows the results for the pneumatic filter as an example (compare with Fig. 5.6 bottom middle). Note that the top of the object cannot be modeled as it is occluded by the gripper. Due to the poor data quality of each scan in combination with a large error in the pose of the different scans, even with ICP a very noisy model is acquired with the mesh generation method suggested in Section 3.3.1. Thus, the mesh growing method suggested by Wiedemann and Kriegel (2014) is applied to the point cloud without ICP. The resulting mesh (see Fig. 5.14, right-hand side) is watertight and its shape seems more consistent with the actual object. Compared with the ground truth, the model errors $\bar{e}$ are 4.68 mm for the streaming meshing and 2.72 mm for the mesh growing. Thus, the mesh based on the mesh growing algorithm seems more applicable to object recognition or grasping. Nevertheless, the performance of these models for recognition or grasping still needs to be investigated. Furthermore, algorithms need to be implemented to cope with the sensor and robot uncertainties. This, however, is not the main focus of this thesis.