
3.3 Kinect-based Autonomous Robot

The robot system is the same one I used in my M.S. study. It was designed to assist elderly people with household tasks and follows the typical architecture of an autonomous mobile robot. It is built on a P3DX differential-drive mobile base, with a sonar array along its sides. Its main sensor is a Kinect camera mounted on top, about 1 m above the ground. The Kinect captures RGB and depth images simultaneously, and these are used to build the environment model while the robot is working. To command the robot, an Android phone with speech recognition first captures the spatial description and sends it to the robot over WLAN. The robot then grounds the command into navigation instructions it can execute. Finally, the robot navigates to the target location. The system was tested in the experiments reported in [13] and proved to be a good platform. The design of the robot system is shown in Figure 3.1.
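The command flow above (speech on the phone, transmission over WLAN, grounding on the robot, then navigation) can be summarized as a three-stage pipeline. The Python sketch below is a hypothetical illustration only, not the dissertation's ROS implementation: the `LANDMARKS` map, the keyword-matching `ground_command` function, and the 0.5 m approach offset are assumptions, the WLAN transport is replaced by a plain function call, and only single-word landmark names are recognized.

```python
import math
from dataclasses import dataclass


@dataclass
class Landmark:
    """A named object in the robot's map (hypothetical, simplified)."""
    name: str
    x: float  # metres, world frame
    y: float  # metres, world frame


# Hypothetical map of landmarks the toy grounding step can recognize.
LANDMARKS = {
    "couch": Landmark("couch", 3.0, 4.0),
    "desk": Landmark("desk", -1.0, 3.0),
}


def receive_command(text: str) -> str:
    """Stage 1 stand-in: in the real system the phone's speech
    recognizer sends this text to the robot over WLAN."""
    return text.lower().strip()


def ground_command(text, landmarks, robot_xy, approach=0.5):
    """Stage 2 (toy grounding): find the first known landmark named in the
    command and return a goal `approach` metres short of it, on the straight
    line from the robot to the landmark. Returns None if nothing matches."""
    words = [w.strip(".,!?") for w in text.split()]
    for word in words:
        if word in landmarks:
            lm = landmarks[word]
            dx, dy = lm.x - robot_xy[0], lm.y - robot_xy[1]
            dist = math.hypot(dx, dy)
            if dist <= approach:
                return robot_xy  # already close enough to the landmark
            scale = (dist - approach) / dist
            return (robot_xy[0] + dx * scale, robot_xy[1] + dy * scale)
    return None


def navigate(goal):
    """Stage 3 stand-in: a real robot would pass the goal to its planner."""
    return f"moving to ({goal[0]:.2f}, {goal[1]:.2f})"


# Example: the robot at the origin is told to go to the couch.
text = receive_command("Go to the couch.")
goal = ground_command(text, LANDMARKS, (0.0, 0.0))
print(navigate(goal))
```

With the robot at the origin and the couch at (3, 4), the goal lands 0.5 m short of the couch at about (2.70, 3.60). Real spatial language grounding must also resolve relations such as "left of" or "behind", which this keyword matcher deliberately ignores.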

3.4 3D Robotic Simulator for Research on Spatial Language Driven Robots

A side project in this research was the development of a 3D simulator for efficiently evaluating the spatial language driven robot system that is the focus of this dissertation. The main advantage of a robot simulator over a physical platform is that test scenes are much easier to set up and ground truth is much easier to record. However, most robot simulators cannot exactly match the real-world scenes in which the robot will operate, and migrating a system from the virtual platform to a physical one always requires extra work. We chose the Gazebo3D robotic simulation platform [24] as our simulation engine. It has proven robust and reliable for real-time robotic simulation, and it offers a wide selection of sensors and actuators, which makes it easy for users to build their own robot systems.

Our simulation system includes (1) a simulated robot with all the sensing and acting functions of the physical platform introduced in Section 3.3 and (2) several virtual working environments used in the experiments of Chapters 5 and 6. Four virtual worlds are programmed in our simulator: the apartment world, the hk-studio world, the one-bedroom-house world, and the two-bedrooms-house world. The apartment world replicates the environment used in the human spatial language experiment in Chapter 2 and in the robot experiment of our IROS 2014 paper [13], including the building structure, the furniture appearance, and the daily objects in the data collection scenes. The other three worlds use the same furniture and daily object models but differ in building structure, furniture placement, and the positions of daily objects. Figure 3.2 shows bird's-eye views of the four virtual worlds. The furniture items and daily objects are listed in Table 3-1, and their images are shown in Figure 3.3.

Table 3-1 The types of rooms, furniture and objects in the four virtual worlds.

Room structures: apartment, hk-studio, one-bedroom-house, two-bedrooms-house

Furniture items: round table, coffee table, hexagon table, wood chair, blue chair, dinner table, desk, couch, bed

Objects: fork, glasses case, laptop, statue, monitor, mug

The ROS-based spatial language grounding system and natural language generation system run in the simulator in the same way as on the physical robot platform. The simulator allows applications for the mobile robot to be developed without access to the physical machine, saving cost and time for other researchers who are working, or plan to work, on the same topic.

Using the vocabulary of the template introduced in Chapter 2, we compiled a second corpus of object-fetching tasks in the four virtual worlds. The corpus contains 77 spatial language commands for 24 fetch tasks (six per world). We tested both our spatial description following system and our language generation system on these commands and tasks. Our goal is to provide other researchers developing their own spatial language driven robots with a simulator and corpus that can serve as a benchmark.

Figure 3.2 The bird’s-eye views of the four worlds in the Gazebo3D simulator (top-left: apartment; top-right: hk-studio; bottom-left:

Figure 3.3 The design of the robot, the furniture items, and the daily objects that appear in the four virtual worlds. The upper figure is a composite image of all the furniture items, the robot, and the objects; the bottom figure is a snapshot of all six target objects.

3.5 Environment Model

Our robot is designed to work in in-home environments, including both private homes and public residences (assisted living apartments, nursing homes, etc.). It is intended to perform assistive tasks such as fetching daily objects, and each fetching task is guided by spatial language from a human user. These tasks and environments pose the following challenges for the robot: (1) The working space is small in scale but cluttered with walls, furniture items, and daily objects of various sizes and shapes. Any of these objects may be mentioned in natural language commands, so all of them must be registered in the environment model. (2) Human spatial commands describe spatial relations between objects, which the robot must understand in order to navigate. Thus, the robot must obtain not only the name label but also the spatial information of each object. (3) Furniture items, which are considered good landmarks for localization, can be moved by users at any time without notice to the robot. In other words, the robot may work in a partially unknown environment. Our solution to these difficulties is to build a more elaborate environment model that includes all the information needed.

In the robot's environment model, objects are described by an entity model, which is discussed further in Section 5.4.1. We use the term "entity" for a semantic object referred to in human spatial language. An entity has (1) an ID, (2) a name, (3) a 2D point set, and (4) an orientation. The ID uniquely identifies an object within a robot task and is assigned in order of detection. The name is a word, taken from the spatial language corpus, that denotes the object. The 2D point set gives the positions of the cells in a 2D grid map that represent the object's projection onto the floor. To reduce computation and noise, we downsample the raw point cloud to a voxel grid point cloud. The orientation of an entity is defined as the direction of its functional front side in its egocentric reference frame. For example, the functional front of a chair is the direction a person faces when sitting on it. Rather than describing the orientation with a linguistic variable, we represent it as a more precise numerical angle in world coordinates. Note that some furniture items, such as nightstands and round tables, are rarely used by people as orientation references because their functional front side is difficult to define; such items are not considered in our orientation estimation algorithm.
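The entity model and the point cloud preprocessing described above can be sketched as follows. This is a minimal sketch assuming NumPy is available, not the dissertation's code: the field names, the module-level counter used to assign IDs in detection order, the 5 cm voxel size and grid resolution, and the choice of replacing each voxel's points by their centroid (the same idea as PCL's VoxelGrid filter) are all illustrative assumptions. Orientation is stored as an angle in radians in world coordinates and left as `None` for items, such as round tables, whose functional front is undefined.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional, Set, Tuple

import numpy as np

# Hypothetical counter: entities created in detection order get IDs 1, 2, 3, ...
_next_id = count(1)


@dataclass
class Entity:
    name: str  # word from the spatial language corpus, e.g. "chair"
    cells: Set[Tuple[int, int]]  # occupied 2D grid cells (floor projection)
    orientation: Optional[float] = None  # functional front, radians, world frame
    entity_id: int = field(default_factory=lambda: next(_next_id))


def voxel_downsample(points: np.ndarray, voxel: float = 0.05) -> np.ndarray:
    """Replace all points (N x 3, metres) falling in the same voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True,
                                   return_counts=True)
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)  # accumulate points per voxel
    return sums / counts[:, None]


def project_to_grid(points: np.ndarray,
                    resolution: float = 0.05) -> Set[Tuple[int, int]]:
    """Drop the height (z) coordinate and return the set of occupied grid cells."""
    cells = np.floor(points[:, :2] / resolution).astype(np.int64)
    return {(int(i), int(j)) for i, j in cells}
```

For example, the points (0.01, 0.01, 0.51) and (0.02, 0.03, 0.52) fall in the same 5 cm voxel and are merged into their centroid, so a three-point cloud that also contains (0.32, 0.12, 0.41) is reduced to two points, which project onto the grid cells (0, 0) and (6, 2). Storing cells as a set makes it cheap to test whether two entities overlap or touch on the floor plane.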

In document Spatial language driven robot (Page 34-40)