2.4 Artificial Intelligent Robotics (AI Robotics)
2.4.2 Knowledge Representations and Extraction
As highlighted in the previous sections, capturing and representing human skills for imitation learning is gaining increasing attention in robotics applications that aim to transfer human skills to robots. Most of the research reported in the area of imitation learning is based on visual perception, mainly because humans rely mostly on vision to gain adequate information about objects’ relative positions and geometrical properties [97, 76]. In assembly applications, the importance of each perception modality varies with the type of motion: gross motion relies on vision, while fine motion requires haptic information, especially in contact situations. The focus of the research reported in this thesis is on the use of haptic information to learn an assembly task. Capturing human skills is particularly complex for assembly processes, which often involve an understanding of hidden process features. For example, a successful assembly task requires a deep understanding of the various types of contact between objects and their corresponding forces. Another important aspect of an assembly process is the sequential relation between the different CSs during the assembly. Different skilled operators can perform the stages of the same task with different temporal properties (transitions between states and durations). To capture, understand and interpret human skills from several trials, those trials must therefore be aligned with respect to duration. In addition, the underlying pattern of the haptic information must be extracted to reveal the sequential (temporal) knowledge, that is, the human skill. Finally, those skills must be modelled so that they can adapt to task variations in robotic assembly.
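The duration alignment mentioned above can be illustrated with dynamic time warping (DTW), a standard technique for aligning sequences that unfold at different speeds. The text does not state at this point which alignment method is used, so DTW is an assumption here, and the function name `dtw_align` and the force values are purely illustrative.

```python
def dtw_align(a, b):
    """Align two 1-D sequences with classic dynamic time warping.

    Returns the accumulated distance and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical force profiles of the same task performed at two speeds.
fast_trial = [0, 1, 2, 1, 0]
slow_trial = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
distance, path = dtw_align(fast_trial, slow_trial)
```

In this toy case the slow trial is the fast trial with every sample repeated, so the warping path pairs each fast sample with its two slow counterparts and the two trials can be compared stage by stage despite their different durations.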
Most of the research mentioned above performs pattern recognition on the extracted or selected features by modelling (capturing) temporal knowledge, which can be done in the symbolic or the non-symbolic domain. The main advantages of the non-symbolic models are their parametric nature and their capability to capture variations in human skills [98]. On the other hand, symbolic approaches are well known for capturing complex human behaviour with simpler, more compact models that have better computational performance. For example, symbolic approaches can capture the assembly sequence at different hierarchical levels (granularity), which is difficult using probabilistic approaches. Even though symbolic models have traditionally been considered unsuitable for controlling real-world systems [99], researchers are now making effective use of these models for skill representation, evaluation, generalisation and robot control [100]. These models are computationally efficient, simple and capable of capturing complex human skills. Therefore, the research reported in this section explores the use of symbolic models to capture human assembly skills.
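To make the idea of symbolic capture at different levels of granularity concrete, the following minimal sketch turns a force signal into a sequence of symbols and then compresses it. The two symbols, the threshold value and the function names are assumptions chosen for illustration; they are not the representation developed in this thesis.

```python
from itertools import groupby


def to_symbols(forces, contact_threshold):
    """Map each force sample to a symbol: 'contact' or 'free'."""
    return ["contact" if abs(f) > contact_threshold else "free" for f in forces]


def compress(symbols):
    """Collapse repeated symbols into (symbol, duration_in_samples) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]


# Hypothetical force samples recorded during one trial.
forces = [0.1, 0.2, 3.0, 3.5, 2.9, 0.1]

# Fine level: states together with their durations.
fine = compress(to_symbols(forces, contact_threshold=1.0))

# Coarse level: only the order of the states.
coarse = [symbol for symbol, _ in fine]
```

The fine level keeps the temporal properties of the trial, while the coarse level keeps only the sequence of states, which is the part that stays the same across operators who work at different speeds.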
Another supervised learning algorithm that performs well is Random Forests (RF), introduced in [101]; however, it is still not widely adopted in robotics applications. Caruana and Niculescu-Mizil presented a comparison of supervised learning algorithms, which found RF to be the second-best algorithm after boosting [102]. Caruana et al. also compared the performance of the same supervised learning algorithms on high-dimensional data [103], demonstrating that RF performs steadily over a wide range of dimensionalities.
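For readers unfamiliar with RF, the sketch below shows its two core ingredients: each tree is trained on a bootstrap sample, and each node considers only a random subset of the features. It is a deliberately small standard-library illustration, not the implementation used in any of the cited works; the two features and the labels `free` and `contact` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, features):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_features, max_depth, rng):
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    # Random feature subset at every node.
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority(labels)
    f, t = split
    left = [k for k, row in enumerate(rows) if row[f] <= t]
    right = [k for k, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[k] for k in left], [labels[k] for k in left],
                           n_features, max_depth - 1, rng),
        "right": build_tree([rows[k] for k in right], [labels[k] for k in right],
                            n_features, max_depth - 1, rng),
    }


def predict_tree(node, row):
    while isinstance(node, dict):
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node


def fit_forest(rows, labels, n_trees=15, n_features=1, max_depth=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) examples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[k] for k in idx], [labels[k] for k in idx],
                                 n_features, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual tree predictions."""
    return majority([predict_tree(tree, row) for tree in forest])


# Hypothetical (force, torque) samples with their contact labels.
rows = [(0.0, 0.2), (0.3, 0.1), (0.5, 0.4), (0.1, 0.6), (0.4, 0.0),
        (5.0, 5.2), (4.8, 5.1), (5.3, 4.7), (5.1, 4.9), (4.6, 5.4)]
labels = ["free"] * 5 + ["contact"] * 5
forest = fit_forest(rows, labels)
```

Because the vote is taken over many trees that each saw slightly different data and features, a single poorly trained tree has little influence on the final prediction.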
Andreas et al. proposed an active RF to address the active vision problem [104]. The active vision literature focuses on finding effective approaches to select observations, with little attention paid to the classification approaches. In that work, RF was used as a classification approach to identify clothes and grasping points. Active RF was also used on-line to predict the grasping point positions that reduce the grasping error. Liarokapis et al. recorded muscular activity from the human forearm and upper arm while an operator performed a reach-to-grasp activity in which the object position was defined in 3D space [105]. In that work, RF was used to classify different Electromyography (EMG) signals for different reach-to-grasp strategies. Furthermore, Matteo et al. presented a comparison between model-based approaches and machine learning approaches to compensate for internal F/T in a humanoid robot [106]. The main result of that work was that the learning algorithms exceed the analytical-model approach in prediction accuracy. Two machine learning approaches were implemented and compared, namely the Least-Squares Support Vector Machine (LS-SVM) and the Neural Network (NN). The LS-SVM converged more rapidly than the NN; however, once both had converged, their performance was almost identical. The proposed data-driven methods require tuning, which is difficult due to the number of parameters that need to be tuned. Also, the training data sometimes require pre-processing and normalisation to achieve acceptable performance.
Limitations of the Dataset in LfD
In the previous overview of LfD, different structures and issues in this area were presented. Nevertheless, developing an LfD architecture is not straightforward, due to robot constraints, algorithm complexity (computational restrictions) and dataset restrictions [70]. The main restriction is the presence of undemonstrated states, or unexplored areas, in the operation space. In the literature, researchers have tried to overcome this problem as follows:
1. Generalisation from existing demonstrations: this step usually requires an additional execution of the robot system to generate the required data. Examples include skill trees [89] and probabilistic flow tubes [107].
2. Acquisition of new demonstrations when a novel state is encountered: a learning-based approach for manipulating piles of unknown objects was introduced in [108]. In other words, the LfD approach must allow incremental learning, where robot behaviour can be optimised over time either by requesting new demonstrations or by automatically adapting to changes within the environment.
Another issue is data quality: data gathered from an expert might be ambiguous or suboptimal, and might include unwanted effects or even wrong data. These issues can be tackled in several ways. One is to remove suboptimal and ambiguous data [109]. Another is to provide a quality measurement function that evaluates the teacher’s performance and updates the reward value accordingly. Duy et al. proposed filtering the collected data during the demonstration [110].
Another issue in the LfD field is the correspondence problem, which concerns how to map the skill from the teacher’s structure onto the learner’s structure. Two mappings are defined for this problem: the record mapping and the embodiment mapping, as shown in Figure 2.13. The correspondence problem can be as simple as copying the teacher’s manoeuvres when teacher and learner share the same architecture design or when the task is demonstrated through teleoperation. Nevertheless, if the teacher and the learner have different kinematic designs, then the recorded data are processed (mapped) using the mapping function $g_R$ (shadowing). That is to say, if the robot and the teacher are different and the recorded data have not been mapped, then a mapping function $g_E$ must be applied before sending the data to the learner robot.
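The two mappings can be read as a simple composition: the teacher’s execution passes through the record mapping to become recorded data, and the recorded data pass through the embodiment mapping to become a command for the learner. The sketch below illustrates only this composition. The concrete choices (a unit conversion for the record mapping and clamping to joint limits for the embodiment mapping) and all names and values are assumptions for illustration, not the mappings used in the cited works.

```python
import math


def record_mapping(teacher_joint_angles_deg):
    """Record mapping: teacher execution -> recorded data.

    Assumed here to be a conversion from degrees to radians.
    """
    return [math.radians(angle) for angle in teacher_joint_angles_deg]


def embodiment_mapping(recorded_angles_rad, joint_limits_rad):
    """Embodiment mapping: recorded data -> learner command.

    Assumed here to clamp each angle to the learner's joint limits.
    """
    return [
        min(max(angle, low), high)
        for angle, (low, high) in zip(recorded_angles_rad, joint_limits_rad)
    ]


# Hypothetical two-joint demonstration and robot limits.
teacher_demo_deg = [90.0, 200.0]
robot_limits_rad = [(-3.0, 3.0), (-3.0, 3.0)]
learner_command = embodiment_mapping(
    record_mapping(teacher_demo_deg), robot_limits_rad
)
```

When teacher and learner share the same design, or the task is teleoperated, both functions reduce to the identity and the demonstration can be replayed directly.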
The real world imposes many restrictions on robotics: the robot must be fail-safe, and hazards must be minimised. Accordingly, robustness and reliability are crucial attributes of the robot, which might be achieved by LfD [76]. Moreover, robustness requires the learning algorithm to cope with missing data, primarily by employing probabilistic methods such as Gaussian Mixture Regression (GMR). Consequently, dataset quality must be maintained using performance optimisation, as in [13]. In order to avoid unwanted behaviour of a dynamical system in learning methods, a pre-structure with two dynamical systems was introduced [10]. In this case, the two structures were connected by exchanging the learned parameters. When an expert operator tries to teach a learner robot, the expert must come up with a set of teaching strategies that clarify the ambiguous regions, and these differ from the strategies the expert applies to themselves. The reason is the difference in kinematic and dynamic abilities [88]. Moreover, a vital issue in LfD arises when a human demonstrates a task that is impossible for the robot to perform [2].
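The way GMR copes with inputs that were never demonstrated can be sketched as follows: given a Gaussian mixture over input-output pairs, the prediction at a query input is the responsibility-weighted sum of the conditional means of the components. The sketch assumes a scalar input and a scalar output and takes the mixture parameters as given (fitting them, for example with expectation-maximisation, is omitted). The dictionary keys and the numerical values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components):
    """Conditional expectation E[y | x] under a mixture of 2-D Gaussians.

    Each component is a dict with keys: weight, mean_x, mean_y, var_x, cov_xy.
    Assumes x is close enough to the data that the densities do not
    underflow to zero.
    """
    # Responsibility of each component for the query input.
    resp = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(resp)
    y = 0.0
    for r, c in zip(resp, components):
        # Conditional mean of this component given x.
        conditional = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        y += (r / total) * conditional
    return y


# Hypothetical mixture, for example time (x) against force (y).
mixture = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": 0.0, "var_x": 1.0, "cov_xy": 0.0},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": 2.0, "var_x": 1.0, "cov_xy": 0.0},
]
# Query an input that lies between the two components.
prediction = gmr_predict(0.0, mixture)
```

The query lies between the two components, so the prediction blends their outputs smoothly instead of failing on an input with no recorded sample.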
Another method to accelerate the learning process and maintain data quality is to predict the operator’s actions during demonstrations (intent prediction). First, the robot must determine the teacher’s intent (goal) $G^*$ by processing the ongoing trajectory and a predefined vector of cues $\theta$, where $G^* \in \{G_1, \ldots, G_N\}$. Once the prediction is made, the robot must select the next actions $a^*$ that ensure the successful performance of the task [111].
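The two steps of intent prediction, selecting a goal from a finite set and then choosing an action towards it, can be sketched as below. The scoring rule is an assumption made purely for illustration and is not the method of [111]: each candidate goal is scored by the progress the trajectory has made towards it and by the remaining distance, weighted by a two-element cue vector `theta`. The goal names and coordinates are hypothetical.

```python
import math


def predict_goal(trajectory, goals, theta):
    """Return the name of the goal with the highest cue-weighted score."""
    start, current = trajectory[0], trajectory[-1]

    def score(goal):
        remaining = math.dist(current, goal)
        progress = math.dist(start, goal) - remaining
        return theta[0] * progress - theta[1] * remaining

    return max(goals, key=lambda name: score(goals[name]))


def next_action(current, goal):
    """Unit step from the current position towards the predicted goal."""
    distance = math.dist(current, goal)
    if distance == 0.0:
        return tuple(0.0 for _ in current)
    return tuple((g - c) / distance for c, g in zip(current, goal))


# Hypothetical planar example with two candidate goals.
goals = {"peg_hole": (5.0, 0.0), "parts_tray": (0.0, 5.0)}
trajectory = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
theta = (1.0, 0.5)

best_goal = predict_goal(trajectory, goals, theta)
action = next_action(trajectory[-1], goals[best_goal])
```

Here the trajectory moves steadily along the first axis, so the goal on that axis receives the higher score and the selected action continues in the same direction.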
Generally, research in LfD assumes a fixed skill, extracted from the demonstrated task, and learns suitable parameters for that skill. There are also several demonstration and knowledge representation methods. Furthermore, knowledge can be extracted using different search algorithms, such as gradient descent. In comparison with LfD, the main drawback of RL is that it needs a large amount of training data and complex functions to optimise and improve performance [70]. LfD, by contrast, is a more interactive learning technique and has shown great potential in enabling collaborative human-robot interaction.