• No results found

Robot Learning from Data

N/A
N/A
Protected

Academic year: 2021

Share "Robot Learning from Data"

Copied!
54
0
0

Loading.... (view fulltext now)

Full text

(1)

Robot Learning from Data

(2)

General Pipeline

1. Data acquisition (e.g., from 3D sensors) 2. Feature extraction and representation

construction

3. Robot learning, e.g.:

classification (recognition),

clustering (knowledge discovery) RL for action generation

4. Action execution

(3)

Robot Learning

• There are generally three types of robot learning:

➢Learning by demonstration

➢Reinforcement learning

➢Learning from data

(4)

Learning from data

(5)

Learning from data

• Supervised learning (Classification)

▪ Support Vector Machines (SVMs)

▪ Decision Trees

▪ Ensemble methods

• Unsupervised

learning (Clustering)

▪ Gaussian Mixture Models (GMM)

(6)

Support Vector Machines

• A Support Vector Machine (SVM) is a supervised learning (i.e., classification) model that recognizes patterns using a separating hyperplane.

• Given a set of training data (with semantic labels), an SVM training algorithm builds a model that

outputs an optimal hyperplane which assigns new examples into one category or the other

• SVM is a non-probabilistic binary linear classifier (as

our first SVM model for discussion)

(7)

Support Vector Machines

• The original SVM algorithm was invented by

Vladimir N. Vapnik and Alexey Ya. Chervonenkis in 1963, based on Statistical Learning Theory

• In 1992, Bernhard E. Boser, Isabelle M. Guyon and Vladimir N. Vapnik suggested a way to create

nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes

• Empirically good performance: successful

applications in many fields (e.g., text processing, image processing, bioinformatics, robotics, etc.)

(8)

Support Vector Machines

• For example: To recognize human behaviors

• We have a training dataset of skeletal data for the human activity “jump forward”

• Now given a new (previously unseen) skeleton sequence, we want to answer the question: is the human performing

“jump forward”, or not?

(9)

Support Vector Machines

How can we choose a line to separate the data?

As an illustration, here we deal with lines and points in the Cartesian

plane instead of hyperplanes and vectors in a high dimensional space.

(10)

Support Vector Machines

H1

Is this a good line?

(11)

Support Vector Machines

H2

Is this a good line?

(12)

Support Vector Machines

H3

Is this a good line?

(13)

Support Vector Machines

H1 H2 H3

Which line is the best? …and Why?

(14)

Support Vector Machines

H1 H2 H3

H3 can best separate two categories of data, as it maximizes the margin

(15)

Support Vector Machines

(16)

Support Vector Machines

• We are given a training dataset of 𝑛 instances with the format:

• The 𝑦𝑖 are either 1 or −1, each indicating the class to which the point 𝒙𝑖 belongs

• Each 𝒙𝑖 is a 𝑝-dimensional real vector

𝒙1, 𝑦1 , … , (𝒙𝑛, 𝑦𝑛)

(17)

Support Vector Machines

• We are given a training dataset of 𝑛 instances with the format:

• We want to find the "maximum-margin hyperplane" that divides the dataset of

instances 𝒙𝑖, where 𝑦𝑖 = 1 if 𝒙𝑖 belong to the category and 𝑦𝑖 = −1 if 𝒙𝑖 does not belong to the category

• The hyperplane is defined so that the distance between the hyperplane and the nearest data point from either group is maximized

𝒙1, 𝑦1 , … , (𝒙𝑛, 𝑦𝑛)

(18)

Support Vector Machines

• Any hyperplane can be written as the set of points 𝒙 satisfying

where 𝒘 is the

parameter (normal vector) of the

hyperplane

𝒘 ∙ 𝒙 − 𝑏 = 0

(19)

Support Vector Machines

• The parameter

𝑏 𝒘

determines the offset of the hyperplane

from the origin along the normal vector 𝒘

(20)

Support Vector Machines

• If the data instances are linearly separable, we can select two parallel hyperplanes (i.e., as decision boundaries) that separate the two categories of data

• The distance (i.e.,

margin) between these two hyperplanes is as large as possible

• The maximum-margin hyperplane is the one that lies halfway

between them

(21)

Support Vector Machines

• These hyperplanes can be described by the equations

and

𝒘 ∙ 𝒙 − 𝑏 = 1 𝒘 ∙ 𝒙 − 𝑏 = −1

Why 1 and -1?

(22)

Support Vector Machines

• Geometrically, the distance between these two planes is

2 𝒘

• Thus, maximizing the distance between the planes is equivalent to minimize 𝒘

(23)

Support Vector Machines

• We also need to

prevent data points from falling into the margin, we add the following constraint:

for each 𝑖 either or

• These constraints state that each data point must lie on the correct side of the margin

𝒘 ∙ 𝒙𝑖 − 𝑏 ≥ 1, if 𝑦𝑖 = 1 𝒘 ∙ 𝒙𝑖 − 𝑏 ≤ −1, if 𝑦𝑖 = −1

(24)

Support Vector Machines

• We also need to

prevent data points from falling into the margin, we add the following constraint:

for each 𝑖 either or

• The two equations can be rewritten as:

𝒘 ∙ 𝒙𝑖 − 𝑏 ≥ 1, if 𝑦𝑖 = 1

𝑦𝑖(𝒘 ∙ 𝒙𝑖 − 𝑏) ≥ 1

𝒘 ∙ 𝒙𝑖 − 𝑏 ≤ −1, if 𝑦𝑖 = −1

(25)

Support Vector Machines

• The final optimization problem becomes:

minimize 𝑤

subject to:

𝑦𝑖 𝒘 ∙ 𝒙𝑖 − 𝑏 ≥ 1 for all 1 ≤ 𝑖 ≤ 𝑛

(26)

Support Vector Machines

Relevant courses: Machine Learning and Convex Optimization

(27)

Support Vector Machines

• The final optimization problem becomes:

• After solving this opti- mization problem, given a new measurement 𝒙, we classify it as:

minimize 𝑤

subject to:

𝑦𝑖 𝒘 ∙ 𝒙𝑖 − 𝑏 ≥ 1 for all 1 ≤ 𝑖 ≤ 𝑛

𝒙 → 𝑦 = sgn 𝒘 ∙ 𝒙 − 𝑏

(28)

Support Vector Machines

• Support vectors are the data points that lie on the two

decision boundaries

• They are the most difficult to classify

• They have direct influence on the

optimum location of the decision

boundaries

(29)

Support Vector Machines

So far, we discussed our first SVM model, that is a binary linear classifier.

• How to classify data instances that are not linear spreadable?

(30)

Support Vector Machines

How about when data is not linearly separatable?

(31)

Support Vector Machines

Projecting data to a higher

dimensional space for separation

(32)

Projecting data to a higher

dimensional space for separation

Support Vector Machines

(33)

Support Vector Machines

SVMs – The kernel trick:

Project data into a higher dimensional space, then construct a hyperplane in that space so all math is the same.

(34)
(35)
(36)
(37)
(38)

(radial basis function)

(39)

(radial basis function)

𝛾 is the hyperparameter of the RBF kernel

(40)

Support Vector Machines

How to deal with non-separable data?

(i.e., when some points always fall on the wrong side of the decision boundaries even after using the kernel trick)

(41)

Support Vector Machines

• We can use the “soft”

margin (based on the slack variables 𝜉)

(42)

Support Vector Machines

• We can use the “soft”

margin (based on the slack variables 𝜉), and the problem can be rewritten as

(43)

Support Vector Machines

•The hyperparameter 𝐶 adds a penalty for each misclassified data point.

• If 𝐶 is small, the penalty for misclassified points is low, so a decision boundary with a large margin is chosen at the expense of a greater number of misclassifications.

• If 𝐶 is large, SVM tries to minimize the number of misclassified examples due to high penalty which results in a decision boundary with a smaller

margin.

(44)

Support Vector Machines

So far, we discussed our first SVM model, that is a binary linear classifier.

• How to classify data instances that are not linearly spreadable?

• How to classify categorical data with more than two categories?

(45)

Support Vector Machines

• Multi-class classification: one-versus-all (also called one-versus-rest)

Class 1

Class 2 Class 3

(46)

Support Vector Machines

• Multi-class classification: one-versus-all (also called one-versus-rest) or one-versus-one

Class 1

Class 2 Class 3

(47)

Support Vector Machines

So far, we discussed our first SVM model, that is a binary linear classifier.

• How to classify data instances that are not linearly spreadable?

• How to classify categorical data with more than two categories?

• How to deal with time?

(48)

Support Vector Machines

• Online, onboard applications: Sliding window techniques

(49)

Support Vector Machines

So far, we discussed our first SVM model, that is a binary linear classifier.

• How to classify data instances that are not linearly spreadable?

• How to classify categorical data with more than two categories?

• How to deal with time?

• How to evaluate performance?

(50)

Support Vector Machines

• Evaluation of a learning algorithm: Training and Testing separation

(51)

Support Vector Machines

• Evaluation of a learning algorithm: Cross- validation during training

Valid- ation

Valid- ation

Valid- ation Valid-

ation

(52)

Support Vector Machines

• Evaluation of a learning algorithm: Confusion Matrix

Ground truth labels

Recognized labels

(53)

Support Vector Machines

• Evaluation of a learning algorithm: Confusion Matrix (normalized)

(54)

Support Vector Machines

References

Related documents

Followings are the results of single lap spot welded joint on different load range by using different thickness specimen-, von- Mises Stress, Maximum Shear Stress ,Fatigue life,

Fig. 1 Flow chart of the IDEAL trial*Defunctioning ileostomy, CR: continuity restoration.. with the disease); initial and cumulative 6-month length of hospital stay; 6-month

Backcrossing was continued for 11 or more generations, until no visible differences were apparent between the original self-fertilized recurrent parental line and the two

The increased number of public health professionals has led to a number of practically trained persons working in public health leadership positions in the ministry, including

Gender Dynamics and Socio-Cultural Determinants of Middle East Respiratory Syndrome Coronavirus (MERS-CoV) in Saudi

A member benefit of the NLDA, CDA and participating provincial and territorial associations, the Investment Program gives dental professionals and their families exclusive access

Not being robust to a faulty mix server was the major drawback of the decryption Mix-net, It means that if a mix server refuses to decrypt its input messages , mixing op- erations

In this paper, we study about the various related work in the area of image processing in cloud environment also discuss about the need of “cloud