Lecture 08
Machine Learning
Outline
• Unsupervised Learning
• Reinforcement Learning
Machine Learning
• Supervised (Task Driven): develop a predictive model based on both input and output data
– Classification: Decision Tree, NN, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest
– Regression: Ordinary LSR, Linear Regression, Logistic Regression, MARS, LOESS
• Unsupervised (Data Driven)
– Clustering: k-Means, k-Medoids, Fuzzy C-means, Hierarchical, SOM, Hidden Markov Model, Gaussian Mixture
– Dimension Reduction: PCA, LDA
• Reinforcement (algorithms learn to react to an environment)
– Decision Process
– Reward System
– Recommendation Systems
Discover an internal representation
Tasks
• Clustering: x → c (discrete cluster ID)
• Dimensionality reduction: x → z (continuous)
Unsupervised Learning
• Given a training corpus of data points
– Observed values of random variables in a Bayesian network
– Series of data points
– Orbits of planets
• Learn the underlying pattern in the data
– Existence and conditional probability of hidden variables
Unsupervised Learning Example
• 2D state space with
unclassified observations
• Learn number and form of
clusters
• Problem of unsupervised
clustering
– Many algorithms proposed
for it
– More research still being done for better algorithms, different kinds of data, …
Unsupervised Learning Algorithm
• Define a similarity
measure, to compare pairs of elements
• Starting with no clusters
– Pick seed element
– Group similar elements until
threshold
– Pick new seed from free
elements and start again
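The seed-based procedure above can be sketched in Python. This is only a minimal illustration: the function name `seed_clustering`, the use of Euclidean distance as the similarity measure, the fixed `threshold`, and the rule "take the first free element as the seed" are all assumptions, not choices stated on the slide.

```python
import math

def seed_clustering(points, threshold):
    """Greedy seed-based clustering: grow a cluster around a seed, then reseed."""
    free = list(points)            # elements not yet assigned to any cluster
    clusters = []
    while free:
        seed = free.pop(0)         # assumed seed choice: first free element
        cluster = [seed]
        remaining = []
        for p in free:
            # Assumed similarity measure: Euclidean distance to the seed
            # (smaller distance means more similar).
            if math.dist(seed, p) <= threshold:
                cluster.append(p)
            else:
                remaining.append(p)
        free = remaining           # next seed comes from the free elements
        clusters.append(cluster)
    return clusters

points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (0.1, 0.4)]
clusters = seed_clustering(points, threshold=1.0)
print(clusters)
```

The result depends on the order in which seeds are picked and on the threshold, which is one reason this kind of greedy grouping is usually treated as a starting point rather than a final answer.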
Unsupervised Learning Algorithm
• Starting with one
all-encompassing cluster
– Find cluster with highest
internal dissimilarity
– Find most dissimilar pair of
elements inside cluster
– Split into two clusters
– Repeat until all clusters have
internal homogeneity
– Merge homogeneous clusters
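A rough Python sketch of the top-down (divisive) procedure follows. It assumes that "internal dissimilarity" means the largest pairwise Euclidean distance inside a cluster, that a cluster is homogeneous when this distance is at most `max_diameter`, and that a split assigns each element to the nearer of the two most dissimilar elements. The final merge step is left out. All names are hypothetical.

```python
import math
from itertools import combinations

def most_dissimilar_pair(cluster):
    """Return (distance, a, b) for the two most distant elements of a cluster."""
    if len(cluster) < 2:
        return (0.0, None, None)
    return max((math.dist(a, b), a, b) for a, b in combinations(cluster, 2))

def divisive_clustering(points, max_diameter):
    clusters = [list(points)]      # start with one all-encompassing cluster
    while True:
        scored = [most_dissimilar_pair(c) for c in clusters]
        # Cluster with the highest internal dissimilarity.
        idx = max(range(len(clusters)), key=lambda i: scored[i][0])
        diameter, a, b = scored[idx]
        if diameter <= max_diameter:
            return clusters        # every cluster is homogeneous enough
        target = clusters.pop(idx)
        # Split around the most dissimilar pair (a, b).
        left = [p for p in target if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in target if math.dist(p, a) > math.dist(p, b)]
        clusters += [left, right]

points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (0.1, 0.4)]
clusters = divisive_clustering(points, max_diameter=1.0)
print(clusters)
```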
Unsupervised Learning Evaluation
• Need to evaluate fitness of relationship learned
– Number of clusters vs. their internal properties
– Difference between clusters vs. internal homogeneity
– Number of parameters vs. number of hidden variables in Bayesian network
• No way of knowing what is the optimal
K-Means
• Popular unsupervised clustering algorithm
• Data represented as cloud of points in state space
• Target
– Group points in k clusters
K-Means
• Start with k random cluster centers
• For each iteration
– For each data point
• Associate the point to the nearest cluster center
• Add to variance
– Move each cluster center to the center of mass of
associated data point cloud
– End when
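The iteration above can be sketched in plain Python. The stopping rule used here (stop when no point changes cluster, or after `max_iters` iterations) is an assumption, since the slide's own condition is not shown. The function name `kmeans`, the `seed` parameter, and the handling of empty clusters are likewise illustrative choices.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # start with k random cluster centers
    assignment = None
    for _ in range(max_iters):
        # Associate each point with its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:   # assumed stopping rule: nothing changed
            break
        assignment = new_assignment
        # Move each center to the center of mass of its associated points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Intra-cluster variance: sum of squared distances to the assigned center.
    variance = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))
    return centers, assignment, variance

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment, variance = kmeans(points, k=2)
print(centers, assignment, variance)
```

K-means only finds a local minimum of the variance, so in practice it is common to run it from several random starts and keep the best result.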
K-Means
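• We have:
– Data points: $x_1, \dots, x_i, \dots, x_n$
– Clusters: $C_1, \dots, C_j, \dots, C_k$
– Cluster centers: $\mu_1, \dots, \mu_j, \dots, \mu_k$
• Minimize intra-cluster variance
– $V = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$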
• Collecting and labeling a large training set can be
very expensive.
• Be able to find features which are helpful for
categorization.
• Gain insight into the natural structure of the data.
There are a lot of other Unsupervised
Learning Methods.
Examples:
◦ Competitive Learning
◦ Kohonen’s Neural Networks: Self-Organizing Maps
◦ Principal Component Analysis, Autoassociation
Reinforcement Learning
• Given a set of possible actions, the resulting state of the environment, and rewards or punishment for each state
– Taxi driver: tips, car repair costs, tickets
– Checkers: advantage in number of pieces
• Learn to maximize the rewards and/or minimize the
punishments
– Maximize tip, minimize damage to car and police tickets:
drive properly
• Learning by trial and error
• Try something, see the result
– Speeding results in tickets, going through a red light
results in car damage, quick and safe drive results in tips
– Checkers pieces in the center of the board are soon lost,
pieces on the side are kept longer, sacrifice some pieces to take a greater number of enemy pieces
• Sacrifice known rewarding actions to explore new,
potentially more rewarding actions
• Develop strategies to maximize rewards while
minimizing penalties over the long-term
Q-Learning
• Each state has
– A reward or punishment
– A list of possible actions, which lead to other states
• Learn value of state-action pairs
Q-Learning
• Update value of previous (t-1) state-action pair based on current (t) state-action value
• $Q(s_{t-1}, a_{t-1}) \leftarrow Q(s_{t-1}, a_{t-1}) + \alpha \left[ R_{t-1} + \gamma \max_{a} Q(s_t, a) - Q(s_{t-1}, a_{t-1}) \right]$
– $Q(s, a)$: estimated value of state-action pair $(s, a)$
– $R_t$: reward of state $s_t$
– $\alpha$: learning rate
– $\gamma$: discount factor of future rewards
• $\gamma = 0$ (future rewards are irrelevant), $\gamma = 1$ (future rewards are the same as current rewards)
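A single Q-learning update can be sketched in Python. The table-based representation (`defaultdict` keyed by state-action pairs), the function name `q_update`, and the example states, actions, and numbers are assumptions for illustration. `alpha` is the learning rate and `gamma` the discount factor.

```python
from collections import defaultdict

def q_update(Q, s_prev, a_prev, reward, s_curr, actions, alpha=0.1, gamma=0.9):
    """Update Q(s_prev, a_prev) from the best value available in s_curr."""
    best_next = max(Q[(s_curr, a)] for a in actions)
    # Difference between the new estimate and the old one.
    td_error = reward + gamma * best_next - Q[(s_prev, a_prev)]
    Q[(s_prev, a_prev)] += alpha * td_error
    return Q[(s_prev, a_prev)]

Q = defaultdict(float)              # unseen state-action pairs start at 0
actions = ["left", "right"]
Q[("B", "right")] = 10.0            # pretend state B is already known to be good

# The agent took "right" in state A, received reward 1, and arrived in B.
new_value = q_update(Q, "A", "right", reward=1.0, s_curr="B",
                     actions=actions, alpha=0.5, gamma=0.9)
print(new_value)                    # 0 + 0.5 * (1 + 0.9 * 10 - 0) = 5.0
```

With `gamma=0` the same call would move the estimate toward the immediate reward only, ignoring how good state B is.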
Exploration Function
• If agent always does action with max Q(s,a), it always evaluates the same state-action pairs
• Need exploration function
– Trade-off greed vs. curiosity
– Try rarely-explored low-payoff actions instead of well-known high-payoff actions
Exploration Function
• Define:
– Q(s,a): estimated value of (s,a)
– N(s,a): number of times (s,a) has been tried
– Rmax: maximum possible value of Q(s,a)
– Nmin: minimum number of times we want the agent to try (s,a)
• f( Q(s,a), N(s,a) ) =
– Rmax if N(s,a) < Nmin
– Q(s,a) otherwise
• Agent picks action with maximum f(.) value
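The exploration function translates almost directly into code. In this sketch the dictionaries `Q` and `N`, the function names, and the example values are hypothetical. Pairs that have never been seen are assumed to have value 0 and count 0.

```python
def exploration_value(q, n, r_max, n_min):
    """f(Q, N): optimistic value until the pair has been tried n_min times."""
    return r_max if n < n_min else q

def pick_action(Q, N, state, actions, r_max, n_min):
    """Pick the action with the maximum f(.) value in the given state."""
    return max(
        actions,
        key=lambda a: exploration_value(
            Q.get((state, a), 0.0), N.get((state, a), 0), r_max, n_min
        ),
    )

Q = {("s", "safe"): 5.0, ("s", "risky"): 1.0}
N = {("s", "safe"): 10, ("s", "risky"): 1}
actions = ["safe", "risky"]

first_choice = pick_action(Q, N, "s", actions, r_max=10.0, n_min=3)
N[("s", "risky")] = 3               # "risky" has now been tried often enough
later_choice = pick_action(Q, N, "s", actions, r_max=10.0, n_min=3)
print(first_choice, later_choice)   # risky safe
```

Once every action in a state has been tried Nmin times, the agent falls back to purely greedy behaviour in that state.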
Limits of RL
• Search
– Number of state-action pairs can be very large
– Intermediate rewards can be noisy
• Real-world search
– Initial policy can have very poor reward
– Necessary exploration of suboptimal actions can
Policy
• Learn the optimal policy in decision network
• $\pi: S \to A$
• $EU(\pi) = \sum_{t=0}^{\infty} \gamma^t R_t$
• Greedy search
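As a small illustration, a policy can be stored as a mapping from states to actions, and the utility of one observed run is the discounted sum of its rewards. The sketch below assumes a finite reward sequence (a truncation of the sum) and a hypothetical Q table from which a greedy policy is read off. None of the names or numbers come from the slides.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * R_t over a finite sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def greedy_policy(Q, states, actions):
    """Policy mapping each state to the action with the highest Q value."""
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}

total = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
print(total)                        # 1 + 0.5 + 0.25 = 1.75

Q = {
    ("A", "left"): 0.2, ("A", "right"): 0.8,
    ("B", "left"): 0.6, ("B", "right"): 0.1,
}
policy = greedy_policy(Q, states=["A", "B"], actions=["left", "right"])
print(policy)                       # {'A': 'right', 'B': 'left'}
```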
Helicopter Flight Control
• Sustained stable inverted flight
– Very difficult for humans
– First AI able to
Helicopter Flight Control
• Collect flight data with human pilot
• Learn model of helicopter dynamics
– Stochastic and nonlinear
– Supervised learning
• Learn policy for helicopter controller
Helicopter Dynamics
• States
– Position, orientation, velocity, angular velocity
– 12 variables
• 391 seconds of flight data
– Time step 0.1s
– 3910 triplets $(s_t, a_t, s_{t+1})$
• Learn probability distribution $P(s_{t+1} \mid s_t, a_t)$
Helicopter Controller
• Problem definition
– S: set of possible states
– $s_0$: initial state ($s_0 \in S$)
– A: set of possible actions
– P(S|S,A): state transition probabilities
– $\gamma$: discount factor
– R: reward function mapping states to values
• At state st, controller picks action at, system
Helicopter Controller
• Reward function
– Punish deviation from desired helicopter position and velocity
– $R \in [-\infty, 0]$
• Policy learning
– Reinforcement learning
– $EU(\pi) = \sum_{t=0}^{\infty} \gamma^t R_t$
• Problem
– Stochastic state transitions
PEGASUS algorithm
• Predefined series of random numbers
– Length of series function of complexity of policy
• Use the same series to test all policies
– At time t, each policy encounters the same random event
• Simulate stochastic environment
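The core idea, testing every policy against the same pre-drawn random numbers, can be shown with a toy example. This is not the PEGASUS algorithm as published, only a sketch of the fixed-randomness trick. The one-dimensional "hover" task, the two policies, the noise level, and all names are invented for illustration.

```python
import random

def simulate(policy, noise, gamma=0.95):
    """Roll out a policy in a toy 1-D hover task using pre-drawn noise."""
    position, total = 1.0, 0.0
    for t, eps in enumerate(noise):
        action = policy(position)
        position = position + action + eps        # stochastic transition driven by eps
        total += (gamma ** t) * -abs(position)    # reward <= 0: punish deviation from 0
    return total

def evaluate(policy, scenarios):
    """Average discounted reward over a fixed set of noise sequences."""
    return sum(simulate(policy, noise) for noise in scenarios) / len(scenarios)

# Predefined series of random numbers, drawn once and reused for every policy.
rng = random.Random(42)
scenarios = [[rng.gauss(0.0, 0.1) for _ in range(50)] for _ in range(20)]

def cautious(x):
    return -0.5 * x     # steer halfway back toward the target each step

def lazy(x):
    return 0.0          # never correct the position

score_cautious = evaluate(cautious, scenarios)
score_lazy = evaluate(lazy, scenarios)
print(score_cautious, score_lazy)
```

Because both policies face the same random events at each time step, the difference in scores reflects the policies rather than luck, and evaluating the same policy twice gives the same number.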
Robot control
Elevator scheduling (search for patterns)
Telecommunications (finding networks)
Games (Chess, Backgammon)
Financial trading
Summary of Learning
                 Supervised                 Unsupervised        Reinforcement
Training data    Data and correct output    Data                States, actions, and rewards
Learning target  Data-output relationship   Patterns in data    Policy
Evaluation       Statistics                 Fitness             Reward value
Typical
1. What is unsupervised learning? Explain with an example. Why is it important?
2. What is k-means clustering? How does it work?
3. What is reinforcement learning?
4. Explain Q-learning in reinforcement learning.
5. Describe the exploration function and policy in reinforcement learning.