
Unsupervised Learning Reinforcement Learning


Academic year: 2020


(1)

Lecture 08

Machine Learning

(2)

Outline

Unsupervised Learning

Reinforcement Learning

Hi Humans !! Hi Machine

(3)

Machine Learning

Supervised (Task Driven): develop a predictive model based on both input and output data

– Classification: Decision Tree, NN, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest

– Regression: Ordinary LSR, Linear Regression, Logistic Regression, MARS, LOESS

Unsupervised (Data Driven): discover an internal representation

– Clustering: k-Means, k-Medoids, Fuzzy C-means, Hierarchical, SOM, Hidden Markov Model, Gaussian Mixture

– Dimension Reduction: PCA, LDA

Reinforcement (algorithms learn to react to an environment)

– Decision Process

– Reward System

– Recommendation Systems

(4)
(5)

Tasks

Clustering: x → c (discrete ID)

Dimensionality Reduction: x → z (continuous)

(6)
(7)

Unsupervised Learning

Given a training corpus of data points

– Observed value of random variables in Bayesian network

– Series of data points

– Orbits of planets

Learn underlying pattern in the data

– Existence and conditional probability of hidden variables

(8)

Unsupervised Learning Example

2D state space with unclassified observations

Learn number and form of clusters

Problem of unsupervised clustering

– Many algorithms proposed for it

– More research still being done for better algorithms, different kinds of data, …

(9)


Unsupervised Learning Algorithm

Define a similarity measure to compare pairs of elements

Starting with no clusters

– Pick seed element

– Group similar elements until threshold

– Pick new seed from free elements and start again
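
The seed-based procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the similarity measure is taken to be Euclidean distance to the seed, the seed is simply the first free element, and the names `seed_clustering` and `threshold` are hypothetical, not taken from the slides.

```python
import math

def seed_clustering(points, threshold):
    """Greedy seed-based clustering: grow a cluster around a seed, then reseed."""
    free = list(points)      # elements not yet assigned to any cluster
    clusters = []            # start with no clusters
    while free:
        seed = free.pop(0)   # pick a seed (assumption: first free element)
        cluster = [seed]
        remaining = []
        for p in free:
            # similarity measure (assumption): Euclidean distance to the seed
            if math.dist(seed, p) <= threshold:
                cluster.append(p)
            else:
                remaining.append(p)
        free = remaining     # the next seed is picked from the free elements
        clusters.append(cluster)
    return clusters

points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.2, 4.9), (0.1, 0.4)]
clusters = seed_clustering(points, threshold=1.0)
print(clusters)
```

With these five points and a threshold of 1.0, the three points near the origin end up in one cluster and the two points near (5, 5) in another. The result depends on the seed order and the threshold, which is one reason many variants of this idea exist.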


(10)


Unsupervised Learning Algorithm

Starting with one all-encompassing cluster

– Find cluster with highest internal dissimilarity

– Find most dissimilar pair of elements inside cluster

– Split into two clusters

– Repeat until all clusters have internal homogeneity

Merge homogeneous clusters
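
A rough sketch of this top-down procedure follows. The slides do not define "internal dissimilarity" or "homogeneity", so the code assumes the largest pairwise Euclidean distance in a cluster (its diameter) as the dissimilarity, and treats a cluster as homogeneous when that diameter is at most a hypothetical `max_diameter` parameter. The merge step is interpreted as joining two clusters whose union is still homogeneous.

```python
import math
from itertools import combinations

def diameter(cluster):
    """Internal dissimilarity (assumption): largest pairwise distance."""
    return max((math.dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)

def divisive_clustering(points, max_diameter):
    clusters = [list(points)]                    # one all-encompassing cluster
    while True:
        worst = max(clusters, key=diameter)      # highest internal dissimilarity
        if diameter(worst) <= max_diameter:      # every cluster is homogeneous
            break
        # most dissimilar pair of elements inside the cluster
        a, b = max(combinations(worst, 2), key=lambda pair: math.dist(*pair))
        # split: each element joins whichever of the two it is closer to
        near_a = [p for p in worst if math.dist(p, a) <= math.dist(p, b)]
        near_b = [p for p in worst if math.dist(p, a) > math.dist(p, b)]
        clusters.remove(worst)
        clusters += [near_a, near_b]
    # merge clusters whose union would still be homogeneous
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if diameter(c1 + c2) <= max_diameter:
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 + c2)
                merged = True
                break
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
clusters = divisive_clustering(points, max_diameter=2.0)
print(clusters)
```

On this small example the single starting cluster is split once, into the two points near the origin and the two points near (10, 10), and no merge is needed.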

(11)

Unsupervised Learning Evaluation

Need to evaluate fitness of relationship learned

– Number of clusters vs. their internal properties

– Difference between clusters vs. internal homogeneity

– Number of parameters vs. number of hidden variables in Bayesian network

No way of knowing what is the optimal

(12)

K-Means

Popular unsupervised clustering algorithm

Data represented as cloud of points in state space

Target

– Group points in k clusters

(13)

K-Means

Start with k random cluster centers

For each iteration

For each data point

Associate the point to the nearest cluster center

Add to variance

Move each cluster center to the center of mass of associated data point cloud

End when

(14)

K-Means

We have:

– Data points: x1, …, xi, …, xn

– Clusters: C1, …, Cj, …, Ck

– Cluster centers: μ1, …, μj, …, μk

Minimize intra-cluster variance
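
The K-Means loop described on the previous slides can be sketched in plain Python as below. Several details are assumptions rather than something the slides state: the initial centers are k randomly sampled data points, the loop stops when the assignments no longer change (or after a hypothetical `max_iters` limit), and the intra-cluster variance is computed as the sum of squared Euclidean distances from each point to its assigned center.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)     # k random data points as initial centers
    assignment = None
    for _ in range(max_iters):
        # associate each point with the nearest cluster center
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:   # assumed stopping rule: no change
            break
        assignment = new_assignment
        # move each center to the center of mass of its associated points
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # intra-cluster variance: sum of squared distances to the assigned center
    variance = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))
    return centers, assignment, variance

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, assignment, variance = kmeans(points, k=2)
print(centers, assignment, variance)
```

For these two well-separated groups the first three points share one label and the last three share the other. On harder data the result depends on the random initial centers, so running the algorithm several times with different seeds is a common precaution.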

(15)
(16)

Collecting and labeling a large training set can be very expensive.

Be able to find features which are helpful for categorization.

Gain insight into the natural structure of the data.

(17)

There are a lot of other Unsupervised Learning Methods.

Examples:

◦ Competitive Learning

◦ Kohonen’s Neural Networks: Self-Organizing Maps

◦ Principal Component Analysis, Autoassociation

(18)
(19)

Reinforcement Learning

Given a set of possible actions, the resulting state of the environment, and rewards or punishment for each state

– Taxi driver: tips, car repair costs, tickets

– Checkers: advantage in number of pieces

Learn to maximize the rewards and/or minimize the punishments

– Maximize tip, minimize damage to car and police tickets: drive properly

(20)

Learning by trial and error

Try something, see the result

– Speeding results in tickets, going through a red light results in car damage, quick and safe drive results in tips

– Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer, sacrifice some pieces to take a greater number of enemy pieces

Sacrifice known rewarding actions to explore new, potentially more rewarding actions

Develop strategies to maximize rewards while minimizing penalties over the long-term

(21)

Q-Learning

Each state has

– A reward or punishment

– A list of possible actions, which lead to other states

Learn value of state-action pairs

(22)

Q-Learning

Update value of previous (t-1) state-action pair based on current (t) state-action value

$$Q(s_{t-1}, a_{t-1}) \leftarrow Q(s_{t-1}, a_{t-1}) + \alpha \left[ R_{t-1} + \gamma \max_{a} Q(s_t, a) - Q(s_{t-1}, a_{t-1}) \right]$$

– Q(s,a): estimated value of state-action pair (s,a)

– Rt: reward of state st

– α: learning rate

– γ: discount factor of future rewards

• 0 (future rewards are irrelevant), 1 (future rewards are the same as current rewards)
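
As a small illustration, the update can be written as a function over a table of state-action values. This is a sketch, not a full agent: the table is a `defaultdict` keyed by (state, action), and the function name `q_update`, the state and action labels, and the default values of the learning rate and discount factor are all hypothetical choices.

```python
from collections import defaultdict

def q_update(Q, s_prev, a_prev, reward, s_curr, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the previous state-action pair."""
    # best estimated value reachable from the current state
    best_next = max(Q[(s_curr, a)] for a in actions)
    # difference between the new target and the old estimate
    td_error = reward + gamma * best_next - Q[(s_prev, a_prev)]
    Q[(s_prev, a_prev)] += alpha * td_error
    return Q[(s_prev, a_prev)]

Q = defaultdict(float)            # unseen pairs start at 0.0 (assumption)
actions = ["left", "right"]
Q[("s1", "right")] = 10.0         # pretend this value was learned earlier
new_value = q_update(Q, "s0", "right", reward=1.0, s_curr="s1", actions=actions)
print(new_value)
```

Here the old estimate is 0, the reward is 1 and the best next value is 10, so the new estimate is 0 + 0.5 × (1 + 0.9 × 10 − 0) = 5.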

(23)

Exploration Function

If agent always does action with max Q(s,a), it always evaluates the same state-action pairs

Need exploration function

– Trade-off greed vs. curiosity

– Try rarely-explored low-payoff actions instead of well-known high-payoff actions

(24)

Exploration Function

Define:

– Q(s,a): estimated value of (s,a)

– N(s,a): number of times (s,a) has been tried

– Rmax: maximum possible value of Q(s,a)

– Nmin: minimum number of times we want the agent to try (s,a)

$$f(Q(s,a), N(s,a)) = \begin{cases} R_{max} & \text{if } N(s,a) < N_{min} \\ Q(s,a) & \text{otherwise} \end{cases}$$

Agent picks action with maximum f(.) value
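
The exploration function translates almost directly into code. In this sketch the tables `Q` and `N` are plain dictionaries keyed by (state, action), untried pairs default to a value of 0.0 and a count of 0, and the helper names `exploration_value` and `pick_action` are hypothetical.

```python
def exploration_value(q, n, r_max, n_min):
    """f(Q, N): optimistic value until the pair has been tried n_min times."""
    return r_max if n < n_min else q

def pick_action(state, actions, Q, N, r_max, n_min):
    """Pick the action with the maximum f(.) value in the given state."""
    return max(
        actions,
        key=lambda a: exploration_value(
            Q.get((state, a), 0.0), N.get((state, a), 0), r_max, n_min
        ),
    )

Q = {("s", "a"): 5.0, ("s", "b"): 1.0}
N = {("s", "a"): 10, ("s", "b"): 1}
first_choice = pick_action("s", ["a", "b"], Q, N, r_max=100.0, n_min=3)

N[("s", "b")] = 3                  # "b" has now been tried often enough
second_choice = pick_action("s", ["a", "b"], Q, N, r_max=100.0, n_min=3)
print(first_choice, second_choice)
```

While "b" has been tried fewer than Nmin times it is valued at Rmax and gets picked despite its low estimate; once it reaches Nmin tries the agent falls back to the action with the higher Q value, "a".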

(25)

Limits of RL

Search

Number of state-action pairs can be very large

Intermediate rewards can be noisy

Real-world search

Initial policy can have very poor reward

Necessary exploration of suboptimal actions can

(26)

Policy

Learn the optimal policy in decision network

$$\pi : S \rightarrow A$$

$$EU(\pi) = \sum_{t=0}^{\infty} \gamma^{t} R_{t}$$

Greedy search
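
To make the discounted sum of rewards concrete, the sketch below evaluates it for a finite list of rewards. Truncating the sum to the observed rewards is an assumption made for illustration, and the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * R_t over the observed rewards (finite horizon)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

value = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
print(value)
```

With three rewards of 1 and a discount factor of 0.5 the sum is 1 + 0.5 + 0.25 = 1.75. A discount factor of 0 would count only the first reward, and a factor of 1 would count all rewards equally.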

(27)

Helicopter Flight Control

Sustained stable inverted flight

Very difficult for humans

First AI able to

(28)

Helicopter Flight Control

Collect flight data with human pilot

Learn model of helicopter dynamics

Stochastic and nonlinear

Supervised learning

Learn policy for helicopter controller

(29)

Helicopter Dynamics

States

Position, orientation, velocity, angular velocity

12 variables

391 seconds of flight data

Time step 0.1s

3910 triplets $(s_t, a_t, s_{t+1})$

Learn probability distribution $P(s_{t+1} \mid s_t, a_t)$

(30)

Helicopter Controller

Problem definition

– S: set of possible states

– s0: initial state (s0 ∈ S)

– A: set of possible actions

– P(S|S,A): state transition probabilities

– γ: discount factor

– R: reward function mapping states to values

• At state st, controller picks action at, system

(31)

Helicopter Controller

Reward function

– Punish deviation from desired helicopter position and velocity

– R ∈ [-∞, 0]

Policy learning

– Reinforcement learning

$$EU(\pi) = \sum_{t=0}^{\infty} \gamma^{t} R_{t}$$

Problem

Stochastic state transitions

(32)

PEGASUS algorithm

Predefined series of random numbers

Length of series function of complexity of policy

Use the same series to test all policies

At time t, each policy encounters the same random event

Simulate stochastic environment
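
The central idea listed above, reusing one predefined series of random numbers so that every policy meets the same random events, can be illustrated with a toy example. This is not the helicopter setup and not a full implementation of PEGASUS: the one-dimensional dynamics, the linear policy with a single gain, the noise scale, and all names here are hypothetical choices made only to show the fixed-noise evaluation.

```python
import random

def make_noise(length, seed=0):
    """Predefined series of random numbers, drawn once and then reused."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def evaluate(policy_gain, noise, gamma=0.95):
    """Discounted reward of a toy 1D system driven by the fixed noise series."""
    s = 1.0                                   # initial state (hypothetical)
    total = 0.0
    for t, eps in enumerate(noise):
        a = -policy_gain * s                  # hypothetical linear policy
        s = s + a + 0.1 * eps                 # toy stochastic transition
        total += (gamma ** t) * -(s ** 2)     # reward punishes deviation from 0
    return total

noise = make_noise(50)                        # same series for every policy
scores = {gain: evaluate(gain, noise) for gain in (0.1, 0.5, 0.9)}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the noise series is fixed, evaluating the same policy twice gives exactly the same score, so differences between scores come from the policies rather than from different random draws.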

(33)

Robot control

Elevator scheduling (search for patterns)

Telecommunications (finding networks)

Games (Chess, Backgammon)

Financial trading

(34)

Summary of Learning

                  Supervised                 Unsupervised       Reinforcement

Training data     Data and correct output    Data               States, actions, and rewards

Learning target   Data-output relationship   Patterns in data   Policy

Evaluation        Statistics                 Fitness            Reward value

Typical

(35)

1. What is unsupervised learning? Explain with an example. Why is it important?

2. What is k-means clustering? How does it work?

3. What is reinforcement learning?

4. Explain Q-learning in reinforcement learning.

5. Describe the exploration function and policy in reinforcement learning.

(36)
