Unsupervised Learning Reinforcement Learning

Academic year: 2020

(1)

Lecture 08

Machine Learning

(2)

Outline

Unsupervised Learning

Reinforcement Learning

Hi Humans !! Hi Machine

(3)

Machine Learning

• Supervised (Task Driven): develop a predictive model based on both input and output data

– Classification: Decision Tree, NN, Naïve Bayes, KNN, SVM, Discriminant Analysis, Ensemble Methods, Random Forest

– Regression: Ordinary LSR, Linear Regression, Logistic Regression, MARS, LOESS

• Unsupervised (Data Driven): discover an internal representation

– Clustering: k-Means, k-Medoids, Fuzzy C-means, Hierarchical, SOM, Hidden Markov Model, Gaussian Mixture

– Dimension Reduction: PCA, LDA

• Reinforcement (algorithms learn to react to an environment)

– Decision Process, Reward System, Recommendation Systems

(4)
(5)

Tasks

Clustering: x → c (discrete ID)

Dimensionality Reduction: x → z (continuous)

(6)
(7)

Unsupervised Learning

Given a training corpus of data points

– Observed value of random variables in Bayesian network

– Series of data points

– Orbits of planets

Learn underlying pattern in the data

– Existence and conditional probability of hidden variables

– Number of classes and classification rules

(8)

Unsupervised Learning Example

• 2D state space with unclassified observations

• Learn number and form of clusters

Problem of unsupervised clustering

– Many algorithms proposed for it

– More research still being done for better algorithms, different kinds of data, …

(9)


Unsupervised Learning Algorithm

• Define a similarity measure to compare pairs of elements

• Starting with no clusters

– Pick seed element

– Group similar elements until threshold

– Pick new seed from free elements and start again
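
The seed-based procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not name a similarity measure, so Euclidean distance to the seed is assumed, and the function name `seed_clustering` and the `threshold` parameter are hypothetical.

```python
import math

def seed_clustering(points, threshold):
    """Greedy seed-based clustering (illustrative sketch).

    A free element joins the current cluster when its distance to the
    seed is at most `threshold` (assumed similarity measure).
    """
    free = list(points)            # elements not yet assigned to a cluster
    clusters = []
    while free:
        seed = free.pop(0)         # pick a seed from the free elements
        cluster = [seed]
        remaining = []
        for p in free:
            if math.dist(seed, p) <= threshold:
                cluster.append(p)  # similar enough: group with the seed
            else:
                remaining.append(p)
        free = remaining           # start again with what is left
        clusters.append(cluster)
    return clusters

points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.0, 5.5), (0.0, 0.4)]
clusters = seed_clustering(points, threshold=1.0)
```

With this input the first seed collects the three points near the origin and the second seed collects the two points near (5, 5). The result depends on the order in which seeds are picked, which is one reason many other algorithms have been proposed.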


(10)


Unsupervised Learning Algorithm

• Starting with one all-encompassing cluster

– Find cluster with highest internal dissimilarity

– Find most dissimilar pair of elements inside cluster

– Split into two clusters

– Repeat until all clusters have internal homogeneity

– Merge homogeneous clusters
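
A minimal sketch of the top-down (divisive) variant follows. Several choices here are assumptions, not part of the slide: internal dissimilarity is taken to be the largest pairwise Euclidean distance in a cluster, a cluster counts as homogeneous when that value is at most `max_diameter`, and the final merge step is left out to keep the example short.

```python
import math
from itertools import combinations

def diameter(cluster):
    """Internal dissimilarity (assumed): largest pairwise distance."""
    return max((math.dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)

def divisive_clustering(points, max_diameter):
    clusters = [list(points)]                 # one all-encompassing cluster
    while True:
        worst = max(clusters, key=diameter)   # highest internal dissimilarity
        if diameter(worst) <= max_diameter:
            return clusters                   # all clusters are homogeneous
        # Most dissimilar pair of elements inside that cluster.
        a, b = max(combinations(worst, 2), key=lambda pair: math.dist(*pair))
        # Split: each element goes with the nearer of the two.
        left = [p for p in worst if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in worst if math.dist(p, a) > math.dist(p, b)]
        clusters.remove(worst)
        clusters.extend([left, right])

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (30.0, 0.0)]
result = divisive_clustering(points, max_diameter=2.0)
```

Each split puts the two most dissimilar elements into different clusters, so both halves are non-empty and the loop terminates.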

(11)

Unsupervised Learning Evaluation

Need to evaluate fitness of relationship learned

– Number of clusters vs. their internal properties

– Difference between clusters vs. internal homogeneity

– Number of parameters vs. number of hidden variables in Bayesian network

(12)

K-Means

Popular unsupervised clustering algorithm

Data represented as cloud of points in state space

Target

– Group points in k clusters

(13)

K-Means

Start with k random cluster centers

For each iteration

– For each data point

• Associate the point with the nearest cluster center

• Add its squared distance to that center to the variance

– Move each cluster center to the center of mass of its associated data points

End when

• Variance less than threshold
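
The loop above might look as follows in plain Python. This is a sketch, not a reference implementation. Two details are assumptions added for the example: the optional `centers` argument (used here so the run is reproducible instead of starting from random centers) and the extra stopping rule that ends the loop once the variance stops decreasing, since an absolute threshold alone may never be reached.

```python
import math
import random

def k_means(points, k, threshold=1e-9, max_iter=100, centers=None, seed=0):
    if centers is None:
        # Start with k random cluster centers (k distinct data points).
        centers = random.Random(seed).sample(points, k)
    centers = list(centers)
    prev_variance = math.inf
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        variance = 0.0
        for p in points:
            # Associate the point with the nearest cluster center.
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
            variance += math.dist(p, centers[j]) ** 2
        # Move each center to the center of mass of its points.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # End when the variance is small enough or no longer improves.
        if variance < threshold or variance >= prev_variance:
            break
        prev_variance = variance
    return centers, clusters, variance

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters, variance = k_means(
    points, k=2, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Here the two groups are well separated, so the assignment is stable after the first pass and the centers settle at the mean of each group.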

(14)

K-Means

We have:

– Data points: $x_1, \dots, x_i, \dots, x_n$

– Clusters: $C_1, \dots, C_j, \dots, C_k$

– Cluster centers: $\mu_1, \dots, \mu_j, \dots, \mu_k$

Minimize intra-cluster variance
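
The formula for the objective did not survive on this slide. A standard way to write the intra-cluster variance with the symbols defined above (an assumption about the intended form, with V as a hypothetical name for the objective) is:

```latex
V = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

Each data point contributes its squared distance to the center of the cluster it is assigned to, which is the quantity accumulated in the "add to variance" step of the algorithm.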

(15)
(16)

• Collecting and labeling a large training set can be very expensive.

• Be able to find features which are helpful for categorization.

• Gain insight into the natural structure of the data.

(17)

There are many other unsupervised learning methods.

Examples:

◦ Competitive Learning

◦ Kohonen’s Neural Networks: Self-Organizing Maps

◦ Principal Component Analysis, Autoassociation

(18)
(19)

Reinforcement Learning

• Given a set of possible actions, the resulting state of the environment, and rewards or punishments for each state

– Taxi driver: tips, car repair costs, tickets

– Checkers: advantage in number of pieces

• Learn to maximize the rewards and/or minimize the punishments

– Maximize tips, minimize damage to the car and police tickets: drive properly

(20)

• Learning by trial and error

• Try something, see the result

– Speeding results in tickets, going through a red light results in car damage, quick and safe drive results in tips

– Checkers pieces in the center of the board are soon lost, pieces on the side are kept longer, sacrifice some pieces to take a greater number of enemy pieces

• Sacrifice known rewarding actions to explore new, potentially more rewarding actions

• Develop strategies to maximize rewards while minimizing penalties over the long-term

(21)

Q-Learning

Each state has

– A reward or punishment

– A list of possible actions, which lead to other states

Learn value of state-action pairs

(22)

Q-Learning

• Update value of previous (t-1) state-action pair based on current (t) state-action value

• $\Delta Q(s_{t-1}, a_{t-1}) = \eta \left[ R_{t-1} + \gamma \max_{a} Q(s_t, a) - Q(s_{t-1}, a_{t-1}) \right]$

– $Q(s, a)$: estimated value of state-action pair $(s, a)$

– $R_t$: reward of state $s_t$

– $\eta$: learning rate

– $\gamma$: discount factor of future rewards

• $\gamma = 0$: future rewards are irrelevant; $\gamma = 1$: future rewards count the same as current rewards
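
As a rough illustration of the update rule, the sketch below applies one Q-learning step to a table stored in a dictionary. The table layout, the function name `q_update`, and the state and action labels are all hypothetical choices for this example.

```python
from collections import defaultdict

def q_update(Q, prev_state, prev_action, reward, state, actions,
             eta=0.1, gamma=0.9):
    """Apply one update to Q(prev_state, prev_action) and return the change.

    reward  : reward of the previous state
    actions : actions available in the current state
    eta     : learning rate, gamma : discount factor
    """
    # Best value reachable from the current state (0.0 if no actions).
    best_next = max((Q[(state, a)] for a in actions), default=0.0)
    delta = eta * (reward + gamma * best_next - Q[(prev_state, prev_action)])
    Q[(prev_state, prev_action)] += delta
    return delta

Q = defaultdict(float)            # unseen state-action pairs start at 0
Q[("s1", "right")] = 2.0
delta = q_update(Q, "s0", "right", reward=1.0, state="s1",
                 actions=["left", "right"], eta=0.5, gamma=0.9)
```

With these numbers the change is 0.5 × (1.0 + 0.9 × 2.0 − 0) = 1.4, so the value of ("s0", "right") moves from 0 to 1.4.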

(23)

Exploration Function

If agent always does action with max Q(s,a), it always evaluates the same state-action pairs

Need exploration function

– Trade-off greed vs. curiosity

– Try rarely-explored low-payoff actions instead of well-known high-payoff actions

(24)

Exploration Function

• Define:

– $Q(s, a)$: estimated value of $(s, a)$

– $N(s, a)$: number of times $(s, a)$ has been tried

– $R_{max}$: maximum possible value of $Q(s, a)$

– $N_{min}$: minimum number of times we want the agent to try $(s, a)$

• $f(Q(s, a), N(s, a)) = \begin{cases} R_{max} & \text{if } N(s, a) < N_{min} \\ Q(s, a) & \text{otherwise} \end{cases}$

• Agent picks action with maximum $f(\cdot)$ value
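
The exploration function translates almost directly into code. In the sketch below the dictionaries `Q` and `N`, keyed by (state, action), and the helper name `pick_action` are assumptions made for the example.

```python
def exploration_value(q, n, r_max, n_min):
    """f(Q, N): optimistic value until the pair has been tried n_min times."""
    return r_max if n < n_min else q

def pick_action(state, actions, Q, N, r_max, n_min):
    """Pick the action with the maximum f(.) value in the given state."""
    return max(
        actions,
        key=lambda a: exploration_value(
            Q.get((state, a), 0.0), N.get((state, a), 0), r_max, n_min),
    )

Q = {("s", "a"): 5.0, ("s", "b"): 1.0}
N = {("s", "a"): 10, ("s", "b"): 1}
first = pick_action("s", ["a", "b"], Q, N, r_max=10.0, n_min=3)

N[("s", "b")] = 3                 # "b" has now been tried often enough
second = pick_action("s", ["a", "b"], Q, N, r_max=10.0, n_min=3)
```

The rarely tried action "b" is chosen first because it is scored as `r_max`. Once it has been tried `n_min` times, its real estimate is used and the agent goes back to "a".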

(25)

Limits of RL

Search

– Number of state-action pairs can be very large

– Intermediate rewards can be noisy

Real-world search

– Initial policy can have very poor reward

(26)

Policy

Learn the optimal policy in decision network

$\pi: S \to A$

$EU(\pi) = \sum_{t=0}^{\infty} \gamma^t R_t$

Greedy search
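
To make the expected-utility sum concrete, the short sketch below evaluates the discounted sum of rewards for a finite reward sequence. Truncating the sum to the observed rewards and the function name `discounted_return` are assumptions of the example.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * R_t over the observed rewards R_0, R_1, ..."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

value = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

A policy that earns the same reward at every step is worth less the further in the future those rewards lie, unless the discount factor is 1.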

(27)

Helicopter Flight Control

Sustained stable inverted flight

– Very difficult for humans

(28)

Helicopter Flight Control

Collect flight data with human pilot

Learn model of helicopter dynamics

– Stochastic and nonlinear

– Supervised learning

Learn policy for helicopter controller

(29)

Helicopter Dynamics

States

– Position, orientation, velocity, angular velocity

– 12 variables

391 seconds of flight data

– Time step 0.1s

– 3910 triplets $(s_t, a_t, s_{t+1})$

Learn probability distribution $P(s_{t+1} \mid s_t, a_t)$

(30)

Helicopter Controller

• Problem definition

– $S$: set of possible states

– $s_0$: initial state ($s_0 \in S$)

– $A$: set of possible actions

– $P(S \mid S, A)$: state transition probabilities

– $\gamma$: discount factor

– $R$: reward function mapping states to values

• At state $s_t$, controller picks action $a_t$, system

(31)

Helicopter Controller

Reward function

– Punish deviation from desired helicopter position and velocity

– $R \in [-\infty, 0]$

Policy learning

– Reinforcement learning

– $EU(\pi) = \sum_{t=0}^{\infty} \gamma^t R_t$

Problem

– Stochastic state transitions

(32)

PEGASUS algorithm

• Predefined series of random numbers

– Length of the series is a function of the complexity of the policy

• Use the same series to test all policies

– At time t, each policy encounters the same random event

• Simulate stochastic environment

– Environment stochastic from point of view of agent

– Environment deterministic from our point of view
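
The core idea, testing every policy against the same predefined random numbers, can be illustrated on a toy problem. The sketch below is not the PEGASUS algorithm itself and has nothing to do with the helicopter model: the 1-D system, the noise range, the two policies, and all names are hypothetical.

```python
import random

def make_noise(length, seed=0):
    """Predefined series of random numbers, drawn once and then reused."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(length)]

def evaluate(policy, noise, gamma=0.95, start=3.0):
    """Discounted return of a policy on a toy 1-D system.

    The state moves by action + noise; the reward punishes distance
    from 0, so it is never positive.
    """
    state, total = start, 0.0
    for t, eps in enumerate(noise):
        state = state + policy(state) + eps   # same eps at time t for all policies
        total += gamma ** t * -abs(state)
    return total

def go_to_zero(state):
    return max(-1.0, min(1.0, -state))        # bounded step towards the origin

def do_nothing(state):
    return 0.0

noise = make_noise(50)
scores = {p.__name__: evaluate(p, noise) for p in (go_to_zero, do_nothing)}
```

Because the noise series is fixed, `evaluate` is a deterministic function of the policy: evaluating the same policy twice gives the same score, and differences between scores come from the policies rather than from different random draws.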

(33)

Robot control

Elevator scheduling (search for patterns)

Telecommunications (finding networks)

Games (Chess, Backgammon)

Financial trading

(34)

Summary of Learning

                 Supervised                Unsupervised      Reinforcement
Training data    Data and correct output   Data              States, actions, and rewards
Learning target  Data-output relationship  Patterns in data  Policy
Evaluation       Statistics                Fitness           Reward value

Typical

(35)

1. What is unsupervised learning? Explain with an example. Why is it important?

2. What is k-means clustering? How does it work?

3. What is reinforcement learning?

4. Explain Q-learning in reinforcement learning.

5. Describe the exploration function and policy in reinforcement learning.

(36)
