Lecture 17.pdf

(1)

Reasoning Under Uncertainty

(2)

Announcements

● P3 up, due March 18

○ Will be officially assigned Monday

○ Teams due TODAY (same as before, see Moodle)

● WHW1 Assigned

○ Due March 11th

○ Easier than future written homeworks, largely to help you study for midterm

(3)

Today:

● RL Wrap up

● Probability

(4)

RL:

Given an MDP, we have three methods:

Direct Utility Estimation

(5)

RL

Learning can be Passive or Active.

Passive: Fixed, given policy. Goal is to learn the world.

Active: No given policy.

(6)

RL:

In Active RL, a key issue:

Exploitation vs Exploration

(7)

RL:

To encourage exploration:

Occasionally make random moves (1/t - Greedy)
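One reading of "1/t - Greedy" is an ε-greedy rule whose exploration rate decays as ε = 1/t. That reading is an assumption, and the names `choose_action` and `q_values` are hypothetical. A minimal sketch:

```python
import random

def choose_action(q_values, t, rng=random):
    """Return an action index: random with probability 1/t, greedy otherwise.

    q_values: list of estimated action values for the current state.
    t: time step, starting at 1 (assumed schedule: epsilon = 1/t).
    """
    epsilon = 1.0 / t
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

At t = 1 every move is random; as t grows, random moves become rare and the agent mostly exploits what it has learned.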

(8)

RL:

Assign action-utility (Q) values instead of Utility. Q learning is TD learning on Q values.

Very fast, with a simple update step, and model-free.
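The update step can be sketched as the standard tabular Q-learning rule. The learning rate `alpha`, discount `gamma`, and the state and action names are illustrative assumptions, not values from the lecture:

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one TD update to Q[(s, a)] using only the observed transition.

    No transition model is consulted, which is what makes this model-free.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = reward + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

# Hypothetical two-action world; unseen (state, action) pairs default to 0.
Q = defaultdict(float)
q_update(Q, "s0", "right", 1.0, "s1", ["left", "right"])
```

Each update touches a single table entry, so its cost is one max over the available actions.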

(9)

RL:

(10)

Tip of the Iceberg

Policy Search

Basic idea: Define a parameterization of Q.

Now learn good weights.

(11)

Tip of the Tip of the Iceberg

Multi-agent Systems:

Distributed Constrained Optimization Problem

Distributed Coordination of Exploration and Exploitation

Transfer Learning

(12)

(13)

Next Up:

(14)

Roadmap:

Reasoning Under Uncertainty

● Probability

● Bayesian Networks

○ Exact Inference

○ Sampling

● Markov Models

○ Exact Inference

○ Particle Filtering

A method of modeling probabilistic connections in the world, Ch. 14

(15)

Probability: Why?

● It’s ubiquitous.

○ In the real world, many things are unknown.

● It’s powerful.

○ It greatly simplifies some analysis.

(16)

Detour: Monte Carlo Sampling

Consider a circle of radius r centered at (0, 0).

(17)

Detour: Monte Carlo Sampling

Consider an object defined by two circles, each

with radius r, centered at (-1, 0) and (1, 0).

(18)

Detour: MC Sampling

What can we do easily?

● We can measure areas of squares

● We can roll dice.

(19)

Detour: MC Sampling

Draw square encompassing object. (L = 2 + 2r)

Let in = out = 0

Repeat:

1. Sample (x, y) uniformly from square
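The remaining steps are not shown here. A plausible completion, stated as an assumption, is to count each sample as in or out of the object and scale the square's area by the fraction that landed inside. The sketch below treats the object as the union of the two circles; `estimate_area` and its parameters are hypothetical names:

```python
import random

def estimate_area(r, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the area covered by two circles of radius r
    centered at (-1, 0) and (1, 0)."""
    rng = random.Random(seed)
    side = 2 + 2 * r            # L = 2 + 2r, square centered at the origin
    half = side / 2
    inside = outside = 0
    for _ in range(n_samples):
        x = rng.uniform(-half, half)
        y = rng.uniform(-half, half)
        in_left = (x + 1) ** 2 + y ** 2 <= r ** 2
        in_right = (x - 1) ** 2 + y ** 2 <= r ** 2
        if in_left or in_right:
            inside += 1
        else:
            outside += 1
    # Fraction of samples inside, times the known area of the square.
    return side ** 2 * inside / (inside + outside)
```

For r = 0.5 the circles do not overlap, so the estimate can be compared against the exact value 2πr² = π/2. For r > 1 the circles overlap and the same code still applies, which is the appeal of sampling here.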

(20)

Detour: MC Sampling

(21)

Detour: A Brain Teaser

Alice picks 2 unique numbers x and y.

She hands them to Bob.

Bob flips a fair coin and selects one of them. Bob hands you that number.

(22)

Probability: Sample Space

Sample space: Ω

The set of all possible worlds

(23)

Probability: Events

An event is a set of possible worlds.

(24)

Example:

Roll two 6-sided dice.

Ω: (1, 1), (1, 2), (1, 3), … (2, 1), (2, 2)...

Example event:

Total = 11
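The sample space and the example event are small enough to enumerate. This sketch assumes fair dice, so every possible world is equally likely:

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (d1, d2) of two 6-sided dice.
omega = list(product(range(1, 7), repeat=2))

# Event "Total = 11": the subset of possible worlds whose faces sum to 11.
event = [w for w in omega if sum(w) == 11]

# With equally likely worlds, P(event) = |event| / |omega|.
p_event = Fraction(len(event), len(omega))
```

Here `event` contains exactly the worlds (5, 6) and (6, 5), so the event is a set of two possible worlds out of 36.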

(25)

Probability:

A probability measure P is any function of a set S of events (a set of sets of possible worlds)

(26)

Example:

Roll two 6-sided dice.

S = {Total = 2, Total = 3, Total = 4, …}

0 ≤ P(T = x) ≤ 1

P(T = 2) + P(T = 3) + … = 1
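Both conditions can be checked by building the distribution over totals. This sketch assumes fair dice; the names `counts` and `P` are illustrative:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely worlds give each total.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# P(T = x) for every achievable total x (2 through 12).
P = {total: Fraction(n, 36) for total, n in counts.items()}

bounded = all(0 <= p <= 1 for p in P.values())   # each value lies in [0, 1]
normalized = sum(P.values()) == 1                 # the values sum to 1
```

Totals outside 2 through 12, such as T = 1, contain no possible worlds and so contribute probability 0.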

(27)

Probability: Independence

Two events A and B are independent if

P(A and B) = P(A) P(B)

Example:

P(d₁ = 2 and d₂ = 3) = P(d₁ = 2) P(d₂ = 3) = 1/6 · 1/6 = 1/36
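The product rule can be verified by counting possible worlds. This sketch assumes fair dice; `prob` is a hypothetical helper, and the second pair of events is added only as a contrasting case:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over worlds (d1, d2)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Independent pair: the two dice do not influence each other.
p_a = prob(lambda w: w[0] == 2)
p_b = prob(lambda w: w[1] == 3)
p_ab = prob(lambda w: w[0] == 2 and w[1] == 3)
independent = p_ab == p_a * p_b

# Contrasting pair: "Total = 11" and "d1 = 2" cannot both happen.
p_t11 = prob(lambda w: sum(w) == 11)
p_both = prob(lambda w: sum(w) == 11 and w[0] == 2)
dependent = p_both != p_t11 * p_a
```

The second check fails the product rule because knowing d₁ = 2 rules out a total of 11 entirely.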

(28)

Probability: Priors

P(e) is unconditional, called a “prior”

(29)

Probability: Conditional Probability

P(a | b) : The probability of a occurring, given

that b occurs.

Example:

P(T = 2 | d₁ = 1) = 1/6
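The value 1/6 follows from the standard definition P(a | b) = P(a and b) / P(b). A sketch that computes it by counting, assuming fair dice and using hypothetical helper names:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over worlds (d1, d2)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); only defined when P(b) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

p_prior = prob(lambda w: sum(w) == 2)                          # P(T = 2)
p_given = conditional(lambda w: sum(w) == 2, lambda w: w[0] == 1)
```

Conditioning on d₁ = 1 raises the probability of a total of 2 from the prior 1/36 to 1/6, since only d₂ = 1 is still needed.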

(30)

Probability:

(31)

Probability:

Conditional Independence

(32)

Season  Temp  Forecast  P
Fall    Hot   Clear     0.30
Fall    Hot   Rainy     0.05
Fall    Cold  Clear     0.10
Fall    Cold  Rainy     0.05
Winter  Hot   Clear     0.10
Winter  Hot   Rainy     0.05
Winter  Cold  Clear     0.15

Inference by Enumeration

(33)

Season  Temp  Forecast  P
Fall    Hot   Clear     0.30
Fall    Hot   Rainy     0.05
Fall    Cold  Clear     0.10
Fall    Cold  Rainy     0.05
Winter  Hot   Clear     0.10
Winter  Hot   Rainy     0.05
Winter  Cold  Clear     0.15

Inference by Enumeration

P(F = C | S = W) = ?
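Inference by enumeration sums the joint entries consistent with the query and divides by the sum of those consistent with the evidence. The sketch below reads F = C as Forecast = Clear and S = W as Season = Winter. The table as given lists seven rows totalling 0.80, so the eighth row (Winter, Cold, Rainy) with probability 0.20 is an assumption made so that the joint sums to 1:

```python
# Joint distribution P(Season, Temp, Forecast) from the table.
joint = {
    ("Fall", "Hot", "Clear"): 0.30,
    ("Fall", "Hot", "Rainy"): 0.05,
    ("Fall", "Cold", "Clear"): 0.10,
    ("Fall", "Cold", "Rainy"): 0.05,
    ("Winter", "Hot", "Clear"): 0.10,
    ("Winter", "Hot", "Rainy"): 0.05,
    ("Winter", "Cold", "Clear"): 0.15,
    # ASSUMPTION: row not present in the table; 0.20 makes the total 1.
    ("Winter", "Cold", "Rainy"): 0.20,
}

def prob(season=None, temp=None, forecast=None):
    """Sum the joint entries that match every value that is not None."""
    total = 0.0
    for (s, t, f), p in joint.items():
        if season in (None, s) and temp in (None, t) and forecast in (None, f):
            total += p
    return total

# P(F = Clear | S = Winter) = P(Winter, Clear) / P(Winter)
p_clear_given_winter = prob(season="Winter", forecast="Clear") / prob(season="Winter")
```

Under that assumption the numerator is 0.10 + 0.15 and the denominator is 0.50, so the answer is about 0.5. A different value for the missing row would change the result.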

(34)

Season  Temp  Forecast  P
Fall    Hot   Clear     0.30
Fall    Hot   Rainy     0.05
Fall    Cold  Clear     0.10
Fall    Cold  Rainy     0.05
Winter  Hot   Clear     0.10
Winter  Hot   Rainy     0.05
Winter  Cold  Clear     0.15

Inference by Enumeration

P(C | W, H) = ?
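Reading C as Forecast = Clear, W as Season = Winter, and H as Temp = Hot (an interpretation of the abbreviations, with R for Rainy), the query uses only the two Winter/Hot rows of the table:

```latex
P(C \mid W, H)
  = \frac{P(W, H, C)}{P(W, H, C) + P(W, H, R)}
  = \frac{0.10}{0.10 + 0.05}
  = \frac{2}{3} \approx 0.67
```

The denominator is P(W, H), obtained by summing out the forecast.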

(35)

Probabilistic Reasoning:

Note that beliefs change with evidence.

P(I’m on time to work): 0.9

...Given that it’s snowing: 0.6

...and that my car has a flat: 0.3

(36)

Bayesian Networks

A Bayes Net is a DAG with annotations on nodes.

● Each node is a random variable.

● An edge from X to Y means X is a parent of Y.
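The graph part of this definition can be sketched as a mapping from each node to its parents. The edges below are hypothetical, chosen only to reuse the variable names from the weather table. The node annotations are not covered in this sketch:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical structure: each key is a random variable, each value lists
# its parents (an edge from X to Y puts X in parents[Y]).
parents = {
    "Season": [],
    "Temp": ["Season"],
    "Forecast": ["Season", "Temp"],
}

def is_dag(parents):
    """Return True if the parent mapping contains no directed cycle."""
    try:
        # static_order() raises CycleError when no topological order exists.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# A topological order lists every node after all of its parents.
order = list(TopologicalSorter(parents).static_order())
```

Storing parents rather than children matches the slide's phrasing. `graphlib` requires Python 3.9 or later.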

(37)