
Lecture 17.pdf


Academic year: 2020


(1)

Reasoning Under Uncertainty

(2)

Announcements

● P3 up, due March 18

○ Will be officially assigned Monday

○ Teams due TODAY (same as before, see Moodle)

● WHW1 Assigned

○ Due March 11th

○ Easier than future written homeworks, largely to help you study for the midterm

(3)

Today:

● RL Wrap up

● Probability

(4)

RL:

Given an MDP, we have three methods: Direct Utility Estimation, Adaptive Dynamic Programming (ADP), and Temporal-Difference (TD) Learning.
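A minimal Python sketch of Direct Utility Estimation for a fixed policy: each visit to a state yields one sample of the reward-to-go (the sum of rewards from that point to the end of the trial), and U(s) is estimated as the average of those samples. The trial format (lists of (state, reward) pairs), the function name, the discount parameter `gamma`, and the example states and rewards are all assumptions made for illustration.

```python
from collections import defaultdict


def direct_utility_estimate(trials, gamma=1.0):
    """Estimate U(s) as the average observed reward-to-go over every visit to s.

    trials: list of trials; each trial is a list of (state, reward) pairs,
    in the order experienced while following the fixed policy.
    """
    totals = defaultdict(float)  # sum of reward-to-go samples per state
    counts = defaultdict(int)    # number of samples per state
    for trial in trials:
        reward_to_go = 0.0
        # Walk the trial backwards so reward_to_go accumulates future rewards.
        for state, reward in reversed(trial):
            reward_to_go = reward + gamma * reward_to_go
            totals[state] += reward_to_go
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical trials ending in goal state "G".
trials = [
    [("A", -1), ("B", -1), ("G", 10)],
    [("A", -1), ("G", 10)],
]
U = direct_utility_estimate(trials)
print(U)  # {'G': 10.0, 'B': 9.0, 'A': 8.5}
```

Averaging needs many trials to converge because it ignores how the utilities of neighboring states are related.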

(5)

RL

Learning can be Passive or Active.

● Passive: Fixed, given policy. Goal is to learn the world.

● Active: No given policy.

(6)

RL:

In Active RL, a key issue:

Exploitation vs Exploration

(7)

RL:

To encourage exploration:

Occasionally make random moves (1/t-greedy: at step t, move randomly with probability 1/t)
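One way to read 1/t-greedy, sketched in Python under the assumption that it means epsilon-greedy with epsilon = 1/t: at step t the agent makes a uniformly random move with probability 1/t and otherwise picks the action with the highest current estimate. The function name and the list-of-Q-values interface are hypothetical.

```python
import random


def one_over_t_greedy(q_values, t, rng=random):
    """Return an action index for time step t (t >= 1).

    With probability 1/t, explore: pick an action uniformly at random.
    Otherwise, exploit: pick the action with the largest estimated value.
    """
    if rng.random() < 1.0 / t:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q = [0.1, 0.9, 0.3]
choices = [one_over_t_greedy(q, t, rng) for t in range(1, 1001)]
```

Because 1/t shrinks over time, exploration is common in the first few steps and rare later, so the agent gradually shifts from exploring to exploiting.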

(8)

RL:

Assign action-utility (Q) values instead of state utilities. Q-learning is TD learning on Q values.

Very fast, simple update step; model-free.
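A minimal sketch of the tabular Q-learning update in Python. It assumes the standard TD form Q(s, a) ← Q(s, a) + α(r + γ max_a' Q(s', a') - Q(s, a)); the learning rate α, discount γ, and the function and state names are illustrative assumptions. No transition model appears anywhere, which is what makes the update model-free.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to table Q in place and return Q[(s, a)].

    actions: the actions available in s_next (empty if s_next is terminal).
    """
    # Value of the best action from the next state (0 if there are none).
    best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
    # TD error: observed sample minus the current estimate.
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0
print(q_update(Q, "A", "right", 1.0, "B", ["left", "right"], alpha=0.5))  # 0.5
print(q_update(Q, "A", "right", 1.0, "B", ["left", "right"], alpha=0.5))  # 0.75
```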

(9)

RL:

(10)

Tip of the Iceberg

Policy Search

Basic idea: Define a parameterization of Q. Now learn good weights.

(11)

Tip of the Tip of the Iceberg

Multi-agent Systems:

Distributed Constraint Optimization Problem

Distributed Coordination of Exploration and Exploitation

Transfer Learning

(12)
(13)

Next Up:

(14)

Roadmap:

Reasoning Under Uncertainty

● Probability

● Bayesian Networks

○ Exact Inference

○ Sampling

● Markov Models

○ Exact Inference

○ Particle Filtering

A method of modeling probabilistic connections in the world, Ch. 14

(15)

Probability: Why?

● It’s ubiquitous.

○ In the real world, many things are unknown.

● It’s powerful.

○ It greatly simplifies some analysis.

(16)

Detour: Monte Carlo Sampling

Consider a circle of radius r centered at (0, 0).

(17)

Detour: Monte Carlo Sampling

Consider an object defined by two circles, each with radius r, centered at (-1, 0) and (1, 0).

(18)

Detour: MC Sampling

What can we do easily?

● We can measure areas of squares.

● We can roll dice.

(19)

Detour: MC Sampling

Draw a square encompassing the object (side L = 2 + 2r).

Let in = out = 0

Repeat:

1. Sample (x, y) uniformly from the square
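The sampling procedure above can be sketched in Python for the two-circle object, assuming the object is the union of the two circles and that the estimate is the square's area times the fraction of samples that land inside. The function name, sample count, and seed are illustrative choices, not from the slides.

```python
import random


def mc_area_two_circles(r, n=100_000, seed=0):
    """Monte Carlo estimate of the area covered by two radius-r circles at (-1, 0) and (1, 0)."""
    rng = random.Random(seed)
    half = 1 + r          # square of side L = 2 + 2r, centered at the origin
    side = 2 * half
    inside = outside = 0
    for _ in range(n):
        # Sample (x, y) uniformly from the square.
        x = rng.uniform(-half, half)
        y = rng.uniform(-half, half)
        # Count the sample as "in" if it lies in either circle.
        if (x + 1) ** 2 + y ** 2 <= r * r or (x - 1) ** 2 + y ** 2 <= r * r:
            inside += 1
        else:
            outside += 1
    # Area ≈ (fraction of samples inside) × (area of the square).
    return side * side * inside / (inside + outside)


print(mc_area_two_circles(1.0))
```

With r = 1 the circles just touch, so the estimate should come out close to 2π ≈ 6.28; more samples give a tighter estimate. The same loop works when the circles overlap, where the exact area is harder to compute by hand.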

(20)

Detour: MC Sampling

(21)

Detour: A Brain Teaser

Alice picks two distinct numbers x and y.

She hands them to Bob.

Bob flips a fair coin and selects one of them. Bob hands you that number.

(22)

Probability: Sample Space

Sample space: Ω

The set of all possible worlds

(23)

Probability: Events

An event is a set of possible worlds.

(24)

Example:

Roll two 6-sided dice.

Ω: {(1, 1), (1, 2), (1, 3), …, (2, 1), (2, 2), …}

Example event: Total = 11
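A short Python sketch of the sample space and the example event, assuming fair dice so that all 36 ordered outcomes are equally likely (the slide does not state this explicitly). Exact fractions avoid floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (d1, d2) of two 6-sided dice.
omega = list(product(range(1, 7), repeat=2))

# An event is a set of possible worlds: here, the worlds where the total is 11.
total_is_11 = {w for w in omega if sum(w) == 11}

# With equally likely worlds, P(event) = |event| / |Ω|.
p_total_11 = Fraction(len(total_is_11), len(omega))
print(sorted(total_is_11), p_total_11)  # [(5, 6), (6, 5)] 1/18
```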

(25)

Probability:

A probability measure P is a function on a set S of events (a set of sets of possible worlds), assigning each event a probability between 0 and 1.

(26)

Example:

Roll two 6-sided dice.

S = {Total = 2, Total = 3, Total = 4, …}

0 ≤ P(T = x) ≤ 1

P(T = 2) + P(T = 3) + … = 1
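Under the same fair-dice assumption, the distribution of the total T can be computed in Python and the two conditions above checked directly; note that T = 1 is not a possible outcome.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # 36 equally likely worlds (fair dice assumed)
counts = Counter(d1 + d2 for d1, d2 in omega)
P_T = {t: Fraction(c, len(omega)) for t, c in sorted(counts.items())}

assert all(0 <= p <= 1 for p in P_T.values())  # 0 <= P(T = x) <= 1
assert sum(P_T.values()) == 1                   # probabilities sum to 1
assert 1 not in P_T                             # a total of 1 is impossible
print(P_T[2], P_T[7], P_T[12])  # 1/36 1/6 1/36
```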

(27)

Probability: Independence

Two events A and B are independent if

P(A and B) = P(A) P(B)

Example:

P(d1 = 2 and d2 = 3) = P(d1 = 2) P(d2 = 3) = (1/6)(1/6) = 1/36
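Checking the independence example numerically in Python, again assuming fair dice with all 36 outcomes equally likely; `prob` is a hypothetical helper that counts matching worlds.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # worlds (d1, d2), fair dice assumed


def prob(event):
    """P(event) when all worlds are equally likely; event is a predicate on a world."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


p_a = prob(lambda w: w[0] == 2)                 # P(d1 = 2)
p_b = prob(lambda w: w[1] == 3)                 # P(d2 = 3)
p_ab = prob(lambda w: w[0] == 2 and w[1] == 3)  # P(d1 = 2 and d2 = 3)
print(p_ab, p_a * p_b, p_ab == p_a * p_b)       # 1/36 1/36 True
```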

(28)

Probability: Priors

P(e) is unconditional, called a “prior”

(29)

Probability: Conditional Probability

P(a | b): The probability of a occurring, given that b occurs.

Example:

P(T = 2 | d1 = 1) = 1/6
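A Python check of the conditional probability example, using the standard definition P(a | b) = P(a and b) / P(b) and assuming fair dice; `prob` and `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # worlds (d1, d2), fair dice assumed


def prob(event):
    """P(event) when all worlds are equally likely."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


print(cond_prob(lambda w: sum(w) == 2, lambda w: w[0] == 1))  # 1/6
```

Once d1 = 1 is known, only the six worlds (1, 1) through (1, 6) remain, and exactly one of them has total 2.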

(30)

Probability:

(31)

Probability:

Conditional Independence

(32)

Inference by Enumeration

Season   Temp   Forecast   P
Fall     Hot    Clear      0.30
Fall     Hot    Rainy      0.05
Fall     Cold   Clear      0.10
Fall     Cold   Rainy      0.05
Winter   Hot    Clear      0.10
Winter   Hot    Rainy      0.05
Winter   Cold   Clear      0.15

(33)


P(F = C | S = W) = ?

(34)


P(C | W, H) = ?
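Inference by enumeration can be sketched in Python: keep the rows consistent with the evidence, sum the ones where the query variable has the requested value, and divide by the total of all consistent rows. The extracted table shows only seven of the eight rows; the sketch ASSUMES the missing Winter/Cold/Rainy entry is 0.20, the value that makes the joint sum to 1. P(C | W, H) does not depend on that assumption, but P(F = C | S = W) does.

```python
# Joint distribution P(Season, Temp, Forecast), one entry per row of the table.
joint = {
    ("Fall", "Hot", "Clear"): 0.30,
    ("Fall", "Hot", "Rainy"): 0.05,
    ("Fall", "Cold", "Clear"): 0.10,
    ("Fall", "Cold", "Rainy"): 0.05,
    ("Winter", "Hot", "Clear"): 0.10,
    ("Winter", "Hot", "Rainy"): 0.05,
    ("Winter", "Cold", "Clear"): 0.15,
    # ASSUMPTION: row missing from the extracted table; 0.20 makes the total 1.
    ("Winter", "Cold", "Rainy"): 0.20,
}
VARS = ("Season", "Temp", "Forecast")


def query(var, value, evidence):
    """P(var = value | evidence) by enumerating rows of the joint table."""
    def consistent(row):
        return all(row[VARS.index(name)] == val for name, val in evidence.items())

    i = VARS.index(var)
    total = sum(p for row, p in joint.items() if consistent(row))
    match = sum(p for row, p in joint.items() if consistent(row) and row[i] == value)
    return match / total  # normalize by P(evidence)


print(round(query("Forecast", "Clear", {"Season": "Winter"}), 3))                 # 0.5
print(round(query("Forecast", "Clear", {"Season": "Winter", "Temp": "Hot"}), 3))  # 0.667
```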

(35)

Probabilistic Reasoning:

Note that beliefs change with evidence.

P(I’m on time to work): 0.9

...Given that it’s snowing: 0.6

...and that my car has a flat: 0.3

(36)

Bayesian Networks

A Bayes Net is a DAG with annotations on nodes.

● Each node is a random variable.

● An edge from X to Y means X is a parent of Y.
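A minimal Python sketch of the graph part of a Bayes net, with each random variable mapped to the list of its parents, so an edge X → Y appears as X in Y's parent list. The variables reuse the on-time example above and are hypothetical; the node annotations (the probability tables) are omitted. The acyclicity check uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `CycleError` when the graph has a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical net: each random variable maps to its parents.
# An edge X -> Y is recorded by listing X among Y's parents.
parents = {
    "Snow": [],
    "FlatTire": [],
    "OnTime": ["Snow", "FlatTire"],
}


def children(net, x):
    """Nodes Y with an edge X -> Y, i.e. nodes that list x as a parent."""
    return [y for y, ps in net.items() if x in ps]


def is_dag(net):
    """True if the parent map has no directed cycle."""
    try:
        list(TopologicalSorter(net).static_order())
        return True
    except CycleError:
        return False


print(is_dag(parents))                   # True
print(children(parents, "Snow"))         # ['OnTime']
print(is_dag({"A": ["B"], "B": ["A"]}))  # False
```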

(37)

References
