Reasoning Under Uncertainty
Announcements
● P3 up, due March 18
○ Will be officially assigned Monday
○ Teams due TODAY (same as before, see Moodle)
● WHW1 Assigned
○ Due March 11th
○ Easier than future written homeworks, largely to help you study for the midterm
Today:
● RL Wrap up
● Probability
RL:
Given an MDP, we have three methods: Direct Utility Estimation
RL
Learning can be Passive or Active.
Passive: Fixed, given policy. Goal is to learn the world.
Active: No given policy.
RL:
In Active RL, a key issue:
Exploitation vs Exploration
RL:
To encourage exploration:
Occasionally make random moves (1/t-greedy: move randomly with probability 1/t, otherwise act greedily)
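A minimal sketch of the 1/t-greedy idea, assuming it means "explore with probability 1/t at step t". The function name `choose_action`, the list `q_values`, and the injectable `rng` are illustrative choices, not from the slides.

```python
import random

def choose_action(q_values, t, rng=random):
    """Return an action index: random with probability 1/t, else greedy.

    q_values: estimated value of each action in the current state (assumed input).
    t: current time step, starting at 1.
    """
    epsilon = 1.0 / max(t, 1)  # exploration rate shrinks as t grows
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # exploratory random move
    # Greedy move: index of the largest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early on (small t) almost every move is random; as t grows the agent mostly exploits what it has learned.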
RL:
Assign action-utility (Q) values to state-action pairs instead of utilities to states.
Q-learning is TD learning on Q values.
Very fast, with a simple update step, and model-free.
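The update step can be sketched as below, using the standard Q-learning rule Q(s, a) ← Q(s, a) + α (r + γ max Q(s', a') − Q(s, a)). The learning rate `alpha`, discount `gamma`, and the state and action names are assumed values for illustration.

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one TD update to Q[(s, a)]. No transition model is needed."""
    # Value of the best action available in the next state
    best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
    td_error = reward + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

# Hypothetical two-state example; unseen (state, action) pairs default to 0
Q = defaultdict(float)
first = q_update(Q, s="A", a="right", reward=1.0, s_next="B",
                 actions=["left", "right"])
```

Only the observed transition (s, a, reward, s_next) is used, which is what makes the method model-free.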
RL:
Tip of the Iceberg
Policy Search
Basic idea: Define a parameterization of Q.
Now learn good weights.
Tip of the Tip of the Iceberg
Multi-agent Systems:
Distributed Constrained Optimization Problem
Distributed Coordination of Exploration and Exploitation
Transfer Learning
Next Up:
Roadmap:
Reasoning Under Uncertainty
● Probability
● Bayesian Networks
○ Exact Inference
○ Sampling
● Markov Models
○ Exact Inference
○ Particle Filtering
A method of modeling probabilistic connections in the world, Ch. 14
Probability: Why?
● It’s ubiquitous.
○ In the real world, many things are unknown.
● It’s powerful.
○ It greatly simplifies some analysis.
Detour: Monte Carlo Sampling
Consider a circle of radius r centered at (0, 0).
Detour: Monte Carlo Sampling
Consider an object defined by two circles, each
with radius r, centered at (-1, 0) and (1, 0).
Detour: MC Sampling
What can we do easily?
● We can measure areas of squares
● We can roll dice.
Detour: MC Sampling
Draw a square encompassing the object. (L = 2 + 2r)
Let in = out = 0
Repeat:
1. Sample (x, y) uniformly from the square
Detour: MC Sampling
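A sketch of the sampling procedure for the two-circle object. The slides show only the first step of the loop; the rest here (count hits, then scale the square's area by the hit fraction) is the usual Monte Carlo continuation and is an assumption. The function name, sample count, and seed are illustrative.

```python
import random

def estimate_area(r, n_samples=100_000, seed=0):
    """Estimate the area covered by two radius-r circles at (-1, 0) and (1, 0)."""
    rng = random.Random(seed)
    side = 2 + 2 * r           # L = 2 + 2r, square centered at the origin
    half = side / 2
    inside = outside = 0       # "in" is a Python keyword, so use longer names
    for _ in range(n_samples):
        x = rng.uniform(-half, half)
        y = rng.uniform(-half, half)
        in_left = (x + 1) ** 2 + y ** 2 <= r ** 2
        in_right = (x - 1) ** 2 + y ** 2 <= r ** 2
        if in_left or in_right:
            inside += 1
        else:
            outside += 1
    # Fraction of samples inside the object, times the square's area
    return side ** 2 * inside / (inside + outside)
```

For r = 0.5 the circles do not overlap, so the estimate should land near 2πr² ≈ 1.57. For larger r the circles overlap and no simple closed form is needed; the same code applies.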
Detour: A Brain Teaser
Alice picks two distinct numbers x and y.
She hands them to Bob.
Bob flips a fair coin and selects one of them. Bob hands you that number.
Probability: Sample Space
Sample space: Ω
The set of all possible worlds
Probability: Events
An event is a set of possible worlds.
Example:
Roll two 6-sided dice.
Ω: (1, 1), (1, 2), (1, 3), …, (2, 1), (2, 2), …
Example event:
Total = 11
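The two-dice example can be enumerated directly. In this sketch a world is a pair (d1, d2) and an event is a Python set of such pairs; the variable names are illustrative.

```python
from itertools import product

# Sample space: every (d1, d2) outcome of rolling two 6-sided dice
omega = list(product(range(1, 7), repeat=2))

# Event "Total = 11": the set of possible worlds where the dice sum to 11
total_is_11 = {world for world in omega if sum(world) == 11}
```

The sample space has 36 worlds, and the event contains exactly two of them: (5, 6) and (6, 5).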
Probability:
A probability measure P is a function on a set S of events (a set of sets of possible worlds) that assigns each event a number between 0 and 1.
Example:
Roll two 6-sided dice.
S = {Total = 2, Total = 3, Total = 4, …}
0 ≤ P(T = x) ≤ 1
P(T = 2) + P(T = 3) + … + P(T = 12) = 1
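A sketch that builds P over the totals and checks both constraints. It assumes fair dice, so all 36 worlds are equally likely; exact fractions avoid rounding noise.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))

# Count how many worlds produce each total (assumes equally likely worlds)
counts = Counter(d1 + d2 for d1, d2 in worlds)
P = {total: Fraction(n, len(worlds)) for total, n in counts.items()}

# Every value lies in [0, 1] and the values sum to 1
all_in_range = all(0 <= p <= 1 for p in P.values())
sums_to_one = sum(P.values()) == 1
```

The totals run from 2 to 12; a total of 1 never occurs, so it contributes nothing to the sum.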
Probability: Independence
Two events A and B are independent if
P(A and B) = P(A) P(B)
Example:
P(d1 = 2 and d2 = 3) = P(d1 = 2) P(d2 = 3) = 1/6 · 1/6 = 1/36
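The independence check can be done by counting worlds. This sketch assumes fair dice; `prob` is a hypothetical helper that takes an event written as a predicate on worlds.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on worlds), assuming fair dice."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

p_a = prob(lambda w: w[0] == 2)                 # P(d1 = 2)
p_b = prob(lambda w: w[1] == 3)                 # P(d2 = 3)
p_ab = prob(lambda w: w[0] == 2 and w[1] == 3)  # P(d1 = 2 and d2 = 3)

independent = p_ab == p_a * p_b
```

Here the joint probability equals the product of the two marginals, so the events are independent.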
Probability: Priors
P(e) is unconditional, called a “prior”
Probability: Conditional Probability
P(a | b): The probability of a occurring, given that b occurs.
Example:
P(T = 2 | d1 = 1) = 1/6
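The example can be checked with the standard definition P(a | b) = P(a and b) / P(b). This sketch assumes fair dice; `prob` and `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on worlds), assuming fair dice."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

# P(T = 2 | d1 = 1): total is 2, given that the first die shows 1
p = cond_prob(lambda w: sum(w) == 2, lambda w: w[0] == 1)
```

Conditioning on d1 = 1 leaves six worlds, and only (1, 1) has a total of 2.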
Probability: Conditional Independence
Season  Temp  Forecast  P
Fall    Hot   Clear     0.30
Fall    Hot   Rainy     0.05
Fall    Cold  Clear     0.10
Fall    Cold  Rainy     0.05
Winter  Hot   Clear     0.10
Winter  Hot   Rainy     0.05
Winter  Cold  Clear     0.15
Inference by Enumeration
P(F = C | S = W) = ?
Inference by Enumeration
P(Forecast = Clear | Season = Winter, Temp = Hot) = ?
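Both queries can be answered by summing rows of the joint table. The table in the slides lists seven rows that total 0.80; this sketch assumes the missing (Winter, Cold, Rainy) row has probability 0.20 so that the table sums to 1. The function `query` and the index constants are illustrative names.

```python
from fractions import Fraction

SEASON, TEMP, FORECAST = 0, 1, 2  # positions within each row key

joint = {
    ("Fall", "Hot", "Clear"): Fraction("0.30"),
    ("Fall", "Hot", "Rainy"): Fraction("0.05"),
    ("Fall", "Cold", "Clear"): Fraction("0.10"),
    ("Fall", "Cold", "Rainy"): Fraction("0.05"),
    ("Winter", "Hot", "Clear"): Fraction("0.10"),
    ("Winter", "Hot", "Rainy"): Fraction("0.05"),
    ("Winter", "Cold", "Clear"): Fraction("0.15"),
    # ASSUMPTION: row not shown in the slides, inferred from normalization
    ("Winter", "Cold", "Rainy"): Fraction("0.20"),
}

def matches(row, conditions):
    """True if the row agrees with every {index: value} condition."""
    return all(row[i] == v for i, v in conditions.items())

def query(target, evidence):
    """P(target | evidence), by summing matching rows of the joint table."""
    both = {**evidence, **target}
    numerator = sum(p for row, p in joint.items() if matches(row, both))
    denominator = sum(p for row, p in joint.items() if matches(row, evidence))
    return numerator / denominator

p_clear_given_winter = query({FORECAST: "Clear"}, {SEASON: "Winter"})
p_clear_given_winter_hot = query({FORECAST: "Clear"},
                                 {SEASON: "Winter", TEMP: "Hot"})
```

Under the assumed row, the first query comes out to 1/2. The second is 2/3 and does not depend on the assumed row, since it only touches the two Winter, Hot entries.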
Probabilistic Reasoning:
Note that beliefs change with evidence.
P(I’m on time to work): 0.9
...Given that it’s snowing: 0.6
...and that my car has a flat: 0.3
Bayesian Networks
A Bayes Net is a DAG with annotations on nodes.
● Each node is a random variable.
● An edge from X to Y means X is a parent of Y.
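One way to hold the graph structure is a map from each node to its set of parents. The three-node network below is hypothetical (the variable names are borrowed from the on-time example, but the slides do not give this structure), and the node annotations are left out. The acyclicity check uses `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical network: each node maps to the set of its parents
parents = {
    "Snowing": set(),
    "FlatTire": set(),
    "OnTime": {"Snowing", "FlatTire"},  # edges Snowing -> OnTime, FlatTire -> OnTime
}

def is_dag(parent_map):
    """True if the parent map describes a directed acyclic graph."""
    try:
        # static_order raises CycleError while iterating if a cycle exists
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False

valid = is_dag(parents)
```

A topological order lists every parent before its children, which is also a convenient order for processing the nodes.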