15-381: Artificial Intelligence. Probabilistic Reasoning and Inference

Academic year: 2021

(1)

15-381: Artificial Intelligence

(2)

Advantages of probabilistic reasoning

• Appropriate for complex, uncertain environments
- Will it rain tomorrow?

• Applies naturally to many domains
- Robot predicting the direction of a road, biology, the Word paper clip

• Allows generalizing acquired knowledge and incorporating prior beliefs
- Medical diagnosis

• Easy to integrate different information sources
- Robot’s sensors

(3)

Examples

(4)
(5)

Example: Biological data

ATGAAGCTACTGTCTTCTATCGAACAAGCATGCG ATATTTGCCGACTTAAAAAGCTCAAG TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGT CTGAAGAACAACTGGGAGTGTCGCTAC TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGG GCACATCTGACAGAAGTGGAATCAAGG CTAGAAAGACTGGAACAGCTATTTCTACTGATTT TTCCTCGAGAAGACCTTGACATGATT
(6)

Basic notations

• Random variable

- referring to an element / event whose status is unknown: A = “it will rain tomorrow”

• Domain (usually denoted by Ω)

- The set of values a random variable can take:

- “A = The stock market will go up this year”: Binary
- “A = Number of Steelers wins in 2007”: Discrete

(7)

Axioms of probability

(Kolmogorov’s axioms)

• A variety of useful facts can be derived from just three axioms:

1. 0 ≤ P(A) ≤ 1

2. P(true) = 1, P(false) = 0

3. P(A ∨ B) = P(A) + P(B) – P(A ∧ B)

There have been several other attempts to provide a foundation for probability theory. Kolmogorov’s axioms are the most widely used.

(12)

Example of using the axioms

• Assume the following probabilities for a die:
P(1) = P(2) = P(3) = 0.2
P(4) = P(5) = 0.1
P(6) = ?

• P({heads, tails}) = ?

• P(heads) + (1 - P(tails)) = ?
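The questions on this slide can be checked numerically. The following is a minimal sketch, assuming that the six die faces (and the two coin sides) are mutually exclusive and exhaustive, so their probabilities sum to P(true) = 1. The names `known`, `p6`, and `heads_plus_not_tails` are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Die probabilities given on the slide; P(6) is the unknown.
known = {
    1: Fraction(1, 5), 2: Fraction(1, 5), 3: Fraction(1, 5),
    4: Fraction(1, 10), 5: Fraction(1, 10),
}

# The faces are disjoint and cover every outcome, so they sum to 1.
p6 = 1 - sum(known.values())
print(p6)  # 1/5


def heads_plus_not_tails(p_heads):
    """Evaluate P(heads) + (1 - P(tails)) for a coin with the given P(heads)."""
    p_tails = 1 - p_heads           # complement rule
    return p_heads + (1 - p_tails)  # simplifies to 2 * P(heads)


print(heads_plus_not_tails(Fraction(1, 2)))  # 1
```

Under these assumptions P(6) = 0.2 and P({heads, tails}) = 1, while the last expression equals 2 P(heads), which is 1 only for a fair coin.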

(13)

Using the axioms

• How can we use the axioms to prove that: P(¬A) = 1 – P(A)
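One possible derivation, sketched here rather than taken from the slides, applies axiom 3 to the pair A and ¬A and then uses axiom 2 for the two constant events:

```latex
\begin{align*}
A \lor \lnot A &\equiv \text{true}, \qquad A \land \lnot A \equiv \text{false} \\
P(A \lor \lnot A) &= P(A) + P(\lnot A) - P(A \land \lnot A) && \text{(axiom 3)} \\
1 &= P(A) + P(\lnot A) - 0 && \text{(axiom 2)} \\
P(\lnot A) &= 1 - P(A)
\end{align*}
```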

(14)

Priors

Degree of belief in an event in the absence of any other information:

P(rain tomorrow) = 0.2
P(no rain tomorrow) = 0.8

(15)

Conditional probability

• What is the probability of an event given knowledge of another event?

• Example:

- P(raining | sunny)
- P(raining | cloudy)

(16)

Conditional probability

• P(A = 1 | B = 1): The fraction of cases where A is true if B is true

(17)

Conditional probability

• In some cases, given knowledge of one or more random variables, we can improve upon our prior belief about another random variable

• For example:

p(slept in movie) = 0.5

p(slept in movie | liked movie) = 1/3

p(didn’t sleep in movie | liked movie) = 2/3

Liked movie   Slept   P
1             1       0.2
1             0       0.4
0             0       0.1
0             1       0.3
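The three probabilities in this example can be recomputed from the table. This is a minimal sketch that stores the table as a dictionary keyed by (liked movie, slept); the function names are illustrative, not part of the lecture.

```python
from fractions import Fraction

# Joint table from the slide, keyed by (liked_movie, slept).
joint = {
    (1, 1): Fraction(2, 10),
    (1, 0): Fraction(4, 10),
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(3, 10),
}


def marginal_slept(slept):
    """P(slept): add up every row with the requested 'slept' value."""
    return sum(p for (_, s), p in joint.items() if s == slept)


def slept_given_liked(slept, liked):
    """P(slept | liked): joint entry divided by the marginal P(liked)."""
    p_liked = sum(p for (l, _), p in joint.items() if l == liked)
    return joint[(liked, slept)] / p_liked


print(marginal_slept(1))        # 1/2
print(slept_given_liked(1, 1))  # 1/3
print(slept_given_liked(0, 1))  # 2/3
```

Knowing that the viewer liked the movie lowers the probability of having slept from 1/2 to 1/3.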

(18)

Joint distributions

• The joint distribution gives the probability that a set of random variables takes a specific combination of values.

• Notation: P(A ∧ B) or P(A,B)

• Example: P(liked movie, slept)

Liked movie   Slept   P
1             1       0.2
1             0       0.4
0             0       0.1
0             1       0.3

(19)

Joint distribution (cont)

Evaluation of classes

Time (regular = 2, summer = 1)   Class size   Evaluation (1-3)
1                                10           2
2                                34           3
1                                12           2
2                                65           1
2                                15           3
2                                51           2
1                                13           3
2                                43           1

P(class size > 20) = 0.5
P(summer) = 1/3

P(class size > 20, summer) = ?
(20)

Joint distribution (cont)

Evaluation of classes

Time (regular = 2, summer = 1)   Class size   Evaluation (1-3)
1                                10           2
2                                34           3
1                                12           2
2                                65           1
2                                15           3
2                                51           2
1                                13           3
2                                43           1

P(class size > 20) = 0.5
P(summer) = 1/3

P(class size > 20, summer) = 0
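The joint probability can be read off the class table by counting rows. The sketch below uses only the eight rows that are legible in this transcript, so marginals computed from it need not match the values printed on the slide. The joint is zero either way, because no listed summer class has more than 20 students. The helper `probability` is an illustrative name.

```python
from fractions import Fraction

# Rows are (time, class_size, evaluation); time: 2 = regular, 1 = summer.
# Only the eight rows legible in this transcript are listed.
classes = [
    (1, 10, 2), (2, 34, 3), (1, 12, 2), (2, 65, 1),
    (2, 15, 3), (2, 51, 2), (1, 13, 3), (2, 43, 1),
]


def probability(rows, predicate):
    """Fraction of rows that satisfy the predicate (an empirical probability)."""
    hits = sum(1 for row in rows if predicate(row))
    return Fraction(hits, len(rows))


p_large_and_summer = probability(classes, lambda r: r[1] > 20 and r[0] == 1)
p_large = probability(classes, lambda r: r[1] > 20)
print(p_large_and_summer, p_large)  # 0 1/2
```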
(21)

Joint distribution (cont)

Evaluation of classes

Time (regular = 2, summer = 1)   Class size   Evaluation (1-3)
1                                10           2
2                                34           3
1                                12           2
2                                65           1
2                                15           3
2                                51           2
1                                13           3
2                                43           1

P(class size > 20) = 0.5
P(eval = 1) = 2/9
(22)

Chain rule

• The joint distribution can be specified in terms of conditional probability:

P(A,B) = P(A|B)*P(B)

• Together with Bayes rule (which is actually derived from it) this is one of the most powerful rules in probabilistic reasoning
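As an illustration of the chain rule, the movie table from the earlier slides can be rebuilt from a marginal and a conditional. This is a sketch under stated assumptions: P(liked movie) = 3/5 and P(slept | did not like movie) = 3/4 are not printed on the slides but follow from the joint table (0.2 + 0.4 and 0.3 / 0.4). All names are illustrative.

```python
from fractions import Fraction

# P(B): marginal over "liked movie" (derived from the joint table).
p_liked = {1: Fraction(3, 5), 0: Fraction(2, 5)}

# P(A = 1 | B): probability of having slept, given "liked movie".
p_slept_given_liked = {1: Fraction(1, 3), 0: Fraction(3, 4)}


def joint(liked, slept):
    """Chain rule: P(slept, liked) = P(slept | liked) * P(liked)."""
    p_slept = p_slept_given_liked[liked]
    conditional = p_slept if slept == 1 else 1 - p_slept
    return conditional * p_liked[liked]


table = {(l, s): joint(l, s) for l in (1, 0) for s in (1, 0)}
for key, value in table.items():
    print(key, value)
```

The four products are 1/5, 2/5, 3/10, and 1/10, which are the table entries 0.2, 0.4, 0.3, and 0.1.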

(23)

Bayes rule

• One of the most important rules for AI usage.

• Derived from the chain rule:

P(A,B) = P(A | B)P(B) = P(B | A)P(A)

• Thus,

P(A | B) = P(B | A)P(A) / P(B)

Thomas Bayes was an English clergyman who set out his theory of probability in 1764.

(24)

Bayes rule (cont)

Often it is useful to take the rule one step further:

P(A | B) = P(B | A)P(A) / P(B) = P(B | A)P(A) / ∑A P(B | A)P(A)

This results from marginalization, P(B) = ∑A P(B,A), combined with the chain rule, P(B,A) = P(B | A)P(A).
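The normalized form of Bayes rule can be sketched as a small function that needs only a prior P(A) and a likelihood P(B | A) for each value of A. The example values reuse the movie table from earlier slides; P(liked) = 3/5 and P(slept | disliked) = 3/4 are derived from that table rather than stated on a slide, and the function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Return P(A = a | B) for every a, given P(A = a) and P(B | A = a)."""
    unnormalized = {a: likelihood[a] * prior[a] for a in prior}
    evidence = sum(unnormalized.values())  # P(B) = sum over a of P(B | a) P(a)
    return {a: p / evidence for a, p in unnormalized.items()}


prior = {"liked": Fraction(3, 5), "disliked": Fraction(2, 5)}
slept_likelihood = {"liked": Fraction(1, 3), "disliked": Fraction(3, 4)}

result = posterior(prior, slept_likelihood)
print(result["liked"])  # 2/5
```

The result agrees with reading the table directly: P(liked, slept) / P(slept) = 0.2 / 0.5.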

(25)

Using Bayes rule

• Card game:

Place your bet on the location of the King!

(26)

Using Bayes rule

• Card game:

Do you want to change your bet?

(27)

Using Bayes rule

Computing the (posterior) probability: P(C = k | selB)

Cards: A, B, C

Bayes rule:

P(C = k | selB) = P(selB | C = k)P(C = k) / P(selB)

= P(selB | C = k)P(C = k) / [P(selB | C = k)P(C = k) + P(selB | C = 10)P(C = 10)]
(28)

Using Bayes rule

Cards: A, B, C

P(C = k | selB) = P(selB | C = k)P(C = k) / [P(selB | C = k)P(C = k) + P(selB | C = 10)P(C = 10)]

= (1/2 · 1/3) / (1/2 · 1/3 + 1/2 · 2/3)

= 1/3
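The posterior can also be obtained by enumerating the possible King locations. The transcript does not spell out the rules of the game, so this sketch assumes the following: one King and two 10s are placed uniformly at random, the player bets on card C, and the dealer then turns over a card that is neither the player's bet nor the King (choosing at random when two such cards exist), with selB meaning that card B was turned over. These rules and all names in the code are assumptions made for illustration.

```python
from fractions import Fraction

CARDS = ("A", "B", "C")


def reveal_likelihood(king, bet, revealed):
    """P(dealer turns over `revealed` | King location, player's bet)."""
    # The dealer may only show a card that is neither the bet nor the King.
    options = [c for c in CARDS if c != bet and c != king]
    if revealed not in options:
        return Fraction(0)
    return Fraction(1, len(options))


def posterior_king(bet="C", revealed="B"):
    """P(King location | dealer revealed `revealed`), via Bayes rule."""
    prior = {c: Fraction(1, 3) for c in CARDS}
    unnormalized = {
        c: reveal_likelihood(c, bet, revealed) * prior[c] for c in CARDS
    }
    evidence = sum(unnormalized.values())  # P(selB)
    return {c: p / evidence for c, p in unnormalized.items()}


posterior = posterior_king()
print(posterior["C"], posterior["A"])  # 1/3 2/3
```

Under these assumptions, staying with C wins with probability 1/3, matching the slide, while switching to A wins with probability 2/3.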
(29)

Important points

• Random variables

• Chain rule

• Bayes rule

• Joint distribution, independence, conditional independence

(30)

Joint distributions

• The joint distribution gives the probability that a set of random variables takes a specific combination of values.

• Requires a joint probability table to specify the possible assignments

• The table can grow very rapidly …

Liked movie   Slept   P
1             1       0.2
1             0       0.4
0             0       0.1
0             1       0.3
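To make the growth concrete, the sketch below enumerates every assignment of n binary variables; each assignment is one entry of the joint probability table. The function name `joint_table_assignments` is illustrative.

```python
from itertools import product


def joint_table_assignments(num_binary_vars):
    """List every assignment of 0/1 values to the given number of variables."""
    return list(product((0, 1), repeat=num_binary_vars))


# Two variables (liked movie, slept) give the four entries shown above.
print(len(joint_table_assignments(2)))  # 4

# The count doubles with every added binary variable.
for n in (5, 10, 20):
    print(n, 2 ** n)
```

With 20 binary variables the table already has 1,048,576 entries.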

How can we decrease the number of columns in the table?
