Probabilistic programming and its applications

(1)

Probabilistic Programming

and its Applications

Luc De Raedt

(2)

A key question in AI:

Dealing with uncertainty

Reasoning with

relational data

Learning

(3)

A key question in AI:

Dealing with uncertainty

Reasoning with

relational data

Learning

?

• logic

• databases

• programming

• ...

(4)

A key question in AI:

Dealing with uncertainty

Reasoning with

relational data

Learning

?

• logic

• databases

• programming

• ...

• probability theory

• graphical models

• ...

(5)

A key question in AI:

Dealing with uncertainty

Reasoning with

relational data

Learning

?

• logic

• databases

• programming

• ...

• probability theory

• graphical models

• ...

• parameters

• structure

(6)

A key question in AI:

Dealing with uncertainty

Reasoning with

relational data

Learning

Statistical relational learning

?

• logic

• databases

• programming

• ...

• probability theory

• graphical models

• ...

• parameters

• structure

(7)

(8)

Dynamics: Evolving Networks

• Travian

: A massively multiplayer real-time strategy game

• Commercial game run by TravianGames GmbH

• ~3.000.000 players spread over different “worlds”

• ~25.000 players in one world

(9)

World Dynamics

border border border border Alliance 2 Alliance 3 Alliance 4 Alliance 6 P 2 1081 895 1090 1090 1093 915 1081 1040 1077 955 1073 804 1054 942 1087 621 P 3 744 P 5 P 6 950 644 985 932 837 871 777 P 7 946 878 864 913 P 9

Fragment of world with

~10 alliances

~200 players

~600 cities

alliances color-coded

Can we build a model

of this world ?

Can we use it for playing

better ?

(10)

World Dynamics

border border border border Alliance 2 Alliance 4 Alliance 6 P 2 904 1090 917 959 1073 820 946 1087 632 P 3 761 961 1061 988 924 P 5 951 948 938 P 6 950 644 985 888 844 875 783 P 7 946 878 864 913

Fragment of world with

~10 alliances

~200 players

~600 cities

alliances color-coded

Can we build a model

of this world ?

Can we use it for playing

better ?

(11)

World Dynamics

border border border border Alliance 2 Alliance 4 Alliance 6 P 2 918 1090 931 977 835 958 1087 701 P 3 838 947 1026 1081 1002 987 994 P 5 1032 1024 1049 905 P 6 986 712 985 920 877 807 P 7 895 959 P 10 824

Fragment of world with

~10 alliances

~200 players

~600 cities

alliances color-coded

Can we build a model

of this world ?

Can we use it for playing

better ?

(12)

Analyzing

Video Data

• Track people or objects over

time? Even if temporarily

hidden?

• Recognize activities?

• Infer object properties?

Fig. 4. Tracking results from experiment 2. In frame 5, two groups are present. In frame 15, the tracker has correctly split group 1 into 1-0 and 1-1 (see Fig. 3). Between frames 15 and 29, group 1-0 has split up into groups 1-0-0 and 1-0-1, and split up again. New groups, labeled 2 and 3, enter the field of view in frames 21 and 42 respectively.

Six frames of the current best hypothesis from experiment 2 are shown in Fig. 4, the corresponding hypothesis tree is shown in Fig. 3. The sequence exemplifies movement and formation of several groups.

A. Clustering Error

Given the ground truth information on a per-beam basis we can compute the clustering error of the tracker. This is done by counting how often a track’s set of points _P contains too many or wrong points (undersegmentation) and how often _P is missing points (oversegmentation) compared to the ground truth. Two examples for oversegmentation errors can be seen

0 0.1 0.2 0.3 0.4 0.5 0.6 0.5 1 1.5 2 2.5 3 3.5

Error rates per track and frame

Clustering distance threshold d_P (m) w/o tracking Overs. + Unders. Oversegm. Undersegm. 0 0.2 0.4 0.6 0.8 1 0 4 8 12 16 20

Avg. cycle time (sec)

Number of people in ground truth Group tracker

People tracker

Fig. 5. Left: clustering error of the group tracker compared to a memory-less single linkage clustering (without tracking). The smallest error is achieved for a cluster distance of 1.3 m which is very close to the border of personal and social space according to the proxemics theory, marked at 1.2 m by the vertical line. Right: average cycle time for the group tracker versus a tracker for individual people plotted against the ground truth number of people.

relations can be determined in such cases.

For experiment 1, the resulting percentages of incorrectly clustered tracks for the cases undersegmentation, overseg-mentation and the sum of both are shown in Fig. 5 (left), plotted against the clustering distance dP . The figure also

shows the error of a single-linkage clustering of the range data as described in section II. This implements a memory-less group clustering approach against which we compare the clustering performance of our group tracker.

The minimum clustering error of 3.1% is achieved by the tracker at dP = 1.3 m. The minimum error for the

memory-less clustering is 7.0%, more than twice as high. In the more complex experiment 2, the minimum clustering error of the tracker rises to 9.6% while the error of the memory-less clustering reaches 20.2%. The result shows that the group tracking problem is a recursive clustering problem that requires integration of information over time. This occurs when two groups approach each other and pass from opposite directions. The memory-less approach would merge them immediately while the tracking approach, accounting for the velocity information, correctly keeps the groups apart.

In the light of the proxemics theory the result of a minimal clustering error at 1.3 m is noteworthy. The theory predicts that when people interact with friends, they maintain a range of distances between 45 to 120 cm called personal space. When engaged in interaction with strangers, this distance is larger. As our data contains students who tend to know each other well, the result appears consistent with Hall’s findings.

B. Tracking Efficiency

When tracking groups of people rather than individuals, the assignment problems in the data association stage are of course smaller. On the other hand, the introduction of an additional tree level on which different models hypoth-esize over different group formation processes comes with additional computational costs. We therefore compare our system with a person-only tracker which is implemented by

8

[Skarlatidis et al, TPLP 14; Nitti et al, IROS 13, ICRA 14]

(13)

Learning relational affordances

Learn probabilistic model

From two object interactions

Generalize to N

Shelf

Shelf grasp

(14)

Learning relational affordances

Learn probabilistic model

From two object interactions

Generalize to N

Shelf

Shelf grasp

(15)

Example:

(16)

Example:

Information Extraction

(17)

Example:

Information Extraction

(18)

Entity Resolution

M.

Bruynooghe

Automatically extracted co-author network:

which nodes refer to the same person?

A. Kimmig

L. De Raedt

Luc D. Raedt

(19)

Homer

Marge

Bart

Lisa

Lenny

Apu

Moe

Maggie

?

_?

?

+$5

-$3

Which

strategy

gives the

maximum

expected utility

?

Viral Marketing

Which advertising

strategy maximizes

expected profit?

(20)

codes for gene protein pathway cellular component homologgroup phenotype biological process locus is homologous to participates in participates in is located in is related to refers to belongs to is found in is found in participates in refers to Biomine

Biological Networks

(21)

• Structured environments

• objects, and

• relationships

amongst them

• and possibly

• using background knowledge

• cope with

uncertainty

• learn from data

This requires dealing with

Sta

tis

tica

l Rel

ati

ona

l L

ea

rni

ng

Pr

oba

bil

isti

c P

ro

gra

mmi

ng

(22)

Common theme

Dealing with

uncertainty

Reasoning with

relational data

Learning

(23)

Some formalisms

Some SRL formalisms

LPAD: Bruynooghe Vennekens,Verbaeten Markov Logic: Domingos,

Richardson CLP(BN): Cussens,Page, Present PRMs: Friedman,Getoor,Koller, Pfeffer,Segal,Taskar ´03 SLPs: Cussens,Muggleton ´90 ´95 96 First KBMC approaches: Breese, Bacchus, Charniak, Glesner, Goldman, Koller, Poole, Wellmann ´00 BLPs: Kersting, De Raedt RMMs: Anderson,Domingos, Weld

LOHMMs: De Raedt, Kersting, Raiko

Future

Prob. CLP: Eisele, Riezler

´02

PRISM: Kameya, Sato

´94 PLP: Haddawy, Ngo ´97 ´93 Prob. Horn Abduction: Poole ´99 1BC(2): Flach, Lachiche

Logical Bayesian Networks: Blockeel,Bruynooghe, Fierens,Ramon,

(24)

Common theme

Dealing with

uncertainty

Reasoning with

relational data

Learning

Statistical relational learning

• many different formalisms

• our focus: probabilistic

 

(25)

ProbLog

probabilistic Prolog

Dealing with

uncertainty

Reasoning with

relational data

Learning

(26)

Prolog / logic

programming

ProbLog

probabilistic Prolog

Dealing with

uncertainty

Learning

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X). 

(27)

Prolog / logic

programming

ProbLog

probabilistic Prolog

Dealing with

uncertainty

Learning

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X). 

one world

(28)

Prolog / logic

programming

atoms as random

variables

ProbLog

probabilistic Prolog

Learning

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  0.8::stress(ann). 0.6::influences(ann,bob). 0.2::influences(bob,carl).

one world

(29)

Prolog / logic

programming

atoms as random

variables

ProbLog

probabilistic Prolog

Learning

one world

(30)

parameter learning,

adapted relational

learning techniques

Prolog / logic

programming

atoms as random

variables

ProbLog

probabilistic Prolog

one world

(31)

Probabilistic Logic

Programming

Distribution Semantics [Sato, ICLP 95]:

probabilistic choices + logic program

→

distribution over possible worlds

e.g., PRISM, ICL, ProbLog, LPADs, CP-logic, ...

multi-valued

switches

probabilistic

facts

causal-probabilistic

laws

(32)

Roadmap

• Modeling (ProbLog and Church, another

representative of PP)

• Inference

• Learning

(33)

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

(34)

0.4 :: heads.

 

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

probabilistic fact

: heads is true with

(35)

0.4 :: heads.

 

0.3 :: col(1,red); 0.7 :: col(1,blue).

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

annotated disjunction

: first ball is red

(36)

0.4 :: heads.

 

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

 

0.5 :: col(2,blue).

 

annotated disjunction

: second ball is red with

probability 0.2, green with 0.3, and blue with 0.5

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

(37)

0.4 :: heads.

 

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

 

0.5 :: col(2,blue).

 

win :- heads, col(_,red).

logical rule

encoding

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

(38)

0.4 :: heads.

 

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

 

0.5 :: col(2,blue).

 

win :- heads, col(_,red).

logical rule

encoding

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

(39)

0.4 :: heads.

 

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

 

0.5 :: col(2,blue).

 

win :- heads, col(_,red).

ProbLog by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(40)

Questions

• Probability of

_win

?

 

• Probability of

_win

given

_col(2,green)

?

 

• Most probable world where

_win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

(41)

Questions

• Probability of

_win

?

 

• Probability of

_win

given

_col(2,green)

?

 

• Most probable world where

_win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

marginal probability

query

(42)

Questions

• Probability of

_win

?

 

• Probability of

_win

given

_col(2,green)

?

 

• Most probable world where

_win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

marginal probability

conditional probability

evidence

(43)

Questions

• Probability of

_win

?

 

• Probability of

_win

given

_col(2,green)

?

 

• Most probable world where

_win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

marginal probability

conditional probability

(44)

Possible Worlds

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

(45)

Possible Worlds

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

(46)

Possible Worlds

H

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

(47)

Possible Worlds

H

R

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

×

0.3

0.4

(48)

Possible Worlds

H

R

×

0.3

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

×

0.3

0.4

(49)

Possible Worlds

H

W

R

×

0.3

0.4 :: heads. 0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

×

0.3

0.4

(50)

Possible Worlds

H

W

R

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

(51)

Possible Worlds

R

H

W

R

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3

(52)

Possible Worlds

R R

H

W

R

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2

(53)

Possible Worlds

W

R R

H

W

R

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2

(54)

Possible Worlds

W

R R

H

W

R

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2 (1

−

0.4)

(55)

Possible Worlds

W

R R

H

W

R _R

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2 (1

−

0.4)

×

0.3

(56)

Possible Worlds

W

R R

H

W

R _R _G

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2 (1

−

0.4)

×

0.3 ×

0.3

(57)

Possible Worlds

W

R R

H

W

R _R _G

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2 (1

−

0.4)

×

0.3 ×

0.3

(58)

Possible Worlds

W

R R

H

W

R _R _G

×

0.3

win :- col(1,C), col(2,C).

×

0.3

0.4 (1

−

0.4)

×

0.3 ×

0.2 (1

−

0.4)

×

0.3 ×

0.3

(59)

All Possible Worlds

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

(60)

Most likely world

where

_win

is true?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 MPE Inference

(61)

Most likely world

where

_win

is true?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 MPE Inference

(62)

MPE Inference

Most likely world where

col(2,blue)

is false?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

(63)

MPE Inference

Most likely world where

col(2,blue)

is false?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

(64)

P(win)=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ?

Probability

Marginal

(65)

P(win)=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ∑

Probability

Marginal

(66)

P(win)=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ∑

=0.562

Probability

Marginal

(67)

P(win|col(2,green))=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ?

Conditional

Probability

(68)

=P(win

∧

col(2,green))/P(col(2,green))

P(win|col(2,green))=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ∑

/

∑

Conditional

Probability

(69)

=P(win

∧

col(2,green))/P(col(2,green))

P(win|col(2,green))=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ∑

/

∑

Conditional

Probability

(70)

P(win|col(2,green))=

=0.036/0.3=0.12

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210 ∑

/

∑

Conditional

Probability

(71)

Alternative view:

CP-Logic

[Vennekens et al, ICLP 04]

throws(john).

0.5::throws(mary).

0.8 :: break :- throws(mary). 0.6 :: break :- throws(john).

probabilistic causal laws

John throws

Window breaks

doesn’t break

Mary throws

_{Mary throws}

doesn’t throw

_{doesn’t throw}

1.0

0.6

0.4

0.5

0.8

0.2

(72)

• Discrete- and continuous-valued random variables

Distributional Clauses (DC)

(73)

• Discrete- and continuous-valued random variables

Distributional Clauses (DC)

length(Obj) ~ gaussian(6.0,0.45) :- type(Obj,glass).

random variable

with Gaussian distribution

(74)

• Discrete- and continuous-valued random variables

Distributional Clauses (DC)

stackable(OBot,OTop) :-  

≃length(OBot) ≥ ≃length(OTop),  

≃width(OBot) ≥ ≃width(OTop).

comparing

values of

random variables

(75)

• Discrete- and continuous-valued random variables

Distributional Clauses (DC)

ontype(Obj,plate) ~ finite([0 : glass, 0.0024 : cup, 

0 : pitcher, 0.8676 : plate, 

0.0284 : bowl, 0 : serving,  

0.1016 : none])  

:- obj(Obj), on(Obj,O2), type(O2,plate).

random variable

with

(76)

• Discrete- and continuous-valued random variables

Distributional Clauses (DC)

ontype(Obj,plate) ~ finite([0 : glass, 0.0024 : cup, 

0 : pitcher, 0.8676 : plate, 

0.0284 : bowl, 0 : serving,  

0.1016 : none])  

:- obj(Obj), on(Obj,O2), type(O2,plate).

(77)

Dealing with

uncertainty

Reasoning with

(78)

Dealing with

uncertainty

relational

Probabilistic Databases

person city ann london bob york bornIn city country london uk york uk paris usa cityIn

select x.person, y.country from bornIn x, cityIn y

where x.city=y.city

(79)

Dealing with

uncertainty

relational

Probabilistic Databases

one world

person city ann london bob york bornIn city country london uk york uk paris usa cityIn

(80)

relational

tuples as random

variables

Probabilistic Databases

one world

person city ann london bob york bornIn city country london uk york uk paris usa cityIn person city P ann london 0,87 bob york 0,95 eve new york 0,9 tom paris 0,56 bornIn city country P london uk 0,99 york uk 0,75 paris usa 0,4 cityIn

(81)

relational

tuples as random

variables

Probabilistic Databases

one world

several possible worlds

(82)

relational

tuples as random

variables

Probabilistic Databases

one world

several possible worlds

probabilistic tables + database queries

→

distribution over possible worlds

(83)

Example:

Information Extraction

(84)

• probabilistic

choices

+ their

consequences

• probability distribution over

possible worlds

• how to efficiently answer

questions

?

• most probable world (MPE inference)

• probability of query (computing marginals)

• probability of query given evidence

(85)

• input database: ground facts

• probabilistic facts

• annotated disjunctions

 

• flexible probabilities

 

• Prolog clauses

person(bob). 0.5::stress(bob). 0.5::stress(X) :- person(X). 0.4::a(X); 0.3::b(X); 0.2::c(X); 0.1::d(X) :- q(X). 0.5::weather(sun,0) ; 0.5::weather(rain,0). P::pack(Item) :- weight(Item,W), P is 1.0/W.

(86)

(87)

Some Probabilistic Programming

Languages outside LP

• IBAL [Pfeffer 01]

• Figaro [Pfeffer 09]

• Church [Goodman et al 08 ]

• BLOG [Milch et al 05]

• Venture [Mansingha et al.]

• Anglican and Probabilistic-C [Wood et al].

(88)

Church

probabilistic functional

programming

Dealing with

uncertainty

Reasoning with

relational data

Learning

[Goodman et al, UAI 08]

(89)

functional

programming

Church

probabilistic functional

programming

Dealing with

uncertainty

Learning

(define plus5 (lambda (x) (+ x 5)))

(90)

functional

programming

Church

probabilistic functional

programming

Dealing with

uncertainty

Learning

one execution

(91)

functional

programming

random

primitives

Church

probabilistic functional

programming

Learning

(define randplus5

(lambda (x) (if (flip 0.6)  

(+ x 5) x)))

(map randplus5 '(1 2 3))

one execution

(92)

functional

programming

random

primitives

Church

probabilistic functional

programming

Learning

(+ x 5) x))) (map randplus5 '(1 2 3))

one execution

several

possible

executions

(93)

functional

programming

random

primitives

Church

probabilistic functional

programming

Learning

(+ x 5) x))) (map randplus5 '(1 2 3))

one execution

several

possible

executions

probabilistic primitives + functional program

→

distribution over possible executions

[Goodman et al, UAI 08]

(94)

Church Example

(define randplus5 (lambda (x) (if (flip 0.6) (+ x 5) x))) (map randplus5 '(1 2 3))

function

arguments

random primitive

P(true) = 0.6

definition

list

apply function

to each element

(95)

Church Example

function

arguments

random primitive

P(true) = 0.6

definition

list

apply function

to each element

(96)

Church Example

function

arguments

random primitive

P(true) = 0.6

definition

list

apply function

to each element

(97)

Computing probability

distribution

(

enumeration-query

(define randplus5 (lambda (x) (if (flip 0.6) (+ x 5) x))) (map randplus5 '(1 2))

true )

enumerates all executions &

sums probabilities per result

evidence

query

(98)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

(99)

(define heads (mem (lambda () (flip 0.4))))

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

(100)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(101)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(define color1 (mem (lambda () (if (flip 0.3) 'red 'blue))))

(define color2 (mem (lambda ()  

(102)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(multinomial '(red green blue) '(0.2 0.3 0.5))))) (define redball (or (equal? (color1) 'red) (equal? (color2) 'red)))

(103)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(multinomial '(red green blue) '(0.2 0.3 0.5))))) (define redball (or (equal? (color1) 'red) (equal? (color2) 'red))) (define win1 (and (heads) redball))

(104)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(105)

Church by example:

A bit of gambling

h

• toss (biased) coin & draw ball from each urn

• win if (heads and a red ball) or (two balls of same color)

(106)

Probabilistic

Programming Summary

• Church: functional programming + random primitives

• probabilistic generative model

• stochastic memoization

• sampling

• increasing number of probabilistic programming

languages using various underlying paradigms

(107)

Roadmap

• Modeling (ProbLog and Church, another

representative of PP)

• Inference

• Learning

(108)

Answering Questions

program

queries

evidence

marginal

probabilities

conditional

probabilities

MPE state

Given:

Find:

?

(109)

Answering Questions

program

queries

evidence

marginal

probabilities

conditional

probabilities

MPE state

Given:

Find:

?

possible worlds

 

inf

easible

(110)

Answering Questions

program

queries

evidence

marginal

probabilities

conditional

probabilities

MPE state

Given:

Find:

?

possible worlds

 

inf

easible

logical reasoning

probabilistic inference

data structure

(111)

Answering Questions

program

queries

evidence

marginal

probabilities

conditional

probabilities

MPE state

Given:

Find:

?

possible worlds

 

inf

easible

logical reasoning

probabilistic inference

data structure

knowledge

compilation

(112)

Answering Questions

program

queries

evidence

marginal

probabilities

conditional

probabilities

MPE state

Given:

Find:

?

possible worlds

 

inf

easible

logical reasoning

probabilistic inference

data structure

1. using proofs

2. using models

knowledge

compilation

(113)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y).

(114)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl).

(115)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl).

(116)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y).

(117)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y).

(118)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob).

(119)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob).

(120)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1).

(121)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1).

(122)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann).

(123)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann). ?- stress(ann).

(124)

Logical Reasoning:

Proofs in Prolog

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann). ?- influences(Y2,ann),smokes(Y2). ?- stress(ann).

(125)

Logical Reasoning:

Proofs in Prolog

(126)

Logical Reasoning:

Proofs in Prolog

(127)

Logical Reasoning:

Proofs in Prolog

proof = facts used in successful derivation:

Y=bob

(128)

Proofs in

ProbLog

0.8::stress(ann). 0.6::influences(ann,bob). 0.2::influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). influences(bob,carl)&influences(ann,bob)&stress(ann) ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann). ?- influences(Y2,ann),smokes(Y2). ?- stress(ann).

Y=bob

Y1=ann

(129)

influences(bob,carl)  & influences(ann,bob)  & stress(ann)

Proofs in

ProbLog

0.8::stress(ann). 0.4::stress(bob). 0.6::influences(ann,bob). 0.2::influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann). ?- influences(Y2,ann),smokes(Y2). ?- stress(ann).

Y=bob

Y1=ann

0.2 ×

0.6 ×

0.8 = 0.096

(130)

Proofs in

ProbLog

Y=bob

Y1=ann

0.2 ×

0.6 ×

0.8 = 0.096

(131)

Proofs in

ProbLog

Y=bob

Y1=ann

influences(bob,carl)  & stress(bob)

0.2 ×

0.6 ×

0.8 = 0.096

0.2 ×

0.4 = 0.08

(132)

Proofs in

ProbLog

Y=bob

Y1=ann

influences(bob,carl)  & stress(bob)

0.2 ×

0.6 ×

0.8 = 0.096

0.2 ×

0.4 = 0.08

proofs overlap!

 

(133)

infl(bob,carl) & infl(ann,bob) & st(ann) & \+st(bob)  

infl(bob,carl) & infl(ann,bob) & st(ann) & st(bob) 

infl(bob,carl) & \+infl(ann,bob) & st(ann) & st(bob) 

infl(bob,carl) & infl(ann,bob) & \+st(ann) & st(bob) 

infl(bob,carl) & \+infl(ann,bob) & \+st(ann) & st(bob) 

...

Disjoint-Sum-Problem

(134)

...

Disjoint-Sum-Problem

influences(bob,carl) &

influences(ann,bob) & stress(ann)

(135)

...

Disjoint-Sum-Problem

influences(bob,carl) & stress(bob)

(136)

...

Disjoint-Sum-Problem

possible worlds

(137)

...

Disjoint-Sum-Problem

possible worlds

sum of proof probabilities: 0.096+0.08 = 0.1760

0.0576

0.0384

0.0256

0.0096

0.0064

∑

= 0.1376

(138)

...

Disjoint-Sum-Problem

possible worlds

sum of proof probabilities: 0.096+0.08 = 0.1760

0.0576

0.0384

0.0256

0.0096

0.0064

∑

= 0.1376

(139)

Binary Decision Diagrams

i(b,c)

0

1

i(a,b) s(a) s(b)

• compact graphical

representation of

Boolean formula

• automatically

disjoins proofs

• popular in many

branches of CS

[Bryant 86]

(140)

Binary Decision Diagrams

i(b,c)

0

1

i(a,b) s(a) s(b) influences(bob,carl) & stress(bob)

• compact graphical

representation of

Boolean formula

• automatically

disjoins proofs

• popular in many

branches of CS

[Bryant 86]

(141)

Binary Decision Diagrams

i(b,c)

0

1

• compact graphical

representation of

Boolean formula

• automatically

disjoins proofs

• popular in many

branches of CS

[Bryant 86]

(142)

Binary Decision Diagrams

i(b,c)

0

1

• compact graphical

representation of

Boolean formula

• automatically

disjoins proofs

• popular in many

branches of CS

[Bryant 86]

(143)

Binary Decision Diagrams

• compact graphical representation of Boolean

formula

• popular in many branches of CS

• automatically disjoins proofs

 

→

efficient probability computation

• other representations exist (SDDs, d-DNNFs)

• knowledge compilation is state of the art for

probabilistic inference (Darwiche et al.)

(144)

Initial Approach

(ProbLog1 & others)

Find all proofs of query

calculate marginal by

dynamic programming

Binary Decision

Diagram (BDD)

(145)

Initial Approach

(ProbLog1 & others)

Find all proofs of query

calculate marginal by

dynamic programming

Binary Decision

Diagram (BDD)

0.4::heads(1). 0.7::heads(2). 0.5::heads(3). win :- heads(1). win :- heads(2),heads(3).