Probabilistic programming and its applications

(1)

Probabilistic Programming

and its Applications

Luc De Raedt

KI 2017

(2)

A key question in AI:

Dealing with uncertainty

Reasoning with

relational data

Learning

(3)

Reasoning with

relational data

Learning

?

•

logic

•

databases

•

programming

•

...

(4)

Reasoning with

relational data

Learning

?

•

logic

•

databases

•

programming

•

...

•

probability theory

•

graphical models

•

...

(5)

Reasoning with

relational data

Learning

?

•

logic

•

databases

•

programming

•

...

•

probability theory

•

graphical models

•

...

•

parameters

•

structure

(6)

Dealing with uncertainty

Reasoning with

relational data

Learning

Statistical relational learning, probabilistic logic

?

•

logic

•

databases

•

programming

•

...

•

probability theory

•

graphical models

•

...

•

parameters

•

structure

(7)

Example:

(8)

Example:

Information Extraction

(9)

Example:

Information Extraction

(10)

codes for gene protein pathway cellular component homologgroup phenotype biological process locus is homologous to participates in participates in is located in is related to refers to belongs to is found in is found in participates in refers to Biomine

Networks of Uncertain

Information

(11)

Biomine

network

(12)

Biomine

network

(13)

(14)

Biomine Network

presenilin 2

Gene

EntrezGene:81751

Notch receptor processing

BiologicalProcess

(15)

Biomine Network

-participates_in

0.220

(16)

Biomine Network

-participates_in

0.220

BiologicalProces

•

different types of nodes & links

•

automatically extracted from text,

databases, ...

•

probabilities quantifying source

reliability, extractor confidence, ...

•

similar in other contexts, e.g.,

linked open data, NELL@CMU, ...

(17)

Can we find the mechanism

connecting causes to effects?

Molecular interaction

networks

(18)

Dynamic networks

border border border border Alliance 2 Alliance 3 Alliance 4 Alliance 6 P 2 1081 895 1090 1090 1093 1084 915 1081 1040 1077 955 1073 804 1054 830 942 1087 621 P 3 744 748 559 P 5 P 6 950 644 985 932 837 871 777 P 7 946 878 864 913 P 9

Travian

: A massively multiplayer

real-time strategy game

Can we build a model

of this world ?

Can we use it for playing

better ?

(19)

Dynamic networks

Travian

: A massively multiplayer

real-time strategy game

Can we build a model

of this world ?

Can we use it for playing

better ?

border border border border Alliance 2 Alliance 4 P 2 948 951 990 856 795 980 730 P 3 898 803 860 964 1037 1085 689 1005 1007 899 1005 760 P 5 1051 1051 1040 860 774 1061 886 844 945 713 P 10 839 796 838

(20)

Learning relational affordances

Learn probabilistic model

From two object interactions

Generalize to N

Shelf

Shelf grasp

(21)

Learning relational affordances

Learn probabilistic model

From two object interactions

Generalize to N

Shelf

Shelf grasp

(22)

Answering Probability

Questions

Our goal

Mike has a bag of marbles with 4 white, 8 blue, and 6 red marbles. He pulls out one marble from the bag and it is red. What is the probability that the second marble he pulls out of the bag is white?

The answer is 0.235941.

(23)

Common theme

Dealing with

uncertainty

Reasoning with

relational data

Learning

(24)

Some formalisms

Some SRL formalisms

LPAD: Bruynooghe Vennekens,Verbaeten Markov Logic: Domingos,

Richardson CLP(BN): Cussens,Page, Present PRMs: Friedman,Getoor,Koller, Pfeffer,Segal,Taskar ´03 SLPs: Cussens,Muggleton ´90 ´95 96 First KBMC approaches: Breese, Bacchus, Charniak, Glesner, Goldman, Koller, Poole, Wellmann ´00 BLPs: Kersting, De Raedt RMMs: Anderson,Domingos, Weld

LOHMMs: De Raedt, Kersting, Raiko

Future

Prob. CLP: Eisele, Riezler

´02

PRISM: Kameya, Sato ´94 PLP: Haddawy, Ngo ´97 ´93 Prob. Horn Abduction: Poole ´99 1BC(2): Flach, Lachiche

Logical Bayesian Networks: Blockeel,Bruynooghe, Fierens,Ramon,

(25)

Many different angles

•

Probabilistic programming

•

Logic programming and probabilistic

databases (ProbLog and DS as

representatives)

•

Functional and imperative (Church

as representatives)

•

Statistical relational AI and learning

(26)

ProbLog

probabilistic Prolog

Dealing with

uncertainty

Reasoning with

relational data

Learning

(27)

Prolog / logic

programming

ProbLog

Dealing with

uncertainty

Learning

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X). 

(28)

Prolog / logic

programming

ProbLog

Dealing with

uncertainty

Learning

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X). 

one world

(29)

Prolog / logic

programming

atoms as random

variables

ProbLog

Learning

stress(ann). influences(ann,bob). influences(bob,carl). smokes(X) :- stress(X).  0.8::stress(ann). 0.6::influences(ann,bob). 0.2::influences(bob,carl).

one world

(30)

Prolog / logic

programming

atoms as random

variables

ProbLog

Learning

one world

(31)

Prolog / logic

programming

atoms as random

variables

ProbLog

Learning

one world

several possible worlds

Distribution Semantics [Sato, ICLP 95]:

probabilistic choices + logic program

(32)

parameter learning,

adapted relational

learning techniques

Prolog / logic

programming

atoms as random

variables

ProbLog

one world

(33)

Probabilistic Logic

Programming

→

distribution over possible worlds

e.g., PRISM, ICL, ProbLog, LPADs, CP-logic, PITA, ...

multi-valued

switches

probabilistic

facts

causal-probabilistic

laws

(34)

Roadmap

1. Modeling

2. Inference

3. Learning

(35)

(36)

ProbLog by example:

A bit of gambling

h

•

toss (biased) coin & draw ball from each urn

(37)

0.4 :: heads.

ProbLog by example:

A bit of gambling

h

•

win if (heads and a red ball) or (two balls of same color)

probabilistic fact

: heads is true with

(38)

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

ProbLog by example:

A bit of gambling

h

•

annotated disjunction

: first ball is red

(39)

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

0.5 :: col(2,blue).

annotated disjunction

: second ball is red with

probability 0.2, green with 0.3, and blue with 0.5

ProbLog by example:

A bit of gambling

h

•

(40)

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

0.5 :: col(2,blue).

win :- heads, col(_,red).

logical rule

encoding

ProbLog by example:

A bit of gambling

h

•

(41)

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

0.5 :: col(2,blue).

logical rule

encoding

ProbLog by example:

A bit of gambling

h

•

(42)

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green);

0.5 :: col(2,blue).

ProbLog by example:

A bit of gambling

h

•

(43)

Questions

•

Probability of

win

?

•

Probability of

win

given

col(2,green)

?

•

Most probable world where

win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

(44)

Questions

•

Probability of

win

?

•

Probability of

win

given

col(2,green)

?

•

win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

marginal probability

query

(45)

Questions

•

Probability of

win

?

•

Probability of

win

given

col(2,green)

?

•

win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

marginal probability

conditional probability

evidence

(46)

Questions

•

Probability of

win

?

•

Probability of

win

given

col(2,green)

?

•

win

is true?

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

marginal probability

conditional probability

(47)

Possible Worlds

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

(48)

Possible Worlds

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

(49)

Possible Worlds

H

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

(50)

Possible Worlds

H

R

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

×

0.3

0.4

(51)

Possible Worlds

H

R

×

0.3

0.4 :: heads.

0.3 :: col(1,red); 0.7 :: col(1,blue).

win :- col(1,C), col(2,C).

×

0.3

0.4

(52)

Possible Worlds

H

W

R

×

0.3

0.4 :: heads. 0.3 :: col(1,red); 0.7 :: col(1,blue).

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(53)

Possible Worlds

H

W

R

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

(54)

Possible Worlds

R

H

W

R

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

(55)

Possible Worlds

R R

H

W

R

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(56)

Possible Worlds

W

R R

H

W

R

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(57)

Possible Worlds

W

R R

H

W

R

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(1

−

0.4)

(58)

Possible Worlds

W

R R

H

W

R _R

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(1

−

0.4)

×

0.3

(59)

Possible Worlds

W

R R

H

W

R _R _G

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(1

−

0.4)

×

0.3

×

0.3

(60)

Possible Worlds

W

R R

H

W

R _R _G

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(1

−

0.4)

×

0.3

×

0.3

(61)

Possible Worlds

W

R R

H

W

R _R _G

×

0.3

0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue). win :- heads, col(_,red).

win :- col(1,C), col(2,C).

×

0.3

0.4

(1

−

0.4)

×

0.3

×

0.2

(1

−

0.4)

×

0.3

×

0.3

(62)

All Possible Worlds

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

(63)

Most likely world

where

win

is true?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

MPE Inference

(64)

Most likely world

where

win

is true?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

MPE Inference

(65)

MPE Inference

Most likely world where

col(2,blue)

is false?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

(66)

MPE Inference

Most likely world where

col(2,blue)

is false?

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

(67)

P(win)=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

?

Probability

Marginal

(68)

P(win)=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

∑

Probability

Marginal

(69)

P(win)=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

∑

=0.562

Probability

Marginal

(70)

P(win|col(2,green))=

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

?

Conditional

Probability

(71)

=P(win

∧

col(2,green))/P(col(2,green))

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

∑

/

∑

Conditional

Probability

(72)

=P(win

∧

col(2,green))/P(col(2,green))

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

∑

/

∑

Conditional

Probability

(73)

=0.036/0.3=0.12

W

R R

H

R B

H

W

R G

H

W

R R R G R B

H

B B

H

B G

H

W

R B B R G B B B

0.024

0.036

0.060

0.036

0.054

0.090

0.056

0.084

0.126

0.140

0.210

∑

/

∑

Conditional

Probability

(74)

Distribution Semantics

(with probabilistic facts)

[Sato, ICLP 95]

P

(Q) =

X

F

[

R

|

=

Q

Y

f

2

F

p(f

)

Y

f

62

F

1

p(f

)

(75)

[Sato, ICLP 95]

P

(Q) =

X

F

[

R

|

=

Q

Y

f

2

F

p(f

)

Y

f

62

F

1

p(f

)

query

(76)

[Sato, ICLP 95]

P

(Q) =

X

F

[

R

|

=

Q

Y

f

2

F

p(f

)

Y

f

62

F

1

p(f

)

query

subset of

probabilistic

facts

(77)

[Sato, ICLP 95]

P

(Q) =

X

F

[

R

|

=

Q

Y

f

2

F

p(f

)

Y

f

62

F

1

p(f

)

query

subset of

probabilistic

facts

Prolog

rules

(78)

[Sato, ICLP 95]

P

(Q) =

X

F

[

R

|

=

Q

Y

f

2

F

p(f

)

Y

f

62

F

1

p(f

)

query

subset of

probabilistic

facts

Prolog

rules

sum over possible worlds

where Q is true

(79)

[Sato, ICLP 95]

P

(Q) =

X

F

[

R

|

=

Q

Y

f

2

F

p(f

)

Y

f

62

F

1

p(f

)

query

subset of

probabilistic

facts

Prolog

rules

sum over possible worlds

where Q is true

probability of

possible world

(80)

Extensions of basic PLP

→

distribution over possible worlds

continuous RVs

decisions

constraints

time & dynamics

programming

constructs

semiring

labels

(81)

Discrete- and continuous-valued random variables

Distributional Clauses (DC)

length(Obj) ~ gaussian(6.0,0.45) :- type(Obj,glass).

stackable(OBot,OTop) :-

≃

length(OBot)

≥

≃

length(OTop),

≃

width(OBot)

≥

≃

width(OTop).

random variable

with Gaussian distribution

comparing

values of

(82)

(83)

(84)

(85)

Dealing with

uncertainty

Reasoning with

(86)

Dealing with

uncertainty

relational

Probabilistic Databases

person city ann london bob york bornIn city country london uk york uk paris usa cityIn

select x.person, y.country from bornIn x, cityIn y

where x.city=y.city

(87)

Dealing with

uncertainty

relational

one world

person city ann london bob york bornIn city country london uk york uk paris usa cityIn

select x.person, y.country from bornIn x, cityIn y

(88)

relational

tuples as random

variables

one world

person city ann london bob york bornIn city country london uk york uk paris usa cityIn person city P ann london 0,87 bob york 0,95

eve new york 0,9

tom paris 0,56 bornIn city country P london uk 0,99 york uk 0,75 paris usa 0,4 cityIn

select x.person, y.country from bornIn x, cityIn y

(89)

relational

tuples as random

variables

one world

eve new york 0,9

select x.person, y.country from bornIn x, cityIn y

(90)

relational

tuples as random

variables

one world

eve new york 0,9

select x.person, y.country from bornIn x, cityIn y

probabilistic tables + database queries

→

distribution over possible worlds

(91)

Cancer(A)

Smokes(A)

Friends(A,A)

Friends(B,A)

Smokes(B)

Friends(A,B)

Cancer(B)

Friends(B,B)

Suppose we have two constants:

Anna

(A) and

Bob

(B)

(92)

Markov Logic

Networks

Some key differences :

•

MLNs are not a programming language

•

MLNs are generating an undirected graphical model;

•

PPs are directed

•

MLNs allows for multiple models — p v q (needs max entropy to assign prob. to models {p}, {q}, {p,q});

•

Probabilistic Logic Programs use least-fix point semantics

•

MLNs / first order logic cannot express transitive closure of relation; logic programs can

(93)

Church

probabilistic functional

programming

Dealing with

uncertainty

Reasoning with

relational data

Learning

(94)

functional

programming

Church

programming

Dealing with

uncertainty

Learning

(define plus5 (lambda (x) (+ x 5)))

(95)

functional

programming

Church

programming

Dealing with

uncertainty

Learning

one execution

(96)

functional

programming

random

primitives

Church

programming

Learning

(define randplus5

(lambda (x) (if (flip 0.6)   (+ x 5) x)))

(map randplus5 '(1 2 3))

one execution

(97)

functional

programming

random

primitives

Church

programming

Learning

(define randplus5

(lambda (x) (if (flip 0.6)   (+ x 5) x))) (map randplus5 '(1 2 3))

one execution

several

possible

executions

(98)

functional

programming

random

primitives

Church

programming

Learning

(define randplus5

(lambda (x) (if (flip 0.6)   (+ x 5) x))) (map randplus5 '(1 2 3))

one execution

several

possible

executions

probabilistic primitives + functional program

→

distribution over possible executions

[Goodman et al, UAI 08]

(99)

Church Example

(define randplus5 (lambda (x) (if (flip 0.6) (+ x 5) x))) (map randplus5 '(1 2 3))

function

arguments

random primitive

P(true) = 0.6

definition

list

apply function

to each element

(1 2) (1 7) (6 2) (6 7)

0.16 0.24 0.24 0.36

(100)

Church Example

(define randplus5 (lambda (x) (if (flip 0.6) (+ x 5) x))) (map randplus5 '(1 2 3))

function

arguments

random primitive

P(true) = 0.6

definition

list

apply function

to each element

(1 2) (1 7) (6 2) (6 7)

0.16 0.24 0.24 0.36

(101)

(define heads (mem (lambda () (flip 0.4))))

Church by example:

A bit of gambling

h

•

(define color1 (mem (lambda () (if (flip 0.3) 'red 'blue)))) (define color2 (mem (lambda ()  

(multinomial '(red green blue) '(0.2 0.3 0.5))))) (define redball (or (equal? (color1) 'red) (equal? (color2) 'red))) (define win1 (and (heads) redball))

(102)

Roadmap

1. Modeling

2. Inference

3. Learning

(103)

(104)

Inference / Reasoning

•

Most of the work in PP and StarAI is on inference

•

It is (#P)-hard (complexity wise)

•

Many inference methods

•

exact, approximate, sampling and lifted …

•

Inference is the key to learning

(105)

Dichotomy of UCQ

Evaluation

•

U

nion of

C

onjunctive

Q

ueries

≈

Datalog without recursion and

negation

•

Theorem: UCQ evaluation is either

(106)

Dichotomy of UCQ

Evaluation

•

U

nion of

C

onjunctive

Q

ueries

≈

Datalog without recursion and

negation

•

Theorem: UCQ evaluation is either

polynomial in database size or #P-hard

(107)

polynomial

#P-hard

(108)

s(X),i(X,Y)

polynomial

#P-hard

(109)

H0 = s(X),i(X,Y),m(Y)

s(X),i(X,Y)

polynomial

#P-hard

(110)

De Raedt, Kersting, Natarajan, Poole: Statistical Relational AI

Two Steps

•

Logical inference

-

•

proofs or model theoretic …

•

Result: Weighted Model Counting problem

•

Probabilistic propositional inference

—

•

Backtracking search — DPLL, VE, RC based

•

Knowledge Compilation

(111)

Proofs in

Pro(b)Log

0.8::stress(ann). 0.4::stress(bob). 0.6::influences(ann,bob). 0.2::influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann). ?- influences(Y2,ann),smokes(Y2). ?- stress(ann).

(112)

Proofs in

Pro(b)Log

0.8::stress(ann). 0.4::stress(bob). 0.6::influences(ann,bob). 0.2::influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y). ?- smokes(carl). ?- stress(carl). ?- influences(Y,carl),smokes(Y). ?- smokes(bob). ?- stress(bob). ?- influences(Y1,bob),smokes(Y1). ?- smokes(ann). ?- influences(Y2,ann),smokes(Y2). ?- stress(ann). 0.2×0.6×0.8   = 0.096

0.2

×

0.4

= 0.08

(113)

Answer Set

Programming

0.8::stress(ann). 0.4::stress(bob). 0.6::influences(ann,bob). 0.2::influences(bob,carl). smokes(X) :- stress(X).  smokes(X) :-   influences(Y,X),   smokes(Y).

?- smokes(carl).

(114)

Answer Set

Programming

?- smokes(carl).

•

Find relevant ground rules by backward reasoning

and grounding

(115)

Answer Set

Programming

?- smokes(carl).

•

and grounding

smokes(carl):- influences(bob,carl),smokes(bob). smokes(bob) :- stress(bob).

smokes(bob) :- influences(ann,bob),smokes(ann). smokes(ann) :- stress(ann).

(116)

Answer Set

Programming

?- smokes(carl).

•

and grounding

smokes(carl):- influences(bob,carl),smokes(bob). smokes(bob) :- stress(bob).

smokes(bob) :- influences(ann,bob),smokes(ann). smokes(ann) :- stress(ann).

(117)

Answer Set

Programming

?- smokes(carl).

•

and grounding

•

Convert to propositional logic formula

smokes(carl):- influences(bob,carl),smokes(bob). smokes(bob) :- stress(bob). smokes(bob) :- influences(ann,bob),smokes(ann). smokes(ann) :- stress(ann).

(118)

Answer Set

Programming

?- smokes(carl).

•

and grounding

•

sm(c)

↔

(i(b,c)

⋀

sm(b))

sm(b)

↔ (

st(b)

⋁

(i(a,b)

⋀

sm(a)))

may require

loop-breaking

(119)

Answer Set

Programming

?- smokes(carl).

•

and grounding

•

sm(c)

↔

(i(b,c)

⋀

sm(b))

sm(b)

↔ (

st(b)

⋁

(i(a,b)

⋀

sm(a)))

may require

loop-breaking

(120)

infl(bob,carl) & infl(ann,bob) & st(ann) & \+st(bob)   infl(bob,carl) & infl(ann,bob) & st(ann) & st(bob)  infl(bob,carl) & \+infl(ann,bob) & st(ann) & st(bob)  infl(bob,carl) & infl(ann,bob) & \+st(ann) & st(bob)  infl(bob,carl) & \+infl(ann,bob) & \+st(ann) & st(bob)  ...

Disjoint-Sum-Problem

(121)

influences(bob,carl) &

influences(ann,bob) & stress(ann)

(122)

Disjoint-Sum-Problem

influences(bob,carl) & stress(bob)

influences(bob,carl) &

(123)

Disjoint-Sum-Problem

influences(bob,carl) & stress(bob)

possible worlds

(124)

influences(bob,carl) & stress(bob)

possible worlds

sum of proof probabilities: 0.096+0.08 = 0.1760

0.0576

0.0384

0.0256

0.0096

0.0064

∑

= 0.1376

(125)

influences(bob,carl) & stress(bob)

possible worlds

sum of proof probabilities: 0.096+0.08 = 0.1760

0.0576

0.0384

0.0256

0.0096

0.0064

∑

= 0.1376

(126)

SAT

(

A

_

B

)

^

(

¬

B

_

C

)

_

A,B,C₂_{true,f alse_}

(

A

_

B

)

^

(

¬

B

_

C

)

_

c

₁

(

A, B

)

^

c

₂

(

B, C

)

_

^

i

c

_i

(

A, B, C

)

(127)

#SAT

Counting the number of satisfying assignments

X

Y

i

c

_i

(

A, B, C

)

X

(128)

Weighted Model Counting

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

(129)

Weighted Model Counting

propositional formula in conjunctive normal form (CNF)

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

(130)

interpretations (truth

value assignments) of

propositional variables

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

w

(

l

)

(131)

propositional variables

weight

of literal

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

w

(

l

)

(132)

propositional formula in conjunctive normal form (CNF)

weight

of literal

possible worlds

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

w

(

l

)

(133)

weight

of literal

possible worlds

for instance …

w(f) = p

w(not f) = 1

−

p

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

w

(

l

)

(134)

W M C

( ) =

X

I

_V

|

=

Y

l

2

I

_V

w

(

l

)

W M C

((

A

_

B

)

^

(

¬

A

_

¬

B

)) =

w

(

A

)

⇥

w

(

¬

B

) +

w

(

B

)

⇥

w

(

¬

B

)

(135)

WMC using d-DNNFs

1. represent formula as d-DNNF

(136)

Knowledge Compilation

•

Compile once - query many times …

•

The knowledge compilation map (Darwiche and

Marquis, JAIR 2002)

•

Different representations, computational complexity

of different operations, … (and, or, condition, ….)

•

in StarAI — OBDDs, d-DNNFs, SDD (Darwiche), …

(137)

Example:

First-Order Model Counting

Logical sentence Domain

•

If we know precisely who smokes, and there are k smokers

# models

Lifted Inference

n people

(138)

Roadmap

1. Modeling

2. Inference

3. Learning

(139)

(140)

Parameter Learning

class(Page,C) :- has_word(Page,W), word_class(W,C).

class(Page,C)

:- links_to(OtherPage,Page),

class(OtherPage,OtherClass),

for each

CLASS1, CLASS2

and each

WORD

?? :: link_class(Source,Target,

CLASS1

,

CLASS2

).

?? :: word_class(

WORD

,

CLASS

).

(141)

Sampling

(142)

Sampling

(143)

(144)

Parameter Estimation

p(

fact

) =

count(

fact

is true)

(145)

Learning from partial

interpretations

•

Not all facts observed

•

Soft-EM

•

use

expected count

instead of

count

(146)

Bayesian Parameter Learning

•

Learning as inference (e.g., Church)

•

Prior distributions for parameters

(147)

Information Extraction in NELL

(148)

Rule/Structure Learning

•

Upgrade rule-learning to a

probabilistic

setting

within a

relational learning setting

•

Exist for many of the formalisms (Markov Logic, ProbLog,

etc)

•

ProbFOIL works with a

probabilistic

logic program

[De

Raedt et al IJCAI 15]; it adapts Quinlan’s FOIL

(149)

(150)

Roadmap

1. Modeling

2. Inference

3. Learning

(151)

(152)

Biological Network Analysis

(153)

Homer

Marge

Bart

Lisa

Lenny

Apu

Moe

Maggie

?

+$5

-$3

Which

strategy

gives the

maximum

expected utility

?

Viral Marketing

Which advertising

strategy maximizes

expected profit?

(154)

Homer

Marge

Bart

Lisa

Lenny

Apu

Moe

Maggie

?

+$5

-$3

Which

strategy

gives the

maximum

expected utility

?

Viral Marketing

decide

truth values of

some atoms

(155)

DTProbLog

1

2

3

4

person(1). person(2). person(3). person(4). friend(1,2). friend(2,1). friend(2,4). friend(3,4). friend(4,2).

(156)

DTProbLog

? :: marketed(P) :- person(P). 

decision fact:

true or false?

1

2

3

4

(157)

DTProbLog

0.3 :: buy_trust(X,Y) :- friend(X,Y).  0.2 :: buy_marketing(P) :- person(P). 

buys(X) :- friend(X,Y), buys(Y), buy_trust(X,Y).  buys(X) :- marketed(X), buy_marketing(X).

probabilistic facts

+ logical rules

1

2

3

4

(158)

DTProbLog

buys(P) => 5 :- person(P). 

marketed(P) => -3 :- person(P).

utility facts:

cost/reward if true

1

2

3

4

(159)

DTProbLog

buys(P) => 5 :- person(P).  marketed(P) => -3 :- person(P).

1

2

3

4

(160)

DTProbLog

1

2

3

4

(161)

DTProbLog

1

2

3

4

person(1). person(2). person(3). person(4). friend(1,2). friend(2,1). friend(2,4). friend(3,4). friend(4,2). marketed(1) marketed(3)

(162)

DTProbLog

1

2

3

4

person(1). person(2). person(3). person(4). friend(1,2). friend(2,1). friend(2,4). friend(3,4). friend(4,2). marketed(1) marketed(3) bt(2,1) bt(2,4) bm(1)

(163)

DTProbLog

1

2

3

4

(164)

DTProbLog

1

2

3

4

utility =

−

3 +

−

3 + 5 + 5 = 4

probability = 0.0032

(165)

DTProbLog

1

2

3

4

utility =

−

3 +

−

3 + 5 + 5 = 4

probability = 0.0032

world contributes

0.0032

×

4 to

expected utility of

(166)

DTProbLog

1

2

3

4

task:

find strategy that maximizes expected utility

(167)

l Causes: Mutations l All related to similar

phenotype

l Effects: Differentially expressed

genes

l 27 000 cause effect pairs

l Interaction network: l 3063 nodes l Genes l Proteins l 16794 edges l Molecular interactions Uncertain

l Goal: connect causes to effects

through common subnetwork

l = Find mechanism l Techniques:

l DTProbLog

l Approximate inference

Phenetic

(168)

Answering Probabilistic

Exam Questions

(169)

Our goal

(170)

Our goal

(171)

Our goal

The answer is 0.235941.

combination of mathematics

(172)

Inspiration

(173)

Inspiration

Aristo

Euclid / Geos

(174)

(175)

Our approach

(176)

Our approach

natural language

off-the-shelf NLP tools

+ rule-based system

(177)

Our approach

natural language

+ rule-based system

(178)

Our approach

natural language

+ rule-based system

specification language

solver

(179)

Our approach

natural language

+ rule-based system

specification language

solver

solution

(180)

Example

What is the probability that

the second marble he t