08_Linear_Classification_2.pdf

Academic year: 2020

Introduction to Statistical Machine Learning

© 2016 Ong & Walder, Data61 | CSIRO, The Australian National University

Outlines

Overview, Introduction, Linear Algebra, Probability, Linear Regression 1, Linear Regression 2, Linear Classification 1, Linear Classification 2, Kernel Methods, Sparse Kernel Methods, Neural Networks 1, Neural Networks 2, Continuous Latent Variables, Autoencoders, Graphical Models 1, Graphical Models 2, Graphical Models 3, Sampling, Mixture Models and EM 1, Mixture Models and EM 2, Sequential Data 1, Sequential Data 2, Combining Models

Cheng Soon Ong & Christian Walder

Machine Learning Research Group Data61 | CSIRO

and

College of Engineering and Computer Science The Australian National University

Canberra February – June 2017

Part VIII

Probabilistic Generative Models, Continuous Input, Discrete Features, Probabilistic Discriminative Models, Logistic Regression, Iterative Reweighted Least Squares, Laplace Approximation, Bayesian Logistic Regression

Three Models for Decision Problems

In increasing order of complexity:

Find a discriminant function $f(\mathbf{x})$ which maps each input directly onto a class label.

Discriminative Models

1. Solve the inference problem of determining the posterior class probabilities $p(\mathcal{C}_k \mid \mathbf{x})$.
2. Use decision theory to assign each new $\mathbf{x}$ to one of the classes.

Generative Models

1. Solve the inference problem of determining the class-conditional probabilities $p(\mathbf{x} \mid \mathcal{C}_k)$.
2. Also, infer the prior class probabilities $p(\mathcal{C}_k)$.
3. Use Bayes' theorem to find the posterior $p(\mathcal{C}_k \mid \mathbf{x})$.
4. Alternatively, model the joint distribution $p(\mathbf{x}, \mathcal{C}_k)$ directly.
5. Use decision theory to assign each new $\mathbf{x}$ to one of the classes.

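Probabilistic Generative Models

Generative approach: model the class-conditional densities $p(\mathbf{x} \mid \mathcal{C}_k)$ and the priors $p(\mathcal{C}_k)$ to calculate the posterior probability for class $\mathcal{C}_1$

$$
p(\mathcal{C}_1 \mid \mathbf{x})
= \frac{p(\mathbf{x} \mid \mathcal{C}_1)\, p(\mathcal{C}_1)}{p(\mathbf{x} \mid \mathcal{C}_1)\, p(\mathcal{C}_1) + p(\mathbf{x} \mid \mathcal{C}_2)\, p(\mathcal{C}_2)}
= \frac{1}{1 + \exp(-a(\mathbf{x}))}
= \sigma(a(\mathbf{x}))
$$

where $a$ and the logistic sigmoid function $\sigma(a)$ are given by

$$
a(\mathbf{x}) = \ln \frac{p(\mathbf{x} \mid \mathcal{C}_1)\, p(\mathcal{C}_1)}{p(\mathbf{x} \mid \mathcal{C}_2)\, p(\mathcal{C}_2)}
= \ln \frac{p(\mathbf{x}, \mathcal{C}_1)}{p(\mathbf{x}, \mathcal{C}_2)},
\qquad
\sigma(a) = \frac{1}{1 + \exp(-a)}.
$$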

Logistic Sigmoid

The logistic sigmoid function

$$
\sigma(a) = \frac{1}{1 + \exp(-a)}
$$

is a "squashing function" because it maps the real axis into the finite interval $(0, 1)$.

Symmetry: $\sigma(-a) = 1 - \sigma(a)$

Derivative: $\dfrac{d}{da}\sigma(a) = \sigma(a)\,\sigma(-a) = \sigma(a)\,(1 - \sigma(a))$

Inverse is called the logit function: $a(\sigma) = \ln \dfrac{\sigma}{1 - \sigma}$

[Figure: plots of the logistic sigmoid $\sigma(a)$ and of the logit $a(\sigma)$.]
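
The three properties of the logistic sigmoid listed on this slide (symmetry, derivative, inverse) can be checked numerically. The following is a minimal sketch using NumPy; the function names `sigmoid` and `logit` and the test grid are choices made for this illustration, not part of the original slides.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), arranged to avoid overflow for large |a|."""
    a = np.asarray(a, dtype=float)
    e = np.exp(-np.abs(a))                      # always in (0, 1], so exp never overflows
    return np.where(a >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def logit(s):
    """Inverse of the sigmoid: a = ln(s / (1 - s)) for 0 < s < 1."""
    s = np.asarray(s, dtype=float)
    return np.log(s) - np.log1p(-s)

a = np.linspace(-5.0, 5.0, 11)
s = sigmoid(a)

# Symmetry: sigma(-a) = 1 - sigma(a)
symmetry_ok = np.allclose(sigmoid(-a), 1.0 - s)

# Derivative: a central finite difference should match sigma(a) * (1 - sigma(a))
h = 1e-5
numeric_grad = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
derivative_ok = np.allclose(numeric_grad, s * (1.0 - s), atol=1e-8)

# Inverse: logit(sigma(a)) recovers a
inverse_ok = np.allclose(logit(s), a)

print(symmetry_ok, derivative_ok, inverse_ok)
```

Splitting the computation by the sign of `a` is one common way to keep the exponential bounded; the plain formula gives the same values wherever it does not overflow.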

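Probabilistic Generative Models - Multiclass

The normalised exponential is given by

$$
p(\mathcal{C}_k \mid \mathbf{x})
= \frac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{\sum_j p(\mathbf{x} \mid \mathcal{C}_j)\, p(\mathcal{C}_j)}
= \frac{\exp(a_k)}{\sum_j \exp(a_j)}
$$

where

$$
a_k = \ln\bigl(p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)\bigr).
$$

Also called the softmax function, as it is a smoothed version of the max function.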
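
As a small illustration of the normalised exponential, the sketch below computes posteriors from log joint probabilities $a_k$ and shows the "smoothed max" behaviour. The numbers are made up for the example, and subtracting the maximum before exponentiating is a standard numerical precaution rather than something stated on the slide.

```python
import numpy as np

def softmax(a):
    """Normalised exponential exp(a_k) / sum_j exp(a_j) along the last axis."""
    a = np.asarray(a, dtype=float)
    shifted = a - np.max(a, axis=-1, keepdims=True)   # does not change the result
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical joint probabilities p(x | C_k) p(C_k) for three classes
joint = np.array([0.02, 0.10, 0.08])
a = np.log(joint)                      # a_k = ln(p(x | C_k) p(C_k))
posterior = softmax(a)                 # same as joint / joint.sum()

# "Smoothed max": scaling the activations up pushes the output towards
# an indicator of the largest entry.
sharp = softmax(10.0 * np.array([1.0, 2.0, 3.0]))

print(posterior)
print(sharp)
```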

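Probabilistic Generative Model - Continuous Input

Assume the class-conditional probabilities are Gaussian and all classes share the same covariance. What can we say about the posterior probabilities?

$$
\begin{aligned}
p(\mathbf{x} \mid \mathcal{C}_k)
&= \frac{1}{(2\pi)^{D/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}}
\exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) \right\} \\
&= \frac{1}{(2\pi)^{D/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}}
\exp\left\{ -\frac{1}{2} \mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x} \right\}
\times \exp\left\{ \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - \frac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k \right\}
\end{aligned}
$$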

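Probabilistic Generative Model - Continuous Input

For two classes

$$
p(\mathcal{C}_1 \mid \mathbf{x}) = \sigma(a(\mathbf{x}))
$$

and $a(\mathbf{x})$ is

$$
\begin{aligned}
a(\mathbf{x})
&= \ln \frac{p(\mathbf{x} \mid \mathcal{C}_1)\, p(\mathcal{C}_1)}{p(\mathbf{x} \mid \mathcal{C}_2)\, p(\mathcal{C}_2)} \\
&= \ln \frac{\exp\left\{ \boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - \frac{1}{2} \boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1 \right\}}
{\exp\left\{ \boldsymbol{\mu}_2^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - \frac{1}{2} \boldsymbol{\mu}_2^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_2 \right\}}
+ \ln \frac{p(\mathcal{C}_1)}{p(\mathcal{C}_2)}
\end{aligned}
$$

Therefore

$$
p(\mathcal{C}_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x} + w_0)
$$

where

$$
\begin{aligned}
\mathbf{w} &= \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \\
w_0 &= -\frac{1}{2} \boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1
+ \frac{1}{2} \boldsymbol{\mu}_2^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_2
+ \ln \frac{p(\mathcal{C}_1)}{p(\mathcal{C}_2)}
\end{aligned}
$$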
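
To make the two-class result concrete, the sketch below builds $\mathbf{w}$ and $w_0$ from Gaussian parameters and checks that $\sigma(\mathbf{w}^T\mathbf{x} + w_0)$ agrees with the posterior obtained directly from Bayes' theorem. All numerical values (means, covariance, priors, test point) are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of a multivariate Gaussian N(x | mu, Sigma)."""
    D = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical parameters: two classes sharing one covariance matrix
mu1 = np.array([1.0, 1.0])
mu2 = np.array([-1.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
prior1, prior2 = 0.3, 0.7

# Linear form of the posterior: p(C1 | x) = sigmoid(w^T x + w0)
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
      + 0.5 * mu2 @ Sigma_inv @ mu2
      + np.log(prior1 / prior2))

x = np.array([0.2, -0.4])
posterior_linear = sigmoid(w @ x + w0)

# Direct evaluation via Bayes' theorem for comparison
joint1 = gaussian_pdf(x, mu1, Sigma) * prior1
joint2 = gaussian_pdf(x, mu2, Sigma) * prior2
posterior_bayes = joint1 / (joint1 + joint2)

print(posterior_linear, posterior_bayes)
```

The two values agree because the quadratic term in $\mathbf{x}$ and the normalisation constant are identical for both classes and cancel in the ratio.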

Probabilistic Generative Model - Continuous Input

[Figure: Class-conditional densities for two classes (left). Posterior probability $p(\mathcal{C}_1 \mid \mathbf{x})$ (right). Note the logistic sigmoid of a linear function of $\mathbf{x}$.]

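General Case - K Classes, Shared Covariance

Use the normalised exponential

$$
p(\mathcal{C}_k \mid \mathbf{x})
= \frac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{\sum_j p(\mathbf{x} \mid \mathcal{C}_j)\, p(\mathcal{C}_j)}
= \frac{\exp(a_k)}{\sum_j \exp(a_j)}
$$

where

$$
a_k = \ln\bigl(p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)\bigr).
$$

The normalisation constant and the quadratic term $-\frac{1}{2}\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$ are the same for every class and cancel in the ratio, so we get a linear function of $\mathbf{x}$

$$
a_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}
$$

where

$$
\begin{aligned}
\mathbf{w}_k &= \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k \\
w_{k0} &= -\frac{1}{2} \boldsymbol{\mu}_k^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_k + \ln p(\mathcal{C}_k)
\end{aligned}
$$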

General Case - K Classes, Different Covariance

If each class-conditional probability has a different covariance $\boldsymbol{\Sigma}_k$, the quadratic terms $-\frac{1}{2} \mathbf{x}^T \boldsymbol{\Sigma}_k^{-1} \mathbf{x}$ no longer cancel each other out.

We get a quadratic discriminant.

Maximum Likelihood Solution

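Given the functional form of the class-conditional densities $p(\mathbf{x} \mid \mathcal{C}_k)$, can we determine the parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$?

Not without data ;-)

Given also a data set $(\mathbf{x}_n, t_n)$ for $n = 1, \dots, N$, using the coding scheme where $t_n = 1$ corresponds to class $\mathcal{C}_1$ and $t_n = 0$ denotes class $\mathcal{C}_2$.

Assume the class-conditional densities to be Gaussian with the same covariance, but different means.

Denote the prior probability $p(\mathcal{C}_1) = \pi$, and therefore $p(\mathcal{C}_2) = 1 - \pi$.

Then

$$
\begin{aligned}
p(\mathbf{x}_n, \mathcal{C}_1) &= p(\mathcal{C}_1)\, p(\mathbf{x}_n \mid \mathcal{C}_1) = \pi\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}) \\
p(\mathbf{x}_n, \mathcal{C}_2) &= p(\mathcal{C}_2)\, p(\mathbf{x}_n \mid \mathcal{C}_2) = (1 - \pi)\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma})
\end{aligned}
$$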

Maximum Likelihood Solution

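Thus the likelihood for the whole data set $\mathbf{X}$ and $\mathbf{t}$ is given by

$$
p(\mathbf{t}, \mathbf{X} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma})
= \prod_{n=1}^{N} \bigl[\pi\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma})\bigr]^{t_n}
\times \bigl[(1 - \pi)\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma})\bigr]^{1 - t_n}
$$

Maximise the log likelihood. The term depending on $\pi$ is

$$
\sum_{n=1}^{N} \bigl( t_n \ln \pi + (1 - t_n) \ln(1 - \pi) \bigr)
$$

which is maximal for

$$
\pi = \frac{1}{N} \sum_{n=1}^{N} t_n = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}
$$

where $N_k$ is the number of data points in class $\mathcal{C}_k$.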

Maximum Likelihood Solution

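Similarly, we can maximise the log likelihood (and thereby the likelihood $p(\mathbf{t}, \mathbf{X} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma})$) with respect to the means $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, and get

$$
\begin{aligned}
\boldsymbol{\mu}_1 &= \frac{1}{N_1} \sum_{n=1}^{N} t_n\, \mathbf{x}_n \\
\boldsymbol{\mu}_2 &= \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n)\, \mathbf{x}_n
\end{aligned}
$$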

Maximum Likelihood Solution

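Finally, the log likelihood $\ln p(\mathbf{t}, \mathbf{X} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma})$ can be maximised for the covariance $\boldsymbol{\Sigma}$, resulting in

$$
\begin{aligned}
\boldsymbol{\Sigma} &= \frac{N_1}{N} \mathbf{S}_1 + \frac{N_2}{N} \mathbf{S}_2 \\
\mathbf{S}_k &= \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^T
\end{aligned}
$$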
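
The maximum likelihood estimates for $\pi$, $\boldsymbol{\mu}_1$, $\boldsymbol{\mu}_2$ and the shared covariance $\boldsymbol{\Sigma}$ can be computed in a few lines. The sketch below is one possible NumPy rendering; the function name `fit_shared_covariance_gaussians` and the synthetic data are assumptions made for this example only.

```python
import numpy as np

def fit_shared_covariance_gaussians(X, t):
    """Maximum likelihood estimates for two Gaussian classes with shared covariance.

    X has shape (N, D); t has shape (N,) with t_n = 1 for class C1 and 0 for class C2.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    N = X.shape[0]
    N1 = t.sum()
    N2 = N - N1

    pi = N1 / N                                     # prior p(C1)
    mu1 = (t[:, None] * X).sum(axis=0) / N1         # mean of class C1
    mu2 = ((1 - t)[:, None] * X).sum(axis=0) / N2   # mean of class C2

    d1 = X[t == 1] - mu1
    d2 = X[t == 0] - mu2
    S1 = d1.T @ d1 / N1                             # per-class covariance S_1
    S2 = d2.T @ d2 / N2                             # per-class covariance S_2
    Sigma = (N1 / N) * S1 + (N2 / N) * S2           # weighted average
    return pi, mu1, mu2, Sigma

# Synthetic data for illustration only (60 points in C1, 40 points in C2)
rng = np.random.default_rng(0)
cov = [[1.0, 0.2], [0.2, 1.0]]
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=60)
X2 = rng.multivariate_normal([-1.0, 1.0], cov, size=40)
X = np.vstack([X1, X2])
t = np.concatenate([np.ones(60), np.zeros(40)])

pi, mu1, mu2, Sigma = fit_shared_covariance_gaussians(X, t)
print(pi)        # fraction of points in C1
print(mu1, mu2)
print(Sigma)
```

Here $\pi$ comes out as the fraction of points labelled $\mathcal{C}_1$, and each mean is simply the average of the points in its class.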

Discrete Features - Naive Bayes

Assume the input space consists of discrete features, in the simplest case $x_i \in \{0, 1\}$.

For a $D$-dimensional input space, a general distribution would be represented by a table with $2^D$ entries.

Together with the normalisation constraint, this leaves $2^D - 1$ independent variables.

This grows exponentially with the number of features.

The Naive Bayes assumption is that all features, conditioned on the class $\mathcal{C}_k$, are independent of each other.

$$
p(\mathbf{x} \mid \mathcal{C}_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i}
$$

Discrete Features - Naive Bayes

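With the naive Bayes assumption

$$
p(\mathbf{x} \mid \mathcal{C}_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i}
$$

we can then again find the factors $a_k$ in the normalised exponential

$$
p(\mathcal{C}_k \mid \mathbf{x})
= \frac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{\sum_j p(\mathbf{x} \mid \mathcal{C}_j)\, p(\mathcal{C}_j)}
= \frac{\exp(a_k)}{\sum_j \exp(a_j)}
$$

as a linear function of the $x_i$

$$
a_k(\mathbf{x}) = \sum_{i=1}^{D} \bigl\{ x_i \ln \mu_{ki} + (1 - x_i) \ln(1 - \mu_{ki}) \bigr\} + \ln p(\mathcal{C}_k).
$$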
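
The claim that naive Bayes with binary features yields activations $a_k$ that are linear in the $x_i$ can be verified directly. In the sketch below, the parameter table `mu`, the priors and the input `x` are hypothetical values; the weights `W` and offsets `b` are obtained by collecting the terms of $a_k(\mathbf{x}) = \ln\bigl(p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)\bigr)$ that multiply $x_i$ and those that do not.

```python
import numpy as np

def naive_bayes_logits(x, mu, prior):
    """a_k(x) = sum_i [x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki)] + ln p(C_k)."""
    x = np.asarray(x, dtype=float)
    return x @ np.log(mu).T + (1 - x) @ np.log1p(-mu).T + np.log(prior)

# Hypothetical parameters: K = 2 classes, D = 3 binary features
mu = np.array([[0.9, 0.2, 0.7],     # mu_ki = p(x_i = 1 | C_k)
               [0.3, 0.8, 0.5]])
prior = np.array([0.4, 0.6])
x = np.array([1, 0, 1])

# Posterior via the normalised exponential of the a_k
a = naive_bayes_logits(x, mu, prior)
posterior = np.exp(a - a.max())
posterior /= posterior.sum()

# Posterior via Bayes' theorem applied to the factorised likelihood
joint = np.prod(mu ** x * (1 - mu) ** (1 - x), axis=1) * prior
posterior_direct = joint / joint.sum()

# Explicit linear form a_k(x) = W_k^T x + b_k
W = np.log(mu) - np.log1p(-mu)
b = np.log1p(-mu).sum(axis=1) + np.log(prior)
a_linear = W @ x + b

print(posterior, posterior_direct)
```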

Probabilistic Discriminative Models

Maximise a likelihood function defined through the conditional distribution $p(\mathcal{C}_k \mid \mathbf{x})$ directly.

This is called discriminative training.

Typically fewer parameters have to be determined.

As we learn the posterior $p(\mathcal{C}_k \mid \mathbf{x})$ directly, prediction may be better than with a generative model whose class-conditional density assumptions $p(\mathbf{x} \mid \mathcal{C}_k)$ poorly approximate the true distributions.

Original Input versus Feature Space

So far we have used the input $\mathbf{x}$ directly.

All classification algorithms also work if we first apply a fixed nonlinear transformation of the inputs using a vector of basis functions $\boldsymbol{\phi}(\mathbf{x})$.

Example: Use two Gaussian basis functions centered at the green crosses in the input space.

[Figure: the input space $(x_1, x_2)$ and the corresponding feature space $(\phi_1, \phi_2)$.]
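
A small sketch of such a fixed feature map follows. It uses two Gaussian basis functions and a toy data set in which one class forms a small ring enclosed by the other, so the classes are not linearly separable in the input space. The centres, the width `s` and the data are hypothetical choices for this illustration and are not the values behind the slide's figure.

```python
import numpy as np

def gaussian_features(X, centres, s=1.0):
    """phi_j(x) = exp(-||x - c_j||^2 / (2 s^2)) for every row x of X."""
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * s ** 2))

# Hypothetical centres of the two basis functions
centres = np.array([[0.0, 0.0],
                    [1.0, 1.0]])

# Toy data: an inner ring (class 1) enclosed by an outer ring (class 2)
angles = np.linspace(0.0, 2 * np.pi, 12, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
inner = 0.3 * circle
outer = 2.0 * circle

phi_inner = gaussian_features(inner, centres)   # shape (12, 2)
phi_outer = gaussian_features(outer, centres)   # shape (12, 2)

# In feature space, a threshold on phi_1 alone (a linear boundary) separates the classes
separable = phi_inner[:, 0].min() > phi_outer[:, 0].max()
print(separable)
```

The boundary $\phi_1 = \text{const}$ is a straight line in feature space but a circle around the first centre in the input space.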

Original Input versus Feature Space

Linear decision boundaries in the feature space correspond to nonlinear decision boundaries in the input space.

Classes which are NOT linearly separable in the input space can become linearly separable in the feature space.

BUT: If classes overlap in input space, they will also overlap in feature space.

Nonlinear features $\boldsymbol{\phi}(\mathbf{x})$ cannot remove the overlap, but they may increase it!

[Figure: the input space $(x_1, x_2)$ and the feature space $(\phi_1, \phi_2)$.]

Original Input versus Feature Space

Fixed basis functions do not adapt to the data and therefore have important limitations (see discussion in Linear Regression).

Understanding of more advanced algorithms becomes easier if we introduce the feature space now and use it instead of the original input space.

Some applications use fixed features successfully by avoiding the limitations.

Logistic Regression is Classification

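Two classes, where the posterior of class $\mathcal{C}_1$ is a logistic sigmoid $\sigma(\cdot)$ acting on a linear function of the feature vector $\boldsymbol{\phi}$

$$
\begin{aligned}
p(\mathcal{C}_1 \mid \boldsymbol{\phi}) &= y(\boldsymbol{\phi}) = \sigma(\mathbf{w}^T \boldsymbol{\phi}) \\
p(\mathcal{C}_2 \mid \boldsymbol{\phi}) &= 1 - p(\mathcal{C}_1 \mid \boldsymbol{\phi})
\end{aligned}
$$

The model dimension is equal to the dimension $M$ of the feature space.

Compare this to fitting two Gaussians

$$
\underbrace{2M}_{\text{means}} + \underbrace{M(M+1)/2}_{\text{shared covariance}} = M(M+5)/2
$$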

Logistic Regression is Classification

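Determine the parameters via maximum likelihood for data $(\boldsymbol{\phi}_n, t_n)$, $n = 1, \dots, N$, where $\boldsymbol{\phi}_n = \boldsymbol{\phi}(\mathbf{x}_n)$. The class membership is coded as $t_n \in \{0, 1\}$.

Likelihood function

$$
p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}
$$

where $y_n = p(\mathcal{C}_1 \mid \boldsymbol{\phi}_n)$.

Error function: the negative log likelihood, resulting in the cross-entropy error function

$$
E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \bigl\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \bigr\}
$$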

Logistic Regression is Classification

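Error function (cross-entropy error)

$$
\begin{aligned}
E(\mathbf{w}) &= -\sum_{n=1}^{N} \bigl\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \bigr\} \\
y_n &= p(\mathcal{C}_1 \mid \boldsymbol{\phi}_n) = \sigma(\mathbf{w}^T \boldsymbol{\phi}_n)
\end{aligned}
$$

Gradient of the error function (using $\frac{d\sigma}{da} = \sigma(1 - \sigma)$)

$$
\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n)\, \boldsymbol{\phi}_n
$$

The gradient does not contain any derivative of the sigmoid function.

For each data point, the contribution to the gradient is the product of the deviation $y_n - t_n$ and the basis function vector $\boldsymbol{\phi}_n$.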
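
The gradient formula can be checked against finite differences and then used in a simple optimiser. The sketch below uses plain gradient descent on synthetic data; the step size, iteration count, data-generating weights `true_w` and the bias feature are assumptions for this example, and gradient descent is only one possible optimiser (the outline of this part also lists Iterative Reweighted Least Squares).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t):
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)] with y_n = sigmoid(w^T phi_n)."""
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, Phi, t):
    """grad E(w) = sum_n (y_n - t_n) phi_n."""
    return Phi.T @ (sigmoid(Phi @ w) - t)

# Synthetic, overlapping two-class data (illustration only)
rng = np.random.default_rng(1)
N = 200
x = rng.normal(size=(N, 2))
Phi = np.column_stack([np.ones(N), x])        # features: bias term plus raw inputs
true_w = np.array([-0.5, 2.0, -1.0])          # hypothetical generating weights
t = (rng.random(N) < sigmoid(Phi @ true_w)).astype(float)

# Check the analytic gradient against central finite differences
w_check = np.array([0.1, -0.2, 0.3])
eps = 1e-6
numeric = np.array([
    (cross_entropy(w_check + eps * e, Phi, t)
     - cross_entropy(w_check - eps * e, Phi, t)) / (2 * eps)
    for e in np.eye(3)
])
analytic = gradient(w_check, Phi, t)

# Plain gradient descent on E(w), using the mean gradient with a fixed step size
w = np.zeros(3)
initial_error = cross_entropy(w, Phi, t)
for _ in range(500):
    w -= 0.5 * gradient(w, Phi, t) / N
final_error = cross_entropy(w, Phi, t)

print(np.allclose(numeric, analytic, rtol=1e-4, atol=1e-4))
print(initial_error, final_error)
```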

Laplace Approximation

Given a continuous distribution $p(x)$ which is not Gaussian, can we approximate it by a Gaussian $q(x)$?

Need to find a mode of $p(x)$. Try to find a Gaussian with the same mode.

[Figure: Non-Gaussian distribution (yellow) and its Gaussian approximation (red).]

[Figure: Negative log of the non-Gaussian distribution (yellow) and of the Gaussian approximation (red).]

Laplace Approximation

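Assume $p(z)$ can be written as

$$
p(z) = \frac{1}{Z} f(z)
$$

with normalisation $Z = \int f(z)\, dz$.

Furthermore, assume $Z$ is unknown!

A mode of $p(z)$ is at a point $z_0$ where $p'(z_0) = 0$.

Taylor expansion of $\ln f(z)$ at $z_0$ (the first-order term vanishes because $z_0$ is a mode)

$$
\ln f(z) \simeq \ln f(z_0) - \frac{1}{2} A (z - z_0)^2
$$

where

$$
A = -\left. \frac{d^2}{dz^2} \ln f(z) \right|_{z = z_0}.
$$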

Laplace Approximation

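Exponentiating

$$
\ln f(z) \simeq \ln f(z_0) - \frac{1}{2} A (z - z_0)^2
$$

we get

$$
f(z) \simeq f(z_0) \exp\left\{ -\frac{A}{2} (z - z_0)^2 \right\}.
$$

And after normalisation we get the Laplace approximation

$$
q(z) = \left( \frac{A}{2\pi} \right)^{1/2} \exp\left\{ -\frac{A}{2} (z - z_0)^2 \right\}.
$$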
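
The one-dimensional recipe (find the mode $z_0$, evaluate the curvature $A$, write down the Gaussian) is sketched below for an unnormalised density $f(z) = \exp(-z^2/2)\,\sigma(20z + 4)$. This particular $f$ is an illustrative choice, not necessarily the function shown in the slide's figure, and Newton's method is an assumed way of locating the mode.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# ln f(z) for the illustrative choice f(z) = exp(-z^2 / 2) * sigmoid(20 z + 4)
def log_f(z):
    return -0.5 * z ** 2 + np.log(sigmoid(20 * z + 4))

def d_log_f(z):
    # d/dz ln sigmoid(u) = (1 - sigmoid(u)) * du/dz
    return -z + 20 * (1 - sigmoid(20 * z + 4))

def d2_log_f(z):
    s = sigmoid(20 * z + 4)
    return -1.0 - 400 * s * (1 - s)

# Step 1: find the mode z0, where the derivative of ln f vanishes (Newton's method)
z0 = 0.0
for _ in range(50):
    z0 -= d_log_f(z0) / d2_log_f(z0)

# Step 2: A is the negative second derivative of ln f at the mode
A = -d2_log_f(z0)

# Step 3: the Laplace approximation is a Gaussian with mean z0 and variance 1 / A
def q(z):
    return np.sqrt(A / (2 * np.pi)) * np.exp(-0.5 * A * (z - z0) ** 2)

# Sanity check: q is normalised (simple Riemann sum on a fine grid)
z = np.linspace(-10.0, 10.0, 200001)
dz = z[1] - z[0]
area = np.sum(q(z)) * dz

print(z0, A, area)
```

Note that the normalisation constant $Z$ of $p(z)$ is never needed: only derivatives of $\ln f$ enter the construction.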

Laplace Approximation - Vector Space

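Approximate $p(\mathbf{z})$ for $\mathbf{z} \in \mathbb{R}^M$

$$
p(\mathbf{z}) = \frac{1}{Z} f(\mathbf{z}).
$$

We get the Taylor expansion

$$
\ln f(\mathbf{z}) \simeq \ln f(\mathbf{z}_0) - \frac{1}{2} (\mathbf{z} - \mathbf{z}_0)^T \mathbf{A} (\mathbf{z} - \mathbf{z}_0)
$$

where the Hessian $\mathbf{A}$ is defined as

$$
\mathbf{A} = -\left. \nabla\nabla \ln f(\mathbf{z}) \right|_{\mathbf{z} = \mathbf{z}_0}.
$$

The Laplace approximation of $p(\mathbf{z})$ is then

$$
q(\mathbf{z}) = \frac{|\mathbf{A}|^{1/2}}{(2\pi)^{M/2}} \exp\left\{ -\frac{1}{2} (\mathbf{z} - \mathbf{z}_0)^T \mathbf{A} (\mathbf{z} - \mathbf{z}_0) \right\}
$$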

Bayesian Logistic Regression

Exact Bayesian inference for logistic regression is intractable.

Why? We need to normalise the product of a prior and a likelihood, and the likelihood is itself a product of logistic sigmoid functions, one for each data point.

Bayesian Logistic Regression

Assume a Gaussian prior because we want a Gaussian posterior.

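$$
p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)
$$

for fixed hyperparameters $\mathbf{m}_0$ and $\mathbf{S}_0$.

Hyperparameters are parameters of a prior distribution. In contrast to the model parameters $\mathbf{w}$, they are not learned.

For a set of training data $(\mathbf{x}_n, t_n)$, where $n = 1, \dots, N$, the posterior is given by

$$
p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{w})\, p(\mathbf{t} \mid \mathbf{w})
$$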

Bayesian Logistic Regression

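Using our previous result for the cross-entropy function

$$
E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \bigl\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \bigr\}
$$

we can now calculate the log of the posterior

$$
p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{w})\, p(\mathbf{t} \mid \mathbf{w})
$$

using the notation $y_n = \sigma(\mathbf{w}^T \boldsymbol{\phi}_n)$ as

$$
\ln p(\mathbf{w} \mid \mathbf{t}) = -\frac{1}{2} (\mathbf{w} - \mathbf{m}_0)^T \mathbf{S}_0^{-1} (\mathbf{w} - \mathbf{m}_0)
+ \sum_{n=1}^{N} \bigl\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \bigr\} + \text{const}
$$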

Bayesian Logistic Regression

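To obtain a Gaussian approximation to

$$
\ln p(\mathbf{w} \mid \mathbf{t}) = -\frac{1}{2} (\mathbf{w} - \mathbf{m}_0)^T \mathbf{S}_0^{-1} (\mathbf{w} - \mathbf{m}_0)
+ \sum_{n=1}^{N} \bigl\{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \bigr\} + \text{const}
$$

1. Find $\mathbf{w}_{\text{MAP}}$ which maximises $\ln p(\mathbf{w} \mid \mathbf{t})$. This defines the mean of the Gaussian approximation. (Note: This is a nonlinear function in $\mathbf{w}$ because $y_n = \sigma(\mathbf{w}^T \boldsymbol{\phi}_n)$.)
2. Calculate the second derivative of the negative log posterior to get the inverse covariance of the Laplace approximation

$$
\mathbf{S}_N^{-1} = -\nabla\nabla \ln p(\mathbf{w} \mid \mathbf{t}) = \mathbf{S}_0^{-1} + \sum_{n=1}^{N} y_n (1 - y_n)\, \boldsymbol{\phi}_n \boldsymbol{\phi}_n^T
$$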

Bayesian Logistic Regression

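The Gaussian approximation (via the Laplace approximation) of the posterior distribution is now

$$
q(\mathbf{w} \mid \boldsymbol{\phi}) = \mathcal{N}(\mathbf{w} \mid \mathbf{w}_{\text{MAP}}, \mathbf{S}_N)
$$

where

$$
\mathbf{S}_N^{-1} = -\nabla\nabla \ln p(\mathbf{w} \mid \mathbf{t}) = \mathbf{S}_0^{-1} + \sum_{n=1}^{N} y_n (1 - y_n)\, \boldsymbol{\phi}_n \boldsymbol{\phi}_n^T
$$

with $y_n$ evaluated at $\mathbf{w} = \mathbf{w}_{\text{MAP}}$.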
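
The two steps of the Laplace approximation for Bayesian logistic regression (find $\mathbf{w}_{\text{MAP}}$, then evaluate the Hessian of the negative log posterior there) can be sketched as follows. Newton's method is an assumed choice of optimiser, the function name `laplace_posterior` is hypothetical, and the prior and data are synthetic values for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_posterior(Phi, t, m0, S0, n_iter=50):
    """Return (w_MAP, S_N) of the Gaussian approximation N(w | w_MAP, S_N)."""
    S0_inv = np.linalg.inv(S0)
    w = m0.astype(float).copy()
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        # Gradient and Hessian of the negative log posterior -ln p(w | t)
        grad = S0_inv @ (w - m0) + Phi.T @ (y - t)
        hess = S0_inv + (Phi * (y * (1 - y))[:, None]).T @ Phi
        w = w - np.linalg.solve(hess, grad)       # Newton step
    y = sigmoid(Phi @ w)
    SN_inv = S0_inv + (Phi * (y * (1 - y))[:, None]).T @ Phi
    return w, np.linalg.inv(SN_inv)

# Synthetic data (illustration only)
rng = np.random.default_rng(2)
N = 100
x = rng.normal(size=(N, 2))
Phi = np.column_stack([np.ones(N), x])            # bias feature plus raw inputs
true_w = np.array([-0.5, 2.0, -1.0])              # hypothetical generating weights
t = (rng.random(N) < sigmoid(Phi @ true_w)).astype(float)

# Hypothetical fixed hyperparameters of the Gaussian prior
m0 = np.zeros(3)
S0 = np.eye(3)

w_map, S_N = laplace_posterior(Phi, t, m0, S0)
print(w_map)
print(S_N)
```

Because $\mathbf{S}_N^{-1}$ equals $\mathbf{S}_0^{-1}$ plus a positive semi-definite sum, the approximate posterior is never broader than the prior in any direction.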
