• No results found

Generative and Discriminative Learning. Machine Learning

N/A
N/A
Protected

Academic year: 2022

Share "Generative and Discriminative Learning. Machine Learning"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

Machine Learning

Generative and Discriminative

Learning

(2)

What we saw most of the semester

• A fixed, unknown distribution D over 𝑋 → 𝑌

– X: Instance space, Y: label space (eg: {+1, -1})

• Given a dataset S = {(x

i

, y

i

)}

• Learning

– Identify a hypothesis space H, define a loss function L(h, x, y) – Minimize average loss over training data (plus regularization)

• The guarantee

– If we find an algorithm that minimizes loss on the observed data

– Then, learning theory guarantees good future behavior (as a function of H)

(3)

What we saw most of the semester

• A fixed, unknown distribution D over 𝑋 → 𝑌

– X: Instance space, Y: label space (eg: {+1, -1})

• Given a dataset S = {(x

i

, y

i

)}

• Learning

– Identify a hypothesis space H, define a loss function L(h, x, y) – Minimize average loss over training data (plus regularization)

• The guarantee

– If we find an algorithm that minimizes loss on the observed data

– Then, learning theory guarantees good future behavior (as a function of H)

Is this different from assuming a distribution over X and a fixed oracle function f?

(4)

What we saw most of the semester

• A fixed, unknown distribution D over 𝑋 → 𝑌

– X: Instance space, Y: label space (eg: {+1, -1})

• Given a dataset S = {(x

i

, y

i

)}

• Learning

– Identify a hypothesis space H, define a loss function L(h, x, y) – Minimize average loss over training data (plus regularization)

• The guarantee

– If we find an algorithm that minimizes loss on the observed data

– Then, learning theory guarantees good future behavior (as a function of H)

Is this different from assuming a distribution over X and a fixed oracle function f?

(5)

What we saw most of the semester

• A fixed, unknown distribution D over 𝑋 → 𝑌

– X: Instance space, Y: label space (eg: {+1, -1})

• Given a dataset S = {(x

i

, y

i

)}

• Learning

– Identify a hypothesis space H, define a loss function L(h, x, y) – Minimize average loss over training data (plus regularization)

• The guarantee

– If we find an algorithm that minimizes loss on the observed data

– Then, learning theory guarantees good future behavior (as a function of H)

Is this different from assuming a distribution over X and a fixed oracle function f?

(6)

What we saw most of the semester

• A fixed, unknown distribution D over 𝑋 → 𝑌

– X: Instance space, Y: label space (eg: {+1, -1})

• Given a dataset S = {(x

i

, y

i

)}

• Learning

– Identify a hypothesis space H, define a loss function L(h, x, y) – Minimize average loss over training data (plus regularization)

• The guarantee

– If we find an algorithm that minimizes loss on the observed data

– Then, learning theory guarantees good future behavior (as a function of H)

Is this different from assuming a distribution over X and a fixed oracle function f?

(7)

Discriminative models

Goal: learn directly how to make predictions

• Look at many (positive/negative) examples

• Discover regularities in the data

• Use these to construct a prediction policy

• Assumptions come in the form of the hypothesis class Bottom line: approximating ℎ: 𝑋 → 𝑌 is estimating the

conditional probability 𝑃(𝑌|𝑋)

7

(8)

Generative models

• Explicitly model how instances in each category are generated by modeling the joint probability of X and Y, that is 𝑃(𝑌, 𝑋)

• That is, learn 𝑃(𝑋|𝑌) and 𝑃(𝑌)

• The naïve Bayes classifier does this

– Naïve Bayes is a generative model

• Predict 𝑃(𝑌|𝑋) using the Bayes rule

(9)

Example: Generative story of naïve Bayes

(10)

Example: Generative story of naïve Bayes

Y P(Y)

First sample a label

(11)

Example: Generative story of naïve Bayes

Y P(Y)

X

1

P(X

1

| Y)

Given the label, sample the features independently from the conditional distributions

(12)

Example: Generative story of naïve Bayes

Y P(Y)

X

1

P(X

1

| Y)

X

2

P(X

2

| Y)

Given the label, sample the features independently from the conditional distributions

(13)

Example: Generative story of naïve Bayes

Y P(Y)

X

1

P(X

1

| Y)

X

2

P(X

2

| Y)

X

3

P(X

3

| Y)

Given the label, sample the features independently from the conditional distributions

(14)

Example: Generative story of naïve Bayes

Y P(Y)

X

1

P(X

1

| Y)

X

2

P(X

2

| Y)

X

3

P(X

3

| Y)

. . .

Given the label, sample the features independently from the conditional distributions

(15)

Example: Generative story of naïve Bayes

Y P(Y)

X

1

P(X

1

| Y)

X

2

P(X

2

| Y)

X

3

P(X

3

| Y)

X

d

P(X

d

| Y)

. . .

Given the label, sample the features independently from the conditional distributions

(16)

• Generative models

learn P(x, y)

– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)

– Eg: Naïve Bayes, Hidden Markov Model

• Discriminative models

learn P(y | x)

– Use model capacity to characterize the decision boundary only – Eg: Logistic Regression, Conditional models (several names), most

neural models

Generative vs Discriminative models

(17)

• Generative models

learn P(x, y)

– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)

– Eg: Naïve Bayes, Hidden Markov Model

• Discriminative models

learn P(y | x)

– Use model capacity to characterize the decision boundary only – Eg: Logistic Regression, Conditional models (several names), most

neural models

Generative vs Discriminative models

(18)

• Generative models

learn P(x, y)

– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)

– Eg: Naïve Bayes, Hidden Markov Model

• Discriminative models

learn P(y | x)

– Use model capacity to characterize the decision boundary only – Eg: Logistic Regression, Conditional models (several names), most

neural models

Generative vs Discriminative models

(19)

• Generative models

learn P(x, y)

– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)

– Eg: Naïve Bayes, Hidden Markov Model

• Discriminative models

learn P(y | x)

– Use model capacity to characterize the decision boundary only – Eg: Logistic Regression, Conditional models (several names), most

neural models

Generative vs Discriminative models

(20)

• Generative models

learn P(x, y)

– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)

– Eg: Naïve Bayes, Hidden Markov Model

• Discriminative models

learn P(y | x)

– Use model capacity to characterize the decision boundary only – Eg: Logistic Regression, Conditional models (several names), most

neural models

Generative vs Discriminative models

A generative model tries to characterize the distribution of the inputs, a

discriminative model doesn’t care

References

Related documents

The purpose of this integrative review is to present the state of evidence to healthcare providers regarding the effectiveness of DSMES and telehealth to improve health outcomes

ALL PARTIES RECEIVING THIS SUPPLEMENTAL EXHIBIT SHOULD LOCATE THEIR NAMES, CONTRACTS, AND OR LEASES IN THIS CHART TO DETERMINE IF THEIR LEASE OR CONTRACT IS BEING ASSUMED AND.

Berdasarkan latar belakang yang telah diuraikan diatas, maka rumusan masalah dalam penelitian ini yaitu Re Desain Laporan Keuangan Koperasi Berdasarkan Peraturan

A strong relationship has been reported between foliar isoprene emissions and the photochemical reflectance index (PRI) [ 9 , 10 ], a remotely sensed vegetation index based on the

Resumo – O objetivo deste trabalho foi avaliar a patogenicidade de Beauveria bassiana a ninfas de Diaphorina citri (Hemiptera: Psyllidae) e verificar a compatibilidade do

In short, this study seeks to bridge the knowledge gap5 by promoting a multi-objective optimisation model customised to the strategic and operational asset liability

Patients were defined as ‘ retained in care ’ if they were alive and known to be still receiving medical care at the time of the study, including those considered as “ stopped..

The objective of this study was to test the effects of feeding canola meal or DDGS on feed intake, milk production and composition, total-tract digest- ibility, and energy