
Data Mining

CS 341, Spring 2007

Lecture 6: Classification (issues, regression, Bayesian classification)

© Prentice Hall 2

Review:

• Decision Trees
• Neural networks

Data Mining Core Techniques

• Classification
• Clustering
• Association Rules

Classification Outline

• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.

Classification Outline

• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification

Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.

Classification Problem

• Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and a set of classes $C = \{C_1, \ldots, C_m\}$, the Classification Problem is to define a mapping $f: D \to C$ where each $t_i$ is assigned to one class.
• The mapping actually divides D into equivalence classes.
• Prediction is similar, but may be viewed as having an infinite number of classes.

Classification Examples

• Teachers classify students' grades as A, B, C, D, or F.
• Identify mushrooms as poisonous or edible.
• Predict when a river will flood.
• Identify individuals with credit risks.
• Speech recognition
• Pattern recognition

Classification Ex: Grading

• If x >= 90 then grade = A.
• If 80 <= x < 90 then grade = B.
• If 70 <= x < 80 then grade = C.
• If 60 <= x < 70 then grade = D.
• If x < 60 then grade = F.

[Figure: decision tree that tests x against 90, 80, 70, and 60 in turn; the leaves are A, B, C, D, and F.]
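
The grading rules can be written directly as a chain of threshold tests, which mirrors the decision tree on the slide. This is a minimal sketch; the function name `classify_grade` and the sample scores are illustrative and not part of the lecture.

```python
def classify_grade(x: float) -> str:
    """Map a numeric score to a letter grade using the slide's thresholds."""
    # Each test is reached only if all earlier (higher) thresholds failed,
    # so "x >= 80" here means 80 <= x < 90, and so on down the chain.
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"


# Illustrative scores, including values that sit exactly on a boundary.
grades = [classify_grade(score) for score in (95, 80, 79.5, 60, 12)]
print(grades)
```

Because the classes are predefined and the boundaries are fixed, no training data is needed for this classifier.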

Classification Ex: Letter Recognition

View letters as constructed from 5 components:

[Figure: the letters A, B, C, D, E, and F, each drawn from the five components.]

Classification Techniques

• Approach:
  1. Create a specific model by evaluating training data (or using domain experts' knowledge).
  2. Apply the model developed to new data.
• Classes must be predefined.
• Most common techniques use DTs, NNs, or are based on distances or statistical methods.

Defining Classes

[Figure: two panels, "Partitioning Based" and "Distance Based".]

Issues in Classification

• Missing Data
  – Ignore
  – Replace with assumed value
• Overfitting
  – Large set of training data
  – Filter out erroneous or noisy data
• Measuring Performance
  – Classification accuracy on test data
  – Confusion matrix
  – OC Curve

Classification Accuracy

• True positive (TP): $t_i$ is predicted to be in $C_j$ and is actually in it.
• False positive (FP): $t_i$ is predicted to be in $C_j$ but is not actually in it.
• True negative (TN): $t_i$ is not predicted to be in $C_j$ and is not actually in it.
• False negative (FN): $t_i$ is not predicted to be in $C_j$ but is actually in it.
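
The four outcomes can be counted for any single class C_j by comparing predicted and actual labels tuple by tuple. The sketch below assumes two parallel lists of labels; the function name `outcome_counts` and the five sample labels are hypothetical and chosen only to exercise each case.

```python
def outcome_counts(actual, predicted, target):
    """Count TP, FP, TN, FN with respect to one class `target` (C_j)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == target and a == target:
            counts["TP"] += 1   # predicted in C_j and actually in it
        elif p == target:
            counts["FP"] += 1   # predicted in C_j but not actually in it
        elif a == target:
            counts["FN"] += 1   # not predicted in C_j but actually in it
        else:
            counts["TN"] += 1   # not predicted in C_j and not in it
    return counts


# Hypothetical labels, not taken from the lecture data.
actual = ["tall", "short", "tall", "medium", "short"]
predicted = ["tall", "tall", "medium", "medium", "short"]
counts = outcome_counts(actual, predicted, "tall")
print(counts)
```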

Classification Performance

[Figure: four panels labelled True Positive, False Negative, False Positive, and True Negative.]

Confusion Matrix

• An m x m matrix.
• Entry $C_{i,j}$ indicates the number of tuples assigned to class $C_j$ whose correct class is $C_i$.
• The best solution will only have non-zero values on the diagonal.

Height Example Data

Name       Gender  Height  Output1  Output2
Kristina   F       1.6 m   Short    Medium
Jim        M       2 m     Tall     Medium
Maggie     F       1.9 m   Medium   Tall
Martha     F       1.88 m  Medium   Tall
Stephanie  F       1.7 m   Short    Medium
Bob        M       1.85 m  Medium   Medium
Kathy      F       1.6 m   Short    Medium
Dave       M       1.7 m   Short    Medium
Worth      M       2.2 m   Tall     Tall
Steven     M       2.1 m   Tall     Tall
Debbie     F       1.8 m   Medium   Medium
Todd       M       1.95 m  Medium   Medium
Kim        F       1.9 m   Medium   Tall
Amy        F       1.8 m   Medium   Medium
Wynette    F       1.75 m  Medium   Medium

Confusion Matrix Example

Using the height example data, with Output1 as the correct membership and Output2 as the actual assignment:

Actual            Assignment
Membership   Short  Medium  Tall
Short        0      4       0
Medium       0      5       3
Tall         0      1       2
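
The matrix above can be rebuilt by counting (Output1, Output2) pairs from the height example data. This is a sketch: the pairs are transcribed from the table in row order, and the names `PAIRS` and `confusion_matrix` are illustrative.

```python
from collections import Counter

CLASSES = ["Short", "Medium", "Tall"]

# (Output1 = correct class, Output2 = assigned class), one pair per person,
# in the order Kristina, Jim, Maggie, ..., Wynette.
PAIRS = [
    ("Short", "Medium"), ("Tall", "Medium"), ("Medium", "Tall"),
    ("Medium", "Tall"), ("Short", "Medium"), ("Medium", "Medium"),
    ("Short", "Medium"), ("Short", "Medium"), ("Tall", "Tall"),
    ("Tall", "Tall"), ("Medium", "Medium"), ("Medium", "Medium"),
    ("Medium", "Tall"), ("Medium", "Medium"), ("Medium", "Medium"),
]


def confusion_matrix(pairs, classes):
    """Rows are the correct class C_i, columns the assigned class C_j."""
    counts = Counter(pairs)
    return [[counts[(correct, assigned)] for assigned in classes]
            for correct in classes]


matrix = confusion_matrix(PAIRS, CLASSES)
# Accuracy is the share of tuples on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(CLASSES))) / len(PAIRS)

for name, row in zip(CLASSES, matrix):
    print(f"{name:<7}", row)
print("accuracy:", round(accuracy, 3))
```

Only 7 of the 15 tuples fall on the diagonal, so Output2 agrees with Output1 less than half the time.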

[Figure: Operating Characteristic Curve]

Classification Outline

• Classification Problem Overview
• Classification Techniques
  – Regression
  – Distance
  – Decision Trees
  – Rules
  – Neural Networks

Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.

Regression

• Assume data fits a predefined function.
• Determine best values for parameters in the model.
• Estimate an output value based on input values.
• Can be used for classification and prediction.

Linear Regression

• Assume the relation of the output variable to the input variables is a linear function of some parameters.
• Determine the best values for the regression coefficients $c_0, c_1, \ldots, c_n$.
• Assume an error term: $y = c_0 + c_1 x_1 + \cdots + c_n x_n + \epsilon$.
• Estimate the error using the mean squared error over the training set.
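
For reference, the mean squared error is usually defined as below. The symbol $N$ (the number of training tuples) and the double index $x_{ik}$ (input $k$ of tuple $i$) are notation assumed here, not taken from the slide.

$$
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - (c_0 + c_1 x_{i1} + \cdots + c_n x_{in}) \bigr)^2
$$

The best regression coefficients are the ones that make this quantity as small as possible.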

Example 4.3

• $y = c_0 + \epsilon$
• Find the value for $c_0$ that best partitions the height values into two classes: short and medium.
• The training data for $y_i$ is {1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75}.
• How?
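
One way to answer "How?", assuming the squared-error criterion from the Linear Regression slide: for the constant model y = c0 + ε, the sum of squared errors is minimized when c0 is the mean of the training values. The sketch below computes that value and then uses it as a dividing line; treating heights at or below c0 as short is an assumption made here for illustration.

```python
# Training heights from Example 4.3 (the short and medium tuples).
HEIGHTS = [1.6, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 1.8, 1.95, 1.9, 1.8, 1.75]


def sse(c0, ys):
    """Sum of squared errors of the constant model y = c0."""
    return sum((y - c0) ** 2 for y in ys)


# The least-squares estimate of a constant is the sample mean.
c0 = sum(HEIGHTS) / len(HEIGHTS)

# Assumed reading of "partition": at or below c0 is short, above is medium.
short = [y for y in HEIGHTS if y <= c0]
medium = [y for y in HEIGHTS if y > c0]

print("c0 =", round(c0, 3))
print("short:", sorted(short))
print("medium:", sorted(medium))
```

The resulting split does not reproduce the Output1 labels exactly: 1.75 m falls on the short side even though Output1 labels it Medium.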

Example 4.4

• $y = c_0 + c_1 x_1 + \epsilon$
• Find the values for $c_0$ and $c_1$ that best predict the class.
• Assume 0 for the short class, 1 for the medium class.
• The training data for $(x_i, y_i)$, with $y_i$ taken from the Output1 column of the height example data, is
  {(1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0), (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1)}.
• How?
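
A possible answer to "How?" is an ordinary least-squares fit of the line y = c0 + c1·x1. The sketch below uses the closed-form solution for a single input variable. The class labels are read off the Output1 column of the height example data (0 = short, 1 = medium), and the 0.5 cut-off used to turn the fitted value into a class is an assumption, not something the slide specifies.

```python
# (height, class) pairs: 0 = short, 1 = medium, per the Output1 column.
POINTS = [
    (1.6, 0), (1.9, 1), (1.88, 1), (1.7, 0), (1.85, 1), (1.6, 0),
    (1.7, 0), (1.8, 1), (1.95, 1), (1.9, 1), (1.8, 1), (1.75, 1),
]


def fit_line(points):
    """Closed-form least squares for y = c0 + c1 * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


c0, c1 = fit_line(POINTS)


def predict_class(x, cutoff=0.5):
    """Assumed decision rule: fitted value at or above the cutoff is medium."""
    return "medium" if c0 + c1 * x >= cutoff else "short"


print("c0 =", round(c0, 3), "c1 =", round(c1, 3))
print(predict_class(1.6), predict_class(1.95))
```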

[Figure: Linear Regression Poor Fit]

Classification Using Regression

• Division: Use regression function to divide area into regions.
• Prediction: Use regression function to predict a class membership function.

[Figure: Division]

[Figure: Prediction]

Logistic Regression

• A generalized linear model.
• Extensively used in the medical and social sciences.
• It has the following form:
  $\log_e\left(\dfrac{p}{1-p}\right) = c_0 + c_1 x_1 + \cdots + c_k x_k$
  – p is the probability of being in the class; 1 − p is the probability of not being in it.
  – The parameters $c_0, c_1, \ldots, c_k$ are usually estimated by maximum likelihood (maximizing the probability of observing the given values).

Why Logistic Regression

• p is in the range [0, 1].
  – A good model would like to have p values close to 0 or 1.
• A linear function is not suitable for p, since its output is unbounded.
• Consider the odds p/(1 − p).
  – As p increases, the odds p/(1 − p) increase.
  – The odds lie in the range [0, +∞), which is asymmetric.
  – The log odds lie in the range −∞ to +∞, which is symmetric.

[Figure: Linear Regression vs. Logistic Regression]

Classification Outline

• Classification Problem Overview
• Classification Techniques
  – Regression
  – Bayesian classification

Goal: Provide an overview of the classification problem and introduce some of the basic algorithms.

Bayes Theorem

• Posterior probability: $P(h_1 \mid x_i)$
• Prior probability: $P(h_1)$
• Bayes Theorem: $P(h_1 \mid x_i) = \dfrac{P(x_i \mid h_1)\,P(h_1)}{P(x_i)}$
• Assign probabilities of hypotheses given a data value.

Naïve Bayes Classification

• Assume that the contributions of all attributes are independent and that each contributes equally to the classification problem.
• $t_i$ has m independent attributes $\{x_{i1}, \ldots, x_{im}\}$.
  $P(t_i \mid C_j) = \prod_{k=1}^{m} P(x_{ik} \mid C_j)$

Example: using Output1 as the classification result

(The training data are the 15 tuples of the Height Example Data table above.)

Example 4.5

• Step 1: Calculate the prior probability.
  – P(short) = 4/15 = 0.267
  – P(medium) = 8/15 = 0.533
  – P(tall) = 3/15 = 0.2
• Step 2: Calculate the conditional probability.
  – $P(\mathrm{Gender}_i \mid C_j)$, where $\mathrm{Gender}_i$ = F or M and $C_j$ = short, medium, or tall.
  – $P(\mathrm{Height}_i \mid C_j)$, where $\mathrm{Height}_i$ is in (0, 1.6], (1.6, 1.7], (1.7, 1.8], (1.8, 1.9], (1.9, 2.0], or (2.0, ∞).

Example 4.5 (cont'd)

                         Count                 Probability p(x_i|C_j)
Attribute  Value      short  medium  tall     short  medium  tall
Gender     M          1      2       3        1/4    2/8     3/3
           F          3      6       0        3/4    6/8     0/3
Height     (0,1.6]    2      0       0        2/4    0       0
           (1.6,1.7]  2      0       0        2/4    0       0
           (1.7,1.8]  0      3       0        0      3/8     0
           (1.8,1.9]  0      4       0        0      4/8     0
           (1.9,2.0]  0      1       1        0      1/8     1/3
           (>2.0)     0      0       2        0      0       2/3

Example 4.5 (cont'd)

• Given a tuple t = {Adam, M, 1.95 m}
• Step 3: Calculate $P(t \mid C_j)$.
  – P(t|short) = 1/4 x 0 = 0
  – P(t|medium) = 2/8 x 1/8 = 0.031
  – P(t|tall) = 3/3 x 1/3 = 0.333
• Step 4: Calculate P(t).
  – P(t) = P(t|short)P(short) + P(t|medium)P(medium) + P(t|tall)P(tall) = 0.0826

Example 4.5 (cont'd)

• Step 5: Calculate $P(C_j \mid t)$ using Bayes Rule.
  – P(short|t) = P(t|short)P(short)/P(t) = 0
  – P(medium|t) = 0.2
  – P(tall|t) = 0.799
• Last step: classify t based on these probabilities.
  – Classify the new tuple as tall.

A Summary

• Step 1: Calculate the prior probability of each class, $P(C_j)$.
• Step 2: Calculate the conditional probability for each attribute value, e.g. $P(\mathrm{Gender}_i \mid C_j)$.
• Step 3: Calculate the conditional probability $P(t \mid C_j)$.
• Step 4: Calculate the prior probability of a tuple, $P(t)$.
• Step 5: Calculate the posterior probability for each class given the tuple, $P(C_j \mid t)$, using Bayes Rule.
• Step 6: Classify a tuple based on $P(C_j \mid t)$: the tuple belongs to the class that has the highest posterior probability.
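
The six steps can be sketched end to end on the height example data, using Output1 as the class and the height ranges from Example 4.5. The names `TRAINING`, `height_bin`, and `posteriors` are illustrative. The code keeps exact fractions until the end, so it gives P(t) ≈ 0.0833 and P(tall|t) = 0.8; the 0.0826 and 0.799 in Example 4.5 presumably come from rounding the intermediate values.

```python
from collections import Counter

# (gender, height in metres, Output1 class) for the 15 training tuples.
TRAINING = [
    ("F", 1.6, "Short"), ("M", 2.0, "Tall"), ("F", 1.9, "Medium"),
    ("F", 1.88, "Medium"), ("F", 1.7, "Short"), ("M", 1.85, "Medium"),
    ("F", 1.6, "Short"), ("M", 1.7, "Short"), ("M", 2.2, "Tall"),
    ("M", 2.1, "Tall"), ("F", 1.8, "Medium"), ("M", 1.95, "Medium"),
    ("F", 1.9, "Medium"), ("F", 1.8, "Medium"), ("F", 1.75, "Medium"),
]

# Upper bounds of (0,1.6], (1.6,1.7], ..., (1.9,2.0]; anything larger is >2.0.
BIN_EDGES = [1.6, 1.7, 1.8, 1.9, 2.0]


def height_bin(height):
    """Index of the half-open range (lower, upper] that contains height."""
    for index, upper in enumerate(BIN_EDGES):
        if height <= upper:
            return index
    return len(BIN_EDGES)


def posteriors(gender, height, rows=TRAINING):
    """Return P(C_j | t) for every class, following steps 1 to 5."""
    class_counts = Counter(label for _, _, label in rows)
    gender_counts = Counter((g, label) for g, _, label in rows)
    height_counts = Counter((height_bin(h), label) for _, h, label in rows)

    joint = {}
    for label, count in class_counts.items():
        prior = count / len(rows)                                   # step 1
        p_gender = gender_counts[(gender, label)] / count           # step 2
        p_height = height_counts[(height_bin(height), label)] / count
        joint[label] = p_gender * p_height * prior                  # step 3
    evidence = sum(joint.values())                                  # step 4
    # Step 5. Note: evidence is 0 for attribute combinations never seen
    # in any class; this sketch does not handle that case.
    return {label: value / evidence for label, value in joint.items()}


result = posteriors("M", 1.95)             # the tuple for Adam
predicted = max(result, key=result.get)    # step 6
print(result, "->", predicted)
```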

Next Lecture:

• Classification:
  – Distance-based algorithms
  – Decision tree-based algorithms
• HW2 will be announced!
