18-1
Chapter Eighteen
Discriminant and Logit Analysis
18-2
Direct marketing scenario
• Every year, the DM company sends one annual catalog, four seasonal catalogs, and a number of catalogs for holiday seasons.
• The company has a list of 10 million potential buyers.
• The response rate is on average 5%.
• Who should receive a catalog? (most likely to buy)
• How are buyers different from nonbuyers?
• Who are most likely to default?...
18-3
Similarities and Differences between ANOVA, Regression, and Discriminant Analysis
Table 18.1
                                     ANOVA        REGRESSION   DISCRIMINANT/LOGIT
Similarities
  Number of dependent variables      One          One          One
  Number of independent variables    Multiple     Multiple     Multiple
Differences
  Nature of the dependent variable   Metric       Metric       Categorical
  Nature of the independent variables  Categorical  Metric     Metric (or binary, dummies)
18-4
Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the
predictor or independent variables are interval in nature.
The objectives of discriminant analysis are as follows:
• Development of discriminant functions, or linear combinations of the predictor or independent variables, that will best discriminate between the categories of the criterion or dependent variable (groups), e.g., buyers vs. nonbuyers.
• Examination of whether significant differences exist among the groups in terms of the predictor variables.
• Determination of which predictor variables contribute most to the intergroup differences.
• Classification of cases to one of the groups based on the values of the predictor variables.
• Evaluation of the accuracy of classification.
18-5
Discriminant Analysis
• When the criterion variable has two categories, the technique is known as two-group discriminant analysis.
• When three or more categories are involved, the technique is referred to as multiple discriminant analysis.
• The main distinction is that in the two-group case it is possible to derive only one discriminant function, whereas in multiple discriminant analysis more than one function may be computed. In general, with G groups and k predictors, it is possible to estimate up to the smaller of G - 1 or k discriminant functions.
• The first function has the highest ratio of between-groups to within-groups sum of squares. The second function, uncorrelated with the first, has the second highest ratio, and so on. However, not all the functions may be statistically significant.
18-6
Geometric Interpretation
Fig. 18.1 [figure]: Cases from groups 1 and 2 plotted on predictors X1 and X2. Projecting the cases onto the discriminant axis D separates the two group distributions, G1 and G2, better than either original axis does.
18-7
Discriminant Analysis Model
The discriminant analysis model involves linear combinations of the following form:

D = b0 + b1X1 + b2X2 + b3X3 + . . . + bkXk

where:
D = discriminant score
b's = discriminant coefficients or weights
X's = predictor or independent variables
• The coefficients, or weights (b), are estimated so that the groups differ as much as possible on the values of the discriminant function.
• This occurs when the ratio of between-group sum of squares to within-group sum of squares for the discriminant scores is at a maximum.
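The weight-estimation idea can be sketched numerically. For two groups, Fisher's solution sets the weights proportional to Sw^-1 (mean1 - mean2), where Sw is the pooled within-group sums-of-squares and cross-products matrix; this choice maximizes the between- to within-group sum-of-squares ratio. A minimal sketch, using made-up two-predictor data rather than the textbook sample:

```python
# Illustrative sketch of a two-group discriminant function (toy data,
# not the textbook's resort-visit sample).
# Fisher's solution: weights proportional to Sw^-1 (mean1 - mean2).

g1 = [(6.0, 4.0), (7.0, 5.0), (6.5, 3.5), (7.5, 4.5)]   # group 1 cases
g2 = [(2.0, 1.0), (3.0, 2.0), (2.5, 0.5), (1.5, 1.5)]   # group 2 cases

def mean(cases):
    n = len(cases)
    return tuple(sum(c[j] for c in cases) / n for j in range(2))

def sscp(cases, m):
    # within-group sums of squares and cross-products about the group mean
    sxx = sum((c[0] - m[0]) ** 2 for c in cases)
    syy = sum((c[1] - m[1]) ** 2 for c in cases)
    sxy = sum((c[0] - m[0]) * (c[1] - m[1]) for c in cases)
    return sxx, syy, sxy

m1, m2 = mean(g1), mean(g2)
a1, a2 = sscp(g1, m1), sscp(g2, m2)
sxx, syy, sxy = a1[0] + a2[0], a1[1] + a2[1], a1[2] + a2[2]  # pooled Sw

# b = Sw^-1 (mean1 - mean2), written out for the 2x2 case
det = sxx * syy - sxy * sxy
d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
b = ((syy * d0 - sxy * d1) / det, (sxx * d1 - sxy * d0) / det)

score = lambda c: b[0] * c[0] + b[1] * c[1]
cutoff = (score(m1) + score(m2)) / 2   # midpoint rule for equal group sizes
hits = all(score(c) > cutoff for c in g1) and all(score(c) < cutoff for c in g2)
```

Standard packages (SPSS, or scikit-learn's LinearDiscriminantAnalysis) produce the same direction of weights up to an arbitrary scaling.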
18-8
Statistics Associated with Discriminant Analysis
• Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define group membership.
• Centroid. The centroid is the mean value of the discriminant scores for a particular group; there is one centroid for each group. The means for a group on all the functions are the group centroids.
• Classification matrix. Sometimes also called the confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
18-9
Statistics Associated with Discriminant Analysis
• Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of the variables when the variables are in their original units of measurement.
• Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables; these products are summed and added to the constant term to obtain the discriminant scores.
• Eigenvalue. For each discriminant function, the eigenvalue is the ratio of between-group to within-group sums of squares. Large eigenvalues imply superior functions.
18-10
Statistics Associated with Discriminant Analysis
• F values and their significance. These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
• Group means and group standard deviations. These are computed for each predictor for each group.
• Pooled within-group correlation matrix. The pooled within-group correlation matrix is computed by averaging the separate covariance matrices for all the groups.
18-11
Statistics Associated with Discriminant Analysis
• Standardized discriminant function coefficients. The standardized discriminant function coefficients are used as the multipliers when the variables have been standardized to a mean of 0 and a variance of 1.
• Structure correlations. Also referred to as discriminant loadings, the structure correlations represent the simple correlations between the predictors and the discriminant function.
• Total correlation matrix. If the cases are treated as if they were from a single sample and the correlations computed, a total correlation matrix is obtained.
• Wilks' λ. Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values of λ (near 1) indicate that the group means do not seem to be different; small values of λ (near 0) indicate that the group means seem to be different.
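For a single predictor with two groups, Wilks' λ and the univariate F ratio carry the same information: F = [(1 - λ)/λ] × (n - g)/(g - 1). A quick check against the INCOME row of Table 18.4:

```python
# Relation between Wilks' lambda and the univariate F ratio for one predictor:
#   F = ((1 - lambda) / lambda) * (n - g) / (g - 1)
# Checked against the INCOME row of Table 18.4 (n = 30 cases, g = 2 groups,
# reported lambda = 0.45310, reported F = 33.800 with 1 and 28 df).
lam = 0.45310
n, g = 30, 2
F = ((1 - lam) / lam) * (n - g) / (g - 1)   # about 33.80, matching the table
```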
18-12
Conducting Discriminant Analysis
Fig. 18.2
Formulate the Problem
Estimate the Discriminant Function Coefficients
Determine the Significance of the Discriminant Function
Interpret the Results
Assess Validity of Discriminant Analysis
18-13
Conducting Discriminant Analysis Formulate the Problem
• Identify the objectives, the criterion variable, and the independent variables.
• The criterion variable must consist of two or more mutually exclusive and collectively exhaustive categories.
• The predictor variables should be selected based on a theoretical model, previous research, or the experience of the researcher.
• One part of the sample, called the estimation or analysis sample, is used for estimation of the discriminant function.
• The other part, called the holdout or validation sample, is reserved for validating the discriminant function.
• Often the distribution of the number of cases in the analysis and validation samples follows the distribution in the total sample.
18-14
Information on Resort Visits: Analysis Sample
Table 18.2 (columns: No., Resort Visit, Annual Family Income in $000, Attitude Toward Travel, Importance Attached to Family Vacation, Household Size, Age of Head of Household, Amount Spent on Family Vacation)
No.  Visit  Income  Travel  Vacation  HSize  Age  Amount
1 1 50.2 5 8 3 43 M (2)
2 1 70.3 6 7 4 61 H (3)
3 1 62.9 7 5 6 52 H (3)
4 1 48.5 7 5 5 36 L (1)
5 1 52.7 6 6 4 55 H (3)
6 1 75.0 8 7 5 68 H (3)
7 1 46.2 5 3 3 62 M (2)
8 1 57.0 2 4 6 51 M (2)
9 1 64.1 7 5 4 57 H (3)
10 1 68.1 7 6 5 45 H (3)
11 1 73.4 6 7 5 44 H (3)
12 1 71.9 5 8 4 64 H (3)
13 1 56.2 1 8 6 54 M (2)
14 1 49.3 4 2 3 56 H (3)
15 1 62.0 5 6 2 58 H (3)
18-15
Information on Resort Visits: Analysis Sample
Table 18.2, cont.
No.  Visit  Income  Travel  Vacation  HSize  Age  Amount
16 2 32.1 5 4 3 58 L (1)
17 2 36.2 4 3 2 55 L (1)
18 2 43.2 2 5 2 57 M (2)
19 2 50.4 5 2 4 37 M (2)
20 2 44.1 6 6 3 42 M (2)
21 2 38.3 6 6 2 45 L (1)
22 2 55.0 1 2 2 57 M (2)
23 2 46.1 3 5 3 51 L (1)
24 2 35.0 6 4 5 64 L (1)
25 2 37.3 2 7 4 54 L (1)
26 2 41.8 5 1 3 56 M (2)
27 2 57.0 8 3 2 36 M (2)
28 2 33.4 6 8 2 50 L (1)
29 2 37.5 3 2 3 48 L (1)
30 2 41.3 3 3 2 42 L (1)
18-16
Information on Resort Visits:
Holdout Sample
Table 18.3 (columns as in Table 18.2)
No.  Visit  Income  Travel  Vacation  HSize  Age  Amount
1 1 50.8 4 7 3 45 M(2)
2 1 63.6 7 4 7 55 H (3)
3 1 54.0 6 7 4 58 M(2)
4 1 45.0 5 4 3 60 M(2)
5 1 68.0 6 6 6 46 H (3)
6 1 62.1 5 6 3 56 H (3)
7 2 35.0 4 3 4 54 L (1)
8 2 49.6 5 3 5 39 L (1)
9 2 39.4 6 5 3 44 H (3)
10 2 37.0 2 6 5 51 L (1)
11 2 54.5 7 3 3 37 M(2)
12 2 38.2 2 2 3 49 L (1)
18-17
Conducting Discriminant Analysis
Estimate the Discriminant Function Coefficients
• The direct method involves estimating the discriminant function so that all the predictors are included simultaneously.
• In stepwise discriminant analysis, the predictor variables are entered sequentially, based on their ability to discriminate among groups.
18-18
Results of Two-Group Discriminant Analysis
Table 18.4

Group Means
VISIT   INCOME    TRAVEL   VACATION  HSIZE    AGE
1       60.52000  5.40000  5.80000   4.33333  53.73333
2       41.91333  4.33333  4.06667   2.80000  50.13333
Total   51.21667  4.86667  4.93333   3.56667  51.93333

Group Standard Deviations
1       9.83065   1.91982  1.82052   1.23443  8.77062
2       7.55115   1.95180  2.05171   0.94112  8.27101
Total   12.79523  1.97804  2.09981   1.33089  8.57395

Pooled Within-Groups Correlation Matrix
          INCOME    TRAVEL    VACATION  HSIZE     AGE
INCOME    1.00000
TRAVEL    0.19745   1.00000
VACATION  0.09148   0.08434   1.00000
HSIZE     0.08887  -0.01681   0.07046   1.00000
AGE      -0.01431  -0.19709   0.01742  -0.04301   1.00000

Wilks' λ (U-statistic) and univariate F ratio with 1 and 28 degrees of freedom
Variable   Wilks' λ   F        Significance
INCOME     0.45310    33.800   0.0000
TRAVEL     0.92479    2.277    0.1425
VACATION   0.82377    5.990    0.0209
HSIZE      0.65672    14.640   0.0007
AGE        0.95441    1.338    0.2572
Cont.
18-19
Results of Two-Group Discriminant Analysis
Table 18.4, cont.

CANONICAL DISCRIMINANT FUNCTIONS
Function   Eigenvalue   % of Variance   Cum %     Canonical Correlation
1*         1.7862       100.00          100.00    0.8007
* Marks the 1 canonical discriminant function remaining in the analysis.

Test of Function(s)
After Function   Wilks' λ   Chi-square   df   Significance
0                0.3589     26.130       5    0.0001

Standardized Canonical Discriminant Function Coefficients
           FUNC 1
INCOME     0.74301
TRAVEL     0.09611
VACATION   0.23329
HSIZE      0.46911
AGE        0.20922

Structure Matrix:
Pooled within-groups correlations between discriminating variables and canonical discriminant functions (variables ordered by size of correlation within function)
           FUNC 1
INCOME     0.82202
HSIZE      0.54096
VACATION   0.34607
TRAVEL     0.21337
AGE        0.16354
Cont.
18-20
Results of Two-Group Discriminant Analysis
Table 18.4, cont.
Unstandardized Canonical Discriminant Function Coefficients
            FUNC 1
INCOME      0.08476710
TRAVEL      0.04964455
VACATION    0.1202813
HSIZE       0.4273893
AGE         0.02454380
(constant) -7.975476

Canonical discriminant functions evaluated at group means (group centroids)
Group   FUNC 1
1        1.29118
2       -1.29118

Classification results for cases selected for use in the analysis
                              Predicted Group Membership
Actual Group   No. of Cases   1            2
Group 1        15             12 (80.0%)    3 (20.0%)
Group 2        15              0 (0.0%)    15 (100.0%)
Percent of grouped cases correctly classified: 90.00%
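The group centroids can be reproduced by evaluating the unstandardized function at the group means. Using the Table 18.4 coefficients and the group 1 means:

```python
# Discriminant score for group 1 evaluated at its group means (Table 18.4);
# the result reproduces the reported group 1 centroid of 1.29118.
b = {"INCOME": 0.08476710, "TRAVEL": 0.04964455, "VACATION": 0.1202813,
     "HSIZE": 0.4273893, "AGE": 0.02454380}
constant = -7.975476
group1_means = {"INCOME": 60.52000, "TRAVEL": 5.40000, "VACATION": 5.80000,
                "HSIZE": 4.33333, "AGE": 53.73333}

centroid1 = constant + sum(b[v] * group1_means[v] for v in b)
```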
18-21
Results of Two-Group Discriminant Analysis
Table 18.4, cont.
Classification results for cases not selected for use in the analysis (holdout sample)
                              Predicted Group Membership
Actual Group   No. of Cases   1           2
Group 1        6              4 (66.7%)   2 (33.3%)
Group 2        6              0 (0.0%)    6 (100.0%)
Percent of grouped cases correctly classified: 83.33%
18-22
Conducting Discriminant Analysis Interpret the Results
• The interpretation of the discriminant weights, or coefficients, is similar to that in multiple regression analysis.
• Given the multicollinearity in the predictor variables, there is no unambiguous measure of the relative importance of the predictors in discriminating between the groups.
• With this caveat in mind, we can obtain some idea of the relative importance of the variables by examining the absolute magnitude of the standardized discriminant function coefficients.
• Some idea of the relative importance of the predictors can also be obtained by examining the structure correlations, also called canonical loadings or discriminant loadings. These simple correlations between each predictor and the discriminant function represent the variance that the predictor shares with the function.
• Another aid to interpreting discriminant analysis results is to develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.
18-23
Conducting Discriminant Analysis
Assess Validity of Discriminant Analysis
• Many computer programs, such as SPSS, offer a leave-one-out cross-validation option.
• The discriminant weights, estimated by using the analysis sample, are multiplied by the values of the predictor variables in the holdout sample to generate discriminant scores for the cases in the holdout sample. The cases are then assigned to groups based on their discriminant scores and an appropriate decision rule. The hit ratio, or the percentage of cases correctly classified, can then be determined by summing the diagonal elements and dividing by the total number of cases.
• It is helpful to compare the percentage of cases correctly classified by discriminant analysis to the percentage that would be obtained by chance. Classification accuracy achieved by discriminant analysis should be at least 25% greater than that obtained by chance.
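For the two-group resort example, with 15 cases per group in the analysis sample, the chance benchmark and the 25%-better rule work out as follows:

```python
# Chance benchmark for the two-group resort example (15 cases per group).
p1, p2 = 15 / 30, 15 / 30               # group proportions in the analysis sample
proportional_chance = p1**2 + p2**2     # 0.50 for an even split
threshold = 1.25 * proportional_chance  # "at least 25% greater than chance"

hit_ratio = 0.90                        # 90% from Table 18.4
acceptable = hit_ratio >= threshold     # 0.90 >= 0.625, so the rule is met
```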
18-24
Results of Three-Group Discriminant Analysis
Table 18.5
Group Means
AMOUNT INCOME TRAVEL VACATION HSIZE AGE
1 38.57000 4.50000 4.70000 3.10000 50.30000
2 50.11000 4.00000 4.20000 3.40000 49.50000
3 64.97000 6.10000 5.90000 4.20000 56.00000
Total 51.21667 4.86667 4.93333 3.56667 51.93333
Group Standard Deviations
1 5.29718 1.71594 1.88856 1.19722 8.09732
2 6.00231 2.35702 2.48551 1.50555 9.25263
3 8.61434 1.19722 1.66333 1.13529 7.60117
Total 12.79523 1.97804 2.09981 1.33089 8.57395
Pooled Within-Groups Correlation Matrix
INCOME TRAVEL VACATION HSIZE AGE
INCOME 1.00000
TRAVEL 0.05120 1.00000
VACATION 0.30681 0.03588 1.00000
HSIZE 0.38050 0.00474 0.22080 1.00000
AGE -0.20939 -0.34022 -0.01326 -0.02512 1.00000 Cont.
18-25
All-Groups Scattergram
Fig. 18.3 [figure]: Cases from the three groups plotted on discriminant function 1 (horizontal, -6.0 to 6.0) and function 2 (vertical, -4.0 to 4.0); an asterisk indicates each group centroid.
18-26
Territorial Map
Fig. 18.4 [figure]: Territorial map on discriminant function 1 (horizontal, -6.0 to 6.0) and function 2 (vertical, -8.0 to 8.0). Boundaries of digits mark the regions into which cases are classified as groups 1, 2, or 3; an asterisk indicates each group centroid.
18-27
The Logit Model
• The dependent variable is binary, and there are several independent variables that are metric.
• The binary logit model commonly deals with how likely an observation is to belong to each group (classification and prediction, e.g., buyer vs. nonbuyer).
• It estimates the probability of an observation belonging to a particular group.
18-28
18-29
Binary Logit Model Formulation
The probability of success may be modeled using the logit model as:

log_e [P / (1 - P)] = Σ (i = 0 to k) a_i X_i

or

log [P / (1 - P)] = a_0 + a_1 X_1 + a_2 X_2 + . . . + a_k X_k

18-30
Model Formulation

P = exp(Σ (i = 0 to k) a_i X_i) / [1 + exp(Σ (i = 0 to k) a_i X_i)]

where:
P   = probability of success
X_i = independent variable i
a_i = parameter to be estimated
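Plugging fitted coefficients into this formula gives an estimated probability for any respondent. The sketch below uses the coefficients later reported in Table 18.7; the respondent profile is hypothetical, chosen only for illustration:

```python
import math

# Estimated probability of brand loyalty using the fitted coefficients
# from Table 18.7. The respondent profile below is hypothetical.
a0, a_brand, a_product, a_shopping = -8.642, 1.274, 0.186, 0.590
brand, product, shopping = 6, 4, 4

logit = a0 + a_brand * brand + a_product * product + a_shopping * shopping
P = math.exp(logit) / (1 + math.exp(logit))  # constrained to lie in (0, 1)
loyal = P > 0.5   # classify with the .500 cut value used in Table 18.7
```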
18-31
Properties of the Logit Model
• Although X_i may vary from -∞ to +∞, P is constrained to lie between 0 and 1.
• When X_i approaches -∞, P approaches 0.
• When X_i approaches +∞, P approaches 1.
• When OLS regression is used, P is not constrained to lie between 0 and 1.
18-32
Estimation and Model Fit
• The estimation procedure is called the maximum likelihood method.
• Fit: Cox & Snell R Square and Nagelkerke R Square.
• Both these measures are similar to R² in multiple regression.
• The Cox & Snell R Square cannot equal 1.0, even if the fit is perfect.
• This limitation is overcome by the Nagelkerke R Square.
• Compare predicted and actual values of Y to determine the percentage of correct predictions.
18-33
Significance Testing
The significance of the estimated coefficients is based on Wald's statistic.

Wald = (a_i / SE_a_i)²

where:
a_i    = logistic coefficient for that predictor variable
SE_a_i = standard error of the logistic coefficient

The Wald statistic is chi-square distributed with 1 degree of freedom if the variable is metric, and with the number of categories minus 1 if the variable is nonmetric.
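The Wald values reported in output such as Table 18.7 can be verified directly; for the Brand coefficient:

```python
# Wald statistic for the Brand coefficient in Table 18.7:
# B = 1.274, S.E. = .479, reported Wald = 7.075 (difference is rounding).
b, se = 1.274, 0.479
wald = (b / se) ** 2
```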
18-34
Interpretation of Coefficients
• If X_i is increased by one unit, the log odds will change by a_i units, when the effect of the other independent variables is held constant.
• The sign of a_i determines whether the probability increases (if the sign is positive) or decreases (if the sign is negative) as X_i increases.
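In output such as Table 18.7, this effect is reported as Exp(B), an odds ratio: a one-unit increase in X_i multiplies the odds of success by e^(a_i). Checking the Brand row:

```python
import math

# Exp(B) in Table 18.7: a one-unit increase in Brand attitude (B = 1.274)
# multiplies the odds of loyalty by e^1.274, the reported odds ratio 3.575.
b_brand = 1.274
odds_ratio = math.exp(b_brand)

# Equivalently, the log odds rise by b_brand per unit of the predictor.
odds_before = 1.0                    # hypothetical baseline odds
odds_after = odds_before * odds_ratio
```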
18-35
Explaining Brand Loyalty
Table 18.6
No.  Loyalty  Brand  Product  Shopping
1    1        4      3        5
2    1        6      4        4
3 1 5 2 4
4 1 7 5 5
5 1 6 3 4
6 1 3 4 5
7 1 5 5 5
8 1 5 4 2
9 1 7 5 4
10 1 7 6 4
11 1 6 7 2
12 1 5 6 4
13 1 7 3 3
14 1 5 1 4
15 1 7 5 5
16 0 3 1 3
17 0 4 6 2
18 0 2 5 2
19 0 5 2 4
20 0 4 1 3
21 0 3 3 4
22 0 3 4 5
23 0 3 6 3
24 0 4 4 2
25 0 6 3 6
26 0 3 6 3
27 0 4 3 2
28 0 3 5 2
29 0 5 5 3
30 0 1 3 2
18-36
Results of Logistic Regression
Table 18.7
Dependent Variable Encoding
Original Value   Internal Value
Not Loyal        0
Loyal            1

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      23.471(a)           .453                   .604
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
18-37
Results of Logistic Regression
Table 18.7, cont.
Variables in the Equation(a)
Step 1      B        S.E.    Wald    df   Sig.   Exp(B)
Brand       1.274    .479    7.075   1    .008   3.575
Product     .186     .322    .335    1    .563   1.205
Shopping    .590     .491    1.442   1    .230   1.804
Constant   -8.642    3.346   6.672   1    .010   .000
a. Variable(s) entered on step 1: Brand, Product, Shopping.

Classification Table(a)
                                      Predicted
                                      Loyalty to the Brand      Percentage
Observed                              Not Loyal    Loyal        Correct
Step 1   Loyalty to     Not Loyal     12           3            80.0
         the Brand      Loyal          3          12            80.0
         Overall Percentage                                     80.0
a. The cut value is .500
18-38
Results of Three-Group Discriminant Analysis: Multinomial Logistic Regression
Table 18.5
Group Means
AMOUNT INCOME TRAVEL VACATION HSIZE AGE
1 38.57000 4.50000 4.70000 3.10000 50.30000
2 50.11000 4.00000 4.20000 3.40000 49.50000
3 64.97000 6.10000 5.90000 4.20000 56.00000
Total 51.21667 4.86667 4.93333 3.56667 51.93333
Group Standard Deviations
1 5.29718 1.71594 1.88856 1.19722 8.09732
2 6.00231 2.35702 2.48551 1.50555 9.25263
3 8.61434 1.19722 1.66333 1.13529 7.60117
Total 12.79523 1.97804 2.09981 1.33089 8.57395
Pooled Within-Groups Correlation Matrix
INCOME TRAVEL VACATION HSIZE AGE
INCOME 1.00000
TRAVEL 0.05120 1.00000
VACATION 0.30681 0.03588 1.00000
HSIZE 0.38050 0.00474 0.22080 1.00000
AGE -0.20939 -0.34022 -0.01326 -0.02512 1.00000
Cont.
18-39
Chapter Nineteen
Factor Analysis
The company asked 20 questions about casual dining and lifestyle. How are these questions related to one another? What are the important dimensions or factors?
18-40
Factor Analysis
• Factor analysis is a general name denoting a class of
procedures primarily used for data reduction and summarization.
• Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables.
• Factor analysis is used in the following circumstances:
• To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
• To identify a new, smaller, set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
• To identify a smaller set of salient variables from a larger set
for use in subsequent multivariate analysis.
18-41
Factors Underlying Selected Psychographics and Lifestyles
Fig. 19.1 [figure]: Lifestyle statements plotted on two factors. "Football" and "Baseball" load on Factor 2, while "Evening at home" and "Home is best place" versus "Go to a party," "Plays," and "Movies" spread along Factor 1.
18-42
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity. Bartlett's test of sphericity is a test statistic used to examine the hypothesis that the
variables are uncorrelated in the population. In other words, the population correlation matrix is an identity matrix; each variable correlates perfectly with itself (r = 1) but has no correlation with the other variables (r = 0).
• Correlation matrix. A correlation matrix is a lower triangle matrix showing the simple correlations, r, between all
possible pairs of variables included in the analysis. The
diagonal elements, which are all 1, are usually omitted.
18-43
Statistics Associated with Factor Analysis
• Communality. Communality is the amount of variance a
variable shares with all the other variables being considered.
This is also the proportion of variance explained by the common factors.
• Eigenvalue. The eigenvalue represents the total variance explained by each factor.
• Factor loadings. Factor loadings are simple correlations between the variables and the factors.
• Factor loading plot. A factor loading plot is a plot of the original variables using the factor loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings of
all the variables on all the factors extracted.
18-44
Statistics Associated with Factor Analysis
• Factor scores. Factor scores are composite scores estimated for each respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an index used to examine the
appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate factor analysis is appropriate. Values below 0.5 imply that factor analysis may not be appropriate.
• Percentage of variance. The percentage of the total variance attributed to each factor.
• Residuals are the differences between the observed
correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix.
• Scree plot. A scree plot is a plot of the Eigenvalues against
the number of factors in order of extraction.
18-45
Conducting Factor Analysis
Table 19.1
RESPONDENT NUMBER   V1    V2    V3    V4    V5    V6
1                   7.00  3.00  6.00  4.00  2.00  4.00
2                   1.00  3.00  2.00  4.00  5.00  4.00
3                   6.00  2.00  7.00  4.00  1.00  3.00
4                   4.00  5.00  4.00  6.00  2.00  5.00
5                   1.00  2.00  2.00  3.00  6.00  2.00
6                   6.00  3.00  6.00  4.00  2.00  4.00
7                   5.00  3.00  6.00  3.00  4.00  3.00
8                   6.00  4.00  7.00  4.00  1.00  4.00
9                   3.00  4.00  2.00  3.00  6.00  3.00
10                  2.00  6.00  2.00  6.00  7.00  6.00
11                  6.00  4.00  7.00  3.00  2.00  3.00
12                  2.00  3.00  1.00  4.00  5.00  4.00
13                  7.00  2.00  6.00  4.00  1.00  3.00
14                  4.00  6.00  4.00  5.00  3.00  6.00
15                  1.00  3.00  2.00  2.00  6.00  4.00
16                  6.00  4.00  6.00  3.00  3.00  4.00
17                  5.00  3.00  6.00  3.00  3.00  4.00
18                  7.00  3.00  7.00  4.00  1.00  4.00
19                  2.00  4.00  3.00  3.00  6.00  3.00
20                  3.00  5.00  3.00  6.00  4.00  6.00
21                  1.00  3.00  2.00  3.00  5.00  3.00
22                  5.00  4.00  5.00  4.00  2.00  4.00
23                  2.00  2.00  1.00  5.00  4.00  4.00
24                  4.00  6.00  4.00  6.00  4.00  7.00
25                  6.00  5.00  4.00  2.00  1.00  4.00
26                  3.00  5.00  4.00  6.00  4.00  7.00
27                  4.00  4.00  7.00  2.00  2.00  5.00
28                  3.00  7.00  2.00  6.00  4.00  3.00
29                  4.00  6.00  3.00  7.00  2.00  7.00
30                  2.00  3.00  2.00  4.00  7.00  2.00
18-46
Correlation Matrix
Variables V1 V2 V3 V4 V5 V6
V1 1.000
V2 -0.530 1.000
V3 0.873 -0.155 1.000
V4 -0.086 0.572 -0.248 1.000
V5 -0.858 0.020 -0.778 -0.007 1.000
V6 0.004 0.640 -0.018 0.640 -0.136 1.000
Table 19.2
18-47
Conducting Factor Analysis:
Determine the Method of Factor Analysis
• In principal components analysis, the total variance in the data is considered. The diagonal of the correlation matrix
consists of unities, and full variance is brought into the factor matrix. Principal components analysis is recommended when the primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components.
• In common factor analysis, the factors are estimated based only on the common variance. Communalities are inserted in the diagonal of the correlation matrix. This method is
appropriate when the primary concern is to identify the
underlying dimensions and the common variance is of interest.
This method is also known as principal axis factoring.
18-48
Results of Principal Components Analysis
Communalities
Variables Initial Extraction
V1 1.000 0.926
V2 1.000 0.723
V3 1.000 0.894
V4 1.000 0.739
V5 1.000 0.878
V6 1.000 0.790
Initial Eigenvalues
Factor   Eigenvalue   % of variance   Cumulat. %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
3 0.442 7.360 89.848
4 0.341 5.688 95.536
5 0.183 3.044 98.580
6 0.085 1.420 100.000
Table 19.3
18-49
Results of Principal Components Analysis
Extraction Sums of Squared Loadings
Factor   Eigenvalue   % of variance   Cumulat. %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
Factor Matrix
Variables Factor 1 Factor 2
V1 0.928 0.253
V2 -0.301 0.795
V3 0.936 0.131
V4 -0.342 0.789
V5 -0.869 -0.351
V6 -0.177 0.871
Rotation Sums of Squared Loadings
Factor Eigenvalue % of variance Cumulat. %
1 2.688 44.802 44.802
2 2.261 37.687 82.488
Table 19.3, cont.
Copyright © 2010 Pearson Education, Inc. publishing as Prentice Hall 18-50
Results of Principal Components Analysis
Rotated Factor Matrix
Variables Factor 1 Factor 2
V1 0.962 -0.027
V2 -0.057 0.848
V3 0.934 -0.146
V4 -0.098 0.845
V5 -0.933 -0.084
V6 0.083 0.885
Factor Score Coefficient Matrix
Variables Factor 1 Factor 2
V1 0.358 0.011
V2 -0.001 0.375
V3 0.345 -0.043
V4 -0.017 0.377
V5 -0.350 -0.059
V6 0.052 0.395
Table 19.3, cont.
18-51
Conducting Factor Analysis: Rotate Factors
• Although the initial or unrotated factor matrix
indicates the relationship between the factors and individual variables, it seldom results in factors that can be interpreted, because the factors are
correlated with many variables. Therefore, through rotation, the factor matrix is transformed into a
simpler one that is easier to interpret.
• In rotating the factors, we would like each factor to have nonzero, or significant, loadings or
coefficients for only some of the variables.
Likewise, we would like each variable to have nonzero or significant loadings with only a few factors, if possible with only one.
• The rotation is called orthogonal rotation if the
axes are maintained at right angles.
18-52
Conducting Factor Analysis: Rotate Factors
• The most commonly used method for rotation is the varimax procedure. This is an orthogonal method of rotation that minimizes the number of variables with high loadings on a factor, thereby enhancing the interpretability of the factors.
Orthogonal rotation results in factors that are uncorrelated.
• The rotation is called oblique rotation when the
axes are not maintained at right angles, and the
factors are correlated. Sometimes, allowing for
correlations among factors can simplify the factor
pattern matrix. Oblique rotation should be used
when factors in the population are likely to be
strongly correlated.
18-53
Factor Matrix Before and After Rotation
Fig. 19.5 [figure]: (a) Before rotation, most of the six variables have high loadings (marked X) on both factors; (b) after rotation, each variable has a high loading on only one of the two factors, illustrating the simpler structure that rotation seeks.
18-54
Conducting Factor Analysis: Interpret Factors
• A factor can then be interpreted in terms of the variables that load high on it.
• Another useful aid in interpretation is to plot
the variables, using the factor loadings as
coordinates. Variables at the end of an axis
are those that have high loadings on only
that factor, and hence describe the factor.
18-55
Chapter Twenty
Cluster Analysis
18-56
18-57
Chapter Outline
1) Overview
2) Basic Concept (e.g., segmentation without prior known groups)
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
   i.   Formulating the Problem
   ii.  Selecting a Distance or Similarity Measure
   iii. Selecting a Clustering Procedure
   iv.  Deciding on the Number of Clusters
   v.   Interpreting and Profiling the Clusters
   vi.  Assessing Reliability and Validity
18-58
Cluster Analysis
• Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters.
Cluster analysis is also called classification analysis, or numerical taxonomy.
• Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant
analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster
membership for any of the objects. Groups or clusters are
suggested by the data, not defined a priori.
18-59
A Practical Clustering Situation
Fig. 20.2 [figure]: Cases plotted on Variable 1 and Variable 2 fall into groups whose boundaries overlap.
18-60
An Ideal Clustering Situation
Fig. 20.1 [figure]: Cases plotted on Variable 1 and Variable 2 form compact, clearly separated clusters.
Fig. 20.1
18-61
Statistics Associated with Cluster Analysis
• Agglomeration schedule. An agglomeration schedule
gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
• Cluster centroid. The cluster centroid is the mean values of the variables for all the cases or objects in a particular cluster.
• Cluster centers. The cluster centers are the initial starting points in nonhierarchical clustering. Clusters are built
around these centers, or seeds.
• Cluster membership. Cluster membership indicates the
cluster to which each object or case belongs.
18-62
Statistics Associated with Cluster Analysis
• Dendrogram. A dendrogram, or tree graph, is a graphical device for displaying clustering results. Vertical lines
represent clusters that are joined together. The position of the line on the scale indicates the distances at which
clusters were joined. The dendrogram is read from left to right. Figure 20.8 is a dendrogram.
• Distances between cluster centers. These distances indicate how separated the individual pairs of clusters are.
Clusters that are widely separated are distinct, and
therefore desirable.
18-63
Attitudinal Data For Clustering
Case No.  V1  V2  V3  V4  V5  V6
1         6   4   7   3   2   3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2
Table 20.1
18-64
Conducting Cluster Analysis:
Select a Distance or Similarity Measure
• The most commonly used measure of similarity is the Euclidean
distance or its square. The Euclidean distance is the square root of the sum of the squared differences in values for each variable. Other distance measures are also available. The city-block or Manhattan distance between two objects is the sum of the absolute differences in values for each variable. The Chebychev distance between two objects is the maximum absolute difference in values for any
variable.
• If the variables are measured in vastly different units, the clustering solution will be influenced by the units of measurement. In these cases, before clustering respondents, we must standardize the data by rescaling each variable to have a mean of zero and a standard deviation of unity. It is also desirable to eliminate outliers (cases with atypical values).
• Use of different distance measures may lead to different clustering
results. Hence, it is advisable to use different measures and compare
the results.
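The three distance measures are easy to compute by hand. Using cases 1 and 2 of Table 20.1 (values 6, 4, 7, 3, 2, 3 and 2, 3, 1, 4, 5, 4):

```python
import math

# Distance measures between cases 1 and 2 of Table 20.1.
case1 = [6, 4, 7, 3, 2, 3]
case2 = [2, 3, 1, 4, 5, 4]
diffs = [a - b for a, b in zip(case1, case2)]

euclidean = math.sqrt(sum(d * d for d in diffs))  # sqrt(64) = 8.0
city_block = sum(abs(d) for d in diffs)           # 4+1+6+1+3+1 = 16
chebychev = max(abs(d) for d in diffs)            # 6, the largest gap (on V3)
```

Since all six variables here share the same 1-7 rating scale, no standardization is needed before computing these distances.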
18-65
A Classification of Clustering Procedures
Fig. 20.4 [figure]: Clustering procedures divide into hierarchical, nonhierarchical, and other methods. Hierarchical procedures are agglomerative or divisive; agglomerative methods comprise linkage methods (single, complete, and average linkage), variance methods (Ward's method), and centroid methods. Nonhierarchical procedures comprise sequential threshold, parallel threshold, and optimizing partitioning methods. Other procedures include two-step clustering.
18-66
Other Agglomerative Clustering Methods
Fig. 20.6 [figure]: Schematic comparison of Ward's procedure and the centroid method.
18-67
Results of Hierarchical Clustering
                   Clusters Combined                Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficient   Cluster 1   Cluster 2   Next Stage
1 14 16 1.000000 0 0 6
2 6 7 2.000000 0 0 7
3 2 13 3.500000 0 0 15
4 5 11 5.000000 0 0 11
5 3 8 6.500000 0 0 16
6 10 14 8.160000 0 1 9
7 6 12 10.166667 2 0 10
8 9 20 13.000000 0 0 11
9 4 10 15.583000 0 6 12
10 1 6 18.500000 6 7 13
11 5 9 23.000000 4 8 15
12 4 19 27.750000 9 0 17
13 1 17 33.100000 10 0 14
14 1 15 41.333000 13 0 16
15 2 5 51.833000 3 11 18
16 1 3 64.500000 14 5 19
17 4 18 79.667000 12 0 18
18 2 4 172.662000 15 17 19
19 1 2 328.600000 16 18 0
Agglomeration Schedule Using Ward’s Procedure
Table 20.2
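The logic behind such a schedule can be sketched with a minimal agglomerative routine. For brevity this toy version uses single linkage on five made-up one-dimensional scores, rather than Ward's procedure on the attitudinal data:

```python
# Minimal agglomerative clustering (single linkage, 1-D toy data) to show
# how an agglomeration schedule is built. Ward's procedure, used for
# Table 20.2, differs only in the merge criterion, not in the mechanics.
points = {1: [1.0], 2: [1.5], 3: [5.0], 4: [5.5], 5: [9.0]}
clusters = {k: list(v) for k, v in points.items()}
schedule = []   # rows: (stage, cluster_a, cluster_b, merge_distance)

stage = 0
while len(clusters) > 1:
    stage += 1
    # single linkage: cluster distance = closest pair of members
    (a, b), dist = min(
        (((i, j), min(abs(x - y) for x in clusters[i] for y in clusters[j]))
         for i in clusters for j in clusters if i < j),
        key=lambda t: t[1],
    )
    schedule.append((stage, a, b, dist))
    clusters[a] = clusters[a] + clusters.pop(b)  # merged cluster keeps label a

n_stages = len(schedule)   # n cases are combined in n - 1 stages, here 4
```

As in the SPSS schedule, the two closest cases merge first, and the merge distances rise as increasingly dissimilar clusters are joined.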
18-68
Conducting Cluster Analysis:
Decide on the Number of Clusters
• Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.
• In hierarchical clustering, the distances at which clusters are combined can be used as criteria. This information can be obtained from the agglomeration schedule or from the dendrogram.
• In nonhierarchical clustering, the ratio of total within-group variance to between-group variance can be plotted against the number of clusters. The point at which an elbow or a sharp bend occurs indicates an appropriate number of clusters.
• The relative sizes of the clusters should be meaningful.
18-69
Conducting Cluster Analysis:
Interpreting and Profiling the Clusters
• Interpreting and profiling clusters involves examining the cluster centroids. The
centroids enable us to describe each cluster by assigning it a name or label.
• It is often helpful to profile the clusters in terms of variables that were not used for
clustering. These may include demographic,
psychographic, product usage, media usage,
or other variables.
18-70
Cluster Distribution
Table 20.5, cont.
Cluster     N    % of Combined   % of Total
1           6    30.0%           30.0%
2           6    30.0%           30.0%
3           8    40.0%           40.0%
Combined    20   100.0%          100.0%
Total       20                   100.0%
18-71
Cluster Profiles
Table 20.5, cont.
            Fun               Bad for Budget     Eating Out
Cluster     Mean   Std.Dev.   Mean   Std.Dev.    Mean   Std.Dev.
1           1.67   .516       3.00   .632        1.83   .753
2           3.50   .548       5.83   .753        3.33   .816
3           5.75   1.035      3.63   .916        6.00   1.069
Combined    3.85   1.899      4.10   1.410       3.95   2.012

            Best Buys         Don't Care         Compare Prices
Cluster     Mean   Std.Dev.   Mean   Std.Dev.    Mean   Std.Dev.
1           3.50   1.049      5.50   1.049       3.33   .816
2           6.00   .632       3.50   .837        6.00   1.549
3           3.13   .835       1.88   .835        3.88   .641
Combined    4.10   1.518      3.45   1.761       4.35   1.496
18-72