
Correspondence analysis and Related Methods – Part 2: Between-set versus within-set


Academic year: 2021


Correspondence analysis and Related Methods – Part 2

1. What is multiple correspondence analysis (MCA)?

2. Why is MCA so useful as a method of visualizing questionnaire data?

3. How is MCA implemented in XLSTAT?

• “Classical” or “simple” CA analyses the relationships between two variables, although the method can be extended to analyse other forms of tabular data, for example the product–attribute data shown previously, as well as ratings and preferences at either the individual or the aggregate level.

• Multiple CA analyses several categorical variables where we are interested in all the relationships within the set of variables, not between one set and another.

• The best way to understand the difference is to look at the data format for the MCA program in XLSTAT: individual-level responses to several questions.

[Figure: data layout for the MCA program, with responses to four questions concerning working women alongside demographic categories. Source: Family & Changing Gender Roles Survey, ISSP (1994).]

• “Between-set” means that there are two sets of variables and we are interested in the relationships between them, e.g. between the demographics and the question responses.

• “Within-set” means that there is one set of variables and we are interested in the relationships amongst them, e.g. amongst the question responses; this is the multiple correspondence analysis (MCA) case.

Between-set versus within-set

• Questions: Should a woman work full-time, work part-time or stay at home (4 response categories, including a missing-data category): (Q1) before she has children; (Q2) when she has a pre-school child; (Q3) when she has a child at school; (Q4) when all her children have left home.


Between-set example: Simple CA

Q3: Should a woman with a child at school work full-time, part-time or stay at home?

Columns: W = work full-time, w = work part-time, H = stay at home, ? = don't know/unsure/missing.

COUNTRY               W      w      H      ?  Total
AUS                 256   1156    176    191   1779
DW                  101   1394    581    248   2324
DE                  278    691     62     66   1097
GB                  161    646     70    107    984
NIRL                126    394     75     52    647
USA                 482    686    107    172   1447
A                    84    632    202     59    977
H                   285    736    447     32   1500
I                   171    670    167     10   1018
IRL                 223    424    209     82    938
NL                  539   1205    143     81   1968
N                   487   1242    205    153   2087
S                   295    833     39    105   1272
CZ                  228    585    198     13   1024
SLO                 341    428    222     41   1032
PL                  431    425    589    152   1597
BG                  270    427    335     94   1126
RUS                 175   1154    550    119   1998
NZ                  120    754     72    101   1047
CDN                 566    497    108    269   1440
IL                  468    664     92     63   1287
J                   203    671    313    120   1307
E                   738   1012    514    230   2494
RP                  243    448    484     25   1200
Total              7271  17774   5960   2585  33590
Average profile   0.216  0.529  0.177  0.077      1

Source: Family & Changing Gender Roles Survey, ISSP (1994)
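A minimal NumPy sketch of what simple CA computes from this table, assuming the counts are typed in as above: the row profiles, the average profile (which is also the vector of column masses) and the total inertia, i.e. the mass-weighted sum of squared chi-square distances of the country profiles from the average profile. The variable names are illustrative only; XLSTAT performs these steps internally.

```python
import numpy as np

# Q3 counts by country; columns are W, w, H, ? (copied from the table above)
counts = {
    "AUS": [256, 1156, 176, 191], "DW": [101, 1394, 581, 248],
    "DE": [278, 691, 62, 66],     "GB": [161, 646, 70, 107],
    "NIRL": [126, 394, 75, 52],   "USA": [482, 686, 107, 172],
    "A": [84, 632, 202, 59],      "H": [285, 736, 447, 32],
    "I": [171, 670, 167, 10],     "IRL": [223, 424, 209, 82],
    "NL": [539, 1205, 143, 81],   "N": [487, 1242, 205, 153],
    "S": [295, 833, 39, 105],     "CZ": [228, 585, 198, 13],
    "SLO": [341, 428, 222, 41],   "PL": [431, 425, 589, 152],
    "BG": [270, 427, 335, 94],    "RUS": [175, 1154, 550, 119],
    "NZ": [120, 754, 72, 101],    "CDN": [566, 497, 108, 269],
    "IL": [468, 664, 92, 63],     "J": [203, 671, 313, 120],
    "E": [738, 1012, 514, 230],   "RP": [243, 448, 484, 25],
}
N = np.array(list(counts.values()), dtype=float)
n = N.sum()

row_masses = N.sum(axis=1) / n               # share of respondents per country
avg_profile = N.sum(axis=0) / n              # column masses = average profile
profiles = N / N.sum(axis=1, keepdims=True)  # each row sums to 1

# Squared chi-square distance of each country profile from the average profile
d2 = ((profiles - avg_profile) ** 2 / avg_profile).sum(axis=1)
total_inertia = float(row_masses @ d2)

print(np.round(avg_profile, 3))
print(round(total_inertia, 4))
```

The principal inertias shown on the map (0.0737 and 0.0532) are parts of this total inertia, so the total must be larger than their sum.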

Simple CA

Should a woman with a child at school work full-time, part-time or stay at home?

[Figure: simple CA map of the 24 countries and the four response categories W, w, H, ?. Axis 1: principal inertia 0.0737 (50.6%); axis 2: 0.0532 (36.5%); 87.1% of inertia explained.]

Simple CA of multiway tables

Should a woman with a child at school work full-time, part-time or stay at home?

Columns: W = work full-time, w = work part-time, H = stay at home, ? = don't know/unsure/missing.

COUNTRY               W      w      H      ?  Total
AUSm                117    596    114     82    909
AUSf                138    559     60    109    866
DWm                  43    675    357    123   1198
DWf                  58    719    224    125   1126
...
Em                  347    445    294    111   1197
Ef                  390    566    218    118   1292
Total              7271  17774   5960   2585  33590
Average profile   0.216  0.529  0.177  0.077      1

• Each country is split by gender, giving 24 × 2 country–gender groups. We say that the variables country and gender are interactively coded.

• The average profile stays the same, so the centre and the chi-square metric are identical to those of the previous map; all that has been done is to split each country point into two profiles.

Simple CA of multiway tables

Should a woman with a child at school work full-time, part-time or stay at home?

[Figure: CA map of the 48 country–gender groups (e.g. AUSm, AUSf) and the response categories W, w, H, ?. Axis 1: principal inertia 0.0797 (51.5%); axis 2: 0.0546 (35.3%); 86.8% of inertia explained.]

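• Ireland (IRL) has the largest male–female difference.

• Bulgaria (BG) is the only country where the male–female difference is reversed.

• Total inertia before the split: 0.1456; total inertia with the male–female split: 0.1546.

• So (0.1546 − 0.1456)/0.1546 ≈ 5.8% of the inertia is due to male–female differences.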
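The 5.8% figure compares the inertia of the table before and after splitting each country by gender. A hedged NumPy sketch, using only the four rows of the split table reproduced above (AUSm, AUSf, DWm, DWf) as an assumed excerpt: merging each male/female pair gives a two-row table whose inertia cannot exceed that of the split table, because splitting a row can only add within-country variation. The last lines redo the slide's arithmetic for the full table. The helper name `total_inertia` is hypothetical.

```python
import numpy as np

def total_inertia(table):
    """CA total inertia (chi-square / grand total) of a two-way table."""
    P = table / table.sum()
    r = P.sum(axis=1, keepdims=True)   # row masses
    c = P.sum(axis=0, keepdims=True)   # column masses
    return float(((P - r * c) ** 2 / (r * c)).sum())

# Excerpt of the gender-split table: rows AUSm, AUSf, DWm, DWf; columns W, w, H, ?
split = np.array([[117, 596, 114, 82],
                  [138, 559, 60, 109],
                  [43, 675, 357, 123],
                  [58, 719, 224, 125]], dtype=float)
merged = split[0::2] + split[1::2]   # add each male/female pair back together

before = total_inertia(merged)
after = total_inertia(split)
print(before, after)

# Share of the full-table inertia due to the male-female split (slide values)
share = (0.1546 - 0.1456) / 0.1546
print(round(share, 3))  # 0.058
```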


Simple CA of multiway tables

Should a woman with a child at school work full-time, part-time or stay at home?

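• Points tend to lie in a curved pattern, called an arch or horseshoe.

• Points lying inside the arch have polarized profiles: e.g. PLm26-35 has 32% W, 22% w and 32% H, whereas NZm>66 is concentrated on the middle category, with 7% W, 73% w and 15% H. The average profile is 22% W, 53% w and 18% H.

• Country (24), gender (2) and age (6) are interactively coded, giving 24 × 2 × 6 = 288 combinations.

[Figure: CA map of the 288 country–gender–age groups and the response categories W, w, H, ?; labelled points include CDNf<25, Hm>66, PLm>66, NZm>66, DEm<25 and PLm26-35. Axis 1: principal inertia 0.1301 (54.3%); axis 2: 0.0791 (33.0%); 87.3% of inertia explained.]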
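Interactive coding creates one combined category per combination of the variables. A small Python sketch, assuming the 24 country codes used on these slides and hypothetical labels A1 to A6 for the six age groups (the slide's own age labels, such as <25 or 26-35, are only partly visible):

```python
from itertools import product

countries = ["AUS", "DW", "DE", "GB", "NIRL", "USA", "A", "H", "I", "IRL",
             "NL", "N", "S", "CZ", "SLO", "PL", "BG", "RUS", "NZ", "CDN",
             "IL", "J", "E", "RP"]
genders = ["m", "f"]
ages = ["A1", "A2", "A3", "A4", "A5", "A6"]  # hypothetical age-group labels

# One interactive code per country-gender-age combination, e.g. "PLmA2"
labels = [f"{c}{g}{a}" for c, g, a in product(countries, genders, ages)]
print(len(labels))  # 288
```

Each respondent is then assigned the single code matching their country, gender and age group, and that code is cross-tabulated with the question as an ordinary row variable.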

Stacked tables

Should a woman with a child at school work full-time, part-time or stay at home?

• Each variable is separately cross-tabulated with the question, and the tables are stacked one on top of another: country (24 categories), gender (2), age (6), education (7), marital status (5) and social class (8), each against the responses W, w, H, ?.

• Since the column margins of each table are identical (and the same as in the interactively coded tables before), the basic geometry remains the same. Only the detail is sacrificed: all the information is collapsed into “main effects”, so interactions between the demographic variables are no longer visible.

• The inertia of the stacked table is the average of the inertias of its subtables.
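The averaging property can be checked numerically. The sketch below uses simulated respondents (an assumption, not the ISSP data): it cross-tabulates gender and age with one question, stacks the two tables and compares the inertia of the stacked table with the mean of the subtable inertias. The property relies on every subtable having the same grand total and the same column margins, which holds because the same respondents are tabulated each time. The helpers `total_inertia` and `crosstab` are hypothetical names.

```python
import numpy as np

def total_inertia(table):
    """CA total inertia (chi-square / grand total) of a two-way table."""
    P = table / table.sum()
    r = P.sum(axis=1, keepdims=True)
    c = P.sum(axis=0, keepdims=True)
    return float(((P - r * c) ** 2 / (r * c)).sum())

def crosstab(x, y, nx, ny):
    """Contingency table of two integer-coded variables."""
    t = np.zeros((nx, ny))
    np.add.at(t, (x, y), 1)
    return t

rng = np.random.default_rng(0)
n = 1000
question = rng.integers(0, 4, n)   # simulated answers W, w, H, ? coded 0..3
gender = rng.integers(0, 2, n)
age = rng.integers(0, 6, n)
# Make sure every category occurs at least once
question[:6] = [0, 1, 2, 3, 0, 1]
gender[:6] = [0, 1, 0, 1, 0, 1]
age[:6] = np.arange(6)

tables = [crosstab(gender, question, 2, 4), crosstab(age, question, 6, 4)]
stacked = np.vstack(tables)        # gender rows on top of age rows

stacked_inertia = total_inertia(stacked)
mean_inertia = np.mean([total_inertia(t) for t in tables])
print(stacked_inertia, mean_inertia)
```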

Stacked tables

• Tables can be stacked both row-wise and column-wise, adding further questions as extra columns.

[Figure: block layout with the six demographic variables, country (24), gender (2), age (6), education (7), marital status (5) and social class (8), as row blocks, and four questions as column blocks, each with categories W, w, H, ?. The questions: Should a (married) woman work full-time, part-time or stay at home (1) before having children, (2) with a preschool child, (3) with a child at school, (4) when her children have left home?]

• This gives 24 contingency tables in a 6 × 4 pattern: the tables in each row of blocks have the same row margins, and the tables in each column of blocks have the same column margins.

• The inertia of the stacked table is the average of the inertias of its subtables.

Stacked tables

Women in the workplace and 6 demographic variables

• Relationships between each demographic variable and each question are displayed jointly.

• Relationships within the questions and within the demographics are not displayed explicitly.

• Joining the categories of an ordinal variable (for example, age) with lines on the map shows trends.

[Figure: CA map of the stacked table, showing the 16 response categories 1W to 4? together with the demographic categories: countries, gender (M, F), age (A1 to A6), marital status (ma, wi, di, se, si), education (E1 to E7) and social class (S0 to S6, S*). Axis 1: principal inertia 0.0188 (49.1%); axis 2: 0.0084 (21.9%); 71.0% of inertia explained.]


Multiple correspondence analysis (MCA)

Women in the workplace – 4 questions

West & East German samples only

• $N$ rows (respondents) and $Q$ questions; the $q$-th question has $J_q$ categories, and the total number of categories is $J$ (here $N = 3415$, $Q = 4$, $J_q = 4$ for all $q$, $J = 16$).

• One definition of MCA is that it is the CA of the indicator matrix.

• The response data are recoded as dummy variables:

Responses to Qu. 1 to Qu. 4 are coded 1 = W, 2 = w, 3 = H, 4 = ?.

Original data   Indicator matrix
Q1 Q2 Q3 Q4     1W 1w 1H 1?  2W 2w 2H 2?  3W 3w 3H 3?  4W 4w 4H 4?
 1  3  2  2      1  0  0  0   0  0  1  0   0  1  0  0   0  1  0  0
 2  3  3  2      0  1  0  0   0  0  1  0   0  0  1  0   0  1  0  0
 4  3  3  2      0  0  0  1   0  0  1  0   0  0  1  0   0  1  0  0
 4  4  4  4      0  0  0  1   0  0  0  1   0  0  0  1   0  0  0  1
 4  4  4  4      0  0  0  1   0  0  0  1   0  0  0  1   0  0  0  1
 1  3  2  1      1  0  0  0   0  0  1  0   0  1  0  0   1  0  0  0
 ...
. . . and so on for 3415 rows
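Recoding the responses into the indicator matrix is mechanical. A NumPy sketch, assuming responses are coded 1 = W, 2 = w, 3 = H, 4 = ? as on the slide, applied to the six example rows shown above; the function name `indicator` is hypothetical.

```python
import numpy as np

def indicator(data, n_cat):
    """Dummy-code an N x Q array of responses coded 1..n_cat."""
    N, Q = data.shape
    Z = np.zeros((N, Q * n_cat), dtype=int)
    rows = np.repeat(np.arange(N), Q)                     # 0,0,0,0,1,1,...
    cols = (np.arange(Q) * n_cat + (data - 1)).ravel()    # column of each answer
    Z[rows, cols] = 1
    return Z

# The six example respondents; answers to Q1..Q4
data = np.array([[1, 3, 2, 2],
                 [2, 3, 3, 2],
                 [4, 3, 3, 2],
                 [4, 4, 4, 4],
                 [4, 4, 4, 4],
                 [1, 3, 2, 1]])
Z = indicator(data, n_cat=4)
print(Z)
```

Every row of `Z` contains exactly $Q = 4$ ones, one per question.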

MCA: XLSTAT initial output

Total inertia: 3

Eigenvalues and percentages of inertia:

                           F1      F2      F3      F4      F5  ...  F12
Eigenvalue              0.692   0.513   0.365   0.307   0.218
Inertia (%)            23.061  17.108  12.156  10.248   7.254
Cumulative %           23.061  40.169  52.325  62.573  69.827
Adjusted inertia        0.347   0.123   0.023   0.006
Adjusted inertia (%)   66.152  23.482   4.456   1.118
Cumulative %           66.152  89.634  94.090  95.208

Total inertia in MCA of the indicator matrix $\mathbf{Z}$:

$$\text{inertia}(\mathbf{Z}) = \frac{J-Q}{Q} = \frac{16-4}{4} = 3$$
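The value 3 does not depend on the data: every row of $\mathbf{Z}$ sums to $Q$, so each of the $J$ category columns contributes exactly $1/Q$ to $\sum p_{ij}^2/(r_i c_j)$, giving $J/Q - 1 = (J-Q)/Q$. A quick numerical check with simulated responses (an assumption; any data in which every category occurs at least once will do), using a hypothetical helper `total_inertia`:

```python
import numpy as np

def total_inertia(table):
    """CA total inertia (chi-square / grand total) of a two-way table."""
    P = table / table.sum()
    r = P.sum(axis=1, keepdims=True)
    c = P.sum(axis=0, keepdims=True)
    return float(((P - r * c) ** 2 / (r * c)).sum())

rng = np.random.default_rng(1)
N, Q, Jq = 500, 4, 4
data = rng.integers(0, Jq, size=(N, Q))   # simulated responses coded 0..3
data[:Jq] = np.arange(Jq)[:, None]         # every category occurs at least once

# Indicator matrix: one dummy column per category of each question
Z = np.zeros((N, Q * Jq))
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * Jq + data).ravel()] = 1

J = Q * Jq
print(total_inertia(Z), (J - Q) / Q)
```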

Multiple correspondence analysis (MCA)

Women in the workplace – 4 questions

• If $\mathbf{Z}$ ($N \times J$) is the indicator matrix, then the Burt matrix $\mathbf{B}$ ($J \times J$) is $\mathbf{B} = \mathbf{Z}^\mathsf{T}\mathbf{Z}$.

• An alternative definition of MCA is that it is the CA of the Burt matrix.

• The Burt matrix is the stacked matrix of all two-way contingency tables, including each variable with itself:

Burt matrix

       1W   1w   1H   1?   2W   2w   2H   2?   3W   3w   3H   3?   4W   4w   4H   4?
1W   2500    0    0    0  172 1107 1130   91  355 1709  345   91 1766  537   40  157
1w      0  476    0    0    7  129  335    5   16  261  181   18  128  293   17   38
1H      0    0   79    0    1    6   72    0    1   17   61    0   14   21   38    6
1?      0    0    0  360    1   57  108  194    7   96   55  202   51   45    2  262
2W    172    7    1    1  181    0    0    0  127   48    4    2  165   15    0    1
2w   1107  129    6   57    0 1299    0    0  219  997   61   22  972  239   13   75
2H   1130  335   72  108    0    0 1645    0   24  988  573   60  760  615   84  186
2?     91    5    0  194    0    0    0  290    9   50    4  227   62   27    0  201
3W    355   16    1    7  127  219   24    9  379    0    0    0  360   14    1    4
3w   1709  261   17   96   48  997  988   50    0 2083    0    0 1348  566   23  146
3H    345  181   61   55    4   61  573    4    0    0  642    0  202  286   73   81
3?     91   18    0  202    2   22   60  227    0    0    0  311   49   30    0  232
4W   1766  128   14   51  165  972  760   62  360 1348  202   49 1959    0    0    0
4w    537  293   21   45   15  239  615   27   14  566  286   30    0  896    0    0
4H     40   17   38    2    0   13   84    0    1   23   73    0    0    0   97    0
4?    157   38    6  262    1   75  186  201    4  146   81  232    0    0    0  463
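A sketch of how the Burt matrix follows from the indicator matrix, using the six example respondents from the indicator-matrix slide rather than the full sample of 3415 (so the counts are much smaller than in the matrix above). The diagonal blocks are diagonal matrices of category frequencies, and each off-diagonal block is an ordinary cross-tabulation of two questions.

```python
import numpy as np

# The six example respondents, answers to Q1..Q4 coded 1 = W, 2 = w, 3 = H, 4 = ?
data = np.array([[1, 3, 2, 2],
                 [2, 3, 3, 2],
                 [4, 3, 3, 2],
                 [4, 4, 4, 4],
                 [4, 4, 4, 4],
                 [1, 3, 2, 1]])
N, Q = data.shape
Jq = 4

# Indicator matrix Z (N x J)
Z = np.zeros((N, Q * Jq), dtype=int)
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * Jq + data - 1).ravel()] = 1

# Burt matrix B = Z^T Z (J x J)
B = Z.T @ Z

print(B[:4, :4])   # Q1 with itself: diagonal matrix of Q1 frequencies
print(B[:4, 4:8])  # Q1 x Q2: ordinary contingency table
```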

MCA (Burt matrix version)

• The missing-value categories are strongly associated.

• Relationships amongst (within) the set of questions are displayed jointly.

Women in the workplace – 4 questions

[Figure: MCA map of the 16 categories 1W to 4? from the Burt matrix. Axis 1: principal inertia 0.479 (41.9%); axis 2: 0.263 (23.0%); 64.9% of inertia explained (only 40.2% if the indicator matrix is analysed).]

• The results are the same as for the indicator matrix; only the principal inertias change. Each principal inertia of the Burt matrix is the square of the corresponding one for the indicator matrix: $0.692^2 \approx 0.479$ and $0.513^2 \approx 0.263$.
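The squaring relation can be checked numerically. A NumPy sketch on simulated responses (an assumption, not the ISSP data), computing CA principal inertias as squared singular values of the matrix of standardized residuals; `ca_principal_inertias` is a hypothetical helper name.

```python
import numpy as np

def ca_principal_inertias(table):
    """Principal inertias of CA: squared singular values of the standardized residuals."""
    P = table / table.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

rng = np.random.default_rng(2)
N, Q, Jq = 300, 4, 4
data = rng.integers(0, Jq, size=(N, Q))   # simulated responses coded 0..3
data[:Jq] = np.arange(Jq)[:, None]         # every category occurs at least once

Z = np.zeros((N, Q * Jq))
Z[np.repeat(np.arange(N), Q), (np.arange(Q) * Jq + data).ravel()] = 1
B = Z.T @ Z

k = Q * Jq - Q                             # J - Q non-trivial dimensions
lam_Z = ca_principal_inertias(Z)[:k]
lam_B = ca_principal_inertias(B)[:k]
print(np.allclose(lam_B, lam_Z ** 2))
```

The relation holds because the standardized residuals of $\mathbf{B}$ equal $\mathbf{S}_Z^\mathsf{T}\mathbf{S}_Z$, where $\mathbf{S}_Z$ are those of $\mathbf{Z}$; summing the squared indicator inertias therefore gives the total inertia of the Burt matrix.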


Multiple correspondence analysis (MCA)

Women in the workplace – 4 questions

• The total inertia of the Burt matrix is the average of the inertias of its 16 subtables: $\bigl(4 \times 3.000 + 2 \times (0.363 + 0.424 + 0.644 + 0.892 + 0.345 + 0.480)\bigr)/16 = 18.296/16 \approx 1.143$.

• Since the inertias of the diagonal subtables are so high (each equals $J_q - 1 = 3$), they inflate this average, hence the low percentages of inertia.

Burt matrix – inertias of each subtable (subtables of the Burt matrix above, questions in the order Q1 to Q4):

        Q1     Q2     Q3     Q4
Q1   3.000  0.363  0.424  0.644
Q2   0.363  3.000  0.892  0.345
Q3   0.424  0.892  3.000  0.480
Q4   0.644  0.345  0.480  3.000

• The percentage of inertia explained is actually much higher: in MCA the overall inertia is inflated by the diagonal tables of the Burt matrix, and once this is corrected for, the percentage explained is about 90%.
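The subtable inertias can be recomputed from the Burt matrix above. A NumPy sketch, with the matrix typed in from the slide; the helper `total_inertia` is hypothetical and computes the ordinary CA total inertia (chi-square divided by the grand total) of any block.

```python
import numpy as np

def total_inertia(table):
    """CA total inertia (chi-square / grand total) of a two-way table."""
    P = table / table.sum()
    r = P.sum(axis=1, keepdims=True)
    c = P.sum(axis=0, keepdims=True)
    return float(((P - r * c) ** 2 / (r * c)).sum())

# Burt matrix from the slide; rows and columns 1W 1w 1H 1? 2W ... 4?
B = np.array([
    [2500, 0, 0, 0, 172, 1107, 1130, 91, 355, 1709, 345, 91, 1766, 537, 40, 157],
    [0, 476, 0, 0, 7, 129, 335, 5, 16, 261, 181, 18, 128, 293, 17, 38],
    [0, 0, 79, 0, 1, 6, 72, 0, 1, 17, 61, 0, 14, 21, 38, 6],
    [0, 0, 0, 360, 1, 57, 108, 194, 7, 96, 55, 202, 51, 45, 2, 262],
    [172, 7, 1, 1, 181, 0, 0, 0, 127, 48, 4, 2, 165, 15, 0, 1],
    [1107, 129, 6, 57, 0, 1299, 0, 0, 219, 997, 61, 22, 972, 239, 13, 75],
    [1130, 335, 72, 108, 0, 0, 1645, 0, 24, 988, 573, 60, 760, 615, 84, 186],
    [91, 5, 0, 194, 0, 0, 0, 290, 9, 50, 4, 227, 62, 27, 0, 201],
    [355, 16, 1, 7, 127, 219, 24, 9, 379, 0, 0, 0, 360, 14, 1, 4],
    [1709, 261, 17, 96, 48, 997, 988, 50, 0, 2083, 0, 0, 1348, 566, 23, 146],
    [345, 181, 61, 55, 4, 61, 573, 4, 0, 0, 642, 0, 202, 286, 73, 81],
    [91, 18, 0, 202, 2, 22, 60, 227, 0, 0, 0, 311, 49, 30, 0, 232],
    [1766, 128, 14, 51, 165, 972, 760, 62, 360, 1348, 202, 49, 1959, 0, 0, 0],
    [537, 293, 21, 45, 15, 239, 615, 27, 14, 566, 286, 30, 0, 896, 0, 0],
    [40, 17, 38, 2, 0, 13, 84, 0, 1, 23, 73, 0, 0, 0, 97, 0],
    [157, 38, 6, 262, 1, 75, 186, 201, 4, 146, 81, 232, 0, 0, 0, 463],
], dtype=float)

Q, Jq = 4, 4
blocks = [[B[i * Jq:(i + 1) * Jq, j * Jq:(j + 1) * Jq] for j in range(Q)]
          for i in range(Q)]
sub = np.array([[total_inertia(b) for b in row] for row in blocks])

print(np.round(sub, 3))            # subtable inertias, 3.000 on the diagonal
print(round(total_inertia(B), 4))  # CA of the whole Burt matrix
print(round(sub.mean(), 4))        # average of the 16 subtable inertias
```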

Adjustment of principal inertias (eigenvalues)

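Here are the steps to rescale the solution:

1. Calculate the average off-diagonal inertia:

$$\text{average off-diagonal inertia} = \frac{Q}{Q-1}\left(\text{inertia}(\mathbf{B}) - \frac{J-Q}{Q^2}\right)$$

2. Calculate the adjusted principal inertias:

$$\text{adjusted principal inertia}_k = \left(\frac{Q}{Q-1}\right)^2 \left(\lambda_k - \frac{1}{Q}\right)^2 \quad \text{only for } \lambda_k > \frac{1}{Q}$$

3. Calculate the adjusted percentages of inertia:

$$\text{adjusted percentage}_k = \frac{\text{adjusted principal inertia}_k}{\text{average off-diagonal inertia}}$$

Here $\lambda_k$ is the $k$-th principal inertia of the indicator matrix $\mathbf{Z}$, so $\lambda_k^2$ is the corresponding principal inertia of the Burt matrix.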

We can rescale an existing MCA solution so that it best fits the off-diagonal tables. All we need is the total inertia of the Burt matrix, $\text{inertia}(\mathbf{B})$, and the principal inertias $\lambda_k^2$ of the Burt matrix in the solution space. If we have computed the solution on the indicator matrix $\mathbf{Z}$ (as in the MCA module of XLSTAT), the eigenvalues calculated are the $\lambda_k$, so the squares of all the principal inertias of $\mathbf{Z}$ need to be summed to obtain $\text{inertia}(\mathbf{B})$. If you have analysed the Burt matrix $\mathbf{B}$, $\text{inertia}(\mathbf{B})$ is simply its total inertia.
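The three steps can be sketched in a few lines of Python, plugging in the first five eigenvalues from the XLSTAT output and $\text{inertia}(\mathbf{B}) = 18.296/16 \approx 1.1435$ from the subtable inertias. Because the eigenvalues are rounded to three decimals, the results only approximately reproduce XLSTAT's adjusted values (0.347, 0.123, 0.023, 0.006 and 66.152%, 23.482%, ...).

```python
import numpy as np

Q, J = 4, 16
lam = np.array([0.692, 0.513, 0.365, 0.307, 0.218])  # eigenvalues F1-F5 of Z (XLSTAT)
inertia_B = 18.296 / 16                              # total inertia of the Burt matrix

# Step 1: average inertia of the off-diagonal subtables of B
avg_offdiag = Q / (Q - 1) * (inertia_B - (J - Q) / Q**2)

# Step 2: adjust only the principal inertias greater than 1/Q
keep = lam[lam > 1 / Q]
adjusted = (Q / (Q - 1)) ** 2 * (keep - 1 / Q) ** 2

# Step 3: express them as percentages of the average off-diagonal inertia
adjusted_pct = 100 * adjusted / avg_offdiag

print(round(avg_offdiag, 4))
print(np.round(adjusted, 3))
print(np.round(adjusted_pct, 1))
```

Step 1 equals the plain average of the 12 off-diagonal subtable inertias (6.296/12), because the four diagonal subtables contribute exactly $J - Q$ to the sum.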

MCA (adjusted)

Women in the workplace – 4 questions

[Figure: adjusted MCA map of the 16 categories 1W to 4?. Axis 1: adjusted principal inertia 0.347 (66.2%); axis 2: 0.123 (23.5%); 89.7% of inertia explained.]

MCA (Burt matrix version)

Women in the workplace – 4 questions

[Figure: unadjusted MCA map of the same 16 categories from the Burt matrix. Axis 1: principal inertia 0.479 (41.9%); axis 2: 0.263 (23.0%); 64.9% of inertia explained.]


MCA

Women in the workplace – supplementary demographic groups

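[Figure: MCA map with supplementary demographic categories: DW and DE (West and East German samples), gender (M, F), age groups (A1 to A6), education (E1 to E6, E*) and marital status (ma, wi, di, se, si).]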

Related topics

1. Subset correspondence analysis
• restricting the analysis to a subset of categories (e.g. all substantive responses excluding the missing categories, the missing categories by themselves, or the “middle” categories)

2. Square asymmetric tables
• mobility tables, brand-switching, migration...

3. Recoding of data before applying CA
• ratings, preferences, paired comparisons, continuous-scale data (ratio and interval)

4. Stability and inference
• concentration ellipses, convex hulls, permutation tests

5. Canonical correspondence analysis (CCA)
• CA with explanatory variables (a combination of dimension reduction and regression)

Subset correspondence analysis

For example, we can analyse the women-at-work data while ignoring the missing values. This is NOT just a CA of the table without the missing-value columns: the masses and metric of the complete matrix are maintained.

In XLSTAT’s MCA program you are given a menu for selecting which categories you want to retain or omit.
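A sketch of subset CA in NumPy, assuming the standardized-residual formulation (an assumption as to how XLSTAT implements it): the residuals are computed from the complete table, so its masses and metric are preserved, and only the columns of the chosen subset are decomposed. For brevity it uses six rows of the country-by-Q3 table rather than the MCA indicator matrix, and drops the ? column; that choice is illustrative only.

```python
import numpy as np

# Six rows of the country x Q3 table (AUS, DW, DE, GB, PL, E); columns W, w, H, ?
N = np.array([[256, 1156, 176, 191],
              [101, 1394, 581, 248],
              [278, 691, 62, 66],
              [161, 646, 70, 107],
              [431, 425, 589, 152],
              [738, 1012, 514, 230]], dtype=float)

# Masses and standardized residuals from the COMPLETE table
P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Subset CA: decompose only the columns W, w, H (drop the "?" column)
subset = [0, 1, 2]
U, sv, Vt = np.linalg.svd(S[:, subset], full_matrices=False)
principal_inertias = sv ** 2
row_coords = U * sv / np.sqrt(r)[:, None]             # row principal coordinates
col_coords = Vt.T * sv / np.sqrt(c[subset])[:, None]  # column principal coordinates

# The subset and its complement split the total inertia exactly
print(principal_inertias.sum() + (S[:, 3] ** 2).sum(), (S ** 2).sum())
```

A plain CA of the reduced table `N[:, subset]` would instead recompute the row and column masses from the remaining columns, which is exactly what subset CA avoids.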

Subset correspondence analysis

[Figure: subset MCA map of the 12 substantive categories 1W to 4H, with the missing categories excluded. Axis 1: principal inertia 0.1240 (70.0%); axis 2: 0.0241 (13.5%).]


Canonical correspondence analysis (CCA)

This has the same objective as CA but restricts the CA solution to be (linearly) related to external predictor variables. For example, we may want the best low-dimensional view of the responses that is related to age (either the age group or the original age variable).
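A sketch of the restriction in NumPy, following the usual formulation of CCA as CA of the standardized residuals projected onto the space spanned by the mass-centred, mass-weighted predictors. The counts and the age midpoints below are hypothetical, made up for illustration; with a single predictor, the restricted solution has only one dimension.

```python
import numpy as np

# Hypothetical counts: 6 age groups (rows) x 4 response categories (columns)
N = np.array([[120, 300, 40, 30],
              [150, 420, 70, 35],
              [130, 400, 110, 40],
              [100, 350, 150, 45],
              [70, 280, 190, 50],
              [40, 200, 230, 60]], dtype=float)
age = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])  # hypothetical midpoints

P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals

# Centre the predictor with the row masses and weight it like the rows of S
X = age[:, None]
Xc = X - r @ X
Xw = np.sqrt(r)[:, None] * Xc
Qx, _ = np.linalg.qr(Xw)          # orthonormal basis of the predictor space
S_restricted = Qx @ (Qx.T @ S)    # projection of S onto that space

sv = np.linalg.svd(S_restricted, compute_uv=False)
constrained = (sv ** 2).sum()     # inertia explained by the predictor
total = (S ** 2).sum()
print(constrained, total, constrained / total)
```

With age groups coded as dummy variables instead of a numeric age, `X` would have one column per group (minus one), and the restricted solution could have up to that many dimensions.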

Canonical correspondence analysis (restricted to age group differences)

[Figure: CCA map of the 16 response categories Q1-1 to Q4-4 and the six age groups agegp-1 to agegp-6. Axis 1: 0.685 (63.5%); axis 2: 0.465 (18.4%).]
