Chapter VI. Non Parametrict Tests, 2020.doc

Chapter VI

Nonparametric Tests

Purpose of the Chapter

Nonparametric statistics, or distribution-free tests, are those that do not rely on parameter estimates or precise assumptions about the distributions of variables.

Introduction

Nonparametric tests are sometimes called distribution-free tests because they are based on fewer assumptions (e.g., they do not assume that the outcome is approximately normally distributed). Parametric tests involve specific probability distributions (e.g., the normal distribution), and the tests involve estimation of the key parameters of that distribution (e.g., the mean or difference in means) from the sample data. The cost of fewer assumptions is that nonparametric tests are generally less powerful than their parametric counterparts.


• Parametric Statistics are statistical techniques based on assumptions about the population from which the sample data are collected.

– Assumption that data being analyzed are randomly selected from a normally distributed population.

– Requires quantitative measurements that yield interval or ratio level data.

• Nonparametric Statistics are based on fewer assumptions about the population and the parameters.

– Sometimes called “distribution-free” statistics.

– A variety of nonparametric statistics are available for use with nominal data, ordinal data, and even quantitative data (after converting them to ranks).

– They are also used when the data are quantitative but do not meet the assumptions of normality and homogeneity of variance, including the case of more than two groups.

Nonparametric statistics (or tests) based on the ranks of measurements are called rank statistics (or rank tests).

Analysis for Nonparametric Tests according to the number of samples and type of variable

Purpose                 Samples               Groups        Type of variable          Test
One sample              1 sample              -             Quantitative              -
One sample              1 sample              -             Qualitative               Chi-Square, Binomial
Association variables   2 samples             -             Discrete or categorical   Chi-Square Independent
Compare groups          2 samples             Independent   Quantitative              Mann-Whitney U
Compare groups          2 samples             Related       Quantitative              Wilcoxon
Compare groups          2 samples             Related       Before-After              McNemar
Compare groups          More than 2 samples   Independent   Quantitative              Kruskal-Wallis H
Compare groups          More than 2 samples   Related       Quantitative              Friedman
Compare groups          More than 2 samples   Related       Qualitative               Cochran's Q

Advantages and Disadvantages of nonparametric tests

Advantages

• Can be used when nothing is known about the population

– Therefore cannot break or violate crucial assumptions for confidence intervals and hypothesis tests

• Can be used with all scales: nominal or ordinal data as well as interval or ratio data

– Nominal = names or labels, e.g., gender (male, female)

– Ordinal = implies some order or rank, e.g., grade of study

• Can be used with smaller samples

• Sometimes there is no parametric alternative to the use of nonparametric statistics.

• The computations for nonparametric statistics are usually less complicated than those for parametric statistics, particularly for small samples.

Disadvantages

• Less powerful even with the same sample size

– The probability of a Type II error is larger for a given Type I error probability

• Less efficient

• Nonparametric tests can be wasteful of data if parametric tests are available for use with the data

• Nonparametric tests are usually not as widely available and well known as parametric tests

• For large samples, the calculations can be tedious

6.1 Chi-Square Test of Independence (χ²) (Contingency Table Analysis)

A chi-square (χ²) statistic is used to investigate whether distributions of categorical variables differ from one another. Categorical variables yield data in categories, and numerical variables yield data in numerical form. Responses to such questions as "What is your major?" or "Do you have a car?" are categorical because they yield data such as "Business" or "No". In contrast, responses to such questions as "How tall are you?" are numerical. Numerical data can be either discrete or continuous. The table below may help you see the differences between these types of variables.

Data Type                Question Type                    Possible Responses

Categorical              What is your sex?                male or female

Numerical - Discrete     How many children do you have?   0 or 1 or 2 or…

Numerical - Continuous   How tall are you?                72.4 inches

A test of independence assesses whether paired observations on two variables, expressed in a contingency table, are independent of each other (e.g. polling responses from people of different nationalities to see if one's nationality affects the response).

Important note: The data are assumed to be a random sample. The expected frequency for each category should be at least 1, and no more than 20% of the categories should have expected frequencies of less than 5.

Steps in Hypothesis Testing

1. Formulate the appropriate null and alternative hypotheses

Ho: There is no relationship between the variables

Ha: There is a relationship between the variables

2. Specify the desired level of significance (α)

α (level of significance); typical values are .01, .05, or .10

3. Compute the test statistic

Calculate the chi-square statistic across all the categories:

$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - e_{ij})^2}{e_{ij}}$

Oij = observed frequency

eij = expected frequency

4. Make a decision and interpret the result of the test

Look at the Sig. or p-value; if this value is less than the significance level (α = .05), we reject Ho.

Example 1

Table 6.1. Decision to purchase the services by customer’s age

Decision to purchase the services   Under 21   21-34   34 and over   Total

Yes                                 120        262     237           619

No                                  41         103     237           381

Total                               161        365     474           1000

Solution:

Step 1: Hypothesis

Ho: There is no relationship between the customer age group and the decision to purchase the services.

Ha: There is a relationship between the customer age group and the decision to purchase the services.

Step 2: Level of significance: α = 0.05

Step 3: Computing the Test Statistic

Step 4: Making a decision and interpreting the result

Steps to enter the data in SPSS when we have a contingency table

For SPSS to successfully analyze this problem, enter the two variables “decision to purchase the services (service)” and “Customer Age (age)” as integer variables (no decimals). In the “decision” row (we’re still in the “Variable View” tab), click the cell in the “Values” column, as shown in the following figure.

Decision to purchase:

Code Observation

1 Yes

2 No

Create the same for the second variable (for the variable in Column)

Age:

Finally, enter the data in Data View, as shown in the following table:

Code for Decision   Code for Age   Frequency

1 1 120

1 2 262

1 3 237

2 1 41

2 2 103

2 3 237

Since the data are in a frequency table, we must tell SPSS how to weight the different cases; that is, we must tell SPSS that the Frequency variable records the frequency in each pair of categories listed by the Decision and Age variables. Use Data (from the menu) → Weight Cases to open the Weight Cases dialog. Make the changes as shown in the following figure:

Finally, let’s perform the chi-square test:

Output

Decision * Age Crosstabulation

                 Age
Decision   Under 18   18 - 34   34 and over   Total

Yes        120        262       237           619

No         41         103       237           381

Total      161        365       474           1000

Chi-Square Tests

                               Value     df   Asymptotic Significance (2-sided)

Pearson Chi-Square             54.468a   2    .000

Likelihood Ratio               54.940    2    .000

Linear-by-Linear Association   46.459    1    .000

N of Valid Cases               1000

Step 4. Making a decision and interpreting the result: Since Sig. = .000 < .05, we reject the null hypothesis. In other words, there is a relationship between the age of the customer and the decision to purchase the services; we conclude that the age groups differ in their decision to purchase the service.
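The SPSS value can be checked by hand. The following is a minimal Python sketch, not part of the SPSS procedure, that applies the chi-square formula to the counts of Table 6.1. The function name `chi_square_independence` is hypothetical, and the expected count of each cell is assumed to be (row total × column total) / n, the usual definition under independence. The p-value shortcut exp(-x/2) is valid only for df = 2.

```python
import math

def chi_square_independence(observed):
    """Return (chi2, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Table 6.1: rows = decision (Yes, No); columns = the three age groups
table_6_1 = [[120, 262, 237],
             [41, 103, 237]]

chi2, df, expected = chi_square_independence(table_6_1)

# For df = 2 only, the chi-square upper-tail probability equals exp(-x / 2)
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.2e}")
```

The statistic agrees with the Pearson Chi-Square value of 54.468 in the SPSS output, and the smallest expected count is well above 5, so the note on expected frequencies is satisfied.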

Example 2.

In a sample of 103 people aged 25 to 50 years, we want to determine whether coughing in the morning is associated with cigarette smoking.

Table 6.2. Smoking cigarettes and coughing in the morning

                                 Do you smoke cigarettes?
Do you cough in the morning?     Yes   No    Total

Yes                              45    24    69

No                               15    19    34

Total                            60    43    103

Step 1: Hypothesis

Ho: Coughing in the morning is independent of cigarette smoking

Ha: Coughing in the morning is associated with cigarette smoking

Step 2: Level of significance: α = 0.05

Step 3: Computing the Test Statistic: χ² = 4.17

Step 4: Making a decision and interpreting the result: The calculated χ² = 4.17 and Sig. (p-value) = 0.041, which is less than .05, so we reject the null hypothesis. At the 5% level of significance, coughing in the morning is associated with cigarette smoking.
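As a check on Step 3, the statistic can be worked out by hand from Table 6.2. The sketch below assumes the usual definition of the expected frequency, e_ij = (row total × column total) / n; the intermediate values are rounded, so the last digit may differ slightly from software output.

```latex
\begin{align*}
e_{11} &= \frac{69 \times 60}{103} \approx 40.19, \qquad
e_{12} = \frac{69 \times 43}{103} \approx 28.81, \\
e_{21} &= \frac{34 \times 60}{103} \approx 19.81, \qquad
e_{22} = \frac{34 \times 43}{103} \approx 14.19, \\
\chi^2 &= \sum_{i}\sum_{j} \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \\
       &= \frac{(45 - 40.19)^2}{40.19} + \frac{(24 - 28.81)^2}{28.81}
        + \frac{(15 - 19.81)^2}{19.81} + \frac{(19 - 14.19)^2}{14.19} \\
       &\approx 0.575 + 0.802 + 1.166 + 1.627 \approx 4.17
\end{align*}
```

With df = (2 - 1)(2 - 1) = 1, a value of 4.17 exceeds the 5% critical value of 3.841, which agrees with the reported Sig. of 0.041.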

Example 3: Use the SPSS sample data file "demo.sav"

This is a hypothetical data file that concerns a purchased customer database, for the purpose of mailing monthly offers. Whether or not the customer responded to the offer is recorded, along with various demographic information.

Crosstabulation tables (contingency tables) display the relationship between two or more categorical (nominal or ordinal) variables. The size of the table is determined by the number of distinct values for each variable, with each cell in the table representing a unique combination of values. Numerous statistical tests are available to determine whether there is a relationship between the variables in a table.

Steps in SPSS:

Analyze > Descriptive Statistics > Crosstabs, and follow the steps shown in the picture.

Interpretation Results:

What factors affect the products that people buy?


Descriptive interpretation:

The cells of the table show the count or number of cases for each joint combination of values. For example, 455 people in the income range $25,000–$49,000 owned PDAs.

None of the numbers in this table, however, stands out in a way that indicates an obvious relationship between the variables.

It is often difficult to analyze a cross tabulation simply by looking at the counts in each cell.

Inferential interpretation:

Significance Testing for Crosstabulations

The purpose of a cross tabulation is to show the relationship (or lack thereof) between two variables. Although there appears to be some relationship between the two variables, is there any reason to believe that the differences in PDA ownership between different income categories is anything more than random variation?

A number of tests are available to determine if the relationship between two crosstabulated variables is significant. One of the more common tests is Chi-square. One of the advantages of chi-square is that it is appropriate for almost any kind of data.

Pearson chi-square tests the hypothesis that the row and column variables are independent.

1. Hypothesis

Ho: The variables Income and Owns PDA are independent.

Ha: The variables Income and Owns PDA are related.

2. Level of significance: α = .05

3. Test statistic: χ² = 37.677, Sig. = .000

Output from SPSS

Chi-Square Tests

                               Value     df   Asymp. Sig. (2-sided)

Pearson Chi-Square             37.677a   3    .000

Likelihood Ratio               37.313    3    .000

Linear-by-Linear Association   36.537    1    .000

N of Valid Cases               6400

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 228.73.


Interpretation: Since Sig. = .000 < .05, we reject Ho. The percentage of people who own a PDA increases as the income category increases.

Differences between independent groups

6.2 Mann-Whitney U Test (or rank-sum test)

The Mann-Whitney U test is often considered the nonparametric alternative to the independent-samples t test, although this is not always the case. It is one of the most powerful nonparametric tests for comparing two independent groups when the dependent variable is either ordinal or continuous but not normally distributed. For example, you could use the Mann-Whitney U test to understand whether attitudes towards pay discrimination, where attitudes are measured on an ordinal scale, differ based on gender (i.e., your dependent variable would be "attitudes towards pay discrimination" and your independent variable would be "gender", which has two groups: "male" and "female"). Alternatively, you could use the Mann-Whitney U test to understand whether salaries, measured on a continuous scale, differ based on educational level (i.e., your dependent variable would be "salary" and your independent variable would be "educational level", which has two groups: "high school" and "university").

The Mann-Whitney U test is the nonparametric counterpart of the t test for independent samples:

• Does not require normally distributed populations

• May be applied to ordinal data

• Actual measurements are not used; the ranks of the measurements are used

Assumptions

– Independent samples

– At least ordinal data

Steps in Hypothesis Testing

1. Formulate the appropriate null and alternative hypotheses

Ho: Both groups are the same (there is no difference)

Ha: Both groups are not the same

2. Specify the desired level of significance (α)

α (level of significance); typical values are .01, .05, or .10

3. Compute the test statistic

4. Make a decision and interpret the result of the test

• Reject the null hypothesis if Sig. is less than or equal to .05 (significant result)


Example 1

Is there sufficient evidence to indicate a difference in the average height of the two groups? The data are in the following table:

Heights of males (cm) 193 188 185 183 180 178 170

Heights of females (cm) 175 173 168 165 163

Solution

Ho: Male and female students are the same height (the distributions of heights for the two groups are equal)

Ha: Male and female students are not the same height (the distributions of heights for the two groups are not equal)

Mann-Whitney U Rank Sum Test – Solution in SPSS

Use the data from the previous example, and enter the data into SPSS


Example 2

Consider the following data on the number of minutes needed by two groups of factory workers to learn how to build a chainsaw. Group A received classroom training, whereas Group B received only on-the-job training.

Group A 35 39 51 63 48 31 29 41 55

Group B 85 28 42 37 61 54 36 57

We wish to decide whether the difference between the means is significant.

Output from SPSS

Ranks

       Group   N    Mean Rank   Sum of Ranks

Time   A       9    8.22        74.00

       B       8    9.88        79.00

       Total   17

Test Statistics (a)

                                 Time

Mann-Whitney U                   29

Wilcoxon W                       74

Z                                -0.674

Asymp. Sig. (2-tailed)           0.501

Exact Sig. [2*(1-tailed Sig.)]   .541 (b)

a. Grouping Variable: Group

b. Not corrected for ties.

Making a Decision and Interpreting the Result: Since Sig. = .541 is greater than .05, we do not reject the null hypothesis at the α = 0.05 level of significance; therefore, there is no significant difference between the learning times of these two groups.
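The rank sums and the U statistic in the SPSS output can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper `average_ranks` is hypothetical, U is computed with the convention U_i = n_1 n_2 + n_i(n_i + 1)/2 - R_i (the smaller of the two values is reported), and the Z value uses the normal approximation without a tie correction, which is adequate here because the data contain no ties.

```python
import math

def average_ranks(values):
    """Rank values from smallest (1) to largest; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

group_a = [35, 39, 51, 63, 48, 31, 29, 41, 55]  # classroom training
group_b = [85, 28, 42, 37, 61, 54, 36, 57]      # on-the-job training
n_a, n_b = len(group_a), len(group_b)

# Rank the pooled sample, then sum the ranks within each group
ranks = average_ranks(group_a + group_b)
r_a = sum(ranks[:n_a])
r_b = sum(ranks[n_a:])

u_a = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
u_b = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
u = min(u_a, u_b)

# Normal approximation for the two-tailed significance
mean_u = n_a * n_b / 2
sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
z = (u - mean_u) / sd_u
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(r_a, r_b, u, round(z, 3), round(p_two_tailed, 3))
```

The sketch gives the same rank sums (74 and 79), U = 29, and Z = -0.674 as the SPSS table; the exact significance (.541) comes from a different calculation that is not reproduced here.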

Decision for Example 1 (heights of males and females): Sig. = .010 < .05, so we reject the null hypothesis at α = .05.

Differences between dependent or related groups (Matched-Pairs)


                                                          Parametric                     Nonparametric

Compare two variables measured in the same sample         t-test for dependent samples   Sign test; Wilcoxon’s matched pairs test or McNemar

If more than two variables are measured in same sample    Repeated measures ANOVA        Friedman’s two-way analysis of variance

6.3 Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test is a nonparametric statistical hypothesis test used when comparing two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e., it is a paired difference test). It can be used as an alternative to the paired Student’s t-test (the t-test for dependent samples) when the population cannot be assumed to be normally distributed, for example in before-and-after studies, studies in which measures are taken on the same person or object under different conditions, and studies of twins or other relatives.

• Assumptions

– Random samples

– Populations are continuous

• Can use the normal approximation if n ≥ 15

Steps in Hypothesis Testing

1. Formulate the appropriate null and alternative hypotheses

H0: Both samples come from the same underlying distribution (there is no difference)

Ha: The samples do not come from the same underlying distribution

2. Specify the desired level of significance (α)

α (level of significance); typical values are .01, .05, or .10

3. Determine the critical value

For your decision, look up the critical value in the Wilcoxon probability table.

4. Make a decision and interpret the result of the test

• Reject the null hypothesis if Sig. is less than .05 (significant result).

• Do not reject the null hypothesis if Sig. is greater than .05.

Example

The mayor of a city wants to see if pollution levels are reduced by closing the streets to car traffic. The pollution rate is measured every 60 minutes (from 8:00 to 22:00, a total of 15 measurements) on a day when the streets are open to traffic and on a day when they are closed to traffic. The air pollution data are in the following table.

Rate of pollution in the two situations

With traffic: 214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219, 119, 234

Without traffic: 159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171, 112

The two groups are clearly paired, because there is a bond between the readings: we are considering the same city (with its particular weather, ventilation, etc.), albeit on two different days. Since we cannot assume a Gaussian distribution for the recorded values, we must proceed with a nonparametric test, the Wilcoxon signed-rank test.

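Solution:

Ho: Both samples come from the same underlying distribution (there is no difference)

Ha: The samples do not come from the same underlying distribution

n = 15

α = 0.05

Test statistic: sum of negative ranks = 80, sum of positive ranks = 40, Z = -1.136 (see the SPSS report below)

Making a Decision and Interpreting the Result of the Test

Since Sig. = .256 > .05, we cannot reject the null hypothesis Ho that the two distributions are the same.

Therefore, there is no evidence that closing the roads to traffic improved the pollution rate.

Wilcoxon Signed-Rank Test – Solution with SPSS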

Wilcoxon Report in SPSS

Ranks

                                                  N      Mean Rank   Sum of Ranks

Without_traffic - With_traffic   Negative Ranks   9 (a)  8.89        80

                                 Positive Ranks   6 (b)  6.67        40

                                 Ties             0 (c)

                                 Total            15

a. Without_traffic < With_traffic

b. Without_traffic > With_traffic

c. Without_traffic = With_traffic

Test Statistics (a)

                         Without_traffic - With_traffic

Z                        -1.136 (b)

Asymp. Sig. (2-tailed)   0.256

a. Wilcoxon Signed Ranks Test

b. Based on positive ranks.


Since Sig. = 0.256 > 0.05, we cannot reject the null hypothesis at α = 0.05. There is no evidence that the two distributions differ. Therefore, the data do not show that the closure of roads to traffic brought any improvement in the pollution rate.
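The signed-rank sums and the Z value in the SPSS report can be reproduced directly from the pollution data. The following Python sketch is an illustration only: the helper `rank_of` is hypothetical, differences are taken as Without - With to match the SPSS report, and the Z value uses the normal approximation without a tie correction (the absolute differences in this example are all distinct).

```python
import math

with_traffic = [214, 159, 169, 202, 103, 119, 200, 109, 132, 142,
                194, 104, 219, 119, 234]
without_traffic = [159, 135, 141, 101, 102, 168, 62, 167, 174, 159,
                   66, 118, 181, 171, 112]

# Paired differences (Without - With); zero differences would be dropped
diffs = [wo - w for w, wo in zip(with_traffic, without_traffic) if wo != w]
n = len(diffs)

# Rank the absolute differences; tied values share the average rank
abs_sorted = sorted(abs(d) for d in diffs)

def rank_of(value):
    first = abs_sorted.index(value) + 1   # first rank position of this value
    count = abs_sorted.count(value)       # number of tied observations
    return first + (count - 1) / 2

w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)

# Normal approximation based on the smaller rank sum
mean_w = n * (n + 1) / 4
sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (min(w_plus, w_minus) - mean_w) / sd_w
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(w_minus, w_plus, round(z, 3), round(p_two_tailed, 3))
```

The sketch returns 80 for the negative ranks, 40 for the positive ranks, Z = -1.136, and a two-tailed significance of about 0.256, in line with the SPSS report.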

Nonparametric Tests for Multiple Independent Samples

The nonparametric tests for multiple independent samples are useful for determining whether or not the values of a particular variable differ between two or more groups. This is especially true when the assumptions of ANOVA are not met.

When the assumptions behind the standard ANOVA are invalid or suspect, you should consider using the nonparametric procedures designed to test for the significance of the difference between multiple groups. They are called nonparametric because they make no assumptions about the parameters (such as the mean and variance) of a distribution, nor do they assume that any particular distribution is being used. The Kruskal-Wallis test is a one-way analysis of variance by ranks. It tests the null hypothesis that multiple independent samples come from the same population. Unlike standard ANOVA, it does not assume normality, and it can be used to test ordinal variables.

6.4 Kruskal-Wallis H

When a researcher wishes to compare three or more groups or populations and the data are ordinal, the Kruskal-Wallis test is the appropriate statistical technique. It is also appropriate when the data are numerical but the underlying populations cannot be assumed to be normally distributed or to have equal variances.

It is used to test the null hypothesis that all populations have identical distribution functions against the alternative hypothesis that at least two of the samples differ only with respect to location (median), if at all.

For example, you could use a Kruskal-Wallis H test to understand whether exam performance, measured on a continuous scale from 0-100, differed based on test anxiety levels (i.e., your dependent variable would be "exam performance" and your independent variable would be "test anxiety level", which has three independent groups: students with "low", "medium" and "high" test anxiety levels). Alternatively, you could use the Kruskal-Wallis H test to understand whether attitudes towards pay discrimination, where attitudes are measured on an ordinal scale, differed based on job position (i.e., your dependent variable would be "attitudes towards pay discrimination", measured on a 5-point scale from "strongly agree" to "strongly disagree", and your independent variable would be "job position", which has three independent groups: "shop floor", "middle management" and "boardroom").

Data types that can be analyzed with Kruskal-Wallis H

• The data points must be independent of each other

• The distributions do not have to be normal and the variances do not have to be equal

• You should ideally have more than five data points per sample

• All individuals must be selected at random from the population

• All individuals must have an equal chance of being selected

• Sample sizes should be as equal as possible, but some differences are allowed

Steps in Hypothesis Testing

1. Formulate the appropriate null and alternative hypotheses

Ho: Identical distributions

Ha: At least two differ

2. Specify the desired level of significance (α)

3. Compute the test statistic

The H statistic is calculated as follows:

$H = \frac{12}{n(n+1)} \sum_{i} \frac{R_i^2}{n_i} - 3(n+1)$

Where:

Ri = Sum of the ranks of the ith group

ni = Sample size of the ith group

n = Combined sample size of all groups

4. Make a decision and interpret the result of the test: if the computed H is less than the critical value, do not reject the null hypothesis at α = 0.05.

Example

An advertising agency employs three different film production companies to produce its television commercials. The advertising agency has taken a sample of five commercials from each of the production houses, and agency executives have ranked the production quality of the commercials from best quality (1) to lowest quality (15). These ranks are shown in the next table. Notice that the advertising agency considered two commercials to be of equal quality. Hence, rather than being ranked 3 and 4, the two commercials are each ranked 3.5. The data are in the following table.

Solution

Ho: Identical distributions

Ha: At least two differ

α = .05

df = p - 1 = 3 - 1 = 2

Test statistic:

Ranking of the quality of commercials production

Production company 1   Production company 2   Production company 3

9                      6                       1

5                      13                      7

3.5                    10                      15

14                     2                       12

8                      3.5                     11

R1 = 39.5              R2 = 34.5               R3 = 46
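Substituting the rank sums into the H statistic gives the following worked computation. It uses the standard form of the Kruskal-Wallis statistic with n = 15 and n_i = 5; the final tie-correction step is an assumption about how statistical software treats the two commercials tied at rank 3.5.

```latex
\begin{align*}
H &= \frac{12}{n(n+1)} \sum_{i=1}^{3} \frac{R_i^2}{n_i} - 3(n+1) \\
  &= \frac{12}{15 \times 16}
     \left( \frac{39.5^2}{5} + \frac{34.5^2}{5} + \frac{46^2}{5} \right) - 3 \times 16 \\
  &= 0.05 \times (312.05 + 238.05 + 423.20) - 48 \\
  &= 48.665 - 48 = 0.665 \\[4pt]
H_{\text{corrected}} &= \frac{H}{1 - \dfrac{2^3 - 2}{15^3 - 15}}
  = \frac{0.665}{1 - 6/3360} \approx 0.666
\end{align*}
```

Both values are far below the 5% critical value of the chi-square distribution with 2 degrees of freedom (5.991), so the null hypothesis is not rejected.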

Solution with SPSS


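Output

Ranks

                                                   Company     N    Mean Rank

Ranking of the quality of commercials production   Company 1   5    7.90

                                                   Company 2   5    6.90

                                                   Company 3   5    9.20

                                                   Total       15

Test Statistics (a, b)

              Ranking of the quality of commercials production

Chi-Square    .666

df            2

Asymp. Sig.   .717

a. Kruskal Wallis Test

b. Grouping Variable: Company

Decision: Since Sig. = .717 > .05, we do not reject the null hypothesis at α = .05; there is no evidence that the three production companies differ in quality.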


Example with SPSS

Agricultural researchers are studying the effect of mulch color on the taste of crops. Strawberries grown in red, blue, and black mulch were rated by taste-testers on an ordinal scale of one to five (far below to far above average). The results are collected in the following table. Use the Kruskal-Wallis test to determine if taste varies by mulch color.

Customer Taste Color Customer Taste Color

1 1 1 13 3 1

2 3 2 14 4 2

3 2 3 15 5 3

4 1 1 16 4 1

5 2 2 17 3 2

6 4 3 18 3 3

7 2 1 19 3 1

8 4 2 20 3 2

9 4 3 21 3 3

10 2 1 22 2 1

11 4 2 23 3 2

12 3 3 24 5 3

Solution with SPSS

The Kruskal-Wallis test uses ranks of the original values and not the values themselves. That's appropriate in this case, because the scale used by the taste-testers is ordinal.

First, each case is ranked without regard to group membership. Cases tied on a particular value receive the average rank for that value. After ranking the cases, the ranks are summed within groups.

The Kruskal-Wallis statistic measures how much the group ranks differ from the average rank of all groups.

Ranks

        Color   N   Mean Rank

Taste   Red     8   7.69

        Blue    8   13.94

        Black   8   15.88


The degrees of freedom for the chi-square statistic are equal to the number of groups minus one.

Making a Decision and Interpreting the Result: Since H = 6.347 and Sig. = .042 < .05, we reject the null hypothesis; at α = 0.05 there is evidence that the population distributions are different.
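The ranking procedure described above can be sketched in Python for the strawberry data. This is an illustration under stated assumptions: the function name `kruskal_wallis` is hypothetical, the color codes 1, 2, 3 in the data table are taken to mean red, blue, and black (this matches the mean ranks in the SPSS output), H is divided by the usual tie-correction factor, and the shortcut exp(-H/2) for the significance is valid only for df = 2.

```python
import math
from collections import Counter

def kruskal_wallis(groups):
    """Return (tie-corrected H, rank sums) for a list of samples."""
    n = sum(len(g) for g in groups)
    counts = Counter(v for g in groups for v in g)
    # Average rank for every distinct value in the pooled sample
    rank_of = {}
    start = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = start + (t - 1) / 2
        start += t
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = (12 / (n * (n + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n)
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction, rank_sums

# Taste scores by mulch color, read from the data table (assumed coding)
red = [1, 1, 2, 2, 3, 4, 3, 2]      # Color = 1
blue = [3, 2, 4, 4, 4, 3, 3, 3]     # Color = 2
black = [2, 4, 4, 3, 5, 3, 3, 5]    # Color = 3

h, rank_sums = kruskal_wallis([red, blue, black])
mean_ranks = [r / 8 for r in rank_sums]
p_value = math.exp(-h / 2)  # chi-square upper tail, df = 2 only

print(round(h, 3), [round(m, 2) for m in mean_ranks], round(p_value, 3))
```

The mean ranks (7.69, 13.94, 15.88) and the statistic (6.347, Sig. about .042) match the values used in the decision above.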

Nonparametric Tests for Multiple Related Samples

The nonparametric tests for multiple related samples are useful alternatives to a repeated measures analysis of variance. They are especially appropriate for small samples and can be used with nominal or ordinal test variables.

The Friedman procedure tests the null hypothesis that multiple ordinal responses come from the same population. As with the Wilcoxon test for two related samples, the data may come from repeated measures of a single sample or from the same measure from multiple matched samples.

The Cochran Q procedure tests the null hypothesis that multiple related proportions are the same. The Cochran test is a multivariate extension of the McNemar test used for two related samples.

6.5 Friedman Fr-Test

Friedman’s test is the nonparametric equivalent of the repeated measures ANOVA, and an extension of the Wilcoxon test.

– it allows the comparison of more than two dependent groups (two or more conditions)

Steps in Hypothesis Testing

1. Formulate the appropriate null and alternative hypotheses

Ho: There are no differences between the groups.

Ha: There are differences between the groups.

2. Specify the desired level of significance (α)

α (level of significance); typical values are .01, .05, or .10

3. Determine the critical region

The critical value depends on the level of significance (α) and the degrees of freedom (df).

Test Statistics (a, b) for the Kruskal-Wallis strawberry example in Section 6.4

              Taste

Chi-Square    6.347

df            2

Asymp. Sig.   .042


4. Compute the test statistic

Procedure

1. Assign ranks, Ri = 1, …, p, to the p treatments in each of the b blocks

• Smallest value = 1; largest value = p

• Average the ranks of ties

2. Sum the ranks for each treatment

3. Compute the test statistic:

$F_r = \frac{12}{b\,p\,(p+1)} \sum_{i=1}^{p} R_i^2 - 3b(p+1)$

Where:

Ri = Sum of the ranks of the ith group (treatment)

p = Number of treatments

b = Number of blocks

5. Make a decision and interpret the result: compare the value of the Friedman statistic (Fr) with the table of critical values of the Pearson chi-square (χ²) distribution.

If Fr > χ², we reject the null hypothesis.

Example

Three new traps were tested to compare their ability to trap mosquitoes. The traps A, B, and C were placed side by side at each of five different locations. The number of mosquitoes in each trap was recorded. At the .05 level, is there a difference in the distribution of the number of mosquitoes caught by the three traps?

Trap A Trap B Trap C

3 5 0

23 17 15

11 5 7

8 4 2

19 11 5

Solution

Ho: There are no differences between the groups.

Ha: There are differences between the groups.

α = .05

df = p - 1 = 3 - 1 = 2

Test statistic:

               Raw Data                   Ranks
               Trap A   Trap B   Trap C   Trap A   Trap B   Trap C

               3        5        0        2        3        1

               23       17       15       3        2        1

               11       5        7        3        1        2

               8        4        2        3        2        1

               19       11       5        3        2        1

Sum of ranks                              14       10       6

Mean rank                                 2.8      2        1.2

b = 5, p = 3

Fr = 6.4

Making a Decision and Interpreting the Result: Since Fr = 6.4 and Sig. = .041 < .05, we reject the null hypothesis; at α = 0.05 there is evidence that the population distributions are different.

Result from SPSS:

Ranks

        Mean Rank

Trap1   2.8

Trap2   2

Trap3   1.2

Test Statistics (a)

N             5

Chi-Square    6.4

df            2

Asymp. Sig.   0.041

a. Friedman Test
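The Friedman computation for the mosquito trap example can be sketched in Python as follows. It is a minimal illustration, not the SPSS algorithm: the helper `rank_within_block` is hypothetical, the statistic is the usual form Fr = 12 / (b p (p + 1)) × ΣRi² - 3b(p + 1) without a tie correction (no block in this example contains ties), and the shortcut exp(-Fr/2) for the significance is valid only for df = 2.

```python
import math

# One tuple per location (block): counts for Trap A, Trap B, Trap C
blocks = [(3, 5, 0), (23, 17, 15), (11, 5, 7), (8, 4, 2), (19, 11, 5)]

def rank_within_block(row):
    """Rank the values of one block from 1 (smallest); ties share the average rank."""
    ordered = sorted(row)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in row]

ranks = [rank_within_block(row) for row in blocks]
b = len(blocks)       # number of blocks
p = len(blocks[0])    # number of treatments

# Sum the ranks of each treatment (column) over all blocks
rank_sums = [sum(col) for col in zip(*ranks)]

fr = 12 / (b * p * (p + 1)) * sum(r * r for r in rank_sums) - 3 * b * (p + 1)
p_value = math.exp(-fr / 2)  # chi-square upper tail, df = p - 1 = 2 only

print(rank_sums, round(fr, 2), round(p_value, 3))
```

The rank sums come out as 14, 10, and 6, and the statistic as 6.4 with a significance of about .041, as in the SPSS result.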

Other Techniques for Examining Associations

6.6 Spearman Correlation Coefficient Technique

The technique is appropriate when the degree of association between two sets of ranks (pertaining to two variables) is to be examined.

• Illustrative research question(s) this technique can answer.

Is there a significant relationship between motivation levels of salespeople and the quality of their performance?

Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable.

Procedure

• Measures correlation between ranks

• Corresponds to the Pearson product-moment correlation coefficient

• Values range from -1 to +1

Procedure

1. Assign ranks, Ri, to the observations of each variable separately

2. Calculate the differences, di, between each pair of ranks

3. Square the differences, di², between ranks

4. Sum the squared differences over all pairs

5. Use the shortcut approximation formula:

$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$

di = the difference between the ith sample unit's ranks on the two variables

n = the total sample size

Example: Quality of life

Association between quality of life and percentage of new residents

City   Quality of life   Percentage of New Residents

A      25                14

B      10                3

C      2                 5

D      30                17

E      20                15

F      20                12

G      27                17

H      35                20

I      40                22

J      11                6

K      10                7

L      23                13

Hypothesis

Ho: There is no relationship between quality of life and percentage of new residents

Ha: There is a relationship between quality of life and percentage of new residents

Solution

Correlations

Spearman's rho                                          Quality_life   Percentage of New Residents

Quality_life                  Correlation Coefficient   1.000          .946**

                              Sig. (2-tailed)           .              .000

                              N                         12             12

Percentage of New Residents   Correlation Coefficient   .946**         1.000

                              Sig. (2-tailed)           .000           .

                              N                         12             12

Decision: Reject the null hypothesis because Sig. is < .05. These variables have a strong, positive association: the higher the quality of life score, the greater the percentage of new residents. The value of rs² is 0.946² = 0.895, which indicates that we will make 89.5% fewer errors when predicting rank on one variable from rank on the other, as opposed to ignoring rank on the other variable.

Note: Spearman’s rho is an index of the strength of association between the variables: it ranges from 0 (no association) to ±1.00 (perfect association). A perfect positive association (rs = +1.00) would exist if there were no disagreements in ranks between the two variables. A perfect negative relationship (rs = -1.00) would exist if the ranks were in perfect disagreement.

For testing Spearman’s Rho for significance, when the number of cases in the sample is 10 or more, the sampling distribution of Spearman’s rho approximates the t distribution, and we will use this distribution to conduct the test.

Example: Industrial Marketing Firm

An industrial marketing firm has been hiring all its salespeople from among the graduates of 10 business schools in the vicinity of its headquarters.

The firm developed a subjective ranking of the perceived prestige levels of the 10 schools and the performance levels of the groups of graduates recruited from these schools

Question

What is the degree of association between the prestige levels of the schools and the sales performance levels of their graduates hired by this company?

Table 6.8. Association between Business School and Ranking of school's Prestige

Business School   Ranking of School's Prestige   Ranking of Performance of School's Graduates   Difference Between Ranks   Squared Difference

1                 10                             8                                              2                          4

2                 7                              3                                              4                          16

3                 9                              7                                              2                          4

4                 1                              2                                              -1                         1

5                 6                              9                                              -3                         9

6                 2                              4                                              -2                         4

7                 3                              5                                              -2                         4

8                 8                              10                                             -2                         4

9                 5                              6                                              -1                         1

10                4                              1                                              3                          9

                                                                                                                           Sum = 56


SPSS output

Correlations

Spearman's rho                                                           Ranking of school's Prestige   Ranking of performance of School's Graduates

Ranking of school's Prestige                   Correlation Coefficient   1.000                          .661*

                                               Sig. (2-tailed)           .                              .038

                                               N                         10                             10

Ranking of performance of School's Graduates   Correlation Coefficient   .661*                          1.000

                                               Sig. (2-tailed)           .038                           .

                                               N                         10                             10

*. Correlation is significant at the 0.05 level (2-tailed).

Decision: Since Sig. = .038 < .05, we reject H0 and conclude that there is an association between the prestige of business schools and the job performance of their graduates. In other words, the sample correlation of rs = 0.661 is unlikely to have occurred by chance.
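The coefficient in the SPSS output can be reproduced from the ranks in Table 6.8 with the shortcut formula rs = 1 - 6Σdi² / (n(n² - 1)). The Python sketch below is an illustration only. The t statistic at the end uses the common form t = rs × sqrt((n - 2) / (1 - rs²)) with n - 2 degrees of freedom, which is an assumption about how the t approximation mentioned earlier is applied, and 2.306 is the two-tailed 5% critical value of t with 8 degrees of freedom.

```python
import math

# Ranks from Table 6.8 (schools 1 to 10)
prestige = [10, 7, 9, 1, 6, 2, 3, 8, 5, 4]
performance = [8, 3, 7, 2, 9, 4, 5, 10, 6, 1]
n = len(prestige)

# Squared differences between each pair of ranks
d_squared = [(x - y) ** 2 for x, y in zip(prestige, performance)]
sum_d2 = sum(d_squared)

# Shortcut formula (exact when there are no tied ranks, as here)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# t approximation for the significance test, df = n - 2 (assumed form)
t_stat = r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
T_CRITICAL_8DF = 2.306  # two-tailed, alpha = .05

print(sum_d2, round(r_s, 3), round(t_stat, 3), t_stat > T_CRITICAL_8DF)
```

The sum of squared differences is 56 and rs is about 0.661, as in the table and the SPSS output; the t statistic exceeds the critical value, which is consistent with the reported Sig. of .038.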

Chapter review problems

1. Employees of Inyange Company were sampled at random from pay records and asked to complete an anonymous job satisfaction survey, yielding the tabulation shown. Research question: At α = .05, is job satisfaction independent of pay category?

                 Job Satisfaction
Pay Type         Satisfied   Neutral   Dissatisfied   Total

Monthly salary   20          13        2              35

Hourly           135         127       58             320

Total            155         140       60             355

Ho: Pay type and job satisfaction are independent factors. Answer: χ² = 4.543

2. A study was conducted, among 100 professors from 3 different divisions for the preference on beverages of 3 categories test if there is any relationship between the field of teaching and preference of beverage. Answer: c2 _=4.445

Beverage

Field of teaching

Business Sciences Social Psychology Total

Tea 20 10 10 40

Coffee 10 10 15 35

Cold drinks 10 8 7 25

Total 40 28 32 100

3. Suppose you have the following categorical data set. (Answer: χ² = 125.516)

Incidence of three types of malaria in three tropical regions

Asia Africa South America Totals

Malaria A 31 14 45 90

Malaria B 2 5 53 60

Malaria C 53 45 2 100

Totals 86 64 100 250


4. For factory 1, the rates (% of capacity) are: 71, 82, 77, 92, and 88. For factory 2, the rates are: 85, 82, 94, and 97.

Do the factory rates have the same probability distributions at the .05 level? Answer: R1 = 19.5, R2 = 25.5; U1 = 15.5, U2 = 4.5
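The rank sums and U values in the answer can be checked as follows. This is a sketch, assuming tied observations receive the average of their ranks and that U is computed as U1 = n1·n2 + n1(n1 + 1)/2 - R1 (and likewise for U2), which is the convention that reproduces the stated answer; the function names are illustrative.

```python
def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1  # 1-based position of the first occurrence
        rank_of[v] = first + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def mann_whitney(sample1, sample2):
    """Return (R1, R2, U1, U2) from the pooled ranking of both samples."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(sample1 + sample2)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


factory1 = [71, 82, 77, 92, 88]
factory2 = [85, 82, 94, 97]
print(mann_whitney(factory1, factory2))  # (19.5, 25.5, 15.5, 4.5)
```

The smaller of U1 and U2 (here 4.5) is the value usually compared with the tabled critical value.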

5. The effectiveness of advertising for two rival products (Brand X and Brand Y) was compared. Market research was carried out at a local shopping centre: participants were shown adverts for two rival brands of coffee, which they then rated on the overall likelihood of buying the product (out of 10, with 10 being "definitely going to buy the product"). Half of the participants gave ratings for one of the products; the other half gave ratings for the other product. The data are in the following table.

Brand X Brand Y

Participant Rating Participant Rating

1 3 1 9

2 4 2 7

3 2 3 5

4 6 4 10

5 2 5 6

6 5 6 8

Is there a significant difference between the ratings given to each brand in terms of the likelihood of buying the product? Answer: U = 2

Ratings   Brand     N    Mean Rank   Sum of Ranks
          Brand X   6    3.83        23.00
          Brand Y   6    9.17        55.00
          Total     12

                         Ratings
Mann-Whitney U           2.000
Wilcoxon W               23.000
Z                        -2.576
Asymp. Sig. (2-tailed)   .010

6. Test the following hypotheses with the Mann-Whitney U test.

H0: The health service population is identical to the educational service population on employee compensation.

Ha: The health service population is not identical to the educational service population on employee compensation.

Health Service   Educational Service
20.1             26.19
19.8             23.88
22.36            25.5
18.75            21.64
21.9             24.85
22.96            25.3
20.75            24.12
                 23.45

Answer: U=3


User Current New

Jeannette 9.98 9.88

Athanasie 9.88 9.86

Bonaventure 9.90 9.83

Jean Sauveur 9.99 9.80

Sylvére 9.94 9.87

Cyliak 9.84 9.84

Answer: T+ = 15, T- = 0; Z = -2.023. There is evidence that the new package is faster.

n’ = 5 (not 6; the one pair with zero difference is eliminated)
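The Wilcoxon signed-rank figures in this answer can be reproduced from the Current and New columns. The following is a sketch, assuming differences are taken as Current minus New, zero differences are dropped, tied absolute differences share the average rank, and Z uses the normal approximation without a tie correction (the version that matches the stated Z).

```python
import math

current = [9.98, 9.88, 9.90, 9.99, 9.94, 9.84]
new = [9.88, 9.86, 9.83, 9.80, 9.87, 9.84]

# Round to 2 decimals so floating-point noise does not break ties.
diffs = [round(c - n, 2) for c, n in zip(current, new)]
diffs = [d for d in diffs if d != 0]  # eliminate zero differences
n_eff = len(diffs)

abs_sorted = sorted(abs(d) for d in diffs)


def midrank(value):
    """Average rank of `value` among the sorted absolute differences."""
    first = abs_sorted.index(value) + 1
    return first + (abs_sorted.count(value) - 1) / 2


t_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
t_minus = sum(midrank(abs(d)) for d in diffs if d < 0)

# Normal approximation for the smaller of T+ and T-.
mean_t = n_eff * (n_eff + 1) / 4
sd_t = math.sqrt(n_eff * (n_eff + 1) * (2 * n_eff + 1) / 24)
z = (min(t_plus, t_minus) - mean_t) / sd_t

print(n_eff, t_plus, t_minus, round(z, 3))  # 5 15.0 0 -2.023
```

With only five usable pairs the normal approximation is rough, so an exact table for small n may be the safer reference.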

8. As production manager, you want to see if 3 filling machines have different filling times. You assign 15 similarly trained & experienced workers, 5 per machine, to the machines. At the .05 level, is there a difference in the distribution of filling times?

Machine 1 Machine 2 Machine 3

25.4 23.4 20.0

26.31 21.8 22.2

24.1 23.5 19.75

23.74 22.75 20.6

25.1 21.6 20.4

Answer: H = 11.58; critical value χ² = 5.991 (α = .05), df = number of groups - 1 (3 - 1 = 2)
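The H statistic can be verified by ranking all 15 filling times together. This is a sketch, assuming the standard Kruskal-Wallis formula H = 12 / (N(N + 1)) · Σ(Rj² / nj) - 3(N + 1); because this data set has no ties, a simple 1..N ranking is enough.

```python
machines = {
    "Machine 1": [25.4, 26.31, 24.1, 23.74, 25.1],
    "Machine 2": [23.4, 21.8, 23.5, 22.75, 21.6],
    "Machine 3": [20.0, 22.2, 19.75, 20.6, 20.4],
}

# Rank all observations together (valid as written only when there are no ties).
pooled = sorted(v for times in machines.values() for v in times)
rank = {v: i + 1 for i, v in enumerate(pooled)}

n_total = len(pooled)
rank_sums = {name: sum(rank[v] for v in times) for name, times in machines.items()}

h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(times) for name, times in machines.items()
) - 3 * (n_total + 1)

print(rank_sums)    # {'Machine 1': 65, 'Machine 2': 38, 'Machine 3': 17}
print(round(h, 2))  # 11.58
```

H is compared with the chi-square distribution on 2 degrees of freedom, whose tabled critical value at α = .05 is 5.991.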

9. We want to find out if students have a preference for one type of soda over others.

They are blindfolded and given a taste test. They are asked to take a sip of Brand X, Brand Y and Brand Z sodas and to rank order their preference for the three sodas where a 1 is the highest rank, a 2 the next highest and a 3 the least preferred soda. The data representing the rankings given by each participant to the three sodas are:

Participants’ Rankings of the Three Brands of Soda

Participant   Brand X   Brand Y   Brand Z
Alexis        2         1         3
Chantal       1         3         2
Eric          1         2         3
Fabrice       1         3         2
Désange       1         3         2
Joselyne      1         2         3
Gad Major     1         3         2
Alphonsine    1         2         3
Diogene       1         3         2
Samuel        2         1         3

Ho: There will be no difference in the participants’ rank ordered preferences for Brand X, Brand Y, or Brand Z sodas.

Ha: There will be a difference in the participants’ rank ordered preferences for Brand X, Brand Y, and Brand Z sodas.
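No answer is given for this problem. One way to approach it, assuming the intended procedure is the Friedman test for related samples (each participant ranks all three brands), is sketched below using χ²r = 12 / (nk(k + 1)) · ΣRj² - 3n(k + 1), where n is the number of participants and k the number of brands.

```python
# Each tuple is one participant's ranks for (Brand X, Brand Y, Brand Z).
rankings = [
    (2, 1, 3),  # Alexis
    (1, 3, 2),  # Chantal
    (1, 2, 3),  # Eric
    (1, 3, 2),  # Fabrice
    (1, 3, 2),  # Désange
    (1, 2, 3),  # Joselyne
    (1, 3, 2),  # Gad Major
    (1, 2, 3),  # Alphonsine
    (1, 3, 2),  # Diogene
    (2, 1, 3),  # Samuel
]

n = len(rankings)     # participants (blocks)
k = len(rankings[0])  # brands (treatments)

# Column sums of ranks, one per brand.
rank_sums = [sum(col) for col in zip(*rankings)]

chi2_r = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)

print(rank_sums, round(chi2_r, 2))
```

Under this assumption the statistic is referred to the chi-square distribution with k - 1 = 2 degrees of freedom (critical value 5.991 at α = .05).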


(mg/1000)? in brain, heart and blood. The aim of the study was to determine if the concentration of the toxicant in the three organs is the same or different. The results obtained are:

Brain Heart Blood

164 96 51

105 115 41

150 100 46

145 75 79

139 88 52

144 64 70

139 97 46

98 101 52

Ho: There are no significant differences in the concentration of the toxicant in the brain, heart and blood.

Ha: There are significant differences in the concentration of the toxicant in the brain, heart and blood.

11. A psychologist believes that those who score high on a need-achievement test will likely have a high salary to match. To test this theory, the psychologist has given questionnaires to a random sample of 17 subjects and has ranked the data so that the highest value in each category has been assigned a 1.

Subject   Need Achievement Rank   Salary Rank

A 1 3

B 8 7

C 4 2

D 10 12

E 12 9

F 2 1

G 13 11

H 6 6

I 16 17

J 11 13

K 14 15

L 3 5

M 9 10

N 7 8

O 15 14

P 17 16

Q 5 4

Compute the Spearman rank correlation coefficient and test it for significance at the .05 level. What conclusion may be reached? Answer: rs=0.949, sig. .000


Take heart and be strong; have no fear and do not be

troubled; for the Lord your God is with you wherever