Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics

Academic year: 2022


Analysis of Data

Claudia J. Stanny PSY 6217 – Research Design

Organizing Data Files in SPSS

• All data for one subject entered on the same line
  - Identification data
  - Between-subjects manipulations: variable to identify group level for that subject
  - Measures for within-subjects manipulations: each level of a within-subjects variable appears as a new variable in the data file

Example data file (one row per subject; … marks cells not shown):

ID  SEX  AGE  HAND  WD1  WD2  WD3  PIC1  PIC2  PIC3  RTW1  RTW2  RTW3  RTP1  RTP2  RTP3
1   1    2    2     12   13   11   14    16    15    617   725   710   550   513   490
2   1    1    2     11   15   10   16    20    19    982   880   798   682   773   825
3   2    1    1     …
4   2    2    1     …
5   1    2    1     …
6   2    1    2     …

Descriptive Statistics

• Biased and Unbiased Estimates of Population parameters
• Measures of Central Tendency
  - Operational definitions of a “typical” score
  - Mean, median, mode
• Measures of Variability
  - Range, interquartile range
  - Variance
  - Standard deviation
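The measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the scores are hypothetical, and the "biased" and "unbiased" variance estimates are taken here to mean dividing the sum of squares by N versus N - 1.

```python
import statistics

# Hypothetical sample of scores (not from the slides).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability.
n = len(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)
biased_variance = sum_of_squares / n          # divides by N
unbiased_variance = sum_of_squares / (n - 1)  # divides by N - 1
standard_deviation = unbiased_variance ** 0.5
score_range = max(scores) - min(scores)

print(mean, median, mode, score_range)
print(biased_variance, unbiased_variance, standard_deviation)
```

The biased estimate is always the smaller of the two, and the gap shrinks as N grows.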

Measures of Association

• Pearson product-moment correlation (r)
• Point-biserial correlation
  - Equivalent to a between-subjects t-test
• Spearman rank order correlation (rs)
• Phi coefficient
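The equivalence between the point-biserial correlation and a between-subjects t-test can be checked numerically. The sketch below uses hypothetical scores and hypothetical helper names (`pearson_r`, `pooled_t`); it assumes the pooled-variance t-test and the standard conversion t = r * sqrt(df / (1 - r^2)).

```python
import math

# Hypothetical scores for two independent groups (not from the slides).
group0 = [3, 5, 4, 6, 2]
group1 = [7, 6, 8, 5, 9]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(a, b):
    """Between-subjects t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Point-biserial r: correlate a 0/1 group code with the scores.
codes = [0] * len(group0) + [1] * len(group1)
scores = group0 + group1
r_pb = pearson_r(codes, scores)

df = len(scores) - 2
t_from_r = r_pb * math.sqrt(df / (1 - r_pb ** 2))
t_direct = pooled_t(group0, group1)

print(r_pb, t_from_r, t_direct)
```

Both routes give the same t statistic, so the two analyses lead to the same decision about the null hypothesis.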

Interpreting Correlations

[Figure: four scatterplots. Panels: Patient's Belief in Doctor's Competence and Days in Therapy; Temperature in Pensacola and ATM transactions; Inches of Rainfall per Month and Crop Yield (bushels); HS GPA and College GPA.]

Effects of Outliers

[Figure: two scatterplots of Hours Studied and Scores on a Test, continued on a second slide with four panels.]

Restricted Range

[Figure: two scatterplots of Age in years and Height (in).]

Combining Data From Different Groups

[Figure: two scatterplots of Height (inches) and Weight (lbs); legend: Women, Men.]

Data From Different Groups

[Figure: two scatterplots of Age at Death of Spouse and Interest in Remarriage; legend: Widowers, Widows.]

Inferential Statistics

• Logic of Inferential Statistical Tests
• Assume that all participants are selected from the same population
  - Score (DV) = µ + random error
  - H0: µ1 = µ2
• Assume that the experimental conditions produce systematic changes in the scores of participants on the dependent measure
  - Score (DV) = µ + treatment effect + random error
  - H1: µ1 ≠ µ2

Types of Statistical Errors

• Type I Error
  - Probability = alpha
  - Rejection of H0 when H0 is true
• Type II Error
  - Probability = beta
  - Failure to reject H0 when H1 is true
  - Failure to detect a true treatment effect

Parametric Statistics

• Random selection and/or random assignment of scores from the population of interest
• Sampling distribution is normal
  - Central Limit Theorem: the sampling distribution of the mean approaches a normal distribution as sample size increases, even when the raw data are not normally distributed
• Homogeneity of variances within groups
  - Problems can arise when observations in one condition are more variable than observations in other conditions
• Testing these assumptions statistically

Analysis of Variance

• One-way designs
  - One independent variable (factor)
  - Two levels of a factor: a t-test will give equivalent results
  - Three or more levels of a factor require use of ANOVA
• Partitioning Variance
  - Between-groups variability
  - Within-groups variability

Partitioning Variance in ANOVA

Total Variability = Systematic Variability (Between Groups) + Error Variability (Within Groups)

Creating an F Ratio for ANOVA

• $SS_{total} = SS_{treatment} + SS_{error}$
• Mean Square: $MS = SS / df$
• Expected $MS_{BG} = Var_{TX} + Var_{error}$
• Expected $MS_{WG} = Var_{error}$

$F = MS_{BG} / MS_{WG} = (Var_{TX} + Var_{error}) / Var_{error}$

• Value of F ratio when H0 is true?
• Value of F ratio when H0 is false?
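The partition and the F ratio can be traced by hand for a small one-way design. The sketch below is illustrative only: the three groups and their scores are hypothetical, and it assumes a between-subjects design with equal group sizes.

```python
# Hypothetical scores for a three-group, between-subjects design.
groups = {
    "control": [4, 5, 6],
    "drug_a": [6, 7, 8],
    "drug_b": [8, 9, 10],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

def group_mean(scores):
    return sum(scores) / len(scores)

# Partition the total sum of squares.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_between = sum(
    len(scores) * (group_mean(scores) - grand_mean) ** 2
    for scores in groups.values()
)
ss_within = sum(
    (x - group_mean(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

# Mean Square = SS / df.
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_ratio = ms_between / ms_within
print(ss_total, ss_between, ss_within, f_ratio)
```

Following the expected mean squares above, the ratio should sit near 1 when there is no treatment variance and grow as treatment variance increases; here the group means differ widely relative to the within-group spread, so F is large.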

Interpreting a Significant F Ratio

• Need to evaluate pair-wise comparisons of means following a significant F ratio
• Planned comparisons
  - A priori comparisons
  - Comparisons based on specific predictions hypothesized before data collection
• Unplanned comparisons
  - Post hoc tests
  - Need to control Experiment-wide Error Rate (family-wise error rate)
  - Probability of at least one Type I Error increases as the number of comparisons increases
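As a worked illustration of that last point, suppose the k comparisons are independent and each is tested at the same alpha. Independence is a simplifying assumption that the slides do not state; correlated comparisons give a somewhat different value.

```latex
\[
P(\text{at least one Type I error}) = 1 - (1 - \alpha)^{k}
\]
\[
\alpha = .05,\ k = 3:\quad 1 - (.95)^{3} = 1 - .857375 = .142625
\]
\[
\alpha = .05,\ k = 10:\quad 1 - (.95)^{10} \approx 1 - .5987 = .4013
\]
```

With only three comparisons the experiment-wide error rate is already close to three times the nominal alpha.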

Planned Comparisons

• Contrasts should be orthogonal
  - Outcome of each contrast should be independent
  - Knowing the result for one comparison does not constrain the decision for other comparisons
• Example: Planned comparisons for a three-group design: Drug A, Drug B, Control
  - Contrast 1: Average of Drug versus No Drug
      Combines mean for drug A and drug B
  - Contrast 2: Drug A versus Drug B
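One common way to check orthogonality is to write each contrast as a set of weights over the group means and confirm that the products of corresponding weights sum to zero. The sketch below assumes equal group sizes; the weight values, the group means, and the helper names are illustrative choices rather than anything given in the slides.

```python
# Contrast weights, in the order: Drug A, Drug B, Control.
contrast_1 = [0.5, 0.5, -1.0]  # average of the two drugs versus control
contrast_2 = [1.0, -1.0, 0.0]  # Drug A versus Drug B

def is_contrast(weights, tol=1e-12):
    """A valid contrast has weights that sum to zero."""
    return abs(sum(weights)) < tol

def are_orthogonal(w1, w2, tol=1e-12):
    """With equal group sizes, contrasts are orthogonal when the
    products of corresponding weights sum to zero."""
    return abs(sum(a * b for a, b in zip(w1, w2))) < tol

def contrast_value(weights, means):
    """Weighted sum of group means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))

# Hypothetical group means: Drug A, Drug B, Control.
means = [7.0, 3.0, 5.0]

print(is_contrast(contrast_1), is_contrast(contrast_2))
print(are_orthogonal(contrast_1, contrast_2))
print(contrast_value(contrast_1, means), contrast_value(contrast_2, means))
```

With these means the first contrast is zero while the second is not, which shows the two comparisons answering separate questions.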

Orthogonal Contrasts Produce Independent Outcomes

• Assume that the comparison of the Drugs versus Control produces no difference
    Mean for Drugs = 5    Mean for Control = 5
• 3 outcomes possible for the comparison of Drug A with Drug B
  - No Difference: Drug A = 5, Drug B = 5
  - Drug A > Drug B: Drug A = 7, Drug B = 3
  - Drug A < Drug B: Drug A = 2, Drug B = 8
• Knowing the decision about the first contrast does not provide information about the outcome of the second contrast

Independence of Contrast Outcomes (continued)

• Assume that the comparison of the Drugs versus Control produces a significant difference
    Mean for Drugs = 4    Mean for Control = 8
• 3 outcomes possible for the comparison of Drug A with Drug B
  - No Difference: Drug A = 4, Drug B = 4
  - Drug A > Drug B: Drug A = 6, Drug B = 2
  - Drug A < Drug B: Drug A = 1, Drug B = 7
• Knowing the decision about the first contrast does not provide information about the outcome of the second contrast

Post Hoc Tests

• Scheffé
  - Corrects for all possible comparisons, whether or not these are examined
• Bonferroni procedure
  - Divide alpha by the number of comparisons to be made
  - Use the critical value for the smaller alpha value for all tests
• Dunnett
  - Specifically designed to compare each experimental group with a single control group
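The Bonferroni procedure is simple enough to sketch directly. The example below is a minimal illustration: the p-values are hypothetical, and `bonferroni_alpha` is a helper name introduced here, not an SPSS or library function.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha: the overall alpha divided by the number of comparisons."""
    return alpha / n_comparisons

alpha = 0.05
n_comparisons = 3  # e.g., all pair-wise comparisons among three groups
per_test_alpha = bonferroni_alpha(alpha, n_comparisons)

# Hypothetical p-values from three pair-wise comparisons.
p_values = [0.010, 0.020, 0.040]
decisions = [p < per_test_alpha for p in p_values]

print(per_test_alpha)
print(decisions)
```

Two of these p-values would count as significant at .05 on their own but fall short of the adjusted criterion, which is the price paid for holding the error rate across the whole set of comparisons.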

Post Hoc Tests (continued)

• Tukey HSD (default post hoc in SPSS)
  - Controls the experiment-wide error rate (EER) over a set of pair-wise comparisons (adjusts critical value for significance)
  - Specialized procedures are available for means based on unequal sample sizes
• Newman-Keuls
  - Less conservative than Tukey HSD
  - Means in a comparison set are rank ordered
  - Critical value for a pair-wise comparison varies with the number of “steps” between means in the ordered set
• Duncan

ANOVA – Two Factor Designs

• Additional factors require partitioning variance in different ways
• Nature of the partitioning will change depending on the type of design
  - Both factors manipulated between-subjects
  - Both factors manipulated within-subjects
  - Mixed designs
• Logic of the test remains the same: Isolate the systematic effect in the numerator of the F ratio

Partitioning Variance in the Between-Subjects Design

Partitioning Variance in the Within-Subjects Design

Partitioning in a Mixed Design

Fixed Effects & Random Effects

• Fixed Effects
  - All levels of a factor are in the design
      men & women
  - All relevant levels of a factor are in the design
      zero stress, moderate stress, high stress
• Random Effects
  - Levels of a factor represent a sample of the possible levels in the design
      Individual subjects as levels of a factor
      Unique stimulus materials (e.g., photos, sentences, etc. used to create a manipulation)
  - Analysis problems when 2 or more factors are random effects

Interaction Effects

• Additive Model (no interaction)
    Score = µ + TXa + TXb + error
• Non-Additive Model (interaction)
    Score = µ + TXa + TXb + TXab + error
• Meaning of interactions: effect of a treatment varies depending on conditions created by manipulation of a second variable
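For a 2 x 2 design, the difference between the additive and non-additive models can be seen in the cell means: with no interaction, the effect of one factor is the same at every level of the other. The sketch below uses hypothetical cell means and a hypothetical helper, `interaction_contrast`, that computes the "difference of differences".

```python
def interaction_contrast(cell_means):
    """Effect of factor B at level a2 minus the effect of factor B at level a1.
    Zero means the cell means follow the additive model."""
    effect_b_at_a1 = cell_means[("a1", "b2")] - cell_means[("a1", "b1")]
    effect_b_at_a2 = cell_means[("a2", "b2")] - cell_means[("a2", "b1")]
    return effect_b_at_a2 - effect_b_at_a1

# Hypothetical additive pattern: B adds 4 points at both levels of A.
additive = {
    ("a1", "b1"): 10.0, ("a1", "b2"): 14.0,
    ("a2", "b1"): 12.0, ("a2", "b2"): 16.0,
}

# Hypothetical crossover pattern: the effect of B reverses across levels of A.
crossover = {
    ("a1", "b1"): 10.0, ("a1", "b2"): 16.0,
    ("a2", "b1"): 16.0, ("a2", "b2"): 10.0,
}

print(interaction_contrast(additive))
print(interaction_contrast(crossover))
```

A contrast of zero corresponds to parallel lines in a plot of the cell means; any nonzero value means the lines are not parallel, although whether it is reliable still depends on the error variance.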

2 x 3 Factorial with No Interaction

[Figure: Mean Words Recalled by Emotional Value of Words (sad, neutral, happy), plotted separately for Men and Women.]

2 x 2 Factorial with Significant Interaction

[Figure: Mean Recall by Mood at Study (sad, happy), plotted separately for sad at test and happy at test.]

Other Forms of Interactions

[Figure: four panels showing other patterns of interaction; axes unlabeled.]

Factors that Affect Statistical Power

• Selection of alpha level
  - Comparing alpha = .05 to alpha = .01
  - Comparing one-tailed tests to two-tailed tests
• Size of the treatment effect produced
• Variability of the sampling distribution for the statistical test
  - Effects of changes in sample size
  - Effects of changes in control of extraneous variables
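Each of these factors can be explored with a small power calculation. The sketch below is a simplification: it assumes a one-sample z test with a known population standard deviation, and for the one-tailed case it assumes a positive effect tested in the predicted direction. The function name `z_test_power` and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05, two_tailed=True):
    """Approximate power of a one-sample z test.
    effect: true difference from the null value (raw units)
    sd: population standard deviation (assumed known)
    n: sample size
    """
    std_normal = NormalDist()
    standard_error = sd / math.sqrt(n)  # variability of the sampling distribution
    shift = effect / standard_error     # distance between H0 and H1 in SE units
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        upper = 1 - std_normal.cdf(z_crit - shift)
        lower = std_normal.cdf(-z_crit - shift)
        return upper + lower
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - shift)

baseline = z_test_power(effect=5, sd=15, n=36)
print("baseline:", baseline)
print("alpha = .01:", z_test_power(5, 15, 36, alpha=0.01))
print("one-tailed:", z_test_power(5, 15, 36, two_tailed=False))
print("larger effect:", z_test_power(8, 15, 36))
print("larger sample:", z_test_power(5, 15, 100))
print("less error variability:", z_test_power(5, 10, 36))
```

Relative to the baseline, a stricter alpha lowers power, while a one-tailed test, a larger effect, a larger sample, or tighter control of extraneous variability each raises it.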

Power Changes with Size of Effect Created

Power Changes with Variability in the Sampling Distribution

Measuring Effect Sizes (Abelson)

• Raw effect size
  - Differences created measured in the same units of measurement used for the dependent measure
  - Difference in IQ, number of words recalled, etc.
  - Difficulty of knowing what counts as a “large” effect
• Standardized effect size
  - Effect sizes converted to z-scores
  - Creates equivalent scales for comparing different dependent measures
  - Useful for meta-analysis
  - Cohen’s d, variance explained (r²), omega squared
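The contrast between a raw and a standardized effect size can be sketched with Cohen's d. The example assumes two independent groups and the pooled standard deviation as the standardizer, which is one common convention; the scores and the helper name `cohens_d` are hypothetical.

```python
import math

def cohens_d(treatment, control):
    """Mean difference divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    mean_t = sum(treatment) / n_t
    mean_c = sum(control) / n_c
    ss_t = sum((x - mean_t) ** 2 for x in treatment)
    ss_c = sum((x - mean_c) ** 2 for x in control)
    pooled_sd = math.sqrt((ss_t + ss_c) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical number of words recalled in each group.
treatment = [7, 6, 8, 5, 9]
control = [3, 5, 4, 6, 2]

raw_effect = sum(treatment) / len(treatment) - sum(control) / len(control)
d = cohens_d(treatment, control)

print("raw effect (words recalled):", raw_effect)
print("Cohen's d (standard deviation units):", d)
```

The raw effect is expressed in words recalled, while d expresses the same difference in standard deviation units, which is what allows comparison across different dependent measures.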

Interpreting Statistical Significance

• Large sample sizes can provide powerful tests in which small numeric differences are statistically reliable
• How should these be interpreted?
• Numerically small effects can be impressive
  - Created with minimal manipulation of the IV
  - Created when the DV is difficult to change at all
  - Effects of any size have an important theoretical interpretation

Statistical Significance vs Practical Significance

• What are the practical implications of an effect?
  - An educational program reliably produces a 2 point increase in the IQ of 10-year-old children
  - A medical intervention reduces the risk of heart attack from 2 per 10,000 (p(heart attack) = .0002) by half (to p(heart attack) = .0001)
      • In a population of 10 million people, this will translate into a change from 2,000 heart attacks to 1,000 heart attacks

Data Transformations

• Linear transformations
  - Add a constant value
  - Shifts location of distribution without changing the variance of the distribution
  - Can make data analysis easier (1K vs 1000)
• Other transformations change both the mean and the variance of the distribution
  - Transforming data can produce data that meet the distribution assumptions for statistical tests
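The effect of adding a constant can be verified directly. The sketch below uses hypothetical reaction times; the rescaling step (milliseconds to seconds) is added here for comparison and is not claimed by the slides.

```python
import statistics

# Hypothetical reaction times in milliseconds.
reaction_times_ms = [490, 550, 617, 710, 725]

# Adding a constant shifts the mean but leaves the variance unchanged.
shifted = [x + 100 for x in reaction_times_ms]

# Rescaling (ms to seconds) changes the mean by the scale factor
# and the variance by the square of the scale factor.
rescaled = [x / 1000 for x in reaction_times_ms]

print(statistics.mean(reaction_times_ms), statistics.variance(reaction_times_ms))
print(statistics.mean(shifted), statistics.variance(shifted))
print(statistics.mean(rescaled), statistics.variance(rescaled))
```

Neither change alters the shape of the distribution or the outcome of a test on the data, which is why such transformations are safe conveniences.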

Non-Linear Data Transformations

• Square Root transformation
  - Used when larger means correlate with larger variances (common with count data)
  - Makes variances more homogeneous
  - Reduces skew of distributions
• Arcsine transformation
  - Used when data consist of proportions (e.g., percent correct data)

Non-Linear Data Transformations (continued)

• Log transformation
  - Reduces skew in distributions
  - Common with sensory data: gives manageable ranges for plotting results
• Reciprocal transformation
  - X′ = 1/X
  - Reciprocal data are less variable than raw data
  - Transforms alter the interpretation of the measure
      Reciprocal of a time measure is a speed measure
  - New measure is sometimes easier to interpret
      GSR gets smaller as stress increases
      EDR (1/GSR) increases with increased stress
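The four non-linear transformations named in this section can each be written in a line of Python. This is a minimal sketch with hypothetical data; the arcsine transformation is written in its common arcsin(sqrt(p)) form, and the base-10 logarithm is an arbitrary choice, since the slides specify neither.

```python
import math

# Square root: often applied to count data.
counts = [0, 1, 4, 9, 16]
sqrt_counts = [math.sqrt(c) for c in counts]

# Arcsine: applied to proportions (assumed form: arcsin of the square root).
proportions = [0.10, 0.50, 0.90]
arcsine_values = [math.asin(math.sqrt(p)) for p in proportions]

# Log: compresses large values and reduces positive skew.
intensities = [10, 100, 1000]
log_values = [math.log10(x) for x in intensities]

# Reciprocal: turns a time per item into a speed (items per second).
times_sec = [0.5, 1.0, 2.0]
speeds = [1 / t for t in times_sec]

print(sqrt_counts)
print(arcsine_values)
print(log_values)
print(speeds)
```

Note that the reciprocal reverses the order of the scores: the longest time becomes the lowest speed, which is the change in interpretation the slide points out.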

Non-Parametric Tests

• Do not make assumptions about the underlying distributions to compute probabilities
• Useful when data do not meet assumptions for normality, etc.
• Can be more powerful tests than parametric tests when assumptions have been violated
• Non-parametric analogs exist for nearly every parametric test
