The Not-Even-Remotely Close to Being a Complete Guide to SPSS / PASW Syntax. (For SPSS / PASW v.18+)

Dr. Bryan R. Burnham

Department of Psychology

University of Scranton

Table of Contents

1. What is SPSS / PASW?
    1.1 Where is it Available?
    1.2 Finding and Opening PASW
    1.3 Three Types of PASW Files
    1.4 Data Files (Data View)
    1.5 Defining and Adjusting Variables in Data Files (Variable View)
    1.6 Basic Structure of Output Files
    1.7 Data Files Associated with this Guide

2. The Syntax Editor
    2.1 Why Syntax? Because it's Better!
    2.2 Some Syntax Basics...It's Easy?
    2.3 Opening .sav files with Syntax
    2.4 Opening Microsoft Excel (.xls) Files with Syntax
    2.5 Opening Text (.txt) files with Syntax

3. Syntax for Basic Statistical Needs
    3.1 Variable Labels
    3.2 Value Labels
    3.3 Frequencies
    3.4 Descriptive Statistics
    3.5 SORT CASES
    3.6 SPLIT FILE

4. Correlation & Regression
    4.1 Pearson Correlations (Bivariate)
    4.2 Pearson Correlations (Partial)
    4.3 Univariate Regression (one regressor)

5. t-Tests
    5.1 One-Sample t-test
    5.2 Independent Groups t-Tests
    5.3 Correlated Samples (Paired Samples) t-Tests

6. Analysis of Variance
    6.1 Oneway Analysis of Variance (via GLM)
    6.2 Between Subjects Factorial ANOVA (via GLM)
    6.3 Repeated Measures ANOVA (via GLM)

7. Chi Square
    7.1 Cross-Tabulation Procedure (Factorial Chi-Square)
    7.2 Oneway Chi-Square
    7.3 Goodness of Fit Test

1. What is SPSS / PASW?

The Statistical Package for the Social Sciences (SPSS) is a software tool for analyzing sets of data. I have absolutely no idea what the acronym PASW stands for. I wish it were PAWS, because it would be easier to say. Anyway, PASW is just the newest version of SPSS (currently in version 18). SPSS/PASW looks and operates much like a spreadsheet program, such as Microsoft Excel, and its data files look a lot like Excel spreadsheets. Unlike Excel, PASW/SPSS is designed specifically for the statistical analysis of data.

As part of your course requirements, you will gain a basic understanding of how to use PASW. Indeed, most statistical analyses are performed with PASW or some other software. So why do we teach you this stuff by hand instead of just using PASW? Simply put, without conceptual knowledge of where the results of a PASW analysis come from, they're just a bunch of numbers in a computer file! Thus, we teach you what the variance of a set of data is and where it comes from by showing you how it's calculated. This way, variance should make sense when using PASW. If my logic doesn't make sense, drop out of the course and preferably out of college. :-)

1.1 Where is it Available?

At the University of Scranton, SPSS / PASW is available in the Weinberg Memorial Library (WML) on the 1st floor and in group study rooms, Brennan Hall (BRN) rooms 102 and 201, McGurrin Hall (MGH) room 110, Hyland (HYL) Café and room 102 (where statistics classes are held), and Alumni Memorial Hall (AMH) rooms 214 and 202. It may be available in the PT/OT lab in the basement of Leahy Hall and in the Nursing Lab and the Stout Lab in McGurrin Hall. [1]

1.2 Finding and Opening PASW

From the Start Menu, choose All Programs → SPSS Inc. → PASW Statistics 18 → PASW Statistics 18 (the red icon with the gray sigma symbol).

1.3 Three Types of PASW Files

There are three main files associated with PASW (and SPSS):

1. Data Files contain data to be analyzed, and have the extension '.sav'. Data files look a lot like a Microsoft Excel spreadsheet, with columns, rows and cells. Columns represent variables, with an abbreviated name of the variable at the top of each column. Rows represent cases, or research subjects. That is, each row/case could be the data associated with an individual, or a sample. The cells and values within the file are the data. (See Figures 1 & 2)

2. Syntax Files are used to request that PASW conduct an analysis, and have the extension '.sps'. Hence, syntax files are command files that tell PASW what to do with data. I admit that most analyses and procedures in PASW can be obtained through the pull-down menus in the data file; but, syntax is better for reasons given later. Syntax files are edited much like plain text files: you type in text-based commands for PASW to interpret and, hopefully, PASW runs your requested analyses on the data. (See Figure 5)

3. Output Files are generated in response to PASW running an analysis on a set of data, and have the extension '.spv' (in older versions of SPSS the extension is '.spo'). Importantly, if something was written incorrectly in the syntax file, PASW will produce a "Warning", usually with no additional output. Most of an output file is in table format, with the exception of graphs and charts. (See Figure 3)

[1] Thanks to Dr. Barry Kuhle (University of Scranton) for compiling this list.

1.4 Data Files (Data View)

There are two different 'views' of a PASW data file:

1. Data View, where your data can be entered by hand, and where you can view the actual values of the working data file.

2. Variable View, where you can define parameters of your variables, such as how many decimals are showing, whether the variable is a string, a date, or a numeric variable, etc.

The figure below is a screen shot of the Data View in a blank PASW data file:

You can toggle between the Data View and the Variable View by clicking on the appropriate tab at the bottom left-hand corner in any data file. You can also toggle back and forth between the Data View and the Variable View by double-clicking on any variable name. This amounts to double-clicking a column heading in Data View and double-clicking any row in Variable View. I will assume that you can figure out how to insert values into a data file, so I will not cover that here.

1.5 Defining and Adjusting Variables in Data Files (Variable View)

If necessary, it is good to define the parameters of your variables first, so that when you run an analysis the output of any tables and graphs will be complete and understandable. Below is a screen shot of the Variable View in a blank PASW data file:

Below, I've listed each of the parameters that can be seen at the top of each column in Variable View, with a brief description of what each parameter can do:

NAME Refers to the variable names that you enter; each name must begin with a letter.

TYPE Indicates whether a variable is numeric, a string, a date, etc. Clicking TYPE opens a dialogue box, in which you can specify the type of data contained in a variable.

WIDTH The number of digits or characters allowed for a value of a variable.

DECIMAL The number of decimal places displayed for numeric variables.

LABEL Allows you to assign a longer, more descriptive label to an abbreviated variable name in the data file. That is, you could 'name' a variable STAI, but 'label' the variable 'State Trait Anxiety Inventory at time 1'. The abbreviated name appears under NAME, and the longer LABEL will appear on any tables or graphs in the output.

VALUES Allows you to assign labels to the dummy-codes of a variable. For example, if your data file contains the variable 'Sex', a 0 could refer to males and 1 could refer to females. But, 0's and 1's are arbitrary unless they are defined. This packet will show you how to assign labels using syntax.

MISSING Specifies which values PASW should treat as missing data.

COLUMNS Refers to how wide (in characters) the variable's column appears in the Data View. Normally this is set to eight.

ALIGN Allows you to have the values in each column left-justified, right-justified, or centered.

MEASURE Relevant to numeric variables. Indicates the measurement scale of a variable. It allows three levels: nominal, ordinal, and scale (scale covers both interval and ratio data).

Most of these parameters are irrelevant for the time being. Later, you'll learn how to assign longer, more descriptive labels to a variable name, as well as dummy-code a variable.

1.6 Basic Structure of Output Files

After you have opened a data file, written syntax commands to request an analysis, and then run that analysis, PASW will produce an output file, like that below:

The output file is what we are trying to get PASW to provide us. It presents, in table or graph form, the descriptive and/or inferential statistics requested. As you can see in Figure 3, the output contains a single table with a listing of several descriptive statistics (N, Minimum, Maximum, Mean, Standard Deviation), for two different variables (SAT_CR and SAT_M). Don't worry about the variable names right now; trust me, you'll know what they are in a bit. Later, when you have PASW run an analysis on a set of data, I will not include whole screen shots of the output. Rather, I'll simply paste the output tables into the document. (Gotta conserve megabytes!)

1.7 Data Files Associated with this Guide

The data file that will be used throughout most of this packet is 'GRE Therapy Data File.sav', and is available on my statistics course website (http://sites.google.com/site/psyc210stats/), on the course files page. There are actually three data files with the same name but different formats ('GRE Therapy Data File.sav'; 'GRE Therapy Data File.xls'; and 'GRE Therapy Data File.txt'). I'll show you how to open each of these types of data files using syntax, so download each file. Here's a screen shot of a portion of the data file:

The file contains a set of data from a fictitious study that examined the influence of a new Study Drug and different Types of Tutoring on student scores on the Graduate Record Examination (GRE). The GREs are a set of standardized examinations, like the Scholastic Aptitude Tests (SATs). The GREs are required by most graduate school programs to be reported by applicants. The GREs contain three sections, like the SATs: (1) quantitative reasoning, (2) verbal reasoning, and (3) analytical writing.

In this fictitious study, researchers investigated whether two independent variables (Study Drug and Type of Tutoring) improved scores on each section of the GREs. For the independent variable Study Drug, subjects were given nothing (control group), a placebo (placebo group), or one of two different dosages of the drug (100 mg/day or 200 mg/day). For the independent variable Type of Tutoring, subjects were not tutored (control group), were tutored with other students in small groups (Group Tutoring), or were tutored one-on-one (Individual Tutoring).

Subjects were tested at the beginning of the study during a pretest phase (before the independent variables were administered), and were tested several months later during a posttest phase (after the independent variables should have an influence). In addition to scores on each of the three sections of the GREs, there are a number of other variables included in the data set. Each subject's SAT scores were collected, their heights and weights were measured, and each subject was measured on their level of Trait Anxiety (enduring level of anxiety) and State Anxiety (temporary, situational anxiety). Trait and State anxieties were assessed using the State Trait Anxiety Inventory (STAI), during both the pretest and posttest phase. The table below lists the abbreviated NAME for each variable, along with a brief description of each variable.

Variable NAME   Description of Variable

ID              Identification number assigned to each subject.
Sex             Each subject's biological sex; dummy-coded, where 1 = male and 2 = female.
Coll_Class      Each subject's current year in college; dummy-coded, where 1 = Freshmen, 2 = Sophomore, 3 = Junior, and 4 = Senior.
Coll_Maj        Each subject's primary major; dummy-coded, where 1 = Psychology, 2 = History, 3 = Biology, 4 = Communications, 5 = English, and 6 = Mathematics.
Height_cm       Each subject's height, measured to the nearest 0.1 cm.
Weight_kg       Each subject's weight, measured to the nearest 0.1 kg.
SAT_CR          Each subject's score on the Critical Reading (CR) section of the SATs.
SAT_M           Each subject's score on the Mathematics (M) section of the SATs.
SAT_V           Each subject's score on the Verbal (V) section of the SATs.
SAT_Tot         Each subject's summed SAT score (SAT_CR + SAT_M + SAT_V).
GPA             Each subject's current cumulative GPA.
Drug_Group      Level of the independent variable Drug Group, into which the subject was assigned; dummy-coded, where 1 = Control Group (no drug given), 2 = Placebo Group, 3 = 100-mg of Drug/Day, and 4 = 200-mg of Drug/Day.
Tutor_Group     Level of the independent variable Tutor Group, into which the subject was assigned; dummy-coded, where 1 = Control Group (no tutoring), 2 = Group Tutoring, 3 = Individual Tutoring.
Pre_STAIt       Each subject's trait anxiety (t) during the pretest phase; measured using the State Trait Anxiety Inventory (STAI).
Pre_STAIs       Each subject's state anxiety (s) during the pretest phase; measured using the State Trait Anxiety Inventory (STAI).
Pre_GREv        Each subject's score on the Verbal Reasoning (v) section of the GREs, during the pretest phase.
Pre_GREq        Each subject's score on the Quantitative Reasoning (q) section of the GREs, during the pretest phase.
Pre_GREa        Each subject's score on the Analytical Writing (a) section of the GREs, during the pretest phase.
Post_STAIt      Each subject's trait anxiety (t) during the posttest phase; measured using the State Trait Anxiety Inventory (STAI).
Post_STAIs      Each subject's state anxiety (s) during the posttest phase; measured using the State Trait Anxiety Inventory (STAI).
Post_GREv       Each subject's score on the Verbal Reasoning (v) section of the GREs, during the posttest phase.
Post_GREq       Each subject's score on the Quantitative Reasoning (q) section of the GREs, during the posttest phase.
Post_GREa       Each subject's score on the Analytical Writing (a) section of the GREs, during the posttest phase.

Table 1: Variable NAMES and brief descriptions.

2. The Syntax Editor

The Syntax Editor looks and works like a text editor (TextPad, Notepad, WordPad). You type in what you want PASW to do, in the correct sequence and using PASW's language, and PASW does what you asked it to do (hopefully). If you have ever done a little computer programming (C, C++, Matlab, etc.), then this is just like writing code; albeit much simpler code! PASW Syntax files have the file extension *.sps. Here's an example of what the Syntax Editor looks like:

Note, if you use SPSS, then you won't have the various colors and the numbers for each line. The use of different colors for different syntax statements in PASW is a huge improvement over SPSS.

From here on out, I won't be pasting in screen shots of the syntax that we'll be using. Rather, I'll just be writing the syntax that you need to include in order to run a specific analysis or procedure. For example, rather than including a screen shot like Figure 5, I'll type out the syntax (with the appropriate colors and line numbers). Note that you do not have to type out the line numbers. Thus, the syntax in Figure 5 will appear as:

1 GET DATA

2 /TYPE=XLS

3 /FILE='C:\Documents and Settings\burnhamb2\My Documents\Class Materials\PSYC 210'+

4 'Statistics\SPSS Assignments\SPSS-PASW Packet\GRE Therapy Data File.xls'

5 /SHEET=name 'Sheet1'

6 /CELLRANGE=full

7 /READNAMES=on

8 /ASSUMEDSTRWIDTH=32767 .

9 DATASET NAME DataSet2 WINDOW=FRONT

Don't worry about what all of this means right now, it will make sense in a little while. :-)

2.1 Why Syntax? Because it’s Better!

There are two methods that can be used to have PASW do stuff: (1) using pull-down menus, (2) telling PASW what to do by writing syntax commands. (I’ll refer to these as the wrong-way and right-way, respectively.)

Is the syntax-method easier? No, but it’s much more useful, for a variety of reasons. First, you can do more within one syntax file and in a shorter time than with the pull-down menu method. Specifically, you can plan out all of the stuff you need PASW to do, write the appropriate syntax for everything, and then run it all at once. In contrast, with pull-down menus you have to do one thing at a time. Second, you can do more with syntax. There are certain procedures that are simply not possible with the pull-down menus, but that are possible with syntax. Third (and certainly not finally), if you go to grad school, especially in the sciences, you’ll need to learn programming. I’m giving you a head start. You’re welcome!

2.2 Some Syntax Basics...It’s Easy?

PASW syntax is not case-sensitive: commands, sub-commands, and variable names can be typed in uppercase, lowercase, or a mix. PASW does preserve the capitalization a variable name was given when it was defined (that is how the name is displayed in the data file and in output), so it is still good practice to type variable names exactly as they appear in the data file. Spelling is another matter: if a variable's name is misspelled, the syntax will not run.

I suggest writing commands and sub-commands in CAPS to help distinguish between commands and variables. This will allow you to parse the syntax quickly, especially if you write variable names in mixed case.

Each command must end with a period (.), and commands and sub-commands are usually entered on separate lines. Not every line has to end with a period, just the overall procedure. That is, if you look at the syntax in Figure 5, there is a period only on Line 8. This is because lines 1-8 are, collectively, asking PASW to retrieve a data file; hence, these eight lines encompass one whole procedure.

Sub-commands within a command procedure must start with a forward slash (/), not a backslash (\). PASW will not know what to do with a sub-command if the forward slash is missing. For example, if you look at Figure 5, you can see a forward slash beginning lines 2, 3, 5, 6, 7, and 8 (there is no slash in line 4, because line 4 is a continuation of line 3).

It is good to enter 'EXECUTE .' at the end of a command procedure. Some commands will not run without this terminator command. Unfortunately, I have never figured out which commands will and will not run with and without this ending statement.
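As a minimal sketch of where EXECUTE comes into play: data transformations such as COMPUTE are held as pending until PASW next reads the data, and EXECUTE forces that read. The variable Height_m below is hypothetical (it is not in the data file for this guide) and is used only for illustration:

```spss
* Height_m is a hypothetical new variable, computed from Height_cm.
COMPUTE Height_m = Height_cm / 100.
* EXECUTE makes PASW read the data and carry out the pending COMPUTE.
EXECUTE.
```

Statistical procedures such as FREQUENCIES read the data themselves, so any pending transformations are carried out when they run.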

Once your syntax is written, you need to run it in order to generate an output file. Highlight the syntax that you want to run and hit Ctrl+R to run the procedures. Or, instead of hitting Ctrl+R, click the Run Button on the toolbar. The Run Button is the green rightward-pointing arrow in the middle.

2.3 Opening .sav files with Syntax

I admit that if you have a PASW data file already created, you can really just locate that file and double click to open. Nonetheless, here's how to open a PASW data file using syntax (notes follow):

1 GET

2 FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.sav'.

3 DATASET NAME DataSet1 WINDOW=FRONT.

The file directory address in line 2 will differ, depending on where the file is placed on your hard drive. In this case, I placed the file on the Desktop for easy access. Note that the directory address for the file must be enclosed in quotes (single quotes are used here). DATASET NAME on line 3 should just be set to DataSet1 as listed.

An output file will be generated when you run any syntax. When opening a data set, the output file will contain only the commands that led to the opening of the file. You can delete that output file.

2.4 Opening Microsoft Excel (.xls) Files with Syntax

Below is an example of the syntax needed to open a data file saved as a Microsoft Excel spreadsheet:

1 GET DATA

2 /TYPE=XLS

3 /FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.xls'

4 /SHEET=name 'Sheet1'

5 /CELLRANGE=full

6 /READNAMES=on

7 /ASSUMEDSTRWIDTH=32767.

8 DATASET NAME DataSet1 WINDOW=FRONT.

Notice that Line 1 here (GET DATA) resembles Line 1 for opening a PASW data file (GET). You can think of this statement as the 'major command' that you are asking PASW to perform; the lines that follow it, up to the period, are sub-commands.

When opening an Excel spreadsheet, special care must be taken that you are asking PASW to open the correct sheet within the workbook (usually Sheet1), that you are asking for the correct cells in the worksheet, and that you have asked PASW to read in any variable names in the spreadsheet.

The sub-command on Line 2 (/TYPE) lists XLS, which is the file extension for Microsoft Excel files. On Line 4 (/SHEET=name), the name between the single quotes ('Sheet1') is the name of the worksheet within the Excel workbook where the data are located. If the data sheet in the workbook has a different name or number, this needs to be changed here. Line 5 (/CELLRANGE=full) refers to which cells within the named worksheet are to be imported into PASW. If all of the cells with data are to be imported, just use 'full', but if only some of the cells are to be imported, this should be indicated here (e.g., A1:B200). Line 6 (/READNAMES=on) tells PASW that the first row of the Excel sheet contains the names of the variables, and that these should be treated as variable names. If the Excel sheet does not include variable names, then 'off' should be substituted for 'on'.
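To illustrate these options, here is a sketch that imports only part of a worksheet. The range A1:B200 is just the example range mentioned above and has no special meaning in the data file for this guide; with READNAMES=ON, the first row of the range is assumed to hold the variable names:

```spss
* Import only cells A1 through B200 of Sheet1.
GET DATA
  /TYPE=XLS
  /FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.xls'
  /SHEET=NAME 'Sheet1'
  /CELLRANGE=RANGE 'A1:B200'
  /READNAMES=ON.
```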

2.5 Opening Text (.txt) files with Syntax

Below is an example of the syntax necessary to open a data file that is saved as a text file:

1 GET DATA
2 /TYPE=TXT
3 /FILE="C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.txt"
4 /DELCASE=LINE
5 /DELIMITERS="\t"
6 /ARRANGEMENT=DELIMITED
7 /FIRSTCASE=2
8 /IMPORTCASE=ALL
9 /VARIABLES=
10 ID F3.0
11 Sex F1.0
12 Coll_Class F1.0
13 Coll_Maj F1.0
14 Height_cm F5.1
15 Weight_kg F5.1
16 SAT_CR F3.0
17 SAT_M F3.0
18 SAT_V F3.0
19 SAT_Tot F4.0
20 GPA F5.3
21 Drug_Group F1.0
22 Tutor_Group F1.0
23 Pre_STAIt F2.0
24 Pre_STAIs F2.0
25 Pre_GREv F3.0
26 Pre_GREq F3.0
27 Pre_GREa F3.1
28 Post_STAIt F2.0
29 Post_STAIs F2.0
30 Post_GREv F3.0
31 Post_GREq F3.0
32 Post_GREa F3.1.
33 CACHE.
34 EXECUTE.
35 DATASET NAME DataSet4 WINDOW=FRONT.

First thing, I have no idea why the lines are not colored; I was surprised myself. This set of syntax is a bit longer, mainly because you need to tell PASW to read in each variable name from the text file (Lines 10 – 32). Like the PASW syntax for importing data from an Excel spreadsheet, you need to be careful to include certain sub-commands.

On Line 2 (/TYPE=TXT), the TXT is the file extension for text files.

On Line 4 (/DELCASE=LINE), this is telling PASW that each new case (i.e., each subject) is a different line (row) within the text file.

On Line 5 (/DELIMITERS="\t"), 'delimiters' define the boundaries between adjacent entries, that is, data points in a data file. The \t is telling PASW that the boundaries are defined by TABS.

On Line 7 (/FIRSTCASE=2), this is telling PASW that the data in the text file actually begin on line 2; that is, the first case (subject) is on line 2 of the data file.

On Line 8 (/IMPORTCASE=ALL), this is telling PASW to import all of the data. This can be changed if you only want to import some of the data file.

Lines 10 – 32 list the name of each variable in the data set, along with its format. These variable names actually appear on line 1 of the text file.

Once you have opened a data set, you should save it as a PASW data file to be used in the future. Then, you can just double click it open.
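A minimal sketch of saving the imported data with the SAVE command. The file name below is an assumption made for illustration; change the path to wherever you want the file stored:

```spss
* Save the active data set as a PASW data file (the file name is hypothetical).
SAVE OUTFILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data From Text.sav'.
```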

Throughout the remainder of this packet, when I am providing syntax examples or the output of a procedure, I am not going to provide too much commentary. I'd rather you explore the output and the syntax on your own to get a feel for everything.

3. Syntax for Basic Statistical Needs

3.1 Variable Labels

In the data file, the NAME given to each variable is a short acronym. For example, 'ID' stands for 'Identification Number', 'Coll_Maj' stands for 'College Major', 'SAT_CR' stands for 'Critical Reading Score on the SATs', etc. So that you do not have to memorize each of these acronyms, it's a good idea to assign a LABEL to each variable. These VARIABLE LABELS do not show up in the data file, but will show up in an output file. Here is how to use the VARIABLE LABELS syntax to assign the label 'SAT Critical Writing Score' to SAT_CR (remember, you do not type the number at the beginning):

1 VARIABLE LABEL SAT_CR 'SAT Critical Writing Score' .

All that you need to do is to list the variable NAME (SAT_CR) followed by the LABEL you wish to assign (SAT Critical Writing Score). Be sure that the label is in single quotes. You can also assign labels to more than one variable at a time:

1 VARIABLE LABEL SAT_CR 'SAT Critical Writing Score' SAT_M 'SAT Math Score' .

3.2 Value Labels

For independent variables that have several levels/groups, it is best to dummy-code those groups in the data file. That is, in the data file, male subjects and female subjects will not be called 'male' and 'female'; rather, they will be assigned arbitrary numbers. In the data file for this packet, for the variable 'Sex', males are assigned 1 and females are assigned 2. The numbers can be anything, as long as all males have the same number, and all females have the same number.

The reason is that if you want to compare levels/groups of an independent variable, many PASW procedures require that the groups have numeric codes. The downside is that if you run an analysis that involves those groups/levels, only the arbitrary numbers will appear in the output. You'd have to memorize what the code 1 means for the variable Sex, versus what the code 1 means for another independent variable. But, you can assign LABELS to the dummy-code VALUES assigned to groups. These VALUE LABELS will not show in the data file, but do show in output. Here is an example of how to use the VALUE LABELS syntax to assign labels to the dummy-coded males and females for the variable Sex:

1 VALUE LABEL Sex 1 'Males' 2 'Females' .

If you want to assign labels to more than one independent variable at a time, it is best to use several individual commands:

1 VALUE LABEL Sex 1 'Males' 2 'Females' .
2 VALUE LABEL Coll_Class 1 'Freshmen' 2 'Sophomore' 3 'Junior' 4 'Senior' .
3 VALUE LABEL Coll_Maj 1 'Psychology' 2 'History' 3 'Biology' 4 'Communications' 5 'English' 6 'Mathematics' .
4 VALUE LABEL Drug_Group 1 'Control Group (no drug)' 2 'Placebo Group' 3 '100 mg/day Group' 4 '200 mg/day Group' .
5 VALUE LABEL Tutor_Group 1 'Control Group (no tutoring)' 2 'Group Tutoring' 3 'Individual Tutoring' .

In the data file, I have assigned VALUE LABELS to each independent variable. Hence, when output is presented later in this packet, the groups will not have dummy-codes, they have the labels assigned from the syntax above.

3.3 Frequencies

The FREQUENCIES command is used to obtain a frequency table for a variable. The syntax below asks PASW to determine the frequency for each group within the variables Sex and Coll_Class. Note that the variable names have to be entered just as they appear at the top of the columns in the data file. Also, note that you can request frequencies for several variables at once. This is typical for most PASW commands: you can request a procedure for several variables simultaneously:

1 FREQUENCIES VARIABLES=Sex Coll_Class

2 /ORDER=ANALYSIS.

The syntax above provides the following output (comments were added by me):

Statistics

                 Sex    Coll_Class    Coll_Maj
N    Valid       240    240           240
     Missing     0      0             0

Comment: the N rows show how many cases (subjects) contribute to each of the three variables.

Frequency Table

Sex

                     Frequency    Percent    Valid Percent    Cumulative Percent
Valid    Males       109          45.4       45.4             45.4
         Females     131          54.6       54.6             100.0
         Total       240          100.0      100.0

Coll_Class

                     Frequency    Percent    Valid Percent    Cumulative Percent
Valid    Freshmen    57           23.8       23.8             23.8
         Sophomore   65           27.1       27.1             50.8
         Junior      63           26.3       26.3             77.1
         Senior      55           22.9       22.9             100.0
         Total       240          100.0      100.0

Comment: each group that contributes to each variable is listed to the left.

3.4 Descriptive Statistics

Although descriptive statistics can be requested as a sub-command within many PASW commands, there is a specific DESCRIPTIVES command. Like the FREQUENCIES command, you can request descriptive statistics for several variables at the same time. In the syntax below, I requested PASW to compute descriptive statistics on the variables Height_cm and Weight_kg:

1 DESCRIPTIVES VARIABLES=Height_cm Weight_kg

2 /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS

3 SKEWNESS.

You can request a variety of descriptive statistics. On Lines 2 and 3, I listed each descriptive statistic that can be requested; most should be self-explanatory, except for 'SEMEAN', which stands for standard error of the mean, and KURTOSIS and SKEWNESS, which refer to the peakedness of a distribution and the skewness of a distribution, respectively. In the output that follows, I did not request the KURTOSIS and the SKEWNESS statistics:

Descriptive Statistics

                      N      Range    Minimum    Maximum    Sum        Mean       Std. Error of Mean    Std. Deviation    Variance
Height_cm             240    49.3     142.3      191.6      40559.5    168.998    .5922                 9.1745            84.171
Weight_kg             240    84.7     25.6       110.3      16248.1    67.700     .9066                 14.0451           197.266
Valid N (listwise)    240
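For reference, a request that leaves out KURTOSIS and SKEWNESS, which is what the output above reflects, might look like the following sketch (the rest of the statistics list is unchanged from the syntax above):

```spss
* Same request as above, without the kurtosis and skewness statistics.
DESCRIPTIVES VARIABLES=Height_cm Weight_kg
  /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN.
```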

3.5 SORT CASES

If you want to sort all of the cases in the data file in ascending or descending order, based on a certain variable, the following SORT CASES command is used. The syntax below asks PASW to arrange the data file in ascending order (A) based on the variable Coll_Class. In the data file, freshmen will appear first, then sophomores, followed by juniors, and finally seniors. If you want to sort in descending order, use (D) in place of (A). (There is no output for this syntax command.)

1 SORT CASES BY Coll_Class (A).
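As a sketch of the descending option, and of sorting on more than one variable at a time, the following puts seniors first and, within each college class, orders subjects from lowest to highest GPA. Using GPA as the second sort variable is an arbitrary choice for illustration:

```spss
* Sort by Coll_Class in descending order, then by GPA in ascending order within each class.
SORT CASES BY Coll_Class (D) GPA (A).
```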

Each requested variable is listed in a different column.

3.6 SPLIT FILE

In Section 3.4 above, where PASW was asked to calculate descriptive statistics, each statistic was based on the n = 240 subjects in the data file. There is nothing wrong with this, but what if you wanted to look at the means and descriptive statistics for different groups? For example, you may want to look at students' mean weights and mean heights for each college class. But, the output in Section 3.4 includes data combined from across all four college classes.

Luckily, PASW has a SPLIT FILE command that asks PASW to calculate descriptive statistics for different groups within some independent variable. For example, say you wanted to examine the descriptive statistics by college class. First, you need to use the following syntax to 'split' the output file into different groups:

1 SORT CASES BY Coll_Class.
2 SPLIT FILE SEPARATE BY Coll_Class.

Comment: the variable by which you want the output 'split' into different groups (here, Coll_Class) is listed after BY.

Next, run the same DESCRIPTIVES syntax in Section 3.4:

1 DESCRIPTIVES VARIABLES=Height_cm Weight_kg
2 /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS
3 SKEWNESS.

You will get the following output, which is the descriptive statistics performed on each group within the variable Coll_Class:

Coll_Class = Freshmen

Descriptive Statistics

                      N     Range    Minimum    Maximum    Sum        Mean       Std. Error of Mean    Std. Deviation    Variance
Height_cm             57    31.2     153.3      184.5      9638.8     169.102    1.1159                8.4251            70.982
Weight_kg             57    71.9     38.4       110.3      3847.0     67.491     1.9307                14.5762           212.467
Valid N (listwise)    57

Coll_Class = Sophomore

Descriptive Statistics

                      N     Range    Minimum    Maximum    Sum        Mean       Std. Error of Mean    Std. Deviation    Variance
Height_cm             65    43.8     147.8      191.6      10923.1    168.048    1.2866                10.3732           107.604
Weight_kg             65    64.6     25.6       90.2       4244.8     65.305     1.5937                12.8486           165.087
Valid N (listwise)    65

Coll_Class = Junior

Descriptive Statistics

                      N     Range    Minimum    Maximum    Sum        Mean       Std. Error of Mean    Std. Deviation    Variance
Height_cm             63    46.3     142.3      188.6      10659.4    169.197    1.0615                8.4254            70.987
Weight_kg             63    63.3     37.3       100.6      4349.2     69.035     1.7392                13.8046           190.568
Valid N (listwise)    63

Coll_Class = Senior

Descriptive Statistics

                      N     Range    Minimum    Maximum    Sum        Mean       Std. Error of Mean    Std. Deviation    Variance
Height_cm             55    34.7     154.0      188.7      9338.2     169.785    1.2657                9.3869            88.114
Weight_kg             55    68.1     35.8       103.9      3807.1     69.220     2.0311                15.0633           226.904
Valid N (listwise)    55

When you're done using the SPLIT FILE command, don't forget to turn it off, or else all of your subsequent output will be separated into different groups:

1 SPLIT FILE OFF.
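The steps in this section can be put together into one short run. The sketch below is only an illustration, assuming the same variables used above (Coll_Class, Height_cm, Weight_kg). It swaps in the LAYERED keyword, which asks PASW to stack the groups inside a single table instead of printing one table per group as SEPARATE does, and it requests a shorter list of statistics to keep the table small:

```spss
* Cases must be sorted by the grouping variable before the file is split.
SORT CASES BY Coll_Class.
* LAYERED stacks the groups in one table, whereas SEPARATE prints one table per group.
SPLIT FILE LAYERED BY Coll_Class.
DESCRIPTIVES VARIABLES=Height_cm Weight_kg
  /STATISTICS=MEAN STDDEV MIN MAX.
* Switch splitting off so that later procedures use the whole file again.
SPLIT FILE OFF.
```

Whichever keyword you use, the final SPLIT FILE OFF line is what returns PASW to analyzing all subjects together.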

4. Correlation & Regression

4.1 Pearson Correlations (Bivariate)

PASW can measure the statistical association between two variables in a variety of ways (e.g., Pearson correlation, Spearman correlation, Chi-Square, gamma coefficients). For the data in our file, we'll be dealing with how PASW can calculate the Pearson correlation between two variables.

The CORRELATIONS syntax below asks PASW to calculate the Pearson correlation between the variables SAT_CR (SAT Critical Writing Score) and SAT_M (SAT Math Score). All that you need to do is to list on Line 2 the variables between which you want the Pearson correlation measured:

1 CORRELATIONS

2 /VARIABLES= SAT_CR SAT_M

3 /PRINT=TWOTAIL NOSIG

4 /MISSING=PAIRWISE.

On Line 3, the TWOTAIL sub-command tells PASW to run the inferential test on the Pearson correlation as a non-directional, two-tailed test. NOSIG asks PASW to indicate which correlations are statistically significant with an asterisk (*). On Line 4, the /MISSING=PAIRWISE sub-command tells PASW what to do with any missing data points. (In this data file, there are no missing data.) If you have a missing data point, PASW must know what to do with that subject's data. You have two options: handle missing data PAIRWISE or LISTWISE. If you choose LISTWISE, any subject who has a missing data point for any variable will be excluded from all correlations. If you choose PAIRWISE, a subject will be excluded from only those correlations where the subject is missing a data point. When you run the syntax above, you get the following output:

Correlations

                               SAT_CR   SAT_M
SAT_CR  Pearson Correlation         1   -.062
        Sig. (2-tailed)                  .340
        N                         240     240
SAT_M   Pearson Correlation     -.062       1
        Sig. (2-tailed)          .340
        N                         240     240

Each variable is listed in its own column and its own row. To find the Pearson correlation between two variables, cross-reference one variable in the columns with the other variable in the rows. The Sig. (2-tailed) value under the Pearson correlation is the p-value for that correlation: the probability of obtaining a correlation at least that large in absolute value (r = -.062) with that sample size (n = 240) if the true correlation were zero. To interpret a p-value: if the listed p-value is less than your chosen alpha-level, which is generally α = .05 or less, then the correlation is significant. In this case, the Pearson correlation is not significant, because the p-value (.340) is greater than α = .05.
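The /PRINT and /MISSING sub-commands discussed above can be varied independently. As a sketch only (the choice of a one-tailed test and listwise deletion here is an assumption made for illustration, not a recommendation), the same correlation could be requested as a directional test that drops any subject with a missing value on any listed variable:

```spss
* Same variables as above, but with a one-tailed test and listwise deletion.
CORRELATIONS
  /VARIABLES=SAT_CR SAT_M
  /PRINT=ONETAIL NOSIG
  /MISSING=LISTWISE.
```

With only two variables, LISTWISE and PAIRWISE exclude exactly the same subjects; the two options can only differ once three or more variables with missing values are listed.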

It is also possible to calculate several Pearson correlations at the same time. The more variables that you list on the /VARIABLES sub-command line, the more correlations will be calculated. For example, in the syntax below, I have listed three variables (SAT_CR, SAT_M, and SAT_V). When I run this syntax, PASW will generate the Pearson correlation between each pair of variables:

1 CORRELATIONS

2 /VARIABLES= SAT_CR SAT_M SAT_V

3 /PRINT=TWOTAIL NOSIG

4 /MISSING=PAIRWISE.

Correlations

                               SAT_CR   SAT_M   SAT_V
SAT_CR  Pearson Correlation         1   -.062  .481**
        Sig. (2-tailed)                  .340    .000
        N                         240     240     240
SAT_M   Pearson Correlation     -.062       1   -.048
        Sig. (2-tailed)          .340            .461
        N                         240     240     240
SAT_V   Pearson Correlation    .481**   -.048       1
        Sig. (2-tailed)          .000    .461
        N                         240     240     240

You can see in the output above that, in addition to the correlation between SAT_CR and SAT_M that was calculated earlier, PASW also calculated the correlation between SAT_CR and SAT_V (r = .481), and between SAT_M and SAT_V (r = -.048).

PASW also has a sub-command that allows you to request descriptive statistics to be calculated for each variable, and for the sums of squares, variances, sums of cross products, and covariances to be calculated. On Line 4 of the syntax below, on the /STATISTICS sub-command, the DESCRIPTIVES keyword requests the means and standard deviations for each variable, and the XPROD keyword requests the variability and co-variability measures:

1 CORRELATIONS

2 /VARIABLES=SAT_CR SAT_M SAT_V

3 /PRINT=TWOTAIL NOSIG

4 /STATISTICS DESCRIPTIVES XPROD

5 /MISSING=PAIRWISE.

Here is the output from the last set of syntax. The first table includes the descriptive statistics for each variable, and the second table includes the Pearson correlations, measures of variability, and measures of co-variability:

Descriptive Statistics

          Mean  Std. Deviation    N
SAT_CR  491.14         105.938  240
SAT_M   516.79         127.100  240
SAT_V   496.69          66.772  240

Correlations

                                                  SAT_CR         SAT_M         SAT_V
SAT_CR  Pearson Correlation                            1         -.062        .481**
        Sig. (2-tailed)                                           .340          .000
        Sum of Squares and Cross-products     2682249.18   -199155.775    813393.625
        Covariance                             11222.800      -833.288      3403.321
        N                                            240           240           240
SAT_M   Pearson Correlation                        -.062             1         -.048
        Sig. (2-tailed)                             .340                        .461
        Sum of Squares and Cross-products    -199155.775   3860928.162    -96992.938
        Covariance                              -833.288     16154.511      -405.828
        N                                            240           240           240
SAT_V   Pearson Correlation                       .481**         -.048             1
        Sig. (2-tailed)                             .000          .461
        Sum of Squares and Cross-products     813393.625    -96992.938   1065571.563
        Covariance                              3403.321      -405.828      4458.458
        N                                            240           240           240
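The variability rows in the table are tied to the correlation by simple arithmetic. As a worked check using the SAT_CR and SAT_V values printed above (the symbols SCP, s, and n are notation introduced here, not PASW output labels), the covariance is the sum of cross products divided by n - 1, and the Pearson correlation is the covariance divided by the product of the two standard deviations:

```latex
\begin{align*}
\operatorname{cov}(\text{SAT\_CR}, \text{SAT\_V}) &= \frac{SCP}{n-1} = \frac{813393.625}{239} \approx 3403.321 \\
r &= \frac{\operatorname{cov}(\text{SAT\_CR}, \text{SAT\_V})}{s_{\text{SAT\_CR}}\, s_{\text{SAT\_V}}}
   = \frac{3403.321}{(105.938)(66.772)} \approx .481
\end{align*}
```

The same division applied to a sum of squares gives a variance: for SAT_CR, 2682249.18 / 239 is approximately 11222.80, which matches the Covariance entry where SAT_CR meets itself.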

In the Correlations table above, the Sum of Squares and Cross-products row gives the sum of cross products when the row and column variables differ, and the sum of squares when they are the same variable. Likewise, the Covariance row gives the covariance between two different variables, and the variance when the row and column variables are the same.

4.2 Pearson Correlations (Partial)

Having PASW calculate the partial correlation between two variables (the correlation between two variables with the influence of other variables factored out from both variables) is not much different from asking PASW to calculate a raw (zero-order) correlation. For example, say you want to calculate the partial correlation between GPA and Pre_GREv scores (Pretest GRE Verbal Reasoning Scores), while factoring out the SAT_CR scores (Critical Reasoning Scores on the SAT) from both variables.

In the syntax below, on the /VARIABLES sub-command line, the two variables listed before the BY (GPA and Pre_GREv) are the variables between which we want to calculate a partial correlation. The variable that comes after the BY (SAT_CR) is the variable we want factored out of the other variables. Please note that you can ask PASW to factor out more than one variable:

1 PARTIAL CORR

2 /VARIABLES=GPA Pre_GREv BY SAT_CR

3 /SIGNIFICANCE=TWOTAIL

4 /STATISTICS=DESCRIPTIVES CORR

5 /MISSING=LISTWISE.

On Line 3, /SIGNIFICANCE=TWOTAIL asks PASW to run the inferential test on the partial correlation as a non-directional, two-tailed test. You have the option of selecting a ONETAIL test as well. On Line 4, the /STATISTICS sub-command asks PASW to calculate the descriptive statistics (DESCRIPTIVES) for each variable. The CORR keyword asks PASW to provide the raw Pearson correlations between each pair of variables, in addition to the partial correlation between GPA and Pre_GREv.

Here is the output from the syntax above. The first table reports the descriptive statistics, and the second table reports the correlations. In the second table, the rows where Control Variables is -none- are the raw (zero-order) Pearson correlations, and the rows where Control Variables is SAT_CR are the partial correlations:

Descriptive Statistics

               Mean  Std. Deviation    N
GPA         3.01082         .670476  240
Pre_GREv  412.25000       57.397717  240
SAT_CR    491.14167      105.937717  240

Correlations

Control Variables                                   GPA  Pre_GREv  SAT_CR
-none-   GPA       Correlation                    1.000      .169    .531
                   Significance (2-tailed)            .      .009    .000
                   df                                 0       238     238
         Pre_GREv  Correlation                     .169     1.000    .135
                   Significance (2-tailed)         .009         .    .037
                   df                               238         0     238
         SAT_CR    Correlation                     .531      .135   1.000
                   Significance (2-tailed)         .000      .037       .
                   df                               238       238       0
SAT_CR   GPA       Correlation                    1.000      .116
                   Significance (2-tailed)            .      .074
                   df                                 0       237
         Pre_GREv  Correlation                     .116     1.000
                   Significance (2-tailed)         .074         .
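Because more than one variable can follow BY, a second control variable can be added on the same line. The sketch below is an illustration only: it assumes you also want SAT_M factored out (that choice is mine, not part of the example above), and it switches to a one-tailed test to show the ONETAIL option:

```spss
* Partial correlation between GPA and Pre_GREv, controlling for two SAT scores.
PARTIAL CORR
  /VARIABLES=GPA Pre_GREv BY SAT_CR SAT_M
  /SIGNIFICANCE=ONETAIL
  /MISSING=LISTWISE.
```

Each added control variable costs one degree of freedom, so with 240 subjects this partial correlation would be tested on 236 df rather than the 237 df shown above.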

4.3 Univariate Regression (one regressor)

There is a mountain of stuff that you can do with PASW's REGRESSION procedure, including controlling how a regression analysis is performed and which statistics are requested. Below, I am performing a 'bare-bones' REGRESSION analysis to keep things simple. The analysis below will regress GPA on the Summed SAT Scores (SAT_Tot); that is, it will predict GPA from SAT_Tot. Hence, GPA is the dependent variable (Y) and SAT_Tot is the predictor variable (X).

In the syntax below, PASW is being asked to regress GPA on SAT_Tot. The dependent (predicted, or regressed) variable is listed on Line 6 after the /DEPENDENT sub-command. The predictor (independent, or regressor) variable is listed on Line 7 after the /METHOD sub-command. A few notes on Line 7: First, if you have more than one predictor, each predictor would be entered here. In this example we have only one predictor (SAT_Tot). Second, there are a number of methods that you can use to have PASW conduct the analysis (ENTER, STEPWISE, etc.), but this is beyond the scope of this packet. Just use METHOD=ENTER:

1 REGRESSION

2 /MISSING LISTWISE

3 /STATISTICS COEFF OUTS R ANOVA

4 /CRITERIA=PIN(.05) POUT(.10)

5 /NOORIGIN

6 /DEPENDENT GPA

7 /METHOD=ENTER SAT_Tot.

The /STATISTICS sub-command on Line 3 is where you can ask PASW to provide various statistics and inferential tests as part of the regression analysis. COEFF requests the slope and intercept coefficients in the regression model. OUTS asks PASW to list any predictors that were offered to the regression model but were not included because they did not meet the criteria specified on Line 4. R asks for the R and R² values of the regression model. ANOVA asks for the analysis of variance to be conducted on the overall regression model.

On Line 4, /CRITERIA=PIN(.05) POUT(.10) sets the inclusion and exclusion criteria for regressors: PIN is the probability required for a regressor to enter the model, and POUT is the probability at which a regressor is removed from it. These criteria matter for the stepwise-type methods (e.g., STEPWISE); with METHOD=ENTER, the listed regressors are entered regardless of them. These values can be adjusted, but .05 and .10 are used by default.

When you run the syntax above, you get the following output:

Variables Entered/Removed

Model  Variables Entered  Variables Removed  Method
1      SAT_Tot            .                  Enter

This table simply lists the predictor variables that are being entered into the regression analysis.

Model Summary

Model     R  R Square  Adjusted R Square  Std. Error of the Estimate
1      .774      .599               .597                     .425693

This table provides the R and R² values. The R² is the proportion of variance in the dependent variable that is explained by the regression model.

ANOVA

Model           Sum of Squares   df  Mean Square        F  Sig.
1  Regression           64.311    1       64.311  354.887  .000
   Residual             43.129  238         .181
   Total               107.440  239

Coefficients

                Unstandardized Coefficients  Standardized Coefficients
Model                B   Std. Error                       Beta               t  Sig.
1  (Constant)   -1.093         .220                                     -4.979  .000
   SAT_Tot        .003         .000                       .774          18.838  .000
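The tables above are linked by a few divisions, and working through them is a useful way to check that you are reading the output correctly. The check below uses only the printed values; the subscripts are my notation, and small discrepancies come from rounding in the displayed values:

```latex
\begin{align*}
R^2 &= \frac{SS_{\text{Regression}}}{SS_{\text{Total}}} = \frac{64.311}{107.440} \approx .599 \\
MS_{\text{Residual}} &= \frac{SS_{\text{Residual}}}{df_{\text{Residual}}} = \frac{43.129}{238} \approx .181 \\
F &= \frac{MS_{\text{Regression}}}{MS_{\text{Residual}}} = \frac{64.311}{43.129/238} \approx 354.9 \\
t^2 &= (18.838)^2 \approx 354.9 \approx F \quad \text{(one predictor only)}
\end{align*}
```

The last line holds only because there is a single predictor: in that case the t-Test on the slope and the overall F-test are testing the same thing.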

You can also ask PASW to report descriptive statistics for each variable, correlations between variables, and a host of other information. In the syntax below, I added a /DESCRIPTIVES sub-command on Line 2 that asks for the mean (MEAN) and standard deviation (STDDEV) for each variable, the Pearson correlation (CORR) between each pair of variables, a significance test (SIG) on each correlation, and the number of subjects (N) contributing to each variable and to each correlation:

1 REGRESSION

2 /DESCRIPTIVES MEAN STDDEV CORR SIG N

3 /MISSING LISTWISE

4 /STATISTICS COEFF OUTS R ANOVA ZPP

5 /CRITERIA=PIN(.05) POUT(.10)

6 /NOORIGIN

7 /DEPENDENT GPA

8 /METHOD=ENTER SAT_Tot.

I also added ZPP to the /STATISTICS sub-command on Line 4. This asks PASW to calculate the zero-order, partial, and semi-partial (part) correlations between each predictor and the dependent variable. In this case, because there is only one predictor, no variable is being factored out of the relationship between GPA and SAT_Tot, so each of these correlations will be the same. The output from this syntax appears below:

Descriptive Statistics

            Mean  Std. Deviation    N
GPA      3.01082         .670476  240
SAT_Tot  1504.62         190.169  240

This table lists the requested descriptive statistics for each variable. As in the earlier output, the ANOVA table is the overall analysis of the regression model, and the Coefficients table provides the values of the coefficients in the regression equation, as well as t-Tests on each coefficient.

Correlations

                                GPA  SAT_Tot
Pearson Correlation  GPA      1.000     .774
                     SAT_Tot   .774    1.000
Sig. (1-tailed)      GPA          .     .000
                     SAT_Tot   .000        .
N                    GPA        240      240
                     SAT_Tot    240      240

Variables Entered/Removed

Model  Variables Entered  Variables Removed  Method
1      SAT_Tot            .                  Enter

Model Summary

Model     R  R Square  Adjusted R Square  Std. Error of the Estimate
1      .774      .599               .597                     .425693

ANOVA

Model           Sum of Squares   df  Mean Square        F  Sig.
1  Regression           64.311    1       64.311  354.887  .000
   Residual             43.129  238         .181
   Total               107.440  239

Coefficients

                Unstandardized Coefficients  Standardized Coefficients                     Correlations
Model                B   Std. Error                       Beta               t  Sig.  Zero-order  Partial  Part
1  (Constant)   -1.093         .220                                     -4.979  .000
   SAT_Tot        .003         .000                       .774          18.838  .000        .774     .774  .774

The Correlations table above lists the requested correlations and p-values. The Zero-order, Partial, and Part columns of the Coefficients table contain the requested zero-order, partial, and semi-partial correlations.
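The earlier note that additional predictors are entered on the /METHOD sub-command can be made concrete. In the sketch below, the two predictors SAT_CR and SAT_M are my own choice for illustration (they are variables in the data file, but this particular model is not analyzed in this guide), and the /CRITERIA line is left out so that the defaults apply:

```spss
* Regress GPA on two predictors entered together in a single step.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA ZPP
  /NOORIGIN
  /DEPENDENT GPA
  /METHOD=ENTER SAT_CR SAT_M.
```

With more than one predictor, the zero-order, partial, and part correlations requested by ZPP will generally no longer be equal, because each predictor's relationship with GPA is adjusted for the other predictor.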

5. t-Tests

5.1 One-Sample t-test

There are three t-Tests PASW can perform on a set of data: the one-sample t-Test, the independent groups t-Test (independent-samples t-Test), and the correlated samples t-Test (paired-samples t-Test). But, the statistics that can be requested and the test parameters that you can control are very limited.

The syntax below asks PASW to run a one-sample t-Test. The dependent variable is GPA, which is entered on the /VARIABLES sub-command on Line 4:

1 T-TEST

2 /TESTVAL=3

3 /MISSING=ANALYSIS

4 /VARIABLES=GPA

5 /CRITERIA=CI(.95).

Importantly, for the one-sample t-Test, you must state a value to which the mean of the dependent variable is compared. This value is entered after the /TESTVAL sub-command on Line 2. In this case, PASW is being asked to compare the mean GPA to a value of 3, which coincides with a grade of 'B'. The /CRITERIA sub-command on Line 5 is pretty much all you have control over, besides the /TESTVAL on Line 2. The CI value tells PASW what size confidence interval and what alpha-level to use in the t-Test. In this case, .95 corresponds to the 95% confidence interval, and alpha level of .05. If you run the syntax above, you get the following in the output file:

One-Sample Statistics

       N     Mean  Std. Deviation  Std. Error Mean
GPA  240  3.01082         .670476          .043279

One-Sample Test

Test Value = 3

                                                    95% Confidence Interval of the Difference
        t   df  Sig. (2-tailed)  Mean Difference        Lower        Upper
GPA  .250  239             .803          .010821      -.07444       .09608

The One-Sample Statistics table presents the descriptive statistics for the dependent variable, and the One-Sample Test table presents the results of the inferential, one-sample t-Test.

In the table for the One-Sample Test above, the Sig. (2-tailed) value is the p-value used as a basis for determining statistical significance. If it is less than your chosen alpha level (α = .05, or less), then the difference between the mean (3.01082) and the test value (3) is significant. In this case, the difference is not significant, because .803 > .05. The values underneath the heading 95% Confidence Interval of the Difference are the lower and upper boundaries of the 95% confidence interval around the difference between the mean and the test value (.010821).
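The values in the two tables can be reproduced by hand, which shows what the test is doing. In the notation below (mine, not PASW's), M is the sample mean, μ₀ is the test value, s is the standard deviation, and n is the sample size:

```latex
\begin{align*}
SE_M &= \frac{s}{\sqrt{n}} = \frac{.670476}{\sqrt{240}} \approx .043279 \\
t &= \frac{M - \mu_0}{SE_M} = \frac{3.01082 - 3}{.043279} \approx .250, \qquad df = n - 1 = 239
\end{align*}
```

A t value this close to zero says that the sample mean sits only a quarter of a standard error away from the test value, which is why the p-value is so large.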

As another example, the syntax below asks PASW to compare the mean pretest score from the Analytical Writing section of the GREs (Pre_GREa) to a test value of 4.9. This test value of 4.9 is actually the national mean score on that section of the GREs:

1 T-TEST

2 /TESTVAL=4.9

3 /MISSING=ANALYSIS

4 /VARIABLES=Pre_GREa.

Running this syntax, we get the following in the output file:

One-Sample Statistics

            N  Mean  Std. Deviation  Std. Error Mean
Pre_GREa  240  4.11            .361             .023

One-Sample Test

Test Value = 4.9

                                                            95% Confidence Interval of the Difference
                t   df  Sig. (2-tailed)  Mean Difference        Lower        Upper
Pre_GREa  -33.731  239             .000            -.785         -.83         -.74

In this case, the One-Sample Test indicates that the mean difference (-.785) is statistically significant, because the p-value in the Sig. (2-tailed) column is less than the conventional alpha-level of α = .05.

5.2 Independent Groups t-Tests

The syntax on the next page illustrates how to conduct an independent groups t-Test. Note that when comparing two different groups or levels within a between-subjects independent variable, you must be sure that the groups/levels of that independent variable have been dummy-coded; that is, assigned numeric values in the data file. PASW will not run the independent groups t-Test if the groups have been assigned descriptive (string) labels in the data file.

Say that we want to compare the mean posttest score on the Verbal Reasoning Section of the GREs between different levels of the independent variable Tutor_Group. Specifically, we want to compare mean performance between the group of subjects who did not receive tutoring (Control Group) and the group of subjects who received individual tutoring (Individual Tutoring Group). Recall that, within the independent variable Tutor_Group, the group that did not receive tutoring was dummy-coded with 1 and the group that received individual tutoring was dummy-coded with 3 (the group that received group tutoring was dummy-coded with 2).

In the syntax below, after the T-TEST command, the GROUPS sub-command is listed. In the parentheses, the 1 and 3 are the values that were assigned to the no tutoring group and the individual tutoring group, respectively. The dependent variable (Post_GREv) is listed after the /VARIABLES sub-command on Line 3:

1 T-TEST GROUPS=Tutor_Group(1 3)

2 /MISSING=ANALYSIS

3 /VARIABLES=Post_GREv.

When you run this syntax, you get the following output:

Group Statistics

           Tutor_Group                   N    Mean  Std. Deviation  Std. Error Mean
Post_GREv  Control Group (no tutoring)  80  419.25          62.334            6.969
           Individual Tutoring          80  442.88          68.993            7.714

Independent Samples Test

Levene's Test for Equality of Variances

                                            F  Sig.
Post_GREv  Equal variances assumed       .978  .324
           Equal variances not assumed

t-test for Equality of Means

                                                                                                              95% Confidence Interval of the Difference
                                              t       df  Sig. (2-tailed)  Mean Difference  Std. Error Difference      Lower      Upper
Post_GREv  Equal variances assumed       -2.273      158             .024          -23.625                 10.396    -44.157     -3.093
           Equal variances not assumed   -2.273  156.400             .024          -23.625                 10.396    -44.159     -3.091

The table Independent Samples Test lists a lot of information, some of which is relevant, some of which is less relevant. First, you will almost always assume equal variances, so be sure to use information from those rows. Second, Levene's Test for Equality of Variances is a test for whether the variances of the groups being compared are statistically equivalent. If Levene's Test is not significant, which is the case here, then we can assume that the variances are indeed equal.

The Group Statistics table presents the descriptive statistics on the dependent variable for each group within the independent variable, and the Independent Samples Test table presents the results of the independent groups t-Test that compares those means.

The information under the heading t-test for Equality of Means is relevant to the independent groups t-Test on the data and most of the terms should be self-explanatory. Importantly, the Sig. (2-tailed) value is the p-value used for determining statistical significance. If it is less than a chosen alpha level (α = .05, or less), then the mean difference (-23.625) is significant, which is the case here. Please note that the mean difference is negative because of how the groups were entered into the t-Test in the syntax. That is, the no tutoring group was entered first in the syntax and the individual tutoring group was entered second. This means that PASW will subtract the individual tutoring mean from the no tutoring mean. Thus, this value is negative only because of how the groups are being entered; it has nothing to do with any hypotheses.
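To see that the sign depends only on the order of the group codes, the codes inside the parentheses can be swapped. This is a sketch of that one change; everything else matches the independent groups syntax shown above:

```spss
* Code 3 is listed first, so the control mean is subtracted from the tutoring mean.
T-TEST GROUPS=Tutor_Group(3 1)
  /MISSING=ANALYSIS
  /VARIABLES=Post_GREv.
```

The mean difference and t would then be reported as positive (23.625 and 2.273), with the same df and p-value, and the confidence interval boundaries would change sign and trade places.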

A nice feature of the PASW independent groups t-Test procedure is that you can run several t-Tests that compare performance between the same two groups. For example, let's say we also want to compare the mean posttest score on the Analytical Writing Section of the GREs between the no tutoring group and the individual tutoring group. All that you have to do is add this dependent variable on the /VARIABLES sub-command on Line 3:

1 T-TEST GROUPS=Tutor_Group(1 3)

2 /MISSING=ANALYSIS

3 /VARIABLES=Post_GREv Post_GREa.

Running this syntax, we get the following output:

Group Statistics

           Tutor_Group                   N    Mean  Std. Deviation  Std. Error Mean
Post_GREv  Control Group (no tutoring)  80  419.25          62.334            6.969
           Individual Tutoring          80  442.88          68.993            7.714
Post_GREa  Control Group (no tutoring)  80    4.17            .355             .040
           Individual Tutoring          80    4.22            .456             .051

Independent Samples Test

Levene's Test for Equality of Variances

                                             F  Sig.
Post_GREv  Equal variances assumed        .978  .324
           Equal variances not assumed
Post_GREa  Equal variances assumed       4.221  .042
           Equal variances not assumed

t-test for Equality of Means

                                                                                                              95% Confidence Interval of the Difference
                                              t       df  Sig. (2-tailed)  Mean Difference  Std. Error Difference      Lower      Upper
Post_GREv  Equal variances assumed       -2.273      158             .024          -23.625                 10.396    -44.157     -3.093
           Equal variances not assumed   -2.273  156.400             .024          -23.625                 10.396    -44.159     -3.091
Post_GREa  Equal variances assumed        -.774      158             .440            -.050                   .065      -.178       .078
           Equal variances not assumed
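As a check on the Post_GREv rows, the standard error of the difference and the t value can be rebuilt from the Group Statistics table. The notation is introduced here for illustration. Because the two groups have the same n (80), the equal-variances and unequal-variances formulas give the same standard error, which is why both rows show 10.396:

```latex
\begin{align*}
SE_{\text{diff}} &= \sqrt{SE_1^2 + SE_2^2} = \sqrt{(6.969)^2 + (7.714)^2} \approx 10.396 \\
t &= \frac{M_1 - M_2}{SE_{\text{diff}}} = \frac{-23.625}{10.396} \approx -2.273
\end{align*}
```

Only the degrees of freedom differ between the two rows (158 versus 156.400), and here that difference is too small to change the p-value.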

5.3 Correlated Samples (Paired Samples) t-Tests

Recall that the correlated samples t-Test is used to compare performance on some dependent variable across levels of a within-subjects independent variable. In PASW, the correlated samples t-Test is called the paired samples t-Test. As was the case for the one-sample t-Test and independent groups t-Tests, there is not much control over what you can request for the paired samples t-Test.

Say we want to compare trait anxiety levels between the pretest and posttest periods. Recall that, in the hypothetical study, the researchers measured each subject's state anxiety and trait anxiety using the State-Trait Anxiety Inventory (STAI), and these types of anxiety were measured during the pretest and posttest periods. If we are interested, specifically, in the change in trait anxiety between the pretest and posttest periods, we're going to want to compare the Pre_STAIt mean with the Post_STAIt mean.

The syntax below asks PASW to compare Pre_STAIt with Post_STAIt scores. On the T-TEST command line, the PAIRS sub-command tells PASW to run a paired samples t-Test. The levels of the variable being compared come before and after the WITH. Thus, Line 1 is basically telling PASW to “compare Pre_STAIt scores WITH Post_STAIt scores using a PAIRED samples t-Test”:

1 T-TEST PAIRS=Pre_STAIt WITH Post_STAIt (PAIRED)

2 /CRITERIA=CI(.95)

3 /MISSING=ANALYSIS.

Running this syntax, you get the following output:

Paired Samples Statistics

                     Mean    N  Std. Deviation  Std. Error Mean
Pair 1  Pre_STAIt   50.27  240          11.923             .770
        Post_STAIt  50.09  240          11.944             .771

Paired Samples Correlations

                                  N  Correlation  Sig.
Pair 1  Pre_STAIt & Post_STAIt  240         .987  .000

Paired Samples Test

                                                    Paired Differences
                                                                     95% Confidence Interval of the Difference
                                Mean  Std. Deviation  Std. Error Mean      Lower      Upper       t   df  Sig. (2-tailed)
Pair 1  Pre_STAIt - Post_STAIt  .175           1.939             .125      -.072       .422   1.398  239             .163

In the table Paired Samples Test, most of the statistics should be familiar and straightforward. Mean is the mean difference in the dependent variable between the levels of the independent variable. Note that this value is positive because of the order in which the levels of the independent variable were entered into the t-Test. In the syntax, Pre_STAIt was entered before WITH and Post_STAIt was entered after WITH; hence, the Post_STAIt mean was subtracted from the Pre_STAIt mean. Thus, it is positive only because of how the levels were entered. In this example, the mean difference (.175) is not statistically significant, because the p-value (.163) is greater than .05.
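A paired samples t-Test is equivalent to a one-sample t-Test on the difference scores against a test value of 0. The sketch below illustrates that equivalence. The variable name STAIt_Diff is hypothetical: it does not exist in the data file until the COMPUTE line creates it:

```spss
* Create a difference score for each subject (hypothetical new variable).
COMPUTE STAIt_Diff = Pre_STAIt - Post_STAIt.
EXECUTE.
* Test whether the mean difference score differs from zero.
T-TEST
  /TESTVAL=0
  /MISSING=ANALYSIS
  /VARIABLES=STAIt_Diff
  /CRITERIA=CI(.95).
```

This should reproduce the t, df, and Sig. (2-tailed) values from the Paired Samples Test table, and the One-Sample Statistics table should show the same mean, standard deviation, and standard error as the Paired Differences columns.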

You can request several paired samples t-Tests at the same time. For example, in addition to comparing the Pre_STAIt mean with the Post_STAIt mean, say we also want to compare the pretest and posttest scores from the Quantitative Reasoning section of the GREs (Pre_GREq compared to Post_GREq). In the syntax below, two variables are listed before WITH (Pre_STAIt and Pre_GREq) and two variables are listed after WITH (Post_STAIt and Post_GREq):

1 T-TEST PAIRS=Pre_STAIt Pre_GREq WITH Post_STAIt Post_GREq (PAIRED)

2 /CRITERIA=CI(.95)

3 /MISSING=ANALYSIS.

When the syntax is run, PASW will compare the mean of the first variable before WITH (Pre_STAIt) with the mean of the first variable after WITH (Post_STAIt), and it will compare the mean of the second variable before WITH (Pre_GREq) with the mean of the second variable after WITH (Post_GREq). Thus, it is critical to enter the variables on each side of the WITH in the appropriate order when running several paired-samples t-Tests. Running this syntax provides the following output:

Paired Samples Statistics

                      Mean    N  Std. Deviation  Std. Error Mean
Pair 1  Pre_STAIt    50.27  240          11.923             .770
        Post_STAIt   50.09  240          11.944             .771
Pair 2  Pre_GREq    568.71  240          76.484            4.937
        Post_GREq   591.13  240          79.685            5.144

Paired Samples Correlations

                                  N  Correlation  Sig.
Pair 1  Pre_STAIt & Post_STAIt  240         .987  .000
Pair 2  Pre_GREq & Post_GREq    240         .934  .000

Paired Samples Test

                                                       Paired Differences
                                                                        95% Confidence Interval of the Difference
                                   Mean  Std. Deviation  Std. Error Mean      Lower      Upper        t   df  Sig. (2-tailed)
Pair 1  Pre_STAIt - Post_STAIt     .175           1.939             .125      -.072       .422    1.398  239             .163
Pair 2  Pre_GREq - Post_GREq    -22.417          28.461            1.837    -26.036    -18.798  -12.202  239             .000

6. Analysis of Variance

6.1 Oneway Analysis of Variance (via GLM)

Analysis of Variance (ANOVA) is used, among other reasons, to compare performance on a dependent variable across two or more levels of one or more independent variables. Oh the things I could say about ANOVA and experimental design! Alas, we do not have time. The PASW procedure for ANOVA is the General Linear Model (GLM). Don't worry about what it means, just know that it calculates F-tests for single-factor and factorial designs.

ANOVA can be used when the levels of an independent variable are manipulated (experimental design) or naturally occurring (quasi-experimental design). Critically, setting up ANOVA in PASW requires you to think about the design: Is there one independent variable, or more? How many levels of each independent variable are there? Do the levels of the independent variables differ between-subjects or within-subjects? I don't want to get technical, so I'll be as simple as possible.

From the data set, say we want to compare the Posttest GRE Verbal Reasoning Scores (Post_GREv) across the four groups within the independent variable Drug_Group. Thus, we have a oneway ANOVA; that is, one independent variable and one dependent variable. The syntax below presents the minimal set of sub-commands needed to run a oneway ANOVA. This syntax is used only if the independent variable is between-subjects (within-subjects variables require a repeated measures GLM):

1 UNIANOVA Post_GREv BY Drug_Group

2 /METHOD=SSTYPE(3)

3 /INTERCEPT=INCLUDE

4 /CRITERIA=ALPHA(.05)

5 /DESIGN=Drug_Group.

The variable before BY (Post_GREv) is always the dependent variable and the variable after BY (Drug_Group) is always the independent variable. If you have a factorial design, the additional independent variables would be entered here. On Line 2, the /METHOD sub-command tells PASW how the sums of squares should be calculated (SSTYPE), which is usually set to 3. On Line 4, the /CRITERIA sub-command tells PASW what alpha level to use. Finally, on Line 5, the /DESIGN sub-command is where you build the effects to be examined in the ANOVA. In the case of a oneway design, there is only one independent variable to influence the dependent variable; hence, you list that independent variable. When you are using ANOVA to analyze a factorial design, additional factors can be included.

Running the syntax above gives you the following:

Between-Subjects Factors

               Value Label               N
Drug_Group  1  Control Group (no drug)  60
            2  Placebo Group            60
            3  100 mg/day Group         60
            4  200 mg/day Group         60

This table lists each level of the independent variable, as well as the number of subjects (N) contributing to each level.

Tests of Between-Subjects Effects

Dependent Variable: Post_GREv

Source           Type III Sum of Squares   df  Mean Square          F  Sig.
Corrected Model                28571.250    3     9523.750      2.333  .075
Intercept                        4.487E7    1      4.487E7  10991.299  .000
Drug_Group                     28571.250    3     9523.750      2.333  .075
Error                         963375.000  236     4082.097
Total                            4.586E7  240
Corrected Total               991946.250  239

The ANOVA summary table (Tests of Between-Subjects Effects) contains a lot of information, some of it unnecessary for our present purpose. The relevant rows are Drug_Group and Error. The terms associated with between group variance (variability due to the independent variable) are in the row labeled Drug_Group, which is the independent variable. The terms associated with the within group variance are in the row labeled Error. Most values in each column should be straightforward: sums of squares for each source of variance are in the second column, degrees of freedom are in the third column, mean squares come next, followed by the F-test, and finally p-values.

In this case, the F-Test on the independent variable is not statistically significant, because the p-value (.075) is greater than the chosen alpha-level (.05). Let's assume the test was significant, so we can do post-hoc tests. If you have a statistically significant F-Test, you need to know between which levels of the independent variable there is a significant difference in the dependent variable: we need post-hoc tests.
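The entries in the summary table follow from one another. As a worked check using the Drug_Group and Error rows of the table above (the subscripts are my notation):

```latex
\begin{align*}
MS_{\text{Drug\_Group}} &= \frac{SS_{\text{Drug\_Group}}}{df_{\text{Drug\_Group}}} = \frac{28571.250}{3} = 9523.750 \\
MS_{\text{Error}} &= \frac{SS_{\text{Error}}}{df_{\text{Error}}} = \frac{963375.000}{236} \approx 4082.097 \\
F &= \frac{MS_{\text{Drug\_Group}}}{MS_{\text{Error}}} = \frac{9523.750}{4082.097} \approx 2.333
\end{align*}
```

The two sums of squares also add up to the Corrected Total (28571.250 + 963375.000 = 991946.250), and the degrees of freedom do the same (3 + 236 = 239).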

The syntax below includes additional sub-commands. First, the /POSTHOC sub-command on Line 4 asks PASW to compare levels of the independent variable Drug_Group using Fisher's Least Significant Difference test (LSD). You have several options for what post-hoc test to use (TUKEY, BONFERRONI), but we'll stick with LSD for now. On Lines 5 and 6, the /EMMEANS sub-command asks PASW to calculate the estimated mean of the dependent variable at each level of the independent variable. Specifically, Line 5 asks for the grand mean (OVERALL), and Line 6 asks for the estimated mean for each level of Drug_Group. Finally, the /PRINT sub-command on Line 7 asks PASW to include additional items in the output. Specifically, ETASQ requests the eta-squared measure of effect size (reported in the output as Partial Eta Squared), and DESCRIPTIVE asks for the descriptive statistics. There are many additional items that you can ask PASW to 'print' in the output, but we'll stick with these.

1 UNIANOVA Post_GREv BY Drug_Group

2 /METHOD=SSTYPE(3)

3 /INTERCEPT=INCLUDE

4 /POSTHOC=Drug_Group(LSD)

5 /EMMEANS=TABLES(OVERALL)

6 /EMMEANS=TABLES(Drug_Group)

7 /PRINT=ETASQ DESCRIPTIVE

8 /CRITERIA=ALPHA(.05)

9 /DESIGN=Drug_Group.

When you run the syntax, you get the following output. As before, the Tests of Between-Subjects Effects table is the ANOVA summary table, where the sums of squares, degrees of freedom, mean squares, F-Tests, and p-values are listed:

Between-Subjects Factors

Value Label N Drug_Group 1 Control Group (no drug) 60

2 Placebo Group 60

3 100 mg/day Group 60 4 200 mg/day Group 60

Drug_Group Mean Std. Deviation N Control Group (no drug) 424.83 57.327 60 Placebo Group 423.83 58.457 60 100 mg/day Group 430.00 68.668 60 200 mg/day Group 450.83 70.068 60

Total 432.38 64.424 240

Tests of Between-Subjects Effects

Source Type III Sum of

Squares df Mean Square F Sig.

Partial Eta Squared Corrected Model 28571.250a ₃ _9523.750 _2.333 _.075 _.029

Intercept 4.487E7 1 4.487E7 10991.299 .000 .979 Drug_Group 28571.250 3 9523.750 2.333 .075 .029 Error 963375.000 236 4082.097

Total 4.586E7 240

Corrected Total 991946.250 239

Estimated Marginal Means

1. Grand Mean

Mean Std. Error 95% Confidence Interval Lower Bound Upper Bound 432.375 4.124 424.250 440.500

2. Drug_Group

Drug_Group Mean Std. Error 95% Confidence Interval Lower Bound Upper Bound Control Group (no drug) 424.833 8.248 408.584 441.083 Placebo Group 423.833 8.248 407.584 440.083 100 mg/day Group 430.000 8.248 413.750 446.250 200 mg/day Group 450.833 8.248 434.584 467.083

T his table comes from requesting DESCRIPT IVES as part of the /PRINT sub-command.

Estimated marginal means come from the /EMEANS sub-commands. Table 1 comes from the OVERALL request on Line 5 of the syntax, and Table 2 comes from Line 6.

Post Hoc Tests

Drug_Group

Multiple Comparisons
Dependent Variable: Post_GREv
LSD

(I) Drug_Group            (J) Drug_Group            Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
Control Group (no drug)   Placebo Group               1.00                  11.665       .932   -21.98                23.98
                          100 mg/day Group           -5.17                  11.665       .658   -28.15                17.81
                          200 mg/day Group          -26.00*                 11.665       .027   -48.98                -3.02
Placebo Group             Control Group (no drug)    -1.00                  11.665       .932   -23.98                21.98
                          100 mg/day Group           -6.17                  11.665       .598   -29.15                16.81
                          200 mg/day Group          -27.00*                 11.665       .021   -49.98                -4.02
100 mg/day Group          Control Group (no drug)     5.17                  11.665       .658   -17.81                28.15
                          Placebo Group               6.17                  11.665       .598   -16.81                29.15
                          200 mg/day Group          -20.83                  11.665       .075   -43.81                 2.15
200 mg/day Group          Control Group (no drug)    26.00*                 11.665       .027     3.02                48.98
                          Placebo Group              27.00*                 11.665       .021     4.02                49.98
                          100 mg/day Group           20.83                  11.665       .075    -2.15                43.81

In the output above, the estimated marginal means and the descriptive statistics table provide more or less the same information: the means of each level of the independent variable Drug_Group. The table under Post Hoc Tests tells you which differences between levels of the independent variable are statistically significant.

To read the Post Hoc Tests (Multiple Comparisons) table: there are two columns, (I) and (J), both labeled with the independent variable (Drug_Group). Each level of the independent variable is listed once in column I, and beside it, in column J, each of the other three levels is listed in a separate row. For example, the first level listed in column I is Control Group (no drug), and the other three levels are listed beside it in column J: Placebo Group, 100 mg/day Group, and 200 mg/day Group.

You should see a mean difference next to each of the groups in column J. This is the mean of the dependent variable for the level listed in column I minus the mean for the level listed in column J (I-J). Thus, the mean difference in Posttest Verbal Reasoning GRE scores between the Control Group and the Placebo Group is 1.00, and the mean difference between the Control Group and the 100 mg/day Group is -5.17. (Note, the sign depends only on the direction in which PASW is subtracting: the same two comparisons appear as -1.00 and 5.17 in the rows where the Control Group is listed in column J.)
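To see where the signs come from, the following Python sketch (an illustration only, not PASW syntax) rebuilds the Mean Difference (I-J) column from the group means in the descriptive statistics. The dictionary keys are shortened group labels chosen for this example.

```python
from itertools import permutations

# Group means of Post_GREv, copied from the descriptive statistics output.
means = {
    "Control": 424.83,
    "Placebo": 423.83,
    "100 mg/day": 430.00,
    "200 mg/day": 450.83,
}

# Every ordered pair (I, J) gets its own row: mean of I minus mean of J.
diffs = {(i, j): round(means[i] - means[j], 2) for i, j in permutations(means, 2)}

for (i, j), d in diffs.items():
    print(f"{i:>10} - {j:<10} = {d:7.2f}")
```

Each pair of groups shows up twice, once in each direction, which is why the table has 12 rows for only 6 distinct comparisons and why every difference appears once as a positive value and once as a negative value.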

To determine whether a mean difference is statistically significant, look at the column labeled Sig. This column lists the p-value for each mean difference. If the p-value is less than your chosen alpha level (typically α = .05), then the mean difference is significant. In this data set, the only statistically significant mean differences are between the Control Group and the 200 mg/day Group (-26.00, p = .027) and between the Placebo Group and the 200 mg/day Group (-27.00, p = .021). But, it should be noted that because the overall F-test was not significant (p = .075), these post-hoc, pairwise comparisons should not be interpreted: Fisher's LSD test is meant to be used only after a significant F-test.
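The decision rule above can be written out in a few lines. The Python sketch below (an illustration, not PASW syntax) first recomputes the standard error shared by every LSD comparison from the Error mean square, assuming equal group sizes, and then flags which of the six distinct comparisons fall below alpha. As a point of contrast, it also applies a simple Bonferroni-style cutoff (alpha divided by the number of comparisons); that cutoff is my own addition for comparison and is not something the syntax above requests.

```python
import math

ms_error = 4082.097   # Error mean square from the ANOVA summary table
n_per_group = 60

# Standard error of a difference between two group means (pooled error term).
se_diff = math.sqrt(ms_error * (1 / n_per_group + 1 / n_per_group))

# p-values (Sig. column) for the six distinct pairwise comparisons.
p_values = {
    ("Control", "Placebo"): 0.932,
    ("Control", "100 mg/day"): 0.658,
    ("Control", "200 mg/day"): 0.027,
    ("Placebo", "100 mg/day"): 0.598,
    ("Placebo", "200 mg/day"): 0.021,
    ("100 mg/day", "200 mg/day"): 0.075,
}

alpha = 0.05
significant = [pair for pair, p in p_values.items() if p < alpha]

# Bonferroni-style cutoff: split alpha evenly across the comparisons.
bonferroni_alpha = alpha / len(p_values)
significant_bonferroni = [pair for pair, p in p_values.items() if p < bonferroni_alpha]

print(round(se_diff, 3))           # 11.665
print(significant)                 # the two comparisons involving 200 mg/day
print(round(bonferroni_alpha, 4))  # 0.0083
print(significant_bonferroni)      # []
```

Under the stricter cutoff none of the comparisons would count as significant, which fits with the non-significant overall F-test.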

The Multiple Comparisons table presents all of the pairwise comparisons between levels of the independent variable; that is, all of the post-hoc comparisons requested by the /POSTHOC sub-command.

6.2 Between Subjects Factorial ANOVA (via GLM)

Factorial designs examine the influence of two or more independent variables on a dependent variable, and several possible effects can be significant (or not) in a factorial ANOVA: main effects and interactions. (I assume you know what these are.) The PASW procedure for requesting a factorial ANOVA is not very different from requesting a oneway ANOVA. In the syntax that follows, we will cover how to request a factorial ANOVA in PASW with two between-subjects independent variables.

Say that we want to examine the influence of the independent variables Drug_Group and Tutor_Group on Posttest GRE Verbal Reasoning Scores (Post_GREv). Recall that Drug_Group has four levels (Control, Placebo, 100 mg/day, and 200 mg/day), and Tutor_Group has three levels (Control, Group Tutoring, and Individual Tutoring). Thus, we have a 4 (Drug_Group) x 3 (Tutor_Group) factorial design. The set of syntax below, which we will not actually run, includes the minimum sub-commands needed to have PASW run a factorial ANOVA. On the UNIANOVA command line (Line 1), before the BY, the depe