INTRODUCTION TO SPSS
FOR WINDOWS
Version 19.0
Contents
Purpose of handout & Compatibility between different versions of SPSS
SPSS window & menus
Getting data into SPSS & Editing data
Reading an SPSS viewer/output (.spv) file & Editing your output
Saving data as an SPSS data (.sav) file
Saving your output (statistical results and graphs)
Exporting SPSS Output
Printing your work & Exiting SPSS
Running SPSS using syntax or command language (.sps files)
Displaying variable names or variable labels
Creating and Recoding Variables
Creating a new variable
Recoding or combining categories of a variable
Example: Recoding a categorical variable
Example: Creating an indicator or dummy variable
Summarizing your data
Frequency tables (& bar charts) for categorical variables
Contingency tables for categorical variables
Descriptive statistics (& histograms) for numerical variables
Descriptive statistics (& boxplots) by groups for numerical variables
Using the Split File option for summaries by groups
Using the Select Cases option for summaries for a subgroup of subjects/observations
Graphing your data
Bar chart
Histogram & Boxplot
Normal probability plot
Error bar plot
Scatter plot
Adding a line or loess smooth to a scatter plot
Stem-and-leaf plot
Hypothesis tests & Confidence intervals
One sample t test & Confidence interval for a mean
Paired t test & Confidence interval for the difference between means
Two sample t test & Confidence interval for the difference between means
Sign test and Wilcoxon signed rank test
Mann-Whitney U test (or Wilcoxon rank sum test)
One-way ANOVA (Analysis of variance) & Post-hoc tests
Kruskal-Wallis test
One-sample binomial test
McNemar’s test
Chi-square test for contingency tables
Fisher’s exact test
Trend test for contingency tables/ordinal variables
Binomial, McNemar’s, Chi-square and Fisher’s exact tests using summary data
Confidence interval for a proportion
Correlation & Regression
Pearson and Spearman rank correlation coefficients
Linear regression via ANOVA commands
Logistic regression
Purpose of handout
IBM SPSS Statistics (or SPSS) provides a powerful statistical and data management system in a graphical environment. The user interfaces make statistical analysis more accessible for casual users and more convenient for experienced users. Most tasks can be accomplished simply by pointing and clicking the mouse.
The objective of this handout is to get you oriented with SPSS for Windows. It teaches you how to enter and save data in SPSS, how to edit and transform data, how to explore your data by producing graphics and summary descriptives, and how to use pointing and clicking to run statistical procedures.
Compatibility between different versions of SPSS and PASW Statistics
SPSS data files (files ending in .sav) and syntax (command) files (files ending in .sps) are compatible between different versions of SPSS (at least, versions 11.0 or newer). However, SPSS viewer/output files (files ending in .spv) are NOT always compatible between different versions. One option for avoiding compatibility problems between different versions of SPSS is to export your output in HTML or MS Word format. The compatibility between Windows and Mac versions of SPSS is also limited.
SPSS Windows & Menus
An overview of the SPSS windows, menus, toolbars, and dialog boxes is given in the SPSS Tutorials under Help. You can also find information under Topics, Case Studies, Statistics Coach, and Command & Syntax (if you are using syntax commands).
Window Types
Data Editor. When you start an SPSS session, you usually see the Data Editor window (otherwise you will see a Viewer window). The Data Editor displays the contents of the working data file. There are two views in the Data Editor window: 1) Data View, which displays the data in a spreadsheet format with variable names as the column headings, and 2) Variable View, which displays information about the variables in your data set. In Data View you can enter or edit data, and in Variable View you can change the format of a variable, add variable and value labels, etc.
Viewer (Output). Statistical results and graphs are displayed in the Viewer window. The (output) Viewer window is divided into two panes. The right-hand pane contains all the output and the left-hand pane contains a tree-structure outline of the results. You can use the left-hand pane for navigating through, editing, and printing your results.
Chart Editor. The Chart Editor is used to edit graphs. When you double-click on a figure or graph, it opens in a Chart Editor window.
Syntax Editor. The Syntax Editor is used to create SPSS command syntax for using the SPSS production facility. Usually you will be using the point-and-click facilities of SPSS, and hence, you will not need to use the Syntax Editor. More information about the Syntax Editor and using SPSS syntax is given in the SPSS Help Tutorials under Working with Syntax. A few instructions to get you started are given later in the handout in the section Running SPSS using Syntax (or Command Language).
Menus
Data Editor Menu:
File. Use the File menu to create a new SPSS file, open an existing file, or read in spreadsheet or database files created by other software programs (e.g., Excel).
Edit. Use the Edit menu to modify or copy data and output files.
View. Choose which buttons are available in the window or how the window should look.
Data. Use the Data menu to make changes to SPSS data files, such as merging files, transposing variables, or creating subsets of cases for subset analysis.
Transform. Use the Transform menu to make changes to selected variables in the data file (e.g., to recode a variable) and to compute new variables based on existing variables.
Analyze. Use the Analyze menu to select the various statistical procedures you want to use, such as descriptive statistics, cross-tabulation, hypothesis testing and regression analysis.
Graphs. Use the Graphs menu to display the data using bar charts, histograms, scatterplots, boxplots, or other graphical displays. All graphs can be customized with the Chart Editor.
Utilities. Use the Utilities menu to view variable labels for each variable.
Add-ons. Information about other SPSS software.
Window. Choose which window you want to view.
Help. Index of help topics, tutorials, SPSS home page, Statistics Coach, and version of SPSS.
Viewer Menu: The menu is similar to the Data Editor menu, but has two additional options:
Insert. Use the Insert menu to edit your output.
Format. Use the Format menu to change the format of your output.
Chart Editor Menu: Use SPSS Help to learn more about the Chart Editor.
Toolbars
Most Windows applications provide buttons arranged along the top of a window that act as shortcuts to executing various functions. In SPSS, you will find such buttons (icons) at the top of the Data Editor, Viewer, Chart Editor, and Syntax windows. The icons are usually symbolic representations of the procedures they execute when clicked; unfortunately, their meanings are not intuitively obvious until you have already used them. Hence, the best way to learn these buttons is to use them and note what happens.
The Status Bar
The Status Bar runs along the bottom of a window and alerts the user to the status of the system. Typical messages you will see are “Processor is ready” and “Running procedure…”. The Status Bar also provides up-to-date information about special manipulations of the data file, such as whether only certain cases are being used in an analysis or whether the data have been weighted according to the value of some variable.
File Types
Data Files. A file with an extension of .sav is assumed to be a data file in SPSS for Windows format. A file with an extension of .por is a portable SPSS data file. The contents of a data file are displayed in the Data Editor window.
Viewer (Output) Files. A file with an extension of .spv is assumed to be a Viewer file containing statistical results and graphs.
Syntax (Command) Files. A file with an extension of .sps is assumed to be a Syntax file containing SPSS syntax and commands.
Getting Data into SPSS & Editing Data
When you read data into SPSS or edit data, the data are displayed in the Data Editor window. An overview of the basic structure of an SPSS data file is given in the SPSS Help Tutorials:
1. Choose Help on the menu bar
2. Choose Tutorial
3. Choose Reading Data
Reading Data from an SPSS Data (.sav) File
To read a data file that was created and saved using SPSS from your computer/floppy disk/flash drive (the filename should end with the suffix .sav), at the start of an SPSS session:
1. Choose Open an existing data source
2. Double click on the filename, or
3. Single click on the filename and choose OK
Or, at any time during a session:
1. Choose File on the menu bar
2. Choose Open
3. Choose Data...
4. Edit the directory or disk drive to indicate where the data are located
5. Double click on the filename, or
6. Single click on the filename and choose Open
Reading Data from a Text Data File
To read a raw text (ASCII) data file from your computer/floppy disk/flash drive, where the data for each observation are on a separate line and a space is used to separate variables on the same line (i.e., the file format is freefield; the filename should end with the suffix .dat):
1. Choose File on the menu bar
2. Choose Read Text Data
3. Choose Files of Type *.dat
4. Edit the directory or disk drive to indicate where the data are located
5. Double click on the filename, or
6. Single click on the filename and choose Open
7. Follow the Import Wizard instructions.
You can also get to the Import Wizard as follows:
1. Choose File on the menu bar
2. Choose Open
3. Choose Data...
4. Choose Files of Type *.dat
5. Edit the directory or disk drive to indicate where the data are located
6. Double click on the filename, or
7. Single click on the filename and choose Open
8. Follow the Import Wizard instructions.
Instructions on how to read a text data file in fixed format are located in SPSS Help Tutorials under Reading Data from a Text File.
Reading Data from Other Types of External Files
SPSS allows you to read a variety of other types of external files, such as Excel spreadsheet files, SAS data files, and Stata data files. To read data from other types of external files, follow the same steps as you would for reading an SPSS save file, except that you specify the file type according to the program that was used to create the file. For further instructions on how to read data from other types of external files, see the SPSS for Windows Base System User's Guide on data files or the SPSS Help Tutorials.
Entering and Editing Data Using the Data Editor
The Data Editor provides a convenient spreadsheet-like facility for entering, editing, and displaying the contents of your data file. A Data Editor window opens automatically when you start an SPSS session. Instructions on using the Data Editor to enter data are given in the SPSS Help Tutorials. Note that if you are already familiar with entering data into a different spreadsheet program (e.g., MS Excel), you might find it easier to enter your data in the program you are familiar with and then read the data into SPSS.
Entering Data. Basic data entry in the Data Editor is simple:
Step 1. Create a new (empty) Data Editor window. At the start of an SPSS session a new (empty) Data Editor window opens automatically. During an SPSS session you can create a new Data Editor window as follows:
1. Choose File
2. Choose New
3. Choose Data
Step 2. Move the cursor to the first empty column.
Step 3. Type a value into the cell. As you type, the value appears in the cell editor at the top of the Data Editor window. Each time you press the Enter key, the value is entered in the cell and you move down to the next row. By entering data in a column, you automatically create a variable and SPSS gives it the default variable name var00001.
Step 4. Choose the first cell in the next column. You can use the mouse to click on the cell or use the arrow keys on the keyboard to move to the cell. By default, SPSS names the data in the second column var00002.
Step 5. Repeat steps 3 and 4 until you have entered all the data. If you entered any incorrect values, you will need to edit your data. See the following section on Editing Data.
Editing Data. With the Data Editor, you can modify a data file in many ways. For example, you can change values; cut, copy, and paste values; or add and delete cases.
To Change a Data Value:
1. Click on a data cell. The cell value is displayed in the cell editor.
2. Type the new value. It replaces the old value in the cell editor.
3. Press the Enter key. The new value appears in the data cell.
To Cut, Copy, and Paste Data Values:
1. Select (highlight) the cell value(s) you want to cut or copy.
2. Pull down the Edit menu on the main menu bar.
3. Choose Cut. The selected cell values will be copied, then deleted. Or
4. Choose Copy. The selected cell values will be copied, but not deleted.
5. Select the target cell(s) (where you want to put the cut or copied values).
6. Pull down the Edit menu on the main menu bar.
7. Choose Paste. The cut or copied values will be “pasted” in the target cells.
To Delete a Case (i.e., a Row of Data):
1. Click on the case number on the left side of the row. The whole row will be highlighted.
2. Pull down the Edit menu on the main menu bar.
3. Choose Clear.
To Add a Case (i.e., a Row of Data):
1. Select any cell in the row below where you want to insert the new case.
2. Pull down the Data menu on the main menu bar.
3. Choose Insert.
Defining Variables. The default name for a new variable is the prefix var followed by a sequential five-digit number (e.g., var00001, var00002, var00003). To change the name, format, and other attributes of a variable:
1. Double click on the variable name at the top of a column, or
2. Click on the Variable View tab at the bottom of the Data Editor window.
3. Edit the variable name in the column labeled Name. In SPSS 19 a variable name can be up to 64 bytes long (versions before 12.0 limited names to eight characters) and cannot contain spaces. You can also specify the number of decimal places (under Decimals), assign a descriptive label (under Label), define missing values (under Missing), define the level of measurement (under Measure; e.g., scale, ordinal, nominal), and define value labels for categorical variables (under Values).
After the data are entered (or several times during data entry), you will want to save them as an SPSS data file. See the section on Saving Data as an SPSS Data (.sav) File.
Reading an SPSS Viewer/Output (.spv) File
Statistical results and graphs are displayed in the Viewer window. An overview of how to use the Viewer is given in the SPSS Help Tutorials under Working with Output.
If you saved the contents of the Viewer window during an earlier SPSS session, you can use the following commands to display the Viewer (output) results in the current SPSS session. However, SPSS output/viewer files (files ending in .spv) are NOT always compatible between different versions. Usually SPSS output files created with an older version can be read by a newer version, but an output file created using a newer version cannot be read by an older version. One option for avoiding compatibility problems between different versions of SPSS is to export your output in HTML or MS Word format. The compatibility between Windows and Mac versions of SPSS is also limited.
To read a Viewer file that was created and saved using SPSS from your computer/floppy disk/flash drive (the filename should end with the suffix .spv):
1. Choose File on the menu bar
2. Choose Open
3. Choose Output...
4. Edit the directory or disk drive to indicate where the file is located
5. Double click on the filename, or
6. Single click on the filename and choose Open
Editing Your Output
Editing the statistical results and graphs in the Viewer window is beyond the scope of this handout. Instructions on how to edit your output are given in the SPSS Help Tutorials under Working with Output and Creating and Editing Charts.
You can use either the tree-structure in the left hand pane or the results displayed in the right hand pane to select, move or delete parts of the output.
To edit a table or object (an object is a group of results), you first need to double click on the table/object so an “editing box” appears around the table/object, and then select the value you want to modify. An “editing box” will be a ragged box outlining the table. If you only single click, you will get a box with straight/plain lines outlining the table. In general, to create “nice looking” tables of your results, it is often easier to hand enter the values into a blank MS Word table than to edit an SPSS table/object (either in SPSS or MS Word).
To edit a chart, you first need to double click on the chart so it appears in a new Chart Editor window. After you are done editing the chart, close the Chart Editor window and then export the chart, for example to a Windows metafile, and insert it into an MS Word file.
By default, SPSS displays P-values to three decimal places, so a P-value less than .0005 is displayed as .000. You can report such a P-value as <.001, or have SPSS display more digits:
1. In an SPSS (output) Viewer window, double click (with the left mouse button) on the table containing the P-value you want to display differently. An “editing box” should appear around the table.
2. Click on the p-value using the right mouse button.
3. Choose Cell Properties. (If you do not get this option, you need to double click on the table to get the ragged box.)
4. Change the number of decimals to the desired number (default is 3).
5. Choose OK.
Alternatively, double click on the P-value with the left mouse button and SPSS will display the P-value with more significant digits. If the P-value is very small, it will be displayed in scientific notation (e.g., 1.745E-10 = 0.0000000001745).
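The reporting rule above can be written as a small helper. This is a Python sketch, not an SPSS feature: the function name format_p and its decimals argument are hypothetical. It reports any P-value below 10^-decimals as, for example, <.001 instead of .000, and otherwise uses SPSS's style of dropping the leading zero.

```python
def format_p(p: float, decimals: int = 3) -> str:
    """Format a P-value for reporting (hypothetical helper, not SPSS code).

    P-values too small to show at the chosen number of decimals are
    reported as '<.001' (for decimals=3) instead of '.000'.
    """
    threshold = 10 ** -decimals
    if p < threshold:
        # e.g. '<' + '.001' when decimals=3
        return "<" + f"{threshold:.{decimals}f}".lstrip("0")
    # Round to the requested decimals and drop the leading zero, as SPSS does
    return f"{p:.{decimals}f}".lstrip("0")


print(format_p(0.0004))      # <.001
print(format_p(0.0123))      # .012
# Writing out a value that SPSS shows in scientific notation as 1.745E-10
print(f"{1.745e-10:.13f}")   # 0.0000000001745
```

A P-value of 1.0 is returned as 1.000, because only leading zeros are stripped.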
Saving Data as an SPSS Data (.sav) File
To save data as a new SPSS data file onto your computer/floppy disk/flash drive:
1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data you want to save.)
2. Choose File on the menu bar.
3. Choose Save As...
4. Edit the directory or disk drive to indicate where the data should be saved. SPSS will automatically add the .sav suffix to the filename.
5. Choose Save
To save data changes in an existing SPSS data file:
1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data you want to save.)
2. Choose File on the menu bar
3. Choose Save
Caution. The Save command saves the modified data by overwriting the previous version of the file.
You can save your data in other formats besides an SPSS save file (e.g., as an ASCII file, Excel file, SAS data set). To save your data with a given format you follow the same steps as saving data in a new SPSS Save file, except that you specify the Save as Type as the desired format.
Saving Your Output (Statistical Results and Graphs)
To save the statistical results and graphs displayed in the Viewer window as a new SPSS Output file:
1. Display the Viewer window (i.e., execute the following commands while in the Viewer window displaying the results you want to save.)
2. Choose File on the menu bar.
3. Choose Save As...
4. Edit the directory or disk drive to indicate where the output should be saved. SPSS will automatically add the .spv suffix to the filename.
5. Choose Save
To save Viewer changes in an existing SPSS output file:
1. Display the Viewer window (i.e., execute the following commands while in the Viewer window displaying the results you want to save.)
2. Choose File on the menu bar.
3. Choose Save.
Caution. The Save command saves the modified Viewer window by overwriting the previous version of the file.
NOTE that you may not be able to open SPSS output that was created with a different version of SPSS than the one you are using (in particular, output created with a newer version cannot be opened with an older version). You can avoid this incompatibility problem by exporting your output in HTML or MS Word format (see Exporting SPSS Output).
Exporting SPSS Output
Sometimes you will want to save your SPSS output in a file format other than an SPSS output file: to avoid compatibility problems between different versions of SPSS, to further edit your output in a Word document, or to include graphs or figures in another document. To export SPSS output to another file type, while in an SPSS (output) Viewer window:
1. Choose File
2. Choose Export
3. Objects to Export: Choose what you want to export
All: Exports all the output and other information not shown in the output. You usually do not want to use this option.
All visible: Exports all visible output
Selected: Exports only output that is selected or highlighted in the Viewer window
4. Document Type: Choose the type of file or format you want to use to save your results.
Word/RTF (*.doc) is a good option. Numerical and graphical output will be saved in the same file.
With the HTML option, numerical output will be saved in one file and each graph will be saved in a separate file.
5. Document File Name: Enter the file name and location.
6. Choose OK (or Paste)
Printing Your Work in SPSS
To print statistical results and graphs in the Viewer window or data in the Data Editor window:
1. Display the output or data you want to print (i.e., execute the following commands while in a Viewer/output or data window)
2. Choose File on the menu bar.
3. Choose Print...
4. Choose All visible output or Selected output (if you have selected parts of the output).
NOTE: There is no printing capability at the Seattle Downtown Campus classroom location.
Exiting SPSS
To exit SPSS:
1. Choose File on the menu bar
2. Choose Exit SPSS
If you have made changes to the data file or the output file since the last time you saved these files, you will be asked before exiting SPSS whether you want to save the contents of the Data Editor window and the Viewer window. If you are unsure whether you want to save the contents of the data or output window, choose Cancel, then display the window(s), and if you want to save the contents of a window, follow the instructions in this handout for saving data or output windows. When saving the contents of a window, SPSS overwrites the previously saved version of the file.
Running SPSS using Syntax (or Command Language)
This handout describes how to run various statistical summaries and procedures using the point-and-click menus in SPSS. However, it is also possible to run SPSS commands using SPSS syntax (command language). If you are running similar analyses repeatedly, it can be more efficient to run your analysis using SPSS syntax. How to run SPSS using the syntax/command language is beyond the scope of this handout. Help on running SPSS using the syntax/command language can be found in the SPSS Tutorials under Working with Syntax.
To get you started using SPSS syntax, follow the point-and-click instructions for running a particular analysis, but select Paste instead of OK at the last step. A Syntax Editor window will open containing the SPSS syntax for running the analysis. To run the analysis, you can choose Run on the menu bar, or you can highlight the syntax you want to run, click the right mouse button, and select Run Selection. You can add more syntax to the Syntax Editor window by using the point-and-click method, selecting Paste instead of OK at the last step. The additional syntax will be added at the bottom of the Syntax Editor window. You can also write syntax directly into the syntax file and/or use copy, paste, and editing commands to modify the syntax. Remember to save your syntax file before exiting SPSS; the filename should end in .sps. You can open a syntax file by choosing File on the menu bar, then Open, and then Syntax…
Here’s an example of SPSS syntax.
This syntax runs a two-sample t test comparing HDL cholesterol (hdl) for subjects without and with CHD (incchd, coded 0 for no and 1 for yes).
This syntax creates three indicator variables, neversmoker, formersmoker, and currentsmoker, for smoking status (smoke).
Note that a period (.) is used to denote the end of each command, and Execute. is sometimes required to run the syntax.
Comments can be added between the symbols /* and */, or on a line beginning with *, to help you remember what the syntax is doing.
Displaying Variable Names or Variable Labels
When running SPSS via the menus, you can have either the variable labels or the variable names displayed in the dialog boxes.
Here is an example of the variable labels being displayed. The variable name is also (always) displayed in brackets after the variable label.
Here is an example of the variable name being displayed.
To select whether the variable labels or names display:
1. Choose Edit
2. Choose Options
3. Choose General
4. Select Display labels (or Display names)
Creating and Recoding Variables
Creating a New Variable
To create a new variable:
1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to create a new variable).
2. Choose Transform on the menu bar
3. Choose Compute Variable...
4. Enter the new variable name in the Target Variable box.
5. Enter the definition of the new variable in the Numeric Expression box (e.g., SQRT(visan), LN(age), or MEAN with a list of variables, which averages the listed variables within each case), or
6. Select variable(s) and combine them with the desired arithmetic operations and/or functions.
7. Choose OK
After creating new variables, you will probably want to save them by re-saving your data using the Save command under File on the menu bar (see Saving Data as an SPSS Data (.sav) File). Further instructions on creating a new variable are given in the SPSS Help Tutorials under Modifying Data Values.
Example: Creating a (New) Transformed Variable
You can use the SPSS commands for creating a new variable to create a transformed variable. Suppose you have a variable indicating triglyceride level, trig, and you want to transform this variable using the natural logarithm to make the distribution less skewed (i.e., you want to create a new variable that is the natural logarithm of triglyceride level).
1. Display the Data Editor window
2. Choose Transform on the menu bar
3. Choose Compute Variable...
4. Enter, say, lntrig, in the Target Variable box.
5. Enter Ln(trig) in the Numeric Expression box.
6. Choose OK
Now, a new variable, lntrig, which is the natural logarithm of trig, will be added to your data set. Remember to save your data set before exiting SPSS (e.g., while in the SPSS Data Editor window, choose Save under File or click on the floppy disk icon).
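What Compute does with Ln(trig) can be sketched in Python as below: the expression is evaluated separately for each case (row). This is not SPSS code, and the function name compute_ln and the trig values are hypothetical. Treating missing or non-positive values as missing (None) is an assumption made here because the natural logarithm is undefined for them; it is chosen to mirror SPSS's system-missing value.

```python
import math
from typing import List, Optional, Sequence


def compute_ln(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Natural log of each case, like Compute with Ln(trig) (sketch only).

    Missing (None) or non-positive values have no logarithm, so this
    sketch returns None (missing) for them.
    """
    result: List[Optional[float]] = []
    for v in values:
        if v is None or v <= 0:
            result.append(None)          # log undefined: treat as missing
        else:
            result.append(math.log(v))   # math.log with one argument is ln
    return result


trig = [150.0, None, 0.0, math.e]   # hypothetical triglyceride values
lntrig = compute_ln(trig)
print(lntrig[1:3])                  # [None, None]
```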
Recoding or Combining Categories of a Variable
To recode or combine categories of a variable:
1. Display the Data Editor window (i.e., execute the following commands while in the Data Editor window displaying the data file you want to use to recode variables).
2. Choose Transform on the menu bar
3. Choose Recode
4. Choose Into Same Variables... or Into Different Variables...
5. Select a variable to recode from the variable list on the left and then click on the arrow located in the middle of the window. This defines the input variable.
6. If recoding into a different variable, enter the new variable name in the box under Name:, then choose Change. This defines the output variable.
7. Choose Old and New Values...
8. Choose Value or Range under Old Value and enter the old value(s).
9. Choose New Value and enter the new value, then choose Add.
10. Repeat the process until all old values have been redefined.
11. Choose Continue
12. Choose OK
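The logic of Recode Into Different Variables can be sketched in Python as follows. This is not SPSS code, and the names recode_into_different and Rule are hypothetical. The sketch assumes two behaviors that we understand SPSS to follow. First, old-value rules are checked in the order they were added, and the first match is used. Second, values that no rule covers become missing in the new variable, which SPSS represents as system-missing. Adding an All Other Values rule, as in the examples below, avoids relying on that second behavior.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A rule pairs a test on the old value with the new value to assign.
Rule = Tuple[Callable[[float], bool], float]


def recode_into_different(values: Sequence[Optional[float]],
                          rules: Sequence[Rule]) -> List[Optional[float]]:
    """Recode old values into a new list; the input list is not modified.

    Rules are checked in the order given, and the first matching rule is
    used. Values matched by no rule, and missing values (None), become None.
    """
    new_values: List[Optional[float]] = []
    for v in values:
        new_v: Optional[float] = None
        if v is not None:
            for test, new in rules:
                if test(v):
                    new_v = new
                    break  # first matching rule wins
        new_values.append(new_v)
    return new_values


# Hypothetical example: combine codes 2 and 3 into one category.
smoke = [1, 2, 3, None, 9]   # 9 is an unexpected code that no rule covers
rules = [(lambda x: x == 1, 1), (lambda x: x in (2, 3), 2)]
print(recode_into_different(smoke, rules))   # [1, 2, 2, None, None]
```

Because the result is a new list, the original codes stay intact. This matches the handout's advice to recode into a different variable.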
After creating new variables, you will probably want to save them by re-saving your data using the Save command under File on the menu bar (see Saving Data as an SPSS Data (.sav) File).
Example: Recoding a Categorical Variable
You can use the commands for recoding a variable to change the coding values of a categorical variable. You may want to change a coding value for a particular category to modify which category SPSS uses as the referent category in a statistical procedure. For example, suppose you want to perform linear regression using the ANOVA (or General Linear Model) commands, and one of your independent variables is smoking status, smoke, that is coded 1 for never smoked, 2 for former smoker and 3 for current smoker. By default SPSS will use current smoker as the referent category because current smoker has the largest numerical (code) value. If you want never smoked to be the referent category you need to recode the value for never smoked to a value larger than 3.
Although you can recode the smoking status into the same variable, it is better to recode the variable into a new/different variable, newsmoke, so you do not lose your original data if you make an error while recoding.
1. Display the Data Editor window
2. Choose Transform
3. Choose Recode
4. Choose Into Different Variables...
5. Select the variable smoke as the Input variable
6. Enter newsmoke as the name of the Output variable, and then choose Change.
7. Choose Old and New Values...
8. Choose Value under Old Value. (It may already be selected.)
9. Enter 1 (code for never smoker)
10. Choose Value under New Value. (It may already be selected.)
11. Enter 4 (or any value greater than 3)
12. Choose Add
13. Choose All Other Values under Old Value.
14. Choose Copy Old Value(s) under New Value.
15. Choose Add
16. Choose Continue
17. Choose OK
Remember to save your data set before exiting SPSS.
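The effect of this recode on the referent category can be sketched in Python. The smoke values below are made-up example cases, and the function names are hypothetical. As described above, the sketch assumes that the category with the largest code is used as the referent.

```python
from typing import Dict, List, Sequence


def recode_with_copy(values: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Recode the listed codes and copy all other values unchanged
    (Value -> New Value, then All Other Values -> Copy Old Value(s))."""
    return [mapping.get(v, v) for v in values]


def reference_category(values: Sequence[int]) -> int:
    """Referent category, assuming the largest code is the referent."""
    return max(values)


smoke = [1, 2, 3, 1, 3, 2]   # hypothetical cases: 1=never, 2=former, 3=current
newsmoke = recode_with_copy(smoke, {1: 4})
print(newsmoke)                      # [4, 2, 3, 4, 3, 2]
print(reference_category(smoke))     # 3 (current smoker)
print(reference_category(newsmoke))  # 4 (never smoked)
```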
Example: Creating Indicator or Dummy Variables
You can use the commands for recoding a variable to create indicator or dummy variables in SPSS. Suppose you have a variable indicating smoking status, smoke, that is coded 1 for never smoked, 2 for former smoker and 3 for current smoker. To create three new indicator or dummy variables for never, former and current smoking:
1. Display the Data Editor window
2. Choose Transform
3. Choose Recode
4. Choose Into Different Variables...
5. Select the variable smoke as the Input variable
6. Enter neversmoke as the name of the Output variable, and then choose Change.
7. Choose Old and New Values...
8. Choose Value under Old Value. (It may already be selected.)
9. Enter 1 (code value for never smoker)
10. Choose Value under New Value. (It may already be selected.)
11. Enter 1 (to indicate never smoker)
12. Choose Add
13. Choose All Other Values under Old Value.
14. Choose Value under New Value.
15. Enter 0
16. Choose Add
17. Choose Continue
18. Choose OK
Now, you have created a binary indicator variable for never smoker (coded 1 if never smoker, 0 if former or current smoker). Next, create a binary indicator variable for former smoker.
1. Display the Data Editor window
2. Choose Transform
3. Choose Recode
4. Choose Into Different Variables...
5. Select the variable smoke as the Input variable
6. Enter formersmoke as the name of the Output variable, and then choose Change. (Or edit neversmoke to formersmoke, and then choose Change.)
7. Choose Old and New Values...
8. Choose 1→1 under Old→New and then choose Remove.
9. Choose Value under Old Value.
10. Enter 2 (code value for former smoker)
11. Choose Value under New Value.
12. Enter 1 (to indicate former smoker)
13. Choose Add
14. Choose Continue
15. Choose OK
Now, you have created a binary indicator variable for former smoker (coded 1 if former smoker, 0 if never or current smoker). To create a binary indicator variable for current smoker, use commands similar to those for creating the indicator variable for former smoker, except that now the value of 3 for smoke is coded as 1 and all other values are coded as 0.
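The three indicator variables can be sketched in Python as below. The names neversmoke and formersmoke come from the steps above. The name currentsmoke is hypothetical, and the smoke values are made-up example cases. Each indicator is 1 for its code and 0 otherwise, mirroring Value → 1 followed by All Other Values → 0. In this sketch a missing smoke value would also be coded 0. If your data contain missing values, you may prefer to keep them missing.

```python
from typing import Dict, List, Sequence


def indicator(values: Sequence[int], code: int) -> List[int]:
    """1 if the case has the given code, otherwise 0
    (Value -> 1, then All Other Values -> 0)."""
    return [1 if v == code else 0 for v in values]


# smoke codes: 1 = never, 2 = former, 3 = current ("currentsmoke" is hypothetical)
SMOKE_CODES: Dict[str, int] = {"neversmoke": 1, "formersmoke": 2, "currentsmoke": 3}

smoke = [1, 2, 3, 1, 3, 2]   # hypothetical cases
dummies = {name: indicator(smoke, code) for name, code in SMOKE_CODES.items()}
for name, column in dummies.items():
    print(name, column)
```

Because every case has exactly one smoking status, the three indicators sum to 1 for each case. This is why regression models include only two of them, leaving the third category as the referent.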
Example: Creating a Categorical Variable From a Numerical Variable
You can use the commands for recoding a variable to create a categorical variable from a numerical variable (i.e., group values of the numerical variable into categories). For example, suppose you have a variable that is the number of pack years smoked, packyrs, and you want to create a categorical variable with the four categories, 0, >0 to 10, >10 to 30, and >30 pack years smoked.
Note that you may want to use different coding values depending on which category you want to be the reference category in certain statistical procedures. Remember to save your data set before exiting SPSS.
1. Display the Data Editor window
2. Choose Transform
3. Choose Recode
4. Choose Into Different Variables...
5. Select the variable packyrs as the Input variable
6. Enter packcat as the name of the Output variable, and then choose Change.
7. Choose Old and New Values...
8. Choose Value under Old Value. (It may already be selected.)
9. Enter 0
10. Choose Value under New Value.
11. Enter 0 (to indicate 0 pack years)
12. Choose Add
13. Choose Range under Old Value.
14. Enter 0.01 and 10 in the two blank boxes.
15. Choose Value under New Value
16. Enter 1 (to indicate >0 to 10 pack years)
17. Choose Add
18. Choose Range under Old Value.
19. Enter 10.01 and 30 in the two blank boxes.
20. Choose Value under New Value
21. Enter 2 (to indicate >10 to 30 pack years)
22. Choose Add
23. Choose Range, value through HIGHEST under Old Value.
24. Enter 30.01 in the blank box.
25. Choose Value under New Value
26. Enter 3 (to indicate >30 pack years)
27. Choose Add
28. Choose Continue
29. Choose OK
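The grouping rule behind packcat can be written as a short function. This is a minimal sketch with a hypothetical function name and made-up values. It uses the intended category boundaries (0, >0 to 10, >10 to 30, >30) directly. The dialog ranges above start at 0.01 and 10.01, which assumes pack years are recorded to no more than two decimal places. A value such as 10.005 would not match any of the ranges entered above.

```python
def packyrs_category(packyrs):
    """Group pack years into 0 = none, 1 = >0 to 10, 2 = >10 to 30, 3 = >30.

    Returns None for a missing value; negative pack years are rejected.
    """
    if packyrs is None:
        return None
    if packyrs < 0:
        raise ValueError("pack years cannot be negative")
    if packyrs == 0:
        return 0
    if packyrs <= 10:
        return 1
    if packyrs <= 30:
        return 2
    return 3

# Hypothetical pack-year values, including the boundary values 10 and 30
values = [0, 0.5, 10, 10.5, 30, 30.01, 45, None]
print([packyrs_category(v) for v in values])  # [0, 1, 1, 2, 2, 3, 3, None]
```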
Summarizing Your Data
Frequency Tables (& Bar Charts) for Categorical Variables. To produce frequency tables and bar charts for categorical variables:
1. Choose Analyze from the menu bar
2. Choose Descriptive Statistics
3. Choose Frequencies…
4. Variable(s): To select the variables you want from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located in the middle of the window. Repeat the process until you have selected all the variables you want.
5. Choose Charts (Skip to step 8 if you do not want bar charts.)
6. Choose Bar Chart(s)
7. Choose Continue
8. Choose OK
Example: Frequency table and bar chart for the categorical variable, smoking status (smoke).
Frequency table and bar chart of smoking status
Smoking status is the selected variable, and under Charts… Bar charts has been selected.
[Bar chart: Percent by Smoking status (never, former, current)]

Smoking status   Frequency   Percent   Valid Percent   Cumulative Percent
never                  590      59.0            59.0                 59.0
former                 293      29.3            29.3                 88.3
current                117      11.7            11.7                100.0
Total                 1000     100.0           100.0
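The Percent and Cumulative Percent columns are simple arithmetic on the counts. A minimal sketch, assuming there are no missing values (so Valid Percent equals Percent), reproduces the table above from its three counts. The function name is hypothetical.

```python
def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent).

    Assumes no missing values, so Valid Percent equals Percent.
    """
    total = sum(counts.values())
    rows = []
    running = 0
    for category, n in counts.items():  # dict order = display order
        running += n
        rows.append((category, n,
                     round(100 * n / total, 1),
                     round(100 * running / total, 1)))
    return rows

counts = {"never": 590, "former": 293, "current": 117}  # counts from the table above
for row in frequency_table(counts):
    print(row)
# ('never', 590, 59.0, 59.0)
# ('former', 293, 29.3, 88.3)
# ('current', 117, 11.7, 100.0)
```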
Contingency Tables for Categorical Variables. To produce contingency tables for categorical variables:
1. Choose Analyze from the menu bar.
2. Choose Descriptive Statistics
3. Choose Crosstabs...
4. Row(s): Select the row variable you want from the source list on the left and then click on the arrow located next to the Row(s) box. Repeat the process until you have selected all the row variables you want.
5. Column(s): Select the column variable you want from the source list on the left and then click on the arrow located next to the Column(s) box. Repeat the process until you have selected all the column variables you want.
6. Choose Cells...
7. Choose the cell values (e.g., observed counts; row, column, and margin (total) percentages). Note that an option is selected when its little box is not empty.
8. Choose Continue
9. Choose OK
Example: Contingency table of smoking status by coronary heart disease (CHD).
Smoking status * Incident CHD Crosstabulation

                                             Incident CHD
Smoking status                               no        yes       Total
never     Count                              537       53        590
          % within Smoking status            91.0%     9.0%      100.0%
former    Count                              257       36        293
          % within Smoking status            87.7%     12.3%     100.0%
current   Count                              106       11        117
          % within Smoking status            90.6%     9.4%      100.0%
Total     Count                              900       100       1000
          % within Smoking status            90.0%     10.0%     100.0%
Smoking status is the row variable and CHD is the column variable. Observed counts and row percentages will be displayed.
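Each "% within Smoking status" value is a cell count divided by its row total. The minimal sketch below recomputes the row percentages (and the Total row) from the observed counts in the table above. The function and variable names are hypothetical.

```python
def row_percentages(table):
    """Percent of each row's total in each cell, rounded to 1 decimal."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * n / row_total, 1) for col, n in cells.items()}
    return result

# Observed counts from the Smoking status * Incident CHD table above
chd = {
    "never":   {"no": 537, "yes": 53},
    "former":  {"no": 257, "yes": 36},
    "current": {"no": 106, "yes": 11},
}
# Column totals form the Total row
chd["Total"] = {col: sum(chd[row][col] for row in ("never", "former", "current"))
                for col in ("no", "yes")}

for row, pcts in row_percentages(chd).items():
    print(row, pcts)
# never {'no': 91.0, 'yes': 9.0}
# former {'no': 87.7, 'yes': 12.3}
# current {'no': 90.6, 'yes': 9.4}
# Total {'no': 90.0, 'yes': 10.0}
```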
Descriptive Statistics (& Histograms) for Numerical Variables. To produce descriptive statistics and histograms for numerical variables:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Frequencies...
4. Variable(s): To select the variables you want from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located in the middle of the window. Repeat the process until you have selected all the variables you want.
5. Choose Display frequency tables to turn off the option. Note that the option is turned off when its little box is empty.
6. Choose Statistics
7. Choose summary measures (e.g., mean, median, standard deviation, minimum, maximum, skewness or kurtosis).
8. Choose Continue
9. Choose Charts (Skip to step 12 if you do not want histograms.)
10. Choose Histograms
11. Choose Continue
12. Choose OK
An alternative way to produce only the descriptive statistics is to choose Descriptives... instead of Frequencies... at step 3 and then select the variables you want. By default, SPSS computes the mean, standard deviation, minimum, and maximum. Choose Options... to select other summary measures.
Example: Descriptive summaries and histogram for the numerical variable age. Age is the variable to summarize; you can select more than one variable to analyze. Remember to turn off the Display frequency tables option. Under Statistics…, mean, standard deviation, minimum, and maximum were selected; under Charts…, Histograms was selected.

Summaries for Age

Statistics: Age
N                Valid      1000
                 Missing       0
Mean                       72.14
Std. Deviation             5.275
Minimum                       65
Maximum                       90

[Histogram of Age: Frequency vs. Age; Mean = 72.14, Std. Dev. = 5.275, N = 1,000]
Descriptive Statistics (& Boxplots) by Groups for Numerical Variables. To produce descriptive statistics and boxplots by groups for numerical variables:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Explore...
4. Dependent List: To select the variables you want to summarize from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located next to the dependent list box. Repeat the process until you have selected all the variables you want.
5. Factor List: To select the variables you want to use to define the groups from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located next to the factor list box.
6. Choose Plots... (If you do not want boxplots, choose Statistics for the Display option and skip to step 11.)
7. Choose Factor levels together from the Boxplot box.
8. Select the Stem-and-leaf option in the Descriptive box to turn off the option.
9. Choose Continue
10. Choose Both for the Display option
11. Choose OK
Example: Total cholesterol by family history of heart attack (yes or no).
Under Statistics…, Descriptives is usually selected by default. Under Plots…, select the Boxplot option and unselect Stem-and-leaf. Select Percentiles under Statistics… if you want the 25th and 75th percentiles to report with the median.
In this example total cholesterol is the dependent variable; you can select more than one variable. Summaries will be computed for each group defined by family history of heart attack. Both numerical summaries and boxplots are displayed. By default, Explore produces a lot of different summaries, so you need to select what to report.

Descriptives: Total cholesterol
Family history of heart attack                                   Statistic   Std. Error
no     Mean                                                      221.93      1.417
       95% Confidence Interval for Mean   Lower Bound            219.15
                                          Upper Bound            224.72
       5% Trimmed Mean                                           221.63
       Median                                                    219.76
       Variance                                                  1350.641
       Std. Deviation                                            36.751
       Minimum                                                   111
       Maximum                                                   363
       Range                                                     252
       Interquartile Range                                       49
       Skewness                                                  .184        .094
       Kurtosis                                                  .363        .188
yes    Mean                                                      220.53      2.150
       95% Confidence Interval for Mean   Lower Bound            216.30
                                          Upper Bound            224.76
       (remaining rows cropped)

All summaries are shown for all groups; the table has been cropped in this example.

[Boxplot of Total Cholesterol by Family History of Heart Attack: Total cholesterol (100 to 400) for no and yes; outlying values are labeled with their case numbers]

The interquartile range is reported as the difference between the 75th and 25th percentiles. Request percentiles (see above) to get the 25th and 75th percentiles.
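To see how an interquartile range comes from the 25th and 75th percentiles, here is a minimal sketch on hypothetical cholesterol values. Percentile definitions vary between programs. Python's `statistics.quantiles(..., method="exclusive")` places the p-th quantile at position (n + 1)p and interpolates. I believe this matches the weighted-average definition SPSS uses by default, but treat that as an assumption and compare it with your own SPSS output.

```python
from statistics import quantiles

# Hypothetical total cholesterol values for ten subjects
chol = [150, 170, 185, 200, 210, 220, 235, 250, 270, 300]

# method="exclusive": the p-th quantile sits at position (n + 1) * p, with interpolation
q1, median, q3 = quantiles(chol, n=4, method="exclusive")
iqr = q3 - q1
print(q1, median, q3, iqr)  # 181.25 215.0 255.0 73.75
```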
Using the Split File Option for Summaries by Groups for Categorical and Numerical Variables. The Split File option in SPSS is a convenient way to produce summaries, graphs, and run statistical procedures by groups. To activate the option:
1. Choose Data on the menu bar of the Data Editor window
2. Choose Split File
3. Choose Compare groups or Organize output by groups. The two options display the output differently. Try each option to see which works best for your needs.
4. Choose the variable that defines the groups.
5. Choose OK
Now, all the summaries, graphs, and statistical procedures you request will be done (automatically) for each group. To turn off this option:
1. Choose Data on the menu bar of the Data Editor window
2. Choose Split File
3. Choose Analyze all cases, do not create groups
4. Choose OK
Example. Use the Split File option to run summaries by family history of heart attack (yes or no).
Compare groups option will try to display the results for each group side by side when feasible.
Organize output by groups option will display the results separately for each group starting with the group with the lowest numerical code value.
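Conceptually, Organize output by groups runs the same summary once for each group, in order of the group codes. The rough Python analogue below uses hypothetical records and a hypothetical function name. It computes N, mean, and sample standard deviation of cholesterol for each family-history code.

```python
from collections import defaultdict
from statistics import mean, stdev

def summaries_by_group(records, group_var, value_var):
    """Return {group code: (N, mean, SD)}, with groups in order of their codes."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_var]].append(record[value_var])
    return {code: (len(vals), mean(vals), stdev(vals))
            for code, vals in sorted(groups.items())}

# Hypothetical records: famhist coded 0 = no, 1 = yes
records = [
    {"famhist": 1, "chol": 210},
    {"famhist": 0, "chol": 200},
    {"famhist": 0, "chol": 220},
    {"famhist": 1, "chol": 230},
    {"famhist": 0, "chol": 240},
]
for code, (n, m, sd) in summaries_by_group(records, "famhist", "chol").items():
    print(code, n, m, round(sd, 2))
# 0 3 220 20.0
# 1 2 220 14.14
```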
Using the Select Cases Option for Summaries for a Subgroup of Subjects/Observations. The Select Cases option in SPSS is a convenient way to produce summaries and run statistical procedures for a subgroup of subjects or to temporarily exclude subjects from the analysis. To activate this option:
1. Choose Data on the menu bar of the Data Editor window
2. Choose Select Cases…
3. Choose If condition is satisfied
4. Choose If…
5. Enter the expression that indicates the subjects/observations you want to select.
6. Choose Continue
7. Choose OK
Now, all the summaries, graphs, and statistical procedures you request will be done using only the selected subjects/observations. To turn off this option:
1. Choose Data on the menu bar of the Data Editor window
2. Choose Select Cases…
3. Choose All cases
4. Choose OK
Example: Select subjects not taking lipid-lowering medications (i.e., subjects with lipid = 0, indicating no medications).
Select If condition is satisfied and then If…
Caution! Usually you do not want to delete observations from your dataset, so do not select Delete unselected cases.
Typical expressions will involve combinations of the following symbols:
Symbol   Definition
=        equal
~=       not equal
>=       greater than or equal
<=       less than or equal
>        greater than
<        less than
&        and
|        or
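Selecting cases with a condition, while keeping the full dataset intact, works much like filtering a list. In the minimal sketch below, the subjects are hypothetical. The comments show how the SPSS symbols above translate: = becomes ==, ~= becomes !=, & becomes and, and | becomes or.

```python
# Hypothetical subjects; lipid = 1 means taking lipid-lowering medication
subjects = [
    {"id": 1, "lipid": 0, "chol": 210},
    {"id": 2, "lipid": 1, "chol": 250},
    {"id": 3, "lipid": 0, "chol": 190},
    {"id": 4, "lipid": 0, "chol": 305},
]

# SPSS condition: lipid = 0
selected = [s for s in subjects if s["lipid"] == 0]

# SPSS condition: lipid = 0 & chol >= 200   (& becomes 'and'; | would become 'or')
selected_high = [s for s in subjects if s["lipid"] == 0 and s["chol"] >= 200]

# SPSS condition: lipid ~= 0   (~= becomes !=)
on_medication = [s for s in subjects if s["lipid"] != 0]

print([s["id"] for s in selected])       # [1, 3, 4]
print([s["id"] for s in selected_high])  # [1, 4]
print([s["id"] for s in on_medication])  # [2]
print(len(subjects))                     # 4: the full dataset is untouched
```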
Graphing Your Data
You can produce very fancy figures and graphs in SPSS, but producing them is beyond the scope of this handout. Instructions on producing figures and graphs can be found in SPSS Help under Topics → Contents → Building Charts and Editing Charts, as well as in the SPSS Tutorials under Creating and Editing Charts. Note that you need Internet access to use both the Help and the Tutorials. Also, the last time I tried doing a tutorial, it didn't work.
This handout covers the basic commands for creating simple graphs using the Legacy Dialogs under Graphs, rather than the newer methods using the Chart Builder.
Bar Charts
The easiest way to produce simple bar charts is to use the Bar Chart option with the Frequencies... command. See Frequency Tables (& Bar Charts) for Categorical Variables. You can produce only one bar chart at a time using the Bar command.
[Bar charts: Percent by smoking status (never, former, current); and percent by smoking status, clustered by family history of heart attack (no, yes)]
1. Choose Graphs and then Legacy Dialogs from the menu bar.
2. Choose Bar...
3. Choose Simple, Clustered, or Stacked
4. Choose what the data in the bar chart represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the Category axis.
7. Choose what the bars represent (e.g., number of cases or percentage of cases)
8. Choose OK
Histograms
The easiest way to produce simple histograms is to use the Histogram option with the Frequencies... command. See Descriptive Statistics (& Histograms) for Numerical Variables. You can produce only one histogram at a time using the Histogram command.
1. Choose Graphs and then Legacy Dialogs from the menu bar
2. Choose Histogram...
3. Select a variable from the variable list on the left and then click on the arrow in the middle of the window.
4. Choose Display normal curve if you want a normal curve superimposed on the histogram.
5. Choose OK
[Histogram: Frequency vs. Body mass index; Mean = 26.2366, Std. Dev. = 4.8667, N = 1,000]
Boxplots
The easiest way to produce simple boxplots is to use the Boxplot option with the Explore... command. See Descriptive Statistics (& Boxplots) by Groups for Numerical Variables. You can produce only one boxplot at a time using the Boxplot command.
1. Choose Graphs and then Legacy Dialogs from the menu bar.
2. Choose Boxplot...
3. Choose Simple or Clustered
4. Choose what the data in the boxplots represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the Variable box.
7. Select the variable from the variable list that defines the groups and then click on the arrow next to Category Axis.
8. Choose OK
[Boxplot: Serum fasting glucose by ADA diabetes status (normal, impaired fasting glucose, diabetic); outlying values are labeled with their case numbers]
Normal Probability Plots. To produce Normal probability plots:
1. Choose Analyze from the menu bar
2. Choose Descriptive Statistics
3. Choose Q-Q Plots... to get a plot of the quantiles (Q-Q plot) or choose P-P Plots... to get a plot of the cumulative proportions (P-P plot)
4. Select the variables from the source list on the left and then click on the arrow located in the middle of the window.
5. Choose Normal as the Test Distribution. The Normal distribution is the default Test Distribution. Other Test Distributions can be selected by clicking on the down arrow and clicking on the desired Test Distribution.
6. Choose OK
SPSS will produce both a Normal probability plot and a detrended Normal probability plot for each selected variable. Usually the Q-Q plot is the most useful for assessing if the distribution of the variable is approximately Normal.
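Each point on a Normal Q-Q plot pairs an observed value with the value you would expect if the data were exactly Normal. This minimal sketch makes several assumptions. It uses Blom's plotting positions (r - 3/8)/(n + 1/4), which I believe is the SPSS default for Q-Q plots. It also puts expected values on the data's scale using the sample mean and standard deviation. Finally, tied values simply get consecutive ranks here, which SPSS may handle differently. If the points fall close to a straight line, the data are approximately Normal.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Return (observed, expected normal value) pairs using Blom's plotting positions."""
    xs = sorted(values)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    z = NormalDist()  # standard Normal distribution
    points = []
    for rank, x in enumerate(xs, start=1):
        p = (rank - 3 / 8) / (n + 1 / 4)             # Blom's formula
        points.append((x, m + s * z.inv_cdf(p)))     # expected value on the data's scale
    return points

# Hypothetical, perfectly symmetric data
for observed, expected in normal_qq_points([1, 2, 3, 4, 5]):
    print(observed, round(expected, 3))
```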
[Figure: Normal Q-Q Plot of Serum fasting glucose (Expected Normal Value vs. Observed Value), followed by a second Normal Q-Q plot]
Error Bar Plot. To produce an error bar plot of the mean of a numerical variable (or the means for different groups of subjects):
1. Choose Graphs and then Legacy Dialogs from the menu bar.
2. Choose Error Bar...
3. Choose Simple or Clustered
4. Choose what the data in the error bars represent (e.g., summaries for groups of cases).
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the Variable box.
7. Select the variable from the variable list that defines the groups and then click on the arrow next to Category Axis.
8. Select what the bars represent (e.g., confidence interval, ±standard deviation, ±standard error of the mean)
9. Choose OK
Error Bar Plot
[Error bar plot: Mean +/- 2 SD of serum fasting glucose by ADA diabetes status (normal, impaired fasting glucose, diabetic)]
A bar chart of the mean with error bars can be made using the commands for making a bar chart:
[Bar chart: Mean serum fasting glucose by ADA diabetes status (normal, impaired fasting glucose, diabetic); error bars: +/- 2 SD]
1. Choose Graphs and then Legacy Dialogs from the menu bar.
2. Choose Bar...
3. Choose Simple
4. Choose Summaries for groups of cases
5. Choose Define
6. Select a variable from the variable list on the left and then click on the arrow next to the Category axis (e.g., diabetes status)
7. Choose Other statistic (e.g., mean). By default the mean will be selected.
8. For the Variable box, choose the variable whose mean (or other statistic) you want to display.
9. Choose Options
10. Select Display error bars
11. Select Standard deviation, and enter 2 for the Multiplier
12. Choose Continue
13. Choose OK
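With Standard deviation and a Multiplier of 2, each error bar spans the group mean plus or minus two sample standard deviations. A minimal sketch on hypothetical glucose values for one group follows. The function name is hypothetical.

```python
from statistics import mean, stdev

def mean_sd_bars(values, multiplier=2):
    """Return (lower, mean, upper) for error bars of mean +/- multiplier * SD."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 denominator
    return m - multiplier * s, m, m + multiplier * s

# Hypothetical serum fasting glucose values for one group
glucose = [90, 95, 100, 105, 110]
low, m, high = mean_sd_bars(glucose)
print(round(low, 2), m, round(high, 2))  # 84.19 100 115.81
```

Unlike a confidence interval for the mean, these bars describe the spread of individual values, so they do not shrink as the sample size grows.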
Scatter Plot. To produce a scatter plot between two numerical variables:
1. Choose Graphs and then Legacy Dialogs on the menu bar.
2. Choose Scatter/Dot...
3. Choose Simple
4. Choose Define
5. Y Axis: Select the y variable you want from the source list on the left and then click on the arrow next to the y axis box.
6. X Axis: Select the x variable you want from the source list on the left and then click on the arrow next to the x axis box.
7. Choose Titles...
8. Enter a title for the plot (e.g., y vs. x).
9. Choose Continue
10. Choose OK
[Scatter plot titled "HDL cholesterol vs BMI": HDL cholesterol vs. Body mass index]
Adding a linear regression line to a scatter plot. To add a linear regression (least-squares) line to a scatter plot of two numerical variables:
1. While in the Viewer window, double click on the scatter plot. The scatter plot should now be displayed in a window titled Chart Editor.
2. Choose Elements.
3. Choose Fit Line at Total. (A line should already be added to the plot, because the next two steps are the default options.)
4. Choose Linear (in the Properties window)
5. Choose Apply
6. Choose Close
7. Click on the "X" in the upper right hand corner of the Chart Editor window, or choose File and then Close to return to the Viewer window.
[Scatter plot titled "HDL cholesterol vs BMI" with the least-squares line; R Sq Linear = 0.121]
Additional options:
o Choose Mean under Confidence Intervals (in the Properties window) to add a confidence interval for the mean (the linear regression line) to the scatter plot, or
o Choose Individual under Confidence Intervals to add a prediction interval for individual observations to the scatter plot.
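The fitted line is the ordinary least-squares line, and the "R Sq Linear" label on the chart is its R-squared. A minimal sketch with hypothetical BMI and HDL values shows the arithmetic. The function name is hypothetical.

```python
from statistics import mean

def least_squares(x, y):
    """Intercept, slope, and R-squared of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # proportion of variance in y explained by the line
    return intercept, slope, r_squared

# Hypothetical BMI (x) and HDL cholesterol (y) values
bmi = [20, 25, 30, 35]
hdl = [60, 55, 48, 45]
a, b, r2 = least_squares(bmi, hdl)
print(round(a, 2), round(b, 2), round(r2, 3))  # 80.6 -1.04 0.98
```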
Adding a Loess (scatter plot) smooth to a scatter plot. To add a Loess smooth to a scatter plot of two numerical variables:
1. While in the Viewer window, double click on the scatter plot. The scatter plot should now be displayed in a window titled Chart Editor.
2. Choose Elements.
3. Choose Fit Line at Total.
The next two steps (4. & 5.) may already be selected.
4. Choose Loess (in the Properties window). The default options for % of points to fit (50%) and kernel (Epanechnikov) are usually appropriate.
5. Choose Apply (in the Properties window).
6. Choose Close
7. Click on the "X" in the upper right hand corner of the Chart Editor window, or choose File and then Close to return to the Viewer.
[Scatter plot titled "HDL cholesterol vs BMI" with a Loess smooth]
Stem-and-leaf Plot. To produce a stem-and-leaf plot:
1. Choose Analyze on the menu bar
2. Choose Descriptive Statistics
3. Choose Explore...
4. Dependent List: To select the variables you want from the source list on the left, highlight a variable by pointing and clicking the mouse and then click on the arrow located next to the dependent list box. Repeat the process until you have selected all the variables you want.
5. Choose Plots...
6. Choose Stem-and-leaf from the Descriptive box. Note that the option may already be selected (its little box is not empty).
7. Choose None from the Boxplot box
8. Choose Continue
9. Choose Plots for the Display option
10. Choose OK

Severity of Illness Index Stem-and-Leaf Plot

 Frequency    Stem &  Leaf
     2.00        4 .  34
     7.00        4 .  6688899
    10.00        5 .  0001112344
     3.00        5 .  568
     1.00 Extremes    (>=62)

 Stem width:   10.00
 Each leaf:    1 case(s)
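In the Severity of Illness Index stem-and-leaf plot, each stem (tens digit) is split into two lines: leaves 0 to 4 and leaves 5 to 9. The minimal sketch below rebuilds those lines. It assumes non-negative integer data and a stem width of 10. It does not print empty lines or flag extremes. The values are read back from the plot, and the single extreme value (>= 62) is omitted because its exact value is not shown.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem width 10, each stem split into leaves 0-4 and 5-9.

    Assumes non-negative integers; returns (frequency, stem, leaves) rows.
    """
    lines = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        lines[(stem, leaf >= 5)].append(str(leaf))  # False = leaves 0-4, True = 5-9
    return [(len(leaves), stem, "".join(leaves))
            for (stem, _upper_half), leaves in sorted(lines.items())]

# Values read back from the plot above (the extreme value is omitted)
severity = [43, 44, 46, 46, 48, 48, 48, 49, 49,
            50, 50, 50, 51, 51, 51, 52, 53, 54, 54, 55, 56, 58]
for freq, stem, leaves in stem_and_leaf(severity):
    print(f"{freq:5.2f}  {stem} . {leaves}")
```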
Hypothesis Tests & Confidence Intervals
One-Sample t Test
1. Choose Analyze from the menu bar.
2. Choose Compare Means
3. Choose One-Sample T Test...
4. Test Variable(s): Highlight the variable you want in the source list on the left by pointing and clicking the mouse, and then click on the arrow located in the middle of the window.
5. Edit the Test Value. The Test Value is the value of the mean under the null hypothesis. The default value is zero.
6. Choose OK
Confidence Interval for a Mean (from one sample of data)
1. Choose Analyze from the menu bar.
2. Choose Compare Means
3. Choose One-Sample T Test...
4. Test Variable(s): Highlight the variable you want in the source list on the left by pointing and clicking the mouse, and then click on the arrow located in the middle of the window.
5. The Test Value should be 0, which is the default value.
6. By default a 95% confidence interval will be computed. Choose Options… to change the confidence level.
7. Choose OK
SIDS Example. There were 48 SIDS cases in King County, Washington, during the years 1974 and 1975. The birth weights (in grams) of these 48 cases were:
2466 3941 2807 3118 2098 3175 3317 3742 3062 3033 2353 3515 2013 3515 3260 2892 1616 4423 2750 2807 2807 3005 3374 3572 2722 2495 3459 3374 1984 2495 3005 2608 2353 4394 3232 3062 2013 2551 2977 3118 2637 1503 2722 2863 2013 3232 2863 2438
We want to know if the mean birth weight in the population of SIDS infants is different from that of normal children, 3300 grams. We could construct a 95% confidence interval to see if the interval contains the value of 3300 grams, or we could perform a one sample t test to test if the mean in the SIDS population is equal to 3300 (versus not equal to 3300).
The mean (and standard deviation) of these measurements is 2891 (623) grams.
To construct a 95% confidence interval:

One-Sample Statistics
               N    Mean        Std. Deviation   Std. Error Mean
birth weight   48   2891.1250   623.39177        89.97885

One-Sample Test (Test Value = 0)
                                                                  95% Confidence Interval
                                                                  of the Difference
               t        df   Sig. (2-tailed)   Mean Difference    Lower        Upper
birth weight   32.131   47   .000              2891.12500         2710.1109    3072.1391

When computing the interval for a mean, make sure the Test Value is 0.
The One-Sample Statistics table gives the number of subjects, mean, standard deviation, and standard error of the mean.
Ignore the t test results (t, df, Sig.) because these results are for testing if the mean birth weight is equal to 0 (versus not equal to 0).
The 95% confidence interval for the mean birth weight is 2710 to 3072 grams.
To perform a one sample t test to test if the mean in the SIDS population is equal to 3300 versus not equal to 3300:

One-Sample Statistics
               N    Mean        Std. Deviation   Std. Error Mean
birth weight   48   2891.1250   623.39177        89.97885

One-Sample Test (Test Value = 3300)
                                                                  95% Confidence Interval
                                                                  of the Difference
               t        df   Sig. (2-tailed)   Mean Difference    Lower        Upper
birth weight   -4.544   47   .000              -408.87500         -589.8891    -227.8609

To run the one-sample t test of whether the mean birth weight is equal to 3300, you need to change the Test Value from the default value of 0 to 3300.
Ignore the results for the 95% confidence interval of the difference, because it is the confidence interval for the mean minus 3300.
Sig. (2-tailed) = two-tailed p-value < .001 (SPSS displays .000)
t = test statistic value = -4.544
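The two SPSS runs above can be checked by hand. The sketch below reproduces the mean, t statistic, p-value, and 95% confidence interval for the SIDS birth weights. It uses only the Python standard library, so it has no built-in t distribution. Instead, the helper functions `t_cdf` and `t_quantile` (hypothetical names) integrate the t density numerically with Simpson's rule. This is a rough substitute for a statistics library, not how SPSS computes it.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(values, test_value=0.0, conf=0.95):
    n = len(values)
    m, s = mean(values), stdev(values)
    se = s / math.sqrt(n)                  # standard error of the mean
    t = (m - test_value) / se
    df = n - 1
    p = 2 * (1 - t_cdf(abs(t), df))        # two-sided p-value
    crit = t_quantile(1 - (1 - conf) / 2, df)
    # Interval for the mean itself; SPSS reports this interval minus test_value
    return {"n": n, "mean": m, "sd": s, "se": se, "t": t, "df": df, "p": p,
            "ci": (m - crit * se, m + crit * se)}

birth_weights = [2466, 3941, 2807, 3118, 2098, 3175, 3317, 3742, 3062, 3033, 2353, 3515,
                 2013, 3515, 3260, 2892, 1616, 4423, 2750, 2807, 2807, 3005, 3374, 3572,
                 2722, 2495, 3459, 3374, 1984, 2495, 3005, 2608, 2353, 4394, 3232, 3062,
                 2013, 2551, 2977, 3118, 2637, 1503, 2722, 2863, 2013, 3232, 2863, 2438]

ci_run = one_sample_t(birth_weights)          # Test Value = 0: use only the interval
test_run = one_sample_t(birth_weights, 3300)  # Test Value = 3300: use t, df, p
print(round(ci_run["ci"][0], 1), round(ci_run["ci"][1], 1))
print(round(test_run["t"], 3), test_run["df"], test_run["p"] < 0.001)
```

The printed values should agree with the SPSS tables above to the decimals shown: an interval of about 2710.1 to 3072.1 grams, t = -4.544 with 47 df, and p < .001.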
Paired t Test
1. Choose Analyze from the menu bar.
2. Choose Compare Means
3. Choose Paired-Samples T Test...
4. Paired Variable(s): Select the two paired variables you want from the source list on the left, and then click on the arrow in the middle of the window. The order in which you select the two variables determines how the difference is computed. Repeat the process until you have selected all the paired variables you want to test.
5. Choose OK
Confidence Interval for the Difference Between Means from Paired Sample
By default, a 95% confidence interval for the difference between the means of the paired samples will be computed when performing a paired t test. Choose Options… to change the confidence level.
Prozac Example. To compare the effect of Prozac on anxiety, 10 subjects are given one week of treatment with Prozac and one week of treatment with a placebo. The order of the treatments was randomized for each subject. An anxiety questionnaire was used to measure a subject's anxiety on a scale of 0 to 30. Higher scores indicate more anxiety.
Subject   Placebo   Prozac   Difference
1         22        19        3
2         18        11        7
3         17        14        3
4         19        17        2
5         22        23       -1
6         12        11        1
7         14        15       -1
8         11        19       -8
9         19        11        8
10         7         8       -1
Mean difference, d = 1.3
Standard deviation of the differences, sd = 4.5
Paired t test and confidence interval for the difference between paired means.

Paired Samples Statistics
                   Mean      N    Std. Deviation   Std. Error Mean
Pair 1   placebo   16.1000   10   4.95424          1.56667
         prozac    14.8000   10   4.68568          1.48174

Paired Samples Correlations
                           N    Correlation   Sig.
Pair 1   placebo & prozac  10   .556          .095

Paired Samples Test
                           Paired Differences
                                                                  95% Confidence Interval
                                                                  of the Difference
                           Mean      Std. Deviation   Std. Error Mean   Lower      Upper     t      df   Sig. (2-tailed)
Pair 1  placebo - prozac   1.30000   4.54728          1.43798           -1.95293   4.55293   .904   9    .390

Paired Samples Statistics: summaries for each sample of data (or variable).
Paired Samples Correlations: correlation between the paired values; usually not useful.
difference = placebo - prozac
mean difference = 1.3
standard deviation of the differences = 4.5
standard error of the mean difference = 1.4
95% confidence interval for the mean difference is -2.0 to 4.6
Paired t test:
Sig. (2-tailed) = two-sided p-value = 0.39
t = test statistic value = .904
df = degrees of freedom = 9
The order of the variables in calculating the difference is determined by the order in which you selected the variables. The difference will be computed as Variable 1 - Variable 2.
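A paired t test is a one-sample t test on the differences. The sketch below reproduces the Paired Samples Test row for the Prozac data. It uses only the standard library, so the helper functions `t_cdf` and `t_quantile` (hypothetical names) approximate the t distribution by numerical integration rather than calling a statistics package. The differences are computed as placebo - prozac, matching Variable 1 - Variable 2.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def paired_t(first, second, conf=0.95):
    """Paired t test on the differences first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar, s_d = mean(diffs), stdev(diffs)
    se = s_d / math.sqrt(n)                 # standard error of the mean difference
    t = d_bar / se
    df = n - 1
    p = 2 * (1 - t_cdf(abs(t), df))         # two-sided p-value
    crit = t_quantile(1 - (1 - conf) / 2, df)
    return {"mean_diff": d_bar, "sd_diff": s_d, "se": se, "t": t, "df": df, "p": p,
            "ci": (d_bar - crit * se, d_bar + crit * se)}

placebo = [22, 18, 17, 19, 22, 12, 14, 11, 19, 7]
prozac = [19, 11, 14, 17, 23, 11, 15, 19, 11, 8]
result = paired_t(placebo, prozac)
print(round(result["t"], 3), result["df"], round(result["p"], 2))
print(round(result["ci"][0], 2), round(result["ci"][1], 2))
```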
Two-Sample t Test
1. Choose Analyze on the menu bar.
2. Choose Compare Means
3. Choose Independent-Samples T Test...
4. Test Variable(s): Select the test variable you want from the source list on the left and then click on the arrow located next to the test variable box. Repeat the process until you have selected all the variables you want.
5. Grouping Variable: Select the variable which defines the groups and then click on the arrow located next to the grouping variable box.
6. Choose Define Groups...
7. Click on the blank box next to Group 1, then enter the code value (numeric or character/string) for group 1.
8. Click on the blank box next to Group 2, then enter the code value (numeric or character/string) for group 2.
9. Choose Continue
10. Choose OK
Confidence Interval for the Difference Between Means from Independent Samples
By default, a 95% confidence interval for the difference between means from two independent samples will be computed when performing a two sample t test. Choose Options… to change the confidence level.
Model Cities Example. Two groups of people were studied - those who had been randomly allocated to a Fee-For-Service medical insurance group and those who had been randomly allocated to a Prepaid insurance group.
We would like to compare the two groups on the quality of health care they received, but first we would like to know how comparable the groups are on other characteristics that might affect medical outcome. For example, we would like to know if the mean age in the two groups is similar. Hopefully, random allocation has made the groups similar, but there is always a chance that it didn't.
Group                   n      Mean   Standard deviation
Prepaid (GHC)           1167   24.0   15.3
Fee-for-service (KCM)   3207   26.4   17.1
We could compare the average age between the two groups using a two sample t test or a confidence interval for the difference between the average ages of the two groups.
Two sample t test and 95% confidence interval for the difference between means (from independent samples).

After you select the Grouping Variable, SPSS will put in question marks to prompt you to define the code values for the two groups. Select Define Groups… to enter the code values. In this example the group codes are numeric, 0 (for GHC) and 1 (for KCM).

T-Test

Group Statistics (summaries for each sample/group)
      prov   N      Mean      Std. Deviation   Std. Error Mean
age   GHC    1167   23.9846   15.30787         .44810
      KCM    3207   26.3676   17.10260         .30200

Independent Samples Test: Levene's Test for Equality of Variances
                                 F        Sig.
age   Equal variances assumed    47.068   .000

SPSS by default tests if the variances are equal using Levene's test. A small p-value (Sig.) indicates the variances may be different. Here Sig. = p-value < .001.

Independent Samples Test: t-test for Equality of Means
                                    t        df         Sig. (2-tailed)   Mean Difference   Std. Error Difference
age   Equal variances assumed       -4.188   4372       .000              -2.38306          .56896
      Equal variances not assumed   -4.410   2293.698   .000              -2.38306          .54037

Independent Samples Test: 95% Confidence Interval of the Difference
                                    Lower      Upper
age   Equal variances assumed       -3.49851   -1.26760
      Equal variances not assumed   -3.44273   -1.32338

Two Sample t test. SPSS by default always performs both versions of the two sample t test, assuming equal variances and not assuming equal variances.
Sig. (2-tailed) = two-sided p-value < .001 (equal var.), < .001 (unequal var.)
t = test statistic value = -4.2 (equal var.), -4.4 (unequal var.)
df = degrees of freedom = 4372 (equal var.), 2294 (unequal var.)
mean difference = difference between means = -2.4 (equal and unequal var.)
std. error difference = standard error of the difference between means = .6 (equal var.), .5 (unequal var.)
95% confidence interval for the difference between means is -3.5 to -1.3 (assuming equal variances) or -3.4 to -1.3 (not assuming equal variances).
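Both versions of the two sample t test can be recomputed from the Group Statistics table alone. The equal-variance version uses a pooled variance. The unequal-variance version uses the Welch standard error with Satterthwaite degrees of freedom. The sketch below does this with the standard library only. The helpers `t_cdf` and `t_quantile` (hypothetical names) approximate the t distribution numerically. Because the inputs are the rounded summaries shown above, the last digits may differ slightly from SPSS. Levene's test needs the raw data and is not reproduced here.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_t(n1, m1, s1, n2, m2, s2, conf=0.95):
    """Equal- and unequal-variance t tests for mean1 - mean2 from summary statistics."""
    diff = m1 - m2
    # Equal variances assumed: pooled variance with n1 + n2 - 2 df
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df_pooled
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch standard error, Satterthwaite df
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    q = 1 - (1 - conf) / 2
    results = {}
    for label, se, df in (("equal", se_pooled, df_pooled), ("unequal", se_welch, df_welch)):
        t = diff / se
        crit = t_quantile(q, df)
        results[label] = {"t": t, "df": df, "se": se,
                          "p": 2 * (1 - t_cdf(abs(t), df)),
                          "ci": (diff - crit * se, diff + crit * se)}
    return diff, results

# Group Statistics from the output above: GHC (group 1) minus KCM (group 2)
diff, res = two_sample_t(1167, 23.9846, 15.30787, 3207, 26.3676, 17.10260)
for label, r in res.items():
    print(label, round(r["t"], 3), round(r["df"], 1),
          round(r["ci"][0], 2), round(r["ci"][1], 2))
```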
Sign Test and Wilcoxon Signed-Rank Test
1. Choose Analyze from the menu bar.
2. Choose Nonparametric Tests
3. Choose Legacy Dialogs
4. Choose 2 Related Samples...
5. Test Pair(s) List: Select the two paired variables you want from the source list on the left, and then click on the arrow in the middle of the window. The order in which you select the two variables determines how the difference is computed. Repeat the process until you have selected all the paired variables you want to test.
6. Choose Sign and/or Wilcoxon as the Test Type.
7. Choose OK
Aspirin Example. To compare 2 types of Aspirin, A and B, 1 hour urine samples were collected from 10 people after each had taken either A or B. A week later the same routine was followed after giving the “other” type to the same 10 people.
Person               Type A   Type B   Difference
1                    15       13        2
2                    26       20        6
3                    13       10        3
4                    28       21        7
5                    17       17        0
6                    20       22       -2
7                     7        5        2
8                    36       30        6
9                    12        7        5
10                   18       11        7
Mean                 19.2     15.6     3.6 = d
Standard deviation   8.63     7.78     3.098 = sd
A Sign test or Wilcoxon Signed Rank test could be used to compare the two types of Aspirin.
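Both tests work on the Type A - Type B differences, and both drop the one zero difference (person 5). The sketch below is minimal and uses hypothetical function names. The sign test counts positive and negative differences and computes an exact two-sided binomial p-value. The Wilcoxon signed-rank test ranks the absolute differences, giving tied values their average rank. It then sums the ranks of the positive and negative differences and forms a tie-corrected normal z. I believe SPSS reports exact binomial p-values for the sign test with small samples and a z based on the rank sums for Wilcoxon. Treat that correspondence as an assumption and compare it with your own output.

```python
import math
from collections import Counter

def sign_test(diffs):
    """Exact two-sided sign test; zero differences are dropped."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    k = min(positives, n - positives)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Binomial(n, 0.5)
    return positives, n - positives, min(1.0, 2 * tail)

def wilcoxon_signed_rank(diffs):
    """Positive and negative rank sums (average ranks for ties) and a tie-corrected z."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    magnitudes = sorted(abs(d) for d in nonzero)
    avg_rank = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and magnitudes[j + 1] == magnitudes[i]:
            j += 1
        avg_rank[magnitudes[i]] = (i + j + 2) / 2  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    pos = sum(avg_rank[abs(d)] for d in nonzero if d > 0)
    neg = sum(avg_rank[abs(d)] for d in nonzero if d < 0)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    variance -= sum(t ** 3 - t for t in Counter(magnitudes).values()) / 48  # tie correction
    z = (min(pos, neg) - expected) / math.sqrt(variance)
    return pos, neg, z

type_a = [15, 26, 13, 28, 17, 20, 7, 36, 12, 18]
type_b = [13, 20, 10, 21, 17, 22, 5, 30, 7, 11]
diffs = [a - b for a, b in zip(type_a, type_b)]

print(sign_test(diffs))  # (8, 1, 0.0390625)
pos, neg, z = wilcoxon_signed_rank(diffs)
print(pos, neg, round(z, 3))
```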