Using SPSS for Quantitative Methods in Geography and Planning
Overview
SPSS for Windows is a very powerful tool for the management and analysis of data. Because it is a Windows application, you should already feel somewhat comfortable with the way it works, even if you have no working familiarity with SPSS itself. For the purposes of this course, you will have no need to go beyond the menu items presented to you in SPSS. In case you are interested, SPSS also has a scripting capability (its command syntax) that lets you submit a set of instructions that access the same routines as the menu items. Scripting makes repetitive tasks and larger data analysis tasks much easier to manage.
I have often described SPSS as “Excel on steroids.” You will see why soon enough. However, I am often asked why we don’t just use Excel for many of the statistical routines we become acquainted with in this course. True, Excel can perform nearly all of the analytical routines that we ask SPSS to do; however, Excel has some well-documented shortcomings related to numerical precision and consistency that make many statisticians and analysts shy away from an otherwise useful piece of software. This is unfortunate, as you are much more likely to have access to Excel than SPSS on your future career path.
Regardless, many of the skills you will learn here apply to your work outside of SPSS.
Starting SPSS
SPSS is easy enough to start. On the computers in the department, the software can be found under the “geopln software” item in the Start menu. Be sure to select “SPSS for Windows” rather than the SPSS “Production Facility” (the Production Facility is an advanced feature that allows a user to submit a script, or set of instructions, to SPSS for execution).
The first window you are likely to come across is an introductory dialog box that looks like this:
Here, you will either click “OK” right away, which takes you to another dialog box for opening a pre-existing SPSS file (this is the “More Files…” option highlighted under the pre-selected “Open an existing data source” radio button), or you will select the “Type in data” radio button to begin creating a dataset from scratch. More often than not, you will be using a pre-existing SPSS file that either you’ve created or I’ve provided for you.
We will discuss opening files in the section titled “Bringing Data into SPSS.”
NOTE 1: The radio button “Open another type of file” is for other types of SPSS files (which we’ll discuss later), not for files of other non-SPSS formats such as text or Excel files.
NOTE 2: If you have the luxury of owning SPSS, checking the “Don’t show this dialog in the future” box will prevent this dialog box from popping up in the future. However, this will not work in the GHY/PLN computer labs, because each computer’s internal settings are reset every time one user logs off and another logs on.
Help Files
Please don’t disregard the help files. Not only are there topic-specific discussions about any command or routine that interests you, there are also tutorials to help you work through many aspects of SPSS. I urge you to use these and to consider them as much a resource for this class as the textbook.
Also, consider what you are reading to be a supplement to what exists in the help and tutorial files. I’m doing my best to hit the high points, but there are bound to be situations that this document doesn’t cover.
Don’t worry. You need to get into the habit of teaching yourself and of solving your own problems. While I will always be here to help you out, please note that my first response to your question will often be “What did the help file say?” or “How did you try to solve this problem?” If you use me as your first option, I’ll send you back. If you’ve adequately exhausted the other resources available to you, I’ll be happy to spend whatever time is necessary to help you work through your problem.
If you want to confirm any solution with me that you arrived at by yourself, I’ll be happy to go over the situation with you.
Remember: If you consistently rely on me as your first resource, you aren’t going to learn and retain much information. If you work through your own problems, the solution will stick with you forever.
Workspaces in SPSS
SPSS has a few different ways to view and describe your data, and in these workspaces you can also change the data as you see fit. The two that we will discuss are the Data View and the Variable View.
The Data View and Variable View can be accessed by clicking on the appropriate tab at the bottom-left of the SPSS window.
Data View
The Data View is where you see, enter, and edit the values in your dataset.
Data Sheet
You will likely feel comfortable in the Data View, because the environment looks similar to Excel. Specifically, what you see is an array of columns and rows, which correspond to variables and observations, respectively.
Variables
For our purposes, a variable is a piece of information that describes the observations in our data. For instance, if my data consist of students enrolled at Appalachian, one variable might be “age.” The variable “age” allows me to distinguish among my observations; however, a variable does not have to uniquely identify each observation in my data (as, say, your student ID number would, since there are no duplicate IDs).
Variables constitute the columns in the data sheet. The header of each column displays the name of the variable. If you have no dataset loaded, or if any or all variables have no names, you will see generic placeholder headers (such as “var”) instead. This differs from Excel, where the lettered column headers (A, B, C…) are not part of your data and variable names are usually typed into the first row. In SPSS, your variable names are treated as a separate object.
As you can see below, there are (at least) three variables in this dataset of automobiles: “mpg,” “engine,” and “horse.”
Observations
An observation is a unique unit in your data. In my hypothetical data of Appalachian students, each student is an individual observation (also known as a record). No student can occupy two observations without being identified as a duplicate, or copy. Likewise, two students cannot be commingled within a single observation.
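Checking whether a variable really identifies each observation uniquely (as a student ID should, and as “age” need not) is a quick, mechanical test. The sketch below is written in Python rather than SPSS syntax, purely to illustrate the idea. `duplicate_ids` is a hypothetical helper name, and the ID values are made up.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the ID values that occur on more than one observation, sorted."""
    counts = Counter(ids)                      # how many rows carry each ID
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical student IDs: 1002 was entered twice.
student_ids = [1001, 1002, 1003, 1002, 1004]
print(duplicate_ids(student_ids))   # [1002]
```

An empty result means the variable could serve as a unique identifier. The same check run on a variable like “age” would almost certainly return several values.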
Each row in your dataset contains all the information attributed to a specific observation. SPSS simply numbers observations starting from 1 and ending at the last observation (n). Perhaps this is a good time to get started on notation: n, as I’ve used it, can mean the total number of observations in a dataset, or we might also use it to denote the last observation (in case we don’t know exactly how many observations exist in our data).
To put variables and observations together in our data sheet, each number exists in a “cell” that is the intersection of a specific observation with a specific variable.
In the previous figure, car (that is, observation) 4 has an engine displacement of 304 cubic inches, car 3 has a displacement of 318 cubic inches, and so on.
Variable View
The Variable View is a slightly different take on the same dataset we see in the Data View. In the Variable View, we can see and manipulate the variables as objects in themselves, rather than all the data that these variables express. Here we can create or delete variables easily. We can also alter the characteristics of the variable itself (noting that a variable’s characteristics are different from the data, which are characteristics of any given observation!).
Variables
In the Variable View, the first thing you might notice is that the variables are displayed in rows (contrast this with the Data View). The columns describe characteristics of each variable. SPSS imposes the following rules on variable names (from the SPSS Help Files):
• The name must begin with a letter. The remaining characters can be any letter, any digit, a period, or the symbols @, #, _, or $.
• Variable names cannot end with a period.
• Variable names that end with an underscore should be avoided (to avoid conflict with variables automatically created by some procedures).
• The length of the name cannot exceed 64 bytes. Sixty-four bytes typically means 64 characters in single-byte languages (for example, English, French, German, Spanish, Italian, Hebrew, Russian, Greek, Arabic, Thai) and 32 characters in double-byte languages (for example, Japanese, Chinese, Korean).
• Blanks and special characters (for example, !, ?, ', and *) cannot be used.
• Each variable name must be unique; duplication is not allowed.
• Reserved keywords cannot be used as variable names. Reserved keywords are: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH.
• Variable names can be defined with any mixture of upper and lower case characters, and case is preserved for display purposes.
• When long variable names need to wrap on to multiple lines in output, SPSS attempts to break lines at underscores, periods, and a change from lower to upper case.
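The naming rules above are concrete enough to check mechanically. Below is a minimal Python sketch of such a check. It is not part of SPSS, and `name_problems` is a hypothetical helper. The sketch makes two assumptions. First, name length is measured in UTF-8 bytes. Second, uniqueness is compared without regard to case, because case is preserved only “for display purposes.”

```python
# Hypothetical checker for the SPSS variable-naming rules listed above.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}
ALLOWED_SYMBOLS = set(".@#_$")


def name_problems(name, existing=()):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not name or not name[0].isalpha():
        problems.append("must begin with a letter")
    # Remaining characters: letters, digits, period, @, #, _, $ only.
    if any(not (ch.isalnum() or ch in ALLOWED_SYMBOLS) for ch in name[1:]):
        problems.append("contains a blank or disallowed character")
    if name.endswith("."):
        problems.append("cannot end with a period")
    if name.endswith("_"):
        problems.append("should not end with an underscore")
    if len(name.encode("utf-8")) > 64:        # assumption: UTF-8 byte count
        problems.append("longer than 64 bytes")
    if name.upper() in RESERVED:
        problems.append("is a reserved keyword")
    if name.upper() in {e.upper() for e in existing}:   # assumption: case-insensitive
        problems.append("duplicates an existing name")
    return problems


print(name_problems("2010pop"))   # ['must begin with a letter']
```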
For those with experience in Excel or even in the attribute tables of ArcGIS or ArcView, you will notice that the conventions for naming variables are much more restrictive.
If you need to change the name of a variable, double-click on the name of the variable, or with that cell activated, press the F2 key.
Variable Labels
Do not confuse variable labels with variable names. Variable labels are important because many datasets you come across (especially those created with older versions of SPSS) have variable names that are limited to eight characters. One can be only so descriptive with eight characters, so variable labels allow you to create an alias of sorts for the variable name. In most circumstances, the data are described in output with the variable label rather than the variable name.
As you can see below, the variable labels for the automobile data are much more descriptive than the variable names themselves:
If you need to change the label of a variable, double-click on the label, or with that cell activated, press the F2 key.
Value Labels
Value labels describe the values associated with any variable. They are often necessary because numbers are frequently used to stand for information that has nothing to do with a number. As an example, look at the values associated with the variable “origin” in the cars.sav dataset:
You see numbers, but in actuality those numbers signify a car’s country of origin.
This is done because it is actually easier to manipulate numbers than text in statistical and database programs.
A value label allows us to tell SPSS that in the variable “origin,” 1 means “American,” 2 means “European,” and 3 means “Japanese.” You can see this by clicking on the value label cell for the variable, then clicking on the button with the ellipsis (…) on it.
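Conceptually, a set of value labels is just a lookup table from stored codes to display text. The data values themselves never change. The Python sketch below illustrates that idea and is not how SPSS stores labels internally. The code-to-label pairs come from the “origin” example above. `label_values` is a hypothetical helper, and the list of stored codes is made up.

```python
# Codes and meanings from the cars.sav "origin" example.
ORIGIN_LABELS = {1: "American", 2: "European", 3: "Japanese"}


def label_values(codes, labels):
    """Show each stored code as its label; codes without a label pass through."""
    return [labels.get(code, code) for code in codes]


origin = [1, 3, 3, 2, 1]   # hypothetical stored values
print(label_values(origin, ORIGIN_LABELS))
# ['American', 'Japanese', 'Japanese', 'European', 'American']
```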
This brings up a dialog box that allows you to see and/or alter the definition of the labels:
Most of these dialog boxes are pretty self-explanatory. Do note that most of them have a “help” button.
Bringing Data into SPSS
There are a number of ways for you to bring data into SPSS. You can enter the data yourself in the Data View, much as you would in Excel. You can also import data from other sources, such as text (ASCII) files or Excel worksheets. Last, you can open files that have been saved in SPSS’s “.sav” format. Each of these methods has its benefits and costs; choose the one that works best for you, given the situation you’re in. Be aware that in most situations you will be doing a lot of “grunt work”: gathering, entering, and processing/defining your data. In many instances, this is the majority of the work you do.
Whatever method you choose, remember that YOU are responsible for the integrity of the data. Never assume that because SPSS didn’t beep or barf, everything is just fine. Always make sure that you have all the variables you intended to bring into SPSS and that you end up with the number of observations you expect. Aside from a visual inspection, it is always a good idea to run some summary statistics to be sure that nothing unusual shows up in your dataset (like an average student age of 1134).
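The summary-statistics check amounts to counting observations, looking at the minimum, maximum, and mean, and flagging anything outside a plausible range. The Python sketch below shows this outside SPSS. `summarize` is a hypothetical helper, and both the ages and the 15 to 90 range are invented for the example.

```python
import statistics


def summarize(values, low, high):
    """Return count, min, max, mean, and any values outside [low, high]."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "suspect": [v for v in values if not low <= v <= high],
    }


ages = [19, 21, 20, 22, 2110]        # hypothetical: "21" mistyped as "2110"
summary = summarize(ages, low=15, high=90)
print(summary["suspect"])            # [2110]
```

A single typo drags the mean to about 438, which is exactly the kind of “unusual” result that should send you back to the data.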
Entering Data by Hand
Entering data by hand is a reasonable choice if you are working with a small dataset. In addition to typing the numbers into the cells, you can also copy and paste columns or rows into the Data View. Be aware, however, that by entering the data by hand, you are assuming more risk of error. You will also be required to do a fair amount of post-processing of your data. That is, you will need to create variable and value labels, specify formats, and set data types.
SPSS files
SPSS data files have a native format with a “.sav” extension. That is, if you save a file with the name “cars,” the SPSS data file will be saved to your disk as “cars.sav.”
You can open these files from the menu:
Or you can navigate outside of SPSS to your data folder, double click on the data file and SPSS will start up and open that file. Of course, this assumes you have SPSS on your computer, and the computer has associated *.sav files as SPSS data files.
Importing Files
Excel
SPSS imports worksheets from Excel fairly easily. However, you must make sure that the structure of the data as they exist in the worksheet conforms to the RxC (row-by-column) format of SPSS. You can import either a full worksheet or part of a worksheet. Note that you can import only a single sheet at a time from an Excel workbook, not an entire workbook.
As you can see in the image, in a full sheet the data start in column A, row 1 (that is, cell A1). It is also assumed that the worksheet contains nothing other than the data in rectangular format.
If the data are not in this format, you may still import the worksheet, but only as a partial extraction, by specifying the range of cells that holds the data. Take a look at an example:
These are the same data as in the previous image, except that this worksheet has some “header” information associated with it. The header tells us where the data come from and how the variables are described. The data actually start, in rectangular format, in cell A12.
To import this worksheet, a little knowledge about the data is necessary. That is, we need to know the array the data occupy. An array can span one or more rows or columns. In this example, the array is five columns by 101 rows: the columns are the variables, and the rows are the 100 North Carolina counties plus one row of variable names. We define the array by its top-left cell and its bottom-right cell, separating the two with a colon: A12:E112.
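Before typing a range into the import dialog, you can confirm its size with a little arithmetic. The column letters work like base-26 digits (A = 1, …, Z = 26, AA = 27), and the width and height both count inclusively. The Python sketch below computes the shape of a range. `range_shape` is a hypothetical helper and not part of SPSS or Excel.

```python
import re


def range_shape(cell_range):
    """Return (columns, rows) spanned by an Excel-style range such as 'A12:E112'."""
    def parse(cell):
        m = re.fullmatch(r"([A-Z]+)(\d+)", cell.strip().upper())
        if m is None:
            raise ValueError(f"not a cell reference: {cell!r}")
        letters, digits = m.groups()
        col = 0
        for ch in letters:                      # base-26 letters: A=1 ... Z=26, AA=27
            col = col * 26 + (ord(ch) - ord("A") + 1)
        return col, int(digits)

    top_left, bottom_right = cell_range.split(":")
    c1, r1 = parse(top_left)
    c2, r2 = parse(bottom_right)
    return c2 - c1 + 1, r2 - r1 + 1             # inclusive on both ends


print(range_shape("A12:E112"))   # (5, 101)
```

The result matches the five variables and the 100 counties plus one header row described above.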
The following steps allow you to import an Excel worksheet into SPSS:
1) Be sure that the Excel worksheet is not open or active
2) Bring up the Open File dialog box (File…Open…Data…) and specify the Excel file format:
3) Navigate to where your Excel file is located and open it. You will be presented with another dialog box:
4) Clicking on the down arrow in the “Worksheet” menu item shows all the worksheets in the workbook:
5) In my “NC Demog Profile 1.xls” workbook, you can see that there are five worksheets, and SPSS has automatically specified the full extent of the information in each worksheet. Compare “Part of Sheet” and “Full Sheet” with the images above. “Part of Sheet” is the worksheet with the header information, yet this dialog box doesn’t make that distinction (look at the specification of the array inside the brackets after the worksheet name). This would be fine, however, if we were importing “Full Sheet.”
6) If you wanted to import “Full Sheet,” it would be appropriate to do that from here by selecting the worksheet name and clicking “OK.” This is what you would see:
7) If you need to import a selection from within a worksheet, such as the same data from “Part of Sheet,” you need to specify the range of data (that is, the array):
8) After that, click “OK” and this is what you will see, which is now indistinguishable from the data imported from the “Full Sheet” worksheet:
9) Note that you will still have to define variable names, labels and other matters.
Text Files
Text files are fairly easy to import, although they can be slightly more complicated than an Excel workbook. In fact, if you’ve ever opened a text file in Excel, the process is nearly identical.
A text file is “text” inasmuch as the file is not a binary file. Many text files are actually full of numbers, although they are usually a mix of numbers and “text” as we know text to be. Another way of describing a text file is as an ASCII file.
Most text files you come across will be delimited in one way or another. A delimiter is a character (such as a comma, tab, or space) that separates one variable’s value from the next within each observation.
The construction of a text file is rectangular, as each row is an observation and each observation is described by any number of variables. Without the benefit of a proper tool to look at a text file, here is what the worksheet “Full Sheet” looks like in text format:
While you can get an idea of the structure of the file, this is true only because there aren’t very many variables to consider.
The first thing you should notice is that the first row consists of the variable names. Each row after that is an observation (in this case, a county in North Carolina) and information associated with each county in the order of the variables as listed in the first row.
The data in this file are “comma-delimited,” meaning that each bit of information for a given observation is separated by a comma. In this context, think of a comma as being a cell separator in Excel.
You will also notice that the values of the variable “c_name” contain a comma (e.g., “Burke County, North Carolina”). Because the comma between the county name and North Carolina is enclosed in quotation marks, most programs will recognize it as part of the data, not as a delimiter.
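The difference between honoring and ignoring the quotation marks is easy to show outside SPSS. The Python sketch below splits one record naively on every comma and then parses it with Python’s standard `csv` module, which treats `"` as its default quote character. The record has the same shape as the NC file, but its numbers are made up.

```python
import csv
import io

# One record shaped like the NC file; the numeric values are made up.
line = '37023,"Burke County, North Carolina",1.5,2.5,3.5\n'

naive = line.strip().split(",")                 # splits on every comma, quotes ignored
proper = next(csv.reader(io.StringIO(line)))    # honors the double-quote qualifier

print(len(naive), len(proper))   # 6 5
print(proper[1])                 # Burke County, North Carolina
```

The naive split produces one extra “variable” and leaves stray quotation marks in two of the fields. That is the same kind of damage you will see in the import wizard when the qualifier is not set.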
This is convenient, but it can also lead to some unusual outcomes if it slips your mind. This is likely to happen if you have comma-delimited data, but whoever created the file wrote their numbers with thousands separators (e.g., 32,300 instead of 32300) without enclosing them in quotation marks. SPSS will import whatever you tell it to import (in most circumstances), and it is up to you to verify that it was all done correctly.
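One quick way to catch the unquoted-thousands-separator problem is to count the fields on every line and compare each count against the header. The Python sketch below does this on a tiny hypothetical file. Its county names and values are invented, and `ragged_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical file: the header promises 3 fields, but line 3 has an
# unquoted thousands separator (32,300), so it splits into 4 fields.
text = """c_fips,c_name,pop
1,"Alpha County",32300
3,"Beta County",32,300
"""


def ragged_rows(text):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    rows = list(csv.reader(io.StringIO(text)))
    width = len(rows[0])
    return [(i + 1, len(r)) for i, r in enumerate(rows) if len(r) != width]


print(ragged_rows(text))   # [(3, 4)]
```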
SPSS has a “wizard” process by which text data are imported. We’ll step through this with the “NC Demog Profile 1.csv” file. If you (or someone else) saved a comma-separated file from Excel, it has a “csv” extension (Comma Separated Values). The delimiters can be almost any character, but files are usually tab-, space-, or comma-delimited.
You begin the process by selecting the “Read Text Data…” option in the File menu item:
After you navigate to and open the text file you want, the Text Import Wizard walks you through reading the text file into SPSS:
Step 1 provides you with a brief glimpse into your data. The first few rows shown in the window match what we saw in the text file. So far, so good. You will also see that you have the opportunity to apply a predefined format to your data.
This option is only important if you’ve already defined a dataset exactly like the one you are currently importing (i.e., variable names, labels, value labels, data types, etc.). This is useful if you have multiple files describing the same characteristics, in the same variable order, of different observations. For our purposes here, this isn’t necessary.
So long as we can confirm that the data we see are from the file we want, let’s click “Next >”.
Step 2 asks us to confirm the arrangement of the variables and whether the variable names are at the top of the file. Regardless of the delimiter (comma, tab, or other), we select “Delimited.” By contrast, in a fixed-width file the data for each variable occupy a specific set of columns (character positions) on every line.
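For contrast with delimited data, here is a Python sketch of how a fixed-width record is read. The reader cuts each line at fixed character positions and never looks for a delimiter. The layout (five characters of FIPS code, twenty of county name, six of value) is entirely hypothetical, as is `read_fixed_width`.

```python
# Hypothetical layout: (variable name, start, end) as 0-based slice positions.
LAYOUT = [("c_fips", 0, 5), ("c_name", 5, 25), ("v01", 25, 31)]


def read_fixed_width(line, layout):
    """Slice one record into fields by character position (no delimiters)."""
    return {name: line[start:end].strip() for name, start, end in layout}


record = "37023" + "Burke County".ljust(20) + "  12.5"   # made-up record
print(read_fixed_width(record, LAYOUT))
# {'c_fips': '37023', 'c_name': 'Burke County', 'v01': '12.5'}
```

Because position alone defines each field, spaces and commas inside a county name cause no trouble. The trade-off is that you must know the exact column layout in advance.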
We verify that what we are telling SPSS to expect is indeed what we are feeding SPSS, and then we click “Next.”
Step 3 asks us to confirm that the variable names are on the first line (so the data begin on the second row, line number 2). Since each line is a unique observation or case, we let SPSS know; keep in mind that a single observation or case can span multiple lines, but most of those datasets are likely old, and I don’t come across many of them nowadays.
We also tell SPSS to import the entire file. Note for now that it is possible to import only part of the file, either by selecting the first “n” cases or by randomly selecting a certain percentage of observations. It is unlikely in this class that you will do anything but import the entire dataset.
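The two partial-import options work differently, as the Python sketch below shows. This is only an illustration, and the helper names are hypothetical. For the percentage option, the sketch assumes that each case is kept independently with the given probability, so the number of cases kept is only approximately that percentage. It does not claim to reproduce SPSS’s own sampling routine.

```python
import random


def first_n(cases, n):
    """Keep only the first n cases, in file order."""
    return cases[:n]


def random_percent(cases, percent, seed=None):
    """Keep each case independently with probability percent/100.

    Assumption: an approximate-percentage sample, so the count kept varies.
    """
    rng = random.Random(seed)                 # seed makes the sample repeatable
    return [c for c in cases if rng.random() < percent / 100]


counties = list(range(1, 101))                # stand-ins for 100 county records
print(first_n(counties, 3))                   # [1, 2, 3]
```

The first-n option always returns the same cases. The percentage option returns a different subset, and usually a slightly different count, unless you fix the seed.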
I will show the Wizard window for Step 4 in multiple stages. Remember that our data are delimited with commas. As you can see, the default setting in this wizard is for data to be delimited by spaces and commas (don’t ask me why…):
You will notice a number of anomalies. Let’s deconstruct what we see.
1. The variable “c_fips” is just fine.
2. “c_name” appears fine, but it is actually split because of the space between the county name and the word “County.” There are bound to be other anomalies with, say, “New Hanover County,” because SPSS will interpret the space between “New” and “Hanover” as a delimiter.
3. Note the change in variable names between “V03” and “V6.” We actually supplied the variable names only up to “V03,” and SPSS automatically started naming new variables as they came up, even though we didn’t provide the names. Note that “V6” is short for “variable 6” and is the sixth column you see.
What about the question here, “What is the text qualifier?” A text qualifier is a character (such as a double quotation mark) that encloses a string of text; all spaces, commas, and anything else inside the qualifiers are considered part of that variable’s value rather than something SPSS is supposed to interpret as a delimiter. In the next image of Step 4, we identify only commas as the delimiter and double quotes as the text qualifier:
Beautiful.
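The two Step 4 settings can be imitated with a small tokenizer that accepts a set of delimiter characters and an optional qualifier. The Python sketch below is hypothetical (`split_record` is not an SPSS function). For brevity, it drops the empty fields that consecutive delimiters create, which is an assumption and not a description of what SPSS does. The county record is made up.

```python
def split_record(line, delimiters, qualifier=None):
    """Split one line on any character in `delimiters`, keeping qualified text intact.

    Hypothetical helper; empty fields from consecutive delimiters are dropped.
    """
    fields, current, quoted = [], "", False
    for ch in line:
        if qualifier is not None and ch == qualifier:
            quoted = not quoted              # toggle; the qualifier itself is not kept
        elif ch in delimiters and not quoted:
            fields.append(current)           # delimiter outside quotes ends a field
            current = ""
        else:
            current += ch
    fields.append(current)
    return [f for f in fields if f != ""]


line = '37129,"New Hanover County, North Carolina",1.5'    # made-up record
print(len(split_record(line, {" ", ","})))                  # 7: spaces and commas, no qualifier
print(split_record(line, {","}, qualifier='"'))
# ['37129', 'New Hanover County, North Carolina', '1.5']
```

With the default space-and-comma setting, the county name breaks into several extra columns. Setting commas as the only delimiter and double quotes as the qualifier gives back the three intended fields.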
Step 5 allows us to change the type of each variable in our data. For the first variable, “c_fips,” I pulled down the data format menu to give you a glimpse into the variable types that are possible. SPSS does a good job of guessing, but it is always nice to verify this by eye. If I’ve told you once, I’ve told you a million times: YOU are responsible for your data.
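The kind of guess that Step 5 makes can be illustrated with a simple rule of thumb. If every value in a column parses as a number, call the column numeric (an F format, whose width counts the decimal point). Otherwise, call it a string (an A format) wide enough for the longest value. The Python sketch below applies that rule. `guess_format` is hypothetical, and the rule is not SPSS’s actual algorithm.

```python
def guess_format(values):
    """Guess an SPSS-style format (e.g. 'F5.2' or 'A28') for one column of raw text.

    Hypothetical rule of thumb, not SPSS's real guessing logic.
    """
    try:
        for v in values:
            float(v)                              # any failure means "string"
    except ValueError:
        return f"A{max(len(v) for v in values)}"
    decimals = max(len(v.split(".")[1]) if "." in v else 0 for v in values)
    width = max(len(v) for v in values)           # width includes the decimal point
    return f"F{width}.{decimals}"


print(guess_format(["12.50", "3.25"]))            # F5.2
print(guess_format(["32,300", "18000"]))          # A6: the comma blocks numeric parsing
```

The second call shows why you should check the guesses by eye. One value with a thousands separator is enough to turn a numeric column into text. Also, a numeric guess for codes such as “001” would drop the leading zeros.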
Step 6 wraps things up. There really isn’t anything you need to do here other than click “Finish,” but note that you can save the file format for future use (remember Step 1’s predefined formats?) or paste the syntax, which is the scripting language you would have had to write if this weren’t a Windows program. (I’ll spare you the sob stories of teaching myself SPSS version 4 that was just ported to MS-DOS when monkeys descended from the trees.) It would look something like this:
GET DATA
  /TYPE = TXT
  /FILE = 'C:\Documents and Settings\Crepeau\My Documents\excel\Quant\NC Demog Profile 1.csv'
  /DELCASE = LINE
  /DELIMITERS = ","
  /QUALIFIER = '"'
  /ARRANGEMENT = DELIMITED
  /FIRSTCASE = 2
  /IMPORTCASE = ALL
  /VARIABLES =
    c_fips F2.1
    c_name A33
    V01 F6.2
    V02 F6.2
    V03 F6.2
  .