Using SPSS for Quantitative Methods in Geography and Planning

Overview

SPSS for Windows is a very powerful tool for the management and analysis of data. Because it is a Windows application, you should feel reasonably comfortable with the way it works, even if you have no working familiarity with SPSS itself. For the purposes of this course, you will have no need to go beyond the menu items presented to you in SPSS. In case you are interested, SPSS also has a scripting capability that lets you submit a set of instructions that access the same routines as the menu items. Scripting makes repetitive tasks and larger data analysis tasks much easier to manage.

I have often described SPSS as “Excel on steroids.” You will find this out soon enough; however, I am often asked why we don’t just use Excel for many of the statistical routines we become acquainted with in this course. True, Excel does nearly all of the analytical routines that we ask SPSS to do. However, Excel has some well-documented shortcomings related to precision and consistency that make many statisticians and analysts shy away from an otherwise useful piece of software. This is unfortunate, as you are much more likely to have access to Excel than SPSS on your future career path.

Regardless, there are many skills you will learn that apply to your work outside of SPSS.

Starting SPSS

SPSS is easy enough to start. On the computers in the department, the software can be found in the “geopln software” menu item on the Start tab. Be sure to select “SPSS for Windows” rather than the SPSS “Production Facility” (the “production facility” is an advanced feature that allows a user to submit a script or set of instructions to SPSS for execution).

The first window you are likely to come across is an introductory dialog box that looks like this:

Here, you will either click “ok” automatically, in which case you will go to another dialog box to open a pre-existing SPSS file (which is the “More Files…” option highlighted under the pre-selected “Open an existing data source” radio button), or you will select the “Type in data” radio button to begin creating a dataset from scratch. More often than not, you will be using a pre-existing SPSS file that either you’ve created, or I’ve provided for you.

We will discuss opening files in the section titled “Bringing Data into SPSS.”

NOTE 1: The radio button “Open another type of file” is for other types of SPSS files (which we’ll discuss later), not for files of other non-SPSS formats such as text or Excel files.

NOTE 2: If you have the luxury of owning SPSS, clicking the “Don’t show this dialog in the future” box will prevent this dialog box from popping up in the future. However, this will not work in the GHY/PLN computer labs because the computer’s internal settings get reset every time one user logs off and another logs on.

Help Files

Please don’t disregard the help files. Not only are there topic-specific discussions about any sort of command or routine that is of interest to you, there are also tutorials to help you work through many aspects of SPSS. I urge you to use these and consider them as much a resource for this class as the textbook.

Also, consider what you are reading to be a supplement to what exists in the help and tutorial files. I’m doing my best to hit the high-points, but there are bound to be situations that this document doesn’t cover.

Don’t worry. You need to get into the habit of teaching yourself. You need to get into the habit of solving your own problems. While I will always be here to help you out, please note that often my first response to your question will be “what did the help file say?” or “how did you try to solve this problem?” If you use me as your first option, I’ll send you back. If you’ve adequately exhausted the other resources available to you, I’ll be happy to spend whatever time is necessary to help you work through your problem.

If you want to confirm any solution with me that you arrived at by yourself, I’ll be happy to go over the situation with you.

Remember: If you consistently rely on me as your first resource, you aren’t going to learn and retain much information. If you work through your own problems, the solution will stick with you forever.

Workspaces in SPSS

SPSS has a few different ways to view and describe your data, and in these workspaces you can also change the data as you see fit. The two that we will discuss are the Data View, and the Variable View.

The Data View and Variable View can be accessed by clicking on the appropriate tab at the bottom-left of the SPSS window.

Data View

The Data View is a place that stores the information in your dataset.

Data Sheet

You will likely feel comfortable in the Data View, because the environment looks similar to Excel. Specifically, what you see is an array of columns and rows, which correspond to variables and observations.

Variables

For our purposes, a variable is a piece of information that describes the observations in our data. For instance, if my data consist of students enrolled at Appalachian, one variable might be “age.” The variable “age” allows me to distinguish among my observations; however, a variable need not uniquely identify each observation in my data (as, say, your student ID number would – there are no duplicate IDs).

Variables constitute the columns in the data sheet. The header of the column displays the name of the variable. If you have no dataset loaded, or if any/all variables have no names, then you will see “v1,” “v2,” etc. This differs from Excel, wherein the column headers are likely the first row of your data. In SPSS, your variable names are treated as a separate object.

As you can see below, there are (at least) three variables in this dataset of automobiles: “mpg,” “engine,” and “horse.”

Observations

An observation is a unique unit in your data. In my hypothetical data of Appalachian students, each student is an individual observation (also known as a record). No student can occupy two observations without being identified as a duplicate, or copy. Likewise, two students cannot be commingled within a single observation.

Each row in your dataset contains all the information attributed to a specific observation. SPSS merely identifies observations starting from 1 and ending at the last observation (n). Perhaps this is a good time to get started on notation – n as I’ve used it can mean the total number of observations in a dataset, or we might also use it to denote the last observation (in case we don’t know exactly how many observations exist in our data).

To put variables and observations together in our data sheet, each number exists in a “cell” that is the intersection of a specific observation with a specific variable.

If we look at the previous figure, car 4 (that is, observation 4) has an engine displacement of 304 cubic inches, car 3 has a displacement of 318 cubic inches, and so on.

Variable View

The Variable View is a slightly different take on the same dataset we see in the Data View. In the Variable View, we can see and manipulate the variables as objects themselves rather than all the data that these variables express. In the Variable View, we are able to create or delete variables easily. We are also able to alter the characteristics of the variable itself (noting that a variable’s characteristics are different from the data, which are characteristics of any given observation!).

Variables

In the Variable View, the first thing you might notice is that the variables are displayed in rows (contrast this with the Data View). The columns describe characteristics of that variable. In general, SPSS has these guidelines about the names of variables (from the SPSS Help Files):

• The name must begin with a letter. The remaining characters can be any letter, any digit, a period, or the symbols @, #, _, or $.

• Variable names cannot end with a period.

• Variable names that end with an underscore should be avoided (to avoid conflict with variables automatically created by some procedures).

• The length of the name cannot exceed 64 bytes. Sixty-four bytes typically means 64 characters in single-byte languages (for example, English, French, German, Spanish, Italian, Hebrew, Russian, Greek, Arabic, Thai) and 32 characters in double-byte languages (for example, Japanese, Chinese, Korean).

• Blanks and special characters (for example, !, ?, ', and *) cannot be used.

• Each variable name must be unique; duplication is not allowed.

• Reserved keywords cannot be used as variable names. Reserved keywords are: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH.

• Variable names can be defined with any mixture of upper and lower case characters, and case is preserved for display purposes.

• When long variable names need to wrap on to multiple lines in output, SPSS attempts to break lines at underscores, periods, and a change from lower to upper case.

For those with experience in Excel or even in the attribute tables of ArcGIS or ArcView, you will notice that the conventions for naming variables are much more restrictive.

If you need to change the name of a variable, double-click on the name of the variable, or with that cell activated, press the F2 key.
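For those curious about the scripting side, renaming can also be done in SPSS syntax. What follows is only a minimal sketch; it assumes a column that still carries the default name “v1” and should be called “mpg” (both names are chosen purely for illustration):

```spss
* Rename the default variable v1 to mpg (both names are assumptions).
RENAME VARIABLES (v1 = mpg).
```

The new name has to follow the same naming guidelines listed above.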

Variable Labels

Do not confuse variable labels with variable names. Variable labels are important because most datasets you come across have variable names that are limited to eight characters. One can be only so descriptive with eight characters, so variable labels allow you to create an alias of sorts for the variable name. In most circumstances, the data are described in output with the variable label rather than the variable name.

As you can see below, the variable labels for the automobile data are much more descriptive than the variable names themselves:

If you need to change the label of a variable, double-click on the label, or with that cell activated, press the F2 key.
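Labels can also be assigned with the VARIABLE LABELS command in SPSS syntax. This is a sketch only: the variable names come from the automobile example, but the label text shown here is assumed for illustration and may not match your copy of the data exactly.

```spss
* Attach descriptive labels to the short variable names.
* The label text is assumed for illustration.
VARIABLE LABELS
  mpg 'Miles per Gallon'
  /engine 'Engine Displacement (cu. inches)'
  /horse 'Horsepower'.
```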

Value Labels

Value labels describe the values associated with any variable. This is necessary, because many times numbers are associated with information that has nothing to do with a number. As an example, look at the values associated with the variable “origin” in the cars.sav dataset:

You see numbers, but in actuality those numbers signify a car’s country of origin. This is done because it is actually easier to manipulate numbers than text in statistical and database programs.

A value label allows us to tell SPSS that in the variable “origin,” 1 means “American,” 2 means “European” and 3 means “Japanese.” You can see this by clicking on the value label cell for the variable, then clicking on the button with the ellipsis on it.

This brings up a dialog box that allows you to see and/or alter the definition of the labels:

Most of these dialog boxes are pretty self-explanatory. Do note that most of them have a “help” button.
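The value labels described for “origin” can also be written in SPSS syntax. A minimal sketch, using the three codes given above:

```spss
* Tell SPSS what the numeric codes in origin stand for.
VALUE LABELS origin
  1 'American'
  2 'European'
  3 'Japanese'.
```

Note that VALUE LABELS replaces whatever labels the variable already had; ADD VALUE LABELS is the command to use if you only want to add to an existing set.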

Bringing Data into SPSS

There are a number of ways for you to bring data into SPSS. You can enter the data yourself in the data view much as you would in Excel. You can also import data from other sources such as text (ASCII) files or Excel worksheets. Last, you can open up files that have been saved in SPSS’s “.sav” format. There are benefits and costs associated with all of these methods. You choose the one that works best for you, given the situation you’re in. Be aware that in most situations, you will be doing a lot of “grunt work,” gathering, entering and processing/defining your data. In many instances, this is the majority of the work you do.

Whatever method you choose, remember that YOU are responsible for the integrity of the data. Never assume that everything is just fine because SPSS didn’t beep or barf. Always make sure that you have all the variables you intended to bring into SPSS, and that you end up with the number of observations you expect. Aside from a visual inspection, it is always a good thing to run some summary statistics to be sure that nothing unusual shows up in your dataset (like an average student age of 1134).
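As a rough illustration of that kind of check, the following syntax sketch asks for a few summary statistics on a single variable. The variable name “age” is hypothetical and stands in for whatever you have just imported:

```spss
* Quick sanity check after bringing data in.
* The variable name age is hypothetical.
DESCRIPTIVES VARIABLES=age
  /STATISTICS=MEAN STDDEV MIN MAX.
```

The resulting table also reports the number of valid cases, which you can compare against the number of observations you expected. An implausible minimum, maximum or mean is usually the first sign that something went wrong on the way in.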

Entering Data by Hand

Entering data by hand is a good thing to do if you are working with a small dataset. In addition to typing the numbers into the cells, you can also copy and paste columns or rows into the Data View. You must also be aware that by entering the data by hand, you are assuming more risk of error.

You will also be required to do a fair amount of post-processing of your data. That is, you will need to create variable and value labels, specify formats and set data types.

SPSS files

SPSS data files have a native format with an “sav” extension. That is, if you save the file name as “cars” the SPSS data file will be saved to your disk as “cars.sav.”

You can open these files from the menu:

Or you can navigate outside of SPSS to your data folder, double click on the data file and SPSS will start up and open that file. Of course, this assumes you have SPSS on your computer, and the computer has associated *.sav files as SPSS data files.
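Opening a saved data file also has a syntax equivalent, the GET FILE command. In this sketch the folder is a hypothetical path; substitute the location of your own file:

```spss
* Open a saved SPSS data file; the folder shown is a hypothetical path.
GET FILE='C:\data\cars.sav'.
```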

Importing Files

Excel

SPSS imports worksheets from Excel fairly easily. However, you must make sure that the structure of the data as they exist in the worksheet conforms to the RxC (row-by-column) format of SPSS. This can be done either as a full worksheet, or as part of a worksheet. Note that you can only import single sheets from an Excel workbook, not an entire workbook.

As you can see in the image, in a full sheet the data start in column A, row 1 (that is, cell A1). It is also assumed that there is no other information on this worksheet other than the data in rectangular format.

If the data are not in this format, you may still import the worksheet, but only as a partial extraction. Take a look at an example:

These are the same data as in the previous image, with the exception that this has some “header” information associated with it. The header tells us where the data come from and how the variables are described. The data actually start, in rectangular format, in cell A12.

To import this worksheet, a little knowledge about the data is necessary. That is, we need to know the array the data occupy. An array can span single or multiple rows or columns. In this example, the array is five columns by 101 rows: the columns (A through E) are the variables, and the rows (12 through 112) are one row of variable names followed by the 100 NC counties. We define the array by its top-left cell and its bottom-right cell, separating the two coordinates with a colon: A12:E112.

The following steps allow you to import an Excel worksheet into SPSS:

1) Be sure that the Excel worksheet is not open or active.

2) Bring up the Open File dialog box (File…Open…Data…) and specify the Excel file format:

3) Navigate to where your Excel file is located and open it. You will be presented with another dialog box:

4) Clicking on the down arrow in the “Worksheet” menu item shows all the worksheets in the workbook:

5) In my “NC Demog Profile 1.xls” you can see that there are five worksheets, and SPSS has automatically specified the full extent of the information in that worksheet. Compare the “Part of Sheet” and “Full Sheet” with the images above. “Part of Sheet” is the worksheet with the header information, yet this dialog box doesn’t make that distinction (look at the specification of the array inside the brackets after the worksheet name). This would be okay, however, if we were importing “Full Sheet.”

6) If you wanted to import “Full Sheet,” it would be appropriate to do that from here by selecting the worksheet name and clicking “OK.” This is what you would see:

7) If you need to import a selection from within a worksheet, such as the same data from “Part of Sheet,” you need to specify the range of data (that is, the array):

8) After that, hit “OK” and this is what you will see, which is now indistinguishable from the data imported from the “Full Sheet” worksheet:

9) Note that you will still have to define variable names, labels and other matters.
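For reference, the partial-sheet import in the steps above corresponds roughly to the following GET DATA syntax. Treat it as a sketch under assumptions: the folder is hypothetical, and it assumes the first row of the range holds the variable names.

```spss
* Read only the rectangular block A12:E112 from one worksheet.
* The first row of the range (row 12) supplies the variable names.
* The file location is an assumption; give the full path to your own copy.
GET DATA
  /TYPE=XLS
  /FILE='C:\data\NC Demog Profile 1.xls'
  /SHEET=NAME 'Part of Sheet'
  /CELLRANGE=RANGE 'A12:E112'
  /READNAMES=ON.
```

Replacing the range specification with /CELLRANGE=FULL reads the whole worksheet instead, which is what you would want for “Full Sheet.”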

Text Files

Text files are fairly easy to import, although they can be slightly more complicated than the Excel workbook. In fact, if you’ve ever opened up a text file in Excel, it is nearly an identical process.

A text file is “text” inasmuch as the file is not a binary file. Many text files are actually full of numbers, although they are usually a mix of numbers and “text” as we know text to be. Another way of describing a text file is as an ASCII file.

Most text files you come across will be delimited in one way or another. A text delimiter is a way to distinguish one variable from another on an observation-by-observation basis.

The construction of a text file is rectangular, as each row is an observation and each observation is described by any number of variables. Without the benefit of a proper tool to look at a text file, here is what the worksheet “Full Sheet” looks like in text format:

While you can get an idea of the structure of the file, this is true only because there aren’t very many variables to consider.

The first thing you should notice is that the first row consists of the variable names. Each row after that is an observation (in this case, a county in North Carolina) and information associated with each county in the order of the variables as listed in the first row.

The data in this file are “comma-delimited,” meaning that each bit of information for a given observation is separated by a comma. In this context, think of a comma as being a cell separator in Excel.

You will also notice that the variable “c_name” is described in the data with a comma (e.g., “Burke County, North Carolina”). Because the comma between the county name and North Carolina is contained between quotation marks, most programs will recognize this as being part of the data, and not as a delimiter.

This is convenient, but it can also lead to some unusual outcomes if it slips your mind. This is likely to happen if you have comma-delimited data, but whoever created the file wrote the numbers with thousands separators (e.g., 32,300 instead of 32300). Without quotation marks around it, 32,300 would be read as two separate values, 32 and 300. SPSS will import whatever you tell it to import (in most circumstances) and it is up to you to verify that it was all done correctly.

SPSS has a “wizard” process by which text data are imported. We’ll step through this with the “NC Demog Profile 1.csv” file. If you (or someone else) created a comma-separated file in Excel, it is created with a “csv” extension (Comma Separated Values). The separators can be most anything, but files are usually tab-, space- or comma-separated.

You begin the process by selecting the “Read Text Data…” option in the File menu item:

After navigating to, and opening the text file you want, you will start a process by which you will read the text file into SPSS:

Step 1 provides you with a brief glimpse into your data. The first few rows shown in the window match what we saw in the text file. So far, so good. You will also see that you have the opportunity to apply a predefined format to your data.

This option is only important if you’ve already defined a data set exactly like the one you are currently importing (i.e., variable names, labels, value labels, data types, etc.). This is useful if you have multiple files describing the same characteristics, in the same variable order, of different observations. For our purposes here, this isn’t necessary.

So long as we can confirm that the data we see is from the file we want, let’s click “Next >”.

Step 2 asks us to confirm the arrangement of the variables and whether the variable names are at the top of the file. Regardless of the delimiter (comma, tab, other) we select “delimited.” A fixed width file means that data for each variable occupy a specific number of columns.

We verify that what we are telling SPSS to expect is indeed what we are feeding SPSS, then we hit “Next.”

Step 3 asks us to confirm that the variable names are on the first line (as such, the data begin on the second row – line number 2). Since each row is a unique observation or case, we let SPSS know; keep in mind that it is possible to have observations or cases span multiple lines, but most of those datasets are likely old – I don’t come across many of those nowadays.

We also tell SPSS to import the entire file. Note for now that it is possible to import part of the file either by selecting the first “n” cases, or by randomly selecting a certain percentage of observations. It is unlikely in this class that you will do anything but import the entire dataset.

I will represent the Wizard window for Step 4 in multiple stages. Remember that our data are delimited with commas. As you can see, the default setting in this wizard is for data to be delimited by spaces and commas (don’t ask me why…):

You will notice a number of anomalies. Let’s deconstruct what we see.

1. The variable “c_fips” is just fine.

2. “c_name” appears fine, but it is actually split because of the space between the county name and the word “County.” There are bound to be other anomalies with, say, “New Hanover County” because SPSS will interpret the space between “New” and “Hanover” as a delimiter.

3. Note the change in variable names between “V03” and “V6.” We actually supplied the variable names up to “V03,” and SPSS automatically started naming new variables as they came up even though we didn’t provide the names. Note that “V6” is short for “variable 6” and is the sixth column you see.

What about the question here, “What is the text qualifier?” A text qualifier encloses a string of text, so that all spaces, commas and anything else inside it are considered part of that variable’s information rather than anything SPSS is supposed to be interpreting. In the next image of Step 4, we will identify only commas as the delimiter and double quotes as the text qualifier:

Beautiful.

Step 5 allows us to change the type of variable that exists in our data. For the first variable “c_fips,” I pulled down the data format menu to give you a glimpse into the variable types that are possible. SPSS does a good job of guessing, but it is always nice to verify this by eye. If I’ve told you once, I’ve told you a million times, YOU are responsible for your data.

Step 6 wraps things up. There really isn’t anything you need to do here other than press “Finish,” but just note that you can save the file formats for future use (remember Step 1’s predefined formats?) or paste the syntax (which is the scripting language that you would have had to write if this wasn’t a Windows program – I’ll spare you the sob stories of teaching myself SPSS version 4 that was just ported to MS-DOS when monkeys descended from the trees). It would look something like this:

GET DATA /TYPE = TXT
 /FILE = 'C:\Documents and Settings\Crepeau\My Documents\excel\Quant\NC Demog Profile 1.csv'
 /DELCASE = LINE
 /DELIMITERS = ","
 /QUALIFIER = '"'
 /ARRANGEMENT = DELIMITED
 /FIRSTCASE = 2
 /IMPORTCASE = ALL
 /VARIABLES =
 c_fips F2.1
 c_name A33
 V01 F6.2
 V02 F6.2
 V03 F6.2
 .

So…press “Finish” and this is what you should see:

Other Formats

Note for future purposes that SPSS can open a number of different file types from spreadsheets, databases and other statistics programs. I’ll leave it to you to explore the possibilities.

Working with Data

The working title “Working with Data” describes what you might call process-oriented commands rather than analysis-oriented commands. These sorts of commands are good for altering or redefining the dataset itself or altering the data.

Manipulating Data Files

The commands for altering the data are under the “Data” menu item. Pulling that menu item down reveals the following:

The most important capabilities for us here will likely be the “Merge Files,” “Aggregate…” and “Select Cases…” commands.

Merging files merely means putting physically separate files into a single dataset. Two ways of doing this are to place all new cases at the end of an existing file (which assumes that both datasets contain identical variables but unique observations), or to add new variables to an existing file (which assumes that the observations are identical in both datasets but the variables are unique).
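In SPSS syntax, these two kinds of merge correspond to the ADD FILES and MATCH FILES commands. The sketch below is illustrative only; the file names and the key variable “student_id” are hypothetical.

```spss
* Add cases: stack a second file with the same variables under the open dataset.
ADD FILES /FILE=* /FILE='C:\data\more_students.sav'.
EXECUTE.

* Add variables: match a second file to the open dataset on a shared key.
* Both files must already be sorted by the key variable.
MATCH FILES /FILE=* /FILE='C:\data\student_grades.sav' /BY student_id.
EXECUTE.
```

The asterisk in /FILE=* refers to the dataset currently open in SPSS. If the files are not yet in key order, SORT CASES BY student_id can be run on each one before matching.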

Aggregating a dataset enables us to “squash” a dataset by a common variable. As an example, if you look at the Cars.sav dataset that SPSS provides for you, each observation is a car. We can aggregate the dataset by country of manufacture. Ultimately what we end up with is a dataset with three observations (because there are three countries of manufacture), and all the information about the individual cars is summarized or “aggregated” within their country of origin.
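A syntax sketch of this aggregation might look like the following. It assumes the variables “origin,” “mpg” and “horse” from the cars data; the output path and the names of the new summary variables are made up for illustration.

```spss
* Collapse the car-level data to one row per country of origin.
* The output path and the new variable names are assumptions.
AGGREGATE
  /OUTFILE='C:\data\cars_by_origin.sav'
  /BREAK=origin
  /mean_mpg = MEAN(mpg)
  /mean_horse = MEAN(horse)
  /n_cars = N.
```

The saved file would hold three rows, one per value of “origin,” each with the group means and a count of the cars in that group.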

By Selecting Cases, we can tell SPSS to perform functions on observations that meet specific criteria (and therefore NOT perform them on observations that don’t meet those criteria). This is useful for data manipulation or for data analysis.
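One way to express a selection in SPSS syntax is with a filter variable. This sketch assumes the cars data, where “origin” equal to 1 means “American”; the flag name “american” is hypothetical.

```spss
* Build a 0/1 flag that is 1 for American cars (origin = 1).
COMPUTE american = (origin = 1).
* Only cases with a nonzero flag are used until the filter is switched off.
FILTER BY american.
DESCRIPTIVES VARIABLES=mpg.
FILTER OFF.
```

Filtering leaves the unselected cases in the dataset and merely ignores them, so nothing is lost once the filter is turned off again.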

We won’t step through the specifics right now. Just know it is possible (or perhaps explore these commands yourself).

Transforming Variables

You can manipulate the variables through the “Transform” menu item. Pulling it down reveals the following:

The two most important commands to us will be the “Compute…” and the “Recode” commands.

Compute allows us to create a new variable based on computation. This computation can be a combination of:

• A number and an existing variable

• A function and an existing variable

• Two or more existing variables

• A number and a function

• Even more combinations of the above
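To make these combinations concrete, here is a small syntax sketch using variables from the cars data. The new variable names are invented for illustration, and the particular calculations are arbitrary examples rather than anything required for the course.

```spss
* A number and an existing variable (kilometers per gallon).
COMPUTE km_per_gal = mpg * 1.609.
* A function and an existing variable (natural log of horsepower).
COMPUTE ln_horse = LN(horse).
* Two existing variables (horsepower per cubic inch of displacement).
COMPUTE hp_per_ci = horse / engine.
EXECUTE.
```

Each COMPUTE adds a new column to the Data View and leaves the original variables untouched.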

Recoding a variable allows us to systematically alter the values of an existing variable. These alterations can either be fed back into the original variable being recoded (thus altering its information forever), or they can result in a completely new variable. This is useful if we wish to create categories out of a variable with continuous information, reorder the layout of the variables or define specific values as “missing” (you’ll know what that means soon enough).
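As a sketch of creating categories from continuous information, the syntax below recodes “mpg” from the cars data into a new three-category variable. The cut points and the name “mpg_cat” are arbitrary assumptions.

```spss
* Recode continuous mpg into three categories in a new variable.
* The cut points (20 and 30) are arbitrary choices for illustration.
RECODE mpg
  (LOWEST THRU 20 = 1)
  (20 THRU 30 = 2)
  (30 THRU HIGHEST = 3)
  INTO mpg_cat.
VALUE LABELS mpg_cat 1 'Low' 2 'Medium' 3 'High'.
EXECUTE.
```

A value is recoded by the first specification it matches, so a car with exactly 20 mpg lands in category 1. Leaving off the INTO clause would write the new codes back into “mpg” itself, which is the irreversible option described above.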

Analyzing Data

We won’t go into great detail on the specific analytical commands. Rather, we will briefly talk about them. All analysis can be accessed through the “Analyze” menu item. Pulling it down reveals the following:

SPSS does quite a bit for us; however, we will only scratch the surface in this class. The menu items we will likely use are: Descriptive Statistics, Compare Means, Correlate, Regression and Nonparametric Tests. The sideways triangle next to each item indicates more specific analytical tools under these headings.

We will discuss each in more detail as we use them.

Everything else is icing on the cake.
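For the curious, three of these menu items have short syntax counterparts. The sketch below uses “mpg” and “horse” from the cars data purely as stand-in variables; it is meant to show the general shape of the commands, not a recommended analysis.

```spss
* Descriptive Statistics menu.
DESCRIPTIVES VARIABLES=mpg horse.
* Correlate menu (bivariate correlations).
CORRELATIONS /VARIABLES=mpg horse.
* Regression menu (linear regression of mpg on horse).
REGRESSION
  /DEPENDENT mpg
  /METHOD=ENTER horse.
```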

Generating Output

Whether you are working with output in the form of tables or graphs, each can be considered an object and treated in a similar manner. The output you generate is actually placed in a separate output file and is organized in a hierarchical manner. These objects can be placed in other applications such as a word processor or graphics application.

The Output Window

The output window that pops up after an analytical command gets executed is actually a separate file, in exactly the same way that the data are in a data file. In fact these files can be opened in a way similar to data files:

Note that “Output…” is an option just as “Data…” is an option. You can create a new output file if you wish, but I really don’t know why you would want to create a new, blank output file. You can open an existing output file (saved from a previous SPSS session) and once open, all new output is appended onto the end of that file. If you have multiple output files open, the tables or graphics get appended onto the last opened window.

Organization of the Output Window

The total output window is organized into two main areas:

On the right-hand side, you will see the output tables/graphics and any of the associated titles and other information. On the left, you will see a directory tree that shows you the hierarchical organization of what you see on the right.

In the image below, you will notice that I’ve added a scatterplot graphic to the existing table. Note that each group has its own separate hierarchy:

“Descriptives” shows the separate elements or objects of the Descriptives table, and “Graph” shows the elements or objects of the Scatterplot. When you click on the object in the outline frame, the object also gets highlighted or selected in the output frame.

For most objects, there are three elements. “Title” usually tells you if it is a graph or table. “Notes” are technical notes associated with the output and are closed by default (notice how its icon is a closed book, while the other two icons are open books). Last is the actual table or graph. Double-clicking on any of these icons opens or closes that object.

Objects in the Output Window

You have a limited ability to edit tables and graphs once they appear in the output window. While this is a shortcoming, just realize that it takes very little time to delete the table or graph you don’t want and run the procedure over again with the parameters you want.

You still have the ability to edit the objects once they are in the output window – just double click on that object for it to appear in an editing window – but you’ll soon realize that there isn’t much flexibility.

While the two parts of the output window (file) are a good way to navigate through the output, they are also a good way to reorganize your output. You might find out soon enough that you can generate way more output than is needed for any specific task. With the output window, you can delete, reorder and annotate your output so that at a future time you can revisit it without spending much time reacquainting yourself with it. Even with this ability, I’m sure you’ll find yourself saying “what the hell is this supposed to be?” quite often.

Using Output in Other Applications

Either for assignments or for other purposes, you might need to export or copy and paste your output into another application. This is very easy in SPSS, although there is at least one thing to be aware of: there is a slight difference between copying what you’ve selected, and copying AS AN OBJECT what you’ve selected.

By copying a table and pasting it into, say, Word, you are copying a table, and you’ll notice that you can edit what you’ve pasted in the same way as you edit a table in Word (with rows, columns, etc.). If you copy the object and paste it, it is the same as pasting an image into Word. One method is more flexible than the other, not necessarily better than the other.

Concluding Remarks

This document is meant as an introduction to using SPSS, not a definitive document. Use it to introduce yourself to SPSS, and be sure to use the help documentation and even the tutorials where necessary.

As time goes on in this course, we will go deeper into the specific commands and routines that SPSS provides. Here, with the exception of bringing in data to SPSS, we’ve just skimmed the surface of SPSS’s capabilities.