The writing is exceptionally clear and easy to follow, and precise definitions are provided to avoid confusion. Examples are used to illustrate each concept, and those examples are, like everything in this book, clear and logically presented. Sample SAS output is provided for every analysis, with each part labeled and thoroughly explained so the reader understands the results.
Sheri Bauman, Ph.D. Assistant Professor Department of Educational Psychology University of Arizona, Tucson
[Larry Hatcher] once again manages to provide clear, concise, and detailed explanations of the SAS program and procedures, including appropriate examples and sample write-ups.
Frank Pajares Winship Distinguished Research Professor Emory University
The Student Guide and the Exercises books are excellent choices for use in quantitative courses in psychology and education.
Bert W. Westbrook, Ph.D. Professor of Psychology Alumni Distinguished Undergraduate Professor North Carolina State University
Step-by-Step Basic Statistics Using SAS®: Student Guide
Cary, NC: SAS Institute Inc.
Copyright © 2003 by SAS Institute Inc., Cary, NC, USA ISBN 1-59047-148-2
All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513. 1st printing, April 2003
SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hardcopy books, visit the SAS Publishing Web site at support.sas.com/pubs or call 1-800-727-3228.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Contents
Acknowledgments
Chapter 1: Using This Student Guide
   Introduction
   Introduction to the SAS System
   Contents of This Student Guide
   Conclusion
Chapter 2: Terms and Concepts Used in This Guide
   Introduction
   Research Hypotheses and Statistical Hypotheses
   Data, Variables, Values, and Observations
   Classifying Variables According to Their Scales of Measurement
   Classifying Variables According to the Number of Values They Display
   Basic Approaches to Research
   Using Type-of-Variable Figures to Represent Dependent and Independent Variables
   The Three Types of SAS Files
   Conclusion
Chapter 3: Tutorial: Writing and Submitting SAS Programs
   Introduction
   Tutorial Part I: Basics of Using the SAS Windowing Environment
   Tutorial Part II: Opening and Editing an Existing SAS Program
   Tutorial Part III: Submitting a Program with an Error
   Tutorial Part IV: Practicing What You Have Learned
   Summary of Steps for Frequently Performed Activities
   Controlling the Size of the Output Page with the OPTIONS Statement
   For More Information
   Conclusion
Chapter 4: Data Input
   Introduction
   Example 4.1: Creating a Simple SAS Data Set
   Example 4.2: A More Complex Data Set
   Using PROC MEANS and PROC FREQ to Identify Obvious Problems with the Data Set
   Using PROC PRINT to Create a Printout of Raw Data
   The Complete SAS Program
   Conclusion
Chapter 5: Creating Frequency Tables
   Introduction
   Example 5.1: A Political Donation Study
   Examples of Questions That Can Be Answered by Interpreting a Frequency Table
   Conclusion
Chapter 6: Creating Graphs
   Introduction
   Reprise of Example 5.1: The Political Donation Study
   Using PROC CHART to Create a Frequency Bar Chart
   Using PROC CHART to Plot Means for Subgroups
   Conclusion
Chapter 7: Measures of Central Tendency and Variability
   Introduction
   Reprise of Example 5.1: The Political Donation Study
   Measures of Central Tendency: The Mode, Median, and Mean
   Interpreting a Stem-and-Leaf Plot Created by PROC UNIVARIATE
   Using PROC UNIVARIATE to Determine the Shape of Distributions
   Simple Measures of Variability: The Range, the Interquartile Range, and the Semi-Interquartile Range
   More Complex Measures of Central Tendency: The Variance and Standard Deviation
   Variance and Standard Deviation: Three Formulas
   Using PROC MEANS to Compute the Variance and Standard Deviation
   Conclusion
Chapter 8: Creating and Modifying Variables and Data Sets
   Introduction
   Example 8.1: An Achievement Motivation Study
   Using PROC PRINT to Create a Printout of Raw Data
   Where to Place Data Manipulation and Data Subsetting Statements
   Basic Data Manipulation
   Recoding a Reversed Item and Creating a New Variable for the Achievement Motivation Study
   Using IF-THEN Control Statements
   Data Subsetting
   Combining a Large Number of Data Manipulation and Data Subsetting Statements in a Single Program
   Conclusion
Chapter 9: z Scores
   Introduction
   Example 9.1: Comparing Mid-Term Test Scores for Two Courses
   Converting a Single Raw-Score Variable into a z-Score Variable
   Converting Two Raw-Score Variables into z-Score Variables
   Standardizing Variables with PROC STANDARD
Chapter 10: Bivariate Correlation
   Introduction
   Situations Appropriate for the Pearson Correlation Coefficient
   Interpreting the Sign and Size of a Correlation Coefficient
   Interpreting the Statistical Significance of a Correlation Coefficient
   Problems with Using Correlations to Investigate Causal Relationships
   Example 10.1: Correlating Weight Loss with a Variety of Predictor Variables
   Using PROC PLOT to Create a Scattergram
   Using PROC CORR to Compute the Pearson Correlation between Two Variables
   Using PROC CORR to Compute All Possible Correlations for a Group of Variables
   Summarizing Results Involving a Nonsignificant Correlation
   Using the VAR and WITH Statements to Suppress the Printing of Some Correlations
   Computing the Spearman Rank-Order Correlation Coefficient for Ordinal-Level Variables
   Some Options Available with PROC CORR
   Problems with Seeking Significant Results
   Conclusion
Chapter 11: Bivariate Regression
   Introduction
   Choosing between the Terms Predictor Variable, Criterion Variable, Independent Variable, and Dependent Variable
   Situations Appropriate for Bivariate Linear Regression
   Example 11.1: Predicting Weight Loss from a Variety of Predictor Variables
   Using PROC REG: Example with a Significant Positive Regression Coefficient
   Using PROC REG: Example with a Significant Negative Regression Coefficient
   Using PROC REG: Example with a Nonsignificant Regression Coefficient
   Conclusion
Chapter 12: Single-Sample t Test
   Introduction
   Situations Appropriate for the Single-Sample t Test
   Results Produced in a Single-Sample t Test
   Example 12.1: Assessing Spatial Recall in a Reading Comprehension Task (Significant Results)
   One-Tailed Tests versus Two-Tailed Tests
   Example 12.2: An Illustration of Nonsignificant Results
   Conclusion
Chapter 13: Independent-Samples t Test
   Introduction
   Situations Appropriate for the Independent-Samples t Test
   Example 13.1: Observed Consequences for Modeled Aggression: Effects on Subsequent Subject Aggression (Significant Differences)
   Example 13.2: An Illustration of Results Showing Nonsignificant Differences
   Conclusion
Chapter 14: Paired-Samples t Test
   Introduction
   Situations Appropriate for the Paired-Samples t Test
   Similarities between the Paired-Samples t Test and the Single-Sample t Test
   Results Produced in a Paired-Samples t Test
   Example 14.1: Women’s Responses to Emotional versus Sexual Infidelity
   Example 14.2: An Illustration of Results Showing Nonsignificant Differences
   Conclusion
Chapter 15: One-Way ANOVA with One Between-Subjects Factor
   Introduction
   Situations Appropriate for One-Way ANOVA with One Between-Subjects Factor
   A Study Investigating Aggression
   Treatment Effects, Multiple Comparison Procedures, and a New Index of Effect Size
   Some Possible Results from a One-Way ANOVA
   Example 15.1: One-Way ANOVA Revealing a Significant Treatment Effect
   Example 15.2: One-Way ANOVA Revealing a Nonsignificant Treatment Effect
   Conclusion
Chapter 16: Factorial ANOVA with Two Between-Subjects Factors
   Introduction
   Situations Appropriate for Factorial ANOVA with Two Between-Subjects Factors
   Using Factorial Designs in Research
   A Different Study Investigating Aggression
   Understanding Figures That Illustrate the Results of a Factorial ANOVA
   Some Possible Results from a Factorial ANOVA
   Example of a Factorial ANOVA Revealing Two Significant Main Effects and a Nonsignificant Interaction
   Example of a Factorial ANOVA Revealing Nonsignificant Main Effects and a Nonsignificant Interaction
   Example of a Factorial ANOVA Revealing a Significant Interaction
   Using the LSMEANS Statement to Analyze Data from Unbalanced Designs
   Learning More about Using SAS for Factorial ANOVA
   Conclusion
Chapter 17: Chi-Square Test of Independence
   Introduction
   Situations That Are Appropriate for the Chi-Square Test of Independence
   Using Two-Way Classification Tables
   Results Produced in a Chi-Square Test of Independence
   A Study Investigating Computer Preferences
   Example of a Chi-Square Test That Reveals a Significant Relationship
   Example of a Chi-Square Test That Reveals a Nonsignificant Relationship
   Computing Chi-Square from Raw Data
   Conclusion
References
Acknowledgments
During the development of these books, Caroline Brickley, Gretchen Rorie Harwood, Stephenie Joyner, Sue Kocher, Patsy Poole, and Hanna Schoenrock served as editors. All were positive, supportive, and helpful. They made the books stronger, and I thank them for their guidance.
A number of other people at SAS made valuable contributions in a variety of areas. My sincere thanks go to those who reviewed the books for technical accuracy and readability: Jim Ashton, Jim Ford, Marty Hultgren, Catherine Lumsden, Elizabeth Maldonado, Paul Marovich, Ted Meleky, Annette Sanders, Kevin Scott, Ron Statt, and Morris Vaughan. I also thank Candy Farrell and Karen Perkins for production and design; Joan Stout for indexing; Cindy Puryear and Patricia Spain for marketing; and Cate Parrish for the cover designs.
Chapter 1: Using This Student Guide
Introduction
   Overview
   Intended Audience and Level of Proficiency
   Platform and Version
   Materials Needed
Introduction to the SAS System
   Why Do You Need This Student Guide?
   What Is the SAS System?
   Who Uses SAS?
   Using the SAS System for Statistical Analyses
Contents of This Student Guide
   Overview
   Chapter 2: Terms and Concepts Used in This Guide
   Chapter 3: Tutorial: Using the SAS Windowing Environment to Write and Submit SAS Programs
   Chapter 4: Data Input
   Chapter 5: Creating Frequency Tables
   Chapter 6: Creating Graphs
   Chapter 7: Measures of Central Tendency and Variability
   Chapter 8: Creating and Modifying Variables and Data Sets
   Chapter 9: Standardized Scores (z Scores)
   Chapter 10: Bivariate Correlation
   Chapter 11: Bivariate Regression
   Chapter 12: Single-Sample t Test
   Chapter 13: Independent-Samples t Test
   Chapter 14: Paired-Samples t Test
   Chapter 15: One-Way ANOVA with One Between-Subjects Factor
   Chapter 16: Factorial ANOVA with Two Between-Subjects Factors
   Chapter 17: Chi-Square Test of Independence
   References
Introduction
Overview
This chapter introduces you to the SAS System, a computer application that can be used to perform statistical analyses. It explains what SAS is and where it is installed, and it describes some of the advantages of using SAS for data analysis. Finally, it briefly summarizes what you will learn in each of the chapters that make up this Student Guide.
Intended Audience and Level of Proficiency
This guide is intended for those who want to learn how to use SAS to perform elementary statistical analyses. The guide assumes that many students using it have not already taken a course on elementary statistics. To assist these students, this guide briefly reviews basic terms and concepts in statistics at an elementary level. It was designed to be easily understood by first- and second-year college students.
This book was also designed to be user-friendly to those who may have little or no experience with personal computers. The beginning of Chapter 3, “Tutorial: Using the SAS Windowing Environment to Write and Submit SAS Programs,” reviews basic concepts in using Microsoft Windows, such as selecting menus, double-clicking icons, and so forth. Those who already have experience in using Windows will be able to quickly skim through this elementary material.
Platform and Version
This guide shows how to use the SAS System for Windows, as opposed to other operating environments. This is most apparent in Chapter 3, “Tutorial: Using the SAS Windowing Environment to Write and Submit SAS Programs.” However, the remaining chapters show how to write SAS code to perform statistical analyses, and most of this material will be useful to all SAS users, regardless of the operating environment. This is because, for the most part, the same SAS code can be used in a wide variety of operating environments to obtain the same results.
This book was designed for those using the SAS System Version 8 and later versions. It may also be helpful to those using earlier versions of SAS (such as V6 or V7). However, if you are using one of these earlier versions, it is likely that some of the SAS system options described here are not available with your version. It is also likely that some of the SAS output that you obtain will be arranged differently than the output that is presented here.
Materials Needed
To complete the activities described in this book, you will need
• access to a personal computer on which the SAS System for Windows has been installed,
• one (and preferably two) 3.5-inch disks, formatted for IBM PCs (or some other type of storage media).
Some students using this book will also use its companion volume, Step-by-Step Basic Statistics Using SAS: Exercises. The chapters in the Exercises book parallel most of the chapters contained in this Student Guide. Each chapter in the Exercises book contains two assignments for students to complete. Complete solutions are provided for the odd-numbered exercises, but not for the even-numbered ones. The Exercises book can give you useful practice in learning how to use SAS, but it is not absolutely required.
Introduction to the SAS System
Why Do You Need This Student Guide?
This Student Guide shows you how to use a computer application called the SAS System to perform elementary statistical analyses. Until recently, students in elementary statistics courses typically performed statistical computations by hand or with a pocket calculator. In recent years, however, the increased availability of computers has made it possible for students to also use statistical software packages such as SPSS and the SAS System to perform these analyses. This latter approach allows students to focus more on conceptual issues in statistics, and spend less time on the mechanics of performing mathematical operations by hand. Step by step, this Student Guide will introduce you to the SAS System, and will show you how to use it to perform a variety of statistical analyses that are
What Is the SAS System?
The SAS System is a modular, integrated, and hardware-independent application. It is used as an information delivery system by business organizations, governments, and universities worldwide.
SAS is used for virtually every aspect of information management in organizations, including decision support, project management, financial analysis, quality improvement, data warehousing, report writing, and presentations. However, this guide will focus on just one aspect of SAS: its ability to perform the types of statistical analyses that are appropriate for research in the social sciences and education.
By the time you have completed this text, you will have accomplished two objectives: you will have learned how to perform elementary statistical analyses using SAS, and you will have become familiar with a widely used information delivery system.
Who Uses SAS?
The SAS System is widely used in business organizations and universities. Consider the following statistics from July 2002:
• SAS supports over 40 operating environments, including Windows, OS/2, and UNIX.
• SAS Institute’s computer software products are installed at over 38,400 sites in 115 countries.
• Approximately 71% of SAS installations are in business locations, 18% are education sites, and 11% are government sites. It is used for teaching and research at about 3,000 university locations.
• It is estimated that SAS software products are used by more than 3.5 million people worldwide.
• 90% of all Fortune 500 companies are SAS clients.
Using the SAS System for Statistical Analyses
SAS is a particularly powerful tool for social scientists and educators because it allows them to easily perform virtually any type of statistical analysis that may be required in their research. SAS is comprehensive enough to perform the most sophisticated multivariate analyses, but is so easy to use that undergraduates can perform simple analyses after only a short period of instruction.
In a sense, the SAS System may be viewed as a library of prewritten statistical algorithms. By submitting a brief SAS program, you can access a procedure from the library and use it to analyze a set of data. For example, below are the SAS statements used to call up the algorithm that calculates Pearson correlation coefficients:
PROC CORR DATA=D1;
RUN;
The preceding statements will cause SAS to compute the Pearson correlation between every possible pair of numeric variables in your data set. Being able to call up complex procedures with such simple statements is what makes SAS so powerful. By contrast, if you had to prepare your own programs to compute Pearson correlations by using a programming language such as FORTRAN or BASIC, it would require many statements, and there would be many opportunities for error. With SAS, most of the work has already been completed, and you are able to focus on the results of the analysis rather than on the mechanics of obtaining those results.
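To see the PROC CORR statements in a complete program, consider the minimal sketch below. The variable names HOURS and SCORE and all of the data values are invented for illustration; only the data set name D1 comes from the example above.

```sas
* Create a small hypothetical data set named D1 (invented values);
DATA D1;
   INPUT HOURS SCORE;
DATALINES;
2 65
4 70
6 82
8 88
10 95
;
RUN;

* Compute Pearson correlations for every pair of numeric variables;
PROC CORR DATA=D1;
RUN;
```

Because D1 contains only two numeric variables, the output would contain a single correlation, the one between HOURS and SCORE.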
Contents of This Student Guide
Overview
This guide has two objectives: to teach the basics of using SAS in general and, more specifically, to show how to use SAS procedures to perform elementary statistical analyses. Chapters 1–4 provide an overview of the basics of using SAS. The remaining chapters cover statistical concepts in a sequence that is representative of the sequence followed in most elementary statistics textbooks.
Chapters 10–17 introduce you to inferential statistical procedures (the type of procedures that are most often used to analyze data from research). Each chapter shows you how to conduct the analysis from beginning to end. Each chapter also provides an example of how the analysis might be summarized for publication in an academic journal in the social sciences or education. For the most part, these summaries are written according to the guidelines provided in the Publication Manual of the American Psychological Association (1994).
Many students using this book will also use its companion volume, Step-by-Step Basic Statistics Using SAS: Exercises. For Chapters 3–17 in this Student Guide, the corresponding chapter in the Exercises book provides you with a hands-on exercise that enables you to practice the data analysis skills that you are learning.
The following sections provide a summary of the contents of the remaining chapters in this guide.
Chapter 2: Terms and Concepts Used in This Guide
Chapter 2 defines some important terms related to research and statistics that will be used throughout this guide. It also introduces you to the three types of files that you will work with during a typical session with SAS: the SAS program, the SAS log, and the SAS output file.
Chapter 3: Tutorial: Using the SAS Windowing Environment to Write and Submit SAS Programs
The SAS windowing environment is a powerful application that you will use to create, edit, and submit SAS programs. You will also use it to review your SAS logs and output. Chapter 3 provides a tutorial that teaches you how to use this application. Step by step, it shows you how to write simple SAS programs and interpret their results. By the end of this chapter, you should be ready to use the SAS windowing environment to write and submit SAS programs on your own.
Chapter 4: Data Input
Chapter 4 shows you how to use the DATA and INPUT statements to create SAS data sets. You will learn how to read both numeric and character variables by using a simple, list style for data input. By the end of the chapter, you will be prepared to input the data sets that will be presented throughout the remainder of this guide.
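As a preview of list-style input, here is a minimal sketch. The variable names SUBJECT, SEX, and AGE and the data values are hypothetical; the dollar sign after SEX tells SAS that SEX is a character variable.

```sas
* Read one numeric ID, one character variable, and one numeric variable;
DATA D1;
   INPUT SUBJECT SEX $ AGE;
DATALINES;
1 F 21
2 M 34
3 F 28
;
RUN;

* Print the raw data to verify that it was read correctly;
PROC PRINT DATA=D1;
RUN;
```

With list input, the values on each data line only need to be separated by at least one blank, which is why this style is convenient for small data sets.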
Chapter 5: Creating Frequency Tables
Chapter 5 shows you how to create frequency tables that are useful for understanding your data and answering some types of research questions. For example, imagine that you ask a sample of 150 people to tell you their age. If you then used SAS to create a frequency table for this age variable, you would be able to easily answer questions such as
• How many people are age 30?
• How many people are age 30 or younger?
• What percent of people are age 45?
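The questions above can be explored with a small sketch that uses PROC FREQ. The eight ages below are invented (far fewer than the 150 people in the example), and the data set name D1 is an assumption.

```sas
* The double trailing at sign (@@) reads several observations per line;
DATA D1;
   INPUT AGE @@;
DATALINES;
30 45 30 27 45 30 52 27
;
RUN;

* Create a frequency table for the AGE variable;
PROC FREQ DATA=D1;
   TABLES AGE;
RUN;
```

For these eight invented values, the Frequency column would show 3 people at age 30, the Cumulative Frequency column would show 5 people age 30 or younger, and the Percent column would show 25.00 for age 45.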
Chapter 6: Creating Graphs
Chapter 6 shows you how to use SAS to create frequency bar charts, that is, bar charts that indicate the number of people who displayed a given value on a variable. For example, imagine that you asked 150 people to indicate their political party. If you used SAS to create a frequency bar chart, the resulting chart would indicate the number of people who are Democrats, the number who are Republicans, and the number who are independents.
Chapter 6 also shows how to create bar charts that plot subgroup means. For example, assume that, in the “political party” study described above, you asked the 150 subjects to indicate both their political party and their age. You could then use SAS to create a bar chart that plots the mean age for people in each party. For instance, the resulting bar chart might show that the average age for Democrats was 32.12, the average age for Republicans was 41.56, and the average age for independents was 37.33.
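Both kinds of chart can be sketched with PROC CHART as follows. The variable names PARTY and AGE, the party codes D, R, and I, and the six data lines are all invented for illustration.

```sas
DATA D1;
   INPUT PARTY $ AGE;
DATALINES;
D 29
D 35
R 44
R 39
I 37
I 41
;
RUN;

* Frequency bar chart: the number of people in each party;
PROC CHART DATA=D1;
   VBAR PARTY;
RUN;

* Bar chart of subgroup means: the mean of AGE for each party;
PROC CHART DATA=D1;
   VBAR PARTY / SUMVAR=AGE TYPE=MEAN;
RUN;
```

In the second chart, SUMVAR= names the variable to be summarized and TYPE=MEAN asks for its mean, rather than its sum, within each bar.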
Chapter 7: Measures of Central Tendency and Variability
Chapter 7 shows you how to compute measures of variability (e.g., the interquartile range, standard deviation, and variance) as well as measures of central tendency (e.g., the mean, median, and mode) for numeric variables. It also shows how to use stem-and-leaf plots to determine whether a distribution is skewed or approximately normal in shape.
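A minimal sketch of these analyses appears below, using an invented AGE variable in a hypothetical data set D1. The statistic keywords requested from PROC MEANS are one reasonable selection, not necessarily the set used later in this guide.

```sas
DATA D1;
   INPUT AGE @@;
DATALINES;
21 25 25 28 30 31 33 36 40 58
;
RUN;

* Mean, median, standard deviation, variance, range, and interquartile range;
PROC MEANS DATA=D1 N MEAN MEDIAN STD VAR RANGE QRANGE;
   VAR AGE;
RUN;

* The PLOT option requests a stem-and-leaf plot in the listing output;
* PROC UNIVARIATE also reports the mode in its table of basic measures;
PROC UNIVARIATE DATA=D1 PLOT;
   VAR AGE;
RUN;
```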
Chapter 8: Creating and Modifying Variables and Data Sets
Chapter 8 shows how to use subsetting IF statements to create new data sets that contain a specified subgroup from the original sample. It also shows how to use mathematical operators and IF-THEN statements to recode variables and to create new variables from existing variables.
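The sketch below illustrates these statements under some assumptions: Q1 through Q3 are hypothetical questionnaire items on a 1-to-7 scale, Q3 is assumed to be reverse-worded, and the cutoff of 15 is arbitrary.

```sas
DATA D1;
   INPUT SEX $ Q1 Q2 Q3;
   * Recode the reversed item so that 7 becomes 1, 6 becomes 2, and so on;
   Q3 = 8 - Q3;
   * Create a new variable from existing variables;
   TOTAL = Q1 + Q2 + Q3;
   * Use IF-THEN and ELSE statements to create a grouping variable;
   IF TOTAL GE 15 THEN GROUP = 'HIGH';
   ELSE GROUP = 'LOW';
DATALINES;
F 6 7 2
M 3 4 5
F 5 5 3
M 2 3 6
;
RUN;

* Subsetting IF statement: D2 retains only the female participants;
DATA D2;
   SET D1;
   IF SEX = 'F';
RUN;

PROC PRINT DATA=D2;
RUN;
```

Note that the manipulation statements are placed after the INPUT statement and before the DATALINES statement, so each one is applied as the corresponding observation is read.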
Chapter 9: Standardized Scores (z Scores)
Chapter 9 shows how to transform raw scores into standardized variables (z score variables) with a mean of 0 and a standard deviation of 1. You will learn how to do this by using the data manipulation statements that you learned about in Chapter 8. Chapter 9 also illustrates how you can review the sign and absolute magnitude of a z score to understand where a particular observation stands on the variable in question.
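One way to carry out this transformation is sketched below. The variable names SCORE and SCORE_Z and the five scores are invented; the constants 70 and 15.8113883 are the mean and sample standard deviation of those five scores, which you would read from the PROC MEANS output and type in by hand.

```sas
DATA D1;
   INPUT SCORE @@;
DATALINES;
50 60 70 80 90
;
RUN;

* Step 1: obtain the mean and standard deviation of the raw scores;
PROC MEANS DATA=D1 MEAN STD;
   VAR SCORE;
RUN;

* Step 2: subtract the mean and divide by the standard deviation;
DATA D2;
   SET D1;
   SCORE_Z = (SCORE - 70) / 15.8113883;
RUN;

PROC PRINT DATA=D2;
RUN;
```

A positive value of SCORE_Z indicates a score above the mean, a negative value indicates a score below the mean, and the absolute magnitude shows how many standard deviations away from the mean the score lies.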
Chapter 10: Bivariate Correlation
Bivariate correlation coefficients allow you to determine the nature of the relationship between two numeric variables. Chapter 10 shows you how to use the CORR procedure to compute Pearson correlation coefficients for interval- and ratio-level variables. You will also learn to interpret the p values (probability values) that are produced by PROC CORR to determine whether a given correlation coefficient is significantly different from zero.
Chapter 10 also shows how to use PROC PLOT to create a two-dimensional scattergram that illustrates the relationship between two variables.
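A short sketch of both procedures follows. The variable names MOTIV and LOSS and the data values are hypothetical. In the PLOT statement, the variable named before the asterisk is placed on the vertical axis and the variable named after it is placed on the horizontal axis.

```sas
DATA D1;
   INPUT MOTIV LOSS;
DATALINES;
5 2
10 4
15 7
20 9
25 12
30 13
;
RUN;

* Scattergram with LOSS on the vertical axis and MOTIV on the horizontal axis;
PROC PLOT DATA=D1;
   PLOT LOSS*MOTIV;
RUN;

* Pearson correlation, with its p value, for the two named variables;
PROC CORR DATA=D1;
   VAR LOSS MOTIV;
RUN;
```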
Chapter 11: Bivariate Regression
Bivariate regression is used when you want to predict scores on an interval- or ratio-level criterion variable from an interval- or ratio-level predictor variable. Chapter 11 shows you how to use the REG procedure to compute the slope and intercept for the regression equation, along with predicted values and residuals of prediction.
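As a rough illustration, the sketch below regresses a hypothetical criterion variable LOSS on a hypothetical predictor variable MOTIV. The data are invented, and the options shown are one common way to request predicted values and residuals; the chapter itself may use a different set of options.

```sas
DATA D1;
   INPUT MOTIV LOSS;
DATALINES;
5 2
10 4
15 7
20 9
25 12
30 13
;
RUN;

* MODEL criterion = predictor, where P requests predicted values;
* and R requests an analysis of the residuals;
PROC REG DATA=D1;
   MODEL LOSS = MOTIV / P R;
RUN;
QUIT;
```

In the output, the parameter estimate for Intercept is the intercept of the regression equation, and the parameter estimate for MOTIV is the slope.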
Chapter 12: Single-Sample t Test
Chapter 12 shows how to use the TTEST procedure to perform a single-sample t test. This is an inferential procedure that is useful for determining whether a sample mean is significantly different from a specified population mean. You will learn how to interpret the t statistic and the p value associated with that t statistic.
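A minimal sketch of a single-sample t test is shown below. The variable name RECALL, the eight scores, and the comparison value of 50 are all invented; the H0= option supplies the population mean stated in the null hypothesis.

```sas
DATA D1;
   INPUT RECALL @@;
DATALINES;
52 58 49 61 55 57 53 60
;
RUN;

* Test whether the mean of RECALL differs from a population mean of 50;
PROC TTEST DATA=D1 H0=50;
   VAR RECALL;
RUN;
```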
Chapter 13: Independent-Samples t Test
You use an independent-samples t test to determine whether there is a significant difference between two groups of subjects with respect to their mean scores on the dependent variable. Chapter 13 explains when to use the equal-variance t statistic versus the unequal-variance t statistic, and shows how to use the TTEST procedure to conduct this analysis.
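The analysis can be sketched as follows, assuming a hypothetical grouping variable GROUP with two values and a hypothetical dependent variable AGGRESS. The data are invented.

```sas
DATA D1;
   INPUT GROUP $ AGGRESS;
DATALINES;
REWARD 22
REWARD 25
REWARD 19
REWARD 24
PUNISH 15
PUNISH 17
PUNISH 12
PUNISH 18
;
RUN;

* CLASS names the variable that defines the two groups;
* VAR names the dependent variable;
PROC TTEST DATA=D1;
   CLASS GROUP;
   VAR AGGRESS;
RUN;
```

The output reports both the equal-variance (Pooled) and the unequal-variance (Satterthwaite) t statistics, along with a test for the equality of variances that can help you choose between them.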
Chapter 14: Paired-Samples t Test
The paired-samples t test is also appropriate when you want to determine whether there is a significant difference between two sample means. The paired-samples approach is indicated when each score in one sample is dependent upon a corresponding score in the second sample. This will be the case in studies in which the same subjects provide repeated measures on the same dependent variable under different conditions, or when matching procedures are used. Chapter 14 shows how to perform this analysis using the TTEST procedure.
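A minimal sketch of this analysis is shown below. It assumes that each line of data holds two scores from the same participant, stored in the hypothetical variables EMOTION and SEXUAL; the values are invented.

```sas
DATA D1;
   INPUT EMOTION SEXUAL;
DATALINES;
25 20
22 19
27 24
20 21
26 18
23 20
;
RUN;

* The PAIRED statement tests the mean of the differences EMOTION minus SEXUAL;
PROC TTEST DATA=D1;
   PAIRED EMOTION*SEXUAL;
RUN;
```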
Chapter 15: One-Way ANOVA with One Between-Subjects Factor
One-way analysis of variance (ANOVA) is an inferential procedure similar to the independent-samples t test, with one important difference: while the t test allows you to test the significance of the difference between two sample means, a one-way ANOVA allows you to test the significance of the difference between more than two sample means. Chapter 15 shows how to use the GLM procedure to perform a one-way ANOVA, and then to follow with multiple comparison (post hoc) tests.
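The sketch below shows one way to set up such an analysis. The three-level grouping variable GROUP, the dependent variable AGGRESS, and the data are hypothetical, and Tukey's test is used here only as one example of a multiple comparison procedure.

```sas
DATA D1;
   INPUT GROUP $ AGGRESS;
DATALINES;
LOW 12
LOW 14
LOW 11
MOD 16
MOD 18
MOD 17
HIGH 23
HIGH 21
HIGH 24
;
RUN;

* One-way ANOVA followed by Tukey multiple comparison tests;
PROC GLM DATA=D1;
   CLASS GROUP;
   MODEL AGGRESS = GROUP;
   MEANS GROUP / TUKEY;
RUN;
QUIT;
```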
Chapter 16: Factorial ANOVA with Two Between-Subjects Factors
A one-way ANOVA, as described in Chapter 15, may be appropriate for analyzing data from an experiment in which the researcher manipulates only one independent variable. In contrast, a factorial ANOVA with two between-subjects factors may be appropriate for analyzing data from an experiment in which the researcher manipulates two independent variables simultaneously. Chapter 16 shows how to perform this type of analysis. It provides examples of results in which the main effects are significant, as well as results in which the interaction is significant.
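A factorial design of this kind might be analyzed as sketched below. The factor names A and B, the dependent variable Y, and the balanced 2 x 2 data set with three observations per cell are all invented for illustration.

```sas
* The double trailing at sign (@@) reads three observations per line;
DATA D1;
   INPUT A $ B $ Y @@;
DATALINES;
A1 B1 10  A1 B1 12  A1 B1 11
A1 B2 15  A1 B2 17  A1 B2 16
A2 B1 14  A2 B1 13  A2 B1 15
A2 B2 20  A2 B2 19  A2 B2 21
;
RUN;

* A and B are the main effects and A*B is the interaction;
PROC GLM DATA=D1;
   CLASS A B;
   MODEL Y = A B A*B;
   MEANS A B A*B;
RUN;
QUIT;
```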
Chapter 17: Chi-Square Test of Independence
Nonparametric statistical procedures are procedures that do not require stringent assumptions about the nature of the populations under study. Chapter 17 illustrates one of the most common nonparametric procedures: the chi-square test of independence. This test is appropriate when you want to study the relationship between two variables that assume a limited number of values. Chapter 17 shows how to conduct the test of significance and interpret the results presented in the two-way classification table created by the FREQ procedure.
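The test can be sketched as follows for data entered in tabular form, where each line gives the number of people in one cell of the table. The variable names SCHOOL, PREF, and NUMBER and the cell counts are hypothetical.

```sas
DATA D1;
   INPUT SCHOOL $ PREF $ NUMBER;
DATALINES;
ARTS IBM 30
ARTS MAC 60
BUSINESS IBM 70
BUSINESS MAC 40
;
RUN;

* WEIGHT tells SAS that NUMBER holds the count for each cell;
* CHISQ requests the chi-square test of independence;
PROC FREQ DATA=D1;
   WEIGHT NUMBER;
   TABLES SCHOOL*PREF / CHISQ;
RUN;
```

If the data set instead contained one line per person, the WEIGHT statement would simply be omitted.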
References
Many statistical procedures are illustrated in this guide by showing you how to analyze fictitious data from an empirical study. Many of these “studies” are loosely based on actual investigations reported in the research literature. These studies were chosen to help introduce you to the types of empirical investigations that are often conducted in the social and behavioral sciences and in education. The “References” section at the end of this guide provides complete references for the actual studies that inspired the fictitious studies reported here.
Conclusion
This guide assumes that some of the students using it have not yet completed a course on elementary statistics. This means that some readers will be unfamiliar with terms used in data analysis, such as “observations,” “null hypothesis,” “dichotomous variables,” and so on. To remedy this, the following chapter, “Terms and Concepts Used in This Guide,” provides a brief primer on basic terms and concepts in statistics. This chapter should lay a foundation that will make it easier to understand the chapters to follow.
Chapter 2: Terms and Concepts Used in This Guide
Introduction
   Overview
   A Common Language for Researchers
   Why This Chapter Is Important
Research Hypotheses and Statistical Hypotheses
   Example: A Goal-Setting Study
   The Research Question
   The Research Hypothesis
   The Statistical Null Hypothesis
   The Statistical Alternative Hypothesis
   Directional versus Nondirectional Alternative Hypotheses
   Summary
Data, Variables, Values, and Observations
   Defining the Instrument, Gathering Data, Analyzing Data, and Drawing Conclusions
   Variables, Values, and Observations
Classifying Variables According to Their Scales of Measurement
   Introduction
   Nominal Scales
   Ordinal Scales
   Interval Scales
   Ratio Scales
Classifying Variables According to the Number of Values They Display
   Overview
   Dichotomous Variables
   Limited-Value Variables
   Multi-Value Variables
Basic Approaches to Research
   Nonexperimental Research
   Experimental Research
Using Type-of-Variable Figures to Represent Dependent and Independent Variables
   Overview
   Figures to Represent Types of Variables
   Using Figures to Represent the Types of Variables Assessed in a Specific Study
The Three Types of SAS Files
   Overview
   The SAS Program
   The SAS Log
   The SAS Output File
Conclusion
Introduction
Overview
This chapter has two objectives. The first is to introduce you to basic terms and concepts related to research design and data analysis. This chapter describes the different types of variables that might be analyzed when conducting research, the classification of these variables according to their scale of measurement or other characteristics, and the differences between nonexperimental and experimental research.
The chapter’s second objective is to introduce you to the three types of files that you will work with when you perform statistical analyses with SAS. These include the SAS program file, the SAS log file, and the SAS output file.
After completing this chapter, you should be familiar with the fundamental terms and concepts that are relevant to data analysis, and you will have a foundation to begin learning about the SAS System in detail in subsequent chapters.
A Common Language for Researchers
Research in the behavioral sciences and in education is extremely diverse. In part, this is because the behavioral sciences represent a wide variety of disciplines, including psychology, sociology, anthropology, political science, management, and other fields. Further complicating matters is the fact that, within each discipline, a wide variety of methods are used to conduct research. These methods can include unobtrusive observation, participant observation, case studies, interviews, focus groups, surveys, ex post facto studies, laboratory experiments, and field experiments.
Despite this diversity in methods used and topics investigated, most scientific investigations still share a number of characteristics. Regardless of field, most research involves an investigator who gathers data and performs analyses to determine what the data mean. In addition, most researchers in the behavioral sciences and education use a common language in reporting their research; researchers from all fields typically speak of “testing null hypotheses” and “obtaining significant p values.”
Why This Chapter Is Important
The purpose of this chapter is to review some fundamental concepts and terms that are shared in the behavioral sciences and in education. You should familiarize (or refamiliarize) yourself with this material before proceeding to the subsequent chapters, as most of the terms introduced here will be referred to again and again throughout the text. If you have not yet taken a course in statistics, this chapter will provide an elementary introduction; if you have already completed a course in statistics, it will provide a quick review.
Research Hypotheses and Statistical Hypotheses
Example: A Goal-Setting Study
Imagine that you have been hired by a large insurance company to find ways of improving the productivity of its insurance agents. Specifically, the company would like you to find ways to increase the number of insurance policies sold by the average agent. You will therefore begin a program of research to identify the determinants of agent productivity. In the course of this program, you will work with research questions, research hypotheses, and statistical hypotheses.
The Research Question
The process of research often begins by developing a clear statement of the research question (or questions). The research question is a statement of what you hope to have learned by the time the research has been completed. It is good practice to revise and refine the research question several times to ensure that you are very clear about what it is you really want to know.
In the current example, you might begin with the question “What is the difference between agents who sell much insurance and agents who sell little insurance?” A more specific question might be “What variables have a causal effect on the amount of insurance sold by agents?” Upon reflection, you might realize that the insurance company really only wants to know what things management can do to cause the agents to sell more insurance. This might eliminate from consideration those variables that are not under management’s control, and can substantially narrow the focus of the research program. This narrowing, in turn, leads to a more specific statement of the research question such as “What variables under the control of management have a causal effect on the amount of insurance sold by agents?” Once the research question has been more clearly defined in this way, you are in a better position to develop a good hypothesis that provides a possible answer to the question.
The Research Hypothesis
A hypothesis is a statement about the predicted relationships among events or variables. A good hypothesis in the present case might identify a specific variable that is expected to have a causal effect on the amount of insurance sold by agents. For example, a research hypothesis might predict that the agents’ level of training will have a positive effect on the amount of insurance sold. Or it might predict that the agents’ level of achievement
In developing the hypothesis, you might be influenced by any of a number of sources: an existing theory, some related research, or even personal experience. Let's assume that in the present situation, for example, you have been influenced by goal-setting theory. This theory states, among other things, that higher levels of work performance are achieved when difficult goals are set for employees. Drawing on goal-setting theory, you now state the following hypothesis: “The difficulty of the goals that agents set for themselves is positively related to the amount of insurance they sell.” Notice how this statement satisfies our definition for a research hypothesis, as it is a statement about the predicted relationship between two variables. The first variable can be labeled “goal difficulty,” and the second can be labeled “amount of insurance sold.”
The predicted relationship between goal difficulty and amount of insurance sold is illustrated in Figure 2.1. Notice that there is an arrow extending from goal difficulty to amount of insurance sold. This arrow reflects the prediction that goal difficulty is the causal variable, and amount of insurance sold is the variable being affected.
Figure 2.1. Causal relationship between goal difficulty and amount of insurance sold, as predicted by the research hypothesis.
In Figure 2.1, you can see that the variable being affected (insurance sold) appears on the left side of the figure, and that the causal variable (goal difficulty) appears on the right. This arrangement might seem a bit unusual to you, since most figures that portray causal relationships have the order reversed (with the causal variable on the left and the variable being affected on the right). However, this guide will always use the arrangement that appears in Figure 2.1, for reasons that will become clear later.
You can see that the research hypothesis stated above is quite broad in nature. In many research situations, however, it is helpful to state hypotheses that are more specific in the predictions they make. For example, assume that there is an instrument called the “Smith Goal Difficulty Scale.” Scores on this fictitious instrument can range from zero to 100, with higher scores indicating more difficult goals. If you administered this scale to a sample of agents, you could develop a more specific research hypothesis along the following lines: “Agents who score 60 or above on the Smith Goal Difficulty Scale will sell greater amounts of insurance than agents who score below 60.”
The Statistical Null Hypothesis
Beginning in Chapter 10, “Bivariate Correlation,” this guide will show you how to use the SAS System to perform tests of null hypotheses. The way that you state a specific null hypothesis will vary depending on the nature of your research question and the type of data analysis that you are performing. Generally speaking, however, a statistical null hypothesis is a prediction that there is no difference between groups in the population, or that there is no relationship between variables in the population.
For example, consider the research hypothesis stated in the preceding section: “Agents who score 60 or above on the Smith Goal Difficulty Scale will sell greater amounts of insurance than agents who score below 60.” Assume that you conduct a study to investigate this research hypothesis. You identify two groups of subjects:
• 50 agents who score 60 or above on the Smith Goal Difficulty Scale (the “high goal-difficulty group”).
• 50 agents who score below 60 on the Smith Goal Difficulty Scale (the “low goal-difficulty group”).
You observe these agents over a 12-month period, and record the amount of insurance that they sell. You want to investigate the following (fairly specific) research hypothesis:
Research hypothesis: The average amount of insurance sold by the high goal-difficulty group will be greater than the average amount sold by the low goal-difficulty group.
You plan to analyze the data using a statistical procedure such as a t test (which will be discussed in Chapter 13, “Independent-Samples t Test”). One way to structure this analysis is to begin with the following statistical null hypothesis:
Statistical null hypothesis: In the population, there is no difference between the high goal-difficulty group and the low goal-difficulty group with respect to their mean scores on the amount of insurance sold.
Notice that this is a prediction of no difference between the groups. You will analyze the data from your sample, and if the observed difference is large enough, you will reject this null hypothesis of no difference. Rejecting this statistical null hypothesis means that you have obtained some support for your original research hypothesis (the hypothesis that there is a difference between the groups).
Statistical null hypotheses are often represented symbolically. For example, this is how you could have symbolically represented the preceding statistical null hypothesis:
H0: µ1 = µ2
where
H0 is the symbol used to represent the null hypothesis
µ1 is the symbol used to represent the mean amount of insurance sold by Group 1 (the high goal-difficulty group) in the population
µ2 is the symbol used to represent the mean amount of insurance sold by Group 2 (the low goal-difficulty group) in the population.
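To give a rough sense of what a "large enough" observed difference looks like, here is a minimal sketch in Python (not SAS, which later chapters introduce) that computes a pooled-variance independent-samples t statistic for two very small groups. The sales figures, the group sizes, and the function name `pooled_t` are hypothetical and were invented only for this illustration; the study described above would use 50 agents per group.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Return (t, df) for an independent-samples t test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the common population variance.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Under H0: mu1 = mu2, the expected difference between sample means is zero.
    t = (mean(group1) - mean(group2)) / standard_error
    return t, df


# Hypothetical annual sales in thousands of dollars (NOT data from the guide).
high_goal = [310, 280, 350, 300, 330]
low_goal = [250, 270, 240, 260, 230]

t, df = pooled_t(high_goal, low_goal)
print(f"t({df}) = {t:.2f}")  # t(8) = 4.57
```

With these made-up numbers the statistic exceeds 2.306, the two-tailed .05 critical value of t for 8 degrees of freedom, so the null hypothesis of no difference would be rejected.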
The Statistical Alternative Hypothesis
A statistical alternative hypothesis is typically a prediction that there is a difference between groups in the population, or that there is a relationship between variables in the population. The alternative hypothesis is the counterpart to the null hypothesis; if you reject the null hypothesis, you tentatively accept the alternative hypothesis.
There are different ways that you can state alternative hypotheses. One way is simply to predict that there is a difference between the population means, without predicting which population mean is higher. Here is one way of stating that type of alternative hypothesis for the current study:
Statistical alternative hypothesis: In the population, there is a difference between the high goal-difficulty group and the low goal-difficulty group with respect to their mean scores on the amount of insurance sold.
The alternative hypothesis also can be stated symbolically:
H1: µ1 ≠ µ2
The H1 symbol above is the symbol for an alternative hypothesis. Notice that the “not equal” symbol (≠) is used to represent the prediction that the means will not be equal.
Directional versus Nondirectional Alternative Hypotheses
Nondirectional hypotheses. The preceding section illustrated a nondirectional alternative hypothesis, also known as a two-sided or two-tailed alternative hypothesis. With the type of study described here (a study in which group means are being compared), a nondirectional alternative hypothesis simply predicts that one population mean differs from the other population mean; it does not predict which population mean will be higher. You would obtain support for this nondirectional alternative hypothesis if the high goal-difficulty group sold significantly more insurance, on the average, than the low goal-difficulty group. You would also obtain support for this nondirectional alternative hypothesis if the low goal-difficulty group sold significantly more insurance than the high goal-difficulty group. With a nondirectional alternative hypothesis, you are predicting some type of difference, but you are not predicting the specific nature, or direction, of the difference.
Directional hypotheses. In some situations it might be appropriate to use a directional alternative hypothesis. With the type of study described above, a directional alternative hypothesis (also known as a one-sided or one-tailed alternative hypothesis) not only predicts that there will be a difference, but also makes a specific prediction about which population will display the higher mean.
For example, in the present study, previous research might lead you to predict that the population of high goal-difficulty employees will sell more insurance, on the average, than the population of low goal-difficulty employees. If this were the case, you might state the following directional alternative hypothesis:
Statistical alternative hypothesis: In the population, mean amount of insurance sold by the high goal-difficulty group is greater than the mean amount of insurance sold by the low goal-difficulty group.
This alternative hypothesis can also be stated symbolically:
H1: µ1 > µ2
where
µ1 represents the mean amount of insurance sold by Group 1 (the high goal-difficulty group) in the population
µ2 represents the mean amount of insurance sold by Group 2 (the low goal-difficulty group) in the population.
Notice that the “greater than” symbol (>) is used to represent the prediction that the mean for the high goal-difficulty population is greater than the mean for the low goal-difficulty population.
Choosing directional versus nondirectional tests. Which type of alternative hypothesis should you use in your research? Most statistics textbooks recommend using a nondirectional, or two-sided, alternative hypothesis in most cases. The problem with the directional hypothesis is that if your obtained sample means fall in the direction opposite to the one you predicted, you will fail to reject the null hypothesis even when there are very large differences between the sample means.
For example, assume that you state the directional alternative hypothesis presented above (i.e., “In the population, mean amount of insurance sold by the high goal-difficulty group is greater than the mean amount of insurance sold by the low goal-difficulty group”). Because your alternative hypothesis is a directional hypothesis, the null hypothesis you are testing is as follows:
H0: µ1 ≤ µ2
which means, “In the population, the mean amount of insurance sold by the high goal-difficulty group (Group 1) is less than or equal to the mean amount of insurance sold by the low goal-difficulty group (Group 2).”
Clearly, to reject the null hypothesis, the high goal-difficulty group (Group 1) must display a mean that is greater than that of the low goal-difficulty group (Group 2). If Group 2 displays the higher mean, then you cannot reject the null hypothesis, no matter how great that difference might be. This presents a problem because the finding that Group 2 scored higher than Group 1 may be of great interest to other researchers (particularly because it is not what many would have expected). This is why, in many situations, nondirectional tests are preferred over directional tests.
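The problem described above can be sketched numerically. The following Python example (not SAS) compares one-sided and two-sided p values for two hypothetical test statistics that are equal in size but opposite in direction. As a simplifying assumption, the statistic is treated as standard normal under the null hypothesis, a large-sample stand-in for the t distribution that a t test would actually use; the function name `p_values` and the statistic values are invented for illustration.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one_sided_p, two_sided_p) for a test statistic z.

    Assumption: z is standard normal under H0 (large-sample approximation).
    The one-sided p value is for the directional alternative H1: mu1 > mu2.
    """
    std_normal = NormalDist()
    one_sided = 1 - std_normal.cdf(z)
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return one_sided, two_sided


# Hypothetical statistics: same size of difference, opposite directions.
for z in (3.0, -3.0):
    one_sided, two_sided = p_values(z)
    print(f"z = {z:+.1f}: one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

When the difference falls in the predicted direction (z = +3.0), both p values are small. When it falls in the opposite direction (z = -3.0), the two-sided p value is just as small, but the one-sided p value is close to 1, so the directional test cannot reject the null hypothesis.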
Summary
In summary, research projects often begin with a statement of a research hypothesis. This allows you to develop a specific, testable statistical null hypothesis and an alternative hypothesis. The analysis of your data will lead you to one of two results:
• If the results are significant, you can reject the null hypothesis and tentatively accept the alternative hypothesis. Assuming the means are in the predicted direction, this type of result provides some support for your initial research hypothesis.
• If the results are nonsignificant, you fail to reject the null hypothesis. This type of result fails to provide support for your initial research hypothesis.
Data, Variables, Values, and Observations
Defining the Instrument, Gathering Data, Analyzing Data, and Drawing Conclusions
With the null hypothesis stated, you can now test it by conducting a study in which you gather and analyze relevant data. Data is defined as a collection of scores that are obtained when subject characteristics and/or performance are observed and recorded. For example, you can choose to test your hypothesis by conducting a simple correlational study: You identify a group of 100 agents and determine
• the difficulty of the goals that have been set for each agent
• the amount of insurance sold by each.
Different types of instruments can be used to obtain different types of data. For example, you might use a questionnaire to assess goal difficulty, but rely on company records for measures of insurance sold. Once the data are gathered, each agent will have one score indicating the difficulty of his or her goals, and a second score indicating the amount of insurance he or she has sold.
You would then analyze the data to see if the agents with the more difficult goals did, in fact, sell more insurance. If so, the study results would lend some support to your research hypothesis; if not, the results would fail to provide support. In either case, you would be able to draw conclusions regarding the tenability of your hypotheses, and would have made some progress toward answering your research question. The information learned in the current study might stimulate new questions or new hypotheses for subsequent studies, and the cycle would repeat. For example, if you obtained support for your hypothesis with a correlational study, you might choose to follow it up with a study using a different research method, perhaps an experimental study (the difference between these methods will be described below). Over time, a body of research evidence would accumulate, and researchers would be able to review this body to draw general conclusions about the determinants of insurance sales.
Variables, Values, and Observations
Definitions. When discussing data, one often speaks in terms of variables, values, and observations. Further complicating matters is the fact that researchers make distinctions between different types of variables (such as quantitative variables versus classification variables). This section discusses the distinctions between these terms.
• Variables. For the type of research discussed in this book, a variable refers to some specific characteristic of a subject that can assume one or more different values. For the subjects in the study described above, “amount of insurance sold” is an example of a variable: Some subjects had sold a large amount of insurance, and others had sold less. A different variable was “goal difficulty”: Some subjects had more difficult goals, while others had less difficult goals. Subject age was a third variable, while subject sex (male versus female) was yet another.
• Values. A value, on the other hand, refers to either a particular subject's relative standing on a quantitative variable, or a subject's classification within a classification variable. For example, the “amount of insurance sold” is a quantitative variable that can assume a large number of values: One agent might sell $2,000,000 worth of insurance in one year, one might sell $100,000 worth, and another might sell $0 worth. Subject age is another quantitative variable that can assume a wide variety of values. In the sample studied, these values ranged from a low of 22 years to a high of 64 years.
• Quantitative variables. You can see that, in both of these examples, a particular value is a type of score that indicates where the subject stands on the variable. The word “score” is an appropriate substitute for the word “value” in these cases because both “amount of insurance sold” and “age” are quantitative variables: variables that represent the quantity, or amount, of the construct that is being assessed. With quantitative variables, numbers typically serve as values.
• Classification variables. A different type of variable is a classification variable or, alternatively, qualitative variable or categorical variable. With classification variables, different values represent different groups to which the subject might belong. “Sex” is a good example of a classification variable, as it might assume only one of two values: A particular subject is classified as being either a male or a female. “Political Party” is an example of a classification variable that can assume a larger number of values: A subject might be classified as being a Republican, a Democrat, or an independent. These variables are classification variables and not quantitative variables because the values only represent membership in a single, specific group, and that membership cannot be represented meaningfully with a numeric value.
• Observational units. In discussing data, researchers often make references to observational units, which can be defined as the individual subjects (or other objects) that serve as the source of the data. Within the behavioral sciences and education, an individual person usually serves as the observational unit under study (although it is also possible to use some other entity, such as an individual school or organization, as the observational unit). In this text, the individual person is used as the observational unit in most examples. Researchers will often refer to the “number of observations” or “number of cases” included in their data set, and this typically refers to the number of subjects who were studied.
An example. For a more concrete illustration of the concepts discussed so far, consider the data set displayed in Table 2.1:
Table 2.1
Insurance Sales Data
________________________________________________________________________
                                  Goal
                                  difficulty   Overall
Observation   Name    Sex   Age   scores       ranking   Sales
________________________________________________________________________
1             Bob     M     34    97           2         $598,243
2             Walt    M     56    80           1         $367,342
3             Jane    F     36    67           4         $254,998
4             Susan   F     24    40           3         $80,344
5             Jim     M     22    37           5         $40,172
6             Mack    M     44    24           6         $0
________________________________________________________________________
The preceding table reports information regarding six research subjects: Bob, Walt, Jane, Susan, Jim, and Mack; therefore, we would say that the data set includes six observations. Information about a particular observation (subject) is displayed as a row running horizontally from left to right across the table.
The first column of the data set (running vertically from top to bottom) is headed “Observation,” and it simply provides an observation number for each subject. The second column (headed “Name”) provides a name for each subject.
The remaining five columns report information about the five research variables that are being studied.
The column headed “Sex” reports subject sex, which might assume one of two values: “M” for male and “F” for female.
The column headed “Age” reports the subject's age in years.
The “Goal Difficulty Scores” column reports the subject's score on a fictitious goal difficulty scale. In this example, each participant has a score on a 20-item questionnaire about the difficulty of his or her work goals. Depending on how they respond to the questionnaire, subjects receive a score ranging from a low of zero (meaning that the subject views the work goals as extremely easy) to a high of 100 (meaning that the goals are viewed as extremely difficult).
The column headed “Overall Ranking” shows how the subjects were ranked by their supervisor according to their overall effectiveness as agents. A rank of 1 represents the most effective agent, and a rank of 6 represents the least effective.
The column headed “Sales” reveals the amount of insurance sold by each agent (in dollars) during the most recent year.
Table 2.1 provides a very small data set with six observations and five research variables (sex, age, goal difficulty, overall ranking, and sales). One of the variables was a classification variable (sex), while the remainder were quantitative variables. The numbers or letters that appear within a particular column represent some of the values that could be assumed by that variable.
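One way to make the terms observation, variable, and value concrete is to transcribe Table 2.1 into a small data structure. The Python sketch below (not SAS) stores each observation as one dictionary; the key names such as `goal_difficulty` are naming choices made for this illustration and do not come from the guide.

```python
# Table 2.1 transcribed as one dictionary per observation (row).
agents = [
    {"name": "Bob", "sex": "M", "age": 34, "goal_difficulty": 97, "ranking": 2, "sales": 598243},
    {"name": "Walt", "sex": "M", "age": 56, "goal_difficulty": 80, "ranking": 1, "sales": 367342},
    {"name": "Jane", "sex": "F", "age": 36, "goal_difficulty": 67, "ranking": 4, "sales": 254998},
    {"name": "Susan", "sex": "F", "age": 24, "goal_difficulty": 40, "ranking": 3, "sales": 80344},
    {"name": "Jim", "sex": "M", "age": 22, "goal_difficulty": 37, "ranking": 5, "sales": 40172},
    {"name": "Mack", "sex": "M", "age": 44, "goal_difficulty": 24, "ranking": 6, "sales": 0},
]

# "name" only identifies the subject, so it is not counted as a research variable.
research_variables = ["sex", "age", "goal_difficulty", "ranking", "sales"]

n_observations = len(agents)

# The distinct values that each variable actually assumes in this sample.
observed_values = {
    var: sorted({row[var] for row in agents}) for var in research_variables
}

print(n_observations, "observations,", len(research_variables), "research variables")
print("sex:", observed_values["sex"])  # ['F', 'M']
print("age:", observed_values["age"])  # [22, 24, 34, 36, 44, 56]
```

Each dictionary corresponds to a row of the table, each key to a column, and each stored entry to a value.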
Classifying Variables According to Their Scales of Measurement
Introduction
One of the most important schemes for classifying a variable involves its scale of measurement. Researchers generally discuss four different scales of measurement: nominal, ordinal, interval, and ratio. Before analyzing a data set, it is important to determine which scales of measurement were used because certain types of statistical procedures require specific scales of measurement. For example, a one-way analysis of variance generally requires that the dependent variable be an interval-level or ratio-level variable; the chi-square test of independence allows you to analyze nominal-level variables; other statistics make other assumptions about the scale of measurement used with the variables that are being studied.
Nominal Scales
A nominal scale is a classification system that places people, objects, or other entities into mutually exclusive categories. A variable that is measured using a nominal scale is a classification variable: It simply indicates the name of the group to which each subject belongs. The examples of classification variables provided earlier (e.g., sex and political party) also serve as examples of nominal-level variables: They tell you which group a subject belongs to, but they do not provide any quantitative information about the subjects. That is, the “sex” variable might tell you that some subjects are males and others are females, but it does not tell you that some subjects possess more of a specific characteristic relative to others. With the remaining three scales of measurement, however, some quantitative information is provided.
Ordinal Scales
Values on an ordinal scale represent the rank order of the subjects with respect to the variable that is being assessed. For example, Table 2.1 includes one variable called “Overall Ranking,” which represents the rank-ordering of the subjects according to their overall effectiveness as agents. The values on this ordinal scale represent a hierarchy of levels with respect to the construct of “effectiveness”: We know that the agent ranked “1” was perceived as being more effective than the agent ranked “2,” that the agent ranked “2” was more effective than the one ranked “3,” and so forth.
However, an ordinal scale has a serious limitation in that equal differences in scale values do not necessarily have equal quantitative meaning. For example, notice the rankings reproduced here:
Overall ranking   Name
_______________   ______
1                 Walt
2                 Bob
3                 Susan
4                 Jane
5                 Jim
6                 Mack
Notice that Walt was ranked #1 while Bob was ranked #2. The difference between these two rankings is 1 (because 2 – 1 = 1), so we might say that there is one unit of difference between Walt and Bob. Now notice that Jim was ranked #5 while Mack was ranked #6. The difference between these two rankings is also 1 (because 6 – 5 = 1), so we might say that there is also one unit of difference between Jim and Mack. Putting the two together, we can see that the difference in ranking between Walt and Bob is equal to the difference in ranking between Jim and Mack. But does this mean that the difference in overall effectiveness between Walt and Bob is equal to the difference in overall effectiveness between Jim and Mack? Not necessarily. It is possible that Walt was just barely superior to Bob in effectiveness, while Jim was substantially superior to Mack. These rankings tell us very little about the quantitative differences between the subjects with regard to the underlying construct (effectiveness, in this case). An ordinal scale simply provides a rank order of who is better than whom.
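The same limitation can be seen by converting a set of scores to ranks. The Python sketch below (not SAS) takes the goal difficulty scores from Table 2.1 and ranks the agents on them. This ranking is constructed only for illustration and is not the supervisor's "Overall Ranking"; it simply shows that adjacent ranks are always one unit apart even when the underlying scores are separated by very different amounts.

```python
# Goal difficulty scores from Table 2.1.
scores = {"Bob": 97, "Walt": 80, "Jane": 67, "Susan": 40, "Jim": 37, "Mack": 24}

# Rank 1 goes to the highest score (a ranking built only for this sketch).
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

# Compare each agent with the next-ranked agent.
for upper, lower in zip(ordered, ordered[1:]):
    rank_gap = ranks[lower] - ranks[upper]     # always 1
    score_gap = scores[upper] - scores[lower]  # varies from pair to pair
    print(f"{upper} vs. {lower}: rank gap = {rank_gap}, score gap = {score_gap}")
```

Every pair of neighbors differs by exactly one rank, yet the score gaps range from 3 points (Susan versus Jim) to 27 points (Jane versus Susan). Information about the size of the differences is lost once only the ranks are kept.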
Interval Scales
With an interval scale, equal differences between scale values do have equal quantitative meaning. For this reason, you can see that the interval scale provides more quantitative information than the ordinal scale. A good example of an interval scale is the Fahrenheit scale used to measure temperature. With the Fahrenheit scale, the difference between 70 degrees and 75 degrees is equal to the difference between 80 degrees and 85 degrees: the units of measurement are equal throughout the full range of the scale.
However, the interval scale also has an important limitation: it does not have a true zero point. A true zero point means that a value of zero on the scale represents zero quantity of the construct being assessed. It should be obvious that the Fahrenheit scale does not have a true zero point. When the thermometer reads zero degrees, that does not mean that there is absolutely no heat present in the environment; it is still possible for the temperature to go lower (into the negative numbers).
Researchers in the social sciences often assume that many of their “man-made” variables are measured on an interval scale. For example, in the preceding study involving insurance agents, you would probably assume that scores from the goal difficulty questionnaire constitute an interval-level scale; that is, you would likely assume that the difference between a score of 50 and 60 is approximately equal to the difference between a score of 70 and 80. Many researchers would also assume that scores from an instrument such as an intelligence test are measured at the interval level.
On the other hand, some researchers are skeptical that instruments such as these have true equal-interval properties, and prefer to refer to them as quasi-interval scales. The level of measurement achieved with such paper-and-pencil instruments continues to be a controversial topic within many disciplines.
In any case, it is clear that there is no true zero point with either of the preceding instruments: a score of zero on the goal difficulty scale does not indicate the complete absence of goal difficulty, and a score of zero on an intelligence test does not indicate the complete absence of intelligence. A true zero point can be found only with variables measured on a ratio scale.
Ratio Scales
Ratio scales are similar to interval scales in that equal differences between scale values do have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property: with ratio scales, it is possible to make meaningful statements about the ratios between scale values.
For example, the system of inches used with a common ruler is an example of a ratio scale. There is a true zero point with this system, in that “zero inches” does in fact indicate a complete absence of length. With this scale, it is possible to make meaningful statements about ratios. It is appropriate to say that an object four inches long is twice as long as an object two inches long. Age, as measured in years, is also on a ratio scale: a 10-year-old house is twice as old as a 5-year-old house. Notice that it is not possible to make these statements about ratios with the interval-level variables discussed above. One would not say that a person with an IQ of 160 is twice as intelligent as a person with an IQ of 80, as there is no true zero point with that scale.
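The reason ratios fail on an interval scale can be checked with the temperature example used earlier. The Python sketch below (not SAS) converts two hypothetical Fahrenheit readings to kelvins. The Kelvin scale is not discussed in this guide; it is brought in here only because it measures the same construct with a true zero point.

```python
def fahrenheit_to_kelvin(degrees_f):
    """Convert Fahrenheit (interval scale) to kelvins (ratio scale, true zero)."""
    return (degrees_f - 32) * 5 / 9 + 273.15


cool, warm = 40.0, 80.0  # hypothetical readings in degrees Fahrenheit

naive_ratio = warm / cool  # 2.0, but not meaningful on an interval scale
kelvin_ratio = fahrenheit_to_kelvin(warm) / fahrenheit_to_kelvin(cool)

print(f"Ratio of the Fahrenheit readings: {naive_ratio:.2f}")
print(f"Ratio of the same temperatures in kelvins: {kelvin_ratio:.2f}")
```

The Fahrenheit readings suggest that the warmer temperature is "twice" the cooler one, but on a scale with a true zero the ratio is only about 1.08. The apparent doubling is an artifact of where the Fahrenheit scale happens to place its zero.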
Although ratio-level scales are most commonly used for reporting the physical properties of objects (e.g., height, weight), they are also common in the type of research that is discussed in this manual. For example, the study discussed above included the variables “age” and “amount of insurance sold (in dollars).” Both of these have true zero points, and are measured on ratio scales.
Classifying Variables According to the Number of Values They Display
Overview
The preceding section showed that variables can be classified according to their scale of measurement. Sometimes it is also useful to classify variables according to the number of values they display. There might be any number of approaches for doing this, but this guide uses a simple division of variables into three groups according to the number of possible values: dichotomous variables, limited-value variables, and multi-value variables.
Dichotomous Variables
A dichotomous variable is a variable that assumes just two values. These variables are sometimes called binary variables. Here are some examples of dichotomous variables:
• Suppose that you obtain Smith Anxiety Test scores from 50 male subjects and 50 female subjects. In this study, “subject sex” is a dichotomous variable, because it can assume just two values, “male” versus “female.”
• Suppose that you conduct an experiment to determine whether the herbal supplement ginkgo biloba causes improvement in a rat’s ability to learn. You begin with 20 rats, and randomly assign them to two groups. Ten rats are assigned to the 100 mg group (they receive 100 mg of ginkgo), and the other ten rats are assigned to the 0 mg group (they receive no ginkgo). In this study, the independent variable that you are manipulating is “amount of ginkgo administered.” This is a dichotomous variable because it assumes just two values: “0 mg” versus “100 mg.”
Limited-Value Variables
A limited-value variable is a variable that assumes just two to six values in your sample. Here are some examples of limited-value variables:
• Suppose that you obtain Smith Anxiety Test scores from 50 Caucasian subjects, 50 African-American subjects, and 50 Asian-American subjects. In this study, “subject race” is a limited-value variable because it assumes just three values: “Caucasian” versus “African-American” versus “Asian-American.”
• Suppose that you again conduct an experiment to determine whether ginkgo biloba causes improvements in a rat’s ability to learn. You begin with 100 rats, and randomly assign them to four groups: 25 rats are assigned to the 150 mg group, 25 rats are assigned to the 100 mg group, 25 rats are assigned to the 50 mg group, and 25 rats are assigned to the 0 mg group. In this study, the independent variable that you are manipulating is still “amount of ginkgo administered.” You know that this is a limited-value variable because it assumes just four values: “0 mg” versus “50 mg” versus “100 mg” versus “150 mg.”
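The counting rule behind these categories can be sketched in a few lines of Python (an illustration only, not SAS code from this guide). The function name `classify_by_value_count` and the sample lists are hypothetical. Because the limited-value range of two to six values overlaps the dichotomous case, this sketch assumes that a two-value variable is reported as dichotomous; variables with more than six values fall into the third group named in the overview, multi-value variables.

```python
def classify_by_value_count(observations):
    """Classify a variable by the number of distinct values in the sample."""
    n_values = len(set(observations))
    if n_values < 2:
        # Assumption: something with a single value is treated as a constant.
        return "constant"
    if n_values == 2:
        return "dichotomous"
    if n_values <= 6:
        return "limited-value"
    return "multi-value"


# Sample data modeled on the examples above.
subject_sex = ["male"] * 50 + ["female"] * 50
subject_race = ["Caucasian"] * 50 + ["African-American"] * 50 + ["Asian-American"] * 50
ginkgo_mg = [0] * 25 + [50] * 25 + [100] * 25 + [150] * 25
# Hypothetical test scores with many distinct values.
test_scores = list(range(0, 100, 5))

print(classify_by_value_count(subject_sex))   # dichotomous
print(classify_by_value_count(subject_race))  # limited-value
print(classify_by_value_count(ginkgo_mg))     # limited-value
print(classify_by_value_count(test_scores))   # multi-value
```

Note that the classification depends on the values actually observed in the sample, not on the sample size: 100 rats spread across four dosage groups still yield a limited-value variable.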
Multi-Value Variables
Finally, this book defines a multi-value variable as a variable that assumes more than six values in your sample. Here are some examples of multi-value variables:
• Assume that you obtain Smith Anxiety Test scores from 100 subjects. With the Smith Anxiety Test, scores (values) may range from 0–99, with higher scores indicating greater anxiety. In analyzing the data, you see that your subjects displayed a wide variety of scores, for example:
  • One subject received a score of 2.
  • One subject received a score of 5.
  • Two subjects received a score of 10.
  • Five subjects received a score of 21.
  • Seven subjects received a score of 33.
  • Eight subjects received a score of 45.
  • Nine subjects received a score of 53.