Standardized Scores (z Scores) - Step-By-Step Basic Statistics Using SAS

Chapter 9 shows how to transform raw scores into standardized variables (z score variables) with a mean of 0 and a standard deviation of 1. You will learn how to do this by using the data manipulation statements that you learned about in Chapter 8. Chapter 9 also illustrates how you can review the sign and absolute magnitude of a z score to understand where a particular observation stands on the variable in question.
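The guide performs this transformation with SAS data manipulation statements. Purely as an illustration of the arithmetic, the following Python sketch standardizes a small set of scores. The data are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption of this sketch.

```python
from statistics import mean, stdev

# Hypothetical raw scores for six observations (not taken from the guide).
raw_scores = [52, 61, 47, 70, 58, 66]

m = mean(raw_scores)    # sample mean
s = stdev(raw_scores)   # sample standard deviation (n - 1 denominator, an assumption)

# z = (raw score - mean) / standard deviation
z_scores = [(x - m) / s for x in raw_scores]

for x, z in zip(raw_scores, z_scores):
    # The sign shows whether the observation is above or below the mean;
    # the absolute magnitude shows how many standard deviations away it is.
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"raw={x:3d}  z={z:+.2f}  ({position} the mean)")
```

The resulting z scores have a mean of 0 and a standard deviation of 1, which is what makes their sign and magnitude directly interpretable.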

Chapter 10: Bivariate Correlation

Bivariate correlation coefficients allow you to determine the nature of the relationship between two numeric variables. Chapter 10 shows you how to use the CORR procedure to compute Pearson correlation coefficients for interval- and ratio-level variables. You will also learn to interpret the p values (probability values) that are produced by PROC CORR to determine whether a given correlation coefficient is significantly different from zero.

Chapter 10 also shows how to use PROC PLOT to create a two-dimensional scattergram that illustrates the relationship between two variables.
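In the guide, PROC CORR computes the Pearson coefficient. As a rough sketch of what that coefficient measures, the Python below computes r from its definition for two hypothetical variables. The data and the function name `pearson_r` are illustrative assumptions, and the p value is not computed here.

```python
from math import sqrt

# Hypothetical paired observations on two numeric variables.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over the root of the product of sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross_products / sqrt(ss_x * ss_y)

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # values near +1 or -1 indicate a strong linear relationship
```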

Chapter 11: Bivariate Regression

Bivariate regression is used when you want to predict scores on an interval- or ratio-level criterion variable from an interval- or ratio-level predictor variable. Chapter 11 shows you how to use the REG procedure to compute the slope and intercept for the regression equation, along with predicted values and residuals of prediction.
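The quantities named above (slope, intercept, predicted values, and residuals) come from PROC REG in the guide. The following Python sketch shows the least-squares arithmetic behind them for a hypothetical predictor and criterion; the data and variable names are assumptions made for illustration.

```python
# Hypothetical scores: x is the predictor variable, y is the criterion variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: cross-products divided by the predictor's sum of squares.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
# The fitted line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]  # actual minus predicted

print(f"predicted y = {intercept:.2f} + {slope:.2f} * x")
for a, b, p, e in zip(x, y, predicted, residuals):
    print(f"x={a}  y={b}  predicted={p:.2f}  residual={e:+.2f}")
```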

Chapter 12: Single-Sample t Test

Chapter 12 shows how to use the TTEST procedure to perform a single-sample t test. This is an inferential procedure that is useful for determining whether a sample mean is significantly different from a specified population mean. You will learn how to interpret the t statistic and the p value associated with that t statistic.
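As a hedged illustration of the single-sample t statistic that PROC TTEST reports, the sketch below computes it by hand in Python. The sample values and the comparison mean of 50 are hypothetical. The p value, which comes from the t distribution with n - 1 degrees of freedom, is not computed here.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample and a hypothetical population mean to compare against.
sample = [52, 55, 49, 58, 51, 53, 56, 50]
population_mean = 50

n = len(sample)
standard_error = stdev(sample) / sqrt(n)   # estimated standard error of the mean

# t = (sample mean - specified population mean) / standard error
t = (mean(sample) - population_mean) / standard_error
df = n - 1                                  # degrees of freedom

print(f"t({df}) = {t:.3f}")
```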

Chapter 13: Independent-Samples t Test

You use an independent-samples t test to determine whether there is a significant difference between two groups of subjects with respect to their mean scores on the dependent variable.

Chapter 13 explains when to use the equal-variance t statistic versus the unequal-variance t statistic, and shows how to use the TTEST procedure to conduct this analysis.
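To make the distinction between the two statistics concrete, the Python sketch below computes both for two small hypothetical groups. The equal-variance version pools the two sample variances; the unequal-variance version keeps them separate and uses the Satterthwaite approximation for the degrees of freedom. All data and names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores on the dependent variable for two independent groups.
group1 = [24, 27, 30, 22, 29, 26]
group2 = [20, 18, 25, 21]

n1, n2 = len(group1), len(group2)
m1, m2 = mean(group1), mean(group2)
v1, v2 = variance(group1), variance(group2)   # sample variances (n - 1 denominator)

# Equal-variance (pooled) t statistic.
pooled_variance = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / sqrt(pooled_variance * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Unequal-variance t statistic with Satterthwaite degrees of freedom.
t_unequal = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
df_unequal = ((v1 / n1 + v2 / n2) ** 2
              / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)))

print(f"equal variances:   t({df_pooled}) = {t_pooled:.3f}")
print(f"unequal variances: t({df_unequal:.2f}) = {t_unequal:.3f}")
```

When the two groups have the same size, the two t statistics coincide and only the degrees of freedom differ, which is why the sketch uses unequal group sizes.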

Chapter 14: Paired-Samples t Test

The paired-samples t test is also appropriate when you want to determine whether there is a significant difference between two sample means. The paired-samples approach is indicated when each score in one sample is dependent upon a corresponding score in the second sample. This will be the case in studies in which the same subjects provide repeated measures on the same dependent variable under different conditions, or when matching procedures are used. Chapter 14 shows how to perform this analysis using the TTEST procedure.
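Because each score in one sample is tied to a score in the other, the paired-samples analysis works on the difference scores. The Python sketch below illustrates that arithmetic with hypothetical repeated measures from six subjects; the data and the condition names are assumptions, and the p value is not computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical repeated measures: the same six subjects under two conditions.
condition_a = [12, 15, 11, 14, 13, 16]
condition_b = [14, 18, 12, 17, 13, 19]

# One difference score per subject (condition B minus condition A).
differences = [b - a for a, b in zip(condition_a, condition_b)]

n = len(differences)
# The paired t statistic is a single-sample t test on the differences against zero.
t = mean(differences) / (stdev(differences) / sqrt(n))
df = n - 1

print(f"mean difference = {mean(differences):.2f}, t({df}) = {t:.3f}")
```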

Chapter 15: One-Way ANOVA with One Between-Subjects Factor

One-way analysis of variance (ANOVA) is an inferential procedure similar to the independent-samples t test, with one important difference: while the t test allows you to test the significance of the difference between two sample means, a one-way ANOVA allows you to test the significance of the difference between more than two sample means. Chapter 15 shows how to use the GLM procedure to perform a one-way ANOVA, and then to follow with multiple comparison (post hoc) tests.
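The guide obtains the ANOVA results from PROC GLM. As a sketch of the underlying computation only, the Python below partitions the variability of three hypothetical groups into between-groups and within-groups components and forms the F ratio. Group names and scores are assumptions; post hoc tests and the p value are outside this sketch.

```python
from statistics import mean

# Hypothetical scores for three groups on one between-subjects factor.
groups = {
    "group_1": [4, 5, 6],
    "group_2": [6, 7, 8],
    "group_3": [8, 9, 10],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Between-groups sum of squares: group means around the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-groups sum of squares: scores around their own group mean.
ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in groups.values())

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# F = mean square between / mean square within
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```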

Chapter 16: Factorial ANOVA with Two Between-Subjects Factors

A one-way ANOVA, as described in Chapter 15, may be appropriate for analyzing data from an experiment in which the researcher manipulates only one independent variable. In contrast, a factorial ANOVA with two between-subjects factors may be appropriate for analyzing data from an experiment in which the researcher manipulates two independent variables simultaneously. Chapter 16 shows how to perform this type of analysis. It provides examples of results in which the main effects are significant, as well as results in which the interaction is significant.

Chapter 17: Chi-Square Test of Independence

Nonparametric statistical procedures are procedures that do not require stringent assumptions about the nature of the populations under study. Chapter 17 illustrates one of the most common nonparametric procedures: the chi-square test of independence. This test is appropriate when you want to study the relationship between two variables that assume a limited number of values. Chapter 17 shows how to conduct the test of significance and interpret the results presented in the two-way classification table created by the FREQ procedure.
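To show what the chi-square statistic summarizes, the following Python sketch builds a hypothetical 2 x 2 classification table, derives the frequencies expected if the two variables were independent, and compares them with the observed frequencies. The counts are invented for illustration, and the p value is not computed.

```python
# Hypothetical two-way classification table of observed frequencies
# (rows = values of one variable, columns = values of the other).
observed = [
    [20, 10],
    [10, 20],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: (row total * column total) / n
expected = [[r * c / n for c in column_totals] for r in row_totals]

# Chi-square: sum over cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
df = (len(row_totals) - 1) * (len(column_totals) - 1)

print(f"chi-square({df}) = {chi_square:.3f}")
```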

References

Many statistical procedures are illustrated in this guide by showing you how to analyze fictitious data from an empirical study. Many of these “studies” are loosely based on actual investigations reported in the research literature. These studies were chosen to help introduce you to the types of empirical investigations that are often conducted in the social and behavioral sciences and in education. The “References” section at the end of this guide provides complete references for the actual studies that inspired the fictitious studies reported here.

Conclusion

This guide assumes that some of the students using it have not yet completed a course on elementary statistics. This means that some readers will be unfamiliar with terms used in data analysis, such as “observations,” “null hypothesis,” “dichotomous variables,” and so on. To remedy this, the following chapter, “Terms and Concepts Used in This Guide,” provides a brief primer on basic terms and concepts in statistics. This chapter should lay a foundation that will make it easier to understand the chapters to follow.

Terms and Concepts Used in This Guide

Introduction
    Overview
    A Common Language for Researchers
    Why This Chapter Is Important
Research Hypotheses and Statistical Hypotheses
    Example: A Goal-Setting Study
    The Research Question
    The Research Hypothesis
    The Statistical Null Hypothesis
    The Statistical Alternative Hypothesis
    Directional versus Nondirectional Alternative Hypotheses
    Summary
Data, Variables, Values, and Observations
    Defining the Instrument, Gathering Data, Analyzing Data, and Drawing Conclusions
    Variables, Values, and Observations
Classifying Variables According to Their Scales of Measurement
    Introduction
    Nominal Scales
    Ordinal Scales
    Interval Scales
    Ratio Scales
Classifying Variables According to the Number of Values They Display
    Overview
    Dichotomous Variables
    Limited-Value Variables
    Multi-Value Variables
Basic Approaches to Research
    Nonexperimental Research
    Experimental Research
Using Type-of-Variable Figures to Represent Dependent and Independent Variables
    Overview
    Figures to Represent Types of Variables
    Using Figures to Represent the Types of Variables Assessed in a Specific Study
The Three Types of SAS Files
    Overview
    The SAS Program
    The SAS Log
    The SAS Output File
Conclusion

Introduction

Overview

This chapter has two objectives. The first is to introduce you to basic terms and concepts related to research design and data analysis. This chapter describes the different types of variables that might be analyzed when conducting research, the classification of these variables according to their scale of measurement or other characteristics, and the differences between nonexperimental and experimental research.

The chapter’s second objective is to introduce you to the three types of files that you will work with when you perform statistical analyses with SAS. These include the SAS program file, the SAS log file, and the SAS output file.

After completing this chapter, you should be familiar with the fundamental terms and concepts that are relevant to data analysis, and you will have a foundation to begin learning about the SAS System in detail in subsequent chapters.

A Common Language for Researchers

Research in the behavioral sciences and in education is extremely diverse. In part, this is because the behavioral sciences represent a wide variety of disciplines, including psychology, sociology, anthropology, political science, management, and other fields.

Further complicating matters is the fact that, within each discipline, a wide variety of methods are used to conduct research. These methods can include unobtrusive observation, participant observation, case studies, interviews, focus groups, surveys, ex post facto studies, laboratory experiments, and field experiments.

Despite this diversity in methods used and topics investigated, most scientific investigations still share a number of characteristics. Regardless of field, most research involves an investigator who gathers data and performs analyses to determine what the data mean. In addition, most researchers in the behavioral sciences and education use a common language in reporting their research; researchers from all fields typically speak of “testing null hypotheses” and “obtaining significant p values.”

Why This Chapter Is Important

The purpose of this chapter is to review some fundamental concepts and terms that are shared in the behavioral sciences and in education. You should familiarize (or refamiliarize) yourself with this material before proceeding to the subsequent chapters, as most of the terms introduced here will be referred to again and again throughout the text. If you have not yet taken a course in statistics, this chapter will provide an elementary introduction; if you have already completed a course in statistics, it will provide a quick review.

Research Hypotheses and Statistical Hypotheses

Example: A Goal-Setting Study

Imagine that you have been hired by a large insurance company to find ways of improving the productivity of its insurance agents. Specifically, the company would like you to find ways to increase the number of insurance policies sold by the average agent. You will therefore begin a program of research to identify the determinants of agent productivity. In the course of this program, you will work with research questions, research hypotheses, and statistical hypotheses.

The Research Question

The process of research often begins by developing a clear statement of the research question (or questions). The research question is a statement of what you hope to have learned by the time the research has been completed. It is good practice to revise and refine the research question several times to ensure that you are very clear about what it is you really want to know.

In the current example, you might begin with the question “What is the difference between agents who sell much insurance and agents who sell little insurance?” A more specific question might be “What variables have a causal effect on the amount of insurance sold by agents?” Upon reflection, you might realize that the insurance company really only wants to know what things management can do to cause the agents to sell more insurance. This might eliminate from consideration those variables that are not under management’s control, and can substantially narrow the focus of the research program. This narrowing, in turn, leads to a more specific statement of the research question such as “What variables under the control of management have a causal effect on the amount of insurance sold by agents?” Once the research question has been more clearly defined in this way, you are in a better position to develop a good hypothesis that provides a possible answer to the question.

The Research Hypothesis

A hypothesis is a statement about the predicted relationships among events or variables. A good hypothesis in the present case might identify a specific variable that is expected to have a causal effect on the amount of insurance sold by agents. For example, a research hypothesis might predict that the agents’ level of training will have a positive effect on the amount of insurance sold. Or it might predict that the agents’ level of achievement motivation will positively affect sales.

In developing the hypothesis, you might be influenced by any of a number of sources: an existing theory, some related research, or even personal experience. Let’s assume that in the present situation, for example, you have been influenced by goal-setting theory. This theory states, among other things, that higher levels of work performance are achieved when difficult goals are set for employees. Drawing on goal-setting theory, you now state the following hypothesis: “The difficulty of the goals that agents set for themselves is positively related to the amount of insurance they sell.” Notice how this statement satisfies our definition for a research hypothesis, as it is a statement about the predicted relationship between two variables. The first variable can be labeled “goal difficulty,” and the second can be labeled “amount of insurance sold.”

The predicted relationship between goal difficulty and amount of insurance sold is illustrated in Figure 2.1. Notice that there is an arrow extending from goal difficulty to amount of insurance sold. This arrow reflects the prediction that goal difficulty is the causal variable, and amount of insurance sold is the variable being affected.

Figure 2.1. Causal relationship between goal difficulty and amount of insurance sold, as predicted by the research hypothesis.

In Figure 2.1, you can see that the variable being affected (insurance sold) appears on the left side of the figure, and that the causal variable (goal difficulty) appears on the right. This arrangement might seem a bit unusual to you, since most figures that portray causal relationships have the order reversed (with the causal variable on the left and the variable being affected on the right). However, this guide will always use the arrangement that appears in Figure 2.1, for reasons that will become clear later.

You can see that the research hypothesis stated above is quite broad in nature. In many research situations, however, it is helpful to state hypotheses that are more specific in the predictions they make. For example, assume that there is an instrument called the “Smith Goal Difficulty Scale.” Scores on this fictitious instrument can range from zero to 100, with higher scores indicating more difficult goals. If you administered this scale to a sample of agents, you could develop a more specific research hypothesis along the following lines:

“Agents who score 60 or above on the Smith Goal Difficulty Scale will sell greater amounts of insurance than agents who score below 60.”
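The more specific hypothesis turns a continuous score into two groups defined by a cutoff. As a minimal sketch of that classification step, the Python below assigns hypothetical agents to a high or low goal-difficulty group using the cutoff of 60 from the text and compares the group means. The agent identifiers, scale scores, and sales figures are invented for illustration.

```python
from statistics import mean

CUTOFF = 60  # cutoff on the (fictitious) Smith Goal Difficulty Scale, from the text

# Hypothetical records: (agent id, goal difficulty score, policies sold).
agents = [
    ("A01", 72, 31),
    ("A02", 45, 22),
    ("A03", 60, 28),   # a score of exactly 60 belongs to the high group
    ("A04", 59, 25),
    ("A05", 88, 35),
    ("A06", 30, 19),
]

high_group = [sold for _, score, sold in agents if score >= CUTOFF]
low_group = [sold for _, score, sold in agents if score < CUTOFF]

mean_difference = mean(high_group) - mean(low_group)
print(f"high group mean = {mean(high_group):.2f}")
print(f"low group mean  = {mean(low_group):.2f}")
print(f"difference      = {mean_difference:.2f}")
```

A positive difference in the sample is consistent with the research hypothesis, but whether it is large enough to be trusted is a question for a significance test.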

The Statistical Null Hypothesis

Beginning in Chapter 10, “Bivariate Correlation,” this guide will show you how to use the SAS System to perform tests of null hypotheses. The way that you state a specific null hypothesis will vary depending on the nature of your research question and the type of data analysis that you are performing. Generally speaking, however, a statistical null hypothesis is a prediction that there is no difference between groups in the population, or that there is no relationship between variables in the population.

For example, consider the research hypothesis stated in the preceding section: “Agents who score 60 or above on the Smith Goal Difficulty Scale will sell greater amounts of insurance than agents who score below 60.” Assume that you conduct a study to investigate this research hypothesis. You identify two groups of subjects:

• 50 agents who score 60 or above on the Smith Goal Difficulty Scale (the “high goal-difficulty group”).

• 50 agents who score below 60 on the Smith Goal Difficulty Scale (the “low goal-difficulty group”).

You observe these agents over a 12-month period, and record the amount of insurance that they sell. You want to investigate the following (fairly specific) research hypothesis:

Research hypothesis: The average amount of insurance sold by the high goal-difficulty group will be greater than the average amount sold by the low goal-difficulty group.

You plan to analyze the data using a statistical procedure such as a t test (which will be discussed in Chapter 13, “Independent-Samples t Test”). One way to structure this analysis is to begin with the following statistical null hypothesis:

Statistical null hypothesis: In the population, there is no difference between the high goal-difficulty group and the low goal-difficulty group with respect to their mean scores on the amount of insurance sold.

Notice that this is a prediction of no difference between the groups. You will analyze the data from your sample, and if the observed difference is large enough, you will reject this null hypothesis of no difference. Rejecting this statistical null hypothesis means that you have obtained some support for your original research hypothesis (the hypothesis that there is a difference between the groups).

Statistical null hypotheses are often represented symbolically. For example, this is how you could have symbolically represented the preceding statistical null hypothesis:

H₀: µ₁ = µ₂

where

H₀ is the symbol used to represent the null hypothesis

µ₁ is the symbol used to represent the mean amount of insurance sold by Group 1 (the high goal-difficulty group) in the population

µ₂ is the symbol used to represent the mean amount of insurance sold by Group 2 (the low goal-difficulty group) in the population.

The Statistical Alternative Hypothesis

A statistical alternative hypothesis is typically a prediction that there is a difference between groups in the population, or that there is a relationship between variables in the population. The alternative hypothesis is the counterpart to the null hypothesis; if you reject the null hypothesis, you tentatively accept the alternative hypothesis.

There are different ways that you can state alternative hypotheses. One way is simply to predict that there is a difference between the population means, without predicting which population mean is higher. Here is one way of stating that type of alternative hypothesis for the current study:

Statistical alternative hypothesis: In the population, there is a difference between the high goal-difficulty group and the low goal-difficulty group with respect to their mean scores on the amount of insurance sold.

The alternative hypothesis can also be stated symbolically:

H₁: µ₁ ≠ µ₂

The H₁ symbol above is the symbol for an alternative hypothesis. Notice that the “not equal” symbol (≠) is used to represent the prediction that the means will not be equal.

Directional versus Nondirectional Alternative Hypotheses

Nondirectional hypotheses. The preceding section illustrated a nondirectional alternative hypothesis, also known as a two-sided or two-tailed alternative hypothesis. With the type of study described here (a study in which group means are being compared), a nondirectional alternative hypothesis simply predicts that one population mean differs from the other population mean; it does not predict which population mean will be higher. You would obtain support for this nondirectional alternative hypothesis if the high goal-difficulty group sold significantly more insurance, on the average, than the low goal-difficulty group. You would also obtain support for this nondirectional alternative hypothesis if the low goal-difficulty group sold significantly more insurance than the high goal-difficulty group. With a nondirectional alternative hypothesis, you are predicting some type of difference, but you are not predicting the specific nature, or direction, of the difference.

Directional hypotheses. In some situations it might be appropriate to use a directional alternative hypothesis. With the type of study described above, a directional alternative hypothesis (also known as a one-sided or one-tailed alternative hypothesis) not only predicts that there will be a difference, but also makes a specific prediction about which population will display the higher mean.
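The practical consequence of choosing a directional or nondirectional alternative is which outcomes count as evidence against the null hypothesis. The Python sketch below illustrates this with an exact permutation test, a technique swapped in here because it needs only the standard library; it is not the t test that the guide uses. The sales figures for the two groups are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical amounts of insurance sold by two small groups of agents.
high = [31, 28, 35, 30, 33]   # high goal-difficulty group
low = [24, 29, 22, 27, 25]    # low goal-difficulty group

def mean_difference(group_a, group_b):
    """Exact difference between two group means (Fractions avoid rounding ties)."""
    return Fraction(sum(group_a), len(group_a)) - Fraction(sum(group_b), len(group_b))

observed = mean_difference(high, low)
pooled = high + low

# Under the null hypothesis the group labels are interchangeable, so every way of
# splitting the pooled scores into two groups of these sizes is equally likely.
differences = []
for indices in combinations(range(len(pooled)), len(high)):
    chosen = set(indices)
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    differences.append(mean_difference(group_a, group_b))

# Directional (one-tailed): only differences at least this large in the predicted direction count.
p_directional = sum(d >= observed for d in differences) / len(differences)
# Nondirectional (two-tailed): differences at least this large in either direction count.
p_nondirectional = sum(abs(d) >= abs(observed) for d in differences) / len(differences)

print(f"observed difference = {float(observed):.2f}")
print(f"directional p    = {p_directional:.4f}")
print(f"nondirectional p = {p_nondirectional:.4f}")
```

With equal group sizes and a positive observed difference, the nondirectional p value in this sketch is exactly twice the directional one, because every split has a mirror image with the opposite sign.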

For example, in the present study, previous research might lead you to predict that the population of high goal-difficulty employees will sell more insurance, on the average, than the population of low goal-difficulty employees. If this were the case, you might state the following directional alternative hypothesis:

Statistical alternative hypothesis: In the population, the mean amount of insurance sold by the high goal-difficulty group is greater than the mean amount of insurance sold by the low goal-difficulty group.

This alternative hypothesis can also be stated symbolically:

H₁: µ₁ > µ₂

where

µ₁ represents the mean amount of insurance sold by Group 1 (the high goal-difficulty
