
The association between two quantitative variables can be readily displayed using a scatterplot. Relationships between two variables can be described as signed, additive, multiplicative, or a combination of all three types of relation. For example, a positive multiplicative relationship between interest rates and inflation means that when interest rates increase, inflation will also increase. Conversely, a negative relationship would imply that a decreasing interest rate would be associated with increasing inflation.

Relationships of this type are commonly used in linear algebra; you might think of them as “formulae of straight lines,” since they describe exactly the characteristics of straight lines in a two-dimensional plane that can vary. In the general form of the model y = ±ax ± b:

• y is the dependent variable

• ±a is the slope (i.e., the change in y for each one-unit increase in x); –a indicates a negative association; +a indicates a positive association

• x is the independent variable

• ±b is the intercept of the straight line (i.e., the value of y when x = 0)

Note that m is sometimes used in place of a in this equation: this is just a different notational convention and does not change the meaning. Given various sources of error, most phenomena that can be described by a linear model actually demonstrate some deviation from expected values. However, by using a scatterplot, you can get some visual clues concerning whether the relationship between two variables is linear or not.
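The general form above can be sketched in a few lines of Python. This is only an illustration: the function names `linear_model` and `direction` and the values a = 2 and b = 0.5 are assumptions chosen for the example, not part of the text's notation.

```python
def linear_model(x, a, b):
    """Evaluate y = a*x + b for a single value of x."""
    return a * x + b


def direction(a):
    """Describe the association implied by the sign of the slope a."""
    if a > 0:
        return "positive"
    if a < 0:
        return "negative"
    return "none"


# Illustrative values only: slope a = 2, intercept b = 0.5.
xs = [0, 1, 2, 3]
ys = [linear_model(x, a=2, b=0.5) for x in xs]

print(ys)             # [0.5, 2.5, 4.5, 6.5]
print(direction(2))   # positive
print(direction(-2))  # negative
```

The first value of `ys` is the intercept, because it is the value of y when x = 0, and each later value increases by the slope.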

Figure 9-1 shows the association between two variables (x and y) that are strongly positively associated, since for each (x, y) set of coordinates, the values are equal. The model describing this relationship is x = y. Using the linear model, a = 1, which is positive, so the relationship is positive; and b = 0, since the intercept is the origin (0, 0).

However, associations between two variables can also be negative, as shown in Figure 9-2, where each value of y is a negative multiple of x. In this example, the association between the two variables is perfectly negative. Note that the values of the (x, y) coordinates do not need to be the same in order for the association to be strong: the first values plotted are (1, –2), (2, –4), and (3, –6). The model for this relationship is y = –2x, which is multiplicative in nature. Using the linear model, a = –2, which is negative, so the relationship is negative. Again, b = 0, since the intercept is the origin (0, 0).
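As a rough check on this example, the slope and intercept can be recovered from any two of the plotted points. The helper `line_through` below is a hypothetical name introduced for this sketch; it uses the usual rise-over-run calculation and assumes the two points have different x values.

```python
def line_through(p1, p2):
    """Return (a, b) for the straight line y = a*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    a = (y2 - y1) / (x2 - x1)  # slope: change in y per unit change in x
    b = y1 - a * x1            # intercept: value of y when x = 0
    return a, b


# The first three points quoted in the text for Figure 9-2.
points = [(1, -2), (2, -4), (3, -6)]
a, b = line_through(points[0], points[1])

print(a, b)  # -2.0 0.0

# Every quoted point lies on the recovered line.
print(all(y == a * x + b for x, y in points))  # True
```

The slope comes out negative and the intercept is zero, matching the model y = –2x.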

Relationships between two variables can also be additive, or both multiplicative and additive. Figure 9-3 shows the strongly linear relationship between x and y, where the model is multiplicative (by a factor of 2) and additive (with a constant of 0.5). The model for this relationship is y = 2x + 0.5. Using the linear model, a = 2, which is positive, so the relationship is positive, and b = 0.5, so the line crosses the y-axis at 0.5 rather than at the origin.

Sometimes, variables have no relationship, so don’t be fooled into thinking that a straight line on a number plane indicates an association. Figure 9-4 shows the situation where the same value of y is related to every possible value of x. In this case, there is no association between the variables. Using the linear model, a = 0, and since the slope is zero, y does not change as x changes, so there is no relationship.

Often, when taking real-world measurements, some amount of random or systematic error is present, as described in Chapter 6. This can obscure the strength of relationship between two variables when you are examining a scatterplot. For example, Figure 9-5 shows exactly the same data as Figure 9-1, except that random error has been added into the model, to better reflect real-world measurements. By looking at the scatterplot, would you have guessed that the model was similar to Figure 9-1? In this example, the model is x + εx = y + εy, where εx and εy represent random error.
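One way to picture this is to simulate it. The sketch below is an assumption about how such data might be generated, not the procedure used for Figure 9-5: it starts from the points of x = y and adds independent, normally distributed error to each coordinate. The function name `noisy_points`, the error size `sigma`, and the seed are all hypothetical.

```python
import random


def noisy_points(n, sigma, seed=0):
    """Generate n points from x = y, adding random error to x and to y.

    sigma is the standard deviation of the error; sigma = 0 gives the
    error-free points (1, 1), (2, 2), ...
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    points = []
    for t in range(1, n + 1):
        x = t + rng.gauss(0.0, sigma)  # x plus its random error
        y = t + rng.gauss(0.0, sigma)  # y plus its random error
        points.append((x, y))
    return points


exact = noisy_points(5, sigma=0.0)
noisy = noisy_points(5, sigma=0.5, seed=42)

print(exact)  # the points fall exactly on the line x = y
print(noisy)  # the points scatter around the line x = y
```

Increasing `sigma` spreads the points further from the line, which is what makes the underlying model harder to recognize by eye.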

Graphing Associations Through Scatterplots | 173

Correlation

Coefficient

Figure 9-2. Association between two variables described by the model y = –2x

Figure 9-4. Lack of association between two variables described by the model y = 1

Finally, Figure 9-6 shows a different type of relationship between two variables: an exponential relationship that is described by the model y = e^x, where e equals 2.718..., which is the base of the natural logarithm. Exponential and other nonlinear functions mean that variables may not be linearly associated, but are associated in other ways. Indeed, knowing (or being able to predict) the type and class of model required to describe the type of association between two variables is part of the art of being a statistician.

In this example, if you changed the y-axis to be displayed using a logarithmic scale, the relationship between x and y would appear as a straight line, because the logarithm of e^x is simply x. Being aware of the underlying linearity, whether assumed or used explicitly, is very important when understanding relationships. Linear associations are the easiest to deal with mathematically, but exponential relationships are very powerful (imagine if Figure 9-6 described the growth in your savings after 10 years of stock market investment!).
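The effect of the logarithmic scale can be sketched numerically. The x values and the helper `steps` below are assumptions made for illustration; taking the natural logarithm of each y value stands in for what a log-scaled y-axis displays.

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [math.exp(x) for x in xs]       # the model y = e^x
log_ys = [math.log(y) for y in ys]   # what a log-scaled y-axis effectively plots


def steps(values):
    """Return the differences between successive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


print(steps(ys))      # increments keep growing: the curve bends upward
print(steps(log_ys))  # increments are all (approximately) 1: a straight line
```

Equal steps in x produce ever larger steps in y but constant steps in log y, which is why the curve straightens out on the logarithmic scale.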

Looking forward to Chapter 12, imagine now drawing a straight line through Figure 9-5; you can see there are several plausible straight lines that could be drawn through the cloud of data points. The basis of linear regression goes one step further in quantifying the relationship between two (or more) variables by drawing the line through the plane that best fits the data, in the sense of minimizing the distance between the observed coordinates and their estimate according to a model.

Least-squares linear regression, described in Chapter 12, minimizes the sum of the squared deviations between each observation and its expected value under the model, and thus provides the “best fit” to a linear model for the data. The point to note here is that you can use graphical tools as well as mathematical models to determine underlying structure in data.
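As a preview, the least-squares line for one independent variable can be sketched with the standard closed-form solution, in which the slope is the sum of cross-deviations divided by the sum of squared deviations in x. The presentation in Chapter 12 may differ; the function names and the five data points below are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a*x + b that minimizes the sum of
    squared vertical deviations between observed and predicted y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared deviations of each observed y from a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up observations that scatter around the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

a, b = least_squares(xs, ys)
fitted = sum_squared_residuals(xs, ys, a, b)
guess = sum_squared_residuals(xs, ys, 1, 0)  # the line y = x, drawn by eye

print(a, b)
print(fitted <= guess)  # True: no other line has a smaller sum of squares
```

Applied to points where y is constant, the same function returns a slope of zero, which matches the case of no association described earlier.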