
Line graphs are also often used to display the relationship between two variables, often between time on the x-axis and some other variable on the y-axis. One requirement for a line graph is that there can be only one y-value for each x-value, so it would not be an appropriate choice for data such as the SAT data presented above. Consider the data in Table 4-10, from the U.S. Centers for Disease Control and Prevention (CDC), showing the percentage of obesity among U.S. adults, measured annually over a 13-year period.
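
The one-y-value-per-x-value requirement can be checked mechanically before choosing a line graph. The Python sketch below uses a hypothetical helper, `suits_line_graph` (our own name, not part of any plotting library), that reports whether every x-value in a list of (x, y) pairs occurs only once. The SAT-style pairs are made up for illustration.

```python
def suits_line_graph(pairs):
    """Return True if each x-value appears with exactly one y-value.

    Hypothetical helper for illustration; `pairs` is an iterable of (x, y).
    """
    seen = set()
    for x, _ in pairs:
        if x in seen:
            return False  # a repeated x means two y-values share one x
        seen.add(x)
    return True

# One obesity percentage per year (first three rows of Table 4-10).
obesity = [(1990, 11.6), (1991, 12.6), (1992, 12.6)]

# Made-up (verbal, math) score pairs: two students share a verbal score of 500.
sat = [(500, 480), (500, 620), (650, 610)]

print(suits_line_graph(obesity))
print(suits_line_graph(sat))
```

The annual series passes the check, while the paired scores fail it because one verbal score is paired with two different math scores, which is why a scatterplot suits that kind of data better.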

Figure 4-14. Scatterplot of verbal and math SAT scores


Descriptive

Statistics

What we can see from this table is that obesity has been increasing at a steady pace; occasionally the percentage stays flat or decreases slightly from one year to the next, but more often there is a small increase (generally under 2 percentage points). This information can also be presented as a line graph, as in Figure 4-16.

Table 4-10. Percentage of obesity among U.S. adults, 1990–2002 (source: CDC)

Year    Obesity (%)
1990    11.6
1991    12.6
1992    12.6
1993    13.7
1994    14.4
1995    15.8
1996    16.8
1997    16.6
1998    18.3
1999    19.7
2000    20.1
2001    21.0
2002    22.1
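
The year-over-year changes behind the reading of Table 4-10 can be computed directly. This Python sketch (variable names are our own) stores the table in a dictionary and prints each change in percentage points, rounding to one decimal place to hide floating-point noise.

```python
# Table 4-10: percentage of obesity among U.S. adults (CDC), 1990-2002.
obesity = {
    1990: 11.6, 1991: 12.6, 1992: 12.6, 1993: 13.7, 1994: 14.4,
    1995: 15.8, 1996: 16.8, 1997: 16.6, 1998: 18.3, 1999: 19.7,
    2000: 20.1, 2001: 21.0, 2002: 22.1,
}

years = sorted(obesity)

# Change from the previous year to each year, in percentage points.
changes = {
    year: round(obesity[year] - obesity[prev], 1)
    for prev, year in zip(years, years[1:])
}

for year, delta in changes.items():
    print(f"{year}: {delta:+.1f}")

decreases = [year for year, delta in changes.items() if delta < 0]
print("Years with a decrease:", decreases)
```

Only 1997 shows a decrease (-0.2 points) and 1992 is unchanged; every other year rises by between 0.4 and 1.7 points.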

Although the line graph makes the overall pattern of steady increase clear, the visual effect of the graph is highly dependent on the scale and range used for the y-axis (which in this case shows percentage of obesity). Figure 4-16 is a sensible representation of the data, but if we wanted to exaggerate the effect, we could choose a larger scale and smaller range for the y-axis (vertical axis), as in Figure 4-17.

Figure 4-17 presents exactly the same data as Figure 4-16, but a smaller range was chosen for the y-axis (10%–22.5%, versus 0%–30%). The narrower range makes the differences between years look larger: choosing a misleading range is one of the time-honored ways to “lie with statistics.”

The same trick works in reverse: if we graph the same data using a wide range for the vertical axis, the changes over the entire period seem much smaller, as in Figure 4-18.

Figure 4-18 presents the same obesity data as Figures 4-16 and 4-17, with a large range on the vertical axis (0%–100%) to decrease the visual impact of the trend. So which scale should be chosen? There is no perfect answer to this question: all three present the same information, and none, strictly speaking, is incorrect. In this case, if I were presenting this chart without reference to any other graphics, I would use the scale in Figure 4-16, because it shows the true floor for the data (0%, which is the lowest possible value) and includes a reasonable range above the highest data point. One principle that should be observed is that if multiple charts are compared to each other (for instance, charts showing the percent obesity in different countries over the same time period, or charts of different health risks for the same period), they should all use the same scale to avoid misleading the reader.

Figure 4-17. Obesity among U.S. adults, 1990–2002 (CDC), using a restricted range to inflate the visual impact of the trend
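
One rough way to quantify how the axis range changes the visual impact is to compute what fraction of the vertical axis the rise from 11.6% to 22.1% occupies under each range. This measure and the helper name `axis_fraction` are our own illustration, not a standard statistic; the sketch assumes the plotted line spans from the 1990 low to the 2002 high, which holds for Table 4-10.

```python
def axis_fraction(low, high, axis_min, axis_max):
    """Fraction of the vertical axis height spanned by data from low to high.

    Hypothetical helper: a rough proxy for how steep a trend looks.
    """
    return (high - low) / (axis_max - axis_min)

low, high = 11.6, 22.1  # 1990 and 2002 values from Table 4-10

# Vertical-axis ranges described in the text for each figure.
ranges = {
    "Figure 4-16 (0%-30%)": (0, 30),
    "Figure 4-17 (10%-22.5%)": (10, 22.5),
    "Figure 4-18 (0%-100%)": (0, 100),
}

for label, (ymin, ymax) in ranges.items():
    share = axis_fraction(low, high, ymin, ymax)
    print(f"{label}: trend covers {share:.1%} of the axis height")
```

Under these assumptions the same trend fills roughly a third of the axis with the 0%–30% range, more than four-fifths with 10%–22.5%, and only about a tenth with 0%–100%, consistent with the point that a narrower range exaggerates the change.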

Exercises

Like any other aspect of statistics, learning the techniques of descriptive statistics requires practice. The data sets provided are deliberately simple, because if you can apply a technique correctly with 10 cases, you can also apply it with 1,000. My advice is to try solving the problems several ways, for instance, by hand, using a calculator, and using whatever software is available to you. Even spreadsheet programs like Excel have many simple mathematical and statistical functions available, and now would be a good time to investigate those possibilities. In addition, by solving a problem several ways, you will have more confidence that you are using the software correctly.

Most graphic presentations are created using software, and while each package has good and bad points, most packages can produce most, if not all, of the graphics presented in this chapter, and quite a few other types of graphs as well. So the best way to become familiar with graphics is to investigate whatever software you have access to and practice graphing data you work with (or that you make up). Always keep in mind that graphic displays are a form of communication, and therefore should clearly indicate whatever you think is most important about a given data set.

Figure 4-18. Obesity among U.S. adults, 1990–2002 (CDC), using a large range to decrease the visual impact of the trend

Question

When is each of the following an appropriate measure of central tendency? Think of some examples for each from your work or studies.

Mean
Median
Mode

Answer

The mean is appropriate for interval or ratio data that is continuous, symmetrical, and does not contain significant outliers.

The median is appropriate for continuous data that may be skewed (asymmetrical), based on ranks, or contain extreme values.

The mode is most appropriate for categorical variables, or for continuous data sets where one value dominates the others.
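
To see why the median suits data with extreme values and the mode suits categorical data, Python's standard `statistics` module can compute all three measures. Both data sets below are invented for illustration.

```python
import statistics

# Hypothetical salaries (in thousands) with one extreme value.
salaries = [42, 45, 47, 50, 52, 55, 400]
print(statistics.mean(salaries))    # pulled far upward by the outlier
print(statistics.median(salaries))  # 50, the middle of the seven sorted values

# Hypothetical categorical data: only the mode is meaningful here.
blood_types = ["O", "A", "O", "B", "O", "AB", "A"]
print(statistics.mode(blood_types))  # "O" appears most often
```

The median stays at 50 no matter how large the top salary becomes, while the mean moves with it; the mode is the only one of the three measures that can be computed for the blood-type labels.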

Question

What is the median of this data set? 1 2 3 4 5 6 7 8 9

Answer

5: The data set has 9 values, which is an odd number; the median is therefore the middle value when the values are arranged in order. To look at this question more