CHAPTER 2: Visual Description of Data

(1)


(2)

Chapter 2 - Learning Objectives

• Convert raw data into a data array.

• Construct:

– a frequency distribution.

– a relative frequency distribution.

– a cumulative relative frequency distribution.

• Visually represent data by using graphs and charts.

(3)

Chapter 2 - Key Terms

• Data array

– An orderly presentation of data in either ascending or descending numerical order.

• Frequency Distribution

– A table that represents the data in classes and that shows the number of observations in each class.

(4)

Chapter 2 - Key Terms

• Frequency Distribution

– Class - The category

– Frequency - Number in each class

– Class limits - Boundaries for each class

– Class interval - Width of each class

– Class mark - Midpoint of each class

(5)

Sturges’ rule

• How to set the approximate number of classes to begin constructing a frequency distribution:

\[ k = 1 + 3.322(\log_{10} n) \]

where k = approximate number of classes to use and n = the number of observations in the data set.
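
As a quick check of Sturges' rule, here is a minimal Python sketch. The function name `sturges_classes` is an illustrative choice, not something from the text, and rounding to the nearest whole number is just one reasonable way to read "approximate".

```python
import math


def sturges_classes(n: int) -> float:
    """Approximate number of classes: k = 1 + 3.322 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 + 3.322 * math.log10(n)


# Example: a data set with 50 observations
k = sturges_classes(50)
suggested = round(k)  # nearest whole number of classes
print(f"k = {k:.3f}, so use about {suggested} classes")
```

The result is only a starting point; the final number of classes can still be adjusted to get convenient class limits.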

(6)

How to Construct a Frequency Distribution

1. Number of classes

Choose an approximate number of classes for your data. Sturges’ rule can help.

2. Estimate the class interval

Divide the range of your data by the approximate number of classes (from Step 1) to find the approximate class interval, where the range is the largest data value minus the smallest data value.

3. Determine the class interval

Round the estimate (from Step 2) to a convenient value.

(7)

How to Construct a Frequency Distribution, cont.

4. Lower Class Limit

Determine the lower class limit for the first class by selecting a convenient number that is smaller than the lowest data value.

5. Class Limits

Determine the other class limits by repeatedly adding the class width (from Step 3) to the prior class limit, starting with the lower class limit (from Step 4).

6. Define the classes

Use the sequence of class limits to define the classes.
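
Steps 4 to 6 can be sketched in Python as follows. This is only an illustration: `build_classes` is a hypothetical helper name, the small data list is invented for the example, and the width and lower limit are the "convenient" values a person would pick in Steps 3 and 4.

```python
def build_classes(data, width, lower):
    """Steps 5-6: add the width repeatedly, starting at the lower limit.

    Each class is (lower_limit, upper_limit): the lower limit is included,
    the upper limit is not ("up to but not including").
    """
    if lower > min(data):
        raise ValueError("the first lower limit must not exceed the smallest value")
    classes = []
    while lower <= max(data):  # keep going until the largest value is covered
        classes.append((lower, lower + width))
        lower += width
    return classes


# Invented data, for illustration only
data = [12, 15, 21, 27, 33, 38, 44]
classes = build_classes(data, width=10, lower=10)
for lo, hi in classes:
    print(f"{lo} up to {hi}")
```

Treating the upper limit as excluded means no value can fall into two classes.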

(8)

Converting to a Relative Frequency Distribution

1. Retain the same classes defined in the frequency distribution.

2. Sum the total number of observations across all classes of the frequency distribution.

3. Divide the frequency for each class by the total number of observations, giving the proportion of data values in each class (multiply by 100 to express it as a percentage).
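
The conversion amounts to one division per class. A minimal sketch, assuming the class frequencies are already available as a list (the counts below are invented for illustration, and `relative_frequencies` is a hypothetical name):

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("there must be at least one observation")
    return [f / total for f in freqs]


# Invented class frequencies: 10 observations in three classes
rel = relative_frequencies([2, 5, 3])
print(rel)
```

The relative frequencies should always sum to 1 (up to floating-point rounding), which is a handy check on the arithmetic.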

(9)

Forming a Cumulative Relative Frequency Distribution

1. List the number of observations in the lowest class.

2. Add the frequency of the lowest class to the frequency of the second class. Record that cumulative sum for the second class.

3. Continue adding the prior cumulative sum to the frequency of each subsequent class, so that the cumulative sum for the final class is the total number of observations in the data set.

(10)

Forming a Cumulative Relative Frequency Distribution, cont.

4. Divide the accumulated frequencies for each class by the total number of observations, giving the proportion of all observations that occurred up to and including that class.

• An Alternative: Accrue the relative frequencies for each class instead of the raw frequencies. Then you don’t have to divide by the total to get percentages.
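
Both routes lead to the same numbers. The sketch below compares them on invented class frequencies; the function names are hypothetical, and `itertools.accumulate` from the standard library does the running sums.

```python
import math
from itertools import accumulate


def cum_rel_from_counts(freqs):
    """Steps 1-4: accumulate the raw frequencies, then divide by the total."""
    total = sum(freqs)
    return [c / total for c in accumulate(freqs)]


def cum_rel_from_relative(freqs):
    """Alternative: convert to relative frequencies first, then accumulate."""
    total = sum(freqs)
    return list(accumulate(f / total for f in freqs))


freqs = [2, 5, 3]  # invented class frequencies
route_a = cum_rel_from_counts(freqs)
route_b = cum_rel_from_relative(freqs)
print(route_a)
print(route_b)
```

The two lists can differ in the last decimal places because of floating-point rounding, so compare them with a tolerance rather than with exact equality.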

(11)

Example: Problem 2.59

• The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states is given in the next table.

– a) Arrange these into a data array.

– b) This part has been omitted as it is not examinable.

– *) Approximately how many classes would be appropriate for these data? [*not in textbook]

– c & d) Construct a frequency distribution. State interval width and class mark.

– e) Construct a histogram, a relative frequency distribution, and a cumulative relative frequency distribution.

(12)

Problem 2.59 - The Data

AL   $775    HI    823    MA  1,036    NM  1,046    SD    506
AK  1,136    ID    659    MI    902    NY    784    TN    859
AZ  1,091    IL    917    MN    652    NC    763    TX  1,010
AR    678    IN    898    MS    555    ND    507    UT  1,081
CA  1,221    IA    612    MO    863    OH    940    VT    676
CO    961    KS    666    MT    482    OK    797    VA    830
CT  1,058    KY    703    NE    626    OR  1,052    WA  1,143
DE  1,024    LA    875    NV    900    PA    861    WV    701
FL    960    ME    738    NH    976    RI    885    WI    744
GA    775    MD    889    NJ    829    SC    838    WY    537

(13)

Problem 2.59 - (a) Data Array

(Descending order, reading down each column.)

CA  1,221    TX  1,010    RI    885    NY    784    KS    666
WA  1,143    NH    976    LA    875    AL    775    ID    659
AK  1,136    CO    961    MO    863    GA    775    MN    652
AZ  1,091    FL    960    PA    861    NC    763    NE    626
UT  1,081    OH    940    TN    859    WI    744    IA    612
CT  1,058    IL    917    SC    838    ME    738    MS    555
OR  1,052    MI    902    VA    830    KY    703    WY    537
NM  1,046    NV    900    NJ    829    WV    701    ND    507
MA  1,036    IN    898    HI    823    AR    678    SD    506
DE  1,024    MD    889    OK    797    VT    676    MT    482
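
Building a data array is just sorting. A minimal sketch using five of the states from the data table (AL, HI, MA, NM, SD); the variable names are illustrative:

```python
# Five of the fifty state values, keyed by state abbreviation
costs = {"AL": 775, "HI": 823, "MA": 1036, "NM": 1046, "SD": 506}

# Data array in descending order of cost
array_desc = sorted(costs.items(), key=lambda item: item[1], reverse=True)

for state, cost in array_desc:
    print(f"{state} {cost:,}")
```

Dropping `reverse=True` gives the ascending version, which is equally valid as a data array.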

(14)

Problem 2.59 - Continued

• To approximate the number of classes we should use in creating the frequency distribution, use Sturges’ rule with n = 50:

\[ k = 1 + 3.322(\log_{10} n) = 1 + 3.322(\log_{10} 50) = 1 + 3.322(1.69897) = 1 + 5.644 = 6.644 \approx 7 \]

Sturges’ rule suggests we use approximately 7 classes.

(15)

Constructing the Frequency Distribution

• Step 1. Number of classes

– Sturges’ Rule: approximately 7 classes.

• Steps 2 & 3. The Class Interval

– The range is: $1,221 – $482 = $739

– $739/7 ≈ $106 and $739/8 ≈ $92

– So, if we use 8 classes, we can make each class $100 wide.
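
The interval estimates can be reproduced with a few lines of Python. The largest and smallest values come from the data array; the variable names are illustrative.

```python
largest, smallest = 1221, 482
data_range = largest - smallest  # 739

# Estimated class interval for 7 and for 8 classes
estimates = {k: data_range / k for k in (7, 8)}
for k, width in estimates.items():
    print(f"{k} classes: about ${width:.0f} per class")
```

Neither estimate is a round figure, which is why the slide settles on the more convenient $100 with 8 classes.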

(16)

Constructing the Frequency Distribution

• Step 4. The Lower Class Limit

– If we start at $450, we can cover the range in 8 classes, each class $100 in width.

– The first class: $450 up to (but not including) $550

• Steps 5 & 6. Setting Class Limits

$450 up to $550
$550 up to $650
$650 up to $750
$750 up to $850
$850 up to $950
$950 up to $1,050
$1,050 up to $1,150
$1,150 up to $1,250

(17)

Problem 2.59 - (c) & (d)

Average daily cost         Number    Class Mark
$450 – under $550             4        $500
$550 – under $650             3        $600
$650 – under $750             9        $700
$750 – under $850             9        $800
$850 – under $950            11        $900
$950 – under $1,050           7        $1,000
$1,050 – under $1,150         6        $1,100
$1,150 – under $1,250         1        $1,200

Interval width: $100
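
The counts and class marks in this table can be reproduced from the 50 raw values. The sketch below is one way to do it, not the textbook's method: `frequency_distribution` is a hypothetical helper, and the class index is found by integer division, which treats each upper limit as excluded.

```python
costs = [
    775, 823, 1036, 1046, 506, 1136, 659, 902, 784, 859,
    1091, 917, 652, 763, 1010, 678, 898, 555, 507, 1081,
    1221, 612, 863, 940, 676, 961, 666, 482, 797, 830,
    1058, 703, 626, 1052, 1143, 1024, 875, 900, 861, 701,
    960, 738, 976, 885, 744, 775, 889, 829, 838, 537,
]


def frequency_distribution(data, lower, width, n_classes):
    """Return rows of (lower limit, upper limit, frequency, class mark)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - lower) // width)  # which class x falls into
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the classes")
        counts[i] += 1
    rows = []
    for i, count in enumerate(counts):
        lo = lower + i * width
        rows.append((lo, lo + width, count, lo + width / 2))
    return rows


rows = frequency_distribution(costs, lower=450, width=100, n_classes=8)
for lo, hi, count, mark in rows:
    print(f"${lo:,} - under ${hi:,}: {count:>2}  (class mark ${mark:,.0f})")
```

The class mark is the midpoint of the class, computed here as the lower limit plus half the interval width.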

(18)

Problem 2.59 - (e) The Histogram

[Figure: histogram; horizontal axis ticks 500 to 1200 in steps of 100, vertical axis ticks 0 to 12 in steps of 2]
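
A rough text version of the histogram can be printed from the class marks and frequencies of parts (c) and (d). This is only a stand-in for a proper chart; `text_histogram` is a hypothetical helper, and each `#` represents one state.

```python
marks = [500, 600, 700, 800, 900, 1000, 1100, 1200]
freqs = [4, 3, 9, 9, 11, 7, 6, 1]


def text_histogram(marks, freqs):
    """One line per class: class mark, a bar of '#' characters, the count."""
    return [f"{m:>5} | {'#' * f} {f}" for m, f in zip(marks, freqs)]


lines = text_histogram(marks, freqs)
print("\n".join(lines))
```

The bars run sideways rather than upward, but the shape is the same: the tallest bar belongs to the class centered on $900.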

(19)

Problem 2.59 - The Relative Frequency Distribution

Average daily cost         Number    Rel. Freq.
$450 – under $550             4       4/50 = .08
$550 – under $650             3       3/50 = .06
$650 – under $750             9       9/50 = .18
$750 – under $850             9       9/50 = .18
$850 – under $950            11      11/50 = .22
$950 – under $1,050           7       7/50 = .14
$1,050 – under $1,150         6       6/50 = .12
$1,150 – under $1,250         1       1/50 = .02

(20)

Problem 2.59 - The Cumulative Frequency Distribution

Average daily cost         Number    Cum. Freq.
$450 – under $550             4          4
$550 – under $650             3          7
$650 – under $750             9         16
$750 – under $850             9         25
$850 – under $950            11         36
$950 – under $1,050           7         43
$1,050 – under $1,150         6         49
$1,150 – under $1,250         1         50

(21)

Problem 2.59 - The Cumulative Relative Frequency Distribution

Average daily cost         Cum. Freq.    Cum. Rel. Freq.
$450 – under $550              4          4/50 = .08
$550 – under $650              7          7/50 = .14
$650 – under $750             16         16/50 = .32
$750 – under $850             25         25/50 = .50
$850 – under $950             36         36/50 = .72
$950 – under $1,050           43         43/50 = .86
$1,050 – under $1,150         49         49/50 = .98
$1,150 – under $1,250         50         50/50 = 1.00
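
All four columns for Problem 2.59 (frequency, relative frequency, cumulative frequency, cumulative relative frequency) can be generated together from the class frequencies. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

freqs = [4, 3, 9, 9, 11, 7, 6, 1]  # class frequencies from parts (c) and (d)
total = sum(freqs)                 # 50 states

rel = [f / total for f in freqs]      # relative frequencies
cum = list(accumulate(freqs))         # cumulative frequencies
cum_rel = [c / total for c in cum]    # cumulative relative frequencies

lower_limits = range(450, 1250, 100)  # $450, $550, ..., $1,150
for lo, f, r, c, cr in zip(lower_limits, freqs, rel, cum, cum_rel):
    print(f"${lo:,} - under ${lo + 100:,}: {f:>2}  {r:.2f}  {c:>2}  {cr:.2f}")
```

Reading the cumulative column, half of the states (25 of 50) had an average daily cost under $850.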

(22)

The Scatter Diagram

• A scatter diagram is a two-dimensional plot of data representing values of two quantitative variables.

– x, the independent variable, on the horizontal axis

– y, the dependent variable, on the vertical axis

• Four ways in which two variables can be related:

1. Direct

2. Inverse

3. Curvilinear

4. No relationship

(23)

An Example: Problem 2.44

• For 6 local offices of a large tax preparation firm, the following data describe x = service revenues and y = expenses for supplies, freight, postage, etc.

– Draw a scatter diagram representing the data. Does there appear to be any relationship between the variables? If so, is the relationship direct or inverse?

(24)

Problem 2.44, continued

[Figure: scatter plot with trend line; x = Service Revenue (thous), y = Office Expenses (thous)]

There appears to be a direct relationship between the service revenue and the office expenses incurred.
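
One way to back up a visual judgment of "direct" or "inverse" is to look at the sign of the slope of a least-squares trend line. The sketch below assumes that approach. The data points are invented for illustration and are NOT the Problem 2.44 data, which are not reproduced here; the function names are hypothetical.

```python
def trend_slope(xs, ys):
    """Slope of the least-squares line through the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def direction(slope):
    """Positive slope: direct relationship; negative slope: inverse."""
    if slope > 0:
        return "direct"
    if slope < 0:
        return "inverse"
    return "no linear trend"


# Invented values (thousands of dollars), for illustration only
revenue = [200, 300, 400, 500, 600]
expenses = [16, 18, 20, 22, 24]

slope = trend_slope(revenue, expenses)
print(direction(slope))
```

A straight-line slope cannot detect a curvilinear relationship, so the scatter diagram itself should still be inspected.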