
4 Statistical samples and displays DATA ANALYSIS


Academic year: 2020


Statistics originally meant ‘information about the state’. Today the term ‘statistics’ refers both to the field of study and to the calculated numerical values, such as the mean, determined from the information collected.

Statistics is used by governments to plan for community facilities and to maintain records of our economy. It is used by the medical profession in researching new drugs and their effects and by advertisers in promoting their products. Businesses use statistics to forecast sales and profits, and manufacturing industries use statistics to monitor product quality.

We make use of statistical data every day when we use an ATM, purchase goods with barcodes, borrow a book from a library or access information via a modem.

As we move from an industrial to an information society, the focus moves from the production of goods and services to the mammoth task of organising and managing the needs and wants of Australia’s growing population. Statistical methods are critical in supplying accurate and quick responses to governments, businesses and other groups.

In this chapter you will learn how to:

collect and organise statistical data

access and use information from statistical organisations

analyse and interpret statistical information from tables, charts and graphs

distinguish between the various types of sampling

design an appropriate questionnaire

represent data in frequency histograms and polygons

display data in dot plots, stem-and-leaf plots and radar charts

use random numbers to establish random samples

STATISTICAL INVESTIGATIONS

We carry out statistical investigations for a variety of reasons, including:

to satisfy our curiosity

for research or study

to measure performance

to assist in decision making.

There are three main stages in a statistical investigation:

1. Collect and organise information or data.

2. Summarise and display this data.

3. Analyse and interpret this data.

The collection and analysis of good data will lead to meaningful conclusions.

1 Collect and organise information or data

We can access data already published by governmental or private sources, design an experiment, make observations or conduct a survey.

There are many sources of published statistical data that can be accessed. For example:

The Australian Bureau of Statistics (ABS) collects, analyses and distributes information about the population of Australia including housing and health care.

AC Nielsen uses market research to provide clients with information on consumer products.

Daily newspapers are filled with statistical information.

The Internet can be used to access various sites dealing with statistical information, including the ABS, United Nations (UN), Bureau of Meteorology (BOM), World Health Organisation (WHO) as well as many interesting student sites.

Gathering information by making observations usually occurs in a natural setting or during a controlled experiment. For example, researchers studying animal behaviour will collect their data by observing animals directly in their natural habitat.

Medical researchers studying the effects of drugs on patients will observe reactions using a control group. For example, out of 50 patients in a group, 25 are given a new drug and 25 are given a placebo, and only the researcher knows which patients are given the new drug.

2 Summarise and display data

Statistical data is usually summarised and displayed using a table, graph or spreadsheet.

3 Analyse and interpret data

In analysing and interpreting data, the data is examined and any patterns are noticed. Summary statistics, such as mean and standard deviation, are calculated, and conclusions and predictions are made.

Some useful websites:

Australian Bureau of Statistics (ABS) www.abs.gov.au

Bureau of Meteorology (BOM) www.bom.gov.au

United Nations (UN) www.un.org

Morgan Surveys www.roymorgan.com.au

Investigation: Using statistics in advertising

It is easy to quote figures in order to promote a product, but the figures quoted are often used with little explanation as to how they were obtained. Have you ever thought about what advertisers mean when they say:

‘Four out of five film stars use Spotlight toilet soap’ or ‘Most dentists use Sparkle toothpaste’?

Discuss these statements, looking at:

(a) what the statements could mean

(b) how the data could have been collected

(c) what message the advertiser is trying to get across to the consumer.

Just for the record

10 STATISTICS ABOUT NSW

In 1996:

There were 97.6 males to every 100 females, down from 98.5 in 1991.

133 million chickens, 5.2 million sheep and 3.6 million lambs were killed for human consumption.

3.37 million cars were registered.

Accidents, poisoning and violence caused 66% of deaths in 15–24 year olds.

31% of residents had a dual flush toilet.

Trips to the cinema totalled 2.8 million.

The most popular sport was aerobics with 235 200 people participating.

87% of TAFE students studied part-time.

There were 3053 schools with about 1 million students.

The sun shone on Sydney for 342 days.

From the 1998 NSW Year Book published by the ABS.

1. Choose three of the statistical facts about NSW given above in Just for the record, and write two or three sentences on how you think the data may have been collected for each.

2. Choose three of the following organisations and state what sort of statistical information each collects:

(a) National Roads and Motorists Association (NRMA)
(b) GIO Insurance Company
(c) Bureau of Meteorology
(d) Australian Stock Exchange
(e) Land Titles Office
(f) Coopers and Lybrand
(g) Roy Morgan Research Centre
(h) Commonwealth Department of Immigration
(i) Australian Customs
(j) Totalisator Agency Board (TAB)

3. Collect five graphs, charts or tables from newspapers or magazines and write two or three sentences on what each is showing.

4. We often see advertisers’ statements like these in the media:

(a) ‘best food in Sydney’
(b) ‘world’s finest chocolate’
(c) ‘Australia’s favourite pool company’
(d) ‘most popular TV program’
(e) ‘best coffee in Australia’

Choose two of these statements and say, in a sentence or two:

(i) what the statements could mean
(ii) how the data could have been collected
(iii) what message the advertisers are trying to get across to the consumer.

5. By using the ABS website or otherwise, write two or three sentences on the role of the Australian Bureau of Statistics in our society and give three examples of the statistical information that it collects.

Idea: Collecting statistical graphs and tables

It is important to use real and current data in your study of statistics.

Start a statistics file now and collect graphs and tables from newspapers and magazines.

You can also collect information from statistical websites such as the ABS.

Collect as many different types of graphs and tables as you can.

INTERPRETING GRAPHS

Example 1

These column graphs represent the 896 international students who enrolled in Australian tertiary institutions in 1994. List some statistical facts that can be interpreted from the graphs.

[Column graphs: International students enrolled in Australian tertiary institutions, 1994. By country (axis 5% to 35%): Taiwan, Pakistan, Malaysia, Korea, Japan, Indonesia, India, Hong Kong, Fiji, China, Other, Thailand. By course (axis 0% to 45%): Maritime Studies, Matriculation, Hotel and Tourism, Engineering Studies, Computer Studies, Business Office Studies, Building and Construction, Aviation, Arts and Media, Applied Science, Rural Studies, Manufacturing, Fashion, Marine Engineering.]

Solution

About a third of the students came from Hong Kong.

Business Office Studies was the most popular area of study with approximately 45% of students enrolled in this area.

Hotel and Tourism courses were the second most popular with about 15% of students enrolling in these.

There were twice as many students from Hong Kong as there were from Indonesia.

Pakistan had the fewest students.

Rural Studies and Manufacturing were the least popular courses.

Can you write down three other facts from the graphs?

Example 2

A sample of 800 people was surveyed on the mode of transport they use to get to work. The results are shown in this divided bar graph.

[Divided bar graph: Transport used to go to work. Segments: Train, Car, Walk, Bus, Cycle.]

(a) How many people walk to work?

(b) What percentage of commuters catch either a bus or a train to work?

(c) How many people prefer not to cycle or walk to work?

Solution

The length of the bar is 120 mm.

120 mm represents 800 people, so 1 mm represents \(\frac{800}{120}\) people.

(a) Length of ‘walk’ segment is 21 mm.

Number who walk to work = \(\frac{800}{120} \times 21 = 140\)

(b) Length of ‘bus + train’ segments is 11 + 43 = 54 mm.

Number who catch a bus or train = \(\frac{800}{120} \times 54 = 360\)

% who catch a bus or train = \(\frac{360}{800} \times 100\% = 45\%\)

(c) Length of ‘train + car + bus’ segments is 43 + 35 + 11 = 89 mm.

Number who do not cycle or walk = \(\frac{800}{120} \times 89 \approx 593\)
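The scale calculation in Example 2 can be checked with a short script. This is only a sketch: the train, car, walk and bus lengths are the ones quoted in the solution, while the cycle length of 10 mm is an assumption inferred from the 120 mm total rather than a measurement given in the text. The names `segments_mm` and `people_for` are illustrative.

```python
from fractions import Fraction

TOTAL_PEOPLE = 800
BAR_LENGTH_MM = 120

# Segment lengths in mm; 'cycle' is inferred as 120 - (43 + 35 + 21 + 11).
segments_mm = {"train": 43, "car": 35, "walk": 21, "bus": 11, "cycle": 10}

# Each millimetre of the bar stands for 800/120 people.
people_per_mm = Fraction(TOTAL_PEOPLE, BAR_LENGTH_MM)


def people_for(*modes):
    """Number of people represented by the combined length of the given segments."""
    length = sum(segments_mm[m] for m in modes)
    return people_per_mm * length


walkers = people_for("walk")
bus_or_train = people_for("bus", "train")
percent_bus_or_train = bus_or_train / TOTAL_PEOPLE * 100
not_cycle_or_walk = people_for("train", "car", "bus")

print(int(walkers))                 # part (a)
print(float(percent_bus_or_train))  # part (b)
print(round(not_cycle_or_walk))     # part (c)
```

Using `Fraction` keeps the scale factor exact, so rounding happens only once, at the final answer for part (c).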

Exercise 4-02: Interpreting graphs

1. This graph shows how Pete’s Pizza Parlour spends the money from pizza sales.

[Sector graph: Ingredients 25%, Labour 35%, Profit 15%, Advertising 5%, Overheads (remaining sector).]

(a) Apart from rent, give three overheads that Pete may have.
(b) How much of a large pizza, sold for $16, is spent on (i) overheads and (ii) labour?
(c) Pete’s profit for selling pizzas over a week is $900. How much did he earn from pizza sales in that week?
(d) If Pete’s profit increases to $1200 next week, how much will he spend on advertising?
(e) What are the advantages of using a sector graph to display this information?
(f) Represent the data in a segmented (divided) bar graph.

2. The line graph shows the sales of compact discs over a 6-month period.

[Line graph: CD sales. Vertical axis: Number sold (× 1000), 0 to 500. Horizontal axis: Month, Oct to Mar.]

(a) How many CDs were sold in January?
(b) What could be some of the reasons for the most discs being sold in January?
(c) In which month was the least number of CDs sold?
(d) In which month were 250 000 CDs sold?
(e) Between which 2 months did the greatest drop in sales occur?
(f) What were the total sales for the 6 months?
(g) What percentage of the CD sales were in November?
(h) Represent the information in a column graph.

3. This graph and table, published in the Sydney Morning Herald on 18 March 1999, show the results of a poll taken in the seat of Kogarah to look at voting intentions before a state election.

[Graph and map: Voting intentions in Kogarah. Herald/A.C.Nielsen poll, 10–11 March 1999. No. of voters: 44 837. Primary voting intentions: Liberal 40%, ALP 44%; Green, One Nation, Democrats, Other and Undecided, values as printed: 15%, 6%, 4%, 3%, 3%.]

Two-party preferred (%)

BY GENDER    TOTAL   MALE    FEMALE
LABOR        54      58      49
COALITION    46      42      51

BY AGE       18–24   25–39   40–54   55+
LABOR        56      57      54      49
COALITION    44      43      46      51

(a) Write a few sentences describing what information is shown in the graph.
(b) What party was most likely to win the seat of Kogarah?
(c) What was the party preferred by the over-55 age group?
(d) Which party did males prefer?
(e) How many voters were in the Kogarah district?
(f) The ALP polled 44%. Does this mean it would win the seat? Give reasons.

4. [Column graph: Budget and US box office ($A million, axis 0 to 175) for Bruce Willis films from 1987 to 1994: Blind Date, Die Hard, Sunset, Die Hard II, The Bonfire of the Vanities, Hudson Hawk, Billy Bathgate, The Last Boy Scout, Death Becomes Her, Striking Distance and Die Hard III. Bruce Willis’s wage for each film is marked with #; wage labels as printed: $1.5m, $7m, $15m, $13m, $2m, $30m, $2m, $9m, $10m, $10m, $10m.]

(a) What is the graph about?
(b) What was the budget for Die Hard, and how much did it make at the US box office?
(c) Do you think the graph is a good indicator of Bruce Willis’s popularity?
(d) Which movie grossed the most money at the US box office?
(e) Bruce earned $A30 million for Die Hard III. Was this salary justified and why?
(f) Which movie had the smallest budget?

5. This graph is derived from UAC application statistics, 1993–94.

Sydney Uni suffers most as student demand falls

WHERE THEY WANT TO STUDY IN 1995 (% change since last year)

Sydney            −23.4
Wollongong        −16.0
Western Sydney    −13.7
Newcastle         −11.5
Macquarie         −10.9
UTS               −2.3
UNSW              2.9

(a) What is the source of this data?
(b) What information is contained in the graph?
(c) Western Sydney has a score of −13.7. What does this mean?
(d) How does Sydney University suffer most?
(e) What can you say about the University of NSW?
(f) Write down two good points about this graph.

6. [Graphs: YOUR HEALTHY WEIGHT RANGE. Separate charts for boys and girls plot body mass index (kg/m²), from 14 to 30, against age, from 9 to 18, with regions labelled Underweight, Healthy, Becoming overweight and Overweight.]

Your body mass index equals your weight divided by your height squared.

(a) Does a 16-year-old girl with a body mass index (BMI) of 30 have a healthy weight? Give reasons.
(b) What are the upper and lower limits of the BMI for a healthy 17-year-old boy?
(c) Can you tell the weight of a 15-year-old girl from this graph? What other information would you need to know besides her BMI?
(d) John, aged 15, calculated his BMI to be 14. What advice would you give him based on the information in the graph?
(e) Mary’s BMI is 25. If she is 16 years old, what advice would you give her?
(f) Write down the upper and lower limits of the BMI for a healthy 17-year-old female.

7. This graph from the Bureau of Meteorology gives rainfall information for Alice Springs.

[Graph: Alice Springs rainfall data. Rainfall (mm), 0 to 50, and number of raindays, 0 to 30, for each month from January to December.]

(a) Which month had the largest number of raindays?
(b) Which month had the highest rainfall?
(c) What was the rainfall for (i) September and (ii) April?
(d) How many raindays were in (i) February and (ii) October?
(e) When would be the best time to travel to Alice Springs? Why?
(f) What is the significance of the line graph?
(g) Comment on the effectiveness of this graph.

8. The graph shows the amount of uranium ore produced at the Ranger mine in Australia.

[Graph: Ranger production (year ending 30 June). Vertical axis: Uranium ore (t), 0 to 5000. Horizontal axis: Year, ’82 to ’98.]

(a) How many tonnes of uranium ore were produced at the Ranger mine in 1998?
(b) In which years were less than 2000 t produced?
(c) In which 5 years did the mine produce 3000 t of uranium ore?
(d) How many kilograms of uranium ore were produced in 1995?
(e) The Ranger mine produced 10% of the world’s uranium ore in 1996, when it produced 3508 t. Why is it not possible to read this amount from the graph?

9. Kara and Shelley spend a portion of their weekly wages on entertainment, as shown in the sector graphs.

[Sector graphs: Weekly spending for Kara and for Shelley, each with the Entertainment sector marked.]

(a) If they both earn the same amount per week, who spends the largest portion on entertainment? Justify your answer.
(b) Measure the sector angles and hence determine the percentage of their wages spent on entertainment each week.
(c) If Kara earns $650 per week and Shelley earns $900 per week, how much does each spend on entertainment per week?
(d) Name another type of graph that could be used to represent this information.

10. The graph below shows how many databases were accessed and how many records were downloaded from the Internet from July 1996 to July 1998.

[Line graph: Internet use. One vertical axis: DB accesses, 0 to 90 000. The other vertical axis: Records downloaded (hits), 0 to 18 000 000. Horizontal axis: every second month from Jul 96 to Jul 98.]

(a) How many databases were accessed in July 1997?
(b) How many records were downloaded in May 1998?
(c) Between which 2 months was the largest decrease in database access?
(d) Which 2-month period showed the greatest increase in the downloading of records?
(e) How many databases were accessed in the first 6 months of 1998?
(f) How many ‘hits’ were there in the last 6 months of 1996?
(g) Comment on the suitability of a line graph to represent the data.
(h) From this graph, do you think it is possible to predict usage in the next 6 months of 1998? Give reasons.

TYPES OF DATA

Let us consider these questions and their responses:

‘Do you own a pet?’ The answer is ‘yes’ or ‘no’. A categorical answer

‘How many pets do you own?’ The answer is a number. A numerical answer

Categorical (or qualitative) data obtained from a categorical variable is information that can be put into different categories. The categories are either distinct or arranged in some order.

Some examples of categorical data are:

Categorical data              Distinct categories
Pets owned                    Cat, Dog, Fish, Bird, Other, None
Political party preference    Liberal, Labor, Democrats, Independent, Other
Type of vehicle driven        Car, Motorbike, Truck, Other, None
Computer ownership            Yes, No

Categorical data              Ordered categories
Hotel classification          ✩✩✩✩✩, ✩✩✩✩, ✩✩✩, ✩✩, ✩
Test grades                   A, B, C, D, E, F
Movie classification          X, R, MA, PG, G
Product satisfaction          Very satisfied, Fairly satisfied, Neutral, Fairly unsatisfied, Very unsatisfied
Restaurant rating             0 to 10

Numerical (or quantitative) data is obtained from a numerical variable and is information that is represented by numbers. The data can be discrete or continuous.

Discrete data is obtained through a counting process. The possible values are clearly separated from one another.

Continuous data is obtained through a measuring process. The possible values are on a continuous scale.

Some examples of numerical data are:

Discrete numerical data                       Continuous numerical data
Number of CDs owned                           Height of a person
Number of children in a family                Length of telephone calls
Number of tries scored in a football match    Interview times for job applicants
Shoe size                                     Noise level at an airport

[Diagram: Data type divides into Numerical (Discrete or Continuous) and Categorical.]

Numerical implies number.

Categorical implies category.

Quantitative implies quantity.

Exercise 4-03: Types of data

1. What is the difference between a categorical and a quantitative variable? Give an example of each type.

2. State whether the following data is numerical (N) or categorical (C).

(a) mobile phone ownership
(b) number of computers in a home
(c) type of driver’s licence held
(d) amount of time spent on homework
(e) number of textbooks needed for Maths
(f) gender
(g) method of payment for a TV set
(h) income
(i) temperature during a 24-hour period
(j) amount of time spent on the Internet per month

3. Classify the data as discrete (D) or continuous (C).

(a) temperature at noon taken over a 1-month period
(b) weekly incomes of politicians
(c) number of wickets taken by bowlers in a 1-day cricket match
(d) placings in the men’s 100 m sprint final at the last Olympic Games
(e) ages of boys in the school swimming team
(f) heights of the world’s 10 tallest buildings
(g) shoe sizes of Year 11 students
(h) weights of your family group
(i) number of pairs of shoes owned
(j) distance travelled by a fleet of company cars in a year

SAMPLE TYPES

In statistics, a population is the whole group of items under consideration, and a census is a survey that includes every member of a population.

When a population is too large or too difficult to survey, a sample of items from the population is taken and used to obtain information from and make predictions about the whole population. For example, this fish tank has a population of 26 fish and a sample of 6 fish is being taken from the tank.

Types of random samples

If a sample is to be truly representative of a population, it needs to be large enough to draw conclusions from and to be selected without bias. The most common types of random sampling are:

In a simple random sample, each member of the population is equally likely to be chosen, so the sample is likely to reflect the attributes of the whole population. For example, names are drawn out of a hat, or the winning balls in Lotto are picked by a tumbler.

In a systematic sample, the first member of the sample is chosen at random, then the other members are chosen at regular intervals. For example, every 20th light globe is taken from a factory conveyor belt and tested for defects.

In a stratified sample, the population can be divided into strata or layers, then a random sample is taken from each stratum. For example, if it is known that 60% of car owners are male and 40% are female, a stratified sample would need to contain males and females in the ratio 60 : 40 or 3 : 2.
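The three sampling methods above can be sketched in Python using the standard `random` module. This is a minimal illustration rather than a prescribed procedure: the population of 100 numbered car owners, the sample size of 10, the fixed seed and the function names are all assumptions made for the example. Only the 60 : 40 split comes from the text.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, step, rng):
    """Random start within the first `step` members, then every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(strata, n, rng):
    """Random sample from each stratum, in proportion to the stratum's size."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample


rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 car owners, 60 male and 40 female.
males = [f"M{i}" for i in range(60)]
females = [f"F{i}" for i in range(40)]
owners = males + females

simple = simple_random_sample(owners, 10, rng)
systematic = systematic_sample(owners, 10, rng)
stratified = stratified_sample({"male": males, "female": females}, 10, rng)

print(simple)
print(systematic)
print({name: len(chosen) for name, chosen in stratified.items()})
```

With a 60 : 40 population the stratified sample of 10 contains 6 males and 4 females. For other splits, rounding each stratum separately can leave the total one above or below the intended sample size, so the shares should be checked.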

Choosing the most appropriate sample type

Example 3

The mayor is visiting the local high school and a sample of 6 students is to be chosen from a class of 35 students.

Solution

A simple random sample is best here, so that every member of the population is equally likely to be chosen. Some ways of choosing such a sample are:

(a) You could put all 35 names in a hat and draw out 6 names.

(b) Each student could be allocated a number, then a table of random digits or a calculator used to select 6 numbers.

Example 4

A manufacturer makes torch batteries and wants to ensure that quality is consistent. The torch batteries move along a production line and are packed in boxes at the final stage. She decides to test 10% of the batteries made in a day.

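Solution

A systematic sample is best here. A starting battery is chosen at random from the first 10 on the production line, then every 10th battery after it is tested, so that 10% of the batteries made that day are sampled.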

Example 5

There are 6500 female students and 4600 male students at a local TAFE college. A survey is to be carried out to see how many students have part-time jobs. It is decided to use a sample of 200 students.

Solution

A stratified sample should be used here to reflect the different numbers of males and females.

Fraction of females at the TAFE college = \(\frac{6500}{11\,100}\)

No. of females in sample = \(\frac{6500}{11\,100} \times 200 \approx 117\)

Fraction of males at the TAFE college = \(\frac{4600}{11\,100}\)

No. of males in sample = \(\frac{4600}{11\,100} \times 200 \approx 83\)

Hence the sample should consist of 117 females and 83 males, randomly chosen from the population of 11 100 TAFE students.
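The proportional split in Example 5 can be reproduced with a few lines of Python. This is a sketch only, and the function name `stratified_allocation` is an assumption made for illustration.

```python
def stratified_allocation(stratum_sizes, sample_size):
    """Share a sample across strata in proportion to each stratum's size."""
    total = sum(stratum_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in stratum_sizes.items()
    }


# Student numbers from Example 5.
tafe = {"female": 6500, "male": 4600}
allocation = stratified_allocation(tafe, 200)
print(allocation)  # {'female': 117, 'male': 83}
```

The students within each group would still need to be chosen at random, for example with random numbers.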

1. The Australian Bureau of Statistics (ABS) conducts a Census of Population and Housing every few years and collects details about every person in Australia on census forms. Find out how often the ABS conducts this census and when the next one will be.

2. Would you use a stratified (St) or a systematic (Sy) sample to get information about:

(a) quality of running shoes?
(b) best Internet service provider?
(c) shelf-life of bread?
(d) effectiveness of the local council?
(e) best meat pie in Australia?
(f) whether Australia should become a republic?
(g) what new books to buy for the school library?
(h) sports venues in your suburb?

Justify your answers.

3. In which of the following cases is random sampling not appropriate? If random sampling is appropriate, would you choose a simple random (R) sample, a systematic sample (Sy) or a stratified sample (St)?

(a) quality control of light globes
(b) testing a new AIDS cure
(c) analysing offshore oil workers’ needs and wants
(d) surveying the students in your school for favourite sport
(e) product quality survey to be carried out in a shopping mall
(f) survey of migrants newly arrived in Australia
(g) children per household in Australia
(h) effects of a new drug on terminally ill patients
(i) animal behaviour
(j) determining the most popular brand of cat food
(k) finding the most popular restaurant in Sydney
(l) testing the life of a car battery
(m) determining whether there are really 50 matches in each box of matches

4. What type of sampling is used to determine the top 10 TV shows?

5. The table shows persons hospitalised after road traffic accidents in 1996.

Drivers    Passengers    Pedestrians    Motorbikers    Cyclists    Total
9758       5742          2792           2456           1129        21 935

If a stratified random sample of 150 people is to be taken, how many of each type of road user should be included? Comment on the method you would use to select the sample and any problems you might have.

SAMPLING TECHNIQUES

Random numbers

Random numbers can be used to simulate a variety of situations. You can generate random numbers from your calculator or you can use a table of random sampling numbers.

To use a table of random numbers, choose any starting point, then move systematically across, up, down or diagonally. We have selected the starting point 4759, but any other starting point and direction would do.

Random sampling numbers

20 17 74 49 94 70 22 15 93 29 42 28 04 49 49 31 78 15 12 18 23 17 03 04 38 67 69 84 27 30 59 66 10 33 23 42 32 52 30 55 38 61 53 70 29 65 32 54 91 87 02 10 11 54 40 88 15 12 50 57 86 10 48 63 78 71 54 02 58 51 51 55 94 60 37 18 01 37 49 36 92 52 94 49 48 64 38 37 12 53 44 25 57 38 06 57 12 93 96 40 45 04 41 91 16 23 04 50 32 70 77 97 99 49 91 02 65 04 17 72 36 14 89 39 19 96 65 65 03 61 99 45 94 60 47 59 82 42 66 26 52 95 48 49 89 65 70 51 24 71 69 85 06 77 27 84 55 04 22 77 03 83 64 72 30 92 61 47 88 33 51 87 59 26 63 37 88 83 17 78 85 56 08 51 26 24 99 34 08 92 22 37 25 57 23 66 82 37 73 49 03 64 02 49 61 00 89 03 01 72 59 07 00 90 95 86 90 49 33 85 42 95 67 86 98 36 28 74 52 40 81 39 93 48 14 03 21 04 60 07 06 41 31 83 48 88 09 96 06 71 20 81 19 07 51 07 60 45 89 27 92 34 67 68 33 40 22 03 14 29 51 90 49 03 06 86 52 80 55 24 39 08 27 47 33 76 01 79 83 79 21 42 52 03 68 57 33 81 31 96

Just for the record

DEFINITION OF A FAMILY

The ABS defines a family as ‘two or more persons, one of whom is at least 15 years of age, who are related by blood, marriage (registered or de facto), adoption or fostering, who are usually resident in the same household’.

Example 6

To get a list of two-digit random numbers, start at 4759 and read across. The numbers will be: 89 65 27 84 30 92 63 37 and so on. (The starting number 4759 itself has not been included.)

These values could be used to select a random sample of people numbered from 00 to 99, or a sample of two-digit house numbers.

Example 7

To get a list of three-digit numbers between 500 and 1000, start at 4759 and read across. Take the first three digits of each four-digit number, omitting any that are not in the required range. The numbers will be: 896 633 650 656 824 705 550 and so on.
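The selection rule in Example 7 can be sketched in Python. The four-digit groups below are the ones implied by the first eight two-digit numbers listed in Example 6 (89 65, 27 84, 30 92, 63 37); the function name `three_digit_in_range` is an assumption made for illustration.

```python
def three_digit_in_range(four_digit_groups, low, high):
    """Keep the first three digits of each group if low <= value < high."""
    selected = []
    for group in four_digit_groups:
        value = int(group[:3])
        if low <= value < high:
            selected.append(value)
    return selected


# Four-digit groups read across from the starting point, stored as
# strings so that any leading zeros are preserved.
groups = ["8965", "2784", "3092", "6337"]

result = three_digit_in_range(groups, 500, 1000)
print(result)  # [896, 633]
```

The groups 2784 and 3092 give 278 and 309, which fall outside the required range and are skipped, matching the first two values listed in Example 7.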

Capture recapture technique

The capture recapture technique is a way of estimating the size of a population. It is used in the fishing industry to estimate fish populations in rivers, lakes and oceans and to monitor the number and size of each species that can be caught for commercial sale.

Example 8

In 1995, the Fisheries Research and Development Corporation in Victoria captured, tagged and released 10 000 southern rock lobsters. In March 1996, a large number of southern rock lobsters were taken from the sea, and 13% were found to be recaptured tagged lobsters. Estimate the total population of southern rock lobsters in Victoria in March 1996.

Solution

From the information we can estimate that the tagged group of 10 000 represented 13% of the southern rock lobster population in Victoria.

13% of population = 10 000

1% of population = \(\frac{10\,000}{13}\)

100% of population = \(\frac{10\,000}{13} \times 100\)

≈ 76 900

So the total population of southern rock lobsters in Victoria in March 1996 was about 76 900.
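The estimate in Example 8 can be checked with a short calculation. This sketch assumes, as the solution does, that the tagged share of the later catch equals the tagged share of the whole population; the function name `estimate_population` is illustrative.

```python
def estimate_population(tagged_released, tagged_fraction_in_catch):
    """Estimate population size from the tagged share found in a later catch."""
    return tagged_released / tagged_fraction_in_catch


# 10 000 lobsters tagged; 13% of the later catch carried tags.
estimate = estimate_population(10_000, 0.13)
print(round(estimate))      # 76923
print(round(estimate, -2))  # 76900.0, the 'about 76 900' quoted in the solution
```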

Bias and non-random sampling

Sometimes random sampling is not possible or not convenient. This could be, for example, because selecting people randomly is too difficult or because not all items in a population are known and hence random selection is not possible.

Bias usually occurs when a non-random sample favours one section of the population and hence is not representative of the whole population. Bias can also occur through response or non-response to a poll or survey.

Example 9

Here is an article that appeared in the Tasmanian Mercury. Look at the statistics in the article and comment on:

(a) suitability of sample size

(b) any issues involving bias

Decriminalise drug use: poll

Some 96 percent of callers to youth radio station Triple J have said marijuana use should be decriminalised in Australia.

The phone-in listener poll, which closed yesterday, showed 9924, out of the 10 000-plus callers, favoured decriminalisation, the station said. Only 389 believed possession of the drug should remain a criminal offence.

Many callers stressed that they did not smoke marijuana but still believed in decriminalising its use, a Triple J spokesman said.

The poll followed a recent decriminalisation ruling in the Australian Capital Territory.

26 September 1992

Solution

(a) The large sample size would seem to be adequate to draw the conclusion that the population favoured decriminalisation.

(b) The sample could have involved bias for the following reasons:

Triple J listeners are usually young and represent only one section of the population.

A phone-in poll usually attracts people with a strong opinion either way.

The poll was restricted to listeners of the particular radio station and to those listening at the time.

There was no restriction on the number of times a person could phone in.

Exercise 4-05: Sampling techniques

1. Give three ways that bias can occur when sampling.

2. A sample of 100 people is to be chosen from the Sydney White Pages telephone directory.

(a) How would you choose the people to ensure that you get an unbiased sample?
(b) Give two ways of choosing people that would result in a biased sample.

3. Comment on this article from the Tasmanian Mercury, by discussing the appropriateness of the sample size and any issues of bias.

THAT’S LIFE

About six in ten United States high school students say they could get a handgun if they wanted one, a third of them within an hour, a survey shows. The poll of 2508 junior and senior high school students in Chicago also found 15 per cent had actually carried a handgun within the past 30 days, with 4 per cent taking one to school. As well, the survey, sponsored by the Chicago-based Joyce foundation, found 35 per cent said they felt their lives could be cut short by handgun violence.

4. The shire council wants to investigate what library users think of the library service.

Choose the method of sampling that you think would be best in this case and say why you rejected the others.

(a) Survey the first 10 people through the library door each morning for a week. (b) Call a meeting of interested persons.

(c) Send a survey form to all homes in the shire.

(d) Leave a bundle of survey forms at the library for people to fill in.

(e) Survey the first person who comes into the library each half-hour every day for a month.

(f) Send a survey form to everyone who has borrowed a book from the library in the last 12 months.

(g) Ask only adult borrowers.

(h) Get a list of ratepayers from the shire council and choose every 10th person.

5. Use the table of random sampling numbers on page 125 to choose a sample of 20 people aged between 50 and 100, by starting at 3196 (the last number in the table) and moving left.

6. The capture-recapture technique was used to estimate the population of blackfish in the upper Cotter River of the ACT. The number of blackfish caught, tagged and released was 1500. When 250 blackfish were later taken from the river, 20 were found to be recaptured tagged ones. What was the estimated population of blackfish in the region?

7. A large number of trout were caught in Lake Eucumbene, and 12% were found to be tagged. If 3500 trout had previously been captured, tagged and released into Lake Eucumbene, estimate the number of trout in the lake.

8. Simulate a random sample of 30 children aged from 3 to 5 years by commencing at the top right of the random digit table on page 125, at 4425, and moving down and up. What percentage of the sample were aged 4 years?
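The capture-recapture questions above rest on one proportion: the fraction of tagged animals in the later sample is assumed to match the fraction of tagged animals in the whole population. The following is a minimal Python sketch of that calculation, not part of the original text; the function name and the figures are hypothetical and chosen only for illustration.

```python
def estimate_population(tagged, sample_size, tagged_in_sample):
    """Capture-recapture estimate, assuming tagged / N = tagged_in_sample / sample_size."""
    if tagged_in_sample == 0:
        raise ValueError("no tagged animals recaptured, so no estimate is possible")
    return tagged * sample_size / tagged_in_sample

# Hypothetical figures, for illustration only: 200 animals tagged and released,
# then 50 caught later, of which 10 carry tags.
print(estimate_population(200, 50, 10))   # 1000.0
```

When a question gives the tagged share as a percentage, the sample can be treated as 100 animals with that many tagged.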

DESIGNING A QUESTIONNAIRE

Information is usually gathered for a survey from a telephone or personal interview or by supplying a written questionnaire. A questionnaire is a series of carefully worded questions dealing with a variety of beliefs, attitudes, behaviours and other characteristics. The questionnaire must enable the investigator to gather meaningful information. For example:

A market researcher may conduct a survey to collect information about TV viewing habits.

A manufacturer may survey the potential market before introducing a new product.

A government agency may gather data in order to evaluate existing laws before looking at introducing new ones.

Voters are surveyed before an election so that public perceptions of the candidates and election issues are known.

Steps in designing a questionnaire

1. Choose the topic for your survey.

2. Choose the format of the answer/response you want.

3. Formulate the questions.


Sample questions from the ABS 1996 Census of Population and Housing

What is your marital status? Never married / Widowed / Divorced / Separated but not divorced / Married

Are you an Australian citizen? Yes / No

How well do you speak English? Very well / Well / Not well / Not at all

What is the highest qualification you have completed since leaving school? Full name of qualification: ____

Age last birthday: ____ years

Are you male or female? Male / Female

How much does your household pay for this dwelling? $____ per week / $____ per fortnight / $____ per month

How many bedrooms in this dwelling? ____

Are you attending a school or other educational institution? Yes, full-time student / Yes, part-time student / No

Format of answers

Answers to survey questions can take several forms. Here are some examples:

Select from given word options.

Rating on a scale.

Put a tick (✔) or cross (✖) in a box.

Write a numeral.

Write a sentence.

| Answer format | Question | Answer |
| --- | --- | --- |
| Select from given word options | What type of vehicle do you own? | Car, Motorbike, Truck, Other, None |
| Rating on a scale | How did you enjoy your meal? | Scale from 0 to 10 |
| Put a tick or cross in a box | Sex | Male, Female |
| Write a numeral | How many children in your family? | |


Designing good questions

Suppose that we want to gather information on current cigarette usage. First we decide what information we want; then we design the questions in order to get this information.

Consider these two alternative questions:

Do you smoke? Yes / No

This is not a good question as it is ambiguous. If a person answered YES, it would not tell us how many, what type or how often the person smokes. If we used this question we would need to formulate more questions to follow a YES answer.

How many cigarettes do you smoke per day?

Since we are interested in current cigarette usage, this question is better as the answer would indicate whether the person smokes cigarettes as well as whether the person is a light or heavy smoker. (We would need to decide on a figure that indicates light/heavy.)

Features of a good questionnaire

A good questionnaire will enable you to obtain the data you require for analysis. It should have:

clear instructions;

questions written in plain English with the highly relevant ones first;

questions that are unambiguous and free from bias;

answers that are easy to interpret (questions requiring a sentence to be written should be optional, as this data is not easy to analyse);

a good length (if it is too long, people will be less likely to respond; and if it is too short, it may not seem worthwhile);

a covering letter giving the reasons for the survey, the importance of a response and the name of a contact person;

an assurance that the respondent’s privacy will be respected.

Reasons for non-response

Here are some typical reasons for a person not responding to a questionnaire:

It was sent to a wrong address.

It was sent to a deceased person.

It has unclear instructions.

The person lacks time.

The person has more important things to do.

It looks like a sales gimmick.

The person didn’t think it a worthwhile project.

Just for the record

MOST LIKELY RESPONDENTS

Just for the record

PIONEERS OF STATISTICS

John Graunt (1626–97) was one of the first people to attempt to use statistics to study human populations. He wrote ‘Observations on the Bills of Mortality’, and one of his observations was on the inequality of burials in the various parishes in and around London, posing what is maybe one of the earliest questions on statistics.

Carl Friedrich Gauss (1777–1855) was a German mathematician who is considered to be the inventor of non-Euclidean geometry and was also deemed to be the originator of statistical curves.

Florence Nightingale (1820–1910) was an English nurse who is probably best remembered for improving sanitary conditions in the Crimean War and for founding a training school for nurses, but she also had a role as a statistician. In the papers she wrote, she referred to the social and moral sciences as ‘statistical sciences’ and to statistics as almost a religious experience.

Sir Francis Galton (1822–1911) was the first person reported to employ questionnaire and survey methods, which he used to investigate mental imagery in different groups of people, but he is best known for his work on normal distributions.

Karl Pearson (1857–1936) was an English statistician who applied statistics to hereditary and evolutionary problems in biology.

William Sealy Gosset (1876–1937) was the statistical adviser to the Guinness Brewery. He invented the t-test to handle small samples of the brew for quality control.

Sir Ronald Aylmer Fisher (1890–1962) studied the design of experiments, developed methods for small samples and invented the analysis of variance.


Investigation: Pioneers of statistics

As well as learning about various practical aspects of statistics, it is also worthwhile acquiring some background knowledge by looking at some of the people who did pioneering work with statistics. You can use the library or the Internet to find out more about these and other pioneers of statistics.

1. Choose one or two of the people listed in Just for the record on the previous page, and find out more about them and their work. Write a half-page research report and present it to your class group.

2. Write half a page on the Domesday Book and how it relates to your study of statistics.

3. ‘There are lies, damned lies and statistics.’ Comment on this statement by Benjamin Disraeli.

4. Find out about the early Australian ‘musters’ and how they related to statistics.

5. Write half a page about Australia’s first census.

1. Design a questionnaire using 5–8 questions to find out how your friends/family spend their weekend time/pocket money/wages.

2. Design a questionnaire that a market research company could use to conduct a survey in a large shopping centre. Suggested topics could be the amount of time that working/non-working men/women spend shopping for food/clothes, etc. or perhaps for cleaning products used in the home or other product of your choice.

3. Design a questionnaire with 5–8 questions to survey one or more of the following:

favourite teacher

favourite subject

favourite movie

favourite sport to watch

favourite sport to play

mobile phone ownership

computer ownership

hobbies

Internet access

TV shows

4. Administer one or more of your questionnaires from question 3 to your class (or other group) and discuss outcomes.

(a) Did you obtain useful data?

(b) Were there too many/too few questions?

(c) Were some questions misleading or not necessary?

(d) How could the questionnaire be improved?

(e) Was your survey successful? Why or why not?

CONSTRUCTING GRAPHS

Data represented graphically is visually appealing, and we are more likely to notice patterns if data is presented in a statistical graph.

A statistical display should be simple and interesting and make an impact, but should not mislead the reader.


Example 10

Students were surveyed on the number of compact discs (CDs) they had purchased in the last 6 months. The results from 50 students were:

3 4 2 4 1 6 3 0 5 3
5 2 6 3 4 4 4 3 0 5
2 5 1 2 4 1 0 4 4 2
3 1 2 6 0 4 2 5 6 3
0 1 3 2 6 1 3 1 0 3

Represent this data graphically in:

(a) a column graph

(b) a line graph

Solution

First organise the data in a frequency distribution table.

| No. of CDs | No. of students |
| --- | --- |
| 0 | 6 |
| 1 | 7 |
| 2 | 8 |
| 3 | 10 |
| 4 | 9 |
| 5 | 5 |
| 6 | 5 |
| Total | 50 |

(a) Column graph

[Column graph: No. of students (vertical axis, 0 to 10) against No. of CDs (horizontal axis, 0 to 6)]

The columns can be any width but must be uniform. You can draw the columns as straight lines or even have horizontal columns (sometimes referred to as a bar chart).

(b) Line graph

[Line graph: No. of students (vertical axis) against No. of CDs (horizontal axis)]
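The tallying step in Example 10 can also be checked by machine. The following is a minimal Python sketch, not part of the original text, that counts the 50 survey results with `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# The 50 survey results from Example 10 (number of CDs purchased).
scores = [
    3, 4, 2, 4, 1, 6, 3, 0, 5, 3,
    5, 2, 6, 3, 4, 4, 4, 3, 0, 5,
    2, 5, 1, 2, 4, 1, 0, 4, 4, 2,
    3, 1, 2, 6, 0, 4, 2, 5, 6, 3,
    0, 1, 3, 2, 6, 1, 3, 1, 0, 3,
]

# Counter tallies how many times each score occurs.
frequency = Counter(scores)

print("No. of CDs  No. of students")
for score in sorted(frequency):
    print(f"{score:>10}  {frequency[score]:>15}")
print(f"{'Total':>10}  {sum(frequency.values()):>15}")
```

The printed frequencies should agree with the frequency distribution table, and the total should equal the number of students surveyed.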


Example 11

47 100 students enrolled in Year 10 were monitored for Year 11 and 12. The results are shown in the table. Represent the data in:

(a) a divided bar graph

(b) a sector graph

| Level reached | No. of students |
| --- | --- |
| Completed Year 12 | 33 900 |
| Left after Year 11 | 7 000 |
| Left after Year 10 | 6 200 |
| Total | 47 100 |

Solution

In a divided bar graph or sector graph, the whole rectangle or circle represents 100% of the data, so that the different categories are represented by a part of the whole rectangle or circle.

(a) For the divided bar graph, first choose an appropriate length, say 12 cm, then calculate the length of each segment.

| Level reached | No. of students | Fraction | Length of segment |
| --- | --- | --- | --- |
| Completed Year 12 | 33 900 | 33 900/47 100 | 33 900/47 100 × 12 ≈ 8.64 cm |
| Left after Year 11 | 7 000 | 7000/47 100 | 7000/47 100 × 12 ≈ 1.78 cm |
| Left after Year 10 | 6 200 | 6200/47 100 | 6200/47 100 × 12 ≈ 1.58 cm |
| Total | 47 100 | 1 | 12 cm |

[Divided bar graph, 12 cm long, with segments: Completed Year 12, Left after Year 11, Left after Year 10]

(b) To draw the sector graph, first you need to determine the angle of each sector.

| Level reached | No. of students | Fraction | Size of sector angle |
| --- | --- | --- | --- |
| Completed Year 12 | 33 900 | 33 900/47 100 | 33 900/47 100 × 360° ≈ 259° |
| Left after Year 11 | 7 000 | 7000/47 100 | 7000/47 100 × 360° ≈ 54° |
| Left after Year 10 | 6 200 | 6200/47 100 | 6200/47 100 × 360° ≈ 47° |
| Total | 47 100 | 1 | 360° |

[Sector graph with sectors: Completed Year 12, Left after Year 11, Left after Year 10]
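The segment lengths and sector angles in Example 11 all come from the same step: find each category's fraction of the total, then multiply by the bar length or by 360°. The following is a minimal Python sketch of that arithmetic, not part of the original text; the function names are illustrative only.

```python
# Student numbers from the table in Example 11.
counts = {
    "Completed Year 12": 33_900,
    "Left after Year 11": 7_000,
    "Left after Year 10": 6_200,
}
total = sum(counts.values())   # 47 100 students
BAR_LENGTH_CM = 12             # bar length chosen in the example

def segment_length(count):
    """Length in cm of a category's segment of the divided bar."""
    return count / total * BAR_LENGTH_CM

def sector_angle(count):
    """Angle in degrees of a category's sector of the circle."""
    return count / total * 360

for level, count in counts.items():
    print(f"{level}: {segment_length(count):.2f} cm, {sector_angle(count):.0f} degrees")
```

A useful check is that the segment lengths add to the chosen bar length and the sector angles add to 360°.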


Technology: Using a spreadsheet to construct a pie chart

This is how to use a Chart Wizard in one spreadsheet program, Excel.

1. Type the data and labels into a spreadsheet in cells A1 to C2.

|   | A | B | C |
| --- | --- | --- | --- |
| 1 | Yr 12 | Yr 11 | Yr 10 |
| 2 | 33 900 | 7000 | 6200 |

2. Select (highlight) the cells A1:C2.

3. Open the Chart Wizard and follow these steps:

Step 1 Chart type: choose ‘pie’.

Step 2 Chart source data: not needed here.

Step 3 Chart options: show data labels, add a chart title, remove legend if required.

Step 4 Chart location: displays the chart ‘as an object in’ the spreadsheet.

[Pie chart: ‘School completion times’, with sectors labelled Yr 12, Yr 11 and Yr 10]

Some other options that you can try are:

Select the chart, go to Chart Wizard and change the chart type. Try a ‘pie explosion’ or a ‘doughnut’.

Click and drag the labels into the actual sectors of the graph.

Format the chart by clicking on any part of it and using Format on the toolbar.

Cut and paste the chart into a Word document.

Hint: Practice and patience are essential when learning to create statistical graphs.

What type of statistical display should you choose? In general, it is best to choose:

a column graph or line graph for numerical data

a divided bar graph or sector graph for categorical data.

A good graph is:

interesting and eye catching

appropriate for the type of data

easy to interpret

simple but not misleading

clearly labelled with a scale and name on each axis.

From your statistics file of graphs collected from the media (see page 116), decide which are good graphs and which are bad examples. Can you turn the bad examples into good ones?


1. The data below shows confinements (time spent by a woman in childbirth) and births for 1996 and 1997.

Births and confinements by plurality, NSW, 1996–97

| | 1996 No. | 1996 % | 1997 No. | 1997 % |
| --- | --- | --- | --- | --- |
| Number of confinements | | | | |
| Singleton | 84 201 | 98.7 | 85 740 | 98.6 |
| Twins | 1 076 | 1.3 | 1 147 | 1.3 |
| Triplets | 24 | 0.0 | 32 | 0.0 |
| Quadruplets | 1 | 0.0 | 1 | 0.0 |
| Total | 85 302 | 100.0 | 86 920 | 100.0 |
| Number of births | | | | |
| Singleton | 84 201 | 97.4 | 85 740 | 97.3 |
| Twins | 2 152 | 2.5 | 2 293 | 2.6 |
| Triplets | 72 | 0.1 | 96 | 0.1 |
| Quadruplets | 4 | 0.0 | 4 | 0.0 |
| Total | 86 429 | 100.0 | 88 133 | 100.0 |

Source: NSW Midwives Data Collection, Epidemiology and Surveillance Branch, NSW Health Department.

(a) How many confinements were recorded in 1996?

(b) How many births were recorded in 1997?

(c) What percentage of confinements resulted in two births in 1996?

(d) Draw a divided bar graph representing births for 1996.

(e) Represent the confinements for 1997 in a sector graph.

(f) Comment on any difficulties you have in drawing these graphs.

2. Takaki’s Japanese naval records of deaths from beriberi

| Year | Diet | Total navy personnel | Deaths from beriberi |
| --- | --- | --- | --- |
| 1880 | Rice diet | | |
| 1881 | Rice diet | | |
| 1882 | Rice diet | | |
| 1883 | Rice diet | | |
| 1884 | | | |
| 1885 | | | |
| 1886 | | | |
| 1887 | | | |
| 1888 | | | |

Source: K. Takaki, in Kornberg, 1989.

(a) What is beriberi?

(b) Draw a column graph to represent the deaths from beriberi from 1880 to 1888.

(c) Why did the number of deaths start to decline in 1884?

(d) What percentage of navy personnel died from beriberi in 1882?

(e) What fraction of navy personnel died from beriberi in the year the diet was changed?


3. This table shows the number of deaths of mothers just after childbirth at two clinics (hospitals) in the mid 19th century.

Deaths of mothers after childbirth

| Year | First clinic: Births | First clinic: Deaths | Second clinic: Births | Second clinic: Deaths |
| --- | --- | --- | --- | --- |
| 1841 | 3 036 | 237 | 2 442 | 86 |
| 1842 | 3 287 | 518 | 2 659 | 202 |
| 1843 | 3 060 | 274 | 2 739 | 164 |
| 1844 | 3 157 | 260 | 2 956 | 68 |
| 1845 | 3 492 | 241 | 3 241 | 66 |
| 1846 | 4 010 | 459 | 3 754 | 105 |
| Total | 20 042 | 1 989 | 17 791 | 691 |

Source: Ignaz Semmelweis, The Etiology, Concept and Prophylaxis of Childbed Fever.

(a) On the same set of axes, draw line graphs showing the deaths at each clinic from 1841 to 1846.

(b) Which clinic had fewer deaths? Give two possible reasons for this.

(c) In what year did (i) the greatest number of deaths occur and (ii) the least number of deaths occur?

(d) Represent the births in each clinic in a clustered column graph (that is, with two columns per year).

4. The table gives the smoker status for Australia’s 13 million adults, recorded in the 1995 National Health Survey. The survey showed that 23.8% of adults were smokers, that more men than women smoked and that the prevalence of smoking declined with age.

Smoker status by age and sex, 1995

| Status | 18–44 years (%) | 45–65 years (%) | >65 years (%) | Males (%) | Females (%) | Persons (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Smoker | 28.9 | 20.7 | 11.3 | 23.7 | 20.3 | 23.8 |
| Ex-smoker | 21.8 | 32.2 | 38.3 | 32.4 | 22.5 | 27.4 |
| Never smoked | 49.3 | 47.1 | 50.4 | 40.4 | 57.1 | 48.9 |

Source: 1995 National Health Survey.

(a) Represent the smoker status of the over-65s in a sector graph.

(b) Display the data for females in a segmented bar graph.

(c) Who would you target for a Quit Smoking program and why?


5. Alcohol consumption by age and sex, 1995

| Status | 18–44 yrs | 45–65 yrs | >65 yrs | Males | Females |
| --- | --- | --- | --- | --- | --- |
| Did not consume alcohol (%) | 42.0 | 44.0 | 54.5 | 34.2 | 54.6 |
| Consumed alcohol (%) | 58.0 | 56.0 | 45.4 | 65.8 | 45.4 |
| Average daily intake for consumers of alcohol (mL) | 56.7 | 37.8 | 25.4 | 57.6 | 32.8 |

Source: 1995 National Health Survey.

(a) Use a column graph to represent, by age, the average daily intake of alcohol by consumers of alcohol.

(b) Use a divided bar graph to represent the alcohol-consuming status of females.

(c) The information in the table was obtained from the 13 million adults in Australia who took part in the National Health Survey in 1995.

(i) What percentage of males said that they consumed alcohol?

(ii) What percentage of 18 to 44-year-olds said that they did not drink alcohol?

6. The table below shows the exports from Australia to Brazil and imports to Australia from Brazil from 1993 to 1997.

Trade between Australia and Brazil

| | 1993 | 1994 | 1995 | 1996 | 1997 |
| --- | --- | --- | --- | --- | --- |
| Exports from Australia ($Am) | 375.2 | 272.4 | 328.0 | 395.9 | 376.8 |
| Imports to Australia ($Am) | 351.0 | 389.5 | 481.3 | 384.0 | 373.3 |

(a) In what year(s) did exports exceed imports?

(b) What was the value of exports from Australia to Brazil over the 5-year period?

(c) How much was paid for imports from Brazil over the 5-year period?

(d) Draw a clustered column graph to represent the imports and exports for the years 1993–97.

7. In a 2-week period, the Australian Customs noted that new immigrants to Sydney were from:

Middle East: 20
China: 35
India: 7
Afghanistan: 10
Turkey: 3
Iraq: 12

(a) What percentage of immigrants were from Iraq?

(b) What fraction of immigrants were not from China or India?

(c) Draw a divided bar graph to represent this information.


8. In one week, a survey was carried out in Sydney to determine the most popular evening news broadcast on the 4 main TV stations. Participants were asked to nominate their favourite news bulletin. The results were:

Channel 9: 486 000
Channel 2 (ABC): 341 000
Channel 7: 323 000
Channel 10: 308 000

(a) How many people were surveyed?

(b) Of the persons surveyed, what percentage watched the ABC news?

(c) Represent the data in a segmented bar graph.

(d) Was this data gathered from a sample or a census? Justify your answer.

(e) Does this data reflect the TV news-watching habits of all Australians? Justify your answer.

9. The data on drug seizures for 1995–96 and 1996–97 was obtained from Customs records.

(a) Draw a column graph to represent the amount (in kilograms) of each drug seized by Customs in 1995–96.

(b) Why would a column graph to represent the mass of drugs seized in 1996–97 be inappropriate?

(c) What was the increase in ecstasy seizures from 1995–96 to 1996–97?

(d) Which drug had the greatest increase in seizure? What was the percentage increase?

10. A survey of Sydney industrial property revealed that property rental growth in 1998 was as follows:

Warehouse and office: 63%
Manufacturing: 14%
Transport and storage: 9%
Hi-tech: 14%

(a) Display this data in a sector graph.

(b) Use a divided bar graph to represent the information.

(c) Which of these graphs do you think best represents this data and why?

DRUG SEIZURES ON THE RISE

1995–96 1996–97

Cannabis (kg)

Cocaine (kg)

Heroin (kg)

Ecstasy (kg)

53.3

58.0

24 295.6

67.5

64.3 169.0


MISLEADING GRAPHS

Statistical graphs are often used to display information in a way that may mislead the reader. Advertisers use graphs to entice us to buy products, and company directors often use graphs to display statistical information to their advantage when dealing with shareholders or prospective clients. Here are some ways of doing this:

Example 12

Statement: The wheat crop has doubled in two years.

(a) How is the graph misleading to the viewer?

(b) Redraw the graph so that it is not misleading.

Solution

(a) The graph is misleading as the second sheaf (1998) is 4 times larger than the first sheaf (1996) rather than twice as large as the statement indicates. The second sheaf has been drawn with twice the height and twice the width of the first.

(b) So that the graph is not misleading, the 1998 sheaf should be drawn with only one of its dimensions (either width or height) doubled, as shown.
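The reasoning in this solution is that the eye compares the areas of the pictures, and area depends on both width and height. The following is a minimal Python sketch of that arithmetic, not part of the original text; the function name is illustrative only.

```python
def area_ratio(width_factor, height_factor):
    """Factor by which a picture's area grows when its width and height are scaled."""
    return width_factor * height_factor

# Misleading sheaf: both dimensions doubled, so the picture looks 4 times as large.
print(area_ratio(2, 2))   # 4

# Corrected sheaf: only the height (or only the width) doubled.
print(area_ratio(1, 2))   # 2
```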

Example 13

This graph represents the actual sales of computers by Rick Jones Electronics over a 5-week period.

(a) Investigate ways to mislead Rick’s clients by redrawing the graph:

(i) using the same scale on the vertical axis but a smaller scale on the horizontal axis

(ii) using the same scale on the horizontal axis but a smaller scale on the vertical axis

(iii) using a scale starting at 10 on the vertical axis

(b) How does each graph mislead the viewer?

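[Figure for Example 12: ‘Wheat crops’ pictograph showing a sheaf for 1996 and a larger sheaf for 1998; the solution shows two alternative redrawn 1998 sheaves]

[Figure for Example 13: line graph of No. computers sold (vertical axis, 0 to 40) over weeks 1 to 5 (horizontal axis)]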


Solution

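(a) [Figure: three redrawn graphs, labelled (i) Compressed horizontal axis, (ii) Compressed vertical axis and (iii) Scale not showing the zero]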

(b) Altering the scales on the axes will make the line seem steeper or flatter. Graphs (i) and (iii) give the viewer the impression that the sales rose more rapidly over the 5-week period, whereas graph (ii) gives the impression that the sales rose more slowly.
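The effect of starting the vertical scale above zero, as in graph (iii), can be put into numbers by comparing the heights as drawn with the true values. The following is a minimal Python sketch, not part of the original text; the function name and the sales figures are hypothetical, since the actual values are not listed here.

```python
def apparent_ratio(first, second, axis_start=0):
    """Ratio of the drawn heights of two values when the vertical axis starts at axis_start."""
    return (second - axis_start) / (first - axis_start)

# Hypothetical sales figures, for illustration only.
week_1, week_5 = 20, 30

print(apparent_ratio(week_1, week_5))                 # 1.5: true ratio, axis from zero
print(apparent_ratio(week_1, week_5, axis_start=10))  # 2.0: sales appear to have doubled
```

With these figures, sales rose by half, but on an axis starting at 10 the last point is drawn twice as high as the first.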

Exercise 4-08: Misleading graphs

1. This sales graph was presented to a group of shareholders by the outgoing company director.

[Graph: Sales (vertical axis) against Months (horizontal axis), with the point ‘New company director appointed’ marked]

(a) What does the graph describe?

(b) What message is the outgoing director trying to portray to the shareholders?

(c) What information is missing from the graph?

(d) Can you redraw this graph so that it is not misleading?

2. This graph depicts the average weekly wage now compared to 5 years ago.

[Graph: average weekly wage, with values $400 and $800, labelled ‘5 years ago’ and ‘Now’]

(a) How is the graph misleading to the reader?

(b) Redraw the graph so that it is not misleading to the reader.

3. The graph shows the price of a barrel of oil just before and just after the Gulf War.

[Graph: price of a barrel of oil, with values $25 and $50, labelled ‘Before the war’ and ‘Just after the war’]

(a) What happened to the price of oil just after the Gulf War?

(b) Why is the graph misleading?

(c) Redraw the graph so it is not a misrepresentation of the data.

4. [Graph: ‘BIG CAR SALES: THE BIG AND SMALL OF IT’, vertical scale from 25% to 40%, years 1995 to 1999. Source: Sunday Mail.]

(a) What message is the author trying to portray to the reader by this graph?

(b) Redraw this graph with the vertical scale starting at zero.

(c) Compare your graph with the original graph and comment on the differences.


5. Statement: The 10-year bond rate leapt from 4.8% in mid December 1998 to 6.4% in mid June 1999.

[Graph: ‘10-year bonds’, YIELD % (vertical axis, 4.50 to 6.50) from December 11 to June 11. Source: Bloomberg, Sydney Morning Herald, 12 June 1999.]

(a) Is the graph a good indicator of the statement given?

(b) Redraw this graph with the vertical axis starting at zero.

(c) Compare the two graphs and comment.

1. Find five misleading graphs from the media or your statistics file and redraw them so that they do not misrepresent the data.

2. Keep adding both good and misleading statistical displays to your statistics file.

Study tips

ORGANISING YOUR NOTES

Your exercise book or folder is for writing up class notes, worked examples and completing exercises in class and at home. One step towards better study is to have a neat, organised collection of notes. This allows you to quickly and easily find your work in any topic. The key to having organised notes is to label everything: the names of the topics, the theory, the exercises as well as your workbook or folder.

When studying maths your class follows a program of topics. These topics should correspond roughly to the chapters of this textbook. You can ask your teacher for a copy of the program and organise your folder according to these topics. Use coloured dividers or title pages to mark the start of each new topic and write the name of the topic in big bold letters.

Start each day’s work with the title of the lesson and the date. Pay attention to your teacher’s explanations and copy the examples into your book—make sure you write down the question as well as each step in the working. You can add any personal notes, comments or reminders as well to help you later when revising the work. If you don’t understand something you should ask your teacher to explain again.

If you use a looseleaf folder, make sure your notes are filed neatly in the correct order. Some students number the pages of each topic so that they can refile if the pages get out of order. When the current folder becomes too bulky to manage you should buy a new one or at least remove the completed topics and file at home. Remember that you do not need to bring your notes for the whole Maths course to school each day.


FREQUENCY HISTOGRAMS AND POLYGONS

A histogram is used to represent quantitative data and is a column graph with no spaces between the columns. The height of each column represents the frequency of the scores.

A frequency polygon is a line graph that plots score against frequency and can also be drawn by joining the midpoints of the tops of histogram columns.

Example 14

A pair of dice was rolled 50 times and the sum recorded. Construct a frequency distribution table, then draw a frequency histogram and polygon for the data.

7 7 10 12 8 7 9 8 6 10
9 9 8 5 7 3 6 4 5 7
4 4 8 7 10 8 9 7 11 7
5 9 4 6 3 9 2 7 8 8
9 7 5 8 10 6 8 7 9 2

Solution

Frequency distribution table

| Score | Frequency |
| --- | --- |
| 2 | 2 |
| 3 | 2 |
| 4 | 4 |
| 5 | 4 |
| 6 | 4 |
| 7 | 11 |
| 8 | 9 |
| 9 | 8 |
| 10 | 4 |
| 11 | 1 |
| 12 | 1 |
| Total | 50 |

Frequency histogram and polygon

[Figure: ‘Rolling dice’, a frequency histogram with the frequency polygon drawn over it; Frequency (vertical axis, 0 to 11) against Sum of dice (horizontal axis, 2 to 12)]

You can use the Chart Wizard to construct histograms and polygons. For Example 14 above, enter the scores and frequencies in cells A1:B11 in a spreadsheet.

To draw a frequency polygon, use the line graph.

To draw a frequency histogram, change your line graph to a column graph, then change the gap width to zero (you will need to format the data series).

Your graphs should look like this:

[Figure: two spreadsheet charts of the dice data, a frequency polygon and a frequency histogram, each with scores 2 to 12 on the horizontal axis and frequencies 0 to 12 on the vertical axis]

Label each axis, and add a title also.
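A rough text version of the frequency histogram for Example 14 can also be produced without a spreadsheet. The following is a minimal Python sketch, not part of the original text, that draws each column sideways as a run of `#` characters; the function name and layout are illustrative only.

```python
# Frequency distribution table from Example 14 (sum of two dice, 50 rolls).
frequency = {2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 11, 8: 9, 9: 8, 10: 4, 11: 1, 12: 1}

def text_histogram(table):
    """Return one line per score, using a run of '#' as the column."""
    return [f"{score:>2} | {'#' * count} {count}" for score, count in sorted(table.items())]

print("Rolling dice: sum of dice | frequency")
for line in text_histogram(frequency):
    print(line)
```

Each row has no gap between it and the next, which mirrors the rule that histogram columns touch.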

Example 15

Consider the table of road trauma deaths in Australia from 1990 to 1993. The data has been grouped into classes, so we do not know exact ages.

Deaths from road trauma, Australia, 1990–93

| Age group | Male deaths | Female deaths |
| --- | --- | --- |
| 0–9 | 257 | 157 |
| 10–19 | 1053 | 387 |
| 20–29 | 1879 | 540 |
| 30–39 | 925 | 336 |
| 40–49 | 589 | 225 |
| 50–59 | 411 | 196 |
| 60–69 | 389 | 286 |
| 70–79 | 365 | 337 |
| 80+ | 213 | 139 |
| Total | 6081 | 2603 |

(a) What are the exact limits of each class?

(b) For the male deaths, construct a frequency histogram and polygon on the same diagram.

(c) Give two observations from the graphs.

(d) Why is it appropriate to have one of the classes a different size from the others?

Solution

(a) Class 0–9 means 0 to 9.99… or ‘0 to less than 10’; the class limits are 0 ≤ x < 10. Class 10–19 means 10 to 19.99… or ‘10 to less than 20’; the class limits are 10 ≤ x < 20. And so on.

(b) [Frequency histogram and polygon: Male deaths (vertical axis, 0 to 2000) against Age (years) (horizontal axis, 10 to 80)]


(c) Younger people between 10 and 40 were more likely to be killed as a result of a road accident, and there were fewer deaths as age increased.

(d) The class 80+ covers all persons aged 80 and above. It would not be appropriate to continue with class widths of 10 as the numbers of people in the last few classes would be very small.
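The class limits in Example 15 can be expressed as a rule for placing any age into its class, including the open-ended last class. The following is a minimal Python sketch, not part of the original text; the function name, its parameters and the sample ages are hypothetical and used only for illustration.

```python
from collections import Counter

def age_class(age, width=10, open_from=80):
    """Label of the class containing age; ages at or above open_from share one class."""
    if age >= open_from:
        return f"{open_from}+"
    lower = int(age // width) * width   # lower class limit, e.g. 37.5 -> 30
    return f"{lower}-{lower + width - 1}"

# Hypothetical ages, for illustration only.
ages = [4, 9.99, 10, 17, 23, 25, 38, 79, 80, 93]

grouped = Counter(age_class(age) for age in ages)
for label, count in grouped.items():
    print(label, count)
```

An age of 9.99 falls in class 0-9 and an age of exactly 10 falls in class 10-19, matching the limits 0 ≤ x < 10 and 10 ≤ x < 20.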

Exercise 4-09: Frequency histograms and polygons

1. Use the table of female road-trauma deaths in Example 15 on the previous page.

(a) Construct a frequency histogram and polygon.

(b) Comment on the shape of the data.

(c) Compare the graphs for male deaths and female deaths and write down two differences between them.

2. Here are the ages of employees at Burger Heaven:

18 19 18 17 20 20 24 15 24 19
15 35 15 24 22 19 15 17 23 29
15 40 21 17 20 22 23 21 24 23
22 16 36 15 16 24 15 15 19 15
34 19 45 20 15 21 24 27 19 33
48 37 15 30 15 34 15 29 25 15
34 24 16 18 30 21 26 31 16 25
18 18 15 22 25 22 15 40 34 43
49 21 21 35 16 22 15 25 44 23
17 24 32 18 20 32 28 22 16 45

(a) Organise the data in a grouped frequency table using classes 15–19, 20–24, etc.

(b) Draw a frequency histogram.

(c) Comment on the shape of the data.

3. A group of students were surveyed on the number of mobile phone calls made in a 4-week period. The results are shown in the histogram.

[Histogram: ‘Mobile phone calls’; Frequency (vertical axis, 0 to 8) against No. of phone calls (horizontal axis labelled: fewer than 20, 21, 23, 25, 27, 29, more)]

(a) How many students were surveyed?

(b) How many students made more than 30 calls?

(c) What percentage of students made less than 20 calls?

(d) Can you tell how many students made exactly 40 calls? Why?

4. Julie’s Weight Loss Centre for Men advertised an introductory offer: ‘$10 for each kilogram (or part thereof) you lose’. The offer was taken up by 40 men, and the weight (in kilograms) lost by each is recorded below:

1.8 2.5 4.3 6.5 2.7 4.6 11.0 10.8 0.3 8.2
2.1 3.8 4.4 5.8 1.6 5.9 7.6 9.3 4.8 3.4
12.5 4.6 2.5 6.9 7.5 3.5 4.8 12.2 4.3 3.7
0.0 0.9 2.6 7.8 4.9 7.4 9.8 10.4 2.6 8.2

(a) Draw up a frequency distribution table using class intervals 0–2, 2–4, etc.

(b) Use your table to draw a frequency histogram.

(c) Use your histogram to comment on the effectiveness of the program.

(d) How much money did Julie make from the 40 men who took up this offer?

5. Forty primary school students went on an excursion. Their ages were:

7 9 9 12 11 6 5 12 8 6
11 7 5 9 7 10 5 9 5 5
8 9 5 10 10 7 11 8 5 8
9 10 7 6 7 9 9 8 10 11

(a) Draw up a frequency distribution table using scores 5, 6, 7, …, 12.

(b) Use your table to construct a frequency histogram and polygon.

6. The sentences in a magazine article were examined, and the number of words in each sentence was counted. The results are listed below:

27 22 15 8 14 7 9 25 15 17 5 24
9 11 22 8 5 15 25 28 10 21 24 13
9 14 18 11 9 23 15 19 10 8 14 17

(a) Is this data discrete or continuous?

(b) Using class boundaries 0–9, 10–19, etc., draw up a frequency distribution table.

(c) Construct a frequency histogram and polygon.

7. The masses (in grams) of large eggs, advertised as 60 g eggs, were found to be:

57 58 61 59 62 59 59 56 60 64 58 58
56 59 64 57 60 62 58 60 64 57 61 58
59 57 64 58 59 57 58 64 60 58 60 57
61 64 58 60 61 62 62 58 60 61 57 58

(a) Construct a frequency distribution table for the data.

(b) Draw a frequency polygon.

(c) Would you say that the advertising is misleading? Justify your answer.

8. A modelling agency required its models to be 165 cm or taller. The heights of prospective new models were measured (in centimetres) to be:

168 182 187 185 178 176 174 163 164 183 167 166
165 189 185 165 162 173 164 173 178 187 155 178
172 186 189 175 176 180 170 168 165 170 184 172
180 182 168 182 182 168 184 164 180 169 186 171

(a) How many new models would be rejected on height?

(b) Using class intervals of 5 cm, draw up a frequency distribution table.

(c) Draw a frequency histogram and polygon for the data.

ATTACK YOUR WEAK AREAS

Make a note to revise or relearn your weak areas in more detail. Ask your teacher or see how other textbooks explain the topic. Most of your study time should be spent attacking your weak areas and filling the gaps in your Maths knowledge. Use your summaries for ‘general revision’, but spend longer study periods on your problem areas. Don’t spend much time on work you already know well, unless you need a confidence boost!


DOT PLOTS

A dot plot is a simplified type of histogram and is a good way of presenting a small amount of data. It is easy to see where clusters of scores occur as well as to identify any outliers. Each score is represented by a symbol, usually a dot.
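A dot plot is simple enough to build by stacking one symbol per score above a number line. The following is a minimal Python sketch of that idea, not part of the original text; the function name and the scores are hypothetical, and the sketch assumes whole-number scores.

```python
from collections import Counter

def dot_plot(scores):
    """Return the rows of a text dot plot, top row first, ending with the axis."""
    counts = Counter(scores)
    values = range(min(scores), max(scores) + 1)
    rows = []
    # Build the plot from the tallest stack down to the bottom row of dots.
    for level in range(max(counts.values()), 0, -1):
        rows.append(" ".join(" * " if counts[v] >= level else "   " for v in values))
    # The last row is the number line.
    rows.append(" ".join(f"{v:^3}" for v in values))
    return rows

# Hypothetical whole-number scores, for illustration only.
scores = [3, 4, 4, 5, 5, 5, 6, 11]

for row in dot_plot(scores):
    print(row)
```

In the printed plot, the scores 3 to 6 form a cluster and the isolated dot at 11 stands out as an outlier.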

Example 16

This dot plot represents the body temperatures (in °C) of hospital patients, where normal body temperature is about 37°C:

[Dot plot: body temperatures on a scale from 36 to 42 °C]

(a) How many patients are represented here?

(b) What is the outlier value?

(c) Are there any clusters?

(d) What type of patient would have the outlier temperature?

Solution

(a) There are 10 patients represented (each dot represents 1 patient).

(b) The outlier is 42°C.

(c) Scores are clustered around 37°C to 39°C.

(d) The patient with a temperature of 42°C would feel very hot and would be quite ill as their temperature is 5°C above normal.

1. A Year 11 Maths class was surveyed to find out how many hours each student spent on Maths homework each week. The results are shown below.

7 6 8 9 5 10 6 9 9 0 9 8
18 7 5 3 4 9 6 7 8 10 7 8

(a) Draw a dot plot for this data.

(b) Are there any outliers? Comment on possible reasons for these.

(c) Are there any clusters or gaps?

Just for the record

OUTLIERS CAN BE IMPORTANT

You probably know about the hole in our ozone layer over Antarctica caused by CFCs (chlorofluorocarbons), but what you may not know is that scientists detected but disregarded the low levels of ozone for several years, treating them as outliers. The Nimbus 7 satellite had been detecting very low levels of ozone since 1976, but it was not until 1985 that the importance of these outliers was discovered. The damage to our ozone layer went undetected and untreated for 9 years because these values were disregarded.

An outlier can often be the most important score in a data set!


2. This dot plot gives the shoe sizes of a group of Year 11 students.

(a) How many students are represented?

(b) Would you say this distribution has an outlier? Why?

3. The Bennelong Bears football team scored the following numbers of goals per game in its first season:

(a) Display the data in a dot plot.

(b) Comment on any unusual features.
