STATISTICS STAT 1010
Centre for Professional Development and Lifelong Learning UNIVERSITY OF MAURITIUS
SUPPORT MATERIALS
AUTHORS
STATISTICS – STAT 1010 was prepared for the Centre for Professional Development and Lifelong Learning, University of Mauritius. The Pro-Vice Chancellor – Teaching and Learning - acknowledges the contribution of the following course team members:
Dr V Jowaheer - Faculty of Science
Mr S Kalasopatan - Faculty of Social Studies and Humanities
Dr F Khodabacus - Faculty of Engineering
Assoc. Prof M J Pochun
Dr A Ruggoo - Faculty of Agriculture
Assoc. Prof P Veerapen - Faculty of Social Studies and Humanities
August 2008
All rights reserved. No part of this work may be reproduced in any form without written permission from the University of Mauritius, Réduit, Mauritius.
TABLE OF CONTENTS
STUDY GUIDE
Support Materials
How to Proceed
How to Use the Support Materials
How to Use the Textbook
Suggested Coursework
Suggested Course Map
Final Examination
Suggested Grading Scheme
Unit 1 Introduction
Unit 2 Data Collection I, OJ, Chapter 16
Unit 3 Organisation and Presentation of Data I, OJ, Chapter 1
Unit 4 Organisation and Presentation of Data II, OJ, Chapter 2
Unit 5 Organisation and Presentation of Data III, OJ, Chapter 3
Unit 6 Measures of Central Tendency, OJ, Chapter 5
Unit 7 Measures of Dispersion, OJ, Chapter 9
Unit 8 Time Series Analysis, OJ, Chapters 6 and 7
Unit 9 Index Numbers, OJ, Chapter 8
Unit 10 Probability, OJ, Chapter 11
Unit 11 Data Collection II, OJ, Chapters 15 and 16
Unit 12 Linear Relationship Between Variables – I: Correlation, OJ, Chapter 23
Unit 13 Linear Relationship Between Variables – II: Regression, OJ, Chapter 23
STUDY GUIDE:
Welcome to STATISTICS. This is a one-semester course designed to cover first-year syllabuses of programmes of studies in the various faculties. The course provides an introduction to Statistics and the manual is designed to guide you through the course.
The Study Guide contains important information on materials and procedures. We suggest that you spend some time reading it and familiarising yourself with what you will have to do to complete STATISTICS successfully. The Suggested Course Map, p. (vii), indicates what you should be working on each week.
If you have any questions arising from the instructions in the support materials, do not hesitate to contact your tutor.
SUPPORT MATERIALS AND TEXTBOOK
This document serves as the SUPPORT MATERIALS. The module also includes the following TEXTBOOK:
Owen, Frank and Jones, Ron. (4th Edition) Statistics. Pitman. The textbook will be referred to as (OJ) in the Support Materials.
HOW TO PROCEED?
You should begin by taking a look at the TABLE OF CONTENTS in both the SUPPORT MATERIALS and the TEXTBOOK. These tables provide you with a framework for the entire course because they outline the organisation and structure of the material you will be covering. You will notice that the Units in the support materials do not follow the same sequence as the Chapters in the textbook. However, in the Support Materials, you will be referred to the relevant parts of the various Chapters.
The guidelines that follow are designed to help you work your way through the materials in this course as effectively as possible. So, before you begin Unit 1, read the guidelines below carefully.
The Support Materials provide you with study plans and commentaries on the textbook presentation. They introduce additional concepts and information, advise you to do particular practice activities, offer clarification, examples and solutions.
Take a few minutes now to glance through the entire manual to get an idea of its structure. Notice that the format used for each unit is fairly consistent throughout the Support Materials. For example, each unit begins with a UNIT STRUCTURE, an OVERVIEW and a list of LEARNING OBJECTIVES.
The UNIT STRUCTURE and OVERVIEW identify the main topics in the Unit. You should begin your study of each unit by reading this brief introduction. You should then read the LEARNING OBJECTIVES. The importance of these objectives cannot be overstated. They identify the knowledge and skills you will have acquired once you have successfully completed the study of a particular unit. Keep the objectives in mind as you read the corresponding content in your textbook. The learning objectives also provide a useful guide for review.
How to Use the Textbook?
Studying requires that you take an active role. Therefore, use your textbook actively, recognising it for the useful “learning tool” that it is. You should study pencil in hand, circling important concepts and making summary notes to crystallise your understanding. You may like to highlight or underline the key ideas. If so, a useful rule of thumb is to highlight no more than one quarter to one third of the material; if you highlight more, you are probably marking more than the key ideas.
Suggested Coursework
STATISTICS is designed to be completed in one semester, with weeks one to thirteen for instruction, weeks fourteen and fifteen for review and with the final examination as scheduled by Faculty. Although you are free to work at your own pace, you should try to distribute your workload according to the suggested course map on page (vii).
In order to complete STATISTICS you must read the instructional units. Generally, each of these will direct you to study specific Chapters in (OJ) though some of the units will be almost self-contained.
The objectives are tied to particular sections of the textbook. Review these objectives when you have completed a section to confirm that you have achieved the learning goals for it. If you realise that you are not clear about some aspects of the section, go back and redo relevant readings and exercises. It is important to build your understanding of Statistics patiently and thoroughly.
The practice activities recommended in the Support Materials reinforce the learning objectives for each part of the course. Thinking through these activities will train you in the skills you need for the examination and for later applications of Statistics.
SUGGESTED COURSE MAP
Week     Unit    Topic                                        Tutorial
1        1       Introduction                                 1
2        2       Data Collection I                            2
3        3       Organisation & Presentation of Data I        3
         4       Organisation & Presentation of Data II
4        5       Organisation & Presentation of Data III      4
5        6       Measures of Central Tendency                 5
6        7       Measures of Dispersion                       6
7        8       Time Series Analysis                         7
8                CLASS TEST*
9        9       Index Numbers                                8
10       10      Probability                                  9
11       11      Data Collection II                           10
12       12      Relationship between Variables I             11
13       13      Relationship between Variables II            12
14, 15           REVISION

*Week/date for the Class Test to be confirmed during the semester.

FINAL EXAMINATION
♦ Scheduled and administered by the Registrar’s Office.
♦ A two-hour paper at the end of the Semester.
SUGGESTED GRADING SCHEME
Invigilated class test : 30%
Final Examination : 70%
UNIT 1 INTRODUCTION
Unit Structure
1.1 What Is Statistics?
1.2 Definition and Measurement
1.3 Nature of Statistical Data
1.4 A Last Word
1.1 WHAT IS STATISTICS?
In various aspects of life, we come across many questions whose answers are not immediately and accurately available. Very often the information available is insufficient, or there is a lack of knowledge, or there is no information at all, so that there may be varying degrees of uncertainty with regard to the possible answers to these questions.
For example, we may ask ourselves many questions: Will we have enough rainfall this summer? How many cyclones will we have this summer? What is the pass rate for the B.Sc. Management or B.Sc. Economics course? What is the level of unemployment in Mauritius? Are University students satisfied with the canteen facilities available on campus? Are our industries able to compete on the world market?
Statistics is the branch of knowledge that provides us with tools and techniques to answer, at least to some extent, the above questions and many more like them. To do so, we need, on the one hand, a minimum level of knowledge (i.e. understanding) and, on the other, information/data that are already available. If the information/data are not available, then the first step is to collect them. Statistics deals thoroughly with the collection of data/information, with one prime objective in mind: the data collected should be of a high standard. The data collected constitute the raw material of any statistical analysis.
Thus, if we would like to know whether university students are satisfied with the available canteen facilities, we may choose to collect the views of all students or of a small percentage of students, provided that this small percentage is selected in an unbiased manner and is representative of the whole student body. Whatever the approach, much can be learnt from the data, provided we are sufficiently careful about what is being collected and about how the data are being collected.
Once data are collected, they need to be organised and presented in a manner designed to reveal their salient features and any underlying pattern. Thus, the organisation and presentation of data are most important for their interpretation, which may range from very basic to rather advanced.
We then have to analyse the data to uncover, with precision, the patterns that exist in the data set and any previously unknown relationships. Uncertainty can then be handled, and even quantified, using probability theory and related ideas. If necessary, more sophisticated analyses can be carried out and statistical models developed.
Finally, comprehensive reports with conclusions and recommendations are produced so that the relevant authorities can ultimately take appropriate policy decisions. Statistical data and their analysis are essential ingredients for decision making in almost every sphere of life: for government, business, the community and individuals.
The above definition of Statistics can be summarised in a diagram (Figure 1.1). Our starting point is always the need to study a specific issue/problem/phenomenon concerning people/society at large (e.g. students’ problems), nature (e.g. the weather) or some interaction between people and nature (e.g. agriculture).
Figure 1.1: Diagrammatic Definition of Statistics [diagram: People / Nature → Collection of Data → Organisation and Presentation of Data → Analysis of Data → Conclusion and Recommendation]
This process continues indefinitely, since implementing the recommendations will ultimately create a new situation and, most probably, a better understanding of the problem/issue/phenomenon under consideration. At a later stage, the need for new information/data will then be felt, if only to assess the impact of these same recommendations over time.
In a similar manner, scientific experiments/observations are carried out to help us study and understand the world around us and to develop science and technology in general. The scientific data collected are then analysed accordingly. Statistics plays a key role not only in the collection of scientific data but in the very development of scientific knowledge. Indeed, Professor A.F.M. Smith of Imperial College of Science, Technology and Medicine, UK, defines Statistics as “The Science of doing Science” (1996).
1.2 DEFINITION AND MEASUREMENT
Let us give some thought to one of the questions raised in the previous section: What is the level of unemployment in Mauritius?
Statisticians, scientists and many other people spend a great deal of time measuring a particular variable or set of variables. It is relatively easy to measure the length of a table, but it is an entirely different matter to ‘measure’ the level of unemployment in Mauritius. To do so, we must know what the term ‘unemployment’ means, not only in broad general terms but in precise operational terms. In other words, ‘unemployment’ must be precisely defined before it can be ‘measured’.
How, then, do we define an unemployed person? The Central Statistical Office would define someone as unemployed if that person was not employed, was available for work and was looking for work. But this raises other questions. For how long was the person not employed: for a day, for a week, ...? Is a full-time student who holds no job unemployed? Is a worker on strike unemployed?
In this introduction, we are not going to provide answers to all these questions. But they drive home the point that, in Statistics, precise definition of a variable is most important not only in broad general terms but in operational terms as mentioned above.
The definition must be such that measurement is then possible. Sometimes a good theoretical definition of a variable does not lend itself easily to measurement; it then has to be adapted from a practical point of view so that measurement becomes possible.
Furthermore, the definition of a given variable may change over time, and so may the methods of measurement. Hence particular care must be taken over the definition and measurement of a given variable so that measurements are comparable over time and across places.
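To see what an operational definition involves, the short Python sketch below turns the three conditions attributed above to the Central Statistical Office (not employed, available for work, looking for work) into an explicit rule and applies it to a few made-up records. The record fields (`employed`, `available`, `looking`) and the data are hypothetical. The sketch deliberately leaves open the questions raised above (minimum duration, full-time students, workers on strike); a real survey definition would have to settle them.

```python
from dataclasses import dataclass

@dataclass
class Person:
    # Hypothetical survey record; field names are illustrative only.
    name: str
    employed: bool    # held a job during the reference period
    available: bool   # available for work
    looking: bool     # actively looking for work

def is_unemployed(p: Person) -> bool:
    """Operational rule: not employed AND available AND looking for work."""
    return (not p.employed) and p.available and p.looking

# Made-up records for illustration
people = [
    Person("A", employed=True,  available=False, looking=False),
    Person("B", employed=False, available=True,  looking=True),
    Person("C", employed=False, available=False, looking=False),  # not seeking work
    Person("D", employed=False, available=True,  looking=False),
]

unemployed = [p.name for p in people if is_unemployed(p)]
print(unemployed)  # ['B']
```

Once the rule is written down this explicitly, every interviewer and analyst applies it in the same way, which is what makes the measurement repeatable.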
1.3 NATURE OF STATISTICAL DATA
The discussion in the two preceding sections should make us aware that available data must be used with some caution.
In this respect, it is useful to distinguish two types of data: primary data and secondary data. Primary data are data that have been collected for a specific purpose and are being used for that purpose. They include, amongst others, data collected by someone by means of a sample survey or an experiment with clearly defined objectives in mind.
Secondary data are data available in the many statistical publications produced by the Central Statistical Office and by other institutions, whether governmental or private. They include data that have been collected for a specific purpose but are being used for various other studies. For example, government departments may collect data for administrative reasons; such data were not gathered specifically for the particular study in which they are later used.
Secondary data must therefore be used with much caution. To start with, their sources must be known. This helps to ascertain that the data are genuine and that they have been produced by competent institutions with the required expertise. Various valuable pieces of information would then be available:
(i) the definitions of variables used and problems of measurement involved;
(ii) the method of data collection, for this will give us an idea of the degree of accuracy of the available data;
(iii) the date of collection, which is relevant both because the definitions used may change over time and because it indicates how up to date the data are;
(iv) the units of measurement used; for example, the average monthly salary of a Mauritian in rupees is not directly comparable to the average monthly salary of an English person in pounds sterling. Similarly, the month is not a constant measure of time, since not all months have the same number of working days.
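The point about months can be checked with a few lines of Python using only the standard library. The sketch below counts Monday-to-Friday days in each month of 2008 (the year of these materials). It ignores public holidays, which would change the counts in practice, so treat it as an illustration rather than an official working-day calendar.

```python
import calendar
import datetime

def weekdays_in_month(year: int, month: int) -> int:
    """Count Monday-Friday days in a month (public holidays ignored)."""
    _, n_days = calendar.monthrange(year, month)
    return sum(
        1
        for day in range(1, n_days + 1)
        if datetime.date(year, month, day).weekday() < 5  # 0=Mon ... 4=Fri
    )

counts = {m: weekdays_in_month(2008, m) for m in range(1, 13)}
for m, c in counts.items():
    print(calendar.month_name[m], c)
# January 2008 has 23 weekdays, February 2008 only 21
```

A monthly total (output, sales, hours worked) therefore partly reflects the length of the month; one common adjustment is to divide by the number of working days before comparing months.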
Finally, it may be appropriate to note that data may be collected on, for example, the whole student body or on a fraction of the student body, as mentioned in section 1.1. Sometimes a statistical investigation is carried out on the entire group of units/individuals about which information is wanted; such an entire group is known as the statistical population. We have thus the population of students, population of cattle, population of buildings etc. A sample, however, is a part of the population used to gain information, which, after proper statistical analysis, can be generalised to the whole population. More will be said on samples and different types of statistical investigations in other units.
1.4 A LAST WORD
Statistics is a fast-developing subject with a wide range of applications: biometrics, econometrics, psychometrics, statistical quality control, etc. Over the last sixty years or so, there has been a constant flow of new ideas in Applied Statistics as well as in Theoretical Statistics and Probability, so much so that different schools of thought have emerged in Statistics. This is a healthy sign in a developing subject.
For our purposes, we may say, in simple terms, that the objective of Statistics is the understanding of information contained in data characterised mainly by uncertainty. That understanding demands one essential ingredient on your part: common sense! Everything else would be straightforward. In fact, the psychologist S. S. Stevens referred to Statistics as
“... a straightforward discipline designed to amplify the power of common sense in the discernment of order amid complexity.”
UNIT 2 DATA COLLECTION I
Unit Structure
2.0 Overview
2.1 Learning Objectives
2.2 The Collection of Data I
2.2.1 Introduction
2.2.2 Quantitative v/s Qualitative Approach
2.3 Routine Data Collection (as By-product of Administrative Procedures) v/s Special Investigations
2.4 Censuses v/s Sample Surveys
2.4.1 Introduction
2.4.2 Comparative Advantages of Sample Surveys over Censuses
2.4.3 Sources of Errors in Censuses and Sample Surveys
2.4.3.1 Sampling Errors
2.4.3.2 Non-Sampling Errors
2.5 Mode of Administration of a Questionnaire
2.5.1 Face to Face Interviewing
2.5.2 The Postal Method
2.5.3 The Telephone Method
2.6 Stages in a Sample Survey
2.7 Summary
2.0 OVERVIEW
This unit introduces you to the various approaches to data collection and to the basic principles and various ways of collecting quantitative data. Comparisons of the relative strengths and weaknesses of alternative methods are included. Data collection is covered in OJ in Chapters 15 and 16. However, note that the material in OJ on sampling (Chapter 15) is not considered appropriate for this course. You may find Chapter 16 of OJ useful as supplementary reading to the material in this manual.
2.1 LEARNING OBJECTIVES
When you have successfully completed this Unit, you should be able to do the following:
1. Identify the various methods of collecting quantitative data
2. Differentiate between censuses and sample surveys as means of collecting quantitative data
3. Explain the various ways of administering a survey questionnaire and analyse their relative strengths and weaknesses
4. Identify the various stages involved in a sample survey
2.2 THE COLLECTION OF DATA I
2.2.1 Introduction
In Unit 1, the importance of statistical data for informed decision making and planning was mentioned. However, data do not just exist. They have to be collected. And data collection can be a complex and technical task. It can also be very costly and time consuming. The coverage of data collection in this course therefore is not intended to equip you to embark on a complex and large scale data collection exercise on your own (much further study will be required for this!) but rather to provide you with a basic appreciation of the general principles of data collection, the various stages involved, the dangers to avoid and the precautions to take. Additionally, this unit should encourage you to examine published data with a more critical mind, to appreciate their limitations as well as their strengths and to exercise caution in their use.
2.2.2 Quantitative v/s Qualitative Approach
There are two broad approaches to collecting data: the qualitative approach and the quantitative one. Each of these approaches has its merits and limitations. The distinguishing feature of the quantitative approach is its use of standardised instruments and procedures (for example, standard questionnaires and uniform field procedures). This makes responses comparable and allows them to be aggregated to produce percentages, rates, averages, etc. Hence it is possible, for example, to estimate the proportion of a given population who possess a certain characteristic, or to quantify the extent to which specific views or attitudes are held. Sample surveys using standard questionnaires and uniform field procedures are a major example of the quantitative approach. By uniform field procedures we mean, for example, that questionnaires are administered to all respondents in the same way, say by face to face interview, and that interviewers are trained to ask the questions and to deal with any problems arising in the field in exactly the same way. The great advantage of the quantitative approach is that the results are quantifiable and generalisable.
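Because every respondent answers the same question in the same coded form, turning responses into percentages is mechanical. The minimal Python sketch below assumes a hypothetical yes/no/undecided question and eight made-up coded answers; it is only meant to show why standardised answers can be aggregated.

```python
from collections import Counter

# Hypothetical coded answers to one standard question, e.g.
# "Are you satisfied with the canteen facilities?"
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "yes"]

counts = Counter(responses)
n = len(responses)
percentages = {answer: 100 * count / n for answer, count in counts.items()}

for answer, pct in sorted(percentages.items()):
    print(f"{answer:<10} {pct:5.1f}%")
```

This tally only works because the answers are comparable; free-form answers from a focus group would first have to be coded, which brings its own risks (see coder bias in section 2.4.3.2).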
In the qualitative approach, instruments and procedures are more flexible and informal. There is usually no standard questionnaire: the ordering of questions may vary and their phrasing is not rigid. Examples of the qualitative approach are the key informant approach (where persons who, by virtue of their occupations, have specialised knowledge of the subject of interest are interviewed) and the focus group approach (where people are interviewed in groups, in a rather informal way). Further examples (by no means an exhaustive list) are participant observation and case studies. Certain qualitative approaches have the advantages of low cost, speed and depth, but their emphasis is not on quantitative information. Thus, for example, interviews with trade union representatives and focus groups of a small number of workers may indicate that the majority of workers are against a proposed measure and that men are more strongly opposed than women. However, it would not be possible to generalise these conclusions to all workers with any confidence, still less to quote percentages of those for and against the measure.
2.3 ROUTINE DATA COLLECTION (AS BY-PRODUCT OF ADMINISTRATIVE PROCEDURES) V/S SPECIAL INVESTIGATIONS
Often there exist opportunities for collecting quantitative data in the course of administrative control procedures. For example, every person entering Mauritius has to go through the immigration authorities, as is the practice in all other countries. This provides an opportunity for collecting information on tourist arrivals, which is in fact done through the well known disembarkation card. Similarly, anyone importing goods into the country has to go through customs for control and taxation purposes, but this also provides an opportunity for collecting data on imports such as the type of product, the origin, etc.
Collection of data as a by-product of administrative control is generally inexpensive. Often the same forms or schedules are used for both administrative and statistical purposes. However, much care must be taken in designing these forms or schedules, as what is suitable for administrative purposes may not always be relevant for statistical purposes. In particular, attention must be given to the definitions of terms used. Also, care must be taken not to burden the administration too much by making the forms too long or complicated. Sometimes separate forms for statistical purposes are necessary.
It is not always possible to obtain the data one needs as a by-product of administrative procedures. It then becomes necessary to conduct special, dedicated investigations, with the specific purpose of collecting the required data. This process can be quite costly, but the importance and potential use of the data may well justify the expenditure. Two alternative approaches are possible. The investigation may involve collecting data in respect of every member of the population of interest (i.e. a Census). Alternatively, it may involve collecting data in respect of a sample of the population. We discuss these two approaches next.
2.4 CENSUSES V/S SAMPLE SURVEYS
2.4.1 Introduction
A census involves the collection of data in respect of every member of a population of interest. Familiar examples of censuses are the Housing and Population Censuses carried out in Mauritius by the Central Statistical Office every ten years.
A sample survey involves the collection of data in respect of only some of the members of the target population but with the purpose of learning about the whole of that population. Examples of important national sample surveys carried out by the Central Statistical Office in Mauritius are the Family Budget Sample Survey and the Labour Force Sample Survey.
This idea of examining a part to learn about the whole, which is what a sample survey is all about, is familiar and intuitively appealing, and we apply it in our everyday lives, often unwittingly. For example, when buying grain for the household, we usually examine a handful to check the quality before making our purchase. Of course, for observations on the part to provide a valid basis for conclusions about the whole, certain precautions must be taken in selecting that part; we cannot simply use any part. We should ensure that every member of the population has a fair chance of selection, and this is achieved by a method of selection which we call random selection. We should also aim to draw a sample that is likely to be representative of the whole population. We shall not pursue the matter further here; in Unit 11, we shall discuss the basic principles involved in selecting valid samples.
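In practice, random selection is usually done by computer rather than by hand. The Python sketch below draws a simple random sample of 200 from a hypothetical list of 5,000 student ID numbers using the standard library's `random.sample`, which selects without replacement so that every student has the same chance (here 200/5,000 = 4%) of being chosen. The population size, the ID format and the seed are assumptions made for illustration; Unit 11 covers the principles properly.

```python
import random

# Hypothetical sampling frame: one ID per student
population = [f"S{i:05d}" for i in range(1, 5001)]

rng = random.Random(2008)               # fixed seed so the draw can be reproduced
sample = rng.sample(population, k=200)  # simple random sample, without replacement

print(len(sample), len(set(sample)))    # 200 200 (no student appears twice)
print(sample[:5])
```

Note that the method needs a complete list of the population (a sampling frame) to draw from; a census is one source of such a list.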
2.4.2 Comparative Advantages of Sample Surveys over Censuses
As a means of collecting quantitative data, the sample survey has a number of advantages over the census approach.
(i) Sample surveys are less costly.
When the population of interest is large, a Census is a very costly exercise. For a small population, a Census may be worth considering because the cost may be moderate, but for large populations it is usually avoided. Nevertheless, for certain purposes a Census is absolutely necessary, even though the population may be very large, and a sample survey would not be appropriate. In such cases, Censuses are carried out at infrequent intervals; the Population and Housing Censuses, for example, are carried out in Mauritius at 10-year intervals.
(ii) Sample surveys are less time consuming and hence, results are more timely.
For a Census, because of the sheer scale of the data collection, the processing of data takes a lot of time. Not so long ago, data from Population Censuses used to take years to process, even in developed countries, at times dragging on almost to the next census. Under these circumstances, the results from the census were largely obsolete by the time that they were out. With the advent of electronic processing, things have improved a lot but it still takes a number of months to process data from a population or housing census.
(iii) In a sample survey, because only a small portion of the population is involved, that portion can be studied intensively.
In investigations of human populations, one important consideration is the need to limit the burden on the respondent, i.e. on the individual contacted to provide the data. In a census of a large population, therefore, the questions must be simple and factual and their number must be kept small, because a great many people bear the burden of answering them. In a sample survey, since relatively few people are involved, we can ask more questions and the questions can be more complex if necessary.
As an illustration of the above, it may be noted that typically, the Population Census carried out in Mauritius involves 25 to 30 simple factual questions. However, there are sample surveys that have been carried out by the University of Mauritius involving at times over 150 questions, many of them complex ones, often dealing with attitudes, opinions and perceptions.
(iv) In certain contexts, data collection may involve the destruction of the item from which the data are collected, in which case a census is out of the question.
For example, studying the life of electric bulbs would involve lighting them until they burn out. Hence, if a bulb manufacturer used a census to study the life of his bulbs, he would soon be left with no bulbs to sell!
In spite of the above, censuses are sometimes necessary because of the level of detail required. For example, for local planning purposes, detailed information about every town and village in the country is required. A national sample survey will not contain enough members from each town or village to obtain accurate information for each of them. Indeed, certain towns or villages may not appear in the sample at all.
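A rough calculation shows why small places can drop out of a national sample altogether. Suppose, purely hypothetically, that a village holds 0.05% of the population and a national sample of 1,000 people is drawn. Treating each draw as independent (an approximation that is close when the sample is a tiny fraction of the population), the probability that nobody from the village is selected is (1 − p)^n, and the expected number selected is p × n. The sketch below computes both; the share and sample size are assumed values.

```python
def prob_absent(share: float, n: int) -> float:
    """Approximate probability that a place holding `share` of the
    population contributes nobody to a random sample of size n
    (independent draws, i.e. sampling with replacement)."""
    return (1 - share) ** n

def expected_in_sample(share: float, n: int) -> float:
    """Expected number of sampled people from that place."""
    return share * n

# Hypothetical village holding 0.05% of the population, national sample of 1,000
p, n = 0.0005, 1000
print(round(expected_in_sample(p, n), 2))  # 0.5 people expected
print(round(prob_absent(p, n), 3))
```

The probability of absence comes out at roughly 0.61, so more often than not such a village would contribute nobody, and when it does appear it would usually be through one or two people: far too few for village-level figures.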
Censuses may also provide the sampling frame for future sample surveys.
2.4.3 Sources of Errors in Censuses and Sample Surveys
2.4.3.1 Sampling Errors
Suppose that we want to find out the average weight of all students of the University of Mauritius, and we do this by selecting a sample of say 200 students in accordance with the principles of scientific sampling (to be discussed in Unit 11). We then find the average weight of our sample of students. What we get is an estimate. We may expect this estimate to be close to the true average weight of all students but we cannot expect it to be exactly equal,
except by coincidence. Differences between the estimates based on samples and the true population values are called sampling errors.
If we were to start anew and repeat the process, i.e. draw a sample again (after putting back the 200 students), we would most likely obtain a different sample, although some students from the first sample may well appear again. If we now compute the average weight of the sampled students again, we expect the result to differ from the first one, except by coincidence. Such differences between estimates from different samples are called sampling variation.
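Sampling error and sampling variation are easy to see in a simulation. The Python sketch below invents a population of 5,000 student weights (a normal distribution with mean 62 kg and standard deviation 10 kg, an arbitrary assumption rather than real data), then draws two independent samples of 200, the first being put back before the second is drawn, and compares each sample mean with the true population mean.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 5,000 student weights in kg (made-up distribution)
population = [rng.gauss(62, 10) for _ in range(5000)]
true_mean = statistics.mean(population)

# Two independent samples of 200; the first is "put back" before the second draw
sample1 = rng.sample(population, 200)
sample2 = rng.sample(population, 200)
mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)

print(f"true mean     {true_mean:.2f}")
print(f"sample 1 mean {mean1:.2f}  sampling error {mean1 - true_mean:+.2f}")
print(f"sample 2 mean {mean2:.2f}  sampling error {mean2 - true_mean:+.2f}")
print(f"sampling variation between the two estimates {mean1 - mean2:+.2f}")
```

Both estimates should typically land within a kilogram or two of the true mean, yet neither equals it exactly and they differ from each other.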
Thus estimates based on samples are not, in general, exactly equal to the true population values, and they also vary from one sample to another. However, if the sample is sufficiently large, we can be reasonably certain that the estimate will be close to the true population value. The theory of sampling (which is beyond the scope of this course) gives us this guarantee. This guarantee gives sampling its power and makes it a viable alternative to complete enumeration (a census). Thus national surveys using samples of between 1,000 and 3,000 individuals are carried out in many countries with populations of several millions. Sampling theory also enables one to estimate the sample size required for a given degree of precision. We shall not go into this, but note that the common belief that a larger population requires a larger sample is not quite true. In fact, once the population is large, the required sample size hardly depends on the population size at all.
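The claim about population size can be checked with a standard formula. For a simple random sample of size n drawn without replacement from a population of size N, the standard error of a sample proportion is sqrt(p(1 − p)/n) multiplied by the finite population correction sqrt((N − n)/(N − 1)). This formula belongs to the sampling theory that lies beyond this course and is used here only as an illustration, with p = 0.5 (the worst case) and n = 1,000 as assumed values.

```python
import math

def se_proportion(p: float, n: int, N: int) -> float:
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

n = 1000
for N in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"N = {N:>10,}  standard error = {se_proportion(0.5, n, N):.4f}")
```

Multiplying the population by a thousand, from 10,000 to 10 million, moves the standard error only from about 0.0150 to about 0.0158; precision is governed almost entirely by the sample size n.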
Of course, because censuses involve complete enumeration, they are not subject to sampling errors.
2.4.3.2 Non-Sampling Errors
It is commonly believed that, because a census is an exhaustive exercise and is therefore not subject to sampling errors, it must be more reliable than a sample survey. This is not necessarily the case. Both the census and the sample survey are subject to other errors, called non-sampling errors, which include the following:
(ii) non-response: non-response occurs when people contacted are not at home or refuse to participate in the survey. Non-response is a serious problem because people who refuse to participate may hold opinions on aspects pertinent to the subject of the survey that differ from those of people who cooperate. For example, suppose we carry out a survey on leisure and there is a lot of non-response. It is quite possible that those who did not respond are very busy and have little leisure time. Conclusions based only on those who responded would therefore be misleading.
(iii) interviewer bias: interviewer bias occurs when the responses obtained are influenced by the interviewers. This may happen in a number of ways. An unskilled interviewer may influence the respondent to answer in a particular way through his or her intonation or facial expressions, or through the way he or she clarifies a question that has not been understood or probes for more information after an ambiguous or incomplete answer. Interviewer bias may also arise when answers are misinterpreted or misrecorded because of the interviewer’s preconceptions.
(iv) coder bias: coder bias may occur when answers that the interviewer has recorded verbatim are coded in the office for the purposes of analysis. The interpretation given to the answers, and hence the codes assigned, may be influenced by the coders’ preconceptions.
All of these errors can occur with a census, as well as with a sample survey. However, because a sample survey involves only a small number of respondents, the efforts made to minimise non-sampling errors can be more intensive than they could be for a census.
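The leisure example given under non-response can be made concrete with a small calculation. The Python sketch below uses an entirely hypothetical population: the group sizes, leisure hours and response probabilities are invented for illustration and are not survey results. Assuming that busy people are less likely to respond, it compares the true mean weekly leisure time with the mean expected among respondents, using the expected number of respondents (n × p) in each group. The function names are hypothetical.

```python
# Entirely hypothetical population, for illustration only:
# (weekly leisure hours, number of people, probability of responding)
groups = [
    (5, 400, 0.3),   # very busy people: little leisure, often unavailable
    (15, 400, 0.7),
    (30, 200, 0.9),  # people with plenty of leisure: usually at home and willing
]


def population_mean(groups):
    """True mean leisure time over the whole population."""
    total = sum(n for _, n, _ in groups)
    return sum(hours * n for hours, n, _ in groups) / total


def expected_respondent_mean(groups):
    """Mean leisure time among respondents, using the expected number
    of respondents n * p in each group."""
    respondents = [(hours, n * p) for hours, n, p in groups]
    total = sum(r for _, r in respondents)
    return sum(hours * r for hours, r in respondents) / total


print(population_mean(groups))                     # 14.0
print(round(expected_respondent_mean(groups), 1))  # 17.6
```

Under these assumed figures the respondents' average (about 17.6 hours) overstates the population average (14.0 hours). Increasing the sample size would not remove this bias, since it arises from who responds rather than from sampling.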
2.5 MODE OF ADMINISTRATION OF A QUESTIONNAIRE
The process of implementing a questionnaire designed for data collection, i.e. of getting the questionnaire completed, is called administering the questionnaire. There are four basic ways of doing this:
(i) by observation
(ii) by face to face interviewing using interviewers
(iii) by mailing the questionnaire to all individuals from whom the data are to be collected and asking them to complete and return it to the investigator
(iv) by interviewing individuals by telephone.
The scope for collecting information by observation is rather limited, as the method requires that the phenomenon being studied be observable. Some interesting possibilities do nevertheless exist: for example, it is possible to study the intensity of traffic flow by standing at a particular spot and counting the number of vehicles that go by. In the discussion which follows, however, we shall restrict ourselves to the other three modes. Choosing among these alternative modes requires a thorough knowledge of their relative strengths and limitations.
2.5.1 Face to Face Interviewing
The face to face method of administering a questionnaire has a number of advantages:
(i) The response rate tends to be high, possibly because people find it hard to refuse when the interviewer is standing right in front of them. Several sample surveys carried out by the University of Mauritius using face to face interviewing have easily reached a 95% response rate.
(ii) The face to face approach, because it uses trained interviewers, makes it possible to administer a complex questionnaire (e.g. a questionnaire which contains attitude and opinion questions and a lot of skip instructions). When the questionnaire is self-administered (i.e. filled in by the respondent), as in a postal survey, the questionnaire must be kept simple.
(iii) The face to face method provides an opportunity for the interviewer to find out the reasons for any reticence on the part of the person contacted and to persuade the person to cooperate.
(iv) The face to face method has practically no restrictions on the type of population that can be investigated. With the face to face method, the interviewer reads out the questions and records the answers. It is therefore not necessary for the respondent to be literate as is the case with the postal method. The telephone interview method, however, requires that the respondent be reachable by phone.
(v) With the face to face method, there is more control over the identity of the respondent. In the postal survey, the person to whom the questionnaire is addressed may decide to pass over the questionnaire to someone else to fill in his/her place.
(vi) The face to face method can be used for practically any topic of enquiry. Some people believe that for sensitive or embarrassing topics, postal surveys are better because of the relative anonymity. However, experience shows that, given trained interviewers and the appropriate precautions, the face to face method works very well even for sensitive topics. Moreover, it is difficult to see why people would bother to answer an embarrassing questionnaire sent by mail.
(vii) The face to face method provides an opportunity for clarifying questions which the respondent finds to be unclear.
(viii) The face to face method provides an opportunity for probing (i.e. asking for additional information) if the answer given by the respondent is incomplete or ambiguous.
(ix) With the face to face method, the interviewer can ensure that the sequence of the questions as it appears on the questionnaire is respected. This is usually very important. With the postal method, respondents have the opportunity to see all the questions before answering any of them.
The great disadvantage of the face to face method is that it is costly, much more costly than either the postal method or the telephone interview method. It also requires trained interviewers.
2.5.2 The Postal Method
The main advantage of the postal method (also called the mail method) of administering a questionnaire is its relatively low cost. It must be noted, however, that the cost is not limited to the initial cost of mailing out the questionnaires: usually reminders have to be sent out, and sometimes there are follow-up phone calls and even personal visits, which add to the cost.
Also the postal method does not require trained interviewers.
The postal method has a number of disadvantages, relative to the face to face method:-
(i) The response rate is usually low, often of the order of 30 to 40%, if not less. This is a very serious disadvantage.
(ii) The questionnaire must be kept simple.
(iii) There is less opportunity to persuade people who are reticent to answer the questions. Follow-up by phone is a possibility but it is not as effective as the face to face presence of an interviewer.
(iv) Respondents can see all the questions before they answer any. This is usually not desirable.
(v) The method is of course restricted to a target population that is literate.
(vi) It is important that the information obtained relates to the person selected and not someone else. However, with the postal method, control over who actually answers the questions is difficult. The person to whom the questionnaire is addressed may pass it on to another family member or a friend for completion.
(vii) If a respondent finds a question unclear, he or she may ignore it or give an irrelevant answer. There is no opportunity to detect that a respondent has misunderstood a question, as there is with the face to face method.
(viii) If the answer to a question is incomplete or ambiguous, there is no opportunity for probing as in the case of the face to face method.
2.5.3 The Telephone Method
In terms of advantages and disadvantages, the telephone method is intermediate between the face to face method and the postal method in many respects:
(i) The telephone method is less costly than the face to face method. However, it is generally more costly than the postal method.
(ii) The telephone method does require trained interviewers. However, travel costs and travel time are eliminated. Interviewers spend all their time in an office doing interviews by phone. Each interviewer can thus do more interviews.
(iii) The questionnaire can be more complex than with the postal method but it is not advisable to attempt to administer a very long questionnaire by phone.
(iv) There is more control over the identity of the respondent than with the postal method, although less than with the face to face method.
(v) There is opportunity for persuading reticent respondents, although the face to face method is probably more effective in doing that.
(vi) There is opportunity for clarification if questions are not clear to respondents, although, here again, this is more difficult to do over the phone than face to face.
(vii) There is opportunity for probing if respondents’ answers are incomplete or ambiguous but the same qualification as for (vi) applies.
(viii) The sequence of questions on the questionnaire can be respected. Respondents do not have the opportunity of knowing all questions appearing on the questionnaire before they start answering as in the case of the postal method.
The great disadvantage of the telephone method is that it can only be used when all members of the target population are reachable by phone.
2.6 STAGES IN A SAMPLE SURVEY
From earlier discussion, it is clear that sample surveys are an important means of collecting data.
We conclude this unit with a list of the main stages involved in a sample survey:
(i) Clear definition of the objective of the survey
A clear definition of the objective is fundamental for a survey. This will help in making key decisions in the subsequent stages. It is not sufficient just to define a broad objective, although one must start with that. It is necessary to break down this broad objective into finer objectives for subsequent operationalisation.
(ii) Clear definition of the target population
It is necessary to be clear about what constitutes the target population and the unit of investigation. For example, if we are doing a survey among the students of the University, do we wish to cover part-time students or only full-time ones? If we are doing a survey on consumer expenditure, is our unit of enquiry the household or the individual?
This will be dealt with in detail in Unit 11.
(v) Recruitment and training of field staff
The quality of data collected depends critically on the competence and dedication of the field staff involved. Therefore great care should be applied in the recruitment and training of such staff.
(vi) Pilot survey and pre-testing the questionnaire
A pilot survey consists of a rehearsal of all the survey procedures on a small number of respondents. This process is very important as it permits the identification of any flaws or weaknesses in the questionnaire, which can thus be remedied. It also provides a lot of information about field procedures e.g. whether the method of approaching the respondent is satisfactory, how long it takes to administer the questionnaire, how easy it is to locate the respondents, how many call backs are required on average, etc. This information helps to organise the full scale survey.
(vii) Conduct of interviews
This stage applies when face to face interviewing is used. We have mentioned the danger of interviewer bias before. Interviewers need to possess a variety of skills, ranging from approaching the respondent, establishing rapport, persuading respondents to cooperate, asking questions in a neutral manner and recording the answers correctly. Training and experience are important but supervision and control are also necessary.
(viii) Editing of completed questionnaires
Completed questionnaires may contain a number of problems, such as blanks (i.e. questions which have not been answered) or ambiguous, irrelevant or inconsistent answers. Therefore, before the data are processed and analysed, it is necessary to screen the questionnaires for such problems and remedy them. It is advisable to have a first edit carried out in the field by the interviewer immediately after the interview, as any problems can then be remedied immediately. A second edit needs to be done by field supervisors to detect any mistakes that may have gone unnoticed by the interviewer. Further edits can be done in the office, including a computer edit stage.
(ix) Coding of answers where required for data entry
Where a questionnaire contains open ended questions i.e. questions where no pre-coded answers are proposed, the answers must be coded before processing. Such coding must be done carefully, ensuring that there is consistency both across and within coders i.e. the same code is used for similar answers by different coders or by the same coder on different occasions.
(x) Data Entry
The data collected must in general be captured on computer for eventual processing and analysis. At this stage, it must be ensured that no errors are made during the transfer.
(xi) Data Processing and analysis
The data processing and analysis are usually done with the help of appropriate statistical software. The objectives as defined in the very first stage will guide the analysis.
(xii) Interpretation and report writing
Care must be taken to ensure correct interpretation and the report writing needs to take into consideration the readers targeted.
The success of a survey depends on strict observance of precautions and meticulous attention to quality control at every stage.
2.7 SUMMARY
In this unit you have studied the various methods of collecting quantitative data, the differences between censuses and sample surveys, the various ways of administering a survey questionnaire (including their strengths and weaknesses) and the various stages of a sample survey.
UNIT 3 ORGANISATION AND PRESENTATION OF DATA I
Unit Structure
3.0 The Aim and Forms of Presenting Data
3.1 Overview
3.2 Learning Objectives
3.3 Organisation and Presentation of Data I
3.3.1 Data Types
3.3.2 Tabulations
3.3.3 The Stem and Leaf Diagram
3.3.4 The Time Series
3.4 Secondary Statistics
3.5 Interpretation of Tables
3.6 Summary
3.0 THE AIM AND FORMS OF PRESENTING DATA
The aim of presenting figures is to communicate information. Therefore the type of presentation depends on the requirements and interests of the people receiving the information. There are three main types of presentation:
• Tabulation is covered in Unit 3.
• Charts and diagrams are covered in Unit 4.
• Graphs are covered in Unit 5.
3.1 OVERVIEW
Chapter 1 of your textbook (OJ) introduces the methods of arranging data in tabular form. The textbook, as well as Unit 3 of this course manual, covers the key aspects of tabular presentation, the different types of tables and secondary statistics.
3.2 LEARNING OBJECTIVES
When you have successfully completed this Unit, you should be able to do the following:
1. Explain the importance of tabular presentation.
2. Identify the general principles of tabulation.
3. Use the different types of tabulation.
4. Explain the importance of secondary statistics.
5. Use correctly the different secondary statistics to shed light on data.
6. Interpret information contained in tables and other forms of presentation.
3.3 ORGANISATION AND PRESENTATION OF DATA I
3.3.1 Data Types
Read pp 1-4 of textbook (OJ).
Activity 1 Attempt Questions 1.2(a), 1.3 from textbook (OJ).
3.3.2 Tabulations
Read pp 4 - 13 of textbook (OJ).
3.3.2.1 Construction of tables
In the construction of tables, there are important guidelines to consider:
• Be sure what you want the table to show.
• All tables should have a self-explanatory title.
• The source of the data must be included (usually below the table) so that the original sources can be checked.
• Tables should be neat and tidy, and any handwriting should be clear and legible.
• To improve the quality of the table, make judicious use of different types of print.
• Column and row headings should be brief but self-explanatory.
• Units of measurement should be shown clearly.
• Approximations and omissions can be explained in footnotes. However, footnotes should be kept to a strict minimum.
• Double lines or thick lines can be used to break up a large table and make it easier to read.
• Two or three simple tables are often better than one very large table.
• Sets of data which are to be compared should be close together.
• Secondary statistics, such as percentages and averages, should be beside the figures to which they relate.
• In the particular case of frequency tables, the construction of classes should be done judiciously, with particular attention to the class boundaries and class widths.
3.3.2.2 Class boundaries, class limits, class widths and class midpoints
Two important principles must be observed when classifying data into categories: the categories should be (i) mutually exclusive, i.e. there must be no overlap among categories, and (ii) jointly exhaustive, i.e. together the various categories should cover the whole range of the data. These principles apply to the construction of frequency tables.
Conversely, when studying frequency tables prepared by others, it is important to be clear about the boundaries of each class. As you will discover later on, the correct determination of class boundaries, and hence of class widths and class midpoints, is important for computing the mean, the median, etc. These boundaries are often not what they seem to be at first sight.
• Class boundaries are the specific points along a measurement scale that separate adjoining classes. These can be different from the class limits.
We cannot give general rules for determining class boundaries. These have to be determined case by case, applying some common sense. The key is to try to work out the smallest and largest values that would have been placed in each class when the table was compiled. It is also useful to consider whether the variable involved is discrete or continuous. Once the class boundaries have been correctly determined, the class width is obtained simply as the difference between the upper and lower boundaries, whereas the class midpoint is obtained by averaging the same boundaries.
Example 1
Table 3.1
Length of rod Number of rods
(nearest cm.)
11 - 15 5
16 - 20 12
21 - 25 23
etc. etc.
Since the lengths are given to the nearest centimetre, the boundaries of the first class extend from 10.5 to 15.4999..., which for practical purposes you can take as 10.5 to 15.5 so that the class width is 5 and the class midpoint is 13.0.
Example 2
Table 3.2
Hours of sunshine Number of days
0 and under 2 3
2 and under 4 15
4 and under 6 59
6 and under 8 92
etc. etc.
In this case, the class boundaries coincide with the class limits.
Example 3
Table 3.3
Number of calls made    Number of subscribers
1 - 10                  9
11 - 15                 12
16 - 20                 24
21 - 25                 16
26 - 40                 14
Total                   75
Often, for analytical purposes, discrete variables are treated as continuous variables: rather than assuming that the variable takes a countable number of values in a particular interval, we assume that it can take all possible values in this interval. Later, you will see why we do this, especially when we construct the histogram and compute the median.
So, for this case, the lower class boundary (l.c.b) and upper class boundary (u.c.b) of the second class, for example, are taken respectively as 10.5 and 15.5.
Similarly those of the third class are 15.5 and 20.5. Note that the class boundaries are obtained by subtracting and adding 0.5 respectively to the lower and upper class limits.
Note that u.c.b of a class coincides with l.c.b of the next class.
Class width = u.c.b - l.c.b
Class mid point = (u.c.b + l.c.b) / 2 = (u.c.l + l.c.l) / 2
(where u.c.l and l.c.l denote the upper and lower class limits)
For example, the class width of the second class is 5 and the class midpoint is 13.
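The rule above (subtract 0.5 from the lower class limit and add 0.5 to the upper class limit) can be written as a small Python function. This is a sketch that assumes the data are recorded as whole numbers, so that consecutive class limits differ by 1; the names `class_boundaries` and `summarise` are hypothetical. It reproduces the boundaries, widths and midpoints for Table 3.3.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Lower and upper class boundaries, assuming consecutive class limits
    differ by `gap` (1 when the data are recorded as whole numbers)."""
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


def summarise(classes, gap=1):
    """For each class return (l.c.b, u.c.b, class width, class midpoint, frequency)."""
    rows = []
    for (lower_limit, upper_limit), frequency in classes:
        lcb, ucb = class_boundaries(lower_limit, upper_limit, gap)
        rows.append((lcb, ucb, ucb - lcb, (lcb + ucb) / 2, frequency))
    return rows


# Table 3.3: class limits (number of calls made) and number of subscribers
table_3_3 = [((1, 10), 9), ((11, 15), 12), ((16, 20), 24), ((21, 25), 16), ((26, 40), 14)]

for lcb, ucb, width, midpoint, frequency in summarise(table_3_3):
    print(f"{lcb:5.1f} to {ucb:5.1f}   width {width:4.1f}   midpoint {midpoint:5.1f}   frequency {frequency}")
```

For the second class (11-15) this gives boundaries 10.5 and 15.5, a width of 5 and a midpoint of 13, as in the text; the same rule gives the boundaries of Example 1. It does not apply to every table: Example 4 below has to be reasoned out differently.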
Example 4
Table 3.4
Age Number of club members
10-19 185
20-29 263
30-39 325
40-49 442
The boundaries of the first class are 10 and 20 respectively. Try to figure out why. Note that ‘age’ is usually quoted as ‘age at last birthday’.
3.3.2.3 Types of tables
There are many types of tables, as you may have noticed in publications, journals and magazines and in company reports.
Tables can be divided into
• Frequency tables
• Two-way tables (also called contingency tables or cross tabulations)
• General tables
Examples of frequency tables are clearly illustrated in your textbook (OJ). An example of a two-way table is provided in this unit.
Two way tables
Example 5
Table 3.5
Student Marks in English and Maths
Student  English  Maths     Student  English  Maths
1        35       40        11       47       49
2        32       41        12       61       54
3        41       50        13       63       61
4        31       27        14       58       73
5        65       66        15       72       82
6        42       66        16       69       76
7        58       72        17       58       69
8        71       80        18       55       54
9        82       58        19       48       58
10       64       59        20       50       44
The table above gives the marks in English and Mathematics gained by twenty students. Arrange these results into a two-way grouped frequency distribution.
Answer to Example 5
Table 3.6
Student Marks in English and Maths
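Eng \ Maths 0-20   21-30  31-40  41-50  51-60  61-70  71-80  81-90  91-100 Total
0-20        A      -      -      -      -      -      -      -      D      0
21-30       -      -      -      -      -      -      -      -      -      0
31-40       -      1      1      1      -      -      -      -      -      3
41-50       -      -      -      3      1      1      -      -      -      5
51-60       -      -      -      -      1      1      2      -      -      4
61-70       -      -      -      -      2      2      1      -      -      5
71-80       -      -      -      -      -      -      1      1      -      2
81-90       -      -      -      -      1      -      -      -      -      1
91-100      C      -      -      -      -      -      -      -      B      0
Total       0      1      1      4      5      4      4      1      0      20
(Rows: English marks; columns: Maths marks; - denotes no students.)
Source: University X, 1971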
We observe a direct relationship between the scores in English and Maths, as the entries lie roughly along the diagonal from A to B, i.e. students who do well in Maths tend to do well in English. Had the trend been from C to D, we would have said that an inverse relationship exists, i.e. students scoring high marks in English tend to score low marks in Maths.
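The tallying that produces Table 3.6 can be automated. The following Python sketch, assuming the class intervals 0-20, 21-30, ..., 91-100 used in Table 3.6, places each student's pair of marks from Table 3.5 into the appropriate cell and adds row and column totals. The function names `class_index` and `two_way_table` are hypothetical.

```python
# Marks from Table 3.5 as (English, Maths) pairs, students 1 to 20 in order
marks = [
    (35, 40), (32, 41), (41, 50), (31, 27), (65, 66),
    (42, 66), (58, 72), (71, 80), (82, 58), (64, 59),
    (47, 49), (61, 54), (63, 61), (58, 73), (72, 82),
    (69, 76), (58, 69), (55, 54), (48, 58), (50, 44),
]

# Class intervals of Table 3.6: 0-20, then 21-30, 31-40, ..., 91-100
intervals = [(0, 20)] + [(low, low + 9) for low in range(21, 92, 10)]


def class_index(mark):
    """Return the position of the class interval containing a mark."""
    for i, (low, high) in enumerate(intervals):
        if low <= mark <= high:
            return i
    raise ValueError(f"mark {mark} is outside 0-100")


def two_way_table(pairs):
    """Count pairs into a grid: rows are English classes, columns Maths classes."""
    k = len(intervals)
    grid = [[0] * k for _ in range(k)]
    for english, maths in pairs:
        grid[class_index(english)][class_index(maths)] += 1
    return grid


table = two_way_table(marks)
labels = [f"{low}-{high}" for low, high in intervals]
print("Eng\\Maths " + "".join(f"{label:>7}" for label in labels) + "  Total")
for label, row in zip(labels, table):
    print(f"{label:<10}" + "".join(f"{count:>7}" for count in row) + f"{sum(row):>7}")
column_totals = [sum(column) for column in zip(*table)]
print(f"{'Total':<10}" + "".join(f"{t:>7}" for t in column_totals) + f"{sum(column_totals):>7}")
```

The row and column totals printed by the sketch agree with those of Table 3.6, which is a useful check on hand tallying.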
Example 6
(a) According to the 1972 Census data published by the Central Statistical Office, out of a total of 246,000 males aged 15 and over, 169,000 were employed and 35,000 were unemployed. The remainder were inactive (i.e. were either retired, rentiers, homemakers, students, disabled or voluntarily idle). According to the same data, out of a total of 249,000 females aged 15 and over, 44,000 were in employment, 7,000 were unemployed and the rest were inactive.
The Central Statistical Office estimated that in 1986, there were 238,000 employed males and 106,000 employed females. The numbers of unemployed males and females were 37,000 and 18,000 respectively. The total number of males aged 15 and over was estimated at 339,000. The corresponding number of females was estimated at 343,000.
(Note : The data have been rounded to the nearest thousand).
- Tabulate the above information, including in your table any secondary statistics you consider useful for the interpretation of the data.
- Comment on the data, especially in relation to what they reflect on the role of women. What are the main social and economic implications?
Table 3.7
Population aged 15 and over by activity status and sex, Mauritius, 1972 - 1986
Year and sex      1972 †                          1986 ‡
                  Male            Female          Male            Female
Activity status   ('000)  %       ('000)  %       ('000)  %       ('000)  %
Employed          169     68.7    44      17.7    238     70.2    106     30.9
Unemployed        35      14.2    7       2.8     37      10.9    18      5.2
Total active      204     82.9    51      20.5    275     81.1    124     36.2
Inactive          42      17.1    198     79.5    64      18.9    219     63.8
Total             246     100.0   249     100.0   339     100.0   343     100.0
Source : Central Statistical Office. † Census figures (74) ‡ Estimates (86)
The table reflects the considerable changes that have taken place between 1972 and 1986, in particular the large number of jobs created and the increased demand for female employment. The reduction in male unemployment probably implies a reduction in the social evils associated with unemployment : crime, violence, drug abuse, alcoholism, suicides etc. The greater participation of women in economic activity implies a changing role for women, showing a movement away from the traditional idea of home as the proper place for women. The greater employment of women also probably means increased prosperity for households but may be accompanied by difficulty in reconciling domestic and occupational responsibilities with the attendant consequences: strained relationships between spouses, neglect of children, etc. (The increased female unemployment is due not to low job creation but rather to the increased demand for jobs among women).
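The percentages in Table 3.7 are secondary statistics computed from the counts. As a sketch (the function name `activity_percentages` is hypothetical), the following Python code derives the total active and inactive figures and expresses each activity status as a percentage of the population aged 15 and over, rounded to one decimal place as in the table.

```python
# Counts in thousands from Table 3.7: (employed, unemployed, total aged 15 and over)
counts = {
    ("1972", "Male"): (169, 35, 246),
    ("1972", "Female"): (44, 7, 249),
    ("1986", "Male"): (238, 37, 339),
    ("1986", "Female"): (106, 18, 343),
}


def activity_percentages(employed, unemployed, total):
    """Each activity status as a percentage of the population aged 15 and over."""
    active = employed + unemployed
    inactive = total - active

    def percent(part):
        return round(100 * part / total, 1)

    return {
        "Employed": percent(employed),
        "Unemployed": percent(unemployed),
        "Total active": percent(active),
        "Inactive": percent(inactive),
    }


for (year, sex), (employed, unemployed, total) in counts.items():
    print(year, sex, activity_percentages(employed, unemployed, total))
```

Note that the base chosen for the percentages in Table 3.7 is the whole population aged 15 and over; a different base would give different figures, so the base should always be stated.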
Activity 2
(a) In a recent survey, 7381 children were studied, of whom 219 attended private schools. 78% were the children of manual workers but only 40 of these children attended private schools.
One child in every nine was an only child ("enfant unique"); among private school attenders, the proportion of only children was 20.1%, of whom 7 were the children of manual workers. Of the families with only one child, 567 came from the manual class.
Arrange these figures in a table, calculating any secondary statistics you consider necessary and comment on the results.
(b) Attempt Questions 1.5, 1.6, 1.17 from textbook (OJ)
3.3.3 The Stem and Leaf Diagram
Read p 14 of textbook (OJ).
Activity 3 Attempt Questions 1.15 and 1.16 from textbook (OJ).
3.3.4 The Time Series
Read pp 14-15 of textbook (OJ).
3.4 SECONDARY STATISTICS
Secondary statistics are simple calculations performed on the given data to help us interpret them. Some examples of secondary statistics are sub-totals, totals, averages, ratios and percentages.
Ratio
A ratio expresses the relationship between two quantities in terms of the number of units of each, to enable comparison.
Example 7
Three-quarters of the annual output of a factory consists of product A and one-quarter of product B. The ratio of the output is then 3:1. For every 3 units of A produced in a year, 1 unit of B is produced.
Percentage
"Percentage" (or per cent) means per hundred. Therefore 50 per cent is 50 out of a hundred, that is, one half. The symbol for percentage is %. To convert a fraction to a percentage, multiply it by 100: for example, ¼ equals 25% (¼ × 100 = 25).
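Example 7 and the conversion of ¼ to 25% can be reproduced exactly with Python's standard `fractions` module, which avoids rounding errors. The helper names `simplest_ratio` and `to_percentage` are hypothetical, and the sketch assumes both parts of the ratio are positive.

```python
from fractions import Fraction
from math import gcd


def simplest_ratio(a, b):
    """Express the ratio a:b in lowest whole-number terms (a and b may be fractions)."""
    a, b = Fraction(a), Fraction(b)
    # Multiply both parts by a common denominator so that they become whole numbers
    common = a.denominator * b.denominator
    x, y = int(a * common), int(b * common)
    # Divide both by their greatest common divisor
    g = gcd(x, y)
    return x // g, y // g


def to_percentage(fraction):
    """Convert a fraction to a percentage by multiplying by 100."""
    return Fraction(fraction) * 100


print(simplest_ratio(Fraction(3, 4), Fraction(1, 4)))  # (3, 1)
print(to_percentage(Fraction(1, 4)))  # 25
```

Output shares of three-quarters and one-quarter reduce to the ratio 3:1, as in Example 7.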
3.5 INTERPRETATION OF TABLES
When data are presented, it is important that tables provide information clearly and at the same time make an impact. Interpretation is a matter of judgement based on knowledge of the terms used in the table. It is not enough that a figure or the result of a calculation is accurate; the result has to be understood. There is little point in arriving at a correct answer to a calculation if it is not known what it means.
3.6 SUMMARY
In this unit, you have learnt about presentation of data using the different types of tables namely frequency tables, two way tables and general tabulation.
UNIT 4 ORGANISATION AND PRESENTATION OF DATA II
Unit Structure
4.0 Overview
4.1 Learning Objectives
4.2 Organisation and Presentation of Data II
4.2.1 Introduction
4.2.2 The Bar Chart
4.2.3 The Pie Chart
4.2.4 The Histogram
4.3 Summary
4.0 OVERVIEW
This unit introduces you to the methods of organising and presenting data using various charts and diagrams. The relevant topics are covered in part of Chapter 2 of your textbook (OJ), pp 28-39.
4.1 LEARNING OBJECTIVES
When you have successfully completed this Unit, you will be able to construct, interpret and use the following:
1. the Bar chart.
2. the Pie chart.
3. the Histogram.
4.2 ORGANISATION AND PRESENTATION OF DATA II
4.2.1 Introduction
Study pp 28-29 of your textbook (OJ).
There are some guidelines which are important for the construction of various charts, diagrams and graphs, in the same way as we discussed for the construction of tables in Section 3.3.2.1 of Unit 3.
Some guidelines common to all of them are:
• Be sure what you want your chart or diagram or graph to show.
• All charts, diagrams or graphs must have a title which is, as far as possible, self-explanatory.
• The source of the data must always be included (usually below the chart/diagram/graph).
• Units of measurement must be shown clearly.
• Axes should be labelled clearly and scales must be made convenient, explicit and clear.
• Where appropriate, a key must be given so as to explain clearly what each shading etc. represents.
• Charts, diagrams or graphs must be neat and tidy.
4.2.2 The Bar Chart
Study pp 29-33 of your textbook (OJ).
Your textbook covers the bar chart adequately; however, certain points need to be added with regard to the various charts developed from the idea of a bar chart.
It is desirable that a compound or component bar chart does not contain too many components; otherwise its impact on the reader may be blurred. Whenever there is a need to compare two data sets using component bar charts, it is advisable to use percentages rather than actual numbers: percentages make comparison easier, especially when charts or diagrams are used. Think why!
The example given in Fig. 2.5 of p 32 of your textbook is an example of what is commonly known as a multiple bar chart. Multiple bar charts are very useful when different characteristics [e.g. % of labour force employed in agriculture, agrarian output as % of GNP of various units of interest (e.g. countries)] need to be simultaneously presented. It is however desirable that not too many characteristics are included in the diagram; the chart might otherwise contain too much information and can become rather confusing.
Sometimes, bar charts or component bar charts are drawn with the bars horizontal; in some cases, the variable on the horizontal axis is time. Such an adaptation of the bar chart is known as a Gantt chart. It is used especially when planning a project over time and when monitoring the implementation of the project against the assigned time schedule.
Activity 1 Attempt Questions 2.8 and 2.9 of your textbook (OJ).
4.2.3 The Pie Chart
Study pp 33-34 of your textbook (OJ).
Your textbook tends to be too sceptical about the pie chart. In fact, the main objective of the pie chart is to show the relative importance of the component parts of a total. And the pie chart does this extremely well, provided there are not too many components.
The pie chart is used widely to present statistical data to the general public as well as to highlight any shift in the relative importance of the component parts of a total over time. In the latter case, two pie charts can be drawn for data available at two different points in time.
Table 4.1
URBAN POPULATION FOR ISLAND OF MAURITIUS
Municipal Council Area    1972       1983
Port-Louis                133,996    133,702
Beau-Bassin - Rose-Hill   80,318     90,577
Quatre-Bornes             50,770     63,682
Vacoas-Phoenix            47,638     53,090
Curepipe                  51,956     62,200
TOTAL                     364,678    403,251
Source: Annual Digest of Statistics, C.S.O., 1988
Represent the above information by means of pie-charts.
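In a pie chart, the angle of each sector is proportional to the share of that component in the total: angle = (component ÷ total) × 360°. The Python sketch below applies this to the 1972 column of Table 4.1 only; the function name `sector_angles` is hypothetical. The same function can then be applied to the 1983 column to complete the exercise.

```python
# 1972 urban population by Municipal Council area, from Table 4.1
population_1972 = {
    "Port-Louis": 133_996,
    "Beau-Bassin - Rose-Hill": 80_318,
    "Quatre-Bornes": 50_770,
    "Vacoas-Phoenix": 47_638,
    "Curepipe": 51_956,
}


def sector_angles(values):
    """Angle (in degrees) of each pie-chart sector: share of the total times 360."""
    total = sum(values.values())
    return {name: 360 * value / total for name, value in values.items()}


for name, angle in sector_angles(population_1972).items():
    print(f"{name:<25}{angle:7.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic before drawing the sectors with a protractor.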
4.2.4 The Histogram
Study pp 34-39 of your textbook (OJ).
Note that a histogram can only be constructed for continuous variables; thus a given discrete variable needs to be transformed into the appropriate continuous form before the histogram is constructed.
Table 4.2
Number of faults    Number of cars (frequency)
1                   18
2                   25
3                   19
4                   8
5                   3
6 or more           0
Total               73
The variable ‘number of faults’ is discrete and is first transformed into the continuous form as follows:
Table 4.3
Number of faults
0.5 and under 1.5
1.5 and under 2.5
2.5 and under 3.5
3.5 and under 4.5
4.5 and under 5.5
5.5 and above
The histogram is then constructed with the first rectangle having its base from 0.5 to 1.5, the second rectangle having its base from 1.5 to 2.5, and so on. Thus there is no gap between the rectangles: they must be contiguous, i.e. touching each other.
Table 4.4
Discrete form Continuous form
Number of calls Number of calls
10 - 19 9.5 and under 19.5
20 - 29 19.5 and under 29.5
30 - 39 29.5 and under 39.5, etc.
Another important point needs to be highlighted in the construction of a histogram. Occasionally, we come across frequency distributions whose class intervals differ greatly in width from one another. Consider, for example, the data relating to infant deaths in Table 4.5.
Table 4.5
Infant Deaths (Deaths of Children Under 1 Year of Age) by Age and Sex, Island of Mauritius, 1986-1988

                           1986                     1987                     1988
Age                        Both sexes  Male  Female Both sexes  Male  Female Both sexes  Male  Female
Under 1 day                         91    48      43         60    31      29         87    53      34
1 – 6 days                         191   118      73        183   111      72        174   117      57
7 – 27 days                         75    40      35         91    63      28         51    30      21
28 days – under 2 months            25    13      12         38    21      17         28    12      16
2 – 3 months                        35    28       7         35    19      16         37    17      20
4 – 5 months                        22    12      10         21    16       5         25    14      11
6 – 7 months                        11     4       7         13     6       7         13     8       5
8 – 9 months                        16     8       8         12     8       4         16     9       7
10 – 11 months                      14     8       6         10     8       2         10     6       4
Under 1 year                       480   279     201        463   283     180        441   266     175
Under 7 days                       282   166     116        243   142     101        261   170      91
Under 28 days                      357   206     151        334   205     129        312   200     112
28 days – under 1 year             123    73      50        129    78      51        129    66      63
Source: Central Statistical Office, Annual Digest of Statistics 1989.
In such cases, we first compute the frequency density which is defined as follows:
frequency density = frequency ÷ class width
Then the frequency density is used on the vertical axis, and the variable of interest is used on the horizontal axis as usual.
Table 4.6
Age             Number of deaths    Frequency density for both sexes, 1986
                (frequency)         (deaths per day)
Under 1 day     91                  91 ÷ 1 = 91
1 - 6 days      191                 191 ÷ 6 = 31.8
7 - 27 days     75                  75 ÷ 21 = 3.6
etc.            etc.                etc.
Thus the frequency density gives the number of deaths per unit of time (here, per day) and makes the frequencies of classes of different widths comparable. The fundamental principle underlying the histogram is that the frequency of a class is represented by the area of its rectangle, not by its height. The examples considered in your textbook are merely specific applications of this fundamental principle: in those examples, all or most class intervals have the same width except for two or three. Can you see the link?
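The calculation in Table 4.6 can be sketched in Python for the three classes shown. The sketch assumes, as the table does, that the widths of these classes are 1, 6 and 21 days; the remaining classes of Table 4.5 are measured in months, and their widths would first have to be converted to the same unit, which is not done here. The function name `frequency_densities` is hypothetical.

```python
# First three age classes of Table 4.5 (both sexes, 1986):
# (label, frequency, class width in days)
classes = [
    ("Under 1 day", 91, 1),
    ("1 - 6 days", 191, 6),
    ("7 - 27 days", 75, 21),
]


def frequency_densities(classes):
    """Frequency density = frequency / class width, for each class."""
    return [(label, frequency / width) for label, frequency, width in classes]


for (label, density), (_, frequency, width) in zip(frequency_densities(classes), classes):
    # The area of each histogram rectangle (density x width) equals the frequency
    print(f"{label:<12} density {density:5.1f} per day, area {density * width:.0f} = frequency {frequency}")
```

Multiplying each density back by its class width recovers the frequency, which is exactly the principle that the area, not the height, of each rectangle represents the frequency.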
The histogram and the frequency polygon give us a view of the shape of a given frequency distribution. In particular, they help to
(i) identify to what extent a particular distribution is asymmetrical, and
(ii) compare the shapes of two or more distributions.
For the latter case, we may use, for example, two histograms (drawn to the same scales) to highlight the change in the age structure of the population of Mauritius between 1972 and 1990 (both of which were census years).
Activity 3
(i) Attempt questions 2.7 and 2.14 of your textbook (OJ).
(ii) The age distributions of the population as enumerated at the censuses of 1972 and 1990 for Island of Mauritius are as follows:
Table 4.7
AGE DISTRIBUTION OF POPULATION FOR ISLAND OF MAURITIUS
Age group         1972       1990
(years)           ('000)     ('000)
9 and less        220.0      191.2
10 - 19           211.9      201.6
20 - 29           133.0      202.5
30 - 39           83.9       171.0
40 - 49           74.5       102.5
50 - 59           52.8       68.1
60 - 69           31.8       }
70 - 79           13.4       }  85.4 (60 and above)
80 and above      3.8        }
TOTAL             825.1      1,022.3
Source: (a) Annual Digest of Statistics (b) 1990 Census report, Volume II
(a) Illustrate, by means of histograms, the age distributions of the population of the Island of Mauritius for 1972 and 1990.
Comment on your findings.
(b) Draw the respective frequency polygons.
4.3 SUMMARY
In this unit, you have learnt about the presentation of data by using some charts/diagrams, namely the bar chart and its various adaptations, the pie chart and the histogram.