COMMUNITY DENTISTRY Biostatistics
II. Quantitative Data or Continuous Data or Numerical Data
2. Continuous Data
Occur when there is no limitation on the values that the variable can take.
Ex: weight or height
Sources of Data
2. Primary data
Obtained directly from the source
It is first hand information
Data can be obtained by means of questionnaires, interviews, or clinical examinations
3. Secondary data
Obtained from pre-existing records
It is second hand information
Data can be obtained from government records, hospital records etc.
Methods of Collecting Data
1. Census
Defined as the total process of collecting, compiling and publishing demographic, economic and social data pertaining at a specified time or times, to all persons in a country or a delimited territory
The first regular census in India was recorded in 1881.
Census is conducted every 10 years in India (MAHE-99)
The most recent census in India was recorded in February, 2011.
Census act was passed by the parliament of India in 1948.
‘Census Commissioner of India’ is the chief officer for census enumeration.
Advantages
o Complete information
Disadvantages
o Expensive, time consuming, needs more man-power, lesser accuracy.
© BRIHASPATHI ACADEMY ׀ SUBSCRIBER’S COPY ׀ NOT FOR SALE
2. Sampling
Sample is a portion of a population, selected from the population in some manner
A sampling unit is defined as each member of a sample. (AIPG-09)
Importance of Sampling
The physical impossibility of checking all the items in the population
Adequate accuracy of sampling results
Cost of study in the entire population
Saving the time
Types
1. Purposive Sampling
i. Judgment Sampling
Selection of samples is left to the Judgment of investigator.
In this sampling technique, the accuracy of results depends upon investigator.
Indications
o Employed mainly when population is small
o Employed to conduct pilot study
Limitations
o Accuracy of results depends upon the knowledge of the investigator.
o If investigator is biased, it affects the acceptance or rejection of a hypothesis
ii. Convenience Sampling (Chunk Sampling/ incidental sampling )
Chunk is a fraction of the population, which is selected because it is conveniently available to the investigator.
Ex: In order to estimate oral hygiene status of school children in a city, the investigator may select a few schools nearby his work. Results of this sampling are rarely representative because they are generally biased.
iii. Quota Sampling
Each investigator is allotted quota of persons which are to be interviewed.
Investigators are given instructions to interview persons within the quota with some specified characteristics.
Ex: Persons within the quota of 10 house wives, 6 professionals.
2. Random Sampling
The sample is selected using random techniques.
Selection bias is avoided.
i. Simple Random Sampling (unrestricted random sampling )
The procedure of selecting a sample in which, every item in a population has an equal chance of being included in the sample. (MAHE-97)
Applicable when the population is very small, homogeneous and readily available
Lottery method
Advantages
o Eliminates selection bias
Disadvantages
o Selection of sample is costly and time consuming
Limitation
o Difficult to collect data for large samples
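The lottery method can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample`, the `seed` parameter and the list of 20 patient numbers are assumptions made for the example, not part of the notes.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement so every unit has an equal chance."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(list(population), n)


# Hypothetical sampling frame: patients numbered 1 to 20.
patients = list(range(1, 21))
sample = simple_random_sample(patients, 5, seed=1)
print(sample)
```

Each run with a different seed gives a different sample, which is the point of the method: the investigator's judgment plays no part in the selection.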
ii. Systematic Random Sampling
The sample is formed by selecting one unit at random and then selecting additional units at evenly spaced intervals (sample interval) until a sample of the required size has been formed.
It is applied to field studies when the population is large, scattered & homogenous.
Sample interval is calculated by the following formula K = N/n
Where, K - sample interval or sample ratio, N - population size and n - Sample size
Ex: If 150 patients are to be included in the sample from a population of 3000, K = 3000/150 = 20
Advantages
o Systematic design is simple, convenient to adopt
o The time & labor in collection of sample is relatively small
o It gives accurate results when population is large
Limitation
o Requires a pre-formed list
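The formula K = N/n and the worked example (N = 3000, n = 150, K = 20) can be sketched as follows. The function name `systematic_sample`, the random start within the first interval and the numbered patient list are assumptions made for the illustration.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start, then every K-th unit, where K = N / n."""
    N = len(population)
    if n <= 0 or n > N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                 # sample interval (integer part of N / n)
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start inside the first interval
    return [population[start + i * k] for i in range(n)]


# Pre-formed list of 3000 patients, as in the example: K = 3000 / 150 = 20.
patients = list(range(1, 3001))
sample = systematic_sample(patients, 150, seed=0)
print(sample[:5])
```

The sketch also shows why a pre-formed list is required: units are picked by their position in the list.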
iii. Stratified Random Sampling (KAR-99)
If the population is heterogeneous, simple random sampling is not effective.
The purpose of this sampling is to increase the efficiency of sampling by dividing a heterogeneous population into homogenous groups. These homogenous groups are termed as strata.
Ex: Areas, classes, age groups, sexes etc.
Advantages
o There is a greater precision of results
o It gives better results when population is scattered
o More representativeness & accuracy
Disadvantages
o It is a highly technical and time consuming method.
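A minimal sketch of stratified random sampling is given below. It assumes proportional allocation (the same fraction is drawn from every stratum), which the notes do not specify. The function name `stratified_sample`, the age-group strata and the population of 200 people are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Split units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Assumption: proportional allocation, at least one unit per stratum.
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: 50 children and 150 adults, stored as (id, age group).
people = [(i, "child" if i % 4 == 0 else "adult") for i in range(200)]
chosen = stratified_sample(people, lambda p: p[1], 0.1, seed=42)
children = [p for p in chosen if p[1] == "child"]
adults = [p for p in chosen if p[1] == "adult"]
print(len(children), len(adults))
```

Because each stratum is sampled separately, both age groups are guaranteed to appear in the sample in the same proportion as in the population.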
iv. Cluster Sampling
In this sampling the required number of groups or clusters is selected by simple random sampling.
Then all the individuals present in those clusters are included in the sample (KAR-04)
Advantages
o Simpler
o Involves less time and cost
Indication
o When population is vast & scattered over a wide area and the population forms natural groups (called clusters), cluster sampling is applicable.
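Cluster sampling as described here (clusters chosen by simple random sampling, then every individual in the chosen clusters included) can be sketched as below. The function name `cluster_sample` and the school data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random and include everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample


# Hypothetical natural groups: children listed by school.
schools = {
    "School A": ["a1", "a2", "a3"],
    "School B": ["b1", "b2"],
    "School C": ["c1", "c2", "c3", "c4"],
    "School D": ["d1"],
}
chosen, sample = cluster_sample(schools, 2, seed=7)
print(chosen, sample)
```

Only the list of clusters is needed in advance, not a list of every individual, which is why the method suits a vast and scattered population.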
v. Multistage Sampling
As the name implies this method refers to the sampling procedures carried out in several stages using random sampling technique.
Indication
o When the study involves very large population, like nationwide surveys
vi. Multiphase Sampling
In this method, part of the information is collected from the whole sample & part from the sub sample.
Advantages
o Economic, yet purposeful
o Saves time and manpower
Errors in Sampling
Sampling Errors
• Faulty sampling design
• Small sample size
Non sampling Errors
• Coverage error
• Observational error
• Processing error
PRESENTATION OF DATA
• Statistical data once collected should be systematically arranged and presented,
To arouse interest of readers
For data reduction
To bring out important points clearly and strikingly
For easy grasp and meaningful conclusions
To facilitate further analysis
To facilitate communication
• Two main types of data presentation are
I. Tabulation
II. Graphic representation with charts and diagrams
I. Tabulation
• It is the most common method
• Data presentation is in the form of columns and rows
• It can be of the following types
1. Simple tables
2. Frequency distribution tables
1. Simple Table
Month        Number of in patients
Jan 06       2,800
Feb 06       1,900
March 06     1,750
2. Frequency distribution table
In a frequency distribution table, the data is first split into convenient groups (class interval) and the number of items (frequency) which occurs in each group is shown in adjacent column.
Number of Cavities    Number of Patients
0 to 3                78
3 to 6                67
6 to 9                32
9 and above           16
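The grouping step can be sketched in Python. The cavity counts below are made up for the illustration and the function name `frequency_table` is hypothetical. The sketch also assumes that a value on a class boundary goes into the higher class (3 is counted in "3 to 6"), since the table does not say how boundary values are handled.

```python
def frequency_table(values, width, last_lower):
    """Count non-negative whole numbers into classes of equal width.

    Assumption: each class includes its lower limit and excludes its
    upper limit; values of last_lower or more go into the open class.
    """
    counts = {f"{lo} to {lo + width}": 0 for lo in range(0, last_lower, width)}
    open_label = f"{last_lower} and above"
    counts[open_label] = 0
    for v in values:
        if v >= last_lower:
            counts[open_label] += 1
        else:
            lo = (v // width) * width
            counts[f"{lo} to {lo + width}"] += 1
    return counts


# Hypothetical raw data: number of cavities in 12 patients.
cavities = [0, 2, 3, 5, 1, 7, 9, 12, 4, 2, 6, 0]
table = frequency_table(cavities, width=3, last_lower=9)
for label, freq in table.items():
    print(f"{label:<12} {freq}")
```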
II. Charts and diagrams
Ideal requirements of Charts and diagrams
• Self explanatory
• Simple and consistent with the data
• Values of the variables should be shown on horizontal or X-axis and frequency on vertical line or Y-axis
• Too many lines on the graph should be avoided
• The scale of presentation should be indicated at the right hand top corner of the graph
• The scale of division of the two axes should be proportional
• The details of the variables and frequencies presented on the axes should be mentioned
Types of Diagrams
1. Simple Bar
Represent qualitative data
Only one variable can be represented
Width of the bar remains the same
The length varies according to the frequency
Bars may be drawn vertically or horizontally
It represents only one classification, thus cannot be used for comparison.
2. Multiple Bar
It is used to compare qualitative data with respect to a single variable.
Eg: with respect to sex, time or region.
Each category of the variable has a set of bars of the same width, one for each section, drawn without any gap between them; the length of each bar corresponds to the frequency.
3. Component Bar
It represents qualitative data.
We can represent the number of cases in major groups as well as the subgroups simultaneously, using component bar diagram.
First, rectangles are drawn, proportional to the number of cases of the major group. Then, each rectangle is divided into components, proportional to the numbers in the subgroups.
4. Histogram (AP-01, 03, KAR-10)
Most widely used to represent quantitative data of continuous type.
It is a bar diagram without gap between the bars.
It represents a frequency distribution.
X-axis: the size of an observation is marked. Starting from 0, the limit of each class interval is marked. The width of each bar corresponds to the width of the class interval.
Y-axis: the frequencies are marked. A rectangle is drawn above each class interval with height proportional to the frequency of that class interval.
5. Frequency Polygon
It represents frequency distribution of quantitative data
It facilitates comparison of two or more frequency distributions.
A point is marked over the mid-point of the class interval, corresponding to the frequency.
The first point and last point of each class interval are joined to the midpoint of previous and next class respectively. All the points are connected by straight lines.
To compare two or more frequency distributions, lines of different types are drawn on the same graph.
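The construction of the polygon can be sketched as a list of points to be joined by straight lines. The class intervals and frequencies are hypothetical, as is the function name `polygon_points`. The sketch assumes equal class widths and assumes that the classes before the first and after the last have zero frequency, which is the usual way of closing the polygon on the X axis.

```python
def polygon_points(class_limits, frequencies):
    """Return the (mid-point, frequency) points of a frequency polygon."""
    width = class_limits[0][1] - class_limits[0][0]  # assumes equal widths
    mids = [(lo + hi) / 2 for lo, hi in class_limits]
    points = list(zip(mids, frequencies))
    # Close the polygon at the mid-points of the previous and next class,
    # both taken to have zero frequency (assumption).
    first = (mids[0] - width, 0)
    last = (mids[-1] + width, 0)
    return [first] + points + [last]


# Hypothetical age classes (years) and number of patients in each.
points = polygon_points([(10, 20), (20, 30), (30, 40)], [5, 12, 7])
print(points)
```

To compare two distributions, the same function would be called once per distribution and the two sets of points drawn with different line types on the same axes.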
6. Line Diagram
It is useful to study the changes of values in the variables over time. (AIIMS-01)
Time is represented on X axis and frequency of the variable on Y axis.
Facilitates comparison of data among different groups in a simple way
7. Pie Chart/Sector Diagram
Used to present data, expressed in percentages (KAR-04, COMEDK
The frequency of the group is shown in a circle.
Degree of angle denotes the frequency.
Instead of comparing the length of bar, the areas of segments are compared.
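Since a full circle is 360 degrees, the angle of each sector follows directly from the share of each group. A small sketch, with hypothetical groups and counts and a hypothetical function name `sector_angles`:

```python
def sector_angles(frequencies):
    """Convert group frequencies into pie chart angles in degrees."""
    total = sum(frequencies.values())
    return {group: 360 * f / total for group, f in frequencies.items()}


# Hypothetical data: 100 patients grouped by main complaint.
counts = {"Caries": 50, "Gingivitis": 30, "Malocclusion": 20}
angles = sector_angles(counts)
print(angles)
```

With percentages, the same rule reduces to 3.6 degrees for every 1 percent.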
8. Scatter/Dot Diagram
It is used to show the association between two quantitative variables.
The imaginary line drawn through the center of the scatter shows the
9. Cartograms or Spot Map
It shows geographical distribution of frequencies of a characteristic.
Easy to understand and condenses a lot of information into a simple picture.
10. Pictogram
The pictures representing the value of items are called pictograms.
It is the most useful way of representing data to lay groups.
MEASURES OF STATISTICAL AVERAGES OR CENTRAL TENDENCY
• Single estimate of a series of data that summarizes the data is known as the parameter and one such parameter is the measure of central tendency.
• Objective: to condense the entire mass of data to facilitate comparison with other data measured on the same grounds.
Ideal Properties of Central Tendency
• Should be easy to understand and compute
• Should be based on each and every item in the data
• Should not be affected by extreme values
• If the measure of central tendency is calculated for different samples from the same population, the values should not differ from each other markedly
Types
1. Arithmetic mean – mathematical estimate
2. Median – positional estimate
3. Mode – based on frequency
1. Arithmetic Mean/Mean (MAHE-95, 98, 99, 2K, AIIMS-01, PGI-02)
The simplest measure of central tendency
It is the summation of all the observations divided by the total number of observations (n)
Denoted by X̄ for sample and µ for population
Mean = Sum of all the observations of the data / Number of observations in the data
X̄ = ΣXi / n
Σ means the sum of.
Xi is the value of each observation in the data
n is the number of observations in the data
Properties of the Mean
Uniqueness – For a given set of data there is one and only one mean
Simplicity – It is easy to understand and to compute
Affected by extreme values, since all values enter into the computation
2. Median (UPSC-01, KAR-02, AIIMS-04)
When the data are ordered, it is the observation that divides the set of observations into two equal parts such that half of the data are before it and the other half are after it.
If n is odd, the median will be the middle of observations. It will be the (n+1)/2th ordered observation. When n = 11, then the median is the 6th observation.
If n is even, there are two middle observations. The median will be the mean of these two middle observations. It will be the mean of the [(n/2)th, (n/2 +1)th] ordered observation. When n = 12, then the median is the 6.5th observation, which is an observation halfway between the 6th and 7th ordered observation.
Calculation of Median
o Observations are arranged in the ascending or descending order of magnitude & then the middle value of the observations is Median.
o In case of even number of observations, the average of the two middle values is Median
Properties of the Median
Uniqueness – For a given set of data there is one and only one median
Simplicity – It is easy to calculate
It is not affected by extreme values as is the mean
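The odd and even rules above can be written out directly. This is a sketch with a hypothetical function name `median_by_position`; the only adjustment is that Python lists count positions from 0, so the k-th ordered observation sits at index k - 1.

```python
def median_by_position(values):
    """Median from the ordered observations, following the (n+1)/2 rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the (n+1)/2-th ordered observation
        return ordered[(n + 1) // 2 - 1]
    # n even: mean of the (n/2)-th and (n/2 + 1)-th ordered observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


eleven = list(range(1, 12))   # n = 11, median is the 6th observation
twelve = list(range(1, 13))   # n = 12, median lies between the 6th and 7th
print(median_by_position(eleven), median_by_position(twelve))
```

Replacing the largest value by a far larger one leaves the result unchanged, which illustrates that the median is not affected by extreme values.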
3. Mode (KAR-03, AIIMS-08)
The value in a series of observations, which occurs with the greatest frequency
Example
o Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8
Mean = 35 / 10 = 3.5
Median: ordered values are 0, 1, 2, 2, 2, 3, 3, 4, 8, 10; Median = (2 + 3) / 2 = 2.5
Mode = 2 (3 Times)
Properties of the Mode
Sometimes, it is not unique.
It may be used for describing qualitative data.
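The three averages for the decayed teeth example can be checked with Python's standard `statistics` module. The ten values add up to 35. The mode is found with a frequency count so that ties, where the mode is not unique, would also be reported; the variable names are chosen only for this illustration.

```python
import statistics
from collections import Counter

# Number of decayed teeth in 10 children (data from the example).
decayed = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed)      # sum of observations / n
median = statistics.median(decayed)  # mean of the two middle values, n is even

# Mode: every value that occurs with the greatest frequency.
counts = Counter(decayed)
highest = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == highest)

print(mean, median, modes, highest)
```

The mean is pulled above the median by the two extreme values 8 and 10, while the median and mode stay close to the bulk of the observations.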
TYPES OF VARIABILITY
• There are three types of variability
1. Biological variability
2. Real variability
3. Experimental variability
i. Observer Error
ii. Instrumental Error
iii. Sampling Error
1. Biological Variability
It is the natural difference which occurs in individuals due to age, gender and other attributes which are inherent
This difference is small and occurs by chance and is within certain accepted biological limits
Ex: vertical dimension may vary from patient to patient