Continuous Data - Quantitative Data or Continuous Data or Numerical Data

COMMUNITY DENTISTRY Biostatistics

II. Quantitative Data or Continuous Data or Numerical Data

2. Continuous Data

Occur when there is no limitation on the values that the variable can take.

Ex: weight or height

Sources of Data

2. Primary data

Obtained directly from the source

It is first hand information

Data can be obtained by means of questionnaires, interviews, or clinical examinations

3. Secondary data

Obtained from pre-existing records

It is second hand information

Data can be obtained from government records, hospital records, etc.

Methods of Collecting Data

1. Census

Defined as the total process of collecting, compiling and publishing demographic, economic and social data pertaining, at a specified time or times, to all persons in a country or a delimited territory

The first regular census in India was recorded in 1881.

Census is conducted every 10 years in India (MAHE-99)

Recent census in India was recorded in February, 2011.

The Census Act was passed by the Parliament of India in 1948.

‘Census Commissioner of India’ is the chief officer for census enumeration.

Advantages

o Complete information

Disadvantages

o Expensive, time consuming, needs more man-power, lesser accuracy.


A sample is a portion of a population, selected from the population in some manner

A sampling unit is defined as representing every member of the sample. (AIPG-09)

Importance of Sampling

The physical impossibility of checking all the items in the population

Adequate accuracy of sampling results

Cost of study in the entire population

Saving the time

Types

1. Purposive Sampling

i. Judgment Sampling

Selection of samples is left to the judgment of the investigator.

In this sampling technique, the accuracy of results depends upon the investigator.

Indications

o Employed mainly when population is small

o Employed to conduct pilot study

Limitations

o Accuracy of results depends upon the knowledge of the investigator.

o If investigator is biased, it affects the acceptance or rejection of a hypothesis

ii. Convenience Sampling (Chunk Sampling/ incidental sampling )

Chunk is a fraction of population, which is selected because it is conveniently available for investigator.

Ex: In order to estimate oral hygiene status of school children in a city, the investigator may select a few schools nearby his work. Results of this sampling are rarely representative because they are generally biased.

iii. Quota Sampling

Each investigator is allotted quota of persons which are to be interviewed.

Investigators are given instructions to interview persons within the quota with some specified characteristics.

Ex: Persons within the quota of 10 housewives, 6 professionals.

2. Random Sampling

The sample is selected using random techniques.

Selection bias is avoided.

i. Simple Random Sampling (unrestricted random sampling )

The procedure of selecting a sample in which, every item in a population has an equal chance of being included in the sample. (MAHE-97)

Applicable when the population is very small, homogeneous and readily available

Lottery method

Advantages

o Eliminates selection bias

Limitations

o Selection of sample is costly and time consuming

o Difficult to collect data for large samples
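The lottery method above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the patient ID list and the seed are assumptions made for the example, not part of the notes.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(population, n)  # sampling without replacement, like a lottery


# Hypothetical sampling frame: patient IDs 1 to 50
patients = list(range(1, 51))
chosen = simple_random_sample(patients, 5, seed=1)
print(chosen)
```

Because the draw needs a complete list of the population, this approach suits small, readily available populations, as noted above.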

ii. Systematic Random Sampling

The sample is formed by selecting one unit at random and then selecting additional units at evenly spaced intervals (sample interval) until a sample of the required size has been formed.

It is applied to field studies when the population is large, scattered and homogeneous.

Sample interval is calculated by the following formula K = N/n

Where, K - sample interval or sample ratio, N - population size and n - Sample size

Ex: If 150 patients are to be included in the sample from a population of 3000, K = 3000/150 = 20

Advantages

o Systematic design is simple, convenient to adopt

o The time & labor in collection of sample is relatively small

o It gives accurate results when population is large

Limitation

o Requires a pre-formed list
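The sample interval K = N/n and the evenly spaced selection can be sketched as follows. The sketch assumes that N is an exact multiple of n and that the random start is chosen within the first interval; the function name and the patient list are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start in the first interval, then every K-th unit after it."""
    N = len(population)
    k = N // n  # sample interval K = N/n (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position 0..k-1 within the first interval
    return [population[start + i * k] for i in range(n)]


# Example from the text: 150 patients from a population of 3000, so K = 20
patients = list(range(1, 3001))  # the pre-formed list
sample = systematic_sample(patients, 150, seed=7)
print(sample[:5])
```

Only the first unit is random; every later unit is fixed by the interval, which is why a pre-formed list is required.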

iii. Stratified Random Sampling (KAR-99)

If the population is heterogeneous, simple random sampling is not effective.

The purpose of this sampling is to increase the efficiency of sampling by dividing a heterogeneous population into homogeneous groups. These homogeneous groups are termed strata.

Ex: Areas, classes, age groups, sexes etc.

Advantages

o There is a greater precision of results

o It gives better results when population is scattered

o More representativeness & accuracy

Disadvantages

o It is a highly technical and time-consuming method.
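A minimal sketch of stratified random sampling is given below. The notes do not say how many units to take from each stratum, so the sketch assumes proportional allocation (the same fraction from every stratum); the function name and the children data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Split units into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units into homogeneous strata
    sample = []
    for name in sorted(strata):
        members = strata[name]
        size = max(1, round(len(members) * fraction))  # assumed proportional allocation
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population: 60 children aged 6-8 years and 40 aged 9-12 years
children = [(i, "6-8 yrs" if i < 60 else "9-12 yrs") for i in range(100)]
sample = stratified_sample(children, lambda child: child[1], 0.1, seed=3)
print(len(sample))
```

Each age group is represented in proportion to its size, which is the source of the greater representativeness listed under Advantages.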

iv. Cluster Sampling

In this sampling, the required number of groups or clusters is selected by simple random sampling.

Then all the individuals present in those clusters are included in the sample (KAR-04)

Advantages

o Simpler

o Involves less time and cost


o When population is vast & scattered over a wide area and the population forms natural groups (called clusters), cluster sampling is applicable.
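Cluster sampling as described above can be sketched as: choose clusters by simple random sampling, then include every individual in the chosen clusters. The school names and child labels below are made up for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then take all individuals inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of cluster names
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical natural groups: schools and the children enrolled in them
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1"],
}
chosen_schools, children = cluster_sample(schools, 2, seed=5)
print(chosen_schools, children)
```

The sampling unit here is the cluster, not the individual, so the final sample size depends on which clusters happen to be drawn.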

v. Multistage Sampling

As the name implies this method refers to the sampling procedures carried out in several stages using random sampling technique.

Indication

o When the study involves very large population, like nationwide surveys

vi. Multiphase Sampling

In this method, part of the information is collected from the whole sample & part from the sub sample.

Advantages

o Economic, yet purposeful

o Saves time and manpower

Errors in Sampling

Sampling Errors

• Faulty sampling design

• Small sample size

Non sampling Errors

• Coverage error

• Observational error

• Processing error

PRESENTATION OF DATA

• Statistical data once collected should be systematically arranged and presented,

To arouse interest of readers

For data reduction

To bring out important points clearly and strikingly

For easy grasp and meaningful conclusions

To facilitate further analysis

To facilitate communication

• Two main types of data presentation are

I. Tabulation

II. Graphic representation with charts and diagrams

I. Tabulation

• It is the most common method

• Data presentation is in the form of columns and rows

• It can be of the following types

1. Simple tables

2. Frequency distribution tables

1. Simple Table

Month        Number of in-patients
Jan 06       2,800
Feb 06       1,900
March 06     1,750

2. Frequency distribution table

In a frequency distribution table, the data is first split into convenient groups (class intervals) and the number of items (frequency) which occurs in each group is shown in the adjacent column.

Number of Cavities    Number of Patients
0 to 3                78
3 to 6                67
6 to 9                32
9 and above           16
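The grouping step behind a frequency distribution table can be sketched as below. The raw cavity counts are invented for the example, and because the class limits in the table touch (0 to 3, 3 to 6), the sketch assumes that a value equal to a shared limit is counted in the higher class.

```python
def frequency_table(values, width=3, last_lower=9):
    """Count how many whole-number observations fall in each class interval."""
    labels = [f"{lo} to {lo + width}" for lo in range(0, last_lower, width)]
    open_label = f"{last_lower} and above"
    counts = dict.fromkeys(labels + [open_label], 0)
    for v in values:
        if v >= last_lower:
            counts[open_label] += 1  # open-ended last class
        else:
            lo = (v // width) * width  # lower limit of the class containing v
            counts[f"{lo} to {lo + width}"] += 1
    return counts


# Hypothetical raw data: number of cavities in 10 patients
cavities = [0, 2, 3, 5, 6, 6, 8, 9, 12, 1]
table = frequency_table(cavities)
for interval, frequency in table.items():
    print(interval, frequency)
```

The frequencies always add up to the number of observations, which is a quick check on any frequency distribution table.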

Ideal requirements of Charts and diagrams

• Self explanatory

• Simple and consistent with the data

• Values of the variables should be shown on the horizontal or X-axis and frequency on the vertical or Y-axis

• Too many lines on the graph should be avoided

• The scale of presentation should be indicated at the right hand top corner of the graph

• The scale of division of the two axes should be proportional

• The details of the variables and frequencies presented on the axes should be mentioned

Types of Diagrams

1. Simple Bar

Represents qualitative data

Only one variable can be represented

Width of the bar remains the same

The length varies according to the frequency

Bars may be drawn vertically or horizontally

It represents only one classification, thus cannot be used for comparison.

2. Multiple Bar

It is used to compare qualitative data with respect to a single variable.

Eg: with respect to sex, time or region.

Each category of the variable has a set of bars of the same width, one for each section, drawn without any gap between them; the length of each bar corresponds to the frequency.

3. Component Bar

It represents qualitative data.

We can represent the number of cases in major groups as well as the subgroups simultaneously, using component bar diagram.

First, rectangles are drawn, proportional to the number of cases of the major group. Then, each rectangle is divided into components, proportional to the numbers in the subgroups.

4. Histogram (AP-01, 03, KAR-10)

Most widely used to represent quantitative data of continuous type.

It is a bar diagram without gap between the bars.

It represents a frequency distribution.

X-axis: the size of an observation is marked. Starting from 0, the limit of each class interval is marked. The width of each bar corresponds to the width of the class interval.

Y-axis: the frequencies are marked. A rectangle is drawn above each class interval with height proportional to the frequency of that class interval.

5. Frequency Polygon

It represents frequency distribution of quantitative data

It facilitates comparison of two or more frequency distributions.

A point is marked over the mid-point of the class interval, corresponding to the frequency.

The first point and last point of each class interval are joined to the midpoint of the previous and next class respectively. All the points are connected by straight lines.

To compare two or more frequency distributions, lines of different types are drawn on the same graph.

6. Line Diagram

It is useful to study the changes of values in the variables over time. (AIIMS-01)

Time is represented on X axis and frequency of the variable on Y axis.

Facilitates comparison of data among different groups in a simple way

7. Pie Chart/Sector Diagram

Used to present data, expressed in percentages (KAR-04, COMEDK

The frequency of the group is shown in a circle.

Degree of angle denotes the frequency.

Instead of comparing the length of bars, the areas of segments are compared.
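Since the angle of each sector denotes the frequency, the angle can be worked out as (frequency / total) × 360 degrees. The sketch below illustrates this with invented oral hygiene figures; the function name and data are assumptions for the example only.

```python
def sector_angles(frequencies):
    """Convert group frequencies into pie chart sector angles in degrees."""
    total = sum(frequencies.values())
    return {group: 360 * f / total for group, f in frequencies.items()}


# Hypothetical data: oral hygiene status of 100 children
status = {"Good": 50, "Fair": 30, "Poor": 20}
angles = sector_angles(status)
print(angles)  # Good 180, Fair 108, Poor 72 degrees
```

The angles always add up to 360 degrees, just as the percentages add up to 100.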

8. Scatter/Dot Diagram

It is used to show the association between two quantitative variables.

The imaginary line drawn through the center of the scatter shows the

9. Cartograms or Spot Map

It shows geographical distribution of frequencies of a characteristic.

Easy to understand and condenses a lot of information into a simple picture.

10. Pictogram

The pictures representing the value of items are called pictograms.

It is the most useful way of representing data to lay groups.

MEASURES OF STATISTICAL AVERAGES OR CENTRAL TENDENCY

• A single estimate of a series of data that summarizes the data is known as a parameter, and one such parameter is the measure of central tendency.

• Objective: to condense the entire mass of data to facilitate comparison with other data measured on the same grounds.

Ideal Properties of Central Tendency

• Should be easy to understand and compute

• Should be based on each and every item in the data

• Should not be affected by extreme values

• When the measure of central tendency is calculated for different samples, the values should not differ from each other markedly

Types

1. Arithmetic mean – mathematical estimate

2. Median – positional estimate

3. Mode – based on frequency

1. Arithmetic Mean/Mean (MAHE-95, 98, 99, 2K, AIIMS-01, PGI-02)

The simplest measure of central tendency

It is the summation of all the observations divided by the total number of observations (n)

Denoted by $\bar{X}$ for sample and µ for population

Mean = (Sum of all the observations of the data) / (Number of observations in the data)

$\bar{X} = \frac{\sum X_i}{n}$

$\sum$: means the sum of.

$X_i$: is the value of each observation in the data

n: is the number of observations in the data


Properties of the Mean

Uniqueness – For a given set of data there is one and only one mean

Simplicity – It is easy to understand and to compute

Affected by extreme values, since all values enter into the computation

2. Median (UPSC-01, KAR-02, AIIMS-04)

When the data are ordered, the median is the observation that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it.

If n is odd, the median is the middle observation, i.e. the ((n+1)/2)th ordered observation. When n = 11, the median is the 6th observation.

If n is even, there are two middle observations, and the median is their mean, i.e. the mean of the (n/2)th and (n/2 + 1)th ordered observations. When n = 12, the median is the "6.5th" observation, a value halfway between the 6th and 7th ordered observations.

Calculation of Median

o Observations are arranged in the ascending or descending order of magnitude & then the middle value of the observations is Median.

o In case of even number of observations, the average of the two middle values is Median

Properties of the Median

Uniqueness – For a given set of data there is one and only one median

Simplicity – It is easy to calculate

It is not affected by extreme values as is the mean

3. Mode (KAR-03, AIIMS-08)

The value in a series of observations which occurs with the greatest frequency

Example

o Number of decayed teeth in 10 children: 2, 2, 4, 1, 3, 0, 10, 2, 3, 8

Mean = 35 / 10 = 3.5

Median: ordered data are 0, 1, 2, 2, 2, 3, 3, 4, 8, 10; median = (2 + 3) / 2 = 2.5

Mode = 2 (occurs 3 times)

Properties of the Mode

Sometimes, it is not unique.

It may be used for describing qualitative data.
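The three averages for the decayed teeth example can be checked with Python's standard `statistics` module. This is only a verification sketch; the second, smaller data set is invented to show that the mode need not be unique.

```python
import statistics

# Number of decayed teeth in 10 children (example from the text)
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed_teeth)      # sum of observations / n = 35 / 10
median = statistics.median(decayed_teeth)  # mean of the 5th and 6th ordered values
mode = statistics.mode(decayed_teeth)      # most frequent value

print(mean, median, mode)  # 3.5 2.5 2

# Hypothetical data with two modes, showing that the mode is sometimes not unique
modes = statistics.multimode([1, 1, 2, 2, 3])
print(modes)  # [1, 2]
```

Note how the two extreme values (8 and 10) pull the mean above the median, in line with the properties listed above.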

TYPES OF VARIABILITY

• There are three types of variability

1. Biological variability

2. Real variability

3. Experimental variability

i. Observer Error

ii. Instrumental Error

iii. Sampling Error

Biological variability is the natural difference which occurs in individuals due to age, gender and other inherent attributes

This difference is small and occurs by chance and is within certain accepted biological limits

Ex: vertical dimension may vary from patient to patient

In document Brihaspathi Synopsis (Page 173-181)