COMMUNITY DENTISTRY Biostatistics
II. Quantitative Data or Continuous Data or Numerical Data
4. Coefficient of variation
It is used to compare the variability of attributes measured in different units, e.g. height and weight
Denoted by CV
$CV = \dfrac{SD}{\text{Mean}} \times 100$
It is expressed as a percentage
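As a minimal sketch, the CV formula can be computed with Python's standard library. The height and weight values below are hypothetical, and using the sample SD (`statistics.stdev`, n - 1 denominator) is an assumption, since the notes do not say which SD to use.

```python
import statistics

def coefficient_of_variation(values):
    """Return CV = SD / mean * 100, expressed as a percentage."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator): an assumption
    return sd / mean * 100

# Hypothetical data: heights (cm) and weights (kg) of the same five people
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 70, 80]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"CV height = {cv_height:.2f}%, CV weight = {cv_weight:.2f}%")
```

Because CV is unit-free, the two results can be compared directly: in this made-up sample the weights (CV about 17.7%) are relatively more variable than the heights (CV about 9.3%), even though one is in kg and the other in cm.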
NORMAL DISTRIBUTION OR NORMAL CURVE
• Also called the Gaussian curve, after Gauss, who described it. (KAR-03)
Characteristics
Bell shaped
Bilaterally symmetrical
Frequency rises from one side, reaches its peak, and then falls in exactly the way it had risen
The highest point corresponds to the mean, median and mode, which coincide
Mean ± 1 SD includes 68.27% of all observations. Such observations are fairly common
Mean ± 2 SD includes 95.45% of all observations. By convention, values beyond this range are uncommon or rare: only 100 - 95.45 = 4.55% of observations in a normal population fall outside it.
Mean ± 3 SD includes 99.73% of all observations. Values beyond this range are very rare: only 0.27% of observations fall outside it.
These limits on either side of the mean are called Confidence Limits
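The 68.27%, 95.45% and 99.73% figures can be checked with a short sketch. It relies on the standard identity that the fraction of a normal distribution within mean ± k SD equals erf(k / √2); Python's `math.erf` is used here in place of a printed table.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within mean ± k SD.

    For any normal curve this equals erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = fraction_within(k) * 100
    outside = 100 - inside
    print(f"Mean ± {k} SD: {inside:.2f}% inside, {outside:.2f}% outside")
# Mean ± 1 SD: 68.27% inside, 31.73% outside
# Mean ± 2 SD: 95.45% inside, 4.55% outside
# Mean ± 3 SD: 99.73% inside, 0.27% outside
```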
The shape of a frequency distribution curve varies with its mean and SD, so curves from different studies cannot be compared directly. Eg- if one study has an SD of 3 and another an SD of 2, their curves are difficult to compare. It therefore becomes necessary to standardize the curve
The normal curve is standardized by expressing each measurement as its distance from the mean in units of standard deviation.
The curve that emerges from this procedure is called the Standard Normal Curve
© BRIHASPATHI ACADEMY ׀ SUBSCRIBER’S COPY ׀ NOT FOR SALE
Properties of Standard Normal Curve
Smooth bell shaped
Perfectly symmetrical
Based on an infinite number of observations, so the curve never touches the x-axis
Mean is zero
SD is always 1
Total area under the curve is 1
Mean, median and mode coincide (KAR-99, AIPG-09)
The deviation from the mean expressed in SD units is called the relative or standard normal deviate and is denoted by Z
$Z = \dfrac{\text{Observation} - \text{Mean}}{SD}$
With the help of the Z value, the area under the curve can be found from a standard normal (Z) table
This area gives the P value
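A minimal sketch of the Z formula and the table lookup, assuming a two-sided P value (area in both tails beyond |Z|). The tail area is computed with the identity P = erfc(|Z| / √2) instead of a printed Z table; the observation, mean and SD are hypothetical.

```python
import math

def z_score(observation, mean, sd):
    """Standard normal deviate: distance from the mean in SD units."""
    return (observation - mean) / sd

def two_sided_p(z):
    """Area in both tails beyond |z| under the standard normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: observation 140, mean 120, SD 10
z = z_score(140, 120, 10)
p = two_sided_p(z)
print(f"Z = {z:.2f}, two-sided P = {p:.4f}")  # Z = 2.00, two-sided P = 0.0455
```

An observation 2 SD from the mean gives P of about 0.0455, the same 4.55% that lies outside Mean ± 2 SD; a one-sided P would be half of this.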
MEASURES OF THE SHAPE OF A DISTRIBUTION
• The measures of the shape of a distribution are the coefficients of skewness and kurtosis.
Skewness (sk)
• A measure of asymmetry of a frequency distribution
• It shows if deviations from the mean are larger on one side than the other side of the distribution
• For a symmetric distribution skewness is equal to zero.
• The direction of the tail of the curve indicates the direction of the skewed distribution.
• If the tail of the curve is toward the right, the distribution is said to be positively skewed.
Mean is greater than Median
• If the tail of the curve is toward the left, the distribution is negatively skewed. Mean is less than Median
• In a skewed distribution, the mean is pulled toward the tail of the curve. In a moderately skewed, single-peaked distribution, going from the tail of the curve to the apex (mode), the mean, median and mode usually appear in alphabetical order.
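A minimal sketch of measuring skewness, assuming the moment coefficient of skewness (third central moment divided by SD cubed); other definitions, such as Pearson's mean-minus-mode coefficient, also exist. The data set is hypothetical, with a long tail to the right.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: most values small, one long tail to the right
data = [2, 3, 3, 3, 4, 5, 6, 7, 9, 15]

print("mean   =", statistics.mean(data))    # pulled toward the tail
print("median =", statistics.median(data))
print("mode   =", statistics.mode(data))
print("skew   =", round(skewness(data), 2))  # positive for a right tail
```

Here the mean (5.7) exceeds the median (4.5), which exceeds the mode (3), matching the positively skewed case described above.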
Kurtosis (kt)
• A measure of flatness or steepness of a distribution, or a measure of the heaviness of the tails of a distribution.
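• If observations follow a normal distribution, then (excess) kurtosis is equal to zero. The raw moment kurtosis of a normal distribution is 3; excess kurtosis subtracts that 3.
• A distribution with positive kurtosis (leptokurtic) is more peaked than the normal distribution, with a larger frequency of observations close to the mean and heavier tails.
• A distribution with negative kurtosis (platykurtic) is flatter than the normal distribution, with a lower frequency of observations close to the mean and lighter tails.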
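A minimal sketch of excess kurtosis, assuming the simple moment definition (fourth central moment divided by the squared variance, minus 3). Statistical packages often apply a small-sample correction, so their values can differ slightly. Both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment kurtosis minus 3, so a normal distribution scores 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = list(range(1, 11))     # evenly spread values: flat top, no long tails
peaked = [5] * 8 + [0, 10]    # most values at the centre, a few far out

print("flat  :", round(excess_kurtosis(flat), 2))    # -1.22 (platykurtic)
print("peaked:", round(excess_kurtosis(peaked), 2))  # 2.0 (leptokurtic)
```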
MEASURES OF RELATIVE POSITION
• Measures of relative position include percentiles and z-values.
Percentile value (p)
• The percentile value p of an observation $y_i$ in a data set is such that 100p% of the observations are smaller than $y_i$ and 100(1 - p)% of the observations are greater than $y_i$.
• The lower quartile is the 25th percentile, the upper quartile is the 75th percentile, and the median is the 50th percentile. (AIPG-03, AIIMS-06)
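A minimal sketch of percentiles and quartiles with Python's `statistics` module (Python 3.8+). `percentile_rank` is a hypothetical helper that follows the "proportion of observations smaller than $y_i$" definition literally; `statistics.quantiles` uses its default `method="exclusive"`, and other software may use a different interpolation rule and give slightly different quartiles.

```python
import statistics

def percentile_rank(data, y):
    """Hypothetical helper: fraction p of observations strictly smaller than y."""
    return sum(1 for v in data if v < y) / len(data)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three cut points dividing the data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)                  # 2.5 5.0 7.5
print(statistics.median(data))     # 5, the same as the 50th percentile (q2)
print(percentile_rank(data, 5))    # fraction of values below 5
```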
z-value
• The deviation of an observation from the mean in standard deviation units
CORRELATION COEFFICIENT (r)
• The quantity r, called the linear correlation coefficient, measures the strength and the direction of a linear relationship between two variables.
• The linear correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of its developer Karl Pearson.
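• The mathematical formula for computing r is
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$
where n is the number of pairs of data.
• The value of r is such that -1 ≤ r ≤ +1.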
• The + and – signs are used for positive linear correlations and negative linear correlations, respectively.
• Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between x and y variables such that as values for x increase, values for y also increase.
• Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between x and y such that as values for x increase, values for y decrease.
• No correlation: If there is no linear correlation or only a weak linear correlation, r is close to 0. A value near zero means that there is no linear relationship between the two variables; they may be unrelated, or related in a nonlinear way.
• Note that r is a dimensionless quantity; that is, it does not depend on the units employed.
• A perfect correlation of ± 1 occurs only when the data points all lie exactly on a straight line.
• If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.
• A correlation greater than 0.8 is generally described as strong, whereas a correlation less than 0.5 is generally described as weak.
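A minimal sketch of the computational formula for r given above, written in plain Python. The height and weight pairs are hypothetical; the second call converts heights from cm to m to illustrate that r does not depend on the units employed.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation via the computational formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)  # number of pairs
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    # Zero if either variable is constant, in which case r is undefined
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: heights (cm) and weights (kg)
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 70, 80]

r_cm = pearson_r(heights_cm, weights_kg)
r_m = pearson_r([h / 100 for h in heights_cm], weights_kg)  # same heights in metres
print(f"r (cm) = {r_cm:.4f}, r (m) = {r_m:.4f}")
```

Both calls give r of about 0.99, a strong positive correlation by the 0.8 rule of thumb, and changing the unit of height leaves r unchanged.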
COEFFICIENT OF REGRESSION
• Regression, as often practiced in earth sciences, is the attempt to establish a mathematical relationship between variables.
• This can be used to extrapolate or to predict one variable given the other.
• For example, a relationship exists between the frequency of occurrence of a flood or earthquake of a given size and the size of the event. Given flood data, and assuming the system keeps behaving in the same way, one can predict how large an event of a certain frequency will be, i.e. how big the 100-year flood will be.
• A linear relationship between two variables is captured by the formula y = b + mx, where b is the y-intercept and m is the slope. The slope m is the regression coefficient: the change in y predicted for a one-unit change in x.
• It matters which variable is y (dependent) and which is x (independent): regressing y on x generally gives a different line from regressing x on y.
• Correlation measures the dependability of the relationship (the goodness of fit of the data to the regression line). It is a measure of how well one variable can predict the other (given the context of the data), and determines the precision you can assign to a relationship.
• Regression or correlation can be bivariate (between 2 variables, x and y) or multivariate (among more than two variables).
• Regression is interested in the form of the relationship, whereas correlation is more focused simply on the strength of a relationship.
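A minimal sketch of estimating the regression coefficient, assuming an ordinary least-squares fit (the notes do not name the fitting method). `fit_line` is a hypothetical helper and the data are made up. Fitting weight on height and height on weight gives two different lines, which is why it matters which variable is y; the product of the two slopes equals r squared.

```python
def fit_line(x, y):
    """Hypothetical helper: least-squares line y = b + m*x, returns (b, m)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx              # regression coefficient (slope)
    b = mean_y - m * mean_x    # intercept: line passes through (mean_x, mean_y)
    return b, m

heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 70, 80]

b_yx, m_yx = fit_line(heights_cm, weights_kg)  # weight regressed on height
b_xy, m_xy = fit_line(weights_kg, heights_cm)  # height regressed on weight
print(f"weight = {b_yx:.1f} + {m_yx:.2f} * height")
print(f"height = {b_xy:.1f} + {m_xy:.2f} * weight")
print("product of slopes (r squared):", round(m_yx * m_xy, 4))
```

With these numbers each extra cm of height predicts about 0.72 kg more weight; the reverse regression has a slope of about 1.37 cm per kg, not 1 / 0.72.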
SAMPLING