Coefficient of variation - Quantitative Data or Continuous Data or Numerical Data

COMMUNITY DENTISTRY Biostatistics

II. Quantitative Data or Continuous Data or Numerical Data

4. Coefficient of variation

It is used to compare the variability of attributes measured in two different units, e.g. height and weight

Denoted by CV

$$\text{CV} = \frac{\text{SD} \times 100}{\text{Mean}}$$

It is expressed as percentage
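As a rough illustration of the formula, the following Python sketch computes the CV for two attributes measured in different units. The data values and the helper name `coefficient_of_variation` are made up for the example, and the sample standard deviation (n − 1 denominator) is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV as a percentage: (SD / mean) * 100.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    return stdev(values) / mean(values) * 100

# Hypothetical measurements for the same five people
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of height: {cv_height:.2f}%")
print(f"CV of weight: {cv_weight:.2f}%")
```

Because the CV is a unit-free percentage, the two results can be compared directly: in this made-up data, weight varies more than height relative to its mean.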

NORMAL DISTRIBUTION OR NORMAL CURVE

• Also called Gaussian curve, after Gauss, who observed it. (KAR-03)

Characteristics

Bell shaped

Bilaterally symmetrical

Frequency increases from one side, reaches its highest point, and then decreases in exactly the way it had increased

The highest point denotes mean, median and mode which coincide

Mean ± 1 SD includes 68.27% of all observations. Such observations are fairly common

Mean ± 2 SD includes 95.45% of all observations, i.e. by convention values beyond this range are uncommon or rare. Their chance of being normal is 100 – 95.45%, i.e. only 4.55%

Mean ± 3 SD includes 99.73% of all observations. Values beyond this range are very rare. Their chance of being normal is only 0.27%

These limits on either side of the mean are called Confidence Limits
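The percentages quoted above can be checked numerically. The sketch below is one way to do so in Python, assuming the standard identity that the area of a normal curve within ± k SD of the mean equals erf(k / √2); the function name `fraction_within` is illustrative only.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = fraction_within(k) * 100
    outside = 100 - inside
    print(f"Mean ± {k} SD: {inside:.2f}% inside, {outside:.2f}% outside")
```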

The look of the frequency distribution curve may vary depending on the mean and SD, so it becomes necessary to standardize it. E.g. one study has an SD of 3 and another has an SD of 2, which makes them difficult to compare

The normal curve is therefore standardized by using the standard deviation as the unit to place any measurement with reference to the mean.

The curve that emerges through this procedure is called Standard Normal Curve

Smooth bell shaped

Perfectly symmetrical

Based on an infinite number of observations, thus the curve does not touch the x-axis

Mean is zero

SD is always 1

Total area under the curve is 1

Mean median mode coincide (KAR-99, AIPG-09)

The deviation from the mean expressed in SD units is called the relative or standard normal deviate and is denoted by Z

$$Z = \frac{\text{Observation} - \text{Mean}}{\text{SD}}$$

With the help of Z value we can find the area under the curve from a table

This area helps to give the P value
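A minimal sketch of this lookup in Python follows. Instead of a printed table it computes the area from the error function, which is an assumption of this example rather than the method described in the notes. The observation, mean and SD are hypothetical, as are the function names.

```python
import math

def z_score(observation, mean, sd):
    """Standard normal deviate: (observation - mean) / SD."""
    return (observation - mean) / sd

def area_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p(z):
    """Probability of a value at least |z| SD from the mean, in either tail."""
    return 2 * (1 - area_below(abs(z)))

# Hypothetical example: observation 130, mean 120, SD 5
z = z_score(observation=130, mean=120, sd=5)
print(f"Z = {z:.2f}")
print(f"Area below Z = {area_below(z):.4f}")
print(f"Two-tailed P = {two_tailed_p(z):.4f}")
```

With Z = 2 the two-tailed P comes out close to 0.0455, which matches the 4.55% of observations lying beyond mean ± 2 SD.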

MEASURES OF THE SHAPE OF A DISTRIBUTION

• The measures of the shape of a distribution are the coefficients of skewness and kurtosis.

Skewness (sk)

• A measure of asymmetry of a frequency distribution

• It shows if deviations from the mean are larger on one side than the other side of the distribution

• For a symmetric distribution skewness is equal to zero.

• The direction of the tail of the curve indicates the direction of the skewed distribution.

• If the tail of the curve is toward the right, the distribution is said to be positively skewed. Mean is greater than Median

• If the tail of the curve is toward the left, the distribution is negatively skewed. Mean is less than Median

• In a skewed distribution, the mean always follows the tail of the curve. From the tail of the curve to the apex (mode), the mean, median, and mode are always in alphabetical order.
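The sign of the skewness and the position of the mean relative to the median can be illustrated with a small Python sketch. It assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one of several definitions in use; the data sets are invented for the example.

```python
from statistics import mean, median

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # tail toward the right
left_tailed = [10, 9, 9, 8, 8, 8, 7, 1]    # tail toward the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right-tailed", right_tailed),
                   ("left-tailed", left_tailed),
                   ("symmetric", symmetric)]:
    print(f"{name}: skewness={skewness(data):.2f}, "
          f"mean={mean(data)}, median={median(data)}")
```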

Kurtosis (kt)

• A measure of flatness or steepness of a distribution, or a measure of the heaviness of the tails of a distribution.

• If observations follow a normal distribution then kurtosis (measured as excess kurtosis, i.e. relative to the normal distribution) is equal to zero.

• A distribution with positive kurtosis has heavier tails than the normal distribution, usually together with a sharper peak of observations close to the mean.

• A distribution with negative kurtosis has lighter tails than the normal distribution, usually together with a flatter peak around the mean
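As a sketch of how the coefficient can be computed, the Python example below assumes kurtosis is defined as the fourth standardized moment minus 3, so that a normal distribution scores zero. The function name and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical data sets
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]       # observations spread evenly
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]    # most observations at the mean, two extremes

print(f"flat:   {excess_kurtosis(flat):.2f}")    # negative
print(f"peaked: {excess_kurtosis(peaked):.2f}")  # positive
```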

MEASURES OF RELATIVE POSITION

• Measures of relative position include percentiles and z-value.

Percentile value (p)

• The percentile value p of an observation y_i in a data set is such that 100p% of observations are smaller than y_i and 100(1 − p)% of observations are greater than y_i.

• The lower quartile is the 25th percentile, the upper quartile is the 75th percentile, and the median is the 50th percentile. (AIPG-03, AIIMS-06)
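The quartiles can be obtained with Python's standard `statistics` module, as in the sketch below. The scores are hypothetical, `fraction_below` is an illustrative helper, and `statistics.quantiles` is used with its default "exclusive" method; other interpolation methods can give slightly different cut points.

```python
from statistics import median, quantiles

def fraction_below(values, y):
    """Proportion p of observations strictly smaller than y."""
    return sum(1 for v in values if v < y) / len(values)

# Hypothetical scores for 11 subjects
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]

# n=4 splits the data into quarters and returns the three cut points
lower_quartile, middle, upper_quartile = quantiles(scores, n=4)

print(f"25th percentile (lower quartile): {lower_quartile}")
print(f"50th percentile (median):         {middle}")
print(f"75th percentile (upper quartile): {upper_quartile}")
print(f"Fraction of scores below 10:      {fraction_below(scores, 10):.2f}")
```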

z-value

• The deviation of an observation from the mean in standard deviation units

CORRELATION COEFFICIENT (r)

• The quantity r, called the linear correlation coefficient, measures the strength and the direction of a linear relationship between two variables.

• The linear correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of its developer Karl Pearson.

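• The mathematical formula for computing r is

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of pairs of data.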

• The value of r is such that -1 ≤ r ≤ +1.

• The + and – signs are used for positive linear correlations and negative linear correlations, respectively.

• Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r value of exactly +1 indicates a perfect positive fit. Positive values indicate a relationship between x and y variables such that as values for x increase, values for y also increase.

• Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r value of exactly -1 indicates a perfect negative fit. Negative values indicate a relationship between x and y such that as values for x increase, values for y decrease.

• No correlation: If there is no linear correlation or a weak linear correlation, r is close to 0. A value near zero means that there is no linear relationship between the two variables; any relationship that exists is random or nonlinear

• Note that r is a dimensionless quantity; that is, it does not depend on the units employed.

• A perfect correlation of ± 1 occurs only when the data points all lie exactly on a straight line.

• If r = +1, the slope of this line is positive. If r = -1, the slope of this line is negative.

• A correlation greater than 0.8 is generally described as strong, whereas a correlation less than 0.5 is generally described as weak.
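The following Python sketch shows one way r might be computed from paired data. It assumes the usual computational form of the Pearson formula, r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]); the function name `pearson_r` and all data values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired data."""
    n = len(xs)  # number of pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
perfect_positive = [2, 4, 6, 8, 10]   # all points on a rising line
perfect_negative = [10, 8, 6, 4, 2]   # all points on a falling line
noisy_positive = [2, 1, 4, 3, 5]      # rising trend with scatter

print(pearson_r(xs, perfect_positive))
print(pearson_r(xs, perfect_negative))
print(pearson_r(xs, noisy_positive))
```

Rescaling either variable, for example converting x from centimetres to metres, leaves r unchanged, which is what being dimensionless means in practice.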

COEFFICIENT OF REGRESSION

• Regression, as often practiced in earth sciences, is the attempt to establish a mathematical relationship between variables.

• This can be used to extrapolate or to predict one variable given the other.

• For example, a relationship exists between the frequency of occurrence of a flood or earthquake of a given size and the size of the event. Given flood data, and assuming the system continues to operate in the same way, one can predict how big an event of a certain frequency will be, i.e. how big the 100 year flood will be.

• A linear relationship between two variables is captured by the formula y = b + m x, where b is the y intercept and m is the slope.

• It is significant which variable is y and which is x

• Correlation measures the dependability of the relationship (the goodness of fit of the data to the fitted relationship). It is a measure of how well one variable can predict the other (given the context of the data), and determines the precision you can assign to a relationship.

• Regression or correlation can be bivariate (between 2 variables, x and y) or multivariate (among more than two variables).

• Regression is interested in the form of the relationship, whereas correlation is more focused simply on the strength of a relationship.
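A minimal Python sketch of fitting y = b + m x is given below. It assumes ordinary least squares, which the notes do not name explicitly, and uses invented data; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept b and slope m for y = b + m*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return b, m

def predict(b, m, x):
    """Predicted y for a given x on the fitted line."""
    return b + m * x

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b, m = fit_line(xs, ys)
print(f"intercept b = {b}, slope m = {m}")
print(f"predicted y at x = 6: {predict(b, m, 6)}")
```

Swapping the roles of x and y generally yields a different fitted line when the data are scattered, which is why the choice of which variable is y matters.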

SAMPLING

• It is not possible to include each and every member of a population, as this would be time consuming, costly and laborious; therefore sampling is done

• Sampling is a process by which some units of a population or universe are selected for study; by subjecting them to statistical computation, conclusions are drawn about the population from which these units are drawn

• The sample will be representative of the entire population only if

• It is sufficiently large

• It is unbiased

• Such a sample will have its statistics almost equal to the parameters of the entire population

• Two main characteristics of a representative sample are

1. Precision

In document Brihaspathi Synopsis (Page 182-185)