MEASURES OF CENTRAL LOCATION (OR CENTRAL TENDENCY)

One of the main objectives of statistical analysis is to determine a single value for the entire mass of data that describes the overall level of the group of observations and can be called representative of the whole set of data. It tells us where the centre of the distribution of data is located on the scale that we are using. There are several such measures, but we shall discuss only those that are most commonly used: the arithmetic mean, the mode and the median. These values are very useful not only for presenting the overall picture of the entire data, but also for making comparisons among two or more sets of data.

As an example, a question like "How hot is the month of June in Mumbai?" can generally be answered by a single figure: the average temperature for that month. For the purpose of comparison, suppose that we want to find out whether boys and girls at age 10 differ in height. By taking the average height of boys of that age and the average height of girls of the same age, we can compare the two and note the difference.

While the arithmetic mean is the most commonly used measure of central location, the mode and the median are more suitable measures under certain conditions and for certain types of data. However, every measure of central tendency should meet the following requisites:

· It should be easy to calculate and understand.

· It should be rigidly defined. It should have one and only one interpretation so that the personal prejudice or bias of the investigator does not affect the value or its usefulness.

· It should be representative of the data. If it is calculated from a sample, then the sample should be random enough to represent the population accurately.

· It should have sampling stability. It should not be affected by sampling fluctuations. This means that if we pick 10 different groups of college students at random and we compute the average of each group, then we should expect to get approximately the same value from these groups.

· It should not be affected much by extreme values. If a few very small or very large items are present in the data, they will unduly influence the value of the average by shifting it to one side or the other, and the average would then not be really typical of the entire series. Hence, the average chosen should be such that it is not unduly influenced by extreme values.

Let us consider these three measures of the central tendency:

MODE

In statistics, mode means the most frequent value assumed by a random variable, or occurring in a sampling of a random variable. The term is applied both to probability distributions and to collections of experimental data.

Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and the median, and may be very different for strongly skewed distributions.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The worst case is given by so-called uniform distributions, in which all values are equally likely.

Mode of a probability distribution

The mode of a probability distribution is the value at which its probability density function (or, for a discrete distribution, its probability mass function) attains its maximum value; informally speaking, the mode is at the peak.

Mode of a sample

The mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. Given the list of data [1, 1, 2, 4, 4] the mode is not unique.
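The definition above can be sketched in Python as follows. This is only a minimal illustration: the helper name sample_modes is our own and does not come from the text. It counts the occurrences of each value and keeps every value that reaches the highest count, so a non-unique mode shows up as more than one result.

```python
from collections import Counter

def sample_modes(data):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(sample_modes([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))  # [6]
print(sample_modes([1, 1, 2, 4, 4]))                        # [1, 4]
```

For the second list both 1 and 4 occur twice, which is why the mode is not unique.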

For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize the data by assigning the values to equidistant intervals, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak. For small or middle-sized samples the outcome of this procedure is sensitive to the choice of interval width, which should be neither too narrow nor too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable.
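The binning procedure can be sketched as below. The function name binned_mode, the choice of intervals starting at multiples of the width, and the sample values are all assumptions made for illustration; the sample is made up and is not the one quoted above.

```python
import math
from collections import Counter

def binned_mode(data, width):
    """Midpoint of the equal-width interval that holds the most values.

    Intervals are [k*width, (k+1)*width); ties go to the lowest interval.
    """
    bins = Counter(math.floor(x / width) for x in data)
    top = max(bins.values())
    k = min(index for index, count in bins.items() if count == top)
    return (k + 0.5) * width

# Made-up measurements, used only to illustrate the procedure.
sample = [0.9, 1.2, 2.1, 2.3, 2.4, 2.6, 3.1, 3.7, 3.9]
print(binned_mode(sample, 1.0))   # 2.5
print(binned_mode(sample, 0.5))   # 2.25
```

The two calls give different answers for the same data, which illustrates the sensitivity to the interval width.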

Comparison of mean, median and mode

For a probability distribution, the mean is also called the expected value of the random variable. For a data sample, the mean is also called the average.

When do these measures make sense?

Unlike mean and median, the concept of mode also makes sense for "nominal data" (i.e., not consisting of numerical values). For example, taking a sample of Korean family names, one might find that "Kim" occurs more often than any other name. Then "Kim" might be called the mode of the sample. However, this use is not common.

Unlike median, the concept of mean makes sense for any random variable assuming values from a vector space, including the real numbers (a one-dimensional vector space) and the integers (which can be considered embedded in the real numbers). For example, a distribution of points in the plane will typically have a mean and a mode, but the concept of median does not apply. The median makes sense when there is a linear order on the possible values.

Uniqueness and Definedness

For the remainder, the assumption is that we have (a sample of) a real-valued random variable.

For some probability distributions, the expected value may be infinite or undefined, but if defined, it is unique. The average of a (finite) sample is always defined. The median is the value such that the fractions not exceeding it and not falling below it are both at least 1/2. It is not necessarily unique, but never infinite or totally undefined. For a data sample it is the "halfway" value when the list of values is ordered in increasing value, where usually for a list of even length the numerical average is taken of the two values closest to "halfway". Finally, as said before, the mode is not necessarily unique.
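The "halfway" rule for a data sample can be written out as a short Python sketch. The function name sample_median and the two small lists are illustrative assumptions; Python's standard statistics.median follows the same convention.

```python
def sample_median(data):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sample_median([7, 1, 5]))      # 5
print(sample_median([7, 1, 5, 3]))   # 4.0
```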

Furthermore, like the mean, the mode of a probability distribution can be (plus or minus)-infinity, but unlike the mean it cannot be just undefined. For a finite data sample, the mode is one (or more) of the values in the sample and is itself then finite.

Properties

Assuming definedness, and for simplicity uniqueness, the following are some of the most interesting properties.

All three measures have the following property: if the random variable (or each value from the sample) is subjected to the linear or affine transformation which replaces X by aX + b, then the mean, median and mode are transformed in the same way.

However, under an arbitrary monotonic transformation only the median follows; for example, if X is replaced by exp(X), the median changes from m to exp(m), but the mean and mode do not in general change in the same way.
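A small numerical check of these two properties is sketched below, using Python's standard statistics module and a made-up sample. It shows the affine property for all three measures and the monotonic property for the median and the mean only; the statement about the mode concerns continuous density functions and is not reproduced by this finite sample.

```python
import math
import statistics

data = [1, 2, 2, 3, 7]                 # made-up sample with a unique mode
a, b = 2, 3
affine = [a * x + b for x in data]     # [5, 7, 7, 9, 17]

# Affine map: each measure of the transformed data equals a*measure + b.
print(statistics.mean(affine), a * statistics.mean(data) + b)      # 9 9
print(statistics.median(affine), a * statistics.median(data) + b)  # 7 7
print(statistics.mode(affine), a * statistics.mode(data) + b)      # 7 7

# Monotonic but non-linear map: the median follows, the mean does not.
expd = [math.exp(x) for x in data]
print(statistics.median(expd) == math.exp(statistics.median(data)))          # True
print(math.isclose(statistics.mean(expd), math.exp(statistics.mean(data))))  # False
```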

Except for extremely small samples, the median is totally insensitive to "outliers" (such as occasional, rare, false experimental readings). The mode is also very robust in the presence of outliers, while the mean is rather sensitive.

In continuous unimodal distributions the median lies, as a rule of thumb, between the mean and the mode, about one third of the way going from mean to mode. As a formula, median = (2 × mean + mode)/3. This rule, due to Karl Pearson, is not hard and fast; it applies to distributions that resemble a normal distribution.

Example for a skewed distribution

A well-known example of a skewed distribution is personal wealth: Few people are very rich, but among those some are excessively rich. However, many are rather poor.

A well-known class of distributions that can be arbitrarily skewed is given by the log-normal distribution. It is obtained by transforming a random variable X having a normal distribution into random variable Y = exp(X). Then the logarithm of random variable Y is normally distributed, whence the name.

Taking the mean μ of X to be 0, the median of Y will be 1, independent of the standard deviation σ of X. This is so because X has a symmetric distribution, so its median is also 0. The transformation from X to Y is monotonic, and so we find the median exp(0) = 1 for Y.

When X has standard deviation σ = 0.2, the distribution of Y is not very skewed. Using the log-normal formulas mean = exp(μ + σ²/2) and mode = exp(μ − σ²) with μ = 0, we find, with values rounded to four digits:

Mean = 1.0202 Mode = 0.9608

Indeed, the median is about one third on the way from mean to mode.

When X has a much larger standard deviation, σ = 2, the distribution of Y is strongly skewed. Now

Mean = 7.3891 Mode = 0.0183

Here, Pearson's rule of thumb fails miserably.
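The figures in this example can be checked with a short Python sketch based on the standard closed forms for the log-normal distribution: mean = exp(μ + σ²/2), median = exp(μ) and mode = exp(μ − σ²). The function names are our own. With σ = 0.2 the formulas give 1.0202 and 0.9608, and with σ = 2 they give the values 7.3891 and 0.0183 quoted above.

```python
import math

def lognormal_summary(sigma, mu=0.0):
    """Mean, median and mode of Y = exp(X) for X ~ Normal(mu, sigma**2)."""
    mean = math.exp(mu + sigma**2 / 2)
    median = math.exp(mu)
    mode = math.exp(mu - sigma**2)
    return mean, median, mode

def pearson_median(mean, mode):
    """Median predicted by the rule of thumb median = (2*mean + mode)/3."""
    return (2 * mean + mode) / 3

for sigma in (0.2, 2.0):
    mean, median, mode = lognormal_summary(sigma)
    print(f"sigma={sigma}: mean={mean:.4f} median={median:.4f} "
          f"mode={mode:.4f} rule of thumb={pearson_median(mean, mode):.4f}")
```

For σ = 0.2 the rule of thumb predicts a median of about 1.0004, close to the true value 1. For σ = 2 it predicts about 4.93, far from 1.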

MEDIAN

In probability theory and statistics, a median is a number dividing the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, one often takes the mean of the two middle values.

At most, half the population has values less than the median and at most half have values greater than the median. If both groups contain less than half the population, then some of the population is exactly equal to the median.

Popular explanation

The difference between the median and mean is illustrated in a simple example.

Suppose 19 paupers and one billionaire are in a room. Everyone removes all money from their pockets and puts it on a table. Each pauper puts $5 on the table; the billionaire puts $1 billion (that is, $10^9) there. The total is then $1,000,000,095. If that money is divided equally among the 20 persons, each gets $50,000,004.75. That amount is the mean (or "average") amount of money that the 20 persons brought into the room. But the median amount is $5, since one may divide the group into two groups of 10 persons each, and say that everyone in the first group brought in no more than $5, and each person in the second group brought in no less than $5. In a sense, the median is the amount that the typical person brought in. By contrast, the mean (or "average") is not at all typical, since no one present, pauper or billionaire, brought in an amount approximating $50,000,004.75.
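The arithmetic of this example can be reproduced with Python's standard statistics module. This is only a sketch; the variable name amounts is our own.

```python
import statistics

amounts = [5] * 19 + [10**9]   # 19 paupers with $5 each, one billionaire

print(sum(amounts))                 # 1000000095
print(statistics.mean(amounts))     # 50000004.75
print(statistics.median(amounts))   # 5.0
```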

Non-uniqueness

There may be more than one median. For example if there are an even number of cases, and the two middle values are different, then there is no unique middle value. Notice, however, that at least half the numbers in the list are less than or equal to either of the two middle values, and at least half are greater than or equal to either of the two values, and the same is true of any number between the two middle values. Thus either of the two middle values and all numbers between them are medians in that case.
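The condition described here (at least half the values at or below, and at least half at or above) can be tested directly. The sketch below uses a hypothetical helper is_median and a made-up list whose two middle values, 3 and 5, differ.

```python
def is_median(m, data):
    """True if at least half the values are <= m and at least half are >= m."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= m)
    at_or_above = sum(1 for x in data if x >= m)
    return 2 * at_or_below >= n and 2 * at_or_above >= n

data = [1, 3, 5, 9]   # made-up list with two different middle values
print([m for m in (2, 3, 3.5, 4, 5, 6) if is_median(m, data)])   # [3, 3.5, 4, 5]
```

Every candidate from 3 to 5 passes the test, while 2 and 6 do not.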

Measures of statistical dispersion

When the median is used as a location parameter in descriptive statistics, there are several choices for a measure of variability: the range, the inter-quartile range, and the absolute deviation. Since the median is the same as the second quartile, it can be calculated in the same way as any other quartile. To obtain the median of an even number of numbers, find the average of the two middle terms.

Medians of particular distributions

The median of a normal distribution with mean μ and variance σ² is μ. In fact, for a normal distribution, mean = median = mode.

The median of a uniform distribution in the interval [a, b] is (a + b) / 2, which is also the mean.

Medians in descriptive statistics

The median is primarily used for skewed distributions, which it represents differently than the arithmetic mean. Consider the multiset {1, 2, 2, 2, 3, 9}. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 3.166....

Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.
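The multiset example and the robustness claim can be checked with Python's standard statistics module. This is a minimal sketch; removing the value 9 is our own addition, used to show how much each measure depends on the outlier.

```python
import statistics

values = [1, 2, 2, 2, 3, 9]
print(statistics.median(values))           # 2.0
print(statistics.mode(values))             # 2
print(round(statistics.mean(values), 3))   # 3.167

# Dropping the outlier 9 moves the mean but leaves the median at 2.
trimmed = [1, 2, 2, 2, 3]
print(statistics.mean(trimmed))     # 2
print(statistics.median(trimmed))   # 2
```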

MEAN

In statistics, mean has two related meanings:

- The average in ordinary English, which is also called the arithmetic mean (and is distinguished from the geometric mean or harmonic mean). The average of a sample is also called the sample mean.

- The expected value of a random variable, which is also called the population mean.

Means are also often used in geometry and analysis. A wide range of means have been developed for these purposes, which are not much used in statistics. The geometric, harmonic and weighted arithmetic means are described below.

The sample mean is often used as an estimator of central tendency, such as the population mean. However, other estimators are also used.

For a real-valued random variable X, the mean is the expectation of X. If the expectation does not exist, then the random variable has no mean.

For a data set, the mean is just the sum of all the observations divided by the number of observations. Once we have chosen this method of describing the central tendency of a data set, we usually use the standard deviation to describe how the observations differ. The standard deviation is the square root of the average of squared deviations from the mean.
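Both definitions translate directly into code. The sketch below uses made-up observations and our own function names. Following the wording above, it divides by the number of observations, which is the population form of the standard deviation (the sample form divides by one less).

```python
import math

def mean(data):
    """Sum of the observations divided by their number."""
    return sum(data) / len(data)

def std_dev(data):
    """Square root of the average squared deviation from the mean."""
    m = mean(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
print(mean(data))      # 5.0
print(std_dev(data))   # 2.0
```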

The mean is the unique value about which the sum of squared deviations is a minimum.

If you calculate the sum of squared deviations from any other measure of central tendency, it will be larger than for the mean. This explains why the standard deviation and the mean are usually cited together in statistical reports.
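The reason is the identity S(c) = S(mean) + n × (mean − c)², where S(c) is the sum of squared deviations about any value c: the second term is zero only when c equals the mean. A small numerical sketch, using the multiset {1, 2, 2, 2, 3, 9} from the median section and our own function name:

```python
def sum_sq_dev(data, c):
    """Sum of squared deviations of the data about the value c."""
    return sum((x - c) ** 2 for x in data)

data = [1, 2, 2, 2, 3, 9]
mean = sum(data) / len(data)   # 19/6
median = 2

print(round(sum_sq_dev(data, mean), 4))   # 42.8333
print(sum_sq_dev(data, median))           # 51
```

The difference, 51 − 42.8333, equals n × (mean − median)² = 6 × (7/6)².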

An alternative measure of dispersion is the mean deviation, equivalent to the average absolute deviation from the mean. It is less sensitive to outliers, but less tractable when combining data sets.
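As a minimal sketch, with a function name and observations of our own choosing, the mean deviation can be computed as follows:

```python
def mean_abs_dev(data):
    """Average absolute deviation of the data from their mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

print(mean_abs_dev([2, 4, 4, 4, 5, 5, 7, 9]))   # 1.5
```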

Arithmetic Mean

The arithmetic mean is the "standard" average, often simply called the "mean".

The mean may often be confused with the median or mode. The mean is the arithmetic average of a set of values, or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely value (mode). For example, mean income is skewed upwards by a small number of people with very large incomes, so that the majority has an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income, and favors the larger number of people with lower incomes. The median or mode are often more intuitive measures of such data.

That said, many skewed distributions are best described by their mean - such as the Exponential and Poisson distributions.

An amusing example…

Most people have an above average number of legs. The mean number of legs is going to be less than 2, because there are people with one leg, people with no legs and no people with more than two legs. So since most people have two legs, they have an above average number.

Geometric Mean

The geometric mean is an average that is useful for sets of numbers that are interpreted according to their product and not their sum (as is the case with the arithmetic mean), for example rates of growth.

For example, the geometric mean of the six values 34, 27, 45, 55, 22, 34 is (34 × 27 × 45 × 55 × 22 × 34)^(1/6) = (1,699,493,400)^(1/6) ≈ 34.545.
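The calculation can be reproduced with a short Python sketch. The function name is our own, and math.prod requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

values = [34, 27, 45, 55, 22, 34]
print(math.prod(values))                  # 1699493400
print(round(geometric_mean(values), 3))   # 34.545
```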

Harmonic Mean

The harmonic mean is an average which is useful for sets of numbers which are defined in relation to some unit, for example speed (distance per unit of time).

An example…

An experiment yields the following data: 34, 27, 45, 55, 22, 34. We need to find the harmonic mean, which is the number of values divided by the sum of their reciprocals. The number of items is 6, so n = 6. The denominator in the formula, the sum of the reciprocals, is 0.181719152307. The reciprocal of this value is 5.50299727522. Multiplying this by n gives the harmonic mean, 33.0179836513.
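The same steps can be sketched in Python, assuming the usual definition of the harmonic mean as n divided by the sum of the reciprocals. The function name is our own.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of n positive values."""
    reciprocal_sum = sum(1 / x for x in values)
    return len(values) / reciprocal_sum

values = [34, 27, 45, 55, 22, 34]
print(round(sum(1 / x for x in values), 6))   # 0.181719
print(round(harmonic_mean(values), 4))        # 33.018
```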

Weighted Arithmetic Mean

The weighted arithmetic mean is used if one wants to combine average values from samples of the same population with different sample sizes:

Weighted mean = (ω1·x1 + ω2·x2 + ... + ωn·xn) / (ω1 + ω2 + ... + ωn)

Here xi is the average of the i-th sample. The weights ωi represent the sizes of the partial samples. In other applications they represent a measure of the reliability of the respective values, and hence of their influence upon the mean.
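As a sketch of how sample averages might be combined, the example below assumes the usual definition of the weighted mean (the sum of weight × value divided by the sum of the weights) and uses the sample sizes as weights. The three sample averages and sizes are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight*value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical example: three sample averages with their sample sizes as weights.
sample_means = [10.0, 12.0, 20.0]
sample_sizes = [50, 30, 20]
print(weighted_mean(sample_means, sample_sizes))   # 12.6
```

The unweighted average of the three sample averages would be 14.0; the weighted mean is lower because the largest sample has the lowest average.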

SUMMARY

This chapter has given the meaning of population parameters. The procedures for measuring these parameters have been dealt with in detail in the chapter.

KEY TERMS

· Population parameters

· Mean, Mode and Median

· Arithmetic mean

· Geometric mean

· Harmonic mean

· Skewed distribution

IMPORTANT QUESTIONS

1. Explain the methods to measure the Median, Mode and Mean.

2. What are the different types of Means?

End of Chapter

-LESSON – 6

In document Business Research Methods (Page 31-39)