
RESEARCH DESIGN AND METHODOLOGY 5.1 INTRODUCTION


5.4 POPULATION OF THE STUDY

5.4.1 Sample size

A sample is a subset of some predetermined size drawn from a population of interest (Bailey 1994:82) so that, by studying the sample, results may be fairly generalised back to the population from which it was chosen (Trochim 2002:1). An adequate sample size reduces the likelihood of sampling error, and the optimal size for a particular study may be estimated from several parameters (Polgar and Thomas 1997). Neuman (1997:221) observes that the question of sample size may be addressed in two ways:

• The first one is to make assumptions about the population and use statistical equations to determine the sample size. Using this method, the researcher makes assumptions about the degree of confidence (or number of errors) that is acceptable, as well as the degree of variation in the population.

• The second method is the “rule of thumb”, a conventional or commonly accepted number. This method is used because researchers rarely have the information required by the statistical method and because it gives sample sizes close to those of the statistical method.

5.4.1.1 Sample size for policy makers and information providers

Neuman (1997:222) argues that one of the methods of sample size determination is the “rule of thumb”, a conventional or commonly accepted amount. As noted earlier, based on the result of the pilot study and the manageable size of the target population of policy makers and information providers, the study used all of the targeted policy makers and information providers. In other words, the study used all 20 targeted policy makers and all 75 targeted information providers.

5.4.1.2 Sample size for the SMEs

To determine the sample size for the SMEs, the statistical method was adopted, since the size of the population is known to be 721. Rea and Parker (1997) state that, when determining a sample size statistically, the researcher must address two interrelated factors before selecting a sample size: the confidence interval and the level of confidence. A confidence interval is an interval used to estimate the likely size of a population parameter. It gives an estimated range of values, calculated from a given set of data, that has a specified probability of containing the parameter being estimated. The level of confidence, on the other hand, expresses the risk of error the researcher is willing to accept in the study. Given the time requirements, budget and the magnitude of the consequences of drawing incorrect conclusions from the sample, the researcher typically chooses either a 95% level of confidence, with a 5% chance of error, or a 99% level of confidence, with a 1% chance of error (Rea and Parker 1997:114). In order to minimise the risk of errors, the researcher considered the following:

• The greater the consequences of generalising data that might lead to incorrect conclusions, the greater the level of confidence the researcher should establish. In practical terms, this involves a choice between the 95% and 99% levels. In most cases the researcher can be satisfied by choosing the 95% confidence level (a 5% risk of error).

• The margin of error, or confidence interval, must be established. The researcher will generally find 3-5% to be satisfactory for proportional data.
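The two confidence levels discussed above each correspond to a critical value of the standard normal distribution, which is what enters the sample size formulas. The following is a minimal sketch, assuming a two-sided interval; the function name `z_for_confidence` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided critical z value for a confidence level such as 0.95.

    Hypothetical helper: splits the risk of error (1 - level) equally
    between the two tails of the standard normal distribution.
    """
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
# 95% confidence -> z = 1.960
# 99% confidence -> z = 2.576
```

Moving from the 95% to the 99% level raises the critical value from about 1.96 to about 2.58, and since the sample size grows with the square of this value, the higher level of confidence requires a noticeably larger sample.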

On the same note, the Evaluation and Data Development Strategic Policy, Human Resources Development Canada (1998) argues that the sample size to be chosen for a particular survey depends mainly on tolerable error; population size; the importance of particular sub-groups; the anticipated level of non-response; and how much money is available. "Tolerable error" refers to the margin of error for the survey. The margin of error tells the reader how accurate the study's findings are. It adjusts the standard error to account for any potential differences between the sample and the population via the calculation of a "confidence interval" for the population mean. Traditionally, a 95% confidence interval is used (Evaluation and Data Development Strategic Policy, Human Resources Development Canada 1998). The tolerable margin of error is usually between 3% and 5%; much lower and the costs of the survey begin to rise dramatically.

The traditional formula for sample sizes is:

$$n = \frac{1.96^2\, p(1-p)}{C_p^2} \quad \text{or} \quad n = \left( \frac{Z_\alpha \sqrt{p(1-p)}}{C_p} \right)^2$$

Where n is the sample size to be calculated, $C_p$ is the tolerable error and p is the proportion having the characteristic being measured, so that (1-p) is the proportion who lack it. For example, if 48% said “yes”, 52% must have said “no”. Since p is not known in advance, the most conservative way of handling this uncertainty is to set the value of p at the proportion that would result in the highest sample size. This occurs when p = 0.5. $Z_\alpha$ reflects the level of confidence: 1.96 (from the Z-table) corresponds to the choice of a 95% confidence interval, because in a normal distribution 95% of the area under the curve is within 1.96 standard deviations of the mean.

When the population size is known, and is below 100,000, the "finite population correction factor" must be used to determine sample size (Evaluation and Data Development Strategic Policy, Human Resources Development Canada 1998; Israel 1992). The finite population correction factor measures how much extra precision is achieved when the sample size comes close to the population size (Simon 2004). Integrating the finite population factor gives the formula for sample size as:

$$n = \frac{Z_\alpha^2\, p(1-p)\, N}{Z_\alpha^2\, p(1-p) + (N-1)\, C_p^2}$$

(Evaluation and Data Development Strategic Policy, Human Resources Development Canada 1998), where n is the sample size; $Z_\alpha$ = 1.96 (from the Z-table); p is the estimated proportion of an attribute that is present in the population, set at the most conservative value of 0.5 as explained above; N is the population size, which is 721 (see Section 5.5); and $C_p$ is the desired level of precision or tolerable error, for which this study uses 5%. Substituting these values:

$$n = \frac{1.96^2 \times 0.5(1-0.5) \times 721}{1.96^2 \times 0.5(1-0.5) + (721-1) \times 0.05^2} = \frac{692.4484}{2.7604} \approx 250.85$$

The calculated n must be rounded up to the next whole number. This gives n (the number of SMEs) as 251.
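The calculation above can be reproduced with a few lines of Python. This is a minimal sketch under the study's stated values (Z = 1.96, p = 0.5, tolerable error of 5%, N = 721); the function name `sample_size_finite` and its parameter names are hypothetical and introduced here only for illustration.

```python
import math


def sample_size_finite(N: int, z: float = 1.96, p: float = 0.5, e: float = 0.05) -> float:
    """Unrounded sample size for a proportion, with the finite
    population correction built into the formula.

    N: population size, z: critical value for the confidence level,
    p: assumed proportion, e: tolerable error (margin of error).
    """
    numerator = z**2 * p * (1 - p) * N
    denominator = z**2 * p * (1 - p) + (N - 1) * e**2
    return numerator / denominator


n_raw = sample_size_finite(721)
# Round up, since a fractional respondent cannot be sampled.
print(round(n_raw, 2), math.ceil(n_raw))  # 250.85 251
```

Calling the same function with a smaller tolerable error, for example `sample_size_finite(721, e=0.03)`, shows how quickly the required sample grows as the margin of error is tightened.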

Alternatively, the sample size for proportions is given as

$$n = \frac{Z^2 pq}{e^2}$$

(Israel 1992), where n is the sample size; Z is the abscissa of the normal curve that cuts off an area at the tails corresponding to the desired confidence level; e is the desired level of precision; p is the estimated proportion of an attribute that is present in the population; and q is 1-p. The value for Z is found in statistical tables which contain the area under the normal curve. In this study Z is 1.96 (from the table), p is 0.5 and e is 0.05. Substituting in the formula gives n = (1.96 x 1.96 x 0.5 x 0.5)/(0.05 x 0.05) = 384.16, which is rounded up to 385. However, since the population size is small (less than 100,000), a finite population correction for proportions is required. The finite population correction for proportions is:

$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}$$

Where n is the sample size; $n_0$ is the initially determined sample size, which can be reduced slightly while still achieving the desired precision; and N is the population size. In this case $n_0$ = 385 and N = 721, and substituting these in the formula:

$$n = \frac{385}{1 + \dfrac{385 - 1}{721}} \approx 251.2$$

gives n to be 251, the same as with the first formula.
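The two-step calculation above (an initial sample size followed by the finite population correction) can be sketched as follows. The function names `initial_sample_size` and `finite_population_correction` are hypothetical, and the values are those stated in the text (Z = 1.96, p = 0.5, e = 0.05, N = 721).

```python
import math


def initial_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> float:
    """Unrounded sample size for a proportion, n0 = Z^2 * p * q / e^2."""
    q = 1 - p
    return z**2 * p * q / e**2


def finite_population_correction(n0: float, N: int) -> float:
    """Reduce n0 for a finite population of size N: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / N)


n0_raw = initial_sample_size()   # about 384.16
n0 = math.ceil(n0_raw)           # rounded up to 385, as in the text

# Corrected sample size, first from the rounded n0, then from the unrounded n0.
print(n0,
      round(finite_population_correction(n0, 721), 2),
      round(finite_population_correction(n0_raw, 721), 2))
# 385 251.21 250.85
```

With the unrounded initial value the result is 250.85, identical to the single-formula calculation, because the two formulas are algebraically equivalent. Rounding the initial value up to 385 first shifts the result slightly to about 251.2. Both are consistent with the sample size of 251 reported in the text.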

Therefore, in this study, a sample size of 251 SMEs was used. Each of the sub-regions (West Nile, Madi, Acholi, Lango and Karamoja) contributed 50 SMEs to the study.

In document Business information systems design for Uganda's economic development: the case of SMES in northern Uganda (Page 126-130)