Economic and Social Review, Vol. 10, No. 2, January, 1979, pp. 169-174.
Notes a n d C o m m e n t s
R A N S A M : A R a n d o m Sample Design
for I r e l a n d
B R E N D A N J . W H E L A N
The Economic and Social Research Institute, Dublin
I I N T R O D U C T I O N
T
he purpose o f this paper is t o describe R A N S A M , the computer-based system for d r a w i n g r a n d o m samples f r o m the Electoral Register, w h i c h has been developed over the past few years at E S R I .1 This system, w h i c h isupdated each year b y reference t o the most recent Electoral Register, permits us t o offer a sample selection service t o those c o n t e m p l a t i n g carrying o u t a survey o n a local (e.g., regional) or n a t i o n a l level. The c o m p u t e r costs i n selecting samples are q u i t e l o w , and the names and addresses can b y t y p e d at reasonable cost. Such samples have the advantage t h a t we can ensure that the selected respondents have n o t cropped u p i n any o f the surveys conducted b y the E S R I i n the current year.
This paper first discusses the use o f the Electoral Register as a sampling frame. I t t h e n goes o n t o describe the m a i n features o f R A N S A M and the reasons w h y these features have been i n c l u d e d . I t should be n o t e d that t h r o u g h o u t the paper we are concerned w i t h r a n d o m or p r o b a b i l i t y samples (i.e., samples i n w h i c h the p r o b a b i l i t y o f inclusion o f each element is
able). N o n - r a n d o m samples (e.g., quota samples) can also be based o n the present selection system. For instance, we sometimes use R A N S A M t o select a r a n d o m sample o f names and addresses as starting points for the interviewers and instruct t h e m t o call o n the selected respondent and o n , say, every t w e n t i e t h house thereafter.
I I T H E E L E C T O R A L R E G I S T E R A S A S A M P L I N G F R A M E
The Electoral Register, w h i c h comes i n t o force o n 15 A p r i l o f each year, lists the names and addresses o f all those eligible t o vote i n Dail elections
(resident citizens) and those eligible t o vote i n local government elections (all residents). Persons eligible t o vote i n local government elections o n l y are indicated b y the letter L . The Register is c o m p i l e d b y the franchise depart ments o f the c o u n t y councils and c o u n t y b o r o u g h councils.
The Register is published i n the f o r m o f " b o o k s " , each relating t o " p o l l ing d i s t r i c t s " , i n w h i c h the electors are n u m b e r e d sequentially f r o m 1 t o n , where n is the t o t a l number o f people i n the b o o k . The c o u n t y , constituency, and D i s t r i c t Electoral Divisions (DEDs) i n w h i c h the p o l l i n g district is located are s h o w n . I n general, the p o l l i n g districts are n o t sub-divisions o f the DEDs — one p o l l i n g d i s t r i c t may contain parts o f several D E D s . This is n o t the case i n the urban areas o f C o r k and D u b l i n where there is a one-to-one correspondence between DEDs (or wards) and the p o l l i n g districts.
K i s h ( 1 9 6 5 , p . 53) describes a perfect sampling frame as one i n w h i c h "every element appears o n the list separately once, o n l y once, and n o t h i n g else appears on the l i s t " . Measured against this y a r d s t i c k , the Irish Electoral Register has certain deficiencies.
Age Restriction: Under current l a w , the Register excludes those under 18
years o f age. Thus, a sample w h i c h includes persons under 18 cannot be obtained d i r e c t l y f r o m the Register.
Missing Elements: Individuals over 18 w h o are resident i n the
country-may fail t o appear o n the Register. Most o f these are people w h o have o n l y just reached their 1 8 t h b i r t h d a y or w h o have o n l y recently moved i n t o the d i s t r i c t .
Superfluous Elements: The Register includes the names o f some deceased
electors and o f some w h o have left the d i s t r i c t w i t h o u t cancelling their previous registration.
Despite these difficulties, the Electoral Register is still the most appro priate list for use as a sampling frame t o select representative samples at reasonable cost.
(a) c o u n t y , (b) constituency,
(c) p o l l i n g district name and i d e n t i f i c a t i o n code,
(d) District Electoral Division(s) i n t o w h i c h the p o l l i n g district falls, (e) map reference o f one o f the DEDs i n t o w h i c h p o l l i n g district falls, (f) n u m b e r o f electors i n p o l l i n g district,
(g) (for D u b l i n ) : a r a n k i n g o f the w a r d ( D E D ) b y percentage o f males i n unskilled occupations (1971) Census),
(h) (for D u b l i n ) : a r a n k i n g o f the w a r d ( D E D ) b y percentage o f house holds o w n i n g cars ( 1 9 7 1 Census),
(i) (for D u b l i n ) : a r a n k i n g o f the w a r d ( D E D ) b y an overall socioeco n o m i c status index (based o n 1971 Census).
These cards are sorted b y alphabetical order o f constituency. W i t h i n each constituency, the order o f the cards is determined i n the manner described b e l o w under the heading " S t r a t i f i c a t i o n " . B y using these cards as an i n p u t i n t o the R A N S A M program a n a t i o n a l r a n d o m sample is obtained i n the f o r m o f a set o f p o l l i n g districts (books) and the registration numbers o f the selected electors w i t h i n these b o o k s .
I l l M A I N F E A T U R E S O F R A N S A M
I t w o u l d , o f course, be possible t o select a simple r a n d o m sample (srs) f r o m the t w o m i l l i o n names o n the Register. However, this has many dis advantages such as high cost, impractical scattering o f respondents and higher sampling variances than c o u l d be achieved b y other methods o f sampling. Samples selected b y R A N S A M incorporate three m a i n modifications w h i c h distinguish t h e m f r o m simple r a n d o m samples: (i) multi-stage sampling, ( i i ) selection w i t h p r o b a b i l i t y p r o p o r t i o n a l t o size and (iii) s t r a t i f i c a t i o n . We n o w consider each o f these i n t u r n .
Multi-stage Sampling
Instead o f t a k i n g a sample o f individuals f r o m the Register, R A N S A M first selects a set o f p r i m a r y sampling units (PSUs) and then selects a r a n d o m sample o f individuals w i t h i n each o f the PSUs. The PSUs used are the p o l l i n g districts i n t o w h i c h the Electoral Register is divided. I n general, this t y p e o f sample w i l l be expected t o be less efficient t h a n a srs, b u t b y concentrating the selected individuals i n t o certain geographical areas, one can reduce costs t o such an extent as t o more than j u s t i f y the decrease i n efficiency.
t h r o u g h o u t the c o u n t r y were selected and 4 0 electors selected f r o m each d i s t r i c t . Each interviewer i n the survey was allocated one or t w o o f these d i s t r i c t s .2
The p o l l i n g districts vary enormously i n size — f r o m a few dozen electors t o several thousand. A p r o b l e m m a y , therefore, arise i n small p o l l i n g districts i n t h a t t w o or m o r e people f r o m the same household m a y crop up i n the sample. This is undesirable, b o t h f r o m the interviewer's p o i n t o f view and because the intra-household c o r r e l a t i o n o f response is l i k e l y t o be h i g h . (See Cochran, 19 6 3, p . 2 1 0 , o n this p r o b l e m . ) R A N S A M avoids this d i f f i c u l t y b y a l l o w i n g the user t o specify a m i n i m u m size for the PSUs. A n y p o l l i n g dis t r i c t b e l o w this size is amalgamated w i t h a contiguous one t o f o r m a com b i n e d PSU greater than the required size. A t y p i c a l value o f the m i n i m u m size w o u l d be about 5 0 0 .
Probability Proportional to Size
A n i m p o r t a n t feature o f the R A N S A M design is that i t is "epsem" (i.e., each element (elector) i n the p o p u l a t i o n has an equal p r o b a b i l i t y o f selec t i o n ) . Epsem procedures have some i m p o r t a n t advantages, n o t a b l y the fact t h a t the estimates derived f r o m such a sample are self-weighting (see K i s h , 1 9 6 5 , p p . 20-22).
I n order t o show w h y the sample selected is epsem, let us examine h o w the sampling procedure operates. T o use the procedure one specifies
n = the (constant) n u m b e r o f electors t o be chosen f r o m each PSU, k = the n u m b e r o f PSUs t o be selected,
m = the m i n i m u m desired PSU size.
The p r o g r a m w i l l t h e n decide the n u m b e r o f PSUs t o be selected f r o m each constituency b y f o r m i n g a cumulative list o f the populations o f the con stituencies and sampling systematically f r o m a r a n d o m start. W i t h i n each c o n s t i t u e n c y , i t t h e n amalgamates p o l l i n g districts b e l o w the desired m i n i m u m size w i t h contiguous p o l l i n g districts. N e x t , i t selects the p o l l i n g dis tricts t o be i n c l u d e d i n the sample b y f o r m i n g a cumulative list o f the populations o f the p o l l i n g districts and sampling systematically f r o m a r a n d o m start. T h u s , the p r o b a b i l i t y o f a given PSU being selected is propor t i o n a l t o its size. F o r each p o l l i n g district selected, a r a n d o m start and an interval are t h e n calculated and the registration numbers o f the n selected electors listed. F i n a l l y , the names and addresses o f the selected electors are f o u n d and t y p e d .
Thus, the p r o b a b i l i t y o f selecting the i t h element o f the j t h PSU is
Pr(Ej.) = 'Pr(PSUj)- P r ( E; j| PSUj selected)
n n
= constant
where Nj is the size o f the j t h PSU, and n the constant number o f elements selected f r o m each PSU. Hence, the p r o b a b i l i t y o f selection o f each element i n the p o p u l a t i o n is equal.
Stratification
This means that the p o p u l a t i o n is d i v i d e d i n t o sub-groups or strata and a srs is selected f r o m each s t r a t u m . This procedure w i l l almost always result i n an i m p r o v e m e n t i n precision (Cochran, 1 9 6 3 , p . 9 9 ) . S t r a t i f i c a t i o n is incor porated i n R A N S A M o n l y at the first stage, all PSUs being stratified b y geo graphical l o c a t i o n .
I n order t o stratify the PSUs geographically, one w o u l d ideally l i k e t o have maps o f the p o l l i n g districts. U n f o r t u n a t e l y , n o such maps are available, so one must utilise the maps o f the DEDs available f r o m the Ordnance Survey. Given the lack o f one-to-one correspondence between p o l l i n g dis tricts and D E D s , these maps o n l y give the approximate l o c a t i o n o f the p o l l i n g
districts. F u r t h e r m o r e , w h e n one p o l l i n g d i s t r i c t contains parts o f several D E D s , i t is necessary t o i d e n t i f y each p o l l i n g d i s t r i c t u n i q u e l y w i t h one o f the DEDs i n some w a y . The rather arbitrary rule w h i c h we adopted t o do this was t o i d e n t i f y the p o l l i n g d i s t r i c t w i t h the D E D w h i c h comes alpha betically first. T h o u g h far f r o m perfect, this rule allows one t o locate the p o l l i n g districts a p p r o x i m a t e l y , and this is really all that is required for the purposes o f s t r a t i f i c a t i o n .
The order o f the DEDs w i t h i n each constituency is determined as follows. Each D E D is n u m b e r e d f r o m 1 t o t , where t is the t o t a l n u m b e r o f DEDs i n the constituency. The numbers are allocated b y starting i n the extreme n o r t h o f the constituency, m o v i n g west t o east, then east t o west and so o n (see Figure 1 ) . Ordering the p o l l i n g districts b y this n u m b e r i n g system and selecting a systematic r a n d o m sample means that those selected are stratified geographically across the constituency.
T o date, R A N S A M has been used t o select samples for some t w e n t y surveys and seems t o produce sample estimates w h i c h correspond well t o k n o w n data o n such variables as age, sex, occupational status, etc. O n the w h o l e , the researchers w h o have used these samples seem satisfied w i t h t h e m .
Figure 1
Constituency b o u n d a r y D E D b o u n d a r y
I t is hoped t o develop the system i n several ways over the n e x t few years. W o r k is already at an advanced stage t o l i n k the Electoral Register w i t h the Small Area data f r o m the Census o f P o p u l a t i o n . This w o u l d p e r m i t one t o stratify the PSUs b y a n u m b e r o f variables (such as age structure, socio economic status, etc.) i n a d d i t i o n t o the simple geographical stratification n o w i n c o r p o r a t e d i n the system. A slightly m o r e long-term objective is"the calculation o f t y p i c a l standard errors and design effects for samples d r a w n b y R A N S A M . Several surveys already exist at E S R I o n w h i c h such calcula tions c o u l d be based and more are planned for the f u t u r e .
REFERENCES