Program GROUP: the identification of all possible solutions to a constituency-delimitation problem


D J Rossiter, R J Johnston

Department of Geography, University of Sheffield, Sheffield S10 2TN, England

Received 9 May 1980

Abstract. A computer program is designed to produce all of the electoral constituencies for an English local authority, within the constraints imposed on the Parliamentary Boundary Commissioners.

This builds on the seminal work of Gudgin and Taylor, and introduces size constraints, evaluates shape, and evaluates the electoral consequences of swings in voter opinion.

Work on electoral bias in recent years, notably that undertaken by Gudgin and Taylor (1979), has demonstrated that the concept of a nonpartisan cartography is a myth.

The boundaries of Parliamentary constituencies in Great Britain are defined by an explicitly nonpartisan body, the Parliamentary Boundary Commissioners, and yet the consequences of their decisions are almost always highly partisan.

The Parliamentary Boundary Commissioners work within certain constraints, defined by the rules under which they operate. [These rules are set out in Boundary Commission for England (1954); see also Taylor and Gudgin (1976a).] Except for situations where the Commissioners can indicate that it is necessary to break these rules, in order to produce a workable set of constituencies, the constraints are:

(1) That each major local authority unit (county, county borough, and metropolitan borough) should be treated separately in defining constituencies. (Pre-1974 local government reorganisation terminology is used, as the last full redistribution was completed in 1970.) Thus no constituency includes portions of more than one local authority.

(2) That the districts within each local authority (urban and rural districts in counties, wards in county and metropolitan boroughs) should not be split between two or more constituencies.

(3) That constituencies should, as far as possible, have the same number of electors as the national electoral quota (the national electorate divided by the number of constituencies).

Thus the task facing the Boundary Commissioners is to take every separate major local authority, to determine the number of constituencies it should contain, and to delimit those constituencies, by using the wards/districts as the building blocks, and by ensuring that the resulting electorates are approximately equal. (The main situations when the basic rules are overridden concern small authorities, which are amalgamated with a neighbour.) It is assumed, of course, that the resulting constituencies will be contiguous areal units.

Within these rules, the Boundary Commissioners are frequently faced with a large number of possible solutions. (For the old county borough of Sheffield, for example, the 1970 redistribution required the twenty-seven wards to be grouped to form six constituencies. The algorithm to be described in this paper identified 15 937 possible solutions, within the given constraints.) The choice made by the Commissioners is of considerable interest to electoral geographers, as it is the basic component of the electoral bias that results from the operation of a nonpartisan cartography. What they need to know is whether the constraints operating on the Commissioners, plus the geography of votes (the distribution of party preferences across the wards/districts plus the relative disposition of those wards/districts) are such that a certain electoral bias is inevitable. To find this out, they must establish the entire set of solutions available to the Commissioners.

To date, two pieces of research have been reported which attempt to investigate this issue. One refers to the United States, and gives few details as to its methods (Engstrom, 1977; Engstrom and Wildglen, 1977). The other, which builds on some American work on school boards (Jenkins and Shepherd, 1972), refers to the work of the English Boundary Commissioners (Taylor and Gudgin, 1976b): the algorithm presented here represents a substantial development on that seminal work.

The programming problem

The program used by Taylor and Gudgin provides only a first approximation to the task outlined here. Basically, it takes a local authority area and produces all the groups of wards which are contiguous, given the number of constituencies required.

Thus for Sunderland, which contains eighteen wards and has two constituencies, they produced all of the possible groupings of two sets of nine contiguous wards as the population of potential solutions available to the Commissioners. This implicitly assumed equal ward populations, and was therefore only an approximation to the rule requiring that constituencies have approximately equal populations. Thus we attempted to develop an algorithm which would more nearly reproduce the problem facing the Commissioners.

Two other aspects of the districting procedure and its consequences were ignored by Taylor and Gudgin. The first concerns constituency shape. Circumstantial evidence suggests [and our research confirms: Johnston and Rossiter (1981)] that the Commissioners prefer to delimit compact constituencies rather than odd-shaped ones; this being so, many of the solutions identified may not be feasible because of the shape of one or more of the constituencies. Thus we defined a measure of shape and used it to index every potential solution. Second, Taylor and Gudgin made no estimate of the probable effect on the electoral bias generated by the Commissioners' solution of a swing in voter preferences; we incorporated a subroutine which enabled an appreciation of the likely consequences of swings.

Building the basic algorithm

The problem facing us was a combinatorial one, of generating all possible combinations of J wards into K constituencies. For any simple problem, the total number of solutions to this problem, T, is given by $T = J!/(L!)^K$, where $L = J/K$. For almost every local authority, T is extremely large: for the Sunderland example, where J = 18 and K = 2, T = 48620. But most of these are not feasible solutions, some because the constituencies do not form contiguous groups of wards, others because they do not meet the constraints of electorate size. (Taylor and Gudgin have identified only eighty-seven feasible solutions for Sunderland.) To search all the possible solutions for the feasible ones would be extremely time-consuming, especially for a problem as large as Sheffield, which has J = 27, K = 6; further, if L is noninteger, additional problems are encountered.
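The count quoted for Sunderland can be checked directly. The following is a minimal sketch, assuming the formula is read as T = J!/(L!)^K with L = J/K an integer; the function name `total_groupings` is ours and is not part of program GROUP.

```python
from math import factorial


def total_groupings(j: int, k: int) -> int:
    """Return T = J! / (L!)^K, where L = J/K must be a whole number."""
    if j % k != 0:
        # The text notes that a noninteger L raises additional problems;
        # this sketch simply refuses such cases.
        raise ValueError("L = J/K must be an integer for this formula")
    wards_per_seat = j // k
    return factorial(j) // factorial(wards_per_seat) ** k


# Sunderland: eighteen wards, two constituencies.
print(total_groupings(18, 2))  # 48620
```

For Sheffield (J = 27, K = 6) the function raises an error, since L = 4.5 is not an integer.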

The procedure adopted was based on one described by Openshaw (1977). This takes a set of wards and builds up constituencies by random selections of ward cores and then adding neighbours to these cores until all wards are in a constituency. This procedure produces many 'failures', however, because it does not aim to produce constituencies of approximately equal numbers of wards: the random selection may well result in two or more cores close together, in which case one or more of the constituencies may be small (outside the constraints) because none of its neighbours (or neighbours of its neighbours) remains unallocated to a constituency.


To circumvent this problem as far as possible, and to produce an algorithm which would lead to as few 'failures' as possible, the program developed proceeded in the following way. The basic information which it requires is the electorate in each ward and a list of the other wards with which it shares a common boundary. In the algorithm, constituencies are built successively (Openshaw's procedure builds them concurrently) so that as soon as a failure is identified, the run is aborted. (The usual cause of failure is 'island creation'; a ward is an enclave of one or more completed constituencies, to which it cannot be added because of the size constraints, neither can it be placed into any other constituency.)

Step 1: Core selection. To ensure that the cores are not clustered within the local authority, and thus likely to produce a failure, the authority is stratified into K sets of contiguous wards, with approximately J/K wards in each stratum. This stratification is input to the program.

Step 2: Initiation. A core ward is randomly selected from each stratum.

Step 3: Choice of first constituency. Of the selected core wards, that with the smallest number of contiguous wards is chosen as the base for building the first constituency.

Step 4: An electoral quota for this constituency is selected (see next section).

Step 5: All of the wards contiguous to the core ward, which are not already allocated to a constituency, are listed.

Step 6: One of the wards listed is selected randomly (or quasirandomly, see the section 'Modifying the procedure') and added to the constituency.

Step 7: The total electorate of the constituency is identified. If this is further from the quota (step 4) than was the case before the ward was added, the constituency is considered complete. (This rule means that, for example, with a quota of 20000, if before ward k is added the constituency population is 19000 and afterwards it is 21000, the procedure stops.)

During steps 5-7, every time a ward is added to a constituency a check is made to see if this will lead to island creation—separation of a ward into an enclave. If this is so, the potential enclave ward is added to the ward selected at step 6, and they are treated as a single ward at step 7. This avoids many potential island creation failures.

Step 8: Selection of a new core. Of the remaining cores selected at step 2 and still not in a constituency, that with the smallest number of neighbours not yet in a constituency is chosen as the core for the next constituency. The algorithm then returns to step 4.

Step 9: After all constituencies have been defined, and a failure has not occurred (that is, all are within the electoral quotas):

(a) the shape index (see below) is computed;

(b) the solution's uniqueness (see below) is tested; and

(c) electoral statistics are computed, if required.

The algorithm then returns to step 2, unless all of the prescribed number of runs have been undertaken.
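Steps 2 to 8 can be sketched as follows. This is an illustrative reading, not the authors' Fortran: it covers MODE = 1 only, omits the enclave check made during steps 5-7, draws the quota uniformly within the bounds given in the next section, and assumes that a ward which fails the step 7 test is left out of the constituency (the test is taken as 'no closer to the quota', which also covers the worked example in step 7). All names, and the six-ward authority used to exercise it, are hypothetical.

```python
import random


def build_solution(electorate, neighbours, strata, kvarn, rng=random):
    """One run of steps 2-8 (MODE = 1). Returns {ward: constituency number},
    or None if the run fails. Enclave merging is omitted in this sketch."""
    k = len(strata)
    te = sum(electorate.values())
    mean = te / k
    tolerance = mean * kvarn / 100
    unallocated = set(electorate)
    cores = [rng.choice(sorted(stratum)) for stratum in strata]  # step 2
    assignment, ce = {}, 0
    for n in range(1, k + 1):
        # Steps 3 and 8: the free core with the fewest unallocated neighbours.
        free = [c for c in cores if c in unallocated]
        if not free:
            return None  # every remaining core was absorbed elsewhere
        core = min(free, key=lambda c: sum(v in unallocated for v in neighbours[c]))
        # Step 4: quota drawn uniformly within the narrowing bounds (assumed).
        remaining_mean = (te - ce) / (k - n + 1)
        spread = kvarn * (k - n) / (100 * (k - 1)) if k > 1 else 0.0
        quota = rng.uniform(remaining_mean * (1 - spread), remaining_mean * (1 + spread))
        members, total = {core}, electorate[core]
        unallocated.discard(core)
        while True:
            # Step 5: unallocated wards touching the growing constituency.
            candidates = sorted({v for m in members for v in neighbours[m]} & unallocated)
            if not candidates:
                break
            ward = rng.choice(candidates)  # step 6
            # Step 7: stop once an addition no longer brings the total closer.
            if abs(total + electorate[ward] - quota) >= abs(total - quota):
                break
            members.add(ward)
            unallocated.discard(ward)
            total += electorate[ward]
        if abs(total - mean) > tolerance:
            return None  # abort as soon as a constituency breaks the size limits
        for ward in members:
            assignment[ward] = n
        ce += total
    return assignment if not unallocated else None


# Hypothetical authority: six equal wards in a row, two constituencies.
electorate = {w: 100 for w in range(1, 7)}
neighbours = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
strata = [{1, 2, 3}, {4, 5, 6}]
rng = random.Random(0)
runs = [build_solution(electorate, neighbours, strata, 10, rng) for _ in range(300)]
successes = [r for r in runs if r is not None]
print(len(successes), "successful runs out of", len(runs))
```

In this toy authority the only feasible solution is wards 1-3 against wards 4-6; runs that start from cores 2 and 5, for instance, strand a ward and fail, which is the 'island creation' described above.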

Calculating electoral quotas (step 4)

Given the total electorate (TE) of the local authority being studied, the average electorate per constituency should be TE/K. However, some percentage variation about this will be allowed; the size of that percentage (KVARN) is input to the program. The electoral quota for the first constituency is randomly selected within the given bounds as

$$\left(1 \pm \frac{\mathrm{KVARN}}{100}\right)\frac{\mathrm{TE}}{K}.$$

Once a constituency has been defined, the range of possible electorate sizes for the next is changed. Thus for the nth constituency, the electoral quota is randomly selected from

$$\left(1 \pm \frac{\mathrm{KVARN}\,(K - n)}{100\,(K - 1)}\right)\frac{\mathrm{TE} - \mathrm{CE}}{K - n + 1},$$

where CE is the electorate already allocated to constituencies.
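As a check on how the bounds narrow, the following sketch computes them for the nth constituency. It assumes the expression is read as (1 ± KVARN(K − n)/(100(K − 1))) × (TE − CE)/(K − n + 1), and that 'randomly selected' means a uniform draw between the two bounds; the function names are ours.

```python
import random


def quota_bounds(n, k, te, ce, kvarn):
    """Lower and upper quota bounds for the nth of k constituencies.

    te is the total electorate, ce the electorate already allocated,
    kvarn the permitted percentage deviation.
    """
    remaining_mean = (te - ce) / (k - n + 1)
    # The permitted spread shrinks linearly to zero for the last constituency.
    spread = kvarn * (k - n) / (100 * (k - 1)) if k > 1 else 0.0
    return remaining_mean * (1 - spread), remaining_mean * (1 + spread)


def draw_quota(n, k, te, ce, kvarn, rng=random):
    """Draw a quota uniformly between the bounds (an assumed distribution)."""
    low, high = quota_bounds(n, k, te, ce, kvarn)
    return rng.uniform(low, high)


# Hypothetical authority: 600000 electors, six seats, KVARN = 10.
print(quota_bounds(1, 6, 600000, 0, 10))       # first constituency
print(quota_bounds(6, 6, 600000, 510000, 10))  # last constituency
```

For n = 1 the bounds reduce to (1 ± KVARN/100)TE/K, and for n = K they collapse onto the electorate still unallocated, so the last constituency must take whatever remains.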

The measurement of shape

Conventional measures of the shape of a constituency seek to relate it to the most compact form, a circle. This is frequently irrelevant, however, as most wards have boundaries which are far from either straight or the perimeter of a circle. Taylor (1973) recognised this, and produced an alternative index. It is applied to individual constituencies only, however, and when applied to a set of constituencies it is likely to be a constant. For a set of constituencies a simple index was devised by categorising boundaries into three types: (1) external: the perimeter of the local authority; (2) internal type A: the ward boundaries which are used as constituency boundaries; and (3) internal type B: the ward boundaries which are not used as constituency boundaries. As shown elsewhere (Johnston and Rossiter, 1981), the more compact the whole set of constituencies the longer the length of the internal type B boundaries. Thus the shape index, KSHAPE, is that length. [There is no value in expressing internal type B length as a percentage of the length (internal type A + internal type B) except for interauthority comparative studies, and there seems little demand for these. The percentage could be easily obtained, however.]
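The index amounts to summing the lengths of those ward boundaries whose two wards fall in the same constituency. A minimal sketch follows; the data layout (one entry per pair of contiguous wards) and the four-ward example are assumptions made for illustration.

```python
def kshape(boundary_lengths, assignment):
    """Total length of internal type B boundaries.

    boundary_lengths maps a pair of contiguous wards (a, b), listed once,
    to the length of their shared boundary; assignment maps each ward to
    its constituency.
    """
    return sum(
        length
        for (a, b), length in boundary_lengths.items()
        if assignment[a] == assignment[b]  # boundary not used by any constituency
    )


# Hypothetical authority of four wards arranged in a ring.
boundary_lengths = {(1, 2): 3.0, (2, 3): 2.0, (3, 4): 4.0, (1, 4): 1.0}
assignment = {1: "A", 2: "A", 3: "B", 4: "B"}
print(kshape(boundary_lengths, assignment))  # 3.0 + 4.0 = 7.0
```

Grouping wards 1 and 4 against 2 and 3 instead gives 1.0 + 2.0 = 3.0, so the first grouping counts as the more compact of the two.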

Establishing the uniqueness of the solution

As the program works on the basis of random selections it regularly produces the same solution. For step 9(b), therefore, a two-part index number is produced. That part preceding the decimal point is KSHAPE. That part after the decimal point is the average constituency deviation (in proportions) from the authority's electoral quota. A check at step 9(b) inquires whether this index number has already been produced: if so, the solution is deemed a failure.
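The two-part index number might be formed as below. This is a sketch under stated assumptions: KSHAPE is treated as an integer, the deviation is the mean absolute deviation from TE/K expressed as a proportion (and so less than one), and the index is rounded to six decimal places before comparison. The rounding and the function names are ours. Two distinct solutions that happened to share both parts would be treated as identical under such a scheme.

```python
def solution_index(kshape, electorates):
    """KSHAPE before the decimal point, mean proportional deviation after it."""
    quota = sum(electorates) / len(electorates)
    mean_deviation = sum(abs(e - quota) for e in electorates) / (len(electorates) * quota)
    return round(int(kshape) + mean_deviation, 6)


def is_new_solution(kshape, electorates, seen):
    """Record the index in `seen`; return False if it was already present."""
    index = solution_index(kshape, electorates)
    if index in seen:
        return False
    seen.add(index)
    return True


seen = set()
print(solution_index(120, [19000, 21000]))        # 120.05
print(is_new_solution(120, [19000, 21000], seen))  # True
print(is_new_solution(120, [19000, 21000], seen))  # False
```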

An extension for the Euro constituencies

The program was developed at the time when the Boundary Commissioners were required to produce a set of constituencies for the European Parliament elections.

The existing (1970 redistribution) Westminster Parliament constituencies were used as the building blocks. When investigating this procedure for Greater London, it was noted that the Commissioners preferred solutions in which a European constituency did not incorporate Westminster constituencies from more than one metropolitan borough. Thus for this problem, an extra constraint (the 'London approach') was included: at step 5, the listing of contiguous constituencies includes only those in the same borough as the core, unless there are no such contiguous constituencies in that borough.
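The extra constraint is a filter on the step 5 list. A minimal sketch, with hypothetical names and a hypothetical borough lookup:

```python
def london_candidates(candidates, core, borough_of):
    """Restrict the step 5 list to the core's borough where possible."""
    same_borough = [c for c in candidates if borough_of[c] == borough_of[core]]
    # Fall back to the full list only when the borough offers no candidate.
    return same_borough if same_borough else list(candidates)


# Hypothetical lookup: Westminster constituencies 1-2 in borough A, 3-4 in B.
borough_of = {1: "A", 2: "A", 3: "B", 4: "B"}
print(london_candidates([2, 3], 1, borough_of))  # [2]
print(london_candidates([3, 4], 1, borough_of))  # [3, 4]
```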

Modifying the election result

As pointed out above, Taylor and Gudgin estimated the electoral bias given a particular pattern of voting across the wards. The program developed here operates in the same way: if ward voting data are input, it will determine which party would win each constituency in each solution. It also applies the concept of uniform swing in each constituency (Butler and Stokes, 1974) to identify what would happen if there were a change in voter preferences. A matrix (JERRY) is produced. Its vertical axis lists all possible results (that is, in a two-constituency town these would be L = 2, C = 0; L = 1, C = 1; L = 0, C = 2). Its horizontal axis has seventeen categories. The ninth uses the input voting data; columns 10-17 represent swings of 1, ..., 8 percent to the party of the right; columns 1-8 represent swings of 8, ..., 1 percent to the party of the left. The entries in the matrix are the number of solutions producing each result: each column sum is equal to the total number of solutions.
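The construction of the matrix can be sketched as follows. The sketch assumes that each constituency is described by the percentage votes of the two named parties only, that a uniform swing of s percent adds s to one party and subtracts s from the other, and that a tie is awarded to the party of the left; votes for other parties are ignored. The function name and the two example solutions are hypothetical.

```python
def jerry_matrix(solutions, n_seats, max_swing=8):
    """Count solutions by seats won by the party of the right under each swing.

    solutions: list of solutions, each a list of (left_pct, right_pct) pairs,
    one pair per constituency. Row r counts solutions giving the right r seats;
    column index 0 is an 8 percent swing to the left, index 8 is no swing and
    index 16 an 8 percent swing to the right.
    """
    n_cols = 2 * max_swing + 1
    matrix = [[0] * n_cols for _ in range(n_seats + 1)]
    for constituencies in solutions:
        for col, swing in enumerate(range(-max_swing, max_swing + 1)):
            right_seats = sum(
                1 for left, right in constituencies if right + swing > left - swing
            )
            matrix[right_seats][col] += 1
    return matrix


# Two hypothetical solutions for a two-constituency town.
solutions = [[(55, 45), (48, 52)], [(51, 49), (53, 47)]]
for row in jerry_matrix(solutions, 2):
    print(row)
```

Each column of the result sums to the number of solutions, as stated in the text.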

Modifying the procedure

By means of the procedure outlined earlier, many of the solutions have relatively (for the local authority) small shape indices. To save computing time, and to produce only the 'more compact' or 'more shapely' solutions, the program can be run in one of three modes by setting a MODE parameter to 1, 2, or 3:

MODE = 1 The procedure already described.

MODE = 2 With MODE = 1, all contiguous neighbours listed at step 5 have the same probability of being selected. In this mode, each neighbour's probability of being selected is proportional to the reciprocal of its own number of links with as yet unclassified wards. This ensures a bias towards the more compact solutions.

MODE = 3 This exaggerates the bias introduced in the second mode, and uses as a numerator the length of the boundaries of the ward with those wards already in the constituency; the denominator remains the same.
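The three modes differ only in the weights attached to the wards listed at step 5. The sketch below is one reading of the description: the denominator is the candidate's number of links with unallocated wards, taken as at least one to avoid division by zero (our assumption), and in MODE = 3 the numerator is the length of boundary shared with wards already in the constituency. The names and the small example are hypothetical.

```python
import random


def selection_weights(candidates, members, unallocated, neighbours, lengths, mode):
    """Relative selection weights for the step 5 candidates under each MODE."""
    weights = []
    for ward in candidates:
        if mode == 1:
            weights.append(1.0)  # all candidates equally likely
            continue
        free_links = sum(1 for v in neighbours[ward] if v in unallocated)
        denominator = max(free_links, 1)  # guard against zero (assumption)
        if mode == 2:
            numerator = 1.0
        else:
            # MODE = 3: boundary shared with wards already in the constituency.
            numerator = sum(
                lengths[frozenset((ward, v))] for v in neighbours[ward] if v in members
            )
        weights.append(numerator / denominator)
    return weights


def pick_ward(candidates, weights, rng=random):
    """Quasirandom selection of one candidate in proportion to its weight."""
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: ward 1 is the core; wards 2 and 3 are candidates.
neighbours = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
lengths = {
    frozenset((1, 2)): 2.0, frozenset((1, 3)): 4.0, frozenset((2, 3)): 1.0,
    frozenset((2, 4)): 1.0, frozenset((3, 4)): 1.0,
}
members, unallocated, candidates = {1}, {2, 3, 4}, [2, 3]
for mode in (1, 2, 3):
    print(mode, selection_weights(candidates, members, unallocated, neighbours, lengths, mode))
```

In the example, MODE = 3 makes ward 3 twice as likely to be chosen as ward 2, because it shares the longer boundary with the core.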

Running the program

The program is too long to be reproduced here (a Fortran listing can be obtained from the second author). This section describes the input and output. (Format statements for the input are in the program.)

The input

Control card 1:

NRUNS the number of runs for the data set.

KVARN the percentage deviation from the electorate average allowed.

MODE which procedure to be used with regard to compact solutions.

KPOLL enter 1 if data on party strength are to be input.

KSTRNT enter shape index; no solutions with a shape index less than this value will be printed.

LONDON enter 1 if the 'London approach' is required.

Control card 2:

TOTAL the total electorate of the local authority(1).

NWARDS the number of wards.

NSEATS the number of constituencies to be created.

NLINKS the total number of links between contiguous wards.

Control card 3:

ISET a vector (length NWARDS) allocating the wards to the NSEATS sets from which the random cores are selected. Each ward is given a number representing the set it is in.

Data set 1:

JDATA a matrix with six columns and NWARDS rows. For each ward, the entries in the columns are:

Column 1 the number of links with contiguous wards,

Column 2 the electorate at the time of redistribution,

Column 3 the electorate at the time of the election being studied (this may be the same as in column 2),

Column 4 the vote for the first-named party,

Column 5 the vote for the second-named party,

Column 6 the vote for any other parties(2).

If KPOLL = 0 columns 3 to 6 are left empty.

(1) In the 'London approach' local authority refers to the set of local authorities being studied.

Data set 2:

LINKS a matrix with NLINKS rows and either two or three columns. The matrix is read in column by column. The whole of column 1 is read in first, followed (starting on a new card) by the whole of column 2, etc.

Column 1 a list of the wards with which each ward in turn is linked (thus if ward 1 is linked with wards 2, 4, and 5 and ward 2 with wards 1, 3, and 6 the first six entries in the column are 2, 4, 5, 1, 3, 6).

Column 2 a list of the length of the boundaries for the wards listed in column 1 (thus the first value is the length of the boundary separating wards 1 and 2, the second that separating wards 1 and 4, etc); any metric can be used.

Column 3 used if LONDON = 1. In this a 1 shows that the two linked Westminster constituencies are in the same borough, a 0 that they are not.
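Because LINKS is a flat list, the per-ward link counts in column 1 of the ward data matrix are needed to cut it back into neighbour lists. A minimal sketch of that step follows, using the two wards given in the example above; the boundary lengths are invented for illustration and the function name is ours.

```python
def build_adjacency(link_counts, linked_wards, boundary_lengths):
    """Cut the flat LINKS columns into per-ward lists of (neighbour, length).

    link_counts[i] is the number of links of ward i + 1; linked_wards and
    boundary_lengths are columns 1 and 2 of LINKS, in the same order.
    """
    if sum(link_counts) != len(linked_wards) or len(linked_wards) != len(boundary_lengths):
        raise ValueError("link counts do not match the length of the LINKS columns")
    adjacency = {}
    position = 0
    for ward, n_links in enumerate(link_counts, start=1):
        stop = position + n_links
        adjacency[ward] = list(zip(linked_wards[position:stop], boundary_lengths[position:stop]))
        position = stop
    return adjacency


# First two wards of the example in the text; lengths are hypothetical.
link_counts = [3, 3]
linked_wards = [2, 4, 5, 1, 3, 6]
boundary_lengths = [1.0, 2.0, 3.0, 1.0, 4.0, 5.0]
print(build_adjacency(link_counts, linked_wards, boundary_lengths))
```

Each boundary appears twice in a complete data set (once from each side), which is why the boundary between wards 1 and 2 has the same length in both lists.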

If LONDON = 1 control card 4 and data sets 3 and 4 are used.

Control card 4:

NBOROS the number of boroughs.

NCONSTS the number of Westminster constituencies (taken to be wards for operating the program).

Data set 3:

JLBOR a vector (of length NBOROS) listing the number of Westminster constituencies in each borough.

Data set 4:

LBOROS a vector (of length NCONSTS) listing the borough which each Westminster constituency is in.

The output

For each successful run of the program, the following information is printed (on an output channel selected by the user):

Run number; KSHAPE; a listing of the number of seats won by party 2 in each of the seventeen categories of the horizontal axis of JERRY; and, for each constituency, the percentage swing needed to change the result, and its member wards. This is followed by summary figures: NRUNS, the number of successful solutions, the average of KSHAPE, and the JERRY matrix (if required, that is, if KPOLL = 1) with axes as defined previously.

Operational issues

The program has been run for fourteen separate problems: four county boroughs; eight metropolitan boroughs; and the definition of European constituencies for (1) London north of the Thames and (2) London south of the Thames. For the first twelve runs, NRUNS was set at 200000, and MODE at 1; for the other two NRUNS at 100000, and MODE at 3.

Table 1 lists certain statistics of the runs, which were on the University of Sheffield ICL 1906S computer. They show that in most cases solutions were still being produced close to the limit of NRUNS (which suggests that we may have missed a very small number in some cases). Monitoring of the output, however, showed that the number of solutions produced decreased as the size of NRUNS grew, and also that most of the more compact solutions were produced early on. Running time (cpu seconds) was about 6000 seconds for Sheffield, 3000 seconds for Coventry, Hull, and Leicester, 2000-3000 seconds for each of the metropolitan boroughs, and 8000 and 4000 seconds for the North and South London Euroconstituencies, respectively.

Table 1. Statistics of the runs. (The column headings did not survive in this copy; those shown are inferred from the text.)

                          NRUNS    KVARN   Wards   Seats   Solutions   Run of last
                          (000s)                                       solution
County boroughs
  Sheffield                200      12      27       6       5937        199001
  Coventry                 200      10      18       4        244         74638
  Hull                     200       6      21       3        100        133859
  Leicester                200      10      16       3        214         75185
Metropolitan boroughs
  Camden                   200       2      26       2        878        199623
  Hackney                  200       3      23       2        284        164984
  Haringey                 200       2      23       2        334        178628
  Harrow                   200       5      21       2        460        162850
  Islington                200       6      20       2        138         47393
  Lambeth                  200       4      22       3         97         81411
  Wandsworth               200       9      22       3         71        175188
  Westminster              200       3      24       2        105        166216
European constituencies
  London North             100      20      56       6       2808         99990
  London South             100      12      36       4        419        198999

(2) The figure in row 3 is used in the calculations for KVARN. The voting figures are expressed as percentages of the votes cast.

Conclusions

The program described here was devised as part of a specific piece of work, an investigation of the range of solutions available to the Boundary Commissioners, and their electoral consequences. It can be used for similar problems in a variety of situations, and could be employed to produce solutions in a nonpartisan way, just as computer usage has been proposed in the USA (Taylor and Johnston, 1979, chapter 8).

Further, it could be used more generally for region-building exercises which do not involve evaluating electoral consequences; if KPOLL were set to zero, it would merely produce the whole set of feasible regions for a given population constraint (KVARN).

Acknowledgements. The research during which this program was developed was financed under grant HR 5939/2 from the SSRC. We are grateful for that support, for the assistance given by Stan Openshaw in providing a copy of his program, and for the stimulus provided by Pete Taylor.

References

Boundary Commission for England, 1954 Report on First General Review of Parliamentary Constituencies under the House of Commons (Redistribution of Seats) Act, 1949 (HMSO, London)

Butler D, Stokes D, 1974 Electoral Change in Britain second edition (Macmillan, London)

Engstrom R L, 1977 "The Supreme Court and equipopulous gerrymandering" Arizona State Law Journal 2 277-319

Engstrom R L, Wildglen J, 1977 "Pruning thorns from the thicket: an empirical test of the existence of racial gerrymandering" Legislative Studies Quarterly 4 465-479

Gudgin G, Taylor P J, 1979 Seats, Votes, and the Spatial Organization of Elections (Pion, London)

Jenkins M A, Shepherd J W, 1972 "Decentralizing high school administration in Detroit: an evaluation of alternative strategies of political control" Economic Geography 48 95-106

Johnston R J, Rossiter D J, 1981 "Shape and the definition of Parliamentary constituencies" Urban Studies 18 (forthcoming)

Openshaw S, 1977 "An optimal zoning approach to the study of spatially aggregated data" in Spatial Representation and Spatial Interaction Eds I Masser, P J B Brown (Martinus Nijhoff, Leiden) pp 96-113

Taylor P J, 1973 "A new shape measure for evaluating electoral district patterns" American Political Science Review 67 947-950

Taylor P J, Gudgin G, 1976a "The myth of non-partisan cartography" Urban Studies 13 13-25

Taylor P J, Gudgin G, 1976b "The statistical basis of decisionmaking in electoral districting" Environment and Planning A 8 43-58

Taylor P J, Johnston R J, 1979 Geography of Elections (Penguin Books, Harmondsworth, Middx)