Holecounts - Quantum Vol1

9.9 Canceling the run

Quick Reference

To cancel a run, type:

cancel [num_times_execute]

The word cancel, which is similar in format to stop, terminates the run immediately, producing tables only for those respondents already passed to the tabulation section. It is often used to halt a run when too many errors have been detected in the data. For instance, to cancel the run when more than 100 errors have been found, we might have:

/* ect is the error counter
if (c110n’1’) write $error in c110$; ect=ect+1
if (c145n’ ’) write $c145 not blank$; ect=ect+1
if (ect.gt.100) cancel

To cancel the run when more than 50 records have been rejected, we could write:

if (rec_rej.gt.50) cancel

Alternatively, cancel may be followed by a number indicating that the run should be canceled once the statement has been executed that number of times:

cancel 100

cancels the run when this statement has been executed 100 times.

As with stop, holecounts and error listings will only contain information about records read prior to the cancellation condition being fulfilled. If 400 records are read before 101 errors are found, we will see the errors for those 400 records.
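The behavior described above can be modeled outside Quantum. The following Python sketch is not Quantum code and not how Quantum is implemented; it only mimics the error-counter example under stated assumptions. The function name `run_edit`, the parameter `max_errors`, and the representation of a record as a dictionary mapping a column number to the string of codes in that column (empty string for a blank column) are all hypothetical.

```python
def run_edit(records, max_errors=100):
    """Model an edit that counts errors and cancels the run past a limit.

    records: list of dicts, column number -> string of codes in that column
             (assumed representation; an empty string stands for a blank column).
    Returns (tabulated_records, error_messages).
    """
    ect = 0            # error counter, as in the manual's example
    errors = []        # stands in for the error listing
    tabulated = []     # records that reached the tabulation section

    for rec in records:
        if "1" not in rec.get(110, ""):      # if (c110n'1') ...
            errors.append("error in c110")
            ect += 1
        if rec.get(145, "") != "":           # if (c145n' ') ...
            errors.append("c145 not blank")
            ect += 1
        if ect > max_errors:                 # if (ect.gt.100) cancel
            break                            # current record is never tabulated
        tabulated.append(rec)

    return tabulated, errors


good = {110: "1", 145: ""}
bad = {110: "2", 145: ""}
tabulated, errors = run_edit([good, bad, good, bad, bad, good], max_errors=2)
print(len(tabulated), len(errors))
```

In this model the record that pushes the counter over the limit still contributes its messages to the error listing but is not tabulated, and no later record is read.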

9.10 Going temporarily to the tab section

Quick Reference

To send a record temporarily to the tab section, type:

process

The record is returned to the statement immediately after process.

The process statement is similar to return but must not be confused with it. When return is executed, the record is sent on to the tabulation section; after the tables are completed for that record, the program returns to the start of the edit section and the next record is read in.

When process is executed, the record is also sent immediately to the tabulation section where it is used in table creation. However, after the record has been tabulated, control is passed back to the edit section to the statement immediately following the word process. The record continues through the edit and any statements after process applicable to the record are executed. At the end of the edit the record is passed through the tabulation section again.

The process statement is used when you need to tabulate portions of a record more than once. For example, if our survey asks shoppers about the brands of bread they purchased the last four times they visited the shops, our data may be set out as follows:

c134 : Brand purchased first time (’1’=Brand A; ’2’=Brand B; ’3’=Brand C; ’4’=Brand D)
c135 : Number of loaves purchased at that time
c136 : Brand purchased second time
c137 : Number purchased second time
c138 : Brand purchased third time
c139 : Number purchased third time
c140 : Brand purchased fourth time
c141 : Number purchased that time

Suppose we wish to create a table showing the total number of loaves of each brand bought by all (or selected groups of) respondents during their four trips to the store. The simplest way to do this is to set up an axis of the form:

l brd;inc=c135
n23Number of Loaves Bought
col 134;Brand A;Brand B;Brand C;Brand D

in the tabulation section, and to write the statement:

process

in the edit at the point you want to tabulate the record for the first brand.

The next set of edit statements will be:

c(134,135)=c(136,137)
process

This overwrites the information about the first purchase with information about the second purchase, and the record is processed a second time. The total number of loaves bought on the second trip will be added to the total number of loaves bought on the first trip.

The statements continue:

c(134,135)=c(138,139)
process
c(134,135)=c(140,141)
process

When we finish, the total number of loaves of each brand bought by all respondents during those four visits will be contained in the relevant cells of the axis.

In a situation like this we would probably put the process statements in a loop at the end of the edit, although this is not strictly necessary. For example:

do 10 t1 = 134,140,2
c(134,135)=c(t1,t1+1)
process
10 continue

This performs exactly the same task as the list of statements shown earlier; it is just a more efficient way of writing them.
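To see what the four passes accumulate, the loop can be imitated in Python. This is only an illustrative model, not Quantum code: the record is assumed to be a dictionary of column number to code string, and `tabulate`, `edit` and `BRANDS` are hypothetical names. The sketch covers the four process passes only; it does not model the extra pass to the tabulation section that happens at the end of the edit.

```python
from collections import Counter

# Code-to-label mapping taken from the description of c134.
BRANDS = {"1": "Brand A", "2": "Brand B", "3": "Brand C", "4": "Brand D"}


def tabulate(record, totals):
    """Stand-in for the tabulation section: add c135 to the cell for the brand in c134."""
    brand = BRANDS.get(record[134])
    if brand is not None:
        totals[brand] += int(record[135])


def edit(record, totals):
    """Imitate: do 10 t1 = 134,140,2 / c(134,135)=c(t1,t1+1) / process / 10 continue."""
    for t1 in range(134, 141, 2):            # t1 = 134, 136, 138, 140
        record[134], record[135] = record[t1], record[t1 + 1]
        tabulate(record, totals)             # process


totals = Counter()
shopper = {134: "1", 135: "2", 136: "2", 137: "1",
           138: "1", 139: "3", 140: "4", 141: "1"}
edit(shopper, totals)
print(dict(totals))   # {'Brand A': 5, 'Brand B': 1, 'Brand D': 1}
```

The first pass copies columns 134 and 135 onto themselves, which is why the loop can start at 134 and still tabulate the first purchase.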

✎

Be careful if process is the last statement in your edit: the record will be passed to the tabulation section by process and then again by the end statement. If this is not what you want, omit the last process.

☞

For another example of process, see ‘Incrementing tables more than once per respondent’ in chapter 4, ‘More about axes’ in the Quantum User’s Manual Volume 2.

There are a number of ways of examining your data once it has been read into the C array. You may:

• Produce a holecount showing the total number of codes in each column.

• Create a frequency distribution reporting the different values found in a column or field of columns.

• Write out specific records and examine them individually, as discussed in chapter 7, ‘Writing out data’.

10.1 Holecounts

Holecounts are used to obtain an overall picture of the data before you write your edit program. For each column they show:

• A distribution of the codes — for example, how many respondents have a 2 in column 56.

• The density of coding — how many respondents have one, two, or three or more codes in each column.

• The total number of codes for the whole data file.

There is an example of a holecount on the next page. The first column tells us the columns for which codes are being counted; in this case it is columns 1 to 16 of card 1. The numbers across the top are the individual codes, and the total in the top left-hand corner is the total number of respondents (records): our data has 605 respondents.

As you can see, there are two numbers in each cell; an absolute figure and a percentage. The former tells us how many records were found with a specific code in a column and the latter tells us what percentage of the total data that is.

For example, there are 169 records with a code 1 in column 14 and this is 27.9% of the total.

Similarly, 32 records have a code 4 in column 15, which is 5.3% of the total records. Notice that when the cell total is zero, no percentage figure is printed: this makes it easier to see the pattern of coding in each column.

The four right-hand columns of the holecount show the density of coding in each column. The column headed Den1 shows the total number of records with only one code of any sort in the column. Den2 is the number of records with two codes in the column, and Den3+ tells us how many records were multicoded with three or more codes in that column. TOTAL is the total number of codes in the column.

Let’s look at column 115. 162 records have one code only in that column; six have two codes and one has three or more codes. The total number of codes in this column is 177, and each card has an average of 0.29 codes in this column.
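The figures in one row of a holecount can be reproduced with a few lines of Python. The sketch below is a model of the arithmetic only, not Quantum's own implementation. It assumes each record is a dictionary mapping a column number to the string of codes found in that column; the function name `holecount_row` and the synthetic data are hypothetical, with the data chosen so that the densities match the figures quoted for column 115.

```python
from collections import Counter


def holecount_row(records, column):
    """Compute code counts, percentages and coding density for one column."""
    n = len(records)
    codes = Counter()     # how many records carry each code
    density = Counter()   # key 1, 2 or 3 (3 means three or more codes)

    for rec in records:
        punched = rec.get(column, "")
        codes.update(punched)                     # one count per code present
        if punched:
            density[min(len(punched), 3)] += 1

    total = sum(codes.values())
    return {
        "codes": dict(codes),
        "percent": {c: round(100 * k / n, 1) for c, k in codes.items()} if n else {},
        "den1": density[1],
        "den2": density[2],
        "den3plus": density[3],
        "total": total,
        "average": total / n if n else 0.0,
    }


# Synthetic file of 605 records: 162 single-coded, 6 double-coded,
# 1 triple-coded and the rest blank in column 115.
records = ([{115: "1"}] * 162 + [{115: "12"}] * 6 +
           [{115: "123"}] + [{115: ""}] * 436)
row = holecount_row(records, 115)
print(row["den1"], row["den2"], row["den3plus"], row["total"], round(row["average"], 2))
```

With this data the total is 162 + (6 × 2) + 3 = 177 codes, and 177 divided by 605 records gives the average of 0.29.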

The holecount is the starting place in your search for errors. There are many holecounts in which it is immediately apparent that the presence of certain codes indicates an error. It is also clear whether or not the column should be multicoded.

Creating a holecount

Quick Reference

To create a holecount, type:

count c(start_col, end_col) [$text$]

where text is the holecount title.

To create a holecount you will use the count statement:

count c(start_col,end_col) [$text$]

where text is the heading to be printed at the top of each page. This is optional; if it is omitted the holecount will simply be headed ‘Holecount’. Our example was created by the statement:

count c(101,116) $Visitor Survey - British Museum (Natural History) ALL VISITORS$

Quantum itself accepts double quotes in the holecount heading, but the C compiler which processes the code that Quantum creates from your specification does not. Generally, it will issue an error message that refers to a missing ) symbol at the point the double quote occurs. To prevent this happening, precede the double quote with a backslash. For example:

count c(101,116) $Demo for \"Quantum User’s Guide\"$

You may count as many or as few columns as you like, as long as the columns to be counted are consecutive: to count, say, columns 135 to 140 and columns 160 to 180 you will need two statements, one for each field.

Records are counted at the stage they are when the count is read. If you have previously altered any columns, say, with assignment or emit statements, the count will refer to the columns as they are after the alterations rather than as they were in the original data file. Similarly, any changes which are effected after the count are not reflected in the output.

✎

If you place a count statement in a loop, Quantum sums the counts for all the columns in the statement and reports the total number of codes as the count for the first column only.

Filtered holecounts

A filtered holecount is one in which only records fulfilling a specific condition are counted. They can be created using the if statement to define the occasions when a record should be counted.

For example, suppose we only wish to include male respondents in our holecount. Our statement might be:

if (c106’1’) count c(101,108) $Demonstration Survey – Males$
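The effect of the filter can be pictured with a small Python model. This is a sketch under assumptions, not Quantum code: records are dictionaries of column number to code string, and `filtered_count` and `keep` are hypothetical names standing in for the count statement and the if condition.

```python
from collections import Counter


def filtered_count(records, columns, keep):
    """Count codes per column, but only for records where keep(record) is true."""
    counts = {col: Counter() for col in columns}
    for rec in records:
        if keep(rec):                                # the if (...) condition
            for col in columns:
                counts[col].update(rec.get(col, ""))
    return counts


records = [
    {106: "1", 107: "3"},    # male
    {106: "2", 107: "3"},    # not male: skipped
    {106: "1", 107: "45"},   # male, column 107 multicoded
]
males_only = filtered_count(records, range(101, 109),
                            keep=lambda rec: "1" in rec.get(106, ""))
print(dict(males_only[107]))   # {'3': 1, '4': 1, '5': 1}
```

Records that fail the condition contribute nothing to any column in the range, including the column used for the filter itself.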

Counting trailer cards

Normally, trailer cards of a given type are treated as one card and are counted together. Thus, the number of codes in a column for a particular trailer card contains the sum of all codes found in that column on all trailer cards of the given type (e.g., all card 2s).

You may, however, prefer to produce holecounts on such cards based on their relative position within the group of trailer cards. For example, suppose card 2 is a trailer card and we wish to make a holecount on the third card 2 of each group. In chapter 6 we said that the variable allread2 is true when a card 2 has been read in for the current record, and that it keeps count of the number of card 2s read. So, to produce a holecount for the third card 2, we would write:

if (allread2.eq.3) count c(201,280) $Card 2 – Third Card$

We can also create filtered holecounts of trailer cards based on characteristics of the individual cards. Suppose we have a trailer card for each store visited, in which the store is identified in c79.

The trailer card is the 5-card. We would write:

if (c579’1’) count c(501,580) $Harrods$

Multiplied holecounts

Quick Reference

To create a multiplied or weighted holecount, type:

count c(start_col, end_col) [$text$] c(m_start, m_end)

where text is the holecount title and c(m_start,m_end) is the field in the C array containing the multiplier or weight for each record.

In ordinary holecounts, the cells are simply counts of records: each time a record is read with a specific code in a given column, the relevant cell in the holecount is incremented by one. If 231 records have a 7 in column 79, the figure in that cell will be 231.

Holecounts may also be created by incrementing each cell by the value found in a column field in the record. This value is the record’s ‘multiplier’. If the multiplier is 15, and the record has a 6 in column 152, the count for c152’6’ will be incremented by 15 rather than by 1 for this record. You may hear this type of holecount referred to as a weighted holecount because multiplying a record by a given value is the equivalent of weighting it.

✎

If the multiplier is being calculated during the run, it must be placed in the C array using wttran before the holecount is requested.

☞

For further details on weighting and wttran, see section 1.9, ‘Copying weights into the data’ in the Quantum User’s Guide Volume 3.

A multiplied holecount is created using the count statement as shown below:

count c(m,n) [$text$] c(x,y)

where c(m,n) is the field to be counted, text is the optional heading to be printed at the top of each page, and c(x,y) is the field containing the multiplier for the record. If this field contains a real number, it must be referenced as cx(x,y) otherwise the decimal point will be ignored (for example, 1.5 will be read as 15).

The number labeled TOTAL at the top of each page of output is no longer the total number of records in the data file; rather, it is the number of records after each record has been multiplied by its multiplier, that is, the sum of the multipliers of all records counted. This is best illustrated by an example. If we are producing a holecount for c(20,30), and of our 50 respondents, 20 have a multiplier of 2.5, 15 have a multiplier of 2.6 and 15 have a multiplier of 3.0, the total at the top of the page will be 134 respondents, calculated as follows:

(20 × 2.5) + (15 × 2.6) + (15 × 3.0) = 134

Multipliers may be part of the original data file or they may be calculated during the edit. Both real and integer values are valid, even though the cell counts in the output will always be shown as whole numbers. This does not mean that you lose accuracy with real multipliers. Quantum stores the cell counts with as many decimal places as are necessary until the count is complete, whereupon it rounds all values ending in .49 or less down and all values ending in .5 or more up.
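The total and the rounding rule can be checked with a short Python sketch. It is a model of the arithmetic described above, not a statement of how Quantum stores its counts internally. The function names `multiplied_total` and `printed_value` are hypothetical, and decimal arithmetic is used here simply to keep the illustration free of binary floating-point noise.

```python
from decimal import Decimal, ROUND_HALF_UP


def multiplied_total(groups):
    """groups: iterable of (number_of_records, multiplier_as_string) pairs."""
    return sum((Decimal(n) * Decimal(m) for n, m in groups), Decimal(0))


def printed_value(cell):
    """Round an accumulated cell value to a whole number: .5 or more goes up."""
    return int(Decimal(cell).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# 20 respondents at 2.5, 15 at 2.6 and 15 at 3.0, as in the example above.
total = multiplied_total([(20, "2.5"), (15, "2.6"), (15, "3.0")])
print(total)                     # 134.0
print(printed_value("612.49"))   # 612
print(printed_value("612.5"))    # 613
```

The unrounded value is kept until the count is complete, so rounding happens once per cell rather than once per record.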

For example, we might write:

/* House owners have multiplier of 22.4
if (c104’2’) cx(177,180):1=22.4; go to 10
/* Tenants have multiplier of 12.7;
/* Others have multiplier of 11.9
if (c104’3’) cx(177,180):1=12.7; else; cx(177,180):1=11.9
10 continue
– other statements –
count c(101,180) $Multiplied Hole-count – Card 1$ cx(177,180)

The figures used to create the multiplied holecount would then be 22.4, 12.7, or 11.9, depending upon the contents of c104 in each record. Suppose we have 27 house owners (that is, 27 records have c104’2’). The count for a ‘2’ in column 4 of card 1 would then be 604.8 (27 × 22.4), which would appear in the output file as 605.

Other points to notice are:

• Since we are copying a real number into a field of columns we use the notation cx to refer to the columns and follow them with the number of decimal places required.

• Because the word count is written in lower case it may start in column 1. If it had been written in upper case it would need to start in a column other than 1 to prevent it being read as a comment.
