6.3 Processing the data
Each time an ordinary record or set of cards comprising a multicard record is read in, that data is processed first by the edit section and then by the tabulation section of your program. The complete record is edited and tabulated in one go. The exception to this is the trailer card record where processing can take place a number of times within each record for each lower level.
To ensure that only the part of the edit section applying to a particular level is used, the edit section is defined separately for each level. Similarly, the table instructions specify the level at which the table should be incremented.
☞
For more information about levels, see chapter 3, ‘Dealing with hierarchical data’ in the Quantum User’s Guide Volume 3.
6.4 Trailer cards
By using the Levels facility, the user need not know how Quantum deals with trailer card data internally. However, there are occasions when it may be necessary to edit or tabulate the data without using levels. To do this, it is necessary to know more about how trailer cards are processed.
Quantum deals with trailer cards in a number of ‘reads’. Cards are read into the appropriate rows of the C array until:
• A card is located with a card type matching that of the previous card (for example, two consecutive card 2’s), or
• A card is read with a type lower than its predecessor and matching one of the card types already read in during the current ‘read’ (for example, a card 2, a card 3, and then another card 2).
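The two stopping rules above can be modelled outside Quantum. The following Python sketch is an illustrative model only, not Quantum's own code: the function name split_into_reads is hypothetical, and it assumes that one record's cards are supplied as a list of integer card types in file order.

```python
def split_into_reads(card_types):
    """Group one record's card types into 'reads' using the two stopping rules.

    Illustrative model only; card_types is a list of ints in file order.
    """
    reads = []
    current = []
    for card in card_types:
        if current:
            previous = current[-1]
            same_as_previous = card == previous
            lower_and_seen = card < previous and card in current
            if same_as_previous or lower_and_seen:
                # This card belongs to the next read, so close the current one.
                reads.append(current)
                current = []
        current.append(card)
    if current:
        reads.append(current)
    return reads


# Two consecutive card 2s end a read (first rule).
print(split_into_reads([1, 2, 2, 2, 3]))
# A card 2 arriving after cards 2 and 3 ends a read (second rule).
print(split_into_reads([1, 2, 3, 2, 3]))
```

Under these assumptions, a record holding cards 1, 2, 2, 2 and 3 is split into three reads, and a record holding cards 1, 2, 3, 2 and 3 into two.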
In order to produce useful tables, you will need to know which cards are currently in the C array.
Quantum has four reserved variables — thisread, allread, firstread and lastread — which it uses to keep track of which cards it has read for each respondent.
thisread
The array called thisread is used to check which cards have been read in during the current read.
thisread1 will be true (or 1) if a card type 1 has just been read in; thisread2 will be true if a card 2 has just been read, and so on.
There are nine such variables (thisread1 to thisread9) available unless extra card types have been specified using the max= option. In this case, these variables will be numbered 1 to max; if there are 13 cards, we will have thisread1 to thisread13.
☞
For further details on max=, see ‘Highest card type number’ later in this chapter.
allread
allread notes which cards have been read in so far for this questionnaire. If cards 1, 2 and 3 have been read so far, allread1, allread2 and allread3 will all be true. Additionally, each cell of allread will contain the number of cards of the given type read in — for instance, if two cards of type 3 have been read, allread3 will be true and it will contain the number 2.
As with thisread, there are nine allread variables available unless extra card types have been specified with max=.
firstread and lastread
The variable firstread becomes true when the first card in a record has been read in, and lastread becomes true when the last card in the record has been read in.
Examples
You can use these variables in your program to associate specific parts of the edit or tabulation section with specific types of data. For instance:
if (.not. thisread3) go to 400
/* card 3 edit follows
.
.
400 continue
/* calculate average when all cards read for respondent
if (lastread) average=sum / num
.
/* update table when all cards read for this respondent
tab brand demo;c=lastread
Let’s take an example and look at the contents of the C array and the values of thisread, allread, firstread and lastread. Suppose the record has five cards: 1, 2, 2, 2 and 3 of 80 columns each. The first ‘read’ places card 1 in c(101,180) and the first card 2 in c(201,280). The second card 2 is not read into the array yet because it has the same card type as the previous card. As this is the start of a new respondent, firstread is true (or 1), and because cards 1 and 2 have been read, thisread1, thisread2, allread1 and allread2 are also true.
The second ‘read’ deals only with the second card 2 since it is followed by another card of the same type. thisread2 is true, as are allread1 and allread2. Also, allread2 contains the value 2 because we have read in 2 card 2s so far. Note that thisread1 is now false (or 0) as no card 1 was read this time.
On the third and final ‘read’ the third card 2 is read into c(201,280) and card 3 is copied into c(301,380). lastread is true because we have reached the end of the record, thisread2 and thisread3 are true because we have just read cards 2 and 3, and allread1, allread2 and allread3 are true because this record contains cards 1, 2 and 3. allread2 now contains the value 3 because there were 3 card 2s altogether.
The chart below summarizes the cards read and the variables which will be true after each read.
        c(101,180)  c(201,280)  c(301,380)  thisread  allread  firstread  lastread
Read 1  Card 1      Card 2a                 1, 2      1, 2     1
Read 2              Card 2b                 2         1, 2
Read 3              Card 2c     Card 3      2, 3      1, 2, 3             1
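The bookkeeping described in this example can be reproduced with a short model. The Python sketch below is illustrative only and is not how Quantum is implemented: read_flags is a hypothetical name, the reads are supplied as lists of card types, and firstread and lastread are assumed to be true only during the first and final read respectively.

```python
from collections import Counter


def read_flags(reads):
    """Return thisread, allread, firstread and lastread after each read.

    reads is a list of lists of card types, one inner list per read.
    """
    allread = Counter()  # running count of each card type for the record
    rows = []
    for index, cards in enumerate(reads):
        allread.update(cards)
        rows.append({
            "thisread": sorted(set(cards)),
            "allread": dict(sorted(allread.items())),
            "firstread": index == 0,
            "lastread": index == len(reads) - 1,
        })
    return rows


# The five-card record 1, 2, 2, 2, 3 processed as three reads.
for row in read_flags([[1, 2], [2], [2, 3]]):
    print(row)
```

In this model each allread entry holds the number of cards of that type read so far, so the entry for card 2 rises from 1 to 2 to 3 across the three reads.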
If Quantum reads a record in which the repeated cards are out of sequence, it inserts blank cards of the appropriate types wherever necessary to force the cards into the correct sequence. For example, if the record contains the cards 1, 2, 4, 3, 4, 4 in that order, Quantum will generate a completely blank card 3 when it reads the first card 4. The record is then processed as if it contained cards 1, 2, 3, 4, 3, 4, 4.
6.5 Columns 1 to 100
It is sometimes useful to know that in the case of multicard records the first card of the next record is waiting in columns 1 to 100 of the array. Beware of overwriting these columns.
6.6 Reserved variables
In section 6.4, ‘Trailer cards’ we discussed the reserved variable thisread, which keeps track of which cards have been read in during the current read, and allread, which keeps track of all cards read in for the current record. Other reserved variables associated with reading in data are:
lastrec Set to true when the last record in the file has been read or, in the case of trailer cards, the last read of the last record has occurred.
rec_count Stores the number of records read in so far.
card_count Counts the number of cards read so far.
6.7 Using spare columns
You can use spare columns in the C array for data manipulation and storing additional information.
However, it may be clearer to store this information in named variables where the name gives some indication of the type of data stored.
In ordinary records you can use the space beyond the end of the record. If the record length is 120 columns, you can use columns 121 to 1000.
✎
For ordinary records, only columns 1 to reclen are reset to blanks, where reclen is the maximum record length as defined by the reclen= keyword on the struct statement.
☞
For further information about defining the record length, see ‘Record length’ in the next section.
In multicard records you may not use c(1,100). However, you may use any columns between the end of the card (reclen) and the end of that row of the C array. For instance, when reclen=80 you may use c(181,200), c(281,300) and so on. You may also use full sets of columns in which there is no data: that is, if the record has only four cards (1, 2, 3 and 4), then c(501,1000) are the spare columns you may use. Additionally, cells c101 to c(100+reclen), c201 to c(200+reclen), and so on are reset to blanks before the next record is read in.
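The spare ranges for a multicard record can be worked out mechanically. The Python sketch below is a model for illustration, not part of Quantum: spare_columns is a hypothetical helper, and it assumes the default layout of nine rows of 100 cells each (no max= option) and a reclen of 100 or less.

```python
def spare_columns(reclen, cards_present, rows_in_array=9, row_width=100):
    """List spare (start, end) column ranges in a multicard C array.

    Assumes rows of row_width cells, with card n stored from cell n*100+1.
    """
    spare = []
    for card in range(1, rows_in_array + 1):
        row_start = card * row_width + 1
        row_end = (card + 1) * row_width
        if card in cards_present:
            # Only the cells beyond reclen in an occupied row are spare.
            if reclen < row_width:
                spare.append((row_start + reclen, row_end))
        else:
            # A row with no card at all is entirely spare.
            spare.append((row_start, row_end))
    return spare


# reclen=80 with cards 1 and 2: the ends of rows 1 and 2, then whole rows 3 to 9.
print(spare_columns(80, {1, 2}))
# Four full-width cards: rows 5 to 9, that is c(501,1000).
print(spare_columns(100, {1, 2, 3, 4}))
```

Columns 1 to 100 are deliberately left out of the result because they may not be used in multicard records.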
6.8 Describing the data structure
Quick Reference
To describe the structure of the data, type:
struct; options
All programs dealing with multicard records must contain a struct statement unless the data contains trailer cards which will be read and tabulated using the levels facility. In this case you may choose between using a struct statement or using a levels file. If the run has no struct statement and no levels file, Quantum assumes that the data contains ordinary records to be read into c1 onwards of the C array.
☞
For information about levels and how to describe the levels data structure, see chapter 3,‘Dealing with hierarchical data’ in the Quantum User’s Guide Volume 3.
The struct statement is used to define the type of records, the location of the serial number and card type in the record and the number of the highest card type if greater than 9. Its format is:
struct; options
Record type
Quick Reference
To define the record type, type:
struct; read=n
where n is 0 for ordinary records, 2 to read multicard records in sections according to the card type, or 3 to read multicard records all in one go.
Quantum recognizes two types of record: ordinary (single card) and multicard. The type of record is defined by the keyword read= on the struct statement:
• Ordinary records — Ordinary records are defined using read=0. Each record is read into c1 onwards of the array. Since it is the default, you need only use it when other options are required; for example, when the records contain serial numbers and you wish to have the serial number printed out as part of the record, or when you are working with long records of more than 100 columns.
• Multicard records — Multicard records are identified by the keyword read=2. Each card in the record is read into the row corresponding to the card type of that card — that is, card 1 in c(101,200), card 2 in c(201,300), and so on.
We mentioned briefly that it is possible to read in all the cards in a multicard record at once, ignoring the card type. The first card goes in c(101,200), the second in c(201,300), and so on.
This is achieved with read=3.
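The difference between read=2 and read=3 comes down to how the row of the C array is chosen for each card. The Python sketch below models that choice under the 100-cell row layout described above; card_rows is a hypothetical name and the code is an illustration rather than Quantum's own logic.

```python
def card_rows(card_types, read):
    """Return the (first_cell, last_cell) row of the C array used for each card."""
    rows = []
    for position, card_type in enumerate(card_types, start=1):
        if read == 2:
            row = card_type   # row chosen by the card type
        elif read == 3:
            row = position    # row chosen by position in the record
        else:
            raise ValueError("only read=2 and read=3 are modelled")
        rows.append((row * 100 + 1, row * 100 + 100))
    return rows


# A record holding card types 1 and 3 only.
print(card_rows([1, 3], read=2))
print(card_rows([1, 3], read=3))
```

With read=2 the card 3 lands in c(301,400); with read=3 the same card is simply the second card and lands in c(201,300).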
Record length
Quick Reference
To define the record length of records greater than 100 columns, type:
struct; reclen=n
The keyword reclen=n defines the maximum number of characters to be read into the C array, the number of cells to be reset to blanks and the number of cells to be written out by the write statement.
With ordinary records reclen may take any value, but with multicard records the maximum is reclen=1000. In both cases, the default is reclen=100. When data is read into the array, any record which is longer than reclen characters is truncated to that length and a warning message is printed.
When ordinary records are written out with write or split, cells c1 to c(reclen) are copied, with any trailing blanks being ignored. For instance, if we have:
struct;read=0;reclen=200
and the current record is only 157 characters long, the record written out will be 157 characters long. This length can be overridden by an option on a filedef statement.
When multicard records are written out, columns c101 to c(100+reclen), c201 to c(200+reclen), and so on will be output. Thus, if we write:
struct;read=2;reclen=70
and we have 2 cards per record, Quantum will write out c(101,170) and c(201,270).
Finally, with ordinary records cells c1 to c(reclen) are reset to blanks between records, but with multicard records cells c101 to c(100+reclen), c201 to c(200+reclen), and so on are reset.
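The cell ranges that reclen governs, both for writing out and for resetting, follow one pattern. The Python sketch below is an illustrative calculation only: output_ranges is a hypothetical name, and it does not model the dropping of trailing blanks when ordinary records are written.

```python
def output_ranges(read, reclen, cards=()):
    """Return the (first_cell, last_cell) ranges controlled by reclen.

    read=0 models an ordinary record; any other value models multicard
    records, where cards lists the card types present in the record.
    """
    if read == 0:
        return [(1, reclen)]
    return [(card * 100 + 1, card * 100 + reclen) for card in cards]


# struct;read=0;reclen=200
print(output_ranges(0, 200))
# struct;read=2;reclen=70 with two cards per record
print(output_ranges(2, 70, cards=[1, 2]))
```

The second call gives c(101,170) and c(201,270), matching the example for struct;read=2;reclen=70.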
☞
For information about the write statement, see section 7.1, ‘Print files’.
For information about the split statement, see section 12.4, ‘Creating clean and dirty data files’.
For information about the filedef statement, see section 7.4, ‘Defining the file type’.
Serial number location
Quick Reference
To define the location of the serial number in each record, type:
struct; ser=c(m,n)
The keyword ser=c(m,n) defines the field of columns containing the respondent serial number. For example, if the serial number is in columns 1 to 5 of an ordinary record we would write:
struct;read=0;ser=c(1,5)
Similarly, if it is in columns 1 to 5 of a multicard record the statement would be:
struct;read=2;ser=c(1,5)
Notice that even with multicard records we only give the actual column numbers containing the serial number, rather than card type and column number as is usually the case when identifying columns in such records. This is because the column numbers refer to all cards in the data set rather than to a single card in the file.
Card type location
Quick Reference
For multicard records only, to define the location of the card type in the record, type:
struct; crd=cn
Defining the card type location is much the same as defining the position of the serial number in the record. The keyword is crd=cn for a single digit card type or crd=c(m,n) for a card type of more than one digit. Once again, m and n are column numbers only, not card type and column number.
For example:
struct;read=2;ser=c(1,4);crd=c5
tells us that we have a multicard record with serial numbers in columns 1 to 4 and the card type in column 5 of each card. Each card will be read into the row corresponding to its card number.
Required card types
Quick Reference
For multicard records only, to define cards which must be present in each record, type:
struct; req=card_numbers
where card_numbers is either a comma-separated list of card numbers, or a range of sequential card numbers in the form start:end or start/end.
Sometimes some cards will be optional and others mandatory. You define the cards which must appear in every record by using the keyword req= followed by the numbers of the cards that each respondent must have. For example:
req=1,2
tells us that cards 1 and 2 must be present in each record for that record to be accepted. Any other cards are optional. If a record is read without one of these cards, the error message ‘Card Missing in Set’ and a note of the record’s position in the file are printed and the record is ignored.
If you have ranges for required card types, you may type the numbers of the lowest and highest cards separated by a slash (/) or a colon (:) rather than listing each card type separately. For example, if cards 1 to 4 are all required, you may type:
req=1,2,3,4 or req=1/4 or req=1:4
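The three notations are equivalent, as a small model shows. The Python sketch below is not Quantum code: parse_card_list is a hypothetical helper that handles either a comma-separated list or a single range, and does not attempt to handle a mixture of the two.

```python
def parse_card_list(value):
    """Expand '1,2,3,4', '1/4' or '1:4' into a list of card numbers."""
    for separator in ("/", ":"):
        if separator in value:
            start, end = value.split(separator)
            # A range includes both its lowest and highest card number.
            return list(range(int(start), int(end) + 1))
    return [int(item) for item in value.split(",")]


for text in ("1,2,3,4", "1/4", "1:4"):
    print(text, parse_card_list(text))
```

Each of the three forms expands to cards 1, 2, 3 and 4.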
Repeated card types
Quick Reference
For multicard records only, to define cards which may appear more than once in a record, type:
struct; rep=card_numbers
where card_numbers is either a comma-separated list of card numbers, or a range of sequential card numbers in the form start:end or start/end.
If the data contains trailer cards and the Levels facility is not used, you must list their card types with the keyword rep=. For instance, if card 2 is a trailer card we would write rep=2. Where there is more than one trailer card, each card type is listed separated by a comma. If cards 2, 3 and 4 are all trailer cards we could write:
rep=2,3,4
If you have ranges for repeated card types, you may type the numbers of the lowest and highest cards separated by a slash (/) or a colon (:) rather than listing each card type separately.
For example, if cards 2 to 4 are all repeated, you may type:
rep=2,3,4 or rep=2/4 or rep=2:4
If rep= is not used and a record is read with two or more cards of the same type, the last card of that type will be accepted and the message ‘Identical duplicate’ or ‘Non-identical duplicate’ and a note of the record’s position in the file will be printed. For example:
Record structure error: serial 026, card 234 in run, card 234 in dfile
card type 2 — non-identical duplicate
Because rep= refers to trailer cards only, it will be ignored if read=2 and crd= are not both present on the struct statement.
Highest card type number
Quick Reference
For multicard records only, to define the highest card type in the record, if there are more than nine cards per record, type:
struct; max=n
The only time you need to inform Quantum of the highest card type is when you have records with more than nine cards. This is so that Quantum can allocate sufficient cells in the C array to store the extra cards. The highest card type is defined with max=n, where n is the number of the highest card type. Cells 1 to max*reclen are then cleared between respondents. For example, to read a data set with 11 cards per respondent we might write:
struct;read=2;ser=c(1,4);crd=c5;req=1,2,3,4;max=11
If you forget max=, and a record is read with more than nine cards, the message ‘Too many cards per record’ is printed and the record is rejected. On the other hand, if a card is read with a card type higher than that defined with max=, the record is rejected with a message beginning ‘Card number out of’.
✎
Since the maximum size of the C array is 32,767 cells, the maximum value you can set with max= is 327 cards.
Dealing with alphanumeric card types
Quick Reference
For multicard records only, to define the location in the C array of cards with alphanumeric card types, type:
struct; order=card_types
where card_types is a list of card type numbers and letters in the order they are to appear in the C array.
From time to time you may need to read in records with alphabetic as well as numeric card types.
This generally happens in a multicard data set containing more than nine cards per record where only one column has been allocated to the card type.
Quantum can deal with this data but first you have to say where in the C array the alphabetic card types should go. This is done with the keyword:
order=n
where n is one or more of the codes ‘1234567890–&’ or the letters A to Z (in upper or lower case) not separated by spaces.
The card type bearing the first number in the list is read into c(101,200), the card bearing the second code in the list is read into c(201,300), and so on. For example, suppose each record has ten cards, 1 to 9 and A. Our struct statement might say:
struct;read=2;ser=c(1,4);crd=c5;max=10;order=123456789A
Data from card A would be read into cells 1001 to 1100 of the C array.
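The position of each code in the order= string decides the row of the C array. The Python sketch below models that mapping for illustration only; order_rows is a hypothetical name and the 100-cell row width is taken from the examples above.

```python
def order_rows(order):
    """Map each card type code in an order= string to its C array row."""
    return {
        code: (position * 100 + 1, position * 100 + 100)
        for position, code in enumerate(order, start=1)
    }


rows = order_rows("123456789A")
print(rows["1"])  # first code in the list
print(rows["A"])  # tenth code in the list
```

Because A is the tenth code in the string, the model places it in cells 1001 to 1100, in line with the example.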
Merge sequence for trailer cards
Quick Reference
For multicard records only, to define the location of the merge sequence number in trailer cards, type:
struct; seq=cn
When trailer card data is merged during a run with the merge facility, you may wish trailer cards to be merged in a specific order, according to a sequence number entered as part of the data. The location of this sequence number can be defined with the keyword seq=cn for a single column code or seq=c(m,n) for a multicolumn code. For more information on merging data see the next section.