When we say that Quantum allows you to merge data files, we do not mean that Quantum takes data from a number of files and merges it to create a new file. Rather, we mean that data can be read from a series of files during a Quantum run. Of course, the merged data can then be written out to a new file for future use.
Quantum provides two methods for merging data. The first is designed for studies where you have different card types in different files; for example, cards 1 and 2 in the file data1 and card 3 in the file data2. In this case, merging is by serial number and, optionally, card type and trailer card sequence number.
The second method is designed for situations where you want to merge a field of data from an external file into records from the main data file. For example, you may have a file of manufacturers’ codes which refer to a number of products. If each record in the main data file contains the product the respondent preferred, you may wish to merge the appropriate manufacturer’s code from the external file into the main data in the C array. In this case, merging is based on finding matching keys in the main record and the records in the external file.
Both options are described in detail below.
Merging complete cards
Data for a study may be spread across a number of files. This is particularly useful with large surveys because it means that you can put each card type in a different file and simply merge in the cards required for the current batch of tables. For example, if we require tables from cards 4 and 5, we need not even read in cards 1, 2, 3 and 6.
Data from up to 16 files may be merged; that is, the main data file and 15 others. Files may be merged on serial number and, within that, on card type. With trailer card data, you also have the option of merging trailer cards according to a sequence number entered as part of the data.
For the merge to succeed, all files must be sorted in ascending order, and the serial number, card type and sequence number must occupy the same columns in every file. Quantum reads these locations from the keywords ser=, crd= and seq= on the struct statement.
To merge data files you must create a file called merges telling Quantum which items to merge on, and which files to merge. The type of merge is represented by a number:
1 Merge on serial number. Cards are read in from each data file according to their serial number only — the card type and sequence number, if any, are ignored. You might use this option when you have two files, dat01 containing cards of type 1 and dat02 containing cards of type 2, and you want the files to be merged so that card type 1 is read into the C array, followed by card type 2.
3 Merge on serial number and card type (default). With this option, cards with the same serial number read from different data files are merged to form a single record by comparing the serial number and card type. Cards within a record are then sorted sequentially from 1 so that each card is read into the appropriate cells of the C array. For example, if dat01 contains cards 1 and 3, and dat02 contains cards of type 2, the merge will produce records containing cards 1, 2 and 3 in that order.
5 Merge on serial number, card type and sequence number. This is similar to merge type 3, except that trailer cards are merged according to their sequence number. For example, if dat01 contains cards 1 and 2, where card 2 is a trailer card with a sequence number of 2, and dat02 contains cards 2 and 3, where card 2 is a trailer card with a sequence number of 1, the merged record will contain cards 1, 2/1, 2/2, and 3, in that order.
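To make the three merge types concrete, the following Python sketch mimics how cards from several files might be ordered within one respondent's record. It is an illustration only, not Quantum's implementation: the Card tuple, the merge_cards function and the in-memory lists standing in for data files are all hypothetical, and the treatment of type 1 (cards kept in the order the files are read) is an assumption drawn from the description above.

```python
from collections import defaultdict
from typing import NamedTuple


class Card(NamedTuple):
    serial: int     # respondent serial number
    card_type: int  # card type
    seq: int        # trailer card sequence number (0 if none)


def merge_cards(files, merge_type=3):
    """Group cards from several files into records keyed by serial number."""
    records = defaultdict(list)
    for cards in files:              # files are taken in the order given
        for card in cards:
            records[card.serial].append(card)

    merged = {}
    for serial in sorted(records):
        cards = records[serial]
        if merge_type == 3:          # order by card type only
            cards = sorted(cards, key=lambda c: c.card_type)
        elif merge_type == 5:        # order by card type, then sequence number
            cards = sorted(cards, key=lambda c: (c.card_type, c.seq))
        # merge_type 1: keep file order; card type and sequence are ignored
        merged[serial] = cards
    return merged


# The type 5 example from the text: dat01 holds cards 1 and 2/2,
# dat02 holds cards 2/1 and 3, all for serial number 1.
dat01 = [Card(1, 1, 0), Card(1, 2, 2)]
dat02 = [Card(1, 2, 1), Card(1, 3, 0)]
result = merge_cards([dat01, dat02], merge_type=5)
order = [(c.card_type, c.seq) for c in result[1]]
```

Here order lists the cards as 1, 2/1, 2/2 and 3, matching the example for merge type 5.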
The type of merge is the first item in the merges file, and is followed by the names of the files to be merged with the main data file named in the Quantum command line. Items may be entered on separate lines or all on the same line separated by semicolons. For example, if we want to merge data in files dat02 and dat03 with data in the main file, dat01, by serial number, card type and sequence number, the merges file would look like this:
5; dat02; dat03
Notice that we have not mentioned dat01 in the merges file because it will be named on the Quantum command line instead.
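The layout of the merges file can be illustrated with a small Python sketch that splits its contents into a merge type and a list of file names. The function parse_merges and the constant MAX_EXTRA_FILES are hypothetical names introduced here; the sketch assumes the merge type is always present as the first item and does not attempt to reproduce how Quantum itself reads the file.

```python
MAX_EXTRA_FILES = 15  # the main data file plus 15 others, as stated above


def parse_merges(text):
    """Split merges-file text into (merge_type, [file names])."""
    # Items may be on separate lines or on one line separated by semicolons.
    items = [item.strip() for item in text.replace("\n", ";").split(";")]
    items = [item for item in items if item]
    if not items:
        raise ValueError("merges file is empty")

    merge_type = int(items[0])  # assumption: the type is always given first
    if merge_type not in (1, 3, 5):
        raise ValueError(f"unknown merge type: {merge_type}")

    files = items[1:]
    if len(files) > MAX_EXTRA_FILES:
        raise ValueError("too many files to merge")
    return merge_type, files


merge_type, files = parse_merges("5; dat02; dat03")
```

The main data file (dat01 in the example) does not appear in the result, because it is named on the command line rather than in the merges file.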
Note: This facility is not designed to work with merge files that contain *include or #include statements to read additional data files into the current data file. All merge files must be named in the merges file, which accepts pathnames if the data files are not in the project directory.
Merging a field of data from an external file
Quick Reference
To merge extra data from an external data file into the data currently in the C array, type:
int_variable=mergedata($ex_file$, key_field, key_start, copy_to, data_start)
where:
ex_file is the name of the file containing the extra data.
key_field is the location of the key in the main data file, entered using the standard Quantum notation for columns and fields.
key_start is the start column of the key in the external data file.
copy_to is the field in the main data record in which to place the external data. The field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied.
This statement returns 1 in int_variable if a match is found, 0 if no match is found.
The mergedata statement merges a field of data from an external file with the main data at the datapass stage of the Quantum run. Merging is by means of a data key present in both the main records and the records in the external file. If a record in the external file has a key which matches that of a record in the main data file, the external data will be merged into a user-defined field of the main record when it is read into the C array.
In order for data to be merged correctly, both the main data file and the external file must be sorted in ascending order by key value. If the key is the record serial number then the data file will already be sorted in the correct order (assuming, of course, that the data is sorted by serial number). If you are using a key that is not the record serial number you must sort the data file so that it is ordered by key rather than by serial number.
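Before a run, it can be useful to confirm that a file really is in ascending key order. The Python sketch below is one way to do this outside Quantum. The helper names key_of and first_out_of_order are hypothetical, and the sketch assumes fixed-width records whose keys are zero-padded, so that comparing keys as text gives the same order as comparing them as numbers.

```python
def key_of(record, start, length):
    """Return the key held in 1-based columns start..start+length-1."""
    return record[start - 1:start - 1 + length]


def first_out_of_order(records, start, length):
    """Return the 1-based number of the first record whose key is lower
    than the key of the record before it, or None if keys never decrease."""
    previous = None
    for number, record in enumerate(records, start=1):
        key = key_of(record, start, length)
        if previous is not None and key < previous:
            return number
        previous = key
    return None


# Three records with a 3-column key starting in column 1.
sample = ["001A", "003B", "002C"]
problem = first_out_of_order(sample, start=1, length=3)
```

In this sample the third record is reported, because key 002 follows key 003. The check only looks for keys that decrease; it does not flag duplicate keys.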
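The syntax for mergedata is:
int_variable=mergedata($ex_file$, key_field, key_start, copy_to, data_start)
where:
int_variable is the name of an integer variable in which the function can place its return value.
ex_file is the name of the file containing the extra data. It must be enclosed in dollar signs.
key_field is the location of the key in the main data file, entered using the standard Quantum notation for columns and fields.
key_start is the start column of the key in the external data file, for example, 1 if the key starts in column 1. The length of the key is taken from the length of key_field.
copy_to is the field in the main data record in which to place the external data. The field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied. Quantum copies as many columns as are defined by copy_to.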
For example:
t1 = mergedata($manuf_codes$,c(178,180),15,c(168,175),1)
tells Quantum to compare the key in columns 178 to 180 of the main record with the key which starts in column 15 of the external records in the file manuf_codes.
Because the key field in the main record is 3 columns long, Quantum reads columns 15 to 17 of each external record to obtain its key. If the keys match, Quantum copies the data from the external record into columns 168 to 175 of the main record in the C array. The external data to be copied starts in column 1 and, since the destination field is 8 columns long, Quantum copies 8 columns starting at that column.
This statement returns a value of 1 if a match was found (i.e., merging took place), or 0 if not.
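The column arithmetic in this example can be traced with a short Python sketch. It is not Quantum code and makes several simplifying assumptions: records are plain strings with 1-based columns, fields are given as (first column, last column) pairs in place of Quantum's c(first,last) notation, and the external records are searched one by one rather than read in sorted order. The name mergedata_sketch and the sample records are hypothetical.

```python
def mergedata_sketch(main, external_records, key_field, key_start,
                     copy_to, data_start):
    """Return (flag, record): flag is 1 if a key matched and data was
    copied into the record, 0 otherwise (record returned unchanged)."""
    key_len = key_field[1] - key_field[0] + 1    # key length comes from key_field
    copy_len = copy_to[1] - copy_to[0] + 1       # copy length comes from copy_to
    main_key = main[key_field[0] - 1:key_field[1]]

    for ext in external_records:
        ext_key = ext[key_start - 1:key_start - 1 + key_len]
        if ext_key == main_key:
            data = ext[data_start - 1:data_start - 1 + copy_len]
            # Pad short data so the record keeps its original length.
            merged = (main[:copy_to[0] - 1] + data.ljust(copy_len)
                      + main[copy_to[1]:])
            return 1, merged
    return 0, main


# Main record: key "042" in columns 178 to 180, everything else blank.
main = " " * 177 + "042"
# External records: data in columns 1 to 8, key in columns 15 to 17.
external = [
    "OTHER001" + " " * 6 + "007",
    "ACME0001" + " " * 6 + "042",
]
t1, merged = mergedata_sketch(main, external, (178, 180), 15, (168, 175), 1)
```

With these sample records t1 is 1, and columns 168 to 175 of merged hold ACME0001, the eight columns copied from the matching external record.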
There is no limit on the number of mergedata statements in a specification, but you may only merge data from up to nine different files per record.
Errors
Errors can occur if your run contains a mergedata statement and either the main data file or the file of supplementary data for merging has records with duplicate keys or records that are out of sequence. In some cases the run is also canceled after all data has been read, when a complete error report is available. The following table lists the situations when duplicate or out of sequence data may occur and shows what happens to your job.