Typically, a data record consists of all the information pertaining to an experimental unit (plot, animal, assessment). Data field definitions manage the process of converting the fields as they appear in the data file to the internal form needed by ASReml. This involves mapping (coding) factors, general transformations, skipping fields and discarding unnecessary records. If the necessary information is not in a single file, the MERGE facility (see Chapter 12) may help.
The data fields to be saved for analysis are defined immediately after the job title. The definitions indicate how each field in the data file is handled as it is read into ASReml. ASReml deduces from the associated transformation information how many of the fields are read from the data file (override this with the !READ qualifier described in Table 5.5). No more than 10,000 variables may be read or formed.
NIN Alliance Trial 1989
 variety !A
 id
 pid
 raw
 repl 4
 nloc
 yield
 lat
 long
 row 22
 column 11
nin89aug.asd !skip 1
yield ~ mu variety
. . .

Data field definitions
• should be given for all fields in the data file; fields can be skipped, and fields at the end of a data line that have no field definition are ignored; if there are not enough data fields on a data line, the remainder are taken from the next line(s),
• must be presented in the order in which they appear in the data file,
• must be indented one or more spaces [Important],
• can appear with other definitions on the same line,
5 Command file: Reading the data 49
– transformation qualifiers should be listed after the data field labels for the fields being modified/created.
– additional data fields can be created by transformation qualifiers.
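The rules in the first item for short and long data lines can be pictured with a short Python sketch. It is an illustration under stated assumptions, not ASReml's reader: it assumes whitespace-separated fields, keeps every value as text, skips blank lines, and uses a hypothetical function name, `read_records`.

```python
def read_records(lines, labels):
    """Assemble records from whitespace-separated data lines (illustrative sketch).

    Each record starts on a fresh line. If that line holds too few fields,
    the remainder are taken from the following line(s); surplus fields at
    the end of the line that completes the record are ignored.
    """
    needed = len(labels)
    records = []
    it = iter(lines)
    for line in it:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines carry no record
        while len(fields) < needed:
            try:
                fields.extend(next(it).split())  # continue on the next line
            except StopIteration:
                raise ValueError("incomplete record at end of data") from None
        # Fields beyond the last definition are discarded.
        records.append(dict(zip(labels, fields[:needed])))
    return records


if __name__ == "__main__":
    data = ["A1 3 12.5 extra", "B2", "4 13.1"]
    for rec in read_records(data, ["variety", "repl", "yield"]):
        print(rec)
```

Here the trailing `extra` on the first line is dropped, and the second record is completed from the third line.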
Data field definition syntax
Data field definitions appear in the ASReml command file in the form
space label [field type] [transformations]
• space
– is a required space,
• label
– is an alphanumeric string to identify the field,
– has a maximum of 31 characters, although only 20 are ever printed/displayed,
– must begin with a letter,
– must not contain the special characters ., *, :, /, !, #, | or (,
– must not be a reserved word (Table 6.1 and Table 7.3),
• field type defines how a variable is interpreted as it is read and whether it is
regarded as a factor or variable if specified in the linear model,
– for a variate, leave field type blank or specify 1,
– for a model factor, various qualifiers are required depending on the form of the factor coding, where n is the number of levels of the factor and s is a list of labels to be assigned to the levels [Revised 08]:
* or n is used when the data field has values 1...n directly coding for the factor, unless the levels are to be labelled (see !L), for example
Row * # 1:12
!L s is used when the data field is numeric with values 1...n and labels are to be assigned to the n levels, for example
Sex !L Male Female
!L can also be used in conjunction with !A to set the order of the levels. For example,
SNP !A !L C:C C:T T:T
defines the levels, overriding the default, data-dependent order.
If there are many labels, they may be written over several lines by using a trailing comma to indicate continuation of the list.
!A[n] is required if the data field is alphanumeric, for example
!I[n] is required if the data field is numeric and defines a factor but its values are not 1...n; !I must be followed by n if more than 1000 codes are present, for example
Year !I # 1995 1996
!AS p is required if the data field has level names in common with a previous !A or !I factor p and is to be coded identically, for example in a plant diallel experiment
Male !A 22
Female !AS Male # integrated coding
!P indicates the special case of a pedigree factor; ASReml will determine whether the identifiers are integer or alphanumeric from the pedigree file qualifiers, and set the levels after reading the pedigree file (see Section 9.3), for example
Animal !P # coded according to pedigree file
A warning is printed if the nominated value for n does not agree with the actual number of levels found in the data; if the nominated value is too small, the correct value is used.
– for a group of m variates or factor variables
!G m [l] [ASReml3] is used when m contiguous data fields comprise a set to be used together. The variables will be treated as factor variables if the second argument (l), which sets the number of levels, is present (it may be *). For example,
 . . .
 X1 X2 X3 X4 X5
 y
data.dat
y ~ mu X1 X2 X3 X4 X5
and
 . . .
 X !G 5
 y
data.dat
y ~ mu X
are equivalent.
– !DATE [ASReml2] specifies that the field has one of the date formats dd/mm/yy, dd/mm/ccyy, dd-Mon-yy or dd-Mon-ccyy and is to be converted into a Julian day, where dd is a 1 or 2 digit day of the month, mm is a 1 or 2 digit month of the year, Mon is a three-letter month name (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec), yy is the year within the century (00 to 99) and cc is the century (19 or 20). The separators '/' and '-' must be present as indicated. The dates are converted to days starting 1 Jan 1900. When the century is not specified, yy of 0-32 is taken as 2000-2032 and 33-99 as 1933-1999.
– !DMY [ASReml2] specifies that the field has one of the date formats dd/mm/yy or dd/mm/ccyy and is to be converted into a Julian day.
– !MDY [ASReml2] specifies that the field has one of the date formats mm/dd/yy or mm/dd/ccyy and is to be converted into a Julian day.
– !TIME specifies that the field has the time format hh:mm:ss and is to be converted to seconds past midnight, where hh is hours (0 to 23), mm is minutes (0 to 59) and ss is seconds (0 to 59). The separator ':' must be present.
• transformations are described below.
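The date and time field types above can be sketched in Python to make the conversion rules concrete. This illustrates only the stated rules and is not ASReml's code; the function names are hypothetical, and because the text says only that days start at 1 Jan 1900, the sketch assumes 1 Jan 1900 is day 1 and that the ordinary Gregorian calendar applies.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
ORIGIN = date(1900, 1, 1)  # assumption: 1 Jan 1900 is counted as day 1


def full_year(year_text):
    """Expand a year without century: 0-32 -> 2000-2032, 33-99 -> 1933-1999."""
    year = int(year_text)
    if len(year_text) <= 2:
        year += 2000 if year <= 32 else 1900
    return year


def day_number(text, order="DMY"):
    """Sketch of !DATE/!DMY/!MDY: convert a date string to a day count.

    order="DMY" reads dd/mm/..., order="MDY" reads mm/dd/...;
    the dd-Mon-yy forms (accepted by !DATE) always put the day first.
    """
    if "-" in text:
        dd, mon, yy = text.split("-")
        d, m = int(dd), MONTHS.index(mon) + 1  # ValueError if Mon is unknown
    else:
        a, b, yy = text.split("/")
        d, m = (int(a), int(b)) if order == "DMY" else (int(b), int(a))
    return (date(full_year(yy), m, d) - ORIGIN).days + 1


def seconds_past_midnight(text):
    """Sketch of !TIME: convert hh:mm:ss to seconds past midnight."""
    hh, mm, ss = (int(part) for part in text.split(":"))
    if not (0 <= hh <= 23 and 0 <= mm <= 59 and 0 <= ss <= 59):
        raise ValueError(f"time out of range: {text}")
    return hh * 3600 + mm * 60 + ss
```

Under the day-1 assumption, 01/01/00, 01/01/2000 and 01-Jan-2000 all map to the same day number, and 15/03/05 read as DMY matches 03/15/2005 read as MDY.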
Storage of alphabetic factor labels
[ASReml2] Space is allocated dynamically for the storage of alphabetic factor labels, with a default allocation of 2000 labels, each 16 characters long. If there are large !A factors (so that the total across all factors will exceed 2000), you must specify the anticipated size (within, say, 5%). If some labels are longer than 16 characters and the extra characters are significant, you must lengthen the space for each label by specifying !LL c, e.g.
cross !A 2300 !LL 48
indicates that the factor cross has about 2300 levels and needs 48 characters to hold the level names; only the first 20 characters of the names are ever printed.
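A hypothetical helper like the one below could suggest sizing qualifiers from the labels actually present in a data file. It is a Python sketch built only on the defaults quoted above (2000 labels of 16 characters). The function name, and the choice to declare a size on every !A factor once the total exceeds 2000, are assumptions rather than ASReml behaviour.

```python
DEFAULT_LABELS = 2000    # default number of stored labels (from the text)
DEFAULT_LABEL_LEN = 16   # default characters per label (from the text)


def suggest_qualifiers(factors):
    """Suggest !A field definitions for alphabetic factors (hypothetical helper).

    `factors` maps each field label to the list of values seen in that field.
    """
    n_levels = {name: len(set(values)) for name, values in factors.items()}
    total = sum(n_levels.values())
    suggestions = {}
    for name, values in factors.items():
        parts = [name, "!A"]
        if total > DEFAULT_LABELS:
            # Assumption: declare the size on every !A factor once the
            # total across all factors exceeds the default allocation.
            parts.append(str(n_levels[name]))
        longest = max((len(v) for v in values), default=0)
        if longest > DEFAULT_LABEL_LEN:
            parts += ["!LL", str(longest)]  # lengthen the space per label
        suggestions[name] = " ".join(parts)
    return suggestions
```

For 2300 distinct 18-character cross names plus a two-level site factor, this suggests `cross !A 2300 !LL 18` and `site !A 2`.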
[ASReml2] !PRUNE on a field definition line means that if fewer levels are actually present in the factor than were declared, ASReml will reduce the factor size to the actual number of levels. Use !PRUNALL for this action to be taken on the current and subsequent factors up to (but not including) a factor with the !PRUNEOFF qualifier. The user may overestimate the size of large ALPHA and INTEGER coded factors so that ASReml reserves enough space for the list; using !PRUNE means the extra (undefined) levels will not appear in the .sln file. Since it is sometimes necessary that factors not be pruned in this way, for example pedigree/GIV factors, pruning is only done if requested.
Reordering the factor levels
[ASReml2] !SORT declared after !A or !I on a field definition line will cause ASReml to sort the levels so that labels occur in alphabetic/numeric order for the analysis. As ASReml reads the data file, it encodes !I and !A factor levels in the order they appear in the data so that, for example, the user cannot tell whether SEX will be coded 1=Male, 2=Female or 1=Female, 2=Male without looking at the data file to see whether Male or Female appears first in the SEX field. If !SORT is specified, ASReml creates a lookup table after reading the data to select levels in sorted order and uses this sorted order when forming the design matrices. Consequently, with the !SORT qualifier, the order of fitted effects will be 1=Female, 2=Male in the analysis regardless of which appears first in the file. However, most other references to particular levels of factors will refer to the unsorted levels, so users should verify that ASReml has made the correct interpretation when nominating specific levels of !SORTed factors. [Caution] In particular, any transformations are performed as the data is read in and before the sorting occurs.
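The difference between order-of-appearance coding and !SORT order can be shown with a small Python sketch. It mimics the lookup-table idea described above under simplifying assumptions (a single factor, labels already read, no transformations). The function names are hypothetical and do not correspond to ASReml internals.

```python
def encode_in_order_of_appearance(values):
    """Code levels 1, 2, ... in the order they first appear in the data."""
    code_of = {}
    codes = []
    for v in values:
        if v not in code_of:
            code_of[v] = len(code_of) + 1
        codes.append(code_of[v])
    return codes, list(code_of)  # labels listed in order of first appearance


def sorted_lookup(labels):
    """Map each original code to the position of its label in sorted order."""
    rank = {label: i for i, label in enumerate(sorted(labels), start=1)}
    return {code: rank[label] for code, label in enumerate(labels, start=1)}


codes, labels = encode_in_order_of_appearance(["Male", "Female", "Female", "Male"])
lookup = sorted_lookup(labels)
sorted_codes = [lookup[c] for c in codes]
```

For the SEX example, the raw codes are [1, 2, 2, 1] with 1=Male, and after the lookup they become [2, 1, 1, 2], i.e. 1=Female, 2=Male. For !I factors the labels should be compared as numbers rather than strings so that, say, 9 sorts before 10.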
!SORTALL means that the levels of this and subsequent factors are to be sorted.

Skipping input fields
[ASReml2] !SKIP f will skip f data fields BEFORE reading this field. It is particularly useful in large files with alphabetic fields that are not needed, as it saves ASReml the time required to classify the alphabetic labels. For example
Sire !I !skip 1
would skip the field before the field which is read as 'Sire'. This qualifier is ignored when reading binary data.
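The effect of !SKIP on which data column feeds each defined field can be sketched in Python. The sketch assumes whitespace-separated text data (the qualifier is ignored for binary data) and represents each definition as a hypothetical (label, skip) pair; it is an illustration, not ASReml's parser.

```python
def field_positions(definitions):
    """Return the 0-based column read for each field, given (label, skip) pairs.

    skip is the number of data fields passed over BEFORE the field is read,
    mirroring !SKIP f; 0 means no skipping.
    """
    positions, column = {}, 0
    for label, skip in definitions:
        column += skip           # pass over the unwanted fields
        positions[label] = column
        column += 1              # the field itself is consumed
    return positions


def read_line(line, definitions):
    """Pick the defined fields out of one whitespace-separated data line."""
    fields = line.split()
    return {label: fields[col] for label, col in field_positions(definitions).items()}
```

With definitions `[("Sire", 1), ("Dam", 0), ("wt", 2)]`, the line `herd7 S12 D40 x y 310` yields Sire = S12, Dam = D40 and wt = 310; the leading herd code and the two fields before wt are never classified.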
Warning