Transformation is the process of modifying the data (for example, dividing all of the data values in a field by 10), forming new variables (for example, summing the data in two fields) or creating temporary data (for example, a test variable used to discard some records from analysis and subsequently discarded). Occasional users may find it easier to use a spreadsheet to calculate derived variables than to modify variables using ASReml transformations.
Transformation qualifiers are listed after data field labels (and the field type if present). They define an operation (e.g. +), often involving an argument (a constant or another variable), which is performed on a target variable. For a !G group of variables, the target is the first variable in the set. The target is usually implicit, the current field, but can be changed to a new variable with the !TARGET qualifier.
Using transformations will be easier if you understand the process. As ASReml parses the variable definitions, it sequentially assigns them column positions in the internal data array. It notes which is the last variable that is not created by a transformation (say, the != transformation), and that determines how many fields are read from the data file (unless overridden by the !READ qualifier in Table 5.2). After parsing the model line, ASReml actually reads the data file. It reads a line into a temporary vector, performs the transformations in that vector, and then saves the positions
5 Command file: Reading the data 53
that relate to labelled variables to the internal data array. Note that
• there may be up to 10000 variables and these are internally labelled V1, V2, ..., V10000 for transformation purposes (ASReml3). Values from the data file, ignoring any !SKIPed fields, are read into the leading variables,
• alpha (!A), integer (!I), pedigree (!P) and date (!DATE) fields are converted to real numbers (level codes) as they are read and before any transformations are applied,
• transformations may be applied to any variable (since every variable is numeric), but it may not be sensible to change factor level codes,
• transformations operate on a single variable (not a !G group of variables) unless it is explicitly stated otherwise,
• transformations are performed in order for each record in turn,
• variables that are created by transformation should be defined after (below) variables that are read from the data file unless it is the explicit intention to overwrite an input variable (see below),
• after completing the transformations for each record, the values in the record for variables associated with a label are held for analysis (or the record (all values) is discarded; see the !D transformation and Section 6.9).
Thus variables form three classes: those read from the data file (possibly modified, normally labelled and available for subsequent use in analysis), those created and labelled (available for subsequent use in the analysis) and those created but not labelled (intermediate calculations not required for subsequent analysis).
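The read, transform and store cycle described above can be sketched in Python. This is only an illustrative model, not ASReml's implementation: the function `process_records`, its parameters and the two example transformations are hypothetical names chosen for the sketch, and missing values are represented by `None`.

```python
MISSING = None  # stand-in for a missing value (assumption of this sketch)


def process_records(lines, n_read, n_labelled, n_work, transforms):
    """Hypothetical model of the read/transform/store cycle.

    lines       -- iterable of whitespace-separated numeric records
    n_read      -- number of fields read from each line
    n_labelled  -- number of leading positions that carry a label
    n_work      -- length of the temporary vector (labelled + scratch)
    transforms  -- functions applied in order to the temporary vector;
                   a function returning False discards the record
    """
    data_array = []
    work = [MISSING] * n_work          # temporary vector, V1..Vn_work
    for line in lines:
        tokens = line.split()
        for i in range(n_read):        # values go into the leading variables
            work[i] = float(tokens[i])
        keep = True
        for transform in transforms:   # transformations run in order
            if transform(work) is False:
                keep = False           # whole record is discarded
                break
        if keep:
            # only positions associated with a label are held for analysis
            data_array.append(work[:n_labelled])
    return data_array


def make_c(v):
    """Created and labelled variable: C = A - B (position 3)."""
    v[2] = v[0] - v[1]


def flag_and_discard(v):
    """Created but unlabelled test variable (position 4) used to drop records."""
    v[3] = 1.0 if v[0] <= 0 else 0.0
    return v[3] == 0.0


result = process_records(["10 3", "0 5", "8 2"], 2, 3, 4,
                         [make_c, flag_and_discard])
print(result)
```

The three classes of variable appear as the first `n_read` positions (read), the positions up to `n_labelled` (created and labelled) and the remaining scratch positions (created but not kept).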
When listing variables in the field definitions, list those read from the data file first. After them, list (and define) the variables that are to be created and labelled but not read. The number of variables read can be explicitly set using the !READ qualifier described in Table 5.5. Otherwise, if the first transformation on a field overwrites its contents (for instance using !=), ASReml recognises that the field does not need to be read in (unless a subsequent field does need to be read). For example,
A B
C !=A !-B
reads two fields (A and B), and constructs C as A-B. All three are available for analysis. However,
A B
C !=A !-B D
E !=D !-B
reads four fields (A, B, C and D) because the fourth field is not obviously created and must therefore be read even though the third field (C) is overwritten. The fifth field (E) is not read but is created.
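The rule used in these two examples (the last field that is not created by an overwriting first transformation fixes how many fields are read) can be sketched as follows. The function name `fields_to_read` and the `(label, created)` representation are hypothetical, introduced only for illustration; an explicit !READ qualifier is not modelled.

```python
def fields_to_read(definitions):
    """Return how many fields would be read from the data file.

    definitions -- list of (label, created) pairs in definition order, where
                   created is True when the first transformation on the field
                   overwrites its contents (for instance with !=).
    """
    last_read = 0
    for position, (_label, created) in enumerate(definitions, start=1):
        if not created:
            last_read = position   # this field must come from the file
    return last_read


first = [("A", False), ("B", False), ("C", True)]
second = [("A", False), ("B", False), ("C", True), ("D", False), ("E", True)]
print(fields_to_read(first), fields_to_read(second))
```

In the second case C is still overwritten after being read, because D, which follows it, has to be read.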
Variables that have an explicit label may be referenced by their explicit label or their internal label. Therefore, to avoid confusion, do not use explicit labels of the form Vi, where i is a number, for variables to be referred to in a transformation.
Vi always refers to field/variable i in a transformation statement.
Variables that are not initialized from the data file are initialized to missing value for the first record and, otherwise, to the values from the preceding record (after transformation). Thus
A B
LagA !=V4 !V4=A
reads two fields (A and B), and constructs LagA as the value of A from the previous record by extracting a value for LagA from working variable V4 before loading V4 with the current value of A.
Transformation syntax
Transformation qualifiers have one of seven forms, namely
!operator to perform an operation on the current field, for example, absY !ABS to take absolute values,
!operator value to perform an operation involving an argument on the current field, for example, logY !=Y !^0 copies Y and then takes logs,
!operator Vfield to perform an operation on the current field using the data in another field, for example, !-V2 to subtract field 2 from the current field,
!Vtarget to reset the focus for subsequent transformations to field number target,
!TARGET target
ASReml3
to reset the focus for subsequent transformations to the previously named field target,
!Vtarget = value to change all of the data in a target field to a given value,
!Vtarget = Vfield to overwrite the data in a target field by the data values of another field; a special case is when field is 0, instructing ASReml to put the record number into the target field.
• operator is one of the symbols defined in Table 5.1,
• value is the argument, a real number, required by the transformation,
• V is the literal character and is followed by the number (target or field) of a
data field; the data field is used or modified depending on the context,
• Vfield may be replaced by the label of the field if it already has a label,
• in the first three forms the operation is performed on the current field; this will be the field associated with the label unless the focus has been reset by specifying a new target in a preceding transformation,
• the last four forms change the focus for subsequent transformations to the
target,
• in the last two forms a value is assigned to the target field. For example, ... !V22=V11 ... copies (existing) field 11 into field 22. Such a statement would typically be followed by more transformations. If there are fewer than 22 variables labelled then V22 is used in the transformation stage but not kept for analysis.
• Warning: only the !DOM and !RESCALE transformations automatically process a set of variables defined with the !G field definition. All other transformations always operate on only a single field. Use the !DO ... !ENDDO transformations to perform them on a set of variables.
Table 5.1: List of transformation qualifiers and their actions with examples
qualifier argument action examples
!= v used to overwrite/create a variable with v. It usually implies the variable is not read (see examples on page 53).
half !=0.5
zero !=0.
!+, !-, !*, !/ v usual arithmetic meaning; note that 0/0 gives 0 but v/0 gives a missing value where v is not 0.
yield !/10
!^ v raises the data (which must be positive) to the power v.
yield
SQRyld !=yield !^0.5
!^ 0 takes natural logarithms of the data (which must be positive).
yield
LNyield !=yield !^0
!^ -1 takes reciprocal of data (data must be positive).
yield
INVyield !=yield !^-1
!>, !<, !<>, !==, !<=, !>= v logical operators forming 1 if true, 0 if false.
yield
high !=yield !>10
!ABS takes absolute values - no argument required.
yield
ABSyield !=yield !ABS
!ARCSIN v forms an ArcSin transformation using the sample size specified in the argument, a number or another field. In the example, for two existing fields Germ and Total containing counts, we form the ArcSin for their ratio (ASG) by copying the Germ field and applying the ArcSin transformation using the Total field as sample size.
Germ Total
ASG !=Germ !ARCSIN Total
!COS, !SIN s takes cosine and sine of the data variable with period s having default 2π; omit s if data is in radians, set s to 360 if data is in degrees.
Day
CosDay !=Day !COS 365
!D, !D<>, !D<, !D<=, !D>, !D>= v !D[o] v discards records which have v or 'missing value' in the field, subject to the logical operator o.
yield !D<=0
yield !D<1 !D>100
!DV, !DV<>, !DV<, !DV<=, !DV>, !DV>=
ASReml3
v !DV[o] v discards records, subject to the logical operator o, which have v in the field but keeps records with 'missing value' in the field; if !DV is used after !A or !I, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2). Use !DV * to discard just those records with a missing value in the field. !D v is equivalent to !DV * !DV v.
yield !DV<=0
yield !DV<1 !DV>100
!DO
ASReml3
[n[it[iv]]] causes ASReml to perform the following transformations n times (default is the number of variables in the current term), incrementing the target by it (default 1) and the argument (if present) by iv (default 0). Loops may not be nested. A loop is terminated by !ENDDO, another !DO or a new field definition.
See below
!DOM
ASReml2
f copies and converts additive marker covariables (-1, 0, 1) to dominance marker covariables (see below).
ChrAadd !G 10 !MM ..
ChrAdom !DOM ChrAadd
!ENDDO
ASReml3
terminates a !DO transformation block.
See below
!EXP takes antilog base e - no argument required.
Rate !EXP
!Jddm, !Jmmd, !Jyyd !Jddm converts a number representing a date in the form ddmmccyy, ddmmyy or ddmm into days. !Jmmd converts a date in the form ccyymmdd, yymmdd or mmdd into days. !Jyyd converts a date in the form ccyyddd or yyddd into days. These calculate the number of days since December 31 1900 and are valid for dates from January 1 1900 to December 31 2099; note that if cc is omitted it is taken as 19 if yy > 32 and 20 if yy < 33; the date must be entirely numeric: characters such as / may not be present (but see !DATE).
!M, !M<>, !M<, !M<=, !M>, !M>= v !Mv converts data values of v to missing; if !M is used after !A or !I, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2).
yield !M-9
yield !M<=0 !M>100
!MAX, !MIN, !MOD v the maximum, minimum and modulus of the field values and the value v.
yield !MAX 9
!MM
ASReml2
s assigns Haldane map positions (s) to marker variables and imputes missing values to the markers (see below).
ChrAadd !G 10 !MM 1 ...
!NA v replaces any missing values in the variate with the value v. If v is another field, its value is copied.
Rate !NA 0
WT !=Wt2 !NA Wt1
!NORMAL
ASReml2
v replaces the variate with normal random variables having variance v.
Ndat !=0 !Normal 4.5
is equivalent to
Ndat !=Normal 4.5
!REPLACE
ASReml2
o n replaces data values o with n in the current variable, i.e. IF(DataValue.EQ.o) DataValue=n
Rate !REPLACE -9 0
!RESCALE
ASReml2
o s rescales the column(s) in the current variable (!G group of variables) using Y = (Y + o) * s
Rate !RESCALE -10 0.1
!SEED
ASReml2
v sets the seed for the random number generator.
... !SEED 848586
!SET vlist for vlist, a list of n values, the data values 1...n are replaced by the corresponding element from vlist; data values that are < 1 or > n are replaced by zero. vlist may run over several lines provided each incomplete line ends with a comma, i.e., a comma is used as a continuation symbol (see Other examples below).
treat !L C A B
CvR !=treat !SET 1 -1 -1
group !=treat !SET 1, 2 2 3 3 4
!SETN
ASReml2
v n !SETN v n replaces data values 1:n with normal random variables having variance v. Data values outside the range 1...n are set to 0.
Anorm !=A !SETN 2.5 10
!SETU
ASReml2
v n replaces data values 1:n with uniform random variables having range 0:v. Data values outside the range 1...n are set to 0.
Aeff !=A !SETU 5 10
!SUB vlist replaces data values = vi with their index i where vlist is a vector of n values. Data values not found in vlist are set to 0. vlist may run over several lines if necessary provided each incomplete line ends with a comma. ASReml allows for a small rounding error when matching. It may not distinguish properly if values in vlist only differ in the sixth decimal place (see Other examples below).
!SEQ replaces the data values with a sequential number starting at 1 which increments whenever the data value changes between successive records; the current field is presumed to define a factor and the number of levels in the new factor is set to the number of levels identified in this sequential process (see Other examples below). Missing values remain missing.
plot !=V3 !SEQ
!TARGET
ASReml3
v changes the focus of subsequent transformations to variable (field) v.
sqrtA
meanAB !+A !/2 , !TARGET sqrtA !^0.5
!UNIFORM
ASReml2
v replaces the variate with uniform random variables having range 0:v.
Udat !=0. !Uniform 4.5
is equivalent to
Udat !=Uniform 4.5
!Vtarget= value assigns value to data field target overwriting previous contents; subsequent transformation qualifiers will operate on data field target.
... !V3=2.5
!Vtarget= Vfield assigns the contents of data field field to data field target overwriting previous contents; subsequent transformation qualifiers will operate on data field target. If field is 0 the number of the data record is inserted.
... !V10=V3
... !V11=block
... !V12=V0
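The recoding qualifiers !SET, !SUB and !SEQ in Table 5.1 can be illustrated with a small Python sketch. It is not ASReml code. The function names are hypothetical, missing values are represented by `None`, and three points that the table does not state are assumptions of the sketch: missing values pass through !SET and !SUB unchanged, non-integer codes given to !SET become 0, and the !SUB matching tolerance is taken as 1e-5.

```python
MISSING = None  # stand-in for a missing value (assumption of this sketch)


def set_values(data, vlist):
    """!SET: data values 1..n become vlist[0..n-1]; other values become 0."""
    n = len(vlist)
    out = []
    for x in data:
        if x is MISSING:
            out.append(MISSING)            # assumption: missing stays missing
        elif 1 <= x <= n and x == int(x):
            out.append(vlist[int(x) - 1])
        else:
            out.append(0)                  # below 1, above n, or non-integer
    return out


def sub_values(data, vlist, tol=1e-5):
    """!SUB: a data value equal to vlist[i-1] becomes i; unmatched become 0."""
    out = []
    for x in data:
        if x is MISSING:
            out.append(MISSING)            # assumption: missing stays missing
            continue
        index = 0
        for i, v in enumerate(vlist, start=1):
            if abs(x - v) <= tol:          # small rounding error is allowed
                index = i
                break
        out.append(index)
    return out


def seq_values(data):
    """!SEQ: sequential number that increments when the value changes."""
    out, counter, previous = [], 0, None
    for x in data:
        if x is MISSING:
            out.append(MISSING)            # missing values remain missing
            continue
        if counter == 0 or x != previous:
            counter += 1
        previous = x
        out.append(counter)
    return out


print(set_values([1, 2, 3, 4, 0], [-0.5, 1.5, 2.5]))
print(sub_values([-0.5, 1.5, 2.5, 9.0], [-0.5, 1.5, 2.5]))
print(seq_values([7, 7, 3, 3, None, 3, 7]))
```

!SET and !SUB act as inverses of each other on the values listed in vlist, which is why the same list appears in both calls.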
QTL marker transformations
!MM s associates marker positions in the vector s (based on the Haldane mapping function) with marker variables and replaces missing values in a vector of marker states with expected values calculated using distances to non-missing flanking markers. This transformation will normally be used on a !G n factor where the n variables are the marker states for n markers in a linkage group in map order and coded [-1,1] (backcross) or [-1,0,1] (F2 design). s (length n+1) should be the n marker positions relative to a left telomere position of zero, and an extra value being the length of the linkage group (the position of the right telomere).
The length (right telomere) may be omitted in which case the last marker is taken as the end of the linkage group. The positions may be given in Morgans or centiMorgans (if the length is greater than 10, it will be divided by 100 to convert to Morgans).
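The recombination rate between markers at $s_L$ and $s_R$ (L is left and R is right of some putative QTL at Q) is
$\theta_{LR} = (1 - e^{-2(s_R - s_L)})/2$.
Consequently, for 3 markers (L, Q, R), $\theta_{LR} = \theta_{LQ} + \theta_{QR} - 2\theta_{LQ}\theta_{QR}$.
The expected value of a missing marker at Q (between L and R) depends on the marker states at L and R:
$E(q|1,1) = (1 - \theta_{LQ} - \theta_{QR})/(1 - \theta_{LR})$,
$E(q|1,-1) = (\theta_{QR} - \theta_{LQ})/\theta_{LR}$,
$E(q|-1,1) = (\theta_{LQ} - \theta_{QR})/\theta_{LR}$ and
$E(q|-1,-1) = (-1 + \theta_{LQ} + \theta_{QR})/(1 - \theta_{LR})$.
Let
$\lambda_L = (E(q|1,1) + E(q|1,-1))/2 = \frac{\theta_{QR}(1 - \theta_{QR})(1 - 2\theta_{LQ})}{\theta_{LR}(1 - \theta_{LR})}$ and
$\lambda_R = (E(q|1,1) + E(q|-1,1))/2 = \frac{\theta_{LQ}(1 - \theta_{LQ})(1 - 2\theta_{QR})}{\theta_{LR}(1 - \theta_{LR})}$.
Then $E(q|x_L, x_R) = \lambda_L x_L + \lambda_R x_R$. Where there is no marker on one side, $E(q|x_R) = (1 - \theta_{QR})x_R + \theta_{QR}(-x_R) = x_R(1 - 2\theta_{QR})$. This qualifier facilitates the QTL method discussed in Gilmour (2007).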
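As a numerical check of the expressions above, the following Python sketch computes the Haldane recombination rates and the expected value of a missing marker from its two flanking markers. The function names and the example positions (0, 0.1 and 0.3 Morgans) are hypothetical and chosen only for illustration; this is not ASReml's own code.

```python
import math


def haldane_theta(distance):
    """Recombination rate for a map distance in Morgans (Haldane function)."""
    return (1.0 - math.exp(-2.0 * distance)) / 2.0


def expected_marker(s_left, s_q, s_right, x_left, x_right):
    """Expected state at Q given flanking marker states x_left and x_right."""
    t_lq = haldane_theta(s_q - s_left)
    t_qr = haldane_theta(s_right - s_q)
    t_lr = haldane_theta(s_right - s_left)
    denom = t_lr * (1.0 - t_lr)
    lam_l = t_qr * (1.0 - t_qr) * (1.0 - 2.0 * t_lq) / denom
    lam_r = t_lq * (1.0 - t_lq) * (1.0 - 2.0 * t_qr) / denom
    return lam_l * x_left + lam_r * x_right


# Hypothetical positions in Morgans: L at 0, Q at 0.1, R at 0.3.
a = haldane_theta(0.1)          # theta_LQ
b = haldane_theta(0.2)          # theta_QR
t = haldane_theta(0.3)          # theta_LR
print(expected_marker(0.0, 0.1, 0.3, 1, 1), (1 - a - b) / (1 - t))
print(expected_marker(0.0, 0.1, 0.3, 1, -1), (b - a) / t)
```

Each printed pair should agree to rounding error, since the weighted form with the two lambda coefficients is algebraically the same as the four conditional expectations.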
!DOM A is used to form dominance covariables from a set of additive marker covariables previously declared with the !MM marker map qualifier. It assumes the argument A is an existing group of marker variables relating to a linkage group defined using !MM which represents additive marker variation coded [-1, 0, 1] (representing marker states aa, aA and AA respectively). It is a group transformation which takes the [-1,1] interval values and calculates $(|X| - 0.5) \times 2$, i.e. -1 and 1 become 1, and 0 becomes -1. The marker map is also copied and applied to this model term so it can be the argument in a qtl() term (page 106).
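The dominance recoding is a one-line calculation, shown here as a Python sketch. The function name `dominance_covariable` is hypothetical, and the input is assumed to contain no missing values (for example because they were already imputed by !MM).

```python
def dominance_covariable(additive):
    """Map additive marker values in [-1, 1] to (|X| - 0.5) * 2."""
    return [(abs(x) - 0.5) * 2 for x in additive]


print(dominance_covariable([-1, 0, 1, 0.5]))
```

Imputed values strictly between the three genotype codes map to intermediate dominance values, for example 0.5 maps to 0.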
!DO ... !ENDDO provides a mechanism to repeat transformations on a set of variables. All transformations except !DOM and !RESCALE operate once on a single field unless preceded by a !DO qualifier. The !DO qualifier has three arguments: [n[it[iv]]]. n is the number of times the following transformations are to be performed. it (default 1) is the increment applied to the target field. iv (default 0.0) is the increment applied to the transformation argument. The default for n is the number of variables in the current field definition. !ENDDO is formally equivalent to !DO 1 and is implicit when another !DO appears or the next field definition begins. Note that when several transformations are repeated, the processing order is that each is performed n times before the next is processed (contrary to the implication of the syntax). However, the target is reset for each transformation so that the transformations apply to the same set of variables.
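The loop semantics just described can be modelled with a short Python sketch. It is an illustration under stated assumptions rather than ASReml's implementation: the function `do_loop` is hypothetical, only arguments that are field numbers are handled (so iv steps through fields), and missing values are ignored.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def do_loop(record, target, n, it, iv, transforms):
    """Model of !DO n it iv <transforms> !ENDDO on one record.

    record     -- dict mapping field number (1-based) to value
    target     -- field number in focus when the loop starts
    transforms -- list of (operator, argument_field_number) pairs
    """
    for op, arg_field in transforms:     # each transformation in turn ...
        for k in range(n):               # ... is performed n times
            t = target + k * it          # target restarts for each transformation
            a = arg_field + k * iv       # argument field steps by iv
            record[t] = OPS[op](record[t], record[a])
    return record


# Ymean !=0. !DO 5 0 1 !+Y1 !ENDDO !/5  with Y1..Y5 in fields 1..5, Ymean in 6
rec = {1: 2.0, 2: 4.0, 3: 6.0, 4: 8.0, 5: 10.0, 6: 0.0}
do_loop(rec, target=6, n=5, it=0, iv=1, transforms=[("+", 1)])
rec[6] /= 5
print(rec[6])

# !TARGET Y1 !do 5 1 0 !-Y0 !ENDDO  with Y0 in field 1 and Y1..Y5 in fields 2..6
rec2 = {1: 1.0, 2: 5.0, 3: 6.0, 4: 7.0, 5: 8.0, 6: 9.0}
do_loop(rec2, target=2, n=5, it=1, iv=0, transforms=[("-", 1)])
print(rec2)
```

The outer loop runs over transformations and the inner loop over repetitions, which reflects the stated processing order: each transformation is completed n times before the next one starts.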
Y1 Y2 Y3 Y4 Y5 # Repeat 5 times, incrementing just
Ymean !=0. !DO 5 0 1 !+Y1 !ENDDO !/5 # the argument
is equivalent to
Y1 Y2 Y3 Y4 Y5
Ymean !=0. !+Y1 !+Y2 !+Y3 !+Y4 !+Y5 !/5
Y0 Y1 Y2 Y3 Y4 Y5 !TARGET Y1 !do 5 1 0 !-Y0 !ENDDO # Take Y0 from rest
Markers !G 12 !do !D * !ENDDO # Delete records with missing marker values
The default arguments (12, 1, 0.) are used. The initial target is the first marker.
Other rules and examples
Other rules include the following
• variables that are created should be listed after all variables that are read in unless the intention is to overwrite an input field,
• missing values are unaffected by arithmetic operations, that is, missing values in the current or target column remain missing after the transformation has been performed, except in assignment:
– !+3 will leave missing values (NA, * and .) as missing,
– !=3 will change missing values to 3,
• multiple arithmetic operations cannot be expressed in a complex expression
but must be given as separate operations that are performed in sequence as they appear, for example, yield !-120 !*0.0333 would calculate 0.0333 * (yield - 120),
• most transformations only operate on a single field and will not therefore be performed on all variables in a !G factor set. The only transformations that apply to the whole set are !DOM, !MM and !RESCALE.
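The rules on sequential evaluation and missing values can be illustrated with a minimal Python sketch. The names `apply_op` and `transform` are hypothetical and missing values are represented by `None`. Returning a missing value when !^ is applied to non-positive data is an assumption of the sketch; the text only says the data must be positive.

```python
import math

MISSING = None  # stand-in for a missing value (assumption of this sketch)


def apply_op(value, op, arg):
    """Apply one transformation to one data value."""
    if op == "=":
        return arg                       # assignment overwrites missing values
    if value is MISSING:
        return MISSING                   # arithmetic leaves missing as missing
    if op == "+":
        return value + arg
    if op == "-":
        return value - arg
    if op == "*":
        return value * arg
    if op == "/":
        if arg == 0:
            return 0.0 if value == 0 else MISSING   # 0/0 gives 0, v/0 missing
        return value / arg
    if op == "^":
        if value <= 0:
            return MISSING               # assumption: not stated in the text
        return math.log(value) if arg == 0 else value ** arg
    raise ValueError("unknown operator: " + op)


def transform(value, steps):
    """Apply transformations strictly in the order given."""
    for op, arg in steps:
        value = apply_op(value, op, arg)
    return value


# yield !-120 !*0.0333 calculates 0.0333 * (yield - 120)
print(transform(150.0, [("-", 120), ("*", 0.0333)]))
print(transform(MISSING, [("+", 3)]), transform(MISSING, [("=", 3)]))
```

Because each step sees only the result of the previous one, there is no operator precedence to consider: the order of the qualifiers is the order of evaluation.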
ASReml code action
yield !M0 changes the zero entries in yield to missing values
yield !^0 takes natural logarithms of the yield data
score !-5 subtracts 5 from all values in score
score !SET -0.5 1.5 2.5 replaces data values of 1, 2 and 3 with -0.5, 1.5 and 2.5 respectively
score !SUB -0.5 1.5 2.5 replaces data values of -0.5, 1.5 and 2.5 with 1, 2 and 3 respectively; a data value of 1.51 would be