Transforming the data

In document ASREML user guide release 3.0 (Page 79-90)

Transformation is the process of modifying the data (for example, dividing all of the data values in a field by 10), forming new variables (for example, summing the data in two fields) or creating temporary data (for example, a test variable used to discard some records from analysis and subsequently discarded). Occasional users may find it easier to use a spreadsheet to calculate derived variables than to modify variables using ASReml transformations.

Transformation qualifiers are listed after data field labels (and the field type if present). They define an operation (e.g. +), often involving an argument (a constant or another variable), which is performed on a target variable. For a !G group of variables, the target is the first variable in the set. The target is usually implicit (the current field), but can be changed to a new variable with the !TARGET qualifier.

Using transformations will be easier if you understand the process. As ASReml parses the variable definitions, it sequentially assigns them column positions in the internal data array. It notes which is the last variable that is not created by a transformation (say, by !=), and that determines how many fields are read from the data file (unless overridden by the !READ qualifier in Table 5.2). After parsing the model line, ASReml actually reads the data file. It reads a line into a temporary vector, performs the transformations in that vector, and then saves the positions

5 Command file: Reading the data 53

that relate to labelled variables to the internal data array. Note that

• there may be up to 10000 variables and these are internally labelled V1, V2, ..., V10000 for transformation purposes. Values from the data file, ignoring any !SKIPed fields, are read into the leading variables,

• alpha (!A), integer (!I), pedigree (!P) and date (!DATE) fields are converted to real numbers (level codes) as they are read and before any transformations are applied,

• transformations may be applied to any variable (since every variable is numeric), but it may not be sensible to change factor level codes,

• transformations operate on a single variable (not a !G group of variables) unless it is explicitly stated otherwise,

• transformations are performed in order for each record in turn,

• variables that are created by transformation should be defined after (below) variables that are read from the data file unless it is the explicit intention to overwrite an input variable (see below),

• after completing the transformations for each record, the values in the record for variables associated with a label are held for analysis (or the record (all values) is discarded; see the !D transformation and Section 6.9).

Thus variables form three classes: those read from the data file (possibly modified, normally labelled and available for subsequent use in analysis), those created and labelled (available for subsequent use in the analysis) and those created but not labelled (intermediate calculations not required for subsequent analysis).

When listing variables in the field definitions, list those read from the data file first. After them, list (and define) the variables that are to be created and labelled but not read. The number of variables read can be explicitly set using the !READ qualifier described in Table 5.5. Otherwise, if the first transformation on a field overwrites its contents (for instance using !=), ASReml recognises that the field does not need to be read in (unless a subsequent field does need to be read). For example,

A B

C !=A !-B

reads two fields (A and B), and constructs C as A-B. All three are available for analysis. However,

A B

C !=A !-B D

E !=D !-B

reads four fields (A, B, C and D) because the fourth field is not obviously created and must therefore be read, even though the third field (C) is overwritten. The fifth field (E) is not read; it is only created.
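The rule for how many fields are read can be sketched in Python. This is only an illustration of the rule as described above, not ASReml's implementation; the function name `fields_read` and the `(label, created)` representation are hypothetical, and `created` simply flags a field whose first transformation overwrites it (for instance with !=).

```python
def fields_read(definitions):
    """Return how many leading fields would be read from the data file.

    definitions: list of (label, created) pairs in definition order, where
    created is True when the first transformation on the field overwrites
    its contents (hypothetical representation, for illustration only).
    """
    last_read = 0
    for position, (_label, created) in enumerate(definitions, start=1):
        if not created:
            # This field is not obviously created, so every field up to
            # and including it must be read from the data file.
            last_read = position
    return last_read


first = fields_read([("A", False), ("B", False), ("C", True)])
second = fields_read(
    [("A", False), ("B", False), ("C", True), ("D", False), ("E", True)]
)
print(first, second)
```

Under these assumptions the first call corresponds to the two-field example and the second to the four-field example, where C is read and then overwritten.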

Variables that have an explicit label may be referenced by their explicit label or their internal label. Therefore, to avoid confusion, do not use explicit labels of the form Vi, where i is a number, for variables to be referred to in a transformation. Vi always refers to field/variable i in a transformation statement.

Variables that are not initialized from the data file are initialized to missing value for the first record and, otherwise, to the values from the preceding record (after transformation). Thus

A B

LagA !=V4 !V4=A

reads two fields (A and B), and constructs LagA as the value of A from the previous record by extracting a value for LagA from working variable V4 before loading V4 with the current value of A.

Transformation syntax

Transformation qualifiers have one of seven forms, namely

!operator
  to perform an operation on the current field, for example, absY !ABS to take absolute values,

!operator value
  to perform an operation involving an argument on the current field, for example, logY !=Y !^0 copies Y and then takes logs,

!operator Vfield
  to perform an operation on the current field using the data in another field, for example, !-V2 to subtract field 2 from the current field,

!Vtarget
  to reset the focus for subsequent transformations to field number target,

!TARGET target
  to reset the focus for subsequent transformations to the previously named field target,

!Vtarget = value
  to change all of the data in a target field to a given value,

!Vtarget = Vfield
  to overwrite the data in a target field by the data values of another field; a special case is when field is 0, instructing ASReml to put the record number into the target field.

• operator is one of the symbols defined in Table 5.1,

• value is the argument, a real number, required by the transformation,

• V is the literal character and is followed by the number (target or field) of a data field; the data field is used or modified depending on the context,

• Vfield may be replaced by the label of the field if it already has a label,

• in the first three forms the operation is performed on the current field; this will be the field associated with the label unless the focus has been reset by specifying a new target in a preceding transformation,

• the last four forms change the focus for subsequent transformations to the target,

• in the last two forms a value is assigned to the target field. For example, ... !V22=V11 ... copies (existing) field 11 into field 22. Such a statement would typically be followed by more transformations. If there are fewer than 22 variables labelled then V22 is used in the transformation stage but not kept for analysis,

• only the !DOM and !RESCALE transformations automatically process a set of variables defined with the !G field definition. All other transformations always operate on only a single field. Use the !DO ... !ENDDO transformations to perform them on a set of variables.

Table 5.1: List of transformation qualifiers and their actions with examples

qualifier: !=
argument: v
action: used to overwrite/create a variable with v. It usually implies the variable is not read (see examples on page 53).
examples:
  half !=0.5
  zero !=0.

qualifier: !+, !-, !*, !/
argument: v
action: usual arithmetic meaning; note that 0/0 gives 0 but v/0 gives a missing value where v is not 0.
examples:
  yield !/10

qualifier: !^
argument: v
action: raises the data (which must be positive) to the power v.
examples:
  yield
  SQRyld !=yield !^0.5

qualifier: !^
argument: 0
action: takes natural logarithms of the data (which must be positive).
examples:
  yield
  LNyield !=yield !^0

qualifier: !^
argument: -1
action: takes reciprocal of data (data must be positive).
examples:
  yield
  INVyield !=yield !^-1

qualifier: !>, !<, !<>, !==, !<=, !>=
argument: v
action: logical operators forming 1 if true, 0 if false.
examples:
  yield
  high !=yield !>10

qualifier: !ABS
argument: (none)
action: takes absolute values; no argument required.
examples:
  yield
  ABSyield !=yield !ABS

qualifier: !ARCSIN
argument: v
action: forms an ArcSin transformation using the sample size specified in the argument, a number or another field. In the example, for two existing fields Germ and Total containing counts, we form the ArcSin for their ratio (ASG) by copying the Germ field and applying the ArcSin transformation using the Total field as sample size.
examples:
  Germ Total
  ASG !=Germ !ARCSIN Total

qualifier: !COS, !SIN
argument: s
action: takes cosine and sine of the data variable with period s having default 2π; omit s if data is in radians, set s to 360 if data is in degrees.
examples:
  Day
  CosDay !=Day !COS 365

qualifier: !D, !D<>, !D<, !D<=, !D>, !D>=
argument: v
action: !D[o] v discards records which have v or 'missing value' in the field, subject to the logical operator o.
examples:
  yield !D<=0
  yield !D<1 !D>100

qualifier: !DV, !DV<>, !DV<, !DV<=, !DV>, !DV>=
argument: v
action: !DV[o] v discards records, subject to the logical operator o, which have v in the field but keeps records with 'missing value' in the field; if !DV is used after !A or !I, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2). Use !DV * to discard just those records with a missing value in the field. !D v is equivalent to !DV * !DV v.
examples:
  yield !DV<=0
  yield !DV<1 !DV>100

qualifier: !DO
argument: [n [it [iv]]]
action: causes ASReml to perform the following transformations n times (default is variables in current term), incrementing the target by it (default 1) and the argument (if present) by iv (default 0). Loops may not be nested. A loop is terminated by !ENDDO, another !DO or a new field definition.
examples: see below

qualifier: !DOM
argument: f
action: copies and converts additive marker covariables (-1, 0, 1) to dominance marker covariables (see below).
examples:
  ChrAadd !G 10 !MM ..
  ChrAdom !DOM ChrAadd

qualifier: !ENDDO
argument: (none)
action: terminates a !DO transformation block.
examples: see below

qualifier: !EXP
argument: (none)
action: takes antilog base e; no argument required.
examples:
  Rate !EXP

qualifier: !Jddm, !Jmmd, !Jyyd
argument: (none)
action: !Jddm converts a number representing a date in the form ddmmccyy, ddmmyy or ddmm into days. !Jmmd converts a date in the form ccyymmdd, yymmdd or mmdd into days. !Jyyd converts a date in the form ccyyddd or yyddd into days. These calculate the number of days since December 31 1900 and are valid for dates from January 1 1900 to December 31 2099; note that if cc is omitted it is taken as 19 if yy > 32 and 20 if yy < 33; the date must be entirely numeric: characters such as / may not be present (but see !DATE).

qualifier: !M, !M<>, !M<, !M<=, !M>, !M>=
argument: v
action: !M v converts data values of v to missing; if !M is used after !A or !I, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2).
examples:
  yield !M-9
  yield !M<=0 !M>100

qualifier: !MAX, !MIN, !MOD
argument: v
action: the maximum, minimum and modulus of the field values and the value v.
examples:
  yield !MAX 9

qualifier: !MM
argument: s
action: assigns Haldane map positions (s) to marker variables and imputes missing values to the markers (see below).
examples:
  ChrAadd !G 10 !MM 1 · · ·

qualifier: !NA
argument: v
action: replaces any missing values in the variate with the value v. If v is another field, its value is copied.
examples:
  Rate !NA 0
  WT !=Wt2 !NA Wt1

qualifier: !NORMAL
argument: v
action: replaces the variate with normal random variables having variance v.
examples:
  Ndat !=0 !Normal 4.5
  is equivalent to
  Ndat !=Normal 4.5

qualifier: !REPLACE
argument: o n
action: replaces data values o with n in the current variable, i.e. IF(DataValue.EQ.o) DataValue=n
examples:
  Rate !REPLACE -9 0

qualifier: !RESCALE
argument: o s
action: rescales the column(s) in the current variable (!G group of variables) using Y = (Y + o) * s
examples:
  Rate !RESCALE -10 0.1

qualifier: !SEED
argument: v
action: sets the seed for the random number generator.
examples:
  · · · !SEED 848586

qualifier: !SET
argument: vlist
action: for vlist, a list of n values, the data values 1 ... n are replaced by the corresponding element from vlist; data values that are < 1 or > n are replaced by zero. vlist may run over several lines provided each incomplete line ends with a comma, i.e., a comma is used as a continuation symbol (see Other examples below).
examples:
  treat !L C A B
  CvR !=treat !SET 1 -1 -1
  group !=treat !SET 1, 2 2 3 3 4

qualifier: !SETN
argument: v n
action: !SETN v n replaces data values 1 : n with normal random variables having variance v. Data values outside the range 1 ... n are set to 0.
examples:
  Anorm !=A !SETN 2.5 10

qualifier: !SETU
argument: v n
action: replaces data values 1 : n with uniform random variables having range 0 : v. Data values outside the range 1 ... n are set to 0.
examples:
  Aeff !=A !SETU 5 10

qualifier: !SUB
argument: vlist
action: replaces data values = vi with their index i, where vlist is a vector of n values. Data values not found in vlist are set to 0. vlist may run over several lines if necessary provided each incomplete line ends with a comma. ASReml allows for a small rounding error when matching. It may not distinguish properly if values in vlist only differ in the sixth decimal place (see Other examples below).

qualifier: !SEQ
argument: (none)
action: replaces the data values with a sequential number starting at 1 which increments whenever the data value changes between successive records; the current field is presumed to define a factor and the number of levels in the new factor is set to the number of levels identified in this sequential process (see Other examples below). Missing values remain missing.
examples:
  plot !=V3 !SEQ

qualifier: !TARGET
argument: v
action: changes the focus of subsequent transformations to variable (field) v.
examples:
  sqrtA
  meanAB !+A !/2 , !TARGET sqrtA !^0.5

qualifier: !UNIFORM
argument: v
action: replaces the variate with uniform random variables having range 0 : v.
examples:
  Udat !=0. !Uniform 4.5
  is equivalent to
  Udat !=Uniform 4.5

qualifier: !Vtarget=
argument: value
action: assigns value to data field target, overwriting previous contents; subsequent transformation qualifiers will operate on data field target.
examples:
  · · · !V3=2.5

qualifier: !Vtarget=
argument: Vfield
action: assigns the contents of data field field to data field target, overwriting previous contents; subsequent transformation qualifiers will operate on data field target. If field is 0 the number of the data record is inserted.
examples:
  · · · !V10=V3
  · · · !V11=block
  · · · !V12=V0
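The recoding qualifiers !SET, !SUB and !SEQ in Table 5.1 are easy to confuse, so a small Python sketch of their described behaviour may help. It is an illustration only, not ASReml code. The function names are hypothetical, missing values are represented by `None`, the matching tolerance of 1e-5 in `sub_values` is an assumption based on the remark about the sixth decimal place, and skipping missing values when detecting a change in `seq_values` is also an assumption.

```python
def set_values(data, vlist):
    """!SET sketch: data value i in 1..n becomes vlist[i-1]; others become 0."""
    n = len(vlist)
    return [vlist[int(x) - 1] if x == int(x) and 1 <= x <= n else 0 for x in data]


def sub_values(data, vlist, tol=1e-5):
    """!SUB sketch: a value matching vlist[i-1] becomes i; unmatched values become 0.

    tol is an assumed tolerance standing in for the 'small rounding error'.
    """
    out = []
    for x in data:
        index = 0
        for i, v in enumerate(vlist, start=1):
            if abs(x - v) < tol:
                index = i
                break
        out.append(index)
    return out


def seq_values(data):
    """!SEQ sketch: number runs of equal values 1, 2, 3, ...; missing stays missing."""
    out, level, previous = [], 0, object()  # sentinel never equals a data value
    for x in data:
        if x is None:
            out.append(None)  # missing values remain missing
            continue
        if x != previous:
            level += 1
            previous = x
        out.append(level)
    return out


print(set_values([1, 2, 3, 4], [1, -1, -1]))
print(sub_values([-0.5, 1.5, 2.5, 7.0], [-0.5, 1.5, 2.5]))
print(seq_values([3, 3, 5, 5, 3, None, 3]))
```

Note that `seq_values` counts runs, so a value that reappears later in the file starts a new level rather than reusing its earlier number.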

QTL marker transformations

!MM s associates marker positions in the vector s (based on the Haldane mapping function) with marker variables and replaces missing values in a vector of marker states with expected values calculated using distances to non-missing flanking markers. This transformation will normally be used on a !G n factor where the n variables are the marker states for n markers in a linkage group in map order and coded [-1,1] (backcross) or [-1,0,1] (F2 design). s (length n+1) should be the n marker positions relative to a left telomere position of zero, and an extra value being the length of the linkage group (the position of the right telomere). The length (right telomere) may be omitted, in which case the last marker is taken as the end of the linkage group. The positions may be given in Morgans or centiMorgans (if the length is greater than 10, it will be divided by 100 to convert to Morgans).

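The recombination rate between markers at $s_L$ and $s_R$ (L is left and R is right of some putative QTL at Q) is

$$\theta_{LR} = \left(1 - e^{-2(s_R - s_L)}\right)/2.$$

Consequently, for 3 markers (L, Q, R), $\theta_{LR} = \theta_{LQ} + \theta_{QR} - 2\theta_{LQ}\theta_{QR}$.

The expected value of a missing marker at Q (between L and R) depends on the marker states at L and R:

$$E(q \mid 1, 1) = \frac{1 - \theta_{LQ} - \theta_{QR}}{1 - \theta_{LR}}, \qquad E(q \mid 1, -1) = \frac{\theta_{QR} - \theta_{LQ}}{\theta_{LR}},$$

$$E(q \mid -1, 1) = \frac{\theta_{LQ} - \theta_{QR}}{\theta_{LR}}, \qquad E(q \mid -1, -1) = \frac{-1 + \theta_{LQ} + \theta_{QR}}{1 - \theta_{LR}}.$$

Let

$$\lambda_L = \frac{E(q \mid 1, 1) + E(q \mid 1, -1)}{2} = \frac{\theta_{QR}(1 - \theta_{QR})(1 - 2\theta_{LQ})}{\theta_{LR}(1 - \theta_{LR})}$$

and

$$\lambda_R = \frac{E(q \mid -1, 1) - E(q \mid -1, -1)}{2} = \frac{\theta_{LQ}(1 - \theta_{LQ})(1 - 2\theta_{QR})}{\theta_{LR}(1 - \theta_{LR})}.$$

Then $E(q \mid x_L, x_R) = \lambda_L x_L + \lambda_R x_R$. Where there is no marker on one side,

$$E(q \mid x_R) = (1 - \theta_{QR})x_R + \theta_{QR}(-x_R) = x_R(1 - 2\theta_{QR}).$$

This qualifier facilitates the QTL method discussed in Gilmour (2007).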
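As a numerical check on the imputation formulas, the following Python sketch computes the Haldane recombination rates and the expected state of a missing marker from its two flanking markers. It illustrates the algebra only and is not ASReml's code; the function names are hypothetical, and positions are assumed to be in Morgans with the missing marker strictly between the two flanking markers.

```python
import math


def recombination(s_left, s_right):
    """Haldane recombination rate between two map positions (in Morgans)."""
    return (1.0 - math.exp(-2.0 * (s_right - s_left))) / 2.0


def expected_marker(s_l, s_q, s_r, x_l, x_r):
    """Expected state at Q given flanking states x_l (at L) and x_r (at R)."""
    t_lq = recombination(s_l, s_q)
    t_qr = recombination(s_q, s_r)
    t_lr = recombination(s_l, s_r)
    denom = t_lr * (1.0 - t_lr)
    lam_l = t_qr * (1.0 - t_qr) * (1.0 - 2.0 * t_lq) / denom
    lam_r = t_lq * (1.0 - t_lq) * (1.0 - 2.0 * t_qr) / denom
    return lam_l * x_l + lam_r * x_r


def expected_marker_one_side(s_q, s_r, x_r):
    """Expected state at Q when only the marker at R is available."""
    return x_r * (1.0 - 2.0 * recombination(s_q, s_r))


# Hypothetical positions: L at 0.0, Q at 0.1 and R at 0.3 Morgans.
print(expected_marker(0.0, 0.1, 0.3, 1, 1))
print(expected_marker(0.0, 0.1, 0.3, 1, -1))
```

Because the result is linear in the flanking states, reversing the sign of both states reverses the sign of the expected value, and a missing marker closer to L is weighted more heavily towards the state at L.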

!DOM A is used to form dominance covariables from a set of additive marker covariables previously declared with the !MM marker map qualifier. It assumes the argument A is an existing group of marker variables relating to a linkage group defined using !MM which represents additive marker variation coded [-1, 0, 1] (representing marker states aa, aA and AA respectively). It is a group transformation which takes the [-1,1] interval values and calculates $2(|X| - 0.5)$, i.e. -1 and 1 become 1, and 0 becomes -1. The marker map is also copied and applied to this model term so it can be the argument in a qtl() term (page 106).
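The dominance recoding is a one-line calculation. A minimal Python sketch follows; the function name is hypothetical, and the treatment of intermediate (imputed) values simply follows from applying the same formula.

```python
def dominance(x):
    """Map an additive marker covariable in [-1, 1] to its dominance coding."""
    return 2.0 * (abs(x) - 0.5)


# Additive states aa, aA, AA coded -1, 0, 1, plus one imputed value of 0.5.
print([dominance(x) for x in (-1, 0, 1, 0.5)])
```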

!DO ... !ENDDO provides a mechanism to repeat transformations on a set of variables. All transformations except !DOM and !RESCALE operate once on a single field unless preceded by a !DO qualifier. The !DO qualifier has three arguments: [n [it [iv]]]. n is the number of times the following transformations are to be performed. it (default 1) is the increment applied to the target field. iv (default 0.0) is the increment applied to the transformation argument. The default for n is the number of variables in the current field definition. !ENDDO is formally equivalent to !DO 1 and is implicit when another !DO appears or the next field definition begins. Note that when several transformations are repeated, the processing order is that each is performed n times before the next is processed (contrary to the implication of the syntax). However, the target is reset for each transformation so that the transformations apply to the same set of variables.

Y1 Y2 Y3 Y4 Y5 # Repeat 5 times, incrementing just

Ymean !=0. !DO 5 0 1 !+Y1 !ENDDO !/5 # the argument

is equivalent to

Y1 Y2 Y3 Y4 Y5

Ymean !=0. !+Y1 !+Y2 !+Y3 !+Y4 !+Y5 !/5
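The effect of the three !DO arguments on a single repeated transformation can be sketched in Python. This is a hypothetical model of the behaviour described above, not ASReml's implementation: a record is a list of values, `target` and `arg` are zero-based positions in that list, and `do_loop` is an illustrative name.

```python
import operator


def do_loop(record, target, arg, n, it, iv, op):
    """Repeat one binary transformation n times on a record.

    On repetition k the target position is target + k*it and the argument
    position is arg + k*iv, mirroring the it and iv increments of !DO.
    """
    for k in range(n):
        t = target + k * it
        a = arg + k * iv
        record[t] = op(record[t], record[a])
    return record


# Positions 0..4 hold Y1..Y5 and position 5 holds Ymean.
record = [2.0, 4.0, 6.0, 8.0, 10.0, None]
record[5] = 0.0                                 # Ymean !=0.
do_loop(record, 5, 0, 5, 0, 1, operator.add)    # !DO 5 0 1 !+Y1 !ENDDO
record[5] /= 5                                  # !/5
ymean = record[5]

# Incrementing the target instead (it=1, iv=0): subtract position 0 from
# each of positions 1..5.
shifted = do_loop([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 1, 0, 5, 1, 0, operator.sub)
print(ymean, shifted)
```

The sketch handles one transformation per loop; with several transformations inside a loop, each would be run through all n repetitions before the next one starts, as noted above.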

Y0 Y1 Y2 Y3 Y4 Y5 !TARGET Y1 !do 5 1 0 !-Y0 !ENDDO # Take Y0 from rest

Markers !G 12 !do !D * !ENDDO # Delete records with missing marker values

The default arguments (12, 1, 0.) are used. The initial target is the first marker.

Other rules and examples

Other rules include the following:

• variables that are created should be listed after all variables that are read in unless the intention is to overwrite an input field,

• missing values are unaffected by arithmetic operations, that is, missing values in the current or target column remain missing after the transformation has been performed, except in assignment:

  – !+3 will leave missing values (NA, * and .) as missing,
  – !=3 will change missing values to 3,

• multiple arithmetic operations cannot be expressed in a complex expression but must be given as separate operations that are performed in sequence as they appear, for example, yield !-120 !*0.0333 would calculate 0.0333 * (yield - 120),

• most transformations only operate on a single field and will not therefore be performed on all variables in a !G factor set. The only transformations that apply to the whole set are !DOM, !MM and !RESCALE.
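The rules on sequential evaluation and missing values can be sketched in Python. This is an illustration under stated assumptions rather than ASReml's implementation: missing values are represented by `None`, `apply_ops` is a hypothetical name, and only assignment and the four arithmetic operators are modelled, including the division rule from Table 5.1 (0/0 gives 0, v/0 gives missing).

```python
MISSING = None


def apply_ops(value, ops):
    """Apply (operator, argument) pairs strictly in the order given."""
    for op, arg in ops:
        if op == "=":
            value = arg            # assignment overwrites, even if missing
        elif value is MISSING:
            continue               # arithmetic leaves missing values missing
        elif op == "+":
            value = value + arg
        elif op == "-":
            value = value - arg
        elif op == "*":
            value = value * arg
        elif op == "/":
            if arg == 0:
                value = 0.0 if value == 0 else MISSING
            else:
                value = value / arg
        else:
            raise ValueError(f"unsupported operator: {op}")
    return value


# yield !-120 !*0.0333 evaluates 0.0333 * (yield - 120), left to right.
print(apply_ops(150.0, [("-", 120), ("*", 0.0333)]))
print(apply_ops(MISSING, [("+", 3)]), apply_ops(MISSING, [("=", 3)]))
```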

ASReml code                action

yield !M0                  changes the zero entries in yield to missing values

yield !^0                  takes natural logarithms of the yield data

score !-5                  subtracts 5 from all values in score

score !SET -0.5 1.5 2.5    replaces data values of 1, 2 and 3 with -0.5, 1.5 and 2.5 respectively

score !SUB -0.5 1.5 2.5    replaces data values of -0.5, 1.5 and 2.5 with 1, 2 and 3 respectively; a data value of 1.51 would be
