
The data file

In document ASREML user guide release 3.0 (Page 70-74)

The standard format of an ASReml data file is to have the data arranged in columns/fields with a single line for each sampling unit. The columns contain variates and covariates (numeric), factors (alphanumeric), traits (response variables) and weight variables in any order that is convenient to the user. The data file may be free format, fixed format or a binary file.

Free format data files

The data are read free format (space, comma or tab separated) unless the file name has extension .bin for real binary, or .dbl for double precision binary (see below). Important points to note are as follows:

• files prepared in Excel must be saved to comma or tab-delimited form,
• blank lines are ignored,
• column headings, field labels or comments may be present at the top of the file provided that the !skip qualifier (Table 5.2) is used to skip over them,
• NA, * and . are treated as coding for missing values in free format data files;
  – if missing values are coded with a unique data value (for example, 0 or -9), use !M to flag them as missing or !DV * to drop the data record containing them (see Table 5.1),
• comma delimited files whose file name ends in .csv, or for which the !CSV qualifier is set, recognise empty fields as missing values:
  – a line beginning with a comma implies a preceding missing value,
  – consecutive commas imply a missing value,
  – a line ending with a comma implies a trailing missing value,
  – if the filename does not end in .csv and the !CSV qualifier is not set, commas are treated as white space,
• characters following # on a line are ignored, so this character may not be used in alphanumeric fields,
• blank spaces, tabs and commas must not be embedded in alphanumeric fields unless the label is enclosed in quotes; for example, the name Willow Creek would need to appear in the data file as 'Willow Creek' to avoid an error,
• the $ symbol must not be used in the data file,
• alphanumeric fields have a default size of 16 characters; use the !LL qualifier to extend the size of factor labels stored,
• extra data fields on a line are ignored,
• if there are fewer data items on a line than ASReml expects, the remainder are taken from the following line(s), except in .csv files, where they are taken as missing; if you end up with half the number of records you expected, this is probably the reason,
• all lines beginning with ! followed by a blank are copied to the .asr file as
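To make the free-format rules concrete, here is a minimal Python sketch of how they split a line into fields and group fields into records. It is not ASReml's reader. The helper names (is_csv_mode, split_record, assemble_records) are hypothetical. Two behaviours are assumptions rather than documented rules: single quotes are taken as the quoting character, and surplus fields on a continuation line are dropped. The sketch leaves out !skip and the 16-character label limit.

```python
import csv
import re

# Codes read as missing in free-format data files.
MISSING_CODES = {"NA", "*", "."}

# A single-quoted label (assumed quoting character) or a run of
# characters containing no white space and no comma.
TOKEN = re.compile(r"'([^']*)'|([^\s,]+)")


def is_csv_mode(filename, csv_qualifier=False):
    """Empty fields are missing only for .csv files or when !CSV is set."""
    return filename.endswith(".csv") or csv_qualifier


def split_record(line, csv_mode):
    """Split one line into fields; None marks a missing value."""
    line = line.split("#", 1)[0].strip()  # text after '#' is ignored
    if csv_mode:
        # Every comma delimits a field, so leading, consecutive or
        # trailing commas produce empty (missing) fields.
        reader = csv.reader([line], quotechar="'", skipinitialspace=True)
        fields = [f.strip() for f in next(reader)]
    else:
        # Commas count as white space; quotes keep embedded blanks together.
        fields = [quoted or bare for quoted, bare in TOKEN.findall(line)]
    return [None if f == "" or f in MISSING_CODES else f for f in fields]


def assemble_records(lines, n_fields, csv_mode):
    """Group fields into records of n_fields values each."""
    records, pending = [], []
    for line in lines:
        fields = split_record(line, csv_mode)
        if not fields:
            continue  # blank or comment-only lines are ignored
        if csv_mode:
            # Short .csv lines are padded with missing values; extras dropped.
            records.append((fields + [None] * n_fields)[:n_fields])
            continue
        # Short lines continue on the next line(s); extras are dropped.
        pending += fields[: n_fields - len(pending)]
        if len(pending) == n_fields:
            records.append(pending)
            pending = []
    return records


# Four fields expected, two per line: four lines give only two records.
print(assemble_records(["1 2", "3 4", "5 6", "7 8"], 4, csv_mode=False))
# -> [['1', '2', '3', '4'], ['5', '6', '7', '8']]
```

The last call reproduces the "half the number of records" symptom. Each short line is silently completed from the next one, so no error is raised.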


Fixed format data files

The format must be supplied with the !FORMAT qualifier, which is described in Table 5.5. However, if all fields are present and are separated, the file can be read free format.
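Why does fixed format need an explicit layout when free format does not? The small Python sketch below shows the reason. The !FORMAT description itself (Table 5.5) is not modelled here; a hypothetical list of column widths stands in for it. When fields are separated by blanks, slicing by width and splitting on white space agree. When fields abut, only the layout can separate them.

```python
def read_fixed(line, widths):
    """Cut a fixed-format line into fields of the given widths.

    widths is a hypothetical column layout standing in for the
    !FORMAT description; blanks around each field are stripped.
    """
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Separated fields: free format would give the same answer.
print(read_fixed("  12  3.5 7", [4, 5, 2]))  # -> ['12', '3.5', '7']
print("  12  3.5 7".split())                 # -> ['12', '3.5', '7']

# Abutting fields: only the column layout can split them.
print(read_fixed("12345", [2, 3]))           # -> ['12', '345']
print("12345".split())                       # -> ['12345']
```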

Preparing data files in Excel

Many users find it convenient to prepare their data in Excel or Access. However, the data must be exported from these programs into either .csv (comma separated values) or .txt (tab separated values) form for ASReml to read it.
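If you export the data yourself, one consequence of the free-format rules is worth handling. In a tab separated .txt file, tabs are just white space, so an empty cell disappears and the following fields shift left. At the end of a line, the missing fields are pulled from the next line instead. The minimal Python sketch below uses the standard csv module and writes empty cells as NA, one of the missing-value codes ASReml recognises. The helper name export_rows and the sample rows are hypothetical.

```python
import csv
import io


def export_rows(rows, delimiter=","):
    """Write spreadsheet rows as delimited text, coding empty cells as NA.

    Free format treats runs of tabs as white space, so an empty cell in
    a .txt export would vanish; NA keeps every column in place. (In a
    .csv file empty fields are already read as missing.)
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(["NA" if cell is None or cell == "" else cell for cell in row])
    return buf.getvalue()


rows = [["variety", "yield"], ["V1", None], ["V2", 29.25]]
print(export_rows(rows, delimiter="\t"))
```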

ASReml can convert an .xls file to a .csv file. When ASReml is invoked with an .xls file as the filename argument and there is no .csv or .as file with the same basename, it exports the first sheet as a .csv file and then generates a template .as command file from any column headings it finds (see page 196). It will also convert a Genstat .gsh spreadsheet file to .csv format. The data extracted from the .xls file are labels, numerical values and the results from formulae. Empty rows at the start and end of a block are trimmed, but empty rows in the middle of a block are kept. Empty columns are ignored. If the first non-empty row in the block is a single row of labels, it is taken as the column names; empty cells in this row are assigned the default names C1, C2, etc. Missing values are commonly represented in ASReml data files by NA, * or . (a period). ASReml will also recognise empty fields as missing values in .csv (.xls) files.
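ASReml performs this conversion itself. The Python sketch below only mirrors the rules just listed on an in-memory grid of cells, so that their combined effect is easy to see. Reading an actual .xls file is out of scope. Three details are assumptions of the sketch:

- the whole grid is treated as one block,
- a header row is recognised when every non-empty cell in it is text,
- the default names C1, C2, ... are numbered by position among the retained columns.

The function name sheet_to_table is hypothetical.

```python
def sheet_to_table(cells):
    """Apply the .xls-to-.csv rules to a grid of cells.

    cells is a list of rows; None or "" marks an empty cell.
    Returns (header or None, data rows).
    """

    def is_empty(cell):
        return cell is None or cell == ""

    rows = [list(r) for r in cells]
    # Trim empty rows at the start and end of the block ...
    while rows and all(is_empty(c) for c in rows[0]):
        rows.pop(0)
    while rows and all(is_empty(c) for c in rows[-1]):
        rows.pop()
    # ... but keep empty rows in the middle. Pad ragged rows to equal width.
    width = max((len(r) for r in rows), default=0)
    rows = [r + [None] * (width - len(r)) for r in rows]
    # Drop columns that are empty in every row.
    keep = [j for j in range(width) if not all(is_empty(r[j]) for r in rows)]
    rows = [[r[j] for j in keep] for r in rows]
    # A first row holding only labels (text) supplies the column names;
    # empty cells in it get default names C1, C2, ... by position.
    header = None
    if rows and all(is_empty(c) or isinstance(c, str) for c in rows[0]):
        header = [f"C{k}" if is_empty(c) else c for k, c in enumerate(rows[0], start=1)]
        rows = rows[1:]
    return header, rows
```

For example, a leading blank row and an all-empty second column are removed. A blank row between two data rows survives as a row of missing values.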

Binary format data files

Conventions for binary files are as follows:

• binary files are read as unformatted Fortran binary in single precision if the filename has a .bin or .BIN extension,
• Fortran binary data files are read in double precision if the filename has a .dbl or .DBL extension,
• ASReml recognises the value -1e37 as a missing value in binary files,
• Fortran binary in the above means all real (.bin) or all double precision (.dbl) variables; mixed types, that is, integer and alphabetic binary representations of variables, are not allowed in binary files,
• binary files can only be used in conjunction with a pedigree file if the pedigree fields are coded in the binary file so that they correspond with the pedigree file (this can be done using the !SAVE option in ASReml to form the binary file, see Table 5.5), or the identifiers are whole numbers less than 9,999,999 and the
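The Python sketch below, using the standard struct module, shows what the binary conventions imply byte for byte. It chooses the precision from the extension, stores every value as a real, and codes missing values as -1e37. The record layout is an assumption, not something this guide specifies: one record per sampling unit, with a 4-byte little-endian length marker before and after each record, as gfortran writes by default. Record markers are compiler dependent, so check the layout against a binary file written by ASReml's !SAVE option before relying on it. All function names are hypothetical.

```python
import struct

MISSING = -1e37  # value ASReml reads as missing in binary data files


def is_double(filename):
    """.dbl/.DBL files hold double precision; .bin/.BIN single precision."""
    if filename.endswith((".dbl", ".DBL")):
        return True
    if filename.endswith((".bin", ".BIN")):
        return False
    raise ValueError(f"{filename}: expected a .bin or .dbl extension")


def pack_records(records, double):
    """Encode rows of numbers as Fortran unformatted sequential records.

    Assumed layout: a 4-byte little-endian length marker before and
    after each record's payload, one record per sampling unit.
    """
    code = "d" if double else "f"
    out = bytearray()
    for rec in records:
        # All values are stored as reals; None becomes the missing code.
        values = [MISSING if v is None else float(v) for v in rec]
        payload = struct.pack(f"<{len(values)}{code}", *values)
        marker = struct.pack("<i", len(payload))
        out += marker + payload + marker
    return bytes(out)


def unpack_records(data, double):
    """Decode bytes produced by pack_records back into lists of floats."""
    code, size = ("d", 8) if double else ("f", 4)
    records, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from("<i", data, pos)
        records.append(list(struct.unpack_from(f"<{n // size}{code}", data, pos + 4)))
        pos += 4 + n + 4  # leading marker + payload + trailing marker
    return records
```

To create the file, write the bytes to disk, for example `fh.write(pack_records(rows, is_double("trial.bin")))` on a file opened with mode "wb". In single precision, -1e37 is stored as the nearest float32 value rather than exactly.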

5 Command file: Reading the data

Introduction

Important rules

Title line

Specifying and reading the data

Data field definition syntax

Transforming the data

Transformation syntax

Other rules and examples

Special note on covariates

Other examples

Datafile line

Datafile line syntax

Datafile qualifiers

Job control qualifiers


5.1 Introduction

The ASReml command file nin89a.as for a spatial analysis of the Nebraska Intrastate Nursery (NIN) field experiment introduced in Chapter 3 is shown below. The lines from the title line down to the datafile line (nin89aug.asd !skip 1) relate to reading in the data. In this chapter we use this example to discuss reading in the data in detail.

    NIN Alliance Trial 1989
     variety !A # Alphanumeric
     id // pid // raw
     repl 4
     nloc
     yield
     lat
     long
     row 22
     column 11
    nin89aug.asd !skip 1
    yield ~ mu variety
    1 2
    11 column AR1 .424
    22 row AR1 .904

Notice the in-line comment introduced by the character # and the joining of lines indicated by //.
