
Preparation of data files and data entry

An efficient system of data capturing and storing is essential in any breeding programme, and

DATA ENTRY

The sheet “fldbook”, prepared according to the steps listed above, can be printed for manual recording of the data in the field or in the laboratory, or can be saved on a palmtop for electronic recording.

In the first case the data have to be entered manually in the sheet “fldbook”; in the second they are transferred by connecting the palmtop to a desktop or laptop computer. In either case it is strongly recommended that the data be entered or transferred IMMEDIATELY AFTER BEING COLLECTED FROM THE FIELD. Once the data are in the computer, calculate the minimum and maximum of each variable, or rank the values, to verify that there are no obvious mistakes. Even better, analyse the data, because an analysis can easily reveal mistakes made either in recording the data in the field or in entering them manually. A mistake discovered while the crop is still in the field can easily be fixed by going back to the field and re-measuring the plot(s) whose values look suspicious or are obviously wrong.
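The minimum/maximum and ranking checks suggested above can be sketched in Python as follows. This is a minimal illustration, assuming the “fldbook” sheet has been exported as a list of rows (one dictionary per plot) with missing plots recorded as None; the function names and the "Plot" key are hypothetical.

```python
# Minimal sketch of min/max screening and ranking of one fldbook column.
# Assumption: rows are dicts exported from the "fldbook" sheet; missing
# values are None. Function names and the "Plot" key are hypothetical.

def column_range(rows, column):
    """Return (min, max) of a numeric column, ignoring missing values."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return min(values), max(values)


def rank_plots(rows, column, plot_key="Plot"):
    """Return plot numbers ordered from the lowest to the highest value."""
    present = [r for r in rows if r.get(column) is not None]
    return [r[plot_key] for r in sorted(present, key=lambda r: r[column])]


fldbook = [
    {"Plot": 1, "PlHt_T_cm": 78.0},
    {"Plot": 2, "PlHt_T_cm": 81.5},
    {"Plot": 3, "PlHt_T_cm": 8.2},   # probably a typing error for 82
    {"Plot": 4, "PlHt_T_cm": None},  # missing plot
]

print(column_range(fldbook, "PlHt_T_cm"))  # (8.2, 81.5)
print(rank_plots(fldbook, "PlHt_T_cm"))    # [3, 1, 2]
```

A suspiciously low minimum (here plot 3) points directly at the plot to re-measure while the crop is still standing.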

In the “fldbook” sheet it is important to use the same abbreviations and units as in the sheet “traits” (Figure 10). Additional information, such as missing plots or plot damage (by farm animals, ants, etc.), can be entered in a special column, “notes”. For yield data, it is recommended to include one column with the plot size and one with the area harvested, regardless of whether the two differ. The guiding principle in organizing this important sheet is that it should be as transparent as possible to those who are not familiar with the trial.
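Consistency between the “fldbook” headers and the “traits” abbreviations can be checked automatically. The sketch below assumes the abbreviations are available as lists of strings; the column names PlotSize_m2, AreaHarv_m2, Notes, GY_g and TKW_g are hypothetical placeholders for whatever is defined in your own “traits” sheet.

```python
# Minimal sketch: compare fldbook headers with the "traits" abbreviations.
# All column names below are illustrative assumptions, not a fixed standard.

def unknown_headers(fldbook_headers, trait_abbreviations, design_columns):
    """Return headers that are neither design columns nor known traits."""
    allowed = set(trait_abbreviations) | set(design_columns)
    return [h for h in fldbook_headers if h not in allowed]


def missing_headers(fldbook_headers, required):
    """Return required columns (e.g. plot size, area harvested) not present."""
    present = set(fldbook_headers)
    return [h for h in required if h not in present]


traits = ["PlHt_B_cm", "PlHt_T_cm", "SL_cm", "GY_g", "TKW_g"]
design = ["Plot", "Entry", "ID", "PlotSize_m2", "AreaHarv_m2", "Notes"]
headers = ["Plot", "Entry", "ID", "PlHt_B_cm", "PlHt_T", "GY_g",
           "PlotSize_m2", "Notes"]

print(unknown_headers(headers, traits, design))  # ['PlHt_T']
print(missing_headers(headers, ["PlotSize_m2", "AreaHarv_m2", "Notes"]))
# ['AreaHarv_m2']
```

Here the check catches an abbreviation that drifted from the “traits” sheet (PlHt_T instead of PlHt_T_cm) and a missing area-harvested column.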

Examples of the “fldbook” sheet in a data file of an unreplicated trial (Stage 1) and of a replicated trial (Stage 3) are given in Figures 29 and 30, respectively.

A number of features shown in the two Figures are worth mentioning. Firstly, the entry number (column G in Figure 29 and column I in Figure 30) does not necessarily correspond to the unique identification number of the genotype (column H in Figure 29 and column J in Figure 30), for the reasons mentioned before. Secondly, because spike length is one of the characters that are important to farmers in barley (and presumably in other cereals as well), we take two measurements of plant height: one from ground level to the bottom of the spike (PlHt_B_cm), and one from ground level to the top of the spike, excluding the awns (PlHt_T_cm). Spike length (SL_cm) is then derived with a formula (column P - column O). This method offers no advantage when data are recorded manually. However, when data are captured electronically, spike length is calculated automatically, plot by plot, as the other two measurements are recorded, which provides an additional check on the data. Thirdly, grain yield is recorded as g/plot on one sample of 1.6 m² in the Stage 1 trial (column R in Figure 29), and on two samples of 1.6 m² each in the Stage 3 trial, because its plots are larger (columns T and U in Figure 30). Grain yield in kg/ha (column U in Figure 29 and column X in Figure 30) is obtained in Stage 1 trials by dividing column R by the plot size (column S) and multiplying by 10, and in Stage 3 trials by dividing the mean of the two samples by the plot size and multiplying by 10 (since 1 g/m² corresponds to 10 kg/ha).
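The two derived columns described above, spike length and grain yield in kg/ha, can be sketched as follows. The function names are hypothetical; heights are in cm, grain weights in g/plot and plot size in m², as in the text.

```python
# Minimal sketch of the derived fldbook columns. Function names are
# hypothetical; units follow the text (cm, g/plot, m^2).

def spike_length_cm(plht_t_cm, plht_b_cm):
    """SL_cm = PlHt_T_cm - PlHt_B_cm (column P - column O in Figure 29)."""
    return plht_t_cm - plht_b_cm


def yield_kg_ha(sample_weights_g, plot_size_m2):
    """Grain yield (kg/ha) from one or more samples weighed in g/plot.

    The mean sample weight (g) is divided by the plot size (m^2) and
    multiplied by 10, because 1 g/m^2 = 10 kg/ha.
    """
    mean_g = sum(sample_weights_g) / len(sample_weights_g)
    return mean_g / plot_size_m2 * 10


print(yield_kg_ha([480.0], 1.6))         # Stage 1, one sample: 3000.0
print(yield_kg_ha([480.0, 520.0], 1.6))  # Stage 3, two samples: 3125.0
print(spike_length_cm(85.0, 77.5))       # 7.5
```

Passing the samples as a list lets the same function serve both the one-sample (Stage 1) and two-sample (Stage 3) layouts. A negative spike length would immediately flag a recording error in one of the two heights.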

Columns V to AE (Figure 29) and Y to AH (Figure 30) contain the scores (from 0 = bad to 4 = very good) given by individual farmers (ten in this case, although the number obviously varies from village to village), and their average (FS) is in columns AF and AI, respectively. Finally, 1000-kernel weight in grams (TKW) is in column AG in Figure 29, whereas in the Stage 3 trial (Figure 30), since there are two samples for grain yield, 1000-kernel weight is measured independently on the two samples (columns AJ and AK) and their mean is in column AL.
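The farmer score average (FS) and the mean 1000-kernel weight can be computed per plot as sketched below. This assumes the scores of one plot are held in a list with one value per farmer; the function names are hypothetical. The range check uses the 0 to 4 scale given in the text.

```python
# Minimal sketch of the per-plot FS and TKW means. Function names are
# hypothetical; the 0-4 scale is the one described in the text.

def farmer_score(scores):
    """Average farmer score (FS); rejects scores outside the 0-4 scale."""
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError(f"score outside the 0-4 scale: {scores}")
    return sum(scores) / len(scores)


def mean_tkw(samples_g):
    """Mean 1000-kernel weight (g) of the samples measured on one plot."""
    return sum(samples_g) / len(samples_g)


scores = [3, 4, 2, 3, 3, 4, 3, 3, 2, 4]  # ten farmers scoring one plot
print(farmer_score(scores))    # 3.1
print(mean_tkw([41.5, 42.5]))  # 42.0
```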

FIGURE 31

An example of a “fldbook” sheet in a data file of a Stage 4 trial with 9 entries and 2 replications, one replication planted by one farmer and the second by another farmer.

Rows 204 and 205 of Figure 29, and rows 78 and 79 of Figure 30, contain the minimum and maximum of the values in each relevant column. As mentioned earlier, this is a powerful and easy way to detect a number of mistakes.
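The minimum and maximum rows become most useful when compared with the values a variable can plausibly take. The sketch below flags every value outside assumed bounds; the height limits are illustrative assumptions only, while the 0 to 4 limits for FS come from the farmer scoring scale described earlier.

```python
# Minimal sketch: flag values outside plausible bounds. The PlHt_T_cm
# limits are hypothetical; the FS limits follow the 0-4 farmer scale.

PLAUSIBLE = {
    "PlHt_T_cm": (20.0, 150.0),  # hypothetical limits for barley height
    "FS": (0.0, 4.0),            # farmer scores: 0 = bad to 4 = very good
}


def out_of_range(rows, bounds):
    """Return (plot, column, value) for every value outside its bounds."""
    flagged = []
    for row in rows:
        for column, (low, high) in bounds.items():
            value = row.get(column)
            if value is not None and not low <= value <= high:
                flagged.append((row["Plot"], column, value))
    return flagged


rows = [
    {"Plot": 1, "PlHt_T_cm": 78.0, "FS": 3.1},
    {"Plot": 2, "PlHt_T_cm": 8.2, "FS": 2.7},
    {"Plot": 3, "PlHt_T_cm": 80.5, "FS": 31.0},
]
print(out_of_range(rows, PLAUSIBLE))
# [(2, 'PlHt_T_cm', 8.2), (3, 'FS', 31.0)]
```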

An example of the “fldbook” sheet in a data file of a Stage 4 trial, with replications planted by different farmers, is shown in Figure 31. The structure of the file is considerably simpler than those considered so far. The sheet “fldbook” has one column for the village code, one for the plot number, one for the entry number, one for the unique identification number (= plot number in the seed increase), and one for the farmer (which in this case is equivalent to the replication code).

These columns are followed by the data columns, which are similar to those discussed earlier (see Figures 23 and 24), except that, because the plots are now much larger, grain yield and 1000-kernel weight are each measured on three samples (columns N, O and P for grain yield in g/plot, and columns AE, AF and AG for 1000-kernel weight in g).
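One plot record of the simpler Stage 4 layout can be sketched as a small data class. The field names are hypothetical stand-ins for the column headers in Figure 31, and the values in the usage example are invented for illustration; the three grain-yield and three 1000-kernel-weight samples are averaged per plot.

```python
# Minimal sketch of one Stage 4 plot record. Field names and example
# values are hypothetical; they stand in for the columns of Figure 31.
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Stage4Plot:
    village: str
    plot: int
    entry: int
    uid: int      # unique ID (= plot number in the seed increase)
    farmer: str   # equivalent to the replication code
    gy_samples_g: tuple[float, float, float]   # grain yield, g/plot
    tkw_samples_g: tuple[float, float, float]  # 1000-kernel weight, g

    def mean_gy_g(self) -> float:
        return mean(self.gy_samples_g)

    def mean_tkw_g(self) -> float:
        return mean(self.tkw_samples_g)


p = Stage4Plot("V1", 101, 5, 237, "FarmerA",
               (510.0, 495.0, 525.0), (40.0, 42.0, 44.0))
print(p.mean_gy_g(), p.mean_tkw_g())  # 510.0 42.0
```

Keeping the farmer as an explicit field makes it straightforward to treat it as the replication factor later in the analysis.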

The “fldbook” sheet shown in Figure 31 may also be used in Stage 2 and 3 trials when it is not possible to keep the trial as a physical unit.

Before leaving this topic, it is worth mentioning that it is very useful (for reasons that will become clear in the section on data analysis) to:

• Use the shortest possible abbreviation that is still meaningful as the header for each variable.

• Always use the same abbreviation for the same variable.

• Always use the same sequence of variables. The actual sequence matters for plot, row, column and entry, but not for the variables. However, once a sequence for the variables has been decided, always using it saves a lot of work during subsequent data management, analysis and reporting.
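Whether a data file respects an agreed sequence of variables can be checked with a few lines of code. The sketch assumes a canonical list of abbreviations has been fixed once; the list shown is illustrative only. Design columns such as plot, row, column and entry, and any header not in the canonical list, are ignored.

```python
# Minimal sketch: check that variables appear in an agreed order.
# The CANONICAL list is a hypothetical example, not a fixed standard.

CANONICAL = ["PlHt_B_cm", "PlHt_T_cm", "SL_cm", "GY_g", "FS", "TKW_g"]


def follows_canonical_order(headers, canonical=CANONICAL):
    """True if the variables present in headers appear in canonical order."""
    positions = [canonical.index(h) for h in headers if h in canonical]
    return positions == sorted(positions)


print(follows_canonical_order(["Plot", "Entry", "PlHt_B_cm", "SL_cm", "FS"]))
# True
print(follows_canonical_order(["Plot", "Entry", "FS", "PlHt_B_cm"]))
# False
```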

The operations described above apply, with only minor differences, to a range of statistical analyses performed with a set of modules that run in GenStat and are available on request from the author.