3.4 Analysing Gene expression bead-summary data

3.4.1 Reading BeadStudio output into beadarray

The format of the “Sample Probe Profile” (SPP) file exported from BeadStudio is already similar to that required by an ExpressionSetIllumina object, as it has one row for each bead type. However, the columns in the SPP file are arranged so that the expression values, standard errors and numbers of beads for the same array sit in adjacent columns. The main challenge in reading BeadStudio output into beadarray is therefore recognising the correct columns for each array and assigning them to the correct part of an ExpressionSetIllumina object. A complication is that BeadStudio output has no standard format, and users can export as many columns as they like. Most column headings used by BeadStudio are the same between versions of the software (e.g. AVG Signal for the expression values), but the headings for the standard errors have been known to change. We therefore assume the column headings from the latest version of BeadStudio (version 3 at the time of writing), but give users the chance to define alternative headings. Probably the most important heading to specify is the one for the column containing an identifier for each bead type. By default, this is assumed to be the column which contains unique numeric codes for each bead type.
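The column-matching problem described above can be illustrated with a small base R sketch. It is not the beadarray implementation: the in-memory table, the `heading-array` naming pattern and the helper `extract_matrix` are all assumptions made for illustration, and real BeadStudio exports may name their columns differently.

```r
# Tiny in-memory stand-in for an SPP file (values are made up): one row per
# bead type, with the summary columns for each array next to each other.
spp <- data.frame(
  "ProbeID"            = c(10008, 10010, 10014),
  "AVG_Signal-array1"  = c(120.5, 98.1, 455.0),
  "BEAD_STDERR-array1" = c(4.2, 3.9, 12.1),
  "Avg_NBEADS-array1"  = c(31, 28, 35),
  "AVG_Signal-array2"  = c(130.2, 101.7, 430.8),
  "BEAD_STDERR-array2" = c(5.0, 4.1, 11.3),
  "Avg_NBEADS-array2"  = c(29, 33, 30),
  check.names = FALSE
)

# Assumed column headings, one per matrix we want to build.
headings <- c(exprs = "AVG_Signal", se.exprs = "BEAD_STDERR",
              nObservations = "Avg_NBEADS")

# Hypothetical helper: collect every column whose name contains `heading`
# into one matrix (rows = bead types, columns = arrays).
extract_matrix <- function(tab, heading, id_col = "ProbeID") {
  ids <- as.character(tab[[id_col]])
  if (anyDuplicated(ids)) stop("bead-type identifiers must be unique")
  cols <- grep(heading, names(tab), fixed = TRUE)
  if (length(cols) == 0) stop("no columns match heading '", heading, "'")
  m <- as.matrix(tab[, cols, drop = FALSE])
  # Strip the heading so that only the array name remains.
  colnames(m) <- sub(paste0(heading, "-"), "", colnames(m), fixed = TRUE)
  rownames(m) <- ids
  m
}

# One matrix per quantity, named after the entries of `headings`.
assay <- lapply(headings, extract_matrix, tab = spp)
```

Each element of `assay` has the bead-type identifiers as row names and the array names as column names, which is the shape the text describes for the matrices held in an ExpressionSetIllumina object.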

In BeadStudio, it is also possible to export annotation information. However, we recommend that this information is not exported if the file is to be read into beadarray, as some of the special characters used in the annotation fields cause problems in R. Also, the inclusion of the annotation is unnecessary as it can be retrieved later on from other Bioconductor packages, such as illuminaHumanv1.

The function readBeadSummaryData is used to read exported BeadStudio data into beadarray. The minimum requirement is a file name in the dataFile parameter, giving the SPP file to be read. The complicated nature of BeadStudio output means that the list of parameters to this function can be quite long, so full details are not presented here (see the beadarray documentation for more information). A key parameter is columns, which allows the user to specify the names of the columns in the SPP file containing the expression values, standard errors, numbers of beads and detection scores. The ProbeID parameter specifies the column containing the unique identifiers for each bead type. This is a crucial step, as the ExpressionSetIllumina class does not allow repeated row names. Other parameters, such as skip, sep and quote, describe the format of the file; their default values are set to read BeadStudio version 3 output. Many common errors encountered when running readBeadSummaryData can be solved by setting these parameters correctly and, wherever possible, beadarray will try to provide informative error messages. If problems using this function are reported to the Bioconductor mailing list, then the responses may be used to assist users with similar error messages.
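As a hedged sketch of such a call, the block below sets the parameters named above explicitly. The file name is hypothetical, the heading strings and the values of skip, sep and quote are illustrative assumptions, and the element names of the columns list have differed between beadarray releases, so they should be checked against ?readBeadSummaryData for the installed version.

```r
# Hypothetical file name; replace with the SPP file exported from BeadStudio.
spp_file <- "SampleProbeProfile.txt"

# Assumed column headings. The list element names follow recent beadarray
# releases and may not match older versions of the package.
spp_columns <- list(
  exprs         = "AVG_Signal",
  se.exprs      = "BEAD_STDERR",
  nObservations = "Avg_NBEADS",
  Detection     = "Detection Pval"
)

# Only attempt the import when beadarray is installed and the file exists.
if (requireNamespace("beadarray", quietly = TRUE) && file.exists(spp_file)) {
  BSData <- beadarray::readBeadSummaryData(
    dataFile = spp_file,
    ProbeID  = "ProbeID",    # column holding the unique bead-type identifiers
    columns  = spp_columns,
    skip     = 8,            # header lines before the column names (assumed)
    sep      = "\t",         # tab-delimited export (assumed)
    quote    = ""            # no quoting characters (assumed)
  )
  # Number of bead types and arrays that were read.
  print(dim(Biobase::exprs(BSData)))
}
```

If the import stops with an error about missing columns or duplicated row names, the first things to check are the heading strings in `spp_columns`, the ProbeID column and the skip value.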

Once the SPP file has been successfully read into memory by the readBeadSummaryData function and the contents have been verified, a valid ExpressionSetIllumina object is created. Essentially, this process involves matching the column names supplied by the user to columns in the SPP file and then creating a separate matrix for each of the expression values, standard errors, numbers of beads and detection values. The column names of these matrices are set to the names of the arrays being read (determined from the SPP file) and the row names to the ProbeID values. The matrices are then stored in the assayData slot of a newly created ExpressionSetIllumina object. Slots such as assayData are accessed using the “at” operator (@), with the $ operator then required to access the individual matrices. However, accessor functions such as exprs make this more convenient, as the user does not need to know the details of how the class is implemented. Hence, the following two lines of code produce the same result.

e = BSData@assayData$exprs
e = exprs(BSData)
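The equivalence of slot access and accessor functions can be tried without beadarray by using a toy S4 class. The sketch below is only an analogy: the class ToySummary and the accessor toyExprs are hypothetical names invented here, and the real ExpressionSetIllumina class is more elaborate.

```r
library(methods)

# Toy stand-in for a class that keeps its matrices in an assayData slot.
setClass("ToySummary", representation(assayData = "list"))

# Hypothetical accessor that hides how the matrix is stored.
setGeneric("toyExprs", function(object) standardGeneric("toyExprs"))
setMethod("toyExprs", "ToySummary", function(object) object@assayData$exprs)

# Made-up expression values: rows are bead types, columns are arrays.
m <- matrix(c(120.5, 98.1, 130.2, 101.7), nrow = 2,
            dimnames = list(c("10008", "10010"), c("array1", "array2")))
toy <- new("ToySummary", assayData = list(exprs = m))

direct       <- toy@assayData$exprs  # slot access with @, then $
via_accessor <- toyExprs(toy)        # accessor function
same_result  <- identical(direct, via_accessor)
```

Code written against the accessor keeps working if the internal storage of the class changes, which is the convenience the text refers to.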

If quality control information has also been exported from BeadStudio, the name of this file can be supplied as the qcFile parameter. As with the SPP file, the columns exported from BeadStudio are chosen by the user, so parameters can be set to describe the contents of this file. If imported, the quality control information is stored in a slot (QC) separate from the data imported from the SPP file, and is accessed using the QCInfo function. Additional information about the samples can be imported through the sampleSheet parameter. This is a text file, usually created in Excel, that allows users to specify which samples were hybridised to each array and any grouping of the samples. This information is stored in the phenoData slot of ExpressionSetIllumina, which is a standard feature of an ExpressionSet and can be accessed using the pData function.
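A minimal sketch of an import that also supplies the quality control file and sample sheet is given below. All three file names are hypothetical, the remaining parameters are left at their defaults, and whether QCInfo is available in this form should be checked against the documentation of the installed beadarray version.

```r
# Hypothetical file names; replace with the files exported for the experiment.
spp_file   <- "SampleProbeProfile.txt"
qc_file    <- "ControlProbeProfile.txt"
sheet_file <- "SampleSheet.csv"
inputs     <- c(spp_file, qc_file, sheet_file)

# Only attempt the import when beadarray is installed and all files exist.
if (requireNamespace("beadarray", quietly = TRUE) && all(file.exists(inputs))) {
  BSData <- beadarray::readBeadSummaryData(
    dataFile    = spp_file,
    qcFile      = qc_file,
    sampleSheet = sheet_file
  )
  qc      <- beadarray::QCInfo(BSData)  # quality control information
  samples <- Biobase::pData(BSData)     # sample annotation from phenoData
  str(qc)
  print(head(samples))
}
```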