
5.2 Software Design

5.2.1 Software Architecture Overview

In the following, the presentation of the software architecture of longQDA is restricted to the components needed for the analysis of real, univariate biomarker data sets. The other components are described in Section 5.3, which demonstrates the extensibility to the evaluation of multivariate biomarker panels and to the analysis of simulated data.

Since 1997, the unified modelling language (UML) has been the standard approach for structured object-oriented software development (Born et al., 2004). One of the key diagrams of UML is the class diagram, which depicts the classes and their relationships. At the moment, R does not support the creation of class diagrams from code (backward engineering), but there are plans to incorporate such functionality in the future under the name Ruml (R Foundation for Statistical Computing, 2008). First steps towards backward engineering have been taken by the package classGraph (Maechler and Gentleman, 2008). A class diagram usually includes the classes with their attributes and methods as well as the relationships between the classes. For the sake of a condensed presentation, the class-specific attributes and methods are omitted in the class diagram for longQDA (Figure 5.4); examples are described in the next subsection. Two sorts of class relationships are visualized. Every arrow joins a derived class with its base class (e.g. LongData and XsecData inherit from Data). A line labelled with a description symbolises an undirected relationship (for example, the method toXsec converts an instance of class LongData to an instance of class XsecData with a cross-sectional data structure). Note that only the most important relations are included. Abstract classes are printed in italics.
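The two kinds of relationship described above can be sketched with R's S4 class system. This is a minimal illustration and not the longQDA source: the class names Data, LongData and XsecData and the method name toXsec come from the text, whereas the slot name values, the column names id, time and value, and the use of reshape are assumptions made for the example.

```r
library(methods)

# Abstract base class: including "VIRTUAL" prevents direct instantiation.
# The slot name 'values' is hypothetical.
setClass("Data", slots = c(values = "data.frame"), contains = "VIRTUAL")

# Derived classes (the arrows in the class diagram).
setClass("LongData", contains = "Data")
setClass("XsecData", contains = "Data")

# Labelled relationship: toXsec converts a LongData object to an XsecData object.
setGeneric("toXsec", function(object, ...) standardGeneric("toXsec"))

setMethod("toXsec", "LongData", function(object, ...) {
  long <- object@values
  # Assumed layout: one row per (id, time) pair with columns id, time, value.
  wide <- reshape(long, idvar = "id", timevar = "time", direction = "wide")
  rownames(wide) <- NULL
  new("XsecData", values = wide)
})

long <- new("LongData", values = data.frame(
  id    = rep(1:2, each = 3),
  time  = rep(1:3, times = 2),
  value = c(1.2, 1.5, 1.9, 0.8, 0.7, 1.1)
))

xsec <- toXsec(long)
is(xsec, "Data")   # TRUE: XsecData inherits from Data
xsec@values        # one row per subject, one column per time point
```

Because Data is virtual, a call such as new("Data") fails, which mirrors the role of the abstract classes printed in italics in the diagram.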

The software architecture reflects the steps of a typical statistical analysis. It is hence quite general and could easily be adopted for other software implementations. There are classes for data objects (top right of Figure 5.4), classes for the analysis setup (top middle), classes involved in the discriminant analyses (bottom right), classes containing the raw or summarized results of the analyses (left) and the Report class defining a standardized output (bottom left). We continue by presenting the class diagram, following that order as far as possible, and start with the initial step, the data import. In the upper right corner, all classes for structural data mapping are shown. They have already been partly described above to explain possible class relationships. The user operates only on objects of class LongData, which are created from the user-provided raw data sets and which are used to set up the analysis by AnalysisSetup. Objects of class XsecData are used only internally for tasks that require a cross-sectional data structure, e.g. for performing QDA or creating plots (in autoCorr, for example).

Figure 5.4: UML class diagram of the R package longQDA, omitting attributes and methods. Created with Enterprise Architect (SparksSystems, 2008).

Objects of class AnalysisSetup specify all subanalyses that should be performed, including information about the MCCV design and global precision parameters. Based on the paths attribute of AnalysisSetup, which describes the subanalyses, AnalysisPath objects are set up internally. These objects determine the instantiation of objects of class QdaAlgo (for QDA) or LongDaUnivAlgo (for the univariate longQDA) during the execution of the method analyze. Calling analyze determines the data structure required by the algorithms and then performs the statistical analyses as follows. The evaluation of the biomarker performance is split up into three steps: the estimation of the group-specific parameters for the quadratic discriminant rule with the training data sets, the prediction of the posterior probabilities by (long)QDA with the test sets and, finally, the evaluation of the performance measures. The first step is accomplished by the method fit, defined for the classes determining the algorithms for the discriminant analysis. In the case of longQDA, for example, objects of class RiAlgo, RisAlgo or Ricar1Algo define the estimation of the means and the covariance matrices by mixed models. For QDA, the functionality of the R package MASS (Venables et al., 2008) is used. For the univariate longQDA with an RI or RIS structure, the estimation is done by lme4 (Bates et al., 2008), whereas nlme (Pinheiro et al., 2008) is used for the RICAR1 structure. For the second step, the evaluation of the discriminant rule, predict2 is called. The method is defined for objects of class LongDaUnivModel and returns a list containing, among other elements, the estimated posterior probabilities. The classes RiModel, RisModel and Ricar1Model are derived from LongDaUnivModel, which is in turn, like QdaModel, derived from LongDaModel. The same structure underlies the classes describing the corresponding algorithms. The third step involves the calculation of the performance measures by the constructor PerformanceMeasures. It uses the list returned by predict2 as input and returns an object of class PerformanceMeasures. The values returned by those three methods are stored in an instance of class ResultTree, which is created within analyze.
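The three steps can be illustrated for the plain QDA case with a short sketch. It deliberately does not use the longQDA methods fit, predict2 or PerformanceMeasures, whose signatures are not given here; it calls qda from MASS directly instead. The simulated data, the variable names, the number of MCCV samples, the training fraction, the stratified split and the choice of performance measures are all assumptions made for the example.

```r
library(MASS)  # provides qda(); the text states that MASS is used for QDA

set.seed(1)

# Hypothetical biomarker measured at two visits in two groups.
n     <- 100
group <- factor(rep(c("control", "case"), each = n / 2))
shift <- ifelse(group == "case", 1, 0)
dat   <- data.frame(
  group  = group,
  visit1 = rnorm(n, mean = shift),
  visit2 = rnorm(n, mean = 1.5 * shift, sd = 1 + 0.5 * shift)
)

n_mccv     <- 20     # number of MCCV samples (assumed value)
train_frac <- 2 / 3  # fraction of each group used for training (assumed value)

results <- lapply(seq_len(n_mccv), function(b) {
  # Stratified random split, so both groups occur in training and test sets.
  train_idx <- unlist(lapply(split(seq_len(n), dat$group), function(i) {
    sample(i, size = floor(train_frac * length(i)))
  }))
  train <- dat[train_idx, ]
  test  <- dat[-train_idx, ]

  # Step 1: estimate the group-specific means and covariance matrices.
  model <- qda(group ~ visit1 + visit2, data = train)

  # Step 2: predict the posterior probabilities for the test set.
  pred <- predict(model, newdata = test)

  # Step 3: evaluate performance measures from the predicted classes.
  truth     <- as.character(test$group)
  predicted <- as.character(pred$class)
  c(sensitivity = mean(predicted[truth == "case"] == "case"),
    specificity = mean(predicted[truth == "control"] == "control"),
    accuracy    = mean(predicted == truth))
})

perf <- do.call(rbind, results)  # one row per MCCV sample
head(perf)
```

The matrix perf plays the role of the raw, not yet summarized results: it holds one set of performance measures per MCCV sample.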

These results are not yet summarized: they contain the results for each MCCV sample. The method mccvSummary accomplishes this task by modifying the object of class ResultTree so that it consists of objects of class MccvSummary. To restrict the results to those needed for a comparison, the method selectResultLeafs returns a smaller version, an object of class ResultList. It is recommended to create an instance of class Report at the beginning of a data analysis session. At the end of the analysis, the main parts of the report comprise the output generated by the method createReportFiles (for objects of class LongData, AnalysisSetup, MccvSummary or ResultList) or by the method save2Report for more individualized output.
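The summarizing step can be pictured as collapsing the per-sample performance measures into a few statistics per measure. The following sketch is only an illustration of that idea: the function summarize_mccv, the simulated raw results and the chosen statistics (mean, standard deviation and empirical 2.5% and 97.5% quantiles) are hypothetical and are not taken from mccvSummary.

```r
set.seed(2)

# Hypothetical raw results: one row of performance measures per MCCV sample.
raw <- data.frame(
  sample      = 1:50,
  sensitivity = runif(50, min = 0.70, max = 0.90),
  specificity = runif(50, min = 0.60, max = 0.85)
)

# Collapse the per-sample values into one summary row per measure.
summarize_mccv <- function(raw, measures = c("sensitivity", "specificity")) {
  stats <- sapply(raw[measures], function(x) {
    c(mean = mean(x), sd = sd(x), quantile(x, probs = c(0.025, 0.975)))
  })
  as.data.frame(t(stats))
}

summary_tab <- summarize_mccv(raw)
summary_tab  # rows: measures; columns: mean, sd, 2.5%, 97.5%
```

Restricting the measures argument to a subset gives a smaller table, loosely analogous to selecting only some results for a comparison.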
