STIMA method - Advancements of recursive partitioning methodology to detect moderators of

6.3.2 STIMA method

The STIMA method simultaneously estimates a linear regression model and searches for interaction effects by growing a tree. As mentioned before, the STIMA method is more general than the previously proposed RTA method because it searches for all types of interaction. Therefore, for the method to search specifically for treatment-covariate interactions, the user has to force the first split on the treatment variable. It may not be obvious how forcing the first split allows the tree to search for treatment-covariate interactions; this is explained under "Forced first split on treatment".

The first step in the algorithm is to fit a linear regression model using all covariates in the root node. Thereafter, all possible split points, say s, for all covariates are searched to identify the split that generates the largest effect size when the model fitted before split s is compared with the model obtained after adding split s as an indicator variable. The effect size is determined using

$$f_s^2 = \frac{\rho_s^2 - \rho_{s-1}^2}{1 - \rho_s^2} \tag{6.9}$$

where $\rho_{s-1}^2 = \dfrac{\sum_i (\hat{Y}_{i(s-1)} - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$ is the squared multiple correlation coefficient of the model before split s, i.e. s-1, and $\rho_s^2 = \dfrac{\sum_i (\hat{Y}_{is} - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$ is the squared multiple correlation coefficient of the model having split the node on s (133). This effect size essentially measures the increase in the variance-accounted-for (VAF) from the model without split s, i.e. s-1, to the model with split s. The split point s that induces the largest relative increase in VAF is chosen as the next best split.
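As a minimal sketch of equation (6.9), the effect size can be computed from the fitted values of the two models. The function names `r_squared` and `effect_size` are illustrative assumptions and are not taken from the authors' code; the numbers in the example are made up.

```python
import numpy as np


def r_squared(y, y_hat):
    """Squared multiple correlation: explained sum of squares over total sum of squares."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_bar = y.mean()
    return np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)


def effect_size(rho2_prev, rho2_new):
    """Effect size of equation (6.9): relative increase in variance accounted for."""
    return (rho2_new - rho2_prev) / (1.0 - rho2_new)


# Made-up example: the VAF rises from 0.30 (before split s) to 0.37 (after split s).
f2 = effect_size(0.30, 0.37)
print(round(f2, 4))
```

Because the denominator is the variance left unexplained after the split, the same absolute gain in VAF counts for more when the model already explains most of the variance.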

Growing a tree – parameter specifications

The aforementioned splitting criterion for STIMA is used to grow a full tree, which the authors also refer to as a regression trunk. Each iteration of the tree-growing process evaluates the effect size for every candidate split of every variable within every node, and chooses only the single best split at the end of the iteration. This differs from the IT approach, which searches each node separately and creates a split in each node where one is permissible, whereas STIMA searches all nodes and forms just one split. If a best split is found, it is added to the linear regression model as an indicator variable and the model is then re-estimated. This process is repeated until either a stopping criterion is met or the tree cannot find any more effective splits.
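One iteration of this search can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `fit_r2` and `best_single_split` are hypothetical, candidate thresholds are taken as midpoints between adjacent observed values, the indicator codes membership of the right-hand child node, and the data are simulated.

```python
import numpy as np


def fit_r2(design, y):
    """R^2 of an OLS fit of y on a design matrix that already contains an intercept."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ beta
    y_bar = y.mean()
    return np.sum((y_hat - y_bar) ** 2) / np.sum((y - y_bar) ** 2)


def best_single_split(X, y, design, node_of, min_node_size=5):
    """Scan every node, variable and threshold; return the single best split.

    Returns (f2, node, variable index, threshold), or None if no split is admissible.
    """
    r2_prev = fit_r2(design, y)
    best = None
    for node in np.unique(node_of):
        in_node = node_of == node
        for j in range(X.shape[1]):
            values = np.unique(X[in_node, j])
            # Assumed candidate thresholds: midpoints between adjacent observed values.
            for threshold in (values[:-1] + values[1:]) / 2.0:
                right = in_node & (X[:, j] > threshold)
                left = in_node & (X[:, j] <= threshold)
                if min(left.sum(), right.sum()) < min_node_size:
                    continue
                # Add the candidate split to the model as an indicator variable.
                candidate = np.column_stack([design, right.astype(float)])
                r2_new = fit_r2(candidate, y)
                if r2_new >= 1.0:
                    continue  # effect size undefined for a perfect fit
                f2 = (r2_new - r2_prev) / (1.0 - r2_new)
                if best is None or f2 > best[0]:
                    best = (f2, int(node), j, float(threshold))
    return best


# Simulated data: a covariate effect that is present only under Trt=1 and x > 0.
rng = np.random.default_rng(0)
n = 200
trt = np.arange(n) % 2
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 3.0 * ((trt == 1) & (x > 0)) + rng.normal(scale=0.5, size=n)

X = np.column_stack([trt, x]).astype(float)
design = np.column_stack([np.ones(n), trt, x])  # intercept plus main effects
node_of = trt.copy()                            # first split forced on treatment
f2, node, variable, threshold = best_single_split(X, y, design, node_of)
print(node, variable, round(threshold, 2), round(f2, 2))
```

To grow a full tree, the winning indicator would be appended to `design`, the node labels updated, and the search repeated until a stopping criterion is met.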

The code for the STIMA approach is readily available on the author’s webpage (Elise Dusseldorp - http://www.elisedusseldorp.nl/). The STIMA procedure requires a number of user-defined parameters to be specified to aid the tree-growing process. These are the minimum node size, the maximum number of splits, the column index of the variable in the dataset on which to force the first split of the tree (if required), and the value V for the V-fold cross-validation procedure. The authors recommend using either 5-fold or 10-fold cross-validation.

Forced first split on treatment

In order to detect treatment-covariate interactions, the first split must be forced on the treatment variable. To better understand this, consider an example of a single one-way interaction between a binary treatment variable, say Trt, and a binary baseline covariate, say Grp, with a Trt*Grp interaction effect. We can display the interaction by

          Trt=0  ¦  Trt=1
Grp=0       a    ¦    b
Grp=1       c    ¦    d

where a, b, c and d are the cell means. The interaction effect using the above is computed by (a-b)-(c-d) = a-b-c+d. After forcing the first split on treatment, represented by the bold dashed line, the method goes on to search all splits for every covariate within all nodes to identify the next best single split. In this example, it can only split on Grp within each treatment node. If the Trt=0 node is further split by Grp, then the means in the two newly formed child nodes will be a and c respectively. Hence the difference in means between the two child nodes is a-c. Similarly, if the Trt=1 node is further split by Grp, then the means in the two newly formed child nodes will be b and d respectively. Hence the difference in means between the two child nodes is b-d. Since the STIMA method makes just a single split per iteration, it selects whichever of the two splits has the larger difference in cell means, adds it to the linear regression model as an indicator variable, and re-estimates the model. If the next best split in this example is in the Trt=0 node, then one might assume that the same split is mirrored in the Trt=1 node in the next iteration. This is not required, however, since the model has been updated and re-estimated to incorporate the detected Trt*Grp interaction effect; thus only one split is made. Figure 6.2 illustrates the STIMA tree for the above example of a single Trt*Grp interaction where the best split is made using node Trt=0.
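The following small numerical check illustrates why one split is enough. It assumes that the re-estimated model contains an intercept, main effects for Trt and Grp, and one indicator for the child node (Trt=0, Grp>0). The cell means are hypothetical values chosen for illustration, and the sign of the indicator coefficient depends on which child node is coded as 1.

```python
import numpy as np

# Hypothetical cell means: a=(Trt=0,Grp=0), b=(Trt=1,Grp=0), c=(Trt=0,Grp=1), d=(Trt=1,Grp=1).
a, b, c, d = 10.0, 12.0, 11.0, 16.0

trt = np.array([0.0, 1.0, 0.0, 1.0])
grp = np.array([0.0, 0.0, 1.0, 1.0])
y = np.array([a, b, c, d])

# Indicator for the child node created by splitting the Trt=0 node on Grp.
split = ((trt == 0) & (grp > 0)).astype(float)

# Intercept, assumed main effects, and the single split indicator.
design = np.column_stack([np.ones(4), trt, grp, split])
beta = np.linalg.solve(design, y)

interaction = (a - b) - (c - d)
print(beta)         # intercept, Trt, Grp and split-indicator coefficients
print(interaction)  # the contrast a - b - c + d
```

Under these assumptions the four coefficients reproduce all four cell means exactly, and the indicator coefficient equals -(a-b-c+d), so a mirrored split in the Trt=1 node would add no further information.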

Figure 6.2 - Example of STIMA tree for one-way interaction [tree diagram not reproduced: the Trt=0 node is split on Grp into the child nodes Grp≤0 and Grp>0]

Selecting the best tree

Once a full tree has been grown, STIMA applies V-fold cross-validation to select the best tree size, i.e. the best choice of s, the number of splits in the tree. Although the CART and IT methods require a pruning procedure, i.e. weakest-link pruning, to obtain a sequence of subtrees, no such procedure is required by STIMA. This is because a STIMA tree is essentially a set of nested models that are defined by the number of splits s. These nested models can therefore be thought of as the sequence of subtrees obtained by the CART and IT pruning procedures. For example, a tree with four splits yields five candidate nested models to choose from: four models with interaction terms and the null model.

The best model size, determined by the number of splits s, is thus selected using V-fold cross-validation in the same way as in CART. To briefly explain, if a fully grown tree on the full dataset has three splits (s=3), then the cross-validation procedure specifies that the maximum number of splits formed when growing a tree on the training data is also three. This results in four models being fitted, including the null model. The test sample is then used to obtain the predictive error of each model. The process is repeated V times to complete the V-fold cross-validation. To obtain stable estimates of the averaged predictive error for each tree, the authors suggest repeating the cross-validation procedure several times and averaging the results, from which the best tree size is determined.

For CART and IT, the best tree size is determined using the 1-SE rule, as described in 6.2.5. However, the authors of STIMA suggest using a rule that depends on the sample size to improve the performance of STIMA. They suggest a 0.80-SE rule for sample sizes of less than 300 and a 0.50-SE rule otherwise.
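The sample-size-dependent rule can be sketched as below. The sketch assumes the usual CART-style form of the rule, in which the smallest model whose cross-validated error lies within c standard errors of the minimum is selected; the function names and the error values are hypothetical.

```python
import numpy as np


def se_multiplier(n):
    """Multiplier c of the c-SE rule: 0.80 for n < 300, 0.50 otherwise."""
    return 0.80 if n < 300 else 0.50


def select_n_splits(cv_error, cv_se, n):
    """Pick the number of splits from cross-validated errors.

    cv_error[k] and cv_se[k] refer to the nested model with k splits
    (k = 0 is the null model without interaction terms).
    """
    cv_error = np.asarray(cv_error, dtype=float)
    cv_se = np.asarray(cv_se, dtype=float)
    k_min = int(np.argmin(cv_error))
    threshold = cv_error[k_min] + se_multiplier(n) * cv_se[k_min]
    # Assumed CART-style choice: the smallest model within the threshold.
    return int(np.flatnonzero(cv_error <= threshold)[0])


# Made-up averaged CV errors for a fully grown tree with three splits.
cv_error = [1.00, 0.80, 0.75, 0.72]
cv_se = [0.05, 0.05, 0.05, 0.05]
print(select_n_splits(cv_error, cv_se, n=200))  # 0.80-SE rule applies
print(select_n_splits(cv_error, cv_se, n=500))  # 0.50-SE rule applies
```

With these made-up values the smaller sample leads to the two-split model and the larger sample to the three-split model, because the smaller multiplier tolerates less excess error over the minimum.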


Interpretation of the final tree

Once the STIMA procedure has selected a final tree from the cross-validation procedure, the final tree can be interpreted. This can be done by inspecting the model output to see the estimated effects for the interactions detected by the procedure. In addition, a plot of the STIMA tree (or regression trunk) can ease the understanding and interpretation of the subgroups identified.

6.3.3 SIDES method

In document Recursive partitioning based approaches for low back pain subgroup identification in individual patient data meta analyses (Page 119-124)