This section describes in detail the simulation procedure used to generate IPD from multiple studies, to fit the meta-analysis models to the data, and to evaluate their performance. Initially, only a binary PF and a binary outcome (e.g. death) are considered. I want to emphasise that all the simulations performed in this thesis generate data at the patient level. Further, the initial Stata code for these simulations was written by Boliang Guo according to my specifications and design.
5.2.1 The Simulation Procedure
The procedure used in my simulation can be broken down into seven steps as follows:
Step 1: Choose the number of studies in the meta-analysis, either 5 or 10. Note that the number of studies was fixed within a simulation.
Step 2: Randomly sample the number of patients in each study from a uniform distribution $U(a, b)$, with $a = 30$ and $b = 100$ for the small sample size, and $a = 30$ and $b = 1000$ for the large sample size; $a$ and $b$ were fixed per simulation.
Step 3: For each patient in each study, randomly sample a PF value. For a binary PF a Bernoulli distribution is used, where $x_{ij} \sim \text{Bernoulli}(\text{prevalence})$ and the prevalence was 0.5 or 0.2. For a continuous PF a normal distribution is used, where $x_{ij} \sim N(\mu, \sigma^2)$ with $\mu = 4$ and $\sigma = 1.5$. Note that $i$ refers to the study and $j$ refers to the patient within the study.
Step 4: Sample the binary outcome $Y_{ij}$ (where $Y_{ij} = 1$ for dead and $Y_{ij} = 0$ for alive) for each patient according to a specified relationship between the outcome and the PF:
$$Y_{ij} \sim \text{Bernoulli}(p_{ij}), \qquad \log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \alpha_i + \beta x_{ij}, \qquad \alpha_i \sim N(\alpha, \sigma_\alpha^2),$$
where $\beta$ is the chosen PF effect (log odds ratio) and $\alpha_i$ could vary across the studies according to a chosen $N(\alpha, \sigma_\alpha^2)$ distribution.
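Steps 1 to 4 for a binary PF can be sketched as follows. This is only an illustrative Python sketch, not the Stata code used in the thesis; the function name `simulate_ipd`, its argument names and the seed are hypothetical, and drawing the study size as an integer with `randint` is an assumption about how the uniform distribution was discretised.

```python
import math
import random


def simulate_ipd(n_studies, n_min, n_max, prevalence, alpha, sd_alpha, beta, rng):
    """Generate one IPD meta-analysis dataset as a list of (study, x, y) rows."""
    rows = []
    for i in range(n_studies):  # Step 1: number of studies is fixed
        # Step 2: study size from a uniform distribution (integer draw: an assumption)
        n_i = rng.randint(n_min, n_max)
        # Step 4: study-specific baseline log odds, alpha_i ~ N(alpha, sd_alpha^2)
        alpha_i = rng.gauss(alpha, sd_alpha)
        for _ in range(n_i):
            # Step 3: binary PF, x_ij ~ Bernoulli(prevalence)
            x = 1 if rng.random() < prevalence else 0
            # Step 4: outcome, Y_ij ~ Bernoulli(p_ij) with logit(p_ij) = alpha_i + beta * x_ij
            p = 1.0 / (1.0 + math.exp(-(alpha_i + beta * x)))
            y = 1 if rng.random() < p else 0
            rows.append((i, x, y))
    return rows


# One dataset in the style of scenario 4 with 5 small studies (seed is arbitrary)
rng = random.Random(2024)
data = simulate_ipd(n_studies=5, n_min=30, n_max=100, prevalence=0.5,
                    alpha=-1.27, sd_alpha=0.25, beta=0.90, rng=rng)
```

Each row keeps its study label so that the same dataset can be analysed with or without accounting for clustering.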
Step 5: Fit each of the three meta-analysis models to the generated data as follows:
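Model 2 (Two-step method): recall from chapter 4 that, in the first step, $\hat{\beta}_i$ is estimated for each IPD study separately, and in the second step the pooled effect size is estimated by a fixed-effect meta-analysis of the $\hat{\beta}_i$ obtained:
First step: $$\text{logit}(p_{ij}) = \alpha_i + \beta_i x_{ij}$$
Second step: $$\hat{\beta}_i = \beta + e_i, \qquad e_i \sim N\left(0, \text{var}(\hat{\beta}_i)\right), \qquad \hat{\beta} = \frac{\sum_{i=1}^{k} w_i \hat{\beta}_i}{\sum_{i=1}^{k} w_i},$$
where $k$ is the number of studies and $w_i = 1/\text{var}(\hat{\beta}_i)$ is the relative weight for study $i$.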
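The two-step calculation can be illustrated with a short Python sketch. It relies on the fact that, for a single binary PF, the logistic regression estimate of the log odds ratio in one study equals the log odds ratio of that study's 2×2 table, with variance equal to the sum of the reciprocal cell counts (assuming no empty cells). The function names and the example counts are hypothetical, and the pooled standard error $\sqrt{1/\sum w_i}$ is the usual fixed-effect result rather than something stated in the text above.

```python
import math


def log_odds_ratio(a, b, c, d):
    """First step for one study with a binary PF.

    a: x=1, y=1   b: x=1, y=0   c: x=0, y=1   d: x=0, y=0  (all cells assumed non-zero)
    Returns the estimated log odds ratio and its variance.
    """
    estimate = math.log((a * d) / (b * c))
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d
    return estimate, variance


def pool_fixed_effect(estimates, variances):
    """Second step: inverse-variance weighted fixed-effect pooled estimate and its SE."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical 2x2 tables for three studies, for illustration only
tables = [(20, 30, 10, 40), (15, 35, 12, 38), (25, 25, 14, 36)]
first_step = [log_odds_ratio(*t) for t in tables]
pooled, pooled_se = pool_fixed_effect([e for e, _ in first_step],
                                      [v for _, v in first_step])
```

Studies with smaller variance (typically the larger studies) receive more weight in the pooled estimate.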
Model 6 (One-step method ignoring clustering): in this model the pooled value $\hat{\beta}$ is estimated by pooling all of the IPD together without considering the clustering of patients within studies, so that a single intercept is shared by all studies, and the model is given as
$$\text{logit}(p_{ij}) = \alpha + \beta x_{ij}$$
Model 8 (One-step method accounting for clustering): in this model the pooled value $\hat{\beta}$ is estimated in one step while accounting for the clustering of patients within studies by including study indicators, which allow a separate intercept term for each study, and the model is given as
$$\text{logit}(p_{ij}) = \alpha_i + \beta x_{ij}$$
Step 6: Repeat steps 1 to 5 one thousand times, keeping the chosen range of sample sizes, number of studies and parameter values the same in each repetition. This resulted in 1000 estimates $\hat{\beta}$ and 1000 standard errors of $\hat{\beta}$ for each model.
Step 7: Repeat steps 1 to 6 for a different set of simulation criteria; that is, choose again the number of studies, the sample size distribution, the $x_{ij}$ distribution and values, and the true $\alpha$, $\beta$ and $\sigma_\alpha^2$ values in steps 1 to 4, then fit the models to the generated data, and repeat 1000 times.
For a binary PF, Table 5.1 shows the different scenarios I chose according to different values of $\alpha$, $\beta$, $\sigma_\alpha$ and the prevalence of $x_{ij} = 1$. Here the true $\alpha$ is the average log odds of the event (e.g. death) when the PF is zero; I chose $\alpha = -1.27$ according to the TBI data. The true $\beta$ is the true pooled effect size of the PF; I chose $\beta = 0.90$, again according to the TBI data, and then considered other situations by assuming $\beta = 0$ or $0.10$, relating to no prognostic effect and a small effect respectively. I also allowed different standard deviations (SD) for $\alpha$: SD = 0, such that the baseline risk is the same across studies, and SD = 0.25 or 1.5, which assume a small and a larger between-study variation in baseline risk respectively. Note that when the SD of $\alpha$ is 1.5, the 95% range for the baseline $\log\left(\frac{p}{1-p}\right)$ across studies is approximately $-1.27 \pm 2 \times 1.5$, that is from $-4.27$ to $1.73$, and rearranging gives a 95% range for the baseline probability from 0.014 to 0.849. So there is substantial variation in the baseline risk across studies. Clinically, the standard deviation for $\alpha$ reflects variation in baseline risk across studies that might come from other factors such as the treatments being used in the study, its location, measurement techniques, etc.
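The conversion from the range of baseline log odds to the range of baseline probabilities can be checked with a few lines of Python; the helper name `expit` is just a label for the inverse logit function.

```python
import math


def expit(z):
    """Inverse logit: convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


alpha, sd_alpha = -1.27, 1.5
lower = expit(alpha - 2 * sd_alpha)  # log odds of -4.27
upper = expit(alpha + 2 * sd_alpha)  # log odds of  1.73
print(round(lower, 3), round(upper, 3))  # 0.014 0.849
```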
Table 5.1: The possible values considered for $\alpha$, standard deviation of $\alpha$, and prevalence of the binary PF in the simulation scenarios.

Scenario no.   True α   Standard deviation for α   True β   Prevalence of the PF
 1             -1.27    0                          0.90     0.5
 2             -1.27    0                          0.10     0.5
 3             -1.27    0                          0.00     0.5
 4             -1.27    0.25                       0.90     0.5
 5             -1.27    0.25                       0.10     0.5
 6             -1.27    0.25                       0.00     0.5
 7             -1.27    0                          0.90     0.2
 8             -1.27    0                          0.10     0.2
 9             -1.27    0                          0.00     0.2
10             -1.27    0.25                       0.90     0.2
11             -1.27    0.25                       0.10     0.2
12             -1.27    0.25                       0.00     0.2
13             -1.27    1.5                        0.90     0.2
14             -1.27    1.5                        0.10     0.2
15             -1.27    1.5                        0.00     0.2
16             -1.27    1.5                        0.90     0.5
17             -1.27    1.5                        0.10     0.5
18             -1.27    1.5                        0.00     0.5

Each scenario was considered for 5 or 10 studies, with 30 to 100 (small sample size) or 30 to 1000 (large sample size) patients per study.
5.2.2 Evaluating the Performance of Statistical Models
For each simulated scenario, 1000 estimates $\hat{\beta}$ and their standard errors were available for each model. The three models were assessed using several criteria, namely bias, mean square error (MSE) and coverage, for the parameter estimates and their standard errors in all scenarios.
Assessment of Bias
Bias is the deviation of the average of the pooled effect size estimates from the true value, $\bar{\hat{\beta}} - \beta$; the smaller the bias, the better the estimation method, and an unbiased method would have zero bias [132-134]. Percentage bias is another statistic to summarise bias; it is given as
$$\left(\frac{\bar{\hat{\beta}} - \beta}{\beta}\right) \times 100.$$
A larger percentage bias indicates worse estimation, and a percentage bias greater than 10% indicates that the bias is meaningful [134]. Note that, with a 95% confidence interval and 1000 simulations, the accepted difference between the true bias and the estimated bias is small, as given by the following equation:
$$\delta = Z_{1-\alpha/2} \sqrt{\frac{\sigma^2}{B}},$$
where $B$ is the number of simulations, $Z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution (here $\alpha$ denotes the significance level rather than the baseline log odds), $\sigma^2$ is the variance of the pooled effect size of the PF, and $\delta$ is the specified level of accuracy of the pooled effect size of the PF (i.e. the accepted difference from the true value $\beta$).
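The bias summaries can be computed from the simulated estimates as in the following Python sketch. The function names and the toy numbers are hypothetical, and the accuracy calculation assumes the relation $\delta = Z_{1-\alpha/2}\,\sigma/\sqrt{B}$ with $Z = 1.96$ for a 95% confidence level. Percentage bias is undefined when the true effect is zero, so the sketch returns `None` in that case.

```python
import math
import statistics


def bias_summary(estimates, true_beta):
    """Return (bias, percentage bias) of the estimates around the true value."""
    bias = statistics.fmean(estimates) - true_beta
    # Percentage bias divides by the true value, so it is undefined when true_beta == 0
    pct_bias = 100.0 * bias / true_beta if true_beta != 0 else None
    return bias, pct_bias


def mc_accuracy(sd_estimates, n_sims, z=1.96):
    """Accuracy delta achieved with n_sims simulations (assumed relation, see lead-in)."""
    return z * sd_estimates / math.sqrt(n_sims)


# Toy estimates for illustration only
bias, pct_bias = bias_summary([0.8, 1.0, 1.2], true_beta=0.90)
delta = mc_accuracy(sd_estimates=0.3, n_sims=1000)
```

With these toy values the percentage bias is about 11%, which would count as meaningful under the 10% rule described above.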
Assessment of Accuracy
Mean square error (MSE) [135] is used to assess the overall accuracy of the statistical model and it is given as
$$\text{MSE} = \left(\bar{\hat{\beta}} - \beta\right)^2 + \left(SE(\hat{\beta})\right)^2,$$
where $(\bar{\hat{\beta}} - \beta)^2$ is the squared bias, that is the squared difference between the average pooled estimate $\bar{\hat{\beta}}$ and the true pooled effect size $\beta$ over the 1000 estimates, and $(SE(\hat{\beta}))^2$ is the square of the mean standard error of $\hat{\beta}$ in the one-step and two-step meta-analyses. The MSE is a useful measure of estimation accuracy as it incorporates both the bias and the variability.
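A minimal Python sketch of this calculation follows. It uses the definition given in the text (squared bias plus the square of the mean standard error); the function name and the toy inputs are hypothetical.

```python
import statistics


def mean_square_error(estimates, standard_errors, true_beta):
    """MSE as squared bias plus the squared mean standard error of the estimates."""
    bias = statistics.fmean(estimates) - true_beta
    mean_se = statistics.fmean(standard_errors)
    return bias ** 2 + mean_se ** 2


# Toy values: bias of 0.1 and mean SE of 0.2 give 0.01 + 0.04
mse = mean_square_error([1.0, 1.0], [0.2, 0.2], true_beta=0.90)
```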
Assessment of Coverage
Burton et al. [132] state that ‘The coverage of a confidence interval is the proportion of times that the obtained confidence interval contains the true specified parameter value’. The nominal coverage rate in this chapter is 95%, and a good estimation method will give an observed coverage in the simulation equal to the nominal coverage rate. If the observed coverage rate is above the nominal coverage rate, then the estimation method is too conservative; this means that too few samples will find significant results when the true effect is non-zero (i.e. the factor is prognostic), which leads to a loss of statistical power. If the observed coverage rate is below the nominal coverage rate, the method is over-confident and too many samples will yield significant results when the true effect is zero (i.e. the factor is not prognostic); this leads to an increased type I error and is anti-conservative. To check whether the coverage is suitable in the 1000 simulations in each scenario, I expected that the observed coverage should lie within $p \pm 1.96\,\text{s.e.}(p)$, where $\text{s.e.}(p) = \sqrt{p(1 - p)/1000}$ and $p = 0.95$, assuming that the true coverage is 95%. The $\text{s.e.}(p)$ in my simulations equals 0.006892, so the observed coverage should lie between 0.936 and 0.964. If the estimated coverage lies outside this interval, this indicates a problem.
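The coverage check can be sketched in Python as follows. The function names are hypothetical, and the confidence intervals are assumed to be Wald intervals of the form estimate ± 1.96 × standard error, which the text above does not state explicitly.

```python
import math


def observed_coverage(estimates, standard_errors, true_beta, z=1.96):
    """Proportion of Wald intervals (an assumption) that contain the true value."""
    hits = sum(1 for b, se in zip(estimates, standard_errors)
               if b - z * se <= true_beta <= b + z * se)
    return hits / len(estimates)


def acceptable_coverage_range(p=0.95, n_sims=1000, z=1.96):
    """Range p +/- z * s.e.(p) within which the observed coverage should lie."""
    se_p = math.sqrt(p * (1.0 - p) / n_sims)
    return p - z * se_p, p + z * se_p, se_p


low, high, se_p = acceptable_coverage_range()
print(round(se_p, 6))  # 0.006892

# Toy example: the first interval contains 0.90, the second does not
cov = observed_coverage([0.95, 1.50], [0.10, 0.10], true_beta=0.90)
```

An observed coverage outside the range from `low` to `high` would be flagged as a problem for that model and scenario.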