Hypothesis Testing and Model Fitting Strategies

The same model fitting strategies will be used for the full sample and the Q24 subsample.

Before explaining the model fitting strategies, I will clarify the definition of the random effect and the hypothesis tests in the MLLM. Some authors [4][63] discuss the random slope as the interaction of race with site. To keep the terminology simple, I split the random slope into two parts: the fixed effect and the random effect of black race. The fixed effect of black race represents the average difference in log odds between black and white veterans within sites.

The random effect of black race refers to the variation in the log odds associated with black race across sites.
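To make the distinction between the two parts concrete, the following minimal sketch simulates data from a 2-level random coefficient logistic model. Everything in it is an illustrative assumption rather than a value from this study: the function name, the parameter values, the 30% proportion of black veterans, and the simplification that the site-level intercept and slope deviations are independent normal draws.

```python
import math
import random

def simulate_rc_logistic(n_sites, n_per_site, beta0, beta1, sd_u0, sd_u1, seed=0):
    """Simulate (site, black, outcome) rows from a 2-level RC logistic model."""
    rng = random.Random(seed)
    rows = []
    for site in range(n_sites):
        # Random effects: site-specific deviations, drawn independently here
        # (an assumption; a covariance between them is not modelled).
        u0 = rng.gauss(0.0, sd_u0)  # deviation of the site intercept
        u1 = rng.gauss(0.0, sd_u1)  # deviation of the site's black race effect
        for _ in range(n_per_site):
            black = 1 if rng.random() < 0.3 else 0  # hypothetical 30% black
            # beta1 is the fixed effect of black race (average within-site
            # difference in log odds); u1 is how this site departs from it.
            log_odds = (beta0 + u0) + (beta1 + u1) * black
            prob = 1.0 / (1.0 + math.exp(-log_odds))
            outcome = 1 if rng.random() < prob else 0
            rows.append((site, black, outcome))
    return rows

# Hypothetical parameter values chosen only for illustration.
data = simulate_rc_logistic(n_sites=20, n_per_site=50,
                            beta0=-1.0, beta1=0.4, sd_u0=0.5, sd_u1=0.3)
print(len(data))
```

Setting sd_u1 to zero in this sketch reduces it to the RI model, in which the black race effect is the same at every site.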

Goldstein [25] discussed the approximate Wald test for both fixed and random effects (details in Section II.D.2) and has implemented this method in MLwiN. I used this approximate Wald test to perform the hypothesis tests in this software. In MLwiN, we can fit a model and perform hypothesis tests in two ways: 1) manually or 2) by macro. Although both produce exactly the same results, in the 2-level logistic model the parameter estimates obtained one way are not recognized by the other. For fitting the model and performing hypothesis tests manually, the MLwiN user manual has a step-by-step introduction [53]. For fitting the model and performing the hypothesis tests by macro, I give a sample macro in Appendix F.C and F.D. Before performing a hypothesis test, a contrast matrix needs to be defined in a free column for the corresponding test, using joint, i.e. joint c1 1 0 0 c1. The hypothesis tests on the fixed effects and random effects are implemented by the functions ftest and rtest. In MLwiN, the parameter estimates are stored in specific boxes and cells. For example, in a 2-level unadjusted RI model, ftest c1 will perform a hypothesis test on the fixed effect of the intercept term, whereas rtest c1 will perform a hypothesis test on the level 2 residual of site.
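The arithmetic behind a single-contrast Wald test can be sketched outside MLwiN as follows. This is not the MLwiN implementation: the function name, the parameter estimates, and the covariance matrix are hypothetical and serve only to show how a contrast vector selects the quantity being tested.

```python
def wald_statistic(contrast, estimates, covariance, null_value=0.0):
    """Approximate Wald chi-square statistic (1 df) for one linear contrast."""
    n = len(contrast)
    # c'b: the linear combination of estimates being tested
    effect = sum(contrast[i] * estimates[i] for i in range(n))
    # c'Vc: the estimated variance of that linear combination
    variance = sum(contrast[i] * covariance[i][j] * contrast[j]
                   for i in range(n) for j in range(n))
    return (effect - null_value) ** 2 / variance

# Hypothetical estimates for (intercept, black race, one covariate).
beta_hat = [-1.20, 0.30, 0.05]
# Hypothetical covariance matrix of those estimates.
cov_hat = [
    [0.0400, -0.0050, 0.0000],
    [-0.0050, 0.0100, 0.0010],
    [0.0000, 0.0010, 0.0025],
]
contrast = [0.0, 1.0, 0.0]  # selects the black race coefficient only

w = wald_statistic(contrast, beta_hat, cov_hat)
print(round(w, 2), w > 3.84)  # compare with the 1 df chi-square cut point
```

With a contrast of 1 0 0, the same function would test the intercept instead, which mirrors the role of the contrast column described above.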

A standard chi-square test with 1 degree of freedom is used to test the fixed effect of black race, using a cut point of 3.84 for a two-sided 0.05 level test. Because variance components cannot be negative, a one-sided 0.05 level test is applied to H0: σ²_{µ_i} = 0, and a mixture chi-square distribution is suggested [21]. A mixture chi-square test with 1 and 2 degrees of freedom is therefore used for the test of a variance component, using a cut-off value of 5.14. Because the mixture chi-square distribution is applied to the random effect hypothesis tests, only a range of P-values can be given. Because estimation methods for variance components are very limited and remain an active research area, I will accept Goldstein’s Wald test to test variance components and will not perform any simulation on the variance components.
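The two cut-off values can be checked numerically with a short sketch. It assumes a 50:50 mixture of the chi-square distributions with 1 and 2 degrees of freedom, which is the weighting consistent with the 5.14 cut-off; the function names and the example statistic are illustrative only.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)

def mixture_sf(x):
    """Upper tail of an assumed 50:50 mixture of chi-square(1) and chi-square(2)."""
    return 0.5 * chi2_sf_1df(x) + 0.5 * chi2_sf_2df(x)

def critical_value(sf, alpha=0.05, lo=0.0, hi=50.0):
    """Bisection for the x at which the decreasing tail function sf equals alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

cut_fixed = critical_value(chi2_sf_1df)    # fixed effect test, 1 df
cut_variance = critical_value(mixture_sf)  # variance component test, mixture
print(round(cut_fixed, 2), round(cut_variance, 2))

# For any mixing weights, the mixture P-value lies between the 1 df and 2 df
# P-values, which is why only a range can be reported.
w = 6.0  # hypothetical Wald statistic for a variance component
p_lower, p_upper = chi2_sf_1df(w), chi2_sf_2df(w)
print(p_lower < mixture_sf(w) < p_upper)
```

Under these assumptions the sketch should reproduce the 3.84 and 5.14 cut-offs quoted above to two decimal places.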

The RIGLS PQL2 method will be used to implement the simulation study, because this method can fit a model in several seconds and is the most accurate method within quasi-likelihood estimation. However, the analytic variance formulae were derived using the MQL1 method. To compare the accuracy of the PQL2 and MQL1 methods in both study populations, I will fit the same RI model using IGLS MQL1 and perform the same hypothesis tests.

Model diagnostics are applied to test the normality of the residuals and to identify outliers. A caterpillar plot [25] at the site level will be used to explore site outliers and check normality simultaneously.
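The quantities displayed in a caterpillar plot can be sketched as ranked site residuals with interval bounds. The residuals, standard errors, function name, and the 1.96 multiplier below are all hypothetical choices for illustration; the actual values would come from the fitted model.

```python
def caterpillar_rows(residuals, std_errors, z=1.96):
    """Rank site residuals and attach interval bounds, as drawn in a caterpillar plot."""
    rows = []
    for site, (u, se) in enumerate(zip(residuals, std_errors)):
        lower, upper = u - z * se, u + z * se
        rows.append({
            "site": site,
            "residual": u,
            "lower": lower,
            "upper": upper,
            # An interval that excludes zero marks a potential site outlier.
            "excludes_zero": lower > 0 or upper < 0,
        })
    rows.sort(key=lambda row: row["residual"])  # plotting order, lowest first
    return rows

# Hypothetical site-level residuals and their standard errors.
site_residuals = [0.10, -0.45, 0.02, 0.60, -0.05]
site_std_errors = [0.15, 0.18, 0.12, 0.20, 0.14]

rows = caterpillar_rows(site_residuals, site_std_errors)
ranked_sites = [row["site"] for row in rows]
flagged_sites = [row["site"] for row in rows if row["excludes_zero"]]
print(ranked_sites, flagged_sites)
```

Sites at either end of the ranking whose intervals exclude zero are the candidates for closer inspection, and a roughly symmetric spread of residuals around zero is consistent with normality.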

For the RI model, I will fit the model and perform the hypothesis tests using both the RIGLS PQL2 and IGLS MQL1 methods. Three hypothesis tests will be performed, 1) fixed effect of site, 2) random effect of site, and 3) fixed effect of black race, followed by model diagnostics.

For the RC model, I will fit the model and perform the hypothesis tests using both the RIGLS PQL2 and IGLS MQL1 methods. Four hypothesis tests will be performed, 1) fixed effect of site, 2) random effect of site, 3) fixed effect of black race, and 4) random effect of black race, followed by model diagnostics.

The outline of the model fitting strategies for either the full data or the Q24 data is shown below:

I. Fit the RI model using the RIGLS PQL2 method
   A. Hypothesis test on fixed effects: site and black
   B. Hypothesis test on random effect: site
   C. Model diagnostics

II. Fit the RI model using the IGLS MQL1 method
   A. Hypothesis test on fixed effects: site and black
   B. Hypothesis test on random effect: site
   C. Model diagnostics

III. Fit the RC model using the RIGLS PQL2 method
   A. Hypothesis test on fixed effects: site and black
   B. Hypothesis test on random effects: site and black
   C. Model diagnostics

IV. Fit the RC model using the IGLS MQL1 method
   A. Hypothesis test on fixed effects: site and black
   B. Hypothesis test on random effects: site and black
   C. Model diagnostics

Another concern for the RC logistic model is whether the random slope term is significantly different from zero. Conditional logistic regression is useful for investigating the relationship between an outcome and a set of prognostic factors in matched case-control studies, where the outcome is whether the subject is a case or a control. The interaction (group*intervention) can be used to check whether the site and race interaction exists across the matched groups. From this conditional logistic model, we can assess whether the race effect varies across sites without assuming normality. The Stata clogit function is applied to fit this conditional logistic model with the race and site interaction. Furthermore, the main effect conditional logistic model and the 2-level RI model will be fitted in Stata using the Gaussian quadrature method. The estimates from Stata will be compared to the PQL2 estimates from MLwiN to check the accuracy of PQL2.
