Chapter 4 Estimation in group testing: what can be thrown

4.3 Simulation

In this section, we conduct simulation studies to illustrate and compare the performance of the proposed estimators over all considered scenarios S1 – S3.

The group testing data are generated with sample size $N = 3000$. Since the groups have not been constructed yet, we index the $n$th individual by $n$, for $n = 1, \dots, N$. We first generate the true individual-level variables. For each subject, we generate an $(m+1)$-dimensional covariate vector $x_n = (1, x_{n1}, \dots, x_{nm})^T$, where $(x_{n1}, \dots, x_{nm})^T$ is simulated from a multivariate normal distribution with a correlated variance-covariance matrix $\Sigma_{l_1, l_2} = 0.9^{|l_1 - l_2|}$ and $0 \le l_1, l_2 \le m$. We have experimented with model dimensions $m \in \{5, 10\}$, with the corresponding parameter settings presented below.

• M5: $\beta = (-6, -3, -1, -3, 2, 3)^T$

• M10: $\beta = (-5, 3, 1, -2, 1, 1, -4, 1, -1, -1, 1)^T$

• SM10: $\beta = (-5, 3, 0, -1, 0, 0, -4, 0, 0, 1, 0)^T$

The first two parameter settings represent lower- and higher-dimensional parameters $\beta$, with $m = 5$ and $m = 10$ respectively. The third setting considers a sparse $\beta$. It is worth pointing out that the true regression coefficients $\beta$ are set so that the infection prevalence is around 8% under each parameter setting. The true individual infection status $\tilde{Y}_n$ is simulated from a Bernoulli random variable with the probability of infection computed from (4.1) and the inverse logit link; i.e., $\mathrm{pr}(\tilde{Y}_n = 1 \mid x_n) = \exp(x_n^T\beta)/\{1 + \exp(x_n^T\beta)\}$.

Next, we mimic the two-stage Dorfman testing protocol. The individuals are randomly assigned to $J$ non-overlapping pools. Without loss of generality, we consider a common group size $c$ across pools, with $c \in \{2, 5, 10\}$. We then use $ij$ to reindex the $i$th individual from the $j$th pool, for $i \in G_j$ and $j = 1, \dots, J$. With pre-specified sensitivity and specificity $(S_e, S_p)$, the first-stage pooled outcome of the $j$th pool is generated by $Z_j \sim \mathrm{Bernoulli}\{S_e \tilde{Z}_j + (1 - S_p)(1 - \tilde{Z}_j)\}$, where $\tilde{Z}_j = \max_{i \in G_j} \tilde{Y}_{ij}$. Only if $Z_j = 1$ does the protocol proceed to the second stage, in which we generate the retesting outcome of the $i$th individual from a positive pool by $Y_{ij} \sim \mathrm{Bernoulli}\{S_e \tilde{Y}_{ij} + (1 - S_p)(1 - \tilde{Y}_{ij})\}$. According to our definition in Section 4.2, the individual diagnoses are recorded as $D_{ij} = Z_j Y_{ij}$ for $i \in G_j$ and $j = 1, \dots, J$. Finally, the observed testing data are $\{D_n : n = 1, \dots, N\}$ for S1, $\{D_{ij} : i \in G_j, j = 1, \dots, J\}$ for S2, and $\{(Z_j, D_{1j}, \dots, D_{cj}) : j = 1, \dots, J\}$ for S3.
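The data-generating steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the reported results: the covariates are assumed to have zero mean, the covariance $0.9^{|l_1 - l_2|}$ is obtained through an AR(1) recursion (one convenient construction among several), the values `se = 0.95` and `sp = 0.98` are placeholders because the text does not report the testing errors used, and all function names and the seed are hypothetical.

```python
import math
import random


def simulate_covariates(n_subjects, m, rho, rng):
    """Return rows (1, x_1, ..., x_m) with Cov(x_k, x_l) = rho ** |k - l|."""
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    rows = []
    for _ in range(n_subjects):
        x = [rng.gauss(0.0, 1.0)]  # zero mean is an assumption
        for _ in range(m - 1):
            # AR(1) recursion: unit variances and correlation rho ** lag.
            x.append(rho * x[-1] + innovation_sd * rng.gauss(0.0, 1.0))
        rows.append([1.0] + x)
    return rows


def simulate_true_status(rows, beta, rng):
    """Draw true statuses from pr(Y = 1 | x) = exp(x'b) / {1 + exp(x'b)}."""
    statuses = []
    for x in rows:
        eta = sum(b * v for b, v in zip(beta, x))
        prob = 1.0 / (1.0 + math.exp(-eta))
        statuses.append(int(rng.random() < prob))
    return statuses


def noisy_test(true_value, se, sp, rng):
    """One assay: Bernoulli{se * t + (1 - sp) * (1 - t)} for true value t."""
    return int(rng.random() < (se if true_value == 1 else 1.0 - sp))


def dorfman(statuses, c, se, sp, rng):
    """Two-stage Dorfman testing; returns pools, Z_j, D_ij and the test count."""
    order = list(range(len(statuses)))
    rng.shuffle(order)  # random assignment to non-overlapping pools
    pools = [order[k:k + c] for k in range(0, len(order), c)]
    pool_results, diagnoses, n_tests = [], {}, 0
    for pool in pools:
        z_true = max(statuses[i] for i in pool)
        z = noisy_test(z_true, se, sp, rng)
        n_tests += 1
        pool_results.append(z)
        for i in pool:
            y = 0
            if z == 1:  # only members of positive pools are retested
                y = noisy_test(statuses[i], se, sp, rng)
                n_tests += 1
            diagnoses[i] = z * y  # D_ij = Z_j * Y_ij
    return pools, pool_results, diagnoses, n_tests


rng = random.Random(2024)  # arbitrary seed
beta_m5 = [-6, -3, -1, -3, 2, 3]  # setting M5
rows = simulate_covariates(3000, 5, 0.9, rng)
statuses = simulate_true_status(rows, beta_m5, rng)
# se and sp below are placeholder values, not taken from the text.
pools, Z, D, n_tests = dorfman(statuses, 5, 0.95, 0.98, rng)

s1 = [D[n] for n in range(len(statuses))]                      # S1: diagnoses only
s2 = [[D[i] for i in pool] for pool in pools]                  # S2: diagnoses by pool
s3 = [(Z[j], [D[i] for i in pool]) for j, pool in enumerate(pools)]  # S3: all outcomes
print(sum(statuses) / len(statuses), n_tests)
```

With perfect assays (`se = sp = 1`), the recorded diagnoses coincide with the true statuses, which gives a quick sanity check on the protocol logic.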

The simulation is repeated 500 times under each parameter, group size, and data collection setting. To evaluate the overall estimation performance, we use the empirical mean squared error (MSE), calculated as $\mathrm{MSE} = E\{(\hat{\beta} - \beta^*)^T(\hat{\beta} - \beta^*)\}$, where $\beta^*$ is the true $\beta$ of the parameter setting used to generate the data and the expectation is taken as the average over the simulation runs.
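As a small illustration of this criterion, the following Python sketch averages the squared Euclidean distance between each replicate estimate and the true parameter vector. The function name `empirical_mse` and the toy numbers are hypothetical and serve only to show the arithmetic.

```python
def empirical_mse(estimates, beta_true):
    """Average of (b_hat - b*)'(b_hat - b*) over the simulation replications."""
    squared_errors = [
        sum((est_k - true_k) ** 2 for est_k, true_k in zip(estimate, beta_true))
        for estimate in estimates
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy example with two replications of a two-dimensional parameter.
beta_true = [1.0, 2.0]
toy_estimates = [[1.0, 2.0], [3.0, 4.0]]
toy_mse = empirical_mse(toy_estimates, beta_true)
print(toy_mse)  # 4.0: squared errors are 0 and 8, and their average is 4
```

Because the criterion sums over all coordinates of $\beta$, it grows with both the bias and the variance of every component.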

The results for parameter settings M5, M10 and SM10 are reported in Tables 4.1 – 4.3, respectively. For comparison, we also mimic the individual testing procedure and provide regression results for all parameter settings. We observe that two-stage group testing can reduce the testing cost by up to about 45% compared with individual testing. The details of the estimation step under individual testing are provided in Appendix C. As for parameter estimation, the estimates are in general close to the truth and exhibit small bias regardless of the simulation setting. The average sample standard deviation increases with group size, but this is an expected phenomenon due to the loss of individual status information when testing large pools. We now compare the model estimation across the data collection settings S1 – S3. One can observe that the MSEs decrease from S2 to S3 in every setting and, under M5, also from S1 to S2, which suggests that using more data results in better model estimation. Moreover, in all simulation settings, S2 outperforms individual testing by producing estimates with lower MSEs. Regarding S1, which uses purely individual diagnoses, we found that the moment estimators do not perform well when the number of covariates is small (Table 4.1). In contrast, when regressing on a larger number of covariates (see Tables 4.2 and 4.3), the MSEs of the method-of-moments estimator (S1) are greatly improved, regardless of whether $\beta$ is sparse. In particular, in some cases the estimation performance under S1 is even better than that under S2 and individual testing. Therefore, for the sake of estimation stability and practical efficiency, when performing group testing for screening, we strongly recommend that laboratories record only the individual diagnoses (and the group memberships) instead of keeping track of every testing outcome.
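The cost reduction quoted above can be checked against the average numbers of tests given in the captions of Tables 4.1 – 4.3. The short Python sketch below assumes that individual testing uses exactly one test per subject (3000 tests in total); the variable names are hypothetical.

```python
N_SUBJECTS = 3000  # individual testing is assumed to use one test per subject

# Average numbers of tests from the captions of Tables 4.1 - 4.3, keyed by group size c.
average_tests = {
    "M5": {2: 2053.40, 5: 1652.48, 10: 1944.92},
    "M10": {2: 2060.25, 5: 1664.83, 10: 1964.48},
    "SM10": {2: 2067.44, 5: 1672.91, 10: 1983.30},
}

# Relative saving in the number of tests compared with individual testing.
savings = {
    (setting, c): 1.0 - tests / N_SUBJECTS
    for setting, by_size in average_tests.items()
    for c, tests in by_size.items()
}

best_case = max(savings, key=savings.get)
for key in sorted(savings):
    print(key, f"{100 * savings[key]:.1f}%")
print("largest saving:", best_case, f"{100 * savings[best_case]:.1f}%")
```

In all three parameter settings the largest saving occurs at group size 5, and the overall maximum (setting M5) is just under 45%.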

Table 4.1: Summary statistics of the estimates under parameter setting M5, data collection scenarios S1 – S3 of two-stage group testing with c ∈ {2,5,10} and individual testing (IT). Reported are the average values over 500 simulation runs, with the standard deviations in parentheses. The average numbers of tests are 2053.40 (c = 2), 1652.48 (c = 5) and 1944.92 (c = 10). The average prevalence of infection is 7.79%.

Param  True           IT     S1 (c=2)     S2 (c=2)     S3 (c=2)     S1 (c=5)     S2 (c=5)     S3 (c=5)    S1 (c=10)    S2 (c=10)    S3 (c=10)
β0       -6  -6.14(0.52)  -6.17(0.54)  -6.11(0.42)  -6.10(0.41)  -6.14(0.55)  -6.11(0.43)  -6.10(0.41)  -6.16(0.53)  -6.10(0.46)  -6.12(0.44)
β1       -3  -3.08(0.38)  -3.08(0.41)  -3.06(0.35)  -3.05(0.34)  -3.07(0.39)  -3.07(0.36)  -3.06(0.35)  -3.08(0.42)  -3.04(0.39)  -3.06(0.37)
β2       -1  -1.01(0.40)  -1.05(0.39)  -1.03(0.34)  -1.03(0.33)  -1.04(0.40)  -1.05(0.38)  -1.03(0.37)  -1.01(0.42)  -1.03(0.41)  -1.02(0.39)
β3       -3  -3.09(0.48)  -3.07(0.46)  -3.06(0.42)  -3.06(0.42)  -3.05(0.50)  -3.05(0.45)  -3.04(0.44)  -3.08(0.50)  -3.06(0.47)  -3.05(0.45)
β4        2   2.05(0.42)   2.03(0.42)   2.03(0.38)   2.03(0.38)   2.05(0.47)   2.05(0.41)   2.05(0.40)   2.03(0.44)   2.03(0.41)   2.01(0.39)
β5        3   3.08(0.39)   3.10(0.46)   3.08(0.40)   3.07(0.40)   3.06(0.49)   3.06(0.43)   3.04(0.42)   3.08(0.51)   3.06(0.46)   3.05(0.44)
MSE               1.1809       1.2736       0.8500       0.8190       1.4266       0.9418       0.8899       1.3123       1.0627       0.9357

Table 4.2: Summary statistics of the estimates under parameter setting M10, data collection scenarios S1 – S3 of two-stage group testing with c ∈ {2,5,10} and individual testing (IT). Reported are the average values over 500 simulation runs, with the standard deviations in parentheses. The average numbers of tests are 2060.25 (c = 2), 1664.83 (c = 5) and 1964.48 (c = 10). The average prevalence of infection is 7.91%.

Param  True           IT     S1 (c=2)     S2 (c=2)     S3 (c=2)     S1 (c=5)     S2 (c=5)     S3 (c=5)    S1 (c=10)    S2 (c=10)    S3 (c=10)
β0       -5  -5.09(0.40)  -5.13(0.35)  -5.09(0.29)  -5.10(0.28)  -5.08(0.34)  -5.11(0.31)  -5.11(0.31)  -5.10(0.37)  -5.08(0.35)  -5.13(0.34)
β1        3   3.08(0.37)   3.07(0.32)   3.05(0.31)   3.05(0.30)   3.08(0.32)   3.08(0.32)   3.08(0.31)   3.08(0.34)   3.12(0.38)   3.09(0.36)
β2        1   1.02(0.35)   1.05(0.34)   1.03(0.33)   1.03(0.32)   1.00(0.33)   1.03(0.34)   1.03(0.33)   0.98(0.36)   1.01(0.39)   0.98(0.37)
β3       -2  -2.06(0.40)  -2.05(0.38)  -2.03(0.37)  -2.03(0.36)  -2.02(0.33)  -2.06(0.35)  -2.05(0.34)  -2.02(0.36)  -2.05(0.38)  -2.03(0.37)
β4        1   1.01(0.35)   1.01(0.37)   1.00(0.35)   1.00(0.34)   1.00(0.34)   1.03(0.34)   1.02(0.34)   1.01(0.36)   1.06(0.37)   1.05(0.36)
β5        1   1.05(0.36)   1.02(0.36)   1.02(0.34)   1.02(0.34)   1.00(0.36)   1.01(0.35)   1.01(0.34)   1.02(0.36)   1.06(0.37)   1.05(0.36)
β6       -4  -4.09(0.50)  -4.07(0.42)  -4.06(0.39)  -4.07(0.38)  -4.00(0.41)  -4.08(0.43)  -4.07(0.43)  -4.07(0.46)  -4.15(0.47)  -4.12(0.45)
β7        1   1.01(0.36)   1.01(0.34)   1.01(0.33)   1.01(0.33)   0.99(0.33)   1.03(0.33)   1.02(0.33)   1.02(0.37)   1.05(0.38)   1.03(0.37)
β8       -1  -1.02(0.36)  -1.00(0.33)  -1.00(0.33)  -0.99(0.33)  -1.02(0.33)  -1.04(0.35)  -1.04(0.34)  -1.01(0.33)  -1.05(0.35)  -1.03(0.34)
β9       -1  -1.03(0.39)  -1.03(0.33)  -1.03(0.32)  -1.03(0.32)  -1.00(0.34)  -1.03(0.34)  -1.03(0.34)  -1.02(0.35)  -1.04(0.36)  -1.02(0.35)
β10       1   1.02(0.29)   1.03(0.25)   1.02(0.24)   1.02(0.24)   1.02(0.26)   1.03(0.26)   1.03(0.25)   1.03(0.27)   1.04(0.28)   1.02(0.28)
MSE               1.5978       1.3505       1.2059       1.1770       1.2485       1.3018       1.2669       1.4364       1.5517       1.4693

Table 4.3: Summary statistics of the estimates under parameter setting SM10, data collection scenarios S1 – S3 of two-stage group testing with c ∈ {2,5,10} and individual testing (IT). Reported are the average values over 500 simulation runs, with the standard deviations in parentheses. The average numbers of tests are 2067.44 (c = 2), 1672.91 (c = 5) and 1983.30 (c = 10). The average prevalence of infection is 8.02%.

Param  True           IT     S1 (c=2)     S2 (c=2)     S3 (c=2)     S1 (c=5)     S2 (c=5)     S3 (c=5)    S1 (c=10)    S2 (c=10)    S3 (c=10)
β0       -5  -5.17(0.41)  -5.02(0.31)  -5.09(0.28)  -5.08(0.28)  -5.06(0.34)  -5.10(0.33)  -5.10(0.33)  -5.03(0.32)  -5.04(0.34)  -5.09(0.33)
β1        3   3.10(0.38)   3.02(0.30)   3.05(0.29)   3.05(0.29)   3.04(0.32)   3.06(0.32)   3.05(0.31)   3.08(0.32)   3.04(0.33)   3.08(0.33)
β2        0   0.01(0.35)  -0.02(0.32)  -0.01(0.32)  -0.01(0.32)   0.01(0.33)   0.01(0.33)   0.01(0.33)  -0.02(0.36)  -0.02(0.36)  -0.02(0.35)
β3       -1  -1.05(0.39)  -0.98(0.33)  -1.01(0.35)  -1.01(0.34)  -1.03(0.34)  -1.05(0.34)  -1.04(0.34)  -1.00(0.34)  -1.03(0.35)  -1.02(0.34)
β4        0   0.01(0.36)   0.00(0.32)   0.02(0.33)   0.02(0.33)   0.02(0.31)   0.02(0.33)   0.02(0.33)  -0.02(0.32)   0.00(0.35)   0.00(0.34)
β5        0   0.00(0.36)  -0.03(0.32)  -0.03(0.31)  -0.03(0.30)  -0.02(0.31)   0.00(0.33)   0.00(0.33)   0.01(0.33)   0.01(0.35)   0.01(0.34)
β6       -4  -4.14(0.52)  -3.97(0.38)  -4.05(0.39)  -4.05(0.39)  -4.00(0.42)  -4.11(0.42)  -4.10(0.41)  -3.99(0.39)  -4.09(0.46)  -4.08(0.45)
β7        0   0.00(0.36)  -0.03(0.31)  -0.02(0.32)  -0.02(0.32)   0.00(0.33)   0.02(0.32)   0.03(0.32)  -0.01(0.33)   0.01(0.34)   0.01(0.33)
β8        0   0.02(0.36)   0.01(0.30)   0.01(0.30)   0.01(0.30)  -0.01(0.33)  -0.01(0.33)  -0.01(0.32)  -0.01(0.33)  -0.01(0.34)  -0.01(0.34)
β9        1   1.03(0.37)   1.02(0.31)   1.03(0.31)   1.04(0.30)   0.98(0.33)   1.00(0.33)   1.00(0.33)   0.98(0.34)   1.03(0.36)   1.01(0.35)
β10       0  -0.01(0.25)   0.00(0.25)  -0.01(0.25)  -0.01(0.24)   0.01(0.25)   0.01(0.24)   0.01(0.23)   0.03(0.25)   0.01(0.26)   0.01(0.25)
MSE               1.6413       1.1024       1.1130       1.0876       1.2093       1.2262       1.1901       1.2067       1.3564       1.3182
