A Combinatorial Testing Problem - Conclusions and Future Work

Chapter 6 Conclusions and Future Work

D.2 A Combinatorial Testing Problem

A problem related to the main problem of interest in this thesis, (2.1), is the following combinatorial testing problem (see, for example, [68]):

Under the null hypothesis $H_{0,n}$, the data consist of $n$ samples drawn i.i.d. from $f_{0,n}$.

Under the alternative hypothesis, the data are generated by the following procedure (assuming $n\epsilon_n$ is an integer):

1. Generate $n(1-\epsilon_n)$ samples from $f_{0,n}$ and $n\epsilon_n$ samples from $f_{1,n}$, independently.

2. Apply a uniformly random permutation to the samples.

In other words, the null hypothesis consists of pure noise (as in the sparse mixture detection problem (2.1)). However, the alternative hypothesis for the combinatorial testing problem consists of exactly $n\epsilon_n$ samples drawn from the signal distribution, with the remaining samples drawn from the noise distribution (but it is not known which coordinates are signal). In the sparse mixture detection problem (2.1), by contrast, the number of samples drawn from the signal distribution under the alternative hypothesis is random, following a $\mathrm{Binomial}(n, \epsilon_n)$ distribution.
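The only difference between the two alternatives is how the number of signal samples is drawn. A minimal sketch for the Gaussian location model, assuming $f_{0,n} = \mathcal{N}(0,1)$ and $f_{1,n} = \mathcal{N}(\mu_n,1)$; the function names are hypothetical and chosen for illustration only:

```python
import random


def sample_sparse_mixture(n, eps, mu, rng):
    """Sparse mixture alternative: each sample is independently signal w.p. eps.

    Assumes noise N(0, 1) and signal N(mu, 1).
    """
    is_signal = [rng.random() < eps for _ in range(n)]
    x = [rng.gauss(mu if s else 0.0, 1.0) for s in is_signal]
    return x, sum(is_signal)  # the signal count is Binomial(n, eps)


def sample_combinatorial(n, eps, mu, rng):
    """Combinatorial alternative: exactly n*eps signal samples, then a random permutation."""
    k = round(n * eps)
    if abs(n * eps - k) > 1e-9:
        raise ValueError("n * eps must be an integer")
    x = [rng.gauss(0.0, 1.0) for _ in range(n - k)] + [rng.gauss(mu, 1.0) for _ in range(k)]
    rng.shuffle(x)  # uniformly random permutation hides which coordinates are signal
    return x, k


if __name__ == "__main__":
    rng = random.Random(0)
    mix_counts = [sample_sparse_mixture(1000, 0.01, 3.0, rng)[1] for _ in range(5)]
    comb_counts = [sample_combinatorial(1000, 0.01, 3.0, rng)[1] for _ in range(5)]
    print("mixture signal counts:", mix_counts)        # random, Binomial(1000, 0.01)
    print("combinatorial signal counts:", comb_counts)  # always 10
```

Under the mixture, a draw with no signal samples at all has probability $(1-\epsilon_n)^n$; under the combinatorial alternative that event cannot occur.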

In many works (see, for example, [2, 15, 17]), the combinatorial testing problem has been used as a surrogate for the sparse mixture detection problem (2.1) when evaluating the performance of statistics designed for the latter.

However, the combinatorial testing problem may itself be the problem of interest. As in [14, 68], applying statistics designed for the sparse mixture detection problem to the combinatorial testing problem may be desirable from a computational perspective, since it avoids the combinatorial search required by a generalized likelihood ratio test or by scan-statistic-based tests. The combinatorial testing problem is also of interest in microarray analysis [7, 15].

We make the following observation: depending on the signal strength, the error probabilities in the combinatorial testing problem can differ vastly from those in the sparse mixture detection problem.

Consider the Gaussian location model described in Section 2.2.1. Recall the Max test from Theorem 6, which rejects the null hypothesis if the sample maximum exceeds $\tau_n$. For the combinatorial testing problem, the probability of missed detection of the Max test is

$$P_{\mathrm{MD,Max,Combinatorial}}(n) = \left(\Phi(\tau_n-\mu_n)\right)^{n\epsilon_n}\left(\Phi(\tau_n)\right)^{n(1-\epsilon_n)}. \tag{D.2}$$

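We can easily see from (D.2) that if $\tau_n = \sqrt{2(1+o(1))\log n}$ and $\mu_n$ is large (say, growing linearly in $n$), then $-\log P_{\mathrm{MD,Max,Combinatorial}}(n)$ can grow significantly faster than $n\epsilon_n$, because the factor $\left(\Phi(\tau_n-\mu_n)\right)^{n\epsilon_n}$ vanishes rapidly as $\mu_n$ grows. However, by Theorem 2, for any sparse mixture detection problem, $-\log P_{\mathrm{MD}}(n)$ can grow no faster than $n\epsilon_n$ (due to the event of observing no signal samples), independent of $\mu_n$.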
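The two bounds behind this comparison can be sketched in a few lines, using only (D.2), the Mills-ratio tail asymptotics $-\log\Phi(-x) = \tfrac{x^2}{2}(1+o(1))$ as $x\to\infty$, and the no-signal event. This is an informal outline, not the proof of Theorem 2:

$$
\begin{aligned}
-\log P_{\mathrm{MD,Max,Combinatorial}}(n)
  &= -n\epsilon_n \log\Phi(\tau_n-\mu_n) - n(1-\epsilon_n)\log\Phi(\tau_n) \\
  &\ge -n\epsilon_n \log\Phi(\tau_n-\mu_n) && \text{since } \log\Phi(\tau_n)\le 0 \\
  &= \tfrac{1}{2}\, n\epsilon_n (\mu_n-\tau_n)^2 \,(1+o(1)) && \text{when } \mu_n-\tau_n\to\infty .
\end{aligned}
$$

For the sparse mixture detection problem, with probability $(1-\epsilon_n)^n$ no sample is drawn from $f_{1,n}$; on that event the data are i.i.d. from $f_{0,n}$, so any test with false alarm probability $P_{FA}$ accepts with probability $1-P_{FA}$:

$$
P_{\mathrm{MD}}(n) \ge (1-\epsilon_n)^n (1-P_{FA})
\quad\Longrightarrow\quad
-\log P_{\mathrm{MD}}(n) \le n\epsilon_n(1+o(1)) - \log(1-P_{FA}) \qquad (\epsilon_n \to 0).
$$

With $\mu_n$ growing linearly in $n$ and $\tau_n = O(\sqrt{\log n})$, the first exponent is at least of order $n^3\epsilon_n$, while the second remains of order $n\epsilon_n$ plus a constant.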

While the argument above concerns a relatively uninteresting detection problem due to the very high signal strength, its point is that the error probabilities in the combinatorial testing problem may differ vastly from those in the sparse mixture detection problem.
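The gap can also be checked numerically by evaluating (D.2) in the log domain next to the Max test's missed-detection probability under the sparse mixture, $\left((1-\epsilon_n)\Phi(\tau_n) + \epsilon_n\Phi(\tau_n-\mu_n)\right)^n$. The sketch below assumes the Gaussian location model with unit-variance noise and signal, sets the $o(1)$ term in $\tau_n$ to zero, and uses fixed illustrative values of $\mu_n$ rather than a sequence growing in $n$; all function names are hypothetical.

```python
import math


def log_phi(x):
    """log Phi(x) for the standard normal CDF, stable deep in the left tail."""
    if x > -30.0:
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    # Far left tail: Phi(x) ~ phi(x) / |x| * (1 - 1/x^2 + 3/x^4) (Mills-ratio expansion).
    return (-0.5 * x * x - math.log(-x) - 0.5 * math.log(2.0 * math.pi)
            + math.log1p(-1.0 / x**2 + 3.0 / x**4))


def log_pmd_max_combinatorial(n, k, tau, mu):
    """log of (D.2): k signal samples and n - k noise samples all fall below tau."""
    return k * log_phi(tau - mu) + (n - k) * log_phi(tau)


def log_pmd_max_mixture(n, eps, tau, mu):
    """log P_MD of the Max test in the sparse mixture: ((1-eps) Phi(tau) + eps Phi(tau-mu))^n."""
    a = math.log1p(-eps) + log_phi(tau)
    b = math.log(eps) + log_phi(tau - mu)
    m = max(a, b)  # log-sum-exp for numerical stability
    return n * (m + math.log(math.exp(a - m) + math.exp(b - m)))


if __name__ == "__main__":
    n, k = 10_000, 40                    # n * eps_n = 40 signal samples
    tau = math.sqrt(2.0 * math.log(n))   # tau_n = sqrt(2 log n), o(1) term set to 0
    for mu in (5.0, 10.0, 20.0):
        print(f"mu={mu:5.1f}  combinatorial={log_pmd_max_combinatorial(n, k, tau, mu):10.1f}"
              f"  mixture={log_pmd_max_mixture(n, k / n, tau, mu):8.2f}  -n*eps={-k}")
```

As $\mu$ grows, the combinatorial log-probability keeps falling roughly like $-k\mu^2/2$, whereas the mixture value stays near $n\log(1-\epsilon_n) \approx -n\epsilon_n$, the contribution of the no-signal event.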

We repeat the experiments of Section 5.4, which show the trade-off between signal strength and sparsity in the Gaussian location model, in an identical manner, except that the data are drawn under the combinatorial testing setup with the same signal and noise distribution parameterization as the Gaussian location model. We refer to this problem as the Gaussian location combinatorial testing problem. The results are given in Figs. D.2a and D.2b. Note that the LRT used here is the one for the sparse mixture detection problem, given by (2.10), not one designed for the combinatorial testing problem. We see that while the relative performance and the shape of the power curves are similar to the results of Section 5.4 for the tests applied to the sparse mixture detection problem, the actual power of the tests may differ.

[Figure D.2, panel (a): $n = 10^4$ (compare to Figure 5.4a); panel (b): $n = 10^6$ (compare to Figure 5.4b). Each panel plots $P_D = 1 - P_{MD}$ against $\beta$ for the LRT, BJ, M=8, M=4, ACW, HC, and Max tests.]

Figure D.2: Plot of $P_D = 1 - P_{MD}$ versus $\beta$ for $r = 1.2\,r_{\mathrm{crit}}(\beta) + 0.1$, $P_{FA} = 0.05$, and $n = 10^4$ (a) and $n = 10^6$ (b), with data drawn under the Gaussian location combinatorial testing problem.
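For the Max test alone, the power under both alternatives can be computed in closed form, which gives a simulation-free cross-check of the kind of gap seen in Figure D.2. The sketch below assumes the calibration $\epsilon_n = n^{-\beta}$, $\mu_n = \sqrt{2r\log n}$ and the Ingster/Donoho-Jin boundary for $r_{\mathrm{crit}}(\beta)$; these are assumptions about Section 2.2.1, not taken from this section. The names `r_crit` and `max_test_power` are hypothetical, $n\epsilon_n$ is rounded to an integer, and the threshold is set so that $P_{FA} = 0.05$ exactly. It does not cover the LRT, BJ, M=4, M=8, ACW, or HC tests.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed: noise f0 = N(0, 1), signal f1 = N(mu_n, 1)


def r_crit(beta):
    """Assumed detection boundary r_crit(beta) for 1/2 < beta < 1 (Ingster / Donoho-Jin form)."""
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def max_test_power(n, beta, r, pfa=0.05):
    """Exact P_D = 1 - P_MD of the Max test under both alternatives.

    Assumes eps_n = n**(-beta) and mu_n = sqrt(2 r log n). The threshold tau
    is chosen so that Phi(tau)**n = 1 - pfa, i.e. P(max < tau) = 1 - pfa under H0.
    """
    k = round(n ** (1.0 - beta))   # signal count in the combinatorial problem
    eps = k / n                    # mixture weight with the same expected signal count k
    mu = math.sqrt(2.0 * r * math.log(n))
    log_p = math.log1p(-pfa) / n   # log Phi(tau)
    p = math.exp(log_p)
    tau = STD_NORMAL.inv_cdf(p)
    q = STD_NORMAL.cdf(tau - mu)   # Phi(tau - mu): a signal sample falls below tau
    # Combinatorial problem, equation (D.2): exactly k signal samples.
    log_pmd_comb = k * math.log(q) + (n - k) * log_p
    # Sparse mixture: each sample is below tau w.p. (1 - eps) p + eps q = p (1 - eps (p - q) / p).
    log_pmd_mix = n * (log_p + math.log1p(-eps * (p - q) / p))
    return 1.0 - math.exp(log_pmd_comb), 1.0 - math.exp(log_pmd_mix)


if __name__ == "__main__":
    for beta in (0.55, 0.65, 0.75, 0.85, 0.95):
        r = 1.2 * r_crit(beta) + 0.1  # calibration used in Figure D.2
        pd_comb, pd_mix = max_test_power(10**4, beta, r)
        print(f"beta={beta:.2f}  P_D combinatorial={pd_comb:.3f}  P_D mixture={pd_mix:.3f}")
```

With $p = \Phi(\tau_n)$ and $q = \Phi(\tau_n - \mu_n)$, the mixture missed-detection probability equals $p^n\,\mathbb{E}[(q/p)^K]$ with $K \sim \mathrm{Binomial}(n, \epsilon_n)$; since $K \mapsto (q/p)^K$ is convex, Jensen's inequality implies that, when the expected signal counts match, the Max test is never less powerful in the combinatorial problem. This ordering is specific to the Max test and need not hold for the other tests.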

References

[1] R.L. Dobrushin, “A statistical problem arising in the theory of detection of signals in the presence of noise in a multi-channel system and leading to stable distribution laws,” Theory of Probability & Its Applications, vol. 3, no. 2, pp. 161–173, 1958.

[2] D. Donoho and J. Jin, “Higher criticism for detecting sparse heterogeneous mixtures,” Annals of Statistics, pp. 962–994, 2004.

[3] M.R. Bloch, “Covert communication over noisy channels: A resolvability perspective,” IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2334–2354, 2016.

[4] J. Fridrich, Steganography in Digital Media: Principles, Algorithms, and Applications, Cambridge University Press, 2009.

[5] E. Mossel and S. Roch, “Distance-based species tree estimation: Information-theoretic trade-off between number of loci and sequence length under the coalescent,” in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015, vol. 40 of LIPIcs, Leibniz International Proceedings in Informatics, Wadern, DE.

[6] J.J. Goeman and P. Bühlmann, “Analyzing gene expression data in terms of gene sets: methodological issues,” Bioinformatics, vol. 23, no. 8, pp. 980–987, 2007.

[7] S. Dudoit and M. J. van der Laan, Multiple Testing Procedures with Applications to Genomics, Springer, New York, NY, 2008.

[8] L. Cayon, J. Jin, and A. Treaster, “Higher Criticism statistic: detecting and identifying non-Gaussianity in the WMAP first-year data,” Monthly Notices of the Royal Astronomical Society, vol. 362, no. 3, pp. 826–832, 2005.

[9] J. Jin et al., “Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 15, pp. 297184, 2005.

[10] Y. I. Ingster and I.A. Suslina, Nonparametric Goodness-of-Fit Testing under Gaussian models, vol. 169 of Lecture Notes in Statistics, Springer Science & Business Media, New York, NY, 2012.

[11] D. Donoho and J. Jin, “Higher criticism thresholding: Optimal feature selection when useful features are rare and weak,” Proceedings of the National Academy of Sciences, vol. 105, no. 39, pp. 14790–14795, 2008.

[12] D. Donoho, “50 years of data science,” Keynote, John W. Tukey 100th Birthday Celebration at Princeton University, September 2015.

[13] D. Donoho and J. Jin, “Higher criticism for large-scale inference, especially for rare and weak effects,” Statistical Science, vol. 30, no. 1, pp. 1–25, 2015.

[14] V. Saligrama and M. Zhao, “Local anomaly detection,” in International Conference on Artificial Intelligence and Statistics, La Palma, Spain, 2012, pp. 969–983.

[15] E. Arias-Castro and M. Wang, “Distribution-free tests for sparse heterogeneous mixtures,” Test, vol. 26, no. 1, pp. 71–94, 2017.

[16] T.T. Cai and Y. Wu, “Optimal detection of sparse mixtures against a given null distribution,” IEEE Transactions on Information Theory, vol. 60, no. 4, pp. 2217–2232, 2014.

[17] T.T. Cai, X.J. Jeng, and J. Jin, “Optimal detection of heterogeneous and heteroscedastic mixtures,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 5, pp. 629–662, 2011.

[18] G. Walther, “The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures,” in From Probability to Statistics and Back: High-Dimensional Models and Processes–A Festschrift in Honor of Jon A. Wellner, pp. 317–326. Institute of Mathematical Statistics, 2013.

[19] Y.I. Ingster, “Adaptive detection of a signal of growing dimension. I,” Mathematical Methods of Statistics, vol. 10, no. 4, pp. 395–421, 2001.

[20] Y.I. Ingster, “Adaptive detection of a signal of growing dimension. II,” Mathematical Methods of Statistics, vol. 11, no. 1, pp. 37–68, 2002.

[21] W. Hoeffding, “Asymptotically optimal tests for multinomial distributions,” The Annals of Mathematical Statistics, vol. 36, no. 2, pp. 369–401, 1965.

[22] T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, Hoboken, NJ, 2nd edition, 2006.

[23] H.V. Poor, An Introduction to Signal Detection and Estimation, Springer Science & Business Media, New York, NY, 2nd edition, 1994.

[24] E.L. Lehmann and J.P. Romano, Testing Statistical Hypotheses, Springer Science & Business Media, New York, NY, 3rd edition, 2006.

[25] D. Hoffman, “I had a funny feeling in my gut,” Washington Post, p. A19, February 10, 1999.

[26] J.A. Hartigan, “A failure of likelihood asymptotics for normal mixtures,” in Proceedings of the Berkeley Conference in Honor of J. Neyman and J. Kiefer, 1985, 1985, pp. 807–810.

[27] L. Jager and J.A. Wellner, “Goodness-of-fit tests via phi-divergences,” The Annals of Statistics, pp. 2018–2053, 2007.

[28] E. Arias-Castro and M. Wang, “The sparse Poisson means model,” Electronic Journal of Statistics, vol. 9, no. 2, pp. 2170–2201, 2015.

[29] E. Arias-Castro, E.J. Candès, and Y. Plan, “Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism,” The Annals of Statistics, vol. 39, no. 5, pp. 2533–2556, 2011.

[30] C. Cachin, “An information-theoretic model for steganography,” Information and Computation, vol. 192, no. 1, pp. 41–56, 2004.

[31] P. Moulin and J.A. O’Sullivan, “Information-theoretic analysis of information hiding,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 563–593, 2003.

[32] A. Dembo and O. Zeitouni, Large Deviations: Techniques and Applications, vol. 38 of Applications of Mathematics, Springer, New York, 2nd corrected edition, 2010.

[33] J.G. Ligo, G.V. Moustakides, and V.V. Veeravalli, “Rate analysis for detection of sparse mixtures,” in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, Shanghai, China, 2016, IEEE, pp. 4244–4248.

[34] J.G. Ligo, G.V. Moustakides, and V.V. Veeravalli, “Detecting sparse mixtures: Rate of decay of error probability,” arXiv preprint arXiv:1509.07566, 2015.

[35] A.L. Gibbs and F.E. Su, “On choosing and bounding probability metrics,” International Statistical Review, vol. 70, no. 3, pp. 419–435, 2002.

[36] F. Den Hollander, Large Deviations, vol. 14, American Mathematical

[37] D.A. Darling and P. Erdős, “A limit theorem for the maximum of normalized sums of independent random variables,” Duke Math. J, vol. 23, no. 1, pp. 143–155, 1956.

[38] J.G. Ligo, G.V. Moustakides, and V.V. Veeravalli, “Detection of sparse mixtures: The finite alphabet case,” in Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 2016, IEEE, pp. 1243–1247.

[39] W.C.M. Kallenberg, “On moderate and large deviations in multinomial distributions,” The Annals of Statistics, pp. 1554–1580, 1985.

[40] A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishing, Norwell, MA, 1992.

[41] R.M. Gray and D.L. Neuhoff, “Quantization,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2325–2383, 1998.

[42] R.C. Gonzalez and R.E. Woods, Digital Image Processing, Prentice-Hall, Upper Saddle River, NJ, USA, 3rd edition, 2006.

[43] W.R. Bennett, “Spectra of quantized signals,” Bell Labs Technical Journal, vol. 27, no. 3, pp. 446–472, 1948.

[44] D.L. Neuhoff, “The other asymptotic theory of lossy source coding,” in Coding and Quantization, 1993, vol. 14 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 55–67.

[45] H.V. Poor and J.B. Thomas, “Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems,” IEEE Transactions on Communications, vol. 25, no. 9, pp. 893–900, 1977.

[46] P.K. Varshney, Distributed Detection and Data Fusion, Springer Science & Business Media, New York, NY, 1997.

[47] A. Kiely and M. Klimesh, “The ICER progressive wavelet image compressor,” IPN Progress Report, vol. 42, no. 155, pp. 1–46, 2003.

[48] H.S. Malvar, “Fast progressive wavelet coding,” in Data Compression Conference, 1999. Proceedings. DCC’99. IEEE, 1999, pp. 336–343.

[49] J. Unnikrishnan, D. Huang, et al., “Universal and composite hypothesis testing via mismatched divergence,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1587–1603, 2011.

[50] J.G. Ligo, G.V. Moustakides, and V.V. Veeravalli, “Sparse Gaussian mixture detection: Low complexity, high performance tests via quantization,” in IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 2017, IEEE, pp. 1272–1276.

[51] R. B. Ash, Information Theory, Dover, New York, corrected edition, 1990.

[52] G.R. Shorack and J.A. Wellner, Empirical Processes with Applications to Statistics, Classics in Mathematics. SIAM, Philadelphia, PA, 2009.

[53] A. DasGupta, Asymptotic Theory of Statistics and Probability, Springer Science & Business Media, New York, NY, 2008.

[54] T.W. Anderson and D.A. Darling, “Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes,” The Annals of Mathematical Statistics, pp. 193–212, 1952.

[55] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press, 2013.

[56] P. Hall and J. Jin, “Properties of higher criticism under strong dependence,” The Annals of Statistics, pp. 381–402, 2008.

[57] P. Hall and J. Jin, “Innovated higher criticism for detecting sparse signals in correlated noise,” The Annals of Statistics, vol. 38, no. 3, pp. 1686–1732, 2010.

[58] Y. Baryshnikov et al., “Types of Markov fields and tilings,” IEEE Transactions on Information Theory, vol. 62, no. 8, pp. 4361–4375, 2016.

[59] D.J. Hand, “Classifier technology and the illusion of progress,” Statistical Science, vol. 21, no. 1, pp. 1–14, 2006.

[60] J.H. Friedman and R.W. Gayler, “Comment: Classifier technology and the illusion of progress: Credit scoring,” Statistical Science, vol. 21, no. 1, pp. 15–21, 2006.

[61] J. Sharpnack, Graph Structured Normal Means Inference, Ph.D. thesis, Carnegie Mellon University, 2013.

[62] J. Sharpnack and E. Arias-Castro, “Exact asymptotics for the scan statistic and fast alternatives,” Electronic Journal of Statistics, vol. 10, no. 2, pp. 2641–2684, 2016.

[63] E. Arias-Castro, E.J. Candès, and A. Durand, “Detection of an anomalous cluster in a network,” The Annals of Statistics, vol. 39, no. 1, pp. 278–304, 2011.

[64] R. Durrett, Probability: Theory and Examples, Cambridge University Press, New York, NY, 4th edition, 2010.

[66] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2nd edition, 2011.

[67] M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover, New York, NY, 1972.

[68] L. Addario-Berry et al., “On combinatorial testing problems,” The Annals of Statistics, vol. 38, no. 5, pp. 3063–3092, 2010.

In document Detection of sparse mixtures: fundamental limits and algorithms (Page 131-139)