The rearrangement of the Conditional Mutual Information criterion (Fleuret, 2004) follows a very similar procedure. The original, and its rewriting are,
Jcmim(Xk) = min Xj∈S h I(Xk;Y|Xj) i = min Xj∈S h I(Xk;Y)−I(Xk; Xj) +I(Xk; Xj|Y) i = I(Xk;Y) +min Xj∈S h I(Xk; Xj|Y)−I(Xk; Xj) i = I(Xk;Y)−max Xj∈S h I(Xk; Xj)−I(Xk; Xj|Y) i ,
which is exactly Equation (19). References
K. S. Balagani and V. V. Phoha. On the feature selection criterion based on an approximation of multidimensional mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1342–1343, 2010. ISSN 0162-8828.
R. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4):537–550, 1994.
G. Brown. A new perspective for information theoretic feature selection. In International Confer- ence on Artificial Intelligence and Statistics, volume 5, pages 49–56, 2009.
H. Cheng, Z. Qin, C. Feng, Y. Wang, and F. Li. Conditional mutual information-based feature selection analyzing for synergy and redundancy. Electronics and Telecommunications Research Institute (ETRI) Journal, 33(2), 2011.
D. M. Chickering, D. Heckerman, and C. Meek. Large-sample learning of bayesian networks is np-hard. Journal of Machine Learning Research, 5:1287–1330, 2004.
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience New York, 1991.
J. Demˇsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learn- ing Research, 7:1–30, 2006.
J. P. Dmochowski, P. Sajda, and L. C. Parra. Maximum likelihood in cost-sensitive learning: model specification, approximations, and upper bounds. Journal of Machine Learning Research, 11: 3313–3332, 2010.
W. Duch. Feature Extraction: Foundations and Applications, chapter 3, pages 89–117. Studies in Fuzziness & Soft Computing. Springer, 2006. ISBN 3-540-35487-5.
A. El Akadi, A. El Ouardighi, and D. Aboutajdine. A powerful feature selection approach based on mutual information. International Journal of Computer Science and Network Security, 8(4):116, 2008.
R. M. Fano. Transmission of Information: Statistical Theory of Communications. New York: Wiley, 1961.
F. Fleuret. Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research, 5:1531–1555, 2004.
C. Fonseca and P. Fleming. On the performance assessment and comparison of stochastic multiob- jective optimizers. Parallel Problem Solving from Nature, pages 584–593, 1996.
D. Grossman and P. Domingos. Learning bayesian network classifiers by maximizing conditional likelihood. In International Conference on Machine Learning. ACM, 2004.
G. Gulgezen, Z. Cataltepe, and L. Yu. Stable and accurate feature selection. Machine Learning and Knowledge Discovery in Databases, pages 455–468, 2009.
B. Guo and M. S. Nixon. Gait feature subset selection by mutual information. IEEE Trans Systems, Man and Cybernetics, 39(1):36–46, January 2009.
I. Guyon. Design of experiments for the NIPS 2003 variable selection benchmark. http://www.nipsfsc.ecs.soton.ac.uk/papers/NIPS2003-Datasets.pdf, 2003.
I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, editors. Feature Extraction: Foundations and Applications. Springer, 2006. ISBN 3-540-35487-5.
M. Hellman and J. Raviv. Probability of error, equivocation, and the chernoff bound. IEEE Trans- actions on Information Theory, 16(4):368–372, 1970.
A. Jakulin. Machine Learning Based on Attribute Interactions. PhD thesis, University of Ljubljana, Slovenia, 2005.
A. Kalousis, J. Prados, and M. Hilario. Stability of feature selection algorithms: a study on high- dimensional spaces. Knowledge and information systems, 12(1):95–116, 2007. ISSN 0219-1377. R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial intelligence, 97(1-2):
D. Koller and M. Sahami. Toward optimal feature selection. In International Conference on Ma- chine Learning, 1996.
K. Korb. Encyclopedia of Machine Learning, chapter Learning Graphical Models, page 584. Springer, 2011.
L. I. Kuncheva. A stability index for feature selection. In IASTED International Multi-Conference: Artificial Intelligence and Applications, pages 390–395, 2007.
N. Kwak and C. H. Choi. Input feature selection for classification problems. IEEE Transactions on Neural Networks, 13(1):143–159, 2002.
D. D. Lewis. Feature selection and feature extraction for text categorization. In Proceedings of the workshop on Speech and Natural Language, pages 212–217. Association for Computational Linguistics Morristown, NJ, USA, 1992.
D. Lin and X. Tang. Conditional infomax learning: An integrated framework for feature extraction and fusion. In European Conference on Computer Vision, 2006.
P. Meyer and G. Bontempi. On the use of variable complementarity for feature selection in cancer classification. In Evolutionary Computation and Machine Learning in Bioinformatics, pages 91– 102, 2006.
P. E. Meyer, C. Schretter, and G. Bontempi. Information-theoretic feature selection in microarray data using variable complementarity. IEEE Journal of Selected Topics in Signal Processing, 2(3): 261–274, 2008.
T. Minka. Discriminative models, not discriminative training. Microsoft Research Cambridge, Tech. Rep. TR-2005-144, 2005.
L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, 2003. ISSN 0899-7667.
H. Peng, F. Long, and C. Ding. Feature selection based on mutual information: Criteria of max- dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226–1238, 2005.
C. E. Shannon. A mathematical theory of communication. Bell Systems Technical Journal, 27(3): 379–423, 1948.
M. Tesmer and P. A. Estevez. Amifs: Adaptive feature selection by using mutual information. In IEEE International Joint Conference on Neural Networks, volume 1, 2004.
I. Tsamardinos and C. F. Aliferis. Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2003.
I. Tsamardinos, C. F. Aliferis, and A. Statnikov. Algorithms for large scale markov blanket discov- ery. In 16th International FLAIRS Conference, volume 103, 2003.
M. Vidal-Naquet and S. Ullman. Object recognition with informative features and linear classifica- tion. IEEE Conference on Computer Vision and Pattern Recognition, 2003.
J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature selection for svms. Advances in Neural Information Processing Systems, pages 668–674, 2001. ISSN 1049- 5258.
H. Yang and J. Moody. Data visualization and feature selection: New algorithms for non-gaussian data. Advances in Neural Information Processing Systems, 12, 1999.
L. Yu and H. Liu. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5:1205–1224, 2004.
L. Yu, C. Ding, and S. Loscalzo. Stable feature selection via dense feature groups. In Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 803–811, 2008.