Glossary
Term Definition
artificial neural network (ANN)
a system that attempts to solve problems by pattern recognition, mimicking the activities and representation of neuronal networks like those in the human brain
attribute subset selection (or feature selection or variable elimination)
a process of reducing the size of a dataset by removing irrelevant or redundant attributes
clustering (or cluster analysis)
a type of data mining task whose goal is to discover natural groupings (clusters) in a data set, as well as the process of building the models that represent these groupings
clustering tendency
refers to whether the dataset lends itself to being clustered; datasets that have a random structure are unlikely to cluster, whereas data with a non-random structure have a higher probability of clustering in meaningful ways
coclustering (or biclustering)
refers to a clustering method in which both data points and their attributes are clustered simultaneously
correlation-based clustering
refers to a clustering method in which clusters are defined by advanced correlation methods
cross-validation
refers to the process of using a training set to train multiple models and comparing the quality of the models by their performance on a test set
customer segmentation
the process of dividing the customer base into distinct and internally homogeneous groups
curse of dimensionality
the idea that the large number of input variables and high volume of data may muddle the analysis and result in a model that overfits the data
data mining
the process of discovering (or extracting) previously unknown, interesting patterns in large datasets, largely automatically with the help of computers
deep learning
refers to non-task-specific machine learning methods whose goals are to learn about the underlying data representations with multiple levels of abstraction
deep neural network
refers to an artificial neural network with multiple hidden layers between the input and output layers
density-based clustering
refers to a clustering method that will continue to grow a cluster as long as the density (the number of objects) in the cluster exceeds some threshold
distance measure (or distance metric)
a quantitative measurement of the distance between the points in multidimensional space
Euclidean distance
the "straight-line" distance between two points
feature agglomeration
refers to the dimensionality reduction method that involves recursively merging features using a predefined calculation (e.g., mean) until the predetermined number of feature clusters is reached
feature construction
a step of data analysis that involves making decisions about which characteristics of the data to include and the best ways to represent these characteristics
grid-based clustering
refers to a clustering method that renders the object space into a grid composed of a finite number of cells; the clustering operations are performed on the grid space
hard clustering
refers to a clustering model in which clusters are mutually exclusive
hierarchical clustering
refers to a clustering method that creates a hierarchical decomposition of the set of data objects
Hopkins statistic
a statistic that can be used to measure the clustering tendency of a dataset
inter-cluster variance
variance of the data points in between clusters
intra-cluster variance (or within-cluster variance)
variance of the data points in a cluster
overfit
a model that overfits the data is unnecessarily complicated and will likely perform poorly when faced with new data
principal component analysis (PCA)
refers to a linear transformation of a set of input data into an equal number of linearly uncorrelated variables (principal components), each of which successively accounts for the largest possible portion of the remaining data variance
random projection (RP)
refers to the transformation of a set of input data onto a lower-dimensional space while approximately preserving the pairwise Euclidean distances
projected clustering
refers to a clustering method in which each data point is assigned to exactly one subspace cluster, and then the projection in which those points cluster best is found
similarity measure
a metric used to quantify the distance between two data points and to determine how similar or dissimilar clusters are
semi-supervised learning
machine learning task in which the data used for analysis are partially labeled or in which there are known constraints on how the data should be clustered
soft clustering (or fuzzy clustering)
refers to a clustering model in which data points are assigned probabilities of belonging to one cluster over another
standardization
the process of transforming the values of a variable such that the mean is centered at 0 with a standard deviation of 1
subspace clustering
refers to a clustering method in which clusters can exist in multiple, possibly overlapping subspaces
supervised learning
machine learning task in which the data used for analysis are labeled
unsupervised learning
machine learning task in which the data used for analysis are not labeled
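
The glossary entries for standardization and Euclidean distance describe simple calculations, so a small illustration may help. The following is a minimal sketch in Python using only the standard library. The function names and the sample values are hypothetical, and the population form of the standard deviation is an assumption, since the glossary does not say which form is meant.

```python
import math
import statistics


def standardize(values):
    """Rescale values so that their mean is 0 and standard deviation is 1."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a variable with zero variance")
    return [(v - mean) / std for v in values]


def euclidean_distance(p, q):
    """Return the straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical values for a single variable, for illustration only.
ages = [20, 30, 40, 50, 60]
z_ages = standardize(ages)

# Distance between the origin and the point (3, 4) in two dimensions.
d = euclidean_distance((0, 0), (3, 4))
```

Standardizing each variable before computing distances keeps a variable measured on a large scale from dominating the Euclidean distance between data points.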