Bibliography - vazquez

Alelyani, S., Tang, J., & Liu, H. (2013). Feature selection for clustering: A review. Data Clustering: Algorithms and Applications, 29, 110-121.

de Amorim, R. C. (2016). A survey on feature weighting based K-Means algorithms. Journal of Classification, 33(2), 210-242.

Atkinson, A.C., Riani, M., & Cerioli, A. (2018) Cluster detection and clustering with random start forward searches, Journal of Applied Statistics, 45:5, 777-798, DOI: 10.1080/02664763.2017.1310806

Bair, E. (2013). Semi‐supervised clustering methods. Wiley Interdisciplinary Reviews: Computational Statistics, 5(5), 349-361.

Berkhin, P. (2006). A survey of clustering data mining techniques. (pp. 25-71). Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/3-540-28349-8_2

Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. KDD-2001: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery. pp. 245–250. doi:10.1145/502512.502546.

Bouveyron, C., Girard, S., & Schmid, C. (2007;2006;). High-dimensional data clustering. Computational Statistics and Data Analysis, 52(1), 502-519.

Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers and Electrical Engineering, 40(1), 16-28.

doi:10.1016/j.compeleceng.2013.11.024

Dash M., & Koot P.W. (2009) "Feature Selection for Clustering". In Encyclopedia of Database Systems (pp. 1119-1125). Springer, Boston, MA

Dash, M., & Liu, H. (2000, April). Feature selection for clustering. In Pacific-Asia Conference on knowledge discovery and data mining (pp. 110-121). Springer, Berlin, Heidelberg.

Dy, J. G., & Brodley, C. E. (2004). Feature selection for unsupervised learning. Journal of machine learning research, 5(Aug), 845-889.

Eftekhari, A., Babaie-Zadeh, M., & Abrishami Moghaddam, H. (2011). Two-dimensional random projection. Signal Processing, 91(7), 1589-1603.

doi:10.1016/j.sigpro.2011.01.002

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of machine learning research, 3(Mar), 1157-1182.

Han, J., Kamber, M., & Pei, J. (2011a). Cluster Analysis: Basic Concepts and Methods. In Data mining: Concepts and techniques 3rd edition. San Diego, CA, USA: Elsevier Science.

Han, J., Kamber, M., & Pei, J. (2011b). Cluster Analysis: Basic Concepts and Methods. In Data mining: Concepts and techniques 3rd edition. San Diego, CA, USA: Elsevier Science.

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507. doi:10.1126/science.1127647

Hunter, J.D. (2007) Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95, DOI:10.1109/MCSE.2007.55

Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651-666. doi:10.1016/j.patrec.2009.09.011

John, G. H., Kohavi, R., & Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Machine Learning Proceedings 1994 (pp. 121-129).

Jones, E., Oliphant, E., Peterson, P., et al. SciPy: Open Source Scientific Tools for Python, 2001-, http://www.scipy.org/

Kambhatla, N., & Leen, T. K. (1997). Dimension reduction by local principal component analysis. Neural Computation, 9(7), 1493-1516. doi:10.1162/neco.1997.9.7.1493 Kane, D. M., & Nelson, J. (2014). Sparser johnson-lindenstrauss transforms. Journal of

the Association for Computing Machinery, 61(1), 4.

Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial intelligence, 97(1-2), 273-324.

Kriegel, H., Kröger, P., & Zimek, A. (2009). Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1), 1-58.

doi:10.1145/1497577.1497578

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).

Lecun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. doi:10.1038/nature14539

Lu, Y., Cohen, I., Zhou, X. S., & Tian, Q. (2007). Feature selection using principal feature analysis. In Proceedings of the 15th ACM international conference on Multimedia (pp. 301-304). ACM.

Mckinney, W. (2010) Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, (pp. 51-56).

Melnic, M. L. (2017). How to strengthen customer loyalty, using customer segmentation? Bulletin of the Transilvania University of Brasov. Series V : Economic Sciences, 9(2), 51-60.

Milligan, G. W., & Cooper, M. C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5(2), 181-204.

doi:10.1007/BF01897163

Modha, D. S., & Spangler, W. S. (2003). Feature weighting in k-means clustering. Machine Learning, 52(3), 217-237. doi:10.1023/A:1024016609528

Oliphant, T.E. (2006) A guide to NumPy, USA: Trelgol Publishing.

Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: A review. ACM SIGKDD Explorations Newsletter, 6(1), 90-105.

doi:10.1145/1007730.1007731

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V, Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,

Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. Scikit-learn: Machine Learning in Python, 2010-, http://scikit-learn.org/

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V, Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,

Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2010) Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825- 2830.

Rezaeinia, S. M., Keramati, A., & Albadvi, A. (2012). An integrated AHP-RFM method to banking customer segmentation. International Journal of Electronic Customer Relationship Management, 6(2), 153-168. doi:10.1504/IJECRM.2012.048721 Rezaeinia, S. M., & Rahmani, R. (2016). Recommender system based on customer

segmentation (RSCS). Kybernetes, 45(6), 946-961. doi:10.1108/K-07-2014-0130 Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural

Networks, 61, 85-117. doi:10.1016/j.neunet.2014.09.003

Sembiring, R. W., & Zain, J. M. (2010). Cluster evaluation of density based subspace clustering. Journal of Computing, 2(11). Retrieved from

https://arxiv.org/pdf/1012.6009.pdf

Siniscalchi, S. M., Yu, D., Deng, L., & Lee, C. (2013). Exploiting deep neural networks for detection-based speech recognition. Neurocomputing, 106, 148-157.

doi:10.1016/j.neucom.2012.11.008

Song, M., Yang, H., Siadat, S. H., & Pechenizkiy, M. (2013). A comparative study of dimensionality reduction techniques to enhance trace clustering performances. Expert Systems with Applications, 40(9), 3722-3737.

doi:10.1016/j.eswa.2012.12.078

Tsiptsis, K. K., & Chorianopoulos, A. (2011). Data mining techniques in CRM: inside customer segmentation. John Wiley & Sons.

Upton, G., & Cook, I. (2014a). Distance measure. In A Dictionary of Statistics (3rd ed.). Oxford University Press.

Upton, G., & Cook, I. (2014b). Euclidean distance. In A Dictionary of Statistics (3rd ed.). Oxford University Press.

Upton, G., & Cook, I. (2014c). Overfitting. In A Dictionary of Statistics (3rd ed.). Oxford University Press.

Upton, G., & Cook, I. (2014d). Standardizing. In A Dictionary of Statistics (3rd ed.). Oxford University Press.

Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques (3rd ed.). San Diego, CA, USA: Elsevier Science.

Yu, W., Sun, X., Yang, K., Rui, Y., & Yao, H. (2018). Hierarchical semantic image matching using CNN feature pyramid. Computer Vision and Image

Understanding, 169, 40-51. doi:10.1016/j.cviu.2018.01.001

Xie, J., Girshick, R., & Farhadi, A. (2016, June). Unsupervised deep embedding for clustering analysis. In International conference on machine learning (pp. 478- 487).

Zhang, J., Wu, X., Zhu, J., & Hoi, S. C. H. (2017). Feature agglomeration networks for single stage face detection. arXiv:1712.00721

Glossary

Term Definition

artificial neural network (ANN)

a system that attempts to solve problems by pattern recognition, mimicking the activities and

representation of neuronal networks like those in the human brain

attribute subset selection (or feature selection or variable elimination)

a process of reducing the size of a dataset by removing irrelevant or redundant attributes

clustering (or cluster analysis)

a type of data mining task whose goal is to discover natural groupings (clusters) in a data set, as well as the process of building the models that represent these groupings

clustering tendency

refers to whether the dataset lends itself to being clustered; datasets that have a random structure are unlikely to cluster, whereas data with a non-random structure have a higher probability of clustering in meaningful ways

coclustering (or biclustering) refers to a clustering method in which both data points and their attributes are clustering simultaneously correlation-based clustering refers to a clustering method in which clusters are

defined by advanced correlation methods cross-validation

refers to the process of using involves a training set to training multiple models and comparing the quality of the models by their performance on a test set

customer segmentation the process of dividing the customer base into distinct and internally homogenous groups

curse of dimensionality

the idea that the large number of input variables and high volume of data may muddle the analysis and result in a model that overfits the data

data mining

the process of discovering (or extracting) previously unknown, interesting patterns in large datasets, largely automatically with the help of computers

deep learning

refers to non-task-specific machine learning methods whose goals are to learn about the underlying data representations with multiple levels of abstraction deep neural network refers to an artificial neural network with multiple hidden layers between the input and output layers density-based clustering

refers to a clustering method that will continue to grow a cluster as long as the density (the number of objects) in the cluster exceeds some threshold

distance measure (or distance metric)

a quantitative measurement of the distance between the points in multidimensional space

Euclidean distance the “straight-line" distance between two points feature agglomeration

refers to the dimensionality reduction method that involves recursively merging features using a predefined calculation (e.g., mean) until the

predetermined number of feature clusters is reached feature construction

a step of data analysis that involves making decisions about which characteristics of the data to include and the best ways to represent these characteristics grid-based clustering

refers to a clustering method that renders the object space into a grid composed of a finite number of cells; the clustering operations are performed on the grid space

hard clustering refers to a clustering model in which clusters are mutually exclusive

hierarchical clustering refers to a clustering method that creates a hierarchical decomposition of the set of data objects

Hopkins statistic a statistic that can be used to measure the clustering tendency of a dataset

inter-cluster variance variance of the data points in between clusters intra-cluster variance (or

within-cluster variance) variance of the data points in a cluster overfit

a model that overfits the data is unnecessarily

complicated and will likely perform poorly when faced with new data

principal component analysis (PCA)

refers to a linear transformation of a set of input data in to an equal number of linearly-uncorrelated variables (principal components), that each successively count for the largest possible portion of remaining data variance

random projection (RP)

refers to the transformation of a set of input data onto a lower-dimensional space while preserving the pairwise Euclidean distances

projected clustering

refers to a clustering method in which each data point is assigned to exactly one subspace cluster, and then a projection for where the points best cluster is found similarity measure

a metric used to quantify the distance between two data points and to determine how similar or dissimilar clusters are

semi-supervised learning

machine learning task in which the data used for analysis are partially labeled or when there are known constraints on how the data should be clustered soft clustering (or fuzzy

clustering)

refers to a clustering model in which data points are assigned probabilities of belonging to one cluster over another

standardization

the process of transforming the values of a variable such that the mean is centered at 0 with a standard deviation of 1

subspace clustering refers to a clustering method in which clusters can exist in multiple, possibly overlapping subspaces

supervised learning machine learning task in which the data used for analysis are not labeled

unsupervised learning machine learning task in which the data used for analysis are not labeled

In document vazquez_final.pdf (Page 45-53)