t-Distributed Stochastic Neighbor Embedding (t-SNE), curvilinear component analysis (CCA), Maximum Variance Unfolding (MVU), Schroedinger Eigenmaps (SE) and Spatial Spectral Schroedinger Eigenmaps (SSSE). However, nearly all of these nonlinear DR methods have been applied only to small images or tiles. Two of the greatest barriers to effective use of nonlinear DR methods in HSI processing are their computational complexity and memory requirements. Fong shows that LLE, LTSA and LLTSA are incapable of handling HSI with more than 70 × 70 pixels; for SPE, KPCA and CFA, this limit drops to 50 × 50 pixels. Although that study is now over ten years old, its conclusions have not changed dramatically in the last decade. In attempting to run various nonlinear DR algorithms on the 512 × 217 pixel Salinas image on a modern desktop computer (AMD FX-6300 Six-Core Processor, 24 GB memory), we ran out of memory when attempting to perform LLE, ISOMAP, LTSA and t-SNE. LE runs successfully under the constraint that only a small number (20) of neighbors can be used to construct the graph; however, the accuracy of a subsequent random forest classifier is worse than that achieved with PCA dimension reduction. The SSSE and KPCA DR algorithms also run successfully on the same desktop computer, and subsequent random forest classification is superior to that obtained when PCA is used for DR; however, the computation time is substantial: 1,716 seconds for SSSE and 432,173 seconds (more than 5 days) for KPCA.
Isometric Mapping (ISOMAP) is a data-driven nonlinear dimensionality reduction method, or manifold learning method, that describes the nonlinear variations of the data. It uses a k-Nearest Neighbor (kNN) graph to estimate the manifold, i.e., the nonlinear geometric structure of the data, and Classical Multidimensional Scaling (CMDS) to construct manifold coordinates that are sorted by the variance of the data on the manifold. It thereby overcomes the major limitation of linear dimensionality reduction methods, which cannot capture such nonlinear structure. However, it requires an eigendecomposition of the proximity matrix of all data points, which is often impossible for large remotely sensed hyperspectral images. ENH-ISOMAP, designed for hyperspectral imagery, addresses this problem by adopting the landmark ISOMAP algorithm, which uses a subset of data points to estimate the whole manifold, along with other optimizations such as backbone reconstruction and efficient data structures and algorithm implementations for nearest neighbor search and shortest path search. This makes ENH-ISOMAP a practical manifold learning algorithm for typical hyperspectral imagery, with better performance than MNF, especially for nonlinear signals.
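The plain (non-landmark) ISOMAP pipeline described above can be sketched in a few lines of numpy; this is a minimal illustration for small datasets, assuming a connected kNN graph, and it deliberately uses the simple Floyd–Warshall shortest-path step rather than ENH-ISOMAP's optimized search structures:

```python
import numpy as np

def isomap(X, n_neighbors=6, n_components=2):
    """Minimal ISOMAP: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # kNN graph: keep only edges to the k nearest neighbors, symmetrized
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, idx[i]] = D[i, idx[i]]
        G[idx[i], i] = D[i, idx[i]]
    # all-pairs shortest paths (Floyd-Warshall) estimate geodesic distances
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    # Classical MDS: double-center the squared geodesic distances
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (G ** 2) @ H
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]   # largest eigenvalues first
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```

The quadratic memory of `D` and `G` is exactly the bottleneck that motivates the landmark variant for full hyperspectral scenes.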
Bootstrap sampling prior to classification might improve classification accuracy [45,46]. The RF algorithm builds many classification trees, and each tree is trained on a bootstrap sample of the training data. Because the RF algorithm performs this bootstrapping internally, we did not apply bootstrapping separately to the training polygons. In addition to the factors mentioned above, a visual comparison of the results showed that most misclassifications occurred in shadowed areas and where the two artificial black lines existed. Handling shadowed and noisy areas (or other abnormalities in an image) prior to classification might have improved the classification results. Moreover, the pixel-level classification results could be enhanced by post-processing techniques; however, to avoid losing tree classes, no post-processing strategy was applied in this study.
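The internal bootstrapping that RF performs per tree is simple to illustrate; the sketch below is a generic illustration (the helper name `bootstrap_sample` is ours, not from any RF library), and it also exposes the out-of-bag rows that RF uses for its internal error estimate:

```python
import numpy as np

def bootstrap_sample(n, rng):
    """Draw one bootstrap sample as RF does per tree: n row indices drawn
    with replacement; the rows never drawn are the out-of-bag (OOB) set."""
    idx = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), idx)
    return idx, oob
```

On average a bootstrap sample contains about 63.2% (1 − 1/e) of the distinct training rows, so each tree already sees a resampled view of the data, which is why an extra external bootstrap over the training polygons adds little.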
As the radiometric resolution of the image, expressed as the number of bits per pixel, increases, lossy approaches obtain better results than lossless techniques in terms of the quality of the reconstructed images. In the literature, several lossy approaches have been proposed for the compression of HS images (Abousleman et al., 1995; Conoscenti, Coppola, & Magli, 2016; Fowler et al., 2007; Karami, Heylen, & Scheunders, 2015; Kulkarni et al., 2006). Many of these techniques are based on decorrelation transforms, in order to exploit both spatial and spectral correlations, followed by a quantization stage and an entropy coder. In particular, these approaches combine a 1-D spectral decorrelator, such as the Principal Component Analysis (PCA) transform, the Discrete Wavelet Transform (DWT), or the Discrete Cosine Transform (DCT), with a spatial decorrelator (Abrardo, Barni, & Magli, 2010; Christophe, Mailhes, & Duhamel, 2008; Kaarna et al., 2000). It is not difficult to see that the spectral decorrelation stage plays a critical role in effective HS compression. Wavelet-based techniques include the 3D extensions of JPEG2000, SPIHT, and SPECK (Kim, Xiong, & Pearlman, 2000; Penna et al., 2006a; Tang et al., 2005). These approaches can be seen as direct 3D extensions of approaches designed for 2D imagery, where a 1D
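The spectral-decorrelation-plus-quantization scheme can be sketched as follows; this is a toy illustration of the PCA-based spectral stage only, assuming a cube shaped (rows, cols, bands), with the spatial decorrelator and entropy coder omitted and a plain uniform quantizer standing in for the real rate-allocation stage:

```python
import numpy as np

def spectral_pca_compress(cube, n_pcs=3, step=0.05):
    """Lossy sketch: PCA spectral decorrelation, uniform quantization of the
    retained component scores, then reconstruction back to the band space."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)
    Xc = X - mu
    # eigenvectors of the spectral covariance = principal components
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    w, V = np.linalg.eigh(cov)
    V = V[:, np.argsort(w)[::-1][:n_pcs]]
    scores = Xc @ V                       # decorrelated "spectral" bands
    q = np.round(scores / step)           # uniform quantizer (entropy coder omitted)
    recon = (q * step) @ V.T + mu
    return recon.reshape(rows, cols, bands)
```

When the spectral correlation is strong (low effective spectral rank), a handful of quantized components reconstructs the cube almost exactly, which is why the quality of the spectral decorrelator dominates overall performance.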
In addition, it is almost impossible to obtain labeled training samples of every class present in a hyperspectral imagery scene. In other words, we may have no training sample corresponding to a given query pixel at all. In that case, any classification of the query pixel is wrong. Hence, before classifying a given query pixel, we must first decide whether it is a valid sample from one of the classes in the HSI data set. The ability to detect and then reject invalid test pixels is important for the hyperspectral classification task. Conventional classifiers such as nearest neighbor (NN) and nearest subspace (NS) usually use the representation error for validation. However, with an over-complete dictionary, the smallest representation error of an invalid test pixel is not necessarily large; an invalid pixel may even have as small a representation error as valid pixels, leading to inaccurate validation. Since the coefficients are computed globally in the sparse representation scheme, the distribution of the coefficients contains important information about the validity of the query pixel. In detail, a valid pixel has a sparse coding vector whose nonzero entries concentrate on one class, while an invalid pixel has sparse coefficients spread widely across the entire training set. However, in dense CR, since samples from different classes share similarity and all the training samples participate in the representation, the coefficients of both valid and invalid samples spread widely across the entire training set. This weakens the validation ability of the coefficients. Therefore, we note that the sparsity constraint contributes to validation as well. What calls for special attention is that the coefficient computation must be interpreted as ℓ0-norm minimization within a certain
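The concentration of coefficients on one class can be quantified; the sketch below computes a measure in the spirit of the sparsity concentration index used in sparse-representation classification (the function name and the choice of the ℓ1 mass per class are our illustrative assumptions):

```python
import numpy as np

def sparsity_concentration_index(coeffs, labels):
    """Returns a score in [0, 1]: near 1 when the coefficient mass
    concentrates on a single class (likely a valid pixel), near 0 when it
    spreads evenly over all classes (likely an invalid / outlier pixel)."""
    classes = np.unique(labels)
    k = len(classes)
    total = np.abs(coeffs).sum()
    if total == 0:
        return 0.0
    # largest fraction of the l1 mass captured by any single class
    best = max(np.abs(coeffs[labels == c]).sum() for c in classes) / total
    return (k * best - 1.0) / (k - 1.0)
```

Thresholding this score gives the validation step described above: pixels below the threshold are rejected before any class label is assigned.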
The basic LBP operator is a grayscale-invariant pattern measure characterising the texture in images (Maenpaa et al.). In this method, texture is defined using local patterns at the pixel level. Each pixel is labelled with the code of the texture primitive that best matches its local neighbourhood. First, the centre pixel value is taken as a threshold and n neighbours of the pixel are selected. Then, each neighbouring pixel is assigned a weight based on its position, and these weights are multiplied by the thresholded values and summed to generate the basic LBP code. In circular LBP (which is used in our algorithm), symmetric neighbours on a circle of a particular radius are used. Normally 8, 12 or 16 neighbours are used. With 8 neighbours, 59 parameters represent the local variations in a region, whereas 12 and 16 neighbours result in 133 and 243 parameters, respectively. By using this highly unifying LBP approach, the texture in biopsy samples distinguishes benign tissue samples from malignant samples.
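The threshold-and-weight construction of the basic 8-neighbour LBP code can be written compactly; this is a sketch of the square 3×3 variant (not the circular, interpolated one used in the algorithm above), with the function name chosen for illustration:

```python
import numpy as np

def lbp_basic(img):
    """Basic 3x3 LBP: threshold the 8 neighbours at the centre value and
    weight the binary results by powers of two, giving an 8-bit code per
    interior pixel."""
    # neighbour offsets in clockwise order from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= centre).astype(np.uint8) << bit   # weight = 2**bit
    return codes
```

A histogram of these codes over a region (restricted to the 58 "uniform" patterns plus one bin for the rest, hence the 59 parameters mentioned above) is what serves as the texture descriptor.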
[4-17]). Several studies at higher spectral resolution (e.g., 60 channels in [18,19]) used synthetic data, which often favor a particular classifier (such as maximum likelihood) by virtue of the (Gaussian) data construction. Others offered some principled dimensionality reduction and showed high accuracies with the reduced number of bands for a moderate number of classes (e.g., [20-22]). Some research targeted selected narrow spectral windows of hyperspectral data to classify one specific important spectral feature. A small number of ANN works classified hyperspectral data directly, without prior dimensionality reduction [24-26]. Experience suggests that the difference in quality between the performance of classical methods and ANN classifiers increases in favor of the ANNs with an increasing number of channels. However, this has not yet been quantified for large-scale classification of many cover types with subtle differences in complex, noisy hyperspectral patterns. Assessment of ANN performance versus conventional methods for realistic, advanced remote sensing situations requires comparisons using the full spectral resolution of real hyperspectral data with many cover classes, because conventional techniques are most likely to reach their limitations in such circumstances. Systematic evaluation is needed to ensure powerful, reliable, automated applications of ANNs or any other classifiers. The present paper is a step toward filling this gap.
and Principal Component Analysis (PCA). We use PCA in two different ways. First, PCA is applied to the hyperspectral bands only, and the first few PCs are added as additional features. Second, PCA is applied to the whole feature vector combining hyperspectral and LiDAR features, as in Luo et al. (2016). The former use of PCA provides higher classification accuracies. We also measure the classification accuracies of our feature combination and the feature combination proposed by Luo et al. (2016) when applying PCA to the whole feature vector. With the same number of PCs, our feature combination achieves higher classification accuracies than the existing one when using the decision tree. • Our method for classifying land cover classes does not depend on any prior knowledge such as road width or tree height. It can be used on other datasets without the adjustments required by some existing methods (Man et al., 2015).
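The two uses of PCA can be sketched side by side; this is a generic numpy illustration (function names and the toy feature shapes are our assumptions, not the paper's code):

```python
import numpy as np

def pca_components(X, k):
    """Project centred data onto its first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def features_pca_on_bands(hsi, lidar, k=3):
    """Variant 1: keep all original features, append the first k PCs
    computed from the hyperspectral bands only."""
    return np.column_stack([hsi, lidar, pca_components(hsi, k)])

def features_pca_on_all(hsi, lidar, k=3):
    """Variant 2 (Luo et al.-style): PCA on the whole stacked vector,
    keeping only the k PCs as the feature set."""
    return pca_components(np.column_stack([hsi, lidar]), k)
```

The key structural difference is visible in the output widths: variant 1 augments the feature vector, while variant 2 replaces it with the low-dimensional projection.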
The study shows that the variable importance in projection (VIP) method can be used to identify the wavebands that are the most important predictor variables in the hyperspectral classification of grassland age-classes. The accuracy of a partial least squares classification based on a subset of 177 wavebands, identified with the help of the VIP approach as those most important for discriminating between successional stages, was 85% (8% higher than for a classification based on the full set of 269 bands). Among the 177 hyperspectral wavebands that gave the most efficient discrimination between grassland age-classes, 50 were located in the visible region (414–716 nm), 79 in the red-edge to near-infrared regions (722–1394 nm), and 48 in the shortwave infrared region (1448–2417 nm) of the electromagnetic spectrum. The fact that the best wavebands for discriminating between grassland age-classes fell within the operating ranges of both the HySpex VNIR-1600 spectrometer (414 to 991 nm) and the HySpex SWIR-320m-e spectrometer (966 to 2501 nm) suggests that data from specific wavebands covering the full 400–2500 nm spectral range are likely to provide the best classification of grassland successional stages. Our results also show that the partial least squares-based classification procedure is a suitable method for the classification of grassland successional stages, allowing a large number of hyperspectral wavebands to be compressed into a few latent variables while decreasing the risk of model overfitting. In our study, the first four latent variables explained approximately 97% of the variation in the spectral data.
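A minimal sketch of how VIP scores are obtained from a PLS model is given below, using a small NIPALS-style PLS1 fit (single response) written from scratch; the function name and the toy setup are our illustration, not the study's software:

```python
import numpy as np

def pls1_vip(X, y, n_comp=2):
    """Fit PLS1 via NIPALS and return the VIP score per predictor.
    VIP_j > 1 is the common rule of thumb for selecting important bands."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    W, T, Q = [], [], []
    Xa, ya = X.copy(), y.copy()
    for _ in range(n_comp):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)            # unit weight vector
        t = Xa @ w                        # scores
        q = (t @ ya) / (t @ t)            # y-loading
        pl = Xa.T @ t / (t @ t)           # X-loading
        Xa = Xa - np.outer(t, pl)         # deflate X
        ya = ya - q * t                   # deflate y
        W.append(w); T.append(t); Q.append(q)
    W = np.array(W).T                     # p x A weight matrix
    T = np.array(T).T
    Q = np.array(Q)
    ss = (Q ** 2) * (T ** 2).sum(axis=0)  # y-variance explained per component
    return np.sqrt(p * (W ** 2 * ss).sum(axis=1) / ss.sum())
```

Bands whose VIP exceeds 1 contribute more than average to the latent variables that explain the response, which is the criterion behind the 177-band subset above.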
Gidudu Anthony et al. performed multiple classification tasks using Support Vector Machines. The approaches used were the One-Against-One (1A1) and One-Against-All (1AA) techniques for classifying the multiple land covers present in remotely sensed data. The authors conclude that the 1AA approach to multiclass classification exhibits a higher propensity for mixed pixels than the 1A1 approach. The two approaches were compared across four SVM kernels (linear, quadratic, polynomial and RBF) using Kappa coefficients. Classification accuracy decreased for the linear and RBF kernels, stayed the same for the polynomial kernel, and increased for the quadratic kernel. It can therefore be concluded that whereas one can be certain of high classification results with the 1A1 approach, the 1AA approach yields approximately as good classification accuracies. The choice of which approach to adopt therefore becomes a matter of preference.
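The structural difference between 1A1 and 1AA is easy to see in code; in this sketch, simple least-squares linear classifiers stand in for the SVMs (an assumption made to keep the example self-contained), since the decomposition schemes are independent of the base binary classifier:

```python
import numpy as np

def fit_binary(X, y):
    """Least-squares linear classifier; y in {-1, +1}. Stand-in for an SVM."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def score(X, w):
    return np.column_stack([X, np.ones(len(X))]) @ w

def one_vs_one(Xtr, ytr, Xte):
    """1A1: one binary classifier per class pair, prediction by voting."""
    classes = np.unique(ytr)
    votes = np.zeros((len(Xte), len(classes)))
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            mask = (ytr == classes[i]) | (ytr == classes[j])
            yb = np.where(ytr[mask] == classes[i], 1.0, -1.0)
            s = score(Xte, fit_binary(Xtr[mask], yb))
            votes[s > 0, i] += 1
            votes[s <= 0, j] += 1
    return classes[votes.argmax(axis=1)]

def one_vs_all(Xtr, ytr, Xte):
    """1AA: one classifier per class against the rest, prediction by max score."""
    classes = np.unique(ytr)
    scores = np.column_stack([
        score(Xte, fit_binary(Xtr, np.where(ytr == c, 1.0, -1.0)))
        for c in classes])
    return classes[scores.argmax(axis=1)]
```

1A1 trains k(k−1)/2 classifiers on balanced pairwise subsets, while 1AA trains k classifiers on imbalanced one-vs-rest splits, which is one reason the two schemes can behave differently on mixed pixels.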
Experimental comparisons of the methods: In addition to the theoretical specificities of the approaches, the experimental results remain a major selection criterion. A meta-analysis has been conducted to identify the most significant algorithm performances from the numerical comparisons inventoried in the publications. The results depend on both the quality measure used and the main information guiding the dimensionality reduction (feature space vs. label space, or combined label and feature space). Three methods based on feature space reduction, MVMD, SSMDDM and MDDM, dominate for the two uncorrelated retained measures (Hamming Loss and a selected measure among a large set of correlated ones including Micro F1, Macro F1 and AUC). For the latter, the results also highlight SLEEC and REML, which are recent approaches especially designed for extreme multi-label learning. A dual examination of domination relationships completes the analysis by pointing out the methods dominated for both measures. However, from a methodological point of view, the generalization of these conclusions should be considered cautiously. As numerous pairwise comparisons are absent from the published experiments, the meta-analysis has been computed on an incomplete graph. Moreover, the heterogeneity of the datasets used in the different studies and of the number of times each algorithm was evaluated adds biases to the comparisons. However, despite these limitations, we believe that this first meta-analysis can help identify recurrent properties in the most efficient approaches and also flaws in the experimental protocols (e.g. the lack of some pairwise comparisons). More
Since each term in the HTML tags of a web page can be taken as a feature, web page classification suffers from high dimensionality. To reduce this dimensionality, Selma Ayse Özel (2009) proposed an optimal feature selection technique based on a genetic algorithm (GA). The performance of this method was compared with the J48 (decision tree), Naïve Bayes Multinomial (Bayes), and IBk (kNN) classifiers, achieving 96% accuracy with the GA as feature selector. In this method, the number of features considered is large, up to 50,000: the system takes both terms and HTML tags on a web page together as features and assigns each feature a weight determined by the GA. After extracting features, document vectors for the web pages are created by counting the occurrences of each feature in the associated HTML tag of each web page. The GA feature selector consists of coding, generation of an initial population, evaluation of a population, reproduction, crossover, mutation, and determination of the new generation; the reproduction, crossover, and mutation steps are repeated for the specified number of generations until an optimal feature vector is found.
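The GA loop just described (coding, initial population, evaluation, selection, crossover, mutation, new generation) can be sketched as a toy bit-mask feature selector; the fitness function here is a simple correlation with the target, a deliberately cheap stand-in for the classifier-accuracy fitness used in the actual method, and all names are our own:

```python
import numpy as np

def ga_feature_select(X, y, pop=20, gens=30, seed=0):
    """Toy GA feature selector: chromosomes are boolean feature masks."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    P = rng.random((pop, n_feat)) < 0.5          # initial population (coding)

    def fitness(mask):
        # stand-in for classifier accuracy: |corr| of selected-feature mean with y
        if mask.sum() == 0:
            return -1.0
        return abs(np.corrcoef(X[:, mask].mean(axis=1), y)[0, 1])

    for _ in range(gens):
        f = np.array([fitness(m) for m in P])    # evaluation
        P = P[np.argsort(f)[::-1]]               # selection: keep fittest first
        children = []
        while len(children) < pop // 2:
            a, b = P[rng.integers(0, pop // 2, 2)]   # parents from the top half
            cut = rng.integers(1, n_feat)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            children.append(child ^ (rng.random(n_feat) < 0.05))  # mutation
        P = np.vstack([P[:pop - len(children)], *children])       # new generation
    f = np.array([fitness(m) for m in P])
    return P[f.argmax()]
```

Elitism (keeping the top half unchanged) guarantees the best mask found so far is never lost between generations.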
in optimizing the features and sustainable during the process of decision making for tumour cells. The proposed system uses PCA for feature selection and an Artificial Neural Network (ANN) for classification to improve detection accuracy. The Scree test and cumulative variance are the rules used in the PCA. After feature selection, the reduced data are passed to a back-propagation ANN to distinguish benign from cancerous data. Another proposed system uses a genetic algorithm for feature selection, which helps identify the most significant parameters for cancer detection. Artificial neural networks (ANN), particle swarm optimization and genetic algorithms were used to determine the detection accuracy of the classifier models on the WDBC and WPBC datasets. Particle swarm optimization outperforms the other classifiers on the WDBC dataset, while the artificial neural network provides good detection accuracy on both the WDBC and WPBC datasets. Hence, feature selection increases the detection accuracy before data are passed to the classifier model. Hybrid systems have been constructed using independent component analysis (ICA) with the discrete wavelet transform for dimensionality reduction on the WDBC dataset. A probabilistic neural network (PNN) is then used to discriminate between benign and malignant cells. The system provides a detection efficiency of 96.31% and 98.88% sensitivity. The computational overhead is reduced because the dataset features are reduced before being passed to the PNN classifier. ICA has been further explored for its adaptability as the decision system for the WDBC dataset. The classifiers used to verify the classification results are k-nearest neighbor, ANN, RBFNN and SVM. The metrics evaluated are ROC, specificity, sensitivity, detection efficiency and F-measure. The
With the rapid advances in science and technology today, the marginal cost of data collection is decreasing, and more and more big data of different types are available for scientific analysis. In this context of data explosion, however, data dimensionality grows, posing considerable challenges to classification. Traditional classification algorithms rely on the distance or density of data items, but in the high-dimensional case these methods are no longer effective due to the sparsity of the space. Moreover, directly classifying high-dimensional data with these methods incurs heavy time costs and computational complexity. This limits the widespread application of traditional classification algorithms.
We consider two common word space models that have been used with dimensionality reduction. The first is the Vector Space Model (VSM) (Salton et al., 1975). Words are represented as vectors where each dimension corresponds to a document in the corpus and the dimension's value is the number of times the word occurred in that document. We label the second model the Word Co-occurrence (WC) model: each dimension corresponds to a unique word, with the dimension's value indicating the number of times that dimension's word co-occurred with the word being represented.
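The two models can be built in a few lines; this is a minimal sketch with whitespace tokenization and a fixed vocabulary (both simplifying assumptions), using a symmetric context window for the WC model:

```python
import numpy as np

def vsm_matrix(docs, vocab):
    """VSM: rows = words, columns = documents, entries = term counts."""
    index = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(docs)), dtype=int)
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in index:
                M[index[tok], j] += 1
    return M

def wc_matrix(docs, vocab, window=2):
    """WC model: rows/columns = words; entry (i, j) counts how often word j
    occurs within `window` tokens of word i."""
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)), dtype=int)
    for doc in docs:
        toks = doc.split()
        for p, t in enumerate(toks):
            if t not in index:
                continue
            lo, hi = max(0, p - window), min(len(toks), p + window + 1)
            for q in range(lo, hi):
                if q != p and toks[q] in index:
                    C[index[t], index[toks[q]]] += 1
    return C
```

Either matrix can then be handed to a dimensionality reduction method (e.g. an SVD-based one) to obtain compact word representations.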
In this paper we discuss the specific visualization task of projecting the data to points on a two-dimensional display. Note that this task is different from manifold learning when the inherent dimensionality of the manifold is higher than two and the manifold cannot be represented perfectly in two dimensions. As the representation is necessarily imperfect, defining and using a measure of goodness of the representation is crucial. However, in spite of the large amount of research into methods for extracting manifolds, there has been very little discussion of what a good two-dimensional representation should be like and how its goodness should be measured. In a recent survey of 69 papers on dimensionality reduction from the years 2000–2006 (Venna, 2007) it was found that 28 (≈ 40%) of the papers only presented visualizations of toy or real data sets as a proof of quality. Most of the more quantitative approaches were based on one of two strategies. The first is to measure the preservation of all pairwise distances or of the order of all pairwise distances. Examples of this approach include multidimensional scaling (MDS)-type cost functions like Sammon's cost and Stress, methods that relate distances in the input space to those in the output space, and various correlation measures that assess the preservation of all pairwise distances. The other common quality assurance strategy is to classify the data in the low-dimensional space and report the classification performance.
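One instance of the first strategy, preservation of the order of all pairwise distances, can be sketched as a Spearman rank correlation between the pairwise distances before and after projection (computed here from scratch; the function names are illustrative):

```python
import numpy as np

def pairwise_dists(X):
    return np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))

def distance_rank_correlation(X_high, X_low):
    """Spearman correlation between all pairwise distances in the original
    space and in the 2-D display; 1.0 means the distance order is
    perfectly preserved."""
    iu = np.triu_indices(len(X_high), k=1)      # each pair counted once
    d_hi = pairwise_dists(X_high)[iu]
    d_lo = pairwise_dists(X_low)[iu]
    # double argsort converts values to ranks (assuming no ties)
    r_hi = np.argsort(np.argsort(d_hi)).astype(float)
    r_lo = np.argsort(np.argsort(d_lo)).astype(float)
    r_hi -= r_hi.mean(); r_lo -= r_lo.mean()
    return float((r_hi @ r_lo) / np.sqrt((r_hi @ r_hi) * (r_lo @ r_lo)))
```

Because only the rank order enters, the measure is invariant to global rescaling of the display, which is usually a desirable property for visualization quality measures.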
Three non-eigen-based hyperspectral ID estimators have recently been proposed. The first one, introduced as part of a Negative ABundance-Oriented (NABO) unmixing algorithm, borrows its main idea from the HySIME algorithm. Basically, it decomposes the residual error from the unconstrained unmixing into two components, the first due to noise and the second due to the ID. The algorithm starts from an underestimate of the ID and then iteratively increments the ID value until the unmixing error can be explained solely by the noise term. The second non-eigen-based method, called Hyperspectral Image Dimension Estimation through Nearest Neighbor distance ratios (HIDENN), is based on local geometrical properties of the data manifold. The technique aims to compute the correlation dimension of the dataset, which is itself closely related to the concept of fractal dimension. The basic idea is to count (in the neighborhood of one data point) the total number of pairs of points g(ε) which have a distance between them that is less than ε. Then it can be shown that if n → ∞ and ε → 0, the so-called correlation
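The pair-counting idea behind the correlation dimension can be sketched directly: g(ε) grows roughly as ε^d on a d-dimensional manifold, so the slope of log g(ε) versus log ε over a range of scales estimates the ID (this is a generic sketch of the correlation-dimension idea, not the HIDENN nearest-neighbor-ratio estimator itself):

```python
import numpy as np

def correlation_dimension(X, eps_lo, eps_hi, n_scales=8):
    """Estimate the correlation dimension from the growth rate of the
    pair count g(eps) = #{(i, j), i < j : ||x_i - x_j|| < eps}."""
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    d = D[np.triu_indices(len(X), k=1)]          # each pair counted once
    eps = np.geomspace(eps_lo, eps_hi, n_scales)
    g = np.array([(d < e).sum() for e in eps], dtype=float)
    # least-squares slope of log g vs log eps approximates the ID
    A = np.column_stack([np.log(eps), np.ones(n_scales)])
    slope, _ = np.linalg.lstsq(A, np.log(g), rcond=None)[0]
    return slope
```

In practice the usable range of ε is limited from below by the finite sample size and from above by the extent of the data, which is why the limit n → ∞, ε → 0 only holds asymptotically.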
In this paper, we investigate and compare the features in the structural profiles of core promoter regions in several typical eukaryotes. Instead of using a sliding window of specified width to filter noise in the structural profile of each individual promoter, we align the promoters at the TSS for each eukaryote type and obtain an averaged promoter representative for that type. Then we apply a nonlinear dimensionality reduction algorithm, Isomap, to the averaged promoter model, which is described by a set of physicochemical parameters, to derive a comprehensive structural profile. The structural profile derived by our method is very different from those in previous studies. Firstly, avoiding the sliding window approach preserves the local details of each single promoter, while averaging over individual promoters weakens the locally inconsistent structural traits and strengthens the consistent
Abstract: Dimensionality reduction is of high importance in hyperspectral data processing, as it can effectively reduce data redundancy and computation time while improving classification accuracy. Band selection and feature extraction are two widely used dimensionality reduction techniques. By integrating the advantages of band selection and feature extraction, we propose a new method for reducing the dimension of hyperspectral image data. First, a new and fast band selection algorithm is proposed for hyperspectral images based on an improved Determinantal Point Process (DPP). To reduce the amount of calculation, the Dual-DPP is used for fast sampling of representative pixels, followed by kNN-based local processing to explore more spatial information. These representative pixels are used to construct multiple adjacency matrices that describe the correlation between bands based on mutual information. To further improve the classification accuracy, two-Dimensional Singular Spectrum Analysis (2D-SSA) is used for feature extraction from the selected bands. Experiments show that the proposed method can select a low-redundancy and representative band subset, reducing both data dimension and computation time. Furthermore, the proposed dimensionality reduction algorithm outperforms a number of state-of-the-art methods in terms of classification accuracy.
Classification techniques in data mining are capable of processing large amounts of data. They can predict categorical class labels and classify data based on a training set and class labels, and hence can be used to classify newly available data. Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. Classification predicts categorical (discrete, unordered) labels, whereas prediction models continuous-valued functions. Some of the most well-known classification methodologies, including decision tree induction, the maximum margin classifier (SVM), Bayesian classification, artificial neural networks, and k-nearest neighbors, are discussed in this survey.