O(mn), which brings prohibitive computational complexity in large-scale problems. In this paper, an efficient and scalable algorithm for tensor principal component analysis is proposed, called the Linearized Alternating Direction Method with Vectorized technique for Tensor Principal Component Analysis (LADMVTPCA). Unlike traditional matrix factorization methods, LADMVTPCA uses a vectorized technique to formulate the tensor as an outer product of vectors, which greatly improves computational efficiency compared to matrix factorization. In the experimental part, synthetic tensor data of different orders are used to empirically evaluate the proposed algorithm. Results show that LADMVTPCA outperforms the matrix factorization based method.
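The outer-product formulation at the heart of this approach can be illustrated with a minimal NumPy sketch (all names are ours, not from the LADMVTPCA paper): a rank-1 order-3 tensor is stored as three vectors, and contractions recover each factor without ever unfolding the tensor into a matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(4)
v = rng.standard_normal(5)
w = rng.standard_normal(6)

# Dense rank-1 tensor T[i,j,k] = u[i] * v[j] * w[k]
T = np.einsum('i,j,k->ijk', u, v, w)

# Contracting T with two of the factors recovers the third up to scale,
# so iterative updates can operate on the vectors directly (O(m+n+p)
# storage) rather than on unfolded matrices.
u_rec = np.einsum('ijk,j,k->i', T, v, w) / ((v @ v) * (w @ w))
assert np.allclose(u_rec, u)
```

This is only a sketch of the vectorized representation; the actual algorithm alternates linearized updates over the factor vectors.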
A novel tensor-based feature extractor called TPCA (Tensor Principal Component Analysis) is proposed for hyperspectral image classification. First, we propose a new tensor-matrix algebraic framework that combines the merits of the recently emerged t-product model, which is based on circular convolution, with those of traditional matrix algebra. With the help of the proposed algebraic framework, we extend the traditional PCA algorithm to its tensorial variant, TPCA. To speed up the tensor-based computations of TPCA, we also propose a fast TPCA whose calculations are conducted in the Fourier domain. With a tensorization scheme based on a neighborhood of each pixel, each sample is defined by a tensorial vector whose entries are all second-order tensors, and TPCA can effectively extract the spectral-spatial information in a given hyperspectral image. To make TPCA applicable to traditional vector-based classifiers, we design a straightforward but effective approach to transform TPCA's output tensor vector into a traditional vector. Experiments classifying the pixels of two publicly available benchmark hyperspectral images show that TPCA outperforms its rivals, including PCA, LDA, TDLA and LDLA, in terms of classification accuracy.
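The Fourier-domain shortcut mentioned above rests on the fact that the t-product's circular convolution along the third mode diagonalizes under the DFT. A minimal NumPy sketch of the t-product (our own illustration, not the paper's code):

```python
import numpy as np

def t_product(A, B):
    """t-product of third-order tensors via FFT along the tube dimension.

    A: (n1, n2, n3), B: (n2, n4, n3) -> C: (n1, n4, n3).
    Circular convolution along mode 3 becomes slice-wise matrix
    multiplication in the Fourier domain.
    """
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)  # per-frequency matmul
    return np.real(np.fft.ifft(Cf, axis=2))

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))
C = t_product(A, B)
assert C.shape == (3, 2, 5)
```

Doing n3 small matrix products in the Fourier domain replaces one large block-circulant multiplication, which is the source of the claimed speed-up.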
We present classical and quantum algorithms based on spectral methods for a problem in tensor principal component analysis. The quantum algorithm achieves a quartic speedup while using exponentially smaller space than the fastest classical spectral algorithm, and a super-polynomial speedup over classical algorithms that use only polynomial space. The classical algorithms that we present are related to, but slightly different from, those presented recently in Ref. . In particular, we have an improved threshold for recovery, and the algorithms we present work for both even and odd order tensors. These results suggest that large-scale inference problems are a promising future application for quantum computers.
• This paper has presented a new principal component model of fixed strike volatility deviations from ATM volatility. It has been used to quantify the change that should be made to any given fixed strike volatility per unit change in the underlying.
A face recognition system using principal component analysis and a Genetic Algorithm has been discussed. The simulation is done in the MATLAB environment. To implement this work, different sample images were taken for the Train and Test databases. The MATLAB code ran for approximately 20 seconds per Test image with 20 Train and Test images, and 78 seconds with 100 Train and Test images. The efficiency calculation for our experiment is given in table 1.
There remain many directions for further research on this topic. First, there are many aspects of our understanding of tensors and the t-product that can be deepened: to oversimplify the future directions in this regard, a broad approach is to see just how much of our knowledge about matrices generalises to the tensor case. This student feels, after finishing this thesis, that the answer is: a lot. There are also many aspects of the TRPCA procedure that could be explored, such as generalising the sign tensor of the noise (to a random variable that is not ±1 with equal probability, or to non-iid noise) or analysing which tensor is returned when the procedure fails. Much work remains to be done surrounding the online algorithm. One method that could be applied to demonstrate the boundedness of A is to note that the implicitly defined function that takes m_j and all the past information
A PCA of 3-month implied volatility skew deviations, based on the data shown in figure 4b, gives the output in table 1. It is clear from table 1a that the first principal component explains only 74% of the movement in the volatility surface and that the second principal component is rather important, as it explains an additional 12% of the variation over the period. It is interesting that the factor weights shown in table 1b support the standard interpretation of the first three principal components in a term structure as parallel shift, tilt and convexity components. Note that sparse trading in very out-of-the-money options implies that the extreme low-strike volatilities show less correlation with the rest of the system, and this is reflected in their lower factor weights on the first component.
Abstract: Big data is the collection and analysis of large data sets which hold much intelligence and raw information drawn from user data, sensor data, and medical and enterprise data. Since the volume of medical data is increasing due to the presence of a vast number of features, conventional rule mining techniques are not competent to handle the data and to perform precise diagnosis. Hence, this paper intends to implement an improved rule mining technique to overcome the above-mentioned limitations. The model comprises two main contribution stages: (i) a Map Reduce framework and (ii) classification. Initially, the input medical data is given to the Map Reduce framework. Here, Multi-linear Principal Component Analysis (MPCA) is used for reducing the given bulk data. Then, the reduced data is given to the classification process, which classifies the disease with high accuracy. For this, the paper uses a Support Vector Machine (SVM) classifier. After implementation, the proposed model's performance is compared with that of conventional methods such as Principal Component Analysis-NN (PCA-NN), Independent Component Analysis-NN (ICA-NN) and MPCA-NN in terms of measures like accuracy, specificity and sensitivity, and the superiority of the proposed model over the other methods is demonstrated.
PCA was introduced in 1901. It is a multivariate technique that analyzes data in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the data and to represent it as a set of new orthogonal (independent) variables called principal components. Mathematically, PCA depends on the eigendecomposition of positive semi-definite matrices and on the singular value decomposition (SVD) of rectangular matrices. In the case of a multicollinearity problem, researchers have used other forms to estimate the parameters, such as principal component regression (PCR). This problem occurs when the predictors included in the linear model are highly correlated with each other. When this is the case, the matrix tends to be singular, and hence identifying the least squares estimates faces numerical problems. Researchers used the orthogonal matrix in the GLM to get the PCR estimator [3, 9, 10]:
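The PCR idea above can be sketched in a few lines of NumPy (a hedged illustration; the function name and interface are ours): regress the response on the scores of the top-k principal components, then map the coefficients back to the original predictor space.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: least squares on the top-k PCs.

    Guards against multicollinearity by dropping directions of X with
    near-zero variance before estimating the coefficients.
    """
    Xc = X - X.mean(axis=0)
    # Eigendecomposition of the (positive semi-definite) scatter matrix
    _, V = np.linalg.eigh(Xc.T @ Xc)
    V = V[:, ::-1][:, :k]          # top-k eigenvectors (principal axes)
    Z = Xc @ V                     # component scores
    gamma = np.linalg.lstsq(Z, y - y.mean(), rcond=None)[0]
    return V @ gamma               # coefficients mapped back to X-space

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 3))
# Append a nearly collinear column to create a multicollinearity problem
X = np.column_stack([X, X[:, 0] + 1e-8 * rng.standard_normal(100)])
y = X @ np.array([1.0, 2.0, -1.0, 0.5])
beta = pcr_fit(X, y, k=3)

# Predictions remain stable even though X is numerically rank-deficient
pred = (X - X.mean(axis=0)) @ beta + y.mean()
assert np.allclose(pred, y, atol=1e-5)
```

Dropping the near-zero-variance direction is exactly what stabilizes the estimate when ordinary least squares would be ill-conditioned.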
We ran sPCA to compare the new spca_randtest and previous tests on a real dataset of human mitochondrial DNA (mtDNA). We used a dataset of 85 populations from Central-Western Africa that spans a large portion of the African continent (from Gabon to Senegal; ). Previous analyses of these data detected a clear genetic structure from West to Central Africa with ongoing stepping-stone migration movements. We therefore expected that this spatial distribution of genetic variation would be detected as significant. In the sPCA, populations were treated as the units of the analysis, for which allele frequencies of mtDNA
Haplotypes are composed of specific combinations of alleles at several loci on the same chromosome. Because haplotypes incorporate linkage disequilibrium (LD) information from multiple loci, haplotype-based association analyses can provide greater power than single-marker analysis in association studies. However, when we construct haplotypes using many markers simultaneously, we may be confronted with a sparseness problem due to the large number of haplotypes. In this paper, we propose the principal-component (PC) association test as an alternative to the haplotype-based association test. We define the PC scores from the LD blocks and perform the association test using logistic regression. The proposed PC test was applied to the analysis of the Genetic Analysis Workshop 15 simulated data set. Knowing the answers to Problem 3, we evaluated the performance of the PC test and the haplotype-based association test using the Akaike Information Criterion (AIC), power, and type I error. The PC test performed better than the haplotype-based association test in the sense that the former tends to have smaller AIC values and slightly greater power than the latter.
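A minimal sketch of the PC test's two steps on simulated data (the variable names and the plain gradient-ascent logistic fit are our illustration, not the authors' code): compute PC scores for an LD block, then regress case/control status on those scores.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical LD block: 200 individuals x 10 markers (0/1/2 genotypes)
G = rng.integers(0, 3, size=(200, 10)).astype(float)
Gc = G - G.mean(axis=0)

# PC scores of the block: project onto the leading eigenvectors
# of the marker covariance matrix
vals, vecs = np.linalg.eigh(np.cov(Gc, rowvar=False))
scores = Gc @ vecs[:, ::-1][:, :2]   # top-2 PC scores per individual

# Logistic regression of case/control status on the PC scores
# (plain gradient ascent on the log-likelihood; sketch only)
y = rng.integers(0, 2, size=200).astype(float)
X = np.column_stack([np.ones(200), scores])
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.01 * X.T @ (y - p) / len(y)
```

The fitted model's deviance then feeds directly into the AIC comparison against the haplotype-based test.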
bons [6-9]. The process of data collection, including field sampling and chemical analysis using instrumentation, involves uncertainties or errors which are not considered in the PCA/APCS analysis. Paatero (1997) has developed an advanced multivariate factor analysis model, Positive Matrix Factorization 2 (PMF), based on a least squares approach, which also incorporates an optimization process to improve the source apportionment using the uncertainty or error estimates involved in sample collection and analysis . This technique has been employed by various researchers to apportion the sources contributing to ambient levels of fine and coarse particulate matter as well as ozone precursors, including volatile organic compounds (VOC) [11-18].
Alt and Smith (1988) stated that the main limitation lies in the property that CD = 0 when there is a variable of zero variance or when a variable is a linear combination of the other variables. Due to this limitation, Djauhari (2005) proposed a different concept of multivariate dispersion measure, called the vector variance (VV). Geometrically, VV is the square of the length of the diagonal of a parallelotope generated by all principal components of
Another method is based on information-theoretic concepts, viz. the principal component analysis (PCA) method. In this method, information that best describes a face is derived from the entire face image. Based on the Karhunen-Loeve expansion developed in pattern recognition, Kirby and Sirovich [12,13] used principal component analysis to efficiently represent pictures of faces. Any face image can be approximately reconstructed from a small collection of weights for each face and a standard face picture, that is, an eigenpicture. The weights are obtained by projecting the face image onto the eigenpicture. In mathematics, eigenfaces are the set of eigenvectors, i.e. the feature vectors or characteristics used in the computer vision problem of human face recognition. The eigenfaces are the principal components of the distribution of faces, that is, the eigenvectors of the covariance matrix of the set of face images. Each face can be represented exactly by a linear combination of the eigenfaces . The best M eigenfaces construct an M-dimensional (M-D) space called the "face space", a subspace of the image space. Turk and Pentland  proposed a face recognition method based on the eigenfaces approach. Gumus , Ergun  present an evaluation of various methods for face recognition. According to their experiments, the classification accuracy increases with the dimension of the training data set and the chosen feature extraction-classifier pair. Agarwal, M.,  presents a methodology for face recognition based on an information-theoretic approach of coding and decoding the face
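The eigenfaces pipeline described here can be sketched compactly in NumPy (synthetic "faces" for illustration only): compute the mean face, obtain the eigenfaces from the training set, and recognize by nearest neighbour in face space. The small Gram-matrix trick avoids forming the full pixel covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical training set: 20 face images of 8x8 pixels, flattened
faces = rng.random((20, 64))
mean_face = faces.mean(axis=0)
A = faces - mean_face

# Eigenfaces = eigenvectors of the covariance of the training images.
# Work with the small 20x20 Gram matrix A A^T instead of the 64x64
# covariance, then map its eigenvectors back to pixel space.
vals, vecs = np.linalg.eigh(A @ A.T)
order = np.argsort(vals)[::-1][:5]            # best M = 5 eigenfaces
eigenfaces = (A.T @ vecs[:, order]).T
eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)

def weights(img):
    """Project an image into face space: its coordinates (weights)."""
    return eigenfaces @ (img - mean_face)

# Recognition: a slightly noisy probe matches its training face
probe = faces[3] + 0.01 * rng.standard_normal(64)
dists = [np.linalg.norm(weights(probe) - weights(f)) for f in faces]
assert int(np.argmin(dists)) == 3
```

The weight vector per face is exactly the "small collection of weights" the text describes; comparison happens in the M-dimensional face space rather than pixel space.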
We next consider a selection of datasets from UC Irvine's online Machine Learning Repository (Lichman 2013). For each dataset, one attribute was selected as a protected class, and the remaining attributes were considered part of the feature space. After splitting each dataset into separate training (70%) and testing (30%) sets, the top five principal components were found for the training set of each dataset three times: once unconstrained, once via (7) with only the mean constraints (excluding the covariance constraints) with δ = 0, and once via (7) with both the mean and covariance constraints with δ = 0 and µ = 0.01; the test data was then projected onto these vectors. All data was normalized to have unit variance in each feature, which is common practice for datasets with features of incomparable units. For each instance, we estimated ∆(F) using the test set and for the families of linear SVMs F_v and Gaussian kernel
Based on actual data from producing wells in the development blocks of the eighth oil recovery plant in the Daqing oil field, cluster analysis was adopted to optimize the geological parameters of the Putaohua reservoir. Using limited data transfer methods, with the correlation coefficient serving as the clustering distance, and adopting the shortest-distance clustering method, the following results were obtained (Table 1, Fig 2).
Body surface potential mapping (BSPM) refers to the recording and analysis of temporal and spatial distributions of ECG potentials acquired at multiple sites on the torso. In contrast to the analysis of the 12-lead ECG, where wave amplitudes, intervals, and morphology are usually considered, BSPM is rather considered in terms such as the shape of the potential distribution and the number and location of extrema. Since the electrodes that define such a map are closely spaced on the body surface, and therefore contain considerable redundancy, PCA-based methods have been employed for data compression. It has been shown that spatial redundancy can be substantially reduced using the definition in (28) [70, 71], resulting in a subset of leads which contains much richer information than subsets of the original leads of the same size. From such a subset of leads, a better separation can be made between different types of patients [72, 73].
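The redundancy argument can be illustrated with a small simulation (synthetic data; the lead counts and noise levels are our assumptions, not from [70-73]): when many leads are mixtures of a few underlying sources, a handful of principal components capture nearly all the spatial variance.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical BSPM: 64 closely spaced torso leads, 500 time samples.
# Neighbouring leads are highly redundant, so simulate them as
# mixtures of a few underlying sources plus measurement noise.
sources = rng.standard_normal((4, 500))
mixing = rng.standard_normal((64, 4))
maps = mixing @ sources + 0.05 * rng.standard_normal((64, 500))

# PCA across leads: eigendecompose the inter-lead covariance matrix
cov = np.cov(maps)
vals = np.linalg.eigvalsh(cov)[::-1]
explained = np.cumsum(vals) / vals.sum()

# A handful of components capture almost all the spatial variance,
# which is what makes PCA-based lead-subset compression possible.
k = int(np.searchsorted(explained, 0.99) + 1)
assert k <= 8
```

In this toy setting the 99%-variance cut-off lands near the true number of sources, far below the 64 physical leads.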
Ganoderma lucidum (G. lucidum) spores, a valuable Chinese herbal medicine, have vast marketable prospects owing to their bioactivities and medicinal efficacy. This study aims at the development of an effective and simple analytical method to distinguish G. lucidum spores from its fruiting body, which is of essential importance for the quality control and fast discrimination of raw materials of Chinese herbal medicine. Attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy combined with appropriate chemometric methods, including penalized discriminant analysis, principal component discriminant analysis and partial least squares discriminant analysis, has been proven to be a rapid and powerful tool for discrimination of G. lucidum spores and its fruiting body, with a classification accuracy of 99%. The model leads to a well-performed selection of informative spectral absorption bands which improves the classification accuracy, reduces the model complexity and enhances the quantitative interpretation of the chemical constituents of G. lucidum spores regarding its anticancer effects.
proved two theorems: first, each principal diagonal element of the covariance matrix of the averaged data is the square of the coefficient of variation of the corresponding index; second, the mean processing of the raw data does not change the correlation coefficients between the indicators. The averaging of the raw data not only eliminates the influence of the dimension and magnitude of each index, but also reflects the differences among, and degree of mutual influence between, the indices in the original data.
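Both theorems are easy to check numerically (a quick NumPy illustration of our own):

```python
import numpy as np

rng = np.random.default_rng(6)
# Three indices with wildly different units and magnitudes
X = rng.random((50, 3)) * np.array([1.0, 100.0, 1e4])

# "Averaging" the raw data: divide each index by its mean
Xa = X / X.mean(axis=0)

# Theorem 2: the correlation matrix is unchanged by this scaling
assert np.allclose(np.corrcoef(X, rowvar=False),
                   np.corrcoef(Xa, rowvar=False))

# Theorem 1: each diagonal entry of the covariance of the scaled data
# equals the squared coefficient of variation of that index
cv2 = (X.std(axis=0, ddof=1) / X.mean(axis=0)) ** 2
assert np.allclose(np.diag(np.cov(Xa, rowvar=False)), cv2)
```

Since Var(X/m) = Var(X)/m², the diagonal of the scaled covariance is (σ/μ)², while correlation is invariant to any positive rescaling of the columns.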
The contributions of our work are as follows. Firstly, we utilize traditional average-filter algorithms to reduce the noise in wireless signals and analyze their differences, which is effective for removing environmental noise. Secondly, we use low-pass filtering, such as a Butterworth filter, to suppress the high-frequency components of the noise. Finally, we propose a feature extraction algorithm based on principal component analysis (PCA). PCA reduces the dimensionality of the CSI information obtained from the 30 sub-carriers and removes uncorrelated noise. The experiments show that the filtered CSI information can be used to enhance the accuracy of human activity recognition algorithms.
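A minimal sketch of the PCA step on simulated CSI (the dimensions and noise model are our assumptions): project the 30 subcarriers onto their leading principal components, keeping the activity component that is shared across subcarriers and discarding uncorrelated noise.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical CSI stream: 1000 packets x 30 subcarriers.
# Activity-induced variation is shared across subcarriers; noise is not.
t = np.linspace(0, 10, 1000)
activity = np.sin(2 * np.pi * 1.5 * t)          # common motion component
csi = (np.outer(activity, rng.standard_normal(30))
       + 0.3 * rng.standard_normal((1000, 30)))

# PCA over subcarriers: keep the leading components, discard the rest
Xc = csi - csi.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
reduced = Xc @ Vt[:3].T        # 30 subcarriers -> 3 principal components
denoised = reduced @ Vt[:3]    # project back for a filtered CSI stream

assert reduced.shape == (1000, 3)
```

The reduced scores (or the back-projected `denoised` stream) would then feed the activity recognition classifier.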