O(mn), which brings prohibitive computational complexity in large-scale problems. In this paper, an efficient and scalable algorithm for tensor principal component analysis is proposed, called the Linearized Alternating Direction Method with Vectorized technique for Tensor Principal Component Analysis (LADMVTPCA). Unlike traditional matrix factorization methods, LADMVTPCA uses a vectorized technique to formulate the tensor as an outer product of vectors, which greatly improves computational efficiency compared to matrix factorization. In the experimental part, synthetic tensor data of different orders are used to evaluate LADMVTPCA empirically. Results show that LADMVTPCA outperforms the matrix factorization based method.
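LADMVTPCA itself is not specified in this excerpt. As a hedged illustration of the outer-product (rank-1) formulation it builds on, the classical higher-order power iteration below computes a best rank-1 approximation T ≈ λ·u∘v∘w of a third-order tensor using only vector operations; all names and data are illustrative, not the authors' algorithm:

```python
import numpy as np

def rank1_tensor_approx(T, iters=50, seed=0):
    """Best rank-1 approximation T ≈ lam * (u outer v outer w) of a
    third-order tensor via alternating (higher-order power) updates."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    u = rng.standard_normal(I); u /= np.linalg.norm(u)
    v = rng.standard_normal(J); v /= np.linalg.norm(v)
    w = rng.standard_normal(K); w /= np.linalg.norm(w)
    for _ in range(iters):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, u, v, w)  # leading coefficient
    return lam, u, v, w

# Recover a planted rank-1 tensor (up to sign of the factors).
u0, v0, w0 = np.ones(4) / 2, np.ones(5) / np.sqrt(5), np.ones(6) / np.sqrt(6)
T = 3.0 * np.einsum('i,j,k->ijk', u0, v0, w0)
lam, u, v, w = rank1_tensor_approx(T)
```

Because each update touches only vectors (never an unfolded matrix), the per-iteration cost stays linear in the tensor size, which is the efficiency argument the abstract makes.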


A novel tensor-based feature extractor called TPCA (Tensor Principal Component Analysis) is proposed for hyperspectral image classification. First, we propose a new tensor-matrix algebraic framework that combines the merits of the recently emerged t-product model, which is based on circular convolution, with those of traditional matrix algebra. With the help of this framework, we extend the traditional PCA algorithm to its tensorial variant TPCA. To speed up the tensor-based computation of TPCA, we also propose a fast TPCA whose calculations are carried out in the Fourier domain. With a tensorization scheme over a neighborhood of each pixel, each sample is defined by a tensorial vector whose entries are second-order tensors, so TPCA can effectively extract the spectral-spatial information in a given hyperspectral image. To make TPCA applicable to traditional vector-based classifiers, we design a straightforward but effective approach to transform TPCA's output tensor vector into a traditional vector. Experiments classifying the pixels of two publicly available benchmark hyperspectral images show that TPCA outperforms its rivals, including PCA, LDA, TDLA and LDLA, in terms of classification accuracy.
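The Fourier-domain trick that fast TPCA relies on can be sketched with the standard t-product (this is a minimal implementation of the circular-convolution-based t-product the abstract mentions, not the authors' TPCA code):

```python
import numpy as np

def t_product(A, B):
    """t-product of third-order tensors: FFT along the third mode,
    frontal-slice matrix products, then inverse FFT. Circular
    convolution along mode 3 becomes slice-wise multiplication."""
    n1, n2, n3 = A.shape
    assert B.shape[0] == n2 and B.shape[2] == n3
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ijk,jlk->ilk', Af, Bf)   # per-slice matmul in Fourier domain
    return np.real(np.fft.ifft(Cf, axis=2))  # result is real for real inputs

# Sanity check: the identity tensor (eye in the first frontal slice,
# zeros elsewhere) acts as a left identity under the t-product.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 2, 5))
Iten = np.zeros((4, 4, 5))
Iten[:, :, 0] = np.eye(4)
C = t_product(Iten, B)
```

Working slice-by-slice in the Fourier domain replaces one large block-circulant matrix product with n3 small ones, which is the source of the speed-up.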

We present classical and quantum algorithms based on spectral methods for a problem in tensor principal component analysis. The quantum algorithm achieves a quartic speedup while using exponentially smaller space than the fastest classical spectral algorithm, and a super-polynomial speedup over classical algorithms that use only polynomial space. The classical algorithms that we present are related to, but slightly different from, those presented recently in Ref. [1]. In particular, we have an improved threshold for recovery, and the algorithms we present work for both even and odd order tensors. These results suggest that large-scale inference problems are a promising future application for quantum computers.


• This paper has presented a new principal component model of fixed-strike volatility deviations from ATM volatility. It has been used to quantify the change that should be made to any given fixed-strike volatility per unit change in the underlying.


A face recognition system using principal component analysis and a genetic algorithm has been discussed. The simulation is done in the MATLAB environment. For this work, different sample images were taken for the train and test databases. For a single test image, the MATLAB code ran for approximately 20 seconds with 20 train and test images, and 78 seconds with 100 train and test images. The efficiency calculation for our experiment is given in table 1.

There remain many directions for further research on this topic. First, there are many aspects of our understanding of tensors and the t-product that can be deepened: to oversimplify the future directions in this regard, a broad approach is to see just how much of our knowledge about matrices generalises to the tensor case. This student feels, after finishing this thesis, that the answer is: a lot. There are also many aspects of the TRPCA procedure that could be explored, such as generalising the sign tensor of the noise (to a random variable that is not ±1 with equal probability, or to non-iid noise), or analysing which tensor is returned when the procedure fails. Much work remains to be done surrounding the online algorithm. One method that could be applied to demonstrate the boundedness of A is to note that the implicitly defined function that takes m_j and all the past information


A PCA of 3-month implied volatility skew deviations based on the data shown in figure 4b gives the output in table 1. It is clear from table 1a that the first principal component explains only 74% of the movement in the volatility surface, and that the second principal component is rather important, as it explains an additional 12% of the variation over the period. It is interesting that the factor weights shown in table 1b support the standard interpretation of the first three principal components in a term structure as parallel shift, tilt and convexity components. Note that sparse trading in very out-of-the-money options implies that the extreme low-strike volatilities show less correlation with the rest of the system, and this is reflected by their lower factor weights on the first component.
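For readers unfamiliar with how such "% explained" figures arise, the sketch below computes explained-variance percentages from the eigenvalues of a sample covariance matrix (illustrative correlated data, not the skew data of figure 4b):

```python
import numpy as np

# Each component explains eigenvalue_i / sum(eigenvalues) of the total
# variance; the cumulative sum gives the running "% explained" figure.
rng = np.random.default_rng(0)
X = rng.standard_normal((250, 8)) @ np.triu(np.ones((8, 8)))  # correlated "strikes"
evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]     # descending order
explained = 100 * evals / evals.sum()
cumulative = np.cumsum(explained)
```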


Abstract: Big data is the collection and analysis of large data sets that hold much intelligence and raw information based on user data, sensor data, and medical and enterprise data. Since the volume of medical data is increasing due to the presence of a vast number of features, conventional rule mining techniques are not competent to handle the data and to perform precise diagnosis. Hence, this paper implements an improved rule mining technique to overcome the above-mentioned limitations. The model comprises two main stages: (i) a MapReduce framework and (ii) classification. Initially, the input medical data is given to the MapReduce framework, where Multilinear Principal Component Analysis (MPCA) is used to reduce the given bulk data. The reduced data is then passed to the classification process, which classifies the disease with high accuracy using a Support Vector Machine (SVM) classifier. After implementation, the proposed model's performance is compared with that of conventional methods such as Principal Component Analysis-NN (PCA-NN), Independent Component Analysis-NN (ICA-NN) and MPCA-NN in terms of accuracy, specificity and sensitivity, and the superiority of the proposed model over the other methods is demonstrated.

PCA was introduced in 1901 [12]. It is a multivariate technique that analyzes data in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the data and to represent it as a set of new orthogonal (independent) variables called principal components. Mathematically, PCA depends on the eigendecomposition of positive semi-definite matrices and on the singular value decomposition (SVD) of rectangular matrices [7]. In the case of the multicollinearity problem, researchers have used other forms to estimate the parameters, such as principal component regression (PCR) [1]. This problem occurs when the predictors included in the linear model are highly correlated with each other; the matrix then tends to be singular, and identifying the least squares estimates faces numerical problems. Researchers used the orthogonal matrix in the GLM to obtain the PCR estimator [3, 9, 10].
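The equivalence between the SVD of the centered data and the eigendecomposition of the covariance matrix can be made concrete with a short sketch (illustrative data; the variable names are ours):

```python
import numpy as np

def pca(X, k):
    """Top-k principal components of X (n samples x p features) via SVD
    of the centered data; equivalent to eigendecomposing the covariance."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                 # orthonormal loading vectors
    scores = Xc @ components.T          # projections of the observations
    var = s[:k] ** 2 / (len(X) - 1)     # variance along each component
    return components, scores, var

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
comps, scores, var = pca(X, 2)
# Cross-check: eigenvalues of the sample covariance equal s^2 / (n - 1).
evals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
```

Working from the SVD avoids forming the covariance matrix explicitly, which is the numerically preferred route when p is large.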


We ran the sPCA to compare the new spca_randtest and previous tests on a real dataset of human mitochondrial DNA (mtDNA). We used a dataset of 85 populations from Central-Western Africa that spans a large portion of the African continent (from Gabon to Senegal; [14]). Previous analyses of these data detected a clear genetic structure from West to Central Africa, with ongoing stepping-stone migration movements. We therefore expected that this spatial distribution of genetic variation would be detected as significant. In the sPCA, populations were treated as the units of the analysis, for which allele frequencies of mtDNA

Haplotypes are composed of specific combinations of alleles at several loci on the same chromosome. Because haplotypes incorporate linkage disequilibrium (LD) information from multiple loci, haplotype-based association analyses can provide greater power than single-marker analysis in association studies. However, when we construct haplotypes using many markers simultaneously, we may be confronted with a sparseness problem due to the large number of haplotypes. In this paper, we propose the principal-component (PC) association test as an alternative to the haplotype-based association test. We define the PC scores from the LD blocks and perform the association test using logistic regression. The proposed PC test was applied to the analysis of the Genetic Analysis Workshop 15 simulated data set. Knowing the answers to Problem 3, we evaluated the performance of the PC test and the haplotype-based association test using the Akaike Information Criterion (AIC), power, and type I error. The PC test performed better than the haplotype-based association test in the sense that the former tends to have smaller AIC values and slightly greater power than the latter.
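A minimal sketch of the pipeline described above, under the assumption that PC scores are computed from an LD-block genotype matrix and then tested with logistic regression; the simulated genotypes, the effect size, and the IRLS fitter are all illustrative, not the Workshop 15 data or the authors' software:

```python
import numpy as np

def logistic_irls(X, y, iters=25):
    """Logistic regression via iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X1 @ beta))
        W = p * (1 - p)                          # IRLS weights
        beta += np.linalg.solve(X1.T * W @ X1, X1.T @ (y - p))
    return beta

# Hypothetical LD block: 100 SNPs (0/1/2 allele counts) for 500 subjects.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(500, 100)).astype(float)
Gc = G - G.mean(axis=0)
_, _, Vt = np.linalg.svd(Gc, full_matrices=False)
pcs = Gc @ Vt[:3].T                              # top-3 PC scores of the block
# Simulate case/control status driven by the first PC.
logit = 0.8 * pcs[:, 0] / pcs[:, 0].std()
y = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(float)
beta = logistic_irls(pcs, y)                     # test PCs jointly
```

In practice one would test the fitted PC coefficients with a likelihood-ratio or Wald test; only the score construction and regression step are sketched here.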

bons [6-9]. The process of data collection, including field sampling and chemical analysis using instrumentation, involves uncertainties or errors which are not considered in the PCA/APCS analysis. Paatero (1997) developed an advanced multivariate factor analysis model, Positive Matrix Factorization 2 (PMF), based on a least squares approach, which also incorporates an optimization process to improve the source apportionment using the uncertainty or error estimates involved in sample collection and analysis [10]. This technique has been employed by various researchers to apportion sources contributing to the ambient levels of fine and coarse particulate matter, as well as ozone precursors including volatile organic compounds (VOC) [11-18].


Alt and Smith (1988) stated that the main limitation lies in the property that CD = 0 when there is a variable of zero variance or when a variable is a linear combination of other variables. Due to this limitation, Djauhari (2005) proposed a different concept of multivariate dispersion measure, called the vector variance (VV). Geometrically, VV is the square of the length of the diagonal of a parallelotope generated by all principal components of

Another method is based on information theory concepts, viz. the principal component analysis (PCA) method. In this method, the information that best describes a face is derived from the entire face image. Based on the Karhunen-Loeve expansion developed in pattern recognition, Kirby and Sirovich [12,13] used principal component analysis to efficiently represent pictures of faces. Any face image can be approximately reconstructed from a small collection of weights for each face and a standard face picture, that is, an eigenpicture. The weights are obtained by projecting the face image onto the eigenpicture. Mathematically, eigenfaces are the set of eigenvectors used as feature vectors, or characteristics, in the computer vision problem of human face recognition: the eigenfaces are the principal components of the distribution of faces, i.e. the eigenvectors of the covariance matrix of the set of face images. Each face can be represented exactly by a linear combination of the eigenfaces [14]. The best M eigenfaces construct an M-dimensional (M-D) space, called the "face space", within the image space. Turk and Pentland [15] proposed a face recognition method based on the eigenfaces approach. Gumus and Ergun [16] present an evaluation of various methods for face recognition; in their experiments, classification accuracy increased with the dimension of the training data set and depended on the chosen feature extraction-classifier pairs. Agarwal, M. [17] presents a methodology for face recognition based on an information theory approach of coding and decoding the face

We next consider a selection of datasets from UC Irvine's online Machine Learning Repository (Lichman 2013). For each of the datasets, one attribute was selected as a protected class, and the remaining attributes were considered part of the feature space. After splitting each dataset into separate training (70%) and testing (30%) sets, the top five principal components were found for the training set of each dataset three times: once unconstrained; once with (7) with only the mean constraints (excluding the covariance constraints) and δ = 0; and once with (7) with both the mean and covariance constraints, with δ = 0 and µ = 0.01. The test data was then projected onto these vectors. All data was normalized to have unit variance in each feature, which is common practice for datasets with features of incomparable units. For each instance, we estimated ∆(F) using the test set for the families of linear SVMs F_v and Gaussian kernel

According to the actual data from producing wells in the development blocks of the eighth oil recovery factory in the Daqing oil fields, cluster analysis was adopted to optimize the geological parameters of the Putaohua reservoir. Using the limitation data transfer method, with the correlation coefficient serving as the clustering distance and the shortest-distance clustering method, the following results are obtained (Table 1, Fig 2).
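The clustering step described above can be sketched as follows, using 1 − r as a stand-in for the correlation-coefficient clustering distance together with single (shortest-distance) linkage; the data are synthetic, not the Daqing well data of Table 1:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical table of 50 wells x 6 geological parameters, with one
# deliberately correlated pair of parameters (columns 0 and 3).
rng = np.random.default_rng(0)
params = rng.standard_normal((50, 6))
params[:, 3] = params[:, 0] + 0.1 * rng.standard_normal(50)

D = pdist(params.T, metric='correlation')  # distance = 1 - r between parameters
Z = linkage(D, method='single')            # shortest-distance clustering
labels = fcluster(Z, t=3, criterion='maxclust')
```

The strongly correlated pair merges first under single linkage, so it always lands in one cluster regardless of where the dendrogram is cut.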

Body surface potential mapping (BSPM) refers to the recording and analysis of temporal and spatial distributions of ECG potentials acquired at multiple sites on the torso. In contrast to the analysis of the 12-lead ECG, where wave amplitudes, intervals, and morphology are usually considered, BSPM is considered in terms such as the shape of the potential distribution and the number and location of extrema. Since the electrodes that define such a map are closely spaced on the body surface, and therefore contain considerable redundancy, PCA-based methods have been employed for data compression. It has been shown that spatial redundancy can be substantially reduced using the definition in (28) [70, 71], resulting in a subset of leads which contains much richer information than subsets of the original leads of the same size. From such a subset of leads, better separation can be made of different types of patients [72, 73].


Ganoderma lucidum (G. lucidum) spores, a valuable Chinese herbal medicine, have vast marketable prospects owing to their bioactivities and medicinal efficacy. This study aims at the development of an effective and simple analytical method to distinguish G. lucidum spores from the fruiting body, which is of essential importance for the quality control and fast discrimination of raw materials of Chinese herbal medicine. Attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy, combined with appropriate chemometric methods including penalized discriminant analysis, principal component discriminant analysis and partial least squares discriminant analysis, has proven to be a rapid and powerful tool for discriminating G. lucidum spores from the fruiting body, with a classification accuracy of 99%. The model leads to a well-performed selection of informative spectral absorption bands, which improves classification accuracy, reduces model complexity and enhances the quantitative interpretation of the chemical constituents of G. lucidum spores with regard to their anticancer effects.


proved two theorems: First, each principal diagonal element of the covariance matrix of the averaged data is the square of the coefficient of variation of the corresponding index. Second, the mean processing of the raw data does not change the correlation coefficients between the indicators. The averaging process of the raw data not only eliminates the influence of the dimension and magnitude of each index, but also reflects the differences among the indices in the original data and the degree of their mutual influence.
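Both theorems are easy to verify numerically. The sketch below divides each index by its mean ("mean processing") and compares the resulting covariance diagonal with the squared coefficients of variation, and the correlation matrices before and after (illustrative data on deliberately different scales):

```python
import numpy as np

rng = np.random.default_rng(0)
# Four positive indices on very different scales.
X = rng.random((200, 4)) * np.array([1, 10, 100, 1000]) + 5
Xm = X / X.mean(axis=0)                            # mean-processed data

# Theorem 1: diagonal of cov(Xm) equals the squared coefficient of variation.
cv2 = (X.std(axis=0, ddof=1) / X.mean(axis=0)) ** 2
diag = np.diag(np.cov(Xm, rowvar=False))

# Theorem 2: correlation coefficients are unchanged by mean processing.
corr_raw = np.corrcoef(X, rowvar=False)
corr_avg = np.corrcoef(Xm, rowvar=False)
```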

The contributions of our work are as follows. Firstly, we utilize traditional average filter algorithms to reduce the noise of wireless signals and analyze their differences, which is effective in removing environmental noise. Secondly, we use low-pass filtering, such as a Butterworth filter, to suppress the high-frequency components of the noise. Finally, we propose a feature extraction algorithm based on principal component analysis (PCA). PCA reduces the dimensions of the CSI information obtained from the 30 sub-carriers and removes the uncorrelated noise. The experiments show that the filtered CSI information can be used to enhance the accuracy of human activity recognition algorithms.
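A hedged sketch of this pipeline on simulated CSI: low-pass each of the 30 sub-carrier streams with a Butterworth filter, then keep the leading principal components. The sampling rate, cut-off frequency, and retained rank are illustrative choices, not the paper's parameters:

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.01)                        # 10 s at 100 Hz (assumed rate)
motion = np.sin(2 * np.pi * 1.5 * t)              # shared activity signal
# 30 sub-carriers: each sees the motion at a random gain, plus noise.
csi = np.outer(motion, rng.random(30)) + 0.3 * rng.standard_normal((len(t), 30))

# Step 1: Butterworth low-pass to suppress high-frequency noise.
b, a = butter(4, 10, btype='low', fs=100)         # 10 Hz cut-off (assumed)
csi_lp = filtfilt(b, a, csi, axis=0)

# Step 2: PCA across sub-carriers; keep the leading components.
Xc = csi_lp - csi_lp.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
denoised = U[:, :2] * s[:2] @ Vt[:2]              # rank-2 reconstruction
```

Because the activity signal is shared across all 30 sub-carriers while the noise is not, it concentrates in the leading components, which is the rationale the paragraph gives for the PCA step.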
