The Common Principal Component (CPC) Approach to Functional Time Series (FTS) Models

Abstract: Functional time series (FTS) models are used for analyzing, modeling, and forecasting age-specific mortality rates. However, applying these models in the presence of two or more groups within similar populations requires some modification. In such cases, it is desirable for the disaggregated forecasts to be coherent with the overall forecast, where 'coherent' forecasts are non-divergent forecasts of sub-groups within a population. Reference [1] first proposed a coherent functional model based on products and ratios of mortality rates. In this paper, we relate some functional time series models to the common principal components (CPC) and partial common principal components (PCPC) models introduced by [2] and provide methods to estimate them. We call these common functional principal component (CFPC) models and use them for coherent mortality forecasting. We propose a sequential procedure based on the Johansen methodology to estimate the model parameters, and we use a vector approach with error correction models to forecast the time series coefficients specific to each sub-group.
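
The product-ratio device of reference [1] is easy to sketch. A minimal illustration in Python, assuming two sub-groups and synthetic rate matrices (all names, shapes, and data below are illustrative assumptions, not the paper's code):

```python
# A minimal sketch of the product-ratio decomposition used for coherent
# mortality forecasting (reference [1]); shapes and names are illustrative.
import numpy as np

# Hypothetical smoothed mortality rates: rows = years, cols = ages.
rng = np.random.default_rng(0)
m_female = np.exp(rng.normal(-5.0, 0.1, size=(40, 100)))
m_male = np.exp(rng.normal(-4.8, 0.1, size=(40, 100)))

# Product captures the common (geometric-mean) level of the two groups;
# ratio captures the group-specific departure from that level.
product = np.sqrt(m_female * m_male)
ratio = np.sqrt(m_male / m_female)

# Each component is then modeled by its own functional principal components;
# forecasting the ratio with stationary (e.g. error correction) dynamics
# keeps the sub-group forecasts non-divergent. The decomposition is exact:
assert np.allclose(product * ratio, m_male)
assert np.allclose(product / ratio, m_female)
```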

Fast Iterative Kernel Principal Component Analysis

We present two sets of experiments. In the first, we benchmark against the KHA with a conventional gain decay schedule (9), which we denote KHA/t, in a number of different settings: performing kernel PCA and spectral clustering on the well-known USPS data set (LeCun et al., 1989), replicating the image denoising and face image super-resolution experiments of Kim et al. (2005), and denoising human motion capture data. For Kim et al.'s (2005) experiments we also compare to their original KHA with the constant gain η_t = η_0 they employed. A common feature of all these data sets is
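
The two gain regimes being compared can be sketched quickly. The exact form of schedule (9) is not shown in this excerpt, so the decay below is a common 1/t-style assumption rather than the paper's formula:

```python
# Illustrative gain schedules for the Kernel Hebbian Algorithm.
def constant_gain(t, eta0=0.05):
    # Kim et al. (2005): eta_t = eta_0 for all updates t.
    return eta0

def decayed_gain(t, eta0=0.05, tau=100.0):
    # KHA/t-style decay (assumed form): gain shrinks as updates accumulate.
    return eta0 * tau / (tau + t)

for t in (0, 100, 1000):
    print(t, constant_gain(t), decayed_gain(t))
```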


A Wireless Signal Denoising Model for Human Activity Recognition

Abstract. Some pioneering WiFi-signal-based human activity recognition systems have been proposed; their common characteristic is the use of CSI (Channel State Information). Even in a static environment, CSI values in WiFi signals fluctuate because WiFi devices are susceptible to surrounding electromagnetic noise. General-purpose denoising methods, such as low-pass filters or mean filters, do not perform well in removing these impulse and burst noises. In this paper, we propose a method that uses a low-pass filter and principal component analysis simultaneously. Experimental results show that the features extracted by the PCA method are more distinct than those of the traditional denoising methods.
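
One plausible reading of the proposed pipeline, sketched in Python; the sampling setup, filter cutoff, and CSI matrix below are assumptions for illustration, not the paper's parameters:

```python
# Sketch: low-pass filtering followed by PCA on CSI subcarrier streams.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical CSI amplitudes: 2000 packets x 30 subcarriers.
csi = np.sin(np.linspace(0, 20, 2000))[:, None] + 0.5 * rng.normal(size=(2000, 30))

# Step 1: low-pass filter each subcarrier (activity dynamics are slow
# relative to electromagnetic noise).
b, a = butter(4, 0.1)                  # 4th order, normalized cutoff (assumed)
csi_lp = filtfilt(b, a, csi, axis=0)

# Step 2: PCA across subcarriers; leading components carry the correlated
# motion-induced variation, discarding residual impulse/burst noise.
features = PCA(n_components=3).fit_transform(csi_lp)
print(features.shape)                  # (2000, 3)
```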

A General Framework for Consistency of Principal Component Analysis

A general asymptotic framework is developed for studying consistency properties of principal component analysis (PCA). Our framework includes several previously studied domains of asymptotics as special cases and allows one to investigate interesting connections and transitions among the various domains. More importantly, it enables us to investigate asymptotic scenarios that have not been considered before, and gain new insights into the consistency, subspace consistency and strong inconsistency regions of PCA and the boundaries among them. We also establish the corresponding convergence rate within each region. Under general spike covariance models, the dimension (or number of variables) discourages the consistency of PCA, while the sample size and spike information (the relative size of the population eigenvalues) encourage PCA consistency. Our framework nicely illustrates the relationship among these three types of information in terms of dimension, sample size and spike size, and rigorously characterizes how their relationships affect PCA consistency.

Keywords: High dimension low sample size, PCA, Random matrix, Spike model
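
Spike covariance models of the kind referred to here have a standard generic form; a sketch in conventional notation (the paper's exact normalization may differ):

```latex
% Spiked covariance: m large eigenvalues (spikes) against a unit-variance
% bulk; p is the dimension, n the sample size.
\Sigma_p = \operatorname{diag}\bigl(\lambda_1, \ldots, \lambda_m,
           \underbrace{1, \ldots, 1}_{p-m}\bigr),
\qquad \lambda_1 \ge \cdots \ge \lambda_m > 1 .
```

Consistency of the sample eigenvectors is then governed by how the spike sizes λ_i grow relative to the dimension-to-sample-size ratio p/n.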

IDENTIFICATION OF HOMOGENEOUS RAINFALL STATIONS IN HARYANA

et al. (2017) identified three homogeneous rainfall regions in Tocantins State, Brazil, using Ward's algorithm of cluster analysis. Similarly, Terassi & Galvani (2017) identified homogeneous rainfall regions in the eastern watersheds of the State of Paraná, Brazil. Recently, Siraj-Ud-Doulah & Islam (2019) analyzed monthly rainfall data from 34 climate stations of Bangladesh using five agglomerative hierarchical clustering measures and found that the Ward method based on Euclidean distance, K-means, and Fuzzy clustering were the most suitable methods in that particular case; they identified seven different climate zones in Bangladesh. Similarly, Gonçalves et al. (2018) used annual mean precipitation and found six homogeneous regions through cluster analysis with Ward's agglomeration method, applied to a 31-year historical series (1960-1990) at 413 satellite monitoring points in the state of Pará, in the Amazon, where the selected years coincided with an El Niño or a La Niña event. The aim of this study was to identify homogeneous regions (rain-gauge stations) in Haryana using cluster analysis and common principal component analysis techniques. Monthly rainfall data of 42 years (1970-2011), covering 27 rain-gauge stations of Haryana, were used for the identification.
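
The Ward clustering step common to these studies is simple to sketch; the station count mirrors the Haryana setup, but the monthly-mean features and cluster count below are synthetic assumptions:

```python
# Sketch: grouping rain-gauge stations into homogeneous regions with
# Ward's agglomerative clustering on monthly rainfall statistics.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# 27 stations x 12 mean monthly rainfall values (synthetic stand-in).
monthly_means = rng.gamma(shape=2.0, scale=30.0, size=(27, 12))

Z = linkage(monthly_means, method="ward", metric="euclidean")
labels = fcluster(Z, t=4, criterion="maxclust")  # assume 4 regions
print(labels)                                    # region label per station
```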

Credit Scoring Process using Banking Detailed Data Store

Scope for Non-Statisticians: Analytics is a domain where a statistics background has traditionally been mandatory, which was a big hindrance for non-statisticians wanting to work in the field. The SAS E-Miner tool is user friendly enough, with many statistical operations built in, that even people without a statistics background can work in analytics. SAS E-Miner Analytical Strengths: The analytical strengths of SAS E-Miner include pattern discovery (identifying similar and dissimilar clusters, pattern matching), predictive modeling (predicting future results, consequences, etc.), and credit scoring to rate customers. Credit Scoring Models: the two credit scoring models, Logistic Regression and Principal Component Analysis (PCA), are created for the same dataset.
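
The two-model comparison on one dataset can be sketched outside SAS E-Miner as well; a minimal Python illustration with synthetic data standing in for the banking detailed data store (all data and parameters here are assumptions):

```python
# Sketch: plain logistic regression vs. logistic regression on principal
# components, fitted to the same (synthetic) credit dataset.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

lr = LogisticRegression(max_iter=1000)
pca_lr = make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000))

print("LR      :", cross_val_score(lr, X, y, cv=5).mean())
print("PCA + LR:", cross_val_score(pca_lr, X, y, cv=5).mean())
```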

Principal Component Analysis of the Volatility Smiles and Skews

• Fengler, M., W. Hardle and C. Villa (2000), "The Dynamics of Implied Volatilities: A Common Principal Component Approach", preliminary version (September 2000), available from fengler@wiwi.hu-berlin.de
• Skiadopoulos, G., S. Hodges and L. Clewlow (1998), "The Dynamics of


Application of Dimensionality Reduction on Classification of Colon Cancer Using ICA and K-NN Algorithm

Dimension reduction is an effective and essential tool used to analyze microarray datasets ([APM11]). A number of algorithms and feature extraction techniques have been put forward in the literature for the reduction of dimensionality ([RT15]). Principal Component Analysis (PCA) is one of the most widely used dimensionality reduction techniques; it is an unsupervised and relatively effective tool, but it is not considered efficient for datasets that are complex and high-dimensional ([APM11]). Therefore, there is a need to address the inability of PCA to precisely retrieve the genuine latent features of complex datasets ([APM11]). Data in a very high-dimensional space often lie in a lower-dimensional space, and an unsupervised feature extraction technique such as PCA may not be totally efficient.
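
The ICA-plus-k-NN pipeline named in the title can be sketched in a few lines; the synthetic matrix below stands in for the microarray data, and the component and neighbor counts are assumptions, not the paper's settings:

```python
# Sketch: ICA for dimensionality reduction, then k-NN classification.
from sklearn.datasets import make_classification
from sklearn.decomposition import FastICA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: the colon dataset has 62 samples (tumor/normal)
# and thousands of gene expression features.
X, y = make_classification(n_samples=62, n_features=500, n_informative=20,
                           random_state=0)

ica_knn = make_pipeline(FastICA(n_components=10, random_state=0),
                        KNeighborsClassifier(n_neighbors=3))
print(cross_val_score(ica_knn, X, y, cv=5).mean())
```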

Dimension reduction of machine learning-based forecasting models employing Principal Component Analysis

Considering the performance of the WANN model, it can be evaluated as satisfactory, since it has a high coefficient of determination and a low root mean square error; it also appears economical in terms of computational cost. However, a comparison between the results of the WANN model and the ANN model shows that the WANN model takes more time to implement and is more expensive in terms of computational effort and complexity, yet still performs worse. This should not be misread as evidence that the DWT does not improve the performance of the existing ANN model; rather, it is mainly due to feeding the WANN model with too many input variables (20 sub-signals) which are inter-correlated. The inputs should therefore be manipulated to select only the appropriate sub-signals, toward increasing the accuracy and reliability of the model outputs. In this regard, principal component analysis was performed to derive the most efficient sub-signals to be used in the input structure of the models. Similarly, for the WANFIS model, using all 20 sub-signals as the model input leads to the generation of too many rules, which cannot be efficiently executed by ordinary CPUs. Its performance in terms of the error measures therefore cannot be assessed, and this model with the current input structure requires further amendments before it can be employed for forecasting purposes. In Table 3, NAN indicates that the WANFIS model could not be implemented due to the excessive number of rules.
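
The PCA compression step can be sketched directly; the synthetic sub-signal matrix and the 95% variance threshold below are assumptions for illustration:

```python
# Sketch: compress 20 inter-correlated wavelet sub-signals into a few
# principal components to use as forecasting-model inputs.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
base = rng.normal(size=(500, 4))                 # a few underlying drivers
mix = rng.normal(size=(4, 20))
subsignals = base @ mix + 0.1 * rng.normal(size=(500, 20))  # 20 correlated series

pca = PCA(n_components=0.95)      # keep components explaining 95% of variance
inputs = pca.fit_transform(subsignals)
print(inputs.shape)               # far fewer than 20 model inputs
```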

Convex Formulations for Fair Principal Component Analysis

We next consider a selection of datasets from UC Irvine's online Machine Learning Repository (Lichman 2013). For each of the datasets, one attribute was selected as a protected class, and the remaining attributes were considered part of the feature space. After splitting each dataset into separate training (70%) and testing (30%) sets, the top five principal components were found for the training set of each dataset three times: once unconstrained; once with (7) with only the mean constraints (excluding the covariance constraints) and δ = 0; and once with (7) with both the mean and covariance constraints, δ = 0 and µ = 0.01. The test data were then projected onto these vectors. All data were normalized to have unit variance in each feature, which is common practice for datasets with features of incomparable units. For each instance, we estimated ∆(F) using the test set and for the families of linear SVMs F_v and Gaussian kernel
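
The experimental protocol's unconstrained baseline is straightforward to sketch; the fairness-constrained variants of formulation (7) are semidefinite programs and are not reproduced here, and the dataset below is a stand-in rather than one the paper used:

```python
# Sketch: normalize to unit variance, split 70/30, fit top-5 PCs on the
# training split, project the test split.
from sklearn.datasets import load_breast_cancer   # stand-in UCI dataset
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_train, X_test = train_test_split(X, train_size=0.7, random_state=0)

scaler = StandardScaler().fit(X_train)            # unit variance per feature
pca = PCA(n_components=5).fit(scaler.transform(X_train))
projected_test = pca.transform(scaler.transform(X_test))
print(projected_test.shape)                       # (n_test, 5)
```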

Principal component gene set enrichment (PCGSE)

…estimate this correlation or make simplifying assumptions about the correlation structure, they are likely the most accurate of the statistical tests supported by PCGSE and are therefore used to evaluate the performance of the parametric and correlation-adjusted parametric tests. The exact permutation test was also used as a "gold standard" in Zhou et al. [34]. Although they provide superior handling of inter-gene correlation, permutation tests do suffer from two important disadvantages relative to parametric tests: computational complexity and lower power to detect gene sets whose members all have a small common association with the outcome. Because of these disadvantages, correlation-adjusted parametric tests are preferred for most PCGSE applications.

Unobserved common factors in military expenditure interactions across MENA countries

In regional contexts such as the MENA region, if there are unobserved common shocks that influence all countries, there is likely to be cross-sectional dependence, i.e. correlation between the residuals in a panel time-series model. If these common shocks are correlated with the regressors, the conventional estimators are biased and inconsistent. In this study we explore the pattern of interactions between military expenditure shares in the MENA region over the period 1979-2007. The unobserved common shocks arise from economic influences (e.g. oil and aid inflows), political and social influences (e.g. militant oppositions), as well as arms race and alliance influences. To identify the unobserved common factors we apply Principal Component Analysis (PCA) to the shares of military expenditures in the region and to the residuals from a military demand equation. To evaluate the results from the PCA, we use the multiple-indicator multiple-cause (MIMIC) model, which enables us to validate which observable variables account for the two most important estimated factors.
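
The factor-extraction step can be sketched on a synthetic panel; the country count and two-factor structure below are illustrative assumptions, not the study's data:

```python
# Sketch: PCA on a T x N panel of military-expenditure shares to recover
# unobserved common factors driving all countries.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
T, N = 29, 15                        # 1979-2007; hypothetical 15 countries
common = rng.normal(size=(T, 2))     # two unobserved common shocks
loadings = rng.normal(size=(2, N))
shares = common @ loadings + 0.3 * rng.normal(size=(T, N))

pca = PCA(n_components=2)
factors = pca.fit_transform(shares)            # estimated common factors
print(pca.explained_variance_ratio_)           # importance of each factor
```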

An Eigenvalue test for spatial principal component analysis

To assess the performance of our test, we simulated genetic data under three migration models: island (IS) and stepping stone (SS), using the software GenomePop 2.7 [11], and isolation by distance (IBD), using IBDSimV2.0 [12]. We simulated the IS and SS models with 4 populations, each with 25 individuals, and a single population under IBD with 100 individuals. 200 unlinked biallelic diploid loci (single nucleotide polymorphisms; SNPs) were simulated. Populations evolved under a constant effective population size θ = 20 and interchanged migrants at three different symmetric and homogeneous rates (0.005, 0.01, and 0.1). We performed 100 independent runs for each of the three migration rates, for a total of 300 simulated datasets per migration model. Example input files for GenomePop 2.7 and IBDSimV2.0 are included as Additional files 1 and 2.

Component retention in principal component analysis with application to cDNA microarray data

where S is the sample covariance matrix. The researcher decides on a satisfactory value for t(k) and then determines k accordingly. The obvious problem with the technique is deciding on an appropriate t(k). In practice it is common to select levels between 70% and 95% [9]. Jackson [7] argues strongly against the use of this method except possibly for exploratory purposes when little is known about the population of the data. An obvious problem occurs when several eigenvalues are of similar magnitude. For example, suppose for some k = k*, t(k*) = 0.50 and the remaining q − k eigenvalues have approximately the same magnitude. Can one justify adding more components until some predetermined value of t(k) is reached? Jolliffe [9] points out that the rule is equivalent to looking at the spectral decomposition of S. Determining how many terms to include in the decomposition is closely related to
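
The equation preceding this excerpt was truncated; t(k) as described is the standard cumulative-proportion-of-variance criterion, reconstructed here in conventional notation with λ_1 ≥ … ≥ λ_q the ordered eigenvalues of S:

```latex
% Proportion of total variance retained by the first k components:
t(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{q} \lambda_i}
     = \frac{\sum_{i=1}^{k} \lambda_i}{\operatorname{tr}(S)} .
```

The second equality holds because the trace of S equals the sum of its eigenvalues, which is why the rule amounts to inspecting the spectral decomposition of S.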

2D QSAR Studies on 1,4-dihydropyridines as Ca++ Channel Blockers

Calcium channel blockers are widely used for the treatment of various cardiac disorders. The existing calcium channel blockers have several shortcomings; hence there is a need to develop drugs with a better therapeutic profile. The 2D-QSAR approach has been useful in such cases. A number of 1,4-dihydropyridines such as amlodipine are extensively used in the therapy of cardiovascular disorders. Given the importance of calcium channel blockers, a series of 1,4-dihydropyridines was selected, and models based on multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) regression analysis were generated to find the correlation between the physicochemical parameters and the biological activity. MLR coupled with stepwise variable selection led to a statistically significant model compared to PLS and PCR with respect to r² (coefficient of determination, 0.8986) and q² (cross-validation, > 0.5). Four descriptors are included in the 2D-QSAR equation generated using MLR.
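
The three regression strategies being compared can be sketched side by side; the descriptor matrix and activities below are synthetic stand-ins for the dihydropyridine series, and the component counts are assumptions:

```python
# Sketch: MLR with stepwise-style variable selection vs. PCR vs. PLS.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 15))                  # 15 physicochemical descriptors
y = X[:, :4] @ np.array([1.0, -0.5, 0.8, 0.3]) + 0.2 * rng.normal(size=40)

mlr = make_pipeline(
    SequentialFeatureSelector(LinearRegression(), n_features_to_select=4),
    LinearRegression())                        # 4 descriptors, as in the paper
pcr = make_pipeline(PCA(n_components=4), LinearRegression())
pls = PLSRegression(n_components=4)

for name, model in [("MLR", mlr), ("PCR", pcr), ("PLS", pls)]:
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))   # r^2 on the training data
```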

fACTOR-ANALYSIS-fd-ts.pptx

• Although both component and common factor analysis models yield similar results in common research settings (30 or more variables, or communalities of .60 for most variables), the […]


Data Warehouse Architecture for DSS Applications

As different data models may be employed by the underlying component databases, it becomes imperative to map them to a common data model. The export schema provides a common representation […]


Dimensionality Reduction Techniques for Improved Diagnosis of Heart Disease

In a factor analysis model, the measured variables depend on a few latent factors. Each factor affects several variables in common, hence they are known as common factors (referred to as 'Factors 3' in this paper). Each variable can be represented as a linear combination of the common factors; the coefficients of this linear equation are known as loadings. Each measured variable also includes a component due to independent random variability, known as the specific variance because it is specific to one variable. Factor analysis models the correlation structure in terms of k factors plus measurement errors. Unlike PCA, the factors are not sorted by any criterion.
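
The model described here can be written compactly; a sketch in standard k-factor notation (the symbols are conventional, not necessarily the paper's):

```latex
% k-factor model: \Lambda is the p x k loading matrix, f the common
% factors, u the specific errors with diagonal covariance \Psi
% (the specific variances).
x = \mu + \Lambda f + u, \qquad
\operatorname{Cov}(x) = \Lambda \Lambda^{\top} + \Psi .
```

The decomposition of Cov(x) makes the contrast with PCA explicit: factor analysis models only the shared correlation structure through ΛΛᵀ, setting the variable-specific variability aside in Ψ.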

PRINCIPAL COMPONENT ANALYSIS

C. Review all of the surviving variables with high loadings on component 1 to determine the nature of this component. From the rotated factor pattern, you can see that only items 4, 5, and 6 load on component 1 (note the asterisks). It is now necessary to turn to the questionnaire itself and review the content of the questions in order to decide what a given component should be named. What do questions 4, 5, and 6 have in common? What common construct do they seem to be measuring? For illustration, the questions being analyzed in the present case are reproduced here. Remember that question 4 was represented as V4 in the SAS program, question 5 was V5, and so forth. Read questions 4, 5, and 6 to see what they have in common.

Linguistic pitch analysis using functional principal component mixed effect models

covariates. However, it is reassuring to note that, despite allowing the contours to be nonparametrically specified, the first component conformed to expected linguistic theory for Luobuzhai Qiang, in that the most important aspect of the tonal change is a shift rather than a contour change. In particular, the largest contributing covariates to the first eigenfunction were gender, tone, vowel type and sentence type. The random effects of subject and word item were both also significant. This indicates that the shift is speaker dependent, as well as dependent on the word item being said. While these effects are still relatively small in comparison to the effects of gender and tone, their significance shows that it is still important to consider the random nature of these effects in the analysis.
