The PLS method in its traditional form does not consider the local information of data. To tackle this problem, a new dimension reduction method called locality preserving partial least squares (LPPLS) (Zhang et al., 2016) was proposed. LPPLS is obtained by introducing local information into the objective function of PLS. Similarly, a modification of Fisher discriminant analysis (FDA) called locality preserving Fisher discriminant analysis (LPFDA) (Zhao and Tian, 2009) was proposed to preserve the local structure of data. This method combines the ideas of FDA and locality preserving projections (LPP) to perform dimension reduction. LPFDA has the discriminating ability of FDA and the locality preserving ability of LPP. This characteristic makes LPFDA more efficient than FDA.
Structural equation modeling (SEM) and path modeling with latent variables (LVP) are used to measure complex cause-effect relationships (Fornell/Larcker 1981; Chin 1998a; Steenkamp/Baumgartner 2000). Such models are often applied in marketing to perform research on brand equity (Yoo et al. 2000), consumer behavior (Sargeant et al. 2006) or customer satisfaction (Anderson/Sullivan 1993; Chun/Davies 2006). Covariance-based structural equation modeling (CBSEM, Jöreskog 1978) and partial least squares analysis (PLS, Lohmöller 1989) constitute the two corresponding, yet distinctive (Schneeweiß 1991), statistical techniques for assessing cause-effect relationship models with latent variables. Compared to CBSEM, Wold's (1982) basic PLS design, or basic method of soft modeling, is a different rather than an alternative methodology for estimating these models (Fornell/Bookstein 1982). Soft modeling refers to the ability of PLS to be more flexible in handling various modeling problems in situations where it is difficult or impossible to meet the hard assumptions of more traditional multivariate statistics. Within this context, "soft" is attributed only to distributional assumptions and not to the concepts, the models or the estimation techniques (Lohmöller 1989). However, CBSEM is customarily used in marketing to estimate relationships in cause-effect models via latent variables and empirical data. Apparently, there has been little concern about the frequent inability of marketing data to meet methodological requirements or about the common occurrence of improper solutions (Fornell/Bookstein 1982; Baumgartner/Homburg 1996). Although it represents a well-substantiated alternative to CBSEM, PLS is relatively unknown and rarely used in marketing research, a neglect that fails to appreciate its importance for estimating LVP in a variety of contexts, spanning theoretical and applied research in marketing, management and other social science disciplines.
Recognizable PLS-based LVP analyses in business research are presented by, for example, Fornell et al. (1985); Fornell et al. (1990); Fornell et al. (1996); Gray/Meister (2004); Venkatesh/Agarwal (2006). Nevertheless, the statistical instruments needed to complement the PLS method for business research are not well developed.
With the exponentially growing volume of data sets, multivariate methods for reducing dimensionality are an important research area in statistics. For combining two data sets, Partial Least Squares (PLS) regression [28] is a popular dimension reduction method [1]. PLS decomposes the variation in each data set into a joint part and a residual part. The joint part is a linear projection of one data set on the other that best explains the covariance between the two data sets. These projections are obtained by iterative algorithms, such as NIPALS [28]. Partial Least Squares is popular in chemometrics [3]. In this field, the focus is on the development of algorithms with good prediction performance, while the underlying model is less important. For applications in the life sciences, interpretation of parameter estimates is necessary to gain understanding of the underlying molecular mechanisms.
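To make the iterative projection step concrete, the following is a minimal NumPy sketch of a NIPALS-style PLS loop. It is an illustration under stated assumptions rather than the algorithm of [28] as published: the function name `nipals_pls`, its arguments, the starting vector and the convergence settings are all hypothetical choices. Both data sets are centred, one component is extracted at a time, and the joint part is removed by deflation so that what remains is the residual part.

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """Illustrative NIPALS-style PLS (hypothetical helper, not from the source).

    Returns X scores T, X weights W, X loadings P and Y loadings Q,
    one column per extracted component.
    """
    X = X - X.mean(axis=0)              # centre copies; inputs are not modified
    Y = Y - Y.mean(axis=0)
    T, W, P, Q = [], [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                   # assumed start: first column of Y
        for _it in range(max_iter):
            w = X.T @ u                 # X weights: direction covarying with u
            w /= np.linalg.norm(w)
            t = X @ w                   # X scores
            q = Y.T @ t / (t.T @ t)     # Y loadings
            u_new = Y @ q / (q.T @ q)   # updated Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = X.T @ t / (t.T @ t)         # X loadings
        X = X - t @ p.T                 # deflate: remove the joint part
        Y = Y - t @ q.T
        T.append(t); W.append(w); P.append(p); Q.append(q)
    return np.hstack(T), np.hstack(W), np.hstack(P), np.hstack(Q)
```

Because each deflation removes the current score vector from X, the columns of `T` come out mutually orthogonal, which is a convenient sanity check when experimenting with the sketch.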
We have suggested a regularized backward elimination algorithm for variable selection using Partial Least Squares, where the focus is to obtain a hard, and at the same time stable, selection of variables. In our proposed procedure, we compared three PLS-based selection criteria, and all produced good results with respect to size of selected model, model performance and selection stability, with a slight overall improvement for the VIP criterion. We obtained a huge reduction in the number of selected variables compared to using the models with optimum performance based on training. The apparent loss in performance compared to the optimum based models, as judged by the fit to the training set, virtually disappears when evaluated on a separate test set. Our selected model performs at least as well as three alternative methods, Forward, Lasso and ST-PLS, on the present test data. This also indicates that the regularized algorithm not only obtains models with superior interpretation potential, but also improves stability with respect to classification of new samples. A method like this could have many potential uses in genomics,
The T² ellipse assisted analysis method of partial least squares (PLS) is able to recognize noise; however, it fails to analyze noise in multidimensional space. In this paper, we propose using the slacks-based measure (SBM) algorithm to optimize PLS. First, the sample data are evaluated comprehensively with SBM to obtain the valid data. Second, the valid data are analyzed with partial least squares regression (PLSR). These two steps avoid the impact of noisy data on regression accuracy and complement the aided analysis technology of PLSR. In traditional Chinese medicine (TCM) experiments with two dependent variables, the average relative errors of PLS optimized with the SBM algorithm are 5.0844% and 8.7485%, lower than the results obtained with PLSR alone (5.5825% and 9.2810%). In addition, for the single dependent variable of the tool wear test data, the average relative error after SBM optimization is 2.6984%, lower than the 3.3526% obtained with PLS. The experimental results indicate that the regression precision of PLSR optimized by SBM is higher than that of PLSR alone.
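The comparison above rests on the average relative error. The excerpt does not give the formula, so the sketch below assumes the common definition, the mean of |predicted - observed| / |observed| expressed in percent; the function name `average_relative_error` is hypothetical.

```python
def average_relative_error(y_true, y_pred):
    """Mean relative error in percent (assumed definition, see lead-in)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Relative deviation of each prediction from its observed value.
    ratios = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)
```

Under this definition, observed values of 100 and 200 with predictions of 110 and 190 give relative errors of 10% and 5%, hence an average of 7.5%. Observed values of zero are not handled and would need a separate convention.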
Heuristically, we can directly employ unsupervised methods used in single-label learning, e.g., PCA [7]. We can also use matrix factorization methods to obtain low-dimensional representations, such as manifold kernel concept factorization [13] and discriminant orthogonal nonnegative matrix factorization [14]. However, it is well known that dimensionality reduction performs better when guided by supervised information, such as pairwise constraints or the labels themselves. This poses a challenge for multi-label data, since several labels might be associated with each data point. If we treat each label set as an individual class, the number of label combinations is usually too large to handle, and the label correlations are neglected as well. To this end, a number of methods have emerged to address this issue for regression and classification [8, 15, 16, 17, 18]. Among these methods, Partial Least Squares (PLS) [9] and Canonical Correlation Analysis (CCA) [8] are two representative ones, which are used for finding the relationships between two sets of variables.
Given the high development and application potential of on-line (automated sample extraction and delivery to the analyzer) and in-line (in situ analysis using a probe inside the process) measurements in wastewater quality monitoring [4], UV-Visible (UV-Vis) spectrophotometry can be very useful in this area, contributing to the correct operation of the treatment systems. UV-Vis spectrophotometry is a fast and simple method that has been used for wastewater quality evaluation and organic matrix composition identification [5, 6], since most organic compounds and a few soluble minerals (such as nitrates) absorb in the UV-Vis region. The reported quantitative environmental applications of UV-Vis spectrophotometry include the estimation of organic matter and nitrate in wastewaters [7] and the determination of polycyclic aromatic hydrocarbons (PAH) in soils [8]. In the present work, UV-Vis spectra of samples collected in a fuel park Wastewater Treatment Plant (WWTP), after biological treatment, were acquired and used for the attempted development of Partial Least Squares (PLS) calibration models for four environmental monitoring parameters, namely, Chemical Oxygen Demand (COD), 5-day Biochemical Oxygen Demand (BOD5), Total Sus-
The use of structural equation models based on composites (PLS-SEM) has spread widely of late in areas such as marketing (Hair, Sarstedt, Ringle & Mena, 2012b; Henseler, Ringle & Sinkovics, 2009; Hult, Hair, Proksch, Sarstedt, Pinkwart & Ringle, 2018), information systems (Petter, 2018; Ringle, Sarstedt & Straub, 2012; Roldán & Sánchez-Franco, 2012; Sharma, Sarstedt, Shmueli, Kim & Thiele, 2019; Urbach & Ahleman, 2010), tourism (do Valle & Assaker, 2016; Duarte & Amaro, 2018; Kumar & Purani, 2018; Latan, 2018; Usakli & Kucukergin, 2018), health sciences (Avkiran, 2018) and human resources (Ringle, Sarstedt, Mitchell, & Gudergan, 2018), among others. The two most recent reviews in the operations management area (Kaufmann & Gaeckler, 2015; Peng & Lai, 2012) came prior to the intense debate about the methodology that shook the Partial Least Squares (PLS) community between 2014 and 2018. New tools continue to be developed, and the reporting standards for articles that use PLS as an analysis tool continue to be revised.
This study investigates the factors that influence the intention of citizens of the United Arab Emirates (UAE) to use a mobile government (mGov) platform. The UAE is one of the leading countries that offer this service in the Arab world. Advanced mobile technologies have transformed how people access government services and have allowed them to benefit from these services from any place at any time. Smartphones and mobile-application-based technologies are examples. Drawing on technology acceptance theories and relevant literature, this study develops a structural model that integrates the major theories of technology adoption – i.e., technology acceptance model (TAM), diffusion of innovation (DOI), and trust model – in order to investigate the predictors of mGov adoption in the UAE. The proposed model includes six main factors as significant predictors of intention to use mGov. The model is tested using structural equation modeling (SEM) with a partial least squares (PLS) approach on data collected using an online verified questionnaire. Results indicate significant support for four of the six factors (the influence of compatibility, perceived ease of use, social influence, and trust in technology), but do not support the other two factors (perceived usefulness and trust in government). The study concludes with implications of the research and suggestions for future study.
detect dysregulated genes. This statistical procedure ignored unaccounted array-specific factors, including various biological and environmental factors. Previous studies [7,8] have suggested that partial least squares (PLS) based expression profile analysis is efficient in dealing with large numbers of genes and fairly small samples. Compared with variance and regression analysis, PLS based analysis is more sensitive while maintaining reasonably high specificity and small false discovery and false non-discovery rates. A previous study using PLS analysis on another complex disease, breast cancer, has demonstrated its feasibility [9]. Therefore, capturing the gene expression signature in renal failure patients by using PLS based analysis may provide new understanding of the pathogenesis and offer potential therapeutic targets.
Partial Least Squares (PLS) finds latent variables that are associated with the maximum variation in process data and provides diagonal pairings of latent variables that are as strong as possible. PLS facilitates identifying an empirical model from plant data without making any assumptions. First proposed by Wold (1966), PLS has been successfully applied in diverse fields including process monitoring, identification and control. It deals with noisy and highly correlated data, quite often with only a limited number of observations available [15]. A tutorial description of the PLS model, along with some examples, was provided by Geladi and Kowalski (1986) [16]. When dealing with nonlinear systems, the underlying nonlinear relationship between predictor variables (X) and response variables (Y) can be approximated by quadratic PLS (QPLS) or splines. These approximations may not work well when the nonlinearities cannot be described by a quadratic relationship. Artificial Neural Networks (ANN) can be used as the inner relation to handle such nonlinearities [17-23]. This approach employs the neural network as the inner model while keeping the linear PLS algorithm as the outer mapping framework. Conventional PLS is suitable for modeling time-independent or steady-state processes. Kaspar and Ray (1993) developed a dynamic extension of PLS models by filtering the process inputs and then applying the standard PLS algorithm, and demonstrated their approach on an identification and control problem using linear models [24]. Lakshminarayanan (1997) proposed the ARX/Hammerstein model as a modified PLS inner relation and used it successfully to identify dynamic models and to propose PLS based feedforward and feedback controllers [25]. Damarla & Kundu (2011) proposed a PLS based artificial neural network scheme for identification and control of a distillation process [26].
Kaspar & Ray, Lakshminarayanan, and Damarla & Kundu have proposed closed-loop control systems that use pre- and post-compensators obtained from the loadings of the PLS model to map outputs and inputs into physical variables.
Several studies have shown that the partial least squares algorithm is competitive with other regression methods such as ridge regression and principal component regression, generally needing fewer iterations than the latter to achieve comparable estimation and prediction; see, e.g., Frank and Friedman (1993), Krämer and Braun (2007) and Singer et al. (2016). For an overview of further properties of PLS we refer to Rosipal and Krämer (2006).
Partial Least Squares (PLS) [6,7] explains data with multiple independent and multiple dependent variables well. However, the principal components extracted by PLS are linear combinations of the column vectors of the independent variables, so a model that uses these components and the dependent variables in Multiple Linear Regression (MLR) remains essentially linear. PLS therefore does not perform well when applied directly to raw TCM data.
Our proposed method for constructing candidate models is named partial least squares model averaging (PLSMA). To construct candidate models, we apply partial least squares (PLS) to reduce and transform the original explanatory variables into new variables called components, which are then used to build the candidate models. We choose PLS for at least two reasons. First, PLS was developed to handle regression analysis in high-dimensional data (where the number of observations is smaller than the number of explanatory variables). Second, the components constructed by PLS satisfy three conditions: they are highly correlated with the response variables, they capture much of the variance among the explanatory variables, and they are uncorrelated with each other.
Bollen 2009), there is a paucity of research on this topic in partial least squares structural equation modeling (PLS-SEM; Wold 1982), whose usage has recently gained momentum in international marketing (Richter et al. 2016) and business research in general (e.g., Ali et al. 2018; Nitzl 2016; Ringle et al. 2018). Some researchers even claim that PLS-SEM does not allow for addressing endogeneity at all (e.g., Antonakis et al. 2010; McIntosh et al. 2014; Rönkkö and Evermann 2013). This assertion is astonishing and inaccurate given that PLS-SEM is grounded in regression analysis (Lohmöller 1989, Chapter 2), for which numerous approaches for handling endogeneity exist (e.g., Ebbes et al. 2005; Park and Gupta 2012; Staiger and Stock 1997). Indeed, Benitez et al. (2016) recently made an advance in this direction by combining the standard PLS-SEM algorithm with the two-stage least squares (2SLS) method, but did not consider variables that control for endogeneity's sources.
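For readers unfamiliar with 2SLS, the following NumPy sketch shows the plain two-stage estimator on ordinary regression data. It is not the combined PLS-SEM procedure of Benitez et al. (2016); the function name `two_stage_least_squares` and its arguments are hypothetical, and the sketch assumes that every column of `X_endog` is endogenous and that the instrument matrix `Z` has at least as many columns as `X_endog`.

```python
import numpy as np

def two_stage_least_squares(y, X_endog, Z):
    """Plain 2SLS sketch (hypothetical helper, assumptions in the lead-in)."""
    # Stage 1: project the endogenous regressors onto the instruments.
    first_stage_coef = np.linalg.lstsq(Z, X_endog, rcond=None)[0]
    X_hat = Z @ first_stage_coef
    # Stage 2: regress the outcome on the fitted (instrumented) regressors.
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    return beta
```

When an unobserved variable drives both a regressor and the outcome, ordinary least squares absorbs that shared variation into the coefficient, whereas the second stage here uses only the part of the regressor explained by the instruments.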
Abstract. Person Re-Identification is an important task in surveillance and security systems. Whilst most methods work by extracting features from the entire image, the best methods improve performance by prioritising features from foreground regions during the feature extraction stage. In this paper, we propose the use of a Partial Least Squares Regression model to predict the skeleton of a person, allowing us to prioritise features from a person's limbs rather than from the background. Once the foreground area has been identified, we use the LOMO [10] and Salient Colour Names [21] features. We then use the XQDA [10] Distance Metric Learning method to compute the distance between each of the feature vectors. Experiments on VIPeR [4], QMUL GRID [13–15] and CUHK03 [9] data sets demonstrate significant improvements against state-of-the-art.
Abstract—The speech signal is one of the major means of communication and carries not only semantic but also personal information, such as gender and emotion. Research on speech emotion has become increasingly important to human-computer interaction. To this end, long-term and short-term emotional features are extracted from speech, and their dimensionality is then reduced by means of the multilinear PCA algorithm. Finally, kernel partial least squares regression is used for speech emotion recognition. The results show that, in comparison with other current classifiers, the proposed algorithm can improve recognition rates by about 6% to 23%.
A tensor is used to describe head-related transfer functions (HRTFs) depending on frequencies, sound directions, and anthropometric parameters. It keeps the multi-dimensional structure of measured HRTFs. To construct a multi-linear HRTF personalization model, an individual core tensor is extracted from the original HRTFs using high-order singular value decomposition (HOSVD). The individual core tensor in lower-dimensional space acts as the output of the multi-linear model. Some key anthropometric parameters are selected as the inputs of the model by Laplacian scores and correlation analyses between all the measured parameters and the individual core tensor. Then, the multi-linear regression model is constructed by high-order partial least squares (HOPLS), aiming to seek a joint subspace approximation for both the selected parameters and the individual core tensor. The numbers of latent variables and loadings are used to control the complexity of the model and prevent overfitting. Compared with the partial least squares regression (PLSR) method, objective simulations demonstrate better performance in predicting individual HRTFs, especially for the sound directions ipsilateral to the concerned ear. The subjective listening tests show that the predicted individual HRTFs approximate the measured HRTFs for sound localization.
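The core-tensor extraction step can be illustrated with a small NumPy sketch of an untruncated HOSVD. The helper names `unfold`, `mode_product` and `hosvd` are hypothetical, and the sketch covers only the generic decomposition: it does not reproduce the paper's HRTF data layout, its choice of which modes form the individual core tensor, or any rank truncation.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Untruncated HOSVD: returns the core tensor and one factor per mode."""
    # Each factor holds the left singular vectors of the mode-n unfolding.
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, U in enumerate(factors):
        core = mode_product(core, U.T, m)   # project onto each mode basis
    return core, factors
```

Multiplying the core back by each factor along its mode recovers the original tensor, as long as no mode is longer than the product of the remaining ones. A lower-dimensional core would be obtained by keeping only the leading columns of each factor.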
Classical PCR, PLS and RR techniques are well known shrinkage estimators designed to deal with multicollinearity (see, e.g., Frank and Friedman, 1993, Montgomery and Peck, 1992, Jolliffe, 1986). The multicollinearity or near-linear dependence of regressors is a serious problem which can dramatically influence the effectiveness of a regression model. Multicollinearity results in large variances and covariances for the least squares estimators of the regression coefficients. Multicollinearity can also produce estimates of the regression coefficients that are too large in absolute value. Thus the values and signs of estimated regression coefficients may change considerably given different data samples. This effect can lead to a regression model which fits the training data reasonably well, but generalizes poorly to new data (Montgomery and Peck, 1992). This fact is in a very close relation to the argument stressed in (Smola et al., 1998), where the authors have shown that choosing the flattest linear regression function in a feature space can, based on the smoothing properties of the selected kernel function, lead to a smooth nonlinear function in the input space.
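The variance inflation can be seen directly from the standard result that the covariance of the least squares estimator is sigma^2 (X^T X)^{-1}. The NumPy sketch below compares two simulated designs; the data, the seed, the noise level 0.01 and the helper name `ols_coef_variances` are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def ols_coef_variances(X, sigma2=1.0):
    """Diagonal of sigma^2 (X^T X)^{-1}: variances of the OLS coefficients."""
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2_indep = rng.normal(size=200)                 # unrelated second regressor
x2_collin = x1 + 0.01 * rng.normal(size=200)    # nearly a copy of x1

# Coefficient variances for the well-conditioned and the near-collinear design.
var_indep = ols_coef_variances(np.column_stack([x1, x2_indep]))
var_collin = ols_coef_variances(np.column_stack([x1, x2_collin]))
```

In the near-collinear design, X^T X is close to singular, so the entries of its inverse, and with them the coefficient variances, become orders of magnitude larger than in the design with unrelated regressors.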
To define the optimal number of factors required to construct the calibration models, PRESS (Prediction Error Sum of Squares) was plotted as a function of the ten factors to be evaluated. The results are shown in Figure 2. The optimal number of factors was selected by applying the local minimum criterion with respect to PRESS and the percentage of accumulated variance [21, 22], where two