The Partial Least Squares Regression procedure estimates partial least squares (PLS, also known as "projection to latent structure") regression models. PLS is a predictive technique that is an alternative to ordinary least squares (OLS) regression, canonical correlation, or structural equation modeling, and it is particularly useful when predictor variables are highly correlated or when the number of predictors exceeds the number of cases.
PLS combines features of principal components analysis and multiple regression. It first extracts a set of latent factors that explain as much of the covariance as possible between the independent and dependent variables. Then a regression step predicts values of the dependent variables using the decomposition of the independent variables.
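The two steps described above can be illustrated with a minimal NumPy sketch for a single scale dependent variable. This is a textbook NIPALS-style PLS1 routine, not the code used by the procedure; the function name `pls1_nipals` and its arguments are hypothetical, and the sketch only centers the data rather than standardizing it.

```python
import numpy as np

def pls1_nipals(X, y, n_factors):
    """Illustrative PLS1 fit: extract latent factors, then regress on them."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centered working copies
    p = X.shape[1]
    W = np.zeros((p, n_factors))               # latent factor weights
    P = np.zeros((p, n_factors))               # latent factor loadings
    q = np.zeros(n_factors)                    # response loadings
    for a in range(n_factors):
        w = Xk.T.dot(yk)                       # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk.dot(w)                          # latent factor scores
        tt = t.dot(t)
        p_a = Xk.T.dot(t) / tt
        q_a = yk.dot(t) / tt
        Xk = Xk - np.outer(t, p_a)             # deflate X by the extracted factor
        yk = yk - q_a * t                      # deflate y by the extracted factor
        W[:, a], P[:, a], q[a] = w, p_a, q_a
    # Regression step: express the factor model as coefficients on the original X
    coef = W.dot(np.linalg.solve(P.T.dot(W), q))
    intercept = y_mean - x_mean.dot(coef)
    return coef, intercept
```

When the number of factors equals the number of predictors and the predictors have full rank, this sketch reproduces the OLS coefficients; with fewer factors it yields a regularized fit, which is where PLS is useful for correlated predictors.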
Tables. Proportion of variance explained (by latent factor), latent factor weights, latent factor loadings, independent variable importance in projection (VIP), and regression parameter estimates (by dependent variable) are all produced by default.
Charts. Variable importance in projection (VIP), factor scores, factor weights for the first three latent factors, and distance to the model are all produced from the Options tab.
Partial Least Squares Regression Data Considerations
Measurement level. The dependent and independent (predictor) variables can be scale, nominal, or ordinal. The procedure assumes that the appropriate measurement level has been assigned to all variables, although you can temporarily change the measurement level for a variable by right-clicking the variable in the source variable list and selecting a measurement level from the pop-up menu. Categorical (nominal or ordinal) variables are treated equivalently by the procedure.
Categorical variable coding. The procedure temporarily recodes categorical dependent variables using one-of-c coding for the duration of the procedure. If there are c categories of a variable, then the variable is stored as c vectors, with the first category denoted (1,0,...,0), the next category (0,1,0,...,0), ..., and the final category (0,0,...,0,1). Categorical dependent variables are represented using dummy coding; that is, simply omit the indicator corresponding to the reference category.
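The two coding schemes mentioned above can be sketched in plain Python as follows. The helper names `one_of_c` and `dummy_code` are hypothetical and are meant only to show the shape of the indicator vectors, not how the procedure stores them internally.

```python
def one_of_c(values, categories):
    """Return one indicator row per case, with one position per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1          # mark the case's category
        rows.append(row)
    return rows

def dummy_code(values, categories, reference):
    """One-of-c coding with the reference category's indicator omitted."""
    ref = list(categories).index(reference)
    return [row[:ref] + row[ref + 1:] for row in one_of_c(values, categories)]
```

For example, with categories `["a", "b", "c"]`, a case in category `"b"` is coded `[0, 1, 0]` under one-of-c coding and `[0, 1]` under dummy coding with `"c"` as the reference category.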
Frequency weights. Weight values are rounded to the nearest whole number before use. Cases with missing weights or weights less than 0.5 are not used in the analyses.
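As a rough illustration of this rule, the following sketch maps raw weights to the whole-number weights that would be used, with `None` marking a case that is dropped. The function name `effective_weights` is hypothetical, and rounding exact halves upward is an assumption; the text above does not say how ties are handled.

```python
import math

def effective_weights(weights):
    """Round weights to whole numbers; None marks a case that is not used."""
    result = []
    for w in weights:
        missing = w is None or (isinstance(w, float) and math.isnan(w))
        if missing or w < 0.5:
            result.append(None)                       # missing or too small: case dropped
        else:
            result.append(int(math.floor(w + 0.5)))   # round half up (assumption)
    return result
```

Under these assumptions, the weights `[2.4, 2.5, 0.49, None]` would become `[2, 3, None, None]`.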
Missing values. User- and system-missing values are treated as invalid.
Rescaling. All model variables are centered and standardized, including indicator variables representing categorical variables.
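Centering and standardizing can be sketched with NumPy as shown below. The function name `standardize` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which divisor the procedure uses. A constant column would produce a division by zero in this sketch.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)    # sample standard deviation (assumption)
    return (X - mean) / std
```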
To Obtain Partial Least Squares Regression
From the menus choose:
Analyze > Regression > Partial Least Squares...
1. Select at least one dependent variable.
2. Select at least one independent variable.
Optionally, you can:
• Specify a variable to be used as a unique identifier for casewise output and saved datasets.
• Specify an upper limit on the number of latent factors to be extracted.
Prerequisites
The Partial Least Squares Regression procedure is a Python extension command and requires IBM SPSS Statistics - Essentials for Python, which is installed by default with your IBM SPSS Statistics product. It also requires the NumPy and SciPy Python libraries, which are freely available.
Note: For users working in distributed analysis mode (requires IBM SPSS Statistics Server), NumPy and SciPy must be installed on the server. Contact your system administrator for assistance.
Windows and Mac Users
For Windows and Mac, NumPy and SciPy must be installed to a separate version of Python 2.7 from the version that is installed with IBM SPSS Statistics. If you do not have a separate version of Python 2.7, you can download it from http://www.python.org. Then, install NumPy and SciPy for Python version 2.7. The installers are available from http://www.scipy.org/Download.
To enable use of NumPy and SciPy, you must set your Python location to the version of Python 2.7 where you installed NumPy and SciPy. The Python location is set from the File Locations tab in the Options dialog (Edit > Options).
Linux Users
We suggest that you download the source and build NumPy and SciPy yourself. The source is available from http://www.scipy.org/Download. You can install NumPy and SciPy to the version of Python 2.7 that is installed with IBM SPSS Statistics. It is in the Python directory under the location where IBM SPSS Statistics is installed.
If you choose to install NumPy and SciPy to a version of Python 2.7 other than the version that is installed with IBM SPSS Statistics, then you must set your Python location to point to that version. The Python location is set from the File Locations tab in the Options dialog (Edit > Options).
Windows and Unix Server
NumPy and SciPy must be installed, on the server, to a separate version of Python 2.7 from the version that is installed with IBM SPSS Statistics. If there is not a separate version of Python 2.7 on the server, then it can be downloaded from http://www.python.org. NumPy and SciPy for Python 2.7 are available from http://www.scipy.org/Download. To enable use of NumPy and SciPy, the Python location for the server must be set to the version of Python 2.7 where NumPy and SciPy are installed. The Python location is set from the IBM SPSS Statistics Administration Console.
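After the Python location is set, one way to confirm that the chosen interpreter can see both libraries is a short check like the following. This is a generic sketch rather than an IBM-supplied tool, and the function name `check_modules` is hypothetical. It uses only the standard library, so it should run under Python 2.7 as well as later versions.

```python
import importlib
import sys

def check_modules(names):
    """Map each module name to its version string, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found

if __name__ == "__main__":
    print(sys.executable)                       # which interpreter is running
    print(check_modules(["numpy", "scipy"]))    # None means the library is not visible
```

A value of `None` for either library suggests that the Python location points to an interpreter where that library is not installed.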
Model
Specify Model Effects. A main-effects model contains all factor and covariate main effects. Select Custom to specify interactions. You must indicate all of the terms to be included in the model.
Factors and Covariates. The factors and covariates are listed.
Model. The model depends on the nature of your data. After selecting Custom, you can select the main effects and interactions that are of interest in your analysis.
Build Terms
For the selected factors and covariates:
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way. Creates all possible five-way interactions of the selected variables.
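The term-building options above amount to enumerating combinations of the selected variables. A minimal sketch, assuming a hypothetical helper `build_terms` and using `*` purely as a display separator for interaction terms:

```python
from itertools import combinations

def build_terms(variables, order):
    """List every interaction of the given order among the selected variables."""
    return ["*".join(combo) for combo in combinations(variables, order)]
```

For three selected variables `a`, `b`, and `c`, order 1 gives the main effects `a`, `b`, `c`; order 2 gives `a*b`, `a*c`, `b*c`; and order 3 gives the single term `a*b*c`.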
Options
The Options tab allows the user to save and plot model estimates for individual cases, latent factors, and predictors.
For each type of data, specify the name of a dataset. The dataset names must be unique. If you specify the name of an existing dataset, its contents are replaced; otherwise, a new dataset is created.
• Save estimates for individual cases. Saves the following casewise model estimates: predicted values, residuals, distance to latent factor model, and latent factor scores. It also plots latent factor scores.
• Save estimates for latent factors. Saves latent factor loadings and latent factor weights. It also plots latent factor weights.
• Save estimates for independent variables. Saves regression parameter estimates and variable importance in projection (VIP). It also plots VIP by latent factor.
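For readers who want to see how VIP relates to the saved weights and scores, the following sketch computes it from the commonly cited definition for a single dependent variable. It is not taken from the procedure itself: the function name `vip_scores` and its arguments are hypothetical, and the procedure's exact computation may differ.

```python
import numpy as np

def vip_scores(W, T, q):
    """VIP per predictor from weights W (p x A), scores T (n x A), response loadings q (A)."""
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    q = np.asarray(q, dtype=float)
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)                # response variation explained per factor
    w_norm_sq = (W / np.linalg.norm(W, axis=0)) ** 2    # squared weights, normalized per factor
    return np.sqrt(p * np.dot(w_norm_sq, ss) / ss.sum())
```

Under this definition the squared VIP values average to 1 across predictors, which is why values above 1 are often read as indicating above-average importance.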