This section describes the concepts associated with structural equation models (SEM). These include latent and observed variables, exogenous and endogenous latent variables, factor analytic models, ordinal variables and polychoric correlation.
4.2.1
Latent and observed variables
Researchers are often interested in studying theoretical concepts that cannot be observed directly. These concepts are known as latent variables or factors. The eight listening strategies analysed in this thesis are examples of latent variables.
As latent variables are not observed directly, they cannot be measured directly. As a result, each latent variable must be defined in terms of the behaviour that the researcher believes represents it. Each latent variable is therefore linked to observed variables, which serve as indicators of the underlying concept that they are
4.2. BASIC CONCEPTS USED IN STRUCTURAL EQUATION MODELS 63
presumed to represent. This makes measurement of the unobserved variable possible (Byrne 2006, chp. 1).
In the data analysed here, the thirty-eight questions answered in the survey are the observed variables and the eight listening strategies are the unobserved latent variables.
4.2.2
Exogenous and endogenous latent variables
Here we distinguish between exogenous and endogenous latent variables. Exogenous latent variables are analogous to independent variables: they can cause variation in the values of the other latent variables in the model. Variation in the values of the exogenous variables themselves is not explained by the model. Rather, it is considered to be influenced by other factors external to the model (Byrne 2006, chp. 1). Variables such as gender and age are examples of these external factors.
Endogenous latent variables are analogous to dependent variables. They are influenced by the exogenous latent variables in the model, either directly or indirectly. Variation in the values of endogenous variables is said to be explained by the model, because all of the latent variables that influence the endogenous variables are included in the model specification (Byrne 2006, chp. 1).
4.2.3
The factor analytic model
Factor analysis is a well-known statistical procedure for investigating relationships between sets of observed and unobserved variables. It examines the covariance among a set of observed variables in order to gather information on their underlying latent concepts.
There are two basic types of factor analytic models: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Both these models concentrate on
how and to what extent the observed variables are linked to their underlying latent variables. In this thesis we will concentrate on the CFA model.
4.2.4
Ordinal variables
Let $y_1, y_2, \ldots, y_p$ be $p$ ordinal variables. It is assumed that there is a continuous variable $y_i^*$ underlying the ordinal variable $y_i$ (Muthén 1984, Lee et al. 1990, Jöreskog 1990). This continuous variable $y_i^*$ represents the attitude underlying the ordered responses to $y_i$ and is assumed to have a range from $-\infty$ to $+\infty$ (Yang-Wallentin et al. 2010). Note that the notation used in this section differs slightly from the notation used for the proportional odds model. Because this chapter illustrates a different approach, we use notation consistent with Yang-Wallentin et al. (2010).
The underlying variable $y_i^*$ is unobservable and the ordinal variable $y_i$ is observed. For an ordinal variable $y_i$ with $m_i$ categories, the association between the ordinal variable $y_i$ and the underlying variable $y_i^*$ is
\[
y_i = c \iff \tau_{c-1}^{(i)} < y_i^* < \tau_c^{(i)}, \qquad c = 1, 2, \ldots, m_i,
\]
where
\[
\tau_0^{(i)} = -\infty, \qquad \tau_1^{(i)} < \tau_2^{(i)} < \cdots < \tau_{m_i-1}^{(i)}, \qquad \tau_{m_i}^{(i)} = +\infty
\]
are the threshold parameters. For variable $y_i$ with $m_i$ categories, there are $m_i - 1$ strictly increasing threshold parameters $\tau_1^{(i)}, \tau_2^{(i)}, \ldots, \tau_{m_i-1}^{(i)}$ (Yang-Wallentin et al. 2010).
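The threshold rule above can be illustrated with a minimal Python sketch. The function name `categorise` and the threshold values are hypothetical and chosen only for illustration; they are not taken from the thesis data.

```python
from bisect import bisect_left

def categorise(y_star, thresholds):
    """Return the category c (1-based) such that tau_{c-1} < y_star < tau_c.

    `thresholds` holds the finite, strictly increasing values
    tau_1, ..., tau_{m-1}; tau_0 = -inf and tau_m = +inf are implicit.
    """
    # bisect_left counts how many thresholds lie strictly below y_star.
    return bisect_left(thresholds, y_star) + 1

# Hypothetical thresholds for a four-category item (m = 4).
example_thresholds = [-1.0, 0.0, 1.5]
example_categories = [categorise(v, example_thresholds)
                      for v in (-2.0, -0.5, 0.3, 2.0)]
print(example_categories)  # [1, 2, 3, 4]
```

With $m_i = 4$ categories there are three finite thresholds, so every real value of the underlying variable falls into exactly one category.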
As only ordinal information is available from $y_i$, the distribution of $y_i^*$ is determined only up to a monotonic transformation. Let $y_i^*$ have a standard normal distribution with density function $\psi(\cdot)$ and distribution function $\Psi(\cdot)$. Then the
probability, $\pi_c^{(i)}$, of the response, $y_i$, being in category $c$ is
\[
\pi_c^{(i)} = \Pr[y_i = c] = \Pr[\tau_{c-1}^{(i)} < y_i^* < \tau_c^{(i)}] = \int_{\tau_{c-1}^{(i)}}^{\tau_c^{(i)}} \psi(u)\,du = \Psi(\tau_c^{(i)}) - \Psi(\tau_{c-1}^{(i)})
\]
for $c = 1, 2, \ldots, m_i - 1$, so that
\[
\tau_c^{(i)} = \Psi^{-1}\left(\pi_1^{(i)} + \pi_2^{(i)} + \cdots + \pi_c^{(i)}\right),
\]
where $\Psi^{-1}$ is the inverse of the standard normal distribution function. The quantity $(\pi_1^{(i)} + \pi_2^{(i)} + \cdots + \pi_c^{(i)})$ is the probability of a response being in category $c$ or lower (Yang-Wallentin et al. 2010).
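As a rough numerical check of the two relations above, the following Python sketch computes category probabilities from a set of thresholds and then recovers the thresholds from the cumulative probabilities. It uses `statistics.NormalDist` from the standard library for $\Psi$ and $\Psi^{-1}$. The function names and threshold values are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; plays the role of Psi

def category_probabilities(thresholds):
    """pi_c = Psi(tau_c) - Psi(tau_{c-1}) for c = 1, ..., m."""
    # Psi(-inf) = 0 and Psi(+inf) = 1 close the two open-ended categories.
    cdf_values = [0.0] + [STD_NORMAL.cdf(t) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]

def thresholds_from_probabilities(probabilities):
    """tau_c = Psi^{-1}(pi_1 + ... + pi_c) for c = 1, ..., m - 1."""
    thresholds, cumulative = [], 0.0
    # The last cumulative sum equals 1 (tau_m = +inf), so it is skipped.
    for p in probabilities[:-1]:
        cumulative += p
        thresholds.append(STD_NORMAL.inv_cdf(cumulative))
    return thresholds

taus = [-1.0, 0.0, 1.5]                      # hypothetical thresholds
pis = category_probabilities(taus)           # four category probabilities
recovered = thresholds_from_probabilities(pis)
```

Up to floating-point error, `recovered` reproduces `taus`, which is the sense in which the thresholds and the cumulative category probabilities carry the same information.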
The probabilities $\pi_c^{(i)}$ are unknown population quantities, but can be estimated by the corresponding proportions $p_c^{(i)}$ of responses in category $c$ on variable $y_i$. Thus, estimates of the thresholds can be obtained as
\[
\hat{\tau}_c^{(i)} = \Psi^{-1}\left(p_1^{(i)} + p_2^{(i)} + \cdots + p_c^{(i)}\right), \qquad c = 1, 2, \ldots, m_i - 1. \tag{4.1}
\]
The quantity $(p_1^{(i)} + p_2^{(i)} + \cdots + p_c^{(i)})$ is the proportion of observations in the data responding in category $c$ or lower on variable $y_i$ (Yang-Wallentin et al. 2010).
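Equation 4.1 can be sketched in a few lines of Python. The function name `estimate_thresholds` and the response data below are hypothetical and are not the survey data analysed in this thesis. The sketch assumes that categories are coded $1, \ldots, m_i$ and that no cumulative proportion is exactly 0 or 1, since `inv_cdf` is undefined at those values.

```python
from collections import Counter
from statistics import NormalDist

def estimate_thresholds(responses, n_categories):
    """Equation 4.1: tau_c = Psi^{-1}(p_1 + ... + p_c), c = 1, ..., m - 1."""
    counts = Counter(responses)
    n = len(responses)
    thresholds, cumulative = [], 0
    for c in range(1, n_categories):
        cumulative += counts[c]  # observations in category c or lower
        # Assumption: 0 < cumulative / n < 1, otherwise inv_cdf raises an error.
        thresholds.append(NormalDist().inv_cdf(cumulative / n))
    return thresholds

# Hypothetical responses to one four-category item (100 respondents).
responses = [1] * 16 + [2] * 34 + [3] * 34 + [4] * 16
tau_hat = estimate_thresholds(responses, n_categories=4)
```

Here the cumulative proportions are 0.16, 0.50 and 0.84, so the middle threshold estimate is zero and the outer two are symmetric about it.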
4.2.5
Polychoric correlation
The polychoric correlation matrix is widely used in place of the covariance matrix when estimating the parameters of a structural equation model. Each polychoric correlation estimates the linear relationship between the two unobserved continuous variables underlying a pair of ordinal variables (Flora & Curran 2004).
Let $y_i$ and $y_j$ be two ordinal variables with $m_i$ and $m_j$ categories respectively. Their bivariate marginal distribution in the data is represented by a contingency table
\[
\begin{array}{cccc}
n_{11}^{(ij)} & n_{12}^{(ij)} & \cdots & n_{1 m_j}^{(ij)} \\
n_{21}^{(ij)} & n_{22}^{(ij)} & \cdots & n_{2 m_j}^{(ij)} \\
\vdots & \vdots & \ddots & \vdots \\
n_{m_i 1}^{(ij)} & n_{m_i 2}^{(ij)} & \cdots & n_{m_i m_j}^{(ij)}
\end{array}
\]
where $n_{ab}^{(ij)}$ is the number of observations in the data in category $a$ on variable $y_i$ and in category $b$ on variable $y_j$. The underlying variables $y_i^*$ and $y_j^*$ are assumed to be bivariate normal with zero means, unit variances and correlation $\rho_{ij}$, the polychoric correlation (Yang-Wallentin et al. 2010).
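The contingency table described above can be built from paired responses as in the following Python sketch. The function name and the six response pairs are hypothetical, and categories are assumed to be coded as integers starting at 1.

```python
def contingency_table(y_i, y_j, m_i, m_j):
    """Count n_ab: observations in category a on y_i and category b on y_j."""
    table = [[0] * m_j for _ in range(m_i)]
    for a, b in zip(y_i, y_j):
        table[a - 1][b - 1] += 1  # categories are 1-based, list indices 0-based
    return table

# Hypothetical paired responses from six respondents on two 3-category items.
item_i = [1, 1, 2, 3, 3, 3]
item_j = [1, 2, 2, 2, 3, 3]
table = contingency_table(item_i, item_j, m_i=3, m_j=3)
print(table)  # [[1, 1, 0], [0, 1, 0], [0, 1, 2]]
```

Row sums of the table give the univariate counts for $y_i$ and column sums give those for $y_j$.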
Let $\tau_1^{(i)}, \tau_2^{(i)}, \ldots, \tau_{m_i-1}^{(i)}$ be the thresholds (as described in section 4.2.4) for variable $y_i^*$ and let $\tau_1^{(j)}, \tau_2^{(j)}, \ldots, \tau_{m_j-1}^{(j)}$ be the thresholds for variable $y_j^*$. The polychoric correlation can be estimated by maximising the log-likelihood of the multinomial distribution (see Olsson (1979) for more detail). Maximising the log-likelihood gives the sample polychoric correlation, denoted $r_{ij}$.
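For reference, the multinomial log-likelihood mentioned above can be written out as follows. This is a sketch in the notation of this section. The symbols $\pi_{ab}^{(ij)}$ for the cell probabilities and $\psi_2$ for the standard bivariate normal density are assumed notation introduced here, and the additive constant that does not depend on $\rho_{ij}$ is omitted.

```latex
\ln L(\rho_{ij}) = \sum_{a=1}^{m_i} \sum_{b=1}^{m_j} n_{ab}^{(ij)} \ln \pi_{ab}^{(ij)}(\rho_{ij}),
\qquad
\pi_{ab}^{(ij)}(\rho_{ij}) = \int_{\tau_{a-1}^{(i)}}^{\tau_a^{(i)}} \int_{\tau_{b-1}^{(j)}}^{\tau_b^{(j)}} \psi_2(u, v; \rho_{ij}) \, dv \, du
```

Each cell probability is the probability mass of the bivariate normal distribution over the rectangle defined by the corresponding pair of threshold intervals.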
The polychoric correlation can be estimated by a two-step procedure (Olsson 1979). In the first step, the thresholds are estimated from the univariate marginal distributions using equation 4.1. In the second step, the polychoric correlations are estimated from the bivariate marginal distributions by maximising the log-likelihood for the given thresholds (Yang-Wallentin et al. 2010).
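The two-step procedure can be sketched in Python using only the standard library. This is an illustrative implementation under stated assumptions, not the software used in this thesis. Infinite thresholds are replaced by $\pm 8$, cell probabilities are computed by simple midpoint integration, and the maximisation uses a golden-section search that assumes a single maximum on $(-0.99, 0.99)$. All function names and both example tables are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
LIMIT = 8.0  # assumption: numerical stand-in for +/- infinity

def thresholds_from_counts(counts):
    """Step 1 (equation 4.1): thresholds from one margin of the table."""
    n, cumulative, taus = sum(counts), 0, []
    for count in counts[:-1]:
        cumulative += count
        taus.append(STD_NORMAL.inv_cdf(cumulative / n))
    return [-LIMIT] + taus + [LIMIT]

def cell_probability(lo_i, hi_i, lo_j, hi_j, rho, steps=200):
    """P(lo_i < y_i* < hi_i, lo_j < y_j* < hi_j) for correlation rho,
    by midpoint integration over y_i*."""
    scale = math.sqrt(1.0 - rho * rho)
    width = (hi_i - lo_i) / steps
    total = 0.0
    for k in range(steps):
        u = lo_i + (k + 0.5) * width
        # Given y_i* = u, y_j* is normal with mean rho * u and sd `scale`.
        upper = STD_NORMAL.cdf((hi_j - rho * u) / scale)
        lower = STD_NORMAL.cdf((lo_j - rho * u) / scale)
        total += STD_NORMAL.pdf(u) * (upper - lower) * width
    return total

def log_likelihood(table, taus_i, taus_j, rho):
    """Multinomial log-likelihood (up to a constant) for fixed thresholds."""
    value = 0.0
    for a, row in enumerate(table):
        for b, n_ab in enumerate(row):
            if n_ab:
                p = cell_probability(taus_i[a], taus_i[a + 1],
                                     taus_j[b], taus_j[b + 1], rho)
                value += n_ab * math.log(max(p, 1e-300))
    return value

def polychoric(table, tol=1e-5):
    """Step 2: maximise the log-likelihood over rho by golden-section search."""
    taus_i = thresholds_from_counts([sum(row) for row in table])
    taus_j = thresholds_from_counts([sum(col) for col in zip(*table)])

    def f(rho):
        return log_likelihood(table, taus_i, taus_j, rho)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = -0.99, 0.99
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:   # the maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
        else:         # the maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2.0

# Hypothetical tables: exact independence, and a strong positive association.
r_independent = polychoric([[10, 20, 10], [20, 40, 20], [10, 20, 10]])
r_associated = polychoric([[30, 10, 2], [10, 30, 10], [2, 10, 30]])
```

For the first table, whose cells are exactly the products of its margins, the estimate should be close to zero. For the second, with counts concentrated on the diagonal, it should be clearly positive. Dedicated SEM software typically uses more accurate bivariate normal routines and gradient-based optimisers than this sketch.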