Chapter 5. Methods
5.5. Theoretical discussion on statistical techniques
5.5.2. Polynomial Regression
The current research aims to understand how differences (or similarities) in personality and organisational culture affect outcomes such as relationship quality, word-of-mouth and purchase intention. A naïve approach would be to take the absolute difference between two measures and regress an outcome variable on the result, but this approach has several problems. To overcome them, polynomial regression analysis combined with response surface methodology is proposed. This section first briefly explains how difference scores have been used, then discusses the problems associated with them, and finally discusses polynomial regression and response surface methodology.
Difference scores have been used for many years to study the congruence between two variables and its effect on an outcome variable. Laird and De Los Reyes (2013) explain that difference scores are typically calculated by simply subtracting one measure from the other, the aim being to establish the range over which certain behaviour occurs. They further explain that difference scores may also be calculated as absolute or squared differences, which are appropriate when the analysis focuses not on which measure is higher but on the degree of congruence or discrepancy.
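To make the three variants concrete, the following Python sketch computes algebraic, absolute and squared difference scores for a few hypothetical paired ratings. The function name `difference_scores` and the data are illustrative assumptions, not taken from Laird and De Los Reyes (2013).

```python
def difference_scores(x, y):
    """Algebraic, absolute and squared difference scores for paired measures.

    x and y are hypothetical paired ratings (e.g. two parties' scores on the
    same dimension); they must have the same length.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    algebraic = [xi - yi for xi, yi in zip(x, y)]  # keeps direction: which measure is higher
    return {
        "algebraic": algebraic,
        "absolute": [abs(d) for d in algebraic],   # magnitude of discrepancy only
        "squared": [d * d for d in algebraic],     # magnitude, larger gaps weighted more
    }

scores = difference_scores([5, 3, 4], [3, 5, 4])
print(scores["algebraic"])  # [2, -2, 0]
print(scores["absolute"])   # [2, 2, 0]
print(scores["squared"])    # [4, 4, 0]
```

The absolute and squared versions give the pairs (5, 3) and (3, 5) identical scores, which is exactly why they suit questions of congruence rather than superiority.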
Difference scores are largely employed when research focuses on dyadic relationships (e.g. Chaurasia and Shukla, 2013, consider the leader-member exchange dyad, and Cai and Yang, 2008, the buyer-supplier dyad), or when research aims to find differences between two measurements (e.g. Proyer, Ruch and Buschor, 2013, conduct a pre-test and a post-test and analyse the results using difference scores).
Despite the many prior studies that use difference scores, few engage with the underlying issues. For example, Garland, Aarons, Hawley and Hough (2003) write:
“We also examined simple correlations between difference scores on the clinical outcomes and satisfaction scores, and the pattern of results was very similar. However these results are not presented because of controversy over the use of difference scores.” (p. 1546)
There are several well-documented problems associated with difference scores (Berry, 1983; Cronbach & Furby, 1970; Edwards & Parry, 1993; Edwards, 2001; Johns, 1981; Peter, Churchill Jr, & Brown, 1993; Sternberg & Grigorenko, 2002; Thomas & Zumbo, 2012; Wall & Payne, 1973). Edwards (2002) provides a simple summary of the issues surrounding difference scores:
“Difference scores are often less reliable than either of their component measures. Difference scores are also inherently ambiguous, given that they combine measures of conceptually distinct constructs into a single score. Furthermore, they confound the effects of their component measures on outcomes and impose constraints on these effects that are rarely tested empirically. Finally, they reduce an inherently three dimensional relationship between their component measures and the outcome to two dimensions.” (p. 351)
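The reliability problem can be illustrated with the classical test theory formula for the reliability of a difference score D = X − Y, which depends on the reliabilities of the two components (r_XX, r_YY) and on their intercorrelation (r_XY). The Python sketch below is a minimal illustration that assumes uncorrelated measurement errors; the function name is hypothetical. With both components at a reliability of .80, the reliability of D falls as the components become more highly correlated.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = X - Y under classical test theory.

    r_xx, r_yy: reliabilities of X and Y; r_xy: observed correlation of X and Y.
    Assumes the measurement errors of X and Y are uncorrelated with each other.
    """
    # True-score variance of D divided by observed variance of D
    true_var = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    observed_var = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return true_var / observed_var

# Two components, each with reliability .80, at increasing intercorrelations
for r_xy in (0.0, 0.3, 0.6):
    print(r_xy, round(difference_score_reliability(0.8, 0.8, r_xy), 2))
# 0.0 0.8
# 0.3 0.71
# 0.6 0.5
```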
Edwards and Parry (1993) propose an improved way to analyse dyads: they highlight the problems with difference scores and then propose polynomial regression as a way to overcome them. According to Bendapudi and Berry (1997), polynomial regression is better suited to analysing the agreement (or convergence) of two predictor variables in determining an outcome variable, the discrepancy between those two predictor variables, and the direction of that discrepancy.
Edwards (2002) provides a more technical discussion of how the polynomial regression technique works, explaining that it rests on three basic principles:
“Firstly, congruence should be viewed not as a single score but instead as the correspondence between the component measures in a two-dimensional space. Secondly, the effect of congruence on an outcome should be treated not as a two-dimensional function, but rather as a three-dimensional structure relating the two components to the outcome. Lastly, the constraints associated with difference scores should not be imposed on the data, but instead should be treated as hypotheses to be tested empirically.” (p. 360)
Although this technique is more complex than standard regression, it has yielded useful results. For example, Glomb and Welsh (2005) use polynomial regression to investigate the personality dimension of control within the supervisor-subordinate dyad. Not only did they find support for their hypothesis, but they were also able to interpret specific points on the response surface.
Although the work of Edwards and Parry (1993) is rather involved, their argument can be summarised as follows. When difference scores are used, the dyad is represented by regressing an outcome variable on the difference between the two component measures. In the equation below, X and Y represent the two component measures, Z represents the outcome measure and e represents a random error term:

$$Z = b_0 + b_1(X - Y) + e$$

When this equation is rearranged it may be written as:

$$Z = b_0 + b_1X - b_1Y + e \tag{1}$$
Equation (1) suffers from several issues. Firstly, both component measures share a single coefficient (b1), which forces them to carry equal weight. Secondly, the two coefficients are constrained to be opposite in sign: the coefficient on X is b1, while the coefficient on Y must be −b1. Lastly, the equation assumes a linear relationship between the component measures and the outcome variable. These constraints can, however, be relaxed:

$$Z = b_0 + b_1X + b_2Y + e \tag{2}$$
Equation (2) is the unconstrained form of Equation (1). The component measures are entered separately, each with its own coefficient (b1 and b2 respectively), so their magnitude and direction can differ. Because Equation (1) is the special case of Equation (2) in which b2 = −b1, Equation (2) will always explain at least as much variance as Equation (1), and the constraint b1 + b2 = 0 can itself be tested.
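The nesting of Equation (1) within Equation (2) can be checked directly. The Python sketch below assumes NumPy is available and uses simulated data (the coefficients 0.8 and −0.2 are arbitrary illustrative values, not results from this study). It fits both equations by ordinary least squares and computes the usual F statistic for the single constraint b1 + b2 = 0; the helper `ols_r2` is a hypothetical name introduced here.

```python
import numpy as np

def ols_r2(design, z):
    """Ordinary least squares fit; returns (coefficients, R-squared)."""
    coefs, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coefs
    r2 = 1.0 - (resid @ resid) / np.sum((z - z.mean()) ** 2)
    return coefs, r2

rng = np.random.default_rng(seed=0)
n = 500
x = rng.normal(size=n)
y = rng.normal(size=n)
# Simulated (hypothetical) outcome: X and Y have unequal, not equal-and-opposite, effects
z = 1.0 + 0.8 * x - 0.2 * y + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
# Equation (1): Z = b0 + b1(X - Y) + e, the constrained model
_, r2_eq1 = ols_r2(np.column_stack([ones, x - y]), z)
# Equation (2): Z = b0 + b1X + b2Y + e, the unconstrained model
b_eq2, r2_eq2 = ols_r2(np.column_stack([ones, x, y]), z)

# F statistic for the q = 1 constraint (b1 + b2 = 0) implied by Equation (1);
# k = 2 predictors in the unconstrained model
q, k = 1, 2
f_stat = ((r2_eq2 - r2_eq1) / q) / ((1.0 - r2_eq2) / (n - k - 1))
print(f"R2 Eq (1) = {r2_eq1:.3f}, R2 Eq (2) = {r2_eq2:.3f}, F(1, {n - k - 1}) = {f_stat:.1f}")
```

A large F value indicates that the equal-and-opposite constraint of Equation (1) is not supported by the data; a p-value could be obtained from the F(1, n − 3) distribution, for example with `scipy.stats.f.sf`.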
The argument can be taken further. Although a linear relationship is the easiest to interpret, it may not explain the greatest amount of variance in a dataset. One way to capture a curvilinear relationship is to square the difference between the component measures:

$$Z = b_0 + b_1(X - Y)^2 + e$$

The resulting equation describes a curvilinear (U-shaped or inverted U-shaped) relationship in which Z changes as the absolute difference between the two component measures increases, regardless of which measure is higher. When this equation is expanded it can be represented as:

$$Z = b_0 + b_1X^2 - 2b_1XY + b_1Y^2 + e \tag{3}$$
Equation (3) suffers from problems similar to those of Equation (1), but in a curvilinear form. Unlike the linear equation, it imposes additional constraints: the coefficients on X and Y are always 0, the coefficients on X², XY and Y² always sum to 0, and the coefficients on X² and Y² are always equal. When these constraints are relaxed, the following, more general equation is obtained:

$$Z = b_0 + b_1X + b_2Y + b_3X^2 + b_4XY + b_5Y^2 + e \tag{4}$$

Given these different equations, which is best for understanding the relationships investigated in the current study? The best equation is the one that explains the most variance, that is, the one yielding the highest R² value. Because Equations (1) to (3) are each nested within Equation (4), this comparison is best made by testing whether the constraints of a simpler equation produce a significant drop in R², treating those constraints as hypotheses, as Edwards (2002) recommends.
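The same logic extends to the quadratic case. Written in terms of the coefficients of Equation (4), Equation (3) imposes four constraints: b1 = b2 = 0, b3 = b5 and b3 + b4 + b5 = 0 (so that b4 = −2b3). The Python sketch below again assumes NumPy and simulated data, here an arbitrary linear effect of X added to a congruence effect; it fits both equations and tests the four constraints jointly. The helper names `ols_r2` and `nested_f` are hypothetical.

```python
import numpy as np

def ols_r2(design, z):
    """Ordinary least squares fit; returns (coefficients, R-squared)."""
    coefs, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coefs
    return coefs, 1.0 - (resid @ resid) / np.sum((z - z.mean()) ** 2)

def nested_f(r2_full, r2_restricted, q, n, k_full):
    """F statistic for q linear constraints; k_full = number of predictors in the full model."""
    return ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / (n - k_full - 1))

rng = np.random.default_rng(seed=1)
n = 400
x = rng.normal(size=n)
y = rng.normal(size=n)
# Simulated (hypothetical) outcome: a congruence effect plus a linear effect of X alone
z = 2.0 + 0.4 * x - 0.3 * (x - y) ** 2 + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
# Equation (3): Z = b0 + b1(X - Y)^2 + e
_, r2_eq3 = ols_r2(np.column_stack([ones, (x - y) ** 2]), z)
# Equation (4): Z = b0 + b1X + b2Y + b3X^2 + b4XY + b5Y^2 + e
b_eq4, r2_eq4 = ols_r2(np.column_stack([ones, x, y, x**2, x * y, y**2]), z)

# Equation (3) imposes q = 4 constraints on Equation (4), which has k = 5 predictors
f_stat = nested_f(r2_eq4, r2_eq3, q=4, n=n, k_full=5)
print("Equation (4) coefficients b0..b5:", np.round(b_eq4, 2))
print(f"R2 Eq (3) = {r2_eq3:.3f}, R2 Eq (4) = {r2_eq4:.3f}, F(4, {n - 6}) = {f_stat:.1f}")
```

Because the simulated outcome includes a linear effect of X, the constraint b1 = 0 is violated and the F statistic is large; with data generated exactly from Equation (3), the test would usually not reject.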
In addition to providing the above equations, Edwards and Parry (1993) also provide a framework for interpreting the output. The framework is built around response surface methodology, which several authors have documented (Khuri & Mukhopadhyay, 2010; W. R. Myers & Montgomery, 2003). Carley, Kamneva and Reminga (2004) explain that response surface methodology is “useful for developing, improving and optimizing processes” (p. 1). Bezerra, Santelli, Oliveira, Villar and Escaleira (2008) describe response surface methodology as follows:
“[Response Surface Methodology] consists of a group of mathematical and statistical techniques that are based on the fit of empirical models to the experimental data obtained in relation to experimental design. Toward this objective, linear or square polynomial functions are employed to describe the system studied and, consequently, to explore (modeling and displacing) experimental conditions until its optimization” (p. 966)
Baş and Boyacı (2007) describe the process of converting the mathematical equations into a graphical representation of the predicted model: a three-dimensional plot of the surface implied by the estimated equation. Such a plot generally includes contour lines depicting the shape of the surface. When these contour lines form ellipses or circles, the surface has a stationary point at their centre; the stationary point can also be located analytically by finding where the partial derivatives of the second-order equation with respect to X and Y are both equal to zero. In the current study, several three-dimensional plots will be generated.
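The stationary point can be computed from the estimated coefficients of Equation (4) by setting both partial derivatives to zero and solving the resulting pair of linear equations. The Python sketch below does this; the coefficient values in the example are hypothetical, chosen only to produce a dome-shaped surface, and the function name is illustrative. The sign of 4b3b5 − b4² (the determinant of the matrix of second derivatives) indicates whether the contours are ellipses (a maximum or minimum) or hyperbolas (a saddle point).

```python
def stationary_point(b1, b2, b3, b4, b5):
    """Stationary point (X0, Y0) of Z = b0 + b1X + b2Y + b3X^2 + b4XY + b5Y^2.

    Solves dZ/dX = b1 + 2*b3*X + b4*Y = 0 and dZ/dY = b2 + b4*X + 2*b5*Y = 0.
    """
    det = 4 * b3 * b5 - b4 ** 2  # determinant of the matrix of second derivatives
    if det == 0:
        raise ValueError("no unique stationary point: 4*b3*b5 - b4**2 is zero")
    x0 = (b2 * b4 - 2 * b1 * b5) / det
    y0 = (b1 * b4 - 2 * b2 * b3) / det
    if det < 0:
        kind = "saddle point"  # hyperbolic contours
    elif b3 < 0:
        kind = "maximum"       # elliptical contours, dome-shaped surface
    else:
        kind = "minimum"       # elliptical contours, bowl-shaped surface
    return x0, y0, kind

# Hypothetical coefficients for Equation (4)
x0, y0, kind = stationary_point(b1=0.2, b2=-0.1, b3=-0.3, b4=0.5, b5=-0.3)
print(round(x0, 2), round(y0, 2), kind)  # 0.64 0.36 maximum
```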
Returning to the framework of Edwards and Parry (1993), three key features should be addressed for each surface. The first is the stationary point: the point at which the slope of the surface is zero in every direction, which may be a minimum, a maximum or a saddle point. The second is the pair of principal axes, which run perpendicular to each other and intersect at the stationary point. The third is the slope of the surface along lines of interest. Typically the researcher is interested in how congruence or incongruence between two elements affects the outcome variable; the line of congruence is where Y = X, while the line of incongruence is where Y = −X. It is important to remember that the principal axes need not coincide with the lines of congruence or incongruence.
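The remaining two features can also be computed from the coefficients of Equation (4). The Python sketch below, using hypothetical coefficients, returns the intercept and slope of each principal axis in the X-Y plane (the slopes follow from the eigenvectors of the quadratic terms; the labels p10, p11, p20 and p21 are used here for illustration) and, for the lines of congruence and incongruence, the slope at X = 0 and the curvature of the surface. Along Y = X, Equation (4) reduces to Z = b0 + (b1 + b2)X + (b3 + b4 + b5)X², and along Y = −X to Z = b0 + (b1 − b2)X + (b3 − b4 + b5)X². The sketch assumes b4 ≠ 0 and a unique stationary point.

```python
import math

def principal_axes(b1, b2, b3, b4, b5):
    """Intercepts and slopes (p10, p11), (p20, p21) of the principal axes in the X-Y plane.

    Assumes b4 != 0 and 4*b3*b5 - b4**2 != 0 (a unique stationary point).
    """
    det = 4 * b3 * b5 - b4 ** 2
    x0 = (b2 * b4 - 2 * b1 * b5) / det  # stationary point
    y0 = (b1 * b4 - 2 * b2 * b3) / det
    root = math.sqrt((b3 - b5) ** 2 + b4 ** 2)
    p11 = (b5 - b3 + root) / b4  # slope of the first principal axis
    p21 = (b5 - b3 - root) / b4  # slope of the second axis; p11 * p21 = -1 (perpendicular)
    # Each axis passes through the stationary point: Y = intercept + slope * X
    return (y0 - p11 * x0, p11), (y0 - p21 * x0, p21)

def line_features(b1, b2, b3, b4, b5):
    """Slope at X = 0 and curvature of the surface along Y = X and Y = -X."""
    return {
        "congruence (Y = X)": (b1 + b2, b3 + b4 + b5),
        "incongruence (Y = -X)": (b1 - b2, b3 - b4 + b5),
    }

coefs = dict(b1=0.2, b2=-0.1, b3=-0.3, b4=0.5, b5=-0.3)  # hypothetical values
(p10, p11), (p20, p21) = principal_axes(**coefs)
print(round(p10, 2), round(p11, 2))  # -0.27 1.0
print(round(p20, 2), round(p21, 2))  # 1.0 -1.0
for line, (slope, curvature) in line_features(**coefs).items():
    print(line, round(slope, 2), round(curvature, 2))
```

In this example the first principal axis has a slope of 1 but an intercept of about −0.27: it runs parallel to the line of congruence without coinciding with it, which is exactly the distinction the framework asks the researcher to keep in mind.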