
4. Results and Discussion

4.2 Model diagnostics – ordinary least-squares

In this section, we examine the fit of the candidate models. The model diagnostics focus primarily on the residuals. A careful check of the residual patterns can tell us whether the model assumptions are reasonable and whether our choice of model is appropriate.

Residuals can be regarded as the variation left unexplained by the fitted model. Generally speaking, we expect the residuals to be normally and independently distributed, with a mean of zero and a constant variance. Violation of these assumptions usually indicates that the residuals contain some structure that the fitted model does not account for, in which case the model fit needs to be improved.
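The assumptions above can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: y_i is the i-th observation, \hat{y}_i its fitted value, e_i the residual, \sigma^2 the constant variance and n the number of observations. Strictly speaking, the distributional assumption is placed on the unobserved model errors, and the residuals serve as their estimates.

```latex
e_i = y_i - \hat{y}_i, \qquad
e_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n
```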

Graphical methods are an excellent way to examine residuals. Common diagnostic tools include the plot of residuals versus fitted values and the Quantile-Quantile plot (QQ plot).

Ideally, the residuals versus fitted values plot should show the residuals spread randomly around zero, regardless of the size of the fitted value. However, it is quite common to see the spread of the residuals increase as the fitted value increases. The residual cloud then forms a ‘funnel’ shaped pattern with the wider end toward the larger fitted values. This pattern suggests that the model has non-constant variance. Constant variance is one of the important assumptions of the ordinary least-squares method. If this assumption is violated, the ordinary least-squares estimates are no longer efficient, and the usual standard errors are unreliable.
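The funnel pattern described above can also be checked numerically. The following is a minimal sketch, not the analysis used in this study: it assumes a simple straight-line model and synthetic data whose noise grows with the predictor, and the helper names `fit_line` and `spread_by_fitted` are hypothetical. It fits the line by ordinary least-squares and compares the spread of the residuals across low, middle and high fitted values.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def spread_by_fitted(fitted, residuals, n_bins=3):
    """Standard deviation of the residuals within bins of increasing fitted value."""
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // n_bins
    spreads = []
    for k in range(n_bins):
        start = k * size
        end = (k + 1) * size if k < n_bins - 1 else len(pairs)
        spreads.append(statistics.stdev(r for _, r in pairs[start:end]))
    return spreads


# Synthetic data (an assumption for illustration): the noise s.d. grows with x.
rng = random.Random(0)
x = [rng.uniform(1, 10) for _ in range(600)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 0.1 * xi) for xi in x]

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residual spread in the low, middle and high thirds of the fitted values.
spreads = spread_by_fitted(fitted, residuals)
print(spreads)
```

A spread that rises steadily from the first bin to the last is the numerical counterpart of the funnel shape. Plotting `residuals` against `fitted` as a scatter plot gives the diagnostic plot itself.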

A QQ plot is an excellent way to see whether the residuals deviate from a normal distribution. It is similar to a probability plot: it plots the quantiles of the residuals against the quantiles of a theoretical normal distribution, together with a 45-degree reference line. The greater the departure from this reference line, the stronger the evidence that the residuals are not normally distributed.
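The construction of a QQ plot can be sketched as follows. This is an illustrative example under stated assumptions rather than the procedure used to produce the figures in this chapter: the residuals are standardised so that the 45-degree line is the natural reference, the plotting positions (i + 0.5)/n are one common convention among several, and the data and the helper names `qq_points` and `max_departure` are hypothetical.

```python
import random
import statistics


def qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    # Standardise so that normal residuals fall near the 45-degree line.
    sample = sorted((r - mean) / sd for r in residuals)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


def max_departure(points):
    """Largest vertical distance between the points and the 45-degree line."""
    return max(abs(s - t) for t, s in points)


# Synthetic residuals (assumptions for illustration only).
rng = random.Random(1)
normal_res = [rng.gauss(0, 0.2) for _ in range(500)]
# The same residuals with a few extreme values appended at both ends.
heavy_res = normal_res + [2.0, -2.0, 1.5, -1.5]

normal_points = qq_points(normal_res)
heavy_points = qq_points(heavy_res)
normal_gap = max_departure(normal_points)
heavy_gap = max_departure(heavy_points)
print(normal_gap, heavy_gap)
```

Points that stay close to the line throughout give a small maximum departure, whereas extreme residuals at the two ends pull the tail points away from the line and enlarge it.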

In the following sections, we examine the diagnostic plots for each candidate model to assess the appropriateness of the model fit.

4.2.1 Singletons – cut-down model

Figure 4.6: Residuals vs. Fitted values (singletons – cut-down model)

Figure 4.6 shows the residuals versus fitted values plot for the cut-down model of singletons. The residuals spread wider as the fitted value increases, especially for fitted values larger than 3. This indicates that the variance is non-constant in this model, so the ordinary least-squares fit appears inappropriate because the assumption of constant variance is violated.

Figure 4.7: QQ plot (singletons – cut-down model)

[Plot for Figure 4.6: "Residuals vs. Fitted values (Rattray singleton-cutdown)"; x-axis: Fitted value; y-axis: Residuals; legend: Res-H, Res-L.]

[Plot for Figure 4.7: x-axis: Standard Normal Quantiles; y-axis: Quantiles of Input Sample.]

Figure 4.7 shows the QQ plot for the cut-down model of singletons. The points in the middle lie quite close to the reference line, but some points near the two ends deviate substantially from it. This challenges the assumption of normality. Taken together, the diagnostic plots in this section suggest that the ordinary least-squares fit for the cut-down model of singletons is inadequate: the assumptions of constant variance and normality both appear to be violated. As a consequence, the estimated parameter values cannot be trusted.

4.2.2 Twins – cut-down model

Figure 4.8: Residuals vs. Fitted values (twins – cut-down model)

Figure 4.8 shows the plot of residuals versus fitted values for the cut-down model of twins. The variance clearly increases as the fitted value increases: the residual cloud forms a ‘funnel’ shaped pattern with the wider end toward the larger fitted values. This pattern indicates non-constant variance in this model and suggests that the ordinary least-squares fit is inadequate.

[Plot for Figure 4.8: "Residuals vs. Fitted values (Rattray Twins - cutdown)"; x-axis: Fitted value; y-axis: Residuals; legend: Res-H, Res-L.]

Figure 4.9: QQ plot (twins – cut-down model)

Figure 4.9 shows the QQ plot for the cut-down model of twins. The majority of the points lie close to the reference line; only a few points at the two ends deviate from it. This indicates that the assumption of normality may still hold.

4.2.3 Twins – hybrid model

Figure 4.10: Residuals vs. Fitted values (twins – hybrid model)

Figure 4.10 shows the plot of residuals versus fitted values for the hybrid model of twins. There is an obvious trend of the variance increasing with the fitted value. This indicates that the assumption of constant variance is violated, so the parameter estimates based on the ordinary least-squares fit are not reliable.

[Plot for Figure 4.9: "QQ plot (Twins - cutdown model - ordinary least-squares)"; x-axis: Standard Normal Quantiles; y-axis: Quantiles of Input Sample.]

[Plot for Figure 4.10: "Residuals vs. Fitted values (Rattray twins - hybrid)"; x-axis: Fitted value; y-axis: Residuals; legend: Res-H, Res-L.]

Figure 4.11: QQ plot (twins – hybrid model)

Figure 4.11 shows the QQ plot for the hybrid model of twins. Most of the points in the middle lie quite close to the reference line, but a few points near the two ends are quite far from it. This suggests that the normality assumption is violated.

4.2.4 Summary

The diagnostic plots presented in this section are used to assess the adequacy of the model fit for the candidate models. The plots show that the assumption of constant variance is violated for all of the candidate models, and the assumption of normality is challenged for most of them. Because the assumption of constant variance is violated, the parameter estimates based on ordinary least-squares are not reliable in this case. Hence an improvement in the model fitting is required to produce accurate and reliable parameter values.

[Plot for Figure 4.11: x-axis: Standard Normal Quantiles; y-axis: Quantiles of Input Sample.]
