3.3.2 Model Assumptions and Data Set Requirements

In general, multilevel modeling relies on the basic assumptions of general linear models, such as normality, homoscedasticity, and linearity.460 Furthermore, multilevel analysis is only necessary when there are substantial differences between groups. Beyond these assumptions, the dataset should fulfill certain sampling requirements.

Normal distribution, homoscedasticity and linearity: Residual error terms on the lowest level are assumed to be normally distributed with a mean of zero and variance σ². Similarly, residual error terms for random effects on higher levels are assumed to approximately follow a multivariate normal distribution. Residual variances on all levels are assumed to be constant across the values of the explanatory variables.461 For this study, no substantial violations of these requirements were identified. The corresponding results of a post-estimation analysis of residuals are provided in Section [cross-reference missing in the source].
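A post-estimation check of this kind can be sketched roughly as follows. The sketch assumes that the level-1 residuals and fitted values are already available as plain lists; the simulated data, the function names, and the choice of four bins are illustrative assumptions and do not reproduce the analysis of this study.

```python
import random
import statistics


def residual_summary(residuals):
    """Return mean, skewness and excess kurtosis of a list of residuals."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    sd = statistics.pstdev(residuals)
    skew = sum(((r - mean) / sd) ** 3 for r in residuals) / n
    excess_kurt = sum(((r - mean) / sd) ** 4 for r in residuals) / n - 3.0
    return {"mean": mean, "skewness": skew, "excess_kurtosis": excess_kurt}


def variance_ratio(residuals, fitted, bins=4):
    """Largest / smallest residual variance across bins of the fitted values.

    A ratio close to 1 is compatible with constant residual variance.
    """
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // bins
    variances = []
    for b in range(bins):
        # The last bin also takes any leftover observations.
        end = (b + 1) * size if b < bins - 1 else len(pairs)
        chunk = pairs[b * size:end]
        variances.append(statistics.pvariance([r for _, r in chunk]))
    return max(variances) / min(variances)


# Simulated stand-in data (hypothetical, NOT the residuals of this study).
rng = random.Random(42)
fitted = [rng.uniform(10.0, 16.0) for _ in range(1000)]
residuals = [rng.gauss(0.0, 0.5) for _ in fitted]

summary = residual_summary(residuals)
ratio = variance_ratio(residuals, fitted)
print(summary)
print("max/min residual variance across bins:", ratio)
```

Values of skewness and excess kurtosis near zero and a variance ratio near one would be in line with the assumptions stated above. Such summary figures complement, but do not replace, a visual inspection of residual plots.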

Hierarchical linear modeling assumes linear relationships between the predictors and the outcome variable. A visual inspection of bivariate scatter plots and locally weighted scatter plot smoothing curves for all independent variables against ‘Logvalue’ indicated that linear relationships can be assumed.462 Appendix 1 displays the corresponding scatter plots.
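The idea behind comparing a locally weighted smoothing curve with a straight line can be illustrated with a minimal sketch. It is a simplified variant that uses tricube weights and local linear fits but omits the robustness iterations of the full procedure; the function names, the window fraction of 0.5, and the toy data are assumptions made for illustration only.

```python
def lowess(x, y, frac=0.5):
    """Locally weighted linear fit at each x[i] using tricube weights.

    Simplified sketch: no robustness iterations.
    """
    n = len(x)
    k = max(2, int(round(frac * n)))        # size of each local window
    smoothed = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12            # bandwidth: k-th smallest distance
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


def ols_line(x, y):
    """Return intercept and slope of an ordinary least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def max_gap_to_line(x, y, frac=0.5):
    """Largest absolute distance between the smoothed curve and the OLS line."""
    intercept, slope = ols_line(x, y)
    curve = lowess(x, y, frac)
    return max(abs(c - (intercept + slope * xi)) for c, xi in zip(curve, x))


# Toy data (hypothetical): one exactly linear and one clearly curved relation.
x = [float(i) for i in range(21)]
y_linear = [2.0 + 0.5 * xi for xi in x]
y_curved = [xi ** 2 for xi in x]

gap_linear = max_gap_to_line(x, y_linear)
gap_curved = max_gap_to_line(x, y_curved)
print(gap_linear, gap_curved)
```

For the linear toy data, the smoothed curve coincides with the regression line up to rounding error, whereas the curved data produce a large gap. In applied work, the judgment would typically still rest on the plots themselves rather than on a single number.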

Independence of errors: The assumption of independence is not retained, since multilevel analysis explicitly models group effects at different levels. In fact, multilevel modeling is only advised when the assumption of independent errors is violated, that is, when there are significant differences between groups. Whether multilevel modeling should be applied is tested on the basis of the Intraclass Correlation Coefficient (ICC), which measures the ratio of the variance at the context level to the total variance (the sum of the variance between and within contexts). In this way, the ICC indicates the share of variance that is attributable to the grouping structure and serves as an indicator of the degree of heterogeneity between contexts. Thus, it can also be interpreted as the maximum proportion of variance that can be explained by adding predictors on a specific level.463 Formula 1 shows the corresponding equation.
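As a rough illustration of the ratio described above, the following sketch computes the ICC in two ways: directly from two given variance components, and from grouped raw data via the one-way ANOVA estimator for balanced groups. The function names and all numbers are hypothetical; in practice, the variance components would be taken from an estimated null model.

```python
def icc_from_variances(var_between, var_within):
    """Share of the total variance located at the context level."""
    return var_between / (var_between + var_within)


def icc_anova(groups):
    """One-way ANOVA estimator of the ICC for equally sized groups."""
    k = len(groups)                 # number of contexts
    n = len(groups[0])              # observations per context (balanced)
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equally sized groups")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    group_means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical variance components: 1.0 between contexts, 3.0 within contexts.
icc_components = icc_from_variances(1.0, 3.0)
print(icc_components)  # 0.25

# Hypothetical balanced data: three contexts with three observations each.
icc_data = icc_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(icc_data)
```

In the first example, one quarter of the total variance lies between contexts. In the second, the group means differ strongly relative to the spread within groups, so the estimate is close to one.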

460 For reasons of space and given the focus of the study, this section discusses only the particularities of multilevel modeling in more depth. See, for instance, TABACHNICK/FIDELL (2007), pp. 78-85 for a more detailed discussion of assumptions in general linear models.

461 In fact, KORENDIJK et al. (2008) found that, in a two-level model with unequal variances, only the standard error of the second-level variance is underestimated. There were no substantial biases of fixed effects, first-level variances, and their standard errors. Nevertheless, heteroscedasticity can also be explicitly integrated in multilevel models. See SNIJDERS/BOSKER (2012), pp. 119-129, 161 for a detailed discussion of ways to model heteroscedasticity.

462 For a detailed explanation of locally weighted scatter plot smoothing, see CLEVELAND (1979), pp. 829-836.

463 Intraclass correlation coefficients near zero suggest that applying multilevel modeling is not necessary, since there is no substantial variance between contexts. For this study, significant differences between contexts were found on all levels, indicating that a multilevel modeling approach is reasonable. A discussion of the results is provided in Section [cross-reference missing in the source].

Formula 1: Population Intraclass Correlation Coefficient

$$\rho = \frac{\sigma_{u_0}^{2}}{\sigma_{u_0}^{2} + \sigma_{e}^{2}}$$

where $\rho$ = intraclass correlation coefficient, $\sigma_{u_0}^{2}$ = variance of higher-level error, and $\sigma_{e}^{2}$ = variance of lowest-level error.

Source: HOX (2010), p. 15.

Random sample: On all levels, observations should be based on a random sample. In many cases, this assumption is not met. Especially in studies that take regional or social structures into account, random sampling is often impracticable. BRAUN et al. (2010, p. 20) point out that this aspect is commonly neglected in research, and that contexts and individual observations are usually treated as if they were drawn randomly. However, inferences should not be made beyond the sampled groups in these cases.464 In this work, the assumption is violated as well, owing to data availability restrictions and the concentration of real estate markets in specific regions. A random selection of property sub-markets would not have been acceptable from a real estate perspective, since it might have led to Germany’s largest office markets being neglected. In accordance with standard practice, the sample is subsequently treated as random, but special attention is paid to the corresponding limitations of the study results.465

Sample size: The number of observations needed at the different levels of a data hierarchy in order to obtain valid standard error estimates is still under discussion. Suggestions concerning the number of groups on higher levels vary particularly strongly. MAAS/HOX (2005) state that estimated standard errors are too small when the number of higher-level contexts is below 50. However, the authors suggest that, even with 30 observations at the context level, acceptable results might be achieved, although these might be less precise. According to SNIJDERS (2003), fewer than 20 cases substantially limit the power of the analysis, and “sample sizes less than 10 should be regarded with suspicion”.466

Based on a Monte Carlo simulation, STEGMUELLER (2013) similarly states that in complex multilevel models with higher-level predictors and interaction effects, a group sample below 20 results in confidence intervals that are almost 5% too short.467 For studies focusing on random effects, SNIJDERS/BOSKER (2012) suggest a sample size of 30 or higher.468 However, depending on the field of study, the available sample size might be limited. Especially in studies with a geographic component, published research applies multilevel modeling to data structures with a relatively small number of groups at the highest level. BRAUN et al. (2010) list several studies with a sample size between 16 and 35 at the context level. In their own study, the authors build upon 27 cases (the member states of the European Union).469

464 See BRAUN et al. (2010), pp. 20-21; HINZ (2005), p. 269. See also LUCAS (2014), pp. 1619-1649 for a detailed discussion of this matter, and GHITZA/GELMAN (2013), pp. 763-764 for an application of poststratification as a potential approach to generalizing results from multilevel analysis.

465 See HOX (2010), p. 1.

466 SNIJDERS (2003), p. 676.

Sample size requirements are less restrictive on the lower levels. GELMAN/HILL (2007) state that, if a certain degree of inaccuracy is accepted, “even two observations per group is enough to fit a multilevel model.”470 Equally, DITTON (1998) suggests a group size of at least two observations, and MOSSHOLDER/BENNETT/MARTIN (1998) recommend a minimum of three observations per group.471 In repeated-measures analysis, missing measurements can be tolerated, so that for longitudinal studies, “(…) group sizes may be as small as one, as long as other groups are larger (…).”472

Until now, no uniform convention on sample size has been developed, and there is “(…) no strong evidence to guide researchers in their multilevel design decisions.”473 However, there is general agreement that (1) the number of observations on higher levels is more important than on lower levels, (2) the sample size that matters most is that of the level at which the effect of interest is measured, and (3) the average cluster size is of minor importance for the power of multilevel analyses.474

In this study, the general sample size requirements are fulfilled. A total of 1,118 observations is available at the property level, where the effect of brand status is measured. However, it should be noted that only 20 cases are included in the dataset at the city level. Even though only one independent variable is assigned to this level, results concerning random effects and cross-level interactions should be interpreted carefully, since estimated standard errors might be too small, leading to potential biases.
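The rules of thumb cited in this section can be collected in a small helper that flags a given number of higher-level groups. Encoding them as hard cut-offs is a simplification made for illustration; the helper name and the wording of the messages are assumptions, and only the thresholds and the two case numbers (1,118 properties, 20 cities) are taken from the text.

```python
# Thresholds as cited in this section; the message wording is illustrative.
RULES = [
    (10, "SNIJDERS (2003): fewer than 10 groups, regard with suspicion"),
    (20, "SNIJDERS (2003): fewer than 20 groups, limited power"),
    (30, "SNIJDERS/BOSKER (2012): fewer than 30 groups for random effects"),
    (50, "MAAS/HOX (2005): fewer than 50 groups, standard errors too small"),
]


def flag_group_count(n_groups):
    """Return every rule of thumb that the given number of groups falls below."""
    return [message for threshold, message in RULES if n_groups < threshold]


# Case numbers reported for this study.
levels = {"property": 1118, "city": 20}

flags = {level: flag_group_count(n) for level, n in levels.items()}
for level, messages in flags.items():
    print(level, levels[level], messages or "no rule of thumb violated")
```

With these inputs, no rule is flagged for the property level, while the city level falls below the thresholds of 30 and 50 groups. This mirrors the caution expressed above regarding random effects and cross-level interactions.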

In document Property brand management - applying causal analysis to develop a strategic management tool for office property brands (Page 113-115)