James Algina (University of Florida), H. J. Keselman (University of Manitoba), Randall D. Penfield (University of Miami)
The squared multiple semipartial correlation coefficient is the increase in the squared multiple correlation coefficient that occurs when two or more predictors are added to a multiple regression model. Coverage probability was investigated for two variations of each of three methods for setting confidence intervals for the population squared multiple semipartial correlation coefficient. Results indicated that the procedure that provides coverage probability in the [.925, .975] interval for a 95% confidence interval depends primarily on the number of added predictors. Guidelines for selecting a procedure are presented. Key words: Squared multiple semipartial correlation; effect size; asymptotic and bootstrap confidence intervals.
regression coefficient, correlational measures such as the correlation, the squared correlation and the squared partial correlation (the coefficient of partial correlation is defined as the correlation coefficient between two sets of variables keeping a third set of variables constant), and measures based on a combination of the regression coefficients and the correlations, such as the product of the correlation between a predictor and the outcome and the corresponding standardized regression coefficient (Azen and Budescu, 2003; Budescu, 1993). However, none of these measures offers an intuitive and universal interpretation of importance, which leads to different orderings of the predictors' importance and confusion about the meaning of importance. Budescu (1993) suggested that an appropriately general measure of importance should satisfy the following three conditions: "(a) Importance should be defined in terms of a variable's 'reduction of error' in predicting the outcome; (b) The method should allow for direct comparison of relative importance instead of relying on inferred measures; (c) Importance should reflect a variable's direct effect, total effect and partial effect." According to these criteria, Budescu (1993) developed a new methodology, dominance analysis, in which a predictor is considered to be dominant, or more important than another predictor, if its additional contribution to the prediction of the response variable, defined as the squared semipartial correlation (i.e., the difference between two squared multiple correlations from nested models), is greater than the competitor's for all possible subset models. In short, one can identify the relative importance of predictors through a series of pairwise comparisons of squared multiple correlations from all submodels.
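The squared semipartial correlation described above is simply the difference between the R² values of a full and a reduced (nested) regression model. A minimal sketch of this computation (the function names `r_squared` and `squared_semipartial` are illustrative, not from the source):

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (an intercept column is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

def squared_semipartial(X_reduced, X_full, y):
    """Squared semipartial correlation: the increase in R^2 when the
    extra predictors present in X_full are added to the reduced model."""
    return r_squared(X_full, y) - r_squared(X_reduced, y)
```

Because the reduced model's predictors are a subset of the full model's, this difference is nonnegative, and comparing it across all possible subset models is what dominance analysis does predictor by predictor.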
is adequate statistical power to reject the null hypothesis that the population squared multiple correlation coefficient (denoted with an uppercase rho squared, P²) equals zero. Planning sample size from this perspective is well known in the sample size planning literature (e.g., Cohen, 1988; Dunlap, Xin, & Myers, 2004; Gatsonis & Sampson, 1989; Green, 1991; Mendoza & Stafford, 2001). However, with the exception of Algina and Olejnik (2000), sample size planning when interest concerns obtaining accurate estimates of P² has largely been ignored. With the emphasis that is now being placed on confidence intervals for effect sizes in the literature, and with the desire to avoid "embarrassingly large" confidence intervals (Cohen, 1994, p. 1002), planning sample sizes so that one can achieve narrow confidence intervals continues to increase in importance. Planning sample size when one is interested in P² can thus proceed in (at least)
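To make the accuracy-in-estimation idea concrete, here is a rough sketch of planning n for a target interval width. It uses the simple large-sample variance approximation 4ρ²(1−ρ²)²/n for R² (an assumption on my part; the dedicated noncentral-F methods discussed in this literature are more accurate), and the function names are illustrative:

```python
import math
from statistics import NormalDist

def approx_ci_rho2(r2, n, conf=0.95):
    """Wald-type CI for P^2 using the rough large-sample variance
    4*rho^2*(1-rho^2)^2 / n; clipped to the [0, 1] parameter space."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    se = math.sqrt(4 * r2 * (1 - r2) ** 2 / n)
    return max(0.0, r2 - z * se), min(1.0, r2 + z * se)

def n_for_width(r2, target_width, conf=0.95):
    """Smallest n (by direct search) whose approximate CI width
    does not exceed target_width."""
    n = 10
    while True:
        lo, hi = approx_ci_rho2(r2, n, conf)
        if hi - lo <= target_width:
            return n
        n += 1
```

The point of the sketch is the planning logic, not the particular approximation: width shrinks roughly as 1/√n, so narrow intervals can require much larger samples than power analysis alone would suggest.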
Note that the binomial distribution is a special case of the setting in this paper. A related result on calculating the confidence coefficients of confidence intervals for a binomial proportion can be found in Wang. The techniques used for the binomial distribution cannot be directly applied to the multinomial distribution with k > 2, because the binomial case involves only one variable and one unknown parameter p_1. For the multinomial distribution case with k > 2, there are at least two
The intraclass correlation coefficient (Fisher, 1925) originated in genetics and was first applied to social science, and then to medical science, to assess agreement or reliability between observers. We focus on ICCs for assessing agreement between observers where there are only subject and observer effects in the models. Extensions of the ICC for data with other factors, such as repeated measures, have been proposed by Vangeneugden et al. (2005) and are not considered in this chapter. The original ICC was based on the one-way ANOVA model. Extensions of this ICC lead to other versions of ICCs based on two-way ANOVA models (Bartko, 1966). Because different versions of ICCs can give different results depending on the chosen ANOVA model (Bartko, 1966; Shrout and Fleiss, 1979; Müller and Büttner, 1994; McGraw and Wong, 1996), Müller and Büttner (1994) proposed simple rules for choosing a suitable ICC with respect to the underlying data setting. However, researchers may compute these ICCs without verifying the assumptions, and the ICC is biased if the ANOVA assumptions are not met. Therefore, there is a need to understand what population parameter the ICC estimator targets under a general setting.
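For reference, the original one-way ANOVA version mentioned above estimates ICC(1) = (MSB − MSW)/(MSB + (k − 1)MSW), where MSB and MSW are the between-subject and within-subject mean squares. A minimal sketch (the function name `icc1` is illustrative):

```python
import numpy as np

def icc1(ratings):
    """One-way ANOVA ICC(1); ratings is an (n_subjects, k_raters) array."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    # between-subject and within-subject mean squares
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    msw = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

When all raters agree exactly, MSW = 0 and the estimate is 1; the two-way variants cited above partition out an observer effect as well, which is why the versions can disagree on the same data.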
Suppose a set of measurements is made on an individual, recorded as a vector y. For example, in a medical context where a patient might undergo a set of diagnostic tests, each component of y might be the measurement on one test. In psychology, a person might perform a number of tests and y could be the individual's profile of test scores. In commerce, y might be a number of measurements made on a manufactured item. Mahalanobis distance can be used to calibrate the degree to which y differs from the mean of a normative population. The squared Mahalanobis distance, δ² say,
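The squared Mahalanobis distance of y from a population with mean μ and covariance Σ is δ² = (y − μ)ᵀ Σ⁻¹ (y − μ). A minimal sketch (the function name is illustrative; solving the linear system avoids explicitly inverting Σ):

```python
import numpy as np

def sq_mahalanobis(y, mu, cov):
    """Squared Mahalanobis distance (y - mu)' cov^{-1} (y - mu)."""
    d = np.asarray(y) - np.asarray(mu)
    return float(d @ np.linalg.solve(cov, d))
```

With Σ equal to the identity this reduces to the squared Euclidean distance; more generally, directions with large population variance contribute less to δ², which is what calibrates "how unusual" y is relative to the normative population.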
ble resampling-based alternative for quantifying uncertainty. We describe the basic characteristics of the non-parametric bootstrap and illustrate its practical behaviour with simulations in the context of a typical task in machine learning: estimating and comparing the performance of different prediction models. We also present some of the method's weaknesses. We introduce and compare three standard intervals: the standard normal interval using the bootstrap standard error, and two more typical bootstrap confidence intervals, the percentile and the BCa interval. As theory suggests, the BCa interval performs best over a wide range of situations.
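Of the three intervals compared above, the percentile interval is the simplest: resample the data with replacement, recompute the statistic on each resample, and take the empirical quantiles of the replicates. A minimal sketch (the function name and defaults are illustrative; BCa additionally adjusts these quantiles for bias and acceleration):

```python
import numpy as np

def percentile_ci(data, stat, n_boot=2000, conf=0.95, seed=0):
    """Non-parametric bootstrap percentile confidence interval."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # recompute the statistic on n_boot resamples drawn with replacement
    reps = np.array([stat(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    alpha = (1 - conf) / 2
    return np.quantile(reps, alpha), np.quantile(reps, 1 - alpha)
```

The same replicates also yield the bootstrap standard error (their standard deviation), which plugs into the standard normal interval mentioned above.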
As will be seen, the confidence intervals that we have constructed for the above interval estimation problems are conceptually simple and straightforward to implement. The performance of the proposed confidence intervals is assessed based on simulations, and illustrated using several examples. In terms of maintaining the coverage probabilities, the proposed confidence intervals turn out to be quite satisfactory, regardless of the sample size. Our overall conclusion is that the generalized confidence interval approach and the fiducial approach have resulted in a unified methodology for the interval estimation of various epidemiological measures, and the resulting confidence intervals exhibit satisfactory performance and are preferable to the likelihood-based methods available in the literature.
2. Generalized Pivotal Quantities and Fiducial Quantities
• Synthetic error, which is not included in the loss function analysis, may arise from two sources. One source of synthetic error involves correcting the individual post-stratum estimates for errors estimated at more aggregate levels (such as the corrections for correlation bias and coding errors). Another source of synthetic error is variation of census coverage within post-strata (something not captured by synthetic application of post-stratum coverage correction factors to specific areas). Analyses based on artificial populations that simulated patterns of coverage variation within post-strata were done to assess whether omission of the resulting synthetic biases from the loss function analysis tilted the comparisons in one direction or another. These analyses did not in general change the loss function results, though they had some limitations (Griffin 2002). It should be kept in mind that synthetic error is expected to be more important for smaller areas, so that any limitations of the loss functions regarding synthetic error would be expected to matter more in comparisons for small places or counties than for large places or counties.
demand, the Allen and Morishima elasticities of substitution, income and expenditure elasticities defined for Engel curves, and long-run elasticities defined in dynamic models can be defined as nonlinear
functions of the estimated parameters. In addition, although most demand specifications imply that these elasticities of interest vary by prices, income, or level of output, it is frequently the case that there is little attempt to draw inferences at more than a single point and for only one level of significance. In this paper we demonstrate how these bounds can be generalized to consider multiple values and how the implied cumulative distribution function of the estimated elasticity can be used to visualize the relationship between the level of significance and the inferences drawn.
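One common way to obtain such a distribution for a nonlinear function of estimated parameters is parametric simulation in the style of Krinsky and Robb: draw parameter vectors from the estimated asymptotic normal distribution, evaluate the elasticity at each draw, and read quantiles off the empirical distribution. This is a hedged sketch of that general idea, not necessarily the method of this paper; the names are illustrative:

```python
import numpy as np

def elasticity_draws(beta_hat, cov_hat, elasticity_fn, n_draws=10000, seed=0):
    """Simulate the sampling distribution of a nonlinear elasticity:
    draw parameters from N(beta_hat, cov_hat), evaluate elasticity_fn
    at each draw, and return the sorted draws (an empirical CDF)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta_hat, cov_hat, size=n_draws)
    return np.sort([elasticity_fn(b) for b in draws])

# hypothetical example: an elasticity equal to the ratio b1 / b0
vals = elasticity_draws(np.array([2.0, 1.0]), np.diag([0.01, 0.01]),
                        lambda b: b[1] / b[0])
lo, hi = np.quantile(vals, [0.025, 0.975])
```

Because the sorted draws trace out the whole empirical CDF, intervals at any significance level come from the same set of draws, which is exactly the "multiple values, multiple levels" visualization the paragraph describes.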
the interval on a model-averaged estimator and determine the width of the interval by an estimate of the standard deviation of this estimator. The distribution theory on which these intervals are based is not (even approximately) correct (Claeskens and Hjort, 2008, p. 207), but simulation studies report that these intervals work well in terms of coverage probability in particular cases (Lukacs et al., 2010; Fletcher and Dillingham, 2011). A different approach was proposed by Hjort and Claeskens (2003), but this turns out to be essentially the same as the standard confidence interval based on fitting a full model (Kabaila and Leeb, 2006; Wang and Zou, 2013). More recently, Fletcher and Turek (2011) and Turek and Fletcher (2012) have proposed averaging confidence interval construction procedures from each of the possible models. Fletcher and Turek (2011) averaged the profile likelihood confidence interval procedure, and Turek and Fletcher (2012) averaged the tail areas of the distributions of the estimators from each of the possible models.
intervals are generally more useful than a p-value alone, as they offer greater consideration of the size of the effect, with particular attention to the minimal clinically important difference (MCID). The MCID for the PRWE is six points, reflecting the difference between turning a doorknob with mild pain versus no pain. Even the most extreme values predicted by the DRAFFT study in the general population (−4.5 to 1.8) are less than the six points considered necessary to be clinically significant. Thus, confidence intervals show us that even if the most extreme value of −4.5 points were the true effect in the general population, this would still be below the MCID and not of clinical significance.
The following limitation of impact numbers should be considered. It may be difficult for users to understand positive and negative values of effect measures. In the case of the risk difference it is possible to switch between ARI and ARR. However, negative results for PAR and AFe are not useful in practice. Thus, in the case of protective exposures, alternative effect measures such as the preventable fraction are applied in practice. This procedure leads to easily interpretable point estimators in practice but does not solve the problem of difficulties with confidence intervals. In the case of statistically non-significant results, the lower confidence limits for ARI, PAR, and AFe would
5 An Empirical Illustration
In this section we use real data to illustrate the confidence intervals proposed in this paper. The data were originally analyzed by Meyer, Viscusi, and Dubin (1995), who wanted to learn how an increase in the level of disability benefits affects the number of weeks a worker spends on disability; this variable is measured in whole weeks, and its distribution is highly skewed. The increase in benefits applies only to high-earning workers, not to low-earning ones. Meyer, Viscusi, and Dubin estimated difference-in-differences models of the form
e) Wider. More confidence means less precision.
f) Possibly. The wider interval may contain 0.
3.
Data are paired for each city; cities are independent of each other; the sample is less than 10% of all European cities; the boxplot shows the temperature differences are symmetric with no outliers. This is probably not a random sample, so we might be wary of inferring that this difference applies to all European cities. Based on these data, we are 90% confident that the average temperature in European cities in July is between 32.3°F and 41.3°F higher than in January.
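The interval above is a one-sample t interval on the paired July-minus-January differences: mean difference ± t* × (s/√n). A minimal sketch with the critical value supplied from a t table (the data in the test are made up for illustration, not the cities data):

```python
import math

def paired_t_ci(diffs, t_crit):
    """CI for the mean of paired differences; t_crit is the
    critical value t_{1 - alpha/2, n-1} looked up from a t table."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    half = t_crit * math.sqrt(var / n)
    return mean - half, mean + half
```

Pairing by city is what justifies reducing the two-sample problem to a single sample of differences, which is why the independence and symmetry checks are stated for the differences rather than for the raw temperatures.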
The multiple correlation coefficient is used in a large variety of statistical tests and regression problems. In this article, we derive the null distribution of the square of the sample multiple correlation coefficient, R², when a sample is drawn from a mixture of two multivariate Gaussian populations. The moments of 1 − R² and the inverse Mellin transform have been used to derive the density of R².
The concept of the generalized variable (GV) has recently become popular in small-sample inference for complex problems such as the Behrens–Fisher problem. These techniques have been shown to be efficient for specific distributions by using MLEs. The GV method was motivated by the fact that small-sample optimal CIs in statistical problems involving nuisance parameters may not be available. The method of the generalized confidence interval (GCI) based on a GV is used whenever standard pivotal quantities either do not exist or are difficult to obtain. Weerahandi (1993) introduced the concept of the GCI. As described in the cited papers, the GCI is based on the so-called generalized pivotal quantity (GPQ). For some problems where the classical procedures are not optimal, the GCI performs well. Krishnamoorthy and Mathew (2003) developed exact CIs and tests for a single lognormal mean using the ideas of generalized p-values and GCIs. Guo and Krishnamoorthy (2005) addressed the problem of interval estimation and testing for the difference between the quantiles of two populations using the GV approach. Krishnamoorthy et al. (2006) presented generalized p-values and CIs with a novel approach for analyzing lognormally distributed exposure data. Krishnamoorthy et al. (2007) addressed the problem of hypothesis testing and interval estimation of the reliability parameter in a stress-strength model involving the two-parameter exponential distribution using the GV approach. Verrill and Johnson (2007) considered confidence bounds and hypothesis tests for the coefficient of variation of a normal distribution. Kurian et al. (2008) provided GCIs for process capability indices in the one-way random model. Krishnamoorthy and Lian (2012) derived generalized tolerance intervals (TIs) for some general linear models based on the GV approach.
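To illustrate how a GPQ is used in practice, here is a sketch of the GCI for a lognormal mean in the spirit of Krishnamoorthy and Mathew (2003): with ȳ and s² the mean and variance of the log data, substitute Z ~ N(0, 1) and U ~ χ²(n−1) into the pivotal representations of μ and σ², form the GPQ for exp(μ + σ²/2), and take Monte Carlo percentiles. Treat the exact formulas as my reconstruction of the standard GPQ rather than a quotation from the cited paper:

```python
import numpy as np

def gci_lognormal_mean(logdata, conf=0.95, n_sim=20000, seed=0):
    """Generalized CI for the lognormal mean exp(mu + sigma^2/2)
    via Monte Carlo percentiles of its GPQ (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(logdata)
    ybar, s2 = np.mean(logdata), np.var(logdata, ddof=1)
    z = rng.standard_normal(n_sim)          # pivot for mu
    u = rng.chisquare(n - 1, n_sim)         # pivot for sigma^2
    gpq_sigma2 = s2 * (n - 1) / u           # GPQ for sigma^2
    gpq_eta = ybar - z * np.sqrt(gpq_sigma2 / n) + gpq_sigma2 / 2
    alpha = (1 - conf) / 2
    lo, hi = np.quantile(np.exp(gpq_eta), [alpha, 1 - alpha])
    return lo, hi
```

The appeal of the approach, as the survey above notes, is that no standard pivot for exp(μ + σ²/2) exists, yet the GPQ is assembled mechanically from the pivots for μ and σ².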
The literature survey reveals that during the last ten years a number of researchers have reported inference for well-known models using the GV approach, which motivated us to consider the problem of generalized CIs and generalized TIs for the Pareto–Rayleigh distribution. The rest of the paper is organized as follows.
Australia. The point estimate of 1.35 in Table 4 may be regarded as the expected value of this distribution. The plot provides a useful visual impression of the sampling variability associated with this estimate. It can be seen that the shape of the distribution is far from that of a normal distribution; this departure from normality is clear in the Q-Q plot presented in Figure 3, as the plot deviates from the 45° line. The distribution is right-skewed, with higher probability mass on its right-hand side. The 90% confidence interval calculated is [1.02, 1.78], where 1.02 and 1.78 are the 5th and 95th percentiles of the plotted distribution. This interval represents well the degree of variability observed in the plotted distribution and also captures its asymmetry; that is, the distribution is asymmetric around the point estimate of 1.35. Conventional confidence intervals based on the normal approximation provide a symmetric interval around the point estimate and are associated with a substantial underestimation of variability (for further details, see Li and Maddala, 1999).