Evaluation. To measure the quality of the conditional quantile approximations, loss function (3) is used in conjunction with 5-fold cross-validation. The employed loss function measures the weighted absolute deviations between observations and quantiles, instead of the more common squared error loss. As discussed previously, the minimum of this loss is achieved by the true conditional quantile function. The empirical loss over the test data is computed for all splits of the data sets at quantiles α ∈ {.005, .025, .05, .5, .95, .975, .995}. In addition to the average loss for each method, one might be interested in whether the difference in performance between quantile regression forests and the other methods is significant. To this end, bootstrapping is used, comparing each method against quantile regression forests (QRF). The resulting 95% bootstrap confidence intervals for the difference in average loss are shown by vertical bars; if they do not cross the horizontal line (which marks the average loss of QRF), the difference in average loss is statistically significant. Results are shown in Figure 1.
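The loss in (3) is the familiar pinball (check) loss. A minimal sketch of the evaluation, assuming held-out observations `y` and hypothetical quantile predictions `q_method` and `q_qrf` from a competing method and QRF (a simplified pooled version of the per-split computation described above):

```python
import numpy as np

def pinball_loss(y, q_pred, alpha):
    """Weighted absolute deviations between observations and a
    predicted alpha-quantile (the check loss, averaged over the data)."""
    diff = y - q_pred
    return np.mean(np.where(diff >= 0, alpha * diff, (alpha - 1) * diff))

def bootstrap_loss_diff(y, q_method, q_qrf, alpha, n_boot=1000, seed=0):
    """95% bootstrap confidence interval for the difference in average
    pinball loss between a competing method and QRF; if the interval
    excludes zero, the difference is significant."""
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        diffs[b] = (pinball_loss(y[idx], q_method[idx], alpha)
                    - pinball_loss(y[idx], q_qrf[idx], alpha))
    return np.percentile(diffs, [2.5, 97.5])
```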
When the model contains many predictors, variable selection plays an important role in the model building process, both to obtain a better interpretation and to improve the precision of the model fit. In Chapter 3, we review several variable selection methods in quantile regression, ranging from frequentist approaches to Bayesian procedures. All of these methods estimate the regression coefficients at some fixed quantile value. However, if our purpose is to identify which quantile regression model fits the data best, then traditional quantile regression may not be appropriate. For example, given a grid of quantiles (0.1, 0.2, · · · , 0.9), we could fit nine different regression models, one for each quantile value, and ask which of them is most likely to extract the most information from the data; that is, which model best reflects the inner relationship of the data, and which quantile is the most likely one. Such questions can be easily answered if we consider the quantile as an unknown parameter and estimate it from the data. Therefore, in order to extract important information from the data itself, we consider the quantile as an unknown parameter and estimate it jointly with the other regression coefficients. The detailed algorithm is discussed in Chapter 4.
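For the fixed-quantile alternative described above, fitting the whole grid is straightforward; a minimal sketch using `statsmodels` (the joint estimation of the quantile itself is the subject of Chapter 4):

```python
import numpy as np
import statsmodels.api as sm

def fit_quantile_grid(y, X, taus=np.linspace(0.1, 0.9, 9)):
    """Fit one quantile regression per quantile level in the grid;
    the fits can then be compared, e.g. by their average check loss."""
    Xc = sm.add_constant(X)
    return {round(t, 2): sm.QuantReg(y, Xc).fit(q=t) for t in taus}
```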
Other authors have applied quantile regression in a financial framework. For instance, Engle and Manganelli (2004) proposed the CAViaR model to estimate the conditional Value-at-Risk, an important measure of risk that financial institutions and their regulators employ. White et al. (2008) generalized the CAViaR to the Multi-Quantile CAViaR (MQ-CAViaR) model, studying the conditional skewness and kurtosis of S&P 500 daily returns. White et al. (2010) extended the MQ-CAViaR model to the multivariate context to measure systemic risk, a critical issue highlighted by the recent financial crises, taking into account the relationships among 230 financial institutions from around the world. Li and Miu (2010) proposed, on the basis of the binary quantile regression approach, a hybrid bankruptcy prediction model with dynamic loadings for both accounting-ratio-based and market-based information. Castro and Ferrari (2014) used the ∆CoVaR model as a tool for identifying and ranking systemically important institutions. Finally, Caporin et al. (2014b) adopted quantile regressions to detect financial contagion across bond spreads in Europe. Our work is closely related to the contribution of Zikes and Barunik (2013). However, our analysis differs in several important points. First of all, Zikes and Barunik (2013) estimated volatility by means of a realized measure that takes into account only the effects of jumps in the price process. In contrast, we use the realized range-based bias-corrected bipower variation, which considers the impact of microstructure noise as well as that of jumps. To the best of our knowledge, quantile regression methods have never been used for the analysis of realized range volatility measures in the econometric and empirical financial literature, which provides a strong motivation for our study. Secondly, Zikes and Barunik (2013) used a heterogeneous autoregressive quantile model, whereas we also made use of conditioning
The majority of known estimation approaches for model (1) are built on either least squares or likelihood-based methods, and are therefore expected to be sensitive to outliers. In contrast, quantile regression (QR) (Koenker & Bassett, 1978) provides a robust alternative. It supplies a full statistical analysis of the stochastic relationships among the predictors and the response variable. QR has been applied in fields such as econometrics, finance, microarrays, and medical and agricultural studies; see Koenker (2005) and Yu et al. (2003) for more details. Many researchers have studied QR methods in the literature; see, for example, He and Shi (1996), He et al. (2002), Lee (2003), Cai and Xu (2009), Wang et al. (2010), and Kai et al. (2011), among others.
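For reference, the QR estimator of Koenker and Bassett (1978) at quantile level τ minimizes the check loss:

```latex
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_\tau(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right).
```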
In the first method, we proposed a regression calibration based approach to the censored quantile regression model. We assumed that there exists a linear association between the accurately measured covariate and its surrogate/auxiliary covariate and other available covariates. First, we predicted the unobserved covariate in the non-validation sample using the regression calibration method, with the help of the auxiliary covariate and other available covariates. In the next step, we combined the accurately measured covariate readings from the validation sample with the predicted key exposures in the non-validation sample to estimate the censored quantile regression parameters. We developed a new estimating function based on the Peng and Huang [2008] censored quantile regression estimating function. We also established its asymptotic properties, namely the consistency and asymptotic normality of the estimators. In the simulation study, we compared our proposed method with results based solely on the validation sample and with the scenario in which the main exposure is completely known. The standard error of the parameter estimates from our proposed method is always smaller than the one using only the validation sample, irrespective of the value of σ².
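A minimal sketch of the calibration step, with hypothetical arrays (`X_val` the accurately measured exposure in the validation sample, `W` its surrogate, `Z` the other available covariates); the predictions would then be plugged into the censored quantile regression estimating equations in place of the unobserved exposure:

```python
import numpy as np

def regression_calibration(X_val, W_val, Z_val, W_nonval, Z_nonval):
    """Predict the unobserved exposure in the non-validation sample by
    regressing the true exposure on its surrogate W and the remaining
    covariates Z within the validation sample (linear calibration)."""
    D_val = np.column_stack([np.ones(len(W_val)), W_val, Z_val])
    gamma, *_ = np.linalg.lstsq(D_val, X_val, rcond=None)
    D_nonval = np.column_stack([np.ones(len(W_nonval)), W_nonval, Z_nonval])
    return D_nonval @ gamma
```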
the work of Ji et al. (2012) on binary quantile regression models with the use of a group lasso penalty. Our model is derived in the framework of probit binary regression and offers an alternative to the mean-based logistic regression model with group lasso penalty (Meier et al. 2008) when the response is binary, the predictors have a natural group structure, and quantile estimation is of interest. In section 2 we describe the model, in section 3 we describe the estimation of the parameters in a Bayesian setting, in section 4 we discuss how the model is used for prediction, and in sections 5 and 6 we compare the performance of the method with related mean-based and quantile-based regression approaches on simulated and real data. Finally, in section 7, we draw some conclusions.
For a more fundamental treatment of binary quantile regression, we refer to Manski (1975, 1985) and Kordas (2006), and to Benoit and Van den Poel (2012) for an overview. As pointed out above, because quantile regression is able to accommodate non-normal errors, binary quantile regression is an appropriate tool for classifying samples that belong to one of two different categories. Moreover, the proposed Bayesian approach can deal with a high-dimensional predictor space, which is rarely the case for the optimization algorithms normally used for frequentist quantile regression. A recent exception here is Zheng (2012). Other frequentist approaches to binary quantile regression can be found in Manski (1975, 1985) and Kordas (2006).
mean of daily snowfall y for a given maximum temperature x. But the least squares curve does not provide information about extreme heavy snowfalls that may cause damage. The quantile regression method is able to estimate these high conditional quantiles. We will discuss this example further in Section 5.
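A minimal sketch with simulated (hypothetical) data, contrasting the least squares conditional-mean fit with a high conditional quantile:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: x = daily maximum temperature, y = daily snowfall.
rng = np.random.default_rng(1)
x = rng.uniform(-15, 5, 500)
y = np.maximum(0.0, -0.8 * x + rng.gamma(2.0, 2.0, 500))

X = sm.add_constant(x)
fit_95 = sm.QuantReg(y, X).fit(q=0.95)   # conditional 95th percentile
fit_ls = sm.OLS(y, X).fit()              # conditional mean, for contrast
print(fit_95.params, fit_ls.params)
```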
proposed a flexible Bayesian quantile regression model for independent and clustered data. The authors assumed that the error distribution is an infinite mixture of Gaussian densities. They called their method “flexible” because it imposes neither parametric assumptions (e.g., asymmetric Laplace) nor shape restrictions on the residual distribution (e.g., a mode at the quantile of interest), as other approaches do (personal communication with Reich).
In this chapter we introduce a mixed modeling framework for quantile regression with these necessary attributes. We accomplish these methodological innovations by extending the model of Reich and Smith (2013) to accommodate autocorrelation and multiple responses. In the random component we account for the dependence across time and blood pressure type via a copula (Nelsen, 1999). This permits the relationships between the covariates and the two responses to share information and enables probabilistic statements about SBP and DBP jointly. Our copula approach maintains the marginal distributions of the group quantile effects while accounting for within-subject dependence, enabling inference at the population and subject levels. Copulas previously utilized in the longitudinal literature (Sun et al., 2008; Smith et al., 2010) focused on mean inference and do not account for predictors. Copulas have a straightforward connection to quantile function modeling, as both rely on connecting the response to a latent uniformly distributed random variable. Our copula model resembles the usual mixed model (Diggle et al., 2002) in that covariates affect both the marginal population distribution via fixed effects and subject-specific distributions via random slopes. In the fixed component we allow for different predictor effects across quantile level, blood pressure type and year. Our model is centered on the usual Gaussian mixed model and contains it as a special case.
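To illustrate the connection between copulas and quantile functions, here is a minimal sketch (not the chapter's model): a Gaussian copula induces dependence between SBP and DBP through correlated latent uniforms, while each margin is governed by its own quantile function:

```python
import numpy as np
from scipy.stats import norm

def sample_joint_via_copula(q_sbp, q_dbp, rho, n, seed=0):
    """Draw (SBP, DBP) pairs whose dependence comes from a Gaussian
    copula with correlation rho; q_sbp and q_dbp are marginal quantile
    functions mapping u in (0, 1) to a blood pressure value."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = norm.cdf(z)                      # correlated latent uniforms
    return q_sbp(u[:, 0]), q_dbp(u[:, 1])
```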
Since its introduction as a generalization of the linear regression model, quantile regression (Bassett & Koenker 1978, Koenker & Bassett 1978) has been widely used in economics, finance, biostatistics and medical statistics; see Koenker (2005) for a review of applications. Compared to standard linear regression models, quantile regression models provide a more complete characterization of the conditional distribution of the responses given a set of covariates, while at the same time being more robust to the presence of possible outliers. Nonparametric and semiparametric extensions to quantile regression have been considered by Chauduri (1991), Fan et al. (1994), He & Shi (1996), Chauduri et al. (1997), Yu & Jones (1998), He & Liang (2000), Lee (2003), Horowitz & Lee (2005), Cai & Xu (2008) and Cai & Xiao (2012), among many others.
In recent years, high-dimensional data, in which the number of covariates p is larger than the number of observations n (p ≫ n), have become increasingly common. This problem arises in many different areas, such as computer vision and pattern recognition [Wright et al., 2010], climate data over different land regions [Chatterjee et al., 2011], and prediction of cancer recurrence based on patients' genetic information [Simon et al., 2013]. In these scenarios, variable selection gains special importance, offering sparse modeling alternatives that help identify significant covariates and enhance prediction accuracy. One of the first and most popular sparse regularization alternatives is the LASSO, which was proposed by [Tibshirani, 1996] and adapted to the QR framework by [Li and Zhu, 2008], who developed the piecewise linear solution path of this technique. The LASSO penalizes each variable individually, thus enhancing individual sparsity. However, in many real applications variables are structured into groups, and group sparsity rather than individual sparsity is desired; one can think, for example, of a genetic dataset grouped into gene pathways. This problem was addressed by the group LASSO penalization of [Yuan and Lin, 2006], which opened the door to more complex penalizations such as the sparse group LASSO [Friedman et al., 2010], a linear combination of LASSO and group LASSO providing solutions that are both between- and within-group sparse. With the same objective in mind, [Zhou and Zhu, 2010] proposed a hierarchical LASSO. To the best of our knowledge, the SGL technique has not been studied in the framework of QR models, so this gap is addressed first, extending the SGL penalization to quantile regression.
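A sketch of the resulting penalized objective, assuming the common SGL formulation with group weights √p_g, where ρ_τ is the check loss, groups are indexed g = 1, …, G, and α ∈ [0, 1] balances the LASSO and group LASSO terms:

```latex
\min_{\beta}\; \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right)
+ \lambda \left( (1-\alpha) \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
+ \alpha \lVert \beta \rVert_1 \right)
```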
The findings presented in Table 2 comprise OLS, LAD and QR estimates. The OLS estimates provide a baseline of mean effects, which we compare to the LAD estimates and to estimates at separate quantiles of the conditional distribution. In the first column, an OLS model is estimated not only to identify relevant variables through the significance of the parameters, but also, and especially, to demonstrate the utility of conducting quantile regression, which provides parameter estimates at different levels of the distribution, unlike a single linear model. To correct for heteroskedasticity, we present White-corrected standard errors. The OLS estimate confirms the relationship found by Lynn and Vanhanen (2006): average IQ is strongly negatively related to poverty, i.e. higher intelligence is associated with less poverty at the national level. This is what is shown in the graph below. It is not surprising that we find the same result, since we use the same technique as they did.
I regress several other measures of international reserves on the buffer stock variables to see if the results are robust. Figure 2 presents a graphical summary of the quantile regression results when four different measures of reserve holdings are used. Each panel plots the quantile regression point estimates for θ in increments of 0.05, ranging from 0.05 to 0.95, as the solid curve. This curve illustrates how the coefficient estimates change as we move from one quantile to another, holding the other independent variables constant. The shaded area around the solid curve shows the 90% confidence interval constructed from 1,000 bootstrap replications. The solid straight line in each panel represents the fixed effects OLS estimate, and the dashed lines above and below it show the borders of its 90% confidence interval.
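A minimal sketch of how such a coefficient process and its bootstrap band can be computed (a simplified pooled version; the estimates in Figure 2 additionally include fixed effects):

```python
import numpy as np
import statsmodels.api as sm

def qr_process_with_bands(y, X, coef_idx, taus, n_boot=1000, seed=0):
    """Point estimates of one QR coefficient over a grid of quantile
    levels, with pointwise 90% bootstrap confidence bands."""
    rng = np.random.default_rng(seed)
    Xc = sm.add_constant(X)
    point = np.array([sm.QuantReg(y, Xc).fit(q=t).params[coef_idx]
                      for t in taus])
    boot = np.empty((n_boot, len(taus)))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        boot[b] = [sm.QuantReg(y[idx], Xc[idx]).fit(q=t).params[coef_idx]
                   for t in taus]
    lo, hi = np.percentile(boot, [5, 95], axis=0)
    return point, lo, hi
```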
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). With the aim of reducing the variance of the first estimator, a second estimator is defined via sequential fitting of univariate local polynomial quantile smoothing for each additive component, with the other additive components replaced by their estimates from the first estimator. The second estimator achieves oracle efficiency in the sense that each estimated additive component has the same variance as in the case where all other additive components are known. Asymptotic properties are derived for both estimators under dependent processes that are strictly stationary and absolutely regular. We also provide a demonstrative empirical application of additive quantile models to ambulance travel times.
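As a rough illustration of the univariate building block, a local-constant kernel quantile smoother reduces to a kernel-weighted τ-quantile at each evaluation point (a sketch only; the paper uses local polynomial smoothing). In the second stage, such a smoother would be applied to the response minus the first-stage estimates of the other additive components:

```python
import numpy as np

def kernel_quantile_smoother(x, y, grid, tau, h):
    """Local-constant quantile smoother: at each grid point, minimize
    the kernel-weighted check loss, whose solution is the weighted
    tau-quantile of the responses."""
    est = np.empty(len(grid))
    order = np.argsort(y)
    y_sorted = y[order]
    for j, x0 in enumerate(grid):
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
        cw = np.cumsum(w[order]) / w.sum()       # weighted empirical CDF
        est[j] = y_sorted[np.searchsorted(cw, tau)]
    return est
```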
are of interest in their own right, we do not want to dismiss them as outliers; on the contrary, we believe it would be worthwhile to study them in detail. This can be done by calculating coefficient estimates at various quantiles of the conditional distribution. Finally, a quantile regression approach avoids the restrictive assumption that the error terms are identically distributed at all points of the conditional distribution. Relaxing this assumption allows us to acknowledge firm heterogeneity and consider the possibility that the estimated slope parameters vary across quantiles of the conditional distribution of Tobin's q.
Recently, a third approach has been proposed in the literature, namely quantile regression analysis. This technique has been frequently employed in the econometrics literature; however, there are only a few studies in the context of efficiency estimation, examining among others the efficiency of hotels (Bernini et al., 2004), nursing facilities (Knox et al., 2007), dairy farms (Chidmi et al., 2011), and check processing operations (Wheelock and Wilson, 2008). In the case of banking, this technique was applied only very recently, with a handful of studies examining US (Wheelock and Wilson, 2009), German (Behr, 2010) and European banks (Koutsomanoli-Filippaki and Mamatzakis, 2011).
In our application we used the NHANES data and the proposed method to estimate conditional quantile curves of usual sodium intake as a function of age for two domains in the population. These quantile curves clearly offer a broader perspective on usual sodium intake patterns across ages. For instance, some differences between the median usual sodium intake across age groups for females and males were revealed using the proposed method. These differences would not have been apparent if traditional analytical approaches had been adopted. According to our analysis, usual sodium intakes of African Americans clearly exceed the recommended adequate intake level (1,500 mg), a finding that has been consistently reported in the literature (Hoy et al., 2011; CDC, 2011). We showed empirical evidence that naively using the subject-specific average of the log 24HRs as the response variable in a quantile regression model to estimate the conditional quantile function of usual sodium intakes can result in misleading conclusions.
The consolidation of the field of high-dimensional statistics over the last two decades has recently renewed interest in first-order approximations of estimators by sums of independent random variables, e.g. Chernozhukov et al. (2013, 2014). It is now widely recognized that the major challenge in establishing asymptotic theory for inference, hypothesis testing and uncertainty quantification in high dimensions lies in properly accounting for the model selection procedure that is part of all high-dimensional estimation techniques, e.g. Leeb and Pötscher (2005, 2006); Zhang and Zhang (2013); van de Geer et al. (2014); Lee et al. (2016). To address this issue, a number of researchers have obtained application-specific maximal inequalities and Bahadur-type representations that hold uniformly over collections of models. For example, Berk et al. (2013) obtained post-selection coverage guarantees for confidence intervals of least squares estimators that hold simultaneously over a range of models; Belloni and Chernozhukov (2011, 2013) assessed properties of quantile regression and least squares estimators after model selection via maximal inequalities that hold uniformly over collections of models; and Kuchibhotla et al. (2018) obtained a Bahadur representation for least squares estimators that holds uniformly over all subsets of possible models based on a given number of predictors. We think that this development is a promising step towards a principled theory for high-dimensional inference. Therefore, in this paper, we derive a strong uniform-in-model Bahadur representation for quantile regression processes in increasing dimensions. For illustrations and applications we refer to our companion work, in which the results established here are applied to three important statistical problems: the analysis of the post-selection quantile regression estimator under misspecification, the high-dimensional de-biased quantile regression process, and the predictive risk of misspecified quantile regression models.
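For orientation, the classical (fixed-model, fixed-dimension) Bahadur representation for the quantile regression estimator reads as follows; the contribution here is a version of this expansion that holds uniformly over models:

```latex
\sqrt{n}\left(\hat{\beta}(\tau) - \beta(\tau)\right)
= D(\tau)^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n}
x_i \left(\tau - \mathbf{1}\{y_i \le x_i^{\top}\beta(\tau)\}\right)
+ o_p(1),
\quad
D(\tau) = \mathbb{E}\!\left[f_{y \mid x}\!\left(x^{\top}\beta(\tau)\right) x x^{\top}\right]
```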
The empirical likelihood is not a likelihood in the usual sense, so the validity of the resultant posterior does not follow from the Bayes formula. Lazar (2003) discussed the validity of inference for the Bayesian empirical likelihood (BEL) approach based on earlier work of Monahan and Boos (1992). Schennach (2005) and Lancaster and Jun (2010) considered Bayesian exponentially tilted empirical likelihood (ETEL), which can be viewed as a nonparametric Bayesian procedure with noninformative priors on the space of distributions. Lancaster and Jun (2010) further considered Bayesian ETEL in quantile regression. For the inference of population means, Fang and Mukerjee (2006) investigated the asymptotic validity and accuracy of Bayesian credible regions; furthermore, Chang and Mukerjee (2008) showed that EL admits posterior-based inference with frequentist asymptotic validity, but many of its variants do not enjoy this property. To establish the asymptotic validity of BEL for quantile regression, we need to work with quantile estimating equations that involve discontinuous functions, so the direct local expansions used in the EL literature cannot be applied. Chernozhukov and Hong (2003) discussed the asymptotic properties of quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, including empirical likelihood. In our work, we also establish the asymptotic distributions of the posterior from the BEL approach for quantile regression. Different from Chernozhukov and Hong (2003), we are particularly interested in the interaction of informative priors and empirical likelihood in the asymptotic distribution of the posterior, which enables us to evaluate the efficiency gains from informative priors. Although finite-sample validity of the BEL posterior inference cannot be expected in our setting, we continue to use the term “posterior” throughout the article.
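The quantile estimating equations in question are of the standard (approximately solved) form

```latex
\frac{1}{n} \sum_{i=1}^{n} x_i \left(\tau - \mathbf{1}\{y_i \le x_i^{\top}\beta\}\right) = 0,
```

where the indicator function makes the criterion discontinuous in β, which is what rules out direct local expansions.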