Large-scale complex surveys usually have a relatively large sample size, ranging from a few hundred to many thousands. The fully efficient replication weights described in Section 2, or replication weights constructed by existing methods such as the jackknife or the bootstrap, would involve a very large number of sets of weights. Although valid replication weights provide enormous convenience to the users of survey data, who are not necessarily those who ran the survey, the burden of manipulating a data set with hundreds or even thousands of replicate weights can be enormous. As a result, how to achieve efficient replication variance estimation with a relatively small number of replicate weights is a question of both theoretical and practical value.
matrix estimated at stage (ii) using a WLS approach. Such a three-stage procedure can be extended to a more general structural model involving a vector of observed covariates x for each unit, as described in Muthén (1984). This point estimation approach, and its consistency, extends naturally to complex surveys, as described by Asparouhov (2005). The extension is only required at stages (i) and (ii), where the log-likelihoods at each of these stages involve sums over observations, and the survey weights need to be incorporated in these sums as in a pseudo-maximum-likelihood approach. Asparouhov (2005) also discussed variance estimation, following Muthén and Satorra (1995). More research seems needed, e.g. to consider the possible role of the complex design in the choice of weight matrix at stage (iii), to consider alternative variance estimation methods for such three-stage procedures, and to consider alternative testing methods.
The estimation of population parameters using complex survey data requires careful statistical modelling to account for the design features. This is further complicated by unit and item nonresponse, for which a number of methods have been developed in order to reduce estimation bias. In this paper, we address some issues that arise when the target of the inference (i.e. the analysis model or model of interest) is the conditional quantile of a continuous outcome. Survey design variables are duly included in the analysis and a bootstrap variance estimation approach is proposed. Missing data are multiply imputed by means of chained equations. In particular, imputation of continuous variables is based on their empirical distribution, conditional on all other variables in the analysis. This method preserves the distributional relationships in the data, including conditional skewness and kurtosis, and successfully handles bounded outcomes. Our motivating study concerns the analysis of birthweight determinants in a large UK-based cohort of children. A novel finding on the parental conflict theory is reported. R code implementing these procedures is provided.
The major goal of sampling is to obtain estimates of population parameters that are as precise, accurate and free of bias as possible. We want to obtain these estimates without sampling the entire population, using only a small proportion of it. Since the estimates are based on random samples, they are themselves random variables and can therefore be described using probability functions. While the precision of an estimate is certainly a function of the sample size, it is also related to the sampling design. As the sample design becomes more intricate, the estimation of the variance becomes more involved. This chapter shows examples (Kish, 1965) of how to calculate the variance of the sample mean for each of the designs. In general, the precision of an estimate is often described using a 95% confidence interval (95% CI). The width of the confidence interval is related to the variance of the estimate: wider intervals indicate less precision, hence it is important to estimate the variance properly.
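As a minimal illustration of the link between design, variance and interval width, the sketch below computes the variance of the sample mean under simple random sampling without replacement (the simplest of the designs discussed) and the corresponding 95% CI. The function name and the toy data are illustrative, not from the chapter.

```python
import math

def srs_mean_ci(sample, pop_size, z=1.96):
    """Estimate a population mean from a simple random sample drawn
    without replacement, with a 95% CI using the finite population
    correction (fpc). Illustrative sketch, not the chapter's code."""
    n = len(sample)
    mean = sum(sample) / n
    # Unbiased sample variance s^2
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    # Variance of the sample mean under SRS without replacement:
    # (1 - n/N) * s^2 / n  -- the fpc shrinks the variance
    var_mean = (1 - n / pop_size) * s2 / n
    half = z * math.sqrt(var_mean)
    return mean, (mean - half, mean + half)

# Hypothetical data: a sample of 5 units from a population of 100
mean, ci = srs_mean_ci([10, 12, 9, 11, 13], pop_size=100)
```

A larger sample or a larger fpc (n closer to N) narrows the interval, directly illustrating the precision/variance relationship described above.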
Abstract— Imputation of missing data is important in many areas, such as reducing non-response bias in surveys and maintaining medical documentation. Estimating the uncertainty inherent in the imputed values is one way of evaluating the results of the imputation process. This paper presents a new method for the estimation of imputation uncertainty, which can be implemented as part of any imputation method, and which can be used to estimate the accuracy of the imputed values generated by both parametric and non-parametric imputation techniques. The proposed approach can be used to assess the feasibility of the imputation process for large complex datasets, and to compare the effectiveness of candidate imputation methods when they are applied to the same dataset. Current uncertainty estimation methods are described and their limitations are discussed. The ideas underpinning the proposed approach are explained in detail, and a case study is presented which shows how the new method has been applied in practice.
Although not a strictly scientific survey, the Standish Group's 'Chaos report' seems to have made such a strong impact (perhaps more than any scientific survey) on common estimation beliefs that it deserves to be included in this review. The first and most cited version is the 'Chaos report' from 1994, but the Standish Group continued data collection during the nineties. The total sample size of the 1994 report was 365 respondents. All projects were classified as success (delivered as planned), challenged (delivered over time and over budget, and with fewer than the specified features) or impaired (cancelled). The sample selection process for organizations and projects is unknown (we made several inquiries to the Standish Group about properties such as the sample involved; they refused to provide these details, claiming them to be 'business secrets'), along with other important design and measurement issues. It is possible that the estimation accuracy reported by the Standish Group is misleading. For example, from our inspection of the survey questionnaire available on their web-site, it seems as if projects completed ahead of plan had to be registered as projects with "less than 20% overrun". If our
Three generalized variance estimators of normal-Poisson models have been introduced (see ). Also, the characterizations of normal-Poisson models by variance function and by generalized variance have been successfully proven (see ). In this paper, a new statistical aspect of the normal-Poisson model is presented, namely the estimation of the Poisson variance from observations of the normal components only, leading to an extension of the generalized variance termed the "standardized generalized variance".
Many operations carried out by official statistical institutes use large-scale surveys obtained by stratified random sampling without replacement. Variables commonly examined in this type of survey are binary, categorical and continuous, and hence the estimates of interest involve estimates of proportions, totals and means. The problem of approximating the sampling relative error of such estimates is studied in this paper. Some new jackknife methods are proposed and compared with plug-in and bootstrap methods. An extensive simulation study is carried out to compare the behavior of all the methods considered in this paper.
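To make the jackknife idea concrete for this setting, the sketch below implements the standard stratified delete-one (JKn) jackknife variance for an expansion estimator of the population total. It is a generic textbook sketch, not one of the paper's proposed estimators, and for brevity it omits the finite population correction that matters under sampling without replacement; the data layout and names are illustrative.

```python
def stratified_jackknife_var(strata):
    """Delete-one (JKn) jackknife variance of the stratified expansion
    estimator of a population total. `strata` maps a stratum label to
    (N_h, list of sampled y-values). Illustrative sketch; omits the fpc."""
    def total(est):
        # Stratified expansion estimator: sum_h N_h * ybar_h
        return sum(N * sum(ys) / len(ys) for N, ys in est.values())

    theta = total(strata)
    var = 0.0
    for h, (N, ys) in strata.items():
        n = len(ys)
        for j in range(n):
            # Recompute the estimate with unit j of stratum h deleted
            reduced = dict(strata)
            reduced[h] = (N, ys[:j] + ys[j + 1:])
            theta_hj = total(reduced)
            var += (n - 1) / n * (theta_hj - theta) ** 2
    return var

# Hypothetical two-stratum sample: (stratum size N_h, sampled values)
v = stratified_jackknife_var({"a": (10, [1, 2, 3]), "b": (20, [4, 6])})
```

The plug-in and bootstrap competitors mentioned above would replace the delete-one loop with, respectively, a direct variance formula and resampling within strata.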
The variance-covariance method has one clear advantage: it is quick and easy to compute. However, it has very limited applicability, for several reasons. First of all, it is suitable only for linear portfolios or instruments with a very small holding period, as it uses a linearisation of the portfolio which can otherwise be too unreliable. The second problem is the assumption of Gaussian-distributed changes in risk factors. It is well known that heavier-tailed distributions fit the behavior of many markets better than the normal distribution (see Hull (2012a)). Moreover, the assumption that all historic information, including possibly complex dependencies between risk factors, is captured by the covariance matrix is very limiting, not to mention the difficulty of estimating that matrix. Both simulation methods overcome most of the problems of the variance-covariance method. They are effective precisely when an analytical approach is too unreliable, and they deal well with path-dependency, heavy tails, non-linearity, optionality and multidimensionality. Historical simulation, in particular, is perhaps the most popular tool in many financial institutions due to its ease of implementation and reporting. Its great advantage compared to the MC method is that it does not require any assumption about the distribution of the changes in risk factors: historical simulation takes the information about the underlying distribution from past data. This is, however, also a big disadvantage, in particular when the past data are incomplete or do not represent future market movements in a reliable manner. Although such data are usually also used to estimate the parameters of the assumed distribution in the MC methodology, with the MC method it is very easy to carry out a simulation under alternative assumptions, in contrast to historical simulation.
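The contrast between the two one-dimensional special cases can be sketched in a few lines: a parametric (variance-covariance) VaR that assumes Gaussian P&L, and a historical-simulation VaR that simply reads off an empirical quantile. The function names, confidence level and quantile indexing convention are illustrative choices, not from the text.

```python
import statistics

def var_cov_VaR(pnl, z=1.645):
    """Parametric 95% VaR: fits a normal distribution to the P&L
    series, so it inherits the Gaussian assumption criticised above."""
    mu = statistics.mean(pnl)
    sigma = statistics.stdev(pnl)
    return -(mu - z * sigma)

def historical_VaR(pnl, level=0.05):
    """Historical-simulation 95% VaR: the empirical 5% quantile of
    past P&L, with no distributional assumption. Simple order-statistic
    indexing; interpolation schemes vary in practice."""
    losses = sorted(pnl)
    k = max(0, int(level * len(losses)) - 1)
    return -losses[k]

# Hypothetical uniform P&L history from -50 to 49
pnl = [float(x) for x in range(-50, 50)]
```

On heavy-tailed data the two numbers diverge, which is exactly the limitation of the variance-covariance method discussed above.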
even-numbered quadrat counts), which is what the Split method does for the within-stripe variance contribution. Generalization. The generalization of Eq. 21 to higher dimensions is relatively straightforward. Consider for instance a series of FUR Cavalieri slabs in R^3, and hit this series with a perpendicular FUR series. The result is a three-dimensional grid of FUR bars whose cross section is a square of side length t. In Fig. 1b, the bars would be perpendicular to the plane of the paper, and they would be viewed as quadrats. The particle contents N_i of the i-th slab
Secondly, now that some insight has been gained into the quantitative effect of non-normality on the size and variability of variance component estimates, attention may be turned to developing robust variance component estimators. It would also be helpful to know whether the observed conservativeness of estimates under non-normality is a global effect or simply a result of the small sample design and the particular values of e and b that were considered. It would also be useful to prove some analytical results about the large-sample variance of MLEs and REMLEs in the presence of non-normality.
Abstract: - In this paper, we study the estimation of variance components in the one-way repeated measurements model (one-way RMM), which contains one within-units factor, one random-effects between-units factor and the experimental error term. The estimation is carried out by non-linear maximization, as required for maximum likelihood estimation of the variance components. Our aim is to estimate these components subject to the constraint that the variance components are positive, as assumed in this model, to determine the variance components in the covariance matrix of this model, and then to derive the estimators of these components by the maximum likelihood method.
covering of the resource is done subjectively using common-sense decisions. Among such decisions one can find quite wise ideas for generalizing the theory of sampling design. Although most coverings are done in some systematic way, using the available knowledge of how the actual resource is distributed over the area, cruise leaders will always also want to use observation data from the finished part of the survey to decide on the remaining coverage. When covering a resource subjectively, legs are never selected very close to each other, but when using a sampling design this may happen. Many cruise leaders will consider close legs a waste of money. But when sampling from a finite population, every object must have a positive probability of being selected in the sample. Moreover, to be able to estimate the variance of the estimator used, every pair of objects should have a positive probability of being selected. This is a strict requirement for the Horvitz-Thompson estimator based on unequal probabilities given in Raj (1968). This is why close legs should have a positive probability of being selected when using a sampling design to select parallel lines. There are certain problems with unequal probability designs; see Tillé (1996), but these do not affect the present methods.
However, when the sample has missing observations in the response variable, two main strategies can be followed. The first uses only complete observations, giving a simplified estimator. The second is based on the simple imputation techniques already used by Chu and Cheng [29] or González-Manteiga and Pérez-González [19]. This estimation method, which we will refer to as imputed estimation, consists in using the simplified estimator to estimate the missing observations of the response variable and then applying the complete-data estimator to the completed sample. Recently, Pérez-González et al. [22] studied local polynomial regression with imputation in a context of fixed design with correlated errors and missing data in the response variable.
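The two-step logic of imputed estimation can be sketched in a few lines. The papers cited use local polynomial regression; ordinary least squares stands in here purely for illustration, and missing responses are marked with None. All names are illustrative.

```python
def fit_ols(xs, ys):
    """Simple least-squares line fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def imputed_estimation(xs, ys):
    """Imputed estimation sketch: (1) fit the simplified estimator on
    complete cases only; (2) fill each missing response (ys[i] is None)
    with its prediction; (3) refit on the completed sample."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a0, b0 = fit_ols([x for x, _ in complete], [y for _, y in complete])
    ys_full = [y if y is not None else a0 + b0 * x for x, y in zip(xs, ys)]
    return fit_ols(xs, ys_full)
```

With nonparametric smoothers in place of fit_ols, step (1) is the simplified estimator and step (3) the complete-data estimator described above.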
Horváth and Zakoïan (2011) in the univariate case. We also propose a consistent residual bootstrap procedure for approximating the asymptotic distribution of the VTE. Our second aim is to use the VTE for testing model adequacy. Even if the QML method for the whole set of parameters is not used, the derivatives of the quasi-likelihood with respect to the first components of the parameter (those which are estimated in the first step of the VT method) can be used to derive a test in the spirit of the score test. The VTE can indeed be viewed as a "constrained estimator", the estimate of the theoretical variance being forced to coincide with the empirical variance. An important difference from usual score tests is that our "constraint" is random, as it depends on the observations.
(MCMC) techniques by Escobar & West (1994). Estimation of these models is now standard, with several alternatives available; see Neal (2000) and Kalli et al. (2011). Our proposed method benefits variance estimation in at least three respects. First, common values of intraperiod variance can be pooled into the same group, leading to a more precise estimate; the pooling is done endogenously along with estimation of the other model parameters. Second, the Bayesian nonparametric model delivers exact finite-sample inference regarding ex-post variance or transformations thereof such as the logarithm. As such, uncertainty around the estimate of ex-post volatility is readily available from the predictive density. Unlike the existing asymptotic theory, which may give confidence intervals that contain negative values for variance, density intervals are always on the positive real line and can accommodate asymmetry.
gression modeling. Similar analyses were done by Kalleberg et al. (1996a), and we replicate and extend their analysis. The dependent variable for our analyses, which we label FILMscore, is a mean score for an establishment on various survey questions. High values of FILMscore indicate establishments with highly developed firm internal labor markets: systems of job classification, job ladders, and internal promotion opportunities. Low values of FILMscore indicate that firms tend to hire from outside, and offer few opportunities for advancement. Following previous analyses, our independent variables are factors thought to be associated with more extensive use of FILMs: the number of employees in the establishment (size); the natural log of the number of hierarchical levels in the establishment (lnlev); the number of different departments in the organization (depts); survey-derived measures of decentralization (decent) and formalization (formal); dummy variables for establishments that produce services only (service), and for establishments that produce both products and services (prodserv); a survey item indicating the geographic scope of the establishment's target market (scope); the natural log of the age of the establishment (lnage); a survey-derived scale indicating the extent of problems in attracting and retaining employees (eeprob); a scale indicating the complexity of the establishment's environment (complex); dummy variables for public (public) and non-profit (nonprofit) establishments; scales measuring institutionalization (instn) and pressure from trade unions (union); and a dummy variable for establishments that are members of multi-site organizations (multisite).
By definition, the frequency components of a stationary signal have uncorrelated phases, and hence the replicated signal e z (which has completely random phases) is stationary, even if the observed signal z is not. See also  in relation to this. Our test procedure generates many such replicated signals and compares the power variance with that of the observed signal, to see if there is a significant difference. We detail the exact test procedure in Algorithm 1, where FFT denotes the Fast Fourier Transform and I(·) denotes the indicator function. If the null is rejected then there is sufficient evidence to suggest that the observed signal is nonstationary, otherwise the null should not be rejected using this test procedure. Note that this is a two-sided test procedure—we have tested whether the observed power variance is significantly lower or higher than that found in the distribution of stationary replicates. One- sided tests can be performed by using the values of q(z) or r(z) in Algorithm 1, instead of p(z), to respectively test for significantly high or low power variance.
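The surrogate-based logic of this test can be sketched as follows: randomize the phases of the observed signal's Fourier transform to obtain stationary replicates with the same amplitude spectrum, then compare the observed power variance against the replicate distribution. This is a generic sketch of the approach, not the paper's exact Algorithm 1; the windowed power-variance statistic, window size and replicate count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def phase_randomize(z):
    """Surrogate with the same amplitude spectrum as z but uniformly
    random phases -- stationary by construction."""
    n = len(z)
    mag = np.abs(np.fft.rfft(z))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=mag.shape)
    phases[0] = 0.0          # keep the DC component real
    if n % 2 == 0:
        phases[-1] = 0.0     # Nyquist bin must stay real
    return np.fft.irfft(mag * np.exp(1j * phases), n=n)

def power_variance(z, win=32):
    """Variance of local power over non-overlapping windows --
    a stand-in for the power-variance statistic."""
    k = len(z) // win
    local = [np.mean(z[i * win:(i + 1) * win] ** 2) for i in range(k)]
    return np.var(local)

def stationarity_pvalue(z, n_rep=199):
    """Two-sided Monte Carlo p-value: how extreme is the observed
    power variance relative to stationary phase-randomized replicates?"""
    obs = power_variance(z)
    reps = np.array([power_variance(phase_randomize(z))
                     for _ in range(n_rep)])
    q = (1 + np.sum(reps >= obs)) / (n_rep + 1)  # high-power-variance tail
    r = (1 + np.sum(reps <= obs)) / (n_rep + 1)  # low-power-variance tail
    return min(1.0, 2.0 * min(q, r))
```

As in the text, returning q or r alone gives the corresponding one-sided test, while the minimum of the two (doubled) gives the two-sided p-value.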
We propose to estimate the design variance of absolute changes between two cross-sectional estimators under rotating sampling schemes. We show that the proposed variance estimator is generally positive. We also propose possible extensions for stratified samples with dynamic stratification; that is, when units move between strata and new strata are created at the second wave.
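The basic identity behind any such estimator can be stated in two lines. The sketch below shows only the generic decomposition of the variance of a change, not the paper's proposed estimator; the names are illustrative.

```python
def change_variance(v1, v2, cov12):
    """Design variance of the change D = theta2_hat - theta1_hat:
    Var(D) = Var(theta1_hat) + Var(theta2_hat) - 2 Cov(theta1_hat, theta2_hat).
    Under rotation, the sample overlap between waves typically makes the
    covariance positive, which shrinks Var(D); estimating that covariance
    is the hard part that the paper addresses."""
    return v1 + v2 - 2.0 * cov12
```

A positive estimate is not automatic once the covariance is itself estimated, which is why the positivity result stated above matters.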