This paper is organised as follows. In Section 2 we develop asymptotic theory for the kernel density estimator of Jones (1991) for length-biased data, and we also define two different consistent bootstrap procedures. In Section 3 we propose new data-driven bandwidth selection methods: a rule of thumb based on the Normal distribution and two bootstrap bandwidth selectors based on the procedures presented in the previous section. These proposals are competitors of a cross-validation method which, to the best of our knowledge, is the only existing data-driven bandwidth selector in this context. In Section 4 we carry out an extensive simulation study to evaluate the finite-sample performance of the presented bandwidth selectors. We draw some conclusions in Section 5. Final remarks are given in Section 6, together with a discussion of how the methodology developed in this paper can be generalised to a general weight function. Finally, the proofs of the theoretical results are collected in the Appendix.
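To fix ideas, the Jones (1991) estimator weights each observation by its reciprocal and renormalises by a harmonic-mean estimate of the population mean. The following is a minimal sketch, not the paper's implementation: it assumes a Gaussian kernel, and the function name `jones_kde` is our own.

```python
import numpy as np

def jones_kde(x, sample, h):
    """Jones (1991) kernel density estimator for length-biased data.

    Under length bias the observed density is g(y) = y f(y) / mu, so each
    observation is weighted by 1/Y_i, and a harmonic-mean estimate of mu
    renormalises the estimate.  Gaussian kernel assumed for illustration.
    """
    sample = np.asarray(sample, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    inv = 1.0 / sample
    mu_hat = sample.size / inv.sum()              # harmonic-mean estimate of mu
    u = (x[:, None] - sample[None, :]) / h        # scaled distances to each datum
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel values
    return mu_hat / (sample.size * h) * (k * inv[None, :]).sum(axis=1)
```

By construction the estimate integrates to one over the real line, since the reciprocal weights cancel against the harmonic-mean normalisation; any plug-in bandwidth `h` can be supplied, which is where the selectors of Section 3 enter.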
Several examples of such biased data can be found in the literature. For instance, in , it is shown that the distribution of the concentration of alcohol in the blood of intoxicated drivers is of interest; since a drunken driver has a larger chance of being arrested, the collected data are size-biased.
tained only by a weighted sample set is a special case considered by him. Vardi (1985) generalized his model to a selection bias model. Wu (1996) proposed a nonparametric maximum likelihood smooth estimator for biased data using the kernel method. Jones and Karunamuni (1997) used a Fourier series method to estimate the unweighted density and found that their estimator performs better than the estimators of Bhattacharyya et al. (1988) and Jones (1991). Lloyd and Jones (2000) proposed a nonparametric density estimator for biased data with an unknown weight function; in their studies, the weight function is treated as a selection probability. A cross-validation method for selecting the smoothing parameter in kernel density estimation with selection-biased data was proposed by Wu (1997). Winter and Földes (1988) derived a Kaplan–Meier-type estimator for censored biased data, and Uña-Álvarez (2002) studied its asymptotic properties.
derivatives of the density function of a random sample. Several works exist for the random right-censorship model. Liuguan and Lixing  investigated a Berry–Esseen-type bound for the kernel density estimator of the population density of such data. Liang and Uña-Álvarez  proposed a Berry–Esseen-type theorem for the kernel density estimator when the data are strongly mixing and also subject to random censorship. Asghari et al.  established a Berry–Esseen-type theorem for the kernel density estimator of left-truncated data. Investigating a Berry–Esseen-type bound for the kernel density estimator of length-biased data is the main purpose of this paper.
This paper considers wavelet estimation of a multivariate density function based on mixing and size-biased data. We provide upper bounds for the mean integrated squared error (MISE) of the wavelet estimators. It turns out that our results reduce to the corresponding theorem of Shirazi and Doosti (Stat. Methodol. 27:12–19, 2015) when the random sample is independent.
learning approaches with good generalization performance, as it contributes to the design of applications that are more robust to unseen or underrepresented imaging conditions. This paper focuses on the latter topic and presents a comparison between Convolutional Neural Networks (CNNs) and Capsule Networks (CapsNets) [22,7]. The neurons in a CapsNet are organized in groups denoted as capsules. In contrast to a single neuron, a capsule can learn a specific image entity over a range of viewing conditions such as viewpoint and rotation. Thanks to a routing algorithm that interconnects the capsules, a CapsNet model is intended to be affine invariant and spatially aware. While the behaviour of CNNs with biased data has been extensively investigated [11,14,15], how bias influences CapsNets' performance has received little attention so far.
Historically, biased data sets have been a long-standing issue in statistics. The following example describes the failed prediction of the outcome of the 1936 US presidential election; it is often cited in the statistics literature to illustrate the impact of biases in data, and is discussed in detail in . Example 2. The Democratic candidate Franklin D. Roosevelt was elected President in 1932 and ran for a second term in 1936. Roosevelt's Republican opponent was Kansas Governor Alfred Landon. The Literary Digest, a general-interest weekly magazine, had correctly predicted the outcomes of the elections of 1916, 1920, 1924, 1928 and 1932 based on straw polls. In 1936, The Literary Digest sent out 10M questionnaires in order to predict the outcome of the presidential election. It received 2.3M returns and predicted Landon to win by a landslide. However, the predicted result proved to be wrong, as quite the opposite happened: Roosevelt won by a landslide. This leads to the following questions:
We can also draw lessons from our findings for economists and policy-makers. We would warn policy-makers and survey designers that the assumption that beliefs are on average correct, even concerning such seemingly straightforward characteristics as height or weight, seems woefully inadequate in any context where a self-centered bias may emerge; our findings indicate that such contexts may be far more wide-ranging than has hitherto been considered. For example, bias in the perception of the happiness of others might distort individuals' attitudes towards altruism and redistribution, and in turn bias responses in surveys, which should lead us to be more careful in the use of such surveys to guide policy. To restate: for important policy decisions, or even in the development of new economic theory, it makes sense to ask whether biased beliefs will render a model inaccurate or a policy counterproductive, and if so, to think about how to measure the beliefs of the target population. The solution suggested by Manski (2004) is to make greater use of subjective probabilities in survey-based work, and our findings lend empirical support to that recommendation.
In Germany, after the attacks on New York in 2001, a so-called 'Rasterfahndung' (dragnet investigation) was instituted. This method involved screening personal data against a set of characteristics, such as being male, having an academic background, being Muslim, and being a national of or native to a listed country while nevertheless being a lawful resident of Germany. This search for internal 'sleepers' turned out to be unconstitutional: under Article 1(I) and Article 2(I) of the Grundgesetz, the 'Rasterfahndung' conflicts with the fundamental right to informational self-determination, and in April 2006 the Bundesverfassungsgericht banned such procedures in the absence of a concrete threat to highly ranked legally protected interests, thereby trying to keep police action proportionate and free of racial connotations (Schmitz, 2006; Volkmann, 2006). Repeated mistakes by the police when stopping innocent people on the basis of racial bias can undermine citizens' trust in the local police and, in the long run, increase crime, as the actual offender is not convicted (Taslitz, 2010). Police legitimacy and police effectiveness are believed to rely mainly on citizens' beliefs and support (Tyler, 2004). Furthermore, citizens confronted with authority will feel responsible for obeying its commands only if they regard that authority as legitimate (French Jr & Raven, 1959). In 1999, Sir William Macpherson issued a report in the UK on the handling of the murder of a black pupil, Stephen Lawrence, known as the 'Stephen Lawrence Inquiry'. This report recommended recruiting more black and Asian officers and improving racism-awareness training in order to prepare police officers for a multi-racial society (Macpherson of Cluny, 1999). A valid study of the hoped-for positive effects of such recruitment was never successfully completed
The purpose of this paper is to develop a new pre-processing discrimination prevention methodology to prevent direct discrimination. First, discrimination is measured and the categories that have been directly discriminated against are identified; second, the data are transformed in a proper way to remove all those discriminatory biases; finally, discrimination-free data models can be produced from the transformed data set without damaging data quality. The perception of discrimination, just like the perception of privacy, strongly depends on the legal and cultural conventions of a society.
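As an illustration only, and not the authors' algorithm, the measure-then-transform idea can be sketched with a crude rate-difference measure and label "massaging" in the spirit of Kamiran and Calders; the record layout and the names `group` and `decision` are hypothetical.

```python
def positive_rate(records, group):
    """Share of positive decisions within one group."""
    members = [r for r in records if r["group"] == group]
    return sum(r["decision"] for r in members) / len(members)

def discrimination(records, protected, favored):
    """Direct-discrimination measure: gap in positive-decision rates."""
    return positive_rate(records, favored) - positive_rate(records, protected)

def remove_direct_discrimination(records, protected, favored):
    """Flip just enough protected-group labels to close the rate gap."""
    out = [dict(r) for r in records]          # work on a copy
    gap = discrimination(out, protected, favored)
    n_prot = sum(r["group"] == protected for r in out)
    n_flip = round(gap * n_prot)              # promotions needed to equalise rates
    for r in out:
        if n_flip <= 0:
            break
        if r["group"] == protected and r["decision"] == 0:
            r["decision"] = 1                 # promote a protected-group record
            n_flip -= 1
    return out
```

A real system would rank candidate records by a classifier's score before flipping, so that data quality is degraded as little as possible; the sketch only shows the measure/transform division of labour.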
Some varieties of polar interrogatives (polar questions) convey an epistemic bias toward a positive or negative answer. While previous research has revealed much about how different varieties of biased interrogatives contrast with each other in their syntactic and semantic properties, there is a great deal of complexity and subtlety concerning the usage of each type that calls for further investigation.
As signalled by Vahid (1999), the classical homogeneity assumption in panel data analysis is that the slopes of the N cross-sectional units are equal, after one allows for cross-section-specific fixed or random effects. This assumption is absolutely necessary, and non-testable, for the analysis of panel data with very small T. Even acknowledging the low power of equality-of-slopes tests across cross-sectional units in our sample (as is usually the case in most empirical analyses of this kind), we implemented pairwise tests that showed that the assumption is acceptable. Beyond statistical tests, the hypothesis seems reasonable on conceptual grounds. The sample period was deliberately chosen to start in 1999, given that all the countries considered have since been subject to the same accounting rules, whose actual implementation is closely monitored by a common agency (Eurostat) that guarantees homogeneity and consistency in the application of the rules. This should be true, on average, once country fixed effects are controlled for.
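Pairwise tests of this kind can take the form of a Chow-type F test: fit a regression for a pair of countries with and without unit-specific slopes, allowing unit-specific intercepts, and compare residual sums of squares. A sketch under these assumptions, with a single regressor and function names of our choosing:

```python
import numpy as np

def slope_equality_F(x1, y1, x2, y2):
    """Chow-type F statistic for equal slopes across two units,
    allowing unit-specific intercepts (fixed effects)."""
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    d1 = np.concatenate([np.ones_like(x1), np.zeros_like(x2)])  # unit-1 dummy
    d2 = 1.0 - d1                                               # unit-2 dummy
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r)
    rss_r = rss(np.column_stack([d1, d2, x]))               # restricted: common slope
    rss_u = rss(np.column_stack([d1, d2, d1 * x, d2 * x]))  # unrestricted: unit slopes
    q, k_u = 1, 4   # one restriction; four parameters in the unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (len(y) - k_u))
```

The statistic is compared against an F(1, n - 4) critical value; rejecting for one or more country pairs would cast doubt on the pooled-slope specification.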
demonstrated that efficient nutrient reacquisition by the producer can render nutrient excretion levels insufficient for mutualistic growth, starving the recipient and leading to tragedy of the commons (Fig. 6) (39). Conversely, recipient-biased competition for a cross-fed nutrient promotes mutualism stability. As noted above, the importance of this recipient-biased competitive advantage likely depends on whether the communally valuable resource is generated intracellularly or extracellularly (compare Fig. 2A and C). Intracellular synthesis ensures that a portion of the nutrient pool can be assimilated by the producing partner regardless of the differential affinity between the partners for that nutrient after excretion (Fig. 2A). Intracellular generation therefore helps stabilize a mutualism against an otherwise-competitive recipient by enforcing partial privatization. The competitive advantage of the recipient is in turn necessary to limit reacquisition of the excreted nutrient by the producer and thereby to drive directionality in nutrient exchange. Although partial privatization has primarily been thought to depend on mechanisms used by the producer to retain a portion of a communally valuable resource (16), our results indicate that the degree of privatization can be influenced by the partner as well; competition for the excreted nutrient pool impacts how much of a cross-fed resource will be shared versus reacquired. In effect, recipient-biased competition for an excreted communally valuable nutrient avoids tragedy of the commons by enforcing partial privatization over complete privatization.
A model is presented in which alleles at a number of loci combine to influence the value of a quantitative trait that is subject to stabilizing selection. Mutations can occur to alleles at the loci under consideration. Some of these mutations will tend to increase the value of the trait, while others will tend to decrease it. In contrast to most previous models, we allow the mean effect of mutations to be nonzero. This means that, on average, mutations can have a bias, such that they tend to either increase or decrease the value of the trait. We find, unsurprisingly, that biased mutation moves the equilibrium mean value of the quantitative trait in the direction of the bias. What is more surprising is the behavior of the deviation of the equilibrium mean value of the trait from its optimal value. This has a nonmonotonic dependence on the degree of bias, so that increasing the degree of bias can actually bring the mean phenotype closer to the optimal phenotype. Furthermore, there is a definite maximum to the extent to which biased mutation can cause a difference between the mean phenotype and the optimum. For plausible parameter values, this maximum-possible difference is small. Typically, quantitative-genetics models assume an unconstrained model of mutation, where the expected difference in effect between a parental allele and a mutant allele is independent of the current state of the parental allele. Our results show that models of this sort can easily lead to biologically implausible consequences when mutations are biased. In particular, unconstrained mutation typically leads to a continual increase or decrease in the mean allelic effects at all trait-controlling loci. Thus at each of these loci, the mean allelic effect eventually becomes extreme. This suggests that some of the models of mutation most commonly used in quantitative genetics should be modified so as to introduce genetic constraints.
The results of our paper are relevant in situations when one would like, or is institutionally obligated, to use biased contests but is concerned about their costs. Suppose there is positive discrimination and hence, the contest designer has to favor some participants over others. Our results can help the designer to turn this obligation to his or her advantage and reach a better outcome in terms of essentially any possible objective. Another application, as discussed below in more detail, is that of dynamic contests in which it may seem fair, or is indeed customary, to favor those who had early success at later stages. Our results can guide the contest designer to create a contest in which there would be no trade-off between rewarding early success and generating subsequent performance. In both cases, the contest designer effectively uses the institutional constraints for introducing a bias that is hard to justify otherwise. Finally, our paper is important from a methodological perspective in showing the limits of the "leveling the playing field" and "competitive balance" ideas in the design of contests with asymmetric players.
One should note carefully that in all the above cases the methodological differences skew the data in favor of Israel, not the other way around. Naturally, national authorities find it more convenient to accept methodological imprecision in publications when it flatters the nation's overall situation, even if this is not done on purpose. One may assume that government agencies and ministries are happy to present the public and decision makers with data that shed positive light on the areas under their control. However, they must consider the implications that may arise. First of all, such assessments may, as mentioned above, distort government policy, mainly with regard to the allocation of resources. Another risk is that insufficient effort will be put into addressing important (but little-known) problems. In the long run, and no less seriously, the systematic imprecision may detract from the reliability of official reports in the eyes of the public, thus eroding its faith in government institutions. This point is particularly important in the case at hand in view of the fact that the social survey was conducted by the CBS, commissioned by none other than the MoF budget
Figure 2 plots the median bias in t̂_r as a function of the true value for both single- and multilocus data sets. The median bias is plotted because the distribution of the estimator tended to have a long tail to the right. Note that the bias is relative to the true value, such that a bias of 1 corresponds to a twofold overestimate, etc. There are two points to make regarding Figure 2. The first is the bias that results from different estimates of P(D|Q). In Figure 2, a–f, the solid line corresponds to the bias when estimating P(D|Q) using Tajima's D as the sole summary of the data (Equation 2). The dashed line corresponds to the bias when estimating P(D|Q) using Equation 3. For a single locus (Figure 2a), estimating P(D|Q) using Equation 2 results in a larger median upward bias than using Equation 3. For data sets consisting of 10 or 20 independent loci (Figure 2, b and c, respectively), the median bias is quite low.
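The median relative bias used in Figure 2 is straightforward to compute from replicate estimates; a small helper (the function name is ours) makes the convention that a bias of 1 means a twofold overestimate explicit.

```python
import numpy as np

def median_relative_bias(estimates, true_value):
    """Median of (estimate - true) / true across replicate estimates.

    Robust to the long right tail of the estimator's distribution;
    a value of 1 corresponds to a twofold overestimate at the median.
    """
    e = np.asarray(estimates, dtype=float)
    return float(np.median((e - true_value) / true_value))
```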