In view of the consistency and the simulation results, we feel that the conclusion of Rayner and Best (1990, p. …) is supported. An obvious question is what happens for other penalty functions.

The main theme of this paper is to prove that the proposed tests have a simple asymptotic distribution under the null hypothesis (chi-square with one degree of freedom) and good power properties (consistency). As a counterpart it is interesting to check the validity of the proposed construction for finite sample sizes. Therefore, the method has been applied in Kallenberg and Ledwina (1995c) to testing exponentiality and normality. From the extensive simulation study reported in that paper it follows that the data driven smooth tests, in contrast to Kolmogorov–Smirnov and Cramér–von Mises tests, compare well for a wide range of alternatives with other, more specialized tests, such as Gini's test for exponentiality and Shapiro–Wilk's test for normality. Finally, it is worthwhile to emphasize that the solution presented here is based on general likelihood methods and hence can be extended to a wide class of other problems, both univariate and multivariate. On the other hand, the solution is naturally related to sieve methods and can be extended to some other nonparametric problems as well. The paper is organized as follows. In Section 2 the selection rules are formally defined, the assumptions are stated, and the asymptotic null distribution and the behavior of the selection rules under alternatives are discussed. Section 3 presents smooth tests for composite hypotheses. In Section 4 the selection rules and the smooth test statistics are combined to give data driven smooth tests for composite hypotheses. Consistency at essentially any alternative is proved. The Appendix is mainly devoted to the proofs of Theorem 2.1 and Theorem 3.1. This involves new results on exponential families with dimension growing with n and may be of independent interest.

Testing for the martingale difference hypothesis (MDH) of a linear or nonlinear time series is central in many areas such as statistics, economics and finance. In particular, many economic theories in a dynamic context, including the market efficiency hypothesis, rational expectations or optimal asset pricing, lead to such dependence restrictions on the underlying economic variables; see e.g. Cochrane (2001). Moreover, testing for the MDH seems to be the first natural step in modeling the conditional mean of a time series, and it has important consequences for modeling higher order conditional moments. This article proposes data-driven smooth tests for the MDH based on the principal components of certain marked empirical processes. These tests have the following attributes: (i) they are asymptotically distribution-free, with critical values from a χ² distribution; (ii) they are robust to second and higher order conditional moments of unknown form, in particular to conditional heteroscedasticity; (iii) in contrast to omnibus tests, smooth tests possess good local power properties and are optimal in a semiparametric sense to be discussed below; and (iv) they are very simple to compute, without resorting to nonparametric smoothing estimation.
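The paper's construction via principal components of marked empirical processes is not reproduced here, but the flavor of a heteroscedasticity-robust, χ²-calibrated MDH test can be sketched with a simpler relative: a portmanteau statistic in which each squared sample autocovariance is studentized by a variance estimate that remains valid under conditional heteroscedasticity (in the spirit of robust Box–Pierce corrections). The lag count and sample below are illustrative.

```python
import math
import random

def robust_mdh_statistic(y, p=5):
    """Heteroscedasticity-robust portmanteau statistic for the MDH.

    Each squared sample autocovariance is studentized by an estimate of its
    variance that stays valid under conditional heteroscedasticity; under the
    MDH the statistic is asymptotically chi-square with p degrees of freedom.
    """
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    stat = 0.0
    for j in range(1, p + 1):
        gamma_j = sum(d[t] * d[t - j] for t in range(j, n)) / n
        tau_j = sum(d[t] ** 2 * d[t - j] ** 2 for t in range(j, n)) / n
        stat += n * gamma_j ** 2 / tau_j
    return stat

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]   # iid noise: MDH holds
stat = robust_mdh_statistic(y, p=5)
# compare with the 95% quantile of chi-square(5), about 11.07
print(stat)
```

Because the studentization uses fourth-moment terms rather than assuming constant conditional variance, the χ² critical value stays valid under, e.g., ARCH-type innovations.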

to second order as the costs per observation and per stage, c and d, approach zero. Here N and M are the total number of observations and stages, respectively. Whereas the simple hypotheses problem of Chapter 3 naturally reduces to the boundary crossing problem of Chapter 2, unfortunately this composite hypotheses problem is not sufficiently well-approximated by the simple hypotheses problem to clarify considerations of second order optimality until “right before the final stage.” Hence, proving that our test behaves optimally in the time leading up to the final stage requires quite intricate and technical arguments. These arguments make much use of Laplace-type expansions of the stopping risk originated by Schwarz [29] and strengthened by Lorden [23], as well as generalizations of the tools developed in Chapter 2 for proving stage-wise bounds on the random process as it is being sampled by our procedure. A small-sample procedure is also proposed, which performs significantly better than group sequential sampling in a numerical simulation of the problem of testing

In this paper, we describe two new classes of prior densities that more equitably balance the rates of convergence of Bayes factors in favour of true null and true alternative hypotheses. Prior densities from these classes offer a compromise between the use of vague proper priors, which can lead to nearly certain acceptance of the null hypothesis (Jeffreys, 1998; Lindley, 1957), and the use of local alternative priors, which restrict the accumulation of evidence in favour of the null hypothesis. These prior densities rely on a single parameter to determine the scale for deviations between the null and alternative hypotheses. Judicious selection of this parameter can increase the weight of evidence that is collected in favour of both true null and true alternative hypotheses. Our presentation focuses on the case of point null hypotheses (which comprise a vast majority of the null hypotheses that are tested in the scientific literature), although we briefly consider the extension of our methods to composite null and alternative hypotheses in the final discussion.

Null hypothesis significance testing (NHST) is the most commonly used statistical methodology in psychology. The probability of achieving a value as extreme or more extreme than the statistic obtained from the data is evaluated, and if it is low enough then the null hypothesis is rejected. However, because common experimental practice often clashes with the assumptions underlying NHST, these calculated probabilities are often incorrect. Most commonly, experimenters use tests that assume sample sizes are fixed in advance of data collection, but then use the data to determine when to stop; in the limit, experimenters can use data monitoring to guarantee the null hypothesis will be rejected. Bayesian hypothesis testing (BHT) provides a solution to these ills, because the stopping rule used is irrelevant to the calculation of a Bayes factor. In addition, there are strong mathematical guarantees on the frequentist properties of BHT that are comforting for researchers concerned that stopping rules could influence the Bayes factors produced. Here we show that these guaranteed bounds have limited scope and often do not apply in psychological research. Specifically, we quantitatively demonstrate the impact of optional stopping on the resulting Bayes factors in two common situations: 1) when the truth is a combination of the hypotheses, such as in a heterogeneous population, and 2) when a hypothesis is composite, taking multiple parameter values, such as the alternative hypothesis in a t-test. We found that, for these situations, while the Bayesian interpretation remains correct regardless of the stopping rule used, the choice of stopping rule can, in some situations, greatly increase the chance of an experimenter finding evidence in the direction they desire. We suggest ways to control these frequentist implications of stopping rules on BHT.
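The optional-stopping effect described above can be seen in a minimal Monte Carlo sketch. This is not the paper's simulation: it assumes normal data with known unit variance, a point null mu = 0 against a normal N(0, 1) prior under the alternative, and an arbitrary evidence threshold of 3; monitoring the Bayes factor after every observation is compared with evaluating it once at a fixed sample size.

```python
import math
import random

def bf10(ybar, n, sigma=1.0, tau=1.0):
    """Bayes factor for H1: mu ~ N(0, tau^2) vs H0: mu = 0, given the mean
    of n N(mu, sigma^2) observations with sigma known."""
    v0 = sigma ** 2 / n            # variance of ybar under H0
    v1 = v0 + tau ** 2             # marginal variance of ybar under H1
    return math.sqrt(v0 / v1) * math.exp(0.5 * ybar ** 2 * (1 / v0 - 1 / v1))

def run(stop_optional, n_max=200, threshold=3.0):
    """Sample under H0 (mu = 0); with optional stopping, declare evidence
    for H1 as soon as BF10 exceeds the threshold."""
    s = 0.0
    for n in range(1, n_max + 1):
        s += random.gauss(0.0, 1.0)
        if stop_optional and bf10(s / n, n) > threshold:
            return True
    return bf10(s / n_max, n_max) > threshold

random.seed(1)
reps = 2000
fixed = sum(run(False) for _ in range(reps)) / reps
optional = sum(run(True) for _ in range(reps)) / reps
print(fixed, optional)
```

The Bayes factor itself is computed identically in both arms; only the decision to peek differs, and the rate of "evidence for H1" under a true null rises substantially when stopping is optional.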

The key principle in our model is to use the output of latent GPs (sparse GPs) to modulate the parameters of the different likelihoods through link functions. The intractability introduced by the composite likelihoods is overcome by making use of sampling-based variational inference with quadrature. We make use of deep neural networks to parameterise the variational inference, introducing a constraint that balances between locality and dissimilarity preservation in the latent space. We demonstrated the effectiveness of our model on toy datasets and on clinical data of Parkinson's disease patients treated at the HUS Helsinki University Hospital. Our approach identifies sub-groups in the heterogeneous patient data, and we evaluated the differences in characteristics among the identified clusters using standard statistical tests.

The approach so far is along the same lines as applying Neyman's smooth tests in goodness of fit testing problems, cf. e.g. Rayner and Best (1989). Recent research in this area has shown that the smooth tests behave very well, but that the right choice of the number k of components is extremely important. Since the right choice depends on the type of alternative, which is of course unknown, a deterministic good choice of k is only possible if the main interest is in a very particular type of alternative. A solution to this problem is to make a choice depending on the data. In a series of papers (see Bogdan 1995; Bogdan and Ledwina 1996; Inglot et al. 1997, 1998; Inglot and Ledwina 1996; Janic-Wróblewska and Ledwina 1999; Kallenberg and Ledwina 1995a,b, 1997a,b, 1999; Kallenberg et al. 1997; Ledwina 1994) the data driven procedure based on (modifications of) Schwarz's selection rule has been shown to be very successful. The idea is that a higher dimensional and hence more complex model should be penalized. This idea is applied here as well.

Paired-samples t tests comparing the effect of tumor location and grading did not yield statistically significant differences in the MLI. In addition, linear regression analysis did not indicate the presence of a statistically significant correlation between tumor size and the MLI in either group for both tasks. Next, we tested potential differences in the MLI between the GLM and ICA-GLM, separately for patient and control groups. A 2-way repeated-measures ANOVA, with task (WGt, VGt) and method (GLM, ICA-GLM) as factors, was performed in each group. In controls, we did not find any significant effects or interactions, suggesting that the 2 methods did not provide significantly different results in assessing MLI. However, when the same analysis was performed in the patient group, a significant effect of method was found [F(1,41) = 9.59, P < .005], with no interaction with task. These results suggest that in patients with brain tumors, the ICA-GLM approach provided MLIs more lateralized to the dominant hemisphere, regardless of the language task used.

In this paper we have derived asymptotic local power bounds for seasonal unit root tests for both known and unknown deterministic scenarios and for an arbitrary seasonal aspect. Moreover, we have shown that the optimal test of a unit root at a given spectral frequency behaves asymptotically independently of whether unit roots exist at other frequencies or not. The point optimal tests were derived under stringent assumptions (Gaussian innovations, a known error covariance matrix and zero initial conditions). We have demonstrated that these conditions can be relaxed and that modified versions of the tests can achieve the same asymptotic local power functions as the Gaussian point optimal tests. We have also proposed near-efficient regression-based (HEGY) seasonal unit root tests using pseudo-GLS de-trending and shown that these have well-known limiting null distributions and asymptotic local power functions. Our Monte Carlo results suggest that the pseudo-GLS de-trended versions, with data-dependent lag selection, display much improved finite-sample power properties over the original OLS de-trended HEGY tests, yet display very similar size properties against seasonal unit root processes driven by weakly dependent innovations.

Situations with complicated departures from the hypothesis (such as crossing functions) occur in real applications. As an example I consider data from a bone marrow transplant study discussed by Bajorunaite and Klein (2007). The treatment of leukaemia by bone marrow transplantation may fail from one of two causes: recurrence of the disease (relapse), and death in remission (treatment-related death). There are two groups of patients to be compared: 1224 individuals with a human leukocyte antigen (HLA) identical sibling donor, and 383 with an HLA-matched unrelated donor. Summary plots for the two groups are displayed in Figure 2.1. Figure 2.2 compares estimates of cumulative incidence functions for both samples for each type of treatment failure. Numerical results of Section 2.4 show that (some of) the tests sensitive against ordered alternatives do not detect the difference between the relapse cumulative incidence curves, which contradicts the visual impression. The reason is that the relapse cumulative incidence functions for these two groups cross.

Next, Table 2 employs the Flexible Fourier stationarity test for each series based on the estimated frequencies. I follow Burke (1994) and use a 10% significance level. Further, I choose a lag of eight for the truncation lag. Kwiatkowski et al. (1992) choose eight lags for the original NP data. Also, Becker et al. (2006) choose the same number of truncation lags in their analysis of the Purchasing Power Parity hypothesis. They use quarterly data from 1973–2004, which amounts to 120 observations and is approximately the same size as the NP data. Therefore, a choice of eight for the truncation lag seems reasonable. The second column in Table 2 shows the critical values for each frequency at the 10% level. I find that the null of trend stationarity cannot be rejected for real GNP, real per capita GNP, industrial production, real wages, and monetary aggregates. Also, for the unemployment rate, the null of level stationarity cannot be rejected. Furthermore, I accept the alternative of a unit root for nominal GNP, GNP deflator, CPI, wages, velocity, bond yields, and stock prices.
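The Flexible Fourier test augments a KPSS-type regression with trigonometric terms at the estimated frequency; that regression step is omitted here, but the underlying KPSS-type level-stationarity statistic with a Bartlett-kernel long-run variance at the truncation lag of eight discussed above can be sketched as follows (the simulated series is illustrative).

```python
import random

def kpss_level(y, lags=8):
    """KPSS-type level-stationarity statistic: normalized sum of squared
    partial sums of demeaned data, scaled by a Bartlett-kernel long-run
    variance with the given truncation lag."""
    n = len(y)
    mean = sum(y) / n
    e = [v - mean for v in y]
    s, ssum = 0.0, 0.0
    for v in e:                      # partial sums of the residuals
        s += v
        ssum += s * s
    lrv = sum(v * v for v in e) / n  # long-run variance, Bartlett weights
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        lrv += 2.0 * w * sum(e[t] * e[t - j] for t in range(j, n)) / n
    return ssum / (n * n * lrv)

random.seed(2)
y = [random.gauss(0.0, 1.0) for _ in range(120)]   # stationary series
stat = kpss_level(y)
# compare with the KPSS 10% critical value for level stationarity, 0.347
print(stat)
```

Large values reject stationarity: a trending series drives the partial sums, and hence the statistic, upward, while a stationary series keeps it small.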

We report our ongoing work on a new deep architecture working in tandem with a statistical test procedure for jointly training texts and their label descriptions for multi-label and multi-class classification tasks. A statistical hypothesis testing method is used to extract the most informative words for each given class. These words are used as a class description for more label-aware text classification. The intuition is to help the model concentrate on more informative words rather than more frequent ones. The model leverages the use of label descriptions in addition to the input text to enhance text classification performance. Our method is entirely data-driven, has no dependency on sources of information other than the training data, and is adaptable to different classification problems by providing appropriate training data without major hyper-parameter tuning. We trained and tested our system on several publicly available datasets, where we managed to improve the state-of-the-art on one set by a high margin, and to obtain competitive results on all other ones.

Furthermore, to avoid unnecessary technicalities and to focus on the role of the penalty, we investigate in this paper the goodness-of-fit problem of testing uniformity (which is equivalent to the simple null hypothesis case), consider contamination alternatives, and use the data driven test with the simplified version of the selection rule. Moreover, the orthonormal system of the Legendre polynomials is applied, which has turned out to work very well in practice. Generalizations, however, to other testing problems, to other types of alternatives, and to data driven tests with other orthonormal systems or with other versions of the selection rule can be made as well.
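A minimal sketch of this construction, under illustrative choices (maximum dimension 4, simulated data): Neyman's smooth statistic for uniformity on [0, 1] built from the orthonormal Legendre system, with the dimension selected by a Schwarz (BIC-type) rule that penalizes higher-dimensional models by k log n.

```python
import math
import random

# Orthonormal Legendre polynomials on [0, 1]
PHI = [
    lambda x: math.sqrt(3) * (2 * x - 1),
    lambda x: math.sqrt(5) * (6 * x * x - 6 * x + 1),
    lambda x: math.sqrt(7) * (20 * x ** 3 - 30 * x * x + 12 * x - 1),
    lambda x: 3.0 * (70 * x ** 4 - 140 * x ** 3 + 90 * x * x - 20 * x + 1),
]

def data_driven_smooth_test(x, max_dim=4):
    """Neyman smooth statistics T_k for k = 1..max_dim, with the dimension
    selected by a Schwarz-type rule: S = argmax_k (T_k - k log n)."""
    n = len(x)
    t, cum = [], 0.0
    for k in range(max_dim):
        mean_phi = sum(PHI[k](v) for v in x) / n
        cum += n * mean_phi ** 2      # squared standardized k-th component
        t.append(cum)
    s = max(range(1, max_dim + 1), key=lambda k: t[k - 1] - k * math.log(n))
    return s, t[s - 1]                # selected dimension and statistic T_S

random.seed(3)
u = [random.random() for _ in range(500)]   # uniform sample: null holds
s, stat = data_driven_smooth_test(u)
# under the null, T_S is asymptotically chi-square with one df
print(s, stat)
```

Because the penalty grows with k, the rule selects dimension 1 with probability tending to one under the null, which is what yields the simple chi-square-one limit; under an alternative the components absorb the signal and the statistic diverges.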

Simultaneous statistical inference and, in particular, multiple statistical hypothesis testing has become a major branch of mathematical and applied statistics during the past 20 years; cf. [1] for some bibliometric details. This growing interest is not least due to the novel challenges posed by the need to analyze ultra high-dimensional data from genetic applications. Consider, for instance, genome-wide association studies. In these, it is common to evaluate hundreds of thousands of genetic markers simultaneously with respect to their association with a given phenotype. For the theory of multiple tests, one major resulting problem is that many classical multiple test procedures or, equivalently, the corresponding adjustments for multiplicity of the overall significance level lead to extremely small local significance levels if (strong) control of the family-wise error rate (FWER) is targeted. This implies extremely low power for detecting true effects. In [2], it was proposed to relax the type I error criterion, to allow for a few false rejections and to control the expected proportion of false significances. The mathematical formalization of this idea, the false discovery rate (FDR), has proven attractive for practitioners, and the so-called "Benjamini–Hochberg correction" can meanwhile be found in many statistical software packages. However, in cases with strong dependencies among test statistics or p-values, respectively, it has been shown that, even for large systems of hypotheses, the false discovery proportion (FDP) is typically not well concentrated around its expectation, the FDR (see, for example, [11] for the case of positively dependent, exchangeable test statistics). Consequently, FDR control in such a setting does not imply any type I error control guarantee for the actual experiment at hand, although positive dependency in the sense of multivariate total positivity of order 2 (MTP2)
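The Benjamini–Hochberg correction mentioned above is a step-up rule on the sorted p-values; a minimal sketch (the p-values below are made up for illustration):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with the
    k smallest p-values, where k is the largest rank i such that the i-th
    smallest p-value satisfies p_(i) <= i * q / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return {order[i] for i in range(k)}   # indices of rejected hypotheses

pvals = [0.001, 0.012, 0.015, 0.02, 0.3, 0.4, 0.5, 0.7]
print(sorted(benjamini_hochberg(pvals, q=0.05)))
```

Note the step-up character: the largest qualifying rank is found first, and all hypotheses with smaller p-values are rejected along with it, even if some of them individually miss their own thresholds.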

These problems arose due to the fact that although Fama (1965) was primarily interested in the world of actual financial markets, he did not fully diverge from Samuelson's Pythagorean (axiomatic) tradition which, according to Samuelson himself, was not meant to produce a falsifiable theory. It is easy to think of an ideal world without extreme price changes which works under Samuelson's conditions, in which the Grossman–Stiglitz problem can be overcome by assuming that sophisticated traders react instantaneously by investing in information at a slight signal of departure from equilibrium. However, such a world is a mental construct. Despite this, Fama tried to find proofs that the main predictions of Samuelson's model could be found in the actual markets. It was like trying to find proofs for the theory of free fall near the surface of the Earth. One has to ignore additional variables such as air friction and body mass in order to try to make an empirical test. Once these variables are taken into account, the test makes no sense. In general, ignoring characteristics of the experimental environment leads either to wrong refutations or to inconclusive results, which was an important part of the history of EMH statistical "tests".

To verify and validate the proposed methods, a new dataset is constructed which contains manifold images collected from various databases. First, the proposed system segments the input query images into various shapes according to their nature, as depicted in Figure 1. If the segmented query image is in color, then it is modelled in the HSV color space; otherwise, it is treated as a gray-scale image. The intensity values of the gray-scale image are considered for the experiment as they are. The color features are extracted from the H and S components, and the texture features are extracted from the V component. Then the test for equality of variances and the test for equality of mean values are applied to the color and texture features of the query and target images. If the outcome of these two tests is positively significant, then it is concluded that the query and target images are the same or similar. Otherwise, they belong to different groups.
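The excerpt does not name the particular tests beyond "equality of variances" and "equality of means"; a common pairing, sketched below as an assumption, is a variance-ratio (F-type) check together with Welch's t statistic. The fixed variance-ratio cutoff and the large-sample normal approximation to the t p-value are illustrative simplifications, not the paper's procedure.

```python
import math

def mean_var(x):
    n = len(x)
    m = sum(x) / n
    return m, sum((v - m) ** 2 for v in x) / (n - 1)

def features_match(q, t, alpha=0.05, f_crit=2.0):
    """Declare query/target feature vectors 'similar' when neither the
    variance-ratio test nor Welch's t test (normal approximation) rejects.
    f_crit is an illustrative cutoff; exact critical values depend on the
    sample sizes."""
    mq, vq = mean_var(q)
    mt, vt = mean_var(t)
    f = max(vq, vt) / min(vq, vt)              # variance-ratio statistic
    se = math.sqrt(vq / len(q) + vt / len(t))  # Welch standard error
    z = abs(mq - mt) / se
    p_mean = math.erfc(z / math.sqrt(2))       # two-sided normal approx
    return f < f_crit and p_mean > alpha

q = [0.31, 0.29, 0.33, 0.30, 0.28, 0.32, 0.30, 0.31]
print(features_match(q, [v + 0.001 for v in q]))   # near-identical features
print(features_match(q, [v + 0.5 for v in q]))     # clearly shifted mean
```

Requiring both tests to pass mirrors the paper's rule that the query and target images are grouped together only when the outcome of the two tests is jointly non-significant against a difference.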

The results obtained in the previous sections were applied to the diagnosis of coronary stenosis, a disease that consists of the obstruction of the coronary artery. Its diagnosis can be made through a dobutamine echocardiography, a stress echocardiography or a CT scan, and a coronary angiography is used as the gold standard. As the coronary angiography can cause different reactions in individuals (thrombosis, heart attack, infections, etc.), not all of the individuals are verified with the coronary angiography. In Table 5, we show the results obtained when applying the three diagnostic tests and the gold standard (T1: dobutamine echocardiography; T2: stress echocardiography; T3: CT scan) to a sample of 2455 Spanish males over 45, and when applying the coronary angiography (D) only to a subset of these individuals. The data come from a study carried out at the University Hospital in Granada. This study was carried out in two phases: in the first phase, the three diagnostic tests were applied to all of the individuals; in the second phase, the coronary angiography was applied only to a subset of these individuals, depending only on the results of the three diagnostic tests. Therefore, in this example it can be assumed that the missing data mechanism is MAR and the model is ignorable, and so the results of the previous sections can be applied. The estimates of the LRs are LR⁺₁ = 5.31, LR⁺₂ = 3.04, LR⁺₃ = 7.61, LR⁻₁ = 0.13, LR⁻₂ = 0.33 and LR⁻₃ = 0.09. Applying equation (3), it holds that Q₂ = 126.20 (p-value = 0), and therefore we reject the joint equality of the LRs. In order to investigate the causes of the significance, the next step is to solve the marginal hypothesis tests. In Table 6, we show the results obtained for each of the six hypothesis tests that compare the LRs. Then a method of multiple comparisons (Bonferroni, Holm or Hochberg) is applied, and it is found (with all three methods) that the three positive likelihood ratios are different, the biggest being that of the CT scan, followed by that of the dobutamine echocardiography and finally that of the stress echocardiography. Regarding the negative likelihood ratios, no significant differences were found between that of the dobutamine echocardiography and that of the CT scan, whilst the negative likelihood ratio of the stress echocardiography is significantly larger than that of the dobutamine echocardiography and that of the CT scan.
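The two computational ingredients above can be sketched in a few lines: diagnostic likelihood ratios LR⁺ = sensitivity / (1 − specificity) and LR⁻ = (1 − sensitivity) / specificity from verified 2x2 counts, and Holm's step-down adjustment for the six pairwise comparisons. The counts and p-values below are hypothetical, not the Granada study's data.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative diagnostic likelihood ratios from a 2x2 table
    of test result vs. gold-standard disease status."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens / (1 - spec), (1 - sens) / spec

def holm(pvals, alpha=0.05):
    """Holm step-down procedure: compare the i-th smallest p-value with
    alpha / (m - i + 1), stopping at the first failure; returns the set of
    rejected hypothesis indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = set()
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):
            rejected.add(i)
        else:
            break
    return rejected

# hypothetical verified counts for one diagnostic test
lr_pos, lr_neg = likelihood_ratios(tp=90, fp=15, fn=10, tn=85)
print(round(lr_pos, 2), round(lr_neg, 2))
# hypothetical p-values for the six pairwise LR comparisons
print(sorted(holm([0.001, 0.004, 0.03, 0.02, 0.2, 0.6])))
```

Unlike the step-up Hochberg rule, Holm stops at the first p-value that misses its threshold, which is what makes it valid under arbitrary dependence among the six comparisons.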

A viable approach here is to use a criterion function that has been smoothed over. This then satisfies the usual regularity conditions, and the standard distribution theory applies. Horowitz has applied this idea to a number of problems, including standard median estimation [Horowitz (1998a)]; he gives some additional justification for this approach in terms of higher order properties. Is 'smoothing over' always the best estimation strategy? The issue here is analogous to whether one should use the smoothed empirical distribution function instead of the usual unsmoothed empirical distribution function. Although there are some statistical reasons for doing so, most applied economists would be content with using the unsmoothed empirical distribution.
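The contrast in the analogy can be made concrete: the unsmoothed empirical distribution function is a step function, while its kernel-smoothed counterpart replaces each step by a smooth Gaussian CDF. A minimal sketch, with an illustrative bandwidth:

```python
import math

def ecdf(data, x):
    """Unsmoothed empirical distribution function at x (a step function)."""
    return sum(v <= x for v in data) / len(data)

def smoothed_ecdf(data, x, h=0.5):
    """Kernel-smoothed EDF: average of Gaussian CDFs centred at the data
    points, with illustrative bandwidth h."""
    return sum(0.5 * math.erfc((v - x) / (h * math.sqrt(2)))
               for v in data) / len(data)

data = [-1.3, -0.4, 0.1, 0.2, 0.8, 1.5]
print(ecdf(data, 0.0), smoothed_ecdf(data, 0.0))
```

The smoothed version is everywhere differentiable, which is exactly the property that restores the usual regularity conditions for criterion functions built from it; the price is the choice of a bandwidth that the step function does not require.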
