Statistical tests - Lights and shadows : multi wavelength analysis of young stellar objects an

Many statistical tests can be performed on data to analyse their distribution and give an estimate of their value. They are all based on the comparison between a null hypothesis and an alternative hypothesis, against which the former is tested. The probability distributions of the two hypotheses overlap in the so-called critical region. In this region two errors are possible: rejecting the null hypothesis when it is true, or accepting it when it is false. The level of significance indicates the maximum probability of rejecting the null hypothesis when it is true, and a commonly adopted value is 5%. Tables of critical values, which depend on the sample size and on the chosen level of significance, are available in statistics books and provide the threshold used to decide whether to accept the null hypothesis or not.

2.2.1 Pearson correlation coefficients

The Pearson correlation coefficient is a statistic developed by Pearson (1896) to test whether two quantities are statistically independent or not; this method will be applied especially in Chapters 3 and 4. The coefficient is defined as follows:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y}$$


where the numerator represents the covariance between the two samples, $x$ and $y$, and $\sigma_x$, $\sigma_y$ are their respective standard deviations. The normalisation by the sample size is not written explicitly; it must be the same in the covariance and in the standard deviations, so that it cancels in the ratio. With this definition, $r$ falls between $-1$ and $1$. A positive value means that the two quantities tend to increase or decrease together, i.e. there is a direct correlation between them, while a negative value indicates an inverse correlation.
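As an illustration, the definition of $r$ can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in this work: the function name `pearson_r` and the sample values are hypothetical, and the sums of deviations are used directly because the normalisation by the sample size cancels between numerator and denominator.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of (products of) deviations from the means; the common 1/n
    # normalisation cancels in the ratio, so it is left out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sample is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.3f}")  # close to +1: the two quantities increase together
```

Reversing the order of one of the two samples flips the sign of $r$, which corresponds to the inverse case described above.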

The null hypothesis is that the two samples are completely unrelated, which corresponds to $r = 0$. The alternative hypothesis, that there is a correlation between the two quantities, is accepted when the absolute value of $r$ is greater than the critical value $r_c$. The closer the absolute value of $r$ is to one, the stronger the correlation.

2.2.2 χ² test

The χ² test is used to quantify how much the data scatter around the mean value, and whether this scatter is smaller than, comparable to or greater than the errors. It is defined as the sum of the squared residuals from the mean value, normalised by their squared errors, according to the following equation:

$$\chi^2 = \sum_{i=1}^{N} \frac{(x_i - \bar{x})^2}{\sigma_i^2} \tag{2.1}$$

where $x_i$ is the i-th value in a sample, $\sigma_i$ is its error, and $\bar{x}$ is the mean over the entire sample.

The reduced $\chi^2_{\mathrm{red}}$ is computed from Eq. 2.1 by dividing by the degrees of freedom $N-1$, where $N$ is the total number of data points:

$$\chi^2_{\mathrm{red}} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{(x_i - \bar{x})^2}{\sigma_i^2} \tag{2.2}$$

$\chi^2_{\mathrm{red}}$ is usually preferable, because it normalises the result to the number of degrees of freedom. The null hypothesis states that the square of the deviation from the mean is equal to the square of the errors, $\sigma^2$. The $\chi^2_{\mathrm{red}}$ is never negative, and values much higher than one indicate that there is a significant difference between the measured data and the mean. In order to quantify this deviation, $\chi^2_{\mathrm{red}}$ needs to be compared to tables of critical values. If the $\chi^2_{\mathrm{red}}$ value is greater than the critical value, the null hypothesis is rejected: the deviation from the mean is significantly larger than the typical errors. This method will be used in Chapter 4 to quantify how variable stars are in the infrared.
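The computation in Eqs. 2.1 and 2.2 can be sketched as follows. This is only an illustrative sketch: the function name `chi2_statistics`, the magnitudes and their errors are invented, and the threshold is the standard tabulated χ² critical value for 4 degrees of freedom at the 5% level (9.488), divided by the degrees of freedom.

```python
def chi2_statistics(values, errors):
    """Return (chi2, chi2_red) of a sample about its mean (Eqs. 2.1 and 2.2)."""
    n = len(values)
    if n < 2 or len(errors) != n:
        raise ValueError("need at least 2 values and one error per value")
    mean = sum(values) / n
    # Squared residuals from the mean, each normalised by its squared error
    chi2 = sum(((v - mean) / e) ** 2 for v, e in zip(values, errors))
    return chi2, chi2 / (n - 1)

# Hypothetical light curve: five magnitudes with equal errors
mags = [12.10, 12.35, 11.90, 12.50, 12.15]
errs = [0.05] * len(mags)
chi2, chi2_red = chi2_statistics(mags, errs)

# Tabulated chi2 critical value for 4 degrees of freedom and a 5% level
# of significance, divided by the degrees of freedom
chi2_red_crit = 9.488 / (len(mags) - 1)
is_variable = chi2_red > chi2_red_crit
print(f"chi2_red = {chi2_red:.2f}, critical = {chi2_red_crit:.2f}, variable: {is_variable}")
```

In this invented example the scatter is several times larger than the quoted errors, so the reduced value is far above the threshold and the null hypothesis would be rejected.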

2.2.3 Kolmogorov-Smirnov test

The Kolmogorov-Smirnov (KS) test is used to compare two empirical samples, or one empirical sample and one theoretical distribution. The former case is called the two-sample KS test, the latter the one-sample KS test. In either case, the null hypothesis states that the two samples have identical distributions. The test consists in measuring the maximum distance between the two cumulative distributions:

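$$D_n = \max \left| P(x) - S_n(x) \right| \tag{2.3}$$

where $P(x)$ is the cumulative distribution of either the theoretical or the other empirical sample, while $S_n(x)$ is the cumulative distribution of the empirical sample having $n$ data points. With the data points $x_i$ sorted in increasing order, $S_n(x)$ is defined as follows:

$$S_n(x) = \frac{i}{n}, \qquad x_i \le x < x_{i+1}, \quad 1 \le i \le n-1$$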

If the distance $D_n$ is below the critical value $D_{n,c}$, the null hypothesis is accepted and the two samples are considered to have the same distribution. This method can also be used when the samples have different sizes, $m$ and $n$. In that case the critical value is:

$$D_{m,n,c} = c(\alpha) \sqrt{\frac{m+n}{m \cdot n}} \tag{2.4}$$

where $c(\alpha) = 1.36$ for a level of significance $\alpha = 0.05$.

The KS method will be applied in Chapters 3 and 4.
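A two-sample version of Eqs. 2.3 and 2.4 can be sketched as below. The function names and the two samples are hypothetical, chosen only for illustration. The sketch relies on the fact that both cumulative distributions are step functions that change only at the data points, so the maximum distance is reached at one of them. Note also that Eq. 2.4 is an approximation intended for reasonably large samples, so with samples as small as these the threshold is only indicative.

```python
import math

def ecdf(sample, x):
    """Empirical cumulative distribution: fraction of data points <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_two_sample(a, b):
    """Maximum distance between the cumulative distributions of a and b."""
    # Both step functions only change at data points, so it is enough
    # to evaluate the distance there.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))

def ks_critical(m, n, c_alpha=1.36):
    """Critical distance of Eq. 2.4; c_alpha = 1.36 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((m + n) / (m * n))

# Hypothetical samples of different sizes (m = 5, n = 6)
a = [0.1, 0.4, 0.7, 1.0, 1.3]
b = [0.9, 1.2, 1.5, 1.8, 2.1, 2.4]
d = ks_two_sample(a, b)
d_crit = ks_critical(len(a), len(b))
print(f"D = {d:.3f}, critical = {d_crit:.3f}, null rejected: {d > d_crit}")
```

Here the measured distance stays below the critical value, so the null hypothesis of a common distribution would not be rejected at the 5% level.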

2.2.4 F test for equality of two variances

The F test is used to compare the variances of two populations, where the null hypothesis is that the two variances are equal. The F test is defined as follows:

$$F = \frac{\sigma_1^2}{\sigma_2^2} \tag{2.5}$$

where $\sigma_1^2$ is the variance of the sample having the higher degrees of freedom and $\sigma_2^2$ is the

In document Lights and shadows : multi wavelength analysis of young stellar objects and their protoplanetary discs (Page 51-54)