All data are presented as mean ± SEM. One-, two- or three-way analyses of variance (ANOVAs) were used for analysis, followed by either Tukey or Fisher as post-hoc test. Repeated measures ANOVA was used to analyze independent groups for which at least one parameter was measured repeatedly (e.g. home cage activity). Three independent groups were analyzed via Kruskal-Wallis ANOVA (KWA) with Mann-Whitney-U (MWU) as post-hoc test, followed by Dunn-Šidák correction for multiple testing. At a significance level of α = 0.05, on average one in every 20 tests will reach statistical significance by chance alone. Thus, one has to correct for multiple testing by adjusting the significance level to counteract this effect. The Dunn-Šidák method uses the following formula to adjust the significance level:

\[ \alpha_{\text{new}} = 1 - (1 - \alpha)^{1/n} \]

where α is the original significance level (0.05) and n is the number of parameters tested.
Thus, to still reach statistical significance when multiple parameters are tested simultaneously, their p-value must be below the new significance level α of:
Number of parameters tested   p-value   Dunn-Šidák significance level
1                             0.05      0.05
2                             0.05      0.025
3                             0.05      0.017
4                             0.05      0.013
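The adjusted levels listed above can be reproduced with a short Python sketch. It assumes the standard Dunn-Šidák form, 1 − (1 − α)^(1/n), with α = 0.05; the function names are illustrative and not taken from the original analysis, which was run in Statistica.

```python
def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Dunn-Šidák adjustment (assumed standard form)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni adjustment, for comparison."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Print the adjusted level for 1 to 4 simultaneously tested parameters.
for n in range(1, 5):
    print(n, round(sidak_alpha(0.05, n), 3))
```

For more than one test, the Dunn-Šidák level is slightly larger than the Bonferroni level, which is why it is the less conservative of the two corrections.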
The Dunn-Šidák method should be preferred over the Bonferroni correction for multiple testing because the latter carries a higher risk of false negatives (Abdi, 2007). The statistical tests used are summarized in the table below.
A trend was accepted at p≤0.1 and significance at p≤0.05. When the Dunn-Šidák correction for multiple testing was used, the new significance level is explicitly mentioned in the text. For the sake of clarity, significance levels mentioned in the text or depicted in graphics are categorized as p≤0.05, p≤0.01 and p≤0.001 and are shown as *, ** or ***, respectively.
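The categorization of p-values can be written as a small lookup, sketched here in Python. The thresholds are those stated above for uncorrected tests; the function name and the labels "trend" and "n.s." are assumptions made for illustration.

```python
def significance_label(p: float) -> str:
    """Map an uncorrected p-value to the symbol used in text and figures."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    if p <= 0.1:
        return "trend"  # label chosen for this sketch
    return "n.s."       # label chosen for this sketch


for p in (0.0005, 0.008, 0.03, 0.07, 0.4):
    print(p, significance_label(p))
```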
EE was performed several times to assess different behavioral aspects and to evaluate reproducibility and reliability of the obtained data. Meta-analysis was performed by using “z-score”, […] “which standardizes observations obtained across experiments and from different cohorts, thereby allowing their compilation and/or comparison. Z-scores are standardized scores, which indicate how many standard deviations (σ, SD) an observation (X) is above or below the mean of a control group (μ)” (Guilloux et al., 2011).
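As a minimal sketch of the definition quoted above, the following Python function expresses each observation X as (X − μ)/σ relative to a control group. The use of the sample standard deviation and the example values are assumptions, not details taken from the original analysis.

```python
from statistics import mean, stdev


def z_scores(observations, control):
    """Standardize observations against the mean and SD of a control group."""
    mu = mean(control)
    sigma = stdev(control)  # sample SD; assumption, the source does not specify
    if sigma == 0:
        raise ValueError("control group has zero variance")
    return [(x - mu) / sigma for x in observations]


# Hypothetical values: control mean is 12 and control SD is 2.
control_group = [10.0, 12.0, 14.0]
print(z_scores([16.0, 12.0, 8.0], control_group))
```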
Statistical test          Post-hoc test     Multiple correction   Type of test
MWU                       x                 Dunn-Šidák            non-parametric
KWA                       MWU               Dunn-Šidák            non-parametric
repeated measures ANOVA   Tukey or Fisher   x                     parametric
2/3-way ANOVA             Tukey or Fisher   x                     parametric
χ²-test                   x                 x                     sample distribution
Z-score                   2/3-way ANOVA     x                     parametric
Mouse behavior is multimodal, changes rapidly between emotional states (Ramos et al., 2008) and can only be fully quantified by utilizing multiple behavioral tests covering a wide range of behaviors on several days (Crawley et al., 1997; Crawley and Paylor, 1997). Z-scores allow the integration of several parameters per test, such as percent time spent in center (TC), distance travelled in the center (DC), latency to enter center (LCe), total entries into the center (EC) and total distance travelled (TD):
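\[
Z_{\text{test}} = \frac{\left(\frac{X-\mu}{\sigma}\right)_{TC} + \left(\frac{X-\mu}{\sigma}\right)_{DC} + \left(\frac{X-\mu}{\sigma}\right)_{LCe} + \left(\frac{X-\mu}{\sigma}\right)_{EC} + \left(\frac{X-\mu}{\sigma}\right)_{TD}}{\text{number of parameters}}
\]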
Moreover, it is possible to generate a final score averaging the observed effect sizes of multiple tests (e.g. OF, EPM, LD, SIH etc.):

\[
Z_{\text{final}} = \frac{Z_{OF} + Z_{EPM} + Z_{LD} + Z_{SIH} + \dots}{\text{number of tests}}
\]
The directionality of z-scores was adjusted so that a decrease reflects an anxiolytic and an increase an anxiogenic effect. Importantly, psychiatric disorders like depression are diagnosed by a set of variable symptoms (4-5 out of 10) over an extended time period, since changes of emotionality can manifest via different aspects over time (Guilloux et al., 2011). Thus, a method like the z-score, which takes advantage of several parameters per test and of multiple tests, reflects the human situation more realistically. Z-scores are more resistant to fluctuating behavior (“behavioral noise”) because they test whether an experimental group deviates from mean behaviors in converging directions across tests and time. Z-scores were calculated for parameters assessing emotionality and locomotor activity, thereby eliminating the latter as a confounding factor (for a review see Guilloux et al., 2011).
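The integration across parameters and tests described above might be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the sign assigned to each parameter (−1 where a larger value is taken to indicate less anxiety, +1 otherwise), the sample standard deviation, the function names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def z_score(value, control_values):
    """Z-score of one observation relative to the control group."""
    sigma = stdev(control_values)  # sample SD; assumption
    if sigma == 0:
        raise ValueError("control group has zero variance")
    return (value - mean(control_values)) / sigma


def integrated_z(animal, control, direction):
    """Average the direction-adjusted z-scores of all parameters of one test."""
    directed = [direction[p] * z_score(animal[p], control[p]) for p in direction]
    return mean(directed)


def final_score(test_scores):
    """Average the integrated z-scores of several tests."""
    return mean(test_scores)


# Hypothetical open field data for two parameters.
control_of = {"TC": [10.0, 12.0, 14.0], "LCe": [20.0, 30.0, 40.0]}
# Assumed directionality: more time in center counts as anxiolytic (sign -1),
# a longer latency to enter the center counts as anxiogenic (sign +1).
direction_of = {"TC": -1, "LCe": +1}
animal_of = {"TC": 16.0, "LCe": 20.0}

of_score = integrated_z(animal_of, control_of, direction_of)
# -0.5 stands in for the integrated score of a second, hypothetical test.
emotionality = final_score([of_score, -0.5])
print(of_score, emotionality)
```

With these signs, a negative score indicates a shift in the anxiolytic direction, in line with the convention stated above.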
We used a 2x3 contingency table to perform a χ² goodness of fit test to evaluate whether the sample distribution of effect sizes was shifted after EE. This test is used when data are present as mutually exclusive categorical variables and compares observed with expected frequencies. Cramér's V is used as a measure of effect size:
Effect size   Cramér's V
small         V<0.1
medium        0.1≤V≤0.5
high          V≥1.0
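For illustration, the χ² statistic and Cramér's V of a contingency table can be computed as in the Python sketch below. It assumes the standard definition V = sqrt(χ² / (N · min(r − 1, c − 1))) and derives expected frequencies from the row and column totals, which is an assumption about how the expected frequencies were obtained. The counts are hypothetical.

```python
from math import sqrt


def chi_square(table):
    """Return (chi-square statistic, total N) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V, assuming the standard definition based on chi-square."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))


# Hypothetical 2x3 tables: rows are groups, columns are effect size categories.
print(cramers_v([[10, 10, 10], [10, 10, 10]]))  # identical distributions
print(cramers_v([[20, 0, 0], [0, 10, 10]]))     # completely separated groups
```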
Murine gene symbols and the respective mRNA are written in italics with the first letter capitalized. Murine peptides and proteins are written in non-italicized capital letters. As human and murine genes, proteins and peptides follow the same script conventions, the associated organism is explicitly mentioned in the respective text. The symbols and gene definitions are based on the information provided by the Mouse Genome Database (MGD, Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME. World Wide Web URL: http://www.informatics.jax.org; September, 2012) and are subject to change.
All data are presented as bar plots, with male HAB mice displayed as red and female HAB mice as pink bars. Male and female NAB and CD1 mice are depicted as dark and light green and grey bars, respectively. Male and female LAB mice are shown in dark and light blue. In addition, bars of EE animals are hatched, a brick pattern is used for non-responders (NRs) and bars of animals tested for transgenerational inheritance (EESE) are dotted. P-values in tables highlighted in red or blue indicate significance or a trend, respectively. All data were analyzed with Statistica 8 (Statsoft, Hamburg, Germany).
Fig. 16: EE exhibits an anxiolytic effect, indicated by an increased distance travelled in the inner zone for both sexes (A), whereas only males entered it more often. N (males) = 21 (SE, EE); N (females) = 21 SE and 20 EE