Chapter 2 Wh-question processing in bilingual children
2.2. Method
2.2.1 Participants
A total of 68 children from Grades 3-6 participated in this study. The sample included 37 monolinguals aged 7;10-11;6 (M=9;7, SD=1;1, 16 girls and 21 boys) and 31 bilingual children 7;4-11;5 (M= 9;6, SD=1;2, 17 girls and 14 boys). There was no difference in age between the two groups F (1, 65) = .265, p=.608, ηp=.004). None of the children had a history
of language impairment or learning difficulty based on a parental questionnaire and teachers’ reports12.
The bilingual children came from a variety of linguistic backgrounds; 6 children had French as their L1, 3 had Greek, 2 had Chinese, Serbian and Spanish and two were bilingual Polish- Serbian. One child spoke Arabic, Flemish, Italian, Latvian, Lithuanian, Macedonian, Slovak, Surashtra, Tibetan and Urdu as a first language and two children were bilingual from home (Russian & Georgian and Turkish & Urdu). All monolingual participants were born in the UK except two who were born in Australia. About half the bilingual children were born in
59
English majority countries (all in the UK except one who was born in Canada). Four children were born in France and one was born in China (AR Tibet), Georgia, Italy, Libya, Lithuania, Russia and Spain13. The bilingual children born in the UK or Canada were reportedly exposed to English from birth mainly or early on through day care. For children born in non- English speaking countries, the picture varied in terms of age of onset and length of exposure (see Table 2 for details); one child was born in France and moved to the UK aged 9 but was exposed to English from birth, as s/he was growing up in a one-parent-one-language setting. Four children were exposed to English after their L1 as early sequential bilinguals, between the age of 2 and 3 years, while children were exposed to English as late sequential bilinguals between the ages of 5 and 8 years. These children were born in non-English speaking countries and both parents spoke a language other than English at home. All children in this study had a minimum exposure to English of 2 years (for details, see Table 2); for the children born in the UK and Canada, this was equated to their chronological age (average 9; 3, SD= 13.85 months, range 6;2-10;3); for the children who were reportedly exposed to English after birth, average length of exposure was under 5 years (M=4;10, SD= 25 months) and ranged between 2;5 and 7;4. The children who were first exposed to English before 4 years of age had a mean exposure nearly 7 years (M=6;8, SD= 5.74 months, range 6;2-7;4). The children exposed to English at a later age had a substantially shorter length of exposure (M=3;4, SD= 17.42 months, range 2;2-5;6).
It should be noted that recent research has tended to examine bilingualism as a spectrum of linguistic abilities and cognitive states (Serratrice, 2018; Luk & Bialystok, 2013) depending on either proficiency in one or both languages or circumstances under which the additional language was acquired. For the purposes of this thesis, the bilingual children were included in a single group and compared with the monolingual children. This is acknowledged as a
60
limitation in Chapter 6 of this thesis. Tables 1 and 2 summarise key demographic data about the two groups of children.
Ethical approval for this study was granted from the School of Psychology and Clinical Language Sciences Research Ethics Committee. Children were recruited either through mainstream schools in the area of Reading and Southampton (UK) or privately through a University of Reading webmail for employees or word of mouth. Separate information sheets and consent forms were completed by children and parents.
Table 1 Age of participants
Monolinguals Bilinguals
N 37 31
Range (in years; months) 7;10-11;6 7;4-11;5
Mean (in years; months) 9;7 9;6
SD (in months) 12.50 13.63
Table 2 Age of Onset (AoO) and Length of Exposure (LoE) to English for bilingual children Exposure to
English
N Range (in years;
months)
Mean (in years; months)
Around birth 20 AoO 014 01
LoE 6;2-10;3 9;3
Early (<4 years) 4 AoO 2;0-3;0 2;6
LoE 6:2-7;4 6:8
Later (>5 years) 5 AoO 5;6-8;0 6;8
LoE 2;4-5;6 3;4
61
2.2.2 Design
The study is a visual world eye-tracking task. In this task, participants, heard a wh-question and looked at two pictures. Both pictures contained two animate entities (animals) with one doing something to the other (e.g. carrying). The two pictures differed in that the thematic structure had been reversed so the agent in the one picture was the theme in the other and vice versa. In each trial, one picture had depicted the event with the argument structure corresponding to the one in the question the participants heard (henceforth “target”); the other depicted the reverse argument structure (“competitor”). After hearing the verbal stimulus and looking at the pictures, participants clicked on the picture that answered the question. Between-subjects variable was language group, i.e. monolinguals vs. bilinguals. This task examined subject and object wh-questions with the question word “which” as wh-questions headed by “which” have been found to be comparatively more difficult relative to questions headed by “who” or “what” (e.g. Avrutin, 2000). Moreover, the number of the two noun phrases was manipulated so that the two could be either the same or different (i.e. match vs. mismatch) following Contemori et al. (2018). The third within-subjects variable was the number of the first noun phrase (singular first vs plural first) which was not included in Contemori et al. This gave rise to 8 (2x2x2) conditions. For each condition, there were ten trials with the same words across conditions as can be seen in Table 3.
Table 3 Sample of experimental items per condition Subject Question Singular-Singular Which bear is chasing the camel?
Object Question Singular-Singular Which bear is the camel chasing?
62
Subject Question Singular-Plural Which bear is chasing the camels?
Object Question Singular-Plural Which bear are the camels chasing? Subject Question Plural-Singular
Which bears are chasing the camel?
Object Question Plural-Singular Which bears is the camel chasing?
Subject Question Plural-Plural Which bears are chasing the camels?
Object Question Plural-Plural Which bears are the camels chasing?
This study did not use fillers. There are two reasons for this. First, due to the complexity of the research design, a relatively large number of trials was needed. To include the full experimental design in addition to filler trials would have made the task excessively long, tiresome for the children participants and would have impacted the size of the battery of background assessments of other experimental tasks administered. Reducing the number of conditions would not have been conducive to addressing open questions from the Contemori et al. (2018) study – in particular, the effect of the number of the first noun phrase. To reduce the number of trials in order to allow for fillers would have reduced statistical power. The latter is particularly problematic for studies using physiological measures such as gaze data where participants may vary significantly by trials (e.g. in this case, lack of concentration, particular interest in one specific stimulus etc). For the same reason multiple variants of the task in the form of a Latin square were not created. The second reason is that fillers are perhaps redundant in this task. Fillers are habitually used in order to prevent practise effects or effects of the participant building expectations about upcoming trials. This is unlikely to be the case in this study as knowledge about previous trials is not informative about upcoming ones for two reasons. First, all conditions cannot be distinguished without the participant hearing and processing the entire sentence. Second, the participants hear all eight conditions interchangeably, therefore any biases in expectation are cancelled out. To ensure that
63
individual trials did not impact the results, analyses included random intercepts for trial. These were usually among the first to be removed when the model failed to converge suggesting that the participants tended to behave similarly across trials.
2.2.3 Materials
For each trial, one which-question and two pictures were used. 80 which-questions were created by forming which-questions with all ten lexical sets across all conditions. Each set of lexical items involved a transitive verb and two animate entities, more specifically animals. This way all sentences in the experimental trials were semantically reversible. The verbs used were the same as in Contemori et al. The verbal stimuli were digitally recorded by a male L1 speaker of British English in a sound booth. There were no fillers in this task due to time constraints.
The visual stimuli have been derived from Contemori et al. (2018) but additional pictures for the novel conditions (PL-SG, PL-PL for both subject and object questions) in this study were created by copying to ensure maximal visual similarity. The visual stimuli were two pictures both depicting the action in the question and involving the same animate entities. One picture corresponded to the answer of the question heard (target) whereas the other one did not (competitor). The competitor showed the same action and entities involved as the target but with the thematic roles reversed. The size and visual features of target and competitor were similar to the greatest degree possible.
An example of the visual stimuli where both NPs match in number (Singular) is shown in Figure 1. The picture on the right is the target for Subject SG-SG and competitor for Object SG-SG; the reverse is true for the picture on the right. For the trials with the mismatch condition, the singular and plural entity was the same in both pictures. Figure 2 shows the visual stimuli for the same set of item in the conditions where NPs differed in number;
64
condition Subject SG-PL and Object SG-PL. Again the target for the subject condition is the competitor for the object condition and vice versa. The position of target and competitor was counterbalanced across conditions. As a result, except for the structure of the sentence heard, there were no cues to adjudicate between target and competitor.
Figure 1 Sample visual stimuli for the example listed in the materials for conditions where both NPs are singular. The same pictures were used for both subject (“Which bear is chasing the camel?”, target on the left side; competitor on the right) and object ques
65
Figure 2 Sample visual stimuli for the example listed in the materials for conditions where the NPs differ in number. The same pictures were used for both subject (“Which bear is chasing the camels?”, target on the left side; competitor on the right) and object questions
2.2.4 Procedure
The study took part in a quiet room with the participants wearing headphones. A Tobii X120 (Tobii Technology AB, Sweden) eye-tracker was used to measure participants’ eye-gaze with a resolution of 120Hz. The remote Tobii eye-tracker is placed in front of a laptop, which is raised to a height of 30cm as it is placed on the eye-tracker case, and allows a freedom of head movement of 44 cm x 22 cm x 30cm. The eye-movement data reported are an average of both eyes. Stimulus presentation and eye-gaze data collection was conducted using E- prime (Schneider, Eschman & Zuccolotto, 2002). The testing started with a 5-point calibration procedure using Tobii Studio. The experimenter (first author) judged the quality
66
of the calibration by examining the calibration plot for the five points. Participants sat on a chair at about 60 cm from the screen, although this was adjusted somewhat to facilitate calibration.
During the task, a fixation cross in the centre of the screen appeared before the onset of each trial which participants needed to fixate upon for 1000ms for the trial to begin. Participants heard a question over a set of headphones and saw two pictures on each side of the screen. Following the question, a question mark appeared, and a cursor appeared on screen. The two pictures were kept constant and the participants needed to click on a picture on the screen to select the target while looking at the pictures. There was no time limit for participants to select a picture. The order of the stimuli was pseudorandomised to avoid the same condition in adjacent trials which were split into two blocks of 40 trials. Total testing time for the children was about 10-15 minutes per block excluding the time needed to calibrate before starting.
2.2.5 Analyses
Accuracy, reaction time and gaze data were collected and analysed. Two areas of interest (AOI) were defined a priori in E-prime capturing the left and right half of the screen, corresponding to each picture presented in each trial. Eye-movement data were time locked to the onset of the auxiliary verb. A window of 200ms was allowed for the time it takes to program a saccadic eye-movement and as such any movement prior to this would not reflect linguistic processing related to the stimulus (Matin, Bao & Boff, 1993). Eye-movements were analysed for a period of 2 seconds (200-2200ms post auxiliary). Incorrect trials were removed from the analyses15. The time period examined was divided into ten equal bins of 200ms. For
each bin the proportion of looks to target relative to target and competitor was calculated.
15Trials were originally analysed as a whole and then separately for correct and incorrect trials; as the patterns observed in the incorrect trials were inconsistent, these were assumed to be noise and as such excluded from further analyses.
67
These proportions of looks were quasi-logit transformed to compute the empirical logit which better handles cases where the proportion is extremely high or low (likelihood near 0 or 1) and where the observations are not purely independent from one another, as is the case with eye-movements (Barr, 2008). The decision to segment the observations into 200ms windows is due to the sampling frequency of the Tobii eye-tracker (120Hz, 1 measurement per 8.3 milliseconds). A 200ms time window meant the proportion was calculated as a ratio out of 25. This allows for numeric values with a distribution that does not approximate the binomial one while still allowing for relatively fine-grained analysis in terms of timecourse.
The analysis was conducted using logistic mixed effects for accuracy and linear mixed effects for reaction time and gaze data with crossed random effects for subjects and lexical items (Baayen, Davidson & Bates, 2008) implemented in the lme package in R (Bates, Maechler, Bolker & Walker, 2015, version 3.5-0). For the gaze data, a growth curve model was fitted to access changes over time (Mirman, Dixon & Magnuson, 2008; Mirman, 2014). The latter is a particular type of hierarchical mixed effects model where a polynomial is created for time and is used as a predictor variable to model time course of change in addition to other variables which may have an overall effect or an effect on the change over time. These polynomials are orthogonalised; they have been transformed so that they are independent from one another in order to avoid collinearity and as such their individual contribution to the change over time can be assessed. Using power polynomials has several advantages. First, it allows us to represent curvilinear change over time which is not exclusively linear. Second it avoids the need to segment the data into windows for separate analyses with reduced statistical power whilst reducing the risk of false positives due to multiple comparisons16. Third, allows for an interaction of time with the variables assumed to impact gaze data. Fourth the bias from the arbitrariness of establishing time windows is reduced (for a
16Although this may not be an issue if the effect is reliable across the majority of time bins (e.g. see Ito, Corley, & Pickering 2018 for a study with individual analyses for each 50ms time window).
68
discussion of this, see Mirman et al 2008). This type of analysis examines changes for each group and within subject variable by fitting multiple curves to the data, each corresponding to one of the orders of the polynomial terms, and accessing them as predictors of the data. It does not focus on the effect of time as a multilevel factor which would subsequently require pairwise comparisons but examines the trajectory of change over time. The intercept term reflects the average height of the curve, i.e. the height when all variables are zero, to phrase it differently the constant difference in height of the curve by condition. The linear order polynomial captures the general degree of change over time (i.e. slope) while the higher order polynomials capture the non-linear change (i.e. sensitivity to change at particular time points or revision which would be expected if the participants process incrementally and need to reanalyse). For this study, a fourth17 order polynomial was fitted to the data as visual inspection of the data indicated a not purely linear change over time and which appeared to differ for the two types of syntactic structure. One should note, this is not the only way of analysing gaze data with or without including time as a variable (e.g. Ito et al., 2018). The motivation for choosing this type of analysis is that it follows the analysis in Contemori et al. (2018) and can also be applied to Studies 2A and 2B.
Sum coding (-1, 1) was used for between subject variables (monolingual vs. bilingual) and fixed main effects of ‘structure’ (subject vs. object which-question), ‘number matching’ (match between the two NPs vs. mismatch) as well as ‘first NP’ (singular vs. plural) for all three metrics. Time was scaled, and a fourth order orthogonal polynomial18 was fitted in the range of bins in order to conduct the growth curve analysis. The dependent variable for the eye-tracking was the empirical logit transformation of the aforementioned ratio in order to reduce skew in the data (Barr, 2008). Trials where there were no looks to either target or object were not included, as the computed empirical logit value would be infinity (0 divided
17 It was a better fit than models with fewer terms
18 Contemori et al used a restricted cubic spline with four knots; this is a conceptually similar analysis with a different type of transformation of the data; it is not expected that this would impact the results
69
by zero) and were thus treated as missing data. Weights were added to each observation based on the reciprocal of the variance (i.e. 1/weights).
For all three previously listed metrics, the maximal model permitted by design that converged was used with correlation parameters removed (Barr, Levy, Scheepers & Tily, 2013). This included all dependent variables and by-subject and by-item random intercepts and slopes for all fixed effects. For the eye-tracking data, a single model was fitted for all data instead of multiple models, with bin (i.e. time) as an additional fixed effect. This resulted in each trial having multiple interdependent data points per trial. Therefore, a third random intercept was allowed, that of trial ID (the unique pairing of subject number and lexical item which defined a trial). When a model failed to converge, the random effects that accounted for the least variance were iteratively removed until the model converged.
2.3. Results
2.3.1. Accuracy
70
71
Table 4 Overall accuracy by condition as a percentage (SD) for each group
Monolinguals Bilinguals
Subject questions – Number match – First NP Singular 97.9% (14.5) 98.5% (12.2) Subject questions – Number match – First NP Plural 95.5% (20.8) 97.7% (14.9) Subject questions – Number match 96.7% (18.0) 98.1% (13.6) Subject questions – Number mismatch – First NP Singular 98.4% (12.6) 97.7% (14.9) Subject questions – Number mismatch – First NP Plural 94.3% (23.2) 97.2% (16.6) Subject questions – Number mismatch 96.6% (18.2) 97.5% (15.7)
All subject questions 96.6% (18.0) 97.8% (14.6)
Object questions – Number match – First NP Singular 82.3% (38.2) 83.3% (37.4) Object questions – Number match – First NP Plural 81.9% (38.6) 82.6% (38.0) Object questions – Number match 82.1% (38.4) 83.0% (37.6) Object questions – Number mismatch – First NP Singular 87.7% (32.9) 86.8% (33.9) Object questions – Number mismatch – First NP Plural 86.0% (34.7) 87.0% (33.7) Object questions – Number mismatch 86.8% (33.9) 86.9% (33.8)
All object questions 84.3% (36.4) 84.8% (36.0)
72
There was a significant main effect of syntactic structure (β = 1.07, SE = 0.13, z = -7.93, p <.001) but no significant effects of group (β = -0.01, SE = 0.16, z = -0.45, p = 0.972) or