6.3 Methods 1 School context

6.3.8 Grammaticality judgment test procedure

The grammaticality judgment test (GJT) was developed in E-Prime (version 2.0.10.356) and administered on an ASUS X553M laptop computer. It comprised a total of 28 novel test sentences (14 ungrammatical) and 4 practice sentences (2 ungrammatical). The test started with the practice block, followed by 3 experimental blocks (10, 10 and 8 items respectively). The child could take short self-managed breaks at the end of each block and ask further questions immediately after the practice items. The practice trials were administered in the same modality as the experimental trials and included 2 grammatical sentences and the 2 associated ungrammatical sentences (the complete set is reported in Appendix D).

Detailed instructions were provided to every participant in advance; the aim of the practice trials was to familiarize the participants with the task itself. This included, for example, practising the sequence of events in the trial and experiencing the timing of each event and the two different types of judgment that were required, together with the associated scales (i.e. GJT judgment and confidence rating). At the end of the four practice trials a screen appeared displaying the text: Adesso puoi chiedere alla maestra Diana. Altrimenti clicca la barra verde per continuare [Now you can ask Miss Diana. Otherwise click the green bar to continue]. At this point, if the participant required it, additional clarification of the task instructions was provided. After that, the participants proceeded independently. The order of the practice items was the same for each participant, but the order of the experimental blocks, as well as the order of items in each experimental block, was randomized across participants.
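The presentation order described above (fixed practice items, then randomized blocks with randomized items inside each block) can be illustrated with a minimal Python sketch. This is not the E-Prime implementation used in the study: the function name `build_trial_order`, the item identifiers and the per-participant `seed` are all hypothetical and serve only to make the two levels of randomization explicit.

```python
import random


def build_trial_order(practice, blocks, seed):
    """Return one participant's trial order.

    Practice items keep their fixed order; the experimental blocks are
    shuffled, and the items inside each block are shuffled as well.
    The seed is a hypothetical per-participant value (an assumption).
    """
    rng = random.Random(seed)
    # Copy the blocks so the master lists are left untouched.
    shuffled_blocks = [list(block) for block in blocks]
    rng.shuffle(shuffled_blocks)      # randomize block order
    for block in shuffled_blocks:
        rng.shuffle(block)            # randomize item order within a block
    order = list(practice)            # practice order is the same for everyone
    for block in shuffled_blocks:
        order.extend(block)
    return order


# Hypothetical item labels: 4 practice items and blocks of 10, 10 and 8 items.
practice = ["P1", "P2", "P3", "P4"]
blocks = [
    [f"A{i}" for i in range(1, 11)],
    [f"B{i}" for i in range(1, 11)],
    [f"C{i}" for i in range(1, 9)],
]
order = build_trial_order(practice, blocks, seed=1)
```

Items from the same block stay together in the resulting list, so the self-managed breaks can still fall at block boundaries.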

The trial started with a fixation cross (3 seconds) after which a sound icon appeared on screen with the associated aural sentence stimulus. Immediately after the aural stimulus, the text Com'è? [How is it?] appeared on screen, together with a yellow arrow pointing down at the top-right part of the keyboard where six aligned keys were used to select the judgment response (corresponding to the keys from '7' to '=' on a British Microsoft keyboard). The keys were labelled with yellow stickers depicting six different smileys ranging from very unhappy to very happy and had no further numerical or text indication (Figure 6.4). A six-point scale in the grammaticality judgment was used because it provided the opportunity to code the judgment in a binary way, but at the same time also to assess it as a fine-grained graded judgment (cf. discussion in 2.4.3). Graded judgments deploying multiple-point scales have been considered suitable for use with adults and children aged 4 and upwards (Ambridge & Rowland, 2013; see also Ambridge et al., 2006, who deployed a five-point grammaticality judgment scale with children as young as 5).

After the child had pressed one of the smiley keys, or after 7 seconds, the text Ti senti sicuro? [Do you feel sure?] appeared on screen with a picture of a light-blue arrow. This time the arrow pointed sideways to a set of four keys on the top-left side of the keyboard (corresponding to the keys from '1' to '4' on a British Microsoft keyboard). The four keys were labelled with light-blue stickers bearing the words 'sì molto' [yes very], 'sì' [yes], 'così così' [so so] and 'per niente' [not at all]. After the child had pressed one of the blue keys, or after 7 seconds, the next trial started. Trials for which no grammaticality judgment was provided (even if the corresponding confidence rating was provided) were later excluded from the analysis.
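The exclusion rule stated above can be sketched in Python as follows. The record layout (keys `item`, `judgment`, `confidence`, with `None` marking a response that timed out after 7 seconds) is a hypothetical representation chosen for illustration, not the format of the E-Prime output. Trials with a judgment but no confidence rating are kept here; the text does not say how such trials were treated, so this is an assumption.

```python
def keep_scorable(trials):
    """Drop trials without a grammaticality judgment.

    A missing response is represented as None (an assumption about the
    data layout). A confidence rating alone does not rescue a trial.
    """
    return [trial for trial in trials if trial["judgment"] is not None]


# Hypothetical records for three trials.
trials = [
    {"item": "s01", "judgment": 5, "confidence": 4},
    {"item": "s02", "judgment": None, "confidence": 3},   # excluded
    {"item": "s03", "judgment": 2, "confidence": None},   # kept (assumption)
]
scorable = keep_scorable(trials)
```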

By allowing a selection of one out of the four options, the confidence rating procedure was designed to rate the participants' confidence in their GJT judgment immediately after this had been given. In practical terms, with the confidence rating the participants confirmed the degree to which they would have picked the same smiley face again, had they been asked to repeat the GJT judgment on the immediately preceding sentence; with very high confidence ratings (corresponding to very sure) they maximally confirmed the judgment they had just given, whilst with very low confidence ratings (corresponding to not at all sure) they signalled a maximally high possibility they would give a different GJT rating given a second chance.

Note that, since it measures confidence, the confidence rating is independent of the specific judgment given in the GJT and applies regardless of it. For example, if the participant judged a GJT sentence to be very good (very happy smiley), the confidence judgment would be about how sure they were the sentence was indeed very good. Similarly, if for the same stimulus they had given a positive judgment but more towards the middle of the scale, the confidence rating would be about how sure they were the sentence was good but not perfect, etc.

In the GJT instructions direct reference to metalinguistic concepts like (un)grammaticality or grammatical acceptability was avoided. The children were told that, having completed all six game blocks, they were now experts in the new language. They were also told that Suzy wanted to create a new game block and had some sentences but needed their expert advice to decide which ones to choose. They could give their ratings of the sentences' suitability using the smiley scale based on how similar the sentences sounded to the ones they heard in the training videos and in the game and pressing the key immediately when they saw the yellow arrow on screen.

It was considered particularly important to ensure that the children understood the difference between sentence rating and confidence rating. In order to clarify the difference between knowledge and higher-order thoughts about knowledge, the children were asked to consider the familiar classroom situation in which a teacher asks a question in class. In such situations, sometimes they would be absolutely sure a certain answer was right. In other cases they would be quite confident but not as much, and sometimes they would not know the answer but try guessing anyway. A very similar situation would happen in the task. After picking a smiley face to help Suzy choose good sentences for the game, they would have to say how sure they felt their choice was correct.

In terms of the composition of the GJT set, vocabulary items, including case markers, were counterbalanced across word categories (Table 6.5a), and the GJT experimental stimuli included 16 SOV sentences, 8 OV sentences and 4 SV sentences (Table 6.5b). The practice stimuli consisted entirely of SV sentences (two ungrammatical sentences and the two corresponding grammatical ones). In the game dataset (and, as a consequence, in the passive exposure and GJT datasets) SV sentences could not be completely excluded because intransitive moves (moves in which a single token was involved) could not be omitted from the game constellations. At the same time, the number of SV moves was very limited compared to SOV and OV transitive moves. Due to the limited number of GJT items overall, the decision was made to use SV sentences for the practice trials rather than reducing the number of SOV and OV stimuli. Further, since word order violations were expected to show a greater learning effect overall than case violations, it was decided to include the two SV sentences with word order violations in the GJT set and use the two SV sentences with case violations as practice items.

Table 6.5a

Frequency of Vocabulary Items in the GJT Set

Vocabulary item   Category      Frequency
blomi             Noun          12
nipo              Noun          12
pleca             Noun          12
vode              Noun          12
trose             Adjective     9
neimo             Adjective     9
klino             Verb          8
nima              Verb          8
yabe              Verb          8
prazi             Verb          8
noika             Adverb        8
zeima             Adverb        8
ri                Preposition   25
ru                Preposition   25

Table 6.5b

Frequency of Verbs per Sentence Type in the GJT Set

Sentence type   nima   prazi   yabe   klino   Tot
SOV             2      8       6      -       16
OV              6      -       2      -       8
SV              -      -       -      8       8
Tot             8      8       8      8*      32

Note. *Half of these sentences were used as practice items.

All sentence stimuli (practice and experimental) contained an equal number of words (5; corresponding to 8 or 9 syllables in total). In the ill-formed sentences the ungrammaticality was never triggered by the first word in the sentence. The 14 ungrammatical sentences matched the corresponding grammatical ones and were created by inserting violations of case assignment (6 sentences) and word order (8 sentences). Case violations included sentences in which the nominative or the accusative case marker was missing (2 sentences) and sentences in which the wrong marker was used (4 sentences). Word order violations included ungrammatical order at sentence level (5 sentences) and inside the NP (3 sentences). Appendix D includes the complete GJT set with each sentence labelled for type of word order and, for ungrammatical sentences, type of violation.

GJT scores. In this paradigm the grammaticality judgment test is an offline measure of language learning based on the judgment of aural sentence stimuli as 'good' or 'bad' compared to the ones presented in the language training and practice. Unlike the online measures of learning, the GJT was administered presenting the stimuli outside the game context. Hence it mainly probed morphosyntactic learning independently of its semantic dimension.

Judgments on aural sentence stimuli were given on a six-point scale (three grades for 'good' and three grades for 'bad', cf. Figure 6.4). For each stimulus this allowed for both binary scoring and graded scoring. As ungrammatical sentences were created to violate specific grammar rules, subsets of the test stimuli could be used to assess learning of word order and case marking separately.
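The two scoring options can be sketched in Python as follows. The numeric coding is an assumption, since the text does not report one: the six smiley keys are coded 1 (very unhappy) to 6 (very happy), ratings 4 to 6 count as 'good' and ratings 1 to 3 as 'bad'. The graded score for ungrammatical items is reverse-coded (7 minus the rating) so that higher values always mean a more accurate judgment; this reverse-coding is likewise an illustrative choice rather than the procedure reported in the study.

```python
def binary_judgment(rating):
    """Collapse a 1-6 smiley rating into 'good' (4-6) or 'bad' (1-3).

    The 1-6 coding and the 3/3 split are assumptions for illustration.
    """
    if rating not in range(1, 7):
        raise ValueError("rating must be an integer from 1 to 6")
    return "good" if rating >= 4 else "bad"


def binary_score(rating, grammatical):
    """Return 1 if the collapsed judgment matches the item's status, else 0."""
    judged_good = binary_judgment(rating) == "good"
    return int(judged_good == grammatical)


def graded_score(rating, grammatical):
    """Return a 1-6 accuracy score; ungrammatical items are reverse-coded."""
    binary_judgment(rating)  # reuse the range check
    return rating if grammatical else 7 - rating


# A grammatical sentence rated with the happiest smiley is scored as correct.
example_binary = binary_score(6, grammatical=True)
# An ungrammatical sentence rated 2 receives a graded score of 7 - 2 = 5.
example_graded = graded_score(2, grammatical=False)
```

Restricting either score to the word order items or to the case items would give the rule-specific subsets mentioned above.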

Confidence in the Accuracy of the GJT Response. The four-grade subjective qualitative rating of confidence in the accuracy of the GJT responses was converted into a four-point numeric scale (with the highest point in the scale corresponding to maximal confidence). As with the GJT, this allowed both binary scoring and graded scoring (2 grades, for high-confidence and low-confidence items respectively, or 4 grades overall).
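A minimal Python sketch of this conversion is given below. The mapping of labels to the values 4 to 1 follows the statement that the highest point corresponds to maximal confidence. The binary split, with 'sì molto' and 'sì' counted as high confidence and 'così così' and 'per niente' as low confidence, is an assumption, as are the names `CONFIDENCE_SCALE`, `confidence_points` and `is_high_confidence`.

```python
# Key labels mapped to a four-point scale (4 = maximal confidence).
CONFIDENCE_SCALE = {
    "sì molto": 4,     # yes very
    "sì": 3,           # yes
    "così così": 2,    # so so
    "per niente": 1,   # not at all
}


def confidence_points(label):
    """Graded scoring: return the 1-4 value for a key label."""
    return CONFIDENCE_SCALE[label]


def is_high_confidence(label):
    """Binary scoring: the two upper grades count as high confidence (assumption)."""
    return CONFIDENCE_SCALE[label] >= 3


graded = [confidence_points(label) for label in CONFIDENCE_SCALE]
binary = [is_high_confidence(label) for label in CONFIDENCE_SCALE]
```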

In document The earliest stages of second language learning:a behavioural investigation of long term memory and age (Page 170-176)