Application of the variationist method to the corpus data

RQ 4. Putting together the quantitative analysis (RQs1 & 2) and the qualitative analysis (RQ3) what does this tell us about the nature of bi-varietal language use in young

4.5 Application of the variationist method to the corpus data

In this section I detail the steps taken to apply the variationist method, as examined in Chapter 3 (§3.2), to the current corpus data. One of the innovations of this thesis is the creation of an L1 and an L2 data set based on contextual rather than formal criteria. Justification for this was given above (§3.1), and here I describe the procedure (§4.5.1). Following this, I explain why the variables present temporal reference, subject pronouns and transitive marking were selected (§4.5.2). Finally, three sections (§4.5.3-§4.5.5) address how several components of standard quantitative variationist methodology have been applied to this unique corpus data.

4.5.1 Forming the L1 and L2 data sets

In Chapter 2 (§2.1) I detailed the difficulty of determining which language a particular clause belongs to in a corpus that contains both a contact language and its standardised source language. For the reasons outlined above, I have decided to resist using structural definitions and have instead sorted clauses in the corpus using contextual criteria. Data were coded for a variety of contextual situations, delineated primarily in terms of the addressee (Alyawarr, non-Alyawarr) and location (home, school); see Figure 4-1 below. Additionally, the data from several specific interactional contexts were assigned sub-context status, since there were grounds for suspecting that code choice may be more varied in these contexts. For example, on one particular field trip the recordings of the children telling the story of the wordless picture books tended to contain a higher density of English-like structures.[13]

It is hypothesized that these base and sub-contexts fall somewhere along a continuum of language use. The context hypothesized to be most representative of an extreme end of language use has been called the ‘base context’. There are two base contexts: ‘HOME’ and ‘SCHOOL’. The base contexts, as extremes, are predicted to be the most likely to show contrasting systems of internal variation. Because of this, special focus is given to the base contexts in the analysis which follows. Clauses assigned sub-context status were excluded from the main analysis presented in Chapters 4-6 of this thesis (i.e. everything in a ‘sub-context’ category was excluded). In short, each language (Alyawarr English and SAE) has been operationalised as a set of clause tokens fitting a set of contextual constraints (‘home/Alyawarr interlocutor’ and ‘school/non-Alyawarr interlocutor’ respectively). To the extent that these contextual constraints are adequate and the HOME and SCHOOL data sets truly represent distinct linguistic entities that we can refer to as Alyawarr English and SAE, systematic variation that is detected within each data set and that differs between the two will be taken as evidence that the children are using two different grammars.

[13] This was designated SB1: though it took place at home, the fact that SAE (rather than AlyE) seemed to be the language in use puts it with the other SAE sub-contexts.

Figure 4-1: Classification of clauses into social context categories, in order to form the two main HOME and SCHOOL data sets, which are taken as representative of the children’s Alyawarr English and SAE respectively.

‘HOME’ data set (‘Alyawarr English’)

‘HOME’ BASE CONTEXT: Playing at home, talking to another Alyawarr person

‘HOME’ SUB-CONTEXTS (excluded from analysis):

HB1: At home, reading a wordless picture book where the tendency was to use Alyawarr English features

HB2: At school or preschool, talking to another Alyawarr person

‘SCHOOL’ data set (‘SAE’)

‘SCHOOL’ BASE CONTEXT: At school, talking to a non-Alyawarr teacher

‘SCHOOL’ SUB-CONTEXTS (excluded from analysis):

SB1: At home, reading from a wordless picture book where the tendency was to use SAE features

SB2: At school, reading aloud from a book

SB3: At school, mimicking the teacher/computer/interactive whiteboard game

SB4: At school, working on an SAE text production task with a fellow student

SB5: Playing at home, talking as a character or toy

SB6: Playing at home, talking to the researcher

NOTES: Although SB1, SB5 and SB6 take place at home, the children appeared to be using SAE, so these were classified with the ‘school’ sub-contexts attempting to capture SAE use. Likewise, HB2 reflects use of more Alyawarr English-like features in the classroom (when talking to fellow students) and so falls within the data set attempting to capture Alyawarr English.

RE: SB5. When speaking as other characters, the young girls in the corpus universally used speech with a higher density of SAE features (either when voicing black or white dolls or in role-play where they themselves were the actors).
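The sorting logic summarised in Figure 4-1 can be illustrated with a minimal sketch. It is not the coding procedure actually used for the corpus: the `Clause` record, its field names and the example clause identifiers are all hypothetical, and the sketch assumes that every clause already carries a location, an addressee and, where relevant, a sub-context label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """One coded clause; all field names here are hypothetical."""
    clause_id: str
    location: str                      # "home" or "school"
    addressee: str                     # "alyawarr" or "non-alyawarr"
    sub_context: Optional[str] = None  # e.g. "SB1", "HB2"; None for base contexts


def assign_data_set(clause: Clause) -> Optional[str]:
    """Return "HOME" or "SCHOOL" for base-context clauses, otherwise None."""
    # Any clause coded with a sub-context label is excluded from the main analysis.
    if clause.sub_context is not None:
        return None
    if clause.location == "home" and clause.addressee == "alyawarr":
        return "HOME"
    if clause.location == "school" and clause.addressee == "non-alyawarr":
        return "SCHOOL"
    # Anything else fits neither base context and is left unassigned.
    return None


# Invented placeholder clauses, not corpus data.
clauses = [
    Clause("c1", "home", "alyawarr"),
    Clause("c2", "school", "non-alyawarr"),
    Clause("c3", "home", "alyawarr", sub_context="SB5"),
    Clause("c4", "school", "alyawarr", sub_context="HB2"),
]
data_sets = {c.clause_id: assign_data_set(c) for c in clauses}
```

The point of the sketch is that data set membership depends only on contextual codes, never on the form of the clause itself.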

4.5.2 Variable selection

Temporal reference was initially chosen as the main variable of study for several reasons. Firstly, it is a compulsory feature of most clauses, so expressions of temporal reference were likely to be abundant in the data. Secondly, I had already determined from preliminary grammatical analysis that there were different, but overlapping, morphological features involved in the expression of temporal reference in Alyawarr English and SAE. This would therefore be a site for variability across codes, as well as within codes. This kind of variable, which differs structurally from one variety to another, has been called a ‘conflict site’ (Poplack & Meechan 1998:132). In determining the language of origin of components of mixed languages, the conflict site is where the relevant grammatical structure differs between the source languages. To the extent that a grammatical structure in the mixed language resembles one source language (but not another), a determination regarding the origin of the structure in question can be made. In order to determine whether the children in the present study are producing two different grammars, the site of investigation also needs to be a potential source of conflict (between the HOME and SCHOOL data sets). To the extent that we know how these variables operate in the input varieties (i.e. adult Alyawarr English and SAE), we might infer a direction of influence over the emerging grammar in each case.

Once the clauses making up each base context were extracted, they were examined to see which temporal reference type would provide the largest amount of data. I was surprised to find that the classroom environment, as a general rule, seemed to provide students with little opportunity to produce complete SAE clauses. Obtaining enough school clause tokens for analysis required the transcription of many more hours of recording than for the home data, and even then there is a large difference between the number of clause tokens gleaned from each context. Present temporal reference clauses were the most common in the school data, and so this was chosen as the focus of analysis and comparison. In coding for temporal reference, particularly in the school data, I examined the context of each clause carefully, so as not to be unduly influenced by the form of the verb. I was particularly sensitive to the noted tendency for early L2 learners of English to use temporal ‘anchors’ such as adverbs (‘yesterday’, ‘always’) and leave verbs unmarked (Klein 1995, Long & Sato 1984).

Table 4-1: Three variables and their primary variants in participating Alyawarr children's Alyawarr English (HOME), and SAE (SCHOOL), and in native-speaker SAE.

Variable                   HOME           SCHOOL        Native-speaker SAE
Tense-aspect morphology    V~Ving~Vbat    V~Ving        V~Ving
1sg subject                Am~I           Am~I          I
Transitive marking         -im ~ -ø       -im ~ -ø      -

In addition to aspect morphology, transitive marking and subject pronominals have been examined, since the relationship between the HOME and SCHOOL systems differs for each of these. As noted above (§1.4 and §3.4), and shown again in Table 4-1, Alyawarr English and native-speaker SAE have different numbers of variants for each variable. In the children’s SAE data, captured by the SCHOOL data set, there are at least two variants in each case, which allows for comparative variationist analysis of each variable.

4.5.3 Variable-rule analysis

There are several guiding principles of variationist analysis that both reflect a theoretical stance regarding the nature of variability in natural language and have methodological consequences for quantitative analysis. I will highlight two here. Firstly, the ‘principle of accountability’ (Labov 1972a: 72) holds that features of interest must be examined within the context of the other linguistic forms that make up the same functional set. In the current thesis this means that all forms of present temporal expression must be examined before the complete story about each particular form can be understood. The functional system (e.g. present temporal reference) is the ‘variable’, and the multiple formal expressions of that system (in this case, the three verbal morphemes -ing, -bat and -ø, which are analysed quantitatively) are the ‘variants’.
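As a rough illustration of what the principle of accountability implies for counting, the sketch below reports the rate of every variant over all tokens of the variable, rather than counting occurrences of a single form in isolation. The function name and the token list are invented for illustration and are not corpus data.

```python
from collections import Counter


def variant_distribution(tokens):
    """Return each variant's share of all tokens of the variable."""
    if not tokens:
        raise ValueError("no tokens in the variable context")
    counts = Counter(tokens)
    total = sum(counts.values())
    # Every variant is reported against the same denominator: all tokens
    # of the functional set, not just tokens of the form of interest.
    return {variant: count / total for variant, count in counts.items()}


# Invented toy tokens of present temporal reference, one variant per clause.
toy_tokens = ["-ing", "-ø", "-ø", "-bat"]
distribution = variant_distribution(toy_tokens)
```

Because the denominator is the whole functional set, a change in the rate of one variant is always interpretable relative to the others.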

Secondly, each form (or variant) within a functional system will have contexts in which only one form occurs virtually all of the time. The job of the analyst is to examine the corpus carefully to locate these sites of categorality and exclude them from the variable rule analysis. For the purposes of the statistical analysis, categorality is defined as a feature occurring at a rate lower than 5% or greater than 95% (Guy 1988:132). In the case of the present corpus, where there has been no prior description of the L1 in question, this calls for a period of language description and exploration: “a long series of exploratory manoeuvres” (Labov 1969: 728–729). This process has been called ‘circumscribing the variable context’ (e.g. Tagliamonte 2006: 13) and is reported on in each analysis chapter.
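The 5%/95% criterion can be expressed as a simple screening step. The sketch below is an assumption about how such a check might be scripted, not a description of the procedure followed in the analysis chapters; the function name, the context labels and the counts are invented.

```python
def is_categorical(n_variant, n_total, lower=0.05, upper=0.95):
    """True if the variant's rate in a context is below 5% or above 95%."""
    if n_total <= 0:
        raise ValueError("context has no tokens")
    rate = n_variant / n_total
    return rate < lower or rate > upper


# Invented counts: context label -> (tokens with the variant, all tokens).
toy_contexts = {
    "context A": (1, 50),   # rate 0.02
    "context B": (10, 50),  # rate 0.20
    "context C": (49, 50),  # rate 0.98
}
variable_contexts = [
    name for name, (hits, total) in toy_contexts.items()
    if not is_categorical(hits, total)
]
```

Only the contexts that pass this screen would be carried forward into the variable rule analysis; the others are treated as (near-)categorical and set aside.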

The procedure for modelling variation used in this thesis is logistic multiple regression conducted in Goldvarb X (Sankoff, Tagliamonte and Smith 2005).
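For orientation, the logistic model that underlies variable rule programs of this family is conventionally written as below. The notation is an assumption introduced here for illustration and does not appear in the thesis: $p$ is the probability that the variant of interest occurs in a given context, $p_0$ is the input probability (the overall tendency for the variant to occur), and $w_i$ is the factor weight of the factor from factor group $i$ that is present in that context.

```latex
\log\frac{p}{1-p} \;=\; \log\frac{p_0}{1-p_0} \;+\; \sum_{i}\log\frac{w_i}{1-w_i}
```

On this formulation a factor weight above 0.5 contributes a positive term and so favours the variant, a weight below 0.5 disfavours it, and a weight of exactly 0.5 contributes nothing.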

4.5.4 Longitudinal and cross-sectional design

The lack of longitudinal data has been called one of the greatest problems in the study of child language acquisition. Doughty and Long (2003:3) state that “longitudinal studies of children ... are distressingly rare; the vast majority of SLA studies are cross-sectional, with serious resulting limitations on the conclusions that can be drawn on some important issues”. While cross-sectional study design is quite common and profitable in traditional explorations of sociolinguistic variation, variationist SLA studies are moving in the direction of more longitudinal design (Bayley and Tarone 2012). Modelling variation over time allows for close examination of the staging of language acquisition and change. The present corpus has been designed as both cross-sectional and longitudinal, as shown in Table 4-2. However, the main analyses presented in the following chapters in fact combine the data into one group. This was unfortunately necessitated by the low overall token counts once the relevant exclusions were applied (I describe this process in detail in the relevant chapter sections). As a result, while potential age effects are still discussed in some detail (particularly for transitive marking), they are not included in the multivariate analyses. The corpus does, nevertheless, retain the potential to be analysed in this way for other, more frequently occurring variables (such as phonological features).

Table 4-2: Number of tokens per speaker at each age point, present temporal reference clauses only

Speaker       Lower Primary   5;0-5;5   5;6-5;11   6;0-6;5   6;6-6;11   7;0-7;5   7;6-7;11   Total
Dylan         -               -         -          -         -          -         105        105
Lenora        -               -         -          -         96         275       131        502
Alysha        -               -         -          86        113        146       -          345
Tiffany       -               -         -          111       161        172       -          444
Deanna        -               -         60         54        128        -         -          242
Simon         -               102       135        139       -          -         -          376
Shamus        -               193       170        95        -          -         -          458
Ramona        -               39        -          -         -          -         -          39
Emerson[14]   31              -         -          -         -          -         -          31
Total         31              334       365        485       498        593       236        2542

4.5.5 The individual and the group

It was noted in Chapter 2 (§2.2.1) that sociolinguists have gone to some effort to explore and defend the concept of a ‘speech community’; that is, a cohort of speakers whose variable use of a particular language feature or set of features conforms to the same set of constraints and social norms (Labov 2007: 347). In the language acquisition context, a cohort of speakers or ‘speech community’ could be a group of learners at the same level of acquisition. This might be particularly evident for language features prone to reordering of constraints over the process of acquisition, as we saw in Chapter 3 (§3.2.1.1). It is therefore important that a group of speakers tendered as the sample be shown to behave in a similar manner with respect to variable usage of a feature. Traditionally, inter-speaker variability is explored prior to conducting statistical analyses. In particular, the focus is on outliers: individuals who behave quite differently from the rest of the group. Once an outlier is identified, the analyst may investigate that individual’s linguistic history as a means of explaining their divergent patterns of use. One procedure for detecting outliers, usually unpublished, is to cross-tabulate each of the individuals in the sample by each of the factor groups (Tagliamonte 2012:131): each individual speaker’s use of the variable should mirror the group pattern. For example, if the use of the aspectual morpheme -ing is favoured by durative clause contexts across the pooled sample, then the data of individual speakers should reflect this pattern too. This is the procedure that has been undertaken for the three variables in the present study. This exploratory work confirmed that there is sufficient inter-speaker consistency for the factors ultimately found to significantly constrain variation, so we can be confident that these factors are operating for all the children in the sample.

[14] Emerson’s birthdate was not recorded, though he was in the lower primary class during the recordings taken of him. When data are discussed in terms of age effects, his data are generally excluded (as noted in the relevant sections).
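The cross-tabulation check for outliers described in this section can be sketched as follows. This is a hypothetical illustration, not the script used for the thesis: the function names, the speaker labels "S1" and "S2", the factor label "non-durative" and all token values are invented, and the comparison rule (a speaker is flagged when their rate in the favouring context is not higher than in the other context) is a deliberately simple assumption.

```python
from collections import defaultdict


def rates_by_speaker_and_factor(tokens, variant):
    """Cross-tabulate speakers by factor: rate of `variant` in each cell."""
    cells = defaultdict(lambda: [0, 0])  # (speaker, factor) -> [hits, total]
    for speaker, factor, observed in tokens:
        cell = cells[(speaker, factor)]
        cell[1] += 1
        if observed == variant:
            cell[0] += 1
    return {key: hits / total for key, (hits, total) in cells.items()}


def speakers_against_group(rates, favouring, other):
    """Speakers whose rate in the favouring factor is not above the other factor."""
    flagged = []
    for speaker in sorted({s for s, _ in rates}):
        fav = rates.get((speaker, favouring))
        oth = rates.get((speaker, other))
        # Only compare speakers who have tokens in both cells.
        if fav is not None and oth is not None and fav <= oth:
            flagged.append(speaker)
    return flagged


# Invented toy tokens: (speaker, clause context, observed variant).
toy_tokens = [
    ("S1", "durative", "-ing"), ("S1", "durative", "-ing"),
    ("S1", "non-durative", "-ø"), ("S1", "non-durative", "-ing"),
    ("S2", "durative", "-ø"), ("S2", "durative", "-ing"),
    ("S2", "non-durative", "-ing"), ("S2", "non-durative", "-ing"),
]
rates = rates_by_speaker_and_factor(toy_tokens, "-ing")
outliers = speakers_against_group(rates, "durative", "non-durative")
```

In practice cells with very few tokens would need cautious interpretation, since a rate based on a handful of clauses can diverge from the group pattern by chance alone.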

In document Alyawarr children's variable present temporal reference expression in two, closely related languages of Central Australia (Page 111-118)
