3. Research framework
5.2 Methodology
5.2.8 Procedure for data analysis
The oral narratives from 116 speakers were transcribed following standard orthographic transcription including pauses. The data were coded and annotated in a database in Excel, which was converted into a database in Stata (version 11.2). The database, after being finalised and cleaned, comprised 4,839 clauses coming from 232 (=116*2) narratives. The annotation included information on the linguistic variables shown in Table 5.16. Only the linguistic variables in bold were taken into account in the final quantitative analyses of the production data.
Table 5.16. Linguistic variables of the annotation of narratives
Clause & Verb
Type of Clause (matrix or embedded)
Dependency (if embedded)
Type of embedded (adverbial, relative, complement: indicative, subjunctive, infinitive)
Coordination
Type of verb (existential, unaccusative, unergative, transitive)
Finiteness
Verb Person
Verb Number
Ambiguous verb morphology (Spanish)

Subject
Type of Subject (overt, null)
Category of Subject (NS, LS, OSP)
Category of Subject if overt (personal pronoun, demonstrative pronoun, LS, proper name, demonstrative noun phrase, relative pronoun or relative complementiser)
Person
Number
Mismatch in SV agreement
Adjacency
Position
Definiteness
Animacy
Discourse Value (TS, TC, Focus)
Ambiguity
The following subject structures were excluded from the analysis: non-referential subjects or impersonal verbs (e.g. prepi ‘must’), direct speech, fixed expressions or fillers (e.g. pos na pume ‘how to say’), first person (e.g. vlepume ‘we see’), nominalization of clauses (e.g. to na traviksi tin ura tis gatas ‘the pulling of the cat’s tail’), proverbs (e.g. mana ine mono mia ‘there is only one mother’), formulaic phrases (e.g. ke zisane afti kala ki emis kalitera ‘they lived happily ever after’), codeswitching, verb phrase ellipsis, false starts, incomplete sentences and any unclear utterances. Subject-headed relative clauses were not considered in the analyses of the narratives because of their particular dependence on the head of the noun phrase. Verbs in these relative clauses cannot take a NS or an overt subject as freely as verbs in other clause types, which is why they are often excluded from this kind of analysis (e.g. Dimitriadis 1996; Montrul & Rodríguez Louro 2006; Shin 2012, 2016). The contents of the production task triggered mostly third-person animate subjects in the singular. Instances of the other persons were excluded from the analyses. Plural and inanimate referents, although rare, were included. Only TC and TS contexts in the narratives were considered for the analyses. Infinitives in Spanish, whether adjuncts or complements, were regarded as clauses (see Torrego 1998; Zagona 2002).
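The exclusion criteria above can be pictured as a filter over the annotated clauses. The following is only a minimal sketch in Python, not the procedure actually run in Excel or Stata: the `Clause` record, its field names, the structure labels and the two placeholder clauses are hypothetical and serve purely as illustration.

```python
from dataclasses import dataclass

# Hypothetical annotation labels for the excluded structures listed above.
EXCLUDED_STRUCTURES = {
    "impersonal", "direct_speech", "filler", "nominalization", "proverb",
    "formulaic", "codeswitching", "vp_ellipsis", "false_start",
    "incomplete", "unclear",
}
# Only topic-continuity (TC) and topic-shift (TS) contexts were analysed.
INCLUDED_CONTEXTS = {"TC", "TS"}


@dataclass
class Clause:
    text: str
    structure: str            # hypothetical label; "regular" if none applies
    person: int               # grammatical person of the verb (1, 2 or 3)
    discourse_value: str      # "TC", "TS" or "Focus"
    subject_headed_relative: bool = False


def is_analysable(clause: Clause) -> bool:
    """Return True if the clause survives all exclusion criteria."""
    return (
        clause.structure not in EXCLUDED_STRUCTURES
        and clause.person == 3
        and not clause.subject_headed_relative
        and clause.discourse_value in INCLUDED_CONTEXTS
    )


def filter_clauses(clauses):
    """Keep only the clauses that enter the quantitative analysis."""
    return [c for c in clauses if is_analysable(c)]


sample = [
    Clause("prepi", "impersonal", 3, "TC"),                       # impersonal verb
    Clause("vlepume", "regular", 1, "TC"),                        # first person
    Clause("<placeholder third-person clause>", "regular", 3, "TS"),
    Clause("<placeholder focus clause>", "regular", 3, "Focus"),  # not TC or TS
]
kept = filter_clauses(sample)
print(len(kept))  # 1
```

Keeping the criteria in a single predicate makes each exclusion explicit and easy to audit against the list above.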
A qualitative analysis of the linguistic data was also conducted for each category of subject (LS, OSP, NS) in each context (TC, TS). A detailed examination of all the chains and relations established between matrix and embedded clauses was beyond the scope of the present study.
As for the AR data, the participants’ responses were also coded and annotated in an Excel database, which was converted into a Stata database comprising 1,840 responses coming from 115 speakers. The linguistic variables of interest in this case were (a) type of embedded subject (NS, OSP); (b) definiteness of the matrix object (definite, indefinite); and (c) antecedent preferences (AP).
5.2.8.2 Sociolinguistic variables
The language-external variables included in both Stata datasets were Sex, Group of speakers, Age, AoB, LoR (if immigrant) and Proficiency (if HS or L2). For monolinguals, only Sex, Group of speakers and Age were added to the corresponding datasets. The variables finally used in the analyses were Group of speakers, Age and Proficiency (Table 5.17). Sex was included only for descriptive purposes. AoB and LoR were strongly correlated with Age; therefore, only Age was included in the final statistical analyses, since it was the only crucial language-external variable common to all groups of speakers in this study (see Kaltsa et al. 2015; Schmitz & Scherger 2017).
Table 5.17. Sociolinguistic variables used in the analyses
Sociolinguistic variables
Group of speakers: Spanish monolinguals, Greek monolinguals, IMM, HS, L2
Age at testing
5.2.8.3 Baseline
The ideal baseline against which to compare the linguistic behaviour of bilingual HS has been a matter of discussion in the field, with the language of input generally being considered the most reasonable option. According to Polinsky and Kagan (2007), ‘the baseline language for a heritage speaker is the language that he or she was exposed to as a child’. As Benmamoun et al. (2013: 134) explicitly state regarding HS, ‘crucially, the baseline language is not the monolingual variety of that language but the language spoken by first-generation immigrants’ (see also Polinsky 2018). Schmid (2011) and Grosjean (2008) have also argued that using the monolingual norm as a yardstick to evaluate bilinguals more generally may be questionable. However, the comparison between monolinguals and bilinguals is not only difficult to ignore but may also be particularly useful for answering specific research questions (Montrul 2016b). Such a comparison may disentangle behaviours which are due to bilingualism effects from behaviours stemming from cognitive biases which are not language specific. The second research question of the present study involves a comparison between monolinguals and bilinguals in order to pinpoint potential differences and their sources. In order to better understand the sources of these potential differences, a comparison between the bilingual groups also becomes relevant.
Comparisons were first drawn between monolingual Spanish and Greek data in order to establish potential differences between the two languages. The focus thereafter was on the Greek performance of the bilingual groups. The bilingual performance was compared against that of Greek monolinguals. The performance of HS and L2 groups was also compared against that of immigrants. The latter speakers are the main source of linguistic input for HS (see e.g. Zombolou 2011; Kaltsa et al. 2015; Montrul 2016b) and often for L2ers in this context. Crucially, HS and L2ers have different AoB and language exposure. Comparisons were also drawn between HS and L2ers since both groups share the characteristic of speaking Greek as their weak language.
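The comparison plan described in this paragraph can be summarised as a list of group pairs. The sketch below, written in Python, only illustrates the design and is not part of the statistical analysis itself; the function name is hypothetical, and treating IMM, HS and L2 as the three bilingual groups compared against Greek monolinguals is an assumption based on the group labels in Table 5.17.

```python
def planned_comparisons():
    """Enumerate the group comparisons in the order described in the text."""
    # Assumption: the bilingual groups are IMM, HS and L2 (labels from Table 5.17).
    bilingual_groups = ("IMM", "HS", "L2")

    # 1. Monolingual Spanish vs. monolingual Greek data.
    comparisons = [("Spanish monolinguals", "Greek monolinguals")]
    # 2. Greek performance of each bilingual group vs. Greek monolinguals.
    comparisons += [(group, "Greek monolinguals") for group in bilingual_groups]
    # 3. HS and L2 vs. immigrants, their main source of input.
    comparisons += [(group, "IMM") for group in ("HS", "L2")]
    # 4. HS vs. L2, both speaking Greek as their weak language.
    comparisons.append(("HS", "L2"))
    return comparisons


for first, second in planned_comparisons():
    print(f"{first} vs. {second}")
```

Under these assumptions the plan amounts to seven pairwise comparisons.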