
CHAPTER 3: CORPUS STUDY

3.2 Data and Coding

3.2.1 The UMLC Corpus

This dissertation makes use of a corpus of casual conversations collected as part of the Influence of Urban Minorities on Linguistic Change (UMLC) Project from 1981 to 1984.21 The UMLC Project resulted in four published articles on African American speech in Philadelphia: Labov and Harris 1986; Myhill and Harris 1986; Ash and Myhill 1986; Graff, Labov, and Harris 1986.22

The initial aim of the project was to examine inter-ethnic communication within the city of Philadelphia, with a focus on language variation in communities of color. For that reason, fieldwork was conducted in areas of the city characterized by ethnic/racial residential segregation, like Logan, a Black and Hispanic neighborhood in North Philadelphia, as well as in areas characterized by integration, like Germantown. Speakers from areas of West Philadelphia and other neighborhoods around the city are also included. The majority of speakers in the corpus are Black, and this dissertation focuses on those speakers.23

21 NSF-funded research project 8023306 (1981‒1984), Principal Investigator: William Labov.

22 Of the 42 speakers under study in this dissertation, 13 are identifiable as having been included in one of the 1986 analyses. Ash and Myhill 1986 in particular looks at speakers’ rates of use of ain’t in the past tense by their degree of inter-ethnic contact; however, the results reported in the article do not look at individual rates or other aspects of the linguistic and social context with respect to the variable, both of which are examined in this dissertation.

Figure 7: Map of Philadelphia showing the two test areas for the UMLC project fieldwork.

The data in the UMLC corpus was collected by Wendell A. Harris (henceforth WH), a member of the Black community of North Philadelphia, then in his early 30s. WH utilized his social network to carry out much of the fieldwork, but also sought to record speakers from different areas of the city who were previously unknown to him. Because of the fieldwork’s original aim, the recordings represent a diverse cross section of African American experiences in Philadelphia with regard to inter-ethnic contact and social mobility, though most speakers are from working class backgrounds. At the same time, many of the speakers in the corpus primarily or exclusively interacted with other Black speakers on a daily basis and can thus be considered to speak a vernacular variety of African American English (Baugh 1983; Ash and Myhill 1986; Labov 2014). In previous work, Labov refers to the speakers who made up this dense social network as the “Core” speakers in the community (Labov and Harris 1986; Labov 2014; Labov and Fisher 2015). Furthermore, given the field researcher’s high degree of embedding within the community and his intimate familiarity with many of the speakers, the recordings themselves are representative of vernacular speech, characterized as “the style in which minimal attention is given to the monitoring of speech” (Labov 1972a). The vernacular nature of these recordings is reflected in the fact that they are best characterized as conversations, not as sociolinguistic interviews.

23 The UMLC corpus includes a number of Puerto Rican and White speakers who have varying degrees of contact with other ethnic communities.

In this dissertation, I analyze data from 42 speakers in 47 recorded conversations. These speakers represent a subset of the corpus, chosen because they identified as Black and had grown up in either Philadelphia or the Southern United States.24 The majority of conversations are approximately 45 minutes long, the length of one side of the compact audio cassettes on which they were originally recorded, though some cover both sides of the cassette tape. Most recordings are of one-on-one conversations with the researcher, though some include multiple participants. A few speakers appear on more than one recording, either as a main or peripheral participant.25 All speakers have been given pseudonyms, which are the speaker names that appear in this dissertation.

Years of birth for the 42 speakers range between 1901 and 1969, representing 68 years’ worth of data in apparent time. The sample is roughly split between speakers under 30 years old (N=20 speakers) and those 30 and older (N=22 speakers), with the oldest speaker being 81 years old.

24 Though this sample represents only about a third of all recordings (not all recordings have been digitized), it does represent the majority of Black speakers in the UMLC corpus who grew up in either Philadelphia or the South. Other speakers were either African Americans who grew up in other areas of the country, Puerto Rican, or White.

25 Some participants can be described as “peripheral” participants, meaning that they were not the researcher’s main conversational partner but were present for an extended period of the recording (either engaged in a parallel activity or listening in on the conversation). Such participants do interject remarks from time to time.

Figure 8: Density of corpus speakers by Age at the time of interview (1981-1983).

Social information on speakers was gathered from recordings as well as interview reports from the original fieldwork. Previous studies of the corpus as well as recorded fieldwork “journals” were also helpful in understanding the social networks and social characteristics of several speakers.

The recordings used in this dissertation were transcribed either by myself or by undergraduates at the University of Pennsylvania who were familiar with AAE.26 Undergraduate transcription was funded thanks to a Doctoral Dissertation Research Improvement Grant for Behavioral and Cognitive Sciences from the National Science Foundation. A portion of the recordings used in this dissertation had been previously transcribed as part of the Philadelphia Neighborhood Corpus.

26 Transcribers with “knowledge of African American language patterns” were recruited from Linguistics courses at UPenn and the University’s student job board. Students who qualified for an interview were given a semantic differential exercise for AAE constructions (What’s the difference between ‘The bus be late’ and ‘The bus is late’?) and were asked to transcribe one minute of speech from a corpus speaker who used several features of AAE. Students who were able to complete both tasks with accuracy were hired as transcribers.

In document Variation And Change In Past Tense Negation In African American English (Page 63-67)