
1.6 Documentation methods

1.6.2 Text work

Much of my early work on TEO consisted of elicitation sessions aimed at the tone system. As work on this area of the grammar has progressed, I have been able to place greater emphasis on the analysis of natural speech and discourse. During earlier periods of fieldwork (2007-09), culturally significant work processes were identified, such as the cultivation of maize, beans, and other food staples, and the cultivation of sugar cane and its processing into a raw sugar product known as panela. I also noted a collection of historical narratives and place-based stories that elder speakers retell. My interaction with language consultants and community elders has provided the foundation for deciding which additional speakers and culturally relevant themes would be important sources of this knowledge.

The recording and documentation of narratives as texts was central to my field research during the summers of 2010-2012. These recordings have been uploaded to the project computers and stored on project-dedicated external hard drives, and they will be archived at the Endangered Languages Archive (ELAR), housed at SOAS (School of Oriental and African Studies), University of London, and at AILLA (Archive of the Indigenous Languages of Latin America), housed at the University of Texas at Austin. Metadata, as described in Nathan and Fang (2009), were created on the day of each recording and bundled with the media files in a folder uniquely labeled to identify the session or event. Each session-specific folder contains the original archivable recordings, the related ELAN and Audacity files (Audacity Team), and detailed plain-text metadata for each recording listing the equipment used, the speakers involved, the place of recording, and a brief description of the documented event. Audacity was used for sound enhancement and ELAN for linguistic analysis, with a template specific to our work on Teotepec.
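The session-folder convention described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual tooling: the `SessionMetadata` class, the `create_session_folder` helper, the `metadata.txt` file name, and the "key: value" layout are all hypothetical choices. The fields simply mirror those listed in the text (equipment, speakers, place, description), plus a session identifier and recording date.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class SessionMetadata:
    # Hypothetical field names mirroring the metadata listed in the text.
    session_id: str
    recorded_on: date
    place: str
    speakers: list[str]
    equipment: list[str]
    description: str

    def to_text(self) -> str:
        """Render the metadata as plain-text 'key: value' lines."""
        lines = [
            f"session: {self.session_id}",
            f"date: {self.recorded_on.isoformat()}",
            f"place: {self.place}",
            f"speakers: {', '.join(self.speakers)}",
            f"equipment: {', '.join(self.equipment)}",
            f"description: {self.description}",
        ]
        return "\n".join(lines) + "\n"


def create_session_folder(root: Path, meta: SessionMetadata) -> Path:
    """Create a folder named after the session and write metadata.txt into it.

    The recordings and the ELAN and Audacity files for the same event would
    then be copied into this folder so that everything stays bundled together.
    """
    folder = root / meta.session_id
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "metadata.txt").write_text(meta.to_text(), encoding="utf-8")
    return folder
```

Writing the metadata on the day of recording into the same uniquely named folder as the media keeps each archival bundle self-describing, even if the folder is later copied to an external drive or deposited on its own.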

This part of the documentation of TEO is similar to what Himmelmann (1998) describes as a "low level" analysis of the data; however, each time a text is reanalyzed, the analysis improves. At this point the text work moves from a documentary activity well into the realm of language description. This is where language documentation and description overlap, minimally requiring a phonetic and phonological analysis, a segmental analysis, and a well-developed orthographic system.

Labels have been developed for ELAN using the following tiers: TxR - raw text, ft - free translation, Tx - text, gn - gloss national, ps - part of speech, and notes. Figure 1.3 is an image taken from one of the texts that has been transcribed and translated.

Figure 1.3: Example of TEO text in ELAN

for each utterance, allowing for easy reference back to the original audio file. The first tier, TxR - raw text, is where the text is transcribed in its "raw" form. At first, Hugo Reyes and I transcribed the text on this tier with all of the surface tone sandhi realizations, giving the underlying tones in parentheses. As our analysis of tone and sandhi evolved, we decided to transcribe only the underlying lexical tones. In this way, sandhi rules can be "applied" to a given utterance to show surface-level prosodic changes. The ft - free translation tier is where the Chatino is translated into the language of wider communication, in this case Spanish. These first two tiers provide the essential material needed to analyze the grammatical structure of the language. This format will be used to extract the transcriptions and translations that accompany a given audio file, whether for digital archiving or for publishing text materials for community members and other audiences.
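As a sketch of how transcription and translation pairs could be extracted, the Python below reads an EAF document (ELAN's XML file format) and pairs each time-aligned annotation on a parent tier with the reference annotation that points to it on a child tier. It assumes the tier IDs are literally `TxR` and `ft` and that `ft` is a symbolic child of `TxR`; the real project template may be configured differently. The function name `extract_pairs` and the trimmed sample document, which contains placeholder text rather than real TEO data, are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_pairs(eaf_xml: str, parent_tier: str = "TxR", child_tier: str = "ft"):
    """Return (start_ms, end_ms, transcription, translation) for each utterance.

    Assumes the parent tier holds time-aligned ALIGNABLE_ANNOTATIONs and the
    child tier holds REF_ANNOTATIONs that point at them via ANNOTATION_REF.
    """
    root = ET.fromstring(eaf_xml)
    # Map time-slot IDs to their millisecond values.
    times = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers = {t.get("TIER_ID"): t for t in root.iter("TIER")}

    # Index child annotations by the ID of the parent annotation they refer to.
    translations = {
        ref.get("ANNOTATION_REF"): ref.findtext("ANNOTATION_VALUE", default="")
        for ref in tiers[child_tier].iter("REF_ANNOTATION")
    }

    pairs = []
    for ann in tiers[parent_tier].iter("ALIGNABLE_ANNOTATION"):
        pairs.append((
            times.get(ann.get("TIME_SLOT_REF1")),
            times.get(ann.get("TIME_SLOT_REF2")),
            ann.findtext("ANNOTATION_VALUE", default=""),
            translations.get(ann.get("ANNOTATION_ID"), ""),
        ))
    return pairs


# Hypothetical, heavily trimmed EAF document with placeholder text.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="TxR" LINGUISTIC_TYPE_REF="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>[Chatino transcription]</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="ft" LINGUISTIC_TYPE_REF="translation" PARENT_REF="TxR">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>[Spanish translation]</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""
```

Calling `extract_pairs(SAMPLE_EAF)` yields one tuple containing the start and end times in milliseconds, the transcription, and its translation. Keeping the times makes it possible to cut matching audio clips or subtitles for each utterance.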

The Tx - text tier was originally included to allow for further analysis of the raw text, but we have never used it. It could be used to reproduce the raw text with the surface prosodic sandhi realizations. The next two tiers, gn - gloss national (Spanish, in Mexico) and ps - part of speech, are designed for text tokenization. Ideally, if the segmentation is sound and the transcription and translation are accurate, tokenization on these tiers isolates the lexemes so that parts of speech, compounds, and morphology can be glossed. These items can then be extracted for a dictionary or lexicon. The notes tier is used for annotations on a given text. All of the tiers are linked to the TxR - raw text tier, which acts as the parent of all of the intermediate levels between the raw text and the notes.
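The lexicon extraction mentioned above could take roughly the following form. This is a sketch under stated assumptions: it presumes each utterance has already been tokenized into hypothetical `(form, gloss, pos)` tuples, one token from the raw-text tier aligned with its gn and ps annotations. The forms in the example (`x1`, `x2`) are placeholders, not Chatino words, and the helper names `build_lexicon` and `ambiguous_forms` are hypothetical.

```python
from collections import defaultdict


def build_lexicon(utterances):
    """Collect each token form with every (gloss, part-of-speech) pair it occurs with.

    `utterances` is a list of utterances, each a list of (form, gloss, pos)
    tuples. Returns {form: {(gloss, pos): count}}, so homophones or
    inconsistent glossing show up as several entries under one form.
    """
    lexicon = defaultdict(lambda: defaultdict(int))
    for utterance in utterances:
        for form, gloss, pos in utterance:
            lexicon[form][(gloss, pos)] += 1
    # Convert the nested defaultdicts to plain dicts for printing or export.
    return {form: dict(senses) for form, senses in lexicon.items()}


def ambiguous_forms(lexicon):
    """Forms glossed in more than one way: candidates for the review-revise step."""
    return sorted(form for form, senses in lexicon.items() if len(senses) > 1)


# Placeholder data: forms are invented labels, not real TEO lexemes.
utterances = [
    [("x1", "comer", "V"), ("x2", "tortilla", "N")],
    [("x1", "comer", "V"), ("x1", "comida", "N")],
]
lexicon = build_lexicon(utterances)
```

Here `lexicon["x1"]` records `("comer", "V")` twice and `("comida", "N")` once, and `ambiguous_forms(lexicon)` returns `["x1"]`. Counting rather than merely collecting glosses helps separate a genuine second sense from a one-off glossing error.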

The typical workflow for texts in this project, summarized in Table 1.3, draws on language consultants who work on the transcriptions in ELAN. Step 3 can be repeated as many times as necessary to reach an analysis adequate for archiving and continued descriptive work, and step 4 is ongoing.

Table 1.3: Text Workflow

Step 1: Record text (Reginaldo or Hugo Reyes)
Step 2: Transcribe and translate (Hugo Reyes)
Step 3: Review and revise (myself and Hugo Reyes)
Step 4: Analyze texts

Texts from the project were recorded, and will be archived, as 24-bit stereo .WAV files at a 44.1 kHz sampling rate, with accompanying MPEG2 video files. From the more than 30 hours of material recorded, I have worked closely with speakers to choose the best texts for the grammatical description in this dissertation. These materials will be archived and made available through ELAR and AILLA some time after the dissertation is completed. Texts will be archived in the XML format exported from ELAN, along with the .WAV and MPEG2 files.
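The audio specification above determines storage needs directly. Uncompressed PCM takes sample rate × channels × bytes per sample bytes each second. The sketch below, using a hypothetical helper `pcm_bytes`, computes this for 24-bit stereo at 44.1 kHz. The figures exclude the small WAV header and all video. They also assume, for illustration only, that the full 30 hours were kept as WAV audio. Sizes are decimal (1 GB = 10^9 bytes).

```python
def pcm_bytes(seconds: int, sample_rate: int = 44_100, bit_depth: int = 24, channels: int = 2) -> int:
    """Size in bytes of uncompressed PCM audio, excluding any file header."""
    bytes_per_sample = bit_depth // 8  # 24-bit samples occupy 3 bytes
    return seconds * sample_rate * channels * bytes_per_sample


if __name__ == "__main__":
    print(f"{pcm_bytes(1):,} bytes per second")          # 264,600 bytes per second
    print(f"{pcm_bytes(3600):,} bytes per hour")         # 952,560,000 bytes per hour (~0.95 GB)
    print(f"{pcm_bytes(30 * 3600):,} bytes for 30 h")    # 28,576,800,000 bytes for 30 h (~28.6 GB)
```

At roughly 0.95 GB per hour of audio alone, 30 or more hours of recordings fit comfortably on a single external drive, although the MPEG2 video will add substantially to the total.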
