Incorporating Corpus Data into an Advanced Academic Thesis Writing Course

Nilgün Hancıoğlu

Submitted to the

Institute of Graduate Studies and Research

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

in

English Language Teaching

Eastern Mediterranean University

January 2009, Gazimağusa

Approval of the Institute of Graduate Studies and Research

________________________________ Prof. Dr. Elvan Yılmaz

Director(a)

I certify that this thesis satisfies the requirements as a thesis for the degree of Doctor of Philosophy in English Language Teaching.

________________________________ Assist. Prof. Dr. Fatoş Erozan

Chair, Department of English Language Teaching

We certify that we have read this thesis and that in our opinion it is fully adequate in scope and quality as a thesis for the degree of Doctor of Philosophy in English Language Teaching.

________________________________ Assoc. Prof. Dr. Gülşen Musayeva Vefalı

Supervisor

Examining Committee

___________________________________________________________________

1. Prof. Dr. Gürkan Doğan ______________________________

2. Prof. Dr. Ülker Vancı Osam ______________________________

3. Prof. Dr. Deniz Zeyrek ______________________________

4. Assoc. Prof. Dr. Necdet Osam ______________________________

5. Assoc. Prof. Dr. Gülşen Musayeva Vefalı ______________________________

ABSTRACT

Since English remains the primary language of science and research across the globe, many academics are required to produce research in a language that is not their own. This research has been motivated by the difficulties this presents for the post-graduate students at the Eastern Mediterranean University (EMU). The main aim of the study is to construct a comprehensive pedagogic corpus for such students, and to incorporate it into an advanced academic thesis writing course. To this end, a learner abstract corpus (LAC) and a target abstract corpus (TAC) were compiled respectively from work produced by post-graduate students at EMU, and from abstracts written by post-graduate students in English speaking countries. Both quantitative and qualitative methods were utilized to analyse the corpora. The comparison of the corpora exhibited extensive use of higher frequency vocabulary, a tendency to repeat similar items, and recurrent inadequacy in using appropriate collocations and lexico-grammatical patterns in the LAC. The work in the TAC, however, demonstrated the use of a wider range of lower frequency words, as well as a more varied lexico-grammatical utilization of these items. Accordingly, a pedagogic corpus was constructed. This corpus includes the Academic Abstract Corpus (AAC) Bank, which offers alternative lexico-grammatical patterns for fulfilling the required generic moves and sub-moves in abstract and thesis writing; the TAC Wordlist of 165 key words for thesis writing; the web concordances of the LAC and the TAC; and a variety of teacher-led data-driven and learner-led discovery tasks as well as other diverse academic writing resources. The corpus-informed course is mounted on Moodle, a virtual learning platform founded on social constructivist principles. The study produced major conclusions regarding corpora, wordlists, and lexico-grammatical patterns, the broader implications and applications of which are explored from a range of perspectives.

Keywords: corpus, pedagogic corpus, learner corpus, lexico-grammar, genre, academic writing

SUMMARY

Because English is today the language of science and research throughout the world, many academics are required to produce research in a language that is not their own. This study was inspired by the difficulties that students pursuing post-graduate education at Eastern Mediterranean University experience in this context. The main aim of the research is to create a comprehensive pedagogic corpus for such students and to incorporate it into an advanced thesis writing course. To this end, two corpora of thesis abstracts were compiled, one written by post-graduate students at Eastern Mediterranean University and the other by students who completed post-graduate studies at universities in countries where English is the native language, and both qualitative and quantitative methods were used in analysing them. These corpora were named the learner abstract corpus (LAC) and the target abstract corpus (TAC). The comparison of the two corpora showed that, in the LAC abstracts, words from the highest-frequency wordlists were used extensively, similar words were repeated, and collocations and word groups were often not used correctly. The analysis of the TAC abstracts, on the other hand, showed that rarer words were used frequently, and used correctly with their collocates and within word groups. The comprehensive pedagogic corpus created on the basis of these findings consists of a bank (AAC Bank) of the collocations and word groups needed to realise the moves used in abstract and thesis writing, the 165 word families most used in abstract writing (TAC list), web concordances of the two corpora used in the analysis, various kinds of computer- and data-supported teacher-led and learner-led activities, and a range of academic writing resources. The corpus-supported advanced thesis writing course has been moved to a virtual interactive web environment (Moodle) based on the principle of social constructivism. This study has produced important conclusions concerning corpora, frequency-based wordlists, and collocations and word groups. The implications and areas of application of these conclusions are examined in the study from different perspectives.

Keywords: corpus, learner corpus, pedagogic corpus, collocations and word groups, genre, academic writing

ACKNOWLEDGEMENTS

In a unique corpus-based study of dissertation acknowledgements, Ken Hyland remarks that completing a thesis is a long and difficult task, and that many students see “an acknowledgement as an important way of publicly recognizing the role of mentors and the sacrifices of loved ones” (2004b, p. 306). As I have now reached this concluding stage, I can understand these sentiments only too well.

I would like to express my sincere gratitude to my supervisor, Assoc. Prof. Dr. Gülşen Musayeva Vefalı, for being my constant guide and mentor, and for providing me with invaluable feedback by reading my numerous drafts not merely line by line, but word by word, to ensure that there were no gaps or omissions. I genuinely appreciate her tireless work and support.

I am extremely grateful to the members of the thesis monitoring committee, Prof. Dr. Gürkan Doğan and Assoc. Prof. Dr. Necdet Osam. It is thanks to their comprehensive, thoughtful, and highly useful feedback that this research developed in the way that it has. I would also like to thank the members of the jury, Prof. Dr. Ülker Vancı Osam, Prof. Dr. Deniz Zeyrek and Assist. Prof. Dr. Ali Sıdkı Ağazade for their valuable comments towards the improvement of this thesis.

I decided to conduct corpus-related research in 2002. It was during this period that I first read Averil Coxhead’s article on the development of the Academic Word List. Her article then led me to Prof. Dr. Tom Cobb’s extraordinary Lexical Tutor Website, the riches of which have generously been made freely available to us all. I would therefore like to acknowledge the genuinely inspirational role of both these researchers in the realization of this study.

I would like to thank all those friends that accompanied me on the long road to this PhD. It was fun because of them. Special thanks go to my closest companion, Elmaziye Özgür Küfi, with whom I went through thick and thin.

My colleagues at the Department of General Education have been extremely understanding and helpful. I would like to thank all of them, especially Ayfer Şen and Yeşim Dede for helping to ensure that time and space were made available for me to focus on my research, particularly in the last period, and Aytül Dereboylu for being by my side when I most needed support and a helping hand.

I should also thank all those ENGL501 students who have taken this course over the years, for bearing with me as the course evolved, and for willingly taking time out of their own busy research schedules to assist me further when I asked. This research and the accompanying course have been developed on their behalf, and their positive responses and appreciation therefore make me feel immensely proud and privileged.

I would like to thank my whole family, and especially my sister, Berna, my brother, Attila, my aunt and uncle, Jennifer and Engin Kemal Örek, and my niece and three nephews for their encouragement and support. Special thanks go to Özbil Ege, who showed a genuine interest in my work despite his young age.

This has indeed been a long and stressful journey. I would like to express my deepest gratitude to my beloved parents, Serpil and Özbil Hancıoğlu, for teaching me the meaning of hard work and sacrifice, for encouraging me, and believing in me not only through this period, but all my life. It really means a lot to me.

How can I ever thank my dear friend, Steve Neufeld, who was always there when I needed help, guidance, and advice? On this strenuous journey, it was wonderful to be accompanied by a true friend, and a positive personality, one who seems to lack words of negation in his mental lexicon.

I owe the greatest debt and gratitude to my dear husband, who looked after me, encouraged me, guided me with his wisdom, and always stood by me. It is a cliché to say “I could not have done it without you”. In my case, it is the essential reality. I could not have done it without you, John P. Eldridge.

CHAPTER I

INTRODUCTION

The skill of writing is to create a context in which other people can think.

Edwin Schlossberg

This research study adopts a corpus-informed approach to academic writing pedagogy, and employs two corpora with the aim of constructing a pedagogic corpus with multiple components that incorporate teacher-directed data-driven in-class work, complemented by a virtual learning environment providing access to the authentic corpus data and learner-led exploratory tasks. The pedagogic corpus incorporating manifold constituents is envisaged to assist non-native post-graduate students involved in research and publication in producing accurate and appropriate written texts.

This introductory chapter first provides the background to the study. It then describes in detail the problem that prompted this research and presents the purpose of the study. Following a discussion of the factors that make this research significant, the key terms used in the study are defined.

1.1 Background to the Study

English is the primary international language of research communication (Garfield & Welljams-Dorof, 1990; Krashen, 2003; Swales, 1990), “today’s premier research language” (Swales, 2004, p. 33), and “indeed the international language of science” (Wood, 2001, p. 71). As the overall trend in many domains, including education and academic scientific research, is toward globalization, “more and more nonnative speakers are seeking to publish in international journals devoted to English language teaching, applied linguistics, and related areas” (Flowerdew, 2001, p. 121). Further, while 83% of the articles in the 1977 Science Citation Index were written in English (Krashen, 2003), by 1997 this figure had risen to 95%, and only half of these articles came from authors in English-speaking countries (Graddol, 1997). This increase was due not to more research being done by scholars in English-speaking countries, but to more scholars from non-English-speaking countries publishing in English (Krashen, 2003). It is clear, therefore, that non-native speakers do write ‘a considerable number’ of research articles even in the “most prestigious journals in science” (Wood, 2001, p. 80).

Swales (2004) refers to today’s ‘Anglophone research world’, and states that “the status and contribution of the non-native speaker of English has become somewhat more central than it used to be and increasingly (albeit slowly) is perhaps recognized as such by native speakers of English” (p. 52). He further holds that by the beginning of the 21st century, there has been some internationalization of the research world, and the role of ‘non-Anglophones in that Englishized world’ has gained greater recognition (Swales, 2004, p. 46). According to Wood (2001), to become a member of this world, the scientist needs to produce research accepted by the community. “The more central the claim and the more widely accepted by the community, the more central a member of the community the researcher becomes” (Wood, 2001, p. 81). For the achievement of these aims, Wood (2001) claims, “the researcher must deploy a skilful use of language” (p. 81).

The statistics and observations above clearly indicate that non-native speakers of English are under increasing pressure both to follow the latest research and, probably even more so, to have their own research published. Non-native speakers of English “risk being unaware of- and overlooked by- mainstream international research unless they learn to read, write, and publish in English” (Garfield & Welljams-Dorof, 1990). Given the speed of change and development in all aspects of science, technology, and research in general, and the generally accepted view that a common language is needed to disseminate research findings efficiently, it is unlikely that the position of English as the dominant research language will diminish in the near future. Wood (2001) states that:

for scientists to become recognized and successful their work must be read and cited by their peers as frequently as possible. To ensure such citation it is imperative that their work be accessible to as many as possible and thus that it be written in English. (p. 71)

Hence, non-native speaker researchers and academics would seem to have little choice but to continue trying to master the prevailing conventions of academic English. This may not be a major problem for experienced non-native speaker (NNS) researchers, as “experienced NNS writers are familiar with the discourse requirements of their discipline” (Wood, 2001, p. 77). Nevertheless, Wood (2001) acknowledges that the beginning writer faces difficulties in publishing research, and cites Canagarajah (1996), Jernudd and Baldauf (1987), St John (1987), and Swales (1990, 1996), who have emphasized non-native writers’ difficulties in publishing in English (p. 77).

There seems to be little question, then, that non-native speakers of English need and will continue to need a lot of guidance and support in developing acceptable performance levels in reporting research. This is more so for beginning researchers. Whether or not they have traditionally been provided with this support is a different issue. Swales, for example, points out that there is little research on “how non-native speakers of English manage to survive in an increasingly English-dominated research world” (1990, p. 102).

Hyland (2004a) also agrees that “in an era of globalisation, English is now established as the world’s leading language for the dissemination of academic knowledge” (p. ix). He further emphasizes that:

whether we see this as a facilitative lingua franca or a rampaging Tyrannosaurus rex (Swales, 1997), the dominance of English has transformed the educational experiences and professional lives of countless students and academics across the planet. (Hyland, 2004a, p. ix)

Writing, therefore, has become a central element of university courses, as well as professional development programs, which has necessitated an understanding of “what these discourses of the academy are, and what counts as ‘good writing’” (Hyland, 2004a, p. x).

According to Grabe and Kaplan (1996), “academically valued writing requires composing skills which transform information or transform the language itself” (p. 17). Therefore, writing, particularly the more complex composing skill appreciated in the academy, “involves training, instruction, practice, experience, and purpose” (Grabe & Kaplan, 1996, p. 6). Conventionally, English for Academic Purposes (EAP) classes have offered academic language support, especially to university students. Yet, these courses have generally tended to focus on the general needs of students involved in academic studies, and catered more for university students at undergraduate level, who are not expected to carry out or publish research. However, post-graduate candidates who are engaged in conducting and disseminating research have more sophisticated needs in terms of language knowledge and related skills, the most important of which is producing cohesive and coherent written text. Hyland (2005) holds that EAP teachers should do more research in their classes to better understand their teaching context (p. 60). This need is even more pressing for EAP teachers of post-graduate students, who have very specialized needs and who need a lot of guidance and support in attaining a language level at which they can report their research and compete in the international academic discourse community.

Written text is “the product of a series of complicated mental operations” (Clark & Clark, 1977, cited in Richards, 1990, p. 101), and is not easy to construct. After deciding on a meaning to be conveyed, writers must consider the genre, the style they are going to employ, the purpose they want to achieve, and the amount of detail required to achieve it (Richards, 1990, pp. 101-102). Nunan (1999) agrees that “producing a coherent, fluent, extended piece of writing is probably the most difficult thing there is to do in language” and that “it is something most native speakers never master”. He also acknowledges the enormity of this challenge for second language learners, “particularly for those who go on to a university and study in a language that is not their own” (p. 271).

One very important consideration in text creation is that language does not exist in a vacuum, but is a social phenomenon used for social interaction. Gumperz (1968, p. 219) emphasizes this fact by referring to verbal interaction as “a social process in which utterances are selected in accordance with socially recognized norms and expectations”. He states that “the communication of social information presupposes the existence of regular relationships between language usage and social structure” (Gumperz, 1968, p. 220). The fact that language use is closely related to the social context naturally leads to the concept of ‘genre’.

Hyland characterizes genres as “socially recognized ways of using language” (Johns et al., 2006, p. 3). For Swales, a genre is “a class of communicative events, the members of which share some set of communicative purposes” (1990, p. 58), and this purpose determines the genre’s ‘generic’ (Flowerdew, 2000; Halliday & Hasan, 1985; Henry, 2007; Nunan, 1993), ‘organizational’ (Flowerdew, 2000), ‘discourse’ (Swales, 1990), ‘generic move’ (Flowerdew, 2000), or ‘schematic’ (Swales, 1990) structure. This structure is achieved through units of purpose, called ‘moves’ (Swales, 1990) or ‘move structures’ (Flowerdew, 2000), which are fulfilled by lexico-grammar (Henry, 2007, pp. 1-2). Key lexical phrases represent the move structures of a genre (Flowerdew, 2000, p. 374). Moves, in turn, are realized through different ‘strategies’ or ‘tactics’ (Henry, 2007), which are tactical selections of the writer in accomplishing the purpose (Bhatia, 1993, p. 19). These tactics or strategies similarly necessitate the exploitation of lexico-grammar. Therefore, it can be concluded that lexico-grammar has a major function in the fulfillment of strategies or tactics leading to moves, which in turn form the generic structure of a genre, and thereby reflect its communicative purpose.

The major role lexico-grammar plays in text creation requires a thorough analysis of lexico-grammatical features employed to fulfill different communicative purposes in texts, and this comprehensive analysis is nowadays viable through the use of a corpus, “a collection of naturally-occurring language text, chosen to characterize a state or a variety of a language” (Sinclair, 1991, p. 171). Referring to the late 1950s, Leech (1991) recalls that “for years, corpus linguistics was the obsession of a small group which received little or no recognition from either linguistics or computer science” (cited in Granger, 2003, p. 538). Due to the recent developments in computer technology, however, it is now possible for anyone to store large amounts of language data on a computer for analysis. Like many other scholars and researchers, Hunston holds that “corpora, and the study of corpora, have revolutionized the study of language and the applications of language” (Hunston, 2002, p. 1). Referring to the emergence of this new view of language and the use of technology related to it, Sinclair points out that “the analysis of language has developed out of all recognition” (1991, p. 1).

In the last two decades, extensive vocabulary research has been carried out using corpora. In the 1980s, the English Language Research group at the University of Birmingham collaborated with Collins publishers to create language reference works, and the Collins Cobuild dictionary project produced new research areas in the study and teaching of languages (Sinclair, 1991, pp. 1-3). Extensive research has also been done on compiling wordlists based on frequency, as research shows that most English texts are covered by a limited number of words, with the most frequent 2,000 word families making up 79.7% of all text (Cobb, 2002). Building on West’s (1953) manually compiled General Service List (GSL) of the 2,000 most commonly used words in English, Coxhead (2000) produced the Academic Word List (AWL), which is based on a corpus of academic texts from different fields and consists of the most frequent 570 word families not included in the GSL. More recently, Billuroglu and Neufeld (2005), in an attempt to tackle the weaknesses they observed in the GSL and the AWL, compiled the BNL (Billuroglu Neufeld List), which is based on an analysis of six different wordlists and comprises 2,709 word families. These lists are invaluable in teaching English, especially for academic purposes, and are, undoubtedly, indispensable resources for vocabulary recognition and tools for staged vocabulary development. However, they are criticized for treating words as isolated units, and for separating lexis and grammar. Wordlists, however useful they are as resources in language learning, cannot be the sole resource to rely on for productive vocabulary teaching purposes, as in real life, words are never encountered or produced in isolation, but in a social context.
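The notion of text coverage behind such frequency-based lists can be illustrated with a minimal sketch. It assumes a very simple tokenizer and matches individual word forms rather than word families, which is a simplification of how the GSL, AWL, and BNL are organized. The names `tokenize`, `coverage`, and `HIGH_FREQ` are hypothetical, and the tiny wordlist is a stand-in, not an extract from any of the lists discussed above.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def coverage(text, wordlist):
    """Return the proportion of running words in `text` found in `wordlist`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in wordlist)
    return covered / len(tokens)


# Hypothetical stand-in for a high-frequency wordlist (illustration only).
HIGH_FREQ = {"the", "of", "this", "study", "is", "to", "a", "in"}

sample = "The aim of this study is to construct a pedagogic corpus."
print(f"Coverage by the stand-in list: {coverage(sample, HIGH_FREQ):.1%}")
```

A figure such as the 79.7% reported above corresponds to this kind of proportion computed over a large body of text with a full list of word families; the sketch only shows the arithmetic involved.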

As regards grammar, the belief is that corpus-based studies have the potential to revolutionize grammar teaching in the 21st century through providing register-specific descriptions of English grammar, shifting the emphasis from structural accuracy to appropriate use of structures, and most importantly, incorporating grammar teaching with the teaching of vocabulary (Conrad, 2000, p. 549). Extensive research employing corpora is also being increasingly carried out by genre analysts, especially by researchers involved in EAP. In addition to studies that focus on the generic move structure of many different kinds of genres, there has also been considerable research concerned with how different moves are achieved through language, i.e., lexico-grammar (Bonn & Swales, 2007; Flowerdew, 2000; Henry, 2007; Hyland, 2004a; Hyland & Tse, 2005; Ozturk, 2007; Paltridge, 2002; Weber, 2001). However, in spite of the wealth of research in this field, there are few studies analyzing the problems that non-native post-graduate students face in producing coherent and appropriate language to write their theses and to publish in internationally recognized journals. In addition, although there are books (Swales & Feak, 1994), websites (http://www.uefap.com/index.htm), and university academic writing centres trying to provide support for post-graduate students in writing their theses, there has been insufficient attention to thesis writing instruction.

As already stated, EAP practitioners need to do more research in the classroom so as to be able to acquire a better understanding of the teaching-learning experiences, and provide continuous and up-to-date support and guidance to students. The need, therefore, is to focus more on classroom practices, and exploit corpora not only for research but also for pedagogic purposes.

1.2 Statement of the Problem

In academic settings, especially at post-graduate levels, non-native speakers of English are faced with a serious problem. They are specifically expected to produce work at a native-like level to be admitted into the academic discourse community. As the conventions in text types determine the intertextuality of texts, creating texts should not be considered an “individually-oriented, inner directed cognitive process….but an acquired response to the discourse conventions which arise from preferred ways of creating and communicating knowledge within particular communities” (Swales, 1990, p. 4). These communities are known as discourse communities and they are:

recognized by the specific genres that they employ, which include both speech events and written text types. The work that members of the discourse community are engaged in involves the processing of tasks which reflect specific linguistic, discoursal and rhetoric skills. (Swales, 1990, p. vii)

This study focuses on the academic genre of ‘theses’. ENGL501 (currently Advanced Thesis Writing) is an advanced writing course offered to Master’s and PhD candidates from all faculties by the Modern Languages Division of the Department of General Education at Eastern Mediterranean University (EMU), an English-medium university in North Cyprus. The students taking this course come from a variety of countries and backgrounds. The original EFL 501 course was designed by the former School of Foreign Languages at the request of the EMU Graduate Institute to give Master’s and PhD students language support at the thesis writing stage: although such support was provided for undergraduate students, post-graduate students had previously received no language guidance.

The course was first designed to focus on the common language functions and lexis in academic writing prior to thesis writing. Gradually it evolved into a thesis writing course with a language focus. Currently, the aims of the course are specified in the course description as follows: The purpose of this course is to develop the academic thesis writing knowledge and skills of post-graduate candidates. The course focuses on improving the participants’ academic study skills, and their knowledge of academic conventions, and thesis structure and format. It is also the objective of the course to systematically develop post-graduate candidates’ academic vocabulary knowledge and skills, to develop their awareness of the need and benefit of producing multiple drafts with the aim of improving the structure, lexis and style of their own text, and bringing their work to an acceptable level.

The participants of the thesis writing course observe and analyze the organization, the discourse structure, the grammar and lexis of different sections or chapters of authentic theses, and how they are made cohesive and coherent through the use of lexico-grammatical devices. The post-graduate candidates then produce their own work and are encouraged to identify their own problems with language use, and find solutions to those problems. To this end, they work on multiple drafts, which make up their end-of-semester portfolio, until the end-product is adequate.

Although the participants are given guidance and support in terms of the moves making up the generic structure of the thesis in accordance with the genre-based approaches, the quality of most of their work reveals a gap between actual and target performance levels in producing coherent text. The main problem hindering the production of coherent and appropriate texts seems to be the participants’ insufficient knowledge of the lexico-grammatical resources necessary for meaning creation. This problem possibly arises from insufficient exposure to, and lack of awareness of collocations, and syntagmatic and paradigmatic relations that are of major significance for creating meaning. This problem is consistent with the findings in the literature. For example, Hunston cites Halliday and Martin (1993) who emphasize that non-native writers use “fewer of the lengthy noun phrases that are essential to formal, particularly academic, writing in English” due to the fact that they do not use prepositions in a ‘native-like’ way (2002, p. 82).

Jordan (1997) also maintains that “written work has been referred to as being one of the major causes of concern for students” (p. 46), and reports a study (Jordan, 1981) exploring the writing difficulties of overseas post-graduate students taking writing classes in UK universities. The results showed that these students had the most difficulty in vocabulary (62%), followed by style (53%), spelling (41%), grammar (38%), punctuation (18%) and handwriting (12%) respectively. When asked what caused the difficulty in vocabulary, the students stated ‘using a word correctly’ (21%), ‘own lack of vocabulary’ (15%), and ‘confusion caused by similar sounding/looking words’ (12%). As regards the difficulties with grammar, the students reported ‘verbs: tense formation and use; active / passive use’ and ‘agreement of verb and subject’ (Jordan, 1997, pp. 46-47). As can be observed from the percentages, the greatest difficulty in text creation seems to be related to lexico-grammar.

The problem that has led to the present study can therefore be summarized as follows. Like all graduate students worldwide, the students pursuing a post-graduate degree at EMU are expected to produce coherent and appropriate academic texts, so that their work can be accepted in the global academic discourse community, and so that they can disseminate their research internationally. However, most of the work produced by the post-graduate candidates taking ENGL501 reveals problems specifically at the lexico-grammatical level.

1.3 Purpose of the Study

This study employs a corpus-informed approach (McCarthy, 2001) whereby the applied linguist can “mediate the corpus, design it from the very outset and build it with applied linguistic questions in mind, ask of it the questions applied linguists want answers to, and filter its output, use it as a guide or tool for what you, the teacher, want to achieve” (p. 129). Extracting lexico-grammatical information from a corpus applies this approach (McCarthy, 2001, p. 138).

The main aim of the study is to construct a pedagogic corpus for ENGL501 that incorporates various corpus-informed components. The key component of the pedagogic corpus is a bank of lexico-grammatical patterns to fulfill the generic moves that constitute the overall generic structure of thesis abstracts. In the pedagogic corpus, there are also tasks for teacher-directed data-driven in-class work, and a complementary web-based interactive platform (Moodle) to provide access to the authentic data, the corpora, and to promote learner-led exploratory work. The complementary platform is a virtual classroom, with all the features of a traditional classroom and more, which is expected to increase the post-graduate students’ exposure to the pedagogic corpus. Moreover, the increased interaction with the data, the tasks, peers and the teacher is anticipated to maximize the participants’ learning opportunities. It is envisaged that the construction of a comprehensive corpus-informed advanced thesis writing course will assist the post-graduate students involved in research and publication in creating coherent academic texts, and therefore help to minimize the gap between the current and the target performance levels. Through the authentic corpus data and the data-driven tasks, the students are expected to observe the use of language themselves, and become language researchers, or ‘language detectives’ (Johns, 1997).

The two corpora incorporated into the pedagogic corpus are constructed from thesis abstracts. One of the reasons for this choice is that abstracts do not normally include quotations and paraphrases, and the language is expected to be the writers’ own. The second reason is that abstracts are miniature forms of research studies. The scientific research article has a particular type of rhetorical pattern which is reflected through the Introduction-Method-Results-Discussion (IMRD) format (Swales, 1990). Although there may be variations across different disciplines, Wood (2001) holds that these rhetorical conventions “are so accepted and so standard that they are often given in journal guidelines to contributors” (p. 74). In the same vein, according to Swales, the abstract, like other genres reporting research, also seems to have an
IMRD structure (1990, p. 181). This structure reflects the main chapters of the thesis: Introduction, Methodology, Analysis, and Conclusion. Therefore, it is anticipated that the analysis of abstracts in this study will reveal language data that are relevant to the thesis as a whole.

For this study, two corpora are compiled: a learner corpus of abstracts written by about 100 non-native participants as a representative sample of the whole ENGL501 population (LAC: Learner Abstract Corpus), and a specialized target corpus of abstracts from universities in countries where English is the native language (TAC: Target Abstract Corpus). The abstracts in the target corpus are also produced by learners, not experts. Flowerdew (2000) draws attention to the importance of providing good ‘apprentice’ models rather than ‘expert’ generic models, as the latter are difficult to replicate due to learners’ communicative and linguistic deficiencies.

Both corpora are analyzed through computer-based tools: RANGE (http://www.vuw.ac.nz/lals/staff/paul-nation/nation.aspx) for range and frequency, and Concordance (http://www.concordancesoftware.co.uk/) and AntConc (http://www.antlab.sci.waseda.ac.jp/software/README_antconc3.2.1.txt) to explore the syntagmatic and paradigmatic relations of words. The learner abstract corpus (LAC) is analyzed to identify the most common lexico-grammatical problems in the academic work produced by the post-graduate candidates enrolled in the advanced thesis writing course. Then the target abstract corpus (TAC) is analyzed to extract the targeted lexico-grammar used for fulfilling the strategies and moves within the generic structure of a thesis, and to compose a bank of moves and sub-moves. The data are integrated into the pedagogic corpus through both teacher-directed data-driven and learner-led discovery work. Through various task-based activities, the
participants are provided with the opportunity to enrich their lexico-grammatical knowledge, and produce coherent and appropriate academic text. The study will seek answers to the following research questions:

1. What are the major lexico-grammatical patterns identified in the LAC?

2. What are the major lexico-structural patterns in the TAC?

3. How does the LAC relate to the TAC?

4. What does the cross-examination of the two corpora necessitate in terms of the comprehensive pedagogic corpus design?
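The range, frequency, and concordance analyses applied to both corpora can be illustrated with a minimal Python sketch. This is not the RANGE, Concordance, or AntConc software; the function names, the simple tokenization rule, and the three sample sentences are assumptions made purely for illustration and are not drawn from the LAC or the TAC.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (ASCII letters only)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def frequency_and_range(texts):
    """Return two counters over a list of texts.

    frequency: total occurrences of each word across the corpus
    range:     number of texts in which each word occurs at least once
    """
    freq, rng = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        freq.update(tokens)
        rng.update(set(tokens))
    return freq, rng


def kwic(texts, keyword, width=3):
    """Return key-word-in-context lines with `width` tokens on each side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample sentences, for illustration only.
sample_corpus = [
    "This study aims to investigate the effects of feedback on writing.",
    "The aim of this study is to examine learner attitudes.",
    "Results indicate that the study group improved.",
]

freq, rng = frequency_and_range(sample_corpus)
print(freq["study"], rng["study"])
for line in kwic(sample_corpus, "study"):
    print(line)
```

Reading the concordance lines vertically shows what can replace what in a given position (paradigmatic relations), while reading them horizontally shows what co-occurs with the key word (syntagmatic relations).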

1.4 Significance of the Study

Post-graduate students pursuing Master’s and PhD degrees are required to follow the latest international developments in their fields, get their research articles published in international journals and present at conferences. Authenticity and high performance standards of their academic work are of primary significance. Writing even in the mother tongue is no easy task. In a foreign language, text creation becomes a major challenge. Hence, in academic environments, EFL learners have to compete with their native peers in the international arena not only in terms of the quality and relevance of their research, but also in the coherent and appropriate manifestation of their work. Due to the recent developments in computer technology which have made possible the compilation of vast amounts of authentic data electronically, more and more studies across the globe are making use of corpora not only to provide better descriptions of languages, but also to offer a new, and a more
effective way of learning languages. However, most of these studies “have focused on teaching a corpus approach per se rather than incorporating it into the writing process” (Yoon, 2008, p. 31). The current study is significant in that corpus data are integrated into the writing process, providing a data-rich environment where post-graduate students are exposed to authentic language use and engaged in a process of discovery learning. Furthermore, this is the first post-graduate study at EMU to make use of corpora.

This research is also significant in terms of the nature of the ENGL501 course. Most universities have Academic Writing Centers, academic writing courses, and research methods courses to assist their post-graduate students. There are not, however, many universities that offer language support for thesis writing to their post-graduate students. In fact, a search of all domains ending with '.edu.tr' using WebCorp produced no reference to any blended advanced thesis writing course at any university in Turkey or the Turkish Republic of Northern Cyprus.

Another factor making this research significant is that this pedagogic study incorporates the use of an e-learning platform, or a virtual learning environment, which is widely used in the world, including at the Open University in the UK (http://openlearn.open.ac.uk/index.php), but quite innovative on the EMU campus. This platform, Moodle (http://moodle.org/), which is based on strong underlying pedagogical principles, provides an environment where new knowledge is created through the individual’s interaction with the environment, as well as through individuals constructing things for one another

increased the participants’ exposure to the target language, and the target genre manifold.

The research is also noteworthy in that the researcher has continued to teach the course throughout the research and thesis writing process. This made it possible for the researcher to examine the difficulties of new groups of post-graduate students, and continuously revise the pedagogic corpus. Furthermore, she could observe the impact of the pedagogic corpus and its components on the course participants’ learning and performance.

1.5 Definition of Terms

Corpus:

Sinclair (1991, p. 171) provides the following definition for a corpus: “A corpus is a collection of naturally-occurring language text, chosen to characterize a state or variety of a language”. A similar definition is provided by Biber, Conrad, and Reppen (1998): A corpus “is a large and principled collection of natural texts” (p. 12), which is analysed both quantitatively and qualitatively (p. 5). Hunston (2002) also states that “a corpus is planned, …, and it is designed for some linguistic purpose. The specific purpose of the design determines the selection of texts” (p. 2). In this study, authentic abstracts from the World Wide Web, and from the thesis writing course participants were compiled into two corpora based on the required design principles and for a linguistic purpose, and analyzed both quantitatively and qualitatively to address the identified language-related problem and work towards its solution.

Learner Corpus:

A learner corpus comprises texts produced by learners of a language. A learner corpus is used to “identify in what respects learners differ from each other and from the language of native speakers ….” (Hunston, 2002, p. 15). The compilation of learner corpora is very recent; it started only in the 1990s (Granger, 2003, p. 538). O’Keeffe, McCarthy and Carter (2007) describe the compilation of learner corpora as a very important development, and acknowledge Granger as a ‘forerunner in the area’ (p. 23). Granger (2003) refers to a learner corpus as “an electronic collection of authentic texts produced by foreign or second language learners” (p. 538). The best known learner corpus is the International Corpus of Learner English (ICLE) (Granger, Dagneaux & Meunier, 2002). The present study utilized a learner corpus of abstracts produced by EFL post-graduate students.

Specialised Corpus:

For this corpus, particular types of texts are chosen. Therefore, “it aims to be representative of a given type of text. It is used to investigate a particular type of language” (Hunston, 2002, p. 15). In this study, the Target Abstract Corpus (TAC) is a specialized corpus as it is representative of post-graduate thesis abstracts, and is used to explore the lexico-grammatical patterns fulfilling moves and sub-moves in theses.

Pedagogic Corpus:

A pedagogic corpus “can consist of all the course books, readers, etc. a learner has used, plus any tapes, etc. they have heard” (Hunston, 2002, p. 16). In short,
according to Hunston, this corpus is comprised of “all the language a learner has been exposed to” (p. 16). Willis, on the other hand, provides a more comprehensive definition, and points out that a pedagogic corpus involves the texts that the learners have encountered, or will encounter (Willis, 2003, p. 165). He maintains that “learners process a set of texts to enable them to develop their own vocabulary and work out their own grammar of the language”, and this set of texts can be described as a pedagogic corpus (Willis, 2003, p. 163). According to Willis (2003), tasks are also components of a pedagogic corpus (p. 223). This study adopts Willis’ more inclusive definition of the pedagogic corpus.

Genre:

A genre is “a class of communicative events, the members of which share some set of communicative purposes” (Swales, 1990, p. 58). Stubbs (2002, p. 20) uses ‘genre’ and ‘text type’ interchangeably. However, according to Biber, “genre categories are determined on the basis of external criteria relating to the speaker’s purpose and topic; they are assigned on the basis of use rather than on the basis of form”, whereas “text types represent groupings of texts that are similar in their linguistic form, irrespective of genre” (1988, p. 170). Flowerdew and Peacock (2001) define ‘genre’ as “a particular type of communicative event which has a particular communicative purpose recognized by its users, or discourse community” (p. 15). This study adopts the definitions of ‘genre’ by Swales (1990), Biber (1988) and Flowerdew and Peacock (2001), and differentiates between ‘genre’ and ‘text type’. In this study, the genres of post-graduate theses, and specifically thesis abstracts, are explored.

Virtual Learning Environment:

A virtual learning environment is “a collection of integrated tools enabling the management of online learning, providing a delivery mechanism, student tracking, assessment and access to resources” (http://www.jiscinfonet.ac.uk/InfoKits/effective-use-of-VLEs). Moodle, which is employed in this study, is an open-source, free, and highly adaptable virtual learning environment offering a rich selection of features (Robb, 2004, p. 1).

CHAPTER II

REVIEW OF LITERATURE

This chapter aims to present the conceptual framework of the study through a comprehensive review of text and textuality, text creation and the importance of lexico-grammar, the significance of the discourse community and the concept of ‘genre’, and the features of academic texts in general and thesis abstracts in particular. After this exploration of ‘text as product’, writing pedagogy is reviewed as this study is pedagogical in nature and it is, therefore, essential to explore how text creation, i.e. writing in language teaching terms, is taught. Following an in-depth discussion of corpora, their relevance to language teaching pedagogy is assessed. The section on the use of corpora in language teaching incorporates a review of a closely related issue, Data-driven Learning (DDL), and the need for a platform to host and exploit corpora as well as DDL tasks. After the related research studies are reviewed, and their relevance to the present study explored, the chapter concludes with a summary of the literature review focusing on the implications for the present study.

2.1 Texts

2.1.1 Text and textuality

The concept of text has been extensively defined by linguists. Halliday and Hasan (1976) maintain that a text is not a collection of sentences, but realized through sentences, and a text needs to form a ‘unified whole’ to be considered as text. They
note that most teachers are sometimes unsure about whether their students’ compositions can be regarded as texts or not, and stress the fact that “the distinction between a text and a collection of unrelated sentences is … a matter of degree” (pp. 1-2).

What, then, is a text and what are the features and regularities through which textuality is achieved? Stubbs (1996) defines text as “an instance of language in use, either spoken or written: a piece of language behaviour which has occurred naturally, without the intervention of the linguist” (p. 4). Halliday and Matthiessen, on the other hand, consider “any instance of language, in any medium, that makes sense to someone who knows the language” (2004, p. 3) as text. For Nunan, text is “any written record of a communicative event” (1993, p. 6) and for Widdowson, “the product of the process of discourse” where, in written language, the writer is “part of the communication” (1996, p. 132). Halliday and Hasan provide the following definition for text: “any instance of living language that is playing some part in a context of situation” (1985, p. 10).

The common element in all these definitions is that text is an instance of language, a record, or a product of language in use, making a distinction between ‘text’ and ‘discourse’, the process of language in use. According to Stubbs (Hoey et al., 2007), text is a static, fixed product, and discourse is a dynamic, interactive process (p. 146). Likewise, Beaugrande and Dressler refer to ‘text’ as an ‘occurrence’, implying some sort of completion. According to them, a text is a “communicative occurrence which meets seven standards of textuality” (1981, p. 3). These seven standards are “the constitutive principles of textual communication and they define and create the
form of behaviour identifiable as textual communicating” (Beaugrande and Dressler, 1981, p. 11).

The first standard of textuality is cohesion, “the way in which the components of the surface text, i.e. the actual words we hear or see, are mutually connected within a sequence” (Beaugrande and Dressler, 1981, p. 3). According to Halliday and Hasan, “typically, in any text, every sentence except the first exhibits some form of cohesion with a preceding sentence, usually with the one immediately preceding”. That is, each sentence contains at least one anaphoric tie that links it with the previous one or ones (1976, p. 293). Nunan has a word of caution about cohesion. He holds that “the cohesive devices themselves do not create the relationships in the text; what they do is to make the relationships explicit” (1993, p. 27). In a similar vein, Beaugrande and Dressler emphasize that cohesion by itself is not sufficient, and for efficient communication, there should be interaction with the other standards of textuality (1981, p. 4). They point out that cohesion “is the function of syntax in communication” (1981, p. 48) and it relies on grammatical dependencies which are “major signals for sorting out meanings and uses” (1981, p. 3).

Coherence “concerns the ways in which the components of the textual world, i.e. the configuration of concepts and relations which underlie the surface text, are mutually accessible and relevant” (Beaugrande and Dressler, 1981, p. 4), and it is “the outcome of cognitive processes among text users” (Beaugrande and Dressler, 1981, p. 6). The foundation of coherence is the continuity of meaning among the knowledge stimulated by the expressions of the text (1981, p. 84). Stubbs refers to coherence as semantic unity or connectedness (1983, p. 9).

In addition to cohesion and coherence, which are text-centred notions, there are also ‘user-centred notions’ acting upon textual communication (Beaugrande and Dressler, 1981, p. 7). Two of these are ‘intentionality’ and ‘acceptability’. The text producer intends to produce a cohesive and coherent text in line with the objectives, and the text receiver accepts the text as cohesive and coherent and relevant for the objectives (Beaugrande and Dressler, 1981, p. 7). ‘Acceptability’ requires the text receiver to maintain cohesion and coherence by providing material, and tolerating disturbances as required (pp. 7-8). Text receivers support coherence through inferencing, thereby contributing to the sense of the text (p. 8).

‘Informativity’ is the fifth standard and “concerns the extent to which the occurrences of the presented text are expected vs. unexpected or known vs. unknown / certain” (Beaugrande and Dressler, 1981, pp. 8-9). Low informativity causes boredom, and even rejection of text. On the other hand, very high informativity puts too much burden on the receivers’ processing and may endanger communication (Beaugrande and Dressler, 1981, p. 9).

“The factors which make a text relevant to a situation of occurrence” are known as ‘situationality’. Through this standard, “the sense and use of the text are decided” and the situation helps to make sense of the text (Beaugrande and Dressler, 1981, pp. 9-10). ‘Intertextuality’, the seventh standard, “concerns the factors which make the utilization of one text dependent upon knowledge of one or more previously encountered texts”, and it is “responsible for the evolution of text types as classes of texts with typical patterns of characteristics” (Beaugrande and Dressler, 1981, p. 10). Although there are certain features that are common to all texts to be considered as
texts, there are also texts that share some common characteristics that distinguish them from other texts.

Beaugrande and Dressler consider these seven standards of textuality to be concerned with how occurrences are linked to others “via grammatical dependencies on the surface (cohesion), via conceptual dependencies in the textual world (coherence); via the attitudes of the participants towards the text (intentionality and acceptability); via the incorporation of the new and unexpected into the known and expected (informativity); via the setting (situationality); and via the mutual relevance of separate texts (intertextuality)” (1981, p. 37).

In addition to these constitutive principles, there are also ‘regulative’ ones that “control textual communication rather than define it” (Beaugrande and Dressler, 1981, p. 11). These are ‘efficiency’, ‘effectiveness’, and ‘appropriateness’ of a text. Efficiency refers to the use of a text with minimum effort by the participants. The effectiveness of a text is “its leaving a strong impression and creating favourable conditions for attaining a goal”. “The agreement between its setting and the ways in which the standards of textuality are upheld” is the appropriateness principle that regulates and controls a text (Beaugrande and Dressler, 1981, p. 11). According to Beaugrande and Dressler, “acceptability and appropriateness are more crucial standards for texts rather than grammaticality and well-formedness” (1981, pp. XIV-XV).

2.1.2 Text Creation

Nunan states that the creation of a written text is a complicated undertaking (1993, p. 2). An understanding of how textuality is achieved, therefore, initially requires an understanding of how language resources are used to create text, “the most extensive unit of meaning” (Halliday and Matthiessen, 2004, p. 566). Halliday and Hasan regard text “as a semantic unit; a unit not of form but of meaning” (1976, p. 2). Halliday and Matthiessen emphasize that it is important “to be able to think of text dynamically, as an ongoing process of meaning” (2004, p. 524). Beaugrande and Dressler maintain that “the text producer has the intention of pursuing some goal via the text” and thus, text creation is a sub-goal towards the main goal (1981, p. 39). Texts, then, are produced to achieve goals and to convey meanings, and the greatest challenge is whether or not the intended messages are coherently and appropriately communicated through the use of language since, as Beaugrande and Dressler point out, “knowledge is not identical with language expressions that represent or convey it” (1981, p. 85).

Having established that text creation is a means to an end, and the ultimate objective is to communicate via the text, it is worth examining how meaning is encoded through language. Widdowson proposes that “semantics is the complex interplay of morphology, lexis, and syntax” (1996, p. 61). They interact with each other to create meaning. Semantics is concerned with the meanings of words as lexical items (lexis), the meanings of derivational and inflectional morphemes (morphology) and how words are ordered (syntax) (Widdowson, 1996, p. 53). Morphology is concerned with “how morphemes operate in the processes of derivation and inflection” (Widdowson, 1996, p. 129). Derivation involves ‘lexical innovation’ or
‘formation’, i.e. the way words mean, and inflection is about ‘grammatical adaptation’, i.e. the way words function (pp. 47-48). Therefore, morphology is closely related to lexis and syntax. Widdowson concludes that although meaning is communicated by “the morphological and syntactic processes of word adaptation and assembly; … it is the words which provide the main semantic content” (1996, p. 54).

Morphological and syntactic processes together make up the study of grammar; how words are combined in sentences, and how they are adapted (Widdowson, 1996, p. 48). As grammar is concerned with word combinations and adaptations, it is impossible to think of lexis and grammar as two separate entities. McCarthy believes that there is no major distinction between vocabulary and grammar and “… any word in the language can be examined from the point of view of grammar, and, vice versa, any word, even words like articles and prepositions, can be considered as vocabulary items” (1990, p. 12).

Halliday and Matthiessen use the terms ‘lexicogrammar’ and ‘grammar’ interchangeably and argue that “grammar and vocabulary are not two separate components of a language- they are just the two ends of a single continuum”, and “the sound system and the writing system are the two modes of expression by which the lexicogrammar of a language is presented, or realized” (2004, p. 7). In lexicogrammar, according to Halliday and Hasan, there is “no hard-and-fast division between vocabulary and grammar; the guiding principle in language is that the more general meanings are expressed through the grammar, and the more specific meanings through the vocabulary” (1976, p. 5). Grammar is the fundamental processing unit of language (p. 21), and a resource for making meaning (Halliday
and Matthiessen, 2004, p. 31). Widdowson also considers grammar as a tool to express meaning. Grammar, he says, is important because of its communicative purpose. It serves to “adapt words morphologically and organize them syntactically so that they are more capable of encoding the reality that people want to express” (1996, p. 51).

Within lexicogrammar, system and structure are very important in the creation of meaning. Structure is the “syntagmatic ordering in language patterns, or regularities, in what goes together with what”. System, which is the paradigmatic ordering in language, involves “patterns in what could go instead of what” (Halliday and Matthiessen, 2004, p. 22). System and structure work together and “… each system- each moment of choice- contributes to the formation of the structure” (Halliday and Matthiessen, 2004, p. 23). Therefore, what goes together with what and what has the potential to go instead of what are very important in text creation and “a text is the product of ongoing selection in a very large network of systems- a system network” (Halliday and Matthiessen, 2004, p. 23).

Widdowson states that language elements combining with others along a horizontal dimension are in a syntagmatic relationship, and those that have the same potential to appear vertically in the same environment are in a paradigmatic relationship. The horizontal elements exist in combination; sounds or letters combine to form words, words combine to form phrases, phrases combine to form sentences. The vertical elements, on the other hand, exist in association; “when different forms have the same possibility of occurrence in a structure at a particular level, and are therefore equivalent in function, they are paradigmatically associated as members of the same class of items” (1996, pp. 33-34). According to Widdowson, this two-dimensional
mode of organization allows the generation of infinite expressions from finite means and “is the essential source of the creativity and flexibility …. of human language” (1996, p. 34).
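The interplay of the two dimensions can be illustrated with a short Python sketch. The sentence frame and word lists are invented for illustration and are not taken from Widdowson or from the corpora in this study. Each slot holds a small paradigm of interchangeable items, and sentences are formed by choosing one item per slot and combining the choices in a fixed left-to-right order.

```python
from itertools import product

# Each inner list is a paradigm: items that can replace one another
# in the same position. The word lists are hypothetical examples.
frame = [
    ["this", "the present"],                     # determiner slot
    ["study", "thesis"],                         # noun slot
    ["examines", "investigates", "explores"],    # verb slot
    ["learner writing", "thesis abstracts"],     # object slot
]


def generate(slots):
    """Pick one item per slot (paradigmatic choice) and join the picks
    in slot order (syntagmatic combination)."""
    return [" ".join(choice) for choice in product(*slots)]


sentences = generate(frame)
print(len(sentences))   # 2 * 2 * 3 * 2 combinations
print(sentences[0])
```

Nine items across four slots already yield 24 distinct sentences. The sketch shows only this combinatorial growth, not the unbounded productivity of natural language, which also depends on recursion.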

Halliday and Hasan argue that all components of the semantic system are realized through the lexicogrammatical system (1976, p. 6). Stubbs holds that “… messages are conveyed not only explicitly, by words themselves, but also implicitly, by lexical and syntactic patterning” (1996, p. 10). Morphological and syntactic processes, according to Widdowson, perform the function of extending word meanings, and so “constitute a communicative resource” (1996, p. 52). Therefore, although grammatical processes play a supportive role in organizing and adapting existing units of lexical meaning to requirements, they do not initiate meaning but “act upon meaning already lexically provided” (Widdowson, 1996, p. 55).

As lexis is the initiator of meaning and grammar organizes and changes lexical meaning according to needs through syntax and morphology, it would be meaningful to look at the major carrier of meaning in more detail. A lexeme or a lexical item is a “separate unit of meaning, usually in the form of a word, but also as a group of words” (Widdowson, 1996, p. 129). Sinclair holds that “lexical items are not always words, and each word may enter into a variety of relationships with others to realize lexical items” (2004, p. 161). Lexical words are the ‘content’ words of the vocabulary of a language, and “they can be viewed in terms of the relations in which they enter: paradigmatic relations (the options that are open to them) and syntagmatic relations (the company they keep)” (Halliday and Matthiessen, 2004, p. 38).

Ginzburg defines paradigmatic relations as those “that exist between individual lexical items which make up one of the subgroups of vocabulary items, e.g. sets of synonyms, lexico-semantic groups, etc.”, and holds, for example, that “the meaning of the verb to get can be fully understood only in comparison with other items of the synonymic set: get, obtain, receive, etc.” (1979, p. 46). Paradigmatically, words can form lexical sets. “They function in sets having shared semantic features and common patterns of collocation” and “typically, the semantic features that link the members of a lexical set are those of synonymy or antonymy, hyponymy and meronymy” (Halliday and Matthiessen, 2004, p. 40). Antonymy is “the sense relation of various kinds of opposing meaning between lexical items” (Widdowson, 1996, p. 125), and synonymy “the sense relation of equivalence of meaning between lexical items” (Widdowson, 1996, p. 131). Cohyponyms are “words that are subtypes of the same type” and comeronyms are words that are “part of the same whole” (Halliday and Matthiessen, 2004, p. 40). Hyponymy is characterized by Widdowson as “the sense relation between terms in a hierarchy, where a more particular term (the hyponym) is included in the more general one (the superordinate)” (1996, p. 128).

According to Ginzburg, “syntagmatic relations define the meaning the word possesses when it is used in combination with other words” (1979, p. 46). Syntagmatically, lexical items can form collocations, “the co-occurrence of lexical items in text” (Widdowson, 1996, p. 125) and “a tendency for words to occur together” (Sinclair, 1991, p. 71). Approaches to the semantic analysis of natural languages depend on the view that ‘lexical items are interrelatable’ (van Buren, 1975, p. 126). The probabilistic view, also known as the ‘collocational theory of lexical meaning’, was supported by the British linguist J. R. Firth (van Buren, 1975,