
Chapter 4 – Main study: materials and methods

4.4 Collecting the dataset

Sixty responses to each of the six prompts were collected from past administrations of the MELAB. For each of the six writing prompts, the aim was to collect 20 responses at each of three distinct levels of language proficiency. The figure of 20 responses per proficiency level followed advice from a statistics consultant at the Center for Statistical Consultancy and Research (CSCAR) at The University of Michigan, who recommended a minimum sample size of 20 within each cell of the matrix that makes up the overall dataset. This number allows trends to be observed at the narrowest level of interest (control for proficiency) within the dataset. Each response within the dataset was written by a different test taker; that is, there are 360 individual writers represented in the dataset.

Controlling for language proficiency helps to guard against the possibility that an apparent prompt effect is in fact attributable to subgroups within the sample population that differ in proficiency. For example, if the test takers who responded to one prompt happened to be at a particularly high level of proficiency, then the written products would likely reflect the test takers’ proficiency level. An analysis of these responses might indicate that the prompt elicits complex or sophisticated language, when the observed features should instead be attributed to the population and not the prompt. Hence, controlling for language proficiency within each prompt-specific sub-group was necessary to focus the study on the effect of the prompt characteristics.

Language proficiency was controlled for by identifying test takers for inclusion in the sample population based on their score on the grammar, cloze, vocabulary, and reading (GCVR) section of the MELAB (see Table 4.1). GCVR scaled scores are reported on a 0-100 point scale. Raw scores are converted to scaled scores and the scores across test forms (a unique form of the MELAB is administered once a month) are equated to ensure that scores are equivalent from form to form. The three levels of language proficiency that were identified for this dataset were as follows:

• Low proficiency band = GCVR 45-74
• Medium proficiency band = GCVR 75-84
• High proficiency band = GCVR 85-100
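The banding rule above can be expressed as a small lookup. The following is a minimal sketch, not the procedure used in the study: the function name `proficiency_band` and the treatment of scores below 45 (returned as `None`, i.e. outside the dataset) are assumptions made for illustration.

```python
from typing import Optional

# Inclusive GCVR scaled-score ranges for each band, as listed in the text.
BANDS = {
    "low": (45, 74),
    "medium": (75, 84),
    "high": (85, 100),
}


def proficiency_band(gcvr_score: int) -> Optional[str]:
    """Return the band name for a GCVR scaled score.

    Scores outside 45-100 belong to no band and yield None
    (an assumption; the text does not say how such scores were handled).
    """
    for name, (lower, upper) in BANDS.items():
        if lower <= gcvr_score <= upper:
            return name
    return None


print(proficiency_band(74))  # low
print(proficiency_band(75))  # medium
```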

Table 4.6 illustrates the total number of responses within the dataset, broken down by prompt and by proficiency band.

Table 4.6: Distribution of responses by prompt and proficiency band

| Prompt # | # of responses in low-proficiency band | # of responses in medium-proficiency band | # of responses in high-proficiency band | Total # of responses |
|---|---|---|---|---|
| 1 (95) | 20 | 20 | 20 | 60 |
| 2 (214) | 20 | 20 | 20 | 60 |
| 3 (100) | 20 | 20 | 20 | 60 |
| 4 (108) | 20 | 20 | 20 | 60 |
| 5 (115) | 20 | 20 | 20 | 60 |
| 6 (73) | 20 | 20 | 20 | 60 |

Dataset Total = 360
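The design in Table 4.6 (six prompts, three bands, 20 responses per cell, every response from a different writer) can be illustrated with a short selection routine. This is a sketch under stated assumptions only: the record layout `(writer_id, prompt, band)`, the function name `select_responses`, and the use of random selection within each cell are all hypothetical, since the text does not describe how individual responses were chosen from those available.

```python
import random
from collections import defaultdict

PROMPTS = [73, 95, 100, 108, 115, 214]  # MELAB prompt numbers from Table 4.6
BANDS = ["low", "medium", "high"]
PER_CELL = 20                            # target responses per prompt/band cell


def select_responses(pool, per_cell=PER_CELL, seed=0):
    """Pick `per_cell` writers for every prompt/band cell, using each writer once.

    `pool` is an iterable of (writer_id, prompt, band) tuples; this record
    layout is hypothetical.
    """
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for writer_id, prompt, band in pool:
        by_cell[(prompt, band)].append(writer_id)

    used, selected = set(), {}
    for prompt in PROMPTS:
        for band in BANDS:
            # Exclude writers already chosen for another cell.
            candidates = [w for w in by_cell[(prompt, band)] if w not in used]
            if len(candidates) < per_cell:
                raise ValueError(f"too few responses for prompt {prompt}, {band} band")
            chosen = rng.sample(candidates, per_cell)
            used.update(chosen)
            selected[(prompt, band)] = chosen
    return selected


# Hypothetical pool: 25 candidate writers in every cell (not study data).
pool = [(f"{p}-{b}-{i}", p, b) for p in PROMPTS for b in BANDS for i in range(25)]
dataset = select_responses(pool)
print(sum(len(v) for v in dataset.values()))  # 6 prompts x 3 bands x 20 = 360
```

Raising an error when a cell cannot be filled mirrors the constraint that every cell needs the full 20 responses.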

The sample population was not divided into sub-groups by scores on the writing section of the MELAB to avoid the risk of predetermining the writing features that could be found at each of the three proficiency levels. Using these scores to group responses to prompts into three bands would have predetermined, to some extent, the types of textual features within the dataset. Using GCVR scores to group responses into distinct proficiency bands helps guard against this possibility. In addition, the GCVR section is the most reliable section of the MELAB (r = 0.93-0.95; CaMLA, 2013) and hence provides a consistent measure of test takers’ language proficiency.

The comparability of GCVR scores across the six writing prompts within each proficiency group was checked using one-way ANOVA. The purpose of doing so was to check for significant differences, by writing prompt, in the GCVR scores of the individuals within each proficiency group. The results of the ANOVAs are shown in Tables 4.7 to 4.9 below.
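For readers unfamiliar with the procedure, the comparability check can be sketched as a one-way ANOVA computed directly from sums of squares. This is an illustrative implementation using only the Python standard library; it is not the software used in the study, the function name `one_way_anova` is hypothetical, and the example scores are invented. The p-value is omitted because it requires the F distribution, which the standard library does not provide.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_ratio).

    `groups` is a list of lists, one list of scores per writing prompt.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Invented scores for three prompts (illustration only, not study data).
example = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova(example))
```

With six prompts and 20 test takers per prompt, the degrees of freedom are 5 and 114, which matches the values reported in the tables that follow.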


Table 4.7 shows the ANOVA results for the dependent variable GCVR score for the low proficiency group. The independent variable is the writing prompt.

Table 4.7: ANOVA results for low proficiency group

| | Sum of squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 1109.5 | 5 | 221.9 | 5.5 | .0001 |
| Within Groups | 4595.8 | 114 | 40.3 | | |
| Total | 5705.3 | 119 | | | |

There was a significant effect of writing prompt on the GCVR scores of the low proficiency group, F(5, 114) = 5.5, p<.05. These data indicate that the GCVR scores of test takers in the low proficiency group differ significantly depending on the prompt to which they responded.

Table 4.8 shows the ANOVA results for the dependent variable GCVR score for the medium proficiency group. The independent variable is the writing prompt.

Table 4.8: ANOVA results for medium proficiency group

| | Sum of squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 80.1 | 5 | 16.0 | 1.82 | .1147 |
| Within Groups | 1004.9 | 114 | 8.8 | | |
| Total | 1084.9 | 119 | | | |

There was no significant effect of writing prompt on the GCVR scores of the medium proficiency group, F(5, 114) = 1.82, p>.05.

Table 4.9 shows the ANOVA results for the dependent variable GCVR score for the high proficiency group. The independent variable is the writing prompt.

Table 4.9: ANOVA results for high proficiency group

| | Sum of squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 98.9 | 5 | 19.8 | 2.2 | .0596 |
| Within Groups | 1027.6 | 114 | 9.0 | | |
| Total | 1126.6 | 119 | | | |

There was no significant effect of writing prompt on the GCVR scores of the high proficiency group, F(5, 114) = 2.2, p>.05.

These ANOVA results show that the GCVR scores did not differ significantly by writing prompt for the medium and high proficiency groups. However, that was not the case for the low proficiency group. For the low proficiency group, there were significant differences in GCVR scores by writing prompt.
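The F ratios reported for the three proficiency groups can be recomputed from the sums of squares and degrees of freedom given in Tables 4.7 to 4.9, since F is the between-groups mean square divided by the within-groups mean square. The short check below uses only the reported table values; the names `REPORTED` and `f_ratio` are introduced here for illustration.

```python
# (SS between, SS within, reported F) taken from Tables 4.7, 4.8 and 4.9.
REPORTED = {
    "low": (1109.5, 4595.8, 5.5),
    "medium": (80.1, 1004.9, 1.82),
    "high": (98.9, 1027.6, 2.2),
}
DF_BETWEEN, DF_WITHIN = 5, 114  # 6 prompts - 1; 120 test takers - 6 prompts


def f_ratio(ss_between, ss_within, df_between=DF_BETWEEN, df_within=DF_WITHIN):
    """F = mean square between groups / mean square within groups."""
    return (ss_between / df_between) / (ss_within / df_within)


for group, (ss_b, ss_w, reported_f) in REPORTED.items():
    print(group, round(f_ratio(ss_b, ss_w), 2), "reported:", reported_f)
```

The recomputed ratios agree with the reported F values to within rounding.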

Descriptive statistics for the GCVR scores of the low proficiency group on each of the six writing prompts are shown in Table 4.10.

Table 4.10: Descriptive statistics for low proficiency group GCVR scores

| Prompt | Mean GCVR score | SD | Minimum | Maximum |
|---|---|---|---|---|
| 73 | 70.75 | 2.77 | 65 | 74 |
| 95 | 68.15 | 5.15 | 52 | 74 |
| 100 | 69.3 | 5.13 | 52 | 74 |
| 108 | 63.15 | 5.69 | 52 | 70 |
| 115 | 68.7 | 5.23 | 52 | 74 |
| 214 | 62.85 | 11.03 | 46 | 74 |
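Statistics of the kind shown in Table 4.10 can be produced per prompt with the Python standard library, as sketched below. The scores in the example are invented, the function name `describe` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which form of the standard deviation was reported.

```python
from statistics import mean, stdev


def describe(scores):
    """Return mean, sample SD, minimum and maximum for one prompt's scores."""
    return {
        "mean": round(mean(scores), 2),
        "sd": round(stdev(scores), 2),  # sample SD (n - 1); an assumption
        "min": min(scores),
        "max": max(scores),
    }


# Invented scores for a single prompt (illustration only, not study data).
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```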

The descriptive statistics show that the GCVR score profiles for prompts 73 and 214 are quite different, with relatively large differences in range (65-74 versus 46-74) and standard deviation (2.77 versus 11.03). During data collection it was challenging to identify MELAB essays from test takers who had GCVR scores that were either very low or very high. Indeed, finding responses to specific prompts with GCVR scores below 70 was not at all simple. MELAB Writing prompts are administered on only a limited number of occasions to prevent exposure, and some prompts simply had few responses available; this was particularly problematic at the extreme ends of the scale. The aim of this study is to investigate the effect of specific prompts that most exemplify certain prompt characteristics. As a result, there was little option other than to use the responses available for prompts 73 and 214 for the low proficiency group.

This limitation in the dataset means that it would likely be inadvisable to compare the effects of the writing prompts on responses by proficiency band, especially for the low proficiency group. Given the significant differences in language proficiency within the low proficiency group, any differences in the written responses of these test takers may be attributable as much to the language proficiency differences as to characteristics of the different prompts. This limitation will be considered when analyzing the effect of the prompts on written responses for test takers at specific proficiency levels. Beyond this limitation to analyzing data across proficiency bands, the significant differences in GCVR scores in the low proficiency group do not present any risk to the main aim of the quantitative study, namely to investigate the effect of different prompt characteristics on the textual features of responses for the MELAB test population as a whole.

Table 4.11 shows the results of the ANOVA with the writing prompt as the independent variable and writing score as the dependent variable.

Table 4.11: ANOVA results for writing score awarded

| Writing score | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 235.356 | 5 | 47.071 | 1.149 | .334 |
| Within Groups | 14507.300 | 354 | 40.981 | | |
| Total | 14742.656 | 359 | | | |

There was no significant effect of writing prompt on the original score awarded, F(5, 354) = 1.149, p>.05. These data indicate that the writing prompts studied in this work elicit responses that do not differ significantly in writing score awarded.

The correlation (Spearman’s rho) between Writing scores and GCVR scores is 0.72, indicating a moderate to strong relationship between the two sets of scores. This suggests that GCVR scores are a good control for language proficiency but do not provide the same measurement information about the sample population as the Writing scores. That is, the findings of the ANOVA and correlation data indicate that GCVR scores are appropriate for use as a language proficiency control, with the caveat that there are significant differences between GCVR scores by prompt for the low proficiency group in the sample population.
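Spearman’s rho is the Pearson correlation computed on the ranks of the two variables, with tied values given the mean of the ranks they occupy. The sketch below shows that calculation using only the Python standard library; it is an illustration rather than the software used in the study, the function names are hypothetical, and the paired scores in the example are invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed values of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Invented paired scores (illustration only, not study data).
gcvr = [52, 60, 71, 75, 80, 88, 93]
writing = [67, 65, 73, 77, 75, 85, 87]
print(round(spearman_rho(gcvr, writing), 2))
```

A rank-based coefficient is a reasonable choice here because it makes no assumption that the two score scales are linearly related.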

Using the criteria described above (60 responses per prompt, 20 at each of three distinct levels of language proficiency), 360 MELAB essays were collected. The responses were drawn broadly from test forms administered between 2006 and 2011. Multiple test centers in both Canada and the United States were represented in the sample, along with centers in Austria, Japan, and China.


Table 4.12: Number of test taker responses collected at specific MELAB Test Centers

| Test Center | Number of responses | Percentage |
|---|---|---|
| Toronto | 93 | 25.8 |
| Vancouver | 66 | 18.3 |
| Calgary | 45 | 12.5 |
| Detroit | 43 | 11.9 |
| Ann Arbor | 34 | 9.4 |
| Shanghai | 24 | 6.7 |
| Tokyo | 18 | 5 |
| Others | 37 | 10.3 |
| TOTAL | 360 | 100 |

4.4.1 The sample population

Of the 360 test takers in the sample, 54.6% were female and 45.4% were male. The average age of the test takers was 25. There were many native languages represented in the sample population. Table 4.13 shows the most represented native languages.

Table 4.13: Well-represented native language backgrounds

| Language | Number of Test Takers | Percentage |
|---|---|---|
| Tagalog | 112 | 31.1 |
| Chinese | 81 | 22.5 |
| Arabic | 55 | 15.3 |
| Farsi | 26 | 7.2 |
| Malayalam | 20 | 5.6 |
| Japanese | 18 | 5 |
| Spanish | 13 | 3.6 |
| Punjabi | 11 | 3.1 |
| Others | 24 | 6.7 |
| TOTAL | 360 | 100 |
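The percentages in Table 4.13 follow directly from the counts and the total of 360 test takers. The short check below recomputes them from the reported counts; the names `COUNTS` and `percentages` are introduced here for illustration.

```python
# Number of test takers per native language, as reported in Table 4.13.
COUNTS = {
    "Tagalog": 112, "Chinese": 81, "Arabic": 55, "Farsi": 26,
    "Malayalam": 20, "Japanese": 18, "Spanish": 13, "Punjabi": 11,
    "Others": 24,
}


def percentages(counts):
    """Return each count as a percentage of the total, to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentages(COUNTS)
print(sum(COUNTS.values()))  # 360
print(shares["Tagalog"])     # 31.1
```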


In Chapter 2, the importance of a diverse sample population was highlighted (see 2.4.2). Of particular importance are the language background, educational background, and gender balance of the sample population. Populations that lack diversity in these key areas may skew the findings, as described in the literature review of test taker features in Chapter 2 (see 2.4.1). The sample population for this study is drawn from test centers in five different countries and represents multiple different native languages, with no single first language predominating in the sample. The sample population is also fairly evenly balanced in terms of gender representation (54.6% female, 45.4% male). Overall, the diversity of the sample population helps guard against any confounding variable that may arise as a result of an over-representation of a single sub-group.