Students reliability - Information and Communication Technology in Education

By friendship

Since the peer-review outputs were intended for use in student evaluation, we were first interested in finding out whether the reviews are influenced by friendship between the reviewer and the author. We call a review friendly when such a relationship exists, and non-friendly when we know of no positive relationship between the reviewer and the author.

To identify positive relationships between students, we gave them a sociometric questionnaire. We used deception to improve the accuracy of the survey: the questionnaire's objectives were explained to students as finding out collegiality, responsibility, and overall attitude towards studying in their cohort. They were asked to select, from a list of all enrolled students, those colleagues whom they considered to be responsible, communicative, or experts in the course subject, and those with whom they had a good relationship. We were interested only in the last question; the first three were filler questions. However, we subsequently made use of the third question as well.

To compare friendly and non-friendly reviews we selected only the articles which had at least one friendly and at least one non-friendly review. There were 58 such articles with 172 reviews in total, out of which 62 were friendly and 110 non-friendly.

We first computed the average review ratings for each of the five reviewed aspects, separately for friendly and non-friendly reviews. These did not differ much: the smallest difference between the average values was 0.03 points, in the comprehensibility aspect, and the largest was 0.40 points, in the usefulness aspect.

To support this statistically, we tested the hypothesis H: There is a dependence between how a student rates the interestingness (usefulness, ...) of her colleague’s article and their friendship or non-friendship.

In the χ² significance test [6], we verified H as the alternative hypothesis to the null hypothesis

H0: There is no dependence between how a student rates the interestingness (usefulness, ...) of her colleague’s article and their friendship or non-friendship.

For the articles rated by one friendly and two non-friendly reviewers, we calculated the deviation between friendly and non-friendly reviews as the difference between the value given by the friendly reviewer and the average of the other two (for each of the five aspects). For the articles with two friendly reviews and one non-friendly review, we calculated the difference between the value of each of the two friendly reviews and that of the non-friendly review. We then used the same method to find deviations in the opposite direction, between non-friendly and friendly reviews.

This way we obtained positive deviations (when the review rating was higher than the value it was compared with) and negative deviations (in the other case). Given that review ratings range from 1 to 5, we set the threshold for a significant deviation to x = 1.5 and created the contingency table (Table 1). For each of the five reviewed aspects, the “friend” row in the table contains the numbers of negatively deviating (lower than -1.5), non-deviating (between -1.5 and 1.5), and positively deviating (higher than 1.5) friendly ratings. The “non-friend” rows are analogous.
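The deviation and classification steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and two points are assumptions. First, every rating is compared with the mean of the ratings of the other review type, which reproduces the stated cases (one friendly against the average of two non-friendly, each of two friendly against one non-friendly). Second, deviations of exactly -1.5 or 1.5 are counted as non-deviating.

```python
from statistics import mean

THRESHOLD = 1.5  # deviation threshold x from the text


def deviations(friendly, non_friendly):
    """Deviations for one article and one aspect.

    Assumption: each rating is compared with the mean of the ratings
    given by reviewers of the other type.
    """
    friendly_devs = [f - mean(non_friendly) for f in friendly]
    non_friendly_devs = [n - mean(friendly) for n in non_friendly]
    return friendly_devs, non_friendly_devs


def classify(deviation, threshold=THRESHOLD):
    """Map a deviation to a column of the contingency table."""
    if deviation < -threshold:
        return "negative"
    if deviation > threshold:
        return "positive"
    return "none"  # assumption: boundary values count as non-deviating


# One friendly rating of 5 against non-friendly ratings of 3 and 2.
friendly_devs, non_friendly_devs = deviations([5], [3, 2])
friendly_classes = [classify(d) for d in friendly_devs]
non_friendly_classes = [classify(d) for d in non_friendly_devs]
```

Counting the resulting classes over all selected articles, separately for friendly and non-friendly reviews, would give one pair of rows of the contingency table per aspect.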

rating type                       negative deviation   no deviation   positive deviation

interestingness      friend       2                    51             9
                     non-friend   21                   80             8
usefulness           friend       3                    51             8
                     non-friend   24                   81             4
comprehensibility    friend       6                    49             7
                     non-friend   16                   78             15
topic relevance      friend       2                    55             5
                     non-friend   12                   91             6
overall impression   friend       2                    52             8
                     non-friend   21                   80             8

Tab. 1: Contingency table of friendly and non-friendly ratings

On the data from Table 1 we ran the χ² test of independence [6] and then the polarity test [10], both separately for each aspect. The number of degrees of freedom was 2 and the significance level was set to 0.05. The χ² test showed a significant dependence between the rating deviation and the type of review, friendly or non-friendly (i.e., it rejected H0), in three aspects: interestingness, usefulness, and overall impression. The polarity test then showed only that students did not tend to give lower ratings to their friends in these three aspects.
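The χ² computation can be reproduced from the counts in Table 1 with a short sketch. It assumes the standard Pearson statistic for a 2 × 3 contingency table; the names `TABLE_1`, `chi_square` and `p_value_df2` are introduced here for illustration only. For 2 degrees of freedom the p-value has the closed form exp(-χ²/2), so no statistics library is needed. The polarity test [10] is not covered.

```python
import math

# Counts from Table 1: (negative, no deviation, positive) per rating type.
TABLE_1 = {
    "interestingness": {"friend": (2, 51, 9), "non-friend": (21, 80, 8)},
    "usefulness": {"friend": (3, 51, 8), "non-friend": (24, 81, 4)},
    "comprehensibility": {"friend": (6, 49, 7), "non-friend": (16, 78, 15)},
    "topic relevance": {"friend": (2, 55, 5), "non-friend": (12, 91, 6)},
    "overall impression": {"friend": (2, 52, 8), "non-friend": (21, 80, 8)},
}

CRITICAL_VALUE = 5.991  # chi-square critical value for df = 2, alpha = 0.05


def chi_square(rows):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of the chi-square distribution with df = 2."""
    return math.exp(-stat / 2)


results = {}
for aspect, counts in TABLE_1.items():
    stat = chi_square([counts["friend"], counts["non-friend"]])
    results[aspect] = (stat, p_value_df2(stat))

# Aspects for which H0 is rejected at the 0.05 level.
significant = [a for a, (stat, _) in results.items() if stat > CRITICAL_VALUE]
```

With these counts, the statistic exceeds the critical value for interestingness, usefulness, and overall impression, and stays below it for comprehensibility and topic relevance, in agreement with the result reported above. Some expected counts in the table are below 5, so the χ² approximation should be read with some caution.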

By the initial performance

The previous section shows that excluding the friendly reviews from the evaluation would not help us. However, a simple analysis showed that we cannot use all reviews in the evaluation either: if we split all submitted articles into those accepted (240 articles) and those rejected (59 articles) by teachers, the mean review rating for the former group is 3.8, whereas for the latter group it is 3.5. This difference is very small.

In our effort to take advantage of peer reviews in article evaluation, we tried to find “reliable students”. Our first approach was based on the students' performance in the two initial blogging rounds. First, we defined three categories of main article deficiencies tracked by the teacher: formatting problems, copyright problems, and problems with the topic. For each feedback the teacher gave to an article, and for each deficiency category, we marked the deficiency level from 0 to 3 (0: no problem, 3: the reason why the article was rejected). Since approved articles differed in quality, we needed to find out whether students are able to distinguish excellent, good, weak (but still approved), and unsatisfactory (rejected) articles. Therefore we compared their article ratings in all five aspects with the teacher's evaluation described above:

• excellent articles: the teacher assigned level 1 to at most one category and level 0 to the others; the student's ratings should be at least 4 in all aspects

• good articles: the sum of the teacher's evaluations over all three categories falls in the interval (1, 3] (with no 3 in any category); the student's ratings should be 3 or 4 in all aspects

• weak articles: the sum of the teacher's evaluations over all three categories falls in the interval (3, 6] (with no 3 in any category); the student's ratings should be 2 or 3 in all aspects

• unsatisfactory articles: the teacher's evaluation equals 3 in at least one category; the student's ratings should be 1 or 2 in all aspects

According to these criteria we identified students able to distinguish articles of different quality (the group of reliable students), but also students who evaluated unsatisfactory articles as excellent, weak articles as good, etc. (the group of unreliable students). Then we subtracted the number of times a student appeared in the latter group from the number of times she appeared in the former group. If the result was greater than 3, we marked the student as reliable.
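The criteria above can be sketched in code as follows. This is an illustrative reading, not the authors' procedure: all names are hypothetical, and the treatment of mismatches is an assumption. A review scores +1 when all five ratings fall in the range expected for the teacher's category, -1 when they instead fit the range of some other category, and 0 when they fit no category at all.

```python
from collections import Counter

# Expected student rating range (inclusive) for each teacher-derived category.
EXPECTED = {
    "excellent": (4, 5),
    "good": (3, 4),
    "weak": (2, 3),
    "unsatisfactory": (1, 2),
}


def teacher_category(levels):
    """Map the three deficiency levels (0 to 3 each) to a quality category."""
    if max(levels) == 3:
        return "unsatisfactory"
    total = sum(levels)
    if total <= 1:
        return "excellent"
    if total <= 3:
        return "good"
    return "weak"  # total is 4 to 6 and no level equals 3


def fits(ratings, category):
    """True if every aspect rating lies in the range expected for category."""
    low, high = EXPECTED[category]
    return all(low <= r <= high for r in ratings)


def review_score(levels, ratings):
    """+1 for a matching review, -1 for a mismatching one, 0 otherwise."""
    category = teacher_category(levels)
    if fits(ratings, category):
        return 1
    # Assumption: a mismatch means the ratings fit another category's range.
    if any(fits(ratings, other) for other in EXPECTED if other != category):
        return -1
    return 0


def reliable_students(reviews, threshold=3):
    """reviews: iterable of (student, teacher_levels, student_ratings)."""
    balance = Counter()
    for student, levels, ratings in reviews:
        balance[student] += review_score(levels, ratings)
    return {s for s, b in balance.items() if b > threshold}
```

For example, `review_score((3, 0, 0), (5, 5, 4, 5, 4))` yields -1 under this reading, because a rejected article was rated as excellent.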

We selected seven students from our cohort this way. However, we needed to find out whether their reliability stays at the same level or varies during the semester. These students reviewed 53 articles in total, of which 5 were rejected and 48 were approved by the teacher. 14 out of 15 reviews assigned to those rejected articles were written by these seven students. However, they did not identify the unsatisfactory articles. We also investigated whether they were especially critical in the evaluation of one particular aspect (interestingness, ...), but this approach was not successful either.

Therefore we concluded that it is not possible to identify, after two rounds, a group of students who can automatically be regarded as reliable in the following rounds. This finding may be related to the fact that the first two rounds served primarily as training in blogging, and the teacher's evaluation was perhaps not as strict as in later rounds.

By the expertise

This is where our sociometric study comes into play again. In the questionnaire we also asked students to identify those of their classmates whom they considered to be already experts in the course topics. Thus we identified a group of students marked down as experts by at least five of their colleagues. This group (further called experts) consisted of 8 students.
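The selection rule for the experts group can be sketched as a simple vote count. The function name, the input shape, and the exclusion of self-nominations are assumptions made for this illustration.

```python
from collections import Counter


def select_experts(nominations, min_votes=5):
    """Return students named as experts by at least min_votes colleagues.

    nominations maps each respondent to the classmates she named as experts.
    Assumption: a student's nomination of herself is not counted.
    """
    votes = Counter()
    for respondent, named in nominations.items():
        votes.update(n for n in set(named) if n != respondent)
    return {student for student, count in votes.items() if count >= min_votes}
```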

To verify the actual expertise of the students in the experts group as well as the appropriateness of the selection, we examined their performance in the blogging and other course assignments.

Investigating the blog articles, we found that the average number of articles posted by experts hardly differed from the average number of articles posted by the others (4.9 compared to 4.8), but the quality of the two groups of articles was different (see Table 2). Whereas 92.3% of the experts' articles were approved by teachers, for the other students it was only 76.7%. Since the score acquired for blog articles was not directly proportional to the number of approved articles, we evaluated this indicator as well. The average article score of experts was 6.1, and it was 4.3 for the other students. This means that the experts gained 87.5% and the other students 61.4% of the maximum score on average.

indicator            experts   others

articles submitted   4.9       4.8
articles accepted    92.3%     76.7%
articles score       87.5%     61.4%
reviews submitted    13.1      13.6
reviews accepted     98.1%     90.8%
reviews score        81.3%     51.8%
midterm score        68.8%     51.7%
project score        77.0%     69.0%

Tab. 2: Average results of experts compared to the other students

The evaluation of the experts group by average values of selected indicators showed that this group performs better on average than the other students. Moreover, the article success rate (the ratio of approved articles to submitted articles) of 6 out of 8 experts was 100%. In the case of reviews, the 100% success rate was achieved by 7 out of 8 experts.
