The exercises - Errors and misunderstandings among novice programmers: Assessing the student no

The goal of the exercises was to supplement the data we collected in the interviews, helping us identify more of the misunderstandings behind the types of errors we had previously observed. The exercises therefore had to be designed around our previous results. That gave us the following list of possible error types to look into:

• Exception errors

• Poor choice of data structure

• Data and functionality distribution errors

• Linked list errors

• Array errors

• Generics errors

• Iterator errors

6.2.1 Limitations from the think-aloud method

Because the exercises were to be solved during think-aloud observations, they had to be designed to fit the method described in section 3.5.1.

Given the work required to analyze the observations, and the limited time available to students who were preparing to resit the exam, the exercises should take a skilled novice programmer no more than 30 minutes to complete.

The topics covered should be ones the subjects have not fully mastered, yet know well enough to at least attempt a solution. If the exercises are too hard, no subject will be able to make a proper attempt. If the exercises are too easy, the subjects will have learned how to solve them by rote, and many of the choices they make will be unconscious. This is one reason why it is often difficult to find out how experts solve simple problems: they know the material so well that they can no longer verbalize the process they went through to solve the problem.

Additionally, there should be a warm-up exercise to let the subjects get comfortable with verbalizing their thoughts and actions while they write program code. That exercise should take five to ten minutes and should not be included in the recording of the observation, to help the subjects relax.

6.2.2 Limitations due to the subjects’ skill levels

As the subjects were not expected to be the most skilled programmers, the exercises needed to consist of the simpler parts of the core curriculum. We had no other data from the study to help us choose these parts, but we were allowed to use the scores from the final exam of the targeted course, held a few months earlier. One of the professors teaching the targeted course provided us with the scores of the students who failed the exam. We were given the scores for the individual tasks of the exam, without the candidate numbers of the students.

Most of the students we expected to participate in the observation would have attempted that exam and received a failing grade. The scores therefore helped us make some assumptions about which topics were suited for them to solve during an observation. If they scored too high in a topic, it might be too easy. If they scored close to zero, asking them to solve such an exercise is unlikely to give any insight into their thinking, as they will probably have no idea how to solve it.

        Theme
Task 1  Class hierarchy and pointer structure
Task 2  Data structure from program definition
Task 3  Ordered linked list and recursion
Task 4  GUI, Java Swing and Java AWT
Task 5  Threads, monitors and sorting algorithms

Table 6.1: INF1010 June 2015 exam overview

        x̄       x̃      s      Students with 0%
Task 1  20.82%  10.00  23.89  23
Task 2  23.65%  17.50  24.79  10
Task 3  19.82%  18.50  14.36   6
Task 4  20.49%  11.25  19.32  14
Task 5   9.03%   5.50  10.67  35

Table 6.2: Statistical analysis of the INF1010 June 2015 exam scores for students with failing grades

The exam of June 2015

The exam was made up of five tasks. These may have been divided further into subtasks, but the scores we were supplied with were separated by task only, so we have chosen to consider each task as a whole. The topics of the tasks can be found in Table 6.1, and a histogram of the distribution of the students' scores can be found in Figure 6.1.

To analyze the scores the students achieved, we made a quick statistical analysis using the measures presented in section 3.7. The results are listed in Table 6.2.

The mode is not included, as it computed to 0 for all five tasks. When the scoring system is as fine-grained as this one (0-100%), the number of equal non-zero scores is unlikely to be very high. Instead, the number of students who did not get any points at all was added as a metric. This is highly relevant, as it may highlight topics that should be avoided in the observation tasks.

The mean, x̄, is quite similar for all tasks, at approximately 20%, except for Task 5, where it is 9%. From the mean alone it is therefore hard to draw any conclusion other than that threads, monitors and sorting algorithms would likely have been too difficult to fit into the exercises for the observations.

Figure 6.1: The distribution of students failing the INF1010 June 2015 exam based on percentage of the maximum score they achieved

The median, x̃, on the other hand, varies somewhat more. A median lower than the mean indicates a large number of students scoring very low and a few students scoring very high. A median above the mean indicates the opposite, and a median close to the mean indicates a more even spread.

Finally, a low standard deviation, s, combined with few students scoring 0% indicates less spread in the scores and more students scoring close to the mean, since the mean cannot then have been pulled up far by a few very high scores.

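From this we can see that the third topic, linked lists and recursion, may prove to be a good choice for these students: Task 3 has a median (18.50) close to its mean (19.82%), a lower standard deviation (14.36) than Tasks 1, 2 and 4, and the fewest students scoring 0% (6).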
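The relationship between these measures can be illustrated with a minimal Java sketch. The class `ScoreStats`, its method names and the score values are hypothetical and are not taken from the exam data. The sketch also assumes the sample standard deviation (dividing by n - 1), since this chapter does not state which variant was used.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the study: summary measures for a set of scores.
final class ScoreStats {
    // Arithmetic mean of the scores.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Median: middle value of the sorted scores
    // (average of the two middle values when the count is even).
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Sample standard deviation (divides by n - 1).
    // Assumption: the variant used in the study is not stated in this chapter.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) squares += (x - m) * (x - m);
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Number of scores that are exactly zero.
    static long countZeros(double[] xs) {
        return Arrays.stream(xs).filter(x -> x == 0.0).count();
    }

    public static void main(String[] args) {
        // Made-up scores in percent: many low scores and a single high one.
        double[] skewed = {0, 0, 5, 5, 10, 10, 90};
        System.out.printf("mean=%.2f median=%.2f s=%.2f zeros=%d%n",
                mean(skewed), median(skewed), sampleStdDev(skewed), countZeros(skewed));
    }
}
```

For the made-up scores above, the single score of 90 pulls the mean up to roughly 17, while the median stays at 5. This is the "median lower than the mean" pattern described for a group with many low scores and a few high ones.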

6.2.3 The chosen set of exercises

With all the aforementioned requirements in mind we designed the set of exercises described below. The full text given to the subjects can be found in Appendix G.

• A warm-up exercise requiring the subjects to create a simple print method for a class called Car and write a functioning program that creates a few instances of the class Car and prints the info about them.

• The first proper exercise required the subjects to design and program a simple class hierarchy for vehicles with some predefined requirements.

• The second exercise required the subjects to program a LIFO-queue in the form of a linked list.

• If the subjects completed both exercises, they were to solve an additional exercise: making the LIFO-queue from the second exercise iterable.

How will these exercises fit our requirements?

The warm-up exercise fills the requirement to ease the students into verbalizing their thoughts.

The first proper exercise is designed to help us identify misunderstandings that lead to errors in the distribution of data and functionality. Those were very common errors in the interviews. The exercise was made considerably easier than the task in the June 2015 final exam that concerned class hierarchy and pointer structure, so it should be feasible for the participating students to solve.
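To illustrate what distributing data and functionality across a class hierarchy can look like, here is a minimal Java sketch. It is not the exercise text, which is found in Appendix G. The class names `Vehicle`, `PassengerCar` and `Truck`, and all fields and methods, are hypothetical choices made for this illustration only.

```java
// Hypothetical sketch; the real exercise requirements are in Appendix G and may differ.
abstract class Vehicle {
    // Data shared by every vehicle is placed in the superclass.
    private final String registrationNumber;

    Vehicle(String registrationNumber) {
        this.registrationNumber = registrationNumber;
    }

    String getRegistrationNumber() {
        return registrationNumber;
    }

    // Each subclass describes itself using its own specific data.
    abstract String describe();
}

class PassengerCar extends Vehicle {
    // Data that only makes sense for passenger cars stays in this subclass.
    private final int seats;

    PassengerCar(String registrationNumber, int seats) {
        super(registrationNumber);
        this.seats = seats;
    }

    @Override
    String describe() {
        return "Car " + getRegistrationNumber() + " with " + seats + " seats";
    }
}

class Truck extends Vehicle {
    // Data that only makes sense for trucks stays in this subclass.
    private final int maxLoadKg;

    Truck(String registrationNumber, int maxLoadKg) {
        super(registrationNumber);
        this.maxLoadKg = maxLoadKg;
    }

    @Override
    String describe() {
        return "Truck " + getRegistrationNumber() + " with max load " + maxLoadKg + " kg";
    }
}
```

In this sketch, a distribution error would be, for example, declaring `seats` in `Vehicle`, so that every `Truck` carries a field it has no use for, or repeating `registrationNumber` in each subclass instead of placing it once in the superclass.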

The second exercise was designed to help us identify misunderstandings that lead to errors with linked lists and generic classes. We encountered both types of error to some extent, and the teaching assistants in the targeted course described them as "very common" in our talks with them.
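A solution to an exercise of this kind could look roughly like the sketch below: a generic LIFO-queue (a stack) backed by a singly linked list. The class name `LifoQueue` and the method names `push`, `pop`, `isEmpty` and `size` are assumptions made for this illustration. The actual requirements given to the subjects are in Appendix G.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch of a generic LIFO-queue backed by a singly linked list.
class LifoQueue<T> {
    // One link in the list: a value and a reference to the node added before it.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<T> head; // most recently added element, or null when empty
    private int size;

    // Adds an element at the head of the list.
    void push(T value) {
        head = new Node<>(value, head);
        size++;
    }

    // Removes and returns the most recently added element.
    T pop() {
        if (head == null) {
            throw new NoSuchElementException("queue is empty");
        }
        T value = head.value;
        head = head.next;
        size--;
        return value;
    }

    boolean isEmpty() {
        return head == null;
    }

    int size() {
        return size;
    }
}
```

Even in a structure this small there are several places where the targeted error types can show up: forgetting the empty-list case in `pop`, losing the rest of the list when relinking `head`, or omitting the type parameter on `Node`.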

The additional exercise was made to help us identify misunderstandings that lead to errors in both linked lists and iterators. This works because writing an iterator requires knowledge of the container it is written for.
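The dependency between an iterator and its container can be seen in the following minimal Java sketch. The class name `IterableLifoQueue` and its internals are hypothetical and only serve to illustrate the point. The sketch implements the standard `java.lang.Iterable` and `java.util.Iterator` interfaces.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a linked-list LIFO-queue that can be used in a for-each loop.
class IterableLifoQueue<T> implements Iterable<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<T> head; // most recently added element, or null when empty

    void push(T value) {
        head = new Node<>(value, head);
    }

    // The iterator walks the node chain directly, so it has to know
    // how the container links its nodes together.
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> current = head; // next node to hand out

            @Override
            public boolean hasNext() {
                return current != null;
            }

            @Override
            public T next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                T value = current.value;
                current = current.next;
                return value;
            }
        };
    }
}
```

With this in place, `for (String s : queue) { ... }` visits the elements from the most recently added to the oldest. The iterator only reads the list. It does not remove elements the way a `pop` operation would.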

We did not have a student with the proper level of experience available to test our exercises. Instead they were tested by a student with one additional year of experience. He spent twenty-six minutes solving the exercises. From this we believe a skilled student with one year less experience may have to spend a couple of minutes more.

6.3 The participants

27 students signed up for the extra lectures and problem-solving session. They were offered a crash course consisting of classroom lectures and plenary problem solving. 16 of the 27 showed up for the crash course, and nine of those chose to participate in the individual observations.

The lectures were held over two days and the individual observations were held afterwards. The lectures were not designed to improve skills in topics specific to the observations but to prepare them to resit the exam. Thus the topics covered in the observation exercises did not receive any more focus than other topics in the course.

The nine participants were all students who were going to resit the exam. As they chose to participate themselves, there was no random selection of students. Based on their choice to participate first in the crash course and then in the observations, we can expect them to be among the more motivated students. But as mentioned in the beginning of this chapter, most of them are still likely to be less skilled than the average student of the targeted course.
