Problem Taxonomy – Most common problems - Novice Difficulties with Language Constructs

In this section we give a thorough description of the most frequently occurring problems and then classify them (see Fig. 3.3) using the taxonomy described in Section 2.3. Together these problems account for more than half of all problem occurrences (see Section 4.7), and by classifying them we may be able to suggest actions that can be taken with respect to teaching. For each problem we provide a general description, concrete examples and, if necessary, actual code samples. To classify the problems we have identified a list of plausible underlying causes for each problem, and we argue why those causes are plausible. Appendix A is a valuable resource throughout this section.

3.3.1 wrng cond – Wrong condition

Any conditional expression that is incorrect in some way may be counted as a wrong condition problem. However, the annotation is mutually exclusive with those annotations that cover special cases of incorrect conditional expressions, e.g. 207 cond bdry. Except for 207 cond bdry, we did not identify any other special cases of conditional expression problems that were common enough to deserve their own annotation. We do, however, recommend adding annotations for any kind of conditional expression problem, e.g. tautology conditions, that seems to be common.

Occurrences of this problem are primarily incorrect conditional expressions in if-statements, though there are a few occurrences in loop constructs. Typically the conditions are wrong because they use an incorrect operator, connective or method call. Concrete examples are solutions that check for reference equality instead of object equality, or overcomplicated conditions that render the solution incorrect.
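The reference-equality mistake mentioned above can be illustrated with a minimal Java sketch. The class and method names (WrongConditionDemo, wrongCheck, rightCheck) are hypothetical and are not taken from the student solutions; they only show how the two kinds of comparison differ.

```java
class WrongConditionDemo {
    // Hypothetical check written with reference equality:
    // true only if both variables refer to the very same object.
    static boolean wrongCheck(String expected, String given) {
        return expected == given;
    }

    // The same check written with object equality:
    // true if both strings have the same contents.
    static boolean rightCheck(String expected, String given) {
        return expected.equals(given);
    }

    public static void main(String[] args) {
        String expected = "yes";
        String given = new String("yes"); // same contents, distinct object
        System.out.println(wrongCheck(expected, given)); // prints false
        System.out.println(rightCheck(expected, given)); // prints true
    }
}
```

A condition of the first kind compiles without complaint, which may be one reason why it can go unnoticed by the student.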

We suggest these plausible underlying reasons:

1. Interpretation of specification. The students may find it difficult to extract the correct conditions from the specification. It requires them to understand the given specification in its entirety, and they may miss the implicit conditions.

2. Natural language translation. Not all logical expressions expressed in natural language can be directly mapped to the programming language. Conditions may change meaning and parts may be lost in translation.

3. Existing environment. The student may find it difficult to combine the relevant existing state of the program into a correct conditional expression. This often leads to overcomplicated or conflicting expressions.

The underlying reasons we identified for this problem are determinants from both the concept-based problem and the knowledge-based problem categories. Though the majority of the determinants are for the knowledge-based problems, we argue that it is uncertain which category is primary and which is secondary, and we choose both categories.

3.3.2 bdry cond – Boundary case condition

A boundary case condition is any conditional expression that evaluates incorrectly for some boundary case, either by going out of bounds or by being too restrictive, e.g. by incorrectly including or excluding boundary elements in a range. A concrete example is a condition that uses the less-than-or-equal operator (<=) instead of the strict less-than operator (<). This might result in an attempt to access an index outside of an array, or in a guard that accepts too little or too much of a number range. As a consequence the program might suffer from index-out-of-bounds errors for arrays. Listing 3.1 is a typical example of this kind of problem.

Listing 3.1: Boundary case sample ✗

int[] numbers = new int[10];
for (int i = 0; i <= numbers.length; i++) {
    numbers[i] = i;
}
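For comparison, a corrected version of the loop in Listing 3.1 could look like the sketch below. The wrapper class BoundaryDemo and the method fillWithIndices are hypothetical and were added only to make the fragment self-contained; the only substantive change is the strict guard.

```java
class BoundaryDemo {
    // Fills an array of the given size so that each element holds its own index.
    static int[] fillWithIndices(int size) {
        int[] numbers = new int[size];
        // Valid indices run from 0 to numbers.length - 1, so the guard must be strict.
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i;
        }
        return numbers;
    }

    public static void main(String[] args) {
        int[] numbers = fillWithIndices(10);
        System.out.println(numbers[numbers.length - 1]); // prints 9
    }
}
```

With the guard i <= numbers.length, as in Listing 3.1, the final iteration accesses numbers[10] and Java throws an ArrayIndexOutOfBoundsException.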

We suggest these plausible underlying reasons:

1. Specialisation problem. Identifying the correct boundary points may be challenging for the student when instantiating a schema.

2. Incorrect notional machine. The student may possess an incorrect conception of how the conditional expression relates to the environment, e.g. a model where the initial array-index is one-based instead of the actual zero-based.

3. Misconception of construct semantics. The student may have an incorrect understanding of how the language construct works, e.g. at what point the condition of a loop is evaluated or when the update statement of a for-loop is executed (or even what kinds of expressions should be used as update statements). Du Boulay [10] notes that the automatic increment of the counter variable may be problematic.

4. Interpretation of specification. Students may have interpreted the program specification incorrectly, e.g. not perceived correctly whether a condition should be inclusive or exclusive.

These reasons are mostly determinants in the concept-based problem category, though (4) belongs to the knowledge-based problem category, and we argue that this problem should be classified primarily as a concept-based problem, and secondarily as a knowledge-based problem.

3.3.3 loop create – Loop instantiation problem

Any situation where the student has had trouble instantiating a loop construct to solve a given task is counted as a loop instantiation problem. Occurrences of this problem are situations where:

• The student has either not been able to instantiate a loop, or did not realise that one was needed, often leaving that part unsolved or giving a hard-coded solution.

• The student has only partly instantiated a loop, often the most trivial loop-construct schema.

• The student has instantiated a loop that is far away from a working solution, often with conditions and bodies that do not resemble a solution in any way.

Situations where the student has solved the problem in a static, manual way are not counted here; these are counted as 206 loop man. Listing 3.2 is an example of a case where the student either has not been able to instantiate a loop or did not properly understand the semantics of the existing environment, and failed to see why line 7 would not compile.

Listing 3.2: Missing loop ✗

1 // Field variable representing shares in percentage
2 double[] shares = {30.0, 25.0, 10.0};
3 /**
4  * @return the remaining share in percentage.
5  */
6 public double getShareOfOther() {
7     double otherShare = 100 - shares;
8     return otherShare;
9 }
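A loop-based version of the method in Listing 3.2 could be sketched as follows. The wrapper class ShareDemo is hypothetical, and the sketch assumes that the remaining share is simply 100 minus the sum of all entries in shares.

```java
class ShareDemo {
    // Field variable representing shares in percentage (values from Listing 3.2)
    double[] shares = {30.0, 25.0, 10.0};

    /**
     * @return the remaining share in percentage.
     */
    public double getShareOfOther() {
        double otherShare = 100;
        // An array cannot be subtracted from a number, so each element is subtracted in turn.
        for (double share : shares) {
            otherShare -= share;
        }
        return otherShare;
    }

    public static void main(String[] args) {
        System.out.println(new ShareDemo().getShareOfOther()); // prints 35.0
    }
}
```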

We suggest these plausible underlying reasons:

1. Schema knowledge. The student may not possess the necessary schemas to instantiate a loop construct to solve a particular problem, or may not understand how the schemas work and is not able to use them for any other cases than the default.

2. Existing environment. Combining the existing knowledge of the program and using the state of the environment when instantiating a loop construct may be very difficult. Realising which methods, functions and variables should be utilised requires a good overview of the program and a good conception of their semantics.

3. Interpretation of specification. In some cases the requirement that specifies the need for a loop construct may be less obvious and the student may miss it.

All the underlying reasons we identified for this problem are determinants from the knowledge-based category, and we argue that it should be classified as a knowledge-based problem.

3.3.4 cmplx ctrls – Unnecessarily complicated

Instances of this problem are situations where the student devises an overcomplicated solution that may or may not work. These code segments may be very difficult to understand, even for the student at the time of writing, and if they prove not to work as intended it is very difficult to identify exactly what is incorrect. Many instances of this problem could, with a more thorough inspection and communication with the student, have been annotated with other, more specific annotations. Typically these are complicated structures that may be any combination of the following (or other less frequent examples):

• Deeply nested blocks.

• Repetition of large segments of code.

• Conditional structures with individual branches for each possible case the student can imagine, even though they may not be relevant or should have the same result/consequence.

Listing 3.3 is an example of an unnecessarily repeated code segment that could easily be removed with some arithmetic or by performing the conditional difference calculation first. This implementation does not follow the specification completely either: it misses the case where time1 is equal to time2.

Listing 3.3: Repeated code ✗

if (time1 > time2) {
    int difference = time1 - time2;
    /*
    ...
    Code segment that prints out the difference
    converted to hours, minutes and seconds.
    ...
    */
}
else if (time1 < time2) {
    int difference = time2 - time1;
    /*
    ...
    Exact same code segment.
    ...
    */
}
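One way to remove the repetition in Listing 3.3 "with some arithmetic" is to compute the absolute difference once and then run the conversion a single time. The sketch below is an illustration only: the class TimeDifferenceDemo, the method formatDifference and the output format are hypothetical, and it assumes that time1 and time2 are given in seconds, which the listing does not state.

```java
class TimeDifferenceDemo {
    // Hypothetical helper; assumes both times are given in seconds.
    static String formatDifference(int time1, int time2) {
        // One expression covers time1 > time2, time1 < time2 and time1 == time2.
        int difference = Math.abs(time1 - time2);
        int hours = difference / 3600;
        int minutes = (difference % 3600) / 60;
        int seconds = difference % 60;
        return hours + "h " + minutes + "m " + seconds + "s";
    }

    public static void main(String[] args) {
        System.out.println(formatDifference(3725, 60)); // prints 1h 1m 5s
        System.out.println(formatDifference(60, 3725)); // prints 1h 1m 5s
    }
}
```

Because no branch is needed, the case where the two times are equal is handled as well.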

We suggest these plausible underlying reasons:

1. Schema knowledge. The student may not possess or understand the schemas necessary to instantiate sensible conditional structures, prevent repetition, etc.

2. Existing environment. The student needs a good understanding of the environment, and its state, to be able to prevent the creation of unnecessarily complicated structures. To prevent repetition the student needs to identify what separates the different cases, and how to exploit this knowledge to combine the segments.

Both determinants listed above are from the knowledge-based category, and we argue that this problem should be classified as a knowledge-based problem. However, in some cases there may be another underlying reason for this problem, namely that the student is afraid to change something that to some degree may fulfil the requirements.

3.3.5 no call – Missing method call(s)

Any situation where the student has omitted a necessary, or useful, method call is counted as a missing method call problem. In some situations the student has still managed to create a working solution by re-inventing the wheel, i.e. implementing the functionality of the method specifically for that situation, and in others the student may have left out that part of the solution entirely. When solving assignments given in a course, the student will in some cases be provided with methods (or classes) and/or information about any library methods that could be useful.

We have identified the following plausible underlying reasons:

1. Existing environment. The student needs a good overview of the existing program structure and the available library methods to be able to avoid this problem. Using this knowledge the student must be able to realise the possibility of calling a method in a given situation, and be able to identify which method.

2. Interpretation of specification. The student may not have perceived the specification correctly and in turn does not realise that there should be a certain method call in that situation.

3. Test data. The student may not realise that there is a problem if the data set used to execute the program does not expose that the method(s) were not called.

Two of the determinants identified for this problem, (1) and (2), are for the knowledge-based category and the third is for the concept-based category. We argue that it is more likely that the student does not realise there is a missing method call due to insufficient test data than incorrect conception of the specification, and classify this as both a concept-based problem and a knowledge-based problem.

3.3.6 wrng grpng – Incorrect grouping

Any situation where the student has incorrectly connected, or failed to connect, constructs and/or statements is counted as an incorrect grouping problem. Typical examples of this are:

• Multiple if-constructs that are incorrectly connected, or not connected at all, using the else-connective.

• Statements that should form the “else-case” are not placed in an else-block, i.e. they are placed in such a way that they do not depend on the if-construct.

• The result action of a search schema is placed in the body of the loop, so that the action that should be performed on the result of the search is performed in each iteration instead.

• A statement that should also depend on the condition of an if-structure is placed outside of that block.
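The search-schema case in the list above can be sketched as follows. The class SearchDemo, the method search and its return values are hypothetical; the point is only where the result action is grouped relative to the loop.

```java
class SearchDemo {
    // Hypothetical search schema: reports whether target occurs in values.
    static String search(int[] values, int target) {
        boolean found = false;
        for (int value : values) {
            if (value == target) {
                found = true;
            }
            // A result action placed here would run once per element.
        }
        // Result action: grouped after the loop, so it runs exactly once.
        return found ? "found" : "not found";
    }

    public static void main(String[] args) {
        System.out.println(search(new int[] {4, 7, 1}, 7)); // prints found
    }
}
```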

Listing 3.4: else connective ✗

if (player1Winner) {
    System.out.println("something funny");
}
if (player2Winner) {
    System.out.println("an angry message");
}
else {
    System.out.println("something sad");
}
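A connected version of the structure in Listing 3.4 could be sketched as below. It assumes that exactly one message should be produced per game and uses the messages as they appear in the listing; the class WinnerMessageDemo and the method message are hypothetical, and the message is returned instead of printed only to keep the sketch easy to check.

```java
class WinnerMessageDemo {
    static String message(boolean player1Winner, boolean player2Winner) {
        String message;
        if (player1Winner) {
            message = "something funny";
        } else if (player2Winner) {  // connected to the first if by else
            message = "an angry message";
        } else {                     // reached only when neither player won
            message = "something sad";
        }
        return message;
    }

    public static void main(String[] args) {
        System.out.println(message(true, false)); // prints something funny
    }
}
```

In the unconnected form of Listing 3.4, a win for player1 prints two messages, because the final else belongs only to the second if.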

We have identified the following plausible underlying reasons:

1. Misconception of construct semantics. The student may not properly understand how if-constructs are connected using the else-statement, and consequently fails to connect them. This may be an understanding where sequential if-statements connect automatically (as if there were an else-statement connecting them) and where any else-statement at the end of the sequence is the branch that is entered if none of the previous if-statement branches are entered.

2. Natural language translation. A plan of conditional actions may be expressed correctly in the natural language of the student, but in such a way that it does not map directly to the programming language. Listing 3.4 implements the requirements “If player1 wins print something funny, if player2 wins print an angry message, otherwise print something sad”. The student has implemented the requirements by mapping the natural language solution directly into the programming language, and as a result the sad message is printed whenever player2 does not win, including when player1 wins.

3. Interpretation of specification. It may be challenging for the student to perceive the specification as it is intended, especially regarding implicit requirements. The requirement that the calculated result of a program should not be printed if input validation fails is often implicit, and many students do not realise that.

4. Test data. The test data the student is using may not expose that there is a problem with the current structure, e.g. the data set might be so small that there is only one iteration of the search loop and the result action is only evaluated once.

The majority of the underlying reasons identified for this problem are determinants from the concept-based problem category, and we argue that this problem should be classified primarily as a concept-based problem and secondarily as a knowledge-based problem.

3.3.7 bad arit – Erroneous arithmetic

Any arithmetic expression that is incorrect with respect to the given assignment is counted as erroneous arithmetic. A common occurrence of this problem is when the student needs to calculate the difference between two numbers and neglects that the result might be negative. The student may realise the problem but fail to find a correct solution, often statically multiplying by −1 or taking the absolute values of the individual terms instead of the entire expression. In other occurrences the student fails to understand exactly what the numbers represent, what denominator they have and/or what they count.

We suggest these plausible underlying reasons:

1. Test data. The test data the student is using may not expose that there is a problem with the arithmetic, e.g. no case where calculating the difference will give a negative result.

2. Existing environment. The student may not possess the necessary understanding of the existing environment and fails to see the correct semantics of a number, e.g. incorrect denominator.

3. Interpretation of specification. The student may not have understood the specification correctly, e.g. choosing the wrong representation of a value.

4. Natural language translation. The experience the student has with arithmetic is likely purely static, and the dynamic behaviour of arithmetic expressions in a programming language may make it difficult for the student to translate the expressions.

The underlying reasons identified for this problem are determinants from both the concept-based and knowledge-based problem categories, and we argue that it is uncertain if one of them is more dominant than the other, and choose to classify this problem as both a concept-based problem and a knowledge-based problem.
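The difference example described in this section can be made concrete with a small sketch. The class DifferenceDemo and both method names are hypothetical; the first method reproduces the mistake of taking the absolute value of each term, the second takes the absolute value of the whole expression.

```java
class DifferenceDemo {
    // Mistaken attempt: absolute value of the individual terms.
    // The result is still negative whenever |a| < |b|.
    static int wrongDifference(int a, int b) {
        return Math.abs(a) - Math.abs(b);
    }

    // Absolute value of the entire expression.
    static int difference(int a, int b) {
        return Math.abs(a - b);
    }

    public static void main(String[] args) {
        System.out.println(wrongDifference(3, 8)); // prints -5
        System.out.println(difference(3, 8));      // prints 5
    }
}
```

Test data in which the first number is always the larger one would not expose the difference between the two methods, which matches the test data determinant listed above.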

3.3.8 bool accum – Accumulate boolean

Instances of this problem are situations where the student has neglected, or failed, to accumulate a result boolean when performing an operation on a list of elements. The operation performed on each element returns a boolean value that represents failure or success. This boolean value should be accumulated by the loop construct iterating over the list of elements in a way that detects any failure; see Listing 3.5.

Listing 3.5: Boolean accumulate ✓

boolean success = true;
for (Recipe r : getRecipes()) {
    if (!r.printToFile())
        success = false;
    // or the reduced way (instead of the if-statement above):
    // success = success && r.printToFile();
}
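To make the accumulation in Listing 3.5 runnable, the sketch below supplies a stand-in Recipe class. Everything except the loop itself is an assumption: the real Recipe class and getRecipes() are not shown in the source, so the nested Recipe class, its printable flag and the printAll method are hypothetical.

```java
class BooleanAccumulateDemo {
    // Hypothetical stand-in for the Recipe class used in Listing 3.5.
    static class Recipe {
        private final boolean printable;
        boolean printed = false;

        Recipe(boolean printable) {
            this.printable = printable;
        }

        // Pretends to print the recipe and reports success or failure.
        boolean printToFile() {
            printed = true;
            return printable;
        }
    }

    // Attempts to print every recipe and reports whether all attempts succeeded.
    static boolean printAll(Recipe[] recipes) {
        boolean success = true;
        for (Recipe r : recipes) {
            if (!r.printToFile()) {
                success = false; // remember the failure, but keep going
            }
        }
        return success;
    }

    public static void main(String[] args) {
        Recipe[] recipes = {new Recipe(true), new Recipe(false), new Recipe(true)};
        System.out.println(printAll(recipes)); // prints false
    }
}
```

Note that the reduced form success = success && r.printToFile() short-circuits in Java: once success is false, printToFile() is no longer called for the remaining elements. Both forms yield the same boolean result, but only the if-form attempts the operation on every element; which behaviour is wanted depends on the assignment.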

We have identified the following plausible underlying reasons:

1. Existing environment. The student may not realise that the operation performed on the elements returns a boolean value representing its success, and fails to realise that there should be an accumulation.

2. Interpretation of specification. The student may not have perceived that the resulting boolean expression should consider the operations.

3. Specialisation problem. The student may not have been able to express how to accumulate the boolean correctly.

The majority of the identified underlying reasons, (1) and (2), are from the knowledge-based category, though we argue that (3) is more likely because the student is syntactically forced to have a boolean return expression in the method implementation. This should provide the student with a reason to investigate how that value should be calculated, which should limit the number of students having (1) and (2) as underlying reasons. We argue that it is primarily a concept-based problem and secondarily a knowledge-based problem.

Chapter 4 Result Analysis

In our study we inspected and annotated the source code of 349 solutions submitted by the 79 selected students for the six selected assignments, making a total of 2395 annotations across 920 files. Of these, 1185 are meta annotations that describe whether the files:

• do not compile (6 comp err).

• contain methods that are not implemented (5 unsolv prob).

• are not implemented at all (7 impl not).

• sufficiently solve the problem, or display sufficient understanding of how to solve it (1 pass).

• do not sufficiently solve the problem (2 fail).

• are changed in a way that breaks the premise of the assignment (4 chng exer).

We did not have strict requirements for the pass annotation (1 pass), so any file where the student was able to display to a satisfying degree how the problem should be solved was annotated with it. Even some files that did not compile (42) were annotated with pass. The pass (1 pass) and fail (2 fail) annotations are mutually exclusive. There were only 94 files annotated with fail, and the rest, 808, were annotated with pass.

There were 1210 problem annotations divided among 52 different annotation kinds. Some of these annotations had very few occurrences and are not necessarily referred to; when they are, it is in relation to some other problem for which they are relevant.

4.1 Lazy students?

While studying the gathered data we noticed that as many as 49 of the 79 students had at least one file annotated with 5 unsolv prob, i.e. a missing implementation of a method. In total 83 files were annotated with 5 unsolv prob, and in 17 files there were multiple occurrences (i.e. multiple missing methods). In addition, 14 students had files that were not implemented at all (7 impl not), in total 20 files.

We can only share our suspicions as to why such a significant part of the submitted solutions was not implemented. In our opinion two reasons seem the most plausible.

1. The student may have speculated on how much of the assignment needed to be implemented to achieve a passing grade.

2. The student did not possess the knowledge to be able to provide a solution for the missing parts.

If laziness (1) was the reason why some of the submitted solutions were missing parts, we have reason to suspect that it may also have been an underlying reason for other problems identified in the submitted solutions. This especially applies to those situations where the solution was missing some part, e.g. 402 create loop (loop instantiation problem). It is possible, though, that the student would have chosen to implement the skipped parts if the student had had enough experience and knowledge for the implementation to require less work and time.

We give this argument to caution the reader that the following analysis may be affected by this, and to make the reader aware that we did have this
