Techniques for grading programming labs


Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

1984

Techniques for grading programming labs

Kathleen Muller


Rochester Institute of Technology

School of Computer Science and Technology

TECHNIQUES FOR GRADING PROGRAMMING LABS

by

Kathleen Muller

A thesis, submitted to

The Faculty of the School of Computer Science and Technology,

in partial fulfillment of the requirements for the degree of

Computer Systems Management

Approved by:

Lawrence A. Coon

John A. Biles

James Hammerton

DEDICATED TO

TECHNIQUES FOR GRADING PROGRAMMING LABS

Kathleen A. Muller

I, Kathleen Muller, hereby grant permission to the Wallace Memorial Library, of Rochester Institute of Technology, to reproduce my

ACKNOWLEDGEMENTS

I would like to thank all the professors at Rochester Institute of Technology who helped me obtain a fine education in the field of computer science. I would especially like to thank Larry Coon, Al Biles, and Jim Hammerton for all their time and effort on this thesis.

I would like to thank my husband Tom for being so supportive, and my children, Anna and Eric, for being so under

ABSTRACT

Techniques for manual and automated grading of programming labs are discussed. Topics investigated include: general grading of programming labs, plagiarism detection, program documentation, program output, and program efficiency.

This investigation led to the development of automated grading tools that report on style and point to possible instances of plagiarism. The techniques utilized will be

TABLE OF CONTENTS

Preliminary Information
    Title Page
    Acknowledgements
    Abstract
    Table of Contents

1. Research General Information
    1.1 Research Background
        Figure 1.1 Chart of Available Tools
        Figure 1.2 Table of Information Collected by Tools

2. General Grading of Programming Labs
    2.1 Introduction and Background
    2.2
        2.2.1 Manual Approaches
            2.2.1.1 G. Weinberg and E. Schulman
            2.2.1.2 D. Clutterham
            2.2.1.3 N. Miller and C. Peterson
            2.2.1.4 G. Morgan
            2.2.1.5 R. Hamm, K. Henderson, M. Repsher, K. Timmer
        2.2.2 Automated Approaches
            2.2.2.1 S. Robinson and S. Torsun
            2.2.2.2 S. Robinson (ITPAD)
            2.2.2.3 M. Rees (STYLE)
    2.3 Summary

3. Plagiarism
    3.1 Introduction and Background
    3.2 Previous Work
        3.2.1 Preventive Approaches
        3.2.2 Detection Approaches
            3.2.2.1 K. Ottenstein
            3.2.2.2 S. Robinson (ITPAD)
            3.2.2.3 S. Grier (ACCUSE)
            3.2.2.4 J. Donaldson, A. Lancaster and P. Sposato
            3.2.2.5 M. Rees (CHEAT)

4. Program Documentation
    4.1 Introduction and Background
    4.2
    4.3 Sample Pascal Program Standards
    4.4 Sample Program Documentation Standards
    4.5 Summary

5. Program Output
    5.1 Introduction and Background
    5.2 General Testing Approaches
    5.3 Summary

6. Program Efficiency
    6.1 Introduction and Background
    6.2 Previous Work
    6.3 Summary

7. Tools Developed
    7.1 Introduction
    7.2 Program Explanation
        7.2.1 Str. com. c - Comment Stripper
        7.2.2 Token.l - Lexical Analyzer
        7.2.3 Token.h - Header
        7.2.4 Style.c - Style Grader
        7.2.5 Plag.c - Plagiarism Detector
        7.2.6 Summary
    7.3 Results of the Tools
        7.3.1 Style Grader Program
        7.3.2 Plagiarism Detection Program
    7.4 User Information
    7.5 Suggestions for Future Development

8. Summary

Annotated Bibliography
    General Grading of Programming Labs
    Plagiarism
    Program Documentation
    Program Output
    Program Efficiency

Bibliography

1. RESEARCH GENERAL INFORMATION

1.1 RESEARCH BACKGROUND

The intent of this thesis is to determine the automated tools that are available for grading programming labs and detecting plagiarism. In researching the topic, five categories emerged on which instructors might concentrate when grading programs. These categories are:

-General Grading of Programming Labs,
-Plagiarism Detection,
-Program Documentation,
-Program Output,
-Program Efficiency.

Figure 1.1 contains a chart of the tools, both manual and automated, available for use in each of the different categories. Figure 1.2 contains a chart of the information collected for automated tools.

In addition to automated tools, basic information on how to weight these categories was found and is reported on in the sections to follow.

At the end of this thesis is an annotated bibliography as well as a bibliography. The annotated bibliography provides a brief review of the articles or books which are referenced in this thesis and is organized by category, whereas the bibliography is organized alphabetically by

Figure 1.1 Chart of Available Tools

Columns: Purpose of Tool; Language used for; Article found in; Implementation notes.

GENERAL GRADING
    [HHRT83]  grading sheet
    [Meek83]  program style assessor
    [Morg82]  use of a rubber stamp
    [MiPe80]  grading sheet
    [Rees82]
    [Rose83]
    [RoSo80]
    [RoTo77]

PLAGIARISM
    [DLSp81]  program plag detection
    [Grie81]  program plag detection
    [Otte77]  program plag detection
    [Rees82]  program plag detection
    [RoSo80]  program plag detection

PRETTY-PRINTING
    [Bate81]  prettyprinter
    [Bond79]  indentation algorithm
    [Clif78]  connector lines
    [CoSm79]  statement reformatter
    [HeNo79]  prettyprinter
    [LeHu77]  prettyprinter

OUTPUT
    [Chan78]  match output
    [FoWi65]  match output
    [Holl60]  match output
    [Naur64]  match output

EFFICIENCY
    [MaMi76]  execution time
    [RiGr75]  execution time
    [Site78]  execution time prof

Languages listed in the chart: any language, Pascal, Fortran, Cobol, Basic, PL/1, Lisp/Rlisp, Algol, Snobol, C.

p

s

p

s

p

s

p

s

p

s c ***************************************************************

Figure 1.2 Table of Information Collected by Tools

VALUES COUNTED (marks entered under columns A B C D E #)

1. n1 - # of unique operators: p s p s,p
2. n2 - # of unique operands
3. N1 - total operators
4. N2 - total operands
5. N - size of program N1+N2
6. code lines
7. variables declared (and used)
8. total control statements
9. total lines: s c p
10. average line length: s
11. code comment lines: c
12. use of comments: s
13. use of indentation: s
14. total of non-comment characters: p
15. use of blank lines as separators: s
16. multiple statement lines: c
17. constants and types: c
18. number of reserved words: s,p
19. variables declared (not used): c
20. length of identifier: s
21. number of procedure/functions: s c p s,p
22. total calls to subroutine: p
23. total input statements: p
24. var parameters: c
25. value parameters: c
26. # and kind of data structure: s
27. procedure var (includes 21,24): c
28. total conditional statements: p
29. # and kind of control structure: s
30. for statements: c
31. repeat statements: c
32. while statements: c
33. goto statements: c s
34. assignment statements: s p
35. loop statements: p
36. indenting function: c
37. % of embedded spaces: s,p
38. vocabulary of the program: s
39. volume of the program: s
40. level of the program: s
41. intelligence content: s
42. effort of the program: s

s - information used in style program
p - information used in plagiarism program
c - information counted but not used in plagiarism program
# A-[Otte77]; B-[RoSo80]; C-[Grie81]; D-[DLSp81]; E-[Rees82]

2. GENERAL GRADING OF PROGRAMMING LABS

2.1 INTRODUCTION AND BACKGROUND

Instructors confronted with large numbers of programs to grade tend to defend themselves in several ways: they may employ a cadre of graders or teaching assistants, they may decrease the number of programming assignments, or they may be forced to grade so hastily that they seize one or two simplistic criteria often unrelated to their course objectives. Unfortunately, this results in evaluation inconsistencies, a loss of student confidence in grading fairness, and a diminished level of student competence in programming [HHRT83].

It becomes important for the sake of both students and instructors that efficient, objective criteria for grading programs be developed. These criteria should accurately measure a student's achievements and avoid errors in evaluation [Morg82]. By developing some kind of standard grading technique, the student knew precisely what was expected

2.2

uncertainty of marking by manual methods, it was thought that automatic assessment of style using simple algorithms could produce results just as valid and with improved consistency. At the same time, automatic assessment would completely eliminate time-consuming manual inspection of program listings [Rees82].

A discussion of both manual and automatic approaches

2.2.1 MANUAL APPROACHES

This section includes five different manual approaches to the grading of programming labs.

2.2.1.1 G. WEINBERG AND E. SCHULMAN

Weinberg and Schulman [MiPe80] graded programs by ranking the students according to the following criteria:

-number of program statements,
-number of hours in completing the assignment,
-output clarity,
-program clarity.

2.2.1.2 D. CLUTTERHAM

Clutterham [MiPe80] used the following criteria for grading a program, assigning points for each criterion:

-correct answers,
-program efficiency in terms of length (# of statements in instructor's program divided by # of statements in student's program multiplied by total points for the criterion),
-correct termination of program.
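The length-based efficiency criterion above is simple arithmetic, and a short sketch may make it concrete. The function name and the sample numbers below are hypothetical; the source does not say whether a submission shorter than the instructor's program would be capped at the total points, so this sketch applies no cap.

```python
def length_efficiency_points(instructor_statements, student_statements, total_points):
    """Points for the length criterion:
    (instructor statements / student statements) * total points."""
    if student_statements <= 0:
        raise ValueError("student program must contain at least one statement")
    return instructor_statements / student_statements * total_points


# Hypothetical example: a 40-statement instructor program, a 50-statement
# student program, and 10 points available for the criterion.
points = length_efficiency_points(40, 50, 10)
print(points)
```

A longer student program earns proportionally fewer points, which is why the criterion rewards concise solutions.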

2.2.1.3 N. MILLER AND C. PETERSON

Miller and Peterson [MiPe80] used forms attached to each program with the evaluation criteria listed, along with the weight given for each criterion. They felt that the weighting factors helped make the grading more objective.

was for students who did more than what was required.

Four sample forms were presented by the authors. One was the original form the authors used; the other three were other instructors' adaptations of the original form. The original and one of the adaptations follow:

ORIGINAL APPROACH

Algorithm (10%)
    Structure chart showing calling hierarchy (5%)
    Detailed algorithm expression for each module (5%)
Program style and clarity (25%)
    Internal documentation (10%)
    Meaningful identifiers (5%)
    Formatted listing (10%)
Output (45%)
    Correct for specific input (35%)
    Easy to read (5%)
    Graceful termination (5%)
Refinements above minimum (20%)
    Algorithm clarity, efficiency, and/or elegance (5%)
    "Elegant" implementations (10%)
    Output embellishments (2%)
    Exemplary program design and implementation (3%)

AN ADAPTED APPROACH

Top down design (40%)
    Detailed problem definition (20%)
    Refinement of the problem using a level by level approach (20%)
Program style and clarity (20%)
    Description of all data structures (5%)
    Meaningful identifiers (5%)
    Proper indentation (5%)
    Modular design (5%)
Output (20%)
    Correctness (15%)
    Well organized and readable (5%)
Refinements - Superior work (20%)
    Program length (5%)
    Output embellishments (5%)
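As a rough illustration of how a weighted form of this kind turns into a grade, the sketch below encodes the weights of the original form and sums the points a grader awards. The function name, the cap on each criterion, and the sample student are assumptions for illustration; the source gives only the weights.

```python
# Weights (percent of the lab grade) taken from the ORIGINAL APPROACH form.
ORIGINAL_FORM = {
    "Structure chart showing calling hierarchy": 5,
    "Detailed algorithm expression for each module": 5,
    "Internal documentation": 10,
    "Meaningful identifiers": 5,
    "Formatted listing": 10,
    "Correct for specific input": 35,
    "Easy to read": 5,
    "Graceful termination": 5,
    "Algorithm clarity, efficiency, and/or elegance": 5,
    '"Elegant" implementations': 10,
    "Output embellishments": 2,
    "Exemplary program design and implementation": 3,
}


def form_grade(awarded):
    """Sum the points awarded per criterion, capping each at its weight."""
    total = 0
    for criterion, weight in ORIGINAL_FORM.items():
        total += min(awarded.get(criterion, 0), weight)
    return total


# Hypothetical student: full marks on the first eight (required) criteria,
# nothing for the "refinements above minimum" group.
required = list(ORIGINAL_FORM)[:8]
sample = {criterion: ORIGINAL_FORM[criterion] for criterion in required}
grade = form_grade(sample)
print(grade)
```

Because the refinements group carries 20% of the weight, a submission that only meets the minimum requirements tops out at 80 on this form.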

2.2.1.4 G. MORGAN

Morgan [Morg82] used the same approach of listing the criteria to evaluate, and then rating each criterion. Rather than an attached form, Morgan used a rubber stamp applied to the front of each program to grade it. A sample format for the rubber stamp as it might be filled out follows (the circled rating is shown in parentheses):

Timely               2    3    4   (5)
Problem definition   2    3    4   (5)
I/O design           2    3   (4)   5
Logic design         2   (3)   4    5
Source program       2    3   (4)   5
Test validity        2    3   (4)   5

This student would receive an 83% for the lab, since there were 25 points awarded out of a possible 30.
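The 83% figure can be checked with a few lines of arithmetic. The ratings below are one reading of the circled values on the sample stamp that is consistent with the stated total of 25; the variable names are illustrative only.

```python
# Assumed reading of the circled ratings on the sample stamp.
ratings = {
    "Timely": 5,
    "Problem definition": 5,
    "I/O design": 4,
    "Logic design": 3,
    "Source program": 4,
    "Test validity": 4,
}
MAX_RATING = 5  # highest rating printed on the stamp

awarded = sum(ratings.values())
possible = MAX_RATING * len(ratings)
percent = round(100 * awarded / possible)
print(awarded, possible, percent)
```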

2.2.1.5 R. HAMM, K. HENDERSON, M. REPSHER, and K. TIMMER

Hamm, Henderson, Repsher, and Timmer [HHRT83] borrowed an approach used in grading English compositions called the "Diederich Scale". They felt that there was a similarity between writing a computer program and writing an English paper. Thus using a similar approach in the grading of each

The following concepts tied English compositions to computer programs:

-both are the solution to a communication problem
    -the composition communicates with other persons
    -the program communicates with a computer
-both start with an outline or flowchart
-both implement the outline or flowchart
-both have qualities of style and individuality
-both create a heavy paper-load on the instructor
-in both, students expect consistent grading between instructors.

The proposed system had a weighting scheme similar to that of Miller and Peterson [MiPe80]. A list of criteria, with a sample weight scale for an English composition, follows:

p-poor; a-adequate; g-good

                 p         a         g
ideas            2    4    6    8    10
organization     2    4    6    8    10
flavor           1    2    3    4    5
wording          1    2    3    4    5
usage            1    2    3    4    5
spelling         1    2    3    4    5
punctuation

A list of criteria, with a sample weight scale for a computer program, follows:

p-poor; a-adequate; g-good

                              p              a              g
execution of the program      0         7         13        20
correctness of the output     0         7         13        20
design of the output          0    4    8    12   16        20
design of the logic           0    4    8    12   16        20
design of test data           0    4    8    12   16        20
internal documentation        0    4    8    12   16        20
external documentation        0    4    8    12   16        20

A program was written to generate a specific form for each assignment, so that changes to the list of criteria graded and the weight assigned to each could be easily made. The form would contain some identifying information and the criteria and weights for that assignment (similar to the previous example). The forms for the appropriate assignment would then be filled out by the

2.2.2 AUTOMATED APPROACHES

This section includes three different approaches to the automatic grading of programming labs.

2.2.2.1 S. ROBINSON and S. TORSUN

Robinson and Torsun [RoTo77] used an approach whereby the set of submitted solutions was automatically assessed relative to a solution produced by the instructor. They used a program which classified each source statement by its relative importance to the execution of the whole program; a report was then produced that listed the following for each program statement:

-an estimate of execution time (a),
-a count of the frequency of execution (b),
-an estimate of total execution time (a*b).

From these three values the importance factor for each statement was calculated. The importance factor was the relative contribution of a statement to the overall execution time of the program, expressed as ten times the percentage of total execution run time. Thus an importance factor could range from 0 to 1000, with a larger value indicating a higher cost statement. The importance factor was then used to produce a graph, with the x coordinate being the importance factor and the y coordinate being the statement's rank in order of importance. The student's graph was then compared

Robinson and Torsun showed that as programming style was improved the graph would mold to the instructor's solution.

A major problem with this method, as with the other automated methods, was that if output was not also looked at, the program could fit into the correct measurements, yet not solve the problem it was designed for. Also, this system did not mark or take into consideration original solutions or readability [RoTo77].
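The importance factor described in this section (ten times a statement's percentage of total run time) can be sketched as follows. The function name and the sample measurements are hypothetical, and this is not the authors' program, only the arithmetic the description implies.

```python
def importance_factors(statements):
    """statements: list of (time_per_execution, execution_count) pairs, i.e.
    the values called a and b in the text. Returns one factor per statement:
    ten times its percentage share of the total run time (0 to 1000)."""
    totals = [a * b for a, b in statements]
    run_time = sum(totals)
    if run_time == 0:
        return [0.0 for _ in totals]
    return [10 * (100 * t / run_time) for t in totals]


# Hypothetical measurements for a three-statement program.
profile = [(2.0, 1), (1.0, 50), (4.0, 12)]
factors = importance_factors(profile)

# Rank statements from most to least costly, as for the y axis of the graph.
ranked = sorted(range(len(profile)), key=lambda i: factors[i], reverse=True)
print(factors, ranked)
```

The factors of a program always sum to 1000, so two programs of different lengths can be compared on the same scale.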

2.2.2.2 S. ROBINSON (ITPAD)

The approach of Robinson [RoSo80] was to use modified code optimization techniques and software science measures to analyze FORTRAN source programs. Each student's program went through three phases of analysis. With the information that the system collected, the following functions were performed:

-each student's program was examined visually for certain design requirements,
-the progress of a student through a quarter was evaluated,
-the programming assignments were evaluated to see if the student was using the desired concepts,
-possible plagiarism was looked for,

The first phase was the lexical analysis phase. In this phase fourteen program characteristics (listed in Figure 1.2) were tracked. The information gathered in this phase was used to create two profiles, the student's profile and the assignment's profile.

The student's profile contained information about the control structures, retreating edges, and data structures that a student employed throughout the quarter. A retreating edge is the edge of a program graph which represents a return to the beginning of a loop. The student's profile aided in determining whether or not a student had mastered a particular topic. An instructor's model program was used for comparing programs. Following is a sample of the information contained in a student's profile for four programming

STUDENT PROFILE - OUTPUT FROM S. ROBINSON

ASSIGNMENT NUMBER 1 2 3 4

Control Structures:
    if-then  12 3
    if-then-else  1
    else-if
    Logical if
        with goto  4
        without goto
    while  2
    for
    indexed do  1 3
    goto  5
Data Structures
    real  3 5
    integer  5 6 5
Basic Blocks  8 23 12 6
Retreating edges  7-2 22-2 11-2 3-3

The assignment profile contained the software science measures, the control structures used, the retreating edges, the number of basic blocks and the data structures used by the students for each assignment [RoSo80]. The assignment profile gave insight into how effective a programming assignment was at displaying a student's understanding of a particular concept. It did so by revealing the general concepts used by the students to solve the problem. Did the students use the new material in the assignment, or did they use older material that they felt more comfortable with?

Three sample assignment profiles from Robinson's approach follow. This information contains the different ranges of values (low, model, high) for each criterion

ASSIGNMENT PROFILE - OUTPUT FROM S. ROBINSON

Assignment Profile of Program 1

                        low     model   high
unique variables        3       4       13
total variables         14      18      46
unique operators        4       5       8
total operators         10      13      34
assignments             1       5       11
length                  24      31      76
vocabulary                      9       19
volume                  79      98      323
level                   .30     1.0     1.3
intelligence content    6.1     8.7     30.5
effort                  59      98      1061
leaders                 2       3       16

Assignment Profile of Program 2

                        low     model   high
unique variables        6       7       17
total variables         32      46      95
unique operators        4       5       7
total operators         13      25      53
assignments             4       6       26
length                  45      71      132
vocabulary              10      12      23
volume                  150     254     565
level                   .48     1.0     1.12
intelligence content    10.9    15.5    99.0
effort                  87      254     1045
leaders                 4       17      36

Assignment Profile of Program 3

                        low     model   high
unique variables        7       9       19
total variables         19      21      34
unique operators        4       6       10
total operators         10      13      80
assignments             6       7       39
length                  31      47      189
vocabulary              12      15      22
volume                  111     184     756
level                   .24     1.0     1.62
intelligence content    12.4    16.2    44.0
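Several rows of the assignment profile can be recomputed from the four basic counts. The sketch below assumes the standard Halstead software science definitions (vocabulary n = n1 + n2, length N = N1 + N2, volume V = N log2 n, estimated level (2/n1)(n2/N2), intelligence content = estimated level times volume) and treats the table's "variables" as operands; the source does not state the formulas ITPAD used, and the sketch does not attempt to reproduce the level and effort rows. The input values are the model column of Program 1.

```python
import math


def software_science(n1, n2, N1, N2):
    """n1, n2: unique operators and operands; N1, N2: total operators and operands.
    Standard Halstead definitions, assumed here rather than taken from the source."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    estimated_level = (2 / n1) * (n2 / N2)
    intelligence = estimated_level * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "intelligence": intelligence,
    }


# Model column of Program 1: 5 unique operators, 4 unique variables,
# 13 total operators, 18 total variables.
model = software_science(n1=5, n2=4, N1=13, N2=18)
print(model)
```

Under these assumptions the sketch gives a vocabulary of 9, a length of 31, a volume of about 98 and an intelligence content of about 8.7, which agrees with the model column of the first profile.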

The second phase was the analysis of program structure. This phase obtained characteristics by:

-dividing the program into basic blocks,
-constructing a flow graph from basic blocks; the flow graph was constructed by examining all statements that could cause transfer to other basic blocks,
-constructing a directed acyclic graph (DAG) for each basic block; the DAG presents a picture of how the values computed by each statement in a basic block were used by subsequent statements in the block,
-performing data flow analysis on the flow graph,
-detecting loops in the flow graph.
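The last step, detecting loops in the flow graph, can be illustrated with a depth-first search that reports edges leading back to a block still on the current path. This is a common textbook technique offered as a sketch; the source does not say which algorithm ITPAD used, and the block names and graph below are hypothetical.

```python
def retreating_edges(flow_graph, entry):
    """Return the edges (u, v) that lead back to a block still on the
    current depth-first path, i.e. edges that return to the start of a loop."""
    state = {}  # block -> "active" while on the current path, "done" afterwards
    found = []

    def visit(block):
        state[block] = "active"
        for succ in flow_graph.get(block, []):
            if state.get(succ) == "active":
                found.append((block, succ))
            elif succ not in state:
                visit(succ)
        state[block] = "done"

    visit(entry)
    return found


# Hypothetical flow graph of four basic blocks: B2 heads a loop and
# B3 jumps back to it.
graph = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
edges = retreating_edges(graph, "B1")
print(edges)
```

Each edge reported corresponds to one loop in the program, which is the kind of count the student profile records under retreating edges.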

The third phase was the analysis of program characteristics. This phase analyzed the characteristics detected in the second phase to determine if the student should receive a message containing advice on how they might improve their program. The messages were selected from common programming errors and could be specialized by the instructor for an individual assignment [RoSo80].

This implementation concentrated on evaluating a programming assignment, but unlike the other approaches, it assigned no grade to a student's program.

2.2.2.3 M. REES (STYLE)

Michael Rees' [Rees82] approach to grading an assignment's style was called STYLE. STYLE was designed to accept as input the source of a syntactically correct program, make measures on individual criteria in one pass, on

The final mark was influenced by a weighting table supplied by the instructor. The data collected on each assignment follows, along with, in parentheses, whether the value should be a high or low number and notes on changes Rees made to his original implementation:

Layout

-line length - the average number of "significant" characters per line (LOW),
-comments - percentage of all program lines comprised wholly or partially of comments (HIGH),
-indentation - percentage of lines indented in any way (changed to calculate changes of indentation on a line by line basis) (HIGH),
-blank lines - percentage of blank lines in a program (changed so that blank lines were subtracted from total line count before other measures were calculated) (HIGH),
-embedded spaces - additional spaces embedded within a line (HIGH).

Identifiers

-program decomposition - number of procedures and functions (HIGH). By dividing this figure into the total number of lines, a measure of module length was obtained (LOW),
-variety of reserved words - count of the number of different reserved words used (HIGH),
-length of identifiers - average length of all the programmer-defined identifiers (HIGH),
-variety of identifiers - number of different programmer-defined identifiers (changed to number of different identifiers as a function of program length) (MID),
-labels and gotos - count the number of occurrences of the reserved words "label" and "goto"

The grade for each value was obtained using the following parameters:

-max_score - the maximum percentage mark allowed for the criterion,
-lo_max, hi_max - the low and high value range of the criterion which will yield the maximum grade for that criterion,
-lo_no to lo_max - the interval of the criterion which will yield a grade from zero to the max_score on a linear basis,
-hi_max to hi_no - the interval of the criterion which will yield a grade from the max_score to zero on a linear basis,
-lo_no, hi_no - any criterion value below lo_no or above hi_no yields a zero mark.

A visual representation of how these values work follows:

[Graph: the grade, from 0 to max_score, plotted against the criterion value, with lo_no, lo_max, hi_max, and hi_no marked on the horizontal axis.]

An illustration of the grading of a criterion follows. If an instructor wished to grade commenting as 10% of the grade, was looking for between 50% and 70% commenting for a perfect grade, and treated anything less than 20% or greater than 90% as a zero grade, then the system parameters

-max_score = 10
-low_no = 20
-low_max = 50
-hi_max = 70
-hi_no = 90

[Graph: points, from 0 to 10, plotted against % of comments in program, with 20, 50, 70, and 90 marked on the horizontal axis.]

This would result in assigning 0 points for less than 20% or greater than 90% comments; 10 points for between 50% and 70% comments; and a linear grade between 0 and 10 points for between 20% and 50% or 70% and 90% comments.
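The grading rule for a single criterion is a piecewise linear function of the measured value. The sketch below follows the parameter descriptions given above; the function name is hypothetical and this is an illustration of the rule, not Rees' code.

```python
def criterion_grade(value, max_score, lo_no, lo_max, hi_max, hi_no):
    """Grade one measured value: full marks between lo_max and hi_max,
    zero at or outside lo_no and hi_no, linear in between."""
    if lo_max <= value <= hi_max:
        return float(max_score)
    if value <= lo_no or value >= hi_no:
        return 0.0
    if value < lo_max:
        # rising edge, between lo_no and lo_max
        return max_score * (value - lo_no) / (lo_max - lo_no)
    # falling edge, between hi_max and hi_no
    return max_score * (hi_no - value) / (hi_no - hi_max)


# The commenting example from the text: 10 points, ideal range 50% to 70%,
# zero below 20% or above 90%.
comment_params = dict(max_score=10, lo_no=20, lo_max=50, hi_max=70, hi_no=90)
for percent in (10, 35, 60, 80, 95):
    print(percent, criterion_grade(percent, **comment_params))
```

A program with 35% or 80% comments falls halfway along one of the sloping edges and so earns half of the available points.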

The sum of each criterion's weighted grade yielded the style grade. A sample setting for the parameters (max_score, low_no, low_max, hi_max, hi_no) and output from two sample programs follow.

Another observation made by Rees was that programs which used some form of prettyprinting before being graded

OUTPUT FROM REES

SAMPLE OF PARAMETER SETTINGS

Measure             max_score   low_no   low_max   high_max   high_no
chars/line          15          12       15        25         30
% comments          10          15       20        25         35
% indentation       12          60       70        80         90
% blank lines       5           8        10        15         20
% spaces            8           12       18        20
proc/fnc length     20          10       20        35         50
# reserved words    10          22       26        40         41
id. length          20          7        9         15         16
# identifiers       0
labels and gotos    -20         1        3         199        200

OUTPUT OF THE PARAMETERS FOR TWO COURSES

                    Program 1                 Program 2
                    Ave. 350 lines Pascal     Ave. 750 lines Pascal
Measure             low     mean    max       low     mean    max
chars/line          14      20      34        9       13      18
% comments          3       21      31        0       16      35
% indentation       0       74      98        39      72      94
% blank lines       0       5       27        0       17      33
% spaces            2       7       55        3       11      20
proc/fnc length     15      32      174       17      37      77
# reserved words    10      23      26        17      23      29
id. length          5       8       10        7       10      15
# identifiers       24      46      87        13      41      97
labels and gotos    0               3         0               3
Marks               35      60      84        44      64      95

2.3 SUMMARY

Both the automated and manual tools offered the same benefit to the student and instructor: they provided a consistent grading method. The automated approaches also offered the benefits of being efficient for the instructor and objective for the student. Style is not the only aspect of a program that should be looked at by the instructor, but the tools reported on could aid the instructor in the responsibility of evaluating students' programs.

3. PLAGIARISM

3.1 INTRODUCTION AND BACKGROUND

The acquisition of skills in computer programming can be, and often was, a challenging and rewarding experience. Unfortunately, the need to teach larger classes consisting of a wider variety of students had introduced many problems. Outstanding among these was the tendency of students to resort to unorthodox means in fulfilling course requirements. In other words, students cheat [Mill81].

There are a variety of reasons and pressures which cause students to cheat on programming assignments: some students plagiarize because they cannot do the work themselves, some students plagiarize to prove they can pull a fast one on the instructor and get away with it, some students desire to get something for nothing, and other students only cheat on assignments that they feel are busy work [Mill81]. But the biggest reason of all was that the monetary and social rewards were very attractive, or at least perceived as such, in this field [HwGi82].

Students should be given a sense of values regarding their chosen field. Employers who hire Computer Science graduates should be able to trust a student's knowledge and ability in the subject [Mill81]. Thus a responsibility to

When cheating occurred in courses it:

-failed to establish a standard of professional integrity,
-reduced the ability to make accurate assessments of students' skills,
-demoralized honest students who felt (often with reason) that they were in competition with the cheaters,
-wasted the energy of both faculty and students,
-encouraged the cheaters to believe that cheating pays and that good grades were a substitute for understanding [Shaw80].

Students can plagiarize programming assignments in a variety of ways:

-copying a program and changing only the author's name,
-copying a program and changing the documentation,
-copying a program and changing the variable names,
-transposing statements when the ordering of the statements does not affect the results,
-breaking up single statements such as declarations and output statements,
-stealing programs written by other students,
-copying a program and changing the logic a little,
-copying a program and changing the logic a lot,
-copying a program given in an earlier class,
-having someone else write all or part of the program,
-copying a program by changing only the line numbers (Basic and Fortran),
[HwGi82], [DLSp81], [Mill81].

A discussion of ways in which students plagiarize and some methods used in dealing with plagiarism follows. Preventive approaches are discussed in Section 3.2.1 and

3.2 PREVIOUS WORK

Detection of plagiarized programs is a complicated issue. Both Ottenstein [Otte77] and Donaldson, Lancaster and Sposato [DLSp81] realized that using a grader alone was inadequate for detecting plagiarized programs. In the area of plagiarism prevention there were a variety of approaches; the next section will discuss some of them.

3.2.1 PREVENTIVE APPROACHES

Hwang and Gibson [HwGi82] summarized five different approaches to dealing with plagiarism and their success with these approaches. Included in the following discussion are references to the other authors researched who strengthened their position.

1. Set up a punishment policy to discourage students from cheating. Hwang and Gibson [HwGi82] felt that this method was ineffective, since it was essentially negative. A totally negative attitude was not the complete solution to the problem, but it was part of the solution. Miller [Mill81] stated that the consequences of plagiarizing should be reasonable yet severe enough to point out that it will not be tolerated. Whatever penalties were declared, offenders must be dealt with fairly and firmly. The student should be aware of what the consequences of plagiarism will

A list of possible disciplinary actions is given below:

-actions within the course,
    -sharing the grade among guilty students [Mill81],
    -negative credit for the assignment,
    -no credit for the assignment and loss of a letter grade for the course,
    -makeup assignment over the same material, no credit,
    -forced drop in the course,
    -failure in the course,
-actions within the Computer Science Department,
    -suspension from Departmental courses for a designated period,
    -expulsion from Departmental courses,
-actions by the University,
    -warning,
    -probation,
    -suspension from the University for a designated period,
    -expulsion from the University [Shaw80].

2. Set up a software plagiarism detection system. Hwang and Gibson [HwGi82] questioned whether this approach would catch every type of cheating, and they felt it might be rather expensive. This approach is covered in Section 3.2.2.

3. Raise the consciousness of the students to understand and appreciate what they must know in order to obtain a degree. This was a positive approach, but Hwang and Gibson [HwGi82] realized that students were too interested in

4. Inform the students that they may be called into the office at any time to verify what they "claim" to have learned on a programming assignment. Hwang and Gibson [HwGi82] felt this method was a cynical approach which bred mistrust and was not too effective. It was also apt to invite confrontations between students and instructors.

5. Assign grades according to the ratio of programming assignments and exams (including routine quizzes). This was the method supported by Hwang and Gibson. Six different ratio methods were discussed with the advantages and disadvantages of each in the article [HwGi82]. The methods along with Hwang and Gibson's labels are listed below:

A -exams weighted proportionately heavier than programming assignments,
B -programming assignments weighted proportionately heavier than exams,
C -exams and programming assignments weighted approximately equally,
D -final exam used as evidence of what the student had learned (fail the final, fail the course),
E -programming-assignment-related quiz associated with each programming assignment,
X -percentage on the programming-assignment-related quiz applied to the score on the programming assignment,
    Example: 100 points total
    80 points on the programming assignment
    90% on the programming-assignment-related quiz
    80 * 90% = 72 total points on the project
Y -score obtained on the programming-assignment-related quiz added to the score obtained on the programming assignment,
    Example: 100 points total (50/50)
    40 points for quiz
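The difference between methods X and Y is only in how the quiz result is combined with the assignment score. The following Python sketch illustrates the arithmetic; the function names are hypothetical, the method X figures are taken from the example above, and the assignment score of 45 used for method Y is an invented value, since the original example is cut off.

```python
def method_x(assignment_points, quiz_percent):
    """Method X: scale the assignment score by the percentage earned on the quiz."""
    return assignment_points * quiz_percent / 100


def method_y(assignment_points, quiz_points):
    """Method Y: add the quiz score to the assignment score.

    Assumes each part is worth half of the total (the 50/50 split in the text).
    """
    return assignment_points + quiz_points


# Method X example from the text: 80 points on the assignment, 90% on the quiz.
print(method_x(80, 90))   # 72.0

# Method Y with 40 quiz points and a hypothetical 45 assignment points.
print(method_y(45, 40))   # 85
```

Under method X a poor quiz drags down the whole assignment score, whereas under method Y it can cost at most the quiz's own share of the points.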

Hwang and Gibson [HwGi82] discarded methods B, C and E as being too lenient on those who cheat; methods D and X were better but could penalize an honest student who happened to have a bad day. Thus methods A and Y were the better choices, with Y being the best because it had fewer listed disadvantages.

The advantages of method Y were as follows:

-encourages students to do the programming assignments in order to do well on the programming assignment quiz,
-the total grade actually represented the student's understanding of the programming assignment,
-the grade was proportional to the time and effort expended,
-the method represented the students' grade very well for all unexpected situations.

The disadvantages of method Y were as follows:

-if the student had a bad day, the grade on the quiz would not represent their true ability,
-if the programming assignment quiz did not represent the programming assignment well, the grade would not represent the student's

Shaw [Shaw80] outlined a series of actions which an instructor could use when accusing a student of cheating:

-make copies of the evidence (e.g., the program) for the student and the Department, retaining the original,
-in the presence of a witness, confront the student with the allegation,
-if after the confrontation the instructor decided to impose a penalty, the instructor should so inform the student by letter; the letter should state the basis for the action, the assigned penalty and the student's right to appeal to the University Committee on Discipline within one calendar week.

Throughout the entire process, it was essential that all meetings, decisions, and actions be documented in writing.

Shaw also outlined actions the computer science department, the computer center and the faculty could follow for the prevention and detection of plagiarism. These are listed below:

-possible department actions,
    -develop an on-line system to detect programs that were similar (see Section 3.2.2),
    -provide an adequate number of available, knowledgeable consultants to advise students in the lower level courses,
    -establish facilities for in-class examination of the student's programming,
    -maintain records on cheating incidents in department courses,
    -spread the word that the department does not condone cheating,
-possible computer center actions,
    -upgrade on-line assistance including help facilities, debuggers, and on-line explanations of routinely encountered errors,
    -routinely provide information on computer usage including the amount of time each student is connected to various systems,
    -provide closed trash cans for the disposal of program listings,
-possible instructor actions,
    -provide students with a hand-out stating the cheating policy and disciplinary action,
    -base judgement on the student's mastery of course material on work done in monitored situations to the extent educational objectives permit,
    -provide guidelines for user consultants, indicating what kinds of help they should and should not give to students in each course,
    -use any automatic detection procedures that become available.

3.2.2 DETECTION APPROACHES

In the plagiarism detection process, instructors have designed systems to detect similarities in student programs. The next sections present five different views on the detection of plagiarism.

3.2.2.1 K. OTTENSTEIN

The earliest article available on this subject was Ottenstein [Otte77]. His approach was conservative but effective. His method was to count Halstead's [Hals77] software science criteria for each student's program:

-n1 - the number of unique operators,
-n2 - the number of unique operands,
-N1 - the total number of occurrences of operators,
-N2 - the total number of occurrences of operands.

Operators consisted of control structures as well as the normal program operators. Reserved words other than control structures were not counted. Each occurrence of an operator or operand was called a token. He also calculated the size of the program (in tokens), N, which was N1 + N2. These five values were assigned to each program and were the basis for comparison between programs. Ottenstein's reporting of this information was very simple. The values n1, n2, N1, N2, and N for each student's program were reported on a line. These lines were sorted by program size (N). Thus an instructor could look back at the other values to determine if there was a need to manually review the similar programs.
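The counting scheme described above can be illustrated with a short Python sketch. It is not Ottenstein's program; it applies the same idea to Python source using the standard `tokenize` module as a stand-in for his counter. The `CONTROL` keyword set and the names `halstead_counts` and `size_report` are assumptions made for illustration.

```python
import io
import keyword
import tokenize

# Assumption: these control-structure keywords count as operators; all other
# reserved words are ignored, mirroring the description in the text.
CONTROL = {"if", "elif", "else", "for", "while", "return", "break", "continue"}


def halstead_counts(source):
    """Return (n1, n2, N1, N2, N) for a string of Python source."""
    operators, operands = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP:
            operators.append(tok.string)
        elif tok.type == tokenize.NAME:
            if tok.string in CONTROL:
                operators.append(tok.string)
            elif not keyword.iskeyword(tok.string):
                operands.append(tok.string)
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)
        # Comments, newlines and indentation tokens are skipped.
    n1, n2 = len(set(operators)), len(set(operands))
    N1, N2 = len(operators), len(operands)
    return n1, n2, N1, N2, N1 + N2


def size_report(programs):
    """One (student, n1, n2, N1, N2, N) row per program, sorted by size N."""
    rows = [(name,) + halstead_counts(src) for name, src in programs.items()]
    return sorted(rows, key=lambda row: row[-1])


original = "total = 0\nfor i in range(10):\n    total = total + i  # sum\n"
renamed = "s = 0\nfor k in range(10):\n    s = s + k\n"
for row in size_report({"student_a": original, "student_b": renamed}):
    print(row)
```

Renaming variables or changing comments leaves all five values unchanged, so the two rows printed here are identical apart from the student name. That is the kind of adjacency in the sorted report that would prompt a manual review.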

This approach was successful in detecting programs changed by:

-reordering time independent statements,
-recommenting,
-reformatting of the text,
-renaming the variables and labels.

It would not detect a student who cheated on only part of a program. Donaldson, Lancaster and Sposato [DLSp81] questioned how effective this method was for introductory courses where there may be only slight variation in the final results.

3.2.2.2 S. ROBINSON (ITPAD)

Robinson and Soffa's [RoSo80] approach for the collection and reporting of student program information was expanded to detect plagiarism (see Section 2.2.2.2 for a discussion of the basic implementation). The method for detecting possible collaborators followed this procedure:

-group the programs by the number of leaders (leaders were a type of statement),
-compare the number of statements in each basic block, then eliminate the programs which match less than 50% of the time,
-compare the control structures and retreating edges, then eliminate the programs that have different values,
-compare the data structures, and eliminate programs with a difference of more than one for each data

Since this approach had more detail and was less restrictive in selecting similar programs than Ottenstein's [Otte77], it matched more students. The question is whether the extra information was worth the extra time and resources required. The Robinson results did not show much justification for the extra detail. In fact, after visual inspection most of the extra programs which they selected appeared not to have been plagiarized.

3.2.2.3 S. GRIER (ACCUSE)

Grier's [Grie81] approach to plagiarism detection is an extension of Ottenstein's [Otte77]. Grier's program, ACCUSE, calculated the four Halstead [Hals77] software science criteria plus 16 others (see Figure 1.2). Through testing different combinations of the 20 elements, seven were retained to determine a correlation number.

An interesting calculation was used by Grier [Grie81] to determine the correlation number between two programs. The correlation scheme involved computing an increment for each pair of affected programs based on the equation:

increment = "importance factor" - (pcounta - pcountb)

where pcounta and pcountb represent criterion counts for the two programs compared. If the (pcounta - pcountb) was less than or equal to some "window size", depending on the particular

The importance factor was the weight for each criterion which affected the increment value. The seven increments were totaled to form a correlation number. The following is a list of the seven criteria discussed, with each criterion's window size and importance factor and notes on how it was calculated:

-Unique operators (Begin and End ignored): window size 5, importance factor 6
-Unique operands (for each assignment operator two operands were subtracted): window size 5, importance factor 6
-Total operators (does not include assignment operators, Begin and End ignored): window size 3, importance factor 5
-Total operands: window size 3, importance factor 5
-Code lines (decremented for each assignment operator; ignore blank lines, comments, and declarations; count only executable lines of code): window size 3, importance factor 5
-Variables declared (and used): window size 2, importance factor 3
-Total control statements: window size 1
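The correlation scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not Grier's ACCUSE code: the difference between counts is taken as an absolute value, an increment is added only when the difference falls within the criterion's window, and the importance factor for control statements (missing from the list above) is assumed to be 2 so that the factors sum to the maximum correlation number of 32 stated in the text. The dictionary keys and the function name `correlation` are hypothetical.

```python
# criterion name -> (window size, importance factor), from the list in the text.
CRITERIA = {
    "unique_operators": (5, 6),
    "unique_operands": (5, 6),
    "total_operators": (3, 5),
    "total_operands": (3, 5),
    "code_lines": (3, 5),
    "variables_declared": (2, 3),
    "control_statements": (1, 2),  # importance factor 2 is an assumption
}


def correlation(counts_a, counts_b):
    """Sum the per-criterion increments for one pair of programs."""
    total = 0
    for name, (window, importance) in CRITERIA.items():
        diff = abs(counts_a[name] - counts_b[name])  # assumed absolute value
        if diff <= window:
            total += importance - diff
    return total


program_a = {
    "unique_operators": 12, "unique_operands": 20, "total_operators": 85,
    "total_operands": 90, "code_lines": 40, "variables_declared": 9,
    "control_statements": 7,
}
program_b = dict(program_a, unique_operators=14)  # differs by 2 on one criterion

print(correlation(program_a, program_a))  # 32, the maximum
print(correlation(program_a, program_b))  # 30
```

Identical counts give the maximum score, and each small difference inside a window lowers the score by the size of that difference.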

Grier also produced the following five reports:

-a report of the 20 criteria, as listed in Figure 1.2, measured by ACCUSE for each student's program,
-a report of the 7 criteria, as listed in Figure 1.2, used to compute the correlation number for each student's program,
-a triangular matrix in which each entry is the correlation number between a pair of programs,
-a frequency distribution graph that indicates the number of pairs of programs with the same correlation number,
-a list of all pairs of programs which have a correlation number greater than or equal to 28 (32 was the maximum correlation number).

These were then used to determine which programs might have been plagiarized. A manual inspection of each suspected program was still necessary. This approach was successful in detecting programs changed by the following means:

-reordering of time independent statements,
-recommenting,
-reformatting of text,
-renaming variables and labels,
-adding unnecessary initialization and assignment statements,
-adding excess declarations.

ACCUSE was designed to be as inexpensive to use as possible. Thus the idea of utilizing a front end of a compiler was replaced with Ottenstein's [Otte77] approach of a fast counter. The result was a compromise between speed and

3.2.2.4 J. DONALDSON, A. LANCASTER, and P. SPOSATO

Donaldson, Lancaster and Sposato's [DLSp81] approach to plagiarism included two data collection phases: one to gather information on the structure of the program, and the other to gather information on the content of the assignment. There were also two data analysis phases, one to evaluate each type of information collected.

The first data collection phase (for FORTRAN assignments) kept track of the following criteria:

-total number of variables,
-total number of subprograms,
-total number of input statements,
-total number of conditional statements,
-total number of loop statements,
-total number of assignment statements,
-total number of calls to subprograms,
-total number of statements of type 2-7.

The second data collection phase characterized the assignment by the order in which statements occurred. Each type of statement was given a character code (e.g., X for a logical IF). As the assignment was processed, a string of character codes was produced.

The first data analysis phase performed the following three types of calculations on the information gathered in the first data collection phase:

1. Sum of the differences - corresponding criterion values were subtracted and the absolute values of the differences were summed. This gave some indication of how two assignments differed in content.

2. Count of similarity - each similarity factor started at zero and was incremented by one for each corresponding criterion value which was equal. This showed how many criterion values were equal but not which ones.

3. Weighted count of similarity - this method was an extension of number 2 above. Instead of incrementing by one, the increment was the weight given to the criterion value. This allowed the instructor to weight each criterion value according to what was expected of the particular assignment.
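The three calculations are simple enough to state directly in code. The Python sketch below follows the descriptions above; the function names, the sample criterion vectors, and the weights are hypothetical values chosen for illustration, not data from [DLSp81].

```python
def sum_of_differences(a, b):
    """Sum of the absolute differences between corresponding criterion values."""
    return sum(abs(x - y) for x, y in zip(a, b))


def count_of_similarity(a, b):
    """Number of corresponding criterion values that are equal."""
    return sum(1 for x, y in zip(a, b) if x == y)


def weighted_count_of_similarity(a, b, weights):
    """Like count_of_similarity, but each match adds the criterion's weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


# Eight criterion values per program, in the order of the list in the text.
program_a = [12, 3, 4, 6, 2, 15, 5, 32]
program_b = [12, 3, 5, 6, 3, 15, 5, 34]
weights = [1, 2, 1, 3, 3, 1, 2, 1]  # instructor-chosen, hypothetical

print(sum_of_differences(program_a, program_b))                     # 4
print(count_of_similarity(program_a, program_b))                    # 5
print(weighted_count_of_similarity(program_a, program_b, weights))  # 9
```

A small sum of differences and a high (weighted) count of similarity both point to a pair worth inspecting by hand.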

The second data analysis phase worked with the string of character codes from the second data collection phase. It compressed identical characters in succession. The resulting strings were compared. If all the characters of the string matched that of another student, it meant the two
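The compression and comparison step can be sketched in a few lines of Python. Only the code X for a logical IF comes from the text; the other codes (A for assignment, L for loop) and the function names are assumptions made for this example.

```python
from itertools import groupby


def compress(codes):
    """Collapse each run of identical consecutive characters to one character."""
    return "".join(key for key, _ in groupby(codes))


def same_structure(codes_a, codes_b):
    """True when two statement-code strings are identical after compression."""
    return compress(codes_a) == compress(codes_b)


# Hypothetical codes: A = assignment, X = logical IF, L = loop.
print(compress("AAAXLLA"))                 # AXLA
print(same_structure("AAXLA", "AXLLLA"))   # True
print(same_structure("AAXLA", "AXAL"))     # False
```

Compressing runs means that splitting one statement into several of the same type, or adding a redundant assignment next to an existing one, does not change the resulting string.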

This approach was successful in detecting programs changed by the following means:

-transposing statements when the ordering of the statements does not affect the results,
-altering format statements,
-breaking up single statements such as declarations and output statements,
-renaming variables and labels,
-recommenting.

3.2.2.5 M. REES (CHEAT)

The method for detecting plagiarism started with Rees' STYLE program (see Section 2.2.2.3). Robson [Rees82] added to STYLE and created a post processor called CHEAT which looked for similar programs. After some experimenting the following criteria were selected for comparison:

-total of non-comment characters,
-percentage of embedded spaces,
-number of reserved words,
-number of identifiers,
-total number of lines,
-number of procedures/functions.

This approach was similar to the others in that the criteria for each student were compared. Student programs with similar values were then verified for possible plagiarism.

3.3 SUMMARY

The tools and techniques for the detection of plagiarism can only point to possibly plagiarized programs. It is still necessary to manually inspect the suspected programs to confirm plagiarism. The tools should report broad enough information on possible plagiarism so that changes in plagiarism approaches will be flagged. If a plagiarism tool is designed too restrictively it may create a false sense of security for the instructor. The benefit of having both a plagiarism policy and a detection mechanism was to create a cheating deterrent.

In partial fulfillment of this thesis a tool was developed that uses a counting approach similar to that of Donaldson, Lancaster and Sposato [DLSp81]. See Section 7 for

4. PROGRAM DOCUMENTATION

4.1 INTRODUCTION AND BACKGROUND

A program that is easy to read and understand is easier to test, maintain and modify [Clif78]. Failure to adequately document software leads to: higher production and maintenance costs, customer dissatisfaction
