Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
1984
Techniques for grading programming labs
Kathleen Muller
Rochester Institute of Technology
School of Computer Science and Technology
TECHNIQUES FOR GRADING PROGRAMMING LABS
by
Kathleen Muller
A thesis, submitted to
The Faculty of the School of Computer Science and Technology,
in partial fulfillment of the requirements for the degree of
Computer Systems Management
Approved by:
Lawrence A. Coon
John A. Biles
James Hammerton
DEDICATED TO
TECHNIQUES FOR GRADING PROGRAMMING LABS
Kathleen A. Muller
I, Kathleen Muller, hereby grant permission to the Wallace Memorial Library, of Rochester Institute of Technology, to reproduce my
ACKNOWLEDGEMENTS
I would like to thank all the professors at Rochester Institute of Technology who helped me obtain a fine education in the field of computer science. I would especially like to thank Larry Coon, Al Biles, and Jim Hammerton for all their time and effort on this thesis. I would like to thank my husband Tom for being so supportive, and my children, Anna and Eric, for being so understanding.
ABSTRACT
Techniques for manual and automated grading of programming labs are discussed. Topics investigated include: general grading of programming labs, plagiarism detection, program documentation, program output, and program efficiency. This investigation led to the development of automated grading tools that report on style and point to possible instances of plagiarism. The techniques utilized will be
TABLE OF CONTENTS
Preliminary Information
   Title Page
   Acknowledgements
   Abstract
   Table of Contents
1. Research General Information
   1.1 Research Background
       Figure 1.1 Chart of Available Tools
       Figure 1.2 Table of Information Collected by Tools
2. General Grading of Programming Labs
   2.1 Introduction and Background
   2.2 Previous Work
       2.2.1 Manual Approaches
             2.2.1.1 G. Weinberg and E. Schulman
             2.2.1.2 D. Clutterham
             2.2.1.3 N. Miller and C. Peterson
             2.2.1.4 G. Morgan
             2.2.1.5 R. Hamm, K. Henderson, M. Repsher and K. Timmer
       2.2.2 Automated Approaches
             2.2.2.1 S. Robinson and S. Torsun
             2.2.2.2 S. Robinson (ITPAD)
             2.2.2.3 M. Rees (STYLE)
   2.3 Summary
3. Plagiarism
   3.1 Introduction and Background
   3.2 Previous Work
       3.2.1 Preventive Approaches
       3.2.2 Detection Approaches
             3.2.2.1 K. Ottenstein
             3.2.2.2 S. Robinson (ITPAD)
             3.2.2.3 S. Grier (ACCUSE)
             3.2.2.4 J. Donaldson, A. Lancaster and P. Sposato
             3.2.2.5 M. Rees (CHEAT)
4. Program Documentation
   4.1 Introduction and Background
   4.2 Previous Work
   4.3 Sample Pascal Program Standards
   4.4 Sample Program Documentation Standards
   4.5 Summary
5. Program Output
   5.1 Introduction and Background
   5.2 General Testing Approaches
   5.3 Summary
6. Program Efficiency
   6.1 Introduction and Background
   6.2 Previous Work
   6.3 Summary
7. Tools Developed
   7.1 Introduction
   7.2 Program Explanation
       7.2.1 Str.com.c - Comment Stripper
       7.2.2 Token.l - Lexical Analyzer
       7.2.3 Token.h - Header
       7.2.4 Style.c - Style Grader
       7.2.5 Plag.c - Plagiarism Detector
       7.2.6 Summary
   7.3 Results of the Tools
       7.3.1 Style Grader Program
       7.3.2 Plagiarism Detection Program
   7.4 User Information
   7.5 Suggestions for Future Development
8. Summary
Annotated Bibliography
   General Grading of Programming Labs
   Plagiarism
   Program Documentation
   Program Output
   Program Efficiency
Bibliography
1. RESEARCH GENERAL INFORMATION
1.1 RESEARCH BACKGROUND
The intent of this thesis is to determine the automated tools that are available for grading programming labs and detecting plagiarism. In researching the topic, five categories emerged on which instructors might concentrate when grading programs. These categories are:
-General Grading of Programming Labs,
-Plagiarism Detection,
-Program Documentation,
-Program Output,
-Program Efficiency.
Figure 1.1 contains a chart of the tools, both manual and automated, available for use in each of the different categories. Figure 1.2 contains a chart of the information collected for automated tools. In addition to automated tools, basic information on how to weight these categories was found and is reported on in the sections to follow.
At the end of this thesis is an annotated bibliography as well as a bibliography. The annotated bibliography provides a brief review of the articles or books which are referenced in this thesis and is organized by category, whereas the bibliography is organized alphabetically by
Figure 1.1 Chart of Available Tools
************************************************************
Purpose            Article found in   Implementation of tool
************************************************************
GENERAL GRADING    [HHRT83]           grading sheet
                   [Meek83]           program style assessor
                   [Morg82]           use of a rubber stamp
                   [MiPe80]           grading sheet
                   [Rees82]           program style assessor
                   [Rose83]           program style assessor
                   [RoSo80]           program style assessor
                   [RoTo77]           program style assessor
PLAGIARISM         [DLSp81]           program plag detection
                   [Grie81]           program plag detection
                   [Otte77]           program plag detection
                   [Rees82]           program plag detection
                   [RoSo80]           program plag detection
PRETTY-PRINTING    [Bate81]           prettyprinter
                   [Bond79]           indentation algorithm
                   [Clif78]           connector lines
                   [CoSm79]           statement reformatter
                   [HeNo79]           prettyprinter
                   [LeHu77]           prettyprinter
OUTPUT             [Chan78]           match output
                   [FoWi65]           match output
                   [Holl60]           match output
                   [Naur64]           match output
EFFICIENCY         [MaMi76]           execution time
                   [RiGr75]           execution time
                   [Site78]           execution time
                                      prof
************************************************************
Language used for (column entries in listed order):
GENERAL GRADING: any language, Pascal, any language, any language, Pascal, Pascal, Fortran
PLAGIARISM and PRETTY-PRINTING: Cobol, Basic, Pascal, Fortran, Pascal, Fortran, Pascal, Pascal, PL/1, PL/1, Lisp/Rlisp, Pascal
OUTPUT: Basic, Algol, Algol
EFFICIENCY: Pascal, Snobol, Pascal, C
Notes (column entries in listed order): p, sp, p, sp, p, sp, p, sp, p, p, p, p, s c

Figure 1.2 Table of Information Collected by Tools
***************************************************************
VALUES COUNTED                                  A B C D E (#)
***************************************************************
 1. n1 - # of unique operators                  p  sp  p  s,p
 2. n2 - # of unique operands
 3. N1 - total operators
 4. N2 - total operands
 5. N - size of program N1+N2
 6. code lines
 7. variables declared (and used)
 8. total control statements
 9. total lines                                 s  c  p
10. average line length                         s
11. code comment lines                          c
12. use of comments                             s
13. use of indentation                          s
14. total of non-comment characters             p
15. use of blank lines as separators            s
16. multiple statement lines                    c
17. constants and types                         c
18. number of reserved words                    s,p
19. variables declared (not used)               c
20. length of identifier                        s
21. number of procedure/functions               s  c  p  s,p
22. total calls to subroutine                   p
23. total input statements                      p
24. var parameters                              c
25. value parameters                            c
26. # and kind of data structure                s
27. procedure var (includes 21,24)              c
28. total conditional statements                p
29. # and kind of control structure             s
30. for statements                              c
31. repeat statements                           c
32. while statements                            c
33. goto statements                             c  s
34. assignment statements                       s  p
35. loop statements                             p
36. indenting function                          c
37. % of embedded spaces                        s,p
38. vocabulary of the program                   s
39. volume of the program                       s
40. level of the program                        s
41. intelligence content                        s
42. effort of the program                       s
***************************************************************
s - information used in style program
p - information used in plagiarism program
c - information counted but not used in plagiarism program
# A-[Otte77]; B-[RoSo80]; C-[Grie81]; D-[DLSp81]; E-[Rees82]
2. GENERAL GRADING OF PROGRAMMING LABS
2.1 INTRODUCTION AND BACKGROUND
Instructors confronted with large numbers of programs to grade tend to defend themselves in several ways: they may employ a cadre of graders or teaching assistants, they may decrease the number of programming assignments, or they may be forced to grade so hastily that they seize one or two simplistic criteria often unrelated to their course objectives. Unfortunately, this results in evaluation inconsistencies, a loss of student confidence in grading fairness, and a diminished level of student competence in programming [HHRT83].
It becomes important for the sake of both students and instructors that efficient, objective criteria for grading programs be developed. These criteria should accurately measure a student's achievements and avoid errors in evaluation [Morg82]. By developing some kind of standard grading technique, the student knew precisely what was expected
2.2 PREVIOUS WORK
In researching the previous work on grading of programming labs, two approaches were found: manual grading systems and automatic grading systems.
Manual grading systems all basically took the same approach. In each case evaluation criteria for a programming assignment were defined, and then a rating scheme to be used to grade the program was developed.
Knowing the uncertainty of marking by manual methods, it was thought that automatic assessment of style using simple algorithms could produce results just as valid and with improved consistency. At the same time automatic assessment would completely eliminate time-consuming manual inspection of program listings [Rees82].
A discussion of both manual and automatic approaches
2.2.1 MANUAL APPROACHES
This section includes five different manual approaches to the grading of programming labs.
2.2.1.1 G. WEINBERG AND E. SCHULMAN
Weinberg and Schulman [MiPe80] graded programs by ranking the students according to the following criteria:
-number of program statements,
-number of hours in completing the assignment,
-output clarity,
-program clarity.
2.2.1.2 D. CLUTTERHAM
Clutterham [MiPe80] used the following criteria for grading a program, assigning points for each criterion:
-correct answers,
-program efficiency in terms of length (# of statements in instructor's program divided by # of statements in student's program, multiplied by total points for the criterion),
-correct termination of program.
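The length-based efficiency criterion above is a simple ratio, and a minimal sketch of it may make the arithmetic concrete. The function name and the sample numbers (40 instructor statements, 50 student statements, 20 points) are hypothetical and do not come from Clutterham; the formula is applied literally, so a student program shorter than the instructor's would earn more than the nominal total.

```python
def length_efficiency_points(instructor_statements, student_statements, total_points):
    """Points for the length criterion: (instructor / student) * total points.

    Applied literally as described in the text; no cap is imposed, which is
    an assumption, since the text does not say how shorter student programs
    were handled.
    """
    if student_statements <= 0:
        raise ValueError("student program must contain at least one statement")
    return instructor_statements * total_points / student_statements


# Hypothetical example: the instructor's solution has 40 statements,
# the student's has 50, and the criterion is worth 20 points.
points = length_efficiency_points(40, 50, 20)
print(points)
```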
2.2.1.3 N. MILLER AND C. PETERSON
Miller and Peterson [MiPe80] used forms attached to each program with the evaluation criteria listed, along with the weight given for each criterion. They felt that the weighting factors helped make the grading more objective. was for students who did more than what was required.
Four sample forms were presented by the authors. One was the original form the authors used; the other three were other instructors' adaptations of the original form. The original and one of the adaptations follow:
ORIGINAL APPROACH
Algorithm (10%)
   Structure chart showing calling hierarchy (5%)
   Detailed algorithm expression for each module (5%)
Program style and clarity (25%)
   Internal documentation (10%)
   Meaningful identifiers (5%)
   Formatted listing (10%)
Output (45%)
   Correct for specific input (35%)
   Easy to read (5%)
   Graceful termination (5%)
Refinements above minimum (20%)
   Algorithm clarity, efficiency, and/or elegance (5%)
   "Elegant" implementations (10%)
   Output embellishments (2%)
   Exemplary program design and implementation (3%)

AN ADAPTED APPROACH
Top down design (40%)
   Detailed problem definition (20%)
   Refinement of the problem using a level by level approach (20%)
Program style and clarity (20%)
   Description of all data structures (5%)
   Meaningful identifiers (5%)
   Proper indentation (5%)
   Modular design (5%)
Output (20%)
   Correctness (15%)
   Well organized and readable (5%)
Refinements - Superior work (20%)
   Program length (5%)
   Output embellishments (5%)
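As an illustration of how a weighted form of this kind turns into a grade, the following sketch sums the top-level weights of the original approach against the fraction of each criterion a grader awards. The function name, the dictionary layout, and the sample fractions are assumptions made for the example; Miller and Peterson filled these forms out by hand.

```python
# Top-level weights (percent) taken from the "original approach" form.
ORIGINAL_FORM = {
    "Algorithm": 10,
    "Program style and clarity": 25,
    "Output": 45,
    "Refinements above minimum": 20,
}


def form_grade(weights, fraction_earned):
    """Sum weight * fraction over all criteria.

    fraction_earned maps a criterion name to a value between 0.0 and 1.0;
    criteria missing from the mapping earn nothing.
    """
    total = 0.0
    for criterion, weight in weights.items():
        fraction = fraction_earned.get(criterion, 0.0)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {criterion!r} must be in [0, 1]")
        total += weight * fraction
    return total


# Hypothetical submission: full marks for algorithm and output,
# 80% of the style marks, and no refinements.
sample = {
    "Algorithm": 1.0,
    "Program style and clarity": 0.8,
    "Output": 1.0,
}
print(form_grade(ORIGINAL_FORM, sample))
```

Keeping the weights in one table mirrors the point made by the authors: the weighting is fixed before grading starts, so every program is measured against the same scale.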
2.2.1.4 G. MORGAN
Morgan [Morg82] used the same approach of listing the criteria to evaluate, and then rating each criterion. Morgan used a rubber stamp applied to the front of each program to grade it, rather than an attached form. A sample format for the rubber stamp as it might be filled out follows (the circled rating is shown in parentheses):
Timely               2   3   4  (5)
Problem definition   2   3   4  (5)
I/O design           2   3  (4)  5
Logic design         2  (3)  4   5
Source program       2   3  (4)  5
Test validity        2   3  (4)  5
This student would receive an 83% for the lab, since there were 25 points awarded out of a possible 30.
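The percentage quoted above can be checked with a few lines of code. This is only a sketch: the function name is hypothetical, and the individual ratings are one reading of the circled values on the sample stamp, chosen so that they total the 25 points the text reports.

```python
def stamp_percentage(ratings, max_rating=5):
    """Percentage grade for a rubber-stamp form: awarded / possible * 100."""
    possible = max_rating * len(ratings)
    awarded = sum(ratings.values())
    return 100.0 * awarded / possible


# Assumed reading of the circled ratings; they sum to 25 of a possible 30.
stamp = {
    "Timely": 5,
    "Problem definition": 5,
    "I/O design": 4,
    "Logic design": 3,
    "Source program": 4,
    "Test validity": 4,
}
print(round(stamp_percentage(stamp)))
```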
2.2.1.5 R. HAMM, K. HENDERSON, M. REPSHER and K. TIMMER
Hamm, Henderson, Repsher and Timmer [HHRT83] borrowed an approach used in grading English compositions called the "Diederich Scale". They felt that there was a similarity between writing a computer program and writing an English paper. Thus using a similar approach in the grading of each
The following concepts tied English compositions to computer programs:
-both are the solution to a communication problem
   - the composition communicates with other persons
   - the program communicates with a computer
-both start with an outline or flowchart
-both implement the outline or flowchart
-both have qualities of style and individuality
-both create a heavy paper-load on the instructor
-both students expect a consistent grading between instructors.
The proposed system had a weighting scheme similar to that of Miller and Peterson [MiPe80]. A list of criteria, with a sample weight scale for an English composition follows:
p-poor; a-adequate; g-good
                  p          a          g
ideas             2    4    6    8    10
organization      2    4    6    8    10
flavor            1    2    3    4    5
wording           1    2    3    4    5
usage             1    2    3    4    5
spelling          1    2    3    4    5
punctuation       1    2    3    4    5
A list of criteria, with a sample weight scale for a computer program follows:
p-poor; a-adequate; g-good
                             p          a          g
execution of the program     0    7    13    20
correctness of the output    0    7    13    20
design of the output         0    4    8    12    16    20
design of the logic          0    4    8    12    16    20
design of test data          0    4    8    12    16    20
internal documentation       0    4    8    12    16    20
external documentation       0    4    8    12    16    20
A program was written to generate a specific form for each assignment, so that changes to the list of criteria graded and the weight assigned to each could be easily made. The form would contain some identifying information and the criterion and weight assigned to each criterion for each assignment (similar to the previous example). The forms for the appropriate assignment would then be filled out by the
2.2.2 AUTOMATED APPROACHES
This section includes three different approaches to the automatic grading of programming labs.
2.2.2.1 S. ROBINSON and S. TORSUN
Robinson and Torsun [RoTo77] used an approach whereby the set of submitted solutions was automatically assessed relative to a solution produced by the instructor. They used a program which classified each source statement by its relative importance to the execution of the whole program; then a report was produced that listed the following for each program statement:
-an estimate of execution time (a),
-a count of the frequency of execution (b),
-an estimate of total execution time (a*b).
From these three values the importance factor for each statement was calculated. The importance factor was the relative contribution of a statement to the overall execution time of the program, expressed as ten times the percentage of total execution run time. Thus an importance factor could range from 0 to 1000, with a larger value indicating a higher cost statement. The importance factor was then used to produce a graph, the x coordinate being the importance factor and the y coordinate being the statement's rank in order of importance. The student's graph was then compared
Robinson and Torsun showed that as programming style was improved the graph would mold to the instructor's solution.
A major problem with this method, as with the other automated methods, was that if output was not also looked at, the program could fit into the correct measurements, yet not solve the problem it was designed for. Also, this system will not mark or take into consideration original solutions or readability [RoTo77].
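The importance factor described in this section can be sketched in a few lines. This is an illustration built only from the definition given in the text (ten times the percentage of total run time, so values run from 0 to 1000); the function name, the tuple layout, and the sample statements are hypothetical and are not taken from Robinson and Torsun's program.

```python
def importance_factors(statements):
    """Rank statements by importance factor.

    statements: list of (label, a, b) where a is the estimated time of one
    execution and b is the number of times the statement executed.
    Returns (label, factor) pairs, highest factor first, where
    factor = 10 * percentage of total run time = 1000 * (a*b) / total.
    """
    costs = [(label, a * b) for label, a, b in statements]
    program_total = sum(cost for _, cost in costs)
    if program_total <= 0:
        raise ValueError("total execution time must be positive")
    factors = [(label, 1000.0 * cost / program_total) for label, cost in costs]
    factors.sort(key=lambda pair: pair[1], reverse=True)
    return factors


# Hypothetical three-statement program: the loop body dominates the run time.
report = importance_factors([
    ("read input", 2.0, 1),
    ("loop body", 1.0, 7),
    ("print result", 1.0, 1),
])
for rank, (label, factor) in enumerate(report, start=1):
    print(rank, label, factor)
```

Plotting factor against rank for a student program and for the instructor's solution gives the two curves that the method compares.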
2.2.2.2 S. ROBINSON (ITPAD)
The approach of Robinson [RoSo80] was to use modified code optimization techniques and software science measures to analyze FORTRAN source programs. Each student's program went through three phases of analysis. With the information that the system collected, the following functions were performed:
-each student's program was examined visually for certain design requirements,
-the progress of a student through a quarter was evaluated,
-the programming assignments were evaluated to see if the student was using the desired concepts,
-possible plagiarism was looked for.
The first phase was the lexical analysis phase. In this phase fourteen program characteristics (listed in Figure 1.2) were tracked. The information gathered in this phase was used to create two profiles, the student's profile and the assignment's profile.
The student's profile contained information about the control structures, retreating edges, and data structures that a student employed throughout the quarter. A retreating edge is the edge of a program graph which represents a return to the beginning of a loop. The student's profile aided in determining whether or not a student had mastered a particular topic. An instructor's model program was used for comparing programs. Following is a sample of the information contained in a student's profile for four programming
STUDENT PROFILE - OUTPUT FROM S. ROBINSON
************************************************************
ASSIGNMENT NUMBER        1    2    3    4
************************************************************
Control Structures:
   if-then               12 3
   if-then-else          1
   else-if
   Logical if
      with goto          4
      without goto
   while                 2
   for
   indexed do            1 3
   goto                  5
Data Structures
   real                  3 5
   integer               56 6 5
Basic Blocks             8 23 12 6
Retreating edges         7-2 22-2 11-23-3
The assignment profile contained the software science measures: the control structures used, the retreating edges, the number of basic blocks and the data structures used by the students for each assignment [RoSo80]. The assignment profile gave insight into how effective a programming assignment was at displaying a student's understanding of a particular concept. It did so by revealing the general concepts used by the students to solve the problem. Did the students use the new material in the assignment, or did they use older material that they felt more comfortable with?
Three sample assignment profiles from Robinson's approach follow. This information contains the different ranges of values (low, model, high) for each criterion.
ASSIGNMENT PROFILE - OUTPUT FROM S. ROBINSON
************************************************************
Assignment Profile of Program 1
************************************************************
                        low     model    high
unique variables        3       4        13
total variables         14      18       46
unique operators        4       5        8
total operators         10      13       34
assignments             1       5        11
length                  24      31       76
vocabulary              9       9        19
volume                  79      98       323
level                   .30     1.0      1.3
intelligence content    6.1     8.7      30.5
effort                  59      98       1061
leaders                 2       3        16
************************************************************
Assignment Profile of Program 2
************************************************************
                        low     model    high
unique variables        6       7        17
total variables         32      46       95
unique operators        4       5        7
total operators         13      25       53
assignments             4       6        26
length                  45      71       132
vocabulary              10      12       23
volume                  150     254      565
level                   .48     1.0      1.12
intelligence content    10.9    15.5     99.0
effort                  87      254      1045
leaders                 4       17       36
************************************************************
Assignment Profile of Program 3
************************************************************
                        low     model    high
unique variables        7       9        19
total variables         19      21       34
unique operators        4       6        10
total operators         10      13       80
assignments             6       7        39
length                  31      47       189
vocabulary              12      15       22
volume                  111     184      756
level                   .24     1.0      1.62
intelligence content    12.4    16.2     44.0
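The software science rows in these profiles can be related to the raw counts with a short calculation. The sketch below assumes Halstead's standard definitions (vocabulary n = n1 + n2, length N = N1 + N2, volume V = N log2 n, intelligence content I = (2/n1)(n2/N2) V) and treats the "variables" rows as operand counts; neither assumption is stated in this section, but with them the model column of Program 1 is reproduced. Level and effort are left out because the profiles appear to scale them relative to the model program, and that scaling is not described here.

```python
import math


def software_science(n1, n2, N1, N2):
    """Halstead-style measures from operator and operand counts.

    n1, n2: unique operators, unique operands
    N1, N2: total operators, total operands
    The formulas are the standard Halstead definitions, assumed here.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    intelligence_content = (2.0 / n1) * (n2 / N2) * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "intelligence content": intelligence_content,
    }


# Model column of Program 1: 5 unique operators, 4 unique variables,
# 13 total operators, 18 total variables.
model = software_science(n1=5, n2=4, N1=13, N2=18)
for name, value in model.items():
    print(name, round(value, 1))
```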
The second phase was the analysis of program structure. This phase obtained characteristics by:
-dividing the program into basic blocks,
-constructing a flow graph from basic blocks; the flow graph was constructed by examining all statements that could cause transfer to other basic blocks,
-constructing a directed acyclic graph (DAG) for each basic block; the DAG presents a picture of how the value computed by each statement in a basic block was used by subsequent statements in the block,
-performing data flow analysis on the flow graph,
-detecting loops in the flow graph.
The third phase was the analysis of program characteristics. This phase analyzed the characteristics detected in the second phase to determine if the student should receive a message containing advice on how they might improve their program. The messages were selected from common programming errors and could be specialized by the instructor for an individual assignment [RoSo80].
This implementation concentrated on evaluating a programming assignment, but unlike the other approaches, it assigned no grade to a student's program.
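One common way to find the retreating edges mentioned in this section is a depth-first walk of the flow graph: an edge is retreating when it leads back to a block that is still on the current depth-first path. The sketch below uses that textbook technique; it is not taken from Robinson's system, and the function name, the dictionary representation of the flow graph, and the four-block example are all hypothetical.

```python
def retreating_edges(flow_graph, entry):
    """Return edges (tail, head) whose head is still open on the DFS path.

    flow_graph maps each basic block to the list of blocks it can transfer
    control to. Blocks that appear only as successors are treated as having
    no outgoing edges.
    """
    UNVISITED, OPEN, DONE = 0, 1, 2
    state = {}
    found = []

    def visit(block):
        state[block] = OPEN
        for successor in flow_graph.get(block, []):
            successor_state = state.get(successor, UNVISITED)
            if successor_state == OPEN:
                # The successor is an ancestor on the current path: a loop.
                found.append((block, successor))
            elif successor_state == UNVISITED:
                visit(successor)
        state[block] = DONE

    visit(entry)
    return found


# Hypothetical flow graph: block 3 jumps back to block 2, forming a loop.
graph = {1: [2], 2: [3], 3: [2, 4], 4: []}
print(retreating_edges(graph, entry=1))
```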
2.2.2.3 M. REES (STYLE)
Michael Rees' [Rees82] approach to grading an assignment's style was called STYLE. STYLE was designed to accept as input the source of a syntactically correct program, make measures on individual criteria in one pass, on
The final mark was influenced by a weighting table supplied by the instructor. The data collected on each assignment follows, along with, in parentheses, whether the value should be a high or low number and notes on changes made by Rees to his original implementation:
Layout
-line length - the average number of "significant" characters per line (LOW),
-comments - percentage of all program lines comprised wholly or partially of comments (HIGH),
-indentation - percentage of lines indented in any way (changed to calculate changes of indentation on a line by line basis) (HIGH),
-blank lines - percentage of blank lines in a program (changed so that blank lines were subtracted from the total line count before other measures were calculated) (HIGH),
-embedded spaces - additional spaces embedded within a line (HIGH).
Identifiers
-program decomposition - number of procedures and functions (HIGH). By dividing this figure into the total number of lines, a measure of module length was obtained (LOW),
-variety of reserved words - count of the number of different reserved words used (HIGH),
-length of identifiers - average length of all the programmer-defined identifiers (HIGH),
-variety of identifiers - number of different programmer-defined identifiers (changed to number of different identifiers as a function of program length) (MID),
-labels and gotos - count of the number of occurrences of the reserved words "label" and "goto"
The grade for each value was obtained using the following parameters:
-max_score - the maximum percentage mark allowed for the criterion,
-lo_max, hi_max - the low and high value range of the criterion which will yield the maximum grade for that criterion,
-lo_no to lo_max - the interval of the criterion which will yield a grade from zero to the max_score on a linear basis,
-hi_max to hi_no - the interval of the criterion which will yield a grade from the max_score to zero on a linear basis,
-lo_no, hi_no - any criterion value below lo_no or above hi_no yields a zero mark.
A visual representation of how these values work follows:
max_score            __________
                    /          \
                   /            \
0         ________/              \________
              lo_no  lo_max  hi_max  hi_no
An illustration of the grading of a criterion follows. If an instructor wished to grade commenting as 10% of the grade and was looking for between 50% and 70% commenting for a perfect grade, and for anything less than 20% or greater than 90% as being a zero grade, then the system parameters would be:
-max_score = 10
-low_no = 20
-low_max = 50
-hi_max = 70
-hi_no = 90
10 points            __________
                    /          \
                   /            \
0         ________/              \________
                 20    50    70    90
                 % of comments in program
This would result in assigning 0 points for less than 20% or greater than 90% comments; 10 points for between 50% and 70% comments; and a linear grade between 0 and 10 points for between 20% and 50% or 70% and 90% comments.
The sum of each criterion's weighted grade yielded the style grade. A sample setting for the parameters (max_score, low_no, low_max, hi_max, hi_no) and output from two sample programs follow.
Another observation made by Rees was that programs which used some form of prettyprinting before being graded
OUTPUT FROM REES
************************************************************
SAMPLE OF PARAMETER SETTINGS
************************************************************
Measure             max_score   low_no   low_max   high_max   high_no
chars/line          15          12       15        25         30
% comments          10          15       20        25         35
% indentation       12          60       70        80         90
% blank lines       5           8        10        15         20
% spaces            8           8        12        18         20
proc/fnc length     20          10       20        35         50
# reserved words    10          22       26        40         41
id. length          20          7        9         15         16
# identifiers       0           0        0         0          0
label and gotos     -20         1        3         199        200
************************************************************
OUTPUT OF THE PARAMETERS FOR TWO COURSES
************************************************************
                    Program 1                 Program 2
                    Ave. 350 lines Pascal     Ave. 750 lines Pascal
Measure             low    mean   max         low    mean   max
chars/line          14     20     34          9      13     18
% comments          3      21     31          0      16     35
% indentation       0      74     98          39     72     94
% blank lines       0      5      27          0      17     33
% spaces            2      7      55          3      11     20
proc/fnc length     15     32     174         17     37     77
# reserved words    10     23     26          17     23     29
id. length          5      8      10          7      10     15
# identifiers       24     46     87          13     41     97
labels and gotos    0      0      3           0      0      3
Marks               35     60     84          44     64     95
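The piecewise-linear marking rule that STYLE applies to each criterion can be written as one small function. The sketch follows the parameter definitions given earlier in this section and checks itself against the commenting illustration (10 points, zero below 20% or above 90%, full marks from 50% to 70%); the function name and argument order are assumptions, not Rees's code.

```python
def criterion_mark(value, max_score, lo_no, lo_max, hi_max, hi_no):
    """Mark for one style criterion on a trapezoid-shaped scale.

    Zero outside [lo_no, hi_no], max_score inside [lo_max, hi_max],
    and a straight-line ramp on either side.
    """
    if value < lo_no or value > hi_no:
        return 0.0
    if value < lo_max:
        # Rising ramp from 0 at lo_no to max_score at lo_max.
        return max_score * (value - lo_no) / (lo_max - lo_no)
    if value > hi_max:
        # Falling ramp from max_score at hi_max to 0 at hi_no.
        return max_score * (hi_no - value) / (hi_no - hi_max)
    return float(max_score)


# Commenting example from the text: worth 10 points, ideal range 50% to 70%.
comment_params = dict(max_score=10, lo_no=20, lo_max=50, hi_max=70, hi_no=90)
for percent_comments in (10, 35, 60, 80, 95):
    print(percent_comments, criterion_mark(percent_comments, **comment_params))
```

The style grade is then the sum of the marks over all criteria, each computed with its own row of parameters.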
2.3 SUMMARY
Both the automated and manual tools offered the same benefit to the student and instructor: they provided a consistent grading method. The automated approaches also offered the benefits of being efficient for the instructor and objective for the student. Style is not the only aspect of a program that should be looked at by the instructor, but the tools reported on could aid in making the evaluation of this category efficient and objective.
In partial fulfillment of this thesis an automatic style grader was developed with a similar grading approach
3. PLAGIARISM
3.1 INTRODUCTION AND BACKGROUND
The possibility of plagiarized programs compounds the already difficult responsibility of evaluating students' programs.
The acquisition of skills in computer programming can be, and often is, a challenging and rewarding experience. Unfortunately, the need to teach larger classes consisting of a wider variety of students has introduced many problems. Outstanding among these is the tendency of students to resort to unorthodox means in fulfilling course requirements. In other words, students cheat [Mill81].
There are a variety of reasons and pressures which cause students to cheat on programming assignments: some students plagiarize because they cannot do the work themselves, some students plagiarize to prove they can pull a fast one on the instructor and get away with it, some students desire to get something for nothing, and other students only cheat on assignments that they feel are busy work [Mill81]. But the biggest reason of all was that the monetary and social rewards were very attractive, or at least perceived as such, in this field [HwGi82].
Students should be given a sense of values regarding their chosen field. Employers who hire Computer Science graduates should be able to trust a student's knowledge and ability in the subject [Mill81]. Thus a responsibility to
When cheating occurred in courses it:
-failed to establish a standard of professional integrity,
-reduced the ability to make accurate assessments of students' skills,
-demoralized honest students who feel (often with reason) that they were in competition with the cheaters,
-wasted the energy of both faculty and students,
-encouraged the cheaters to believe that cheating pays and that good grades were a substitute for understanding [Shaw80].
Students can plagiarize programming assignments in a variety of ways:
-copying a program and changing only the author's name,
-copying a program and changing the documentation,
-copying a program and changing the variable names,
-transposing statements when the ordering of the statements does not affect the results,
-breaking up single statements such as declarations and output statements,
-stealing programs written by other students,
-copying a program and changing the logic a little,
-copying a program and changing the logic a lot,
-copying a program given in an earlier class,
-having someone else write all or part of the program,
-copying a program by changing only the line numbers (Basic and Fortran),
[HwGi82], [DLSp81], [Mill81].
A discussion of ways in which students plagiarize and some methods used in dealing with plagiarism follows. Preventive approaches are discussed in Section 3.2.1 and
3.2 PREVIOUS WORK
Detection of plagiarized programs is a complicated issue. Both Ottenstein [Otte77] and Donaldson, Lancaster and Sposato [DLSp81] realized that using a grader alone was inadequate for detecting plagiarized programs. In the area of plagiarism prevention there were a variety of approaches; the next section will discuss some of them.
3.2.1 PREVENTIVE APPROACHES
Hwang and Gibson [HwGi82] summarized five different approaches to dealing with plagiarism and their success with these approaches. Included in the following discussion are references to other authors' research that strengthened their position.
1. Set up a punishment policy to discourage students from cheating.
Hwang and Gibson [HwGi82] felt that this method was ineffective, since it was essentially negative. A totally negative attitude was not the complete solution to the problem, but it was part of the solution. Miller [Mill81] stated that the consequences of plagiarizing should be reasonable yet severe enough to point out that it will not be tolerated. Whatever penalties were declared, offenders must be dealt with fairly and firmly. The student should be aware of what the consequences of plagiarism will
A list of possible disciplinary actions is given below:
-actions within the course,
   -sharing the grade among guilty students [Mill81],
   -negative credit for the assignment,
   -no credit for the assignment and loss of a letter grade for the course,
   -makeup assignment over the same material, no credit,
   -forced drop in the course,
   -failure in the course,
-actions within the Computer Science Department,
   -suspension from Departmental courses for a designated period,
   -expulsion from Departmental courses,
-actions by the University,
   -warning,
   -probation,
   -suspension from the University for a designated period,
   -expulsion from the University [Shaw80].
2. Set up a software plagiarism detection system.
Hwang and Gibson [HwGi82] questioned whether this approach would catch every type of cheating; they felt it might be rather expensive. This approach is covered in Section 3.2.2.
3. Raise the consciousness of the students to understand and appreciate what they must know in order to obtain a degree.
This was a positive approach, but Hwang and Gibson [HwGi82] realized that students were too interested in
4. Inform the students that they may be called into the office at any time to verify what they "claim" to have learned on a programming assignment.
Hwang and Gibson [HwGi82] felt this method was a cynical approach which bred mistrust and was not too effective. It was also apt to invite confrontations between students and instructors.
5. Assign grades according to the ratio of programming assignments and exams (including routine quizzes).
This was the method supported by Hwang and Gibson. Six different ratio methods were discussed with the advantages and disadvantages of each in the article [HwGi82]. The methods, along with Hwang and Gibson's labels, are listed below:
A - exams weighted proportionately heavier than programming assignments,
B - programming assignments weighted proportionately heavier than exams,
C - exams and programming assignments weighted approximately equally,
D - final exam used as evidence of what the student had learned - Fail the final - Fail the course
E - programming assignment related quiz associated with each programming assignment,
X - percentage on programming assignment-related quiz applied to the score on the programming assignment,
    Example: 100 points total
             80 points on program assignment
             90% points for programming assignment related quiz
             80 * 90% = 72 total points on the project
Y - score obtained on the programming assignment-related quiz added to the score obtained on the programming assignment,
    Example: 100 points total (50/50)
             40 points for quiz
Hwang and Gibson [HwGi82] discarded methods B, C and E as being too lenient on those who cheat; methods D and X were better but could penalize the honest student if they happen to have a bad day. Thus methods A and Y were the better choices, with Y being the best because of its fewer listed disadvantages.
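The difference between methods X and Y is easiest to see side by side. In the sketch below, method X scales the assignment score by the quiz percentage, as in the 80-point example above, and method Y adds the two scores. The function names are hypothetical, and the 45 assignment points used for method Y are an invented value for illustration only; the source example gives only the 40 quiz points.

```python
def method_x(assignment_points, quiz_percent):
    """Method X: the quiz percentage is applied to the assignment score."""
    return assignment_points * quiz_percent / 100.0


def method_y(assignment_points, quiz_points):
    """Method Y: the quiz score is added to the assignment score."""
    return assignment_points + quiz_points


# Method X example from the text: 80 points on the assignment, 90% on the quiz.
print(method_x(80, 90))

# Method Y on a 50/50 split: 40 quiz points as in the text, plus an
# assumed (hypothetical) 45 of the 50 assignment points.
print(method_y(45, 40))
```

Under method X a poor quiz result pulls down points already earned on the assignment, whereas under method Y it only forfeits the quiz half of the total.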
The advantages of method Y were as follows:
-encourages students to do the programming assignments in order to do well on the programming assignment quiz,
-the total grade actually represented the student's understanding of the programming assignment,
-the grade was proportional to the time and effort expended,
-the method represented the students' grade very well for all unexpected situations.
The disadvantages of Method Y were as follows:
-if the student had a bad day, the grade on the quiz would not represent their true ability,
-if the programming assignment quiz did not represent the programming assignment well the grade would not represent the student's
Shaw [Shaw80] outlined a series of actions which an instructor could use when accusing a student of cheating:
- make copies of the evidence (Ex. program) for the student and the Department, retaining the original,
- in the presence of a witness, confront the student with the allegation,
- if after the confrontation the instructor decided to impose a penalty, the instructor should so inform the student by letter; the letter should state the basis for the action, the assigned penalty, and the student's right to appeal to the University Committee on Discipline within one calendar week.

Throughout the entire process, it was essential that all meetings, decisions, and actions be documented in writing.

Shaw also outlined actions the computer science department, the computer center and the faculty could follow for the prevention and detection of plagiarism. These are listed below:
- possible department actions:
    - develop an on-line system to detect programs that were similar (see Section 3.2.2),
    - provide an adequate number of available, knowledgeable consultants to advise students in the lower level courses,
    - establish facilities for in-class examination of the student's programming,
    - maintain records on cheating incidents in department courses,
    - spread the word that the department does not condone cheating,
- possible computer center actions:
    - upgrade on-line assistance including help facilities, debuggers, and on-line explanations of routinely encountered errors,
    - routinely provide information on computer usage including the amount of time each student is connected to various systems,
    - provide closed trash cans for the disposal of program listings,
- possible instructor actions:
    - provide students with a hand-out stating the cheating policy and disciplinary action,
    - base judgement on the student's mastery of course material on work done in monitored situations to the extent educational objectives permit,
    - provide guidelines for user consultants, indicating what kinds of help they should and should not give to students in each course,
    - use any automatic detection procedures that become available.
3.2.2 DETECTION APPROACHES

In the plagiarism detection process, instructors have designed systems to detect similarities in student programs. The next sections will present five different views on the detection of plagiarism.

3.2.2.1 K. OTTENSTEIN
The earliest article available on this subject was Ottenstein [Otte77]. His approach was conservative but effective. His method was to count Halstead's [Hals77] software science criteria for each student's program:
- n1 - the number of unique operators,
- n2 - the number of unique operands,
- N1 - the total number of occurrences of operators,
- N2 - the total number of occurrences of operands.

Operators consisted of control structures as well as the normal program operators. Reserved words other than control structures were not counted. Each occurrence of an operator or operand was called a token. He also calculated the size of the program (in tokens), N, which was N1 + N2. These five values were assigned to each program and were the basis for comparison between programs.

Ottenstein's reporting of this information was very simple. The values n1, n2, N1, N2, and N for each student were reported on a line. These lines were sorted by program size (N). Thus an instructor could look back at the other values to determine if there was a need to manually review the similar programs.

This approach was successful in detecting programs changed by:
- reordering time independent statements,
- recommenting,
- reformatting of the text,
- renaming the variables and labels.

It would not detect a student who cheated on only part of a program.
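The counting and reporting scheme described for Ottenstein's method can be sketched as follows. This is a minimal illustration, not Ottenstein's program: it assumes the tokens have already been classified as operators or operands (the classification rules above are not implemented), and the names `Counts`, `halstead_counts`, and `report` are hypothetical.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    n1: int  # number of unique operators
    n2: int  # number of unique operands
    N1: int  # total occurrences of operators
    N2: int  # total occurrences of operands
    N: int   # program size in tokens, N1 + N2


def halstead_counts(operators: list[str], operands: list[str]) -> Counts:
    """Compute the five values from already-classified token lists."""
    N1, N2 = len(operators), len(operands)
    return Counts(len(set(operators)), len(set(operands)), N1, N2, N1 + N2)


def report(programs: dict[str, Counts]) -> list[tuple[str, Counts]]:
    """One entry per student, sorted by program size N."""
    return sorted(programs.items(), key=lambda item: item[1].N)


# Hypothetical token lists for the fragment:  x := x + 1; y := x * 2
ops = [":=", "+", ";", ":=", "*"]
opnds = ["x", "x", "1", "y", "x", "2"]
counts = halstead_counts(ops, opnds)
print(counts)
```

Because only counts are compared, renaming `x` to `total` or reformatting the line changes none of the five values, which is consistent with the kinds of disguise the approach was reported to catch.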
Donaldson, Lancaster and Sposato [DLSp81] questioned how effective this method was for introductory courses where there may be only slight variation in the final results.

3.2.2.2 S. ROBINSON (ITPAD)
Robinson and Soffa's [RoSo80] approach for the collection and reporting of student program information was expanded to detect plagiarism (see Section 2.2.2.2 for a discussion of the basic implementation). The method for detecting possible collaborators followed this procedure:
- group the programs by the number of leaders (leaders were a type of statement),
- compare the number of statements in each basic block, then eliminate the programs which match less than 50% of the time,
- compare the control structures and retreating edges, then eliminate the programs that have different values,
- compare the data structures, and eliminate programs with a difference of more than one for each data
Since this approach had more detail and was less restrictive in selecting similar programs than Ottenstein [Otte77], it matched more students. The question is whether the extra information was worth the extra time and resources required. The Robinson results did not show much justification for the extra detail. In fact, after visual inspections most of the extra programs which they selected appeared not to have been plagiarized.

3.2.2.3 S. GRIER (ACCUSE)
Grier's [Grie81] approach to plagiarism detection is an extension of Ottenstein's [Otte77]. Grier's program, ACCUSE, calculated the four Halstead [Hals77] software science criteria plus 16 others (see Figure 1.2). Through testing different combinations of the 20 elements, seven were retained to determine a correlation number.

An interesting calculation was used by Grier [Grie81] to determine the correlation number between two programs. The correlation scheme involved computing an increment for each pair of affected programs based on the equation:

    increment = "importance factor" - (pcounta - pcountb)

where pcounta and pcountb represent criterion counts for the two programs compared. If the (pcounta - pcountb) was less than or equal to some "window size", depending on the par

The importance factor was the weight for each criterion which affected the increment value. Each of the seven increments was totaled to form a correlation number.

The following is a list of the seven increments discussed; listed also are each increment's window size and importance factor, and notes on how they were calculated:
- Unique operators (Begin and End ignored)
    window size 5
    importance factor 6
- Unique operands (for each assignment operator two operands were subtracted)
    window size 5
    importance factor 6
- Total operators (does not include assignment operators, Begin and End ignored)
    window size 3
    importance factor 5
- Total operands
    window size 3
    importance factor 5
- Code lines (decremented for each assignment operator; ignore blank lines, comments, and declarations; count only executable lines of code)
    window size 3
    importance factor 5
- Variables declared (and used)
    window size 2
    importance factor 3
- Total control statements
    window size 1
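The correlation scheme can be sketched as below. This is a hedged illustration, not the ACCUSE source, and it rests on three assumptions that the surviving text does not confirm: the difference between counts is taken as an absolute value, a criterion whose difference falls outside its window contributes nothing, and the importance factor for total control statements (missing above) is 2, chosen only because it makes identical programs total 32, the maximum correlation number reported for ACCUSE. The dictionary keys and the function name are hypothetical.

```python
# (window size, importance factor) for each criterion, as listed in the text.
CRITERIA = {
    "unique_operators": (5, 6),
    "unique_operands": (5, 6),
    "total_operators": (3, 5),
    "total_operands": (3, 5),
    "code_lines": (3, 5),
    "variables_declared": (2, 3),
    # ASSUMPTION: the importance factor is missing from the source; 2 is used
    # only because it brings the maximum total to the stated value of 32.
    "control_statements": (1, 2),
}


def correlation(counts_a: dict[str, int], counts_b: dict[str, int]) -> int:
    """Sum the per-criterion increments for one pair of programs."""
    total = 0
    for name, (window, importance) in CRITERIA.items():
        # ASSUMPTION: the difference is taken as an absolute value.
        diff = abs(counts_a[name] - counts_b[name])
        # ASSUMPTION: outside the window the criterion adds nothing.
        if diff <= window:
            total += importance - diff
    return total


# Hypothetical counts for two student programs.
prog_a = {name: 10 for name in CRITERIA}
prog_b = dict(prog_a, unique_operators=12, control_statements=13)

print(correlation(prog_a, prog_a), correlation(prog_a, prog_b))
```

With these assumptions a high total means the two programs agree closely on most criteria, so pairs near the maximum are the ones worth inspecting by hand.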
Grier also produced the following five reports:
- a report of the 20 criteria listed in Figure 1.2, as measured by ACCUSE, for each student's program,
- a report of the 7 criteria listed in Figure 1.2 used to compute the correlation number, for each student's program,
- a triangular matrix whose entries are the correlation numbers between each program pair,
- a frequency distribution graph that indicates the number of pairs of programs with the same correlation numbers,
- a list of all pairs of programs which have a correlation number greater than or equal to 28 (32 was the maximum correlation number).

These were then used to determine which programs might have been plagiarized. A manual inspection of each suspected program was still necessary. This approach was successful in detecting programs changed by the following means:
- reordering of time independent statements,
- recommenting,
- reformatting of text,
- renaming variables and labels,
- adding unnecessary initialization and assignment statements,
- adding excess declarations.
ACCUSE was designed to be as inexpensive to use as possible. Thus the idea of utilizing a front end of a compiler was replaced with Ottenstein's [Otte77] approach of a fast counter. The result was a compromise between speed and

3.2.2.4 J. DONALDSON, A. LANCASTER, and P. SPOSATO

Donaldson, Lancaster and Sposato's [DLSp81] approach to plagiarism included two data collection phases: one to gather information on the structure of the program, and the other to gather information on the content of the assignment. There were also two data analysis phases, one to evaluate each type of information collected.

The first data collection phase (for FORTRAN assignments) kept track of the following criteria:
1. total number of variables,
2. total number of subprograms,
3. total number of input statements,
4. total number of conditional statements,
5. total number of loop statements,
6. total number of assignment statements,
7. total number of calls to subprograms,
8. total number of statements of type 2-7.
The second data collection phase characterized the assignment by the order in which statements occurred. Each type of statement was given a character code (Ex. X for logical if). As the assignment was processed, a string of character codes was produced.
The first data analysis phase performed the following three types of calculations on the information gathered in the first data collection phase:

1. Sum of the difference - corresponding criterion values were subtracted and the absolute values of the differences were summed. This gave some indication of how two assignments differed in content.

2. Count of similarity - each similarity factor started at zero and was incremented by one for each corresponding criterion value which was equal. This showed how many criterion values were equal but not which ones.

3. Weighted count of similarity - this method was an extension of number 2 above. Instead of incrementing by one, the increment was by the weight given to the criterion value. This allowed the instructor to weight each criterion value according to what was expected of the particular assignment.
The second data analysis phase worked with the string of character codes from the second data collection phase. It compressed identical characters in succession. The resulting strings were compared. If all the characters of the string matched that of another student, it meant the two

This approach was successful in detecting programs changed by the following means:
- transposing statements when the ordering of the statements does not affect the results,
- altering format statements,
- breaking up single statements such as declarations and output statements,
- renaming variables and labels,
- recommenting.
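The compression and comparison performed in the second data analysis phase can be sketched as follows. This is a minimal sketch under stated assumptions: only the code X for a logical IF comes from the text, while the codes A (assignment), L (loop), and W (output), along with the function names, are hypothetical.

```python
from itertools import groupby


def compress(codes: str) -> str:
    """Collapse each run of identical character codes to one character."""
    return "".join(key for key, _ in groupby(codes))


def same_structure(codes_a: str, codes_b: str) -> bool:
    """True when the compressed statement-order strings are identical."""
    return compress(codes_a) == compress(codes_b)


# Hypothetical statement-order strings for two assignments. The second one
# splits an assignment into two and merges two output statements into one.
student_1 = "AAXLAWW"
student_2 = "AAAXLAW"

print(compress(student_1), compress(student_2),
      same_structure(student_1, student_2))
```

Collapsing runs is what makes the comparison insensitive to breaking one statement into several of the same type, one of the disguises listed above.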
3.2.2.5 M. REES (CHEAT)

The method for detecting plagiarism started with Rees' STYLE program (see Section 2.2.2.3). Robson [Rees82] added to STYLE and created a post processor called CHEAT which looked for similar programs. After some experimenting the following criteria were selected for comparison:
- total of non-comment characters,
- % of embedded spaces,
- number of reserved words,
- number of identifiers,
- total number of lines,
- number of procedure/functions.
This approach was similar to the others in that the criteria for each student were compared. Student programs with similar values were then verified for possible plagiarism.

3.3 SUMMARY
The tools and techniques for the detection of plagiarism can only point to possibly plagiarized programs. It is still necessary to manually inspect the suspected programs to confirm plagiarism. The tools should report broad enough information on possible plagiarism so that changes in plagiarism approaches will be flagged. If a plagiarism tool is designed too restrictively it may create a false sense of security for the instructor. The benefit of having both a plagiarism policy and detection mechanism was to create a cheating deterrent.

In partial fulfillment of this thesis a tool was developed that uses a counting approach similar to Donaldson, Lancaster and Sposato [DLSp81]. See Section 7 for
4. PROGRAM DOCUMENTATION

4.1 INTRODUCTION AND BACKGROUND
A program that is easy to read and understand is easier to test, maintain and modify [Clif78]. Failure to adequately document software leads to: higher production and maintenance costs, customer
dissatisfact