Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
5-1-1995
The Design, construction, and implementation of
an engineering software command processor and
macro compiler
Jesse Coleman
Follow this and additional works at:
http://scholarworks.rit.edu/theses
Recommended Citation
THE DESIGN, CONSTRUCTION, AND
IMPLEMENTATION OF AN ENGINEERING
SOFTWARE COMMAND PROCESSOR AND
MACRO COMPILER
by
Jesse J. Coleman
A Thesis Submitted
in
Partial Fulfillment
of the
Requirements for the
MASTER OF SCIENCE
in
Mechanical Engineering
Approved by:
Professor
[Names Illegible]
Thesis Advisor
Professor
_
Professor
_
professor_-;;_--,---;-;-;---;-_ _
Department Head
Title of thesis
THE DESIGN, CONSTRUCTION, AND IMPLEMENTATION OF AN
ENGINEERING SOFTWARE COMMAND PROCESSOR AND MACRO COMPILER
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _hereby grant permission to the
Wallace Memorial Library of RIT to reproduce my thesis in whole or in part. Any reproduction will
not be for commercial use or profit.
S-ABSTRACT
This
paper presentsthedesign
and construction of a softwaretranslator thatservesas afoundation,
or centralengine,
aroundwhichan entireengineering
softwaresystem canbe
constructed.
To
providetheuserwitha powerfulinterface
todrive
an application, ahigh-level
procedural
language
similartoFORTRAN
orBASIC is integrated into
the translator.An
application shellwasalsowrittento providethe user withan
interactive
commandline
environment
for using the translator. The
translator, language,
and applicationshelltogethermechanize a
programming
andcommandinterpreter
environment.Users
caninteractively
entercommands
from the keyboard
orload
andprocess prewritten macrofiles
from disk. The
language
givesusersthe ability to
createvariables, arrays,andfunctions,
process complexmathematical expressions, and
develop
sophisticated macro programs.The language is
quitecapableof
solving very
complexengineering
problems.Several engineering
examples arepresented
including
a solutiontoafour-bar
crankmechanism,adding
a materiallibrary
toanapplication, acommand
line integration
solver,aRunge-Kutta
routinefor solving
sets ofdifferential
equations,and a convolutionintegral
routine.The translator is modular, easily
extensible,written
entirely in
C/C++,
andreadily
portabletodifferent
platforms.A
set ofdiagnostic
toolsis integrated into
the translator toaidthedeveloper in future development
work.Complete
theory
anddesign details for
allphasesofthe translatorandlanguage
are presented.Performance
issues
are studiedincluding
acomparison against C/C++ andMS-DOS Qbasic.
Exploration
in
application integrationfor
a simulation packagesimilartoCSMP is investigated. A
TABLE OF CONTENTS
LIST OF
TABLES
viiLIST OF FIGURES
viiiLIST OF SYMBOLS
ix
1. INTRODUCTION
1
2.
TRANSLATOR
THEORY
6
2.1 Overview
6
2.2
Grammars
9
2.2.7
Context-Free Grammars
92.2.2 Parse
Trees
122.3
Syntax
Analysis
13
2.3.1 Recursive DescentwithPredictive
Parsing
132.3.2 Noteon
Left
Recursion 152.4
Semantic Analysis
15
2.5
Syntax-Directed Translation
16
2.6 Lexical Analysis
19
2.7
Static
Checking
20
2.8
Intermediate
Code Generation
21
2.9 Code Optimization
21
2.10 Code
Generation
22
2.11
Interpretation
22
2.12 Threaded
Code
Interpreter/Compiler
23
2.12.1 PrimitivesandSecondaries 24
2.12.2
Dictionary
252.12.3 StacksandRegisters 26
2.12.4 Outer Interpreter 26
2.12.5 Inner Interpreter 27
3.
TRANSLATOR
DESIGN
33
3.1 Overview
33
3.2 Lexical Analysis
36
3.2.1 Transliterator
37
3.4.1 Table Structure
493.4.2 Scope
andBinding
Time 503.4.3 Record
Formats50
3.4.4 Service
Routines55
3.5 Error
Module
56
3.5.1 Interface
RoutinesandRecord Format 563.5.2 Error Management
Rules 573.5.3 Example
583.5.4 Design
Note- Multiple ErrorRegistration
59
3.6
Threaded
Interpreter/Compiler
60
3.6.1 Primitives
andSecondaries62
3.6.2
TIL
KeywordDictionary
693.6.3
Stacks
andRegisters 703.6.4 Outer Interpreter
71
3.6.5 Inner Interpreter
72
3.7 Source Language Specification
73
4. IMPLEMENTATION
82
4.1 Examples
Solving
Engineering
Problems
82
4.1.1 Four-Bar
Crank
Mechanism82
4.1.2 Material
Library
864.1.3 Integration
Solver
894.1.4 Fourth-Order Runge-Kutta
DEQ
Solver. 944.1.5
Convolution
Integral 1024.2
Performance Evaluation
107
4.3 Language Extensions
110
4.3.1 User-Defined Extensions 110
4.3.2
System
Extensions 1104.4
Application
Integration
111
4.4.1
Simulation
ModelerIll
4.4.2 Finite Element
Analysis
116
5. RESULTS
119
6. CONCLUSIONS
131
7. REFERENCES
133
APPENDIX
A:
ZORTECH/SYMANTEC
C++
COMPILER
134
APPENDIX
B: SOURCE CODE
MODULES
135
APPENDIX
C:
LANGUAGE
AND
COMPILER GUIDE
136
C1.
Introduction
136
C1.1
The Command Processor136
C2.1 Translation Units
138C2.2 Lifetime
138C2.3 Scope
138C2.4
Linkage 138C3. Language
Elements
139
C3.1 Tokens
139C3.2 Comments
139C3.3 Keywords
139C3A
Identifiers 139C3.5 Constants
140C3.6 Operators
140C4. Types And Declarations
141
C4.1
Base Types 141C4.2
User Defined Types 141C4.3
Declarations 142C4.4
Initialization 142C4.5
Storage 142C5.
Variables And Arrays
143
C5.1
Variables 143C5.2
Arrays 143C6.
Assignments, Operators,
andExpressions
145
C6.1 Assignments
145C6.2 Operators
145C6.3
Expressions 146C6.4
Type Conversion 146C7. Program Control Flow Statements
147
C7.1 Conditional
Statements. 147C7
.2IterativeStatements 150C7. 3
BREAKStatement
152C8. Functions
153
C8.1
Definition 153C8. 2
Parameters,
Local VariablesandArrays 154C8.3
RETURN Statement 155C8.
4
Invocation156
C9. Symbolic Constants
157
C10.
Miscellaneous
Commands
158
C11. Specialized Command Sets
159
C1
1.1
Mathematical 159C11.2
Utility. 159C11.3I/0 160
C1
1.4
Diagnostic162
C12. Operations Guide
163
C12.1
System Requirements163
C14. Advanced Diagnostics
176
C14.1 Intermediate Language Output
176C1 4.2 Threaded Code Generation
177C1 4.3 Back End
ThreadedInterpreter/Compiler.
179C1 4.4 Symbol
TableObject Destructors
181C15. Error Messages
182
C1 5.1 Lexical
182C15.2 Parser
182C1 5.3 Symbol
Table 184C15.4 Compiler
184C15.5 Runtime
185C15.6 System
785LIST OF
TABLES
Table
2.1
Grammar
productionsto
derive
an expression10
Table
2.2
Grammar
productions and syntax-directeddefinitions
16
Table 2.3
Grammar
productions with embedded semanticactions17
Table
2.4
Step by
step
action ofTIL interpreter
31
Table 3.1
Character
classificationtable
37
Table
3.2
Token
classificationtable
39
Table 3.3 Examples
oftoken
classifications40
Table 3.4
State
table
43
Table 3.5 TIL headerless
primitives63
Table 3.6 TIL headerless
secondaries63
Table 3.7 TIL immediate vocabulary
primitives63
Table 3.8 TIL
compilervocabulary
primitives64
Table 3.9 TIL
corevocabulary
primitives65
Table 3.10 Terminal
set74
Table 3.11
Nonterminal
set75
Table
4.1 Compilation
times
for
afew
selected macrofiles
107
Table 4.2 Source
codefor
performancetest
108
Table 4.3 Test
executiontimes
109
Table
5.1 Language
constructs123
Table
5.2
Operators,
precedence,
andassociativity
123
Table
5.3 Language
token
set124
Table
5.1 Compilation
times
for
afew
selected macrofiles
128
Table
5.2 Source
codefor
performancetest
129
Table
5.3 Test
executiontimes
129
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
Figu
LIST OF FIGURES
e
2.1
Typical
compiler/interpreter
translation
phases8
e
2.2
Parse
tree
for
a simple expression12
e
2.3
Primitive
andsecondary
code structures24
e
2.4 TIL
dictionary
25
e
2.5
Memory
layout
ofTIL
dictionary
and machine code routines29
e
3.1 Actual translator design for
this
project35
e
3.2
Lexical
analyzer36
e
3.3
Stream
buffer
and pointers38
e
3.4
State
diagram
to
construct anidentifier
41
e
3.5
State
diagram
to
construct an unsigned number42
e
3.6 Parser
44
e
3.7 Threaded
codeinterpreter/compiler
61
e
3.8 Flowchart
ofbasic
outerinterpreter
operation71
e4.1
Four-bar
crank mechanism82
e4.2
Program
to
solvefour-bar
crank mechanism problem83
e4.3
Graph
offour-bar
crank mechanism output angles85
e4.4
Material
library
program86
e
4.5 Numerical integration
programusing
the
Trapezoidal
method89
e4.6
Hypothetical
integration
routineusing
pointers93
e4.7
Input data file for
Runge-Kutta
program95
e4.8
Rocket
flight
characteristics96
e4.9
Program
to
implement 4th-order
Runge-Kutta
DEQ
solver97
e4.10 Hypothetical Runge-Kutta
solverusing
pointers100
e
4.11
Input data file for Convolution Integral
program103
e
4.12
Response
of2nd-order
systemusing the
Convolution Integral
104
e4.13
Convolution Integral
program105
e4.14
Atypical
CSMP
program112
e
4.15
Viscously
damped spring
mass system112
e4.16
Simulator input deck
113
e4.17
Translator
as a central enginefor
anFEA
system116
e
5.1 Basic
translator
operation119
e
5.2 Typical
command stream119
e
5.3 Translator design
120
LIST OF SYMBOLS
< )
grammar nonterminale
empty
set; "null
sequence"for
a nonterminal( )
grouping
operator[ ]
optional grammarsegment,
array
operator{ }
repeating
grammarsegment,
set operator|
"or"symbol;
alternative choice ->grammar production symbol
0,
<|> angle(deg)
Q
heat transfer
rate(Btu/hr)
T
temperature
(F),
thrust force
(Ibf),
period(sec)
k
thermal
conductivity
(Btu/hr-ft-F),
spring
constant(lbf/ft)
A
surface area(ft2)
Ax
distance
(ft)
cp
specificheat
at constant pressure(Btu/lbm-F)
x,
y
displacement
(ft)
g
gravitational acceleration=32.17
(ft/sec2)
W
weight(Ibf)
K
aerodynamicdrag
coefficient(lbrsec2/ft2)
C,
damping
ratioco0 natural
frequency (rad/sec)
m mass(slug)
c
damping
coefficient(slug/sec)
h(t)
unitimpulse
response(ft/lbf)
t
time(sec)
1.
INTRODUCTION
Definition, Description,
andBackground
From
practical experience and exposureto many different engineering
softwaresystemsoverthe
years,it's
apparentthat the
most successful andversatile systems aretheonesthat
incorporate
acommandinterpreter
and providethe userwithsomeform
ofprogramming
environment.A
system ofthis type
givesthe
end application userthemost powerful meanstointerface
with anddrive
an application.A typical
commandinterpreter
allowsthe usertointeract
withthe applicationby
entering
commandsatthe
keyboard
orloading
andprocessing
prewrittenmacro
files from disk. The inclusion
of ahigh-level
procedurallanguage
adds aprogramming
environment
that
givesthe
userthe ability to
createvariables,arrays,
andfunctions,
process complex mathematicalexpressions,
anddevelop
sophisticated macro programs.The ability
tocreate and process macros
is the
key
to
a powerfulandflexible
systemthatallows a usertocustomize an application particularto
their
needs.Macros
provide a powerfulmechanismtosupportand
drive
an application:System
configurations canbe
set or altered.Frequently
used variables andfunctions
canbe loaded
atstartup
andbe
immediately
availableto theuser.
Libraries
of specialfunctions
canbe
createdandloaded for
use whenneeded.Libraries
cantransform
ageneral purpose systeminto
aspecializedcomputing
environment.Libraries
of geometric entitiesor parts canbe
stored asmacros andloaded
when needed.Entire
sessionfiles
canbe
processed.Application
processes canbe
automated.To
providean applicationwithacommandinterpreter
andprogramming environment,
atranslator
mustbe designed into
the system.Technically,
atranslator is defined
asa mechanism whichtakes
asinput
a sourcelanguage
and produces as output either:1)
A
generatedtarget
language for later
executionon ahost
machine.2)
A
generatedtarget language
whichitself
becomes the input
sourcelanguage for further
translation.3)
Interpretsduring
translation
and synthesizes actionsthat
areexecutedby
calling
appropriatehost
machinesubroutines.The
processoftranslationinvolves
analysis and synthesis.A translator
must analyzeasourcelanguage to
determine it's
content andstructure,interpret
themeaning
oractionintended
by
the
language,
andsynthesize eitheratargetlanguage
ordirectly
executehost
machine subroutinestoperform the
intended
actions.A
translator thatgeneratesa machine-orientedtarget language
is
called acompiler,atranslator that interpretsduring
translation and synthesizes actionsis
called aninterpreter.
machine
language. The familiar DOS
commandlanguage
is
another good example whereatranslator is
usedto implement
acommandinterpreter. The
source languageconsistsof commandsdesigned to
interface
withanddrive
theDOS operating
system.System
commands are entered atthekeyboard,
interpreted,
and executed.The ability
towrite and processbatch
files
adds aprogramming
environmentthat
givesthe
userthecapability to
customizeanddrive
the
DOS
environment.Translation
mechanisms are presentin
oneform
or anotherin
most applications.Even
systemsthat do
nothave
allthecapabilities ofthemore advancedapplications still implementsomeform
oftranslation.
The ability
tosimply
key
in
acommand requiresrudimentary
lexical,
syntactic, and semantic analysistobe
performed.Expression processing increases
thecomplexity
oftranslation
and requires advancedparsing, type
checking,implicit
casting, andafull
setofmathematical operators.
The
additionof userdefined
variables adds symboltableand management routinesfor memory allocation,
insertion,
searching, anddeleting
ofvariables.Language
andtranslation
theory
evenplay important
rolesin
areasone mightnotexpect atfirst.
Solid
modelersthatimplement Constructive Solid
Geometry (CSG)
systems arebased
onthesame
language
andparsing
theory
usedtospecify
andtranslategrammars.Parse
treesfor
a grammar parallelCSG trees for
asolid.For
instance,
the
primitives ortokensof agrammar(technically
calledterminals)
mightinclude
FLOAT, LET,
+,
-;whereCSG
terminalswouldinclude
CYLINDER, BLOCK, SPHERE,
n(intersection),
u(union),
and-(difference). Grammar
statements or sentences
(called nonterminals)
wouldinclude
suchitems
asfor_statement
andif_statement,
whileCSG
nonterminals wouldrepresentintermediate
partsthatmakeup
thefinal
solid
being
constructed.Even
the toolslearned
and usedin
translationdevelopment
areindispensable
in
applicationdevelopment: linked
lists,
stacks,trees,
etc.Linked lists
are quite oftenusedin
applicationsto build dynamic data
structuresfor
objects such as geometric entities andto group
parent/child relationshipsamong
objects.Quadtree
and octreetree
data
structures are usedby
solid modelerstorepresentplane andsolid geometric objects.FEA
modelersimploy
thesametree
structurestoimplement
mesh generation.Whether
directly
orindirectly,
translatortheory
anddesign
is
presentin practically every
engineering
software application.Therefore,
if
oneseriously
endeavorstodevelop
a professionalengineering
package; translatorandlanguage
theory,
design,
and construction mustbe
undertaken.
Developing
a modernengineering
softwarepackageis
a majorundertaking in
many
respects.To
be
competitivein
today's market,an application mustbe designed
togivethe userasmuchflexibility
and power as possibleto interfacewithanddrive
an application.Graphical
userinterfaces
manageforms,
menus, and other widgetstoprovideauser-friendly
environmentthat
greatly increases
easeofuse and productivity.Interactive
graphics allowtheuserto easily
visualize, construct, and manipulate entitiessuchas solidgeometry,finite
elements,boundary
conditions,etc.
However,
asnecessary
anduseful astheseinterfaces
are,they
do
not affordthe
userthe
flexibility
andpowerrequiredto customize, extend, anddrive
theapplication.To
accomplish
this,
anengineering
software package mustincorporate
atranslator to
providethe
application with acommand interpreterand
programming
environment.A
superior systemdesign includes
atranslator thatprovides:Many
reasonslead to both the
needfor
atranslator
andits
constructionfirst
in the
overalldesign
of an application.From
a user's point ofview, atruly
professional package should offerthe
following:
Interactive
commandline input from the keyboard.
The
ability
to
process mathematical expressions.The
ability
to
create variables and usethem
in
expressions oras commandarguments.The ability
towritefunctions
or macrosto
do
calculations anddrive
applicationprocesses.A
comprehensive application commandset with accesstosystem routines anddatabases.A high-level
procedurallanguage for
programming.From
adeveloper's
point ofview, atranslator
offersthefollowing
advantages:No
single system canbe designed
to meet allthe
requirements ofevery
user.A
programming
and commandinterpreter
environment givestheuseras muchflexibility
as possibleto
customize anddrive
an applicationto meettheir
specific needs.A translator
provides a solidfoundation,
or centralengine,
around which an entire application canbe
constructed.A
systemdesigned
around atranslatoris easily
extensiblefor future development.
-
A programming
andcommandinterpreter
environmentis necessary
tobe
competitivewiththe
current state oftheartin
softwarepackages.Many
successfulsystemsthat implement
translatorsarein
usetoday.
The familiar PATRAN
pre and postprocessoris
a good exampleofafinite
elementsystemthat
includes
atranslator
with ahigh-level
procedurallanguage
and extensive application command set.The
usercanwritefunctions
ormacrosin PCL (Patran Command
Language)
and compiletheminto the
system.These
macros canthenbe
usedtoconstructgeometry,generatemeshes, model complex spatial andtemporalfields,
etc.The CSMP (Continuous System
Modeling Program)
simulation programdeveloped
by
IBM is
a problem-orientedprogramming language
thatincludes the
FORTRAN language
and a command settailored towardssolving ordinary
differentialequationsandblock diagram
simulations.In
actuality,CSMP
is
atranslatorthat
takesasinput CSMP
statements and produces as outputaFORTRAN
sourcelanguage
subroutine.This
subroutineis then
compiled with anintegration
routine
using
a standardFORTRAN
compiler and executed.Purpose
This
paper presentsthedesign
and construction of atranslator,
language,
and application shellthat
serve as afoundation
andframework
around which an entireengineering
software systemcan
be
constructed.To
providetheend application user with a powerfulinterface
todrive
anapplication,
ahigh-level
procedurallanguage
similarto FORTRAN
orBASIC
wasdeveloped
andintegrated into the translator. Since
no applicationwasactually
constructed, an application shell was written tocreate aninteractive
commandline
environmentfor using
the translator.The
shelland
translator
comprise a single executable program: cp.exe.After
systemstartup
andinitialization,
aprogramming
and commandinterpreter
environmentis
createdthatlasts for the
duration
ofthe
application session.Users
canthen
interactively
enter commandsatthekeyboard
orload
and process user-written macrofiles from disk.
Major features
ofthe translator
include:
Creates
aprogramming
andcommandinterpreter
environment.Takes
asinput
afully
structuredsourcelanguage very
similartoBASIC
orFORTRAN.
Supports two
modes ofoperation;
aninterpreter
mode and acompile mode.Performs
typechecking
andimplicit
casting.Extensive
errorchecking is
performedduring
all phases oftranslation
including
runtime supportfor
floating
point exceptions andarray bounds
checking.Reentrable;
user-defined macroscancallthetranslator
toprocesscommands orcompilefunctions
atruntime.Easily
extensiblefor future development
andapplicationintegration.
Portable to
otherhardware
platforms; theentire system was writtenin C/C++.
A
setofdiagnostic
toolsis integrated into
the translator toaidthedeveloper.
The
majorfeatures
ofthe language include:
Free format
ofsourcelanguage
text.Symbolic
constants.Variablesandmulti-dimensionalarrays.
User-defined typesor records.
User-defined
functions that
takeparametersand returnvalues.Program
flow
controlconstructs; e.g.,IF,
SWITCH,
FOR, REPEAT, WHILE,
etc.Local,
global,and external objects.The
project resultedin
thefollowing
majoritems:
Complete
theory
anddesign details for
all phases ofthe translatorandlanguage.
CP.EXE
-An
executabletranslator
programthatcreates aprogrammingand commandinterpreter
environment.The
programincludes
an integratedapplication shelltoprovidethe userwithaninteractive
commandline
environment.A
completeLanguage
andCompiler Guide.
A
set ofdiagnostic
tools toaidthedeveloper.
Translator
C++sourcecode modulesandseveral example macrofiles.
Example
codesolving engineering problems; Four-Bar Crank
Mechanism,
Material
Library,
Integration
Solver,
Runge-Kutta DEQ
Solver,
andConvolution Integral.
2.
TRANSLATOR THEORY
2.1 Overview
A translator is
a mechanism whichtakes
asinput
a sourcelanguage
and producesasoutputeither:
1)
A
generatedtarget
language for later
execution onahost
machine.2)
A
generatedtargetlanguage
whichitself becomes
theinput
sourcelanguage for further
translation.
3)
Interprets
during
translation
and synthesizesactionsthatare executedby
calling
appropriatehost
machine subroutines.The
processoftranslation
involves
analysisandsynthesis.A
translatormust analyzea sourcelanguage
todetermine it's
content andstructure, interpret
themeaning
oractionintended
by
the
language,
and synthesize either atarget
language
ordirectly
executehost
machine subroutines toperformtheintended
actions.A
translatorthat
generates a machine-orientedtargetlanguage
is
called acompiler,
atranslator that
interprets
during
translation
andsynthesizesactionsis
called aninterpreter.
Conceptually,
translationmay be
groupedinto
phases asshownbelow:
Lexical Analysis
Syntax Analysis
The
process ofconstructing
validlanguage tokens from the input
sourcelanguage
textstream.The
process ofrecognizing
whether agiventokensequenceis
avalidlanguage
statement, andif
so,whatits
structureis.
Semantic Analysis
The
process ofperforming
theactionsormeaning intended
by
the
language.
Intermediate
Code
The
production ofanintermediate
form
ofthe
sourcelanguage that is
Generation
easily
optimized andtranslatableinto
atargetlanguage
or executableby
an
interpreter.
Code Generation
Generation
oftarget language.
orAdditional
phases oftranslationinclude:
Static
Checking
The many types
of checksthat
are performedduring
translation suchas typechecking, flow
controlchecking, uniquenesschecking, etc.Code Optimization The
process ofoptimizing
theintermediate
codeto
producefaster
running target
code.Symbol Table
Management
ofthe data
structures usedby
the translatorto
storeandManagement
retrieve records of objects andtheirassociated attributes.Error Management Error
tracking,
reporting, andrecovery
managementduring
all phases ofthe translation
process.Conceptually,
phases aredistinct
processes performedduring
translation.Depending
onthe
type oftranslator
design,
phasesare often combinedtogether
and some evenomitted such ascode optimization.Translator designs
arecommonly
structuredinto
modules; each module'sjob
to processa phase ofgroup
of phases.The
phases oftranslation
can alsobe
groupedinto front
andback
ends.The
front
endtypically
includes
all phases oftranslation necessary
toprocessthesourcelanguage
into
anintermediate
language form. The back
endincludes
all phasesnecessary
to processtheintermediate
language into
executable code orsynthesize actionsby
interpretation. A
typical compilerorinterpreter layout
groupedinto front
andback
endsis
shownin Figure 2.1.
Quite
abit
oftheory
has been developed
aroundthephases oftranslation andinclude
studiesinto many
areas such asformal
grammars,finite
automata,parsing
techniques,
errorrecovery
andcorrection, data-flow
analysis, etc.This
paper presentsin the
next sections anintroduction
to some ofthe
basic
theory
relevantto thisparticulartranslator
design. The
readeris
referredto
any
ofthe translatorbooks
givenin
theBibliography
sectionofthispaperfor further background
ontranslator
theory
and methods.The
following
majortopics
willbe
covered:Context-Free Grammars
Lexical
Analysis
Syntax Analysis using Recursive Descent
withPredictive
Parsing
Front End
Back End
Source Language
Lexical Analysis
Syntax Analysis
Semantic Analysis
Type Checker
Intermediate Code Generator
m
i Code Optimizer i
L . J
Intermediate Language
i
1
Code Optimizer
'-"T"-Interpreter
''
CodeOptimizer i
CodeGenerator
Execute Host MachineSubroutines
Target Language
Error Manager
Symbol Table Manager
2.2
Grammars
A
grammaris
aformal
specification ofthestructure,
orsyntax,ofa language.A
grammardefines
what symbolsare usedtowrite alanguage
and what sequences of symbols are permitted.2.2.1
Context-Free Grammars
A
context-free grammaris
atype
ofgrammar oftenusedtodescribe
thesyntactic structure ofprogramming
languages.1
A
context-free grammar consists offour
parts:1
.A
finite,
nonempty
set of symbols calledterminals. Terminals
arethewordsandsymbolsthat makeup
thelanguage;
e.g.,FOR,
+,
*,
{, LET,
WHILE,
etc.2.
A
finite,
nonempty
set of nonterminals.Nonterminals
representsequences ofterminals and/ornonterminalsusedin
alanguage;
e.g.,for_statement,
assignment_statement,if_statement,
expression,etc.For
example, aWHILE block is
representedby
the nonterminal while_statement which representsthesequence\NH\LE(expr)
statementjistENDWHILE.
WHILE, (, ),
andENDWHILE
areterminals,
expr andstatementjist are nonterminals.3.
A
finite,
nonempty
set of rulescalledproductions.A
production associatesthe
replacementof a single nonterminal witha sequence ofterminalsand/or nonterminalsorthe null sequence, e.
For
example, the productionfor
a while_statement wouldbe
givenby:
while_statement ->
WHILE(expr)
statementjistENDWHILE
If
alternative productionsfor
a nonterminalexist, eachis
groupedwiththe
nonterminal and precededby
the productionsymbol"->".
Alternatively,
the"|"symbol
may be
usedtoindicate
an alternative production.For
example,the
following
setsofproductionsfor
expr are equivalent:expr -
term
+term
->
term
-term
expr ->
term
+|
-term
4.
A
designationof oneofthenonterminalsin
part2
as astarting
nonterminalfrom
which all others canbe
generatedby
systematic applicationoftherulesin
part3.
Example 2.1. A
context-freegrammar specificationfor evaluating
expressionsis
givenbelow:
1
.terminal
set:2.
nonterminal set:3.
productions:a)
expr-->
b)
term
{
number,
(, ),
+,
,/,
*}
{expr,term,
factor)
term
+term
term
-term
term
factor*factor
factor I factor
factor
c)
factor
4.
Starting
nonterminal(expr)
numberexpr
Now
considertheexpression3/5+(8-2)*4. To
showthis
is
avalid expressionin
thelanguage,
a sequence of productions appliedtoexprusing
the rulesin
part3
mustexist.The
applicationof rulestoderive
thefinal form consisting
of allterminalsis
called aderivation. Application
of ruleson expr
to
derive 3/5+(8-2)*4 is
shownbelow in Table 2.1
.RULE
PRODUCTION
3.
a expr-term
+term
3.b
expr->factor 1 factor
+term
3.c
expr->3 / factor
+term
3.c
expr->3 / 5
+term
3.b
expr->3 / 5
+factor
*
factor
3.c
expr->3 / 5
+(
expr)
*
factor
3.a
expr-3 /
5
+(
term
-term
)
*factor
3.b
expr->3 / 5
+(
factor
-term)
*factor
3.c
expr->3
/ 5
+(
8
-term)
*factor
3.b
expr->3
/ 5
+(
8
-factor)
*factor
3.c
expr^y3 / 5
+(
8
-2
The
derivation
usedin Example 2.1 to
generatethe
expressionis
not unique.However,
for
anunambiguousgrammar,
there
existsa uniqueleft-most
and right-mostderivation.The derivation
shown wasaleft-most derivation. A left-most derivation systematically
appliesrulestoderive
nonterminals
from left to
right.A
right-mostderivation does
thesamebut from
righttoleft.
A
grammarthat
produces morethan
oneleft-most
orright-mostderivation is
termed ambiguous.Ambiguous
grammarshave
morethan
oneleft
orright-mostderivation
andspecialhandling
is
required
by
the
parsertoresolvethe
ambiguity.The
parserimplemented in the translator implements left-most
derivationfor
all cases exceptexponentiation
in
expressions.Exponentiation has
righttoleft associativity
anda right-mostderivation is
usedin
thatcase.High-level language
constructs arejust
aseasily
representedby
a context-free grammar.Shown
below
aresometypicalproductionsfor
afew
ofthecommonflow
control constructs.A typical WHILE block:
while_statement ->
WHILE(
expr)
statementjistENDWHILE
A
simpleIF block:
if_statement
->IF(
expr)
statementjistENDIF
A
powerfulform
oftheFOR block:
for_statement
->FOR(
assignmentjist',
expr;
assignmentjist)
statementjistNEXT
An
entire program canbe
thestarting
symbolfor
a grammar with a production asshownbelow:
2.2.2 Parse Trees
A
parsetree is
agraphical representationoftheproductions appliedtoastarting
nonterminalto reachthe final terminal
symbols ofthe
grammar.A
parsetree has
thefollowing
characteristics:1
.A
singlestarting
node,
orroot,
whichis the
starting
nonterminal.2.
Leaves
that
areterminal
symbols.3.
Interior
nodesthat
are nonterminals.A
parsetree for
theexpression givenin Example 2.1 is
shownin Figure 2.2.
Figure 2.2 Parse
tree
for
a simpleexpression.
2.3 Syntax Analysis
Syntax analysis,
orparsing,is the
process ofrecognizing
whetheragivenstring is in
alanguage,
andif
so,whatits
structureis.
Parsing
readsinput
terminalsfrom
left
toright,
andusing the
grammar ofthe
language,
attemptstoconstructtheparsetree for
thederivation. Two
major classesofparsing techniques
exist;top-down
andbottom-up.
Top-down parsing
startsatthe
root,orstarting
nonterminal, andbuilds
downward to theleaves.
Parsing
systematically
applies productionsto
single nonterminalstoyield sequencesofterminals and/or nonterminals untiltheterminals,
orleaves
ofthe
tree,
arereached.Bottom-up
parsing
starts atthe leaves
ofthe tree
andbuilds
upwardto the root.Parsing
systematically
substitutes sequences ofterminalsand/or nonterminalsfor
single nonterminals untilthe starting nonterminal,
or root ofthetree,
is
reachedMuch
workhas
goneinto the
theory
ofparsing
and as aresult,many
techniquesexistfor
implementing
parser schemes.The
readeris
referredtoany
ofthecompilerbooks
givenin
theBibliography
section ofthis
paperfor background
onparsing
methods.2.3.1 Recursive
Descent
withPredictive
Parsing
The parsing
methodchosenfor
usein this translator is
a populartop-down
method calledrecursive
descent
withpredictive parsing.This technique is easily
extendible, usestheactivation records offunction
callsto
implicitly
build
a parsetree,
exploitsfunction recursion,
requires nobacktracking
(the
nexttoken
read alwaysdetermines
whichroutineto
call),anddirectly
parallelsthe
format
ofthe
grammar.Semantic
actions andtypechecking
areeasily integrated into the
parsing
routine.Additionally,
for
someone without abackground in
compilertheory,
it's
the easiesttechnique to understand andlearn.
Major
elementsof aRecursive Descent
Predictive Parser
A
procedure or routine existsfor
each nonterminalin
thegrammar.The
nexttoken to be
read, calledthelookahead
symbol,unambiguously determines
whichprocedureorroutinetocall.
To
illustrate
predictiveparsing
using
the
recursivedescent
technique,
seeExample 2.2.
Example 2.2 Consider the
productionfor
thewhilestatement: while_statement -?WHILE(
expr)
statementjistENDWHILE
A typical 'C
routinetoprocessthis
productionwouldbe
asfollows:
void
while_statement()
{
matchfWHILE); match('C); exprfj; matchC)*); statementlistO;
match(ENDWHILE);}
matchO is
ashortroutinethatcallsthe
lexical
analyzertogetthenexttoken.It
takesa parameterthatis
comparedto
thecurrenttoken;
if
they
match, the
nexttokenis
extracted,if
they
don't match,
an erroris
generated.The
parameter passedtomatchO is actually predicting
what
the
currenttoken
mustbe according to
thegrammarspecification.exprfj is the
main routinetoevaluate expressionsfor
thenonterminalexpr.statementlistO is the
main routinetoevaluate statementsfor
thenonterminalstatementjist.To
simulatethe parsing
process,assume some routinehas already
extractedthe
nexttoken andfound it to be WHILE. This
triggersa callto thefunction
while_statement().Step
1
.Current
tokenis WHILE. Call matchfWHILE)
toextractthenexttoken.
Step
2. Current
tokenis
either(
orsomething
else.Call
match('(');if the
currenttokenis
(,
the nexttoken is
extracted,if
not, an erroris
generated.Step
3. Call
routine exprfj.This
routine will processtheexpression.If
successful, noerrorswillbe
generated and program control returns.Step
4. Current
tokenis
either)
orsomething
else.Call
match(')');if the
currenttoken
is
),
the
nexttoken
is
extracted,if
not, an erroris
generated.Step
5. Call
routine statementlistO-This
routinewillprocessthestatementsin the
body
ofthe
while
block. Since
whileblocks
canbe
nested,this
routinemay
actually
callwhile_statementOseveral
times,
whichin turn
calls statementlistO.This is
wherethe
recursive natureof
function
callsis
usedtoimplement
nesting.If successful,
no errors willbe
generatedand program control returns.Step
6.
Current tokenis
eitherENDWHILE
orsomething
else.Call match(ENDWHILE); if the
currenttokenis
ENDWHILE,
thenexttoken
is extracted, if
not,an erroris
generated.2.3.2 Note
onLeft Recursion
Left
recursion occurswhenthe
left-most
nonterminal onthe
right sideof a productionis the
same nonterminal as on
the
left
side ofthe
production.For
example,expr - expr+
term
This form
of production will causethe
recursivedescent
parserto
gointo
aninfinite
loop
making
repeated calls
to
exprO-Left
recursion canbe
eliminatedby
introducing
a newnonterminalandrewriting the
grammarto
be
rightrecursivein form.
expr -
term
morejermsmorejerms -> +
term
For
a generaldiscussion
onleft
recursionelimination,seeAho, Sethi,
andUllman.2
2.4
Semantic Analysis
The
processofparsing discussed in the
previous section carries outrecognition andsyntacticanalysisofthesource
language. Semantic
analysisperformstheactions ormeaning intended
by
the language. Semantic
rulesdefine
theactions requiredtocarry
outtheactivitiesnecessary for
translation;
e.g.,insertion
of objectsinto
a symboltable,
intermediate
codegeneration, errorhandling,
etc.Syntax-directed definitions
andtranslation
schemes relatesemanticrules with productions.Both
are ageneralization ofacontext
free
grammar whereeachgrammar symbolhas
an associated set ofattributes;synthesized andinherited. Inherited
attributes are computedfrom
theparentand siblings of anodein
theparsetree.Synthesized
attributes are computedfrom
theattributesofthe
children ofa nodein
theparsetree.Listed below
are someofthemajor elements ofeach:Syntax-Directed
Definitions
High-levelspecifications
for
translations.Hide
implementationdetails
andorderofevaluation.Dependency
graphsandtopologicalsorting to determine
evaluation order of semantic rules.Syntax treesand
dags to
decouple translationfrom
parsing.Translation Schemes
Indicate order
in
which semanticrulesareto
be
evaluated.Reveal implementation
details.
Semanticactionsarecoupled with parsing.
2.5
Syntax-Directed
Translation
A
syntax-directedtranslation
schemewasimplemented in
the translator togenerate postfixexpressions
for the
entirelanguage. A
syntax-directedtranslationscheme associates attributes with each element ofthe
grammar and embeds semanticprocessing
with parsing.Attributes
canbe anything;
a value,amemory
address,
an objecttype,
etc.The
productions ofthegrammar are modifiedto
include
semantic actionswherenecessary to implement
thesemanticrules.For example, let's look
againatExample 2.1
andthis
timeinclude
semantic analysisto
generate anintermediate language string in
postfix notation.Table 2.2
showstheproductionsandsyntax-directed definitions for
thelanguage.
PRODUCTIONS
SEMANTICRULES
expr ->
term
+term
expr.val=term^val
\\
term2.val
||
'+'->
term
-term
expr.val=term^.val
\\
term2.val
||
'-'->
term
expr. val=term.val
term ->
factor'
factor
term.
val=1'actor
\.val||
factorial
\\
'*'-*
factor 1 factor
term.val
=factor
\.val||
factorial
\\
V
->
factor
term.val
=factor,
valfactor
->(expr)
f
actor.val=expr. val-> number factor.val= number, val
Table 2.2 Grammar
productions and syntax-directeddefinitions.
The
semantic rulesin Table 2.2
associate synthesizedstring
attributesfor
each production.The
||
symbol representsstring
concatenation.The derivation
of anexpressionwill resultin
afinal
string
representation ofthe
outputin
expr.val.To implement
atranslation scheme,
wefirst define
an action symbol
{
}. An
actionsymbolis
usedto representasemantic actionin
a production.For
example, expr -term
+term{
emit('+') }. Here
a routine called emitis
calledto
outputthe'+'
Embedding
semanticactionsin the
grammaryieldsthefollowing:
expr ->
term
+term
{
emit('+')
}
->
term
-term
{
emitf-')
}
->
term
term
->factor*
factor
{
emit('*')
}
->factor / factor
{emitf/') }
-
factor
facfor
-(
expr)
- number
{
emit(number)
}
Note that the translation
scheme presented above emits strings astheproductionsareparsed.This differs from
thesyntax-directeddefinition
givenin Table 2.2
thatsynthesizes stringsthrough
concatenationuntilthe
final string is
storedin
expr. val.Now,
again considertheexpression3/5+(8-2)*4.
Processing
ofthe
expressionwill resultin the
following
intermediate string
representation: 35/82-4*+Application
ofrules on exprto
derive 3/5+(8-2)*4
withembeddedsemantic actionsis
shownbelow in Table 2.3.
RULE
PRODUCTION
SEMANTIC ACTION
3.a
expr->term
+term
3.b
expr-factor 1 factor
+term
3.c
expr->3 / factor
+term
3
3.c
expr-3 / 5
+term
5
/
3.b
expr->3 / 5
+factor
*factor
3.c
expr->3
/ 5
+(
expr)
*
factor
3.a
expr->3 / 5
+(
ferni
-term
)
*factor
3.b
expr->3 / 5
+(
factor
-term)
*factor
3.c
expr-3
/ 5
+(
8
-term)
*factor
8
3.b
expr->3
/ 5
+(
8
-factor)
*factor
3.c
expr->3 / 5
+(
8
-2
)
*factor
2
-3.c
expr^>3/5
+(8
-2)*4
4
*+
Let's
again revisitExample 2.2 but include
semantic processing.Example 2.3 Consider the
production ofthewhilestatementwith embedded semantic actions.while_statement ->
{inc
blocklevel}
{emitf'while")}
WHILE
(
expr)
{emit("if^statementjist ENDWHILE
{emit("wend")}
{emit{breakcount),
emit("setbrk")}
{dec
blocklevel}
Semantic
actions:blocklevel
Integer
representing
the level
ornesteddepth
ofthecurrent while statement. whileKeyword for back
end compilerindicating
start ofwhileblock.
if
Keyword for back
end compilertoinsert
codetoevaluate expr andjump
to appropriatelocation.
wend
Keyword for back
end compilerindicating
end of whileblock.
breakcount
Integer representing
numberofBREAK
statementsin
currentwhileblock.
setbrk
Keyword for back
end compilertoinsert
jump
codefor break
statements.The
actual sourcecodeto
processa whileblock is
shownbelow. LP is
a global pointerto
alinked list
oflevel
records.Levels
correspondto thedepth
of nestedflow
control statements.Any
flow
control statement creates a newlevel
record onentry
andinserts it in the list. LP
always pointsto the
most recent recordwhich representsthe currentlevel. The BREAK
statement usesLP
tostore a countofthenumber ofbreaks
thatoccurin
thecurrentlevel. On
exit, thecurrentlevel
recordis deleted
andLP is
setto
pointto thenexthigher up level.
staticvoid
while_statementO
{
if(error.statusO)
return;LevelRecord
*Level
= newLevelRecord;
Level->Next
=LP;
LP
=Level;
matchfWHILE); emit("while"); match('C);
boolexprO;
match(')'); emitfif); statementlistO; match(ENDWHILE); emit("wend");emit("ii");emit(itoa(LP->breakcnt,AddressBuffer,1
0));
emit("setbrk");
LP
=Level->Next;
delete
Level;
}
2.6 Lexical Analysis
Lexical
analysisis the
process ofconstructing
validlanguage
tokensfrom
theinput
sourcetext stream.Lexical
analysis canbe included in
thegrammarspecificationfor parsing
orseparatedinto
a separate phase.For example,
a portion of a context-free grammartoconstructthenonterminals(identifier)
and(number)
is
presentedbelow.
(identifier)
->(letter) {(letter) \ (digit)}
(number)
->(integer) ["."] [(exponent)]
->
[(integer)]
"."(integer) [(exponent)]
(integer)
-(digit)
{(digit)}
(exponent)
->"E"
|
"e"[(sign)]
(integer)
(sign)
->n.11iii m
(digit)
->"0"
|
"1"|
"2"|
...|
"9"
(letter)
->"A"
|
"B"|
...|
"Z"
1 1
"_"1 1
"a"|
"b"|
...|
"z"
Of
course,some routineis
neededtoextract charactersfrom
thesourcetext.
Also,
notethat tokens are notin the terminal
set ofthe
grammarin this
implementation,
but
appear as nonterminals.More
commonly,lexical
analysisis handled
by
aseparate phase ofthe translator.The lexical
analyzer, or"scanner",
is
calledby
the
parsertoextractthenexttokenfrom the input
sourcetext
stream.This implementation
allowsclassification oftokens
andinclusion
oftokens
into the
2.7
Static
Checking
Static
checking
includes many types
of checksthat
mustbe done
by
the
compilerduring
translation. These
checksinclude:
Type Checking:
Assuring
that
operands are compatible with eachother,
operations aredefined for
aparticular operandtype,
function
parameters areofthe
appropriatetype,
assignmentsarecompatible, etc.Flow Control Checking:
Checking
that
flow
control statements such asBREAK
andRETURN
appearin legal
contextin
a source code program.For example, RETURN
cannotbe
usedoutside ofafunction definition
and mustbe
usedif
afunction is
toreturn a value.Uniqueness Checking:
Checking
that
external objectshave been declared only once,
objectsdeclared in the
body
ofafunction
areunique,function
parametershave distinct
names, etc.For
example, suppose an external variableis declared
asFLOAT
xcoor.A
checkmust existtosearchthesymboltableto
be
surethat
xcoorhas
notbeen previously declared
assomething
else.Many
otherchecksmay have
tobe done
depending
onthecomplexity, structure,
andcontentofthe
language
anddesign
ofthetranslator.
Most
checksareeasily
embeddedinto
thetranslation
2.8
Intermediate Code
Generation
Although
atranslator
canbe
designed
to
directly
generatetargetcodefrom
a sourcelanguage,
intermediate
codegenerationis
commonly
employedin
translator designs. Intermediatecodegeneration
has the
following
advantages:Different back
endtranslators
canbe designed
to producetarget codefor
varioushost
machinesandstilluse
the
samefront
endtranslator.A
single code optimizer phase canbe designed
tooptimizethe intermediateform
independent
ofthe
host
machine.A
moreeasily
optimized andtranslatable form
oflanguage
canbe
generatedfor
targetcodeproduction orexecution
by
aninterpreter.
Three
commonforms
ofintermediate
representation are syntaxtrees,
postfixnotation,andthree-address
code.Postfix
notationwaschosenfor
thisdesign
to allowthe inclusion
of aback
endthreaded
interpreter/compiler
whichis itself
astack-based,postfix notationtranslator.Postfix
notation requiresthe
operandsof anexpression precedethe
operator.For
example, theinfix
expression4+5/3
wouldhave
thepostfixform: 4 5 3/+.
Extending
thisnotation toinclude
function calls,
a call such ashyp(x,y,z)
wouldhave
thepostfixform:
xy
zhyp.
Postfix
notationis easily implemented using
a stackbased
machine which willbe
discussed in
detail in the
threadedcodeinterpreter
section ofthispaper.Intermediate language
generationis
easily implemented in
a syntax-directedtranslationscheme.2.9 Code Optimization
Code
optimizationis
an attempttoperformtransformationson codeto ultimately
produceperformance
improvements. Optimization
canbe
applied at several stages:Source Code
User
applies codeimprovements
tosource codeby improving
algorithms, etc.Intermediate
Code
Optimizing
phase of compiler performslocal
andglobaloptimizing
techniquessuch assubexpression elimination,copy propagation,
dead-code
elimination,loop
optimization, etc.Target
Code
Optimizing
phaseof compiler performs optimizationtechniques to
take advantageofparticularhost
machine'shardware
such as registerusage,
instruction set,
etc.No
optimization phaseis currently implemented in
thetranslator.
For
adetailed discussion
oncodeoptimization, see
Aho,
Sethi,
and2. 10
Code
Generation
The
generation oftargetcodefrom intermediate language
codeis
thefinal
phaseofthecompiling
translator. Typical target
codeincludes
absoluteor relocatable machinelanguage orassembly language. For
adetailed discussion
on codegeneration,seeAho, Sethi,
andUllman.
5The translator designed for this
projectdoes
notactually
generate machine-dependentcode,but
instead,
generates athreaded
code(list
of addressesthatpointtohost
machinesubroutines.)
This threaded
codeis
processedby
a softwareinterpreter
whichwillbe
discussedin detail in the
sectionsthatfollow.
2.11
Interpretation
Interpretation
synthesizes actionsduring
translation thatareexecutedby
calling
appropriatehost
machine subroutines.Interpreters
canbe
software orhardware in design. A
typical computerexecuting
a programis actually running
ahardware interpreter built into
theCPU. The
main stepsinvolved
are:1
.Fetch the
nextinstruction
pointedto
by
theinstruction
pointerIP
2.
Increment the instruction
pointerIP
3.
Decode
theinstruction
4.
Execute the instruction
5.
Repeat
steps1-4
Of
course, theCPU interpreter is processing
alist
ofbinary
machineinstructions
specificto
thehost
machine.A
softwareinterpreter,
onthe
otherhand,
usually
processesalist
of abstract machineinstructions
whichhave
acorresponding host
machinelanguage
subroutine associated withthem.The
stepsinvolved
in
asoftwareinterpreter
arevery
similar:1
.Fetch
the next abstractmachineinstruction
pointedtoby
the instruction
pointer2.
Increment the
abstract machineinstruction
pointer3.
Determinewhichhost
machine subroutineis
tobe
called4.
Execute thehost
machine subroutine5.
Repeatsteps1-4
2.12
Threaded Code
Interpreter/Compiler
A threaded
codeinterpreter is itself
atranslator
thatincorporates
athreadedcode generatorand softwareinterpreter.
Threaded
codeis
afully
analyzedinternal form
ofinstructionscomprised of addressesthat
pointto
either:Primitives
which are executablehost
machinesubroutinesorSecondaries
thatarelists
ofthreaded
codeinstructions pointing
toprimitives and/or secondaries.The job
ofthe
interpreter is to step thru
alist
ofthreadedcode instructionsand executethe code associated with eachinstruction.
If
theinstruction
pointstoa primitive, acalltothehost
machine subroutineis
made and executed.If the instruction
pointsto
asecondary,whichitself is
alist
ofthreaded
codeinstructions,
the
interpreter's instruction
counteris
setto
pointatthe
newlist,
andthe
interpreter
processesthesecondary'slist
ofinstructions. After the secondary has been
processed,
control returnsback
tothe
nextinstruction
following
the
calltoprocessthe secondary.Primitives
pointto theactual codethat'sexecutedby
the
host
machine.Secondaries group
pointersto
primitives andothersecondaries andform
alist
ofthreadedcodeinstructions
akinto
a program,function,
or subroutine.The ability
ofthe
threadedcodeinterpreter
tocreatesecondaries
is very
similarto theaction ofacompilergenerating
a machine codetargetlanguage. Because
ofthis,
the threaded
codeinterpreter
canbe
consideredto
have two
modes ofoperation;
a compile modewheresecondaries arecreatedand an execution modewherethreaded
codeis interpreted
and executed.The term Threaded Interpretive
Language,
orTIL for
short, refersto
theinternal form
of codegenerated
by
the
translator.However,
it's commonly
usedin
referencetoall elements ofthe
interpreter,
and willbe likewise
adoptedhere for
convenience.The
main elements of aTIL
are:Primitives
andSecondaries
Dictionary
Stacks
andRegisters
Outer Interpreter
Inner Interpreter
2.12.1 Primitives
andSecondaries
Primitives
and secondaries are calledkeywords,
muchthe
sameasreserved wordsin
otherlanguages.
Each have distinct
record structures asshownin Figure 2.3.
WA
PRIMITIVE
SECONDARY
address of
keyword lexeme
Header Section
Code Address WA
WA+2
WA+4
WA+6
WA+2n
WA+2(n+1)
address of
keyword lexeme
HeaderSection
link
addressofnext
keyword
linkaddressof
next
keyword
address of
primitive
's
machine code
subroutine
address of
COLON
machine code
subroutine
Code Address
WA#1 Code
Body
WA#2
WAV3
'
WA#n
wordaddress of
primitive
SEMI
Return Address
Figure
2.3
Primitive
andsecondary
code structures.Each keyword is
comprised of sections asfollows:
Header
Address
ofthekeyword's
name orlexeme
andthelink
address ofthe
nextkeyword
in
thedictionary. Note
thatsome systemkeywords
existthathave
noheaders. These
are notavailableto the userbut
areusedby
the
translator to performspecial operations.
Since the translator is
aware oftheir
existenceand
location,
noheader
is
required.For example, SEMI is
aheaderlessprimitive
inserted
atthe
end of eachsecondary
by
the
compiler.Code
Address Addressof an executablemachine code subroutine.For
aprimitive, the
machine code subroutine
is
theactualmachine codetoexecute whatever function theprimitive wasdesigned
todo. For
asecondary, the
machinecode routine
is
alwaysCOLON,
aspecial subroutinedesigned into the
Code
Body
For
aprimitive, the
actualmachinecoderoutine locatedsomewherein
memory (not
shownin Figure
2.3.)
For
a secondary, thecodebody
sectionis
a
list
ofthreaded
codeword addresses;WA#1, WA#2, WA#3,
....Each WA#
correspondsto
the starting
wordaddress,WA,
ofa primitive or secondary.Return Address
Word
addressofthe
primitiveSEMI
whichis
usedto
terminatethe
execution ofa secondary.Assuming
a2-byte
word addressscheme,
all primitives withheaders haveafixed
record size of6 bytes. Secondaries have
variable recordlengths
depending
onthenumberofthreadedcodeinstructions that
are compiledinto the
codebody
ofthesecondary.In
additionto
primitives andsecondaries,
threaded codelists may
alsoinclude
addressesthatare not
instructions,
but
instead,
are pointersto
literals
such asnumbers,characterstrings,andthe
like. Literal handlers
are primitivesdesigned
toallowfor
theinclusion
of suchliterals
in
threaded
codelists. For
eachliteral,
thewordaddress oftheappropriateliteral handler
mustimmediately
precedetheaddress oftheliteral in
the threadedcodelist.
During
execution, theliteral handler
willextracttheliteral
atthe
addresscurrently
pointedtoby
theinstruction pointer,
push
it to the data
stack,andthenincrement
theinstruction
pointer pasttheliteral
addressto
thenext valid
instruction in the threaded
codelist.
2.12.2
Dictionary
The TIL
maintains adata
structure called adictionary. The
dictionary
is
alinked list
ofTIL
keyword
records.At
startup, thedictionary
is loaded
withallthesystem primitives.As translation
proceeds,
the
dictionary
dynamically
grows withtheinsertion
of newkeywords
as secondaries are createdin
compile mode.See Figure
2.4.
Secondaries
Primitives
2.12.3 Stacks
andRegisters
The TIL
interpreter
usesstacks and severalregisters asdescribed below:
DS
Data
Stack;
LIFO
stack usedtostore numbersandaddressesRS
Return
Stack;
LIFO
stackusedtostore returnaddresses when asecondary
callsanother
secondary
ora primitive.I_reg
Instruction
Register;
usedto
store address ofnextthreadedcodeinstruction
in
currentsecondary
being
processed.WA_reg
Word Address
Register;
usedtostoretheword address ofthecurrentkeyword
orthe
address ofthecodebody
sectionofthecurrentkeyword.
CA_reg
Code Address
Register;
usedtostoreaddress ofnext executable machinecodebody.
2.12.4 Outer
Interpreter
The
outerinterpreter
servesas acontrolling
executivefor
theinterpreter. In many
TIL's,
the
outer
controlling
executiveis itself
asecondary
writtenin
thelanguage
ofthe