Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
11-1-1993
An Extensible, scalable microprocessor architecture
Matthew Dinmore
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion
in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact
Recommended Citation
An
Extensible, Scalable
Microprocessor Architecture
by
Matthew Dinmore
A Thesis Submitted
ill
Partial Fulfillment of the
Requirements for the Degree of
MASTER OF SCIENCE
in
Computer Engineering
Approved by:
Thesis Advisor
Dr. Tony Chang
Department Head
Dr. Roy Czemikowski
Committee Member
Prof. George Brown
DEPARTMENT OF COMPUTER ENGINEERING
COLLEGE OF ENGINEERING
Thesis Release Permission Form
ROCHESTER INSTITUTE OF TECHNOLOGY
College of Engineering
Title of Thesis: An Extensible, Scalable Microprocessor Architecture
I, Matthew D. Dinmore, hereby grant permission to the Wallace Memorial Library
of the Rochester Institute of Technology to reproduce my thesis in whole or in
Abstract
An extensible,
scalablestack-based microprocessor architectureis developed
anddiscussed. Several
uniquefeatures
ofthe
architecture,
including
its
non-memory
oriented
interface,
andits
use of a stackfor
holding
andexecuting
code,
aredetailed.
A
programmed modelis
usedto
verify
the architecture,
and ahardware
implementation
of a small-scale version ofthe
architectureis
constructed andtested.
Notes for
future implementations
areprovides.Possible
applicationsbased
on
the
latest
technological trends
arediscussed,
andtopics
for further
researchContents
Abstract
i
List
ofDiagrams
vList
ofTables
viList
ofAcronyms
&
Abbreviations
vii1
Introduction
1
1.1.0 Background &
Goals
1
1.2.0
Definitions
2
1.3.0
Overview
3
2
Design Issues
4
2.0.0 Basic
Services
4
2.1.0 Input/Output Performance
6
2.2.0 System Interfaces
8
2.3.0 Program Execution Model
10
2.4.0 Additional Architectural Issues
14
2.4.1 Stack Sizes
14
2.4.2 Interrupts
16
2.4.3 Levels
ofImplementation
19
2.5.0
Instruction
Set
Design
22
2.5.1 Arithmetic & Logic Instructions
23
2.5.5
Program
Execution
Instructions
35
2.5.6 Extended
Instructions
37
2.5.7 Instruction
Encoding
38
2.6.0 Instruction
Mandates for
Each Implementation Level
45
3
Implementation &
Testing
47
3.0.0
Overview
47
3.1.0
VHDL Model
48
3.1.1
Entity
Definition
49
3.1.2
Model Internals
50
3.1.3
VHDL
Test Bench
51
3.2.0 FPGA Implementation
53
3.2.1
FPGA
Technology
53
3.2.2
The
Processor
Design
54
3.2.3 Overall Design
56
3.2.4
Arithmetic/Logic
Unit
(ALU)
57
3.2.5
Input/Output
Unit
(IOU)
60
3.2.6 Data Stack
(DS)
61
3.2.7 Control
Register
Block
(CRB)
62
3.2.8 Control Unit
(CU)
64
3.2.9 Additional
Design
Notes
69
4
Discussion
80
4.0.0
Overview
80
4.1.0
IXA&
RISC
80
4.2.0 IXA &
Stack Machines
82
4.2.1
IXA
andthe
Transputer
83
4.3.0
Applications
84
4.3.1
Uniprocessor Applications
85
4.3.2
Multiprocessor Applications
86
4.4.0
Issues for
Further Research
87
5
Conclusions
90
References
91
Appendix
A
-Architectural Definition
Appendix
B
-Condition
Code
Results
Appendix
C
-VHDL
Source Code
Appendix
D
-Test Programs
Appendix
E
-VHDL Model Test Results
Appendix
F
-FPGA Logic Diagrams
Appendix
G
-Test Results
for
the
FPGA
System
Appendix
H
-FPGA Place
&
Route Results
List
of
Diagrams
2.1
I/O Protocol
9
2.2
Architectural Levels
20
2.3
Architectural Layers
ofDefinition
21
3.1
VHDL Model Design
49
3.2
System
Layout
57
3.3
Arithmetic/Logic
Bit
Slice
58
3.4
Input Handshake
Timing
60
3.5
Output
Handshake
Timing
61
3.6
Cycle
Timing
65
3.7
Processor
Pin
Placement
72
3.8
Processor/Switch
Board Test Bench
73
3.9
Processor/MC68008
Test Bench
74
List
of
Tables
2.1
Instruction
Set
43-44
3.1
ALU
Internal
Control Signals
59
3.2
Control Signal Generation
66
List
of
Acronyms
&
Abbreviations
ALU
Arithmetic
andLogic Unit
ASCII
American
Standard Code for Information Interchange
ASIC
Application-Specific
Integrated Circuit
CISC
Complex Instruction Set Computer
CRB
Control Register
Block
CU
Control Unit
DS
Data
Stack
DSP
Data Stack
Pointer
FPGA
Field
Programmable Gate
Array
I/O
Input/Output
IB
Instruction Bit
IOCR
Input/Output Control Register
IOU
Input/Output Unit
IS
Instruction
Stack
ISP
Instruction
Stack Pointer
IXA
Implementation
Extensible
Architecture
LIFO
Last-In,
First
Out
(a
stack)
LSB
Least-Significant
Bit
MIPS
Millions
ofInstructions
Per
Second
RPN
Reverse Polish
Notation
SIMD
Single
Instruction,
Multiple
Data
SMI
Standard
Memory
Interface
SP
Stack
Pointer
SR
Status
Register
SW
Status Word
TOS
Top
Of Stack
VHDL
VHSIC Hardware Description Language
VHSIC
Very
High
Speed
Integrated
Circuit
1
Introduction
1.1.0
Background
&
Goals
The
proliferation oflow-cost
microprocessorshas
madethe task
ofselecting
aprocessor
for
a particular application more complicated.System
implementers
may
choose a general-purpose processor which providesthe
functionality
necessary
for
the
application,
orthey
candesign
acustomprocessor.By
selecting
an off-the-shelf
system,
costs arekept
low,
time
is
saved,
and a certainlevel
ofcompatibility
is
established.However,
the
processoris
notnecessarily
optimizedfor
the
application,
andthere
may
be
features
which gocompletely
unused.Developing
a custom processor would alleviatethese problems,
but
wouldbe
expensive
in
terms
oftime
andmoney,
and wouldlack
compatibility
withexisting
systems and
tools.
A better
solution wouldbe
to
base
a new systemon asimple,
standardprocessor whichprovides a core setof
computing
services and excellentscalability
andextensibility.This
thesis
develops
anddemonstrates
an architecturewhich
incorporates
these
basic
goals.Additionally,
the
architecturehas been designed
suchthat
it is
fabrication
technology
independent.
In
modern microprocessordesign,
there
aretwo
approaches
to
improving
performance; the
fabrication
technology
usedto
producefurther
performance,
but
doing
soties
logical
features
ofthe
architectureto the
technology,
a situation whichcouldimpede
implementation
ofthe
architecturein
othertechnologies.
The
architecturedeveloped here is
implementation-technology
independent
to
allowits fabrication
in
any technology,
afeature
whichis demonstrated.
1.2.0
Definitions
In this context,
scalability
refersto the
fact
that
design
variables suchasthe
wordsize,
amountofinternal
storageorbuffering,
and portionofthe
defined
instruction
setwhich
is
implemented
in
the
processorare notrigidly
setby
the
architecture.Instead,
these
variables wouldbe determined
by
the
processor or systemimplementer
to
meetthe
needs ofthe
application.Extensibility
refersto the
architecture'sability
to
supportimplementation-specific
extensions
to the
instruction
set orsupporting
hardware. This
allowsthe
systemdesigner
to take the
basic
architecture and addcapabilitiessupporting
the
specific1.3.0
Overview
In
orderto
provide abasis
for
the
development
ofthe architecture,
severalissues
ofprocessor
design
arediscussed
relativeto
currentsolutions.From this
discussion,
various aspects of
the
architecture areintroduced.
A
detailed discussion
ofthe
architecture
follows.
To
putthe
architecturein
perspective,
it is
comparedto
some similar systems.
A
VHDL
model of one possibleimplementation
ofthe
architecturehas
been
created and used
to test
various aspects ofthe
architecture.A
different
implementation has been
performedin
hardware
to
demonstrate
the
architecture'simplementation flexibility.
Both
ofthese
implementations
aredetailed
anddiscussed.
Finally,
the
unique nature ofthe
architecture requiresthat
topics
such asprogramming
(and
the
softwaretools to
supportdevelopment),
externalinterfacing
and
implementation
willbe
discussed.
Applications for
the
architecture willbe
2
Design
Issues
2.0.0
Basic
Services
The
purpose ofdeveloping
a new architectureis
to
provide solutionsto
computing
problems.
As
discussed
above,
an application-specific solutionhas
the
drawback
of
being
limited
to
just
onepurpose,
arestriction whichmay
not proveeconomicalagainst
the
high
costofproducing
a newprocessor.Since
the
goalofthis
architectureis
to
providebasic computing
services while notimposing
the
restrictions ondesign found
in
either general-purpose or application-specificprocessors,
it is
necessary
to
define
whatthese
basic
services are.All computing
units mustbe
ableto
acceptdata
andinstructions,
apply
the
instructions
to the
data,
and produce results.To
be
ableto
acceptinput
andproduce
output, the
processor musthave
aninterface between
its
internal
andexternalenvironment.
The
specificationofthis
interface is
criticalto the
systemsimplementer,
because
it
determines
whattypes
of externaldevices
canbe
connectedto the processor,
andwhat protocolthey
will useto
communicate.To
makethe
architecture as extensible as possible, a simple protocol and
interface
schemeis
required.
The
selection ofinstructions
for data
processing
willdetermine
whattypes
ofdevelopment,
complexinstruction
setcomputers(CISC)
providedinstructions
to
handle
a widevariety
ofcomputing
tasks.
The belief
wasthat
by
providing
powerful and
flexible instructions
in
hardware,
programmers wouldbe
ableto
spend more
time
writing
applications,
andless
time writing
subroutinesto
performbasic
tasks.
CISC
machines providedthis
capability
atthe
expense of complexdecoding
andinstruction
execution mechanisms.As
programmersbecame
morereliant on compilers
than
hand
assembly,
fewer
ofthe
complexinstructions
wereused,
sincethe
compilertends to
convert source codein
the
most genericmannerto
improve
efficiency.As
such,
if
the
programmerwrote a routinein
ahigh-level
language
to
performthe
samefunction
as anexisting
processorinstruction,
the
compiler would assemble
it from
simplerinstructions
ratherthan
usethe single,
more complex
instruction.
The
realizationofthis
fact
lead
computer architectsto
devise
reducedinstruction
set computers
(RISC),
anarchitectural paradigmin
whichonly
basic
instructions
with
few addressing
modes are provided.The
task
ofcreating
more complexinstructions
is left
to
the
compiler.The instruction
setsofthese
computers provide2.1.0
Input/Output
Performance
The
first
task
is for
the
processorto
be
ableto
input data
andinstructions. The
optimal
way
to
do
this
is
to
be
ableto
process a continuous stream ofdata
andinstructions
whileproducing
acontinuousoutput stream of results.Some
systemswhichattempt
to
do
this
include
systolicarrays and certaindigital
signal processors.Both
ofthese
rely
onthe
regularity
ofthe
data
andconsistency
ofthe
instruction
execution cyde
to
achievehigh
performance.However,
this
is
generally
not possiblesince
instruction
executionis
oftenconditional,
anddata
may
residein
the
processor's
internal
orthe
system'sexternalmemory
in
analmost randomorder.Systolic
arrays are ableto
achievehigh
performanceby
repeatedly executing
aninstruction
sequenceondata
whichis
continuously
shiftedthrough
eachprocessorin
the
array
[3,
2.18]. In
this
mode ofoperation, the
instruction
sequenceis
eitherhard-wired
into
the processor,
orloaded before data
processing
begins
(assuming
a reconfigurable
array,
of course).Once processing
begins,
nospecialinstructions
are required
to
read or writedata;
it
simply
'flows'
in
and out ofthe
processor.This
is,
in
asense,
animplicit
addressing
scheme.This
modeldoes
not allowfor
conditionalcode execution or
any
changein
the
code sequenceduring
execution,
thereby
reducing
the
number of applicationsfor
whichthe
array
is
usefulto
aorder
to
makethe
process moregeneric, it is
necessary
to
provide a means ofloading
andinternally
storing
data,
andthen
operating
onthe
data
withinstruction
sequenceswhichare not
necessarily
staticfor
the
duration
ofthe
computing
task.
This
canbe
accomplishedby
using
a stack.This is
alast-in,
first-out
(LIFO)
data
structure which requires no
addressing
or specialinstructions for
loading
data
(unlike
the
register-basedmodelcommonly found
in
CISC
andRISC
processors).Instead,
it is
only necessary
for
the
input
systemto
be
ableto
differentiate data
and
instructions,
sothat
they
cansharethe
sameinput/output
(I/O)
pathways,
an
important
considerationboth
in
terms
oflogical
integration
andthe
physicallimitations
ofimplementation. All
that
is
requiredto
do
this
is
a singlebit. If
weconsider
this
bit
to
be
set whenrepresenting
aninstruction
and cleared whenrepresenting
data,
then
it
is
possibleto
load data
and processit
from
a singlestream of words
from
an externalsource; the
nature ofthis
sourceis irrelevant
to
the
function
ofthe
processor.Any
wordarriving
atthe
input
with a clearedinstruction
bit (the
formal
namefor
the
flag
bit)
is
implicitly
pushed ontothe
stack,
while a wordarriving
witha setinstruction
bit
is
directed
to the
execution2.2.0
System
Interfaces
The
goalin
designing
the
interface
to
the
processoris
to
providethe
lowest
common
denominator
uponwhich more powerfulinterfaces
canbe
constructed.Two
elements arerequired;
adata
path and a means ofproviding
controlinformation,
i.e.
handshaking
between
the
components.The
data
pathmay
be
either serial or parallel.Serial
systems providesimplicity
in
terms
ofinterconnection,
but
are slowerthan
parallel systems.Additionally,
since processors
generally
handle data
in
parallelmodeinternally,
less
hardware
is
requiredto
implement
the
interface.
To
providehandshaking
between
systems,
a standard arrangementrequiring
handshake
input
and output signals onboth
devices
canbe
used.The
protocolfor
handling
communicationsis
the
important
consideration,
and needsto
be
verified
for
proper operation.A Petri diagram
withinitial
markingsfor
the
resetDack
IHift)
OHi
IHi
(+)
Outdata
OHo)
Diagram
2.1-I/O
Protocol
Symbols
I H i
Input
handshake
(for
input,
from
sender,
arrowsdenote
edges)
IHo
Output
handshake
(for
input,
from
processor)
OHo
Output handshake (for
output,
from
processor)
OHi
Input handshake
(for
output,
from
sender)
Outdata
Processor internal
signalto senddata
Dav
Processor
internal
signal toindicate
data
availableDack
Processor internal
signaltoacknowledge acceptance ofdata
at portThe
protocol callsfor
the
senderto
raiseits
outputhandshake
(IHi)
andhold it
while
the
data
onthe
bus is
valid(the data
shouldactually
be
valid priorto the
assertionof
the
handshake
sothat the
receiver can usethe
handshake
to
latch
the
data into
aninput
buffer,
if
necessary).The
sender must continueto
hold its
handshake
untilthe
receiver replies withits
ownhandshake
(IHo).
The
receiverthen
holds its
acknowledgmenthandshake
untilthe
senderdrops
its
handshake,
This
protocol canbe
used onboth
uni- andbidirectional busses. With
aunidirectional
bus,
there
is
nopossibility
ofcollision,
sothe
protocol need notaddress collisions.
In
the
eventthe
bus
is
bidirectional,
sometype
of collisiondetection
is
required.In
keeping
withthe
goal ofproviding only
the
mostbasic
definition
in
the architecture,
no collisiondetection
orhandling
protocol willbe
specified.
If
animplementation
requires a more robustsystem,
one canbe
implemented
externally.To
mandate a protocol wouldincrease
complexity
for
all
cases,
eventhose
for
whcihcollisions are not a possibility.Note
that the
protocolis
notdeadlock-free. If
eitherparty fails
to
respondto
ahandshake,
the
system willfreeze.
Again,
defining
a protocolto
handle
lock-up
increases
complexity for
allimplementations,
eventhose
wheredeadlock
is
notan
issue.
Instead,
deadlock
shouldbe
consideredasystemerrorcondition.2.3.0 Program Execution Model
As
discussed
above,
data
andinstructions
canbe loaded
into
the
processorusing
the
sameprotocols.Using
astack,
data
canfirst be loaded
andthen
processedby
instructions
from
the
sameincoming
stream of words.This
format for
expressionhandling
is
known
asReverse
Polish Notation
(RPN),
andis
the
most efficientway
to
evaluatearbitrarily
complex expressions.The
question ofhow
the
Standard
processorsbegin
fetching
instructions
from
memory
whenthey
arestarted.
These instructions
arelocated
in
blocks
in
memory,
andthe
instructions
in
eachblock
are sequential.The
processoris
ableto
branch
orjump
within andbetween blocks
by
executing
specificbranching
operations,
oftenbased
onthe
current state of
the
processor.Since
nomemory
interface
has been defined
in
this
architecture, this
processis
notdirectly
possible.Instead,
this
architecture reliesonanexternal program controller
(PC)
to
feed
instructions
anddata
to the
processorin
anexecutablestream.The
nature ofthis
controlleris
notspecified,
sinceit
willdefine
whatthe
systemdoes.
However,
the
interface
to the
processorfor
the
controller
is
clearly
defined
andstandard,
sothat
processors and controllerscanbe
interchanged.
With
such acontroller,
astandardmemory
interface
couldbe
created and programsin
memory
couldbe
executed as withany
other processor.The
controller wouldhandle reading
andwriting data
to memory,
and wouldbe
ableto
performbranches
andjumps.
This
is
made possibleby
allowing
additionalinstructions
to
be defined (the
architecture's extensibility), andby
providing
the
processor'sinternal
statusto the
PC
thoughaset ofdedicated
signals.Building
a systemin
this
way has
no clearbenefits
over atraditional processor;
in
fact,
if
the
processor andthe
PC
arephysically
separated,
performance wouldthis
is
the time
requiredto
executebranches.
This
is
one ofthe
largest
problemsin
any
architecture,
but
even more sohere due
to the
extratime
requiredto
communicate
the
processor's statusbefore
aconditionalbranch.
Recent
processors often addressthis
issue
through the
use ofbranch
prediction.A
prediction aboutthe
outcome ofthe
branch is
made,
often withthe
help
ofinformation
encodedin
the
instruction
whichindicates
the
most-oftentaken
path(this is
the
method usedin
processors such asthe
IBM/Motorola PowerPC
[7,
1.4.1]). If
the
predictionsucceeds,
nothing
is
lost
in
execution,
whileif it
fails,
the
processor's pipelineoftenneedsto
be flushed
causing
adecrease
in
performance.This
architecturedoes
not allowfor
easy pipelining
because
ofthe
inherent
data
dependency
causedby
the
use of a stack.However,
conditionalbranches
willstill
lower
performance.To improve
the situation,
it
wouldbe
usefulto
movecode
fetch
and executioninside
the
processor under certaincircumstances.This
can
be
accomplishedby
providing
a means ofloading
a codesegment,
whichwill
possibly be repeatedly
executed,
into
the processor,
andthen
executing
it
without
interaction from
the
external controller.Since
there
is
noaddressing
schemebuilt
into
the processor,
it
makes senseto
implement
a second stack which willbe
usedto
hold
the
code asit is loaded.
A
instructions
areimmediately
executed);
execution ofthis
instruction
placesthe
processor
into
a mode whichdirects
incoming
instructions
to the
second stack(the
Instruction Stack
orIS)
until an escapeinstruction
is
executedto
returnthe
processor
to
normal mode.Note
that
incoming
instructions
are stilltagged
withthe
instruction bit
andthat
data
canstillbe loaded
concurrently
withinstructions,
since
it
willbe
directed
to the
Data Stack (DS).
Once
the
IS
has been
loaded,
instruction
execution canbe
switchedfrom
the
I/O
ports
to the
IS.
Instructions
arethen
executedfrom
the
IS
until aninstruction is
reached whichstops
IS-based
execution.During
execution, the
IS
behaves like
astack
only
in that the
stackpointeris
usedto
locate
the
nextinstruction.
Instructions
for
modifying
the
stackpointer,
andthereby
allowing
branching
andlooping,
arenecessary.
Instructions
executedfrom
the
IS
act uponthe
DS in
the
same mannerasinstructions
coming
from
the
I/O
system.Data
can also stillbe loaded
implicitly
during
stack-based execution
by
passing
it
to the
I/O
systemasduring
normalexecution;
an
arriving
wordis
treated
asthe
top-of-stack(TOS)
item,
asit
wouldbe if it
were pushed onto
the
DS.
This
allows continuousdata
processing
withoutany
explicit
data-load
instructions
being
executed, a clear advantage over standardprocessors
for
certainapplications.However,
aninstruction
to
performanexplicit2.4.0
Additional
Architectural
Issues
The
abovediscussion
provides a coredefinition for
anextensible, scalable,
stack-based
architecturefor
whichthe
nameImplementation
Extensible
Architecture
(IXA)
has
been
chosen.The
remainder ofthis
sectiondiscusses
some other aspects ofthe
architecturewhich are
less
criticalto the
overalldefinition,
yet require somediscussion. A
complete,
stand-alonedefinition
for
the
architectureis
providedin Appendix A.
2.4.1
Stack
Sizes
The
size of either ofthe
internal
stacksis
notdictated
by
the
architecture.Instead,
it
shouldbe based
onthe
requirementsofthe
application.Stack
usagefor
animplementation
ofthis
architecturefalls into
oneoftwo
classes.First
is
that
of singleframe
usage.In
this class,
it is
used as atraditional
LIFO
stack,
which requiresonly
a small amount of space.Extensive
tests
have been
performed
to
determine
the
optimal size of aData
Stack
[5,
6.4.1]. These
tests
show
that
a stack size of32
elementsis
sufficientfor
allbut
the
mostdeeply
The
secondclassofusageis
withmultiple,
simultaneous stackframes on-chip (it
is
also possibleto
have
a single small stackandswap
its
contentsto
anoff-chip
memory).The
stack shouldbe large
enoughto
supportthe
required number offrames.
The
externalprogram controller(or
a programrunning
from
the
instruction
stack)
mustkeep
a setofstack pointersto
referencethe
frames.
These
may
eitherbe
storedexternally
oronthe
DS,
so a master pointer spacemay
alsobe
required.The
size ofthe
Instruction
Stack is entirely dependent
upon what applications willbe
run andhow
the
stack willbe
usedby
them.
If
the
entire programis
to
be
loaded,
the
stackmay
needto
be
large;
32
kilobytes has been found
to
be
avery
reasonable sizefor
programs encodedin
eightbits
[5, 4.1.1]
when runfrom
external memories.
Alternately,
the
IS
couldbe
usedto
store often repeated subroutinesto
gain a performanceboost.
Finally,
it
may
only
needto
execute asmall,
very
simpleoperation(as
in
asystolicarray
application),
requiring
aminimal stack size.Another issue
affecting
the
size ofboth
stacksis
the
word size selectedfor
the
organized around a
memory
subsystem,
the
amount ofmemory
which canbe
accessed
is limited
by
the
width ofthe
address storage andtransport system,
which
is
the
stackin
this
architecture.The
word width ofthe
IS is
also relatedto the
word-sizeissue. The
DS
mustsupport
the
full
word widthto
handle
data,
but
the
IS
only
needsto
hold
instructions,
which areonly
8-bits
wide.The
only
cases where a widerIS is
necessary
are whenthe
IS
willbe
usedto
storedata
or constantsinside
a codemodule
for
repeated useduring
execution,
or whenthe
implementation
uses anextended
instruction
format
to
supportinstructions
outsidethe
definition.
To
handle
allcases,
it is
recommendedthat the
IS
be
word-width.2.4.2
Interrupts
Many
applications requirethe
ability
to
interrupt
processoractivity
to
performspecial,
usually
time-sensitive operations.This
is
particularly
important
whenexternal,
asynchronousdevices
areinvolved
whicharerunning
slowerthan the
internal
speed ofthe processor;
it is
more efficientto
continue normaloperationsthan
to
waitfor
the
device
to
become
ready.Normally,
interrupt
handling
is
amatter ofstopping
the
normalinstruction fetch
use of
the processor,
andthe
interrupted
process canbe
completely
restoredafterward),
andbeginning
execution ofthe
interrupt
handling
code.With
out astandard
instruction
fetch
cycle, IXA
requiresadifferent
approach.Interrupts
arehandled
through
apriority-basedrequestsystem.When
aninterrupt
is
signalled, the
processor stopsit
currentactivity
and signalsto
all externalprogram controllers
that
aninterrupt has
occurred at a particularlevel. Four
interrupt levels
aredirectly
supported;
three
are maskableusing
aninterrupt
mask code stored
in
the
processor's status register.An interupt
request succeedsif its level is
greater-than or equalto the
masklevel. This
meansthat
level
0
is
non-maskable.
Once
the
interrupt
is
accepted, the
processor pushesthe
status wordontothe
DS.
This
is
the
only
item
pushed as a matter ofmaintaining
high-performance
andsimplicity.
If
the
interrupting
processmodifiesany
other system varibles suchasthe
stackpointers,
IO
Mode
information,
or either ofthe stacks,
it
is
responsiblefor properly restoring them,
if
necessary.The
basic
restoration processis
to
clearthe
interrupt
andthen
restorethat
status word.The
currentinterrupt level
is
communicatedto
external processorsthrough
a setof
dedicated
signallines;
the
information
is
not storedin
the
status word.As
to
the
processor untilthe
interrupt
conditionis
cleared.This
preventsdata
loss
and gives
the
interrupting
device
completeaccessto the
processor.When the
interrupting
device is
through
withthe processor,
it
should restorethe
state and clear
the
interrupt
status.Note that
state restorationis
left
to the
interrupting
device.
This
is because
aninterrupt is handled
more as anindefinite
transfer
of controlto
anotherdevice
than
one whichhas
some expectation ofending.
An
interrupted
process canitself
be
interrupted. The
same process as aboveis
performed
in
such a case.Note
that
care needsto
be
taken
notto
clearthe
interrupt
statusunderthese
circumstances asthis
willpreclude propernesting
ofinterrupts.
Instead,
a processinterrupting
anotherinterrupt
process shouldfirst
determine
whether aninterrupt
conditionalready
exists,
and,
if
it
does,
eitherwait
for
the
interrupt
to complete,
or notclearthe
interrupt
status uponexiting
its
routine.Interrupt
handling
atthe
processormay
notbe
as efficient ashandling
it
atthe
program controller when
the
processoris
notexecuting from
the
stack,
sinceit is
the
program controller whichis
sending
the
instruction
streamto the
processorthe
stackit is
possibleto
handle
interrupts
atthe
PC
sinceinstructions
arriving
atthe
portshave
ahigher
priority
than
those
coming
from
the
stack(in
otherwords, this
canbe
usedasanotherform
ofinterrupt
mechanism).2.4.3
Levels
ofImplementation
The
development
ofthe
computerindustry
overthe
pastten
yearshas demonstrated
that
no one computer systemis
correctfor
all applications.However,
there
aremany
instances
where one architecture canbe
appliedto
avariety
of situationson
different
scales.To
addressthis
issue,
the
IXA
has been divided into
three
levels
of standardimplementation. Each
ofthese
supports a subsetofthe
features
defined
by
the architecture, and,
in
doing
so,
asubset ofthe
instruction
set.The
first
level,
Level
0,
supportsonly
the
mostbasic
set ofinstructions. It
has
only
a minimalData Stack. Without
anInstruction
Stack,
there
is
no needto
support
any
ofthe
stack-based executioninstructions
or modes.Interrupts
alsoneed not
be
supported,
since such animplementation
wouldrely
entirely
on anexternal program controller
for
direction,
sothe
controller wouldbe
requiredto
deal
withinterrupts. Applications
for
such adesign
wouldbe
primarily
found
in
array
processing
wherethe
data-passing
capabilities ofthe
processor couldbe
coupled with a shared
instruction-feed
mechanism, asis
found
in
many
Single
A Level
1
system mightbe
applied where someamountof autonomousinstruction
execution
is
required,
for
examplein
areconfigurablearray
in
whicheach processormay
perform adifferent
operation.Level 2
representsthe
complete architecturaldefinition.
All
aspects ofIXA
andits
instruction
set mustbe
presentin
such animplementation. A
full
implementation
with
large Data
andInstruction
Stacks
couldbe
usedas a stand-alonecomputing
system,
or as a powerful coprocessor.The
three
levels form
alayered
model ofthe
architecture:;'J
Xit
'
::,'
m
__Level
0
i
:Level 1TTnnnTl
Level
2
*_!1 111 1irrc**Extensionsto Level
0
Extensions
toLevel
1Extensions to Level 2
Diagram
2.2
-Architectural
Levels
an
implementation may
support either someinstructions
of
the
layer
above,
orsome extended
instructions
uniqueto the
implementation.
This
canbe
graphically
shown
in
a similar manner::
Defined:;
Extensible
Diagram 2.3
-Architectural Layers
ofDefinition
The
three
layers
shownin Diagram 2.3
apply
to
eachlevel defined
above.The
Mandated
area representsthe
instructions
and architecturalfeatures
requiredfor
a particular
implementation
to
be
compliant with one ofthe
levels. The Defined
regioncontains
features
notrequiredby
a particularlevel,
but
defined
for higher
levels,
whichcanbe
implemented if
necessary (a design
may
use somefeatures
of a
higher level
withoutmeeting
all ofthe
mandatedrequirementsofthat
level).
Finally,
the
Extensible
region covers allfeatures
notprovidedfor
by
the
architecturalWhen
these
two
views ofthe
architecture arecombined,
it becomes
clearhow
flexible
it
is
with respectto
implementation. It
also showshow
muchresponsibility
is
placed onthe
systemimplementer
to
integrate
the
processorinto
the total
system.
The
concept ofdeveloping
an architecturein
many
levels is
not new.A
similarapproach was used
in
the
design
ofthe
Intel 80960
architecture[8,
p.2]. This
architecture
directly
supportsthree
layers
representing
core,
numeric and protectedfunctions.
A
fourth
layer,
the
extendedarchitecture,
is
reservedby
Intel
for
its
ownextensions.
Limiting
extendedfeatures
in
this
mannerreducesthe
processor'sappealas a genericsolution
to
aspecific problem.2.5.0 Instruction Set Design
With
the
important features
ofthe
architecturedefined,
aninstruction
set canbe
developed. Since
the
architectureis based
onthat
ofastackmachine,
many
ofthe
instructions
willnaturally be
similarto those
found in
stack-based computers.The
majordifferences
arethat
(1)
the
I/O interface
is
significantly
different,
and(2)
the
lack
of a codefetch
mechanism andthe
use of a stackfor
the
internal
The basic
functional
areaswhichmustbe
addressedfor
this
architectureare:Arithmetic & Logic
-basic logical
functions,
shifts,
and arithmeticoperations.I/O
-providing ability
to
read and writedata
underprogrammed control.Control
-handle setting
andreading
of processor statusinformation.
Program Execution
-allow
for
conditional execution ofcode,
branching.
These
operations are similarin
many
casesto
Forth
language
primitives.The
Forth
language
wasdeveloped
to
provide efficient execution ofcodebased
on astack machine model.
The
core set ofoperations performed on stacks can canbe
easily
described
in
terms
ofForth
operations,
which providea pointof reference.Also,
sinceForth
mapsdirectly
to
a stackmachine,
it
should notbe difficult
to
develop
a compilerto translate
Forth
to this
architecture.The
referencesthroughout
this
document
to
Forth
are aimedatshowing
this
inherent
level
ofcompatability.For
moreinformation
onthe
Forth
language
asit
relatesto
stackmachines,
see2.5.1
Arithmetic
&
Logic
Instructions
The
selection of which arithmetic andlogic
operationsto
implement
is
based
primarily
uponthe
examples setby
many
architectures.The Arithmetic & Logic
Unit
(ALU)
does
most ofthe
useful workin
the processor,
andthere
are severalbasic
operations which areuniversal.Additionally,
some considerationneedsto
be
givento the
types
of applicationsthe
processormay
be
usedfor,
namely
data
processing
in
whichlogical
and shift operations are useful.The
following
instructions
wereselectedwiththis
in
mind:NOT
bit
inversion
AND
bit
disjunction
OR
bit
conjunctionXOR
bit
inequality
SLL shift
left,
logical
SRL shift
right,
logical
SLA shift
left,
arithmeticSRA shift
right,
arithmetic(sign extend)
SLR
shiftleft,
bit
rotationSRR
shiftright,
bit
rotationADD signed addition
SUB signedsubtraction
MUL signedmultiplication
A
few
notes aboutinstruction
selection:a.
All
arithmetic operationstake
signed(two's
complement)
values asoperands.
b.
XORwas selected as anindependent
operation eventhough
it
canbe
performed
through
a combination oflower-order
operationsbecause
it
is
useful
in
performing
operations such asthe
Cyclic
Redundancy
Check
(CRC)
found
in
data
communications.c.
The
MUL and DIVoperations will notreturnvalues outsidethe
defined
single-wordrange of
the
implementation. This
meansthat
not all possibleinputs
generate resultsin
range.This is in
keeping
withthe
minimalapproach.
Requiring
acomplete answerto
be
producedrequiresthe
ability
to
handle
double-words,
whichin
turn
requires morehardware
just
to
support
two instructions.
Further,
since no otherinstructions
handle
double-word
values,
a programmermaking
use ofthe
entiredouble-word
resultwill
have
writeadditionalcodeto
usein
with addition orlogical
operations,
oreven storeormanipulate
it
onthe
stack.It
seemsbetter,
then,
if
additioanlcode
is
required anyway,to
providebasic
multiply
anddivide
operationsAs
a stackmachine,the
ALU
willtake
its
operandsfrom
the
top
of stack(TOS)
(one
ortwo
operands,
asappropriate),
and return a resultto the
TOS;
the
operands are
destroyed in
the
process.In
cases wherethere
aretwo operands,
it
may be
usefulto
keep
onefor
additionalcalculations,
like
a constantfunction
ona
hand
calculator.In
many
applications a constantis
often appliedto
an entiredata
set.To
facilitate this,
operations whichtake two
operandshave
two
modesof operation: normal and constantmode.
To denote
constantmode,
a'C
willbe
appended
to the
standardmnemonic.The
operationssupporting
constants are:ANDC
ORC
XORC
ADDC
SUBC
MULC
DIVC
A final
operationwhich canbe
classified as an arithmetic operationis
acomparisonoperation.
Generally,
comparisonoperationsaresimply
subtractionsin
whichnoresult
is
stored.In
a stackmachine,the
disposition
ofthe
operandsis in
question.in
constantmodeabove),
orthat
both be
saved sothat
anadditional,
conditionaloperation
be
performed onthem
afterthe
comparison.As
such,
three
compareoperations are provided:
CMP
compare,
destroying
both
operandsCMPC
compare,
destroying
only
the
operand atthe
TOS
CMPN
compare,
preserving
all ofthe
operands.To
flag
the
result of anoperation,
a standardsetofflags
is
used:N
(Negative
Hag)
-result was negative
C
(Carry
Flag)
-a
carry
occurredV
(Overflow
Hag)
-arithmetic overflow occurred
Z
(Zero
Hag)
-result was zero
The
N
flag
is
set whenthe
most-significantbit
(MSB)
ofthe
internal
word-size(i.e.
afterthe
externalinstruction
flag
bit is
discarded)
is
set,
asit
is
for
allnegativetwo's
complement numbers.The
C
flag
is
setwhen anarithmetic operation causesa carry-out
to occur,
or when abit
whichis
setis
shifted outduring
a shiftoperation.
The
V
flag
is
set on and addor subtract whenthe
last
two
carries arenot
the
same(indicating
the
result was out ofthe
representablerange),
on aleft
shiftif
the
shifted-outbit
is
not equalto the
resulting MSB.
Zero is indicated
when all
bits
ofthe
result are zero.The
flags
areonly
setby
ALU
operations ordirect
manipulation ofthe
statusregister;
other operations whichsimply
manipulate
items
onthe
stackdo
not affectthe
previous settings ofthe
flags.
2.5.2
Stack Manipulation
Operations
Just
as register-based processorshave
operationsto
movedata
to,
from
andamong
registers,
a stack machinehas
instructions
which movedata
to,
from
andwithin
the
stack.For
this architecture,
only
avery basic
set of operationsis
provided,
from
whichmany
standard stack manipulation operations canbe
derived.
The
mostbasic
operation on a stackis
to
pushitems
ontoit,
andpop
them
off.This is
handled
essentially
transparently
by
the processor,
whichautomatically
pushes
data
items,
and pops operandsto
instructions
as necessary.The
READinstruction,
aprogrammed methodofpushing
data,
aswellasthe
writinstruction
for outputting
data,
arediscussed in
section2.5.3.
Another
way
to
removedata
from
the
stackis
to
adjustthe
stack pointer suchthat
it
pointsto the
seconditem in
the stack,
thereby
moving
the
top
of stack.the
next operation.In
Forth,
this
operationis
DROP
(in
Forth's
virtual machineenvironment,
the
item
is
destroyed).
By
allowing
the
stackpointerto
be
changedby
two positions, Forth's
DR0P2
canbe
executed.This
leads
to
the
development
of
the ADJ
instruction,
arguably
the
most powerfulinstruction
providedby
the
architecture.
Since
there
is
nodefined
stacksize,
how
adjis
usedis
relativeto
the
implementation's
stack size.Using
the ADJ
instruction,
it
shouldbe
possibleto
move
the
stack pointerby
any
offset,
allowing any
numberofitems
to
be
dropped.
On
a smallstack,
adjusting
canbe
usedto
simulate a register-based systemby
relatively addressing
the
variousstackelements.On
alarge
stackbuffer
(several
hundreds
orthousands
ofbytes),
this
addressing
canbe
extendedto
allow multiplesimultaneous stack
frames,
each of which couldbe
accessedby
adjusting
the
stack pointer.
The
effect ofthis
is
to
allow multiple processes or separatedata
storage areas
to
usethe
stackmemory
withouthaving
to
flush
one processesstack
frame
priorto
another'sactivation,
asis
common withother stackmachines.The
ADJinstruction
appliedto the
IS
(using
the
mnemonicADJI)
providesthe
capability
to
perform relativebranches
orjumps in
the
executing
code.As
withthe
DS,
multiple activeframes,
in
this
casecontaining
executablecode,
canbe
loaded.
These
frames may
containonly
subroutines usedby
oneprogram,
orIn
standardarchitectures,
branches
areusually
made conditionally.To
facilitate
this
(as
wellas allow adjusts madeonthe
DS
to
occurconditionally),
a conditioncode
may
be
spedfiedwiththe
adjust.See
section2.5.7
for
adiscussion
of conditional operations.Two
other stack manipulationoperations providethe
ability
to
movedata
onthe
stack.
Unlike
a register-based machinein
which all validdata items
are availableto
anoperation,
only
the
items
atthe
TOS
are availablein
a stackmachine.The
ADJ
operations canbe
usedto
changethe
location
ofthe
TOS,
but
not affectthe
ordering
ofitems
relativeto
each other.A
common operationrequiredto
do
this
is
the
SWAPfunction
whichexchangesthe
top
two
items
onthe
stack.Note
that
this
operation requiresthat
two
items
be
writtento the
stackin
one processorcycle;
nootherinstruction
requiresthis,
andspecialhardware may be
requiredto
implement
the
instruction.
However,
SWAPwasfound
to
be
among
the ten
most-common
instructions in Forth
programs[5,
6.3.1].
Another
commoninstruction in Forth
is
the
dup operation which makes acopy
of
the
TOS
andthen
pushesit. This
is generally
usedto
maintain adata
item
whichwouldbe
destroyed
by
alater
operation.This
functionality
is
partly
providedby
the
constant mode providedfor
arithmeticoperations,
but
notfor
otherinstructions
taking
operandsfrom
the
DS.
Related
to
DUP arethe
SECOND,
THIRDnew copies onto
the
stack.The
genericform
ofthese
instructions
is
the
PICKinstruction
whichtakes
anoperandnaming
a particular element onthe
stackto
copy.
To
providethis
capability, the
architecturehas
the GET
instruction.
The
basic
GETinstruction
takes
an offset encoded withinthe
instruction from
the
TOS,
allowing
one of
the
top
16
stack elementsto
be
copiedto the
TOS
witha singleinstruction.
LGET
takes the
offset as an operandfrom
the
DS,
allowing
accessto
2word-sizeelements.
GET
0
providesthe
DUPoperation,
asdoes
LGET0
(for LGET,
the
offset
is
calculated afterthe
operandcontaining
the
offsetinformation is
poppedfrom
the
stack sothat
GETand LGEToperateinterchangeably
for
the
top
16
stackelements).
Note
that
by
using
the
GEToperations,
operands canbe fetched from
anywhere
in
the
stackto
the
TOS,
and,
after a non-constant mode operationis
executed on
them,
the
only
changeto the
stackis
the
result ofthe
operationwhich
is
atthe
TOS.
This
allows several operations against adata
set stored onthe
stackto
be
performed(albeit
inefficiently).
Two
additionalinstructions
which operate onthe
stacks are PUSH and PUSHI.These
allowdata from
one stackto
be
transferred
to the
other. PUSH popsthe
next
item
onthe
IS
andpushesontothe
DS.
PUSHIPerforms
the
reverse operation.PUSH
is
usefulfor allowing
constantsto
be
placedinside
a code modulethat
willtime
(or
multipletimes)
during
program execution.PUSHI
allowsthe
IS
to
be
used
for
storage of results(which
canbe
recovered with PUSH or written withWRITI).
Note
that
executing
PUSHI
from
the
instruction
stackhas
the
possibly
deleterious
effect ofcausing
the
PUSHIinstruction
onthe
IS
to
be
overwrittenwith
the
pusheddata,
which willthen
be
executed asthe
nextinstruction. The
instruction's
ability
to
be
executedunderthese
conditionsis
ofquestionablevalue.2.5.3
Input/Output
Instructions
In
additionto the
implicit
pushoperation performedwhendata
arrives at aport,
it is
also usefulto
be
ableto
readdata from
a port under programmedcontrol,
particularly
whenthe
default
prioritization ofthe
data does
not meetthe
program'srequirements.
To
allowthis,
there
is
a READinstruction
which allows a singledata
item to
be
readfrom
a specified port and stored onthe
DS. A
read willcause
the
processorto
halt
untildata is
availableatthe
specifiedport.To
outputdata,
the
WRITinstructions
are provided.The
basic
instruction
popsthe
top
element ofthe
DS
and sendsit
to the
named port.As
with arithmeticinstructions,
a constant mode of operationis
provided(WRITC)
sothat
only
acopy
ofthe
data item is
written; the
original remains onthe
stack.Finally,
the
WRITI
instruction
is
usedto
writethe
next elementin
the
IS
to
a port.Note
that
an operation would cause
the
writtendata
item
then to
be
executedas aninstruction.
The
purposeofthe
instruction
is
to
providethe
ability
to
encode constantdata
asan
instruction (so
that
it
canbe
readdirectly
to
the
IS)
whichcanthen
be
writtendirectly
to
a portduring
IS-based
execution.2.5.4
Control Operations
Control
operations provide aninterface between
the
state ofthe
processor andthe
programexecuting
onthe
processorby
allowing
that
stateto
be
read andmodified
directly.
Several
registers are providedto
do
this.
There
aretwo
stackpointer registers:one
for
the
Data Stack Pointer
(DSP),
and onefor
the
Instruction
Stack
Pointer (ISP). These
registersare atmostword-sized,but
may
be
smaller asrequired
by
the implementation.
The Input/Output Control Register
(IOCR)
has
at most eight
bits. The
even-numberedbits
(0,2,4,6)
representthe
setting
ofthe
word-size-plus-one
bit
ofthe
outputports,
normally
the
instruction
bit.
It
may
be
used
to
passinstructions
to
other processors or as a separatesignalling
line. The
odd-numbered
bits
representthe
lock-out
statusofeachport; the
bit is
setfor
the
port
if
the
portis locked
(meaning
noinput
is
possible onthe port,
althoughoutput
is
still possible).Finally,
the
Status
Register
(SR)
containsthe
processor'sStatus
Word (SW). The
eightbits in
this
register representthe
following
(from
LSB
to
MSB): Zero
flag
(Z),
Overflow
flag
(V),
Carry
flag
(C),
Negative
flag
(N),
Access
to
all ofthese
registersis
providedthrough
agroup
of controlinstructions.
When
a registeris
read,
its
contents are pushed ontothe
DS.
The instructions
to
do
this
are:PSW
Push
Status Word Register
PIO
Push I/O Control Register
PS
PPush
Stack
Pointer
(DS)
PS PI
Push
Stack Pointer
(IS)
When
PSPis
executed, the
value ofthe
stack pointer which gets pushedis
the
valueprior
to the
push operation(since
the
DSP is
decremented
whenthe
valueit
pushed).
If
PSPIis
executedfrom
the
IS,
then the
value ofthe
ISP
whichis
pushed
is
that
afterthe
PSPIinstruction
has
been
popped.Once
a valuehas been
pushed,
it
may
be
modified andthen
usedto
setthe
register
(or
anew valuemay be loaded
and usedto
setthe
register).The
instructions
SSW
Set
Status Word
Register
S IO
Set I/O Control Register
S S
PSet
Data
Stack
Pointer
S S
P ISet
Instruction
Stack
Pointer
SSP and SSPI are
different from
the
otherregister-setting
instructions in
that
they
canbe
executed conditionally.This
is
because,
like adj,
they
canbe
usedto
modify
programflow
whenappliedto the
IS.
Another
controlinstruction
whichdoes
not accessany
user-registeris
Clear
Interrupt
(CLRINT),
which setsthe
interrupt
statusto
normal.The
copy
ofthe
status word pushed onto
the
DS
whenthe
interrupt
occurredis
not restoredby
this
instruction;
this
mustbe done
programmatically.Also,
sincethis
instruction
simply
clearsthe
interrupt
status,
aninterrupt
process whichwasitself interrupted
is
n