
Novel array representation methods in support of a microcomputer-based APL interpreter


Theses

Thesis/Dissertation Collections

5-1-1985

Novel array representation methods in support of a microcomputer-based APL interpreter

Daniel Fleysher

Follow this and additional works at: http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected].


Novel Array Representation Methods in Support of a Microcomputer-based APL Interpreter

by

Daniel Fleysher

A thesis, submitted to The Faculty of The School of Computer Science and Technology, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Approved by:

Guy Johnson
Professor Guy Johnson

Jim Hammerton
Professor James Hammerton

Jack Hollingsworth
Professor Jack Hollingsworth

Peter G. Anderson
Professor Peter Anderson

Novel Array Representation Methods in Support of a Microcomputer-based APL Interpreter

by

Daniel Fleysher

Permission to reproduce this thesis in whole or in part is hereby granted to the Wallace Memorial Library of RIT, unless such reproduction is for commercial use or profit.

Daniel Fleysher

1.2. Abstract

Objective: To study novel ways of representing data arrays for potential application in a microcomputer-based APL interpreter. The goal is to find, for arrays containing mixed integers and real numbers, a way to improve both storage efficiency and thruput, over that obtainable using conventional APL interpreter array representations.

Investigation: For the purposes of this study, three representative APL operators were chosen for implementation - dyadic addition, multiplication and selection. To establish a set of base cases from which to work, these three operators were implemented for two distinctly different data structures:

Case-0: arrays containing only fixed length floating point data elements
Case-1: arrays containing only fixed length integer data elements

These two cases are termed "homogeneous" because all data elements within each array share a common data structure - the conventional approach for APL interpreters. Three additional "heterogeneous" cases were then built upon the homogeneous base cases:

Case-2: arrays containing mixed floating point and integer fixed length data elements
Case-3: arrays containing mixed floating point and integer data elements, with the integer elements having variable length
Case-4: arrays containing fixed length pointers to variable length Case-3 data elements

For all of these cases, space and time tradeoffs were studied and charted. Exerciser programs were written in BASIC to drive the 5 Case-n implementations to enable direct comparison of the 5 storage allocation approaches; these driver routines prepared test data, ran the addition/multiplication/selection exercises, retrieved time and space measurements, and performed data reduction for presentation in this report. The 5 Case-n implementors were written in 6502 CPU assembly language, and provided the functions of addition, multiplication, selection, timing, and data format conversion between BASIC and Case-n data structures.

Fixed length floating point arithmetic was supported on the target microcomputer for which all code was written - an Atari 800. In support of multi-byte integer arithmetic, however, original addition and multiplication atomic functions required development.

[...] elements. This produced astonishingly fast selection thruput for some applications, and dismally poor selection performance for others. At the end of the report are suggestions for future development of the variable length data element selection algorithm.

Case-4 (pointers to heterogeneous variable length data elements) was introduced to enable the conventional address calculation selection scheme for variable length elements. The addition of the pointers did not have much impact upon thruput, but the additional space required for the pointers erased the space savings achieved with variable length elements.

1.3. Key Words and Phrases

addition, APL, array, data structure, floating point, heterogeneous, index, integer, microcomputer, multiplication, selection

1.4. Computing Review Subject Codes

This thesis contains material which can be categorized under one of the following three Subject Code classifications:

D. Software
   D.3 Programming Languages
   D.3.3 Language Constructs: Data Types & Structures

E. Data
   E.2 Data Storage Representations: Primitive Data Items

G. Mathematics of Computing
   G.1 Numerical Analysis

1.5. Table of Contents

1. Preliminary Information
   1.1. Title and Acceptance Page
   1.2. Abstract
   1.3. Key Words and Phrases
   1.4. Computing Review Subject Codes
   1.5. Table of Contents
2. Introduction and Background
   2.1. Problem Statement
   2.2. Scope of Investigation
   2.3. Previous Work
3. System Specification
   3.1. Data Structures
   3.2. Functions Performed
   3.3. System Flow
4. Architectural Design
   4.1. Assembly Language Implementors
   4.2. BASIC Language Driver
   4.3. Memory Map
   4.4. Hardware Utilized
   4.5. Software Utilized
5. Detail Designs
   5.1. Implementor-Driver Interface
   5.2. Overview
      5.2.1. Initialization
      5.2.2. Loop Setup
      5.2.3. Main Loop
   5.3. Integer Addition
   5.4. Integer Multiplication
   5.5. Selection
   5.6. Elapsed Time Measurement
6. Investigation
   6.1. Integer Function Speed
   6.2. Variable Length Data Elements: Space Requirements
   6.3. Variable Length Data Elements: Thruput
   6.4. Type Coercions
7. Conclusions
   7.1. Thesis Validation
   7.2. Further Work
      7.2.1. Fixing the Case-3 Defect
      7.2.2. Variable Length Floating-point Data Elements
      7.2.3. Integer Multiplication Lookup Table Size
8. Bibliography
9. Appendices
   I   Floating Point Package
   II  Data Tables
   III BASIC Program Listings

2. Introduction and Background

This section describes the purpose of this study, the potential application, the problems to be examined, and past work which is relevant. References to the literature of the form [n] are to be found in Section 8, the Bibliography.

2.1. Problem Statement

APL is a concise and powerful language. Unlike most other high level languages, it treats arithmetic operations on large arrays of data as if the arrays were atomic entities. The construction of loops for repetitively performing an operation upon all the elements of an array is actually hidden from the user; as a consequence, the loops can be built very efficiently, executing with such low overhead as to approach the performance of machine language implementations.

APL is also typically implemented as an interpreted language. This provides the utmost in flexibility and user friendliness; it is quite common for users to build and debug functions in an interactive style, greatly reducing the amount of planning necessary before coding can begin. Thus, APL is perfectly suited to quick development of small to medium size "data-crunching" application programs with a minimum of programmer effort.

Data-intensive business and scientific applications are gradually making the transition from the mainframe to the personal microcomputer. Micros are also becoming required to support high speed real-time animation of high resolution images. These applications require both high speed calculation and movement of relatively large arrays of data. APL is thus a potential candidate for such applications, provided an APL interpreter for the target microcomputer is available.

Traditional APL interpreters adjust the internal representation of data elements to optimize storage utilization and processing thruput [14]. For example, small integers can be stored in less space than floating point real numbers. In addition, integer representations can be processed faster than floating point forms. This is especially true for microcomputers, where floating point calculations are invariably performed by software (rather than costly specialized hardware). In summary, by optimizing internal representation for the data being represented, both storage space and processing time can be saved. This is important on a microcomputer, where both space and processor power are critically short resources.

Traditional APL interpreters invariably assign a single representation for all elements of an APL array at the time of creation (or re-creation) [14]. Thus, all data elements of an array are forced into the same representation, which violates our desire to tailor representation to data. Of course, representing all data elements similarly simplifies the loops that APL must construct and execute to perform iterative arithmetic, as only one arithmetic routine tailored to a specific data representation need be called. Moreover, fixed-size elements enable random access to array components (such as rows, columns, or indexed specific elements), because their addresses can be calculated directly using the known element size. Random access into an array of variable-size elements would require sequentially stepping through the array from the beginning, unless the array is supplemented with some sort of index table (cf. "beating" and "slack representation", section 2.3.).

For example, a particular array containing mostly integers and sprinkled with a few sparsely scattered real elements, will have all elements represented internally in floating point form; we say the array elements have a "homogeneous" (fixed size and type) representation. From the viewpoint [...]

The subject of this thesis is to study alternatives to homogeneous arrays, for potential incorporation into a microcomputer-based APL interpreter. Specifically, variable length integers and floating point representations will be mixed within the same array, making it "heterogeneous". The tradeoffs associated with building and processing such heterogeneous arrays will be explored. Problems such as the effect of varying element type upon loop execution overhead, and randomly accessing variable length data elements will be dealt with. The overall goal is to determine whether heterogeneous representations can simultaneously improve storage utilization and processing thruput over traditional homogeneous representations, despite the problems that heterogeneous representations introduce.

2.2. Scope of Investigation

This thesis studies five different internal representations for APL arrays:

No.  Name          Array type     Element Representation                     Element Size
0    F.P. base     homogeneous    fixed (floating point)                     fixed (6 bytes)
1    Integer base  homogeneous    fixed (integer)                            fixed (6 bytes)
2    Fixed Length  heterogeneous  mixed (floating point/integer)             fixed (6 bytes)
3    Var. Length   heterogeneous  mixed (floating point/integer)             variable (1-6 bytes)
4    Pointer       heterogeneous  pointer to mixed (floating point/integer)  fixed pointer (2 bytes) to variable (0-6 bytes)

The homogeneous floating point and integer representations are base cases, and are numbered "0" and "1" in the above table. For both cases, the element lengths are fixed at 6 bytes, and data types of all elements are the same throughout the array. The homogeneous cases (0 & 1) are referenced as "floating point base case" and "integer base case" respectively in this report.

Three heterogeneous representations are built upon the homogeneous cases. In the heterogeneous cases, both integer and floating point elements exist within a given array, depending upon the data being represented. In Case-2, the "heterogeneous fixed length case", both integer and floating point element lengths are fixed at 6 bytes. Although no space is saved with this representation, integer elements can be processed by faster (integer) functions than their floating point neighbors. Of course, the discrimination of floating point vs. integer elements and the selection of the appropriate processing routine introduce undesirable overhead.

Case-3 is named the "heterogeneous variable length case". This case is built upon Case-2, but introduces variable length for the integer elements. Floating point elements remain 6 bytes in length. The objective of this representation is to save storage space while simultaneously deriving the benefits of integer processing. However, the fact that element lengths vary within an array impacts the ability to randomly access a given element.

Finally, Case-4 introduces fixed length pointers to the variable length elements of Case-3. Case-4 is thus referred to as the "heterogeneous pointer case". The objective of this representation is to regain the simple random access addressability of fixed length elements, while [...]

Although these may be thought of as homogeneous arrays consisting of purely fixed-length pointers, the data elements pointed to are heterogeneous.

To actually test processing thruput for these 5 ways of representing APL arrays, three representative dyadic (two-argument) APL primitive operations are implemented: array addition, array multiplication, and array selection. These operations are chosen because they are fundamental to all APL processing. Addition provides a good fast baseline to explore the relative speed advantages of handling arrays of various representations. Multiplication is chosen because of the significant numerical processing load it presents to a CPU without the benefit of hardware assist. Finally, selection is chosen because varying the data element size to minimize space requirements destroys the random accessibility that was possible with fixed size elements, and forces serial access. Thus the selection algorithm must be radically modified in order to support variable length elements, which is bound to affect processing thruput.

Addition, multiplication and selection are implemented for each of the 5 array representation cases. This supports the three primary areas of investigation undertaken in this thesis:

1) Faster Integer Functions - Can addition, multiplication and selection really be made significantly faster by introducing integer representations for integer data elements and using integer functions to process them, instead of always using floating point representations and floating point functions? The benchmark for comparison is a commercial floating point software package developed by Atari, Inc. which is included with the built-in operating system code of every Atari Home Computer. The work described in section 6.1 investigates this question.

2) Variable Length Data Elements - Can heterogeneous arrays be built successfully using mixtures of variable length integer and floating point elements to reduce overall space requirements? Does the space overhead of length flags and code to interpret them consume the space savings that would have resulted? Does the processing overhead introduced by variable length elements negatively affect thruput? Since the conventional process of selection involves the calculation of an element's address from its (fixed) size and index, variable length elements require a new algorithm to search for the desired elements, as their size is no longer known or constant. Does this destroy thruput for the selection function? Could fixed length pointers to variable length data elements permit the re-introduction of straight calculation of element addresses? With what benefit, and at what cost? The space savings potential of variable length data elements is discussed in section 6.2. Processing thruput impacts of variable length arrays were investigated in the work described in section 6.3.

3) Type Coercions: Impact upon Thruput - When addition or multiplication functions encounter a pair of elements to be processed which have dissimilar representations (e.g., floating point vs. integer), will the time required to convert [...]

2.3. Previous Work

From its early beginnings in the late 1960's, the actual implementation of APL Interpreters has remained mostly within the control of commercial companies. IBM Corporation's Research Division did virtually all of the work in the late 1960's, to be joined in the 1970's by I.P. Sharp Associates of Canada [5]. Understandably, papers published by these and other implementers are usually sketchy on implementation details.

A search of the literature for topics related to this thesis was initiated at the proposal stage. Only current Proceedings of ACM APL Conferences were found to contain articles relevant to the low-level design orientation of this thesis. The Bibliography (section 8) lists those papers [9, 10, 11, 12, 14] dealing with implementation innovations designed to speed up or extend APL interpreters.

References [9] and [11] are typical of those papers which discuss speed improvements. Reference [9] discusses several methods for smart evaluation of APL expressions. One such method, "Beating", appears applicable to an APL interpreter implemented on a microcomputer: data descriptors are introduced which point at elements of the actual array. APL functions such as reshape, take, drop and selection need only manipulate the data descriptors to achieve their result - the data itself need never be moved.

Reference [11] discusses several approaches toward making APL compilation or partial compilation feasible. For example, tentative binding of arithmetic routines into loop code eliminates having to select the appropriate routine for the array element(s) to be processed on each iteration of the loop. For example, if the elements are known to be integer (as opposed to floating point), an integer-optimized function could be assigned to do the arithmetic processing within the loop. Such an approach utilizes the fact that arrays are homogeneous; a check would have to be made at each loop iteration to verify that the correct routine had been selected, if there was a chance that data element representation could change within the arrays being processed.

Reference [14] is an example of a paper proposing a method of implementing a recently proposed APL extension called "enclosed arrays". This paper is of particular interest because it contains in the first few paragraphs an overview of how array element representations are selected in those "traditional" APL systems universally referenced and rarely described in the literature. Reference [14] is also interesting because it introduces conglomerate array elements which can vary in size within an array, as would my heterogeneous array elements. In order to enable quick address calculation for selection and indexing functions, the author introduces "slack representation" - i.e., fixed size reference pointers to the actual variable size array elements, which lend themselves to beating as described above. My search through the literature at the proposal stage uncovered no other papers bearing on heterogeneous array implementation.

References [8] and [13] are descriptions of one working microcomputer-based APL interpreter. This interpreter is designed to execute on an 8080-class microprocessor. During the proposal phase of the thesis, an informal study of this APL interpreter was performed using a Xerox model 820 [...]

3. System Specification

This section describes in more detail the programs implemented in support of this thesis. The operations of addition, multiplication and selection implemented here are more fully specified in the following paragraphs. Section 3.1 specifies the data structures employed in implementing homogeneous and heterogeneous arrays. Section 3.2 describes the specific functions implemented to create and process those data structures, and section 3.3 specifies the flow of data and control between the functions implemented.

For dyadic APL addition and multiplication, the two arrays must have either identical shape or else one must be a scalar. The addition or multiplication is performed, element by element, producing a resultant array. The shape of the resultant is that of the two argument arrays (or that of the array if the other argument was a scalar).
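The conformance rule above can be illustrated with a short Python sketch. It is not the thesis's 6502 code: the function name `dyadic`, the use of a tuple for the shape (with `()` standing for a scalar) and a flat list for the linearized data are all assumptions made for illustration.

```python
import operator

def dyadic(op, a_shape, a_data, b_shape, b_data):
    """Apply op element by element to two linearized arrays.

    Shapes are tuples of dimension lengths; () denotes a scalar whose
    single value is extended across the other argument.
    """
    if a_shape == ():
        # Scalar left argument: result takes the shape of the right argument.
        return b_shape, [op(a_data[0], y) for y in b_data]
    if b_shape == ():
        # Scalar right argument: result takes the shape of the left argument.
        return a_shape, [op(x, b_data[0]) for x in a_data]
    if a_shape != b_shape:
        raise ValueError("argument shapes do not conform")
    return a_shape, [op(x, y) for x, y in zip(a_data, b_data)]

# Two 2x2 arrays added element by element, then a scalar times a vector.
print(dyadic(operator.add, (2, 2), [1, 2, 3, 4], (2, 2), [10, 20, 30, 40]))
print(dyadic(operator.mul, (), [3], (3,), [1, 2, 3]))
```

Any other combination of shapes is rejected, which mirrors the requirement that the arguments be identically shaped unless one is a scalar.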

In APL selection, a "target" array contains data elements which are selected by integer indices contained in a second "selector" array. The resultant array contains data elements selected from the target array, with a shape conforming to that of the selector array. In APL notation, the index of each desired element is expressed in the selector array as an n-tuple containing the position within each dimension. The simple data structures described below do not support n-tuples. Therefore, for the purposes of this study, the elements of the selector array will be simple positional references to the (linearized) data elements of the target array. The conversion of n-tuple index representation which would be found in real APL programs into the simple positional representation used here is a simple calculation involving the length of each dimension of the target array. It is a well-understood algorithm, and is considered outside the scope of this implementation.
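For reference, the conversion that is left out of scope can be sketched in a few lines of Python. The sketch assumes a row-major (last dimension varying fastest) linearization; the function name `linear_position` and the `origin` parameter (0 by default, 1 for APL's customary index origin) are hypothetical.

```python
def linear_position(index, dims, origin=0):
    """Convert an n-tuple index into a position in the linearized data.

    index  -- one subscript per dimension
    dims   -- length of each dimension of the target array
    origin -- value of the first subscript in each dimension (assumed)
    """
    if len(index) != len(dims):
        raise ValueError("index rank does not match array rank")
    position = 0
    for subscript, length in zip(index, dims):
        offset = subscript - origin
        if not 0 <= offset < length:
            raise IndexError("subscript out of range")
        # Horner-style accumulation: earlier dimensions carry larger strides.
        position = position * length + offset
    return position

# In a 2 x 3 x 4 array, the last element is at linear position 23.
print(linear_position((1, 2, 3), (2, 3, 4)))
print(linear_position((2, 3, 4), (2, 3, 4), origin=1))
```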

A few more comments concerning system specification should be stated here. In general, for each homogeneous and heterogeneous case being studied, the addition, multiplication and selection routines are capable of handling all input argument representations legal for that case. For all heterogeneous cases, the addition and multiplication routines calculate not only the value of each resultant array element, but also the optimal representation for it, within the confines of the case being studied. For example, in the variable length case (Case-3) if two floating point elements are added to produce a result element which is best represented as a 2-byte integer, then that element will be stored as such within the resultant array.
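The per-element behaviour described above (pick a routine by operand representation, then pick the best representation for the result) can be sketched as follows. This is an illustrative model only: elements are modelled as `("int", value)` or `("float", value)` tuples rather than byte strings, the names are hypothetical, and the integer range is an assumption (a signed value held in at most 5 bytes), since the integer encoding is not specified in this section.

```python
# Assumed limits for an integer value stored in at most 5 bytes (signed).
INT_MIN, INT_MAX = -(2 ** 39), 2 ** 39 - 1

def best_representation(value):
    """Return the element tuple that represents value most compactly."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # an integral float may be demoted to integer
    if isinstance(value, int) and INT_MIN <= value <= INT_MAX:
        return ("int", value)
    return ("float", float(value))

def add_elements(a, b):
    """Add two elements, coercing to floating point only when needed."""
    kind_a, x = a
    kind_b, y = b
    if kind_a == "int" and kind_b == "int":
        total = x + y                # fast integer routine
    else:
        total = float(x) + float(y)  # type coercion, then floating point routine
    return best_representation(total)

# Two floating point operands whose sum is integral yield an integer element.
print(add_elements(("float", 1.5), ("float", 2.5)))
print(add_elements(("int", 2), ("float", 0.25)))
```

The check on every pair of operands is the loop overhead that the investigation sets out to measure against the savings of the faster integer routines.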

The heterogeneous pointer case (Case-4) arrays contain both data elements and pointers to them. Pointers can reference data elements which are not necessarily in the vector of data elements associated with the pointers. That is, this case is optimized for common data values (the identity elements "0" for addition, "1" for multiplication): These constants are stored in the pointer case (Case-4) implementation code. An array might contain pointers referencing these (common) values by pointing to them in the implementor code. Thus no corresponding data element components would be required in the array itself. This has the potential for saving the storage space that would be required for duplicate common values.

Pointers could also be used as in reference [14] "slack representation" to save the copying of data elements from an argument array to the resultant array. Such data elements could then have more than one pointer element referencing them, so an actual APL interpreter would require each data element to carry a usage counter byte or some other means of notifying garbage collection routines when a data element's storage could be freed. Since garbage collection is beyond the scope of this study, counter bytes are omitted from the pointer case implementation, and all data elements not equal to "0" or "1" [...]

3.1. Data Structures

Regardless of which case is being implemented, all arrays have the same overall structure. Each has a header which contains shape information, followed by a data area which contains a linearized vector of the data elements themselves:

Header | Data Area

The structure of the header is depicted below. The first byte contains the rank of the array - that is, the number of dimensions. The dimension length specifications follow the rank byte. Each dimension length is 2 bytes long.

#dims "n" (rank) | length of dimension #1 | ... | length of dimension #n
Byte 1           | Bytes 2 & 3            | ... | Bytes (2n) & (2n+1)

Thus a header for a scalar data element would contain only a single zero byte, while the header for a 127-dimension array would contain a leading byte with the value of 127, followed by 127 pairs of bytes, each containing the length of the appropriate dimension. A value of zero for any dimension length results in APL's "empty vector".

The data area which follows the header contains a series of data elements arranged in row-major order (APL standard). The number of data elements in the data area is precisely the product of the dimension lengths in the header (except for a scalar, which has one data element in the data area).

data element #1 | data element #2 | ... | data element #(dim1 x dim2 x ... x dimn)
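The header and data-area rules above can be made concrete with a small Python sketch. The function names are hypothetical, and the byte order of each 2-byte dimension length is an assumption: low byte first is used here because that is the 6502's native order, but this section does not state it.

```python
from math import prod

def build_header(dims):
    """Encode a shape as a rank byte followed by 2-byte dimension lengths."""
    if len(dims) > 255 or any(not 0 <= n <= 0xFFFF for n in dims):
        raise ValueError("shape does not fit the header format")
    header = bytes([len(dims)])
    for n in dims:
        header += n.to_bytes(2, "little")  # assumed low-byte-first order
    return header

def parse_header(buf):
    """Return (dims, header_length) for an array starting at buf[0]."""
    rank = buf[0]
    dims = tuple(int.from_bytes(buf[1 + 2 * i:3 + 2 * i], "little")
                 for i in range(rank))
    return dims, 1 + 2 * rank

def element_count(dims):
    """Number of data elements; an empty product gives 1 for a scalar."""
    return prod(dims)

print(build_header(()))               # scalar: a single zero byte
print(parse_header(build_header((3, 4))))
print(element_count((3, 4)), element_count(()), element_count((3, 0)))
```

Note that the scalar exception needs no special handling here, since the product of zero dimension lengths is 1, and that a zero dimension length yields an empty data area.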

An exception is the data array structure for the pointer case (Case-4), which is divided into two areas - a vector of fixed pointers, and a vector of data elements which they (in general) point to.

Header | Pointers | Data Elements

The structure of the data elements is what varies from case to case. This structure is illustrated for each case in Figure 3-1.

[Figure 3-1: element layout for each case.
Case-0 (F.P. base case): + expon't | mantissa
Case-1 (Int. base case): + dummy | value
Case-2 (Fixed Leng. case): + expon't/flag | mantissa or value
Case-3 (Var. Leng. case): + expon't/flag | mantissa or value
Case-4 (Pointer case): pointer, referencing + expon't/flag | mantissa or value]

Figure 3-1. Structure of Individual Data Elements

The heterogeneous fixed length case (Case-2) has arrays with elements of fixed size, but of two different (mixed) representations. The leading byte (used for the exponent in floating point representations) flags an integer representation by having the value zero, an illegal exponent value. (See Appendix I for a complete discussion of the floating point package and the legal range of exponent values.)

The heterogeneous variable length case (Case-3) extends Case-2 by introducing variable size integers. Leading byte values of 2 through 6 indicate an integer data element of length 2 through 6 bytes (containing a 1 through 5 byte value, respectively). The special value of 0 for the leading byte indicates a data element value of zero, and 1 indicates a value of one: these special cases are one-byte representations. Because the lead byte for floating point representation is fully occupied by the exponent, there is no room for a byte counter and floating point data element length remains fixed at 6 bytes.
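The Case-3 leading-byte rules imply that an element can only be located by stepping through its predecessors. The following Python sketch illustrates that serial search; it is not the thesis's selection routine. The function names are hypothetical, and it assumes that every leading byte greater than 6 is a legal floating point exponent (the legal exponent range is given in Appendix I, not here).

```python
FLOAT_LENGTH = 6  # floating point elements are always 6 bytes

def element_length(lead_byte):
    """Total length in bytes of a Case-3 element, from its leading byte."""
    if lead_byte in (0, 1):
        return 1            # one-byte encodings of the values zero and one
    if 2 <= lead_byte <= 6:
        return lead_byte    # integer: lead byte plus a 1-5 byte value
    return FLOAT_LENGTH     # assumed: any other lead byte is an exponent

def select_offset(data, position):
    """Byte offset of the element at a 0-based position in the data area.

    Every preceding element must be visited, so the cost grows with
    position instead of being a single multiply as for fixed size elements.
    """
    offset = 0
    for _ in range(position):
        if offset >= len(data):
            raise IndexError("position beyond end of data area")
        offset += element_length(data[offset])
    if offset >= len(data):
        raise IndexError("position beyond end of data area")
    return offset

# Five elements: 0, 1, a 3-byte integer, a 6-byte float, a 2-byte integer.
area = bytes([0, 1, 3, 0x12, 0x34, 0x40, 1, 2, 3, 4, 5, 2, 0x07])
print([select_offset(area, i) for i in range(5)])
```

With fixed 6-byte elements the same lookup would be `6 * position`, which is the random access property that Case-4's pointers are meant to restore.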

[...] actual array data. In a Case-4 array, the set of fixed size pointers immediately follows the array header bytes which define the length of each dimension. In turn, these pointers are followed in general by the variable size data elements to which they point. However, pointers to common values (0 and 1) can reference such values in the code, as mentioned above in section 3.1. In such a case there is no data element corresponding to the pointer, resulting in a space savings.

3.2. Functions Performed

For each of the five cases of array elements described above, a set of three assembly language routines is written to provide the three primitive APL operations, addition, multiplication and selection. Each routine takes as arguments the start addresses of two operand arrays and the start address of the area reserved for the resultant array. In addition to leaving the resultant array in the designated area, each routine makes available to its caller a measurement of actual execution time and array size. This and other data communications occur via an interface table. The interfaces to the assembly language routines for each of the five cases are identical, so that they can be called from a common driver program.

The driver program is written in BASIC. It provides a flexible test environment and interface for the user. It generates test data, runs the desired assembly language routines to process the data, retrieves and displays the results, and calculates and stores statistics such as storage efficiency and mean execution time.

The BASIC interpreter on which the driver program runs deals exclusively in 6-byte floating point numerical representation. Thus, auxiliary assembly language routines are needed to convert back and forth between BASIC's straight floating point representation and the more sophisticated array representations utilized in the five cases. Conversion is provided in both directions between two BASIC arrays (one containing test data and the other containing shape information) and a Case-n array incorporating both. Floating point to/from integer element conversion is also provided.

3.3. System Flow

The following chart lists all of the assembly language routines written for each of the five cases:

Routine   Arguments (addresses of:)                                               Return Value  Side Effects

Arithmetic:
ADD       2 argument & 1 resultant Case-n arrays                                  error code    execution time
MULT      2 argument & 1 resultant Case-n arrays                                  error code    execution time
SELECT    2 argument & 1 resultant Case-n arrays                                  error code    execution time

Conversion:
FLTTOCn   1 BASIC data, 1 BASIC rank input arrays, and 1 Case-n resultant array   error code
CnTOFLT   1 Case-n input array, and 1 BASIC data & 1 BASIC rank resultant array   error code    execution time

The above routines operate on the following storage arrays, defined and reserved by the BASIC driver program:

BASIC-compatible floating point data arrays: FA, FB & FR
BASIC-compatible floating point rank specification array: FS
Space reserved for Case-n arrays: A$, B$ & R$

[Figure 3-2: flow diagram. The BASIC Driver Program (Prepare Test Data, Process Test Data, Analyze Results) exchanges data with the Assembly Language Routines (Convert Floating Point To Case-N, ADD, MULT., SELECT, Convert Case-N To Floating Point) through the BASIC arrays FA, FB, FS (floating point), the Case-n arrays A$, B$, R$, and the BASIC arrays FR, FS (floating point).]

Figure 3-2. System Flow

Figure 3-2 diagrams the flow of control during exercising of the assembly language routines. First, the BASIC driver program prepares test data in arrays FA & FB, places shape information in array FS, and calls FLTTOCn to produce Case-n compatible test data in storage areas A$ & B$, respectively. Next, the desired arithmetic routine is called, which processes the arrays in A$ and B$, leaving the resultant array in R$. Finally, the resultant can be made BASIC-compatible for evaluation, by calling CnTOFLT which leaves the shape of the result in array FS and the data result in array FR.
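The conversion round trip described above can be illustrated with a minimal Python sketch. It is not the thesis code: the function names `flt_to_case` and `case_to_flt` are hypothetical stand-ins for FLTTOCn and CnTOFLT, Python lists stand in for the BASIC arrays and the A$ storage area, and the header layout (rank, then shape, then elements) follows the description given later in section 5.2.2. It is also assumed here that FS simply holds the list of dimension lengths.

```python
def flt_to_case(data, shape):
    """Pack a flat data list and a shape list into one Case-n style list:
    [rank, dim_1, ..., dim_rank, element_1, ...]."""
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError("shape does not match the number of data elements")
    return [len(shape), *shape, *data]


def case_to_flt(case_array):
    """Unpack a Case-n style list back into (data, shape)."""
    rank = case_array[0]
    shape = list(case_array[1:1 + rank])
    data = list(case_array[1 + rank:])
    return data, shape


# Stand-ins for BASIC arrays FA (data) and FS (shape): a 2-by-3 array.
fa, fs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [2, 3]
a_case = flt_to_case(fa, fs)          # stand-in for storage area A$
fr, fs_out = case_to_flt(a_case)      # back to BASIC-style data and shape
```

A scalar is represented in this sketch by an empty shape, which yields a header holding only the rank 0.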


values between the BASIC driver program and the implementor module. Among other things, this table provides the addresses of the assembly language implementor routines, the elapsed time taken to process the data by an implementor function and the size of the resultant array. This table is


4. Architectural Design

The following sections describe the overall design of the software implemented in support of this thesis, and the environment in which it operates.

4.1. Assembly Language Routines

The software designed to implement the three APL functions under study was written in 6502 Assembly Language on an Atari 800 home computer. For each case of data representation, a separate stand-alone module implements addition, multiplication and selection. Since there are 5 such cases, there are 5 independent (although related) "implementor" modules. In addition to the three arithmetic functions, each module contains utilities for measuring elapsed time and for converting data representations back and forth between an internal form specific to the case being studied, and an external form compatible with Atari BASIC (see following section).

The addresses of all functions, utilities and data pointers are held in a table at the beginning of each implementor module. A driver program for exercising this module can interface with it via this table of pointers. Since the tables in all implementor modules have the same format and memory location, the driver program does not need to know which case of data representation is being exercised. An implementor module can be replaced in memory by overlaying it with another (from disk), and the same exercise can thus be performed for different cases of data representation, allowing easy comparison of results.
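The idea of a fixed-format pointer table that hides which module is loaded can be sketched in Python as follows. This is only an analogy under stated assumptions: the two module builders, the `Driver` class and the slot constants are hypothetical, and Python callables stand in for the 6502 entry-point addresses. Only the slot order (FLTTOCASE, CASETOFLT, ADD, MULT, SELECT) is taken from the interface table shown in Figure 5-1.

```python
# Slot positions follow the order of the entry points in the interface table.
SLOT_FLTTOCASE, SLOT_CASETOFLT, SLOT_ADD, SLOT_MULT, SLOT_SELECT = range(5)


def case0_module():
    """Hypothetical module: only ADD and MULT are sketched."""
    return [None, None, lambda a, b: a + b, lambda a, b: a * b, None]


def case1_module():
    """A different internal implementation behind the same table layout."""
    return [None, None, lambda a, b: sum((a, b)), lambda a, b: b * a, None]


class Driver:
    """Calls whatever module currently occupies the overlay area."""

    def __init__(self):
        self.overlay = None

    def load(self, module):
        # Overlaying: the new table replaces the old one at the same place.
        self.overlay = module

    def exercise(self, slot, a, b):
        # The driver only knows the slot number, never the module identity.
        return self.overlay[slot](a, b)


driver = Driver()
driver.load(case0_module())
first = driver.exercise(SLOT_ADD, 2, 3)
driver.load(case1_module())
second = driver.exercise(SLOT_ADD, 2, 3)
```

Because both modules expose the same slots, the same exercise runs unchanged against either one, which is the property the text relies on for comparing cases.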

Also written in Assembly Language is a utility which calls an implementor module in from disk and loads it into the assigned memory area, overlaying the previous contents.

4.2. BASIC Language Driver Program

A single driver program was written to exercise the implementor modules described above. This program prepares test data, exercises the currently loaded implementor module, and prints or evaluates the results. The BASIC driver program contains a data declaration header which exactly matches the format of the interface table defined in the Assembly Language code. Also defined is the address of the utility used for loading an implementor module into memory from disk. This gives the BASIC program the ability to "swap in" the various implementor modules.

There is a "hole" built into the BASIC driver program where any of several exerciser subprograms can be inserted. These are also written in BASIC, and become part of the driver program. Each exerciser subprogram contains a particular series of BASIC statements for creating test data, calling


4.3. Memory Map

Figure 4-1 shows the memory map of the Atari 800 home computer. Each block in the memory map is described below.

Page Zero RAM - special 6502 instructions are available which access these addresses. Page Zero accesses are faster than other memory accesses, and indirect pointers can only reside in Page Zero. The Atari O.S. and BASIC interpreter reserve most of Page Zero, but a few locations are available for applications such as this project.

6502 Stack - the 6502 uses this area for saving return addresses and processor statuses. Temporary data can also be saved on the stack.

Atari O.S. Working Storage - memory used by the operating system for flags, buffers, etc.

Spare - this area is available for use by application programs, but was not used for this project.

Atari DOS & DOS Working Storage - memory used by the disk operating system for its code, flags, buffers, etc.

Case-n Overlay Area - each implementor module is assembled to start at the beginning of this area. One implementor module is resident at a time. The space is used for code, flags and buffers.

Load & Memory Mgmt. Utilities - two utilities written in Assembly Language occupy this area. They are read in from disk at disk boot-up time. One of them reserves the Case-n Overlay area and runs at disk boot-up time. The other utility loads any of the 5 implementor modules into the Case-n Overlay Area, and is callable from BASIC under operator or program control.

BASIC Working Storage & Program Area - this area contains the BASIC driver program and overlaid exerciser module which prepares data to exercise implementor modules, calls the implementor modules into memory, exercises them, retrieves the results, and prints or evaluates the results. Also present are buffers for the preparation of test data.

Display Working Storage - this area contains the display buffer and display list (a display hardware control program). It is managed by the Atari O.S.

BASIC Interpreter ROM - the plug-in BASIC interpreter ROM occupies this block of memory addresses.

Unused - not occupied by any memory or devices; for future expansion.

Hardware I/O - memory-mapped device addresses, for controlling the operation of the display hardware, game controller ports, etc.

Floating Point ROM - this block of memory addresses is occupied by the internal Atari floating point routines. These are utilized by BASIC, and also by implementor module code when floating point operations are required.

Atari O.S. ROM - the code of the Atari operating system resides in ROMs which occupy this block of memory. This code implements byte-level and record-level I/O functions to/from the display,


[Figure 4-1. Memory Map (diagram not reproduced). Its contents are tabulated below from the top of memory down; each row gives the starting address of a block. The figure also marks the division between ROM and RAM.]

Decimal   Hex    Block                                   Annotation in figure
Address   Address

65535     FFFF   (top of memory)
57344     E000   Atari O.S. ROM
55296     D800   Floating Pt. ROM                        Atari Floating Point Routines
53248     D000   Hardware I/O
49152     C000   (unused)
40960     A000   BASIC Interpreter ROM
39947     9C1F   Display Working Storage
16384     4000   BASIC Working Storage & Program Area    BASIC Driver Programs & Data Structures
16128     3F00   Load & Memory Mgmt. Utilities           Utilities to Reserve & Load Case-n Overlay Area
11264     2C00   Case-n Overlay Area                     Area Where Case-n Implementor Modules are Overlaid
1792      700    Atari DOS & DOS Working Storage
1536      600    Spare
512       200    Atari O.S. Working Storage
256       100    6502 Stack
0         0      Page Zero RAM                           A few bytes available for use by Case-n Implementor


4.4. Hardware Utilized

Atari 800 Home Computer, with 48K bytes of RAM and a total of 26K bytes of ROMs
Percom RFD-40 5¼" double density floppy Disk Drive
Atari 410 program Tape Recorder (for back-up)
Centronics 739 Printer (for local listings)
Multi-Tech FM-30 modem (for remote listings)
Atari 850 Interface Module
RCA XL-100 19" television set (monitor)

4.5. Software Utilized

Atari BASIC ROM cartridge - provides flexible easily programmed driver/test environment
Atari Assembler/Editor ROM cartridge - provides Case-n implementor development environment

5. Detail Designs

This section concentrates upon the design of the implementor modules, as these are the basis of this thesis. First the detail design of the table that provides communications between the BASIC driver program and the implementor module is described. Then the design of the implementor modules is outlined, followed by design details behind important sections of the code. Complete implementor module Assembly language source listings for Case-0, -1, -2, -3 and -4 appear in Appendix III.

5.1 Implementor-Driver Interface

The interface which provides communication between the BASIC driver program and the Case-n implementor being exercised is a table of addresses and data registers, whose structure and memory location is defined in both worlds. The BASIC driver program contains the variable declarations listed in the left column of Figure 5-1. The equivalent Assembly Language statements shown in the right column are part of the file DEFS.ASM, which is included in every implementor - see the listings of Appendix III.

Excerpt from BASIC Driver Program

    17 REM *****************************
    18 REM *DEFS OF ASSY CODE REGISTERS*
    19 REM *****************************
    20 LET AFLTTOCASE=11264:REM HEX $2C00
    22 LET ACASETOFLT=AFLTTOCASE+2
    24 LET AADD=ACASETOFLT+2
    26 LET AMULT=AADD+2
    28 LET ASELECT=AMULT+2
    32 LET FLTA=ASELECT+2
    34 LET AADR=FLTA+2
    36 LET FLTB=AADR+2
    38 LET BADR=FLTB+2
    40 LET FLTR=BADR+2
    42 LET RADR=FLTR+2
    44 LET DADR=RADR+2
    46 LET LCOUNT=DADR+2
    48 LET TIMER=LCOUNT+2
    50 LET VCOUNTER=TIMER+3
    52 LET TMPCTR1=VCOUNTER+1
    54 LET TMPCTR2=TMPCTR1+1
    56 LET DELTAA=TMPCTR2+1
    58 LET DELTAB=DELTAA+1
    60 LET DELTAR=DELTAB+1
    62 LET DELTAD=DELTAR+1
    64 LET INHIBDMA=DELTAD+1
    66 LET SCALASW=INHIBDMA+1
    68 LET SCALBSW=SCALASW+1

Excerpt from Assy. Language DEFS.ASM

    0520            *= $2C00          ;TOP OF OSS DOS, 11264 DECIMAL
    0530 ;COMMON POINTERS AND REGISTERS
    0540 ; FOR COMMUNICATION WITH BASIC
    0560 ; POINTERS TO ROUTINES CALLED FROM BASIC
    0570 ;                              DECIMAL ADDRESS
    0580 AFLTTOCASE .WORD FLTTOCASE   ;11264
    0590 ACASETOFLT .WORD CASETOFLT   ;11266
    0600 AADD       .WORD ADD         ;11268
    0610 AMULT      .WORD MULT        ;11270
    0620 ASELECT    .WORD SELECT      ;11272
    0640 ; FLOATING AND CASE-N BUFFER POINTERS
    0650 PTRBASE
    0660 FLTA       .WORD 0           ;11274
    0670 AADR       .WORD 0           ;11276
    0680 FLTB       .WORD 0           ;11278
    0690 BADR       .WORD 0           ;11280
    0700 FLTR       .WORD 0           ;11282
    0710 RADR       .WORD 0           ;11284
    0720 DADR       .WORD 0           ;11286
    0730 ;MISC. STORAGE REGISTERS
    0740 LCOUNT     .WORD 0           ;11288
    0750 TIMER      .BYTE 0,0,0       ;11290
    0760 VCOUNTER   .BYTE 0           ;11293
    0770 TMPCTR1    .BYTE 0           ;11294
    0780 TMPCTR2    .BYTE 0           ;11295
    0790 DELTAA     .BYTE 0           ;11296
    0800 DELTAB     .BYTE 0           ;11297
    0810 DELTAR     .BYTE 0           ;11298
    0820 DELTAD     .BYTE 0           ;11299
    0830 INHIBDMA   .BYTE 0           ;11300
    0840 SCALASW    .BYTE 7           ;11301
    0850 SCALBSW    .BYTE 7           ;11302

Figure 5-1. Implementor


The first five entries in the Interface table are the addresses of the five entry points to each implementor. The Assembler fills in these table values. When an implementor is loaded the addresses, which vary from implementor to implementor, are available to the calling BASIC program in fixed memory locations. The BASIC command "PEEK" is used to read the addresses of the entry points out of the fixed table locations, so that the corresponding routines can be called directly with the BASIC command "USR".
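The table layout of Figure 5-1 and the PEEK-based lookup can be checked with a short Python sketch. The field names, sizes and the base address 11264 ($2C00) are taken from the listing; the helper names `layout` and `peek_word`, and the use of a `bytearray` as a stand-in for Atari memory, are assumptions made for illustration. The low-byte-first order is the standard 6502 convention for a `.WORD` value.

```python
BASE = 0x2C00  # 11264 decimal, start of the interface table in the listing

# (name, size in bytes), in the order the fields appear in DEFS.ASM
FIELDS = [
    ("AFLTTOCASE", 2), ("ACASETOFLT", 2), ("AADD", 2), ("AMULT", 2),
    ("ASELECT", 2), ("FLTA", 2), ("AADR", 2), ("FLTB", 2), ("BADR", 2),
    ("FLTR", 2), ("RADR", 2), ("DADR", 2), ("LCOUNT", 2), ("TIMER", 3),
    ("VCOUNTER", 1), ("TMPCTR1", 1), ("TMPCTR2", 1), ("DELTAA", 1),
    ("DELTAB", 1), ("DELTAR", 1), ("DELTAD", 1), ("INHIBDMA", 1),
    ("SCALASW", 1), ("SCALBSW", 1),
]


def layout(base=BASE):
    """Return {field name: address}, accumulating sizes as the BASIC LETs do."""
    addresses, addr = {}, base
    for name, size in FIELDS:
        addresses[name] = addr
        addr += size
    return addresses


def peek_word(memory, addr):
    """Read a 16-bit value stored low byte first, the way a BASIC program
    would combine PEEK(addr) + 256 * PEEK(addr + 1)."""
    return memory[addr] + 256 * memory[addr + 1]


table = layout()
memory = bytearray(65536)            # stand-in for the 64K address space
memory[table["AADD"]] = 0x34         # hypothetical entry point $2D34 for ADD
memory[table["AADD"] + 1] = 0x2D
add_entry = peek_word(memory, table["AADD"])
```

Running `layout()` reproduces the decimal addresses printed in the right-hand column of the listing, which is a quick consistency check on the two definitions.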

The remainder of the interface table entries are registers containing strategic control data within the implementor module. They are included in the interface table to provide statistical feedback to the BASIC driver program following the execution of the selected implementor function. Access to these strategic registers also facilitates some implementor module debug from the BASIC environment. The rest of this section describes the use of each register.

Section 3 discusses the memory-resident data arrays which support the exercising of an implementor module. These arrays are reserved by the BASIC driver program. Their addresses are passed in the "USR" call to the implementor routines, which store them into interface registers FLTA, AADR, FLTB, BADR, FLTR, and/or RADR defined in Figure 5-1. These registers are advanced during implementor execution to point to the array data elements as they are being processed. Following execution, the registers may be read from BASIC to determine how much of an array was processed. DADR is a special pointer used for Case-4 to point to the data element region of a pointer case (Case-4) array.

Moving down the interface table definitions, LCOUNT is the storage register for the loop iteration counter, which decrements to zero as the array elements are processed. TIMER is a three-byte register which returns the elapsed execution time with a resolution of roughly a 60th of a second. VCOUNTER is a one-byte register which extends this resolution to about 1 millisecond. Timing facilities are discussed further in section 5.6. TMPCTR1 & 2 have miscellaneous uses during execution, and provide visibility for debug.

DELTAA, DELTAB, DELTAR and DELTAD are the increments by which registers AADR, BADR, RADR and DADR are incremented upon each loop iteration - that is, they are set to the length of the elements being processed. Initialization of these variables is as described in section 5.2.2.

INHIBDMA is a boolean switch which enables the BASIC driver program to control whether or not display DMA is shut off during implementor execution (to stabilize time measurements).

SCALASW and SCALBSW are used in the variable length case (Case-3) and pointer case (Case-4), as described in section 5.2.2. They inhibit incrementing the AADR and BADR registers if arguments A

5.2 Overview

The overall execution for arithmetic implementor module operations is flow-charted in Figure 5-2. Each of the 4 main blocks in the figure is annotated with the section number where it is described. The entry point "START" receives control when the BASIC driver program makes the appropriate call. A completion code is returned to the BASIC driver program at the end of execution, indicating success/failure.

[Figure 5-2. Implementor Block Diagram (flow chart not reproduced): START; 5.2.1 Initialization; 5.2.2 Loop Setup; Error? (Y: ERROR RETURN, N: continue); 5.2.3 Main Loop; Error?; 5.2.4 Cleanup; OKAY RETURN.]


5.2.1 Initialization

Upon being called from BASIC, the Initialization block tests INHIBDMA, and (if set) turns off the computer display DMA function (to stabilize time measurements). It clears and starts the interval timer which will be used to measure elapsed execution time, unstacks argument values which accompanied the call from BASIC, and stores them in appropriate registers. These arguments consist of the addresses of the arrays to be manipulated - the space for these arrays has been preallocated by the BASIC driver program. Finally, Initialization sets up the Main Loop with the desired calculation function - addition, multiplication, etc.

5.2.2 Loop Setup

The Loop Setup block performs rank and shape calculations to set up the resultant's header, and calculates the number of loop iterations required of the Main Loop. The logic for Loop Setup is flow-charted in Figures 5-3 and 5-4.

Upon entry to the flow chart, variable 'A' points to the start of the Case-n argument A array, 'B' points to the start of the B array, and 'R' points to the start of the buffer reserved for the R resultant array. Thus, 'A', 'B' and 'R' point to the beginnings of the headers of the corresponding arrays.

Upon successful exit from the flow chart, 'A' has been incremented past the header to point to the first A data element, 'B' to the first B data element and 'R' to the first R resultant element. The R header ahead of this first R element has been calculated and filled in with both rank and shape. ΔA, ΔB and ΔR, the increments to the 'A', 'B' and 'R' pointers at the end of each loop iteration, have been filled in (for cases of fixed element size) - with a value of 0 if the argument is a scalar, and a value of the data element size (6) if the argument is an array. (The definition of a dyadic operation between an array and a scalar requires that the scalar be repeated for each member of the array. This is implemented by zeroing incrementation for scalar 'A' and/or 'B'.) Lastly, the number of loop iterations 'L' is calculated.

Error exits occur from the flow chart if the arguments A and B are arrays with dissimilar rank or shape.

The notation for the value of the contents of pointer 'A' is (A). That is, at the entry to the flow chart, (A) is the first item of the array A's header, which is the rank (number of dimensions) of array A, referred to as ρA. Successive elements in the header following the rank are the number of elements per dimension (the shape). The total number of required main loop iterations 'L' is calculated by finding the product of these dimension lengths using a FOR loop of 'rank' iterations.
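The Loop Setup logic for fixed element size can be sketched in Python as below. This is an illustrative model rather than the 6502 code: each argument is assumed to be a Python list laid out as rank, shape, then elements; the function name `loop_setup`, the `ShapeError` class and the returned dictionary are hypothetical; and the "first element" positions are list indices, not byte addresses. The element size of 6 and the zero increment for scalars come from the text.

```python
ELEMENT_SIZE = 6  # bytes per fixed-size data element, as stated in the text


class ShapeError(Exception):
    """Raised where the flow chart would take its error exit."""


def loop_setup(a, b):
    """a and b are lists laid out as [rank, dim_1..dim_rank, elements...]."""
    rank_a, rank_b = a[0], b[0]
    shape_a, shape_b = a[1:1 + rank_a], b[1:1 + rank_b]

    # Two arrays must agree in rank and shape; a scalar pairs with anything.
    if rank_a and rank_b and shape_a != shape_b:
        raise ShapeError("arguments differ in rank or shape")

    shape_r = shape_a if rank_a else shape_b
    iterations = 1
    for dim in shape_r:              # FOR loop of 'rank' iterations
        iterations *= dim

    return {
        "header": [len(shape_r), *shape_r],           # rank then shape of R
        "L": iterations,
        "delta_a": ELEMENT_SIZE if rank_a else 0,     # 0 repeats a scalar
        "delta_b": ELEMENT_SIZE if rank_b else 0,
        "delta_r": ELEMENT_SIZE if shape_r else 0,
        "first_a": 1 + rank_a,                        # index past A's header
        "first_b": 1 + rank_b,
    }


setup = loop_setup([2, 2, 3, 1, 2, 3, 4, 5, 6], [0, 10])
```

For a 2-by-3 array combined with a scalar, the sketch yields six iterations, an increment of 6 for the array and 0 for the scalar, matching the rule stated in the text.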

For the variable length case (Case-3) and pointer case (Case-4) where element sizes are not fixed, ΔA, ΔB and ΔR are calculated within the Main Loop, and not within Loop Setup. Instead, Loop Setup supplies boolean values to signal the Main Loop whether or not to increment 'A' and/or 'B'. Thus for these two cases, Figure 5-4 replaces the initialization of ΔA, ΔB and ΔR with the setting of SCALASW & SCALBSW.

During the construction of pointer case (Case-4) arrays, register 'R' addresses the 2-byte pointers, and another register 'D' points to the variable length data elements. ΔR takes on a fixed value of 2 because of the fixed length of the pointers, and ΔD is the length of the previous data element. ΔR


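[Figure 5-3: Loop Setup flow chart for cases of fixed element size (image not reproduced). Recoverable box contents, with I running over the shape entries of a header:
  Scalar resultant: ΔR←0; (R)←0; L←1; R←R+1
  Resultant shaped like B: ΔR←6; (R)←(B); L←1; FOR I=[1..ρB] DO {L←L×(B_I); (R_I)←(B_I)}; R←R+ρB
  Resultant shaped like A: ΔR←6; (R)←(A); L←1; FOR I=[1..ρA] DO {L←L×(A_I); (R_I)←(A_I)}; R←R+ρA
  A and B both arrays: ΔR←6; (R)←(A); L←1; FOR I=[1..ρA] DO {IF (A_I)≠(B_I) THEN EXIT ELSE L←L×(A_I); (R_I)←(A_I)}; R←R+ρA
  Scalar B: ΔB←0; B←B+1. Array B: ΔB←6; B←B+ρB
  Scalar A: ΔA←0; A←A+1. Array A: ΔA←6; A←A+ρA
  Exits: OKAY RETURN; ERROR RETURN]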


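[Figure 5-4: Loop Setup flow chart for Case-3 and Case-4 (image not reproduced). Recoverable box contents, with I running over the shape entries of a header; the step "D←R+L*" is marked "Case-4 only" and its right-hand operand is not legible:
  Scalar resultant: (R)←0; L←1; R←R+1; D←R+L*
  Resultant shaped like B: (R)←(B); L←1; FOR I=[1..ρB] DO {L←L×(B_I); (R_I)←(B_I)}; R←R+ρB; D←R+L*
  Resultant shaped like A: (R)←(A); L←1; FOR I=[1..ρA] DO {L←L×(A_I); (R_I)←(A_I)}; R←R+ρA; D←R+L*
  A and B both arrays: (R)←(A); L←1; FOR I=[1..ρA] DO {IF (A_I)≠(B_I) THEN EXIT ELSE L←L×(A_I); (R_I)←(A_I)}; R←R+ρA; D←R+L*
  Scalar B: SCALBSW←0; B←B+1. Array B: SCALBSW←1; B←B+ρB
  Scalar A: SCALASW←0; A←A+1. Array A: SCALASW←1; A←A+ρA
  Exits: OKAY RETURN; ERROR RETURN]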


5.2.3 Main Loop

Figure 5-5 shows the logic of the Main Loop block of Figure 5-2. Within the Main Loop on each iteration, pairs of A & B argument data elements are processed, producing a resultant data element which is stored into the R array. Upon entry to the Main Loop, pointer 'A' contains the address of the first element of argument A, having been advanced past A's header by the Loop Setup block (qv). Similarly, pointer 'B' addresses the first element of argument B, and pointer 'R' contains the address where the first resultant element will be written (past the R header).

Execution of the Main Loop proceeds for L iterations, where 'L' was calculated during Loop Setup. At the beginning of each iteration, the next two argument data elements addressed by (A) and (B) respectively are loaded into a pair of page zero registers for processing by the arithmetic function selected by the Initialization block. In cases where data element representations can differ, the simpler data element is coerced into the more complex data element's representation, and the more complex processing algorithm is chosen.

The arithmetic function processes the two data elements in the page zero registers, and leaves the resultant data element in one of the registers. In cases where data element representations are mixed, an attempt is made to coerce the resultant data element into a simpler type (such as rounding off floating point 3.99999... to integer 4). Then the data element is stored at the location addressed by (R).
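The attempt to coerce a result into a simpler type can be sketched as follows. This is only an illustration under assumptions: the function name `simplify` and the tolerance value are hypothetical, and Python's binary floats stand in for the Atari floating point format, so the exact criterion used by the implementor modules is not reproduced here.

```python
def simplify(value, tolerance=1e-9):
    """Return an int when value is within tolerance of a whole number,
    otherwise return the value unchanged (it stays floating point)."""
    nearest = round(value)
    if abs(value - nearest) <= tolerance:
        return int(nearest)
    return value


almost_four = simplify(3.9999999999)   # close enough to be stored as integer 4
stays_float = simplify(3.5)            # no integer is near, remains 3.5
```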

At the conclusion of each iteration, pointers 'A', 'B' and 'R' are incremented by ΔA, ΔB and ΔR respectively, so that they will point to the next elements to be processed on the following iteration. At the end of iteration number 'L', in general 'A', 'B' and 'R' point to the byte following the last element of the corresponding arrays. The exception to this statement occurs when A, B and/or R are scalar: in such a case no incrementation takes place, and the pointer continues to point to the first and only element of the scalar data structure.
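The pointer stepping described above amounts to iterating with per-argument strides, where a stride of zero repeats a scalar. A minimal Python sketch follows; the function name `main_loop` is hypothetical, list indices replace byte addresses, and a step of 1 replaces the 6-byte element size, so it models only the control flow of the fixed element size cases.

```python
import operator


def main_loop(a_elems, b_elems, count, step_a, step_b, func):
    """Run `count` iterations; a step of 0 re-reads the same (scalar) element."""
    result, ia, ib = [], 0, 0
    for _ in range(count):
        result.append(func(a_elems[ia], b_elems[ib]))  # store at (R)
        ia += step_a                                    # A <- A + delta A
        ib += step_b                                    # B <- B + delta B
    return result


# Array plus scalar: the scalar's step is 0, so 10 is reused every iteration.
sums = main_loop([1, 2, 3], [10], 3, 1, 0, operator.add)
# Array times array of the same shape: both steps advance.
products = main_loop([1, 2, 3], [4, 5, 6], 3, 1, 1, operator.mul)
```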

Figure 5-5 must be modified slightly to handle the variable length case (Case-3) and pointer case (Case-4), where element size (and hence the amount of incrementation at the conclusion of each loop iteration) is not fixed. For these two cases ΔA, ΔB and ΔR are calculated by the arithmetic function executed at the beginning of each iteration, not by Loop Setup. At the end of each iteration the decision whether or not to increment each array pointer is provided in Case-3 and Case-4 by booleans SCALASW and SCALBSW which are initialized by Loop Setup. A value of 0 indicates a scalar requiring no incrementation; a value of 1 indicates an array requiring incrementation.

For the pointer case only (Case-4), the register 'D' is incremented by ΔD at the conclusion of each iteration. For this case ΔD is calculated as the length of the just-calculated resultant, and ΔR is fixed at 2 - because the register 'R' addresses the 2-byte pointers of the Case-4 array.

5.2.4 Cleanup

The final block in Figure 5-2 is the Cleanup block, which stops the timer and stores the elapsed time

[Figure 5-5: Main Loop flow chart (image not reproduced). Recoverable box contents:
  MAIN LOOP; Load Data Elements at (A) & (B); Call Specified Arithmetic Function (Add or Multiply); Error? (ERROR RETURN); N: Store Data Element at (R); A←A+ΔA; B←B+ΔB; R←R+ΔR; L←L-1; OKAY RETURN
  Modification for Case-3: calc. ΔA, ΔB & ΔR; A←A+ΔA×SCALASW; B←B+ΔB×SCALBSW; R←R+ΔR; L←L-1
  Modification for Case-4: calc. ΔA, ΔB & ΔD; ΔR←2; A←A+ΔA×SCALASW; B←B+ΔB×SCALBSW; R←R+ΔR; D←D+ΔD; L←L-1]

5.3 Integer Addition

Integer addition is implemented using a simple algorithm. First the algebraic signs of the two addends are compared. If they are the same, the sign of the result is the same as that of each addend, and the magnitudes of the two arguments are added in a loop that processes a byte from each addend at a time, starting with the least significant and ending with the most significant.

If the signs of the two arguments differ, then the two magnitudes must be compared to determine which is larger. This is accomplished with a loop that compares a byte from one addend with the corresponding byte of the other, starting with the most significant bytes. Equality causes the loop to proceed to the next pair of bytes. As soon as an unequal pair of bytes is encountered, the larger of the two magnitudes is determined. At that point, the sign of the resultant is assigned as the sign of the larger magnitude, and the smaller magnitude is subtracted from the larger. This is accomplished as with magnitude addition above, starting with the least significant bytes and ending with the most significant. If the magnitudes happen to be equal but the signs are different, the resultant is assigned as all-zeros, the representation for +0.
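The sign-and-magnitude procedure above can be sketched in Python as follows. It is a model under stated assumptions, not the 6502 routine: each magnitude is a list of 5 byte values held least significant first, each byte is assumed to carry two packed decimal digits (so it counts in base 100), and the helper names `to_sm`, `from_sm`, `add_mag`, `sub_mag`, `compare_mag` and `add_signed` are hypothetical.

```python
BYTES = 5    # magnitude bytes processed per operand, as stated in the text
BASE = 100   # assumption: two packed decimal digits per byte


def to_sm(n):
    """Python int -> (sign, magnitude bytes, least significant first)."""
    sign, n, mag = (-1 if n < 0 else 1), abs(n), []
    for _ in range(BYTES):
        mag.append(n % BASE)
        n //= BASE
    if n:
        raise OverflowError("value does not fit in 5 magnitude bytes")
    return sign, mag


def from_sm(sign, mag):
    value = 0
    for byte in reversed(mag):
        value = value * BASE + byte
    return sign * value


def add_mag(x, y):
    out, carry = [], 0
    for a, b in zip(x, y):                      # least significant first
        total = a + b + carry
        out.append(total % BASE)
        carry = total // BASE
    if carry:
        raise OverflowError("sum does not fit in 5 magnitude bytes")
    return out


def sub_mag(big, small):
    out, borrow = [], 0
    for a, b in zip(big, small):                # least significant first
        diff = a - b - borrow
        borrow = 1 if diff < 0 else 0
        out.append(diff + BASE if diff < 0 else diff)
    return out


def compare_mag(x, y):
    for a, b in zip(reversed(x), reversed(y)):  # most significant first
        if a != b:
            return 1 if a > b else -1
    return 0


def add_signed(p, q):
    (sign_p, mag_p), (sign_q, mag_q) = p, q
    if sign_p == sign_q:
        return sign_p, add_mag(mag_p, mag_q)
    order = compare_mag(mag_p, mag_q)
    if order == 0:
        return 1, [0] * BYTES                   # all-zeros represents +0
    if order > 0:
        return sign_p, sub_mag(mag_p, mag_q)
    return sign_q, sub_mag(mag_q, mag_p)


total = from_sm(*add_signed(to_sm(123), to_sm(-456)))
```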

The integer addition algorithm always processes 5 bytes of magnitude from each argument, regardless of the number of bytes in variable length integers. Thus, the variable length integers have to be loaded into the least significant ends of the addend registers, except for the signs which are inserted into the most significant byte. Unused high significant bytes are zero-filled before addition begins.

5.4 Integer Multiplication

Integer multiplication is implemented using an original algorithm based upon the look-up tables shown in Figure 5-6. The table on the left gives the least significant digit of the product of any pair of digits. The table on the right gives the carry (or most significant) digit from the multiplication of any pair of digits. For example, given the digits 4 & 7, Table P yields the least significant digit of their product (8), and Table C yields the most significant digit of their product (2).

In the actual implementation (See Tables.ASM in Appendix III), each of the tables is laid out as a linear list of entries, such that the two digits to be multiplied can be concatenated into a single byte that can be used to directly index the table of interest. Although a byte can potentially address 256 locations the tables P and C are only 160 entries long. This is because each nibble contains a BCD digit, so the most significant nibble of the byte index can never exceed 10 decimal. See section 7.2 for a discussion on optimizing table size.
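The linear table layout can be sketched in Python as below. The function name `build_tables` is hypothetical and the actual contents of Tables.ASM are not reproduced; the sketch only assumes what the text states: two BCD digits are concatenated into one byte (one digit per nibble) and that byte indexes a 160-entry list. Entries whose low nibble is above 9 are never addressed by valid digits and are left as zero here.

```python
TABLE_SIZE = 160  # entries per table, as stated in the text


def build_tables():
    """Return (P, C): P holds the low digit and C the carry digit of a * b,
    indexed by the byte (a << 4) | b for decimal digits a and b."""
    p = [0] * TABLE_SIZE
    c = [0] * TABLE_SIZE
    for a in range(10):
        for b in range(10):
            index = (a << 4) | b        # two BCD digits packed into one byte
            p[index] = (a * b) % 10
            c[index] = (a * b) // 10
    return p, c


P, C = build_tables()
low_digit = P[0x47]    # digits 4 and 7: low digit of 28
carry_digit = C[0x47]  # digits 4 and 7: carry digit of 28
```

The largest index produced by two valid digits is hex 99 (153 decimal), which is why 160 entries are sufficient.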

The tables P & C handle only single-digit arguments, but the complete integer multiplier is designed to find the product of multi-digit arguments, with two digits packed into each byte of each argument.
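One way to build a multi-digit product from single-digit lookups is ordinary schoolbook long multiplication, sketched below in Python. This is a generic illustration and is not claimed to be the thesis algorithm: digits are held one per list element (least significant first) rather than packed two per byte, the names `multiply_digits`, `to_digits` and `from_digits` are hypothetical, and the final carry normalisation uses plain arithmetic instead of table lookups.

```python
def build_tables():
    """P[index] = low digit, C[index] = carry digit of a * b, index = (a << 4) | b."""
    p, c = [0] * 160, [0] * 160
    for a in range(10):
        for b in range(10):
            p[(a << 4) | b] = (a * b) % 10
            c[(a << 4) | b] = (a * b) // 10
    return p, c


P, C = build_tables()


def to_digits(n):
    """Non-negative int -> list of decimal digits, least significant first."""
    digits = [n % 10]
    while n >= 10:
        n //= 10
        digits.append(n % 10)
    return digits


def from_digits(digits):
    return sum(d * 10 ** i for i, d in enumerate(digits))


def multiply_digits(x, y):
    """Schoolbook product of two digit lists using only P and C lookups
    for the digit-by-digit products."""
    columns = [0] * (len(x) + len(y))
    for i, xd in enumerate(x):
        for j, yd in enumerate(y):
            index = (xd << 4) | yd
            columns[i + j] += P[index]        # low digit lands in this column
            columns[i + j + 1] += C[index]    # carry digit lands one column up
    carry = 0
    for k in range(len(columns)):             # resolve column sums into digits
        total = columns[k] + carry
        columns[k] = total % 10
        carry = total // 10
    return columns


product = from_digits(multiply_digits(to_digits(1234), to_digits(56)))
```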

To illustrate the implementation, the long multiplication of CDEF by AB is detailed in Figure 5-7. As depicted by the shading, AB is a two-digit multiplier packed into a single byte, and CDEF is a 4-digit multiplicand packed into two adjacent bytes. The long multiplication contains a series of looked-up terms such as PBF & CBF. The notation PBF represents the Product of digits B & F. PBF is a single digit which can be looked up in Table P as defined above. Similarly, the notation CBF represents the Carry of digits B & F

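TABLE P (least significant digit of the product)

     0  1  2  3  4  5  6  7  8  9
0    0  0  0  0  0  0  0  0  0  0
1    0  1  2  3  4  5  6  7  8  9
2    0  2  4  6  8  0  2  4  6  8
3    0  3  6  9  2  5  8  1  4  7
4    0  4  8  2  6  0  4  8  2  6
5    0  5  0  5  0  5  0  5  0  5
6    0  6  2  8  4  0  6  2  8  4
7    0  7  4  1  8  5  2  9  6  3
8    0  8  6  4  2  0  8  6  4  2
9    0  9  8  7  6  5  4  3  2  1

TABLE C (carry, or most significant digit of the product)

     0  1  2  3  4  5  6  7  8  9
0    0  0  0  0  0  0  0  0  0  0
1    0  0  0  0  0  0  0  0  0  0
2    0  0  0  0  0  1  1  1  1  1
3    0  0  0  0  1  1  1  2  2  2
4    0  0  0  1  1  2  2  2  3  3
5    0  0  1  1  2  2  3  3  4  4
6    0  0  1  1  2  3  3  4  4  5
7    0  0  1  2  2  3  4  4  5  6
8    0  0  1  2  3  4  4  5  6  7
9    0  0  1  2  3  4  5  6  7  8

Figure 5-6. Integer Multiplication Lookup Tables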

