Development of parallel digital circuits and computation of the Dynostat algorithm

by

R.W. GIBBARD B.E. (Hons.)

A thesis for the degree of

Doctor of Philosophy in Electrical Engineering, University of Canterbury,

Christchurch, New Zealand.

ABSTRACT

A synthesis of a 'machine word' mathematical formulation for binary two's complement arithmetic provides a common basis for understanding existing algorithms and circuits for the implementation of common arithmetic operations in digital simulation. A systematic application of the machine word yields unified algorithm formulations and circuit schematics, including contributions to new knowledge, particularly in parallel array multiplication. From comparisons with existing formulations and circuits, the relative merits of the machine word approach are elucidated.

The development of a prototype parallel digital (P.D.) optimiser, incorporating results from the circuits study, provides special purpose equipment which is superior in speed to existing resistor-capacitor shaped (RC) analogue computers. Comparative performance data is provided by simple Linear Programming examples, solved in earlier research using RC-analogue equipment. Also solved is the static section of an illustrative example of optimal resource allocation using a Dynostat approach.

An extended performance assessment to dynamic optimisation, in which the P.D. machine supplies solutions of the algebraic (static) section of the problem to a serial-digital (S.D.) computer carrying out dynamic programming calculations (the dynamic section of Dynostat), demonstrates the efficiency of an all digital solution of a problem solved previously by a hybrid RC-analogue S.D. computation. The satisfactory results suggest the desirability of such [...] problems and on-line situations in which the disadvantages of inaccuracy and maintainability of RC-analogue machines are important restrictions.

Zero-zone functional [...]

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Mr J.A. Gibson, for his guidance and encouragement throughout the course of this research.

Also, I am grateful for the assistance afforded me by the many discussions with the staff and postgraduates of the Electrical Engineering Department; in particular Professor J.K. Bargh, Mr W.K. Kennedy, and Dr T.W. Marks.

I thank Messrs N. Gray, C. Rowe, and M. Cusdin for their help in some of the hardware aspects of the project.

My thanks also to Mrs J. Bleakley for her typing of this thesis, and to Mr L. Hill for the photographic work involved in the reproduction of the diagrams.

Appreciation is expressed to the University Grants Committee of New Zealand for financial assistance through the award of a Postgraduate Scholarship.

The University Grants Committee, The New Zealand Electricity Department, and the Scientific Research Distribution Committee (State Services Commission) are also thanked for their assistance in providing computer facilities in the Electrical Engineering Department, used extensively in this research.

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION TO STUDY

1.1 Introduction

1.2 Thesis Organisation

References

CHAPTER 2: MACHINE WORD SYNTHESIS OF ALGORITHMS IN TWO'S COMPLEMENT FORM

2.1 Introduction

2.2 Notation and Terminology

2.3 Synthesis of Multiplication Algorithms

2.3.1 A Basic Algorithm with Explicit Corrections

2.3.2 M Algorithm with Complemented Multiplier/Multiplicand Corrections

2.3.3 P Algorithm with Complemented Partial Product Word Corrections

2.3.4 Braun Algorithm with Mixed Explicit-Implicit Corrections

2.3.5 Booth Algorithm with Implicit Corrections

2.3.6 De Mori and Serra Algorithm with One's Complement Imbedding

2.4 Non-Restoring Division

2.5 Non-Restoring Square Root

2.6 General Linear Form

2.7 Examples of Non-Linear Function Generation

2.7.1 Modulus Function

2.7.2 Continuous Polynomial Function

2.7.3 Zero-zone Algebraic Restraint Functions

2.8 Integration

2.9 Examples of Problem Simulation

2.9.2 [...] with Non-Line[...]

2.10 Conclusions

References

CHAPTER 3: ASSESSMENT AND SYNTHESIS OF [...]VE CIRCUIT DESIGNS USING THE MACHINE WORD FORMULATION

3.1 Introduction

3.2 Hardware Implementation of Multiplication Algorithms

3.2.1 Implementation of the Basic Algorithm with Explicit Corrections

3.2.2 Implementation of the M Algorithm with Complemented Multiplier/Multiplicand Corrections

3.2.3 Implementation of the P Algorithm with Complemented Partial Product Word Corrections

3.2.4 Implementation of the Braun Algorithm with Mixed Explicit-Implicit Corrections

3.2.5 Implementation of the Booth Algorithm with Implicit Corrections

3.2.6 Implementation of the De Mori and Serra Algorithm with One's Complement Imbedding

3.2.7 Comparative Discussion of Multipl[...] Implementations

3.3 Implementation of Non-Restoring Division

3.4 Implementation of Non-Restoring Square Root

3.5 Conclusions

References

Bibliography

CHAPTER 4: SPECIFICATION AND DESIGN OF THE PROTOTYPE PARALLEL DIGITAL MACHINE

4.2 General Hardware Cons s the 4 2 Machine

4,2. 1 The Choice of Word 4 2

4.2.2 The Choice of 4-4

4. 3 The Modu 4 5

4.4 on 4'~7

4 4.1 The of 4 8

4.4. LIThe 2-Input Add~Subtract-Modulus unit 4 9

4 4.1 2 The 4 Array Multip ers

4.4 1 3 Zero-Zone Restraint units 4-14

4.4.1 4 The Buf Units

4

25

4 4.2 The Operation of the Synchronous Modules 4~25

of the Machine

4.4 2.1 Mode and Clocking Control for the 4~25

4 4.2.2 Generation of the Signals (C_lOO_{' C10l)}

Control

4 4.2.3 The Master Arithmetic Fault Unit

4.4.2.4 The Integrator Clock 4.4.2 5 The Integrator

4.5 Conclusions

References Bibliography

CHAPTER 5 THE PARALLEL DIGITAL IMPLEMENTATION OF' A STEEPEST ASCENT METHOD OF RESTRAINED

STATIC OPTIMISATION 5.1

5.2 Unrestrained by

st Ascent

5. Continuous S 'c Ascent on

Augmentation

4-28

4-31

4-39 4 2 46 4-47

5 1 5-1

(8)

5.4 5.5 506. 5.7 5.7.2 5.7.3 5.7.4 5.745

Simpl ication of the Continuous S st

Ascent Method with Func Augmentation

the Case where the Index

and the Restraints are L Improved Formulation Using D Cosines

Mathematical Model of the Optimiser

Assessment of the Per

Paral Optimiser

tion

the Illus·trative Example Examination of the 'I'raj ect:or s the

Paral 1 Dig 1 ser for the

I

Sources Error in the D

Implementation of Optimiser

Comparison between the Parallel Digital

and RC-Analogue Optimisers terms of

Climbing Speed

The Effect of Varying Restraint St fness on the Performance of the Zero-Zone

Controller.s

5 7.6 tivity of Solution Errors to Answer

Point Topography

5.7.6.1 Sensitivity of Solution Errors to the Performance Index

5.7.6.2 Sensitivity of Solution Errors to Restraint Configuration

5.7.7 Auxiliary Algorithms to Ensure Accuracy of Solution

5.7.8

5.8

CHAPTER 6

6.1 " 2

Summary of the Performance Digital Optimiser

Conclusions References

t.he·

INTERFACING OF PARALLEL AND SERIAL COHPU'I'ING MACHINES Introduction

Features of the EAI Structure of the I System

Ba Data Bus

9

5-13

5-21

5-23

1 24

(9)

6.3 The Set of EAI 640 Instructions Chosen for 6-4 Use with the Data Inter

6.4 6.4.1 6.4.2 .6.4.3 6.5.3 6.6

CHAPTER 7

7.1 7.2

7.3 7 4

7.5

7.6

7.6.2

The Serial Digi to Parallel Link

Bas Construe of the S.D. to P D, Link The Operation the S.D. to P.D. Link

Sununary 'the of the S.D. to

P.O. Link

The Parallel Dig

Basic Construction of P.O. to S.D. Link, The Operation the P.o. to S.D. Link Sununary

S.D. Link Conclusions References

SERIAL~PARALLEL Dr

of the P.O. to

OF THE DYNOSTAT ALGORITHM Introduction

Formulation the Dynostat Class of

Problem

Variables and Dimensionality Problem

Recapitulation of the Principle of Operation of the Dynostat Algorithm Previous Implementations of Dynostat

Hybrid Ser Digital ~ Parallel Digital

Implementation the Dynostat Algorithm

Formulation of an Illustrative Example

The Parallel 0 P for the

Illu Dynostat

7.6.3 Digital

ive Dynostat

7.6.3.1 Discus on the Different Versions of

the Dynostat Ser 1 Dig Programs

Prepared

Auxil Algor to Ensure Ac

Solution

6 5

6 8 6 11

6-18

6-18 6 20

7-1 7-1

7-3

7 7 7 8

7 13

(10)

7.6.4

7.6.6

7.7

CHAPTER 8

8.1 8.2

Coordination of Serial Digital and Digital Operations through Interface Operations

The Performance of Hybrid Parallel Digital Optimiser Comparative Assessment of

Implementat Conclus Refer'ences

Digital~

tat

CONCLUSIONS FROM THE RESEARCH IN THIS THESIS

Contributions the Thes

Future Developments on the Research in this Thesis

References

APPENDIX A: RIGHT AND LEFT SHIFTS OF VARIABLES A.l Shi s of Normalised Problem iables A.2 Shifts of Machine Words

B: ADDENDA RELATING TO

--...;,.;,~=---..;...

B.1

B 2

MULTIPLICATION

Worked Example the Braun Single Shift and Add Multiplication Formulation

Alternative Braun Single Formulation

ft and Add

AND REMA.INDER MINIMISATION STRATEGY FOR

NON~RESTORING DIVISION

APPENDIX D NON~RESTORING

Roorr

FORMULATIONS D.l .Addenda H.e to the

Res

7 23

7 32

8~1

8 3

8-5

A-I

B-1

B-2

(11)

D, L I D L2

D. L 3

D.2

Introduction of the Stage Quotient Word (ch-l) D-1

Simpli ation of a Sum of We D~2

Stage Quotients

Stage by Stage Development of the 'Bit' D-3 Form of the Non~Restoring Square Root

Iteration Contribution Word (wi)

Pseudo-Quotient and Minimisa·t~

ion for Non~Res

Root

APPENDIX E TIME SCALING OF PROBLEMS

APPENDIX F: CIRCUIT DETAILS AND PRINTED CIRCUIT BOARD [...] FOR [...] PARALLEL DIGITAL [...]

F.1 The A-S-M Unit

F.2 The P Multipl[...]

F.3 The Zero-Zone Restraint Units

F.4 The Master Arithmetic Fault Unit

F.5 The Integrator Clock

F.6 The Integrators

F.7 The Serial Digital to Parallel Digital Link

F.8 The Parallel Digital to Serial Link

APPENDIX G: SENSITIVITY OF SOLUTION ERRORS TO

ANSWER POINT TOPOGRAPHY A SOFT

RESTRAINT AUGMENTATION

G.I Errors to

Solution Errors to

i

(12)

APPENDIX I EXTENS OF 'l'HE Z FORMULATION

TO AND

SYNTHESIS OF A ZERO~Z HAMILTONIAN

CONDITION

1.1 Introduction

1.2 of

Formulation

ero~Zone Restraint

1.3

1.4

1,5

Appl in a Dynostat

of Optimal Control Synthesis of a Control Cond ion at Extrema Conclusions

Necessary

I-3

CHAPTER 1

1.1 INTRODUCTION

Advances in computer technology in recent years have been accompanied by improvements in algorithm design, in order to facilitate the best use of available computing equipment. Nevertheless, in the application of computers to the optimisation of multivariable systems, there still remains a 'dimensionality barrier' which arises essentially from the excessive computation times involved. A principal cause of the computation time barrier is the deficiency of parallelism in both solution algorithms and computer hardware for implementation.

Optimisation problems fall into two categories, namely, static and dynamic. In static optimisation the system being optimised is described by a set of equality and inequality restraints, expressed in terms of the system state and control variables. The optimisation problem is to determine the operating point of the system which maximises some numerical performance index such as profitability, subject to the requirement that the system be in steady state. The most obvious approach to the optimisation of such a system is termed the grid search. The space of the control variables is subdivided by means of a uniformly spaced grid, and the function evaluated at each grid point. The point at which the function is a maximum is then determined by direct comparison. This approach, which is [...], has for most applications been superseded by 'hill climbing' techniques which explore only a [...] region of the control space. The general approach is to steer a search point, which starts at an initial guess, to the peak of the profitability function, using information about values of the function and gradient in the vicinity of the search point. The penalty for the increased efficiency of this approach is that it may locate only local, and not global, optima.
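The contrast between the two static approaches can be sketched in a few lines of Python. This is a minimal illustration only: the profitability function `profit`, the search limits, the `gain` and the iteration count are all assumptions made for the example and do not come from the thesis.

```python
def profit(u1, u2):
    # Hypothetical profitability function (an assumption for illustration);
    # it has a single peak at (0.3, -0.2).
    return -((u1 - 0.3) ** 2 + (u2 + 0.2) ** 2)


def grid_search(f, lo, hi, steps):
    """Evaluate f at every point of a uniform grid and keep the best by direct comparison."""
    best_point, best_value = None, float("-inf")
    evaluations = 0
    for i in range(steps + 1):
        for j in range(steps + 1):
            point = (lo + (hi - lo) * i / steps, lo + (hi - lo) * j / steps)
            value = f(*point)
            evaluations += 1
            if value > best_value:
                best_point, best_value = point, value
    return best_point, evaluations


def hill_climb(f, start, gain=0.25, iterations=30, h=1e-6):
    """Steer a search point uphill using a central-difference gradient estimate."""
    u1, u2 = start
    evaluations = 0
    for _ in range(iterations):
        g1 = (f(u1 + h, u2) - f(u1 - h, u2)) / (2 * h)
        g2 = (f(u1, u2 + h) - f(u1, u2 - h)) / (2 * h)
        evaluations += 4
        u1, u2 = u1 + gain * g1, u2 + gain * g2
    return (u1, u2), evaluations


grid_point, grid_evals = grid_search(profit, -1.0, 1.0, 20)
climb_point, climb_evals = hill_climb(profit, (0.0, 0.0))
```

On this single-peaked example both searches arrive at the same point, and the hill climb uses fewer function evaluations. On a function with several peaks the hill climb would stop at whichever peak is uphill from its starting guess, which is the local-optimum penalty noted above.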

In dynamic optimisation the system changes state during the optimisation, the problem being to determine the control policy which maximises a performance index which is usually the integral of the instantaneous profitability over the period of optimisation. Although from a mathematical standpoint the control policy is a continuous function of time, in practice, with both manual and digital computer control, the control variables are commonly adjusted only at discrete instants of time. If the period of optimisation is divided into N intervals or stages, an N-stage decision process is involved which is suitable for implementation utilising Dynamic Programming techniques. In Dynamic Programming each of the (say) n state variables is quantised into (say) R levels. Thus a grid in state-time space is formed. Using the Principle of Optimality, the least cost of bringing the system from the starting point to all allowable points in the grid structure is determined, interval by interval, finally yielding the overall optimum trajectory on arrival at the end point. In employing the Principle of Optimality it is necessary to [...] instantaneous costs of state changes within a [...]. These costs can be determined by a variety of means, such as a grid search in control [...]. Alternatively, if the number of control variables is a cons[...]

(16)

sat Dynostat

util Gibson and

i

nature of ·the the high hill climbing

i s

Becaused thmG

cap 1

thm using ities. Gibson and Marks

[2]

1 3

is

tat ithm

vers

me

al

Dynostat s of a gradient

ithm

llel computing faci utilised the parallel

not

computation lities an Re·-analogue computer to

implemen·t a gradient search optimiser. A sign

f over ier implementation was

However, RC-analogue machines suffer from the well known problems of inaccuracy owing to operational amplifier drift, limited reliability and maintainability in a severe operating environment. The [...]ing of RC-analogue and serial digital computers is also an [...]ive [...]

In order to overcome t.hese

retain the 1

grad search f the lat.est

ithm and the study i

1 1 digital

ion a s manneto

Because

with 1 d

lems, and yet still

t the

lopment of the

subject this

computer by a

i s to

be fast acting, necessitating the use of parallel-acting rather than serial-acting [...] for all arithmetic functions.

In the course of the development of the parallel digital machine it was necessary to make a detailed study of the digital circuits currently available for performing the digital equivalents of the common arithmetic operations usually provided on RC-analogue computers. This process of becoming familiar with the state of the art led to a 'machine word' formulation for binary two's complement arithmetic. In this formulation all bits in the binary word take on the normal bit values of (0,1). It is considered that the machine word formulation has merit in its systematic unification of the two's complement formulations of common arithmetic operations. Use of the machine word formulation has led to the development of improved two's complement multiplication algorithms for parallel [...] implementation. This work has been reported on in a paper by Gibson and Gibbard [3].

The results of this supporting study are included in this thesis in a systematic and detailed machine word [...]

1.2 THESIS ORGANISATION

The seven principal chapters of this thesis can be grouped into sections as follows:

1) [...]

Chapters 2 and 3 present the results of using the machine word [...] algorithms and circuits [...] common two's complement implementations.

2) Design and Assessment [...]

Chapters 4 and 5 describe the specification and design of a high speed parallel digital optimiser, using some of the algorithms and circuits described in Chapters 2 and 3. The performance of this machine is then assessed using a classic Linear Programming illustrative example.

3) Interfacing of the [...] with a Serial [...]

Chapter 6 describes the specification and design of a [...] linking of the parallel digital optimiser with a [...] digital computer.

4) Serial-Parallel [...]

Chapter 7 [...] in which the algorithms, circuits, and the overall parallel digital [...] designed earlier [...] brought together and tested [...] a hybrid [...] parallel digital version of the [...]. Results [...] the solution of an [...] resource [...] are presented.

5) [...] from the [...]

Chapter 8 summarises [...] to new knowledge and technique made in sections 1 to 4 above, and outlines intended future developments.

A number of significant but [...] contributions to the overall study have been placed in the Appendices, to avoid interference with the main stream development of the thesis. Here attention is drawn particularly to the zero-zone Dynostat formulation using a varying res[...]. Also, [...] pair[...] to derive the Maximum Principle.

REFERENCES

[1] Gibson, J.A. and Coombes, G.E. 'A parallel optimum seeking technique Dynostat', I.E.E.E. Trans. Sys. Sci. Cyb., vol. SSC-6, no. 3, pp. 197-208, July 1970.

[2] Gibson, J.A. and Marks, T.W. 'Fast hybrid [...] imp[...] of the [...]tat [...]', I.E.E.E. Trans. [...], vol. 21, no. 8, pp. 87[...]-880, Aug. 1972.

[3] Gibson, J.A. and [...]sis and com[...]

CHAPTER 2

MACHINE WORD SYNTHESIS OF ALGORITHMS IN TWO'S COMPLEMENT FORM

2.1 INTRODUCTION

In this chapter the common [...] of digital arithmetic are [...] a machine word form. In the literature the [...] equivalent of a problem variable is [...] sign bit [...] valued. This form is numerically an exact representation of [...] but, of course, having a bit which is negative-valued is not convenient in circuit [...], for it is [...] that all bits have the same values of (0,1). Therefore a binary machine word form termed the "machine word" is defined as in Section 2.2. It is used in all cases considered in [...]. However, cross references to alternative solutions [...] expressed in terms of problem variables are placed in the Appendices, in cases where they seem to offer an advantage or useful complement to the machine word formulation.

An important contribution of this chapter to new knowledge and technique of two's complement algorithm synthesis lies in the consistency and uniformity of the machine word formulation. Algorithms for all the important arithmetic [...] are formulated in machine words. In some cases this has in turn led to new and improved algorithms:


2) of leI mul lication

ithms by Braun, and De Mori and Serra. 3) Improved formulation

ithms by Braun Booth, and ial vers of "che

M

and P 4)

multipl luding ithms ¢

ion

s to ser~

divis and

terms of c error

(ThE:~ lem var

ons in the ces are be to be new in

.

)

unit

All algorithms are presented using normalised machine variables, as would be required in computer simulation using parallel-acting function modules. As well as multiplication, division and square root, a general linear form and important examples of non-linear functions are presented. Only rectangular integration is considered. The chapter is concluded by describing simple examples of problem simulation.

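2.2 NOTATION AND TERMINOLOGY

Notation and terminology employed in the machine word formulation are as follows:

Problem Variables: $x$, $y$, $z$ in a problem specified format.

Normalised Problem Variables: $[x]$, $[y]$, $[z]$, where

$$[x] = \frac{x}{x_M}, \qquad [y] = \frac{y}{y_M}, \qquad [z] = \frac{z}{z_M}$$

and $x_M$, $y_M$, $z_M$ are the maximum values of $|x|$, $|y|$, $|z|$. [...] minimum $|y|$ enter into [...]

Machine Variables: $\tilde{x}$, $\tilde{y}$, $\tilde{z}$, the following (n+1) bit machine words:

$$\tilde{x} = x_0 2^{0} + x_1 2^{-1} + \dots + x_n 2^{-n}$$

where each bit $x_i$ takes one of the values (0,1). Similarly $\tilde{y}$, $\tilde{z}$.

Machine Constants: $e$ = smallest bit discrimination.

Word Complements:

$$\bar{\tilde{x}} = \sum_{i=0}^{n} (\bar{x}_i + \delta_{in})\, 2^{-i} \qquad (2\text{-}3)$$

where $\bar{x}_i = 1 - x_i$ is a bit complement and $\delta_{ij}$ is the Kronecker Delta. Similarly $\bar{\tilde{y}}$, $\bar{\tilde{z}}$.

Identity between P.V. and M.V.:

$$\tilde{x} = [x] + 2x_0 \quad \text{in 2's complement form, i.e.} \quad [x] = \tilde{x} - 2x_0 \qquad (2\text{-}5)$$

[Fig. 2-1: machine word $\tilde{x}$ plotted against normalised problem variable $[x]$; $e$ = smallest bit discrimination]

It will be noted from Fig. 2-1 that machine words are numerically [...] and in the range $0 \le \tilde{x} < 2$, and together with their complements make up the distinguishing diamond characteristic [...] the $\tilde{x}$, $[x]$ [...]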
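The notation of this section can be exercised with a short Python sketch. It is an illustration under stated assumptions, not part of the thesis: the function names are hypothetical, words are held as lists of bits $x_0 \dots x_n$, and exact fractions stand in for the binary weights.

```python
from fractions import Fraction


def to_word(norm, n):
    """Bits x_0..x_n of the machine word for a normalised value in [-1, 1 - 2**-n]."""
    scaled = Fraction(norm) * 2 ** n
    if scaled.denominator != 1 or not -2 ** n <= scaled < 2 ** n:
        raise ValueError("value is not representable in n + 1 bits")
    code = int(scaled) % 2 ** (n + 1)          # adds 2 (in word units) when the value is negative
    return [(code >> (n - i)) & 1 for i in range(n + 1)]


def word_value(bits):
    """Numerical value of the machine word: the sum of x_i * 2**-i."""
    return sum(Fraction(b, 2 ** i) for i, b in enumerate(bits))


def complement(bits):
    """Word complement: complement every bit, then add one unit at bit n."""
    n = len(bits) - 1
    code = sum((1 - b) << (n - i) for i, b in enumerate(bits)) + 1
    code %= 2 ** (n + 1)                       # a carry out of the x_0 position overflows
    return [(code >> (n - i)) & 1 for i in range(n + 1)]


def problem_value(bits):
    """Identity between problem and machine variables: [x] = word value - 2 * x_0."""
    return word_value(bits) - 2 * bits[0]


w = to_word(Fraction(-3, 8), 3)                # bits of the word for [x] = -3/8 with n = 3
```

For $[x] = -3/8$ and $n = 3$ the word value is $13/8$, inside the range $0 \le \tilde{x} < 2$, and its complement represents $+3/8$.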

2.3 SYNTHESIS OF MULTIPLICATION ALGORITHMS

In this section two new multiplication algorithms, designated M and P [1], are developed using the machine word formulation. Improved formulations of several existing multiplication algorithms are also produced. All algorithms are developed from a basic two's complement multiplication relationship, which is derived in Section 2.3.1.

2.3.1 A BASIC ALGORITHM WITH EXPLICIT CORRECTIONS

Normalised problem equation:

$$[z] = [x][y] \qquad (2\text{-}6)$$

Transforming to 2's complement binary format using 2-5:

$$\tilde{z} - 2z_0 = (\tilde{x} - 2x_0)(\tilde{y} - 2y_0) \qquad (2\text{-}7)$$

Expanding 2-7 and complementing negative signed machine words:

$$\tilde{z} = \tilde{x}\tilde{y} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}}) + [-2(x_0 + y_0)] \qquad (2\text{-}8)$$

where $z_0 = x_0\bar{y}_0 + \bar{x}_0 y_0$.

The contributions to the product are conveniently identified as:

(i) $(\tilde{x}\tilde{y})$, a core multiplication of the multiplier and multiplicand.

(ii) $2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$ [...] bits of significance $2^{+1}$, $2^{+2}$ in the word aggregate.

However, since bits of significance $> 2^0$ overflow and are neglected, it is a necessary compensation to neglect also the sign bit group $[-2(x_0 + y_0)]$. This is equivalent to saying that $[-2(x_0 + y_0)]$ is omitted in modulo two addition,

$$\text{i.e.} \quad \tilde{z} = \tilde{x}\tilde{y} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}}) \qquad (2\text{-}9)$$

[Fig. 2-2: Basic Multiplication (Explicit Corrections)]

The significant contributions (i) and (ii) are incorporated in a schematic, Fig. 2-2. The formulations 2-8, 2-9 are machine word equations for 2's complement multiplication [...]able [...]

Parallel Version of the Algorithm:

Equation 2-9 could be used to form the product using a parallel array. However, the structure of the schematic is [...] of multipliers will usually be [...] corrections (ii) into [...] (i), i.e. employing implicit rather than explicit corrections for four quadrant operation.
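The modulo two relationship between a core product with explicit corrections and the true signed product can be checked exhaustively for a short word length. The Python sketch below is an illustration only: words are held as integer codes scaled by $2^n$ (an assumption of this example), the product is kept to double length, and the function names are hypothetical.

```python
def signed(code, n):
    """Normalised problem variable, scaled by 2**n, from an (n+1)-bit word code."""
    return code - ((code >> n) << (n + 1))


def complement(code, n):
    """Two's complement of the word, reduced to n + 1 bits."""
    return (-code) % 2 ** (n + 1)


def product_explicit(x, y, n):
    """Core product plus explicit corrections, scaled by 2**(2n), in modulo two addition."""
    x0, y0 = x >> n, y >> n                     # sign bits
    core = x * y                                # unsigned core multiplication
    corrections = 2 * (x0 * complement(y, n) + y0 * complement(x, n)) * 2 ** n
    return (core + corrections) % 2 ** (2 * n + 1)   # bits above 2**0 overflow


def check(n):
    """Compare against the signed product for every pair of (n+1)-bit words."""
    for x in range(2 ** (n + 1)):
        for y in range(2 ** (n + 1)):
            expected = (signed(x, n) * signed(y, n)) % 2 ** (2 * n + 1)
            if product_explicit(x, y, n) != expected:
                return False
    return True
```

For example, with $n = 3$ the words for $-3/8$ and $+3/4$ are the codes 13 and 6, and `product_explicit(13, 6, 3)` gives the double-length code 110, which is the word for $-18/64$.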

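Serial Version of the Algorithm:

To exhibit a structure of 2-9 suitable for serial implementation, 2-9 is decomposed into successive computational stages i = 0 to n:

$$\tilde{z} = \sum_{i=0}^{n} 2^{-i}(y_i\tilde{x}) + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$$

or

$$\tilde{z} = \sum_{i=0}^{n} y_i(2^{-i}\tilde{x}) + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}}) \qquad (2\text{-}10)$$

Equation 2-10 indicates a core numerical procedure involving n + 1 successive stages of addition ($y_i = 1$) of the shifted multiplicand, or zeros ($y_i = 0$). Equation 2-10 may be rewritten as:

$$\tilde{z} = y_0\tilde{x} + 2^{-1}y_1\tilde{x} + \dots + 2^{-(n-1)}y_{n-1}\tilde{x} + 2^{-n}y_n\tilde{x} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$$

$$= y_0\tilde{x} + 2^{-1}\{y_1\tilde{x} + \dots + 2^{-1}\{y_{n-1}\tilde{x} + 2^{-1}\{y_n\tilde{x}\}\}\dots\} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}}) \qquad (2\text{-}11)$$

Equation 2-11 indicates a core numerical procedure involving n + 1 successive stages of sensing multiplier bits with the least significant bit [...], working to the [...] accumulation ($y_i = 1$) followed by shifting the new accumulation [...] one place to the right.

In either case implementation using 2-10 or 2-11 would involve a correction cycle of adding $2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$. Circuit implementations of 2-10, 2-11 are described in Section 3.2.1, from which it is seen the circuit based on 2-1[...] is better in terms of [...]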
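The shift and add staging followed by a correction cycle can be sketched as below. This is a hedged Python illustration, not the circuit of Section 3.2.1: the accumulation is held as an exact fraction so that no product bits are lost, and the function names are hypothetical.

```python
from fractions import Fraction


def word_value(bits):
    """Numerical value of a machine word given as bits x_0..x_n."""
    return sum(Fraction(b, 2 ** i) for i, b in enumerate(bits))


def serial_multiply(x_bits, y_bits):
    """Sense the multiplier from its least significant bit, add, and shift right each stage."""
    x = word_value(x_bits)
    y = word_value(y_bits)
    acc = Fraction(0)
    for y_i in reversed(y_bits):               # y_n first, y_0 last
        acc = acc / 2 + y_i * x                # shift the old accumulation, add x if y_i = 1
    x_bar = (2 - x) % 2                        # word complements, reduced modulo 2
    y_bar = (2 - y) % 2
    acc += 2 * (x_bits[0] * y_bar + y_bits[0] * x_bar)   # correction cycle
    return acc % 2                             # bits above 2**0 overflow


def bits_of(code, n):
    """Helper for the example: bits x_0..x_n of an integer code."""
    return [(code >> (n - i)) & 1 for i in range(n + 1)]


z = serial_multiply(bits_of(13, 3), bits_of(6, 3))   # (-3/8) * (3/4)
```

Here `z` is the machine word value $110/64$, i.e. the product $-18/64$ plus 2 for the set sign bit.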

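2.3.2 M ALGORITHM WITH COMPLEMENTED MULTIPLIER/MULTIPLICAND CORRECTIONS

From 2-7:

$$\tilde{z} = \{(\tilde{x} - x_0) - x_0\}\{(\tilde{y} - y_0) - y_0\} + 2z_0$$

$$= \{(\tilde{x} - x_0)(\tilde{y} - y_0)\} + \{-x_0(\tilde{y} - y_0) - y_0(\tilde{x} - x_0)\} + x_0y_0 + 2z_0$$

$$= (\tilde{x}\tilde{y})_{RDO} + \{x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}} + x_0y_0\} + [2x_0y_0 - 2x_0 - 2y_0 + 2z_0]$$

$$= (\tilde{x}\tilde{y})_{RDO} + \{x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}} + x_0y_0\} + [-2x_0y_0] \qquad (2\text{-}12)$$

using $z_0 = x_0\bar{y}_0 + y_0\bar{x}_0$, where RDO indicates a core multiplication with the first row word $R_1 = y_0\tilde{x}$ and the first diagonal word $D_1 = x_0\tilde{y}$ omitted,

$$\text{i.e.} \quad \tilde{z} = (\tilde{x}\tilde{y})_{RDO} + (x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}} + x_0y_0) \quad \text{in modulo 2 addition}$$

$$= (\tilde{x}\tilde{y})_{RDM}, \text{ say} \qquad (2\text{-}13)$$

where RDM indicates a core multiplication [...]

Parallel Version of the Algorithm:

Equation 2-13 can be used to form the product with a parallel array for four quadrant operation, using the following rules:

(i) substitute the word complement $\bar{\tilde{x}}$ for the multiplicand word $\tilde{x}$ in row 1, i.e. from 2-3 write $\bar{x}_i + \delta_{in}$ for $x_i$, i = 0 to n.

(ii) substitute $\bar{\tilde{y}}$ for $\tilde{y}$ in diagonal 1, i.e. from 2-3 write $\bar{y}_i + \delta_{in}$ for $y_i$, i = 0 to n.

(iii) collect and simplify contributions from 2-13 to the common corner element, say $(CRD1)^M$:

$$(CRD1)^M = x_0\bar{y}_0 + y_0\bar{x}_0 + x_0y_0 = 1 + \bar{x}_0\bar{y}_0 \quad \text{using modulo 2 addition}$$

$$= \{\text{corner element } \bar{x}_0\bar{y}_0 \text{ anticipated from rules (i) and (ii)}\} + 1$$

From the schematic Fig. 2-3 it is clear that the M implicit correction algorithm is obtained simply by substituting the complements of the multiplicand and multiplier words in ($R_1$) and ($D_1$) and adding a compensatory 1 to the common corner element.

A parallel implementation of this algorithm, in which all negative number corrections are implicit in the multiplication process, is described in Section 3.2.2.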
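The array produced by the substitution rules can be summed in software as a check. The Python sketch below is an illustration under stated assumptions, not the circuit of Section 3.2.2: words are integer codes scaled by $2^n$, element $(i, j)$ of the array carries weight $2^{-(i+j)}$, and row 1 and diagonal 1 are taken as the elements with $i = 0$ and $j = 0$ respectively. The function names are hypothetical.

```python
def bits_of(code, n):
    """Bits x_0..x_n of an integer word code."""
    return [(code >> (n - i)) & 1 for i in range(n + 1)]


def signed(code, n):
    """Normalised problem variable, scaled by 2**n."""
    return code - ((code >> n) << (n + 1))


def m_array_product(x, y, n):
    """Sum the multiplication array with complemented bits in row 1 and diagonal 1."""
    xb, yb = bits_of(x, n), bits_of(y, n)
    total = 0
    for i in range(n + 1):                     # i indexes multiplier bits (rows)
        for j in range(n + 1):                 # j indexes multiplicand bits
            if i == 0 and j == 0:
                element = (1 - xb[0]) * (1 - yb[0]) + 1          # corner plus compensatory 1
            elif i == 0:
                element = yb[0] * ((1 - xb[j]) + int(j == n))    # row 1: complemented multiplicand
            elif j == 0:
                element = xb[0] * ((1 - yb[i]) + int(i == n))    # diagonal 1: complemented multiplier
            else:
                element = yb[i] * xb[j]                          # unmodified core element
            total += element << (2 * n - i - j)
    return total % 2 ** (2 * n + 1)            # modulo two addition on the double-length word


def check(n):
    """Compare against the signed product for every pair of (n+1)-bit words."""
    return all(
        m_array_product(x, y, n) == (signed(x, n) * signed(y, n)) % 2 ** (2 * n + 1)
        for x in range(2 ** (n + 1))
        for y in range(2 ** (n + 1))
    )
```

The entries carrying the extra unit (at the least significant end of row 1 and of diagonal 1) can take the value 2; in hardware this extra unit would presumably be entered as a separate adder input rather than as part of a single bit.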

Serial Version of the Algorithm:

Equation 2-12 may be expressed in serial stage summation form:

i n ~j

}

{

2 i

_o

_in)xo

1 y. 1 j~l + (y i +

z

= {

+

2 j .. ( +

o.

_.+" _x _{Y }}

+

[~2

]

=0 ] n a 0

Assoc correction with the

terms, and group inat into a

[Fig. 2-3: (a) Algorithm. (b) Circuit. Full adder schematic.]

""-{L~=o

2 z

+{L~=o

2

+{z=n 2

:::::

z=~=o

₂

+

(y.

₁

+

i -i

~i

(;

2 - 10

z=j=o

2~jYi

(l-Ojo-6 + 20.

o.

)x.}

JO 1.0 J n 2

j=o j

(y.

1 + <5. 1n ) 0 . J }

n 2 j (

+

O. ) O.

y.}

+

r~

2x

y ]

j=o In 10 1

L-

0 0

z=

r:

2 - j

{y.

(8. 8.

+ O.

o. )

J=O 1 JO 10 JO 10

) O. x· +

(x.

+ O. )

o. y.}

+

[~~2x

y

J

JO J J In 10 1 . 0 0 (2 15)

where

8.

=:

o.

JO JO

The corrections of M are now implicit in an n+1 stage summation algorithm. From 2-15 the operation of the M algorithm in generating sequentially (i = 0 to n) the row words of the multiplication parallelogram specified by 2-13, can be confirmed with reference to Fig. 2-3(a). Another way of expressing equation 2-12 is:

$$\tilde{z} = \Big\{\sum_{i=1}^{n} (\tilde{x} - x_0)\,2^{-i} y_i\Big\} + x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}} + x_0y_0 + O(x,y)$$

$$= \Big\{\sum_{i=1}^{n} y_i(\tilde{x} - x_0)\,2^{-i} + y_0\overline{(\tilde{x} - x_0)}\Big\} + x_0\bar{\tilde{y}} + O(x,y) \qquad (2\text{-}16)$$

since $y_0\overline{(\tilde{x} - x_0)} = y_0(2 - (\tilde{x} - x_0))$.

In equation 2-16 the elements of the first diagonal word $D_1 = x_0\bar{\tilde{y}}$ have been effectively removed from the row summation process, and $D_1$ is added as an explicit correction term. The terms

$$\Big\{\sum_{i=1}^{n} y_i(\tilde{x} - x_0)\,2^{-i} + y_0\overline{(\tilde{x} - x_0)}\Big\}$$

are suitable for an n + 1 stage summation process, since they all contain the term $(\tilde{x} - x_0)$ or its complement, allowing implementation by adding (or subtracting) an appropriately shifted $(\tilde{x} - x_0)$ if the current multipl[...]. This [...]

[Fig. 2-4: [...] Product Parallelogram, showing the n + 1 row summation]

The term $\{\sum 2^{-i} y_i(\tilde{x} - x_0) + y_0\overline{(\tilde{x} - x_0)}\}$ can be readily written as a succession of shifts $2^{-1}$ of a stage accumulation, as was the case with the Basic Explicit algorithm. Both forms would be [...] for hardware implementation.

Hardware implementations of 2-15 and 2-16 are discussed in Section [...] are shown [...] as some other algorithms to be discussed.

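2.3.3 P ALGORITHM WITH COMPLEMENTED PARTIAL PRODUCT WORD CORRECTIONS

From 2-7:

$$\tilde{z} = \{(\tilde{x} - x_0)(\tilde{y} - y_0)\} - x_0\tilde{y} - y_0\tilde{x} + 3x_0y_0 + 2z_0$$

$$= (\tilde{x}\tilde{y})_{RDO} + \overline{x_0\tilde{y}} + \overline{y_0\tilde{x}} + x_0y_0 + [2x_0y_0 - 4 + 2z_0] \qquad (2\text{-}17)$$

$$\text{i.e.} \quad \tilde{z} = (\tilde{x}\tilde{y})_{RDO} + (\overline{x_0\tilde{y}} + \overline{y_0\tilde{x}} + x_0y_0) \quad \text{in modulo two addition} \qquad (2\text{-}18)$$

$$= (\tilde{x}\tilde{y})_{RDP}, \text{ say.}$$

Equation 2-18 can be used directly to form the product with a parallel array, using the following rules:

(i) substitute the partial product complement for the partial product word $y_0\tilde{x}$ in row 1, i.e. from 2-3 write $\overline{y_0x_i} + \delta_{in}$ for $y_0x_i$, i = 0 to n.

(ii) substitute $\overline{x_0\tilde{y}}$ for $x_0\tilde{y}$ in diagonal 1, i.e. from 2-3 write $\overline{x_0y_i} + \delta_{in}$ for $x_0y_i$, i = 0 to n.

(iii) collect and simplify three contributions from 2-18 to the common corner element,

$$(CRD1)^P = x_0y_0 + \overline{x_0y_0} + \overline{x_0y_0} = x_0y_0 \quad \text{in modulo 2 addition,}$$

i.e. the common corner element [...] case. The [...] from [...] the structure of this [...] and [...] implementation.

A parallel implementation of this algorithm is described in Section 3.2.3. Again [...] num[...]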

2 - 13

(a)Algorilhm.

2° 2-2n+1 _T2n

X

::;:

Xo XI Xl )(n-2 )(n-l Xn

If YO V2 YO

YoXo

-..=.., - - ..

VOX1 YoXl-Rl-YoXn,2 VoXn_t ~n+l y 1 Xn'3 V, )(n-2 VI X_{n_}₁ Y,Xn

YX I YX .... 1 0 I.... 1 1

--=..

V2)( 0 1 . . •. y 2 )(0-4 Y 2Xn-3 V:1 Xn -2 Y 2 )(n-l V₂Xn

1.... •• , RDO

'D1 ... · ... ··{XV} ... · '~l

Yn - 2Xo I L Yn - 2_ _ , X1 Yn-z)(z Vn-2X3 Yn-2X4 Yn-2Xn

Yn -1Xo 1'<,-1)(1 Vn-₁₎₍₂ Y_{n - 1}X₃ _{Yn-, Xn-1 '{,-IXn}

I... _ , YnXo+ 1 I

Z := V= ,-.., _~ Z; 2: l:: E .... ~ E Z L _"

"" Zo ZI Z2 Zn-2 Zn-l Zn

(b)Circuit.

Full Adde, 5<hemohc

'~

A II Cin

(35)

2 ~ 14

Serial Version of the Algorithm:

Proceeding in a very similar fashion to the M development, 2-17 yields:

r:

2~j{y.((5. (5

)=0 l JO

+

O. )8.

+

1n )0

The correct tage

+ 8. 6

JO

+ 6. ) 8 In of 2

)x.

)

} (2 19) in generating

terms of the

P

Ie may be confirmed with

the d of Figo 2 5(a) 0

For the M algorithm it was convenient to partition the parallelogram by removing the first diagonal word from the summation, as was done in equation 2-16. This left for the summation process

$$\Big\{\sum_{i=1}^{n} 2^{-i} y_i(\tilde{x} - x_0) + y_0\overline{(\tilde{x} - x_0)}\Big\}$$

so that if the current value of $y_i$ was 1, a shifted $(\tilde{x} - x_0)$ (its complement for the $y_0$ case) was added to the accumulation. Since the term $y_0\overline{(\tilde{x} - x_0)}$ accounted for the first row word (including its most significant term), only the $D_1$ correction needed to be added explicitly.

With the P algorithm, the [...] do not contain [...] factor explicitly. For this reason, the [...] word is difficult to incorporate in a summation process [...] to [...] above. Nevertheless, a convenient serial implementation of the P algorithm can be obtained by expressing the algorithm in the following form, the validity of which is readily confirmable from Fig. 2-5(a):

r<J

6. + 2 j)

z xy

1n

n + 1

(36)

2 - 15

= " n - 1 , - - ~ n 2- j ) + 2-n ~ i=l 2 'Yixo + ~ j=lYiXj

o n - ]

+ 2 (Y x + " . lY x· 2 ) 0 0 L.J= OJ

+ o(x,y) (2-20)

Equation 2-20 can readily be rewritten as a succession of shifts $2^{-1}$ of a stage accumulation, as was the case with the Basic Explicit algorithm. In this form it is particularly suitable for implementation.

Hardware implementations of equations 2-19 and 2-20 are discussed in Section 3.2.3. It will be shown that an implementation based on equation 2-20 is quite efficient, although direct implementation from equation 2-19 is not competitive.
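The parallel array rules of the P algorithm, in which the partial product bits of row 1 and diagonal 1 are complemented, can be checked numerically in the same way as any other array. The Python sketch below is an illustration only and assumes integer word codes scaled by $2^n$, array element $(i, j)$ of weight $2^{-(i+j)}$, row 1 at $i = 0$ and diagonal 1 at $j = 0$; the function names are hypothetical.

```python
def bits_of(code, n):
    """Bits x_0..x_n of an integer word code."""
    return [(code >> (n - i)) & 1 for i in range(n + 1)]


def signed(code, n):
    """Normalised problem variable, scaled by 2**n."""
    return code - ((code >> n) << (n + 1))


def p_array_product(x, y, n):
    """Sum the array with complemented partial products in row 1 and diagonal 1."""
    xb, yb = bits_of(x, n), bits_of(y, n)
    total = 0
    for i in range(n + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                element = xb[0] * yb[0]                          # corner element left unchanged
            elif i == 0:
                element = (1 - yb[0] * xb[j]) + int(j == n)      # complemented row 1 partial product
            elif j == 0:
                element = (1 - xb[0] * yb[i]) + int(i == n)      # complemented diagonal 1 partial product
            else:
                element = yb[i] * xb[j]                          # unmodified core element
            total += element << (2 * n - i - j)
    return total % 2 ** (2 * n + 1)            # modulo two addition on the double-length word


def check(n):
    """Compare against the signed product for every pair of (n+1)-bit words."""
    return all(
        p_array_product(x, y, n) == (signed(x, n) * signed(y, n)) % 2 ** (2 * n + 1)
        for x in range(2 ** (n + 1))
        for y in range(2 ** (n + 1))
    )
```

Unlike the M array, the complemented entries here do not vanish when the sign bits are zero, so row 1 and diagonal 1 always contribute a full word of ones that cancels in modulo two addition.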

2.3.4 BRAUN ALGORITHM WITH MIXED EXPLICIT-IMPLICIT CORRECTIONS

The correction technique described by Braun [2] comprises the following two distinct strategies:

(i) Negative Multiplier: Substitute the 2's complement of the multiplicand word for the first row word $y_0\tilde{x}$ of the core multiplication.

(ii) In the first diagonal word $x_0\tilde{y}$ of the core multiplication, where an element is unity fill out the remainder of the row to the left with unit bits. (An explicit correction external to the core multiplication.)

The strategy for a negative multiplier is the same as the implicit correction in the M algorithm. For this reason the basic 2's complement multiplication relationship 2-8 is now rearranged so as to separate the first row word:

$$\tilde{z} = \sum_{i=1}^{n} y_i 2^{-i}\tilde{x} + \{y_0\tilde{x} + 2y_0\bar{\tilde{x}}\} + 2x_0\bar{\tilde{y}} + [-2(x_0 + y_0)]$$

$$= \{y_0(2 - \bar{\tilde{x}}) + 2y_0\bar{\tilde{x}}\} + \sum_{i=1}^{n} y_i 2^{-i}\tilde{x} + 2x_0(2 - \tilde{y}) + [-2(x_0 + y_0)] \qquad (2\text{-}21)$$

$$= y_0\bar{\tilde{x}} + \sum_{i=1}^{n} y_i\{2^{-i}\tilde{x} + (2 - 2^{-i+1})x_0\} + \Big[2x_0\Big(1 - \sum_{i=0}^{n} y_i\Big)\Big] \qquad (2\text{-}22)$$

$$\text{i.e.} \quad \tilde{z} = y_0\bar{\tilde{x}} + \sum_{i=1}^{n} y_i\{2^{-i}\tilde{x} + (2 - 2^{-i+1})x_0\} \qquad (2\text{-}23)$$

the term [ ] being neglected in modulo 2 addition.

Note that the word $y_i 2^{-i}\tilde{x}$ is simply the machine word $y_i\tilde{x}$ shifted right through i places. As shown in Appendix A, the term $(2 - 2^{-i+1})x_0y_i = x_0y_i\sum_{r=0}^{i-1} 2^{-r}$ indicates fill out the places vacated by the shifted word $2^{-i}\tilde{x}$, as stated by Braun's algorithm. N.B. It is shown also in Appendix A that $2^{-i}\tilde{x} + (2 - 2^{-i+1})x_0$ is equivalent to the shifted problem variable $2^{-i}[x]$.

The structure of the algorithm as described by 2-22 or 2-23 is depicted in Fig. 2-6(a).
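The two Braun strategies, a complemented first row word and sign bit fill-out of each shifted multiplicand, can be sketched and checked as follows. This Python fragment is an illustration under stated assumptions rather than Braun's circuit: words are bit lists $x_0 \dots x_n$, exact fractions hold the double-length product, and the function names are hypothetical.

```python
from fractions import Fraction


def word_value(bits):
    """Numerical value of a machine word given as bits x_0..x_n."""
    return sum(Fraction(b, 2 ** i) for i, b in enumerate(bits))


def filled_shift(x_bits, i):
    """Multiplicand shifted right i places with x_0 copied into the vacated places."""
    x = word_value(x_bits)
    return x / 2 ** i + x_bits[0] * (2 - Fraction(2, 2 ** i))


def braun_product(x_bits, y_bits):
    """First row uses the complemented multiplicand; later rows use filled shifts."""
    n = len(x_bits) - 1
    x = word_value(x_bits)
    total = y_bits[0] * ((2 - x) % 2)          # negative multiplier strategy
    for i in range(1, n + 1):
        total += y_bits[i] * filled_shift(x_bits, i)
    return total % 2                           # modulo two addition


def bits_of(code, n):
    """Helper for the example: bits x_0..x_n of an integer code."""
    return [(code >> (n - i)) & 1 for i in range(n + 1)]
```

For a negative multiplicand such as $[x] = -3/8$ (bits 1101), `filled_shift` with $i = 1$ gives $29/16$, which is the machine word of the shifted problem variable $-3/16$.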

Parallel Version of the Algorithm:

Equation 2-22 can be used to form the product $\tilde{z}$ using a parallel array. This is described in Section 3.2.4, where it is shown that extra hardware is required to implement the negative multiplier correction term.

[Fig. 2-6: (a) Algorithm in 3rd Quadrant, showing the core multiplication, the $D_1$ fill-out corrections and the total correction. (b) Peripheral Adder Chain. Full adder schematic.]

Serial Version of the Algorithm:

$$\tilde{z} = \sum_{i=0}^{n} y_i\{\delta_{i0}\bar{\tilde{x}} + \bar{\delta}_{i0}(2^{-i}\tilde{x} + x_0(2 - 2^{-i+1}))\} + O(x_0, y_0)$$

$$= \sum_{i=0}^{n} y_i\Big\{\delta_{i0}\bar{\tilde{x}} + \bar{\delta}_{i0}\Big(2^{-i}\tilde{x} + x_0\sum_{r=0}^{i-1} 2^{-r}\Big)\Big\} + O(x_0, y_0) \qquad (2\text{-}24)$$

where $\bar{\delta}_{i0} = 1 - \delta_{i0}$.

(i) the initial [...] is recombined with the subsequent stages using the Kronecker delta operator $\delta_{i0}$.

(ii) the [...] $(2^{-i}\tilde{x} + x_0\sum_{r=0}^{i-1} 2^{-r})$ is the [...] $\tilde{x}$ shifted right through "i" places with bits $x_0$ [...] into the resulting vacant places. In Appendix A this process is shown to be equivalent to a shift of the problem variable $[x]$ through "i" places right.

(iii) the negative word correction $y_0\bar{\tilde{x}}$ could be formed explicitly and added in, if "i" is iterated from 1 to n, or could be formed in the i = 0 stage if "i" is iterated from 0 to n.

An alternative method of implementing Braun's algor-ithm would be to replace the shifts of the mUltiplier

through "i" places followed by accumulation, with a succ-ession of stages involving adding the term y

.x

followed

n-1.

by shi ing the _{.;;;;,.;;;...=..;;.;;...::.=.;;;;.;:..:=..::.:::.} one place right. The vacant

place arising from the sh filled with the sign bit

of the accumulation. The validity of this process is easy to confirm from the following:

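$$\begin{aligned}
[z] &= [x][y] \quad \text{where } [y] = \tilde{y} - 2y_0 \\
&= \left\{ \sum_{i=0}^{n} 2^{-i} y_i \right\}[x] - 2y_0[x] \\
&= \left\{ y_0 + 2^{-1}y_1 + 2^{-2}y_2 + \cdots + 2^{-(n-1)}y_{n-1} + 2^{-n}y_n \right\}[x] - 2y_0[x] \\
&= \left\{ 2^{-1}\left( 2^{-1}\left( \cdots 2^{-1}\left( 2^{-1}y_n + y_{n-1} \right) + y_{n-2} \cdots \right) + y_1 \right) + y_0 \right\}[x] - 2y_0[x]
\end{aligned} \tag{2-25}$$

Equation 2-25 expresses the multiplication in terms of successive $2^{-1}$ shifts (of the problem variable $[x]$) and add stages, where the accumulation at stage $j$ is denoted $r_j$.

The stage connection can be seen to be expressed by the recurrence relationship:

$$r_j = 2^{-1} r_{j-1} + y_{n-j}[x] \quad \text{for } j = 1 \text{ to } n, \quad \text{where } r_0 = y_n[x] \tag{2-26}$$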

By examining the first few terms of $r_j$, it is possible to deduce a general term:

$$r_0 = y_n[x]$$

$$r_1 = 2^{-1}r_0 + y_{n-1}[x] = [x]\{ y_{n-1} + 2^{-1}y_n \}$$

$$r_2 = 2^{-1}r_1 + y_{n-2}[x] = 2^{-1}[x]\{ y_{n-1} + 2^{-1}y_n \} + y_{n-2}[x] = [x]\{ y_{n-2} + 2^{-1}y_{n-1} + 2^{-2}y_n \}$$

Therefore in general:

$$r_j = [x]\{ y_{n-j} + 2^{-1}y_{n-j+1} + 2^{-2}y_{n-j+2} + \cdots + 2^{-(j-1)}y_{n-1} + 2^{-j}y_n \} \tag{2-27}$$

Examination of 2-27 shows that the limits of $r_j$ are:

$$-\sum_{k=0}^{j} 2^{-k} \le r_j \le (1 - 2^{-n}) \sum_{k=0}^{j} 2^{-k}$$

i.e.

$$-(2 - 2^{-j}) \le r_j \le (1 - 2^{-n})(2 - 2^{-j})$$

It is therefore apparent that $r_j$ does not necessarily lie within the normalised problem variable range:

$$-1 \le r_j \le 1 - 2^{-n}$$
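The recurrence 2-26 and the range problem just noted can be illustrated with a short sketch. It works in problem variables with exact fractions; the function name `stage_accumulations` and the chosen operands are illustrative assumptions only.

```python
from fractions import Fraction

def stage_accumulations(x, y_bits):
    """Return [r_0, ..., r_n] for r_j = r_{j-1}/2 + y_{n-j}*x with r_0 = y_n*x."""
    n = len(y_bits) - 1
    r = [y_bits[n] * x]
    for j in range(1, n + 1):
        r.append(r[-1] / 2 + y_bits[n - j] * x)
    return r

# Example: n = 3, [x] = -1, multiplier bits y_0..y_3 = 1,1,1,1 (so [y] = -1/8)
x = Fraction(-1)
y_bits = [1, 1, 1, 1]
y = -y_bits[0] + sum(Fraction(b, 2**i) for i, b in enumerate(y_bits) if i > 0)

r = stage_accumulations(x, y_bits)
print([str(v) for v in r])               # every stage lies below -1 after r_0
print(r[-1] - 2 * y_bits[0] * x == x * y)  # final correction -2*y_0*[x] gives [x][y]
```

In this example the accumulation reaches the lower limit $-(2 - 2^{-j})$ at every stage, which is outside the normalised range even though the final product is not.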

Therefore, in order to convert 2-26 to machine variables, an "extended 2's complement" machine word representation is required.

Figure 2-7 illustrates both the 2's and extended 2's complement machine word representations of a problem variable x.

[Figure 2-7: Two's and Extended Two's Complement Machine Word Representations]

It is apparent from Fig. 2-7 that if x is within the range of normalised problem variables $[x]$ (as it is in equation 2-27), then the machine word representations are related by:

$$\tilde{x}^* = \tilde{x} + 2^1 x_0$$

where $\tilde{x}$ is the usual 2's complement representation of $[x]$ and $\tilde{x}^*$ is the extended 2's complement representation of $[x]$. Thus $\tilde{x}$ is converted to $\tilde{x}^*$ simply by the insertion of a bit of value $x_0$ in the extra $2^1$ position.
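The relation between the two representations can be confirmed with a small sketch. The encoding functions below are assumptions made for illustration: the usual word is taken as the value modulo 2 (sign bit in the $2^0$ column) and the extended word as the value modulo 4 (sign bit in the $2^1$ column).

```python
from fractions import Fraction

def word(v):
    """Usual 2's complement word, 0 <= word < 2."""
    return Fraction(v) % 2

def ext_word(v):
    """Extended 2's complement word, 0 <= word < 4."""
    return Fraction(v) % 4

def decode_ext(w):
    """Problem variable represented by an extended word (range -2 <= v < 2)."""
    return w - 4 if w >= 2 else w

# Within the normalised range the extended word is the usual word plus 2*x0.
for k in range(-8, 8):
    x = Fraction(k, 8)
    x0 = 1 if x < 0 else 0
    assert ext_word(x) == word(x) + 2 * x0

# An accumulation such as -3/2 is representable only in the extended word.
print(decode_ext(ext_word(Fraction(-3, 2))))
```

The final line shows why the extra column is needed: a stage accumulation of $-3/2$ survives a round trip through the extended word, whereas the usual word would alias it to $+1/2$.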


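$$\tilde{r}^*_j - 2(\tilde{r}^*_j)_0 = 2^{-1}\left( \tilde{r}^*_{j-1} - 2(\tilde{r}^*_{j-1})_0 \right) + y_{n-j}\left( \tilde{x}^* - 2(\tilde{x}^*)_0 \right)$$

where $(\tilde{r}^*_j)_0$, $(\tilde{r}^*_{j-1})_0$ and $(\tilde{x}^*)_0$ are used to indicate the values of the sign bits of $\tilde{r}^*_j$, $\tilde{r}^*_{j-1}$ and $\tilde{x}^*$, and so are of values 0 or $2^1$.

i.e.

$$\tilde{r}^*_j = \left\{ 2^{-1}\tilde{r}^*_{j-1} + (\tilde{r}^*_{j-1})_0 + y_{n-j}\tilde{x}^* \right\} + 2\left[ -(\tilde{r}^*_{j-1})_0 + (\tilde{r}^*_j)_0 - y_{n-j}(\tilde{x}^*)_0 \right] \tag{2-28}$$

where [ ] contains compensation terms for overflow into the $2^2$ column, and $\tilde{r}^*_0 = y_n\tilde{x}^*$.

From 2-28 it is apparent that corresponding to each shift of the machine word accumulation $\tilde{r}^*_{j-1}$ by one place right, the vacant $2^1$ bit place arising must be filled out with $(\tilde{r}^*_{j-1})_0$, the sign bit of the accumulation.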

It is readily shown from 2-28 that the accumulation at iteration "n" is:

=

xy

+ 2x_oy-4x_o

Y

XV

+

2X_o

Y

+

O(x,y) However, the product

z

is

z

=

xy

+

2x_o

Y

+ 2yo~

n

i=o

-i

*

2 (x) oy i

~

2 -i (21x )y:.

1=0 0 1

Hence the final accumulation needs to be corrected by the addition of $2y_0\bar{x}$. However, since on the final iteration $y_0\tilde{x}^*$ was added to the accumulation, the correction $2y_0\bar{x}$ (equivalent to $-2y_0\tilde{x}$) can be implemented implicitly by adding $y_0\bar{x}^*$ (equivalent to subtracting $y_0\tilde{x}^*$) on the final iteration, rather than adding $y_0\tilde{x}^*$. When this is done the product $\tilde{z}$ is obtained directly from $\tilde{r}^*_n$ (ignoring the most significant $2^1$ bit)
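The single-shift procedure, with sign-bit fill-out and subtraction on the final iteration, can be sketched as follows. This is an illustrative model under stated assumptions, not the circuit of Section 3.2.4: the extended word is modelled as an exact fraction modulo 4, the complement word is taken as the extended word of $-[x]$, and the names `multiplier_bits` and `serial_multiply` are hypothetical.

```python
from fractions import Fraction

def multiplier_bits(y, n):
    """Bits y_0 (sign) .. y_n of the 2's complement word of y."""
    yw = Fraction(y) % 2
    return [int(yw * 2**i) % 2 for i in range(n + 1)]

def serial_multiply(x, y, n):
    """Shift-and-add multiplication in extended 2's complement words (n >= 1)."""
    x_ext = Fraction(x) % 4        # extended word of the multiplicand
    x_ext_neg = Fraction(-x) % 4   # assumed complement word, used on the y_0 stage
    y_bits = multiplier_bits(y, n)
    acc = (y_bits[n] * x_ext) % 4  # r_0 = y_n * x
    for j in range(1, n + 1):
        sign = 2 if acc >= 2 else 0   # sign bit value, 0 or 2^1
        acc = acc / 2 + sign          # shift right, refill the 2^1 place
        addend = x_ext_neg if j == n else x_ext
        acc = (acc + y_bits[n - j] * addend) % 4  # overflow into 2^2 discarded
    return acc % 2                    # ignore the most significant 2^1 bit

x, y = Fraction(-3, 8), Fraction(-5, 8)
print(serial_multiply(x, y, 3) == (x * y) % 2)
```

Subtracting on the final iteration replaces the separate correction step, so the loop body is the same at every stage apart from the choice of addend.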


are not encountered. This formulation, however, yields a circuit implementation which is less convenient.

Lewin describes a variant on the above extended 2's complement procedure. He retains an n + 1 bit binary word (i.e. msb of $2^0$). When a shift of the accumulation is made, the sign digit is used to fill out the vacated $2^0$ bit position. However, when an overflow has occurred in the formation of the accumulation it is retained, and treated as the sign bit, that is, shifted right into the $2^0$ position. This method and that described above are numerically equivalent.

Implementation using either the $2^{-i}$ shift form 2-24 or the $2^{-1}$ shift procedure just described requires dealing with the correction term $y_0\bar{x}$. Since addition of $y_0\bar{x}$ is equivalent to subtraction of $y_0\tilde{x}$, then if the hardware is capable of subtraction as well as addition, no problems are encountered, and the variable to be added ($y_i = 1$) on each iteration (or subtracted on the final iteration) is always $\tilde{x}$ {shifted a certain number of places, and with row fill-out as necessary}. This contrasts with the serial M and P algorithms based on equations 2-15 and 2-19, which require different words to be added to the accumulation at each iteration. The serial P algorithm based on equation 2-20 overcomes this difficulty by using the multiplier digits direct to form a stage term $y_i x_0 + \sum_{j=1}^{n} y_i x_j 2^{-j}$ which is then added to the accumulation, except for the final $y_0$ iteration where its complement is added

uniformity of word to be added ($y_i = 1$) on each iteration (or subtracted on the final $y_0$ iteration).

The penalty is that the negative multiplier correction then has to be added explicitly.

The serial version of Braun's algorithm overcomes this difficulty simply with the row fill-out technique, which implicitly implements the negative multiplicand correction in the shifting process. Because of this, and because of the uniformity of the word to be added (or subtracted) on each iteration (if $y_i$ is 1), Braun's algorithm is seen to be particularly suitable for serial implementation.

Hardware implementations of equations 2-24 and 2-28 are described in Section 3.2.4.

2.3.5 BOOTH CORRECTIONS

Booth [5] developed an algorithm for serial multiplication of 2's complement numbers in which the multiplier bits are sensed in pairs and, depending upon the configuration of the bits in the sensed pair, a particular contribution is made towards the product. This contribution must, as usual, be applied so that it is of the correct significance with respect to the final product. Again there appear to be two methods of doing this, one involving a succession of $2^{-i}$ shifts of the contributions followed by accumulation, and the other involving a succession of stages of addition of the current contribution, followed by a $2^{-1}$ shift of the accumulation. Both methods will be discussed in the treatment to follow.

The contributions to the product, as determined by the multiplier bit pair currently being sensed, are [5]:

(i) If the multiplier bit is 1 and the next lower order multiplier bit is 0, subtract the multiplicand.

(ii) If the multiplier bit is 0 and the next lower order multiplier bit is 1, add the multiplicand.

(iii) If the multiplier bit is the same as the next lower order bit, do nothing.
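Rules (i) to (iii) amount to recoding the multiplier into signed digits. The sketch below illustrates this under the bit ordering used in this chapter ($y_0$ is the sign bit, $y_n$ the least significant bit, and a dummy $y_{n+1} = 0$); the function names are hypothetical.

```python
from fractions import Fraction

def booth_digits(y_bits):
    """d_i = y_{i+1} - y_i: -1 means subtract, +1 means add, 0 means do nothing."""
    ext = list(y_bits) + [0]  # append the dummy bit y_{n+1} = 0
    return [ext[i + 1] - ext[i] for i in range(len(y_bits))]

def value_of_digits(digits):
    """Sum of d_i * 2^-i, which should equal the problem variable [y]."""
    return sum(Fraction(d, 2**i) for i, d in enumerate(digits))

y_bits = [1, 0, 1, 1]          # [y] = -1 + 1/4 + 1/8 = -5/8
digits = booth_digits(y_bits)
print(digits)                  # [-1, 1, 0, -1]
print(value_of_digits(digits)) # -5/8
```

Because the signed digits already carry the sign of the multiplier, no separate negative multiplier correction appears in this form.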

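Booth's algorithm is readily derived from the basic 2's complement multiplication relationship 2-7. Rewriting 2-7:

$$\tilde{z} = (\tilde{y} - 2y_0)(\tilde{x} - 2x_0) + 2(xy)_0 = 2(\tilde{y} - y_0)(\tilde{x} - 2x_0) - \tilde{y}(\tilde{x} - 2x_0) + 2(xy)_0 \tag{2-29}$$

Decomposing 2-29 into the sum of i = 0 to n stages:

$$\tilde{z} = \sum_{i=0}^{n} 2^{-i}\left\{ y_{i+1}(\tilde{x} - 2x_0) - y_i(\tilde{x} - 2x_0) \right\} + 2(xy)_0 \tag{2-30}$$

where the $y_{i+1}$ term originates because, in being multiplied by 2 in the first bracket of 2-29, $\tilde{y} - y_0$ has been shifted one place left. The dummy parameter $y_{n+1}$ is defined to be zero.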

Le.

z

_{Yi+l{2- + x

o (2 2 i+l) }

+ Yi{ + 2 i +l

)}J

(xoYi+l + x y.) + 2 (x + _o 1 0 -2x o y ) 0

-2

( Y i+l { + xo (2_2- i + l )}

+ Yi{2

+

Xo (2~ i+l)})

+

[-2

y. + 2 ( +y

~xy

>]

1 o 0 0

(Yi+l{2- + 1 2-r }

n{2

+-

r }

J

₊

O(x,y) (2 31)

where the group of terms $O(x, y)$ vanishes modulo 2.

formulation where each of the contributions is shifted through "i" places followed by accumulation. The contributions to the product may be identified from equation 2-31 (cross reference to rules (i), (ii) and (iii) above) as:

(i) If $y_{i+1} = 1$ and $y_i = 0$, add $\tilde{x}$ shifted through "i" places right (i.e. $2^{-i}\tilde{x}$) and fill out the vacant places arising from the shift with bits $x_0$.

(ii) If $y_{i+1} = 0$ and $y_i = 1$, add $\bar{x}$ (i.e. subtract $\tilde{x}$) shifted through "i" places right, and fill out the vacant places arising from the shift with bits $\bar{x}_0$.

(iii) If $y_{i+1} = 0$ and $y_i = 0$, add nothing. If $y_{i+1} = 1$ and $y_i = 1$, the two contributions added cancel in modulo 2 addition.

Numerical implementation of the algorithm is simply obtained by accumulating i = 0 to n stages of adding the appropriate contributions (as determined by rules (i), (ii) and (iii)), shifted the required amount, with the vacated bit positions filled with the appropriate bits.

As with Braun's algorithm, the above procedure of shifting the variables through "i" places followed by accumulation is not normally used in hardware implementations of Booth's algorithm. It is instead usual to implement the algorithm by a succession of stages in which the terms $y_i$, $y_{i+1}$ are sensed (starting at $y_n$, $y_{n+1}$), the appropriate contributions are added, and the accumulation is shifted one place right, as described by Chu [6]. The vacant place arising from the shift is filled with the sign bit of the accumulation. As was the case with Braun's algorithm, the validity of the process is intuitive, since

the value of accumulator. Again, however, it is difficult to predict the value of the sign bit of the accumulation and so a recurrence relation proof is resorted to, where it is not necessary to know the value of the accumulation sign bit to prove the validity of the process. The recurrence relation in problem variables is:

$$r_j = 2^{-1} r_{j-1} + \{ y_{n-j+1} - y_{n-j} \}[x], \quad j = 0 \text{ to } n, \quad \text{where } r_{-1} = 0,\ y_{n+1} = 0 \tag{2-32}$$

By examining the first few terms of $r_j$ it is possible to deduce a general term:

$$r_0 = 2^{-1}r_{-1} + y_{n+1}[x] - y_n[x] = [x]\{ -y_n \}$$

$$r_1 = 2^{-1}\left( [x]\{ -y_n \} \right) + y_n[x] - y_{n-1}[x] = [x]\{ -y_{n-1} + 2^{-1}y_n \}$$

$$r_2 = 2^{-1}r_1 + y_{n-1}[x] - y_{n-2}[x] = [x]\{ -y_{n-2} + 2^{-1}y_{n-1} + 2^{-2}y_n \}$$

Therefore, in general:

$$r_j = [x]\{ -y_{n-j} + 2^{-1}y_{n-j+1} + 2^{-2}y_{n-j+2} + \cdots + 2^{-j}y_n \} \tag{2-33}$$

Examination of 2-33 shows that the limits of $r_j$ for $-1 \le [x] \le 1 - 2^{-n}$ are:

$$-(1 - 2^{-n}) \le r_j \le +1$$

The upper limit of +1 on $r_j$ arises in the case $[x] = -1$. Since this limit is $2^{-n}$ above the range of normalised problem variables, it is necessary to prohibit the case where the multiplicand $[x]$ is $-1$. If this is done, $r_j$ lies within the range of normalised problem variables and so a transformation of 2-32 to machine variables is straightforward:

2 (r . )

J 0

=

l{

1~2(rj~1)O}

+

{Yn~j+l(

)

}}

i e r.

J

{2~

j-,l

+

(rj l)O} + J

'+

-I- Y n~J

.SZ}

+

[2

{~

( 1)

0

+

(r j ) 0

j+l~XoYno"j}J

(2~34)

It is readily confirmed from 2-34, by considering the iterations between $\tilde{r}_0$ and $\tilde{r}_n$, that:

n

i=o ~i{ y ) -I- y. ~

(2 35)

Notice from 2-34 that at stage "j" it is the accumulation from stage "j-1" which is shifted ($2^{-1}$), whereas with 2-31 the contributions were shifted ($2^{-i}$) prior to accumulation. Consequently from 2-34 it is seen that the vacant place occurring with the $2^{-1}$ shift is filled by zero or unity depending on the state of the sign bit of the accumulation. Also, modulo 2 addition occurs at each stage of the accumulation, and so the contents of [ ] in 2-34 are omitted in the numerical implementation.

Hardware implementations of Booth's algorithm, which appears to be very suitable for implementation since no additional correction cycles are required, are described in Section 3.2.5.
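The stage-by-stage form of Booth's algorithm (shift the accumulation one place right, fill the vacated place with its sign bit, then add or subtract the multiplicand) can be sketched as below. This is an illustrative model rather than the hardware of Section 3.2.5: words are exact fractions modulo 2, the complement word is assumed to be the word of $-[x]$, and the multiplicand $[x] = -1$ is excluded as required above. The name `booth_multiply` is hypothetical.

```python
from fractions import Fraction

def booth_multiply(x, y_bits):
    """Booth shift-and-add in 2's complement words; y_bits = [y_0, ..., y_n]."""
    x = Fraction(x)
    assert -1 < x < 1, "the multiplicand [x] = -1 is prohibited"
    n = len(y_bits) - 1
    xw, xw_neg = x % 2, (-x) % 2   # word of [x] and assumed complement word
    ext = list(y_bits) + [0]       # dummy bit y_{n+1} = 0
    acc = Fraction(0)              # r_{-1} = 0
    for j in range(n + 1):
        sign = 1 if acc >= 1 else 0
        acc = acc / 2 + sign       # shift right, fill the 2^0 place with the sign bit
        d = ext[n - j + 1] - ext[n - j]
        if d == 1:
            acc += xw              # add the multiplicand
        elif d == -1:
            acc += xw_neg          # subtract the multiplicand
        acc %= 2                   # modulo 2 addition
    return acc

x, y_bits = Fraction(3, 8), [1, 0, 1, 1]   # [y] = -5/8
print(booth_multiply(x, y_bits) == (x * Fraction(-5, 8)) % 2)
```

No extended word and no final correction are needed here, which matches the remark that no additional correction cycles are required.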

2.3.6 DE MORI AND SERRA ALGORITHM WITH 1'S COMPLEMENT IMBEDDING

De Mori and Serra [7] describe a 2's complement multiplication algorithm for parallel array implementation. The 1's complement numbering

De Mori and Serra with their 2's complement multiplier. It is noted, however, that the 1's complement of a 2's complement input variable is adopted only when the variable is negative.

Synthesis of the algorithm in terms starts

from lowing

2 5 and 1'5 comp vari s as From 2 5:

Ixl

"'~

+ (2~:;n x == _XXo

0 v

2~n)

=

+

ex

+ 36)

"'- lJ

larly Iyl == _yYo +

(y

+ ) Yo

where the lis complements of

x

and '" _{y are:}

LJ _-n

X

=

2 2

(2-37)

>(,

2

Y :::::

Substituting from 2-36 into the modulus product identity

IxYI

==

ixll YI

(2-38)

yields in the notation of De Mori and Serra IxYI

Gl+

2 -n _{x oYo}

where Q

₌

U/Z+

Jf

+x

v

2-nWY .

and ~

₌

"'-_xXo

₊

""

xXo ' :=

0 (2 39)

"'- v

== _yYo ₊_YYo,J{ _Xo

Now introduce the 2's complement product identity

'"'-'

xy

=

40)

and substitute from 2-39 to obtain, after simple manipulation, the De Mori and Serra algorithm in general terms

Z == (xy 0 +

~G)

( ) 0 + 2 2nxoY 0 41)

The mixed 2's and l's comp var

ur

and Z are

evident

The structure i clear from Fig. 2~8 (aL

This algorithm, when the M

algorithms, is compl both the 2G

s

[Figure 2-8: De Mori and Serra algorithm. (a) Algorithm. (b) Circuit.]

Hardware implementation of the algorithm is described in Section 3.2.6. Although peripheral networks are required to generate the mixed 1's and 2's complement variables, and corrections to be added to the core multiplication product, the circuit implementation shows a surprising degree of symmetry and efficiency.

2.4 NON-RESTORING DIVISION

Problem equation: $z = x/y$

Normalised problem equation: $[z] = k[x]/[y]$, i.e. $k[x] - [z][y] = 0$, where $z_M = x_M/y_m$ and $k = y_m/y_M > 0$

Machine word equation:

$$k(\tilde{x} - 2x_0) - (\tilde{z} - 2z_0)(\tilde{y} - 2y_0) = 0 \tag{2-42}$$

For reasons explained in Appendix C, introduce a pseudo-quotient word:

$$\tilde{q} = \tilde{z} - 2z_0 + 1 \tag{2-43}$$

and substitute into the machine equation:

$$k\tilde{x} + \left\{ -(\tilde{q} - 1)(\tilde{y} - 2y_0) \right\} + [-2x_0 k] = 0 \tag{2-44}$$

Arrange the contents of { } as positive connected words suitable for machine implementation:

+-

{qy

+-

o

Write the machine word equation d sian, 2-45, as sequence of

words Y, '" Y

ikx

+-+-

[2 Collect all

+

s i 0 to n of add ion of to the shi d dend ]~x

i - ₎_,'"

_}

{ qi

Y

+

+6 Y

i{qi(2Y_o 1)-2x k.} 4y _o

J

0

1 a

terms

'~) + q

one summation:

o

d

(2 45) a sor


+ n _i=o

2

l ~~n+l

f

== L Yo (2

+ 2q. (2y ~1)~4x lc

1 0 0 1

2 i

+

n+l _Yo

+

q

:~1)

+

o.

y

1 1n

+ 1) 4 .-2y}]

~ 2~n+ly (2~48)

1 0 0

where

w.

=

q

1

2~n+l

" Yo

+ 2qi (2yo ~1)-4 ~2y o

Writing in a manner similar to the single shift and add stages as employed in multiplication, but employing a left shift:

2~n[2x2x'

··x2{2(2kx

+

w )

+

wd

+

w

2

+

~)

_{e; __}

~

__

(2-50)

Equation 2-50 is the machine algebra formulation for non-restoring division. Numerical implementation of this equation, based on the strategy in Appendix C, is achieved by starting at the inner stage 1 element and working out to stage n. Two important simplifications arising from the modulo 2 arithmetic are:

1) In

W.

(equation

1 49) omit the group of terms

2qi(2Yo-l}-4xoki 2yo' since being zero or nega-powers of 2, they cancel overflow into the 21 column.

2)

ection) hand s

For w.

=

w ,

omit

o.

y,

and

1 n 1n (for later

corr-an error ny (as i t would on the right

Thus, in numerical implementation

. - J

W. ~


The effective error terms are the 2-n

V

from 2) above, and constant 2 ~n,+ on right hand s of equations 2~47 to 2-50. These terms sent the error in the quotient multiplied by the divisor. Hence the quotient error is

)

/

To str ly correct then the should be corrected the

~2 n

ion of 2 ~n ,

e]'

The true $\tilde{z}$ is obtained simply from 2-43 as

$$\tilde{z} = \tilde{q} - 1 + 2z_0$$

where, if $z_0$ is 1,

$$\tilde{z} = \tilde{q} + 1 \tag{2-52}$$

and, if $z_0$ is 0,

$$\tilde{z} = \tilde{q} - 1 = \tilde{q} + 1 - [2] = \tilde{q} + 1 \tag{2-53}$$

i.e. from 2-52 and 2-53, the true quotient $\tilde{z}$ is obtained simply by adding 1 to the pseudo-quotient $\tilde{q}$.
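The conversion between the pseudo-quotient and the true quotient word can be verified directly from 2-43. The sketch below is illustrative only; it models words as exact fractions modulo 2, and the function names are hypothetical.

```python
from fractions import Fraction

def pseudo_quotient(z):
    """q = z_word - 2*z0 + 1 (equation 2-43) for a problem variable -1 <= z < 1."""
    zw = Fraction(z) % 2
    z0 = 1 if zw >= 1 else 0
    return zw - 2 * z0 + 1

def true_quotient_word(q):
    """Recover the 2's complement word of z by adding 1 modulo 2."""
    return (q + 1) % 2

for k in range(-8, 8):
    z = Fraction(k, 8)
    q = pseudo_quotient(z)
    assert 0 <= q < 2                         # q is itself a valid machine word
    assert true_quotient_word(q) == z % 2     # adding 1 restores the true word

print(pseudo_quotient(Fraction(-3, 8)), true_quotient_word(pseudo_quotient(Fraction(-3, 8))))
```

Since a machine word lies between 0 and 2, adding 1 modulo 2 changes only the $2^0$ column, so in this model the correction is equivalent to inverting the most significant bit of the pseudo-quotient.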

In the shift and add accumulation 2-50, the recurrence relationship as described by Chu [6] and others is clearly discernible. The value

a.

of

1

accumulation at i re to that at stage i-I:

(2-54)

'" _O.

1 the machine word sentation

r.

1 f

remainder at stage if in formulation Chu. As discus by Chu, and

stage i quotient (Le (0 _{1)0) and}

as

are

ibed q. _{1 -}l' is 1

same

(0. 1) and

:1- 0 are different ..

It should be

necessary if the division is taking situation where

[x]

and

_[YJ

~

'"

[x]

~ and :( [yJ

YM

Appendix C, at the s of q 1 is 0 if

k is place a dynamic

that


In a static situation, where one desires simply to divide $[x]$ by $[y]$, the scaling $k$ is set to unity.

Guild

[9J

s a non~restoring

on the s add ion

scussed in

In Append P Vo at for

agrams illus

trating closed loop the ive

sion algorithm. It is found that for 'closed loop' algorithms such as division and square root the P.V. formulation is simplifying.

2.5 NON-RESTORING SQUARE ROOT

Problem equation: $z = \sqrt{x}$, i.e. $x - z^2 = 0$

Normalised problem equation: $[x] - [z]^2 = 0$ with $z_M = \sqrt{x_M}$

Machine word equation:

$$(\tilde{x} - 2x_0) - (\tilde{z} - 2z_0)^2 = 0 \tag{2-56}$$

As previously in the case of division, introduce the pseudo-quotient word (Appendix D.2):

$$\tilde{q} = \tilde{z} - 2z_0 + 1$$

Substituting into 2-56:

o

Writing 2~57 in s summation fo:rrn:

{

1

No-ting the simi

0, (

liM

(2-56 ) introduce the

57)

"" 0

problem structure,

re 57 and 2-58 to replace unknown 2) corres

to the known d sor

'y

in divi v by a


2

-q. L ( ~i)

~ ) qi

Using the identity

~i

qi 2 (q 2 i + 1) (z\ppend Do L 1) then s~

ig 0 (

1 1 i}

1 := ₀ 60)

q. ~ and not

i

)}

+ [~

2x]

0

~

=

0

x

== 2 (2 61)

the terms positively connected

words and

x+

n

i=o

where = using the

(Appendix

thei complements

2~i{q +

gop.

+

[-1 1

FoJ

+

i

g 1

i"" gi-l D 1. 2) I then

q.p.

+ q

i{

-1 1

1 +

+

[-i

g.

~

o

2x ]

=

0 o

(2 62)

Removing the term gi n outside the summation because its feet is small:

x

+ n . 2

i{-

g.p.

+

1=0 1 1

The term 2i

p.

1

now partit

-n",

~2 g n

~1

2 g.

1

(1/4,3/4) in order to the terms which will enter into numerical

from e ch will be

x

+

+ g~.'p. ~ (1/4) 1

1 1- 1

.~

+ i

f

i}

eo x

₁₉

_i