OF
PARALLEL DIGITAL CIRCUITS
AND COMPUTATION OF THE DYNOSTAT
by
R.W. GIBBARD B.E. (Hons.)
A thesis for the degree of
Doctor of Philosophy in Electrical Engineering, University of Canterbury,
Christchurch, New Zealand.
ABSTRACT
A synthesis of a 'machine word' mathematical formulation for binary two's complement arithmetic provides a common basis for understanding existing algorithms and circuits for the implementation of common arithmetic operations in digital simulation. A systematic application of the machine word yields unified algorithm formulations and circuit schematics, including contributions to new knowledge, particularly in parallel array multiplication. From comparisons with existing formulations and circuits, the relative merits of the machine word approach are elucidated.
The development of a prototype parallel digital (P.D.) optimiser, incorporating results from the circuits study, provides special purpose equipment which is superior in speed to existing resistor-capacitor shaped (RC) analogue computers. Comparative performance data is provided by simple Linear Programming examples, solved in earlier research using RC-analogue equipment. Also solved is the static section of an illustrative example of optimal resource allocation using a Dynostat approach.
An extended performance assessment to dynamic optimisation, in which the P.D. machine supplies solutions of the algebraic (static) section of the problem to a serial digital (S.D.) computer carrying out dynamic programming calculations (the dynamic section of Dynostat), demonstrates the efficiency of an all digital solution of a problem solved previously by a hybrid RC-analogue/S.D. computation.
The satisfactory results suggest desirability of such
problems and on-line situations in which disadvantages of inaccuracy and maintainability of RC-analogue machines are important restrictions.
Zero-zone functional
ACKNOWLEDGEMENTS
I would like to thank my supervisor, Mr J.A. Gibson, for his guidance and encouragement throughout the course of this research.
Also, I am grateful for the assistance afforded me by the many discussions with the staff and postgraduates of the Electrical Engineering Department; in particular Professor J.K. Bargh, Mr W.K. Kennedy, and Dr T.W. Marks.
I thank Messrs N. Gray, C. Rowe, and M. Cusdin for their help in some of the hardware aspects of the project.
My thanks also to Mrs J. Bleakley for her typing of this thesis, and to Mr L. Hill for the photographic work involved in the reproduction of the diagrams.
Appreciation is expressed to the University Grants Committee of New Zealand for financial assistance through the award of a Postgraduate Scholarship.
The University Grants Committee, The New Zealand Electricity Department, and the Scientific Research Distribution Committee (State Services Commission) are also thanked for their assistance in providing computer facilities in the Electrical Engineering Department, used extensively in this research.
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION TO STUDY
1.1 Introduction
1.2 Thesis Organisation
References
CHAPTER 2: MACHINE WORD SYNTHESIS OF ALGORITHMS IN TWO'S COMPLEMENT FORM
2.1 Introduction
2.2 Notation and Terminology
2.3 Synthesis of Multiplication Algorithms
2.3.1 A Basic Algorithm with Explicit Corrections
2.3.2 M Algorithm with Complemented Multiplier/Multiplicand Corrections
2.3.3 P Algorithm with Complemented Partial Product Word Corrections
2.3.4 Braun Algorithm with Mixed Explicit-Implicit Corrections
2.3.5 Booth Algorithm with Implicit Corrections
2.3.6 De Mori and Serra Algorithm with One's Complement Imbedding
2.4 Non-Restoring Division
2.5 Non-Restoring Square Root
2.6 General Linear Form
2.7 Examples of Non-Linear Function Generation
2.7.1 Modulus Function
2.7.2 Continuous Polynomial Function
2.7.3 Zero-zone Algebraic Restraint Functions
2.8 Integration
2.9 Examples of Problem Simulation
2.9.2 ... with Non-Line...
2.10 Conclusions
References
CHAPTER 3: ASSESSMENT AND SYNTHESIS OF ...VE CIRCUIT DESIGNS USING THE MACHINE WORD FORMULATION
3.1 Introduction
3.2 Hardware Implementation of Multiplication Algorithms
3.2.1 Implementation of the Basic Algorithm with Explicit Corrections
3.2.2 Implementation of the M Algorithm with Complemented Multiplier/Multiplicand Corrections
3.2.3 Implementation of the P Algorithm with Complemented Partial Product Word Corrections
3.2.4 Implementation of the Braun Algorithm with Mixed Explicit-Implicit Corrections
3.2.5 Implementation of the Booth Algorithm with Implicit Corrections
3.2.6 Implementation of the De Mori and Serra Algorithm with One's Complement Imbedding
3.2.7 Comparative Discussion of Multiplier Implementations
3.3 Implementation of Non-Restoring Division
3.4 Implementation of Non-Restoring Square Root
3.5 Conclusions
References
Bibliography
CHAPTER 4: SPECIFICATION AND DESIGN OF THE PROTOTYPE PARALLEL DIGITAL MACHINE
4.2 General Hardware Cons s the 4 2 Machine
4,2. 1 The Choice of Word 4 2
4.2.2 The Choice of 4-4
4. 3 The Modu 4 5
4.4 on 4'~7
4 4.1 The of 4 8
4.4. LIThe 2-Input Add~Subtract-Modulus unit 4 9
4 4.1 2 The 4 Array Multip ers
4.4 1 3 Zero-Zone Restraint units 4-14
4.4.1 4 The Buf Units
4
254 4.2 The Operation of the Synchronous Modules 4~25
of the Machine
4.4 2.1 Mode and Clocking Control for the 4~25
4 4.2.2 Generation of the Signals (ClOO' C10l)
Control
4 4.2.3 The Master Arithmetic Fault Unit
4.4.2.4 The Integrator Clock 4.4.2 5 The Integrator
4.5 Conclusions
References Bibliography
CHAPTER 5: THE PARALLEL DIGITAL IMPLEMENTATION OF A STEEPEST ASCENT METHOD OF RESTRAINED
STATIC OPTIMISATION 5.1
5.2 Unrestrained by
st Ascent
5. Continuous S 'c Ascent on
Augmentation
4-28
4-31
4-39 4 2 46 4-47
5 1 5-1
5.4 5.5 506. 5.7 5.7.2 5.7.3 5.7.4 5.745
Simpl ication of the Continuous S st
Ascent Method with Func Augmentation
the Case where the Index
and the Restraints are L Improved Formulation Using D Cosines
Mathematical Model of the Optimiser
Assessment of the Per
Paral Optimiser
tion
the Illus·trative Example Examination of the 'I'raj ect:or s the
Paral 1 Dig 1 ser for the
I
Sources Error in the D
Implementation of Optimiser
Comparison between the Parallel Digital
and RC-Analogue Optimisers terms of
Climbing Speed
The Effect of Varying Restraint St fness on the Performance of the Zero-Zone
Controller.s
5 7.6 tivity of Solution Errors to Answer
Point Topography
5.7.6.1 Sensitivity of Solution Errors to the Performance Index
5.7.6.2 Sensitivity of Solution Errors to Restraint Configuration
5.7.7 Auxiliary Algorithms to Ensure Accuracy of Solution
5.7.8
5.8
CHAPTER 6
6.1 " 2
Summary of the Performance Digital Optimiser
Conclusions References
t.he·
INTERFACING OF PARALLEL AND SERIAL COMPUTING MACHINES Introduction
Features of the EAI Structure of the I System
Ba Data Bus
9
5-13
5-21
5-23
1 24
6.3 The Set of EAI 640 Instructions Chosen for 6-4 Use with the Data Inter
6.4 6.4.1 6.4.2 .6.4.3 6.5.3 6.6
CHAPTER 7
7.1 7.2
7.3 7 4
7.5
7.6
7.6.2
The Serial Digi to Parallel Link
Bas Construe of the S.D. to P D, Link The Operation the S.D. to P.D. Link
Sununary 'the of the S.D. to
P.O. Link
The Parallel Dig
Basic Construction of P.O. to S.D. Link, The Operation the P.o. to S.D. Link Sununary
S.D. Link Conclusions References
SERIAL~PARALLEL Dr
of the P.O. to
OF THE DYNOSTAT ALGORITHM Introduction
Formulation the Dynostat Class of
Problem
Variables and Dimensionality Problem
Recapitulation of the Principle of Operation of the Dynostat Algorithm Previous Implementations of Dynostat
Hybrid Ser Digital ~ Parallel Digital
Implementation the Dynostat Algorithm
Formulation of an Illustrative Example
The Parallel 0 P for the
Illu Dynostat
7.6.3 Digital
ive Dynostat
7.6.3.1 Discus on the Different Versions of
the Dynostat Ser 1 Dig Programs
Prepared
Auxil Algor to Ensure Ac
Solution
6 5
6 8 6 11
6-18
6-18 6 20
7-1 7-1
7-3
7 7 7 8
7 13
7.6.4
7.6.6
7.7
CHAPTER 8
8.1 8.2
Coordination of Serial Digital and Digital Operations through Interface Operations
The Performance of Hybrid Parallel Digital Optimiser Comparative Assessment of
Implementat Conclus Refer'ences
Digital~
tat
CONCLUSIONS FROM THE RESEARCH IN THIS THESIS
Contributions the Thes
Future Developments on the Research in this Thesis
References
APPENDIX A: RIGHT AND LEFT SHIFTS OF VARIABLES
A.1 Shifts of Normalised Problem Variables
A.2 Shifts of Machine Words
B: ADDENDA RELATING TO
--...;,.;,~=---..;...
B.1
B 2
MULTIPLICATION
Worked Example the Braun Single Shift and Add Multiplication Formulation
Alternative Braun Single Formulation
ft and Add
AND REMAINDER MINIMISATION STRATEGY FOR
NON-RESTORING DIVISION
APPENDIX D: NON-RESTORING
ROOT
FORMULATIONS D.l .Addenda H.e to theRes
7 23
7 32
8~1
8 3
8-5
A-I
B-1
B-2
D, L I D L2
D. L 3
D.2
Introduction of the Stage Quotient Word (ch-l) D-1
Simpli ation of a Sum of We D~2
Stage Quotients
Stage by Stage Development of the 'Bit' D-3 Form of the Non~Restoring Square Root
Iteration Contribution Word (wi)
Pseudo-Quotient and Minimisa·t~
ion for Non~Res
Root
APPENDIX E: TIME SCALING OF PROBLEMS
APPENDIX F: CIRCUIT DETAILS AND
PRINTED CIRCUIT BOARD FOR
PARALLEL DIGITAL
F.1 The A-S-M Unit
F.2 The P Multiplier
F.3 The Zero-Zone Restraint Units
F.4 The Master Arithmetic Fault Unit
F.5 The Integrator Clock
F.6 The Integrators
F.7 The Serial Digital to Parallel Digital Link
F.8 The Parallel Digital to Serial Link
APPENDIX G: SENSITIVITY OF SOLUTION ERRORS TO
ANSWER POINT TOPOGRAPHY A SOFT
RESTRAINT AUGMENTATION
G.I Errors to
Solution Errors to
i
APPENDIX I EXTENS OF 'l'HE Z FORMULATION
TO AND
SYNTHESIS OF A ZERO~Z HAMILTONIAN
CONDITION
1.1 Introduction
1.2 of
Formulation
ero~Zone Restraint
1.3
1.4
1,5
Appl in a Dynostat
of Optimal Control Synthesis of a Control Cond ion at Extrema Conclusions
Necessary
I-3
CHAPTER 1
1.1 INTRODUCTION
Advances in computer technology in recent years have been accompanied by improvements in algorithm design, in order to facilitate the best use of available computing equipment. Nevertheless, in the application of computers to the optimisation of multivariable systems, there still remains a 'dimensionality barrier' which arises essentially from the excessive computation times involved. A principal cause of the computation time barrier is the deficiency of parallelism in both solution algorithms and computer hardware for implementation.
Optimisation problems fall into two categories, namely, static and dynamic. In static optimisation the system being optimised is described by a set of equality and inequality restraints, expressed in terms of the system state and control variables. The optimisation problem is to determine the operating point of the system which maximises some numerical performance index such as profitability, subject to the requirement that the system be in steady state. The most obvious approach to the optimisation of such a system is termed the grid search. The space of the control variables is subdivided by means of a uniformly spaced grid, and the function evaluated at each grid point. The point at which the function is a maximum is then determined by direct comparison. This approach, which is inefficient, has for most applications been superseded by 'hill climbing' techniques which explore only a limited region of the control space. The general approach is to steer a search point, which starts at an initial guess of the peak of the profitability function, using information about values of the function and gradient in the vicinity of the search point. The penalty for the increased efficiency of this approach is that it may only find local, and not global, optima.
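The contrast between the two static search strategies can be sketched in a few lines of Python. This is an illustration only, not code from the thesis: the performance index `profit`, its gradient `profit_grad`, the grid limits and the step size `rate` are all hypothetical choices made for the example.

```python
def grid_search(f, lo, hi, steps):
    """Evaluate f at every point of a uniform grid; return (best value, best point)."""
    best_value, best_point = None, None
    for i in range(steps + 1):
        for j in range(steps + 1):
            u = (lo + (hi - lo) * i / steps, lo + (hi - lo) * j / steps)
            value = f(*u)
            if best_value is None or value > best_value:
                best_value, best_point = value, u
    return best_value, best_point


def hill_climb(f, grad, start, rate=0.1, iterations=200):
    """Steer a single search point uphill using only local gradient information."""
    u = tuple(start)
    for _ in range(iterations):
        g = grad(*u)
        u = tuple(ui + rate * gi for ui, gi in zip(u, g))
    return f(*u), u


def profit(u1, u2):
    # Hypothetical single-peaked performance index with its maximum at (1.0, -0.5).
    return 3.0 - (u1 - 1.0) ** 2 - (u2 + 0.5) ** 2


def profit_grad(u1, u2):
    # Analytic gradient of the hypothetical index above.
    return (-2.0 * (u1 - 1.0), -2.0 * (u2 + 0.5))
```

With a 41 x 41 grid the exhaustive search needs 1681 function evaluations for two control variables, and the count grows exponentially with the number of variables, whereas the hill climber follows one trajectory. On an index with several peaks the hill climber would stop at whichever peak is uphill from its starting guess.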
In dynamic optimisation the system changes state during the optimisation period, the problem being to determine the control policy which maximises a performance index which is usually the integral of the instantaneous profitability over the period of optimisation. Although from a mathematical standpoint the control policy is a continuous function of time, in practice, with both manual and digital computer control, the control variables are commonly adjusted only at discrete instants of time. If the period of optimisation is divided into N intervals or stages, an N-stage decision process is involved which is suitable for implementation utilising Dynamic Programming techniques. In Dynamic Programming each of the (say) n state variables is quantised into (say) R levels. Thus a grid in state-time space is formed. Using the Principle of Optimality, the least cost of bringing the system from the starting point to all allowable points in the grid structure is determined, interval by interval, finally yielding the overall optimum trajectory on arrival at the end point. In employing the Principle of Optimality it is necessary to determine the instantaneous costs of state changes within a stage. These costs can be determined by a variety of means, such as a grid search in control space. Alternatively if the number of
control variables is a cons
sat Dynostat
util Gibson and
i
nature of ·the the high hill climbing
i s
Becaused thmG
cap 1
thm using ities. Gibson and Marks
[2]
1 3
is
tat ithm
vers
me
al
Dynostat s of a gradient
ithm
llel computing faci utilised the parallel
not
computation lities an Re·-analogue computer to
implemen·t a gradient search optimiser. A sign
f over ier implementation was
However, RC-analogue machines suffer from the well known problems of inaccuracy owing to operational amplifier drift, and limited reliability and maintainability in a severe operating environment. The interfacing of RC-analogue and serial digital computers
is also an ive
In order to overcome t.hese
retain the 1
grad search f the lat.est
ithm and the study i
1 1 digital
ion a s manneto
Because
with 1 d
lems, and yet still
t the
lopment of the
subject this
computer by a
i s to
~.4
be t acting, neces tating the use parallel~'acting
rather than seri acting a all arithmet
functions.
In the course of the development of the parallel digital machine it was necessary to make a detailed study of the digital circuits currently available for forming the digital equivalents of the common arithmetic operations usually provided on RC-analogue computers. The process of becoming familiar with the state of the art led to a 'machine word' formulation for binary two's complement arithmetic. In this formulation all bits in the binary word take on the normal bit values of (0,1). It is considered that the machine word formulation has merit in its systematic unification of the two's complement formulations of common arithmetic operations. Use of the machine word formulation has led to the development of improved two's complement multiplication algorithms for parallel implementation. This work has been reported on in a paper by Gibson and Gibbard [3].
The results of this supporting study are included in this thesis in a systematic and detailed machine word
1.2 THESIS ORGANISATION
.into
1)
The seven
principal se
I
~5
s of th thesis can grouped s:
2 3 sent the re Its of using machine word
ithms and ci
common
2) Des and Assessment
Chapters 4 and 5 des ign of a high speed parallel some of the algorithms and c ters 2 and 3. The performance of
twois camp implementat s
specification and gital optimiser, us
scribed in Chap~
s machine then 'assessed using a c
example.
sic Linear Programming illus
3) Interf of the with a
Serial
Chapter 6 scribes 'the if ion and design
of a 1 1:0 linking of the
leI digital op r with a d
tal computer.
4) al~Parallel 1
Ch r 7 de cx: the the
1 ~ 6
which the algorithms, ci , and the over 1
para-llel dig brought toge
leI digi Results
al
5)
signed ear
and tested a hybrid
version of the
the solut f an
are presented.
fr'om the
8 sUIIlInarises
al dig thmo
are
resource
to new
knowledge and technique made in sections 1 to 4 above, and outlines intended future developments.
A number signif but contributions to
the overall study have been placed in the Append s, to avoid inter renee with the rna stream development. of
the thesis. re IS attention drawn particularly
to the zero-zone Dynostat formulation using a varying
res Also,
pair ation to derive the Maximum Principle.
REFERENCES
(1) Gibson, J.A. and Coombes, G.E. 'A parallel optimum seeking technique - Dynostat', I.E.E.E. Trans. Sys. Sci. Cyb., vol. SSC-6, no. 3, pp. 197-208, July 1970.
(2) Gibson, J.A. and Marks T.W. IFast hybrid
( 3)
imp Trans.
of the tat , I.E.E.E.
t voL 21, no.S pp. 87 880; Aug. 1972.
Gibson, J.A. and sis and com"~
MACHINE WORD SYNTHESIS OF
ALGORITHMS IN TWO'S COMPLEMENT FORM
2.1 INTRODUCTION
In this chapter the common operations of digital arithmetic are synthesised in a machine word form. In the literature the two's complement equivalent of a problem variable is usually written with a sign bit which is negatively valued. This form is numerically an exact representation of the variable but, of course, having a bit which is negatively valued is not convenient in circuit work, for it is preferable that all bits have the same values of (0,1). Therefore a binary machine word form termed the "machine word" is defined as in Section 2.2. It is used in all cases considered in this chapter. However, cross references to alternative solutions expressed in terms of problem variables are placed in the Appendices, in cases where they seem to offer an advantage or useful complement to the machine word formulation.
An important contribution of this chapter to new knowledge and technique of two's complement algorithm synthesis is the consistency and uniformity of the machine word formulation. Algorithms for all the important arithmetic operations are formulated in machine words. In some cases this has led to new algorithms:
2 2
2) of leI mul lication
ithms by Braun, and De Mori and Serra. 3) Improved formulation
ithms by Braun Booth, and ial vers of "che
M
and P 4)multipl luding ithms ¢
ion
s to ser~
divis and
terms of c error
(ThE:~ lem var
ons in the ces are be to be new in
.
)unit
All algorithms are presented using normalised machine variables, as would be required in computer simulation using parallel acting function modules. As well as multiplication, division and square rooting, a general linear form and important examples of non-linear functions are presented. Only rectangular integration is considered. The chapter is concluded by describing simple examples of problem simulation.
2.2 NOTATION AND TERMINOLOGY
Notation and terminology employed in the machine word formulation are as follows:
Problem Variables: x, y, z in a problem specified format.
Normalised Problem Variables: [x], [y], [z], where
$[x] = x / x_M$
and $x_M$, $y_M$, $z_M$ are the maximum values of $|x|$, $|y|$, $|z|$.
Machine Words: the normalised values enter into the following (n+1) bit machine word
$\tilde{x} = x_0 2^{0} + x_1 2^{-1} + \dots + x_n 2^{-n}$
Machine Constants ~.S
Word s
where x.
1. l~x. 1. is a bit complement
and (; .. is 1.J
;::v r:;:; Similarly y,zo
Kronecker Delta
Ident. between PoV and M.Vo
~
! s
~i + (;. ) 2
1.n
'"
x
[x]
+
(2c
-/r
xJ/
)xo 2i s complement form.
[x]
~:::::
Xo
]
+ 2l[x]1
+ 2x o--
xIe:: smallest bit discrimination
Be·tween Prob
It will be noted from Fig 2 1 mach
word s s are numerical i and in the range
<0,2),
and together with their complements make up the distinguishing diamond characte st theXu
[x]
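The relationship between a normalised problem variable and its machine word can be sketched in Python as follows. This is a minimal illustration, assuming the identity $[x] = \tilde{x} - 2x_0$ (quoted as 2-5 in Section 2.3.1) and the word complement $\sum_i (\bar{x}_i + \delta_{in}) 2^{-i}$ (referred to as 2-3 in the rules of Section 2.3.2). The word length `N` and all function names are hypothetical, and a word is held as the integer $\tilde{x} \cdot 2^{n}$.

```python
from fractions import Fraction

N = 4  # hypothetical number of fractional bits n; words have n + 1 bits


def to_word(v, n=N):
    """Normalised value in [-1, 1 - 2^-n] -> integer holding the machine word times 2^n."""
    scaled = Fraction(v) * 2 ** n
    assert scaled.denominator == 1 and -2 ** n <= scaled < 2 ** n
    return int(scaled) % 2 ** (n + 1)


def bits(w, n=N):
    """Bits x_0 (weight 2^0) to x_n (weight 2^-n); every bit is valued 0 or 1."""
    return [(w >> (n - i)) & 1 for i in range(n + 1)]


def word_value(w, n=N):
    """Numerical value of the machine word, always in the range <0, 2)."""
    return Fraction(w, 2 ** n)


def from_word(w, n=N):
    """Recover the normalised problem variable: [x] = word - 2 * x_0."""
    x0 = w >> n
    return word_value(w, n) - 2 * x0


def complement(w, n=N):
    """Word complement: complement every bit, add one unit in the 2^-n place, drop overflow."""
    return ((w ^ (2 ** (n + 1) - 1)) + 1) % 2 ** (n + 1)
```

For example, with n = 4 the value -3/8 is held as the bit pattern 1.1010, whose numerical value is 1.625, and its complement 0.0110 represents +3/8.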
2.3 SYNTHESIS OF MULTIPLICATION ALGORITHMS
In this section two new multiplication algorithms, designated M and P [1], are developed using the machine word formulation. Improved formulations of several existing multiplication algorithms are also produced. All algorithms are developed from a basic two's complement multiplication relationship, which is derived in Section 2.3.1.
2.3.1 A BASIC ALGORITHM WITH EXPLICIT CORRECTIONS
Normalised problem equation:
$[z] = [x][y]$ (2-6)
Transforming to 2's complement binary format using 2-5:
$\tilde{z} - 2z_0 = (\tilde{x} - 2x_0)(\tilde{y} - 2y_0)$ (2-7)
Expanding 2-7 and complementing negative signed machine words:
$\tilde{z} = \tilde{x}\tilde{y} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}}) + [\ ]$ (2-8)
where $z_0$ is the sign bit of the product word and [ ] is a group of sign bit terms.
The contributions to the product are conveniently identified as
(i) $(\tilde{x}\tilde{y})$, a core multiplication of the multiplier and multiplicand.
(ii) $2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$, explicit correction words.
(iii) bits of significance $2^{+1}$, $2^{+2}$, in the word aggregate [ ].
However, since bits of significance $> 2^{0}$ overflow and are neglected, it is a necessary compensation to neglect also the sign bit group [ ]. This is equivalent to saying that [ ] is omitted in modulo two addition,
i.e. $\tilde{z} = \tilde{x}\tilde{y} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$ (2-9)
Fig. 2-2 Basic Multiplication (Explicit Corrections)
The significant contributions (i) and (ii) are incorporated in a schematic Fig. 2-2. The formulations 2-8, 2-9 are machine word equations for 2's complement multiplication
2 ~ 6
able Paral
9 cou be to form the
using a 1 1 However, the st:ruc~
ture ·the schemat is
f of InU swill ly
be corrections (ii)
into on (i) i.e. employing r
than s for quadrant operation
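Relationship 2-9 can be checked numerically with a short Python sketch. It assumes the reading of 2-9 as core product plus twice the two correction words, taken in modulo 2 addition, with the word complement equal to 2 minus the word. Exact fractions are used so that no product bits are truncated; the helper names are hypothetical.

```python
from fractions import Fraction


def word(v):
    """Machine word of a normalised value: [x] + 2 * x_0, held exactly."""
    return Fraction(v) % 2


def sign_bit(w):
    """Most significant bit x_0 (weight 2^0) of a machine word in <0, 2)."""
    return 1 if w >= 1 else 0


def complement(w):
    """Word complement in modulo 2 addition."""
    return (2 - w) % 2


def value(w):
    """Normalised problem variable recovered from a machine word."""
    return w - 2 * sign_bit(w)


def multiply_explicit(xw, yw):
    """Core multiplication plus the explicit correction words, modulo 2."""
    core = xw * yw
    corrections = 2 * (sign_bit(xw) * complement(yw) + sign_bit(yw) * complement(xw))
    return (core + corrections) % 2
```

For [x] = -3/8 and [y] = 5/8 the core product is 65/64, the correction is 176/64, and the sum reduced modulo 2 is 113/64, which decodes to -15/64. The single case [x] = [y] = -1 is excluded from any check because its product +1 lies outside the normalised range.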
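Serial Version of the Algorithm:
To identify a structure of 2-9 suitable for serial implementation, 2-9 is decomposed into successive computational stages i = 0 to n:
$\tilde{z} = \sum_{i=0}^{n} 2^{-i}(y_i\tilde{x}) + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$
or
$\tilde{z} = \sum_{i=0}^{n} y_i(2^{-i}\tilde{x}) + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$ (2-10)
Equation 2-10 indicates a core numerical procedure involving n + 1 successive stages of addition ($y_i = 1$) of the shifted multiplicand or zeros ($y_i = 0$). Equation 2-10 may be rewritten as:
$\tilde{z} = y_0\tilde{x} + 2^{-1}y_1\tilde{x} + \dots + 2^{-(n-1)}y_{n-1}\tilde{x} + 2^{-n}y_n\tilde{x} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$
$= y_0\tilde{x} + 2^{-1}\{y_1\tilde{x} + \dots + 2^{-1}\{y_{n-1}\tilde{x} + 2^{-1}\{y_n\tilde{x}\}\}\dots\} + 2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$ (2-11)
Equation 2-11 indicates a core numerical procedure involving n + 1 successive stages of sensing the multiplier bits, starting with the least significant bit and working to the most significant, each stage adding the multiplicand to the accumulation ($y_i = 1$) followed by shifting the new accumulation one place to the right.
In either case implementation using 2-10 or 2-11 would involve a correction cycle of adding $2(x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}})$. Circuit implementations of 2-10 and 2-11 are described in Section 3.2.1, from which it is seen that the circuit based on 2-11 is better in terms of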
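The nested single-shift form can be sketched in Python as below. This is an illustrative reading of 2-11, not the thesis circuit: the accumulation is halved and the multiplicand word is added whenever the sensed multiplier bit is 1, working from the least significant bit, and the correction cycle is applied last. Exact fractions stand in for a double length accumulator, and the function names are hypothetical.

```python
from fractions import Fraction


def word(v):
    """Machine word of a normalised value, held exactly."""
    return Fraction(v) % 2


def sign_bit(w):
    return 1 if w >= 1 else 0


def complement(w):
    return (2 - w) % 2


def value(w):
    return w - 2 * sign_bit(w)


def bits(w, n):
    """Bits y_0 .. y_n of a machine word with n fractional bits."""
    scaled = int(w * 2 ** n)
    return [(scaled >> (n - i)) & 1 for i in range(n + 1)]


def serial_multiply(xw, yw, n):
    """n + 1 shift-and-add stages followed by one explicit correction cycle."""
    y = bits(yw, n)
    acc = Fraction(0)
    for i in range(n, -1, -1):        # least significant multiplier bit first
        acc = acc / 2 + y[i] * xw     # shift accumulation right, then add if y_i = 1
    acc += 2 * (sign_bit(xw) * complement(yw) + sign_bit(yw) * complement(xw))
    return acc % 2                    # bits of significance above 2^0 are neglected
```

Because each stage only shifts by one place, the same adder hardware can be reused for every stage, which is the attraction of the nested form over the i-place shifts of 2-10.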
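2.3.2 M ALGORITHM WITH COMPLEMENTED MULTIPLIER/MULTIPLICAND CORRECTIONS
From 2-7:
$\tilde{z} = \{(\tilde{x}-x_0) - x_0\}\{(\tilde{y}-y_0) - y_0\} + 2z_0$
$= (\tilde{x}-x_0)(\tilde{y}-y_0) - x_0(\tilde{y}-y_0) - y_0(\tilde{x}-x_0) + x_0y_0 + 2z_0$
$= (\tilde{x}\tilde{y})_{RDO} + \{x_0\bar{\tilde{y}} + y_0\bar{\tilde{x}} + x_0y_0\} + [2x_0y_0 - 2x_0 - 2y_0 + 2z_0]$ (2-12)
using $-x_0(\tilde{y}-y_0) = x_0\bar{\tilde{y}} + x_0y_0 - 2x_0$ and $-y_0(\tilde{x}-x_0) = y_0\bar{\tilde{x}} + x_0y_0 - 2y_0$,
where RDO indicates a core multiplication with the first row word $R_1 = y_0\tilde{x}$ and the first diagonal word $D_1 = x_0\tilde{y}$ omitted,
i.e. $\tilde{z} = (\tilde{x}\tilde{y})_{RDO} + (y_0\bar{\tilde{x}} + x_0\bar{\tilde{y}} + x_0y_0)$ in modulo 2 addition (2-13)
$= (\tilde{x}\tilde{y})_{RDM}$, say.
Parallel Version of the Algorithm:
Equation 2-13 can be used directly to form the product with a parallel array, using the following rules: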
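(i) substitute the word complement $\bar{\tilde{x}}$ for the multiplicand word $\tilde{x}$ in row 1, i.e. from 2-3 write $\bar{x}_i + \delta_{in}$ for $x_i$, for all i = 0 to n.
(ii) substitute $\bar{\tilde{y}}$ for $\tilde{y}$ in diagonal 1, i.e. from 2-3 write $\bar{y}_i + \delta_{in}$ for $y_i$, for all i = 0 to n.
(iii) collect and simplify contributions from 2-13 to the common corner element, say $(CRD1)^M$:
$(CRD1)^M = x_0\bar{y}_0 + y_0\bar{x}_0 + x_0y_0 = 1 - \bar{x}_0\bar{y}_0$
and, adding the overflow term $[2\bar{y}_0\bar{x}_0]$,
$(CRD1)^M = 1 + \bar{x}_0\bar{y}_0$ using modulo 2 addition
= {corner element $\bar{x}_0\bar{y}_0$ anticipated from rules (i) and (ii)} + 1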
From the schematic Fig. 2-3 it is clear that the M implicit correction algorithm is obtained simply by substituting the complements of the multiplicand and multiplier words in (R1) and (D1) and adding a compensatory 1 to the common corner element.
A parallel implementation of this algorithm, in which all negative number corrections are implicit in the multiplication process, is described in Section 3.2.2.
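The three rules can be tried out on a small array in Python. The sketch below is an assumption-laden illustration rather than the thesis circuit: it sums the weighted elements of the multiplication parallelogram exactly, with row 1 and diagonal 1 replaced by complemented words (including the extra unit in the $2^{-n}$ place) and the corner element set to $1 + \bar{x}_0\bar{y}_0$. The function names are hypothetical.

```python
from fractions import Fraction


def to_bits(v, n):
    """Bits x_0 .. x_n of the machine word of a normalised value."""
    scaled = int(Fraction(v) * 2 ** n) % 2 ** (n + 1)
    return [(scaled >> (n - i)) & 1 for i in range(n + 1)]


def decode(total):
    """Reduce modulo 2 and recover the normalised product."""
    w = total % 2
    return w - 2 if w >= 1 else w


def m_algorithm(xb, yb):
    """Sum the array elements of the M algorithm; xb, yb are bit lists x_0..x_n, y_0..y_n."""
    n = len(xb) - 1
    total = Fraction(0)
    for i in range(1, n + 1):            # core elements with row 1 and diagonal 1 removed
        for j in range(1, n + 1):
            total += Fraction(yb[i] * xb[j], 2 ** (i + j))
    for j in range(1, n + 1):            # rule (i): row 1 uses the complemented multiplicand
        total += Fraction(yb[0] * ((1 - xb[j]) + (j == n)), 2 ** j)
    for i in range(1, n + 1):            # rule (ii): diagonal 1 uses the complemented multiplier
        total += Fraction(xb[0] * ((1 - yb[i]) + (i == n)), 2 ** i)
    total += 1 + (1 - xb[0]) * (1 - yb[0])   # rule (iii): corner element plus compensatory 1
    return total
```

For [x] = -3/8 and [y] = 5/8 with n = 3 the core contributes 25/64, diagonal 1 contributes 24/64 and the corner contributes 1, giving 113/64, which decodes to -15/64. No separate correction cycle appears anywhere in the sum.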
Serial Version of the Algorithm:
Equation 2-12 may be expressed in serial stage summation form:
i n ~j
}
{
2 io
in)xo1 y. 1 j~l + (y i +
z
= {
+
2 j .. ( +o.
.+" x Y }+
[~2]
=0 ] n a 0
Assoc correction with the
terms, and group inat into a
[Fig. 2-3: (a) Algorithm. (b) Circuit. Full Adder Schematic.]
Zf"
""-{L~=o
2 z+{L~=o
2+{z=n 2
:::::
z=~=o
2+
(y.
1+
i -i
~i
(;
2 - 10
z=j=o
2~jYi
(l-Ojo-6 + 20.o.
)x.}
JO 1.0 J n 2j=o j
(y.
1 + <5. 1n ) 0 . J }n 2 j (
+
O. ) O.y.}
+
r~
2xy ]
j=o In 10 1
L-
0 0z=
r:
2 - j{y.
(8. 8.
+ O.o. )
J=O 1 JO 10 JO 10) O. x· +
(x.
+ O. )o. y.}
+
[~~2x
y
J
JO J J In 10 1 . 0 0 (2 15)
where
8.
=:o.
JO JO
The corrections M are now implicit in an n+1 stage summation algorithm. From 2-15 the operation of the M algorithm in generating sequentially (i = 0 to n) the row words of the multiplication parallelogram specified by 2-13 can be confirmed with reference to Fig. 2-3(a). Another way of expressing equation 2-12 is:
'"
{Ln (x-xo)
i
+
yi}
+
x
y
+
+
O(x,y)z = y. xoyo
1 1 o I 0
n
xo)2 i
Yo(~)}
;::::; O(x,y) (2-16)i=l y. 1
+
+
x y 0+
since yo( :;;;;: Yo (2- ( ) )
In equation 2 16 the elements the first diagonal ,word D = xoY have been effectively removed from the row
summation process, and D is added as an expli corr-ection term. The terms
are table for an n
+
1 s summation process, sincethey all contain the term (x-x
o) or its complement, allowing implementation by adding (or subtracting) an appropriately shifted
hi. This
xo) if the current mUltipl
[image:31.595.86.532.55.727.2]2 - 11
y" I
"
"
"-,,-" n + 1 row "
"
"summatIOn. "
"1 Product Paral
The term { 2-i Yi +
(X~Xo)}
canreadily written as a succession of shifts 1 of a accumulation, as was the case th the Ba EXpl
c thm. Both forms wou be for hardware
implementa
Hardware imp ations of 15 and
2-16 are discus Sect are shown
2 12
as some other a to be d cussed"
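2.3.3 P ALGORITHM WITH COMPLEMENTED PARTIAL PRODUCT WORD CORRECTIONS
From 2-7:
$\tilde{z} = \{(\tilde{x}-x_0) - x_0\}\{(\tilde{y}-y_0) - y_0\} + 2z_0$
$= (\tilde{x}\tilde{y})_{RDO} + \{\overline{y_0\tilde{x}} + \overline{x_0\tilde{y}} + x_0y_0\} + [\ ]$ (2-17)
where $\overline{y_0\tilde{x}}$ and $\overline{x_0\tilde{y}}$ are the complements of the partial product words and [ ] is a group of sign bit terms,
i.e. $\tilde{z} = (\tilde{x}\tilde{y})_{RDO} + \{\overline{y_0\tilde{x}} + \overline{x_0\tilde{y}} + x_0y_0\}$ in modulo two addition (2-18)
$= (\tilde{x}\tilde{y})_{RDP}$, say.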
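Equation 2-18 can be used directly to form the product with a parallel array, using the following rules:
(i) substitute the partial product complement for the partial product word $y_0\tilde{x}$ in row 1, i.e. from 2-3 write $\overline{y_0x_i} + \delta_{in}$ for $y_0x_i$, i = 0 to n.
(ii) substitute $\overline{x_0\tilde{y}}$ for $x_0\tilde{y}$ in diagonal 1, i.e. from 2-3 write $\overline{x_0y_i} + \delta_{in}$ for $x_0y_i$, i = 0 to n.
(iii) collect and simplify the three contributions from 2-18 to the common corner element,
$(CRD1)^P = x_0y_0 + \overline{x_0y_0} + \overline{x_0y_0} = x_0y_0$ using modulo 2 addition,
i.e. the common corner element is unchanged from the core multiplication case. Fig. 2-5 shows the structure of this algorithm and its implementation.
A parallel implementation of this algorithm is described in Section 3.2.3.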
2 - 13
(a)Algorilhm.
2° 2-2n+1 T2n
X
::;:Xo XI Xl )(n-2 )(n-l Xn
If YO V2 YO
YoXo
-..=.., - - ..
VOX1 YoXl-Rl-YoXn,2 VoXn_t ~n+l y 1 Xn'3 V, )(n-2 VI Xn_1 Y,XnYX I YX .... 1 0 I.... 1 1
--=..
V2)( 0 1 . . •. y 2 )(0-4 Y 2Xn-3 V:1 Xn -2 Y 2 )(n-l V2Xn
1.... •• , RDO
'D1 ... · ... ··{XV} ... · '~l
Yn - 2Xo I L Yn - 2_ _ , X1 Yn-z)(z Vn-2X3 Yn-2X4 Yn-2Xn
Yn -1Xo 1'<,-1)(1 Vn-1)(2 Yn - 1X3 Yn-, Xn-1 '{,-IXn
I... _ , YnXo+ 1 I
Z := V= ,-.., ~ Z; 2: l:: E .... ~ E Z L "
"" Zo ZI Z2 Zn-2 Zn-l Zn
(b)Circuit.
Full Adde, 5<hemohc
'~
A II Cin
2 ~ 14
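The P rules can be exercised in the same way as the M rules. The Python sketch below is illustrative only and rests on the reading that each element of row 1 and diagonal 1 is the complement of the partial product bit itself, with one extra unit in the $2^{-n}$ place of each word, and that the corner element stays at $x_0y_0$. The function names are hypothetical.

```python
from fractions import Fraction


def to_bits(v, n):
    """Bits x_0 .. x_n of the machine word of a normalised value."""
    scaled = int(Fraction(v) * 2 ** n) % 2 ** (n + 1)
    return [(scaled >> (n - i)) & 1 for i in range(n + 1)]


def decode(total):
    """Reduce modulo 2 and recover the normalised product."""
    w = total % 2
    return w - 2 if w >= 1 else w


def p_algorithm(xb, yb):
    """Sum the array elements of the P algorithm; xb, yb are bit lists x_0..x_n, y_0..y_n."""
    n = len(xb) - 1
    total = Fraction(0)
    for i in range(1, n + 1):            # core elements with row 1 and diagonal 1 removed
        for j in range(1, n + 1):
            total += Fraction(yb[i] * xb[j], 2 ** (i + j))
    for j in range(1, n + 1):            # rule (i): complemented partial products in row 1
        total += Fraction((1 - yb[0] * xb[j]) + (j == n), 2 ** j)
    for i in range(1, n + 1):            # rule (ii): complemented partial products in diagonal 1
        total += Fraction((1 - xb[0] * yb[i]) + (i == n), 2 ** i)
    total += xb[0] * yb[0]               # rule (iii): corner element unchanged
    return total
```

Unlike the M sketch, row 1 is non-zero even when $y_0 = 0$: it then holds all ones plus the extra unit, which sums to exactly 1 in each of row 1 and diagonal 1 and disappears in modulo 2 addition once both are counted with the corner.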
Serial Version of the Algorithm:
Proceeding in a very similar fashion to the M development, 2-17 yields:
r:
2~j{y.((5. (5)=0 l JO
+
+
O. )8.+
1n )0
The correct tage
+ 8. 6
JO
+ 6. ) 8 In of 2
)x.
)
} (2 19) in generating
terms of the
P
Ie may be confirmed withthe d of Figo 2 5(a) 0
For the ~ algorithm was convenient to partition
the 1 logram by removing the first
diagonal word from the summation, as was done in equation 2 160 'fh ft for the summation process
{
1Cx~xo)
2 i Y i + (x-xo) }
so that if the current value of Yi was
I,
a shifted complement for the Yo case) was added tothe accumulation. S term y (x-x ) accoun·ted for
o
0
f row word its most significant term) f
only the
DI
correction needed to be added explicitly.With the
P
algorithm, the e do not con~tain factor expl tly. For this reason, the
word is difficult to incorporate in a summation process to above. Nevertheless, a convenient serial
implementation the
P
algorithm can obtained bys the algorithm in the following form, the
valid of which is readily irmable from Fig.2-5(a)
r<J
6. + 2 j)
z xy
1n
n + 1
2 - 15
= " n - 1 , - - ~ n 2- j ) + 2-n ~ i=l 2 'Yixo + ~ j=lYiXj
o n - ]
+ 2 (Y x + " . lY x· 2 ) 0 0 L.J= OJ
+ o(x,y) (2-20)
Equation 2-20 can readily be rewritten as a succession of shifts $2^{-1}$ of a stage accumulation, as was the case with the Basic Explicit algorithm. In this form it is particularly suitable for implementation.
Hardware implementations of equations 2-19 and 2-20 are discussed in Section 3.2.3. It will be shown that an implementation based on equation 2-20 is quite efficient, although direct implementation from equation 2-19 is not competitive.
2.3.4 BRAUN ALGORITHM WITH MIXED EXPLICIT-IMPLICIT CORRECTIONS
The correction technique described by Braun [2] comprises the following two distinct strategies:
(i) Negative Multiplier: Substitute the 2's complement of the multiplicand word for the first row word $y_0\tilde{x}$ of the core multiplication.
(ii) In the first diagonal word $x_0\tilde{y}$ of the core multiplication, where an element is unity, fill out the remainder of the row to the left with unit bits. (An explicit correction external to the core multiplication.)
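The strategy for a negative multiplier is the same as the implicit correction in the M algorithm. For this reason the basic 2's complement multiplication relationship 2-8 is now rearranged so as to separate the first row word:
$\tilde{z} = \sum_{i=1}^{n} y_i 2^{-i}\tilde{x} + \{y_0\tilde{x} + 2y_0\bar{\tilde{x}}\} + 2x_0\bar{\tilde{y}} + [\ ]$ (2-21)
$= \{y_0(2-\bar{\tilde{x}}) + 2y_0\bar{\tilde{x}}\} + \sum_{i=1}^{n} y_i 2^{-i}\tilde{x} + 2x_0(2-\tilde{y}) + [\ ]$
$= y_0\bar{\tilde{x}} + \sum_{i=1}^{n} y_i\{2^{-i}\tilde{x} + (2-2^{-i+1})x_0\} + [\ ]$ (2-22)
$= y_0\bar{\tilde{x}} + \sum_{i=1}^{n} y_i\{2^{-i}\tilde{x} + x_0\sum_{r=0}^{i-1}2^{-r}\}$ (2-23)
the term [ ]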
being neglected in modulo 2 addition.
Note that the word $y_i 2^{-i}\tilde{x}$ is simply the machine word $y_i\tilde{x}$ shifted right through i places. As shown in Appendix A, the term $(2-2^{-i+1})x_0y_i = x_0y_i\sum_{r=0}^{i-1}2^{-r}$ indicates filling out the places vacated by the shifted word $y_i\tilde{x}$, as stated by Braun's algorithm. N.B. It is shown also in Appendix A that $2^{-i}\tilde{x} + (2-2^{-i+1})x_0$ is equivalent to the shifted problem variable $2^{-i}[x]$.
The structure of the algorithm as described by 2-22 or 2-23 is depicted in Fig. 2-6(a).
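The fill-out statement can be checked numerically with a brief Python sketch. It assumes the machine word convention $\tilde{x} = [x] + 2x_0$ and verifies that shifting the word right through i places and filling the vacated places with the sign bit gives the machine word of $2^{-i}[x]$. The function names are hypothetical, and exact fractions keep the bits that a fixed length register would shift out.

```python
from fractions import Fraction


def word(v):
    """Machine word of a normalised value, held exactly."""
    return Fraction(v) % 2


def shifted_with_fill(xw, i):
    """Shift the word right through i places and fill the vacated places with x_0."""
    x0 = 1 if xw >= 1 else 0
    fill = x0 * sum(Fraction(1, 2 ** r) for r in range(i))   # x_0 * (2 - 2^(-i+1))
    return xw / 2 ** i + fill
```

For [x] = -3/8 the word 1.101 shifted two places with fill-out becomes 1.11101, which is the machine word of -3/32.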
Parallel Version of the Algorithm:
Equation 2-22 can be used to form the product $\tilde{z}$ using a parallel array. This is described in Section 3.2.4, where it is shown that extra hardware is required to implement the negative multiplier correction term
Serial Version of the Algorithm:
[Fig. 2-6: (a) Algorithm in 3rd Quadrant. (b) Peripheral Adder Chain. Full Adder Schematic.]
$\tilde{z} = \sum_{i=0}^{n} y_i\{\delta_{i0}\bar{\tilde{x}} + \bar{\delta}_{i0}(2^{-i}\tilde{x} + x_0(2-2^{-i+1}))\} + O(x_0,y_0)$
$= \sum_{i=0}^{n} y_i\{\delta_{i0}\bar{\tilde{x}} + \bar{\delta}_{i0}(2^{-i}\tilde{x} + x_0\sum_{r=0}^{i-1}2^{-r})\} + O(x_0,y_0)$ (2-24)
where
(i) the initial stage is recombined with the subsequent stages using the Kronecker delta operator $\delta_{i0}$.
(ii) the term $(2^{-i}\tilde{x} + x_0\sum_{r=0}^{i-1}2^{-r})$ is the multiplicand word $\tilde{x}$ shifted right through "i" places with bits $x_0$ inserted into the resulting vacant places. In Appendix A this process is shown to be equivalent to a shift of the problem variable [x] through "i" places right.
(iii) the negative word correction $y_0\bar{\tilde{x}}$ could be formed explicitly and added in, if "i" is iterated from 1 to n, or could be formed in the i = 0 stage if "i" is iterated from 0 to n.
An alternative method of implementing Braun's algorithm would be to replace the shifts of the multiplicand through "i" places followed by accumulation, with a succession of stages involving adding the term $y_{n-i}\tilde{x}$ followed by shifting the accumulation one place right. The vacant place arising from the shift is filled with the sign bit of the accumulation. The validity of this process is easy to confirm from the following:
$[z] = [x][y]$ where $[y] = \tilde{y} - 2y_0$
$= \{\sum_{i=0}^{n} 2^{-i}y_i\}[x] - 2y_0[x]$
$= \{y_0 + 2^{-1}y_1 + 2^{-2}y_2 + \dots + 2^{-(n-1)}y_{n-1} + 2^{-n}y_n\}[x] - 2y_0[x]$
$= \{2^{-1}\{\dots 2^{-1}\{2^{-1}(2^{-1}y_n + y_{n-1}) + y_{n-2}\} + y_{n-3} \dots\} + y_0\}[x] - 2y_0[x]$ (2-25)
Equation 2-25 expresses the multiplication in successive $2^{-1}$ shifts (of the problem variable [x]) and adds, where the accumulation at stage j is $r_j$.
The stage connection can be seen to be expressed by the recurrence relationship:
$r_j = 2^{-1}r_{j-1} + y_{n-j}[x]$ for j = 1 to n, where $r_0 = y_n[x]$ (2-26)
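Recurrence 2-26 can be followed step by step in Python. The sketch below works in problem variables with exact fractions, assuming $r_0 = y_n[x]$ and $[z] = r_n - 2y_0[x]$ as read from 2-25; the function names are hypothetical.

```python
from fractions import Fraction


def accumulations(x, y_bits):
    """Stage accumulations r_0 .. r_n of the single-shift recurrence."""
    x = Fraction(x)
    n = len(y_bits) - 1
    r = [y_bits[n] * x]                            # r_0 = y_n [x]
    for j in range(1, n + 1):
        r.append(r[-1] / 2 + y_bits[n - j] * x)    # r_j = r_(j-1) / 2 + y_(n-j) [x]
    return r


def product(x, y_bits):
    """[z] = r_n - 2 y_0 [x], the final term correcting for a negative multiplier."""
    return accumulations(x, y_bits)[-1] - 2 * y_bits[0] * Fraction(x)


def value_of(y_bits):
    """Normalised value [y] of a machine word given as bits y_0 .. y_n."""
    word = sum(Fraction(b, 2 ** i) for i, b in enumerate(y_bits))
    return word - 2 * y_bits[0]
```

With [x] = -1 and every multiplier bit set, the accumulations are -1, -3/2, -7/4 and -15/8, so intermediate values can leave the normalised range even though the final product does not.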
By examining the first few terms of $r_j$, it is possible to deduce a general term:
$r_0 = y_n[x]$
$r_1 = 2^{-1}r_0 + y_{n-1}[x] = [x]\{y_{n-1} + 2^{-1}y_n\}$
$r_2 = 2^{-1}r_1 + y_{n-2}[x] = 2^{-1}[x]\{y_{n-1} + 2^{-1}y_n\} + y_{n-2}[x] = [x]\{y_{n-2} + 2^{-1}y_{n-1} + 2^{-2}y_n\}$
Therefore in general:
$r_j = [x]\{y_{n-j} + 2^{-1}y_{n-j+1} + 2^{-2}y_{n-j+2} + \dots + 2^{-(j-1)}y_{n-1} + 2^{-j}y_n\}$ (2-27)
Examination of 2-27 shows that the limits of $r_j$ are:
$-\sum_{k=0}^{j}2^{-k} \le r_j \le (1-2^{-n})\sum_{k=0}^{j}2^{-k}$
i.e. $-(2-2^{-j}) \le r_j \le (1-2^{-n})(2-2^{-j})$
It is therefore apparent that $r_j$ does not necessarily lie within the normalised problem variable range:
$-1 \le r_j \le 1-2^{-n}$
Therefore, in order to convert 2-26 to machine variables, an "extended 2's complement" machine word representation is required.
Figure 2-7 illustrates both the 2's and extended 2's complement machine word representations of a problem variable x.
[Fig. 2-7 Two's and Extended Two's Complement Machine Word Representations]
It is apparent from Fig. 2-7 that if x is within the range of normalised problem variables [x] (as it is in equation 2-27) then the machine word representations are related by:
$\tilde{x}^{*} = \tilde{x} + 2^{1}x_0$
where $\tilde{x}$ is the usual 2's complement representation of [x], and $\tilde{x}^{*}$ is the extended 2's complement representation of [x]. That is, $\tilde{x}$ is converted to $\tilde{x}^{*}$ simply by the insertion of a bit of value $x_0$ in the extra $2^{1}$ position.
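$$\tilde{r}^*_j - 2(\tilde{r}^*_j)_0 = 2^{-1}\big(\tilde{r}^*_{j-1} - 2(\tilde{r}^*_{j-1})_0\big) + y_{n-j}\big(\tilde{x}^* - 2(\tilde{x}^*)_0\big)$$

where $(\tilde{r}^*_j)_0$, $(\tilde{r}^*_{j-1})_0$ and $(\tilde{x}^*)_0$ are used to indicate the values of the sign bits of $\tilde{r}^*_j$, $\tilde{r}^*_{j-1}$ and $\tilde{x}^*$, and so are of values 0 or $2^1$.

i.e.

$$\tilde{r}^*_j = \{2^{-1}\tilde{r}^*_{j-1} + (\tilde{r}^*_{j-1})_0\} + y_{n-j}\tilde{x}^* + \big[2\{-(\tilde{r}^*_{j-1})_0 + (\tilde{r}^*_j)_0 - y_{n-j}(\tilde{x}^*)_0\}\big] \qquad (2\text{-}28)$$

where [ ] contains compensation terms for overflow into the $2^2$ column, and $\tilde{r}^*_0 = y_n\tilde{x}^*$.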
From 2-28 it is apparent that corresponding to each shift of the machine word accumulation $\tilde{r}^*_{j-1}$ by one place right, the vacant $2^1$ bit place arising must be filled out with $(\tilde{r}^*_{j-1})_0$, the sign bit of the accumulation.
It is readily shown from 2-28 that the accumulation at iteration "n" is:
=
xy
+ 2xoy-4xoY
XV
+
2XoY
+
O(x,y) However, the productz
isz
=
xy
+
2xoY
+ 2yo~n
i=o
-i
*
2 (x) oy i~
2 -i (21x )y:.1=0 0 1
Hence the final accumulation needs to be corrected by the addition of $2y_0\bar{\tilde{x}}$. However, since on the final iteration $y_0\tilde{x}^*$ was added to the accumulation, the correction $2y_0\bar{\tilde{x}}$ (equivalent to $-2y_0\tilde{x}$) can be implemented implicitly by adding $y_0\bar{\tilde{x}}^*$ (equivalent to subtracting $y_0\tilde{x}^*$) on the final iteration, rather than adding $y_0\tilde{x}^*$. When this is done the product $\tilde{z}$ is obtained directly from $\tilde{r}^*_n$ (ignoring the most significant $2^1$ bit).
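The procedure described above (shift right with sign fill, add the multiplicand when the sensed multiplier bit is 1, and subtract instead on the final $y_0$ iteration) can be sketched at the machine word level. This is a hedged illustration only: the double-length accumulator, the integer scaling by $2^n$, and the names `sign_fill_shift` and `serial_multiply` are assumptions made here so that no product bits are lost, and are not taken from the circuit descriptions.

```python
def sign_fill_shift(acc, width):
    """Shift a width-bit word one place right, refilling the top place with the sign bit."""
    sign = acc >> (width - 1)
    return (acc >> 1) | (sign << (width - 1))

def serial_multiply(x_int, y_int, n):
    """Multiply [x] = x_int / 2^n by [y] = y_int / 2^n; returns [x][y] scaled by 2^(2n).

    The accumulator has two integer places (2^1 and 2^0) and 2n fractional places,
    i.e. an extended 2's complement word of assumed double length.
    """
    width = 2 * n + 2
    mask = (1 << width) - 1
    x_ext = (x_int << n) & mask              # extended word of the multiplicand
    y_word = y_int & ((1 << (n + 1)) - 1)    # y~: bit j of y_word is y_{n-j}
    acc = 0
    for j in range(n + 1):
        acc = sign_fill_shift(acc, width)    # no effect on the initial zero accumulation
        if (y_word >> j) & 1:
            if j == n:
                acc = (acc - x_ext) & mask   # final y_0 iteration: subtract
            else:
                acc = (acc + x_ext) & mask
    # Interpret the result as a signed value (top place is the sign)
    return acc - (1 << width) if acc >> (width - 1) else acc
```

With $n = 3$, `serial_multiply(-6, -5, 3)` returns 30, i.e. $(-3/4)(-5/8) = 30/64 = 15/32$.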
are not encountered. This formulation, however, yields a circuit implementation which is less convenient.
Lewin [?] describes a variant on the above extended 2's complement procedure. He retains an n + 1 bit binary word (i.e. msb of $2^0$). When a shift of the accumulation is made, the sign digit is used to fill out the vacated $2^0$ bit position. However, when an overflow has occurred in the formation of the accumulation it is retained, and treated as the sign bit, that is, shifted right into the $2^0$ position. This method and that described above are numerically equivalent.
Implementation using either the $2^{-i}$ shift form 2-24 or the $2^{-1}$ shift procedure just described requires dealing with the correction term $y_0\bar{\tilde{x}}$. Since addition of $y_0\bar{\tilde{x}}$ is equivalent to subtraction of $y_0\tilde{x}$, then if the hardware is capable of subtraction as well as addition, no problems are encountered, and the variable to be added (if $y_i = 1$) on each iteration (or subtracted on the final $y_0$ iteration) is always $\tilde{x}$ {shifted a certain number of places, and with row fill-out as necessary}. This contrasts with the serial M and P algorithms based on equations 2-15 and 2-19, which require different words to be added to the accumulation at each iteration. The serial P algorithm based on equation 2-20 overcomes this difficulty by using the multiplier digits directly to form a stage term:
n - jY ·x ~ 0 + J= . 1 y,x.2 ~ J which is then added to the accumu-lation, except for the final Yo iteration where its com-plement is added
uniformity of word to be added (if $y_i = 1$) on each iteration (or subtracted on the final $y_0$ iteration). The penalty is that the negative multiplier correction then has to be added explicitly.

The serial version of Braun's algorithm overcomes this difficulty simply with the row fill out technique, which implicitly implements the negative multiplicand correction in the shifting process. Because of this, and because of the uniformity of the word to be added (or subtracted) on each iteration (if $y_i$ is hi), Braun's algorithm is seen to be particularly suitable for serial implementation.

Hardware implementations of equations 2-24 and 2-28 are described in Section 3.2.4.
2.3.5 BOOTH CORRECTIONS
Booth [5] developed an algorithm for serial multiplication of 2's complement numbers in which the multiplier bits are sensed in pairs, and depending upon the configuration of the bits in the sensed pair, a particular contribution is made towards the product. This contribution must, as usual, be applied so that it is of the correct significance with respect to the final product. Again there appear to be two methods of doing this, one involving a succession of $2^{-i}$ shifts of the contributions followed by accumulation, and the other involving a succession of stages of addition of the current contribution, followed by a $2^{-1}$ shift of the accumulation. Both methods will be discussed in the treatment to follow.
The contributions to the product, as determined by the multiplier bit pair currently being sensed, are [5]:
(i) If the multiplier bit is 1 and the next lower order multiplier bit is 0, subtract the multiplicand.
(ii) If the multiplier bit is 0 and the next lower order multiplier bit is 1, add the multiplicand.
(iii) If the multiplier bit is the same as the next lower order bit, do nothing.
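The three rules amount to recoding each multiplier bit pair as a signed digit $d_i = y_{i+1} - y_i$, with a dummy bit $y_{n+1} = 0$ below the least significant place. The following minimal sketch checks that the recoded digits reproduce the 2's complement value of the multiplier; the function names are illustrative assumptions.

```python
from fractions import Fraction

def booth_digits(y_bits):
    """Recode bits [y_0, ..., y_n] as digits d_i = y_{i+1} - y_i, taking y_{n+1} = 0.

    d_i = -1 means subtract the multiplicand, +1 means add it, 0 means do nothing.
    """
    padded = list(y_bits) + [0]
    return [padded[i + 1] - padded[i] for i in range(len(y_bits))]

def booth_value(y_bits):
    """Sum of 2^-i * d_i, which should equal the 2's complement value [y]."""
    return sum(Fraction(d, 2**i) for i, d in enumerate(booth_digits(y_bits)))

def twos_complement_value(y_bits):
    """[y] = -y_0 + sum over i >= 1 of 2^-i * y_i."""
    return -y_bits[0] + sum(Fraction(b, 2**i) for i, b in enumerate(y_bits) if i > 0)
```

For the word 1.011 (that is, $[y] = -5/8$) the digits are $-1, +1, 0, -1$, and $-1 + 1/2 - 1/8 = -5/8$.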
Booth's algorithm is readily derived from the basic 2's complement multiplication relationship 2-7. Rewriting 2-7:
Z
2 (y-y 0) ( 2xo) (x-2xo) + 2 (xy) 0
=
2 (Y-Yo) (X'-2xo) + y(~ + 2(xy)oDecomposing 2-29 into the sum of i
=
0 to n stages:'
-z
(2- 30)
where the $y_{i+1}$ originates because, in being multiplied by 2 in the first bracket of 2-29, $\tilde{y} - y_0$ has been shifted one place left. The dummy parameter $y_{n+1}$ is defined to be zero.
Le.
z
{Yi+l{2- + xo (2 2 i+l) }
+ Yi{ + 2 i +l
)}J
(xoYi+l + x y.) + 2 (x + o 1 0 -2x o y ) 0
-2
( Y i+l { + xo (2_2- i + l )}
+ Yi{2
+
Xo (2~ i+l)})+
[-2
y. + 2 ( +y~xy
>]
1 o 0 0
(Yi+l{2- + 1 2-r }
n{2
+-
r }J
+
O(x,y) (2 31)where the group of terms O(xuY) modulo 2
2
formulation where each of the contributions is shifted through "i" places followed by accumulation. The contributions to the product may be identified from equation 2-31 (cross reference to rules (i), (ii), (iii) above) as:
(i) If $y_{i+1} = 1$ and $y_i = 0$, add $\tilde{x}$ shifted through "i" places right (i.e. $2^{-i}\tilde{x}$), and fill out the vacant places arising from the shift with bits $x_0$.
(ii) If $y_{i+1} = 0$ and $y_i = 1$, add $\bar{\tilde{x}}$ (i.e. subtract $\tilde{x}$) shifted through "i" places right, and fill out the vacant places arising from the shift with bits $\bar{x}_0$.
(iii) If $y_{i+1} = 0$ and $y_i = 0$, add nothing. If $y_{i+1} = 1$ and $y_i = 1$, add $\{\tilde{x} + \bar{\tilde{x}}\} = 2$, which is zero in modulo 2 addition.

Numerical implementation of the algorithm is simply obtained by accumulating i = 0 to n stages of adding the appropriate contributions (as determined by rules (i), (ii) and (iii)), shifted the required amount, with the vacated bit positions filled with the appropriate bits.
As with Braun's algorithm, the above procedure of shifting the variables through "i" places followed by accumulation is not normally used in hardware implementations of Booth's algorithm. It is instead usual to implement the algorithm by a succession of stages in which the terms $y_i$, $y_{i+1}$ are sensed (starting at $y_n$, $y_{n+1}$), the appropriate contributions are added, and the accumulation is shifted one place right, as described by Chu [6]. The vacant place arising from the shift is filled with the sign bit of the accumulation. As was the case with Braun's algorithm, the validity of the process is intuitive, since
the value of accumulator. Again however, it is difficult to predict the value of the sign bit of the accumulation, and so a recurrence relation proof is resorted to, where it is not necessary to know the value of the accumulation sign bit to prove the validity of the process. The recurrence relation in problem variables is:

$$r_j = 2^{-1} r_{j-1} + \{y_{n-j+1} - y_{n-j}\}[x] \quad \text{for } j = 0 \text{ to } n, \text{ where } r_{-1} = 0,\; y_{n+1} = 0 \qquad (2\text{-}32)$$

By examining the first few terms of $r_j$ it is possible to deduce a general term:

$$\begin{aligned}
r_0 &= 2^{-1} r_{-1} + y_{n+1}[x] - y_n[x] = [x]\{-y_n\} \\
r_1 &= 2^{-1}([x]\{-y_n\}) + y_n[x] - y_{n-1}[x] = [x]\{-y_{n-1} + 2^{-1}y_n\} \\
r_2 &= 2^{-1} r_1 + y_{n-1}[x] - y_{n-2}[x] = [x]\{-y_{n-2} + 2^{-1}y_{n-1} + 2^{-2}y_n\}
\end{aligned}$$

Therefore, in general:

$$r_j = [x]\{-y_{n-j} + 2^{-1}y_{n-j+1} + 2^{-2}y_{n-j+2} + \dots + 2^{-j}y_n\} \qquad (2\text{-}33)$$

Examination of 2-33 shows that the limits of $r_j$ for $-1 \le [x] \le 1-2^{-n}$ are:

$$-(1-2^{-n}) \le r_j \le +1$$

The upper limit of +1 on $r_j$ arises in the case $[x] = -1$. Since this limit is $2^{-n}$ above the range of normalised problem variables, it is necessary to prohibit the case where the multiplicand $[x]$ is $-1$. If this is done, $r_j$ lies within the range of normalised problem variables and so a transformation of 2-32 to machine variables is straightforward:
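$$\tilde{r}_j - 2(r_j)_0 = 2^{-1}\{\tilde{r}_{j-1} - 2(r_{j-1})_0\} + \{y_{n-j+1}(\tilde{x} - 2x_0) - y_{n-j}(\tilde{x} - 2x_0)\}$$

i.e.

$$\tilde{r}_j = \{2^{-1}\tilde{r}_{j-1} + (r_{j-1})_0\} + \{y_{n-j+1}\tilde{x} + y_{n-j}\bar{\tilde{x}}\} + \big[2\{-(r_{j-1})_0 + (r_j)_0 - x_0 y_{n-j+1} - \bar{x}_0 y_{n-j}\}\big] \qquad (2\text{-}34)$$

(taking $\bar{\tilde{x}} = 2 - \tilde{x}$ and $\bar{x}_0 = 1 - x_0$).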
It is readily confirmed from 2-34, by considering iterations between $\tilde{r}_0$ and $\tilde{r}_n$, that:
i=o ~i{ y ) -I- y. ~
(2 35)
Notice from 2-34 that at stage "j" it is the accumulation from stage "j-1" which is shifted ($2^{-1}$), whereas with 2-31 the contributions were each shifted ($2^{-i}$) prior to accumulation. Consequently, from 2-34 it is seen that the vacant place occurring with the $2^{-1}$ shift is filled by zero or unity depending on the state of the sign bit $(r_{j-1})_0$. Also, modulo 2 addition occurs at each stage of the accumulation, and so the contents of [ ] in 2-34 are omitted in the numerical implementation.

Hardware implementations of Booth's algorithm, which appears to be very suitable for implementation since no additional correction cycles are required, are described in Section 3.2.5.
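The stage-by-stage form of Booth's algorithm (sense the bit pair, add or subtract the multiplicand, shift the accumulation right with sign fill) can be sketched as below. This is an illustrative model under stated assumptions: integer words scaled by $2^n$, an accumulator of assumed double length so that no product bits are dropped, and the hypothetical name `booth_multiply`. The shift is applied before each addition, following recurrence 2-32, and the multiplicand value $-1$ is rejected as the text requires.

```python
def booth_multiply(x_int, y_int, n):
    """Multiply [x] = x_int / 2^n by [y] = y_int / 2^n; returns [x][y] scaled by 2^(2n)."""
    if x_int == -(1 << n):
        raise ValueError("multiplicand [x] = -1 is prohibited")
    width = 2 * n + 1                        # one sign place (2^0) plus 2n fractional places
    mask = (1 << width) - 1
    x_word = (x_int << n) & mask
    y_word = y_int & ((1 << (n + 1)) - 1)    # bit j of y_word is y_{n-j}
    acc = 0
    lower = 0                                # dummy bit y_{n+1} = 0
    for j in range(n + 1):
        bit = (y_word >> j) & 1              # y_{n-j}
        sign = acc >> (width - 1)
        acc = (acc >> 1) | (sign << (width - 1))   # 2^-1 shift with sign fill
        if lower and not bit:
            acc = (acc + x_word) & mask      # pair (y_{n-j}, y_{n-j+1}) = (0, 1): add
        elif bit and not lower:
            acc = (acc - x_word) & mask      # pair (1, 0): subtract
        lower = bit
    return acc - (1 << width) if acc >> (width - 1) else acc
```

No separate correction step appears for a negative multiplier: the subtraction on the sign bit pair takes care of it, which matches the remark above that no additional correction is required.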
2.3.6 DE MORI AND SERRA ALGORITHM WITH 1'S COMPLEMENT IMBEDDING
De Mori and Serra [7] describe a 2's complement multiplication algorithm for parallel array implementation. The 1's complement numbering
De Mori and Serra with their 2's complement multiplier. It is noted, however, that the 1's complement of a 2's complement input variable is adopted only when the variable is negative.
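The selective use of the 1's complement can be illustrated by forming the modulus of a 2's complement word: a positive word is used as it stands, while a negative word is replaced by its 1's complement plus one unit in the least significant place. The sketch below is an assumption-laden illustration on integer words of n + 1 bits, with a hypothetical function name; it is not the array circuit itself.

```python
def modulus_from_word(word, n):
    """Return |x| scaled by 2^n, given the (n+1)-bit 2's complement word of x.

    The 1's complement (all bits inverted) is taken only when the sign bit is set.
    """
    mask = (1 << (n + 1)) - 1
    sign = (word >> n) & 1
    ones_complement = word ^ mask            # corresponds to 2 - 2^-n - x~ in word units
    return ones_complement + 1 if sign else word
```

For n = 3, the word of $-5/8$ is 1.011 (integer 11); its 1's complement is 0.100 (integer 4), and adding $2^{-n}$ gives 5, i.e. $|x| = 5/8$.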
Synthesis of the algorithm in terms starts
from lowing
2 5 and 1'5 comp vari s as From 2 5:
Ixl
"'~+ (2~:;n x == XXo
0 v
2~n)
=
+ex
+ 36)"'- lJ
larly Iyl == yYo +
(y
+ ) Yowhere the lis complements of
x
and '" y are:LJ -n
X
=
2 2(2-37)
>(,
2
Y :::::
Substituting from 2-36 into the modulus product identity

$$|xy| = |x|\,|y| \qquad (2\text{-}38)$$

yields in the notation of De Mori and Serra

$|xy|$
Gl+
2 -n x oYowhere Q
=
U/Z+Jf
+x
v
2-nWY .
and ~
=
"'-xXo+
""xXo ' :=
0 (2 39)
"'- v
== yYo + YYo, J{ Xo
Now introduce the 2's complement product identity
'"'-'
xy
=
40)and substitute from 2-39 to obtain after simple manipu-lation the De Mori and Serra algorithm in general terms
Z == (xy 0 +
~G)
( ) 0 + 2 2nxoY 0 41)The mixed 2's and l's comp var
ur
and Z areevident
The structure is clear from Fig. 2-8(a).
This algorithm, when the M
algorithms, is compl both the 2G
s
Fig. 2-8 De Mori and Serra: (a) Algorithm. (b) Circuit.
Hardware implementation of the algorithm is described in Section 3.2.6. Although several networks are required to generate the mixed 1's and 2's complement variables, and corrections to be added to the core multiplication product P, the circuit implementation shows a surprising degree of symmetry and efficiency.
2.4 NON-RESTORING DIVISION
Problem equation: $z = x/y$

Normalised Problem Equation: $[z] = k[x]/[y]$

i.e. $k[x] - [z][y] = 0$, where $z_M = x_M/y_m$ and $k = y_m/y_M > 0$

Machine word equation:

$$k(\tilde{x} - 2x_0) - (\tilde{z} - 2z_0)(\tilde{y} - 2y_0) = 0 \qquad (2\text{-}42)$$

For reasons explained in Appendix C, introduce a pseudo-quotient word:

$$\tilde{q} = \tilde{z} - 2z_0 + 1 \qquad (2\text{-}43)$$

and substitute into the machine equation:

$$k\tilde{x} + \{-(\tilde{q} - 1)(\tilde{y} - 2y_0)\} + [-2x_0 k] = 0 \qquad (2\text{-}44)$$

Arrange the contents of { } as positively connected words suitable for machine implementation:
{qy
+-
o
Write the machine word equation d sian, 2-45, as sequence of
words Y, '" Y
ikx
+-+-
[2 Collect all+
s i 0 to n of add ion of to the shi d dend ]~x
i - ) ,'"
}
{ qiY
+
+6 Yi{qi(2Yo 1)-2x k.} 4y o
J
01 a
terms
'~) + q
one summation:
o
d
(2 45) a sor
+ n i=o
2
l ~~n+l
f
== L Yo (2+ 2q. (2y ~1)~4x lc
1 0 0 1
2 i
+
n+l Yo+
q
:~1)+
o.
y
1 1n
+ 1) 4 .-2y}]
~ 2~n+ly (2~48)
1 0 0
where
w.
=
q1
2~n+l
" Yo
+ 2qi (2yo ~1)-4 ~2y o
Writing in a manner similar to single shift and add stages as employed in 1 mu lication, but employing a left shift:
2~n[2x2x'
··x2{2(2kx+
w )
+
wd
+
w
2
+
~)
e; __
~
__
(2-50)
Equation 2-50 is the machine algebra formulation for non-restoring division. Numerical implementation of this equation, based on the strategy in Appendix C, is achieved by starting at the inner stage 1 element and working out to stage n. Two important simplifications arising from the modulo 2 arithmetic are:
1) In $w_i$ (equation 2-49) omit the group of terms $2q_i(2y_0 - 1) - 4x_0k_i - 2y_0$ since, being zero or negative powers of 2, they cancel overflow into the $2^1$ column.
2)
ection) hand s
For w.
=
w ,
omito.
y,
and1 n 1n (for later
corr-an error ny (as i t would on the right
Thus, in numerical implementation
. - J
W. ~
The effective error terms are the 2-n
V
from 2) above, and constant 2 ~n,+ on right hand s of equations 2~47 to 2-50. These terms sent the error in the quotient multiplied by the divisor. Hence the quotient error is)
/
To str ly correct then the should be corrected the
~2 n
ion of 2 ~n ,
e]'
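The true $\tilde{z}$ is obtained simply from 2-43 as

$$\tilde{z} = \tilde{q} - 1 + 2z_0$$

where, if $z_0$ is 1,

$$\tilde{z} = \tilde{q} + 1 \qquad (2\text{-}52)$$

and, if $z_0$ is 0,

$$\tilde{z} = \tilde{q} - 1 = \tilde{q} + 1 - [2] = \tilde{q} + 1 \qquad (2\text{-}53)$$

the bracketed 2 being lost in modulo 2 arithmetic, i.e. from 2-52 and 2-53, the true quotient $\tilde{z}$ is obtained simply by adding 1 to the pseudo-quotient $\tilde{q}$.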
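The link between quotient bits and the true quotient can be illustrated with a textbook non-restoring division in problem variables. This sketch is not a transcription of equation 2-50: the digit weighting ($\pm 2^{-(i+1)}$ for bit $q_i$), the exact rational arithmetic and the name `nonrestoring_divide` are assumptions chosen so that the relation $[z] \approx \tilde{q} - 1$ of 2-43 becomes visible. It assumes $|x| \le |y|$ and $y \ne 0$.

```python
from fractions import Fraction

def nonrestoring_divide(x, y, n):
    """Non-restoring division of x by y over n + 1 stages.

    At each stage the remainder is doubled and the divisor is subtracted when
    remainder and divisor have the same sign (quotient bit 1), otherwise added
    (quotient bit 0). Returns (q_bits, q, z, r): q is the pseudo-quotient word
    sum of 2^-i * q_i, z the signed-digit quotient, r the final remainder.
    """
    r, y = Fraction(x), Fraction(y)
    q_bits = []
    for _ in range(n + 1):
        if (r >= 0) == (y > 0):
            q_bits.append(1)
            r = 2 * r - y
        else:
            q_bits.append(0)
            r = 2 * r + y
    q = sum(Fraction(b, 2**i) for i, b in enumerate(q_bits))
    z = sum(Fraction(2 * b - 1, 2**(i + 1)) for i, b in enumerate(q_bits))
    return q_bits, q, z, r
```

In this model $z = q - 1 + 2^{-(n+1)}$ exactly, so the quotient differs from $q - 1$ only by a fixed term in the last place, and $x = yz + 2^{-(n+1)}r$ with $|r| \le |y|$.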
In the shift and add accumulation 2-50, the recurrence relationship as described Chu
[6J
and others is c arly scernable. valuea.
of1
accumulation at i re to that at stage i-I:
(2-54)
'" O.
1 the machine word sentation
r.
1 fremainder at stage if in formulation Chu. As discus by Chu, and
stage i quotient (Le (0 1)0) and
as
are
ibed q. 1 -l' is 1
same
(0. 1) and
:1- 0 are different ..
It should be
necessary if the division is taking situation where
[x]
and[YJ
~
'"
[x]
~ and :( [yJYM
Appendix C, at the s of q 1 is 0 if
k is place a dynamic
that
In a s situation, where des simply to divide
[x]
by [yJthen the scaling k i set to unity
Guild
[9J
s a non~restoringon the s add ion
scussed in
In Append P Vo at for
agrams illus
trating closed loop the ive
sion algorithm. It is found that 'closed loop'
algorithms such as division and square root the P.V. form-ulation is simplifying
2.5 NON-RESTORING SQUARE ROOT
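Problem Equation: $z = \sqrt{x}$, i.e. $x - z^2 = 0$

Normalised Problem Equation: $[x] - [z]^2 = 0$, with $z_M = \sqrt{x_M}$

Machine Word Equation:

$$(\tilde{x} - 2x_0) - (\tilde{z} - 2z_0)^2 = 0 \qquad (2\text{-}56)$$

As previously in the case of division, introduce the pseudo-quotient word (Appendix D.2):

$$\tilde{q} = \tilde{z} - 2z_0 + 1$$

Substituting into 2-56: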
o
Writing 2~57 in s summation fo:rrn:
{
1No-ting the simi
0, (
liM
(2-56 ) introduce the
57)
"" 0
problem structure,
re 57 and 2-58 to replace unknown 2) corres
to the known d sor
'y
in divi v by a2
-q. L ( ~i)
~ ) qi
Using the identity
~i
qi 2 (q 2 i + 1) (z\ppend Do L 1) then s~
ig 0 (
1 1 i}
1 := 0 60)
q. ~ and not
i
)}
+ [~
2x]
0~
=
0
x
== 2 (2 61)the terms positively connected
words and
x+
ni=o
where = using the
(Appendix
thei complements
2~i{q +
gop.
+[-1 1
FoJ
+
ig 1
i"" gi-l D 1. 2) I then
q.p.
+ qi{
-1 1
1 +
+
[-i
g.
~
o
2x ]
=
0 o(2 62)
Removing the term gi n outside the summation because its feet is small:
x
+ n . 2i{-
g.p.
+1=0 1 1
The term 2i
p.
1
now partit
-n",
~2 g n
~1
2 g.
1
(1/4,3/4) in order to the terms which will enter into numerical
from e ch will be
x
+
+ g~.'p. ~ (1/4) 11 1- 1
.~
+ i
f
i}
eo x