Design of a hardware efficient key generation algorithm with a VHDL implementation

(1)

Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

5-1-1993

Design of a hardware efficient key generation

algorithm with a VHDL implementation

James A. Goeke

Follow this and additional works at:

http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please [email protected].

Recommended Citation

(2)

Design of a Hardware Efficient Key

Generation Algorithm with a

VHDL Implementation

by

James A. Goeke

A Thesis Submitted

III

Partial Fulfillment of the

Requirements for the Degree of

MASTER OF SCIENCE

in

Computer Engineering

Approved by:

Graduate Advisor - Prof. George A. Brown

Department Chairman - Dr. Roy Czemikowski

Reader - Dr. Tony Chang

DEPARTMENT OF COMPUTER ENGINEERING

COLLEGE OF ENGINEERING

(3)

THESIS RELEASE PERMISSION FORM

ROCHESTER INSTITUTE OF TECHNOLOGY

COLLEGE OF ENGINEERING

Title of Thesis: Design of a Hardware Efficient Key Generation Algorithm with

a VHDL Implementation.

I, James A. Goeke, hereby deny permission to the Wallace Memorial Library

of RIT to reproduce my thesis in whole or in part.

(4)

This document was produced _using Windows version

3.1,

Wordperfect version 5.2 for

Windows,

Windows

Paintbrush,

and Mathcad version 3.1 for Windows. The C programs were developed _usingBorland C version 3.1. All oftheseprograms were run on a

Gateway

2000 386/33 PC clone. The VHDLportions were developedon a Sun SparcStation 10. The

code was written _using the emacs _editor, and a syntax directed editor extension available on

theInternet. Theanalysis and simulation oftheVHDL wasdone_{using Vantage Spreadsheet.}

The C code was also compiled and run on the Sun _using

CC,

an ANSI compatible C

compiler from Sun.

The

following

names used here and in the remainder of the document are registered

trademarks ofthe respective companies:

Windows Microsoft

Corporation,

Wordperfect Wordperfect

Corporation,

Paintbrush Microsoft

Corporation,

Mathcad Mathsoft

Corporation,

Borland Borland

International,

Sun Sun

Microsystems,

Vantage Spreadsheet Vantage Analysis Systems.

Copyright

(C)

1993

by

James Goeke

(5)

BIBLIOGRAPHY

. D-l

D-2

. D-7

D-10

. . D-15

D-18

. . D-20

D-23

D-25

D-27

. D-29

D-31

(7)

LIST OF FIGURES

Figure 1 - Equations for

a gaussian distribution of random numbers .3-5 Figure 2 - Analysis

ofdata produced

by

equations in Figure 1 ... 3-6 Figure 3 - Theoretical

plot ofaccesses at each location 3-7

Figure 4 - Theoretical

plot ofaccess distribution .3-7

Figure 5

-Output data set ADDRUCMS . 3-10

Figure 6

-Hierarchy

of objects for the behavioral code 4-11 Figure 7

-Hierarchy

ofobjects for the structural code ... 4-14

Figure Bl-1

-Input Data Set CDEUNTF B-4

Figure Bl-2

-Input Data Set CDEUNIF . B-5

Figure B2-1

-Input Data Set SYMUNIF . B-7

Figure B2-2 Input Data Set SYMUNIF B-8

Figure B3-1 - Input Data Set

CDEGAUSS B-10

Figure B3-2

-Input Data Set CDEGAUSS B-ll

Figure B4-1 - Input Data Set

SYMGAUSS ... B-l 3

Figure B4-2

-Input Data Set SYMGAUSS ... B-14

Figure B5-1 - Input Data Set CDEMYC1

. . . B-16

Figure B6-1 - Input Data Set SYMMYC1

.... B-18

Figure Cl-1 - OutputData Set ADDRUCUS C-4

Figure CI-2 - Output

Data Set ADDRUCUS C-5

Figure C2-1

-Output Data Set ADDRGCUS C-7

Figure C2-2 - Output Data Set ADDRGCUS

... ... . C-8

Figure C3-1 - Output Data Set ADDRMCUS

... C-10

Figure C3-2

-Output Data Set ADDRMCUS C-ll

Figure C4-1 - Output Data

Set ADDRUCGS ... C-13

Figure C4-2 - Output Data Set ADDRUCGS

... C-l4

Figure C5-1 - Output Data Set ADDRGCGS

. C-l6

Figure C5-2 - Output Data Set ADDRGCGS C-l7

Figure C6-1 - Output Data Set ADDRMCGS

... C-l9

Figure C6-2 - Output Data Set ADDRMCGS

... C-20

Figure C7-1 - Output Data Set ADDRUCMS C-22

Figure C7-2 - Output Data Set ADDRUCMS

. . C-23

Figure C8-1 - Output Data Set ADDRGCMS C-25

Figure C8-2 - Output Data Set ADDRGCMS C-26

Figure C9-1 - Output Data Set ADDRMCMS C-30

Figure C9-3

-Output Data Set ADDRMCMS C-32

Figure C9-5 - Output Data Set ADDRMCMS

... . C-34

Figure C9-7 - Output Data Set ADDRMCMS

. . C-36

(8)

LIST OF TABLES AND EQUATIONS

Equation 1 . 2-3

Equation 2 . 2-6

Table I Input Data Sets . . 3-4

(9)

ABSTRACT

Thedesign andimplementationof akey-generationalgorithm will be discussed. The

steps in the procedure consist of_choosing a baseline algorithm for comparison,

designing

a

new _algorithm, _testing and _comparing the algorithms _using C language programs, and then

implementing

the new algorithm _using VHDL. The final result of _testing the algorithm

implemented in VHDLwill be comparedto the original results obtained from theC program

implementing

the new algorithm.

C language programs of the chosen algorithms will be developed to verify their

similarity and functionality. The resultswill be used to decide ifthe newalgorithm selected

for implementation is sufficiently robust, and similar in

functionality

to the baseline

algorithm.

After the algorithm is _selected, it will be implemented in a hardware description

language. This will be done for purposes of

demonstrating

_top down design and hardware

efficiency. This process will involve the steps of moving the design from an abstract

algorithmic view, to a high level

(behavioral)

hardware

description,

and then to a low level

(structural)

description. This process will illustrate the details of_moving the design from a

(10)

1.0

INTRODUCTION

Processors of all _sizes, from single board microprocessors to large main

frames,

can

perform _many complex and varied tasks. But _they all suffer from a common problem. No

matterhow large or complexthe_system, it hasafiniteamount of_working storagespace, and

usually a much smaller amount of spaceto _actually performthe processing. The amountof

storage space available is

increasing dramatically

through the use ofnew _{technologies, but}

so are the size ofthe data sets that need to be processed.

Ifthe problem to be solved requires more data than the amount of available _storage,

this is a serious fault. But often the data can be more _compactly stored because it _only

sparsely occupies the storage space it is using.

Key

generation algorithms can be used to

perform this _memory space compaction.

Anotheraspect ofthese datasetsisthat_they must oftenbeaccessed basedon the new

input to the program. As the new data arrives, it must either be _stored, compared to

something already in storage, usedto change _already stored

data,

etc. There are_manyvaried

methodsto both store thedataand to perform the accesses, but

they

all dependon

being

able

to _uniquely associate the data with a storage location or locations. As with the storage

limitations,

thisbrings _upanother _shortcoming of processors. No matterhow

big

or how fast

the processor is (and again this is _rapidly

improving

with new _{technologies),} ifthe data set

is large enough, it will _eventually cause _very poor performance.

Key

generation algorithms

(11)

ofthe

incoming

data_{to use,}or somehowgenerate a new

key

from the

incoming

data. Ifthe

key

isgenerated properly, it can _potentially allow a more direct accessto the desired storage

location,

and thus improve performance

by

_allowing the processor to work on the problem

rather thanjust using its power to search its own memory.

Key

generation algorithms are an important part of _many larger algorithms.

They

provide an important device to allow storage of_potentially large volumes ofdata in a much

smaller_{memory space,}and_theyprovide a methodof_associating

incoming

datawithlocations

within the processors storage space. Because of_this,

key

generation algorithms have been

widely studied. But most of this effort has been on how to generate keys using software

As _technology _{advances, many} ofthe algorithms that

historically

were done in software are

now

being

done in hardware. This presents a problemfor

key

generation algorithms.

Many

ofthe current algorithms are based on ideas that do not translate well into

hardware,

or are

very inefficient in hardware. Because of_{this, it is} _my desire to work on the development of

a

key

generation algorithm that will be comparable in performance to a _currently used

software _algorithm, and at the same time be designed with hardware efficiencies and

(12)

2.0

CONCEPT

There are _many methods used to _associate, or

key,

datato a storage location. Some

of the possibilities include using: a particular element in the data (e.g. name), a set or

combination ofelements inthedata (e.g. name and_address), anumber assignedto each piece

of

data,

a

linking

ofthe datatogether in a

list,

etc. No matter howthe data is referenced to

a

location,

it must have the _property of

having

a one-to-one correspondence between the

attributes used as the

key

and the location

The purpose of this work is not to examine

key

generation algorithms in detail.

Rather,

it is to examine how to build an efficient one in hardware. To select a baseline

algorithm, aninformal _surveyof software examples was_examined, and discussionswereheld

with a number of people involved in the design ofboth hardware and software. The result

ofthis was the conclusion that

hashing

is one of the most _commonly used

key

generation

algorithms. Itwas decided to use this as the baseline algorithm for comparison.

Having

decided on

hashing,

it should be noted that this violates the last _property

mentioned in thefirstparagraph, namely, that the

key

andthe storage location have a

one-to-one correspondence. This limitation can be overcome

by developing

the algorithm to

compensate for these collisions. There are a large number of methods to do this that place

the data in an alternate storage location. The main requirement is that _they be deterministic

(13)

Since the collision _strategy often involves knowledgeofthe program and/or system _using

it,

it is beyond the scope ofthis work.

Hashing

is atype of

key

generation thatgets its namebecause it

hashes,

or scrambles

the bits ofthe piece ofdata to be used as the key. This is a _widely used technique because

it is

fast,

simple to implement in software, can use minimal storage _space, and can take the

search

directly

to the desired storage location (as opposed to a linear search). Repeated use

and

testing

have demonstrated this to be one of the most simple and robust methods. It is

due to the power and _simplicity of

hashing

that it is so _widely _used, even thought it has the

drawback of_producing collisions.

Inaddition to

having

a

key

generation algorithm_chosen,sometypeoflargeralgorithm

or process is needed to use the

functionality

ofthe

key

generation algorithm. Forthis work,

the skeleton of the type of algorithm that could be used in a data _searching or data

compression application will be used. It was selected because it will provide a _very simple

framework within whichto usethe

key

generation algorithmwhileatthe sametime_providing

a more realistic test.

Theskeleton algorithm will operate

by

_usingthe

incoming

data,

and another valuethat

will represent the

history

of some section ofthe data that has come

immediately

before the

current piece ofdata. A

key

will be generated_using

hashing

to representthisentire _quantity

ofdata. The

key

will be used as the reference into storage. The operations that result from

(14)

the scope ofthis work. One element that the skeleton will reference from storage will be a

value assumedto be stored there thatwill representthe newvalueforthe

history

ofthe data.

This new

history

value will be used with the next piece of input data.

2.1 SOFTWAREALGORITHM

The actual skeleton operation will be as follows. The

incoming

piece of

data,

which

will be called the _symbol, will be read in. This will be concatenated with the value

representing the data

history,

which will be called the previous code. The result will be

called the hash _argument, or harg. The modulo ofthe

harg

will then be taken. This result

will be the location in storage to be referenced, called the address (see Equation 1). An

modulo algorithm address_-[(symbol) catanate [previous code)]mod 4021

Equation 1

actual algorithm would _probably have several pieces ofinformation stored there for its use.

In this work, all that will be stored there will be a value that will become the new previous

code, which will be combined with thenext symbol. Thisprocess will then continueas

long

as there are more symbols to be processed.

The previous paragraph mentions that a modulo operation will be used. The choice

ofthe modulus is an important parameter. The values most often used for this are prime

(15)

is difficult to perform division operations unless

they

are

by

apower of2 (in which case it

is _merely a shift operation). Because the number ofstorage locations is most often picked

to be to be some power oftwo (to make both hardware and software _addressing _easier) the

modulus choosen will be a prime number smallerthan the number of storagelocations. This

means that the operation will produce remainders that range from 0 to the one less than the

prime number. One unfortunate side effect ofthis is that there will be some locations that

will never be accessed.

Any

location that corresponds to a value from the address equal to

the prime number _up to the last address in the table will be a location that will not be able

to be produced

by

the algorithm. This will notbe serious ifthe prime number is chosen to

be close in value to the number of storage locations. The prime number usedin the selected

hashing

algorithm was picked from a table that listed a special value to use for _many

common table sizes (all ofthe table sizes were powers of2). The table of selected primes

used was from a piece of code developed

by

Kodak

Berkley

Research.

Forthis work, theactual _processing thatcouldtakeplace outside ofthe

hashing

is not

important,

what is important is the

hashing

itself. For that _reason, no activities occur in the

skeleton exceptforthehashing. The informationrecorded will be howoften aeach particular

address is accessed. This will be used as a measure of the _efficiency of the

hashing

algorithms.

It was mentioned _{that values,} called previous codes, would preexist at the storage

locations. One reasonforthisisthat_they cannotbegeneratedwithout

developing

acomplete

(16)

unrealistic assumption. Some algorithms work

by

_{pre-computing} values that are known to

function well for a given typeofdata (this assumesthat _something is known about the type

ofdata

being

input).

Alternately,

it could be assumedthat a snapshot is

being

taken

during

the middle of a_run, andthevalues atthe locations were calculated earlier in the processand

are now _only

being

accessed.

2.2 HARDWARE ALGORITHM

Further investigation revealed that XOR operations were sometimes mentioned in

relation to hashing. This was

interesting

because it is a_very simple operation to perform in

hardware. It is one of the "primitive" operations _usually included in the set of logical

operators in most _programming languages and in _many hardware devices. Because it is

hardware _{efficient, the} XOR operation made an excellent candidate for this work.

Unfortunately

none ofthe references to using XOR operations for

hashing

gave an

algorithm,

they

only mentionedthatit couldbe used. Butthe_underlying premise of

hashing

operations is that

they

will give an _apparently random but uniform distribution ofthe data

into thestorage locations. It is

highly

desirable to avoid

bunching

or_groupingofdataasthis

will _greatly increase the number of collisions.

Aftera greatdeal of

investigation,

looking

at mappings and

distributions,

and_running

(17)

better

hashing

was thenumber ofbits

being

operated on. Thisobservation, andfurther tests,

leadto the development ofthe algorithm selected forhardware implementation.

The algorithm will mirror the procedure _using the modulo. The skeleton of the

algorithm will read inthe _{data representing}symbols. Thesymbol will be XOR'edwith abit

reversed representation of itself. This value will be concatenated with the previous code

which will produce the harg. The address will be generated

by

_shifting the

harg

down

(toward the

LSB) by

the number ofbits in the symbol and

XOR'ing

this with the previous

code (see Equation 2). This substitute for the modulo at first glance seems much more

XOR algorithm address =

(symbol reverse symbol) catanate previous code m . ,

X-previous code

256

NOTE: division

by

256 (which is

28)

= _shift _right

by

8

bits

Equation 2

complicated. Further inspection reveals that each ofthe operations is _{actually very} simple

to implement in hardware. There are _only two types of operations _{used, the}

XOR's,

which

are simple combinatorial operations, and the shift and bit reverse which are both just

interconnect_wiring that come for

"free"

in hardware. A benefitofthis

hashing

function over

(18)

3.0

ALGORITHM ANALYSIS

To_studythealgorithms proposedinthefirstsection, several C programs were written.

A few ofthem will be discussed here because _they relate to the generation and analysis of

the dataused to_verify and comparethe algorithms. The programs discussed are included in

appendix A. Other programs were

developed,

but are not included as _they helped

develop

theprograms shownbutdonot add_any additional information. Theprograms discussed here

are: make-sym.c,

histogra.c,

smooth._c, and addr-gen.c.

Make-sym.cwas one method used togeneratethesymbolsand previous codesneeded

by

the algorithms. It would have beenpossible to take _{any arbitrary} file and use it as input

data. This wouldhave allowedgenerationof_results, buttheresults wouldhave been difficult

to understand. Instead it wasdecidedthat _maintainingcontrol overtheinput datasets would

allow more comprehensiveinterpretation ofthe results. Onemethod of_producing inputwas

this Cprogram. It wasdesignedto generate datathathad arandom distribution. The reason

forthis was to check the algorithms against _totally unstructured data. The source code and

a brief description for this program are in appendix AI. The analysis ofthe dataproduced

by

this program is shown in Appendices B5 and B6.

Inadditionto theunstructuredinput

data,

additional datawas generated_usingMathcad

on apersonal computer. Mathcad allows the user to generate random values that fall into a

distribution. Two distributions were chosen, a uniform and a gaussian distribution. These

(19)

check how the algorithm would respond to all input values

being

used with _something

approaching equality. The gaussian distribution was chosen for the opposite _reason, to see

the algorithm's response to inputs

falling

into a small range ofthe possible values.

Histogramwas writtento do high speed

histogramming

operations. Itaccepts as input

either values _representing symbols or values _representing previous codes. It then generates

counts that show how often each location in the range has been accessed. This function is

available in

Mathcad,

but a stand alone program was much faster. The source code and a

brief description for this program are in appendix A2.

The program smooth. c was written to perform simple one dimensional convolutions.

It allows the user to _specify the width over which the convolution is to be applied. To

simplify the program, it only accepts odd values for the width so that there will always be

a central value to convolve about. The actual convolution did an average over the entire

width of the window. The source code and a brief description for this program are in

appendix A3.

The final C program is addr-gen.c. This is the program that implemented the

algorithm skeletons discussed inthe CONCEPT section. The program read afileofprevious

codes and stored them in an array. As processing

began,

it read the symbols from another

file,

one symbol at a time. For each symbol read

in,

both the modulo and theXOR

hashing

algorithms were applied. Each algorithm generated an address that was used to increment

(20)

containingthenumber ofaccessesperlocation for both the modulo andXOR algorithms and

wrote them out to separate files. The source code and a brief description for this program

are in appendix A4.

3.1 HASHING

Hashing

algorithms are not _completely general purpose.

They

are designed for a

specific size table. Inthis caseitwasdecided to use a4-kbyte table. Thismeans that 12 bits

areneeded for addressing. This size was chosen as

being

large enough to be useful in a real

algorithm, but small enough to

keep

the amount of data tractable. _{Since every} address is

associated with aprevious code that means that4096 12 bit codes had to be generated.

The input symbol size was chosen as 8 bits. This is a _very standard _value, and

corresponds to the size of_many buses and operations. It was chosen because it would be

very

likely

that real world applications could use this symbol size. This meant that values

in the range of0 to 255 were generated as input symbols. One point to note is the number

ofsymbols used for each test. Since the point ofthe test was to determine the

frequency

of

accesses at each _address, it is important to start with an input symbol set that is an integer

multiple ofthe number of addresses. This at least provides the potential that each address

(21)

3.2 INPUT DATA

As previously

discussed,

several sets of input data were produced. Each of the

resulting files was placed in its own directory. The C program histogra.c was then run on

each of the these data files.

Then,

to ensure _uniformity of analysis, a single Mathcad

document was produced. This document was then copied to each

distribution,

a mean and standard deviation had to be selected. The mean was picked as

the center ofthe range (to make it easier to plot the results and _verify that the distribution

(23)

performthecalculationcamefromaMathcadhandbook (a Mathcad handbook isan electronic

online document that is part of the tool). The next

figure,

Figure

2,

generated from the

Mathcad document shown in Appendix

B4,

shows the result of histogra.c _processing the

values produced

by

the document in Figure 1. Ascan be _seen, and was _expected, a gaussian

distribution is indeed the result.

Figure 2 - Analysis _of data _produced

by

equations in Figure 1

3.3 DISCUSSION

Beforethe actual results are

discussed,

it ishelpful to examine what results should be

expected in the theoretical case. Two main types of results were produced

by

these tests.

(24)

each location was _accessed, which gave a

frequency

_{distribution.} _{One benefit}of_using these

outputs is the ease of

illustrating

the _perfect, or

theoretical,

case.

For a perfect

hashing

function,

every address wouldbe

accessed the same number of

times. This would produce two

very distinct graphs. For the

accesses at each _memory

location,

the theoretical graph is

shown in Figure 3. As _shown, , , < ,

, , , , _,

b '

Figure 3 Theoretical plot ot accesses at each location

this produces a flat horizontal

line across the entire range of _addresses,

indicating

equal accesses. The height ofthis line

on the vertical axis is

directly

related to the integer constant times the number oflocations

used to determine how _many

symbols to produce. For the

histogram,

the theoretical graph

is shown in Figure 4. This also

produces a _very distinctive

graph, a vertical line. This

shows that there was _only one

frequency

of access,

implying

Figure 4 Theoretical plot ofaccess distribution

(25)

accessed an equal numberoftimes. The heightoftheline would bethe numberoflocations,

and the x coordinate would be the integer constant.

These two plots showthe ideal case of a perfect

hashing

function,

and thenumberof

input symbols

being

a multiple ofthe number oflocations.

Any

other set of conditions will

cause both types of plots to degrade. It is this degradation that we are interested in because

it can be used as a metric to compare

hashing

functions. It isnot the amount ofdegradation

that we are interested

in,

because there is no such _thing as a perfect

hashing

function (even

ifthe proper number ofinputs were _used), but rather the difference in degradation between

the two selected

hashing

functions.

3.4 RESULTS

The input symbol and previous code files were tested against each other in all

combinations. This was to observe the effect of a given type of symbol

being

used with

similar and dissimilar types of previous codes. A method similar to that used when

generating the input files was used. A separate

Again,

a Mathcad document was created, copied to each

directory,

and

calculated, to ensure consistent data analysis. Table II shows the input types used, and the

(26)

Table II - Output Data Sets Code File

(See

Appendix)

CDEUNIF CDEGAUSS CDEMYC1

Symbol

File

SYMUNIF ADDRUCUS

(CI)

ADDRGCUS

(C2)

ADDRMCUS

(C3)

SYMGAUSS ADDRUCGS

(C4)

ADDRGCGS

(C5)

ADDRMCGS

(C6)

SYMMYC1 ADDRUCMS

|

(C7)

ADDRGCMS

(C8)

ADDRMCMS

(C9)

As with theinput data _only a small example ofthe results will be discussed

here,

the

rest ofthe results will be in Appendix C. To

keep

the datasets reasonable, small amounts

of data were used in most of the tests. While these results validate the point

being

demonstrated,

that the two

hashing

functions perform similarly, the _resultingplots are not as

illustrative as _they couldbe. The plots _only show aone-sided distribution which is a direct

result ofthe small size ofthe input symbol set. As larger symbol sets are used, the peak of

the distributionmoves_away fromzero and more

fully

illustratestheentire

distribution,

which

would be more representative of the actual results expected in a real application. These

results can beseen insome ofthe tests becauseoneofthe input files

(SYMMYC1,

Appendix

B6)

was generated with a larger number ofsymbols for this purpose.

A copy of figure C7-2 from Appendix C7 is shown in Figure 5. The plots in this

figure,

which showthe results ofanalyzing ADDRUCMS, are included here for purposes of

discussion. The upper plots show the number ofaccesses versus _{memory address,} the upper

(27)

range20

AccessM BinsA

300

-200

AccessM,_BinsA

AccessX BinsA

(28)

very

busy,

but it can be seen that there is adefinite horizontal band to the data centered on

the mean of16. To provide abetter illustration of_this, one set ofdata was processed _using

aconvolution (smooth.c), andtheseresults were processed

by

Mathcadto sum groups ofdata

points. _Thisreducedthe clutterintheplot and gave abetterviewofthe_underlying structure.

This extended analysis can be found in Appendix C9. The central two plots in the figure

show the plots for the histograms of the

hashing

functions. The left most ofthe two plots

shows the information for the mod

hashing

_algorithm, and right most shows the similar

informationfortheXOR

hashing

algorithm. Forcomparison purposesthe lasttwoplots were

overlaid onto the same set of axis and this is shown in the bottom plot.

These plots show a definite gaussian

distribution,

which is a degradation of the

theoretically

expected vertical line. The plots show that neither of the

hashing

functions

performs _perfectly, which should be expected, but that _they both demonstrate a degraded

version ofthe theoretical plot.

But,

most

importantly,

both plots show that the algorithms

function_very similarly. This iswhat was

being

attempted, andthedatashowsthat this result

was obtained. This demonstrates that a new

hashing

function,

which is easier to implement

in

hardware,

has been developed and shown to work _similarly to a _{commonly used,} difficult

to

implement,

software version.

There are several items to remember.

First,

even though it is known that

hashing

produces collisions, thateffecthasbeenignored while

developing

thealgorithmsinthiswork.

Second,

themodulo

hashing

algorithm doesnot use all theavailable_memory,while theXOR

(29)

Collisions have been ignoredbecause _they do notrelate to the purpose ofthis study.

Both algorithms will produce _collisions, and as

long

as both distribute the addresses

throughout the entire range in a similar

fashion,

the _resulting

functionality

and collision

frequency

will also be similar. Itshould also be notedthat not all collisions are undesirable.

For _example, if a _{string searching} or _matching algorithm is _using the

hashing function,

the

collision could show that a match has been found. This could be the desired result ofthe

process. Therefore _{the number,}

frequency,

and importance of collisions are more of a

concern and function ofthe larger _enclosing algorithm than the

hashing

algorithm.

The fact that the modulo does not use all ofthe available _memory has _already been

mentioned. This seems to suggest that _using a prime number as close to the table size as

possible would be desirable. This turns out not to be the case. The prime chosen was

selected from atableof recommended_primes, and_they are recommendedfor a good reason.

This was verified

by

_testing several other larger prime numbers (closer to the actual table

size) in the algorithm, but

they

all produced more degradation (sometimes severe) from the

theoretical.

So,

even though a smaller portion of the _{memory may} used, if the accessed

portion is used _{effectively, the} overall performance is superior.

Itshouldberememberedthat these testswerenotdesignedto measurethe "goodness"

or

"badness"

of the

hashing

algorithms. That type of _testing is beyond the scope of this

work. These tests were meant to check a specific set of constraints; does the XOR

hashing

algorithm perform_similarlyto thebaselinemodulo

hashing

algorithm. Itwouldbe dangerous

(30)

that theselected

hashing

algorithm performs_well,andis_verysimilar inresultsto thebaseline

(31)

4.0 VHDL IMPLEMENTATION

Before

looking

at the details of the particular

hashing

algorithm discussed in the

previous _{sections, I} will give a brief introduction and overview ofVHDL. This will give

some historical perspective to its application

here,

and provide the rational for its use in

thisand other _{designs. This}willbe especially significantwith its regard to _{moving designs}

from the abstract world of theoretical algorithm development to the concrete world of

hardware development.

VHDL is

formalized

in the IEEE specification 1076-1987 (the IEEE is _currently

working on the revised _version, 1076-1992). VHDL is an acronym that itself contains

acronyms. VHDL stands

for;

VHSIC Hardware Description Language. VHSIC stands

for;

Very

High Speed IC. IC stands

for;

Integrated Circuit. The original impetus for

VHSIC workcame from the Department ofDefense. The DOD was also one ofthe main

forces behind the development ofVHDL. Since that _time, a much broader audience has

developed for theuse andstandardization ofVHDL.

The need for hardware description languages

(HDLs)

has

long

been recognized in

industry,

although _many people will argue about what constitutes an

HDL.

Some refer to

the _early languages used to write simple state machines as the beginnings of HDL

development,

others claim that _only the new languages are

truly

HDLs. The need for

more powerful andcompact methods ofdescriptionbecame necessary as the_complexity of

designs began to

increase,

resulting in hand coding and simplification of the work

becoming

impossible. The designs would either take too

long

to process

by

hand,

or the

(32)

ofdesign and _{verification,}

implying

_the _{use of machines} _to _do much ofthe work.

Using

machines meant that methods of

describing

and _specifying the design in a machine

"understandable"

format had to be derived. This

input

also had to have the _capability of

being

understandabletohumans. Thus some ofthe_early HDLswereborn.

As time progressed and circuit _{complexity began} _rapidly

increasing,

it became

obvious thatthe _ability to describe largerpieces of

designs,

or even entire systems, would

have great benefit. This drove the development of

increasingly

more capable HDLs.

There were two main

forces

behind this.

First,

design _complexity was

increasing

so

rapidly that

it

was

becoming

difficult for designers to

keep

all the low-level details in

mind, while at the same time the low-level details were

becoming

less _necessary as

technologies matured to the point of

being

_virtually able to guarantee low level device

functionality

and correctness.

Second,

computers and the tools

they

fostered were

becoming

increasingly

more sophisticated and _capable, _allowing them to automate larger

and larger portions of the low-level implementation details

freeing

the designers to

concentrate on the high-level system details.

Proprietary

HDLs began to emerge that resulted in

tying

designers to a particular

company's toolsetsand methodologies.

Verilog

wasone ofthe most popularHDLs in this

class. TheDepartmentofDefense had majorproblems with this. It wanted to be able to

have workdeveloped at one_{company be}transferableto another _company without_worrying

about the tools and

training

of the companies and individuals involved. In addition, the

DOD wanted to be able to have a closer correlation and _{verifiability} between the

specifications it distributed and the _resulting products. This led to the development of

VHDLwhich wasto bean open standard, and was tobe developed from the

beginning

as a

(33)

These desires led to the

development

of the language for DOD _use, but the

openness and the standardization _{quickly began} to attract the attention of non-DOD

industries.

This drove the

language development

and standardization process into the

IEEE,

resulting in a standard that

included

a formal specification ofVHDL. This is an

important point that

is

often overlooked. VHDL is a _completely

formally

specified

language. All the constructs are defined and described in Backus Normal Form notation.

This _{is very different from} most other languages in common use. All the constructs are

defined _exactly; what

they

do,

and how the relate to each other and the _underlying

language definition. For example, compare this to the popular_programming language C.

There isno formal specification for C (although thereis some workin thisdirection). The

language is described and some of its

functionality

illustrated in _many

books,

the most

famous

being

"The C

Programming

Language"

by

Kemighan and Ritchie. But this work

doesnot give _{any formalism}to the language. Anyonewho wants to implement it is freeto

interpret the constructs and theirinterrelation differendy. This _{is completely different}than

VHDL which uses the formalism of language constructs from _{programming language}

theory

todescribeit.

Looking

at the_preceding would seem to

imply

thatall tools would be designed and

perform identically.

Unfortunately

this is not the case. The specification was written

by

humans,

and the development of tools from this language is done

by

humans.

Being

human means that there are bound to be errors, gray areas, interpretation differences and

disputes,

and other areas of confusion. All that the formal specification

implies

is that the

differences should be minor and on the fringes ofthe specification rather than at the core

of the language. This development has fostered broad support for VHDL among both

(34)

VHDL iswithoutits problems or

detractors,

_{but its} appeal continuesto broadenand its use

is rapidly raising with predictions that it will soon be the most common and most _widely

usedHDL.

What makes VHDL more useful than _many ofthe other HDLs? In addition to the

standardization _previously _{mentioned, VHDL} _is a higher level language. In

fact,

some of

its detractors claim that it is too high level.

They

_say that it has lost sight ofthe factthat

the ultimate productis supposedtobephysicaldevices. Whileit is true thatthereare some

constructs which cannot be

directly

implemented

in

hardware,

this begs thequestion. The

more

important

question is what can VHDL be used

for,

and are the high level constructs

necessary orhelpful?

Toanswerthis _question, there aretwo ways of

looking

at theproblem. Firstis the

traditional or historical way. In this view, tools were developed and implemented to

automate portions of the design

tasks,

_especially the

highly

repetitive parts. In this

perspective the criticism is _probably correct. The high level constructs don't provide _any

additional

help describing

portions of the design to be implemented. But in the second

view, which is_attemptingto breaksome ofthe historical

bounds,

the approach _{is different.}

It isnolonger just important todescribethe implementation ofthe

design,

it isalso_equally

important to _specify the

design,

and to test the design. These last two items can often

make _very good use of higher level constructs, and there is no need to _worry about

implementation. Constructsused to_specify ortest thedesignwill most often remain in the

abstract world of _{computers, programming,} and software. This not _only makes the high

level constructs _useful, it demands that

they

exist in order to provide the _necessary tools

for thejob. Without higher level constructs, many essential activities would _{be extremely}

(35)

The second view ofthe problem

is

just

beginning

to be developed and understood.

Very

few _{(if any)} people have made use of the full range of activities to _completely

implement

adesign in thisnew view. But_usingVHDL todothis processis _precisely what

it was

developed

for. The new view _{is attempting} to

tightly

bindtogetherall the steps of a

design,

from

_early specificationon through

testing

thefinal implementation..

4.1 BEHAVIORAL IMPLEMENTATION

Whatisabehavioral model? What is a structural model? There is a fairamount of

discussion,

and _controversy, about the definition of these two terms. _{A very} restrictive

definition of abehavioral model isone thatdoes not create or use a_clock, _anything elseis

considered structural. A more liberal definition says that a behavioral model is one that

does not instantiate any components. If the model instantiates _components, then it is

structural. To complicate _{matters, there} are some people that

try

to further refine the

definitions and draw more distinctions between the _{two models,} even to the point of

defining

differentmodels such as

dataflow,

etc. In this_paper, topreempt thedigression of

thisdiscussion into anotherversionof"how_manyangels can danceon thehead of a_pin", I

chose to use the latter definition above, namely that the distinction is drawn

by

the

instantiationof components.

The

first

VHDL implementation of the _algorithm, the _{behavioral model,} was

constructed without concern about

implementing

the design in hardware. This allowed

modeling the algorithmin a manner which allowed comparison of not_only the _{results, but}

also comparison to the functional steps in the C code implementation. _{In addition,} it

(36)

will most often execute much faster. This can be _very advantageous in the_{early modeling}

stages where _many simulations_may needtobe run.

Writing

a model at the behavioral level should not be confused with _writing a

totally

unconstrained model. While this is _possible,

it

is not desirable orhelpful. In the

approach taken in this _project, the "blue sky"

type of

thinking

should have been accomplished while

developing

the algorithm and _writing the C code. Once that is

completed, a significant amount ofinformation andknowledge about the algorithm and its workings is already in hand. This allows the behavioral model to be constructed so as to

constrain the problem as much as possible. This can have several benefits. It can

help

to check the algorithm at the edges of its area of _operation, and it can lead

directly

to

knowledge of the hardware element sizes needed for the final implementation (e.g. bus

widths, register_sizes, etc.).

VHDL is _very useful for

doing

the type of_constraining mentioned in the previous

paragraph. VHDL is a _strongly typed language. This means that each objectdeclared is given a_type, and can_onlycontain objects ofthattype. Inaddition to _this, subtypes can be

declared. Thisallows operations tobe defined thatwill work ondifferent subtypesas

long

as the base type is the same, and at the same time restricts the values that the object can

contain. Thistype ofrestriction,

by

using subtypes, canbe _very valuable as adesign tool.

Any

attempt to assign a value to a subtyped object that is not within its range will be an

error. Thesecan beeitheranalyzetimeorrun timeerrors (note: in the lexicon of

VHDL,

source code is analyzed, not compiled). The advantage this provides is ifa desired object

size

is

_{chosen, the} simulations will prove ifthis size is _adequate, or an error will result.

The ability to find errors will be _strictly limited

by

the thoroughness and

ingenuity

used in

(37)

Finding

errors is the main purpose for simulating. This means that the _strong

typing

ofVHDLcan be a

big

assist in

finding

design _errors, _especially some of the subtle

errors

involving

_{lost bits from} _{overflow conditions etc.}

Being

able to accomplish this at

the behavioral stage iseven more useful because it is much easierto make corrections and

changes.

Finding

theerrors earlier _mayeven changethe

design,

or_possibly call thedesign

feasibility

into question. It is

interesting

to note that

traditionally,

these types of errors

were not found until _simulating at the lowest structural

level,

which meant after all the

schematics had been created. In the new _approach, simulations with all the appropriate

sizes and typescan be done atthe conceptual _stage,

long

before schematics haveeven been

thoughtabout.

To test the algorithm

being

discussed in this paper it would be _necessary to have a

source of _symbols, and some method to use the output (the addresses generated). In

VHDLthe designelementthat "contains" theelementundertestiscalled a testbench. This

identified

twoofthe main portions ofthecode tobe written: thebehavioralalgorithm and

the testbench. These two elements will contain most of_{the effort,} and have the code that

can be followed to see what the algorithm isdoing. In addition to _{this, it} was decided to

separate some of the details to more _clearly illustrate the essential points. In addition to

making the main

body

of the code _clearer, this is good _coding practice. In

VHDL,

elements ofthedesignthatare abstracted are either otherdesign_elements, or _subprograms,

definitions,

types, etc. This latter typeofinformation is contained in a set ofdesign units

called a package.

The basic structures in VHDL are design units. There are five types of design

units: _{the entity, the architecture, the configuration, the} package

declaration,

and the

(38)

definitions are separated

into different

_design _units _to _allow individual analysis. Entities

are the design units that represent "real"

devices.

That device _may be a _gate, a _chip, a

board,

or a system. The

function

or size ofthedevice is not

important,

what is important

is

the

fact

that it represents _something physicalthat _may or _may notinterface to the rest of

the world. _{The entity is} the object that contains the declaration of_{any interfaces into} and

outof the design unit (if _any exist). An architecture is the definition ofthe

functionality

that will

implement

an entity. There may be any number of architectures for a given

entity. If there are multiple architectures or other instantiated elements that need to be

specified, the configuration

is

the design unit used. A configuration

fully

specifies what

"component"

to use foreach elementinthe_entityand architecturethat it isconfiguring.

The

testbench,

called

HashTB.vhdl,

is shown in Appendix Dl. For _{this work,} it

reads in theneeded symbols from a

file,

it writes theaddresses produced to another

file,

it

produces a histogramofthe _addresses, itcompares thishistogram to a reference file that is

read from

disk,

it storesthehistogram backonto

disk,

and itproduces theclock and enable

signals needed to drive the component under test. It performs all these functions so that it

isnot _only _providinginput to the algorithm, but it isalso

testing

the results and _comparing

them to the original results from the C code. This shows some of the power of a

testbench. _{Not only} can it drive the simulation, but it can do much of the work that

traditionally

was performed

by

the designer

looking

at waveforms or output files. A

skillfully constructed testbench can almost

fully

automate the simulation and _checking

cycle, relieving the designeroftedious and error pronetasks.

Oneofthe main statements used inVHDL istheprocess. Theprocess statement is

used to encapsulate a section of code that will represent a specific _action, or series of

(39)

execution is what defines the behavior of each _process, _giving it its

identity

or

functionality.

Butsequential execution of a series of statements will not produce a realistic

model of

hardware

where

many

things happen in _parallel, simultaneously. Thus all

processes (if more than one _exist) execute concurrently. There are additional statements

thatcan be used that will also execute

concurrently

with theprocesses. The importanceof

the elements at the concurrent level is the _relationship between them, especially the

communications between the various _elements, because this is what creates the _ability to

realistically model

hardware.

When

developing

the behavioral _model, a number of decisions had to be made

concerning the model. Ofparticular concern in thispaper was the further development of

the structural model and thedesire to_unify thiswork withthebehavioral modelinthe most

seamless

fashion.

This fact preordained the outcome of some of thedecisions. One was

the inclusion of a clock and enable signals in the testbench. Since itwasdesired touse the

sametestbench to driveboth the behavioraland structural _models, andthe structural model

wasto be developed in a synchronous

fashion,

a clock was necessary. The enable signals

were_optional, butgave a more realisticdesign to thestructural model.

Since the testbench is intended _solely for _exercising and

testing

ofthe instantiated

component and is not _going to be further developed or _{synthesized, this} allowed a wide

latitude to the developer.

Many

different code constructs are now available to the

testbench developer that would be unusable when _considering a design for real

hardware,

or synthesizability. In addition, special processes can be added that _{may have} no purpose

to the device under _{test, but may} exist _only to aid the developer or the tools

being

used.

Some ofthese can be seen in the code for the _testbench, shown in Appendix Dl. There

(40)

that exists _only _{to terminate the} _clock _when _the _simulation _has

completed, which exists to

aid the simulator. The two processes that do most of the work are

ReadFile,

and

WriteFile.

The names are self_explanatory, with theaddition that WriteFile also generates

a

histogram

of the addresses produced. The last _process,

CompareArray,

takes the

histogram

and compares it against the histogram produced

by

the C _code, and stores the

VHDL produced

histogram

back to disk. This final process exists to _simplify the

simulation and comparison process. It allows the simulations to be run in a batch mode

fromthe command

line,

with the testbench

doing

allofthe work. Of_{course, this} will _only

work afterthe testbench andcomponenthave beendebugged.

To

keep

the

testbench,

and the algorithm design units from getting _{too cluttered,}

some of the _{supporting types,}

declarations,

_subprograms, and component instantiations

were put in a package. The code for this _package, called HashPKG.vhdl is shown in

Appendix D3. The package contains both the package declaration and the package body.

While this is not _necessary, since each is a _separately analyzable design _unit, it was done

here sincethe package is not thatlarge. As can be _{seen, the} package contains a _variety of

things that can be used

by

several ofthe different design units. In addition to _{this, if}a

package

is

well written and

documented,

it may often be useful for more than one

design,

proving one ofVHDL's design goals, code

(design)

reuse. In the case of HashPKG.vhdl

the code was written _only with this project in _{mind, but} the point is illustrated

by

another

package called

dataconv_pkg.vhdl,

shown in Appendix

D4,

that is used

by

HashPKG but

was notwritten forthisproject.

dataconv_pkg

was a package that was written forprevious

work, but used as it was for this work. In this

fashion,

as designers gain experience with

VHDL, they

may build up a

library

of packages and design units that will allow future

work to be accomplished much more _rapidly because parts ofit can be put together from

(41)

Thediscussion hasnow touched on all the majordesignunits used in the behavioral

design,

except forthe algorithm

itself.

_{The discussion} progressed in this

fashion,

because

it approximates the manner and order in which the design units were _actually constructed.

Often it is best tobuild the tools and supportinfrastructure before

tackling

the main design

unit. Thisallows full effortto beapplied to theelement

being

designed,

without theeffort

being

diluted

by

other concerns. In this case, the testbench was developed

first,

until it

could _read, _write, and compare test files. This caused much of the package to be

developed in parallel.

By

the time the algorithm development

began,

most of the

supporting code existed. While it will _usually be impossible to

fully

develop

all the

supporting code before starting on the main _element, small changes and additions will be

trivial matters ifmost ofthe _supporting code and the

hierarchy

exists. The full

hierarchy

ofthebehavioral model canbe seenin Figure6.

HashPKG

dataconv_pkg

HashTB

textio

standard

Figure 6

(42)

When the testbench was _working and could do all the _necessary file manipulations,

work was started on

HashAlgXbehv.vhdl,

_shown _{in Appendix D6.} _{This design} _unit

contains the code for the behavioral version of the XOR

hashing

algorithm. As can be

seen, it is

relatively

compact and_{straightforward, consisting} offourprocesses. This shows

thatproper use of

testbenches,

and

abstracting

_support_functions _{to packages,} can makethe

main design unit much _simpler,

allowing easier

development,

debugging,

and _allowing it

tobe more self-explanatory.

Ofthe four_processes, _only one is concerned with the actual algorithm. The other

three are support

functions.

The NextCode process is needed to prime the system. The

very first symbol read in becomes the

first

previous _code, after that the previous code

always comes from the memory. HashMem is a process the simulates a ROM type of

device. WTienever an address is _presented, the process returns the proper data for that

address. Thisprocess alsohas some _ancillary work to

do;

it hastoknow thefirst time that

it

runs (elaboration

time),

and fill the _{array from} a file. Enable is a process that watches

forvaliddata tobe presented, andaftertheproper

delay,

signalsthat valid datais available

on the output. The last process is the one concerned with the actual

hashing

operation.

When seen at the behavioral

level,

the_processing requires a _very minimal amount of_code,

which shows the power of high level programming. Each linecontains a number ofhigh

level _operations,

functions,

and/or_arrayoperations.

When all the code development was _{complete, the} behavioral algorithm was tested

viathe testbench. A Unix script was writtenthat ranthe algorithm written in C code. The

output from this was stored in a file. The script then ran the testbench which ran the

VHDL version ofthe algorithm. The testbench also read the file that had been produced

(43)

Thevalues produced

by

the two methods had to compare exactly. These testswere run on

allthe possible combinations of

input

codes and inputaddressesthathadbeen generated.

4.2

STRUCTURAL IMPLEMENTATION

When

doing

the structural

implementation,

it was desired to use the testbench and

packages that had been developed forthebehavioral version without_anymodification other

than the additions of new components or subprogram declarations that _{may be} needed.

The second goal was to

implement

the algorithm in a _completely structural _manner,

meaning only component

instantiations

and the interconnections between them. This

second goal was bent slightly to show some ofthe tools of

VHDL,

port conversions and

signal name changes.

Using

VHDL,

it is possible to have a single _entity and write multiple architectures

forit. That would have been possible, but was not done in this case. There were several

reasons for this.

First,

it was desired to use integer base types as much as possible in the

behavioral version,

including

many ofthe ports.

Second,

since it was desired to make the

structural version _only instantiations and

interconnections,

it was _necessary to have ports

that were based on bit or bitvector types. To meet _{these goals,} and the goal in the

previous paragraph of_using the testbench without _{modification,} a new level of

hierarchy

had to be introduced. The testbench instantiates a component with the same ports as the

behavioral version, but thisis a "dummy" component that_{only does} type _conversions, and

then instantiates the structural component. This intermediate component is called

HashConvert,

and

its

code

is

called HashConvert.vhdl which is shown in Appendix D5.

(44)

HashAlgXstru.vhdl and is shown in Appendix D6. The full

hierarchy

of the structural

model can be seenin Figure 7.

HashTB

HashConvert

HashAkjXstru

Regl2

Muxl2

XorMuIti

MiscComp

dataconv_pkg

textio

standard

Figure7

-Hierarchy

of objectsforthe structural code

When

looking

at the structural _{code, the} first reaction

is,

'why?'. It is about as

informative as

trying

to read a _{netlist; possible,} but _{time consuming,} error _prone, and

pointless. It merely lists the components _used, and the connections to their ports. It is

much better to have the structural code written

by

a _machine, it is faster and there are

fewer errors. The programs used to do this are called synthesis tools. In the code there

are two places where signals are assigned to other _signals, identifier changes needed to

satisfy VHDL's port mode requirements. The _only real point of interest is on the

(45)

The use ofthe port conversion

functions

was not _essential, but it demonstrates one

ofVHDL's

features

andallowed code reuse. _{The memory} element needs to read a file of

valuesthatrepresent theprevious codes. These values were produced _{using the} C program

previously

mentioned. Since it is not possible to

directly

produce a bit_vector from a C

program and

it

is possibletoproduce_{ASCII values,} thatis howallthe output was storedin

files.

This also allowed forgreater tool and platform

independence,

as ASCII is standard

across almost all platforms while

binary

_{file formats} are

frequently

different. Since the

data stored was

ASCII,

and VHDL _only allows

indexing

arrays with integers or

enumerated

types,

the _memory element was constructed as a high level element. The

behavioral version of the algorithm

implemented

the file open and _memory _reading

directly. In the structural versionthis was _undesirable, since thereis no piece ofhardware

that corresponds to _opening a

file.

To reuse the code from the behavioral version, a

memory component was constructed with ports that were a subtypes ofinteger. Then to

connect this component to the bit_vectors used in the structural _code, port conversion

functions

were needed. These are special functions that accept a signal in the port _map,

and return a signal of adifferent type. Thisallowed conversion to and from bit_vectors in

the structural algorithm andthe integers required

by

the _memoryelement. This would not

have been _{necessary, the} conversion could have been done inside the _memory _element, but

thismethod alloweddirectreuse ofthecode from thebehavioral version ofthealgorithm.

A structural description ofadevice does not provide much information beyond the

components used and the interconnection. The real information on

functionality

lies at a

lower level. Therecould

be

a

hierarchy

ofstructural _{elements, o}

Design of a hardware efficient key generation algorithm with a VHDL implementation

Rochester Institute of Technology

Recommended Citation

Department Chairman - Dr. Roy Czemikowski

of RIT to reproduce my thesis in whole or in part.

following

TABLE OF CONTENTS

BIBLIOGRAPHY

-Hierarchy

CDEGAUSS B-10

LIST OF TABLES AND EQUATIONS

ABSTRACT

implementing

demonstrating

increasing dramatically

being

location,

historically

CONCEPT

Rather,

Having

Hashing

functionality

incoming

history

2.1 SOFTWAREALGORITHM

incoming

hashing

hashing

being

2.2 HARDWARE ALGORITHM

interesting

hashing

LSB) by

hashing

developed,

being

Mathcad,

began,

3.1 HASHING

Hashing

likely

3.2 INPUT DATA

discussed,

directory

SYMUNIF

distribution,

3.3 DISCUSSION

discussed,

location,

histogram,

hashing

hashing

3.4 RESULTS

being

Again,

CDEUNIF CDEGAUSS CDEMYC1

being

fully

hashing

importantly,

fashion,

hashing

hashing

formalized

industry,

describing

forces

freeing

training

development

functionality

theory

detractors,

directly

help describing

beginning

4.1 BEHAVIORAL IMPLEMENTATION

discussion,

defining