A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM USING SYSTEMVERILOG


A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION

STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

Bahram Hakhamaneshi

B.S., Islamic Azad University, Iran, 2004

PROJECT

Submitted in partial satisfaction of

the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

at

CALIFORNIA STATE UNIVERSITY, SACRAMENTO

FALL

2009


A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION

STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

A Project

by

Bahram Hakhamaneshi

Approved by:

__________________________________, Committee Chair

Dr. Behnam Arad

____________________________

Date

__________________________________, Second Reader

Dr. Isaac Ghansah

____________________________

Date


Student: Bahram Hakhamaneshi

I certify that this student has met the requirements for format contained in the University

format manual, and that this project is suitable for shelving in the Library and credit is to

be awarded for the Project.

__________, Graduate Coordinator

Dr. Suresh Vadhva

Date

Abstract

of

A HARDWARE IMPLEMENTATION OF THE ADVANCED ENCRYPTION

STANDARD (AES) ALGORITHM USING SYSTEMVERILOG

by

Bahram Hakhamaneshi

The increasing need for protecting data communication in computer networks has led to the development of several cryptography algorithms. The Advanced Encryption Standard (AES) is a computer security standard issued by the National Institute of Standards and Technology (NIST) intended for protecting electronic data. Its specification is defined in Federal Information Processing Standards (FIPS) Publication 197. The AES cryptography algorithm encrypts and decrypts blocks of 128 bits and can use cipher keys that are 128, 192 or 256 bits wide (AES128, AES192, and AES256).

The Advanced Encryption Standard can be implemented in either software or

hardware. Hardware acceleration is the use of hardware to perform a task more

efficiently than is possible in software. In order to achieve higher performance in today’s

heavily loaded communication networks, utilization of hardware accelerators for


proposed. A unique feature of the proposed pipelined design is that the round keys,

which are consumed during different iterations of encryption, are generated in parallel

with the encryption process. This lowers the delay associated with each round of

encryption and reduces the overall encryption delay of a plaintext block. This leads to an

increase in the message encryption throughput.

The proposed pipelined design was modeled and validated in SystemVerilog

hardware description language. The testbench developed for validating the design kept

track of Functional Coverage to make sure the design is thoroughly verified. The design

was validated using the Synopsys VCS tool and synthesized using the Synopsys

Design-Compiler tool. The gate level netlist generated during the synthesis phase using the

LSI_10K technology library was capable of operating at 40MHz frequency. We expect

the timing and area of the gate level netlist to improve if a more efficient technology

library file is used for synthesis.

Finally, to get an estimate of the speed gain by the hardware implementation, a

virtual system was created using the Virtutech® Simics™ software to emulate the

execution of a “C” program that implements the AES128 encryption in software. The

Simics virtual system utilized in this project is based on Intel’s x86 architecture with the

440BX chipset and has a 2GHz Pentium4 processor.


The statistics gathered from the virtual system showed that it would take more than

30,000 CPU cycles to encrypt a block of plaintext, assuming one clock per instruction.

The results indicate that the hardware implementation proposed in this project is at least

60 times faster than the software implementation.

_______________________, Committee Chair

Dr. Behnam Arad

_______________________

Date


I would like to say thanks to Dr. Behnam Arad and Dr. Isaac Ghansah for their help

with defining and concluding this project. This project could not have reached this far

without their guidance and assistance. I also want to give special thanks to them for

reviewing this report and proofreading it in the very short time that was left before

submission deadline.

I also would like to thank my family, either those who were close or far away, for

encouraging and supporting me during the course of this project and all my life.

Dedication
Acknowledgments
List of Tables
List of Figures

Chapter

1. INTRODUCTION
2. ADVANCED ENCRYPTION STANDARD (AES)
   2.1 Overview
   2.2 Inputs, Outputs and the State
   2.3 Cipher Transformations
       2.3.1 SubBytes ( ) Transformation
       2.3.2 ShiftRows ( ) Transformation
       2.3.3 MixColumns ( ) Transformation
       2.3.4 AddRoundKey ( ) Transformation
   2.4 AES Key Expansion
3. AES128 DESIGN AND IMPLEMENTATION
   3.1 Overview
   3.2 Design Hierarchy
       3.2.1 AES128 Encryption Process
       3.2.2 AES128 Round Key Generation
   3.3 AES128 Pipelined Design
4. AES128 VERIFICATION
   4.1 Overview
   4.2 Testbench Infrastructure
   4.3 AES128_Interface
   4.4 AES128_Program
5. AES128 SYNTHESIS
   5.1 Overview
   5.2 Synthesis Methodology
   5.3 Synthesis Timing Result
   5.4 Synthesis Area Result
   5.5 Synthesis Constraint Violators Result
6. AES128 SOFTWARE IMPLEMENTATION
   6.1 Overview
   6.2 AES128 Software Implementation on a Simics Virtual System
7. CONCLUSION

Appendix A: AES128 Hardware Model Source Files
Appendix B: AES128 Testbench Source Files
Appendix C: AES128 Simulation Results
Appendix D: AES128 Implementation in “C” Language

1. Table 1 – AES Variations
2. Table 2 – AES S-box

1. Figure 1 – State Population and Results
2. Figure 2 – AES Cipher
3. Figure 3 – SubBytes Transformation
4. Figure 4 – ShiftRows Transformation
5. Figure 5 – MixColumns Transformation
6. Figure 6 – AddRoundKey Transformation
7. Figure 7 – KeyExpansion Algorithm
8. Figure 8 – Design Hierarchy
9. Figure 9 – AES128_Cipher_Top Module State Diagram
10. Figure 10 – AES128_Key_Expand Module State Diagram
11. Figure 11 – AES128_Key_Expand Module
12. Figure 12 – AES128_Rcon Module
13. Figure 13 – AES128 Pipelined Round Key Generation and Cipher Rounds
14. Figure 14 – AES128 Test Infrastructure
15. Figure 15 – AES128_Top Definition
16. Figure 16 – AES128_Interface Definition
17. Figure 17 – Class Definition in the AES128_Program
18. Figure 18 – AES128_Program Pseudo Code
19. Figure 19 – AES128_Testbench_Package Pseudo Code
20. Figure 20 – Sample Simulation Results

Chapter 1 INTRODUCTION

In today’s digital world, encryption is emerging as an integral part of all communication networks and information processing systems, for protecting both stored and in-transit data. Encryption is the transformation of plain data (known as plaintext) into unintelligible data (known as ciphertext) through an algorithm referred to as a cipher. Numerous encryption algorithms are now in common use, but the U.S. government has adopted the Advanced Encryption Standard (AES) to be used by Federal departments and agencies for protecting sensitive information. The National Institute of Standards and Technology (NIST) has published the specification of this encryption standard in the Federal Information Processing Standards (FIPS) Publication 197. [1]

Any conventional symmetric cipher, such as AES, requires a single key for both

encryption and decryption, which is independent of the plaintext and the cipher itself. It

should be impractical to retrieve the plaintext solely based on the ciphertext and the

encryption algorithm, without knowing the encryption key. Thus, the secrecy of the

encryption key is of high importance in symmetric ciphers such as AES. Software

implementation of encryption algorithms does not provide ultimate secrecy of the key

since the operating system, on which the encryption software runs, is always vulnerable

to attacks.


There are other important drawbacks in software implementation of any encryption

algorithm, including lack of CPU instructions operating on very large operands, word

size mismatch on different operating systems and less parallelism in software. In

addition, software implementation does not fulfill the required speed for time critical

encryption applications. Thus, hardware implementation of encryption algorithms is an

important alternative, since it provides ultimate secrecy of the encryption key, faster

speed and more efficiency through higher levels of parallelism.

Different versions of the AES algorithm exist today (AES128, AES192, and AES256) depending on the size of the encryption key. In this project, a hardware model for implementing the AES128 algorithm was developed using the SystemVerilog hardware description language. A unique feature of the design proposed in this project is that the round keys, which are consumed during different iterations of encryption, are generated in parallel with the encryption process.

The hardware model was then verified using a testbench that took advantage of SystemVerilog’s object oriented programming (OOP) features by constructing random test objects and providing them to the model. The validation process continued until the model was verified for a certain Functional Coverage. Then, the verified model was synthesized using the Synopsys Design-Compiler tool to get an estimate of the number of gates, area and timing of the hardware model.

In addition, the AES128 algorithm was modeled in the “C” language and was ported to a Simics virtual system. The statistics of the Simics virtual system were gathered to get an estimate of the time it would take to encrypt a plaintext block on the virtual system. Finally, the performances of the software and hardware implementations were compared.

The rest of the report is organized into six chapters. Chapter 2 gives an overview of the AES encryption algorithm and its different versions. In this chapter, the different types of transformations and steps that are involved in the AES encryption process are introduced.

Chapter 3 discusses the design and modeling of the hardware implementation of the

AES128 encryption algorithm by explaining the modules used in the design hierarchy,

their interconnections and state diagrams.

Chapter 4 covers the verification of the hardware model. In this chapter, a test

infrastructure is developed which fully validates the design. The testbench generates

random input test vectors for the hardware model and validates its functionality until a

certain Functional Coverage is met.

Chapter 5 covers the synthesis of the hardware model using the Synopsys Design Compiler synthesis tool. In this chapter, a script is developed to synthesize the design into a gate-level netlist using the LSI_10K library file. The synthesis results, including the timing and area of the netlist, are presented at the end of this chapter.

Chapter 6 covers the software implementation of the AES128 algorithm (in “C”

language) and porting it on a Simics virtual system. In addition, the software and

hardware implementation are compared based on the time it takes to encrypt a block of

plaintext.

Finally, in Chapter 7, the research work is summarized, and potential improvements and suggestions for future work on this project are included.

Chapter 2 ADVANCED ENCRYPTION STANDARD (AES)

2.1 Overview

This chapter is a summary of the Federal Information Processing Standards (FIPS) Publication 197 [1], issued by the National Institute of Standards and Technology (NIST), which specifies the Advanced Encryption Standard. Throughout the remainder of this chapter, the mathematical properties of the Advanced Encryption Standard (AES) are introduced using the information obtained from the AES specification.

The AES is a subset of a much larger encryption algorithm known as Rijndael, which was one of many proposals submitted to the NIST to compete for selection as a standard encryption algorithm. In October 2000, the NIST announced the Rijndael algorithm as the winner due to the best overall score in security, performance, efficiency, implementation capability and simplicity. [2]

The AES algorithm is a symmetric cipher. In symmetric ciphers, a single secret key

is used for both the encryption and decryption, whereas in asymmetric ciphers, there are

two sets of keys known as private and public keys. The plaintext is encrypted using the

public key and can only be decrypted using the private key.

In addition, the AES algorithm is a block cipher as it operates on fixed-length groups of bits (blocks), whereas in stream ciphers, the plaintext bits are encrypted one at a time, and the set of transformations applied to successive bits may vary during the encryption process.

The AES algorithm operates on blocks of 128 bits, using cipher keys with lengths of 128, 192 or 256 bits for the encryption process. Although the original Rijndael encryption algorithm was capable of processing different block sizes as well as using several other cipher key lengths, the NIST did not adopt these additional features in the AES. [1]

2.2 Inputs, Outputs and the State

The plaintext input and ciphertext output for the AES algorithms are blocks of 128 bits. The cipher key input is a sequence of 128, 192 or 256 bits. In other words, the length of the cipher key, $N_k$, is either 4, 6 or 8 words, which represents the number of columns in the cipher key. The AES algorithm is categorized into three versions based on the cipher key length. The number of rounds of encryption for each AES version depends on the cipher key size.

In the AES algorithm, the number of rounds is represented by $N_r$, where $N_r = 10$ when $N_k = 4$, $N_r = 12$ when $N_k = 6$, and $N_r = 14$ when $N_k = 8$. Table 1 illustrates the variations of the AES algorithm. For the AES algorithm the block size ($N_b$), which represents the number of columns comprising the State, is $N_b = 4$.

AES Version    Key Length (Nk words)    Block Size (Nb words)    Number of Rounds (Nr rounds)
AES128         4                        4                        10
AES192         6                        4                        12
AES256         8                        4                        14

Table 1 – AES Variations

The basic processing unit for the AES algorithm is a byte. As a result, the plaintext, ciphertext and the cipher key are arranged and processed as arrays of bytes. For an input, an output or a cipher key denoted by a, the bytes in the resulting array are referenced as $a_n$, where n is in one of the following ranges:

Block length = 128 bits, 0 <= n < 16
Key length = 128 bits, 0 <= n < 16
Key length = 192 bits, 0 <= n < 24
Key length = 256 bits, 0 <= n < 32

All byte values in the AES algorithm are presented as the concatenation of their individual bit values between braces in the order {b7, b6, b5, b4, b3, b2, b1, b0}. These bytes are interpreted as finite field elements using a polynomial representation:

$$b_7 x^7 + b_6 x^6 + b_5 x^5 + b_4 x^4 + b_3 x^3 + b_2 x^2 + b_1 x + b_0 = \sum_{i=0}^{7} b_i x^i$$

As an example, {10001001} (or {89} in hexadecimal) identifies the polynomial $x^7 + x^3 + 1$. The arrays of bytes in the AES algorithm are represented as $a_0 a_1 a_2 \ldots a_n$.
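The polynomial reading of a byte can be illustrated with a short Python sketch. Python is used here only for illustration and is not part of the project's SystemVerilog model; the function name `byte_to_polynomial` is a hypothetical helper introduced for this example.

```python
def byte_to_polynomial(value):
    """Return the GF(2^8) polynomial of a byte as a readable string.

    Bit b_i of the byte is the coefficient of x^i, so b7 is the
    coefficient of x^7 and b0 is the constant term.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    terms = []
    for i in range(7, -1, -1):
        if (value >> i) & 1:
            if i == 0:
                terms.append("1")
            elif i == 1:
                terms.append("x")
            else:
                terms.append(f"x^{i}")
    return " + ".join(terms) if terms else "0"


print(byte_to_polynomial(0b10001001))  # x^7 + x^3 + 1
```

The byte {10001001} from the text has bits 7, 3 and 0 set, which is why it maps to the three-term polynomial printed above.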

All the AES algorithm operations are performed on a two-dimensional 4x4 array of bytes called the State, and any individual byte within the State is referred to as $s_{r,c}$, where ‘r’ denotes the row and ‘c’ denotes the column. At the beginning of the encryption process, the State is populated with the plaintext. Then the cipher performs a set of substitutions and permutations on the State. After the cipher operations are conducted on the State, the final value of the State is copied to the ciphertext output as shown in the following figure.

[Figure: the input bytes in0 to in15 are loaded into the State bytes s(0,0) to s(3,3) column by column, and the final State is copied to the output bytes out0 to out15 in the same order.]

Figure 1 – State Population and Results

At the beginning of the cipher, the input array is copied into the State according to the following scheme:

$$s[r,c] = in[r + 4c] \quad \text{for } 0 \le r < 4 \text{ and } 0 \le c < 4,$$

and at the end of the cipher the State is copied into the output array as shown below:

$$out[r + 4c] = s[r,c] \quad \text{for } 0 \le r < 4 \text{ and } 0 \le c < 4.$$

2.3 Cipher Transformations

The AES cipher either operates on individual bytes of the State or an entire

row/column. At the start of the cipher, the input is copied into the State as described in

Section 2.2. Then, an initial Round Key addition is performed on the State. Round keys

are derived from the cipher key using the Key Expansion routine. The key expansion

routine generates a series of round keys for each round of transformations that are

performed on the State.
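The copy between the byte arrays and the State described in Section 2.2, s[r,c] = in[r + 4c] and out[r + 4c] = s[r,c], can be sketched in Python as follows. This is only an illustrative model; the function names `bytes_to_state` and `state_to_bytes` are assumptions made for this example and do not come from the project's source files.

```python
def bytes_to_state(block):
    """Copy 16 input bytes into a 4x4 State: s[r][c] = in[r + 4c]."""
    if len(block) != 16:
        raise ValueError("AES blocks are 16 bytes long")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Copy the State to the output array: out[r + 4c] = s[r][c]."""
    out = bytearray(16)
    for r in range(4):
        for c in range(4):
            out[r + 4 * c] = state[r][c]
    return bytes(out)


block = bytes(range(16))
state = bytes_to_state(block)
print(state[0])  # first row holds in0, in4, in8, in12: [0, 4, 8, 12]
```

The State is filled column by column, so consecutive input bytes end up in the same column rather than the same row.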

The transformations performed on the State are similar among all AES versions, but the number of transformation rounds depends on the cipher key length. The final round in all AES versions differs slightly from the first $N_r - 1$ rounds, as it has one fewer transformation performed on the State. Each round of the AES cipher (except the last one) consists of all of the following transformations:

- SubBytes( )
- ShiftRows( )
- MixColumns( )
- AddRoundKey( )

The AES cipher is described as pseudo code in Figure 2. [1] As shown in the pseudo code, all the $N_r$ rounds are identical with the exception of the final round, which does not include the MixColumns transformation. The array w[] represents the round keys that are generated by the key expansion routine. In the following sections, the individual transformations that are used in each encryption round are described.

Cipher(byte PlainText[4*Nb], byte CipherText[4*Nb], word w[Nb*(Nr+1)])
begin
    byte state[4,Nb]

    state = PlainText

    AddRoundKey(state, w[0, Nb-1])

    for round = 1 step 1 to Nr-1
        SubBytes(state)
        ShiftRows(state)
        MixColumns(state)
        AddRoundKey(state, w[round*Nb, (round+1)*Nb-1])
    end for

    SubBytes(state)
    ShiftRows(state)
    AddRoundKey(state, w[Nr*Nb, (Nr+1)*Nb-1])

    CipherText = state
end

Figure 2 – AES Cipher

2.3.1 SubBytes ( ) Transformation

The SubBytes is a byte substitution operation performed on individual bytes of the

State, as shown in Figure 3, using a substitution table called S-box.

[Figure: each State byte s(r,c) is replaced by the substituted byte s'(r,c).]

Figure 3 – SubBytes Transformation

The invertible S-box table is constructed by performing the following transformations on each byte of the State. [1]

- Take the multiplicative inverse of the byte in the finite field GF($2^8$).
- Apply the following transformation to the byte:

$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$

Here $b_i$ is the i-th bit of the byte and $c_i$ is the i-th bit of a constant byte with the value of {63}. The combination of the two transformations can be expressed in matrix form as shown below:

$$\begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$

The S-box table shown in Table 2 is constructed by performing the two transformations described earlier for all possible values of a byte, ranging from {00} to {ff}. For example, the substitution value for {53} is determined by the intersection of the row with index ‘5’ and the column with index ‘3’, which gives {ed}.

X \ Y   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0      63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1      ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2      b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3      04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4      09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5      53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6      d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7      51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8      cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9      60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a      e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b      e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c      ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d      70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e      e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f      8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

Table 2 – AES S-box
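The two construction steps of the S-box (multiplicative inverse in GF(2^8), then the bitwise affine transformation with the constant {63}) can be reproduced with the following Python sketch. It is an illustration only, not the lookup-table approach a hardware model would normally use; the helper names `gf_mul`, `gf_inverse` and `affine` are assumptions introduced for this example, and the inverse is computed by the simple (slow) method of raising the byte to the power 254.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B          # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); {00} is mapped to itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):       # a^254 equals a^-1 in a field of 256 elements
        result = gf_mul(result, a)
    return result


def affine(b):
    """b'_i = b_i ^ b_(i+4)%8 ^ b_(i+5)%8 ^ b_(i+6)%8 ^ b_(i+7)%8 ^ c_i, c = {63}."""
    out = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        out |= bit << i
    return out


SBOX = [affine(gf_inverse(x)) for x in range(256)]

print(f"{SBOX[0x53]:02x}")  # ed, the entry at row 5, column 3 of Table 2
```

Generating the table this way is a convenient cross-check of a hand-typed S-box, since every entry can be compared against the computed value.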

2.3.2 ShiftRows ( ) Transformation

The ShiftRows transformation cyclically shifts the last three rows of the state by

different offsets. The first row is left unchanged in this transformation. Each byte of the

second row is shifted one position to the left. The third and fourth rows are shifted left

by two and three positions, respectively. The ShiftRows transformation is illustrated in

Figure 4.

[Figure: the State before and after ShiftRows. Row 0 stays s(0,0) s(0,1) s(0,2) s(0,3); row 1 becomes s(1,1) s(1,2) s(1,3) s(1,0); row 2 becomes s(2,2) s(2,3) s(2,0) s(2,1); row 3 becomes s(3,3) s(3,0) s(3,1) s(3,2).]

Figure 4 – ShiftRows Transformation
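The row rotation shown in Figure 4 amounts to rotating row r of the State left by r byte positions. A minimal Python sketch, assuming the State is held as a list of four rows (the function name `shift_rows` is chosen for this example only):

```python
def shift_rows(state):
    """Rotate row r of the 4x4 State left by r byte positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Label every byte with its original position to make the movement visible.
state = [[f"s{r},{c}" for c in range(4)] for r in range(4)]
for row in shift_rows(state):
    print(" ".join(row))
```

In hardware this transformation needs no logic at all, since it is only a fixed rewiring of the State bytes.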

2.3.3 MixColumns ( ) Transformation

This transformation operates on the columns of the State, treating each column as a four-term polynomial over the finite field GF($2^8$). Each column is multiplied modulo $x^4 + 1$ with a fixed four-term polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ over GF($2^8$). The MixColumns transformation can be expressed as a matrix multiplication as shown below:

$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$$

The MixColumns transformation replaces the four bytes of the processed column with the following values:

$$s'_{0,c} = (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c}$$

$$s'_{1,c} = s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c}$$

$$s'_{2,c} = s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c})$$

$$s'_{3,c} = (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})$$

The “•” corresponds to the multiplication of polynomials in GF($2^8$) modulo an irreducible polynomial of degree 8. A polynomial is irreducible if its only divisors are one and itself. For the AES algorithm the irreducible polynomial is:

$$m(x) = x^8 + x^4 + x^3 + x + 1$$

The MixColumns transformation is illustrated in Figure 5. This transformation

together with ShiftRows, provide substantial diffusion in the cipher meaning that the

result of the cipher depends on the cipher inputs in a very complex way. In other words,

in a cipher with a good diffusion, a single bit change in the plaintext will completely

change the ciphertext in an unpredictable manner.

[Figure: MixColumns operates on the State one column at a time.]

Figure 5 – MixColumns Transformation
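The four MixColumns equations only need multiplication by {02} and {03} in GF(2^8), and {03}•a can be built from {02}•a XOR a. The Python sketch below illustrates this under the assumption that the State is stored as a list of four rows; the names `xtime`, `mul3`, `mix_column` and `mix_columns` are helpers introduced for this example.

```python
def xtime(a):
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mul3(a):
    """Multiply a byte by {03}: {03}.a = ({02}.a) xor a."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


def mix_columns(state):
    """Apply mix_column to every column of a 4x4 State stored as rows."""
    cols = [mix_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


# Column taken from the worked cipher example in FIPS 197, Appendix B.
print([f"{b:02x}" for b in mix_column([0xD4, 0xBF, 0x5D, 0x30])])
# ['04', '66', '81', 'e5']
```

Because {02}•a is a one-bit shift followed by a conditional XOR, each output byte reduces to a small XOR network, which is what makes this transformation inexpensive in hardware.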

2.3.4 AddRoundKey ( ) Transformation

During the AddRoundKey transformation, the round key values are added to the State by means of a simple Exclusive Or (XOR) operation. Each round key consists of $N_b$ words that are generated from the KeyExpansion routine. The round key values are added to the columns of the State as follows:

$$[s'_{0,c}, s'_{1,c}, s'_{2,c}, s'_{3,c}] = [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus [w_{round \cdot N_b + c}] \quad \text{for } 0 \le c < N_b$$

In the equation above, the round value is in the range $0 \le round \le N_r$. When round = 0, the cipher key itself is used as the round key, which corresponds to the initial AddRoundKey transformation displayed in the pseudo code in Figure 2.

The AddRoundKey transformation is illustrated in Figure 6.

[Figure: each column of the State is XORed with one word of the round key.]

Figure 6 – AddRoundKey Transformation
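The column-wise XOR of AddRoundKey can be sketched in Python as shown below. The sketch assumes that each round-key word is held as a 32-bit integer whose most significant byte lines up with row 0 of its column; the function name `add_round_key` and this word layout are assumptions made for the example.

```python
def add_round_key(state, round_words):
    """XOR column c of the State with round-key word w[round*Nb + c].

    state       -- 4x4 list of bytes, indexed as state[r][c]
    round_words -- four 32-bit integers, one per column; the most
                   significant byte of each word belongs to row 0
    """
    return [[state[r][c] ^ ((round_words[c] >> (24 - 8 * r)) & 0xFF)
             for c in range(4)] for r in range(4)]


# First column of the plaintext and first word of the cipher key from the
# worked example in FIPS 197, Appendix B (other columns left at zero).
state = [[0x32, 0, 0, 0], [0x43, 0, 0, 0], [0xF6, 0, 0, 0], [0xA8, 0, 0, 0]]
words = [0x2B7E1516, 0, 0, 0]
result = add_round_key(state, words)
print([f"{result[r][0]:02x}" for r in range(4)])  # ['19', '3d', 'e3', 'be']
```

Since XOR is its own inverse, applying the same round key a second time restores the original State.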

2.4 AES Key Expansion

The AES algorithm requires four words of round keys for each encryption round. That is a total of $4(N_r + 1)$ round key words, including the initial set of keys required for the first AddRoundKey transformation. All the round keys are derived from the cipher key itself.

According to the Federal Information Processing Standards (FIPS) Publication 197 [1], there is no restriction on the cipher key selection, as no weak cipher key has been identified for the AES algorithm. The expansion of the cipher key into the round keys is performed by the KeyExpansion algorithm as shown in the pseudo code in Figure 7. [1]

KeyExpansion(byte CipherKey[4*Nk], word w[Nb*(Nr+1)], Nk)
begin
    word temp

    i = 0
    while (i < Nk)
        w[i] = word(CipherKey[4*i], CipherKey[4*i+1], CipherKey[4*i+2], CipherKey[4*i+3])
        i = i+1
    end while

    i = Nk
    while (i < Nb * (Nr+1))
        temp = w[i-1]
        if (i mod Nk = 0)
            temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
        else if (Nk > 6 and i mod Nk = 4)
            temp = SubWord(temp)
        end if
        w[i] = w[i-Nk] xor temp
        i = i + 1
    end while
end

Figure 7 – KeyExpansion Algorithm

In the above pseudo code, the array w[] represents the round keys that are generated by the KeyExpansion routine and $N_k$ represents the size of the cipher key in words. Depending on the version of the AES algorithm, $N_k$ = 4, 6 or 8. The first $N_k$ words of the expanded key are filled with the cipher key itself.

The SubWord( ) function applies the same S-box substitution to each of the four bytes in the word. The RotWord( ) function takes a word [a0,a1,a2,a3] as input, performs a cyclic shift and returns the word [a1,a2,a3,a0]. The round constant word array, Rcon[i], contains a 32-bit value given by [$\{02\}^{i-1}$,{00},{00},{00}].

Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word $N_k$ positions earlier, w[i-$N_k$]. For words in positions that are a multiple of $N_k$, two transformations are first applied to the previous word, w[i-1]: a cyclic shift of its bytes (RotWord), followed by an S-box table lookup on all four bytes (SubWord). Afterwards, the result is XORed with a round constant value, Rcon[i/$N_k$].

The KeyExpansion routine for AES256 ($N_k$ = 8) is slightly different from the AES128 and AES192 ones: when i mod $N_k$ = 4, an additional SubWord function is applied to the previous word, w[i-1], prior to the XOR with w[i-$N_k$].
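The KeyExpansion pseudo code of Figure 7 can be turned into a runnable Python sketch as follows. This is an illustration, not the project's SystemVerilog implementation: the S-box is regenerated from its definition so that the block is self-contained, words are held as 32-bit integers with the first key byte in the most significant position, and the helper names (`gf_mul`, `build_sbox`, `sub_word`, `rot_word`, `key_expansion`) are assumptions introduced for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return result


def build_sbox():
    """Generate the AES S-box: GF(2^8) inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):            # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        y = inv
        for shift in (1, 2, 3, 4):          # xor with four left rotations
            y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(w):
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return ((SBOX[(w >> 24) & 0xFF] << 24) | (SBOX[(w >> 16) & 0xFF] << 16)
            | (SBOX[(w >> 8) & 0xFF] << 8) | SBOX[w & 0xFF])


def rot_word(w):
    """Cyclic byte shift: [a0,a1,a2,a3] -> [a1,a2,a3,a0]."""
    return ((w << 8) & 0xFFFFFFFF) | (w >> 24)


def key_expansion(key):
    """Expand a 16-, 24- or 32-byte cipher key into Nb*(Nr+1) words."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits long")
    nk = len(key) // 4
    nr = nk + 6
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 1                                # {02}^(i/Nk - 1), starting at {01}
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w


# Cipher key from the key expansion example in FIPS 197, Appendix A.1.
w = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(f"{w[4]:08x} {w[43]:08x}")  # a0fafe17 b6630ca6
```

For AES128 the result is 44 words, that is, eleven round keys of four words each.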


Chapter 3 AES128 DESIGN AND IMPLEMENTATION

3.1 Overview

In this chapter, a hardware model for implementing the AES128 algorithm is

introduced. The model is implemented using the SystemVerilog hardware description

language [5]. This chapter covers the design and implementation issues of the AES128

algorithm. In the next chapter, a test infrastructure is presented that thoroughly tests the

functionality of the implemented model. The hardware model developed in this chapter

is synthesizable. This means that the model provides a cycle-by-cycle RTL description

of the circuit that a logic synthesis tool can convert to an optimized gate-level netlist. [3]

The modeling process utilized in this project is the bottom-up approach. This

means that the leaf components in the design hierarchy were developed first and the

higher-level modules were constructed by instantiating their subcomponents and

connecting them with the internal signals. All the modules in the design hierarchy were

modeled in behavioral style, but the root module consisted of data flow modeling as well

to implement the four major cipher transformations.


3.2 Design Hierarchy

The proposed AES128 hardware model is a 3-level hierarchical design as shown in

Figure 8. The root module in the hierarchy is the AES128_cipher_top. This module

implements the AES128 pseudo code displayed in Figure 2. It has two 128-bit inputs for

receiving the cipher key and the plaintext. There is also a single bit input signal, ‘Ld’,

which is used to indicate the availability of a new set of plaintext or cipher key on the

input ports. The completion of the encryption process is indicated by asserting the ‘done’

single bit output.

[Block diagram: modules AES128_Cipher_Top, AES128_Key_Expand and AES128_Rcon, with the signals clk, rst, ld, plaintext (128 b), cipherkey (128 b), ciphertext (128 b) and done.]

Figure 8 – Design Hierarchy

A unique feature of the proposed design is that the AES128_Key_Expand module is

pipelined with the AES128_cipher_top module. While the AES128_cipher_top module

is performing an iteration of the encryption transformations on the State using the

previously generated round keys, the AES128_Key_Expand produces the next round’s

set of keys to be used by the root module in the next encryption iteration.

3.2.1 AES128 Encryption Process

The AES128_cipher_top module state diagram is shown in Figure 9. There are ten

rounds of transformations represented by r1 to r10 states. The four cipher

transformations introduced in section 2.3 are applied to each state. The r0 state

corresponds to the initial AddRoundKey transformation in Figure 2.

After leaving the Reset state, the AES128_Cipher_Top module waits for assertion of the ‘Ld’ signal, which indicates that a valid set of plaintext and cipher key is available on the input ports. After reaching the r0 state, there is a transition on every clock cycle for the next ten cycles, as ten rounds of encryption are applied to the State.

After going through ten rounds of transformations, the ‘done’ signal is asserted to

indicate the completion of cipher and availability of the ciphertext on the corresponding

output port.


Figure 9 – AES128_Cipher_Top Module State Diagram

3.2.2 AES128 Round Key Generation

The round keys used by the AES128_Cipher_Top module are generated by the AES128_Key_Expand and AES128_RCon modules. These two modules operate based on the state diagram shown in Figure 10, which is slightly different from the one used for the encryption process.

[State diagram: a Reset state followed by states r0 through r10, with a transition on each rising clock edge; in states r0 to r10 the outputs w0, w1, w2 and w3 carry the four round-key words.]

Figure 10 – AES128_Key_Expand Module State Diagram

In the state diagram shown above, the ‘Ld’ signal is checked in the ‘r0’ state and if

asserted, then the cipher key is provided to the AES128_Cipher_Top module to be used

for the initial AddRoundKey transformation.

The AES128_Key_Expand module generates four 32-bit keys for each round of the encryption process, by using the cipher key. Figure 11 shows the block diagram of the AES128_Key_Expand module. The cipher key is passed to this module through a 128-bit input port, and the round keys are generated on the four output ports.

[Block diagram: AES128_Key_Expand with inputs clk, rst, ld and cipherkey (128 b), and four 32-bit outputs w0, w1, w2 and w3.]

Figure 11 – AES128_Key_Expand Module

There is a 32-bit round constant value, which is used by the key expansion algorithm to generate the round keys. This value varies for each encryption round and, for round $i = 1$ to $i = N_r = 10$, is given by $[\{02\}^{i-1}, \{00\}, \{00\}, \{00\}]$, where the power of $\{02\}$ is computed in GF($2^8$). The AES128_RCon module is used to generate this value as shown in Figure 12. The AES128_RCon module also operates based on the state diagram shown in Figure 10.

[Block diagram: AES128_RCon with inputs clk, rst, and ld; output 32-bit rcon]

Figure 12 – AES128_RCon Module
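
The ten round constants can be reproduced with a short calculation. The Python sketch below is an illustration and not the RTL of the AES128_RCon module; the helper names xtime and rcon are chosen for the example. It multiplies by {02} in GF(2^8) repeatedly, reducing by the AES polynomial whenever the result overflows eight bits.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def rcon(i):
    """Return the 32-bit round constant [{02}^(i-1), {00}, {00}, {00}] for round i (1..10)."""
    value = 0x01
    for _ in range(i - 1):
        value = xtime(value)
    return value << 24                 # the constant occupies the most significant byte


constants = [rcon(i) for i in range(1, 11)]
print([f"{c:08x}" for c in constants])
```

The first eight constants are plain powers of two; the reduction step only takes effect for rounds nine and ten.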

3.3 AES128 Pipelined Design

As stated earlier in this chapter, the round key generation in the proposed design is

pipelined with the encryption rounds. The pipelined operation of the round key

expansion and the cipher is shown in Figure 13. Each AES encryption round ‘n’ (white

cells) is pipelined with the key generation for round ‘n+1’ (gray cells).

[Timing diagram: cipher states (reset, wait for ld, r0 … r10) shown alongside round key generation states (reset, r0 … r10)]

Figure 13 – AES128 Pipelined Round Key Generation and Cipher Rounds

The most important advantage of the pipelined design is the lower delay for each

encryption iteration, since the round keys for each encryption iteration are present at the

beginning of the iteration cycle. The lower delay in each encryption iteration means

faster completion of each round of encryption. This reduces the overall encryption delay

and allows the design to operate at higher clock frequencies. The higher clock frequency

will increase the message encryption rate (throughput) making this design suitable for

time critical encryption applications.
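
The relationship between clock frequency and throughput can be made concrete with a small calculation. The Python sketch below is an estimate under stated assumptions, not a measured result: it assumes that one 128-bit block completes every ten clock cycles (the ten round transitions described in Section 3.2.1), that blocks are processed back to back, and that load and idle cycles are ignored. The function name and the example frequencies are illustrative.

```python
def throughput_bits_per_second(clock_hz, cycles_per_block=10, block_bits=128):
    """Estimate throughput for back-to-back blocks, ignoring load and idle cycles."""
    return block_bits * clock_hz / cycles_per_block


for clock_mhz in (40, 80):
    mbps = throughput_bits_per_second(clock_mhz * 1e6) / 1e6
    print(f"{clock_mhz} MHz -> {mbps:.0f} Mbit/s")
```

Under these assumptions the throughput scales linearly with the clock frequency, which is why shortening the critical path of each round matters.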

Chapter 4 AES128 VERIFICATION

4.1 Overview

In this chapter, we describe the test infrastructure that is developed in

SystemVerilog to verify the functionality of the model described in the previous chapter.

The simulation was done using the Synopsys VCS tool. The testbench fully validated the

design by constructing random cyclic test vectors for the plaintext and the cipher key,

passing them to the model, and comparing the ciphertext to the expected result.

4.2 Testbench Infrastructure

There are four major steps involved in verifying a design using an HDL:

test vector generation, passing the test vectors to the design and capturing the design

response, determining correctness by comparing the design response with the expected

results, and measuring the verification coverage. The test infrastructure described in this

chapter performs all the above steps in a systematic way.

The AES128 test infrastructure contains several components, some of which are

unique SystemVerilog features. These SystemVerilog features make the verification of a

design more reliable and more structured. The test infrastructure components are

shown in Figure 14.

Figure 14 – AES128 Test Infrastructure

The test infrastructure utilizes the SystemVerilog program block, which has

multiple implicit timing regions to evaluate the design events separately from the

testbench events. The program block is connected to the model through another unique

feature of SystemVerilog, called the Interface.

The Interface bundles the connections between the testbench and the design while

enforcing the synchronization and communication protocol between the two entities. [4]

The definition of the AES128_Top module in SystemVerilog is shown in Figure 15,

which has the high-level instantiation of the modules constructing the test infrastructure.

[Block diagram for Figure 14: AES128_Top contains the Clock Generator (Clk), AES128_Program, AES128_Interface, and the design modules AES128_Cipher_Top, AES128_Key_Expand, and AES128_rcon]

module top;
  bit clk;
  always #5 clk=~clk;
  AES128_interface intf(clk);
  AES128_program prog(intf);
  AES128_cipher_top aes(intf);
endmodule

Figure 15 – AES128_Top Definition

The AES128_Top module instantiates the design, the Interface, and the Program. The Interface and the Program constructs are discussed in the next two sections. The clock generator is defined inside the AES128_Top module as well, to avoid any potential race conditions. [4]

4.3 AES128_Interface

As designs are becoming more complex, the number of module ports and the

complexity of the interconnections between the modules are also increasing. The

SystemVerilog Interface construct is the solution for properly connecting the modules as

it provides an intelligent means of communication between several modules.

The Interface bundles the ports together and enforces synchronization between the

modules connected through it. The Interface can provide connectivity between design

modules and/or testbench. The modport construct is used in an Interface to specify the

direction of signals that are bundled together and to group the signals that are

synchronous to a specific clock. In this project, the SystemVerilog Interface was only

used to connect the high-level design with the testbench as shown in Figure 14. As a

result, there were two modports declared for the Interface in this project.

In an Interface, the signals that are synchronous to a clock are defined inside a

Clocking Block to ensure correct timing between the testbench and the high-level design.

This ensures that any synchronous signal is driven or sampled with respect to clock and

eliminates the potential race condition that exists between the testbench and high-level

design written in Verilog. The AES128_Interface definition is shown in Figure 16.

interface AES128_interface(input bit clk);
  logic rst, ld, done;
  logic [127:0] key, text_in, text_out;

  clocking cb @(posedge clk);
    output ld;
    output key;
    output text_in;
    input done;
    input text_out;
  endclocking

  modport dut(
    input clk,
    input rst,
    input ld,
    input key,
    input text_in,
    output done,
    output text_out);

  modport tb(
    input clk,
    output rst,
    clocking cb);
endinterface

4.4 AES128_Program

In Verilog, a testbench is basically another module which is connected to the

high-level design. This can cause a race condition between the testbench and the design. [4]

The SystemVerilog hardware description language introduces a new construct called Program

to be used as the testbench. “The SystemVerilog Program, having one (or more) entry

points, is closer to a program in C, than Verilog’s many small blocks of concurrently

executing hardware” [4]. It also has multiple implicit timing regions to evaluate the

design events separately from the testbench events, eliminating any race condition

between the design under test and the testbench.

The testbench described in this chapter consists of a single Program, which uses the

Object Oriented Programming feature of SystemVerilog to dynamically build random test

vectors. This is done by defining a Class inside the AES128_Program that encapsulates

two random cyclic variables (Properties) for generating stimulus to the high-level design.

The class defined in the AES128_Program is shown in Figure 17.

As stated earlier in this chapter, another important feature of a testbench is keeping

track of the verification coverage. In other words, to make sure that a design is

thoroughly verified, the testbench needs to test all the design features. “Functional

Coverage is a measure of which design features have been exercised by the test”. [4]

Functional Coverage is done by means of Cover Groups defined inside the

SystemVerilog Program. Each Cover Group consists of multiple Cover Points that are

the variables used for generating stimulus for the design under test. As shown in

Figure 17, the class defined in the AES128_Program uses a single Cover Group to keep

track of the 128-bit plain_text and cipher_key stimuli. Because the Synopsys VCS

compiler limits cyclic random (randc) variables to no more than 16 bits, the 128-bit

stimuli are broken into arrays of 16-bit elements. Each array element is declared as a

Cover Point inside the Cover Group to be sampled together for measuring the Functional

Coverage.

class Transaction;
  randc bit [15:0] plain_text[8];
  randc bit [15:0] cipher_key[8];

  covergroup Coverage;
    coverpoint this.plain_text[0];
    coverpoint this.plain_text[1];
    coverpoint this.plain_text[2];
    coverpoint this.plain_text[3];
    coverpoint this.plain_text[4];
    coverpoint this.plain_text[5];
    coverpoint this.plain_text[6];
    coverpoint this.plain_text[7];
    coverpoint this.cipher_key[0];
    coverpoint this.cipher_key[1];
    coverpoint this.cipher_key[2];
    coverpoint this.cipher_key[3];
    coverpoint this.cipher_key[4];
    coverpoint this.cipher_key[5];
    coverpoint this.cipher_key[6];
    coverpoint this.cipher_key[7];
  endgroup

  function new;
    Coverage = new();
  endfunction
endclass

The AES128_Program pseudo code is shown in Figure 18. This testbench verifies

the design until the Functional Coverage is 100%. The verification procedure involves

generating the stimuli and passing them through the AES128_Interface to the design

under test and verifying correctness of the results obtained from the design.

class Transaction
  // see Figure 17
endclass

initial begin
  // reset the design
  while (Functional_Coverage < 100) begin
    // randomize the cover points
    // populate plain_text & cipher_key using the cover points
    // calculate the expected ciphertext using the following function
    aes128_cipher(plain_text, cipher_key, expected_cipher_text);
    // pass the stimuli to the design and wait for the result
    // compare the expected result with the ciphertext generated by
    // the design to determine correctness
    // sample the Functional Coverage percentage
  end
  $finish;
end

Figure 18 – AES128_Program Pseudo Code
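
The structure of this loop can be exercised with a small executable model. The Python sketch below mirrors the steps of the pseudo code but is only an illustration: the XOR functions are stand-ins for the reference function and the design under test, and the 8-bit stimuli, the four coverage bins per field, and the seed are hypothetical values chosen so that the loop finishes quickly.

```python
import random

BINS = 4  # hypothetical number of coverage bins per stimulus field


def reference_model(text, key):
    return text ^ key            # stand-in for the aes128_cipher reference function


def dut(text, key):
    return text ^ key            # stand-in for the hardware model


def run_tests(seed=1, width=8):
    """Generate random stimuli until every coverage bin of both fields has been hit."""
    rng = random.Random(seed)
    hit = {"text": set(), "key": set()}
    tests = mismatches = 0
    coverage = 0.0
    while coverage < 100.0:
        text = rng.randrange(1 << width)           # generate the stimuli
        key = rng.randrange(1 << width)
        expected = reference_model(text, key)      # calculate the expected result
        actual = dut(text, key)                    # pass the stimuli to the design
        mismatches += expected != actual           # compare the two results
        hit["text"].add(text * BINS >> width)      # sample coverage (equal-width bins)
        hit["key"].add(key * BINS >> width)
        coverage = 100.0 * sum(len(b) for b in hit.values()) / (BINS * len(hit))
        tests += 1
    return tests, mismatches, coverage


tests, mismatches, coverage = run_tests()
print(f"{tests} tests, {mismatches} mismatches, coverage {coverage:.1f}%")
```

The loop ends on coverage rather than on a fixed test count, so the number of tests depends on how quickly the random stimuli reach the remaining bins.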

To verify the correct functionality of the design under test, a C-style function is

developed in SystemVerilog, which takes the stimuli as input and calculates the expected

ciphertext. This function is defined as part of a package that contains all the variables and

routines involved in the encryption process as shown in Figure 19.

package AES128_testbench_package;

  logic [7:0] state [4][4];

  function aes128_KeyExpansion(input bit [127:0] cipher_key);
    // generate the round keys
  endfunction

  function aes128_SubBytes();
    // performs SubBytes transformation on the state
  endfunction

  function aes128_ShiftRows();
    // performs ShiftRows transformation on the state
  endfunction

  function aes128_AddRoundKey(input int round);
    // performs AddRoundKey transformation on the state
  endfunction

  function aes128_MixColumns();
    // performs MixColumns transformation on the state
  endfunction

  /*********************************************************************/
  function aes128_cipher(
    input bit [127:0] plain_text,
    input bit [127:0] cipher_key,
    output [127:0] expected_cipher_text);

    state = plain_text;
    aes128_KeyExpansion(cipher_key);
    aes128_AddRoundKey(0);
    for (round = 1; round < 10; round++) begin
      aes128_SubBytes();
      aes128_ShiftRows();
      aes128_MixColumns();
      aes128_AddRoundKey(round);
    end
    aes128_SubBytes();
    aes128_ShiftRows();
    aes128_AddRoundKey(10);
    expected_cipher_text = state;
  endfunction

endpackage
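
For readers who want to try the same sequence of transformations in software, the following Python sketch follows the structure of aes128_cipher. It is an independent illustration based on FIPS-197, not a translation of the package used in this project; the byte-list state layout, the computed S-box, and all helper names are assumptions made for the example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def sbox_entry(x):
    """S-box entry: multiplicative inverse in GF(2^8) followed by the affine map."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
    result = 0x63
    for shift in range(5):
        result ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return result


SBOX = [sbox_entry(x) for x in range(256)]


def expand_key(key):
    """Return the 11 round keys, each as a list of 16 bytes."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rc = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]   # RotWord, then SubWord
            t[0] ^= rc                              # round constant
            rc = gf_mul(rc, 2)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # Byte i sits in row i % 4, column i // 4; row r rotates left by r columns.
    return [s[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_columns(s):
    out = []
    for c in range(0, 16, 4):
        a = s[c:c + 4]
        out += [gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3) ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]
                for r in range(4)]
    return out


def add_round_key(s, k):
    return [a ^ b for a, b in zip(s, k)]


def aes128_cipher(plain_text, cipher_key):
    """Encrypt one 16-byte block with a 16-byte key; returns 16 bytes."""
    keys = expand_key(cipher_key)
    state = add_round_key(list(plain_text), keys[0])
    for rnd in range(1, 10):
        state = add_round_key(mix_columns(shift_rows(sub_bytes(state))), keys[rnd])
    state = add_round_key(shift_rows(sub_bytes(state)), keys[10])   # no MixColumns in round 10
    return bytes(state)


# Example vector from FIPS-197, Appendix C.1.
cipher_text = aes128_cipher(bytes.fromhex("00112233445566778899aabbccddeeff"),
                            bytes.fromhex("000102030405060708090a0b0c0d0e0f"))
print(cipher_text.hex())
```

A model of this kind can be checked against the example vectors published in FIPS-197 before it is used as a reference for the hardware.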

The complete simulation result of the testbench is included in Appendix C.

Figure 20 illustrates the simulation result for the first three test cases. Each test case starts

with randomizing the cover points to populate the plaintext and cipher key inputs to the

design under test. Then, the expected ciphertext is calculated using the aes128_cipher

function shown in Figure 19. After the design under test has encrypted the plaintext and

the “done” signal is asserted, the ciphertext generated by the hardware model is compared

with the expected result to catch any mismatch. The last step in each test case is gathering

the Functional Coverage and continuing with the next test case until all design features

are tested.

Test# 0
plain_text=55f529e00b1a3f14d8a746860e9b533e
cipher_key=bbda8d5457141b255a022fee50b6461c
expected_cipher_text:116340860130033742714813403090106826404
intf.cb.text_out: 116340860130033742714813403090106826404
*****+++++Match+++++*****
Functional Coverage = %1.562500

Test# 1
plain_text=37500380d9d6dccbf474334e02c23ec9
cipher_key=fd1f4dd414ec0fec5078a0a5ef328294
expected_cipher_text:279883244544087465675915927115776104969
intf.cb.text_out: 279883244544087465675915927115776104969
*****+++++Match+++++*****
Functional Coverage = %3.125000

Test# 2
plain_text=dd27152407a1dfc8f2c67423377b3d28
cipher_key=e9a308df435809a059ce2b9e26b08c8b
expected_cipher_text: 55911193611511870268248153978729662868
intf.cb.text_out: 55911193611511870268248153978729662868
*****+++++Match+++++*****
Functional Coverage = %4.394531
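
The coverage percentages in this log can be reproduced with a short calculation. The Python sketch below is an interpretation of the log, not output from the simulator. It assumes that each of the 16 cover points is divided into 64 equal-width automatic bins (the SystemVerilog default for auto_bin_max), that the overall figure is the average over the cover points, and that each 128-bit value printed in the log is the concatenation of its eight 16-bit array elements. The function names are illustrative.

```python
AUTO_BINS = 64      # assumed: default number of automatic bins per cover point
CHUNK_BITS = 16


def chunks(hex_value):
    """Split a 128-bit hex string into eight 16-bit integers."""
    return [int(hex_value[i:i + 4], 16) for i in range(0, 32, 4)]


def coverage_after_each_test(tests):
    """Return the cumulative coverage percentage after each (plain_text, cipher_key) pair."""
    hit = [set() for _ in range(16)]              # one set of hit bins per cover point
    history = []
    for plain_text, cipher_key in tests:
        for point, value in enumerate(chunks(plain_text) + chunks(cipher_key)):
            hit[point].add(value * AUTO_BINS >> CHUNK_BITS)   # equal-width bin index
        history.append(100.0 * sum(len(b) for b in hit) / (16 * AUTO_BINS))
    return history


tests = [
    ("55f529e00b1a3f14d8a746860e9b533e", "bbda8d5457141b255a022fee50b6461c"),
    ("37500380d9d6dccbf474334e02c23ec9", "fd1f4dd414ec0fec5078a0a5ef328294"),
    ("dd27152407a1dfc8f2c67423377b3d28", "e9a308df435809a059ce2b9e26b08c8b"),
]
history = coverage_after_each_test(tests)
print(history)   # [1.5625, 3.125, 4.39453125]
```

Under these assumptions, the first two tests each hit 16 new bins out of 1,024, while the third test hits only 13 new bins because three of its elements fall into bins that were already covered. This is why the coverage grows by less than 1.5625% in the third test.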

Chapter 5 AES128 SYNTHESIS

5.1 Overview

A primary objective of this project was to develop a synthesizable model for the

AES128 encryption algorithm. Synthesis is the process of converting the register transfer

level (RTL) representation of a design into an optimized gate-level netlist. This is a

major step in ASIC design flow that takes an RTL model closer to a low-level hardware

implementation.

Synthesis consists of three main steps. The first step is the “Translation”, which

involves converting the RTL description of a design into a non-optimized intermediate

representation that is used by the synthesis tool. The second step is the “logic

optimization”, which optimizes the internal representation by removing redundant logic

and performing Boolean logic optimizations. The third step is called “technology

mapping & optimization” which maps the internal representation to an optimized gate

level representation using the technology library cells based on design constraints. [3]

In this chapter, we describe how the Synopsys Design Compiler tool was utilized to

synthesize the verified AES128 model, by using a script that was developed to perform

the synthesis based on certain constraints. The script generates several reports about the

synthesis outcome including timing and area estimates.

5.2 Synthesis Methodology

The first step in the synthesis process is to read all the components in the design

hierarchy. There are three components in the 3-level design hierarchy that need to be

synthesized. Since the RTL model utilizes a SystemVerilog “Package”, the

synthesis tool needs to enable the semantics of a package. In addition, the synthesis tool

needs to know if there are multiple instances of calling an automatic function in the

design, to preserve separate values for each instance.

The following Synopsys Design Compiler (DC) shell commands enable package and

automatic function utilizations:

set hdlin_sv_packages "enable"

set hdlin_infer_function_local_latches "true"

Then, the package and the modules in the design hierarchy are read using the following

commands:

read_file -format sverilog {./AES128_DUT_package.sv}
read_file -format sverilog {./AES128_rcon.sv}
read_file -format sverilog {./AES128_key_expand.sv}
read_file -format sverilog {./AES128_cipher_top.sv}

After reading the design files, they are “Analyzed” and “Elaborated” through

which the RTL code is converted into the Synopsys Design Compiler internal format. [6]

The intermediate results are stored in the defined “working library”. The following DC

commands are used for these steps:

analyze -library WORK -format sverilog {./AES128_rcon.sv}
analyze -library WORK -format sverilog {./AES128_key_expand.sv}
analyze -library WORK -format sverilog {./AES128_cipher_top.sv}
elaborate AES128_rcon -architecture verilog -library WORK
elaborate AES128_key_expand -architecture verilog -library WORK
elaborate AES128_cipher_top -architecture verilog -library WORK

Then, the “dont_touch” attribute is removed from all the modules in the design

hierarchy so that during the optimization phase the tool can modify the modules. The

following DC command is used for this step:

remove_attribute [find design -hierarchy] dont_touch

After this step, a 40MHz clock signal is applied to the clock port of the root

module, and the synthesis tool is programmed not to modify the clock tree during the

optimization phase. In addition, arbitrary input and output delays of 5ns with respect to the clock

are applied to all input and output ports (except the clock port itself) to set a safe

margin by considering any unintended source of delay such as the delay associated with

driving module/modules.

Then, the design is constrained with a hypothetical maximum area of zero to

force the tool to make the gate level netlist as compact as possible. The following DC

commands are used for these steps:

create_clock -name clk -period 25 [find port intf_clk]
set_dont_touch_network [find clock "clk"]
set non_clock_ports [remove_from_collection [all_inputs] [get_ports intf_clk]]
set_input_delay 5 $non_clock_ports -clock clk
set_output_delay 5 [all_outputs]

set_max_area 0
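
The numbers in these constraints are related by simple arithmetic. The Python sketch below shows the relationship and is only a back-of-the-envelope aid: a 40MHz clock corresponds to the 25ns period passed to create_clock, and a 5ns external delay on a port leaves 20ns of that period for the logic inside the design on a path that starts or ends at that port. Setup time and clock uncertainty are ignored here, and the function names are illustrative.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def internal_budget_ns(freq_mhz, external_delay_ns):
    """Time left inside the design for a path that starts or ends at a constrained port."""
    return period_ns(freq_mhz) - external_delay_ns


print(period_ns(40))               # 25.0
print(internal_budget_ns(40, 5))   # 20.0
```

Paths between two internal flip-flops are not affected by the port delays and have the full clock period available.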

In the next steps, the tool is programmed to consider a unique design for each cell

instance by removing the multiply-instantiated hierarchy in the current design. Then, the

synthesis script removes the boundaries from all the components in the design hierarchy

and removes all levels of hierarchy.

uniquify

set_boundary_optimization [find design -hierarchy] true
ungroup -all -flatten -all_instances

Finally, the tool compiles the design with high effort and reports any warning

related to the mapping and final optimization step. At the end, the tool generates reports for

the optimized gate level netlist area, the worst combinational path timing, and any

constraint violations:

report_attribute > ./Synthesis_Reports_Attribute.txt
report_area > ./Synthesis_Reports_Area.txt
report_constraints -all_violators > ./Synthesis_Reports_Constraint_Violaters.txt
report_timing -path full -delay max -max_paths 1 -nworst 1 > ./Synthesis_Reports_Timing.txt

5.3 Synthesis Timing Result

The synthesis tool optimizes the combinational paths in a design. In general, four

types of combinational paths can exist in any design: [3]

1- Input port of the design under test to the input of an internal flip-flop

2- Output of an internal flip-flop to the input of another flip-flop

3- Output of an internal flip-flop to an output port of the design under test

4- A combinational path connecting the input and output ports of the design under test
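
For a path of type two, the setup check states that the path delay (Clock-To-Q delay plus the delays of the combinational gates) plus the setup time of the destination flip-flop must not exceed the clock period. The Python sketch below applies this check to the values reported for this design in this section: a 24.09ns path delay, a 0.85ns setup time, and a 40MHz clock. It works in integer picoseconds to avoid rounding effects, and the function and variable names are illustrative.

```python
def setup_slack_ps(period_ps, path_delay_ps, setup_ps):
    """Setup slack of a flip-flop to flip-flop path; a non-negative value meets timing."""
    return period_ps - (path_delay_ps + setup_ps)


period_ps = 10**12 // 40_000_000      # 40 MHz clock -> 25,000 ps
slack_ps = setup_slack_ps(period_ps, path_delay_ps=24_090, setup_ps=850)
print(f"slack = {slack_ps} ps")       # slack = 60 ps
```

The positive slack is small, so the worst path uses almost the entire 25ns clock period.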

The last DC command in the script developed in the previous section instructs the tool

to report the path with the worst timing. In this case, the path with the worst timing is a

combinational path of type two. The delay associated with this path is the summation of

delays of all combinational gates in the path plus the Clock-To-Q delay of the originating

flip-flop, which was calculated as 24.09ns. Adding the setup time of the destination

flip-flop in this path, which is 0.85ns, gives 24.94ns, which fits within the 25ns period of

the 40MHz clock signal. The delays of combinational gates, setup time of
