VERILOG DESIGN OF A 256-BIT AES CRYPTO PROCESSOR CORE
LAI YIT PIN
UNIVERSITI TEKNOLOGI MALAYSIA
iii
iv
ACKNOWLEDGEMENT
In preparing this thesis, I was in contact with many people, researchers,
academicians, and practitioners. They have contributed towards my understanding
and thoughts. In particular, I wish to express my sincere appreciation to my thesis
supervisor, Professor Dr. Mohamed Khalil bin Hj. Mohd Hani, for encouragement,
guidance, critics and friendship. Without his continued support and interest, this
thesis would not have been the same as presented here.
I am also indebted to Intel Microelectronics (M) Sdn. Bhd for funding my
part-time Master study. My manager, Intel-University staffs, UTM School of
Postgraduate Study and Faculty of Electrical Engineering staffs also deserve special
thanks for their assistance in providing stable and comfortable environment for me to
focus on my research.
My fellow postgraduate coursemates should also be recognized for their
support. Special appreciation to Mohd. Izuan Ismail and Norashikin M. Thamrin for
their support in ramping me up on previous UTM research made on this topic.
I am truly grateful to all my family members, who motivated me the most
v
ABSTRACT
The 21st Century is the Century of Digital, almost all of the information
processing and telecommunication are in digital formats, therefore it is a must to
ensure the digital data storage and transmission are secured, thus to ensure privacy.
Cryptography is the practice to protect one’s information through encryption. As of
today, the most widely used and studied algorithm is the Advanced Encryption
Standard (AES) published 2001. AES algorithm is fast and easy to be implemented
in both software and hardware; however software implementation is vulnerable due
to loopholes in operating system and generally is less efficient than hardware
implementation. This thesis proposes a design of 256-bit AES crypto-processor core
in Verilog RTL, by extending from previous UTM student’s research, targeted at
FPGA implementation in System-on-Chip (SoC) designs. This soft core is designed
in compact form and generalized to comply with multiple AES specifications, which
vi
ABSTRAK
Abad ke-21 adalah Abad Digital, hampir kesemua activity pemprosesan
maklumat dan telekommunikasi adalah dijalankan dalam format digital, oleh itu
adalah kemestian supaya penyimpanan and pertukaran data digital adalah selamat
dan dipertahankan daripada penceroboh maklumat, supaya kesulitan maklumat
adalah terjamin. Kriptografi adalah suatu bidang mempertahankan maklumat digital
melalui encryption. Pada masa kini, algorithm yang paling banyak digunakan and
dipelajari adalah Advanced Encryption Standard (AES) yang diterbitkan pada tahun
2001. Algorithm AES adalah pantas dan senang dilaksanakan melalui perisian
ataupun perkakasan. Walaubagaimanapun, kriptografi melalui perisian adalah
kurang selamat kerana perisian selalu dipengaruhi oleh masalah system operasi, dan
biasanya kekecapan perisian adalah kurang berbanding dengan perkakasan. Thesis
ini melanjutkan penyelidikan pelajar UTM yang lalu, mencadangkan satu rekabentuk
prosessor kriptografi menggunakan 256-bit AES algorithm dengan VerilogRTL,
menujukan penggunaan FPGA dalam rekabentuk System-on-Chip (SoC).
Rekabentuk ini adalah padat dan boleh melaksanakan beberapa spesifikasi AES,
menjadikannya salah satu asset yang penting kepada UTM dalam penyelidikan SoC
vii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xi
LIST OF FIGURES xiii
LIST OF ABBREVIATIONS xvi
LIST OF SYMBOLS xvii
LIST OF APPENDICES xviii
1 INTRODUCTION 1
1.1 Background and Research Motivation 1
1.2 Objectives 3
1.3 Scopes of Work 3
1.4 Significant of Work and Research Contributions 4
1.5 Research Methodology, Techniques and Tools 5
1.6 Organization of the Thesis 6
2 BACKROUND AND LITERATURE REVIEW 8
2.1 Introduction to the Advanced Encryption Standard
(AES) Algorithm
viii
2.2 Differences of AES128 and AES256 Architectures 9
2.3 Previous Related Work 10
2.4 RTL Modeling with Verilog HDL 10
2.5 Pipeline Design 11
2.6 Summary of Chapter 2 11
3 AES ALGORITHM AND SPECIFICATION 12
3.1 Introduction 12
3.2 Algorithm Notations and Conventions 13
3.3 Comparison of Different AES Specifications 13
3.4 Background Mathematics 14
3.4.1 The Field GF (28) 14
3.4.2 Finite Field Addition 15
3.4.3 Finite Field Multiplication 15
3.4.4 Multiplicative Inverse 16
3.4.5 Polynomials with Coefficients in GF (28) 17
3.5 The Advanced Encryption Standard Specification 20
3.5.1 State Array and Cipher Key State Array 20
3.5.2 AES Key Expansion 22
3.5.2.1RCON Transformation 23
3.5.2.2ROT() Transformation 24
3.5.2.3SubByte() Transformation 25
3.5.3 AES Round Transformation 27
3.5.3.1AddRoundKey() Transformation 28
3.5.3.2ShiftRows() Transformation 29
3.5.3.3MixColumn() Transformation 30
3.6 Summary of Chapter 3 31
4 VHDL TO VERILOG CONVERSION 32
4.1 Case Sensitivity and Coding Convention 32
4.2 VHDL To Verilog Conversion Summary 33
4.3 Coding Efficiency Gain From Verilog Conversion 35
ix
5 DESIGN OF UTM-CRYPTO256 PROCESSOR CORE 36
5.1 Design Considerations 36
5.2 System Level Modeling 37
5.3 Top Level Design of the Processor Core 39
5.4 Functional Block Design at the Processing Engine 40
5.5 Detailed Functional Design of the Processor Core 40
5.5.1 Input and Output Signals 42
5.5.2 Interface Handshake Signals 43
5.5.3 Control Signals in the UTM-Crypto256 46
5.6 Modular Design of the Processing Engine 46
5.7 AES Key Expander Unit 48
5.7.1 reg_32 51
5.7.2 Key Module 52
5.7.3 SubByte 53
5.7.3.1Design of Inverter Circuit 55
5.7.3.2Design of Squarer Circuit 55
5.7.3.3Design of Multiplier Circuit 56
5.7.3.4Design of SubByte and InvSubByte Module 57
5.7.3.5Design of SubWord Module 61
5.7.3.6RCON 62
5.7.3.7ROT 62
5.8 Key RAM 63
5.9 AES Transformer Unit 64
5.9.1 ShiftSubByte 68
5.9.2 MixColumn 70
5.9.2.1xtime 73
5.9.2.2mixColumnByte 74
5.9.2.3mixColumnWord 74
5.9.2.4mixColumn_128 75
5.9.3 AddRoundKey 76
5.10Design of AES256_CU 77
5.10.1 Behavioral Flowchart 78
x
5.11Exploration of Pipeline Implementation 92
5.12Summary of Chapter 5 92
6 DESIGN VERIFICATION AND PERFORMANCE
ANALYSIS
93
6.1 Design Verification Procedures 93
6.1.1 Design Synthesis 94
6.1.2 Functional and Timing Simulation 95
6.2 Design Synthesis Results 96
6.3 Issues and Challenges In the Design and Verification
Process
97
6.3.1 Race Condition in ShiftSubByte Module 98
6.3.2 FSM Modeling Requirement by Quartus II 100
6.3.3 Sized Operand in Arithmetic Operations 102
6.3.4 Clocked AddRoundKey Transformation 103
6.4 Timing Simulation Results 104
6.4.1 ShiftSubByte 104
6.4.2 MixColumn 105
6.4.3 AES Key Expansion 105
6.4.4 Round Transformation 112
6.4.5 UTM-Crypto256 Processor Core 118
6.6 AES Crypto Processors Performance and Comparison 123
6.6 Summary of Chapter 6 125
7 CONCLUSION 126
7.1 Concluding Remarks 126
7.2 Future Improvement Works 127
7.2.1 Verification with FPGA Development Board 127
7.2.2 Performance Push through Synchronous Design 128
7.2.3 Built-In-Self-Test (BIST) Implementation 128
7.2.4 Pipeline / Multithreading Implementation 128
REFERENCES 129
xi
LIST OF TABLES
TABLE
NO. TITLE PAGE
3.1 AES Key-Block-Round Combinations in FIPS-197 13
3.2 RCON Values 24
4.1 Language Construct Mapping between VHDL and Verilog 33
4.2 Coding Comparison between VHDL and Verilog 35
5.1 Primary I/O Signals of UTM-Crypto256 Processor Core 42
5.2 Interface Handshake Signals of UTM-Crypto256 43
5.3 Component Names and Related Section 47
5.4 Primary I/O Signals of AES Key Expander 50
5.5 Control Vector Signals of AES Key Expander 50
5.6 Primary I/O Signals of reg_32 Module 51
5.7 Primary I/O Signals of keyModule Module 52
5.8 The Inverse of 4-bit Values 55
5.9 Primary I/O Signals of KeyRAM_128 Module 63
5.10 Primary I/O Signals of AES Transformer 64
5.11 Control Vector Signals of AES Transformer 67
5.12 Primary I/O Signals of shiftSubByte Module 68
5.13 Primary I/O Signals of mixColumn_128 Module 70
5.14 Steps of Obtaining Results of Encryption and Decryption 72
5.15 Primary I/O Signals for AddRoundKey Funtional Block 77
5.16 FSM State Transition Description 84
5.17 RTL Control Sequences 87
xii
6.2 Results of Logic Synthesis of AES Components 97
6.3 Simulation Summary for AES Key Expander 105
6.4 Simulation Result for AES Transformation 112
6.5 Throughput of the UTM-Crypto256 Processor Core 118
6.6 Comparisons with other Implementations 123
7.1 Specification of the Proposed AES Crypto-Processor Core 127
B.1 S-box: substitution values for the byte xy 159
B.2 Inverse S-box: substitution values for the byte xy 160
C.1 Steps of 128-bit Cipher Key Expansion 162
C.2 Steps of 256-bit Cipher Key Expansion 163
xiii
LIST OF FIGURES
FIGURE
NO. TITLE PAGE
3.1 Initial input bytes, State Array and Key State Array 21
3.2 Example of State Array and Key State Array 22
3.3 AES256 Round Key Expansion Algorithm 23
3.4 AES128 Round Key Expansion Algorithm 23
3.5 ROT() Transformation 24
3.6 SubByte() and InvSubByte() Transformation 26
3.7 AES Round Transformation Algorithm 27
3.8 AddRoundKey() Transformation 28
3.9 ShiftRows() and invShiftRows() Transformation 29
3.10 MixColumn() and invMixColumn() Transformation 30
5.1 System Level Modeling of the UTM-Crypto256 Processing
Core
38
5.2 Top Level Design of the UTM-Crypto256 Processing Core 39
5.3 Functional Block Diagram of UTM-Crypto256 Processing
Engine
40
5.4 Functional Block Diagram of UTM-Crypto256 Processor Core 41
5.5 Handshaking Protocol with Other Control Processor 45
5.6 Design Hierarchy of the UTM-Crypto256 Processing Engine 46
5.7 Top-level Block Diagram for the AES Key Expander 48
5.8 Functional Block Diagram for the AES Key Expander 49
5.9 Functional Block Diagram of reg_32 module 51
xiv
5.11 Schematic Representation of a Hardware-Efficient Calculation
of the Inverse in GF (28)
54
5.12 Squarer Circuit 55
5.13 Multiplier Step-1: DataA x DataB 56
5.14 Multiplier Step-2: 7-bit to 4-bit Multiplexing 56
5.15 Translating Initial Input Signal 57
5.16 Up_Input and Low_input 58
5.17 Square of Low_Input 58
5.18 Transformation of Up_Input to mult 58
5.19 mul2 Transformation, part1 Transformation, part2
Transformation
59
5.20 Invert, mult32, mul4 59
5.21 invout Signal 59
5.22 Result for SubByte in Encryption Mode 60
5.23 Result for SubByte in Decryption Mode 60
5.24 Functional Block Diagram for sboxWord Module 61
5.25 ROT Design 62
5.26 Structural View of the Key RAM Design 63
5.27 Top-level Block Diagram of AES Transformer 64
5.28 Functional Block Diagram of AES Transformer 66
5.29 Functional Block Diagram of shiftSubByte Module 68
5.30 ShiftRows Design 69
5.31 Top Level Block Diagram of MixColumn Design 70
5.32 Functional Block Diagram of mixColumnByte Module 73
5.33 mixColumnWord Design 74
5.34 mixColumn_128 Design 75
5.35 AddRoundKey Design 76
5.36 Functional Block Diagram of the AddRoundKey Design 76
5.37 Behavioral Flowchart for AES128 Key Expander (Encryption) 78
5.38 Behavioral Flowchart for AES256 Key Expander (Encryption) 79
5.39 Behavioral Flowchart for AES128 Key Expander (Decryption) 80
5.40 Behavioral Flowchart for AES Transformation (Encryption) 81
xv
5.42 FSM State Transition for the AES256_CORE 83
6.1 Quartus II Screenshot Showing Hold Violation Due to
1-Process FSM Modeling
101
6.2 Timing Simulation for ShiftSubByte Module 104
6.3 Timing Simulation for MixColumn Module 105
6.4 Timing Simulation for keyGen_256 using AES256
specification (AES_spec = 0)
108
6.5 Timing Simulation for keyGen_256 using AES128
specification (AES_spec = 1)
109
6.6 Functional Simulation for keyGen_256 in Modelsim (AES256
Encryption) to highlight the RAM content
110
6.7 Functional Simulation for keyGen_256 in Modelsim (AES256
Decryption) to highlight the RAM content
111
6.8 Timing Simulation for AES Transformation (AES256 -
Encryption)
114
6.9 Timing Simulation for AES Transformation (AES256 -
Decryption)
115
6.10 Timing Simulation for AES Transformation (AES128 -
Encryption)
116
6.11 Timing Simulation for AES Transformation (AES128 -
Decryption)
117
6.12 Timing Simulation Result for UTM-Crypto256 Processor Core
(AES256-Encryption)
119
6.13 Timing Simulation Result for UTM-Crypto256 Processor Core
(AES256-Decryption)
120
6.14 Timing Simulation Result for UTM-Crypto256 Processor Core
(AES128-Encryption)
121
6.15 Timing Simulation Result for UTM-Crypto256 Processor Core
(AES128-Decryption)
122
E.1 Modelsim Main GUI 168
xvi
LIST OF ABBREVIATIONS
AES – Advanced Encryption Standard
AES128 – AES specification in 128-bit architecture
AES192 – AES specification in 192-bit architecture
AES256 – AES specification in 256-bit architecture
FIPS PUBS – Federal Information Processing Standards Publications
FIPS-197 – Federal Information Processing Standards Publication 197 a.k.a.
ADVANCED ENCRYPTION STANDARD (AES)
NIST – National Institute of Standards and Technology, USA
RCON – Round Constant
xvii
LIST OF SYMBOLS
Nb – Number of columns (32-bit words) comprising the State array.
Nk – Number of 32-bit words comprising the Cipher Key.
Nr – Number of rounds, which is a function of Nk and Nb (which is fixed).
Rcon[] – The round constant word array
K – Cipher Key
M – Plain Text (in encryption) or Cipher Text (in decryption)
S – State array
W – Key State array
⊕ – Exclusive OR
xviii
LIST OF APPENDICES
APPENDIX TITLE PAGE
A Functional Block Diagram And Verilog Source Code
For The Utm-Crypto256 Processor Design
134
B SubByte And InvSubByte Transformation 157
C Cipher Key Expansion 159
D Cipher Transformation 163
E Mentor Graphics Modelsim 6.1g 170
CHAPTER 1
INTRODUCTION
This thesis proposes system level modeling of an encryption core in Verilog
Hardware Description Language (HDL) for Field-Programmable Gate Array (FPGA)
implementation. The design is to accelerate fast computation of digital data
encryption and decryption using the Advanced Encryption Standard (AES)
algorithm. In this chapter, the challenges of cryptography are discussed, providing a
framework for the objectives of this project. This chapter covers the background,
research motivation, research objectives, significant of the work, scope of work,
research methodology and finally the thesis organization.
1.1 Background and Research Motivation
As we move into twenty-first century, almost all information processing and
telecommunication are in digital formats. Most data, for example photos, music and
private information can be transmitted through copper, optical or wireless network to
a recipient anywhere in the world. In order to protect the data and keep privacy, the
information system should be equipped with cryptography and robustness techniques
2
Cryptographic services are required across variety of platforms in a wide
range of applications such as secure access to private networks, electronic commerce
and health care. Cryptography means hidden writing, the practice of using
encryption to conceal text. The security of conventional encryptions depends on
several factors. First, the encryption algorithm must be powerful enough that is
impractical to decrypt a message on the basis of cipher text alone. Beyond that, the
security depends on the secrecy of the key, not the secrecy of the algorithm. That is,
it is assumed that is also impractical to decrypt a message on the basis of the cipher
text plus knowledge of the encryption or decryption algorithm.
Generally, most of cryptography algorithms are implemented in software, but
software implementation cannot offer the physical security for the key (Joon et al.,
2002). Software is operating system (OS) dependent and also exposed to viruses and
hackers attacks that may interrupt the OS running on the general computer, for
example on Microsoft Windows based computer or Apple Macintosh machine.
Execution on general-purpose processor (CPU) of the algorithm will use most CPU’s
resources to calculate and execute all processes in the algorithm because CPU lacks
of instructions for modular arithmetic with operations on very large operands. Thus,
word sizes mismatch, less parallel computations and algorithm/architecture are the
main problems faced by software implementation of cryptosystem (Janssens et al.,
2001).
Different applications of the data encryption algorithm may require different
speed/area trade-offs. Some applications, such as smart card and cellular phone,
require a small area. Other applications, such as World Wide Web (WWW) servers
and Asynchronous Transfer Mode (ATM) networks are speed critical. Some other
applications, such as digital video recorders, require an optimization of speed/area
ratio (Xinmiao et al., 2003).
In general, hardware based solution are the embodiment of choice for military
and serious commercial applications (Schneier, 1996). As an encryption algorithm
running on a generalized computer has no physical protection, hardware
3
the implemented algorithm and also can be embedded the hardware as co-processor
in any devices that require data security processing.
In this research, the UTM-Crypto256 Processor Core design is implemented
on hardware (FPGA) with key RAM, which can make not only a forward key
scheduling for encryption but also a reversed key scheduling for decryption.
Therefore, compared to software implementation, hardware implementation
enhances the physical security as well as higher speed and outside attackers cannot
easily attack, interrupt or modify its operation.
1.2 Objectives
From the discussion from previous section, this report set out two main
objectives for the research:
1. To design a 256-bit architecture AES encryption processor core in Verilog
HDL based on previous UTM-Crypto128 IP.
2. To explore the implementation of pipeline architecture to the design.
1.3 Scopes of Work
Based on available hardware and software resources, limited time frame and
expertise, this research project is narrowed down to the following scope of work:
1. The UTM-Crypto128 processor core was designed in VHDL using AES128
4
256-bit architecture (128-bit data block and 256-bit key length), with
backward compatibility to AES128 specification.
2. The design is to be modeled at system level, and be translated to RTL
abstraction by hand.
3. The design is targeting implementation with Altera APEX20KE FPGA, using
EP20K200EFC484-1 device specifically.
4. Logic design and functional validation is performed using the Mentor
Graphics Modelsim simulator, it also acts as primary debugger tool due to its
rich debug capabilities.
5. Synthesis and timing simulation for verify the design correctness in FPGA
implementation is performed with the Altera Quartus II Web-Edition
software.
6. Test vectors used to verify the design is based on the “Advanced Encryption
Standard” published by National Institute of Standards and Technology, USA
(Federal Information Processing Standards Publication 197), hereafter
referred as FIPS-197.
1.4 Significant of Work and Research Contributions
1. UTM will own its encryption soft core IP that supports multiple AES
architectures (128-bit and 256-bit key length), thus enables future works on
System-On-Chip (SoC) researches.
2. The soft core is modeled at system level for easier debug and upgrade in SoC
design, and implemented in Verilog language thus provides wider options in
5
1.5 Research Methodology, Techniques and Tools
In order to make this research successful and complete within a limited time
frame, a proper planning is essential and all working procedures should be identified
clearly. This research involves mostly efforts on hardware design and the remaining
is software development to support the hardware environment for validation and
testing purposes.
The work begins with the literature review on cryptography with AES
algorithm and specification between 128-bit and 256-bit architectures, RTL
modeling with Verilog HDL, and fundamental of pipeline design.
After literature review, problem formulation and scope identification are done
to extend the UTM-Crypto128 to 256-bit architecture (UTM-Crypto256) with
implementation done in the Altera FPGA device. Applications for UTM-Crypto256
are targeted at security devices and secured SoC, such as smart card reader.
The most important part before designing the hardware is to understand the
AES algorithm and specification thoroughly, as well as other essential mathematical
concepts such as finite field theory, modular arithmetic, number theory, and etc.
Doing arithmetic in finite field is the key part to the implementation of the
communication and coding systems including the AES (M. H. Jing et al., 2001).
In this research, most of the efforts are focus on architectural design of the
UTM-Crypto256 processor core with consideration on all resources possibly needed.
Architecture designs of the UTM-Crypto256 processor are coded in Verilog
by hand. The compilation, synthesis and simulation are performed using the Mentor
Graphics Modelsim simulator and Altera Quartus II Web-Edition software. Any
design errors or bugs are fixed before a limited and experimental prototype is
developed. Timing and waveform simulation are then performed using test vector
6
Along with the development, exploration will be made on enabling pipelining
capabilities to the encryption core, which will improve the performance without cost
on power. This attempt is also to re-use whatever learnt in the UTM MEHA1BPA
Master Course back to the project, such as the pipeline knowledge which was taught
in the Advanced Computer Architectures subject.
From the starting to ending of this research, literature review is a continuous
process in order to get the latest update related to the research. At the same time,
every research progress and status are documented and reported. Any problems and
issues faced can be solved effectively by supervision and discussion.
1.6 Organization of the Thesis
This report is organized into seven chapters. The first chapter is the
introduction which covers the background, problem statement, objectives, scopes, the
significant and contributions of the project, and at the end of the chapter deals with
the methodology, tools and techniques employed in this project. It discusses on
design environment and also the ways on how the hardware mapping of Advanced
Encryption Standard algorithm is possible in this project using state-of-the-art design
tools.
Chapter 2 covers high level overview of the AES algorithm and key
differences between its 128-bit and 256-bit architectures. It also presents the reason
for modeling the design in Verilog HDL, as well as brief introduction to pipelining.
Chapter 3 elaborates the specification of AES algorithm. It covers all the
functions and transformation in AES in details, based on the 256-bit architecture.
Chapter 4 summarizes the work done in migrating the VHDL
UTM-Crypto128 into Verilog HDL modeling, as well as the efficiency gain from the HDL
7
Chapter 5 discusses the design and development of the UTM-Crypto256
processor core. It includes on how the original AES algorithm can be rearranged and
restructured in such a way to make it easy and possible to design in hardware. All
signals including the needed control signals to drive the UTM-Crypto256 processor
core are clearly identified and defined.
Chapter 6 presents the validation and performance analysis of the
UTM-Crypto256 processor core. It presents the results obtained from running the
performance analysis with artificially generated data. Both Functional and Timing
simulations using Modelsim and Quartus-II simulators are used to proof the design
correctness.
Finally in Chapter 7, the research work is summarized and ended with