The efficiency of any cryptographic primitive can significantly influence its popularity. For example, Serpent [8] was one of the AES (Advanced Encryption Standard) com- petition finalists and it was described by NIST (the competition organiser) as having a high security margin. However, in the last round of the competition, Serpent failed in favour of Rijndael [53], which was described as having just an adequate security, because Serpent was very slow in software compared to Rijndael. In this appendix, we will be mainly concerned with the software efficiency of hash functions (but see section A.2.2for a brief discussion about hardware efficiency).
A.2.1 Software Optimisation
Generally, there are two types of software optimisations, high-level and low-level op- timisations. In high-level optimisation, a cross-platform implementation written in a high-level language, such as C, is optimised. However, different compilers may treat
1
We note that some SHA-3 submissions include optimisation discussions, e.g. [69], but these by no means are comprehensive.
high-level code slightly differently such that a code might be considered optimised only if it was compiled by a particular compiler. On the other hand, low-level optimisation involves optimising a machine (or assembly) code, and is rarely cross-platforms since different platforms often use different instruction sets. While optimising a low-level code is tedious and error-prone, it gives the highest degree of control over the code. In general, efficiency requirements highly depend on the application, and thus the tar- geted application should also be taken into account when implementing a hash function. Software efficiency of hash functions can be influenced by several factors including, the platform in which the hash function is executed, the compiler used to compile the hash function code, and the executing operating system (hash functions optimised for 64-bit operating systems are slower in 32-bit operating systems, e.g., Skein [69]).
Platforms. Both high-level and low-level optimisations are usually tuned for a spe- cific platform. For example, in the SHA-3 competition2the reference platform in which the candidates were instructed to evaluated their submissions on was Intel Core 2 Duo, thus most of the candidate submissions were especially tuned to be optimal in Intel platforms (which mean that they may not be optimal in other platforms!). Platforms can be roughly classified as follows:
• High-end. These are platforms with high computational and memory resources, and usually based on 32-bit or 64-bit architectures, often with multiple processing cores. Examples of such platforms include Intel and AMD.
• Intermediate. These are most 16-bit and 32-bit microcontrollers with moderate computational resources (Note that some microcontrollers are low-end platforms). Examples of such platforms include ARM and AVR.
• Low-end. These are 8-bit platforms with limited computational and memory (usually kilobytes) resources. Examples include Smart Cards and FRID.
Compilers. Another very important factors to consider when investigating hash functions efficiency is the sophistication of the compiler. Most of the available compilers (commercial and open source) like Microsoft Visual Studio and GCC are sophisticated enough to automatically optimise the code. However, these compilers sometimes apply some optimisation techniques that may not be optimal for all platforms. Thus, it is advisable to compile the code with several different compilers and use different optimi- sation switches, then only choose the optimum one for a target platform, though this
2
For comprehensive resource about SHA-3 competition and all its candidates, seehttp://ehash. iaik.tugraz.at/wiki/The_SHA-3_Zoo.
process may be tedious. One would think that a commercialised compiler developed by the vendor of a particular platform would outperform other open source or third-party commercial compilers. However, Wenzel-Benner and Graf [157] showed that this is not always the case when they implemented several SHA-3 hash function candidates on an ARM Cortex platform and then compiled them twice, once by ARM-CC compiler (ARM’s own compiler) and another with GCC compiler (open source compiler). Sur- prisingly, they found that in some cases, GCC compiled code is more efficient than that compiled by ARM-CC.
Instruction Sets. High-level code will eventually be converted into a low-level (ma- chine) code that consists of instructions. A platform with only several tens of instruc- tions will most likely not perform as well as another with hundreds of instructions, basically because no matter what optimisation techniques are applied, if no efficient instructions exist for a particular operation, the code will need to be converted to a series of instructions implementing that operation. Most recent platforms adopt the so- called Single Instruction Multiple Data (SIMD) technology, which provides instructions allowing for parallel data execution (see sectionA.3.1).
A.2.2 Hardware Optimisation
Although hardware implementation and optimisation is not our main focus here, in this section we discuss a few interesting results of recent hardware evaluation of SHA-3 can- didates. In [150], Tillich et al. presented optimised hardware implementations of the 14 SHA-3 round 2 candidates. Their results show that Keccek and Luffa significantly out- perform all other candidates. The authors did not make any conclusions, but we point out that Keccek and Luffa are the only round 2 candidates adopting permutation-based (sponge and sponge-like) constructions. Although not strictly a hardware implementa- tion aspect, Intel has recently released a new instruction set named AES-NI [30]. Hash functions based on AES, such as LANE, ECHO and Lesamnta, will benefit from these instructions significantly. However, in order for a hash function to make the most of these new AES-NI instructions, it should be based on an unmodified AES construction.