
Chaos-based Hash Functions. Chaos theory is the mathematical study of dynamical systems whose behaviour is highly sensitive to initial conditions. These systems possess many properties that suit the requirements of hash functions. For example, chaotic systems are very sensitive to changes in their initial values, potentially fulfilling the hash function requirement that the output be highly sensitive to changes in the input; this phenomenon is called the avalanche effect (the corresponding behaviour is called the butterfly effect in the chaos theory literature). Moreover, chaotic systems behave like one-way functions and are unpredictable. Hash functions based on chaos theory use chaotic maps, which are functions that exhibit particular chaotic behaviours; examples of these maps include the logistic map [104], the tent map [162], and the cat map [59]. Unfortunately, most chaos-based hash functions suffer from poor efficiency due to their inherently complex structure, which makes them unattractive as a practical approach for building hash functions.
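The sensitivity to initial values described above can be illustrated with the logistic map. The following is a minimal sketch only: the parameter r = 4, the starting point 0.4, the perturbation 1e-10 and the function name logistic_orbit are assumptions chosen for illustration, and the code is not a hash function.

```python
def logistic_orbit(x0: float, r: float = 4.0, steps: int = 50) -> list[float]:
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two starting points that differ only in the tenth decimal place (assumed values).
a = logistic_orbit(0.4)
b = logistic_orbit(0.4 + 1e-10)

# Largest distance between the two orbits over all iterations.
max_gap = max(abs(x - y) for x, y in zip(a, b))
print(f"gap after 1 step: {abs(a[1] - b[1]):.3e}, largest gap: {max_gap:.3f}")
```

The two orbits start almost identically and then separate until they are unrelated, which is the behaviour a hash function needs from a small change in its input.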

Cellular Automata-based Hash Functions. Cellular Automata (CA) are discrete-time models consisting of collections of cells organised in a grid, where each cell has a current state. The states of the cells evolve over time depending on their current states and the states of the neighbouring cells. CA were originally used by von Neumann [153] while he was studying self-reproducing systems and were then popularised by Wolfram’s substantial work in this area [158], which showed that very complex behaviours can be obtained from simple rules. Damgård was the first to propose a hash function based on CA [55], but his proposal was cryptanalysed by Daemen et al. [51] who, in turn, proposed another CA-based hash function, called CellHash. The same authors later proposed SubHash [52], which is an improved version of CellHash; both CellHash and SubHash are hardware-oriented and were cryptanalysed in [42]. Another hash function based on CA was proposed by Mihaljević et al. [112].
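The update mechanism of a CA can be sketched with a one-dimensional, two-state automaton. The choice of Wolfram's rule 30, the wrap-around boundary and the name ca_step are assumptions made for illustration; this is not the rule used by any of the hash functions cited above.

```python
def ca_step(cells: list[int], rule: int = 30) -> list[int]:
    """Apply one synchronous update of an elementary CA with wrap-around."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbouring states form a 3-bit index into the rule number.
        neighbourhood = (left << 2) | (centre << 1) | right
        nxt.append((rule >> neighbourhood) & 1)
    return nxt


# Evolve a single live cell for a few generations and print each row.
row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print("".join(str(c) for c in row))
    row = ca_step(row)
```

Each cell's next state depends only on itself and its two neighbours, yet the resulting pattern quickly becomes irregular, which is the property that motivated CA-based designs.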

3.6 Summary

In this chapter we provided a thorough discussion of the state of the art in hash function design. Roughly speaking, hash functions can be either keyless or keyed. Each class has different applications and is based on different design principles. Hash functions can also be classified as iterative or parallel. While iterative functions are the most common, parallelisable hash functions are becoming increasingly popular with the rapid advent of parallel systems. We provided a lengthy discussion of the popular Merkle-Damgård construction, how it fell prey to various generic attacks, and what modifications were proposed to patch it. Finally, we discussed how compression functions are designed and what approaches are adopted.

Chapter 4

Integrated-Key Hash Functions

Traditionally, hash functions were designed in the keyless setting, where a hash function accepts a variable-length message and returns a fixed-length fingerprint. Unfortunately, over the years, significant weaknesses were reported in instances of some popular keyless hash functions. This has motivated the research community to start considering the dedicated-key setting, which also allows for more rigorous security arguments. However, it turns out that converting an existing keyless hash function into a dedicated-key one is non-trivial, since the keyless compression function does not normally accommodate the extra key input. In this chapter we formalise an approach that can potentially solve this problem. In this approach, keyless hash functions are seamlessly transformed into keyed variants by introducing an extra component accompanying the (still keyless) compression function to handle the key separately. Hash functions constructed in this setting are called integrated-key hash functions. We propose several integrated-key constructions and prove their collision, pre-image and 2nd pre-image resistance.

4.1 Introduction

Recent years have witnessed a proliferation of research on cryptographic hash functions. Hash functions have traditionally been designed in the keyless setting, where a construction (iteration mode) $H^f : \mathcal{M} \to \{0,1\}^n$ with access to a keyless compression function $f : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ hashes a variable-length message $M \in \mathcal{M}$ by iteratively calling $f$. However, as a result of the recent attacks [154,155,156] against several widely used keyless hash functions, such as MD5 and SHA-1, another approach has become increasingly popular. In that approach, hash functions are constructed in the dedicated-key setting [25], where a family of hash functions $C^h : \mathcal{K} \times \mathcal{M} \to \{0,1\}^n$, with access to a publicly keyed compression function $h : \{0,1\}^k \times \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$, is constructed such that instances (members) of $C^h$ are indexed by different public keys $k_i \in \mathcal{K}$. Hash functions constructed in the dedicated-key setting are not to be confused with secretly keyed hash functions, which are usually used to build Message Authentication Codes (MACs); these are discussed extensively in chapter 6.
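The difference between the two settings can be made concrete with a toy iteration mode. The sketch below is illustrative only: the compression functions f and h are stand-ins built from SHA-256, and the block sizes, the fixed key length, the all-zero IV and the length-padding rule are assumptions, not the constructions analysed in this thesis.

```python
import hashlib

N = 32         # chaining-value length in bytes (assumed toy parameter)
M = 64         # message-block length in bytes (assumed toy parameter)
K = 16         # public-key length in bytes (assumed toy parameter)
IV = bytes(N)  # all-zero initial value (assumption)


def f(cv: bytes, block: bytes) -> bytes:
    """Stand-in keyless compression function taking (chaining value, block)."""
    return hashlib.sha256(cv + block).digest()


def h(key: bytes, cv: bytes, block: bytes) -> bytes:
    """Stand-in publicly keyed compression function taking (key, chaining value, block)."""
    return hashlib.sha256(key + cv + block).digest()


def pad(message: bytes) -> list[bytes]:
    """Append 0x80, zero bytes and the 8-byte message length, then split into blocks."""
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % M)
    padded += len(message).to_bytes(8, "big")
    return [padded[i:i + M] for i in range(0, len(padded), M)]


def H(message: bytes) -> bytes:
    """Keyless setting: iterate f over the padded message blocks."""
    cv = IV
    for block in pad(message):
        cv = f(cv, block)
    return cv


def C(key: bytes, message: bytes) -> bytes:
    """Dedicated-key setting: every compression call also takes the public key."""
    if len(key) != K:
        raise ValueError("key must be exactly K bytes")
    cv = IV
    for block in pad(message):
        cv = h(key, cv, block)
    return cv


k1, k2 = bytes([1]) * K, bytes([2]) * K
print(H(b"abc").hex())
print(C(k1, b"abc").hex())
print(C(k2, b"abc").hex())
```

Each key selects a different member of the family, so the same message hashes to unrelated values under k1 and k2, while H has a single fixed instance.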

Dedicated-Key. In [25], Bellare and Ristenpart discussed at length the benefits of the dedicated-key setting and showed how adopting this setting potentially improves hash function heterogeneity, which enables users to utilise different instances of the same hash function family. That is, if an attack against an instance (member) of the hash function family (specified by a particular key) is found, then it only breaks that particular instance while very likely having negligible (or even no) effect on other instances (indexed by different keys). The dedicated-key setting also improves the security guarantees of hash functions and gives an easy solution to Rogaway’s foundations-of-hashing dilemma [134], which states that keyless hash functions cannot be collision resistant due to the pigeonhole principle (see section 2.1). Rogaway solved the dilemma by means of an explicit (but slightly complex) problem reduction. The dedicated-key setting instead allows for simpler and more straightforward theoretical arguments about the collision resistance of hash functions. An obvious drawback of the dedicated-key setting is a slight efficiency loss due to the introduction of an extra input (i.e., the key) that the hash function needs to process besides the message input, but such efficiency loss seems to be unavoidable in any keyed setting.

Integrated-Key. Unfortunately, in most cases, existing keyless hash functions cannot be easily turned into dedicated-key hash functions¹ because their corresponding keyless compression functions do not naturally accommodate the key input introduced in the dedicated-key setting. This means that the dedicated-key approach may, in most cases, only be used in designing new hash functions, not in patching or strengthening existing ones. Thus, in this chapter we try to answer the following question:

Given a keyless hash function $H^f : \mathcal{M} \to \{0,1\}^n$, where $f : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$, how to construct a keyed hash function $C : \mathcal{K} \times \mathcal{M} \to \{0,1\}^n$ from $H^f$ without modifying the internal structure of $f$, while still keying the compression function as in the dedicated-key setting?

¹ MAC schemes (e.g., HMAC) can trivially turn keyless hash functions into keyed ones, but these schemes are secretly keyed, while here we are interested in the dedicated-key setting, where keys are publicly known.


While here we explicitly require keyless hash functions to be keyed as in the dedicated-key setting (where the key is processed with every application of the compression function), there are other approaches that can key a hash function without strictly adhering to this requirement, which makes adapting a keyless hash function to accept a key input easy and efficient. So why do we insist on matching the dedicated-key setting when there are other cheaper and more convenient approaches? To answer this question, consider an example where a keyless hash function is keyed by processing the key only at the last compression function call (e.g., EMD [24]). Now suppose that an attack exploiting the intermediate chaining variables is found. Such an attack would readily break the whole function family because, apart from the last compression function call, the whole hashing procedure is common to all members. Clearly, this is not the case with dedicated-key hash functions. Therefore, in this chapter, we introduce a bridging approach that seamlessly transforms a keyless hash function into a dedicated-key one without modifying the internal (keyless) structure of the former. We call hash functions constructed in this setting integrated-key hash functions.
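The argument above can be checked on a toy model by comparing the chaining values produced under two different keys. This sketch is a simplification made for illustration: compress is a SHA-256 stand-in, the function names and parameters are assumptions, and chain_last_call_keyed only mimics the idea of keying the final call, not the actual EMD construction.

```python
import hashlib

BLOCK = 64      # block length in bytes (assumed toy parameter)
IV = bytes(32)  # all-zero initial value (assumption)


def compress(*parts: bytes) -> bytes:
    """Stand-in compression function: SHA-256 over the concatenated inputs."""
    return hashlib.sha256(b"".join(parts)).digest()


def chain_last_call_keyed(key: bytes, blocks: list[bytes]) -> list[bytes]:
    """All chaining values when the key enters only the final compression call."""
    cvs = [IV]
    for block in blocks[:-1]:
        cvs.append(compress(cvs[-1], block))
    cvs.append(compress(key, cvs[-1], blocks[-1]))
    return cvs


def chain_every_call_keyed(key: bytes, blocks: list[bytes]) -> list[bytes]:
    """All chaining values when the key enters every compression call."""
    cvs = [IV]
    for block in blocks:
        cvs.append(compress(key, cvs[-1], block))
    return cvs


blocks = [bytes([i]) * BLOCK for i in range(4)]
k1, k2 = bytes([1]) * 16, bytes([2]) * 16

# For each position in the chain, record whether both keys give the same value.
shared_last = [x == y for x, y in zip(chain_last_call_keyed(k1, blocks),
                                      chain_last_call_keyed(k2, blocks))]
shared_every = [x == y for x, y in zip(chain_every_call_keyed(k1, blocks),
                                       chain_every_call_keyed(k2, blocks))]
print(shared_last)   # only the final value depends on the key
print(shared_every)  # only the IV is shared between the two keys
```

When the key is used only in the last call, every intermediate chaining value is identical for all family members, so an attack on those values carries over to every key. Keying every call removes that shared state.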

Related Work. There is a distinction between merely creating families of hash functions and creating them in the dedicated-key setting, where the latter explicitly states that all applications of the compression function should be keyed. Examples of constructions forming families of hash functions, but without keying all compression function applications, are EMD (Enveloped Merkle-Damgård) [24] and BCM (Backward Chaining Mode) [15]. EMD was originally proposed as a keyless hash function introducing a second IV which can be used as a key. On the other hand, BCM was explicitly proposed as a way of creating families of hash functions out of keyless compression functions, but it still does not strictly follow the dedicated-key approach since the keys are not applied to every application of the compression function. A proposal that can be seen as a variant of an integrated-key construction is RMX [79], which is based on the randomized hashing paradigm [78] (see section 3.3.3 for a detailed description of the RMX construction). RMX performs a form of key whitening (a block-cipher technique) by combining a random salt (i.e., a key) with every message block before sending it to the compression function. However, randomized hashing is more suitable for digital signatures, where the hash function is used by the signature algorithm as a black box. That is, since the message M in RMX is merely XORed with the publicly available key K, it is possible to retrieve parts of M via some differential and linear cryptanalysis techniques [36,106], but if RMX is used in digital signatures, the signature algorithm will further process the hash value produced by RMX, potentially resisting differential/linear cryptanalysis. Other recent hash functions such as Skein [69] and MD6 [133]