1.2 Basic cryptographic primitives
1.2.5 Hash functions
A hash function is used to distill a short, fixed-length value out of an arbitrarily large message. Such a value can be used to check the integrity of the data in question. For instance, suppose that one maintains a database in North America and a mirror copy in Europe. To check that both databases are still identical after, say, an update of both, one can compute a so-called message digest, or fingerprint, of each database using the hash function and compare the results: if the databases are identical, the fingerprints agree. The converse does not hold in general, since a set of large messages is mapped to a much smaller set of strings, typically 160 bits long. However, if the hash function is properly chosen, it is very unlikely that two different databases map to the same fingerprint, as we will show later in this subsection.
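The mirror-check scenario can be sketched in a few lines. This is a minimal illustration under stated assumptions: each database copy is modeled as a list of serialized byte-string records (a hypothetical format), `fingerprint` is a hypothetical helper, and SHA-256 from Python's `hashlib` stands in for "the hash function" (it produces 256-bit rather than 160-bit digests).

```python
import hashlib

def fingerprint(records):
    """Hypothetical helper: SHA-256 digest over an iterable of byte-string records.

    Each record is prefixed with its 8-byte length before hashing, so that
    e.g. [b"ab", b"c"] and [b"a", b"bc"] do not yield the same byte stream.
    """
    h = hashlib.sha256()
    for rec in records:
        h.update(len(rec).to_bytes(8, "big"))
        h.update(rec)
    return h.hexdigest()

# Two copies of the (toy) database, one per site.
north_america = [b"alice,42", b"bob,17"]
europe = [b"alice,42", b"bob,17"]

# Only the short fingerprints need to be exchanged and compared.
print(fingerprint(north_america) == fingerprint(europe))  # True
```

Equal fingerprints are strong evidence, not a proof, that the copies agree; the security properties below make precise why a mismatch going undetected should be very unlikely.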
In the rest of this subsection, $\Sigma$ denotes the set $\{0, 1\}$, $n$ denotes a non-negative integer, and $\ell(n)$ denotes an integer such that $\ell(n) > n$. Most of the material presented in this subsection comes from a course on the topic given by Bart Preneel at the summer school “crypt@b-it 2009”.
Formal definition
In the discussion above we considered a single, fixed hash function; in many practical situations, however, it is more useful to consider families of hash functions parameterized by keys.
Definition 1.1. A family of hash functions is a 4-tuple (D, R, K, H) such that:
1. $D = \Sigma^{\ell(n)}$ is the set of possible messages, also called the domain of the hash function family,
2. $R = \Sigma^{n}$ is the finite set of possible fingerprints, also called the range of the hash function family,
3. $K$ is the finite set of possible keys,
4. $H = \{h_k : k \in K\}$ is the set of hash functions, where each $h_k$ maps messages from $D$ to $R$.
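A keyed family in the sense of Definition 1.1 can be sketched with keyed BLAKE2b from Python's `hashlib`. This is an illustrative instantiation, not one proposed in the text: keys are 32-byte strings (so $K = \Sigma^{256}$), outputs are truncated to 20 bytes (so $R = \Sigma^{160}$), and messages are byte strings of any length rather than bit strings of the fixed length $\ell(n)$. The names `keygen`, `h`, and `N_BYTES` are hypothetical.

```python
import hashlib
import secrets

N_BYTES = 20  # output length in bytes: n = 160, so R = {0,1}^160 (illustrative choice)

def keygen():
    """Sample a key k uniformly from K = {0,1}^256 (32-byte BLAKE2b key)."""
    return secrets.token_bytes(32)

def h(k, message):
    """The family member h_k : D -> R, instantiated with keyed BLAKE2b."""
    return hashlib.blake2b(message, digest_size=N_BYTES, key=k).digest()

k1, k2 = keygen(), keygen()
m = b"an arbitrary message"
same_key = h(k1, m) == h(k1, m)   # always True: each h_k is a deterministic function
diff_key = h(k1, m) == h(k2, m)   # False except with negligible probability
```

Choosing a key selects one function from the family; the collision-resistance definition below is stated with respect to such a randomly chosen key.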
Security properties
The most important security properties required of a cryptographic hash function are the following.
One-wayness. Let $h$ be a function with domain $D = \Sigma^{\ell(n)}$ and range $R = \Sigma^{n}$. Then $h$ is one-way if it meets the following conditions:
• Preimage resistance: let $x$ be selected uniformly at random from $D$, and let $M$ be an adversary that, on input $h(x)$, outputs in polynomial time some $M(h(x)) \in D$. For every such adversary, we require that
$$\Pr_{x \in_R D}\big[\, h(M(h(x))) = h(x) \,\big] < \epsilon,$$
where the probability is taken over the input to $M$ as well as over its random coin tosses, and $\epsilon$ is a negligible function of the security parameter.
• Second preimage resistance: let $x$ be selected uniformly at random from $D$, and let $M$ be an adversary that, on input $x \in D$, outputs in polynomial time some $x' = M(x) \in D$ such that $x' \neq x$. For every such adversary, we require that
$$\Pr_{x \in_R D}\big[\, M(x) \neq x \ \wedge\ h(M(x)) = h(x) \,\big] < \epsilon,$$
where the probability is taken over the input to $M$ as well as over its random coin tosses, and $\epsilon$ is a negligible function of the security parameter.
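To see why the output length $n$ must be large, consider the generic adversary that simply tries candidate inputs until one hits the target value; against any function with $n$-bit outputs it needs on the order of $2^n$ evaluations. The sketch below runs this search against a toy hash obtained by truncating SHA-256 to 16 bits (an assumption made only so the search finishes quickly); `h` and `brute_force_preimage` are hypothetical names. Passing `forbidden=x` turns the preimage search into a second-preimage search.

```python
import hashlib
import itertools
from typing import Optional

N_BITS = 16  # toy output length; practical hash functions use n >= 160

def h(x: bytes) -> int:
    """Toy hash: the top N_BITS bits of SHA-256(x), as an integer."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") >> (256 - N_BITS)

def brute_force_preimage(y: int, forbidden: Optional[bytes] = None) -> bytes:
    """Generic adversary M: enumerate 8-byte candidates until one hashes to y.

    Candidates equal to `forbidden` are skipped, which gives a
    second-preimage search for that input. Expected cost: about 2**N_BITS
    evaluations of h.
    """
    for i in itertools.count():
        candidate = i.to_bytes(8, "big")
        if candidate != forbidden and h(candidate) == y:
            return candidate

x = b"some message"
x_prime = brute_force_preimage(h(x), forbidden=x)  # a second preimage of x
```

With $n = 16$ this takes a fraction of a second; with $n = 160$ the same strategy would need about $2^{160}$ evaluations, which is why its success probability in polynomial time is negligible.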
Collision resistance. Let $(D, R, K, H)$ be a family of hash functions with domain $D = \Sigma^{\ell(n)}$ and range $R = \Sigma^{n}$. Let $F$ be a collision-string finder that, on input $k \in K$, outputs in polynomial time either the failure symbol "?" or a pair $x, x' \in D$ such that $x \neq x'$ and $h_k(x) = h_k(x')$. For every such $F$ we require that
$$\Pr_{k \in_R K}\big[\, F(k) \neq \text{``?''} \,\big] < \epsilon,$$
where the probability is taken over the random choices of $F$ and over its input $k \in K$.
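A generic collision-string finder needs far fewer evaluations than a preimage search: by the birthday paradox, $q$ uniformly random $n$-bit outputs contain a collision with probability roughly $1 - e^{-q(q-1)/2^{n+1}}$, so about $2^{n/2}$ queries suffice (about $2^{80}$ for the 160-bit fingerprints mentioned earlier). The sketch below is a birthday-style finder against a toy keyed family, namely keyed BLAKE2b truncated to 32 bits; the truncation, the names `h` and `collision_finder`, and the query budget are assumptions for illustration. Returning `None` plays the role of the failure symbol "?".

```python
import hashlib
from typing import Dict, Optional, Tuple

N_BITS = 32  # toy output length, so roughly 2**16 queries are expected

def h(k: bytes, x: bytes) -> int:
    """Toy family member h_k: top N_BITS bits of BLAKE2b keyed with k."""
    d = hashlib.blake2b(x, key=k, digest_size=32).digest()
    return int.from_bytes(d, "big") >> (256 - N_BITS)

def collision_finder(k: bytes, max_queries: int = 1 << 20) -> Optional[Tuple[bytes, bytes]]:
    """Birthday-style collision-string finder F.

    Hashes distinct inputs 0, 1, 2, ... and remembers each output. Returns
    (x, x') with x != x' and h_k(x) == h_k(x'), or None ("?") if no
    collision appears within max_queries evaluations.
    """
    seen: Dict[int, bytes] = {}
    for i in range(max_queries):
        x = i.to_bytes(8, "big")
        y = h(k, x)
        if y in seen:
            return seen[y], x  # earlier input with the same output
        seen[y] = x
    return None

pair = collision_finder(b"example key")
```

The dictionary trades memory for time; memory-less variants based on cycle finding exist, but this version keeps the birthday argument visible.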
The work [Rogaway & Shrimpton, 2004] studies the relations (implications and separations) between these properties and further security notions known for hash functions.
Finally, we close this list with an idealized model assumed in the security analysis of many cryptographic applications: the random oracle model, introduced by Bellare and Rogaway in [Bellare & Rogaway, 1996]. In this model, a hash function $h : D \to R$ is chosen uniformly at random from the set of all functions from $D$ to $R$. Moreover, $h$ is not given by a formula or an algorithm that computes its outputs; the only way to obtain the value $h(x)$ for some $x \in D$ is to query the oracle. One can think of this as looking up a huge codebook that lists, for every possible $x \in D$, a completely random value $h(x) \in R$.
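The codebook never has to be written down in full. A common way to simulate a random oracle, sketched below, is lazy sampling: each new query receives a fresh uniformly random answer, which is recorded so that repeated queries are answered consistently. Seen from the outside this behaves exactly like a uniformly chosen function. The class name `RandomOracle` and the 32-byte output length are hypothetical choices for illustration.

```python
import secrets
from typing import Dict

class RandomOracle:
    """Lazily sampled random function h : D -> {0,1}^(8 * n_bytes)."""

    def __init__(self, n_bytes: int = 32):
        self.n_bytes = n_bytes
        # The part of the "codebook" that has been revealed so far.
        self._table: Dict[bytes, bytes] = {}

    def query(self, x: bytes) -> bytes:
        """Return h(x), sampling it uniformly at random on first use."""
        if x not in self._table:
            self._table[x] = secrets.token_bytes(self.n_bytes)
        return self._table[x]

oracle = RandomOracle()
first = oracle.query(b"hello")
again = oracle.query(b"hello")  # same value: answers are fixed once sampled
```

Security proofs in this model typically let the reduction play the oracle in exactly this way, which is what allows it to observe or program the adversary's queries.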
Constructions and issues
The design of cryptographic hash functions started with the iterated structure proposed by Damgård in [Damgård, 1989]. The basic idea of this structure is to split the message to be hashed into blocks of fixed length and to process them one block at a time with a compression function. The idea was efficient and elegant, and it has inspired a growing body of work on the relations between the properties of the compression function and those of the resulting hash function. This structure also underlies two celebrated series of hash functions that are massively used in cryptography: MDx (x = 4, 5) and SHA-y (y = 0, 1). The first series is due to Rivest: MD4 appeared in 1990 and was later replaced by MD5 because of weaknesses in MD4. The second series, SHA-y (Secure Hash Algorithm), was proposed by NIST in 1992 (SHA-0) and 1994 (SHA-1). Other constructions of hash functions are based on block ciphers or on algebraic structures, for instance elliptic curves. The advantage of such constructions is that they benefit from the extensive study of their underlying structures; in the case of algebraic constructions, one can even obtain formal security proofs. However, these constructions remain slow compared to dedicated hash functions.
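The iterated structure can be sketched as follows. Everything concrete here is an assumption for illustration: the block and chaining-value lengths, the all-zero initial value, the compression function (a stand-in that applies SHA-256 to the chaining value concatenated with the block), and the padding rule, which appends a 1 bit, zeros, and the 64-bit message length, in the spirit of the padding used by MD5 and SHA-1. The names `compress`, `pad`, and `iterated_hash` are hypothetical.

```python
import hashlib

BLOCK_BYTES = 64       # message block length (toy choice: 512 bits)
CV_BYTES = 32          # chaining value and output length (toy choice: 256 bits)
IV = bytes(CV_BYTES)   # all-zero initial chaining value (toy choice)

def compress(cv: bytes, block: bytes) -> bytes:
    """Stand-in compression function f : {0,1}^256 x {0,1}^512 -> {0,1}^256."""
    return hashlib.sha256(cv + block).digest()

def pad(message: bytes) -> bytes:
    """Append byte 0x80, then zero bytes, then the 8-byte big-endian bit
    length of the message, so that the result is a whole number of blocks."""
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_BYTES)
    return padded + (8 * len(message)).to_bytes(8, "big")

def iterated_hash(message: bytes) -> bytes:
    """Split the padded message into blocks and chain them through f."""
    cv = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        cv = compress(cv, padded[i:i + BLOCK_BYTES])
    return cv  # the last chaining value is the fingerprint

digest = iterated_hash(b"a message longer than one block " * 3)
```

Encoding the message length in the last block is what lets one argue that a collision of the full hash yields a collision of the compression function, which is the kind of relation between the two studied in the work cited above.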
The current state of the art is that most of the practical hash function proposals discussed above have been broken. MD4 was first shown to have collisions in 1996 by Hans Dobbertin [Dobbertin, 1996], and a more efficient collision attack was later found by Wang's team in China [Wang et al., 2005]; generating MD4 collisions is now about as fast as verifying them. MD5 was similarly partially cryptanalyzed by Dobbertin [Dobbertin, 1996] and later fully broken by the same Chinese team [Wang & Yu, 2005]. SHA-0 and SHA-1 met the same fate: weaknesses were identified that argue against keeping them in use. SHA-2 (a set of four hash algorithms, namely SHA-224, SHA-256, SHA-384, and SHA-512) remains unbroken so far; however, it is algorithmically close to SHA-1, which has drawn cryptanalytic efforts aimed at breaking it. This has motivated the search for a new hash standard, SHA-3, to be selected through an open competition running from fall 2008 to 2012.