attack, then there is no need to encrypt it.
10.4 Message Digest Functions
Message digest functions distill the information contained in a file (small or large) into a single large number, typically between 128 and 256 bits in length. This is illustrated in Figure 10.3. The best message digest functions combine these mathematical properties:
• Every bit of the message digest function's output is influenced by every bit of the function's input.
• If any given bit of the function's input is changed, every output bit has a 50 percent chance of changing.
• Given an input file and its corresponding message digest, it should be computationally infeasible to find another file with the same message digest value.

Figure 10.3. A message digest function
Message digests are also called one-way hash functions because they produce values that are difficult to invert, resistant to attack, mostly unique, and widely distributed.
Many message digest functions have been proposed and are in use today. Here are just a few:
HMAC
The Hash-based Message Authentication Code, a technique that uses a secret key and a message digest function to create a secret message authentication code. The HMAC method strengthens an existing message digest function, making the resulting code resistant to attack even if the underlying message digest function has certain weaknesses, such as known collisions. (See RFC 2104 for details.)
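The HMAC construction in RFC 2104 wraps the digest function twice around a padded key: HMAC(K, m) = H((K XOR opad) || H((K XOR ipad) || m)). The following Python sketch implements that formula by hand and checks it against the standard library's `hmac` module. The choice of SHA-256 as the underlying digest (with its 64-byte block size), the function name `hmac_sha256`, and the sample key are our assumptions for illustration; RFC 2104 works with any iterated hash function.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC as defined in RFC 2104, instantiated here with SHA-256."""
    block_size = 64  # SHA-256 processes its input in 64-byte blocks
    # Keys longer than one block are first hashed down to digest size.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    # Shorter keys are padded with zero bytes to a full block.
    key = key.ljust(block_size, b"\x00")
    ipad = bytes(k ^ 0x36 for k in key)  # inner padding
    opad = bytes(k ^ 0x5C for k in key)  # outer padding
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


key = b"shared secret"  # hypothetical shared key
msg = b"There is $1500 in the blue box."
tag = hmac_sha256(key, msg)

# Cross-check against Python's standard-library implementation.
assert tag == hmac.new(key, msg, hashlib.sha256).digest()
print(tag.hex())
```

When a receiver verifies a tag, it should compare the received and recomputed values with `hmac.compare_digest`, which avoids leaking timing information about how many leading bytes matched.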
MD2
Message Digest #2, developed by Ronald Rivest. This message digest is the most secure of Rivest's message digest functions, but takes the longest to compute. It produces a 128-bit digest.
MD4
Message Digest #4, also developed by Ronald Rivest. This message digest algorithm was developed as a fast alternative to MD2. Subsequently, MD4 has been shown to be insecure. That is, it is possible to find two files that produce the same MD4 codes without requiring a brute force search. MD4 produces a 128-bit digest.
MD5
Message Digest #5, also developed by Ronald Rivest. MD5 is a modification of MD4 that includes techniques designed to make it more secure. Although widely used, in the summer of 1996 a few flaws were discovered in MD5 that allowed some kinds of collisions to be calculated. As a result, MD5 is slowly falling out of favor. MD5 produces a 128-bit digest.
SHA
The Secure Hash Algorithm, developed by the NSA and designed for use with the National Institute of Standards and Technology's Digital Signature Standard (NIST's DSS). Shortly after the publication of the SHA, NIST announced that it was not suitable for use without a small change. SHA produces a 160-bit digest.
SHA-1
The revised Secure Hash Algorithm, also developed by the NSA and designed for use with NIST's DSS. SHA-1 incorporates minor changes from SHA. It is not known whether these changes make SHA-1 more secure than SHA, although some people believe that they do. SHA-1 produces a 160-bit digest.
Besides these functions, it is also possible to use traditional symmetric block encryption systems such as the DES as message digest functions. To use an encryption function as a message digest function, simply run the encryption function in cipher feedback mode. For a key, use a key that is randomly chosen and specific to the application. Encrypt the entire input file. The last block of encrypted data is the message digest.
10.4.1 Message Digest Algorithms at Work
Message digest algorithms themselves are not used for encryption and decryption operations. Instead, they are used in the creation of digital signatures, message authentication codes (MACs), and the creation of encryption keys from passphrases.
The easiest way to understand message digest functions is to look at them at work. Consider the message digest algorithm MD5, developed by Ronald Rivest and distributed by RSA Data Security. The following example shows some inputs to the MD5 function and the resulting MD5 codes:
MD5(There is $1500 in the blue box.)  = 05f8cfc03f4e58cbee731aa4a14b3f03
MD5(The meeting last week was swell.) = 050f3905211cddf36107ffc361c23e3d
MD5(There is $1100 in the blue box.)  = d6dee11aae89661a45eb9d21e30d34cb

Notice that all of these messages have dramatically different MD5 codes. Even the first and the third messages, which differ by only a single character (and, within that character, by only a single binary bit), have completely different message digests. The message digest appears almost random, but it's not. Let's look at a few more message digests:
MD5(There is $1500 in the blue bo)   = f80b3fde8ecbac1b515960b9058de7a1
MD5(There is $1500 in the blue box)  = a4a5471a0e019a4a502134d38fb64729
MD5(There is $1500 in the blue box.) = 05f8cfc03f4e58cbee731aa4a14b3f03
MD5(There is $1500 in the blue box!) = 4b36807076169572b804907735accd42
MD5(There is $1500 in the blue box..) = 3a7b4e07ae316eb60b5af4a1a2345931
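You can reproduce digests like these with Python's standard `hashlib` module. The sketch below assumes that each example string was hashed as plain ASCII with no trailing newline; if the bytes differ in any way (for example, the shell's `echo` command appends a newline unless told otherwise), the digest will be completely different, which is exactly the behavior the examples illustrate.

```python
import hashlib

# The example strings from the text. Assumption: ASCII bytes, no trailing newline.
messages = [
    "There is $1500 in the blue bo",
    "There is $1500 in the blue box",
    "There is $1500 in the blue box.",
    "There is $1500 in the blue box!",
    "There is $1500 in the blue box..",
]

# Map each message to its MD5 code as a 32-character hexadecimal string.
digests = {m: hashlib.md5(m.encode("ascii")).hexdigest() for m in messages}

for m, d in digests.items():
    print(f"MD5({m}) = {d}")
```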
Consider the third line of MD5 code in the example above: it is exactly the same as the first line of MD5 code shown previously. This is because the same text always produces the same MD5 code. The message digest function is therefore a powerful tool for detecting very small changes in very large files or messages: calculate the MD5 code for your message and set it aside. If you think that the file has been changed (either accidentally or on purpose), simply recalculate the MD5 code and compare it with the MD5 code that you originally calculated. If they match, there is an excellent chance that the file was not modified.

Because a message digest has a fixed length while its inputs can be of any length, two different files must sometimes have the same message digest value. This is called a collision. For a message digest function to be secure, it should be computationally infeasible to find or produce these collisions.
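The "calculate, set aside, recalculate, compare" procedure can be sketched in Python as follows. The function `file_digest` and the file name `ledger.txt` are hypothetical names for this illustration. The file is read in fixed-size chunks so that very large files need not fit in memory. MD5 is the default only to match the text; given the MD5 collision weaknesses described in this section, passing `"sha256"` is the safer choice for new work.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5") -> str:
    """Hash a file in 64 KB chunks and return the hexadecimal digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ledger.txt"  # hypothetical file name
    path.write_bytes(b"There is $1500 in the blue box.\n")

    saved = file_digest(path)  # calculate the code and set it aside

    # Later, someone makes a one-bit change ('5' -> '1').
    path.write_bytes(b"There is $1100 in the blue box.\n")

    # Recalculate and compare with the saved code.
    unchanged = file_digest(path) == saved
    print("unchanged" if unchanged else "file was modified")
```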
10.4.2 Uses of Message Digest Functions
Message digest functions are widely used today for a number of reasons:
• Message digest functions are much faster than traditional symmetric key cryptographic functions but appear to share many of their strong cryptographic properties.
• There are no patent restrictions on any message digest functions that are currently in use.
• There are no export restrictions on message digest functions.
• They appear to provide an excellent means of spreading the randomness (entropy) from an input among all of the function's output bits.[59]
• Using a message digest, you can create encryption keys for symmetric key ciphers by allowing users to type passphrases. The encryption key is then produced by computing the message digest of the phrase that was typed. PGP uses this technique to compute the keys for symmetric encryption, which it uses both for its "conventional encryption" function and to encrypt the user's private key.
• Message digests can readily be used to build message authentication codes (MACs), which use a secret shared between two parties to prove that a message is authentic. MACs are appended to the end of the message to be verified. (RFC 2104 describes how to use keyed hashing for message authentication.) Message digest functions are also an important part of many public key cryptography systems.
• Message digests are the basis of most digital signature standards. Instead of signing the entire document, most digital signature standards simply sign a message digest of the document.
• MACs based on message digests provide the "cryptographic" security for most of the Internet's routing protocols.

It is somewhat disconcerting that there is little published theoretical basis behind message digest functions.

10.4.3 Attacks on Message Digest Functions

There are two kinds of attacks on message digest functions. The first is to find two messages, any two messages, that have the same message digest. The second is more demanding: given a particular message, find a second message that has the same message digest code. Such a second message is especially valuable to an attacker if it is human-readable, in the same language, and in the same word processor format as the first.
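The first kind of attack can always be attempted by brute force. Because of the birthday paradox, hashing roughly 2^(n/2) distinct messages is expected to produce two with the same n-bit digest; for a 128-bit digest such as MD5 that is about 2^64 operations, which is why collisions found through mathematical weaknesses, rather than brute force, are so significant. The Python sketch below runs the generic birthday search against a deliberately weakened toy digest: SHA-256 truncated to its first 3 bytes (24 bits). The truncation and the function names are our own illustrative assumptions; this is not an attack on any real digest function.

```python
import hashlib
from itertools import count


def truncated_digest(message: bytes, nbytes: int = 3) -> bytes:
    """First `nbytes` bytes of SHA-256: a deliberately weakened toy digest."""
    return hashlib.sha256(message).digest()[:nbytes]


def find_collision(nbytes: int = 3) -> tuple[bytes, bytes]:
    """Birthday search: hash distinct messages until two share a digest."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = f"There is ${i} in the blue box.".encode()  # every message is distinct
        d = truncated_digest(msg, nbytes)
        if d in seen:
            return seen[d], msg
        seen[d] = msg
    raise RuntimeError("unreachable: count() never ends")


m1, m2 = find_collision()
print(m1, m2, truncated_digest(m1).hex())
```

With a 24-bit toy digest, the search typically finishes after a few thousand messages; each extra byte of digest multiplies the expected work by 16.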
Message digest functions have become such an important part of the public key cryptography infrastructure and working public key cryptography systems that a workable attack on a message digest function can significantly weaken the security of an entire cryptosystem. For this reason, when a series of collisions in the MD5 algorithm was discovered, the IETF TLS working group (Chapter 12 describes this group) decided to abandon MD5 and instead use HMAC for message authentication.
MD5 is probably secure enough to be used over the next five to ten years. Even if it becomes possible to find MD5 collisions at will, it will be very difficult to transform this knowledge into a general purpose attack on SSL. However, it is better to have a message digest function that does not have any known weaknesses, which is the reason for the IETF's decision to move to a more secure algorithm.
[59] To generate a "random" number, simply take a whole bunch of data sources that seem to change over time, such as log files, time-of-day clocks, and user input, and run all of the information through a message digest function. If there are more bits' worth of entropy in an input block than there are output bits of the hash, then all of the output bits can be assumed to be independent and random, provided that the message digest function is secure.
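The recipe in this footnote can be sketched in Python as shown below. SHA-256 is our choice of digest. The particular sources (a nanosecond clock reading, the process ID, and a placeholder standing in for log-file contents) are illustrative assumptions, and most of them carry very little entropy. The function name `pool_entropy` is hypothetical. Each source is length-prefixed so that different splits of the same bytes cannot yield identical hash input. In practice, Python programs should take random bytes from `os.urandom` or the `secrets` module, which draw on the operating system's own entropy pool.

```python
import hashlib
import os
import time


def pool_entropy(*sources: bytes) -> bytes:
    """Run several changing data sources through SHA-256, yielding 256 bits."""
    h = hashlib.sha256()
    for s in sources:
        # Prefix each source with its length so that, e.g., (b"a", b"bc")
        # and (b"ab", b"c") feed different bytes into the hash.
        h.update(len(s).to_bytes(8, "big"))
        h.update(s)
    return h.digest()


seed = pool_entropy(
    time.time_ns().to_bytes(8, "big"),        # time-of-day clock
    str(os.getpid()).encode(),                # process ID
    b"contents of a log file would go here",  # placeholder source
)
print(seed.hex())
```

The output is only as unpredictable as the entropy that goes in: hashing spreads randomness across all 256 output bits but cannot create more of it.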
Why Publish Your Attack?
For years, cryptography has been an academic discipline, with cryptographers publishing their results in journals, on the Internet, and at prestigious conferences.
As time progresses and cryptography increasingly becomes the basis of electronic commerce, this trend may stop. Instead of publishing their results, some mathematicians may decide to exploit them and use them as tools for defrauding banks and other financial institutions.
Whether or not this approach succeeds is anybody's guess. There's vastly more money to be made in fraud than in academia. On the other hand, it's unlikely that banks will rely solely on the strength of their cryptographic protocols to protect their assets.
10.5 Public Key Infrastructure
The last piece of the cryptography puzzle is a system for establishing the identity of people who hold cryptographic keys. In recent years, such a system has come to be called the public key infrastructure, as we discussed in Chapter 6.
Recall that public key encryption systems require that each user create two keys:
• A public key, which is used for sending encrypted messages to the user and for verifying the user's digital signature.
• A secret key, which is used by the user for decrypting received messages and for creating the user's digital signature.

While secret keys are designed to be kept secret, public keys are designed to be published and widely distributed.
Schematically, you might imagine that public and secret keys contain little information other than the actual values that are needed for public key encryption and decryption, as shown in Figure 10.4.
Figure 10.4. A simplistic idea for storing public and secret keys
It turns out, though, that we need to store more information with each public key. In addition to the encryption information, we may wish to store the user's name (see Figure 10.5) or some other kind of identifying information.
Figure 10.5. A better representation for public and secret keys, containing space for the user's name
The name field can contain anything that the key holder wishes. It might contain "Sascha Strathmore," or it might contain "S. Strathmore," "Ahcsas Obsidian," or even "Head Honcho." Once the key is created with a name, it can be signed by a third party. Third parties that verify the information on the key before it is signed are called certification authorities; these are described in detail in Chapter 7.