A Representative of Larger Data

If you’re looking for something to produce a representative of a larger amount of data, it’s easy to see that a message digest does that job fairly well. First, the output of a digest algorithm is usually smaller than the data itself, and no matter how big the data gets, the digest as a representative will always be the same size. If someone tries to surreptitiously change the original message, the new, fake message will not produce the same digest. If the digest produced by the algorithm does not represent the data, you know that something went wrong (see Figure 5-7). Maybe the data has been altered, maybe the digest is wrong. You might not know what exactly happened, but you do know something happened.

Here’s how an application can check a digest. Pao-Chi is sending Daniel some data, such as an e-mail or a contract; for this example, it’s the message about selling four units to Satomi. Before Pao-Chi sends the message, he digests it. Now he sends the data and the digest. When Daniel gets the data, he also digests it. If his digest matches Pao-Chi’s, he knows the data has not been changed in transit. If Satomi had intercepted and altered the message, the digest that Daniel produced would not have matched the digest Pao-Chi produced. Daniel would know that something happened and would not trust the data.
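The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the message text is an assumption modeled on the example messages in this chapter, the helper names `digest` and `check` are hypothetical, and SHA-1 is used only because the chapter's later examples use it.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()

# Assumed message text, modeled on the example messages in this chapter.
sent_message = b"Daniel, I sold 4 presses to Satomi. Ship immediately."
sent_digest = digest(sent_message)  # Pao-Chi sends the message and this digest

def check(received_message: bytes, received_digest: str) -> bool:
    """Daniel's check: digest what arrived and compare with the digest sent along."""
    return digest(received_message) == received_digest

# An unaltered message passes the check.
print(check(sent_message, sent_digest))

# A message altered in transit (still paired with the original digest) fails.
altered = b"Daniel, I sold 5 presses to Satomi. Ship immediately."
print(check(altered, sent_digest))
```

Note that nothing here stops an attacker from replacing the digest as well as the message, which is the objection the next paragraph takes up.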

Your immediate response might be, “If Satomi could alter the data, she could alter the digest.” That’s true, but there are two ways to prevent that. One is to use a digital signature, a topic we’ll return to shortly. For now, let’s look at the second way: a keyed digest. The most common keyed digest is called HMAC.

HMAC

MAC stands for message authentication checksum (or message authentication code), and H stands for hash or hash-based function, so an HMAC is a hash-based message authentication code.

Chapter 5

Figure 5-7 If the data does not match the digest, you know that something went wrong

Suppose you had a column of numbers (say, in an accountant’s ledger). If the correct numbers are there, the sum of the column is a specific value. Later, to check that the ledger is still correct, you don’t compare each number individually; rather, you find the sum of the column. If the second sum matches the first sum, the check passes. Of course, if someone can change one number, it’s also easy to change the sum at the bottom of the ledger so that it matches the change in the single number. It would also be easy to change another number in the column to offset the first change.

A MAC is a way to detect changes in the data or in the sum. To detect changes in the data, a MAC can be based on a digest, block cipher, or stream cipher (see Chapter 2). To detect changes in the actual checksum, the MAC uses a key.

Most HMACs work this way. Two parties share a secret key (Chapter 4 shows how that’s done), and then each digests the key and the message. The digest depends on both the message and the key, so an attacker would have to know the key to alter the message and attach a correct checksum.

For example, suppose Pao-Chi sends Daniel message 1 shown earlier (the message instructing him to ship four units to Satomi). Pao-Chi uses an HMAC so that Daniel can verify that the data did not change. Using a key exchange algorithm (RSA, DH, ECDH), the two agree on a 128-bit key. Pao-Chi uses SHA-1 to digest the key and message as one chunk of data. The result is as follows. (The two vertical lines || indicate concatenation; see also Figure 5-8.)

Pao-Chi’s HMAC result (SHA-1 digest of key || message 1): 60 c4 65 a8 a4 9d 35 6a 68 36 f8 f0 56 3d d2 7f 7e 26 35 b2
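As a rough sketch of the computation just described, the following Python digests the key and the message as one chunk. The key below is a randomly generated stand-in and the message text is an assumption, so the output will not match the value printed above. The function names are hypothetical. For comparison, the sketch also calls Python's standard `hmac` module: the standardized HMAC construction (RFC 2104) applies the hash twice with a padded key rather than digesting key || message once, so the two results differ.

```python
import hashlib
import hmac
import secrets

# Hypothetical 128-bit shared key; in the text it comes from a key exchange.
key = secrets.token_bytes(16)

# Assumed text for message 1, modeled on the example messages in this chapter.
message_1 = b"Daniel, I sold 4 presses to Satomi. Ship immediately."

def keyed_digest(key: bytes, message: bytes) -> bytes:
    """Simplified keyed digest as described in the text: SHA-1(key || message)."""
    return hashlib.sha1(key + message).digest()

def standard_hmac(key: bytes, message: bytes) -> bytes:
    """Standardized HMAC with SHA-1, computed by Python's hmac module."""
    return hmac.new(key, message, hashlib.sha1).digest()

tag = keyed_digest(key, message_1)
print("SHA-1(key || message):", tag.hex(" "))
print("HMAC-SHA-1 (RFC 2104):", standard_hmac(key, message_1).hex(" "))
```

Both values are 20 bytes long, and both change completely if either the key or the message changes. In practice, a library HMAC routine is the safer choice over a hand-rolled key || message digest.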

NOTE:

We haven’t told you what the key is, so you can’t verify that the result we present is the actual result of an HMAC. If you want to know what the key is, you can figure it out. Put together a chunk of data (a key candidate followed by the message) and then digest it. Is it the same result given here? No? Try another key, and another, and so on until you find the correct one. It’s a 128-bit key.
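The search the note describes can be sketched as follows. To keep the loop short, this illustration uses a toy 16-bit key rather than a 128-bit one; the key value, the message text, and the function names are all assumptions made for the example.

```python
import hashlib
from typing import Optional

# Assumed message text, modeled on the example messages in this chapter.
message = b"Daniel, I sold 4 presses to Satomi. Ship immediately."

def keyed_digest(key: bytes, message: bytes) -> bytes:
    """Simplified keyed digest as described in the text: SHA-1(key || message)."""
    return hashlib.sha1(key + message).digest()

def brute_force(target: bytes, message: bytes, key_bytes: int) -> Optional[bytes]:
    """Try every key of the given length until the keyed digest matches target."""
    for candidate in range(2 ** (8 * key_bytes)):
        key = candidate.to_bytes(key_bytes, "big")
        if keyed_digest(key, message) == target:
            return key
    return None

# Toy 16-bit key so the search finishes quickly; the key in the text is 128 bits.
secret_key = bytes([0x3A, 0x7F])
target = keyed_digest(secret_key, message)

found = brute_force(target, message, key_bytes=2)
print("recovered toy key:", found.hex() if found else None)
print("16-bit key space: ", 2 ** 16, "candidates")
print("128-bit key space:", 2 ** 128, "candidates")
```

The toy search covers at most 65,536 candidates. Each additional key bit doubles the work, which is why the same approach is hopeless against a 128-bit key.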

Now Pao-Chi sends Daniel the message and the HMAC result together. Suppose that Satomi intercepts the transmission and tries to get Daniel to ship five presses instead of four by substituting message 2 for Pao-Chi’s. After replacing the message, she sends it to Daniel. If she failed to replace the HMAC result, Daniel would digest the key and fake message and get the following (see Figure 5-9).

Daniel’s HMAC result (SHA-1 digest of key || message 2): a8 32 3b 8d f3 6b 3e e1 08 bb 6b 0b f0 cc a5 5b 26 d4 d1 41

Figure 5-8 The HMAC algorithm digests the key and the data (in that order) to produce a value

Figure 5-9 Daniel digests the correct key but the wrong message, so he knows that something is wrong

Because his result does not match the one Pao-Chi sent, Daniel knows that what Pao-Chi digested and what he digested are not the same. Something (maybe the key, maybe the actual message, maybe even the HMAC value) was changed. Daniel doesn’t know exactly what was changed, but that doesn’t matter. He knows something went wrong. He contacts Pao-Chi again, and they start over.

Another possibility is for Satomi to substitute message 2 for message 1 and substitute the HMAC. But the problem is that Satomi can’t know what the correct HMAC value should be. To demonstrate this, suppose Satomi substitutes six presses for four presses. Here’s the SHA-1 digest.

Daniel, I sold 6 presses to Satomi. Ship immediately.

SHA-1 digest: 66 05 40 8c 24 6e 05 f8 00 20 f4 72 14 08 bc 22 53 b2 eb d2

If Satomi substitutes this digest, Daniel will still know something is wrong because that’s not the value he’s going to get. He’s not digesting the message; rather, he’s digesting the key and the message. So what should Satomi use?
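The cases discussed so far can be put side by side in a short sketch. It assumes the simplified key || message digest from the text, a made-up 128-bit key, and an assumed wording for the original message; `daniel_accepts` is a hypothetical helper name. Satomi does not have the key, so the only value she can compute for her fake message is a plain SHA-1 digest.

```python
import hashlib
import hmac

# Hypothetical 128-bit key shared by Pao-Chi and Daniel; Satomi does not know it.
key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")

# Assumed text for the original message; the fake message is quoted from the text.
message_1 = b"Daniel, I sold 4 presses to Satomi. Ship immediately."
fake_message = b"Daniel, I sold 6 presses to Satomi. Ship immediately."

def keyed_digest(key: bytes, message: bytes) -> bytes:
    """Simplified keyed digest as described in the text: SHA-1(key || message)."""
    return hashlib.sha1(key + message).digest()

def daniel_accepts(message: bytes, tag: bytes) -> bool:
    """Daniel digests the key and the received message, then compares with the tag."""
    expected = keyed_digest(key, message)
    return hmac.compare_digest(expected, tag)  # constant-time comparison

genuine_tag = keyed_digest(key, message_1)

# Case 1: nothing was changed in transit.
print(daniel_accepts(message_1, genuine_tag))

# Case 2: Satomi replaces the message but leaves Pao-Chi's tag in place.
print(daniel_accepts(fake_message, genuine_tag))

# Case 3: Satomi replaces the message and attaches a plain SHA-1 digest of it.
satomi_tag = hashlib.sha1(fake_message).digest()
print(daniel_accepts(fake_message, satomi_tag))
```

Only the first case is accepted. In the third case Satomi's digest is a correct SHA-1 digest of her message, but Daniel compares against a digest that also covers the key, so the values differ.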