CS3235 - Computer Security
Eleventh topic: Hashes and signatures
Hugh Anderson
National University of Singapore School of Computing
Outline
1 Hash functions
Simple hash functions (Math: polynomial arithmetic)
Cryptographic hash functions (Math: birthday paradox)
What is a hash function?
Protecting data (I and A)...
A hash function maps a long message, or a large amount of data, to a (shorter) check message of some sort. We attach the hashed value to the original message, as a check. Cryptographic functions may be used to construct hash functions.
Hash functions can indicate if data has been corrupted (perhaps through noisy message transmission), i.e. integrity. They can also be used as part of a scheme to check/confirm who sent a message (a digital signature), i.e. authentication.
Error detection
Checking for errors...
Transmit data:           1  65  3  22  47  2
Transmit data+checksum:  1  65  3  22  47  2  140
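The checksum 140 above is just the sum of the six data values. A minimal sketch of sender and receiver, assuming the check value is the plain arithmetic sum (the function names are illustrative, not from the slides):

```python
def sum_checksum(values):
    """Return the plain arithmetic sum of the data values."""
    return sum(values)

def frame_ok(frame):
    """Receiver side: recompute the sum and compare it with the last value."""
    *payload, check = frame
    return sum_checksum(payload) == check

data = [1, 65, 3, 22, 47, 2]
frame = data + [sum_checksum(data)]   # data followed by its checksum
print(frame)                          # [1, 65, 3, 22, 47, 2, 140]

# One changed value is caught, but two errors that cancel (+1 and -1) are not.
one_error = [1, 65, 3, 22, 48, 2, 140]
cancelling_errors = [2, 64, 3, 22, 47, 2, 140]
print(frame_ok(one_error), frame_ok(cancelling_errors))   # False True
```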
One-way parity for message “A0DBBC”
Use XOR to find the parity of each bit...
A        0 1 0 0 0 0 0 1
0        0 0 1 1 0 0 0 0
D        0 1 0 0 0 1 0 0
B        0 1 0 0 0 0 1 0
B        0 1 0 0 0 0 1 0
C        0 1 0 0 0 0 1 1
Check:   0 1 1 1 0 1 1 0
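The check row is the XOR of the six character codes, taken column by column. A short sketch, assuming 8-bit ASCII codes for the characters (`column_parity` is an illustrative name):

```python
from functools import reduce

def column_parity(message: bytes) -> int:
    """XOR all bytes together: bit i of the result is the parity of column i."""
    return reduce(lambda a, b: a ^ b, message, 0)

check = column_parity(b"A0DBBC")
print(format(check, "08b"))   # 01110110

# Flipping one bit changes the check value ...
one_flip = b"@0DBBC"          # 'A' (01000001) -> '@' (01000000)
# ... but flipping the same column in two characters cancels out.
two_flips = b"@1DBBC"         # also '0' (00110000) -> '1' (00110001)
print(column_parity(one_flip) == check, column_parity(two_flips) == check)   # False True
```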
Two way parity for message “A0DBBC”
Both vertical and horizontal...
A        0 1 0 0 0 0 0 1   0
0        0 0 1 1 0 0 0 0   0
D        0 1 0 0 0 1 0 0   0
B        0 1 0 0 0 0 1 0   0
B        0 1 0 0 0 0 1 0   0
C        0 1 0 0 0 0 1 1   1
Check:   0 1 1 1 0 1 1 0   X
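With both row and column parities, a single flipped bit shows up as exactly one wrong row and one wrong column, so it can be located. A sketch under that assumption (the function names are illustrative, and the corner value X is not modelled):

```python
def row_parity(byte: int) -> int:
    """Parity (XOR) of the eight bits of one character."""
    return bin(byte).count("1") % 2

def two_way_parity(message: bytes):
    """Return (list of row parities, column-parity byte)."""
    rows = [row_parity(b) for b in message]
    cols = 0
    for b in message:
        cols ^= b
    return rows, cols

def locate_single_error(received: bytes, rows, cols):
    """Return (row, bit position counted from the left) of one flipped bit, else None."""
    new_rows, new_cols = two_way_parity(received)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, new_rows)) if a != b]
    diff = cols ^ new_cols
    if len(bad_rows) == 1 and bin(diff).count("1") == 1:
        return bad_rows[0], 8 - diff.bit_length()
    return None

rows, cols = two_way_parity(b"A0DBBC")
print(rows, format(cols, "08b"))   # [0, 0, 0, 0, 0, 1] 01110110

# 'D' (01000100) corrupted to 'L' (01001100): row 2, fifth bit from the left.
print(locate_single_error(b"A0LBBC", rows, cols))   # (2, 4)
```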
Simple check codes
Simple systems are OK, but...
The simple sum of values is easy to calculate, but has problems with repetitive errors.
The parity of bits scheme is easy to calculate, and detects all 1-bit errors, but it misses 2-bit errors that fall in the same bit position.
Horizontal and vertical parity is better, but has problems with repetitive errors.
... we want a better level of error check codes.
Cyclic redundancy check codes
A scheme using remainder-after-polynomial-division...
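Treat the stream of transmitted bits as a polynomial whose coefficients are the bits:
$10110 = x^4 + x^2 + x^1 = F(x)$
[Figure: sender and receiver. The sender forms T(x) from the data F(x) and a checksum, the remainder after division by g(x). The transmitted stream picks up errors E(x) on its way to the receiver.]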
Can a stream with errors have no remainder?
Single bits? - No. A single-bit error means that E(x) will have only one term ($x^{1285}$, say). If the generator polynomial has the form $x^n + \ldots + 1$, it will never divide E(x) evenly.
Multiple bits? - Various generator polynomials are used, with different properties. One factor of the polynomial must be $x^1 + 1$, because this catches all odd numbers of bit errors.
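These two claims can be checked numerically. The sketch below stores a polynomial over GF(2) as a Python integer (bit i is the coefficient of $x^i$) and uses the CRC-16 generator $x^{16} + x^{15} + x^2 + 1$, which has $x^1 + 1$ as a factor. The helper name `gf2_mod` and the sample error patterns are illustrative assumptions.

```python
def gf2_mod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by g, with coefficients mod 2."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)   # subtract (XOR) a shifted copy of g
    return a

CRC16 = (1 << 16) | (1 << 15) | (1 << 2) | 1

single_bit = 1 << 1285                       # E(x) = x^1285
three_bits = (1 << 40) | (1 << 17) | (1 << 3)  # an odd number of bit errors
undetected = CRC16 << 7                      # E(x) that is a multiple of g(x)

print(gf2_mod(single_bit, CRC16) != 0)   # True: detected
print(gf2_mod(three_bits, CRC16) != 0)   # True: detected
print(gf2_mod(undetected, CRC16) == 0)   # True: this error pattern slips through
```

Only error patterns that are exact multiples of g(x) leave no remainder, which is why the choice of generator matters.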
Some common generators:
Used in systems all around us...
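CRC-12 - $x^{12} + x^{11} + x^3 + x^2 + x^1 + 1$
CRC-16 - $x^{16} + x^{15} + x^2 + 1$
CRC-32 - $x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1$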
Polynomial long division is easy!
Easy is, of course, a relative term...
Generator g(x): $x^5 + x^2 + 1$ (100101) and F(x): 101101011. Divide F(x), extended with five zero bits, by g(x), and append the remainder to F(x) to get T(x):

              1010.01000
100101 ) 101101011.00000
         100101
           100001
           100101
              1001.00
              1001.01
                   01000

T(x) = 10110101101000.
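The same division can be done in a few lines of code. This is a sketch only, holding bit strings as Python integers; `gf2_mod` and `crc_append` are illustrative names, not part of any library.

```python
def gf2_mod(a: int, g: int) -> int:
    """Remainder of polynomial a divided by g, with coefficients mod 2."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)
    return a

def crc_append(bits: str, gen: str) -> str:
    """Append the CRC remainder of `bits` (for generator `gen`) to form T(x)."""
    r = len(gen) - 1                       # degree of g(x) = number of check bits
    remainder = gf2_mod(int(bits, 2) << r, int(gen, 2))
    return bits + format(remainder, f"0{r}b")

t = crc_append("101101011", "100101")
print(t)                                   # 10110101101000

# Receiver: an error-free T(x) divides evenly; a flipped bit leaves a remainder.
print(gf2_mod(int(t, 2), 0b100101))            # 0
print(gf2_mod(int(t, 2) ^ (1 << 3), 0b100101)) # 8
```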
Polynomial long division is easy!
The division can be done with very simple hardware
When this stream arrives at a decoder for checking, if the stream has no errors, the division will have no remainder
[Figure: shift-register divider built from five D flip-flops (D0 to D4) and two XOR gates, driven by the Data and Clock inputs.]
Polynomial long division is easy!
Step by step...
Input   D4 D3 D2 D1 D0
        0  0  0  0  0
1       0  0  0  0  1
0       0  0  0  1  0
1       0  0  1  0  1
1       0  1  0  1  1
0       1  0  1  1  0
1       0  1  0  0  0
↓       ↓  ↓  ↓  ↓  ↓

(At end, feed in zeroes...)

Input   D4 D3 D2 D1 D0
↓       ↓  ↓  ↓  ↓  ↓
1       0  1  0  0  1
0       1  0  0  1  0
0       0  0  0  0  1
0       0  0  0  1  0
0       0  0  1  0  0
0       0  1  0  0  0
...
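The register contents in the table can be reproduced by simulating the shift register in software. This sketch assumes the register implements g(x) = $x^5 + x^2 + 1$: the bit shifted out of D4 is fed back (XORed) into D2 and D0. The name `lfsr_divide` is illustrative.

```python
def lfsr_divide(bits: str, taps: int = 0b00101, width: int = 5):
    """Clock `bits` through the register; return the state (D4..D0) after each bit."""
    state = 0
    history = []
    mask = (1 << width) - 1
    for ch in bits:
        feedback = (state >> (width - 1)) & 1       # bit leaving D4
        state = ((state << 1) & mask) | int(ch)     # shift, new data bit into D0
        if feedback:
            state ^= taps                           # XOR into D2 and D0
        history.append(format(state, f"0{width}b"))
    return history

# F(x) = 101101011 followed by five zeroes.
for bit, state in zip("10110101100000", lfsr_divide("10110101100000")):
    print(bit, state)
# The final state, 01000, is the remainder.

# Feeding in T(x) = F(x) plus checksum leaves the register empty.
print(lfsr_divide("10110101101000")[-1])   # 00000
```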
Case study: ethernet
Case study - use of CRC in ethernet
Ethernet is used for networking computers, principally because of its speed and low cost. The maximum size of an ethernet frame is 1514 bytes (a). A 32-bit FCS (Frame Check Sequence) is calculated over the full length of the frame. The FCS used is CRC-32:
$x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x^1 + 1$
(a) 1500 bytes of data, a source and destination address each of six bytes, and a two-byte type identifier. The frame also has a synchronizing header and trailer which are not checked by the CRC.
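Python's standard `zlib.crc32` uses this same CRC-32 generator polynomial, so it gives a quick way to experiment. The sketch below is only an illustration: the byte string stands in for a frame, and the details of how a network card orders bits on the wire are not modelled.

```python
import zlib

frame = b"123456789"              # stand-in for frame contents (an assumption)
fcs = zlib.crc32(frame)
print(hex(fcs))                   # 0xcbf43926

# Flip a single bit: the recomputed CRC no longer matches the stored FCS.
corrupted = bytearray(frame)
corrupted[0] ^= 0x01
print(zlib.crc32(bytes(corrupted)) == fcs)   # False
```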
Implementation of MD5 is called md5sum or md5
hugh@sf0:~[508]$ md5sum ss.c
550114bc3cc3359e55ba33abe8983a85 ss.c
hugh@sf0:~[509]$ cp ss.c XXX.c
hugh@sf0:~[510]$ md5sum XXX.c
550114bc3cc3359e55ba33abe8983a85 XXX.c
hugh@sf0:~[511]$ md5sum TXT/cybercom.txt
9ec4c12949a4f31474f299058ce2b22a TXT/cybercom.txt
hugh@sf0:~[512]$
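The session above shows that the digest depends only on the contents, not on the file name. The same can be tried from Python with the standard `hashlib` module; this sketch hashes in-memory byte strings rather than the files above, and `md5_hex` is an illustrative helper.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as 32 hex characters."""
    return hashlib.md5(data).hexdigest()

original = b"abc"
copy = bytes(original)            # same contents, like `cp ss.c XXX.c`
changed = b"abd"                  # one character different

print(md5_hex(original))          # 900150983cd24fb0d6963f7d28e17f72
print(md5_hex(copy) == md5_hex(original))     # True
print(md5_hex(changed) == md5_hex(original))  # False
```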
The terms message digest, checksum, hash and digital fingerprint are all used for hash functions. At best they will be a one-way function, with the hope being that the only way to reverse a hash is to generate a huge number
MD5
US Cyber defence!
MD5 weaknesses
But how secure is it?
There is some suspicion that MD5 may have cryptographic weaknesses. At Crypto 2004, approaches for generating an MD5 collision were demonstrated:
http://eprint.iacr.org/2004/199.pdf
Note that this does not let an attacker match an existing hash: no-one has shown how to generate a collision for an existing hash. It does matter wherever an attacker can choose both colliding messages, as when a hash is signed.
SHA
SHA-1, SHA-224, SHA-256, SHA-384, SHA-512
Originally published by NIST in 1993 (SHA-0) but revised in 1995 as SHA-1. Revisions in 2002 led to SHA-256/384/512: higher security levels.
                            SHA-1       SHA-224     SHA-256     SHA-384      SHA-512
Digest (hash) size (bits)   160         224         256         384          512
Message size (bits)         $<2^{64}$   $<2^{64}$   $<2^{64}$   $<2^{128}$   $<2^{128}$
Block size (bits)           512         512         512         1024         1024
Number of steps             80          64          64          80           80
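The digest and block sizes in the table can be read back from Python's `hashlib`, which reports both in bytes. A small sketch (the dictionary name `sizes` is illustrative):

```python
import hashlib

sizes = {}
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    # hashlib reports sizes in bytes; convert to bits to match the table.
    sizes[name] = (h.digest_size * 8, h.block_size * 8)
    print(f"{name:8s} digest {sizes[name][0]:4d} bits, block {sizes[name][1]:5d} bits")
```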
Properties of cryptographic hash functions
Four desirable properties...
1 The function should efficiently identify an arbitrary message using a fixed size check value $h = H(m)$
2 The function should be public
3 It should be computationally infeasible to find data $m$ mapping to a specific hash $h = H(m)$ (one-way property)
4 It should be computationally infeasible to find two data $m_1, m_2$ which both map to some hash $h = H(m_1) = H(m_2)$ (collision-free property)
Collisions of cryptographic hash functions
When two messages map to same hash...
It is a collision if $m_1 \neq m_2$ but $H(m_1) = H(m_2)$. Consider the following two points:
1 Can there be no collisions at all? If the number of messages #m is greater than the number of hashes #H(...), then consider the pigeonhole principle - if there are n roosts for n+1 pigeons, then at least one roost has two pigeons on it...
2 How likely are collisions? Consider the birthday paradox (a) - What is the probability that at least two of N randomly selected people have the same birthday (month and day-of-month)? It turns out it is much more likely than you would suspect.
(a) BTW - It is not really a paradox, just an unexpected, counter-intuitive result.
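The pigeonhole argument is easy to see with a deliberately tiny hash. The sketch below assumes a toy 8-bit hash (the first byte of SHA-256, chosen only for illustration): with 256 possible values, any 257 distinct messages must contain a collision.

```python
import hashlib

def tiny_hash(message: bytes) -> int:
    """Toy 8-bit hash: the first byte of SHA-256, so only 256 'roosts'."""
    return hashlib.sha256(message).digest()[0]

def first_collision(messages):
    """Return the first pair of messages found with the same tiny_hash."""
    seen = {}
    for m in messages:
        h = tiny_hash(m)
        if h in seen:
            return seen[h], m
        seen[h] = m
    return None

messages = [f"message {i}".encode() for i in range(257)]   # 257 'pigeons'
m1, m2 = first_collision(messages)
print(m1, m2, tiny_hash(m1) == tiny_hash(m2))
```

In practice the first collision turns up long before the 257th message, which is the birthday effect.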
The Birthday “Paradox” explained
It is NOT the likelihood that someone in a room full of
people shares a specific person’s birthday...
It is the likelihood that amongst every pair of candidates in a room there will be (at least) one matching pair.
It is easiest to calculate the likelihood that the people will not share birthdays (a):
If N = 2, then the likelihood the two do not share a birthday is $\frac{364}{365}$, because the first person can have any of the 365 days, leaving 364 days available for the second person.
If N = 3, then the likelihood all three do not share a birthday is $\frac{364}{365} \times \frac{363}{365}$.
For N, the likelihood none of them share a birthday is $\frac{364}{365} \times \frac{363}{365} \times \ldots \times \frac{365-(N-1)}{365}$. For N = 23, this likelihood is about 0.5.
(a) Note that the likelihood that the people will share birthdays is 1 - the likelihood that they will not.
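The product above is simple to evaluate. A sketch, assuming 365 equally likely birthdays (the function name is illustrative):

```python
def p_no_shared_birthday(n: int, days: int = 365) -> float:
    """Likelihood that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days      # k days are already taken
    return p

for n in (2, 3, 22, 23):
    no_share = p_no_shared_birthday(n)
    print(n, round(no_share, 4), round(1 - no_share, 4))
# N = 23 is the first group size where sharing is more likely than not.
```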
The “Birthday” attack
Digital signatures use hash functions...
They provide an ability to verify an author, the date and time of a signature, authenticate message contents, and can be verified by third parties to resolve disputes.
The 3 main security requirements are:
1 Integrity: Any modification can be detected.
2 Authenticity: Only the authentic entity can sign.
3 Non-repudiation: Signer cannot deny signature (not addressed today).
We can attack some digital signatures using an attack based on the birthday paradox.
Reminder of digital signatures
A model for signing messages
The Birthday “Attack”
Based on the previous phenomenon...
Let's say someone digitally “signs” messages saying that they are “correct”, by calculating the hash function value and then signing that hash value. Assume that the hash function generates an m-bit hash, then the attacker...
generates the hashes for $2^{m/2}$ variations of a desired valid message
generates the hashes for $2^{m/2}$ variations of a desired fraudulent message
The likelihood that there will be a collision is greater than 0.5.
Now, the attacker gets the matched valid message signed, and later substitutes the matched fraudulent message. It will have the same hash, and that hash has been “signed”...
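The attack can be tried out against a deliberately weak hash. The sketch below assumes a toy 16-bit hash (the top 16 bits of SHA-256) and makes "variations" by appending a counter; both choices are illustrative, and a real attacker would vary spacing or wording instead. It adds batches of $2^{m/2}$ variations of each message until a match appears.

```python
import hashlib
from itertools import count

M_BITS = 16   # toy hash width m (an assumption; real hashes are far wider)

def toy_hash(message: bytes) -> int:
    """Toy m-bit hash: the top M_BITS bits of SHA-256."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - M_BITS)

def birthday_collision(valid: str, fraud: str):
    """Find a valid/fraudulent pair of variations with the same toy hash."""
    batch = 2 ** (M_BITS // 2)
    valid_by_hash, fraud_by_hash = {}, {}
    for start in count(0, batch):
        for i in range(start, start + batch):
            v = f"{valid} #{i}".encode()
            f = f"{fraud} #{i}".encode()
            valid_by_hash.setdefault(toy_hash(v), v)
            fraud_by_hash.setdefault(toy_hash(f), f)
        common = valid_by_hash.keys() & fraud_by_hash.keys()
        if common:
            h = min(common)
            return valid_by_hash[h], fraud_by_hash[h], start + batch

valid_msg, fraud_msg, tried = birthday_collision("Pay Bob $10", "Pay Bob $10000")
print(valid_msg, fraud_msg, tried)
print(toy_hash(valid_msg) == toy_hash(fraud_msg))   # True
```

A signature over the hash of the valid variation would verify equally well for the fraudulent one.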
Reversing a hash...
Precomputed tables for helping find collisions?
A precomputed table for 8 character passwords might have (say) $72^8$ = 722,204,136,308,736 entries, each containing a 16 byte value. That's a big disk (about 11,000 TB).
Indexing by hash is even worse. We do not really have
Password    (MD5) Hash
aaaaaaaa    3dbe00a1676...
aaaaaaab    2125ea8b81b...
aaaaaaac    ea67f32d4e6...
aaaaaaad    746a8ab05d6...
aaaaaaae    c554d695eb0...
aaaaaaaf    09eb61fd25b...
aaaaaaag    68b5af18408...
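The size estimate for the full table is straightforward arithmetic. A sketch using the figures quoted above (72 possible characters, 8 positions, 16 bytes per entry), counting only the stored 16 byte value and taking 1 TB as $10^{12}$ bytes:

```python
ALPHABET_SIZE = 72     # possible characters per position
LENGTH = 8             # password length
ENTRY_BYTES = 16       # one MD5 digest per entry

entries = ALPHABET_SIZE ** LENGTH
total_bytes = entries * ENTRY_BYTES

print(f"{entries:,} entries")                 # 722,204,136,308,736 entries
print(f"{total_bytes / 10**12:,.0f} TB")      # 11,555 TB
```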
Reversing a hash: “rainbow” tables
Precompute long chains, but only keep two values
Precompute chains of values starting from a password guess, alternately applying a hash function h(p) and a reversing function r(h), which generates a predictable plausible guess from the hash.
Only store the first and last entries from the chain. It is space efficient, and you can re-compute the intermediate values (a space-time tradeoff).
[Figure: two example chains (Chain #1 and Chain #2), each computed by alternating h(p) and r(h). Only the first and last entry of each chain is stored.]
Precompute long chains, but only keep two values
1. Compute chain from hash
2. Compare candidates with chain ends
3. Recompute chain to reverse hash
4. Password
[Figure: a chain of alternating h(p) and r(h) steps computed from the target hash until it reaches a stored chain end. That chain is then recomputed from its stored start to find the password.]
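The four steps can be sketched on a toy scale. Everything specific here is an assumption for illustration: three-letter lowercase passwords, MD5 as h, chains of 50 links, and a reduction function r that also mixes in the chain position.

```python
import hashlib
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PW_LEN = 3          # toy password space: 26**3 passwords
CHAIN_LEN = 50      # links per chain

def h(password: str) -> bytes:
    return hashlib.md5(password.encode()).digest()

def r(digest: bytes, step: int) -> str:
    """Reduction: turn a hash (and chain position) into a plausible password."""
    n = int.from_bytes(digest, "big") + step
    chars = []
    for _ in range(PW_LEN):
        n, idx = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[idx])
    return "".join(chars)

def chain_end(start: str) -> str:
    p = start
    for step in range(CHAIN_LEN):
        p = r(h(p), step)
    return p

def build_table(starts):
    """Store only (chain end -> chain starts)."""
    table = defaultdict(list)
    for s in starts:
        table[chain_end(s)].append(s)
    return table

def lookup(table, target: bytes):
    # 1. Compute a chain from the hash, for each position it might occupy.
    for pos in range(CHAIN_LEN - 1, -1, -1):
        p = r(target, pos)
        for step in range(pos + 1, CHAIN_LEN):
            p = r(h(p), step)
        # 2. Compare the candidate with the stored chain ends.
        for start in table.get(p, []):
            # 3. Recompute that chain from its start to reverse the hash.
            q = start
            for step in range(CHAIN_LEN):
                if h(q) == target:
                    return q              # 4. Password
                q = r(h(q), step)
    return None

table = build_table(["aaa", "abc", "xyz"])
secret = "abc"
for step in range(10):                    # pick a password from inside a chain
    secret = r(h(secret), step)
print(secret, lookup(table, h(secret)))
```

Only three start/end pairs are stored, yet any password that occurs inside one of the chains can be recovered. A password outside every chain makes `lookup` return None.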