6.9 Differential cryptanalysis
6.9.1 Basic definitions and assumptions
Differential cryptanalysis of block ciphers falls into the category of chosen-plaintext attacks. The basic idea is that two plaintexts with a carefully chosen difference can encipher to two ciphertexts whose difference takes a specific value with non-negligible probability. By collecting sufficiently many plaintext/ciphertext pairs and analyzing their differences, an attacker may recover certain bits of the secret key.
In the context of a (keyless) hash function, the goal of differential cryptanalysis is to find message pairs with an appropriately chosen difference such that the pair of hash function outputs has no difference, that is, a collision. If the probability of such collisions is non-negligible, then a collision of the hash function can be found by hashing enough input messages.
The first step of differential cryptanalysis is to define a suitable measure of difference for the computations. The choice may vary depending on the mathematical operations involved in the hash function or block cipher. The most commonly used measure of difference is exclusive-or. Let $A$ and $A'$ be a pair of values. Their difference is defined to be $\Delta A = A \oplus A'$, so an exclusive-or difference identifies the individual bit positions at which the two values differ.
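As a small illustration of the exclusive-or difference just defined, the following Python sketch computes the difference of two words and lists the bit positions at which they differ. The function names and the 8-bit example values are chosen here for illustration only and do not come from the MD6 specification.

```python
def xor_difference(a, a_prime):
    """Return the exclusive-or difference of two words."""
    return a ^ a_prime


def differing_bit_positions(delta):
    """List the bit positions (0 = least significant) that are set in delta."""
    return [i for i in range(delta.bit_length()) if (delta >> i) & 1]


# Illustrative 8-bit values (assumed for this example).
a = 0b10110010
a_prime = 0b10010110
delta = xor_difference(a, a_prime)
print(f"{delta:08b}", differing_bit_positions(delta))  # 00100100 [2, 5]
```

A zero difference means the two values are identical, which is why a collision corresponds to an all-zero output difference.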
A differential path for a given hash function is the set of differences between the pair of inputs, all corresponding intermediate states, and the final hash outputs. Since MD6’s state variables form a simple sequence, the differential path for the MD6 compression function can be expressed concisely as
$\{\Delta A_i\}$ for $i = 0, \ldots, t + n - 1$.
A differential path with the property that $\Delta A_i = 0$ for $i = t + n - c, \ldots, t + n - 1$ is called a collision differential path for the compression function.
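The collision condition can be checked mechanically. The sketch below assumes the differential path is held as a Python list of integers with one entry per index i = 0, ..., t + n − 1, so that the last c entries are the output words. The representation and the function name are assumptions made for this example.

```python
def is_collision_path(deltas, c):
    """True if the last c differences of the path are all zero.

    `deltas` holds the differences for i = 0, ..., t + n - 1, so the last c
    entries correspond to i = t + n - c, ..., t + n - 1.
    """
    if not 0 < c <= len(deltas):
        raise ValueError("c must be between 1 and the path length")
    return all(d == 0 for d in deltas[-c:])


# Toy path of length 6 with c = 2 (sizes assumed for illustration only).
print(is_collision_path([0x1, 0x0, 0x8, 0x8, 0x0, 0x0], c=2))  # True
print(is_collision_path([0x1, 0x0, 0x8, 0x8, 0x0, 0x4], c=2))  # False
```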
The most important property of a differential path is its associated probability. The probability of the differential path in step $i$, denoted by $p_i$, is defined to be the probability that the pair of outputs from step $i$ follows the differential path, given that the pair of inputs to step $i$ satisfies the differences specified by the differential path. So we can express $p_i$ as
$p_i = \Pr[\Delta A_i \mid \Delta A_{i - t_j},\ j = 0, 1, \ldots, 5]$.
The input differences $\{\Delta A_{i - t_j},\ j = 0, 1, \ldots, 5\}$ and the output difference $\Delta A_i$, together with the associated probability $p_i$, are often referred to as a differential characteristic of step $i$. The total probability $p$ of the entire differential path is the product of the probabilities $p_i$ of the individual steps, assuming that the computations in individual steps are independent.
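To make the step probability and the product rule concrete, the following sketch computes a differential probability exhaustively for a toy step and then multiplies per-step probabilities. The step `toy_step`, its 4-bit word size, and the chosen differences are hypothetical stand-ins picked to keep the enumeration small. They are not the MD6 step function or its parameters.

```python
from itertools import product
from math import prod


def step_probability(f, in_diffs, out_diff, width):
    """Fraction of input tuples for which the output difference equals out_diff.

    Every tuple of `width`-bit words is paired with a partner obtained by
    XORing in_diffs component-wise, so the result is exact for this toy size.
    """
    hits = 0
    total = 0
    for xs in product(range(1 << width), repeat=len(in_diffs)):
        xs_prime = tuple(x ^ d for x, d in zip(xs, in_diffs))
        if f(*xs) ^ f(*xs_prime) == out_diff:
            hits += 1
        total += 1
    return hits / total


def toy_step(x, y, z):
    # Hypothetical 4-bit step used only for illustration; NOT the MD6 step.
    return z ^ (x & y)


def path_probability(step_probs):
    """Product of per-step probabilities (assumes the steps are independent)."""
    return prod(step_probs)


# A difference in bit 0 of x passes through the AND only when bit 0 of y is 1,
# so the output difference is zero for exactly half of the inputs.
p_and = step_probability(toy_step, (0b0001, 0, 0), 0b0000, width=4)
# A difference in z passes through the XOR unchanged.
p_xor = step_probability(toy_step, (0, 0, 0b0010), 0b0010, width=4)
print(p_and, p_xor, path_probability([p_and, p_and, p_xor]))  # 0.5 1.0 0.25
```

Exhaustive enumeration is only feasible for toy word sizes. For realistic word sizes the per-step probabilities have to be derived analytically from the operations involved.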
While the definition seems fairly straightforward, we need to pay special attention to a few issues and implicit assumptions in the analysis of differential paths and their probabilities.
First, in most existing work on differential cryptanalysis of hash functions and block ciphers, it is commonly assumed that, after a certain number of rounds, the output from each step appears random and the computations in different steps are independent of each other. Such randomness and independence assumptions are made in terms of statistical properties of the underlying function, and they are generally necessary to carry out the probability analysis. (The same assumptions are needed for linear cryptanalysis as well.)
More specifically, in our differential cryptanalysis of MD6, the probability $p_i$ is computed by assuming that the pair of inputs to step $i$ is chosen at random with the only constraint that it satisfies the given difference. In addition, it is assumed that computations in different steps are independent, and so the probabilities $p_i$ can be multiplied when computing the total probability $p$.
We remark that, for MD6, the above assumptions are well supported by our statistical analysis results. Recall from Section 6.6 that MD6 passes a variety of statistical tests after 10 rounds.
Next, the workload of a differential collision attack (using a given differential path) is computed as the inverse of some probability $p'$, which can generally be much larger than $p$ for two reasons. First, by choosing appropriate inputs to the hash function, we can force $p_i = 1$ for a number of steps at the beginning of the computation (message modification). Second, for a given input difference and output difference, there can be multiple differential paths (with different intermediate differences $\Delta A_i$) that satisfy the input and output constraints. The overall probability is then the sum of the probabilities of these individual paths.
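The two effects described above can be sketched numerically. The example below makes simplifying assumptions that are not in the text: each path is given as a list of per-step probabilities, message modification makes the same number of initial steps free on every path, and the probabilities themselves are invented for illustration.

```python
from math import prod


def attack_probability(paths, free_steps=0):
    """Sum, over all paths, of the product of the per-step probabilities.

    The first `free_steps` steps of every path are treated as having
    probability 1 (message modification) and are therefore skipped.
    """
    return sum(prod(steps[free_steps:]) for steps in paths)


def workload(probability):
    """Expected number of trials, taken as the inverse of the probability."""
    return 1 / probability


# Two hypothetical paths with the same input and output difference.
paths = [
    [0.5, 0.5, 0.25, 0.5],
    [0.5, 0.25, 0.25, 0.5],
]
p_single = attack_probability(paths[:1])             # one path, no free steps
p_prime = attack_probability(paths, free_steps=2)    # both effects combined
print(workload(p_single), workload(p_prime))  # 32.0 4.0
```

In this toy case the workload drops from 32 to 4 trials, which shows why the single-path probability alone can overstate the cost of an attack.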
Lastly, we introduce the notion of a differential path weight pattern. Let $\{\Delta A_i\}$ be a differential path of MD6. The corresponding differential path weight pattern is the sequence $\{D_i\}$, where
$D_i = |\Delta A_i|$
is the Hamming weight of $\Delta A_i$. That is, $D_i$ is the number of bits that differ between $A_i$ and $A'_i$.
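Computing a weight pattern from a path is a direct application of the definition. The sketch below assumes the path is a list of integer word differences. The sample values are made up and do not represent an actual MD6 differential path.

```python
def hamming_weight(word):
    """Number of 1 bits in a non-negative integer."""
    return bin(word).count("1")


def weight_pattern(deltas):
    """Map a differential path {Delta A_i} to its weight pattern {D_i}."""
    return [hamming_weight(d) for d in deltas]


# Hypothetical 64-bit differences, for illustration only.
path = [0x0, 0x8000000000000001, 0xFF, 0x0]
print(weight_pattern(path))  # [0, 2, 8, 0]
```

Many different paths share the same weight pattern, since the pattern records how many bits differ in each word but not where they lie.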
The differential path weight pattern turns out to be a useful notion in the security analysis of MD6. As we will see, it allows us to separate the effects of tap positions and shift amounts with respect to differential cryptanalysis. Roughly speaking, the tap positions help to propagate the differences forward from one step to the next, while the shift amounts help to spread the differences within a word. Since the sequence $\{D_i\}$ mainly reflects how the differences propagate forward, it lets us study the effect of the tap positions without paying too much attention to how the bit differences line up within a word. This also greatly simplifies our analysis.