Quantitative Information Flow - Software Side-Channel Analysis

Program analysis methods in the area of secure information flow (SIF) track the propagation of sensitive information through a program. SIF detects insecure information flows, commonly known as information leaks. These methods produce a binary answer: either there is an information leak or there is not. They have seen success in verifying anonymity protocols [34], firewall protocols [35], and network security protocols [36].

Requiring that a program leak no information at all is too strict to be a useful criterion for program security. The canonical example is a password-checking function. Each time the password checker rejects an incorrect password, some information about the password is leaked; namely, the number of possible correct passwords is reduced by 1. Indeed, SIF methods will tell us that this program leaks. On the other hand, a brute-force attempt which reduces the search space by 1 with each query to the password checker is an infeasible attack. Hence, contrary to SIF's insecure classification, we would like to say that such a password-checking function is in fact secure, because the information leakage is small relative to the search space. In a more general setting, the question becomes: given a program, how much information is leaked? The ability to answer this question allows us to tolerate small leaks and to compare the information leakage of two different implementations. This "how much" question led to the development of Quantitative Information Flow (QIF), which gives a foundational framework in which we can measure information leakage [37].

In order to explain how information leakage is quantified, I remind the reader of some terminology and introduce a simple model. We shall consider a program P, which accepts a public low-security input l and a private high-security input h, and produces an observation o. In addition, it is customary to introduce the concept of an adversary, A. In this model, the adversary invokes P with input l and records observation o. A does not have direct access to h, but would like to learn something about its value. Before invoking P, A has some initial uncertainty about the value of h; after observing o, some amount of information is leaked, thereby reducing A's uncertainty about h. An intuitive adage for this setting was popularized by Geoffrey Smith [37]:

"information leaked = initial uncertainty - remaining uncertainty"

The field of QIF formalizes the intuitive statement above by casting the problem in the language of information theory. Information theory traces its origins to Claude Shannon's landmark 1948 paper "A Mathematical Theory of Communication" [38], which adapted the concept of entropy to measure the amount of information that can be transmitted over a channel, expressed in bits of entropy. In the context of QIF, the information entropy of h is considered a measure of the adversary's uncertainty about h.

I briefly give three relevant information entropy measures [39]. Given a random variable X which can take values in {x1, . . . , xn} with probabilities p(xi), the information entropy of X, denoted H(X), is given by

$$ H(X) = \sum_{x_i \in X} p(x_i) \log_2 \frac{1}{p(x_i)} \tag{2.2} $$
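As a quick check on Equation 2.2, the following minimal Python sketch computes entropy in bits from a list of probabilities. The function name `entropy` is an illustrative choice. It assumes the probabilities sum to 1 and follows the usual convention that zero-probability outcomes contribute nothing.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, H(X) = sum_i p(x_i) * log2(1 / p(x_i)).

    Outcomes with p = 0 are skipped, following the convention 0 * log2(1/0) = 0.
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))   # 1.0: a fair coin carries one bit of uncertainty
print(entropy([0.25] * 4))   # 2.0: four equally likely values
print(entropy([0.9, 0.1]))   # a biased coin carries less than one bit
```

A uniform distribution over n values gives log2(n) bits, which is the maximum for n outcomes. Skewed distributions give less.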

Given another random variable Y and a conditional probability distribution p(X|Y), we have the conditional entropy of X given knowledge of Y:

$$ H(X|Y) = \sum_{y_i \in Y} p(y_i) H(X|Y = y_i) \tag{2.3} $$
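Equation 2.3 averages the entropy of X over each possible value of Y, weighted by p(y). The minimal sketch below computes it from a joint distribution given as a dictionary mapping (x, y) pairs to p(x, y). This representation and the helper names are illustrative assumptions. The example has Y reveal only the parity of a uniformly chosen X, so H(X|Y) is one bit lower than H(X). That difference H(X) − H(X|Y) is the mutual information of Equation 2.4.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits (Equation 2.2); zero-probability outcomes contribute 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X|Y) = sum_y p(y) * H(X | Y = y)  (Equation 2.3).

    `joint` maps (x, y) pairs to p(x, y); the probabilities should sum to 1.
    """
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p  # marginal p(y)
    total = 0.0
    for y, py in p_y.items():
        if py == 0:
            continue
        # Conditional distribution p(x | Y = y) = p(x, y) / p(y).
        cond = [p / py for (_, yy), p in joint.items() if yy == y]
        total += py * entropy(cond)
    return total


# X uniform over {0, 1, 2, 3}; Y reveals only the parity of X.
parity = {(x, x % 2): 0.25 for x in range(4)}
h_x = entropy([0.25] * 4)                  # 2.0 bits
h_x_given_y = conditional_entropy(parity)  # 1.0 bit
print(h_x - h_x_given_y)                   # 1.0: the mutual information I(X; Y)
```

If Y were independent of X, conditional_entropy would simply return H(X). Learning Y would then reveal nothing.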

Given these two definitions, the mutual information of X and Y is given by

$$ I(X;Y) = H(X) - H(X|Y) \tag{2.4} $$

In the context of QIF, we consider random variables H, L, and O for the high-security input h, the low-security input l, and the observation o. We can then interpret, for instance, p(H) as the adversary's initial belief about h, and the initial uncertainty as H(H). The conditional entropy H(H|O, L) quantifies A's remaining uncertainty after providing input L and observing output O. We can then write

$$ I(H; O, L) = H(H) - H(H|O, L) \tag{2.5} $$

and interpret I(H; O, L) as the amount of information leaked. These formal definitions are in line with Smith's intuitive statement of QIF. In addition, if we assume that P is deterministic and that the secret and attacker inputs are chosen independently and uniformly at random, we can make use of well-known identities of information theory [26] to observe that

$$ I(H; O, L) = H(H) - H(H|O, L) = H(O|L) \tag{2.6} $$

The last step holds because L is independent of H and O is a function of H and L, so H(O|H, L) = 0. Notice that the right-hand side of Equation 2.6 depends only on the distribution of the observations. This is exactly the distribution determined by the path condition probabilities defined in Equation 2.1. For instance, looking at the final row of Table 2.1, we see that the probabilistic symbolic execution table defines a probability distribution over the observations oi induced by the choices of h and l. Information leakage can therefore be computed from the probabilities that result from symbolic execution and model counting. I will make extensive use of these concepts of entropy throughout the rest of the dissertation. Later, in Chapter 6, we will require a slightly different formulation of entropy, using the Kullback-Leibler divergence [26]. In the meantime, I will conclude this chapter with an example of how information theory can be used to quantify the information leakage for the two example programs shown earlier.
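Equation 2.5 can be checked numerically on small programs by enumerating every input pair instead of running symbolic execution and model counting. The sketch below does this. It assumes a deterministic `program(h, l)` and inputs that are independent and uniform. It computes H(H) − H(H|O, L) directly and compares the result with the entropy of the observations, H(O|L). The function names and the comparison-oracle example are hypothetical illustrations, not taken from the dissertation.

```python
import math
from collections import Counter
from itertools import product


def entropy_from_counts(counts):
    """Entropy in bits of the distribution proportional to `counts`."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)


def leakage(program, secrets, inputs):
    """Return (H(H) - H(H|O,L), H(O|L)) for a deterministic program(h, l),
    assuming h and l are independent and uniform over `secrets` and `inputs`."""
    n_h, n_l = len(secrets), len(inputs)
    total = n_h * n_l
    # For each (o, l), count the secrets h that produce observation o on input l.
    by_obs = Counter((program(h, l), l) for h, l in product(secrets, inputs))
    # Given (o, l), the posterior on h is uniform over those c secrets: entropy log2(c).
    h_given_ol = sum((c / total) * math.log2(c) for c in by_obs.values())
    eq_2_5 = math.log2(n_h) - h_given_ol
    # H(O|L): average over l of the entropy of the observations for that l.
    h_o_given_l = sum(
        entropy_from_counts([c for (_, ll), c in by_obs.items() if ll == l]) / n_l
        for l in inputs
    )
    return eq_2_5, h_o_given_l


# Hypothetical comparison oracle over 3-bit values: the attacker learns whether h < l.
print(leakage(lambda h, l: h < l, range(8), range(8)))  # both components agree
```

Brute-force enumeration only scales to tiny domains. The point of symbolic execution with model counting is to obtain the same observation counts without enumerating every input.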

Side Channel Quantification Example

Using the ideas of probabilistic symbolic execution (Section 2.3) and quantitative information flow (Section 2.4), we can compute the amount of information gained by an attacker for a given program. I illustrate this idea using the running example of Figures 2.2 and 2.4 and Table 2.1.

For the function checkPIN we have a set of five possible side-channel observations. In addition, we have a probability distribution over these observations given by the probabilities 1/2, 1/4, 1/8, 1/16, and 1/16. We can now use Equation 2.6 to compute

$$ I(H; O, L) = \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{8}\log_2 8 + \frac{1}{16}\log_2 16 + \frac{1}{16}\log_2 16 = 1.875 \text{ bits} $$

Intuitively this makes sense. Half the time, an attacker will learn the first bit, for which there are 2 possibilities, in which case they gain log₂(2) = 1 bit of information. One quarter of the time, an attacker will learn the first two bits, for which there are 4 possibilities, in which case they gain log₂(4) = 2 bits of information, and so on. Computing the weighted sum of these information gains tells us the amount of information that an attacker can gain on average. What I have illustrated is that this can be computed automatically with symbolic execution, so long as we can compute the number of solutions to a constraint, which I address in the following chapter on model counting techniques.
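The 1.875-bit figure can be reproduced by brute force. The sketch below uses a hypothetical model of checkPIN, since Figure 2.2 is not reproduced here. The model assumes a 4-bit PIN compared bit by bit from the most significant bit, returning at the first mismatch. Its side-channel observation is the position of the mismatching bit, or 4 when all bits match. Enumerating all 256 (h, l) pairs stands in for the model counting step.

```python
import math
from collections import Counter
from itertools import product

PIN_BITS = 4  # 4-bit PIN: 16 possible secrets, as in the example


def check_pin_observation(h, l):
    """Hypothetical model of the leaky checkPIN: compare the secret h and the guess l
    bit by bit from the most significant bit and return at the first mismatch.
    The observation is the position of the mismatching bit (0..PIN_BITS-1),
    or PIN_BITS when every bit matches (the PIN is accepted)."""
    for i in range(PIN_BITS):
        shift = PIN_BITS - 1 - i
        if ((h >> shift) & 1) != ((l >> shift) & 1):
            return i
    return PIN_BITS


def observation_distribution(program, n_bits):
    """p(o) when h and l are drawn independently and uniformly from n_bits-bit values,
    obtained by brute-force enumeration (standing in for model counting)."""
    counts = Counter(program(h, l) for h, l in product(range(2 ** n_bits), repeat=2))
    total = sum(counts.values())
    return {o: c / total for o, c in sorted(counts.items())}


def entropy(probs):
    """Shannon entropy in bits (Equation 2.2)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


dist = observation_distribution(check_pin_observation, PIN_BITS)
print(dist)                    # {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}
print(entropy(dist.values()))  # 1.875
```

The resulting distribution matches the probabilities listed above, and its entropy is the 1.875 bits computed by hand.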

Now, compare the leakage we just computed to the leakage for the "safe" PIN checking function of Figure 2.3. Since all executions take the same amount of time, no information can be gained from the side channel. But how much information can be gained from the main channel? The function will return true only if all bits match. Since there are 16 possible secrets, this happens with probability pT = 1/16, and the function returns false with probability pF = 15/16. Computing the entropy of this distribution, (1/16)log₂16 + (15/16)log₂(16/15), gives us 0.33729 bits of information, compared with 1.875 bits for checkPIN. Thus, we have a way to quantify the relative vulnerability of two implementations which are functionally equivalent.
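The same enumeration applies to the constant-time checker. The sketch below assumes a hypothetical model of Figure 2.3 in which timing carries no information, so the only observation is the boolean result h == l. With 16 of the 256 input pairs matching, it reproduces the 0.33729-bit main-channel leakage.

```python
import math
from collections import Counter
from itertools import product

PIN_BITS = 4  # 16 possible secrets


def safe_check_pin_observation(h, l):
    """Hypothetical model of the constant-time checker: every call takes the same
    time, so the only observable is the returned boolean (the main channel)."""
    return h == l


def leakage_bits(program, n_bits):
    """Entropy of the observation distribution with h, l independent and uniform."""
    counts = Counter(program(h, l) for h, l in product(range(2 ** n_bits), repeat=2))
    total = sum(counts.values())
    # sum_o p(o) * log2(1 / p(o)), with p(o) = c / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


leak = leakage_bits(safe_check_pin_observation, PIN_BITS)
print(f"{leak:.5f}")  # 0.33729
```

Because the observation here is just the return value, this figure is the leakage through the main channel alone.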
