New Generic Method - Generic Polynomial Evaluation Technique

5.3 Evaluation of Polynomials

5.3.4 Generic Polynomial Evaluation Technique

5.3.4.2 New Generic Method

Description. Consider an n-bit to n-bit S-Box represented by a polynomial P (x) ∈ F2n[x]. We consider a collection S of ` cyclotomic classes w.r.t. n:

S = {Cα1=0, Cα2=1, Cα3, . . . , Cα`} . (5.27)

Also, define L as the set of all integers in the cyclotomic classes of S: L = ∪

Ci∈S

Ci. (5.28)

We choose the set S of ` cyclotomic classes in (5.27) so that the set of corresponding monomials xL from S can be computed using only ` − 2 non-linear multiplications. We require that every monomial x0, x1, . . . , x2n−1, can be written as product of some two monomials in P(xL). Moreover, we try to choose only those cyclotomic classes with the maximum number of n elements (except C0 which has only a single element). This gives

|L| = 1 + n · (` − 1) . (5.29)

Next, we generate t − 1 random polynomials qi(x) $

← P(xL_{) that have their}

monomials only in xL. Suitable values for the parameters t and |L| will be determined later. Then, we try to find t polynomials pi(x) ∈ P(xL) such that

P (x) =

t−1

i=1

pi(x) · qi(x) + pt(x). (5.30)

It is easy to see that the coefficients of the pi(x) polynomials can be obtained

by solving a system of linear equations in F2n, as in the Lagrange interpolation

theorem. More precisely, to find the polynomials pi(x), we solve the following

system of linear equations over F2n:

A · ~c = ~b (5.31)

where the matrix A is obtained by evaluating the R.H.S. of (5.30) at every element of F2n, and by treating the unknown coefficients of p_i(x) as variables.

5.3 Evaluation of Polynomials 101

pi(x) has |L| unknown coefficients. The matrix A can also be written as a

block concatenation of smaller matrices:

A = (A1|A2| . . . |At), (5.32)

where Ai is a 2n× |L| matrix corresponding to the product pi(x) · qi(x). Let

aj ∈ F2n (j = 0, 1, . . . , 2n− 1) be all the field elements and p_i(x) consists of

the monomials xk1_{, x}k2_{, . . . , x}k|L| _{∈ x}L_{. Then, the matrix A}

i has the following

structure: Ai =              ak1 0 · qi(a0) a0k2 · qi(a0) . . . a k|L| 0 · qi(a0) ak1 1 · qi(a1) a1k2 · qi(a1) . . . a k|L| 1 · qi(a1) ak1 2 · qi(a2) a2k2 · qi(a2) . . . a k|L| 2 · qi(a2) .. . ... . . . ... ak1 2n₋₁· qi(a2n₋₁) ak₂n2₋₁· qi(a2n₋₁) . . . a k|L| 2n₋₁· qi(a2n₋₁)              (5.33) The unknown vector ~c in (5.31) corresponds to the unknown coefficients of the polynomials pi(x). The vector ~b is formed by evaluating P (x) at every

element of F2n. Note that since P (x) corresponds to an S-box, the vector ~b

can be directly obtained from the corresponding S-box lookup table.

If the matrix A has rank 2n, then we are able to guarantee that the decomposition in (5.30) exists for every polynomial P (x). To be of full rank 2n the matrix must have a number of columns ≥ 2n. This gives us the necessary condition

t · |L| ≥ 2n. (5.34)

We stress that (5.34) is only a necessary condition. Namely we don’t know how to prove that the matrix A will be full rank when the previous condition is satisfied; this makes our algorithm heuristic. In practice for random polynomials qi(x) we almost always obtain a full rank matrix under condition

(5.34).

From (5.29), we get the condition

t · (1 + n · (` − 1)) ≥ 2n (5.35)

where t is the number of polynomials pi(x) and ` the number of cyclotomic

classes in the set S, to evaluate a polynomial P (x) over F2n.

We summarize the above method in Algorithm 2 below. The number of non-linear multiplications required in the combining step (5.30) is t − 1. As mentioned earlier, we need ` − 2 non-linear multiplications to precompute the set xL. Hence the total number of non-linear multiplications required is then

Algorithm 2 New generic polynomial decomposition algorithm Input: P (x) ∈ F2n[x].

Output: Polynomials pi(x), qi(x) such that P (x) = t−1

i=1

pi(x) · qi(x) + pt(x).

1: Choose ` cyclotomic classes Cαi : L ←

i=1

Cαi, and the basis set x

L _{can be}

computed using ` − 2 non-linear multiplications.

2: Choose t such that t · |L| ≥ 2n.

3: For 1 ≤ i ≤ t, choose qi(x)← P x$ L.

4: Construct the matrix A ← (A1|A2| . . . |At), where each Aiis the 2n× |L| matrix

given by (5.33).

5: Solve the linear system A · ~c = ~b, where ~b is the evaluation of P (x) at every element of F2n.

6: Construct the polynomials pi(x) from the solution vector ~c.

where t is the number of polynomials pi(x) and ` the number of cyclotomic

classes in the set S.

Remark 5.5 If A has rank 2n, then the same set of basis polynomials qi(x)

will yield a decomposition as in (5.30) for any polynomial P (x). That is, the matrix A is independent from the polynomial P (x) to be evaluated.

Remark 5.6 Our decomposition method is heuristic because for a given n in F2n we do not know how to guarantee that the matrix A has full rank 2n.

However for typical values of n, say n = 4, 6, 8, we can definitely check that the matrix A has full rank, for a particular choice of random polynomials qi(x).

Then any polynomial P (x) can be decomposed using these polynomials qi(x).

In other words for a given n we can once and for all generate the random polynomials qi(x) and check that the matrix A has full rank 2n, which will

prove that any polynomial P (x) ∈ F2n[x] can then be decomposed as above.

In summary our method is heuristic for large values of n, but can be proven for small values of n. Such proof requires to compute the rank of a matrix with 2n rows and a slightly larger number of columns, which takes O(23n) time using Gaussian elimination.

Asymptotic Analysis. Substituting (5.36) in (5.35) to eliminate the pa- rameter `, we get t · (1 + n · (Nmult− t + 2)) ≥ 2n, =⇒ Nmult≥ 2n n · t+ t − 2 + 1 n . (5.37)

The R.H.S. of the above expression is minimized when t ≈ q 2n n, and hence we obtain Nmult ≥ 2 · r 2n n − 2 +1 n . (5.38)

5.3 Evaluation of Polynomials 103

Hence, our heuristic method requires O(p2n_{/n) non-linear multiplications,}

which is asymptotically slightly better than the Parity-Split method [CGP+12], which has proven complexity O(√2n_{). If one has to rigorously establish the}

above bound for our method, then we may have to prove the following state- ments, which we leave as open problems:

– We can sample the collection S of cyclotomic classes in (5.27), each having maximal length n (other than C0), using at most ` − 2 non-linear

multiplications.

– The condition t · |L| ≥ 2n _{suffices to ensure that the matrix A has full}

rank 2n.

Table 5.3 lists the expected minimum number of non-linear multiplications, as determined by (5.38), for binary fields F2n of practical interest. It also lists

the actual number of non-linear multiplications that suffices to evaluate any polynomial, for which we have verified that the matrix A has full rank 2n, for a particular random choice of the qi(x) polynomials. We also provide a

performance comparison of our method with that of the Cyclotomic Class and the Parity-Split methods from [CGP+12]. In Table 5.4, we list the specific choice of parameters t and L that we used in this experiment.

n 4 5 6 7 8 9 10

Cyclotomic Class method

[CGP+12]

3 5 11 17 33 53 105

Parity-Split method [CGP+12] 4 6 10 14 22 30 46

Expected minimum value of Nmult (cf. (5.38))

2 3 5 7 10 13 19

Achievable value of Nmult 2 4 5 7 10 14 19

Table 5.3: Minimum values of Nmult

Remark 5.7 Note that there is still a gap between the lower bound from Table 5.2 and the achievable value of Nmult for our method in Table 5.3. This

is because in our method the decomposition of P (x) as

P (x) =

t−1

i=1

pi(x) · qi(x) + pt(x) (5.39)

is performed by first generating the polynomials qi(x) randomly and inde-

pendently of P (x), in order to have a linear system of equations over the coefficients of pi(x). Instead one could try to solve (5.39) for both the pi(x)

and the qi(x) polynomials simultaneously; however this gives a quadratic sys-

n t L |L| 4 2 C0∪ C1∪ C3 9 5 3 C0∪ C1∪ C3∪ C7 16 6 3 C0∪ C1∪ C3∪ C7∪ C11 25 7 4 C0∪ C1∪ C3∪ C7∪ C11∪ C15 36 8 6 C0∪ C1∪ C3∪ C7∪ C29∪ C87∪ C251 49 9 8 C0∪ C1∪ C3∪ C7∪ C29∪ C45∪ C119∪ C191∪ C255 73 10 11 C0∪ C1∪ C3∪ C7∪ C29∪ C45∪ C119∪ C191∪ C155∪ C255∪ C339 101

Table 5.4: Heuristics for choosing parameters t and L.

Counting the Linear Operations. From (5.29) and (5.30), we get (2t − 1) · (|L| − 1) + (t − 1) as an upper-bound on the number of addition operations required to evaluate P (x). This is because each of the 2t − 1 polynomials pi(x) and qi(x) in (5.30) have (at most) |L| terms, and there are t summands

in (5.30). From (5.34), we get:

(2t − 1) · (|L| − 1) + (t − 1) ≤ 2 t |L| ≈ 2 · 2n

Similarly, we get (2t − 1) · |L| ≈ 2 · 2nas an estimate for the number of scalar multiplications. Since the squaring operations are used only to compute the list L, we need |L| − ` ≤ |L| ≈√n · 2n _{many of them (cf. (5.37)).}

Remark 5.8 The above count of the linear operations can be significantly reduced if the linear operations are replaced by table lookups as much as possible. Such an approach is particularly well suited for application in the higher-order masking scheme of [CGP+12], where we need to evaluate a given polynomial with many shares and that the processing of linear polynomials with shares is particularly straightforward. More specifically, we can write each pi(x) as a sum of F2-linear polynomials pi,j, one for each cyclotomic

class in the pre-computed set S (cf. (5.27), (5.28))1_:

pi(x) =

C_αj∈S

pi,j(xαj) .

The ` polynomials pi,j are F2-linear and hence are of the form n−1 P k=0 γkx2 k . Similarly, the polynomials qi(x) can also be expressed in the above form. If

we tabulate the values of each of the linear polynomials pi,j and qi,j, then

it suffices to evaluate xαj _{for each cyclotomic class C}

αj ∈ S using only non-

linear multiplications. Then the polynomials pi,j and qi,j can be evaluated by

just table lookups, and then each of the 2t − 1 polynomials pi and qi can be

eventually evaluated with ` − 1 additions each. Finally, we need t − 1 more

In document Practical Provable Security against Side-Channel Attacks (Page 116-121)