Optimized Modular Karatsuba Multiplication

Rather than following the approach of Barrett or Montgomery reduction, this paper explores a different category of modular multiplication optimization, namely modular reduction of partial products. Because this technique computes several partial products in parallel and reduces them before combining them, it has the potential both to accelerate the operation and to reduce the area of the hardware implementation. To this end, we first perform a flattening step to split the operands and then reduce the partial products according to a base B.

8.2.1 Base B Representation

Recall the standard definition of the base B representation of an integer. For positive integers Q, B, and $\ell$, if $Q < B^{\ell}$, then any integer $y \in [0, Q-1]$ can be written uniquely in base B as

$$y = y_0 + y_1 B + y_2 B^2 + \cdots + y_{\ell-1} B^{\ell-1}, \tag{8.2}$$

where $0 \le y_i < B$. If B is a power of 2, $B = 2^v$, it is simple to convert the binary representation of an integer into its base B representation: in this case, $y_i$ consists of bits $(i+1)v - 1$ down to $iv$ of y, written $y_i = y[(i+1)v-1 : iv]$, for $0 \le i \le \ell - 1$.
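As a minimal Python sketch of this conversion, assuming $B = 2^v$ and digits stored least significant first, each digit is obtained with one shift and one mask. The function names `to_base_b` and `from_base_b` are hypothetical, chosen for illustration only.

```python
from __future__ import annotations


def to_base_b(y: int, v: int, ell: int) -> list[int]:
    """Split 0 <= y < 2^(v*ell) into ell base-B digits, B = 2^v, least significant first."""
    if not 0 <= y < 1 << (v * ell):
        raise ValueError("y does not fit into ell base-B digits")
    mask = (1 << v) - 1
    # Digit y_i is the bit field y[(i+1)v-1 : iv]: shift it down, keep the low v bits.
    return [(y >> (i * v)) & mask for i in range(ell)]


def from_base_b(digits: list[int], v: int) -> int:
    """Inverse conversion: y = sum of y_i * B^i, where multiplying by B^i is a left shift by i*v."""
    return sum(d << (i * v) for i, d in enumerate(digits))


# Example: with v = 8 (B = 256), 0xABCD has digits y_0 = 0xCD and y_1 = 0xAB.
print([hex(d) for d in to_base_b(0xABCD, v=8, ell=2)])  # ['0xcd', '0xab']
```

No multiplications or divisions are needed, which is why a power-of-two base is convenient in hardware.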

8.2.2 Optimally Chosen Prime

The NTT is the primary algorithm for polynomial multiplication in lattice-based cryptography. We design our algorithm to use a fixed prime that is chosen to be compatible with fast NTT operations modulo $(x^n + 1)$, where n is a power of 2. To perform the NTT, the prime must be chosen such that 2n divides $(Q - 1)$. To further optimize the modular multiplication, we choose the prime to have a “sparse” representation in the following sense.

The core of our algorithm is based on choosing a power of 2, $B = 2^v$, and selecting a sparse prime Q such that $Q < B^2$. In particular, for integers $v_1$ and $v_2$ with $v_2 < v_1 \le v - 2$, we consider sparse primes of the following two forms:

$$Q = 2^{2v} - 2^{v_1} \pm 2^{v_2} + 1, \tag{8.3}$$

and

$$Q = 2^{2v} - 2^{v_1} + 1. \tag{8.4}$$

We refer to the forms in Equations (8.3) and (8.4) as 4-sparse and 3-sparse primes, respectively. We choose these forms for several reasons:

1. our technique uses the fact that $Q < 2^{2v} = B^2$ to represent the inputs as two base B digits, so we must subtract the power $2^{v_1}$;

2. one of our optimizations is based on the assumption that $v_1 \le v - 2$;

3. the least significant bit (LSB) of Q must be 1, since otherwise Q would be even, and we add 1 rather than subtract it so that $Q - 1$ is divisible by a larger power of 2;

4. finally, to be able to find primes of any desired size, we add the power $2^{v_2}$, which yields more numbers of this form that are prime.

In order to support an NTT in rings with large n (as is needed for strong security of the Ring Learning with Errors (RLWE) problem [14]), $v_2$ needs to be large in the case of a 4-sparse prime, while $v_1$ should be large in the case of a 3-sparse prime. This is because the largest power of 2 dividing $Q - 1 = 2^{v_2}(2^{2v - v_2} - 2^{v_1 - v_2} \pm 1)$ is exactly $2^{v_2}$ for a 4-sparse prime, while $Q - 1 = 2^{v_1}(2^{2v - v_1} - 1)$ is divisible by exactly $2^{v_1}$ for a 3-sparse prime.
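The following Python sketch builds moduli of the forms (8.3) and (8.4) and computes the largest power-of-two n for which 2n divides Q − 1, illustrating why $v_2$ (4-sparse) or $v_1$ (3-sparse) controls the supported NTT length. The helper names are hypothetical, the sketch does not test primality, and it additionally assumes $v_2 \ge 1$ so that Q is odd.

```python
from typing import Optional


def sparse_modulus(v: int, v1: int, v2: Optional[int] = None, sign: int = 1) -> int:
    """Return 2^(2v) - 2^v1 + sign*2^v2 + 1 (4-sparse, Eq. 8.3) or, if v2 is None,
    2^(2v) - 2^v1 + 1 (3-sparse, Eq. 8.4). Primality is NOT checked here."""
    if not 1 <= v1 <= v - 2:
        raise ValueError("need 1 <= v1 <= v - 2")
    q = (1 << (2 * v)) - (1 << v1) + 1
    if v2 is not None:
        # Assumption of this sketch: v2 >= 1, otherwise sign*2^v2 + 1 makes Q even.
        if not 1 <= v2 < v1 or sign not in (1, -1):
            raise ValueError("need 1 <= v2 < v1 and sign in (+1, -1)")
        q += sign * (1 << v2)
    return q


def two_adic_valuation(x: int) -> int:
    """Largest k such that 2^k divides x (x > 0): the index of the lowest set bit."""
    return (x & -x).bit_length() - 1


def max_ntt_length(q: int) -> int:
    """Largest power of two n with 2n | (q - 1), for odd q."""
    return 1 << (two_adic_valuation(q - 1) - 1)


q4 = sparse_modulus(16, 3, 1, +1)    # 2^32 - 2^3 + 2^1 + 1
q3 = sparse_modulus(32, 24)          # 2^64 - 2^24 + 1
print(two_adic_valuation(q4 - 1), max_ntt_length(q4))  # 1 1
print(two_adic_valuation(q3 - 1), max_ntt_length(q3))  # 24 8388608
```

The 32-bit example with small $v_1, v_2$ only admits n = 1, whereas the 3-sparse 64-bit modulus with $v_1 = 24$ allows n up to $2^{23}$.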

For example, the 64-bit prime $Q = 2^{64} - 2^{24} + 1$ supports an NTT with n

Figure 8.1: Optimized Modular Karatsuba Multiplication

applications, we may choose a prime with small $v_1$ and $v_2$ for better efficiency in the multiplication, such as the 32-bit prime $Q = 2^{32} - 2^{3} + 2^{1} + 1 = 2^{32} - 5$.
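To illustrate how primes of these forms can be located, here is a hedged Python sketch that enumerates 3-sparse and 4-sparse candidates with small $v_1$ (and, as an assumption, $1 \le v_2 < v_1$) and keeps the prime ones. Primality is decided with Miller-Rabin using the first twelve primes as bases, a base set known to be deterministic for all inputs below $2^{64}$. The function names are hypothetical. For $v = 16$ the output includes the 32-bit prime $2^{32} - 2^3 + 2^1 + 1$ quoted above.

```python
from __future__ import annotations

BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime_u64(n: int) -> bool:
    """Miller-Rabin with the first 12 primes as bases; deterministic for n < 2^64."""
    if n >= 1 << 64:
        raise ValueError("this deterministic base set is only valid below 2^64")
    if n < 2:
        return False
    for p in BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0          # write n - 1 = d * 2^s with d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False     # a is a witness that n is composite
    return True


def sparse_primes(v: int, max_v1: int) -> list[tuple[int, int, int, int]]:
    """Return (v1, v2, sign, Q) for every prime Q of form (8.3) or (8.4) with
    1 <= v1 <= min(max_v1, v - 2). The 3-sparse form (8.4) is encoded as v2 = sign = 0."""
    found = []
    for v1 in range(1, min(max_v1, v - 2) + 1):
        base = (1 << (2 * v)) - (1 << v1) + 1
        candidates = [(0, 0, base)]
        for v2 in range(1, v1):
            for sign in (1, -1):
                candidates.append((v2, sign, base + sign * (1 << v2)))
        for v2, sign, q in candidates:
            if is_prime_u64(q):
                found.append((v1, v2, sign, q))
    return found


for v1, v2, sign, q in sparse_primes(16, max_v1=3):
    print(v1, v2, sign, q)
```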

8.2.3 Proposed Algorithm for the Modular Karatsuba Multiplication

We describe our algorithm in detail for the case of a 4-sparse prime $Q = 2^{2v} - 2^{v_1} + 2^{v_2} + 1$, but this algorithm can easily be extended to the 3-sparse case and to the other 4-sparse case described above.

Algorithm 8.1 presents our optimized modular Karatsuba multiplication. Only three small multiplications are required; all other operations are shifts, additions, and subtractions, since multiplication by $2^k$ can be realized as a shift by k bits.
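The observation that multiplication by a power of two is just a shift also covers the constant $2^{v_1} - 2^{v_2} - 1$ that the algorithm multiplies by: it costs two shifts and two subtractions. This short Python sketch, with hypothetical function names, checks that against ordinary multiplication.

```python
def times_k(x: int, v1: int, v2: int) -> int:
    """Return x * (2^v1 - 2^v2 - 1) using only shifts and subtractions."""
    return (x << v1) - (x << v2) - x


def times_k_shifted(x: int, v: int, v1: int, v2: int) -> int:
    """Return x * (2^v1 - 2^v2 - 1) * 2^v: one more shift, still no multiplier."""
    return times_k(x, v1, v2) << v


print(times_k(5, 3, 1))  # 5 * (8 - 2 - 1) = 25
```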

We start by representing a and b in base B as $a = a_0 + a_1 B$ and $b = b_0 + b_1 B$, as shown in Step 1. Then, as in the original Karatsuba multiplication, we generate the partial products $c_0 = a_0 b_0$, $c_1 = a_1 b_1$, and $c_2 = (a_0 + a_1)(b_0 + b_1)$, as shown in Step 2. As a result, the product $ab = (a_0 + a_1 B)(b_0 + b_1 B)$ can be expressed in terms of these three partial products as

$$ab = c_0 + (c_2 - (c_1 + c_0))B + c_1 B^2. \tag{8.5}$$
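Steps 1 and 2 and the identity (8.5) can be checked with a short Python sketch. The function name and the random test values are illustrative assumptions; the only requirement is $0 \le a, b < B^2$.

```python
from __future__ import annotations

import random


def karatsuba_partial_products(a: int, b: int, v: int) -> tuple[int, int, int]:
    """Steps 1-2: split a, b < B^2 (B = 2^v) into base-B digits and form c0, c1, c2."""
    if not (0 <= a < 1 << (2 * v) and 0 <= b < 1 << (2 * v)):
        raise ValueError("operands must be two base-B digits")
    mask = (1 << v) - 1
    a0, a1 = a & mask, a >> v
    b0, b1 = b & mask, b >> v
    c0 = a0 * b0
    c1 = a1 * b1
    c2 = (a0 + a1) * (b0 + b1)  # the third and last multiplication
    return c0, c1, c2


v = 16
B = 1 << v
rng = random.Random(1)
for _ in range(1000):
    a, b = rng.randrange(B * B), rng.randrange(B * B)
    c0, c1, c2 = karatsuba_partial_products(a, b, v)
    # Eq. (8.5): the middle coefficient c2 - (c1 + c0) equals a0*b1 + a1*b0.
    assert a * b == c0 + (c2 - (c1 + c0)) * B + c1 * B * B
print("Eq. (8.5) holds on all samples")
```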

The main idea is to reduce this value to the range $[0, Q-1]$ while preserving its congruence class modulo Q, and to do so efficiently. Instead of multiplying $c_1$ by $B^2$ as in the original Karatsuba algorithm, we multiply it by $(2^{v_1} - 2^{v_2} - 1)$, since $B^2 = 2^{2v} \equiv 2^{v_1} - 2^{v_2} - 1 \pmod{Q}$. In Step 3, we denote this intermediate value by c:

$$c := c_0 + (c_2 - (c_1 + c_0))B + c_1(2^{v_1} - 2^{v_2} - 1). \tag{8.6}$$
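A hedged Python check of Step 3 follows, assuming the 4-sparse modulus $Q = 2^{2v} - 2^{v_1} + 2^{v_2} + 1$. It confirms that $2^{2v} \bmod Q = 2^{v_1} - 2^{v_2} - 1$ and that the intermediate value c of (8.6) is congruent to ab, although c is not necessarily smaller than Q. The function name is hypothetical; the parameters v = 16, $v_1 = 3$, $v_2 = 1$ give the 32-bit example prime.

```python
import random


def step3_intermediate(a: int, b: int, v: int, v1: int, v2: int) -> int:
    """Steps 1-3: return the intermediate value c of Eq. (8.6)."""
    mask = (1 << v) - 1
    a0, a1, b0, b1 = a & mask, a >> v, b & mask, b >> v
    c0, c1 = a0 * b0, a1 * b1
    c2 = (a0 + a1) * (b0 + b1)
    # c1 * B^2 is replaced by c1 * (2^v1 - 2^v2 - 1), computed with shifts.
    return c0 + ((c2 - (c1 + c0)) << v) + (c1 << v1) - (c1 << v2) - c1


v, v1, v2 = 16, 3, 1
q = (1 << (2 * v)) - (1 << v1) + (1 << v2) + 1
assert pow(2, 2 * v, q) == (1 << v1) - (1 << v2) - 1  # B^2 mod Q

rng = random.Random(7)
for _ in range(1000):
    a, b = rng.randrange(q), rng.randrange(q)
    assert step3_intermediate(a, b, v, v1, v2) % q == a * b % q
print("c is congruent to a*b mod Q on all samples")
```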

At this point, c is still not guaranteed to lie in the range $[0, Q-1]$. However, we prove in Lemma 8.2.1 that $c < 2^{3v+2}$.

Lemma 8.2.1. The intermediate value c obtained in Step 3 is strictly upper-bounded by $2^{3v+2}$, i.e.,

$$c = c_0 + (c_2 - (c_1 + c_0))B + c_1(2^{v_1} - 2^{v_2} - 1) < 2^{3v+2}.$$

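Proof. We have $c_2 = (a_0 + a_1)(b_0 + b_1)$, $c_1 = a_1 b_1$, and $c_0 = a_0 b_0$, so $c_2 - (c_1 + c_0)$ simplifies to $a_0 b_1 + a_1 b_0$. Since $a_0, a_1, b_0, b_1 \le 2^v - 1$, all the products $c_0 = a_0 b_0$, $c_1 = a_1 b_1$, $a_0 b_1$, and $a_1 b_0$ are at most $(2^v - 1)^2$. We can then bound c as follows (since all the terms contribute positively):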

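$$\begin{aligned}
c &= c_0 + (a_0 b_1 + a_1 b_0)B + c_1(2^{v_1} - 2^{v_2} - 1) \\
&\le (2^v - 1)^2 + 2(2^v - 1)^2 2^v + (2^v - 1)^2 (2^{v_1} - 2^{v_2} - 1) \\
&= (2^v - 1)^2 \left[1 + 2^{v+1} + 2^{v_1} - 2^{v_2} - 1\right] \\
&= (2^v - 1)^2 \left[2^{v+1} + 2^{v_1} - 2^{v_2}\right].
\end{aligned}$$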

Since $v_1 < v$, we can bound $2^{v_1} \le 2^{v+1}$. Also, $(2^v - 1)^2 < 2^{2v}$, and we can drop the negative term $-2^{v_2}$ to obtain

$$\begin{aligned}
c &< (2^v - 1)^2 \left[2^{v+1} + 2^{v+1}\right] \\
&< 2^{2v} \cdot 2^{v+2} = 2^{3v+2}.
\end{aligned}$$
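Because every term of c is nondecreasing in each digit, the largest value of c over all digits allowed in the proof occurs when $a_0 = a_1 = b_0 = b_1 = 2^v - 1$. The Python sketch below, with a hypothetical helper name, evaluates that worst case for parameters satisfying $1 \le v_2 < v_1 \le v - 2$ and confirms the bound of Lemma 8.2.1. It is a numerical sanity check of the derivation, not a replacement for the proof.

```python
def lemma1_worst_case(v: int, v1: int, v2: int) -> int:
    """Value of c (Eq. 8.6) with a0 = a1 = b0 = b1 = 2^v - 1, the largest digits allowed."""
    d = (1 << v) - 1
    k = (1 << v1) - (1 << v2) - 1          # residue of B^2 modulo Q
    c0 = c1 = d * d
    middle = 2 * d * d                     # a0*b1 + a1*b0
    return c0 + (middle << v) + c1 * k


for v in range(4, 65):
    for v1 in range(2, v - 1):             # 2 <= v1 <= v - 2
        for v2 in range(1, v1):            # 1 <= v2 < v1
            assert lemma1_worst_case(v, v1, v2) < 1 << (3 * v + 2)
print("Lemma 8.2.1 bound holds for all tested parameters")
```

The worst case also equals the closed form $(2^v - 1)^2[2^{v+1} + 2^{v_1} - 2^{v_2}]$ from the proof.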

Using Lemma 8.2.1, in Step 4 we split c as

$$c = f_0 + f_1 B^2 + f_2 B^3, \tag{8.7}$$

where $0 \le f_0 < B^2$, $0 \le f_1 < B$, and $0 \le f_2 \le 2^2 - 1$ (by Lemma 8.2.1); that is, $f_0$, $f_1$, and $f_2$ are the bit fields $c[2v-1:0]$, $c[3v-1:2v]$, and $c[3v+1:3v]$. Consequently, we reduce this modulo Q in Step 5 as

$$f = f_0 + f_1(2^{v_1} - 2^{v_2} - 1) + f_2(2^{v_1} - 2^{v_2} - 1)2^v, \tag{8.8}$$

since $B^3 \equiv (2^{v_1} - 2^{v_2} - 1)2^v \pmod{Q}$. After this step, f is already very close to the desired range $[0, Q-1]$. We show in Lemma 8.2.2 that $f < 2Q - 1$ if $v_1 \le v - 2$. Thus, we only need to check whether $f \ge Q$ in Step 6; if so, a single subtraction of Q suffices to bring it into the range $[0, Q-1]$.
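Putting Steps 1 through 6 together, the following Python sketch mirrors the description above for the 4-sparse case $Q = 2^{2v} - 2^{v_1} + 2^{v_2} + 1$. It is an illustrative software model, not the hardware design: the function name is hypothetical, the asserts restate the two bounds, and it assumes $1 \le v_2 < v_1 \le v - 2$ and inputs in $[0, Q-1]$.

```python
import random


def modmul_sparse(a: int, b: int, v: int, v1: int, v2: int) -> int:
    """Sketch of the modular Karatsuba multiplication a*b mod Q for the 4-sparse
    modulus Q = 2^(2v) - 2^v1 + 2^v2 + 1, assuming 1 <= v2 < v1 <= v - 2."""
    if not 1 <= v2 < v1 <= v - 2:
        raise ValueError("need 1 <= v2 < v1 <= v - 2")
    q = (1 << (2 * v)) - (1 << v1) + (1 << v2) + 1
    if not (0 <= a < q and 0 <= b < q):
        raise ValueError("inputs must lie in [0, Q - 1]")

    def times_k(x: int) -> int:
        # x * (2^v1 - 2^v2 - 1), where 2^v1 - 2^v2 - 1 == B^2 (mod Q); shifts only.
        return (x << v1) - (x << v2) - x

    mask = (1 << v) - 1
    # Step 1: base-B digits, a = a0 + a1*B and b = b0 + b1*B.
    a0, a1 = a & mask, a >> v
    b0, b1 = b & mask, b >> v
    # Step 2: the three Karatsuba multiplications.
    c0, c1 = a0 * b0, a1 * b1
    c2 = (a0 + a1) * (b0 + b1)
    # Step 3: Eq. (8.6), replacing B^2 by its residue.
    c = c0 + ((c2 - (c1 + c0)) << v) + times_k(c1)
    assert 0 <= c < 1 << (3 * v + 2)       # Lemma 8.2.1
    # Step 4: Eq. (8.7), split into bit fields c = f0 + f1*B^2 + f2*B^3.
    f0 = c & ((1 << (2 * v)) - 1)
    f1 = (c >> (2 * v)) & mask
    f2 = c >> (3 * v)                      # at most 2 bits by Lemma 8.2.1
    # Step 5: Eq. (8.8), using B^2 == k and B^3 == k * 2^v (mod Q).
    f = f0 + times_k(f1) + (times_k(f2) << v)
    assert f < 2 * q - 1                   # the f < 2Q - 1 bound stated below
    # Step 6: at most one conditional subtraction.
    if f >= q:
        f -= q
    return f


v, v1, v2 = 16, 3, 1                       # Q = 2^32 - 5, the example prime above
q = (1 << (2 * v)) - (1 << v1) + (1 << v2) + 1
rng = random.Random(2024)
samples = [(q - 1, q - 1), (0, q - 1)] + [(rng.randrange(q), rng.randrange(q)) for _ in range(10000)]
assert all(modmul_sparse(a, b, v, v1, v2) == a * b % q for a, b in samples)
print("matches a * b % Q on", len(samples), "inputs")
```

Only Step 2 uses the multiplier; every other step is a shift, mask, addition, or subtraction, which matches the operation count claimed for Algorithm 8.1.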

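Lemma 8.2.2. If $v_1 \le v - 2$, then the value f obtained in Step 5 satisfies $f < 2Q - 1$; hence, if $f \ge Q$, then $f - Q < Q - 1$.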

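Proof. We have $f_0 \le 2^{2v} - 1$, $f_1 \le 2^v - 1$, and, by Lemma 8.2.1, $f_2 \le 2^2 - 1$. Thus, we can bound f as

$$\begin{aligned}
f &= f_0 + f_1(2^{v_1} - 2^{v_2} - 1) + f_2 2^v (2^{v_1} - 2^{v_2} - 1) \\
&\le 2^{2v} - 1 + (2^v - 1)(2^{v_1} - 2^{v_2} - 1) + 3 \cdot 2^v (2^{v_1} - 2^{v_2} - 1).
\end{aligned}$$

Factoring out $(2^{v_1} - 2^{v_2} - 1)$, we can simplify and drop some negative terms:

$$\begin{aligned}
f &\le 2^{2v} - 1 + (2^{v_1} - 2^{v_2} - 1)\left[(2^v - 1) + 3 \cdot 2^v\right] = 2^{2v} - 1 + (2^{v_1} - 2^{v_2} - 1)\left[2^{v+2} - 1\right] \\
&< 2^{2v} + (2^{v_1} - 2^{v_2} - 1)2^{v+2}.
\end{aligned}$$

Then, dropping the negative term $-2^{v_2}$ and using the assumption $v_1 \le v - 2$, so that $2^{v_1} \le 2^{v-2}$, we obtain

$$f < 2^{2v} + (2^{v-2} - 1)2^{v+2} = 2^{2v+1} - 2^{v+2} < 2^{2v+1} - 2^{v_1+1} + 2^{v_2+1} = 2Q - 2,$$

where the last inequality holds because $2^{v_1+1} \le 2^{v-1} < 2^{v+2}$. Hence $f < 2Q - 1$.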
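The role of the assumption $v_1 \le v - 2$ can be seen numerically. This Python sketch (helper names hypothetical) evaluates the right-hand side of the first inequality in the proof, with $f_0$, $f_1$, $f_2$ at their maximum values, and compares it with $2Q - 2$. For $v_1 \le v - 2$ the bound stays below $2Q - 2$, while for $v_1 = v - 1$ it exceeds $2Q - 1$, so the argument no longer guarantees a single final subtraction. This only shows that the proof's bound fails in that case, not that a particular input actually produces $f \ge 2Q - 1$.

```python
def modulus(v: int, v1: int, v2: int) -> int:
    """4-sparse modulus Q = 2^(2v) - 2^v1 + 2^v2 + 1."""
    return (1 << (2 * v)) - (1 << v1) + (1 << v2) + 1


def lemma2_rhs(v: int, v1: int, v2: int) -> int:
    """2^(2v) - 1 + (2^v - 1) k + 3 * 2^v * k with k = 2^v1 - 2^v2 - 1 (first bound in the proof)."""
    k = (1 << v1) - (1 << v2) - 1
    return (1 << (2 * v)) - 1 + ((1 << v) - 1) * k + 3 * (1 << v) * k


for v in range(6, 65):
    for v1 in range(2, v - 1):             # v1 <= v - 2: the bound certifies f < 2Q - 2
        for v2 in range(1, v1):
            assert lemma2_rhs(v, v1, v2) < 2 * modulus(v, v1, v2) - 2
    # v1 = v - 1, v2 = 1 violates the assumption: the same bound exceeds 2Q - 1.
    assert lemma2_rhs(v, v - 1, 1) >= 2 * modulus(v, v - 1, 1) - 1
print("bound behaves as described for 6 <= v <= 64")
```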

For the hardware implementation and performance results, we refer the reader to our forthcoming paper.

Chapter 9

Another Algorithm for the Partial Approximate Common Divisor Problem

In this chapter, we analyze a previously unpublished algorithm for solving the Partial Approximate Common Divisor (PACD) problem, also known as the Partial Approximate Greatest Common Divisor (PAGCD) problem. PACD was the problem used in early fully homomorphic encryption schemes [122], although most later fully homomorphic encryption schemes use the LWE/RLWE problems for greater efficiency. Given t samples of the form $x_i = p q_i + r_i$, $0 \le i \le t - 1$, where p is a fixed prime, the $q_i$ are random integers larger than p, and the $r_i$ are random integers with absolute value smaller than p, with $r_0 = 0$, the PACD problem is to recover p.

In document Homomorphic Encryption and Cryptanalysis of Lattice Cryptography (Page 183-189)