Quadratic Sieve and Extensions


CS 6550 Design and Analysis of Algorithms Section A, Lecture #8


Instructor: Richard Peng Feb 15, 2021

DISCLAIMER: These notes are not necessarily an accurate representation of what I said during the class. They are mostly what I intend to say, and have not been carefully edited.

These notes are directly adapted from Eric Bach’s course notes on arithmetic algorithms, http://pages.cs.wisc.edu/~cs812-1/, specifically Lectures 21-23.

Recall that given some $n = pq$, we want to find $x \not\equiv \pm y \pmod{n}$ such that $x^2 \equiv y^2 \pmod{n}$.

We will further develop the strategy from Dixon’s algorithm [Dix81].

1. GENERATE:

$$
\begin{aligned}
a_1^2 &\equiv p_1^{e_{11}} \cdots p_s^{e_{1s}} \pmod{n} && (1)\\
a_2^2 &\equiv p_1^{e_{21}} \cdots p_s^{e_{2s}} \pmod{n} && (2)\\
&\;\;\vdots && (3)\\
a_r^2 &\equiv p_1^{e_{r1}} \cdots p_s^{e_{rs}} \pmod{n} && (4)
\end{aligned}
$$

2. COMBINE: Select a product of the $a_i$'s that makes all the exponents of the $p_i$'s in the resulting right-hand side even. Call the product of the selected $a_i$'s $a$, and the square root of the resulting right-hand side $b$, so that $a^2 \equiv b^2 \pmod{n}$.

3. SPLIT: Compute $\gcd(a \pm b, n)$, and show (via backwards analysis from the random source of $a$ and $b$) that this is a nontrivial divisor of $n$.
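The COMBINE and SPLIT steps can be sketched on a toy instance. This is only an illustration under assumptions that are not part of the notes: the helper name `combine_and_split` is hypothetical, the two relations for $n = 84923$ are a standard textbook example, and the subset is found by brute force, which is exponential and stands in for the linear algebra modulo 2 that a real implementation would use.

```python
from itertools import combinations
from math import gcd, prod

def combine_and_split(n, primes, relations):
    """relations: list of (a_i, exponents) with a_i^2 ≡ prod(p_j^e_ij) (mod n).

    Tries subsets by brute force (toy sizes only) until the summed
    exponents are all even and the resulting gcd is a nontrivial divisor.
    """
    for size in range(1, len(relations) + 1):
        for subset in combinations(relations, size):
            total = [sum(col) for col in zip(*(e for _, e in subset))]
            if any(t % 2 for t in total):
                continue                      # some exponent is odd
            a = prod(a_i for a_i, _ in subset) % n
            b = prod(p ** (t // 2) for p, t in zip(primes, total)) % n
            d = gcd(a - b, n)
            if 1 < d < n:
                return d
    return None

# 513^2 ≡ 8400 = 2^4 * 3 * 5^2 * 7 and 537^2 ≡ 33600 = 2^6 * 3 * 5^2 * 7 (mod 84923)
n = 84923
primes = [2, 3, 5, 7]
relations = [(513, [4, 1, 2, 1]), (537, [6, 1, 2, 1])]
divisor = combine_and_split(n, primes, relations)
```

Neither relation is a square on its own, but their product has exponents $(10, 2, 4, 2)$, giving $a = 20712$, $b = 16800$ and $\gcd(a - b, n) = 163$.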

1 Quadratic Sieve

Dixon’s algorithm chooses the $a_i$ to be random elements of $\mathbb{Z}_n^*$. The quadratic sieve [Pom84] uses a different GENERATE step that both makes the generated numbers easier to factorize and keeps them smaller (compared to $n$).

Let

$$m \leftarrow \lfloor \sqrt{n} \rfloor \qquad (5)$$

and consider the values

$$f(x) = (x + m)^2 - n \qquad (6)$$

for small positive integers $x$. This is more or less like generating the entire list of random numbers via some random hash function, and then using sieve-like methods to speed up their factorization. By expanding out the square, we get

$$
\begin{aligned}
f(x) &\le x^2 + 2xm && (7)\\
&= O\left(x (x + 2m)\right) && (8)
\end{aligned}
$$

for any $x > 0$.
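A quick numeric check of this bound, as a minimal sketch. It assumes $f(x) = (x + m)^2 - n$ with $m = \lfloor \sqrt{n} \rfloor$ (the form written out in Section 3); the helper name `qs_values` and the modulus $n = 15347$ are illustrative choices, not from the notes.

```python
from math import isqrt

def qs_values(n, count):
    """Return m = floor(sqrt(n)) and f(1), ..., f(count) for f(x) = (x + m)^2 - n."""
    m = isqrt(n)
    return m, [(x + m) ** 2 - n for x in range(1, count + 1)]

n = 15347
m, values = qs_values(n, 5)

# Every value stays below x^2 + 2xm, i.e. around sqrt(n) for small x.
bounds_hold = all(v <= x * x + 2 * x * m for x, v in enumerate(values, start=1))
```

Here $m = 123$ and the first values are $29, 278, 529, \ldots$, all far smaller than $n$. The third one happens to be $23^2$, so $126^2 \equiv 23^2 \pmod{n}$ already.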

Such an algorithm has two major advantages:

1. The residues are smaller, hence more likely to be smooth. We will ensure $x = n^{o(1)}$, which means

$$f(x) \le n^{1/2 + o(1)}.$$

2. The residues are values of a polynomial, which we can factor using a sieve. Observe that for any prime $p$, $p \mid f(x)$ holds exactly when $(x + m)^2 \equiv n \pmod{p}$, which for odd $p$ confines $x$ to at most two residue classes modulo $p$. So we can generate all $x$ with $p \mid f(x)$ by ‘walking’ from each such root in steps of $p$, in the same way we do a sieve.

A fully rigorous analysis of the complexity of the quadratic sieve is not known. The heuristic justification for the bound is that the values of $f(x)$ factor like randomly chosen numbers of the same size.

Suppose we check all $x \in [1, U]$ and keep only the ones for which $f(x)$ is $y$-smooth. The cost of the sieve is then about

$$\sum_{2 \le p \le y} \frac{U}{p} \approx U \cdot \log\log y.$$
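The sieve itself can be sketched as follows. This is a toy version under stated assumptions: $f(x) = (x + m)^2 - n$ with $m = \lfloor \sqrt{n} \rfloor$, roots of $f$ modulo $p$ found by brute force rather than by a modular square-root algorithm, and hypothetical helper names (`primes_up_to`, `smooth_values`).

```python
from math import isqrt

def primes_up_to(y):
    """Sieve of Eratosthenes for the factor base (assumes y >= 2)."""
    is_prime = [True] * (y + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, isqrt(y) + 1):
        if is_prime[p]:
            for q in range(p * p, y + 1, p):
                is_prime[q] = False
    return [p for p in range(2, y + 1) if is_prime[p]]

def smooth_values(n, U, y):
    """Return {x: f(x)} for x in [1, U] where f(x) = (x + m)^2 - n is y-smooth."""
    m = isqrt(n)
    f = [(x + m) ** 2 - n for x in range(U + 1)]   # index 0 is never visited
    residual = f[:]
    for p in primes_up_to(y):
        # Roots of f modulo p, by brute force (fine at toy sizes).
        roots = [r for r in range(p) if ((r + m) ** 2 - n) % p == 0]
        for r in roots:
            start = r if r >= 1 else p
            for x in range(start, U + 1, p):       # only touch x with p | f(x)
                while residual[x] % p == 0:
                    residual[x] //= p
    return {x: f[x] for x in range(1, U + 1) if residual[x] == 1}

smooth = smooth_values(15347, 100, 30)
```

Each prime $p$ visits only about $U/p$ positions per root, which is where the $\sum_p U/p$ cost estimate comes from.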

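Then recall from last time that the probability of a number up to $m$ being $m^{1/\lambda}$-smooth is

$$\Pr\left[x \le m \text{ is } m^{1/\lambda}\text{-smooth}\right] \approx \lambda^{-\lambda}.$$

In this case, substituting $m \leftarrow n^{1/2}$ and $y = m^{1/\lambda} = n^{1/(2\lambda)}$ gives

$$\lambda = \frac{\log n}{2 \log y}.$$

So to get $y$ factored quadratic residues we should take $U \leftarrow y \lambda^{\lambda}$. Since $y = n^{1/(2\lambda)}$, we get a total cost for the factoring algorithm of

$$T = n^{1/(2\lambda)} \lambda^{\lambda} + n^{3/(2\lambda)},$$

after taking the time for Gaussian elimination into account.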

We get a "good" value for $\lambda$ by setting these terms equal and solving for $\lambda$. The analysis proceeds similarly to that of Dixon’s algorithm, and we get

$$T = \exp\left(\left(\sqrt{9/8} + o(1)\right) \sqrt{L \log L}\right),$$

where $L = \log n$.

The constant $\sqrt{9/8} = 1.060660\ldots$ in the exponent is significantly smaller than the one for Dixon’s algorithm. Note that this is a constant in the exponent: it more or less square roots the runtime.
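For reference, the balancing step can be written out as a rough derivation. It assumes the two cost terms are $n^{1/(2\lambda)}\lambda^{\lambda}$ for sieving and $n^{3/(2\lambda)}$ for Gaussian elimination, writes $L = \log n$, and drops lower-order factors throughout.

```latex
\begin{align*}
n^{1/(2\lambda)}\lambda^{\lambda} = n^{3/(2\lambda)}
  &\iff \lambda^{\lambda} = n^{1/\lambda}
   \iff \lambda^{2}\log\lambda = \log n = L, \\
\log\lambda \approx \tfrac{1}{2}\log L
  &\implies \lambda \approx \sqrt{2L/\log L}, \\
T \approx n^{3/(2\lambda)} = \exp\!\left(\frac{3L}{2\lambda}\right)
  &\approx \exp\!\left(\frac{3}{2\sqrt{2}}\sqrt{L\log L}\right)
   = \exp\!\left(\sqrt{9/8}\,\sqrt{L\log L}\right).
\end{align*}
```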


2 Improvements and Sanity Checks

The quadratic sieve is a practical factoring algorithm, and was the workhorse for large number factorization during the 80’s and early 90’s.

To make it run even faster:

1. Instead of factoring the f (x), put log f (x) into the array and subtract log p instead of dividing by p. This replaces division (an expensive step) by a single-precision operation.

2. Use multiple polynomials [Sil87].

3. Use sparse matrix techniques on the linear equations, instead of Gaussian elimination [Mon95].

4. Parallelize. If you have many processors, give each one the task of sieving a block of values of f (x). Alternatively, give each processor its own polynomial to work on. The master processor does the combining step at the end, which (in practice) is faster than sieving.

5. Use higher degree polynomials: this will be the focus of the rest of our discussion. Here we largely follow the presentation from [Pom96].
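The first improvement in the list above (subtracting $\log p$ instead of dividing) can be sketched as follows. The assumptions here are mine: $f(x) = (x + m)^2 - n$ with $m = \lfloor \sqrt{n} \rfloor$, a hypothetical helper `log_sieve`, brute-force root finding, and sieving with every prime power so that the leftover is exactly the logarithm of the unfactored part. Practical implementations typically skip most prime powers and accept a looser threshold instead.

```python
from math import isqrt, log

def log_sieve(n, U, primes):
    """Return the x in [1, U] whose f(x) = (x + m)^2 - n is smooth over `primes`."""
    m = isqrt(n)

    def f(x):
        return (x + m) ** 2 - n

    # remaining[x] starts at log f(x); index 0 is unused padding.
    remaining = [0.0] + [log(f(x)) for x in range(1, U + 1)]
    largest = f(U)                          # f is increasing on [1, U]
    for p in primes:
        log_p = log(p)
        q = p
        while q <= largest:                 # sieve with p, p^2, p^3, ...
            for x0 in range(1, min(q, U) + 1):
                if f(x0) % q == 0:          # x0 is a root of f modulo q
                    for x in range(x0, U + 1, q):
                        remaining[x] -= log_p   # float subtraction, no division
            q *= p
    # A smooth f(x) is left at about 0; otherwise at least log(2) remains.
    return [x for x in range(1, U + 1) if remaining[x] < 0.5]

small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
candidates = log_sieve(15347, 100, small_primes)
```

Only the surviving candidates then need to be factored exactly, which is a small fraction of the array.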

On the other hand, note that the theoretical assumption of $f(x)$ behaving like random numbers is a very major one. It maps a rather narrow band of numbers, $x = 1 \ldots \exp(L^{1/2})$, onto another rather narrow band of numbers around $m \approx \sqrt{n}$. What we need is that when we multiply these numbers further (using the computed exponent vector) and take their residue modulo $n$, then with good probability the roots don’t collide.

The only high level intuition I have for this is that square roots are essentially random: for any ‘low description’ sets A, B, the probability of a number in A squaring to a number in B is roughly |A||B|/n, at least once the wrap around effect kicks in.

There seems to be a lot of recent progress in math on how such sets interact, but I’m not aware of work in theoretical algorithms on this topic this century. Any pointers would be helpful: this issue is going to show up again in the next part as well, when we go to fancier f (x) functions.

3 Using Higher Degree Polynomials

We now work back in asymptotics, and ignore constants in the exponent. The QS can be viewed as generating square roots via the mapping

$$f(x) = x^2 + 2mx + m^2 - n.$$

As long as this number is larger than n, we are able to ‘randomize’ the pre-image of the square.


To use higher degree polynomials, the idea is to go to a degree $d$ polynomial, and obtain these large values through the product of a number of smaller polynomials. That is, we treat $m$ as a special symbol, and piece together the higher degree polynomial via a product

$$\prod_i \left(a_i + b_i m\right).$$

The issue is that we still need to reduce modulo $n$. For that, it’s useful to define fields/rings augmented with algebraic integers. That is, we pick a polynomial $f(\theta)$ such that

$$n = f(m),$$

or equivalently, let $\theta$ denote a root of the polynomial obtained from the base-$m$ representation of $n$.
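The base-$m$ construction is easy to sketch. The helper names `base_m_polynomial` and `evaluate` are hypothetical, and the values $n = 45113$, $m = 31$ are an illustrative choice (picked so that the leading digit is 1, making the polynomial monic), not taken from the notes.

```python
def base_m_polynomial(n, m):
    """Digits of n in base m, lowest degree first: n = sum(c[i] * m**i)."""
    coeffs = []
    while n > 0:
        n, digit = divmod(n, m)
        coeffs.append(digit)
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

n, m = 45113, 31
coeffs = base_m_polynomial(n, m)   # coefficients c_0, c_1, c_2, c_3
value_at_m = evaluate(coeffs, m)
```

For this choice the polynomial is $f(\theta) = \theta^3 + 15\theta^2 + 29\theta + 8$, and evaluating it at $\theta = m$ recovers $n$.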

For $n$ with $L$ digits, we will pick parameters so that $d = L^{1/3}$, and thus $\log m \approx L^{2/3}$.

We will also pick the smoothness threshold $y$ to be $\exp(L^{1/3})$.

The main issue is how to ensure that

$$\prod_i \left(a_i + b_i \theta\right)$$

is a square over $\mathbb{Z}[\theta]$.

In this lecture, we make a special assumption, which leads to the special number field sieve: namely, that $\mathbb{Z}[\theta]$ is a unique factorization domain.

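In this case, we can factor each $a_i + b_i\theta$ into a product of primes of $\mathbb{Z}[\theta]$. Let the min-poly of $\theta$ be

$$x^d + c_{d-1} x^{d-1} + \ldots + c_0;$$

the norm of $a + b\theta$ is defined as

$$N(a + b\theta) \stackrel{\text{def}}{=} a^d - c_{d-1} a^{d-1} b + c_{d-2} a^{d-2} b^2 - \ldots + (-1)^d c_0 b^d.$$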
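The norm formula translates directly into code. This is a minimal sketch: the function name `norm` and the convention of passing $c_0, \ldots, c_{d-1}$ lowest degree first are assumptions made for the example.

```python
def norm(a, b, c):
    """N(a + b*theta) for the monic min-poly x^d + c[d-1] x^(d-1) + ... + c[0].

    `c` lists c[0], ..., c[d-1] (lowest degree first); the leading
    coefficient c[d] = 1 is appended internally.
    """
    d = len(c)
    coeffs = list(c) + [1]
    return sum((-1) ** k * coeffs[d - k] * a ** (d - k) * b ** k
               for k in range(d + 1))

# theta = sqrt(2), min-poly x^2 - 2: N(a + b*theta) = a^2 - 2 b^2
unit_norm = norm(3, 2, [-2, 0])
# theta = i, min-poly x^2 + 1: N(a + b*i) = a^2 + b^2
gaussian_norm = norm(2, 1, [1, 0])
# Cubic example x^3 + 15 x^2 + 29 x + 8: N = a^3 - 15 a^2 b + 29 a b^2 - 8 b^3
cubic_norm = norm(-1, 1, [8, 29, 15])
```

The two quadratic cases reduce to the familiar norms $a^2 - 2b^2$ and $a^2 + b^2$, which is a useful sanity check on the alternating signs.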

It can be shown that norms are multiplicative. Furthermore, the key property over the (assumed) unique factorization domains is:

Lemma 3.1. If $\mathbb{Z}[\theta]$ is a unique factorization domain, then $a + b\theta$ factors into primes of $\mathbb{Z}[\theta]$ whose norms multiply to $N(a + b\theta)$, and so correspond to the prime factorization of $N(a + b\theta)$.

Note that if $a_i, b_i$ are picked to have $L^{1/3}$ digits, the norm has at most $L^{2/3}$ digits. So by the ‘sqrt’ rule, both $a + bm$ and $N(a + b\theta)$ are $y$-smooth (for $\log y \approx L^{1/3}$) with probability $\exp(-L^{1/3})$. So as long as one tries more than $\exp(2L^{1/3})$ pairs, we get more than $10y$ ones where both are $y$-smooth. Solving equations on exponents modulo 2, we are able to get a subset $S$ such that

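$$\prod_{i \in S} \left(a_i + b_i m\right) \quad \text{and} \quad \prod_{i \in S} \left(a_i + b_i \theta\right)$$

are squares in $\mathbb{Z}$ and $\mathbb{Z}[\theta]$ respectively. Evaluating the latter with the mapping $\theta \leftarrow m$ then gives two different numbers whose squares match modulo $n$.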

This is roughly how an $\exp(L^{1/3})$-type runtime arises. The issue is that $\mathbb{Z}[\theta]$ cannot be expected to be a unique factorization domain in general. We will also discuss how to interpret these norms next time.

4 More on Norms

The formal definition of norms is based on embeddings of $\mathbb{Q}[\theta]$ into $\mathbb{C}$. Such embeddings are determined by what $\theta$ gets mapped to. Let the embedding be $\sigma : \mathbb{Q}[\theta] \to \mathbb{C}$; then the requirement

$$f(\sigma(\theta)) = 0$$

means that $\sigma(\theta)$ must be a root of $f$.

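As polynomials factorize completely over the complex numbers, we can factor $f$ into its roots

$$f(x) = \prod_{i=1}^{d} \left(x - \theta_i\right)$$

and define the embeddings $\sigma_1, \ldots, \sigma_d$ via $\sigma_i(\theta) = \theta_i$. Note that this implies that for any $\alpha$, which is really a polynomial $\alpha(\theta) = \sum_{j=0}^{d} \alpha_j \theta^j$, we have

$$\sigma_i(\alpha) = \alpha(\theta_i).$$

Then formally, the norm of some $\alpha \in \mathbb{Q}[\theta]$ is defined as

$$N(\alpha) \stackrel{\text{def}}{=} \prod_{1 \le i \le d} \sigma_i(\alpha).$$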

For the discussion above, we needed:

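1. Norm is multiplicative, $N(\alpha) \cdot N(\beta) = N(\alpha\beta)$. This follows from the definition $\sigma_i(\alpha) = \alpha(\theta_i)$, that is

$$\sigma_i(\alpha\beta) = \alpha\beta(\theta_i) = \alpha(\theta_i)\,\beta(\theta_i) = \sigma_i(\alpha)\,\sigma_i(\beta),$$

and taking the product over all $i$.

2. If $\alpha = a + b\theta$, we have $N(\alpha) = \sum_{k=0}^{d} (-1)^k c_{d-k}\, a^{d-k} b^k$, where $c_j = [\theta^j] f(\theta)$ (so $c_d = 1$). This is just the product of $a + b\theta_i$ over all $d$ roots. Expanding and collecting over all combinations of $a$ and $b$ gives

$$\prod_{1 \le i \le d} \left(a + b\theta_i\right) = \sum_{0 \le k \le d} a^{d-k} b^k \sum_{S \subseteq [d],\, |S| = k} \prod_{i \in S} \theta_i.$$

That is, the coefficient on $a^{d-k} b^k$ is precisely the $k$-th symmetric sum of the roots. By Vieta’s formula https://en.wikipedia.org/wiki/Vieta%27s_formulas, this is precisely the coefficient $c_{d-k}$ of $f(\theta)$ times $(-1)^k$.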
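Both properties can be checked numerically on small cases. The setup is an assumption made for exact integer arithmetic, not something from the notes: the roots $1, -2, 3$ belong to the reducible polynomial $x^3 - 2x^2 - 5x + 6$, so they only illustrate the expansion identity, and multiplicativity is checked in $\mathbb{Z}[\sqrt{2}]$ with elements stored as pairs $(a, b)$ meaning $a + b\sqrt{2}$. All function names are hypothetical.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(roots, k):
    """Sum over all k-subsets of the product of the chosen roots."""
    return sum(prod(s) for s in combinations(roots, k))

def norm_from_roots(a, b, roots):
    """Product of (a + b * theta_i) over all roots."""
    return prod(a + b * t for t in roots)

def norm_from_symmetric_sums(a, b, roots):
    """Same quantity, expanded into a^(d-k) b^k times symmetric sums."""
    d = len(roots)
    return sum(a ** (d - k) * b ** k * elementary_symmetric(roots, k)
               for k in range(d + 1))

def mul_sqrt2(x, y):
    """(a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)

def norm_sqrt2(x):
    """Norm of a + b*sqrt2, i.e. (a + b*sqrt2)(a - b*sqrt2)."""
    a, b = x
    return a * a - 2 * b * b

roots = [1, -2, 3]                 # roots of x^3 - 2x^2 - 5x + 6
lhs = norm_from_roots(5, 2, roots)
rhs = norm_from_symmetric_sums(5, 2, roots)

alpha, beta = (3, 1), (1, 2)
product_norm = norm_sqrt2(mul_sqrt2(alpha, beta))
norms_multiplied = norm_sqrt2(alpha) * norm_sqrt2(beta)
```

The symmetric sums for these roots are $2, -5, -6$, which equal $(-1)^k c_{d-k}$ for the coefficients $c_2 = -2$, $c_1 = -5$, $c_0 = 6$, as Vieta’s formula predicts.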


References

[Dix81] John D Dixon. Asymptotically fast factorization of integers. Mathematics of computation, 36(153):255–260, 1981.

[Mon95] Peter L Montgomery. A block Lanczos algorithm for finding dependencies over GF(2). In International Conference on the Theory and Applications of Cryptographic Techniques, pages 106–120. Springer, 1995.

[Pom84] Carl Pomerance. The quadratic sieve factoring algorithm. In Thomas Beth, Norbert Cot, and Ingemar Ingemarsson, editors, Advances in Cryptology: Proceedings of EUROCRYPT 84, A Workshop on the Theory and Application of Cryptographic Techniques, Paris, France, April 9-11, 1984, Proceedings, volume 209 of Lecture Notes in Computer Science, pages 169–182. Springer, 1984.

[Pom96] Carl Pomerance. A tale of two sieves. In Notices Amer. Math. Soc. Citeseer, 1996.

[Sil87] Robert D Silverman. The multiple polynomial quadratic sieve. Mathematics of Computation, 48(177):329–339, 1987.