An Overview of Integer Factoring Algorithms. The Problem


Manindra Agrawal IITK / NUS

The Problem

Given an integer n, find all its prime divisors as efficiently as possible.

A Difficult Problem …

• No efficient algorithm (= taking time (log n)^c) is known for the problem.

• The fastest known algorithm takes time exp(c (log n)^{1/3} (loglog n)^{2/3}) with c ≈ 1.9.

• With this, we can factor 140 digit numbers in reasonable time.

• It is believed that no efficient algorithm exists.

… Useful in Cryptography

• RSA cryptosystem’s security is based on hardness of factoring.

• Several other cryptosystems rely on this problem as well.

We present an overview of the known factoring algorithms.

#1: Trial Division

Divide n by every prime up to √n, starting from 2, and collect all divisors.

• A very simple algorithm.

• Takes time exp(½ log n) = L(1, ½).

Notation: Denote exp(c (log n)^ε (loglog n)^{1-ε}) by L(ε, c).
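Trial division can be sketched in a few lines. This is a minimal illustration, not code from the slides: the function name `trial_division` is hypothetical, and the loop divides by every integer d ≥ 2 rather than only by primes. That is an assumed simplification; it gives the same output, because a composite d can no longer divide n once its prime factors have been removed.

```python
def trial_division(n):
    """Return the prime factors of n (with multiplicity) in increasing order."""
    factors = []
    d = 2
    # Only candidates up to sqrt(n) are needed: a composite n has a factor there.
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    # Whatever remains above 1 is a single prime factor larger than sqrt(n).
    if n > 1:
        factors.append(n)
    return factors


print(trial_division(8051))
```

The number of loop iterations grows like √n = exp(½ log n), which is the L(1, ½) bound quoted above.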


#2: Pollard’s Rho Method

1. Randomly select x_0 ∈ {1, 2, …, n-1}, and compute x_i = x_{i-1}² + 1 (mod n) for i = 1, 2, …

2. Compute gcd(x_i - x_{2i}, n) until a factor is found.

• Discovered by J. Pollard in 1975.

• Takes time L(1, ¼).

• Used to factorize the eighth Fermat number 2^{2^8} + 1, a 78-digit number.
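The two steps above can be sketched as follows. This is an illustrative sketch: the name `pollard_rho`, the optional `x0` argument (added so a run can be reproduced) and the `max_steps` cap are assumptions, not part of the slides. The variable `x` holds x_i and `y` holds x_{2i}, so each iteration advances `y` twice.

```python
import random
from math import gcd


def pollard_rho(n, x0=None, max_steps=1_000_000):
    """Return a nontrivial factor of composite n, or None if this run fails."""
    x = random.randrange(1, n) if x0 is None else x0
    y = x  # x tracks x_i, y tracks x_{2i}
    for _ in range(max_steps):
        x = (x * x + 1) % n
        y = (y * y + 1) % n
        y = (y * y + 1) % n
        d = gcd(x - y, n)
        if d == n:
            return None  # x_i = x_{2i} (mod n): retry with another x0
        if d > 1:
            return d
    return None


print(pollard_rho(8051, x0=2))  # prints 97, since 8051 = 83 * 97
```

With x_0 = 2 and n = 8051 the sequence is 5, 26, 677, 7474, 2839, 871, …, and the third step gives gcd(677 - 871, 8051) = 97.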

[Figure: Pollard’s “Rho” shape. The sequence x_0, x_1, x_2, … forms a tail leading to x_t = x_m, followed by a cycle x_t, x_{t+1}, x_{t+2}, …, x_{m-2}, x_{m-1}.]

Analysis

• Let p be the smallest prime factor of n, so p ≤ √n.

• The sequence x_0, x_1, x_2, … behaves randomly modulo p.

• So, by the birthday paradox, we expect x_t = x_m (mod p) for some t < m once roughly √p terms have been generated.

• Notice that if x_t = x_m (mod p), then x_{t+k} = x_{m+k} (mod p) for all k > 0.

• Therefore, taking s to be a positive multiple of m - t with t ≤ s ≤ m, there exists an s ≤ m with x_s = x_{2s} (mod p).

• Again using randomness of the sequence, with probability at least ½, x_s ≠ x_{2s} (mod n).

• Therefore, p | gcd(x_s - x_{2s}, n) and gcd(x_s - x_{2s}, n) < n.

• For a good probability of success, we need to generate roughly √p ≤ n^{1/4} of the x_i’s.

• So the time complexity is exp(¼ log n) = L(1, ¼).

#3: Pollard’s p-1 Method

1. Fix a factor base = set of all primes ≤ B.

2. Compute m = ∏_{q prime, q ≤ B} q^{log n}.

3. Compute gcd(a^m - 1, n) for a random a.

• Discovered by J. Pollard in 1974.

• Takes time O(B (log n)²).

• Works if prime p | n and p-1 has no prime divisor greater than B.

Fermat’s Little Theorem

If p is prime then for all a with gcd(a, p) = 1, a^{p-1} = 1 (mod p).

• In other words, the set of numbers { a | 0 < a < p } forms a group of size p-1 under multiplication modulo p.

Analysis

• Suppose prime p | n and p-1 has no factor greater than B.

• This implies that p-1 | m.

• So, by Fermat’s Little Theorem, p divides a^m - 1.

• So it might be found when computing gcd(a^m - 1, n).

Useful only for a subset of numbers n.
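A small sketch of the p-1 method follows. The names `primes_up_to` and `pollard_p_minus_1` are hypothetical. Two details are assumptions: the exponent of each prime q is taken as the largest e with q^e ≤ n (one reading of the q^{log n} in step 2), and the base defaults to a = 2 instead of a random value so the example is reproducible. The power a^m is never formed as an integer; it is built up modulo n one prime power at a time.

```python
from math import gcd


def primes_up_to(B):
    """All primes q <= B, by naive trial division (fine for small B)."""
    return [q for q in range(2, B + 1)
            if all(q % r for r in range(2, int(q**0.5) + 1))]


def pollard_p_minus_1(n, B, a=2):
    """Return a nontrivial factor of n, or None if the bound B does not work."""
    x = a % n
    for q in primes_up_to(B):
        qe = q
        while qe * q <= n:  # largest power of q not exceeding n
            qe *= q
        x = pow(x, qe, n)   # after the loop, x = a^m (mod n)
    d = gcd(x - 1, n)
    return d if 1 < d < n else None


# 299 = 13 * 23, with 13 - 1 = 2^2 * 3 and 23 - 1 = 2 * 11.
print(pollard_p_minus_1(299, B=5))   # prints 13
print(pollard_p_minus_1(299, B=11))  # prints None
```

With B = 5 only 13 - 1 divides m, so the gcd isolates 13. With B = 11 both 13 - 1 and 23 - 1 divide m, the gcd is n itself, and the attempt fails, which illustrates why the bound matters.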

#4: Elliptic Curve Method

• The previous method works only for n with a prime divisor p such that p-1 is a product of small primes.

• It is always true that some number m “close” to p has this property.

• So if we can work with a group of size m instead of p-1, the method will work for all numbers.

Elliptic Curves

• Elliptic curve E(a,b) has the following form:

y² = 4x³ - ax - b, with a³ - 27b² ≠ 0

• The set of points on an elliptic curve form a group under “addition.”

• We consider elliptic curves modulo n.

• The number of points on an elliptic curve modulo a prime p (= #E_p(a,b)) is between p+1-2√p and p+1+2√p.

[Figure: the curve y² = 4x³ - 4x with points A, B, -C, C, E and F. Addition on the curve: A + B = C; E + F = O, the point at infinity.]

Algorithm

1. Fix a factor base = set of all primes ≤ B.

2. Compute m = ∏_{q prime, q ≤ B} q^{log n}.

3. Choose random a and b with a³ - 27b² ≠ 0 (mod n).

4. Choose a random point P on the elliptic curve E_n(a,b).

5. Attempt to compute a factor of n from mP - O (O is the “zero” for “addition”).

Analysis

• Similar to Pollard’s p-1 method.

• If prime p | n and #E_p(a,b) has no prime divisor > B, then n can be factored.

• This works for all numbers since #E_p(a,b) is randomly distributed between p+1-2√p and p+1+2√p.

• A careful analysis shows the running time to be L(½, 1), much better than the earlier methods!

• Used to factor the tenth and eleventh Fermat numbers: 2^{2^10} + 1 (309 digits) and 2^{2^11} + 1 (617 digits).

• Fastest known algorithm for most numbers.

• Discovered by H. Lenstra in 1987.
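The elliptic curve method can be sketched as below, with several assumptions to note. The sketch uses the short Weierstrass form y² = x³ + ax + b (nonsingular when 4a³ + 27b² ≠ 0) instead of the form y² = 4x³ - ax - b used on the slides. It picks the point first and derives b from it. It takes the largest power of each prime q not exceeding n as the exponent. It uses a fixed `seed` so runs are reproducible. All names (`ecm`, `ec_add`, `ec_mul`, `inv_mod`, `FactorFound`) are hypothetical. A factor shows up when a modular inverse needed by the addition formulas does not exist modulo n.

```python
import random
from math import gcd


class FactorFound(Exception):
    """Raised when a modular inverse fails and exposes a divisor of n."""

    def __init__(self, divisor):
        super().__init__(divisor)
        self.divisor = divisor


def inv_mod(x, n):
    d = gcd(x % n, n)
    if d != 1:
        raise FactorFound(d)
    return pow(x, -1, n)  # modular inverse; requires Python 3.8+


def ec_add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b modulo n; None stands for O."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None  # P + (-P) = O
    if P == Q:
        s = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n  # tangent slope
    else:
        s = (y2 - y1) * inv_mod(x2 - x1, n) % n         # chord slope
    x3 = (s * s - x1 - x2) % n
    return (x3, (s * (x1 - x3) - y1) % n)


def ec_mul(k, P, a, n):
    """Compute kP by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R


def primes_up_to(B):
    return [q for q in range(2, B + 1)
            if all(q % r for r in range(2, int(q**0.5) + 1))]


def ecm(n, B=50, curves=200, seed=0):
    """Try up to `curves` random curves; return a nontrivial factor or None."""
    rng = random.Random(seed)
    for _ in range(curves):
        x, y, a = (rng.randrange(n) for _ in range(3))
        b = (y * y - x * x * x - a * x) % n  # forces P = (x, y) onto the curve
        if gcd(4 * a**3 + 27 * b * b, n) != 1:
            continue  # singular modulo some prime factor; try another curve
        P = (x, y)
        try:
            for q in primes_up_to(B):
                qe = q
                while qe * q <= n:
                    qe *= q
                P = ec_mul(qe, P, a, n)  # accumulates mP one prime power at a time
        except FactorFound as hit:
            if hit.divisor < n:
                return hit.divisor
    return None


print(ecm(455839))  # 455839 = 599 * 761
```

Unlike the p-1 method, a failed attempt is not final: each new curve gives a new group order near p, so the loop simply moves on to another curve.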

#5: Fermat’s Method

1. Compute m = [√n].

2. For d = 1, 2, 3, … do:

i. Let x = m + d and test if x² - n is a perfect square.

ii. If yes, let y² = x² - n and factor n using gcd(n, x+y).

• Discovered by P. Fermat in the 17th century.

• Works fast if n has two factors close to √n.

Analysis

• Suppose n = k (k + t) with t small compared to k.

• Then m = [√n] ≈ k (1 + t/k)^{1/2} ≈ k + ½t.

• Notice that with x = k + ½t,

x² - n = k² + kt + ¼t² - k² - kt = (½t)².

• So the “right” x will be quickly found.
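Fermat's method is short enough to sketch directly. The name `fermat_factor` is hypothetical, and the sketch assumes n is odd, which guarantees the loop stops. Since x² - y² = n, it returns the two factors x - y and x + y directly instead of taking gcd(n, x+y).

```python
from math import isqrt


def fermat_factor(n):
    """Return (x - y, x + y) with x^2 - y^2 = n, for odd n > 1."""
    x = isqrt(n)
    if x * x == n:
        return x, x
    while True:
        x += 1
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:  # x^2 - n is a perfect square
            return x - y, x + y


print(fermat_factor(8051))  # prints (83, 97)
```

For n = 8051 = 83 · 97 the first candidate x = 90 already works, because 90² - 8051 = 49 = 7². This matches the analysis: here k = 83 and t = 14, so x = k + ½t = 90. For a prime n the loop only stops at the trivial pair (1, n).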

#6: Dixon’s Method

• Proposed by J. Dixon in 1981.

• Simple version of Morrison-Brillhart method.

• Based on Fermat’s method.

• Aims to find x and y such that x² = y² (mod n).

Algorithm

Data Collection Step:

1. Fix a factor base = set of primes ≤ B.

2. Randomly choose a number v and compute u = v² (mod n).

3. If u has all prime factors ≤ B, store the pair (v,u).

• Do this until about B pairs have been stored.

Data Analysis Step:

1. Let p_1, p_2, …, p_t be the primes ≤ B.

2. Let u_i = p_1^{e_{i,1}} * p_2^{e_{i,2}} * … * p_t^{e_{i,t}} for every stored u_i.

3. Let vector w_i = [e_{i,1} e_{i,2} … e_{i,t}].

4. Find a linear dependency amongst these vectors over F_2:

∑_i β_i w_i = 0 (mod 2).

5. Compute x = Π_i v_i^{β_i}.

6. Compute y = (Π_i u_i^{β_i})^{½}.

7. Factor n as gcd(n, x+y).

Analysis

• Over the integers, all entries of ∑_i β_i w_i are multiples of 2.

• So,

Π_i u_i^{β_i} = p_1^{∑_i β_i e_{i,1}} * p_2^{∑_i β_i e_{i,2}} * … * p_t^{∑_i β_i e_{i,t}}

is a perfect square.

• Since v_i² = u_i (mod n), we get

x² = Π_i v_i^{2β_i} = Π_i u_i^{β_i} = y² (mod n).
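The data collection and data analysis steps can be combined into one loop, as sketched below. Several choices are assumptions made for illustration: v is scanned upward from √n instead of being chosen at random (so the run is reproducible), the dependency over F_2 is found by incremental elimination on bitmasks as each pair arrives, and the names `dixon` and `smooth_exponents` are hypothetical.

```python
from math import gcd, isqrt


def smooth_exponents(u, primes):
    """Exponent vector of u over `primes`, or None if u is not smooth."""
    exps = []
    for p in primes:
        e = 0
        while u % p == 0:
            u //= p
            e += 1
        exps.append(e)
    return exps if u == 1 else None


def dixon(n, primes, max_tries=100_000):
    """Return a nontrivial factor of n, or None if none is found."""
    rels = []   # stored pairs, kept as (v_i, exponent vector of u_i)
    basis = {}  # pivot bit -> (parity mask, bitmask of the pairs combined)
    v = isqrt(n) + 1
    for _ in range(max_tries):
        u = v * v % n
        exps = smooth_exponents(u, primes) if u else None
        if exps is not None:
            rels.append((v, exps))
            mask = sum((e & 1) << j for j, e in enumerate(exps))  # w_i mod 2
            combo = 1 << (len(rels) - 1)
            while mask:  # reduce against the basis built so far
                pivot = mask.bit_length() - 1
                if pivot not in basis:
                    basis[pivot] = (mask, combo)
                    break
                mask ^= basis[pivot][0]
                combo ^= basis[pivot][1]
            if mask == 0:  # dependency found: combo encodes the beta_i
                x, total = 1, [0] * len(primes)
                for i, (vi, ei) in enumerate(rels):
                    if combo >> i & 1:
                        x = x * vi % n
                        total = [s + e for s, e in zip(total, ei)]
                y = 1
                for p, e in zip(primes, total):
                    y = y * pow(p, e // 2, n) % n
                d = gcd(x + y, n)
                if 1 < d < n:
                    return d
        v += 1
    return None


print(dixon(8051, [2, 3, 5, 7]))  # prints 97
```

For n = 8051 the very first candidate v = 90 gives u = 49 = 7², whose exponent vector is already even, so x = 90, y = 7 and gcd(8051, 97) = 97. A dependency that yields only a trivial gcd is discarded and the scan continues.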

Analysis

• How quickly can we find the required number of pairs?

Observation: If B is small, we need to find only a few pairs. But the chances of finding one pair are small.

If B is large, we need to find many pairs. But the chances of finding one pair are larger.

Analysis

• What is the best value of B?

• It turns out to be

L(½, 1/√2) = exp((1/√2) (log n)^{1/2} (loglog n)^{1/2}).

• With this value, the running time is L(½, √2) = exp(√2 (log n)^{1/2} (loglog n)^{1/2}).

• Not as good as Elliptic curve method.

#7: Quadratic Sieve

• Proposed by C. Pomerance in 1981.

• A combination of Fermat’s method and Dixon’s method.

• Does the Data Collection step cleverly to reduce time.

• The best value of B becomes L(½, ½).

• The running time reduces to L(½, 1).

• Beats the elliptic curve method for the large numbers used in cryptography.

• Used to factor 129-digit RSA challenge in 1994:

RSA-129 = 1143 81625 75788 88676 69235 77997 61466 12010 21829 67212 42362 56256 18429 35706 93524 57338 97830 59712 35639 58705 05898 90751 47599 29002 68795 43541

The Sieving Idea

• The v_i’s to be tested are chosen from the range [√n, √n+A].

• For each v_i, we check if v_i² - n has all prime divisors ≤ B.

• For a prime q ≤ B, if q divides v² - n, then it will also divide (v + kq)² - n and (kq - v)² - n for all integers k.

The Sieving Idea

• So, for each q ≤ B, do the following:

– Solve the equation x² = n (mod q) to obtain two solutions, say α and β.

– For every v in the range [√n, √n+A] of the form α + kq or β + kq, divide v² - n by q as many times as possible.

• Once all q’s are finished, the v’s whose v² - n has become 1 are the useful ones.
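The sieving step can be sketched as follows. The name `sieve_smooth` is hypothetical, and the roots of x² = n (mod q) are found by brute force over 0 … q-1, an assumed shortcut that is only sensible for a small factor base. The point is that each prime q touches only the positions α + kq and β + kq, so no trial division is wasted on values q cannot divide.

```python
from math import isqrt


def sieve_smooth(n, primes, A):
    """Return the v in [isqrt(n)+1, isqrt(n)+1+A] whose v^2 - n is smooth."""
    start = isqrt(n) + 1
    vals = [v * v - n for v in range(start, start + A + 1)]
    for q in primes:
        # Solutions of x^2 = n (mod q); at most two for an odd prime q.
        roots = [r for r in range(q) if (r * r - n) % q == 0]
        for r in roots:
            first = (r - start) % q  # index of the first v = r (mod q)
            for i in range(first, len(vals), q):
                while vals[i] % q == 0:
                    vals[i] //= q
    # Entries reduced to 1 had all their prime divisors in the factor base.
    return [start + i for i, val in enumerate(vals) if val == 1]


print(sieve_smooth(8051, [2, 3, 5, 7], 20))  # prints [90, 99]
```

For n = 8051 this finds 90² - n = 49 = 7² and 99² - n = 1750 = 2 · 5³ · 7. The prime 3 is never used, because x² = 8051 (mod 3) has no solution.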

The Time Complexity of Factoring

• A number of algorithms have time complexity L(½, c) for constants c.

• This led to the belief that the optimal complexity for factoring is L(½, c) for some c ≤ 1.

And then the Number Field Sieve appeared …

#8: Number Field Sieve

• Proposed by J. Pollard in 1988 and improved by C. Pomerance, H. Lenstra and others.

• A generalization of Quadratic sieve to number fields.

• The running time is L(1/3, 1.923).

• Used to factor the ninth Fermat number 2^{2^9} + 1 (155 digits) and RSA-130 (in 1996).
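To get a feel for what the change from L(½, c) to L(1/3, c) means, the running-time expressions quoted in this overview can be evaluated numerically. This is a rough sketch under stated assumptions: natural logarithms are used, the hidden constant factors are ignored, and the helper name `L` and the choice of a 512-bit n are illustrative.

```python
from math import exp, log


def L(bits, eps, c):
    """exp(c (log n)^eps (loglog n)^(1-eps)) for n = 2**bits, natural logs."""
    ln_n = bits * log(2)
    return exp(c * ln_n**eps * log(ln_n) ** (1 - eps))


for name, eps, c in [("trial division", 1, 0.5),
                     ("Pollard rho", 1, 0.25),
                     ("quadratic sieve", 0.5, 1.0),
                     ("number field sieve", 1 / 3, 1.923)]:
    # Report each estimate as a power of two for easier comparison.
    print(f"{name:20s} L({eps:.2f}, {c}) ~ 2^{log(L(512, eps, c), 2):.0f}")
```

With ε = 1 the expression is a fixed power of n (fully exponential in the number of digits), while smaller ε moves it towards polynomial time; the ordering of the printed exponents reflects that.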

Number Field Sieve Idea

• Select a small degree d.

• Find a polynomial f(x) and a number m such that (1) m ≈ n^{1/d} and (2) n divides f(m).

• Let α be a root of f(x) over complex numbers.

• Consider the ring Z[α], consisting of all complex numbers that can be written as:

∑_j c_j α^j with integer coefficients c_j
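The slides do not say how f and m are chosen. One standard choice, shown here as an assumption, is the base-m expansion: take m = ⌊n^{1/d}⌋ and let the base-m digits of n be the coefficients of f, so that f(m) = n and hence n divides f(m). The helper names `integer_root`, `base_m_polynomial` and `evaluate` are hypothetical.

```python
def integer_root(n, d):
    """Largest integer r with r**d <= n, by binary search."""
    lo, hi = 1, 1 << (n.bit_length() // d + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**d <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def base_m_polynomial(n, d):
    """Return (m, coeffs) with sum(coeffs[j] * m**j) == n and m ~ n^(1/d)."""
    m = integer_root(n, d)
    if m < 2:
        raise ValueError("n is too small for this degree")
    coeffs = []
    r = n
    while r:
        r, c = divmod(r, m)  # peel off base-m digits, lowest first
        coeffs.append(c)
    return m, coeffs


def evaluate(coeffs, x):
    return sum(c * x**j for j, c in enumerate(coeffs))


m, coeffs = base_m_polynomial(8051, 2)
print(m, coeffs, evaluate(coeffs, m))  # prints 89 [41, 1, 1] 8051
```

For n = 8051 and d = 2 this gives m = 89 and f(x) = x² + x + 41, with f(89) = 8051. Real implementations search for polynomials with much smaller coefficients, which this sketch does not attempt.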


• Define a map ψ from Z[α] to Z/nZ, the ring of residues modulo n, as:

ψ(∑_j c_j α^j) = ∑_j c_j m^j (mod n).

• Clearly, ψ(f(α)) = f(m) = 0 (mod n), which agrees with f(α) = 0, and so ψ is a ring homomorphism.

• Now find a sequence of pairs (u_i, v_i) and a sequence of exponents β_i such that:

1. Π_i (u_i - m v_i)^{β_i} is a square in Z.

2. Π_i (u_i - α v_i)^{β_i} is a square in Z[α].

• Then, writing the two squares as x² in Z and g(α)² in Z[α],

x² = Π_i (u_i - m v_i)^{β_i} = Π_i ψ(u_i - α v_i)^{β_i} (mod n)

= ψ(Π_i (u_i - α v_i)^{β_i}) (mod n) = ψ(g(α)²) (mod n)

= [ψ(g(α))]² (mod n) = g(m)² = y² (mod n).

• Pairs (u

_i

, v

_i