Integer Factorisation
Vassilis Kostakos
Department of Mathematical Sciences
University of Bath
http://www.geocities.com/vkostakos
May 7, 2001
MATH0082 Double Unit Project
Comparison of Integer Factorisation Algorithms
Candidate: Kostakos, V
Supervisor: Russell Bradford
Checker:
Review date: December 2000
Final submission date: 10 May 2001
Equipment required:
Implement and compare several integer factorisation algorithms.
Algorithms descriptions   15    α
Implementation            15    α
Comparison tests          30    2α
Report and analysis       40    2α
Total                     100   6α
Note: All the software files which are referred to by this report may be found on the BUCS filesystem at: ~ma9vk\public_html\project\
Abstract
The problem of integer factorisation has been around for a very long time. This report describes a number of algorithms and methods for performing factorisation. In particular, the Trial Divisions and Fermat algorithms are discussed. Furthermore, Pollard’s ρ and p − 1 methods are described, and finally Lenstra’s Elliptic Curves method. The theory behind each algorithm is explained, so that the reader can become familiar with the process. Then, sample pseudocode is presented, along with the expected running time for each algorithm. Finally, this report includes test data for each algorithm.
CONTENTS
1 Introduction
I Documentation
2 Project Plan
    2.1 Resources
    2.2 Scheduling
    2.3 Coding standards
3 Requirements
    3.1 User Definition
    3.2 Functional Requirements
    3.3 Non-functional Requirements
    3.4 Software and Hardware Requirements
4 Testing
    4.1 Correctness tests
    4.2 Performance tests
II Implementation
5 Tools for factorisation
    5.1 Greatest common divisor
    5.2 Fast exponentiation modulo
    5.3 Primality testing
6 Trial divisions algorithm
    6.1 Description of trial divisions algorithm
    6.2 Implementation of trial divisions algorithm
    6.3 Running time
    6.4 Remarks
7 Fermat’s algorithm
    7.1 Quick description of Fermat’s algorithm
    7.2 Detailed description of Fermat’s algorithm
    7.3 Implementation of Fermat’s algorithm
    7.4 Running time
    7.5 Remarks
8 The Pollard ρ method
    8.1 Description of the algorithm
        8.1.1 Constructing the sequence
        8.1.2 Finding the period
        8.1.3 Calculating the factor
    8.2 Implementation of Pollard ρ
    8.3 Running time
    8.4 Remarks
9 The Pollard p − 1 method
    9.1 Description of the algorithm
    9.2 A slight improvement
    9.3 Implementation of Pollard p − 1
    9.4 Running time
    9.5 Remarks
10 Elliptic Curves Method
    10.1 Introduction to elliptic curves
        10.1.1 Elliptic curves as a group
        10.1.2 Elliptic curves modulo n
        10.1.3 Computation on elliptic curves
        10.1.4 Factorisation using elliptic curves
    10.2 Implementation of elliptic curves method
    10.3 Running time
    10.4 Remarks
11 Overall Comparison
12 Epilogue
III Appendices
A Benchmarks
    A.1 Tests with products of two nearby primes
    A.2 Tests with products of three nearby primes
    A.3 Tests with products of three arbitrary primes
B Program output
    B.1 Tests output
    B.2 Combined factorisation output
    B.3 Biggest factorisation
LIST OF TABLES
2.1 My schedule
A.1 Products of two nearby primes
A.2 Products of three nearby primes
A.3 Products of three arbitrary primes
LIST OF FIGURES
5.1 Pseudocode for computing gcd(a, b) using the Euclidean algorithm
5.2 Pseudocode for fast computation of a^b mod m
6.1 Pseudocode for trial divisions algorithm
6.2 Results of tests on Trial divisions algorithm
7.1 Pseudocode for Fermat’s algorithm
7.2 Results of tests on Fermat’s algorithm
8.1 Pseudocode for the Pollard ρ algorithm
8.2 Results of tests on the Pollard ρ algorithm
9.1 Pseudocode for Fermat’s algorithm
9.2 Results of tests on Pollard p − 1 algorithm
10.1 Pseudocode for main loop of Elliptic curves method
10.2 Pseudocode for NEXTVALUES function of Elliptic curves method
10.3 Results of tests on Elliptic curves algorithm
CHAPTER 1
Introduction
This report, along with the software which I wrote, constitutes my final year project. The main objective of this report is to strike a balance between a theoretical explanation of certain factorisation algorithms and a description of my source code.
Background information
The problem of factorisation has been known for thousands of years. However, only recently did it become “popular”. This sudden interest in factorisation was due to the advances in cryptography, and mainly the RSA public key cryptosystem.
The problem of factorisation may be stated as follows: “Given a composite integer N, find a nontrivial factor f of N.”
There are a lot of factorisation algorithms out there. Some of them are heavily used, others just serve educational purposes. The factorisation algorithms may be distinguished in two different ways:
• Deterministic or nondeterministic
• Run time depends on the size of N or of f.
Deterministic algorithms are algorithms which are guaranteed to find a solution if we let them run long enough. On the contrary, nondeterministic algorithms may never terminate. The most usual distinction, however, deals with the running time of the algorithm. The running time of recent algorithms depends on the size of the input number N, whereas older algorithms depend on the size of the factor f which they find.
About my project
In doing my project, I tried to cover a broad range of algorithms and methods. The running time of all the algorithms I have implemented depends on the size of the factor f which they find. Furthermore, only the first two algorithms which I describe are deterministic.
About this report
This report is divided into 3 parts. The first part deals with my preparation and scheduling for doing the project. Matters like requirements, resources, etc. are all covered in the first part.
The second part of this report presents an account of all the algorithms I implemented. For each algorithm, I have tried to describe the theoretical background so that the reader can follow how the algorithm works. Then, I describe my implementation of the algorithm, along with pseudocode for illustration purposes. Finally, I present my test results, in the form of a graph. (In Appendices A and B I have included a set of tests on all of the algorithms.)
The third part consists of the Appendices, in which I have included sample timings of the algorithms, as well as output of my program.
Part I
Documentation
CHAPTER 2
Project Plan
2.1 Resources
I started planning for this project by writing down what resources I thought I was going to need in order to successfully complete the project.
In terms of hardware, all I needed was a computer, which I already owned. Furthermore, I could use the computing facilities of the University as well. In terms of software, I decided that I wanted to write the program using C. There are lots of different environments for creating C programs. However, I used LCC-WIN32 version 3.3 for Windows, which includes an ANSI C compiler.
My main concern was finding a suitable arbitrary-precision library, which I could use with my program. In the end, I decided to use Mike’s Arbitrary Precision Math Library (MAPM) version 3.70, written by Michael C. Ring.
Furthermore, I thought that I would also need some kind of books or papers, which would help me. In addition to the resources listed in the bibliography section, I also made use of the following programming books:
• Walter A. Burkhard, “C for programmers”, 1988 Wadsworth, Inc.
• Morton H. Lewin, “Elements of C”, Piscataway, New Jersey.
• M.I. Bolsky, “The C Programmer’s Handbook”, AT&T Bell Laboratories, Prentice Hall, Inc.
• Leslie Lamport, “LaTeX user’s guide and reference manual”, 1994 Addison-Wesley Publishing Company.
2.2 Scheduling
The next part in planning my project was to devise a schedule, which would roughly be my guide in what I do. Table 2.1 shows my schedule, or to be precise, the final version of my schedule.
Schedule
Weeks             Tasks
1 (Semester 1)
2 (Semester 1)
3 (Semester 1)    Signed up for LEGO maze-solving robot
4 (Semester 1)    Preliminary research on Robot movement, etc.
5 (Semester 1)
6 (Semester 1)    Wrote first version of software for robot.
7 (Semester 1)    NEW PROJECT: Integer factorisation
8 (Semester 1)    Looking for a maths library
9 (Semester 1)    Found the MAPM library, performance tests
10 (Semester 1)   Implement trial divisions algorithm
11 (Semester 1)   Wrote low-level functions for MAPM
12 (Semester 1)   Implemented Fermat’s algorithm
(Christmas)       Revise for exams
(Christmas)       Revise for exams
(Christmas)       Revise for exams
13 (Exams)        Exams
14 (Exams)        Exams
15 (Exams)        Exams
1 (Semester 2)    Research into Pollard’s algorithms
2 (Semester 2)    Implement MODEXPO, GCD, PRIME functions
3 (Semester 2)    Pollard’s ρ algorithm
4 (Semester 2)    Tests on all algorithms so far implemented
5 (Semester 2)    Pollard p − 1. Read about Elliptic curves
6 (Semester 2)    Elliptic curves algorithm and testing
7 (Semester 2)    Function interface modifications, more tests
8 (Semester 2)    Developed COMBINED function. Started report
(Easter)          Test result analysis, graph generation
(Easter)          Report writing
(Easter)          Report writing
9 (Semester 2)    Report writing
10 (Semester 2)   Report revision, final version preparation
11 (Semester 2)   DEADLINE
Table 2.1: My schedule
I tried to follow my schedule as closely as possible. Sometimes I made changes to it, in order to accommodate any new tasks I thought were required. The final version of my schedule closely resembles my initial schedule, although I have made a number of changes.
2.3 Coding standards
It is always a good idea to specify some coding standards before starting a project, even if only one person is going to do any coding.
First of all, I should say that all the source files were compiled using the -ansi flag. I received no warning messages when compiling the final version of my program. I also followed these conventions:
• Function names beginning with m_ belong to the MAPM library. Specifically, the functions that begin with m_apm_ are functions which are defined in the library itself. Any other functions beginning with m_ are macros for functions in the MAPM library, which I defined in order to shorten the code.
• Function names beginning with M_ are low-level functions which interface with the MAPM library. I wrote these functions in order to improve the performance of the program, and shorten the code as well.
• The prototypes for functions in file xxx.c are placed in the file xxx.h.
It was obvious that the software I was creating was quite modular, and could be built in “big chunks” at a time. Therefore, I decided to use a common algorithm testing interface. This meant that I would place each algorithm in a separate file, and use a common file to call the factorisation routines. This would also make it possible to call all of my algorithms from another function in an effort to factorise a really hard number.
By doing the above, I was planning to minimise the effort of adding a new algorithm to my program, and make the testing of different algorithms quicker and easier.
CHAPTER 3
Requirements
This chapter describes all the requirements and specifications that I used for implementing this project. Of course, these requirements were by no means static. In fact, they changed quite often as I moved further into the project. A change in the requirements would often reflect a new idea that I came up with, or an idea that I wanted to drop. Therefore, these are the requirements as they stood at the end of the project.
3.1 User Definition
The first thing that I had to specify was my target audience. It helps a lot to know who you want to look at your work. I guess it would be too naive to assume that my audience consisted of the two examiners that would assess my project. On the other hand, I wouldn’t like to embark on a commercial software project, which would target a large piece of the market.
With the above in mind, I chose my audience to be the academic community. Such an audience is not really keen on software with bells and whistles, but is more interested in the theoretical background. In fact, I believe that my project could be used for educational purposes, because it demonstrates a simple implementation of some fundamental mathematical concepts.
Of course, when I refer to my project, I refer to both the software as well as the final report. Therefore, my choice of the academic community as an audience should have an effect on both the software and the final report.
3.2 Functional Requirements
I believe that it was clear that my software should accept as input an integer N, and produce as output a factorisation $p_1 p_2 \cdots p_n$ of N. But there is more to it than just that.
A very important requirement was that the software should be able to perform arbitrary-precision arithmetic. In other words, it should be able to deal with really long numbers, and perform calculations on them.
Also, the software should output the computational time that was required to complete the factorisation, and also verify that the results it gives are correct. This should also be done while running long tests, in which case the results should be stored in a disk file.
3.3 Non-functional Requirements
The most important element of the non-functional requirements deals with the algorithms that the program will implement. I decided to implement the following algorithms:
• Trial division algorithm
• Fermat’s algorithm
• Pollard’s ρ method
• Pollard’s p − 1 method
• Elliptic curves method
The reason I chose not to implement one of the “big” algorithms, namely MPQS and NFS, is that I did not have enough time. By applying a variety of smaller algorithms, I got the flavour of different methods and theories, on which the very advanced algorithms are based.
In terms of the user interface, I believe that a GUI was not something really required. Therefore, I chose to implement a command line interface, with simple input and output.
The source code of the program was divided into the following files:
MAIN.C        This is the main file of the program. Nothing special here.
MAIN.H        Main header file. Contains definition of output destination for parameterised compilation.
AL_TRIAL.C    This file contains the source code for the trial divisions algorithm.
AL_FERMT.C    This file contains the source code for Fermat’s method.
AL_PRHO1.C    This file contains the source code for Pollard’s p − 1 method.
AL_PRHO.C     This file contains the source code for Pollard’s ρ method.
AL_ELLCRVS.C  This file contains the source code for the Elliptic curve method.
TESTS.C       Here are defined some tests for measuring the speed of each algorithm.
TESTS.H       This file contains parameters for the testing routine.
MYLIB.C       In this file I have included some of my “tool” functions, as well as some low-level functions for the arbitrary-precision arithmetic library I used.
MYLIB.H       This file includes function prototypes as well as macro definitions.
COMBINED.C    This file contains a function which utilises all the factoring algorithms. It tries to factor a given number by applying the different algorithms until the number has been completely factorised, or until it gives up.
3.4 Software and Hardware Requirements
I developed the software on an MS-Windows 98 machine, with an Intel Celeron 433MHz processor. However, the software is capable of running on any machine which fulfills the minimum MS-Windows 95 requirements. Also, the source code may be compiled under a different operating system (Unix, Linux, etc.) in order to produce compatible versions of the program.
CHAPTER 4
Testing
The tests I performed for my project come in two flavours. First, I had to test my algorithms to see if they ran as expected, i.e. try to find “bugs” in the program. Second, I ran lots of performance tests, i.e. performed lots of factorisations in order to get a feeling for the performance of each algorithm.
4.1 Correctness tests
Most of my testing for correctness was performed “in place” with the program. Essentially, I had to make sure that my algorithms did indeed perform a fac-torisation. This is quite easy to check within the main flow of the program, so I felt that there was no need for separate testing modules.
By just adding a couple of lines of code, I was able to test the correctness of my results every time I performed a factorisation. This way, I was constantly checking for errors, even when I was running the performance tests.
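The in-place check described above can be sketched as follows. This is a minimal Python illustration rather than the project's C code: the names `check_factorisation` and `timed_factorisation` are hypothetical, and a naive trial-division routine stands in for whichever algorithm is under test.

```python
import time
from math import prod


def naive_factorise(n):
    """Stand-in factoriser (plain trial division), used only for this demo."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def check_factorisation(n, factors):
    """True if every factor is non-trivial and the factors multiply back to n."""
    return all(1 < f <= n for f in factors) and prod(factors) == n


def timed_factorisation(factorise, n):
    """Run factorise(n) and return (factors, seconds elapsed, check passed)."""
    start = time.perf_counter()
    factors = factorise(n)
    elapsed = time.perf_counter() - start
    return factors, elapsed, check_factorisation(n, factors)
```

Because the check costs only one multiplication per factor, it can stay enabled during long performance runs without distorting the timings noticeably.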
I should note at this point that all my checking was performed (inevitably) using the facilities of the MAPM library. I guess that if the MAPM library contained any sort of errors, my checks, and in fact my whole program, would be erroneous.
4.2 Performance tests
I had to perform two separate kinds of performance tests. First of all, I ran tests on the library MAPM, to get a feel for its capabilities. These tests were supposed to give me an approximation of how fast this library was, and how to judge my algorithms according to the library’s capabilities.
The second, and most important, kind of performance test was to benchmark the algorithms I implemented. I usually performed these tests after I felt that an algorithm was fully implemented. The results of these tests are included in the last section of each algorithm’s chapter. I have tried to evaluate these tests to the best of my abilities, and to draw some conclusions.
In Appendix A I have included a mini “benchmarking” scheme, where all the algorithms were given the same numbers, and their performance was timed and entered into a table. Although I did not run too many of these tests, I felt that the results were in line with what I expected.
Finally, in Appendix B I have included some sample printouts of the performance tests for each algorithm, as well as sample output of my final program, which utilises all the algorithms in order to factorise an input number.
Part II
Implementation
CHAPTER 5
Tools for factorisation
Before proceeding with the actual algorithms and their description, it would be useful to describe some “tool” algorithms which are used throughout the factorisation algorithms.
5.1 Greatest common divisor
This algorithm is by far the most used algorithm in my program. It is used by all the factorisation methods I have implemented. A very efficient routine for finding the greatest common divisor of two numbers a and b would greatly enhance the performance of the factorisation algorithms.
In figure 5.1 I have included pseudocode for finding gcd(a, b) using the Euclidean method.
WHILE b ≠ 0 DO
    temp := b
    b := a MOD b
    a := temp
RETURN a
Figure 5.1: Pseudocode for computing gcd(a, b) using the Euclidean algorithm
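For readers who prefer runnable code, the pseudocode of figure 5.1 translates almost line for line into Python. This is only an illustrative sketch (the project itself was written in C on top of MAPM); Python's built-in integers are already arbitrary precision, so no library is needed.

```python
def gcd(a, b):
    """Greatest common divisor of a and b by the Euclidean algorithm."""
    while b != 0:
        # Same step as figure 5.1: (a, b) becomes (b, a MOD b).
        a, b = b, a % b
    return a
```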
5.2 Fast exponentiation modulo
The idea behind fast exponentiation is that if the exponent is a power of 2 then we can exponentiate by successively squaring:
$$x^{8} = ((x^{2})^{2})^{2}$$
$$x^{256} = (((((((x^{2})^{2})^{2})^{2})^{2})^{2})^{2})^{2}.$$
If the exponent is not a power of 2, then we use its binary representation, which is just a sum of powers of 2:
$$x^{291} = x^{256} \times x^{32} \times x^{2} \times x^{1}.$$
n := 1
WHILE b ≠ 0
    IF b is odd THEN
        n := n × a MOD m
    b := ⌊b/2⌋
    a := a × a MOD m
RETURN n
Figure 5.2: Pseudocode for fast computation of a^b mod m
The pseudocode shown in figure 5.2 will quickly compute $a^b \bmod m$. The way it works is that it finds the binary representation of b, while at the same time computing successive squares of a. The variable n records the product of the powers of a, and also contains the final result at the end of the computation.
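A direct Python rendering of figure 5.2 is given below as a sketch; the function name `mod_exp` is chosen here for illustration. Python's built-in `pow(a, b, m)` computes the same value and is used only as a cross-check.

```python
def mod_exp(a, b, m):
    """Compute a**b mod m by square-and-multiply (b >= 0, m >= 2)."""
    n = 1
    a %= m
    while b != 0:
        if b & 1:             # current binary digit of b is 1
            n = (n * a) % m   # multiply this power of a into the result
        b >>= 1               # move on to the next binary digit
        a = (a * a) % m       # a, a^2, a^4, a^8, ...
    return n
```

For the exponent 291 the loop runs nine times, since 291 has nine binary digits, instead of performing 290 multiplications.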
5.3 Primality testing
According to Fermat’s little theorem, if n is an odd prime then
$$2^{n-1} \equiv 1 \pmod{n}.$$
If n is odd and composite and still satisfies this congruence, we say that n is a pseudoprime. Therefore, for any number n, we can compute the value $2^{n-1} \bmod n$ using the algorithm of figure 5.2, and then simply check whether the result is 1. If it is not, n is certainly composite.
Despite the fact that this test is not a 100% guarantee of primality, in practice it is very useful. This test can be made stronger by computing $a^{n-1} \bmod n$ for the bases a = 2, 3, 5, 7, and then checking that all of them yield the result 1.
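The test can be sketched in a few lines of Python. The function name `is_probable_prime` is hypothetical, and the divisibility check against the bases themselves is an addition made here so that small inputs such as 2 or 9 are handled sensibly; it is not described in the report.

```python
def is_probable_prime(n, bases=(2, 3, 5, 7)):
    """Fermat test: True if n passes a^(n-1) = 1 (mod n) for every base."""
    if n < 2:
        return False
    for a in bases:
        if n == a:
            return True       # n is one of the small prime bases
        if n % a == 0:
            return False      # n has a small prime factor
    return all(pow(a, n - 1, n) == 1 for a in bases)
```

The number 341 = 11 × 31 shows why several bases help: it satisfies the congruence for base 2 but fails it for base 3. The number 29341 = 13 × 37 × 61 shows the limit of the approach: it is composite, yet it passes the test for all four bases.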
CHAPTER 6
Trial divisions algorithm
The most straightforward algorithm for factorising an integer is trial division. This algorithm is a good place to start, and it is quite easy to understand.
6.1 Description of trial divisions algorithm
This algorithm essentially tries to factorise an integer N using “brute force”. Starting at p = 2, this algorithm tries to divide N by every number until it succeeds. When this happens, it sets N ← N/p, and resumes its operation.
The way in which we choose our p can speed up, or slow down, our algorithm. For instance, we could pick our p’s sequentially, by adding 1 at every iteration. Even better, we could divide N by 2 and 3, and then keep adding 2 to p in order to generate a sequence of odd numbers. The fastest way, but with more memory requirements, is to generate a list of all prime numbers below a specified limit, and then assign those values to p.
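The last two strategies can be sketched as follows. Both function names are hypothetical, and the sieve of Eratosthenes is used here as one common way of building the list of primes; the report does not say how such a list would be generated.

```python
from math import isqrt


def odd_candidates(limit):
    """Yield 2 and then the odd numbers 3, 5, 7, ... below limit."""
    if limit > 2:
        yield 2
    yield from range(3, limit, 2)


def primes_below(limit):
    """Return all primes p < limit using the sieve of Eratosthenes."""
    if limit < 3:
        return []
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for p in range(2, isqrt(limit - 1) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... ; smaller multiples are already gone.
            count = len(range(p * p, limit, p))
            is_prime[p * p::p] = bytearray(count)
    return [p for p in range(limit) if is_prime[p]]
```

The sieve needs one byte per number below the limit, which is the extra memory cost mentioned above.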
6.2 Implementation of trial divisions algorithm
In figure 6.1 you can see the pseudocode of my implementation. I have not made any attempts to optimise this algorithm, and so I have used the “naive” way of choosing my p’s, i.e. by adding 1 to the trial divisor at every iteration.
As far as the source code is concerned, this function accepts the following parameters:
• n: The number to be factorised. Note that no changes are made to the original value of this variable.
• max: This variable sets the limit of the maximum test_factor to be used.
• factors: An array of MAPM variables, in which the factors of n will be written.
INPUT N
test_factor := 2
WHILE (N > 1 AND test_factor < max)
    IF (N MOD test_factor) == 0 THEN
        N := N / test_factor
        PRINT test_factor
    ELSE
        test_factor := test_factor + 1
Figure 6.1: Pseudocode for trial divisions algorithm
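The pseudocode of figure 6.1 can be turned into the following Python sketch. Instead of printing, this version collects the factors in a list and also returns the unfactored remainder; that return convention is an assumption made here for illustration, not the interface of the C function.

```python
def trial_division(n, max_factor):
    """Naive trial division: try 2, 3, 4, ... below max_factor.

    Returns (factors found, remaining cofactor); the cofactor is 1 when
    n has been completely factorised.
    """
    factors = []
    test_factor = 2
    while n > 1 and test_factor < max_factor:
        if n % test_factor == 0:
            n //= test_factor            # divide the factor out and retry it
            factors.append(test_factor)
        else:
            test_factor += 1
    return factors, n
```

Composite trial divisors such as 4 or 6 never divide n at the point they are tried, because their prime factors have already been removed, so every factor reported is prime.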
Figure 6.2: Results of tests on Trial divisions algorithm
6.3 Running time
According to [2], the expected running time of this algorithm is $O(f \cdot (\log N)^2)$, where f is the size of the factor found. The efficiency of this algorithm depends on the strategy for choosing the trial divisors p, as explained earlier. In figure 6.2 you can see the results of the tests of my implementation of this algorithm. The graph shows the factor size versus the amount of time it took, from a sample of 1427 factorisations. As expected, the amount of time the algorithm takes increases exponentially with the number of digits of the factor found. Practically, after 6 or 7 digits, this algorithm becomes too expensive.
6.4 Remarks
One of the features of this algorithm is that if we let it run long enough on a prime $N_p$, it will prove the primality of $N_p$. In most cases this is not wanted, and it is regarded as a waste of effort. However, this algorithm is very fast in finding prime factors of size less than 5-6 decimal digits. Furthermore, this algorithm may be used in breaking up composite factors which are found using the algorithms described in the following chapters.
CHAPTER 7
Fermat’s algorithm
The first of the modern algorithms that I will describe is due to Fermat. It is not usually implemented these days unless it is known that the number to be factored has two factors which are relatively close to the square root of the number. However, this algorithm contains the key idea behind two of the most powerful algorithms for factorisation, the Quadratic Sieve and the Continued Fractions algorithm.
7.1 Quick description of Fermat’s algorithm
Fermat’s idea is the following. Let the number to be factored be N. Suppose that N can be written as the difference of two squares, such as
$$N = x^2 - y^2.$$
Instantly, we could write N as (x − y)(x + y), and thus we have successfully broken N into two factors. The two factors may not be prime. In that case, we could recursively apply this process until we deduce a prime factorisation for N.
7.2 Detailed description of Fermat’s algorithm
The first step in describing this algorithm is to prove that every odd number N can be written as a difference of squares.
Let us suppose that N = a × b. Since we assumed N to be odd, both a and b must be odd, so a + b and a − b are even. Now, let us define x and y as follows:
$$x = (a+b)/2, \qquad y = (a-b)/2.$$
Then, if we try to work out $x^2 - y^2$ for the above values, we get
$$x^2 - y^2 = \frac{(a^2 + 2ab + b^2) - (a^2 - 2ab + b^2)}{4} = ab = N.$$
Fermat’s algorithm works in the opposite direction from trial division. When we apply trial division, we start by looking at small factors, and we work our way up to $\sqrt{N}$. In Fermat’s algorithm, we start by looking for factors near $\sqrt{N}$, and work our way down.
7.3 Implementation of Fermat’s algorithm
Now I will describe an implementation of Fermat’s algorithm. As I mentioned earlier, we search for integers x and y such that $x^2 - y^2 = N$. We can start with $x = \lceil\sqrt{N}\rceil$ and y = 0, and try increasing y until $x^2 - y^2$ is equal to or less than N. If it is equal to N then we are done! If not, we increase x by one, and we iterate.
In order to further optimise the algorithm, let us set $r = x^2 - y^2 - N$. Therefore, we have success when r = 0. All that we really want to do is keep track of r. The value of r can change only when we increase x by one or y by one. When we replace $x^2$ with $(x+1)^2$, variable r increases by 2x + 1. We could express this increase in r by setting u = 2x + 1. Similarly, when $y^2$ is replaced by $(y+1)^2$ the variable r decreases by 2y + 1. This decrease in r can be expressed as v = 2y + 1. (Note that when x and y increase by one, u and v increase by two.)
Having defined r, u, and v, we can proceed with our implementation. It turns out that we do not actually need the values x and y. Since we start by setting $x = \lceil\sqrt{N}\rceil$ and y = 0, it follows that $u = 2\lceil\sqrt{N}\rceil + 1$ and v = 1. Also, $r = \lceil\sqrt{N}\rceil^2 - N$.
All we now have to do is define an increase in x and an increase in y. According to the definition of u and v, an increase of x by 1 would increase r by u, and u by 2. Similarly, an increase of y by 1 would decrease r by v, and increase v by 2.
The algorithm is completely defined. All we now have to do is keep increasing x and y (in practice u, v, and r), until r = 0. When r is zero, we can compute (x + y) and (x − y) as follows:
$$x + y = (u + v - 2)/2, \qquad x - y = (u - v)/2.$$
At this point, I believe that pseudocode is the most appropriate way to convey my implementation fully. Figure 7.1 contains the pseudocode which describes my implementation.
7.4 Running time
How much work is actually needed to find the factors of N? Let us suppose that N = a × b, with a < b. The factorisation will be achieved when x = (a + b)/2. Since the starting value of x is $\sqrt{N}$, and b = N/a, the factorisation will take approximately
$$\frac{1}{2}\left(a + \frac{N}{a}\right) - \sqrt{N} = \frac{(\sqrt{N} - a)^2}{2a}$$
cycles.
If the two factors of N are really close, i.e. if $a = k\sqrt{N}$, with 0 < k < 1, then the number of cycles required in order to obtain the factorisation is
$$\frac{(1-k)^2}{2k}\sqrt{N}.$$
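As a quick numerical check of the cycle estimate, the sketch below compares it with the exact number of times x has to be increased, using the example N = 1783647329 = 84449 × 21121 that appears in the remarks of this chapter. Both function names are hypothetical.

```python
from math import isqrt, sqrt


def fermat_cycle_estimate(n, a):
    """Approximate cycle count (sqrt(N) - a)^2 / (2a) for the factor a."""
    return (sqrt(n) - a) ** 2 / (2 * a)


def fermat_x_steps(n, a):
    """Exact number of increases of x: (a + b)/2 - ceil(sqrt(N)), b = N/a."""
    b = n // a
    root = isqrt(n)
    if root * root < n:
        root += 1                      # round the square root up
    return (a + b) // 2 - root


print(fermat_cycle_estimate(1783647329, 21121))
print(fermat_x_steps(1783647329, 21121))
```

The estimate comes out at roughly 10551.75, against an exact count of 10551, which matches the figure quoted in the remarks.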
INPUT N
sqrt := ⌈√N⌉
u := 2 * sqrt + 1
v := 1
r := sqrt * sqrt - N
WHILE r <> 0
    IF r > 0 THEN            /* Keep increasing y */
        WHILE r > 0
            r := r - v
            v := v + 2
    IF r < 0 THEN            /* Increase x */
        r := r + u
        u := u + 2
PRINT (u + v - 2) / 2
PRINT (u - v) / 2
Figure 7.1: Pseudocode for Fermat’s algorithm
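The pseudocode of figure 7.1 can be sketched in Python as below. The function name `fermat_factor` and the choice to return the pair (x − y, x + y) rather than print it are assumptions made here for illustration. The input must be odd: an even number that is not a multiple of 4 cannot be written as a difference of two squares, so the loop would never terminate.

```python
from math import isqrt


def fermat_factor(n):
    """Factor an odd n >= 3 as (x - y, x + y) using only + and - in the loop."""
    root = isqrt(n)
    if root * root < n:
        root += 1            # root = ceil(sqrt(n))
    u = 2 * root + 1         # u = 2x + 1
    v = 1                    # v = 2y + 1
    r = root * root - n      # r = x^2 - y^2 - n
    while r != 0:
        while r > 0:         # keep increasing y
            r -= v
            v += 2
        if r < 0:            # increase x
            r += u
            u += 2
    return (u - v) // 2, (u + v - 2) // 2
```

For a prime input the function still terminates, returning the trivial pair (1, n).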
This complexity is of the order $O(cN^{1/2})$. However, the value of k can be very small, which makes the constant $c = (1-k)^2/(2k)$ large and this algorithm impractical. For instance, let us consider an “ordinary” case where $a \approx N^{1/3}$ and $b \approx N^{2/3}$. In such a case, the number of cycles necessary will be
$$\frac{(\sqrt{N} - \sqrt[3]{N})^2}{2\sqrt[3]{N}} = \frac{(\sqrt[3]{N})^2(\sqrt[6]{N} - 1)^2}{2\sqrt[3]{N}} \approx \frac{1}{2}N^{2/3},$$
which is considerably higher than $O(N^{1/2})$. Therefore this algorithm is only practical when the factors a and b are almost equal to each other.
In figure 7.2 you can see the test results of my implementation of Fermat’s algorithm, from a sample of 2075 factorisations. Again, the graph shows the relation of the size of the factor found versus the amount of time it took. As we expected, this algorithm becomes too slow for factors with 7 or more digits. The graph follows the same trend as the trial divisions algorithm. In practice, however, we will prefer the trial divisions algorithm.
7.5 Remarks
This algorithm has a very nice feature: it does not involve multiplication. We have defined the variables r, u, v in such a way that we only need to perform addition and subtraction. This is why this algorithm is sometimes called factorising by addition and subtraction. However, the number of additions and subtractions that we have to perform is quite large. For example, in order to factorise
1783647329 = 84449 × 21121
we need to increase x 10551 times, and y 31664 times.
Figure 7.2: Results of tests on Fermat’s algorithm
Additionally, this algorithm suffers from the same problem as trial divisions: in the worst case it will prove primality. If this algorithm is given a prime number p, then the results will eventually be 1 and p. In fact, this is even worse than proving primality with trial divisions. The total number of cycles required for proving primality is about $p - \sqrt{p}$, which is much worse than trial divisions.
CHAPTER 8
The Pollard ρ method
This method is also called Pollard’s second factoring method or the Monte Carlo Method because of its pseudo-random nature. It is based on a “statistical” idea [7] and has been refined by Richard Brent [1]. The ideas involved in finding the factors of a number N are described below.
8.1 Description of the algorithm
In short, the algorithm consists of the following steps:
1. Construct a sequence of integers $\{x_i\}$ which is periodically recurrent (mod p), where p is a prime factor of N.
2. Search for the period of repetition, i.e. find i and j such that $x_i \equiv x_j \pmod{p}$.
3. Calculate the factor p of N.
8.1.1 Constructing the sequence
The first step in finding a factor is to construct a sequence of periodically recurrent values. Let us consider a recursively defined sequence of numbers, according to the formula
$$x_i \equiv f(x_{i-1}, x_{i-2}, \ldots, x_{i-k}) \pmod{m}$$
where m is an arbitrary integer, and the initial values $x_1, \ldots, x_k$ are given. This means that the values $x_{k+1}, x_{k+2}, \ldots$ can be computed by using the k previously computed values. However, all the values are computed mod m, and therefore there are only m possible values that each $x_i$ can take. This means that there are at most $m^s$ distinct sequences of s values (take s = k, so that each value is determined by the s values before it). Therefore, after at most $m^s + 1$ values, we will have two identical sequences of s consecutive numbers. Let these sequences of s values be $x_i, x_{i+1}, \ldots, x_{i+s-1}$ and $x_j, x_{j+1}, \ldots, x_{j+s-1}$. Since these sequences are identical, it follows that their next elements, namely $x_{i+s}$ and $x_{j+s}$ respectively, will be the same. In fact, every element $x_{i+s+n}$ and $x_{j+s+n}$ will be identical thereafter.
This means that the sequence $\{x_i\}$ is periodically repeated, except maybe for a part at the beginning which is called the aperiodic part. This part can be thought of as the “tail” of the Greek letter ρ. Once we get off the tail, we keep cycling around the same sequence of values. That’s why this algorithm is known as the Pollard ρ algorithm.
Back to our problem, instead of random integers{xi}, it would be sufficient
to recursively compute a sequence of pseudo-random integers. The simplest way to do this would be to define a linear formula such as
xi+1≡axi (modN)
for a fixed a and x0. Unfortunately, it turns out that this does not produce
sufficiently random values to give a short period of recurring values. This means that we would have to compute a lot of values before we can identify the period of recurrence.
The next simplest choice is to use a quadratic formula such as
$$x_{i+1} \equiv x_i^2 + a \pmod{N}$$
for a fixed $a$ and $x_0$. It has been empirically observed that the above expression does produce sufficiently random values¹. Pollard found that in such a sequence $\{x_i\}$ of integers mod $N$ an element usually recurs after only about $C\sqrt{N}$ steps.
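The eventual periodicity described in this section can be observed directly on small moduli. The following Python sketch is illustrative only and is not part of the implementation discussed in this report; the function name `tail_and_period` and the defaults `a = 1`, `x0 = 2` are assumptions made for the example. It iterates the quadratic formula and remembers where each value first appeared, so the first repeated value reveals the length of the aperiodic part and of the period.

```python
def tail_and_period(m, a=1, x0=2):
    """Iterate x -> (x*x + a) mod m from x0; return (tail_length, period_length)."""
    first_seen = {}               # value -> index of its first appearance
    x, i = x0 % m, 0
    while x not in first_seen:
        first_seen[x] = i
        x = (x * x + a) % m
        i += 1
    tail = first_seen[x]          # index at which the cycle is entered
    return tail, i - tail


if __name__ == "__main__":
    for m in (7, 13, 31):
        print(m, tail_and_period(m))
```

Storing every value is only practical for small moduli, which is one reason the cycle-finding methods of the next section are used instead.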
8.1.2 Finding the period
The second step of the algorithm is to search for the period within the sequence $\{x_i\}$. Determining it in the most general case would require finding where a whole run of consecutive elements is repeated. This is quite a tedious task and, if the period is long, is ruled out by the amount of labour involved. In the simplest case however, where $x_i$ is defined in terms of $x_{i-1}$ only, the sequence will start to repeat as soon as any single $x_k$ is the same as any of the previous ones. Therefore, in order to find the period, we only need to compare each new $x_j$ with the previous values.
The original version of Pollard’s method used Floyd’s cycle-finding algorithm for finding the period.² Suppose the sequence $\{x_i\} \pmod{m}$ has an aperiodic part of length $a$ and a period of length $l$. The period will then ultimately be revealed by the test: is $x_{2i} \equiv x_i \pmod{m}$? This congruence holds as soon as $i$ is a multiple of $l$ that lies beyond the aperiodic part.
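As an illustration of the test $x_{2i} \equiv x_i$, the sketch below advances one copy of the sequence a single step and a second copy two steps per iteration. It is a generic textbook form of Floyd’s method written in Python, not the code used for this report; the name `floyd_first_match` and the convention that the sequence starts at $x_0$ are assumptions of the example.

```python
def floyd_first_match(f, x0):
    """Smallest i >= 1 with x_{2i} == x_i, where x_0 = x0 and x_{i+1} = f(x_i)."""
    slow, fast, i = f(x0), f(f(x0)), 1     # slow = x_1, fast = x_2
    while slow != fast:
        slow = f(slow)                     # x_i    -> x_{i+1}
        fast = f(f(fast))                  # x_{2i} -> x_{2i+2}
        i += 1
    return i


if __name__ == "__main__":
    f = lambda x: (x * x + 1) % 13         # sequence 2, 5, 0, 1, 2, 5, ...
    print(floyd_first_match(f, 2))
```

Only two values are kept in memory at any time, at the price of evaluating the function three times per step.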
The ρ method of Pollard has been made about 25% faster by a modification to the cycle-finding algorithm due to Brent [1]. As we saw above, Pollard searched for the period of the sequence $x_i \pmod{m}$ by considering $x_{2i} - x_i \pmod{m}$. Instead, Brent halts $x_i$ when $i = 2^n - 1$ and subsequently considers
$$x_{2^n-1} - x_j, \qquad 2^{n+1} - 2^{n-1} \le j \le 2^{n+1} - 1.$$
In this way the period is discovered after fewer arithmetic operations than demanded by the original algorithm of Pollard. The saving in Brent’s modification is due to the fact that the lower $x_i$’s are not computed twice as in Floyd’s algorithm.
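A common textbook form of Brent’s cycle detection is sketched below in Python. It is an assumption-laden illustration rather than the variant quoted above: the saved value sits at the indices $2^n - 1$ (counting from $x_0$), but for simplicity every later value is compared against it instead of only those in the upper part of each range. The name `brent_period` is hypothetical.

```python
def brent_period(f, x0):
    """Return the period length of x_0 = x0, x_{i+1} = f(x_i) (Brent's method)."""
    power = lam = 1
    saved = x0                    # the halted value, at index 2^n - 1
    moving = f(x0)
    while saved != moving:
        if power == lam:          # range exhausted: halt at the current value
            saved = moving
            power *= 2
            lam = 0
        moving = f(moving)        # each sequence element is computed only once
        lam += 1
    return lam


if __name__ == "__main__":
    print(brent_period(lambda x: (x * x + 1) % 13, 2))
```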
¹ Note that this is not true if $a$ is either 0 or −2.
8.1.3 Calculating the factor
Finally, consider the third and last step of Pollard’s ρ method. If we have a sequence $\{x_i\}$ that is periodic $\pmod{N}$, how can we find $p$, the unknown factor of $N$?
In section 8.1.1 we saw that the formula $x_{i+1} \equiv x_i^2 + a \pmod{N}$ is sufficient to give us a desired sequence of pseudo-random numbers. Now, let us introduce the formula
$$y_i \equiv x_i \pmod{p}$$
where $p$ is the unknown factor of $N$. This formula gives rise to a few nice properties. The sequence $\{y_i\}$ is periodic, and eventually we will have $y_i = y_j$ for some $i \ne j$. But when this happens, $x_i \equiv x_j \pmod{p}$, which means that $p$ will divide $x_i - x_j$. Therefore, by taking the GCD of $x_i - x_j$ and $N$, we have a very good chance of finding a non-trivial divisor of $N$.
All this is nice, except for the fact that we do not know $p$, which means that we cannot compute $\{y_i\}$, and therefore we do not know when $y_i$ will equal $y_j$. This is where the algorithms for finding the period in a periodic sequence are used. We use Floyd’s or Brent’s algorithm to choose lots of pairs $x_i$ and $x_j$, and each time we compute the GCD of $x_i - x_j$ and $N$. Usually, the GCD will be one. But as soon as $x_i \equiv x_j \pmod{p}$, the difference $x_i - x_j$ will be divisible by $p$, which means that the GCD will be a non-trivial divisor of $N$ (unless $x_i \equiv x_j \pmod{N}$ as well, in which case the GCD is $N$ itself).
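The three steps can be put together in a few lines. The Python sketch below uses Floyd’s pairing and takes a GCD at every step; it is a minimal illustration under the assumptions $a = 1$ and $x_0 = 2$, with a hypothetical function name, and it simply gives up (returns `None`) when the GCD equals $N$.

```python
from math import gcd


def pollard_rho_floyd(n, a=1, x0=2, max_steps=100_000):
    """Try to find a non-trivial divisor of n; return None on failure."""
    step = lambda x: (x * x + a) % n
    slow = fast = x0
    for _ in range(max_steps):
        slow = step(slow)                  # x_i
        fast = step(step(fast))            # x_{2i}
        g = gcd(abs(slow - fast), n)
        if 1 < g < n:
            return g
        if g == n:                         # cycle closed modulo every factor at once
            return None                    # a caller would retry with another a
    return None


if __name__ == "__main__":
    print(pollard_rho_floyd(8051))         # 8051 = 83 * 97
```

For $N = 8051$ the pairs examined are $(5, 26)$, $(26, 7474)$ and $(677, 871)$; the last difference is $194 = 2 \cdot 97$, so the factor 97 appears at the third step.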
A further improvement that can be made to both versions of Pollard’s ρ algorithm is as follows. Instead of computing the GCD at every cycle of the algorithm, we can accumulate the product (mod $N$) of the differences of all the pairs we have considered. After, say, 20 cycles, we can compute the GCD of this product and $N$, without risk of missing any factors of $N$, since any prime that divides one of the differences also divides their product. This way, the cost of computing a GCD at each cycle is reduced to one subtraction and one modular multiplication.
8.2 Implementation of Pollard ρ
As with the previous algorithms, I implemented Pollard’s ρ algorithm in a single function. The parameters that the function expects are:
• n: The number to be factorised. Note that no changes are made to the original value of this variable.
• max: This variable sets the limit of the maximum test factor to be used.
• factors: An array of MAPM variables, in which the factors of n will be written.
In figure 8.1 you can see the pseudocode of my algorithm. Note that the constant a, which is used to generate the pseudorandom sequence, is hardcoded in the function. It is quite an easy task to change its value, in order to get a different sequence of numbers.
    INPUT N, c, max                /* c is the constant a of section 8.1.1 */
    x1 := 2
    x2 := x1^2 + c                 /* Our chosen function */
    range := 1
    product := 1
    terms := 0
    WHILE terms < max DO
        FOR j := 1 TO range DO
            x2 := (x2^2 + c) MOD N             /* Our chosen function */
            product := (product × (x1 - x2)) MOD N
            terms := terms + 1
            IF (terms MOD 20 == 0) THEN
                g := gcd(product, N)
                IF g > 1 THEN
                    PRINT g
                    N := N / g
                product := 1
        next_values(x1, x2, range)             /* Brent’s improvement */
Figure 8.1: Pseudocode for the Pollard ρ algorithm
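A runnable Python counterpart of figure 8.1 is sketched below. It is not the MAPM implementation discussed in this report. In particular, the body of `next_values` is not shown in the pseudocode, so the sketch assumes the usual Brent bookkeeping: halt at the current value, double the range, and advance the sequence by that many steps without comparing. The function name and the parameters `batch` and `max_terms` are hypothetical.

```python
from math import gcd


def pollard_rho_brent(n, a=1, x0=2, max_terms=100_000, batch=20):
    """Brent-style rho with the differences accumulated in a product mod n."""
    step = lambda x: (x * x + a) % n
    x1 = x0                                # halted value
    x2 = step(x1)                          # moving value
    rng, product, terms = 1, 1, 0
    while terms < max_terms:
        for _ in range(rng):
            x2 = step(x2)
            product = product * (x1 - x2) % n
            terms += 1
            if terms % batch == 0:         # one gcd per batch of differences
                g = gcd(product, n)
                if 1 < g < n:
                    return g
                if g == n:                 # every factor was swallowed at once;
                    return None            # a fuller version would backtrack
                product = 1
        # assumed behaviour of next_values (not given in the pseudocode):
        x1 = x2
        rng *= 2
        for _ in range(rng):
            x2 = step(x2)
    return None


if __name__ == "__main__":
    print(pollard_rho_brent(8051, batch=1))
```

With very small inputs such as 8051 a batch of 20 differences tends to collect all prime factors before the first GCD is taken, which is why the usage line sets `batch=1`; for numbers with larger factors the batching is what saves time.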
8.3 Running time
Under plausible assumptions, the expected running time of Pollard’s ρ algorithm is
$$O(f^{1/2} \cdot (\log N)^2),$$
where $f$ is the size of the factor found. Figure 8.2 shows the test results of my implementation, from a sample of 4997 factorisations. There are a number of conclusions and comments to be made about this funny-looking graph.
First of all, we have to remember that this algorithm is not deterministic, but probabilistic. Therefore, the results might appear to contradict themselves at some points. For instance, at first glance one might think that this algorithm takes more time to find small factors than larger ones. However, this is not entirely true.
You should keep in mind that this graph only contains timings of successful factorisations. So, although the times for 20-digit factors are quite small, the “success rate” of the algorithm is quite low for such factors.
The best way to explain this graph is to observe its patterns. There is an obvious pattern for each group of factors. The timings seem to build up slowly, and then explode very high. If this pattern is true for 20-digit numbers as well, then the graph only contains the first part of the pattern, where the timings are quite small. If we had enough space to fit the entire graph, then by the time the pattern for 20-digit factors completed itself, its height could be as much as the Eiffel tower’s!
Figure 8.2: Results of tests on the Pollard ρ algorithm
8.4 Remarks
The method that I have just described for finding prime factors of composite integers is probabilistic. This means that we have to be prepared to be unlucky on occasion, and not get any results. If we run the Pollard ρ algorithm and do not find any prime divisors, that might be because there are no prime divisors in the appropriate interval, or it might be because of bad luck.
What we need to do in such situations is to change our luck. For this algorithm, this would mean changing certain constants, such as those in the recursive function described in section 8.1.1. Then, of course, we have to know when it is time to give up, and perhaps try another algorithm. In practice, after running trial divisions up to $10^6$ or $10^7$, one would run the Pollard ρ algorithm for a while. Keep in mind, however, that if all the prime factors are larger than roughly $10^{12}$ then this algorithm will not usually work.
CHAPTER 9
The Pollard p − 1 method
The next algorithm that I will consider is known as the Pollard $p-1$ algorithm [6]. It formalises several rules, which have been known for some time. The principle here is to use information concerning the order of some element $a$ of the group $M_N$ of primitive residue classes mod $N$ to deduce properties of the factors of $N$.
9.1 Description of the algorithm
This algorithm is pretty much based on Fermat’s little theorem: if $p$ is prime, and $a \not\equiv 0 \pmod{p}$, then
$$a^{p-1} \equiv 1 \pmod{p}.$$
Now, let us suppose that the number to be factored is $N$, and that one of its prime factors is $p$. Also, assume that $p-1$ divides $Q$. Using Fermat’s theorem, and under the assumption that $(p-1) \mid Q$, we arrive at
$$a^Q \equiv 1 \pmod{p},$$
and therefore $p$ divides $a^Q - 1$. Now, we can apply the GCD to $N$ and $a^Q - 1$ to get $p$ or some other non-trivial divisor of $N$.
Our problem now is to find a $Q$ such that $(p-1) \mid Q$, keeping in mind that we do not know $p$. This can be done in two ways. The easiest way is to set $Q = \mathrm{max}!$ and to compute $a^Q \pmod{N}$. This value can be computed quickly, since
$$a^{\mathrm{max}!} = (\cdots(((a^1)^2)^3)^4\cdots)^{\mathrm{max}},$$
and because, as we saw in Section 5.2, exponentiation modulo $N$ is very fast. Note that $a$ can be any number, as long as it is relatively prime to $N$.
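The nested-power identity above translates into a short loop. In this Python sketch the built-in three-argument `pow` stands in for the modular exponentiation routine of Section 5.2; the function name `power_factorial` is an assumption of the example.

```python
def power_factorial(a, k, n):
    """Return a^(k!) mod n, computed as (...((a^1)^2)^3...)^k mod n."""
    m = a % n
    for i in range(1, k + 1):
        m = pow(m, i, n)          # raise the running value to the i-th power
    return m


if __name__ == "__main__":
    # 4! = 24 and 2^24 = 16777216
    print(power_factorial(2, 4, 1000))
```

The exponent $k!$ itself is never formed, so the intermediate values stay below $n$ throughout.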
Another, more efficient way to choose $Q$ is to set $Q = p_1 p_2 \cdots p_k$, where each $p_i$ is a prime number less than a specified limit. In such a case we should also include in $Q$ some additional powers of the small primes, so as not to miss any factors of $N$. This will cut the number of exponentiations required by about a factor of eight.
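One way to read this second choice of $Q$ is sketched below in Python: for every prime $q$ up to a bound, the largest power of $q$ not exceeding the bound is included in the exponent. This reading of “additional multiples of the small primes” is an assumption, as are the helper names `primes_up_to` and `p_minus_1_prime_powers`; the bound is expected to be at least 2.

```python
from math import gcd


def primes_up_to(bound):
    """Sieve of Eratosthenes; bound is assumed to be >= 2."""
    sieve = [True] * (bound + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(bound ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, bound + 1, p):
                sieve[multiple] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]


def p_minus_1_prime_powers(n, bound, a=2):
    """Return gcd(a^Q - 1, n), Q = product of the largest prime powers <= bound."""
    m = a % n
    for p in primes_up_to(bound):
        q = p
        while q * p <= bound:     # largest power of p that does not exceed bound
            q *= p
        m = pow(m, q, n)
    return gcd(m - 1, n)          # 1: nothing found, n: every factor picked up


if __name__ == "__main__":
    print(p_minus_1_prime_powers(299, 5))    # Q = 4 * 3 * 5 = 60
    print(p_minus_1_prime_powers(299, 11))   # Q = 8 * 9 * 5 * 7 * 11
```

For $299 = 13 \cdot 23$ the bound 5 gives $Q = 60$, which is a multiple of $13 - 1$ but not of the order 11 of 2 modulo 23, so only 13 is found. With the bound 11 both factors are picked up and the GCD is 299 itself.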
    INPUT N, c, max
    m := c
    FOR i := 1 TO max DO
        m := modexpo(m, i, N)
        IF (i MOD 10 == 0) THEN
            g := gcd(m-1, N)
            IF g > 1 THEN PRINT g
Figure 9.1: Pseudocode for the Pollard p − 1 algorithm
No matter how we choose $Q$, we have to keep in mind that essentially the size of $Q$ is what limits our search space. For instance, by choosing $Q = 10000!$ we are assuming that $p-1$ has only prime factors less than 10000.
9.2 A slight improvement
In practice, we do not know how close we have to get to max before we have picked up the first prime divisor of $N$. And we do not want to go so far that we pick them all up. For that reason, we periodically check the value of $\gcd(a^Q - 1, N)$. If it is still 1, we continue. If it is $N$, then we have picked up all the divisors of $N$. In such a case we need to either backtrack a bit, or try using a different $a$.
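The periodic check can be sketched as follows. This Python version is an illustration under stated assumptions (base 2, a check every 10 iterations, hypothetical function and parameter names); like the procedure described here it distinguishes the three outcomes of the GCD, but it does not backtrack and simply reports failure when the GCD equals $N$.

```python
from math import gcd


def pollard_p_minus_1(n, a=2, limit=10_000, check_every=10):
    """Return a non-trivial divisor of n, or None if none was isolated."""
    m = a % n
    for i in range(1, limit + 1):
        m = pow(m, i, n)                     # m = a^(i!) mod n
        if i % check_every == 0 or i == limit:
            g = gcd(m - 1, n)
            if 1 < g < n:
                return g
            if g == n:                       # all divisors picked up at once:
                return None                  # backtrack or change a
    return None


if __name__ == "__main__":
    print(pollard_p_minus_1(299))                  # 299 = 13 * 23
    print(pollard_p_minus_1(299, check_every=20))  # checks too late
```

For $299 = 13 \cdot 23$, $10!$ is a multiple of $12$ but not of $11$ (the order of 2 modulo 23), so the check at $i = 10$ isolates 13. Checking only at $i = 20$ is too late: $20!$ covers both factors and the GCD is 299.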
9.3 Implementation of Pollard p − 1
In this section I will describe how I implemented the algorithm, as well as discuss certain issues that came up while implementing this algorithm.
The function accepts the following parameters:
• num: the number to be factorised
• c, max: the base and the iteration limit, so that $a^Q = c^{\mathrm{max}!}$
• factors: an array where the factors of num are stored
The algorithm is essentially a loop which runs until we have reached the specified limit of iterations, which is max. In most literature, this limit is set to 10000, so I decided to follow this guideline. My implementation uses the simple way of choosing $Q$, i.e. setting $Q = 10000!$, and subsequently calculating $2^{10000!} \pmod{N}$. This is done using the procedure modexpo, which was described in section 5.2.
Every 10 cycles, the program calculates the gcd of the current $2^{k!} - 1 \pmod{N}$ and $N$, using the algorithm described in section 5.1. If the gcd is greater than one, then the gcd is written in the factors array. Subsequently, the program sets $N \leftarrow N/\gcd$. If the remaining $N$ is composite, then the procedure is applied recursively to the new $N$, otherwise the function terminates.
Figure 9.2: Results of tests on the Pollard p − 1 algorithm
The pseudocode of my implementation is shown in figure 9.1. I should note that my implementation makes no effort at backtracking or changing $a$ in case the gcd is equal to $N$. It is up to the caller of the function to choose an appropriate $a$ (the parameter c) and limit of iterations (max).
9.4 Running time
In the worst case, Pollard’s p−1 algorithm takes as long as the trial divisions algorithm. However, it usually does better, provided that we are lucky enough to find a factor.
In figure 9.2 I have plotted the results of 14217 factorisations using this algorithm. As previously, the graph contains timings derived only from the
successful tests, not the ones that failed. The patterns in the graph greatly resemble those in the graph of Pollard’s ρ algorithm. However, there is another point to be made about this algorithm.
Apparently, Pollard’s $p-1$ algorithm is much faster than Pollard’s ρ algorithm, but with less success. It turns out that the algorithm seldom gives back results, but when it does, it is very fast. This is why I had to perform so many tests on this algorithm: more than 70% of the tests failed.
9.5 Remarks
This algorithm has the same problems as the previous one. As described earlier, at some point we might find the GCD to be equal to $N$. In such cases we will want to change the base $a$ to a different integer. Also, the algorithm will not find $p$ within the chosen limit if $p-1$ has a large prime factor.
It has been statistically found that the largest prime factor of an arbitrary integer $N$ usually falls around $N^{0.63}$. Therefore, with a limit of 10000, Pollard $p-1$ will find prime factors that are less than about two million. We should keep in mind however that there is a fairly wide distribution of the largest prime factor of $N$, and therefore factors much larger than two million may be found.
According to [2], the largest factor found by this algorithm during the Cunningham project is a 32-digit factor
49858990580788843054012690078841 of $2^{977} - 1$.
I should also note that because of Pollard $p-1$, the RSA public key cryptosystem has restrictions on the primes $a$ and $b$ that are chosen. Essentially, if $a-1$ or $b-1$ has only small prime factors, then Pollard $p-1$ will break the encryption very quickly.
CHAPTER 10
Elliptic Curves Method
Factorisation based on elliptic curves is a relatively new method. As its name implies, this method is based on the theory of elliptic curves. First, I will briefly describe what elliptic curves are, and demonstrate the theory behind them. Then, I will go on with the description of the factorisation method using elliptic curves.
10.1 Introduction to elliptic curves
Elliptic curves are equations of the form
$$y^2 = x^3 + ax + b,$$
where $a$ and $b$ are constants such that $4a^3 + 27b^2 \ne 0$.
These curves have the curious property that if a line intersects such a curve at two points, then it will also have a third point of intersection with it. A tangent to the curve is considered to have two points of intersection at the point of tangency.
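If we know the two points $(x_1, y_1)$ and $(x_2, y_2)$ of intersection, we can compute the slope $\lambda$ of the line, as well as the third point of intersection $(x_3, y_3)$, in the following way:
$$\lambda = \begin{cases} \dfrac{3x_1^2 + a}{2y_1} & \text{if } x_1 = x_2, \\[1ex] \dfrac{y_1 - y_2}{x_1 - x_2} & \text{otherwise,} \end{cases}$$
$$x_3 = \lambda^2 - x_1 - x_2, \qquad y_3 = \lambda(x_3 - x_1) + y_1.$$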
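These formulas can be checked on a curve with easily verified points. The Python sketch below works over the rationals with `fractions.Fraction` and returns the reflection $(x_3, -y_3)$ of the third intersection point, which is the point used as the “sum” in the next section. The curve $y^2 = x^3 + 17$, the function name `ec_add` and the use of `None` for the point at infinity are choices made for this example only.

```python
from fractions import Fraction


def ec_add(p, q, a):
    """Chord-and-tangent sum on y^2 = x^3 + a*x + b over the rationals.

    Points are (x, y) pairs; None stands for the point at infinity.
    """
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y1 == -y2:             # vertical line: no finite third point
        return None
    if x1 == x2:                           # tangent at a doubled point
        lam = Fraction(3 * x1 * x1 + a, 2 * y1)
    else:                                  # chord through two distinct points
        lam = Fraction(y1 - y2, x1 - x2)
    x3 = lam * lam - x1 - x2
    y3 = lam * (x3 - x1) + y1              # third intersection is (x3, y3)
    return (x3, -y3)


if __name__ == "__main__":
    P, Q = (-2, 3), (-1, 4)                # both lie on y^2 = x^3 + 17
    print(ec_add(P, Q, 0), ec_add(P, P, 0))
```

The line through $(-2, 3)$ and $(-1, 4)$ has slope 1 and meets the curve again at $(4, 9)$, since $9^2 = 4^3 + 17$; the tangent at $(-2, 3)$ has slope 2 and meets it again at $(8, 23)$.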
10.1.1 Elliptic curves as a group
In order to perform factorisation with elliptic curves, we need to make the set of points on an elliptic curve into a group. To do this, we must define a binary operation $\partial$, the identity element, as well as the inverse.
We start by defining the binary operation as follows:
$$(x_1, y_1) \,\partial\, (x_2, y_2) = (x_3, -y_3)$$
where $x_3$ and $y_3$ are computed as shown earlier. Note that the new point is not the third point of intersection, but its reflection across the x-axis. It is still, however, on the same elliptic curve.
Now we proceed with defining the identity element of our group as follows:
$$(x, y) \,\partial\, (x, -y) = (x, -y) \,\partial\, (x, y) = \infty$$
With the above definition, we have managed to define both the identity element and the inverses. The identity element $\infty$ can be thought of as a point far north, such that every vertical line passes through it.
In terms of notation, $E(a, b)$ denotes the group of rational points on the curve $y^2 = x^3 + ax + b$, where $4a^3 + 27b^2 \ne 0$, together with the point $\infty$. Also, with $(x_i, y_i)$ we denote $(x_1, y_1)\#i$, where
$$(x_1, y_1)\#i = \underbrace{(x_1, y_1) \,\partial\, (x_1, y_1) \,\partial \cdots \partial\, (x_1, y_1)}_{i \text{ times}}.$$
10.1.2 Elliptic curves modulo n
All our reasoning from the previous sections still applies to elliptic curves modulo n.
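If $x_1 \equiv x_2 \pmod{n}$ and $y_1 \equiv -y_2 \pmod{n}$ then
$$(x_1, y_1) \,\partial\, (x_2, y_2) = \infty.$$
Otherwise, let $s$ be the inverse modulo $n$ of $2y_1$ if $x_1 \equiv x_2 \pmod{n}$, and of $x_1 - x_2$ otherwise. As before, we define:
$$\lambda = \begin{cases} (3x_1^2 + a) \times s & \text{if } x_1 \equiv x_2 \pmod{n}, \\ (y_1 - y_2) \times s & \text{otherwise,} \end{cases}$$
$$x_3 = \lambda^2 - x_1 - x_2 \pmod{n}, \qquad y_3 = \lambda(x_3 - x_1) + y_1 \pmod{n}.$$
Furthermore, we will define the binary operation as
$$(x_1, y_1) \,\partial\, (x_2, y_2) \equiv (x_3, -y_3) \pmod{n},$$
and we will define $(x_i, y_i)$ mod $n$ as
$$(x_i, y_i) \equiv (x_1, y_1)\#i \pmod{n}.$$
Finally, $E(a, b)/n$ will denote the elliptic group modulo $n$, whose elements are pairs $(x, y)$ of non-negative integers less than $n$ satisfying $y^2 \equiv x^3 + ax + b \pmod{n}$, together with the point $\infty$; $|E(a, b)/n|$ denotes the number of its elements.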
10.1.3 Computation on elliptic curves
In order to implement factorisation, we need a fast way of computing $(x, y)\#i$. Given the first coordinate of $(x_1, y_1)$, we can compute the first coordinate of $(x_2, y_2) = (x_1, y_1)\#2$ as follows:
$$x_2 = \frac{(x_1^2 - a)^2 - 8bx_1}{4(x_1^3 + ax_1 + b)}.$$
Therefore, given the first coordinate of $(x, y)\#i$, we can compute the first coordinate of $(x, y)\#2i$ using the above formula. We can extend this to $2i+1$ with the following formula:
$$x_{2i+1} = \frac{(a - x_i x_{i+1})^2 - 4b(x_i + x_{i+1})}{x_1 (x_i - x_{i+1})^2}.$$
As you can see, such computations involve lots of fractions. We can avoid using rational numbers if we introduce the notion of a triplet $(X, Y, Z)$, where
$$x = X/Z, \qquad y = Y/Z,$$
and where $X$, $Y$, and $Z$ are integers. Another nice feature of this notation is that the identity element $\infty$ now has the explicit representation $(0, Y, 0)$, where $Y$ can be any integer.
If we define $(X_i, Y_i, Z_i) = (X, Y, Z)\#i$, we can adjust our previous formulas to our new notation:
$$X_{2i} = (X_i^2 - aZ_i^2)^2 - 8bX_iZ_i^3,$$
$$Z_{2i} = 4Z_i(X_i^3 + aX_iZ_i^2 + bZ_i^3),$$
$$X_{2i+1} = Z_1\left((X_iX_{i+1} - aZ_iZ_{i+1})^2 - 4bZ_iZ_{i+1}(X_iZ_{i+1} + X_{i+1}Z_i)\right),$$
$$Z_{2i+1} = X_1(X_{i+1}Z_i - X_iZ_{i+1})^2.$$
I should note that for our purposes, we do not need to calculate the second coordinate $Y$ of the triplets. Still, $Y_i$ can always be recovered from $X_i$ and $Z_i$. Also, we can use our triplets modulo $n$, as long as we do all our computations modulo $n$.
10.1.4 Factorisation using elliptic curves
The method I will be describing is essentially due to A. K. Lenstra, and H. W. Lenstra, Jr.
Let $N$ be a composite number relatively prime to 6. (In practice, this means that $N$ has no small factors.) We randomly choose $a$ for our elliptic curve, and a random point $(x, y)$ on the curve. We can now compute $b$ as follows:
$$b \equiv y^2 - x^3 - ax \pmod{N}.$$
We convert to triplets $(X, Y, Z)$, with our initial triplet being $(x, y, 1)$. If $p$ is a prime number which divides $N$, and $|E(a, b)/p|$ divides $k!$, then
$$(X, Y, Z)\#k! = (\cdots(((X, Y, Z)\#1)\#2)\cdots)\#k$$
will be the identity element in $E(a, b)/p$ (but not in $E(a, b)$). This simply means that there is at least one coordinate of $(X, Y, Z)\#k!$ which is not divisible by $N$, but the coordinates $X$ and $Z$ are divisible by $p$.
Since $Z_{k!}$ is divisible by $p$, there is a good chance that the greatest common divisor of $Z_{k!}$ and $N$ is a non-trivial divisor of $N$.
    INPUT N, X, Y, a, max
    b := (Y^2 - X^3 - aX) MOD N
    g := gcd(4a^3 + 27b^2, N)
    IF g > 1 THEN PRINT g
    Z := 1
    k := 2
    WHILE k <= max DO
        FOR i := 1 TO 10 DO
            NEXTVALUES(X, Z, k, N, a, b)
            k := k + 1
        g := gcd(Z, N)
        IF g > 1 THEN PRINT g
Figure 10.1: Pseudocode for main loop of Elliptic curves method
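The idea of the method can be tried out with a deliberately simplified variant. The Python sketch below does not use triplets: it works with ordinary $(x, y)$ coordinates modulo $N$ and relies on the fact that a modular inverse fails to exist exactly when the value to be inverted shares a factor with $N$. This swaps the technique of figure 10.1 for a simpler one, and all names (`FactorFound`, `inverse_mod`, `ec_add`, `ec_multiply`, `ecm_one_curve`) are hypothetical. The modular inverse via `pow(value, -1, n)` needs Python 3.8 or later.

```python
from math import gcd


class FactorFound(Exception):
    """Raised when an inverse modulo n does not exist; carries gcd(value, n)."""

    def __init__(self, factor):
        super().__init__(factor)
        self.factor = factor


def inverse_mod(value, n):
    g = gcd(value % n, n)
    if g != 1:
        raise FactorFound(g)
    return pow(value, -1, n)


def ec_add(p, q, a, n):
    """Add points of y^2 = x^3 + a*x + b modulo n; None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if x1 == x2:
        lam = (3 * x1 * x1 + a) * inverse_mod(2 * y1, n) % n
    else:
        lam = (y1 - y2) * inverse_mod(x1 - x2, n) % n
    x3 = (lam * lam - x1 - x2) % n
    y3 = (lam * (x3 - x1) + y1) % n
    return x3, (-y3) % n


def ec_multiply(k, p, a, n):
    """Compute p # k by binary double-and-add."""
    result, addend = None, p
    while k:
        if k & 1:
            result = ec_add(result, addend, a, n)
        k >>= 1
        if k:
            addend = ec_add(addend, addend, a, n)
    return result


def ecm_one_curve(n, a, x, y, max_k):
    """Try one curve through (x, y); return a non-trivial divisor of n or None."""
    point = (x % n, y % n)
    try:
        for k in range(2, max_k + 1):      # after step k, point = (x, y) # k!
            point = ec_multiply(k, point, a, n)
            if point is None:
                return None
    except FactorFound as found:
        return found.factor if found.factor < n else None
    return None


if __name__ == "__main__":
    # curve y^2 = x^3 + 5x - 5 through (1, 1); 455839 = 599 * 761
    print(ecm_one_curve(455839, 5, 1, 1, 10))
```

A failed inversion plays the role of the test $\gcd(Z, N) > 1$ in figure 10.1. If a curve yields nothing, a caller would try another value of $a$ or another starting point.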
10.2 Implementation of elliptic curves method
My implementation of the Elliptic Curves Method consists of two “big” functions and four “smaller” functions. The first two are shown in figures 10.1 and 10.2. The main loop of the algorithm uses the same structure as some of our previous algorithms. Essentially, we loop many times, and at each iteration we take the gcd of $N$ and $Z$. This function accepts the following parameters:
• n: The number to be factorised. Must be relatively prime to 6.
• X, Y: These are arbitrary integers, between 1 and n.
• a: An arbitrary integer, the first parameter of our curve.
• max: This variable sets the limit of the maximum iterations.
• factors: An array of MAPM variables, in which the factors of n will be written.
Most of the work, however, is done in the NEXTVALUES function. This function is responsible for calculating the first and third coordinates of our triplets. This algorithm uses the binary expansion of $k$ in order to find the results. By doing this, it manages to compute $X_k$ and $Z_k$ by successively computing $X_{2i}$ or $X_{2i+1}$ in a minimum number of steps.
The four “small” functions that I mentioned use the formulas from section 10.1.3 to compute the values $X_{2i}$, $X_{2i+1}$, $Z_{2i}$, $Z_{2i+1}$. Their implementation is quite straightforward.
10.3 Running time
According to [2], under plausible assumptions, the expected running time of this algorithm is $O(\exp(\sqrt{c \ln f \ln\ln f}) \cdot (\log N)^2)$, where $c \approx 2$ is a constant and $f$ is, as before, the size of the factor found.
    /* Calculates the first and third coordinates of (X, Y, Z)#k (mod N). */
    INPUT X, Z, k, n, a, b
    i := 0
    C[] := BINARY(k)
    X1 := X
    Z1 := Z
    X2 := X2i(X, Z)
    Z2 := Z2i(X, Z)
    FOR i := length(C[])-1 TO 1 DO
        U1 := X2i+1(X1, Z1, X2, Z2)
        U2 := Z2i+1(X1, Z1, X2, Z2)
        IF C[i] == 0 THEN
            temp := X2i(X1, Z1)
            Z1 := Z2i(X1, Z1)
            X1 := temp
            X2 := U1
            Z2 := U2
        ELSE
            temp := X2i(X2, Z2)
            Z2 := Z2i(X2, Z2)
            X2 := temp
            X1 := U1
            Z1 := U2
    PRINT X1, Z1
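The binary scheme of NEXTVALUES, together with the four small functions, can be sketched in Python as follows. This is an illustration built from the formulas of section 10.1.3, not a transcription of the C code; the names `double_xz`, `add_xz` and `next_values`, and the choice to return the pair rather than print it, are assumptions. The invariant is that `lo` holds the coordinates of $(X, Z)\#i$ and `hi` those of $(X, Z)\#(i+1)$.

```python
def double_xz(X, Z, a, b, n):
    """(X_2i, Z_2i) from (X_i, Z_i)."""
    X2 = ((X * X - a * Z * Z) ** 2 - 8 * b * X * Z ** 3) % n
    Z2 = (4 * Z * (X ** 3 + a * X * Z * Z + b * Z ** 3)) % n
    return X2, Z2


def add_xz(Xi, Zi, Xj, Zj, X1, Z1, a, b, n):
    """(X_2i+1, Z_2i+1) from (Xi, Zi) = #i, (Xj, Zj) = #(i+1), base (X1, Z1)."""
    X = Z1 * ((Xi * Xj - a * Zi * Zj) ** 2
              - 4 * b * Zi * Zj * (Xi * Zj + Xj * Zi)) % n
    Z = X1 * (Xj * Zi - Xi * Zj) ** 2 % n
    return X, Z


def next_values(X, Z, k, n, a, b):
    """First and third coordinates of (X, Y, Z) # k modulo n."""
    lo = (X % n, Z % n)                      # multiple i, starting with i = 1
    hi = double_xz(X, Z, a, b, n)            # multiple i + 1
    for bit in bin(k)[3:]:                   # bits of k after the leading 1
        odd = add_xz(*lo, *hi, X, Z, a, b, n)            # multiple 2i + 1
        if bit == "0":
            lo, hi = double_xz(*lo, a, b, n), odd        # i -> 2i
        else:
            lo, hi = odd, double_xz(*hi, a, b, n)        # i -> 2i + 1
    return lo


if __name__ == "__main__":
    # y^2 = x^3 + 17 modulo 101, base point x = -2, i.e. (X, Z) = (99, 1)
    for k in (1, 2, 3, 4):
        print(k, next_values(99, 1, k, 101, 0, 17))
```

On the curve $y^2 = x^3 + 17$ the multiples of $(-2, 3)$ have first coordinates $8$, $19/25$ and $752/529$ for $k = 2, 3, 4$, which is what the ratios $X/Z$ returned modulo 101 represent.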
Figure 10.3: Results of tests on Elliptic curves algorithm
In figure 10.3 you can see the results of 6398 factorisations which I performed using this algorithm.
As in the Pollard $p-1$ algorithm, we can speed up this algorithm by restricting $k$ to a set of powers of primes less than max rather than running over all integers less than max. Also, we can expect better results if we regularly interrupt the run and restart with a new set of parameters rather than persisting with our initial choice of parameters.
10.4 Remarks
The Elliptic Curve Method has the characteristic of being practical from the point where trial division becomes impossible until well into the range where MPQS and NFS can be implemented.
The largest factor that has been found by ECM is the 53-digit factor of $2^{677} - 1$, according to [2]. Note that if the RSA system was implemented with 512-bit keys and the three-factor variation, the smallest prime would be less than 53 digits, so Elliptic Curves could be used to break the system.
CHAPTER 11
Overall Comparison
The various factorisation methods I have described are all useful in different situations. When factoring a large number, the method to be chosen must depend on knowledge about the factors of the number. To begin with, you must make sure that the number is composite, so that you do not make a long computer run which will result in nothing. It would be really frustrating to discover after a very long run that N has the prime factorisation N = 97·p, which could have been obtained almost immediately by using trial division.
Further, you could use Fermat’s method in case N is the product of two almost equal factors, or Pollard’s p − 1 method in the event of N having one factor p with p − 1 being a product of only small primes. There are lots of methods for finding middle-sized factors, all of which are good for specific situations.
But the question remains: how long do you keep looking for these middling sized factors before pulling out something like the Quadratic Sieve or NFS? A well-balanced strategy, developed by Naur [5], may be summarised as follows:
1. Make sure N is composite. Since very small divisors are quite common and are found very quickly by trial division, it is worthwhile attempting trial division up to 100 or 1000 even before applying a strong pseudoprime test.
2. Perform trial division up to $10^5$ or $10^6$. If Pollard’s ρ method is available, then trial division need only be performed to a much lower search limit, e.g. $10^4$, since the small divisors will also fall out rapidly with Pollard’s methods. One reason why trial division with the small primes is useful, despite the fact that Pollard’s ρ method is quicker, is that the small factors tend to appear multiplied together when found with Pollard’s method, and thus have to be separated by trial division anyhow. Apply a compositeness test on what is left of N every time a factor has been found and removed.
3. At this point you need to take a long shot and, with a little luck, shorten the running time enormously; it could even be decisive for quick success or complete failure in the case when N is very large. The strategy to be employed is: take the methods you have implemented on your computer covering various situations, which will mean one or more of the following: Pollard’s p − 1 and p + 1 methods, Fermat’s, Shanks’, or even Williams’ methods. The methods should be capable of being suspended and resumed from where they stopped.
Since you cannot possibly know in advance which of these methods will achieve a factorisation (if a factorisation will be found at all), it is a good technique at this stage to run the program of each method in sequence for a predetermined number of steps, say 1000 or 10000, breaking the runs off at re-start points in order to be able to proceed, if necessary. If N does not factorise during such a run, you have to repeat the whole process from the re-start point of the previous run. Also, you might want to consider the possibility of changing your choice of constants.
4. If the number N has still not been factored, you will need to rely upon the “big algorithms”. Depending on the size of the number and on the capacity of your computer, this can be the Multiple Polynomial Quadratic Sieve (MPQS), the Number Field Sieve (NFS), or even the Elliptic Curves Method. Now you have to sit down and wait; fairly good estimates of the maximal running times are available for all these methods, so that you will know approximately how long the computer run could take.
Choosing which methods to use, and when, is still more an art than a science. You should keep in mind that the “big” algorithms are much more cumbersome and it is worth spending at least a few minutes trying to vary your luck first. Theoretically and experimentally, it has been shown that you have a better chance of finding your mid-sized factors if you run several algorithms with several choices of parameters rather than spending the same amount of time on a single algorithm with a single set of parameters.
CHAPTER 12
Epilogue
When I started my work on this project, I had very little knowledge of this field of study. Factorisation was something that I had never encountered before, at least not in great detail. I believe that this was to my advantage, since I was able to write my report from an introductory point of view, paying attention to the points which were hard for me to understand.
During the course of my project, I had to make lots of choices regarding the material which I would study. For instance, I chose not to implement one of the “big guns” of factorisation, such as MPQS or NFS. I believe that my choices allowed me to focus on the quality of what I did, instead of doing many things, but without enough care. This way, I was able to firmly understand the concepts of these elementary algorithms, and thus obtain a good background in the subject.
Of course, my project included lots of programming. Although I was already familiar with the programming language I used (ANSI C), I was able to further develop my programming skills. My final program consisted of roughly 1500 lines of code, which means that my program was not small. Furthermore, I developed a sense of responsibility as far as organisation procedures are concerned, such as keeping a logbook and doing tests.
I am really happy to have managed to keep the balance between the theoretical and practical issues in doing my project. Although my report tends more towards theory, I nevertheless did quite a lot of work on the actual software. Thus I have been able to produce a complete tutorial of factorisation, including the theoretical description and background, the pseudocode description, and the actual implementation. I hope that this project will be helpful to those who get their hands on it.
Part III
Appendices
APPENDIX A
Benchmarks
In this appendix I have included some sample test results of all the algorithms I implemented. These tests used certain numbers which I chose specifically for their properties, and not arbitrary numbers.
A.1 Tests with products of two nearby primes
For the first set of tests, I used numbers which were products of two nearby primes. In table A.1 you can see the results of this set of tests.
Algorithm          number 1   number 2   number 3   number 4
Trial Divisions    0.05       1.71       8.58       15.81
Fermat Method      0.01       0.02       0.02       0.02
Pollard’s p − 1    0.12       0.72       1.9        0.39
Pollard’s ρ        0.02       0.33       0.59       0.61
Lenstra’s ECM      0.08       0.11       0.56       0.93

number 1 = 3980021 = 1993 x 1997
number 2 = 16831170221 = 129733 x 129737
number 3 = 431589872009 = 656951 x 656959
number 4 = 1469322167111 = 1212121 x 1212191
Table A.1: Products of two nearby primes
As expected, Fermat’s algorithm was by far the fastest. Another point to make is the quick time of Pollard p − 1 for number 4. If we look at the decomposition of the factors of number 4 (minus one), we find that 1212121 − 1 = 1212120 = 2·2·2·3·3·5·7·13·37. This means that p − 1 had only small factors, and that is why Pollard’s p − 1 algorithm found the factors so quickly. Finally, I should note that Pollard’s ρ algorithm was faster than Trial Divisions, even though the numbers were relatively small.
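The decomposition quoted above is easy to reproduce. The Python sketch below is a plain trial-division helper written for this check only (the name `prime_factors` is an assumption); it factors $p - 1$ for both prime factors of number 4.

```python
def prime_factors(n):
    """Prime factorisation of n by trial division, smallest factors first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                     # whatever remains is a prime
        factors.append(n)
    return factors


if __name__ == "__main__":
    print(prime_factors(1212121 - 1))
    print(prime_factors(1212191 - 1))
```

The first factor minus one splits entirely into primes up to 37, whereas the second factor minus one contains the prime 17317, which lies above a limit of 10000. With such a limit it is therefore the factor 1212121 that the $p - 1$ method can be expected to pick up.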
A.2 Tests with products of three nearby primes
For this set of tests, I used numbers which were products of three nearby primes. In table A.2 you can see the results of this set of tests.
Algorithm          number 1   number 2   number 3   number 4
Trial Divisions    0.07       0.14       0.26       2.72
Fermat Method      -          -          -          -
Pollard’s p − 1    0.26       0.68       0.28       1.74
Pollard’s ρ        0.05       0.05       0.07       0.29
Lenstra’s ECM      0.12       0.67       0.82       0.70

number 1 = 7956061979 = 1993 x 1997 x 1999
number 2 = 110154695923 = 4789 x 4793 x 4799
number 3 = 1019829472003 = 10061 x 10067 x 10069
number 4 = 1005660644975291 = 100183 x 100189 x 100193
Table A.2: Products of three nearby primes
The results showed that Pollard’s ρ algorithm was the fastest. The Elliptic Curves Method was fairly quick for large factors. For smaller factors, Trial Divisions was quicker. Note that it made no sense to run this test on Fermat’s method, since this method is used with numbers which have two factors.
A.3 Tests with products of three arbitrary primes
For the last set of tests, I used numbers which were products of three arbitrary primes. The size of the factors gradually grows as we move on to the next
number. In table A.3 you can see the results of this set of tests.
Algorithm          number 1   number 2   number 3   number 4
Trial Divisions    0.11       0.221      3.122      8.26
Fermat Method      -          -          -          -
Pollard’s p − 1    0.23       0.90       5.46       -
Pollard’s ρ        0.07       0.06       0.14       1.31
Lenstra’s ECM      0.53       0.46       6.781      0.94

number 1 = 14960418503 = 179 x 8467 x 9871
number 2 = 8355211084777 = 1163 x 12347 x 581857
number 3 = 416531649825896503 = 12983 x 987533 x 32487877