A Parallel Line Sieve for the GNFS Algorithm

Abstract: RSA is one of the most important public-key cryptosystems for information security. Its security relies on the integer factorization problem, that is, on the difficulty of factoring large integers. Much research has gone into the problem of factoring large numbers. Owing to advances in factoring algorithms and in computing hardware, the size of the numbers that can be factored increases year by year. The General Number Field Sieve (GNFS) algorithm is currently the best known method for factoring large numbers of more than 110 digits. In this paper, a parallel GNFS implementation on a BA-cluster is presented. The study begins with a general discussion of the serial algorithm and covers its five steps. It then discusses the parallel algorithm for the sieving step. The experimental results show that the algorithm achieves a good speedup and can be used for factoring large integers.


I. INTRODUCTION
Factoring is very important in the field of cryptography, specifically for the RSA cryptosystem. The RSA algorithm [5] is the most popular public-key cryptosystem and is used in real-world applications such as Internet Explorer, email systems, and online banking [12]. The security of the RSA algorithm relies on the difficulty of factoring large integers. Many integer factorization algorithms are used to factor large numbers, such as trial division [6], Pollard's p−1 algorithm [7], Lenstra Elliptic Curve Factorization (ECM) [8], the Quadratic Sieve (QS) [9], and the General Number Field Sieve (GNFS) algorithm [1]-[4]. GNFS is the best known algorithm for factoring large composite numbers of more than 110 digits, but it still takes a long time to factor large integers. Therefore, this paper presents an implementation of a parallel GNFS algorithm on a BA-cluster.
The main objective of this paper is to propose a new algorithm for the sieving step on a cluster system. The paper consists of eight sections. Section II introduces the GNFS algorithm. Section III gives the reasons for selecting the sieving step. Section IV gives an overview of the serial sieving step and of previous parallel sieving work. Section V proposes new methods for the parallel sieving step on a cluster system. Section VI describes the hardware and software configuration used to implement the parallel sieving step. Section VII presents the experimental results for the proposed methods. Section VIII gives the conclusions and future work.

II. THE GNFS ALGORITHM
GNFS has five major steps, which are described as follows:

1) Step 1: (Polynomial selection) Find a polynomial f : R → R of degree d with integer coefficients, $f(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_0$, such that f(m) ≡ 0 (mod n).
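One common way to obtain such a polynomial is the base-m method, which the text does not name: choose m close to the d-th root of n and take the base-m digits of n as the coefficients, so that f(m) = n ≡ 0 (mod n). The sketch below illustrates this under that assumption; the function names are hypothetical and no attempt is made to optimise the polynomial.

```python
def integer_root(n, d):
    """Largest integer m with m**d <= n, found by binary search."""
    lo, hi = 1, 1
    while hi ** d <= n:
        hi *= 2
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** d <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def base_m_polynomial(n, d):
    """Return (m, coeffs) with coeffs[i] the coefficient of x**i and f(m) == n."""
    m = integer_root(n, d)
    coeffs = []
    rest = n
    for _ in range(d):
        rest, digit = divmod(rest, m)   # peel off the next base-m digit
        coeffs.append(digit)
    coeffs.append(rest)                 # leading coefficient a_d = n // m**d
    return m, coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


n, d = 45113, 3                         # toy modulus, far smaller than a real target
m, coeffs = base_m_polynomial(n, d)
print(m, coeffs, evaluate(coeffs, m) % n)
```

Because the coefficients are the base-m digits of n, the congruence f(m) ≡ 0 (mod n) holds by construction.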

2) Step 2: (Factor bases) The main objective of this step is to find three types of factor bases: (1) the rational factor base R, (2) the algebraic factor base A, and (3) the quadratic character base Q. The three factor bases are defined as follows.
The rational factor base R [1]: A rational factor base is a finite collection of prime numbers p_i up to some bound M, M ∈ N, i.e. R = {p : p is a prime and p ≤ M, M ∈ N}.
Smooth over R [1]: An integer l ∈ Z is said to be smooth over a rational factor base R if R contains all of the prime divisors of l, i.e. (up to sign) $l = \prod_{p_i \in R} p_i^{e_i}$ with integer exponents $e_i \ge 0$.
The algebraic factor base A [1]: An algebraic factor base is a finite set {a+bθ} ⊂ Z[θ]. An element l ∈ Z[θ] is said to be smooth over an algebraic factor base A if l factors completely into elements of A.
The quadratic character base Q [1]: The quadratic character base Q is a finite set of pairs (p, r) with the same properties as the elements of the algebraic factor base, but the primes p_i ∈ Q are larger than the largest prime in the algebraic factor base, i.e. p_i > p̂, where p̂ is the largest prime in A.

3) Step 3: (Sieving) Find many pairs of integers (a, b) with the following properties:
1. gcd(a, b) = 1.
2. a+bm is smooth over the rational factor base.
3. a+bθ is smooth over the algebraic factor base.
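The rational half of these conditions is simple to state in code. The following is a minimal sketch, assuming smoothness is tested by plain trial division over R; it covers properties 1 and 2 only and leaves out the algebraic test of property 3. All function names are hypothetical.

```python
from math import gcd


def primes_up_to(bound):
    """Rational factor base R: all primes p <= bound (sieve of Eratosthenes)."""
    if bound < 2:
        return []
    is_prime = [True] * (bound + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(bound ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, bound + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, bound + 1) if is_prime[p]]


def is_smooth(value, factor_base):
    """True if every prime divisor of value lies in factor_base."""
    value = abs(value)
    if value == 0:
        return False
    for p in factor_base:
        while value % p == 0:
            value //= p
    return value == 1


def rational_side_ok(a, b, m, factor_base):
    """Properties 1 and 2 only: gcd(a, b) = 1 and a + b*m smooth over R."""
    return gcd(a, b) == 1 and is_smooth(a + b * m, factor_base)


R = primes_up_to(10)
print(rational_side_ok(1, 1, 31, R))   # 1 + 31 = 32 = 2**5
```

Trial division is used here only for clarity; a production sieve marks multiples of each prime along a line of a values instead of dividing every candidate.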

4) Step 4: (Linear algebra) The relations are put into relation sets and a very large sparse matrix is constructed. The matrix is reduced, yielding dependencies, i.e. combinations of relations that lead to a square modulo n.
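To make the notion of a dependency concrete, the sketch below runs Gaussian elimination over GF(2) on a tiny matrix whose rows are the exponent parities of each relation, stored as Python integers used as bit vectors. This is an illustrative stand-in and an assumption on our part: the text does not say which reduction method is used, and matrices of realistic size call for sparse methods.

```python
def find_dependency(rows):
    """Return indices of rows whose XOR is zero, or None if the rows are independent.

    Each row is an int whose bits are the exponent parities of one relation.
    """
    basis = {}  # pivot bit -> (reduced vector, bitmask of original rows combined)
    for i, row in enumerate(rows):
        vec, combo = row, 1 << i
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, combo)   # new pivot: row is independent so far
                break
            bvec, bcombo = basis[pivot]
            vec ^= bvec                       # eliminate the pivot bit
            combo ^= bcombo                   # track which rows were combined
        else:
            # vec was reduced to zero: the rows recorded in combo sum to 0 mod 2
            return [j for j in range(len(rows)) if (combo >> j) & 1]
    return None


rows = [0b110, 0b011, 0b101]   # toy parity vectors, not taken from a real sieve
print(find_dependency(rows))
```

Multiplying the relations selected by a dependency gives a product in which every prime appears to an even power, which is what the square root step needs.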

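5) Step 5: (Square root)
• Calculate the rational square root r such that $r^2 = \prod_{(a,b) \in V} (a + bm)$.
• Calculate the algebraic square root s such that $s^2 = \prod_{(a,b) \in V} (a + b\theta)$, where V denotes the set of pairs (a, b) in one dependency found in Step 4.
• The factors p and q of n can then be found as gcd(n, s − r) and gcd(n, s + r).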
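The last bullet is the classical congruence-of-squares argument: if s² ≡ r² (mod n), then n divides (s − r)(s + r), so each gcd may expose a factor. The sketch below shows only this final gcd computation on a toy example; the values of r and s are chosen by hand rather than produced by the square root step, and the function name is hypothetical.

```python
from math import gcd


def factors_from_squares(n, r, s):
    """Given s*s ≡ r*r (mod n), try to split n via gcd(n, s - r) and gcd(n, s + r).

    Returns (p, q), or None when both gcds are trivial (1 or n).
    """
    if (s * s - r * r) % n != 0:
        raise ValueError("s^2 and r^2 are not congruent modulo n")
    p = gcd(n, s - r)
    q = gcd(n, s + r)
    if p in (1, n) or q in (1, n):
        return None
    return p, q


# Toy example: 10**2 - 3**2 = 91 = 7 * 13
print(factors_from_squares(91, 3, 10))
```

A trivial result is possible, which is why several dependencies are usually tried in turn.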

III. WHY SIEVING STEP?
The main objective of this section is to explain the importance of the sieving step. Previous studies show that the sieving step is very important for several reasons:
1) The sieving step is the most time-consuming step; it takes more than 70% of the total execution time, as shown in Table I [14].
2) The sieving step can be parallelized easily.
Experimental studies show that some problems in the implementation slowed down the previous parallel program. In the previous algorithm there are many communications between the master node and the slaves, and the communication time increases as the size of n increases. Another cause of inefficiency is that each processor sieves different pairs, so the sieving time may differ from one processor to another. The master node cannot start the next round of sieving until all the slave nodes have finished their sieving [14].

IV. PREVIOUS SIEVING WORK
Algorithm 1 shows the steps of serial sieving. The sieving step uses two nested for-loops, one over the values of b and the other over the values of a. In the outer loop, b ranges from −C to C; usually the values of b are in the range 1 ≤ b < C. In the inner loop, b is fixed and a ranges from −N to N. The sieving step takes a long time because it uses two loops and the ranges of a and b are usually very large.
Algorithm 1 (serial sieving), inner loop: for (a = a_1; a < a_2; a++) do ...
Previous parallel sieving work is described in [12]-[15]. The basic idea of the proposed algorithm is that each processor takes a range of b values and generates a set of (a, b) pairs, as shown in Algorithm 2.
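The nested-loop structure of the serial sieve described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: only the rational condition (a + bm smooth) is checked, smoothness is tested by trial division, and the names `serial_line_sieve`, `b_max` and `a_bound` are hypothetical stand-ins for the bounds C and N.

```python
from math import gcd


def is_smooth(value, primes):
    """True if value factors completely over the given primes."""
    value = abs(value)
    if value == 0:
        return False
    for p in primes:
        while value % p == 0:
            value //= p
    return value == 1


def serial_line_sieve(m, primes, b_max, a_bound):
    """Collect pairs (a, b) with gcd(a, b) = 1 and a + b*m smooth over primes."""
    rels = []
    for b in range(1, b_max):                     # outer loop: 1 <= b < C
        for a in range(-a_bound, a_bound + 1):    # inner loop: b fixed, a from -N to N
            if gcd(a, b) == 1 and is_smooth(a + b * m, primes):
                rels.append((a, b))
    return rels


# Toy parameters, far smaller than anything used in practice
print(serial_line_sieve(m=31, primes=[2, 3, 5, 7], b_max=3, a_bound=5))
```

Since the work for one value of b never depends on another, the outer loop is the natural place to split the computation across processors.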

V. THE NEW METHODS
The main objective of this section is to describe the new methods for the parallel sieving step of the GNFS algorithm. The new methods improve the parallel sieving algorithm by reducing the communication between the master node and the slaves. The following subsections (V-A, V-B) explain the new methods and the results of each method.

A. The first method
The main idea of the first method is to divide the range of b among the processors. This is possible because each b in the outer loop generates its set of ordered pairs (a, b) independently of the other values of b. Each processor therefore takes a range of b values, generates a set of (a, b) pairs, and saves it in a local file; see Fig. 1. When all processors have finished computing their sets of ordered pairs (a, b), the master node copies all the files containing the sets of (a, b) pairs from the slaves into one file.
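A rough sketch of this scheme is given below. It is a single-process simulation, not the MPI program: the ranks run one after another, `toy_is_relation` is a hypothetical stand-in for the real smoothness tests, and the file names are invented for illustration.

```python
import os
import tempfile
from math import gcd


def split_range(b_min, b_max, nprocs):
    """Split [b_min, b_max] into nprocs contiguous chunks of near-equal size."""
    total = b_max - b_min + 1
    base, extra = divmod(total, nprocs)
    chunks, start = [], b_min
    for rank in range(nprocs):
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def toy_is_relation(a, b):
    """Hypothetical placeholder for the rational and algebraic smoothness tests."""
    return gcd(a, b) == 1 and (a + 3 * b) % 5 == 0


def sieve_to_file(b_chunk, a_min, a_max, path):
    """One processor's work: sieve its b values and write pairs to a local file."""
    with open(path, "w") as out:
        for b in b_chunk:
            for a in range(a_min, a_max + 1):
                if toy_is_relation(a, b):
                    out.write(f"{a} {b}\n")


def first_method(b_min, b_max, a_min, a_max, nprocs, workdir):
    """Simulate the first method; returns the path of the merged relation file."""
    paths = []
    for rank, chunk in enumerate(split_range(b_min, b_max, nprocs)):
        path = os.path.join(workdir, f"rels_{rank}.txt")
        sieve_to_file(chunk, a_min, a_max, path)   # would run concurrently on a cluster
        paths.append(path)
    merged = os.path.join(workdir, "rels_all.txt")
    with open(merged, "w") as out:                 # master: concatenate all local files
        for path in paths:
            with open(path) as part:
                out.write(part.read())
    return merged


with tempfile.TemporaryDirectory() as workdir:
    merged_path = first_method(1, 20, -15, 15, nprocs=4, workdir=workdir)
    with open(merged_path) as f:
        print(len(f.readlines()), "pairs collected")
```

In this scheme the only communication is the final file copy, so its cost does not grow with the number of b values each processor handles.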

B. The Second Method
The main idea of the second method is the same as that of the first: it divides the range of b among the processors. The difference is that each processor takes a range of b values, generates a set of (a, b) pairs, and saves it in a large array (rels); see Fig. 2. For each b in its range, a slave finds the rels, i.e. the set of ordered pairs (a, b) for that b, and sends them to the master; the master node receives the sets of rels from all the slaves. This process is repeated until the slave reaches the last b in its range.
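The message pattern of this method can be sketched without MPI by letting a generator stand in for the sends. The code below is a single-process simulation under that assumption: each `yield` represents one message from a slave to the master, `toy_is_relation` is a hypothetical placeholder for the smoothness tests, and one message per b is our reading of the description.

```python
from math import gcd


def toy_is_relation(a, b):
    """Hypothetical placeholder for the rational and algebraic smoothness tests."""
    return gcd(a, b) == 1 and (a + 3 * b) % 5 == 0


def slave_messages(rank, b_chunk, a_min, a_max):
    """For each b in the slave's range, build the rels array and 'send' it."""
    for b in b_chunk:
        rels = [(a, b) for a in range(a_min, a_max + 1) if toy_is_relation(a, b)]
        yield rank, b, rels            # stands in for one send to the master


def master_collect(chunks, a_min, a_max):
    """Master side: receive every rels array and count the messages."""
    all_rels, n_messages = [], 0
    for rank, chunk in enumerate(chunks, start=1):
        for _rank, _b, rels in slave_messages(rank, chunk, a_min, a_max):
            all_rels.extend(rels)
            n_messages += 1
    return all_rels, n_messages


chunks = [range(1, 4), range(4, 7)]    # two slaves, three b values each
rels, n_messages = master_collect(chunks, -10, 10)
print(len(rels), "pairs in", n_messages, "messages")
```

Compared with writing local files, this variant avoids disk traffic but sends a number of messages proportional to the number of b values.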

VI. HARDWARE AND SOFTWARE PROGRAMMING ENVIRONMENT
The parallel GNFS program is implemented on the Bibliotheca Alexandrina (BA) supercomputer, which is located in the Alexandria library, Alexandria, Egypt. The supercomputer is a high-performance computing cluster with a performance reaching 11.8 TFLOPS. It is composed of 130 computational nodes, 6 management nodes including two batch nodes for job submission (64 GB RAM), an inter-process communication network, and 36 TB of storage. Each node has two quad-core Intel Xeon 2.83 GHz processors (64-bit), 8 GB RAM, an 80 GB hard disk, and a Gigabit Ethernet network port.
The parallel code is based on the serial code developed by C. Monico [3]. The program is written in ANSI C, compiled with the GNU C compiler (gcc), and run under the Linux operating system. The parallel program is written with the MPI library, for which MPICH1 [16] is installed. We also installed the free GMP library [17], which is required to compile and run the program.

VII. EXPERIMENTAL RESULTS
A. Test Cases
To test our parallel algorithm for speedup and efficiency, we chose different values of n and different numbers of processors. Table II shows all test cases and the numbers of processors used.

B. Timing Results
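The running times of the first and second methods for each test case are shown in Fig. 3.
Fig. 3 shows that the running time decreases as the number of processors increases. From Fig. 3, the first method is faster than the second method when using a small number of processors; otherwise they are approximately equal.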

C. Speed-Up
Speedup is defined by the following formula: $S_p = T_1 / T_p$, where p is the number of processors, $T_1$ is the execution time of the sequential algorithm, and $T_p$ is the execution time of the parallel algorithm with p processors. The speedups for the test cases using different numbers of processors for the first and second methods are presented in Fig. 4.
From Fig. 4, the second method is better than the first when n is small, whereas the first method is better than the second when n is large.

D. Sieving Efficiency
Efficiency is defined as $E_p = S_p / p = T_1 / (p\,T_p)$. It is a value between zero and one. The sieving efficiency for each test case is shown in Fig. 5.
From Fig. 5, the first method is better than the second when using a small number of processors; otherwise the two are approximately equal.
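As a small worked example of the two definitions, the following sketch computes speedup and efficiency from a pair of timings. The timings are hypothetical numbers chosen for easy arithmetic and are not measurements from this study.

```python
def speedup(t1, tp):
    """S_p = T_1 / T_p."""
    return t1 / tp


def efficiency(t1, tp, p):
    """E_p = S_p / p = T_1 / (p * T_p)."""
    return speedup(t1, tp) / p


# Hypothetical timings in seconds (not taken from the experiments)
t1, tp, p = 120.0, 20.0, 8
print(speedup(t1, tp))        # 6.0
print(efficiency(t1, tp, p))  # 0.75
```

An efficiency of 1 would mean that the p processors are used perfectly, so values below 1 measure the overhead of communication and load imbalance.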

VIII. DISCUSSIONS
We propose two algorithms for the sieving step on a cluster system. The difference between them is that in the first each processor generates a set of (a, b) pairs and saves it in a local file, whereas in the second each processor generates a set of (a, b) pairs, saves it in a large array, and then sends the set of (a, b) pairs to the master.
The experimental studies show that the running time decreases as the number of processors increases. From Fig. 3, the first method is faster than the second when using a small number of processors; otherwise they are approximately equal. Fig. 4 shows that the speedup of the second method is better than that of the first when n is small, whereas the first method is better than the second when n is large. Fig. 5 shows that the efficiency of the first method is better than that of the second when using a small number of processors; otherwise the efficiencies are approximately equal.
There are still open questions and research points that can be studied in the future, such as:
180 | Page www.ijacsa.thesai.org