Performance Evaluation of Blowfish Algorithm on Supercomputer IMAN 1

Cryptographic applications are becoming increasingly important in today's world of data exchange, where large volumes of data must be transferred safely from one location to another at high speed. In this paper, a parallel implementation of the Blowfish cryptography algorithm is evaluated in terms of running time, speedup, and parallel efficiency. The parallel implementation uses the Message Passing Interface (MPI) library, and the experiments were conducted on the IMAN1 supercomputer. The experimental results show that the run time of the Blowfish algorithm decreases as the number of processors increases. Moreover, with 2, 4, and 8 processors, the parallel efficiency reaches up to 99%, 98%, and 66%, respectively.
Keywords: Blowfish; Encryption; MPI; Supercomputer


I. INTRODUCTION
As we move toward an information society, in which information travels fast through various modes of communication in what is called the global village, it has become more apparent that the same information can end up in the wrong hands, either by mistake or with intent to harm. Secure communication therefore requires cryptography algorithms. Several cryptography algorithms have been proposed, such as AES, DES, 3DES, and RC2. Among these algorithms is the Blowfish cryptography algorithm.
Encryption algorithms are usually divided into two types: symmetric key (private) encryption and asymmetric key (public) encryption. In symmetric key encryption, also called secret key encryption, a single key is used to both encrypt and decrypt data, and the key must be distributed between the entities before transmission begins. Symmetric key cryptography algorithms include Blowfish, AES, RC2, DES, 3DES, and RC5.
In asymmetric key encryption, also called public key encryption, a private key and a public key are used: the public key is used for encryption and the private key for decryption [2].
The aim of this paper is to implement a parallel Blowfish algorithm and evaluate its performance in terms of execution time, speedup, and parallel efficiency, using the Message Passing Interface (MPI) on the IMAN1 supercomputer. IMAN1 is Jordan's first and fastest high-performance computing resource, funded by JAEC and SESAME. It is built from 2260 PlayStation 3 devices and is available for use by academia and industry in Jordan and the region [8].
Parallel and distributed computing systems are high-performance computing systems that spread a single application over many multi-core and multiprocessor computers in order to complete the task rapidly. They divide large problems into smaller sub-problems and assign each sub-problem to a different processor, with the processors of a typically distributed system running concurrently [9][10][11][12][13][14].
The rest of this paper is organized as follows: Section II presents related work, Section III reviews the Blowfish algorithm, Section IV presents the experiments and results, and Section V presents the conclusion.

II. RELATED WORK
Cipher speed is one of the most important functional features of cryptographic algorithms. It is especially significant for block ciphers, since they usually operate on large data sets. Many studies have sought to increase the speed of encryption algorithms through parallel implementation.
In [1], the author describes the parallelization of the encryption algorithm using eight quad-core (32 cores in total) Intel Xeon 7310 series processors at 1.60 GHz. The experimental results showed that applying the parallel encryption algorithm on a multiprocessor considerably improves data encryption and decryption time, and the author considered the speedups obtained to be satisfactory.
In [6], the authors demonstrated how to implement the Blowfish cryptography algorithm on a graphics processing unit (GPU), which can be used for parallel computing to improve the performance of the algorithm. The experiment showed that the GPU improves performance when encrypting and decrypting large files. The authors also observed that, even as the input file size increases, the average encryption and decryption time can be reduced using the GPU.
In [7], the authors present a high-throughput Blowfish architecture that integrates a pipelining technique to break the critical path delay and increase speed. The results show that this architecture provides better performance due to the parallel execution of the algorithm on an FPGA.
In our research, we implemented the Blowfish algorithm on a parallel platform (a supercomputer), which differs from the above studies in its architecture and in the number of processors used.
www.ijacsa.thesai.org

III. BLOWFISH ALGORITHM
In 1993, Bruce Schneier, one of the world's leading cryptologists, designed the Blowfish algorithm and made it available in the public domain. Blowfish is a block cipher with a 64-bit block size and a variable-length key. It has not been cracked yet, and its compactness makes it suitable for hardware applications [3][5][7]. The algorithm has two parts: one that handles key expansion and one that handles data encryption.

A. Key Expansion
Key expansion in the Blowfish algorithm begins with the P-array and S-boxes and involves many sub-keys, which must be precomputed before any data encryption or decryption. The P-array consists of eighteen 4-byte sub-keys: P1, P2, …, P17, P18.
A Blowfish key of up to 448 bits is transformed into several sub-key arrays as follows:
 1. Encrypt the all-zero string with the Blowfish algorithm, using the initial sub-keys.
 2. Replace P1 and P2 with the output of step 1.
 3. Encrypt the output of step 1 using the modified sub-keys.
 4. Replace P3 and P4 with the output of step 3.
 5. Continue this process until the entire P-array and all four S-boxes have been replaced [5].

B. Encryption/Decryption Process
Applying the Blowfish cryptography algorithm to the message "HI world", the encryption process passes through the following stages, as shown in Fig. 1:
 1. Blowfish is a block cipher with a block size of 64 bits.
 2. The message "HI world" consists of 7 characters plus 1 space, which equals 8 bytes (64 bits).
 3. The message is first divided into two 32-bit halves. The left 32 bits, which represent "HI w", are XORed with the first element of the P-array (P1, generated by key expansion) to create a value called P1'.
 4. P1' is passed through a transformation function called F, in which the 32 bits are divided into 4 bytes (a, b, c, d), each used as an index to a value in one of the S-boxes, as shown in Fig. 2.
 5. The values from S-boxes 1 and 2 are added to each other, and the sum is XORed with the value from S-box 3.
 6. The result is added to the value from S-box 4 to produce a 32-bit output.
 7. The output of the transformation function F is XORed with the right 32 bits of the message ("orld") to produce F1'.
 8. F1' replaces the left half of the message and P1' replaces the right half.
 9. The process is repeated 15 more times with successive members of the P-array.
 10. After 16 rounds, the resulting P16' and F16' are XORed with the last two entries of the P-array (P17 and P18) and recombined to produce the 64-bit ciphertext of the message "HI world". A graphical representation of the function F appears in Fig. 2.
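
The key expansion and encryption stages described in this section can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation evaluated in this paper. Real Blowfish initializes the P-array and S-boxes with fixed constants; here they are filled with seeded pseudo-random placeholder values, so the ciphertext does not match standard Blowfish. The step that mixes the key into the P-array follows the standard description of the algorithm. The names `placeholder_tables`, `f`, `crypt_block`, and `expand_key`, and the key `b"example key"`, are hypothetical.

```python
import random
import struct

MASK32 = 0xFFFFFFFF
ROUNDS = 16


def placeholder_tables(seed=0):
    """Return (P, S) filled with pseudo-random 32-bit words.

    ASSUMPTION: real Blowfish uses fixed initial constants; these seeded
    values are stand-ins and give non-standard ciphertext.
    """
    rng = random.Random(seed)
    p = [rng.getrandbits(32) for _ in range(ROUNDS + 2)]
    s = [[rng.getrandbits(32) for _ in range(256)] for _ in range(4)]
    return p, s


def f(s, x):
    """F function: split x into bytes a, b, c, d and combine four S-box values."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    h = (s[0][a] + s[1][b]) & MASK32          # add S-box 1 and S-box 2 values
    return ((h ^ s[2][c]) + s[3][d]) & MASK32  # XOR S-box 3, then add S-box 4


def crypt_block(p, s, left, right, decrypt=False):
    """Run the 16-round network on one 64-bit block given as two 32-bit halves."""
    keys = p[::-1] if decrypt else p  # decryption uses the P-array in reverse
    for i in range(ROUNDS):
        left ^= keys[i]
        right ^= f(s, left)
        left, right = right, left
    left, right = right, left          # undo the swap of the last round
    right ^= keys[ROUNDS]              # P17 when encrypting
    left ^= keys[ROUNDS + 1]           # P18 when encrypting
    return left, right


def expand_key(key, seed=0):
    """Derive the sub-keys from a non-empty key given as bytes."""
    p, s = placeholder_tables(seed)
    # Mix the key bytes cyclically into the P-array, 32 bits at a time.
    for i in range(len(p)):
        word = 0
        for j in range(4):
            word = (word << 8) | key[(4 * i + j) % len(key)]
        p[i] ^= word
    # Starting from the all-zero block, encrypt repeatedly and overwrite the
    # P-array, then the four S-boxes, two words at a time.
    left = right = 0
    for i in range(0, len(p), 2):
        left, right = crypt_block(p, s, left, right)
        p[i], p[i + 1] = left, right
    for box in s:
        for i in range(0, 256, 2):
            left, right = crypt_block(p, s, left, right)
            box[i], box[i + 1] = left, right
    return p, s


p, s = expand_key(b"example key")                 # hypothetical key
left, right = struct.unpack(">II", b"HI world")   # halves "HI w" and "orld"
c_left, c_right = crypt_block(p, s, left, right)
d_left, d_right = crypt_block(p, s, c_left, c_right, decrypt=True)
plaintext = struct.pack(">II", d_left, d_right)
```

Decrypting with the P-array in reverse order recovers the original 8-byte message, which is why the same routine serves for both encryption and decryption.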

IV. EXPERIMENT AND RESULTS
To evaluate the performance of the Blowfish algorithm on a parallel platform, an experimental method was used. The experiments were conducted on the IMAN1 supercomputer and on a sequential platform, which serves as a reference against which the parallel results are compared.
In this section, the results are evaluated in terms of encryption time, speedup, and parallel efficiency. We ran the program many times, calculated the average encryption time, and recorded it in Table II.
The hardware and software specifications, together with the implementation parameters, are listed in Table I.

A. Encryption Time Evaluation
Fig. 3 shows the run time of the Blowfish algorithm using 1 processor (sequential): as the plaintext size increases, the encryption time increases.
As the number of processors increases, the encryption time decreases because the data are distributed among the processors; this is clear when moving through 2, 4, 8, 16, and 32 processors.
With 64 and 128 processors, the encryption time decreases only slightly and sometimes stays the same as, or exceeds, the time with 32 processors. This is due to the increase in communication overhead, which comes to outweigh the computation time.
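
The distribution of data among the processors can be illustrated with a short Python sketch. It assumes that the plaintext is split into contiguous, nearly equal chunks aligned to the 64-bit block size and that each process encrypts its chunk independently; the exact partitioning scheme is not stated in this paper, and `partition` is a hypothetical helper.

```python
BLOCK_BYTES = 8  # Blowfish block size: 64 bits


def partition(total_bytes, num_procs):
    """Return one (offset, length) pair per process, in whole 8-byte blocks.

    ASSUMPTION: contiguous, near-equal chunks; total_bytes is a multiple of 8.
    """
    if total_bytes % BLOCK_BYTES != 0:
        raise ValueError("plaintext length must be a multiple of 8 bytes")
    blocks = total_bytes // BLOCK_BYTES
    base, extra = divmod(blocks, num_procs)
    chunks = []
    offset = 0
    for rank in range(num_procs):
        # The first `extra` ranks take one additional block each.
        length = (base + (1 if rank < extra else 0)) * BLOCK_BYTES
        chunks.append((offset, length))
        offset += length
    return chunks


# An 8 MB plaintext split over 32 processes: each chunk is 262144 bytes.
chunks = partition(8 * 1024 * 1024, 32)
```

In an MPI program, chunks of this kind could be handed out with a scatter operation such as MPI_Scatterv. As the chunk per process shrinks, the fixed cost of distributing and collecting the data takes up a larger share of the total time, which is consistent with the flattening observed at 64 and 128 processors.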

B. Speedup Evaluation
The speedup is the ratio between the sequential time and the parallel time: speedup = (sequential time) / (parallel time). The speedup increases as the number of processors increases, and for 2 to 8 processors the increase is the same for all plaintext sizes.
With 16 processors, the speedup for 8 MB and 40 MB is the same, while the speedup for 160 MB is the best.
With 32, 64, and 128 processors, the speedup is best for the 160 MB plaintext, followed by 40 MB and then 8 MB. The speedup on large data is therefore better when 32, 64, or 128 processors are used.

C. Parallel Efficiency Evaluation
Parallel efficiency is the ratio between the speedup and the number of processors: efficiency = speedup / (number of processors). Fig. 8 shows the parallel efficiency of the Blowfish algorithm for plaintext sizes of 8 MB, 40 MB, and 160 MB on different numbers of processors.
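
The two metrics can be computed as in the following Python sketch. The function names are hypothetical and the timings in the first example are invented purely for illustration; they are not measurements from this paper. The second example takes only the peak efficiencies reported in the abstract (99%, 98%, and 66% for 2, 4, and 8 processors) and derives the speedups they imply.

```python
def speedup(t_sequential, t_parallel):
    """Speedup: sequential time divided by parallel time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, num_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(t_sequential, t_parallel) / num_procs


# Hypothetical timings (NOT from the paper): 10.0 s sequential, 2.5 s on 8 processors.
example_speedup = speedup(10.0, 2.5)            # 4.0
example_efficiency = efficiency(10.0, 2.5, 8)   # 0.5

# Peak efficiencies reported in the abstract, keyed by processor count.
reported_efficiency = {2: 0.99, 4: 0.98, 8: 0.66}

# Since efficiency = speedup / p, the implied speedup is efficiency * p.
implied_speedup = {p: e * p for p, e in reported_efficiency.items()}
```

The implied speedups are about 1.98, 3.92, and 5.28, so the drop in efficiency at 8 processors corresponds to a speedup that still grows but falls well short of the ideal value of 8.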

Fig. 1. The Blowfish algorithm.

Fig. 3. Encryption time on the sequential platform for different plaintext sizes.

Fig. 4. Encryption time for different numbers of processors, for six plaintext sizes from 4 MB up to 160 MB, covering both small and large data sizes.

Figs. 5 and 6. Encryption time for plaintext sizes of 4, 8, and 16 MB and of 40, 80, and 160 MB on different numbers of processors.

Fig. 7. Speedup of the Blowfish algorithm on different numbers of processors for three plaintext sizes: 8 MB, 40 MB, and 160 MB.

Fig. 8. Efficiency of the Blowfish algorithm on different numbers of processors and different plaintext sizes.