High-Speed and Secure Elliptic Curve Cryptosystem for Multimedia Applications

The last few years witnessed a rapid increase in the use of multimedia applications, which led to an explosion in the amount of data sent over communication networks. Therefore, it has become necessary to find an effective security solution that preserves the confidentiality of such enormous amount of data sent through unsecure network channels and, at the same time, meets the performance requirements for applications that process the data. This research introduces a high-speed and secure elliptic curve cryptosystem (ECC) appropriate for multimedia security. The proposed ECC improves the performance of data encryption process by accelerating the scaler multiplication operation, while strengthening the immunity of the cryptosystem against side channel attacks. The speed of the encryption process has been increased via the parallel implementation of ECC computations in both the upper scaler multiplication level and the lower point operations level. To accomplish this, modified version of the Right to Left binary algorithm as well as eight parallel multipliers (PM) were used to allow parallel implementation for point doubling and addition. Moreover, projective coordinates systems were used to remove the time-consuming inversion operation. The current 8-PM Montgomery ECC achieves higher performance level compared to previous ECC implementations, and can reduce the risk of side channel attacks. In addition, current research work provides performance and resources-consumption analysis for Weierstrass and Montgomery elliptic curve representations over prime field. However, the proposed ECC implementation consumes more resources. Presented ECCs were implemented using VHDL, and synthesized using the Xilinx tool with target FPGA. Keywords—Elliptic curves cryptosystem; performance; binary method; projective coordinates; security applications


I. INTRODUCTION
Elliptic Curve Crypto-system (ECC) is a type of public key cryptosystems that depend on the discrete logarithm problem for elliptic curves. It has been introduced by Miller and Koblitz in 1985 [1,2]. Since that date, it has been widely used in many security applications due to its reliability and efficiency. By using much shorter key length, ECC can provide equivalent security level to that obtained by other asymmetric ciphers such with consuming less time and resources, which made it very efficient for multimedia applications that need to provide the security services for huge amount of data in the shortest possible period of time and, of course, with the least amount of resources consumed.
A variety of ECC representations over GF(p) and GF (2 n ) were presented and used for different elliptic curves applications. First, ECC represents the paintext as a point on an elliptic curve. Then, it encrypts the plaintext by performing a number of arithmetic operations over finite fields. ECC computations can be categorized into upper and lower layers. The upper layer's computations are mainly point doubling and point addition operations, which are performed by the scaler multiplication operation. It is worth mentioning here that the scaler multiplication is the key operation in ECC encryption process. On the other hand, lower level of computations includes addition, multiplication, and inversion operations. The latter is the most time-consuming operation [3].
Previous research works focused on improving the performance and security of ECC encryption. These works studied several possible techniques to accelerate the encryption process by speeding up scaler multiplication operation as well as increasing the cryptosystem immunity against side channel attacks such as simple time (STA) and simple power (SPA) attacks. The major performance improvement techniques include the use of projective coordinates to avoid the costly inversion operation and the parallel implementation of ECC arithmetic computations, especially in the lower level [1][2][3][4][5].
The current research utilizes both the use of projective coordinates and the inherited parallelism in ECC computations, and perform these computations in parallel for both upper and lower computational layers. This is achieved by using parallel hardware components, which are multipliers and adders. In addition to performing the lower level computations in parallel, this study implements the upper layer's operations in parallel to achieve higher speed for encryption process. In particular, proposed ECC performs the two main operations (point In order to enable parallel implementation of point doubling and point addition operations, the current ECC uses modified version of the Right to Left binary algorithm (RLA), which is widely used to perform scaler multiplication. The modified RLA has the ability to apply both operations in parallel once required. This is obtained by assigning an independent variable to save the point operations result of certain iteration of the RLA and use it for point addition in the next iteration. To accomplish optimum performance, proposed ECC uses eight parallel multipliers .
The parallel ECC implementation can be realized by using parallel hardware components for hardware implementations, and the known multithreading technique for software implementations [6][7][8]. This research uses hardware implementations because they are faster and more secure than software implementation for ECC.
Several projective coordinates systems are studied and implemented in this research to achieve greater speed for ECC encryption. Moreover, the main ECC representations over GF(p) are implemented to study their characteristics in terms of performance and security. It should be mentioned that proposed ECC implementation strengthen the security against SPA and STA.
The core contribution of this research is represented by developing a novel and fast ECC using modified version of the RLA. The proposed ECCs utilize the maximum parallelization levels for both Montgomery and Weierstrass curves to achieve optimum performance.
Results showed that the use of proposed RLA to implement ECC operations improved the speed of the encryption considerably. Such ECC designs and implementations are highly recommended for securing applications that need a high-speed ECC, such as multimedia applications.
The remaining parts of this article are the background and related works, ECCs computations and architectures, Results and analysis, and conclusion.
II. BACKGROUND AND RELATED WORKS ECC is a public key cryptography that can be used to provide different security services including confidentiality of data transmitted over communication networks. ECC supper passes other public key cryptosystems because it can provide equivalent security level with using much shorter key size, which represents considerable improvement in terms of performance and resources consumption [1,6].
Scaler multiplication is the main operation in ECC encryption, and it consists of two operations; point doubling and point addition. Several algorithms were used to perform scaler multiplication. It can be noticed that scaler multiplication algorithms use iterative approach to perform point operations, which leads in the end to converting the plaintext to the ciphertext [1,[7][8][9]. ECC computations performed during scaler multiplication are called upper level computations. The Binary method, the Non Adjacent Form (NAF), and the Montgomery ladder are the main algorithms used to apply ECC scaler multiplication. These algorithms vary in performance and security. The RLA is a form of the binary algorithm which is intensively used for ECC encryption due to its security advantages and the ability to withstand against side channel attacks [10][11][12].
The RLA, scans the binary bits of the key and always performs point doubling operation regardless the value of the key bit. If the value of the bit is one, the point addition operation is performed as well. Therefore, the RLA assumes that, on average, the point addition operation is executed half times of point doubling during the entire scaler multiplication. However, previous ECC that uses standard RLA implements point operations sequentially, which increases the critical path delay of scaler multiplication [13,14].
Point doubling is the dominating operation in scaler multiplication. In this operation, the elliptic curve point is added to itself, while in point addition, two different points are added. Computations performed within each point doubling and addition operation are called lower level of computations, which represent arithmetic operations over finite fields. Those operations are mainly modular multiplication, addition, and inversion operations [2][3][4][5][6].
The inversion is the most time-consuming operation in elliptic curve cryptography. Previous researches suggested to use projective coordinates instead of affine coordinates to eliminate the inversion operation by converting it to consecutive multiplications. This played important role in reducing the time delay of scsler multiplication operation. Several projective coordinates systems were presented including homogenous, López-Dahab, and Jacobean coordinates systems [5,6,[14][15][16][17][18].
Since its first introduction in 1985, different elliptic curves forms over GF(p) were presented [1][2][3][16][17][18][19][20]. Some curves such Montgomery, and Tripling Oriented curves have less computational complexity in comparison with other curves such as Weierstrass and Binary Edwards. Thus, particular types of elliptic curves can reduce the time delay for the scaler multiplication operations. This made the curves with lower computational complexity more suitable for security applications that require high-speed cryptosystem [21][22][23].
The majority of researches in this field focused on hardware implementations of ECC since they are faster and more secure than software implementations. Researchers studied the effect of using the different projective coordinates systems with the main forms of elliptic curves. The majority of previous research works studied the performance and resources-consumption of Weierstrass curve over GF(p) [7][8][9].
Researchers found that the homogenous projection system (X/Z, Y/Z) accomplished the highest performance levels when implemented with many GF(p) elliptic curve representations; including [8,11,[12][13][14][15][16][17][18][19][20][21]. On the other side Jacobean and López-Dahab coordinates showed better performance when used with other types of elliptic curves. The performance or speed of ECC is usually estimated by the number of sequential In addition to using projective coordinates, researchers used specific hardware components such as Montgomery multiplier to increase the speed of ECC operations [8].
However, the majority of previous researches studied the use of usual serial implementation of ECC computations, in which only one multiplication or addition operation is performed in every level of computations. Although, serial ECC implementation consumes the least possible resources, it cannot satisfy performance requirements for applications that need to provide the confidentiality of many data streams simultaneously as in multimedia applications [20][21][22][23]. A highperformance cryptosystem is in demand to satisfy the requirements of many emerging cloud services technologies such as the frame work presented in [31], which aims to provide secure environment for cloud infrastructures that allows to serve clients in efficient and secure way.
Another study developed hardware implementations for Weierstrass ECC over GF(2 n ). The presented fast ECC is suitable for smart card implementations [25]. In [26], researchers introduced Weierstrass ECC, which uses the López-Dahab projective coordinates to eliminate the costly inversion operation.
In [24], authors developed Weierstrass ECC design that can support both types of finite fields. However, the results obtained from proposed ECC showed that it needs extra memory resources.
Recent research studies proposed parallel implementations of ECC computations to increase the speed of encryption process. Actually, this technique improved ECC performance, but with consuming more resources [10,[13][14][15][16][17][18][19][20][21][22][23]. It should be highlighted that the inherited parallelism in elliptic curve computations make it possible to perform lower level operations in parallel manner, which shortens the time delay of the scaler multiplication [19].
In addition to improving the performance, parallel ECC implementation can strengthen the security of the cryptosystem against the STA and SPA. Furthermore, using parallel ECC designs considerably enhance the area×time-consumption 2 (AT 2 ) factor [20][21][22][23].
Authors is [10] developed parallel hardware designs for Weierstrass ECC over GF(p). Homogenous projective coordinates were used to remove inversion operation. The ECC implementation with four parallel multipliers (4-PM) consumed the least time for the encryption process. However, with consuming more resources compared to usual serial ECC implementation. Similar parallel ECC implementation over GF(2 n ) was presented in [14]. This study found that using Jacobean projection with Weierstrass curve obtained shorter time-delay compared to other projective coordinates systems.
Another ECC implementation that uses Weierstrass curve over GF(2 n ) was proposed in [15]. Although it provided considerable trade-off between performance and power consumption, the presented ECC implementation has low system utilization level, which is necessary for efficient ECC [10].
Many ECC design introduced in previous researches [10,12,14,15] worked on the standard Weierstrass elliptic curve representation, while there are many elliptic curve representations that have not been sufficiently studied. Furthermore, parallel ECC implementations in the aforementioned studies consume additional resources and area. Therefore, they are not appropriate for applications with limited resources.
Other research works provided different ECC design choices by using variable degree of parallelism to mitigate the problem of resources consumption. In other words, researchers studied and implemented all possible ECC design schemes that provided significant trade-off between performance and resources consumption. These research efforts aim to introduce a variety of ECC designs that can satisfy the requirements of several security applications in terms of performance and resources-consumption [17][18][19][20][21][22][23].
A number of parallel ECC design schemas over GF(p) were introduced in [71]. Proposed designs used variable degrees of parallelism for ECC computations. Experimental results showed that the 4-PM design achieved the shortest time delay for ECC point addition, which is estimated by four SMs. Other researchers [19] studied the use of variable degrees of parallel designs to increase the speed of point doubling operation. Experiments of both researches found that Weierstrass ECC accomplishes the highest performance level when implemented using four PMs and homogenous coordinates system. Other presented design schemes in [17,19] provide important tradeoff between area and performance, which could be useful for a variety of elliptic curves applications.
Few research works investigated the parallel implementation of ECC forms. In [22] authors analyzed the performance and resources-consumption levels of Montgomery elliptic curve over GF (p) when implemented using different projective coordinates and parallel hardware designs. A set of design choices for Montgomery ECC were presented. Experimental results illustrated that the 2-PM design accomplished the best trade-off between area and performance, where the 4-PM ECC obtained the highest speed for encryption process.
Similar research works studied the characteristics of Tripling Oriented ECC when implemented using different degrees of parallelism [23].
Authors in [20] and [21] used similar methodology to study the performance and cost features of Binary Edward and Edward elliptic curves respectively. Experiments showed that the greatest speed level of Edward ECC can be reached using the 5-PM design. On the other side, Binary Edward ECC accomplished the shortest time delay when implemented using the 7-PM design. Homogenous projective coordinates system represented best choice for both forms since it involves less arithmetic computations than other projections. www.ijacsa.thesai.org The vast majority of the previous studies used the known Binary method to apply the ECC scaler multiplication. It should be mentioned here that recent few researches investigated the use of other algorithms to process ECC point operations.
Authors in [29] used the NAF algorithm and parallel hardware designs the perform ECC upper level of computations. The results of that research proved that the use of NAF algorithm to perform Montgomery ECC operations can reduce the time delay of the encryption process considerable in comparison with usual Binary method. These outcomes are supported by the fact that NAF algorithm requires performing less number of point addition operations during the scaler multiplication.
Another ECC implementation that uses NAF algorithm was reported in [30]. Researchers studied possible improvement for both Edward and Binary Edward ECCs using parallel hardware designs as well as the NAF algorithm for scaler multiplication. It was found that NAF algorithm can enhance the performance of the encryption process compared to parallel ECC implementations using Binary method. However, the Montgomery ECC presented in [29] remains ahead of the later ECCs developed by [30] in terms of performance.
In [32], researchers presented secure ECC suitable for compact devices by using a modified addition formula for usual RLA. Although proposed method needs less space complexity, the time delay of the encryption process was increased because of using affine coordinates. This makes it unable to satisfy the performance requirements for many web applications. Another promising research work tried to accelerate ECC computations using a modified approach for the Montgomery ladder algorithm [33]. However, the proposed algorithm did not utilize inherited parallelism in the different levels of computations in the encryption process, which is essential to develop high speed crypto processor. Moreover, ECCs presented in [32][33] are vulnerable to side channel attacks.
The research works toward improving the speed and security of ECC needs to give more attention to the development of fast and secure algorithms that benefit from the fact that encryption's computations can be implemented in parallel in both the upper and lower levels of operations.
The current research work proposes a modified version of the RLA that enables parallel implementation of the upper level of computations for ECC represented by point addition and point doubling operations. In addition, proposed ECC implementations utilize the inherited parallelism of the lower level of computations for ECC represented by arithmetic computations over GF(p). This research analyzes the performance and resources consumption level of the major types of elliptic curves, which are Weierstrass, Edward, and Montgomery curves. The use of different projective coordinates systems is also investigated to find the most suitable ECC design for applications that need highperformance cryptosystem such as multimedia applications.
The next section presents the research methodology followed in this study to obtain the research outcomes.

III. RESEARCH METHODOLOGY AND METHODS
The research methodology used to conduct this study is depicted in Fig. 1.
It can be noticed from Fig. 1 that the research methodology of the current study is divided into three phases, as follows:  Preparation of elliptic curve computational schemes,  Improving the speed of ECC's lower level of computations, and  Improving the speed of ECC's scaler multiplication (upper level of computations).
In the first stage, the main research efforts focused on studying the key ECC representations over GF(p). According to security, and computational complexity features, three forms of elliptic curves were chosen for implementation in the current research. These forms are Weierstrass, Edward, and Montgomery curves.
In addition to selecting the elliptic curves forms, the use of different projective coordinates systems was investigated. In particular, the computations of point doubling and point addition operations for each ECC form were performed using different projection. The use of projective coordinate avoids the need for the time-consuming inversion operation. Inversion is converted to a number of multiplications. This contributes in reducing the time delay of the lower level of computations. The projective coordinates implemented in this study are homogenous, López-Dahab, and Jacobean systems. (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 121 | P a g e www.ijacsa.thesai.org The last step in the first stage involves analyzing the performance or time consumption level for the ECC computations. Such analysis aims to find the most efficient projective ECC that has the least computational complexity. This is essential for the process of developing a high-speed ECC for different security applications.
At the second stage of this work, researchers performed ECC's lower level of computations using different degrees of parallelism in order to reduce the time delay for finite fields arithmetic included in each point doubling and addition. The ECC design or computational scheme that gives the shortest possible time delay for each ECC form is then implemented using parallel hardware designs. Performance and resourcesconsumption features for ECCs are thoroughly analyzed and studied.
It should be mentioned that the performance of proposed ECCs is estimated using the number of sequential multiplication levels consumed during point doubling or point addition operation. On the other side, resources consumption is estimated by the number of parallel hardware components needed for each design.
In the final stage of this research, a modified version of RLA is used to apply scaler multiplication. This modification allows to perform point addition in parallel to point doubling via saving the results of the previous RLA's iteration in a separate variable, and then it is added to the current elliptic curve point, while performing the point doubling computations.
In other words, researchers parallelized the ECC's upper level of computations to accomplish the maximum speed for encryption process. Such ECC implementation increases the performance but needs more resources to apply ECC computations.
Proposed ECCs will be first studied theoretically by observing the parallel computational schemes and estimating the time and resources consumption levels. Then, presented ECCs are implemented using the Xilinx tool to find out, more accurately, the time and resources consumed for encryption process.
In addition to performance, other factors that are important to determine the efficiency and appropriateness of ECC are investigated. Those factors include the area × time (AT), area × time 2 (AT 2 ), and hardware/system utilization.
The next section of this research illustrates the computations and hardware designs for proposed ECCs.

IV. COMPUTATIONAL SCHEMES AND DESIGNS FOR HIGH-SPEED ECCS
This section presents the equations of the main ECC forms studied in this research as well as the points computations involved in ECC encryption process.
In addition, this section shows computational schemes for each curve and the parallel hardware designs needed to perform ECC computations.
The modified version of RLA used in scaler multiplication is illustrated as well.

A. ECC Points Computations
The two main operations in ECC scaler multiplication are point doubling and point additions. The key difference in the computations of the two operations is in finding the slop (m), which is the derivation of elliptic curve equation.
For point addition the calculation of m is similar for all ECC forms over GF(p), while in point doubling the slope differs from one form to another. Equations required to compute the slope for point doubling and point addition operations are presented in (1) and (2), respectively. For more information about calculating the slope for ECC point operations, the reader may refer to [27][28][29].
In point doubling, the slope (m) = It can be noticed that elliptic curve points are represented by x and y coordinates. In point doubling (3), the elliptic curve point is added to itself where in point addition (4), to different points are added. The result is a third point P3(x3, y3) on an elliptic curve, which can be computed as follows: This research studies two major forms of elliptic curves, which are Weierstrass, and Montgomery curves represented in equations (5), and (6), respectively. These curves are very important because the Weierstrass curve is the most used form for ECC cryptographic operations, while the Montgomery curve is distinguished from its counterparts in terms of the degree of complexity for ECC computation, which may lead to the development of high-speed cryptosystem.
where a and b belong to GF (p) and 4a 2 + 27 and b ≠ 0.
Using projective coordinates, the elliptic curve points are represented by three coordinates, which are x, y, and z.
In Montgomery ECC, the result of point doubling can be calculated as follows: The slope M= 3X 2 +2aXZ+Z 2 2bYZ X 3 = (2bYZ)[(3X 2 + 2aXZ + Z 2 ) 2 − 8b 2 Y 2 ZX] www.ijacsa.thesai.org The result of point doubling operation for Weierstrass is another point represented by the three coordinates x3, y3, and z3, as follows: It can be noticed that the slope (M) depends on the elliptic curve equation, while in point addition the calculation of the slope does not relate to the elliptic curve equation. Therefore, point addition computation is similar for all curves using the homogenous projection, and can be computed as follows: ] − Y 1 Z 2 * (X 1 Z 2 − X 2 Z 1 ) 3 } 3 = 1 2 * ( 1 2 − 2 1 ) 3 In order to find the result of point doubling and point addition for each curve, a number of arithmetic operations, such as modular multiplication and addition over GF(p), should be performed. The required computations for point doubling and addition are called computational scheme for certain ECC and they play important role in determining the performance of the cryptosystem.
The next section presents the computational schemes for proposed Weierstrass and Montgomery ECCs as well as the hardware designs needed to implement them.

B. ECC Computational Schemes and Hardware Designs
The computational scheme for ECC is the series of modular multiplication, addition, and subtraction operations required to perform point doubling and addition operations and hence calculating the values of x3, y3, and z3 presented in the previous section.
Usually, ECC computations are implemented sequentially using hardware components, which are multipliers (M) and adders (A). In this research, all possible parallelization levels for performing ECC computations were investigated to find out the parallelization level that obtains the shortest time delay. As can be noticed from Table I, the performance of ECC implementation is estimated by the number of sequential multiplication (SM) and sequential Addition (SA) levels required to perform point doubling operation. It is worth mentioning that the time consumed by SAs is neglected compared to SMs. So, the later will be the main factor used to estimate the time-consumption of ECC.
It is obvious that the 4-PM implementation for Montgomery ECC achieves the shortest time delay estimated by three SMs and three SAs. Similar performance was reported by Weierstrass ECC implementation but with using higher degree of parallelism, and hence more resources. In particular, the 5-PM Weierstrass ECC implementation can get comparable performance level to that obtained by Montgomery curve using 4 PMs.
Theoretical results showed that the use of more than 4 PMs for Montgomery ECC has no impact on the performance level. The same applies for the 5-PM ECC implementation for Weierstrass curve. In other words, the performance level is saturated at this degree of parallelism and cannot be improved further by adding more parallel Ms and As.
Experiments showed that the use of parallel hardware components is controlled by the inherited parallelism in ECC computations for both curves. For example, no additional parallel computations for Montgomery ECC point doubling can be performed in the first SM level, which involves five parallel multiplication operations. In order to start the second level's computations, the crypto processor needs to obtain the results of the first level, and so on.
Although other ECC implementations presented in Table I requires less resources because of the use of less number of hardware components, it seems that they consume greater time to perform point doubling. This research focuses on ECC implementations that score the highest possible performance level.
Similarly, Table II presents the theoretical results for studying the performance and resources consumption for ECC point addition. It is worth remembering here that the computations of point addition are similar for all ECC forms.  Other presented ECC implementations represent a considerable trade-off between performance and timeconsumption. This could benefit the purpose of developing efficient ECC for different security applications in accordance with required speed and the limitations on available resources. This research concerns the high-speed ECC implementations appropriate for multimedia security applications. Note that only the hardware designs that provide the shortest time delay are introduced, which the 4-PM and 5-PM designs for Montgomery and Weierstrass ECCs, respectively.  It should be highlighted here that the time consumed by one SM level is equivalent to the time consumption of one multiplication operation regardless how many multiplications are performed in that level. For example, the first level SM level in Weierstrass ECC involves five multiplications. In despite of that, it consumes similar time to that needed for one multiplication operation. This represents a great benefit in terms of performance that can be achieved using parallel hardware implementation for ECC.  (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 9, 2020 124 | P a g e www.ijacsa.thesai.org The results of point addition can be obtained after performing four SM and four SA levels.
As can be seen from Fig. 2, 3, and 4, the parallel implementation of the lower level of computations for ECC can improve the performance greatly. In addition, the next section presents the modified version of RLA, which allows the parallel implementation of upper level of computations for ECC. This contributes in accelerating the scaler multiplication even further.

C. Modified Implementation of RLA
This section introduces an implementation of a modified version of known RLA for scalar multiplication, which is the main operation in ECC encryption process. The current proposed implementation allows to perform the upper level computations, represented by point doubling and point addition operations in parallel once required during scalar multiplication. The modified version of RLA is depicted in Fig. 5.
As can be noticed from Fig. 5, the inputs to the RLA are as follows:  P which is a point on an elliptic curve E. The point P represents the original plaintext.
 K is the scalar represented by a series of binary bits (from 0 to n-1). N is the key length.
Outputs of RLA is another point (Q), which represents the encrypted message. Q is the result of scalar multiplication operation ([k]P). At the beginning of the algorithm, three elliptic curve points are defined, as follows:  R0 = O, which is used to save the result of point addition during the scalar multiplication.
 R1 = P, which is used to save the result of point doubling during the scalar multiplication.
 J = P, which is additional elliptic curve point used for temporary saving the value of R1 of previous iteration in scalar multiplication.
The RLA depends on the iterative approach and starts by scanning the binary bits of k from right to left.
For each bit in k, if the value of ki equals one then both point addition and doubling operations are performed. Otherwise, only the point doubling operation is performed.
Note that the variable point J is used in each iteration to save the value of R1 obtained from the previous iteration of scalar multiplication. Then, J is used to perform the point addition operation in the current iteration. Thus, the point addition does not require the current value of R1 which is being calculated by point doubling operation. In other words, both operations become independent and the can be processed in parallel manner.
The use of proposed modified version of RLA allows to perform the upper level computations of ECC in parallel to reduce the time delay of scalar multiplication and hence improve the performance of the cryptosystem.
In order to implement point doubling and point addition operations in parallel, extra hardware resources are needed. In particular, eight PMs are required for Montgomery curve. The 8-PM ECC design allows to perform the four parallel multiplications included in each point doubling and point addition as illustrated in Fig. 2 and 4. If the value of ki is zero, only the point doubling is performed which makes four out of eight multipliers idle as this iteration. On the other hand, the Weierstrass ECC needs nine PMs to implement point operations in parallel. Five PMs for point doubling and four PMs for point addition as clarified in Fig. 3 and 4. The 8-PM and 9-PM designs accomplish the highest performance level for Montgomery and Weierstrass ECCs, respectively.
It should be mentioned here that the time complexity of proposed RLA is estimated by O(n), where n is the size of inputs key or the number of binary bits of the key. This means it takes linear time complexity, which can satisfy required performance level for many applications.
Another advantage of using the proposed ECC is that it can increase the immunity of the cryptosystem against side channel attacks such the known STA. Each iteration of the modified RLA consumes approximately similar time regardless the value of the binary bit of the scalar k. This is because both operations are conducted in parallel if the value is one. So, the attacker cannot reveal the value of the binary bit ki in each iteration by analyzing the time consumed. In this way the secret key, represented by the binary bits' vector of the scalar k, is kept confidential and cannot be exposed to the attacker by using the SPA. www.ijacsa.thesai.org On the other side, usual ECC implementations that use serial design implements point doubling and addition operations sequentially. This enables the attacker to trace the time consumed by certain iteration of the scalar multiplication operation and thus determining whether one or two operations have been performed by that iteration. Thus, it is possible to infer the values of the key bits by observing and analyzing the time consumption of RLA's iterations.
As mentioned previously, the current research work aims to develop secure and high-performance ECC that can encrypt/decrypt huge amount of data within the least possible time.
The next subsection describes the hardware implementation environment for proposed ECCs.

D. Implementation Environment
This research work seeks to develop high-speed ECC using hardware implementation. Thus, experiments conducted in this study investigates possible hardware design choices and analyzes the time delay and resources consumption for proposed hardware implementations. For this purpose, this study uses the popular hardware description language VHDL to implement proposed designs. The code for each proposed ECC is written in VHDL and then simulated using the ModelSim tool for validation purposes.
Authors used the standard carry save multiplier and carry save adder to perform field arithmetic operations for elliptic curve.
In order to simulate and synthesize suggested ECC implementations and obtain performance and resourcesconsumption results, researchers also use the Xilinx tool with the target FPGA (Field Programmable Gate Array) chip family chosen to be virtex5 (XC5VLX30).
The next section presents and analyzes the implementation results for proposed Montgomery and Weierstrass ECCs.

V. RESULTS AND ANALYSIS
This section discusses the performance and resources consumption results for proposed ECCs in the current research.
Researchers observed the architectures of the proposed ECC implementations for Weierstrass and Montgomery forms. It can be noticed from previously presented hardware designs in Fig. 1 and 3 that such design requires eight parallel multipliers to implement the Montgomery ECC point doubling and point addition operations in parallel manner. In particular, four parallel multipliers are needed for each operation.
The Weierstrass ECC designs presented in Fig. 2 and 3 needs nine parallel multipliers; four for point addition and five for point doubling.
The required hardware resources, the estimated time consumption for point doubling and point addition as well as the overall time delay for the scalar multiplication are shown in Table III. It also shows the AT and AT 2 cost factors.
It should be mentioned here that time consumption for proposed ECCs is estimated in terms of the number of SM levels consumed by each point operation. The overall time consumption is calculated using the following equation: Overall time= (No. SMs for point doubling) + (0.5 × (TIME of POINT ADDITION) (7) The RLA assumes that point addition happens in half the number of the binary bits for the scalar (k). So, the time of point addition is multiplied by 0.5 in the above equation.
Remember that point doubling and addition are performed in parallel and the point addition consumes one SM level more than point doubling. Therefore, if the binary bit of k equals one, the time consumed is equivalent to the time of point doubling plus one SM level. The time delay of one SM is about one third the time allocated for point doubling operation as can be seen from Fig. 2 and 3. It can be estimated by (0.33 × TIME OF POINT DOUBLING). Therefore, equation (7) can be written as follows: Overall time= (No. SMs for point doubling) + (0.5 × (0.33 × (No. SMs for point doubling))) (8) Table III shows performance and area consumption features for the proposed Montgomery and Weierstrass ECCs. This research uses the modified version of RLA to apply point doubling and point addition operations in parallel as depicted in Fig. 4. The 8-PM ECC for Montgomery curve scores the shortest time delay, estimated by 3.5 multiplication cycles. It also achieves the best AT 2 results compared to other ECC implementations presented in Table III. It is worth mentioning that the AT 2 factor focuses more on the time delay and it should be considered very important for developing high-speed ECC. Results showed that proposed 9-PM Weierstrass ECC achieves similar performance level but with consuming more resources.
It can be noted from Table III that proposed Montgomery and Weierstrass ECCs outperform their counterparts that use normal RLA in terms of performance. However, they consume more resources to enable the parallel implementation of the upper level of computations for the scalar multiplication operation. This research provides hardware implementation for proposed ECCs using the modified RLA. Implementation results show acual performance and resources consumption levels for Montgomery and Weierstrass ECCs when implemented using 8-PM and 9-PM designs. Obtained results are then compared with the corresponding ECC implementations that use usual RLA.
The performance is the most important factor to be considered when developing a high-speed ECC. It is assessed using the time required to perform the scalar multiplication operation (TKP), which can be computed through the following four steps of calculations. In the first step, we need to calculate the time consumption of one multiplication (TM) operation by using the following equation: TM= (cycles/bit) × m ×clockperiod (9) where m is the key size. The current research experiments implement proposed ECCs using the key sizes of 256 bits, 512 bits, and 1024 bits.
For example, by using equation (9), the TM can be computed as follows: In the third step, we compute the time of one inversion operation (TINV), which is required to convert the projective coordinates back to the affine coordinates. The time of one inversion is equivalent to the time of three SMs [20][21][22][23][24]. Hence, the TINV can be computed as follows: TINV= 3 * TSM= 6972.672 n sec Finally, in the last step the total time consumption of the scalar multiplication (TKP) is computed via the following equation: TKP= ((256× 0.5 × TADD) + (256 × TDBLE) + TINV) × 10 -6 (11) Remember that using proposed parallel implementation for point doubling and point addition the TADD can be estimated one SM (TSM), which is equivalent to (0.33 × TDBLE), approximately. Therefore, equation 11 can be rewritten as follows: TKP= ((256× 0.5 × (TSM)) + (256 × TDBLE) + TINV) × 10 -6 (12) The result of the equation (12) is multiplied by 10 -6 to convert the final value from nano to mille seconds. The performance of proposed Montgomery ECC equals 2.089 m sec. It overcomes the corresponding ECC implemented using normal RLA as well as the fastest known implementation using NAF algorithm introduced in [30]. ECCs are implemented using key sizes 256 bits 512 bits, 1024 bits, and 2048 bits as can be noticed from the table. Table IV shows a comparison between proposed Montgomery ECC and other ECC implementations using NAF algorithm and usual RLA presented in [30] and [22] respectively. It is obvious that proposed 8-PM ECC accomplishes the best performance results compared to other ECC implementations using the different key sizes. For security reasons, it is recommended to use key sizes of 2048 bits and above for encryption and decryption processes. Current implementation shows the time consumption of ECC encryption using different key sizes. It was noticed from the experiments that the time consumptions of encryption and decryption operations are comparable when using the same key size. The Montgomery ECC using NAF algorithm obtains higher performance in comparison with the corresponding ECC implementation that uses normal RLA.
It should be mentioned that Montgomery elliptic curve has less computational complexity compared to the majority of other elliptic curves forms. Therefore, it is highly recommended to use it when developing high-speed cryptosystem. www.ijacsa.thesai.org The 8-PM ECC introduced in this research improves the cryptosystem immunity against STA. Using such cryptosystem's architecture makes it difficult to identify the binary bits of the decryption key via tracing and analyzing the time consumed by each RLA's iteration. The parallel implementation of upper level's operations prevents the attacker from distinguishing the points operation performed in every iteration, and thus the key bit cannot be revealed.
For example, assume that the time-consumption of point doubling is equivalent to TD while the time of point addition is estimated by TA. In usual RLA, if the current key bit equals one the time consumed by the cryptosystem for the processing of the finite field arithmetic computations can be estimated by TD+TA, while it can be estimated by TD if the bit value is zero. The proposed ECC algorithm consumes approximately similar time regardless the value of the key bit. This prevents the attacker from tracing the time-consumption of ECC to find out the sequence of bits in the key.
However, it seems that proposed ECC consumes more area and resources than other ECC implementations presented previously. In despite of that, the additional cost can be afforded for the sake of developing high-speed and secure ECC that fits certain security applications.
Although the proposed 9-PM Weierstrass ECC achieves comparable performance results to that achieved by the 8-PM Montgomery ECC, it needs more resources. Fig. 6 shows a comparison between the performance levels of ECC implementations presented in Table IV. It can be clearly noted that proposed 8-PM Montgomery ECC using modified RLA outperforms its counterparts for all key lengths ranging from 256 bits to 2048 bits. Therefore, it represents considerable choice for the development of high-performance cryptosystem. Table V presents time-consumption comparison between proposed Montgomery ECC and the main ECC implementations found in previous research works. As can be noticed, the proposed high-speed ECC achieves better performance results for the encryption/decryption process than results reported in previous literature.
This highlight the importance of the parallel implementation of point doubling and point addition once needed in the modified RLA iterations.
The next section of this article presents the key conclusions and future research works.