Data Security: A New Symmetric Cryptosystem based on Graph Theory

Sharing private data in an unsecured channel is extremely critical, as unauthorized entities can intercept it and could break its privacy. The design of a cryptosystem that fulfills the security requirements in terms of confidentiality, integrity and authenticity of transmitted data has therefore become an unavoidable imperative. Indeed, a lot of work has been carried out in this regard. Although many cryptosystems have been proposed in the published literature, it has been found that their robustness and performance vary relatively from one to another. Adopting this reflection, we address in this paper the concept of block cipher, which is a major cryptographic solution to guarantee confidentiality, by involving the properties of graph theory to represent the plaintext message. Our proposal is in fact a new symmetric encryption block cipher that proceeds by representing plaintext messages using disjoint Hamiltonian circuits and then dealing with them as an adjacency matrix in a pre-encryption phase. The proposed system relies on a particular sub-key generator that has been carefully designed to produce the encryption keys according to the specifications of the system. The obtained experimental results demonstrate that our proposed cryptosystem is robust against statistical attacks, particularly the DIEHARD test, and presents both good confusion and good diffusion. Keywords—Cryptosystem; graph theory; hamiltonian circuits; adjacency matrix; block cipher; encryption


I. INTRODUCTION
Cryptography is a component of cryptology that is based on a number of methods and principles for converting a readable message to a totally unreadable one. This field is dealing with many security problems such as the confidentiality of communications via non-secure channels, the privacy of individuals, the data storing on unsecured mediums, and so on. Cryptography refers to the study and analysis of data encryption systems intended to reduce the impact of hackers and to prevent, as best as possible, any unauthorized attempts to gain access to these confidential data. The main principles of information security, notably confidentiality, integrity, authentication, and non-repudiation [1].
Confidentiality is a crucial part of security. It can be ensured by an encryption process, whereby the data becomes non-intelligible to any non-authorized parties trying to gain access to it. The idea behind of encryption process is to transform a plaintext into a ciphertext, so only authorized parties can obtain the message in its original format by reversing the encryption process, known as decryption. Technically, decryption should be extremely difficult for any unauthorized and unqualified parties attempting to perform it.
Over the years, cryptography has continued to be improved and has progressively became an indispensable part for private data sharing. All contributions dedicated to this field of research have aroused great interest. In the literature, cryptography can be classified into three categories: Symmetric Key Cryptography, which is an encryption system where both the transmitter and the recipient of the message use one common key such as DES [2], AES [3], or IDEA [4], to encrypt and decrypt the messages. The second category is Asymmetric Key Cryptography. In this system, a couple of keys(private and public keys) are used in order to encrypt and decrypt the messages such as RSA [5], ElGamal [6], Diffie-Hellman [7], etc. The last category is Hybrid key Crypthography, Which consist of using an encryption mode that utilizes both symmetric and asymmetric public key encryption. This method benefits from public key cryptography for key sharing and from the speed of symmetric encryption for message encryption. Nowadays, cryptology is able to handle a substantial set of mathematical tools, that allowed for improvements in terms of efficiency and performance. In particular, graph theory is a field that is considered very promising in this regard, since it provides concepts that could be useful in solving problems in every network related areas.
Graph theory in mathematics refers to the study of graphs, which are a major object of discrete mathematics. Generally, a graph is represented as a set of vertices linked by edges. They are thus mathematical structures used for modelling pair-wise relationships between objects. It can be found in road networks, electrical circuits, constellations, etc. Graphs provide a way of thinking that can be used for modeling a vast range of problems. They are the foundation of numerous computer programs that allow communication and advanced technological processes. The seven bridges of Konigsberg (1736) [8] is a mathematical problem well known for having established the foundations of the theory of graphs. Graph theory is a relatively new concept that has been successfully incorporated and has enabled the development of stronger encryption algorithms that have proven to be difficult to break, even for the latest software solutions. In fact, it consists of modeling encryption problems by graph representation, so that they eventually become problems in graph theory where the solutions are usually well-known. Although solutions to graph problems can be fairly easy and efficient (with respect to the time required for computational processing which is reasonable), they can also be rather difficult (relative to the processing time increases exponentially). This resulted in the application of concepts introduced in graph theory to largescale cryptography, since many NP-hard problems are derived from this theory.
Considering the above mentioned points, the design of cryptosystems based on the concepts of graph theory is of utmost importance. In this work, we present a new cryptosystem that takes advantage of the principles of graph theory, which enable a high degree of security while maintaining the performance of data processing. The main idea of our approach is to represent the plaintext with all disjoint Hamiltonian circuits as a pre-encryption phase, then using our own subkeys generator following the cipher block chaining mode of operation to encrypt the plaintext.
The rest of the paper is structured as follows. Section 2 presents preliminary knowledge. Section 3 presents a literature review of related work. Section 4 details the proposed scheme. Security analysis and experimental results are elaborated in Section 5, and lastly, the conclusion and future works is given in Section 6.

II. PRELIMINARY KNOWLEDGE
• Graph: A graph G is a set of points called vertices V and a set of lines called edges E that connect some vertices together. The graph is defined as a set of vertices and edges that form a pair G = (V, E).
• Simple graph: A graph in which each pair of vertices is linked by at the very most one edge and where no vertex has a loop.
• Undirected Graph: An undirected graph G is a pair (V, A) where V is a finite set of vertices and A is a set of unordered pairs of vertices. Also, loops are not allowed in undirected graphs.
• Cycle: A chain whose start and end nodes are the same and which does not use the same link more than once.
• Hamiltonian Path: A path that passes once and only once through each of the vertices of an undirected graph.
• Hamiltonian Circuit: simple cycle passing through all the vertices of a graph one and only once.
• Adjacency Matrix: Let G be an undirected graph with m vertices from 1 to m. We call the adjacency matrix of graph the matrix A = (a jk ) where a jk is the total number of edges joining vertex j to vertex k: a jk = w if and only if j and k are adjacent. a jk = 0 if not.
(1) with w is the weight of the edge (j, k).
• Blum Blum Shub (BBS): is a pseudo-random number generator first proposed in [9] x n+1 = x 2 n mod M With M = pq the product of two large primes p and q.
The complexity of the factorization of M is the main basis for the security of this generator, which means that the two primes must be carefully chosen to guarantee robustness.

III. RELATED WORK
The application of graph theory in cryptography has become more emergent. However various encryption techniques have been proposed in this context. A technique has been proposed by Amudha et al [10] that encodes clear messages through the Euler graph, the key used to protect the data in this approach is a kind of Hamilton circle. The authors in [11] sequentially construct three different graphs on the basis of an unconventional mapping, conjectured to be a one-way trapdoor function and designed specifically for graph structures. Some work focuses on the application of graph theory principles in computer networks and its potential to tackle the challenges of provisioning in secure cloud computing environments [12]. Two graph based public key cryptosystems have been suggested to secure sensitive Data in the work of Sensarma et al [13], where one is based purely on the properties of matrices, while the other is based on graph codes. In the work described in [14], the authors proposed a hybrid Cipher Block Chaining encryption system for e-mail protection. The suggestion was predicated on the integration of encryption technologies. Yousif et al [15], introduced a process to produce a new key on the basis of chaotic maps that are utilized to encode images. Within the work in [16], the emphasis is on the possibility of employing the Euler graph as a method object used in the remote method invocation (RMI) technique.
Among the most recent works, we mention the work presented in [17], where a block cipher system has been proposed using disjoint Hamiltonian circuits to present the data as a graph. Also in [18], a double vertex graph has been suggested to encrypt a word. At first, the given message was encrypted using the encryption table. The plaintext was then converted into a path graph. From the latter, a double vertex graph is constructed. We also mention the work [19], in which the original message is converted into several graphs. The ciphertext is obtained from the projection of the adjacency matrices representing the graphs into the secret key. A number of other proposals were suggested in the same thematic area [20], [21].
The originality of our work lies in the fact that our proposed system was able to blend both the concept of block ciphers, which is a major category of symmetric cryptography, and graph theory properties for representing plaintext, in particular Hamiltonian circuits, unlike the majority of works from the literature that rely only on graph theory properties to conceive their encryption systems.

IV. PROPOSED APPROACH
The primary aim that drives the system put forward in this paper, is to propose a robust variant of the encryption scheme proposed in [17] while maintaining the performance levels. The main concept on which our approach is based is inspired by the divide-and-conquer design method, which consists in dividing an initial problem into sub-problems and then addressing every component of the resulting subset independently. The final solution of the initial problem is then deduced from the solutions found to the sub-problems.
The system described in this work has been designed in such a way that it takes into account the complexity of www.ijacsa.thesai.org the processing that the plaintext messages are subjected to during their encryption process. Indeed, this is the objective of the contribution in this paper, which is to improve the processing of the plaintext by making it more difficult and more complex than [17], using mainly all the Hamiltonian circuits that represent the plaintext.
The scheme proposed in [17] used a block of 25-characters length, which can be represented by 2 disjoint Hamiltonian circuits in a graph of order 13, given that a graph of order 13 contains 6 disjoint Hamiltonian circuits (Theorem 1). In contrast to [17], which used only 2 of the 6 circuits, the concept put forward in this approach makes use of all the disjoint Hamiltonian circuits of the graph (6 circuits), which allows the representation of blocks with 78-characters length in a single graph.
Theorem 1: In a complete graph with n vertices there are (n − 1)/2 edge-disjoint Hamiltonian circuits, if n is an odd number strictly greater than 3 [22].
Considering a message which consist of 78 characters, the formula for splitting into blocks in [17] would be as follows: 78 = 25 × 3 + 3. This means that four blocks will be transformed into four adjacency matrices. The formula used in the proposed algorithm is limited to a single block, which in turn will be partitioned into six sub-blocks to form a single graph with six disjoint Hamiltonian circuits, thus forming a single adjacency matrix (FIG. 1 illustrates the difference between both systems).
Generally, the pre-encryption process of the plaintext message involves several steps: First the plaintext is converted into ASCII values and then divides into several blocks of size 78 (referring by Block i ). This operation uses the following formula: n = 78k +r, where n is the size of the plaintext, r ( r ∈ [0,77)] the remainder of the division of n over 78) represents the remainder of the plaintext after block partitioning, and k is the number of blocks (refers to the quotient).
If the division is exact.
Where k ′ represents the total number of blocks resulting from the division. Each Block i is partitioned into 6 sub-blocks of size 13(each sub-block is represented by subBlock ij ), which are then converted into Hamiltonian circuits where the weights of the edges of the graph G i are represented by the ASCII values of the characters that compose them. Finally, the resulting graph is converted into an adjacency matrix M i .
The main process involving in our proposed system are presented in the following:

A. Key Generation / Re-Generation Algorithms
The generation of the sub-key K i occurs in four steps. The first involves the random selection of a character Char from the Block i . The second step consists in using the position corresponding to the ASCII value of Char in two ways, to construct the vector of positions V P that is necessary for the decryption, as well as to recover the value N located in the same position in the master key KEK (of size 256), which will be used as the seed of the BBS generator. The third step in the process allows the generation of a vector S i of size 13 from N using BBS generator. The fourth and final step uses the resulting S i to generate the sub-keys K i as a square matrix of order 13. The all sub-keys K i (i = 0,. . ., k'-1) that are generated constitute the set SK k ′ . This process is illustrated in FIG. 2. The regeneration of K i during the decryption process begins with the use of V P to recreate a key Key of size 13 2 k ′ from KEK. Key is then divided into sub-vectors S i of size 13 which are subsequently used to generate the sub-keys K i as square matrices of order 13. This process is described in FIG. 3.

B. Encryption Process
The encryption process begins with the pre-encryption step described above. Cipher block chaining (CBC) is used as mode of operation in our approach. The chaining uses a feedback method, in the sense that the result of the encryption of the previous block C i−1 is reused for the purpose of encrypting the current block M i . More specifically, an exclusive or (XOR) operation is applied between the current block M i and the previous block of ciphertext C i−1 as shown below: A second XOR operation is then performed between the result of operation (3) and the sub-key K i generated by the pseudorandom generator to compute the cipher C i of the current block: Since the first block does not have an antecedent. We generate an random matrix referring to IM (initialization matrix) which allows to perform the XOR operation with M 0 . Each encrypted block consequently depends not only on the corresponding plaintext block, but also on all the encrypted blocks that precede it. The rows of the matrix C i are concatenated to form a vector eBlock i of size 13 2 , representing each encrypted block.
The resulting vectors eBlock i (i = 0,. . ., k'-1) generated from all blocks are then concatenated to form a single vector EM of size 13 2 k ′ . The encryption process, as shown in FIG. 4, ends with the transmission of the encrypted message EM in addition to the vector V P that is related to the decryption process.

C. Decryption Process
In general, the process of decryption corresponds to the process of encryption performed in reverse order (ALGO-RITHM 4). In the decryption process described in this paper, the ciphertext EM refers to the input of the algorithm. EM is decomposed into k ′ vectors (eBlock i ) and then gathered to constitute the set eBlockSet k . The eBlock i (i = 0,. . ., k'-1) are subsequently transformed into a matrix C i . The number of blocks k ′ is calculated as follows: The sub-key generation algorithm presented in ALGO-RITHM 2 makes use of the provided vector V P to produce a key of size 13 2 k ′ from the master key KEK. Each block C i (i = 0,. . ., k'-1) is decrypted using its own sub-key K i using the following formula: Where www.ijacsa.thesai.org and At this stage, the decrypted blocks M i are transformed into a graph G i and then into Block i . Finally, the plaintext message is formed by concatenating the Block i (i = 0,. . ., k'-1) as shown in FIG. 5.

V. SECURITY ANALYSIS AND EXPERIMENTAL RESULTS
The evaluation of the encryption system addressed the reliability of the suggested algorithm. For this purpose, we study the system reaction in terms of performance and security according to fundamental criteria. For which we also perform different statistical tests. This evaluation is described in the following sections.

A. Confusion and Diffusion Tests
Diffusion and confusion are very important as aspects of the functioning of a secure encryption which were first identified in 1949 by Claude Elwood Shannon [23]. In his original definitions: Confusion means making the relationship between key and ciphertext as complicated and as involved as is feasible, whereas in this case refers to the property that redundancy in the plaintext's statistics is "dissipated" in the ciphertext's statistics. Diffusion is related to the reliance of the output bits upon the input bits. In a cipher with proper diffusion, the changing of an input bit is expected to change every output bit with a probability of half (this is referred to as the strict avalanche criterion). Accordingly, the used equation (10) is: Where w is the hamming weight, C and C ′ are respectively the original and modified inputs, and the value 16 refers to the number of bits representing each element in the cipher.

B. Plaintext Sensitivity Test
The diffusion property is intended to produce an avalanche effect [24] between the plaintext and the encrypted messages. The sensitivity test of the bit change in the plaintext is used to verify the diffusion property of a particular algorithm. Given pairs of plaintext and secret keys, we generate the ciphertext corresponding to each pair (plaintext, secret key) through our cryptosystem, changing one or more bits (Knowing that a change at character level implies a change of bit) in the randomly generated plaintext, and by retaining the key unchanged. Subsequently, we calculate the average of the percentage of bit difference by the equation (10) as illustrated in the FIG. 6. Over 50% of the bits in the cipher text are changed. We can clearly see that the average of the percentages of bit difference is between 48.16% and 51% for our encryption system and between 47.1% and 50.80% for AES-128. These percentages demonstrate that our encryption system offers a good diffusion compared to AES-128.

C. Key Sensitivity Test
The confusion property establishes a relation between the key and the ciphertext. The key sensitivity test ensures this property. Indeed, we consider a set of pairs of plaintext and secret keys. Each pair is encrypted by applying the proposed cryptosystem. Then, we modify one or more bits in the different randomly generated keys while the clear text still fixed. Afterward we calculate the average of the percentage of bit difference by applying the equation (10).
FIG. 7 represents the results obtained by using our encryption system and our generator to produce the encryption keys. We can notice that more than 50% of the bits are modified. Specifically, the average percentage of bit difference is between 49.64% and 50.79% for our encryption system and between 48.25% and 50.73% for AES-128. Thus, according to the experimental results, it can be said that the key generation via our algorithm is more robust than AES-128.

D. Statistical Tests
In order to study the quality of the random generation of the suggested encryption block cipher, we apply the wellknown DIEHARD test [25]. The primary objective of this test is to demonstrate whether our cryptosystem is able to withstand statistical attacks. In other terms, the output of a secure block cipher must be indistinguishable statistically from a random output using the encryption function. To perform this test, a randomly generated cipher sequence is initially binarized to generate a bitstream of over 10 MB. Thereafter, the bitstream is analyzed statistically by putting it under the The results indicate that the bitstream generated with our encryption system successfully passed all DIEHARD tests. Moreover, our encryption system shows a satisfactory and statistically indistinguishable random behavior.

E. Brute Force Attack
Brute force attacks are a mean to get all possible key arrangements with a fast prediction tool. Assuming that a highperformance machine that spends 10 −10 seconds on testing the validity of every key is used, and that the numbers entailed in the master key range from 1 to 1000. Given that algorithm has 1000 256 potential keys. A brute force attack would require about 10 −10 × 1000 256 seconds to find the appropriate key. Therefore, a brute force attack with an exhaustive search over the key possibility space is not feasible in a reasonable amount of time.
To find a 78-character message when a single block is used, it normally takes 1000 possibilities to find one of the master key numbers, which will represent the seed of the BBS generator involved in producing the S 0 vector. However, the prime numbers used as the input parameters for the generator are not easily determined (due to factorization problems). As a result, it is nearly impossible to figure out the sub-key if the pq product is large enough. Table II represents the performance test of our cryptosystem, compared to other known block ciphers such as triple DES [26] and AES [27] in terms of their CPU time consumption. The computations are run on a computer with on an Intel Core i7-6600U processor, 64-bit OS, 2.81 GHz with 20 GB of RAM. It can be seen from TABLE II that our algorithm can achieve good results in terms of run time over the other standard encryption systems.

VI. CONCLUSION AND FUTURE WORK
The work presents a new cryptosystem that takes advantage of the principles of graph theory, which enable a high degree of security while maintaining the performance of data processing. Our proposed encryption block cipher using in particular the disjoint Hamiltonian circuits that have been adopted to represent the plaintext in a pre-encryption phase. the process makes use of a specific sub-key generator that has been set up to generate the encryption keys according to the requirements of the proposed system. We have performed different statistical tests, specifically the DIEHARD, confusion and diffusion tests to prove the security and performance of our cryptosystem. The experiments results proved the good behaviour of our proposed design in terms of robustness and CPU time compared to 3DES and AES. In a future work, we intend to use another pseudorandom generator, such as the one proposed in [28] known as PSOCA, which is mainly based on cellular automata, and we also investigate other properties of graph theory for a more discriminating and robust representation of the data.