Compressed Sensing Based Encryption Approach for Tax Forms Data

—In this work we investigate the possibility to use the measurement matrices from compressed sensing as secret key to encrypt / decrypt signals. Practical results and a comparison between BP (basis pursuit) and OMP (orthogonal matching pursuit) decryption algorithms are presented. To test our method, we used 10 text messages (10 different tax forms) and we generated 10 random matrices and for distortion validate we used the PRD (the percentage root-mean-square difference), its normalized version (PRDN) measures and NMSE (normalized mean square error). From the practical results we found that the time for BP algorithm is much higher than for OMP algorithm and the errors are smaller and should be noted that the OMP does not guarantee the convergence of the algorithm. We found that it is more advantageous, for tax forms (or other templates that show no interest for encryption) to encrypt only the recorded data. The time required for decoding is significantly lower than the decryption for the entire form


INTRODUCTION
The theory of compressed sensing, perfected in the past few years by prestigious researchers such as D. Donoho [1], E. Candès [2], M. Elad [3], demonstrates the feasibility of recovering sparse signals from a number of linear measurements, dependent with the signal sparsity.Compressed sensing (CS) is a new method which draws the attention of many researchers and it is considered to have an enormous potential, with multiple implications and applications, in all fields of exact sciences [1][2][3][4].Specifically, CS is a new technique for finding sparse solutions to underdetermined linear systems.In the signal processing domain, the compressed sensing technic is the process of acquiring and reconstructing a signal that is supposed to be sparse or compressible.
The perfect secrecy together with the secret communication is a well-defined field of research, being a difficult problem in the domain of information theory.One of the requirements for the information theoretic secrecy is to assure that a spy who listens a transmission containing messages will collect only small number of information bits from message.Additionally, it should provide protection against of an computationally unlimited adversary based on the statistical properties of a system.Shannon introduced the idea of perfect secrecy, in his fundamental paper [5].
An encryption idea by utilizing CS has been mentioned for the first time in [7], but not been addressed in detail [6].In paper [8], the secrecy of CS is researched, and whose result is that CS can provide a computational guarantee of secrecy.In [9] examine the security and robustness of the CS-based encryption method.In paper [10], the authors describe a new coding scheme for secure image using the principles of compressed sensing (CS) and they analyze the secrecy of the scheme.

A. Compressed Sensing
Compressed sensing studies the possibility of reconstructing a signal x from a few linear projections, also called measurements, given the a priori information that the signal is sparse or compressible in some known basis  .
To define sparsity precisely, we introduce the following notation: for  -a matrix whose columns form an orthonormal basis, we define a K-sparse vector , where N R   has K non-zero entries (i.e., is Ksparse) and K  as the set of K indices over which the vector The vectors on which x is projected onto are arranged as the rows of a nxN projection matrix  , n < N, where N is the size of x and n is the number of measurements.Denoting the measurement vector as y, the acquisition process can be described as: The equations system (1) is obviously undetermined.Under certain assumptions on  and  , however, the original expansion vector  can be reconstructed as the unique solution to the optimization problem (2); the signal is then reconstructed with (3).Note that (2) amounts to finding the sparsest decomposition of the measurement vector y in the dictionary  .Unfortunately, (2) is combinatorial and unstable when considering noise or approximately sparse signals.www.ijarai.thesai.orgFor a K-sparse signal, only -K+1 projections of the signal onto the incoherent basis are required to reconstruct the signal with high probability‖ [5].In this case, is necessary to use combinatorial search with huge complexity.In [1] and [2] is proposed tractable recovery procedures based on linear programming.In these papers is demonstrated that the tractable recovery procedures obtain the same results toward combinatorial search when for signal reconstruction are used aprox.3 or 4 cK projections.
Two directions have emerged to circumvent these problems:  Pursuit and thresholding algorithms seek a sub-optimal solution of (2)  The Basis Pursuit algorithm [1] relaxes the 0 l minimization to, solving the convex optimization problem (4) instead of the original.
The matrix  satisfies a restricted isometry property of order K whether there is a constant holds for all x with sparsity K.

A. Notions of secrecy and Model
In cryptography, -a secret key system is an encryption system where both sender and receiver use the same key to encrypt and respectively, decrypt the message‖ [11][12].
A conventional encryption scheme consists of five elements [13][14]: • Plain text: This is the original message or input information for the encryption algorithm.
• Encryption algorithm: This algorithm performs various substitutions and modifications to the clear text.
• Secret Key: This key is an input to the encryption algorithm.
• Ciphertext: The text resulting from encryption algorithm and it is depends on the plaintext and the secret key.Thus, for a given message, two different secret keys produce two different ciphertexts.
• Decryption algorithm: This algorithm is the inverse of the encryption algorithm.The decryption algorithm is applied with the same secret key to the ciphertext in order to get the original clear text.
Following two elements must be taken into account in order to achieve a secure encryption [15]: 1) The encryption algorithm should be very strong.If an attacker knows the encryption algorithm (encryption) and has access to one or more ciphertext, he cannot decrypt the ciphertext or find the secret key.
2) Both the transmitter and the receiver must obtain the secret key in a safe manner (on a secure communication channel) and to keep it secret.
Based on previous remarks, in Figure 1 (in the upper half) is shown the basic model for CS and it includes two major aspects: measurements taking and signal recovery.The measurements taking involve an encryption algorithm and signal recovery is associated with a decryption algorithm from the perspective of symmetric-key cipher.The relationship between CS and symmetric cryptography indicates that some possible cryptographic features can be embedded in CS.Secure key exchange channel www.ijarai.thesai.orgtransmitted to Bob.The recipient knows the key used for encryption of the message.Knowing i  and y, the compressed sensing literature provides conditions for x and i  to allow the recovery of the original message x.The classical example of secret message communication assumes that the Alice's encrypted message y is being intercepted by an eavesdropper named Eve.For the third person, the used key the message encryption is unknown.
In our case, the measurement matrix  can be selected from a set of keys that is known for the transmitter (Alice) and the permitted receiver (Bob).Each random measurement matrix  is generated with a seed which can be exchanged through a secure approach between two desired sides [16][17].
A computational encryption scheme is secure if the ciphertext has one or two properties:  The cost of breaking ciphertext is much higher than the encrypted information.
 The time needed for breaking ciphertext is longer than the lifetime of the information.
A brute force attack on the compressive sampling based encryption scheme would be guessing the linear measurement matrix i  .Thus, an eavesdropper, e.g.Eve, could directly try to do this by performing an exhaustive search over a -grid‖ of values for i  .But, the step size of this grid is critical because a too large step size may cause the search to miss the correct value and a too small grid size will increase the computational task unnecessarily.
The computational cost of signal reconstruction is high.For the best optimization algorithm (BP), the computational cost is in the order of ) ( 3 N O and a random search will make the search too expensive.

III. SIMULATIONS AND DISCUSSIONS
To test our method, we used 10 text messages (10 different tax forms) and we generated 10 random matrices.
To validate the decoding results, we evaluate the distortion between the original plaintext and the reconstructed plaintext by means of the PRD (the percentage root-mean-square difference), its normalized version (PRDN) measures and NMSE (normalized mean square error).
The percentage root-mean-square difference (PRD) measure defined as (6): is the reconstructed signal, and N is the length of the window over which the PRD is calculated.The normalized version of PRD, PRDN, which does not depend on the signal mean value, x , is defined as (7): x PRDN (7) The normalized mean square error (NMSE) measure defined as (8): Where  are the variance and MSE are mean square error measure.
Because our messages are text type, ie contain characters and numbers, we chosen to transform the messages in numerical signals based on the ASCII codes.
To use the identity matrix as decoding dictionary, the plaintext is necessary to be a sparse signal [18].Because our messages had not this property, we have modified them by artificial insertion of zeros, thus obtaining sparse signals.
We used random matrix for encryption and for reconstruction we used two different algorithms, and namely,  Basis pursuit algorithm (BP), known in the CS domain as the optimal algorithm in terms of errors [19][20] and  Orthogonal matching pursuit algorithm (OMP) known in CS domain for its speed far superior to BP [21].
The orthogonal matching pursuit algorithm (OMP) is an iterative greedy algorithm.In this algorithm, at each step, the dictionary element which has the maximum correlation with the residual part of the signal is selected.The Basis Pursuit algorithm (BP) is a more sophisticated approach comparatively with OMP.In case of the BP algorithm, the initial sparse approximation problem is reduced to a linear programming problem.
Generically, the greedy algorithms (such OMP) have the disadvantage that there are not general guarantees of optimality.The basis pursuit algorithm, namely the convex relaxation algorithms, has the disadvantage of high computational complexity, translated into large computing time [22][23][24][25][26].
To synthesize ideas, we present the encryption and decryption necessary steps, namely:  The message transformation into digital signal using extended ASCII code.This achieves a 1D digital signal. Encryption of sparse segments using a random matrix.Encryption is done by multiplying the signal sparse with a random matrix (  ), resulting a lower dimension signal than initially sparse signal.The signal thus obtained is not sparse.
 Transmission of the message text is achieved by transmitting the encrypted signals (ciphertext) on an insecure line.It is important that random matrix (encryption matrix representing the secret key) is not sent with the ciphertext; it should be sent on a secure line.Another variant is use case when there is an agreement between the transmitter and receiver to generate random matrices in the same way, for example, using the same random number generator which is started from the same initial conditions.
 Decryption of the message will be achieved using a greedy algorithm (either orthogonal matching pursuit (OMP), or matching pursuit (MP), or greedy LS etc.) or convex relaxation algorithm (basis pursuit (BP)).For decryption, it is necessary to know the following: random matrix encryption  , the encrypted message (the ciphertext) Y, and the base for sparsity  (in case of this paper, it is the identity matrix, due the fact that the message that was encrypted was a sparse signal).
 Because there is a decryption error which is very small, to return to the decrypted text, a decryption correction will be necessary.This correction consists in rounding of decrypted values to the nearest integer because the ASCII code is built from integers.4. Obiectul principal de activitate 5. Sediul/Datele de identificare a bunului pentru care se cedeaza folosinta 8. Data încetarii activitatii asociere fara personalitate juridica entitati supuse regimului transparentei fiscale individual 6.Documentul de autorizare/Contractul de asociere/Închiriere/Arendare 3. Forma de organizare: 2. Determinarea venitului net: comerciale profesii libere drepturi de proprietate intelectuala cedarea folosintei bunurilor operatiuni de vânzare-cumparare de valuta la termen, pe baza de contract transferul titlurilor de valoare, altele decât partile sociale si valorile mobiliare în cazul societatilor închise activitati agricole 1. Categoria de venit Venituri: cedarea folosintei bunurilor calificata în categoria venituri din activitati independente sistem real modificarea modalitatii/formei de exercitare a activitatii We have chosen to split the signal into segments of length 100 and to insert a number of 800 by zeros for each ASCII codes segment.This means that each plaintext sequence with length 100 was transformed into a sequence with length 900. Figure 4 shows the plot of sparse plaintext.We used for encryption a random matrix of size 500x900.This random matrix represents the secret key. Figure 5 show the ciphertext obtained a random matrix for encryption.Note that the ciphertext contains positive and negative numbers and it has a different length than the plaintext.OMP is an iterative greedy algorithm and selects at each step the column of  matrix which has the maximum correlation with the current residuals.A set of iteratively selected columns is built.The residuals are iteratively updated by projecting the observation y onto the subspace spanned by the previously selected columns.This algorithm has simpler and faster implementation toward similar methods.
The Basis Pursuit (BP) algorithm consists in finding a least L1 norm solution of the underdetermined linear system The both methods can be guaranteed to have bounded approximation solution of sparse coefficients estimation for the condition that the L0 norm of sparse coefficients is smaller than a constant decided by the dictionary [1].Because in the case of typical tax forms often it is required to encrypt only registration data and because the decryption time is higher for the completed form (data + template), we tested the proposed algorithm for encrypting data alone.In Figure 8 is an example of data belonging to the form shown in Figure 2.For a signal of dimension m with assumed sparsity s<<m, and a dictionary of N>>m atoms, computational costs for pursuits using general and fast dictionaries are: )) log ( Alternatives to BP (e.g., greedy matching pursuit) also have computational complexities that depend on N. Table 1 presents average results for 10 text messages and for 10 datasets from tax forms.The time for BP algorithm is much higher than for OMP algorithm and the errors are smaller.It should be noted that the OMP does not guarantee the convergence of the algorithm and for a smaller number of measurements; the results can be much worse for OMP comparatively with BP. Results depend on the number of measurements and on used decoding algorithm [24][25][26].

IV. CONCLUSIONS
In this paper, the perfect secrecy via compressed sensing was studied and discussed.We presented an analysis with practical results for tax forms as plaintexts.For decoding we used BP and OMP algorithms, and we presented a comparative analysis.The time for BP algorithm is much higher than for x 10 www.ijarai.thesai.orgOMP algorithm and the errors are smaller and should be noted that the OMP does not guarantee the convergence of the algorithm.According to average results from Table 1, it is more advantageous, for tax forms (or other templates that show no interest for encryption) to encrypt only the recorded data.The time required for decoding is significantly lower than the decryption for the entire form.

Fig. 1 .
Fig. 1.The relationship between CS and symmetric-key cipherThe classical example of communication of a secret message from Alice to Bob assumes that Alice must use key from the set of keys.In this paper, let be i a key chosen by

Fig. 5 .
Fig. 5.The ciphertextTo decode the ciphertext we tested two known algorithms from compressed sensing domain, namely, orthogonal

Fig. 6 .
Fig.6.Error for decoding with OMP, before decoding correction.In the bottom right corner there is a zoom for the first 600 samples

Figure 6
Figure 6 and figure 7 show errors for decoding with OMP, respectively BP algorithms.

Fig. 7 .
Fig. 7. Error for decoding with BP, before decoding correction For the OMP based decoding, where the original signal (plaintext) was sparse (had null values), null values were obtained after decoding.In case of BP decoding, the algorithm approximates all values and it failed to return null values for the null values from plaintext, but it returned values very close to zero.

Fig. 8 .
Fig. 8.The plaintext with registration data from tax form for measurements, s stands for sparsity.The popular basis pursuit algorithm (BP) has computational complexity Transforming of the signals (signals with length 100) in sparse signals by inserting a predefined number of zeros.The position of the zeros is random from one segment to another.
The segmentation of message or digital signal into segments of length 100.www.ijarai.thesai.org