A Study of Encryption for Multimedia Digital Audio Security

Abstract: Driven by the development of multimedia, the encryption of multimedia digital audio has received more attention; however, conventional cryptography-based encryption methods have many shortcomings when applied to multimedia information, and new encryption methods are urgently needed. This paper briefly introduces cryptography and chaos theory, designs a chaos-based encryption algorithm that combines Logistic mapping and Sine mapping for confusion and uses a Hopfield chaotic neural network for diffusion, explains the encryption and decryption process of the algorithm, and tests the algorithm. The keys obtained by the proposed algorithm passed the SP800-22 test. The correlation coefficients between three encrypted audios and the corresponding original audios were 0.0261, -0.0536, and 0.0237, respectively, all close to zero, and the peak signal-to-noise ratio (PSNR) values were -0.348 dB, -7.645 dB, and -3.636 dB, respectively, indicating that the encrypted audio differed significantly from the original audio. The NSCR and UACI were also close to their ideal values. The results show that the proposed algorithm has good security and can encrypt actual multimedia digital audio.


I. INTRODUCTION
The dissemination of multimedia information is accelerating with the development of computer technology [1]. Through the Internet, mobile terminals, and similar channels, digital images, video, audio, and other multimedia information are generated and transmitted all the time, which facilitates communication and exchange but also brings new challenges to information security. Multimedia information is mostly transmitted and stored in public environments; over networks it spreads faster and more widely, so the danger of information leakage is also greater [2]. Encryption can effectively improve the security of multimedia information, so multimedia encryption has become an important research topic [3]. At present, many methods have been applied to the encryption of text and images; however, audio has greater redundancy and higher correlation between samples, so traditional text and image encryption methods are not directly applicable to audio. Therefore, encryption for digital audio has become a common concern for researchers [4]. Singh et al. [5] compared the performance of dynamic advanced encryption standard (AES) and standard AES for audio encryption and analyzed the quality of the algorithms by histogram, correlation, etc. Babu et al. [6] converted audio data to image data, studied the encryption and decryption of audio using a fractional order hyperchaotic system, and verified the security of the system through analysis. Wang et al. [7] proposed an encryption method using a chaotic system and deoxyribonucleic acid (DNA) coding and found, through comparative experiments on different types of audio, that the method performed well in multichannel audio processing. Zaid et al. [8] studied two chaos-based permutation algorithms, Arnold cat mapping and Baker mapping. Their experiments showed that both algorithms can provide reliable security, but in most cases Arnold cat mapping performs better.
At present, there are still many challenges in multimedia digital audio encryption, and the security of existing methods cannot yet meet such encryption needs. Therefore, in order to find a more suitable encryption method for multimedia digital audio, this paper designs a chaos-based method and verifies its reliability through experiments. This work provides a new method for research on multimedia digital audio encryption and also provides theoretical support for in-depth research on multimedia information encryption. Section II first briefly introduces cryptography and chaos theory and then describes the encryption and decryption method designed in this paper, which is based on Logistic mapping, Sine mapping, and a Hopfield chaotic neural network. Section III presents the experiments used to evaluate the security of the proposed method for multimedia digital audio encryption and decryption. Section IV concludes the paper with a brief summary and reflection on the research.

A. Cryptography and Chaos Theory
A simple cryptosystem generally consists of several components, as shown in Fig. 1. The plaintext is the original message to be encrypted, written as $P$. The ciphertext is the encrypted message, written as $C$. Assuming an encryption algorithm $E$, the encryption process is written as $C = E(P)$. Let the decryption algorithm be $D$; then the decryption process is written as $P = D(C)$.
For audio information with high redundancy and high correlation, traditional encryption algorithms such as AES and DES [9] are unable to encrypt effectively. Chaos is characterized by ergodicity, unpredictability, and random-like behavior, and applying it to encryption gives good results [10]. In Devaney's definition of chaos [11], a mapping $f: V \to V$ on a metric space $V$ is chaotic if the following conditions are satisfied: (1) $f$ is sensitive to initial conditions; (2) $f$ is topologically transitive; (3) the periodic points of $f$ are dense in $V$.
Chaos is usually identified using the Lyapunov exponent method [12]. In a one-dimensional chaotic system, there exists an orbit $x_{n+1} = f(x_n)$. A perturbation $\delta x_0$ is added to $x_0$. After $n$ iterations, the resulting perturbation is written as:
$$\delta x_n = \left| f^{n}(x_0 + \delta x_0) - f^{n}(x_0) \right| \approx \left| \frac{d f^{n}}{dx}(x_0) \right| \, |\delta x_0| .$$
The Lyapunov exponent is written as:
$$\lambda = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \ln \left| f'(x_i) \right| .$$
When $\lambda > 0$, the orbit is sensitive to the initial value, i.e., it is a chaotic orbit.
Classical chaotic systems include the following types.

1) Logistic mapping [13]:
$x_{n+1} = \mu x_n (1 - x_n)$, where $n$ is the number of iterations and $\mu$ is the system bifurcation parameter, with $\mu \in (0, 4]$ and $x_n \in (0, 1)$. When $3.5699 < \mu \le 4$, the system is in a chaotic state.
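The Lyapunov criterion above can be checked numerically for the Logistic mapping. The following is a minimal sketch rather than the implementation used in this paper: the function names, the initial value 0.3, the burn-in length, and the iteration count are all illustrative assumptions. It averages $\ln|f'(x_i)|$ along an orbit, using $f'(x) = \mu(1 - 2x)$.

```python
import math


def logistic_orbit(mu, x0, n):
    """Return n iterates of the logistic map x_{k+1} = mu * x_k * (1 - x_k)."""
    xs = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        xs.append(x)
    return xs


def lyapunov_logistic(mu, x0=0.3, n=20000, burn_in=1000):
    """Estimate the Lyapunov exponent as the orbit average of ln|f'(x)|.

    For the logistic map, f'(x) = mu * (1 - 2x). The defaults for x0, n and
    burn_in are illustrative assumptions, not values from the paper.
    """
    x = x0
    for _ in range(burn_in):              # discard the initial transient
        x = mu * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        deriv = abs(mu * (1.0 - 2.0 * x))
        total += math.log(max(deriv, 1e-300))  # guard against log(0)
        x = mu * x * (1.0 - x)
    return total / n


if __name__ == "__main__":
    for mu in (3.2, 3.9):
        print(mu, lyapunov_logistic(mu))
```

With these settings the estimate is negative for $\mu = 3.2$, where the orbit settles onto a stable period-2 cycle, and positive for $\mu = 3.9$, which is consistent with the sign criterion for chaos stated above.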

B. Audio Encryption Algorithm
One-dimensional chaotic mappings are the simplest chaotic systems. To improve the security of chaotic encryption, this paper proposes an improved method that combines two one-dimensional chaotic mappings. Logistic mapping suffers from uneven data distribution, and Sine mapping has the same defect. Therefore, the two are combined to obtain the Logistic-Sine-coupling mapping (LSCM). For suitable parameter values, the system is in a chaotic state.
With the continuous development of neural networks, their applications in fields such as artificial intelligence are becoming more and more widespread, and neural networks can also exhibit chaotic characteristics. Hopfield neural networks can meet the requirements of cryptography and perform well in encryption [17]. They are divided into two types, discrete and continuous. The discrete type is used in this paper, with the connections between neurons described by a weight matrix $W$. A three-dimensional Hopfield neural network with high operational efficiency and a good chaotic state is called a Hopfield chaotic neural network (HCNN).

 
In multimedia digital audio encryption and decryption, the LSCM is used to perform the confusion operation on the audio, and then the HCNN is used to generate the diffusion sequences. The encryption process is as follows:
1) The original audio from the left and right channels is read and denoted as two sets of audio samples.
2) A hash operation is performed on the original audio to obtain a hash value.
3) The key is generated from the hash value, using a function that converts the hexadecimal hash code to a decimal number together with the number of iterations.
In the diffusion step, $\oplus$ denotes the bitwise exclusive OR function, and the two chaotic sequences are obtained by the HCNN.
8) To further improve the encryption performance, the three resulting sequences are combined two by two for three rounds of diffusion to obtain the final encrypted audio, which completes the encryption.
The decryption process of multimedia digital audio is as follows:
1) The encrypted audio is read.
2) Initial values are obtained using the LSCM and HCNN, following the same steps as in encryption, to get the two chaotic sequences needed for decryption.
3) The two chaotic sequences are used to obtain the three decryption diffusion sequences.
4) The encryption process is reversed: decryption diffusion is performed on the encrypted audio, followed by inverse confusion. Finally, the decrypted audio is obtained.
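The confusion and diffusion steps above, and their reversal during decryption, can be illustrated with a simplified sketch. It is not the algorithm of this paper: a single Logistic mapping stands in for both the LSCM and the HCNN, SHA-256 is an assumed choice of hash function, the seed derivation in `seed_from_audio` is hypothetical, and samples are assumed to be unsigned 16-bit integers. The sketch only shows the general pattern of hash-derived initial value, chaotic permutation, bitwise XOR, and inversion in the opposite order.

```python
import hashlib


def seed_from_audio(samples):
    """Hypothetical key derivation: hash the samples with SHA-256 (an assumed
    choice) and map the hexadecimal digest to an initial value in (0, 1)."""
    data = b"".join(s.to_bytes(2, "big") for s in samples)
    digest_hex = hashlib.sha256(data).hexdigest()
    value = int(digest_hex, 16)                # hexadecimal -> decimal
    return (value % 10**8 + 1) / (10**8 + 2)   # strictly inside (0, 1)


def chaotic_sequence(x0, n, mu=3.99, burn_in=200):
    """Logistic-map iterates, a stand-in for the LSCM and HCNN sequences."""
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1.0 - x)
    out = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        out.append(x)
    return out


def _permutation_and_keystream(seed, n):
    seq = chaotic_sequence(seed, 2 * n)
    # Confusion: the sort order of the first n chaotic values is a permutation.
    perm = sorted(range(n), key=lambda i: seq[i])
    # Diffusion: quantize the next n values to 16-bit keystream words.
    keystream = [int(v * 65536) % 65536 for v in seq[n:]]
    return perm, keystream


def encrypt(samples, seed):
    perm, ks = _permutation_and_keystream(seed, len(samples))
    shuffled = [samples[p] for p in perm]            # confusion
    return [s ^ k for s, k in zip(shuffled, ks)]     # diffusion (bitwise XOR)


def decrypt(cipher, seed):
    perm, ks = _permutation_and_keystream(seed, len(cipher))
    shuffled = [c ^ k for c, k in zip(cipher, ks)]   # undo diffusion first
    plain = [0] * len(cipher)
    for dst, src in enumerate(perm):                 # then undo confusion
        plain[src] = shuffled[dst]
    return plain


if __name__ == "__main__":
    audio = [0, 1, 65535, 1234, 4321, 777, 42, 30000]
    key = seed_from_audio(audio)
    assert decrypt(encrypt(audio, key), key) == audio
```

Because the initial value is derived from a hash of the plaintext, the receiver needs that value (or the hash) as part of the key. The sketch simply passes it in as `seed` and is not meant as a vetted cipher.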

III. AUDIO ENCRYPTION ALGORITHM SECURITY ANALYSIS
The experiments were carried out in a Windows 10 environment with a 3.4 GHz processor and 4 GB of RAM. In the chaotic system, the system parameter was set to 3.707 and 3.808, respectively, with fixed initial values. The audio files to be tested were all in WAV format. The first three, named audio1.wav, audio2.wav, and audio3.wav, came from the Internet, and the other three came from the THCHS-30 voice library [18]. The audio in the THCHS-30 voice library was collected in a quiet office environment at a sampling frequency of 16 kHz with a sample size of 16 bits, and its total duration is 30 hours. Three audio files were randomly selected from the library for the experiments and named audio4.wav, audio5.wav, and audio6.wav. Taking audio1.wav as an example, the results of encryption and decryption using the proposed method are shown in Figs. 2 to 4. Fig. 2 shows the original audio waveform of audio1.wav, and Fig. 3 shows the audio waveform obtained after audio1.wav was encrypted. The comparison between Figs. 2 and 3 shows that the encrypted audio had no visible similarity to the original audio, which indicates that the audio encryption method was effective. Fig. 4 shows the audio waveform obtained after decrypting the encrypted audio. The comparison between Figs. 2 and 4 shows that the original audio was correctly recovered after decryption with the proposed method, which demonstrates the usability of the method.
First, the randomness of the key was tested using the 15 tests in the SP800-22 test suite from the National Institute of Standards and Technology (NIST), and the randomness was judged by the P value: a sequence passes a test when its P value is at least 0.01. The results of the key test are displayed in Table I. It was seen from Table I that the keys generated using the proposed method passed the SP800-22 test, with all P values greater than 0.01, indicating that the keys had good randomness and were suitable for encrypting multimedia digital audio.
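To make the P value criterion concrete, the first of the 15 SP800-22 tests, the frequency (monobit) test, can be sketched as follows. This is an illustrative sketch of that single test only, not the test harness used in the experiments, and the function name `monobit_p_value` is an assumption. The test maps bits to ±1, sums them, and computes $P = \mathrm{erfc}\left(|S_n| / \sqrt{2n}\right)$.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test.

    bits: a non-empty sequence of 0/1 values. Returns the P value; the
    sequence is considered to pass at the 0.01 level when P >= 0.01.
    """
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # map 0 -> -1, 1 -> +1 and sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


if __name__ == "__main__":
    example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
    print(monobit_p_value(example))
```

A perfectly balanced sequence gives $P = 1$, while a sequence of all ones gives a P value far below 0.01 and fails the test.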
The correlation coefficient reflects the correlation between two sets of data. A small correlation coefficient between the encrypted audio and the plaintext audio means little similarity between the plaintext and the ciphertext. The correlation coefficient is calculated as follows:
$$r_{xy} = \frac{\sum_{i=1}^{L} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{L} (x_i - \bar{x})^2 \, \sum_{i=1}^{L} (y_i - \bar{y})^2}},$$
where $x_i$ and $y_i$ are the samples of the two signals, $L$ is the number of samples, and $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$. The correlation coefficient of the audio before and after encryption by the proposed method was calculated, and the results were compared with Mohamed's method [19], as shown in Fig. 5.
It was observed in Fig. 5 that the correlation between each of the six encrypted test audios and its original audio was close to zero, with coefficients of 0.0261, -0.0536, 0.0237, 0.0227, -0.0577, and 0.0219, respectively. Compared with Mohamed's method [19], the correlation between the audio before and after encryption by the proposed method was smaller, indicating that the similarity between the ciphertext and the plaintext was lower, which supports the security of the method.
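The correlation coefficient between an original and an encrypted signal can be computed with a few lines of code. The sketch below is illustrative only: the function name `correlation` and the use of plain Python lists for the samples are assumptions, not details from the experiments.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))    # perfectly correlated
    print(correlation([1, 2, 3, 4], [1, -1, -1, 1]))  # uncorrelated
```

Values near +1 or -1 indicate a strong linear relationship, so a good cipher should give values near 0 between the plaintext and ciphertext samples.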
The peak signal-to-noise ratio (PSNR) reflects the quality of signal compression. The larger the PSNR, the better the quality of the compressed signal and the closer it is to the original audio. Conversely, a smaller PSNR for the encrypted audio means that it differs more from the original audio. The PSNR calculation formula is:
$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( x(i,j) - y(i,j) \right)^2, \qquad \mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}},$$
where $M$ and $N$ are the width and height of the audio data, $x$ and $y$ are the original and encrypted audio, and $\mathrm{MAX}$ is the peak sample value. The PSNR obtained by the method proposed in this paper was compared with the results in Tamimi's study [20] and Liu's study [21], as shown in Fig. 6. It was observed in Fig. 6 that the PSNR values of the encrypted audios were small, e.g., -0.348 dB, -7.645 dB, and -3.636 dB for three of the audios, whereas the PSNR was 4.373 dB in Tamimi's study [20] and 4.530 dB in Liu's study [21]. The PSNR values obtained in this paper were smaller, indicating that the audios encrypted by the proposed method had higher security and were more resistant to attacks.
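A PSNR computation can be sketched as follows. The sketch treats the audio as a one-dimensional sample sequence rather than a width-by-height array, and the `peak` argument (the largest possible sample amplitude) is an assumed parameter whose value depends on the sample format. Neither choice is taken from the experiments.

```python
import math


def psnr(original, encrypted, peak):
    """PSNR in dB between two equal-length sample sequences.

    peak is the maximum possible sample amplitude (an assumed parameter,
    e.g. 255 for unsigned 8-bit samples).
    """
    n = len(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, encrypted)) / n
    if mse == 0:
        return math.inf                       # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    print(psnr([0, 0, 0, 0], [10, 10, 10, 10], peak=100))
    print(psnr([0], [20], peak=10))
```

The PSNR becomes negative whenever the mean squared error exceeds the squared peak value, which is how heavily distorted (encrypted) signals can score below 0 dB.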
Finally, the performance of this method against differential attacks was analyzed based on the number of samples change rate (NSCR) and the unified average changing intensity (UACI). For 8-bit samples, they are calculated as follows:
$$\mathrm{NSCR} = \frac{1}{L} \sum_{i=1}^{L} \mathrm{sign}\left( \left| C_1(i) - C_2(i) \right| \right) \times 100\%,$$
$$\mathrm{UACI} = \frac{1}{L} \sum_{i=1}^{L} \frac{\left| C_1(i) - C_2(i) \right|}{255} \times 100\%,$$
where $C_1(i)$ is the encrypted audio, $C_2(i)$ is the encrypted audio obtained after one sample of the original audio is randomly changed, $L$ is the number of samples, and $\mathrm{sign}$ is the sign function. When the audio signal was 8 bit, the ideal values of NSCR and UACI were 100% and 33.33%, respectively. The average values were taken after several tests and compared with the results in Soliman's study [22] and Shah's study [23], and the results are shown in Fig. 7. It was observed in Fig. 7 that, compared with Soliman's study [22] and Shah's study [23], the NSCR obtained by the proposed method was always above 99.99%, which was closer to the ideal value (100%), and the UACI obtained by the proposed method was 34.8542%, 33.5628%, 34.2587%, 34.5515%, 34.3637%, and 34.6987% for the six audios, which was closer to the ideal value (33.33%). These results verified the performance of the chaos-based audio encryption method in resisting differential attacks.
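The two differential-attack indexes can be computed directly from a pair of ciphertexts. The following sketch is illustrative only: the function names and the `max_value` parameter (255 for 8-bit samples) are assumptions, and the ciphertexts are taken to be equal-length lists of non-negative integers.

```python
def nscr(c1, c2):
    """Percentage of sample positions at which two ciphertexts differ."""
    changed = sum(1 for a, b in zip(c1, c2) if a != b)
    return 100.0 * changed / len(c1)


def uaci(c1, c2, max_value=255):
    """Mean absolute difference between two ciphertexts, normalized by the
    largest sample value (255 for 8-bit samples), as a percentage."""
    total = sum(abs(a - b) for a, b in zip(c1, c2))
    return 100.0 * total / (max_value * len(c1))


if __name__ == "__main__":
    first = [12, 200, 33, 90]
    second = [12, 180, 33, 255]
    print(nscr(first, second), uaci(first, second))
```

In an actual test, `c1` and `c2` would be the ciphertexts of two plaintexts that differ in a single sample, so a high NSCR means that one changed sample affects almost every ciphertext sample.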
It was concluded from the above experimental results that the method proposed in this paper had good encryption and decryption performance for multimedia digital audio: the encrypted audio files were not similar to the original files, and the original audio was well recovered after decryption. From the security point of view, the key obtained by the method had good randomness and passed the SP800-22 test. From the comparison of different indicators, the experiments on six different audios revealed that the correlation between the audio before and after encryption was very small, and the PSNR was also significantly smaller than the results in other literature, suggesting good resistance to attacks. The experimental results support the effectiveness of the method for multimedia digital audio encryption and the reliability of combining different chaos methods for encryption, and they further verify the usability of chaos theory for multimedia information encryption.

IV. CONCLUSION
This paper designed a chaos-based encryption method for multimedia digital audio, combined the LSCM with the HCNN to realize the encryption of digital audio, and analyzed its security. It was found that the key obtained by the proposed method passed the SP800-22 test with good randomness. The encrypted audio had little correlation with the original audio (absolute value below 0.06), a smaller PSNR value, an NSCR above 99.99%, and a UACI closer to the ideal value (33.33%), showing strong resistance to differential attacks. The method can be further applied in practical multimedia digital audio encryption. However, this paper also has some shortcomings, such as the small scale of the experimental data and the lack of a practical application. In future research, further studies can be conducted on hardware implementation and encryption system design to understand the operability of the method in a practical environment.