Authenticating Sensitive Speech-Recitation in Distance-Learning Applications using Real-Time Audio Watermarking

This paper focuses on audio-watermarking authentication and integrity protection for speech data transmitted over the Internet in a real-time learning environment. Arabic Quran recitation through distance learning is used as a case study that is characteristic of sensitive data requiring robust authentication and integrity measures. This work proposes an approach for authenticating and validating audio data transmitted by a publisher, or during communications between an instructor and students reciting via the Internet. The watermarking approach proposed here is based on detecting the key patterns within the audio signal, which serve as an input to the algorithm before the embedding phase is performed. The developed application can easily be used at both sides of the communication to ensure the authenticity and integrity of the transmitted speech signal, and is shown to be effective for many distance-learning applications that require low-complexity processing in real time.
Keywords: Audio; Watermarking; Quran-recitation; Integrity; Authentication

The constraints that an audio watermarking scheme must satisfy depend on the application. The main constraints are: 1) Inaudibility: the watermark signal must not be perceived by the listener; 2) Robustness: the watermark must survive any change to the signal, provided that the change does not degrade the signal quality; 3) Capacity: the number of bits that can be hidden in the host signal; 4) Complexity: in practice, most watermarking operations must run in real time (especially the watermark detection/extraction processes), so complexity should be kept as low as possible while maintaining high robustness. Hence, any watermarking scheme must find a compromise between inaudibility, capacity, complexity, robustness and security, which is not easy to achieve.
Audio watermarking techniques can be classified into two categories: spatial domain and frequency domain. The spatial domain is the classic setting in which watermark embedding/extraction takes place directly on the signal values and does not require any transform. The frequency domain is the space in which the signal is considered as a sum of frequencies of different amplitudes, obtained by applying transforms such as the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Fast Fourier Transform (FFT), Singular Value Decomposition (SVD), etc. In this work, a technique based on the DWT is proposed to identify the Quran reciter.
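To make the transform-domain idea concrete, the following is a minimal sketch of a one-level DWT that splits a signal into approximation (CA) and detail (CD) coefficients. The choice of the Haar wavelet, the function names and the sample values are assumptions made for illustration; this work does not state which wavelet family was used.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation CA, detail CD)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    ca = [(signal[j] + signal[j + 1]) / s for j in range(0, len(signal), 2)]
    cd = [(signal[j] - signal[j + 1]) / s for j in range(0, len(signal), 2)]
    return ca, cd

def haar_idwt(ca, cd):
    """Inverse of haar_dwt: rebuilds the samples pair by pair."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(ca, cd):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

# Illustrative samples (not taken from the paper's data)
samples = [0.10, 0.12, 0.40, 0.38, -0.20, -0.22, 0.05, 0.07]
ca, cd = haar_dwt(samples)
restored = haar_idwt(ca, cd)
```

For a slowly varying signal the CD values are small compared with the CA values, which is one reason detail coefficients are a common place to hide data.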
The objective of this work is to design and develop a digital audio-based encoding algorithm for sensitive speech (such as the Quran recitation case study used in this work) that embeds watermark data into the digital content whilst preserving the exact wordings, diacritics and Tajweed (quality of the pronunciation) sounds of the audio transmission.
The remainder of this paper is organized as follows: Section II reviews related work on audio-watermarking schemes and their classification. Section III provides the methodology and implementation of the proposed approach, and Section IV explains the integration of the various components and the overall framework of the recognition system. Section V contains the analysis and results of the proposed framework, while Section VI provides a comparative study with other related works. Finally, Section VII concludes the paper.

II. RELATED WORK
Research in audio watermarking started well after many watermarking techniques had been developed for other media such as images and text. Embedding data in audio is usually more difficult than in images, since the Human Auditory System is more complicated than the Human Visual System. Recently, many watermarking techniques have been developed to address different research problems related to audio files; however, techniques for highly sensitive audio content still need to be developed for applications such as the identification of recitations of the Holy Quran, in which even slight modifications to the audio data can render the recitation/file invalid [2]. The objective of this work is to develop an audio identification system for Quran reciters within the context of a distance-learning environment.
The patchwork technique, first presented by Bender et al. [3] in 1996, was used on images. Statistical methods based on hypothesis testing had relied on large data sets. This method is usually applied in a transform domain such as Fourier, Wavelet, etc. to spread the watermark in the time domain in order to increase robustness against any modifications [3][4][5].
Yee and Wei implemented a non-blind, two-channel, time-frequency digital-bits audio watermarking scheme with an error-correcting code. The watermark bits are encoded with a cyclic code before being embedded into the audio signal using a time-frequency compression-expansion technique with a psychoacoustic model, which decides on the coefficients to be deleted or added. Both channels of the stereo audio signal are used for watermark embedding. This combination of cyclic code and two-channel approach, using the robust time-frequency technique for coding watermark bits, resulted in perfect recovery of the watermark under attacks [6].
The work of Zhang et al. [7] dealt with the implementation of real-time audio watermarking techniques based on Digital Signal Processing (DSP). The implementation was illustrated using the DSK5402. It uses qualitative watermark methods and the fast Modulated Complex Lapped Transform (MCLT). The experimental results show the robustness and transparency of this technique [7]. Other works are based on different transforms: Gao et al. [8] proposed an audio zero-watermarking algorithm based on the FFT. The proposed algorithm provided a solution for the conflicting requirements of imperceptibility and robustness. The algorithm shows effective resistance to different types of attacks and appears to meet the requirements of watermarking security. Xiong-hua and Wei-zhen [9] proposed an adaptive, blind-detection, fragile digital audio watermarking algorithm based on a modified Discrete Fourier Transform (DFT).
The work developed by Furon and Pierre proposes an asymmetric watermarking method that provides a higher security level against malicious attacks and is intended for copy-protection purposes. This method is versatile, as it can be adapted to a large number of watermarking techniques based on Direct Sequence Spread Spectrum (DSSS). The method was applied to a copy-protection framework by analyzing the possible threats and estimating the complexity of each class of attacks. The analysis shows that a watermarked-content-only attack, which is a real threat to other techniques such as DSSS and Watermarking Costa's Schemes (WCS), is not possible with this method. The disadvantages of this method are that asymmetric detectors require more complexity and memory, and must accumulate a large amount of content in order to make a reliable decision [10].
The work by Tavakoli proposes a watermarking technique for covert communication through the telephone system. This technique is suitable for the Integrated Services Digital Network (ISDN) and the Public Switched Telephone Network (PSTN), and can be modified for mobile systems. It uses a direct sequence spread spectrum algorithm with perceptual modeling of the Human Auditory System for embedding the watermark into audio signals. Experimental results show that the watermark is robust against attacks such as Additive White Gaussian Noise (AWGN), Low Pass Filtering (LPF), D/A and A/D conversion, A-Law or u-Law conversion, and down-sampling to 64 kbps. In addition, it is also robust against audio format conversions such as wave to mp3 [11].
The authors in [12] proposed an audio watermarking technique based on chaotic mapping and used the DWT to extract the wavelet coefficients of the audio signal. Here, the detail wavelet domain is chosen for embedding the watermark in order to achieve transparency and fragility; Principal Component Analysis (PCA) was used to help reduce the amount of watermark information that needs to be embedded. The signal reconstruction was therefore achieved from the extracted watermark, and the tampered region can be accurately located in the time domain. The experimental results demonstrate the efficiency of the proposed method in terms of fragility, transparency and tamper localization. Dutta et al. proposed an audio watermarking method based on biometrics, in which the biometric pattern of an iris is used to generate a watermark that carries a stamp of ownership. The watermark is then embedded selectively in the high-energy regions, which makes the embedding process robust against cropping and synchronization attacks [13].
In Chen et al. [14], a fragile watermarking scheme was proposed that embeds watermark data into the principal components of the detail wavelet coefficients, with blind extraction based on the fast independent component analysis system. The proposed fragile authentication scheme demonstrated excellent transparency and tamper-detection capabilities under a number of simulated attack scenarios.
Zhao and Shen [15] presented a semi-fragile audio watermark algorithm for authentication that includes tamper-detection capabilities. The experimental results also confirmed its robustness against various signal-processing operations. The contribution in [16] describes an inaudible speech and watermarking algorithm which embeds copyright information into audio files as proof of ownership. In this study, the watermarking process was achieved using a cascade of the SVD and DWT transforms. A set of attack scenarios specified by the Stirmark benchmark for audio files was simulated. It was demonstrated that the embedded logo could be successfully extracted whilst remaining robust against most of the simulated attacks.
Baranwal and Datta presented a comparative study of spread-spectrum-based audio watermarking techniques [17]. Er and Gul [18] presented a comparison of audio watermark techniques that can be used for source-origin authentication in real-time Session Initiation Protocol (SIP) applications such as Voice over IP (VoIP). The least-significant-bit (LSB), DC-level shift (DCSHIFT), frequency-hopping spread spectrum (FHSS) and DSSS approaches were compared in terms of robustness, evaluation times, complexity and capacity. The results demonstrated the effectiveness of the FHSS and DSSS schemes for VoIP applications that require source authentication.
Other recent works found in the literature include the following. Kang et al. proposed a multi-bit spread-spectrum audio watermarking technique based on a geometric-invariant log coordinate mapping feature [19]. Chen et al. proposed an optimization-based audio watermarking technique based on the DWT [20]. Wang and Zhao proposed a synchronization-invariant audio watermarking scheme based on the DWT and DCT [21]. Petrovic and Yang developed an audio watermarking scheme in the compressed domain [22]. Zhao and Shen developed an audio watermarking algorithm for audio authentication [23]. Al-Haj et al. proposed a hybrid DWT-SVD audio watermarking scheme [24]. For further details on some of the above techniques or others, the reader may refer to the following surveys published on this topic: [1], [25-27].
In an attempt to classify the audio watermarking techniques found in the literature, including but not limited to [1-27], the authors developed the classification tree in Figure 1 of potential Audio-Speech Watermarking Techniques considered for the Holy Quran. Table 1 provides the general classification of audio watermarking techniques with their applicability to Quran computing. In addition, it provides the limitations and considerations needed. Finally, from studying the methods available in the literature, Figure 2 provides an overview of key issues of particular importance in audio watermarking for Quran recitations. In this work, the problem being addressed is the security aspect of integrity and authentication with regard to digital Quran recitations and audio resources. Hence, a critical requirement is to ensure that all digital Quran audio content originating from a known reference or reciter is secure from being tampered with or modified in any way. That is, any modification of, or tampering with, a digital Quran audio signal from an original publisher or source recitation would be easily detected by the detection software, and the signal rendered invalid. The proposed methodology is provided in the next section.

III. METHODOLOGY AND IMPLEMENTATION
The proposed work involves identifying the Holy Quran reciter through an improved approach based on a one-level DWT, in order to achieve very low complexity (low watermark size). The frequency-domain research works in the literature deal with copyright issues and individual/public property. It should be noted that audio watermarking approaches based on transforms such as the FFT, DCT, DWT, SVD, etc., provide remarkable robustness, but are unfortunately associated with high complexity, which makes them difficult to adapt for real-time applications. In contrast, the results obtained with our approach were remarkable in terms of both robustness and complexity. We explain our improvement through the three implementation approaches considered in this work: (i) the embedding and extraction process for the case of enhanced robustness (section A); (ii) the approach of enhanced robustness and security obtained by applying the Rivest, Shamir, Adleman (RSA) algorithm to the watermark (section B); and (iii) the approach of enhanced robustness with the use of the secure Hyper Text Transfer Protocol (HTTPS), which employs the secure-socket layer (SSL) technique (section C). The security measures in sections B and C significantly help to protect against falsification of the reciter identity. However, due to the need to avoid high processing complexity and long running times, the RSA-based encryption/decryption approach (section B) was later replaced with the use of HTTPS and SSL (section C) in order to achieve reduced online complexity for real-time applications.

A. Encoding Based on Robustness Requirement Only
In this section, we describe the first of the three implementation approaches considered, in which only the robustness metric is taken into account in the embedding scheme prior to signal transmission from the user-end. The mathematical and algorithmic steps involved in the embedding and extraction schemes are detailed in sections A.1 and A.2, respectively.

1) Watermark Embedding Scheme
The linear-interpolation-based watermark embedding process of Figure 3 is defined as:
$$i_w(p, q) = (1 - t)\,w(p, q) + t\,i(p, q)$$
where:
$i_w$ is the watermarked original audio signal,
$w$ is the watermark used,
$i$ is the original audio signal,
$(p, q)$ is the position of the sampled point,
$t$ is the watermarking key,
CA is the audio segment with the most important information (from the input signal),
CD are the detail segments within the input audio signal (embedding is applied on this segment),
$CD_{iw}$ is the embedded signal,
Normalization is the process of transforming the input matrix into a linear vector.
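As a quick numerical check of the interpolation rule $CD_{iw} = (1-t)\,N_w + t\,CD_i$ used in the algorithmic steps, consider a single coefficient. The values $t = 0.95$, $N_w = 0.6$ and $CD_i = 0.04$ are illustrative assumptions, not parameters reported in this work:

$$CD_{iw} = (1 - 0.95)(0.6) + (0.95)(0.04) = 0.030 + 0.038 = 0.068$$

At the extremes, $t = 1$ leaves the detail coefficient unchanged ($CD_{iw} = CD_i$), while $t = 0$ replaces it entirely with the watermark value ($CD_{iw} = N_w$), so $t$ controls the trade-off between watermark strength and distortion of the detail component.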
In linear-interpolation-based watermarking, two cases are analysed.

B. Encoding Based on Robustness and Security
In this section, we describe the second implementation approach developed, in which the robust embedding scheme is followed by RSA encryption prior to signal transmission from the user-end, in order to ensure robust and secure transmission. The mathematical and algorithmic steps involved in the embedding and extraction schemes are detailed in sections B.1 and B.2, respectively.
Steps 6 and 7 of the embedding-and-encryption algorithm (listed with Fig. 5) are:
6. Perform the watermark embedding: $CD_{iw} = (1-t)\,\mathit{Nencry}_w + t\,CD_i$
7. Apply the inverse wavelet transform ($\mathrm{DWT}^{-1}$) to obtain $i_w$ (the watermarked original signal).
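The encryption step can be illustrated with a toy example. The following is a minimal sketch of textbook RSA applied to the integer values of a watermark, under loudly stated assumptions: the tiny primes (61 and 53), the exponent 17, the helper names and the sample reciter name are all hypothetical, and this work does not specify the key size, padding, or how the ciphertext is mapped to the normalized value Nencry_w. Textbook RSA with such parameters is not secure and is shown only to make the encrypt/decrypt round trip visible.

```python
def make_keys(p, q, e):
    """Build a textbook RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (requires Python 3.8+)
    return (e, n), (d, n)

def rsa_apply(values, key):
    """Apply modular exponentiation to each integer value (encrypts or decrypts)."""
    exponent, modulus = key
    return [pow(v, exponent, modulus) for v in values]

# Hypothetical toy parameters, far too small for real use
public_key, private_key = make_keys(61, 53, 17)

# Watermark values in 0..255, here the bytes of an illustrative reciter name
watermark = list("reciter-01".encode("utf-8"))

encrypted = rsa_apply(watermark, public_key)
decrypted = rsa_apply(encrypted, private_key)
```

Even in this toy form, every watermark value costs one modular exponentiation at each end, which gives a feel for why encrypting a whole audio signal in this way is expensive for real-time use.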

C. Enhanced Security Mechanism For Real-Time Support by Combining Robust Watermarking with HTTPS/SSL
This section describes the third implementation approach developed, in which the robust embedding scheme is followed by transmission from the user-end using HTTPS and SSL, in order to ensure robust and secure transmission with lower complexity than the RSA approach (section B).
To enforce the security of the watermarked .wav signal and also the authenticity of the reciter, it was decided to assign a session-based protocol to each reciter, identified using a username and password followed by a user-verification code sent to the user's email. This was the first layer of security used in this work, and it was used to prevent (internal) attacks and non-authentic access from the client-side. The second layer of security was used to secure all data transmitted on the network from external attacks. As previously mentioned, the RSA algorithm was initially considered and applied to provide the required encryption. However, the RSA encryption scheme was then replaced by the HTTPS protocol, using SSL libraries, in order to provide a lightweight security scheme that creates a secure session between the client and the server before proceeding with the audio transmission. The main advantage of this alternative security mechanism is that it secures the transmitted audio signal using a lightweight secure-socket layer (SSL) approach that is better suited to real-time requirements, since the RSA-based approach (section B) proved to be too complex, both when encrypting the whole signal to be transmitted and when encrypting the watermark signal only.
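The second security layer can be illustrated from the client side. The following is a minimal sketch of preparing a verified TLS context before any audio is uploaded; it uses Python's standard `ssl` module purely for illustration (the prototype described in this work was built with PHP), and the function name and the TLS 1.2 floor are assumptions of this sketch.

```python
import ssl

def make_client_context():
    """Create a TLS client context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()  # certificate and hostname checks are on by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy: refuse older protocol versions
    return context

context = make_client_context()
```

Such a context could then be passed to an HTTPS client, for example through the `context` argument of `urllib.request.urlopen`, so that the watermarked .wav file only leaves the client once the secure session has been established.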

IV. PROPOSED QURAN RECITATION RECOGNITION FRAMEWORK
The proposed Quran recitation recognition framework is shown in Figure 7, and comprises two main parts: the encoder at the sender-side and the decoder at the receiver-side. Biometric fingerprints were simulated using textual bitstreams of the user's name for demonstration of the initial prototype. In reality, the text-string would be replaced by a bitstring of the fingerprint supplied by the client. The encoding part is at the sender's side, where the voice input of a given student (reciter) is received and, at the same time, a database of all students in the course holds a fingerprint/watermark for each student/trainer. The fingerprint could be any type of user-specific signal/signature from a biometric database (or a new biometric signature), which is also input into the application. In Figure 7, the input voice signal is initially in analog format and has active and inactive periods; the signal is sampled and quantized into a digital signal. Embedding is done on the central region of the detail components (e.g. the CD component from section III) of the signal, whilst avoiding the most important information components (e.g. the CA component from section III) that contain the main recitation signal components. The process of watermark embedding is illustrated in section V.
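The simulated fingerprint described above, a textual bitstream of the user's name, can be sketched as follows. The UTF-8 encoding, the function names and the sample name are assumptions made for illustration; this work does not specify how the text-string was converted to bits.

```python
def text_to_bits(text):
    """Encode a name as a list of bits (8 bits per UTF-8 byte, most significant bit first)."""
    return [int(bit) for byte in text.encode("utf-8") for bit in format(byte, "08b")]

def bits_to_text(bits):
    """Rebuild the name from a bit list produced by text_to_bits."""
    groups = (bits[j:j + 8] for j in range(0, len(bits), 8))
    data = bytes(int("".join(str(b) for b in group), 2) for group in groups)
    return data.decode("utf-8")

# Illustrative reciter name (hypothetical)
bits = text_to_bits("Ali")
name = bits_to_text(bits)
```

In a deployment that uses real biometric data, the bit list would come from the fingerprint template rather than from a name, while the rest of the pipeline would stay the same.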
Following the embedding phase, the Quran signal must not be perceptibly altered since, owing to its sensitivity, even a small change would render the signal invalid. The experiments showed that the embedded watermark produced only a low noise effect, without altering any fundamental characters, Tajweed or diacritic pronunciations in the Quran recitation (which are mainly found in the unaltered CA components). Additionally, the encoded signal may undergo security operations as described in sections III.B and III.C for the RSA and HTTPS/SSL approaches, respectively. This work initially considered and applied the RSA cryptographic algorithm, which was then replaced by the HTTPS approach with SSL in order to avoid the high complexity and long processing delays incurred when the RSA algorithm encrypts and decrypts the audio signal. In our experiments, the SSL approach was found to be significantly better than the RSA algorithm for real-time audio-streaming applications, owing to its lower complexity. Figure 8 summarizes the stages of data-flow at the sender's side prior to transmission.
At the receiver's side, when this information is received, the decoding process goes through the following steps: decrypting the signal (optional; only when RSA was used), identifying the watermark portions in the signal, extracting the watermark to regenerate the original watermark code, and then comparing this watermark with the watermark stored in the database. If there is a match, the reciter is successfully identified and validated in the system; otherwise, the signal is discarded as invalid. In the case of a new reciter, the new signature/watermark is stored, verified by the receiver institution, and usable thereafter. The principal system architecture diagram, combining all functional components at the sender and receiver sides, is summarized in Figure 9.

V. ANALYSIS AND RESULTS
An example of the watermark embedding/extraction process is illustrated as follows:
1) Read the original wave signal i (Figure 10).
For demonstration purposes, the RSA approach is illustrated as applied to the watermark. Here, security is checked, and the embedding/extraction processes operate on the encrypted watermark rather than on the watermark itself. An example of encrypting and decrypting the normalized watermark is shown in Figure 14.
2) The server-end is where the whole application is accessed.
- The PHP programming language was used for developing the basic services of the application, such as the uploading function.
- The HTTPS protocol was used for securing all communications between the client-side and the server-side.
- The operations of watermark embedding, extraction and checking of possible attacks on the watermarked signal were implemented as three Matlab programs.
These three programs have been used as executable files within our PHP web page.These programs are the core of this proposed audio prototype.The complexity of our scheme is polynomial, which makes our approach realizable with acceptable running-time performance.It remains to note that the checking tasks have worked as required with an accuracy rate of 100%.
The client/server prototype gives the user the choice of either focusing only on robustness (watermark embedding and extraction) or introducing security (encrypted watermark embedding and extraction) as an additional requirement during communications. The prototype must be downloaded and installed on a server machine, which can then be accessed for transmitting audio from the client side and receiving audio at the instructor side. Hence, the sender and receiver both access the same application/interface, but with different uses and access privileges. Figure 15 shows a snapshot of the application. The steps for using the application are as follows: first, the client records the recitation; the reciter then embeds the watermark in the recitation; following this, the reciter submits the recorded recitation and sends the results. Finally, the audio signal is encrypted and watermarked, ready for transmission to the instructor/receiver side. The receiver-side operation proceeds with a login as an admin, i.e. the Quran instructor/evaluator, who is presented with the updated recitations received.

VI. COMPARISON WITH THE STATE OF THE ART AUDIO WATERMARKING LITERATURE
The scheme presented in this work provides a number of practical advantages and enhancements compared to some existing schemes, which can be summarized as follows:
- First, the proposed approach only requires embedding a small amount of information (e.g. only a few data bits to hide), thereby offering good capacity in terms of the complexity of representing the reciter data, and is adaptable to the identification process as required in real-time applications.
- Second, the detail coefficients vector of the wavelet transform was exploited during the watermarking process, since this resulted in no considerable alteration to the quality of the audible sound. Hence, one further enhancement over existing approaches in the literature is that all embedded data was encoded entirely into the detail coefficients vector. In contrast, modifying the approximation coefficients vector of the wavelet transform (as done in other schemes) has a direct impact on the signal quality.
- Third, the proposed approach was found to be more robust against the various attacks applied, as compared to the related works in [23] and [28].
- Fourth, a secure process is achieved in the proposed scheme that confirms the identification of the reciter and detects any false/unauthentic reciters.
- Finally, it is worth noting that the use of the improved audio-watermarking scheme proposed here for highly sensitive Quran voice signals (and thus the constraints consequently imposed on the embedding scheme) is itself novel and not found anywhere in the related literature.
Many works in the audio-watermarking literature have focused on the robustness and inaudibility/capacity performance of various approaches, using either the Stirmark Benchmark attacks or self-simulated attack scenarios in which a number of influential parameters are varied. In this study, we present our results in comparison with two closely related studies, after matching the attack scenarios, types and parameter values used in those studies (Table 2). The results obtained with our approach were found to be better than the results presented in [23] in terms of the normalized correlation coefficients (NCs), since all NC values were closer to 1 than those in [23]. Furthermore, the proposed approach provides highly significant results in terms of Bit Error Rate (BER) values, which are close to 0 in almost all attack scenarios. This suggests that the proposed approach is more robust against the attacks considered in the tests than the approach in [28] (Table 2).
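The two metrics used in this comparison can be computed as sketched below. The NC formula shown is one commonly used definition (inner product divided by the product of the norms); whether it matches the exact definition used in this work or in [23] is an assumption, and the bit patterns are illustrative only.

```python
import math

def normalized_correlation(w, w_ext):
    """NC between the original and extracted watermarks (1.0 means identical up to scale)."""
    num = sum(a * b for a, b in zip(w, w_ext))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in w_ext))
    return num / den if den else 0.0

def bit_error_rate(bits, bits_ext):
    """Fraction of watermark bits that differ after extraction (0.0 means no errors)."""
    if len(bits) != len(bits_ext):
        raise ValueError("bit sequences must have the same length")
    errors = sum(1 for a, b in zip(bits, bits_ext) if a != b)
    return errors / len(bits)

# Illustrative watermark bits and an extraction with a single flipped bit
original = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 1, 0, 0, 0, 1, 0]

nc = normalized_correlation(original, extracted)
ber = bit_error_rate(original, extracted)
```

An attack that leaves the watermark intact gives NC = 1 and BER = 0, which is the direction of the improvements reported above.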

VII. CONCLUSION
The authentication scheme presented in this work provides a robust, secure and practical approach that achieves the low complexity required for ensuring real-time authenticity of sensitive speech data. Arabic Quran recitations were taken as a case study during experimentation because of the sensitive nature of the recitation, which introduced additional complexities to overcome in contrast to the ordinary speech data mainly addressed in the literature. The main novelty of this work lies in our application and enhancement of existing audio-watermarking techniques under the constraints of sensitive voice data, which should not be altered, with the further requirement that any embedded data remain inaudible. The mechanism executes a number of functional stages at the sender and receiver sides and avoids distortion of the intelligible and audible Quran input signal, and therefore successfully addresses the sensitivities of the digital Quran audio signal. Following experiments with several protocol variations, our final solution employed the DWT over an HTTPS protocol in order to achieve reduced complexity and online authentication in real time. Notably, our contribution compared very well with the other related approaches in the literature and provided enhanced results for our key metrics of interest, namely robustness, inaudibility and capacity, enabling us to achieve real-time authentication.
The Quran recitation recognition framework and prototype produced in this work enable Quran learning institutions to authenticate the student identity/reciter over an unreliable network such as the Internet, in cases where remote/distance learning is required. The work in this paper is also very useful for student-evaluation purposes in a distance-learning environment, particularly where certificates are issued by an institution following student/client verification. The prototype was successfully tested and was able to distinguish authentic from non-authentic client identities. Finally, such a system could also be employed for more general verification purposes, or by other similar online learning centers/institutions requiring user voice authentication before issuing academic or other certificates.

Fig. 1. Classification of potential Audio-Speech Watermarking Techniques for the Holy Quran

Case 1
Since the embedding process only concerns the $CD_i$ component, we have:
$$CD_{iw} = (1-t)\,N_w + t\,CD_i$$
where $s_1$, $s_2$ are the dimensions of the watermark image $w$, and $k$ is the length of the normalized watermark image.
Hence, after applying the inverse wavelet transform to the components as follows, we obtain the audible watermarked original audio signal:
$$i_w = \mathrm{DWT}^{-1}(CA_i, CD_{iw})$$
Algorithmic steps:
1. Read the original wav signal $i$.
2. Compute the wavelet components of $i$ ($CA_i$, $CD_i$).
3. Create a watermark image $w$ (representing the reciter name).
4. Normalize $w$ to obtain $N_w$ (where $N_w = w/255$).
5. Perform the watermark embedding: $CD_{iw} = (1-t)\,N_w + t\,CD_i$
6. Apply the inverse wavelet transform ($\mathrm{DWT}^{-1}$) to obtain $i_w$ (the watermarked original signal), which is sent over the SSL transmission line.
2) Watermark Extracting Scheme
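The embedding steps above, together with their algebraic inverse, can be sketched end to end as follows. This is a minimal illustration under several assumptions that are not taken from this work: a Haar wavelet, a watermark built from the bytes of the reciter name rather than from an image, embedding in the first k detail coefficients, a key value of t = 0.95, and a synthetic sine wave standing in for the recorded signal. The extraction shown simply solves step 5 for N_w and therefore assumes that the receiver can recompute CD_i from the original signal; it is not a statement of the extraction procedure used in this work.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence: returns (CA, CD)."""
    ca = [(x[j] + x[j + 1]) / SQRT2 for j in range(0, len(x), 2)]
    cd = [(x[j] - x[j + 1]) / SQRT2 for j in range(0, len(x), 2)]
    return ca, cd

def haar_idwt(ca, cd):
    """Inverse one-level Haar DWT."""
    out = []
    for a, d in zip(ca, cd):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out

def embed(signal, name, t):
    """Steps 2 to 6: blend the normalized watermark into the detail coefficients."""
    ca, cd = haar_dwt(signal)                        # step 2
    nw = [b / 255.0 for b in name.encode("utf-8")]   # steps 3-4: N_w = w / 255
    if len(nw) > len(cd):
        raise ValueError("watermark longer than the detail vector")
    cd_w = list(cd)
    for j, n in enumerate(nw):                       # step 5: CD_iw = (1 - t) N_w + t CD_i
        cd_w[j] = (1.0 - t) * n + t * cd[j]
    return haar_idwt(ca, cd_w)                       # step 6

def extract(watermarked, original, k, t):
    """Algebraic inverse of step 5 (assumes CD_i of the original is available)."""
    _, cd_w = haar_dwt(watermarked)
    _, cd = haar_dwt(original)
    nw = [(cd_w[j] - t * cd[j]) / (1.0 - t) for j in range(k)]
    values = [min(255, max(0, round(n * 255.0))) for n in nw]  # denormalize and clamp
    return bytes(values).decode("utf-8", errors="replace")

# Synthetic 440 Hz tone sampled at 8 kHz (stand-in for a recitation)
signal = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(64)]
watermarked = embed(signal, "reciter-01", t=0.95)
recovered = extract(watermarked, signal, k=len("reciter-01"), t=0.95)
```

Only the detail coefficients are touched, so the approximation coefficients, which carry most of the recitation content, pass through the round trip unchanged.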

Fig. 5. Encoding Scheme based on Robustness and Security
The algorithmic steps for embedding and encryption are summarized as follows:
1. Read the original wav signal $i$.
2. Compute the wavelet components of $i$ ($CA_i$, $CD_i$).
3. Create a watermark image $w$ (representing the reciter name).
4. Normalize $w$ to obtain $N_w$ (where $N_w = w/255$).
5. Encrypt $N_w$ (based on the RSA algorithm) to obtain $\mathit{Nencry}_w$.

Fig. 6. Extraction Scheme based on Robustness and Security
The algorithmic steps for extracting an RSA-encrypted signal are summarized as follows:
1. Read the attacked signal ($i_{wa}$), derived from the watermarked signal ($i_w$) following an attack.
2. Compute the wavelet components of $i_{wa}$ ($CA_{iwa}$, $CD_{iwa}$).
3. Read the watermark image $w$ (e.g. the reciter name as a bit-stream/text-string).
4. Normalize $w$ to obtain $N_w$ (where $N_w = w/255$).
5. Encrypt $N_w$ (based on the RSA algorithm) to obtain $\mathit{Nencry}_w$.
6. Perform watermark extraction using: $\mathit{Nencry}_{wa} = \frac{1}{1-t}\,CD_{iwa} - \frac{t}{1-t}\,\mathit{Nencry}_w$
7. Decrypt $\mathit{Nencry}_{wa}$ and denormalize it to obtain $w_a$.

Fig. 7. Framework Diagram of Functional Blocks at the Sender and Receiver

Fig. 8. Detailed Analysis of Data-Flow at the Sender's Side

Fig. 10. Original Audio File
2) Calculate the wavelet coefficients of i (CAi and CDi) on one level (Figure 11).

Fig. 11. Calculate the wavelet coefficients of i (CAi and CDi) on one level
3) Embed the normalized watermark w in the CD coefficients of the signal i, at the center of CDi (Figure 12).
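Placing the watermark at the center of the detail vector, as in step 3, can be sketched as follows. The helper names, the rounding of the start index and the example values are assumptions made for illustration; this work does not state how the central region is computed.

```python
def center_slice(cd_length, watermark_length):
    """Return (start, stop) indices of a watermark-sized window centered in CD."""
    if watermark_length > cd_length:
        raise ValueError("watermark longer than the detail vector")
    start = (cd_length - watermark_length) // 2
    return start, start + watermark_length

def embed_center(cd, nw, t):
    """Blend the normalized watermark into the central coefficients of CD only."""
    start, _ = center_slice(len(cd), len(nw))
    out = list(cd)
    for offset, n in enumerate(nw):
        out[start + offset] = (1.0 - t) * n + t * cd[start + offset]
    return out

# Illustrative values: 8 zero-valued detail coefficients and a 2-sample watermark
window = center_slice(32, 10)
cd_marked = embed_center([0.0] * 8, [1.0, 1.0], t=0.5)
```

Coefficients outside the central window are left untouched, so the start and end of the recording carry no watermark energy.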

Fig. 14. Illustration of RSA-based Encryption and Decryption on the Watermarked-Signal
A website was developed with a two-tier architecture, as follows:
1) The client-end can access the application through any web browser (e.g. Internet Explorer, Mozilla, Chrome, etc.).
2) The server-end is where the whole application is accessed. The PHP programming language was used for developing the basic services of the application, such as the uploading function.

TABLE I
Internet-