Video Watermarking System for Copyright Protection based on Moving Parts and Silence Deletion

In recent years, video watermarking has emerged as a powerful technique for ensuring copyright protection. However, a watermarking system should satisfy several important properties: the lowest possible level of distortion, high transparency and transparency control, integrity of the watermarked video, and robustness against attacks that can be applied to destroy the embedded watermark. In this paper, we propose a video watermarking system that hides a watermark in both the visual and audio streams to ensure the integrity of the watermarked video. Specifically, we propose the moving block detection (MBD) algorithm for hiding the watermark in the moving parts of the original visual stream of the video. The MBD algorithm ensures that a minimal amount of distortion is caused by embedding the watermark. The MBD uses entropy to find the moving parts of the visual stream in which to hide the watermark. The process of hiding in the visual stream is performed using DWT to ensure both transparency and resistance against attacks. We employ the power factors of DWT to control the level of transparency. In addition, we propose the silence deletion algorithm (SDA), which generates a pure original audio stream by removing the noise from the original audio stream to form the hiding place of the watermark within the audio stream. DCT is employed to hide the watermark within the pure original audio stream to ensure resistance against attacks. Under a threat model that includes bilinear, curved, and LPF geometric attacks and compression and Gaussian noise non-geometric attacks, the experimental results demonstrated that the proposed system outperformed four similar systems: keyframe-, I-frame-, spread-spectrum-, and LSB-based systems.

Keywords: Watermark; audio stream; visual stream; moving block; silence deletion; DWT; DCT; attacks


I. INTRODUCTION
Facing the ever-growing quantity of digital videos that are transmitted, shared and exchanged over the Internet, illegal copying and unreliable distribution of digital content have become serious, alarming problems.
Importance of video watermarking. Video watermarking can be defined as the process of hiding a watermark in a video [1,2]. This watermark can be an image, audio, or text file. The importance of video watermarking is due to its valuable applications, such as authentication, tamper detection, and fingerprinting [3,4,5]. One of the most important applications of video watermarking is copyright protection [6,7]. To demonstrate this feature, suppose that a company has developed a special tool that contributes to resolving a critical issue. The solution is recorded as a video and transmitted via the Internet. To ensure the product ownership, the logo of the company is hidden within the video so that if an attacker tries to steal this product, the company can prove that the product is its own invention by extracting the hidden logo.
Despite the benefits that are provided by video watermarking, it is not without problems. To define these problems, we must examine the general scenario of a video watermarking system, which is illustrated in Fig. 1.
Fig. 1 shows that the original video is manipulated to hide the original watermark. This process is called the embedding stage, which is performed at the sender side. At the receiver side, the inverse process, which is called the extraction process, is executed; this yields the original video and the extracted watermark. Finally, the original watermark and the extracted watermark are matched to ensure the similarity.
Statement of the problem and the corresponding research questions. According to Fig. 1, embedding a digital watermark within a video ensures the copyright protection. However, the embedding process causes distortion of the original video. If this distortion is observed, the attacker can infer that this video is protected by a watermarking technique. Therefore, the original video (prior to the embedding process) must match the watermarked video (after the embedding process). Thus, the corresponding research question is as follows: How can the matching between the original video and the watermarked video be ensured? In addition, hiding a watermark within the visual stream of the original video only leads to an incomplete watermarking process because the video has another component (the audio stream) and the video file cannot be represented by only one part. This situation leads to the following research question: How can the integrity of the watermarked video be ensured? Moreover, the transparency of the embedded digital watermark, namely, the invisibility of the digital watermark to the naked human eye, is a critical issue and leads to the following research question: How can the transparency of the embedded digital watermark be ensured [8,9]? Regarding ensuring transparency, another issue arises, which is related to controlling the level of the transparency that is realized after the video watermarking process. The corresponding research question is as follows: How can the transparency level be controlled to render the digital watermark invisible, semivisible, or fully visible in the watermarked video [10,11]? In addition, the attacker can manipulate the watermarked video by applying geometric or non-geometric attacks, such as a low-pass filter (LPF), rotation, compression, or noise addition [12,13], which results in the destruction of the extracted digital watermark. The corresponding research question is as follows: How can robustness against these types of attacks be ensured?
www.ijacsa.thesai.org
Motivated by the five research questions that are posed above, the construction of a robust video watermarking system that ensures copyright protection is essential.
By selecting a suitable location for the watermark to be hidden, we can ensure the matching between the original video and the watermarked video. In addition, employing frequency-based techniques, rather than spatial-based techniques such as least significant bit (LSB), endows the process of hiding with higher resistance against potential attacks.
The main contributions of this work are as follows:
• In response to the first three research questions, we propose a novel watermarking approach that ensures copyright protection while satisfying the requirements of video watermarking (no distortion and transparency). The process of hiding is performed in both the audio and visual streams. The no-distortion and transparency requirements are satisfied by hiding the watermark within the moving parts of the original video file with the help of the discrete wavelet transform (DWT). In the audio stream, the hiding process is performed using the discrete cosine transform (DCT).
• In response to the fourth research question, the transparency can be controlled (to high transparency or low transparency) in the proposed approach by adjusting the power factors of DWT.
• In response to the last research question, the proposed approach is resistant to various types of attacks, such as rotation, compression, LPF, salt and pepper, and Gaussian noise. The resistance is guaranteed in the video stream part by hiding the watermark within moving objects in the original video. Meanwhile, the resistance of the watermarked audio stream part is realized via a proactive silence-deletion-based step.
The remainder of the paper is organized as follows: Section II reviews the related works. Section III describes our proposed system, along with its components' roles, in detail. Security analysis is discussed in Section IV. Section V presents the metrics that were considered, followed by the experimental results and evaluations in Section VI. Finally, we present the conclusions of this work in Section VII.

II. RELATED WORK
Video watermarking approaches can be classified under two main domains: the spatial domain and the frequency domain. Each domain has its own techniques, as illustrated in Fig. 2.

A. Spatial Domain
In this domain, a frame of the video (the image) is manipulated at the pixel level, where the color space is employed in the embedding process. The most common techniques that are used in this domain are reviewed below.

1) Additive watermarking technique:
This technique focuses on the intensity of the pixels in the image, where the watermark will be hidden as a spread noise in terms of (-1, 0, +1) [14].
2) Least significant bit (LSB) technique: This is an old technique. Its key strategy is to hide the watermark within the least significant bit since this produces the smallest distortion after hiding. Many enhancements over LSB can be applied, which involve encryption, randomization, or both. LSB can be used in both image and audio files [15].
3) Texture mapping coding technique: This technique is used only with noisy images. A noisy image is an image that contains many textured areas, which are the best places to hide the watermark [16].

4) SSM-modulation-based technique:
This technique mainly utilizes spread-spectrum methods to modulate the color signal and embeds the watermark in the energy of the color wave [17].
The spatial-domain techniques are highly vulnerable to most attacks according to [18,4].Hence, the focus of research is moving toward the frequency domain.

B. Frequency Domain
In this domain, the color waves of the pixels are considered and the frame of the video is converted from the spatial domain to the frequency domain via mathematical transforms. The previous works can be classified into three main classes, as illustrated in Fig. 3.

1) Hiding only in the visual stream of the video:
In [19], the authors proposed a watermarking method in which the watermark is represented as a label and embedded in pixels of each frame via DCT. For this purpose, a search table of pixel patterns and their sign sequences of eight low DCT coefficients are exploited. The main advantage of this approach is that it is robust against changes in the group of pictures. Focusing on the transparency requirement, Ahmed et al. [20] proposed a blind video watermarking scheme. The watermark is embedded into preselected frames of the original video. These frames are selected based on a key value and are referred to as key frames. Then, the key frames are converted into the YUV color system and the watermark is hidden in the luminance layer (Y layer) using DWT to ensure transparency. To make the process blind, the hidden watermark is extracted from the watermarked video without the original video by applying the inverse DWT to the key frames. This approach provides high invisibility of the watermark and requires less processing time compared to the previous approach since the hiding process is not applied on all frames of the original video. However, the process of selecting key frames may not be suitable for many video file formats.
Another watermarking method is presented in [21], which uses static 3-D DCT to hide a watermark in video. The key strategy is to identify a scene change in the video and convert the frames into the YUV color space to select the luminance layer (Y) for the hiding process. This model yields satisfactory results for videos that have low motion activity; in other cases, there is noticeable distortion. Similar to the previous work, the authors of [22], who developed the previous model, used dynamic 3-D DCT to realize the benefits of utilizing the frequency of the video sequences, which provides more robustness against attacks.
In [23], a copyright video protection approach is proposed. DWT is used in the hiding process, where it is implemented on both the watermark and the I-frames that represent the location for hiding. Instead of converting the I-frames from RGB into YUV, the authors use the YCbCr color space to realize the transparency objective. This work was subsequently enhanced by the same authors, who focused on capacity and security features [24]. The capacity feature is realized by manipulating the original video at the bit level, while the security feature is realized by encrypting the watermark prior to hiding it.
2) Hiding only in the audio stream of the video: Based on an audio stream compression method, Petrovic et al. proposed an audio stream watermarking approach [25]. They focused on minimizing the processing requirements at the embedding side while maintaining high perceptual quality. The key strategy is to employ advanced audio coding (AAC) technology. Two main steps are performed: (1) preprocessing and (2) marking. In the preprocessing step, a host signal is marked by one or more hiders. Each hider embeds a string of identical symbols. In the second step, two or more distinct copies of the host signal are retrieved from the memory to be input to a multiplexer (MUX) when the creation of a marked copy is requested. However, this approach has a substantial drawback: it is vulnerable to compression non-geometric attacks.
The authors of [26] were motivated to deal with the audio stream because speech signals are seldom used as hosts due to the narrow-bandwidth limitation, despite their popularity in communication applications, such as military, bank, phone and network security. Therefore, they proposed a spread-spectrum-based technique for hiding the watermark within the audio stream. The authors combine direct-sequence spread spectrum (DSSS) technology with a simple basic frequency mask to conduct the hiding process.
In [27], a three-step audio watermarking system is proposed. The first step is to use the standard LSB technique. The second step is to search for the level of audio that is closest to the level of the original audio after watermarking. The search process depends on the minimum error level. The main objective of the second step is to ensure transparency. The third step utilizes error diffusion to ensure the high capacity of the proposed system.
To realize high capacity when hiding data in the audio signal, the authors of [28] utilized the fast Fourier transform (FFT) spectrum. The key strategy is to divide the FFT spectrum into short frames and change the magnitudes of selected FFT samples using Fibonacci numbers. Using Fibonacci numbers, it is possible to change the frequency samples adaptively.

3) Hiding in both the visual and audio streams of the video:
A self-adaptive approach is proposed in [29] for hiding a watermark within both the visual and audio streams. The authors relied on two main processing steps. First, the watermark is constructed from the audio stream of the video, where the features of the audio signal are extracted and used to generate the watermark. Then, the generated watermark is embedded within the visual stream via DCT.
Aiming at providing a solution with robust and fragile aspects to guarantee authentication and integrity, the authors of [30] proposed an approach that uses watermarks in combination with content information. The authors used the same strategy as in the previous work. The main difference is that they used a seed-based method in the hiding process.

III. PROPOSED SYSTEM
In this section, we introduce our proposed video watermarking system, which satisfies the integrity, transparency, and robustness requirements. The section is organized as follows: a threat model is defined, followed by the corresponding architecture of the proposed video watermarking system. Then, the role of each component of the system architecture is described in detail.

A. Threat Model
In the context of defining the threat model, we define the attacker, his/her objective, the type of the attack, and the capabilities of the attacker that are used to achieve the objective.

For an original video ( ) with both a visual stream ( ) and an audio stream ( ), ( ) is defined as: (1)
After hiding the original watermark within both and , a watermarked video is generated as: (2)
where and ⋃
The type of the attack is active. Therefore, the objective of the attacker (man in the middle) is to destroy the embedded watermark, as illustrated in Fig. 4. To accomplish his/her objective, the attacker uses geometric or non-geometric attacks. Table 1 lists the capabilities of the attacker.
Table 2 shows the effects of the previously described attacks on an image (or video frame).

B. Our Proposed System Architecture
The framework of the proposed system consists of the sender and the receiver of the watermarked video and the attacker. All three are connected via a network. The system is managed by eight components ( , , , , , , and ), as shown in Fig. 5.

Main mission | Location
Recording the original video. | Sender side.
Extracting original visual and audio streams. |
Finding the place of hiding within the visual stream. |
Hiding process within the visual stream. | Sender & receiver sides.
Hiding process within the audio stream. | Sender & receiver sides.
Merging the watermarked visual and audio streams. | Sender side.
Table 3 lists the components and identifies the main mission of each component and where it is installed.
The mission of each component is integrated with the missions of the others. The following explains the roles of the components.

C. Roles of the Components
1) Role of the component: This component is responsible for creating the original video (both the visual and audio streams). Any multimedia recorder can be used here; the generated video file can be converted later into other formats. We used the Zoom program for this purpose [32].
2) Role of the component: This component is responsible for obtaining the visual and audio streams of the recorded original video separately. At the end, the two streams are ready for the hiding process. We use the Wondershare Filmora multimedia tool for this purpose [33].
3) Role of the component: This component is responsible for identifying a suitable place for the original watermark to be embedded. Selecting the suitable place to hide the original watermark mainly contributes to ensuring matching between the original video and the watermarked one. The component executes the moving part detection approach (MPDA), as described below.

D. Moving Part Detection Approach (MPDA)
Randomly selecting a part of a frame for hiding is a poor solution. For example, hiding in the static parts of a frame makes the distortion more noticeable after the watermark has been hidden. By contrast, hiding in the moving parts of the frame is an effective strategy because the moving parts of the frame can be viewed as a type of noise, which is referred to as the dirty window effect [31] and is demonstrated in Fig. 6.
In Fig. 6, two frames of a Miss America contestant are shown, in which the woman is speaking. In the frames within a video, the moving part (i.e., her mouth) appears as noise. Inserting a watermark leads to distortion, where foreign information is added to the pure visual stream of the original video. However, inserting a watermark within such a moving part will not lead to a noticeable change. The reason is that the result of the insertion process can be viewed as noise over noise. This, in turn, leads to unnoticeable distortion, which contributes to the matching of the original visual stream with the watermarked one.
The component separates the original frame into moving and non-moving parts, as illustrated in Fig. 7.
To identify the blocks of the moving parts from the original visual stream (rather than the non-moving parts), we utilize the entropy metric. In image processing, entropy is used to classify textures: a texture might correspond to a known entropy value if patterns repeat themselves in approximately regular ways, which is true in videos in which the frames are periodically repeated to create the motion. Specifically, the watermark is embedded in the moving part of each color frame in all three RGB channels. Several beginning frames of the original visual stream are selected as references. Then, the state of each block that is involved in the current frame is determined (moving or non-moving), which is accomplished by comparing the entropy value of each block (in the current frame) with the corresponding entropy values of the reference blocks. If the difference between the entropy values is high, a high disorder or high variance is detected. Thus, the current block is moving; otherwise, it is non-moving. Entropy is already implemented as a function in MATLAB. Fig. 8 illustrates this strategy.
Formally, each color channel is divided into blocks of size . Let and . Then, each block can be represented as: where { }
To accurately determine the entropy value, which will be used to decide whether a block is moving or non-moving, we use a normalization process. The average of all entropy values from all blocks is calculated as: where denotes the entropy function.
Any block can be evaluated as moving or non-moving as follows: where denotes the entropy value of the specified block.
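The entropy comparison described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' Algorithm 1: the function names, the block size, the use of a single reference frame, and the decision rule (absolute entropy difference, normalized by the average block entropy of the current frame, compared with a hypothetical `threshold`) are all assumptions introduced here.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy (in bits) of the pixel values in a 2-D block."""
    pixels = [p for row in block for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_blocks(channel, size):
    """Yield ((row, col), block) for non-overlapping size x size blocks."""
    for r in range(0, len(channel) - size + 1, size):
        for c in range(0, len(channel[0]) - size + 1, size):
            yield (r, c), [row[c:c + size] for row in channel[r:r + size]]


def moving_blocks(reference, current, size=2, threshold=0.5):
    """Return positions of blocks whose entropy differs strongly from the reference.

    Assumed rule: |H(current block) - H(reference block)| divided by the
    average block entropy of the current frame must exceed `threshold`.
    """
    ref = {pos: block_entropy(b) for pos, b in split_blocks(reference, size)}
    cur = {pos: block_entropy(b) for pos, b in split_blocks(current, size)}
    mean_h = sum(cur.values()) / len(cur) or 1.0  # avoid dividing by zero
    return [pos for pos in cur if abs(cur[pos] - ref[pos]) / mean_h > threshold]
```

In this sketch a block that is flat in the reference frame but textured in the current frame is reported as moving, while unchanged blocks are not. A histogram-based entropy does not notice pixels that only change position inside a block, so a practical detector would likely need a finer criterion.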
Algorithm 1 presents the pseudocode of the mission of the component.
1) Role of the component: This component is responsible for hiding the original watermark within the moving blocks that are obtained from the executed mission of the previous component. The mission of the component is performed using DWT. In addition, it makes it possible to control the transparency of the embedded watermark.

E. DWT-Based Hiding Approach (DWTHA)
By definition, DWT generates a sparse time-frequency representation of an input signal. The output of DWT is four subbands of data: a low/low frequency band LL, a low/high frequency band LH, a high/low frequency band HL, and a high/high frequency band HH [34]. Most of the information of the input signal is included in subband LL and the other subbands are viewed as shadows of the input signal that have decreased appearance quality, which gives DWT an advantage: multi-resolution. The key power of the multi-resolution feature is that the localization characteristics match the theoretical models of the human visual system (HVS). Depending on the localization characteristics of the multi-resolution feature, a watermark can be embedded within any of the four generated subbands. However, embedding a watermark within a high-frequency subband guarantees high transparency but leads to low resistance against attacks. Meanwhile, embedding a watermark within the low-frequency LL subband results in high resistance against attacks but leads to noticeable distortion (thereby decreasing the quality of the watermarked signal) [35].
To solve this problem, the moving blocks are converted from the RGB color system into the YUV color system. Then, the Y layer, which refers to the luminance layer, is extracted. Finally, DWT is applied on the Y layer and the watermark is embedded within the subband, as illustrated in Fig. 9.
As illustrated in Fig. 9, the hiding process is performed within the Y layer of the detected moving block. Both the transparency of the embedded watermark and the resistance against attacks are ensured by hiding each resultant subband in the corresponding subband.
where W represents the wavelet coefficient function; and denote the dilation and translation parameters, respectively; and is the length of the signal .
For images (i.e., frames), two-dimensional DWT is used. Two-dimensional DWT is derived from one-dimensional DWT. A two-dimensional scaling function and three two-dimensional wavelets are required, as follows:
The expanded and translated basis functions are: where { }
Then, the discrete wavelet transform function of size is:
The two previous formulas are applied in the luminance layer (Y) of the moving blocks , where (1) each block is of size and (2) . Thus, DWT decomposes the two-dimensional moving block into wavelet-like matrices (i.e., the four subbands that are illustrated in Fig. 9). In addition, DWT decomposes the original watermark into the four corresponding subbands.
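For reference, the separable two-dimensional scaling function, the three two-dimensional wavelets, their scaled and translated versions, and the resulting transform of an image $f(x,y)$ of size $M \times N$ are commonly written in the following textbook form. The symbols used here ($\varphi$, $\psi$, $j$, $m$, $n$, $j_0$, and the orientation index $i$) are assumed for illustration and may differ from the authors' notation.

```latex
\begin{align}
\varphi(x,y) &= \varphi(x)\,\varphi(y), \\
\psi^{H}(x,y) &= \psi(x)\,\varphi(y), \qquad
\psi^{V}(x,y) = \varphi(x)\,\psi(y), \qquad
\psi^{D}(x,y) = \psi(x)\,\psi(y), \\
\varphi_{j,m,n}(x,y) &= 2^{j/2}\,\varphi\!\left(2^{j}x-m,\;2^{j}y-n\right), \\
\psi^{i}_{j,m,n}(x,y) &= 2^{j/2}\,\psi^{i}\!\left(2^{j}x-m,\;2^{j}y-n\right),
  \qquad i \in \{H, V, D\}, \\
W_{\varphi}(j_0,m,n) &= \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}
  f(x,y)\,\varphi_{j_0,m,n}(x,y), \\
W^{i}_{\psi}(j,m,n) &= \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}
  f(x,y)\,\psi^{i}_{j,m,n}(x,y), \qquad i \in \{H, V, D\}.
\end{align}
```

In this form, $W_{\varphi}$ gives the approximation (LL) coefficients and the three $W^{i}_{\psi}$ give the horizontal, vertical, and diagonal detail coefficients, which correspond to the four subbands discussed in the text.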

Let , , , and denote the four subbands that represent the output of DWT on the original watermark, which is denoted as . Let , , , and denote the corresponding subbands of a moving block that was extracted from . The hiding process is performed according to the following formulas:
where , , , and denote the watermarked subbands of the moving block. The coefficient vector contains the power factors that are related to the transparency. This vector is used to control the transparency value of the embedded watermark, where . If has high values, then the embedded watermark is visible in the watermarked video (i.e., poor transparency). If has low values, then the embedded watermark is invisible in the watermarked video (i.e., satisfactory transparency). Thus, by adjusting the values of the power factors, full control of the embedded watermark can be realized (visible, invisible, and semi-visible).
Algorithm 2 presents the pseudocode of the mission of the component.
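To make the role of the power factors concrete, the following is a minimal sketch of subband-wise hiding. It is not the authors' Algorithm 2: it assumes a one-level Haar wavelet, an additive rule in which each watermarked subband equals the host subband plus a power factor times the corresponding watermark subband, and a non-blind extraction that has access to the original block. The function names and the `alpha` dictionary are hypothetical, and the LH/HL naming convention varies between references.

```python
def haar_dwt2(img):
    """One-level 2-D orthonormal Haar DWT of a matrix with even dimensions."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    bands = {name: [[0.0] * cols for _ in range(rows)]
             for name in ("LL", "LH", "HL", "HH")}
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            bands["LL"][i][j] = (a + b + c + d) / 2
            bands["LH"][i][j] = (a + b - c - d) / 2
            bands["HL"][i][j] = (a - b + c - d) / 2
            bands["HH"][i][j] = (a - b - c + d) / 2
    return bands


def haar_idwt2(bands):
    """Inverse of haar_dwt2: rebuild the matrix from its four subbands."""
    rows, cols = len(bands["LL"]), len(bands["LL"][0])
    img = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            ll, lh = bands["LL"][i][j], bands["LH"][i][j]
            hl, hh = bands["HL"][i][j], bands["HH"][i][j]
            img[2 * i][2 * j] = (ll + lh + hl + hh) / 2
            img[2 * i][2 * j + 1] = (ll + lh - hl - hh) / 2
            img[2 * i + 1][2 * j] = (ll - lh + hl - hh) / 2
            img[2 * i + 1][2 * j + 1] = (ll - lh - hl + hh) / 2
    return img


def embed(block_y, watermark, alpha):
    """Assumed additive rule: marked subband = host subband + alpha * watermark subband."""
    host, mark = haar_dwt2(block_y), haar_dwt2(watermark)
    marked = {k: [[h + alpha[k] * w for h, w in zip(h_row, w_row)]
                  for h_row, w_row in zip(host[k], mark[k])] for k in host}
    return haar_idwt2(marked)


def extract(marked_y, block_y, alpha):
    """Non-blind extraction (assumption): invert the additive rule per subband."""
    got, host = haar_dwt2(marked_y), haar_dwt2(block_y)
    mark = {k: [[(g - h) / alpha[k] for g, h in zip(g_row, h_row)]
                for g_row, h_row in zip(got[k], host[k])] for k in host}
    return haar_idwt2(mark)
```

With small power factors (for example 0.05 to 0.1) the watermarked block stays close to the host block, while larger values make the watermark progressively more visible, which mirrors the transparency control described in the text.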
1) Role of the component: This component is responsible for manipulating the original audio stream to prepare it for the hiding process. This manipulation is performed in a pre-processing stage via the silence deletion approach, as described below.

F. Silence Deletion Approach (SDA)
Typically, speech signals vary slowly over time. Therefore, if a speech signal is detected over a short time window, it reflects stationary characteristics (i.e., silence parts). Meanwhile, if it is detected over a long time window, it reflects changing characteristics, which lead to various speech sounds. Typically, the first 200 ms of a speech signal (approximately 1600 samples) correspond to the silence parts. In addition, the silence parts can spread over a speech signal [36], as shown in Fig. 10. The key strategy for preparing an audio stream for the hiding process is to detect and delete the silence samples so that the watermark is embedded within the pure original audio stream. This strategy can provide high resistance against compression attacks because the compression attacks delete the silence samples to decrease the size of an audio file. Thus, if a compression attack is applied on a watermarked audio stream, the embedded watermark will not be affected.
Formally, let ζ and λ denote the mean and standard deviation, respectively, of the first 1600 samples of an original audio stream. Then, the noise that is distributed over the audio signal is expressed as:
A sample is categorized as silence or voiced via the following formula:
To represent the original audio signal as a series of zeros and ones, we label the voiced samples as ones and the silence samples as zeros. Thus, the audio signal is decomposed into two non-overlapping windows of voiced and silence samples. The process of marking the silence samples consists of two steps: (1) labeling the silence samples and (2) associating the label with the location of the silence sample. Via these two steps, the silence part is obtained, saved and, finally, deleted from the original audio stream. Later, we reincorporate the silence part after watermarking the original pure audio stream.
Algorithm 3 presents the pseudocode of the mission of the component.
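The labeling, deletion, and later reinsertion of silence samples can be sketched as below. This is an illustrative sketch rather than the authors' Algorithm 3: the decision rule (a sample is voiced when its distance from the reference mean exceeds `k` reference standard deviations), the default `k = 3.0`, and all function names are assumptions introduced here. The silence samples are kept in a dictionary keyed by position so that they can be restored afterwards.

```python
from statistics import mean, pstdev


def label_voiced(samples, n_ref=1600, k=3.0):
    """Label each sample 1 (voiced) or 0 (silence).

    The mean and standard deviation are estimated from the first n_ref
    samples, which are assumed to contain only silence.
    """
    ref = samples[:n_ref]
    mu, sigma = mean(ref), pstdev(ref)
    sigma = sigma or 1e-12  # guard against a perfectly flat reference
    return [1 if abs(x - mu) / sigma > k else 0 for x in samples]


def delete_silence(samples, labels):
    """Split the stream into the pure (voiced) part and a position -> sample map."""
    pure = [x for x, lab in zip(samples, labels) if lab]
    silence = {i: x for i, (x, lab) in enumerate(zip(samples, labels)) if not lab}
    return pure, silence


def restore_silence(pure, silence, total_len):
    """Reinsert the saved silence samples at their original positions."""
    voiced = iter(pure)
    return [silence[i] if i in silence else next(voiced) for i in range(total_len)]
```

A typical use would be to call `delete_silence`, watermark the returned pure stream, and then call `restore_silence` with the watermarked pure stream and the saved dictionary.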

1) Role of the component:
This component is responsible for hiding the original watermark within the pure original audio stream. Here, the process of hiding mainly depends on DCT. The hiding process is performed on the pure original audio stream after silence samples have been deleted, as illustrated in Fig. 11.
Formally, let and denote the pure original stream and silence samples, respectively. Then, (20)
The watermark is embedded within the by modifying the DCT coefficients. DCT is formulated as:
where is the time pure original audio stream series, are the DCT coefficient series, and is the number of samples on which DCT is performed.
Inverse DCT (IDCT) is expressed as:
where is a coefficient that is defined as follows:
When a watermark is embedded within the DCT coefficient, the coefficient is modified:
Then, the corresponding time series are obtained via the IDCT as follows:
where represents the noise that is caused by the modification of the DCT coefficient on the sample in the time domain.
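The following sketch illustrates the idea of hiding watermark bits by modifying DCT coefficients and returning to the time domain via the IDCT. It uses the common orthonormal DCT-II and its inverse, which may differ in scaling from the authors' formulation. The embedding rule (shifting selected coefficients by a hypothetical step `delta`), the choice of coefficient positions, and the non-blind extraction are assumptions made only for illustration.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II scaling factor for coefficient index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def embed_bits(pure_audio, bits, first=1, delta=0.05):
    """Assumed rule: add +delta for a 1 bit and -delta for a 0 bit to
    consecutive coefficients starting at index `first`."""
    coeffs = dct(pure_audio)
    for offset, bit in enumerate(bits):
        coeffs[first + offset] += delta if bit else -delta
    return idct(coeffs)


def extract_bits(marked_audio, pure_audio, n_bits, first=1):
    """Non-blind extraction (assumption): read the sign of each coefficient change."""
    diff = [m - p for m, p in zip(dct(marked_audio), dct(pure_audio))]
    return [1 if diff[first + k] > 0 else 0 for k in range(n_bits)]
```

Because the transform is orthonormal, a change of `delta` in one coefficient spreads over all time-domain samples as a small perturbation, which corresponds to the noise term mentioned in the text.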
2) Role of the component: This component is responsible for adding back the silence samples that are saved in the hash that was used in the silence deletion approach. Therefore, the input of this component is the watermarked pure audio stream and the output is the watermarked audio stream , as illustrated in Fig. 12.

3) Role of the component: This component is responsible for combining the watermarked visual stream and the watermarked audio stream , as inputs, to produce the watermarked video as output, as illustrated in Fig. 13.

IV. SECURITY ANALYSIS
In this section, we prove that the attacks considered in the threat model fail to destroy the embedded watermark. We follow the definition-theorem-proof style in discussing the resistance against both geometric and non-geometric attacks.

A. Security Analysis of Geometric Attacks
Definition 1. A video watermarking system is bilinear attack resistant if the boundaries of the host frame (or image) do not change differently (in length or direction) such that the embedded watermark can be distinguished.
Theorem 1. The proposed video watermarking system is bilinear attack resistant.

Proof 1. Let denote the original image (or frame) where the watermark is hidden, where , and represent the height, width, and four boundary angles (i.e., properties), respectively. After the bilinear attack has been applied, the resultant (distorted) image has changed values of all of these properties. Due to the motion, the moving parts of will also be distorted (i.e., their properties are updated), and this distortion can likewise be represented as changed values of the properties. Since the watermark is embedded within the moving parts of , the distortion that is caused by the hiding process is:
The distortion is sufficiently small to preserve the features of the embedded watermark. Hence, the bilinear attack fails.
Definition 2. A video watermarking system is curved attack resistant if the boundaries of the host frame do not change equally (in an arc manner) such that the watermark can be distinguished.
Theorem 2. The proposed video watermarking system is curved attack resistant.
Proof 2. The same justification as was provided for the bilinear attack can be provided here, while taking into consideration the effect of the curved attack. That is because the effect of the curved attack is similar to that of the bilinear attack, with different property values of the resultant frame. Therefore, hiding within moving parts of the video contributes to the failure of the curved attack.
Definition 3. A video watermarking system is LPF attack resistant if the smoothness of the host frame does not change substantially such that the watermark can be distinguished.
Theorem 3. The proposed video watermarking system is LPF attack resistant.
Proof 3. Let denote the original image (or frame) where the watermark is hidden, where , and represent the height, width, and four boundary angles (i.e., properties), respectively, and suppose the smoothness is at a natural level. After applying the LPF attack, the resultant (distorted) image is denoted as , where denotes the changed smoothness level. The smoothness level of the moving parts of the host frame was originally natural due to the motion . Consequently, is considered a part of that is caused by the LPF attack. Therefore, the watermark is embedded within the frame that has , which, in turn, mitigates the effect of the LPF attack since it can be viewed as a distortion over a distortion. In other words, a part of the effect of the LPF attack ( ) is absorbed by . Hence, this feature of the host frame is preserved and the embedded watermark is not altered. As a result, the LPF attack fails.

B. Security Analysis of Non-Geometric Attacks
Definition 4. A video watermarking system is Gaussian noise attack resistant if the resolution of the pixels in the host frame does not decrease substantially such that the watermark can be distinguished.
Theorem 4. The proposed video watermarking system is Gaussian noise attack resistant.

Proof 4. Let denote the original image (or frame) where the watermark is hidden, where , and represent the height, width, and boundary angles (i.e., properties), respectively. The first six properties are not affected by the Gaussian noise attack and we examine the change in the resolution due to the added noise. After applying the Gaussian noise attack, the resultant (distorted) image has a new resolution. When Gaussian noise is added to the moving parts of the host frame, it is viewed as noise over noise since the motion itself can be viewed as a type of noise, which changes the resolution of the frame when it is viewed by human eyes. Therefore, the Gaussian noise is also absorbed by the noise of the motion. In other words, the embedded watermark is inserted within the noisy part of the host frame, which, in turn, prevents the Gaussian noise attack from destroying the watermark. In the audio stream, the Gaussian noise attack also fails because the watermark is embedded within the pure audio stream and is not substantially affected by this attack; the silence that is deleted is considered to be the place where the noise of the Gaussian attack is added.
Definition 5. A video watermarking system is compression attack resistant if both the resolution and contrast of the pixels in the host frame do not increase such that the watermark can be distinguished.
Theorem 5. The proposed video watermarking system is compression attack resistant.
Proof 5. Let the original image (or frame) in which the watermark is hidden be described by its height, width, and four boundary angles (i.e., its properties), together with its resolution and contrast. After the compression attack is applied, the resultant (distorted) image has a new resolution and a new contrast. Since most of the representation of the watermark is embedded within the LL subband of the Y layer of the moving parts, the new resolution does not affect the embedded watermark. Moreover, because the shadows of the watermark are embedded within the corresponding shadows of the moving parts of the Y layer, the new contrast does not affect the embedded watermark. Therefore, the strategic employment of the multi-resolution feature of DWT contributes to the failure of the compression attack. Regarding hiding in the audio stream, the effect of the compression attack is limited to the space of silence that was deleted before the hiding process. Therefore, the space of hiding (i.e., the pure audio) is not affected and the hidden watermark is kept safe.

V. METRICS
To evaluate the proposed video watermarking system, several metrics are used to measure the quality of video (QoV) after watermarking and the similarity between the original watermark and the extracted one.

A. QoV Metrics
To evaluate the QoV, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics are used. Calculating the PSNR value requires two inputs: a frame from the original video and a frame from the watermarked video. Let $F$ and $F_w$ refer to the original frame and the corresponding watermarked frame, respectively, both of which are of size $M \times N$. Then, the PSNR is represented by:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \quad (28)$$
where $\mathrm{MAX}$ is the maximum possible pixel value (255 for 8-bit frames) and the mean squared error (MSE) is given by:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[F(i,j) - F_w(i,j)\right]^2 \quad (29)$$
A higher PSNR value corresponds to a higher QoV. A lower MSE value also corresponds to a higher QoV, and the optimal QoV is obtained when the MSE value is close to zero.
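As an illustration of the PSNR and MSE computation described above, the following is a minimal sketch in Python (not the Matlab implementation used in this work). It assumes grayscale frames stored as lists of rows and a peak pixel value of 255; the function names `mse` and `psnr` are illustrative only.

```python
import math

def mse(original, watermarked):
    """Mean squared error between two equally sized grayscale frames."""
    rows, cols = len(original), len(original[0])
    total = 0.0
    for row_o, row_w in zip(original, watermarked):
        for p_o, p_w in zip(row_o, row_w):
            total += (p_o - p_w) ** 2
    return total / (rows * cols)

def psnr(original, watermarked, max_value=255.0):
    """PSNR in dB; max_value=255 is an assumption for 8-bit frames."""
    err = mse(original, watermarked)
    if err == 0:
        return math.inf  # identical frames: no distortion at all
    return 10.0 * math.log10(max_value ** 2 / err)

frame = [[10, 20], [30, 40]]    # toy original frame
marked = [[11, 20], [30, 38]]   # toy watermarked frame
print(mse(frame, marked))       # 1.25
print(psnr(frame, marked))
```

Identical frames give an MSE of zero and an unbounded PSNR, which is why the sketch returns infinity in that case instead of dividing by zero.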
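The SSIM metric is used to quantify image quality degradation and to accurately measure the variation of structural information between the original frame $F$ and the watermarked frame $F_w$. SSIM is defined in the context of three components: the luminance, contrast, and structural components. Formally, it is defined as:
$$\mathrm{SSIM}(F, F_w) = [l(F, F_w)]^{\alpha}\,[c(F, F_w)]^{\beta}\,[s(F, F_w)]^{\gamma} \quad (30)$$
where $l$, $c$, and $s$ are the luminance, contrast, and structural comparison functions, and $\alpha$, $\beta$, and $\gamma$ are parameters that are used to control the luminance, contrast, and structural components, respectively.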
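The three-component definition can be made concrete with a simplified sketch. The Python code below is an assumption-laden illustration, not the implementation used in this work: it sets all three control parameters to 1, evaluates the statistics once over the whole frame instead of over local windows, and uses the commonly cited stabilizing constants K1 = 0.01 and K2 = 0.03. The name `ssim_global` is hypothetical.

```python
def ssim_global(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole frames (simplification; see lead-in)."""
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((p - mu_x) * (p - mu_x) for p in xs) / n
    var_y = sum((p - mu_y) * (p - mu_y) for p in ys) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1 = (k1 * max_value) ** 2  # stabilizes the luminance term
    c2 = (k2 * max_value) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

frame = [[10, 20], [30, 40]]    # toy original frame
marked = [[11, 20], [30, 38]]   # toy watermarked frame
print(ssim_global(frame, frame))
print(ssim_global(frame, marked))
```

A windowed SSIM, averaged over local patches, is the usual choice in practice; the global variant is used here only to keep the sketch short.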

B. Watermark Similarity Metrics
Here, we use the correlation coefficient metric that was proposed by Lee et al. [37]. This metric is widely used in statistical analysis, pattern recognition, and image processing. For monochrome digital images, the correlation coefficient is defined as:
$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2\,\sum_{i}(y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the intensity values of the $i$-th pixel in the original watermark and the extracted watermark, respectively, and $\bar{x}$ and $\bar{y}$ are the corresponding mean intensity values. The maximum value of the correlation coefficient metric is 1, which is attained when the two watermarks are identical. When the value of the correlation coefficient metric is 0, the two watermarks are completely uncorrelated. When the value of the correlation coefficient metric is -1, the two watermarks are completely anticorrelated. In this context, we employ the correlation coefficient metric to evaluate the resistance of the proposed video watermarking system against the attacks that are listed in the threat model above.
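The behavior at the values 1 and -1 can be checked with a short sketch. The Python code below assumes that the metric is the standard Pearson correlation computed over pixel intensities, which is consistent with the range described above; the function name `correlation_coefficient` and the toy watermarks are illustrative only.

```python
import math

def correlation_coefficient(a, b):
    """Pearson correlation between two equally sized monochrome images."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    if den == 0:
        raise ValueError("correlation is undefined for a constant image")
    return num / den

watermark = [[0, 255], [255, 0]]   # toy original watermark
inverted = [[255, 0], [0, 255]]    # toy extracted watermark, fully inverted
print(correlation_coefficient(watermark, watermark))
print(correlation_coefficient(watermark, inverted))
```

The first call compares the watermark with itself and the second compares it with its negative, which correspond to the identical and anticorrelated cases.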

C. Audio Watermarking Metrics
To evaluate the audio watermarking performance, we use the PSNR metric, where the original frame and the watermarked frame are replaced by the original audio signal and the watermarked audio signal, respectively. In addition, we use the waveform difference of the audio signals (i.e., before and after watermarking) to graphically demonstrate the similarity between the original audio and the watermarked audio.
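A minimal sketch of both audio metrics follows, in Python. It assumes that the audio signals are equal-length lists of samples normalized to the range [-1, 1], so that the peak value is 1.0; this normalization and the names `waveform_difference` and `audio_psnr` are assumptions made for illustration.

```python
import math

def waveform_difference(original, watermarked):
    """Sample-by-sample difference between two equal-length signals."""
    if len(original) != len(watermarked):
        raise ValueError("signals must have the same length")
    return [w - o for o, w in zip(original, watermarked)]

def audio_psnr(original, watermarked, peak=1.0):
    """PSNR in dB for audio; peak=1.0 assumes normalized samples."""
    diff = waveform_difference(original, watermarked)
    err = sum(d * d for d in diff) / len(diff)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

clean = [0.0, 0.5, -0.5, 0.25]     # toy original audio samples
marked = [0.0, 0.5, -0.25, 0.25]   # toy watermarked audio samples
print(waveform_difference(clean, marked))  # [0.0, 0.0, 0.25, 0.0]
print(audio_psnr(clean, marked))
```

Plotting the list returned by `waveform_difference` against the sample index gives the kind of waveform-difference curve that is used for the graphical comparison.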

VI. EXPERIMENTAL RESULTS AND EVALUATIONS
In this section, we present the results of our experiments in terms of the metrics that were described in the previous section. In addition, the results are compared with previous works that were discussed in the related work section.

A. System Setup
The proposed video watermarking system is implemented using the Matlab programming language. The system is executed on a laptop that has a 2.4 GHz Genuine Intel® processor and 4.00 GB of RAM and is running Microsoft Windows 7 Ultimate. We apply our proposed video watermarking system to a rhino video and use the logo of Naif Arab University for Security Sciences as a watermark, as shown in Fig. 14. Table 4 briefly describes this rhino video.
Our proposed watermarking system can be applied to videos that have other extension formats if they are converted into the AVI extension format.

B. Evaluations
The following table lists the works to which we compare our proposed system.

1) PSNR-based QoV evaluation: Under increasing values of the power factors that control the transparency, we evaluate our proposed MBD approach in comparison with the I-frames and Key-frames approaches. Fig. 15 presents the results.
Discussion. Among the approaches in Fig. 15, the MBD approach occupies the first rank, followed by the I-frames and Key-frames approaches. The MBD approach performs best because the error (or noise) that is caused by hiding the watermark is minimal, as it propagates within the moving parts of the frames. By contrast, this error is concentrated in the I-frames or other frames in the other approaches, which, in turn, deteriorates the QoV. In the Key-frames approach, the three types of frames (the I, B, and P frames that form a video) may contain the watermark if it is embedded within some motion (or some moving blocks) that is formed by the sequence of the three previous frames. This type of embedding leads to higher PSNR values compared to hiding in the I-frames only.

[Table of compared approaches. Columns: Type, Approach, Location of Hiding, Hiding Technique.]

2) SSIM-based QoV evaluation: Fig. 16 shows the results that were obtained under increasing values of the power factors that control the transparency.
Discussion. The results shown in Fig. 16 support those shown in Fig. 15 because there is an inverse relationship between the QoV and the frame quality degradation. In other words, if the frame quality degradation decreases, the QoV increases, which results in higher SSIM values. The amount of quality degradation in the frames is the lowest when using the MBD-based hiding approach; hence, it outperforms the key-frames- and I-frames-based hiding approaches. The key-frames-based hiding approach outperforms the I-frames-based hiding approach due to the smaller amount of error caused by hiding the watermark. However, the watermark is sometimes embedded in a key frame that includes a high moving block frequency, which explains the results that were obtained in the third and final trial. Therefore, under the SSIM metric, the key-frames-based hiding approach yields results that are close to those of the MBD-based hiding approach in such cases.
To evaluate the proposed video watermarking system under the attacks, we use the following strategy: (1) the system is run (i.e., the watermark is hidden); (2) the attacks in the threat model are applied; (3) the extraction process is performed to obtain the watermark; and (4) the correlation coefficient metric is used to calculate the similarity between the original watermark and the extracted one.
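The four-step strategy can be sketched as a small harness in Python. Everything below is a toy stand-in chosen only to make the sketch runnable: the embedding writes watermark bits into the least significant bit of each host sample, which is not the DWT- or DCT-based hiding used in this work, and the similarity is a simple fraction of matching bits instead of the correlation coefficient. All function names are hypothetical.

```python
def evaluate_attack(host, watermark, embed, attack, extract, similarity):
    """Run the four evaluation steps and return the similarity score."""
    marked = embed(host, watermark)           # (1) hide the watermark
    attacked = attack(marked)                 # (2) apply an attack
    recovered = extract(attacked)             # (3) extract the watermark
    return similarity(watermark, recovered)   # (4) compare the watermarks

# Toy stand-ins (assumptions for illustration, not the paper's techniques).
def embed_lsb(host, bits):
    return [(p & ~1) | b for p, b in zip(host, bits)]

def extract_lsb(samples):
    return [p & 1 for p in samples]

def no_attack(samples):
    return list(samples)

def flip_first_bit(samples):
    return [samples[0] ^ 1] + list(samples[1:])

def match_ratio(a, b):
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

host = [100, 120, 140, 160]
watermark = [1, 0, 1, 1]
print(evaluate_attack(host, watermark, embed_lsb, no_attack,
                      extract_lsb, match_ratio))        # 1.0
print(evaluate_attack(host, watermark, embed_lsb, flip_first_bit,
                      extract_lsb, match_ratio))        # 0.75
```

Passing the four stages as functions keeps the harness independent of the hiding technique, so the same loop can be repeated for each attack and each attack strength.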
3) Impact of the bilinear attack: After applying the bilinear attack on the watermarked video, the extracted watermark is distorted. Fig. 17 shows the original and extracted watermarks. Under an increasing level of ripple, with the power factors held fixed, we calculate the correlation values, which are plotted in Fig. 18.
Discussion. There is an inverse relationship between the level of ripple and the correlation value. Therefore, the correlation values decrease as the level of ripple increases in all of the compared approaches. However, the proposed MBD-based hiding approach yields the best results because a high percentage of the ripple levels are included in the moving blocks that are used to hide the watermark, resulting in a small effect of the bilinear attack and hence the highest similarity between the original and extracted watermarks and the highest resistance against the bilinear attack. In the key-frame-based hiding approach, the selected key frames may include many moving parts, which contain a considerable percentage of the ripple levels of the original. Hence, the approach ranks second in terms of resistance against the bilinear attack. The I-frame-based hiding approach performs the worst since none of the ripples are originally included in the I-frames that are selected for hiding the watermark. Consequently, it has the lowest resistance against the bilinear attack.

4) Impact of the curved attack:
After applying the curved attack on the watermarked video, the extracted watermark is distorted. Fig. 19 shows the original and extracted watermarks.
Under an increasing level of ripple, with the power factors held fixed, we calculate the correlation values, which are plotted in Fig. 20.
Discussion. The curved attack can be viewed as an expanded bilinear attack because the curved attack negatively affects each part of the embedded watermark (i.e., each line that is drawn in the watermark is distorted in an arc-like manner). For this reason and due to the nature of the watermark that is used in this work (i.e., it includes many connected straight lines), the correlation values that are plotted in Fig. 20 are slightly lower than those that are plotted in Fig. 18. However, the MBD-based hiding approach still performs the best among the compared approaches against the curved attack. The justification that was offered for the results under the bilinear attack holds here as well.

5) Impact of the LPF attack:
After applying the LPF attack on the watermarked video, the extracted watermark is distorted. Fig. 21 shows the original and extracted watermarks. Under increasing filter sizes, with the power factors held fixed, we calculate the correlation values, which are plotted in Fig. 22.
Discussion. According to Fig. 22, the correlation value decreases as the window size of the LPF increases in all three approaches. The MBD-based hiding approach performs the best under the LPF attack threat. That is because the smoothness of the moving blocks in the host frames is not affected substantially by the LPF attack, which protects the embedded watermark from degradation. The reason is that the degradation of the smoothness can be viewed as a type of blurring, which is originally included in the motion. Therefore, the original blurring of the moving blocks can disperse the blurring that is added by the LPF attack. Thus, the similarity between the original watermark and the extracted one is the highest. The I-frame-based hiding approach does not involve any such blurring since no motion is created by the I-frames of a video. Hence, the host frame is substantially affected by the LPF attack, which results in a high dissimilarity between the original watermark and the extracted one. Consequently, the I-frame-based hiding approach has the lowest resistance against the LPF attack. In the key-frame-based hiding approach, motion is formed by the key frames, which mitigates the negative impact of the LPF attack and results in moderate correlation values.
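To make the role of the window size concrete, the following Python sketch applies a simple low-pass filter to a grayscale frame. The specific filter is an assumption, since the text above does not state which low-pass kernel is used: the sketch uses a mean (box) filter with an odd window size and replicates border pixels. The name `box_filter` is illustrative only.

```python
def box_filter(frame, window=3):
    """Mean filter with an odd window size; borders are replicated."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    rows, cols = len(frame), len(frame[0])
    radius = window // 2
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            total = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii = min(max(i + di, 0), rows - 1)  # clamp row index
                    jj = min(max(j + dj, 0), cols - 1)  # clamp column index
                    total += frame[ii][jj]
            out_row.append(total / (window * window))
        out.append(out_row)
    return out

impulse = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(box_filter(impulse, window=3)[1][1])  # 1.0
```

A larger window averages over more neighbors and therefore removes more high-frequency detail, which matches the trend of lower correlation values at larger window sizes.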

6) Impact of the Gaussian noise attack:
After applying the Gaussian noise attack on the watermarked video, the extracted watermark is distorted. Fig. 23 shows the original and extracted watermarks.
Under an increasing noise percentage, with the power factors held fixed, we calculate the correlation values, which are plotted in Fig. 24.
Discussion. According to Fig. 24, the correlation value decreases substantially as the noise percentage increases for all three approaches because external and new parts (i.e., the noise points or signals) are added to the original frame, which affects the resolution of each pixel of the host frame. This decrease leads to a highly distorted extracted watermark, which results in a poor correlation value. However, the MBD-based hiding approach yields correlation values in the range of [0.4 - 0.8], compared to [0.25 - 0.64] and [0.13 - 0.55] in the key-frame- and I-frame-based hiding approaches, respectively, which corresponds to a correlation average of 60 % in the MBD-based hiding approach, 45 % in the key-frame-based hiding approach, and 34 % in the I-frame-based hiding approach. The MBD-based hiding approach has the highest resistance against the Gaussian noise attack, which is due to the selection of a suitable place for the watermark to be embedded.
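A Gaussian noise attack of this kind can be sketched as follows in Python. The sketch assumes zero-mean noise whose strength is set by a standard deviation `sigma` in gray levels; how the noise percentage used in the experiments maps onto this parameter is not stated above, so that mapping is left open. The name `add_gaussian_noise` is illustrative only.

```python
import random

def add_gaussian_noise(frame, sigma, seed=None):
    """Add zero-mean Gaussian noise to an 8-bit grayscale frame."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    noisy = []
    for row in frame:
        noisy_row = []
        for p in row:
            value = round(p + rng.gauss(0.0, sigma))
            noisy_row.append(min(255, max(0, value)))  # clip to [0, 255]
        noisy.append(noisy_row)
    return noisy

frame = [[10, 200], [128, 250]]
print(add_gaussian_noise(frame, sigma=0.0, seed=1))  # [[10, 200], [128, 250]]
print(add_gaussian_noise(frame, sigma=25.0, seed=1))
```

Clipping keeps the attacked frame within the valid 8-bit range, and fixing the seed makes a given noise level repeatable across the compared approaches.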

7) Impact of the compression attack:
After applying the compression attack on the watermarked video, the extracted watermark is distorted. Fig. 25 shows the original and extracted watermarks.
Under an increasing compression level, with the power factors held fixed, we calculate the correlation values, which are plotted in Fig. 26.
Discussion. Compared to the Gaussian noise attack, Fig. 26 shows that under the threat of the compression attack (i.e., an increasing compression level), the correlation value decreases dramatically in all three approaches, especially in the key-frame- and I-frame-based approaches, because the compression attack negatively affects both the resolution of the pixels and the contrast of the host frame and, consequently, the embedded watermark. However, the MBD-based approach maintains its resistance against the compression attack and is assigned the top ranking. The corresponding range within which the correlation value varies is [0.37 - 0.72], compared to [0.1 - 0.6] and [0.07 - 0.51] for the key-frame- and I-frame-based hiding approaches. The ranges correspond to 51 %, 35 %, and 29 % correlation averages. The reasons behind the highest resistance of the MBD-based approach are as follows: (1) it uses DWT as the hiding technique, which is resistant against the compression attack, and (2) it selects the moving parts of the host frames for hiding the watermark. The key-frame-based hiding approach has a higher resistance against the compression attack than the I-frame-based hiding approach because the key-frame-based hiding approach relies on DWT as a hiding technique, while the I-frame-based hiding approach relies on DCT as a hiding technique. DCT has a lower resistance against the compression attack compared to DWT [38].
To evaluate the proposed SDA-based audio watermarking approach, we calculate the PSNR values of the approaches that are related to the audio stream and listed in Table 5 above. Table 6 presents the results, along with the corresponding extracted watermarks.
Discussion. PSNR yields lower values when watermarking audio streams compared to visual streams. That is because of the nature of the audio signals: human ears are more sensitive to changes than human eyes. However, the LBS-based hiding approach performs the worst since it depends on the spatial domain in the hiding process. The spread-spectrum- and the SDA-based hiding approaches yield similar PSNR values since both depend on the frequency domain in the hiding process. The SDA-based hiding approach yields the highest PSNR value. The reason is that DWT is more accurate in manipulating the frequencies of the audio stream compared to the spread-spectrum-based hiding approach [38].
Regarding resistance against the compression attack, Fig. 27 shows the waveform differences of the audio signals.
Discussion. According to Fig. 27, the proposed SDA-based hiding approach performs the best and has the highest resistance against the compression attack. That is because of the silence deletion, where the watermark is embedded within the pure (or cleaned) original audio stream. The spread-spectrum-based hiding approach does not take into consideration the silence deletion, which leads to a large difference between the original audio stream and the watermarked one. The LBS-based hiding approach performs the worst, with the largest difference between the original audio stream and the watermarked one. The reasons are as follows: (1) it depends on the spatial domain in the hiding process and (2) it does not take into consideration the silence deletion, resulting in the lowest resistance against the compression attack.
Regarding the resistance to the Gaussian noise attack, Fig. 28 shows the waveform differences of the audio signals.
Discussion. The Gaussian noise attack has a stronger negative impact compared to the compression attack when they are applied on audio signals [39], which explains the larger difference between the waveforms for all the approaches, as shown in Fig. 28. The SDA-based hiding approach has the highest resistance against the Gaussian noise attack, with the smallest difference between the original audio stream and the watermarked one. The spread-spectrum-based hiding approach is ranked second in terms of resistance against the Gaussian noise attack. The LBS-based hiding approach has the weakest resistance against the Gaussian noise attack. The silence deletion step, which is used in the SDA-based hiding approach but not in the other approaches, plays a significant role in explaining these results.

VII. CONCLUSIONS
Video watermarking is a powerful method for ensuring copyright protection of digital multimedia content. The integrity of the watermarked video (in both the visual and audio streams), high quality of the watermarked video, transparency of the embedded watermark, and resistance against attacks (geometric and non-geometric) are top requirements in any video watermarking system. In this work, we propose a component-based video watermarking system that satisfies these requirements. Its components work as follows. A component finds a place to hide the watermark within the visual stream. A component executes a moving block detection (MBD) algorithm to form the hiding place. The process of hiding in the visual stream is executed by a component that depends on DWT. Regarding watermarking the audio stream, a component uses DCT to hide the watermark in the pure original audio stream. The pure original stream is obtained by a component that is responsible for deleting the noise from the original audio stream by executing a silence deletion algorithm (SDA). The proposed system is tested under various geometric and non-geometric attacks. According to the quality of video (QoV) metrics, namely, PSNR, SSIM, and the correlation coefficient, the proposed system is highly resistant against the attacks compared to similar systems that watermark the visual stream. Moreover, according to the PSNR and waveform difference metrics, the proposed system is highly resistant against attacks compared to similar systems that watermark the audio stream.
In future work, we will extend the proposed video watermarking system to deal with additional attacks, such as rotation and bilinear-curved attacks. In addition, we intend to satisfy the capacity requirement, which was not considered in this work.

Fig. 1. General Scenario of a Video Watermarking System.

Fig. 7. Moving and Non-Moving Parts of an Original Video.

Theorem 3. The proposed video watermarking system is LPF attack resistant.

Fig. 18. Correlation vs. the Level of Ripple under the Bilinear Attack.

Fig. 19. Original and Extracted Watermarks after Applying the Curved Attack.

Fig. 20. Correlation vs. the Level of Ripple under the Curved Attack.

Fig. 22. Correlation vs. the Size of the Window under the LPF Attack.

Fig. 23. Original and Extracted Watermarks after Applying the Gaussian Noise Attack.

Fig. 27. Waveform Differences between the Original Audio Streams and the Watermarked Audio Streams in the Three Approaches under the Compression Attack.

Fig. 28. Waveform Differences between the Original Audio Streams and the Watermarked Audio Streams in the Three Approaches under the Gaussian Noise Attack.

TABLE I. CAPABILITIES OF THE ATTACKER

TABLE III. COMPONENTS

TABLE IV. RHINO VIDEO

TABLE VI. APPROACH DESCRIPTIONS