Human Visual System-based Unequal Error Protection for Robust Video Coding

To increase the overall visual quality of video services without increasing the data rate, a human visual system-based video coding scheme, founded on a hierarchy of the video stream in different levels of importance, is developed. Determining these importance levels takes into account three classification criteria: the position of the current image in the group of pictures (image level), the importance of the motion vectors of the macroblocks in the current image (macroblock level), and whether or not a pixel belongs to a spatial region of interest (pixel level). At the end of this classification process, an interpolation between the results of the three-level selection establishes an index of importance for each macroblock of the image to be encoded. This index determines the type of channel coding to be applied to the corresponding macroblock. Tests have shown that the technique presented in this paper achieves better results in PSNR and SSIM (structural similarity) than an equal error protection technique.
Keywords: video coding; unequal error protection; human visual system (HVS); regions of interest (ROI); significant motion vectors (SMV); classification; index of importance


I. INTRODUCTION
Many subjective studies and experiments in the fields of human vision and electronic imaging have revealed that the human visual system (HVS) tends to focus on a few preferred areas in images or scenes. Usually, a loss occurring in the region of the image that draws the viewer's attention causes greater discomfort than a loss occurring outside this area. Moreover, this discomfort can be amplified by the temporal propagation of the damage across several images. This propagation is accentuated by the intensive use of intra and inter images in hybrid video encoders. In general, the areas of the image that attract the attention of the HVS are called regions of interest (ROI), while the rest of the image is called the background.
Many factors influencing visual attention have been identified, and they are grouped into two categories. The first category includes all spatial information that stimulates visual attention (color, orientation, intensity, size, etc.). The second category concerns temporal information (motion).
A video sequence contains both types of information. Since the final quality is judged by humans, it is useful to define the levels of importance of the information according to their influence on the visual quality perceived by the end user. The use of coding based on regions of interest in combination with radio resource management (RRM) algorithms can improve the global system capacity. However, because of the high compression efficiency, the resulting data is very sensitive to transmission errors. To remedy this problem, encoding based on regions of interest has been combined with unequal error protection approaches.
This paper is organized as follows. Section 2 discusses related work using visual attention in video coding. Section 3 describes the proposed robust video encoder based on perceptual unequal error protection. Sections 4, 5 and 6 describe the three classification levels (image, macroblock and pixel), and Section 7 explains the determination of the index of importance I_G and the allocation of correcting codes. Test results are discussed in Section 8, and conclusions are given in Section 9.

A. The concept of regions of interest applied in image / video coding
The concept of regions of interest (ROI) has been treated by several studies in the fields of human vision and electronic imaging. These studies revealed that the human visual system (HVS) tends to concentrate on certain preferred areas when viewing an image [1]. They also revealed that certain factors may influence visual attention: contrast, the shape of objects, object size, color and position. The scheme proposed in [2] exploits the hierarchical nature of the ROI-based coding of the JPEG2000 standard. Two levels of protection against errors are applied. Strong protection is assigned to the packets of the ROI using an extended Golay code (24,12), while weaker protection is applied to the other packets using an extended Hamming code (8,4). The author of [3] proposes perceptual unequal error protection methods. These encoding techniques, employing Flexible Macroblock Ordering (FMO) for H.264/AVC, are based on an assessment of the spatiotemporal spread of damage. The work in [4] presents a novel arbitrary-shape ROI coding for a scalable wavelet video codec. The motion information of the ROIs is estimated by macroblock padding and polygon matching. In [5], the authors propose a novel rate control scheme for ROI coding. They set the area that viewers are interested in as the ROI, so that it preserves better quality than the background. A coefficient ω is set to evaluate the significance of the ROI, and it is then used to calculate the mean absolute distortion (MAD) of the ROI and the background. Finally, the quantization parameter (QP) is decided by the quadratic model.

B. Significant Motion Vectors
The temporal aspect is crucial in determining visual attention. In a visual search context, movement has been clearly identified as a visual attractor. Contrast in movement is a key factor that attracts human visual attention. Therefore, a moving area degraded by errors draws attention immediately. Thus was born the concept of significant motion vectors (SMV). This term refers to the motion vectors that are decisive for a good reconstruction of the transmitted image. If these vectors are lost, the quality of the decoded picture deteriorates considerably. It is therefore wise to protect these vectors more strongly than motion vectors judged insignificant.
In [6], the authors propose an error resilience technique based on the determination and protection of so-called significant macroblocks. This scheme assigns to each macroblock of the current image a significance degree (SD), calculated from three factors: the prediction mode of the macroblock, the difference between the predicted motion vector and the estimated one, and the residual error between the original image and the reference image. The result of the error concealment technique is also taken into account in determining the degrees of importance. Another way to determine the significant macroblocks is presented in [7]. The distortions caused by the motion compensation vectors are used to evaluate the effects of error propagation. Macroblocks with the largest such effects are intra coded, limiting the propagation of errors. In [8], significant motion vectors are selected and protected according to the PSNR of the image decoded using the motion vectors and the concealment technique. A rate-distortion optimization model is proposed for determining the rate allocated to significant motion vectors. The authors of [9] propose an enhanced error protection scheme using flexible macroblock ordering in H.264/AVC. The algorithm uses a two-phase system. In the first phase, the importance of every macroblock is calculated based on its influence on the current frame and future frames. In the second phase, the macroblocks with the highest impact factor are grouped together in a separate slice group using the flexible macroblock ordering feature of H.264/AVC.

III. ROBUST VIDEO ENCODER BASED ON THE PERCEPTUAL UNEQUAL ERROR PROTECTION
The influence of packet loss on the image quality of a video sequence depends on several factors, including the spatial position of the loss in the image, the amount of movement between images, and the position of the image in the group of pictures (GOP). A protection technique that does not take these factors into account can be costly in terms of bit rate. So, to increase the global visual quality of the video service without increasing the bit rate, an unequal error protection technique based on the prioritization of the video stream in different levels of importance is developed. This technique consists of two main steps (Fig. 1):
1) The extraction of some image characteristics enabling the construction of maps. These maps allow a classification of macroblocks according to the following selection criteria:
• Image level: the position of the image to be encoded in the group of pictures.
• Macroblock level: the importance of the motion vector of a macroblock in the image to be encoded.
• Pixel level: whether or not a pixel belongs to a spatial region of interest.
2) An interpolation between the results of the three-level selection, which assigns an index of importance I_G to each macroblock of the image to be encoded. This index of importance determines the type of channel coding to be applied to the corresponding macroblock.

IV. IMAGE CLASSIFICATION LEVEL
In a video encoder, each video sequence is divided into groups of pictures (GOP). Each GOP begins with an intra image I followed by a number of predicted images P. The impact of a transmission error occurring in one image on the rest of the images differs depending on the position of this image in the GOP. Indeed, the authors of [10] have shown that the propagation of errors due to predicted images depends on the position of those images within the encoded sequence: if the predicted image P comes just after an intra image I, its influence on the rest of the sequence will be greater than if it comes just before or near an image I. Therefore, a gain in video quality can be obtained by increasing the protection of the first predicted images P.
Since the redundancy introduced by this approach should not exceed that introduced by a conventional equal protection technique, the last predicted images P in each sequence are sacrificed in terms of protection. It is therefore important to determine the best compromise between the number of highly protected images and the number of least protected images, a compromise that minimizes the global average distortion evaluated by MSE or PSNR. The MSE is calculated from the decoded sequence S_d (transmitted through an imperfect channel), the original sequence S_0, the image size N × M and the length T of the sequence [10].
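As an illustration of the distortion measure described here, the following minimal sketch averages the squared difference between the original and decoded sequences over all pixels and all images, and converts the result to PSNR. The function names, the nested-list representation of the sequences and the 8-bit peak value of 255 are assumptions made for the example; they are not taken from [10].

```python
import math


def global_mse(original, decoded):
    """Mean squared error over a whole sequence.

    `original` and `decoded` hold T images each; an image is a list of
    N rows of M pixel values (nested lists are enough for this sketch).
    """
    total = 0.0
    count = 0
    for img_o, img_d in zip(original, decoded):
        for row_o, row_d in zip(img_o, img_d):
            for p_o, p_d in zip(row_o, row_d):
                total += (p_o - p_d) ** 2
                count += 1
    if count == 0:
        raise ValueError("empty sequence")
    return total / count


def psnr(mse, peak=255.0):
    """PSNR in dB for a given MSE (peak=255 assumes 8-bit samples)."""
    if mse == 0:
        return math.inf  # identical sequences
    return 10.0 * math.log10(peak ** 2 / mse)
```

With this representation, `psnr(global_mse(s0, sd))` gives one quality figure for a whole decoded sequence.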
The global MSE (1) is the squared difference between S_0 and S_d, averaged over the N × M pixels of each image and the T images of the sequence. Three categories of images are considered (Fig. 2): R1, the class of highly protected images; R2, the class of medium protected images; and R3, the class of least protected images. The limits of each class, t0 (= 0), t1, t2 and t3 (= T), are determined so as to minimize the global distortion of the sequence.
The relationship between the encoding efficiency and the parameters t0, t1, t2 and t3 is formulated in [10] (2). t1 is obtained as the solution of an equation involving P1, P2 and P3, the error probabilities of the classes R1, R2 and R3 respectively, and t2 is then calculated in turn. Taking into account the channel codes used (C1, C2 and C3, Section VII), the resulting values are t1 = 5 and t2 = 25. This means that the first 5 images of the GOP belong to class R1, the following twenty images belong to class R2, and the remaining images belong to class R3.
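The image-level classification can be sketched as a simple lookup on the position of an image in the GOP. The sketch assumes zero-based positions, half-open class intervals [t0, t1), [t1, t2) and [t2, t3), and a GOP of 30 images; the function name `image_class` is hypothetical.

```python
def image_class(position, t1=5, t2=25, gop_size=30):
    """Return the protection class of the image at `position` in the GOP.

    Positions are zero-based (assumption): position 0 is the I image.
    The defaults t1 = 5 and t2 = 25 are the limits reported in the text.
    """
    if not 0 <= position < gop_size:
        raise ValueError("position outside the GOP")
    if position < t1:
        return "R1"  # highly protected images
    if position < t2:
        return "R2"  # medium protected images
    return "R3"      # least protected images
```

With the default limits, a 30-image GOP contains 5 images in R1, 20 in R2 and 5 in R3.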

V. MACROBLOCK CLASSIFICATION LEVEL, BASED ON SIGNIFICANT MOTION VECTORS
Given the major impact of the loss of significant motion vectors on the visual quality of reconstructed images, we chose to take the importance of the motion vectors into account in the development of the unequal error protection scheme. In the following, two methods to identify significant motion vectors are presented [11].
A. First approach: classification based on the global distortion
SMVs are extracted as a function of a distortion measure. We propose to express the visual importance of a macroblock in terms of the distortion that would be caused by the loss of this macroblock. Such distortion may be calculated by comparing the video sequence decoded using the correct data with the video sequence decoded using the replacement data generated by the concealment technique in the decoder. If transmission errors are not well concealed, the distortion will be high, and the motion vectors of the corresponding corrupted macroblocks are therefore considered important.
A temporal concealment technique for corrupted macroblocks is integrated in the encoder. Errors are injected into the encoded stream; lost macroblocks are replaced by the corresponding ones in the previous image. The importance of each macroblock in an image is therefore estimated by calculating a distortion measure between this macroblock and the one produced by error concealment from the previous image. This value is compared to a threshold. If it is below the threshold, the motion vector is considered insignificant. Otherwise, it is classified as significant.

Fig. 2. Classification of P images according to their position relative to the I image
Let D(i) be the distortion of the i-th macroblock (MB_i) decoded without transmission error, and Dc(i) the distortion of the i-th macroblock corrupted by errors and decoded using the concealment technique. The distortion is calculated by MSE.
Let Y(i) be the difference between D(i) and Dc(i), and Y_s the comparison threshold.
• If Y(i) < Y_s, the error concealment is able to predict MB_i correctly. The MV associated with MB_i is classified as not significant.
• Conversely, if Y(i) ≥ Y_s, the corrupted or lost macroblock cannot be substituted. In this case, the motion vector associated with this macroblock is defined as significant.
The selection of the comparison threshold is crucial insofar as this threshold determines the number of significant motion vectors in an image. If this number increases, the redundancy introduced by the channel coding also increases. Therefore, two criteria must be taken into account in the choice of Y_s: the computational complexity and the compression efficiency, reflected by the number of macroblocks judged significant. In order to find a compromise between the two criteria, Y_s is chosen as the overall average distortion Y_m of the current image, that is, the mean value of Y(i) over all the macroblocks of the image. This value is easy to calculate and does not require complex operations or intensive computation time and memory. Fig. 3 shows the result of a macroblock classification in the 2nd image of the Foreman sequence, according to the value of the distortion associated with each macroblock.
This approach has two major disadvantages: the complexity of the encoder and the fact that the subjective quality of images is not taken into account. Indeed, the use of MSE as the distortion measure does not allow motion vectors to be classified according to the impact of the injected errors on the visual quality of the image. Resorting to subjective quality measures for calculating distortions would remedy this disadvantage, but would greatly increase the encoder complexity, especially as this complexity is already increased by the need to integrate a decoder within the encoder to assess the performance of the error concealment technique on a corrupted stream and thus determine the significant motion vectors. This added complexity affects the performance of this approach and makes it unsuitable for mobile wireless transmission and real-time applications. In the following, another approach to classifying the motion vectors is proposed, which is based not on the distortion but on the value of the motion vector itself.
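The threshold rule of the first approach can be sketched as follows, assuming the per-macroblock distortions D(i) and Dc(i) have already been measured. The sign convention Y(i) = Dc(i) − D(i) and the function name are assumptions; the text only states that Y(i) is the difference between the two distortions and that the threshold is the mean of Y(i) over the image.

```python
def classify_by_distortion(d_clean, d_concealed):
    """Flag macroblocks whose motion vectors are significant.

    d_clean[i]     : D(i), distortion of macroblock i decoded without error
    d_concealed[i] : Dc(i), distortion of macroblock i after concealment
    Returns one boolean per macroblock (True = significant).
    """
    if len(d_clean) != len(d_concealed):
        raise ValueError("both lists must cover the same macroblocks")
    if not d_clean:
        return []
    # Assumed sign convention: extra distortion introduced by concealment.
    y = [dc - d for d, dc in zip(d_clean, d_concealed)]
    y_s = sum(y) / len(y)  # threshold Y_s = image-wide mean Y_m
    return [y_i >= y_s for y_i in y]
```

Because the threshold is the image mean, the number of macroblocks flagged as significant adapts to each image without any tuned constant.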

B. Second approach: classification based on the amplitude of motion vectors
The idea is to make a "coarse" classification of the motion vectors. This idea is based on the fact that, at the decoder, the simplest concealment technique replaces a lost macroblock by the corresponding one in the previous image, which amounts to considering the motion vector of the macroblock as zero (MV = 0). Therefore, zero motion vectors do not need strong protection since they can be correctly compensated in the decoder. Thus, a motion vector is defined as:
• Non-significant if its magnitude is zero (no movement);
• Significant in the opposite case (presence of motion).
This classification will certainly cause a slight reduction in compression efficiency; however, this will be compensated by the use of the two other classification blocks. In addition, this approach allows the classification algorithm to take advantage of the resources already offered by the encoder: the results of the motion estimation block are used directly to build the map of significant motion vectors. Fig. 4 shows the classification of the macroblocks of image number 2 of the Foreman sequence, depending on the value of the motion vectors associated with each macroblock.
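The second approach reduces to a test on each motion vector. In this minimal sketch, motion vectors are assumed to be available as (dx, dy) pairs taken from the motion estimation block; the function name is hypothetical.

```python
def classify_by_motion(motion_vectors):
    """Return I_MB for each macroblock: 1 if its motion vector is non-zero.

    `motion_vectors` is a list of (dx, dy) pairs, one per macroblock
    (assumed layout for this sketch).
    """
    return [1 if (dx != 0 or dy != 0) else 0 for dx, dy in motion_vectors]
```

No decoder has to be embedded in the encoder for this test, which is the complexity advantage over the distortion-based approach.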

VI. PIXEL LEVEL CLASSIFICATION
This classification is made according to whether or not a pixel belongs to a region of interest. In an encoding scheme based on ROI, the identification of these regions is crucial. Indeed, the results of automatic ROI identification algorithms should be consistent with the identification done by a human observer. For this reason, human perception and visual attention (VA) should be taken into account when developing algorithms to identify regions of interest. In this work we have chosen to apply the ROI identification algorithm based on the visual attention model proposed in [12]. Its functional principle is as follows: first, primitive visual characteristics (intensity, color and orientation) are extracted from the image to form feature maps (Fig. 5 (a)). Then, the maps are normalized by filtering with a two-dimensional difference-of-Gaussians filter. In each map, the most salient areas are selected, and finally the maps are combined by a weighted average to obtain the saliency map of the image (Fig. 5 (b)). From this map, a map of the membership of pixels in a ROI is derived (Fig. 5 (c)).
In the second step of this scheme, an interpolation has to be made between the map delivered by the macroblock classification process and the one delivered by the pixel classification process. However, these two maps do not have the same dimensions (9 × 11 for the first and 144 × 176 for the second). To remedy this, another map representing whether or not a macroblock belongs to a region of interest was created (Fig. 5 (d)). A macroblock is considered to belong to a region of interest if at least half of its pixels belong to a region of interest.
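The reduction from the pixel-level map to the macroblock-level map can be sketched with a majority test per block. The sketch assumes a binary pixel map stored as nested lists and 16 × 16 macroblocks, which is what maps a 144 × 176 image onto a 9 × 11 grid; the function name is hypothetical.

```python
def macroblock_roi_map(pixel_map, mb_size=16):
    """Reduce a binary pixel ROI map to one flag per macroblock.

    A macroblock is marked 1 when at least half of its pixels are in a ROI.
    `pixel_map` is a list of rows of 0/1 values whose dimensions must be
    multiples of `mb_size` (assumption of this sketch).
    """
    rows = len(pixel_map)
    cols = len(pixel_map[0]) if rows else 0
    if rows % mb_size or cols % mb_size:
        raise ValueError("image size must be a multiple of the macroblock size")
    mb_map = []
    for r in range(0, rows, mb_size):
        mb_row = []
        for c in range(0, cols, mb_size):
            inside = sum(
                pixel_map[y][x]
                for y in range(r, r + mb_size)
                for x in range(c, c + mb_size)
            )
            mb_row.append(1 if 2 * inside >= mb_size * mb_size else 0)
        mb_map.append(mb_row)
    return mb_map
```

The resulting map has the same dimensions as the macroblock-level motion map, so the two can be combined element by element.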

VII. DETERMINATION OF THE INDEX OF IMPORTANCE I_G AND CORRECTING CODES ALLOCATION
As explained above, the proposed unequal error protection scheme has two phases: the classification process and the allocation of different correcting codes to the macroblocks. This allocation is done according to an index of importance calculated from the results of the first phase.
Once the classification process is completed, a multicriteria evaluation (MCE) is conducted to estimate the global index of importance of each macroblock of the image to be encoded. The value of this index determines the type of correcting code applied to each macroblock. This evaluation consists of applying to the result of the classification process a Boolean overlay with intersection (AND) and union (OR) operators that act as constraints. The classification stage presented in the previous sections delivers three index maps for each image:
• The index map relative to the image classification, denoted I_R, which indicates the class (R1, R2 or R3) of the image;
• The index map relative to the macroblock classification, denoted I_MB, such that I_MB = 1 if the macroblock is significant, otherwise I_MB = 0;
• The index map relative to the pixel classification, denoted I_pixel, such that I_pixel = 1 if the macroblock is in a ROI, otherwise I_pixel = 0.
The global index of importance I_G of a macroblock is calculated according to the diagram in Fig. 6. In images of the R1 class, the criteria I_MB and I_pixel are combined with a logical OR (union), which means that a macroblock is considered important if at least one criterion equals 1. In images of the R2 class, the criteria are combined with a logical AND (intersection), which requires a macroblock to satisfy both criteria in order to be selected as important. This selection process is designed to maintain a good rate/quality trade-off. It has been explained in Section 4 that the values of t1 and t2 depend on C1, C2 and C3. With the chosen code rates, the values obtained are t1 = 5 and t2 = 25. This means that the first 5 images belong to the R1 class, the following twenty belong to R2, and the remaining images belong to R3.

VIII. TESTS AND RESULTS
In the simulation, two video sequences of 30 images were used: Foreman and Akiyo (QCIF format, 176 × 144 pixels). The GOP size is 30: an I image is inserted periodically every thirty images. The transmitted signal is subjected to Rayleigh noise. The proposed system is compared to a conventional equal error protection (EEP) technique using a correcting code of rate 2/5. The convolutional encoder rates applied to the high, medium and low protection levels are respectively 1/3, 2/5 and 1/2.
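The Boolean combination of Section VII and the three code rates can be sketched together. Only the OR rule for R1, the AND rule for R2 and the rates 1/3, 2/5 and 1/2 for I_G = 1, 2 and 3 come from the text. The treatment of R3 and the table that turns (class, important) into a value of I_G are hypothetical placeholders, since the exact mapping is defined by the diagram of Fig. 6.

```python
from fractions import Fraction

# Code rate selected by each value of the index of importance I_G.
CODE_RATE = {1: Fraction(1, 3), 2: Fraction(2, 5), 3: Fraction(1, 2)}

# HYPOTHETICAL mapping from (image class, important?) to I_G.
# It is an illustration only; the real mapping is given by Fig. 6.
INDEX_TABLE = {
    ("R1", True): 1, ("R1", False): 2,
    ("R2", True): 2, ("R2", False): 3,
    ("R3", True): 3, ("R3", False): 3,
}


def is_important(image_cls, i_mb, i_pixel):
    """Boolean overlay of the macroblock-level and pixel-level criteria."""
    if image_cls == "R1":
        return bool(i_mb or i_pixel)   # union: one criterion is enough
    if image_cls == "R2":
        return bool(i_mb and i_pixel)  # intersection: both are required
    if image_cls == "R3":
        return False                   # assumption: never promoted in R3
    raise ValueError("unknown image class: %r" % (image_cls,))


def importance_index(image_cls, i_mb, i_pixel, table=INDEX_TABLE):
    """Return I_G for one macroblock using the (hypothetical) table."""
    return table[(image_cls, is_important(image_cls, i_mb, i_pixel))]


def code_rate(i_g):
    """Return the channel code rate associated with I_G."""
    return CODE_RATE[i_g]
```

Passing a different `table` is enough to test another allocation policy without touching the Boolean rules.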
To better correlate the performance of the proposed technique with the quality actually perceived by humans, it is necessary to use measures that integrate the characteristics of the human visual system. PSNR and SSIM (structural similarity) measures between the reconstructed video and the original video are used to assess the quality of the decoded video.

A. The distribution of macroblocks according to their importance
The distribution of macroblocks according to their importance in the GOP is presented in Fig. 7 for the Foreman sequence. Since the first image is intra coded (no motion vectors), 1 is assigned to all elements of map 2 (the macroblock-level map), which implies that all I_G in this image are set to 1. This image will certainly be overprotected (all macroblocks are important) by the channel code C1. Nevertheless, this overprotection is acceptable given the importance of the I image in the prediction of the following P images.

B. Visual quality of images
The performance of the proposed scheme, in terms of quality, compared to the equal error protection (EEP) scheme is shown in Fig. 8. It can be seen that the unequal error protection gives better results than equal error protection. Indeed, in terms of PSNR, this method allows an improvement of up to 3 dB. Fig. 9 and Fig. 10 present images from the decoded Foreman and Akiyo sequences. To illustrate the robustness of this technique, errors are injected into the ROI of three P images belonging to three different classes (R1, R2 and R3). The resulting images demonstrate the efficiency of this approach even in extreme cases. Indeed, the impact of the errors is barely visible on the images coded by this technique, while the images encoded with EEP are considerably degraded by blocking artifacts. The visual quality drops rapidly in the last images because of error propagation. The results of Table 1, in terms of SSIM (structural similarity), show that this technique also outperforms equal protection in terms of the perceived quality of the reconstructed images.

Fig. 1. Unequal error protection based on a hierarchy of the video stream in different levels of importance

Fig. 3. Classification of the macroblocks of the 2nd image of the Foreman sequence, according to the value of the distortion associated with each macroblock

Fig. 4. Classification of the macroblocks of image No. 2 of the Foreman sequence, depending on the value of the motion vectors associated with each macroblock

Fig. 5. Identification of regions of interest in image 2 of the Akiyo sequence: (a) maps of the primitive visual characteristics; (b) saliency map; (c) pixel classification map; (d) macroblock classification map

Fig. 6. Determination of the index of importance I_G
Using a different correcting code for each P image requires the implementation of three coding blocks in the encoder, and likewise in the decoder, which increases the complexity of the global configuration. To reduce this complexity, we use only one type of correcting code, with different rates. Convolutional codes are a highly efficient and flexible class of error correction codes, and they are the most widely used codes in fixed and mobile telecommunications systems. Since the 1/3 turbo codes are used in UMTS systems, we opted to use the same codes in this scheme, such that:
• If I_G = 1, the macroblock will be coded by a code C1 = 1/3;
• If I_G = 2, the macroblock will be coded by a code C2 = 2/5;
• If I_G = 3, the macroblock will be coded by a code C3 = 1/2.

Fig. 7. Number of macroblocks according to their importance in the 30 images of the Foreman sequence

Fig. 8. Comparison between the proposed unequal error protection and equal error protection for the Foreman sequence