Estimation of the Visual Quality of Video Streaming Under Desynchronization Conditions

This paper presents a novel method for assessing the quality of desynchronized video streams with the aid of a software package specially developed for this purpose. A methodology for substituting values for lost frames was developed. It is shown that when the sent and received sequences differ because some frames are lost in transit, traditional (existing) software estimates the quality indicator inaccurately. The developed software application can estimate the quality of video sequences even when some frames are missing, by searching for contextually similar frames and “gluing” them in place of the lost frames. Comparing the obtained results with those from existing software validates their accuracy. The differences in results and in methods of estimating video sequences of different subject groups are also discussed. The paper concludes with recommendations on the best methodology to adopt for specific estimation scenarios.

Keywords–video streaming; encoder; decoder; video streaming quality; PSNR.


I. INTRODUCTION
The assessment of video quality is currently being researched both theoretically and practically. New objective metrics for estimating the quality of video signals are constantly being developed. These metrics are often mathematical models that imitate subjective estimates. Calculating the Peak Signal-to-Noise Ratio (PSNR) between the source signal and the signal obtained at the output of the system being analyzed remains the conventional means of estimating the quality of digitally processed video signals in software. Many software solutions from both academia and industry allow the quality of streaming video to be estimated; popular examples are Elecard Video Estimator [1], Video Quality Studio 0.32 [2], MSU Video Quality Measurement Tool [3], and PSNR.exe [4].
All the packages cited above are, however, meant exclusively for assessing fluctuations in video information that arise from coding and compression. They cannot process files with frames either lost in the process of transmission or desynchronized with respect to the input sequence.
Video transmission over wireless networks entails a possible loss of synchronization between the original video sequence and the decoded copy at the receiving end. This is due to the unpredictable effect of the transmission medium on data packets, which leads to packet distortion and often to outright packet loss. For this reason, the analyzed video sequences must be synchronized manually.
The development of a software package that takes these peculiarities into account and can calculate quality parameters when frames are lost from the video sequence therefore becomes of paramount importance.

A. Theoretical Background
The major peculiarity of pictures is their mode (modal characteristic). There are three possible picture modes: Red-Green-Blue (RGB), half-tone scale (formation of pictures using the brightness level), and indexed. RGB: In this mode, each picture element (pixel) is described by its Red, Green, and Blue color levels. Since any perceivable color can be presented as a combination of these three basic colors, the RGB picture is a full-color picture. Each color is described by 8 bits, which allows for 256 intensity levels per color and results in about 16.7 million colors (i.e. 2^8 (R) x 2^8 (G) x 2^8 (B) = 2^24), also known as True Color.
Half-tone scale: For half-tone images, on the other hand, each pixel is described by an 8-bit brightness level ranging from 0 (absolute black) to 255 (maximum brightness). The actual difference between half-tone and RGB images lies in the number of color channels: one for half-tone images and three for RGB images. An RGB image may be presented as the superposition of three half-tone images, corresponding to R, G, and B respectively. The image is thus described by three matrices, each of which corresponds to one of the RGB colors and which together determine the pixel color. For example, an image of 176x144 pixels requires three matrices of equal size whose elements are the intensity values of the corresponding color at each pixel.
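As an illustration of the three-matrix description above, the following minimal Python sketch stores a frame as three equally sized planes. The helper names (`blank_plane`, `make_rgb_frame`, `pixel`) and the dictionary layout are assumptions made for this example and are not part of the developed software.

```python
# Hypothetical helpers: a frame is stored as three half-tone planes (R, G, B).
WIDTH, HEIGHT = 176, 144  # the 176x144 example from the text


def blank_plane(width, height, value=0):
    """One single-channel plane: `height` rows of `width` 8-bit values."""
    return [[value] * width for _ in range(height)]


def make_rgb_frame(width, height):
    """An RGB frame as three equally sized planes keyed by channel name."""
    return {channel: blank_plane(width, height) for channel in ("R", "G", "B")}


def pixel(frame, x, y):
    """Return the (R, G, B) triple of the pixel in column x, row y."""
    return tuple(frame[channel][y][x] for channel in ("R", "G", "B"))


frame = make_rgb_frame(WIDTH, HEIGHT)
frame["R"][0][0] = 255        # make the top-left pixel pure red

levels = 2 ** 8               # 256 intensity levels per channel
colors = levels ** 3          # number of distinct (R, G, B) combinations
```

Here `colors` evaluates to 16,777,216, the "16.7 million" figure quoted for True Color.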
The PSNR parameter is a mathematical instrument by which the correspondence between the original and distorted video sequences can be established. The greater the difference between the sent and received video sequences, the lower the PSNR value, which is measured in dB in accordance with the logarithmic sensitivity of human vision.

www.ijacsa.thesai.org

Consider two video sequences of N frames with resolution Dx x Dy (e.g. for QCIF Dx = 144, Dy = 176; for CIF Dx = 288, Dy = 352), where I(n, x, y) denotes the brightness (component Y) of the pixel with coordinates (x, y) in video frame n, with n = 0, ..., N - 1; x = 1, ..., Dx; y = 1, ..., Dy. Writing I_s for the sent sequence and I_r for the received one, and taking 255 as the peak value of an 8-bit brightness level, the PSNR of frame n takes the standard form:

$$\mathrm{PSNR}(n) = 10 \log_{10} \frac{255^2}{\dfrac{1}{D_x D_y} \sum_{x=1}^{D_x} \sum_{y=1}^{D_y} \left[ I_s(n, x, y) - I_r(n, x, y) \right]^2} \qquad (1)$$

Applications used for real-time transmission usually code multimedia information in a standard that tolerates some packet loss; an example is the popular MPEG coding standard. This standard employs both intra-frame and inter-frame compression with different types of frames (I, P and B). A repeating pattern of I, P and B frames is called a Group of Pictures (GoP). The choice of GoP structure affects properties of the MPEG video such as file size, which in turn affects the video stream bitrate and ultimately the resulting visual quality. The number and ratio of the different frame types in a GoP are chosen for coding efficiency according to the video subject group in question: static (SSG), pseudo-static (PSSG), or highly dynamic (HDSG).
The GoP length is a function of the structure and number of frame types used. A short GoP of fewer than six frames is used when image transitions are very fast. If the scene changes quickly relative to the preceding frame, a large number of P and B frames may worsen the quality. The possible number of P frames is in the range of 2 to 14. Usually, only a small number of P frames is used in practice (say 3 or 4). Only I and P frames (e.g. 1 I-frame and 14 P-frames) can be used if coding occurs at a high bitrate (e.g. > 6000 kbps). B-frames guarantee good compression, but like P-frames, they degrade quality in dynamic subjects. A small number of B-frames is used in practice (say 0 or 3).
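To make the frame counts above concrete, the sketch below builds a GoP pattern string from a number of P-frames and a number of B-frames placed before each P-frame and at the end of the group. The function name `gop_pattern` and this particular parametrization are assumptions for illustration; real encoders expose the GoP structure through their own settings.

```python
def gop_pattern(num_p, num_b):
    """Build a GoP string: one I-frame, then `num_p` P-frames, each preceded
    by `num_b` B-frames, with a final run of `num_b` B-frames closing the group.
    """
    if num_p < 0 or num_b < 0:
        raise ValueError("frame counts must be non-negative")
    pattern = "I"
    for _ in range(num_p):
        pattern += "B" * num_b + "P"
    pattern += "B" * num_b
    return pattern


ip_only = gop_pattern(num_p=14, num_b=0)   # 1 I-frame and 14 P-frames
mixed = gop_pattern(num_p=3, num_b=2)      # 3 P-frames, runs of 2 B-frames
print(ip_only, len(ip_only))
print(mixed, len(mixed))
```

With 3 P-frames and runs of 2 B-frames the function yields a 12-frame group, while the I/P-only case yields a 15-frame group.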
The most commonly employed GoP structure is IBBPBBPBBPBB. The maximum GoP length according to the DVD specification is 18 for NTSC and 15 for PAL. To increase coding efficiency, it is important to determine the size and distribution of frame types within the GoP. Since the subject of a video sequence may change over time, the GoP structure may also change with time; hence the need for dynamic determination of the GoP structure. The GoP structure can be determined automatically in real time from information such as the Sum of Absolute Differences (SAD) and the Mean of Absolute Differences (MAD). Works on this problem abound in the literature: a relatively easy methodology for determining the GoP structure is developed in [5], [6]. The optimization of GoP structure based on the temporal relation of sequential frames was considered in [7]. A simple method for determining the GoP structure is given in [8], while [9] presents an algorithm for GoP determination.
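The two difference measures named above can be sketched as follows for single-channel frames stored as lists of rows. This only illustrates SAD and MAD themselves; the function names are hypothetical, and the GoP decision rules of [5]-[9] are not reproduced here.

```python
def sad(frame_a, frame_b):
    """Sum of absolute pixel differences between two equally sized frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of rows")
    total = 0
    for row_a, row_b in zip(frame_a, frame_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


def mad(frame_a, frame_b):
    """Mean of absolute pixel differences (SAD divided by the pixel count)."""
    pixels = sum(len(row) for row in frame_a)
    return sad(frame_a, frame_b) / pixels


previous = [[10, 10], [10, 10]]
current = [[10, 12], [7, 10]]
print(sad(previous, current), mad(previous, current))  # prints: 5 1.25
```

A low MAD between consecutive frames suggests a static scene, for which longer GoPs are affordable; a high MAD suggests fast motion or a scene change.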
The MPEG standard assumes that in most videos, adjacent frames are similar. Since adjacent frames are described by how they differ from a reference frame, it is logically sound to assume that frames within a GoP may replace one another when one of them is lost, with negligible effect on the quality of the whole video sequence. However, during frame padding, image analysis is necessary to determine the most similar frame to use. Matlab was chosen for this purpose, since it allows for the processing of large amounts of data and has its own scripting language.

III. DESCRIPTION OF DEVELOPED SOFTWARE
The basic principle of the developed program is to estimate the distortions added to the video sequence through a frame-wise comparison of the original and received video sequences. PSNR is calculated for each frame of the received video sequence (i.e. the one that passed through the data transmission network). The input data for the program are frames of the video sequences (original and received) in bmp format. Each frame is represented in the software as a three-dimensional matrix, and each matrix element can take any value from 0 to 255, which determines the saturation of one of the colors (Red, Green, Blue) of each image pixel. Repeated identical frames in the transmitted and received video sequences are deleted. If the video sequences contain different numbers of frames, the software processes the smaller number (usually that of the received sequence).
For correct comparison of the video sequences, the software compares the received sequence with the original, i.e. it synchronizes the video data. Each received frame is compared with the frames of the original sequence within a certain range of frame numbers. This interval is set manually with consideration for the video subject group and the GoP structure. If the numbers of received and original frames do not match, the interval is extended by the difference in frame counts. The principle of frame synchronization is shown in Figure 1. For example, if the search interval is set to 3 frames and the received sequence is two frames shorter than the original, the interval is increased to 5 frames. Figure 1 shows three cases of frame search, depending on the position of the lost frame in the sequence, for an example in which the received sequence has fewer frames than the original.
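The synchronization step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: frames are single-channel lists of rows, similarity is measured by PSNR with a peak value of 255, and the search window starts just after the previous match so that the mapping stays in temporal order. The names `mse`, `psnr`, `synchronize` and `flat` are hypothetical.

```python
import math


def mse(frame_a, frame_b):
    """Mean squared error between two equally sized single-channel frames."""
    total = count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(frame_a, frame_b, peak=255):
    """PSNR in dB; identical frames give infinity."""
    err = mse(frame_a, frame_b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def synchronize(original, received, interval=3):
    """Map each received frame to its best-matching original frame.

    Returns (original_index, received_index, psnr_db) tuples. The window is
    `interval` frames, widened by the difference in sequence lengths.
    """
    window = interval + abs(len(original) - len(received))
    matches, start = [], 0
    for r_idx, r_frame in enumerate(received):
        candidates = range(start, min(start + window, len(original)))
        if not candidates:
            break  # no original frames left to compare against
        best = max(candidates, key=lambda o: psnr(original[o], r_frame))
        matches.append((best, r_idx, psnr(original[best], r_frame)))
        start = best + 1  # assumption: frames never arrive out of order
    return matches


def flat(value):
    """A tiny 2x2 test frame filled with one brightness value."""
    return [[value, value], [value, value]]


original = [flat(10 * i) for i in range(6)]
received = [original[i] for i in (0, 1, 4, 5)]   # frames 2 and 3 were lost
matches = synchronize(original, received, interval=3)
matched = [o for o, _, _ in matches]
lost = sorted(set(range(len(original))) - set(matched))
```

In this toy run the window grows from 3 to 5 frames because two frames are missing, and `lost` identifies original frames 2 and 3 as having no counterpart. A greedy forward search of this kind can mismatch frames in nearly static scenes, which is one reason the interval is tuned per subject group.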
The similarity value of two video frames is the average of all matrix elements, calculated sequentially for each dimension. To calculate the PSNR, a difference matrix containing the pixel differences between the received and input frames is generated. The data of the difference matrix are also averaged sequentially along each dimension, and the maximum difference between frames is represented as an 8-bit number ranging from 0 (identical images) to 255 (maximally different frames, as in the comparison of a black image with a white one). The obtained PSNR values are stored in a file containing the source frame number, the corresponding number of the received frame, and the PSNR value of the pair. The algorithm of the developed software is presented in flowchart form in Figure 2.

IV. RESULTS AND DISCUSSION
For the experiment, the video sequences "Hall", "Foreman", and "Football" in YUV format, accessible at [10] and recommended for test experiments [11], were used. These sequences represent different subject groups: static (SSG) for Hall, sedentary or pseudo-static (PSSG) for Foreman, and highly dynamic (HDSG) for Football. In evaluating the developed software, two experiments were conducted. In the first experiment, video transmission over a wireless AWGN channel with BER = 10^-4 was simulated using VCDemo [12]; the synchronization of the video sequences was left intact. In the next experiment, desynchronization of the video sequences was achieved by removing frames from the received video using VirtualDub [13], as shown in Figure 3. Frame removal was performed in three different combinations: 1) the 50th frame was removed; 2) the 50th to 55th frames were removed; 3) the 50th to 60th frames were removed. These removals correspond to approximately 1%, 5% and 10% of the frames.
Quality assessment was performed using the psnr.exe software, a component of the hardware and software complex described in detail in [14]. Results of the quality assessment of synchronized and desynchronized video sequences are shown in Figures 4a and 4b. Figure 4b shows that the PSNR value drops sharply after the 50th frame and then remains virtually unchanged. However, this does not reflect the true quality, because removing some frames does not affect the quality of the frames that follow. The unreliability of this result shows that the conventional software cannot match original and received frames individually; it only performs a serial comparison. Thus, when transmitted and received frames do not correspond because some of them were lost, the quality assessment done by existing software is incorrect.
The purpose of the second experiment was to test the developed software and compare its results with those of traditional software in assessing the quality of video sequences. For this purpose, the original and tested video sequences were converted frame by frame into a format suitable for Matlab processing (bitmap picture, .bmp) using VirtualDub [13]. The frames of each of the three video sequences, namely a) the original, b) distorted with no loss of frames, and c) distorted with frame loss, are numbered sequentially from 1 to 100. The output of the software is a report file in tabular form containing the number of the transmitted frame, the number of the received frame, and the corresponding PSNR in dB (Table 1). In the absence of lost frames and with synchronization retained, the serial number of the received frame matches that of the transmitted frame. If some frames are lost during transmission, their serial numbers are missing from the "№ of sent frames" column, which means that the corresponding frame is absent from the received sequence. For such frames, the PSNR is not calculated (in this example, frames 50 to 55). It is therefore necessary to note missing (i.e. lost) frames during analysis and computation of PSNR. Starting from the 50th received frame, there is a correspondence with the source frames, albeit with a shift of six frames (i.e. a displacement due to the loss of frames 50-55). The accuracy of the PSNR values calculated by the developed software is compared with those of traditional (existing) software in Figure 4a. The small observable change is due to the conversion of the video sequence from MPEG video to BMP image format.
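The frame-number bookkeeping described above can be illustrated with a short sketch. It only reproduces the pairing of sent and received frame numbers for a known set of lost frames; the PSNR column is omitted, and the function name `report_rows` is hypothetical.

```python
def report_rows(total_sent, lost):
    """Pair each surviving sent frame number with its received frame number.

    Frame numbers are 1-based, as in the text. `lost` is the set of sent
    frame numbers that never arrived.
    """
    surviving = [n for n in range(1, total_sent + 1) if n not in lost]
    return [(sent, received) for received, sent in enumerate(surviving, start=1)]


rows = report_rows(100, lost=set(range(50, 56)))   # frames 50-55 are lost
sent_50, received_50 = rows[49]                    # the 50th received frame
shift = sent_50 - received_50
```

For the 50th received frame the matching sent frame is number 56, so `shift` is 6. A serial comparison would instead pair received frame 50 with sent frame 50, which is the source of the error visible in Figure 4b.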

Figure 4b shows the values of the PSNR indicator calculated using traditional software with and without synchronization. When the PSNR is measured for video sequences from which frames have been removed, the traditional software's calculation is incorrect after the 50th frame. The developed software's calculations are accurate owing to its built-in synchronization function (Figures 4c and 4d).
In Figure 4c, a leftward shift in the PSNR graph is observable, and it corresponds to the number of lost frames. "Null" values, which as a rule describe the worst quality scenario, may be inserted in place of lost frames. This corrective approach makes it possible to align the values before and after the loss of frames. Figure 4d shows the insertion of the value PSNR = 20 dB, which characterizes very poor quality. The PSNR values cannot be predicted exactly even under deterministic experimental conditions; we can only indicate the likelihood of the quantity taking a certain value or falling within a given interval. However, with knowledge of the probability distribution of this quantity, one can determine its properties and characteristics [14].
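The two ways of handling lost frames discussed above, "gluing" the surviving values together and inserting a placeholder, can be sketched as follows. The dictionary layout (sent frame number mapped to PSNR in dB) and the function names are assumptions for illustration; the 20 dB placeholder is the value shown in Figure 4d.

```python
from statistics import fmean

NULL_PSNR_DB = 20.0  # "null" value inserted in Figure 4d


def glue(psnr_by_frame):
    """Keep only the PSNR values of received frames, in frame order."""
    return [psnr_by_frame[n] for n in sorted(psnr_by_frame)]


def fill_lost(psnr_by_frame, total_frames, placeholder=NULL_PSNR_DB):
    """Return one value per sent frame, using `placeholder` for lost frames."""
    return [psnr_by_frame.get(n, placeholder) for n in range(1, total_frames + 1)]


measured = {1: 35.0, 2: 34.0, 4: 36.0}            # frame 3 was lost
glued = glue(measured)
with_nulls = fill_lost(measured, total_frames=4)
with_mean = fill_lost(measured, total_frames=4, placeholder=fmean(glued))
```

Gluing shortens the series (three values for four sent frames), whereas placeholder insertion restores the original length at the cost of adding values that were never measured.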
Figure 5 shows the quality indicator of video sequences corresponding to different subject groups, namely static (Hall), pseudo-static (Foreman), and highly dynamic (Football). Figures 5c and 5d show that the PSNR values calculated using traditional software are incorrect, since some of the results fall below 25 dB, which corresponds to poor quality. If frames are lost from the sequence, the shape of the PSNR histogram should not change; only the number of PSNR observations should be reduced, i.e. the histogram should decrease along the vertical axis (number of observations) only in those bands to which the lost frames belonged. The incorrect low values produced by the traditional software are the reason for the high distortion of the empirical distribution function (Figure 5e). Figure 6 displays the results of the calculations obtained from the developed software. In Figure 6b, as described above, the PSNR curve is shifted to the left by the number of lost frames. The histograms of PSNR (Figures 6c and 6d) show that the distortions before and after frame removal are virtually identical, indicating the correctness of the PSNR calculation after frame removal using the developed software. The distribution functions also vary slightly depending on the number of lost frames (Figure 6e).
Figure 7 shows the computation results of the developed software with insertion of "null" values. The graphs of PSNR in Figure 7b are identical except for a number of data points equal to the quantity of lost frames. The distribution functions also vary slightly depending on the number of lost frames (Figure 7d). Thus, the method of inserting "nulls" does not correspond to the original data for large numbers of lost frames. One way of mitigating this shortfall is to insert the average PSNR value of the video sequence rather than "nulls".
In all, nine experiments were conducted. For ease of presentation of the results, we denote them by their corresponding numbers as follows:
1 - calculated using traditional software with 1% packet loss;
2 - calculated using traditional software with 5% packet loss;
3 - calculated using traditional software with 10% packet loss;
4 - calculated using developed software with 1% packet loss;
5 - calculated using developed software with 5% packet loss;
6 - calculated using developed software with 10% packet loss;
7 - calculated using developed software with 1% packet loss and insertion of "null" values;
8 - calculated using developed software with 5% packet loss and insertion of "null" values;
9 - calculated using developed software with 10% packet loss and insertion of "null" values.
Delta (Δ) values were calculated from the data of the distribution functions using formula (2):

$$\Delta F(\mathrm{PSNR}) = \left| F_{0}(\mathrm{PSNR}) - F_{l}(\mathrm{PSNR}) \right| \qquad (2)$$

where F_0(PSNR) is the distribution function of the PSNR of the video sequence calculated with traditional software without lost frames (Figure 4a), and F_l(PSNR) is the distribution function of the PSNR of the video sequence calculated with the developed software with a certain number of lost frames (Figures 4c and 4d).
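A comparison of two PSNR distribution functions of the kind used for formula (2) can be sketched as below. This is an illustration under assumptions, not the authors' procedure: the distribution functions are taken to be empirical CDFs, they are evaluated on a caller-supplied grid of PSNR thresholds, and the difference is taken in absolute value. The names `ecdf` and `delta_f` are hypothetical.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return the empirical CDF F(x) = share of samples that are <= x."""
    ordered = sorted(samples)
    size = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / size

    return cdf


def delta_f(reference, test, grid):
    """Absolute difference of two empirical CDFs at each threshold in `grid`."""
    f_ref, f_test = ecdf(reference), ecdf(test)
    return [abs(f_ref(x) - f_test(x)) for x in grid]


reference = [30.0, 32.0, 34.0, 36.0]   # PSNR per frame, no frames lost
test = [30.0, 32.0, 36.0]              # same sequence with one frame lost
grid = [29.0, 31.0, 33.0, 35.0, 37.0]  # PSNR thresholds in dB
deltas = delta_f(reference, test, grid)
```

Because both functions start at 0 and end at 1, the difference vanishes at the ends of the grid and is largest in the PSNR band that contained the lost frame.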
Figure 8 shows the values of ΔF (PSNR), calculated according to formula (2). Analysis shows that when desynchronized video sequences are evaluated using the developed software, only a slight difference in the distribution function is observed as a result of "gluing" values in place of lost frames. The largest deviation from the original distribution function is observed when "null" values are inserted. At the same time, an increase in the percentage of lost packets causes a substantial increase in the deviation from the original sequence, making this method unsuitable regardless of the video subject group. The histograms of the values of ΔF (PSNR) are shown in Figure 9.
The mean and variance were obtained from the computed values of ΔF (PSNR) using formula (3). The computation results are presented in Table 2.
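The summary statistics mentioned above can be computed with Python's standard library, as in the sketch below. Whether the variance is normalized by N or by N - 1 is not stated here, so the population form is an assumption of this example, and the function name `summarize` is hypothetical.

```python
from statistics import fmean, pvariance


def summarize(delta_values):
    """Return (mean, population variance) of a list of ΔF(PSNR) values."""
    mean = fmean(delta_values)
    return mean, pvariance(delta_values, mu=mean)


mean, variance = summarize([0.1, 0.2, 0.3])
```

Replacing `pvariance` with `statistics.variance` gives the sample (N - 1) form if that normalization is preferred.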
V. CONCLUSIONS AND RECOMMENDATIONS
From the analysis of the obtained results, we conclude that:
a) Assessing the quality of desynchronized video sequences via traditional software leads to incorrect computational results that do not reflect the true value of PSNR. The error value ΔF (PSNR) for traditional software increases from 0.3 dB to 0.5 dB for SSG and PSSG respectively, and ranges from 0.35 dB to 0.45 dB for HDSG, with increase in the percentage of lost frames. The mean and variance of the error ΔF (PSNR) in the estimation of quality using traditional video software have the highest values, which confirms the incorrectness and inappropriateness of using such software for quality assessment.
b) Assessing the quality of desynchronized video sequences using the developed software is reliable both for the normal outcome ("gluing" values in place of lost frames) and for insertion of "null" values. A small difference in the distribution function is observed when values are inserted in place of lost frames. Regardless of the percentage of lost frames, the error index ΔF (PSNR) has a maximum value of < 0.2 dB, < 0.3 dB and < 0.1 dB for SSG, PSSG and HDSG respectively. In comparison with the corresponding values for the traditional software, the superiority of the developed software becomes evident. The biggest difference from the original distribution function is observed in the case of inserted "null" values. With an increase in the percentage of lost packets, the difference in the quality indicator increases substantially: for all subject groups, ΔF (PSNR) increases by no less than 0.05 dB and equals 0.25 dB, 0.35 dB and 0.2 dB for SSG, PSSG and HDSG respectively. Thus, in terms of error, the most appropriate method of estimating desynchronized video sequences with a small amount of packet loss (less than 10%) is inserting "non-zero" values. The method of inserting "null" values can be recommended for a larger number of lost frames, or if there is a need to restore the number of PSNR values in the original sequence as well. At the same time, it is recommended that average values of PSNR be used in lieu of "null" values.
c) It is shown that different subject groups have different effects on the resultant quality of the desynchronized video sequence. For example, the PSNR indicator for SSG is characterized by a slight variation, and its histogram has a bimodal shape with a width of 8 dB; for PSSG it has a large variation, and its histogram has a bimodal shape with a width of 14 dB; for HDSG it has the largest variation, and its histogram has a bimodal shape with a width of 24 dB. This in turn affects the range of PSNR values containing the error ΔF (PSNR). The widest range, 20 dB to 45 dB, occurs for PSSG and HDSG.
d) Analysis of the mean and variance of the error ΔF (PSNR) shows that the use of insertion is more advantageous. With an increasing number of lost frames, the mean for SSG increases from 0.11 dB to 0.12 dB, while for PSSG and HDSG it decreases from 0.22 dB to 0.21 dB and from 0.14 dB to 0.12 dB respectively. Meanwhile, the variance changes only slightly. The analysis shows that for a small number of lost frames (1%), the mean of the error ΔF (PSNR) is lowest for the method of inserting "null" values: (0.09, 0.2 and 0.13) dB for SSG, PSSG and HDSG respectively. The "glue" method has the largest corresponding values: (0.11, 0.22 and 0.14) dB for each subject group respectively.
e) From the above conclusions, the developed software is recommended for assessing the quality of video streams of different subject groups in desynchronization conditions. With a small number of lost frames (about 1%), it is recommended to use the insertion of "null" values, and the "gluing" method in other cases. This approach guarantees reliable estimation of the quality of the received video stream with minimal estimation error.

VI. SUMMARY
In summary, it has been shown that when the sent and received frames do not correspond because some frames are lost in transit, existing software packages estimate the quality indicator inaccurately. This paper has presented and described in detail a novel method of estimating the quality of desynchronized video streams. It has likewise been shown that the developed software application can estimate the quality of a video sequence even when some frames are missing, by searching for relevant frames. The accuracy of the obtained results is established by comparing the estimated quality of video streams with that given by existing software. The differences in methods of estimating video sequences of different subject groups are also highlighted.

Figure 1. A Method for Frame Synchronization

Figure 2. A Method for Frame Synchronization

Figure 3. The distortion of the video sequence after transmission over the AWGN wireless channel at the 22nd frame (top) and the desynchronized video at the 51st frame (bottom)

Figure 4. PSNR values for the Foreman video sequence computed with traditional software (without loss of frames) and the proposed software (with loss of frames № 50-55): a) correspondence of values; b) computed values using traditional software; c) factual computation of values (observable "gluing" of values in place of lost frames); d) insertion of "null" values in place of lost frames.

Figure 5a and 5b. PSNR values for video sequences of Hall, Foreman, and Football, distorted during transmission over a wireless channel and computed using traditional software: a) synchronized video sequences; b) frame loss and desynchronized conditions.

Figure 6. PSNR values for video sequences of Hall, Foreman, and Football, distorted during transmission over a wireless channel and computed using traditional software: a) synchronized video sequences; b) frame loss and desynchronized conditions; c) synchronized video sequence distribution histograms; d) distribution histograms under frame loss and desynchronization conditions; e) probability distribution of experimental data with and without packet loss.

Figure 7. PSNR values for video sequences of Hall, Foreman, and Football, distorted during transmission over a wireless channel and computed using traditional software with insertion of "null" values: a) synchronized video sequences; b) frame loss and desynchronized conditions; c) synchronized video sequence distribution histograms; d) distribution histograms under frame loss and desynchronization conditions; e) probability distribution of experimental data with and without packet loss.

Figure 8. ∆F (PSNR) values for Hall, Foreman and Football video sequences