Hybrid Non-Reference QoE Prediction Model for 3D Video Streaming Over Wireless Networks

With the rapid growth in mobile device users and the increasing demand for video applications, traffic from 2D/3D video services is expected to account for the largest proportion of Internet traffic. Users' perceived quality of experience (QoE) and quality of service (QoS) are key factors in the success of video delivery. In this regard, predicting the QoE is highly important for provisioning 3D video services in the wireless domain, where resources and bandwidth are limited. This study presents a cross-layer no-reference quality prediction model for wireless 3D video streaming. The model is based on fuzzy inference systems (FIS) and exploits several key QoS factors that are mapped to the QoE. The performance of the model was validated with unseen datasets and shows high prediction accuracy. The results show a high correlation between the objectively measured QoE and the QoE predicted by the FIS model.


I. INTRODUCTION
The success of modern communication networks and systems has made many applications and services widely accessible and more usable. Among these are multimedia applications, which have become popular over the Internet. One critical multimedia service is video streaming, which is nowadays delivered not only as a 2D stream but also in 3D. It is expected that 82% of global Internet traffic will be video, and that more than 65% of the video traffic will be transferred over wireless communication links using mobile devices [1]. Maintaining video quality, particularly in resource-scarce wireless and mobile environments, is therefore of critical importance. While this can be tackled by focusing on the performance of the network, it is also important to consider it from the end-user perspective. Focusing on the consumer's QoE rather than the network's QoS leads to evaluating the quality of videos as perceived by end users. Compared to 2D video, 3D video introduces a new dimension, namely depth, which needs to be additionally taken into account when assessing users' QoE.
Maintaining high QoE for end users requires that 3D videos can be efficiently monitored, predicted, and controlled [2]. The prediction of QoE, in turn, depends heavily on understanding the impact of QoS parameters [3], [4]. In this regard, the relationship between QoE and QoS can be established in a defined prediction model. This can be done with no-reference QoE prediction, without the need for a reference 3D video. To achieve such a capability, it is important to develop QoE prediction models that consider more of the QoS factors related to the user's QoE. One effective technique for developing objective QoE prediction models is the learning-based technique, using the different types of machine learning methods [5]. Developing QoE prediction models with machine learning methods results in a model that can dynamically and intelligently learn and then make decisions in a way that resembles human reasoning.
A number of machine learning techniques have been adopted to realize QoE prediction models [6]. Popular examples include Fuzzy Inference Systems (FIS), Artificial Neural Networks (ANN), and Decision Trees. However, it is critical to comprehensively incorporate the key QoS factors for video traffic, instead of relying on a basic model with a limited set of factors, which is the case in the majority of the proposed solutions. According to the ITU classification of objective quality assessment models [7], a hybrid QoE prediction model is the best choice for building a generic prediction model, and such a model is proposed in this paper.
The FIS method has been the choice for many solutions in telecommunications and engineering. It provides an efficient system for addressing innate uncertainty, which can be caused by internal or external factors. Developing a QoE prediction model based on FIS with predefined rules is an effective approach to making decisions with imprecise information. Typically, fuzzy rules in FIS cannot be automatically formulated and need to be manually updated when the input dataset is updated. We propose in this paper a hybrid no-reference prediction model using an automated FIS for predicting the quality of wireless 3D video streaming. The main target of the proposed solution is to support real-time QoE prediction at an intermediate measuring point over a wireless network for 3D video streaming.
The structure of the paper is as follows. Related work is discussed in Section 2. Section 3 describes the experimental setup. The validation and statistical analysis of the QoE measurements resulting from the simulations are presented in Section 4. Section 5 discusses the methodology of the proposed video quality prediction system. The performance validation and evaluation of the QoE prediction model are provided in Section 6. Finally, Section 7 concludes the paper.

II. RELATED WORKS
Compared to other multimedia services, video streaming is a demanding service that is more sensitive to any degradation over the network. This is more evident in the case of 3D video communications over wireless networks with mobile end-user devices. Without careful and efficient system design, degradation in user experience would occur. Therefore, it is critical to consider the different application and networking aspects in this regard. There is a variety of video quality measurement methods in the literature with different operational and computational requirements [3]. The measurement of video quality is typically achieved in one of two modes: non-intrusive no-reference (NR) and intrusive full-reference (FR). The FR mode has to use both the original and the distorted video streams. In contrast, the NR mode is based only on the distorted video streams, which makes it more suitable for real-time applications.
The effects of QoS can occur at the access network layer, referred to as NQoS, or at the application layer, referred to as AQoS. The NQoS can be described with a number of NQoS parameters including delay, packet loss, and random loss. These have been considered in many research efforts, such as [8], [9], and [10], for estimating QoE in 2D video streaming applications. On the other hand, AQoS parameters such as frame rate and bit rate were the focus in [11], [12], [13], [14]. However, QoE estimation for 3D video streaming has received less attention in the literature. It requires the consideration of additional parameters relevant to 3D videos, including depth perception, naturalness, and comfort levels. The study in [15] examined random packet losses and their effect on the overall 3D perception. With a subjective test, the results showed that an increase in packet loss rate led to a negative trend in 3D perception. In [16] and [17], the researchers considered the degradation of 3D video quality when delivered over wireless mobile networks. The aim was to understand, using a subjective test, how the overall 3D perception can be affected by random packet loss. These studies did not consider evaluating the quality of 3D videos with VBR streams in different resolutions. Moreover, there is no need to completely rely on both NQoS and AQoS parameters when adopting hybrid prediction models.
Recently, there has been growing interest in the literature in the use of machine learning techniques for the development of objective QoE prediction models [5], [18], [19]. As such methods can provide a self-learning capability, non-intrusive prediction of video quality can be implemented to adapt dynamically to any update in QoS parameters. One of these methods is the Adaptive Neural Fuzzy Inference System (ANFIS) [20], which was adopted in [21] and [22] for the estimation of the quality score, with the focus on 2D video in a single resolution, QCIF (176x144). The authors in [23] presented a real-time prediction engine for 3D video using a Random Neural Network (RNN). Another research work focused on 3D video is [24], which presented a reduced-reference (RR) metric based on Peak Signal-to-Noise Ratio (PSNR).
It is observed from the investigated literature that most video quality prediction models considered either network impairments, encoder compression artifacts, or the features of the video content. It is uncommon for these factors to be considered all together in one solution. The work proposed in this paper introduces a non-intrusive QoE prediction model based on an automated FIS method that considers a collection of key QoS parameters for wireless 3D video streaming.

A. Video Encoding Parameters
Three classes of H.264 coded 3D video streams were evaluated based on temporal activity by using the spatiotemporal classification in the ITU-T P.910 recommendation [25]. The temporal and spatial features were extracted from the 3D video stream, and then a temporal index (TI) and a spatial index (SI) were assigned using the Sobel filter. The computed indices indicate the temporal activity and spatial complexity of the video sequence. Consequently, six 3D video sequences (two in each class) were chosen, as shown in Table I.
The H.264/AVC JM Reference Software [26] was used to encode and decode all video sequences with the H.264/AVC video coding standard [27] (for both the colour image and the depth map). Table II presents the configuration parameters of the encoding process. The frame rate (FR) was fixed at 25 fps, as this is typical in wireless video streaming [28]. The video sequences were encoded with different quantization parameters (QP) and resolutions. The network abstraction layer (NAL) units were RTP packetized and encapsulated in IP packets. The Group of Pictures (GOP) size was 16, where each group included one I-frame and all remaining frames were P-frames. This avoided the computation time arising from bi-predictive B-frames, and is the structure recommended for wireless video streaming [28].
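As an illustration of the spatiotemporal classification, the sketch below computes SI and TI in the way ITU-T P.910 defines them: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of successive frame differences. It is a minimal pure-Python sketch, not the tooling used in this study; the frame representation (lists of rows of luminance values) and the function names are assumptions made for the example.

```python
from math import hypot
from statistics import pstdev


def sobel_magnitude(frame):
    """Return Sobel gradient magnitudes for the interior pixels of a 2D luminance frame."""
    h, w = len(frame), len(frame[0])
    out = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column (weights 1-2-1).
            gx = (frame[r - 1][c + 1] + 2 * frame[r][c + 1] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r][c - 1] - frame[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row (weights 1-2-1).
            gy = (frame[r + 1][c - 1] + 2 * frame[r + 1][c] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r - 1][c] - frame[r - 1][c + 1])
            out.append(hypot(gx, gy))
    return out


def spatial_index(frames):
    """SI: maximum over frames of the spatial std of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitude(f)) for f in frames)


def temporal_index(frames):
    """TI: maximum over time of the spatial std of successive frame differences.

    Requires at least two frames of identical size.
    """
    values = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)]
        values.append(pstdev(diff))
    return max(values)
```

A sequence with high TI would fall into a high temporal-activity class; the class thresholds themselves are not reproduced here.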

B. The selected QoS Parameters
The QoS parameters selected in this study were resolution (R), quantization parameter (QP) and content type (CT) from the AQoS level, and mean burst loss (MBL) and packet loss rate (PLR) from the NQoS level. Table III summarises the values of the simulated QoS parameters. Moreover, the simulation of each tested condition was repeated 10 times to increase confidence in the data.

C. Simulation Scene
The simulation scene illustrated in Fig. 1 was designed and conducted to map the QoS to the QoE. The coded 3D video streams (2D colour image and depth map) were transmitted over a simulated wireless environment. The packet loss traces with varying MBL and PLR metrics were generated by using the Gilbert-Elliott model [29]. The degraded 3D video streams were then assessed by an objective 3D video quality metric, which is explained in the following subsections. QoE measurement methods can be categorised into subjective and objective methods [30]. In this paper, an objective quality metric was used for quality assessment and was then validated by a subjective assessment.
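To make the link between the two NQoS parameters and a loss trace concrete, the following sketch generates a trace with a simplified two-state Gilbert model, in which every packet is lost in the Bad state and none in the Good state. This simplification, the function names, and the parameter mapping (Bad-to-Good probability r = 1/MBL, Good-to-Bad probability p = r * PLR / (1 - PLR)) are assumptions for illustration; the full Gilbert-Elliott model [29] also allows per-state loss probabilities, and the exact configuration used in the simulations is not reproduced here.

```python
import random


def gilbert_loss_trace(n_packets, plr, mbl, seed=None):
    """Return a list of booleans (True = packet lost) from a two-state Gilbert model.

    plr: target packet loss rate as a fraction in (0, 1)
    mbl: target mean burst length in packets (>= 1)
    """
    if not 0.0 < plr < 1.0 or mbl < 1.0:
        raise ValueError("need 0 < plr < 1 and mbl >= 1")
    r = 1.0 / mbl                 # P(Bad -> Good); mean sojourn in Bad is 1/r
    p = r * plr / (1.0 - plr)     # P(Good -> Bad); stationary P(Bad) = p / (p + r) = plr
    if p > 1.0:
        raise ValueError("plr and mbl are not jointly achievable")
    rng = random.Random(seed)
    bad = rng.random() < plr      # start from the stationary distribution
    trace = []
    for _ in range(n_packets):
        trace.append(bad)
        if bad:
            bad = rng.random() >= r   # stay in Bad with probability 1 - r
        else:
            bad = rng.random() < p    # move to Bad with probability p
    return trace


def mean_burst_length(trace):
    """Average length of runs of consecutive lost packets in a trace."""
    bursts, run = [], 0
    for lost in trace:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0
```

For a long trace, the empirical loss rate and `mean_burst_length` should approach the requested PLR and MBL; short traces will deviate noticeably, which is one reason for repeating each test condition.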

A. Objective Test
A validated full-reference perceptual 3D video quality metric (Q) [31] was used for the objective measurements. This metric uses VQM (Video Quality Metric) [32] to assess the 2D colour images, and then uses a joint mathematical model [31] to combine the result with the corresponding depth map. On the VQM scale, 1 represents severe distortion and 0 represents original quality. The 3D quality scale is mapped to the subjective metric called MOS (Mean Opinion Score) [25] by means of the equations [33]:

B. Subjective Test
In this work, a subjective assessment test was conducted to verify the credibility of the large objectively measured dataset. The standard recommendation ITU-R BT.500-13 [34] was followed for this test. Because the total number of test conditions in the measured objective dataset was large (about 1080 conditions), a systematic approach (the Kennard and Stone algorithm) [35] was followed to select a subset of 64 video sequences for subjective testing. A single stimulus (SS) quality evaluation method was applied with a panel of 21 viewers in a lab under controlled conditions. The viewers gave MOS scores between 1 and 5, as illustrated in Table IV. Figure 2 illustrates the observers' MOS scores and their corresponding 95% confidence intervals. The analysis of variance (ANOVA) test [36] was also conducted in this study to explore the impact of AQoS and NQoS parameters on the 3D video quality. Table V shows the results of the ANOVA test. A small p value (p <= 0.01) indicates that the QoS parameter strongly affects the video quality. It is clear from the table that the PLR metric had the highest impact. The interactions between these QoS parameters also affect the video quality.
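The subset selection step can be sketched as follows. The Kennard and Stone algorithm starts from the two mutually farthest points and then repeatedly adds the candidate whose distance to its nearest already-selected point is largest, which spreads the subset evenly over the parameter space. The representation of each test condition as a numeric vector of (normalised) QoS parameters and the use of Euclidean distance are assumptions of this sketch, not details taken from the study.

```python
from math import dist


def kennard_stone(points, k):
    """Return the indices of k points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(points)")
    # Seed the subset with the two mutually farthest points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )
    selected = [i0, j0]
    # min_d[i] is the distance from point i to its nearest selected point.
    min_d = [min(dist(p, points[i0]), dist(p, points[j0])) for p in points]
    while len(selected) < k:
        # Pick the unselected point that is farthest from the current subset.
        nxt = max((i for i in range(n) if i not in selected), key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(d, dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return selected
```

With roughly 1080 candidate conditions the quadratic seed search is still cheap, so no spatial index structure is needed at this scale.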

V. QOE PREDICTION METHODOLOGY
The proposed 3D video quality prediction model was built by using an automated Type-1 FIS, which outperforms other methods in terms of decision making and modelling capabilities [37]. The FIS control consists of three main modules: the fuzzifier, the fuzzy inference engine and the defuzzifier. A functional block diagram of the proposed model is presented in Figure 4. Generally, the procedure of designing a FIS model can be divided into two main steps: the learning (initialization) step and the control step. The learning step includes defining the linguistic variables, designing the MFs and extracting the fuzzy rule base, while the control step includes fuzzification, the inference engine and defuzzification. Algorithm 1 gives an overview of the process of the fuzzy logic system [38]. The following subsections describe the procedure of designing the proposed model, its MFs and the fuzzy rule extraction.

A. Identifying the Inputs and Output
The chosen QoS parameters are outlined in Table III and the output MOS scores (QoE) in Table IV. The collected dataset consists of multiple input-output data pairs of the form $(x^{(t)}; y^{(t)})$, $t = 1, 2, \dots, N$, where $N$ is the number of data instances, $x^{(t)} \in \mathbb{R}^n$, and $y^{(t)} \in \mathbb{R}^k$. Once the inputs and the output were identified, both were converted into linguistic expressions (the MF design) to represent the quantification of the output (QoE scores).

B. Membership Functions Design
The correlation between the input and output parameters was transferred into fuzzy MFs. In this study, the MFs were derived using probability distribution functions (PDF) [39] for every QoS parameter. Each PDF is divided by its peak to convert the probabilistic information into fuzzy sets, which are expressed as linguistic terms. In this work, each QoS input parameter was assigned three fuzzy sets (low, moderate, high), while the output was assigned five fuzzy sets, corresponding to the MOS scale. All the fuzzy sets were represented by an equivalent triangular shape due to its simplicity. Figure 5 shows the MFs of the input parameters, while the MF of the output is presented in Figure 6.
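The triangular representation can be illustrated with a short sketch that evaluates a triangular MF and fuzzifies a crisp input into the three linguistic terms. The breakpoints in `PLR_SETS` are hypothetical values chosen for the example (a packet loss rate axis from 0 to 10 percent); they are not the MF parameters derived from the PDFs in this study.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangle with feet a and c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0            # also covers shoulder sets where b == a or b == c
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, sets):
    """Map a crisp value to its membership degree in every labelled fuzzy set."""
    return {label: triangular(x, *abc) for label, abc in sets.items()}


# Hypothetical breakpoints (a, b, c) for illustration only.
PLR_SETS = {
    "low": (0.0, 0.0, 5.0),
    "moderate": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 10.0),
}
```

For example, `fuzzify(2.5, PLR_SETS)` gives equal membership in "low" and "moderate" and none in "high", which is the kind of graded overlap the triangular sets are meant to capture.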

C. Automated Fuzzy Rules Extraction
The proposed model automatically adapts the rules necessary for the fuzzy inference system using the learning from example (LFE) approach [40]. A developed version of the Wang-Mendel method [41], [38] was used to apply the LFE approach in designing the proposed automated FIS model. The extracted rules can take different forms. In the proposed system, the following fuzzy IF-THEN rules were used to represent the relationship between the input pattern $x = (x_1, \dots, x_n)^T$ and the output $y = (y_1, \dots, y_k)^T$:

IF $x_1$ is $A_1^{(i)}$ and $\dots$ and $x_n$ is $A_n^{(i)}$ THEN $y_1$ is $B_1^{(i)}$ and $\dots$ and $y_k$ is $B_k^{(i)}$

where the rule index $i = 1, 2, \dots, M$ and $M$ is the number of rules. $V$ fuzzy sets $A_s^q$, $q \in \{1, 2, \dots, V\}$, are defined for each input $x_s$, and $W$ fuzzy sets $B^h$, $h \in \{1, 2, \dots, W\}$, are defined for each output $y_c$. Moreover, the AND operator, which selects the minimum value of the fuzzy sets, was used in this model. The process of extracting the fuzzy rules from the data was performed using the following two steps, as described in [41], [38]:

STEP A: For a fixed input-output pair $(x^{(t)}; y^{(t)})$, $t \in \{1, \dots, N\}$, the membership values $\mu_{A_s^q}(x_s^{(t)})$ are computed for each MF ($q \in \{1, \dots, V\}$) and each input variable $s \in \{1, \dots, n\}$ to give $q^* \in \{1, \dots, V\}$:

$q^* = \arg\max_{q \in \{1, \dots, V\}} \mu_{A_s^q}(x_s^{(t)})$

Thus, this step chooses the fuzzy set that reaches the highest membership degree at the data point as the one in the IF part of the rule.

IF $x_1^{(t)}$ is $A_1^{q^*}$ and $x_2^{(t)}$ is $A_2^{q^*}$ and $\dots$
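STEP A can be sketched in a few lines: for one input-output pair, each input value is evaluated against all of its fuzzy sets and the label with the highest membership degree becomes the antecedent term for that input. The triangular sets, the set breakpoints and the function names below are assumptions for the example; the sketch stops at the antecedent selection and does not compute the rule weight.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (shoulders allowed)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def step_a_rule(x, y, input_sets):
    """Generate the raw rule for one data pair (x, y).

    x: sequence of crisp input values, one per input variable
    input_sets: one dict {label: (a, b, c)} per input variable
    Returns (antecedent labels, output value, winning membership degrees).
    """
    antecedent, degrees = [], []
    for x_s, sets in zip(x, input_sets):
        memberships = {label: tri(x_s, *abc) for label, abc in sets.items()}
        best = max(memberships, key=memberships.get)   # q* for this input
        antecedent.append(best)
        degrees.append(memberships[best])
    return tuple(antecedent), y, degrees


# Hypothetical normalised sets shared by both inputs in this example.
SETS = {"low": (0.0, 0.0, 0.5), "moderate": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
```

Calling `step_a_rule((0.1, 0.9), 4.2, [SETS, SETS])` yields the antecedent `("low", "high")` together with the observed output 4.2, i.e. one raw rule per data pair.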
Note that each input variable $x_s$ is characterised by $V$ fuzzy sets $A_s^q$, where $q \in \{1, \dots, V\}$, so $V^n$ is the maximum number of rules that can be generated. Nevertheless, depending on the dataset, only those rules among the $V^n$ possibilities whose dominant regions contain at least one data point are generated. In this step, one rule was generated for each input-output data pair. This rule is modified to create its final form in the second step. The rule's weight was computed using the formula [41]:

B. Validation by The External Dataset
A further validation was conducted by using an external dataset that was used in [21] and is publicly available on the authors' website. This dataset covers H.264 encoded QCIF 2D videos under network conditions of PLR and MBL for three types of video sequences (CT). In this case, we reduced the input parameters of the proposed model to three inputs: PLR, MBL and CT. Fig. 9 illustrates the model validation results obtained with the external dataset. Because this dataset was also used in [21] for the ANFIS-based prediction model, we compared the results of our FIS-based model with the ANFIS-based model developed in [21]. As shown in Table VII, the proposed model achieved a correlation coefficient of 90.2%, compared with 87.1% for the ANFIS-based model in [21].

VII. CONCLUSIONS
In this work, a non-reference prediction model was developed to predict 3D video quality in a wireless transmission environment. The proposed model was built by using an automated FIS method. Moreover, a selection of AQoS and NQoS parameters was identified and mapped to the MOS scores of the transmitted 3D video streams for end-to-end quality prediction. A subjective assessment was conducted to validate the objectively measured QoE dataset. Furthermore, in order to identify the most influential QoS parameters that affect the video QoE, the collected dataset was also analysed with the ANOVA test. The validated dataset was then used to build the proposed model. The results show that the proposed FIS-based model achieves a high correlation between the objectively measured QoE and the predicted QoE. The results also confirmed that the choice of the AQoS/NQoS parameters is essential to achieving high prediction accuracy. This work advances the development of non-reference quality prediction models for wireless 3D video streaming. For future work, the FIS-based model can be investigated as the basis of a potential application, such as content provisioning for network/service providers. Furthermore, additional AQoS and NQoS parameters can be considered for end-to-end quality prediction.

Algorithm 1: Process of the FIS
1. Define the linguistic expressions (Initialisation)
2. Design the membership function using triangle shape (Initialisation)
3. Convert crisp input value to fuzzy value using the MFs (Fuzzification)
4. Automatically extract the fuzzy rule base (Fuzzy inference engine)
5. Evaluate the fuzzy rules in the rule base (Fuzzy inference engine)
6. Aggregate the results of each rule (Fuzzy inference engine)
7. Convert the fuzzy value to crisp output value (Defuzzification)
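The control part of Algorithm 1 (steps 3 and 5 to 7) can be sketched as a small Mamdani-style pipeline. The minimum AND operator follows the text; the max aggregation, the sampled centroid defuzzification, the two-input setup, the set breakpoints, the output labels and the rule base are all assumptions made for this example, and the rule base is written by hand here rather than extracted automatically as in step 4.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (shoulders allowed)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Hypothetical normalised input sets and MOS output sets (1 to 5).
IN_SETS = {"low": (0.0, 0.0, 0.5), "moderate": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
OUT_SETS = {"bad": (1, 1, 2), "poor": (1, 2, 3), "fair": (2, 3, 4),
            "good": (3, 4, 5), "excellent": (4, 5, 5)}
# Hypothetical hand-written rules: (labels for input 1 and input 2) -> output label.
RULES = [
    (("low", "low"), "excellent"),
    (("low", "moderate"), "good"),
    (("moderate", "moderate"), "fair"),
    (("high", "moderate"), "poor"),
    (("high", "high"), "bad"),
]


def predict_mos(x, rules=RULES, steps=400):
    """Return a crisp MOS estimate for inputs x, or None if no rule fires."""
    # Step 3: fuzzification of every crisp input.
    mu = [{lab: tri(v, *abc) for lab, abc in IN_SETS.items()} for v in x]
    # Step 5: rule evaluation; AND is the minimum of the antecedent degrees.
    strength = {}
    for antecedent, consequent in rules:
        fire = min(m[lab] for m, lab in zip(mu, antecedent))
        strength[consequent] = max(strength.get(consequent, 0.0), fire)
    # Steps 6-7: max-aggregate the clipped output sets, then take the centroid.
    num = den = 0.0
    for i in range(steps + 1):
        y = 1.0 + 4.0 * i / steps
        agg = max(min(s, tri(y, *OUT_SETS[c])) for c, s in strength.items())
        num += y * agg
        den += agg
    return num / den if den else None
```

With these illustrative rules, low values on both inputs give a prediction near the top of the MOS scale and high values on both give a prediction near the bottom; an input combination not covered by any rule returns `None`, which shows why the extracted rule base needs to cover the operating region.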

Fig. 5. MFs of the input parameters.
Fig. 6. Membership functions of the output (QoE).

VI. MODEL VALIDATION
A. Validation by The Testing Dataset
As listed in Table I, six 3D video sequences were chosen, two in each class. The sequences Music, Poker and BMX were used for model training, and Fencing, Poznan and Pantomime were used for model testing. The testing dataset was used to validate the proposed prediction model. The measured QoE results were compared with the QoE predicted by the proposed model. The validation metrics used were the $R^2$ correlation and the RMSE (root mean squared error). $R^2$ was 0.951 and the RMSE was 0.1058. The validation of the proposed system is illustrated in Figs. 7 and 8. In Fig. 7, the measured MOS (QoE) is represented by the line, while each point shows the estimated MOS (QoE) of a particular test condition. The obtained result indicates that the measured QoE is strongly correlated with the predicted QoE. The proposed FIS-based model therefore succeeds in estimating the user's perception, and shows that the relationship between the AQoS/NQoS parameters and the video QoE is consistent.
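The two validation metrics can be computed as in the sketch below. Here $R^2$ is taken to be the squared Pearson correlation between measured and predicted MOS, which is an assumption; the text does not state whether this or the coefficient of determination was used. The function names are illustrative.

```python
from math import sqrt


def rmse(measured, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)
```

Note that the squared correlation is insensitive to a constant offset between the two series, so reporting it together with the RMSE, as done above, gives a more complete picture of prediction accuracy.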

ACKNOWLEDGMENT
This work was supported by the Deanship of Scientific Research, Qassim University, according to the agreement of the funded project No. SRD-2045-coc-2016-1-12-S. The authors thank the sponsor of this work for their support.

TABLE III. QOS PARAMETERS

TABLE V. FIVE-WAY ANOVA ON QOE OF 3D VIDEO

TABLE VII. PERFORMANCE COMPARISON