Inter Prediction Complexity Reduction for HEVC based on Residuals Characteristics

High Efficiency Video Coding (HEVC), or H.265, is currently the latest standard in video coding. While this new standard promises improved performance over the previous H.264/AVC standard, its complexity has drastically increased due to the various improved tools added. The splitting of the 64 × 64 Largest Coding Unit (LCU) into smaller CU sizes, forming a quad-tree structure, involves a significant number of operations and comparisons, which imposes a high computational burden on the encoder. In addition, the improved Motion Estimation (ME) techniques used in HEVC inter prediction to ensure greater compression also contribute to the high encoding time. In this paper, a set of standard thresholds is identified based on the Mean Square (MS) of the residuals. These thresholds are used to terminate the CU splitting process and to skip the processing of some of the inter modes. In addition, CUs with large MS values are split at a very early stage. Experimental results show that the proposed method reduces the encoding time by 62.2% (70.8% for ME) on average compared to HM 10, with a BD-Rate of only 1.14%.

Keywords: HEVC; inter prediction; early termination scheme; complexity reduction; prediction residuals


I. INTRODUCTION
High Efficiency Video Coding (HEVC), produced by the Joint Collaborative Team on Video Coding (JCT-VC) consisting of ISO/IEC MPEG and ITU-T VCEG, is currently the latest standard in video compression [32]. It makes use of the same hybrid approach, consisting of intra/inter prediction coupled with 2-D transform coding, that all video compression technologies have been using since H.261.
Video coding consists essentially of removing as much redundancy as possible through intra and inter prediction. While intra prediction, which makes use of data from the same frame, produces a very high quality output, it is also accompanied by a high bitrate. Inter prediction exploits the redundancies among already decoded frames and contributes significantly towards the low bitrate of the compression algorithm.
Unlike its predecessor, H.264/AVC [35], which partitions the frame into 16 × 16 macroblocks, HEVC makes use of a block of 64 × 64 pixels, known as the Largest Coding Unit (LCU). This large LCU size is advantageous for smooth regions of a picture, although HEVC has to test all the various combinations of CU sizes. In fact, HEVC adopts a highly flexible quad-tree structure, as shown in Fig. 1. For each CU size, HEVC selects one out of many prediction possibilities, testing various Prediction Units (PUs) and Transform Units (TUs) before selecting the best one based on the Rate-Distortion (RD) cost. In addition, using a bottom-up approach in the quad-tree structure, the four aggregated children CUs are compared with their parent to check whether they can be combined into a single CU. The larger the CU size, the higher the compression, as less prediction information is sent than is required by the four children. This process applies to both intra and inter prediction, although the PU partitioning differs between these two modes.
A high proportion of the encoding time is, however, devoted to inter prediction. Inter-frame prediction alone consumes from 60% to 96% of the total encoding time [26]. In [13], inter prediction is shown to consume 84% of the encoding time for High Definition (HD) sequences, while only 3% is taken by intra prediction.
Consequently, for practical applications such as high-resolution video services and real-time processing, HEVC still requires a significant complexity reduction while maintaining its high coding performance. Several approaches have been proposed to reduce the encoding complexity for both intra-picture and inter-picture prediction. In intra prediction, complexity reduction is achieved by reducing the number of modes tested out of the maximum of 35 and by terminating the CU splitting process as early as possible [5], [6], [9], [22], [27], [28], [31], [40]. While the determination of the CU size at an early stage is also applicable to inter prediction, further time reduction is obtained there by optimizing the Motion Estimation (ME) process.
The complex quad-tree structure of the HEVC standard, which is the root of the high encoding time, also implies that various optimization techniques could be implemented. Thus, most of the existing works propose heuristics that eliminate the processing of some of the possible configurations within the full range available, so as to encode the input video with substantial time savings. These acceleration techniques, however, come with varying losses in quality and increases in bit rate.
The high encoding time during inter prediction is largely due to the use of various Motion Estimation (ME) and motion compensation techniques to produce accurate predictions. In addition, the use of multiple reference frames in ME, introduced with the H.264/AVC standard, makes the motion estimation process even more complex. The common optimization method to reduce the encoder complexity therefore consists of decreasing the number of ME operations during the inter prediction process.
In this paper, the splitting of CUs in the quad tree is terminated based on the Mean Square (MS) of the residuals. In addition, CUs are also identified as split CUs during the 2N × 2N PU processing itself. In such cases, the processing of the other PU modes is skipped. The different thresholds are identified after analyzing the splitting process of a sample of the sequences. The rest of this paper is organized as follows. Section II enumerates the techniques proposed by related works. Section III provides an overview of inter prediction in HEVC. The time-consuming elements of inter prediction are presented in Section IV. Section V describes the proposed approach, consisting of the early termination schemes. The experiments conducted and discussions on the results obtained are presented in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORKS
Several algorithms have been proposed to reduce the HEVC complexity while ensuring negligible loss in quality and bit rate.
For example, significant time savings are obtained by avoiding the unnecessary splitting of some CUs. In [41], the quad-tree CU depth decision process is modeled as a three-level hierarchical binary decision problem. In addition, a flexible CU depth decision structure is used to allow the performance of each CU depth decision to be smoothly traded off between coding complexity and RD performance, along with binary classifiers to control the risk of false prediction. By making use of the mode information of the current CU and a depth range selection mechanism (DRSM), [19] produces an effective splitting decision process. The RD costs of the parent and current levels are used in [11] to terminate the quadtree-based structure earlier, thus reducing the computational complexity of the encoder. In [38], the CU splitting decision is built on a pyramid motion divergence (PMD) based CU selection along with a k-nearest-neighbor-like method. Splitting and termination decision approaches are also explored in [15], [30]. In [30], a Bayesian decision rule is also used in the splitting decision. The splitting decision in [29] is based on skipping some specific depth levels rarely used in the previous frame and neighboring CUs, in addition to using termination methods based on motion homogeneity checking, RD cost checking and SKIP mode checking. An early pruning method based on statistics of the prediction residuals is used in [33] to produce a significant time gain. A fast CU decision approach is proposed in [36] based on an exponential model expressing the relationship between the motion compensation RD cost and the SAD cost for the upper CU and its sub-CUs.
Fast prediction unit decision methods are proposed in [17], [18], [24], [34]. In [17], Motion Estimation is performed only on the inter prediction mode selected according to a priority level. Inter prediction information from previously predicted blocks and neighboring blocks enables the selection of only one inter prediction mode. Spatio-temporal analysis and depth correlations along with a classification of motion activity are explored in [18]. In [34], optimization techniques are proposed based on the rate-distortion-complexity characteristics of the HEVC inter prediction for the different block partitioning structures. In [24], the depth information of a CTU is determined from information collected in the 2N × 2N PU. The method makes use of the high probability of 2N × 2N being the best mode, and a fast scheme is proposed to make this decision at an early stage. In addition, a merge SKIP extraction method is developed and integrated with the CU depth decision algorithm to effectively decrease the encoding time.
A simple tree-pruning algorithm is proposed in [7] that exploits the observation that sub-tree computations can be skipped if the coding mode of the current node is sufficient, i.e. a SKIP mode. An early detection of SKIP mode is also proposed in [39] to reduce the encoding complexity of HEVC. The proposed method is similar to the early skip detection scheme implemented in H.264/AVC, but slightly modified to address the different encoding scheme of HEVC. The SKIP mode and the number of transformed coefficients of a CU, which are already computed values in HEVC, are utilized in [10] and [16] to produce fast mode decision algorithms. A fast rate-distortion estimation algorithm for HEVC is proposed in [10] based on genuine zero blocks (GZBs) and pseudo zero blocks (PZBs) of the transformed coefficients. In [16], the early mode decision is modeled as a binary classification problem of SKIP/non-SKIP or split/unsplit, along with a Neyman-Pearson-based rule to balance the rate-distortion (RD) performance loss and the complexity reduction.
In [25], a new global search pattern for finding the global minimum and an adaptive early termination condition are proposed to speed up the Motion Estimation (ME) algorithm. By merging N × N PU partitions in order to compose larger ones, several ME calls during the PU inter prediction decision are avoided in [26], reducing the overall encoding time.
Reference [13] proposes a system-level Adaptive Workload Management Scheme (AWMS). The AWMS collects feedback at the frame level and dynamically configures different parameters of the video coding system, such as the maximum CU depth and the search range. A fast CU size decision based on the Sobel operator is proposed in [14]. By using textural features of the video images, the algorithm identifies the coding depth of the final CU without having to traverse the coding layers, resulting in reduced computational complexity.

III. OVERVIEW OF INTER PREDICTION IN HEVC
Each frame in HEVC is initially partitioned into blocks of 64 × 64 pixels prior to any prediction and encoding. Each of these blocks, known as the Largest Coding Unit (LCU), is further split recursively into four children until the Smallest Coding Units (SCUs) of size 8 × 8 are reached. The quadtree structure formed is illustrated in Fig. 1. Intra prediction iterates among the 35 available modes to select the one that results in the lowest Rate-Distortion (RD) cost. For the angular modes, the reference data obtained from the already decoded top and left CUs are interpolated to form the prediction for the current CU. While intra prediction makes use of reference samples within the same frame, the references of inter prediction come from the already decoded frames. As such, inter prediction makes use of one list of reference frames in the case of a uni-predictive frame (P-frame) and two lists (list 0 and list 1) for a bi-predictive frame (B-frame).
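The recursive partitioning and the bottom-up parent-versus-children comparison can be illustrated with a minimal Python sketch. It is not the HM implementation: `best_partition`, `rd_cost` and `toy_cost` are hypothetical names, and the toy cost model merely stands in for a real RD cost so that the recursion can be run.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: RD cost of coding the square CU at (x, y)
# with the given size as a single, unsplit CU.
CostFn = Callable[[int, int, int], float]
Leaf = Tuple[int, int, int]


def best_partition(x: int, y: int, size: int, rd_cost: CostFn,
                   min_size: int = 8) -> Tuple[float, List[Leaf]]:
    """Return (cost, leaves) of the cheapest quad-tree rooted at this CU."""
    parent_cost = rd_cost(x, y, size)
    if size == min_size:
        # Smallest CU: no further split is possible.
        return parent_cost, [(x, y, size)]

    half = size // 2
    children_cost = 0.0
    children_leaves: List[Leaf] = []
    for dy in (0, half):
        for dx in (0, half):
            cost, leaves = best_partition(x + dx, y + dy, half, rd_cost, min_size)
            children_cost += cost
            children_leaves.extend(leaves)

    # Bottom-up comparison: keep the parent unless the four children are cheaper.
    if parent_cost <= children_cost:
        return parent_cost, [(x, y, size)]
    return children_cost, children_leaves


def toy_cost(x: int, y: int, size: int) -> float:
    # Toy model (an assumption, not an HEVC cost): cost grows with area, and
    # any CU larger than 16x16 covering the top-left corner is penalized,
    # mimicking a small moving region that large blocks predict poorly.
    penalty = 500.0 if (x < 16 and y < 16 and size > 16) else 0.0
    return 0.1 * size * size + 20.0 + penalty


cost, leaves = best_partition(0, 0, 64, toy_cost)
```

With this toy cost, the LCU ends up as four 16 × 16 CUs in the top-left quadrant and three 32 × 32 CUs elsewhere. The exhaustive recursion evaluates every CU at every depth, which is the workload that early termination schemes try to avoid.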
Inter prediction exploits the motion data among frames. A block of pixels in a frame generally has a very close match with another block of the same size in a different frame. This match may be at the same location in the reference frame when there is no motion, but at a different location when motion is present. HEVC represents this prediction as a set of motion vectors which give the translational movement of the block from the reference frame to the current one. The residuals, which are the differences between the original block of pixels and the prediction block, are transformed and, together with the motion vectors, constitute the motion data.
During the inter prediction process, a single set of motion data may not optimally represent the prediction of the CU in terms of RD cost, as objects within this CU may have different motion vectors. CUs are therefore split into PUs to reflect more accurately the different motion vectors within the single block of pixels. In order to determine the best PU configuration, an intensive Motion Estimation (ME) process is undertaken. In theory, HEVC has to search through all the possible blocks in the search window, which is commonly known as the full search algorithm, so as to reach the best match in terms of minimum distortion and a low number of bits to represent the encoded region. Many fast ME algorithms optimize this search. Efficient search techniques [20], [25] have been implemented, as this process is the most time-consuming element of inter-picture prediction.
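The full search described above can be sketched as follows, under simplifying assumptions: the matching criterion is the Sum of Absolute Differences (SAD) only, whereas an actual encoder also accounts for the bits needed to code the motion vector, and candidates falling outside the reference frame are simply skipped. The function name `full_search` and the synthetic frame are illustrative only.

```python
import numpy as np


def full_search(cur_block, ref_frame, top, left, search_range):
    """Exhaustive block matching.

    cur_block is the block located at (top, left) in the current frame.
    Returns ((dy, dx), sad) for the candidate with the lowest SAD.
    """
    h, w = cur_block.shape
    cur = cur_block.astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cand = ref_frame[y:y + h, x:x + w].astype(np.int64)
            sad = int(np.abs(cur - cand).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad


# Synthetic example: the current 8x8 block at (8, 16) is an exact copy of
# the reference block at (10, 13), i.e. a displacement of (+2, -3).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = ref[10:18, 13:21]
mv, sad = full_search(cur, ref, top=8, left=16, search_range=4)
```

Even for this small search range of ±4, 81 candidate positions are evaluated for a single block, which indicates why ME dominates the inter prediction time when it is repeated for every PU configuration and reference frame.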
There are 8 possible PU configurations for each CU, consisting of one, two or four PUs. The different PU modes in inter prediction are illustrated in Fig. 2. The PUs are categorized as either symmetric or asymmetric. For the symmetric PUs, the CU is either not split, split into four PUs, or split into two identical PUs (horizontally or vertically). The asymmetric PUs, also known as Asymmetric Motion Partitioning (AMP), consist of 4 PU configurations in which the CU is split into 1/4 and 3/4 of the square region. This is especially useful when only a small part of the region shows a different motion vector. There are four AMP PU modes, namely 2N × nU, 2N × nD, nL × 2N and nR × 2N. The N × N PU configuration is only processed for the 8 × 8 CUs. It is to be noted that the smallest CUs do not check the AMP PU modes; they only adopt the 2N × 2N, N × 2N, 2N × N and N × N PU configurations [4]. Once the PU mode for each CU size has been identified in the quad tree, a bottom-up comparison is performed to determine whether the parent CU has a lower RD cost than the combined RD cost of its four children. If the parent CU, along with its appropriate PU mode, has the lower cost, it is retained. Otherwise, the four children, each with their independent PU configuration, are adopted.
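As an illustration of the PU configurations listed above, the following sketch enumerates the PU sizes (width, height) tested for a given CU size. The function name `pu_partitions` is hypothetical; the rule that N × N is tested only for the 8 × 8 CU and that the AMP modes are not tested for it follows the description in this section.

```python
def pu_partitions(cu_size, min_cu_size=8):
    """Map each inter PU mode to its list of (width, height) PUs."""
    s = cu_size        # 2N
    n = cu_size // 2   # N
    q = cu_size // 4   # the 1/4 share used by the AMP modes
    modes = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
    }
    if cu_size == min_cu_size:
        # Smallest CU: NxN is tested, AMP modes are not.
        modes["NxN"] = [(n, n)] * 4
    else:
        modes["2NxnU"] = [(s, q), (s, s - q)]
        modes["2NxnD"] = [(s, s - q), (s, q)]
        modes["nLx2N"] = [(q, s), (s - q, s)]
        modes["nRx2N"] = [(s - q, s), (q, s)]
    return modes


modes_32 = pu_partitions(32)
modes_8 = pu_partitions(8)
```

For a 32 × 32 CU this yields seven modes (three symmetric and four AMP), while the 8 × 8 CU yields the four symmetric modes including N × N. In every mode the PUs tile the CU exactly.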

IV. HEVC INTER PREDICTION COMPLEXITY
Although the new features introduced in HEVC inter prediction contribute to its high performance, they are also the cause of the increase in encoding time compared to AVC. This complexity is distributed among many stages throughout the prediction process. The following sub-sections provide an insight into the time-consuming elements of the inter prediction stage.

A. Inter Frames Encoding Time
Inter-frame prediction is the most time-consuming part of the HEVC encoder. The processing time of inter-predicted frames is in fact higher than that of the corresponding intra frames within the same sequence. A preliminary experiment is conducted by encoding 10 frames of each sequence, and the result is illustrated in Fig. 3. It clearly shows that inter-frame processing requires more than three times the processing time of intra frames, indicating the complex nature of the inter prediction process. This higher computing time is associated with the exhaustive Motion Estimation (ME) processing which is performed for each possible PU configuration at the different depths of the quadtree structure.

B. Time spent at different depth of the LCU
The amount of time spent within each CU size or depth is further analyzed by collecting data from the inter-predicted frames of a number of sequences, and the result is provided in Fig. 4. As opposed to intra prediction, where the complexity of the texture in the frame normally dictates the granularity of the CU structure, the movement of the objects in the frame relative to the background or to other objects determines the CU structure during inter prediction.
The ME operation maps the region of the CU being processed to those within the search window in order to determine the closest match in terms of the lowest distortion and bit rate. This intensive ME operation makes for a very time-consuming process, although some of these steps could be avoided by identifying the non-split CUs or specific PU modes at an early stage of the quad-tree traversal.

C. Frequency of PU modes
Data has also been collected from the inter-predicted frames of a number of sequences, and the frequency of occurrence of the different PU modes is provided in Fig. 5. Among the CUs which are not split, almost 90% adopt the 2N × 2N PU mode.
By carefully identifying those terminating CUs that will potentially result in the 2N × 2N PU mode, the complexity of the inter prediction process may be further reduced. In the following section, a method of reducing these unnecessary computations is proposed by analyzing the residuals formed following the 2N × 2N PU mode operation.

V. PROPOSED APPROACH
Using the lowdelay B main profile for HEVC, there are three B-frames for every I-frame in a GOP. As illustrated in Fig. 3, the encoding of inter-predicted frames consumes more than three times the amount of time needed for intra-predicted frames. Only the complexity of the B-frames has therefore been reduced in this approach, as they account for the larger portion of the encoding time. Moreover, since regions from the I-frames are the main references in the inter prediction process and are used for assessing the quality of the predicted CUs, the I-frames are encoded using the conventional HEVC process.
The final structure of the LCU is determined after traversing the quadtree formed by HEVC, with CUs of size 64 × 64 down to 8 × 8. The proposed complexity reduction approach targets only CUs of size 16 × 16 and larger. By terminating CUs of size at least 16 × 16, only a reduced number of 8 × 8 CUs will be left. These remaining CUs follow the conventional HEVC processing.
By making an informed decision at the beginning of the CU processing for inter prediction, unnecessary processing can be avoided, leading to a substantial decrease in the encoding time. For example, correctly identifying at the outset that a CU will be split prevents the unnecessary checking of other modes. Next, determining that a CU will not be further divided avoids the processing at higher depths. Within this very CU, the encoding time may be further reduced by recognizing that this terminating CU may be encoded as a single PU, thus avoiding the verification of other PU modes. These decisions, when applied collectively during the quad-tree traversal, can lead to enormous time savings for the encoder.

The residuals, which are the differences between the original block of pixels and the prediction block, obtained after performing the 2N × 2N inter prediction processing, are analyzed to determine the probable configuration of that CU. The residuals are grouped in blocks of 8 × 8 and the Mean Square (MS) of the residuals is computed for each of these blocks as follows:

$$MS = \frac{1}{h \times w} \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} r(i,j)^{2} \qquad (1)$$

where $h$ and $w$ represent the height and width of the block, $i$ and $j$ represent the coordinates of each pixel within the block, and $r(i,j)$ is the residual at that pixel.

The distribution of the three classifications (single-PU CUs, CUs with other PU modes and split CUs) can be approximated by three Normal curves, as shown in Fig. 7. The early CU termination threshold is set to the $MS_{Max}$ value (the largest MS value among the 8 × 8 blocks of the CU) for which 90% of the ECUs are captured. A preliminary analysis is performed on the first 10 frames of 4 sequences (one for each class) to determine this value. Table I shows that there is no apparent relationship between the threshold values and the QPs. The thresholds, however, are dependent on the sequences and vary accordingly. In this paper, the average value of 22 is adopted to terminate the CU (for QP = 22) and is denoted by $T_{ECU}$. The threshold value to terminate the CU as a 2N × 2N PU is taken as $\frac{1}{2} T_{ECU}$ and denoted by $T_{2N \times 2N}$. As shown in Fig. 6, most of the CUs with values below $T_{2N \times 2N}$ (11) actually terminate as 2N × 2N PUs.

The high $MS_{Max}$ values in Fig. 7 correspond to the split CUs. The split threshold is therefore set above the ECU threshold, i.e. at $1.25 \times T_{ECU}$. CUs above this threshold are not allowed to perform the various mode checks but are considered directly as split CUs.

It is also observed that the MS of the residuals increases with the QP value. A large QP (lower quality) normally results in higher residual values. By observing the MS values obtained with the same motion vectors and different QP values, relationships between the MS values at the different QPs are formed, including

$$MS_{37} = 10.0 \times MS_{22} \qquad (4)$$

Combining the relationships among the different QP values with the thresholds defined above for a QP value of 22, the different threshold values are set as shown in Table II.

As illustrated in Fig. 7, the region between the ECU threshold and the split threshold is not considered in our complexity reduction approach. The inter prediction modes of CUs falling in this area, where the curves intersect, are difficult to predict. The risk of an incorrect decision is quite high and may weigh significantly on the quality. For this reason, these CUs are left to the normal HEVC processing.
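The decision rule described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the handling of values exactly equal to a threshold is a guess, and scaling all three QP 22 thresholds by a single factor (10.0 for QP 37, taken from the QP relationship above) is assumed rather than read from Table II.

```python
import numpy as np

T_ECU_QP22 = 22.0                  # early CU termination threshold for QP 22
T_2NX2N_QP22 = 0.5 * T_ECU_QP22    # 11.0, terminate as a single 2Nx2N PU
T_SPLIT_QP22 = 1.25 * T_ECU_QP22   # 27.5, early split


def ms_max(residual, block=8):
    """Largest mean square over the 8x8 blocks of a 2Nx2N luma residual."""
    r = np.asarray(residual, dtype=np.float64)
    h, w = r.shape
    # Reshape to (rows of blocks, 8, columns of blocks, 8) and average the
    # squared residuals inside each 8x8 block.
    blocks = r.reshape(h // block, block, w // block, block)
    ms = (blocks ** 2).mean(axis=(1, 3))
    return float(ms.max())


def classify_cu(residual, scale=1.0):
    """Return the early decision for a CU of size 16x16 or larger.

    scale adapts the QP 22 thresholds to another QP (assumption: one
    common factor for all three thresholds, e.g. 10.0 for QP 37).
    """
    m = ms_max(residual)
    if m > T_SPLIT_QP22 * scale:
        return "split"            # skip the other modes, split directly
    if m < T_2NX2N_QP22 * scale:
        return "terminate_2Nx2N"  # stop splitting and skip other PU modes
    if m < T_ECU_QP22 * scale:
        return "terminate_cu"     # stop splitting, still test other PU modes
    return "normal"               # uncertain region: conventional processing


flat = np.full((16, 16), 3.0)      # MS = 9 in every 8x8 block
busy = np.zeros((16, 16))
busy[8:, 8:] = 6.0                 # one 8x8 block with MS = 36
decisions = (classify_cu(flat), classify_cu(busy), classify_cu(busy, scale=10.0))
```

Using the maximum rather than the average of the block MS values means that a single poorly predicted 8 × 8 block is enough to prevent early termination of an otherwise well predicted CU.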
The overall approach is summarized in Fig. 8, where the highlighted (gray) decision boxes indicate the modifications made to the conventional HEVC processing for inter prediction.

VI. EXPERIMENTS, RESULTS AND DISCUSSION
HEVC Test Model reference software version 10 (HM10) has been used to implement the proposed early termination schemes for complexity reduction in inter prediction, using QP values of 22, 27, 32 and 37. Thirteen standard sequences, ranging from class B to class E as defined in [3], have been used so as to cover a broad range of resolutions. The experiments were performed on 50 frames of each sequence using the lowdelay B main encoding configuration with an IBBB Group of Pictures (GOP) structure, where optimization was carried out on the B-frames only.
The performance of the proposed methods, compared to HM10, is reported in terms of the change in average bit rate, peak signal-to-noise ratio (PSNR), Total Encoding Time (TET) and Motion Estimation Time (MET), based on the following formula:

$$\Delta TET\,(\%) = \frac{TET_{proposed} - TET_{HM}}{TET_{HM}} \times 100$$

TET is the processing time for both the I-frames and the B-frames, while MET represents only the encoding time of the B-frames for the sequences under consideration. The Bjøntegaard metrics [2] are used to compute the rate-distortion performance in terms of BD-Rate and BD-PSNR.
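As a small worked example of the formula above, the following sketch computes ΔTET from two timings. The timings are hypothetical and chosen only so that the result matches the order of magnitude reported in this paper; they are not measurements. With the formula as written, a time saving appears as a negative value, and its magnitude is what is quoted as the reduction.

```python
def delta_tet(tet_proposed, tet_hm):
    """Percentage change in total encoding time relative to HM."""
    return (tet_proposed - tet_hm) / tet_hm * 100.0


# Hypothetical timings in seconds (illustrative only).
change = delta_tet(tet_proposed=37.8, tet_hm=100.0)
reduction = -change
```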

A. Complexity Reduction using Early CU termination
The results of the experiments conducted for the early CU termination are presented in Table III. By stopping the splitting process using the early CU termination threshold, $T_{ECU}$, an average complexity reduction of 41.9% (47.1% for ME) is achieved with a BD-Rate of 0.44%. This reduction is accompanied by a drop of 0.015 dB in PSNR along with a decrease of 0.05 in bitrate. The time savings range from 23.3% for the PartyScene sequence to 63.0% for the Vidyo1 sequence.
Class E sequences show the highest reduction in encoding time, as they contain large stationary regions leading to large CUs being formed when encoded. Compared to the PartyScene sequence (lowest performance), the Vidyo1 sequence has more CUs of larger sizes. For example, during the first 10 HEVC encoded frames (7 inter frames), 54% of the 64 × 64 CUs are terminated in Vidyo1 as opposed to only 30% for PartyScene. However, these high-performance sequences from Class E also display high BD-Rates, since the fixed thresholds used are slightly too high for these sequences. CUs containing small regions of motion are incorrectly terminated in the proposed approach, leading to a relatively higher drop in quality.
It is also noted that the high complexity reduction of 42.7% achieved for the BQTerrace sequence yields a BD-Rate of only 0.09%.

B. Complexity Reduction using Early CU and Early PU Termination
Table IV shows the result of applying the early CU and the early PU thresholds simultaneously. An average complexity reduction of 48.1% (54.4% for ME) is achieved with a BD-Rate of 0.55%. The drop in PSNR is only 0.023 dB, while a decrease in the bit rate is also observed. In fact, almost half of the CUs terminated early are identified as 2N × 2N PUs. For these CUs, the processing of the other modes is avoided, leading to the additional time savings. The performance of each sequence follows a trend similar to that observed with the early CU threshold.

C. Complexity Reduction using Early CU and Early PU Termination along with Early Splitting
In addition to the early termination of CUs and PU modes, CUs with high MS values at the level of the 2N × 2N mode processing are identified as split CUs. The processing associated with the other modes is therefore avoided and the splitting is performed directly. The results of combining all three techniques are provided in Table V. The reduction in TET ranges from 50.1% for the BlowingBubbles sequence to 76.0% for the Vidyo1 sequence. An average overall complexity reduction of 62.2% is thus achieved, with the MET reduced by 70.8% on average, along with a BD-Rate of only 1.14%. The best and worst performances in terms of BD-Rate during the experiments are illustrated in Fig. 9 for the BQTerrace sequence and in Fig. 10 for the Kimono sequence. The BQTerrace sequence shows practically no deviation from the standard HEVC encoder with the proposed approach, while the Kimono sequence (worst case) indicates only a very slight deviation. This comparison further confirms the high time savings produced by the proposed approach with negligible deterioration in quality.

D. Comparison with related works
A number of works on complexity reduction for HEVC inter prediction have already been published. Comparison with existing works has been limited to those based on the lowdelay main profile of HEVC. In addition, since different degrees of reduction with varying BD-Rates are achieved, the performance indicator ratio, BD-Rate/∆TET, proposed in [8] is used for comparison purposes in Table VI. The comparison is grouped into 3 categories, with complexity reductions of around 40%, 50% and 60%. In each category, the proposed approach outperforms the other works in terms of the BD-Rate/∆TET ratio, confirming its effectiveness. For the 40% category, the BD-Rate of 0.44% achieved by the proposed ECU termination approach is well below those of comparable works, leading to a BD-Rate/∆TET value of only 1.05. The proposed ECU + EPU termination approach in the 50% category results in 48.7% time savings. Compared to other works, the BD-Rate is lower, along with a BD-Rate/∆TET ratio of only 1.27. The proposed approach in the high reduction category (60%) also surpasses other works in terms of the BD-Rate/∆TET ratio, with an overall complexity reduction of 62.2%.
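The indicator for the ECU-only configuration can be checked with a short calculation. The sketch assumes that the ratio is expressed as BD-Rate divided by the time saving, scaled by 100; this scaling is inferred from the reported value of 1.05 and is not stated explicitly in the text. The function name is hypothetical.

```python
def bd_rate_per_time_saving(bd_rate_pct, time_saving_pct):
    """BD-Rate / delta-TET indicator, scaled by 100 (assumed scaling)."""
    return bd_rate_pct / time_saving_pct * 100.0


# ECU-only configuration: BD-Rate 0.44%, average time saving 41.9%.
ecu_ratio = bd_rate_per_time_saving(0.44, 41.9)
```

A lower ratio means that less coding efficiency is given up per unit of encoding time saved, which is why it is convenient for comparing methods that reach different levels of complexity reduction.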

VII. CONCLUSION
In this paper, a new approach is proposed to enhance the complexity reduction in the HEVC inter prediction process for the lowdelay Main profile. The Mean Square (MS) of each 8 × 8 block of the residuals following the 2N × 2N inter prediction processing is computed. The highest MS value is compared with the determined thresholds for early termination of the CU and PU processing, along with the early identification of the splitting decision.
When this technique is applied to CUs of size 16 × 16, 32 × 32 and 64 × 64, an average overall complexity reduction of 62.2% (70.8% for ME) is achieved at a BD-Rate of only 1.14%.
The proposed approach achieves a higher encoding time reduction than the state-of-the-art algorithms while maintaining a good average bitrate performance. In the proposed work, a set of static thresholds is used. The good performance of the proposed work in predicting the termination of CU and PU processing is reflected in the low BD-Rate/∆TET ratios. However, since the proposed approach is based purely on static thresholds, it produces relatively higher BD-Rates when the actual sequence thresholds should have been slightly higher or lower. Therefore, the proposed work can still be enhanced by including in the thresholds some characteristics intrinsic to each sequence.

Fig. 3. Inter frames processing time relative to corresponding intra frames

Fig. 4. Time spent within each CU size during inter-prediction

Fig. 5. Occurrences of PU modes for different CU sizes

The CUs under consideration are thus classified as:
1) Single PU mode (2N × 2N)
2) Other PU modes (two PUs)
3) Splitting of the CU into four children

In this paper, CUs are classified into the different categories based on the luma residuals. Once the best 2N × 2N PU motion data is obtained following the merge and SKIP operations, the luma residuals, which are also of size 2N × 2N, are further analyzed.

Fig. 6. a) Frequency of occurrence of ECU, Other PU modes and Split CUs as a function of $MS_{Max}$ for the BasketballPass sequence with QP 22; b) Magnified version for $MS_{Max}$ values with low frequencies of occurrence

The largest MS value, termed $MS_{Max}$, is used as the coefficient for the threshold values. The first 10 frames of a sample of sequences are studied to find the relationship between the early CU size determination and $MS_{Max}$. Fig. 6 illustrates the frequency of occurrence of the Early CUs (ECUs), other PU modes and split CUs as a function of $MS_{Max}$ for 16 × 16 CUs of the BasketballPass sequence encoded with a QP value of 22. The ECUs consist of the 2N × 2N CUs (1 PU) as well as those CUs with 2 PUs (N × 2N, 2N × N, 2N × nU, 2N × nD, nL × 2N and nR × 2N). It is noted that the majority of the non-split CUs (ECUs) are concentrated at the lower $MS_{Max}$ values, while the split CUs occur at the higher $MS_{Max}$ values.

Fig. 7. Using Normal curves to approximate the occurrences of Early CU, Other PU modes and split CUs

Fig. 8. Illustrating early PU mode decisions based on residuals obtained following the 2N × 2N inter-prediction operation

Fig. 9. Performance of BQTerrace sequence for complexity reduction using Early CU, Early PU and Early Splitting

In Fig. 4, it is noticed that the 64 × 64 CU size shows the smallest percentage of time spent, i.e., 18%. This largest CU size seldom tests all the possible PU configurations, and therefore the encoder spends a smaller proportion of time at this depth compared to the others. Processing at each of the other CU depths takes, on average, slightly more than 25% of the total inter prediction time spent for the whole LCU. Avoiding the processing of higher depths during inter prediction can therefore bring around 25% time savings for each depth at the level of the CU.

TABLE I. 90% THRESHOLD VALUES FOR ECU BASED ON FIRST 10 FRAMES WITH QP=22
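The 90% capture rule behind these threshold values can be sketched as a percentile computation. This is an assumed reading of the rule: the function name `ecu_threshold` is hypothetical, the sample values are synthetic, and the linear interpolation used by `numpy.percentile` by default is one possible choice among several.

```python
import numpy as np


def ecu_threshold(ms_max_of_ecus, coverage=90.0):
    """MS_Max value below which the given percentage of ECUs falls."""
    values = np.asarray(ms_max_of_ecus, dtype=np.float64)
    return float(np.percentile(values, coverage))


# Synthetic MS_Max values 1..100, standing in for measurements collected
# from the non-split CUs of the first frames of a sequence.
samples = np.arange(1, 101)
threshold = ecu_threshold(samples)
captured = float((samples <= threshold).mean())
```

In practice the input would be the $MS_{Max}$ values of the CUs that the unmodified encoder chose not to split, collected per sequence before averaging across sequences.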

TABLE III. RESULTS OF COMPLEXITY REDUCTION USING EARLY CU ONLY

TABLE IV. RESULTS OF COMPLEXITY REDUCTION USING EARLY CU AND EARLY PU

TABLE V. RESULTS OF COMPLEXITY REDUCTION USING EARLY CU, EARLY PU AND EARLY SPLITTING

TABLE VI. COMPARISON WITH RELATED WORKS