Content based Video Retrieval Systems Performance based on Multiple Features and Multiple Frames using SVM

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 7, No. 8, 2016

Abstract-In this paper, the performance of Content Based Video Retrieval (CBVR) systems is analysed and compared for three types of feature vectors. The features are generated using three different algorithms: Block Truncation Coding (BTC) extended for colors, Kekre's Fast Codebook Generation (KFCG) algorithm and Gabor filters. The feature vectors are extracted from multiple frames instead of only key frames or all frames of the videos. The performance of each type of feature is analysed by comparing the results obtained with two techniques: Euclidean distance and Support Vector Machine (SVM). Although a significant number of researchers have expressed dissatisfaction with using an image as a query for video retrieval systems, the techniques and features used here provide improved retrieval results while using images from the videos. In addition to the higher efficiency, complexity is reduced because key frames need not be found for all the shots. The system is evaluated using a database of 1000 videos in 20 categories. The performance achieved using BTC features calculated from color components is compared with that achieved using Gabor features and KFCG features. For each feature type, the performance of the system using SVM is also compared with that of the system without SVM.

Keywords-CBVR; KFCG; Multiple Frames; SVM; BTC; Gabor filter

I. LITERATURE REVIEW AND RELATED WORK
Researchers have developed a number of techniques, methods and systems in the field of content based video retrieval. Such systems are required to search, index and retrieve videos from databases effectively, but reliable and effective systems for huge databases are still awaited [6]. For this reason, text based search is still in practice for video retrieval [5]. A content based retrieval system was developed for commercial use [15]; it used a face detection method for image and video searches. However, this method also proved to perform very poorly [8] among the automatic systems that participated in the video retrieval track [16]. Hope emerged when low level features were utilized: comparing low level features extracted from key frames of the query with those of the videos in the database provides better results for video retrieval systems [6]. Other useful and more important information from videos could improve the performance of video retrieval systems considerably, but researchers still face the challenge of utilizing information such as the sequence of shots and temporal and motion information [5]. To compensate for this problem and to obtain better retrieval performance, a video retrieval system [2] utilized all frames of a shot instead of only the key frames, so that more visual features are extracted. Another system [12] integrated color and motion features for better utilization of spatiotemporal information. It remains true, however, that an efficient image retrieval technique results in an efficient video retrieval technique when an image from the query video is used as the query [8]. The system proposed here utilizes visual features from multiple frames instead of a single frame, the key frames of a shot or all of its frames. It addresses the problems mentioned above, namely lower efficiency when only a single image is used, high computational cost when key frames are used and the unavailability of proper tools for clustering algorithms. The system provides reasonable efficiency along with low computational cost.
Section II discusses the feature extraction algorithms and classification; section III discusses the similarity measure; section IV presents the method used to calculate the result parameters of the proposed CBVR system; and section V elaborates the proposed CBVR system. Result analysis is presented in section VI, the problems and challenges faced by the CBVR system are discussed in section VII, and section VIII concludes the paper.

II. FEATURES EXTRACTION AND CLASSIFICATION
Color, texture and motion features are the most useful features for classification and retrieval of videos. The color histogram is useful for representing color content, while Gabor features are a popular way to represent texture [4].

A. Extraction of BTC Features
Block truncation coding (BTC) is basically a compression technique for images [14]. BTC features are calculated for small blocks formed by dividing an image, instead of for each pixel [17], [18]. BTC is used to obtain features from the color information of the pixels belonging to these blocks. BTC features from multiple frames are employed to obtain very high precision and recall values. These features can also be used for image classification and retrieval. The BTC technique can be extended to RGB images by considering each color component (red, green and blue) as a separate plane [14]. BTC features are obtained as shown in equations (1)-(5).

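• An inter band average image (IBAI) is formed as shown in (1).

$$IBAI(i,j) = \frac{R(i,j) + G(i,j) + B(i,j)}{3} \tag{1}$$

• Threshold values for the three color components are calculated as shown in (2) for one of the components (red).

$$T_R = \frac{1}{X \cdot Y} \sum_{i=1}^{X} \sum_{j=1}^{Y} R(i,j) \tag{2}$$

• Binary bitmaps are created for each of the three components as shown in (3) for the red component.

$$BM_R(i,j) = \begin{cases} 1, & R(i,j) \ge T_R \\ 0, & R(i,j) < T_R \end{cases} \tag{3}$$

• $m_1$ and $m_2$ are the mean values found for the three components as shown in (4) and (5) for the red component.

$$m_{R1} = \frac{\sum_{i=1}^{X} \sum_{j=1}^{Y} BM_R(i,j)\, R(i,j)}{\sum_{i=1}^{X} \sum_{j=1}^{Y} BM_R(i,j)} \tag{4}$$

$$m_{R2} = \frac{\sum_{i=1}^{X} \sum_{j=1}^{Y} \bigl(1 - BM_R(i,j)\bigr)\, R(i,j)}{X \cdot Y - \sum_{i=1}^{X} \sum_{j=1}^{Y} BM_R(i,j)} \tag{5}$$

where $R(i,j)$, $G(i,j)$ and $B(i,j)$ are the color components of pixel $(i,j)$ in a block of $X \times Y$ pixels, $m_1 = \{m_{R1}, m_{G1}, m_{B1}\}$ and $m_2 = \{m_{R2}, m_{G2}, m_{B2}\}$. $m_1$ and $m_2$ represent the entire block. The mean values of all the blocks considered together represent the entire image.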
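The per-block computation described above (a threshold per color component, a binary bitmap, and two means per component) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `btc_color_features` is hypothetical, treating $m_1$ as the mean of the pixels at or above the threshold and $m_2$ as the mean of the remaining pixels is an assumption, and so is the handling of a flat block.

```python
from statistics import mean

def btc_color_features(block):
    """Return (m1, m2) for one block given as rows of (R, G, B) pixels.

    m1 holds the per-channel mean of pixels at or above the channel
    threshold, m2 the per-channel mean of pixels below it (assumed order).
    """
    m1, m2 = [], []
    for channel in range(3):
        values = [pixel[channel] for row in block for pixel in row]
        threshold = mean(values)                        # per-channel threshold
        above = [v for v in values if v >= threshold]   # bitmap value 1
        below = [v for v in values if v < threshold]    # bitmap value 0
        m1.append(mean(above))
        # Assumption: a flat block has no pixel below its threshold,
        # so the threshold itself is reused as the lower mean.
        m2.append(mean(below) if below else threshold)
    return m1, m2

# A 2 x 2 block with made-up pixel values.
block = [
    [(10, 0, 200), (30, 0, 200)],
    [(50, 0, 100), (70, 0, 100)],
]
m1, m2 = btc_color_features(block)
print(m1, m2)
```

Concatenating the `(m1, m2)` pairs of all blocks of a frame would give one possible frame-level feature vector; how the paper assembles the blocks is not stated, so that step is left out.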

B. Extraction of Gabor Features
Gabor features provide a good representation of the edge and texture features of objects and text, and help to distinguish them effectively from the background [7]. Gabor filters can extract features from the edges or regions of different objects inside an image along desired orientations and at different frequencies [22]. The method used to extract Gabor features is shown in Fig. 1, while the mathematical expressions are given in equations (6) to (11) [20].

$$G_{uv}(x,y) = \sum_{s} \sum_{t} I(x-s,\, y-t)\, \psi_{uv}^{*}(s,t) \tag{6}$$

where $\psi_{uv}^{*}$ is the complex conjugate of $\psi_{uv}$, and $\psi_{uv}$ is generated by some morphological operations on the mother wavelet. $p \times q$ is the size of the filter mask, over which $s$ and $t$ range, and $u$ and $v$ are the scale and orientation.

Gabor filters are applied to the image with different orientations and different scales to find a set of magnitudes $E(u,v)$ containing the energy distribution in the image in different orientations and scales, as shown in (7).

$$E(u,v) = \sum_{x} \sum_{y} \left| G_{uv}(x,y) \right| \tag{7}$$

To obtain texture features, the standard deviation $\sigma_{uv}$ and the mean $\mu_{uv}$ are required; they are calculated as shown in equations (8) and (9) respectively, for an image of $X \times Y$ pixels.

$$\sigma_{uv} = \sqrt{\frac{\sum_{x} \sum_{y} \bigl( \left| G_{uv}(x,y) \right| - \mu_{uv} \bigr)^2}{X \cdot Y}} \tag{8}$$

$$\mu_{uv} = \frac{E(u,v)}{X \cdot Y} \tag{9}$$

The texture feature vector $F$ is formed by a set of feature components [25], [26], i.e., the values of $\mu_{uv}$ and $\sigma_{uv}$ calculated by varying $u$ and $v$, as shown in equation (10).

$$F = (\mu_{00}, \sigma_{00}, \mu_{01}, \sigma_{01}, \ldots) \tag{10}$$
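The extraction of mean and standard-deviation features over several scales and orientations can be illustrated with a small standard-library sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the kernel formula (a Gaussian envelope multiplied by a complex sinusoid), the frequency per scale, the envelope width, the mask size, the use of the population standard deviation, and the function names. The image is assumed to be a grayscale grid at least as large as the filter mask.

```python
import cmath
import math

def gabor_kernel(size, frequency, theta, sigma):
    """Complex Gabor kernel on a size x size mask (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            rotated = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            row.append(envelope * cmath.exp(2j * math.pi * frequency * rotated))
        kernel.append(row)
    return kernel

def filter_magnitudes(image, kernel):
    """Response magnitude at every position where the mask fits the image."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    magnitudes = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            response = sum(image[r + i][c + j] * kernel[i][j].conjugate()
                           for i in range(k) for j in range(k))
            magnitudes.append(abs(response))
    return magnitudes

def gabor_features(image, scales=2, orientations=4, size=5):
    """Feature vector of (mean, standard deviation) per scale and orientation."""
    features = []
    for u in range(scales):
        frequency = 0.25 / (2 ** u)           # assumed frequency for scale u
        for v in range(orientations):
            theta = v * math.pi / orientations
            kernel = gabor_kernel(size, frequency, theta, sigma=2.0)
            mags = filter_magnitudes(image, kernel)
            mu = sum(mags) / len(mags)
            sd = math.sqrt(sum((m - mu) ** 2 for m in mags) / len(mags))
            features.extend([mu, sd])
    return features

# An 8 x 8 test image with vertical stripes (made-up data).
stripes = [[255.0 if (c // 2) % 2 else 0.0 for c in range(8)] for _ in range(8)]
print(len(gabor_features(stripes)))
```

The magnitudes are only computed where the mask lies fully inside the image; padding the borders would be an equally reasonable choice and changes the values slightly.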

C. Extraction of KFCG Features
In vector quantization, compression is achieved by using a few bits to represent the closest codeword for each of the small blocks formed by dividing an image [27]. Linde-Buzo-Gray (LBG) is the most commonly used algorithm for generating a codebook [28]. In the LBG algorithm, the vectors found in the blocks are training vectors, which are separated to form different clusters. The clusters are divided again and again by iteration. The codebook vectors are the centroids of these clusters [29]. A training vector is represented by the codebook vector closest to it [30]. The codebook vectors are represented by a set of codewords, which are used to encode and decode the images [31]. Kekre's Fast Codebook Generation (KFCG) algorithm is basically used for image compression [32], [23]. It requires less time to generate the codebook through vector quantization. The generated codebook is used in the proposed system as a feature vector for video retrieval [20].
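The text above does not list the steps of KFCG, so the following sketch assumes the procedure as it is commonly described: all training vectors start in one cluster, and in each pass every cluster is split in two by comparing a single vector element with the same element of the cluster centroid, moving to the next element in the following pass. The function names and the stopping rule are hypothetical, and `codebook_size` is assumed to be a power of two.

```python
def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def kfcg_codebook(training_vectors, codebook_size):
    """Build a codebook by repeated single-element splits of the clusters."""
    dim = len(training_vectors[0])
    clusters = [list(training_vectors)]
    component = 0   # vector element compared in the current pass
    stalled = 0     # consecutive passes in which no cluster was split
    while len(clusters) < codebook_size and stalled < dim:
        next_clusters = []
        for cluster in clusters:
            pivot = centroid(cluster)[component]
            low = [v for v in cluster if v[component] < pivot]
            high = [v for v in cluster if v[component] >= pivot]
            # A cluster that cannot be split on this element is kept whole.
            next_clusters.extend(part for part in (low, high) if part)
        stalled = stalled + 1 if len(next_clusters) == len(clusters) else 0
        clusters = next_clusters
        component = (component + 1) % dim
    return [centroid(cluster) for cluster in clusters]

# Four made-up two-element training vectors (e.g. flattened image blocks).
vectors = [[0, 0], [0, 10], [10, 0], [10, 10]]
print(kfcg_codebook(vectors, 4))
```

Because each split needs only one comparison per training vector instead of a full distance computation, the sketch reflects why the method is described as fast relative to LBG.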

D. Classification of Features using Support Vector Machine
Support Vector Machine (SVM) improves the performance of content based image retrieval (CBIR) significantly [11], which is the inspiration for using SVM for CBVR too. SVM can utilize the features representing a video in the same way as it does for CBIR. Here, the feature vector can consist of features extracted from frames, shots, scenes or events. Features from known categories of videos are labelled to train the SVM. Similar features extracted from other videos are then used by the SVM to classify those videos. The use of SVM is a milestone in the automatic classification of videos [19] with better efficiency.
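The train-then-classify flow described above can be illustrated with a small sketch. The paper does not state which SVM formulation, kernel or library was used, so the choices below are assumptions: a linear SVM trained by subgradient descent on the regularised hinge loss, extended to several categories with a one-vs-rest scheme. All function names, parameter values and feature values are hypothetical.

```python
def decision(model, x):
    """Score w.x + b of feature vector x under a (weights, bias) model."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_linear_svm(samples, labels, lam=0.01, rate=0.1, epochs=200):
    """Binary linear SVM (labels +1 / -1) trained by subgradient descent
    on the L2-regularised hinge loss. Returns (weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * decision((weights, bias), x)
            # Regulariser: shrink the weights slightly at every step.
            weights = [w * (1.0 - rate * lam) for w in weights]
            if margin < 1.0:  # hinge loss active: move towards the sample
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
    return weights, bias

def train_one_vs_rest(samples, categories):
    """One binary SVM per category (that category against all others)."""
    models = {}
    for category in sorted(set(categories)):
        labels = [1 if c == category else -1 for c in categories]
        models[category] = train_linear_svm(samples, labels)
    return models

def classify(models, x):
    """Category whose SVM gives the highest score for x."""
    return max(models, key=lambda category: decision(models[category], x))

# Labelled feature vectors with made-up values for two categories.
samples = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
categories = ["sports", "sports", "news", "news"]
models = train_one_vs_rest(samples, categories)
print(classify(models, [2.5, 2.5]))
```

The stored `models` dictionary plays the role of the saved SVM variables: it is computed once from the labelled database and reused for every query.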

III. SIMILARITY MEASURE
Features extracted from the images provide the most convenient method for similarity measurement [1]. The query video is retrieved by finding the similarity between its feature vector [9], [10] and the feature vectors of the videos stored in the database. Video similarity is measured at different resolutions [13], so the selection of features becomes relevant for calculating similarity. Similar videos can also be obtained by using SVM: the videos classified by the SVM into one category show greater similarity among themselves. The most similar video can be obtained by finding the Euclidean distance between the query video and the videos classified into that category. Again, the feature vector is used to calculate the Euclidean distance.
The equation for the Euclidean distance between a query frame q and a database frame d is shown in (12).

$$ED(q,d) = \sqrt{\sum_{n=1}^{N} \left( V_{dn} - V_{qn} \right)^2} \tag{12}$$

where $V_{dn}$ and $V_{qn}$ are the components of the feature vectors of database frame d and query frame q respectively, each vector having size N [20].
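The distance computation and the ranking it supports can be sketched as follows. The function names and the dictionary layout of the database (video identifier mapped to feature vector) are assumptions made for illustration.

```python
import math

def euclidean_distance(v_q, v_d):
    """Euclidean distance between two feature vectors of equal size N."""
    if len(v_q) != len(v_d):
        raise ValueError("feature vectors must have the same size")
    return math.sqrt(sum((d - q) ** 2 for q, d in zip(v_q, v_d)))

def rank_by_distance(query, database):
    """Video identifiers ordered from most to least similar to the query."""
    return sorted(database, key=lambda vid: euclidean_distance(query, database[vid]))

# Made-up feature vectors for three database videos.
database = {"a": [3.0, 4.0], "b": [1.0, 1.0], "c": [10.0, 0.0]}
print(rank_by_distance([0.0, 0.0], database))
```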

IV. RESULT EVALUATION METHOD
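The performance of video retrieval is evaluated with the same parameters as image retrieval [11]. Recall and precision are the two parameters [2], as given in (13) and (14).

$$\text{Precision} = \frac{\text{number of relevant videos retrieved}}{\text{total number of videos retrieved}}$$

$$\text{Recall} = \frac{\text{number of relevant videos retrieved}}{\text{total number of relevant videos in the database}}$$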
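As a small worked example of the two parameters, the sketch below uses the standard definitions: precision is the fraction of retrieved videos that are relevant, and recall is the fraction of relevant videos that are retrieved. The function name and the video identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for lists of video identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant videos retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four videos retrieved, two of them among the five relevant ones.
retrieved = ["a", "b", "c", "d"]
relevant = ["a", "b", "e", "f", "g"]
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)
```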

V. PROPOSED CBVR SYSTEM
A CBVR system is proposed in this paper in which multiple frames are obtained for the query videos and the video database, instead of a single frame, key frames or all frames [2]. BTC, Gabor and KFCG features are obtained as described in the feature extraction section. The similar and most relevant videos are obtained from the output directory containing videos of that category. Significantly higher retrieval results have been obtained using this system.

A typical methodology is used in this system, where a video is retrieved from its category. The database is processed offline. The videos are represented by feature vectors formed from any one, or a combination of more than one, of the three types of features extracted from their multiple frames. The feature vectors are then labelled and stored in the features database. An SVM is trained for the categories registered in the system using the labelled feature vectors stored in the database, and variables are obtained from the trained SVM. Feature vectors from the query videos are classified using the saved SVM variables. The videos obtained in the output folder are the videos of the desired category. For a query clip, the videos stored in the given category can be ranked according to the distance measure, and the most similar videos are retrieved. Euclidean distance is used to measure similarity [20].

The retrieval system without SVM is shown in Fig. 2 [20]. The most similar videos are obtained based on the minimum distance between the feature vectors stored in the database and the feature vectors of the query image. As mentioned above, classification and retrieval based on multiple frames yield acceptable results without the complexity of finding key frames to represent a shot.

The process flow of the CBVR system using SVM is shown in Fig. 3. Multiple frames are obtained during segmentation. Features are then extracted for each of those frames and stored in the feature vector database. Feature vectors are labelled for the pre-decided categories.
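The two-stage flow described in this section (classify the query into a category, then rank the videos of that category by distance) can be sketched as follows. The paper does not say how the multiple frames are chosen, so evenly spaced sampling is an assumption, as are the function names, the tuple layout of the feature database, and the stand-in classifier used in the example.

```python
import math

def sample_frame_indices(total_frames, count):
    """Indices of `count` evenly spaced frames (an assumed sampling rule)."""
    if count >= total_frames:
        return list(range(total_frames))
    step = total_frames / count
    return [int(step * i + step / 2) for i in range(count)]

def retrieve(query_vector, classify, feature_db, top_k=5):
    """Classify the query, then rank that category's videos by distance.

    feature_db is a list of (video_id, category, feature_vector) tuples;
    classify is any callable that maps a feature vector to a category.
    """
    category = classify(query_vector)
    candidates = [(vid, vec) for vid, cat, vec in feature_db if cat == category]
    candidates.sort(key=lambda item: math.dist(query_vector, item[1]))
    return [vid for vid, _ in candidates[:top_k]]

# Made-up database entries and a stand-in for the trained classifier.
feature_db = [
    ("v1", "sports", [1.0, 1.0]),
    ("v2", "sports", [5.0, 5.0]),
    ("v3", "news", [-1.0, -1.0]),
]
toy_classifier = lambda vec: "sports" if vec[0] > 0 else "news"

print(sample_frame_indices(100, 5))   # [10, 30, 50, 70, 90]
print(retrieve([2.0, 2.0], toy_classifier, feature_db, top_k=2))   # ['v1', 'v2']
```

Restricting the distance ranking to one category is what keeps the online cost low: only the videos of the predicted category are compared with the query.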

A. Database
The technique using multiple frames with one or multiple features and SVM is applied to a video database of 1000 videos in 20 categories of 50 videos each, as shown in Fig. 4. Videos similar to the query video are stored in the output folder after classification by the SVM classifier. The precision and recall values are computed by grouping the classified videos that belong to the category of the query video and then finding the minimum distance between them and the query video.

B. Analysis of Results
The charts in Fig. 5 to Fig. 10 show, for the different features, the results obtained for retrieval and classification of video clips from different categories. These categories are among the 20 categories of video clips in the database of 1000 videos. The results obtained are good for all the categories. The results are obtained using SVM based on Gabor features extracted from multiple frames of the video clips. Similarly, results are also obtained using the block truncation coding method extended for color images [24] and the KFCG algorithm. The charts compare the performance of the system using SVM with that of the system based on the same features without SVM [21]. The performance of the systems is also compared across the three feature types, both with and without SVM.

1) Results for video clips using Gabor features
Fig. 5 shows the precision values obtained by the CBVR system based on Gabor features extracted from multiple frames using SVM. There is a significant improvement in the results with SVM compared with the results without SVM, except for one case.

Fig. 5. Comparison of Precision values shown for given categories of videos using SVM and without using SVM using Gabor features

Fig. 6 shows the recall values obtained by the CBVR system based on Gabor features extracted from multiple frames using SVM. There is a significant improvement in the results with SVM compared with the results without SVM, except for one case.

Fig. 6. Comparison of Recall values shown for given categories of videos using SVM and without using SVM using Gabor features

2) Results for video clips using KFCG features
Fig. 7 shows the precision values obtained by the CBVR system based on KFCG features extracted from multiple frames using SVM. There is a significant improvement in the results with SVM compared with the results without SVM, except for one case. Fig. 8 shows the recall values obtained by the CBVR system based on KFCG features extracted from multiple frames using SVM. There is a significant improvement in the results with SVM compared with the results without SVM, except for one case.

3) Results for video clips using BTC features
Fig. 9 shows the precision values obtained by the CBVR system based on BTC features extracted from multiple frames using SVM. There is a significant improvement in the results with SVM compared with the results without SVM. Fig. 10 shows the recall values obtained by the CBVR system based on BTC features extracted from multiple frames using SVM. There is a significant improvement in the results with SVM compared with the results without SVM.

VII. PROBLEMS AND CHALLENGES
Low level features representing the frames are used in CBVR systems that are queried by image or by clip, such as the proposed system. These low level features extracted from frames are used to measure the similarity between videos. As a result, videos containing distinct objects but similar backgrounds may produce false retrievals. For example, videos showing players playing football may be retrieved along with videos showing players playing cricket because of the similar background of the field, or a video showing a person delivering a speech may be retrieved with videos showing a different person delivering a speech against a similar stage background. Low level features are also utilised for content based image retrieval when the query is an example image. The performance of such systems when searching videos is quite acceptable when the videos have non-identical features, but it is very poor when the low level features of different videos are identical. This is because such systems are unable to utilise semantic features; e.g., videos showing different electronic equipment belong to the same semantic category but have distinct low level features.

VIII. CONCLUSION
The proposed system shows better classification and enhanced video retrieval results. The higher efficiency is made possible by the utilization of distinct features representing the distribution of color information (BTC method), the inclination of edges in multiple directions (Gabor algorithm) and codewords representing blocks (KFCG algorithm). Although the results are good for all three types of features, the precision and recall values are much higher for BTC and KFCG features than for Gabor features. The performance is boosted further by the use of features from multiple frames instead of a single key frame representing a shot. Additional improvement is achieved by the use of SVM, which makes the system highly efficient.

IX. FUTURE SCOPE
The computational cost of the proposed system is lower, as there is no requirement to find the key frames for each shot. Although classification and retrieval of videos have been enhanced, attention is required to eliminate the drawback of false results when videos have similar backgrounds. Another direction for future research is the recognition and grouping of videos that belong to the same category but have different low level features.

Fig. 3. Proposed CBVR system

The SVM is trained and its variables are stored; this process is done offline. The query videos are separated into categories based on the stored SVM variables, using the feature vectors of the query videos. Videos obtained for different categories are stored under those categories in the output database. The query video can then be retrieved by exact similarity matching from the classified videos using the Euclidean distance method.

Fig. 7. Comparison of Precision values shown for given categories of videos using SVM and without using SVM using KFCG features

Fig. 8. Comparison of Recall values shown for given categories of videos using SVM and without using SVM using KFCG features

Fig. 9. Comparison of Precision values shown for given categories of videos using SVM and without using SVM using BTC features