Unsupervised Commercials Identification in Videos

Identifying commercials (ads) in a video stream and measuring their statistics is an essential requirement for advertisers. The cost of a commercial to its owner depends on its duration and on the time at which it runs on TV, so automatic systems that measure these statistics benefit the ad owner. This research presents a system that semantically segments videos and automatically identifies commercials in broadcast TV transmission. The proposed technique uses color histogram and SURF features to identify individual ads in a TV transmission video stream. Experimental results on unseen videos demonstrate better results for ad identification. The proposed approach targets television transmissions that, as in Pakistan and unlike European TV transmission, do not use a blank frame between the ads and the non-ad part of the transmission. The proposed segmentation approach is unsupervised.

Keywords: TV commercial; semantic analysis; segmentation; video classification; commercial detection; commercial classification


I. INTRODUCTION
Commercials displayed in broadcast TV transmission are a very important part of the transmission: they generate the majority of a broadcaster's revenue and are a useful source of information for viewers. Knowing what is being advertised, when, and by whom can help in understanding market trends and forming business strategy. These commercials can also serve as objects or segments of interest for the semantic analysis of videos.
The term semantics is very broad and can be applied in several different domains, e.g. sports, drama, songs and commercials. The proposed framework selects commercials that appear in television broadcasts for semantic analysis. The target was to develop a framework that can differentiate between commercial and non-commercial segments and compute the statistics of any particular commercial in a TV transmission video stream. These statistics can describe the timing, duration, frequency, etc. of a target commercial in a recorded TV transmission video stream. Although several researchers have attempted commercial detection and identification, most of the work relates to European and US television transmissions, which follow transmission standards. Most TV channels in Pakistan, however, do not follow those standards, e.g. the appearance of a blank frame between the ads and the non-ad part of the transmission.
The proposed framework is developed for transmissions that, as in Pakistan, do not use black frames between commercials. Several techniques, such as [1], rely on the presence of black frames between commercials to detect commercial boundaries, a convention that is not commonly followed by TV channels in Pakistan. Others have used the absence of the channel logo during the commercial break to separate a video into commercial and non-commercial segments [2]; this technique, too, is not applicable to TV transmission in Pakistan.
This paper is divided into seven sections. Section I provides an introduction and overview of the project. Section II discusses related work. Section III describes the video segmentation implementation, and Section IV describes commercial identification. Section V explains how the Commercials Analysis Application is used to demonstrate the algorithm. Section VI presents the results of the experiments that were conducted, and Section VII discusses future work.
In [2] the problem of identifying and categorizing commercials in TV videos is explored. A multi-modal approach is used for boundary detection of individual commercials. Black frames and silence are used for separating ads. Text detection on ads is performed for classification of ads. To separate commercials from normal transmission, the absence of the channel logo in the commercial segment is used. Audio and visual features were used to train a Hidden Markov Model. After evaluation, a precision of 90% and a recall of 80% were observed.
Covell et al. [3] explored the problem of detecting video segments that are repeated in a video transmission stream. Their major concern was the detection of advertisements related to the broadcaster's own programs, such as ads for programs that will be broadcast next week or on the same day at a different time. Their suggested approach has three major steps: audio repetition detection, visual descriptors and endpoint detection. For evaluation, the experiment was run on four days of video footage captured from different TV channels. A precision of 85% and a recall of 94% were reported for the audio matching part. After video matching, precision was 92% and recall was 93%, and in the final result precision was 99% and recall was 95%.
In [4] the problem of detecting commercials in videos encoded in H.264/AVC is explored. This approach is unique in the sense that it works directly on the compressed stream instead of requiring decompression as a separate step. The proposed approach makes use of the fact that the channel logo is not present during commercial segments, which is true for European and particularly German television. For evaluation, recordings from 19 television channels were used, predominantly aimed at German viewers. An average recall of 97.33% and a precision of 99.31% are reported.
In [5] the problem of detecting scene changes in video using the audio and visual information available in the video is explored. The proposed method first determines shot boundaries using an unsupervised segmentation algorithm that makes use of object tracking. For evaluation, several videos from TV news were used, and an average recall of 89% and an average precision of 92% were observed.
In [6] the problem of detecting commercial breaks in MPEG-compressed video streams is explored. It makes use of features derived from MPEG parameters. For commercial detection, the presence of black frames, unicolor frames and changes in aspect ratio are used. It was also observed that the minimum duration of a commercial break is one minute. For evaluation, eight hours of television transmission were recorded from Dutch TV stations, and the presence of black frames in commercial breaks was found to be the strongest ad-detecting parameter.
In [7] a learning-based approach for the detection of TV commercials is proposed. The approach performs Support Vector Machine based classification using several visual and audio features. The visual features include the average and variance of the edge change ratio and the average and variance of the frame difference. Mel-frequency cepstral coefficients and short-time energy are used as audio features. Some post-processing steps are applied, such as removing scenes of very short length, checking for long commercials and refining commercial boundaries. For evaluation, 10.75 hours of recorded TV transmission were used, collected from different TV channels such as NBC, ESPN2 and CNN. Without post-processing, a recall of 88.21% and a precision of 89.39% were observed; with post-processing, these increased to 91.77% and 91.65%, respectively.
[8] proposed a method for automatic unsupervised segmentation of TV content that uses a signal-based approach. This approach can be applied to audio, visual or combined audio-visual signals by making use of the generalized likelihood ratio and the Bayesian Information Criterion. The system was evaluated on recordings from French television and on the TRECVid dataset. For ARGO, a recall of 93% and a precision of 93% were observed, whereas for TRECVid the recall was 89% and the precision was 91%.
[9] proposed a novel method for commercial detection that is centered on cookery programs. Commercial boundaries are detected based on the presence of audiovisual features. Initially, different audio features are used for detecting the start and end of a commercial break. Then the logo of the program name is matched at the start and end of the commercial break. Zero crossing rate and short-time energy are used as audio features. Edge detection and corner detection are used for visual analysis.
In [10] the problem of automatically annotating broadcast videos for later search and indexing is discussed, as manual annotation is costly, time-consuming and subjective. The approach applies multi-modal machine learning techniques to the audio, video and text components of the video for analysis and retrieval. The system is unique in that it combines the audio, video and text information present in the video for annotation. Machine learning is applied to create a library of semantic models from a training dataset. Human interaction is required in the training phase, but only for a small dataset. The system allows users to query videos in several ways: feature-based (by selecting a key frame), text-based, semantic-based or model-based. For evaluation, the TRECVID benchmark was used, and an accuracy of 90% was achieved.
In [11] the problem of real-time indexing of videos based on their content is investigated. The suggested approach applies statistical methods using a Hidden Markov Model (HMM) for content-based video indexing. Most of the features used as input to the HMM are based on the difference image sequence, which specifies the motion of the main object in the scene. Other features used are the average motion deviation, which helps to distinguish shots where large parts are in motion; the grey level histogram, which is useful in detecting cuts; the center of motion; and the overall intensity of motion. This approach merges scene detection and scene classification into a single step, because treating them as two separate steps causes a fault in scene detection to result in a wrong classification. For evaluation, recordings of 12 news shows from different German TV stations were used. Six news shows were used for training and six for testing. They were classified into nine classes, namely Studio Speaker, Report, Begin, End, Weather Forecast, Out, Interview, Cut, Frame Translate and Window Change. Seven of the nine classes had a recognition rate greater than 80%. It was found that the recognition rate for short news is significantly better than that for long news.
In [12] the problem of classifying feature films into categories based on their previews is explored. Films are classified into four categories, namely Comedy, Action, Drama and Horror, based only on computable visual cues. The suggested approach describes the input as a set of features that are likely to minimize the variance of points within a class and maximize the variance between points of different classes. The features used include the average shot length, computed using the average color histogram in HSV (Hue, Saturation, Value) space. For evaluation, 101 movie previews were taken from the Apple website and classified into the four categories. Of the 101 movies, 17 were wrongly classified.
In [13] the problem of retrieving commercials based on their salient semantics is discussed from a semiotic perspective. Four semiotic categories of commercials were identified, namely practical, playful, utopic and critical. For evaluation, 150 commercials from several Italian channels were used, and tests were conducted to verify that the classifications made by the system conform to those made by human experts. The best results were seen for playful commercials and the worst for practical commercials, the reason being the inability of the system to properly detect whether the promoted product is in the foreground.

III. VIDEO SEGMENTATION
The first step of the proposed framework is video segmentation based on semantics [14]. The segments broadly consist of different programs in addition to the commercial ads. The target is to segment commercials and then identify any particular commercial in the video stream. To segment videos semantically and automatically, the RGB mean is computed for every frame in the video, the histogram is plotted against the frame numbers, and the variance is calculated. From the plots, it was observed that commercial segments and normal transmission each follow a distinct pattern: commercial segments showed higher distortion in the graph, whereas segments corresponding to normal transmission had fewer peaks per second. Based on the histogram variance information, the video is segmented into commercial and non-commercial segments. The details of video segmentation based on semantics can be reviewed in [15].
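The variance-based split described above can be illustrated with a minimal sketch. It assumes frames are available as NumPy RGB arrays; the window length, the threshold and the function names are hypothetical choices made for illustration and are not taken from the paper or from [15].

```python
import numpy as np

def frame_rgb_means(frames):
    """Return one scalar per frame: the mean over all pixels and RGB channels."""
    return np.array([float(np.mean(f)) for f in frames])

def label_commercial_windows(means, window=25, threshold=50.0):
    """Label each non-overlapping window of frames as commercial-like (True) or not.

    window and threshold are hypothetical values, not parameters from the paper.
    """
    labels = []
    for start in range(0, len(means) - window + 1, window):
        variance = float(np.var(means[start:start + window]))
        labels.append(variance > threshold)
    return labels

# Synthetic demo: 50 nearly static frames followed by 50 rapidly changing frames.
calm = [np.full((4, 4, 3), 100 + (i % 2), dtype=np.uint8) for i in range(50)]
busy = [np.full((4, 4, 3), (i * 37) % 256, dtype=np.uint8) for i in range(50)]
means = frame_rgb_means(calm + busy)
labels = label_commercial_windows(means)
print(labels)
```

In this toy input the calm windows have a variance well below the threshold and the fast-changing windows far above it, which mirrors the observation that commercial segments fluctuate more than normal transmission.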
The first step on the segmented video is to detect the start and stop boundaries of each commercial.

Ad Boundary Detection
After segmenting the video into commercial and non-commercial segments, the aim is to automatically find the boundaries of individual commercials. Most existing systems [15] make use of the black frame that appears at the end of each commercial. However, black frames are not present in the TV transmission of all countries, Pakistan being one example. Therefore, a technique for automatic detection of commercials in this kind of TV transmission is needed.
To detect the boundary of a commercial in the video stream, color histograms of repeating patterns are computed, as commercials are usually repeated several times during transmission. To determine whether a frame belongs to an existing scene, we calculate the histogram difference between the change point and the last 5 scenes; if a match is found, the frame belongs to the existing scene.
After a scene is identified, it is compared with the other scenes that have already been generated from the video. If the scene has not been seen before, it is assigned a new scene ID. However, if the scene was already observed at an earlier location in the video, it is added to the list of scenes belonging to the existing scene ID.
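The scene matching and scene ID assignment described above might be sketched as follows. The histogram layout (8 bins per channel), the L1 distance, the threshold of 0.2 and all function names are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Concatenated per-channel histogram of an RGB frame, normalised to sum to 1."""
    channels = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
                for c in range(3)]
    hist = np.concatenate(channels).astype(float)
    return hist / hist.sum()

def hist_diff(h1, h2):
    """L1 distance between two normalised histograms (0 means identical)."""
    return float(np.abs(h1 - h2).sum())

def matches_recent_scene(hist, scene_hists, threshold=0.2, lookback=5):
    """True if hist is close to any of the last `lookback` scene histograms."""
    return any(hist_diff(hist, ref) < threshold for ref in scene_hists[-lookback:])

def assign_scene_ids(scene_hists, threshold=0.2):
    """Give every scene an ID; a scene matching an earlier one reuses that ID."""
    known = []  # representative histogram for each scene ID
    ids = []
    for hist in scene_hists:
        match = next((i for i, ref in enumerate(known)
                      if hist_diff(hist, ref) < threshold), None)
        if match is None:
            known.append(hist)
            match = len(known) - 1
        ids.append(match)
    return ids

def solid(r, g, b):
    """Tiny single-colour test frame."""
    frame = np.zeros((4, 4, 3), dtype=np.uint8)
    frame[...] = (r, g, b)
    return frame

red, blue, green = solid(200, 0, 0), solid(0, 0, 200), solid(0, 200, 0)
scene_ids = assign_scene_ids([color_histogram(f) for f in (red, blue, red, green)])
print(scene_ids)
```

Here the third scene reuses the ID of the first because their histograms match, while the fourth receives a new ID.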
In the next step, repeating segments of scenes are computed. This allows detection of unique commercials, because a commercial segment is normally composed of several scenes and appears several times in a TV transmission compared with other programs. The details of the algorithm can be reviewed in [14]. Each discovered pattern generally represents a commercial segment that was repeated several times in the video.

IV. COMMERCIAL IDENTIFICATION
Commercial identification is the step in which the location(s) and duration(s) of a particular commercial, if it exists in the video stream, are determined. It uses a trained feature file as the reference for the target commercial. This section describes the training and commercial identification phases.

 Training
In the training phase, frames are first selected from the specific commercial on which the system is to be trained, and their RGB histogram is calculated. Using the histogram variance information, the commercial segment is split into scenes. If the histogram variance of two consecutive frames is greater than a threshold, the point is considered a change point for the scene. In the second step, the user is asked to select a key_object in the commercial segment that is significant with respect to the advertised product. The key_object is used for computation of SURF (Speeded-Up Robust Features) features. An example of a selected key object is shown in figure 2, where a bottle of Lifebuoy shampoo is selected as the key_object from a video frame.

Fig. 2. Selection of object for SURF feature computation
More than one key_object can also be selected for computation of SURF features; however, selecting one object is enough to eliminate false positive ad detections. The computed SURF feature file is then used as the training file for that specific commercial in the commercial identification stage on the transmission video stream.

 Identification
In the identification phase, the training file is used to find the presence of the investigated commercial segments in the provided video stream. The system finds all matches (if any exist) of the key_object in the video and marks all related frames in the video stream. Sometimes a long commercial segment is broadcast first and, after some time, a shorter version of that commercial is broadcast. In this case it is recommended that the system be trained on the longer version of the commercial; it will then also be able to detect the shorter versions. To reduce the search space, the histogram of each frame in a scene is compared only with the histogram of the first frame in the trained scene. The details of the commercial identification step are given below.
First, the histogram of each frame is matched with the histogram of the previous frame, and if the difference is greater than the threshold a new scene is generated. For each newly generated scene, the histograms of all frames are compared with the histogram of the first frame of the trained scene. If the 2/3 match criterion is met with any one of those scenes, the scene is kept as a candidate for the investigated commercial; otherwise it is ignored and the next scene is examined. When a frame is found whose histogram difference is less than the threshold for the object scene, the content of that frame is read and an integral image is generated from the RGB image. Next, interest points are calculated for the current frame, and matches are determined between the described interest points of the object and those of the frame. If the number of matched interest points is greater than 10 (the set threshold), the current commercial segment is kept as a valid commercial; otherwise it is considered a false detection and deleted.
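The two filtering stages above can be sketched as follows, under several assumptions: the 2/3 criterion is read as "at least two thirds of the scene's frames are within the histogram threshold of the trained first frame", SURF descriptors are taken as already computed 64-dimensional vectors, and matches are counted with a nearest-neighbour ratio test. The ratio test, the value 0.7 and all function names are illustrative choices that the paper does not specify; only the match threshold of 10 comes from the text.

```python
import numpy as np

def is_candidate_scene(frame_hists, trained_first_hist, hist_threshold):
    """Keep a scene if at least 2/3 of its frames match the trained first frame."""
    diffs = np.abs(np.asarray(frame_hists, dtype=float)
                   - np.asarray(trained_first_hist, dtype=float)).sum(axis=1)
    matched = int(np.sum(diffs < hist_threshold))
    return 3 * matched >= 2 * len(diffs)  # integer form of matched/len >= 2/3

def count_descriptor_matches(obj_desc, frame_desc, ratio=0.7):
    """Count object descriptors whose nearest frame descriptor passes a ratio test."""
    # Pairwise Euclidean distances, shape (n_object, n_frame).
    dists = np.linalg.norm(obj_desc[:, None, :] - frame_desc[None, :, :], axis=2)
    nearest_two = np.sort(dists, axis=1)[:, :2]
    return int(np.sum(nearest_two[:, 0] < ratio * nearest_two[:, 1]))

def verify_commercial(obj_desc, frame_desc, min_matches=10):
    """Accept the segment only if more than min_matches interest points match."""
    return count_descriptor_matches(obj_desc, frame_desc) > min_matches

# Synthetic descriptors standing in for SURF output (64 values per interest point).
rng = np.random.default_rng(0)
object_desc = rng.normal(size=(20, 64))
frame_with_ad = np.vstack([object_desc + 0.01 * rng.normal(size=(20, 64)),
                           rng.normal(size=(30, 64))])
frame_without_ad = rng.normal(size=(50, 64))

trained = [0.5, 0.5]
mostly_matching = [[0.5, 0.5]] * 5 + [[1.0, 0.0]]
half_matching = [[0.5, 0.5]] * 3 + [[1.0, 0.0]] * 3

print(is_candidate_scene(mostly_matching, trained, hist_threshold=0.2))
print(verify_commercial(object_desc, frame_with_ad))
```

Running the histogram test first keeps the more expensive descriptor matching limited to the few scenes that survive it, which is the search-space reduction the text describes.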
The advantage of this approach is that the full frame data of the video transmission is not needed to investigate a commercial, because only the histogram information is needed, and it has to be computed only once. Frame content is needed only in the training stage, to compute SURF descriptors of the key_object of the commercial. Therefore, new training files can easily be added to the system for new commercials to be investigated. For example, if we have a training file for the "DEW" commercial, which detects DEW commercial segments in the transmission stream, and we are asked to find occurrences of another commercial, e.g. "Fair and Lovely", in the same transmission stream, all we need to do is create a training file for the Fair and Lovely commercial by computing SURF descriptors of the Fair and Lovely key_object and run it on the system. Because the histograms were already calculated when we checked for DEW, the system simply loads them from file and can quickly report the presence of the Fair and Lovely commercial.
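Once the frames belonging to a target commercial have been marked, its timing, duration and frequency follow from simple bookkeeping. The sketch below is one possible way to do this; the function name, the `max_gap` parameter and the example frame ranges are hypothetical, and the default of 25 frames per second is assumed to match the recording rate.

```python
def occurrences_from_frames(marked_frames, fps=25, max_gap=1):
    """Group marked frame indices into (start_seconds, duration_seconds) occurrences.

    Consecutive indices (gap <= max_gap) are treated as one occurrence.
    """
    runs = []
    for idx in sorted(marked_frames):
        if runs and idx - runs[-1][1] <= max_gap:
            runs[-1][1] = idx          # extend the current run
        else:
            runs.append([idx, idx])    # start a new run
    return [(start / fps, (end - start + 1) / fps) for start, end in runs]

# Illustrative input: the same commercial marked in two separate places.
marked = list(range(250, 1000)) + list(range(5000, 5500))
occurrences = occurrences_from_frames(marked)
frequency = len(occurrences)
total_duration = sum(duration for _, duration in occurrences)
print(occurrences, frequency, total_duration)
```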

V. COMMERCIALS ANALYSIS APPLICATION
An application was created to demonstrate how the proposed algorithm can be used for detection, identification and computation of statistics of a target commercial in a video stream. It consists of four (4) main modules:
 Commercial Discovery detects all commercial segments at all positions where they are present in a video stream, based on scene identification and detection of repeating patterns of scenes.
 Training is used to train the system for a new commercial that was previously unknown to the system.
 Detection is used for detection or identification and verification of a single ad based on the training file that is provided.
 Detect All allows the user to check for the presence of all commercials known to the system in a specified video stream and provides the user with a summary result.
A screenshot of the application is shown in figure 3. The details of each step are given below.
The Commercial Discovery part is used for finding unique ads inside the given video based on repeating patterns. First, the video stream and its computed RGB histogram are selected. Clicking the Detect Scenes button extracts scenes from the frames present in the selected folder. Find Ads applies the repetition detection algorithm to look for repeating patterns of scenes and marks them as unique ads.
The result can be viewed in a grid, and any row can be selected to generate training data for that row. Figure 4 shows the Commercial Discovery grid screen, from which any desired ad (scene) may be selected.
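The repetition detection itself is described in [14] and is not reproduced here. As a simplified stand-in, the sketch below counts fixed-length runs of scene IDs and keeps those that recur; the function name, the run length and the minimum repeat count are hypothetical.

```python
from collections import defaultdict

def find_repeating_patterns(scene_ids, length=3, min_repeats=2):
    """Map each scene-ID run of the given length to its start positions,
    keeping only runs that occur at least min_repeats times."""
    positions = defaultdict(list)
    for start in range(len(scene_ids) - length + 1):
        positions[tuple(scene_ids[start:start + length])].append(start)
    return {pattern: starts for pattern, starts in positions.items()
            if len(starts) >= min_repeats}

# Illustrative scene-ID sequence in which the run 1, 2, 3 recurs three times.
ids = [0, 1, 2, 3, 9, 8, 1, 2, 3, 7, 6, 1, 2, 3, 5]
patterns = find_repeating_patterns(ids)
print(patterns)
```

This simplification ignores variable-length commercials and overlapping occurrences, both of which a full implementation would need to handle.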

 Training
In this screen the user can view the selected scenes for a selected commercial and the parameters that are detected for the selected scene or frame. From the selected scene, the user can click "Select SURF Object" to select the object that will be used for computation of SURF features.
The following three parameters are computed:
 MaxDiff is the maximum histogram difference between the start frame and any other frame of the scene.
 LastDiff is the histogram difference between the current frame and the previous frame.
 StartDiff is the histogram difference between the current frame and the first frame of the scene.
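The three parameters can be computed from the per-frame histograms of one scene, as in the sketch below. The L1 histogram difference, the function names and the convention that LastDiff is 0 for the first frame are assumptions made for illustration.

```python
import numpy as np

def hist_diff(h1, h2):
    """L1 difference between two histograms (an assumed distance measure)."""
    return float(np.abs(np.asarray(h1, dtype=float)
                        - np.asarray(h2, dtype=float)).sum())

def scene_parameters(scene_hists):
    """Return (MaxDiff, LastDiff per frame, StartDiff per frame) for one scene."""
    first = scene_hists[0]
    start_diffs = [hist_diff(h, first) for h in scene_hists]
    # The first frame has no predecessor, so its LastDiff is set to 0 here.
    last_diffs = [0.0] + [hist_diff(scene_hists[i], scene_hists[i - 1])
                          for i in range(1, len(scene_hists))]
    return max(start_diffs), last_diffs, start_diffs

# Toy two-bin histograms for a three-frame scene.
max_diff, last_diffs, start_diffs = scene_parameters([[4, 0], [3, 1], [1, 3]])
print(max_diff, last_diffs, start_diffs)
```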

 Object Selector
The Object Selector screen allows the user to select a key-object that is used for calculating the reference SURF descriptor points. In figure 5, the complete image is shown on the left side and the selected object on the right side. Clicking the "Test" button calculates SURF descriptors for the selected region and for the complete image. These two sets are then compared to obtain the matching SURF points. The number of matching SURF points is shown to the user, who can use it to decide whether the selected object is a good candidate for SURF matching or whether a different object should be chosen.
Individual commercial detection steps can be performed by clicking the "Detection Steps" buttons, or all steps can be performed by clicking "Perform Complete Detection", which performs detection and shows the results for a single ad as in figure 6 and for all ads as in figure 7, displaying a timeline with the commercial regions highlighted.
For evaluation, the commercials in all 3 recorded segments were first marked manually and then the detection algorithm was executed. The automatic detection results were compared with the manually marked commercials. The target in this test was to find those ads that appear at least 3 times in the test video segment. Detection results for the 3 test video segments are given in Tables 1, 2 and 3.
Table 1 shows the results for the video segment recorded from the HUM TV channel. For the HUM TV segment there were no false positive results. In total, 20 seconds of invalid commercial content was marked as commercials.
The performance of the algorithm is measured using the standard Precision, Recall and F1 values [15]. The Precision, Recall and F1 value of each segment are given in Table 4. From Table 4, the average Precision, Recall and F1 values over all three segments are 97.6%, 84.6% and 89%, respectively.
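For reference, the standard definitions of these measures, applied to true positive, false positive and false negative durations in seconds, can be written as a short sketch. The function name and the input values are illustrative only and are not taken from the result tables.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from TP, FP and FN totals."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative durations in seconds (not values from the paper's tables).
precision, recall, f1 = precision_recall_f1(tp=800, fp=50, fn=200)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```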
For evaluation of commercial identification, the system was trained on 6 commercials for different products and evaluated. The identification results are given in Table 5.
We have developed a framework for commercial detection and identification using color histogram and SURF feature descriptors. The framework performs well for TV transmission that does not use black frames between different programs. The proposed technique is unsupervised and is able to differentiate between commercial and non-commercial parts of the transmission, in addition to identifying the position, frequency and duration of commercials in a TV transmission stream.
The proposed method can be extended and converted into a complete media monitoring and ad verification package suitable for environments where other solutions are not suitable, with a feedback approach used to improve the performance of automatic commercial detection. The experiments show that increasing the duration of the analyzed video stream yields better results.

Fig. 1. Framework for proposed technique

II. LITERATURE REVIEW
Most of the work has been done on European or American television transmissions, which use black frames between commercials as the target for analysis.

Fig. 3. Screenshot of main screen, showing options for selecting frames folder and training files for ad detection

 Commercial Discovery

Fig. 5. Object Selector screen: complete image on the left side, selected cropped object of interest on the right side

 Commercial Detection

Fig. 6. Single Commercial Detection
In figure 6, the start time and end time are shown on the timeline by regions highlighted with a red bar. This figure shows

Fig. 7. Multiple Commercials Identification (Lifebuoy Shampoo selected)
Figure 7 shows the multiple commercials detected in the input video, including DEW, Lifebuoy, PANADOL, VOICE and Other Unknown, as a pie chart and a timeline. Each region can be clicked to view the location and size of that commercial in the timeline of the original video.

VI. RESULTS
This section describes the results and statistics obtained by running our algorithm on the test dataset. The test dataset is composed of three segments of transmission recorded from two different TV channels, ARY Digital and HUM TV. The two ARY Digital segments were 2 hours 30 minutes in length and the HUM TV segment was 1 hour 40 minutes in length. All 3 segments were recorded at 25 frames per second.

TABLE I. DETECTION RESULTS FOR COMMERCIAL SEGMENT IN VIDEO RECORDED FROM HUM TV CHANNEL

Table 2 shows the results for the video segment recorded from the ARY Digital TV channel.
www.ijacsa.thesai.org

TABLE II. DETECTION RESULTS FOR COMMERCIAL SEGMENT IN VIDEO RECORDED FROM ARY DIGITAL CHANNEL

TABLE III. DETECTION RESULTS FOR COMMERCIAL SEGMENT IN 2ND VIDEO RECORDED FROM ARY DIGITAL CHANNEL

TABLE IV. IN THE TABLE, COLUMN T.P IS THE TOTAL TRUE POSITIVE DURATION IN SECONDS FOR ADS, F.P IS THE TOTAL FALSE POSITIVE DURATION IN SECONDS, AND F.N IS THE TOTAL FALSE NEGATIVE DURATION IN SECONDS

TABLE V. RESULTS OF AD IDENTIFICATION. AD COLUMN IS THE NAME OF THE AD, ACTUAL IS THE TOTAL NUMBER OF TIMES THAT AD WAS SEEN, DETECTED IS THE NUMBER OF TIMES THAT AD WAS DETECTED BY THE SYSTEM