Omega Model for Human Detection and Counting for application in Smart Surveillance System

Driven by significant advancements in technology and by social issues such as security management, there is a strong need for Smart Surveillance Systems in our society today. One of the key features of a Smart Surveillance System is efficient human detection and counting, so that the system can decide and label events on its own. In this paper we propose a novel and robust model, the Omega Model, for detecting and counting human beings present in a scene. The proposed model employs a set of four distinct descriptors for identifying the unique features of the head, neck and shoulder regions of a person. This head-neck-shoulder signature addresses challenges such as inter-person variations in the size and shape of people's head, neck and shoulder regions, and achieves robust detection of human beings even under partial occlusion, dynamically changing background and varying illumination conditions. After experimentation we observe and analyze the influence of each of the four descriptors on system performance and computation speed, and conclude that a weight-based decision-making system produces the best results. Evaluation results on a number of images support the validity of our method in real situations.


I. INTRODUCTION
The state of the art in surveillance has made a quantum jump in recent years. However, with the increasing amount of video data to be processed, it is becoming more and more unmanageable for human beings to monitor it continuously. If we could develop a surveillance system that detects and classifies objects, takes decisions and labels events autonomously, current surveillance systems could be revolutionized. Vision-based human detection and counting is currently one of the most challenging tasks in the field of computer vision. General surveillance cameras are like machines that can only see, but cannot decide or identify things or events on their own. So, keeping present-day scenarios in mind, it is important that we make our surveillance systems intelligent and smart. Therefore, we propose a new framework to robustly and efficiently detect and count human beings, for application in surveillance. The proposed system consists of background subtraction, boundary extraction, head-neck-shoulder detection and, finally, human/non-human classification, on the basis of which the number of humans present in a scene is counted. First, we subtract the background and extract the foreground of any real-time video. Many techniques are available for background subtraction, and the Gaussian Mixture Model (GMM) is reported in the literature to be among the more efficient, so we use GMM for this purpose. Some of the commonly faced problems in background subtraction are sudden changes in illumination, dynamic background, camouflage, etc. Hence we design a robust adaptive GMM algorithm that can deal effectively with these problems and produce a foreground mask. Second, we detect human presence in the scene by detecting the head and shoulder portion using the Omega Model. We propose this model because the head-shoulder portion is the most unvarying part of the human body.
Based on the number of human beings detected, we count the total number of humans present in the scene. The entire system can hence be used in an effective surveillance system. The rest of the paper is organized as follows. In section II we discuss some of the related work in this field. In section III we give an overview of the method adopted for our work. In section IV our human detection and counting system is discussed, with a detailed description of the proposed Omega Model, each of its descriptors and the algorithm. In section V we explain the results, followed by the conclusion and our future work in section VI.

II. RELATED WORK
There is an extensive literature on shape classification. Various approaches to shape-based classification are discussed in [1][2][3][4][5][6][7][8][9][10]. However, different moving objects such as birds, vehicles, etc. may be present in the scene, so it is very important that we correctly distinguish humans from other moving objects. There are mainly two methods for classifying a moving object: shape-based detection and motion-based detection [11]. In the former, humans are detected with the help of their shape information. This kind of work was done in [12][13][14], where an SVM classifier was used to detect human beings by finding people's heads, searching for circular patterns through a 2D correlation with a bank of annular patterns. It is also a general fact that non-articulated human motion exhibits a certain periodicity. This property has been used by many researchers to detect human beings based on their motion. In [14], based on the motion of colored objects and a background subtraction method, a color classifier based on HS thresholds was proposed to detect moving objects. In [15], edge-based features combined with color and texture information were used for efficient human detection. In [16], humans were detected by detecting skin-like pixels and locating each face-like region. Some researchers have also employed model-based human detection [17,18]. In [17], a method for human detection was proposed that models humans as flexible assemblies of parts represented by the co-occurrence of local features. In [18], part detectors were learned by boosting a number of weak classifiers based on edgelet features. More recently, in 2013, the authors of [19] presented a method for human detection in range images captured from a vertically oriented camera, by analysis of 3D range data.

III. OVERVIEW OF THE METHOD
This section gives an overview of the method adopted in our human detection and counting system. One of the major challenges in the field of object recognition is the ability to detect human beings irrespective of variations in pose, body shape, clothing, illumination, moving cameras and changing background. In this work we have therefore developed the Omega Model, which is designed to detect human beings under these challenging scenarios. The various steps involved in our work are:
• Acquiring real-time video input from any video acquisition device
• Background modelling using adaptive GMM
• Background subtraction and shadow detection
• Human detection and counting using the Omega Model

IV. HUMAN DETECTION AND COUNTING SYSTEM
In this work, we first perform adaptive background modeling to extract the foreground region from a real-time surveillance video. We then acquire a set of these foreground images from the surveillance video. Since the human head-shoulder portion is the most unvarying part of the human body, we have used this dominant feature as the key information and developed the Omega Model for human detection.

A. Foreground Extraction
A good surveillance system requires an accurate segmentation of moving objects from a video sequence. Foreground extraction is generally done by using background subtraction, optical flow or frame differencing. Background subtraction is one of the most efficient and widely used methods for segmenting a dynamic scene in a video. The most common paradigm for background subtraction is to use an explicit model of the background. The background is generally modeled based on some regular statistical characteristics. Intruding objects are then detected by comparing the statistical parameters of the modeled background with those of the current frame. However, this method does not work well in surveillance scenarios where the background is subject to challenges such as dynamic lighting conditions, long-term scene changes, bimodal background, repetitive flickering motions, etc. So, for application in surveillance, it is important that the parameters of the background are also adaptive. Hence we have employed the adaptive GMM method proposed in [20] for modeling the background. The parameters of the mixture are then updated as in [20], where α is the learning rate for the weights.

If a Gaussian is labeled as unmatched, only its weight is decreased.
If none of the Gaussians match, the one with the lowest weight is replaced by a Gaussian with the current value Z_t as its mean and a high initial standard deviation.
The rank of a Gaussian is defined as w/σ. This value gets higher if the distribution has a low standard deviation and has matched many times. When the Gaussians are sorted in a list by decreasing rank, the first is the most likely to be background. The first B Gaussians that satisfy (1) are taken to represent the background. The Gaussian mixture model (GMM) is adaptive; it can incorporate slow illumination changes and the removal and addition of objects into the background. Further, it can handle repetitive background changes such as swaying branches, a flickering computer monitor, etc. The higher the value of T in (1), the higher the probability of a multi-modal background.
In our work we have modeled the background as a mixture of three Gaussians.
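As an illustration of the update rules described in this subsection, the following Python sketch maintains an adaptive mixture for a single grayscale pixel. It is not the formulation of [20] verbatim: it assumes the widely used form in which a matched weight moves toward 1 and unmatched weights decay at rate `alpha`, a match test of 2.5 standard deviations, the same rate for mean and variance, and a floor on the standard deviation. The class name `PixelGMM` and all default parameter values are hypothetical.

```python
import math


class PixelGMM:
    """Adaptive mixture of K Gaussians for one grayscale pixel (illustrative sketch)."""

    def __init__(self, k=3, alpha=0.05, init_sigma=30.0, min_sigma=2.0, T=0.7):
        self.alpha = alpha            # learning rate for the weights
        self.init_sigma = init_sigma  # "high" initial standard deviation (assumed value)
        self.min_sigma = min_sigma    # floor that keeps w / sigma finite (assumption)
        self.T = T                    # share of total weight attributed to background
        self.w = [1.0 / k] * k
        self.mu = [0.0] * k
        self.sigma = [init_sigma] * k

    def _by_rank(self):
        # Indices sorted by decreasing rank w / sigma.
        return sorted(range(len(self.w)), key=lambda i: -self.w[i] / self.sigma[i])

    def update(self, z):
        """Fold pixel value z into the model; return True if z looks like background."""
        matched = None
        for i in self._by_rank():
            if abs(z - self.mu[i]) <= 2.5 * self.sigma[i]:  # assumed match test
                matched = i
                break

        if matched is None:
            # No match: replace the lowest-weight Gaussian, centred on z.
            j = min(range(len(self.w)), key=lambda i: self.w[i])
            self.mu[j] = z
            self.sigma[j] = self.init_sigma
            return False

        for i in range(len(self.w)):
            m = 1.0 if i == matched else 0.0
            # Unmatched components (m = 0) only have their weight decreased.
            self.w[i] = (1.0 - self.alpha) * self.w[i] + self.alpha * m
        total = sum(self.w)
        self.w = [w / total for w in self.w]

        rho = self.alpha  # simplification: same rate for mean and variance
        self.mu[matched] = (1.0 - rho) * self.mu[matched] + rho * z
        var = (1.0 - rho) * self.sigma[matched] ** 2 + rho * (z - self.mu[matched]) ** 2
        self.sigma[matched] = max(math.sqrt(var), self.min_sigma)

        # The first B Gaussians by rank whose cumulative weight exceeds T are background.
        background, cum = [], 0.0
        for i in self._by_rank():
            background.append(i)
            cum += self.w[i]
            if cum > self.T:
                break
        return matched in background
```

In this sketch a pixel that stays near one value is absorbed into a single dominant Gaussian, after which a sudden large change matches no component and is reported as foreground.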

B. Omega Model for Human Detection
Significant research has been devoted to detecting people in images and videos. Human detection is a challenging classification problem which has many potential applications in the field of machine vision. The main problems in detecting human beings are due to the variations in pose, body shape, clothing, illumination, moving cameras and changing background.
Therefore, the main challenge is to find a set of unique features that characterize a human being in a scene while remaining resistant to the above-mentioned problems.
Thus in this work a new algorithm is presented to detect human beings in still images using a set of four descriptors. After the foreground extraction, the human beings have been detected by studying some of their invariant features like the head-neck-shoulder signature.

a) Outline of approach for Human Detection system:
The block diagram of the proposed human detection system is shown in Fig. 2.
This approach uses a shape-based representation of the extracted foreground contour for human detection. The advantages of this approach are:
• It can detect human beings even under partial occlusion (when the legs are partially occluded).
• It is tolerant to varying human pose.
• It can detect human beings even if the person is not facing the camera directly.
• The final decision is weight based and depends on multiple pieces of evidence obtained from the descriptors.

Fig. 2. Flow chart for Omega model for human detection
In this approach, the boundary of the contour of the extracted foreground object has been examined experimentally to obtain some of the invariant features of human beings from the shape of the contour. Four descriptors have been designed to specifically analyze these invariant features and thereafter take a weight based decision to detect the presence of human beings in the scene.

b) Descriptors for Human Detection:
The choice of the distinguishing features for classification is a critical design step and depends on the characteristics of the problem domain. Having extracted the contour of the foreground objects, a set of invariant features has been chosen to detect the presence of human beings in the scene. In this work we have developed four descriptors to distinguish human beings from non-human objects by using distinct features that are simple to extract as well as invariant to irrelevant transformations.
From the set of boundary points obtained by processing the contour of the segmented objects, the main aim here is to develop descriptors that describe the 'Omega' shape (i.e. the shape of the upper portion of the human body) in the best possible manner.
The four Descriptors we use are as follows:

Descriptor 1 (Ω_d): (Head-neck-shoulder dimensions of Ω)
• This shape-based descriptor is first defined by its dimensions, as shown in figure (3(b)): {Y_max - Y_min, X_max - X_min}.
• A bounding box, whose axes are aligned with the image axes, is designed to include the object of interest, as shown in figure (d).
• Based on the set of boundary points obtained, the coordinates of the centroid are calculated.
• From this centroid, the widths of the shoulder and the neck are obtained.
• These widths are then analyzed experimentally over a number of training images to obtain a threshold describing the optimum ratio of the two widths, which is then compared against the testing images. Based on this experimentally obtained threshold, a decision is made as to whether a human being is present in the scene.
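A minimal Python sketch of this descriptor is given below. Since the exact rows at which the neck and shoulder widths are sampled are not fully specified here, the sketch assumes they are measured along two horizontal scan lines placed at fractions `neck_frac` and `shoulder_frac` of the contour height (image coordinates, y increasing downwards). The function names, the fractions and the thresholds `t_low` and `t_high` are hypothetical placeholders for values that would be obtained experimentally.

```python
def bounding_box(points):
    """Axis-aligned bounding box (x_min, x_max, y_min, y_max) of boundary points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


def centroid(points):
    """Centroid (C_x, C_y) of the boundary points."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def width_at_row(points, y, tol=0.5):
    """Horizontal extent of the boundary points lying within tol of row y."""
    xs = [px for px, py in points if abs(py - y) <= tol]
    return max(xs) - min(xs) if xs else 0.0


def descriptor_dimensions(points, neck_frac, shoulder_frac, t_low, t_high):
    """Return 1 if the neck/shoulder width ratio falls inside (t_low, t_high), else 0."""
    _, _, y_min, y_max = bounding_box(points)
    h = y_max - y_min
    neck_w = width_at_row(points, y_min + neck_frac * h)
    shoulder_w = width_at_row(points, y_min + shoulder_frac * h)
    if shoulder_w == 0:
        return 0
    ratio = neck_w / shoulder_w
    return 1 if t_low < ratio < t_high else 0
```

On a synthetic head-neck-shoulder outline the ratio is small (narrow neck, wide shoulders), whereas a plain rectangle gives a ratio of 1 and is rejected by any upper threshold below 1.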

Descriptor 2 (Ω_m): (Radial feature of Ω)
• This descriptor specifically captures the radial feature of the human head.
• Based on experimental analysis, the upper (head) portion of the contour is extracted and a point (S') lying somewhere between the neck and the tip of the head is obtained.
• The radial distance between each of the points on the boundary and the point S' is calculated.
• The pattern of occurrence of these distances is observed for human contours.
• Based on this pattern, a decision is taken as to whether the extracted contour is that of a human head or not.
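The radial distances can be computed directly, as in the Python sketch below. The text does not state which property of the distance pattern drives the decision, so the sketch uses a stand-in: the relative spread (standard deviation divided by mean) of the distances, which is close to zero for a roughly circular head. The function names and the threshold `max_spread` are hypothetical.

```python
import math


def radial_distances(points, s):
    """Distance from the reference point s = (sx, sy) to every boundary point."""
    sx, sy = s
    return [math.hypot(x - sx, y - sy) for x, y in points]


def radial_spread(points, s):
    """Relative spread (std / mean) of the radial distances; assumed stand-in feature."""
    d = radial_distances(points, s)
    mean = sum(d) / len(d)
    var = sum((v - mean) ** 2 for v in d) / len(d)
    return math.sqrt(var) / mean


def descriptor_radial(points, s, max_spread=0.2):
    """Return 1 if the boundary is nearly equidistant from s (head-like), else 0."""
    return 1 if radial_spread(points, s) < max_spread else 0
```

A different statistic, for example a histogram of the distances, could be substituted without changing the surrounding logic.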

Descriptor 3 (Ω_k): (Curvature of Ω)
• This descriptor classifies humans based on the curvature of the human head-neck-shoulder portion.
• At each point on the boundary of the contour, the curvature is estimated; it indicates the amount of bending of the curve at that position.
• The patterns in the set of curvature values obtained for each of the boundaries of the contour have been studied.
• Analysis of these patterns shows that a specific number of local minima occur if the contour under observation is that of a human being.
• Based on this experimental analysis, a threshold is obtained and a decision is taken as to whether a human being is present in the scene.
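One simple way to realize this idea is sketched below in Python. As an assumption, the curvature at each point of a closed boundary is approximated by the signed turning angle between the incoming and outgoing segments, and a run of equal minimum values is counted once. With counter-clockwise traversal and the y axis pointing up, concave corners (such as the two neck notches) give negative values; in image coordinates the sign flips. The function names and the thresholds `a1` and `a2` used in the decision are hypothetical.

```python
import math


def turning_angles(points, step=1):
    """Signed turning angle at each vertex of a closed boundary (curvature proxy)."""
    n = len(points)
    angles = []
    for i in range(n):
        x0, y0 = points[(i - step) % n]
        x1, y1 = points[i]
        x2, y2 = points[(i + step) % n]
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        angles.append(math.atan2(cross, dot))
    return angles


def count_local_minima(values):
    """Count cyclic local minima; a plateau of equal minima is counted once."""
    n = len(values)
    return sum(
        1
        for i in range(n)
        if values[i] < values[i - 1] and values[i] <= values[(i + 1) % n]
    )


def descriptor_curvature(points, a1, a2, step=1):
    """Return 1 if the number of curvature minima C satisfies a1 < C < a2, else 0."""
    c = count_local_minima(turning_angles(points, step))
    return 1 if a1 < c < a2 else 0
```

On real contours the boundary would normally be smoothed, or `step` increased, before counting minima, since pixel-level noise produces many spurious extrema.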

Descriptor 4 (Ω_s): (Convexity of Ω)
• Here the shape description is based upon the convex hull of the set of boundary points obtained from the extracted contour.
• The convex hull of the set of boundary points of the contour is the enclosing convex polygon with the smallest possible area.
• We therefore analyze the convexity of the head-shoulder portion of the human body. We define convexity (R_s) as R_s = A_r / A_c, where A_c is the area of the convex hull and A_r is the area of the rectangle bounding the upper segmented contour.
• This ratio has been analyzed for a number of test images and, based on experimentation, a threshold has been obtained to detect the presence of a human being in the scene.
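The convexity ratio can be sketched in Python as follows, using the definition R_s = A_r / A_c from the algorithm listing (A_c is the convex hull area and A_r the bounding rectangle area). The hull is computed with Andrew's monotone chain algorithm and its area with the shoelace formula; these are choices made for the illustration, not necessarily those of the original implementation. The function names and the thresholds `r1` and `r2` are hypothetical, and points are expected as (x, y) tuples.

```python
def convex_hull(points):
    """Convex hull of 2D points (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def convexity_ratio(points):
    """R_s = A_r / A_c: bounding rectangle area over convex hull area."""
    a_c = polygon_area(convex_hull(points))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    a_r = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return a_r / a_c


def descriptor_convexity(points, r1, r2):
    """Return 1 if r1 < R_s < r2, else 0."""
    return 1 if r1 < convexity_ratio(points) < r2 else 0
```

Because the axis-aligned bounding rectangle always contains the convex hull, R_s is never below 1 under this definition; it equals 1 for an axis-aligned rectangle and 2 for a triangle whose base spans the rectangle.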
Weights are assigned to each of the descriptors based on the experimental analysis. Finally, based on the decisions obtained from the four descriptors, a weight-based decision is taken, and if the outcome is above a certain threshold then a human being is said to be present in the scene.
The complete algorithm for detecting human beings employing these four descriptors is given in the next subsection.

c) Algorithm for Human Detection:
The algorithm that has been designed for human detection is described below. Each of the extracted contours present in an image is processed to find human beings based on the descriptors. Each of the descriptors has been assigned a weight depending on its performance analysis. Finally, a weight-based decision is taken and compared with a standard threshold (Ω_th) obtained from experimental analysis, and accordingly the human beings present in the scene are detected and counted.
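The final weight-based decision and the counting step can be sketched in Python as below. The sketch assumes that each descriptor returns a binary decision (1 or 0), that the decisions are combined as a weighted sum, and that a contour is labeled human when the sum exceeds the threshold Ω_th. The weights and threshold in the usage example are placeholders, not the experimentally obtained values, and the function names are hypothetical.

```python
def weighted_decision(decisions, weights, threshold):
    """Combine binary descriptor decisions into one human / non-human decision."""
    score = sum(w * d for w, d in zip(weights, decisions))
    return score > threshold


def count_humans(per_contour_decisions, weights, threshold):
    """Count the contours whose weighted score exceeds the threshold."""
    return sum(
        1
        for decisions in per_contour_decisions
        if weighted_decision(decisions, weights, threshold)
    )


# Usage with placeholder weights for (dimensions, radial, curvature, convexity).
weights = (0.3, 0.2, 0.2, 0.3)
frame = [(1, 1, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1)]
human_count = count_humans(frame, weights, threshold=0.5)
```

With these placeholder values, the first and third contours are counted as human and the second is rejected.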

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
The results obtained for human detection and counting are very satisfactory. After background subtraction, the contours present in the segmented foreground image are processed using the algorithm developed for the Omega Model. The resolution of the camera used in this work is 120×160, and the system runs on a 32-bit operating system with a 2.00 GHz processor and 2 GB RAM. The achieved speed of execution for foreground extraction is 21 fps. The developed algorithm was then tested on 100 frames, each containing one or more human beings (including frames where the human is partially occluded, i.e. the legs are occluded), and 50 frames that did not contain human beings. We have achieved a success rate of 95%. The time required to detect humans in a frame is 18 ms. Certain errors arose due to complete occlusion of the head-shoulder portion. However, our method is tolerant to changing background and also deals effectively with different poses of the head-shoulder shape taken from different camera angles. A Matlab-based tool with a Graphical User Interface (GUI) has been developed so that anyone can easily detect the number of human beings present in a scene.
When the tool is started, the user can browse and select any image containing contours of foreground objects. The GUI shows the user a bounding box for each of the objects present in the original image. The segmented contour for each object is also shown in the window, and the algorithm then automatically generates the count of the number of human beings present in the image. A screenshot of the developed GUI is shown below.

(i) Human count=3
Algorithm for Omega model for human detection:

Descriptor 1 (Ω_d) (Neck-shoulder dimensions of Ω)
1. Get the boundary points {x_i, y_i} for each contour obtained from background subtraction.
2. Find the Y_min, Y_max, X_min and X_max values for each of the boundaries obtained in step 1 (refer to fig. 3(c)).
3. Obtain the height (h) and width (w) of the contour.
4. Find the coordinates of the centroid (C_x, C_y) of the contour.
5. Find the distances d = 1/3 of h and d' = 1/2 of d.
6. Obtain the following points:
   (a) X_min1, X_max1 < C_x
   (b) X_min2, X_max2 > C_x
7. Define two variables ш_1 (neck width) and ш_2 (shoulder width) such that:
   (a) ш_1 = X_max1 - X_min2
   (b) ш_2 = X_max2 - X_min1
8. Take ... [Ω_k = 1 if a_1 < C < a_2, = 0 otherwise], where a_1 and a_2 are the thresholds for the number of local minima.

Descriptor 4 (Ω_s) (Convexity of Ω)
16. Find the convex hull of the upper segmented boundary of the contour.
17. Find the area (A_c) of the convex hull.
18. Find the area (A_r) of the rectangle bounding the upper segmented contour.
19. Find the ratio: R_s = A_r / A_c.
20. Take a decision: [Ω_s = 1 if r_1 < R_s < r_2, = 0 otherwise], where r_1 and r_2 are the threshold values for R_s.
21. Finally, take a weight-based decision.

VI. CONCLUSION AND FUTURE WORK
A method for human detection and counting has been presented in this paper. The key feature of our work is that we have employed four descriptors to detect four invariant and significant features of the human head-shoulder region. We studied the influence of the various descriptor parameters and concluded that none of the descriptors can detect humans on its own; hence we employed a weight-based decision system to obtain good performance. Experiments performed on several images validate the effectiveness of our approach.
In our future work we shall focus on implementing the Omega Model in video, and hence on developing a real-time Smart Surveillance System that can decide and label events and give threat alerts for security-conscious venues.