Person Detection from Overhead View: A Survey

Abstract—In recent years, overhead view based person detection has gained importance because it handles the occlusion problem and provides better scene coverage than the frontal view. In computer vision, overhead based person detection is significant in many applications, including person counting, person tracking, behavior analysis and occlusion-free surveillance systems. This paper aims to provide a comprehensive survey of recent developments and challenges related to person detection from the top view. To the best of our knowledge, it is the first survey of overhead person detection techniques. This paper provides an overview of state-of-the-art overhead based person detection methods and guidelines for choosing the appropriate method in real-life applications. The techniques are divided into two main categories: blob-based techniques and feature-based techniques. Various detection factors, such as field of view, region of interest, color space and image resolution, are also examined, along with a variety of top view datasets.


I. INTRODUCTION
In video surveillance, one of the key tasks is to detect, identify and monitor persons in crowded public scenes such as airports, train stations and supermarkets. The problem of locating a person in surveillance images and video sequences from an overhead view has been actively researched over the last decades. Top view based person detection has gained importance because, compared to a frontal view, it handles occlusion better and provides wider coverage of the scene. Overhead based person detection has applications in various fields, the most significant being surveillance systems. Other applications include person detection in indoor and outdoor surveillance systems, person counting [1] (including pedestrian counting [2], [3], [4] and passenger counting in railway stations [5], shopping malls, airports and buses [6]), person tracking [7], [8], [9], [6], [10], [11], behavioural understanding [12], action recognition, person posture characterization [13], crowd analysis [14], industrial workflow [15], [10] and the provision of a larger coverage area. Furthermore, it is helpful in search and rescue situations. An overhead camera can also reduce privacy issues, because it captures an overhead view of the person's body instead of face images [16].
Detecting persons in overhead videos and images is challenging because of the following factors: person body appearance, the wide range of poses, complex backgrounds, unconstrained illumination conditions and self-occlusion. To detect a person in top view videos and images, it is necessary to understand the shape, structure and features of the person's body as seen from overhead. Once a person is detected, the rest of the application system can build on this result.
A variety of top view person detection algorithms are available; however, these algorithms are still far from the human ability to detect a person in images and videos using just a single cue.
One of the major hurdles in the person detection task is the flexible nature of the person's body, such as variations in pose, size, orientation and direction. Variations in hair colour and texture and in clothing colour and texture also add to the complexity. Similarly, complex environments, i.e., cluttered backgrounds, crowds and lighting conditions, also create hurdles for researchers.
Various overhead person detection techniques have been developed in the last few years. In this paper, these overhead view person detection techniques are studied. To the best of our knowledge, this paper is the first review and survey that aims to cover the most significant advances reported in the literature so far.
From the broad literature on overhead person detection, a representative sample of papers has been selected. This paper classifies the developed techniques into two groups: blob-based and feature-based.
As discussed earlier, person detection can be performed from two different perspectives: frontal and overhead. The content of the paper and the general framework of overhead person detection are summarized in Fig. 1. Categorically, the developed techniques are divided into two groups: blob-based techniques and feature-based techniques.
The different detection factors, including camera field of view (narrow or wide), region of interest (person head, head-shoulder, whole overhead body), colour space (RGB, Depth, HSV & YCbCr), the device used for video and image recording and the recording environment (indoor or outdoor), are also shown in Fig. 1. Furthermore, the challenges and applications of overhead based person detection are also depicted in Fig. 1.
The remainder of this paper is structured as follows: Section II introduces overhead based person detection techniques in general. It is divided into two subsections: the first reviews the blob-based techniques used by most researchers, while the second discusses feature-based techniques. The datasets used in the literature are also examined in Section II. The existing techniques, datasets and issues related to overhead based person detection are discussed in Section III. Section IV concludes the paper and provides some future directions.

II. OVERHEAD BASED PERSON DETECTION TECHNIQUES
Locating a person in top view images and videos is essentially a two-step process: defining the ROI and localizing the person in the input image or video sequence. In the overhead view, the person can be localized by the head, the head-shoulder region or, sometimes, the whole overhead body. Fig. 2 shows the overhead view of the person body and clearly illustrates how the shape, size and orientation of the body change when seen from above. In computer vision, it is important to understand how the person is detected in overhead images and videos. This section focuses on algorithms for detecting people in overhead videos or images and broadly divides the relevant literature into two practices: the first subsection discusses simple blob-based algorithms, while the second discusses feature-based techniques for person detection.

A. Blob based Techniques
In this section, blob-based techniques for overhead view person detection are discussed. The general framework of blob-based techniques, shown in Fig. 3, involves background subtraction, foreground extraction, segmentation and pre-processing. Usually, in blob-based techniques, a foreground image is obtained by background subtraction. Different pre-processing techniques are also used to remove noise, shadows and illumination effects, and a threshold is set to obtain the desired foreground image. Blobs are then extracted from the foreground image and classified into categories such as a single person, multiple persons or another object. The classification is based on the shape, color, motion or other features of the blob. The techniques that detect a person from the top view using blobs and background subtraction are reviewed as follows. Cohan et al. [19] and other studies reviewed in this paper adopted a basic background subtraction and segmentation method to detect the person in overhead images. Table I illustrates that the majority of researchers adopted background subtraction and segmentation based methods to detect the individual in overhead images and video sequences. In pre-processing, morphological operators, including erosion and dilation, are typically used by mainstream researchers to obtain the required blob. For noise removal, Gaussian and median filters have been used, as revealed in Table I. The connected component labeling method for blob extraction is used by [2], [20], [18] and [13], as shown in Fig. 4. Table I also shows that, after background subtraction, most researchers used the extracted blob information to detect the person, considering basic blob features including shape, color, edges and size.
Fig. 4. Connected Component Labeling Method used by [2], [18] and [13].
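The blob-based pipeline described above (background subtraction, thresholding, then connected component labeling) can be sketched as follows. This is a minimal illustration on grayscale images stored as nested lists, not the method of any surveyed paper; the function names, the fixed threshold and the minimum-area filter (used here as a crude stand-in for morphological noise removal) are assumptions.

```python
from collections import deque

def foreground_mask(frame, background, threshold):
    """Background subtraction: 1 where |frame - background| exceeds threshold, else 0."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def label_blobs(mask, min_area=1):
    """4-connected component labeling by breadth-first search.

    Returns one list of (row, col) pixels per blob. Blobs smaller than min_area
    are dropped, a crude stand-in for morphological noise removal (assumption)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # visit the 4-connected neighbours that are foreground and unvisited
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                blobs.append(pixels)
    return blobs
```

Each returned blob can then be passed to the shape, color or size tests discussed in this section. In a real system the fixed threshold would typically be replaced by an adaptive background model.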
The blob shape feature is used by the majority of researchers, as shown in Table I. Zhang et al. [7] considered a cylindrical blob shape to detect the individual body in images. The author in [8] used blob shape information to reconstruct a hemi-ellipsoid head model of the person with the help of image stitching, as seen in Fig. 5(a). Ozturk et al. [21] used an elliptical blob shape to detect the person in input images (Fig. 5(b)).
Fig. 5. Images of some shape based features: [8], [21] used elliptical shapes, [16] two spherical bounds to detect the person.
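Elliptical blob shapes such as those used in [21] can be characterized with image moments. The sketch below assumes a blob given as a list of (row, col) pixels (for example, the output of a connected component labeling step) and fits the ellipse that has the same second-order moments as the blob. The function name and the moment-based fitting are illustrative assumptions, not the procedure of [8] or [21].

```python
import math

def fit_ellipse(pixels):
    """Moment-based ellipse fit of a blob given as (row, col) pixels.

    Returns (cx, cy, a, b, theta): centroid, semi-major and semi-minor axes,
    and orientation in radians of the ellipse with the same second moments."""
    n = len(pixels)
    cx = sum(c for _, c in pixels) / n
    cy = sum(r for r, _ in pixels) / n
    mu20 = sum((c - cx) ** 2 for _, c in pixels) / n
    mu02 = sum((r - cy) ** 2 for r, _ in pixels) / n
    mu11 = sum((c - cx) * (r - cy) for r, c in pixels) / n
    # eigenvalues of the 2 x 2 covariance matrix [[mu20, mu11], [mu11, mu02]]
    half_trace = (mu20 + mu02) / 2
    spread = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # for a uniformly filled ellipse, the variance along an axis is (semi-axis)^2 / 4
    a, b = 2 * math.sqrt(lam1), 2 * math.sqrt(max(lam2, 0.0))
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, a, b, theta
```

The ratio a / b gives a simple elongation measure: values close to 1 indicate a roughly round blob, such as a head seen from above, while larger values indicate an elongated blob. Any threshold on this ratio would have to be tuned for the dataset at hand.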
Similarly, [22] used the Hough circle transform to detect the circular blob in the image (Fig. 5(c)). The blob is further divided into two concentric circular bounds with the same central point: the inner one was used to detect the person's head, while the outer one was used to detect the head-shoulder region, as shown in Fig. 5(c). Nakatani et al. [16] used the shape of the hair whorl to detect the person in overhead images, as shown in Fig. 5(d).
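The Hough circle step reported for [22] can be illustrated with a fixed-radius accumulator: each edge point votes for every centre lying at distance r from it, and the most-voted centre is taken as the detected circle. A minimal sketch, assuming the edge points and the expected radius in pixels are already known; the function name and the n_angles parameter are hypothetical. Running it once with an inner (head) radius and once with an outer (head-shoulder) radius would roughly correspond to the two concentric bounds described above.

```python
import math
from collections import Counter

def hough_circle_fixed_radius(edge_points, radius, n_angles=180):
    """Accumulate votes for circle centres of a known radius.

    edge_points: iterable of (x, y) pixel coordinates of edge pixels.
    Returns ((cx, cy), votes) for the centre with the most votes."""
    acc = Counter()
    for x, y in edge_points:
        candidates = set()  # each edge point votes at most once per centre
        for k in range(n_angles):
            t = 2 * math.pi * k / n_angles
            candidates.add((round(x - radius * math.cos(t)),
                            round(y - radius * math.sin(t))))
        acc.update(candidates)
    return acc.most_common(1)[0]
```

The accumulator here is a dictionary keyed by integer centre coordinates; a dense 2-D array would be faster for full images but would need the image size as an extra parameter.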
Blob color information is also considered by researchers; e.g., Cohan et al. [19] used color information to detect the human in top view images. Similarly, [17], [23], [16], [21] and [24] also used color information to detect the person in overhead images (Fig. 6). As seen in Fig. 7, some researchers [25], [7], [16], [21], [9] & [26] took advantage of the edge information of the blob to detect the person: Ozturk et al. [21] and Garcia et al. [9] used the Sobel edge detector, while Mukherjee et al. [26] used the Canny edge detector.
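Sobel edge detection, as used by [21] and [9], applies two 3 × 3 kernels that approximate the horizontal and vertical derivatives and combines them into a gradient magnitude. A minimal sketch on a grayscale image stored as nested lists; the function name and the choice to leave border pixels at zero are assumptions. The Canny detector used by [26] adds Gaussian smoothing, non-maximum suppression and hysteresis thresholding on top of such gradients and is not shown.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_magnitude(img):
    """Gradient magnitude from the 3 x 3 Sobel kernels; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # correlate the 3 x 3 neighbourhood with each kernel
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```

Thresholding the returned magnitude gives a binary edge map of the blob outline.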
Texture information is also a significant factor, and some of the work discussed in this paper used texture features to detect the person; for example, Cohan et al. [19], Chia et al. and Nakatani et al. [16] used hair texture information to detect the person's head in overhead images, as shown in Fig. 8. Table I shows that the region of interest (ROI) used to detect the person varies: some researchers used the person's head, while others considered the head-shoulder region as ROI, and a few considered the whole overhead body. Examples of the ROIs used by some of the researchers discussed in this paper are depicted in Fig. 9. In the first row, [16] and [9] considered the head as ROI; in the second row, [27], [25], [23] & [18] considered the head-shoulders as ROI; and in the third row, [1], [27], [21] & [28] considered the whole overhead body as ROI.
Fig. 9. The first row shows the head as ROI [16], [9]; the second row shows the head-shoulder as ROI [27], [25], [23] & [18]; and the third row shows the whole overhead body as ROI [1], [27], [21] & [28].
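This section does not specify which texture descriptor was used for hair, so the sketch below uses local binary patterns (LBP), a standard texture descriptor chosen purely for illustration: each pixel is encoded by comparing it with its eight neighbours, and the normalized histogram of codes over a region (e.g., a candidate head blob) serves as a texture signature. The function names are hypothetical.

```python
def lbp_code(img, y, x):
    """8-neighbour LBP code of interior pixel (y, x): bit k is set when neighbour k >= centre."""
    centre = img[y][x]
    # neighbours visited clockwise, starting at the top-left pixel
    offsets = ((-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1))
    code = 0
    for k, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << k
    return code

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            hist[lbp_code(img, y, x)] += 1
    total = sum(hist)
    return [v / total for v in hist]
```

Two regions can then be compared through a histogram distance (for example, chi-square), so that a candidate head region is matched against a learned hair-texture histogram.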
Table I demonstrates that most researchers, except [12], [16], [22], [8] & [19], used overhead background subtraction methods for people counting. The authors in [25], [29], [28] & [18] used virtual lines for counting people in top view images, as shown in Fig. 10. In surveillance systems, overhead person detection methods combined with tracking have also been developed; for tracking, overhead images and video sequences have been considered, and different tracking algorithms are adopted, as reflected in Table I.
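Virtual-line counting, as used in [25], [29], [28] and [18], can be reduced to checking when a tracked centroid crosses a line between consecutive frames. A minimal sketch, assuming a horizontal counting line and per-track centroid y-coordinates supplied by some tracker; the direction convention ("in" for crossings towards larger y) and the function name are assumptions.

```python
def count_line_crossings(tracks, line_y):
    """Count crossings of a horizontal virtual line by tracked blob centroids.

    tracks maps a track id to its centroid y-coordinates over consecutive frames.
    Moving from above the line (y < line_y) to on or below it counts as 'in';
    the reverse movement counts as 'out'."""
    counts = {"in": 0, "out": 0}
    for ys in tracks.values():
        for prev, curr in zip(ys, ys[1:]):
            if prev < line_y <= curr:
                counts["in"] += 1
            elif curr < line_y <= prev:
                counts["out"] += 1
    return counts
```

A centroid that jitters around the line would be counted several times by this sketch; a small dead band around the line is one possible remedy.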
The majority of the work considered in this paper used depth information to detect the person, as depicted by Table I and Fig. 12, while a few also considered RGB images. The authors in [17] and [21] used the HSV model, while only [12] used the YCbCr model to detect the person in overhead images, as shown in Fig. 12. For recording, researchers used a variety of recording devices and sensors.
Fig. 12. The first row shows examples of depth images, (d) [30]; the third row contains images in RGB color space; and the last row includes the YCbCr [12] and HSV [21] color models, respectively.
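For reference, the conversions from RGB to the YCbCr and HSV color models mentioned above can be written as below. The YCbCr coefficients are those of the full-range ITU-R BT.601 (JPEG/JFIF) variant, which is an assumption, since the variant used in [12] is not stated; HSV uses Python's standard colorsys module. The function names are illustrative. Both spaces separate a brightness component (Y or V) from chromatic components, which is why they are attractive when illumination varies.

```python
import colorsys

def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG/JFIF) conversion of 8-bit RGB; results are not clamped."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def rgb_to_hsv_degrees(r, g, b):
    """HSV via colorsys for 8-bit RGB; hue in degrees, saturation and value in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v
```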

B. Feature based Techniques
In recent decades, with the advancement of computer vision and machine learning, various feature-based techniques have gained importance due to their robustness and detection accuracy. Some of the feature-based techniques considered for overhead view person detection are highlighted in Table II. The general framework of feature-based person detection is shown in Fig. 13.
These techniques operate on features extracted from overhead view videos and images, which are then used to classify person and non-person images. The extracted features include shape (in the form of contours, edges, corners or other descriptors), color and texture (of hair or clothes), direction, orientation or motion information, and sometimes combinations of these. Some works used feature-based algorithms, including SIFT and HOG, to detect persons in overhead view images, as reflected in Table II. The images are often divided into training and testing samples, which are fed into machine learning classifiers, e.g., SVM, AdaBoost and KNN. One of the most widely used feature-based algorithms for person detection is the Histogram of Oriented Gradients (HOG) [36], which counts occurrences of gradient orientations in localized portions of an image. For overhead person detection, Pang et al. [37] proposed an efficient HOG-based feature detection method for human detection, as shown in Fig. 14. Pang et al. [38] proposed another human feature extraction technique using both top view images, which mainly focused on the co-occurrence histograms of oriented gradients (CoHOG) variant of HOG-based human detection. Rauter et al. [11] proposed a method based on local ternary pattern features for human detection and tracking, in which the main features used for training the SVM were the human head and shoulders.
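The core of HOG [36] is a per-cell histogram of gradient orientations weighted by gradient magnitude, followed by normalization over blocks of cells. The sketch below computes one cell histogram with 9 unsigned orientation bins over 0 to 180 degrees and L2-normalizes a vector. It assumes 8 × 8 pixel cells and assigns each vote to the bin that contains its angle, omitting the vote interpolation of standard HOG; the names and defaults are illustrative.

```python
import math

def cell_histogram(img, y0, x0, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Gradients use centred [-1, 0, 1] differences; image border pixels are skipped.
    Each vote goes to the bin containing its angle (no interpolation)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            if 0 < y < h - 1 and 0 < x < w - 1:
                gx = img[y][x + 1] - img[y][x - 1]
                gy = img[y + 1][x] - img[y - 1][x]
                mag = math.hypot(gx, gy)
                angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
                hist[int(angle // bin_width) % bins] += mag
    return hist

def l2_normalise(vec, eps=1e-6):
    """L2 normalization, as applied to a block of concatenated cell histograms."""
    norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
    return [v / norm for v in vec]
```

A full descriptor for a detection window would concatenate the normalized histograms of overlapping blocks of cells.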
Ahmed et al. [15] used the Rotated HOG (RHOG) algorithm on overhead images in an industrial environment, as shown in Fig. 14. Another robust algorithm was proposed by Ahmed et al. [39], in which a wide-angle lens camera is used to capture overhead view images of the person. The proposed algorithm used variable-size bounding boxes with different orientations. Features were extracted from the detected windows (bounding box images) using RHOG and used to train a linear Support Vector Machine (SVM). The authors in [24] also used HOG features after background subtraction to detect and count people in complex environments.
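In [39], a linear SVM is trained on RHOG features, but the solver is not described here. As one simple possibility, the sketch below uses the Pegasos stochastic sub-gradient method for a linear SVM without a bias term (a constant feature can be appended to emulate one). The hyper-parameters, the function names and the toy data in the usage are assumptions.

```python
import random

def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos stochastic sub-gradient solver for a linear SVM without bias.

    samples: list of feature vectors (e.g. HOG descriptors);
    labels: +1 for person windows, -1 for non-person windows."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(samples)), len(samples)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1:  # hinge loss active: step towards y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def decision(w, x):
    """Signed score; positive means the window is classified as a person."""
    return sum(wj * xj for wj, xj in zip(w, x))
```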
Another popular feature-based algorithm is SIFT [40]. Ozturk et al. [21] used the optical flow of SIFT features to observe the orientation change of the person's body and head. Table II clearly shows that SVM is the most popular classifier. It also shows that, to detect a person in an image, researchers used a kernel or sample called a detection window (as shown in Fig. 14), whose size varies with the height and width of the person. For example, [37] used an 80 × 120 detection window to detect the person's body in overhead images. The author in [15] used a variable-size detection window, because the apparent size of the person changes as the person's distance from the camera increases. The author in [11] used a 64 × 128 detection window along with CoHOG to detect persons in overhead images. The authors in [41] and [3] … Table II shows that [11] used PCA to reduce the feature vector after feature extraction, while [41] used statistical measurements, such as mean and standard deviation, to reduce the feature vector size. Non-maximum suppression (NMS), which merges overlapping detections of the same person, was also used and improved the performance of the proposed models; the majority of the feature-based work considered in this section used NMS in various ways (as shown in Table II). To increase the accuracy and efficiency of feature-based person detection, Ertler et al. [3] suggested two extensions to the state-of-the-art Faster R-CNN detection model, fusing RGB and depth representations at different layers of the proposed model. Their experiments showed that the middle layer was the most suitable for fusion.
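Greedy non-maximum suppression, used in many of the works in Table II, keeps the highest-scoring detection window and discards windows that overlap it by more than an intersection-over-union (IoU) threshold, then repeats. A minimal sketch; the box format (x1, y1, x2, y2), the 0.5 default threshold and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns indices of kept boxes in decreasing score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # drop remaining windows that overlap the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two 80 × 120 windows shifted by 4 pixels have an IoU of about 0.85, so only the higher-scoring one survives at a threshold of 0.5.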

III. DISCUSSION
The above discussion shows that most of the work in the literature is based on background subtraction. Most researchers used simple background subtraction and segmentation methods to detect the ROI in overhead images (person head, head-shoulder or body). These techniques are effective, but only on the datasets their authors developed: as seen in Table I, most researchers developed their own indoor datasets for overhead based person detection. Some of these datasets were recorded in constrained environments against simple backgrounds, as seen in Fig. 10, Fig. 11 and Fig. 12. Some methods detect the person in overhead images when the person is exactly below the camera, as shown in Fig. 7, Fig. 8 and Fig. 9. Most considered basic blob features, i.e., shape, texture, color and size, to detect the person. The accuracy of the developed blob-based techniques cannot be compared, as every method has been tested on its own dataset, according to its environment, illumination conditions and application needs.
Background subtraction based methods perform accurately, but they mostly suffer from issues including gradual (day-night) and sudden illumination changes, background geometry (other objects entering the scene), camera-related issues (camera oscillation, noise, motion blur), shadows, clothing colors similar to the background and sometimes self-occlusion of the person's own body. Some feature-based overhead person detection methods have also been discussed in this paper. Feature-based methods increase the accuracy and robustness of overhead based person detection. However, because of self-generated datasets, the reported accuracy of feature-based methods is also not very meaningful, as these datasets were also recorded against simple backgrounds and in low illumination conditions. Some researchers used a very complex industrial environment, as shown in Fig. 14. Given the recent trend towards feature-based methods and the advancement of machine learning algorithms, this work can be extended and made more robust using deep learning. Feature-based methods are robust, but they sometimes increase the computational cost, and the feature vector is sometimes too large, which adds complexity.
The methods considered in this paper used different recording devices for capturing top view images: some used a single-sensor device, e.g., Kinect, while others used multiple sensors. The majority of the researchers considered in this paper used a narrow field of view, as shown in Fig. 10, Fig. 11 and Fig. 12.

IV. CONCLUSION
In this paper, different overhead person detection methods found in the literature have been reviewed, and different issues faced by top view person detection methods have been discussed. The literature considered for this paper has been divided into two categories: blob-based and feature-based. Our analysis shows that most of the work considered for this paper is based on background subtraction, and very few works discuss feature-based methods for overhead person detection. Different challenges and issues related to the datasets are also pointed out, including noise, complex and cluttered backgrounds and self-occlusion. From the above discussion, we conclude that overhead person detection is widely open to new research and development. The accuracy of the discussed methods cannot be fairly compared because of self-recorded datasets. This paper also suggests directions for future research: the developed techniques should be evaluated on the same benchmark dataset, since most datasets have been recorded in constrained environments and illumination conditions; different feature-based methods might be considered for future work; and advanced machine learning algorithms might be used for feature classification to make person detection more efficient and robust.

Fig. 1. Overview of the paper and general framework of person detection.

Fig. 6. Color information used by some of the authors: (a) [23], (b) [16], (c) [21].

Fig. 13. General framework of feature-based person detection.