A Survey of Pedestrian Detection in Video

Abstract: Pedestrian detection is an important topic in computer vision, with key applications in fields such as intelligent vehicles, surveillance and advanced robotics. In recent years, research on pedestrian detection has become commonplace. This paper reviews papers related to pedestrian detection in order to provide an overview of recent research. The main contribution of this paper is a general overview of the pedestrian detection process, viewed from several perspectives. We divide the discussion into three stages: input, process and output. This paper does not select a best or optimal method or technique, because the best technique depends on the needs, concerns and existing environment. However, this paper is useful for future researchers who want to know the current state of research related to pedestrian detection.


I. INTRODUCTION
Pedestrians are among the most important objects in computer vision. A machine must be able to detect and recognize pedestrians properly so that it can interact with them. Over the last four years, pedestrian detection has been a widely studied topic, and the volume of research has increased every year. Counting the studies published by the IEEE from 2010 to 2013, there are more than 822 journal and proceedings papers. This volume of work is understandable because the results are widely used in various applications. Examples include video surveillance, traffic safety, optimization of navigation systems, robotics and applications for special needs.
The objective of this paper is to review research papers related to pedestrian detection in order to provide an overview of recent developments in the field. The contribution of this paper is a general overview of the pedestrian detection process, viewed from several perspectives. However, this paper does not select a best or optimal method or technique, because the best technique depends heavily on the needs, concerns and existing environment. The papers included in this review were selected from those published within a period of four years, from 2010 to 2013. Figure 1 shows the increase in the number of papers related to pedestrian detection published by the IEEE.
We divide this paper into three stages, input, process and output, to facilitate discussion and understanding of the pedestrian detection process. Figure 2 illustrates the process of pedestrian detection. In the input stage, we discuss the form of the data and the input devices used in the studies. The form of the input data and the devices greatly affect which detection method is appropriate. For example, the studies in [1]-[4] use smartphones as input devices. Given the limited computing capabilities of mobile devices, a fast pedestrian detector that does not require large memory resources needs to be selected. Once the data are received from the input device, they are processed using specific techniques and algorithms. In general, preprocessing is done first to ensure the quality of the incoming data and a uniform format. Besides that, determining the region of interest (ROI) and object segmentation are two processes that play an important role in detection. A number of techniques and algorithms have been widely researched to optimize these processes. Object classification techniques also play an important role in pedestrian detection. The classification algorithms currently in use range from the simple to the complex. Examples of widely used object classification algorithms are the Support Vector Machine (SVM) [5]-[7] and neural networks [2], [8].
Meanwhile, the end of the process yields the conclusion or result of the detection process in the form of pedestrian annotations. Detection results can be used for decision-making and for responding to the situation, according to the research objectives. In addition to the three main stages, the paper also discusses various datasets that are frequently used in the research.
The organization of this paper is as follows. First, in Section II we discuss each stage in the process of pedestrian detection, including input devices, the detection process, datasets and also methods to detect pedestrians. In Section III, we discuss some open research issues for pedestrian detection. Finally, in Section IV, we provide concluding remarks.
www.ijacsa.thesai.org

II. DISCUSSIONS
This section discusses each stage in the process of pedestrian detection. Each subsection discusses what has been done in various studies. In addition, it analyzes what has not yet been done or needs to be improved, as input for future research.

A. Input Devices
Considering the type of input device used in the pedestrian detection process, several devices have been tried, such as laser scanner sensors, thermal sensors, video cameras, PTZ cameras and infrared cameras. Several studies use a single laser scanner sensor [9], [10], and others use multiple laser sensors [11], [12]. Meanwhile, the study in [8] used far-infrared sensors that can detect objects at low resolution (long distance) and where color and texture are unclear. For environments with poor lighting, such as at night, stereo cameras with night vision features are often used [13].
Another input device is the pan-tilt-zoom (PTZ) camera, which is widely used for object detection where direction, position and size change dynamically [14], [15]. The video camera is the type of camera most widely used in pedestrian detection research, as in [16]-[18]. Video cameras are cheaper and easier to find than other types of camera. Meanwhile, several other studies use the built-in camera of a smartphone [1]-[3]. Some recent studies have combined several types of input devices to improve the accuracy of detection results, such as the study by Weimer et al. [19], which combines laser scanner sensors and infrared cameras. The study by Gang and Sun [17] combined IP cameras and infrared cameras. In addition to using a camera, Oliveira and Nunes [20] use a sensor technology that can estimate the distance of an object, called LIDAR (Light Detection And Ranging).
In relation to input devices, few researchers have tried to optimize the use of large numbers of systematically organized cameras. Some researchers have built camera networks, as in [15], [21], [22]. However, little research considers the limits of data transfer capacity in the network, given that video data require large bandwidth when transferred over a network. Research opportunities in this domain remain open.

B. Datasets
In some pedestrian detection methods, training and testing data are needed to test the performance of a method or algorithm. Nowadays, many groups provide training and testing data (often called a dataset) that can be downloaded for free. Dollar et al. [23] summarize several freely available datasets and also publish a more comprehensive one, the Caltech Pedestrian dataset. Ahad [24], [25] also summarizes various datasets associated with action recognition in video. Ahad divides the datasets into three categories: a person as a single object, movement of body parts, and social interaction between objects. Table 1 lists datasets that we summarized based on Dollar et al. [23], Ahad et al. [24], [25] and some other papers. Some of the papers included in this review use the datasets in Table 1. In terms of usage, INRIA is the most widely used dataset. INRIA is a fairly complete and varied dataset, published in 2005. Because the Caltech and CVC Pedestrian datasets are more complete than the INRIA dataset, both are likely to be more widely used in future research. Table 1 also lists some papers that use each dataset.

C. Detection Process
Once the video data have been captured from the camera, preprocessing is performed. It mainly aims to normalize and calibrate the input so that the next process can take place properly. We divide the pedestrian detection process into two groups: offline detection and real-time detection. Offline detection uses input data in the form of video or a set of images obtained from a separate input device. The input data are processed manually, for example by standardizing the format, size and so on. Meanwhile, in real-time detection, the video data are captured directly and in real time through input devices such as cameras, CCTV or other sensors. The challenge in real-time pedestrian detection is that everything must be done automatically by the system, so a detection method that emphasizes speed is required. Several studies address real-time pedestrian detection, as in [7], [12], [55], [56].
After the preprocessing stage, the next stage is object segmentation, or segmenting the region of interest (ROI). Segmenting objects from the background or from other objects is a significant step in pedestrian detection: better segmentation results in better accuracy. The simplest and fastest segmentation technique is background subtraction, as in [57]-[61]. However, background subtraction has a weakness when applied to dynamic environments, where the background can change suddenly and unpredictably. This weakness can be overcome by adaptive background subtraction [62]-[64], in which the background is determined adaptively and adjusts to environmental conditions. This technique makes detection slower than static background subtraction because the computation is performed continuously for every frame in the video. In fact, not all object segmentation methods require the background to be separated first. Several studies segment and classify objects by extracting certain features from the image. Examples of such features are HOG [26] and optical flow [65], [66].
Table 2 lists various pedestrian detection methods. Among the papers that discuss pedestrian detection, Histograms of Oriented Gradients (HOG) is the most widely used feature. HOG-based techniques have proved quite accurate for pedestrian detection in both images and video. The HOG method was originally proposed by Dalal and Triggs [26]. Since then, many researchers have modified the HOG method to improve accuracy and speed, and Table 2 includes several of these HOG-based methods. Qu and Liu [16] proposed a modification called Non-background HOG, which improves noise reduction in the image background. Other studies have proposed Gaussian Particle Swarm Optimization (Gaussian-PSO), a HOG-based detection technique that is faster and more accurate [29].
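As an illustration of the adaptive background subtraction idea discussed in this subsection, the following minimal Python sketch keeps a running-average background model and flags pixels that deviate from it. It is not the method of any particular cited paper. The function names, the blending rate alpha and the threshold are assumptions chosen for illustration, and frames are assumed to be grayscale images stored as nested lists.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average)."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30.0):
    """Mark pixels whose intensity differs from the background model."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def segment(frames, alpha=0.05, threshold=30.0):
    """Return one foreground mask per frame after the first.

    The first frame initialises the background (an assumption of this
    sketch); the model is then adapted after every frame.
    """
    background = [[float(p) for p in row] for row in frames[0]]
    masks = []
    for frame in frames[1:]:
        masks.append(foreground_mask(background, frame, threshold))
        background = update_background(background, frame, alpha)
    return masks
```

Because the model is updated for every frame, the cost grows with the length of the video, which is consistent with the observation that adaptive subtraction is slower than subtraction against a static background.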
One recent study proposed the CHOG-DOD method [67]. This method departs from previous methods, in which HOG features are computed over image blocks. In the cell-based HOG (CHOG) algorithm, the features in one cell are not shared with overlapping blocks. To increase detection speed, feature extraction is distributed over multiple frames. In other words, feature extraction and classification are distributed over the current frame and several previous frames. The method was tested on the INRIA dataset and uses an SVM classifier. The method has a reported speed of up to 21.24 time units per frame, and it requires only a 252-dimensional feature vector. This is a much smaller dimension than the BHOG method [68], which requires a 3780-dimensional feature vector.
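The 3780-dimensional figure for block-based HOG can be checked with a short calculation. The sketch below assumes the commonly used Dalal and Triggs configuration (a 64x128 detection window, 8x8-pixel cells, blocks of 2x2 cells, a block stride of one cell and 9 orientation bins); these parameters are not stated in this survey, and the function name is hypothetical. The cell layout behind the 252-dimensional CHOG vector is not given here, so it is not reproduced.

```python
def hog_descriptor_length(win_w=64, win_h=128, cell=8,
                          block_cells=2, stride_cells=1, bins=9):
    """Length of a block-based HOG descriptor for one detection window."""
    cells_x = win_w // cell                      # cells per row
    cells_y = win_h // cell                      # cells per column
    # Number of overlapping block positions in each direction.
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    # Each block concatenates the histograms of all its cells.
    values_per_block = block_cells * block_cells * bins
    return blocks_x * blocks_y * values_per_block


print(hog_descriptor_length())  # 7 * 15 blocks * 36 values = 3780
```

Under these assumptions each cell histogram is repeated in up to four overlapping blocks, which is the redundancy that a cell-based descriptor avoids.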
Considering the features used, the pedestrian detection process can use several features, either primary or derived. Some of these features are the trajectory of the object [69], [70], edges [71], shape [72]-[74] and HOG. The last two are the most widely used in research. Many studies modify HOG features to improve performance and speed. Li et al. [42] combine the HOG feature with the ABM (Active Base Model) feature to improve detection accuracy in complex traffic. The method was tested on the INRIA and TUD datasets and achieved a detection accuracy of 90.2%. In another study, HOG features are applied to pedestrian detection through cascaded full-body and part-based detectors [46]. This detection framework is capable of efficiently classifying both unoccluded and partially occluded pedestrians. Dollar et al. [35] applied the HOG feature to objects at different sizes, which increases the speed and accuracy of object detection. Pedestrians of any size can be detected very well using this method.
Some pedestrian detection methods use the shape features of the object. One of them is the Shape Context method, which performs matching and object recognition based on shape [75]. The method was further developed in [76] for infrared images, where it is known as ISC (Improved Shape Context). The results showed that the method is suitable for infrared images. Compared with the method using the HOG feature, the ISC method has an accuracy rate that is better by 4.95%.
Classification is a very important part of the pedestrian detection process. A classification algorithm assigns the extracted features to classes. Among all the papers included in this review, SVM (Support Vector Machine) is the most widely used classification method: of the 26 papers included in Table 2, 17 use SVM or its derivatives. SVM is a technique for data classification and prediction. It is rooted in statistical learning theory and gives quite good results compared with other methods. The main principle of this technique is to find the optimal separating function (classifier) that separates the data of different classes. In neural network techniques, all training data are learned during training, whereas in an SVM only a selected subset of the data, called the support vectors, determines the resulting classifier. This is an advantage of the SVM, because not all training data need to be involved, so the process is faster.
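The role of a subset of the training data can be illustrated with a small linear SVM trained by subgradient descent on the hinge loss. This is a sketch under stated assumptions, not the solver used in any surveyed paper: the function names, learning rate, regularization constant and toy data are hypothetical. In the training loop, only samples whose margin falls below 1 change the weights, which is the sense in which a subset of the data shapes the classifier.

```python
def score(w, b, x):
    """Signed distance-like score of sample x under the linear model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, epochs=200, lr=0.01, reg=0.01):
    """Hinge-loss subgradient descent; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * score(w, b, x)
            # Regularisation step: shrink the weights slightly.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1.0:
                # Only margin-violating samples move the boundary.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (e.g. pedestrian) or -1 (non-pedestrian)."""
    return 1 if score(w, b, x) >= 0.0 else -1


# Toy, linearly separable feature vectors (hypothetical data).
samples = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
```

In practice the feature vectors would be descriptors such as HOG, and a tested library implementation would normally be preferred over a hand-written loop.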
The study in [77] uses a two-stage process with SVM classification to improve the accuracy and speed of pedestrian detection. In the first stage, an SVM is used to eliminate errors in determining the ROI, based on the training data. In the second stage, the ROIs obtained from the first stage are treated as pedestrian candidates, and a stricter SVM classifies each ROI as pedestrian or non-pedestrian. The study reported an overall FPPI (false positives per image) value of 78%. In terms of speed, the two-stage classification method increases the speed up to ten times.
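The general shape of a two-stage classification and of the FPPI measure can be sketched as follows. This is an illustration only and does not reproduce the classifiers of [77]: the scoring functions are passed in as hypothetical callables, and the thresholds are assumptions.

```python
def two_stage_detect(rois, coarse_score, fine_score,
                     coarse_thr=0.0, fine_thr=0.0):
    """Filter ROIs with a cheap first stage, then a stricter second stage."""
    # Stage 1: discard ROIs that are clearly not pedestrians.
    candidates = [r for r in rois if coarse_score(r) >= coarse_thr]
    # Stage 2: the costlier classifier runs only on the survivors.
    return [r for r in candidates if fine_score(r) >= fine_thr]


def false_positives_per_image(false_positive_counts):
    """FPPI: total false positives divided by the number of images."""
    return sum(false_positive_counts) / len(false_positive_counts)


# Toy usage with integer "ROIs" and hypothetical scoring functions.
detections = two_stage_detect(
    [1, 2, 3, 4, 5, 6],
    coarse_score=lambda r: r - 3,   # keeps 3, 4, 5, 6
    fine_score=lambda r: r - 5,     # keeps 5, 6
)
fppi = false_positives_per_image([0, 2, 1, 1])
```

The speed benefit of such a design comes from the second-stage classifier being evaluated on fewer regions than a single-stage detector would have to examine.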
In addition to methods based on the shape of the object, methods based on movement in the video are also quite effective and widely studied. Such methods are accurate and promising because pedestrian movement has distinctive characteristics. Sequential changes in movement can be detected and predicted, although anomalous movements may still occur. The study in [70] uses statistics on the movement of objects and HOG feature selection techniques to detect pedestrians. A Bayes classification method is also used to increase speed. The results are good enough for object detection in environments where pedestrians are quite close to the camera.
Pedestrian detection research has also been implemented on smartphones, as in [2], [3], [87]. Limited computing capability can be addressed by using appropriate algorithms. Shin et al. [2] build an indoor navigation system using the Pedestrian Dead Reckoning (PDR) method. They use sensors available in a smartphone, such as a motion sensor, accelerometer and gyroscope, and apply a neural network algorithm to perform classification. The same method was applied in [3], but without GPS and Wi-Fi functions because both require high memory resources. An optimized algorithm is also proposed in [87] to improve the detection process on a smartphone.
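The position update at the core of PDR can be sketched as follows: each detected step advances the previous position by the step length along the current heading. The sketch assumes that step detection, step-length estimation and heading estimation (for example from the accelerometer and gyroscope) have already been done. The function name and the heading convention (radians, measured counter-clockwise from the x-axis) are illustrative assumptions, not details taken from [2].

```python
import math


def pdr_track(start, steps):
    """Accumulate a 2-D path from (step_length, heading_rad) pairs."""
    x, y = start
    path = [(x, y)]
    for step_length, heading_rad in steps:
        # Dead reckoning: new position = old position + step vector.
        x += step_length * math.cos(heading_rad)
        y += step_length * math.sin(heading_rad)
        path.append((x, y))
    return path


# Two steps of 1 m: one along the x-axis, one along the y-axis.
path = pdr_track((0.0, 0.0), [(1.0, 0.0), (1.0, math.pi / 2)])
```

Because every position depends on the previous one, errors in step length or heading accumulate over time, which is a general property of dead reckoning.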

D. Pedestrian Detection Applications
As stated at the beginning of this paper, the results of pedestrian detection are widely used in various fields, including robotics, surveillance systems, traffic analysis, advanced driver assistance systems and many others. One application is counting pedestrians in indoor or outdoor environments such as shopping centers, airports and streets [55], [88]-[90]. Other areas that use pedestrian detection are industrial environments [91] and applications that provide navigation guidance within an indoor or outdoor area [2], [92], [93].

III. FUTURE RESEARCH
Although quite a lot of papers have discussed pedestrian detection, there is still potential for future research in this field. Many issues remain to be resolved, and effective methods appropriate to the environmental conditions still need to be found. Based on the papers reviewed, the following research directions can still be developed in the future.
• The speed and accuracy of detection methods still need to be improved, especially with cheap input devices such as web cameras or smartphone cameras. Although several detection methods have reached accuracy of up to 90%, these results were obtained on simple datasets, not complex ones.
• Improving detection accuracy by using multiple cameras also needs further study. A single camera is easier to handle than multiple cameras, because multiple cameras require communication and task management between cameras.
• The limited memory and computational capabilities of input devices such as cameras call for new approaches to resource management and computation. One promising technique is to separate the input device (client) from the server, which may make the process more effective.
• Building a detection system that runs in real time requires further study of how to optimize data transfer from the input device (client) to the server over the network. Compression or data selection may be used to speed up data transfer over a network of limited speed.
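One simple form of the data selection mentioned in the last item can be sketched as follows: the client transmits a frame only when it differs sufficiently from the last frame it sent. This is an illustrative assumption about how selection might work, not a technique taken from the reviewed papers. The function names and the change threshold are hypothetical, and frames are assumed to be grayscale nested lists.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equal-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def select_frames(frames, min_change=5.0):
    """Return indices of frames worth sending to the server."""
    selected = []
    last_sent = None
    for index, frame in enumerate(frames):
        # Always send the first frame; afterwards send only on change.
        if last_sent is None or mean_abs_diff(frame, last_sent) >= min_change:
            selected.append(index)
            last_sent = frame
    return selected


frames = [[[0, 0]], [[1, 1]], [[20, 20]], [[21, 21]]]
to_send = select_frames(frames)
```

In this toy sequence only the first and third frames would be transmitted, so bandwidth use falls at the cost of some temporal detail on the server side.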

IV. CONCLUSIONS
In this paper, we surveyed papers related to pedestrian detection in video. The survey provides an overview of current research on various techniques and methods of pedestrian detection. Although this paper does not identify a best method, it briefly explains the results of experiments conducted by previous researchers. It is also useful for future researchers who want to know the current state of research on pedestrian detection. In addition to discussing the process of pedestrian detection, we present some pedestrian datasets frequently used in the various studies. Future research related to pedestrian detection focuses on improving accuracy, using multiple cameras, optimizing resources and processes to improve speed, and detecting pedestrians in real time.