An Abnormal Behavior Detection Method using Optical Flow Model and OpenPose

Detecting and recognizing abnormal pedestrian behavior on escalators has always been a challenging task in intelligent video surveillance systems. To cope with this problem, a method combining the optical flow vector of the passenger with human skeleton extraction is proposed. First, an adaptive dual fractional order optical flow model is used to estimate the optical flow field in scenes with illumination changes, low contrast and uneven illumination. At the same time, the OpenPose deep convolutional neural network is used to extract the body skeleton so that the persons in the image can be located. Then, the optical flow field and the human skeleton are combined to obtain the optical flow vector of the passenger's head. After that, the optical flow vectors of the passenger's head and of the escalator step under the passenger's feet are used for abnormal behavior detection and recognition, with a random forest employed as the behavior classifier. Experimental results show that the proposed method and its improvement strategy can accurately estimate the optical flow field in real time for low-contrast outdoor videos with insufficient illumination, uneven brightness and illumination changes, and that the accuracy of abnormal behavior detection and recognition reaches 97.98% and 92.28%, respectively.
Keywords: Image sequence analysis; abnormal behavior recognition; fractional order variational optical flow model; random forest


I. INTRODUCTION
With the rapid development of machine vision, artificial intelligence and other related technologies, intelligent video monitoring systems have been widely used in supermarkets, campuses, medical facilities, government buildings and other public places, where they play a very important role in improving people's living standards and safeguarding life and property. Abnormal behavior recognition is one of the core tasks of most intelligent video monitoring systems. It is an effective means of intelligently monitoring public places in real time.
Traditional abnormal behavior recognition methods can be divided into three steps: target detection, target tracking and behavior recognition. Target detection finds the target of interest in the image. When the target of interest is an escalator passenger, simple geometric models can be used to match the human body, such as the skeleton model [1][2], the two-dimensional human body model [3][4] and the three-dimensional human body model [5][6]. Target tracking follows the moving target of interest in real time. Behavior recognition uses the motion characteristics of the region of interest to determine whether the behavior is abnormal. In [7], a feature detection method based on dense trajectory descriptors is proposed; in [8], a human behavior recognition method based on the Fisher vector is proposed; in [9], a hierarchical image segmentation method based on time and space is used. However, these methods are easily affected by scene changes, feature selection and feature extraction. The outdoor escalator scene is particularly affected by changes in light: the light intensity differs greatly between time periods and between escalator positions within the same time period, and the target features are easily occluded. At present, there is no high-precision behavior recognition method for the moving escalator scene.
Neural network based methods have been a research hotspot in recent years. Such methods only need a neural network designed according to the input image and its characteristics, which is then trained on samples labeled with abnormal behaviors. Once training is finished, the network reports, for each input image, whether it contains abnormal behavior and of which type. In [10], the image sequence is divided into a temporal stream and a spatial stream, which are processed by different convolutional neural networks, and the results are then combined; in [11], a two-layer convolutional neural network based on the human body region is proposed to distinguish human behavior; in [12][13], different convolutional neural networks are designed for the tools and actions of pedestrians' hands in the scene. However, training the network model usually requires a large number of data samples. In the outdoor escalator scene, collecting samples of abnormal behavior such as falling is particularly difficult, and because the light intensity differs greatly between time periods and between escalator positions within the same time period, the required number of training samples is particularly large, even reaching tens of millions. So far, however, there is no database for abnormal behavior recognition in the moving escalator scene.
To address the problem of abnormal behavior recognition for escalator passengers, we combine an adaptive fractional order variational optical flow model with the OpenPose model to form a new method. In this algorithm, the adaptive fractional order variational optical flow model is used to solve the problem of feature estimation in scenes with illumination changes, low contrast and uneven brightness. The OpenPose model is used to track escalator passengers in real time, and a random forest classifier is used for feature classification and recognition. The advantages of the algorithm are as follows: it extracts the optical flow features of only one key point that is not easily occluded; it uses an image sequence instead of a single image for feature classification; and it needs only a small number of samples to train the classifier.
The innovations of this paper are as follows: 1) a method for identifying the abnormal behavior of escalator passengers in outdoor scenes is proposed; 2) the continuous optical flow features of an image sequence are used to distinguish the abnormal behavior of escalator passengers.
The rest of the paper is organized as follows: Section 2 shows the flowchart of the algorithm; Section 3 introduces the optical flow model used in this paper; Section 4 presents the skeleton extraction method; Section 5 gives the experimental results and analysis; and Section 6 concludes the paper.
Fig. 1 shows the flowchart of the abnormal behavior detection and recognition algorithm proposed in this paper. First, the input video is obtained and image sequences are extracted from it. Each image is then preprocessed with Gaussian filtering, light compensation and other operations, and the optical flow field is estimated with the adaptive fractional variational optical flow model. At the same time, the OpenPose-based deep neural network extracts the human skeleton of the passengers in each frame. By combining the human skeleton and the optical flow field at the same position, the velocity and direction of the passenger's nose joint are calculated. Finally, according to the optical flow vectors of the passenger's nose joint and of the escalator steps, the random forest classifier determines whether the passenger's behavior is abnormal and classifies the abnormal behavior.
Because a pixel moving at a given speed produces a larger optical flow vector the closer it is to the camera, the optical flow vector of the key point must be standardized before use. In this paper, the vertical distance from the neck joint of the human skeleton to the line connecting the left hip and the right hip is taken as the reference standard for standardizing the optical flow vector of the key point.
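The standardization step described above can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not the authors' MATLAB implementation: the function names `torso_length` and `normalize_flow` are hypothetical, joints are assumed to be given as (x, y) pixel coordinates, and the flow vector is assumed to be divided by the neck-to-hip-line distance.

```python
import math


def torso_length(neck, left_hip, right_hip):
    """Perpendicular distance from the neck joint to the line through both hips."""
    (nx, ny), (lx, ly), (rx, ry) = neck, left_hip, right_hip
    hip_dx, hip_dy = rx - lx, ry - ly
    hip_span = math.hypot(hip_dx, hip_dy)
    if hip_span == 0.0:
        # Degenerate case: both hips coincide, so use the point-to-point distance.
        return math.hypot(nx - lx, ny - ly)
    # |cross product| / base length gives the height of the triangle over the hip line.
    return abs(hip_dx * (ny - ly) - hip_dy * (nx - lx)) / hip_span


def normalize_flow(flow, neck, left_hip, right_hip):
    """Divide a flow vector (u, v) by the torso length (assumed normalization)."""
    scale = torso_length(neck, left_hip, right_hip)
    if scale == 0.0:
        raise ValueError("skeleton is degenerate: torso length is zero")
    u, v = flow
    return (u / scale, v / scale)


# A passenger whose torso spans 60 pixels and whose nose moves by (6, -3) pixels.
print(normalize_flow((6.0, -3.0), (50.0, 20.0), (40.0, 80.0), (60.0, 80.0)))
```

With this kind of scaling, a passenger close to the camera and a passenger far from it yield comparable flow magnitudes for the same physical motion, which is the purpose of the standardization.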

III. ADAPTIVE FRACTIONAL OPTICAL FLOW MODEL
The optical flow model is based on DFOVOFM (Dual Fractional Order Variational Optical Flow Model) [14]. To enhance the correlation of similar optical flow areas and improve the accuracy of optical flow estimation, adaptive strategies for the fractional order and the fractional differential mask are added to DFOVOFM. DFOVOFM is the fractional version of the HS (Horn-Schunck) optical flow model [15]; that is, the integer-order derivatives in the data and smoothness terms of the original model are replaced by fractional derivatives. In this model, $E(\mathbf{u})$ represents the energy function of the optical flow model, $\mathbf{u}$ represents the vector of the optical flow field, $u$ and $v$ represent the components of the optical flow field along the $x$ and $y$ axes, respectively, $D_x^{\alpha} I$ is the fractional derivative of the brightness function $I$ along the $x$ axis, $D_y^{\alpha} I$ is the fractional derivative of $I$ along the $y$ axis, and $\lambda$ is the smoothing parameter. The model also uses a region in $\mathbb{R}^3$ that represents the spatial range over which the fractional derivative at the target point $(i, j, t)$ is calculated.
The above model can be rewritten in convolution form using the fractional order differential mask together with the optical flow vector or the brightness function. In the calculation process, the image SNR is used to calculate the fractional order and the size of the fractional differential mask at each pixel. The obtained optical flow field is segmented into superpixels, and the shape of each fractional differential mask is then adjusted so that it is confined to a single superpixel region. The optical flow field is recalculated with the adjusted fractional differential masks. After 5 to 10 repetitions, an optical flow field with distinct contours and a minimum error rate can be obtained. The detailed process can be found in [16].
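To make the idea of a fractional differential mask more concrete, the following Python sketch computes one-dimensional mask coefficients and applies them by convolution. It assumes the Grunwald-Letnikov definition of the fractional derivative, which this section does not name, so it should be read as one common construction rather than the mask used in [14] or [16]. The function names `gl_coefficients` and `fractional_diff_1d` are hypothetical.

```python
def gl_coefficients(alpha, size):
    """First `size` Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k)."""
    weights = [1.0]
    for k in range(1, size):
        # Recurrence: w_k = w_{k-1} * (1 - (alpha + 1) / k)
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def fractional_diff_1d(signal, alpha, size):
    """Backward fractional difference: out[i] = sum_k w_k * signal[i - k]."""
    weights = gl_coefficients(alpha, size)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        # Near the left border only the available samples are used.
        for k in range(min(size, i + 1)):
            acc += weights[k] * signal[i - k]
        out.append(acc)
    return out


# Order 1 reduces to the ordinary backward difference; order 0.5 lies in between.
print(gl_coefficients(1.0, 3))
print(gl_coefficients(0.5, 4))
print(fractional_diff_1d([1.0, 3.0, 6.0], 1.0, 3))
```

In this construction the mask size controls how many neighboring samples contribute, which is the quantity that the adaptive strategy above selects per pixel from the image SNR.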

IV. EXTRACTION OF PASSENGER SKELETON
Accurate, real-time and stable tracking and positioning of escalator passengers in complex scenes is the prerequisite for subsequent abnormal behavior detection and recognition. The human body model is the basis for recognizing the position of the human body in the image. Compared with other human body models [3], human skeleton features directly reflect the contour and position of the human body, and the probability of misjudging other targets as a human body is small, so the human skeleton is selected to represent escalator passengers. Compared with traditional image processing or machine learning methods, methods based on deep learning can extract the human skeleton more accurately and stably. They obtain a skeleton extraction network through iterative training, process the original input image through the network, and finally output the skeleton extraction results. With the continuous development of deep learning, improved skeleton extraction networks have been proposed one after another. Among them, the deep convolutional neural network based on OpenPose [17] can extract the human skeleton accurately and stably in real time. It has gradually become the current mainstream skeleton extraction method and is widely used in engineering.
In this paper, the method of [18] is used to extract the two-dimensional skeleton of escalator passengers. In this method, the trained OpenPose model first detects the human body joints and the bones between joints accurately and stably in the complex escalator environment, and then uses part affinity fields (PAFs) to associate the joints of the human body, thus forming the two-dimensional skeleton of the human body. The human skeleton consists of 14 joint points and 13 bones. The joint points are the nose, neck, left shoulder, left elbow, left wrist, left hip, left knee, left ankle, right shoulder, right elbow, right wrist, right hip, right knee and right ankle. Fig. 2 is a sketch map of human skeleton extraction. Fig. 3 shows the human skeleton model, in which the red dots represent the 14 joint points of the human body and the 13 green segments represent the bones; Fig. 3(a) is the skeleton extraction result for an escalator passenger, and in Fig. 3(b) all other image content is removed so that only the passenger's skeleton is retained.

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 5, 2020

V. EXPERIMENTAL RESULTS AND ANALYSIS
The experiment is divided into several parts. First, the optical flow field is estimated with the adaptive fractional order optical flow model to verify the superiority of the model under abnormal lighting and its practicability in outdoor scenes. Second, the relationship between a passenger's behavior and the corresponding optical flow vector is explained by showing the optical flow fields of different passenger behaviors. Third, the random forest classifier [19][20] is applied to classify passenger behavior from the optical flow features, which verifies the practicability of the algorithm.
All the experiments are run in MATLAB on a platform with Windows 7, a 3.3 GHz Intel CPU and 16 GB of memory.

A. The Result of Adaptive Fractional Optical Flow Model
MPI Sintel (scenes with insufficient light and low contrast), KITTI (scenes with inconsistent light) and an outdoor image sequence (scenes with inconsistent and insufficient light) are selected for algorithm evaluation.
The experimental results are shown in Fig. 4. Different letters represent different algorithms: a) is Hast, b) is MDP_Flow, c) is PH_Flow and d) is DFMVOFM. The image sequences are MPI_Sintel_cave3, MPI_Sintel_shaman1 and KITTI. Fig. 5 shows the optical flow field of the outdoor image sequence. The figures show that the method in this paper obtains clear moving object contours in scenes with abnormal illumination, although some texture details are lost, such as the sharp cone on the girl's hand in cave3. Hast is better than the other algorithms at highlighting moving details, but in large textureless areas the object contour is incomplete, such as the tail of the monster in cave3 and the sleeve contour in shaman1. PH_Flow shows a large deviation when the light is insufficient, and MDP_Flow fails in low-resolution areas. In the task of abnormal behavior recognition, however, it is necessary to obtain the complete contour of the target under abnormal lighting, while the texture details of the target do not matter. Therefore, although Hast, MDP_Flow and PH_Flow have higher average accuracy, the optical flow model selected in this paper is more suitable for identifying the abnormal behavior of escalator passengers.

B. The Optical Flow Features of Different Passenger Behaviors
Fig. 6, Fig. 7, Fig. 8 and Fig. 9 show the color-coded optical flow field and the extracted human skeleton for normal riding, retrograde movement, falling forward and falling backward, respectively. To avoid misclassifying a passenger's momentary slight shaking as abnormal behavior, four frames, that is, three consecutive optical flow features, are used as the basis for abnormal behavior discrimination. The color-coded optical flow field uses different colors to represent the direction of the optical flow vector and the depth of the color to represent its magnitude.
When the passenger stays on the escalator normally, the optical flow field of the passenger is the same as that of the escalator step under his or her feet, as shown in Fig. 6. When the passenger is retrograde, the optical flow field of the passenger is opposite to that of the escalator step under his or her feet: in Fig. 7, the color of the passenger is purple, while the color of the escalator step is green and light yellow. Fig. 8 shows the optical flow characteristics when the passenger falls forward. The color of the passenger is dark yellow, while the color of the escalator step under his or her feet is green or light yellow. This is because the movement direction of the passenger differs little from that of the escalator, but the movement speed is faster. The fall in the experiment is a simulated fall; a real fall is faster. Fig. 9 shows the optical flow characteristics when the passenger falls backward. The color of the passenger is red, while the color of the escalator steps under his or her feet is green and light yellow. The experiment therefore shows that different passenger behaviors correspond to different optical flow characteristics, and that the difference between the optical flow characteristics of the passenger and of the escalator can be used to detect and recognize abnormal passenger behavior.
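The color comparison above contrasts the direction and magnitude of the passenger's optical flow with those of the escalator step. A minimal Python sketch of the same comparison in numeric form follows. The function name `compare_flows` is hypothetical, and the choice of relative angle and speed ratio as descriptors is an assumption made for illustration; the paper itself feeds the raw optical flow vectors to the classifier.

```python
import math


def compare_flows(passenger_flow, step_flow):
    """Return (angle in degrees, speed ratio) of passenger flow relative to step flow.

    Returns None when either vector has zero length, since the angle is undefined.
    """
    pu, pv = passenger_flow
    su, sv = step_flow
    p_speed = math.hypot(pu, pv)
    s_speed = math.hypot(su, sv)
    if p_speed == 0.0 or s_speed == 0.0:
        return None
    cos_angle = (pu * su + pv * sv) / (p_speed * s_speed)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle)), p_speed / s_speed


# Same direction but twice as fast, then exactly opposite direction.
print(compare_flows((2.0, 0.0), (1.0, 0.0)))
print(compare_flows((-1.0, 0.0), (1.0, 0.0)))
```

A small angle with a ratio near 1 corresponds to normal riding, an angle near 180 degrees to retrograde movement, and a small angle with a large ratio to a passenger moving with the escalator but faster than it.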

C. The Classification and Recognition of Abnormal Behavior
1) The dataset: For the escalator abnormal behavior detection task, an escalator passenger behavior dataset is built. The dataset consists of continuous image sequences and test videos. The image sequences are obtained by interval sampling of escalator monitoring video and are used to train the classification model; the test videos are segments with key information cut from the monitoring video and are used to test the detection and recognition of abnormal behavior.
The dataset contains 4000 images and 48 videos. The images are used to train the classification model: every four images represent one behavior, giving 1000 passenger behavior sequences, including 100 normal behavior sequences and 900 abnormal behavior sequences. The normal behavior sequences include walking while riding, standing while riding, slight shaking of the head up, down, left and right, squatting, and so on; the abnormal behaviors include retrograde movement, falling forward and falling backward. The 48 videos are zcdc1-zcdc8, nx1-nx20 and sd1-sd20. Among them, zcdc1-zcdc2, nx1-nx5 and sd1-sd5 are captured under sufficient and uniform illumination and are collectively referred to as PA1; zcdc3-zcdc4, nx6-nx10 and sd6-sd10 are captured under sufficient and uneven illumination and are collectively referred to as PA2; zcdc5-zcdc6, nx11-nx15 and sd11-sd15 are captured under insufficient and uniform illumination and are collectively referred to as PA3; zcdc7-zcdc8, nx16-nx20 and sd16-sd20 are captured under insufficient and uneven illumination and are collectively referred to as PA4. All images and videos are captured in scenes with sparse escalator passengers, that is, there is no mutual occlusion between passengers in the image.
2) The process of classification: First, the training database is established. Each element in the database consists of the category code of a behavior sequence together with the optical flow vectors of the nose joint point and of the step pixel under the feet for the three consecutive optical flow fields of that sequence. Then, the training database is used to train the classification model. In this paper, random forest is selected as the classification model. Finally, the trained classification model is tested on the 48 videos.
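The structure of one element of the training database can be sketched in Python as follows. This is an illustrative layout only: the function name `build_training_row`, the integer category codes and the ordering of the twelve feature values are assumptions, since the paper does not specify how the vectors are arranged.

```python
def build_training_row(label, nose_flows, step_flows):
    """Flatten one behavior sequence into (label, 12 feature values).

    nose_flows and step_flows each hold three (u, v) optical flow vectors,
    one per pair of consecutive frames in the four-frame sequence.
    """
    if len(nose_flows) != 3 or len(step_flows) != 3:
        raise ValueError("a behavior sequence must provide exactly three flow vectors")
    features = []
    for (nose_u, nose_v), (step_u, step_v) in zip(nose_flows, step_flows):
        # Assumed layout per time step: nose u, nose v, step u, step v.
        features.extend([nose_u, nose_v, step_u, step_v])
    return label, features


# Hypothetical category code 1 with made-up flow values.
row = build_training_row(
    1,
    [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
    [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)],
)
print(row)
```

Rows of this form, one per behavior sequence, would then be passed to a random forest implementation for training.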
3) Experimental results and analysis: The performance of the algorithm is evaluated with common indicators, namely precision, recall, F1 score and time. The unit of detection speed is FPS (frames per second). Table I shows the test results in four different scenarios. In the table, "sufficient", "uniform" and "sparse" indicate whether the lighting in the scene is sufficient (a), whether the lighting is uniform (b), and whether the passenger flow is sparse. The table shows that the algorithm in this paper achieves high accuracy on all four kinds of videos, which indicates that it can accurately detect abnormal behavior when the passenger flow is sparse. In addition, the decrease in the F1 score (the harmonic mean of precision and recall) caused by insufficient and uneven illumination is 1.17% and 0.81%, respectively, which shows that the algorithm is robust to changes in environmental factors. The average processing speed of the algorithm in this paper is around 16 FPS, whereas real-time detection requires a speed higher than 20 FPS, so the algorithm needs to adopt interlaced sampling to meet the real-time requirement.
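For reference, the three accuracy indicators can be computed from detection counts as in the Python sketch below. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from Table I.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 (their harmonic mean) from raw counts."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up counts: 90 correct detections, 10 false alarms, 30 missed events.
print(precision_recall_f1(90, 10, 30))
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which makes it a stricter summary than the arithmetic mean of the two.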
In abnormal behavior recognition, some misclassification occurs easily, mainly because the feature spaces of different classes intersect. For example, when a passenger falls backward or is retrograde, the optical flow field at the nose joint of the passenger is red, and the only difference is that the red is deeper when the passenger falls backward; similarly, squatting can be confused with falling forward. This is due to the limitations of the experimental conditions: the training samples of falls are simulated rather than real falls, and a real fall is much faster than a simulated one. The number of training samples is also an important factor affecting recognition accuracy. Table II shows the results of abnormal behavior recognition. The abnormal behaviors are divided into retrograde (NX), backward (FB), forward (FA) and other abnormal behaviors. The table shows that in the sparse crowd scenario the accuracy of the algorithm in this paper is more than 90% for all kinds of behavior, which indicates that the algorithm can accurately identify abnormal behavior when the passenger flow is sparse.

VI. CONCLUSION
In this paper, an algorithm for recognizing the abnormal behavior of escalator passengers based on an adaptive fractional variational optical flow model and the OpenPose model is proposed, which addresses abnormal behavior recognition for escalator passengers against outdoor and moving backgrounds. First, the adaptive fractional variational optical flow model is used to estimate the optical flow field in scenes with illumination changes, low contrast and uneven brightness, so as to obtain the motion speed and direction of each pixel in the video image. At the same time, the human skeleton extraction method based on OpenPose is applied to solve the passenger positioning problem in complex scenes. Then, the optical flow field and the human skeleton are combined to obtain the optical flow features of the passenger's nose joint and of the escalator step under the passenger's feet. Finally, according to these two optical flow features, a random forest classifier is used to detect and identify abnormal behavior. Experiments show that this method can detect the abnormal behavior of escalator passengers in real time in outdoor scenes and has important application value. Future work will focus on recognizing other abnormal behaviors and on constructing a public dataset with different abnormal behaviors in moving escalator scenes.