Abnormal Event Detection using Additive Summarization Model for Intelligent Transportation Systems

Abstract: Video surveillance is used to capture abnormal roadside events caused by improper driving, accidents, and hindrances, which result in transportation lags and life-critical issues. Highlighting the accident keyframes in videos is essential for intelligent video surveillance. Video summarization plays a vital role in extracting the keyframes of an abnormal event from the stacked video surveillance input. The observed video is converted into frames and analyzed to provide an accurate summary for accident analysis and forecasting and to guide users in avoiding such events. The main issues in summarization arise from the inconsistency between spatiotemporal redundancies and the classification used for sequence verification in video surveillance. This article introduces an Additive Event Summarization Method (AESM) for projecting classified events through a gated recurrent unit learning paradigm. In this process, gates are assigned to unclassified and active frames for sequence verification. Based on the sequence, the abnormality is classified and summarized with higher accuracy than state-of-the-art techniques. The proposed method relies on heterogeneous features for classifying events with better structural indices. Its performance is analyzed using the metrics accuracy, false rate, analysis time, SSIM, and F1-score.


I. INTRODUCTION
Road transportation is among the cheapest and easiest modes of transportation, and many people around the world travel by road from one place to another. Abnormal event detection is one of the critical tasks in transportation systems [1]. Closed-circuit television (CCTV) plays a major role in detecting events that occur on the roadside. A transport monitoring system analyzes the details of roadside events and helps to protect people. Accurate abnormal event detection helps to reduce the crime rate and death rate on roadsides [2]. Abnormal events on roadsides are classified based on physical attributes, behavior, postures, gestures, and the vehicle's position. Abnormal event detection is performed in two stages, namely classification and video summarization [3].
The video summarization process analyzes the events that occur on the roadside based on certain keyframes or parameters from the given video clips. Keyframes play a major role in summarization: they help to identify the exact features of the video by comparison with an important set of features [4]. The classification process combines both normal and abnormal events that occur on the roadside and produces a dataset that describes the causes of abnormal events in detail [5]. Video summarization is widely used in monitoring systems to enhance the network by understanding the exact cause of events from the given set of videos [6]. It also helps to control accidents and crime on roadsides. A keyframe is generated to identify the actual cause of an event, and it also helps to anticipate upcoming events based on people's activities [7]. Machine learning algorithms are widely used in summarization to increase detection accuracy and to reduce data processing time [8]. A dynamic hierarchical clustering algorithm is used in summarization by training on the data captured by CCTV and producing trained data for further use. It combines the current clips or data with previously collected data and generates detailed information that helps to prevent upcoming accidents [3,9]. A reinforcement algorithm is also used to identify keyframes based on features such as gestures, signs, and postures of people and to produce sequenced keyframes, which help to reduce the crime rate on roadsides [10]. The main disadvantage of video summarization is the contradiction between spatiotemporal redundancies and sequence prediction.
The proposed Additive Event Summarization Method (AESM) projects classified events through a gated recurrent unit. It is intended to increase the chances of early classification due to limited state-based classification and summarization. Its main advantages are a decrease in the computational complexity caused by recurrent replication-based classification and a reduction in the sensitivity of the output. The experimental analysis of the proposed work is conducted on the UCSD dataset to assess its performance against state-of-the-art techniques. The remaining sections of the paper are organized as follows. Section II presents the analysis of the related work with its merits and limitations. Section III presents the proposed work with a detailed mathematical analysis of the summarization and classification process. Section IV describes the experimental analysis of the proposed work. Section V concludes the paper with its contributions and the scope of the research.
www.ijacsa.thesai.org

II. RELATED WORK
Yang et al. [11] proposed a learning model, HRES, for real-time event summarization. The model captures the information stored in the knowledge base (KB) and, implicitly, the information based on the queries given to the users. The HRES method improves robustness and effectiveness and reduces time consumption.
Wan et al. [12] proposed a long video retrieval algorithm based on superframe segmentation for ITS event detection. A long video stream is analyzed to identify the unwanted frames present in the database and to discard them. The segment of interest (SOI) is generated using the superframe segmentation process. Their method increases effectiveness by reducing the retrieval time.
Thomas et al. [13] proposed video summarization based on a perceptual model for roadside event detection. The method finds optimal solutions by analyzing the vast number of videos captured by roadside surveillance cameras at the time of accidents. Their method increases the accuracy of the detection process.
Ji et al. [14,15] proposed a multi-video summarization method using multimodal weighted archetypal analysis (MWAA). The weights for the weighted archetypal analysis (WAA) are derived from a query-dependent multimodal graph. The graph fuses the available information, such as tags, frames, and video clips, for the prediction process. The method outperformed traditional summarization methods in accuracy. The authors also introduced a query-aware sparse coding framework for video summarization. This framework uses web images to identify the exact information of the events. Unsupervised multi-graph fusion is used to find the keyframes available in the database based on the priority of the queries.
Elharrouss et al. [16] proposed a multiple human action detection method for recognition and summarization. Human activities are analyzed and arranged into sequences to form a dataset. Each sequence is then divided into shots for the detection process. The histogram of oriented gradients (HOG) is computed from the frames generated from the given video clips. Their method increases the efficiency and accuracy of the recognition and summarization process.
Zhang et al. [17] proposed a method that uses the key contents of the frames from the given video. A discriminator is used to find the keyframes for summarization. Their approach increases efficiency and accuracy.
Yang et al. [18] proposed a new framework using a deep neural network to leverage the benefits generated by the systems. It uses LSTM to represent the priority of the queries captured by the network. Their method increases accuracy.
Gao et al. [19] proposed a key framework for summarizing surveillance videos. Videos are sequenced based on the overlapping maps features. A clustering approach is used to finalize the keyframes and generate an accurate set of frames for further use. Their method increases the performance and effectiveness of the system.
Lei et al. [20] introduced a video summarization model using action parsing driven by a reinforcement algorithm. Action parsing divides the videos into sequenced parts that are used in the final stage. The system uses recurrent neural networks in the summarization process to select frames based on actions and activities. Their method increases the accuracy and classification rate of keyframes.
Ji et al. [21] proposed a new video summarization method that combines a deep attentive and semantic preserving approach. The Huber loss is used to replace the error loss that occurs during summarization. A deep learning approach is used to ensure the security and safety of the keyframes. Their framework increases the performance and robustness of the system.
The proposed method is designed to mitigate the inconsistencies in the frame series detection process. In the MWAA process, the graph alignments are based on weights that imbalance the detection due to frame segregation. In contrast, the sparse representations in ERA-SS [12] increase the complexity due to multiple superframes. In that process, the computing time rises due to frequent switchovers. These drawbacks increase the difference in pixel representations, resulting in errors.

III. PROPOSED METHOD
The proposed method takes video inputs and analyzes their sequence for event detection. The input videos are segregated into frames, from which distinct features are extracted for analysis and classification. Data from an external dataset is used for validating the proposed method. The input is split into different parts for individual processing, as illustrated in Fig. 1. The observed variations are presented for analysis based on the gate assignments. In the series assignment (as in Fig. 1), the mixed heterogeneous features, namely contrast and entropy, are analyzed. First, these two features are extracted from the frames as defined in (1).
In equation (1) the variables and represent the pixel density for a frame of size . Here and balance to and for uneven pixel frames or and for even pixel frames. The computation illustrates the variations between two consecutive pixels. Let denote the time frame sequence for observing such that the series assignment is mapped as denoted in (2).
This series assignment as in (3) is used for assigning gates in the learning process. This assignment requires based assignment in improving the fidelity of summarization. The contrary process of active (sequence) and unclassified is performed. For this purpose, we define current and update gates for state updates. This process is explained in the following subsection.
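The feature extraction step can be sketched as follows. This is a minimal illustration and not the formulation of equation (1): it assumes grey-level frames given as nested lists, takes entropy as the Shannon entropy of the grey-level histogram, and takes contrast as the mean squared difference between horizontally consecutive pixels. The function names `frame_entropy`, `frame_contrast`, and `extract_features` are hypothetical.

```python
import math
from collections import Counter


def frame_entropy(frame):
    """Shannon entropy (in bits) of the grey-level histogram of one frame."""
    pixels = [p for row in frame for p in row]
    total = len(pixels)
    counts = Counter(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def frame_contrast(frame):
    """Mean squared difference between horizontally consecutive pixels.

    Assumption: this stands in for the paper's contrast term, which is
    described only as the variation between two consecutive pixels.
    """
    diffs = [
        (row[j + 1] - row[j]) ** 2
        for row in frame
        for j in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def extract_features(frames):
    """Return one (contrast, entropy) pair per frame, in frame order."""
    return [(frame_contrast(f), frame_entropy(f)) for f in frames]


# Two toy 2x2 frames: a flat frame and a frame with a sharp vertical edge.
flat = [[10, 10], [10, 10]]
edge = [[0, 255], [0, 255]]
features = extract_features([flat, edge])
```

The flat frame yields zero contrast and zero entropy, while the edge frame yields high values for both, which is the kind of variation the series assignment is meant to expose.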

A. Event Classification
The events are classified by variations in the observed sequences, for which the mapping in equation (2) is used. First, the gates are defined for sequence mapping as in equation (3).
In equation (3), the variables and represents the current and update states at time . Based on the further requirement, the gate states are changed and hence the classifications are performed. In the classification process, the mapping discreteness is observed for detecting active and unclassified series. This detection is performed until the end of the frame. If is the end of the frame , then: Based on the above classification, the mismatching and detection processes are differentiated. In this process, the and based mismatching for mapped instances as in (2) is performed. The two conditions for and based on requires multi-feature analysis for a gate assignment. The similarity feature is verified for the stored and acquired features from the mapping as in (3).
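For orientation, one step of a textbook gated recurrent unit is sketched below in scalar form. It is not a reconstruction of equation (3): the standard GRU uses an update gate and a reset gate, and how the paper's "current" gate maps onto these is an assumption left open here. The weights in `params` are arbitrary illustrative values.

```python
import math


def sigmoid(v):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, params):
    """One scalar GRU step; params maps gate name to (w, u, b)."""
    wz, uz, bz = params["update"]
    wr, ur, br = params["reset"]
    wh, uh, bh = params["candidate"]
    z = sigmoid(wz * x + uz * h_prev + bz)                # update gate
    r = sigmoid(wr * x + ur * h_prev + br)                # reset gate
    h_cand = math.tanh(wh * x + uh * (r * h_prev) + bh)   # candidate state
    # The new state blends the previous state and the candidate state.
    return (1.0 - z) * h_prev + z * h_cand


def run_sequence(xs, params, h0=0.0):
    """Feed a feature sequence through the cell and collect the states."""
    states, h = [], h0
    for x in xs:
        h = gru_step(x, h, params)
        states.append(h)
    return states


# Hypothetical weights and a toy feature sequence with one abrupt change.
params = {
    "update": (0.8, 0.5, 0.0),
    "reset": (0.6, 0.4, 0.0),
    "candidate": (1.0, 0.9, 0.0),
}
states = run_sequence([0.1, 0.1, 0.9, 0.1], params)
```

In a sketch of this kind, a jump in the state sequence marks the point at which the incoming frame features depart from what the cell has accumulated so far.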
This similarity verification is performed for different wherein the mismatch for the above requires an alternate mapping such that is assigned with a new and is updated for . This means the variations in consecutive pixels are violated in detecting an event. The detection is performed in multiple intervals from to such that the mismatching are segregated. A contrary part of the abnormal event detection is the synchronization of the ( ) and mapping as in (2) for the different features. In the proposed method, the abnormal events at different are considered non-cumulative (due to different occurrences). Therefore, the occurrences are synchronized based on | | and until is achieved. This is validated for such that the alternating sequences are varied until the end of classification. In Fig.2, the gate assignment and classification processes are illustrated. In the classification process, as presented in Fig. 2, the and symbols represent the product and sum of the mapping presented in equation (2). First, the product represents the in to (or) to ; the sum is the joint set of and . For an abnormal event summarization, the is segregated for classifying it as a whole interval. In contrast to the augmenting and mapping processes, the variations are detected for event detection and categorization. Therefore, } In equation (6) the abnormal (Post the matching) is identified for summarization. In the summarization process, the distinct event occurrences are augmented cumulatively. The differences are mitigated without augmenting the as classified for and | | . The summarization process is described below.
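The similarity verification and the segregation of mismatching intervals can be illustrated with a small sketch. The similarity measure, the threshold of 0.8, and the function names are all assumptions made for illustration; the paper's own similarity definition is not recoverable from the text. Features are assumed non-negative, as contrast and entropy are.

```python
def similarity(stored, acquired):
    """Mean per-feature similarity in [0, 1] for non-negative features.

    A value of 1.0 means the stored and acquired features are identical.
    """
    scores = []
    for s, a in zip(stored, acquired):
        scale = max(abs(s), abs(a))
        scores.append(1.0 if scale == 0 else 1.0 - abs(s - a) / scale)
    return sum(scores) / len(scores)


def mismatch_intervals(features, stored, threshold=0.8):
    """Group consecutive mismatching frames into (start, end) intervals."""
    intervals, start = [], None
    for i, f in enumerate(features):
        mismatched = similarity(stored, f) < threshold
        if mismatched and start is None:
            start = i                          # a mismatching run begins
        elif not mismatched and start is not None:
            intervals.append((start, i - 1))   # the run ended on frame i - 1
            start = None
    if start is not None:                      # run reaches the last frame
        intervals.append((start, len(features) - 1))
    return intervals


# Hypothetical (contrast, entropy) pairs; frames 2-3 and 5 deviate.
stored = (10.0, 2.0)
observed = [(10.0, 2.0), (10.5, 2.1), (40.0, 4.0),
            (45.0, 4.2), (10.0, 2.0), (50.0, 4.5)]
intervals = mismatch_intervals(observed, stored)
```

Each returned interval is treated as one candidate event, so two separate occurrences stay separate rather than being accumulated into one.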

B. Summarization Process
In the summarization process, the that encloses both and | | (failing) conditions are augmented based on . This is either discrete/ sequential depending on multiple updates as in and . The process requires unidentified post the gate allocation for maximizing event aggregation. If the event is observed in interrupts, then.
In equation (7), the augmenting events classified for and are identified. Depending on the { } and { } the abnormal classifications is grouped. This is required for projecting and hence the deviations are identified. The proposed method performs a cumulative augmentation of the above observation post assessment and hence the summary is an allocation of distinct and . This is non-recurrent and hence new frames in and (intermediate) are identified without false rates.
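The additive, non-recurrent character of the summary can be sketched as follows. This is an illustration under stated assumptions, not the formulation of equation (7): detected events are assumed to arrive as (start, end) frame intervals, possibly out of order and overlapping when an event is observed in interrupts, and the names `merge_intervals` and `summarize` are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) frame intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extends the previous event: keep the wider end frame.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def summarize(intervals):
    """Collect the frame indices of all distinct events, each frame once."""
    summary = []
    for start, end in merge_intervals(intervals):
        summary.extend(range(start, end + 1))
    return summary


# Hypothetical detections: one event reported in pieces, another overlapping.
detections = [(12, 15), (2, 3), (14, 18), (4, 4)]
events = merge_intervals(detections)
summary = summarize(detections)
```

Because every frame index is added at most once, revisiting an interval does not inflate the summary, which mirrors the non-recurrent augmentation described above.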

IV. EXPERIMENTAL RESULTS AND DISCUSSION
This section discusses the proposed method's performance assessment using MATLAB simulations. The UCSD dataset [22] is used for validating the proposed method's performance in terms of accuracy, false rate, analysis time, SSIM, and F1-score. The inputs are classified based on the available objects; the objects are used as labeled in the dataset. The classification is performed on four textural features; only the pixel-mismatched inputs are mitigated. In this comparative analysis, the identified objects and state updates are varied for the proposed method and the existing MVS-MWAA [14] and ERA-SS [12] methods. The dataset provides multiple video frames observed at 30 fps in 800x480 pixel resolution. A total of 9390 frames are observed in this dataset.

A. Accuracy
In Fig. 3, the accuracy for different objects and state updates is analyzed. The proposed method maximized accuracy by improving the and detection in and and detection . This is non-recurrent based on the available in multiple such that accuracy is maximized. Another detection is the | | where multiple pixels with the observed features are validated. This is consistent for different objects identified in the frames wherein accuracy is high.

B. False Rate
The augmentation of and relies on multiple factors of and such that no error arises. The proposed requires classification based on and such that in interference is avoided. In distinct instances, interferences are modeled independently.
The gate updates are non-linear | | for unclassified such that is high. In this process, the unclassified instances are reduced which achieves less False rate (refer to Fig. 4).

C. Analysis Time
The proposed method achieves less analysis time as the proposed method classifies and . In the active classification and estimation, independent assessments are performed.
These are validated based on the mapping and hence and as independent rather than cumulative and joint analysis. Therefore, the proposed method achieves less analysis time for identified objects and state updates (Fig. 5).

D. SSIM
In Fig. 6, the SSIM for the proposed method is compared for different objects and state updates.
In multiple state updates, the classification is performed under different . These classifications are performed for and for detecting multiple SSIM for such that is achieved in different . This is unanimously observed for distinct intervals and objects where is less, maximizing SSIM.
Fig. 6. SSIM Analysis.
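For reference, the structural similarity index between two frames can be computed as sketched below. This is a simplified single-window variant of the standard SSIM formula, using global means, variances, and covariance with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for dynamic range L. Practical implementations average SSIM over local windows, and the exact variant used in the experiments is not stated, so the function `global_ssim` is an assumption.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grey-level frames."""
    a = [p for row in x for p in row]
    b = [p for row in y for p in row]
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and equally sized")
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2   # stabilizes the contrast term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# A toy checkerboard frame compared with itself and with its inverse.
frame = [[0, 255], [255, 0]]
inverse = [[255, 0], [0, 255]]
same_score = global_ssim(frame, frame)
inverse_score = global_ssim(frame, inverse)
```

An identical pair scores 1, and structurally opposite frames score below zero, so a higher SSIM indicates that the summarized frames preserve more of the structure of the source frames.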

E. F1-Score
For any density of objects and state updates, the F1-score is high for the proposed method (Fig. 7). The proposed method achieves a high F1-score by mitigating instances. This is performed based on the classification and mapping of distinct . In the classification process, and instances are distinguished for maximizing the F1-score, preventing false rate. The comparative analysis results are tabulated in Tables I and II for different objects and state updates.
The proposed method is adaptable to varying objects and computations (state updates). In video processing, the variations due to objects and computations are practically addressed using the proposed method. The variations are suppressed using the gate assignment, and hence the accuracy, SSIM, and F1-score are improved.
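The frame-level metrics can be computed from per-frame labels as sketched below. Accuracy and F1-score follow their standard definitions. The false rate is not defined in the text, so it is assumed here to be the fraction of misclassified frames; the function name and the toy labels are hypothetical and are not results from the experiments.

```python
def frame_metrics(truth, predicted):
    """Accuracy, false rate, and F1-score for binary per-frame labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    total = len(pairs)
    f1_denominator = 2 * tp + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Assumption: false rate is taken as the misclassified fraction.
        "false_rate": (fp + fn) / total,
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
    }


# Toy labels: 1 marks an abnormal frame, 0 a normal frame.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = frame_metrics(truth, predicted)
```

Under this assumed definition, accuracy and false rate sum to one, so any reduction in unclassified or misclassified frames raises accuracy by the same amount.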

V. CONCLUSION
This article discussed an additive event summarization method for reducing the inconsistencies in video event summarization. The classified events are identified using different state assignments through gated recurrent units. The recurrent unit identifies unclassified and active frames to prevent false rates in event extraction. This classification is performed based on the heterogeneous features over the varying pixel densities of different sequences. From the analyzed sequences, the pixels exhibiting abnormal features are segregated to provide a summarized output. For the different objects classified, the proposed method achieves 8.83% higher accuracy, 11.45% higher SSIM, 12.3% higher F1-score, 7.7% lower false rate, and 8.88% less analysis time. Though the proposed method is reliable in summarizing events related to abnormal occurrences, the varying textural features result in pixel errors. Therefore, a spatiotemporal feature classification pre-processing step is planned for integration in future work.